[06:47:13] serviceops, CirrusSearch, Discovery-Search, Operations: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (Joe) >>! In T210717#4860057, @EBernhardson wrote: > Even more generally, it we install a reverse proxy for local TLS connection poolin...
[06:48:07] serviceops, CirrusSearch, Discovery-Search, Operations: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (Joe) I think I have a decent idea of how to implement a basic version of what we want via nginx. I'll work on it this week hopefully.
[08:16:51] serviceops, CirrusSearch, Discovery-Search, Operations: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (dcausse) Just one thought as I discovered this last week. A non negligible time spent by curl is by reading `/etc/ssl/certs/ca-certifi...
[08:26:22] serviceops, CirrusSearch, Discovery-Search, Operations: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (Joe) >>! In T210717#4861725, @dcausse wrote: > I think this explain why we've seen a +15ms when I broke connection pooling (T212768)....
[09:23:22] fsero: _joe_ I started the migration of the OLD zotero LVS IPs to the new kubernetes based infra
[09:23:32] mobrovac: ^
[09:23:36] <_joe_> akosiaris: cool
[09:23:57] that does entail a new helm release, but that's easy and already done
[09:24:07] while at it I consulted https://grafana.wikimedia.org/d/000000620/xxxx-zotero-debugging-kubernetes?orgId=1&from=now-30d&to=now
[09:24:09] nice
[09:24:24] <_joe_> akosiaris: why a release?
[09:24:30] and trimmed down the memory limits from 4Gi to 2Gi and the requests from 4Gi to 300Mi
[09:24:39] <_joe_> does the deployment need to know the LVS IPs?
[09:24:41] should allow the scheduler to do way better placements
[09:24:46] yes, exactly that
[09:26:41] great, shall we reduce the number of replicas as well?
[09:29:07] makes sense I guess. The old service had the exorbitant amount of 2, but it was based on xulrunner, not node
[09:29:28] I am not sure how many concurrent requests it was capable of servicing tbh
[09:29:50] I guess we can cut them by half and see what happens
[09:30:01] i'd rather not make many changes at once tbh
[09:30:11] yeah step by step
[09:30:51] first the memory changes and the move to zotero.svc instead of zoterov2
[09:30:57] fyi i've got a dentist's appointment in 30 mins, so won't be around for at least a couple of hours
[09:31:07] and after everything is deemed fine there we can play with the number of replicas
[10:39:36] serviceops, Operations, Scap, Goal: SRE FY2019 Q3:TEC6: First steps towards Canary Deployments - https://phabricator.wikimedia.org/T213156 (jijiki) p:Triage→Normal
[10:39:52] serviceops, Operations, Scap, Goal, User-jijiki: SRE FY2019 Q3:TEC6: First steps towards Canary Deployments - https://phabricator.wikimedia.org/T213156 (jijiki)
[10:46:56] serviceops, Operations, Thumbor, Wikimedia-Logstash, User-jijiki: Stream Thumbor logs to logstash - https://phabricator.wikimedia.org/T212946 (jijiki) @fgiunchedi @herron I agree, it is a good opportunity to move it to the new infrastructure. Since we will be upgrading Thumbor servers to Stre...
[10:49:48] <_joe_> addshore: leszek_wmde https://github.com/wikimedia/operations-software-service-checker
[10:49:56] amazing
[10:52:32] thanks _joe_
[10:53:19] <_joe_> the part you're interested in is "Spec format support"
[10:53:38] _joe_: so the "get the pipeline turned on" has to happen after blubber stuff is all ready but before helm charting?
[10:55:00] <_joe_> addshore: I think you need the helm chart as well
[10:55:04] okay!
[10:55:15] <_joe_> but akosiaris knows better
[11:13:18] addshore: no, not necessarily
[11:13:45] in fact you need to turn on the pipeline before creating the helm chart cause otherwise there are no images to test the chart against
[11:13:58] <_joe_> right
[11:14:51] so the moment you have a working .pipeline/blubber.yaml, it's "turn on the pipeline". thcipriani is your friend on this one :-). Then create the chart and do some benchmarking and testing per https://wikitech.wikimedia.org/wiki/User:Alexandros_Kosiaris/Benchmarking_kubernetes_apps (which is bound to be moved into an organized documentation portal)
[11:15:24] <_joe_> then there is the beta deployment that in this case is going to be needed :P
[11:17:57] serviceops, Operations, Thumbor, Patch-For-Review, and 3 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[11:18:00] serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Investigate systemd hardening to replace Firejail for Thumbor - https://phabricator.wikimedia.org/T212941 (jijiki)
[11:18:47] serviceops, Operations, Thumbor, Patch-For-Review, and 3 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[11:19:52] <_joe_> fsero: let me know if you want me to take a look at your puppet patch
[11:20:26] thanks _joe_ im trying to make PCC happy first
[11:43:58] _joe_: so, the beta deployment actually requires the service to be running the production side k8s cluster right? as afaik there isn't one somewhere else?
[11:44:21] <_joe_> no, there are ways, once you have your docker image
[11:44:27] okay!
[11:44:31] <_joe_> we can look into it once you're there
[11:44:35] we already have the service running on beta from some other random server in labs right now, but not from within the pipeline
[11:45:02] <_joe_> uh, no, the idea would be to have it run in beta on a dedicated VM
[11:45:08] <_joe_> from the pipeline
[11:45:28] within deployment-prep (right now our test beta service is outside of deployment-prep) :)
[11:45:37] <_joe_> yes
[11:45:39] but yes, okay, that makes sense *goes to find the document to write this down*
[11:45:59] <_joe_> sorry we're supposed to document all this during this quarter :D
[11:46:12] hehe :)
[11:46:31] what better way to test the documentation than to stumble through the process while the documentation is being written :)
[11:55:29] and so the helm chart is needed before the beta deployment via the pipeline _joe_ ? :)
[11:55:45] <_joe_> nope
[11:55:54] <_joe_> just the docker image I guess
[11:55:58] okay!
[11:56:13] trying to build a dependency graph of the todos ;)
[11:56:37] <_joe_> blubber => pipeline => helm => production
[11:57:01] <_joe_> and well, beta is completely independent, basically once you have the pipeline we can work on it
[12:58:35] _joe_: PCC is happy now i will greatly appreciate a review, im probably doing a ton of things wrong so be merciless under the review please :)
[13:21:04] fsero: I ended up following your proposal. I amended manually (with kubectl) the service to also listen on 1969 by adding a port in the array. Had to grapple a bit with the nodeport but it works. icinga recovered already
[13:23:01] I 'll do the citoid migration and then we are free to do a helm deployment to bring everything back into order
[14:59:05] serviceops, Operations, TechCom-RFC, Wikidata, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (daniel) Did the chat with @joe happen? What was the outcome?
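The todo ordering discussed at 11:56 (blubber => pipeline => helm => production, with beta needing only the pipeline's docker image) can be written down as a small dependency graph. This is only a rough sketch: the step names are labels taken from the chat, not real job or tool names, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Each key maps a step to the steps it depends on, following the chat:
# blubber => pipeline => helm => production, and beta only needs the pipeline.
# The step names are illustrative labels, not real job or tool names.
DEPENDENCIES = {
    "blubber": set(),
    "pipeline": {"blubber"},
    "helm": {"pipeline"},
    "production": {"helm"},
    "beta": {"pipeline"},
}


def todo_order(deps):
    """Return one valid order in which the steps can be completed."""
    return list(TopologicalSorter(deps).static_order())


order = todo_order(DEPENDENCIES)
print(" -> ".join(order))
```

Because beta depends only on the pipeline, it may appear before or after helm in the printed order. Either placement is valid.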
[15:26:49] akosiaris: great :) we should create deployment objects including the deploy timestamp
[15:26:57] something like zotero-YYMMDD
[15:27:10] so then we can change the selector to the latest deployment
[15:27:44] anyway in this case we could have done a rolling upgrade of the deployment to set new limits
[15:51:11] serviceops, Core Platform Team (Session Management Service (CDP2)), Core Platform Team Kanban (Doing), User-Clarakosi, User-Eevans: Plan/design a session storage service - https://phabricator.wikimedia.org/T206015 (Eevans)
[16:52:55] serviceops, Operations, TechCom-RFC, Wikidata, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (WMDE-leszek) It did (today, not on Monday though). I hope the outcome is I hope that @Joe and @akosiaris have a better understanding of what are we h...
[17:23:36] <_joe_> !log repooling mw1299 for testing the new apache configuration
[17:24:34] serviceops, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching): TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production ... - https://phabricator.wikimedia.org/T212801
[17:31:06] _joe_: you have a !log here? :)
[17:31:19] <_joe_> oh sigh
[17:31:23] <_joe_> well
[17:31:34] <_joe_> the channel is logged at least
[17:31:36] <_joe_> :P
[17:33:05] serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team, Services: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (thcipriani)
[17:33:09] serviceops, Citoid, Release Pipeline, Release-Engineering-Team, Services: Migrate citoid to kubernetes - https://phabricator.wikimedia.org/T213194 (thcipriani)
[17:33:14] serviceops, CX-cxserver, Release Pipeline, Release-Engineering-Team, Services: Migrate cxserver to kubernetes - https://phabricator.wikimedia.org/T213195 (thcipriani)
[17:33:26] serviceops, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching): TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production ... - https://phabricator.wikimedia.org/T212801
[17:33:30] serviceops, CX-cxserver, Release Pipeline, Release-Engineering-Team, Services: Migrate cxserver to kubernetes - https://phabricator.wikimedia.org/T213195 (thcipriani)
[17:33:37] serviceops, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching): TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production ... - https://phabricator.wikimedia.org/T212801
[17:33:41] serviceops, Citoid, Release Pipeline, Release-Engineering-Team, Services: Migrate citoid to kubernetes - https://phabricator.wikimedia.org/T213194 (thcipriani)
[17:34:20] serviceops, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching): TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production ... - https://phabricator.wikimedia.org/T212801
[17:34:24] serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team, Services: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (thcipriani)
[17:35:20] greg-g, _joe_: T213196 :-)
[17:35:35] <_joe_> James_F: grrr
[17:35:37] <_joe_> :P
[17:35:49] <_joe_> but you're right
[17:36:37] addshore: https://phabricator.wikimedia.org/T212138#4863139 makes me think we can't launch SDC. :-(((
[17:39:17] *reads*
[17:40:00] we had to do a manual reindex when we turned on lexemes too
[17:41:04] https://phabricator.wikimedia.org/T195321 is vaguely related to it for lexeme
[17:41:29] not sure where the rest of the discussion happened
[17:44:44] I think we ended up using this reindex, https://tools.wmflabs.org/sal/log/AWON1uf7BEfgIt1jMYzn
[17:44:45] perhaps
[17:45:09] and this for test? https://tools.wmflabs.org/sal/log/AWOIw2IFCdtJF089zJsa
[21:14:46] we have to use new "cluster.mailers" config in Phabricator due to upstream changes apparently. reading https://secure.phabricator.com/T13053 https://secure.phabricator.com/T12677 for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482400/ but Phabricator is releng and mail is infra ?:)
[21:16:12] it allows to fail-over to more than one mail system
[21:17:04] addshore: Worked it out, config was screwed up. Fixed now.
[21:17:11] addshore: Thanks for the pointers.
[22:07:58] serviceops, Mail, Operations, Phabricator, and 2 others: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (Dzahn)
[22:40:58] serviceops, Operations, Patch-For-Review, User-jijiki: Create a mediawiki::cronjob define - https://phabricator.wikimedia.org/T211250 (Dzahn) summarizing joe's work: 1) [[ https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482790/ | introduction of systemd::timer::job ]] -> 2) [[https:/...
[22:42:24] serviceops, Operations, Patch-For-Review, User-jijiki: Create a mediawiki::cronjob define - https://phabricator.wikimedia.org/T211250 (Dzahn)
[23:38:21] James_F: coool! :)
[23:53:51] serviceops, Operations, Traffic, Wikidata, and 3 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn)
[23:54:13] serviceops, Operations, Traffic, Wikidata, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn)