[06:47:13] <wikibugs>	 serviceops, CirrusSearch, Discovery-Search, Operations: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (Joe) >>! In T210717#4860057, @EBernhardson wrote: > Even more generally, if we install a reverse proxy for local TLS connection poolin...
[06:48:07] <wikibugs>	 serviceops, CirrusSearch, Discovery-Search, Operations: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (Joe) I think I have a decent idea of how to implement a basic version of what we want via nginx. I'll work on it this week hopefully.
[08:16:51] <wikibugs>	 serviceops, CirrusSearch, Discovery-Search, Operations: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (dcausse) Just one thought as I discovered this last week. A non-negligible amount of the time spent by curl goes to reading `/etc/ssl/certs/ca-certifi...
[08:26:22] <wikibugs>	 serviceops, CirrusSearch, Discovery-Search, Operations: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (Joe) >>! In T210717#4861725, @dcausse wrote: > I think this explains why we've seen a +15ms when I broke connection pooling (T212768)....
[09:23:22] <akosiaris>	 fsero: _joe_ I started the migration of the OLD zotero LVS IPs to the new kubernetes based infra
[09:23:32] <akosiaris>	 mobrovac: ^
[09:23:36] <_joe_>	 akosiaris: cool
[09:23:57] <akosiaris>	 that does entail a new helm release, but that's easy and already done
[09:24:07] <akosiaris>	 while at it I consulted https://grafana.wikimedia.org/d/000000620/xxxx-zotero-debugging-kubernetes?orgId=1&from=now-30d&to=now
[09:24:09] <mobrovac>	 nice
[09:24:24] <_joe_>	 akosiaris: why a release?
[09:24:30] <akosiaris>	 and trimmed down the memory limits from 4Gi to 2Gi and the requests from 4Gi to 300Mi
[09:24:39] <_joe_>	 does the deployment need to know the LVS IPs?
[09:24:41] <akosiaris>	 should allow the scheduler to do way better placements
[09:24:46] <akosiaris>	 yes, exactly that
[09:26:41] <fsero>	 great, shall we reduce the number of replicas as well?
[09:29:07] <akosiaris>	 makes sense I guess. The old service had the exorbitant amount of 2, but it was based on xulrunner, not node
[09:29:28] <akosiaris>	 I am not sure how many concurrent requests it was capable of servicing tbh
[09:29:50] <akosiaris>	 I guess we can cut them by half and see what happens
[09:30:01] <mobrovac>	 i'd rather not make many changes at once tbh
[09:30:11] <akosiaris>	 yeah step by step
[09:30:51] <akosiaris>	 first the memory changes and the move to zotero.svc instead of zoterov2
[09:30:57] <mobrovac>	 fyi i've got a dentist's appointment in 30 mins, so won't be around for at least a couple of hours
[09:31:07] <akosiaris>	 and after everything is deemed fine there we can play with the number of replicas
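Note on the request change above: the Kubernetes scheduler places pods according to their resource requests, not their limits, which is why lowering the memory request from 4Gi to 300Mi gives it more room for placement. A minimal sketch of that arithmetic follows. The node size of 64Gi is an assumption for illustration and does not come from the log, and real scheduling also weighs CPU and other constraints.

```python
# Hypothetical illustration: how many pods fit on one node by memory request alone.
MIB_PER_GIB = 1024


def pods_per_node(allocatable_mib: int, request_mib: int) -> int:
    """Number of pods whose memory requests fit within the node's allocatable memory."""
    return allocatable_mib // request_mib


# Assumed node size, not taken from the log.
NODE_ALLOCATABLE_MIB = 64 * MIB_PER_GIB

old_fit = pods_per_node(NODE_ALLOCATABLE_MIB, 4 * MIB_PER_GIB)  # old request: 4Gi
new_fit = pods_per_node(NODE_ALLOCATABLE_MIB, 300)              # new request: 300Mi

print(f"pods per node with 4Gi requests:   {old_fit}")
print(f"pods per node with 300Mi requests: {new_fit}")
```

The 2Gi limit still caps what each container may use at runtime; only the request counts against the node's capacity when the pod is scheduled.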
[10:39:36] <wikibugs>	 serviceops, Operations, Scap, Goal: SRE FY2019 Q3:TEC6: First steps towards Canary Deployments - https://phabricator.wikimedia.org/T213156 (jijiki) p:Triage→Normal
[10:39:52] <wikibugs>	 serviceops, Operations, Scap, Goal, User-jijiki: SRE FY2019 Q3:TEC6: First steps towards Canary Deployments - https://phabricator.wikimedia.org/T213156 (jijiki)
[10:46:56] <wikibugs>	 serviceops, Operations, Thumbor, Wikimedia-Logstash, User-jijiki: Stream Thumbor logs to logstash - https://phabricator.wikimedia.org/T212946 (jijiki) @fgiunchedi @herron I agree, it is a good opportunity to move it to the new infrastructure. Since we will be upgrading Thumbor servers to Stre...
[10:49:48] <_joe_>	 addshore: leszek_wmde https://github.com/wikimedia/operations-software-service-checker
[10:49:56] <addshore>	 amazing
[10:52:32] <leszek_wmde>	 thanks _joe_
[10:53:19] <_joe_>	 the part you're interested in is "Spec format support"
[10:53:38] <addshore>	 _joe_: so the "get the pipeline turned on" has to happen after blubber stuff is all ready but before helm charting?
[10:55:00] <_joe_>	 addshore: I think you need the helm chart as well
[10:55:04] <addshore>	 okay!
[10:55:15] <_joe_>	 but akosiaris knows better
[11:13:18] <akosiaris>	 addshore: no, not necessarily
[11:13:45] <akosiaris>	 in fact you need to turn on the pipeline before creating the helm chart cause otherwise there are no images to test the chart against
[11:13:58] <_joe_>	 right
[11:14:51] <akosiaris>	 so the moment you have a working .pipeline/blubber.yaml, it's "turn on the pipeline". thcipriani is your friend on this one :-). Then create the chart and do some benchmarking and testing per https://wikitech.wikimedia.org/wiki/User:Alexandros_Kosiaris/Benchmarking_kubernetes_apps (which is bound to be moved into an organized documentation portal)
[11:15:24] <_joe_>	 then there is the beta deployment that in this case is going to be needed :P
[11:17:57] <wikibugs>	 serviceops, Operations, Thumbor, Patch-For-Review, and 3 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[11:18:00] <wikibugs>	 serviceops, Operations, Thumbor, Patch-For-Review, User-jijiki: Investigate systemd hardening to replace Firejail for Thumbor - https://phabricator.wikimedia.org/T212941 (jijiki)
[11:18:47] <wikibugs>	 serviceops, Operations, Thumbor, Patch-For-Review, and 3 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[11:19:52] <_joe_>	 fsero: let me know if you want me to take a look at your puppet patch 
[11:20:26] <fsero>	 thanks _joe_ im trying to make PCC happy first
[11:43:58] <addshore>	 _joe_: so, the beta deployment actually requires the service to be running on the production-side k8s cluster, right? as afaik there isn't one somewhere else?
[11:44:21] <_joe_>	 no, there are ways, once you have your docker image
[11:44:27] <addshore>	 okay!
[11:44:31] <_joe_>	 we can look into it once you're there
[11:44:35] <addshore>	 we already have the service running on beta from some other random server in labs right now, but not from within the pipeline
[11:45:02] <_joe_>	 uh, no, the idea would be to have it run in beta on a dedicated VM
[11:45:08] <_joe_>	 from the pipeline
[11:45:28] <addshore>	 within deployment-prep (right now our test beta service is outside of deployment-prep) :)
[11:45:37] <_joe_>	 yes
[11:45:39] <addshore>	 but yes, okay, that makes sense *goes to find the document to write this down*
[11:45:59] <_joe_>	 sorry we're supposed to document all this during this quarter :D
[11:46:12] <addshore>	 hehe :)
[11:46:31] <addshore>	 what better way to test the documentation than to stumble through the process while the documentation is being written :)
[11:55:29] <addshore>	 and so the helm chart is needed before the beta deployment via the pipeline _joe_ ? :)
[11:55:45] <_joe_>	 nope
[11:55:54] <_joe_>	 just the docker image I guess
[11:55:58] <addshore>	 okay!
[11:56:13] <addshore>	 trying to build a dependency graph of the todos ;)
[11:56:37] <_joe_>	 blubber => pipeline => helm => production
[11:57:01] <_joe_>	 and well, beta is completely independent, basically once you have the pipeline we can work on it
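Note: the ordering described above (blubber => pipeline => helm => production, with beta needing only the pipeline's docker image) can be written down as a small dependency graph. This is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); the step names are shorthand taken from the chat, not identifiers from any real tool.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must be finished first,
# following the order given in the chat.
deps = {
    "blubber": set(),
    "pipeline": {"blubber"},
    "helm": {"pipeline"},
    "production": {"helm"},
    "beta": {"pipeline"},  # beta only needs the image the pipeline builds
}

# static_order() yields one valid ordering; helm and beta may appear
# in either relative order because neither depends on the other.
order = list(TopologicalSorter(deps).static_order())
print(" -> ".join(order))
```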
[12:58:35] <fsero>	 _joe_: PCC is happy now. I would greatly appreciate a review; I'm probably doing a ton of things wrong, so be merciless in the review please :)
[13:21:04] <akosiaris>	 fsero: I ended up following your proposal. I manually amended the service (with kubectl) to also listen on 1969 by adding a port to the array. Had to grapple a bit with the nodeport but it works. icinga has already recovered
[13:23:01] <akosiaris>	 I'll do the citoid migration and then we are free to do a helm deployment to bring everything back into order
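Note: the manual change described above amounts to appending one more entry to the Service's `spec.ports` array. The sketch below models the manifest as a plain Python dict and only illustrates the shape of the edit. The existing port (`http`, 8080), the new port's name (`legacy`) and its target port are hypothetical, since the log gives only the number 1969. A Service with more than one port needs a name on each entry, and leaving `nodePort` unset lets the cluster allocate one.

```python
import copy


def add_service_port(service: dict, name: str, port: int, target_port: int) -> dict:
    """Return a copy of a Service manifest with one more entry in spec.ports."""
    updated = copy.deepcopy(service)
    ports = updated["spec"]["ports"]
    # Do nothing if the port is already exposed.
    if any(p["port"] == port for p in ports):
        return updated
    ports.append(
        {"name": name, "port": port, "targetPort": target_port, "protocol": "TCP"}
    )
    return updated


# Hypothetical starting manifest; only the extra port 1969 comes from the log.
service = {
    "kind": "Service",
    "metadata": {"name": "zotero"},
    "spec": {
        "type": "NodePort",
        "ports": [
            {"name": "http", "port": 8080, "targetPort": 8080, "protocol": "TCP"}
        ],
    },
}

updated = add_service_port(service, name="legacy", port=1969, target_port=1969)
print([p["port"] for p in updated["spec"]["ports"]])
```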
[14:59:05] <wikibugs>	 serviceops, Operations, TechCom-RFC, Wikidata, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (daniel) Did the chat with @joe happen? What was the outcome?
[15:26:49] <fsero>	 akosiaris: great :) we should create deployment objects including the deploy timestamp
[15:26:57] <fsero>	 something like zotero-YYMMDD
[15:27:10] <fsero>	 so then we can change the selector to the latest deployment
[15:27:44] <fsero>	 anyway in this case we could have done a rolling upgrade of the deployment to set new limits
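Note: the naming idea above (one deployment object per deploy, named like zotero-YYMMDD, with the service selector pointed at the newest one) could look roughly like this. It is a minimal sketch; the helper names and the example dates are hypothetical and not part of any existing tooling.

```python
from datetime import date


def deployment_name(service: str, day: date) -> str:
    """Build a name of the form <service>-YYMMDD."""
    return f"{service}-{day:%y%m%d}"


def latest_deployment(names: list[str]) -> str:
    """Pick the most recent deployment by its YYMMDD suffix.

    Fixed-width YYMMDD suffixes sort lexicographically in date order
    as long as all dates fall within the same century.
    """
    return max(names, key=lambda n: n.rsplit("-", 1)[1])


# Example dates are made up for illustration.
names = [
    deployment_name("zotero", date(2019, 1, 2)),
    deployment_name("zotero", date(2019, 1, 8)),
]
print(names)
print("selector should target:", latest_deployment(names))
```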
[15:51:11] <wikibugs>	 serviceops, Core Platform Team (Session Management Service (CDP2)), Core Platform Team Kanban (Doing), User-Clarakosi, User-Eevans: Plan/design a session storage service - https://phabricator.wikimedia.org/T206015 (Eevans)
[16:52:55] <wikibugs>	 serviceops, Operations, TechCom-RFC, Wikidata, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (WMDE-leszek) It did (today, not on Monday though). I hope the outcome is that @Joe and @akosiaris have a better understanding of what are we h...
[17:23:36] <_joe_>	 !log repooling mw1299 for testing the new apache configuration
[17:24:34] <wikibugs>	 serviceops, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching): TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production ... - https://phabricator.wikimedia.org/T212801
[17:31:06] <greg-g>	 _joe_: you have a !log here? :)
[17:31:19] <_joe_>	 oh sigh
[17:31:23] <_joe_>	 well 
[17:31:34] <_joe_>	 the channel is logged at least
[17:31:36] <_joe_>	 :P
[17:33:05] <wikibugs>	 serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team, Services: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (thcipriani)
[17:33:09] <wikibugs>	 serviceops, Citoid, Release Pipeline, Release-Engineering-Team, Services: Migrate citoid to kubernetes - https://phabricator.wikimedia.org/T213194 (thcipriani)
[17:33:14] <wikibugs>	 serviceops, CX-cxserver, Release Pipeline, Release-Engineering-Team, Services: Migrate cxserver to kubernetes - https://phabricator.wikimedia.org/T213195 (thcipriani)
[17:33:26] <wikibugs>	 serviceops, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching): TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production ... - https://phabricator.wikimedia.org/T212801
[17:33:30] <wikibugs>	 serviceops, CX-cxserver, Release Pipeline, Release-Engineering-Team, Services: Migrate cxserver to kubernetes - https://phabricator.wikimedia.org/T213195 (thcipriani)
[17:33:37] <wikibugs>	 serviceops, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching): TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production ... - https://phabricator.wikimedia.org/T212801
[17:33:41] <wikibugs>	 serviceops, Citoid, Release Pipeline, Release-Engineering-Team, Services: Migrate citoid to kubernetes - https://phabricator.wikimedia.org/T213194 (thcipriani)
[17:34:20] <wikibugs>	 serviceops, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching): TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production ... - https://phabricator.wikimedia.org/T212801
[17:34:24] <wikibugs>	 serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team, Services: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (thcipriani)
[17:35:20] <James_F>	 greg-g, _joe_: T213196 :-)
[17:35:35] <_joe_>	 James_F: grrr
[17:35:37] <_joe_>	 :P
[17:35:49] <_joe_>	 but you're right
[17:36:37] <James_F>	 addshore: https://phabricator.wikimedia.org/T212138#4863139 makes me think we can't launch SDC. :-(((
[17:39:17] <addshore>	 *reads*
[17:40:00] <addshore>	 we had to do a manual reindex when we turned on lexemes too
[17:41:04] <addshore>	 https://phabricator.wikimedia.org/T195321 is vaguely related to it for lexeme
[17:41:29] <addshore>	 not sure where the rest of the discussion happened
[17:44:44] <addshore>	 I think we ended up using this reindex, https://tools.wmflabs.org/sal/log/AWON1uf7BEfgIt1jMYzn
[17:44:45] <addshore>	 perhaps
[17:45:09] <addshore>	 and this for test? https://tools.wmflabs.org/sal/log/AWOIw2IFCdtJF089zJsa
[21:14:46] <mutante>	 we have to use new "cluster.mailers" config in Phabricator due to upstream changes apparently.  reading https://secure.phabricator.com/T13053  https://secure.phabricator.com/T12677 for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482400/   but Phabricator is releng and mail is infra ?:)
[21:16:12] <mutante>	 it allows failing over to more than one mail system
[21:17:04] <James_F>	 addshore: Worked it out, config was screwed up. Fixed now.
[21:17:11] <James_F>	 addshore: Thanks for the pointers.
[22:07:58] <wikibugs>	 serviceops, Mail, Operations, Phabricator, and 2 others: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (Dzahn)
[22:40:58] <wikibugs>	 serviceops, Operations, Patch-For-Review, User-jijiki: Create a mediawiki::cronjob define - https://phabricator.wikimedia.org/T211250 (Dzahn) summarizing joe's work:  1) [[ https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482790/ | introduction of systemd::timer::job ]]  ->   2)  [[https:/...
[22:42:24] <wikibugs>	 serviceops, Operations, Patch-For-Review, User-jijiki: Create a mediawiki::cronjob define - https://phabricator.wikimedia.org/T211250 (Dzahn)
[23:38:21] <addshore>	 James_F: coool! :)
[23:53:51] <wikibugs>	 serviceops, Operations, Traffic, Wikidata, and 3 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn)
[23:54:13] <wikibugs>	 serviceops, Operations, Traffic, Wikidata, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn)