[06:54:20] serviceops: mcrouter memcached flapping in gutter pool - https://phabricator.wikimedia.org/T255511 (elukey) @RLazarus @CDanis maybe as an interim solution, while we think about a more "final" solution, we could change `--probe-timeout-initial` from 3s to something like 30/60/300s? To avoid constant flaps if ano...
[10:51:02] all deployers: the scap sync --canary-wait-time option is available (https://phabricator.wikimedia.org/T217924)
[11:10:45] hi all, could someone let me know how i go about merging and deploying a mediawiki-config patch? https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/606407
[11:11:10] not something i have had to do before
[11:14:47] liw: mori.tzm mentioned you may be able to help with ^^
[11:20:53] jbond42, I fear I am utterly incompetent with regards to anything related to MediaWiki configuration, sorry
[11:21:36] liw: ack, thanks; will wait for someone else. guess everyone is on lunch
[11:38:52] mutante: jayme: akosiaris: perhaps ^^
[11:41:26] I've not touched that area till now - so I've no real idea either. :/
[11:42:20] ack, thanks jayme
[11:45:10] * akosiaris around now, looking
[11:45:25] akosiaris: thanks
[11:46:25] jbond42: scap sync-file and you should be good
[11:46:40] +1ed
[11:47:26] akosiaris: i have never deployed a mediawiki-config change at all. is it just submit, then `scap sync-file $thefile` from deploy1001?
[11:48:38] and a git pull before that
[11:49:02] ack, thanks, i'll give it a try
[11:49:26] dir is /srv/mediawiki-staging btw
[11:49:35] yep, ack
[11:51:41] labs should pick it up automatically in 10 or so mins IIRC
[11:52:07] seems to have gone smoothly, ack
[13:10:23] serviceops, CX-cxserver, Language-Team (Language-2020-Focus-Sprint), Release-Engineering-Team (Pipeline): Migrate apertium to the deployment pipeline - https://phabricator.wikimedia.org/T255672 (KartikMistry)
[14:15:09] serviceops, MediaWiki-General, Operations, Core Platform Team Workboards (Clinic Duty Team), and 3 others: Revisit timeouts, concurrency limits in remote HTTP calls from MediaWiki - https://phabricator.wikimedia.org/T245170 (AMooney) @tstarling anything left to do for this task?
[15:36:26] serviceops, Operations, ops-codfw: (Need by: TBD) rack/setup/install kubernetes20[07-14].codfw.wmnet and kubestage200[1-2].codfw.wmnet. - https://phabricator.wikimedia.org/T252185 (akosiaris)
[17:59:19] I want to put some MediaWiki appserver saturation metrics (busy vs idle worker threads, overall CPU usage) on a dashboard somewhere, but I'm not quite sure where that should be
[17:59:49] what are you trying to get out of the numbers?
[18:00:40] mostly I want to make it more obvious when we're in a situation where all the workers are consumed/stuck, and link to it from an alert about such
[18:00:59] there's the RED dashboard, but RED explicitly doesn't cover saturation 🙃
[18:01:39] (4GS is basically RED plus saturation, though)
[18:15:26] ημμ
[18:15:29] woops
[18:15:30] hmmm
[18:16:15] * apergos looks around at the other existing apps-related dashboards
[18:16:30] would this be the regular appservers, or also api and whatever else?
[18:16:52] mostly appserver and api, maybe parsoid as well
[18:17:13] i'm more concerned with things that are in the user query path, and i suspect (but haven't checked) that high utilization of jobrunner threads is 'normal'
[18:17:28] serviceops, Operations: move all 86 new codfw appservers into production (mw2[291-2377].codfw.wmnet) - https://phabricator.wikimedia.org/T247021 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2339.codfw.wmnet ` The log can be found in `/var/log/...
[18:18:03] I hate to add to dashboard proliferation, but maybe a new one is appropriate
[18:20:15] https://grafana.wikimedia.org/d/000000550/mediawiki-application-servers?orgId=1 otoh there is already a 'worker saturation' panel there
[18:20:21] unless you just added it, that is :-P
[18:20:23] yeah, that's for a single worker
[18:20:28] this is for the whole fleet
[18:20:56] AIUI there isn't really a whole-fleet MW dashboard aside from the RED dashboard
[18:21:27] not so much
[18:21:29] guess it's time
[18:22:39] idk, I'm tempted to add a (maybe default-collapsed) 'Saturation' section to the RED dashboard, it's just one panel
[18:46:18] serviceops, MediaWiki-General, Security-Team, Performance-Team (Radar), Security: Create a tmp directory just for MediaWiki - https://phabricator.wikimedia.org/T179901 (BPirkle) a: BPirkle→None
[18:51:30] serviceops, Operations: move all 86 new codfw appservers into production (mw2[291-2377].codfw.wmnet) - https://phabricator.wikimedia.org/T247021 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2339.codfw.wmnet'] ` and they were **ALL** successful.
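The "worker saturation" metric discussed above is just busy / (busy + idle) worker threads, aggregated per cluster instead of per host. A minimal sketch of that aggregation, with made-up sample data (the cluster names and counts are illustrative; in production this would be a Prometheus/Grafana query, not a script):

```python
from collections import defaultdict

# Hypothetical per-host samples: (cluster, busy_workers, idle_workers).
samples = [
    ("appserver", 40, 24),
    ("appserver", 60, 4),
    ("api_appserver", 10, 54),
]

def cluster_saturation(samples):
    """Return busy / (busy + idle) per cluster: the fleet-wide saturation ratio."""
    busy = defaultdict(int)
    total = defaultdict(int)
    for cluster, b, i in samples:
        busy[cluster] += b
        total[cluster] += b + i
    return {c: busy[c] / total[c] for c in total}

print(cluster_saturation(samples))
```

A ratio near 1.0 for a cluster is exactly the "all workers consumed/stuck" condition the alert would fire on; a per-cluster dashboard variable would then select which of these series a single panel shows.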
[18:53:19] well, if you want to separate out api and regular and jobrunner also
[18:53:23] then that's 4 panels
[18:53:32] or i guess you could do them all on one, meh
[18:53:42] no, I'd just make it show whichever cluster is selected as a variable at the top
[18:53:54] that works too
[18:55:01] I do think it is a high-level system health indicator for appservers, and AIUI the RED dashboard is intended to display such things, even if this is stretching the definition of RED a bit
[18:56:27] I'd rather have the right graphs in the right place than be religious about the definitions
[18:56:46] but the "RED" name predates me, so grain of salt
[18:57:55] for context, there's a nice short breakdown of USE/RED/4GS here: https://grafana.com/blog/2018/08/02/the-red-method-how-to-instrument-your-services/ (with a longer talk, which I haven't watched, available there too)
[22:05:55] serviceops, Operations, observability, Patch-For-Review: Reliable metrics for idle/busy PHP-FPM workers - https://phabricator.wikimedia.org/T252605 (CDanis) Open→Resolved We now have an alert and a graph based on scraping the status string that php-fpm provides to systemd, which is reliab...
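The T252605 resolution above mentions scraping the status string that php-fpm reports to systemd. A rough sketch of that parsing step in Python (the exact status-line format shown is an assumption modeled on php-fpm's sd_notify status output, and `parse_workers` is a hypothetical helper, not the deployed code):

```python
import re

# Example status string as php-fpm reports it to systemd; the format is an
# assumption - compare `systemctl show -p StatusText php7.x-fpm` on a real host.
STATUS = "Processes active: 3, idle: 13, Requests: 1234, slow: 0, Traffic: 0.2req/sec"

PATTERN = re.compile(r"Processes active: (\d+), idle: (\d+)")

def parse_workers(status):
    """Extract (busy, idle) worker counts from a php-fpm systemd status string."""
    m = PATTERN.search(status)
    if m is None:
        raise ValueError("unrecognized php-fpm status: %r" % status)
    return int(m.group(1)), int(m.group(2))

busy, idle = parse_workers(STATUS)
print(busy, idle, busy / (busy + idle))  # counts plus the saturation ratio
```

Scraping this string sidesteps the reliability problems of hitting php-fpm's own status endpoint from a worker that may itself be saturated, which is presumably why the task settled on it.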