[01:24:50] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` phab1001.eqiad.wmnet ` The log can be found in...
[01:48:36] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Dzahn)
[01:56:50] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['phab1001.eqiad.wmnet'] ` Of which those **FAILED**: ` ['phab1001.eqiad.wmnet'] `
[08:57:56] https://grafana.wikimedia.org/d/G8zPL7-Wz/kubernetes-node is done. Took me a while but I think it's pretty good to give an idea what is going on kubernetes wise in a node
[09:55:19] <_joe_> woha
[09:55:53] <_joe_> akosiaris: nice
[09:56:54] <_joe_> pretty impressive indeed
[10:10:14] akosiaris: nice! [nit] for counters, either set decimals to 0 in the Y axis (to avoid 23.0) or decimals to -1 to also force the mouseover to have integers, *but* in the latter case you have a uglier Y axis values (like 20-30) instead of 22,23,24,25 for example
[10:14:41] volans: for counters?
[10:14:45] gauge you mean?
[10:14:54] yes for integers
[10:15:07] ah yeah, makes sense
[10:15:26] I hate the -1 trick though, mess up with the units picked in the axis, dunno why
[10:15:29] seems almost a bug
[10:15:39] but I dislike the decimals in the mouseover too :D
[10:21:26] serviceops, Operations, SRE-swift-storage, Patch-For-Review, and 2 others: Swift object servers become briefly unresponsive on a regular basis - https://phabricator.wikimedia.org/T226373 (fgiunchedi) I've investigated a bit the scope and impact of this issue, namely by joining the transactions ID...
[10:21:56] volans: done. I went the decimals: 0 way. decimals: -1 feels unnatural
[10:22:26] I get your mouseover point, but even saying that decimals after the dot can be -1 is ugh
[10:23:22] indeed!
[10:23:53] and also mess it up with the scale, if you zoom in a place where there was a 1 pod difference
[10:24:10] you get like 20 - 30, while with 0 decimals you get like 22 23 24 25
[10:24:17] yup
[10:27:39] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mw2260.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201912051027_jiji_93252.log`.
[11:05:55] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2260.codfw.wmnet'] ` and were **ALL** successful.
[12:19:50] serviceops, Operations: VP9-enabled ffmpeg doesn't get installed after reimage of mw job runner/video scaler - https://phabricator.wikimedia.org/T239831 (jijiki) @brion It looks like all our videoscalers were lacking VP9 codec support. What do you think we should do ?
[12:22:18] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mw2261.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201912051222_jiji_116344.log`.
[12:25:32] serviceops, Performance-Team (Radar): Reconsider memcached connection method for MW in PHP7 world - https://phabricator.wikimedia.org/T235216 (Joe) I'm pretty sure using unix sockets would improve performance, it did for sure when we were on HHVM. It's pretty easy to test this effect as we now collect ti...
[12:32:09] serviceops, Core Platform Team, MediaWiki-Cache, Performance-Team (Radar): Ensure apcu incr/decr are atomic (Upgrade php-apcu) - https://phabricator.wikimedia.org/T236800 (Krinkle)
[13:03:36] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2261.codfw.wmnet'] ` and were **ALL** successful.
[14:20:33] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mw2260.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201912051420_jiji_139305.log`.
[15:22:59] serviceops, Product-Infrastructure-Team-Backlog (Kanban): [Bug] Chromium binary missing in proton's production docker image - https://phabricator.wikimedia.org/T238890 (MSantos) Open→Resolved
[15:25:38] serviceops, Operations: VP9-enabled ffmpeg doesn't get installed after reimage of mw job runner/video scaler - https://phabricator.wikimedia.org/T239831 (brion) Ideally: fix the installs ASAP :) If can't be done: disable `$wgFFmpegVP9RowMT` and that should hopefully work if it reverted to a version that...
[16:05:24] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2260.codfw.wmnet'] ` and were **ALL** successful.
[16:28:47] serviceopsen: might be a couple minutes late, sorry -- go ahead without me
[16:29:30] <_joe_> rlazarus: ack!
[16:38:22] serviceops, Operations, Patch-For-Review: VP9-enabled ffmpeg doesn't get installed after reimage of mw job runner/video scaler - https://phabricator.wikimedia.org/T239831 (brion) Ok, I'll run: ` foreachwiki extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --error --throttle ` to catch...
[16:46:21] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (jijiki) @Jclark-ctr Do we have an estimate when we are able to have those servers racked?
[17:19:16] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mw2260.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201912051719_jiji_174984.log`.
[17:23:17] deployment freeze starts dec 23, we still have regular deploys the next two weeks
[17:24:17] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (wiki_willy) Hi @jijiki - we received these on Sept 25, but just recently received the racking instructions from your team on Nov 25, so there might be...
[17:29:09] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (jijiki) @wiki_willy given we are responsible for this delay, we would like to check with your team and tell us what is the earliest we can do, and we c...
[17:30:38] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (wiki_willy) Thanks for confirming @jijiki - I'll let @Jclark-ctr provide an ETA on them, when he gets in a bit later today.
Thanks, Willy
[17:42:15] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (Jclark-ctr) @jijiki If no surprises i could have them Racked by Dec 20th
[17:43:36] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (jijiki) That would be lovely, thank you!
[17:46:07] serviceops, Operations, ops-eqiad: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (wiki_willy)
[17:54:48] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2260.codfw.wmnet'] ` and were **ALL** successful.
[18:41:38] serviceops, Operations: VP9-enabled ffmpeg doesn't get installed after reimage of mw job runner/video scaler - https://phabricator.wikimedia.org/T239831 (jijiki) @brion thank you! You can mark this as resolved if there is nothing else to be done
[18:43:45] serviceops, DC-Ops, Operations, ops-codfw: mw2259 down and mgmt does not exist? - https://phabricator.wikimedia.org/T239758 (Dzahn) Thank you very much @Papaul. I can confirm the server is reachable again and everything looked fine in Icinga. Also the mgmt interface is in Icinga again. I jus...
[18:44:20] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (Dzahn)
[19:22:48] serviceops, Analytics-Kanban, Better Use Of Data, Event-Platform, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) - events-.wikimedia.org/v1/events - -events.wikimedia.org/v1/events - intake-.wikim...
[19:25:27] serviceops, Analytics-Kanban, Better Use Of Data, Event-Platform, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) - logging-sink & analytics-sink (or sink-*) ?
[19:50:33] serviceops, Analytics-Kanban, Better Use Of Data, Event-Platform, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (mpopov) >>! In T236386#5716552, @Ottomata wrote: > > - events-.wikimedia.org/v1/events > - -events.wiki...
[19:59:15] serviceops, Operations, Parsoid-PHP, Wikimedia-production-error: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (Dzahn) This was assigned to me to (make it possible to) raise the memory limit. That has happened now. Of the 2 patches linked one has been abandoned beca...
[19:59:27] serviceops, Operations, Parsoid-PHP, Wikimedia-production-error: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (Dzahn) a:Dzahn→None
[20:05:56] serviceops, Operations, Parsoid-PHP, Wikimedia-production-error: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (Dzahn) If you want to change the memory_limit for just Parsoid servers again that is now: ` 18606 'wmgMemoryLimitParsoid' => [ 18607 'default' =>...
[20:30:54] serviceops, Analytics-Kanban, Better Use Of Data, Event-Platform, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) > Is /v1/events plural with the intention that eventually EventGate will support batches of events in the same req...
[21:54:23] serviceops, Operations: VP9-enabled ffmpeg doesn't get installed after reimage of mw job runner/video scaler - https://phabricator.wikimedia.org/T239831 (brion) Open→Resolved
[22:12:28] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Dzahn) >>! In T238956#5712739, @Mainframe98 wrote: > I suspect that this move inadvertently caused {T239786} as I only started seeing it today. Yes,...
[22:13:21] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Dzahn)
[22:15:49] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Dzahn) Open→Resolved Summary: We switched from phab1003 to phab1001. Then we realized phab1001 had wrong BIOS settings (disks in legacy IDE...
[22:16:00] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (mmodell) >>! In T238956#5712104, @Lucas_Werkmeister_WMDE wrote: > The recently added Gerrit integration directly beneath a tasks description seems to...
[22:21:44] serviceops, Release-Engineering-Team: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Dzahn)
[22:26:08] serviceops, Release-Engineering-Team: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Dzahn)
[22:30:31] Phabricator switch is now done for real and phab1001 is prod.
[22:30:43] basically no downtime at all .. besides git-ssh ..which is also fixed.
[22:31:21] i'll be back later, call or text if any issues or ping Mukunda.