[00:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Evening backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210210T0000).
[00:00:04] <jouncebot>	 ejegg and kemayo: A patch you scheduled for Evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:55] <James_F>	 legoktm: I guess it'll be on wmf.30 now. ;-)
[00:00:55] <wikibugs>	 (PS1) Ottomata: Refine - use spark assembly without hadoop jars [puppet] - https://gerrit.wikimedia.org/r/663061 (https://phabricator.wikimedia.org/T273711)
[00:01:19] <twentyafterfour>	 !log train status: wmf.28 and wmf.29 are undeployed.  wmf.27 is everywhere with the exception of testwikis which is at wmf.30 refs T271344
[00:01:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:01:24] <legoktm>	 I turn away for a second and we've jumped 3 weeks
[00:01:24] <stashbot>	 T271344: 1.36.0-wmf.30 deployment blockers - https://phabricator.wikimedia.org/T271344
[00:01:35] <legoktm>	 twentyafterfour: thank you :)
[00:01:41] <legoktm>	 (and everyone else pushing the train forward!)
[00:02:20] <James_F>	 +1
[00:02:25] <twentyafterfour>	 legoktm: just playing the role of chaos monkey.
[00:02:34] <wikibugs>	 (CR) jerkins-bot: [V: -1] Refine - use spark assembly without hadoop jars [puppet] - https://gerrit.wikimedia.org/r/663061 (https://phabricator.wikimedia.org/T273711) (owner: Ottomata)
[00:03:00] <wikibugs>	 (PS2) Legoktm: Add hiera for docker_registry_ha I76a6fc9d21380 [labs/private] - https://gerrit.wikimedia.org/r/663058
[00:03:43] <wikibugs>	 (CR) Legoktm: [V: +2 C: +2] Add hiera for docker_registry_ha I76a6fc9d21380 [labs/private] - https://gerrit.wikimedia.org/r/663058 (owner: Legoktm)
[00:03:45] <wikibugs>	 (PS2) Ottomata: Refine - use spark assembly without hadoop jars [puppet] - https://gerrit.wikimedia.org/r/663061 (https://phabricator.wikimedia.org/T273711)
[00:03:47] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:04:12] <Amir1>	 thcipriani: twentyafterfour I can help with the backport of featured feeds
[00:04:18] <Amir1>	 and review if needed
[00:04:27] <Amir1>	 this has been going on for too long
[00:04:33] <wikibugs>	 (CR) Legoktm: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27952/console" [puppet] - https://gerrit.wikimedia.org/r/662807 (https://phabricator.wikimedia.org/T273521) (owner: Legoktm)
[00:06:46] <twentyafterfour>	 Amir1: that would be welcome for sure 
[00:06:56] <Urbanecm>	 did we roll back again? :-(
[00:07:15] <Amir1>	 Should I do it now?
[00:07:18] <Amir1>	 Ops around?
[00:07:29] <Urbanecm>	 Amir1: it's B&C right now, btw
[00:07:33] <Amir1>	 oooh
[00:07:35] <Amir1>	 nice timing 
[00:07:41] <Urbanecm>	 that makes me ask...is someone leading that window?
[00:07:46] <Urbanecm>	 there are two config patches
[00:08:30] <James_F>	 Train deploys override B&C windows.
[00:08:36] <Kemayo>	 Plus, mine can be skipped -- it was scheduled on the assumption that .29 was going to stay out.
[00:08:53] <Kemayo>	 Hopefully I'll be back with another one for .30 soon. :D
[00:09:28] * James_F grins.
[00:09:30] <James_F>	 Hopefully.
[00:09:39] <James_F>	 twentyafterfour: Are you planning to push wmf.30 to group0?
[00:10:03] <twentyafterfour>	 James_F: not planning to go to group0 until blockers/logspammers are resolved? 
[00:10:12] <twentyafterfour>	 I think that will happen tomorrow 
[00:10:24] * James_F nods.
[00:10:25] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, serviceops, Patch-For-Review, Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Dzahn) changes to add the new backends to scap...
[00:10:28] <twentyafterfour>	 but I can stick around and do it tonight if everyone is comfortable with it
[00:11:11] <Amir1>	 I can push the fix
[00:11:15] <Amir1>	 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/FeaturedFeeds/+/662965
[00:11:29] <wikibugs>	 (CR) Ladsgroup: [C: +2] Fix issues with recent caching update [extensions/FeaturedFeeds] (wmf/1.36.0-wmf.30) - https://gerrit.wikimedia.org/r/662965 (https://phabricator.wikimedia.org/T264391) (owner: 20after4)
[00:11:34] <twentyafterfour>	 Amir1: go for it? 
[00:11:39] <wikibugs>	 (CR) Ottomata: [C: +2] Refine - use spark assembly without hadoop jars [puppet] - https://gerrit.wikimedia.org/r/663061 (https://phabricator.wikimedia.org/T273711) (owner: Ottomata)
[00:13:33] <Amir1>	 https://versions.toolforge.org/ says not even test wikis are on wmf.30
[00:13:46] <twentyafterfour>	 Amir1: it's still syncing 
[00:13:51] <twentyafterfour>	 scap is running currently 
[00:13:56] <Amir1>	 aah
[00:13:57] <Amir1>	 I see
[00:14:08] <twentyafterfour>	 sync-apaches:  46% (ok: 160; fail: 0; left: 187)
[00:14:12] <James_F>	 Yeah, first scap takes forever.
[00:14:19] <wikibugs>	 (PS1) Ottomata: Fix type in refine.pp spark conf [puppet] - https://gerrit.wikimedia.org/r/663062 (https://phabricator.wikimedia.org/T273711)
[00:14:33] <James_F>	 We could make it much faster by dropping i18n and doing everything in English. ;-)
[00:14:44] <Urbanecm>	 only first scap James_F ?
[00:15:32] <James_F>	 Urbanecm: Yeah. Needs to do a total copy of all the new files to each host, plus a full i18n build and sync.
[00:15:40] <James_F>	 Up to ~an hour.
[00:16:02] <wikibugs>	 (CR) Ottomata: [C: +2] Fix type in refine.pp spark conf [puppet] - https://gerrit.wikimedia.org/r/663062 (https://phabricator.wikimedia.org/T273711) (owner: Ottomata)
[00:16:05] <twentyafterfour>	 it shouldn't be that long at this point 
[00:16:08] <Urbanecm>	 i thought that i18n build happens every time someone runs scap sync-world, but maybe I'm wrong :)
[00:16:31] <James_F>	 Urbanecm: Yes, but it's working from the i18n already being present, that's just the small build step.
[00:16:32] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, serviceops, Patch-For-Review, Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Dzahn) @Krinkle also see T149924#6699330  and...
[00:16:34] <twentyafterfour>	 it does but it's cached so it takes longer for a new branch 
[00:16:41] <Urbanecm>	 got it
[00:16:43] <James_F>	 So very much longer.
[00:16:43] <wikibugs>	 (Merged) jenkins-bot: Fix issues with recent caching update [extensions/FeaturedFeeds] (wmf/1.36.0-wmf.30) - https://gerrit.wikimedia.org/r/662965 (https://phabricator.wikimedia.org/T264391) (owner: 20after4)
[00:16:53] <James_F>	 #StuffOnlyRelEngHaveToCryAbout
[00:17:11] <thcipriani>	 also, the actual xfer of the bits via rsync on first scap makes it take extra long
[00:17:23] <twentyafterfour>	 sync-apaches:  92% (ok: 321; fail: 0; left: 26) 
[00:17:27] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, serviceops, Patch-For-Review, Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Dzahn) >>! In T247653#6814030, @Krinkle wrote:...
[00:17:43] <Amir1>	 deprecating cdb would have helped?
[00:18:08] <thcipriani>	 it would have simplified the process
[00:18:15] <thcipriani>	 the process itself is weird
[00:18:25] <twentyafterfour>	 scap-cdb-rebuild:   0% (ok: 0; fail: 0; left: 366)
[00:18:29] <wikibugs>	 (CR) CRusnov: "Thank you for the feedback. Here are my responses:" (2 comments) [software/netbox-extras] - https://gerrit.wikimedia.org/r/662762 (https://phabricator.wikimedia.org/T263768) (owner: CRusnov)
[00:18:40] <thcipriani>	 we go: cdb -> json -> sync json with servers -> each server rebuilds cdb from json
[00:18:42] <James_F>	 Amir1: Yes, but SRE had concerns that it would slow down prod.
[00:19:00] <twentyafterfour>	 cdb rebuild is going fast, already 33%
[00:19:07] <thcipriani>	 getting rid of that weird dance would have been a win
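The sequence thcipriani outlines (cdb -> json -> sync json with servers -> each server rebuilds cdb from json) can be sketched roughly as follows. This is an illustrative Python sketch only, not scap's actual code: a plain dict stands in for a CDB file, the "sync" is just an in-memory copy, and every function and host name here (`export_to_json`, `sync_to_servers`, `rebuild_on_server`, `mw-a`, `mw-b`) is hypothetical.

```python
import json


def export_to_json(store: dict) -> str:
    """Step 1 (deploy host): dump the key/value store to JSON text.

    Assumption: a dict stands in for the CDB file of localisation messages.
    """
    return json.dumps(store, sort_keys=True, ensure_ascii=False)


def sync_to_servers(payload: str, servers: list) -> dict:
    """Step 2: stand-in for the file sync; every server gets the same JSON."""
    return {server: payload for server in servers}


def rebuild_on_server(payload: str) -> dict:
    """Step 3 (each server): rebuild the local key/value store from JSON."""
    return json.loads(payload)


def run_pipeline(store: dict, servers: list) -> dict:
    """Run all three steps and return the rebuilt store per server."""
    payload = export_to_json(store)
    delivered = sync_to_servers(payload, servers)
    return {server: rebuild_on_server(text) for server, text in delivered.items()}


if __name__ == "__main__":
    messages = {"en:hello": "Hello", "fr:hello": "Bonjour"}
    rebuilt = run_pipeline(messages, ["mw-a", "mw-b"])
    # Every server should end up with an identical copy of the source store.
    print(all(copy == messages for copy in rebuilt.values()))
```

The point of the sketch is only the shape of the round trip: the same data is serialised once, shipped as text, and rebuilt independently on each host, which is the "weird dance" referred to above.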
[00:19:10] <brennen>	 i wonder if it would be helpful to log a few more of the scap steps in here.
[00:19:15] <brennen>	 ...or just noisy.
[00:19:15] <James_F>	 Clearly we should have logmsgbot ping the channel with status updates for mega-scaps.
[00:19:17] <twentyafterfour>	 50%
[00:19:21] <James_F>	 Ha! Snap, brennen. :-D
[00:19:27] <brennen>	 :)
[00:19:34] <James_F>	 thcipriani: Alas.
[00:42:37] <wikibugs>	 (CR) Legoktm: k8s: Add docker-registry credentials to pull restricted images (3 comments) [puppet] - https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521) (owner: Legoktm)
[00:48:53] <wikibugs>	 (PS2) Anne Tomasevich: Add external entity search URI for new MediaSearch extension [mediawiki-config] - https://gerrit.wikimedia.org/r/662792 (https://phabricator.wikimedia.org/T265939)
[00:50:00] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 441512552 and 275 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:50:21] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add external entity search URI for new MediaSearch extension [mediawiki-config] - https://gerrit.wikimedia.org/r/662792 (https://phabricator.wikimedia.org/T265939) (owner: Anne Tomasevich)
[00:52:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 104112 and 405 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:52:48] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:55:33] <logmsgbot>	 !log milimetric@deploy1001 Started deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade
[00:55:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:58:50] <mutante>	 !log doc1001 - reloaded apache2
[00:58:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:59:41] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, serviceops, Patch-For-Review, Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Dzahn) There is definitely only doc1001 in the...
[01:06:28] <logmsgbot>	 !log milimetric@deploy1001 Finished deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade (duration: 10m 55s)
[01:06:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:06:36] <logmsgbot>	 !log milimetric@deploy1001 Started deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade
[01:06:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:06:43] <logmsgbot>	 !log milimetric@deploy1001 Finished deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade (duration: 00m 06s)
[01:06:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:16:35] <wikibugs>	 (CR) BryanDavis: "This change is a contributing factor to T274310" [docker-images/production-images] - https://gerrit.wikimedia.org/r/661915 (owner: Giuseppe Lavagetto)
[01:19:04] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, serviceops, Patch-For-Review, Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Dzahn)  `  Got error 'PHP message: PHP Parse er...
[01:25:10] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, serviceops, Patch-For-Review, Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Dzahn) This is a stretch server with PHP 7.0....
[01:27:56] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, serviceops, Patch-For-Review, Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Krinkle) >>! In T247653#6817226, @Dzahn wrote:...
[01:41:10] <icinga-wm>	 PROBLEM - PHP opcache health on mwdebug1002 is CRITICAL: CRITICAL: opcache free space is below 50 MB https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[01:43:22] <logmsgbot>	 !log krinkle@deploy1001 Started deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - Ibf28e02ec03
[01:43:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:43:29] <logmsgbot>	 !log krinkle@deploy1001 Finished deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - Ibf28e02ec03 (duration: 00m 06s)
[01:43:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:54:03] <logmsgbot>	 !log krinkle@deploy1001 Started deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - Ib67da94fb1bdf0
[01:54:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:54:09] <logmsgbot>	 !log krinkle@deploy1001 Finished deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - Ib67da94fb1bdf0 (duration: 00m 06s)
[01:54:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:57:42] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, serviceops, Patch-For-Review, Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Dzahn) after the changes above and rebooting th...
[02:43:08] <icinga-wm>	 RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:51:00] <icinga-wm>	 PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:19:45] <wikibugs>	 (CR) Anne Tomasevich: Add external entity search URI for new MediaSearch extension (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/662792 (https://phabricator.wikimedia.org/T265939) (owner: Anne Tomasevich)
[03:23:08] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:28:08] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1012 is OK: HTTP OK: HTTP/1.1 200 OK - 690 bytes in 1.208 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:36:08] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:46:12] <ryankemper>	 !log `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service`
[03:46:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:49:06] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1012 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.064 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[04:43:04] <icinga-wm>	 RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:50:50] <icinga-wm>	 PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:18:14] <wikibugs>	 SRE, Commons, Traffic, Patch-For-Review: Investigate unusual media traffic pattern for AsterNovi-belgii-flower-1mb.jpg on Commons - https://phabricator.wikimedia.org/T273741 (Aawarapam) The traffic spikes are closely matching indian holidays. 2 Oct, 5 sept, 14 Nov, 31 Dec, 12-14 Jan etc.
[05:58:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1076 to clone db1162 T258361', diff saved to https://phabricator.wikimedia.org/P14277 and previous config saved to /var/cache/conftool/dbconfig/20210210-055846-marostegui.json
[05:58:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:58:52] <stashbot>	 T258361: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361
[06:00:05] <wikibugs>	 (PS1) Marostegui: install_server: Reimage db1162 to Stretch [puppet] - https://gerrit.wikimedia.org/r/663103 (https://phabricator.wikimedia.org/T258361)
[06:01:25] <wikibugs>	 (CR) Marostegui: [C: +2] install_server: Reimage db1162 to Stretch [puppet] - https://gerrit.wikimedia.org/r/663103 (https://phabricator.wikimedia.org/T258361) (owner: Marostegui)
[06:03:06] <wikibugs>	 (PS1) Marostegui: db1170: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/663104 (https://phabricator.wikimedia.org/T258361)
[06:04:54] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.reboot-single for host mc1020.eqiad.wmnet
[06:04:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:34] <wikibugs>	 (CR) Marostegui: [C: +2] db1170: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/663104 (https://phabricator.wikimedia.org/T258361) (owner: Marostegui)
[06:08:42] <wikibugs>	 SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1162.eqiad.wmnet'] ` The log ca...
[06:11:10] <wikibugs>	 (PS1) Marostegui: instances.yaml: Add db1170 to dbctl [puppet] - https://gerrit.wikimedia.org/r/663105 (https://phabricator.wikimedia.org/T258361)
[06:11:26] <logmsgbot>	 !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1020.eqiad.wmnet
[06:11:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:15] <wikibugs>	 (CR) Marostegui: [C: +2] instances.yaml: Add db1170 to dbctl [puppet] - https://gerrit.wikimedia.org/r/663105 (https://phabricator.wikimedia.org/T258361) (owner: Marostegui)
[06:16:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1170:3312 and db1170:3317 to dbctl, depooled T258361', diff saved to https://phabricator.wikimedia.org/P14278 and previous config saved to /var/cache/conftool/dbconfig/20210210-061638-marostegui.json
[06:16:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:16:44] <stashbot>	 T258361: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361
[06:19:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1170:3312, db1170:3317 with minimal weight for the first time T258361', diff saved to https://phabricator.wikimedia.org/P14279 and previous config saved to /var/cache/conftool/dbconfig/20210210-061924-marostegui.json
[06:19:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:37] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
[06:20:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:36] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
[06:22:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:28:49] <wikibugs>	 SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1162.eqiad.wmnet'] `  and were **ALL** successful.
[06:35:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Give more weight to db1170:3312, db1170:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14281 and previous config saved to /var/cache/conftool/dbconfig/20210210-063534-marostegui.json
[06:35:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:40] <stashbot>	 T258361: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361
[06:39:15] <wikibugs>	 (CR) Marostegui: [C: +1] "This looks good: https://puppet-compiler.wmflabs.org/compiler1003/27955/clouddb1013.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/662797 (https://phabricator.wikimedia.org/T274044) (owner: Bstorm)
[06:41:37] <wikibugs>	 (PS1) Marostegui: mariadb: Productionize db1162 [puppet] - https://gerrit.wikimedia.org/r/663107 (https://phabricator.wikimedia.org/T258361)
[06:42:39] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Productionize db1162 [puppet] - https://gerrit.wikimedia.org/r/663107 (https://phabricator.wikimedia.org/T258361) (owner: Marostegui)
[06:43:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully pool db1170:3312, db1170:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14282 and previous config saved to /var/cache/conftool/dbconfig/20210210-064330-marostegui.json
[06:43:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:35] <stashbot>	 T258361: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361
[06:43:56] <wikibugs>	 SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) Fully pooled:  db1170:3312 db1170:3317
[06:44:15] <wikibugs>	 SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[06:45:17] <wikibugs>	 SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[06:45:31] <wikibugs>	 SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[07:35:33] <legoktm>	 I'm going to be live hacking/debugging on mwdebug1003
[07:42:58] <icinga-wm>	 RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:49:02] <icinga-wm>	 PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:05:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1127 T266483', diff saved to https://phabricator.wikimedia.org/P14283 and previous config saved to /var/cache/conftool/dbconfig/20210210-080512-marostegui.json
[08:05:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:18] <stashbot>	 T266483: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483
[08:14:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14284 and previous config saved to /var/cache/conftool/dbconfig/20210210-081453-root.json
[08:14:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:05] <godog>	 !log swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836
[08:19:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:09] <stashbot>	 T272836: Decom ms-be[1019-1026] from swift - https://phabricator.wikimedia.org/T272836
[08:29:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 20%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14285 and previous config saved to /var/cache/conftool/dbconfig/20210210-082957-root.json
[08:30:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:48] <legoktm>	 I'm done on mwdebug1003
[08:37:21] <wikibugs>	 (CR) Filippo Giunchedi: "AFAICT removing a lvs service will need to be done in reverse steps as mentioned here: https://wikitech.wikimedia.org/wiki/LVS#Remove_a_lo" [puppet] - https://gerrit.wikimedia.org/r/662009 (https://phabricator.wikimedia.org/T217032) (owner: Cwhite)
[08:37:56] <wikibugs>	 (PS4) Legoktm: docker_registry_ha: Have restricted/ images that are limited read/write [puppet] - https://gerrit.wikimedia.org/r/662807 (https://phabricator.wikimedia.org/T273521)
[08:37:58] <wikibugs>	 (PS2) Legoktm: k8s: Add docker-registry credentials to pull restricted images [puppet] - https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521)
[08:38:37] <hashar>	 apergos: we are back to 1.36.0-wmf.27 , the upgrade we did yesterday got rolled back :D
[08:39:03] <Majavah>	 morning hashar and apergos 
[08:39:21] <legoktm>	 morning!
[08:39:22] <hashar>	 not sure why though
[08:39:40] <Majavah>	 afaik just so there is a safe version to rollback
[08:39:48] <hashar>	 the puzzling thing is that yesterday with wmf.29 we only had a few Serialization Closure issues
[08:40:49] <hashar>	 but I guess that was the same amount of issues we previously had so
[08:41:16] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=no; selector: name=mw1404.eqiad.wmnet
[08:41:18] <Majavah>	 let's see what happens now that the fix and the fix for the fix are merged
[08:41:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:37] <legoktm>	 !log depooling mw1404.eqiad.wmnet for perf benchmarking (T274041)
[08:41:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:42] <stashbot>	 T274041: Investigate performance impact of HookContainer loading 500+ interfaces - https://phabricator.wikimedia.org/T274041
[08:43:03] <wikibugs>	 (CR) JMeybohm: [C: -1] k8s: Add docker-registry credentials to pull restricted images (5 comments) [puppet] - https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521) (owner: Legoktm)
[08:44:34] <wikibugs>	 SRE, DBA: db1080-95 batch possibly suffering BBU issues - https://phabricator.wikimedia.org/T258386 (Marostegui)
[08:45:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 40%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14286 and previous config saved to /var/cache/conftool/dbconfig/20210210-084500-root.json
[08:45:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:13] <apergos>	 hey hashar and Majavah
[08:45:33] <apergos>	 when was 29 -> frwiki rolled back?
[08:45:38] <apergos>	 er, 30 -> frwiki
[08:46:10] <apergos>	 I checked logstash errors this morning and there were indeed no new logged errors after the push to frwiki, to me that is conclusive
[08:46:23] <apergos>	 I also checked for errors from Feed* and there were no exceptions 
[08:46:30] <Majavah>	 amir manually patched mwdebug1002 to have frwiki to .30, otherwise it is only on group0
[08:47:09] <Majavah>	 .29 does not have the fixes
[08:47:28] <apergos>	 oh so 30 is still on frwiki
[08:47:35] <Majavah>	 on mwdebug1002, yes
[08:47:46] <apergos>	 oh, only on mwdebug1002?
[08:47:50] <Majavah>	 yes
[08:47:55] <Majavah>	 I played around with featuredfeeds on frwiki mwdebug1002 this morning after my exam was done, not sure if you checked logstash before or after that
[08:48:04] <apergos>	 what time utc?
[08:48:22] <Majavah>	 maybe around 8 utc
[08:49:22] <wikibugs>	 (PS3) Matthias Mullie: Add external entity search URI for new MediaSearch extension [mediawiki-config] - https://gerrit.wikimedia.org/r/662792 (https://phabricator.wikimedia.org/T265939) (owner: Anne Tomasevich)
[08:49:24] <legoktm>	 mwdebug1002 doesn't actually get any real traffic aside from basic monitoring and devs
[08:50:09] <apergos>	 yeah so that's no good, I will comment on the task and retract my opinion, it was based on 30 -> frwiki everywhere
[08:50:18] <apergos>	 clearly a mistaken understanding
[08:50:51] <wikibugs>	 (PS4) Matthias Mullie: Add external entity search URI for new MediaSearch extension [mediawiki-config] - https://gerrit.wikimedia.org/r/662792 (https://phabricator.wikimedia.org/T265939) (owner: Anne Tomasevich)
[08:54:20] <apergos>	 ok well. I'll go check logstash again right now and see if there's anything new
[08:54:32] <apergos>	 from mwdebug1002 right? I can at least filter for that host
[08:56:03] <wikibugs>	 (PS1) Elukey: cumin: add presto test alias [puppet] - https://gerrit.wikimedia.org/r/663150
[08:56:36] <Majavah>	 apergos: yes
[08:57:22] <apergos>	 welp, I still see nothing that looks like errors...
[08:57:45] <Majavah>	 sounds promising
[08:57:50] <wikibugs>	 SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) db1162 is now replicating, but I won't pool it until I'm back next week.
[08:58:53] <apergos>	 https://logstash.wikimedia.org/goto/61b2bb002beab0f1fdb5dceec68b3502 
[08:58:54] <wikibugs>	 (CR) Elukey: [C: +2] cumin: add presto test alias [puppet] - https://gerrit.wikimedia.org/r/663150 (owner: Elukey)
[08:58:57] <apergos>	 for those with access.
[09:00:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 60%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14287 and previous config saved to /var/cache/conftool/dbconfig/20210210-090004-root.json
[09:00:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:15] <wikibugs>	 (PS1) Elukey: sre.hadoop: add presto in test cumin aliases [cookbooks] - https://gerrit.wikimedia.org/r/663151
[09:00:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1076 (re)pooling @ 10%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14288 and previous config saved to /var/cache/conftool/dbconfig/20210210-090057-root.json
[09:01:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:12] <apergos>	 so, hashar, Majavah, what is next to move this forward?
[09:01:25] <Majavah>	 I don't have logstash access so can't look at that link you just sent
[09:02:04] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] [WIP] linkrecommendation: Cron job to load datasets (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: 10Kosta Harlan)
[09:02:23] <apergos>	 Majavah: I thought that might be the case. the only log messages I see reported are for around 1:40 a.m. (utc) and they are saying that deferredupdates started and ended, no errors
[09:02:32] <apergos>	 again from mwdebug1002
[09:03:56] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] sre.hadoop: add presto in test cumin aliases [cookbooks] - 10https://gerrit.wikimedia.org/r/663151 (owner: 10Elukey)
[09:04:53] <Majavah>	 is there a way of knowing what was inside the deferred update? i.e. do we know that the deferred update did come from featuredfeeds caching?
[09:05:09] <Majavah>	 frwiki was switched to .30 on mwdebug1002 about 00:40 utc
[09:06:44] <wikibugs>	 (03Merged) 10jenkins-bot: sre.hadoop: add presto in test cumin aliases [cookbooks] - 10https://gerrit.wikimedia.org/r/663151 (owner: 10Elukey)
[09:07:40] <apergos>	 there were a few asyncrefreshes but none of them from feeds
[09:10:12] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] docker_registry_ha: Properly override nginx timeouts [puppet] - 10https://gerrit.wikimedia.org/r/662806 (owner: 10Legoktm)
[09:10:57] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
[09:11:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:15] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
[09:11:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 80%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14289 and previous config saved to /var/cache/conftool/dbconfig/20210210-091507-root.json
[09:15:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14290 and previous config saved to /var/cache/conftool/dbconfig/20210210-091601-root.json
[09:16:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:10] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] docker_registry_ha: Have restricted/ images that are limited read/write (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/662807 (https://phabricator.wikimedia.org/T273521) (owner: 10Legoktm)
[09:23:15] <vgutierrez>	 !log rolling restart of cp nodes to catch up on kernel upgrades
[09:23:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14292 and previous config saved to /var/cache/conftool/dbconfig/20210210-093011-root.json
[09:30:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:55] <wikibugs>	 10SRE, 10Internet-Archive: noc.wikimedia.org disappeared - https://phabricator.wikimedia.org/T274342 (10Gilles)
[09:31:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14293 and previous config saved to /var/cache/conftool/dbconfig/20210210-093104-root.json
[09:31:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14294 and previous config saved to /var/cache/conftool/dbconfig/20210210-093132-root.json
[09:31:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:48] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp1075.eqiad.wmnet
[09:31:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:14] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp1076.eqiad.wmnet
[09:33:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:21] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp2027.codfw.wmnet
[09:34:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:50] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp2028.codfw.wmnet
[09:35:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:38] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
[09:36:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:12] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
[09:37:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:30] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp4027.ulsfo.wmnet
[09:38:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:56] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp4021.ulsfo.wmnet
[09:38:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:04] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp5007.eqsin.wmnet
[09:40:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:15] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp5001.eqsin.wmnet
[09:40:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:25] <wikibugs>	 (03PS1) 10Gilles: Don’t apply X-Wikimedia-Debug routing to noc.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/663156 (https://phabricator.wikimedia.org/T245552)
[09:45:11] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1076.eqiad.wmnet
[09:45:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:00] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1075.eqiad.wmnet
[09:46:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14295 and previous config saved to /var/cache/conftool/dbconfig/20210210-094608-root.json
[09:46:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:29] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2028.codfw.wmnet
[09:46:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 20%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14296 and previous config saved to /var/cache/conftool/dbconfig/20210210-094635-root.json
[09:46:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:15] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2027.codfw.wmnet
[09:47:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:18] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
[09:47:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:00] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
[09:48:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:27] <wikibugs>	 (03CR) 10Kormat: [C: 03+2] tox: Add py3 env that uses default system python3 [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/662966 (owner: 10Kormat)
[09:50:31] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4027.ulsfo.wmnet
[09:50:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:53:22] <wikibugs>	 (03Merged) 10jenkins-bot: tox: Add py3 env that uses default system python3 [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/662966 (owner: 10Kormat)
[09:56:04] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5001.eqsin.wmnet
[09:56:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:05] <wikibugs>	 10SRE, 10Internet-Archive: noc.wikimedia.org is a 404 when X-Wikimedia-Debug is enabled - https://phabricator.wikimedia.org/T274342 (10Aklapper)
[09:57:29] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5007.eqsin.wmnet
[09:57:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:02] <icinga-wm>	 PROBLEM - Host cp4021 is DOWN: PING CRITICAL - Packet loss = 100%
[10:00:07] <vgutierrez>	 !log power cycling cp4021
[10:00:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14297 and previous config saved to /var/cache/conftool/dbconfig/20210210-100111-root.json
[10:01:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 40%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14298 and previous config saved to /var/cache/conftool/dbconfig/20210210-100139-root.json
[10:01:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:39] <wikibugs>	 (03PS1) 10Elukey: Rename the cdh module to bigtop [puppet] - 10https://gerrit.wikimedia.org/r/663160
[10:02:59] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Decommission db1081 [puppet] - 10https://gerrit.wikimedia.org/r/663161 (https://phabricator.wikimedia.org/T273040)
[10:03:38] <icinga-wm>	 RECOVERY - Host cp4021 is UP: PING OK - Packet loss = 0%, RTA = 68.36 ms
[10:05:23] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission
[10:05:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:08] <icinga-wm>	 RECOVERY - Check systemd state on relforge1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:10:27] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4021.ulsfo.wmnet
[10:10:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:44] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] mariadb: Decommission db1081 [puppet] - 10https://gerrit.wikimedia.org/r/663161 (https://phabricator.wikimedia.org/T273040) (owner: 10Marostegui)
[10:14:47] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[10:14:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:50] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Analytic Cluster for Research Intern (ChristineDeKock) - https://phabricator.wikimedia.org/T274304 (10Aklapper)
[10:15:50] <icinga-wm>	 PROBLEM - Check systemd state on relforge1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:15:58] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10decommission-hardware: decommission db1081.eqiad.wmnet - https://phabricator.wikimedia.org/T273040 (10Marostegui) a:05Marostegui→03wiki_willy
[10:16:02] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10decommission-hardware: decommission db1081.eqiad.wmnet - https://phabricator.wikimedia.org/T273040 (10Marostegui)
[10:16:24] <moritzm>	 !log installing firejail security updates
[10:16:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:38] <wikibugs>	 10SRE, 10DBA, 10Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (10Marostegui)
[10:16:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 60%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14299 and previous config saved to /var/cache/conftool/dbconfig/20210210-101642-root.json
[10:16:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:18] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Analytic Clsuter for Research Intern (ChristineDeKock) - https://phabricator.wikimedia.org/T274304 (10ChristineDeKock)
[10:18:51] <wikibugs>	 10Puppet, 10SRE, 10puppet-compiler, 10Patch-For-Review, 10User-jbond: OKR: Work required to prepare for puppet 6 - https://phabricator.wikimedia.org/T265138 (10jbond) @Ladsgroup This could well be to do with how puppetlabs defines core type however it has definitely been removed from the puppet git repo...
[10:20:30] <wikibugs>	 (03Abandoned) 10Elukey: Rename the cdh module to bigtop [puppet] - 10https://gerrit.wikimedia.org/r/663160 (owner: 10Elukey)
[10:25:09] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp1077.eqiad.wmnet
[10:25:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:34] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp1078.eqiad.wmnet
[10:25:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:59] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp2029.codfw.wmnet
[10:26:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:18] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp2030.codfw.wmnet
[10:26:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:54] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
[10:26:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:31] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp3053.esams.wmnet
[10:27:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:39] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] Remove old OpenStack Rocky files/templates/manifests [puppet] - 10https://gerrit.wikimedia.org/r/663027 (owner: 10Andrew Bogott)
[10:27:50] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp4028.ulsfo.wmnet
[10:27:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:16] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp4022.ulsfo.wmnet
[10:28:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:34] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp5008.eqsin.wmnet
[10:28:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:49] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet
[10:28:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 80%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14300 and previous config saved to /var/cache/conftool/dbconfig/20210210-103146-root.json
[10:31:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:06] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1077.eqiad.wmnet
[10:37:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:30] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1078.eqiad.wmnet
[10:37:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:02] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2029.codfw.wmnet
[10:38:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:11] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3053.esams.wmnet
[10:38:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:11] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2030.codfw.wmnet
[10:39:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:49] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
[10:39:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:11] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4022.ulsfo.wmnet
[10:40:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:22] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5002.eqsin.wmnet
[10:40:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:47] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4028.ulsfo.wmnet
[10:40:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:34] <vgutierrez>	 !log powercycle cp5008
[10:42:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14301 and previous config saved to /var/cache/conftool/dbconfig/20210210-104649-root.json
[10:46:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:12] <wikibugs>	 (03PS3) 10Kormat: mysql_root_clients: Allow orch access to clouddb [puppet] - 10https://gerrit.wikimedia.org/r/662697 (https://phabricator.wikimedia.org/T273606)
[10:54:16] <wikibugs>	 (03CR) 10Kormat: "@bstorm: Can i get you to look at this, please? I don't want to merge it without a sanity-check :)" [puppet] - 10https://gerrit.wikimedia.org/r/662697 (https://phabricator.wikimedia.org/T273606) (owner: 10Kormat)
[11:00:46] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5008.eqsin.wmnet
[11:00:51] <icinga-wm>	 RECOVERY - PHP opcache health on mwdebug1002 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[11:00:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:02:07] <wikibugs>	 (03PS1) 10Muehlenhoff: Extend access for daniram [puppet] - 10https://gerrit.wikimedia.org/r/663172
[11:09:59] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Extend access for daniram [puppet] - 10https://gerrit.wikimedia.org/r/663172 (owner: 10Muehlenhoff)
[11:12:49] <icinga-wm>	 RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:18:04] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp4023.ulsfo.wmnet
[11:18:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:31] <icinga-wm>	 PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:22:43] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=yes; selector: name=mw1404.eqiad.wmnet
[11:22:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:35] <wikibugs>	 (03PS1) 10Ayounsi: Improve loopback dhcp term [homer/public] - 10https://gerrit.wikimedia.org/r/663176
[11:28:44] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4023.ulsfo.wmnet
[11:28:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:18] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: Add support for php deployments (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/651757 (owner: 10Giuseppe Lavagetto)
[11:32:51] <wikibugs>	 (03PS6) 10Giuseppe Lavagetto: Add support for php deployments [deployment-charts] - 10https://gerrit.wikimedia.org/r/651757
[11:38:18] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "LGTM, comment inline" (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/663176 (owner: 10Ayounsi)
[11:42:28] <wikibugs>	 (03CR) 10Ayounsi: Improve loopback dhcp term (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/663176 (owner: 10Ayounsi)
[11:42:42] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp1079.eqiad.wmnet
[11:42:47] <stashbot>	 vgutierrez@cumin1001: Failed to log message to wiki. Somebody should check the error logs.
[11:42:59] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp1080.eqiad.wmnet
[11:43:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:43:24] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
[11:43:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:43:41] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp2032.codfw.wmnet
[11:43:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:43:50] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "This cr/firewall.conf file is only loaded in core routers, right?" [homer/public] - 10https://gerrit.wikimedia.org/r/663176 (owner: 10Ayounsi)
[11:43:57] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp3054.esams.wmnet
[11:43:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:44:13] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp3055.esams.wmnet
[11:44:14] <wikibugs>	 (03PS2) 10Muehlenhoff: Initial client profile for unprivileged Cumin [puppet] - 10https://gerrit.wikimedia.org/r/662945
[11:44:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:44:38] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp4029.ulsfo.wmnet
[11:44:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:45:06] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp5009.eqsin.wmnet
[11:45:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:45:19] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp5003.eqsin.wmnet
[11:45:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:25] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: Improve loopback dhcp term (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/663176 (owner: 10Ayounsi)
[11:47:22] <wikibugs>	 (03PS1) 10Volans: wmf-auto-reimage: splay the start when in parallel [puppet] - 10https://gerrit.wikimedia.org/r/663178
[11:47:55] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] Add support for php deployments (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/651757 (owner: 10Giuseppe Lavagetto)
[11:54:02] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
[11:54:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:20] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2032.codfw.wmnet
[11:54:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:36] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1079.eqiad.wmnet
[11:54:38] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3054.esams.wmnet
[11:54:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:58] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1080.eqiad.wmnet
[11:55:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:12] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5009.eqsin.wmnet
[11:55:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:37] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3055.esams.wmnet
[11:55:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:56:58] <vgutierrez>	 !log powercycle cp5003
[11:57:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:18] <vgutierrez>	 10% of crash ratio on rebooting cp hosts /o\
[11:58:46] <wikibugs>	 (03CR) 10Jbond: [C: 04-1] "LGTM but we should have it under the systemd module e.g." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/663051 (owner: 10Dzahn)
[12:00:05] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: Your horoscope predicts another unfortunate European mid-day backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210210T1200).
[12:00:05] <jouncebot>	 Urbanecm, dcausse, and Lucas_WMDE: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:11] <Lucas_WMDE>	 o/
[12:00:50] <Urbanecm>	 I can deploy today
[12:01:02] <tabbycat>	 beware of the horoscope
[12:01:19] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4029.ulsfo.wmnet
[12:01:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:01:29] <icinga-wm>	 PROBLEM - dhclient process on sretest1002 is CRITICAL: PROCS CRITICAL: 1 process with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[12:01:49] <Urbanecm>	 tabbycat: is there sth affecting deployers? :-)
[12:02:06] <wikibugs>	 (03PS2) 10Urbanecm: Set wgGEHelpPanelAskMentor to true for several wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/661448 (https://phabricator.wikimedia.org/T272753)
[12:02:11] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Set wgGEHelpPanelAskMentor to true for several wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/661448 (https://phabricator.wikimedia.org/T272753) (owner: 10Urbanecm)
[12:03:36] <Urbanecm>	 Lucas_WMDE: can I just +2 your patch, or do you want to self-deploy? Sounds simple enough to me.
[12:03:36] <wikibugs>	 (03Merged) 10jenkins-bot: Set wgGEHelpPanelAskMentor to true for several wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/661448 (https://phabricator.wikimedia.org/T272753) (owner: 10Urbanecm)
[12:04:32] <wikibugs>	 10SRE, 10Data-Persistence-Backup, 10netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (10jcrespo) It looks congestion-dependent?  It peaks around ~22 UTC and improves at ~6 UTC: https://grafana-...
[12:04:52] <Lucas_WMDE>	 I’d like to deploy it :)
[12:05:01] <Lucas_WMDE>	 dcausse: around?
[12:05:03] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5003.eqsin.wmnet
[12:05:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:15] <Urbanecm>	 Lucas_WMDE: okay, I'll ping you once done
[12:05:33] <wikibugs>	 (03PS3) 10Urbanecm: Enable GrowthExperiments on bnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/650012 (https://phabricator.wikimedia.org/T266020) (owner: 10Gergő Tisza)
[12:05:37] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Enable GrowthExperiments on bnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/650012 (https://phabricator.wikimedia.org/T266020) (owner: 10Gergő Tisza)
[12:06:36] <wikibugs>	 (03Merged) 10jenkins-bot: Enable GrowthExperiments on bnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/650012 (https://phabricator.wikimedia.org/T266020) (owner: 10Gergő Tisza)
[12:06:38] <Urbanecm>	 (I'll sneak in this patch as well, forgot to schedule it)
[12:06:58] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] "LGTM as workaround, I am wondering if more than one minute could be better, like 2/3, but we can always review it later!" [puppet] - 10https://gerrit.wikimedia.org/r/663178 (owner: 10Volans)
[12:07:08] <wikibugs>	 (03PS1) 10Marostegui: install_server: Do not reimage db1157 [puppet] - 10https://gerrit.wikimedia.org/r/663181 (https://phabricator.wikimedia.org/T258361)
[12:07:20] <apergos>	 hasharLunch: Majavah: I checked wikimedia-versions.json on mwdebug1001,2,3 and .30 is deployed only to testwiki, testwikidatawiki, labtestwiki on all.   not deployed to frwiki
[12:07:29] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 2d8cb10f246904f1af07b019da270fd8dc7816fa: Set wgGEHelpPanelAskMentor to true for several wikis (T272753) (duration: 01m 21s)
[12:07:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:34] <stashbot>	 T272753: Scale: pilot help panel with mentorship in frwiki, bnwiki, arwiki, viwiki - https://phabricator.wikimedia.org/T272753
[12:07:56] <apergos>	 so that is why you would have seen no log messages, unless it has since been undeployed from frwiki on mwdebug1002
[12:08:03] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] install_server: Do not reimage db1157 [puppet] - 10https://gerrit.wikimedia.org/r/663181 (https://phabricator.wikimedia.org/T258361) (owner: 10Marostegui)
[12:08:09] <Majavah>	 it was definitely on frwiki when testing
[12:08:23] <marostegui>	 moritzm: ok to merge your change?
[12:09:33] <Majavah>	 maybe it was reset during this backport window
[12:09:54] <Majavah>	 Amir's comments indicate that he just livehacked it there and it would reset on next scap pull
[12:10:12] <wikibugs>	 (03CR) 10Volans: "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/663178 (owner: 10Volans)
[12:10:15] <Urbanecm>	 that sounds plausible
[12:10:32] <Urbanecm>	 apergos: if he did so, he had to edit wikiversions.php, not the json file, btw
[12:10:44] <wikibugs>	 (03PS2) 10Volans: wmf-auto-reimage: splay the start when in parallel [puppet] - 10https://gerrit.wikimedia.org/r/663178
[12:11:53] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: e8214ee812f3812f609c26d6422b85a99a91e1f6: Enable GrowthExperiments on bnwiki (T266020) (duration: 01m 08s)
[12:11:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:59] <stashbot>	 T266020: Deploy Growth experiments at Bangla Wikipedia - https://phabricator.wikimedia.org/T266020
[12:12:01] <Urbanecm>	 Lucas_WMDE: the floor is yours
[12:12:03] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] Don’t apply X-Wikimedia-Debug routing to noc.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/663156 (https://phabricator.wikimedia.org/T245552) (owner: 10Gilles)
[12:12:14] <apergos>	 I have just done the following: tried to get v1 keys from the wan object cache for testwiki featured feeds en, tried to get v2 keys, all empty.   load https://test.wikipedia.org/wiki/Special:FeedItem/featured/20210201000000/en   via mwdebug1002, get a whine because there's no article (ok); reget v2 key, it's now there
[12:12:29] <Lucas_WMDE>	 ok thanks!
[12:12:42] <apergos>	 I can now look at the logs for specific errors
[12:12:50] <wikibugs>	 (03PS2) 10Lucas Werkmeister (WMDE): Remove Wikibase.NewItemIdFormatter log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/658321 (https://phabricator.wikimedia.org/T268870) (owner: 10Rosalie Perside (WMDE))
[12:12:55] <apergos>	 I'm not testing multilingual feeds of course :-(
[12:12:59] <icinga-wm>	 RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:13:16] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Remove Wikibase.NewItemIdFormatter log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/658321 (https://phabricator.wikimedia.org/T268870) (owner: 10Rosalie Perside (WMDE))
[12:13:26] <Majavah>	 do we have those anywhere on group0?
[12:14:13] <Majavah>	 frwiki on 1002 still seems to be on .30 fwiw
[12:14:40] <Urbanecm>	 someone definitely ran scap pull there already (either manually or by all-box script)
[12:14:43] <wikibugs>	 (03Merged) 10jenkins-bot: Remove Wikibase.NewItemIdFormatter log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/658321 (https://phabricator.wikimedia.org/T268870) (owner: 10Rosalie Perside (WMDE))
[12:15:07] <Majavah>	 https://fr.wikipedia.org/wiki/Sp%C3%A9cial:Version MediaWiki 	1.36.0-wmf.30 (afb9c32) 2021-02-09T04:07:27
[12:15:21] <Majavah>	 what's that timestamp? last commit?
[12:15:21] <Lucas_WMDE>	 I’m about to pull to mwdebug1001, hope that’s okay
[12:15:28] <dcausse>	 sorry I'm late
[12:15:36] <Lucas_WMDE>	 testing on mwdebug1001
[12:15:45] <Urbanecm>	 Lucas_WMDE: definitely, we're just talking about how to debug :)
[12:16:22] <wikibugs>	 (03PS10) 10Arturo Borrero Gonzalez: toolforge: add ingress configuration for jobs.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/662941 (https://phabricator.wikimedia.org/T274139)
[12:16:45] <Lucas_WMDE>	 test seems fine, syncing
[12:18:21] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:658321|Remove Wikibase.NewItemIdFormatter log channel (T268870)]] 1/2 (duration: 01m 07s)
[12:18:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:18:27] <stashbot>	 T268870: Remove Wikibase.NewItemIdFormatter log channel - https://phabricator.wikimedia.org/T268870
[12:19:37] <wikibugs>	 (03PS1) 10Muehlenhoff: profile::kerberos::keytabs: Drop require for the user [puppet] - 10https://gerrit.wikimedia.org/r/663184
[12:20:03] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:658321|Remove Wikibase.NewItemIdFormatter log channel (T268870)]] 2/2 (prod no-op) (duration: 01m 08s)
[12:20:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:09] <Lucas_WMDE>	 (the New Yorker would, of course, call it a prod noöp)
[12:20:33] <Lucas_WMDE>	 dcausse: do you want to self-deploy your change?
[12:20:38] <dcausse>	 Lucas_WMDE: sure
[12:20:42] <Lucas_WMDE>	 alright, go ahead :)
[12:20:46] <apergos>	 I'm still looking at the debug logs, there are 4k+ lines :-)
[12:20:46] <dcausse>	 thanks! :)
[12:20:54] <apergos>	 for my one request...
[12:20:58] <Urbanecm>	 apergos: yeah, verbose logging is...really verbose :)
[12:21:05] <icinga-wm>	 PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:21:11] <wikibugs>	 (03PS4) 10DCausse: [wdqs] Add flink sideoutput stream definitions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/661727 (https://phabricator.wikimedia.org/T269619)
[12:21:28] <wikibugs>	 (03CR) 10DCausse: [C: 03+2] [wdqs] Add flink sideoutput stream definitions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/661727 (https://phabricator.wikimedia.org/T269619) (owner: 10DCausse)
[12:22:01] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] wmf-auto-reimage: splay the start when in parallel [puppet] - 10https://gerrit.wikimedia.org/r/663178 (owner: 10Volans)
[12:22:24] <wikibugs>	 (03Merged) 10jenkins-bot: [wdqs] Add flink sideoutput stream definitions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/661727 (https://phabricator.wikimedia.org/T269619) (owner: 10DCausse)
[12:23:32] <wikibugs>	 (03CR) 10Volans: [C: 03+2] wmf-auto-reimage: splay the start when in parallel [puppet] - 10https://gerrit.wikimedia.org/r/663178 (owner: 10Volans)
[12:24:26] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] profile::kerberos::keytabs: Drop require for the user [puppet] - 10https://gerrit.wikimedia.org/r/663184 (owner: 10Muehlenhoff)
[12:24:51] <icinga-wm>	 PROBLEM - configured eth on sretest1002 is CRITICAL: eno2 reporting no carrier. https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[12:26:44] <logmsgbot>	 !log dcausse@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T269619: [wdqs] Add flink sideoutput stream definitions (duration: 01m 06s)
[12:26:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:26:52] <stashbot>	 T269619: Create pipelines for late/spurious/failed events - https://phabricator.wikimedia.org/T269619
[12:27:04] <wikibugs>	 (03PS2) 10Jbond: base::service_unit: drop support for sysV and upstart init scripts [puppet] - 10https://gerrit.wikimedia.org/r/661917 (https://phabricator.wikimedia.org/T273743)
[12:28:07] <dcausse>	 I'm done
[12:28:29] <Urbanecm>	 so we're all done then :)
[12:29:24] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] base::service_unit: drop support for sysV and upstart init scripts [puppet] - 10https://gerrit.wikimedia.org/r/661917 (https://phabricator.wikimedia.org/T273743) (owner: 10Jbond)
[12:29:33] <wikibugs>	 (03PS1) 10Thiemo Kreuz (WMDE): [DNM] ReferenceTooltips gadget names for ReferencePreviews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/663185 (https://phabricator.wikimedia.org/T274353)
[12:29:57] <wikibugs>	 (03PS6) 10Effie Mouzeli: WIP: profile::memcached::instance: remove "default_values" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[12:30:57] <wikibugs>	 (03PS11) 10Arturo Borrero Gonzalez: toolforge: add ingress configuration for jobs.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/662941 (https://phabricator.wikimedia.org/T274139)
[12:31:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP: profile::memcached::instance: remove "default_values" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[12:35:42] <apergos>	 (trying to make frwiki cache the key it ought to cache, via mwdebug1002 now)
[12:36:17] <apergos>	 mwdebug1002 frwiki 1.36.0-wmf.27    so that won't happen
[12:36:28] <apergos>	 ok back to looking at my logs from testwiki
[12:40:03] <wikibugs>	 (03PS12) 10Arturo Borrero Gonzalez: toolforge: add ingress configuration for jobs.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/662941 (https://phabricator.wikimedia.org/T274139)
[12:41:54] <wikibugs>	 (03PS13) 10Arturo Borrero Gonzalez: toolforge: add ingress configuration for jobs.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/662941 (https://phabricator.wikimedia.org/T274139)
[12:42:34] <wikibugs>	 (03PS2) 10Ayounsi: Improve loopback DHCP term [homer/public] - 10https://gerrit.wikimedia.org/r/663176
[12:45:06] <wikibugs>	 (03PS14) 10Arturo Borrero Gonzalez: toolforge: add ingress configuration for jobs.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/662941 (https://phabricator.wikimedia.org/T274139)
[12:45:46] <apergos>	 I scried everything as best as possible, still see no errors, key is cached properly, but again this is not multilingual anything.
[12:45:52] <apergos>	 not sure what to do next
[12:46:05] <apergos>	 roll out to mediawiki on mwdebug1002 and check that?
[12:46:06] <wikibugs>	 (03PS4) 10Base: Changing frwiktionary's wmgBabelMainCategory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/662720 (https://phabricator.wikimedia.org/T274137)
[12:46:46] <apergos>	 (this question is for hasharAway and Majavah)
[12:47:12] <Majavah>	 who's the train conductor this week?
[12:47:19] <apergos>	 it can wait for hasharAway to return, presuming he's back later today
[12:47:20] <wikibugs>	 (03PS5) 10Base: Changing frwiktionary's wmgBabelMainCategory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/662720 (https://phabricator.wikimedia.org/T274137)
[12:47:48] <Majavah>	 imo that's a question for releng and not me
[12:47:56] <apergos>	 um twentyafterfour I believe
[12:48:03] <Majavah>	 them and hashar according to wikitech
[12:48:07] <apergos>	 all right, duly pinged :-)
[12:48:14] <Majavah>	 but I'm fairly confident about the fix
[12:48:19] <apergos>	 I will add my meagre findings to the task in the meantime
[12:48:32] <Majavah>	 thanks!
[12:48:43] <icinga-wm>	 PROBLEM - configured eth on sretest1001 is CRITICAL: ens2f1 reporting no carrier. https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[12:48:44] <Majavah>	 according to -releng they will be back later today
[12:49:05] <icinga-wm>	 PROBLEM - dhclient process on sretest1001 is CRITICAL: PROCS CRITICAL: 1 process with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[12:51:16] <wikibugs>	 (03PS7) 10Effie Mouzeli: WIP: profile::memcached::instance: remove "default_values" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[12:54:14] <wikibugs>	 (03PS15) 10Arturo Borrero Gonzalez: toolforge: add ingress configuration for jobs.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/662941 (https://phabricator.wikimedia.org/T274139)
[12:54:32] <apergos>	 ok great, task updated and we shall see
[12:56:33] <wikibugs>	 (03CR) 10Ayounsi: relforge: New hosts are relforge100[3,4] (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/663054 (https://phabricator.wikimedia.org/T274314) (owner: 10Ryan Kemper)
[12:57:19] <Majavah>	 we should probably add a multilingual feed to a group0 wiki for future incidents
[12:58:11] <apergos>	 that's probably true
[12:58:30] <apergos>	 the extension ought to have tests too, at some point
[13:09:27] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "The filter itself LGTM, but I think the original warning was about the port being opened in cloudsw devices, not in the core routers. Is t" [homer/public] - 10https://gerrit.wikimedia.org/r/663176 (owner: 10Ayounsi)
[13:13:41] <icinga-wm>	 RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:17:53] <wikibugs>	 10SRE, 10Data-Persistence-Backup, 10netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (10jcrespo)
[13:18:51] <wikibugs>	 (03PS3) 10Jbond: base::service_unit: drop support for sysV and upstart init scripts [puppet] - 10https://gerrit.wikimedia.org/r/661917
[13:19:24] <wikibugs>	 (03CR) 10Jbond: "PCC (still running) https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27960" [puppet] - 10https://gerrit.wikimedia.org/r/661917 (owner: 10Jbond)
[13:20:04] <wikibugs>	 (03PS4) 10Jbond: base::service_unit: drop support for sysV and upstart init scripts [puppet] - 10https://gerrit.wikimedia.org/r/661917
[13:21:20] <wikibugs>	 (03CR) 10Jbond: "correct pcc" [puppet] - 10https://gerrit.wikimedia.org/r/661917 (owner: 10Jbond)
[13:21:43] <icinga-wm>	 PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:22:21] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] base::service_unit: drop support for sysV and upstart init scripts [puppet] - 10https://gerrit.wikimedia.org/r/661917 (owner: 10Jbond)
[13:24:36] <wikibugs>	 (03PS5) 10Jbond: base::service_unit: drop support for sysV and upstart init scripts [puppet] - 10https://gerrit.wikimedia.org/r/661917
[13:25:42] <wikibugs>	 (03CR) 10MSantos: "`cleanup-old-osm2pgsql-tables.sql` should be removed. The reason for it was if we migrated imposm3 without cleaning-up data, which is not " (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: 10MSantos)
[13:26:44] <wikibugs>	 (03CR) 10MSantos: [C: 03+1] role::maps: fix MOTD message [puppet] - 10https://gerrit.wikimedia.org/r/662659 (owner: 10Hnowlan)
[13:27:04] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] base::service_unit: drop support for sysV and upstart init scripts [puppet] - 10https://gerrit.wikimedia.org/r/661917 (owner: 10Jbond)
[13:33:51] <wikibugs>	 (03PS6) 10Jbond: base::service_unit: drop support for sysV and upstart init scripts [puppet] - 10https://gerrit.wikimedia.org/r/661917 (https://phabricator.wikimedia.org/T273743)
[13:38:09] <wikibugs>	 (03PS1) 10Kormat: integration: Allow testing of multiple versions [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/663192 (https://phabricator.wikimedia.org/T265266)
[13:38:18] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "replies inline." (032 comments) [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/662762 (https://phabricator.wikimedia.org/T263768) (owner: 10CRusnov)
[13:39:32] <wikibugs>	 (03PS2) 10Kormat: integration: Allow testing of multiple versions [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/663192 (https://phabricator.wikimedia.org/T265266)
[13:40:47] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] Don’t apply X-Wikimedia-Debug routing to noc.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/663156 (https://phabricator.wikimedia.org/T245552) (owner: 10Gilles)
[13:46:12] <wikibugs>	 (03PS3) 10Kormat: integration: Allow testing of multiple versions [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/663192 (https://phabricator.wikimedia.org/T265266)
[13:47:31] <wikibugs>	 (03PS8) 10Jcrespo: [WIP] Move database backups-related puppet code to its own profile/role [puppet] - 10https://gerrit.wikimedia.org/r/662740 (https://phabricator.wikimedia.org/T138562)
[13:50:25] <wikibugs>	 (03CR) 10Kormat: [C: 03+2] integration: Allow testing of multiple versions [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/663192 (https://phabricator.wikimedia.org/T265266) (owner: 10Kormat)
[13:52:55] <wikibugs>	 (03CR) 10Jcrespo: "Actually change looks fine as a starting point, only thing missing is the denylist on monitoring screens: https://puppet-compiler.wmflabs." [puppet] - 10https://gerrit.wikimedia.org/r/662740 (https://phabricator.wikimedia.org/T138562) (owner: 10Jcrespo)
[13:54:22] <wikibugs>	 (03Merged) 10jenkins-bot: integration: Allow testing of multiple versions [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/663192 (https://phabricator.wikimedia.org/T265266) (owner: 10Kormat)
[14:00:04] <jouncebot>	 twentyafterfour and hashar: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Mediawiki train - American+European Version (secondary timeslot) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210210T1400).
[14:03:03] <wikibugs>	 (03PS9) 10Jcrespo: dbbackups: Move database backups-related puppet code to its own profile/role [puppet] - 10https://gerrit.wikimedia.org/r/662740 (https://phabricator.wikimedia.org/T138562)
[14:05:05] <apergos>	 hey if any of you are here (hashar, twentyafterfour) please see my question about the ubn
[14:05:08] <apergos>	 in the scrollback
[14:11:14] <wikibugs>	 (03PS1) 10Klausman: Add etcd role for ML Team's new clusters [puppet] - 10https://gerrit.wikimedia.org/r/663200 (https://phabricator.wikimedia.org/T273071)
[14:11:53] <wikibugs>	 (03PS2) 10Klausman: Add etcd role for ML Team's new clusters [puppet] - 10https://gerrit.wikimedia.org/r/663200 (https://phabricator.wikimedia.org/T273071)
[14:14:19] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 04-2] "Not good https://puppet-compiler.wmflabs.org/compiler1002/27964/" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[14:24:12] <wikibugs>	 (03PS1) 10David Caro: utils: add script to run docker ci tests locally [software/spicerack] - 10https://gerrit.wikimedia.org/r/663205 (https://phabricator.wikimedia.org/T274338)
[14:26:40] <wikibugs>	 (03CR) 10Ammarpad: [C: 03+1] Add inline documentation to configuration about updating logos regarding labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/663057 (owner: 10Jdlrobson)
[14:29:01] <wikibugs>	 (03PS1) 10Jbond: P:idp: drop tls config in cloud [puppet] - 10https://gerrit.wikimedia.org/r/663206
[14:29:48] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27967/console" [puppet] - 10https://gerrit.wikimedia.org/r/663206 (owner: 10Jbond)
[14:30:40] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:idp: drop tls config in cloud [puppet] - 10https://gerrit.wikimedia.org/r/663206 (owner: 10Jbond)
[14:32:33] <wikibugs>	 (03CR) 10David Caro: [C: 04-1] "Now that we have development docs I'll add it there too :)" [software/spicerack] - 10https://gerrit.wikimedia.org/r/663205 (https://phabricator.wikimedia.org/T274338) (owner: 10David Caro)
[14:33:51] <wikibugs>	 (03CR) 10Jbond: WIP: profile::memcached::instance: remove "default_values" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[14:37:07] <wikibugs>	 (03PS2) 10Jbond: P:idp: drop tls config in cloud [puppet] - 10https://gerrit.wikimedia.org/r/663206
[14:39:01] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp1081.eqiad.wmnet
[14:39:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:13] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp1082.eqiad.wmnet
[14:39:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:27] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp2034.codfw.wmnet
[14:39:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:46] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp3056.esams.wmnet
[14:39:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:03] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] P:idp: drop tls config in cloud [puppet] - 10https://gerrit.wikimedia.org/r/663206 (owner: 10Jbond)
[14:40:43] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp3057.esams.wmnet
[14:40:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:58] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp4030.ulsfo.wmnet
[14:41:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:13] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp4024.ulsfo.wmnet
[14:41:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:27] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp5010.eqsin.wmnet
[14:41:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:32] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp5004.eqsin.wmnet
[14:41:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:01] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for urbanecm - https://phabricator.wikimedia.org/T274318 (10Ottomata) Approved.
[14:45:38] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reboot-single for host cp2033.codfw.wmnet
[14:45:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:05] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2034.codfw.wmnet
[14:50:05] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5004.eqsin.wmnet
[14:50:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:03] <jynus>	 !log updating puppet-compiler-facts
[14:51:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:10] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3056.esams.wmnet
[14:51:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:18] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4030.ulsfo.wmnet
[14:51:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:33] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5010.eqsin.wmnet
[14:51:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:40] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1081.eqiad.wmnet
[14:51:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:52] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1082.eqiad.wmnet
[14:51:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:57] <hasharAway>	 apergos: I am half there yeah
[14:53:30] <apergos>	 ok, uh, what do you think (I guess we are not using this train deploy window)
[14:53:55] <apergos>	 hasharAway: 
[14:54:09] <hasharAway>	 we can do group0 I guess
[14:54:11] <wikibugs>	 (03CR) 10Elukey: "Added a couple of notes, let me know!" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/663200 (https://phabricator.wikimedia.org/T273071) (owner: 10Klausman)
[14:54:20] <apergos>	 when do you want to do it
[14:54:29] <hashar>	 I gotta leave in 40 minutes
[14:54:33] <apergos>	 I want to run a test first to get the version 1 of the key
[14:54:41] <apergos>	 ok let me do this test right now
[14:55:06] <hashar>	 but we can at least deploy the wmf.30 patch for FeaturedFeeds if it has not been deployed already
[14:55:10] <hashar>	 and promote group0 wikis
[14:55:21] <hashar>	 not like those wikis have FeaturedFeeds
[14:56:03] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3057.esams.wmnet
[14:56:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:50] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4024.ulsfo.wmnet
[14:57:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:56] <apergos>	 ok I need to do a preliminary test so I can get the format of the key in question
[14:58:01] <apergos>	 verify that the v2 version ain't there
[14:58:10] <wikibugs>	 (03PS1) 10KartikMistry: Update cxserver to 2021-02-10-134029-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/663213 (https://phabricator.wikimedia.org/T274133)
[14:58:14] <apergos>	 then we can promote and I can rerun the test to verify the key IS there w/o errors.
[14:58:19] <apergos>	 so give me 2 to 5 mins
[14:58:24] <hashar>	 :]]
[14:59:22] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2033.codfw.wmnet
[14:59:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:48] <wikibugs>	 (03PS1) 10Cwhite: hiera: prepare logstash syslog lvs config for removal [puppet] - 10https://gerrit.wikimedia.org/r/663214 (https://phabricator.wikimedia.org/T217032)
[15:02:37] <apergos>	 well I do not get the format of the key from there
[15:02:57] <apergos>	 so next is: go ahead and promote, I try the url, I verify that the exception I see is not again in today's log
[15:03:19] <apergos>	 and I can check that a key goes in because it will be v2 and in the debug logs and I'll see "miss" for the first round
[15:03:26] <apergos>	 then i can retrieve it to verify there is content
[15:03:31] <apergos>	 so, tl;dr: roll 'em
[15:03:36] <hashar>	 oh 
[15:03:41] <hashar>	 so you managed to reproduce it?
[15:03:41] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] profile::kerberos::keytabs: Drop require for the user [puppet] - 10https://gerrit.wikimedia.org/r/663184 (owner: 10Muehlenhoff)
[15:03:50] <apergos>	 no, this is the multilingual feed issue
[15:04:00] <hashar>	 ah
[15:04:00] <wikibugs>	 (03CR) 10Klausman: Add etcd role for ML Team's new clusters (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/663200 (https://phabricator.wikimedia.org/T273071) (owner: 10Klausman)
[15:04:07] <hashar>	 so promote group 0 right?
[15:04:12] <hashar>	 command is ready to launch
[15:04:13] <apergos>	 that's mediawikiwiki, right?
[15:04:24] <hashar>	 yes
[15:04:27] <apergos>	 go
[15:04:37] <hashar>	 we had www.mediawiki.org added there cause it is "low" traffic
[15:04:48] <hashar>	 but has a lot of advanced users who can craft nice reports
[15:04:53] <hashar>	 promoting
[15:05:01] <wikibugs>	 (03PS1) 10Hashar: group0 wikis to 1.36.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/663216
[15:05:03] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] group0 wikis to 1.36.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/663216 (owner: 10Hashar)
[15:05:27] <hashar>	 !log  group0 wikis to 1.36.0-wmf.30  T271344
[15:05:27] <apergos>	 well this is extremely fortunate because it has a multilingual feed right on the home page
[15:05:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:32] <stashbot>	 T271344: 1.36.0-wmf.30 deployment blockers - https://phabricator.wikimedia.org/T271344
[15:05:35] <hashar>	 ;]]]
[15:05:37] <wikibugs>	 10SRE, 10Data-Persistence-Backup, 10netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (10jcrespo)
[15:05:51] <apergos>	 is it out already? or should I wait?
[15:06:05] <hashar>	 that will report back with a !log once completed
[15:06:07] <apergos>	 ok
[15:06:13] <wikibugs>	 (03Merged) 10jenkins-bot: group0 wikis to 1.36.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/663216 (owner: 10Hashar)
[15:06:24] <apergos>	 and mwdebug1002 will have it as well, yes? because debug logging gives me key names
[15:06:33] <hashar>	 yeah
[15:06:36] <apergos>	 perfect
[15:06:49] <hashar>	 mwdebug hosts are just like all the others
[15:06:53] <wikibugs>	 (03CR) 10Itamar Givon: [C: 03+1] "LGTM 😊" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/662970 (https://phabricator.wikimedia.org/T272242) (owner: 10Lucas Werkmeister (WMDE))
[15:06:54] <hashar>	 it is just that they don't receive live traffic
[15:07:02] <hashar>	 unless one sets some HTTP header
[15:07:12] <hashar>	 but otherwise they are part of the scap targets
[15:07:15] <apergos>	 that browser extension is the best thing ever
[15:07:34] <apergos>	 yeah I was cautious because of last night's "30 is on such and such wiki only there"
[15:08:08] <apergos>	 tick... tick... tick...
[15:08:13] <hashar>	 apaches syncing
[15:08:38] <apergos>	 I should have just run scap pull on mwdebug1002 directly :-D  oh well
[15:08:58] <hashar>	 yeah that works as well
[15:09:07] <hashar>	 actually that is what we should have done bah
[15:09:27] <apergos>	 shoulda coulda woulda
[15:09:36] <apergos>	 I have high confidence in the patch but still
[15:11:39] <logmsgbot>	 !log hashar@deploy1001 rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.30
[15:11:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:59] <apergos>	 woo hoo
[15:12:01] <hashar>	 https://www.mediawiki.org/wiki/Special:Version is at wmf.30
[15:12:05] <apergos>	 time to test
[15:12:12] <apergos>	 hope you didn't load the home page yet :-P
[15:12:24] <hashar>	 most probably someone did 
[15:14:04] <apergos>	 the particular error does not show up so that's a win
[15:14:35] <apergos>	 nothing in the exception log
[15:14:46] <apergos>	 no key format, guess I should do that for completeness
[15:15:22] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Analytic Clsuter for Research Intern (ChristineDeKock) - https://phabricator.wikimedia.org/T274304 (10diego)
[15:15:50] <wikibugs>	 (03CR) 10Jcrespo: "https://puppet-compiler.wmflabs.org/compiler1002/27970/" [puppet] - 10https://gerrit.wikimedia.org/r/662740 (https://phabricator.wikimedia.org/T138562) (owner: 10Jcrespo)
[15:17:05] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] hiera: prepare logstash syslog lvs config for removal [puppet] - 10https://gerrit.wikimedia.org/r/663214 (https://phabricator.wikimedia.org/T217032) (owner: 10Cwhite)
[15:18:03] <apergos>	 meh it doesn't even have the featuredfeeds
[15:18:04] <apergos>	 so.
[15:20:07] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] base::service_unit: drop support for sysV and upstart init scripts [puppet] - 10https://gerrit.wikimedia.org/r/661917 (https://phabricator.wikimedia.org/T273743) (owner: 10Jbond)
[15:20:46] <apergos>	 calling it done regardless :-/
[15:21:25] <wikibugs>	 (03PS1) 10Ottomata: Do not produce canary events for rdf-streaming-updater streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/663219 (https://phabricator.wikimedia.org/T269619)
[15:21:56] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] Do not produce canary events for rdf-streaming-updater streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/663219 (https://phabricator.wikimedia.org/T269619) (owner: 10Ottomata)
[15:22:02] <hashar>	 apergos: I thought about using wmf.30 on some wiki on mwdebug
[15:22:17] <hashar>	 but I am afraid of the side effects it might have if the rest of the app servers are on wmf.27
[15:22:25] <ottomata>	 hashar:  ok if i deploy a config change? ^^
[15:22:31] <apergos>	 I already tested testwiki on mwdebug1002
[15:22:41] <apergos>	 so that's the equivalent, I noted it on the task earlier
[15:22:59] <apergos>	 it all looked fine as to the cache and the key etc
[15:22:59] <hashar>	 ottomata: yes go for it ;)
[15:23:02] <ottomata>	 ty
[15:23:07] <hashar>	 ottomata: well not like I have any idea what that change is doing hehe
[15:23:13] <hashar>	 but consider scap yours!
[15:23:17] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Do not produce canary events for rdf-streaming-updater streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/663219 (https://phabricator.wikimedia.org/T269619) (owner: 10Ottomata)
[15:24:40] <wikibugs>	 (03Merged) 10jenkins-bot: Do not produce canary events for rdf-streaming-updater streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/663219 (https://phabricator.wikimedia.org/T269619) (owner: 10Ottomata)
[15:24:40] <apergos>	 i guess this means the train can roll forward in the evening slot today
[15:25:47] <hashar>	 apergos: may you report the result on the wmf.30 blocking task please?
[15:26:05] <apergos>	 yeah I was getting there until I got pinged about my issues with element -> slack
[15:26:09] <apergos>	 e_toomanybrokenthings
[15:26:10] <hashar>	 and I guess american folks will promote wmf.30 to group1 at 20:00 UTC later today
[15:26:15] <apergos>	 yep
[15:26:15] <hashar>	 ahah
[15:26:17] <hashar>	 sounds familiar
[15:26:25] <logmsgbot>	 !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Do not produce canary events for rdf-streaming-updater streams - T269619 (duration: 01m 13s)
[15:26:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:29] <stashbot>	 T269619: Create pipelines for late/spurious/failed events - https://phabricator.wikimedia.org/T269619
[15:26:30] <Majavah>	 Error: Too many errors
[15:29:25] <wikibugs>	 (03PS1) 10Jcrespo: dbbackups: Update password locations for database-backups db [labs/private] - 10https://gerrit.wikimedia.org/r/663221 (https://phabricator.wikimedia.org/T138562)
[15:31:04] <apergos>	 commented on task, that should be enough
[15:31:13] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Good riddance!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/661917 (https://phabricator.wikimedia.org/T273743) (owner: 10Jbond)
[15:31:21] <apergos>	 thanks a lot folks
[15:31:39] <wikibugs>	 (03PS32) 10Hnowlan: start using imposm as OSM sync tool [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: 10MSantos)
[15:31:45] <Majavah>	 happy to help
[15:32:32] <wikibugs>	 (03CR) 10Volans: "Started doing a pass, but then converted it into a first pass giving the general comments I have. Sorry in advance for the mix of nits/gen" (034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/661921 (https://phabricator.wikimedia.org/T267412) (owner: 10David Caro)
[15:33:34] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] start using imposm as OSM sync tool [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: 10MSantos)
[15:33:55] <wikibugs>	 (03PS2) 10Jcrespo: dbbackups: Update password locations for database-backups db [labs/private] - 10https://gerrit.wikimedia.org/r/663221 (https://phabricator.wikimedia.org/T138562)
[15:33:57] <wikibugs>	 (03CR) 10Hnowlan: [V: 03+1] "PCC SUCCESS (NOOP 1 DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27971/console" [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: 10MSantos)
[15:37:36] <wikibugs>	 (03PS3) 10Jcrespo: dbbackups: Update password locations for database-backups db [labs/private] - 10https://gerrit.wikimedia.org/r/663221 (https://phabricator.wikimedia.org/T138562)
[15:37:49] <wikibugs>	 (03CR) 10Jbond: "thanks see comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/661917 (https://phabricator.wikimedia.org/T273743) (owner: 10Jbond)
[15:42:47] <wikibugs>	 10SRE, 10Continuous-Integration-Infrastructure, 10serviceops, 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (10hashar) I can not tell why the homepage of doc....
[15:43:01] <icinga-wm>	 RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:44:05] <wikibugs>	 (03CR) 10Jcrespo: [V: 03+2 C: 03+2] dbbackups: Update password locations for database-backups db [labs/private] - 10https://gerrit.wikimedia.org/r/663221 (https://phabricator.wikimedia.org/T138562) (owner: 10Jcrespo)
[15:50:09] <wikibugs>	 (03CR) 10Jcrespo: "alert hosts look good now, but this change requires private puppet changes deployed at the same time https://puppet-compiler.wmflabs.org/c" [puppet] - 10https://gerrit.wikimedia.org/r/662740 (https://phabricator.wikimedia.org/T138562) (owner: 10Jcrespo)
[15:50:45] <icinga-wm>	 PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:52:57] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Initial client profile for unprivileged Cumin [puppet] - 10https://gerrit.wikimedia.org/r/662945 (owner: 10Muehlenhoff)
[15:57:08] <wikibugs>	 (03PS8) 10Effie Mouzeli: WIP: profile::memcached::instance: remove "default_values" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[15:58:37] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10decommission-hardware: decommission db1081.eqiad.wmnet - https://phabricator.wikimedia.org/T273040 (10wiki_willy) a:05wiki_willy→03Cmjohnson Thanks @Marostegui   >>! In T273040#6817937, @Marostegui wrote: > This is ready for DCOps!
[16:01:18] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
[16:01:19] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
[16:01:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:00] <wikibugs>	 10SRE, 10Continuous-Integration-Infrastructure, 10serviceops, 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (10Dzahn) >>! In T247653#6819019, @hashar wrote: >...
[16:02:08] <wikibugs>	 (03Abandoned) 10JMeybohm: Lint the chart _scaffold by creating a dummy chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/638025 (owner: 10JMeybohm)
[16:02:52] <wikibugs>	 (03PS10) 10Jcrespo: dbbackups: Move database backups-related puppet code to its own profile/role [puppet] - 10https://gerrit.wikimedia.org/r/662740 (https://phabricator.wikimedia.org/T138562)
[16:03:26] <wikibugs>	 (03PS1) 10Jbond: P:idp: add ability to disable start tls [puppet] - 10https://gerrit.wikimedia.org/r/663231
[16:03:35] <wikibugs>	 (03CR) 10Elukey: Add etcd role for ML Team's new clusters (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/663200 (https://phabricator.wikimedia.org/T273071) (owner: 10Klausman)
[16:04:04] <wikibugs>	 (03PS33) 10Hnowlan: start using imposm as OSM sync tool [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: 10MSantos)
[16:04:25] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27979/console" [puppet] - 10https://gerrit.wikimedia.org/r/663231 (owner: 10Jbond)
[16:04:51] <icinga-wm>	 RECOVERY - tilerator on maps1005 is OK: HTTP OK: HTTP/1.1 200 OK - 315 bytes in 0.032 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/tilerator
[16:04:53] <icinga-wm>	 RECOVERY - tileratorui on maps1005 is OK: HTTP OK: HTTP/1.1 200 OK - 315 bytes in 0.021 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/tileratorui
[16:05:11] <wikibugs>	 (03PS1) 10Volans: dhcpd: create and include files for option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663233 (https://phabricator.wikimedia.org/T221388)
[16:05:13] <wikibugs>	 (03PS1) 10Volans: dhcpd: move sretest1002 to option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663234 (https://phabricator.wikimedia.org/T221388)
[16:05:41] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] P:idp: add ability to disable start tls [puppet] - 10https://gerrit.wikimedia.org/r/663231 (owner: 10Jbond)
[16:05:53] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] dhcpd: create and include files for option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663233 (https://phabricator.wikimedia.org/T221388) (owner: 10Volans)
[16:05:58] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] start using imposm as OSM sync tool [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: 10MSantos)
[16:06:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] dhcpd: move sretest1002 to option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663234 (https://phabricator.wikimedia.org/T221388) (owner: 10Volans)
[16:06:17] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] mysql_legacy.py: Add x2 [software/spicerack] - 10https://gerrit.wikimedia.org/r/662631 (https://phabricator.wikimedia.org/T269324) (owner: 10Marostegui)
[16:07:09] <wikibugs>	 (03PS2) 10Volans: dhcpd: create and include files for option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663233 (https://phabricator.wikimedia.org/T221388)
[16:07:11] <wikibugs>	 (03PS2) 10Volans: dhcpd: move sretest1002 to option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663234 (https://phabricator.wikimedia.org/T221388)
[16:08:11] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] dhcpd: create and include files for option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663233 (https://phabricator.wikimedia.org/T221388) (owner: 10Volans)
[16:08:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] dhcpd: move sretest1002 to option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663234 (https://phabricator.wikimedia.org/T221388) (owner: 10Volans)
[16:08:28] <volans>	 note to self, don't try to make a puppet patch while doing 3 other things
[16:09:39] <wikibugs>	 (03CR) 10Hnowlan: [V: 03+1] "PCC SUCCESS (NOOP 1 DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27980/console" [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: 10MSantos)
[16:12:34] <moritzm>	 !log installing atftp security updates
[16:12:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:10] <wikibugs>	 (03PS1) 10Filippo Giunchedi: alertmanager: route Performance team alerts [puppet] - 10https://gerrit.wikimedia.org/r/663238 (https://phabricator.wikimedia.org/T272979)
[16:18:12] <wikibugs>	 10ops-eqiad: maps1005.eqiad.wmnet: possible cable issues - https://phabricator.wikimedia.org/T274387 (10hnowlan)
[16:20:33] <wikibugs>	 (03PS1) 10Jbond: P:idp: macke ldap_bind_nd a parameter [puppet] - 10https://gerrit.wikimedia.org/r/663239
[16:20:34] <moritzm>	 !log installing unzip security updates
[16:20:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:48] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:idp: macke ldap_bind_nd a parameter [puppet] - 10https://gerrit.wikimedia.org/r/663239 (owner: 10Jbond)
[16:23:31] <wikibugs>	 10SRE, 10serviceops, 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[16:23:47] <wikibugs>	 (03CR) 10Muehlenhoff: P:idp: macke ldap_bind_nd a parameter (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/663239 (owner: 10Jbond)
[16:24:27] <wikibugs>	 10SRE, 10serviceops, 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[16:24:38] <wikibugs>	 (03PS9) 10Effie Mouzeli: WIP: profile::memcached::instance: remove "default_values" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[16:24:45] <wikibugs>	 (03PS1) 10Elukey: cdh::hive: remove sentry specific bits [puppet] - 10https://gerrit.wikimedia.org/r/663240 (https://phabricator.wikimedia.org/T274345)
[16:25:40] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] cdh::hive: remove sentry specific bits [puppet] - 10https://gerrit.wikimedia.org/r/663240 (https://phabricator.wikimedia.org/T274345) (owner: 10Elukey)
[16:26:11] <wikibugs>	 10SRE, 10serviceops, 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[16:28:58] <wikibugs>	 (03CR) 10Volans: [C: 03+2] mysql_legacy.py: Add x2 [software/spicerack] - 10https://gerrit.wikimedia.org/r/662631 (https://phabricator.wikimedia.org/T269324) (owner: 10Marostegui)
[16:31:24] <wikibugs>	 10SRE, 10Services, 10Service-deployment-requests: New Service Request geoshapes - https://phabricator.wikimedia.org/T274388 (10MSantos)
[16:32:53] <wikibugs>	 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Services, 10Service-deployment-requests: New Service Request geoshapes - https://phabricator.wikimedia.org/T274388 (10MSantos)
[16:33:45] <wikibugs>	 (03PS1) 10Elukey: druid: remove cdh specific configurations [puppet] - 10https://gerrit.wikimedia.org/r/663241 (https://phabricator.wikimedia.org/T274345)
[16:33:49] <wikibugs>	 (03PS2) 10Jbond: P:idp: macke ldap_bind_nd a parameter [puppet] - 10https://gerrit.wikimedia.org/r/663239
[16:34:08] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:idp: macke ldap_bind_nd a parameter [puppet] - 10https://gerrit.wikimedia.org/r/663239 (owner: 10Jbond)
[16:34:32] <wikibugs>	 (03PS3) 10Jbond: P:idp: make ldap_bind_nd a parameter [puppet] - 10https://gerrit.wikimedia.org/r/663239
[16:34:46] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:idp: make ldap_bind_nd a parameter [puppet] - 10https://gerrit.wikimedia.org/r/663239 (owner: 10Jbond)
[16:35:57] <wikibugs>	 (03Merged) 10jenkins-bot: mysql_legacy.py: Add x2 [software/spicerack] - 10https://gerrit.wikimedia.org/r/662631 (https://phabricator.wikimedia.org/T269324) (owner: 10Marostegui)
[16:37:14] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 04-2] "Still not good https://puppet-compiler.wmflabs.org/compiler1002/27985/" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[16:38:13] <wikibugs>	 (03PS3) 10Volans: dhcpd: create and include files for option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663233 (https://phabricator.wikimedia.org/T221388)
[16:38:15] <wikibugs>	 (03PS3) 10Volans: dhcpd: move sretest1002 to option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663234 (https://phabricator.wikimedia.org/T221388)
[16:38:31] <wikibugs>	 10SRE, 10MediaWiki-Debug-Logger, 10Traffic, 10Developer Productivity, 10Performance-Team (Radar): noc.wikimedia.org with X-Wikimedia-Debug routes to mwdebug but host is not served there - https://phabricator.wikimedia.org/T245552 (10Gilles) 05Open→03Resolved a:03Gilles Thanks @jijiki !
[16:38:42] <wikibugs>	 (03PS4) 10Jbond: P:idp: update profile to use ldap['proxyagent'] [puppet] - 10https://gerrit.wikimedia.org/r/663239
[16:38:46] <wikibugs>	 (03CR) 10Jbond: P:idp: update profile to use ldap['proxyagent'] (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/663239 (owner: 10Jbond)
[16:39:34] <wikibugs>	 (03CR) 10Volans: "Compiler results at: https://puppet-compiler.wmflabs.org/compiler1003/27983/install1003.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/663233 (https://phabricator.wikimedia.org/T221388) (owner: 10Volans)
[16:39:36] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27986/console" [puppet] - 10https://gerrit.wikimedia.org/r/663239 (owner: 10Jbond)
[16:40:11] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/663239 (owner: 10Jbond)
[16:40:23] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1371.eqiad.wmnet with reason: REIMAGE
[16:40:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:40:46] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] P:idp: update profile to use ldap['proxyagent'] [puppet] - 10https://gerrit.wikimedia.org/r/663239 (owner: 10Jbond)
[16:41:30] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/663234 (https://phabricator.wikimedia.org/T221388) (owner: 10Volans)
[16:42:24] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1371.eqiad.wmnet with reason: REIMAGE
[16:42:24] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1379.eqiad.wmnet with reason: REIMAGE
[16:42:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/663233 (https://phabricator.wikimedia.org/T221388) (owner: 10Volans)
[16:44:14] <wikibugs>	 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Services, 10Service-deployment-requests: [DRAFT] New Service Request tegola - https://phabricator.wikimedia.org/T274390 (10MSantos)
[16:44:25] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1379.eqiad.wmnet with reason: REIMAGE
[16:44:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:07] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] hiera: prepare logstash syslog lvs config for removal [puppet] - 10https://gerrit.wikimedia.org/r/663214 (https://phabricator.wikimedia.org/T217032) (owner: 10Cwhite)
[16:47:34] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] dhcpd: create and include files for option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663233 (https://phabricator.wikimedia.org/T221388) (owner: 10Volans)
[16:47:56] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] dhcpd: move sretest1002 to option 82 [puppet] - 10https://gerrit.wikimedia.org/r/663234 (https://phabricator.wikimedia.org/T221388) (owner: 10Volans)
[16:48:13] <wikibugs>	 (03PS10) 10Effie Mouzeli: WIP: profile::memcached::instance: remove "default_values" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[16:52:04] <wikibugs>	 (03PS3) 10Dzahn: systemd: add data type for 'day of the week' in systemd timers/calendar [puppet] - 10https://gerrit.wikimedia.org/r/663051
[16:53:20] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1295.eqiad.wmnet with reason: REIMAGE
[16:53:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:47] <wikibugs>	 (03CR) 10Dzahn: "could releng let us know if this is officially declined or still happening? It was stalled month ago by request but the reason for that re" [puppet] - 10https://gerrit.wikimedia.org/r/556270 (https://phabricator.wikimedia.org/T240266) (owner: 10Paladox)
[16:56:54] <wikibugs>	 (03PS1) 10Cwhite: hiera: prepare logstash-syslog lvs service for removal [puppet] - 10https://gerrit.wikimedia.org/r/663242 (https://phabricator.wikimedia.org/T217032)
[16:57:01] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1295.eqiad.wmnet with reason: REIMAGE
[16:57:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:57:45] <wikibugs>	 (03CR) 10Phuedx: [C: 03+1] Add inline documentation to configuration about updating logos regarding labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/663057 (owner: 10Jdlrobson)
[16:59:19] <wikibugs>	 (03CR) 10Dzahn: "how to get review from traffic?" [puppet] - 10https://gerrit.wikimedia.org/r/659377 (https://phabricator.wikimedia.org/T272559) (owner: 10Dzahn)
[17:06:29] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/663242 (https://phabricator.wikimedia.org/T217032) (owner: 10Cwhite)
[17:07:23] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] hiera: prepare logstash-syslog lvs service for removal [puppet] - 10https://gerrit.wikimedia.org/r/663242 (https://phabricator.wikimedia.org/T217032) (owner: 10Cwhite)
[17:07:38] <wikibugs>	 (03PS2) 10Cwhite: hiera: prepare logstash-syslog lvs service for removal [puppet] - 10https://gerrit.wikimedia.org/r/663242 (https://phabricator.wikimedia.org/T217032)
[17:08:36] <twentyafterfour>	 apergos: which question in the scrollback?
[17:08:53] <wikibugs>	 (03PS34) 10Hnowlan: start using imposm as OSM sync tool [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: 10MSantos)
[17:08:56] <apergos>	 ah now it's asked and answered :-)
[17:09:10] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@4f5a5a7]: puppet dashboard policy updates
[17:09:10] <twentyafterfour>	 ah ok
[17:09:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:13] <apergos>	 current state of things for the ubn train blocker: it is deployed to group1 seemingly without issues
[17:09:17] <wikibugs>	 (03CR) 10Hnowlan: [V: 03+1] "I believe this latest patchset addresses all issues raised, apologies for the sprawling nature of fixes." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: 10MSantos)
[17:09:22] <apergos>	 i.e. wmf.30 is deployed
[17:09:47] <twentyafterfour>	 apergos: oh, excellent 
[17:10:14] <wikibugs>	 (03PS5) 10Legoktm: docker_registry_ha: Have restricted/ images that are limited read/write [puppet] - 10https://gerrit.wikimedia.org/r/662807 (https://phabricator.wikimedia.org/T273521)
[17:10:16] <wikibugs>	 (03PS3) 10Legoktm: k8s: Add docker-registry credentials to pull restricted images [puppet] - 10https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521)
[17:10:25] <apergos>	 no wrong it is on group0
[17:10:26] <wikibugs>	 (03CR) 10Legoktm: k8s: Add docker-registry credentials to pull restricted images (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521) (owner: 10Legoktm)
[17:10:29] <apergos>	 including mediawikiwiki
[17:10:32] <wikibugs>	 (03CR) 10Legoktm: docker_registry_ha: Have restricted/ images that are limited read/write (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/662807 (https://phabricator.wikimedia.org/T273521) (owner: 10Legoktm)
[17:10:33] <apergos>	 (sorry)
[17:10:47] <apergos>	 I just checked exception log, it seems fine still
[17:11:50] <apergos>	 so "we" were thinking that in the evening train slot the deployer might roll it out to group1 
[17:12:16] <twentyafterfour>	 indeed that would be the plan 
[17:12:20] <apergos>	 cool!
[17:13:02] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@4f5a5a7]: puppet dashboard policy updates (duration: 03m 53s)
[17:13:04] <shdubsh>	 !log restart pybal on backup lvs1016
[17:13:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:14:01] <wikibugs>	 (03PS11) 10Effie Mouzeli: WIP: profile::memcached::instance: remove "default_values" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[17:14:22] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1015 is CRITICAL: CRITICAL: Services in IPVS but unknown to PyBal: set([10.2.2.36:10514]) https://wikitech.wikimedia.org/wiki/PyBal
[17:14:58] <shdubsh>	 ^^ known
[17:15:13] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] druid: remove cdh specific configurations [puppet] - 10https://gerrit.wikimedia.org/r/663241 (https://phabricator.wikimedia.org/T274345) (owner: 10Elukey)
[17:18:21] <wikibugs>	 (03PS12) 10Effie Mouzeli: WIP: profile::memcached::instance: remove "default_values" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[17:18:58] <shdubsh>	 !log restart pybal on low-traffic lvs1015
[17:19:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:19:57] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.reboot-single for host thumbor1001.eqiad.wmnet
[17:19:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:26] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Services in IPVS but unknown to PyBal: set([10.2.2.36:10514]) https://wikitech.wikimedia.org/wiki/PyBal
[17:21:42] <wikibugs>	 (03PS1) 10Andrew Bogott: cinder: set default policy to admin_or_projectadmin [puppet] - 10https://gerrit.wikimedia.org/r/663251 (https://phabricator.wikimedia.org/T274107)
[17:22:26] <wikibugs>	 (03CR) 10Effie Mouzeli: "woohoo https://puppet-compiler.wmflabs.org/compiler1003/27991/" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[17:22:48] <wikibugs>	 (03PS13) 10Effie Mouzeli: profile::memcached::instance: remove "default_values" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[17:23:38] <wikibugs>	 (03PS14) 10Effie Mouzeli: profile::memcached::instance: remove "default_values" [puppet] - 10https://gerrit.wikimedia.org/r/647190 (owner: 10Elukey)
[17:23:52] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[17:24:26] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1015 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[17:25:07] <wikibugs>	 (03CR) 10JMeybohm: k8s: Add docker-registry credentials to pull restricted images (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521) (owner: 10Legoktm)
[17:27:49] <wikibugs>	 (03PS1) 10Andrew Bogott: Openstack policies: add a default policy override requiring projectadmin [puppet] - 10https://gerrit.wikimedia.org/r/663253 (https://phabricator.wikimedia.org/T274107)
[17:28:05] <wikibugs>	 (03PS3) 10Cwhite: profile: remove deprecated syslog input [puppet] - 10https://gerrit.wikimedia.org/r/662009 (https://phabricator.wikimedia.org/T217032)
[17:28:11] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cinder: set default policy to admin_or_projectadmin [puppet] - 10https://gerrit.wikimedia.org/r/663251 (https://phabricator.wikimedia.org/T274107) (owner: 10Andrew Bogott)
[17:28:42] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Openstack policies: add a default policy override requiring projectadmin [puppet] - 10https://gerrit.wikimedia.org/r/663253 (https://phabricator.wikimedia.org/T274107) (owner: 10Andrew Bogott)
[17:38:52] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] profile: update w3creportingapi to use 12 weekly indexes [puppet] - 10https://gerrit.wikimedia.org/r/661993 (https://phabricator.wikimedia.org/T274005) (owner: 10Cwhite)
[17:39:59] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] wikireplicas: adjust logrotate for multiinstance on wmf-pt-kill [puppet] - 10https://gerrit.wikimedia.org/r/662797 (https://phabricator.wikimedia.org/T274044) (owner: 10Bstorm)
[17:40:14] <icinga-wm>	 PROBLEM - Host thumbor1001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:41:13] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] docker_registry_ha: Have restricted/ images that are limited read/write (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/662807 (https://phabricator.wikimedia.org/T273521) (owner: 10Legoktm)
[17:41:33] <wikibugs>	 10SRE: hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (10Dzahn)
[17:42:24] <wikibugs>	 10SRE, 10serviceops, 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1295.eqiad.wmnet'] `  an...
[17:42:54] <wikibugs>	 (03PS2) 10Dzahn: hieradata/common: replace hiera within hiera with lookup [puppet] - 10https://gerrit.wikimedia.org/r/662021 (https://phabricator.wikimedia.org/T209953)
[17:43:10] <icinga-wm>	 RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:43:22] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1295.eqiad.wmnet
[17:43:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:13] <wikibugs>	 (03PS1) 10Andrew Bogott: Keystone policy: standardize on the rule name 'admin_or_projectadmin' [puppet] - 10https://gerrit.wikimedia.org/r/663258 (https://phabricator.wikimedia.org/T274107)
[17:45:04] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Keystone policy: standardize on the rule name 'admin_or_projectadmin' [puppet] - 10https://gerrit.wikimedia.org/r/663258 (https://phabricator.wikimedia.org/T274107) (owner: 10Andrew Bogott)
[17:47:56] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1013 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:48:48] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=webperf_arclamp site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:49:36] <icinga-wm>	 PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:51:03] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1295.eqiad.wmnet
[17:51:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:51:29] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[17:53:31] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/663261
[17:54:12] <icinga-wm>	 PROBLEM - Prometheus prometheus1003/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus1003 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/global
[17:55:18] <icinga-wm>	 PROBLEM - Prometheus prometheus1004/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus1004 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/global
[17:58:05] <wikibugs>	 (PS1) Jdlrobson: Labs should override all logo definitions [mediawiki-config] - https://gerrit.wikimedia.org/r/663263 (https://phabricator.wikimedia.org/T274210)
[17:58:08] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1016 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:58:34] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1020 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:58:38] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1015 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:59:34] <wikibugs>	 (PS2) Jdlrobson: Labs should override all logo definitions [mediawiki-config] - https://gerrit.wikimedia.org/r/663263 (https://phabricator.wikimedia.org/T274210)
[18:01:48] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1017 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:04:35] <jynus>	 yay ^ , bstorm! (I am guessing it was you)
[18:04:51] <bstorm>	 Yes it was :)
[18:04:57] <jynus>	 thank you!!!!
[18:05:06] <bstorm>	 np 
[18:05:33] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1371.eqiad.wmnet'] `  Of...
[18:06:53] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1379.eqiad.wmnet'] `  Of...
[18:13:24] <wikibugs>	 (CR) Legoktm: docker_registry_ha: Have restricted/ images that are limited read/write (2 comments) [puppet] - https://gerrit.wikimedia.org/r/662807 (https://phabricator.wikimedia.org/T273521) (owner: Legoktm)
[18:13:40] <wikibugs>	 (CR) Legoktm: k8s: Add docker-registry credentials to pull restricted images (1 comment) [puppet] - https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521) (owner: Legoktm)
[18:14:10] <wikibugs>	 (PS6) Legoktm: docker_registry_ha: Have restricted/ images that are limited read/write [puppet] - https://gerrit.wikimedia.org/r/662807 (https://phabricator.wikimedia.org/T273521)
[18:14:12] <wikibugs>	 (PS4) Legoktm: k8s: Add docker-registry credentials to pull restricted images [puppet] - https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521)
[18:14:31] <logmsgbot>	 !log jiji@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thumbor1001.eqiad.wmnet
[18:14:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:22] <wikibugs>	 (CR) jerkins-bot: [V: -1] k8s: Add docker-registry credentials to pull restricted images [puppet] - https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521) (owner: Legoktm)
[18:17:22] <icinga-wm>	 RECOVERY - Prometheus prometheus1004/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/global
[18:17:24] <wikibugs>	 (PS5) Legoktm: k8s: Add docker-registry credentials to pull restricted images [puppet] - https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521)
[18:17:26] <icinga-wm>	 RECOVERY - Prometheus prometheus1003/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus1003 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/global
[18:17:34] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1019 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:18:33] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1294.eqiad.wmnet with reason: REIMAGE
[18:18:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:59] <wikibugs>	 (CR) Dzahn: [C: +1] "not speaking for the nginx config, but the lookup/hiera part and naming of the keys looks good to me now." [puppet] - https://gerrit.wikimedia.org/r/662807 (https://phabricator.wikimedia.org/T273521) (owner: Legoktm)
[18:20:09] <wikibugs>	 SRE, Scap, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)): Re-imaged mw app servers can end up with missing l10n cache for old versions of MW needed for rollback - https://phabricator.wikimedia.org/T273334 (Legoktm) This happened agai...
[18:20:38] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1294.eqiad.wmnet with reason: REIMAGE
[18:20:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:04] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1018 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:24:07] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by legoktm on cumin1001.eq...
[18:25:49] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Legoktm)
[18:27:02] <icinga-wm>	 RECOVERY - Host logstash1020.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.95 ms
[18:27:51] <wikibugs>	 (PS1) Legoktm: Update docker-registry related hiera keys for I76a6fc9d21 and Ic655290a69a [labs/private] - https://gerrit.wikimedia.org/r/663268
[18:28:03] <wikibugs>	 (CR) Legoktm: [V: +2 C: +2] Update docker-registry related hiera keys for I76a6fc9d21 and Ic655290a69a [labs/private] - https://gerrit.wikimedia.org/r/663268 (owner: Legoktm)
[18:28:31] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/663269
[18:28:35] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/663270
[18:29:11] <wikibugs>	 (CR) Legoktm: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27992/console" [puppet] - https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521) (owner: Legoktm)
[18:30:42] <wikibugs>	 (CR) Legoktm: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27993/console" [puppet] - https://gerrit.wikimedia.org/r/662807 (https://phabricator.wikimedia.org/T273521) (owner: Legoktm)
[18:31:38] <wikibugs>	 (CR) Legoktm: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27994/console" [puppet] - https://gerrit.wikimedia.org/r/662806 (owner: Legoktm)
[18:31:47] <wikibugs>	 (PS3) Legoktm: docker_registry_ha: Properly override nginx timeouts [puppet] - https://gerrit.wikimedia.org/r/662806
[18:32:13] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1371.eqiad.wmnet
[18:32:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:29] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@4f5a5a7]: security group dashboard policy updates
[18:32:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:36] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@4f5a5a7]: security group dashboard policy updates (duration: 00m 07s)
[18:32:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:21] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@02cb8a4]: security group dashboard policy updates, now after doing a submodule update!
[18:33:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:54] <wikibugs>	 (CR) Legoktm: [C: +2] docker_registry_ha: Properly override nginx timeouts [puppet] - https://gerrit.wikimedia.org/r/662806 (owner: Legoktm)
[18:34:10] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
[18:34:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:34:31] <wikibugs>	 (CR) Dzahn: "+1 for the lookup/hiera keys part discussed on IRC, not speaking for the rest of it" [puppet] - https://gerrit.wikimedia.org/r/663064 (https://phabricator.wikimedia.org/T273521) (owner: Legoktm)
[18:36:52] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@02cb8a4]: security group dashboard policy updates, now after doing a submodule update! (duration: 03m 31s)
[18:36:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:00] <wikibugs>	 (PS1) Jbond: idp - cloud: update base-dn [puppet] - https://gerrit.wikimedia.org/r/663274
[18:43:22] <wikibugs>	 SRE: hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (Dzahn)
[18:44:08] <wikibugs>	 (CR) Jbond: [C: +2] idp - cloud: update base-dn [puppet] - https://gerrit.wikimedia.org/r/663274 (owner: Jbond)
[18:44:36] <icinga-wm>	 PROBLEM - Host mw1379 is DOWN: PING CRITICAL - Packet loss = 100%
[18:44:45] <wikibugs>	 SRE: hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (Dzahn)
[18:46:48] <wikibugs>	 (CR) Bstorm: [C: +1] "Looks good to me. Access peculiarities are kept to the proxy layer, so I have no concerns at this layer." [puppet] - https://gerrit.wikimedia.org/r/662697 (https://phabricator.wikimedia.org/T273606) (owner: Kormat)
[18:47:10] <mutante>	 legoktm: any idea why hosts are going down like 1379 that were already done?
[18:47:21] <legoktm>	 uhh not me
[18:47:30] <legoktm>	 I'm doing 1321-1324
[18:47:31] <mutante>	 hmm, ACK.. looking
[18:47:35] <mutante>	 ok, thx
[18:47:48] <mutante>	 weird.. everything else seemed normal and it was done
[18:48:08] <mutante>	 and it's 2 of them
[18:48:58] <mutante>	 oh.. it's what happened once before yesterday
[18:49:11] <mutante>	 everything works .. past the first puppet run
[18:49:22] <mutante>	 but then it does the final reboot and times out 
[18:49:44] <mutante>	 then I powercycled and it was .. fine
[18:52:11] <logmsgbot>	 !log legoktm@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1321.eqiad.wmnet with reason: REIMAGE
[18:52:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:53:54] <mutante>	 Unable to perform requested operation.
[18:54:08] <mutante>	 hrmm.. another special case
[18:54:09] <logmsgbot>	 !log legoktm@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1322.eqiad.wmnet with reason: REIMAGE
[18:54:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:54:21] <logmsgbot>	 !log legoktm@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1321.eqiad.wmnet with reason: REIMAGE
[18:54:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:13] <logmsgbot>	 !log legoktm@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1323.eqiad.wmnet with reason: REIMAGE
[18:56:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:31] <logmsgbot>	 !log legoktm@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1322.eqiad.wmnet with reason: REIMAGE
[18:56:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:57:28] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1371.eqiad.wmnet
[18:57:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:57:59] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[18:58:07] <Amir1>	 twentyafterfour: the train is good to move forward, the subticket is addressed T273242#6818961
[18:58:08] <stashbot>	 T273242: MemcachedPeclBagOStuff: Serialization of 'Closure' is not allowed - https://phabricator.wikimedia.org/T273242
[18:58:08] <logmsgbot>	 !log legoktm@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1324.eqiad.wmnet with reason: REIMAGE
[18:58:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:58:30] <twentyafterfour>	 Amir1: thanks!
[18:58:36] <logmsgbot>	 !log legoktm@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1323.eqiad.wmnet with reason: REIMAGE
[18:58:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:04] <jouncebot>	 twentyafterfour and hashar: It is that lovely time of the day again! You are hereby commanded to deploy Train log triage with CPT. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210210T1900).
[19:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: Your horoscope predicts another unfortunate Morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210210T1900).
[19:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[19:00:05] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=inactive; selector: name=mw1379.eqiad.wmnet
[19:00:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:39] <logmsgbot>	 !log legoktm@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1324.eqiad.wmnet with reason: REIMAGE
[19:00:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:04:09] <mutante>	 !log mw1379 - racadm racreset - host did not come back from reboot and DRAC says it can't powercycle it.. while it is also ALREADY ON
[19:04:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:15] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[19:06:38] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1294.eqiad.wmnet'] `  an...
[19:07:02] <wikibugs>	 (PS1) Jbond: P:idp::client: add generic uwsgi template [puppet] - https://gerrit.wikimedia.org/r/663276
[19:07:04] <wikibugs>	 (PS2) Thcipriani: Remove a couple of useless DNS lookups from mediawiki-config [mediawiki-config] - https://gerrit.wikimedia.org/r/661732 (https://phabricator.wikimedia.org/T231025) (owner: Giuseppe Lavagetto)
[19:08:03] <wikibugs>	 (CR) Thcipriani: [C: +2] Remove a couple of useless DNS lookups from mediawiki-config [mediawiki-config] - https://gerrit.wikimedia.org/r/661732 (https://phabricator.wikimedia.org/T231025) (owner: Giuseppe Lavagetto)
[19:08:12] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[19:08:55] <wikibugs>	 (Merged) jenkins-bot: Remove a couple of useless DNS lookups from mediawiki-config [mediawiki-config] - https://gerrit.wikimedia.org/r/661732 (https://phabricator.wikimedia.org/T231025) (owner: Giuseppe Lavagetto)
[19:11:52] <icinga-wm>	 RECOVERY - Host mw1379 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[19:12:31] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1294.eqiad.wmnet
[19:12:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:16] <icinga-wm>	 RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:14:51] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1370.eqiad.wmnet with reason: REIMAGE
[19:14:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:54] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
[19:15:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:25] <wikibugs>	 SRE: mw1379 - down after reboot attempt and DRAC can't powercycle - https://phabricator.wikimedia.org/T274403 (Dzahn)
[19:16:53] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1370.eqiad.wmnet with reason: REIMAGE
[19:16:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:10] <wikibugs>	 SRE: mw1379 - down after reboot attempt and DRAC can't powercycle - https://phabricator.wikimedia.org/T274403 (Dzahn)
[19:17:13] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[19:17:25] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1379.eqiad.wmnet
[19:17:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:49] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1294.eqiad.wmnet
[19:17:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:19:05] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1294.eqiad.wmnet
[19:19:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:19:55] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[19:20:36] <wikibugs>	 (PS1) Jbond: P:idp: add ability to change memcached port [puppet] - https://gerrit.wikimedia.org/r/663279
[19:20:41] <logmsgbot>	 !log thcipriani@deploy1001 Synchronized wmf-config/ProductionServices.php: [[gerrit:661732|Remove a couple of useless DNS lookups from mediawiki-config]] T231025 (duration: 01m 10s)
[19:20:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:20:46] <stashbot>	 T231025: LegacyHandler.php: PHP Warning: Host lookup failed [-10002]: Unknown error -10002 - https://phabricator.wikimedia.org/T231025
[19:23:06] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1378.eqiad.wmnet with reason: REIMAGE
[19:23:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:17] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27995/console" [puppet] - https://gerrit.wikimedia.org/r/663279 (owner: Jbond)
[19:23:35] <wikibugs>	 (CR) Jbond: [C: +2] P:idp::client: add generic uwsgi template [puppet] - https://gerrit.wikimedia.org/r/663276 (owner: Jbond)
[19:23:40] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] P:idp: add ability to change memcached port [puppet] - https://gerrit.wikimedia.org/r/663279 (owner: Jbond)
[19:25:10] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1378.eqiad.wmnet with reason: REIMAGE
[19:25:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:25:45] <icinga-wm>	 PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:27:05] <icinga-wm>	 ACKNOWLEDGEMENT - configured eth on sretest1001 is CRITICAL: ens2f1 reporting no carrier. daniel_zahn machines with TEST in the name should not have prod monitoring alerts https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[19:27:05] <icinga-wm>	 ACKNOWLEDGEMENT - dhclient process on sretest1001 is CRITICAL: PROCS CRITICAL: 1 process with command name dhclient daniel_zahn machines with TEST in the name should not have prod monitoring alerts https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[19:27:05] <icinga-wm>	 ACKNOWLEDGEMENT - configured eth on sretest1002 is CRITICAL: eno2 reporting no carrier. daniel_zahn machines with TEST in the name should not have prod monitoring alerts https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[19:27:05] <icinga-wm>	 ACKNOWLEDGEMENT - dhclient process on sretest1002 is CRITICAL: PROCS CRITICAL: 1 process with command name dhclient daniel_zahn machines with TEST in the name should not have prod monitoring alerts https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[19:29:52] <wikibugs>	 (CR) MarcoAurelio: "Untested." [puppet] - https://gerrit.wikimedia.org/r/663074 (owner: MarcoAurelio)
[19:30:09] <wikibugs>	 SRE: hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (Dzahn)
[19:30:50] <wikibugs>	 (CR) Dzahn: "tried to compile on all - opened https://phabricator.wikimedia.org/T274392  because there are always a bunch of false positives" [puppet] - https://gerrit.wikimedia.org/r/662021 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn)
[19:31:12] <wikibugs>	 (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/27989/" [puppet] - https://gerrit.wikimedia.org/r/662021 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn)
[19:33:27] <wikibugs>	 SRE: mw1379 - down after reboot attempt and DRAC can't powercycle - https://phabricator.wikimedia.org/T274403 (Dzahn) Open→Resolved a:Dzahn
[19:33:30] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[19:34:20] <wikibugs>	 SRE, serviceops, Wikimedia-production-error: PHP7 corruption reports in 2020-2021 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (Krinkle) a:Robert-RtC3V
[19:35:17] <icinga-wm>	 PROBLEM - Host ms-be1034 is DOWN: PING CRITICAL - Packet loss = 100%
[19:36:44] <wikibugs>	 SRE, serviceops, Wikimedia-production-error: PHP7 corruption reports in 2020-2021 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (RhinosF1) @Krinkle: Robert doesn't seem to have been active for a few years nor involved in this task. Did you mean to assign it to him?
[19:37:52] <wikibugs>	 SRE, serviceops, Wikimedia-production-error: PHP7 corruption reports in 2020-2021 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (Krinkle) a:Robert-RtC3V→None It wasn't me, it was <https://phabricator.wikimedia.org/project/trigger/72/>.
[19:38:14] <wikibugs>	 (CR) Dzahn: [C: +2] wmcs::monitoring: replace hiera inside hiera with lookup [puppet] - https://gerrit.wikimedia.org/r/662026 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn)
[19:38:51] <wikibugs>	 SRE, serviceops, Wikimedia-production-error: PHP7 corruption reports in 2020-2021 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (RhinosF1) >>! In T245183#6819999, @Krinkle wrote: > It wasn't me, it was <https://phabricator.wikimedia.org/project/trigger/72/>. Can we get some...
[19:40:19] <wikibugs>	 SRE, serviceops, Wikimedia-production-error: PHP7 corruption reports in 2020-2021 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (Ladsgroup) I'm admin and I can't disable it....
[19:41:46] <wikibugs>	 (CR) Dzahn: "cloudmetrics1002 - confirmed noop" [puppet] - https://gerrit.wikimedia.org/r/662026 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn)
[19:41:59] <wikibugs>	 SRE, serviceops, Wikimedia-production-error: PHP7 corruption reports in 2020-2021 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (RhinosF1) >>! In T245183#6820029, @Ladsgroup wrote: > I'm admin and I can't disable it.... Phabricator is good at that. I guess it'll have to be...
[19:45:52] <wikibugs>	 SRE, serviceops, Wikimedia-production-error: PHP7 corruption reports in 2020-2021 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (Dzahn) >>! In T245183#6820029, @Ladsgroup wrote: > I'm admin and I can't disable it....  You should ask the members of the "phabricator-admin" sh...
[19:46:22] <wikibugs>	 (CR) Dzahn: [C: +2] ldap: Migrate hiera() to lookup() [puppet] - https://gerrit.wikimedia.org/r/661916 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
[19:46:59] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1293.eqiad.wmnet with reason: REIMAGE
[19:47:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:47:39] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[19:47:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:02] <wikibugs>	 (CR) Dzahn: "being bold here.. Ladsgroup can't self-merge this and T209953 was originally created by wmcs" [puppet] - https://gerrit.wikimedia.org/r/661916 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
[19:49:06] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1293.eqiad.wmnet with reason: REIMAGE
[19:49:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:50:56] <wikibugs>	 SRE, Patch-For-Review, cloud-services-team (Kanban): Use lookup() instead of hiera() in Puppet - https://phabricator.wikimedia.org/T209953 (Dzahn) This would be done if it wasn't for a single remaining case:  Could you guys fix this one please?   `  62 puppetmaster::servers:  63   "%{hiera('puppetmas...
[19:52:30] <wikibugs>	 (PS1) Dzahn: cloud: replace hiera in hiera with lookup [puppet] - https://gerrit.wikimedia.org/r/663289 (https://phabricator.wikimedia.org/T209953)
[19:52:59] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1321.eqiad.wmnet', 'mw13...
[19:53:38] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:53:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:36] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=no; selector: name=mw1321.eqiad.wmnet
[19:54:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:40] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=no; selector: name=mw1322.eqiad.wmnet
[19:54:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:44] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=no; selector: name=mw1323.eqiad.wmnet
[19:54:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:50] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=no; selector: name=mw1324.eqiad.wmnet
[19:54:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:59:19] <wikibugs>	 (PS14) Kosta Harlan: [WIP] linkrecommendation: Cron job to load datasets [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893)
[19:59:40] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@1c5477d]: query_clicks: timestamp is now a reserved keyword
[19:59:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:04] <jouncebot>	 twentyafterfour and hashar: Your horoscope predicts another unfortunate Mediawiki train - American+European Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210210T2000).
[20:00:06] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=yes; selector: name=mw1321.eqiad.wmnet
[20:00:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:23] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=yes; selector: name=mw1322.eqiad.wmnet
[20:00:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:01:06] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=yes; selector: name=mw1323.eqiad.wmnet
[20:01:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:01:29] <wikibugs>	 SRE, Maps: Requesting access to maps for mbsantos and jgiannelos - https://phabricator.wikimedia.org/T269357 (Dzahn) Open→Stalled
[20:01:59] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@1c5477d]: query_clicks: timestamp is now a reserved keyword (duration: 02m 19s)
[20:02:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:56] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=yes; selector: name=mw1324.eqiad.wmnet
[20:06:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:13] <wikibugs>	 (PS1) 20after4: group1 wikis to 1.36.0-wmf.30 [mediawiki-config] - https://gerrit.wikimedia.org/r/663295
[20:09:15] <wikibugs>	 (CR) 20after4: [C: +2] group1 wikis to 1.36.0-wmf.30 [mediawiki-config] - https://gerrit.wikimedia.org/r/663295 (owner: 20after4)
[20:09:57] <wikibugs>	 (CR) Kosta Harlan: [WIP] linkrecommendation: Cron job to load datasets (3 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
[20:10:04] <wikibugs>	 (Merged) jenkins-bot: group1 wikis to 1.36.0-wmf.30 [mediawiki-config] - https://gerrit.wikimedia.org/r/663295 (owner: 20after4)
[20:12:27] <logmsgbot>	 !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.30
[20:12:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:13:30] <logmsgbot>	 !log twentyafterfour@deploy1001 Synchronized php: group1 wikis to 1.36.0-wmf.30 (duration: 01m 02s)
[20:13:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:36] <wikibugs>	 (PS19) CRusnov: dhcp: Introduce automation proxies for management networks [puppet] - https://gerrit.wikimedia.org/r/662025 (https://phabricator.wikimedia.org/T271583)
[20:19:38] <wikibugs>	 (CR) CRusnov: dhcp: Introduce automation proxies for management networks (3 comments) [puppet] - https://gerrit.wikimedia.org/r/662025 (https://phabricator.wikimedia.org/T271583) (owner: CRusnov)
[20:20:16] <wikibugs>	 (CR) jerkins-bot: [V: -1] dhcp: Introduce automation proxies for management networks [puppet] - https://gerrit.wikimedia.org/r/662025 (https://phabricator.wikimedia.org/T271583) (owner: CRusnov)
[20:21:33] <mutante>	 !log mw1370, mw1378 - again failing to reboot as the last step of reimaging script
[20:21:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:23:08] <mutante>	 !log mw1370, mw1378 - powercycling via DRAC
[20:23:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:26:55] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1370.eqiad.wmnet'] `  an...
[20:27:45] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1378.eqiad.wmnet'] `  an...
[20:31:15] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1370.eqiad.wmnet
[20:31:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:31:25] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1378.eqiad.wmnet
[20:31:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:32:13] <wikibugs>	 (PS15) Kosta Harlan: linkrecommendation: Cron job to load datasets [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893)
[20:35:01] <wikibugs>	 (CR) Kosta Harlan: "Tested with local-charts, it's working" (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
[20:36:04] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1378.eqiad.wmnet
[20:36:04] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1293.eqiad.wmnet'] `  an...
[20:36:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:36:26] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1370.eqiad.wmnet
[20:36:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:36:44] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1293.eqiad.wmnet
[20:36:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:50] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1293.eqiad.wmnet
[20:39:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:43:50] <wikibugs>	 SRE, Commons, Traffic, Patch-For-Review: Investigate unusual media traffic pattern for AsterNovi-belgii-flower-1mb.jpg on Commons - https://phabricator.wikimedia.org/T273741 (Joe) >>! In T273741#6816099, @Joe wrote: >>>! In T273741#6815874, @Majavah wrote: >> Is the effect that the block will hav...
[20:45:02] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[20:46:02] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[20:47:12] <wikibugs>	 (PS16) Kosta Harlan: linkrecommendation: Cron job to load datasets [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893)
[20:53:14] <wikibugs>	 SRE, netops, observability: Ingest Cron and Root Alerts Into Logstash - https://phabricator.wikimedia.org/T274377 (herron) Sorry, I should have clarified this initially, afaict a proxy won't work for this case because logstash configures this at the JVM level and would have unwanted effects on the ot...
[21:00:04] <jouncebot>	 chrisalbon and accraze: #bothumor My software never has bugs. It just develops random features. Rise for Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210210T2100).
[21:01:56] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1369.eqiad.wmnet with reason: REIMAGE
[21:02:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:55] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1377.eqiad.wmnet with reason: REIMAGE
[21:02:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:59] <wikibugs>	 (PS1) Ebernhardson: airflow: Increase scheduler health check to match interval [puppet] - https://gerrit.wikimedia.org/r/663304
[21:03:38] <wikibugs>	 (CR) Volans: "Some comments/questions inline." (3 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/663205 (https://phabricator.wikimedia.org/T274338) (owner: David Caro)
[21:04:06] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1369.eqiad.wmnet with reason: REIMAGE
[21:04:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:23] <wikibugs>	 (CR) Cwhite: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/662009 (https://phabricator.wikimedia.org/T217032) (owner: Cwhite)
[21:06:00] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1377.eqiad.wmnet with reason: REIMAGE
[21:06:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:28] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by legoktm on cumin1001.eq...
[21:10:17] <wikibugs>	 (PS11) Cwhite: profile: update netdev to output ECS-formatted logs [puppet] - https://gerrit.wikimedia.org/r/647029 (https://phabricator.wikimedia.org/T234565)
[21:12:45] <wikibugs>	 (PS12) Cwhite: profile: update netdev to output ECS-formatted logs [puppet] - https://gerrit.wikimedia.org/r/647029 (https://phabricator.wikimedia.org/T234565)
[21:14:21] <wikibugs>	 SRE, Traffic, Patch-For-Review: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132 (ssingh)
[21:15:24] <wikibugs>	 (CR) Cwhite: [C: +2] profile: update netdev to output ECS-formatted logs [puppet] - https://gerrit.wikimedia.org/r/647029 (https://phabricator.wikimedia.org/T234565) (owner: Cwhite)
[21:19:56] <wikibugs>	 (CR) Ebernhardson: "verified that changing this config var fixes the UI warning in our analytics-integration environment." [puppet] - https://gerrit.wikimedia.org/r/663304 (owner: Ebernhardson)
[21:21:08] <wikibugs>	 (PS1) Jgreen: remove A/PTR records for frdata1001.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/663307 (https://phabricator.wikimedia.org/T255435)
[21:22:52] <wikibugs>	 (CR) Jgreen: [C: +2] remove A/PTR records for frdata1001.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/663307 (https://phabricator.wikimedia.org/T255435) (owner: Jgreen)
[21:35:29] <logmsgbot>	 !log legoktm@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1325.eqiad.wmnet with reason: REIMAGE
[21:35:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:28] <logmsgbot>	 !log legoktm@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1326.eqiad.wmnet with reason: REIMAGE
[21:37:39] <logmsgbot>	 !log legoktm@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1325.eqiad.wmnet with reason: REIMAGE
[21:37:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:39:32] <logmsgbot>	 !log legoktm@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1327.eqiad.wmnet with reason: REIMAGE
[21:39:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:39:39] <logmsgbot>	 !log legoktm@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1326.eqiad.wmnet with reason: REIMAGE
[21:39:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:29] <logmsgbot>	 !log legoktm@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1328.eqiad.wmnet with reason: REIMAGE
[21:41:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:44] <logmsgbot>	 !log legoktm@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1327.eqiad.wmnet with reason: REIMAGE
[21:41:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:43:51] <logmsgbot>	 !log legoktm@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1328.eqiad.wmnet with reason: REIMAGE
[21:43:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:20] <apergos>	 twentyafterfour: did wmf.30 roll to group1 yet? I'm just trying to keep track
[21:52:18] <wikibugs>	 (CR) Dave Pifke: [C: +1] "Overall LGTM." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/663238 (https://phabricator.wikimedia.org/T272979) (owner: Filippo Giunchedi)
[21:52:45] <apergos>	 nm I see it is, I should have checked versions earlier, my bad
[21:55:36] <mutante>	 ryankemper: kibana is failing on relforge1003/1004
[21:56:29] <wikibugs>	 (PS1) Legoktm: Revert "profiler: Send data to excimer-buster pipeline" [mediawiki-config] - https://gerrit.wikimedia.org/r/663078
[21:56:45] <wikibugs>	 (PS1) Legoktm: Revert "arclamp: Add excimer-buster pipeline" [puppet] - https://gerrit.wikimedia.org/r/663079
[21:56:55] <wikibugs>	 (PS2) Legoktm: Revert "arclamp: Add excimer-buster pipeline" [puppet] - https://gerrit.wikimedia.org/r/663079
[21:56:57] <wikibugs>	 (PS2) Legoktm: Revert "profiler: Send data to excimer-buster pipeline" [mediawiki-config] - https://gerrit.wikimedia.org/r/663078
[21:57:07] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn https://phabricator.wikimedia.org/T262211#6817218 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:57:07] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on relforge1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn https://phabricator.wikimedia.org/T262211#6817218 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:57:25] <wikibugs>	 SRE, MediaWiki-Debug-Logger, Traffic, Platform Team Workboards (Clinic Duty Team), Wikimedia-production-error: LegacyHandler.php: PHP Warning: Host lookup failed [-10002]: Unknown error -10002 - https://phabricator.wikimedia.org/T231025 (thcipriani) Open→Resolved a:Joe Specific er...
[21:58:07] <wikibugs>	 (CR) Dave Pifke: [C: +1] Revert "profiler: Send data to excimer-buster pipeline" [mediawiki-config] - https://gerrit.wikimedia.org/r/663078 (owner: Legoktm)
[22:04:42] <wikibugs>	 (PS3) Cwhite: profile: update netdev rsyslog template to ecs 1.7.0 [puppet] - https://gerrit.wikimedia.org/r/647032 (https://phabricator.wikimedia.org/T234565)
[22:07:16] <mutante>	 !log mw1369, mw1377 - all servers in this section now consistently fail to reboot when triggered as the last step of wmf-reimage script
[22:07:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:35] <legoktm>	 mutante: is it a general problem or specific to those servers?
[22:09:59] <mutante>	 legoktm: I don't know, either it is this type of hardware or something broke about sending the reboot command
[22:10:08] <wikibugs>	 (CR) Dave Pifke: [C: +1] "This can be deployed after the other patch to stop sending data to it." [puppet] - https://gerrit.wikimedia.org/r/663079 (owner: Legoktm)
[22:10:16] <mutante>	 the facts I have.. it did not happen until yesterday
[22:10:21] <mutante>	 now it happens all the time ..to me
[22:10:45] <mutante>	 but this is a specific section in the etherpad
[22:10:50] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1369.eqiad.wmnet'] `  an...
[22:10:59] <mutante>	 generally if you just powercycle them they will be ok
[22:11:02] <legoktm>	 the 4 I did earlier had no issue, and I'm doing 4 right now, but they've all been in the same group (mw1321-1328)
[22:11:12] <mutante>	 except the special case among special cases which needed DRAC reset
[22:11:15] <mutante>	 to be able to do just that
[22:11:31] <mutante>	 ok, *nod*
[22:12:18] <mutante>	 if you manually powercycle before a full hour is over.. you can even get the reimage script to end with exit 0 and all good
[22:12:39] <mutante>	 but if you don't.. just gives up after 60 min
[22:12:46] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1377.eqiad.wmnet'] `  an...
[22:13:57] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1369.eqiad.wmnet
[22:14:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:14:07] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1377.eqiad.wmnet
[22:14:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:16:34] <wikibugs>	 (CR) Cwhite: [C: +2] profile: update netdev rsyslog template to ecs 1.7.0 [puppet] - https://gerrit.wikimedia.org/r/647032 (https://phabricator.wikimedia.org/T234565) (owner: Cwhite)
[22:23:42] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1369.eqiad.wmnet
[22:23:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:24:52] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1377.eqiad.wmnet
[22:24:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:27:35] <wikibugs>	 (CR) Dzahn: [V: +1] "https://puppet-compiler.wmflabs.org/compiler1001/27996/" [puppet] - https://gerrit.wikimedia.org/r/663289 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn)
[22:28:07] <wikibugs>	 (CR) Dzahn: [V: +1] "already compiled on * - so unless this is used by internal cloud VPS machines - it is noop" [puppet] - https://gerrit.wikimedia.org/r/663289 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn)
[22:29:11] <wikibugs>	 SRE, ops-eqiad, cloud-services-team (Hardware): cloudnet1004/cloudnet1003: network hiccups because broadcom driver/firmware problem - https://phabricator.wikimedia.org/T271058 (RobH)
[22:29:25] <wikibugs>	 SRE, ops-eqiad, cloud-services-team (Hardware): cloudnet1004/cloudnet1003: network hiccups because broadcom driver/firmware problem - https://phabricator.wikimedia.org/T271058 (RobH)
[22:29:39] <wikibugs>	 (CR) Dzahn: "This is just asking to add new and already existing VMs to scap so that when people deploy they also deploy to these." [puppet] - https://gerrit.wikimedia.org/r/650306 (owner: Dzahn)
[22:30:50] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@d97f7d9]: query_clicks: Remove result file merging
[22:30:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:31:07] * Krinkle testing on mwdebug1001
[22:32:17] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@d97f7d9]: query_clicks: Remove result file merging (duration: 01m 27s)
[22:32:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:33:13] <wikibugs>	 (CR) Dzahn: "This is switching doc.wm.org from doc1001 (stretch) to doc1002 (buster). It is already up and running, has the right puppet role, no error" [dns] - https://gerrit.wikimedia.org/r/650625 (https://phabricator.wikimedia.org/T247653) (owner: Dzahn)
[22:35:14] <wikibugs>	 SRE, serviceops, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1325.eqiad.wmnet', 'mw13...
[22:37:39] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=no; selector: name=mw1325.eqiad.wmnet
[22:37:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:46] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=no; selector: name=mw1326.eqiad.wmnet
[22:37:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:51] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet
[22:37:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:55] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=no; selector: name=mw1328.eqiad.wmnet
[22:37:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:01:59] <wikibugs>	 (PS1) Cwhite: Revert "profile: update netdev rsyslog template to ecs 1.7.0" [puppet] - https://gerrit.wikimedia.org/r/663080
[23:04:54] <wikibugs>	 (CR) Cwhite: [C: +2] Revert "profile: update netdev rsyslog template to ecs 1.7.0" [puppet] - https://gerrit.wikimedia.org/r/663080 (owner: Cwhite)
[23:20:21] <wikibugs>	 (PS1) BryanDavis: README: line wrapping for easier source reading [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/663322
[23:38:58] <logmsgbot>	 !log milimetric@deploy1001 Started deploy [analytics/refinery@3da19b6]: More fixes for jobs after cluster upgrade
[23:39:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:47:09] <wikibugs>	 SRE, Analytics, SRE-Access-Requests: Add kzeta to analytics-privatedata-users - https://phabricator.wikimedia.org/T272982 (kzimmerman) @Vgutierrez (I saw you were listed on [[ https://wikitech.wikimedia.org/wiki/SRE_Clinic_Duty | Clinic Duty ]]) - I ran into access problems again today; do you need a...
[23:49:35] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=yes; selector: name=mw1325.eqiad.wmnet
[23:49:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:49:40] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=yes; selector: name=mw1326.eqiad.wmnet
[23:49:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:49:45] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet
[23:49:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:49:51] <logmsgbot>	 !log legoktm@cumin1001 conftool action : set/pooled=yes; selector: name=mw1328.eqiad.wmnet
[23:49:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:51:07] <wikibugs>	 SRE: mw1379 - down after reboot attempt and DRAC can't powercycle - https://phabricator.wikimedia.org/T274403 (Papaul) I looked at 3 hosts wmf-auto-reimage  .out log, there were no indication of this issue then i looked at the IDRAC log  of 3 of the hosts that are having this issue (mw1377,mw1378 and mw1379)...
[23:53:21] <logmsgbot>	 !log milimetric@deploy1001 Finished deploy [analytics/refinery@3da19b6]: More fixes for jobs after cluster upgrade (duration: 14m 23s)
[23:53:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:33] <Krinkle>	 mutante: fyi, the most notable part of doc1001 is the /srv/doc which is stateful (not scap deployed)
[23:59:00] <Krinkle>	 this is not automatically synced from old to new server and between the two new servers, right?