[00:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190208T0000).
[00:00:04] <jouncebot>	 ebernhardson and AndyRussG: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:39] <AndyRussG>	 o/
[00:01:52] <ebernhardson>	 \o
[00:02:16] <ebernhardson>	 i can ship it, two config patches should be easy
[00:02:17] <wikibugs>	 (CR) Dzahn: "@bblack is this a bad idea? trying to eliminate that users have to change their local config when the numbers change and we do the same fo" [dns] - https://gerrit.wikimedia.org/r/489103 (owner: Dzahn)
[00:02:45] <AndyRussG>	 ebernhardson: ah cool thanks!
[00:02:47] <wikibugs>	 (PS6) EBernhardson: Give protect right to centralnoticeadmin on Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/483044 (https://phabricator.wikimedia.org/T209873) (owner: AndyRussG)
[00:03:44] <ebernhardson>	 AndyRussG: you said "Requires confirmation that this is acceptable policy." and accessing the related task gives me an access denied. I'll simply trust you got that confirmation?
[00:04:01] <AndyRussG>	 ebernhardson: yes it's on the task, in a comment
[00:04:10] <ebernhardson>	 excellent
[00:04:11] <AndyRussG>	 sorry you should be able to see the task
[00:04:24] <wikibugs>	 (CR) EBernhardson: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/483044 (https://phabricator.wikimedia.org/T209873) (owner: AndyRussG)
[00:06:20] <wikibugs>	 (Merged) jenkins-bot: Give protect right to centralnoticeadmin on Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/483044 (https://phabricator.wikimedia.org/T209873) (owner: AndyRussG)
[00:06:55] <ebernhardson>	 AndyRussG: pulled to mwdebug1001
[00:07:13] <AndyRussG>	 ebernhardson: k one sec
[00:07:23] <AndyRussG>	 I sillily uninstalled the mw debug browser add on
[00:07:28] <AndyRussG>	 gotta put that back
[00:07:35] <AndyRussG>	 btw you should be able to see the task now
[00:07:48] <ebernhardson>	 i have too many plugins installed...should drop a few. i guess not the mwdebug one :)
[00:07:52] <wikibugs>	 (CR) Dzahn: [C: +1] "literally "+1 lgtm but somebody else must approve". I filled out the access request section in the Etherpad for the Monday SRE meeting and" [puppet] - https://gerrit.wikimedia.org/r/487040 (https://phabricator.wikimedia.org/T214922) (owner: Mathew.onipe)
[00:08:19] <wikibugs>	 (CR) jenkins-bot: Give protect right to centralnoticeadmin on Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/483044 (https://phabricator.wikimedia.org/T209873) (owner: AndyRussG)
[00:08:24] <wikibugs>	 (PS2) EBernhardson: Turn off wbsearchentities ab test in de, fr, es [mediawiki-config] - https://gerrit.wikimedia.org/r/488588 (https://phabricator.wikimedia.org/T214515)
[00:08:46] <AndyRussG>	 (I was seeing a bunch of network activity that I didn't know the source of, so I wanted to see if that was a cause)
[00:09:02] <AndyRussG>	 (Just because you're paranoid doesn't mean they're not after you)
[00:10:10] <AndyRussG>	 ebernhardson: lgtm!
[00:13:04] <wikibugs>	 (CR) EBernhardson: [C: +2] Turn off wbsearchentities ab test in de, fr, es [mediawiki-config] - https://gerrit.wikimedia.org/r/488588 (https://phabricator.wikimedia.org/T214515) (owner: EBernhardson)
[00:13:13] <wikibugs>	 (PS1) Dzahn: remove bast3003, bast3002 has been repaired [dns] - https://gerrit.wikimedia.org/r/489104 (https://phabricator.wikimedia.org/T184936)
[00:13:59] <wikibugs>	 (PS2) Dzahn: remove bast3003, bast3002 has been repaired (?) [dns] - https://gerrit.wikimedia.org/r/489104 (https://phabricator.wikimedia.org/T184936)
[00:14:08] <wikibugs>	 (Merged) jenkins-bot: Turn off wbsearchentities ab test in de, fr, es [mediawiki-config] - https://gerrit.wikimedia.org/r/488588 (https://phabricator.wikimedia.org/T214515) (owner: EBernhardson)
[00:14:43] <wikibugs>	 (CR) Dzahn: [C: -1] remove bast3003, bast3002 has been repaired (?) [dns] - https://gerrit.wikimedia.org/r/489104 (https://phabricator.wikimedia.org/T184936) (owner: Dzahn)
[00:14:49] <ebernhardson>	 AndyRussG: it's almost sync'd ... i think there is still one mw instance that's timing out (it was doing that during the earlier swat today too)
[00:14:56] <wikibugs>	 (CR) Dzahn: [C: -2] remove bast3003, bast3002 has been repaired (?) [dns] - https://gerrit.wikimedia.org/r/489104 (https://phabricator.wikimedia.org/T184936) (owner: Dzahn)
[00:15:17] <AndyRussG>	 hmmm okok
[00:15:27] <logmsgbot>	 !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT gerrit:483044 T209873 Give protect right to centralnoticeadmin on Meta (duration: 02m 56s)
[00:15:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:16:28] <ebernhardson>	 there it goes
[00:16:47] <ebernhardson>	 !log scap sync timed out on mw1299.eqiad.wmnet
[00:16:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:17:43] <Reedy>	 ebernhardson: Host is down, I filed a task (it's been rebooted once already today)
[00:18:14] <wikibugs>	 (PS2) Dzahn: CNAMEs for bastions in each DC for user convenience [dns] - https://gerrit.wikimedia.org/r/489103
[00:18:18] <ebernhardson>	 Reedy: gotcha
[00:18:18] <wikibugs>	 (CR) BryanDavis: "> Unfortunately, we need PHP7.2 and the gridengine only has 5.5 I" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/488764 (https://phabricator.wikimedia.org/T213669) (owner: Samwilson)
[00:18:27] <wikibugs>	 (CR) jerkins-bot: [V: -1] CNAMEs for bastions in each DC for user convenience [dns] - https://gerrit.wikimedia.org/r/489103 (owner: Dzahn)
[00:20:42] <wikibugs>	 (CR) jenkins-bot: Turn off wbsearchentities ab test in de, fr, es [mediawiki-config] - https://gerrit.wikimedia.org/r/488588 (https://phabricator.wikimedia.org/T214515) (owner: EBernhardson)
[00:21:14] <AndyRussG>	 ebernhardson: thanks much!! :)
[00:21:33] <AndyRussG>	 looks good now generally (i.e. not just on the debug host)
[00:21:49] <wikibugs>	 (PS3) Dzahn: contint: add data types to parameters [puppet] - https://gerrit.wikimedia.org/r/485096
[00:22:02] <logmsgbot>	 !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT gerrit:488588 phab:T214515 Turn off wikidata wbsearchentities ab test in de, fr, es (duration: 02m 55s)
[00:22:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:22:06] <stashbot>	 T214515: Run wikidata entitiy autocomplete AB test in de, fr, es - https://phabricator.wikimedia.org/T214515
[00:23:24] <ebernhardson>	 SWAT complete
[00:24:29] <AndyRussG>	 yeeee :)
[00:26:31] <wikibugs>	 (CR) Dzahn: [C: +2] "noop in prod: https://puppet-compiler.wmflabs.org/compiler1002/14578/" [puppet] - https://gerrit.wikimedia.org/r/485096 (owner: Dzahn)
[00:29:49] <wikibugs>	 (CR) Dzahn: [C: -1] "talked at allhands and to Mark about it. abandoning / recycling in favor of making notes_url a mandatory parameter for newly added icinga " [puppet] - https://gerrit.wikimedia.org/r/459659 (https://phabricator.wikimedia.org/T197873) (owner: Dzahn)
[00:37:13] <wikibugs>	 (CR) Dzahn: [C: +1] "ok, ready to merge this but what's a good test we should do to make sure no phab mail features are affected" [puppet] - https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: Paladox)
[00:43:16] <wikibugs>	 (PS2) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012
[00:43:39] <wikibugs>	 (PS1) Ayounsi: Icinga: add ping check for ulsfo PDUs [puppet] - https://gerrit.wikimedia.org/r/489113 (https://phabricator.wikimedia.org/T209101)
[00:43:49] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (owner: Paladox)
[00:44:46] <wikibugs>	 (PS3) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012
[00:45:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (owner: Paladox)
[00:47:15] <wikibugs>	 Operations, Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (thcipriani) >>! In T207707#4909130, @Dzahn wrote: > Let's ask dcops instead and request a new disk to be ad...
[00:48:06] <mutante>	 jouncebot: next
[00:48:06] <jouncebot>	 In 81 hour(s) and 41 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190211T1030)
[00:48:26] <wikibugs>	 (PS4) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012
[00:49:06] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (owner: Paladox)
[00:49:23] <wikibugs>	 (PS5) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012
[00:50:02] <wikibugs>	 (CR) Ayounsi: "https://puppet-compiler.wmflabs.org/compiler1002/14579/icinga1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/489113 (https://phabricator.wikimedia.org/T209101) (owner: Ayounsi)
[00:50:08] <wikibugs>	 (PS6) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012
[00:50:42] <wikibugs>	 (PS7) Dzahn: jenkins: add data types to parameters [puppet] - https://gerrit.wikimedia.org/r/485094
[00:50:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (owner: Paladox)
[00:51:46] <wikibugs>	 (CR) Dzahn: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/485094 (owner: Dzahn)
[00:54:47] * bd808 sees he will have all weekend to break and fix jouncebot
[00:54:51] <wikibugs>	 (PS10) Dzahn: phabricator: Add new cluster.mailers [puppet] - https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: Paladox)
[00:55:51] <wikibugs>	 (PS7) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012
[00:56:23] <wikibugs>	 (CR) Dzahn: [C: +2] phabricator: Add new cluster.mailers [puppet] - https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: Paladox)
[00:56:33] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (owner: Paladox)
[00:57:37] <wikibugs>	 (PS8) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012
[00:57:43] <mutante>	 paladox: twentyafterfour: config change deployed (ack, it should not affect anything but let's test)
[00:57:50] * paladox tests
[00:57:52] <mutante>	 eh wait.. that was 2001
[00:58:28] <mutante>	 ok, now for real on 1001
[00:58:31] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (owner: Paladox)
[00:58:42] <paladox>	 ok
[00:59:07] <paladox>	 mutante comment on a task I'm subscribed to
[00:59:14] <paladox>	 comment can be removed after testing
[01:01:17] <mutante>	 https://phabricator.wikimedia.org/T200739#4937042
[01:01:30] <paladox>	 mutante mail works!
[01:01:42] <mutante>	 paladox: ok:) thx
[01:01:47] <paladox>	 you're welcome :)
[01:02:00] <mutante>	 i was a bad tester, i actually have web-based notifications
[01:02:16] <paladox>	 heh
[01:04:03] <mutante>	 paladox: i like that there is a hash tag for 2.16 now, thx https://gerrit.wikimedia.org/r/q/hashtag:%22gerrit-2.16%22+(status:open%20OR%20status:merged)
[01:04:11] <paladox>	 yup :)
[01:07:59] <mutante>	 !log powercycle crashed mw1299 via mgmt (garbled console output) (T215569)
[01:08:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:08:04] <stashbot>	 T215569: mw1299 is down - https://phabricator.wikimedia.org/T215569
[01:09:52] <icinga-wm>	 RECOVERY - Host mw1299 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[01:10:52] <mutante>	 !log mw1299 has been down about 8 hours, does it need deployment.. depooling 
[01:10:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:11:10] <mutante>	 Reedy: ^ 
[01:11:29] <mutante>	 so if in the last 8 hours there was deployment, we need to deploy to just mw1299
[01:11:51] <Reedy>	 mutante: It's died a couple of times today it seems
[01:11:51] <mutante>	 or just wait until next regular deployment and then remember to repool
[01:12:01] <Reedy>	 So I think it might want a bit more debugging
[01:12:06] <mutante>	 [mw1299:~] $ depool
[01:12:06] <mutante>	 Depooling all services on mw1299.eqiad.wmnet
[01:12:09] <mutante>	 ^ this does not log
[01:12:22] <mutante>	 Reedy: ok
[01:12:42] <Reedy>	 I did file a task if you want to comment that it's depooled
[01:13:06] <wikibugs>	 Operations, ops-eqiad: mw1299 is down - https://phabricator.wikimedia.org/T215569 (Dzahn) ` 20:12 < mutante> [mw1299:~] $ depool 20:12 < mutante> Depooling all services on mw1299.eqiad.wmnet  `
[01:13:31] <mutante>	 done 
[01:14:24] <Reedy>	 ta
[01:15:26] <icinga-wm>	 PROBLEM - Host mw1280 is DOWN: PING CRITICAL - Packet loss = 100%
[01:15:36] <wikibugs>	 Operations, ops-eqiad: mw1299 is down - https://phabricator.wikimedia.org/T215569 (Dzahn) it's back up and running right now but depooled because this isn't the first time it happened on this machine
[01:16:20] <icinga-wm>	 RECOVERY - Host mw1280 is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms
[01:16:22] <mutante>	 Reedy: "remove from dsh" still a thing nowadays ... 
[01:16:31] <mutante>	 should i look for it
[01:16:34] <Reedy>	 Is it? I've no idea :D
[01:16:46] <Reedy>	 does scap foo still try and sync to it if it's depooled? :)
[01:16:52] <mutante>	 hieradata/common/scap/dsh.yaml:      - mw1299.eqiad.wmnet
[01:16:53] <mutante>	 yes
[01:17:04] <mutante>	 afair
[01:17:26] <mutante>	 but it would be for deployers not having to skip that host.. only
[01:18:07] <wikibugs>	 (CR) RobH: [C: +1] Icinga: add ping check for ulsfo PDUs [puppet] - https://gerrit.wikimedia.org/r/489113 (https://phabricator.wikimedia.org/T209101) (owner: Ayounsi)
[01:18:16] <mutante>	 this is a jobrunner-canary
[01:18:32] <mutante>	 and it's still called scap::dsh::groups
[01:19:35] <mutante>	 i think you are not affected since it's not in "mediawiki-installation"
[01:19:43] <mutante>	 this case wouldn't need it then
[01:19:52] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2084 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.72 seconds
[01:20:10] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.45 seconds
[01:20:12] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2090 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 312.45 seconds
[01:20:17] <wikibugs>	 (PS9) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458)
[01:20:34] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2073 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 325.15 seconds
[01:20:36] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 325.87 seconds
[01:20:38] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 326.73 seconds
[01:21:13] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: Paladox)
[01:23:16] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2091 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 414.32 seconds
[01:24:17] <wikibugs>	 (PS10) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458)
[01:24:23] <wikibugs>	 (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: Paladox)
[01:25:06] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: Paladox)
[01:25:42] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: Paladox)
[01:26:41] <wikibugs>	 (PS11) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458)
[01:26:47] <wikibugs>	 (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: Paladox)
[01:27:13] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: Paladox)
[01:27:49] <wikibugs>	 (PS12) Paladox: [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458)
[01:27:51] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: Paladox)
[01:28:29] <wikibugs>	 (CR) jerkins-bot: [V: -1] [wip] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: Paladox)
[01:28:44] <mutante>	 checked on dbtree: only one server in s4 is really affected (db2051) and the others are all back to ok, and this has happened sometimes in the past
[01:29:41] <wikibugs>	 (CR) Paladox: "@Hashar would you be able to review this please?" [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: Paladox)
[01:30:06] <wikibugs>	 (PS13) Paladox: zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458)
[01:30:33] <mutante>	 (during backups)
[01:30:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] zuul: Convert to using scap [puppet] - https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: Paladox)
[01:31:06] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2091 is OK: OK slave_sql_lag Replication lag: 0.48 seconds
[01:31:08] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2058 is OK: OK slave_sql_lag Replication lag: 0.52 seconds
[01:31:08] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2090 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[01:31:34] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2073 is OK: OK slave_sql_lag Replication lag: 0.48 seconds
[01:31:36] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2051 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[01:31:36] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2065 is OK: OK slave_sql_lag Replication lag: 0.37 seconds
[01:31:40] <wikibugs>	 Operations, ops-eqiad: mw1299 is down (jobrunner-canary, now up but depooled) - https://phabricator.wikimedia.org/T215569 (Dzahn)
[01:31:54] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2084 is OK: OK slave_sql_lag Replication lag: 0.36 seconds
[01:32:11] <mutante>	 and icinga caught up because it just checks every 5 min 
[01:37:47] <logmsgbot>	 !log dzahn@puppetmaster1001 conftool action : set/pooled=no; selector: name=mw1299.eqiad.wmnet
[01:37:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:39:43] <mutante>	 Reedy: i also did it the other way using confctl to make sure ^ and that changed the actual state as opposed to running "depool" locally
[01:40:21] <mutante>	 and the old modules/scap/files/dsh/ is _almost_ gone but it's not a thing anymore 
[01:42:01] <wikibugs>	 Operations, ops-eqiad: mw1299 is down (jobrunner-canary, now up but depooled) - https://phabricator.wikimedia.org/T215569 (Dzahn) ` [puppetmaster1001:~] $ sudo -i confctl depool --hostname mw1299.eqiad.wmnet  eqiad/jobrunner/apache2/mw1299.eqiad.wmnet: pooled changed yes => no eqiad/jobrunner/nginx/mw129...
[01:47:05] <wikibugs>	 Operations, Mail, Phabricator, serviceops, and 2 others: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (Dzahn) deployed in production and we tested mail still works. this just adds the new config and does not remove the old config though
[01:50:28] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2091 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 336.37 seconds
[01:50:31] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 336.43 seconds
[01:50:31] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2090 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 336.91 seconds
[01:51:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2084 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 350.97 seconds
[01:51:45] <icinga-wm>	 PROBLEM - Host mw1299 is DOWN: PING CRITICAL - Packet loss = 100%
[01:55:01] <wikibugs>	 Operations, Parsoid, Patch-For-Review: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (Dzahn) p:High→Normal lowering priority since Subbu is unblocked and can use the new box and we have switched varnish over.  the remaining part is just...
[01:57:51] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 515.61 seconds
[01:57:51] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 515.65 seconds
[01:58:09] <wikibugs>	 (PS1) Paladox: phabricator: Remove old mail config [puppet] - https://gerrit.wikimedia.org/r/489121 (https://phabricator.wikimedia.org/T212989)
[02:00:02] <wikibugs>	 (PS2) Paladox: phabricator: Remove old mail config [puppet] - https://gerrit.wikimedia.org/r/489121 (https://phabricator.wikimedia.org/T212989)
[02:01:03] <wikibugs>	 (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/489121 (https://phabricator.wikimedia.org/T212989) (owner: Paladox)
[02:06:29] <wikibugs>	 Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (Dzahn) Hi @faidon,  let me explain. It was never a request for running 2 phabricator hosts in each datacenter. That's a misunderstanding.   It's just that we want to reinstall pha...
[02:08:15] <wikibugs>	 Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (Dzahn) a:Dzahn→faidon
[02:08:37] <wikibugs>	 Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (Dzahn) P.S. running it in codfw is blocked on unrelated things (lack of dbproxy) and the host currently called phab1002 with 32GB would immediately go back to pool
[02:08:37] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2073 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 790.37 seconds
[02:35:55] <wikibugs>	 (PS1) Krinkle: tests: Assert that no computed lists are used in wmf-config [mediawiki-config] - https://gerrit.wikimedia.org/r/489126
[02:36:24] <wikibugs>	 (PS1) Bstorm: toolforge: bastion cgroup limits need a big boost in resources [puppet] - https://gerrit.wikimedia.org/r/489127 (https://phabricator.wikimedia.org/T215434)
[02:38:23] <wikibugs>	 (CR) Bstorm: "And so begins the great journey of giving enough resources...but not too much!  :)" [puppet] - https://gerrit.wikimedia.org/r/489127 (https://phabricator.wikimedia.org/T215434) (owner: Bstorm)
[02:38:43] <wikibugs>	 (CR) Bstorm: [C: +2] toolforge: bastion cgroup limits need a big boost in resources [puppet] - https://gerrit.wikimedia.org/r/489127 (https://phabricator.wikimedia.org/T215434) (owner: Bstorm)
[02:39:09] <wikibugs>	 (CR) Jforrester: [C: +1] tests: Assert that no computed lists are used in wmf-config (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/489126 (owner: Krinkle)
[02:46:02] <wikibugs>	 (CR) Krinkle: [C: +2] tests: Assert that no computed lists are used in wmf-config [mediawiki-config] - https://gerrit.wikimedia.org/r/489126 (owner: Krinkle)
[02:47:14] <wikibugs>	 (Merged) jenkins-bot: tests: Assert that no computed lists are used in wmf-config [mediawiki-config] - https://gerrit.wikimedia.org/r/489126 (owner: Krinkle)
[02:50:44] <wikibugs>	 (CR) jenkins-bot: tests: Assert that no computed lists are used in wmf-config [mediawiki-config] - https://gerrit.wikimedia.org/r/489126 (owner: Krinkle)
[04:15:45] <wikibugs>	 (CR) Samwilson: "> The new Debian Stretch job grid is PHP 7.2." [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/488764 (https://phabricator.wikimedia.org/T213669) (owner: Samwilson)
[04:22:44] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on db2053 is CRITICAL: cluster=mysql device=cciss,3 instance=db2053:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2053&var-datasource=codfw+prometheus/ops
[04:46:27] <wikibugs>	 (PS1) Andrew Bogott: openstack: add some ipv6 firewall rules [puppet] - https://gerrit.wikimedia.org/r/489133
[04:47:20] <wikibugs>	 (PS1) BryanDavis: Move most code into jouncebot package [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489134
[04:47:22] <wikibugs>	 (PS1) BryanDavis: Toolforge kubernetes support [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489135
[04:47:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: add some ipv6 firewall rules [puppet] - https://gerrit.wikimedia.org/r/489133 (owner: Andrew Bogott)
[04:48:56] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2073 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[04:48:58] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2065 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[04:48:58] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2051 is OK: OK slave_sql_lag Replication lag: 0.44 seconds
[04:49:08] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2091 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[04:49:38] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2090 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[04:49:44] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2058 is OK: OK slave_sql_lag Replication lag: 0.39 seconds
[04:49:55] <wikibugs>	 (CR) BryanDavis: "> Is it okay to stay with the (new) job grid?" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/488764 (https://phabricator.wikimedia.org/T213669) (owner: Samwilson)
[04:49:57] <wikibugs>	 (PS2) Andrew Bogott: openstack: add some ipv6 firewall rules [puppet] - https://gerrit.wikimedia.org/r/489133
[04:49:58] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2084 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[04:50:16] <wikibugs>	 (CR) BryanDavis: [C: -1] Add all fonts used in production MediaWiki [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/488764 (https://phabricator.wikimedia.org/T213669) (owner: Samwilson)
[04:52:51] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack: add some ipv6 firewall rules [puppet] - https://gerrit.wikimedia.org/r/489133 (owner: Andrew Bogott)
[04:53:09] <wikibugs>	 (CR) BryanDavis: [C: +2] Move most code into jouncebot package [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489134 (owner: BryanDavis)
[04:53:41] <wikibugs>	 (Merged) jenkins-bot: Move most code into jouncebot package [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489134 (owner: BryanDavis)
[04:53:48] <wikibugs>	 (CR) BryanDavis: [C: +2] Toolforge kubernetes support [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489135 (owner: BryanDavis)
[04:54:17] <wikibugs>	 (Merged) jenkins-bot: Toolforge kubernetes support [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489135 (owner: BryanDavis)
[05:10:37] <wikibugs>	 (Abandoned) Samwilson: Add all fonts used in production MediaWiki [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/488764 (https://phabricator.wikimedia.org/T213669) (owner: Samwilson)
[05:13:56] <wikibugs>	 (PS1) BryanDavis: Update runner script and default config [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489138
[05:14:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] Update runner script and default config [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489138 (owner: BryanDavis)
[05:18:55] <wikibugs>	 (PS2) BryanDavis: Update runner script and default config [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489138
[05:19:14] <wikibugs>	 (CR) jerkins-bot: [V: -1] Update runner script and default config [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489138 (owner: BryanDavis)
[05:24:10] <icinga-wm>	 PROBLEM - puppet last run on labtestcontrol2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:24:22] <wikibugs>	 (PS3) BryanDavis: Update runner script and default config [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489138
[05:27:52] <wikibugs>	 (CR) BryanDavis: [C: +2] Update runner script and default config [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489138 (owner: BryanDavis)
[05:28:13] <wikibugs>	 (Merged) jenkins-bot: Update runner script and default config [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/489138 (owner: BryanDavis)
[05:28:26] <bd808>	 jouncebot: next
[05:28:26] <jouncebot>	 In 77 hour(s) and 1 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190211T1030)
[05:28:51] * bd808 is about to migrate jouncebot to the Toolforge kubernetes cluster
[05:47:34] <onimisionipe>	 :)
[06:07:28] <marostegui>	 !log Drop staging.mep_word_persistence from dbstore1002 T215450 T213706
[06:07:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:07:34] <stashbot>	 T215450: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450
[06:07:35] <stashbot>	 T213706: Convert Aria/Tokudb tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706
[06:11:09] <wikibugs>	 Operations, DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[06:11:52] <icinga-wm>	 ACKNOWLEDGEMENT - Device not healthy -SMART- on db2053 is CRITICAL: cluster=mysql device=cciss,3 instance=db2053:9100 job=node site=codfw Marostegui T208323 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2053&var-datasource=codfw+prometheus/ops
[06:12:57] <wikibugs>	 (03PS1) 10BryanDavis: Fix py3: s/iteritems/items/ [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/489143
[06:13:36] <wikibugs>	 (03CR) 10BryanDavis: [C: 03+2] Fix py3: s/iteritems/items/ [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/489143 (owner: 10BryanDavis)
[06:13:40] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489144 (https://phabricator.wikimedia.org/T210713)
[06:13:57] <wikibugs>	 (03Merged) 10jenkins-bot: Fix py3: s/iteritems/items/ [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/489143 (owner: 10BryanDavis)
[06:16:42] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489144 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:17:45] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489144 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:18:53] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489144 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:21:14] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1098:3317 (duration: 02m 58s)
[06:21:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:45] <marostegui>	 !log Deploy schema change on db1098:3317
[06:21:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:31] <wikibugs>	 (03PS1) 10BryanDavis: Run `2to3 -w` on codebase [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/489145
[06:27:33] <wikibugs>	 (03PS1) 10BryanDavis: Update .gitignore [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/489146
[06:27:59] <wikibugs>	 (03PS1) 10Marostegui: dbstore.my.cnf: Enable automatic slaves start [puppet] - 10https://gerrit.wikimedia.org/r/489147 (https://phabricator.wikimedia.org/T213670)
[06:28:38] <wikibugs>	 (03CR) 10BryanDavis: [C: 03+2] Run `2to3 -w` on codebase [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/489145 (owner: 10BryanDavis)
[06:28:39] <wikibugs>	 10Operations, 10ops-eqiad: mw1299 is down (jobrunner-canary, now up but depooled) - https://phabricator.wikimedia.org/T215569 (10Marostegui) ` /admin1/system1/logs1/log1-> show record27   properties   CreationTimestamp = 20190208014959.000000-360   ElementName = System Event Log Entry   RecordData = CPU 1 mach...
[06:28:43] <wikibugs>	 (03CR) 10BryanDavis: [C: 03+2] Update .gitignore [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/489146 (owner: 10BryanDavis)
[06:28:59] <wikibugs>	 (03Merged) 10jenkins-bot: Run `2to3 -w` on codebase [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/489145 (owner: 10BryanDavis)
[06:29:02] <marostegui>	 !log powercycle mw1299 - T215569
[06:29:03] <wikibugs>	 (03Merged) 10jenkins-bot: Update .gitignore [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/489146 (owner: 10BryanDavis)
[06:29:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:29:05] <stashbot>	 T215569: mw1299 is down (jobrunner-canary, now up but depooled) - https://phabricator.wikimedia.org/T215569
[06:30:08] <bd808>	 jouncebot: next
[06:30:08] <jouncebot>	 In 75 hour(s) and 59 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190211T1030)
[06:30:13] <bd808>	 jouncebot: now
[06:30:13] <jouncebot>	 No deployments scheduled for the next 75 hour(s) and 59 minute(s)
[06:30:19] <bd808>	 jouncebot: refresh
[06:30:21] <jouncebot>	 I refreshed my knowledge about deployments.
[06:30:43] <bd808>	 👍
[06:31:37] <Reedy>	 marostegui: it's fscked then? :P
[06:31:46] <marostegui>	 yeah, looks like the CPU
[06:31:53] <marostegui>	 I have restarted it and am going to clean up logs
[06:31:58] <marostegui>	 to make sure we start "fresh"
[06:32:02] <marostegui>	 it had like 130 logs XD
[06:32:11] <marostegui>	 some of them are quite old, so could be confusing
[06:32:31] <Reedy>	 heh
[06:32:37] <icinga-wm>	 RECOVERY - Host mw1299 is UP: PING WARNING - Packet loss = 44%, RTA = 0.26 ms
[06:32:55] <icinga-wm>	 PROBLEM - puppet last run on cp1090 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ssl/dhparam.pem]
[06:36:06] <Reedy>	 marostegui: Amazingly, it's still in warranty too, for like 2 months
[06:39:05] <marostegui>	 oh nice, let's comment on the task so we can rush before it expires
[06:40:52] <Reedy>	 2019-04-14 if racktables is correct
[06:41:34] <marostegui>	 yeah, looks so on netbox too
[06:43:29] <icinga-wm>	 RECOVERY - puppet last run on cp1090 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[06:49:20] <wikibugs>	 10Operations, 10ops-eqiad: mw1299 is down (jobrunner-canary, now up but depooled) - https://phabricator.wikimedia.org/T215569 (10Marostegui) a:03RobH This host is under warranty until April 14, 2019 so we might want to try to debug this before it expires in case we need some replacement CPU or mainboard.
[06:51:14] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489148
[06:52:19] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489148 (owner: 10Marostegui)
[06:53:21] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489148 (owner: 10Marostegui)
[06:53:33] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489148 (owner: 10Marostegui)
[06:54:21] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1098:3317 (duration: 00m 49s)
[06:54:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:35] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489149 (https://phabricator.wikimedia.org/T210713)
[06:54:42] <marostegui>	 !log Take a mysqldump from staging on dbstore1003 from dbstore1002 - T210478
[06:54:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:44] <stashbot>	 T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478
[06:56:01] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489149 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:57:02] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489149 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[06:58:02] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 46s)
[06:58:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:07] <marostegui>	 !log Deploy schema change on db1094 T210713
[06:58:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:10] <stashbot>	 T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[07:00:17] <icinga-wm>	 PROBLEM - HHVM rendering on mw2289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:01:25] <icinga-wm>	 RECOVERY - HHVM rendering on mw2289 is OK: HTTP OK: HTTP/1.1 200 OK - 80716 bytes in 0.308 second response time
[07:02:41] <icinga-wm>	 PROBLEM - HHVM rendering on mw2177 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:02:41] <icinga-wm>	 PROBLEM - HHVM rendering on mw2251 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:03:49] <icinga-wm>	 RECOVERY - HHVM rendering on mw2251 is OK: HTTP OK: HTTP/1.1 200 OK - 80716 bytes in 0.289 second response time
[07:03:49] <icinga-wm>	 RECOVERY - HHVM rendering on mw2177 is OK: HTTP OK: HTTP/1.1 200 OK - 80716 bytes in 0.296 second response time
[07:04:40] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489149 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[07:09:11] <wikibugs>	 (03PS1) 10Vgutierrez: Rename certcentral to acme-chief [software/certcentral] - 10https://gerrit.wikimedia.org/r/489150 (https://phabricator.wikimedia.org/T207389)
[07:09:43] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10serviceops, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 2 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10Joe) >>! In T212129#4932035, @EvanProdromou wrote: >...
[07:10:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Rename certcentral to acme-chief [software/certcentral] - 10https://gerrit.wikimedia.org/r/489150 (https://phabricator.wikimedia.org/T207389) (owner: 10Vgutierrez)
[07:12:43] <marostegui>	 !log Upgrade mysql and kernel on db1094
[07:12:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:55] <icinga-wm>	 PROBLEM - Host mw1299 is DOWN: PING CRITICAL - Packet loss = 100%
[07:13:06] <marostegui>	 heh, it didn't last long
[07:17:27] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10serviceops, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 2 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10Joe) >>! In T212129#4935629, @EvanProdromou wrote: >...
[07:19:53] <wikibugs>	 10Operations, 10ops-eqiad: mw1299 is down (jobrunner-canary, now up but depooled) - https://phabricator.wikimedia.org/T215569 (10Marostegui) And it crashed again with the same error: ` /admin1/system1/logs1/log1-> show record13   properties   CreationTimestamp = 20190208071154.000000-360   ElementName = System...
[07:20:17] <wikibugs>	 10Operations, 10Discovery, 10Discovery-Search, 10Elasticsearch: Merge http and https elasticsearch icinga checks into one - https://phabricator.wikimedia.org/T215587 (10Mathew.onipe)
[07:20:29] <wikibugs>	 10Operations, 10Discovery, 10Discovery-Search, 10Elasticsearch: Merge http and https elasticsearch icinga checks into one - https://phabricator.wikimedia.org/T215587 (10Mathew.onipe) p:05Triage→03Normal a:03Mathew.onipe
[07:21:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 883.27 seconds
[07:21:51] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489151
[07:21:55] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 606.95 seconds
[07:22:47] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 877.48 seconds
[07:22:53] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489151 (owner: 10Marostegui)
[07:22:57] <wikibugs>	 (03PS2) 10Vgutierrez: Rename certcentral to acme-chief [software/certcentral] - 10https://gerrit.wikimedia.org/r/489150 (https://phabricator.wikimedia.org/T207389)
[07:23:53] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489151 (owner: 10Marostegui)
[07:24:31] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 200.82 seconds
[07:24:32] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Rename certcentral to acme-chief [software/certcentral] - 10https://gerrit.wikimedia.org/r/489150 (https://phabricator.wikimedia.org/T207389) (owner: 10Vgutierrez)
[07:26:53] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489151 (owner: 10Marostegui)
[07:27:00] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 (duration: 02m 56s)
[07:27:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:53] <logmsgbot>	 !log marostegui@puppetmaster1001 conftool action : set/pooled=no; selector: name=mw1299.eqiad.wmnet
[07:27:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:51] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 61.16 seconds
[07:29:51] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Give more traffic to db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489153
[07:32:20] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] dbstore.my.cnf: Enable automatic slaves start [puppet] - 10https://gerrit.wikimedia.org/r/489147 (https://phabricator.wikimedia.org/T213670) (owner: 10Marostegui)
[07:32:52] <wikibugs>	 (03PS1) 10Mathew.onipe: icinga: enable check for psi and omega clusters [puppet] - 10https://gerrit.wikimedia.org/r/489154 (https://phabricator.wikimedia.org/T212850)
[07:33:07] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 197.70 seconds
[07:36:03] <wikibugs>	 (03PS2) 10Mathew.onipe: icinga: enable check for psi and omega clusters [puppet] - 10https://gerrit.wikimedia.org/r/489154 (https://phabricator.wikimedia.org/T212850)
[07:36:53] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10serviceops, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 3 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10jijiki)
[07:37:40] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Give more traffic to db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489153 (owner: 10Marostegui)
[07:38:43] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Give more traffic to db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489153 (owner: 10Marostegui)
[07:38:56] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Give more traffic to db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489153 (owner: 10Marostegui)
[07:39:08] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] dbstore.my.cnf: Enable automatic slaves start [puppet] - 10https://gerrit.wikimedia.org/r/489147 (https://phabricator.wikimedia.org/T213670) (owner: 10Marostegui)
[07:41:55] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1094 (duration: 02m 55s)
[07:41:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:32] <logmsgbot>	 !log jmm@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=mw1299.eqiad.wmnet
[07:45:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:35] <wikibugs>	 (03PS3) 10Vgutierrez: Rename certcentral to acme-chief [software/certcentral] - 10https://gerrit.wikimedia.org/r/489150 (https://phabricator.wikimedia.org/T207389)
[07:51:24] <wikibugs>	 (03CR) 10Vgutierrez: "This change is ready for review." [software/certcentral] - 10https://gerrit.wikimedia.org/r/489150 (https://phabricator.wikimedia.org/T207389) (owner: 10Vgutierrez)
[07:52:07] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489155
[07:53:44] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489155 (owner: 10Marostegui)
[07:54:46] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489155 (owner: 10Marostegui)
[07:55:51] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Full repool db1094 (duration: 00m 47s)
[07:55:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:51] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489156 (https://phabricator.wikimedia.org/T210713)
[07:59:13] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489155 (owner: 10Marostegui)
[08:00:50] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489156 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[08:01:53] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489156 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[08:03:05] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 46s)
[08:03:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:22] <marostegui>	 !log Upgrade MySQL on db1086 and deploy schema change
[08:05:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:40] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists, 10User-jijiki: Please create docker-sig@ mailing list - https://phabricator.wikimedia.org/T215563 (10jijiki) p:05Triage→03Normal
[08:10:19] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489156 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[08:15:59] <marostegui>	 !log Upgrade MySQL on db1086
[08:16:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:23] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Depool db1083 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489157
[08:17:50] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] mariadb: Depool db1083 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489157 (owner: 10Jcrespo)
[08:19:48] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb: Depool db1083 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489157 (owner: 10Jcrespo)
[08:20:57] <wikibugs>	 (03Merged) 10jenkins-bot: mariadb: Depool db1083 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489157 (owner: 10Jcrespo)
[08:21:43] <wikibugs>	 (03CR) 10jenkins-bot: mariadb: Depool db1083 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489157 (owner: 10Jcrespo)
[08:23:33] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1083 (duration: 00m 47s)
[08:23:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:49] <jynus>	 !log stop and upgrade db1083
[08:24:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:49] <wikibugs>	 (03PS1) 10Jcrespo: Revert "mariadb: Depool db1083 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489158
[08:26:29] <wikibugs>	 (03Abandoned) 10Elukey: profile::kafka::broker: fix cloud ferm ranges [puppet] - 10https://gerrit.wikimedia.org/r/471951 (owner: 10Elukey)
[08:26:40] <wikibugs>	 (03Abandoned) 10Elukey: mcrouter: allow to tune server timeout and timeouts until tko [puppet] - 10https://gerrit.wikimedia.org/r/468363 (https://phabricator.wikimedia.org/T203786) (owner: 10Elukey)
[08:27:42] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489159
[08:28:38] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Repool db1083 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489160
[08:28:50] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489159 (owner: 10Marostegui)
[08:29:54] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489159 (owner: 10Marostegui)
[08:30:51] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 (duration: 00m 47s)
[08:30:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:50] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489159 (owner: 10Marostegui)
[08:39:43] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Repool db1083 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489160
[08:42:33] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1306 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[08:42:43] <icinga-wm>	 PROBLEM - Nginx local proxy to videoscaler on mw1306 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.009 second response time
[08:43:07] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb: Repool db1083 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489160 (owner: 10Jcrespo)
[08:43:49] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1306 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.006 second response time
[08:43:59] <icinga-wm>	 RECOVERY - Nginx local proxy to videoscaler on mw1306 is OK: HTTP OK: HTTP/1.1 200 OK - 288 bytes in 0.032 second response time
[08:44:08] <wikibugs>	 (03Merged) 10jenkins-bot: mariadb: Repool db1083 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489160 (owner: 10Jcrespo)
[08:44:21] <wikibugs>	 (03CR) 10jenkins-bot: mariadb: Repool db1083 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489160 (owner: 10Jcrespo)
[08:48:48] <wikibugs>	 (03PS1) 10Vgutierrez: Add acmechief[12]001 DNS entries [dns] - 10https://gerrit.wikimedia.org/r/489161 (https://phabricator.wikimedia.org/T207389)
[08:49:16] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Depool db1099 from s1 and s8 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489162
[08:50:37] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1083 with low load (duration: 00m 46s)
[08:50:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:16] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] Add acmechief[12]001 DNS entries [dns] - 10https://gerrit.wikimedia.org/r/489161 (https://phabricator.wikimedia.org/T207389) (owner: 10Vgutierrez)
[08:52:07] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Depool db1099 from s1 and s8 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489162
[08:53:10] <moritzm>	 !log reimage graphite2002 to buster
[08:53:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:36] <jynus>	 \o/
[08:59:54] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Give more traffic to db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489163
[09:00:13] <elukey>	 moritzm: any chance that I can do the same with stat1005?? :D
[09:02:34] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Give more traffic to db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489163 (owner: 10Marostegui)
[09:03:14] <wikibugs>	 (03PS1) 10Vgutierrez: install_server: Add DHCP entries for acmechief[12]001 [puppet] - 10https://gerrit.wikimedia.org/r/489164 (https://phabricator.wikimedia.org/T207389)
[09:03:38] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Give more traffic to db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489163 (owner: 10Marostegui)
[09:04:37] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1086 (duration: 00m 47s)
[09:04:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:27] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Give more traffic to db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489163 (owner: 10Marostegui)
[09:05:51] <moritzm>	 elukey: still running into some installer issues, but when that's figured out, for sure!
[09:06:44] <moritzm>	 !log installing libarchive security updates
[09:06:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:52] <wikibugs>	 (03CR) 10Faidon Liambotis: "This sounds like a good idea in general, but is there a plan for how users are supposed to validate (and by extension, invalidate) the hos" [dns] - 10https://gerrit.wikimedia.org/r/489103 (owner: 10Dzahn)
[09:16:13] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489166
[09:16:23] <moritzm>	 !log installing rssh security updates
[09:16:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:10] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489166 (owner: 10Marostegui)
[09:20:14] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489166 (owner: 10Marostegui)
[09:22:19] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 46s)
[09:22:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:31] <wikibugs>	 10Operations, 10Discovery, 10Discovery-Search, 10Elasticsearch, and 2 others: Merge http and https elasticsearch icinga checks into one - https://phabricator.wikimedia.org/T215587 (10Peachey88)
[09:22:52] <wikibugs>	 (03PS3) 10Jcrespo: mariadb: Depool db1099 from s1 and s8 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489162
[09:26:11] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb: Depool db1099 from s1 and s8 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489162 (owner: 10Jcrespo)
[09:27:19] <wikibugs>	 (03Merged) 10jenkins-bot: mariadb: Depool db1099 from s1 and s8 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489162 (owner: 10Jcrespo)
[09:28:51] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489166 (owner: 10Marostegui)
[09:28:53] <wikibugs>	 (03CR) 10jenkins-bot: mariadb: Depool db1099 from s1 and s8 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489162 (owner: 10Jcrespo)
[09:28:55] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1099 (duration: 00m 46s)
[09:28:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:21] <wikibugs>	 10Operations: confd: Superfluous golang dependency - https://phabricator.wikimedia.org/T215593 (10MoritzMuehlenhoff)
[09:34:28] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489167 (https://phabricator.wikimedia.org/T210713)
[09:34:50] <wikibugs>	 (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/14583/" [puppet] - 10https://gerrit.wikimedia.org/r/488436 (owner: 10Muehlenhoff)
[09:35:51] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489167 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[09:36:57] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489167 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[09:37:45] <wikibugs>	 10Operations, 10Maps (Kartotherian): Create discovery entry for Kartotherian - https://phabricator.wikimedia.org/T214672 (10Gehel) 05Open→03Invalid Discovery entry is only used for internal communications, but not by varnish (which confused me quite a bit). So we do have the proper entries in place, let's...
[09:37:59] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1090:3317 (duration: 00m 46s)
[09:38:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:31] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489167 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui)
[09:42:22] <wikibugs>	 (03PS1) 10Elukey: Add analytics dbstore SRV records [dns] - 10https://gerrit.wikimedia.org/r/489170
[09:43:57] <wikibugs>	 (03PS1) 10Muehlenhoff: Extend d-i config for buster [puppet] - 10https://gerrit.wikimedia.org/r/489171 (https://phabricator.wikimedia.org/T213527)
[09:51:54] <wikibugs>	 (03PS10) 10Daimona Eaytoy: Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772
[09:52:40] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 (owner: 10Daimona Eaytoy)
[09:55:30] <wikibugs>	 (03PS2) 10Daimona Eaytoy: Remove $wgAbuseFilterRuntimeProfile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486470 (https://phabricator.wikimedia.org/T191039)
[09:57:00] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10MW-1.33-notes (1.33.0-wmf.2; 2018-10-30), 10User-Addshore: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (10jcrespo)
[09:57:23] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10MW-1.33-notes (1.33.0-wmf.2; 2018-10-30), 10User-Addshore: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (10jcrespo)
[10:09:18] <wikibugs>	 (03PS11) 10Daimona Eaytoy: Move all AbuseFilter config to abusefilter.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477063 (https://phabricator.wikimedia.org/T145931)
[10:10:16] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489174
[10:10:55] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Icinga: add ping check for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/489113 (https://phabricator.wikimedia.org/T209101) (owner: 10Ayounsi)
[10:11:49] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489174 (owner: 10Marostegui)
[10:12:59] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489174 (owner: 10Marostegui)
[10:13:50] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489174 (owner: 10Marostegui)
[10:14:15] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1090:3317 (duration: 00m 47s)
[10:14:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:46] <jynus>	 !log stop and upgrade db1099
[10:15:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:17] <wikibugs>	 (03CR) 10Gehel: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/487982 (https://phabricator.wikimedia.org/T205886) (owner: 10Volans)
[10:23:38] <godog>	 !log swift codfw-prod: more weight to ms-be2047 - T209395 T209921
[10:23:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:42] <stashbot>	 T209395: rack/setup/install new ms-be servers ms-be204[4-9] ,ms-be2050 - https://phabricator.wikimedia.org/T209395
[10:23:43] <stashbot>	 T209921: ms-be2047 spontaneous reboots - https://phabricator.wikimedia.org/T209921
[10:24:05] <wikibugs>	 (03PS1) 10Effie Mouzeli: Apply -R 200 to memcached on mc1026 [puppet] - 10https://gerrit.wikimedia.org/r/489175 (https://phabricator.wikimedia.org/T208844)
[10:27:19] <jijiki>	 !log Restarting memcached on mc1026 to apply '-R 200' - T208844
[10:27:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:22] <stashbot>	 T208844: Apply -R 200 to all the memcached mw object cache instances running in eqiad/codfw - https://phabricator.wikimedia.org/T208844
[10:27:36] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] Apply -R 200 to memcached on mc1026 [puppet] - 10https://gerrit.wikimedia.org/r/489175 (https://phabricator.wikimedia.org/T208844) (owner: 10Effie Mouzeli)
[10:28:50] <wikibugs>	 (03CR) 10Gehel: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/488436 (owner: 10Muehlenhoff)
[10:29:17] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 724.90 seconds
[10:29:51] <wikibugs>	 (03PS11) 10Daimona Eaytoy: Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772
[10:30:21] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 (owner: 10Daimona Eaytoy)
[10:31:41] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10serviceops, 10Patch-For-Review, and 3 others: Apply -R 200 to all the memcached mw object cache instances running in eqiad/codfw - https://phabricator.wikimedia.org/T208844 (10jijiki)
[10:43:37] <wikibugs>	 (03PS1) 10Jcrespo: Revert "mariadb: Depool db1099 from s1 and s8 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489178
[10:46:04] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] Revert "mariadb: Depool db1099 from s1 and s8 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489178 (owner: 10Jcrespo)
[10:47:13] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1099 from s1 and s8 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489178 (owner: 10Jcrespo)
[10:47:52] <wikibugs>	 (03CR) 10jenkins-bot: Revert "mariadb: Depool db1099 from s1 and s8 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489178 (owner: 10Jcrespo)
[10:50:13] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1099 (duration: 00m 47s)
[10:50:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:27] <wikibugs>	 (03PS12) 10Daimona Eaytoy: Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772
[10:53:13] <wikibugs>	 (03PS2) 10Jcrespo: Revert "mariadb: Depool db1083 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489158
[10:53:49] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 239.49 seconds
[10:57:21] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] "The mapping is correct." [dns] - 10https://gerrit.wikimedia.org/r/489170 (owner: 10Elukey)
[10:58:39] <wikibugs>	 10Operations, 10serviceops, 10User-jijiki: Fix spamassassin's "warn: netset: cannot include <network>" warning - https://phabricator.wikimedia.org/T215496 (10jijiki) 05Open→03Resolved a:03jijiki Resolved by @akosiaris in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/488894/
[10:59:10] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] Revert "mariadb: Depool db1083 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489158 (owner: 10Jcrespo)
[11:00:15] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1083 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489158 (owner: 10Jcrespo)
[11:01:02] <wikibugs>	 (03PS2) 10Muehlenhoff: Extend d-i config for buster [puppet] - 10https://gerrit.wikimedia.org/r/489171 (https://phabricator.wikimedia.org/T213527)
[11:04:18] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "this seems perfectly legit use of SRV records to me" [dns] - 10https://gerrit.wikimedia.org/r/489170 (owner: 10Elukey)
[11:04:46] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] redis: Stop supporting trusty/upstart [puppet] - 10https://gerrit.wikimedia.org/r/488436 (owner: 10Muehlenhoff)
[11:05:07] <wikibugs>	 (03PS2) 10Effie Mouzeli: redis: Stop supporting trusty/upstart [puppet] - 10https://gerrit.wikimedia.org/r/488436 (owner: 10Muehlenhoff)
[11:05:24] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Extend d-i config for buster [puppet] - 10https://gerrit.wikimedia.org/r/489171 (https://phabricator.wikimedia.org/T213527) (owner: 10Muehlenhoff)
[11:06:22] <wikibugs>	 (03PS3) 10Effie Mouzeli: redis: Stop supporting trusty/upstart [puppet] - 10https://gerrit.wikimedia.org/r/488436 (owner: 10Muehlenhoff)
[11:08:19] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1083 fully (duration: 00m 47s)
[11:08:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:15] <wikibugs>	 (03CR) 10jenkins-bot: Revert "mariadb: Depool db1083 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489158 (owner: 10Jcrespo)
[11:18:26] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: Serve >= 50% of production Prometheus systems with Prometheus v2 - https://phabricator.wikimedia.org/T187987 (10MoritzMuehlenhoff) As discussed on IRC: Let's upgrade to 2.7.1 next week as that fixes a security issue (CVE-2019-3826) in the internal UI (not expos...
[11:18:45] <wikibugs>	 10Operations, 10Patch-For-Review: ferm: Log dropped packets - https://phabricator.wikimedia.org/T116011 (10jbond) discussion  from meeting https://etherpad.wikimedia.org/p/SRE-Foundations-ulogd_discussion  The conclusion was to have iptable drop standard entries sent to syslog, as syslog is already in the logg...
[11:19:03] <wikibugs>	 10Operations, 10monitoring: Expose linux kernel firewall and connections statistics - https://phabricator.wikimedia.org/T215277 (10jbond) discussion  from meeting https://etherpad.wikimedia.org/p/SRE-Foundations-ulogd_discussion  The conclusion was to have iptable drop standard entries sent to syslog, as syslo...
[11:29:46] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "Would a comment instead of a removal work then?" [deployment-charts] - 10https://gerrit.wikimedia.org/r/488800 (owner: 10Alexandros Kosiaris)
[11:36:15] <moritzm>	 !log reimage graphite2002 to buster
[11:36:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:00] <wikibugs>	 10Operations, 10ops-codfw, 10monitoring, 10Patch-For-Review: rack/setup/install graphite2003 - https://phabricator.wikimedia.org/T196483 (10fgiunchedi)
[11:40:02] <wikibugs>	 10Operations, 10ops-codfw, 10monitoring: graphite2001 crashed - https://phabricator.wikimedia.org/T198041 (10fgiunchedi) 05Open→03Declined Host is going to be decom -- declining
[11:51:30] <wikibugs>	 (03PS4) 10D3r1ck01: Stop NavPopups gadget conflict with PagePreviews on Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878)
[11:52:19] <wikibugs>	 (03CR) 10D3r1ck01: "PS4 is **only** a manual rebase!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878) (owner: 10D3r1ck01)
[11:55:43] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM overall, some food-for-thought questions (i.e. ok to followup/address later and on phabricator)" [dns] - 10https://gerrit.wikimedia.org/r/489170 (owner: 10Elukey)
[12:06:04] <wikibugs>	 (03PS26) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011)
[12:07:04] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond)
[12:07:56] <wikibugs>	 (03PS27) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011)
[12:08:49] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond)
[12:12:14] <wikibugs>	 (03CR) 10Elukey: "> LGTM overall, some food-for-thought questions (i.e. ok to" [dns] - 10https://gerrit.wikimedia.org/r/489170 (owner: 10Elukey)
[12:17:51] <icinga-wm>	 PROBLEM - Host db1114 is DOWN: PING CRITICAL - Packet loss = 100%
[12:18:51] <elukey>	 is anybody working on --^ ?
[12:18:55] <elukey>	 don't see anything in the SAL
[12:18:59] <elukey>	 Cc: marostegui, jynus 
[12:19:58] <elukey>	 from https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php it seems pooled for s1 right? (slave of course)
[12:23:20] <wikibugs>	 (03PS1) 10Elukey: Depool db1114 - host down [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489189
[12:24:27] <wikibugs>	 (03PS2) 10Elukey: Depool db1114 - host down [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489189
[12:25:21] <wikibugs>	 (03CR) 10Muehlenhoff: Introduce systemd::slice::all_users (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) (owner: 10Elukey)
[12:27:54] <moritzm>	 db1114 seems to have crashed due to memory errors, there's several "Critical" errors in SEL for DIMM_B7 and DIMM_B3
[12:28:18] <elukey>	 yeah I think it is not the first time it crashed, it happened a week ago (from SAL)
[12:28:23] <moritzm>	 "Multi-bit memory errors detected on a memory device"
[12:28:38] <moritzm>	 right, just found https://phabricator.wikimedia.org/T214720
[12:29:46] <wikibugs>	 (03PS3) 10Elukey: Depool db1114 - host down [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489189 (https://phabricator.wikimedia.org/T214720)
[12:29:50] <elukey>	 I think my patch should be ok, there is already another shard for the api read traffic
[12:31:26] <moritzm>	 probably yes
[12:32:55] <elukey>	 calling manuel
[12:34:07] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: db1114 crashed - https://phabricator.wikimedia.org/T214720 (10MoritzMuehlenhoff) The server went down at 12:16, with a number of memory errors logged in SEL:  ` ------------------------------------------------------------------------------- Record:...
[12:34:17] <elukey>	 trying with Jaime
[12:34:19] <wikibugs>	 (03PS13) 10BBlack: WIP: Add a Google Translate-specific redirect-to-mobile [puppet] - 10https://gerrit.wikimedia.org/r/485171 (https://phabricator.wikimedia.org/T212197) (owner: 10Dr0ptp4kt)
[12:34:50] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] Depool db1114 - host down [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489189 (https://phabricator.wikimedia.org/T214720) (owner: 10Elukey)
[12:34:57] <_joe_>	 elukey: jfdi :)
[12:35:01] <moritzm>	 ack. the serial console is also dead, I see a few characters of garbage output, but nothing else
[12:35:17] <elukey>	 _joe_ I don't recall exactly how to deploy :P
[12:35:26] <_joe_>	 is enwiki working? meaning you can edit?
[12:35:36] <elukey>	 that is a slave, only read traffic
[12:35:46] <_joe_>	 elukey: that sadly didn't matter in the past
[12:35:58] <elukey>	 ah yes but it should be fixed no?
[12:36:03] <_joe_>	 should be
[12:36:06] <_joe_>	 might be
[12:36:08] <_joe_>	 :)
[12:36:31] <_joe_>	 anyways, you +2 the patch, wait for gate-and-submit, go on deploy1001 
[12:36:41] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Depool db1114 - host down [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489189 (https://phabricator.wikimedia.org/T214720) (owner: 10Elukey)
[12:36:44] <_joe_>	 and well, jijiki knows what to do if she's around
[12:36:49] <_joe_>	 else, I can assist
[12:37:30] <wikibugs>	 (03PS28) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011)
[12:37:36] <jijiki>	 what did I miss?
[12:37:52] <_joe_>	 you need to pull the code in /srv/mediawiki-staging and do a scap sync-file, see https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Small_changes:_sync_individual_files_or_directories
[12:38:03] <_joe_>	 jijiki: elukey needs to deploy 1 file via scap
[12:38:31] <jijiki>	 good, we can experiment together
[12:38:59] <wikibugs>	 (03CR) 10jenkins-bot: Depool db1114 - host down [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489189 (https://phabricator.wikimedia.org/T214720) (owner: 10Elukey)
[12:39:07] <jijiki>	 elukey: so _joe_ by saying "knows what to do" means "looking for trouble"
[12:39:46] <jijiki>	 :D
[12:40:01] <elukey>	 ok so there's nothing pending in mediawiki-staging
[12:40:56] <elukey>	 so I have to cd into wmf-config, pull, scap sync-file blabla right?
[12:41:11] <wikibugs>	 (03PS14) 10BBlack: WIP: Add a Google Translate-specific redirect-to-mobile [puppet] - 10https://gerrit.wikimedia.org/r/485171 (https://phabricator.wikimedia.org/T212197) (owner: 10Dr0ptp4kt)
[12:41:31] <jijiki>	 let's tmux on deploy1001
[12:42:08] <_joe_>	 elukey: not in wmf-config, in the main dir
[12:42:09] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1007 is CRITICAL: 0.8295 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[12:42:10] <jijiki>	 we have to remote update, check diffs 
[12:42:22] <jijiki>	 elukey: in /srv/mediawiki-staging
[12:42:25] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2020 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:43:05] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1008 is CRITICAL: 0.2456 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[12:44:35] <wikibugs>	 (03PS29) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011)
[12:44:49] <logmsgbot>	 !log elukey@deploy1001 Synchronized wmf-config/db-eqiad.php: depooling db1114, host down (duration: 00m 47s)
[12:44:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:45:32] <elukey>	 all right seems done
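The single-file deploy walked through above (+2 the patch, wait for gate-and-submit, update /srv/mediawiki-staging on deploy1001, then scap sync-file) can be sketched roughly as follows. This is a minimal sketch, not the documented procedure. The staging path, file name, and log message come from the conversation. The specific git commands and the origin/master ref are assumptions, one plausible reading of "remote update, check diffs". The script only prints the steps, so it is safe to run anywhere.

```shell
#!/bin/sh
# Sketch only: prints the single-file deploy steps rather than running them.
# Path, file name, and log message are copied from the conversation above;
# the git commands and the origin/master ref are assumptions.
STAGING=/srv/mediawiki-staging
FILE=wmf-config/db-eqiad.php
MSG='depooling db1114, host down'

deploy_plan() {
    echo "cd $STAGING"                            # main staging dir, not wmf-config
    echo "git remote update"                      # fetch without touching the checkout
    echo "git diff HEAD origin/master -- $FILE"   # review what is about to go out
    echo "git pull"                               # bring the staging checkout up to date
    echo "scap sync-file $FILE '$MSG'"            # sync the one file with a log message
}

deploy_plan
```

Printing the plan first gives a second person in a shared tmux session something to review before anything is executed on the deploy host.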
[12:46:06] <moritzm>	 ms-be2020 was just the session scope bug under high load, fixed
[12:46:19] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2020 is OK: OK - running: The system is fully operational
[12:46:59] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1008 is CRITICAL: 0.2406 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[12:47:43] <jijiki>	 does anyone know about
[12:47:45] <jijiki>	 https://grafana.wikimedia.org/d/000000561/logstash?orgId=1
[12:47:47] <jijiki>	 this?
[12:48:26] <wikibugs>	 (03PS1) 10Joal: Update aqs druid datasource to 2019_01 snapshot [puppet] - 10https://gerrit.wikimedia.org/r/489194
[12:50:03] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1009 is CRITICAL: 0.1429 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[12:51:38] <jynus>	 !log disabling notifications on db1114
[12:51:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:41] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1009 is OK: (C)0.1 ge (W)0.05 ge 0.01983 https://grafana.wikimedia.org/dashboard/db/logstash
[12:53:32] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Update aqs druid datasource to 2019_01 snapshot [puppet] - 10https://gerrit.wikimedia.org/r/489194 (owner: 10Joal)
[12:55:45] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: grid: exec_environ: use libmariadbclient* packages [puppet] - 10https://gerrit.wikimedia.org/r/489195 (https://phabricator.wikimedia.org/T215578)
[12:56:16] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: toolforge: grid: exec_environ: use libmariadbclient-dev* packages [puppet] - 10https://gerrit.wikimedia.org/r/489195 (https://phabricator.wikimedia.org/T215578)
[12:57:11] <wikibugs>	 (03CR) 10Elukey: Introduce systemd::slice::all_users (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) (owner: 10Elukey)
[12:59:03] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1007 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[13:00:01] <wikibugs>	 (03CR) 10Muehlenhoff: Introduce systemd::slice::all_users (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) (owner: 10Elukey)
[13:01:16] <wikibugs>	 (03CR) 10Elukey: Introduce systemd::slice::all_users (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) (owner: 10Elukey)
[13:04:43] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts: ` cloudelastic1004.eqiad.wmnet ` The log can be fou...
[13:04:46] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudelastic1004.eqiad.wmnet'] `  Of which those **FAILED**: ` ['cloudelastic1004.eqiad.wmnet'] `
[13:05:09] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1008 is CRITICAL: 0.3605 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[13:05:09] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2021 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:05:21] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts: ` cloudelastic1004.eqiad.wmnet ` The log can be fou...
[13:05:31] <gehel>	 jijiki: are you looking at the logstash overload?
[13:05:44] <jijiki>	 gehel: yeah with jynu.s
[13:05:54] <arturo>	 !log T209029 reimaging cloudelastic1004
[13:05:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:58] <stashbot>	 T209029: cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029
[13:06:16] <gehel>	 jijiki: ok, I probably don't know much more than you about it, but ping me if you need another set of eyes
[13:06:23] <jijiki>	 tx tx :)
[13:09:01] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1008 is OK: (C)0.1 ge (W)0.05 ge 0.03716 https://grafana.wikimedia.org/dashboard/db/logstash
[13:15:20] <wikibugs>	 (03PS12) 10Jbond: Improve CI checks to ensure a basic catalogue compiles on all supported OS's [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275)
[13:15:32] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Pool rc slaves with higher weight to rebalance load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489200 (https://phabricator.wikimedia.org/T214720)
[13:16:18] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Improve CI checks to ensure a basic catalogue compiles on all supported OS's [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275) (owner: 10Jbond)
[13:17:38] <wikibugs>	 (03CR) 10GTirloni: [C: 03+2] toolforge: grid: exec_environ: use libmariadbclient-dev* packages [puppet] - 10https://gerrit.wikimedia.org/r/489195 (https://phabricator.wikimedia.org/T215578) (owner: 10Arturo Borrero Gonzalez)
[13:20:21] <wikibugs>	 (03PS7) 10Elukey: Introduce systemd::slice::all_users [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824)
[13:21:09] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): [C: 04-1] Stop NavPopups gadget conflict with PagePreviews on Wikivoyage (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878) (owner: 10D3r1ck01)
[13:21:17] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw2221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:22:09] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw2221 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.211 second response time
[13:31:49] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "When dropping the code in Toolforge, there is leftover stale file from file(profile/toolforge/bastion-root-resource-control.conf). Please " [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) (owner: 10Elukey)
[13:33:59] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudelastic1004.eqiad.wmnet'] `  Of which those **FAILED**: ` ['cloudelastic1004.eqiad.wmnet'] `
[13:34:19] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts: ` cloudelastic1004.eqiad.wmnet ` The log can be fou...
[13:34:21] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudelastic1004.eqiad.wmnet'] `  Of which those **FAILED**: ` ['cloudelastic1004.eqiad.wmnet'] `
[13:35:42] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts: ` cloudelastic1004.eqiad.wmnet ` The log can be fou...
[13:37:40] <elukey>	 !log roll restart of aqs on aqs1* to pick up new druid backend changes
[13:37:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:49] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2021 is OK: OK - running: The system is fully operational
[13:39:53] <onimisionipe>	 !log starting osm-initial-import for maps2004, the master newly migrated to stretch - T198622
[13:39:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:39:56] <stashbot>	 T198622: migrate maps servers to stretch with the current style - https://phabricator.wikimedia.org/T198622
[13:44:18] <wikibugs>	 (03PS1) 10Urbanecm: New throttle rule + removal of expired rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489207 (https://phabricator.wikimedia.org/T215446)
[13:44:58] <Urbanecm>	 Hi, anybody to deploy https://gerrit.wikimedia.org/r/489207 for T215446? It's throttle rule for tomorrow 
[13:44:59] <stashbot>	 T215446: Request for temporary lift of account creation cap for Wikipedia edit-a-thon event (Feb 9) - https://phabricator.wikimedia.org/T215446
[13:45:40] <jynus>	 !log racadm serveraction powercycle db1114
[13:45:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:45:51] <wikibugs>	 (03Abandoned) 10Zppix: Remove past throttles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487881 (owner: 10Zppix)
[13:46:24] <wikibugs>	 (03PS3) 10Zppix: Lift Account creation cap for Women Activists edit-a-thon at Simmons University [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487876 (https://phabricator.wikimedia.org/T215069)
[13:47:54] <wikibugs>	 (03CR) 10D3r1ck01: "Thanks for catching that." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878) (owner: 10D3r1ck01)
[13:48:00] <wikibugs>	 10Operations, 10MediaWiki-Database, 10monitoring: MediaWiki errors overloading logtash - https://phabricator.wikimedia.org/T215611 (10Marostegui)
[13:49:40] <wikibugs>	 (03PS5) 10D3r1ck01: Stop NavPopups gadget conflict with PagePreviews on Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878)
[13:49:50] <wikibugs>	 10Operations, 10MediaWiki-Database, 10MediaWiki-Logging, 10Wikimedia-Logstash, 10monitoring: MediaWiki errors overloading logtash - https://phabricator.wikimedia.org/T215611 (10jcrespo)
[13:50:12] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): "I'm quite a bit worried to see we are starting a "black list of bad words" here. When will we stop expanding this list? How did it started" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/487917 (https://phabricator.wikimedia.org/T201491) (owner: 10Dzahn)
[13:52:03] <wikibugs>	 10Operations, 10MediaWiki-Database, 10Wikimedia-Logstash, 10monitoring, 10Wikimedia-production-error: MediaWiki errors overloading logtash - https://phabricator.wikimedia.org/T215611 (10jcrespo) To clarify "lag behind"- it created at least 20 minutes of lag, which would have blocked any mediawiki
[13:56:47] <moritzm>	 !log updated firmware-enriched buster netboot image to 20190208 daily build, the alpha5 image no longer works as Linux 4.19.16-1 bumped the ABI and migrated to testing yesterday
[13:56:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:49] <wikibugs>	 (03PS1) 10Marostegui: db1114: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/489210 (https://phabricator.wikimedia.org/T214720)
[13:57:31] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+1] db1114: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/489210 (https://phabricator.wikimedia.org/T214720) (owner: 10Marostegui)
[13:57:49] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db1114: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/489210 (https://phabricator.wikimedia.org/T214720) (owner: 10Marostegui)
[14:04:45] <wikibugs>	 (03PS1) 10Bmansurov: Add page-links-change event to EventStreams [puppet] - 10https://gerrit.wikimedia.org/r/489211 (https://phabricator.wikimedia.org/T214706)
[14:05:31] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10aborrero) I'm having troubles reimaging the server:  ` aborrero@cumin1001:~$ sudo -i wmf-auto-reimage-host -p T209029 --no-verify --no-downtime --no-reboot...
[14:06:43] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: scaffolding: Fix deployment indentation [deployment-charts] - 10https://gerrit.wikimedia.org/r/489212
[14:07:48] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] add statsd_exporter config to mathoid (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/482718 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[14:08:01] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Switch db1114 and db1118 roles [puppet] - 10https://gerrit.wikimedia.org/r/489213 (https://phabricator.wikimedia.org/T214720)
[14:08:24] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] scaffolding: Fix deployment indentation [deployment-charts] - 10https://gerrit.wikimedia.org/r/489212 (owner: 10Alexandros Kosiaris)
[14:10:17] <wikibugs>	 (03PS1) 10Jcrespo: install_server: Allow full reimage of db1114 [puppet] - 10https://gerrit.wikimedia.org/r/489214 (https://phabricator.wikimedia.org/T214720)
[14:11:18] <wikibugs>	 (03CR) 10Urbanecm: [C: 04-1] Lift Account creation cap for Women Activists edit-a-thon at Simmons University (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487876 (https://phabricator.wikimedia.org/T215069) (owner: 10Zppix)
[14:11:48] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] install_server: Allow full reimage of db1114 [puppet] - 10https://gerrit.wikimedia.org/r/489214 (https://phabricator.wikimedia.org/T214720) (owner: 10Jcrespo)
[14:12:28] <wikibugs>	 (03PS2) 10Elukey: Add analytics dbstore SRV records [dns] - 10https://gerrit.wikimedia.org/r/489170 (https://phabricator.wikimedia.org/T212386)
[14:12:39] <wikibugs>	 (03PS6) 10DCausse: [WIP] Upgrade to 6.5.4 [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/446869 (https://phabricator.wikimedia.org/T199791)
[14:12:41] <wikibugs>	 (03PS3) 10DCausse: [WIP] Add nori korean analyzer [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/486266 (https://phabricator.wikimedia.org/T206874)
[14:13:14] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] mariadb: Switch db1114 and db1118 roles [puppet] - 10https://gerrit.wikimedia.org/r/489213 (https://phabricator.wikimedia.org/T214720) (owner: 10Jcrespo)
[14:22:26] <wikibugs>	 10Operations, 10Core Platform Team, 10MediaWiki-Database, 10Wikimedia-Logstash, and 2 others: MediaWiki errors overloading logtash - https://phabricator.wikimedia.org/T215611 (10CDanis) I don't feel nearly well-versed in PHP/PSR-3/Monolog nor the MW codebase to suggest implementations, but it seems to me t...
[14:23:28] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Stop NavPopups gadget conflict with PagePreviews on Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487007 (https://phabricator.wikimedia.org/T214878) (owner: 10D3r1ck01)
[14:27:15] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] install_server: Allow full reimage of db1114 [puppet] - 10https://gerrit.wikimedia.org/r/489214 (https://phabricator.wikimedia.org/T214720) (owner: 10Jcrespo)
[14:27:44] <wikibugs>	 (03PS2) 10Jcrespo: install_server: Allow full reimage of db1114 [puppet] - 10https://gerrit.wikimedia.org/r/489214 (https://phabricator.wikimedia.org/T214720)
[14:34:49] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10MoritzMuehlenhoff) > The debian installer completes, but I can't log in because apparently the first puppet run isn't completed and I can't use any login me...
[14:35:59] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudelastic1004.eqiad.wmnet'] `  Of which those **FAILED**: ` ['cloudelastic1004.eqiad.wmnet'] `
[14:37:51] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Switch db1114 and db1118 roles [puppet] - 10https://gerrit.wikimedia.org/r/489213 (https://phabricator.wikimedia.org/T214720)
[14:43:01] <wikibugs>	 (03PS1) 10GTirloni: wiki replicas: depool labsdb1009 for updates [puppet] - 10https://gerrit.wikimedia.org/r/489220 (https://phabricator.wikimedia.org/T212308)
[14:43:31] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wiki replicas: depool labsdb1009 for updates [puppet] - 10https://gerrit.wikimedia.org/r/489220 (https://phabricator.wikimedia.org/T212308) (owner: 10GTirloni)
[14:43:54] <wikibugs>	 10Operations, 10Wiki-Loves-Love, 10Wikimedia-Mailing-lists, 10User-jijiki: Reset password for wll mailling list - https://phabricator.wikimedia.org/T215390 (10jijiki) @Psychoslave I have sent you the new password, please let me know if you got an automated one as well. Let us know if everything is ok:)
[14:44:28] <wikibugs>	 (03PS2) 10GTirloni: wiki replicas: depool labsdb1009 for updates [puppet] - 10https://gerrit.wikimedia.org/r/489220 (https://phabricator.wikimedia.org/T212308)
[14:47:41] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] wiki replicas: depool labsdb1009 for updates [puppet] - 10https://gerrit.wikimedia.org/r/489220 (https://phabricator.wikimedia.org/T212308) (owner: 10GTirloni)
[14:49:57] <wikibugs>	 (03CR) 10GTirloni: [C: 03+2] wiki replicas: depool labsdb1009 for updates [puppet] - 10https://gerrit.wikimedia.org/r/489220 (https://phabricator.wikimedia.org/T212308) (owner: 10GTirloni)
[14:51:06] <marostegui>	 !log Reload haproxy on dbproxy1011 to depool labsdb1009 - https://phabricator.wikimedia.org/T212308
[14:51:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:21] <marostegui>	 gtirloni: ^
[14:51:25] <wikibugs>	 (03PS13) 10Jbond: Improve CI checks to ensure a basic catalogue compiles on all supported OS's [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275)
[14:52:18] <gtirloni>	 thanks!
[14:52:23] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Improve CI checks to ensure a basic catalogue compiles on all supported OS's [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275) (owner: 10Jbond)
[15:07:39] <Urbanecm>	 Hi, anybody to deploy https://gerrit.wikimedia.org/r/489207 for T215446? It's a throttle rule for tomorrow
[15:07:40] <stashbot>	 T215446: Request for temporary lift of account creation cap for Wikipedia edit-a-thon event (Feb 9) - https://phabricator.wikimedia.org/T215446
[15:08:37] <Urbanecm>	 MaxSem, twentyafterfour, RoanKattouw, dereckson, thcipriani, Niharika, zeljkof, Reedy ?
[15:08:51] <wikibugs>	 (03PS3) 10Jcrespo: mariadb: Switch db1114 and db1118 roles [puppet] - 10https://gerrit.wikimedia.org/r/489213 (https://phabricator.wikimedia.org/T214720)
[15:10:55] <jijiki>	 !log Upgrading php-redis 4.1.1 to mwmaint1002 - T215376
[15:10:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:58] <stashbot>	 T215376: mwscript dies on mwmaint with PHP=php7.2 due to php-redis missing - https://phabricator.wikimedia.org/T215376
[15:11:08] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: db1114 crashed - https://phabricator.wikimedia.org/T214720 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts: ` ['db1118.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20190208151...
[15:14:44] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb: Switch db1114 and db1118 roles [puppet] - 10https://gerrit.wikimedia.org/r/489213 (https://phabricator.wikimedia.org/T214720) (owner: 10Jcrespo)
[15:18:49] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10RStallman-legalteam) Just confirming that the Master Services Agreement and Data Processing Agreem...
[15:23:29] <wikibugs>	 (03PS1) 10Andrew Bogott: nova: add wmcs-rescue-console.sh to compute hosts [puppet] - 10https://gerrit.wikimedia.org/r/489230 (https://phabricator.wikimedia.org/T215211)
[15:28:38] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: db1114 crashed - https://phabricator.wikimedia.org/T214720 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1118.eqiad.wmnet'] `  and were **ALL** successful.
[15:30:58] <wikibugs>	 (03PS8) 10Elukey: Introduce systemd::slice::all_users [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824)
[15:31:09] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] "Cool!" [dns] - 10https://gerrit.wikimedia.org/r/489170 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey)
[15:31:15] <_joe_>	 !log upgraded all php extensions to php 7.2 compatible versions on mwmaint1002
[15:31:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:46] <moritzm>	 !log imported debmonitor 0.1.5-1+deb10u1 to buster-wikimedia (T213527)
[15:31:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:48] <stashbot>	 T213527: Prepare our base system layer for Debian buster - https://phabricator.wikimedia.org/T213527
[15:32:16] <zeljkof>	 Urbanecm: sorry, just saw your ping
[15:32:43] <Urbanecm>	 zeljkof, np, I'd like to get https://gerrit.wikimedia.org/r/489207 deployed extraordinarily
[15:32:47] <zeljkof>	 can you put it in a US SWAT? since it's a throttle rule, it can be deployed without you, right?
[15:32:50] <wikibugs>	 (03PS14) 10Jbond: Improve CI checks to ensure a basic catalogue compiles on all supported OS's [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275)
[15:33:35] <wikibugs>	 (03CR) 10Elukey: "> When dropping the code in Toolforge, there is leftover stale file" [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) (owner: 10Elukey)
[15:33:36] <_joe_>	 !log apt-get upgrade on mwmaint2001 to fix the php installation T215376
[15:33:38] <Urbanecm>	 Which US SWAT?
[15:33:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:39] <stashbot>	 T215376: mwscript dies on mwmaint with PHP=php7.2 due to php-redis missing - https://phabricator.wikimedia.org/T215376
[15:33:43] <Urbanecm>	 zeljkof, 
[15:33:56] <Urbanecm>	 I don't see a US SWAT window between now and tomorrow
[15:33:58] <zeljkof>	 ah, it's friday :d
[15:34:00] <zeljkof>	 :D
[15:34:20] <zeljkof>	 sorry, totally confused
[15:34:35] <wikibugs>	 (03PS15) 10BBlack: Add a Google Translate-specific redirect-to-mobile [puppet] - 10https://gerrit.wikimedia.org/r/485171 (https://phabricator.wikimedia.org/T212197) (owner: 10Dr0ptp4kt)
[15:34:42] <Urbanecm>	 np :D
[15:35:17] <icinga-wm>	 PROBLEM - DPKG on mwmaint2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:35:19] <zeljkof>	 Urbanecm: uh, so since it's Friday, I think we need permission from greg-g, I don't think I've ever deployed anything on Friday
[15:35:25] <zeljkof>	 even throttle rule :/
[15:35:42] <Urbanecm>	 I'm pretty sure somebody did so sometime
[15:35:46] <Urbanecm>	 not 100% sure though
[15:37:09] <wikibugs>	 (03PS2) 10Andrew Bogott: nova: add wmcs-rescue-console.sh to compute hosts [puppet] - 10https://gerrit.wikimedia.org/r/489230 (https://phabricator.wikimedia.org/T215211)
[15:37:30] <wikibugs>	 (03PS3) 10Andrew Bogott: nova: add wmcs-rescue-console.sh to compute hosts [puppet] - 10https://gerrit.wikimedia.org/r/489230 (https://phabricator.wikimedia.org/T215211)
[15:37:34] <icinga-wm>	 PROBLEM - HTTP-noc on mwmaint2001 is CRITICAL: connect to address 10.192.48.45 and port 80: Connection refused
[15:38:48] <icinga-wm>	 RECOVERY - HTTP-noc on mwmaint2001 is OK: HTTP OK: HTTP/1.1 200 OK - 3516 bytes in 0.073 second response time
[15:39:44] <wikibugs>	 (03PS1) 10Muehlenhoff: Only enable backports up to stretch [puppet] - 10https://gerrit.wikimedia.org/r/489237
[15:39:55] <wikibugs>	 (03CR) 10BBlack: [C: 03+1] "A few nits fixed (= vs ==, missing semicolon), and all the VTC tests now pass.  Will sync up again today before merging." [puppet] - 10https://gerrit.wikimedia.org/r/485171 (https://phabricator.wikimedia.org/T212197) (owner: 10Dr0ptp4kt)
[15:40:36] <wikibugs>	 (03CR) 10Dr0ptp4kt: "Thanks @bblack for the fixes!" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/485171 (https://phabricator.wikimedia.org/T212197) (owner: 10Dr0ptp4kt)
[15:42:12] <dr0ptp4kt>	 bblack: thx for the fixes. i'm available during the next 77 minutes if you want to sync
[15:42:32] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10serviceops, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 3 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10jijiki) @EvanProdromou I will try to pull you some st...
[15:42:51] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Icinga passive checks go awal and downtime stops working - https://phabricator.wikimedia.org/T196336 (10jcrespo) 05Resolved→03Open We believe we had a new case of this on the 2018-02-08 :-(
[15:44:21] <wikibugs>	 (03PS1) 10GTirloni: Revert "wiki replicas: depool labsdb1009 for updates" [puppet] - 10https://gerrit.wikimedia.org/r/489239 (https://phabricator.wikimedia.org/T212308)
[15:44:58] <wikibugs>	 (03CR) 10GTirloni: [C: 03+2] Revert "wiki replicas: depool labsdb1009 for updates" [puppet] - 10https://gerrit.wikimedia.org/r/489239 (https://phabricator.wikimedia.org/T212308) (owner: 10GTirloni)
[15:46:02] <marostegui>	 !log Repool labsdb1009 - T212308
[15:46:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:46:06] <stashbot>	 T212308: Rerun maintain-views for all tables to drop valid_tag and tag_summary tables - https://phabricator.wikimedia.org/T212308
[15:46:52] <bblack>	 dr0ptp4kt: ok yeah, mainly I just wanted you to be here so you can verify live functionality after the deploy, and/or help hold up the flameshield if everything melts
[15:48:26] <wikibugs>	 (03CR) 10Thcipriani: New throttle rule + removal of expired rules (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489207 (https://phabricator.wikimedia.org/T215446) (owner: 10Urbanecm)
[15:48:41] <bblack>	 dr0ptp4kt: just say when and I'll start rebase -> merge -> deploy stuff
[15:48:55] <wikibugs>	 10Operations, 10Wikimedia-General-or-Unknown, 10serviceops, 10PHP 7.2 support, 10User-jijiki: mwscript dies on mwmaint with PHP=php7.2 due to php-redis missing - https://phabricator.wikimedia.org/T215376 (10Joe) `All the extensions were not upgraded at the time we did the 7.0 => 7.2 transition - my bad!...
[15:49:01] <dr0ptp4kt>	 bblack: i'm ready
[15:50:23] <bblack>	 dr0ptp4kt: ok, going!
[15:50:39] <wikibugs>	 (03PS16) 10BBlack: Add a Google Translate-specific redirect-to-mobile [puppet] - 10https://gerrit.wikimedia.org/r/485171 (https://phabricator.wikimedia.org/T212197) (owner: 10Dr0ptp4kt)
[15:50:45] <icinga-wm>	 RECOVERY - DPKG on mwmaint2001 is OK: All packages OK
[15:50:53] <wikibugs>	 10Operations, 10Patch-For-Review: Prepare our base system layer for Debian buster - https://phabricator.wikimedia.org/T213527 (10MoritzMuehlenhoff) Still some rough edges to sort out, but bare metal installations are working now:   ` $ ssh graphite2002.codfw.wmnet Linux graphite2002 4.19.0-2-amd64 #1 SMP Debia...
[15:51:24] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Add a Google Translate-specific redirect-to-mobile [puppet] - 10https://gerrit.wikimedia.org/r/485171 (https://phabricator.wikimedia.org/T212197) (owner: 10Dr0ptp4kt)
[15:53:17] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "Good job! This is indeed a great idea!" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/489230 (https://phabricator.wikimedia.org/T215211) (owner: 10Andrew Bogott)
[15:56:01] <icinga-wm>	 PROBLEM - puppet last run on cp1077 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:56:07] <icinga-wm>	 PROBLEM - puppet last run on cp3032 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:56:23] <icinga-wm>	 PROBLEM - puppet last run on cp2023 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 48 seconds ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:56:50] <marostegui>	 bblack: ^ could that be your change?
[15:56:53] <icinga-wm>	 PROBLEM - puppet last run on cp1089 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:57:07] <bblack>	 it is
[15:57:20] <bblack>	 I don't think the change is actually faulty, though
[15:57:36] <bblack>	 it's the standard bullshit race condition on deploying a new file and referencing that new file all in the same puppet patch
[15:57:47] <icinga-wm>	 PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:57:51] <bblack>	 also to boot, the cumin of "run-puppet-agent -q" claimed they were all successful :P
[15:58:01] <icinga-wm>	 PROBLEM - puppet last run on cp1083 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:01] <icinga-wm>	 PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:05] <icinga-wm>	 PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:05] <icinga-wm>	 PROBLEM - puppet last run on cp4031 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:05] <icinga-wm>	 PROBLEM - puppet last run on cp4030 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:13] <icinga-wm>	 PROBLEM - puppet last run on cp2007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:15] <icinga-wm>	 PROBLEM - puppet last run on cp5007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:23] <icinga-wm>	 PROBLEM - puppet last run on cp2006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:24] <marostegui>	 ah so a second run should fix it?
[15:58:41] <icinga-wm>	 PROBLEM - puppet last run on cp2010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:41] <icinga-wm>	 PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:41] <icinga-wm>	 PROBLEM - puppet last run on cp4029 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:47] <icinga-wm>	 PROBLEM - puppet last run on cp4032 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:49] <icinga-wm>	 PROBLEM - puppet last run on cp1081 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:49] <icinga-wm>	 PROBLEM - puppet last run on cp1075 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:49] <icinga-wm>	 PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:49] <bblack>	 yeah, hopefully.  I'm waiting a couple of minutes first, to let the puppet master/fileserver catch up to reality for sure
[15:58:55] <icinga-wm>	 PROBLEM - puppet last run on cp1088 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:55] <icinga-wm>	 PROBLEM - puppet last run on cp2004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:58:55] <wikibugs>	 (03PS1) 10Anomie: wiki replicas: Remove reference to old comment fields [puppet] - 10https://gerrit.wikimedia.org/r/489242 (https://phabricator.wikimedia.org/T212972)
[15:59:18] <bblack>	 oh wait, maybe it's not the race condition heh!
[15:59:33] <icinga-wm>	 PROBLEM - puppet last run on cp1085 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:59:39] <bblack>	 maybe it's that I tested all the diffs manually, but the patch is critically missing the code change to actually deploy the new file :P
[15:59:42] <marostegui>	 I tried a second run on cp1083 and failed yep
[15:59:45] <icinga-wm>	 PROBLEM - puppet last run on cp4028 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:59:45] <icinga-wm>	 PROBLEM - puppet last run on cp5008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:59:45] <icinga-wm>	 PROBLEM - puppet last run on cp5012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:59:59] <icinga-wm>	 PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[15:59:59] <icinga-wm>	 PROBLEM - puppet last run on cp3030 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:00:07] <icinga-wm>	 PROBLEM - puppet last run on cp1087 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:00:07] <icinga-wm>	 PROBLEM - puppet last run on cp5002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:00:19] <icinga-wm>	 PROBLEM - puppet last run on cp2016 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:00:19] <icinga-wm>	 PROBLEM - puppet last run on cp2019 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:00:23] <icinga-wm>	 PROBLEM - puppet last run on cp4027 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:00:29] <icinga-wm>	 PROBLEM - puppet last run on cp1090 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:00:29] <icinga-wm>	 PROBLEM - puppet last run on cp2012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:00:29] <icinga-wm>	 PROBLEM - puppet last run on cp5009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:00:40] <wikibugs>	 (03PS1) 10Elukey: role::analytics_test_cluster::coordinator: add basic camus support [puppet] - 10https://gerrit.wikimedia.org/r/489243 (https://phabricator.wikimedia.org/T212259)
[16:00:49] <icinga-wm>	 PROBLEM - puppet last run on cp1079 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:00:53] <icinga-wm>	 PROBLEM - puppet last run on cp5011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:01:12] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] role::analytics_test_cluster::coordinator: add basic camus support [puppet] - 10https://gerrit.wikimedia.org/r/489243 (https://phabricator.wikimedia.org/T212259) (owner: 10Elukey)
[16:01:13] <icinga-wm>	 PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:01:13] <icinga-wm>	 PROBLEM - puppet last run on cp1084 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:01:21] <icinga-wm>	 PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:01:51] <icinga-wm>	 PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:03:07] <icinga-wm>	 PROBLEM - puppet last run on cp1080 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:03:27] <wikibugs>	 (03PS1) 10BBlack: Bugfix for prev commit 6c0cea96 [puppet] - 10https://gerrit.wikimedia.org/r/489244 (https://phabricator.wikimedia.org/T212197)
[16:03:31] <icinga-wm>	 PROBLEM - puppet last run on cp4021 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:03:54] <wikibugs>	 (03PS2) 10BBlack: Bugfix for prev commit 22c6cd56 [puppet] - 10https://gerrit.wikimedia.org/r/489244 (https://phabricator.wikimedia.org/T212197)
[16:03:58] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Bugfix for prev commit 22c6cd56 [puppet] - 10https://gerrit.wikimedia.org/r/489244 (https://phabricator.wikimedia.org/T212197) (owner: 10BBlack)
[16:04:09] <icinga-wm>	 PROBLEM - puppet last run on cp4025 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:04:22] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10GTirloni) >>! In T209029#4937948, @aborrero wrote: > **WMCS needs discussion**: what do we want to do with this server? can it live with `spare::system` for...
[16:04:26] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Bugfix for prev commit 22c6cd56 [puppet] - 10https://gerrit.wikimedia.org/r/489244 (https://phabricator.wikimedia.org/T212197) (owner: 10BBlack)
[16:04:48] <bblack>	 Line 1: Do not define bug in the header
[16:04:55] <icinga-wm>	 PROBLEM - puppet last run on cp2022 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 10 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:05:03] <bblack>	 we can't say the word bugfix in a commit title :)
[16:05:11] <icinga-wm>	 PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:05:12] <dr0ptp4kt>	 bblack: oh for crying out loud, i'm sorry i didn't think of that
[16:05:35] <wikibugs>	 (03PS3) 10BBlack: Add file resource for translation-engine.inc.vcl [puppet] - 10https://gerrit.wikimedia.org/r/489244 (https://phabricator.wikimedia.org/T212197)
[16:05:44] <bblack>	 it's ok, it doesn't actually hurt anything, just spams the channel
[16:05:59] <jynus>	 I was about to ask, saw 68 criticals
[16:06:06] <jynus>	 no big issue I assume
[16:06:13] <bblack>	 they're just puppetfails to load new VCL, which leaves the existing VCL running
[16:06:44] <bblack>	 so they're critical in the sense of "someone needs to fix this pronto", but not in the sense of functional site issues
[16:06:57] <icinga-wm>	 PROBLEM - puppet last run on cp2011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:06:57] <icinga-wm>	 PROBLEM - puppet last run on cp3049 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:06:57] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Add file resource for translation-engine.inc.vcl [puppet] - 10https://gerrit.wikimedia.org/r/489244 (https://phabricator.wikimedia.org/T212197) (owner: 10BBlack)
[16:06:58] <jynus>	 yes, I got it
[16:07:03] <dr0ptp4kt>	 bblack: yeah. i'm having flashbacks to learning puppet
[16:07:43] <icinga-wm>	 PROBLEM - puppet last run on cp5003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:07:43] <bblack>	 well what's really missing here is true CI integration for our VCL stuff that would've caught this
[16:08:04] <bblack>	 but there's little point investing heavily in that direction at this point.  We've been living this way for a few years and it will all go away eventually.
[16:08:10] <moritzm>	 !log imported git-fat 0.1.3-2+deb10u1 to buster-wikimedia (T213527)
[16:08:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:08:13] <stashbot>	 T213527: Prepare our base system layer for Debian buster - https://phabricator.wikimedia.org/T213527
[16:08:31] <icinga-wm>	 RECOVERY - puppet last run on cp1083 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[16:08:55] <icinga-wm>	 PROBLEM - puppet last run on cp2005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:09:00] <jynus>	 !log stopping s1 replication on dbstore1001 to speed up cloning T214720
[16:09:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:03] <stashbot>	 T214720: db1114 crashed - https://phabricator.wikimedia.org/T214720
[16:09:23] <icinga-wm>	 PROBLEM - puppet last run on cp5006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:10:47] <icinga-wm>	 RECOVERY - puppet last run on cp2016 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[16:10:47] <icinga-wm>	 RECOVERY - puppet last run on cp2019 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[16:10:57] <icinga-wm>	 RECOVERY - puppet last run on cp2012 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[16:11:01] <icinga-wm>	 PROBLEM - puppet last run on cp2014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:11:15] <icinga-wm>	 RECOVERY - puppet last run on cp1079 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[16:11:35] <icinga-wm>	 PROBLEM - puppet last run on cp4022 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 16 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:11:39] <icinga-wm>	 RECOVERY - puppet last run on cp1077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:11:41] <icinga-wm>	 RECOVERY - puppet last run on cp3042 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[16:11:45] <icinga-wm>	 RECOVERY - puppet last run on cp3032 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[16:11:49] <icinga-wm>	 RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[16:12:03] <icinga-wm>	 PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:12:03] <icinga-wm>	 RECOVERY - puppet last run on cp2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:12:23] <icinga-wm>	 PROBLEM - puppet last run on cp1076 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:12:31] <icinga-wm>	 PROBLEM - puppet last run on cp2008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:12:43] <icinga-wm>	 RECOVERY - puppet last run on cp1089 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:13:37] <icinga-wm>	 RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:13:49] <icinga-wm>	 RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:13:55] <icinga-wm>	 RECOVERY - puppet last run on cp4031 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:13:55] <icinga-wm>	 RECOVERY - puppet last run on cp4030 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:13:59] <icinga-wm>	 RECOVERY - puppet last run on cp2007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:14:01] <icinga-wm>	 RECOVERY - puppet last run on cp5007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:14:11] <icinga-wm>	 RECOVERY - puppet last run on cp2006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:14:27] <wikibugs>	 (03PS2) 10Elukey: role::analytics_test_cluster::coordinator: add basic camus support [puppet] - 10https://gerrit.wikimedia.org/r/489243 (https://phabricator.wikimedia.org/T212259)
[16:14:29] <icinga-wm>	 RECOVERY - puppet last run on cp2010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:14:29] <icinga-wm>	 RECOVERY - puppet last run on cp4029 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:14:37] <icinga-wm>	 RECOVERY - puppet last run on cp4032 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:14:39] <icinga-wm>	 RECOVERY - puppet last run on cp1081 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:14:39] <icinga-wm>	 RECOVERY - puppet last run on cp1075 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[16:14:39] <icinga-wm>	 RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:14:41] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] Introduce systemd::slice::all_users [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) (owner: 10Elukey)
[16:14:43] <icinga-wm>	 RECOVERY - puppet last run on cp2004 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[16:15:11] <wikibugs>	 (03CR) 10Dr0ptp4kt: "Leaving a note for future self in case of similar changes." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/485171 (https://phabricator.wikimedia.org/T212197) (owner: 10Dr0ptp4kt)
[16:15:23] <icinga-wm>	 RECOVERY - puppet last run on cp1085 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[16:15:33] <icinga-wm>	 RECOVERY - puppet last run on cp4028 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[16:15:35] <icinga-wm>	 RECOVERY - puppet last run on cp5008 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[16:15:35] <icinga-wm>	 RECOVERY - puppet last run on cp5012 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[16:15:43] <icinga-wm>	 PROBLEM - puppet last run on cp2018 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:15:45] <icinga-wm>	 RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[16:15:45] <icinga-wm>	 RECOVERY - puppet last run on cp3030 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[16:15:53] <icinga-wm>	 RECOVERY - puppet last run on cp1087 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:16:07] <icinga-wm>	 RECOVERY - puppet last run on cp4027 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:16:15] <icinga-wm>	 RECOVERY - puppet last run on cp5009 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[16:16:15] <icinga-wm>	 PROBLEM - puppet last run on cp3039 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend]
[16:16:37] <icinga-wm>	 RECOVERY - puppet last run on cp5011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:18:22] <wikibugs>	 (03PS1) 10Muehlenhoff: prometheus::node_exporter: Change OS detection for buster [puppet] - 10https://gerrit.wikimedia.org/r/489246
[16:18:51] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts: ` cloudelastic1004.wikimedia.org ` The log can be f...
[16:18:57] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudelastic1004.wikimedia.org'] `  Of which those **FAILED**: ` ['cloudelastic1004.wikimedia.org'] `
[16:19:06] <bblack>	 dr0ptp4kt: it should be live everywhere
[16:19:23] <elukey>	 arturo: o/ - if you have time I can merge now and let you run puppet to verify that nothing looks weird
[16:19:29] <bblack>	 well, I say that, but some puppetfails still sticky, one last run
[16:19:32] <dr0ptp4kt>	 bblack: thx. i see https://translate.google.com/translate?sl=auto&tl=id&u=https%3A%2F%2Fsimple.wikipedia.org%2Fwiki%2FCholera redirecting as expected and i see googlebot not getting redirected, so that's a start
[16:19:56] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts: ` cloudelastic1004.wikimedia.org ` The log can be f...
[16:20:14] <arturo>	 elukey: ok
[16:21:34] <elukey>	 super
[16:21:47] <dr0ptp4kt>	 bblack: kafkacat doesn't even seem to be turning up hits for cp3039 - not sure if that's a symptom of where it is located in the topology or what
[16:21:55] <wikibugs>	 (03PS9) 10Elukey: Introduce systemd::slice::all_users [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824)
[16:22:11] <bblack>	 dr0ptp4kt: 3039 is one of the ones still failing puppet, it should catch up shortly
[16:23:23] <elukey>	 arturo: done!
[16:25:03] <icinga-wm>	 RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[16:25:13] <arturo>	 elukey: :-(
[16:25:15] <arturo>	 https://www.irccloud.com/pastebin/aARqo1B7/
[16:25:19] <icinga-wm>	 RECOVERY - puppet last run on cp1088 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:26:05] <icinga-wm>	 RECOVERY - puppet last run on cp2022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:26:21] <elukey>	 arturo: of course that is a class not a define!
[16:26:27] <icinga-wm>	 RECOVERY - puppet last run on cp5002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:26:32] <arturo>	 :-/ my fault elukey 
[16:26:47] <icinga-wm>	 RECOVERY - puppet last run on cp1090 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[16:26:53] <elukey>	 no no I was sloppy, lemme fix it
[16:26:54] <elukey>	 sorry
[16:27:05] <arturo>	 ok
[16:27:21] <icinga-wm>	 RECOVERY - puppet last run on cp4022 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:27:27] <icinga-wm>	 RECOVERY - puppet last run on cp1084 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:28:11] <icinga-wm>	 RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:29:33] <icinga-wm>	 RECOVERY - puppet last run on cp1080 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:29:47] <icinga-wm>	 RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[16:29:55] <icinga-wm>	 RECOVERY - puppet last run on cp4021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:30:21] <bblack>	 dr0ptp4kt: where are you checking kafkacat at?
[16:30:34] <dr0ptp4kt>	 bblack: stat1007
[16:30:37] <icinga-wm>	 RECOVERY - puppet last run on cp4025 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:30:53] <wikibugs>	 (03PS1) 10Elukey: profile::toolforge::bastion::resourcecontrol: fix class definition [puppet] - 10https://gerrit.wikimedia.org/r/489250
[16:31:37] <icinga-wm>	 RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:31:55] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] profile::toolforge::bastion::resourcecontrol: fix class definition [puppet] - 10https://gerrit.wikimedia.org/r/489250 (owner: 10Elukey)
[16:32:20] <elukey>	 arturo: should be better now
[16:33:17] <icinga-wm>	 RECOVERY - puppet last run on cp3049 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:34:13] <icinga-wm>	 RECOVERY - puppet last run on cp5003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:34:39] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251
[16:34:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251 (owner: 10Alexandros Kosiaris)
[16:35:21] <icinga-wm>	 RECOVERY - puppet last run on cp2005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[16:35:24] <arturo>	 elukey: indeed, thanks
[16:35:29] <arturo>	 https://www.irccloud.com/pastebin/QUQp3b8N/
[16:35:49] <icinga-wm>	 RECOVERY - puppet last run on cp5006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:37:14] <icinga-wm>	 RECOVERY - puppet last run on cp2014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:38:06] <icinga-wm>	 RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:38:18] <icinga-wm>	 RECOVERY - puppet last run on cp2011 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[16:38:28] <icinga-wm>	 RECOVERY - puppet last run on cp1076 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:38:34] <icinga-wm>	 RECOVERY - puppet last run on cp2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:41:44] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudelastic1004.wikimedia.org'] `  and were **ALL** successful.
[16:41:52] <icinga-wm>	 RECOVERY - puppet last run on cp2018 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:42:14] <icinga-wm>	 RECOVERY - puppet last run on cp3039 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:42:57] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Icinga passive checks go awal and downtime stops working - https://phabricator.wikimedia.org/T196336 (10Dzahn) a:05Dzahn→03None Was it really both "passive checks" and "downtime stopped working" at the same time or just one of them?
[16:44:32] <wikibugs>	 10Operations, 10Core Platform Team, 10MediaWiki-Database, 10Wikimedia-Logstash, and 2 others: MediaWiki errors overloading logtash - https://phabricator.wikimedia.org/T215611 (10fsero) my 2 cents. i think what @CDanis proposes seems the right approach, alternatively we could ratelimit kafka output using so...
[16:45:23] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudelastic1004: SMART/disk error - https://phabricator.wikimedia.org/T209029 (10aborrero) 05Open→03Resolved a:03aborrero Thanks @Cmjohnson and @MoritzMuehlenhoff, the server seems fine now:  ` aborrero@cloudelastic1004:~ $ sudo sm...
[16:47:31] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251
[16:47:43] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251 (owner: 10Alexandros Kosiaris)
[16:49:27] <wikibugs>	 (03CR) 10Fsero: [C: 04-1] "i think CI job did properly the job, fix the typo on codfw block pointing to eqiad and it might be better :)" (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/489251 (owner: 10Alexandros Kosiaris)
[16:49:56] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10Dzahn)  >>! In T214623#4938377, @RStallman-legalteam wrote: > Just confirming that the Master Serv...
[16:50:15] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10Dzahn) 05Stalled→03Open
[16:50:43] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251
[16:50:58] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251 (owner: 10Alexandros Kosiaris)
[16:51:16] <wikibugs>	 (03CR) 10Alexandros Kosiaris: Add kubernetes pods PTR records for IPv4 (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/489251 (owner: 10Alexandros Kosiaris)
[16:51:58] <wikibugs>	 (03PS1) 10Muehlenhoff: Don't install pxz on buster [puppet] - 10https://gerrit.wikimedia.org/r/489252
[16:53:18] <wikibugs>	 (03PS4) 10Alexandros Kosiaris: Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251
[16:53:31] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251 (owner: 10Alexandros Kosiaris)
[16:53:55] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10Gehel) >>! In T214623#4938690, @Dzahn wrote: > Thanks, will do. Did the date stay the same? >  >>>...
[16:54:58] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] prometheus::node_exporter: Change OS detection for buster [puppet] - 10https://gerrit.wikimedia.org/r/489246 (owner: 10Muehlenhoff)
[16:55:18] <wikibugs>	 (03CR) 10Ppchelko: [C: 04-1] "LGTM, however -1 for now because we can only deploy this after next week MW train." [puppet] - 10https://gerrit.wikimedia.org/r/489211 (https://phabricator.wikimedia.org/T214706) (owner: 10Bmansurov)
[16:55:26] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Pool rc slaves with higher weight to rebalance load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489200 (https://phabricator.wikimedia.org/T214720)
[16:55:28] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Depool db1099:s1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489253
[16:56:32] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251
[16:56:39] <wikibugs>	 (03PS4) 10Gehel: admins: create user with analytics-privatedata access for juliaglen [puppet] - 10https://gerrit.wikimedia.org/r/488120 (https://phabricator.wikimedia.org/T214623) (owner: 10Dzahn)
[16:56:46] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251 (owner: 10Alexandros Kosiaris)
[16:57:02] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] admins: create user with analytics-privatedata access for juliaglen [puppet] - 10https://gerrit.wikimedia.org/r/488120 (https://phabricator.wikimedia.org/T214623) (owner: 10Dzahn)
[17:03:35] <dr0ptp4kt>	 bblack: things are in order but for one minor edge case - use of the "Desktop" link when using the translation proxy on simple english (that 301s and then the next request necessarily doesn't contain the toggle_view_desktip, which in turn makes it go back to mobile). that's a rather rare behavior from what i see, although i will doublecheck. anyway, it's something we'll want working for regular english, so i'll tend to that
[17:03:57] <dr0ptp4kt>	 s/desktip/desktop/
[17:10:56] <wikibugs>	 (03CR) 10Dzahn: "> I'm quite a bit worried to see we are starting a "black list of bad words"" [puppet] - 10https://gerrit.wikimedia.org/r/487917 (https://phabricator.wikimedia.org/T201491) (owner: 10Dzahn)
[17:11:02] <icinga-wm>	 PROBLEM - Disk space on labmon1001 is CRITICAL: DISK CRITICAL - free space: /srv 81974 MB (3% inode=93%)
[17:11:39] <wikibugs>	 (03PS1) 10Dzahn: Revert "add some common typo words to CI checks" [puppet] - 10https://gerrit.wikimedia.org/r/489262
[17:11:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert "add some common typo words to CI checks" [puppet] - 10https://gerrit.wikimedia.org/r/489262 (owner: 10Dzahn)
[17:12:52] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10MW-1.33-notes (1.33.0-wmf.16; 2019-02-05), 10Patch-For-Review, and 3 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) @aaron Any more ideas about what could be...
[17:16:02] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] Helm chart for eventgate-analytics deployment (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/483035 (https://phabricator.wikimedia.org/T211247) (owner: 10Ottomata)
[17:17:11] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] Helm chart for eventgate-analytics deployment (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/483035 (https://phabricator.wikimedia.org/T211247) (owner: 10Ottomata)
[17:21:23] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] prometheus::node_exporter: Change OS detection for buster [puppet] - 10https://gerrit.wikimedia.org/r/489246 (owner: 10Muehlenhoff)
[17:21:57] <bblack>	 dr0ptp4kt: was that even working before the GT patch?
[17:23:16] <wikibugs>	 (03PS2) 10Dzahn: Revert "add some common typo words to CI checks" [puppet] - 10https://gerrit.wikimedia.org/r/489262
[17:23:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert "add some common typo words to CI checks" [puppet] - 10https://gerrit.wikimedia.org/r/489262 (owner: 10Dzahn)
[17:24:19] <wikibugs>	 (03PS5) 10Gehel: admins: create user with analytics-privatedata access for juliaglen [puppet] - 10https://gerrit.wikimedia.org/r/488120 (https://phabricator.wikimedia.org/T214623) (owner: 10Dzahn)
[17:24:23] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: src:prometheus-openstack-exporter: run wrap-and-sort [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489264
[17:24:25] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: src:prometheus-openstack-exporter: bump debhelper compat to 10 [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489265
[17:24:27] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: postinst: use non-existent home dir [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489266
[17:24:29] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: src:prometheus-openstack-exporter: bump std-versions to 3.9.8 [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489267
[17:24:31] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: src:prometheus-openstack-exporter: switch to python3 [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489268
[17:24:33] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: d/changelog: generate entry for 0.0.8-4 stretch-wikimedia [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489269 (https://phabricator.wikimedia.org/T215605)
[17:25:23] <wikibugs>	 (03PS2) 10Ayounsi: Icinga: add ping check for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/489113 (https://phabricator.wikimedia.org/T209101)
[17:25:24] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] src:prometheus-openstack-exporter: run wrap-and-sort [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489264 (owner: 10Arturo Borrero Gonzalez)
[17:25:37] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] src:prometheus-openstack-exporter: bump debhelper compat to 10 [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489265 (owner: 10Arturo Borrero Gonzalez)
[17:25:42] <dr0ptp4kt>	 bblack: yeah, as i recall it did work (and following the code paths suggests it did), and for the other domains outside of simple it's still working
[17:26:00] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] postinst: use non-existent home dir [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489266 (owner: 10Arturo Borrero Gonzalez)
[17:26:03] <dr0ptp4kt>	 i can check the web logs, though. it's just such a rare behavior to go through the proxy and then go to the desktop link
[17:26:09] <dr0ptp4kt>	 and yet it defies user intent
[17:26:19] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] src:prometheus-openstack-exporter: bump std-versions to 3.9.8 [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489267 (owner: 10Arturo Borrero Gonzalez)
[17:26:23] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Icinga: add ping check for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/489113 (https://phabricator.wikimedia.org/T209101) (owner: 10Ayounsi)
[17:26:34] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] src:prometheus-openstack-exporter: switch to python3 [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489268 (owner: 10Arturo Borrero Gonzalez)
[17:26:46] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] d/changelog: generate entry for 0.0.8-4 stretch-wikimedia [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489269 (https://phabricator.wikimedia.org/T215605) (owner: 10Arturo Borrero Gonzalez)
[17:27:05] <XioNoX>	 !log merge  Icinga: add ping check for ulsfo PDUs
[17:27:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:27:35] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: d/changelog: generate entry for 0.0.8-4 stretch-wikimedia [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489269 (https://phabricator.wikimedia.org/T215605)
[17:28:09] <wikibugs>	 (03CR) 10BBlack: [C: 03+1] "Seems sane.  Obviously clients will need to actually support SRV lookups explicitly!" [dns] - 10https://gerrit.wikimedia.org/r/489170 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey)
[17:30:20] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] admins: create user with analytics-privatedata access for juliaglen [puppet] - 10https://gerrit.wikimedia.org/r/488120 (https://phabricator.wikimedia.org/T214623) (owner: 10Dzahn)
[17:31:20] <wikibugs>	 (03Abandoned) 10Jcrespo: mariadb: Depool db1099:s1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489253 (owner: 10Jcrespo)
[17:31:43] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb: Pool rc slaves with higher weight to rebalance load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489200 (https://phabricator.wikimedia.org/T214720) (owner: 10Jcrespo)
[17:32:16] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] d/changelog: generate entry for 0.0.8-4 stretch-wikimedia [debs/prometheus-openstack-exporter] - 10https://gerrit.wikimedia.org/r/489269 (https://phabricator.wikimedia.org/T215605) (owner: 10Arturo Borrero Gonzalez)
[17:32:56] <wikibugs>	 (03Merged) 10jenkins-bot: mariadb: Pool rc slaves with higher weight to rebalance load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489200 (https://phabricator.wikimedia.org/T214720) (owner: 10Jcrespo)
[17:34:48] <wikibugs>	 (03PS6) 10Dzahn: admins: create user with analytics-privatedata access for juliaglen [puppet] - 10https://gerrit.wikimedia.org/r/488120 (https://phabricator.wikimedia.org/T214623)
[17:34:58] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Icinga passive checks go awal and downtime stops working - https://phabricator.wikimedia.org/T196336 (10jcrespo) The thing that we knew is "passive checks awol", then restart, then gone. We didn't test downtiming.
[17:35:21] <wikibugs>	 (03PS1) 10Papaul: DNS: Remove mgmt DNS for mw2213 [dns] - 10https://gerrit.wikimedia.org/r/489271 (https://phabricator.wikimedia.org/T203434)
[17:37:33] <wikibugs>	 (03CR) 10jenkins-bot: mariadb: Pool rc slaves with higher weight to rebalance load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489200 (https://phabricator.wikimedia.org/T214720) (owner: 10Jcrespo)
[17:38:40] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: db1114 crashed - https://phabricator.wikimedia.org/T214720 (10jcrespo) Transfer of 1h and 20m, probably sped up because I stopped replication (avoiding to replay many changes).
[17:40:37] <wikibugs>	 (03PS2) 10Papaul: DNS: Remove mgmt DNS for mw2213 [dns] - 10https://gerrit.wikimedia.org/r/489271 (https://phabricator.wikimedia.org/T203434)
[17:42:19] <mutante>	 !log graceful reload of apache on phabricator prod server (phab1001)
[17:42:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:43:45] <icinga-wm>	 PROBLEM - puppet last run on people1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[all-users_ensure_members]
[17:43:51] <icinga-wm>	 PROBLEM - puppet last run on notebook1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[analytics-privatedata-users_ensure_members]
[17:44:02] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] DNS: Remove mgmt DNS for mw2213 [dns] - 10https://gerrit.wikimedia.org/r/489271 (https://phabricator.wikimedia.org/T203434) (owner: 10Papaul)
[17:44:51] <wikibugs>	 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review: Decom mw2213 - https://phabricator.wikimedia.org/T203434 (10Papaul)
[17:46:01] <wikibugs>	 10Operations, 10Mail, 10Phabricator, 10serviceops, and 2 others: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (10greg)
[17:46:05] <wikibugs>	 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review: Decom mw2213 - https://phabricator.wikimedia.org/T203434 (10Papaul) 05Open→03Resolved complete
[17:46:19] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "> Patch Set 1:" [dns] - 10https://gerrit.wikimedia.org/r/489170 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey)
[17:47:43] <mutante>	 !log phab1001 - restarting apache2 service for library upgrade
[17:47:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:49:07] <arturo>	 !log T215605 add prometheus-openstack-exporter 0.0.8-4 to stretch-wikimedia
[17:49:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:49:10] <stashbot>	 T215605: cloudvps: missing packages in stretch for cloudcontrol servers - https://phabricator.wikimedia.org/T215605
[17:49:43] <wikibugs>	 10Operations, 10ops-codfw: ms-be2030 spontaneous reboot - https://phabricator.wikimedia.org/T204567 (10Papaul) Checked temperature in the rack all looks good.  add blanks to the rack since we have only 8 servers in that rack. Leaving the task open for another week.
[17:50:53] <icinga-wm>	 PROBLEM - puppet last run on stat1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[analytics-privatedata-users_ensure_members]
[17:51:33] <icinga-wm>	 PROBLEM - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[analytics-privatedata-users_ensure_members]
[17:52:03] <mutante>	 !log phab1001 - restarting phd service
[17:52:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:52:31] <icinga-wm>	 PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 10 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[analytics-privatedata-users_ensure_members]
[17:53:50] <mutante>	 these failures are a new race condition when adding new shell users
[17:53:57] <mutante>	 and fixes itself after the second puppet run
[17:54:15] <mutante>	 it tries to ensure all users are members of the all-users group... before a user has been created
[17:54:29] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: add keystone support for mitaka/stretch in cloudcontrol servers [puppet] - 10https://gerrit.wikimedia.org/r/489275 (https://phabricator.wikimedia.org/T215407)
[17:54:48] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10Nuria) Contract was signed so we should be good to go here. Dates are unchanged.
[17:54:59] <mutante>	 !log phab1001 - restart aphlict service
[17:55:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:55:04] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] openstack: add keystone support for mitaka/stretch in cloudcontrol servers [puppet] - 10https://gerrit.wikimedia.org/r/489275 (https://phabricator.wikimedia.org/T215407) (owner: 10Arturo Borrero Gonzalez)
[17:55:11] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10Nuria) @Julia.glen to confirm she has access and ticket can be closed.
[17:57:39] <icinga-wm>	 PROBLEM - puppet last run on bast4002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[all-users_ensure_members]
[17:58:05] <icinga-wm>	 PROBLEM - puppet last run on analytics1032 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[analytics-privatedata-users_ensure_members]
[17:58:15] <mutante>	 you will see it recover on stat1004/notebook1003 now because i ran puppet
[17:59:25] <icinga-wm>	 PROBLEM - puppet last run on an-master1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[analytics-privatedata-users_ensure_members]
[18:00:04] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10Dzahn) 05Open→03Resolved puppet is creating her user on all the relevant servers right now. in...
[18:00:09] <jynus>	 mutante: re: icinga, I don't expect there is anything actionable, but I think it was important to note it on the ticket
[18:00:35] <mutante>	 jynus: ACK, thanks for adding that. makes sense
[18:00:57] <wikibugs>	 10Operations, 10ops-codfw, 10decommission: Decommission baham - https://phabricator.wikimedia.org/T199247 (10Papaul)
[18:02:07] <icinga-wm>	 RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[18:02:55] <icinga-wm>	 RECOVERY - puppet last run on bast4002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[18:03:01] <icinga-wm>	 PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[all-users_ensure_members]
[18:03:07] <icinga-wm>	 RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[18:03:45] <mutante>	 does anyone know when we added the "ensure all users are in the special allusers group" thing 
[18:03:53] <wikibugs>	 (03PS6) 10Alexandros Kosiaris: Add kubernetes pods PTR records for IPv4 [dns] - 10https://gerrit.wikimedia.org/r/489251
[18:03:55] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Run zole validation on generated zonefiles [dns] - 10https://gerrit.wikimedia.org/r/489277
[18:04:30] <mutante>	 it seems like we should have gotten this before .. lots of users were created
[18:04:41] <icinga-wm>	 RECOVERY - puppet last run on an-master1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[18:06:39] <icinga-wm>	 PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[all-users_ensure_members]
[18:08:52] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase weight for s1 rc slaves (duration: 00m 49s)
[18:08:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:18] <jynus>	 mutante: I added some users not long ago, I don't remember that being a thing
[18:09:53] <mutante>	 jynus: ok.. hmm. it's odd it seems consistent now, but it's just a race
[18:09:54] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): "> This is not about compiling a list of bad words." [puppet] - 10https://gerrit.wikimedia.org/r/487917 (https://phabricator.wikimedia.org/T201491) (owner: 10Dzahn)
[18:10:07] <icinga-wm>	 RECOVERY - puppet last run on people1001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[18:10:15] <icinga-wm>	 RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[18:10:19] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Revert "add some common typo words to CI checks" [puppet] - 10https://gerrit.wikimedia.org/r/489262 (owner: 10Dzahn)
[18:10:21] <mutante>	 there it goes
[18:11:15] <icinga-wm>	 PROBLEM - puppet last run on bast5001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[all-users_ensure_members]
[18:11:29] <icinga-wm>	 PROBLEM - puppet last run on an-master1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[analytics-privatedata-users_ensure_members]
[18:11:53] <icinga-wm>	 RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[18:14:55] <gtirloni>	 !log T213527 graphite2002 disabled puppet and commented prometheus_puppet_agent_stats cronjob due to cronspam
[18:14:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:58] <stashbot>	 T213527: Prepare our base system layer for Debian buster - https://phabricator.wikimedia.org/T213527
[18:15:12] <wikibugs>	 (03PS3) 10Dzahn: Revert "add some common typo words to CI checks" [puppet] - 10https://gerrit.wikimedia.org/r/489262
[18:18:56] <wikibugs>	 (03CR) 10MarcoAurelio: "I find rather offensive the questioning of the morality of the (volunteer in my case) time that I and others have spent fixing those typos" [puppet] - 10https://gerrit.wikimedia.org/r/487917 (https://phabricator.wikimedia.org/T201491) (owner: 10Dzahn)
[18:20:53] <wikibugs>	 (03CR) 10Dzahn: "> But it is. The effect of this list is not like in a word processor that highlights possible mistakes, and let's the user decide. The eff" [puppet] - 10https://gerrit.wikimedia.org/r/487917 (https://phabricator.wikimedia.org/T201491) (owner: 10Dzahn)
[18:21:38] <wikibugs>	 (03PS4) 10Dzahn: Revert "add some common typo words to CI checks" [puppet] - 10https://gerrit.wikimedia.org/r/489262
[18:22:04] <icinga-wm>	 RECOVERY - puppet last run on stat1007 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[18:22:12] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review: cloudstore100{8,9} - Upgrade to 10GbE - https://phabricator.wikimedia.org/T214079 (10GTirloni) @RobH @Cmjohnson thanks a lot for this, really appreciate the effort.  ` ------------------------------------------------------------ Server listening on TCP port 5001...
[18:22:19] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review: cloudstore100{8,9} - Upgrade to 10GbE - https://phabricator.wikimedia.org/T214079 (10GTirloni) 05Open→03Resolved
[18:22:43] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Enable notifications for db1118 [puppet] - 10https://gerrit.wikimedia.org/r/489280 (https://phabricator.wikimedia.org/T214720)
[18:22:56] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] Revert "add some common typo words to CI checks" [puppet] - 10https://gerrit.wikimedia.org/r/489262 (owner: 10Dzahn)
[18:23:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: Enable notifications for db1118 [puppet] - 10https://gerrit.wikimedia.org/r/489280 (https://phabricator.wikimedia.org/T214720) (owner: 10Jcrespo)
[18:23:58] <icinga-wm>	 RECOVERY - puppet last run on analytics1032 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[18:26:26] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Introduce and pool db1118 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489281 (https://phabricator.wikimedia.org/T214720)
[18:28:11] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Pool db1118 with full weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489282 (https://phabricator.wikimedia.org/T214720)
[18:28:42] <icinga-wm>	 RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[18:37:16] <icinga-wm>	 RECOVERY - puppet last run on an-master1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[18:40:14] <wikibugs>	 (03CR) 10Paladox: "I don't see the need to have reverted this." [puppet] - 10https://gerrit.wikimedia.org/r/489262 (owner: 10Dzahn)
[18:42:16] <icinga-wm>	 RECOVERY - puppet last run on bast5001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[18:49:07] <wikibugs>	 (03PS1) 10BBlack: zone_validator: require -z argument zones dir [dns] - 10https://gerrit.wikimedia.org/r/489287
[18:49:09] <wikibugs>	 (03PS1) 10BBlack: deploy-check: integrate other checks, no-gdnsd opt [dns] - 10https://gerrit.wikimedia.org/r/489288
[18:49:11] <wikibugs>	 (03PS1) 10BBlack: update README and run-tests.sh [dns] - 10https://gerrit.wikimedia.org/r/489289
[18:49:28] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] zone_validator: require -z argument zones dir [dns] - 10https://gerrit.wikimedia.org/r/489287 (owner: 10BBlack)
[18:49:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] deploy-check: integrate other checks, no-gdnsd opt [dns] - 10https://gerrit.wikimedia.org/r/489288 (owner: 10BBlack)
[18:49:43] <wikibugs>	 (03PS1) 10Bstorm: toolforge: Use a really old version of kubectl for the current k8s [puppet] - 10https://gerrit.wikimedia.org/r/489291 (https://phabricator.wikimedia.org/T215586)
[18:49:53] <wikibugs>	 (03PS1) 10BBlack: authdns-local-update: update deploy-check.py args [puppet] - 10https://gerrit.wikimedia.org/r/489292
[18:50:42] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] toolforge: Use a really old version of kubectl for the current k8s [puppet] - 10https://gerrit.wikimedia.org/r/489291 (https://phabricator.wikimedia.org/T215586) (owner: 10Bstorm)
[18:50:52] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: db1114 crashed - https://phabricator.wikimedia.org/T214720 (10jcrespo) Except for the above 3 patches, db1118 should be ready to go (not done so late in the week for obvious reasons).
[18:52:45] <wikibugs>	 (03PS2) 10Bstorm: toolforge: Use a really old version of kubectl for the current k8s [puppet] - 10https://gerrit.wikimedia.org/r/489291 (https://phabricator.wikimedia.org/T215586)
[18:54:25] <wikibugs>	 (03PS2) 10BBlack: zone_validator: require -z argument zones dir [dns] - 10https://gerrit.wikimedia.org/r/489287
[18:54:27] <wikibugs>	 (03PS2) 10BBlack: deploy-check: integrate other checks, no-gdnsd opt [dns] - 10https://gerrit.wikimedia.org/r/489288
[18:54:29] <wikibugs>	 (03PS2) 10BBlack: update README and run-tests.sh [dns] - 10https://gerrit.wikimedia.org/r/489289
[19:06:50] <wikibugs>	 (03CR) 10Mobrovac: "Idem as Petr, LGTM, but we have to wait." [puppet] - 10https://gerrit.wikimedia.org/r/489211 (https://phabricator.wikimedia.org/T214706) (owner: 10Bmansurov)
[19:09:39] <wikibugs>	 (03CR) 10Mobrovac: "> Would a comment instead of a removal work then?" [deployment-charts] - 10https://gerrit.wikimedia.org/r/488800 (owner: 10Alexandros Kosiaris)
[19:15:03] <wikibugs>	 (03PS1) 10Andrew Bogott: Cloud vms: enable a default tty [puppet] - 10https://gerrit.wikimedia.org/r/489299 (https://phabricator.wikimedia.org/T215211)
[19:15:36] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Cloud vms: enable a default tty [puppet] - 10https://gerrit.wikimedia.org/r/489299 (https://phabricator.wikimedia.org/T215211) (owner: 10Andrew Bogott)
[19:16:18] <wikibugs>	 (03PS3) 10Volans: sre.hosts: add decommission cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/487982 (https://phabricator.wikimedia.org/T205886)
[19:16:54] <wikibugs>	 (03PS2) 10Andrew Bogott: Cloud vms: enable a default tty [puppet] - 10https://gerrit.wikimedia.org/r/489299 (https://phabricator.wikimedia.org/T215211)
[19:17:26] <wikibugs>	 (03CR) 10BryanDavis: nova: add wmcs-rescue-console.sh to compute hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/489230 (https://phabricator.wikimedia.org/T215211) (owner: 10Andrew Bogott)
[19:19:55] <wikibugs>	 10Operations, 10Core Platform Team, 10MediaWiki-Database, 10Wikimedia-Logstash, and 2 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10Reedy)
[19:24:26] <wikibugs>	 (03PS4) 10Dzahn: mediawiki/scap: do not install sql scripts on canary appservers [puppet] - 10https://gerrit.wikimedia.org/r/479142 (https://phabricator.wikimedia.org/T211512)
[19:30:39] <wikibugs>	 10Operations, 10Traffic, 10Core Platform Team Backlog (Designing), 10Patch-For-Review, and 5 others: Harmonise the identification of requests across our stack - https://phabricator.wikimedia.org/T201409 (10Anomie) A discussion about later plans for this task was accidentally started in code review. Copying...
[19:32:57] <wikibugs>	 10Operations, 10Traffic, 10Core Platform Team Backlog (Designing), 10Patch-For-Review, and 5 others: Harmonise the identification of requests across our stack - https://phabricator.wikimedia.org/T201409 (10Anomie) >>! In [[https://gerrit.wikimedia.org/r/c/mediawiki/core/+/487544#message-3709f9fc817ed264ba9...
[19:39:19] <wikibugs>	 (03PS1) 10Krinkle: Set MW_NO_SESSION for various entry points in w/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489302
[19:39:26] <wikibugs>	 (03CR) 10Dzahn: "@Effie i amended to a new version that simplifies and avoids creating a new profile. i now remember why i did that, i wanted to avoid havi" [puppet] - 10https://gerrit.wikimedia.org/r/479142 (https://phabricator.wikimedia.org/T211512) (owner: 10Dzahn)
[19:39:37] <wikibugs>	 (03CR) 10Volans: [C: 03+2] sre.hosts: add decommission cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/487982 (https://phabricator.wikimedia.org/T205886) (owner: 10Volans)
[19:40:18] <wikibugs>	 (03CR) 10Krinkle: "E.g. https://performance.wikimedia.org/xenon/svgs/daily/2019-02-07.touch.svgz shows that "MediaWiki\Session\SessionManager::getSessionFrom" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489302 (owner: 10Krinkle)
[19:40:49] <wikibugs>	 (03CR) 10Krinkle: "https://performance.wikimedia.org/xenon/svgs/daily/2019-02-07.favicon.svgz" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489302 (owner: 10Krinkle)
[19:41:45] <wikibugs>	 (03Merged) 10jenkins-bot: sre.hosts: add decommission cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/487982 (https://phabricator.wikimedia.org/T205886) (owner: 10Volans)
[19:42:38] <wikibugs>	 (03CR) 10Krinkle: [C: 03+2] Set MW_NO_SESSION for various entry points in w/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489302 (owner: 10Krinkle)
[19:43:26] <volans|off>	 bblack: I've not yet read the whole backlog but the last ~24h of puppet run logs are always available in puppetboard
[19:43:47] <wikibugs>	 (03Merged) 10jenkins-bot: Set MW_NO_SESSION for various entry points in w/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489302 (owner: 10Krinkle)
[19:43:54] <wikibugs>	 (03CR) 10Anomie: [C: 03+1] Set MW_NO_SESSION for various entry points in w/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489302 (owner: 10Krinkle)
[19:44:20] <Krinkle>	 anomie: thx, just testing on mwdebug for now.
[19:44:45] * Krinkle staging on mwdebug1002
[19:46:07] <Krinkle>	 and they all still return 200 OK with the same response as without XWD, so I'll sync it out
[19:47:47] <logmsgbot>	 !log krinkle@deploy1001 Synchronized w/extract2.php: Ia1e610a5f (duration: 00m 48s)
[19:47:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:34] <logmsgbot>	 !log krinkle@deploy1001 Synchronized w/favicon.php: Ia1e610a5f (duration: 00m 46s)
[19:48:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:06] <wikibugs>	 10Operations, 10Core Platform Team, 10MediaWiki-Database, 10Wikimedia-Logstash, and 2 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10Marostegui)
[19:49:21] <logmsgbot>	 !log krinkle@deploy1001 Synchronized w/robots.php: Ia1e610a5f (duration: 00m 46s)
[19:49:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:50:08] <logmsgbot>	 !log krinkle@deploy1001 Synchronized w/touch.php: Ia1e610a5f (duration: 00m 46s)
[19:50:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:02] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): "> I don't see the need to have reverted this." [puppet] - 10https://gerrit.wikimedia.org/r/489262 (owner: 10Dzahn)
[19:53:45] <wikibugs>	 (03CR) 10jenkins-bot: Set MW_NO_SESSION for various entry points in w/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/489302 (owner: 10Krinkle)
[19:57:36] <wikibugs>	 (03CR) 10Krinkle: "To recap: We're continuing to use typos in the puppet repo, as before, for the purpose of common mistakes in normative code (i.e. statemen" [puppet] - 10https://gerrit.wikimedia.org/r/487917 (https://phabricator.wikimedia.org/T201491) (owner: 10Dzahn)
[20:01:31] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): "> […] questioning of the morality of the (volunteer in my case) time that I and others have spent fixing those typos […]" [puppet] - 10https://gerrit.wikimedia.org/r/487917 (https://phabricator.wikimedia.org/T201491) (owner: 10Dzahn)
[20:03:34] <wikibugs>	 (03CR) 10Krinkle: "@Thiemo It is not. You may be looking for grunt-tyops, https://github.com/wikimedia/grunt-tyops." [puppet] - 10https://gerrit.wikimedia.org/r/487917 (https://phabricator.wikimedia.org/T201491) (owner: 10Dzahn)
[20:05:03] <wikibugs>	 (03CR) 10Paladox: "> > I don't see the need to have reverted this." [puppet] - 10https://gerrit.wikimedia.org/r/489262 (owner: 10Dzahn)
[20:08:49] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] prometheus::node_exporter: Change OS detection for buster [puppet] - 10https://gerrit.wikimedia.org/r/489246 (owner: 10Muehlenhoff)
[20:08:57] <wikibugs>	 (03PS2) 10Cwhite: prometheus::node_exporter: Change OS detection for buster [puppet] - 10https://gerrit.wikimedia.org/r/489246 (owner: 10Muehlenhoff)
[20:09:08] <wikibugs>	 10Operations, 10ExternalGuidance, 10Traffic, 10Patch-For-Review: Deliver mobile-based version for automatic translations - https://phabricator.wikimedia.org/T212197 (10dr0ptp4kt) Hi team.  Okay, this is activated for Simple English as the source wiki.  Thank you so much @BBlack!  I'll prepare two patches n...
[20:15:52] <wikibugs>	 (03PS6) 10Zoranzoki21: Set wgRestrictionLevels for all Serbian projects to autoconfirmed, autopatrol and sysop [mediawiki-config] - 10https://gerrit.wikimedia.org/r/485903
[20:16:52] <wikibugs>	 (03CR) 10Ottomata: Helm chart for eventgate-analytics deployment (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/483035 (https://phabricator.wikimedia.org/T211247) (owner: 10Ottomata)
[20:17:01] <wikibugs>	 10Operations, 10monitoring, 10Goal, 10Patch-For-Review: Upgrade production prometheus-node-exporter to >= 0.16 - https://phabricator.wikimedia.org/T213708 (10colewhite)
[20:17:51] <wikibugs>	 (03CR) 10MarcoAurelio: "Hi," [puppet] - 10https://gerrit.wikimedia.org/r/487917 (https://phabricator.wikimedia.org/T201491) (owner: 10Dzahn)
[20:18:31] <wikibugs>	 (03PS4) 10Herron: lists:warn if unknown host issues mail from cmd containing our domain [puppet] - 10https://gerrit.wikimedia.org/r/488602 (https://phabricator.wikimedia.org/T215251)
[20:21:07] <wikibugs>	 (03CR) 10Herron: [C: 03+2] lists:warn if unknown host issues mail from cmd containing our domain [puppet] - 10https://gerrit.wikimedia.org/r/488602 (https://phabricator.wikimedia.org/T215251) (owner: 10Herron)
[20:22:31] <wikibugs>	 (03PS7) 10Zoranzoki21: Set wgRestrictionLevels for all Serbian projects to autoconfirmed, autopatrol, patroller, rollbacker and sysop [mediawiki-config] - 10https://gerrit.wikimedia.org/r/485903 (https://phabricator.wikimedia.org/T215653)
[20:29:37] <wikibugs>	 (03PS4) 10Ottomata: Refactor mysql::config::client to mariadb::config::client [puppet] - 10https://gerrit.wikimedia.org/r/482693 (https://phabricator.wikimedia.org/T162070)
[20:31:15] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Refactor mysql::config::client to mariadb::config::client [puppet] - 10https://gerrit.wikimedia.org/r/482693 (https://phabricator.wikimedia.org/T162070) (owner: 10Ottomata)
[20:32:32] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Ottomata)
[20:44:16] <wikibugs>	 (03PS1) 10Ottomata: Remove role::wikimetrics::staging [puppet] - 10https://gerrit.wikimedia.org/r/489314 (https://phabricator.wikimedia.org/T162070)
[20:44:56] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Remove role::wikimetrics::staging [puppet] - 10https://gerrit.wikimedia.org/r/489314 (https://phabricator.wikimedia.org/T162070) (owner: 10Ottomata)
[20:45:31] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Ottomata)
[20:46:49] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Ottomata) a:05Ottomata→03Dzahn Thanks Daniel!  The Analytics usages are gone.  I'm ass...
[20:49:40] <wikibugs>	 (03CR) 10Ottomata: "Same, thanks yall!" [puppet] - 10https://gerrit.wikimedia.org/r/489211 (https://phabricator.wikimedia.org/T214706) (owner: 10Bmansurov)
[21:13:41] <mutante>	 ottomata: :) thanks!
[21:16:55] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:17:39] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m3 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 450.80 seconds
[21:17:41] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 451.62 seconds
[21:17:49] * gehel is looking at wdqs1005
[21:18:55] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1005 is OK: OK - running: The system is fully operational
[21:19:12] <ottomata>	 sho thang mutante :)
[21:19:47] <wikibugs>	 (03PS3) 10Reedy: Stop breaking blame for wikimedia special cases [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473487
[21:19:51] <Reedy>	 jouncebot: now
[21:19:52] <jouncebot>	 No deployments scheduled for the next 61 hour(s) and 10 minute(s)
[21:19:54] <Reedy>	 jouncebot: next
[21:19:54] <jouncebot>	 In 61 hour(s) and 10 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190211T1030)
[21:22:58] <wikibugs>	 (03PS4) 10Reedy: Stop breaking blame for wikimedia special cases [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473487
[21:23:08] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] Stop breaking blame for wikimedia special cases [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473487 (owner: 10Reedy)
[21:23:53] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on db2078 is OK: OK slave_sql_lag Replication lag: 27.68 seconds
[21:23:55] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 21.57 seconds
[21:23:59] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: EDAC events not being reported by node-exporter? - https://phabricator.wikimedia.org/T214529 (10BBlack) >>! In T214529#4936555, @CDanis wrote: >>>> Corrected errors are normal and expected to occur on healthy >>>> hardware. They do not need user's attention unt...
[21:24:14] <wikibugs>	 (03Merged) 10jenkins-bot: Stop breaking blame for wikimedia special cases [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473487 (owner: 10Reedy)
[21:24:57] <wikibugs>	 (03CR) 10jenkins-bot: Stop breaking blame for wikimedia special cases [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473487 (owner: 10Reedy)
[21:25:59] <logmsgbot>	 !log reedy@deploy1001 Synchronized multiversion/MWMultiVersion.php: Move variable (duration: 00m 49s)
[21:26:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:29:21] <wikibugs>	 (03CR) 10Volans: "I'm not very familiar with our puppet tests, I can have a more deeper look next week. But CI seems to be happy :)" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275) (owner: 10Jbond)
[21:36:16] <wikibugs>	 (03PS15) 10Jbond: Improve CI checks to ensure a basic catalogue compiles on all supported OS's [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275)
[21:37:21] <wikibugs>	 (03CR) 10Jbond: "thanks" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275) (owner: 10Jbond)
[21:37:42] <wikibugs>	 (03PS16) 10Jbond: Improve CI checks to ensure a basic catalogue compiles on all supported OS's [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275)
[21:42:21] <wikibugs>	 10Puppet, 10Cloud-VPS, 10serviceops: upgrade simplelap / simpelap classes (apache -> httpd and mysql -> mariadb) or deprecate them - https://phabricator.wikimedia.org/T215662 (10Dzahn)
[21:44:08] <wikibugs>	 10Puppet, 10Cloud-VPS, 10serviceops: upgrade simplelap / simpelap classes (apache -> httpd and mysql -> mariadb) or deprecate them - https://phabricator.wikimedia.org/T215662 (10Dzahn)
[21:45:32] <wikibugs>	 10Puppet, 10Cloud-VPS, 10serviceops: upgrade simplelap / simpelap classes (apache -> httpd and mysql -> mariadb) or deprecate them - https://phabricator.wikimedia.org/T215662 (10Dzahn)
[21:45:49] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 297.02 seconds
[21:48:11] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 04-1] "This patch gets the terminal up but the 'autologin root' bit does nothing at all.  I've tried a bunch of variations of this patch to no av" [puppet] - 10https://gerrit.wikimedia.org/r/489299 (https://phabricator.wikimedia.org/T215211) (owner: 10Andrew Bogott)
[21:48:28] <wikibugs>	 10Puppet, 10Cloud-VPS, 10serviceops: upgrade simplelap / simplelamp classes (apache -> httpd and mysql -> mariadb) or deprecate them - https://phabricator.wikimedia.org/T215662 (10Dzahn)
[21:56:22] <wikibugs>	 10Puppet, 10Cloud-VPS, 10serviceops: upgrade simplelap / simplelamp classes (apache -> httpd and mysql -> mariadb) or deprecate them - https://phabricator.wikimedia.org/T215662 (10Dzahn)
[22:15:43] <wikibugs>	 10Puppet, 10Cloud-VPS, 10serviceops: upgrade simplelamp class (apache -> httpd and mysql -> mariadb) or deprecate it - https://phabricator.wikimedia.org/T215662 (10Dzahn)
[22:16:54] <wikibugs>	 (03PS1) 10Cwhite: prometheus: post-upgrade node-exporter cleanup [puppet] - 10https://gerrit.wikimedia.org/r/489325 (https://phabricator.wikimedia.org/T213708)
[22:18:51] <wikibugs>	 (03PS1) 10Dzahn: convert simplelamp from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/489326 (https://phabricator.wikimedia.org/T215662)
[22:22:56] <wikibugs>	 (03PS1) 10Dzahn: convert simplelamp from mysql to mariadb [puppet] - 10https://gerrit.wikimedia.org/r/489328 (https://phabricator.wikimedia.org/T215662)
[22:26:33] <wikibugs>	 (03CR) 10Dzahn: "p" [puppet] - 10https://gerrit.wikimedia.org/r/489326 (https://phabricator.wikimedia.org/T215662) (owner: 10Dzahn)
[22:27:35] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Dzahn)
[22:28:15] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Dzahn)
[22:30:19] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Dzahn) >>! In T162070#4939472, @Ottomata wrote: > Thanks Daniel!  The Analytics usages are...
[22:56:35] <Reedy>	 !log running `refreshImageMetadata.php --mediatype BITMAP --mime image/vnd.djvu` against commonswiki on mwmaint1002 T215635
[22:56:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:38] <stashbot>	 T215635: Run refreshImageMetadata.php for new media type of DjVu files - https://phabricator.wikimedia.org/T215635
[23:03:47] <James_F>	 Reedy: Hah.
[23:04:01] <James_F>	 Reedy: I was planning to do that next week, but OK.
[23:04:21] <Reedy>	 It took a lot of my time to do it :P
[23:04:54] * James_F grins.
[23:05:03] <James_F>	 Also, doesn't that want a ForEachWiki?
[23:05:42] <Reedy>	 Probably
[23:05:52] <Reedy>	 But commons is going to have most of them presumably...
[23:06:15] <James_F>	 Indeed. And given that this is for the SDC search feature, it's rather lower value elsewhere.
[23:06:26] <James_F>	 Reedy: Do you know how to write an update.php call for this?
[23:07:04] <Reedy>	 runMaintenance
[23:07:08] <wikibugs>	 10Operations: Reset Wikitech 2FA access for MarkAHershberger - https://phabricator.wikimedia.org/T215676 (10MarkAHershberger)
[23:07:10] <Reedy>	 I don't know about passing parameters though
[23:07:14] * James_F looks.
[23:07:42] <Reedy>	 I guess DatabaseUpdater::$postDatabaseUpdateMaintenance technically
[23:08:18] <James_F>	 DatabaseUpdater::runMaintenance doesn't take options, yeah.
[23:08:18] <Reedy>	 So it might be a simple wrapper function needed
[23:09:20] <James_F>	 Yeah. Also refreshImageMetadata doesn't extend LoggedUpdateMaintenance so…
[23:09:53] <James_F>	 Maybe I should make a RefreshImageMetadataForUpdate class that does it?
[23:13:30] <wikibugs>	 10Operations, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Reset Wikitech 2FA access for MarkAHershberger - https://phabricator.wikimedia.org/T215676 (10bd808) 05Open→03Resolved a:03bd808 ` $ ssh bastion-eqiad1-01.bastion.eqiad.wmflabs # ls -alh /home/mah/Please-reset-wikitech-2fa.txt -rw-...
[23:19:21] <Reedy>	 Finished refreshing file metadata for 106634 files. 0 were refreshed, 106634 were already up to date, and 0 refreshes were suspicious.
[23:19:28] <Reedy>	 James_F: Think we need a force?
[23:19:47] * Reedy looks at the db
[23:20:08] <James_F>	 Reedy: Oh, maybe, yeah, "Reload metadata from file even if the metadata looks ok"
[23:20:24] <wikibugs>	 10Puppet, 10Cloud-VPS, 10serviceops, 10Patch-For-Review: upgrade simplelamp class (apache -> httpd and mysql -> mariadb) or deprecate it - https://phabricator.wikimedia.org/T215662 (10Slevinski) Thanks for the heads up.  The SignWriting project is active and the two websites I administer are needed.  I che...
[23:20:26] <James_F>	 Do we really only have 11k DjVu files on all of Commons?
[23:20:44] <Reedy>	 106K?
[23:21:00] <James_F>	 Oh, right. That's plausible.
[23:21:02] <Reedy>	 heh
[23:21:59] <James_F>	 I guess if we think file metadata might change from run to run, we should just force-run it for all files on update.php?
[23:22:09] <Reedy>	 wikiadmin@10.64.48.11(commonswiki)> select img_media_type from image where img_name = 'Capital_by_Marx,_Karl.djvu'\G
[23:22:09] <Reedy>	 *************************** 1. row ***************************
[23:22:09] <Reedy>	 img_media_type: BITMAP
[23:22:09] <Reedy>	 1 row in set (0.00 sec)
[23:23:24] <Reedy>	 running `refreshImageMetadata.php --mediatype BITMAP --mime image/vnd.djvu --force` against commonswiki on mwmaint1002 T215635 (this time we mean it)
[23:23:25] <stashbot>	 T215635: Run refreshImageMetadata.php for new media type of DjVu files - https://phabricator.wikimedia.org/T215635
[23:23:32] <Reedy>	 !log running `refreshImageMetadata.php --mediatype BITMAP --mime image/vnd.djvu --force` against commonswiki on mwmaint1002 T215635 (this time we mean it)
[23:23:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:28:26] <Reedy>	 quite a bit slower this time
[23:34:56] <James_F>	 Well, you are doing DB writes this time around, so yeah.
[23:47:09] <wikibugs>	 10Operations, 10Traffic, 10Core Platform Team Backlog (Designing), 10MW-1.33-notes (1.33.0-wmf.17; 2019-02-12), and 6 others: Harmonise the identification of requests across our stack - https://phabricator.wikimedia.org/T201409 (10mobrovac) >>! In T201409#4939220, @Anomie wrote: > Defaulted to "never accep...
[23:59:02] <wikibugs>	 10Puppet, 10Cloud-VPS, 10serviceops, 10Patch-For-Review: upgrade simplelamp class (apache -> httpd and mysql -> mariadb) or deprecate it - https://phabricator.wikimedia.org/T215662 (10Dzahn) @Slevinski Not yet, i would warn you before i actually merge any changes. But be aware this role is currently broken...