[00:00:36] <wikibugs_>	 (CR) jenkins-bot: Configure Babel for elwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/346044 (https://phabricator.wikimedia.org/T161593) (owner: DatGuy)
[00:01:56] <wikibugs_>	 (CR) jenkins-bot: Test LoginNotify on Beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/345726 (https://phabricator.wikimedia.org/T158878) (owner: Niharika29)
[00:02:49] <wikibugs>	 (CR) jenkins-bot: Convert reference lists to 'responsive' on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/346043 (https://phabricator.wikimedia.org/T161804) (owner: DatGuy)
[00:03:41] <thcipriani>	 Niharika: so once https://integration.wikimedia.org/ci/job/beta-scap-eqiad/149291/console completes your LoginNotify patch should be live on beta
[00:04:01] <Niharika>	 thcipriani: Cool! 
[00:16:57] <Niharika>	 thcipriani: Hmm, I don't see the extension on https://deployment.wikimedia.beta.wmflabs.org/wiki/Special:Version Did I mess up something in the patch?
[00:17:16] <Niharika>	 Nor on https://en.wikipedia.beta.wmflabs.org/wiki/Special:Version
[00:17:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[00:18:58] <wikibugs>	 (CR) Dzahn: [C: 2] Remove Apache <IfVersion < 2.4> across the tree [puppet] - https://gerrit.wikimedia.org/r/346128 (owner: Faidon Liambotis)
[00:19:16] <thcipriani>	 Niharika: hrm, doesn't look like your patch is there just yet...lemme see if I can manually deploy it
[00:19:26] <Niharika>	 thcipriani: Okay. 
[00:22:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[00:24:28] <wikibugs>	 (CR) Dzahn: [C: 2] aptrepo: remove precise-wikimedia [puppet] - https://gerrit.wikimedia.org/r/345550 (owner: Faidon Liambotis)
[00:25:13] <thcipriani>	 Niharika: something weird with the git fetch/rebase on deployment-tin, once this is done it will be there: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/149293/console
[00:25:17] <thcipriani>	 sorry about that :(
[00:25:32] <Niharika>	 thcipriani: No problem. Thanks for fixing it! 
[00:27:50] <wikibugs_>	 (PS3) Dzahn: aptrepo: remove precise-wikimedia [puppet] - https://gerrit.wikimedia.org/r/345550 (owner: Faidon Liambotis)
[00:30:25] <wikibugs_>	 (PS4) Dzahn: aptrepo: remove precise-wikimedia [puppet] - https://gerrit.wikimedia.org/r/345550 (owner: Faidon Liambotis)
[00:39:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[00:44:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[00:45:49] <mutante>	 !log install1002/2002: sudo -i reprepro --delete clearvanished  to remove precise distro  after merging gerrit:345550
[00:45:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:00] <wikibugs>	 (CR) Dzahn: "[install1002:/srv/wikimedia] $ sudo -i reprepro list apache" [puppet] - https://gerrit.wikimedia.org/r/345550 (owner: Faidon Liambotis)
[00:50:40] <wikibugs>	 (CR) Dzahn: "install2002:  sudo reprepro --delete clearvanished" [puppet] - https://gerrit.wikimedia.org/r/345550 (owner: Faidon Liambotis)
[00:56:40] <wikibugs>	 (PS3) Dzahn: Add Bytemark to public_mirrors.html list [puppet] - https://gerrit.wikimedia.org/r/345325 (https://phabricator.wikimedia.org/T159331) (owner: Reedy)
[00:58:03] <wikibugs_>	 (CR) Dzahn: [C: 2] Add Bytemark to public_mirrors.html list [puppet] - https://gerrit.wikimedia.org/r/345325 (https://phabricator.wikimedia.org/T159331) (owner: Reedy)
[00:58:48] <wikibugs_>	 (CR) Dzahn: [C: 2] hhvm: kill a precise reference [puppet] - https://gerrit.wikimedia.org/r/345547 (owner: Faidon Liambotis)
[01:00:12] <wikibugs_>	 (CR) Dzahn: "add something to "Location" column?" [puppet] - https://gerrit.wikimedia.org/r/345325 (https://phabricator.wikimedia.org/T159331) (owner: Reedy)
[01:01:19] <wikibugs>	 (PS4) Dzahn: hhvm: kill a precise reference [puppet] - https://gerrit.wikimedia.org/r/345547 (owner: Faidon Liambotis)
[01:01:54] <wikibugs_>	 (PS5) Dzahn: hhvm: kill a precise reference [puppet] - https://gerrit.wikimedia.org/r/345547 (owner: Faidon Liambotis)
[01:05:28] <wikibugs_>	 (CR) Reedy: "How did I miss that? :/" [puppet] - https://gerrit.wikimedia.org/r/345325 (https://phabricator.wikimedia.org/T159331) (owner: Reedy)
[01:08:07] <wikibugs>	 (PS1) Dzahn: dumps: add location to Bytemark (UK) mirror [puppet] - https://gerrit.wikimedia.org/r/346226
[01:08:36] <wikibugs_>	 (PS1) Reedy: Add location of Bytemark mirror [puppet] - https://gerrit.wikimedia.org/r/346227 (https://phabricator.wikimedia.org/T159331)
[01:08:38] <Reedy>	 snap
[01:09:18] <mutante>	 hehe, you were probably also waiting for gerrit to take it
[01:10:00] <wikibugs>	 (CR) Dzahn: [C: 2] dumps: add location to Bytemark (UK) mirror [puppet] - https://gerrit.wikimedia.org/r/346226 (owner: Dzahn)
[01:10:09] <wikibugs_>	 (PS2) Dzahn: dumps: add location to Bytemark (UK) mirror [puppet] - https://gerrit.wikimedia.org/r/346226
[01:10:15] <wikibugs>	 (CR) Dzahn: [V: 2 C: 2] dumps: add location to Bytemark (UK) mirror [puppet] - https://gerrit.wikimedia.org/r/346226 (owner: Dzahn)
[01:10:26] <wikibugs_>	 (Abandoned) Dzahn: Add location of Bytemark mirror [puppet] - https://gerrit.wikimedia.org/r/346227 (https://phabricator.wikimedia.org/T159331) (owner: Reedy)
[01:14:57] <wikibugs>	 (PS2) Dzahn: releases: remove the precise suite [puppet] - https://gerrit.wikimedia.org/r/345838 (owner: Faidon Liambotis)
[01:15:57] <wikibugs>	 (CR) Dzahn: "are we waiting until actual EOL?" [puppet] - https://gerrit.wikimedia.org/r/345838 (owner: Faidon Liambotis)
[01:16:01] <wikibugs_>	 (CR) jerkins-bot: [V: -1] releases: remove the precise suite [puppet] - https://gerrit.wikimedia.org/r/345838 (owner: Faidon Liambotis)
[01:18:46] <wikibugs_>	 (PS3) Dzahn: releases: remove the precise suite [puppet] - https://gerrit.wikimedia.org/r/345838 (owner: Faidon Liambotis)
[01:18:48] <icinga-wm>	 PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:19:08] <wikibugs>	 (CR) Dzahn: "PS3: fixed lint warning that made jenkins-bot -1" [puppet] - https://gerrit.wikimedia.org/r/345838 (owner: Faidon Liambotis)
[01:19:28] <icinga-wm>	 PROBLEM - puppet last run on elastic1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:26:44] <wikibugs_>	 (PS1) Reedy: Disable LoginNotify on wikis that don't have Echo [mediawiki-config] - https://gerrit.wikimedia.org/r/346228 (https://phabricator.wikimedia.org/T158878)
[01:27:01] <Reedy>	 greg-g: ^ Mind if I push that? Only affects InitialiseSettings-labs
[01:29:14] <wikibugs_>	 (CR) Reedy: [C: 2] Disable LoginNotify on wikis that don't have Echo [mediawiki-config] - https://gerrit.wikimedia.org/r/346228 (https://phabricator.wikimedia.org/T158878) (owner: Reedy)
[01:29:38] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: CRITICAL - Rep Delay is: 79925.430035 Seconds
[01:29:48] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2004 is CRITICAL: CRITICAL - Rep Delay is: 79933.183203 Seconds
[01:30:23] <wikibugs_>	 (Merged) jenkins-bot: Disable LoginNotify on wikis that don't have Echo [mediawiki-config] - https://gerrit.wikimedia.org/r/346228 (https://phabricator.wikimedia.org/T158878) (owner: Reedy)
[01:30:37] <wikibugs_>	 (CR) jenkins-bot: Disable LoginNotify on wikis that don't have Echo [mediawiki-config] - https://gerrit.wikimedia.org/r/346228 (https://phabricator.wikimedia.org/T158878) (owner: Reedy)
[01:31:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 80759.079548 Seconds
[01:31:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 80760.148934 Seconds
[01:31:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 80761.051578 Seconds
[01:31:30] <logmsgbot>	 !log reedy@tin Synchronized wmf-config/InitialiseSettings-labs.php: Disable LoginNotify on wikis that have no Echo T158878 (duration: 00m 44s)
[01:31:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:31:37] <stashbot>	 T158878: Test LoginNotify Extension on Beta Cluster - https://phabricator.wikimedia.org/T158878
[01:32:38] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2002 is CRITICAL: CRITICAL - Rep Delay is: 80105.167091 Seconds
[01:35:31] <greg-g>	 Reedy: ex post facto don't mind :)
[01:35:53] <Reedy>	 Now prod doesn't shit itself because -labs isn't sync'd ;)
[01:36:03] <Reedy>	 *is
[01:36:12] <greg-g>	 :)
[01:39:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[01:41:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[01:42:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 81419.394763 Seconds
[01:43:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[01:43:28] <icinga-wm>	 PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:46:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 81659.472345 Seconds
[01:46:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[01:47:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[01:47:28] <icinga-wm>	 RECOVERY - puppet last run on elastic1049 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[01:47:48] <icinga-wm>	 RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[01:50:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 81900.30285 Seconds
[01:54:05] <wikibugs_>	 (CR) Aude: [C: 1] Don't set removed Wikibase client settings [mediawiki-config] - https://gerrit.wikimedia.org/r/346161 (owner: Hoo man)
[01:54:38] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2003 is OK: OK - Rep Delay is: 27.964003 Seconds
[01:54:38] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2002 is OK: OK - Rep Delay is: 27.966124 Seconds
[01:54:48] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2004 is OK: OK - Rep Delay is: 35.808315 Seconds
[01:55:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 56.283486 Seconds
[01:55:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 57.210154 Seconds
[01:55:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 58.137525 Seconds
[02:11:28] <icinga-wm>	 RECOVERY - puppet last run on mc1004 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[02:25:08] <wikibugs>	 Operations, Office-IT, LDAP: Remove disabled users from internal mailing lists - https://phabricator.wikimedia.org/T161004#3152922 (mmodell) @MoritzMuehlenhoff AFAIK, Phabricator doesn't handle bounces at all and it doesn't handle SMTP envelope rejections very gracefully. Essentially phabricator keep...
[02:26:35] <wikibugs_>	 Operations, Office-IT, LDAP: Remove disabled users from internal mailing lists - https://phabricator.wikimedia.org/T161004#3152924 (mmodell) Also AFAIK, @wikimedia.org email accounts of former staff get disabled at which time they refuse delivery at the SMTP level.
[02:29:00] <wikibugs>	 (Abandoned) 20after4: SemanticForms -> PageForms [mediawiki-config] - https://gerrit.wikimedia.org/r/327307 (owner: 20after4)
[02:31:40] <wikibugs_>	 (CR) 20after4: [C: 1] Move mwdeploy home to /var/lib where it belongs, it's a system user [puppet] - https://gerrit.wikimedia.org/r/323867 (https://phabricator.wikimedia.org/T86971) (owner: Chad)
[02:34:19] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.18) (duration: 14m 27s)
[02:34:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:37:29] <icinga-wm>	 PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:39:47] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Apr  4 02:39:47 UTC 2017 (duration 5m 28s)
[02:39:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:06:28] <icinga-wm>	 RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[03:07:49] <wikibugs>	 (CR) Krinkle: [C: 1] l10nupdate: Reduce code duplication in git clone operations [puppet] - https://gerrit.wikimedia.org/r/255958 (owner: Reedy)
[03:16:58] <icinga-wm>	 PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:41:56] <wikibugs>	 (PS1) Krinkle: [WIP] Document and automate sources of static/project-logos [mediawiki-config] - https://gerrit.wikimedia.org/r/346234 (https://phabricator.wikimedia.org/T98640)
[03:42:07] <wikibugs_>	 (PS2) Krinkle: [WIP] Document and automate sources of static/project-logos [mediawiki-config] - https://gerrit.wikimedia.org/r/346234 (https://phabricator.wikimedia.org/T98640)
[03:42:49] <wikibugs>	 (PS3) Krinkle: [WIP] Document and automate sources of static/project-logos [mediawiki-config] - https://gerrit.wikimedia.org/r/346234 (https://phabricator.wikimedia.org/T98640)
[03:45:58] <icinga-wm>	 RECOVERY - puppet last run on cp3042 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[03:57:18] <wikibugs>	 (PS4) Krinkle: [WIP] Document and automate sources of static/project-logos [mediawiki-config] - https://gerrit.wikimedia.org/r/346234 (https://phabricator.wikimedia.org/T98640)
[03:57:41] <wikibugs_>	 (PS5) Krinkle: [WIP] Document and automate sources of static/project-logos [mediawiki-config] - https://gerrit.wikimedia.org/r/346234 (https://phabricator.wikimedia.org/T98640)
[04:19:28] <wikibugs>	 (CR) Krinkle: "Please do scrutinise my meagre attempt at writing Python." [mediawiki-config] - https://gerrit.wikimedia.org/r/346234 (https://phabricator.wikimedia.org/T98640) (owner: Krinkle)
[04:23:28] <icinga-wm>	 PROBLEM - puppet last run on db1070 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:28:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[04:33:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[04:37:00] <wikibugs_>	 Operations, Office-IT, LDAP: Remove disabled users from internal mailing lists - https://phabricator.wikimedia.org/T161004#3152956 (bbogaert) Hi @MoritzMuehlenhoff ,    >>! In T161004#3150359, @MoritzMuehlenhoff wrote: > ... > @bbogaert: If OIT offboards a staff member, does the @wikimedia.org contin...
[04:48:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 7106.271459 Seconds
[04:48:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 7106.405852 Seconds
[04:48:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 7106.41168 Seconds
[04:49:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[04:49:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[04:49:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[04:52:28] <icinga-wm>	 RECOVERY - puppet last run on db1070 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[05:01:00] <wikibugs_>	 (PS1) Dzahn: nagios_common: enhance check_ssl_certfile plugin [puppet] - https://gerrit.wikimedia.org/r/346236
[05:15:09] <wikibugs_>	 (PS2) Dzahn: nagios_common: fi/enhance check_ssl_certfile plugin [puppet] - https://gerrit.wikimedia.org/r/346236 (https://phabricator.wikimedia.org/T162085)
[05:19:01] <wikibugs>	 (CR) Dzahn: [C: -1] "@Muehlenhoff this is still a little bit WIP, somehow i could not compile it earlier but error is probably unrelated. anyways, you get the " [puppet] - https://gerrit.wikimedia.org/r/346183 (owner: Dzahn)
[05:19:27] <wikibugs_>	 (PS3) Dzahn: nagios_common: fix/enhance check_ssl_certfile plugin [puppet] - https://gerrit.wikimedia.org/r/346236 (https://phabricator.wikimedia.org/T162085)
[05:21:17] <wikibugs>	 (PS4) Dzahn: nagios_common: fix/enhance check_ssl_certfile plugin [puppet] - https://gerrit.wikimedia.org/r/346236 (https://phabricator.wikimedia.org/T162085)
[05:26:58] <wikibugs_>	 (PS5) Dzahn: nagios_common: fix/enhance check_ssl_certfile plugin [puppet] - https://gerrit.wikimedia.org/r/346236 (https://phabricator.wikimedia.org/T162085)
[05:33:57] <wikibugs_>	 (PS6) Dzahn: nagios_common: fix/enhance check_ssl_certfile plugin [puppet] - https://gerrit.wikimedia.org/r/346236 (https://phabricator.wikimedia.org/T162085)
[05:41:48] <icinga-wm>	 PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 626 600 - REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3163620 keys, up 11 days 13 hours - replication_delay is 626
[05:43:28] <icinga-wm>	 RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[05:44:48] <icinga-wm>	 RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3145045 keys, up 11 days 13 hours - replication_delay is 13
[05:45:08] <wikibugs_>	 Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Make WDQS active / active - https://phabricator.wikimedia.org/T162111#3152986 (Gehel)
[05:45:17] <wikibugs>	 Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Make WDQS active / active - https://phabricator.wikimedia.org/T162111#3152999 (Gehel) p:Triage>High
[05:52:25] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/346238 (https://phabricator.wikimedia.org/T160390)
[05:54:48] <icinga-wm>	 PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 613 600 - REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3145045 keys, up 11 days 13 hours - replication_delay is 613
[05:57:12] <wikibugs>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/346238 (https://phabricator.wikimedia.org/T160390) (owner: Marostegui)
[05:57:38] <icinga-wm>	 PROBLEM - puppet last run on etcd1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:58:18] <wikibugs_>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/346238 (https://phabricator.wikimedia.org/T160390) (owner: Marostegui)
[05:58:27] <wikibugs_>	 (CR) jenkins-bot: db-eqiad.php: Depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/346238 (https://phabricator.wikimedia.org/T160390) (owner: Marostegui)
[06:01:58] <icinga-wm>	 RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3144551 keys, up 11 days 13 hours - replication_delay is 24
[06:05:48] <wikibugs>	 (CR) Gehel: elasticsearch - move role::elasticsearch::common to a profile (6 comments) [puppet] - https://gerrit.wikimedia.org/r/342248 (https://phabricator.wikimedia.org/T147718) (owner: Gehel)
[06:05:59] <wikibugs_>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool db1034" [mediawiki-config] - https://gerrit.wikimedia.org/r/346239
[06:06:07] <wikibugs>	 (PS11) Gehel: elasticsearch - move role::elasticsearch::common to a profile [puppet] - https://gerrit.wikimedia.org/r/342248 (https://phabricator.wikimedia.org/T147718)
[06:07:30] <wikibugs>	 (CR) jerkins-bot: [V: -1] elasticsearch - move role::elasticsearch::common to a profile [puppet] - https://gerrit.wikimedia.org/r/342248 (https://phabricator.wikimedia.org/T147718) (owner: Gehel)
[06:07:46] <wikibugs_>	 (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1034" [mediawiki-config] - https://gerrit.wikimedia.org/r/346239 (owner: Marostegui)
[06:09:05] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1034" [mediawiki-config] - https://gerrit.wikimedia.org/r/346239 (owner: Marostegui)
[06:09:13] <wikibugs_>	 (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1034" [mediawiki-config] - https://gerrit.wikimedia.org/r/346239 (owner: Marostegui)
[06:09:14] <wikibugs>	 Operations, Commons, Traffic, Wikimedia-Site-requests, and 2 others: Allow anonymous users to change interface language on Commons with ULS - https://phabricator.wikimedia.org/T161517#3153014 (Steinsplitter) @ema will this be fixed soon? If not i have to fix stuff & update the MediaWiki message o...
[06:25:38] <icinga-wm>	 RECOVERY - puppet last run on etcd1003 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:27:46] <marostegui>	 !log Deploy schema change db1015 (s3) -  https://phabricator.wikimedia.org/T159319
[06:27:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:28:38] <icinga-wm>	 PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:18] <wikibugs>	 (PS1) Marostegui: db-codfw.php: Depool db2068 [mediawiki-config] - https://gerrit.wikimedia.org/r/346243 (https://phabricator.wikimedia.org/T160390)
[06:32:19] <wikibugs_>	 (CR) Marostegui: [C: 2] db-codfw.php: Depool db2068 [mediawiki-config] - https://gerrit.wikimedia.org/r/346243 (https://phabricator.wikimedia.org/T160390) (owner: Marostegui)
[06:33:31] <wikibugs>	 (Merged) jenkins-bot: db-codfw.php: Depool db2068 [mediawiki-config] - https://gerrit.wikimedia.org/r/346243 (https://phabricator.wikimedia.org/T160390) (owner: Marostegui)
[06:33:43] <wikibugs>	 (CR) jenkins-bot: db-codfw.php: Depool db2068 [mediawiki-config] - https://gerrit.wikimedia.org/r/346243 (https://phabricator.wikimedia.org/T160390) (owner: Marostegui)
[06:34:32] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2068 - T160390 (duration: 00m 44s)
[06:34:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:34:39] <stashbot>	 T160390: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390
[06:35:56] <marostegui>	 !log Deploy schema change db2068 (s7) - T160390
[06:36:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:25] <wikibugs_>	 (CR) Muehlenhoff: "There's no real point in making this Hiera-configurable, this is just a temporary test setup and when the tests are completed, it'll be ap" [puppet] - https://gerrit.wikimedia.org/r/346183 (owner: Dzahn)
[06:43:54] <marostegui>	 !log Deploy alter table on db2019 (codfw s4 master) - this will generate lag on codfw for s4 - T161683
[06:44:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:01] <stashbot>	 T161683: Remove partitioning from db2019 (codfw master) commonswiki.templatelinks - https://phabricator.wikimedia.org/T161683
[06:48:18] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[06:53:18] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[06:55:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[06:57:28] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[06:57:28] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [1000.0]
[07:00:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[07:02:10] <wikibugs_>	 (CR) Thiemo Mättig (WMDE): [C: 1] Don't set removed Wikibase client settings [mediawiki-config] - https://gerrit.wikimedia.org/r/346161 (owner: Hoo man)
[07:02:38] <icinga-wm>	 RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[07:07:55] <wikibugs_>	 Operations, Traffic, media-storage, Patch-For-Review, User-Urbanecm: Some PNG thumbnails and JPEG originals delivered as [text/html] content-type and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3153092 (Nemo_bis)
[07:08:29] <wikibugs>	 Operations, Traffic, media-storage, Patch-For-Review, User-Urbanecm: Some PNG thumbnails and JPEG originals delivered as [text/html] content-type and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150362 (Nemo_bis) (Fixed summary to reflect the "original" bug repor...
[07:12:48] <icinga-wm>	 PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:17:05] <wikibugs>	 (PS12) Gehel: elasticsearch - move role::elasticsearch::common to a profile [puppet] - https://gerrit.wikimedia.org/r/342248 (https://phabricator.wikimedia.org/T147718)
[07:18:28] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:20:34] <wikibugs_>	 (PS13) Gehel: elasticsearch - move role::elasticsearch::common to a profile [puppet] - https://gerrit.wikimedia.org/r/342248 (https://phabricator.wikimedia.org/T147718)
[07:21:28] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:24:54] <wikibugs>	 (PS1) Mobrovac: RESTBase: Migrate to Scap3 deployment [puppet] - https://gerrit.wikimedia.org/r/346248 (https://phabricator.wikimedia.org/T116335)
[07:26:28] <icinga-wm>	 PROBLEM - puppet last run on db1077 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:27:55] <wikibugs_>	 Operations, Traffic, WMF-Design, Wikimedia-General-or-Unknown, Design: Better WMF error pages - https://phabricator.wikimedia.org/T76560#3153103 (Nemo_bis) >>! In T76560#807460, @Nirzar wrote: > We were trying to populate this spread sheet with common errors  > https://docs.google.com/a/wikim...
[07:29:15] <_joe_>	 mobrovac_: back in my TZ or up at ungodly hours?
[07:32:24] <wikibugs>	 (PS14) Gehel: elasticsearch - move role::elasticsearch::common to a profile [puppet] - https://gerrit.wikimedia.org/r/342248 (https://phabricator.wikimedia.org/T147718)
[07:35:12] <elukey>	 !log reimage analytics103[234] to Debian Jessie
[07:35:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:54] <wikibugs_>	 Operations, HHVM, Wikimedia-General-or-Unknown: HHVM and PCRE v8.31 gives incorrect results for certain PCRE patterns - https://phabricator.wikimedia.org/T73922#3153119 (MoritzMuehlenhoff) Open>Resolved a:MoritzMuehlenhoff We're using Debian jessie for a while now (which has PCRE 8.35) an...
[07:40:48] <icinga-wm>	 RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[07:45:18] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[07:48:58] <icinga-wm>	 PROBLEM - puppet last run on multatuli is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:50:18] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[07:52:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[07:54:08] <moritzm>	 !log rebooting bast2001 to Linux 4.9
[07:54:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:38] <icinga-wm>	 RECOVERY - puppet last run on db1077 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[07:57:34] <wikibugs_>	 Operations, Traffic: lvs2002 hanging, usb messages flooding kernel logs - https://phabricator.wikimedia.org/T162117#3153143 (ema)
[07:57:46] <wikibugs>	 Operations, Traffic: lvs2002 hanging, usb messages flooding kernel logs - https://phabricator.wikimedia.org/T162117#3153132 (ema) p:Triage>Normal
[07:57:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[08:06:14] <wikibugs>	 (PS1) Giuseppe Lavagetto: Switch master DC from eqiad to codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/346251
[08:09:16] <wikibugs_>	 (PS1) Muehlenhoff: Fix date calculation for accounts with expiry date [puppet] - https://gerrit.wikimedia.org/r/346254
[08:13:26] <wikibugs_>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool db1015" [mediawiki-config] - https://gerrit.wikimedia.org/r/346258
[08:13:31] <wikibugs>	 (PS2) Marostegui: Revert "db-eqiad.php: Depool db1015" [mediawiki-config] - https://gerrit.wikimedia.org/r/346258
[08:15:46] <wikibugs_>	 (PS2) Muehlenhoff: Fix date calculation for accounts with expiry date [puppet] - https://gerrit.wikimedia.org/r/346254
[08:16:58] <icinga-wm>	 RECOVERY - puppet last run on multatuli is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[08:17:05] <wikibugs_>	 Operations, Traffic, WMF-Design, Wikimedia-General-or-Unknown, Design: Better WMF error pages - https://phabricator.wikimedia.org/T76560#3153160 (Nemo_bis)
[08:17:21] <wikibugs>	 (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1015" [mediawiki-config] - https://gerrit.wikimedia.org/r/346258 (owner: Marostegui)
[08:18:23] <wikibugs_>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3153163 (ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analytics1032.eqiad.wmnet', 'analytics1033....
[08:18:30] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1015" [mediawiki-config] - https://gerrit.wikimedia.org/r/346258 (owner: Marostegui)
[08:18:39] <wikibugs_>	 (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1015" [mediawiki-config] - https://gerrit.wikimedia.org/r/346258 (owner: Marostegui)
[08:19:03] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Fix date calculation for accounts with expiry date [puppet] - https://gerrit.wikimedia.org/r/346254 (owner: Muehlenhoff)
[08:19:32] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1015 - T159319 (duration: 00m 45s)
[08:19:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:38] <wikibugs_>	 Operations, Traffic, WMF-Design, Wikimedia-General-or-Unknown, Design: Better WMF error pages - https://phabricator.wikimedia.org/T76560#3153164 (Nemo_bis)
[08:24:12] <wikibugs_>	 Operations, Commons, Traffic, Wikimedia-Site-requests, and 2 others: Allow anonymous users to change interface language on Commons with ULS - https://phabricator.wikimedia.org/T161517#3153178 (ema) >>! In T161517#3153014, @Steinsplitter wrote: > @ema will this be fixed soon? If not i have to fix...
[08:27:09] <wikibugs_>	 Operations, Traffic: lvs2002 hanging, usb messages flooding kernel logs - https://phabricator.wikimedia.org/T162117#3153180 (ema) Oh and apparently the repeated USB messages have been reported already in T148017.
[08:30:48] <wikibugs_>	 (CR) Nemo bis: "It's necessary to tell users and translators how they can get the errors use their language now. Will you take care of that (in particular" [mediawiki-config] - https://gerrit.wikimedia.org/r/345274 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[08:34:23] <wikibugs>	 Operations, Commons, Traffic, Wikimedia-Site-requests, and 2 others: Allow anonymous users to change interface language on Commons with ULS - https://phabricator.wikimedia.org/T161517#3153187 (Nemo_bis) This is the only pending question, isn't it?  >>! In T161517#3140732, @Krinkle wrote: > If we...
[08:44:45] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[08:47:41] <moritzm>	 !log rebooting mw1265 to Linux 4.9
[08:47:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:52] <wikibugs_>	 (PS1) Nemo bis: Make Wikipedia link on 404 page language-agnostic via Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/346264 (https://phabricator.wikimedia.org/T113114)
[08:49:45] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[08:50:50] <wikibugs_>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3153194 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1032.eqiad.wmnet', 'analytics1033.eqiad.wmnet', 'analytics1034.eqiad.wmnet'] ```  Of...
[08:52:53] <wikibugs_>	 (CR) Hoo man: "Do we have any chance to know the project here, if so, you could use https://www.wikidata.org/wiki/Special:GoToLinkedPage/enwiki/Q208219" [mediawiki-config] - https://gerrit.wikimedia.org/r/346264 (https://phabricator.wikimedia.org/T113114) (owner: Nemo bis)
[08:56:05] <icinga-wm>	 PROBLEM - puppet last run on labtestcontrol2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:10:24] <volans>	 !log restarted swiftrepl (repl_all.sh loop) on ms-fe1005
[09:10:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:42] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[09:16:42] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[09:24:12] <icinga-wm>	 RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[09:27:18] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[09:27:23] <wikibugs_>	 Operations, media-storage: Swiftrepl was stuck in an infinite loop since days - https://phabricator.wikimedia.org/T162122#3153254 (Volans)
[09:29:50] <wikibugs>	 Operations, media-storage: Running swiftrepl is not puppetized - https://phabricator.wikimedia.org/T162123#3153268 (Volans)
[09:40:13] <moritzm>	 !log rebooting wtp1001 to Linux 4.9
[09:40:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:18] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[09:45:01] <wikibugs_>	 (PS1) Hoo man: Add ll to my bash aliases [puppet] - https://gerrit.wikimedia.org/r/346270
[09:49:18] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[09:53:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[09:54:18] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[09:58:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[10:14:26] <wikibugs>	 Operations, Traffic: lvs2002 hanging, usb messages flooding kernel logs - https://phabricator.wikimedia.org/T162117#3153340 (ema)
[10:14:47] <wikibugs>	 Operations, ops-codfw, Traffic: lvs2002 random shut down - https://phabricator.wikimedia.org/T162099#3152568 (ema) p:Triage>Normal
[10:15:08] <icinga-wm>	 PROBLEM - puppet last run on planet1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:44:08] <icinga-wm>	 RECOVERY - puppet last run on planet1001 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[10:45:58] <icinga-wm>	 PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 621 600 - REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3150049 keys, up 11 days 18 hours - replication_delay is 621
[10:46:42] <wikibugs>	 Operations, User-Elukey, Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#3153376 (elukey) >>! In T125735#3152656, @aaron wrote: > In $wmgRedisQueueBaseConfig in wmf-config/jobqueu...
[10:50:11] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (jcrespo)
[10:55:38] <wikibugs_>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153405 (jcrespo) Origin ips (under NDA): {P5199}  The queries done are:  ``` ?format=json&action=parse&page=[*title*]&prop=tex...
[10:58:08] <icinga-wm>	 RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3137604 keys, up 11 days 18 hours - replication_delay is 0
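The rdb2005 alert went CRITICAL with "replication_delay is 621 600" and recovered at "replication_delay is 0". The sketch below pulls the delay out of those status texts and compares it to a limit. It assumes that the second number (600) is the critical threshold and that the rule is "delay greater than threshold"; neither the unit nor the exact comparison is stated in the log. `parse_replication_delay` and `is_critical` are hypothetical helper names.

```python
import re
from typing import Optional


def parse_replication_delay(status_line: str) -> Optional[int]:
    """Extract the first number after 'replication_delay is' from a redis alert text."""
    match = re.search(r"replication_delay is (\d+)", status_line)
    return int(match.group(1)) if match else None


def is_critical(delay: int, threshold: int = 600) -> bool:
    """Assumed rule: critical when the delay is greater than the threshold."""
    return delay > threshold


# Status texts from the rdb2005 alerts above (the PROBLEM text is shortened)
problem = "CRITICAL: replication_delay is 621 600 - REDIS 2.8.17 on 127.0.0.1:6479"
recovery = ("OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3137604 keys, "
            "up 11 days 18 hours - replication_delay is 0")

for line in (problem, recovery):
    delay = parse_replication_delay(line)
    print(delay, is_critical(delay))
```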
[11:01:15] <wikibugs_>	 Operations, Traffic, media-storage, Patch-For-Review, User-Urbanecm: Some PNG thumbnails and JPEG originals delivered as [text/html] content-type and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150362 (TheDJ) I don't think that the purge was complete. This one h...
[11:02:38] <icinga-wm>	 PROBLEM - IPsec on cp2003 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp3003_v4, cp3003_v6
[11:02:48] <icinga-wm>	 PROBLEM - IPsec on cp2009 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp3003_v4, cp3003_v6
[11:02:58] <icinga-wm>	 PROBLEM - IPsec on cp2015 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp3003_v4, cp3003_v6
[11:02:58] <icinga-wm>	 PROBLEM - IPsec on cp1060 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp3003_v4, cp3003_v6
[11:02:58] <icinga-wm>	 PROBLEM - IPsec on cp1047 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp3003_v4, cp3003_v6
[11:03:08] <icinga-wm>	 PROBLEM - IPsec on cp2021 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp3003_v4, cp3003_v6
[11:03:18] <icinga-wm>	 PROBLEM - IPsec on cp1059 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp3003_v4, cp3003_v6
[11:03:23] <ema>	 that's me, looking ^
[11:03:28] <icinga-wm>	 PROBLEM - IPsec on cp1046 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp3003_v4, cp3003_v6
[11:05:28] <icinga-wm>	 PROBLEM - Host cp3003 is DOWN: PING CRITICAL - Packet loss = 100%
[11:06:25] <ema>	 cp3003 did reboot into 4.9 but eth0 is marked as down
[11:16:05] <icinga-wm>	 ACKNOWLEDGEMENT - Host cp3003 is DOWN: PING CRITICAL - Packet loss = 100% Ema eth0 issues upon reboot into 4.9, host depooled
[11:30:25] <wikibugs_>	 Operations, Puppet, Discovery, Maps, Interactive-Sprint: Puppet fails with "Could not find init script for 'postgresql@9.4-main'" on maps / labs server - https://phabricator.wikimedia.org/T161893#3146429 (akosiaris) Something is rotten in the state of maps-cleartables  ``` akosiaris@maps-clea...
[11:31:17] <wikibugs>	 (PS1) Marostegui: Revert "db-codfw.php: Depool db2068" [mediawiki-config] - https://gerrit.wikimedia.org/r/346277
[11:31:22] <wikibugs_>	 (PS2) Marostegui: Revert "db-codfw.php: Depool db2068" [mediawiki-config] - https://gerrit.wikimedia.org/r/346277
[11:36:44] <wikibugs_>	 (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2068" [mediawiki-config] - https://gerrit.wikimedia.org/r/346277 (owner: Marostegui)
[11:37:57] <wikibugs>	 (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2068" [mediawiki-config] - https://gerrit.wikimedia.org/r/346277 (owner: Marostegui)
[11:38:06] <wikibugs_>	 (CR) jenkins-bot: Revert "db-codfw.php: Depool db2068" [mediawiki-config] - https://gerrit.wikimedia.org/r/346277 (owner: Marostegui)
[11:39:11] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2068 - T160390 (duration: 00m 58s)
[11:39:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:20] <stashbot>	 T160390: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390
[11:39:46] <wikibugs_>	 (PS1) Marostegui: db-codfw.php: Depool db2061 [mediawiki-config] - https://gerrit.wikimedia.org/r/346278 (https://phabricator.wikimedia.org/T160390)
[11:39:54] <wikibugs_>	 (PS3) Muehlenhoff: Disable wireshark-common/install-setuid to avoid debconf prompt [puppet] - https://gerrit.wikimedia.org/r/346162
[11:42:48] <icinga-wm>	 PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:44:06] <wikibugs_>	 (CR) Marostegui: [C: 2] db-codfw.php: Depool db2061 [mediawiki-config] - https://gerrit.wikimedia.org/r/346278 (https://phabricator.wikimedia.org/T160390) (owner: Marostegui)
[11:45:24] <wikibugs_>	 (Merged) jenkins-bot: db-codfw.php: Depool db2061 [mediawiki-config] - https://gerrit.wikimedia.org/r/346278 (https://phabricator.wikimedia.org/T160390) (owner: Marostegui)
[11:46:18] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2061 - T160390 (duration: 00m 44s)
[11:46:23] <marostegui>	 !log Deploy schema change db2061 (s7) - T160390
[11:46:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:25] <stashbot>	 T160390: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390
[11:46:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:37] <wikibugs_>	 (CR) jenkins-bot: db-codfw.php: Depool db2061 [mediawiki-config] - https://gerrit.wikimedia.org/r/346278 (https://phabricator.wikimedia.org/T160390) (owner: Marostegui)
[11:50:37] <wikibugs_>	 Operations, ops-esams, Traffic: cp3003 network interface issues - https://phabricator.wikimedia.org/T162132#3153465 (ema)
[11:50:45] <wikibugs>	 Operations, ops-esams, Traffic: cp3003 network interface issues - https://phabricator.wikimedia.org/T162132#3153480 (ema) p:Triage>Normal
[11:51:25] <wikibugs_>	 Operations, Puppet, Discovery, Maps, Interactive-Sprint: Puppet fails with "Could not find init script for 'postgresql@9.4-main'" on maps / labs server - https://phabricator.wikimedia.org/T161893#3153481 (Gehel) @akosiaris Thanks for looking into this! How did I end up with puppet isntalled a...
[11:52:30] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1046 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp3003_v4, cp3003_v6 Ema https://phabricator.wikimedia.org/T162132
[11:52:30] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1047 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp3003_v4, cp3003_v6 Ema https://phabricator.wikimedia.org/T162132
[11:52:30] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1059 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp3003_v4, cp3003_v6 Ema https://phabricator.wikimedia.org/T162132
[11:52:30] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1060 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp3003_v4, cp3003_v6 Ema https://phabricator.wikimedia.org/T162132
[11:52:30] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2003 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp3003_v4, cp3003_v6 Ema https://phabricator.wikimedia.org/T162132
[11:52:30] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2009 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp3003_v4, cp3003_v6 Ema https://phabricator.wikimedia.org/T162132
[11:52:30] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2015 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp3003_v4, cp3003_v6 Ema https://phabricator.wikimedia.org/T162132
[11:52:31] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2021 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp3003_v4, cp3003_v6 Ema https://phabricator.wikimedia.org/T162132
[11:53:31] <wikibugs>	 (PS1) Volans: Switchdc: add profile to install and configure it [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178)
[11:53:51] <elukey>	 !log reimage analytics10[36,37,38] to Debian Jessie
[11:53:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:56:20] <wikibugs_>	 (CR) Marostegui: "@jcrespo, any objection to merge this?" [puppet] - https://gerrit.wikimedia.org/r/345545 (https://phabricator.wikimedia.org/T160435) (owner: Marostegui)
[11:58:15] <wikibugs>	 (CR) Jcrespo: [C: 1] "We also need to deploy the prometheus mysql exporter deletion too." [puppet] - https://gerrit.wikimedia.org/r/345545 (https://phabricator.wikimedia.org/T160435) (owner: Marostegui)
[11:58:44] <moritzm>	 !log installing e2fsprogs update from jessie point update
[11:58:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:01:05] <wikibugs>	 (PS4) Marostegui: site.pp,linux-host-entries.ttyS1: Remove db1057 [puppet] - https://gerrit.wikimedia.org/r/345545 (https://phabricator.wikimedia.org/T160435)
[12:01:26] <wikibugs_>	 (CR) Marostegui: "> We also need to deploy the prometheus mysql exporter deletion too." [puppet] - https://gerrit.wikimedia.org/r/345545 (https://phabricator.wikimedia.org/T160435) (owner: Marostegui)
[12:01:46] <wikibugs>	 (PS5) Marostegui: site.pp,linux-host-entries.ttyS1: Remove db1057 [puppet] - https://gerrit.wikimedia.org/r/345545 (https://phabricator.wikimedia.org/T160435)
[12:08:20] <wikibugs>	 (CR) Marostegui: [C: 2] site.pp,linux-host-entries.ttyS1: Remove db1057 [puppet] - https://gerrit.wikimedia.org/r/345545 (https://phabricator.wikimedia.org/T160435) (owner: Marostegui)
[12:08:26] <wikibugs_>	 (PS1) ArielGlenn: add table type to flagged revs table config file for dumps [puppet] - https://gerrit.wikimedia.org/r/346280
[12:09:16] <wikibugs_>	 (PS2) ArielGlenn: add table type to flagged revs table config file for dumps [puppet] - https://gerrit.wikimedia.org/r/346280
[12:10:23] <wikibugs>	 (CR) ArielGlenn: [C: 2] add table type to flagged revs table config file for dumps [puppet] - https://gerrit.wikimedia.org/r/346280 (owner: ArielGlenn)
[12:10:48] <icinga-wm>	 RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[12:13:31] <wikibugs_>	 Operations, ops-eqiad, DBA: Decommission db1057 - https://phabricator.wikimedia.org/T162135#3153532 (Marostegui)
[12:13:41] <wikibugs_>	 Operations, ops-eqiad, DBA: Decommission db1057 - https://phabricator.wikimedia.org/T162135#3153550 (Marostegui) p:Triage>Normal
[12:16:57] <wikibugs>	 Operations, ops-esams, Traffic: cp3003 network interface issues - https://phabricator.wikimedia.org/T162132#3153567 (ema) I've tried a "cold reboot" with `racadm serveraction powerdown ; racadm serveraction powerup` to no avail.
[12:19:18] <wikibugs_>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3153569 (ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analytics1036.eqiad.wmnet', 'analytics1037....
[12:19:49] <ema>	 !log upgrade cp2003 to linux 4.9 T162029
[12:19:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:57] <stashbot>	 T162029: Migrate all jessie hosts to Linux 4.9 - https://phabricator.wikimedia.org/T162029
[12:23:18] <icinga-wm>	 PROBLEM - puppet last run on aqs1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:23:59] <ema>	 yay cp2003 made it! :)
[12:46:25] <Zppix>	 Reedy:  have you got a minute to answer a few questions for me?
[12:46:36] <wikibugs_>	 (CR) Alexandros Kosiaris: [C: 2] url_downloader: convert to profile/role [puppet] - https://gerrit.wikimedia.org/r/344729 (owner: Dzahn)
[12:46:44] <wikibugs>	 (PS8) Alexandros Kosiaris: url_downloader: convert to profile/role [puppet] - https://gerrit.wikimedia.org/r/344729 (owner: Dzahn)
[12:46:47] <wikibugs_>	 (CR) Alexandros Kosiaris: [V: 2 C: 2] url_downloader: convert to profile/role [puppet] - https://gerrit.wikimedia.org/r/344729 (owner: Dzahn)
[12:48:34] <Zppix>	 anyone know where shinken-wm irc bot code can be found (what repo) and where  could i find the same for icinga-wm ?
[12:51:14] <zeljkof>	 hashar: nothing for eu swat today
[12:51:54] <wikibugs_>	 Operations, Puppet, Discovery, Maps, Interactive-Sprint: Puppet fails with "Could not find init script for 'postgresql@9.4-main'" on maps / labs server - https://phabricator.wikimedia.org/T161893#3153620 (Gehel) Open>Resolved a:Gehel Some other trouble, but this specific issue is...
[12:52:18] <icinga-wm>	 RECOVERY - puppet last run on aqs1005 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[13:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170404T1300).
[13:00:14] <ema>	 !log cache_upload: ban all objects with content-type ~ "^text" T162035
[13:00:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
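The cache_upload ban logged above removes every cached object whose Content-Type matches the regex `^text`, which targets the media files wrongly stored as text/html in T162035. The sketch below only simulates that predicate over a small in-memory list; it is not Varnish's ban implementation, the exact ban command is not shown in the log, and the cached paths are hypothetical.

```python
import re
from typing import List, Tuple

CachedObject = Tuple[str, str]  # (URL path, Content-Type header)

# Hypothetical cache contents, for illustration only
CACHE: List[CachedObject] = [
    ("/thumb/Example.png/120px-Example.png", "text/html; charset=UTF-8"),
    ("/Example.jpg", "image/jpeg"),
    ("/thumb/Other.png/220px-Other.png", "image/png"),
]

# Same regex as in the !log entry: any Content-Type that starts with "text"
BAN = re.compile(r"^text")


def apply_ban(objects: List[CachedObject]) -> Tuple[List[CachedObject], List[CachedObject]]:
    """Split objects into (kept, banned) according to the Content-Type regex."""
    kept: List[CachedObject] = []
    banned: List[CachedObject] = []
    for path, content_type in objects:
        target = banned if BAN.search(content_type) else kept
        target.append((path, content_type))
    return kept, banned


kept, banned = apply_ban(CACHE)
print([path for path, _ in banned])
```

Because the pattern is anchored only at the start, it matches every `text/*` type (text/plain, text/css, and so on), not just text/html.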
[13:00:22] <stashbot>	 T162035: Some PNG thumbnails and JPEG originals delivered as [text/html] content-type and hence not rendered in browser - https://phabricator.wikimedia.org/T162035
[13:00:53] <wikibugs_>	 (PS1) Addshore: Enable interwikisorting on BETA wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346283
[13:01:40] <wikibugs>	 (PS4) Alexandros Kosiaris: Revert "Revert "Add the LVS blocks to url_downloader"" [puppet] - https://gerrit.wikimedia.org/r/207490
[13:02:30] <addshore>	 o/ (I just added 1 patch to swat)
[13:03:03] <Zppix>	 addshore:  question, in regex is it * or . for any char
[13:03:27] <addshore>	 .
[13:03:51] <Zppix>	 so for example addshore  foo.
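addshore's answer holds for most regex flavours: `.` matches exactly one character (any character except a newline by default), while `*` is not a wildcard by itself but a quantifier that repeats the preceding token zero or more times, so "anything" is written `.*`. The confusion usually comes from shell globs, where `*` alone means "any string". The short Python sketch below uses the standard `re` module; `matches` is a hypothetical helper that tests whole-string matches.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# "." stands for exactly one character
print(matches(r"foo.", "food"))        # True
print(matches(r"foo.", "foo"))         # False: "." needs one character
print(matches(r"foo.", "foods"))       # False: and only one
# "*" repeats the preceding token zero or more times
print(matches(r"foo.*", "foo"))        # True: ".*" can match nothing
print(matches(r"foo.*", "foodstuffs")) # True
print(matches(r"fo*", "fooo"))         # True: "o*" repeats "o"
```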
[13:04:53] <wikibugs_>	 (CR) Addshore: [C: 2] Enable interwikisorting on BETA wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346283 (owner: Addshore)
[13:05:07] <moritzm>	 !log installing ca-certificates updates from jessie point update
[13:05:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:14] <wikibugs>	 (Draft2) Zppix: Adding a few more typos that could break things if they aren't tested for. [puppet] - https://gerrit.wikimedia.org/r/346282
[13:05:59] <wikibugs_>	 (Merged) jenkins-bot: Enable interwikisorting on BETA wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346283 (owner: Addshore)
[13:06:25] <wikibugs>	 (CR) jenkins-bot: Enable interwikisorting on BETA wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346283 (owner: Addshore)
[13:06:42] <wikibugs_>	 (PS2) Volans: Switchdc: add profile to install and configure it [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178)
[13:08:28] <wikibugs>	 (PS3) Zppix: Switchdc: add profile to install and configure it [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178) (owner: Volans)
[13:08:55] <Reedy>	 Zppix: It's generally advised not to touch other peoples patches when they're actively working on them
[13:09:22] <wikibugs_>	 (CR) Alexandros Kosiaris: [C: 2] "https://puppet-compiler.wmflabs.org/6014/aluminium.wikimedia.org/ says it's fine. So 2 years after the first submit this is finally resubm" [puppet] - https://gerrit.wikimedia.org/r/207490 (owner: Alexandros Kosiaris)
[13:09:50] <logmsgbot>	 !log addshore@tin Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY [[gerrit:346283|Enable interwikisorting on BETA wiktionaries]] (duration: 00m 44s)
[13:09:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:56] <Zppix>	 Reedy: ack
[13:09:59] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3153643 (ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analytics1038.eqiad.wmnet'] ``` The log can...
[13:10:05] <addshore>	 afaik thats swat doen then...!
[13:10:07] <addshore>	 *done
[13:11:04] <akosiaris>	 !log add LVS IPs to the url-downloader blacklist now that all nodejs services no longer require it anymore. See https://gerrit.wikimedia.org/r/207490
[13:11:07] <Zppix>	 does anyone know where the icinga-wm  and shinken-wm  repos are for the irc codes
[13:11:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:17] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[13:12:37] <icinga-wm>	 PROBLEM - restbase endpoints health on praseodymium is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:12:37] <icinga-wm>	 PROBLEM - restbase endpoints health on cerium is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:12:37] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:12:57] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:12:57] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:04] <Reedy>	 Zppix: It's just ircecho it seems https://github.com/wikimedia/puppet/blob/e959321aa620b77403cc9379db2e86080323c6e8/modules/shinken/manifests/ircbot.pp#L12
[13:13:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:17] <icinga-wm>	 PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:17] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:17] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1003 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:27] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:27] <icinga-wm>	 PROBLEM - restbase endpoints health on xenon is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:27] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1002 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:27] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:27] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:28] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:28] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:13:29] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: /data/citation/{format}/{query} (Get citation for Darth Vader) is CRITICAL: Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)
[13:15:01] <Reedy>	 akosiaris: ^ Is that your change/
[13:15:27] <akosiaris>	 Reedy: that would be funny
[13:15:32] <akosiaris>	 but maybe
[13:15:47] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[13:16:18] <Reedy>	 Just with restbase using node stuff..
[13:16:18] <akosiaris>	 I 'll revert just for good measure and let's see later
[13:16:40] <wikibugs>	 (PS1) Alexandros Kosiaris: Revert "Revert "Revert "Add the LVS blocks to url_downloader""" [puppet] - https://gerrit.wikimedia.org/r/346285
[13:16:46] <wikibugs_>	 (CR) Alexandros Kosiaris: [V: 2 C: 2] Revert "Revert "Revert "Add the LVS blocks to url_downloader""" [puppet] - https://gerrit.wikimedia.org/r/346285 (owner: Alexandros Kosiaris)
[13:18:17] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy
[13:18:17] <icinga-wm>	 RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy
[13:18:17] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[13:18:17] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1003 is OK: All endpoints are healthy
[13:18:18] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy
[13:18:18] <icinga-wm>	 RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy
[13:18:27] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1002 is OK: All endpoints are healthy
[13:18:27] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy
[13:18:27] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy
[13:18:27] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy
[13:18:27] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy
[13:18:31] <Reedy>	 rofl
[13:18:33] <akosiaris>	 Reedy: yeah definitely
[13:18:37] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy
[13:18:37] <icinga-wm>	 RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy
[13:18:37] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy
[13:18:37] <icinga-wm>	 RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy
[13:18:46] <akosiaris>	 so why on earth does restbase use url-downloader?
[13:18:47] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy
[13:18:48] <akosiaris>	 mobrovac_: ^ ?
[13:18:57] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy
[13:19:07] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy
[13:19:07] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy
[13:19:09] <Reedy>	 "Test Get citation for Darth Vader returned the unexpected status 520 (expecting: 200)"
[13:19:21] <Reedy>	 That suggests it uses en.wikipedia.org
[13:19:29] <Reedy>	 Rather than the service pool, sending a host header etc
[13:20:18] <Reedy>	 maybe
[13:20:47] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[13:21:23] <akosiaris>	 Reedy: it's fine connecting to the LVS IP. It's just that it should do it directly, not via url-downloader
[13:21:29] <wikibugs_>	 (PS1) Alexandros Kosiaris: Revert "Revert "Revert "Revert "Add the LVS blocks to url_downloader"""" [puppet] - https://gerrit.wikimedia.org/r/346287
[13:21:53] <akosiaris>	 4 reverts already... let's hope it will not take another 2 years for restbase to stop using url-downloader
[13:23:59] <akosiaris>	 Reedy: actually by the citation part I guess that's citoid ?
[13:24:15] <akosiaris>	 oh please tell me it's not restbase calling citoid calling restbase to provide a citation
[13:24:26] <wikibugs_>	 Operations, media-storage: Swiftrepl was stuck in an infinite loop since days - https://phabricator.wikimedia.org/T162122#3153254 (faidon) You can kill the two thumbs if you want to move past it, as killing thumbs is almost always a safe operation. That said, there is probably an underlying bug that resu...
[13:24:28] <Reedy>	 Possibly. I really don't know that much about the services stuff
[13:24:55] <Reedy>	 akosiaris: We've seen worse ideas around here
[13:25:10] <akosiaris>	 true
[13:25:19] <akosiaris>	 ah the stories we have to tell
[13:26:17] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[13:28:17] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153673 (jcrespo) Seems to have stopped for now since 12:34 UTC: https://grafana.wikimedia.org/dashboard/db/api-summary?panelId...
[13:31:13] <icinga-wm>	 PROBLEM - puppet last run on mw2125 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[ca-certificates]
[13:31:33] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[13:31:51] <elukey>	 ema: --^
[13:32:08] <ema>	 elukey: yeah thanks :)
[13:32:16] <ema>	 seems to be over already
[13:32:17] <elukey>	 :)
[13:32:26] <ema>	 (see #-traffic)
[13:32:43] <icinga-wm>	 PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[13:35:40] <wikibugs_>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3153680 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1038.eqiad.wmnet'] ```  and were **ALL** successful.
[13:38:43] <icinga-wm>	 RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[13:39:33] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[13:47:43] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 21 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[13:48:20] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153759 (jcrespo) Open>stalled
[13:48:23] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 21 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[13:48:23] <icinga-wm>	 PROBLEM - Check systemd state on analytics1037 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:49:03] <icinga-wm>	 PROBLEM - Check systemd state on analytics1036 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:52:43] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[13:53:23] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[13:54:41] <elukey>	 checking 1037 and 1036 (just reimaged)
[13:58:23] <icinga-wm>	 RECOVERY - Check systemd state on analytics1037 is OK: OK - running: The system is fully operational
[13:58:39] <elukey>	 weird, puppet.service was failed for SIGTERM
[13:59:03] <icinga-wm>	 RECOVERY - Check systemd state on analytics1036 is OK: OK - running: The system is fully operational
[13:59:13] <icinga-wm>	 RECOVERY - puppet last run on mw2125 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[14:04:35] <wikibugs>	 Operations, DBA, Labs: eqiad: (2) hardware access request for labsdb1004 & 5 refresh - https://phabricator.wikimedia.org/T161754#3153870 (chasemp)
[14:06:05] <elukey>	 !log reimage analytics1039 and 1051 to Debian Jessie
[14:06:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:20] <wikibugs_>	 (CR) Andrew Bogott: [C: 1] nagios_common: fix/enhance check_ssl_certfile plugin [puppet] - https://gerrit.wikimedia.org/r/346236 (https://phabricator.wikimedia.org/T162085) (owner: Dzahn)
[14:06:33] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 578044
[14:10:23] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[14:14:43] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:15:23] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[14:18:23] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1073 is CRITICAL: CRITICAL: expiry mailbox lag is 594554
[14:19:43] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:21:34] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3153884 (ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analytics1039.eqiad.wmnet', 'analytics1051....
[14:22:25] <wikibugs_>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3153887 (ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analytics1039.eqiad.wmnet', 'analytics1051....
[14:26:33] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 470
[14:27:34] <moritzm>	 !log rebooting cerium to Linux 4.9
[14:27:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:26] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1073 is OK: OK: expiry mailbox lag is 54006
[14:34:16] <moritzm>	 !log rebooting xenon to Linux 4.9
[14:34:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:47] <wikibugs>	 Operations, Labs, Patch-For-Review: Instance creation fails before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3153926 (Andrew) I shutdown a failed instance and mounted the drive.   ``` andrew@labvirt1008:/tmp/mnt/var/lib/dhcp$ ls dhclient.eth0.leases  dhclient....
[14:38:07] <wikibugs_>	 Operations, ops-esams, netops: esams higher than usual temperature - https://phabricator.wikimedia.org/T162152#3153928 (faidon)
[14:39:32] <moritzm>	 !log rebooting praseodymium to Linux 4.9
[14:39:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:11] <paladox>	 moritzm hi, will linux 4.9 be available for labs too?
[14:46:02] <moritzm>	 paladox: it's already available via the "linux-meta-4.9" package on apt.wikimedia.org, will talk to labs team to use it by default for jessie images soon
[14:46:14] <paladox>	 ok thanks
[14:46:46] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:47:16] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[14:50:29] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3153961 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1039.eqiad.wmnet', 'analytics1051.eqiad.wmnet'] ```  Of which those **FAILED**: ```...
[14:51:46] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:52:23] <wikibugs>	 (PS1) Volans: PuppetDB backend: consistently use InvalidQueryError [software/cumin] - https://gerrit.wikimedia.org/r/346301 (https://phabricator.wikimedia.org/T162151)
[14:52:25] <wikibugs_>	 (PS1) Volans: PuppetDB backend: forbid resource's parameters regex [software/cumin] - https://gerrit.wikimedia.org/r/346302 (https://phabricator.wikimedia.org/T162151)
[14:53:01] <wikibugs_>	 (PS1) Giuseppe Lavagetto: mediawiki: use mw_primary for jobrunner, cronjobs state [puppet] - https://gerrit.wikimedia.org/r/346303
[14:53:50] <paladox>	 moritzm i just installed linux 4.9 and i get this
[14:53:52] <paladox>	 groups: cannot find name for group ID 50062
[14:53:52] <paladox>	 groups: cannot find name for group ID 50380
[14:53:52] <paladox>	 groups: cannot find name for group ID 51275
[14:53:52] <paladox>	 groups: cannot find name for group ID 52308
[14:53:53] <paladox>	 groups: cannot find name for group ID 53013
[14:53:54] <paladox>	 groups: cannot find name for group ID 53259
[14:53:55] <paladox>	 now
[14:53:59] <paladox>	 I didnt see that before with 4.4
[14:58:35] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: 2] "https://puppet-compiler.wmflabs.org/6017/ seems ok" [puppet] - https://gerrit.wikimedia.org/r/346303 (owner: Giuseppe Lavagetto)
[14:59:02] <wikibugs_>	 Operations, Labs, Patch-For-Review: Instance creation fails before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3153973 (Andrew) Until I can get install1001 to STOP responding to labs dhcp requests, I'm going to assume that that's the problem.
[15:02:17] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[15:02:26] <wikibugs_>	 (PS1) Ema: cache_upload: properly detect 304s when unsetting CT [puppet] - https://gerrit.wikimedia.org/r/346304 (https://phabricator.wikimedia.org/T162035)
[15:03:46] <wikibugs_>	 (PS4) Volans: Switchdc: add profile to install and configure it [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178)
[15:05:07] <icinga-wm>	 PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: Puppet has 10 failures
[15:07:48] <icinga-wm>	 PROBLEM - puppet last run on es1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:08:14] <wikibugs>	 (PS5) Volans: Switchdc: add profile to install and configure it [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178)
[15:10:07] <icinga-wm>	 RECOVERY - check_puppetrun on pay-lvs2001 is OK: OK: Puppet is currently enabled, last run 91 seconds ago with 0 failures
[15:10:08] <icinga-wm>	 PROBLEM - check_puppetrun on heka is CRITICAL: CRITICAL: Puppet has 1 failures
[15:11:32] <wikibugs_>	 (PS1) Giuseppe Lavagetto: Add tasks for stage 0 [switchdc] - https://gerrit.wikimedia.org/r/346305
[15:11:34] <wikibugs>	 (PS1) Giuseppe Lavagetto: Fix the stop-maintenance task [switchdc] - https://gerrit.wikimedia.org/r/346306
[15:11:36] <wikibugs_>	 (PS1) Giuseppe Lavagetto: Propery re-reference the redis task [switchdc] - https://gerrit.wikimedia.org/r/346307
[15:11:38] <wikibugs>	 (PS1) Giuseppe Lavagetto: Update the varnish task to use the new puppet scripts [switchdc] - https://gerrit.wikimedia.org/r/346308
[15:11:40] <wikibugs_>	 (PS1) Giuseppe Lavagetto: Modify the start maintenance script [switchdc] - https://gerrit.wikimedia.org/r/346309
[15:11:42] <wikibugs>	 (PS1) Giuseppe Lavagetto: Add phase-9 varnish puppet run to restore order to dc_from [switchdc] - https://gerrit.wikimedia.org/r/346310
[15:11:44] <wikibugs_>	 (PS1) Giuseppe Lavagetto: Add task to restore the TTL of discovery entries to 5 minutes [switchdc] - https://gerrit.wikimedia.org/r/346311
[15:12:29] <wikibugs_>	 (CR) Volans: "Puppet compiler results: https://puppet-compiler.wmflabs.org/6019/" [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178) (owner: Volans)
[15:13:18] <wikibugs>	 (CR) jerkins-bot: [V: -1] Fix the stop-maintenance task [switchdc] - https://gerrit.wikimedia.org/r/346306 (owner: Giuseppe Lavagetto)
[15:13:20] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Propery re-reference the redis task [switchdc] - https://gerrit.wikimedia.org/r/346307 (owner: Giuseppe Lavagetto)
[15:13:22] <wikibugs>	 (CR) jerkins-bot: [V: -1] Modify the start maintenance script [switchdc] - https://gerrit.wikimedia.org/r/346309 (owner: Giuseppe Lavagetto)
[15:13:30] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Update the varnish task to use the new puppet scripts [switchdc] - https://gerrit.wikimedia.org/r/346308 (owner: Giuseppe Lavagetto)
[15:13:32] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add phase-9 varnish puppet run to restore order to dc_from [switchdc] - https://gerrit.wikimedia.org/r/346310 (owner: Giuseppe Lavagetto)
[15:13:52] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Add task to restore the TTL of discovery entries to 5 minutes [switchdc] - https://gerrit.wikimedia.org/r/346311 (owner: Giuseppe Lavagetto)
[15:15:07] <icinga-wm>	 RECOVERY - check_puppetrun on heka is OK: OK: Puppet is currently enabled, last run 68 seconds ago with 0 failures
[15:16:21] <wikibugs_>	 Operations, DBA, Labs: eqiad: (2) hardware access request for labsdb1004 & 5 refresh - https://phabricator.wikimedia.org/T161754#3154025 (chasemp) Open>stalled
[15:16:32] <wikibugs>	 Operations, DBA, Labs: eqiad: (2) hardware access request for labsdb1006 & 7 refresh - https://phabricator.wikimedia.org/T161755#3154026 (chasemp) Open>stalled
[15:29:16] <wikibugs>	 Operations, Labs, Patch-For-Review: Instance creation fails before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3154082 (Andrew) Ok, I no longer thing that install1001 is involved.  Instead, it's something to do with an IP getting assigned to two instances at onc...
[15:29:27] <wikibugs_>	 Operations, ops-codfw, DBA: codfw racking first 10 DB servers - https://phabricator.wikimedia.org/T162159#3154083 (Marostegui)
[15:33:14] <wikibugs>	 (PS2) Giuseppe Lavagetto: Add tasks for stage 0 [switchdc] - https://gerrit.wikimedia.org/r/346305
[15:33:16] <wikibugs_>	 (PS2) Giuseppe Lavagetto: Fix the stop-maintenance task [switchdc] - https://gerrit.wikimedia.org/r/346306
[15:33:18] <wikibugs>	 (PS2) Giuseppe Lavagetto: Propery re-reference the redis task [switchdc] - https://gerrit.wikimedia.org/r/346307
[15:33:19] <wikibugs_>	 (PS2) Giuseppe Lavagetto: Update the varnish task to use the new puppet scripts [switchdc] - https://gerrit.wikimedia.org/r/346308
[15:33:22] <wikibugs>	 (PS2) Giuseppe Lavagetto: Modify the start maintenance script [switchdc] - https://gerrit.wikimedia.org/r/346309
[15:33:24] <wikibugs_>	 (PS2) Giuseppe Lavagetto: Add phase-9 varnish puppet run to restore order to dc_from [switchdc] - https://gerrit.wikimedia.org/r/346310
[15:33:26] <wikibugs>	 (PS2) Giuseppe Lavagetto: Add task to restore the TTL of discovery entries to 5 minutes [switchdc] - https://gerrit.wikimedia.org/r/346311
[15:35:10] <icinga-wm>	 PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures
[15:36:50] <icinga-wm>	 RECOVERY - puppet last run on es1011 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[15:38:50] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[15:40:10] <icinga-wm>	 RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[15:42:10] <icinga-wm>	 PROBLEM - puppet last run on ms-be1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:43:50] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[15:45:37] <paladox>	 moritzm im noticing performance improvements with linux 4.9. Unless it's because i rebooted. But ssh in is faster and running commands are faster.
[15:47:41] <wikibugs_>	 (PS1) Jcrespo: mariadb: Depool db1034 temporarilly to run ALTER TABLE on revision [mediawiki-config] - https://gerrit.wikimedia.org/r/346313 (https://phabricator.wikimedia.org/T159319)
[15:48:08] <wikibugs_>	 (CR) Marostegui: [C: 1] mariadb: Depool db1034 temporarilly to run ALTER TABLE on revision [mediawiki-config] - https://gerrit.wikimedia.org/r/346313 (https://phabricator.wikimedia.org/T159319) (owner: Jcrespo)
[15:49:20] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[15:50:01] <wikibugs>	 (CR) Jcrespo: [C: -1] "This fixes puppet, but I do not think it makes it work." [puppet] - https://gerrit.wikimedia.org/r/345847 (https://phabricator.wikimedia.org/T157359) (owner: Jcrespo)
[15:50:59] <wikibugs_>	 (CR) Volans: "A minor comment inline." (1 comment) [switchdc] - https://gerrit.wikimedia.org/r/346306 (owner: Giuseppe Lavagetto)
[15:51:01] <wikibugs>	 (PS2) Jcrespo: mariadb: Depool db1034 temporarilly to run ANALYZE on revision [mediawiki-config] - https://gerrit.wikimedia.org/r/346313 (https://phabricator.wikimedia.org/T159319)
[15:54:20] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[15:54:26] <wikibugs>	 (CR) Jcrespo: [C: 2] mariadb: Depool db1034 temporarilly to run ANALYZE on revision [mediawiki-config] - https://gerrit.wikimedia.org/r/346313 (https://phabricator.wikimedia.org/T159319) (owner: Jcrespo)
[15:54:42] <wikibugs_>	 (CR) jenkins-bot: mariadb: Depool db1034 temporarilly to run ANALYZE on revision [mediawiki-config] - https://gerrit.wikimedia.org/r/346313 (https://phabricator.wikimedia.org/T159319) (owner: Jcrespo)
[15:56:15] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1034 for maintenance (duration: 00m 44s)
[15:56:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:30] <wikibugs_>	 (CR) Volans: "A couple of minor comments inline." (3 comments) [switchdc] - https://gerrit.wikimedia.org/r/346305 (owner: Giuseppe Lavagetto)
[15:57:25] <wikibugs_>	 (CR) Volans: [C: 1] "LGTM" [switchdc] - https://gerrit.wikimedia.org/r/346307 (owner: Giuseppe Lavagetto)
[15:58:36] <wikibugs>	 (CR) Volans: [C: 1] "LGTM, but depends on the final choice of the switch procedure for traffic" [switchdc] - https://gerrit.wikimedia.org/r/346308 (owner: Giuseppe Lavagetto)
[15:59:11] <wikibugs_>	 (CR) Volans: [C: 1] "LGTM" [switchdc] - https://gerrit.wikimedia.org/r/346309 (owner: Giuseppe Lavagetto)
[15:59:20] <jynus>	 !log running ANALIZE on revision table for on eswiki,cawiki on db1034
[15:59:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:51] <elukey>	 !log reimage analytics1052 (Hadoop Journal node) to Debian Jessie
[15:59:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:04] <jouncebot>	 godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170404T1600).
[16:00:16] <wikibugs>	 Operations, ops-codfw, DBA: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3154178 (Papaul)
[16:00:38] <wikibugs>	 Operations, ops-codfw, DBA: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3154083 (Papaul) p:Triage>Normal a:Papaul
[16:03:25] <wikibugs_>	 (CR) Volans: [C: 1] "LGTM but might depend on the switch procedure for traffic" [switchdc] - https://gerrit.wikimedia.org/r/346310 (owner: Giuseppe Lavagetto)
[16:03:32] <hoo>	 !log Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds
[16:03:38] <hoo>	 sjoerddebruin: FYI^
[16:03:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:39] <stashbot>	 T132839: [RfC] Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839
[16:05:57] <wikibugs_>	 (CR) Volans: "A couple of comments inline" (2 comments) [switchdc] - https://gerrit.wikimedia.org/r/346311 (owner: Giuseppe Lavagetto)
[16:06:02] <elukey>	 no patches scheduled from what I can see - following godog's best practices: https://giphy.com/gifs/funny-happy-excited-gTNSX6N7vcKOY
[16:07:10] <icinga-wm>	 PROBLEM - puppet last run on labvirt1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:11:10] <icinga-wm>	 RECOVERY - puppet last run on ms-be1037 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[16:11:45] <wikibugs>	 (CR) Giuseppe Lavagetto: base::puppet: add puppet helper scripts (16 comments) [puppet] - https://gerrit.wikimedia.org/r/346118 (owner: Giuseppe Lavagetto)
[16:12:58] <wikibugs_>	 (PS4) Giuseppe Lavagetto: base::puppet: add puppet helper scripts [puppet] - https://gerrit.wikimedia.org/r/346118
[16:20:37] <wikibugs_>	 (PS5) Giuseppe Lavagetto: base::puppet: add puppet helper scripts [puppet] - https://gerrit.wikimedia.org/r/346118
[16:21:31] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3154217 (ops-monitoring-bot) Script wmf_auto_reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` ['analytics1052.eqiad.wmnet'] ``` The log can b...
[16:26:17] <wikibugs_>	 (PS2) Ema: cache_upload: properly detect 304s when unsetting CT [puppet] - https://gerrit.wikimedia.org/r/346304 (https://phabricator.wikimedia.org/T162035)
[16:28:15] <wikibugs>	 (PS3) Ema: cache_upload: properly detect 304s when unsetting CT [puppet] - https://gerrit.wikimedia.org/r/346304 (https://phabricator.wikimedia.org/T162035)
[16:29:32] <wikibugs>	 (PS1) Andrew Bogott: Nova dnsmasq:  Reduce lease times and ttls by a lot [puppet] - https://gerrit.wikimedia.org/r/346318 (https://phabricator.wikimedia.org/T160908)
[16:31:06] <wikibugs_>	 (PS4) Ema: cache_upload: override CT updates on 304s [puppet] - https://gerrit.wikimedia.org/r/346304 (https://phabricator.wikimedia.org/T162035)
[16:36:10] <icinga-wm>	 RECOVERY - puppet last run on labvirt1003 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[16:40:11] <wikibugs_>	 (PS5) Ema: cache_upload: override CT updates on 304s [puppet] - https://gerrit.wikimedia.org/r/346304 (https://phabricator.wikimedia.org/T162035)
[16:45:03] <wikibugs_>	 (CR) Subramanya Sastry: "https://github.com/wikimedia/mediawiki-services-parsoid-testreduce/commit/a76785d3cc77b58d3d5f3062af6ba3c4748dc1f1 now fixes testreduce to" [puppet] - https://gerrit.wikimedia.org/r/346209 (owner: Subramanya Sastry)
[16:46:11] <wikibugs>	 (PS1) Jcrespo: Revert "mariadb: Depool db1034 temporarilly to run ANALYZE on revision" [mediawiki-config] - https://gerrit.wikimedia.org/r/346319
[16:46:22] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3154270 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1052.eqiad.wmnet'] ```  and were **ALL** successful.
[16:46:46] <wikibugs_>	 (CR) Jcrespo: [C: -2] "Not yet, until query finishes and replication lag recovers." [mediawiki-config] - https://gerrit.wikimedia.org/r/346319 (owner: Jcrespo)
[16:46:52] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 1] "a couple of smallish comments but LGTM. It can even be merged as-is." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178) (owner: Volans)
[16:49:40] <icinga-wm>	 PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=448.00 Read Requests/Sec=505.80 Write Requests/Sec=0.80 KBytes Read/Sec=36488.40 KBytes_Written/Sec=17.20
[16:54:02] <wikibugs>	 (PS1) Giuseppe Lavagetto: cache::text: switch all mediawiki to codfw [puppet] - https://gerrit.wikimedia.org/r/346320
[16:54:04] <wikibugs_>	 (PS1) Giuseppe Lavagetto: discovery::app_routes: switch mediawiki to codfw [puppet] - https://gerrit.wikimedia.org/r/346321
[16:54:06] <wikibugs>	 (PS1) Giuseppe Lavagetto: cache::text: remove direct route to mediawiki from eqiad [puppet] - https://gerrit.wikimedia.org/r/346322
[16:56:20] <icinga-wm>	 PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6479
[16:57:20] <icinga-wm>	 RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3130118 keys, up 12 days 42 minutes - replication_delay is 0
[16:59:11] <wikibugs>	 (CR) Volans: "Done" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178) (owner: Volans)
[16:59:23] <wikibugs_>	 (PS6) Volans: Switchdc: add profile to install and configure it [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178)
[17:00:05] <jouncebot>	 gwicke, cscott, arlolra, subbu, halfak, and Amir1: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170404T1700). Please do the needful.
[17:00:40] <icinga-wm>	 RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=24.10 Read Requests/Sec=0.30 Write Requests/Sec=0.30 KBytes Read/Sec=1.20 KBytes_Written/Sec=17.60
[17:03:28] <subbu>	 we might have a parsoid deploy later on once arlo is back, but if others are deploying, please go ahead.
[17:15:07] <icinga-wm>	 RECOVERY - check_swap on lutetium is OK: SWAP OK - 100% free (7608 MB out of 7627 MB)
[17:15:40] <wikibugs_>	 (CR) Volans: [C: 1] "LGTM, single nitpick comment inline (on a comment)" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/346118 (owner: Giuseppe Lavagetto)
[17:23:06] <wikibugs>	 (PS1) Urbanecm: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/346324 (https://phabricator.wikimedia.org/T162089)
[17:24:11] <wikibugs>	 (PS7) Volans: Switchdc: add profile to install and configure it [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178)
[17:24:55] <Urbanecm>	 Hi all, I'd like to ask everybody why today isn't Morning SWAT. Is it less frequent than other SWAT windows?
[17:25:04] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: 1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/342248 (https://phabricator.wikimedia.org/T147718) (owner: Gehel)
[17:25:25] <gehel>	 _joe_: Great!
[17:26:00] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: -1] "on hold until the switchover is completed." [puppet] - https://gerrit.wikimedia.org/r/346173 (owner: Giuseppe Lavagetto)
[17:27:24] <wikibugs>	 (CR) Volans: [C: 2] Switchdc: add profile to install and configure it [puppet] - https://gerrit.wikimedia.org/r/346279 (https://phabricator.wikimedia.org/T160178) (owner: Volans)
[17:28:23] <wikibugs_>	 (PS6) Giuseppe Lavagetto: base::puppet: add puppet helper scripts [puppet] - https://gerrit.wikimedia.org/r/346118
[17:28:29] <wikibugs>	 (PS1) Chad: Scap clean: l10nupdate cache is owned by www-data [mediawiki-config] - https://gerrit.wikimedia.org/r/346325
[17:32:25] <wikibugs>	 (CR) Thcipriani: [C: 1] Scap clean: l10nupdate cache is owned by www-data [mediawiki-config] - https://gerrit.wikimedia.org/r/346325 (owner: Chad)
[17:33:41] <wikibugs_>	 (CR) Chad: [C: 2] Scap clean: l10nupdate cache is owned by www-data [mediawiki-config] - https://gerrit.wikimedia.org/r/346325 (owner: Chad)
[17:34:49] <wikibugs>	 (Merged) jenkins-bot: Scap clean: l10nupdate cache is owned by www-data [mediawiki-config] - https://gerrit.wikimedia.org/r/346325 (owner: Chad)
[17:36:29] <wikibugs>	 (CR) jenkins-bot: Scap clean: l10nupdate cache is owned by www-data [mediawiki-config] - https://gerrit.wikimedia.org/r/346325 (owner: Chad)
[17:36:44] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: 2] base::puppet: add puppet helper scripts [puppet] - https://gerrit.wikimedia.org/r/346118 (owner: Giuseppe Lavagetto)
[17:37:35] <wikibugs>	 (PS7) Dzahn: nagios_common: fix/enhance check_ssl_certfile plugin [puppet] - https://gerrit.wikimedia.org/r/346236 (https://phabricator.wikimedia.org/T162085)
[17:38:35] <wikibugs>	 (PS1) Giuseppe Lavagetto: Revert "base::puppet: add puppet helper scripts" [puppet] - https://gerrit.wikimedia.org/r/346326
[17:38:42] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: 2] Revert "base::puppet: add puppet helper scripts" [puppet] - https://gerrit.wikimedia.org/r/346326 (owner: Giuseppe Lavagetto)
[17:38:42] <mutante>	 paladox: fyi  https://phabricator.wikimedia.org/T162029
[17:38:47] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: 2 C: 2] Revert "base::puppet: add puppet helper scripts" [puppet] - https://gerrit.wikimedia.org/r/346326 (owner: Giuseppe Lavagetto)
[17:38:56] <_joe_>	 grrr
[17:38:56] <mutante>	 because i saw you requesting 4.9 kernel
[17:39:20] <paladox>	 mutante thanks yep i am subscribed to that. I installed it on gerrit-test, gerrit-test3, jenkins-slave-01, phabricator.
[17:39:30] <mutante>	 paladox: cool! ok
[17:39:34] <_joe_>	 I can't start to describe the WTF I just found :P
[17:39:35] <paladox>	 yep
[17:39:47] <icinga-wm>	 PROBLEM - puppet last run on mc1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:39:57] <icinga-wm>	 PROBLEM - puppet last run on labstore1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:07] <icinga-wm>	 PROBLEM - puppet last run on conf1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:17] <icinga-wm>	 PROBLEM - puppet last run on cp2020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:17] <icinga-wm>	 PROBLEM - puppet last run on elastic2031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:17] <icinga-wm>	 PROBLEM - puppet last run on puppetmaster2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:17] <icinga-wm>	 PROBLEM - puppet last run on install2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:17] <icinga-wm>	 PROBLEM - puppet last run on mw2135 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:27] <icinga-wm>	 PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:27] <icinga-wm>	 PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:27] <icinga-wm>	 PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:27] <icinga-wm>	 PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:27] <icinga-wm>	 PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:28] <icinga-wm>	 PROBLEM - puppet last run on aqs1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:36] <volans>	 _joe_: related to your change? or mine?
[17:40:37] <wikibugs>	 (PS8) Dzahn: nagios_common: fix/enhance check_ssl_certfile plugin [puppet] - https://gerrit.wikimedia.org/r/346236 (https://phabricator.wikimedia.org/T162085)
[17:40:37] <icinga-wm>	 PROBLEM - puppet last run on ms-be1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:37] <icinga-wm>	 PROBLEM - puppet last run on es1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:38] <icinga-wm>	 PROBLEM - puppet last run on elastic1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:38] <icinga-wm>	 PROBLEM - puppet last run on db2048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:40:59] <volans>	 !log stopped ircecho to avoid IRC spam
[17:41:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:41:09] <_joe_>	 volans: mine, and you'll love it
[17:41:15] <volans>	 there is a script with the same name?
[17:41:24] <_joe_>	 no
[17:41:26] <_joe_>	 Error: Failed to apply catalog: Cannot alias File[/usr/local/sbin/] to ["/usr/local/sbin"] at /etc/puppet/modules/base/manifests/puppet.pp:116; resource ["File", "/usr/local/sbin"] already declared at /etc/puppet/modules/profile/manifests/base.pp:25
[17:41:42] <_joe_>	 so that dir is defined twice
[17:41:50] <volans>	 lovely!
[17:41:52] <_joe_>	 as long as the declaration was the same, it didn't fail
[17:42:01] <_joe_>	 now I added a second file to the same define
[17:42:06] <_joe_>	 so formally I changed nothing
[17:42:09] <_joe_>	 and it fails
[17:42:14] <_joe_>	 how awesome is that?
[17:42:16] <volans>	 but you changed the "title"
[17:42:24] <volans>	 awesome puppet
[17:42:43] <wikibugs>	 (CR) Dzahn: [C: 2] nagios_common: fix/enhance check_ssl_certfile plugin [puppet] - https://gerrit.wikimedia.org/r/346236 (https://phabricator.wikimedia.org/T162085) (owner: Dzahn)
[17:43:36] <_joe_>	 anyways, fixing it
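The failure _joe_ describes comes from Puppet treating File[/usr/local/sbin/] and File[/usr/local/sbin] as the same file: the error says the trailing-slash title cannot be aliased to the normalized path because a resource with that path is already declared in profile::base. A toy Python model of that uniqueness check, assuming resources are indexed both by title and by trailing-slash-normalized path; ToyCatalog, normalize_path and DuplicateResourceError are hypothetical names, not Puppet internals:

```python
class DuplicateResourceError(Exception):
    """Raised when a File resource collides with one already in the catalog."""


def normalize_path(path: str) -> str:
    # Drop trailing slashes so "/usr/local/sbin/" and "/usr/local/sbin"
    # refer to the same file; keep the root "/" intact.
    return path.rstrip("/") or "/"


class ToyCatalog:
    """Hypothetical model, not Puppet's code: File resources are indexed by
    title and by normalized path (the alias used for uniqueness checks)."""

    def __init__(self) -> None:
        self.titles: set[str] = set()
        self.aliases: dict[str, str] = {}  # normalized path -> declaring title

    def add_file(self, title: str) -> None:
        if title in self.titles:
            raise DuplicateResourceError(f"Duplicate declaration: File[{title}]")
        path = normalize_path(title)
        owner = self.aliases.get(path)
        if owner is not None:
            # Different title, same file on disk: the alias is already taken.
            raise DuplicateResourceError(
                f'Cannot alias File[{title}] to ["{path}"]; '
                f'resource ["File", "{owner}"] already declared'
            )
        self.titles.add(title)
        self.aliases[path] = title


catalog = ToyCatalog()
catalog.add_file("/usr/local/sbin")       # e.g. the declaration in a profile
try:
    catalog.add_file("/usr/local/sbin/")  # e.g. a second one with a trailing slash
except DuplicateResourceError as err:
    print(err)
```

As volans points out, the title is what changed: two titles that differ only by a trailing slash still name one file, so the second declaration collides on the alias.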
[17:45:24] <wikibugs_>	 (PS1) Giuseppe Lavagetto: base::puppet: add puppet helper scripts [puppet] - https://gerrit.wikimedia.org/r/346328
[17:47:12] <wikibugs_>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (Legoktm) > Requests do not have a user agent  There's no user-agent header at all or is it some generic UA?
[17:48:51] <wikibugs_>	 (PS2) Dzahn: Add ll to my bash aliases [puppet] - https://gerrit.wikimedia.org/r/346270 (owner: Hoo man)
[17:49:36] <wikibugs>	 (PS3) Dzahn: admins::hoo: Add ll to bash aliases [puppet] - https://gerrit.wikimedia.org/r/346270 (owner: Hoo man)
[17:49:46] <wikibugs_>	 (CR) Dzahn: [C: 2] admins::hoo: Add ll to bash aliases [puppet] - https://gerrit.wikimedia.org/r/346270 (owner: Hoo man)
[17:50:05] <wikibugs>	 (CR) Dzahn: [V: 2 C: 2] admins::hoo: Add ll to bash aliases [puppet] - https://gerrit.wikimedia.org/r/346270 (owner: Hoo man)
[17:50:26] <wikibugs_>	 (PS1) Chad: scap clean: Only prune staging files from the active master [mediawiki-config] - https://gerrit.wikimedia.org/r/346330 (https://phabricator.wikimedia.org/T161643)
[17:53:23] <andrewbogott>	 !log disabling puppet on labvirts to roll out a nova config change
[17:53:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:00] <wikibugs>	 (PS2) Andrew Bogott: Nova dnsmasq:  Reduce lease times and ttls by a lot [puppet] - https://gerrit.wikimedia.org/r/346318 (https://phabricator.wikimedia.org/T160908)
[17:55:37] <wikibugs>	 (CR) Thcipriani: [C: 1] scap clean: Only prune staging files from the active master [mediawiki-config] - https://gerrit.wikimedia.org/r/346330 (https://phabricator.wikimedia.org/T161643) (owner: Chad)
[17:56:06] <wikibugs_>	 (CR) Andrew Bogott: [C: 2] Nova dnsmasq:  Reduce lease times and ttls by a lot [puppet] - https://gerrit.wikimedia.org/r/346318 (https://phabricator.wikimedia.org/T160908) (owner: Andrew Bogott)
[17:57:08] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic, Security: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3154452 (MaxSem)
[17:57:27] <icinga-wm>	 PROBLEM - puppet last run on db1086 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:58:16] <wikibugs_>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (MaxSem)
[17:59:45] <wikibugs>	 (CR) Chad: [C: 2] scap clean: Only prune staging files from the active master [mediawiki-config] - https://gerrit.wikimedia.org/r/346330 (https://phabricator.wikimedia.org/T161643) (owner: Chad)
[18:02:22] <wikibugs>	 (Merged) jenkins-bot: scap clean: Only prune staging files from the active master [mediawiki-config] - https://gerrit.wikimedia.org/r/346330 (https://phabricator.wikimedia.org/T161643) (owner: Chad)
[18:02:31] <wikibugs_>	 (CR) jenkins-bot: scap clean: Only prune staging files from the active master [mediawiki-config] - https://gerrit.wikimedia.org/r/346330 (https://phabricator.wikimedia.org/T161643) (owner: Chad)
[18:05:05] <wikibugs_>	 (CR) BBlack: cache_upload: override CT updates on 304s (1 comment) [puppet] - https://gerrit.wikimedia.org/r/346304 (https://phabricator.wikimedia.org/T162035) (owner: Ema)
[18:07:17] <wikibugs>	 (CR) Dzahn: "works, it turned the Icinga checks to CRIT now as it should: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=kvm" [puppet] - https://gerrit.wikimedia.org/r/346236 (https://phabricator.wikimedia.org/T162085) (owner: Dzahn)
[18:07:40] <volans>	 ircecho was me; the recoveries for all the previous failures are coming
[18:09:07] <wikibugs>	 (PS1) Chad: Scap clean: Shut up non-error output from git ops [mediawiki-config] - https://gerrit.wikimedia.org/r/346331
[18:09:18] <wikibugs_>	 (CR) Chad: [C: 2] Scap clean: Shut up non-error output from git ops [mediawiki-config] - https://gerrit.wikimedia.org/r/346331 (owner: Chad)
[18:10:35] <wikibugs>	 (Merged) jenkins-bot: Scap clean: Shut up non-error output from git ops [mediawiki-config] - https://gerrit.wikimedia.org/r/346331 (owner: Chad)
[18:10:47] <wikibugs_>	 (CR) jenkins-bot: Scap clean: Shut up non-error output from git ops [mediawiki-config] - https://gerrit.wikimedia.org/r/346331 (owner: Chad)
[18:12:03] <wikibugs_>	 Operations, Labs, Labs-Infrastructure, Monitoring, Patch-For-Review: monitor expiration of labvirt-star SSL cert - https://phabricator.wikimedia.org/T116332#3154475 (Dzahn) after the merge above now Icinga checks turned CRIT as they should have. due to a bug they stayed just WARN before for l...
[18:13:07] <wikibugs>	 (CR) Dzahn: ":) thanks Alex" [puppet] - https://gerrit.wikimedia.org/r/344729 (owner: Dzahn)
[18:15:06] <volans>	 all recovered, restarting ircecho
[18:15:34] <mutante>	 new alerts about expiring certs will show up, but that's the good part because they should have shown earlier
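Dzahn's note on T116332 says the labvirt-star checks should have gone CRIT but stayed at WARN because of a bug in the check_ssl_certfile plugin; the log does not say what that bug was. As a generic sketch only (not the plugin's code), an expiry check should test the stricter threshold first. The cert_state name and the 30/14-day thresholds below are hypothetical:

```python
from datetime import datetime, timezone


def cert_state(not_after: datetime, now: datetime,
               warn_days: int = 30, crit_days: int = 14) -> str:
    """Classify a certificate by whole days left until expiry (hypothetical thresholds)."""
    days_left = (not_after - now).days
    # Check CRIT before WARN: with the opposite order, a certificate inside
    # the CRIT window also satisfies the WARN test and would stop there.
    if days_left < crit_days:
        return "CRITICAL"
    if days_left < warn_days:
        return "WARNING"
    return "OK"


now = datetime(2017, 4, 4, tzinfo=timezone.utc)
print(cert_state(datetime(2017, 4, 10, tzinfo=timezone.utc), now))  # CRITICAL
```

An already-expired certificate yields a negative days_left and so also lands in CRITICAL.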
[18:16:15] <logmsgbot>	 !log demon@tin Started scap: wmf.19 bootstrap
[18:16:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:18:24] <wikibugs_>	 (CR) Thcipriani: "One minor issue and one nit about deployment-prep to make sure salt isn't trying to manage the same repo as scap on the deployment boxen." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/346248 (https://phabricator.wikimedia.org/T116335) (owner: Mobrovac)
[18:20:32] <wikibugs_>	 (PS1) Volans: Fix typo for dict access [switchdc] - https://gerrit.wikimedia.org/r/346332 (https://phabricator.wikimedia.org/T160178)
[18:22:48] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 55963.01943 Seconds
[18:23:27] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 56002.408584 Seconds
[18:23:37] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 56013.677784 Seconds
[18:24:27] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[18:24:37] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[18:24:47] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[18:25:05] <volans>	 gehel: FYI ^^^
[18:25:22] <wikibugs_>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3154500 (ops-monitoring-bot) Script wmf_auto_reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` ['analytics1053.eqiad.wmnet', 'analytics1054.eq...
[18:25:37] <icinga-wm>	 RECOVERY - puppet last run on db1086 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[18:26:16] <gehel>	 volans: thanks!
[18:29:25] <volans>	 yw :)
[18:29:59] <wikibugs_>	 (PS2) Jforrester: Enable wgCiteResponsiveReferences on cawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/344722 (https://phabricator.wikimedia.org/T161307)
[18:30:01] <wikibugs>	 (PS1) Jforrester: Enable wgCiteResponsiveReferences on bgwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/346333 (https://phabricator.wikimedia.org/T162145)
[18:33:47] <icinga-wm>	 PROBLEM - puppet last run on db1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:36:08] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: 2] Fix typo for dict access [switchdc] - https://gerrit.wikimedia.org/r/346332 (https://phabricator.wikimedia.org/T160178) (owner: Volans)
[18:36:46] <volans>	 _joe_: I was assuming you want to first merge yours and this after to avoid all the rebasing ;)
[18:37:18] <_joe_>	 nah, whatever
[18:37:37] <icinga-wm>	 PROBLEM - puppet last run on elastic1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:38:05] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] base::puppet: add puppet helper scripts [puppet] - https://gerrit.wikimedia.org/r/346328 (owner: Giuseppe Lavagetto)
[18:38:12] <wikibugs_>	 (PS2) Giuseppe Lavagetto: base::puppet: add puppet helper scripts [puppet] - https://gerrit.wikimedia.org/r/346328
[18:38:17] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: 2 C: 2] base::puppet: add puppet helper scripts [puppet] - https://gerrit.wikimedia.org/r/346328 (owner: Giuseppe Lavagetto)
[18:42:48] <icinga-wm>	 RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[18:44:47] <icinga-wm>	 PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:48:00] <wikibugs_>	 (PS1) Giuseppe Lavagetto: base::puppet: actually install run-puppet-agent [puppet] - https://gerrit.wikimedia.org/r/346337
[18:48:29] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] base::puppet: actually install run-puppet-agent [puppet] - https://gerrit.wikimedia.org/r/346337 (owner: Giuseppe Lavagetto)
[18:48:57] <wikibugs_>	 (CR) Giuseppe Lavagetto: [V: 2 C: 2] base::puppet: actually install run-puppet-agent [puppet] - https://gerrit.wikimedia.org/r/346337 (owner: Giuseppe Lavagetto)
[18:50:19] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3154665 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1053.eqiad.wmnet', 'analytics1054.eqiad.wmnet'] ```  and were **ALL** successful.
[18:51:14] <wikibugs_>	 Operations, Monitoring: tendril cert expiry alerts on dbmonitor hosts - https://phabricator.wikimedia.org/T162183#3154666 (Dzahn)
[18:51:23] <wikibugs>	 Operations, Monitoring: tendril cert expiry alerts on dbmonitor hosts - https://phabricator.wikimedia.org/T162183#3154678 (Dzahn) a:Dzahn
[18:51:31] <logmsgbot>	 !log demon@tin Finished scap: wmf.19 bootstrap (duration: 35m 16s)
[18:51:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:43] <wikibugs_>	 (PS1) Giuseppe Lavagetto: base::puppet: add include to run-puppet-agent [puppet] - https://gerrit.wikimedia.org/r/346338
[18:52:23] <wikibugs_>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3154681 (jcrespo) User agent was "-" (without quotes).
[18:52:32] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: 2 C: 2] base::puppet: add include to run-puppet-agent [puppet] - https://gerrit.wikimedia.org/r/346338 (owner: Giuseppe Lavagetto)
[18:54:57] <wikibugs>	 Operations, Monitoring: tendril cert expiry alerts on dbmonitor hosts - https://phabricator.wikimedia.org/T162183#3154688 (Dzahn)
[18:55:00] <wikibugs>	 (PS1) Chad: group0 to wmf.19 [mediawiki-config] - https://gerrit.wikimedia.org/r/346339
[18:55:54] <logmsgbot>	 !log demon@tin Synchronized php: symlink repoint (duration: 00m 39s)
[18:55:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:26] <wikibugs>	 Operations, Monitoring: tendril cert expiry alerts on dbmonitor hosts - https://phabricator.wikimedia.org/T162183#3154666 (Dzahn) p:Triage>Normal
[18:57:23] <wikibugs_>	 Operations, Monitoring: tendril cert expiry alerts on dbmonitor hosts - https://phabricator.wikimedia.org/T162183#3154714 (Dzahn)
[18:57:27] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (MaxSem) We used to block API requests that provided no UA - anybody remembers why did we stop doing that?
[19:00:04] <jouncebot>	 RainbowSprinkles: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170404T1900).
[19:02:40] <icinga-wm>	 RECOVERY - puppet last run on db1054 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[19:03:21] <wikibugs_>	 (PS2) Mobrovac: RESTBase: Migrate to Scap3 deployment [puppet] - https://gerrit.wikimedia.org/r/346248 (https://phabricator.wikimedia.org/T116335)
[19:03:56] <wikibugs_>	 (CR) Mobrovac: RESTBase: Migrate to Scap3 deployment (1 comment) [puppet] - https://gerrit.wikimedia.org/r/346248 (https://phabricator.wikimedia.org/T116335) (owner: Mobrovac)
[19:04:50] <icinga-wm>	 PROBLEM - puppet last run on wtp1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:05:31] <icinga-wm>	 RECOVERY - puppet last run on elastic1033 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[19:06:21] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[19:08:31] <wikibugs>	 (CR) Chad: [C: 2] group0 to wmf.19 [mediawiki-config] - https://gerrit.wikimedia.org/r/346339 (owner: Chad)
[19:10:12] <wikibugs_>	 Operations, RESTBase, RESTBase-Cassandra: cassandra client authentication - https://phabricator.wikimedia.org/T112742#3154775 (Volker_E)
[19:10:35] <wikibugs>	 (Merged) jenkins-bot: group0 to wmf.19 [mediawiki-config] - https://gerrit.wikimedia.org/r/346339 (owner: Chad)
[19:10:53] <wikibugs_>	 Operations, Patch-For-Review: labtestservices2001.wikimedia.org.crt - https://phabricator.wikimedia.org/T124374#1954131 (Dzahn) came here looking to do this for a similar issue. would have been nice to see the actual command that was the solution here.
[19:10:57] <wikibugs_>	 (CR) jenkins-bot: group0 to wmf.19 [mediawiki-config] - https://gerrit.wikimedia.org/r/346339 (owner: Chad)
[19:11:50] <icinga-wm>	 RECOVERY - puppet last run on wtp1007 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[19:16:38] <wikibugs>	 Operations, ops-eqiad, fundraising-tech-ops, Patch-For-Review: rack and cable frdev1001 - https://phabricator.wikimedia.org/T159887#3154837 (Cmjohnson) Set the raid cfg to raid 10
[19:16:53] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review: rack and cable frdb1002 - https://phabricator.wikimedia.org/T159886#3154838 (Cmjohnson) Set the raid cfg to raid 10
[19:17:46] <logmsgbot>	 !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.19
[19:17:50] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[19:17:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:02] <wikibugs_>	 Operations, ops-eqiad: decommission ms1003 - https://phabricator.wikimedia.org/T157975#3022054 (Cmjohnson) @arielglenn, clean up everything but dns and update task. I will wipe it and remove dns once off the rack.
[19:22:50] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[19:26:20] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
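The RIPE Atlas IPv6 ping checks flap around a fixed threshold: "failed 20 probes of 279 (alerts on 19)" is CRITICAL, while "failed 19 probes of 279 (alerts on 19)" is OK. Reading only this log, the check appears to go CRITICAL when the failed count strictly exceeds the "alerts on" value. A small Python sketch of that inferred rule, parsing the summary text as icinga-wm prints it; atlas_state is a hypothetical helper, not the real Icinga check:

```python
import re

# Matches the summary text icinga-wm prints for these checks.
SUMMARY = re.compile(r"failed (\d+) probes of (\d+) \(alerts on (\d+)\)")


def atlas_state(message: str) -> str:
    """Return CRITICAL/OK from a check summary, using the threshold inferred from the log."""
    match = SUMMARY.search(message)
    if match is None:
        raise ValueError(f"unrecognized summary: {message!r}")
    failed, _total, threshold = (int(g) for g in match.groups())
    # Inferred: 19 failures with "alerts on 19" is OK and 20 is CRITICAL,
    # so the alert fires strictly above the threshold.
    return "CRITICAL" if failed > threshold else "OK"


print(atlas_state("failed 20 probes of 273 (alerts on 19)"))  # CRITICAL
```

With the failure count hovering at the threshold, a single probe moving in or out of failure is enough to flip the state, which matches the repeated PROBLEM/RECOVERY pairs for ulsfo and codfw.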
[19:27:50] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 59863.120938 Seconds
[19:28:50] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[19:29:13] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (Tgr) >>! In T162129#3154681, @jcrespo wrote: > User agent was "-" (without quotes).  More likely, nothing at all. The...
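On T162129 the requests are reported with a user agent of "-", which Tgr suspects means no header at all (many access-log formats print "-" for an empty field), and MaxSem recalls that API requests without a UA used to be blocked. A minimal Python sketch of such a filter that treats a missing header and a literal "-" alike; user_agent_missing is a hypothetical helper, and this is not the rule Wikimedia's Varnish or MediaWiki actually applies:

```python
from typing import Mapping


def user_agent_missing(headers: Mapping[str, str]) -> bool:
    """True if a request carries no usable User-Agent (hypothetical policy)."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    ua = lowered.get("user-agent", "").strip()
    # An absent header, a blank one, and a bare "-" all count as missing.
    return ua in ("", "-")


for headers in ({}, {"User-Agent": "-"}, {"user-agent": "ExampleBot/1.0"}):
    print(headers, "->", "reject" if user_agent_missing(headers) else "allow")
```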
[19:32:50] <icinga-wm>	 RECOVERY - puppet last run on wtp1009 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[19:33:52] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3154911 (Tgr) Did the IPs change periodically or did they actually use 50 boxes to query the API in parallel? The second case s...
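For scale, the ~2M pages/hour in the task title works out to roughly 556 pages per second. If that load really were spread across the 50 hosts Tgr asks about (his question, not a confirmed figure), each host would be sending about 11 per second:

```python
PAGES_PER_HOUR = 2_000_000  # "~2M pages/hour" from the T162129 title
HOSTS = 50                  # the figure in Tgr's question; not confirmed in the log

pages_per_second = PAGES_PER_HOUR / 3600
per_host = pages_per_second / HOSTS
print(f"{pages_per_second:.1f} pages/s total, {per_host:.1f} pages/s per host")
# 555.6 pages/s total, 11.1 pages/s per host
```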
[19:47:00] <icinga-wm>	 PROBLEM - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:48:20] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[19:49:29] <wikibugs>	 (CR) Thcipriani: [C: 1] RESTBase: Migrate to Scap3 deployment [puppet] - https://gerrit.wikimedia.org/r/346248 (https://phabricator.wikimedia.org/T116335) (owner: Mobrovac)
[19:53:20] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[19:54:05] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3154951 (Tgr) Seems to have restarted  (at least based on raw GET volume, haven't looked at what type it is). See P5199#27747 f...
[19:54:50] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[19:59:50] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[20:00:20] <paravoid>	 !log rolling out a border-in4 ACL update across core routers (T160055)
[20:00:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:28] <stashbot>	 T160055: Audit and cleanup border-in ACL on core routers - https://phabricator.wikimedia.org/T160055
[20:00:31] <paravoid>	 (the ulsfo alert was unrelated, not sure what's up with that)
[20:02:35] <wikibugs_>	 (PS1) Dzahn: renew labvirt-star.eqiad.wmnet cert [puppet] - https://gerrit.wikimedia.org/r/346356
[20:03:05] <wikibugs>	 (CR) Dzahn: [C: -2] renew labvirt-star.eqiad.wmnet cert [puppet] - https://gerrit.wikimedia.org/r/346356 (owner: Dzahn)
[20:04:15] <wikibugs_>	 (PS2) Dzahn: renew labvirt-star.eqiad.wmnet cert [puppet] - https://gerrit.wikimedia.org/r/346356 (https://phabricator.wikimedia.org/T162085)
[20:05:20] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[20:05:55] <wikibugs_>	 Operations, netops: Audit and cleanup border-in ACL on core routers - https://phabricator.wikimedia.org/T160055#3154987 (faidon) Open>Resolved a:faidon I just deployed a change which puts 224/4 back to special-ranges4 and nothing seems to be broken.
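"224/4" in faidon's note is shorthand for 224.0.0.0/4, the IPv4 multicast block (224.0.0.0 through 239.255.255.255). A quick Python check of what that prefix covers, using the standard ipaddress module. SPECIAL_RANGES4 here holds only that one prefix because the log does not show the rest of the routers' special-ranges4 list, and in_special_ranges4 is a hypothetical helper:

```python
import ipaddress

# Only the prefix named in the log; the real special-ranges4 list is not shown.
SPECIAL_RANGES4 = [ipaddress.ip_network("224.0.0.0/4")]


def in_special_ranges4(address: str) -> bool:
    """True if the IPv4 address falls inside any listed prefix."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in SPECIAL_RANGES4)


net = SPECIAL_RANGES4[0]
print(net.network_address, net.broadcast_address, net.num_addresses)
# 224.0.0.0 239.255.255.255 268435456
```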
[20:06:22] <wikibugs_>	 (CR) Dzahn: [C: -1] "don't merge yet" [puppet] - https://gerrit.wikimedia.org/r/346356 (https://phabricator.wikimedia.org/T162085) (owner: Dzahn)
[20:08:13] <wikibugs_>	 (CR) Hashar: [C: 1] "Zuul ssh to Gerrit using Paramiko 1.15.1.  I gave it a quick try from contint1001 by running /var/lib/zuul/gerrit-stream-events.py  None o" [puppet] - https://gerrit.wikimedia.org/r/346180 (owner: Paladox)
[20:10:20] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[20:12:20] <icinga-wm>	 PROBLEM - puppet last run on prometheus2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:15:00] <icinga-wm>	 RECOVERY - puppet last run on gerrit2001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[20:16:10] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (Anomie) >>! In T162129#3154715, @MaxSem wrote: > We used to block API requests that provided no UA - anybody remembers...
[20:17:48] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3155015 (jcrespo) He is back, and now trying to parse Special pages, too :-)  > Did the IPs change periodically or did they act...
[20:22:48] <wikibugs>	 (PS7) Thcipriani: Add 3d2png deploy repo to image scalers [puppet] - https://gerrit.wikimedia.org/r/345377 (https://phabricator.wikimedia.org/T160185) (owner: MarkTraceur)
[20:24:40] <icinga-wm>	 PROBLEM - puppet last run on maerlant is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:25:25] <wikibugs_>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3155030 (ops-monitoring-bot) Script wmf_auto_reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` ['analytics1056.eqiad.wmnet'] ``` The log can b...
[20:33:10] <wikibugs_>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3155081 (ops-monitoring-bot) Script wmf_auto_reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` ['analytics1055.eqiad.wmnet'] ``` The log can b...
[20:33:40] <icinga-wm>	 PROBLEM - puppet last run on db1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:38:19] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review: rack and cable frdb1002 - https://phabricator.wikimedia.org/T159886#3155103 (Jgreen) Open>Resolved a:Jgreen looks good, host is imaged and up!
[20:38:41] <wikibugs_>	 Operations, ops-eqiad, fundraising-tech-ops, Patch-For-Review: rack and cable frdev1001 - https://phabricator.wikimedia.org/T159887#3155107 (Jgreen) Open>Resolved looks good, host is imaged and up!
[20:39:42] <wikibugs>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3155110 (Anomie) The simple solution may be to just block the IPs in varnish or the like, perhaps delivering a message like "If...
[20:40:20] <icinga-wm>	 RECOVERY - puppet last run on prometheus2004 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[20:47:20] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[20:48:22] <logmsgbot>	 !log catrope@tin Synchronized php-1.29.0-wmf.19/extensions/Echo/: T162173 (duration: 00m 43s)
[20:48:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:28] <stashbot>	 T162173: Clicking on Notices/Alerts issues a banner over the other icon - https://phabricator.wikimedia.org/T162173
[20:50:09] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3155140 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1056.eqiad.wmnet'] ```  and were **ALL** successful.
[20:52:20] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[20:52:40] <icinga-wm>	 RECOVERY - puppet last run on maerlant is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[20:53:16] <wikibugs_>	 (PS8) Thcipriani: Add 3d2png deploy repo to image scalers [puppet] - https://gerrit.wikimedia.org/r/345377 (https://phabricator.wikimedia.org/T160185) (owner: MarkTraceur)
[20:56:43] <wikibugs_>	 Operations, DBA, MediaWiki-API, Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3155143 (Tgr) > I don't think it is malign, just parallelizing queries to load balancing source IPs (always the same ones).  Ye...
[20:58:47] <wikibugs_>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3155144 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1055.eqiad.wmnet'] ```  and were **ALL** successful.
[21:00:45] <icinga-wm>	 RECOVERY - puppet last run on db1054 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[21:00:48] <wikibugs_>	 (CR) Hashar: "And I have verified the CI instances that build packages are all clean.  Thank you!" [puppet] - https://gerrit.wikimedia.org/r/345836 (owner: Faidon Liambotis)
[21:00:55] <icinga-wm>	 PROBLEM - puppet last run on stat1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:08:34] <wikibugs_>	 (PS3) Dzahn: renew labvirt-star.eqiad.wmnet cert [puppet] - https://gerrit.wikimedia.org/r/346356 (https://phabricator.wikimedia.org/T162085)
[21:12:40] <mutante>	 !log revoked old labvirt-star.eqiad.wmnet cert - created new csr, signed it (CA: wmf_ca_2014_2017). deploying new labvirt-star.eqiad valid for 720 days (T162085)
[21:12:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:12:46] <stashbot>	 T162085: labvirt-star.eqiad.wmnet.crt expiring soon - https://phabricator.wikimedia.org/T162085
[21:13:08] <wikibugs>	 (03PS4) 10Dzahn: renew labvirt-star.eqiad.wmnet cert [puppet] - 10https://gerrit.wikimedia.org/r/346356 (https://phabricator.wikimedia.org/T162085)
[21:16:16] <wikibugs>	 (03CR) 10Dzahn: [C: 032] renew labvirt-star.eqiad.wmnet cert [puppet] - 10https://gerrit.wikimedia.org/r/346356 (https://phabricator.wikimedia.org/T162085) (owner: 10Dzahn)
[21:18:32] <mutante>	 !log running puppet across labvirt10* to replace cert
[21:18:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:19:58] <mutante>	 andrewbogott: done, icinga is all green again :)   https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=kvm
[21:20:08] <andrewbogott>	 cool
[21:20:38] <jynus>	 !log applying mariadb MDEV#7383 patch on db1034 T159319
[21:20:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:21:45] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[21:26:33] <logmsgbot>	 !log mobrovac@tin Started deploy [citoid/deploy@7dbbac8]: Bump service-runner to pick up new DNS caching
[21:26:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:26:46] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
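The paired ripe-atlas alerts above suggest the rule behind this check: 20 failed probes out of 279 with "alerts on 19" is CRITICAL, while 19 failed probes is OK. So the check appears to trip only when failures strictly exceed the threshold, which would also explain the repeated flapping while the count hovers at 19 to 20. A minimal Python sketch of that inferred comparison follows; the function name and message layout are assumptions modelled on the alert text, not the actual Icinga plugin.

```python
def ripe_atlas_status(failed: int, total: int, alert_on: int) -> str:
    """Hypothetical reconstruction of the IPv6 ping check's decision.

    Inferred from the log only: 'failed 20 probes of 279 (alerts on 19)'
    is CRITICAL and 'failed 19 probes of 279 (alerts on 19)' is OK, so the
    state flips when failures strictly exceed the alert threshold.
    """
    state = "CRITICAL" if failed > alert_on else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"


print(ripe_atlas_status(20, 279, 19))  # CRITICAL - failed 20 probes of 279 (alerts on 19)
print(ripe_atlas_status(19, 279, 19))  # OK - failed 19 probes of 279 (alerts on 19)
```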
[21:27:12] <wikibugs>	 06Operations, 06Labs, 10Labs-Infrastructure: labvirt-star.eqiad.wmnet.crt expiring soon - https://phabricator.wikimedia.org/T162085#3155258 (10Dzahn)
[21:27:23] <wikibugs>	 06Operations, 06Labs, 10Labs-Infrastructure: labvirt-star.eqiad.wmnet.crt expiring soon - https://phabricator.wikimedia.org/T162085#3152009 (10Dzahn) 05Open>03Resolved
[21:27:56] <icinga-wm>	 RECOVERY - puppet last run on stat1003 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[21:29:00] <wikibugs_>	 06Operations, 06Labs, 10Labs-Infrastructure: labvirt-star.eqiad.wmnet.crt expiring soon - https://phabricator.wikimedia.org/T162085#3152009 (10Dzahn) @labvirt1014:~# openssl x509 -in /etc/ssl/localcerts/labvirt-star.eqiad.wmnet.crt -text -noout | grep After             Not After : Mar 25 21:00:52 2019 GMT
[21:29:46] <logmsgbot>	 !log mobrovac@tin Finished deploy [citoid/deploy@7dbbac8]: Bump service-runner to pick up new DNS caching (duration: 03m 13s)
[21:29:48] <wikibugs>	 (03PS1) 10Andrew Bogott: Revert "Keystonehooks: Exclude 'novaobserver' user from posix user group." [puppet] - 10https://gerrit.wikimedia.org/r/346451
[21:29:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:56] <icinga-wm>	 PROBLEM - puppet last run on restbase-dev1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:31:24] <wikibugs>	 (03PS1) 10Jdlrobson: Prepare for related pages configuration change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346452 (https://phabricator.wikimedia.org/T160076)
[21:31:25] <wikibugs_>	 (03PS1) 10Jdlrobson: Remove use of blacklist for related pages feature [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346453 (https://phabricator.wikimedia.org/T160076)
[21:31:29] <wikibugs_>	 (03CR) 10Jdlrobson: [C: 04-1] "not yet" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346453 (https://phabricator.wikimedia.org/T160076) (owner: 10Jdlrobson)
[21:31:34] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] Revert "Keystonehooks: Exclude 'novaobserver' user from posix user group." [puppet] - 10https://gerrit.wikimedia.org/r/346451 (owner: 10Andrew Bogott)
[21:33:54] <wikibugs_>	 06Operations, 10Parsoid: Upload of Parsoid deb package 0.7.0 failed - https://phabricator.wikimedia.org/T162200#3155315 (10ssastry)
[21:33:59] <logmsgbot>	 !log mobrovac@tin Started deploy [eventstreams/deploy@cf892f4]: Bump service-runner to pick up new DNS caching
[21:34:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:36:05] <logmsgbot>	 !log mobrovac@tin Finished deploy [eventstreams/deploy@cf892f4]: Bump service-runner to pick up new DNS caching (duration: 02m 04s)
[21:36:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:38:03] <wikibugs>	 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3155345 (10Dzahn) a:05RobH>03Ayokura
[21:38:20] <wikibugs_>	 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3066883 (10Dzahn) a:05Ayokura>03ayounsi
[21:40:36] <logmsgbot>	 !log mobrovac@tin Started deploy [mathoid/deploy@4eb6d9d]: Bump service-runner to pick up new DNS caching
[21:40:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:43:07] <wikibugs>	 06Operations, 10DBA, 10Monitoring: tendril cert expiry alerts on dbmonitor hosts - https://phabricator.wikimedia.org/T162183#3155357 (10jcrespo)
[21:44:04] <logmsgbot>	 !log mobrovac@tin Finished deploy [mathoid/deploy@4eb6d9d]: Bump service-runner to pick up new DNS caching (duration: 03m 27s)
[21:44:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:45:03] <logmsgbot>	 !log mobrovac@tin Started deploy [cxserver/deploy@b4184d3]: Bump service-runner to pick up new DNS caching
[21:45:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:48:36] <icinga-wm>	 PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0]
[21:48:40] <logmsgbot>	 !log mobrovac@tin Finished deploy [cxserver/deploy@b4184d3]: Bump service-runner to pick up new DNS caching (duration: 03m 37s)
[21:48:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:49:27] <logmsgbot>	 !log mobrovac@tin Started deploy [mobileapps/deploy@b93488f]: Bump service-runner to pick up new DNS caching
[21:49:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:22] <wikibugs_>	 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3155373 (10faidon) a:05ayounsi>03RobH
[21:52:10] <logmsgbot>	 !log mobrovac@tin Finished deploy [mobileapps/deploy@b93488f]: Bump service-runner to pick up new DNS caching (duration: 02m 43s)
[21:52:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:52:43] <wikibugs_>	 (03PS1) 10Niharika29: Update $wgLoginNotifyAttemptsKnownIP in Labs to make testing easier [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346464 (https://phabricator.wikimedia.org/T160094)
[21:52:53] <wikibugs>	 06Operations, 10ops-codfw, 10hardware-requests, 10netops, 13Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3155377 (10Dzahn)
[21:53:19] <wikibugs>	 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3066883 (10Dzahn)
[21:53:27] <logmsgbot>	 !log mobrovac@tin Started deploy [graphoid/deploy@5fc26cb]: Bump service-runner to pick up new DNS caching
[21:53:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:06] <logmsgbot>	 !log mobrovac@tin Started deploy [trending-edits/deploy@5cc3969]: Bump service-runner to pick up new DNS caching
[21:54:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:35] <wikibugs>	 06Operations, 10Ops-Access-Requests, 10Traffic, 13Patch-For-Review: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3155380 (10BBlack)
[21:55:42] <logmsgbot>	 !log mobrovac@tin Finished deploy [graphoid/deploy@5fc26cb]: Bump service-runner to pick up new DNS caching (duration: 02m 15s)
[21:55:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:58:56] <icinga-wm>	 RECOVERY - puppet last run on restbase-dev1003 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[22:00:46] <logmsgbot>	 !log mobrovac@tin Finished deploy [trending-edits/deploy@5cc3969]: Bump service-runner to pick up new DNS caching (duration: 06m 40s)
[22:00:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:01:38] <mobrovac>	 !log SCB all services updated to use the new service-runner DNS caching
[22:01:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:15] <wikibugs_>	 (03CR) 10Mobrovac: "PCC looking good - https://puppet-compiler.wmflabs.org/6023/" [puppet] - 10https://gerrit.wikimedia.org/r/346248 (https://phabricator.wikimedia.org/T116335) (owner: 10Mobrovac)
[22:09:55] <RainbowSprinkles>	 jouncebot: next
[22:09:55] <jouncebot>	 In 0 hour(s) and 50 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170404T2300)
[22:09:59] <RainbowSprinkles>	 jouncebot: now
[22:10:00] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 50 minute(s)
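jouncebot's countdowns are consistent with truncating to whole minutes: 22:09:55 to 23:00:00 is 50m05s ("50 minute(s)"), and later in this log 23:02:15 to the end of the SWAT window is reported as 57 minutes, which fits a window closing at 00:00. The Python sketch below reproduces those replies under exactly those assumptions (truncation, a 23:00 to 00:00 window); the function names are hypothetical and this is not jouncebot's source.

```python
from datetime import datetime, timedelta


def hours_minutes(delta: timedelta) -> str:
    """Format a duration like jouncebot's replies, dropping leftover seconds."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hour(s) and {minutes} minute(s)"


def window_reply(now: datetime, start: datetime, end: datetime, name: str) -> str:
    """Countdown to a deployment window, or time left if it is already open."""
    if now >= end:
        raise ValueError("window has already closed")
    if now >= start:
        return f"For the next {hours_minutes(end - now)}: {name}"
    return f"In {hours_minutes(start - now)}: {name}"


day = datetime(2017, 4, 4)
start, end = day.replace(hour=23), day + timedelta(days=1)  # assumed 23:00-00:00
print(window_reply(day.replace(hour=22, minute=9, second=55), start, end, "Evening SWAT"))
# In 0 hour(s) and 50 minute(s): Evening SWAT
```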
[22:10:06] <RainbowSprinkles>	 Ok I'm stealing a slot for scap
[22:19:55] <wikibugs>	 (03PS1) 10Catrope: Set $wgOresThresholds now that it exists in wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346470
[22:20:14] <wikibugs_>	 (03PS2) 10Jdlrobson: Remove use of blacklist for related pages feature [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346453 (https://phabricator.wikimedia.org/T162201)
[22:21:50] <wikibugs_>	 (03PS2) 10Catrope: Set $wgOresThresholds on wikis where both ORES and rcfilters are enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346470
[22:22:34] <wikibugs_>	 (03PS3) 10Catrope: Set $wgOresThresholds on wikis where both ORES and rcfilters are enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346470
[22:22:41] * p858snake watches RainbowSprinkles be put in cuffs by the DeploymentPolice™ for stealing
[22:23:10] <paladox>	 lol
[22:24:51] <Zppix>	 before i get yelled at i need to make this ONE joke then im done... RainbowSprinkles  do you want  some scap with a side of sprinkles?
[22:25:11] <RainbowSprinkles>	 I don't get the joke
[22:25:24] <paladox>	 lol
[22:25:27] <RainbowSprinkles>	 p858snake: I am the deployment police.
[22:25:40] <DeploymentPolice>	 We saw it.
[22:25:46] <icinga-wm>	 PROBLEM - Hadoop DataNode on analytics1054 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[22:26:06] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: x1 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:26:06] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s1 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:26:37] <icinga-wm>	 PROBLEM - puppet last run on mc1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:27:06] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s1 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:27:36] <jynus>	 dbstore1002 is probably just temporary extra load
[22:27:46] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:27:46] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m3 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:27:46] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s7 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:27:46] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s3 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:27:46] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m2 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:27:47] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s4 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:27:47] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s2 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:27:48] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s7 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:27:48] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s1 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:27:56] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s4 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:27:56] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: x1 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:28:46] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[22:29:26] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[22:30:36] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s4 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:30:36] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s7 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:30:36] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:30:36] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s3 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:30:36] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:30:37] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m2 on dbstore1002 is OK: OK slave_sql_state not a slave
[22:30:37] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s7 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:30:38] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s2 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:30:38] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s1 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:30:46] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s4 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:32:14] <wikibugs_>	 06Operations, 10Librarization, 10MediaWiki-extensions-CentralNotice, 10Traffic, 07Privacy: Split GeoIP into a new component - https://phabricator.wikimedia.org/T102848#3155492 (10Reedy)
[22:33:46] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[22:34:26] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[22:34:28] <logmsgbot>	 !log demon@tin Started scap: re-syncing old wmf.14-16 branches...cleaned up a little too much
[22:34:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:36:13] <wikibugs_>	 06Operations, 10ops-eqiad, 10netops: Faulty optics on asw-b-eqiad:xe-1/1/2 - https://phabricator.wikimedia.org/T162199#3155501 (10Reedy)
[22:38:36] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s2 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:36] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s5 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:36] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s6 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:47] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s7 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:47] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s2 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:47] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m3 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:47] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s7 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:47] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s3 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:47] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s4 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:47] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:48] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m2 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:48] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s1 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:56] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s4 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:56] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m3 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:56] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s6 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:57] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m2 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:57] <icinga-wm>	 PROBLEM - MariaDB Slave IO: x1 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:38:57] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s5 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:39:06] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: x1 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:39:16] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s1 on dbstore1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:42:36] <icinga-wm>	 RECOVERY - carbon-cache too many creates on graphite1001 is OK: OK: Less than 1.00% above the threshold [500.0]
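The carbon-cache alert pair reads like a percentage-over-threshold check: CRITICAL at "1.67% of data above the critical threshold [1000.0]", and OK once "Less than 1.00%" of the data is above the warning threshold of 500.0. Notably, 1.67% is exactly one datapoint out of 60. The Python sketch below implements that reading; the 1% cutoff is inferred from the recovery text, while the comparison operators, the function name and the UNKNOWN message are assumptions, not the actual Icinga/Graphite plugin.

```python
from typing import Optional, Sequence


def graphite_status(datapoints: Sequence[Optional[float]], warning: float,
                    critical: float, percentage: float = 1.0) -> str:
    """Alert when at least `percentage`% of datapoints exceed a threshold."""
    values = [v for v in datapoints if v is not None]  # Graphite may return nulls
    if not values:
        return "UNKNOWN: no datapoints"  # hypothetical message

    def pct_over(threshold: float) -> float:
        return 100.0 * sum(v > threshold for v in values) / len(values)

    crit = pct_over(critical)
    if crit >= percentage:
        return f"CRITICAL: {crit:.2f}% of data above the critical threshold [{critical}]"
    warn = pct_over(warning)
    if warn >= percentage:
        return f"WARNING: {warn:.2f}% of data above the warning threshold [{warning}]"
    return f"OK: Less than {percentage:.2f}% above the threshold [{warning}]"


# One spike in 60 samples reproduces the 1.67% seen in the alert.
print(graphite_status([0.0] * 59 + [1500.0], warning=500.0, critical=1000.0))
# CRITICAL: 1.67% of data above the critical threshold [1000.0]
```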
[22:44:28] <jynus>	 I am going to ack all of dbstore1002 so it doesn't keep spamming
[22:44:37] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s2 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:44:37] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m2 on dbstore1002 is OK: OK slave_sql_state not a slave
[22:44:37] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s7 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:44:37] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s4 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:44:37] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s1 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:44:37] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s7 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:44:37] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:44:38] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s3 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:44:38] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:44:46] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s4 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:44:46] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m3 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:44:46] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s6 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:44:46] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m2 on dbstore1002 is OK: OK slave_io_state not a slave
[22:44:46] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s5 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:44:47] <icinga-wm>	 RECOVERY - MariaDB Slave IO: x1 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[22:45:46] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[22:46:16] <icinga-wm>	 PROBLEM - Keystone admin and observer projects exist on labtestnet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:47:56] <icinga-wm>	 PROBLEM - puppet last run on mendelevium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:48:06] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: x1 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:48:06] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s1 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:48:27] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s5 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:48:27] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s2 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:48:27] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s6 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[22:48:59] <wikibugs_>	 06Operations, 10Ops-Access-Requests, 10Traffic, 13Patch-For-Review: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3155545 (10Dzahn) I signed Arzhel's GPG key after he read the fingerprint to me over Hangout.   gpg --fingerprint 58E24182       Key fingerprint = 8F89 0CBB E7BE...
[22:50:46] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[22:54:36] <icinga-wm>	 RECOVERY - puppet last run on mc1002 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[22:56:26] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[23:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170404T2300). Please do the needful.
[23:00:04] <jouncebot>	 Niharika, Jdlrobson, and RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:16] <Niharika>	 o/
[23:00:21] <RoanKattouw>	 I'll do the SWAT today
[23:00:40] <RoanKattouw>	 Niharika's is labs only so that one can go first
[23:00:44] <wikibugs_>	 (03CR) 10Catrope: [C: 032] Update $wgLoginNotifyAttemptsKnownIP in Labs to make testing easier [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346464 (https://phabricator.wikimedia.org/T160094) (owner: 10Niharika29)
[23:01:26] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 19 probes of 273 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[23:01:42] <RoanKattouw>	 jdlrobson: You around for your SWAT?
[23:01:50] <jdlrobson>	 yup
[23:01:53] <wikibugs_>	 (03CR) 10Catrope: [C: 032] Set $wgOresThresholds on wikis where both ORES and rcfilters are enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346470 (owner: 10Catrope)
[23:01:59] <RoanKattouw>	 Cool
[23:02:05] <wikibugs>	 (03CR) 10Catrope: [C: 032] Prepare for related pages configuration change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346452 (https://phabricator.wikimedia.org/T160076) (owner: 10Jdlrobson)
[23:02:15] <Reedy>	 jouncebot: now
[23:02:15] <jouncebot>	 For the next 0 hour(s) and 57 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170404T2300)
[23:02:46] * RoanKattouw waits for Jenkins
[23:03:04] <Reedy>	 Wonder if https://gerrit.wikimedia.org/r/#/c/346274/ should just go out...
[23:03:10] <wikibugs>	 (03Merged) 10jenkins-bot: Update $wgLoginNotifyAttemptsKnownIP in Labs to make testing easier [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346464 (https://phabricator.wikimedia.org/T160094) (owner: 10Niharika29)
[23:03:18] <Reedy>	 Reverted in master, presumably in time for .19... But is broken in .18
[23:03:24] <wikibugs>	 (03CR) 10jenkins-bot: Update $wgLoginNotifyAttemptsKnownIP in Labs to make testing easier [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346464 (https://phabricator.wikimedia.org/T160094) (owner: 10Niharika29)
[23:03:42] <wikibugs_>	 (03Merged) 10jenkins-bot: Set $wgOresThresholds on wikis where both ORES and rcfilters are enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346470 (owner: 10Catrope)
[23:03:52] <wikibugs>	 (03CR) 10jenkins-bot: Set $wgOresThresholds on wikis where both ORES and rcfilters are enabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346470 (owner: 10Catrope)
[23:04:14] <RoanKattouw>	 Reedy: Ha, nice one. I'll pick that one up, could you add it to the wiki page for the record?
[23:04:19] <wikibugs_>	 06Operations, 10Parsoid: Upload of Parsoid deb package 0.7.0 failed - https://phabricator.wikimedia.org/T162200#3155592 (10Dzahn) I'm not sure what happened here, but yes, 0.7.0 has been uploaded and also reprepro itself thinks so:   ``` [bromine:/srv/org] $ sudo -E reprepro ls parsoid parsoid | 0.7.0all | jes...
[23:04:26] <Niharika>	 Thanks RoanKattouw. 
[23:04:52] <RainbowSprinkles>	 I'm still scapping
[23:05:19] <RainbowSprinkles>	 Nearly done tho
[23:05:28] <wikibugs_>	 06Operations, 10Parsoid: Upload of Parsoid deb package 0.7.0 failed - https://phabricator.wikimedia.org/T162200#3155594 (10Dzahn) /srv/org/wikimedia/reprepro/incoming/  has:   ```  16M -rw-r--r-- 1 reprepro reprepro  16M Nov 14 18:09 parsoid_0.6.0all_all.deb 4.0K -rw-r--r-- 1 reprepro reprepro 1.9K Nov 14 18:0...
[23:05:46] <RoanKattouw>	 RainbowSprinkles: OK, will wait
[23:05:57] <RoanKattouw>	 Niharika: You can follow the automated scap to beta labs in real time here: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/149440/console
[23:06:06] <Reedy>	 Done
[23:06:13] <RainbowSprinkles>	 RoanKattouw: Feel free to start doing gerrit merges, staging stuff on tin
[23:06:19] <RainbowSprinkles>	 It's just the final apache pull I'm in now
[23:06:20] <Niharika>	 Cool. 
[23:06:42] <RoanKattouw>	 And it's done, your patch should be in labs now
[23:06:54] <Niharika>	 \m/
[23:07:11] <wikibugs>	 06Operations, 10Parsoid: Upload of Parsoid deb package 0.7.0 failed - https://phabricator.wikimedia.org/T162200#3155611 (10ssastry) >>! In T162200#3155592, @Dzahn wrote: > I'm not sure what happened here, but yes, 0.7.0 has been uploaded and also reprepro itself thinks so: >  >  > ``` > [bromine:/srv/org] $ su...
[23:08:58] <RoanKattouw>	 RainbowSprinkles: Yeah doing that already, pulling to mwdebug1002 too but that hung for some reason
[23:09:55] <RoanKattouw>	 Ugh, it's doing cdb-rebuild --no-progress which is taking forever
[23:09:59] <RoanKattouw>	 And it's also not telling me that that's what it's doing
[23:10:08] * RoanKattouw files task
[23:10:46] <icinga-wm>	 PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:12:59] <RainbowSprinkles>	 I probably already have a lock on it?
[23:13:08] <RainbowSprinkles>	 We're doing cdb rebuilds shortly as part of my scap
[23:14:02] <RoanKattouw>	 On mwdebug1002?
[23:14:15] <RoanKattouw>	 No it was just a 3-minute rebuild with no reporting whatsoever
[23:14:26] <RoanKattouw>	 It was using 90% CPU, didn't look like a lock
[23:14:33] <RoanKattouw>	 Filing a task about that
[23:14:43] <RoanKattouw>	 jdlrobson: Your change is live on mwdebug1002 now,  please test. Sorry for the delay
[23:15:01] <RainbowSprinkles>	 Doing full pulls on mwdebug is always kind of funny when we end up only doing a sync-file or sync-dir afterwards for full deployment ;-)
[23:15:07] * RainbowSprinkles chuckles about mwdebug in swat
[23:15:45] <RainbowSprinkles>	 We could abstract that into a param for scap so you don't have to ssh to that host and do funny pulls
[23:15:48] <jdlrobson>	 RoanKattouw: testing
[23:16:07] <RoanKattouw>	 Yeah, I mean it's usually fast enough
[23:16:26] <RoanKattouw>	 And I don't even terribly mind it taking 3 minutes, as long as there's some kind of progress indication
[23:16:34] <RoanKattouw>	 (filed T162207 )
[23:16:34] <stashbot>	 T162207: When "scap pull" does a (slow) CDB rebuild, it should tell me that that's what it's doing - https://phabricator.wikimedia.org/T162207
[23:16:38] <RainbowSprinkles>	 Actually.
[23:16:41] <jdlrobson>	 looks good RoanKattouw 
[23:16:52] <RainbowSprinkles>	 RoanKattouw: tbf, it does use --no-progress ;-)
[23:16:56] <icinga-wm>	 RECOVERY - puppet last run on mendelevium is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[23:18:46] <icinga-wm>	 PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:19:01] <logmsgbot>	 !log demon@tin Finished scap: re-syncing old wmf.14-16 branches...cleaned up a little too much (duration: 44m 32s)
[23:19:06] <RoanKattouw>	 Sure, and I understand that it would want to suppress progress reporting from rebuild-cdb itself
[23:19:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:34] <RoanKattouw>	 But it should at least tell me something like 23:07:12 started cdb rebuild    23:10:45 finished cdb rebuild
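RoanKattouw's request (filed as T162207) amounts to wrapping the silent rebuild step with start and finish timestamps while still passing `--no-progress` to the tool itself. A minimal Python sketch of that idea follows; `announce` and its `log`/`clock` parameters are hypothetical, this is not scap's actual code, and the injectable clock exists only to make the example testable.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def announce(step: str, log: Callable[[str], None] = print,
             clock: Callable[[], float] = time.time) -> Iterator[None]:
    """Log 'HH:MM:SS started <step>' before a block and 'finished' (or 'failed') after."""
    def stamp() -> str:
        return time.strftime("%H:%M:%S", time.gmtime(clock()))

    log(f"{stamp()} started {step}")
    try:
        yield
    except BaseException:
        log(f"{stamp()} failed {step}")
        raise
    log(f"{stamp()} finished {step}")


with announce("cdb rebuild"):
    pass  # stand-in for the slow, silent rebuild
```

Because the wrapper only brackets the step, the underlying tool can stay quiet while the operator still sees when it began and ended.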
[23:21:16] <RoanKattouw>	 jdlrobson: Oops I didn't actually sync your patch :(
[23:21:18] <RoanKattouw>	 Trying again
[23:21:26] <RoanKattouw>	 I was wondering why mine wasn't working..
[23:21:57] <RoanKattouw>	 jdlrobson: OK now it's there for reals
[23:22:56] <wikibugs>	 (03PS2) 10Catrope: Prepare for related pages configuration change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346452 (https://phabricator.wikimedia.org/T160076) (owner: 10Jdlrobson)
[23:23:03] <wikibugs_>	 (03CR) 10Catrope: [C: 032] Prepare for related pages configuration change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346452 (https://phabricator.wikimedia.org/T160076) (owner: 10Jdlrobson)
[23:23:07] <jdlrobson>	 woop
[23:23:15] <RoanKattouw>	 jdlrobson: Urgh, yours didn't even merge, so it's doubly not there
[23:23:25] <RoanKattouw>	 Sorry about that, I was juggling three patches and lost track
[23:24:14] <wikibugs>	 (03Merged) 10jenkins-bot: Prepare for related pages configuration change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346452 (https://phabricator.wikimedia.org/T160076) (owner: 10Jdlrobson)
[23:24:27] <wikibugs>	 (03CR) 10jenkins-bot: Prepare for related pages configuration change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346452 (https://phabricator.wikimedia.org/T160076) (owner: 10Jdlrobson)
[23:27:41] <RoanKattouw>	 Alright, my patch works
[23:27:44] <jdlrobson>	 RoanKattouw: done for reals now?
[23:27:51] <RoanKattouw>	 jdlrobson: Your patch is now on mwdebug1002 for real for real
[23:27:58] <wikibugs_>	 06Operations, 10DBA: dbstore1002 in bad shape - https://phabricator.wikimedia.org/T162212#3155755 (10jcrespo)
[23:28:16] <RoanKattouw>	 I guess it might be intended to be a no-op anyway?
[23:28:59] <jdlrobson>	 RoanKattouw: yup
[23:29:03] <RainbowSprinkles>	 RoanKattouw: Oh, that rogue wmf.8 is gone now btw, and shouldn't happen again
[23:29:09] <RainbowSprinkles>	 Weird growing pains with `scap clean`
[23:29:13] <jynus>	 !log unscheduled restart of dbstore1002 T162212
[23:29:17] <RoanKattouw>	 Thanks
[23:29:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:29:21] <stashbot>	 T162212: dbstore1002 in bad shape - https://phabricator.wikimedia.org/T162212
[23:29:41] <RoanKattouw>	 jdlrobson: OK lemme know when you're done checking and I'll deploy both of our patches
[23:29:45] <jdlrobson>	 yup checked again
[23:29:50] <RoanKattouw>	 Sweet, going live
[23:29:50] <jdlrobson>	 looks good
[23:31:08] <logmsgbot>	 !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Prepare for related pages config change (T160076) and set $wgOresFiltersThresholds on plwiki and ptwiki (duration: 00m 41s)
[23:31:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:31:14] <stashbot>	 T160076: Disable related pages on desktop beta mode - https://phabricator.wikimedia.org/T160076
[23:34:36] <icinga-wm>	 PROBLEM - puppet last run on labtestservices2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/usr/local/bin/labs-ip-alias-dump.py]
[23:35:16] <jdlrobson>	 thanks RoanKattouw 
[23:35:36] <wikibugs_>	 06Operations, 15User-Elukey, 07Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#3155792 (10aaron) The timeout could be conditioned on  ``` defined( 'MEDIAWIKI_JOB_RUNNER' ) ``` ...via a te...
[23:38:46] <icinga-wm>	 RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[23:45:25] <wikibugs>	 06Operations, 10DBA: dbstore1002 in bad shape - https://phabricator.wikimedia.org/T162212#3155813 (10jcrespo) Probably excessive memory pressure due to heavy mysql usage... blah blah blah... restarted cleanly ... updated kernel... check new import script... check long running queries,... mysql error log is cle...
[23:46:46] <icinga-wm>	 RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[23:50:22] <wikibugs_>	 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3155832 (10jcrespo) > @leila, we can dump and copy to analytics-store, as long as there aren't any database.table name collisions.  I hope you are aware that if for any reason...
[23:50:55] <logmsgbot>	 !log reedy@tin Synchronized php-1.29.0-wmf.18/extensions/Quiz: (no justification provided) (duration: 00m 42s)
[23:51:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:51:03] <Reedy>	 !log that was Revert "Start implementing Quiz generation using TemplateParser"
[23:51:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:55:42] <Zppix>	 Reedy: I'll edit the SAL and consolidate the two log messages if you want
[23:55:54] <logmsgbot>	 !log tstarling@tin Synchronized php-1.29.0-wmf.18/extensions/ParserMigration: (no justification provided) (duration: 00m 39s)
[23:56:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:56:20] <wikibugs>	 06Operations, 10DBA: dbstore1002 in bad shape - https://phabricator.wikimedia.org/T162212#3155839 (10jcrespo) There is also more load than usual since the 29, that could have contributed to it: https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=dbstore1002&from=...
[23:58:12] <Zppix>	 Reedy: fixed your log mistake, you're welcome