[00:45:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[00:45:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[00:45:55] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[00:45:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[00:45:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[00:48:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[00:48:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[00:48:45] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[00:48:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[00:48:46] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[01:08:05] <wikibugs>	 (PS1) Nemo bis: Enable ValidationStatistics log for FlaggedRevs [mediawiki-config] - https://gerrit.wikimedia.org/r/354615 (https://phabricator.wikimedia.org/T163107)
[01:26:45] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:45] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:55] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:28:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[01:28:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[01:29:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy
[01:29:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy
[01:29:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy
[01:29:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[01:29:45] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[02:14:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[02:14:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[02:14:55] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[02:14:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[02:14:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[02:17:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[02:17:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[02:17:45] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[02:17:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[02:17:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[02:23:01] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 17s)
[02:23:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:29:14] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Sat May 20 02:29:14 UTC 2017 (duration 6m 13s)
[02:29:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:19:53] <wikibugs>	 Operations, Traffic, fundraising-tech-ops: Fix nits in HTTPS/HSTS configs in externally-hosted fundraising domains - https://phabricator.wikimedia.org/T137161#3277960 (BBlack) @Jgreen - re: civicrm, it needs to emit the HSTS header on **all** HTTPS responses.  ``` $ curl -v https://civicrm.wikimedia....
[04:10:16] <icinga-wm>	 PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=6404.00 Read Requests/Sec=3357.70 Write Requests/Sec=19.80 KBytes Read/Sec=36646.00 KBytes_Written/Sec=2451.20
[04:21:15] <icinga-wm>	 RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=6.50 Read Requests/Sec=2.30 Write Requests/Sec=0.60 KBytes Read/Sec=14.40 KBytes_Written/Sec=16.80
[04:49:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[04:49:55] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[04:49:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[04:49:56] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[04:49:56] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[04:51:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[04:51:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[04:52:45] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[04:52:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[04:52:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[05:21:56] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[05:21:56] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[05:22:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[05:22:55] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[05:23:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[05:23:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[05:24:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[05:24:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[05:25:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[05:25:45] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[06:00:55] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[06:01:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[06:01:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[06:01:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[06:01:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[06:03:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[06:03:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[06:03:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[06:04:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[06:04:46] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[06:05:37] <_joe_>	 wow
[06:06:39] <_joe_>	 so it seems we have some issues on a provider of citoid references
[06:08:11] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] Set empty PYTHONPATH in tox.ini [debs/pybal] - https://gerrit.wikimedia.org/r/354547 (owner: Ema)
[06:08:27] <wikibugs>	 (PS1) Giuseppe Lavagetto: Set empty PYTHONPATH in tox.ini [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354617
[06:08:54] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: 2 C: 2] Set empty PYTHONPATH in tox.ini [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354617 (owner: Giuseppe Lavagetto)
[06:10:33] <wikibugs>	 (Merged) jenkins-bot: Set empty PYTHONPATH in tox.ini [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354617 (owner: Giuseppe Lavagetto)
[06:13:02] <wikibugs>	 (PS3) Giuseppe Lavagetto: Split IPVS Manager into the interface and manager implementation [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354506
[06:16:53] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] Split IPVS Manager into the interface and manager implementation [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354506 (owner: Giuseppe Lavagetto)
[06:17:18] <wikibugs>	 (CR) Giuseppe Lavagetto: "recheck" [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354507 (owner: Giuseppe Lavagetto)
[06:17:29] <wikibugs>	 (Merged) jenkins-bot: Split IPVS Manager into the interface and manager implementation [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354506 (owner: Giuseppe Lavagetto)
[06:19:49] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] Add IPVSError as a generic IPVS-related exception [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354507 (owner: Giuseppe Lavagetto)
[06:19:54] <wikibugs>	 (CR) Giuseppe Lavagetto: "recheck" [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[06:21:01] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[06:58:37] <wikibugs>	 (CR) DCausse: [C: 1] logstash - apifeature indices need to be cleaned up [puppet] - https://gerrit.wikimedia.org/r/353560 (owner: Gehel)
[07:52:00] <gehel>	 !log restart wdqs-updater on all wdqs clusters (stuck on too large update)
[07:52:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:05] <icinga-wm>	 PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1800.0]
[07:56:05] <icinga-wm>	 PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1800.0]
[07:56:06] <icinga-wm>	 PROBLEM - High lag on wdqs2002 is CRITICAL: CRITICAL: 31.03% of data above the critical threshold [1800.0]
[08:00:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:56] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:02:45] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[08:02:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy
[08:02:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[08:03:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy
[08:03:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[08:03:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[08:03:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy
[08:16:05] <icinga-wm>	 RECOVERY - High lag on wdqs1002 is OK: OK: Less than 30.00% above the threshold [600.0]
[08:17:05] <icinga-wm>	 RECOVERY - High lag on wdqs2002 is OK: OK: Less than 30.00% above the threshold [600.0]
[08:22:24] <logmsgbot>	 !log smalyshev@tin Started deploy [wdqs/wdqs@227ab25]: Whitelist update
[08:22:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:57] <logmsgbot>	 !log smalyshev@tin Finished deploy [wdqs/wdqs@227ab25]: Whitelist update (duration: 02m 32s)
[08:25:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:26] <wikibugs>	 (CR) BryanDavis: "Applying this hook along with the WikimediaMessages patch in my mw-vagrant dev server looks like this: https://phabricator.wikimedia.org/F" [mediawiki-config] - https://gerrit.wikimedia.org/r/354612 (owner: BryanDavis)
[08:45:27] <wikibugs>	 (PS3) Ema: Add IPVSError as a generic IPVS-related exception [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354507 (owner: Giuseppe Lavagetto)
[09:03:12] <wikibugs>	 (CR) Ema: [C: 1] "Zuul is backlogged but all tests are green on my laptop. https://blog.codinghorror.com/content/images/uploads/2007/03/6a0120a85dcdae970b01" [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354507 (owner: Giuseppe Lavagetto)
[09:04:11] <wikibugs>	 (CR) Ema: [V: 2 C: 1] Add IPVSError as a generic IPVS-related exception [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354507 (owner: Giuseppe Lavagetto)
[09:05:05] <icinga-wm>	 PROBLEM - High lag on wdqs2003 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1800.0]
[09:05:06] <icinga-wm>	 PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1800.0]
[09:05:06] <icinga-wm>	 PROBLEM - High lag on wdqs2002 is CRITICAL: CRITICAL: 31.03% of data above the critical threshold [1800.0]
[09:05:35] <icinga-wm>	 PROBLEM - puppet last run on labvirt1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:07:05] <icinga-wm>	 PROBLEM - High lag on wdqs1003 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1800.0]
[09:08:42] <thcipriani>	 !log restarting jenkins on contint1001
[09:08:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[09:17:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[09:17:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[09:18:05] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[09:18:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[09:18:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[09:18:55] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[09:18:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[09:18:56] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[09:18:56] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[09:23:26] <wikibugs>	 (PS2) Volans: CLI: add -i/--interactive option [software/cumin] - https://gerrit.wikimedia.org/r/354442 (https://phabricator.wikimedia.org/T165838)
[09:23:28] <wikibugs>	 (PS1) Volans: CLI: add -o/--output to get the output in different formats [software/cumin] - https://gerrit.wikimedia.org/r/354637 (https://phabricator.wikimedia.org/T165842)
[09:27:31] <wikibugs>	 (PS1) Gehel: analytics - add shiny-server to reprepro [puppet] - https://gerrit.wikimedia.org/r/354639 (https://phabricator.wikimedia.org/T164603)
[09:31:15] <wikibugs>	 (PS3) Giuseppe Lavagetto: Add generic Finite States Machine [debs/pybal] - https://gerrit.wikimedia.org/r/302435
[09:31:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add generic Finite States Machine [debs/pybal] - https://gerrit.wikimedia.org/r/302435 (owner: Giuseppe Lavagetto)
[09:33:35] <icinga-wm>	 RECOVERY - puppet last run on labvirt1002 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[09:37:12] <wikibugs>	 (PS3) Giuseppe Lavagetto: Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508
[09:41:42] <wikibugs>	 (CR) Jforrester: [C: 1] "Good to go post-train next week." [mediawiki-config] - https://gerrit.wikimedia.org/r/354612 (owner: BryanDavis)
[09:49:13] <wikibugs>	 (CR) Bearloga: [C: -1] "as far as i know shiny-server deb package is only available from RStudio; CRAN repo (comprehensive r archive network) is a totally separat" [puppet] - https://gerrit.wikimedia.org/r/354639 (https://phabricator.wikimedia.org/T164603) (owner: Gehel)
[09:49:19] <wikibugs>	 (Abandoned) Gehel: analytics - add shiny-server to reprepro [puppet] - https://gerrit.wikimedia.org/r/354639 (https://phabricator.wikimedia.org/T164603) (owner: Gehel)
[09:53:05] <icinga-wm>	 RECOVERY - High lag on wdqs1003 is OK: OK: Less than 30.00% above the threshold [600.0]
[09:53:06] <icinga-wm>	 RECOVERY - High lag on wdqs1002 is OK: OK: Less than 30.00% above the threshold [600.0]
[09:54:05] <icinga-wm>	 RECOVERY - High lag on wdqs2003 is OK: OK: Less than 30.00% above the threshold [600.0]
[09:54:06] <icinga-wm>	 RECOVERY - High lag on wdqs2002 is OK: OK: Less than 30.00% above the threshold [600.0]
[09:55:27] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[10:06:41] <wikibugs>	 (PS4) Giuseppe Lavagetto: Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508
[10:09:36] <wikibugs>	 (CR) Mattflaschen: [C: 1] Add Code of Conduct footer links to wikitech and mw.o [mediawiki-config] - https://gerrit.wikimedia.org/r/354612 (owner: BryanDavis)
[10:21:24] <wikibugs>	 (CR) Dereckson: Add Code of Conduct footer links to wikitech and mw.o (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/354612 (owner: BryanDavis)
[10:43:31] <wikibugs>	 (PS1) Ema: Bump version number in setup.py [debs/pybal] (1.13) - https://gerrit.wikimedia.org/r/354660
[11:05:25] <icinga-wm>	 PROBLEM - Check systemd state on restbase-dev1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:05:35] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.64.32.160:9042 on restbase-dev1002 is CRITICAL: connect to address 10.64.32.160 and port 9042: Connection refused
[11:05:45] <icinga-wm>	 PROBLEM - cassandra-b SSL 10.64.32.160:7001 on restbase-dev1002 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:05:45] <icinga-wm>	 PROBLEM - cassandra-b service on restbase-dev1002 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed
[11:12:25] <icinga-wm>	 RECOVERY - Check systemd state on restbase-dev1002 is OK: OK - running: The system is fully operational
[11:12:45] <icinga-wm>	 RECOVERY - cassandra-b service on restbase-dev1002 is OK: OK - cassandra-b is active
[11:14:45] <icinga-wm>	 RECOVERY - cassandra-b SSL 10.64.32.160:7001 on restbase-dev1002 is OK: SSL OK - Certificate restbase-dev1002-b valid until 2018-01-05 22:53:07 +0000 (expires in 230 days)
[11:15:35] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.64.32.160:9042 on restbase-dev1002 is OK: TCP OK - 0.036 second response time on 10.64.32.160 port 9042
[11:23:09] <wikibugs>	 (PS1) Andrew Bogott: Tidy up tools node motd [puppet] - https://gerrit.wikimedia.org/r/354668
[11:27:37] <wikibugs>	 Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#3278799 (Perhelion) Anyway the disfunction is still present if not CSS but on SVG attribute font-family.{F8131638}
[11:28:27] <wikibugs>	 Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#3278801 (Perhelion) Resolved>Open
[11:58:30] <wikibugs>	 Operations, Gerrit, LDAP, Release-Engineering-Team (Backlog): Remove user gerrit2 from ldap - https://phabricator.wikimedia.org/T160122#3278842 (greg)
[11:59:51] <wikibugs>	 Operations, Goal, Patch-For-Review, Release-Engineering-Team (Watching / External), and 3 others: Prepare and maintain base container images - https://phabricator.wikimedia.org/T162042#3278860 (greg)
[12:01:11] <wikibugs>	 Operations, Phabricator, Release-Engineering-Team (Backlog): Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3278881 (greg)
[12:06:13] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, DBA, Patch-For-Review, Release-Engineering-Team (Backlog): Better mysql command prompt info for Beta - https://phabricator.wikimedia.org/T157714#3278891 (greg)
[12:06:55] <wikibugs>	 Operations, Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#3278895 (greg)
[12:08:26] <wikibugs>	 Operations, Wikimedia-Logstash, Release-Engineering-Team (Watching / External), Services (watching): Kibana functionality missing after upgrade: histograms - https://phabricator.wikimedia.org/T152782#3278917 (greg)
[12:12:31] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team (Backlog): setup a DB backed parser cache - https://phabricator.wikimedia.org/T55457#3278934 (greg)
[12:13:11] <wikibugs>	 Operations, Parsoid, Release-Engineering-Team (Watching / External): Provide a /parsoid directory on releases.wikimedia.org - https://phabricator.wikimedia.org/T150672#3278939 (greg)
[12:13:30] <wikibugs>	 Operations, Deployment-Systems, Release-Engineering-Team (Backlog): Trebuchet targets for test/testrepo are out of date - https://phabricator.wikimedia.org/T149180#3278945 (greg)
[12:14:15] <icinga-wm>	 PROBLEM - YARN NodeManager Node-State on analytics1032 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:15:15] <icinga-wm>	 PROBLEM - YARN NodeManager Node-State on analytics1033 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:16:15] <icinga-wm>	 RECOVERY - YARN NodeManager Node-State on analytics1033 is OK: OK: YARN NodeManager analytics1033.eqiad.wmnet:8041 Node-State: RUNNING
[12:16:17] <icinga-wm>	 RECOVERY - YARN NodeManager Node-State on analytics1032 is OK: OK: YARN NodeManager analytics1032.eqiad.wmnet:8041 Node-State: RUNNING
[12:17:46] <wikibugs>	 Operations, Monitoring, Performance-Team, Release-Engineering-Team (Watching / External), Wikimedia-Incident: MediaWiki load time regression should trigger an alarm / page people - https://phabricator.wikimedia.org/T146125#3278971 (greg)
[12:17:55] <wikibugs>	 Operations, Editing-Department, Monitoring, Release-Engineering-Team (Watching / External), Wikimedia-Incident: High failure rate of account creation should trigger an alarm / page people - https://phabricator.wikimedia.org/T146090#3278973 (greg)
[12:20:49] <wikibugs>	 Operations, Services, Release-Engineering-Team (Backlog), Wikimedia-Incident: Review new service 'pre-deployment to production' checklist - https://phabricator.wikimedia.org/T141897#3278992 (greg)
[12:22:42] <wikibugs>	 Operations, Monitoring, Release-Engineering-Team (Watching / External): "MediaWiki exceptions and fatals per minute" alarm is too slow (half an hour delay!) - https://phabricator.wikimedia.org/T141520#3278999 (greg)
[12:29:17] <wikibugs>	 (PS5) Ema: Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[12:31:52] <wikibugs>	 (PS1) Ema: bgp: log with util.log instead of printing to stdout [debs/pybal] - https://gerrit.wikimedia.org/r/354677
[12:35:47] <wikibugs>	 Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#3279102 (Aklapper) @Perhelion: Does that mean https://bugzilla.gnome.org/show_bug.cgi?id=739329 should be reopened? If not, would you please...
[12:37:58] <wikibugs>	 (PS2) Dereckson: Enable ValidationStatistics log for FlaggedRevs [mediawiki-config] - https://gerrit.wikimedia.org/r/354615 (https://phabricator.wikimedia.org/T163107) (owner: Nemo bis)
[12:39:41] <wikibugs>	 (PS3) Ema: Add netlink-based Ipvsmanager implementation [WiP] [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354509 (owner: Giuseppe Lavagetto)
[12:42:15] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add netlink-based Ipvsmanager implementation [WiP] [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354509 (owner: Giuseppe Lavagetto)
[12:42:26] <wikibugs>	 (CR) Dereckson: "Indeed, there is no trace of this setting on the source code, or anywhere else." [mediawiki-config] - https://gerrit.wikimedia.org/r/354600 (owner: Nemo bis)
[12:46:42] <wikibugs>	 (CR) Ema: [C: 1] Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[12:46:45] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] Bump version number in setup.py [debs/pybal] (1.13) - https://gerrit.wikimedia.org/r/354660 (owner: Ema)
[12:48:16] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 1] "Thanks, this is one of the small things I needed to do for a long time and always forgot about." [puppet] - https://gerrit.wikimedia.org/r/354095 (owner: Alexandros Kosiaris)
[12:49:47] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[12:52:50] <wikibugs>	 Operations, MW-1.30-release-notes, Performance-Team, Thumbor, Patch-For-Review: Thumbor should reject thumbnail requests that are the same size as the original or bigger - https://phabricator.wikimedia.org/T150741#3279146 (Gilles) Since the current migration scripts are very slow on Terbium w...
[13:10:04] <wikibugs>	 (PS1) Ema: Instrumentation fixes [debs/pybal] - https://gerrit.wikimedia.org/r/354680 (https://phabricator.wikimedia.org/T103882)
[13:15:56] <wikibugs>	 (PS2) Mark Bergsma: Add pyenv and pydev config files to .gitignore [debs/pybal] - https://gerrit.wikimedia.org/r/353988
[13:15:58] <wikibugs>	 (PS1) Mark Bergsma: Cleanup whitespace [debs/pybal] - https://gerrit.wikimedia.org/r/354683
[13:16:02] <wikibugs>	 (PS1) Mark Bergsma: Create new BGP message classes for incremental construction [debs/pybal] - https://gerrit.wikimedia.org/r/354684
[13:16:06] <wikibugs>	 (PS1) Mark Bergsma: Use a bytearray to encode prefixes in BGP.encodePrefixes [debs/pybal] - https://gerrit.wikimedia.org/r/354685
[13:16:09] <wikibugs>	 (PS1) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow [debs/pybal] - https://gerrit.wikimedia.org/r/354686
[13:16:12] <mark>	 :-P
[13:21:46] <wikibugs>	 Operations, MW-1.30-release-notes, Performance-Team, Thumbor, Patch-For-Review: Thumbor should reject thumbnail requests that are the same size as the original or bigger - https://phabricator.wikimedia.org/T150741#3279381 (Gilles) I've just realized that the existing migration scripts probabl...
[13:22:14] <wikibugs>	 (CR) Ema: [C: 1] Add pyenv and pydev config files to .gitignore [debs/pybal] - https://gerrit.wikimedia.org/r/353988 (owner: Mark Bergsma)
[13:41:15] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[13:41:15] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[13:42:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[13:42:15] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[13:42:15] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[13:43:44] <greg-g>	 ?
[13:44:03] <wikibugs>	 (PS3) Ema: pybal: bind instrumentation TCP port to private addresses [puppet] - https://gerrit.wikimedia.org/r/348074 (https://phabricator.wikimedia.org/T103882)
[13:44:05] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[13:44:15] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[13:44:15] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[13:44:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[13:45:05] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[13:48:10] <wikibugs>	 (CR) Mark Bergsma: [C: 1] pybal: bind instrumentation TCP port to private addresses [puppet] - https://gerrit.wikimedia.org/r/348074 (https://phabricator.wikimedia.org/T103882) (owner: Ema)
[13:52:51] <wikibugs>	 Operations, Puppet, Labs: Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885#3279509 (Paladox)
[13:54:26] <wikibugs>	 Operations, Puppet, Labs: Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885#3279530 (Paladox)
[13:58:08] <wikibugs>	 Operations, ops-eqiad, User-fgiunchedi: Debug HP raid cache disabled errors on ms-be1019/20/21 - https://phabricator.wikimedia.org/T163777#3279562 (fgiunchedi) @Cmjohnson sounds good! let's try that on ms-be1019 on Tues
[14:00:53] <wikibugs>	 (CR) Ema: [C: 1] Cleanup whitespace [debs/pybal] - https://gerrit.wikimedia.org/r/354683 (owner: Mark Bergsma)
[14:06:22] <wikibugs>	 (CR) Ema: Create new BGP message classes for incremental construction (2 comments) [debs/pybal] - https://gerrit.wikimedia.org/r/354684 (owner: Mark Bergsma)
[14:29:48] <wikibugs>	 Operations, Puppet, Labs: Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885#3279670 (Dzahn) https://ask.puppet.com/question/132/does-filebucket-need-periodic-maintenance-cleaning/  https://bugzilla.mozilla.org/show_bug.cgi?id=624166
[14:32:02] <wikibugs>	 (CR) Mark Bergsma: [C: 2] Add pyenv and pydev config files to .gitignore [debs/pybal] - https://gerrit.wikimedia.org/r/353988 (owner: Mark Bergsma)
[14:33:23] <wikibugs>	 (CR) Mark Bergsma: [C: 2] Cleanup whitespace [debs/pybal] - https://gerrit.wikimedia.org/r/354683 (owner: Mark Bergsma)
[14:38:23] <wikibugs>	 (Merged) jenkins-bot: Add pyenv and pydev config files to .gitignore [debs/pybal] - https://gerrit.wikimedia.org/r/353988 (owner: Mark Bergsma)
[14:38:24] <wikibugs>	 (Merged) jenkins-bot: Cleanup whitespace [debs/pybal] - https://gerrit.wikimedia.org/r/354683 (owner: Mark Bergsma)
[14:47:29] <wikibugs>	 (CR) Mark Bergsma: Create new BGP message classes for incremental construction (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/354684 (owner: Mark Bergsma)
[14:49:44] <wikibugs>	 (CR) Ema: [C: 1] Use a bytearray to encode prefixes in BGP.encodePrefixes [debs/pybal] - https://gerrit.wikimedia.org/r/354685 (owner: Mark Bergsma)
[15:07:44] <wikibugs>	 (PS1) Mark Bergsma: Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711
[15:09:48] <mark>	 ema ^
[15:12:00] <wikibugs>	 (PS2) Mark Bergsma: Use a bytearray to encode prefixes in BGP.encodePrefixes [debs/pybal] - https://gerrit.wikimedia.org/r/354685
[15:12:02] <wikibugs>	 (PS2) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow [debs/pybal] - https://gerrit.wikimedia.org/r/354686
[15:12:04] <wikibugs>	 (PS2) Mark Bergsma: Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711
[15:16:25] <wikibugs>	 (PS3) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow [debs/pybal] - https://gerrit.wikimedia.org/r/354686
[15:16:27] <wikibugs>	 (PS3) Mark Bergsma: Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711
[15:23:52] <ema>	 mark: nice!
[15:52:45] <wikibugs>	 Operations, Continuous-Integration-Scaling, Nodepool, Release-Engineering-Team (Backlog), WorkType-NewFunctionality: Backport python-shade from debian/testing to jessie-wikimedia - https://phabricator.wikimedia.org/T107267#3279958 (greg)
[15:56:36] <wikibugs>	 (PS1) Mark Bergsma: Allow for withdrawals and NLRI to be sent in the same UPDATE [debs/pybal] - https://gerrit.wikimedia.org/r/354723
[16:07:51] <wikibugs>	 (PS4) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow [debs/pybal] - https://gerrit.wikimedia.org/r/354686
[16:07:53] <wikibugs>	 (PS4) Mark Bergsma: Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711
[16:07:55] <wikibugs>	 (PS2) Mark Bergsma: Allow for withdrawals and NLRI to be sent in the same UPDATE [debs/pybal] - https://gerrit.wikimedia.org/r/354723
[16:08:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:08:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:08:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:08:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:08:56] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:08:56] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:09:05] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:11:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy
[16:11:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[16:11:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy
[16:11:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[16:11:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[16:11:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy
[16:11:55] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[16:19:58] <greg-g>	 does citoid do this often? :)
[16:24:05] <wikibugs>	 (PS1) Jforrester: Beta Features: Update last-big-change-plus-six-month dates in comments [mediawiki-config] - https://gerrit.wikimedia.org/r/354731
[16:24:07] <wikibugs>	 (PS1) Jforrester: Cleanup ORES config: Drop wgOresExtensionStatus (default), alphasort [mediawiki-config] - https://gerrit.wikimedia.org/r/354732
[16:41:07] <wikibugs>	 (PS1) Framawiki: Change $wgUploadNavigationUrl on srwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/354737 (https://phabricator.wikimedia.org/T165901)
[16:42:35] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2059693
[16:53:55] <wikibugs>	 (CR) Filippo Giunchedi: [C: 1] "LGTM, maybe add a comment about the need of running the script from a puppet.git checkout" [puppet] - https://gerrit.wikimedia.org/r/354105 (https://phabricator.wikimedia.org/T165583) (owner: Volans)
[16:54:29] <wikibugs>	 (PS2) Framawiki: Set $wgUploadNavigationUrl on srwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/354737 (https://phabricator.wikimedia.org/T165901)
[17:02:33] <wikibugs>	 (PS1) Daniel Kinzler: New SSH key for me [puppet] - https://gerrit.wikimedia.org/r/354743
[17:02:47] <Reedy>	 addshore: Tell Daniel I just found some opsen
[17:03:42] <wikibugs>	 (CR) Addshore: [C: 1] New SSH key for me [puppet] - https://gerrit.wikimedia.org/r/354743 (owner: Daniel Kinzler)
[17:05:12] <wikibugs>	 (PS2) Daniel Kinzler: New SSH key for me [puppet] - https://gerrit.wikimedia.org/r/354743
[17:05:15] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[17:05:16] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[17:05:16] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[17:05:25] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[17:05:25] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[17:05:37] <wikibugs>	 (PS4) Volans: Puppet: run-puppet-agent, add --failed-only option [puppet] - https://gerrit.wikimedia.org/r/349416
[17:05:46] <wikibugs>	 (CR) Ema: [C: 1] Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711 (owner: Mark Bergsma)
[17:05:52] <godog>	 greg-g: IIRC it is due to an external resource being flaky, I'll take a look
[17:07:20] <wikibugs>	 (CR) Volans: [C: 2] New SSH key for me [puppet] - https://gerrit.wikimedia.org/r/354743 (owner: Daniel Kinzler)
[17:08:15] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[17:08:15] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[17:09:15] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[17:09:15] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[17:09:16] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[17:18:52] <wikibugs>	 (PS4) Giuseppe Lavagetto: Add netlink-based Ipvsmanager implementation [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354509
[17:19:24] <godog>	 yeah there's a "Maximum call stack size exceeded" from a worker that gets restarted, afaik that shouldn't cause a timeout tho
[17:20:27] <wikibugs>	 (CR) Volans: "I had verified the request in person." [puppet] - https://gerrit.wikimedia.org/r/354743 (owner: Daniel Kinzler)
[17:22:35] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 39886
[17:22:36] <wikibugs>	 (PS1) Ema: Move IP classes to pybal.ip [debs/pybal] - https://gerrit.wikimedia.org/r/354746
[17:25:25] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[17:25:25] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[17:25:25] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[17:26:15] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[17:26:25] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[17:28:15] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[17:28:15] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[17:28:15] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[17:28:15] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[17:28:16] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[17:29:27] <addshore>	 !log addshore@terbium:/srv/mediawiki/php-1.30.0-wmf.1$ mwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log
[17:29:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:29:39] <addshore>	 !log addshore@terbium:/srv/mediawiki/php-1.30.0-wmf.1$ mwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log T164407
[17:29:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:29:47] <stashbot>	 T164407: Cognate has been disabled from WMF because it caused an outage on x1 by overtaking 10000 concurrent connections - https://phabricator.wikimedia.org/T164407
[17:33:38] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add netlink-based Ipvsmanager implementation [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354509 (owner: Giuseppe Lavagetto)
[19:02:55] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[19:02:56] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[19:02:56] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[19:03:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[19:03:05] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[19:03:05] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[19:03:45] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy
[19:03:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[19:03:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[19:03:55] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[19:03:56] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy
[19:04:05] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[20:18:25] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[20:18:35] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[20:18:35] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[20:18:35] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[20:19:25] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[20:21:25] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[20:21:25] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[20:21:25] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[20:21:25] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[20:22:15] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[20:49:12] <wikibugs>	 (PS1) BryanDavis: tools: have maintain-kubeusers chown $HOME/.kube [puppet] - https://gerrit.wikimedia.org/r/354839 (https://phabricator.wikimedia.org/T165875)
[20:56:49] <wikibugs>	 (PS1) BryanDavis: Use wikitech db group instead of labswiki+ labtestwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/354856
[20:57:20] <wikibugs>	 (CR) BryanDavis: Add Code of Conduct footer links to wikitech and mw.o (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/354612 (owner: BryanDavis)
[21:00:09] <wikibugs>	 (PS2) BryanDavis: Add Code of Conduct footer links to wikitech and mw.o [mediawiki-config] - https://gerrit.wikimedia.org/r/354612
[21:00:11] <wikibugs>	 (PS2) BryanDavis: Use wikitech db group instead of labswiki+ labtestwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/354856
[21:54:44] <Dereckson>	 !log Run namespaceDupe on fr.wikisource and en.wikisource
[21:54:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:25] <icinga-wm>	 PROBLEM - puppet last run on cp3046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:50:25] <icinga-wm>	 RECOVERY - puppet last run on cp3046 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures