[00:45:55] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[00:45:55] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[00:45:55] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[00:45:55] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[00:45:55] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[00:48:45] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[00:48:45] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[00:48:45] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[00:48:45] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[00:48:46] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[01:08:05] (PS1) Nemo bis: Enable ValidationStatistics log for FlaggedRevs [mediawiki-config] - https://gerrit.wikimedia.org/r/354615 (https://phabricator.wikimedia.org/T163107)
[01:26:45] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:45] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:55] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:55] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:55] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:55] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:26:55] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[01:28:45] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[01:28:45] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[01:29:45] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy
[01:29:45] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy
[01:29:45] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy
[01:29:45] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[01:29:45] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[02:14:55] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[02:14:55] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[02:14:55] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[02:14:55] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[02:14:55] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[02:17:45] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[02:17:45] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[02:17:45] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[02:17:45] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[02:17:45] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[02:23:01] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 17s)
[02:23:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:29:14] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat May 20 02:29:14 UTC 2017 (duration 6m 13s)
[02:29:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:19:53] Operations, Traffic, fundraising-tech-ops: Fix nits in HTTPS/HSTS configs in externally-hosted fundraising domains - https://phabricator.wikimedia.org/T137161#3277960 (BBlack) @Jgreen - re: civicrm, it needs to emit the HSTS header on **all** HTTPS responses. ``` $ curl -v https://civicrm.wikimedia....
[04:10:16] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=6404.00 Read Requests/Sec=3357.70 Write Requests/Sec=19.80 KBytes Read/Sec=36646.00 KBytes_Written/Sec=2451.20
[04:21:15] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=6.50 Read Requests/Sec=2.30 Write Requests/Sec=0.60 KBytes Read/Sec=14.40 KBytes_Written/Sec=16.80
[04:49:55] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[04:49:55] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[04:49:55] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[04:49:56] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[04:49:56] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[04:51:55] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[04:51:55] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[04:52:45] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[04:52:45] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[04:52:45] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[05:21:56] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[05:21:56] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[05:22:55] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[05:22:55] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[05:23:05] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[05:23:55] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[05:24:55] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[05:24:55] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[05:25:45] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[05:25:45] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[06:00:55] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[06:01:05] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[06:01:05] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[06:01:05] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[06:01:55] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[06:03:55] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[06:03:55] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[06:03:55] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[06:04:45] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[06:04:46] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[06:05:37] <_joe_> wow
[06:06:39] <_joe_> so it seems we have some issues on a provider of citoid references
[06:08:11] (CR) Giuseppe Lavagetto: [C: 2] Set empty PYTHONPATH in tox.ini [debs/pybal] - https://gerrit.wikimedia.org/r/354547 (owner: Ema)
[06:08:27] (PS1) Giuseppe Lavagetto: Set empty PYTHONPATH in tox.ini [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354617
[06:08:54] (CR) Giuseppe Lavagetto: [V: 2 C: 2] Set empty PYTHONPATH in tox.ini [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354617 (owner: Giuseppe Lavagetto)
[06:10:33] (Merged) jenkins-bot: Set empty PYTHONPATH in tox.ini [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354617 (owner: Giuseppe Lavagetto)
[06:13:02] (PS3) Giuseppe Lavagetto: Split IPVS Manager into the interface and manager implementation [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354506
[06:16:53] (CR) Giuseppe Lavagetto: [C: 2] Split IPVS Manager into the interface and manager implementation [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354506 (owner: Giuseppe Lavagetto)
[06:17:18] (CR) Giuseppe Lavagetto: "recheck" [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354507 (owner: Giuseppe Lavagetto)
[06:17:29] (Merged) jenkins-bot: Split IPVS Manager into the interface and manager implementation [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354506 (owner: Giuseppe Lavagetto)
[06:19:49] (CR) Giuseppe Lavagetto: [C: 2] Add IPVSError as a generic IPVS-related exception [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354507 (owner: Giuseppe Lavagetto)
[06:19:54] (CR) Giuseppe Lavagetto: "recheck" [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[06:21:01] (CR) jerkins-bot: [V: -1] Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[06:58:37] (CR) DCausse: [C: 1] logstash - apifeature indices need to be cleaned up [puppet] - https://gerrit.wikimedia.org/r/353560 (owner: Gehel)
[07:52:00] !log restart wdqs-updater on all wdqs clusters (stuck on too large update)
[07:52:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:05] PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1800.0]
[07:56:05] PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1800.0]
[07:56:06] PROBLEM - High lag on wdqs2002 is CRITICAL: CRITICAL: 31.03% of data above the critical threshold [1800.0]
[08:00:55] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:55] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:55] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:55] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:55] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:55] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:00:56] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[08:02:45] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[08:02:55] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy
[08:02:55] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[08:03:45] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy
[08:03:45] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[08:03:45] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[08:03:45] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy
[08:16:05] RECOVERY - High lag on wdqs1002 is OK: OK: Less than 30.00% above the threshold [600.0]
[08:17:05] RECOVERY - High lag on wdqs2002 is OK: OK: Less than 30.00% above the threshold [600.0]
[08:22:24] !log smalyshev@tin Started deploy [wdqs/wdqs@227ab25]: Whitelist update
[08:22:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:57] !log smalyshev@tin Finished deploy [wdqs/wdqs@227ab25]: Whitelist update (duration: 02m 32s)
[08:25:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:26] (CR) BryanDavis: "Applying this hook along with the WikimediaMessages patch in my mw-vagrant dev server looks like this: https://phabricator.wikimedia.org/F" [mediawiki-config] - https://gerrit.wikimedia.org/r/354612 (owner: BryanDavis)
[08:45:27] (PS3) Ema: Add IPVSError as a generic IPVS-related exception [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354507 (owner: Giuseppe Lavagetto)
[09:03:12] (CR) Ema: [C: 1] "Zuul is backlogged but all tests are green on my laptop. https://blog.codinghorror.com/content/images/uploads/2007/03/6a0120a85dcdae970b01" [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354507 (owner: Giuseppe Lavagetto)
[09:04:11] (CR) Ema: [V: 2 C: 1] Add IPVSError as a generic IPVS-related exception [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354507 (owner: Giuseppe Lavagetto)
[09:05:05] PROBLEM - High lag on wdqs2003 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1800.0]
[09:05:06] PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1800.0]
[09:05:06] PROBLEM - High lag on wdqs2002 is CRITICAL: CRITICAL: 31.03% of data above the critical threshold [1800.0]
[09:05:35] PROBLEM - puppet last run on labvirt1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:07:05] PROBLEM - High lag on wdqs1003 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1800.0]
[09:08:42] !log restarting jenkins on contint1001
[09:08:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:05] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[09:17:05] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[09:17:55] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[09:18:05] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[09:18:05] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[09:18:45] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[09:18:55] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[09:18:55] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[09:18:56] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[09:18:56] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[09:23:26] (PS2) Volans: CLI: add -i/--interactive option [software/cumin] - https://gerrit.wikimedia.org/r/354442 (https://phabricator.wikimedia.org/T165838)
[09:23:28] (PS1) Volans: CLI: add -o/--output to get the output in different formats [software/cumin] - https://gerrit.wikimedia.org/r/354637 (https://phabricator.wikimedia.org/T165842)
[09:27:31] (PS1) Gehel: analytics - add shiny-server to reprepro [puppet] - https://gerrit.wikimedia.org/r/354639 (https://phabricator.wikimedia.org/T164603)
[09:31:15] (PS3) Giuseppe Lavagetto: Add generic Finite States Machine [debs/pybal] - https://gerrit.wikimedia.org/r/302435
[09:31:24] (CR) jerkins-bot: [V: -1] Add generic Finite States Machine [debs/pybal] - https://gerrit.wikimedia.org/r/302435 (owner: Giuseppe Lavagetto)
[09:33:35] RECOVERY - puppet last run on labvirt1002 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[09:37:12] (PS3) Giuseppe Lavagetto: Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508
[09:41:42] (CR) Jforrester: [C: 1] "Good to go post-train next week." [mediawiki-config] - https://gerrit.wikimedia.org/r/354612 (owner: BryanDavis)
[09:49:13] (CR) Bearloga: [C: -1] "as far as i know shiny-server deb package is only available from RStudio; CRAN repo (comprehensive r archive network) is a totally separat" [puppet] - https://gerrit.wikimedia.org/r/354639 (https://phabricator.wikimedia.org/T164603) (owner: Gehel)
[09:49:19] (Abandoned) Gehel: analytics - add shiny-server to reprepro [puppet] - https://gerrit.wikimedia.org/r/354639 (https://phabricator.wikimedia.org/T164603) (owner: Gehel)
[09:53:05] RECOVERY - High lag on wdqs1003 is OK: OK: Less than 30.00% above the threshold [600.0]
[09:53:06] RECOVERY - High lag on wdqs1002 is OK: OK: Less than 30.00% above the threshold [600.0]
[09:54:05] RECOVERY - High lag on wdqs2003 is OK: OK: Less than 30.00% above the threshold [600.0]
[09:54:06] RECOVERY - High lag on wdqs2002 is OK: OK: Less than 30.00% above the threshold [600.0]
[09:55:27] (CR) jerkins-bot: [V: -1] Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[10:06:41] (PS4) Giuseppe Lavagetto: Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508
[10:09:36] (CR) Mattflaschen: [C: 1] Add Code of Conduct footer links to wikitech and mw.o [mediawiki-config] - https://gerrit.wikimedia.org/r/354612 (owner: BryanDavis)
[10:21:24] (CR) Dereckson: Add Code of Conduct footer links to wikitech and mw.o (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/354612 (owner: BryanDavis)
[10:43:31] (PS1) Ema: Bump version number in setup.py [debs/pybal] (1.13) - https://gerrit.wikimedia.org/r/354660
[11:05:25] PROBLEM - Check systemd state on restbase-dev1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:05:35] PROBLEM - cassandra-b CQL 10.64.32.160:9042 on restbase-dev1002 is CRITICAL: connect to address 10.64.32.160 and port 9042: Connection refused
[11:05:45] PROBLEM - cassandra-b SSL 10.64.32.160:7001 on restbase-dev1002 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:05:45] PROBLEM - cassandra-b service on restbase-dev1002 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed
[11:12:25] RECOVERY - Check systemd state on restbase-dev1002 is OK: OK - running: The system is fully operational
[11:12:45] RECOVERY - cassandra-b service on restbase-dev1002 is OK: OK - cassandra-b is active
[11:14:45] RECOVERY - cassandra-b SSL 10.64.32.160:7001 on restbase-dev1002 is OK: SSL OK - Certificate restbase-dev1002-b valid until 2018-01-05 22:53:07 +0000 (expires in 230 days)
[11:15:35] RECOVERY - cassandra-b CQL 10.64.32.160:9042 on restbase-dev1002 is OK: TCP OK - 0.036 second response time on 10.64.32.160 port 9042
[11:23:09] (PS1) Andrew Bogott: Tidy up tools node motd [puppet] - https://gerrit.wikimedia.org/r/354668
[11:27:37] Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#3278799 (Perhelion) Anyway the disfunction is still present if not CSS but on SVG attribute font-family.{F8131638}
[11:28:27] Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#3278801 (Perhelion) Resolved>Open
[11:58:30] Operations, Gerrit, LDAP, Release-Engineering-Team (Backlog): Remove user gerrit2 from ldap - https://phabricator.wikimedia.org/T160122#3278842 (greg)
[11:59:51] Operations, Goal, Patch-For-Review, Release-Engineering-Team (Watching / External), and 3 others: Prepare and maintain base container images - https://phabricator.wikimedia.org/T162042#3278860 (greg)
[12:01:11] Operations, Phabricator, Release-Engineering-Team (Backlog): Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3278881 (greg)
[12:06:13] Operations, Beta-Cluster-Infrastructure, DBA, Patch-For-Review, Release-Engineering-Team (Backlog): Better mysql command prompt info for Beta - https://phabricator.wikimedia.org/T157714#3278891 (greg)
[12:06:55] Operations, Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#3278895 (greg)
[12:08:26] Operations, Wikimedia-Logstash, Release-Engineering-Team (Watching / External), Services (watching): Kibana functionality missing after upgrade: histograms - https://phabricator.wikimedia.org/T152782#3278917 (greg)
[12:12:31] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team (Backlog): setup a DB backed parser cache - https://phabricator.wikimedia.org/T55457#3278934 (greg)
[12:13:11] Operations, Parsoid, Release-Engineering-Team (Watching / External): Provide a /parsoid directory on releases.wikimedia.org - https://phabricator.wikimedia.org/T150672#3278939 (greg)
[12:13:30] Operations, Deployment-Systems, Release-Engineering-Team (Backlog): Trebuchet targets for test/testrepo are out of date - https://phabricator.wikimedia.org/T149180#3278945 (greg)
[12:14:15] PROBLEM - YARN NodeManager Node-State on analytics1032 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:15:15] PROBLEM - YARN NodeManager Node-State on analytics1033 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:16:15] RECOVERY - YARN NodeManager Node-State on analytics1033 is OK: OK: YARN NodeManager analytics1033.eqiad.wmnet:8041 Node-State: RUNNING
[12:16:17] RECOVERY - YARN NodeManager Node-State on analytics1032 is OK: OK: YARN NodeManager analytics1032.eqiad.wmnet:8041 Node-State: RUNNING
[12:17:46] Operations, Monitoring, Performance-Team, Release-Engineering-Team (Watching / External), Wikimedia-Incident: MediaWiki load time regression should trigger an alarm / page people - https://phabricator.wikimedia.org/T146125#3278971 (greg)
[12:17:55] Operations, Editing-Department, Monitoring, Release-Engineering-Team (Watching / External), Wikimedia-Incident: High failure rate of account creation should trigger an alarm / page people - https://phabricator.wikimedia.org/T146090#3278973 (greg)
[12:20:49] Operations, Services, Release-Engineering-Team (Backlog), Wikimedia-Incident: Review new service 'pre-deployment to production' checklist - https://phabricator.wikimedia.org/T141897#3278992 (greg)
[12:22:42] Operations, Monitoring, Release-Engineering-Team (Watching / External): "MediaWiki exceptions and fatals per minute" alarm is too slow (half an hour delay!) - https://phabricator.wikimedia.org/T141520#3278999 (greg)
[12:29:17] (PS5) Ema: Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[12:31:52] (PS1) Ema: bgp: log with util.log instead of printing to stdout [debs/pybal] - https://gerrit.wikimedia.org/r/354677
[12:35:47] Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#3279102 (Aklapper) @Perhelion: Does that mean https://bugzilla.gnome.org/show_bug.cgi?id=739329 should be reopened? If not, would you please...
[12:37:58] (PS2) Dereckson: Enable ValidationStatistics log for FlaggedRevs [mediawiki-config] - https://gerrit.wikimedia.org/r/354615 (https://phabricator.wikimedia.org/T163107) (owner: Nemo bis)
[12:39:41] (PS3) Ema: Add netlink-based Ipvsmanager implementation [WiP] [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354509 (owner: Giuseppe Lavagetto)
[12:42:15] (CR) jerkins-bot: [V: -1] Add netlink-based Ipvsmanager implementation [WiP] [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354509 (owner: Giuseppe Lavagetto)
[12:42:26] (CR) Dereckson: "Indeed, there is no trace of this setting on the source code, or anywhere else." [mediawiki-config] - https://gerrit.wikimedia.org/r/354600 (owner: Nemo bis)
[12:46:42] (CR) Ema: [C: 1] Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[12:46:45] (CR) Giuseppe Lavagetto: [C: 2] Bump version number in setup.py [debs/pybal] (1.13) - https://gerrit.wikimedia.org/r/354660 (owner: Ema)
[12:48:16] (CR) Giuseppe Lavagetto: [C: 1] "Thanks, this is one of the small things I needed to do for a long time and always forgot about." [puppet] - https://gerrit.wikimedia.org/r/354095 (owner: Alexandros Kosiaris)
[12:49:47] (CR) Giuseppe Lavagetto: [C: 2] Add generic Finite States Machine [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354508 (owner: Giuseppe Lavagetto)
[12:52:50] Operations, MW-1.30-release-notes, Performance-Team, Thumbor, Patch-For-Review: Thumbor should reject thumbnail requests that are the same size as the original or bigger - https://phabricator.wikimedia.org/T150741#3279146 (Gilles) Since the current migration scripts are very slow on Terbium w...
[13:10:04] (PS1) Ema: Instrumentation fixes [debs/pybal] - https://gerrit.wikimedia.org/r/354680 (https://phabricator.wikimedia.org/T103882)
[13:15:56] (PS2) Mark Bergsma: Add pyenv and pydev config files to .gitignore [debs/pybal] - https://gerrit.wikimedia.org/r/353988
[13:15:58] (PS1) Mark Bergsma: Cleanup whitespace [debs/pybal] - https://gerrit.wikimedia.org/r/354683
[13:16:02] (PS1) Mark Bergsma: Create new BGP message classes for incremental construction [debs/pybal] - https://gerrit.wikimedia.org/r/354684
[13:16:06] (PS1) Mark Bergsma: Use a bytearray to encode prefixes in BGP.encodePrefixes [debs/pybal] - https://gerrit.wikimedia.org/r/354685
[13:16:09] (PS1) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow [debs/pybal] - https://gerrit.wikimedia.org/r/354686
[13:16:12] :-P
[13:21:46] Operations, MW-1.30-release-notes, Performance-Team, Thumbor, Patch-For-Review: Thumbor should reject thumbnail requests that are the same size as the original or bigger - https://phabricator.wikimedia.org/T150741#3279381 (Gilles) I've just realized that the existing migration scripts probabl...
[13:22:14] (CR) Ema: [C: 1] Add pyenv and pydev config files to .gitignore [debs/pybal] - https://gerrit.wikimedia.org/r/353988 (owner: Mark Bergsma)
[13:41:15] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[13:41:15] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[13:42:05] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[13:42:15] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[13:42:15] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[13:43:44] ?
[13:44:03] (PS3) Ema: pybal: bind instrumentation TCP port to private addresses [puppet] - https://gerrit.wikimedia.org/r/348074 (https://phabricator.wikimedia.org/T103882)
[13:44:05] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[13:44:15] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[13:44:15] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[13:44:55] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[13:45:05] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[13:48:10] (CR) Mark Bergsma: [C: 1] pybal: bind instrumentation TCP port to private addresses [puppet] - https://gerrit.wikimedia.org/r/348074 (https://phabricator.wikimedia.org/T103882) (owner: Ema)
[13:52:51] Operations, Puppet, Labs: Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885#3279509 (Paladox)
[13:54:26] Operations, Puppet, Labs: Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885#3279530 (Paladox)
[13:58:08] Operations, ops-eqiad, User-fgiunchedi: Debug HP raid cache disabled errors on ms-be1019/20/21 - https://phabricator.wikimedia.org/T163777#3279562 (fgiunchedi) @Cmjohnson sounds good! let's try that on ms-be1019 on Tues
[14:00:53] (CR) Ema: [C: 1] Cleanup whitespace [debs/pybal] - https://gerrit.wikimedia.org/r/354683 (owner: Mark Bergsma)
[14:06:22] (CR) Ema: Create new BGP message classes for incremental construction (2 comments) [debs/pybal] - https://gerrit.wikimedia.org/r/354684 (owner: Mark Bergsma)
[14:29:48] Operations, Puppet, Labs: Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885#3279670 (Dzahn) https://ask.puppet.com/question/132/does-filebucket-need-periodic-maintenance-cleaning/ https://bugzilla.mozilla.org/show_bug.cgi?id=624166
[14:32:02] (CR) Mark Bergsma: [C: 2] Add pyenv and pydev config files to .gitignore [debs/pybal] - https://gerrit.wikimedia.org/r/353988 (owner: Mark Bergsma)
[14:33:23] (CR) Mark Bergsma: [C: 2] Cleanup whitespace [debs/pybal] - https://gerrit.wikimedia.org/r/354683 (owner: Mark Bergsma)
[14:38:23] (Merged) jenkins-bot: Add pyenv and pydev config files to .gitignore [debs/pybal] - https://gerrit.wikimedia.org/r/353988 (owner: Mark Bergsma)
[14:38:24] (Merged) jenkins-bot: Cleanup whitespace [debs/pybal] - https://gerrit.wikimedia.org/r/354683 (owner: Mark Bergsma)
[14:47:29] (CR) Mark Bergsma: Create new BGP message classes for incremental construction (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/354684 (owner: Mark Bergsma)
[14:49:44] (CR) Ema: [C: 1] Use a bytearray to encode prefixes in BGP.encodePrefixes [debs/pybal] - https://gerrit.wikimedia.org/r/354685 (owner: Mark Bergsma)
[15:07:44] (PS1) Mark Bergsma: Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711
[15:09:48] ema ^
[15:12:00] (PS2) Mark Bergsma: Use a bytearray to encode prefixes in BGP.encodePrefixes [debs/pybal] - https://gerrit.wikimedia.org/r/354685
[15:12:02] (PS2) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow [debs/pybal] - https://gerrit.wikimedia.org/r/354686
[15:12:04] (PS2) Mark Bergsma: Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711
[15:16:25] (PS3) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow [debs/pybal] - https://gerrit.wikimedia.org/r/354686
[15:16:27] (PS3) Mark Bergsma: Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711
[15:23:52] mark: nice!
[15:52:45] Operations, Continuous-Integration-Scaling, Nodepool, Release-Engineering-Team (Backlog), WorkType-NewFunctionality: Backport python-shade from debian/testing to jessie-wikimedia - https://phabricator.wikimedia.org/T107267#3279958 (greg)
[15:56:36] (PS1) Mark Bergsma: Allow for withdrawals and NLRI to be sent in the same UPDATE [debs/pybal] - https://gerrit.wikimedia.org/r/354723
[16:07:51] (PS4) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow [debs/pybal] - https://gerrit.wikimedia.org/r/354686
[16:07:53] (PS4) Mark Bergsma: Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711
[16:07:55] (PS2) Mark Bergsma: Allow for withdrawals and NLRI to be sent in the same UPDATE [debs/pybal] - https://gerrit.wikimedia.org/r/354723
[16:08:55] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:08:55] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:08:55] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:08:55] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:08:56] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:08:56] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:09:05] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[16:11:45] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy
[16:11:55] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[16:11:55] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy
[16:11:55] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[16:11:55] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[16:11:55] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy
[16:11:55] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[16:19:58] does citoid do this often?
:) [16:24:05] (03PS1) 10Jforrester: Beta Features: Update last-big-change-plus-six-month dates in comments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354731 [16:24:07] (03PS1) 10Jforrester: Cleanup ORES config: Drop wgOresExtensionStatus (default), alphasort [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354732 [16:41:07] (03PS1) 10Framawiki: Change $wgUploadNavigationUrl on srwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354737 (https://phabricator.wikimedia.org/T165901) [16:42:35] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2059693 [16:53:55] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, maybe add a comment about the need of running the script from a puppet.git checkout" [puppet] - 10https://gerrit.wikimedia.org/r/354105 (https://phabricator.wikimedia.org/T165583) (owner: 10Volans) [16:54:29] (03PS2) 10Framawiki: Set $wgUploadNavigationUrl on srwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354737 (https://phabricator.wikimedia.org/T165901) [17:02:33] (03PS1) 10Daniel Kinzler: New SSH key for me [puppet] - 10https://gerrit.wikimedia.org/r/354743 [17:02:47] addshore: Tell Daniel I just found some opsen [17:03:42] (03CR) 10Addshore: [C: 031] New SSH key for me [puppet] - 10https://gerrit.wikimedia.org/r/354743 (owner: 10Daniel Kinzler) [17:05:12] (03PS2) 10Daniel Kinzler: New SSH key for me [puppet] - 10https://gerrit.wikimedia.org/r/354743 [17:05:15] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [17:05:16] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [17:05:16] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [17:05:25] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via 
native scraper) timed out before a response was received [17:05:25] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [17:05:37] (03PS4) 10Volans: Puppet: run-puppet-agent, add --failed-only option [puppet] - 10https://gerrit.wikimedia.org/r/349416 [17:05:46] (03CR) 10Ema: [C: 031] Use a bytearray to build IPPrefix [debs/pybal] - 10https://gerrit.wikimedia.org/r/354711 (owner: 10Mark Bergsma) [17:05:52] greg-g: IIRC it is due to an external resource being flaky, I'll take a look [17:07:20] (03CR) 10Volans: [C: 032] New SSH key for me [puppet] - 10https://gerrit.wikimedia.org/r/354743 (owner: 10Daniel Kinzler) [17:08:15] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy [17:08:15] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy [17:09:15] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy [17:09:15] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [17:09:16] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [17:18:52] (03PS4) 10Giuseppe Lavagetto: Add netlink-based Ipvsmanager implementation [debs/pybal] (2.0-dev) - 10https://gerrit.wikimedia.org/r/354509 [17:19:24] yeah there's a "Maximum call stack size exceeded" from a worker that gets restarted, afaik that shouldn't cause a timeout tho [17:20:27] (03CR) 10Volans: "I had verified the request in person." 
[puppet] - 10https://gerrit.wikimedia.org/r/354743 (owner: 10Daniel Kinzler) [17:22:35] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 39886 [17:22:36] (03PS1) 10Ema: Move IP classes to pybal.ip [debs/pybal] - 10https://gerrit.wikimedia.org/r/354746 [17:25:25] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [17:25:25] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [17:25:25] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [17:26:15] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [17:26:25] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [17:28:15] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [17:28:15] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy [17:28:15] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy [17:28:15] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy [17:28:16] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [17:29:27] !log addshore@terbium:/srv/mediawiki/php-1.30.0-wmf.1$ mwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log [17:29:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:29:39] !log addshore@terbium:/srv/mediawiki/php-1.30.0-wmf.1$ mwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log T164407 
[17:29:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:29:47] T164407: Cognate has been disabled from WMF because it caused an outage on x1 by overtaking 10000 concurrent connections - https://phabricator.wikimedia.org/T164407 [17:33:38] (03CR) 10jerkins-bot: [V: 04-1] Add netlink-based Ipvsmanager implementation [debs/pybal] (2.0-dev) - 10https://gerrit.wikimedia.org/r/354509 (owner: 10Giuseppe Lavagetto) [19:02:55] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [19:02:56] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [19:02:56] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [19:03:05] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [19:03:05] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [19:03:05] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [19:03:45] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy [19:03:55] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy [19:03:55] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [19:03:55] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [19:03:56] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy [19:04:05] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy [20:18:25] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) 
timed out before a response was received [20:18:35] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [20:18:35] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [20:18:35] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [20:19:25] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [20:21:25] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [20:21:25] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy [20:21:25] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [20:21:25] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy [20:22:15] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy [20:49:12] (03PS1) 10BryanDavis: tools: have maintain-kubeusers chown $HOME/.kube [puppet] - 10https://gerrit.wikimedia.org/r/354839 (https://phabricator.wikimedia.org/T165875) [20:56:49] (03PS1) 10BryanDavis: Use wikitech db group instead of labswiki+ labtestwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354856 [20:57:20] (03CR) 10BryanDavis: Add Code of Conduct footer links to wikitech and mw.o (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354612 (owner: 10BryanDavis) [21:00:09] (03PS2) 10BryanDavis: Add Code of Conduct footer links to wikitech and mw.o [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354612 [21:00:11] (03PS2) 10BryanDavis: Use wikitech db group instead of labswiki+ labtestwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354856 [21:54:44] !log Run namespaceDupe on fr.wikisource and en.wikisource [21:54:53] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:22:25] PROBLEM - puppet last run on cp3046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:50:25] RECOVERY - puppet last run on cp3046 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures