[03:17:09] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:19:19] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.005 second response time
[03:25:13] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:26:13] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.024 second response time
[03:33:23] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:33:29] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 913.41 seconds
[03:36:41] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time
[03:41:27] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:43:09] Operations, Citoid, Patch-For-Review, Service-deployment-requests, and 3 others: Deploy translation-server-v2 - https://phabricator.wikimedia.org/T201611 (Krenair) @Akosiaris: I think your last paste is slightly broken (most of the URL got muddled with the data) Does this mean that there should...
[03:46:03] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.145 second response time
[03:49:33] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:50:43] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 255.48 seconds
[03:52:30] Operations, DNS, Traffic, incubator.wikimedia.org: Non-existing wikis should redirect to Incubator - https://phabricator.wikimedia.org/T32206 (Hydriz)
[03:52:36] Operations, Wikimedia-Site-requests, incubator.wikimedia.org: Enable the NewUserMessage extension on Wikimedia Incubator - https://phabricator.wikimedia.org/T31727 (Hydriz)
[03:53:40] Operations, Wikimedia-Mailing-lists, incubator.wikimedia.org: Incubator mailinglist. - https://phabricator.wikimedia.org/T18215 (Hydriz)
[03:55:33] PROBLEM - puppet last run on torrelay1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:21:15] RECOVERY - puppet last run on torrelay1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:32:05] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:33:08] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 277 bytes in 3.777 second response time
[04:35:45] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 276 bytes in 4.140 second response time
[04:39:19] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:55:33] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 277 bytes in 4.991 second response time
[04:59:07] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:01:21] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.890 second response time
[05:04:53] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:05:55] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.094 second response time
[05:09:33] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:28:01] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.241 second response time
[05:31:31] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:36:07] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.537 second response time
[05:39:41] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:06:13] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.470 second response time
[06:09:45] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:18:53] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.618 second response time
[06:22:27] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:35:09] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.449 second response time
[06:38:41] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:43:15] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.380 second response time
[06:46:45] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:15:25] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.167 second response time
[07:18:57] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:38:31] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.041 second response time
[07:42:03] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:23:23] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.203 second response time
[08:26:53] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:05:57] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.205 second response time
[09:09:27] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:15:07] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.149 second response time
[09:20:51] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:16:13] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 276 bytes in 7.282 second response time
[10:19:39] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:38:57] <_joe_> oh sigh
[10:40:13] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time
[10:40:37] <_joe_> !log restarting pdfrender on scb1002, flapping since 3:00 UTC
[10:40:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:47] (CR) Mathew.onipe: elasticsearch_cluster: Added multi-cluster/multi-instance support (5 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: Mathew.onipe)
[12:22:42] (PS26) Mathew.onipe: elasticsearch_cluster: Added multi-cluster/multi-instance support [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918)
[12:38:45] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:42:13] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.498 second response time
[12:46:51] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:47:51] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.009 second response time
[12:51:15] PROBLEM - pdfrender on scb1003 is CRITICAL: connect to address 10.64.32.153 and port 5252: Connection refused
[12:52:23] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time
[13:12:15] (CR) Mathew.onipe: elasticsearch: cookbook for multi-cluster services rolling restart (3 comments) [cookbooks] - https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: Mathew.onipe)
[13:13:30] (PS24) Mathew.onipe: elasticsearch: cookbook for multi-cluster services rolling restart [cookbooks] - https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919)
[13:23:29] PROBLEM - HHVM jobrunner on mw1306 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[13:24:37] RECOVERY - HHVM jobrunner on mw1306 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[15:15:31] (PS1) Andrew Bogott: Add grants for keystone and neutron db users [puppet] - https://gerrit.wikimedia.org/r/475617 (https://phabricator.wikimedia.org/T210326)
[18:34:20] (PS2) Andrew Bogott: Add grants for keystone and neutron db users [puppet] - https://gerrit.wikimedia.org/r/475617 (https://phabricator.wikimedia.org/T210326)
[18:36:04] (CR) Andrew Bogott: [C: 2] Add grants for keystone and neutron db users [puppet] - https://gerrit.wikimedia.org/r/475617 (https://phabricator.wikimedia.org/T210326) (owner: Andrew Bogott)
[20:58:24] (PS2) Zoranzoki21: Delete 'Импортировано' namespace from ru.wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/475367 (https://phabricator.wikimedia.org/T210171)
[23:02:21] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:17:45] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:50:29] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.199, interfaces up: 35, down: 1, dormant: 0, excluded: 1, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:52:09] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100%
[23:52:41] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[23:52:49] RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 37, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:57:23] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 2.16 ms
[23:57:55] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 28.61 ms