[01:22:26] PROBLEM - HHVM rendering on mw2135 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:23:17] RECOVERY - HHVM rendering on mw2135 is OK: HTTP OK: HTTP/1.1 200 OK - 74139 bytes in 0.307 second response time
[01:52:22] Operations, Toolforge, Traffic, HTTPS: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#3864661 (Liuxinyu970226)
[02:28:37] PROBLEM - Check health of redis instance on 6480 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1514687312 600 - REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 3285677 keys, up 2 minutes 39 seconds - replication_delay is 1514687312
[02:30:46] RECOVERY - Check health of redis instance on 6480 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 3269055 keys, up 4 minutes 49 seconds - replication_delay is 0
[03:24:27] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 719.02 seconds
[03:44:36] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 59.25 seconds
[11:25:01] (PS1) Revi: Add patrol to Image-reviewer on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/401160
[11:26:20] (PS2) Revi: Add patrol to Image-reviewer on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/401160 (https://phabricator.wikimedia.org/T183835)
[11:27:46] (CR) Steinsplitter: [C: 1] Add patrol to Image-reviewer on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/401160 (https://phabricator.wikimedia.org/T183835) (owner: Revi)
[11:30:21] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/401160 (https://phabricator.wikimedia.org/T183835) (owner: Revi)
[11:56:06] (PS1) Revi: Add Translation NS for kowikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/401174 (https://phabricator.wikimedia.org/T183836)
[13:51:43] (CR) Framawiki: [C: 1] Set category collation to uca-es-u-kn for eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/401081 (https://phabricator.wikimedia.org/T183802) (owner: MarcoAurelio)
[13:52:31] (CR) Framawiki: [C: 1] Add patrol to Image-reviewer on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/401160 (https://phabricator.wikimedia.org/T183835) (owner: Revi)
[13:53:08] (CR) Framawiki: [C: 1] Add Translation NS for kowikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/401174 (https://phabricator.wikimedia.org/T183836) (owner: Revi)
[13:57:51] (CR) Framawiki: [C: 1] Set 'watchcreations' preference to true by default on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/400579 (https://phabricator.wikimedia.org/T178750) (owner: Urbanecm)
[16:47:17] PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/media/{title} (retrieve media items of en.wp Cat page via media route) timed out before a response was received
[16:50:16] RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy
[17:26:56] Operations, ops-esams: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T183815#3865241 (ArielGlenn)
[17:26:58] Operations, ops-esams: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T168619#3865243 (ArielGlenn)
[17:39:16] PROBLEM - HHVM rendering on mw1317 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:40:16] RECOVERY - HHVM rendering on mw1317 is OK: HTTP OK: HTTP/1.1 200 OK - 74072 bytes in 4.360 second response time
[18:34:27] PROBLEM - HHVM rendering on mw1315 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:35:27] RECOVERY - HHVM rendering on mw1315 is OK: HTTP OK: HTTP/1.1 200 OK - 74072 bytes in 7.172 second response time
[19:12:17] PROBLEM - Nginx local proxy to apache on mw1315 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:13:16] RECOVERY - Nginx local proxy to apache on mw1315 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 619 bytes in 1.433 second response time
[19:47:12] (PS1) Madhuvishy: Revert "dumps: Turn off dumps auto-sync to labstore1006 for reimaging" [puppet] - https://gerrit.wikimedia.org/r/401190
[19:47:28] (CR) jerkins-bot: [V: -1] Revert "dumps: Turn off dumps auto-sync to labstore1006 for reimaging" [puppet] - https://gerrit.wikimedia.org/r/401190 (owner: Madhuvishy)
[20:02:34] (PS2) Madhuvishy: Revert "dumps: Turn off dumps auto-sync to labstore1006 for reimaging" [puppet] - https://gerrit.wikimedia.org/r/401190
[20:06:27] (PS3) Madhuvishy: Revert "dumps: Turn off dumps auto-sync to labstore1006 for reimaging" [puppet] - https://gerrit.wikimedia.org/r/401190
[20:08:05] (PS4) Madhuvishy: Revert "dumps: Turn off dumps auto-sync to labstore1006 for reimaging" [puppet] - https://gerrit.wikimedia.org/r/401190
[20:10:07] (CR) Madhuvishy: [C: 2] Revert "dumps: Turn off dumps auto-sync to labstore1006 for reimaging" [puppet] - https://gerrit.wikimedia.org/r/401190 (owner: Madhuvishy)
[20:22:56] PROBLEM - Varnish HTTP text-backend - port 3128 on cp4029 is CRITICAL: connect to address 10.128.0.129 and port 3128: Connection refused
[20:23:56] RECOVERY - Varnish HTTP text-backend - port 3128 on cp4029 is OK: HTTP OK: HTTP/1.1 200 OK - 218 bytes in 0.157 second response time
[20:47:06] RECOVERY - Long running screen/tmux on labstore1007 is OK: OK: No SCREEN or tmux processes detected.
[21:55:36] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/random/{format} (Random title redirect) timed out before a response was received: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body (AttributeError: NoneType object has no attribute get)
[21:57:26] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy
[22:08:14] (CR) Smalyshev: [C: 1] Support prefixed dump types [dumps/dcat] - https://gerrit.wikimedia.org/r/390312 (https://phabricator.wikimedia.org/T163328) (owner: Lokal Profil)
[22:27:26] PROBLEM - puppet last run on ms-be1033 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[mkfs-/dev/sdk1]
[22:57:26] RECOVERY - puppet last run on ms-be1033 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:31:06] RECOVERY - Long running screen/tmux on labstore1006 is OK: OK: No SCREEN or tmux processes detected.