[00:20:52] RECOVERY - Check Varnish expiry mailbox lag on cp4026 is OK: OK: expiry mailbox lag is 0
[00:28:36] (Draft1) Paladox: mediawiki_vagrant: Update defined(Class['role::labs_vagrant'] to role::deprecated::labsvagrant [puppet] - https://gerrit.wikimedia.org/r/389295
[00:28:39] (PS2) Paladox: mediawiki_vagrant: Update defined(Class['role::labs_vagrant'] to role::deprecated::labsvagrant [puppet] - https://gerrit.wikimedia.org/r/389295
[00:28:43] (CR) jerkins-bot: [V: -1] mediawiki_vagrant: Update defined(Class['role::labs_vagrant'] to role::deprecated::labsvagrant [puppet] - https://gerrit.wikimedia.org/r/389295 (owner: Paladox)
[00:29:04] (CR) jerkins-bot: [V: -1] mediawiki_vagrant: Update defined(Class['role::labs_vagrant'] to role::deprecated::labsvagrant [puppet] - https://gerrit.wikimedia.org/r/389295 (owner: Paladox)
[00:29:43] (PS3) Paladox: mediawiki_vagrant: Update role name used for if defined check [puppet] - https://gerrit.wikimedia.org/r/389295
[00:29:57] (PS4) Paladox: mediawiki_vagrant: Update role name used for if defined check [puppet] - https://gerrit.wikimedia.org/r/389295
[00:33:38] (Abandoned) Paladox: mediawiki_vagrant: Update role name used for if defined check [puppet] - https://gerrit.wikimedia.org/r/389295 (owner: Paladox)
[00:35:29] (Restored) Paladox: mediawiki_vagrant: Update role name used for if defined check [puppet] - https://gerrit.wikimedia.org/r/389295 (owner: Paladox)
[01:04:12] PROBLEM - Check health of redis instance on 6379 on rdb2001 is CRITICAL: CRITICAL: replication_delay is 1509843848 600 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8508553 keys, up 4 minutes 6 seconds - replication_delay is 1509843848
[01:04:32] PROBLEM - Check health of redis instance on 6380 on rdb2003 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6380
[01:04:41] PROBLEM - Check health of redis instance on 6480 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1509843871 600 - REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 3798263 keys, up 4 minutes 23 seconds - replication_delay is 1509843871
[01:04:41] PROBLEM - Check health of redis instance on 6379 on rdb2003 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6379
[01:05:12] RECOVERY - Check health of redis instance on 6379 on rdb2001 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8503183 keys, up 5 minutes 6 seconds - replication_delay is 0
[01:05:32] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1509843925 600 - REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3795629 keys, up 5 minutes 17 seconds - replication_delay is 1509843925
[01:05:41] RECOVERY - Check health of redis instance on 6380 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6380 has 1 databases (db0) with 8499177 keys, up 5 minutes 28 seconds - replication_delay is 0
[01:05:42] PROBLEM - Check health of redis instance on 6481 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1509843936 600 - REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 3802974 keys, up 5 minutes 22 seconds - replication_delay is 1509843936
[01:05:42] RECOVERY - Check health of redis instance on 6379 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8498768 keys, up 5 minutes 37 seconds - replication_delay is 0
[01:07:32] RECOVERY - Check health of redis instance on 6480 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 3796519 keys, up 7 minutes 22 seconds - replication_delay is 0
[01:07:33] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3795026 keys, up 7 minutes 22 seconds - replication_delay is 0
[01:07:41] RECOVERY - Check health of redis instance on 6481 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 3792600 keys, up 7 minutes 22 seconds - replication_delay is 0
[01:29:52] PROBLEM - Check whether ferm is active by checking the default input chain on labtestmetal2001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[01:30:52] RECOVERY - Check whether ferm is active by checking the default input chain on labtestmetal2001 is OK: OK ferm input default policy is set
[02:41:51] (CR) BryanDavis: [C: 1] uwsgi: fix dependency for stretch [puppet] - https://gerrit.wikimedia.org/r/388750 (owner: Ayounsi)
[03:25:21] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 810.21 seconds
[03:33:31] PROBLEM - Nginx local proxy to apache on mw2201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:34:21] RECOVERY - Nginx local proxy to apache on mw2201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.198 second response time
[03:54:31] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 94.38 seconds
[10:40:41] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 29 probes of 285 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[10:45:41] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 8 probes of 285 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[11:39:41] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 23 probes of 285 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[11:44:41] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 8 probes of 285 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[12:33:57] Operations, Ops-Access-Requests: Requesting access to perf-teams for phedenskog - https://phabricator.wikimedia.org/T179729#3736005 (Peter)
[12:39:45] Operations, MediaWiki-Containers, Continuous-Integration-Infrastructure (shipyard): Homepage for https://docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T179696#3736006 (Addshore)
[12:40:21] Operations, MediaWiki-Containers, Continuous-Integration-Infrastructure (shipyard): Homepage for https://docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T179696#3733282 (Addshore) I switched the title from UI to Homepage to keep things a little more basic. I would be happy with a s...
[12:52:02] Operations, media-storage: upload.wikimedia.org reports wrong mimetype for svg - https://phabricator.wikimedia.org/T179787#3736026 (Reedy) Is it all SVG files? Or just that one specific one?
[13:44:41] PROBLEM - puppet last run on restbase2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:09:41] RECOVERY - puppet last run on restbase2004 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[15:54:39] Operations, media-storage: upload.wikimedia.org reports wrong mimetype for svg - https://phabricator.wikimedia.org/T179787#3736138 (MichaelSchoenitzer) > Is it all SVG files? Or just that one specific one? I saw another one shortly (but don't remember which), so I assumed it being a server issue – but...
[16:01:34] Operations, media-storage: upload.wikimedia.org reports wrong mimetype for svg - https://phabricator.wikimedia.org/T179787#3736008 (BBlack) related to T131012 ? The file doesn't appear to have an XML prolog (although as that ticket notes, one probably shouldn't be required)
[17:44:02] PROBLEM - puppet last run on mw1313 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:01:41] PROBLEM - puppet last run on ganeti2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:14:02] RECOVERY - puppet last run on mw1313 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[18:31:41] RECOVERY - puppet last run on ganeti2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[20:29:43] (PS8) TerraCodes: Remove overlapping userrights [mediawiki-config] - https://gerrit.wikimedia.org/r/370791 (https://phabricator.wikimedia.org/T101983)
[20:37:57] Operations, media-storage: upload.wikimedia.org reports wrong mimetype for svg - https://phabricator.wikimedia.org/T179787#3736314 (Reedy) >>! In T179787#3736139, @BBlack wrote: > related to T131012 ? The file doesn't appear to have an XML prolog (although as that ticket notes, one probably shouldn't be...
[20:50:55] Operations, media-storage: upload.wikimedia.org reports wrong mimetype for svg - https://phabricator.wikimedia.org/T179787#3736008 (zhuyifei1999) See also T150929
[20:57:41] PROBLEM - puppet last run on mc1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:10:25] Operations, Ops-Access-Requests: Add hoo to perf-roots - https://phabricator.wikimedia.org/T179317#3736323 (hoo) >>! In T179317#3731710, @MoritzMuehlenhoff wrote: > Can you elaborate what you need in specific to debug wikidata performance problems? We can arrange access to all the logs you need, but perf...
[21:27:41] RECOVERY - puppet last run on mc1027 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:02:02] PROBLEM - puppet last run on conf2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:27:01] RECOVERY - puppet last run on conf2003 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures