[00:00:05] twentyafterfour: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170629T0000). Please do the needful.
[00:12:50] (PS1) Krinkle: Disable wgEnableJavaScriptTest on test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/362109
[00:18:25] (CR) Krinkle: [C: 2] Disable wgEnableJavaScriptTest on test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/362109 (owner: Krinkle)
[00:19:45] (CR) Ladsgroup: "As wikiba.se is currently outside of WMF production cluster (which it should be and there is a ticket for that), redirecting to that is no" [puppet] - https://gerrit.wikimedia.org/r/361801 (https://phabricator.wikimedia.org/T169023) (owner: Krinkle)
[00:19:50] (CR) Gehel: [C: 2] Monitor elasticsearch stats for load test [puppet] - https://gerrit.wikimedia.org/r/362091 (https://phabricator.wikimedia.org/T169002) (owner: EBernhardson)
[00:19:52] (Merged) jenkins-bot: Disable wgEnableJavaScriptTest on test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/362109 (owner: Krinkle)
[00:19:59] (CR) jenkins-bot: Disable wgEnableJavaScriptTest on test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/362109 (owner: Krinkle)
[00:20:28] (CR) Krinkle: "It also appears to lack HTTPS (Let's Encrypt?)" [puppet] - https://gerrit.wikimedia.org/r/361801 (https://phabricator.wikimedia.org/T169023) (owner: Krinkle)
[00:23:01] !log krinkle@tin Synchronized wmf-config/InitialiseSettings.php: I8ce28a4ce7 - test2wiki config cleanup (duration: 00m 47s)
[00:23:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:45:19] (PS1) Krinkle: xenon: Increase hourly retention from 1 to 14 days [puppet] - https://gerrit.wikimedia.org/r/362114 (https://phabricator.wikimedia.org/T166624)
[00:55:09] (PS7) Dzahn: librenms: add support for stretch, adjust (PHP) packages [puppet] - https://gerrit.wikimedia.org/r/362014
[00:57:32] (PS8) Dzahn: librenms: add support for stretch, adjust (PHP) packages [puppet] - https://gerrit.wikimedia.org/r/362014
[01:00:14] (CR) Dzahn: [C: 2] librenms: add support for stretch, adjust (PHP) packages [puppet] - https://gerrit.wikimedia.org/r/362014 (owner: Dzahn)
[01:00:50] (PS9) Dzahn: librenms: add support for stretch, adjust (PHP) packages [puppet] - https://gerrit.wikimedia.org/r/362014
[01:00:53] (CR) Dzahn: [V: 2 C: 2] librenms: add support for stretch, adjust (PHP) packages [puppet] - https://gerrit.wikimedia.org/r/362014 (owner: Dzahn)
[01:02:24] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1498698135 600 - REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 4097265 keys, up 2 minutes 13 seconds - replication_delay is 1498698135
[01:02:24] PROBLEM - Check health of redis instance on 6380 on rdb2003 is CRITICAL: CRITICAL: replication_delay is 1498698138 600 - REDIS 2.8.17 on 127.0.0.1:6380 has 1 databases (db0) with 8799518 keys, up 2 minutes 16 seconds - replication_delay is 1498698138
[01:02:24] PROBLEM - Check health of redis instance on 6379 on rdb2001 is CRITICAL: CRITICAL: replication_delay is 1498698138 600 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8806492 keys, up 2 minutes 17 seconds - replication_delay is 1498698138
[01:02:34] PROBLEM - Check health of redis instance on 6480 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1498698146 600 - REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 4096686 keys, up 2 minutes 23 seconds - replication_delay is 1498698146
[01:02:45] PROBLEM - Check health of redis instance on 6381 on rdb2003 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6381
[01:02:52] (CR) Dzahn: [C: 2] "yes, there is enough space there for that" [puppet] - https://gerrit.wikimedia.org/r/362114 (https://phabricator.wikimedia.org/T166624) (owner: Krinkle)
[01:02:59] (PS2) Dzahn: xenon: Increase hourly retention from 1 to 14 days [puppet] - https://gerrit.wikimedia.org/r/362114 (https://phabricator.wikimedia.org/T166624) (owner: Krinkle)
[01:03:24] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 4095357 keys, up 3 minutes 13 seconds - replication_delay is 0
[01:03:24] RECOVERY - Check health of redis instance on 6379 on rdb2001 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8801264 keys, up 3 minutes 17 seconds - replication_delay is 0
[01:03:24] RECOVERY - Check health of redis instance on 6380 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6380 has 1 databases (db0) with 8797182 keys, up 3 minutes 17 seconds - replication_delay is 0
[01:03:34] RECOVERY - Check health of redis instance on 6480 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 4095377 keys, up 3 minutes 23 seconds - replication_delay is 0
[01:03:44] RECOVERY - Check health of redis instance on 6381 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6381 has 1 databases (db0) with 8705265 keys, up 3 minutes 41 seconds - replication_delay is 0
[01:06:26] Operations, ArchCom-RfC, Traffic, Services (designing): Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting - https://phabricator.wikimedia.org/T167906#3389372 (Anomie) You can't propose limiting concurrency based on IP while de...
[01:08:27] !log mwlog1001 - deleted /srv/xenon/logs from 2015 and 2016 as requested by Krinkle. Also merged https://gerrit.wikimedia.org/r/#/c/362114/ so now logs are retained for 14 days
[01:08:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:09:28] (CR) Dzahn: "i deleted these:" [puppet] - https://gerrit.wikimedia.org/r/362114 (https://phabricator.wikimedia.org/T166624) (owner: Krinkle)
[01:21:26] (PS1) Papaul: DNS: Add mgmt DNS entries for labtestcontrol2003,labtestservices200[2-3] and labtestmetal2001 [dns] - https://gerrit.wikimedia.org/r/362115
[01:38:40] (CR) Dzahn: [C: 2] DNS: Add mgmt DNS entries for labtestcontrol2003,labtestservices200[2-3] and labtestmetal2001 [dns] - https://gerrit.wikimedia.org/r/362115 (owner: Papaul)
[01:51:40] (PS1) Milimetric: [WIP don't merge yet, still working on the actual code to deploy] Clone wikistats v2 repository and link it to v2 [puppet] - https://gerrit.wikimedia.org/r/362118 (https://phabricator.wikimedia.org/T167684)
[01:57:00] what is the official doc/homepage of mod_php anyways? it's not an Apache project, but in our Apache module we have things like https://httpd.apache.org/docs/current/mod/mod_php5.html but that doesn't exist, heh
[01:57:15] and #httpd claims it never did :)
[02:03:11] #php also doesnt seem to know and install guides is as close as it gets. http://php.net/manual/en/install.unix.apache2.php oh well
[02:04:27] (CR) Krinkle: "@Filippo: Excuse the naive question, but why don't puppet 'require' dependencies work for that case? File > .. > Service, right?" [puppet] - https://gerrit.wikimedia.org/r/268598 (owner: Ori.livneh)
[02:14:29] (PS1) Dzahn: apache: add class for mod_php with PHP 7.0 for stretch [puppet] - https://gerrit.wikimedia.org/r/362119 (https://phabricator.wikimedia.org/T159756)
[02:17:14] (PS2) Dzahn: apache: add class for mod_php with PHP 7.0 for stretch [puppet] - https://gerrit.wikimedia.org/r/362119 (https://phabricator.wikimedia.org/T159756)
[02:22:32] (PS1) Dzahn: librenms: php7 [puppet] - https://gerrit.wikimedia.org/r/362122
[02:22:34] (PS1) Dzahn: librenms: use libapache2-mod-php7.0 if on stretch [puppet] - https://gerrit.wikimedia.org/r/362123 (https://phabricator.wikimedia.org/T159756)
[02:23:32] (Abandoned) Dzahn: librenms: php7 [puppet] - https://gerrit.wikimedia.org/r/362122 (owner: Dzahn)
[02:26:36] (PS3) Dzahn: apache: add class for mod_php with PHP 7.0 for stretch [puppet] - https://gerrit.wikimedia.org/r/362119 (https://phabricator.wikimedia.org/T159756)
[02:26:49] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.6) (duration: 09m 24s)
[02:27:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:27:22] (CR) Dzahn: librenms: use libapache2-mod-php7.0 if on stretch (1 comment) [puppet] - https://gerrit.wikimedia.org/r/362123 (https://phabricator.wikimedia.org/T159756) (owner: Dzahn)
[02:46:05] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.7) (duration: 07m 43s)
[02:46:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:52:57] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Jun 29 02:52:57 UTC 2017 (duration 6m 52s)
[02:53:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:53:10] (PS1) Dzahn: phabricator: add support for stretch and PHP7 [puppet] - https://gerrit.wikimedia.org/r/362124
[02:54:33] (CR) jerkins-bot: [V: -1] phabricator: add support for stretch and PHP7 [puppet] - https://gerrit.wikimedia.org/r/362124 (owner: Dzahn)
[02:54:47] (PS2) Dzahn: phabricator: add support for stretch and PHP7 [puppet] - https://gerrit.wikimedia.org/r/362124
[02:55:53] (CR) jerkins-bot: [V: -1] phabricator: add support for stretch and PHP7 [puppet] - https://gerrit.wikimedia.org/r/362124 (owner: Dzahn)
[02:56:18] (CR) Krinkle: phabricator: add support for stretch and PHP7 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/362124 (owner: Dzahn)
[02:56:35] mutante: Wee, Phab on PHP 7 in prod, that would be nice.
[02:57:46] Krinkle: i was moving another thing to stretch (tools on netmon1002 like librenms) and i had to always adjust the php package names.. so i grepped what else uses libapache-mod-php5 , might as well fix that too.. and phab came up
[02:58:05] we do have the stretch image on labs now.. so can test, yep
[02:59:17] For most things stretch isn't a big prio I imagine, but phab being a PHP service, would definitely benefit.
[02:59:21] https://secure.phabricator.com/T9640
[02:59:27] Make Phabricator compatible with PHP7
[02:59:30] Resolved upstream
[03:00:36] since January. so i guess why not. yes :) will try it
[03:02:45] (PS3) Dzahn: phabricator: add support for stretch and PHP7 [puppet] - https://gerrit.wikimedia.org/r/362124
[03:03:29] (PS4) Dzahn: phabricator: add support for stretch and PHP7 [puppet] - https://gerrit.wikimedia.org/r/362124
[03:04:29] (CR) Dzahn: "this requires https://gerrit.wikimedia.org/r/#/c/362119/ which adds mod-php7 class in Apache module" [puppet] - https://gerrit.wikimedia.org/r/362124 (owner: Dzahn)
[03:04:50] (CR) jerkins-bot: [V: -1] phabricator: add support for stretch and PHP7 [puppet] - https://gerrit.wikimedia.org/r/362124 (owner: Dzahn)
[03:05:56] (PS5) Dzahn: phabricator: add support for stretch and PHP7 [puppet] - https://gerrit.wikimedia.org/r/362124
[03:07:25] !log kartik@tin Started deploy [cxserver/deploy@6f0e9a7]: Update cxserver to e69353b
[03:07:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:09:54] !log kartik@tin Finished deploy [cxserver/deploy@6f0e9a7]: Update cxserver to e69353b (duration: 02m 28s)
[03:10:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:17:08] (PS7) Krinkle: Enable wgUsejQueryThree on the Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/348120 (https://phabricator.wikimedia.org/T124742)
[03:17:11] (PS8) Krinkle: Enable wgUsejQueryThree on the Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/348120 (https://phabricator.wikimedia.org/T124742)
[03:17:38] (PS9) Krinkle: Enable wgUsejQueryThree on the Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/348120 (https://phabricator.wikimedia.org/T124742)
[03:48:35] (PS1) Dzahn: netmon1002: disable Letsencrypt cert creation for migration [puppet] - https://gerrit.wikimedia.org/r/362126
[03:56:26] !log 'service hhvm restart' on mwdebug1001 and mwdebug1002 to help investigate T168540
[03:56:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:56:39] T168540: Understand APC size increase after HHVM upgrade/restart - https://phabricator.wikimedia.org/T168540
[03:58:13] (PS1) Dzahn: netmon: use existing role::network::monitor, clean up site.pp [puppet] - https://gerrit.wikimedia.org/r/362127
[04:26:14] Operations, Performance-Team: Understand APC size increase after HHVM upgrade/restart - https://phabricator.wikimedia.org/T168540#3389484 (Krinkle) Open>Resolved > 2017-06-15 08:02 moritzm: updating HHVM on terbium/wasat to 3.18 1GB increase in memory usage immediately following the upgrade: |...
[04:33:00] !log 'service hhvm restart' on mwdebug1001 and mwdebug1002 (T168540)
[04:33:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:33:12] T168540: Understand APC size increase after HHVM upgrade/restart - https://phabricator.wikimedia.org/T168540
[04:50:22] (CR) BBlack: [C: 1] 4.1.6-1wm2: new varnish-counters for transient storage [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/361845 (https://phabricator.wikimedia.org/T164768) (owner: Ema)
[06:14:24] PROBLEM - Check Varnish expiry mailbox lag on cp4015 is CRITICAL: CRITICAL: expiry mailbox lag is 2023383
[06:17:50] (PS1) Dzahn: wikimania_scholarships: add support for stretch and PHP7 [puppet] - https://gerrit.wikimedia.org/r/362137
[06:18:52] (CR) Dzahn: wikimania_scholarships: add support for stretch and PHP7 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/362137 (owner: Dzahn)
[06:20:15] (CR) Dzahn: "more use cases that depend on this: https://gerrit.wikimedia.org/r/#/c/362124/ , https://gerrit.wikimedia.org/r/#/c/362137/ ," [puppet] - https://gerrit.wikimedia.org/r/362119 (https://phabricator.wikimedia.org/T159756) (owner: Dzahn)
[06:23:44] PROBLEM - Check Varnish expiry mailbox lag on cp4013 is CRITICAL: CRITICAL: expiry mailbox lag is 2081661
[06:29:34] PROBLEM - Confd template for /etc/dsh/group/mediawiki-installation on bast3002 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:30:24] RECOVERY - Confd template for /etc/dsh/group/mediawiki-installation on bast3002 is OK: No errors detected
[06:34:04] PROBLEM - dhclient process on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:04] PROBLEM - Check whether ferm is active by checking the default input chain on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:04] PROBLEM - Check systemd state on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:04] PROBLEM - salt-minion processes on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:04] PROBLEM - Check size of conntrack table on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:04] PROBLEM - DPKG on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:05] PROBLEM - configured eth on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:14] PROBLEM - Disk space on bast3002 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:34:14] PROBLEM - Confd template for /etc/dsh/group/jobrunner on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:14] PROBLEM - MD RAID on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:15] PROBLEM - confd service on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:15] PROBLEM - puppet last run on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:34] PROBLEM - Confd template for /etc/dsh/group/mediawiki-installation on bast3002 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:34:44] PROBLEM - Confd template for /etc/dsh/group/cassandra on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:44] PROBLEM - Confd template for /etc/dsh/group/parsoid on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:37:04] RECOVERY - salt-minion processes on bast3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:37:04] RECOVERY - dhclient process on bast3002 is OK: PROCS OK: 0 processes with command name dhclient
[06:37:04] RECOVERY - Check whether ferm is active by checking the default input chain on bast3002 is OK: OK ferm input default policy is set
[06:37:04] RECOVERY - Check size of conntrack table on bast3002 is OK: OK: nf_conntrack is 0 % full
[06:37:04] RECOVERY - configured eth on bast3002 is OK: OK - interfaces up
[06:37:04] RECOVERY - DPKG on bast3002 is OK: All packages OK
[06:37:04] RECOVERY - Disk space on bast3002 is OK: DISK OK
[06:37:05] RECOVERY - Check systemd state on bast3002 is OK: OK - running: The system is fully operational
[06:37:05] RECOVERY - Confd template for /etc/dsh/group/jobrunner on bast3002 is OK: No errors detected
[06:37:06] RECOVERY - confd service on bast3002 is OK: OK - confd is active
[06:37:06] RECOVERY - MD RAID on bast3002 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[06:37:07] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 25 minutes ago with 0 failures
[06:37:24] RECOVERY - Confd template for /etc/dsh/group/mediawiki-installation on bast3002 is OK: No errors detected
[06:37:44] RECOVERY - Confd template for /etc/dsh/group/parsoid on bast3002 is OK: No errors detected
[06:37:45] RECOVERY - Confd template for /etc/dsh/group/cassandra on bast3002 is OK: No errors detected
[07:02:48] Operations, ops-esams: bast3002 sdb broken - https://phabricator.wikimedia.org/T169035#3389595 (ema) p:Triage>High
[07:03:59] !log joal@tin Started deploy [analytics/refinery@f6cccf9]: (no justification provided)
[07:04:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:10] (PS1) Marostegui: db-eqiad.php: Depool db1028 [mediawiki-config] - https://gerrit.wikimedia.org/r/362139 (https://phabricator.wikimedia.org/T166208)
[07:06:45] Operations, ops-esams: bast3002 sdb broken - https://phabricator.wikimedia.org/T169035#3389599 (ema) Today Jun 29th at 07:34 AM bast3002 was entirely unreachable for about 3 minutes. During that time, I've logged in in console to find kernel logs such as those posted by @fgiunchedi above. @Volans suggest...
[07:07:31] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1028 [mediawiki-config] - https://gerrit.wikimedia.org/r/362139 (https://phabricator.wikimedia.org/T166208) (owner: Marostegui)
[07:08:51] (Merged) jenkins-bot: db-eqiad.php: Depool db1028 [mediawiki-config] - https://gerrit.wikimedia.org/r/362139 (https://phabricator.wikimedia.org/T166208) (owner: Marostegui)
[07:09:00] (CR) jenkins-bot: db-eqiad.php: Depool db1028 [mediawiki-config] - https://gerrit.wikimedia.org/r/362139 (https://phabricator.wikimedia.org/T166208) (owner: Marostegui)
[07:09:54] !log joal@tin Finished deploy [analytics/refinery@f6cccf9]: (no justification provided) (duration: 05m 55s)
[07:10:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:07] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1028 - T166208 (duration: 00m 47s)
[07:10:08] !log elukey@tin Started deploy [analytics/refinery@f6cccf9]: Updated stat1002 with the last refinery deployment
[07:10:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:17] T166208: Convert unique keys into primary keys for some wiki tables on s7 - https://phabricator.wikimedia.org/T166208
[07:10:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:16] (PS1) Jcrespo: mariadb: Change default socket location for analytics slave [puppet] - https://gerrit.wikimedia.org/r/362140 (https://phabricator.wikimedia.org/T148507)
[07:12:44] !log elukey@tin Finished deploy [analytics/refinery@f6cccf9]: Updated stat1002 with the last refinery deployment (duration: 02m 36s)
[07:12:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:21] (CR) Marostegui: [C: 1] mariadb: Change default socket location for analytics slave [puppet] - https://gerrit.wikimedia.org/r/362140 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[07:13:45] PROBLEM - Check Varnish expiry mailbox lag on cp4005 is CRITICAL: CRITICAL: expiry mailbox lag is 2028972
[07:13:45] !log Deploy alter table on s7 - db1028 - T166208
[07:13:49] (CR) Jcrespo: [C: 2] mariadb: Change default socket location for analytics slave [puppet] - https://gerrit.wikimedia.org/r/362140 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[07:13:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:32] !log elukey@tin Started deploy [analytics/refinery@f6cccf9]: Updated stat1002 with the last refinery deployment
[07:16:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:09] !log Disable event scheduler temporarily on dbstore1001 - T169050
[07:18:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:18] T169050: dbstore1001 mysql crashed with: semaphore wait has lasted > 600 seconds - https://phabricator.wikimedia.org/T169050
[07:19:26] !log elukey@tin Finished deploy [analytics/refinery@f6cccf9]: Updated stat1002 with the last refinery deployment (duration: 02m 55s)
[07:19:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:44] RECOVERY - MariaDB Slave IO: s2 on db1047 is OK: OK slave_io_state Slave_IO_Running: Yes
[07:22:45] RECOVERY - MariaDB Slave Lag: s2 on db1047 is OK: OK slave_sql_lag Replication lag: 0.27 seconds
[07:22:54] RECOVERY - MariaDB Slave SQL: s2 on db1047 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[07:23:04] RECOVERY - MariaDB Slave SQL: s1 on db1047 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[07:23:04] RECOVERY - MariaDB Slave IO: s1 on db1047 is OK: OK slave_io_state Slave_IO_Running: Yes
[07:23:23] marostegui: --^ it is fighting back :D
[07:23:40] thanks marostegui for taking all the credit :-P
[07:27:18] haha
[07:27:29] I didn't do anything!
[07:28:44] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[analytics/refinery]
[07:29:11] this is me --^
[07:33:59] (PS1) Ayounsi: Depool codfw for asw-a-codfw switch upgrade [dns] - https://gerrit.wikimedia.org/r/362141 (https://phabricator.wikimedia.org/T168462)
[07:36:21] !log depooled kafka2001.codfw.wmnet for T168462
[07:36:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:32] T168462: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462
[07:37:19] (PS1) Ayounsi: Route cache traffic around codfw for asw-a-codfw switch upgrade [puppet] - https://gerrit.wikimedia.org/r/362145 (https://phabricator.wikimedia.org/T168462)
[07:38:10] (CR) Ema: [C: 1] Depool codfw for asw-a-codfw switch upgrade [dns] - https://gerrit.wikimedia.org/r/362141 (https://phabricator.wikimedia.org/T168462) (owner: Ayounsi)
[07:39:10] (CR) Ema: [C: 1] Route cache traffic around codfw for asw-a-codfw switch upgrade [puppet] - https://gerrit.wikimedia.org/r/362145 (https://phabricator.wikimedia.org/T168462) (owner: Ayounsi)
[07:41:53] Operations, Discovery, Maps, Traffic, Interactive-Sprint: Rate-limit browsers without referers - https://phabricator.wikimedia.org/T154704#2921080 (Gehel) After some discussion with @ema and @BBlack: //TL;DR - there's lots of fancy thoughts to have about the long term, but pragmatically ther...
[07:42:44] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[07:42:50] (CR) Ayounsi: [C: 2] Depool codfw for asw-a-codfw switch upgrade [dns] - https://gerrit.wikimedia.org/r/362141 (https://phabricator.wikimedia.org/T168462) (owner: Ayounsi)
[07:43:02] (CR) Ayounsi: [C: 2] Route cache traffic around codfw for asw-a-codfw switch upgrade [puppet] - https://gerrit.wikimedia.org/r/362145 (https://phabricator.wikimedia.org/T168462) (owner: Ayounsi)
[07:44:29] !log codfw depooled from DNS - T168462
[07:44:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:38] T168462: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462
[07:46:38] !log ema@neodymium conftool action : set/pooled=no; selector: name=acamar.wikimedia.org,service=pdns_recursor
[07:46:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:11] !log Route cache traffic around codfw - T168462
[07:47:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:38] PROBLEM - LVS HTTP IPv4 on eventbus.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:50:28] RECOVERY - LVS HTTP IPv4 on eventbus.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1488 bytes in 3.238 second response time
[07:50:31] ok this one is strange --^
[07:50:40] I am guessing that is was due to acamar
[07:53:44] RECOVERY - Check Varnish expiry mailbox lag on cp4005 is OK: OK: expiry mailbox lag is 0
[07:57:37] !log switching citoid and restbase-async temporarily to eqiad for T168462
[07:57:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:47] T168462: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462
[07:59:52] Operations, Puppet, Beta-Cluster-Infrastructure, Technical-Debt, Tracking: Minimize differences between beta and production (Tracking) - https://phabricator.wikimedia.org/T87220#3389706 (hashar)
[08:01:14] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:04:22] Operations, Performance-Team, Traffic, Varnish: Collect Backend-Timing in Graphite - https://phabricator.wikimedia.org/T131894#3389745 (Gilles) a:Gilles>None
[08:05:31] !log bounce pybal on codfw secondary LVSs (lvs2004-2006)
[08:05:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:06:33] (CR) Hashar: [C: -1] Add 3d2png deploy repo to image scalers [puppet] - https://gerrit.wikimedia.org/r/345377 (https://phabricator.wikimedia.org/T160185) (owner: MarkTraceur)
[08:07:14] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[08:07:53] !log volans@neodymium conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-async
[08:08:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:08:05] (PS1) Joal: Add cron job dropping webrequest from druid [puppet] - https://gerrit.wikimedia.org/r/362148 (https://phabricator.wikimedia.org/T168614)
[08:08:15] elukey: --^ when you have a minute
[08:08:17] !log volans@neodymium conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=citoid
[08:08:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:52] Operations, Patch-For-Review: setup netmon1002.wikimedia.org - https://phabricator.wikimedia.org/T159756#3077541 (fgiunchedi) >>! In T159756#3388460, @Dzahn wrote: > So yea.. eh @akosiaris any idea how much work that kind of change would be? An alternative to get unblocked now would be to upload `python...
[08:14:09] !log volans@neodymium conftool action : set/ttl=60; selector: dnsdisc=citoid
[08:14:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:14:31] !log volans@neodymium conftool action : set/ttl=60; selector: dnsdisc=restbase-async
[08:14:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:36] !log volans@neodymium conftool action : set/pooled=false; selector: name=codfw,dnsdisc=citoid
[08:15:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:54] PROBLEM - pdfrender on scb1004 is CRITICAL: connect to address 10.64.48.29 and port 5252: Connection refused
[08:16:09] !log volans@neodymium conftool action : set/pooled=false; selector: name=codfw,dnsdisc=restbase-async
[08:16:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:20] elukey: pdfrender, can you quickly look at it if it's the known issue please? busy with the other stuff
[08:17:52] ack
[08:19:05] !log restart pdfrender on scb1004 - xpra issue
[08:19:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:54] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time
[08:21:33] thanks!
[08:22:40] (CR) Alexandros Kosiaris: [C: 1] Change lists.wikimedia.org SPF record to soft fail (~all) (1 comment) [dns] - https://gerrit.wikimedia.org/r/361501 (https://phabricator.wikimedia.org/T167703) (owner: Herron)
[08:25:05] !log failover codfw LVSs to secondaries T168462
[08:25:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:15] T168462: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462
[08:28:15] PROBLEM - pybal on lvs2002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal
[08:28:24] PROBLEM - PyBal backends health check on lvs2001 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090
[08:28:34] PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090
[08:28:44] PROBLEM - pybal on lvs2001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal
[08:28:44] PROBLEM - pybal on lvs2003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal
[08:29:14] RECOVERY - pybal on lvs2002 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal
[08:29:19] !log asw-a-codfw upgrade started - T168462
[08:29:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:47] (CR) Filippo Giunchedi: "> @Filippo: Excuse the naive question, but why don't puppet 'require'" [puppet] - https://gerrit.wikimedia.org/r/268598 (owner: Ori.livneh)
[08:29:59] wait why did pybal restart on lvs2002??
[08:33:27] (CR) Filippo Giunchedi: [C: 1] Add firewall rules for pinkunicorn [puppet] - https://gerrit.wikimedia.org/r/361844 (https://phabricator.wikimedia.org/T169039) (owner: Ema)
[08:34:14] PROBLEM - pybal on lvs2002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal
[08:34:16] !log Shutdown MySQL and reboot db1034 for maintenance
[08:34:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:34] PROBLEM - PyBal backends health check on lvs2002 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090
[08:35:03] (CR) Filippo Giunchedi: [C: 1] apache: add class for mod_php with PHP 7.0 for stretch [puppet] - https://gerrit.wikimedia.org/r/362119 (https://phabricator.wikimedia.org/T159756) (owner: Dzahn)
[08:35:17] (PS1) Joal: Update hadoop fair scheduler queues [puppet] - https://gerrit.wikimedia.org/r/362151 (https://phabricator.wikimedia.org/T156841)
[08:35:30] elukey: another one for you --^ :)
[08:35:52] joal on fire today :D
[08:36:14] * joal moves as many stuff as possible before leaving for more than a month :D
[08:36:24] (PS1) Marostegui: db1034.yaml: Remove old socket location [puppet] - https://gerrit.wikimedia.org/r/362152 (https://phabricator.wikimedia.org/T148507)
[08:36:29] will review both code reviews in a bit!
[08:36:32] ACKNOWLEDGEMENT - PyBal backends health check on lvs2001 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 Ema pybal stopped to failover to secondaries T168462 [08:36:33] ACKNOWLEDGEMENT - pybal on lvs2001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal Ema pybal stopped to failover to secondaries T168462 [08:36:33] ACKNOWLEDGEMENT - PyBal backends health check on lvs2002 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 Ema pybal stopped to failover to secondaries T168462 [08:36:33] ACKNOWLEDGEMENT - pybal on lvs2002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal Ema pybal stopped to failover to secondaries T168462 [08:36:33] ACKNOWLEDGEMENT - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 Ema pybal stopped to failover to secondaries T168462 [08:36:33] ACKNOWLEDGEMENT - pybal on lvs2003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal Ema pybal stopped to failover to secondaries T168462 [08:36:34] (CR) Filippo Giunchedi: [C: 1] netmon: use existing role::network::monitor, clean up site.pp [puppet] - https://gerrit.wikimedia.org/r/362127 (owner: Dzahn) [08:37:43] (CR) Jcrespo: [C: 1] db1034.yaml: Remove old socket location [puppet] - https://gerrit.wikimedia.org/r/362152 (https://phabricator.wikimedia.org/T148507) (owner: Marostegui) [08:42:20] (CR) Marostegui: [C: 2] db1034.yaml: Remove old socket location [puppet] - https://gerrit.wikimedia.org/r/362152 (https://phabricator.wikimedia.org/T148507) (owner: Marostegui) [08:44:57] (PS1) Gehel: scap3 - deployment of packge requires configuration to already exist [puppet] - https://gerrit.wikimedia.org/r/362155 (https://phabricator.wikimedia.org/T169011) [08:45:56] (CR) jerkins-bot: [V: -1] scap3 - 
deployment of packge requires configuration to already exist [puppet] - https://gerrit.wikimedia.org/r/362155 (https://phabricator.wikimedia.org/T169011) (owner: Gehel) [08:46:58] (PS2) Gehel: scap3 - deployment of packge requires configuration to already exist [puppet] - https://gerrit.wikimedia.org/r/362155 (https://phabricator.wikimedia.org/T169011) [08:47:35] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) is CRITICAL: Test Random title redirect returned the unexpected status 504 (expecting: 303): /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 504 (expecting: 200) [08:47:44] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) is CRITICAL: Test Random title redirect returned the unexpected status 504 (expecting: 303) [08:47:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) is CRITICAL: Test Random title redirect returned the unexpected status 504 (expecting: 303): /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 504 (expecting: 200) [08:47:55] mmmmm [08:48:14] PROBLEM - configured eth on lvs2006 is CRITICAL: eth1 reporting no carrier. 
[08:48:24] PROBLEM - PyBal backends health check on lvs2006 is CRITICAL: PYBAL CRITICAL - eventbus_8085 - Could not depool server kafka2002.codfw.wmnet because of too many down!: trendingedits_6699 - Could not depool server scb2005.codfw.wmnet because of too many down!: prometheus_80 - Could not depool server prometheus2003.codfw.wmnet because of too many down!: wdqs_80 - Could not depool server wdqs2003.codfw.wmnet because of too many d [08:48:24] - Could not depool server scb2006.codfw.wmnet because of too many down!: search_9200 - Could not depool server elastic2017.codfw.wmnet because of too many down!: ores_8081 - Could not depool server scb2003.codfw.wmnet because of too many down!: swift-https_443 - Could not depool server ms-fe2005.codfw.wmnet because of too many down!: kartotherian_6533 - Could not depool server maps2001.codfw.wmnet because of too many down!: mo [08:48:24] d not depool server scb2003.codfw.wmnet because of too many down!: eventstreams_8092 - Could not depool server scb2003.codfw.wmnet because of too many down!: swift_80 - Could not depool [08:48:24] PROBLEM - configured eth on lvs2004 is CRITICAL: eth1 reporting no carrier. 
[08:48:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [08:48:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [08:48:44] PROBLEM - Mathoid LVS codfw on mathoid.svc.codfw.wmnet is CRITICAL: /{format}/ (mass-energy equivalence (svg)) timed out before a response was received: / (mass-energy equivalence (json)) timed out before a response was received [08:48:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [08:48:44] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [08:48:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received [08:48:49] PROBLEM - Kartotherian LVS codfw on kartotherian.svc.codfw.wmnet is CRITICAL: /{src}/info.json (tile service info for osm-intl) timed out before a response was received [08:48:49] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 112, down: 2, dormant: 0, excluded: 0, unused: 0; ae1: down - Core: asw-a-codfw:ae2; et-0/0/0: down - Core: asw-a-codfw:et-7/0/52 {#10706} [40Gbps DF] [08:48:54] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received: /_info (retrieve service info) timed out before a response was received [08:48:54] PROBLEM - configured eth on lvs2005 is CRITICAL: eth1 reporting no carrier. 
[08:48:58] PROBLEM - LVS HTTP IPv4 on prometheus.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.25 and port 80: No route to host [08:48:59] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received: / (spec from root) timed out before a response was received: /{domain}/v1/feed/onthisday/{type}/{mm}/{dd} (retrieve all events on January 15) timed out before a response was received: /{domain}/v1/feed/onthisday/{type}/ [08:48:59] the selected anniversaries for January 15) timed out before a response was received: /{domain}/v1/page/mobile-sections/{title} (retrieve en.wp main page via mobile-sections) timed out before a response was received: /{domain}/v1/page/definition/{title} (retrieve en-wiktionary definitions for cat) timed out before a response was received: /{domain}/v1/page/most-read/{yyyy}/{mm}/{dd} (retrieve the most-read articles for January [08:48:59] ated=true)) timed out before a response was received [08:49:14] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v1/dictionary/{word}/{from}/{to}{/provider} (Fetch dictionay meaning with a given provider) timed out before a response was received: /v1/dictionary/{word}/{from}/{to}{/provider} (Fetch dictionay meaning without specifying a provider) timed out before a response was received: / (spec from root) timed out before a response was received: /v1/list/{tool}{/from [08:49:14] ool between two language pairs) timed out before a response was received: /_info/home (redirect to the home page) is CRITICAL: Could not fetch url http://cxserver.svc.codfw.wmnet:8080/_info/home: Generic connection error: HTTPConnectionPool(host=ucxserver.svc.codfw.wmnet, port=8080): Max retries exceeded with url: /_info/home (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7fdf6c5cac50: Failed to es [08:49:14] tion: [Errno 113] No 
route to host,)): /v1/mt/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium.) timed out before a response was received: /_info/version (retrieve service version) timed out before a response was received [08:49:48] PROBLEM - LVS HTTP IPv4 on search.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.30 and port 9243: No route to host [08:49:48] PROBLEM - restbase endpoints health on restbase2012 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [08:49:48] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [08:50:28] PROBLEM - LVS HTTP IPv4 on mathoid.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.20 and port 10042: No route to host [08:50:33] PROBLEM - LVS HTTP IPv4 on ms-fe.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.27 and port 80: No route to host [08:50:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [08:50:54] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [08:50:58] RECOVERY - LVS HTTP IPv4 on search.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 506 bytes in 0.167 second response time [08:51:19] PROBLEM - LVS HTTP IPv4 on cxserver.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.18 and port 8080: No route to host [08:51:45] RECOVERY - Mathoid LVS codfw on mathoid.svc.codfw.wmnet is OK: All endpoints are healthy [08:51:49] RECOVERY - Kartotherian LVS codfw on kartotherian.svc.codfw.wmnet is OK: All endpoints are healthy [08:51:49] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy [08:51:49] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are 
healthy [08:51:49] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [08:51:49] RECOVERY - restbase endpoints health on restbase2012 is OK: All endpoints are healthy [08:51:49] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [08:51:51] what was that? [08:52:04] RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy [08:52:04] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy [08:52:04] RECOVERY - configured eth on lvs2005 is OK: OK - interfaces up [08:52:07] gehel: codfw switch upgrade [08:52:09] RECOVERY - LVS HTTP IPv4 on prometheus.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 156 bytes in 0.073 second response time [08:52:18] RECOVERY - LVS HTTP IPv4 on cxserver.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 904 bytes in 0.075 second response time [08:52:35] RECOVERY - configured eth on lvs2006 is OK: OK - interfaces up [08:52:38] XioNoX: Oh, of course! [08:52:38] XioNoX: lvs2005? [08:52:38] RECOVERY - LVS HTTP IPv4 on mathoid.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.075 second response time [08:52:39] RECOVERY - configured eth on lvs2004 is OK: OK - interfaces up [08:52:48] RECOVERY - LVS HTTP IPv4 on ms-fe.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.387 second response time [08:53:14] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [08:53:15] PROBLEM - HHVM rendering on mw2251 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:53:15] (PS1) Jcrespo: mariadb: handle service for systemd -autostart and overrides [puppet] - https://gerrit.wikimedia.org/r/362156 (https://phabricator.wikimedia.org/T168356) [08:53:32] ema: that server is in row B... 
https://racktables.wikimedia.org/index.php?page=object&tab=default&object_id=2477 [08:54:04] ema: are there cross row cablings? [08:54:05] 10:48 < icinga-wm_> PROBLEM - configured eth on lvs2006 is CRITICAL: eth1 reporting no carrier. [08:54:33] (CR) jerkins-bot: [V: -1] mariadb: handle service for systemd -autostart and overrides [puppet] - https://gerrit.wikimedia.org/r/362156 (https://phabricator.wikimedia.org/T168356) (owner: Jcrespo) [08:54:34] PROBLEM - puppet last run on cp2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:54:37] PROBLEM - Host ganeti2006 is DOWN: PING CRITICAL - Packet loss = 100% [08:54:37] PROBLEM - Host ganeti2008 is DOWN: PING CRITICAL - Packet loss = 100% [08:54:37] PROBLEM - Host ganeti2007 is DOWN: PING CRITICAL - Packet loss = 100% [08:54:37] PROBLEM - Host planet2001 is DOWN: PING CRITICAL - Packet loss = 100% [08:54:37] PROBLEM - Host ganeti2005 is DOWN: PING CRITICAL - Packet loss = 100% [08:54:37] PROBLEM - Host acrux is DOWN: PING CRITICAL - Packet loss = 100% [08:54:45] PROBLEM - Host kubetcd2002 is DOWN: PING CRITICAL - Packet loss = 100% [08:54:45] PROBLEM - Host kubetcd2001 is DOWN: PING CRITICAL - Packet loss = 100% [08:54:55] PROBLEM - Host sca2004 is DOWN: PING CRITICAL - Packet loss = 100% [08:55:04] PROBLEM - Host ns1-v6 is DOWN: PING CRITICAL - Packet loss = 100% [08:55:07] XioNoX: lvs2004/2006 are fine now, but indeed that 'eth1 reporting no carrier' earlier on didn't look good [08:55:08] PROBLEM - LVS HTTP IPv4 on kartotherian.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:55:18] PROBLEM - Kartotherian LVS codfw on kartotherian.svc.codfw.wmnet is CRITICAL: /v4/marker/{base}-{size}-{symbol}+{color}@{scale}x.png (scaled pushpin marker with an icon) timed out before a response was received: /{src}/{z}/{x}/{y}.{format} (get a tile in the middle of the ocean, with overzoom) timed out before a response was received: /_info (test 
for /_info) timed out before a response was received: /{src}/info.json (tile ser [08:55:18] tl) timed out before a response was received [08:55:18] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [08:55:18] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [08:55:18] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [08:55:18] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [08:55:18] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [08:55:19] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [08:55:19] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [08:55:20] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [08:55:20] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed 
out before a response was received [08:55:21] PROBLEM - Mathoid LVS codfw on mathoid.svc.codfw.wmnet is CRITICAL: /{format}/ (mass-energy equivalence (complete)) timed out before a response was received: /{format}/ (mass-energy equivalence (svg)) timed out before a response was received: /{format}/ (mass-energy equivalence (texvcinfo)) timed out before a response was received: /{format}/ (Invaid command (texvcinfo)) timed out before a response was received: /_info (retrie [08:55:34] PROBLEM - Host ns1-v4 is DOWN: PING CRITICAL - Packet loss = 100% [08:55:44] PROBLEM - haproxy failover on dbproxy1002 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [08:55:58] PROBLEM - LVS HTTP IPv4 on mobileapps.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:55:58] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v1/list/pair/{from}/{to} (Get the tools between two language pairs) timed out before a response was received: /v1/list/tool/{tool} (Get the tools for all language pairs) timed out before a response was received: / (spec from root) timed out before a response was received: /_info/home (redirect to the home page) timed out before a response was received: /v1/ [08:55:58] ider} (Machine translate an HTML fragment using Apertium.) 
timed out before a response was received: /_info/version (retrieve service version) timed out before a response was received [08:55:58] PROBLEM - HHVM rendering on mw2180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:56:03] RECOVERY - LVS HTTP IPv4 on kartotherian.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 2224 bytes in 0.104 second response time [08:56:09] (PS2) Jcrespo: mariadb: handle service for systemd -autostart and overrides [puppet] - https://gerrit.wikimedia.org/r/362156 (https://phabricator.wikimedia.org/T168356) [08:56:14] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [08:56:14] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [08:56:14] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [08:56:15] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [08:56:24] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [08:56:44] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received: /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp Altrincham page via mobile-sections-lead) timed out before a response was received: /_info (retrieve service info) timed out before a respon [08:56:44] omain}/v1/feed/onthisday/{type}/{mm}/{dd} (retrieve all events on January 15) timed out before a response was received: /{domain}/v1/page/featured/{yyyy}/{mm}/{dd} (retrieve title of the featured article for April 29, 2016) timed out before a response was received: /{domain}/v1/page/mobile-sections/{title} (retrieve en.wp main page via mobile-sections) timed out before a response was received: 
/{domain}/v1/page/most-read/{yyyy [08:56:44] e the most-read articles for January 1, 2016 (with aggregated=true)) timed out before a response was received: /{domain}/v1/page/most-read/{yyyy}/{mm}/{dd} (retrieve most-read articles for date with no data (with aggregated=true)) timed out before a response was received [08:56:48] RECOVERY - LVS HTTP IPv4 on mobileapps.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 960 bytes in 0.074 second response time [08:56:54] RECOVERY - HHVM rendering on mw2180 is OK: HTTP OK: HTTP/1.1 200 OK - 75958 bytes in 6.537 second response time [08:57:04] PROBLEM - puppet last run on ms-be2025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:57:14] PROBLEM - puppet last run on cp2022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:57:14] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:57:24] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [08:57:24] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [08:57:24] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [08:57:24] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [08:57:27] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Dinka Wikipedia - https://phabricator.wikimedia.org/T168518#3389851 (Amire80) Hi. Any blockers here? It has been silent for a week. 
[08:57:44] PROBLEM - HHVM rendering on mw2255 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:57:54] PROBLEM - HHVM rendering on mw2104 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:58:04] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (Zotero alive) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [08:58:14] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (Zotero alive) timed out before a response was received [08:58:14] PROBLEM - Host ripe-atlas-codfw is DOWN: PING CRITICAL - Packet loss = 100% [08:58:15] PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:58:24] PROBLEM - puppet last run on ms-be2030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:58:24] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (Scrapes sample page) timed out before a response was received [08:58:24] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (Scrapes sample page) timed out before a response was received [08:58:34] PROBLEM - HHVM rendering on mw2194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:58:35] RECOVERY - HHVM rendering on mw2255 is OK: HTTP OK: HTTP/1.1 200 OK - 75956 bytes in 0.501 second response time [08:58:44] RECOVERY - HHVM rendering on mw2104 is OK: HTTP OK: HTTP/1.1 200 OK - 75958 bytes in 6.547 second response time [08:59:15] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:59:24] PROBLEM - puppet last run on ganeti2003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [08:59:24] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [08:59:24] PROBLEM - HHVM rendering on mw2126 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:59:28] PROBLEM - LVS HTTP IPv4 on trendingedits.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:59:29] RECOVERY - HHVM rendering on mw2194 is OK: HTTP OK: HTTP/1.1 200 OK - 75958 bytes in 7.543 second response time [08:59:34] PROBLEM - HHVM rendering on mw2130 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:59:34] PROBLEM - puppet last run on cp2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:59:34] PROBLEM - IPsec on cp3038 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2002_v4, cp2002_v6 [08:59:44] PROBLEM - IPsec on mc1020 is CRITICAL: Strongswan CRITICAL - ok: 0 not-conn: mc2020_v4 [08:59:44] RECOVERY - Mathoid LVS codfw on mathoid.svc.codfw.wmnet is OK: All endpoints are healthy [08:59:44] PROBLEM - IPsec on cp3034 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2002_v4, cp2002_v6 [08:59:54] PROBLEM - puppet last run on mc2025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:00:05] PROBLEM - puppet last run on cp2025 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:00:05] PROBLEM - IPsec on cp4016 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp2001_v4, cp2001_v6 [09:00:05] PROBLEM - IPsec on cp4010 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp2001_v4, cp2001_v6 [09:00:05] PROBLEM - IPsec on cp4017 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp2001_v4, cp2001_v6 [09:00:05] PROBLEM - IPsec on cp4018 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp2001_v4, cp2001_v6 [09:00:05] PROBLEM - IPsec on cp4014 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2002_v4, cp2002_v6 [09:00:10] please ignore the IPsec spam, I'm trying to ACK those as we go but they're many! [09:00:14] PROBLEM - HHVM rendering on mw2136 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:00:14] PROBLEM - HHVM rendering on mw2112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:00:14] PROBLEM - HHVM rendering on mw2133 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:00:14] PROBLEM - IPsec on cp4013 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2002_v4, cp2002_v6 [09:00:14] PROBLEM - IPsec on cp4015 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2002_v4, cp2002_v6 [09:00:14] PROBLEM - HHVM rendering on mw2120 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:00:14] PROBLEM - puppet last run on cp2006 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:00:15] PROBLEM - IPsec on mc1019 is CRITICAL: Strongswan CRITICAL - ok: 0 not-conn: mc2019_v4 [09:00:19] RECOVERY - LVS HTTP IPv4 on trendingedits.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 957 bytes in 0.074 second response time [09:00:24] RECOVERY - HHVM rendering on mw2126 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.538 second response time [09:00:24] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [09:00:24] PROBLEM - IPsec on mc1022 is CRITICAL: Strongswan CRITICAL - ok: 0 not-conn: mc2022_v4 [09:00:24] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:00:34] PROBLEM - IPsec on mc1021 is CRITICAL: Strongswan CRITICAL - ok: 0 not-conn: mc2021_v4 [09:00:44] PROBLEM - IPsec on rdb1001 is CRITICAL: Strongswan CRITICAL - ok: 0 not-conn: rdb2001_v4 [09:01:04] RECOVERY - HHVM rendering on mw2112 is OK: HTTP OK: HTTP/1.1 200 OK - 75782 bytes in 0.515 second response time [09:01:04] RECOVERY - HHVM rendering on mw2136 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 1.519 second response time [09:01:04] PROBLEM - puppet last run on cp2016 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:01:08] PROBLEM - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:01:14] RECOVERY - HHVM rendering on mw2120 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 7.549 second response time [09:01:14] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp Altrincham page via mobile-sections-lead) timed out before a response was received: / (spec from root) timed out before a response was received: /{domain}/v1/page/featured/{yyyy}/{mm}/{dd} (retrieve featured article info for unsupported site (with aggregated=true)) timed out b [09:01:14] received: /{domain}/v1/page/definition/{title} (retrieve en-wiktionary definitions for cat) timed out before a response was received [09:01:24] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [09:01:34] RECOVERY - HHVM rendering on mw2130 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.621 second response time [09:01:34] PROBLEM - HHVM rendering on mw2163 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:01:48] PROBLEM - LVS HTTP IPv4 on prometheus.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:01:54] PROBLEM - HHVM rendering on mw2178 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:02:08] RECOVERY - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.080 second response time [09:02:08] RECOVERY - HHVM rendering on mw2133 is OK: HTTP OK: HTTP/1.1 200 OK - 75782 bytes in 0.520 second response time [09:02:18] PROBLEM - LVS HTTP IPv4 on citoid.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:02:18] PROBLEM - puppet last run on ms-be2033 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:02:34] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [09:02:48] RECOVERY - LVS HTTP IPv4 on prometheus.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 156 bytes in 0.072 second response time [09:02:48] RECOVERY - HHVM rendering on mw2163 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 7.533 second response time [09:02:49] RECOVERY - HHVM rendering on mw2178 is OK: HTTP OK: HTTP/1.1 200 OK - 75782 bytes in 0.512 second response time [09:02:54] PROBLEM - Mathoid LVS codfw on mathoid.svc.codfw.wmnet is CRITICAL: /{format}/ (mass-energy equivalence (mml)) timed out before a response was received: /{format}/ (Invaid command (texvcinfo)) timed out before a response was received [09:03:09] RECOVERY - LVS HTTP IPv4 on citoid.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 921 bytes in 0.077 second response time [09:03:14] PROBLEM - HHVM rendering on mw2189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:03:34] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [09:03:34] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:04:08] PROBLEM - ElasticSearch health check for shards on search.svc.codfw.wmnet is CRITICAL: CRITICAL - elasticsearch inactive shards 2198 threshold =0.1 breach: status: red, number_of_nodes: 27, unassigned_shards: 2094, number_of_pending_tasks: 2364, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 3077, task_max_waiting_in_queue_millis: 674059, cluster_name: production-search-codfw, relocating_shards: 0, acti [09:04:08] _number: 76.1966645008, active_shards: 7036, initializing_shards: 104, number_of_data_nodes: 27, delayed_unassigned_shards: 0 [09:04:09] PROBLEM - LVS HTTP IPv4 on eventstreams.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:04:09] RECOVERY - HHVM rendering on mw2189 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.542 second response time [09:04:24] RECOVERY - Graphoid LVS codfw on graphoid.svc.codfw.wmnet is OK: All endpoints are healthy [09:04:24] PROBLEM - puppet last run on cp2012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:04:34] PROBLEM - puppet last run on cp2011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:04:54] RECOVERY - LVS HTTP IPv4 on eventstreams.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 929 bytes in 0.074 second response time [09:05:14] PROBLEM - puppet last run on ms-be2026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:05:24] PROBLEM - puppet last run on ms-be2022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:05:24] PROBLEM - puppet last run on cp2008 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:05:34] PROBLEM - HHVM rendering on mw2185 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:05:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [09:05:54] PROBLEM - HHVM rendering on mw2211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:05:54] PROBLEM - puppet last run on cp2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:14] PROBLEM - puppet last run on ms-be2037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:14] PROBLEM - puppet last run on mc2028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:19] PROBLEM - LVS HTTP IPv4 on citoid.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:06:19] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v1/list/pair/{from}/{to} (Get the tools between two language pairs) timed out before a response was received: /v1/list/tool/{tool} (Get the tools for all language pairs) timed out before a response was received: / (root with wrong query param) timed out before a response was received: /_info/home (redirect to the home page) timed out before a response was r [09:06:19] (retrieve service name) timed out before a response was received [09:06:28] PROBLEM - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:06:34] RECOVERY - HHVM rendering on mw2185 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.542 second response time [09:06:34] PROBLEM - puppet last run on ms-be2035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:34] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:06:54] PROBLEM - HHVM rendering on mw2129 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:06:54] PROBLEM - HHVM rendering on mw2210 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:06:56] RECOVERY - HHVM rendering on mw2211 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 1.519 second response time [09:07:18] RECOVERY - LVS HTTP IPv4 on citoid.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 921 bytes in 0.074 second response time [09:07:23] RECOVERY - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.078 second response time [09:07:28] PROBLEM - LVS HTTP IPv4 on mathoid.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:07:44] RECOVERY - HHVM rendering on mw2129 is OK: HTTP OK: HTTP/1.1 200 OK - 75782 bytes in 0.525 second response time [09:07:54] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:54] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [09:07:54] RECOVERY - HHVM rendering on mw2210 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.541 second response time [09:08:04] PROBLEM - puppet last run on ms-be2015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:08:04] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:08:05] PROBLEM - Graphoid LVS codfw on graphoid.svc.codfw.wmnet is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) timed out before a response was received [09:08:09] PROBLEM - LVS HTTP IPv4 on prometheus.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:08:24] PROBLEM - puppet last run on ms-be2031 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:08:24] PROBLEM - LVS HTTP IPv4 on eventstreams.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:08:24] PROBLEM - puppet last run on ms-be2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:08:44] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy [09:09:04] PROBLEM - HHVM rendering on mw2196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:09:14] PROBLEM - HHVM rendering on mw2208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:09:54] PROBLEM - puppet last run on mc2029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:10:04] RECOVERY - HHVM rendering on mw2196 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.550 second response time [09:10:14] RECOVERY - HHVM rendering on mw2208 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.536 second response time [09:10:28] RECOVERY - LVS HTTP IPv4 on mathoid.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.074 second response time [09:10:28] PROBLEM - puppet last run on mc2027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:10:44] PROBLEM - puppet last run on ms-be2039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:10:44] PROBLEM - puppet last run on lvs2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:10:44] PROBLEM - puppet last run on ms-be2034 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:10:54] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [09:11:05] PROBLEM - Mathoid LVS codfw on mathoid.svc.codfw.wmnet is CRITICAL: /{format}/ (mass-energy equivalence (mml)) timed out before a response was received: /{format}/ (mass-energy equivalence (texvcinfo)) timed out before a response was received [09:11:18] PROBLEM - LVS HTTP IPv4 on cxserver.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:11:34] PROBLEM - HHVM rendering on mw2144 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:11:34] PROBLEM - puppet last run on ms-be2021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:11:54] PROBLEM - Kafka Broker Under Replicated Partitions on kafka2003 is CRITICAL: CRITICAL: 58.62% of data above the critical threshold [10.0] [09:12:04] PROBLEM - puppet last run on cp2020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:12:08] RECOVERY - LVS HTTP IPv4 on cxserver.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 904 bytes in 0.076 second response time [09:12:08] PROBLEM - HHVM rendering on mw2183 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:12:34] RECOVERY - HHVM rendering on mw2144 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.643 second response time [09:12:54] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:13:04] RECOVERY - HHVM rendering on mw2183 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 7.535 second response time [09:13:14] RECOVERY - Graphoid LVS codfw on graphoid.svc.codfw.wmnet is OK: All endpoints are healthy [09:13:34] PROBLEM - puppet last run on ms-be2027 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:13:34] PROBLEM - puppet last run on ms-be2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:13:44] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:13:54] PROBLEM - puppet last run on ms-be2032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:13:54] PROBLEM - puppet last run on ms-be2038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:14:04] PROBLEM - Kafka Broker Under Replicated Partitions on kafka2002 is CRITICAL: CRITICAL: 64.29% of data above the critical threshold [10.0] [09:14:04] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received [09:14:04] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (bad URL) timed out before a response was received [09:14:04] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [09:14:04] PROBLEM - HHVM rendering on mw2130 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:14:34] PROBLEM - HHVM rendering on mw2103 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:14:35] PROBLEM - puppet last run on cp2010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:14:35] PROBLEM - puppet last run on ms-be2020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:14:54] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [09:15:04] PROBLEM - puppet last run on cp2023 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:15:24] RECOVERY - HHVM rendering on mw2103 is OK: HTTP OK: HTTP/1.1 200 OK - 75782 bytes in 0.517 second response time [09:15:34] PROBLEM - HHVM rendering on mw2165 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:15:34] PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:15:44] PROBLEM - puppet last run on ms-be2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:15:54] PROBLEM - Graphoid LVS codfw on graphoid.svc.codfw.wmnet is CRITICAL: /_info (retrieve service info) timed out before a response was received: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) timed out before a response was received [09:15:55] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:16:04] RECOVERY - HHVM rendering on mw2130 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.292 second response time [09:16:24] RECOVERY - HHVM rendering on mw2165 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.523 second response time [09:16:34] PROBLEM - HHVM rendering on mw2207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:16:34] PROBLEM - puppet last run on mc2030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:16:34] PROBLEM - puppet last run on mc2035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:16:34] PROBLEM - HHVM rendering on mw2116 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:16:44] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:16:54] PROBLEM - puppet last run on ms-fe2007 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:17:04] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [09:17:04] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [09:17:08] RECOVERY - LVS HTTP IPv4 on prometheus.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 156 bytes in 0.073 second response time [09:17:24] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v1/list/tool/{tool} (Get the tools for all language pairs) timed out before a response was received: /v1/dictionary/{word}/{from}/{to}{/provider} (Fetch dictionay meaning with a given provider) timed out before a response was received: /v1/list/{tool}{/from}{/to} (Get the MT tool between two language pairs) timed out before a response was received: /_info/h [09:17:24] home page) timed out before a response was received [09:17:24] RECOVERY - HHVM rendering on mw2116 is OK: HTTP OK: HTTP/1.1 200 OK - 75782 bytes in 0.517 second response time [09:17:25] RECOVERY - HHVM rendering on mw2207 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.535 second response time [09:17:34] PROBLEM - HHVM rendering on mw2204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:17:38] PROBLEM - LVS HTTP IPv4 on ores.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:17:38] PROBLEM - puppet last run on mc2034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:17:44] PROBLEM - HHVM rendering on mw2107 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:17:54] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:18:04] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [09:18:04] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [09:18:14] PROBLEM - puppet last run on ms-be2014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:18:28] RECOVERY - LVS HTTP IPv4 on ores.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 4375 bytes in 0.075 second response time [09:18:34] RECOVERY - HHVM rendering on mw2204 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.289 second response time [09:18:44] PROBLEM - puppet last run on mc2023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:18:44] PROBLEM - puppet last run on cp2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:18:44] RECOVERY - HHVM rendering on mw2107 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 7.548 second response time [09:18:44] PROBLEM - puppet last run on cp2018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:19:14] PROBLEM - Graphoid LVS codfw on graphoid.svc.codfw.wmnet is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) timed out before a response was received [09:19:24] RECOVERY - LVS HTTP IPv4 on eventstreams.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 929 bytes in 0.074 second response time [09:20:04] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy [09:20:04] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:20:34] PROBLEM - HHVM rendering on mw2195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:20:34] PROBLEM - puppet last run on labtestneutron2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:20:44] PROBLEM - puppet last run on mc2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:20:44] PROBLEM - puppet last run on mw2108 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:20:54] PROBLEM - puppet last run on mw2255 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:21:08] PROBLEM - LVS HTTP IPv4 on trendingedits.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:21:14] PROBLEM - HHVM rendering on mw2099 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:21:24] PROBLEM - puppet last run on wtp2013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:21:24] PROBLEM - puppet last run on mw2111 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:21:28] PROBLEM - LVS HTTP IPv4 on prometheus.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:21:34] RECOVERY - HHVM rendering on mw2195 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.536 second response time [09:21:44] PROBLEM - puppet last run on labtestvirt2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:21:44] PROBLEM - puppet last run on wtp2014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:21:44] PROBLEM - puppet last run on wtp2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:21:44] PROBLEM - puppet last run on labtestnet2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:21:44] PROBLEM - puppet last run on db2041 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:21:44] PROBLEM - puppet last run on mw2141 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:21:58] PROBLEM - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:21:59] PROBLEM - puppet last run on db2074 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:21:59] PROBLEM - puppet last run on cp2009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:22:08] RECOVERY - LVS HTTP IPv4 on trendingedits.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 957 bytes in 0.074 second response time [09:22:08] PROBLEM - HHVM rendering on mw2203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:22:08] RECOVERY - Graphoid LVS codfw on graphoid.svc.codfw.wmnet is OK: All endpoints are healthy [09:22:09] PROBLEM - HHVM rendering on mw2194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:22:34] PROBLEM - puppet last run on mw2170 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:22:34] PROBLEM - puppet last run on mw2212 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:22:34] PROBLEM - puppet last run on db2016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:22:35] PROBLEM - puppet last run on elastic2032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:22:35] PROBLEM - puppet last run on mc2033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:22:44] PROBLEM - HHVM rendering on mw2207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:22:44] PROBLEM - puppet last run on restbase2012 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:22:44] PROBLEM - puppet last run on elastic2031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:22:44] PROBLEM - puppet last run on kafka2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:22:44] PROBLEM - puppet last run on labtestneutron2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:22:44] PROBLEM - puppet last run on cp2015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:22:54] RECOVERY - HHVM rendering on mw2203 is OK: HTTP OK: HTTP/1.1 200 OK - 75782 bytes in 0.513 second response time [09:23:04] PROBLEM - puppet last run on scb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:04] PROBLEM - puppet last run on wdqs2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:04] PROBLEM - puppet last run on ms-fe2008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:04] PROBLEM - puppet last run on ms-fe2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:05] RECOVERY - HHVM rendering on mw2194 is OK: HTTP OK: HTTP/1.1 200 OK - 75784 bytes in 6.536 second response time [09:23:28] RECOVERY - LVS HTTP IPv4 on prometheus.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 156 bytes in 0.073 second response time [09:23:28] PROBLEM - puppet last run on cp2014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:34] PROBLEM - HHVM rendering on mw2179 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:23:34] PROBLEM - puppet last run on db2065 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:23:34] PROBLEM - HHVM rendering on mw2198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:23:34] PROBLEM - puppet last run on mw2175 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:34] PROBLEM - puppet last run on mw2140 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:34] PROBLEM - puppet last run on wtp2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:44] PROBLEM - puppet last run on mw2191 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:44] PROBLEM - puppet last run on mw2109 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:44] PROBLEM - puppet last run on mw2116 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:44] PROBLEM - puppet last run on mw2099 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:44] PROBLEM - puppet last run on mc2032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:48] RECOVERY - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.080 second response time [09:23:49] PROBLEM - puppet last run on maps2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:49] PROBLEM - puppet last run on mw2130 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:49] PROBLEM - puppet last run on db2063 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:23:49] PROBLEM - LVS HTTP IPv4 on eventstreams.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:23:54] PROBLEM - puppet last run on ores2008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:54] PROBLEM - puppet last run on mw2206 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:54] PROBLEM - puppet last run on mw2178 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:54] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:23:54] PROBLEM - puppet last run on elastic2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:24] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy [09:24:34] RECOVERY - HHVM rendering on mw2179 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.543 second response time [09:24:38] PROBLEM - LVS HTTP IPv4 on cxserver.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:24:38] RECOVERY - HHVM rendering on mw2198 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.520 second response time [09:24:38] PROBLEM - puppet last run on conf2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:38] PROBLEM - puppet last run on wtp2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:44] RECOVERY - HHVM rendering on mw2207 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.536 second response time [09:24:44] PROBLEM - puppet last run on mw2188 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:44] PROBLEM - puppet last run on mw2161 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:24:44] PROBLEM - puppet last run on elastic2013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:44] PROBLEM - puppet last run on mw2184 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:44] PROBLEM - puppet last run on oresrdb2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:54] PROBLEM - puppet last run on mc2036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:54] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v1/list/tool/{tool} (Get the tools for all language pairs) timed out before a response was received: /v1/dictionary/{word}/{from}/{to}{/provider} (Fetch dictionay meaning with a given provider) timed out before a response was received: /v1/list/{tool}{/from}{/to} (Get the MT tool between two language pairs) timed out before a response was received: /_info/h [09:24:54] home page) timed out before a response was received [09:24:54] PROBLEM - puppet last run on wtp2012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:55] PROBLEM - puppet last run on db2052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:55] PROBLEM - puppet last run on mw2181 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:24:55] PROBLEM - puppet last run on mw2115 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:04] PROBLEM - puppet last run on cp2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:14] PROBLEM - puppet last run on db2049 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:25:14] PROBLEM - puppet last run on ganeti2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:14] PROBLEM - puppet last run on wtp2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:14] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:24] PROBLEM - restbase endpoints health on restbase2012 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [09:25:34] PROBLEM - HHVM rendering on mw2102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:25:34] PROBLEM - puppet last run on pc2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:34] PROBLEM - puppet last run on labstore2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:44] PROBLEM - puppet last run on eventlog2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:54] PROBLEM - puppet last run on elastic2021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:54] PROBLEM - puppet last run on mc2031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:54] PROBLEM - puppet last run on mw2134 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:54] PROBLEM - puppet last run on lvs4004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:25:54] PROBLEM - puppet last run on db2077 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:55] PROBLEM - puppet last run on cp2026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:56] PROBLEM - puppet last run on db2069 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:56] PROBLEM - puppet last run on mc2026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:25:56] PROBLEM - puppet last run on ms-be2018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:04] PROBLEM - puppet last run on mw2196 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:14] PROBLEM - puppet last run on mw2254 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:14] PROBLEM - puppet last run on wtp2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:14] PROBLEM - puppet last run on wtp2011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:14] PROBLEM - puppet last run on kubetcd2003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:26:24] RECOVERY - restbase endpoints health on restbase2012 is OK: All endpoints are healthy [09:26:24] RECOVERY - HHVM rendering on mw2099 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.547 second response time [09:26:29] PROBLEM - LVS HTTP IPv4 on trendingedits.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:26:30] RECOVERY - HHVM rendering on mw2102 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.537 second response time [09:26:44] PROBLEM - puppet last run on oresrdb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:44] PROBLEM - puppet last run on labstore2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:44] RECOVERY - LVS HTTP IPv4 on eventstreams.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 929 bytes in 0.074 second response time [09:26:44] PROBLEM - puppet last run on mw2156 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:54] PROBLEM - puppet last run on mw2100 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:54] PROBLEM - puppet last run on mw2133 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:54] PROBLEM - puppet last run on maps2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:26:58] PROBLEM - LVS HTTP IPv4 on mathoid.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:27:04] PROBLEM - puppet last run on mw2166 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:05] PROBLEM - puppet last run on mw2117 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:27:14] PROBLEM - puppet last run on ganeti2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:18] RECOVERY - LVS HTTP IPv4 on trendingedits.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 957 bytes in 0.074 second response time [09:27:24] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (Scrapes sample page) timed out before a response was received [09:27:27] for everybody reading - there is a network maintenance ongoing in codfw, no impact to users [09:27:34] PROBLEM - puppet last run on mw2159 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:44] PROBLEM - puppet last run on zosma is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:44] PROBLEM - puppet last run on mw2169 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:44] PROBLEM - puppet last run on mw2177 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:44] PROBLEM - puppet last run on kubernetes2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:44] PROBLEM - puppet last run on db2030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:44] PROBLEM - puppet last run on mw2190 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:44] PROBLEM - puppet last run on mw2135 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:54] PROBLEM - puppet last run on mw2209 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:54] PROBLEM - puppet last run on mw2132 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:27:54] PROBLEM - puppet last run on wtp2015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:27:54] PROBLEM - puppet last run on mw2112 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:28:04] PROBLEM - puppet last run on elastic2018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:28:14] PROBLEM - puppet last run on rdb2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:28:14] PROBLEM - puppet last run on wtp2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:28:28] RECOVERY - Kartotherian LVS codfw on kartotherian.svc.codfw.wmnet is OK: All endpoints are healthy [09:28:34] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received [09:28:34] PROBLEM - HHVM rendering on mw2100 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:28:34] PROBLEM - HHVM rendering on mw2179 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:28:48] RECOVERY - LVS HTTP IPv4 on cxserver.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 904 bytes in 0.075 second response time [09:28:48] RECOVERY - Mathoid LVS codfw on mathoid.svc.codfw.wmnet is OK: All endpoints are healthy [09:28:49] RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy [09:28:49] PROBLEM - puppet last run on pc2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:28:49] PROBLEM - puppet last run on mw2151 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:28:54] PROBLEM - HHVM rendering on mw2200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:28:54] PROBLEM - HHVM rendering on mw2172 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:28:54] PROBLEM - puppet last run on db2045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:28:54] PROBLEM - puppet last run on mw2164 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:28:58] RECOVERY - LVS HTTP IPv4 on mathoid.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.074 second response time [09:29:04] !log silence paging alerts for *.svc.codfw.wmnet for two hours - T168462 [09:29:04] PROBLEM - puppet last run on restbase2008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:04] PROBLEM - puppet last run on restbase2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:04] PROBLEM - puppet last run on db2051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:04] PROBLEM - HHVM rendering on mw2201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:29:04] PROBLEM - puppet last run on db2048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:14] PROBLEM - puppet last run on db2076 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:14] PROBLEM - puppet last run on mw2138 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:29:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:29:15] T168462: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462 [09:29:24] PROBLEM - puppet last run on mw2152 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:24] PROBLEM - puppet last run on wdqs2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:25] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [09:29:25] PROBLEM - HHVM rendering on mw2126 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:29:25] PROBLEM - HHVM rendering on mw2256 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:29:25] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [09:29:25] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy [09:29:34] PROBLEM - HHVM rendering on mw2130 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:29:34] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [09:29:34] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy [09:29:35] RECOVERY - HHVM rendering on mw2179 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.292 second response time [09:29:44] PROBLEM - HHVM rendering on mw2102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:29:44] PROBLEM - puppet last run on restbase-test2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:44] PROBLEM - puppet last run on pybal-test2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:45] PROBLEM - puppet last run on elastic2029 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:29:45] PROBLEM - puppet last run on restbase2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:45] RECOVERY - HHVM rendering on mw2200 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.290 second response time [09:29:54] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy [09:29:54] PROBLEM - puppet last run on elastic2009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:54] PROBLEM - puppet last run on mw2185 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:54] PROBLEM - puppet last run on mw2189 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:29:55] PROBLEM - HHVM rendering on mw2184 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:29:55] PROBLEM - HHVM rendering on mw2189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:04] PROBLEM - puppet last run on db2059 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:30:14] PROBLEM - puppet last run on mw2195 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:30:14] PROBLEM - puppet last run on mw2194 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:30:14] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:30:14] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy [09:30:24] PROBLEM - HHVM rendering on mw2171 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:24] PROBLEM - HHVM rendering on mw2175 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:24] PROBLEM - puppet last run on db2023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:30:24] PROBLEM - HHVM rendering on mw2186 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:24] PROBLEM - puppet last run on db2092 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:30:24] PROBLEM - puppet last run on pybal-test2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:30:24] PROBLEM - puppet last run on wtp2009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:30:25] PROBLEM - HHVM rendering on mw2194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:25] PROBLEM - puppet last run on nihal is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:30:34] PROBLEM - HHVM rendering on mw2174 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:34] PROBLEM - HHVM rendering on mw2187 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:45] PROBLEM - HHVM rendering on mw2257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:45] PROBLEM - HHVM rendering on mw2178 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:45] PROBLEM - HHVM rendering on mw2208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:54] PROBLEM - HHVM rendering on mw2205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:54] PROBLEM - HHVM rendering on mw2197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:54] PROBLEM - puppet last run on elastic2030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:30:54] PROBLEM - puppet last run on mw2172 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:30:54] PROBLEM - HHVM rendering on mw2191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:54] PROBLEM - puppet last run on mw2145 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:30:54] PROBLEM - HHVM rendering on mw2198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:55] PROBLEM - HHVM rendering on mw2109 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:55] PROBLEM - HHVM rendering on mw2122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:56] PROBLEM - HHVM rendering on mw2104 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:57] PROBLEM - HHVM rendering on mw2211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:30:57] PROBLEM - puppet last run on db2060 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:31:14] PROBLEM - puppet last run on mw2163 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:31:14] PROBLEM - puppet last run on mw2187 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:31:14] PROBLEM - HHVM rendering on mw2128 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:31:14] PROBLEM - HHVM rendering on mw2209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:31:14] PROBLEM - HHVM rendering on mw2136 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:31:14] RECOVERY - HHVM rendering on mw2186 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 1.270 second response time [09:31:24] PROBLEM - HHVM rendering on mw2176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:31:24] PROBLEM - puppet last run on db2067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:31:24] RECOVERY - HHVM rendering on mw2194 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.296 second response time [09:31:24] PROBLEM - puppet last run on cp2021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:31:24] PROBLEM - puppet last run on kafka2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:31:24] RECOVERY - HHVM rendering on mw2174 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.287 second response time [09:31:34] PROBLEM - HHVM rendering on mw2163 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:31:34] PROBLEM - HHVM rendering on mw2117 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:31:34] PROBLEM - HHVM rendering on mw2168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:31:34] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:31:34] RECOVERY - HHVM rendering on mw2187 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.438 second response time [09:31:44] PROBLEM - puppet last run on restbase2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:31:44] RECOVERY - HHVM rendering on mw2208 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.295 second response time [09:31:44] RECOVERY - HHVM rendering on mw2257 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.519 second response time [09:31:44] PROBLEM - puppet last run on wezen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:31:45] RECOVERY - HHVM rendering on mw2178 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.291 second response time [09:31:45] RECOVERY - HHVM rendering on mw2197 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.535 second response time [09:31:54] PROBLEM - puppet last run on mw2153 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:31:58] RECOVERY - ElasticSearch health check for shards on search.svc.codfw.wmnet is OK: OK - elasticsearch status production-search-codfw: status: red, number_of_nodes: 27, unassigned_shards: 843, number_of_pending_tasks: 3, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 3080, task_max_waiting_in_queue_millis: 934, cluster_name: production-search-codfw, relocating_shards: 0, active_shards_percent_as_number: 9 [09:31:59] _shards: 8325, initializing_shards: 69, number_of_data_nodes: 27, delayed_unassigned_shards: 0 [09:31:59] RECOVERY - HHVM rendering on mw2135 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.295 second response time [09:31:59] RECOVERY - HHVM rendering on mw2103 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.304 second response time [09:31:59] RECOVERY - HHVM rendering on mw2255 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.524 second response time [09:31:59] RECOVERY - HHVM rendering on mw2204 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.536 second response time [09:31:59] RECOVERY - HHVM rendering on mw2172 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.537 second response time [09:32:00] PROBLEM - puppet last run on mw2101 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:32:04] PROBLEM - HHVM rendering on mw2192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:32:04] PROBLEM - puppet last run on db2033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:32:04] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:32:14] RECOVERY - HHVM rendering on mw2201 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.400 second response time [09:32:14] RECOVERY - HHVM rendering on mw2166 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.420 second response time [09:32:14] RECOVERY - HHVM rendering on mw2134 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.549 second response time [09:32:14] PROBLEM - puppet last run on mw2199 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:32:14] PROBLEM - puppet last run on mw2211 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:32:14] PROBLEM - HHVM rendering on mw2206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:32:24] PROBLEM - HHVM rendering on mw2131 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:32:24] PROBLEM - puppet last run on db2083 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:32:34] RECOVERY - HHVM rendering on mw2163 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.164 second response time [09:32:34] PROBLEM - puppet last run on es2012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:32:44] RECOVERY - HHVM rendering on mw2100 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.146 second response time [09:32:44] PROBLEM - HHVM rendering on mw2137 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:32:54] RECOVERY - HHVM rendering on mw2198 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.289 second response time [09:32:54] RECOVERY - HHVM rendering on mw2122 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 8.696 second response time [09:32:54] PROBLEM - puppet last run on mw2203 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:32:54] PROBLEM - puppet last run on wtp2020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:32:54] PROBLEM - HHVM rendering on mw2195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:33:04] RECOVERY - HHVM rendering on mw2192 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.281 second response time [09:33:04] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:14] PROBLEM - puppet last run on kubernetes2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:14] PROBLEM - puppet last run on db2080 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:14] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:14] PROBLEM - HHVM rendering on mw2258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:33:24] PROBLEM - puppet last run on lvs4002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:44] PROBLEM - puppet last run on mw2180 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:54] PROBLEM - puppet last run on elastic2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:54] PROBLEM - puppet last run on rdb2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:54] PROBLEM - puppet last run on mw2155 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:54] PROBLEM - puppet last run on wtp2005 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:33:55] PROBLEM - HHVM rendering on mw2132 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:33:55] PROBLEM - HHVM rendering on mw2116 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:33:55] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:04] PROBLEM - HHVM rendering on mw2106 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:34:04] PROBLEM - puppet last run on graphite2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:04] PROBLEM - puppet last run on db2042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:14] PROBLEM - puppet last run on elastic2022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:14] PROBLEM - puppet last run on db2082 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:14] PROBLEM - puppet last run on mw2097 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:14] PROBLEM - puppet last run on restbase2011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:24] PROBLEM - puppet last run on elastic2015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:24] RECOVERY - HHVM rendering on mw2171 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.534 second response time [09:34:24] PROBLEM - puppet last run on wtp2019 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:34:26] PROBLEM - HHVM rendering on mw2120 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:34:26] RECOVERY - HHVM rendering on mw2126 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.548 second response time [09:34:34] PROBLEM - HHVM rendering on mw2121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:34:34] RECOVERY - HHVM rendering on mw2130 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 8.368 second response time [09:34:34] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) is CRITICAL: Test Random title redirect returned the unexpected status 504 (expecting: 303) [09:34:44] PROBLEM - HHVM rendering on mw2125 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:34:55] PROBLEM - puppet last run on elastic2011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:55] PROBLEM - puppet last run on mw2162 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:55] PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:34:55] PROBLEM - HHVM rendering on mw2193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:34:55] PROBLEM - puppet last run on mw2104 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:35:04] PROBLEM - HHVM rendering on mw2172 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:04] PROBLEM - puppet last run on mw2103 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:35:14] RECOVERY - HHVM rendering on mw2111 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.761 second response time [09:35:14] PROBLEM - puppet last run on mw2167 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:35:15] PROBLEM - puppet last run on elastic2023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:35:15] PROBLEM - HHVM rendering on mw2167 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:24] PROBLEM - HHVM rendering on mw2105 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:24] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:35:34] PROBLEM - HHVM rendering on mw2194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:34] PROBLEM - HHVM rendering on mw2163 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:34] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [09:35:44] RECOVERY - HHVM rendering on mw2125 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 8.514 second response time [09:35:44] PROBLEM - HHVM rendering on mw2187 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:44] PROBLEM - HHVM rendering on mw2179 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:44] PROBLEM - HHVM rendering on mw2101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:54] PROBLEM - HHVM rendering on mw2178 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:54] PROBLEM - HHVM rendering on mw2257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:54] PROBLEM - puppet last run on pc2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:35:54] PROBLEM - puppet last run on elastic2019 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:35:54] PROBLEM - HHVM rendering on mw2110 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:55] PROBLEM - HHVM rendering on mw2180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:55] PROBLEM - puppet last run on mw2171 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:35:55] PROBLEM - puppet last run on mw2182 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:35:57] PROBLEM - puppet last run on mw2210 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:35:57] PROBLEM - HHVM rendering on mw2142 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:04] PROBLEM - HHVM rendering on mw2170 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:04] PROBLEM - HHVM rendering on mw2123 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:04] PROBLEM - HHVM rendering on mw2204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:04] PROBLEM - HHVM rendering on mw2103 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:04] PROBLEM - HHVM rendering on mw2135 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:04] PROBLEM - HHVM rendering on mw2138 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:04] PROBLEM - HHVM rendering on mw2214 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:05] PROBLEM - HHVM rendering on mw2192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:05] PROBLEM - HHVM rendering on mw2113 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:06] PROBLEM - HHVM rendering on mw2212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:06] PROBLEM - HHVM rendering on mw2097 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:07] PROBLEM - HHVM rendering on mw2213 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [09:36:14] PROBLEM - puppet last run on restbase-test2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:36:14] PROBLEM - puppet last run on db2073 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:36:24] PROBLEM - HHVM rendering on mw2177 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:24] PROBLEM - HHVM rendering on mw2112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:24] RECOVERY - HHVM rendering on mw2121 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 1.517 second response time [09:36:24] PROBLEM - puppet last run on mw2257 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:36:24] RECOVERY - HHVM rendering on mw2117 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.497 second response time [09:36:24] RECOVERY - HHVM rendering on mw2168 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.514 second response time [09:36:25] RECOVERY - HHVM rendering on mw2120 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.559 second response time [09:36:25] RECOVERY - HHVM rendering on mw2194 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.539 second response time [09:36:34] RECOVERY - HHVM rendering on mw2256 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.526 second response time [09:36:34] RECOVERY - HHVM rendering on mw2102 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 1.526 second response time [09:36:44] RECOVERY - HHVM rendering on mw2187 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.544 second response time [09:36:44] RECOVERY - HHVM rendering on mw2179 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.285 second response time [09:36:44] RECOVERY - HHVM rendering on mw2137 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.518 second response time [09:36:44] RECOVERY - HHVM rendering on mw2101 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.555 second 
response time [09:36:44] RECOVERY - HHVM rendering on mw2257 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.518 second response time [09:36:45] RECOVERY - HHVM rendering on mw2109 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.506 second response time [09:36:45] RECOVERY - HHVM rendering on mw2195 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.509 second response time [09:36:45] RECOVERY - HHVM rendering on mw2211 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.518 second response time [09:36:54] RECOVERY - HHVM rendering on mw2178 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.533 second response time [09:36:54] RECOVERY - HHVM rendering on mw2170 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.515 second response time [09:36:54] RECOVERY - HHVM rendering on mw2205 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.537 second response time [09:36:54] RECOVERY - HHVM rendering on mw2110 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.552 second response time [09:36:54] RECOVERY - HHVM rendering on mw2116 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.524 second response time [09:36:54] RECOVERY - HHVM rendering on mw2132 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.526 second response time [09:36:54] PROBLEM - puppet last run on mw2142 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:36:55] RECOVERY - HHVM rendering on mw2172 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.309 second response time [09:36:55] RECOVERY - HHVM rendering on mw2191 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.539 second response time [09:36:56] RECOVERY - HHVM rendering on mw2204 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.515 second response time [09:36:56] RECOVERY - HHVM rendering on mw2104 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.548 second response time [09:36:57] RECOVERY - HHVM rendering on mw2193 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.536 second response time [09:37:14] PROBLEM - puppet last run on db2070 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:14] RECOVERY - HHVM rendering on mw2128 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 1.524 second response time [09:37:14] RECOVERY - HHVM rendering on mw2209 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.503 second response time [09:37:14] PROBLEM - puppet last run on elastic2036 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:37:14] PROBLEM - HHVM rendering on mw2107 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:37:14] RECOVERY - HHVM rendering on mw2136 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.533 second response time [09:37:15] RECOVERY - HHVM rendering on mw2206 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.548 second response time [09:37:15] RECOVERY - HHVM rendering on mw2112 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.550 second response time [09:37:24] RECOVERY - HHVM rendering on mw2176 is OK: HTTP OK: HTTP/1.1 200 OK - 75774 bytes in 0.508 second response time [09:37:24] RECOVERY - HHVM rendering on mw2258 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 7.525 second response time [09:37:24] RECOVERY - HHVM rendering on mw2105 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.543 second response time [09:37:24] RECOVERY - HHVM rendering on mw2175 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 1.510 second response time [09:37:24] RECOVERY - HHVM rendering on mw2131 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.547 second response time [09:37:24] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:24] PROBLEM - puppet last run on labtestvirt2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:34] PROBLEM - puppet last run on es2016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:34] PROBLEM - puppet last run on es2015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:34] RECOVERY - HHVM rendering on mw2163 is OK: HTTP OK: HTTP/1.1 200 OK - 75776 bytes in 6.538 second response time [09:37:44] PROBLEM - puppet last run on restbase-test2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:37:54] PROBLEM - puppet last run on labtestnet2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:54] PROBLEM - puppet last run on mw2144 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:54] PROBLEM - puppet last run on db2062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:54] PROBLEM - puppet last run on mw2143 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:54] PROBLEM - puppet last run on labstore2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:54] PROBLEM - puppet last run on restbase2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:37:55] PROBLEM - puppet last run on mw2123 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:38:04] PROBLEM - puppet last run on db2058 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:38:04] PROBLEM - puppet last run on elastic2034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:38:14] RECOVERY - HHVM rendering on mw2107 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 1.523 second response time [09:38:14] PROBLEM - puppet last run on mw2176 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:38:14] PROBLEM - puppet last run on elastic2014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:38:14] RECOVERY - HHVM rendering on mw2177 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.536 second response time [09:38:34] PROBLEM - puppet last run on pybal-test2002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:38:54] PROBLEM - puppet last run on db2053 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:38:54] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:38:54] PROBLEM - puppet last run on mw2154 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:38:54] PROBLEM - puppet last run on mw2193 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:38:54] PROBLEM - puppet last run on elastic2035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:39:14] PROBLEM - puppet last run on mw2165 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:39:34] PROBLEM - puppet last run on es2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:39:44] PROBLEM - puppet last run on mw2119 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:39:54] PROBLEM - puppet last run on wtp2016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:39:54] PROBLEM - puppet last run on restbase2009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:39:54] PROBLEM - puppet last run on mw2183 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:40:04] PROBLEM - puppet last run on mw2204 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:40:04] PROBLEM - HHVM rendering on mw2106 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:40:04] PROBLEM - puppet last run on db2056 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:40:14] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:40:14] PROBLEM - puppet last run on db2066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:40:24] PROBLEM - puppet last run on db2087 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:40:24] PROBLEM - puppet last run on db2086 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:40:24] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:40:34] PROBLEM - puppet last run on es2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:40:54] PROBLEM - puppet last run on elastic2016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:40:54] PROBLEM - puppet last run on mw2150 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:41:04] PROBLEM - puppet last run on mw2198 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:41:04] RECOVERY - HHVM rendering on mw2106 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.543 second response time [09:41:15] PROBLEM - puppet last run on mw2202 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:41:25] PROBLEM - puppet last run on thumbor2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:41:25] PROBLEM - puppet last run on mw2260 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:41:34] PROBLEM - HHVM rendering on mw2164 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:41:34] PROBLEM - HHVM rendering on mw2203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:41:34] PROBLEM - puppet last run on es2011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:41:54] PROBLEM - puppet last run on db2064 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:41:54] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:41:54] PROBLEM - puppet last run on mw2160 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:41:56] PROBLEM - puppet last run on elastic2008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:42:04] PROBLEM - puppet last run on subra is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:42:34] RECOVERY - HHVM rendering on mw2164 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.535 second response time [09:42:34] PROBLEM - HHVM rendering on mw2120 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:42:34] RECOVERY - HHVM rendering on mw2203 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 7.275 second response time [09:42:44] PROBLEM - restbase endpoints health on restbase2012 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received [09:42:48] (CR) Elukey: [C: -1] "LGTM, formally blocking until we'll have the refinery deployed correctly" [puppet] - https://gerrit.wikimedia.org/r/362148 (https://phabricator.wikimedia.org/T168614) (owner: Joal) [09:42:54] PROBLEM - puppet last run on maps2002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:43:04] PROBLEM - puppet last run on mw2200 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:43:04] PROBLEM - puppet last run on mw2147 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:43:14] PROBLEM - puppet last run on db2078 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:43:24] RECOVERY - HHVM rendering on mw2120 is OK: HTTP OK: HTTP/1.1 200 OK - 75824 bytes in 0.523 second response time [09:43:24] PROBLEM - puppet last run on ores2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:43:24] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (Zotero alive) timed out before a response was received [09:43:24] PROBLEM - puppet last run on mw2259 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:43:24] PROBLEM - puppet last run on thumbor2003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:43:34] PROBLEM - HHVM rendering on mw2210 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:43:34] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (Zotero alive) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [09:43:34] RECOVERY - restbase endpoints health on restbase2012 is OK: All endpoints are healthy [09:43:44] PROBLEM - HHVM rendering on mw2129 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:43:46] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (Zotero alive) timed out before a response was received [09:43:46] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (Zotero alive) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [09:43:46] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received [09:43:54] PROBLEM - puppet last run on mw2168 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:43:54] PROBLEM - puppet last run on tureis is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:43:54] PROBLEM - puppet last run on db2037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:43:54] PROBLEM - puppet last run on mw2146 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:04] PROBLEM - puppet last run on mw2214 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:04] PROBLEM - puppet last run on mw2174 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:44:04] PROBLEM - puppet last run on scb2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:14] PROBLEM - puppet last run on mw2114 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:14] PROBLEM - puppet last run on conf2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:14] PROBLEM - puppet last run on mw2213 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:14] PROBLEM - puppet last run on mw2105 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:14] PROBLEM - puppet last run on mw2113 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:24] RECOVERY - HHVM rendering on mw2210 is OK: HTTP OK: HTTP/1.1 200 OK - 75824 bytes in 0.518 second response time [09:44:24] PROBLEM - puppet last run on restbase2010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:24] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:24] PROBLEM - puppet last run on db2084 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:34] PROBLEM - puppet last run on mw2258 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:34] PROBLEM - puppet last run on rdb2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:44:34] PROBLEM - puppet last run on wtp2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:44:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [09:44:44] RECOVERY - HHVM rendering on mw2129 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.548 second response time [09:44:44] PROBLEM - HHVM rendering on mw2196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:45:04] PROBLEM - puppet last run on mw2186 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:04] PROBLEM - puppet last run on mw2110 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:04] PROBLEM - puppet last run on mw2124 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:15] PROBLEM - puppet last run on acrab is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:34] PROBLEM - puppet last run on db2088 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:34] PROBLEM - puppet last run on wtp2018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:34] PROBLEM - HHVM rendering on mw2194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:45:34] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:35] RECOVERY - HHVM rendering on mw2196 is OK: HTTP OK: HTTP/1.1 200 OK - 75824 bytes in 0.513 second response time [09:45:54] PROBLEM - puppet last run on mwlog2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:54] PROBLEM - puppet last run on mw2173 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:54] PROBLEM - puppet last run on mw2118 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:45:55] PROBLEM - puppet last run on mw2136 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:04] PROBLEM - puppet last run on labtestvirt2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:06] PROBLEM - puppet last run on mw2205 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:14] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:14] PROBLEM - puppet last run on mw2201 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:14] PROBLEM - puppet last run on restbase2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:24] PROBLEM - puppet last run on db2044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:24] PROBLEM - puppet last run on restbase2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:24] PROBLEM - puppet last run on db2090 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:55] (CR) Elukey: [C: 1] "LGTM, wait for Nuria's review before merging." [puppet] - https://gerrit.wikimedia.org/r/362151 (https://phabricator.wikimedia.org/T156841) (owner: Joal) [09:47:14] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:47:14] PROBLEM - puppet last run on mw2107 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:47:14] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:47:15] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:47:24] PROBLEM - puppet last run on ores2009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:47:24] PROBLEM - puppet last run on scb2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:47:34] PROBLEM - puppet last run on ganeti2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:47:34] RECOVERY - HHVM rendering on mw2194 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.531 second response time [09:47:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received [09:47:57] PROBLEM - puppet last run on elastic2012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:47:57] PROBLEM - puppet last run on mw2197 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:04] PROBLEM - puppet last run on mw2148 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:04] PROBLEM - puppet last run on mw2137 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:04] PROBLEM - puppet last run on thumbor2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:04] PROBLEM - puppet last run on mw2128 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:14] PROBLEM - puppet last run on elastic2024 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:48:14] PROBLEM - puppet last run on kubernetes2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:14] PROBLEM - puppet last run on mw2127 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:24] PROBLEM - puppet last run on db2046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:25] PROBLEM - puppet last run on ores2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:25] PROBLEM - puppet last run on scb2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:34] PROBLEM - puppet last run on mw2256 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:34] PROBLEM - puppet last run on es2018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:34] PROBLEM - puppet last run on wtp2010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:44] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [09:48:44] PROBLEM - puppet last run on db2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:04] PROBLEM - puppet last run on mw2139 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:04] PROBLEM - puppet last run on mw2149 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:04] PROBLEM - puppet last run on elastic2010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:04] PROBLEM - puppet last run on elastic2020 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:49:14] PROBLEM - puppet last run on rdb2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:14] PROBLEM - puppet last run on mw2157 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:15] PROBLEM - puppet last run on thumbor2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:25] PROBLEM - puppet last run on ores2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:34] PROBLEM - puppet last run on es2013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:44] PROBLEM - puppet last run on hassaleh is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:44] PROBLEM - puppet last run on db2028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:54] PROBLEM - puppet last run on db2072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:50:04] PROBLEM - puppet last run on mw2192 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:50:04] PROBLEM - puppet last run on ores2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:50:14] PROBLEM - puppet last run on mw2106 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:50:14] PROBLEM - puppet last run on mw2121 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:50:25] PROBLEM - puppet last run on ores2007 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:50:44] PROBLEM - puppet last run on es2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:51:04] PROBLEM - puppet last run on mw2179 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:51:09] (CR) Joal: "From Hadoop doc: only in-queue config changes can be done without restart. So yes, we'll need to restart yarn master (yarn only, no impact" [puppet] - https://gerrit.wikimedia.org/r/362151 (https://phabricator.wikimedia.org/T156841) (owner: Joal) [09:51:14] PROBLEM - puppet last run on mw2102 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:51:14] PROBLEM - HHVM rendering on mw2169 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:51:14] PROBLEM - HHVM rendering on mw2173 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:51:14] PROBLEM - HHVM rendering on mw2147 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:51:14] PROBLEM - HHVM rendering on mw2146 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:51:44] PROBLEM - HHVM rendering on mw2196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:52:04] PROBLEM - HHVM rendering on mw2195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:52:04] RECOVERY - HHVM rendering on mw2146 is OK: HTTP OK: HTTP/1.1 200 OK - 75824 bytes in 0.522 second response time [09:52:14] RECOVERY - HHVM rendering on mw2147 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.277 second response time [09:52:14] RECOVERY - HHVM rendering on mw2173 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.532 second response time [09:52:14] RECOVERY - HHVM rendering on mw2169 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.535 second response time [09:52:44] RECOVERY - HHVM rendering on mw2196 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.534 second response time [09:52:54] 
PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: /en.wikipedia.org/v1/page/random/{format} (Random title redirect) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [09:53:04] PROBLEM - HHVM rendering on mw2197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:53:04] RECOVERY - HHVM rendering on mw2195 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.539 second response time [09:53:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [09:53:54] RECOVERY - HHVM rendering on mw2197 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 7.533 second response time [09:54:04] PROBLEM - HHVM rendering on mw2110 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:54:04] PROBLEM - HHVM rendering on mw2140 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:54:14] PROBLEM - HHVM rendering on mw2213 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:54:54] RECOVERY - HHVM rendering on mw2140 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 1.262 second response time [09:55:04] RECOVERY - HHVM rendering on mw2110 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 7.552 second response time [09:55:04] RECOVERY - HHVM rendering on mw2213 is OK: HTTP OK: HTTP/1.1 200 OK - 75823 bytes in 0.253 second response time [09:55:54] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [09:56:14] PROBLEM - HHVM rendering on mw2103 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:56:34] PROBLEM - HHVM rendering on mw2127 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:56:44] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [09:57:14] RECOVERY - HHVM rendering on mw2103 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.523 second response 
time [09:57:14] PROBLEM - HHVM rendering on mw2097 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:57:34] RECOVERY - HHVM rendering on mw2127 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 1.530 second response time [09:57:44] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy [09:57:54] PROBLEM - HHVM rendering on mw2202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:57:54] PROBLEM - HHVM rendering on mw2099 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:58:14] RECOVERY - HHVM rendering on mw2097 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.294 second response time [09:58:14] PROBLEM - HHVM rendering on mw2133 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:58:54] RECOVERY - HHVM rendering on mw2202 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.540 second response time [09:59:04] PROBLEM - HHVM rendering on mw2170 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:59:04] RECOVERY - HHVM rendering on mw2133 is OK: HTTP OK: HTTP/1.1 200 OK - 75824 bytes in 0.517 second response time [09:59:34] PROBLEM - HHVM rendering on mw2206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:00:04] RECOVERY - HHVM rendering on mw2170 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.291 second response time [10:00:34] RECOVERY - HHVM rendering on mw2206 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 7.546 second response time [10:00:54] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (Scrapes sample page) timed out before a response was received [10:00:54] PROBLEM - HHVM rendering on mw2102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:01:44] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy [10:01:54] RECOVERY - HHVM rendering on mw2102 is OK: HTTP OK: HTTP/1.1 200 OK - 75826 bytes in 6.543 second response time [10:01:54] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: 
/en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received [10:02:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [10:03:15] RECOVERY - Host ganeti2005 is UP: PING OK - Packet loss = 0%, RTA = 44.23 ms [10:03:15] RECOVERY - Host kubetcd2002 is UP: PING OK - Packet loss = 0%, RTA = 43.52 ms [10:03:15] RECOVERY - IPsec on mc1019 is OK: Strongswan OK - 1 ESP OK [10:03:15] RECOVERY - Host kubetcd2001 is UP: PING OK - Packet loss = 0%, RTA = 44.86 ms [10:03:24] RECOVERY - IPsec on mc1022 is OK: Strongswan OK - 1 ESP OK [10:03:24] RECOVERY - Host ns1-v6 is UP: PING OK - Packet loss = 0%, RTA = 36.13 ms [10:03:24] RECOVERY - Host acrux is UP: PING OK - Packet loss = 0%, RTA = 45.17 ms [10:03:24] RECOVERY - Host planet2001 is UP: PING OK - Packet loss = 0%, RTA = 44.47 ms [10:03:24] RECOVERY - Host ganeti2008 is UP: PING OK - Packet loss = 0%, RTA = 43.58 ms [10:03:24] RECOVERY - Host sca2004 is UP: PING OK - Packet loss = 0%, RTA = 43.70 ms [10:03:24] RECOVERY - Host ganeti2006 is UP: PING OK - Packet loss = 0%, RTA = 45.03 ms [10:03:25] RECOVERY - Host ganeti2007 is UP: PING OK - Packet loss = 0%, RTA = 43.98 ms [10:03:34] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy [10:03:34] RECOVERY - IPsec on mc1021 is OK: Strongswan OK - 1 ESP OK [10:03:34] RECOVERY - haproxy failover on dbproxy1007 is OK: OK check_failover servers up 2 down 0 [10:03:35] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [10:03:36] RECOVERY - IPsec on rdb1001 is OK: Strongswan OK - 1 ESP OK [10:03:44] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [10:03:45] RECOVERY - IPsec on mc1020 is OK: Strongswan OK - 1 ESP OK [10:03:45] RECOVERY - HHVM rendering on mw2099 is OK: HTTP OK: HTTP/1.1 200 OK - 75824 bytes in 0.302 second response time [10:03:54] RECOVERY - haproxy failover on dbproxy1002 is OK: OK check_failover 
servers up 2 down 0 [10:03:55] RECOVERY - Postgres Replication Lag on maps2004 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 3024 [10:03:55] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 3024 [10:03:55] RECOVERY - HHVM rendering on mw2251 is OK: HTTP OK: HTTP/1.1 200 OK - 75824 bytes in 0.834 second response time [10:03:55] RECOVERY - Postgres Replication Lag on maps2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 3024 [10:03:55] RECOVERY - puppet last run on db2072 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [10:04:05] RECOVERY - puppet last run on labtestvirt2002 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [10:04:05] RECOVERY - puppet last run on oresrdb2002 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [10:04:05] RECOVERY - puppet last run on labtestneutron2001 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [10:04:14] RECOVERY - puppet last run on labtestvirt2001 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [10:04:14] RECOVERY - puppet last run on mc2028 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:04:14] RECOVERY - puppet last run on labtestneutron2002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [10:04:14] RECOVERY - puppet last run on labtestnet2002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [10:04:24] RECOVERY - puppet last run on wtp2006 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [10:04:24] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:04:24] RECOVERY - puppet last run on ores2008 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 
failures [10:04:24] RECOVERY - puppet last run on db2077 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:04:24] RECOVERY - puppet last run on db2076 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:04:24] RECOVERY - puppet last run on cp2015 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:04:24] RECOVERY - puppet last run on lvs4004 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [10:04:25] RECOVERY - Host ns1-v4 is UP: PING OK - Packet loss = 0%, RTA = 36.08 ms [10:04:34] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:04:35] RECOVERY - puppet last run on wdqs2001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:04:35] RECOVERY - puppet last run on ores2004 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:04:35] RECOVERY - puppet last run on ores2006 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [10:04:35] RECOVERY - puppet last run on db2090 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [10:04:35] RECOVERY - puppet last run on db2092 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [10:04:36] RECOVERY - puppet last run on ganeti2004 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:04:36] RECOVERY - puppet last run on ganeti2002 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:04:37] RECOVERY - puppet last run on ganeti2003 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:04:37] RECOVERY - puppet last run on pybal-test2001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [10:04:38] RECOVERY - puppet last run on wtp2011 is OK: OK: Puppet is currently enabled, last 
run 6 seconds ago with 0 failures [10:04:38] RECOVERY - puppet last run on wtp2017 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:04:39] RECOVERY - puppet last run on kubetcd2003 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [10:04:39] RECOVERY - puppet last run on nihal is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [10:04:44] RECOVERY - puppet last run on hassaleh is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [10:04:44] RECOVERY - puppet last run on es2019 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [10:04:44] RECOVERY - puppet last run on es2011 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [10:04:44] RECOVERY - puppet last run on db2019 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [10:04:44] RECOVERY - puppet last run on db2028 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [10:04:54] RECOVERY - puppet last run on ms-be2015 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [10:04:54] RECOVERY - puppet last run on wezen is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:04:55] RECOVERY - puppet last run on db2065 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:04:55] RECOVERY - puppet last run on elastic2012 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [10:04:55] RECOVERY - puppet last run on oresrdb2001 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [10:04:55] RECOVERY - puppet last run on zosma is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:04:55] RECOVERY - puppet last run on labtestnet2001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [10:04:55] RECOVERY - 
puppet last run on pc2005 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [10:05:04] RECOVERY - puppet last run on kubernetes2003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [10:05:04] RECOVERY - puppet last run on labstore2003 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [10:05:04] RECOVERY - puppet last run on mw2172 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:05:04] RECOVERY - puppet last run on mw2139 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [10:05:04] RECOVERY - puppet last run on wtp2013 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [10:05:04] RECOVERY - puppet last run on mw2153 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:05:04] RECOVERY - puppet last run on mw2149 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [10:05:05] RECOVERY - puppet last run on eventlog2001 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [10:05:05] RECOVERY - puppet last run on db2045 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:05:06] RECOVERY - puppet last run on db2060 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [10:05:06] RECOVERY - puppet last run on mw2170 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [10:05:07] RECOVERY - puppet last run on mw2190 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [10:05:14] RECOVERY - puppet last run on elastic2032 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [10:05:14] PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:05:14] RECOVERY - puppet last run on db2058 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:05:14] RECOVERY - puppet last run on elastic2021 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [10:05:14] RECOVERY - puppet last run on ms-be2019 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [10:05:14] RECOVERY - puppet last run on restbase2012 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [10:05:14] RECOVERY - puppet last run on maps2003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [10:05:15] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [10:05:34] PROBLEM - puppet last run on mw2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:05:34] PROBLEM - puppet last run on es2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:05:34] PROBLEM - puppet last run on rdb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:05:34] PROBLEM - puppet last run on elastic2025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:05:34] PROBLEM - puppet last run on planet2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:05:34] PROBLEM - puppet last run on heze is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:05:34] PROBLEM - puppet last run on rdb2002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:06:54] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [10:06:54] RECOVERY - puppet last run on elastic2001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [10:06:54] RECOVERY - puppet last run on elastic2005 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [10:06:54] RECOVERY - puppet last run on restbase2001 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [10:06:54] RECOVERY - puppet last run on mw2180 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [10:06:54] RECOVERY - puppet last run on mw2251 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [10:06:55] RECOVERY - puppet last run on mw2118 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:06:55] RECOVERY - puppet last run on ganeti2006 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [10:06:56] RECOVERY - puppet last run on elastic2007 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:07:04] RECOVERY - puppet last run on elastic2030 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:07:04] RECOVERY - puppet last run on elastic2011 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [10:07:04] RECOVERY - puppet last run on mw2241 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [10:07:04] RECOVERY - puppet last run on rdb2005 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:07:04] RECOVERY - puppet last run on mw2169 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [10:07:04] RECOVERY - puppet last run on mw2151 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 
failures [10:07:05] RECOVERY - puppet last run on mw2137 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:07:05] RECOVERY - puppet last run on mw2155 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [10:07:06] RECOVERY - puppet last run on mw2203 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [10:07:06] RECOVERY - puppet last run on wtp2005 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:07:07] RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [10:07:07] RECOVERY - puppet last run on mw2111 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:07:08] RECOVERY - puppet last run on mw2164 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [10:07:08] RECOVERY - puppet last run on mw2189 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [10:07:24] RECOVERY - puppet last run on mw2138 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:07:24] RECOVERY - puppet last run on mw2097 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [10:07:24] RECOVERY - puppet last run on elastic2023 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:07:24] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [10:07:24] RECOVERY - puppet last run on ms-be2039 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [10:07:25] RECOVERY - puppet last run on restbase2011 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [10:07:34] RECOVERY - puppet last run on elastic2015 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [10:07:34] RECOVERY - puppet last run on 
lvs4003 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:07:44] RECOVERY - Host ripe-atlas-codfw is UP: PING OK - Packet loss = 0%, RTA = 36.06 ms [10:07:54] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [10:07:55] RECOVERY - IPsec on cp3038 is OK: Strongswan OK - 54 ESP OK [10:07:56] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Dinka Wikipedia - https://phabricator.wikimedia.org/T168518#3366966 (Urbanecm) I'll create initial configuration, next things must be done by somebody else. [10:08:04] RECOVERY - puppet last run on pc2006 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [10:08:04] RECOVERY - puppet last run on mw2142 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [10:08:04] RECOVERY - puppet last run on mw2162 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:08:05] RECOVERY - puppet last run on elastic2019 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [10:08:05] RECOVERY - puppet last run on mw2210 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [10:08:05] RECOVERY - puppet last run on mw2182 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [10:08:05] RECOVERY - puppet last run on mw2104 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [10:08:06] RECOVERY - puppet last run on mw2131 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [10:08:14] RECOVERY - IPsec on cp3034 is OK: Strongswan OK - 54 ESP OK [10:08:14] RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [10:08:15] RECOVERY - puppet last run on restbase-test2003 is OK: OK: Puppet is currently enabled, 
last run 7 seconds ago with 0 failures [10:08:15] RECOVERY - puppet last run on mw2103 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [10:08:24] RECOVERY - puppet last run on db2073 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [10:08:24] RECOVERY - puppet last run on lvs2005 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [10:08:24] RECOVERY - puppet last run on mw2167 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:08:24] RECOVERY - IPsec on cp4016 is OK: Strongswan OK - 44 ESP OK [10:08:24] RECOVERY - IPsec on cp4010 is OK: Strongswan OK - 44 ESP OK [10:08:24] RECOVERY - IPsec on cp4017 is OK: Strongswan OK - 44 ESP OK [10:08:25] RECOVERY - IPsec on cp4018 is OK: Strongswan OK - 44 ESP OK [10:08:25] RECOVERY - IPsec on cp4014 is OK: Strongswan OK - 54 ESP OK [10:08:26] RECOVERY - puppet last run on ms-be2034 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [10:08:27] RECOVERY - IPsec on cp4013 is OK: Strongswan OK - 54 ESP OK [10:08:27] RECOVERY - IPsec on cp4015 is OK: Strongswan OK - 54 ESP OK [10:08:34] RECOVERY - puppet last run on mw2257 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [10:08:44] RECOVERY - puppet last run on es2001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [10:08:44] RECOVERY - puppet last run on cp2020 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [10:08:54] RECOVERY - puppet last run on mw2253 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [10:08:55] RECOVERY - puppet last run on mw2226 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [10:09:04] RECOVERY - puppet last run on db2062 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [10:09:05] RECOVERY - puppet last run on 
restbase2004 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [10:09:05] RECOVERY - puppet last run on mw2171 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [10:09:24] RECOVERY - puppet last run on ms-be2021 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:09:24] RECOVERY - puppet last run on elastic2036 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [10:09:34] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:09:34] RECOVERY - puppet last run on labtestvirt2003 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [10:09:35] RECOVERY - puppet last run on es2015 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:09:44] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 203 probes of 287 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [10:09:54] PROBLEM - puppet last run on ms-be2016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:09:54] RECOVERY - puppet last run on restbase-test2001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [10:09:54] PROBLEM - puppet last run on ms-be2028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:09:54] PROBLEM - puppet last run on ms-be2029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:09:54] PROBLEM - puppet last run on cp2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:09:55] RECOVERY - puppet last run on db2089 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:10:04] PROBLEM - puppet last run on cp2003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:10:04] RECOVERY - puppet last run on mw2144 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:10:04] RECOVERY - puppet last run on mw2143 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [10:10:05] RECOVERY - puppet last run on mw2154 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [10:10:05] RECOVERY - puppet last run on labstore2004 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [10:10:05] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 122 probes of 435 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [10:10:15] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:10:15] RECOVERY - puppet last run on elastic2034 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [10:10:15] RECOVERY - puppet last run on db2070 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:10:24] RECOVERY - puppet last run on elastic2014 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [10:10:34] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [10:10:35] RECOVERY - puppet last run on es2016 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:10:44] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:10:44] RECOVERY - puppet last run on pybal-test2002 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [10:10:54] RECOVERY - puppet last run on scb2003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:10:54] RECOVERY - puppet last run on ms-be2016 is OK: OK: Puppet 
is currently enabled, last run 46 seconds ago with 0 failures [10:10:55] RECOVERY - puppet last run on mw2216 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:10:55] RECOVERY - puppet last run on es2017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:11:04] RECOVERY - puppet last run on mw2247 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:11:04] RECOVERY - puppet last run on mw2193 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:11:04] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [10:11:04] RECOVERY - puppet last run on elastic2035 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [10:11:04] RECOVERY - puppet last run on mw2123 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [10:11:14] RECOVERY - puppet last run on cp2010 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [10:11:14] RECOVERY - puppet last run on ms-be2020 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [10:11:24] RECOVERY - puppet last run on mw2165 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [10:11:24] RECOVERY - puppet last run on ms-be2024 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [10:11:24] RECOVERY - puppet last run on ms-be2027 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [10:11:24] RECOVERY - puppet last run on mw2176 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:11:35] RECOVERY - puppet last run on ms-be2032 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [10:11:35] RECOVERY - puppet last run on ms-be2038 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures 
[10:11:44] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:11:54] RECOVERY - puppet last run on ms-be2029 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [10:11:54] RECOVERY - puppet last run on ms-be2028 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:11:54] RECOVERY - puppet last run on mw2218 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [10:12:04] RECOVERY - puppet last run on db2053 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:12:04] RECOVERY - puppet last run on mw2150 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [10:12:04] RECOVERY - puppet last run on mw2183 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [10:12:15] RECOVERY - puppet last run on db2056 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:12:15] RECOVERY - puppet last run on mw2221 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [10:12:24] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [10:12:34] RECOVERY - puppet last run on db2087 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [10:12:34] RECOVERY - puppet last run on db2086 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:12:44] RECOVERY - puppet last run on es2004 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:12:54] RECOVERY - puppet last run on cp2023 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:12:54] RECOVERY - puppet last run on mw2119 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [10:13:04] RECOVERY - puppet last run on wtp2016 is OK: OK: Puppet is 
currently enabled, last run 42 seconds ago with 0 failures [10:13:04] RECOVERY - puppet last run on elastic2016 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [10:13:04] RECOVERY - puppet last run on restbase2009 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [10:13:04] RECOVERY - puppet last run on mw2228 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:13:05] RECOVERY - puppet last run on mw2204 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:13:14] RECOVERY - puppet last run on mc2035 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [10:13:14] RECOVERY - puppet last run on mc2034 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [10:13:14] RECOVERY - puppet last run on mc2030 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:13:24] RECOVERY - puppet last run on mw2202 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [10:13:24] RECOVERY - puppet last run on ms-be2023 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [10:13:34] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [10:13:34] RECOVERY - puppet last run on db2075 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:13:35] RECOVERY - puppet last run on ms-fe2007 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [10:13:35] RECOVERY - puppet last run on mw2260 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [10:13:35] RECOVERY - puppet last run on thumbor2004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [10:13:44] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures 
[10:13:44] RECOVERY - puppet last run on es2003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:13:54] RECOVERY - puppet last run on mc2021 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [10:13:54] RECOVERY - puppet last run on ganeti2007 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [10:13:54] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [10:13:55] RECOVERY - puppet last run on mw2217 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [10:14:04] RECOVERY - puppet last run on mw2232 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [10:14:04] RECOVERY - puppet last run on mw2160 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:14:14] RECOVERY - puppet last run on mw2198 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:14:14] RECOVERY - puppet last run on mw2238 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:14:24] RECOVERY - puppet last run on db2066 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [10:14:24] RECOVERY - puppet last run on ms-be2017 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:14:34] RECOVERY - puppet last run on heze is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:14:34] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [10:14:34] RECOVERY - puppet last run on ores2003 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [10:14:44] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 14 probes of 287 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [10:14:54] RECOVERY - puppet last run 
on elastic2026 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [10:14:54] RECOVERY - puppet last run on ms-be2014 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [10:14:54] RECOVERY - puppet last run on elastic2006 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:14:54] RECOVERY - puppet last run on scb2005 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [10:14:54] RECOVERY - puppet last run on db2064 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [10:15:04] RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [10:15:04] RECOVERY - puppet last run on tureis is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [10:15:05] RECOVERY - puppet last run on subra is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [10:15:05] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 2 probes of 435 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [10:15:05] RECOVERY - puppet last run on mw2200 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:15:14] RECOVERY - puppet last run on mw2214 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [10:15:14] RECOVERY - puppet last run on elastic2008 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:15:14] RECOVERY - puppet last run on mw2147 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [10:15:14] RECOVERY - puppet last run on cp2007 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [10:15:24] RECOVERY - puppet last run on db2078 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:15:34] RECOVERY - puppet last run on mw2259 is OK: OK: Puppet 
is currently enabled, last run 29 seconds ago with 0 failures [10:15:34] RECOVERY - puppet last run on thumbor2003 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [10:15:44] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [10:15:54] RECOVERY - puppet last run on db2081 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:15:54] RECOVERY - puppet last run on wdqs2003 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:15:55] RECOVERY - puppet last run on mw2215 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [10:16:04] RECOVERY - puppet last run on mw2168 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [10:16:04] RECOVERY - puppet last run on mw2223 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [10:16:04] RECOVERY - puppet last run on mw2242 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [10:16:04] RECOVERY - puppet last run on mw2146 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [10:16:14] RECOVERY - puppet last run on mw2186 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [10:16:14] RECOVERY - puppet last run on maps2002 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [10:16:14] RECOVERY - puppet last run on scb2006 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [10:16:15] RECOVERY - puppet last run on mc2023 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:16:15] RECOVERY - puppet last run on mw2114 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [10:16:24] RECOVERY - puppet last run on cp2018 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [10:16:24] 
RECOVERY - puppet last run on mw2113 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [10:16:34] RECOVERY - puppet last run on restbase2010 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [10:16:34] RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [10:16:34] RECOVERY - puppet last run on db2084 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:16:34] RECOVERY - puppet last run on wtp2001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [10:16:44] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [10:16:54] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [10:16:54] RECOVERY - puppet last run on ores2001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [10:17:04] RECOVERY - puppet last run on db2037 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:17:14] RECOVERY - puppet last run on mw2174 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:17:14] RECOVERY - puppet last run on mw2110 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:17:14] RECOVERY - puppet last run on mw2124 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [10:17:14] RECOVERY - puppet last run on mc2024 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [10:17:24] RECOVERY - puppet last run on acrab is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [10:17:24] RECOVERY - puppet last run on conf2002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:17:24] RECOVERY - puppet last run on mw2213 is OK: OK: Puppet is currently 
enabled, last run 43 seconds ago with 0 failures [10:17:24] RECOVERY - puppet last run on mw2105 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:17:34] RECOVERY - puppet last run on db2088 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [10:17:34] RECOVERY - puppet last run on mw2258 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [10:17:35] RECOVERY - puppet last run on rdb2004 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [10:17:44] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [10:17:44] RECOVERY - puppet last run on wtp2018 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [10:17:55] RECOVERY - puppet last run on mw2173 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [10:18:04] RECOVERY - puppet last run on ganeti2005 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [10:18:04] RECOVERY - puppet last run on mw2236 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [10:18:04] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [10:18:24] RECOVERY - puppet last run on mw2201 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:18:24] RECOVERY - puppet last run on restbase2005 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:18:34] RECOVERY - puppet last run on db2044 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:18:34] RECOVERY - puppet last run on suhail is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [10:18:44] RECOVERY - puppet last run on lvs4001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [10:18:55] 
RECOVERY - puppet last run on mwlog2001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:19:04] RECOVERY - puppet last run on mw2229 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [10:19:05] RECOVERY - puppet last run on mw2205 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [10:19:15] RECOVERY - puppet last run on mc2033 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [10:19:15] RECOVERY - puppet last run on mw2230 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:19:15] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:19:34] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:19:35] RECOVERY - puppet last run on restbase2002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:19:35] RECOVERY - puppet last run on ores2009 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [10:19:35] RECOVERY - puppet last run on ms-fe2006 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [10:19:35] RECOVERY - puppet last run on ms-fe2008 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:19:54] RECOVERY - puppet last run on auth2001 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [10:19:54] RECOVERY - puppet last run on cp2014 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [10:19:54] RECOVERY - puppet last run on cp2004 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:20:04] RECOVERY - puppet last run on mw2249 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [10:20:04] RECOVERY - puppet last run on mw2225 is OK: OK: Puppet is 
currently enabled, last run 41 seconds ago with 0 failures [10:20:04] RECOVERY - puppet last run on cp2003 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [10:20:04] RECOVERY - puppet last run on mw2197 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [10:20:14] RECOVERY - puppet last run on mc2032 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [10:20:24] RECOVERY - puppet last run on kubernetes2002 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [10:20:34] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [10:20:34] RECOVERY - puppet last run on scb2002 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [10:20:44] RECOVERY - puppet last run on wtp2010 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [10:20:54] PROBLEM - High lag on wdqs2003 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1800.0] [10:20:55] RECOVERY - puppet last run on elastic2002 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [10:21:04] RECOVERY - puppet last run on mw2248 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [10:21:04] RECOVERY - puppet last run on mw2250 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:21:04] RECOVERY - puppet last run on mw2148 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:21:04] RECOVERY - puppet last run on thumbor2001 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:21:34] RECOVERY - Kafka Broker Under Replicated Partitions on kafka2003 is OK: OK: Less than 50.00% above the threshold [1.0] [10:21:34] RECOVERY - puppet last run on db2046 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [10:21:34] 
RECOVERY - Kafka Broker Under Replicated Partitions on kafka2002 is OK: OK: Less than 50.00% above the threshold [1.0] [10:21:34] RECOVERY - puppet last run on es2013 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [10:21:35] RECOVERY - puppet last run on es2018 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [10:21:54] RECOVERY - puppet last run on elastic2004 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [10:21:55] RECOVERY - puppet last run on elastic2003 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:22:04] RECOVERY - puppet last run on mw2243 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:22:04] RECOVERY - puppet last run on mw2239 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [10:22:14] RECOVERY - puppet last run on kubetcd2001 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [10:22:14] RECOVERY - puppet last run on mc2031 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [10:22:15] RECOVERY - puppet last run on mc2036 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [10:22:24] RECOVERY - puppet last run on mw2102 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [10:22:24] RECOVERY - puppet last run on cp2026 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:22:24] RECOVERY - puppet last run on mc2026 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [10:22:24] RECOVERY - puppet last run on cp2017 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [10:22:24] RECOVERY - puppet last run on mw2157 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [10:22:34] RECOVERY - puppet last run on ores2007 is OK: OK: Puppet is 
currently enabled, last run 20 seconds ago with 0 failures [10:22:54] PROBLEM - High lag on wdqs2003 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1800.0] [10:22:54] RECOVERY - puppet last run on kafka2001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [10:23:04] RECOVERY - puppet last run on kubernetes2001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [10:23:04] RECOVERY - puppet last run on mw2235 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:23:04] RECOVERY - puppet last run on es2014 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:23:04] RECOVERY - puppet last run on mw2192 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [10:23:24] RECOVERY - puppet last run on ms-be2018 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [10:23:24] RECOVERY - puppet last run on ms-be2025 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:23:54] RECOVERY - puppet last run on ganeti2008 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [10:24:14] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [10:24:24] RECOVERY - puppet last run on cp2022 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [10:24:34] RECOVERY - puppet last run on mw2017 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [10:24:34] RECOVERY - puppet last run on conf2001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [10:24:35] RECOVERY - puppet last run on mw2255 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [10:24:44] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures 
[10:24:54] RECOVERY - puppet last run on mc2019 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [10:24:54] RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [10:25:04] RECOVERY - puppet last run on db2091 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [10:25:04] RECOVERY - puppet last run on mc2020 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:25:04] RECOVERY - puppet last run on mw2252 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [10:25:04] RECOVERY - puppet last run on mw2220 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:25:34] RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [10:25:34] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:25:44] RECOVERY - puppet last run on acrux is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [10:25:44] RECOVERY - puppet last run on sarin is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [10:26:04] RECOVERY - puppet last run on kubetcd2002 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:26:04] RECOVERY - puppet last run on mw2233 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:26:14] RECOVERY - puppet last run on mc2025 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:26:24] RECOVERY - puppet last run on mw2108 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [10:26:24] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [10:26:44] RECOVERY - puppet last run on ms-be2030 is OK: OK: Puppet is 
currently enabled, last run 41 seconds ago with 0 failures [10:26:54] RECOVERY - puppet last run on cp2019 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [10:26:54] RECOVERY - puppet last run on cp2002 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [10:27:04] RECOVERY - puppet last run on mw2219 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [10:27:14] RECOVERY - puppet last run on ms-be2013 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [10:27:22] (03PS1) 10Urbanecm: Initial configuration for Dinka Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362168 (https://phabricator.wikimedia.org/T168518) [10:27:24] RECOVERY - puppet last run on cp2025 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [10:27:24] RECOVERY - puppet last run on cp2006 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [10:27:34] RECOVERY - puppet last run on db2079 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [10:27:44] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [10:27:55] RECOVERY - puppet last run on mw2246 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [10:28:04] RECOVERY - puppet last run on db2085 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:28:04] RECOVERY - puppet last run on prometheus2003 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [10:28:14] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [10:28:14] RECOVERY - puppet last run on cp2016 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:28:34] RECOVERY - puppet last run on cp2011 is OK: OK: Puppet is currently enabled, 
last run 18 seconds ago with 0 failures [10:28:54] RECOVERY - puppet last run on db2071 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [10:29:04] RECOVERY - puppet last run on puppetmaster2001 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [10:29:04] RECOVERY - puppet last run on mw2224 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [10:29:14] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for Dinka Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362168 (https://phabricator.wikimedia.org/T168518) (owner: 10Urbanecm) [10:29:24] RECOVERY - puppet last run on cp2012 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [10:29:34] RECOVERY - puppet last run on elastic2025 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:29:44] RECOVERY - puppet last run on db2010 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [10:29:44] RECOVERY - puppet last run on db2011 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [10:30:14] RECOVERY - puppet last run on mw2240 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [10:30:24] RECOVERY - puppet last run on ms-be2033 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:30:34] (03PS2) 10Urbanecm: Initial configuration for Dinka Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362168 (https://phabricator.wikimedia.org/T168518) [10:30:43] !log repooling acamar T168462 [10:30:44] RECOVERY - puppet last run on maps2001 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [10:30:44] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [10:30:50] !log ema@neodymium conftool action : set/pooled=yes; selector: 
name=acamar.wikimedia.org,service=pdns_recursor [10:30:54] RECOVERY - puppet last run on cp2005 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:30:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:55] T168462: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462 [10:31:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:31:04] RECOVERY - puppet last run on mw2222 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [10:31:12] now if eventbus fires an alarm again it means it doesn't like dns-recursors changes [10:31:24] RECOVERY - puppet last run on cp2008 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:31:34] RECOVERY - puppet last run on es2002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [10:31:34] RECOVERY - puppet last run on rdb2001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:31:54] RECOVERY - puppet last run on ores2002 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:32:24] RECOVERY - puppet last run on ms-be2026 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [10:32:24] RECOVERY - puppet last run on ms-be2022 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [10:33:04] RECOVERY - puppet last run on mw2245 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:33:34] RECOVERY - puppet last run on ms-be2035 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [10:34:44] RECOVERY - puppet last run on wtp2009 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [10:34:47] !log re-enable puppet and start pybal on lvs2001-2003 T168462 [10:34:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:04] 
RECOVERY - pybal on lvs2001 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal [10:36:54] RECOVERY - PyBal backends health check on lvs2001 is OK: PYBAL OK - All pools are healthy [10:37:54] RECOVERY - pybal on lvs2002 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal [10:37:54] RECOVERY - PyBal backends health check on lvs2002 is OK: PYBAL OK - All pools are healthy [10:38:54] RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy [10:39:04] RECOVERY - pybal on lvs2003 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal [10:40:58] (03PS1) 10Ema: Revert "Depool codfw for asw-a-codfw switch upgrade" [dns] - 10https://gerrit.wikimedia.org/r/362170 [10:41:19] (03PS1) 10Ema: Revert "Route cache traffic around codfw for asw-a-codfw switch upgrade" [puppet] - 10https://gerrit.wikimedia.org/r/362172 [10:44:04] RECOVERY - Check Varnish expiry mailbox lag on cp4013 is OK: OK: expiry mailbox lag is 0 [10:45:44] !log switching citoid and restbase-async back to codfw after T168462 [10:45:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:54] T168462: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462 [10:46:34] !log ema@neodymium conftool action : set/pooled=true; selector: name=codfw,dnsdisc=restbase-async [10:46:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:01] !log ema@neodymium conftool action : set/pooled=true; selector: name=codfw,dnsdisc=citoid [10:47:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:51:44] PROBLEM - Check Varnish expiry mailbox lag on cp1049 is CRITICAL: CRITICAL: expiry mailbox lag is 2115742 [10:52:54] RECOVERY - High lag on wdqs2003 is OK: OK: Less than 30.00% above the threshold [600.0] [10:53:48] !log elukey@tin Started deploy [analytics/refinery@f6cccf9]: Weekely refinery deployment [10:53:58] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:59] !log elukey@tin Finished deploy [analytics/refinery@f6cccf9]: Weekely refinery deployment (duration: 00m 11s) [10:54:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:54:54] !log elukey@tin Started deploy [analytics/refinery@f6cccf9]: Weekely refinery deployment [10:55:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:57:50] !log elukey@tin Finished deploy [analytics/refinery@f6cccf9]: Weekely refinery deployment (duration: 02m 56s) [10:57:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:25] !log ema@neodymium conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async [11:02:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:41] !log ema@neodymium conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async [11:02:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:03:23] !log ema@neodymium conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid [11:03:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:06:56] !log repool codfw in DNS after T168462 [11:07:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:07] T168462: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462 [11:07:44] (03CR) 10Ema: [V: 032 C: 032] Revert "Depool codfw for asw-a-codfw switch upgrade" [dns] - 10https://gerrit.wikimedia.org/r/362170 (owner: 10Ema) [11:09:30] !log ema@neodymium conftool action : set/ttl=300; selector: dnsdisc=(citoid|restbase-async) [11:09:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:25] (03CR) 10Daniel Kinzler: [C: 031] "I agree with the intent: we want /data/main/Foo to work on all wikis, and want it on Commons first." 
[puppet] - 10https://gerrit.wikimedia.org/r/360887 (https://phabricator.wikimedia.org/T163922) (owner: 10Ladsgroup) [11:16:53] (03CR) 10Daniel Kinzler: [C: 031] "Seems fine to me." [puppet] - 10https://gerrit.wikimedia.org/r/360891 (https://phabricator.wikimedia.org/T163922) (owner: 10Ladsgroup) [11:20:04] (03CR) 10Daniel Kinzler: Make /entity/ redirect internal (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/357985 (https://phabricator.wikimedia.org/T119536) (owner: 10Ladsgroup) [11:25:09] (03PS2) 10Thiemo Mättig (WMDE): mediawiki: Remove broken wikidata.org/ontology Apache alias [puppet] - 10https://gerrit.wikimedia.org/r/361801 (https://phabricator.wikimedia.org/T169023) (owner: 10Krinkle) [11:30:58] (03PS2) 10Ema: Revert "Route cache traffic around codfw for asw-a-codfw switch upgrade" [puppet] - 10https://gerrit.wikimedia.org/r/362172 [11:31:22] (03CR) 10Ema: [V: 032 C: 032] Revert "Route cache traffic around codfw for asw-a-codfw switch upgrade" [puppet] - 10https://gerrit.wikimedia.org/r/362172 (owner: 10Ema) [11:31:59] !log route ulsfo back to codfw T168462 [11:32:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:13] T168462: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462 [11:36:49] (03CR) 10Ladsgroup: [C: 04-1] "It should not be a redirect to wikiba.se as I mentioned earlier." [puppet] - 10https://gerrit.wikimedia.org/r/361801 (https://phabricator.wikimedia.org/T169023) (owner: 10Krinkle) [11:41:17] (03CR) 10Thiemo Mättig (WMDE): [C: 031] "I'm sorry? http://wikiba.se/ontology is the canonical URI. There is no other URI than this. Changing it to something else because of obscu" [puppet] - 10https://gerrit.wikimedia.org/r/361801 (https://phabricator.wikimedia.org/T169023) (owner: 10Krinkle) [11:44:33] (03CR) 10Ladsgroup: [C: 04-1] "If security or Ops are okay with having redirects to outside of the production cluster, I'm fine." 
[puppet] - 10https://gerrit.wikimedia.org/r/361801 (https://phabricator.wikimedia.org/T169023) (owner: 10Krinkle) [11:48:50] !log cp4015: restart varnish-be [11:49:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:10] 10Operations, 10media-storage, 10Patch-For-Review, 10User-fgiunchedi: Implement storage policies for swift - https://phabricator.wikimedia.org/T151648#3390487 (10fgiunchedi) [11:51:27] !log create xfs filesystems on fourth partition on ms-be machines - T151648 [11:51:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:37] T151648: Implement storage policies for swift - https://phabricator.wikimedia.org/T151648 [11:54:25] RECOVERY - Check Varnish expiry mailbox lag on cp4015 is OK: OK: expiry mailbox lag is 0 [11:56:30] 10Operations, 10Pybal, 10Traffic, 10User-Joe: Pybal not happy with DNS delays - https://phabricator.wikimedia.org/T154759#3390493 (10faidon) From what I understand, this happened again today during asw-a-codfw's upgrade (T168462), i.e. similar circumstances -> similar symptoms. We should probably prioritiz... 
[11:58:21] !log Stop replication on the same position for: dbstore1001 (s6) and db1050 - T169050 [11:58:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:58:30] T169050: dbstore1001 mysql crashed with: semaphore wait has lasted > 600 seconds - https://phabricator.wikimedia.org/T169050 [12:00:36] (03CR) 10Jonas Kress (WMDE): [C: 031] Configure WikibaseQualityConstraints extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/358553 (https://phabricator.wikimedia.org/T168938) (owner: 10Lucas Werkmeister (WMDE)) [12:17:50] (03PS1) 10Alexandros Kosiaris: Depool poolcounter1001 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362181 [12:18:25] !log Re-enable event scheduler on dbstore1001 - T169050 [12:18:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:18:37] T169050: dbstore1001 mysql crashed with: semaphore wait has lasted > 600 seconds - https://phabricator.wikimedia.org/T169050 [12:19:10] (03CR) 10Alexandros Kosiaris: [C: 032] Depool poolcounter1001 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362181 (owner: 10Alexandros Kosiaris) [12:19:22] (03CR) 10jenkins-bot: Depool poolcounter1001 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362181 (owner: 10Alexandros Kosiaris) [12:20:28] !log depool poolcounter1001 for kernel upgrades [12:20:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:24] (03PS1) 10Alexandros Kosiaris: Revert "Depool poolcounter1001" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362183 [12:22:28] 10Operations, 10Traffic, 10netops, 10Patch-For-Review, 10User-Joe: codfw row A switch upgrade - https://phabricator.wikimedia.org/T168462#3390519 (10ayounsi) 05Open>03Resolved Upgrade has been completed in ~1h45min. Notable events: - NSSU bug, where members a4 and a5 were not passing traffic after b... 
[12:23:11] !log akosiaris@tin Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 05s) [12:23:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:56] !log forcing reindex of cirrus / elasticsearch after switch upgrade [12:24:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:24:48] godog: hmm poolcounter still being used by thumbor [12:24:58] what can I do ? restart thumbor ? [12:27:43] (03CR) 10Alexandros Kosiaris: [C: 04-1] scap3 - deployment of packge requires configuration to already exist (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/362155 (https://phabricator.wikimedia.org/T169011) (owner: 10Gehel) [12:29:26] !log reboot nitrogen for kernel upgrades [12:29:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:32] sigh.. yeah I forgot to bump the config [12:33:05] (03PS1) 10Alexandros Kosiaris: thumbor: Use poolcounter1002 instead of poolcounter1001 [puppet] - 10https://gerrit.wikimedia.org/r/362186 [12:33:28] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] thumbor: Use poolcounter1002 instead of poolcounter1001 [puppet] - 10https://gerrit.wikimedia.org/r/362186 (owner: 10Alexandros Kosiaris) [12:34:14] PROBLEM - puppet last run on mw1259 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[12:35:05] RECOVERY - puppet last run on mw1259 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:37:47] (03PS3) 10Gehel: scap3 - deployment of package requires configuration to already exist [puppet] - 10https://gerrit.wikimedia.org/r/362155 (https://phabricator.wikimedia.org/T169011) [12:38:16] oh damn thumbor is many systemd instances [12:38:56] !log Stop replication on dbstore1002 - x1 - T169050 [12:39:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:07] T169050: dbstore1001 mysql crashed with: semaphore wait has lasted > 600 seconds - https://phabricator.wikimedia.org/T169050 [12:40:00] systemd partOf ... I did not know [12:40:02] nice! [12:41:41] !log reboot poolcounter1001 for kernel upgrades [12:41:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:02] (03PS1) 10Alexandros Kosiaris: Revert "thumbor: Use poolcounter1002 instead of poolcounter1001" [puppet] - 10https://gerrit.wikimedia.org/r/362189 [12:44:16] (03CR) 10Alexandros Kosiaris: [C: 032] Revert "thumbor: Use poolcounter1002 instead of poolcounter1001" [puppet] - 10https://gerrit.wikimedia.org/r/362189 (owner: 10Alexandros Kosiaris) [12:44:18] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Revert "thumbor: Use poolcounter1002 instead of poolcounter1001" [puppet] - 10https://gerrit.wikimedia.org/r/362189 (owner: 10Alexandros Kosiaris) [12:46:18] 10Operations, 10Dumps-Generation: Reboot snapshot hosts - https://phabricator.wikimedia.org/T168516#3390587 (10ArielGlenn) 05Open>03Resolved Done. 
[12:47:24] !log reboot argon.eqiad.wmnet, darmstadtium.eqiad.wmnet, dbmonitor1001.wikimedia.org, etcd1001.eqiad.wmnet, etcd1006.eqiad.wmnet, krypton.eqiad.wmnet, mendelevium.eqiad.wmnet, mwdebug1001.eqiad.wmnet, roentgenium.eqiad.wmnet, sca1003.eqiad.wmnet for kernel upgrades [12:47:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:51:55] (03CR) 10Gehel: "Running puppet-compiler on a list of affected nodes seems to work: https://puppet-compiler.wmflabs.org/6891/. The list of nodes to test wa" [puppet] - 10https://gerrit.wikimedia.org/r/362155 (https://phabricator.wikimedia.org/T169011) (owner: 10Gehel) [12:55:13] (03CR) 10Alexandros Kosiaris: [C: 032] Revert "Depool poolcounter1001" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362183 (owner: 10Alexandros Kosiaris) [12:56:02] (03CR) 10jenkins-bot: Revert "Depool poolcounter1001" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362183 (owner: 10Alexandros Kosiaris) [12:56:17] !log akosiaris@tin Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 46s) [12:56:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170629T1300). Please do the needful. [13:01:52] PROBLEM - Check systemd state on sodium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[13:09:53] RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational [13:15:22] (03PS3) 10Nschaaf: Stop reader surveys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360849 (https://phabricator.wikimedia.org/T131949) [13:23:27] (03PS19) 10Mforns: role::mariadb::analytics::custom_repl_slave: add eventlogging_cleaner.py [puppet] - 10https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) (owner: 10Elukey) [13:26:07] (03CR) 10Mforns: role::mariadb::analytics::custom_repl_slave: add eventlogging_cleaner.py (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) (owner: 10Elukey) [13:41:12] PROBLEM - Confd template for /etc/dsh/group/mediawiki-installation on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:12] PROBLEM - Check whether ferm is active by checking the default input chain on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:12] PROBLEM - Check size of conntrack table on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:12] PROBLEM - dhclient process on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:12] PROBLEM - Check systemd state on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:13] PROBLEM - configured eth on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:13] PROBLEM - confd service on bast3002 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [13:41:14] PROBLEM - DPKG on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:14] PROBLEM - salt-minion processes on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:22] PROBLEM - MD RAID on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:23] PROBLEM - puppet last run on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:41:26] (03PS4) 10ArielGlenn: treat wikidata just like enwiki for dumps [puppet] - 10https://gerrit.wikimedia.org/r/355100 [13:41:42] PROBLEM - Confd template for /etc/dsh/group/parsoid on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:45] (03PS1) 10Jcrespo: mariadb: Set default limits for systemd core databases [puppet] - 10https://gerrit.wikimedia.org/r/362204 (https://phabricator.wikimedia.org/T168356) [13:42:02] RECOVERY - Confd template for /etc/dsh/group/mediawiki-installation on bast3002 is OK: No errors detected [13:42:02] RECOVERY - Check whether ferm is active by checking the default input chain on bast3002 is OK: OK ferm input default policy is set [13:42:02] RECOVERY - Check size of conntrack table on bast3002 is OK: OK: nf_conntrack is 0 % full [13:42:02] RECOVERY - dhclient process on bast3002 is OK: PROCS OK: 0 processes with command name dhclient [13:42:42] did bast3002 go down again? [13:43:31] yeah, it is down for me :-( [13:44:22] PROBLEM - SSH on bast3002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:44:52] PROBLEM - Confd template for /etc/dsh/group/cassandra on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:44:52] PROBLEM - Disk space on bast3002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:44:52] PROBLEM - Confd template for /etc/dsh/group/jobrunner on bast3002 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[13:45:02] RECOVERY - configured eth on bast3002 is OK: OK - interfaces up [13:45:02] RECOVERY - Check systemd state on bast3002 is OK: OK - running: The system is fully operational [13:45:12] RECOVERY - salt-minion processes on bast3002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:45:12] RECOVERY - DPKG on bast3002 is OK: All packages OK [13:45:12] RECOVERY - confd service on bast3002 is OK: OK - confd is active [13:45:12] RECOVERY - SSH on bast3002 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [13:45:12] RECOVERY - MD RAID on bast3002 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [13:45:22] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 34 minutes ago with 0 failures [13:45:42] RECOVERY - Disk space on bast3002 is OK: DISK OK [13:45:42] RECOVERY - Confd template for /etc/dsh/group/parsoid on bast3002 is OK: No errors detected [13:45:42] RECOVERY - Confd template for /etc/dsh/group/cassandra on bast3002 is OK: No errors detected [13:45:42] RECOVERY - Confd template for /etc/dsh/group/jobrunner on bast3002 is OK: No errors detected [13:45:56] akosiaris: yup, looks right, thanks for taking care of that! [13:46:28] godog: yw [13:46:36] nice trick btw with the PartOf [13:46:41] I learned something today :-) [13:47:07] I should do that for uwsgi as well [13:48:32] hehe yeah for multi instance things that's a nice trick indeed [13:49:50] akosiaris: uwsgi uses distinct units IIRC not unit "templates" with @ ? not that it makes a difference I think [13:50:32] unrelated but I'm about to fail sdb from bast3002 mdadm, T169035 [13:50:33] T169035: bast3002 sdb broken - https://phabricator.wikimedia.org/T169035 [13:50:34] objections ? [13:51:44] godog: yeah I think it's distinct units. 
but that's probably easily changed [13:51:54] and no, I don't think it would make a diff either [13:52:38] sure, fail it out [13:54:15] !log kick sdb out of mdadm arrays on bast3002 - T169035 [13:54:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:12] PROBLEM - MD RAID on bast3002 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 3, Spare: 0 [13:56:13] ACKNOWLEDGEMENT - MD RAID on bast3002 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 3, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T169220 [13:56:16] 10Operations, 10ops-esams: Degraded RAID on bast3002 - https://phabricator.wikimedia.org/T169220#3390785 (10ops-monitoring-bot) [13:56:46] 10Operations, 10ops-esams: Degraded RAID on bast3002 - https://phabricator.wikimedia.org/T169220#3390793 (10fgiunchedi) [13:56:48] 10Operations, 10ops-esams: bast3002 sdb broken - https://phabricator.wikimedia.org/T169035#3390791 (10fgiunchedi) [14:08:52] !log Deploy alter table on s7 on dbstore1001 - T166208 [14:09:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:01] T166208: Convert unique keys into primary keys for some wiki tables on s7 - https://phabricator.wikimedia.org/T166208 [14:10:53] (03PS2) 10Ema: 4.1.7-1wm1: new upstream, new counters [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/361845 (https://phabricator.wikimedia.org/T164768) [14:11:07] (03CR) 10Elukey: "Like a lot this change, it would be great to update the analytics configs to pull events from the new varnishkafka topic (webrequest_canar" [puppet] - 10https://gerrit.wikimedia.org/r/361844 (https://phabricator.wikimedia.org/T169039) (owner: 10Ema) [14:13:59] (03PS3) 10Ema: 4.1.7-1wm1: new upstream, new counters [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/361845 (https://phabricator.wikimedia.org/T164768) [14:14:03] (03PS2) 10Elukey: Add cron job dropping webrequest from druid 
[puppet] - 10https://gerrit.wikimedia.org/r/362148 (https://phabricator.wikimedia.org/T168614) (owner: 10Joal) [14:16:07] (03PS6) 10Filippo Giunchedi: swift: use implicit /dev/swift prefix for swift devices [puppet] - 10https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673) [14:16:09] (03PS1) 10Filippo Giunchedi: Use fourth partition on ms-be SSD for swift data [puppet] - 10https://gerrit.wikimedia.org/r/362208 (https://phabricator.wikimedia.org/T151648) [14:16:45] (03CR) 10Elukey: "This cron will push logs to the same location as refinery-drop-webrequest-raw-partitions, is it intended?" [puppet] - 10https://gerrit.wikimedia.org/r/362148 (https://phabricator.wikimedia.org/T168614) (owner: 10Joal) [14:17:26] (03PS2) 10Filippo Giunchedi: Use fourth partition on ms-be SSD for swift data [puppet] - 10https://gerrit.wikimedia.org/r/362208 (https://phabricator.wikimedia.org/T151648) [14:19:06] (03CR) 10BBlack: [C: 031] 4.1.7-1wm1: new upstream, new counters [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/361845 (https://phabricator.wikimedia.org/T164768) (owner: 10Ema) [14:21:29] (03CR) 10Ema: [C: 032] 4.1.7-1wm1: new upstream, new counters [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/361845 (https://phabricator.wikimedia.org/T164768) (owner: 10Ema) [14:29:10] (03PS2) 10BBlack: Add CAA records for wikimedia.org/wikipedia.org [dns] - 10https://gerrit.wikimedia.org/r/356200 (https://phabricator.wikimedia.org/T155806) (owner: 10Faidon Liambotis) [14:30:08] (03CR) 10BBlack: [C: 031] "I'd still like to add globalsign to both and then expand the wp.org entry to the other canonicals, but neither are immediately critical, s" [dns] - 10https://gerrit.wikimedia.org/r/356200 (https://phabricator.wikimedia.org/T155806) (owner: 10Faidon Liambotis) [14:30:21] !log varnish 4.1.7-1wm1 uploaded to apt.w.o, cp1008 upgraded T164768 [14:30:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:30] 
T164768: Explicitly limit varnishd transient storage - https://phabricator.wikimedia.org/T164768 [14:31:50] (03CR) 10Filippo Giunchedi: [C: 032] Use fourth partition on ms-be SSD for swift data [puppet] - 10https://gerrit.wikimedia.org/r/362208 (https://phabricator.wikimedia.org/T151648) (owner: 10Filippo Giunchedi) [14:32:52] PROBLEM - Varnish HTTP text-backend - port 3128 on cp1008 is CRITICAL: connect to address 208.80.154.42 and port 3128: Connection refused [14:33:10] looking ^ [14:34:33] oh, I forgot to run puppet after the package upgrade, fixing [14:35:52] RECOVERY - Varnish HTTP text-backend - port 3128 on cp1008 is OK: HTTP OK: HTTP/1.1 200 OK - 174 bytes in 0.001 second response time [14:36:20] (03CR) 10BBlack: [C: 032] "Manually re-generated and validated, LGTM as-is" [dns] - 10https://gerrit.wikimedia.org/r/356200 (https://phabricator.wikimedia.org/T155806) (owner: 10Faidon Liambotis) [14:36:37] 10Operations, 10Discovery, 10Maps, 10Traffic, 10Interactive-Sprint: Rate-limit browsers without referers - https://phabricator.wikimedia.org/T154704#3390940 (10debt) [14:36:49] (03PS1) 10Alexandros Kosiaris: striker: Override http-socket config [puppet] - 10https://gerrit.wikimedia.org/r/362210 (https://phabricator.wikimedia.org/T169070) [14:37:04] 10Operations, 10Analytics, 10Traffic, 10Patch-For-Review: Implement Varnish-level rough ratelimiting - https://phabricator.wikimedia.org/T163233#3390943 (10debt) [14:38:28] 10Operations, 10Analytics, 10Traffic, 10Patch-For-Review: Implement Varnish-level rough ratelimiting - https://phabricator.wikimedia.org/T163233#3190763 (10debt) [14:38:33] 10Operations, 10Discovery, 10Maps, 10Traffic, 10Interactive-Sprint: Rate-limit browsers without referers - https://phabricator.wikimedia.org/T154704#2921080 (10debt) 05Open>03Resolved Closing, per new work detailed in T169175 [14:41:11] 10Operations, 10Traffic, 10HTTPS, 10Patch-For-Review: Add CAA records to our domains - https://phabricator.wikimedia.org/T155806#3390977 
(10BBlack) ssllabs confirms the expected changes above. The sslmate generator doesn't allow for custom entries to get globalsign.com in early (as a likely guess for when... [14:46:17] (03PS3) 10Jcrespo: mariadb: handle service for systemd -autostart and overrides [puppet] - 10https://gerrit.wikimedia.org/r/362156 (https://phabricator.wikimedia.org/T168356) [14:47:01] !log several restarts of db2072 services and host on the following hour [14:47:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:51] (03PS1) 10Rush: labstore: secondary cluster set 1004 as primary [puppet] - 10https://gerrit.wikimedia.org/r/362214 [14:52:27] (03CR) 10Paladox: "Ah phabricator support for stretch :)." [puppet] - 10https://gerrit.wikimedia.org/r/362124 (owner: 10Dzahn) [14:57:23] !log reboot aluminium.wikimedia.org bromine.eqiad.wmnet etherpad1001.eqiad.wmnet d-i-test.eqiad.wmnet kubestagetcd1001.eqiad.wmnet mx1001.wikimedia.org seaborgium.wikimedia.org for kernel upgrades [14:57:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:52] mobrovac: hello i am ready to work on scb2005 when you are done putting it in maintenance mode and power off thanks [15:00:34] papaul: great, will do so in the next 5 mins [15:01:45] mobrovac: thanks [15:01:52] PROBLEM - salt-minion processes on d-i-test is CRITICAL: Return code of 255 is out of bounds [15:02:02] PROBLEM - puppet last run on d-i-test is CRITICAL: Return code of 255 is out of bounds [15:02:12] ignore all of these ^ [15:02:12] PROBLEM - Check systemd state on d-i-test is CRITICAL: Return code of 255 is out of bounds [15:02:12] PROBLEM - configured eth on d-i-test is CRITICAL: Return code of 255 is out of bounds [15:02:20] system going down the drain [15:02:22] PROBLEM - Disk space on d-i-test is CRITICAL: Return code of 255 is out of bounds [15:02:22] PROBLEM - dhclient process on d-i-test is CRITICAL: Return code of 255 is out of bounds [15:02:38] !log purge d-i-test from 
puppet/salt [15:02:42] PROBLEM - DPKG on d-i-test is CRITICAL: Return code of 255 is out of bounds [15:02:44] PROBLEM - SSH on d-i-test is CRITICAL: connect to address 10.64.32.201 and port 22: Connection refused [15:02:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:47] 10Operations, 10Ops-Access-Requests, 10Scoring-platform-team, 10Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3391121 (10Halfak) [15:07:29] (03PS2) 10Jcrespo: mariadb: Set default limits for systemd core databases [puppet] - 10https://gerrit.wikimedia.org/r/362204 (https://phabricator.wikimedia.org/T168356) [15:08:29] (03CR) 10Jcrespo: "Because queueing due to the pool of connections, 1 connection =/= 1 thread, so the default (~10K threads) is probably more than enough for" [puppet] - 10https://gerrit.wikimedia.org/r/362204 (https://phabricator.wikimedia.org/T168356) (owner: 10Jcrespo) [15:09:35] (03PS3) 10Jcrespo: mariadb: Set default limits for systemd core databases [puppet] - 10https://gerrit.wikimedia.org/r/362204 (https://phabricator.wikimedia.org/T168356) [15:09:56] !log set downtimes for labstore1004/1005 failover see https://etherpad.wikimedia.org/p/labstore_reboots [15:10:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:03] (03CR) 10Joal: "I thought it was a good idea to keep all webrequest-related logs together.
If you prefer, we can use another file :)" [puppet] - 10https://gerrit.wikimedia.org/r/362148 (https://phabricator.wikimedia.org/T168614) (owner: 10Joal) [15:12:38] (03CR) 10Paladox: phabricator: add support for stretch and PHP7 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/362124 (owner: 10Dzahn) [15:20:50] PROBLEM - showmount succeeds on a labs instance on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/nfs/secondary_cluster_showmount - 185 bytes in 26.105 second response time [15:21:20] PROBLEM - Host dbmonitor1001 is DOWN: PING CRITICAL - Packet loss = 100% [15:21:50] PROBLEM - Host krypton is DOWN: PING CRITICAL - Packet loss = 100% [15:22:11] PROBLEM - Host etcd1001 is DOWN: PING CRITICAL - Packet loss = 100% [15:22:11] PROBLEM - Host etherpad1001 is DOWN: PING CRITICAL - Packet loss = 100% [15:22:11] PROBLEM - Host etcd1006 is DOWN: PING CRITICAL - Packet loss = 100% [15:22:11] PROBLEM - Host mx1001 is DOWN: PING CRITICAL - Packet loss = 100% [15:22:11] PROBLEM - Host aluminium is DOWN: PING CRITICAL - Packet loss = 100% [15:22:11] PROBLEM - Host nitrogen is DOWN: PING CRITICAL - Packet loss = 100% [15:22:11] PROBLEM - Host roentgenium is DOWN: PING CRITICAL - Packet loss = 100% [15:22:12] PROBLEM - Host kubestagetcd1001 is DOWN: PING CRITICAL - Packet loss = 100% [15:22:12] PROBLEM - Host bromine is DOWN: PING CRITICAL - Packet loss = 100% [15:22:13] PROBLEM - Host mendelevium is DOWN: PING CRITICAL - Packet loss = 100% [15:22:13] PROBLEM - Host mwdebug1001 is DOWN: PING CRITICAL - Packet loss = 100% [15:22:20] PROBLEM - Host seaborgium is DOWN: PING CRITICAL - Packet loss = 100% [15:22:20] PROBLEM - Host sca1003 is DOWN: PING CRITICAL - Packet loss = 100% [15:22:20] PROBLEM - Host darmstadtium is DOWN: PING CRITICAL - Packet loss = 100% [15:22:20] PROBLEM - Host argon is DOWN: PING CRITICAL - Packet loss = 100% [15:22:30] PROBLEM - citoid
endpoints health on scb1004 is CRITICAL: /api (bad URL) timed out before a response was received: /api (Zotero alive) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [15:22:30] PROBLEM - Host poolcounter1001 is DOWN: PING CRITICAL - Packet loss = 100% [15:22:40] PROBLEM - Postgres Replication Lag on nihal is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:22:50] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (bad URL) timed out before a response was received: /api (Zotero alive) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [15:22:50] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (bad URL) timed out before a response was received: /api (Zotero alive) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [15:22:50] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad URL) timed out before a response was received: /api (Zotero alive) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [15:22:51] PROBLEM - LibreNMS HTTPS on netmon1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:23:01] PROBLEM - url_downloader on alsafi is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:23:10] PROBLEM - Check Varnish expiry mailbox lag on cp4013 is CRITICAL: CRITICAL: expiry mailbox lag is 2038252 [15:23:17] I am the cause of all these ^ [15:23:20] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (bad URL) timed out before a response was received: /api (Zotero alive) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received [15:23:30] RECOVERY - showmount succeeds on a labs instance on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.551 second response time [15:24:30] 
PROBLEM - puppet last run on db1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:24:43] (03PS1) 10Jcrespo: mariadb: Add cluster manager hosts to allowed admin port users [puppet] - 10https://gerrit.wikimedia.org/r/362217 [15:24:50] PROBLEM - puppet last run on elastic1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:00] PROBLEM - puppet last run on labvirt1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:00] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:10] PROBLEM - puppet last run on planet1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:10] PROBLEM - puppet last run on cp1071 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:10] PROBLEM - puppet last run on labsdb1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:10] PROBLEM - puppet last run on radium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:11] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:20] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:30] PROBLEM - puppet last run on mc1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:30] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:30] PROBLEM - puppet last run on graphite1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:25:52] (03PS1) 10Alexandros Kosiaris: Revert "Revert "Depool poolcounter1001"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362218 [15:26:00] PROBLEM - puppet last run on serpens is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:10] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:10] PROBLEM - puppet last run on multatuli is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:20] PROBLEM - puppet last run on db1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:30] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:30] PROBLEM - puppet last run on analytics1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:30] PROBLEM - puppet last run on prometheus1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:30] PROBLEM - puppet last run on ununpentium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:31] PROBLEM - puppet last run on dbproxy1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:31] PROBLEM - puppet last run on ms-fe1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:38] (03CR) 10BryanDavis: "I will test this with a cherry pick in my labs project for striker. It looks like it should do the job, but easy to make sure." 
[puppet] - 10https://gerrit.wikimedia.org/r/362210 (https://phabricator.wikimedia.org/T169070) (owner: 10Alexandros Kosiaris) [15:26:50] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [15:26:51] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:55] !log scb2005 depooled all services for T167763 [15:27:00] PROBLEM - puppet last run on snapshot1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:00] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:04] T167763: Troubleshoot scb2005 NICs - https://phabricator.wikimedia.org/T167763 [15:27:10] PROBLEM - puppet last run on cp1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:10] RECOVERY - Host mw1228 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [15:27:10] PROBLEM - puppet last run on labvirt1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:11] PROBLEM - puppet last run on db1093 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:20] PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:20] PROBLEM - puppet last run on mc1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:20] PROBLEM - puppet last run on thumbor1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:22] papaul: ok, we are good to go, i set a 2h maintenance window, is that enough? 
[15:27:30] PROBLEM - puppet last run on kafka1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:30] PROBLEM - puppet last run on elastic1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:30] PROBLEM - puppet last run on wtp1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:31] PROBLEM - puppet last run on analytics1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:31] PROBLEM - puppet last run on analytics1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:31] PROBLEM - puppet last run on db1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:31] PROBLEM - puppet last run on ganeti1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:32] PROBLEM - puppet last run on elastic1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:32] PROBLEM - puppet last run on mwlog1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:33] PROBLEM - puppet last run on rdb1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:43] mobrovac: please pause any maintenance [15:27:48] we have issues ongoing [15:27:50] PROBLEM - puppet last run on ms-be1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:50] PROBLEM - puppet last run on elastic1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:27:50] PROBLEM - puppet last run on maerlant is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:27:50] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:00] PROBLEM - puppet last run on mw1258 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:01] PROBLEM - puppet last run on labvirt1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:01] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:01] PROBLEM - puppet last run on mc1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:01] PROBLEM - puppet last run on restbase1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:10] PROBLEM - puppet last run on db1101 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:10] PROBLEM - puppet last run on wtp1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:10] PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:10] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:10] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:20] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:24] kk jynus, papaul ^ [15:28:31] PROBLEM - puppet last run on mw1193 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:28:31] PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:31] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:31] PROBLEM - puppet last run on db1102 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:31] PROBLEM - puppet last run on kubernetes1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:31] PROBLEM - puppet last run on auth1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:31] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:32] PROBLEM - puppet last run on mc1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:32] PROBLEM - puppet last run on dbproxy1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:50] PROBLEM - puppet last run on db1079 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:28:50] PROBLEM - puppet last run on mw1284 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:00] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [15:29:00] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:00] PROBLEM - puppet last run on db1078 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:29:00] PROBLEM - puppet last run on mc1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:01] PROBLEM - puppet last run on maps1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:10] PROBLEM - puppet last run on rdb1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:10] PROBLEM - puppet last run on elastic1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:10] PROBLEM - puppet last run on cp1058 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:10] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:20] PROBLEM - puppet last run on mw1224 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:20] PROBLEM - Nginx local proxy to apache on mw1228 is CRITICAL: connect to address 10.64.48.63 and port 443: Connection refused [15:29:20] PROBLEM - salt-minion processes on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:29:20] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:20] PROBLEM - puppet last run on mc1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:20] PROBLEM - puppet last run on conf1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:21] PROBLEM - puppet last run on netmon1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:29:21] PROBLEM - nutcracker port on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:29:22] PROBLEM - Disk space on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:29:30] PROBLEM - Check systemd state on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:29:30] PROBLEM - puppet last run on ms-be1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:30] PROBLEM - HHVM processes on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:29:30] PROBLEM - nutcracker process on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:29:30] PROBLEM - Check whether ferm is active by checking the default input chain on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:29:30] PROBLEM - configured eth on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:29:31] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:31] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:32] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:32] PROBLEM - puppet last run on etcd1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:33] PROBLEM - puppet last run on ganeti1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:33] PROBLEM - puppet last run on aqs1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:34] PROBLEM - puppet last run on mw1273 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:29:34] PROBLEM - puppet last run on restbase1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:50] PROBLEM - HHVM rendering on mw1228 is CRITICAL: connect to address 10.64.48.63 and port 80: Connection refused [15:29:50] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:29:51] PROBLEM - puppet last run on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:30:00] PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:00] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:00] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:10] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:10] PROBLEM - puppet last run on analytics1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:10] PROBLEM - puppet last run on kafka1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:10] PROBLEM - dhclient process on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:30:11] PROBLEM - Check size of conntrack table on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:30:11] PROBLEM - DPKG on mw1228 is CRITICAL: Return code of 255 is out of bounds [15:30:11] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:12] PROBLEM - puppet last run on dbproxy1010 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:30:12] PROBLEM - puppet last run on cp1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:20] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:20] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:20] PROBLEM - puppet last run on kubestagetcd1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:21] PROBLEM - puppet last run on scb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:30] PROBLEM - puppet last run on naos is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:30] PROBLEM - puppet last run on es1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:30] PROBLEM - puppet last run on releases1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:30] PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:30] PROBLEM - puppet last run on db1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:31] PROBLEM - puppet last run on dbproxy1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:30:31] PROBLEM - puppet last run on etcd1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:30:34] (CR) Alexandros Kosiaris: [V: 2 C: 2] Revert "Revert "Depool poolcounter1001"" [mediawiki-config] - https://gerrit.wikimedia.org/r/362218 (owner: Alexandros Kosiaris) [15:31:00] PROBLEM - puppet last run on mx2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:01] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:10] PROBLEM - puppet last run on elastic1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:10] PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:10] PROBLEM - puppet last run on mw1288 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:10] PROBLEM - puppet last run on wtp1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:10] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:11] PROBLEM - puppet last run on mc1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:11] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:20] PROBLEM - puppet last run on mc1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:20] PROBLEM - puppet last run on cp3049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:20] PROBLEM - puppet last run on db1086 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [15:31:20] PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:20] PROBLEM - puppet last run on db1077 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:21] PROBLEM - puppet last run on ms-be1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:25] cannot reach Wikitech [15:31:26] Operations, ops-codfw, Labs, Labs-Infrastructure: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3391332 (RobH) [15:31:30] PROBLEM - puppet last run on mw1287 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:30] PROBLEM - puppet last run on analytics1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:31] PROBLEM - puppet last run on wtp1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:31] PROBLEM - puppet last run on analytics1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:31] PROBLEM - puppet last run on dumpsdata1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:31] PROBLEM - puppet last run on analytics1069 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:31] PROBLEM - puppet last run on mw1219 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:32] PROBLEM - puppet last run on kafka1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:32] PROBLEM - puppet last run on db1064 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [15:31:33] PROBLEM - puppet last run on logstash1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:33] PROBLEM - puppet last run on logstash1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:34] PROBLEM - puppet last run on dbproxy1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:40] PROBLEM - Host ganeti1002 is DOWN: PING CRITICAL - Packet loss = 100% [15:31:50] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack [15:31:51] PROBLEM - Host ganeti1004 is DOWN: PING CRITICAL - Packet loss = 100% [15:31:51] PROBLEM - Host ganeti1003 is DOWN: PING CRITICAL - Packet loss = 100% [15:31:51] PROBLEM - puppet last run on druid1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:31:51] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:00] PROBLEM - puppet last run on puppetmaster2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:10] PROBLEM - puppet last run on mw1242 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:10] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:11] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:11] PROBLEM - puppet last run on db1088 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:11] PROBLEM - puppet last run on mw1280 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:32:11] PROBLEM - puppet last run on labservices1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:11] PROBLEM - puppet last run on elastic1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:12] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:20] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:20] PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:20] PROBLEM - puppet last run on db1090 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:20] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:20] PROBLEM - puppet last run on ms-be1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:30] PROBLEM - puppet last run on lvs1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:30] PROBLEM - puppet last run on elastic1041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:30] PROBLEM - puppet last run on mw1270 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:30] PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:30] PROBLEM - puppet last run on db1016 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:32:31] PROBLEM - puppet last run on meitnerium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:31] PROBLEM - puppet last run on mc1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:32] PROBLEM - puppet last run on mw1304 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:32] PROBLEM - puppet last run on dataset1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:33] PROBLEM - puppet last run on elastic1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:40] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:00] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:10] RECOVERY - Check Varnish expiry mailbox lag on cp4013 is OK: OK: expiry mailbox lag is 0 [15:33:10] PROBLEM - puppet last run on labvirt1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:20] PROBLEM - puppet last run on labvirt1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:20] PROBLEM - puppet last run on labvirt1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:20] PROBLEM - puppet last run on mw1290 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:20] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:20] PROBLEM - puppet last run on mc1033 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:33:20] PROBLEM - puppet last run on elastic1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:30] PROBLEM - puppet last run on elastic1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:31] PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:31] PROBLEM - puppet last run on mw1266 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:31] PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:31] PROBLEM - puppet last run on mc1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:40] PROBLEM - puppet last run on puppetmaster2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:40] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:40] PROBLEM - puppet last run on elastic1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:41] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:50] PROBLEM - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:01] !log akosiaris@tin Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 02m 54s) [15:34:10] PROBLEM - puppet last run on aqs1006 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:34:10] PROBLEM - puppet last run on druid1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:10] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:10] PROBLEM - puppet last run on rutherfordium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:10] PROBLEM - puppet last run on cp3039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:34:20] PROBLEM - puppet last run on dbproxy1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:30] PROBLEM - puppet last run on lvs1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:30] PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:30] PROBLEM - puppet last run on mw1306 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:31] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:31] PROBLEM - puppet last run on analytics1055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:31] PROBLEM - puppet last run on mw1180 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:31] PROBLEM - puppet last run on wtp1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:31] PROBLEM - puppet last run on analytics1053 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:34:32] PROBLEM - puppet last run on analytics1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:32] PROBLEM - puppet last run on analytics1041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:33] PROBLEM - puppet last run on wtp1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:33] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:34] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:34] PROBLEM - puppet last run on elastic1040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:50] PROBLEM - puppet last run on mw1272 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:00] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [15:35:00] PROBLEM - puppet last run on prometheus2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:01] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:10] PROBLEM - puppet last run on druid1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:10] PROBLEM - puppet last run on kafka1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:10] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:35:10] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:10] PROBLEM - puppet last run on wtp1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:20] PROBLEM - puppet last run on ms-be1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:20] PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:20] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:21] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:21] PROBLEM - puppet last run on ms-be1038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:30] PROBLEM - puppet last run on db1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:30] PROBLEM - puppet last run on elastic1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:30] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:31] PROBLEM - puppet last run on analytics1056 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:31] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:31] PROBLEM - puppet last run on mw1261 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:35:31] PROBLEM - puppet last run on es1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:32] PROBLEM - puppet last run on ores1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:32] PROBLEM - puppet last run on es1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:33] PROBLEM - puppet last run on analytics1059 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:33] PROBLEM - puppet last run on pc1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:34] PROBLEM - puppet last run on analytics1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:34] PROBLEM - puppet last run on lvs1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:35:50] PROBLEM - puppet last run on labnodepool1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:01] PROBLEM - puppet last run on cp3032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:10] PROBLEM - puppet last run on restbase1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:10] PROBLEM - puppet last run on mc1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:10] PROBLEM - puppet last run on mc1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:10] PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:36:10] PROBLEM - puppet last run on wtp1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:10] PROBLEM - puppet last run on mw1262 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:11] PROBLEM - puppet last run on labnet1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:11] PROBLEM - puppet last run on db1038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:12] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:12] PROBLEM - puppet last run on mc1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:13] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:20] PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:20] PROBLEM - puppet last run on mc1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:20] PROBLEM - puppet last run on ms-fe1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:20] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:30] PROBLEM - puppet last run on ms-fe1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:30] PROBLEM - puppet last run on analytics1042 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:36:31] PROBLEM - puppet last run on mw1167 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:31] PROBLEM - puppet last run on analytics1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:31] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:31] PROBLEM - puppet last run on es1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:31] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:32] PROBLEM - puppet last run on labvirt1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:40] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:40] PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:36:50] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:00] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:00] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:00] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:00] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:37:10] PROBLEM - puppet last run on mc1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:10] PROBLEM - puppet last run on labvirt1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:10] PROBLEM - puppet last run on scb1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:10] PROBLEM - puppet last run on dumpsdata1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:10] PROBLEM - puppet last run on wtp1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:10] PROBLEM - puppet last run on pc1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:11] PROBLEM - puppet last run on mc1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:11] PROBLEM - puppet last run on mc1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:12] PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:20] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:20] PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:30] PROBLEM - puppet last run on elastic1042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:30] PROBLEM - puppet last run on mw1286 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:37:30] PROBLEM - puppet last run on db1055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:31] PROBLEM - puppet last run on dbproxy1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:37:31] PROBLEM - puppet last run on dbproxy1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:00] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:00] PROBLEM - puppet last run on db1082 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:00] PROBLEM - puppet last run on alsafi is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:01] PROBLEM - puppet last run on db1095 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:01] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:10] PROBLEM - puppet last run on mw1256 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:10] PROBLEM - puppet last run on mw1201 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:10] PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:10] PROBLEM - puppet last run on mw1250 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:10] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:38:10] PROBLEM - puppet last run on wtp1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:11] PROBLEM - puppet last run on puppetmaster1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:11] PROBLEM - puppet last run on mc1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:12] PROBLEM - puppet last run on francium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:12] PROBLEM - puppet last run on ms1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:13] PROBLEM - puppet last run on labvirt1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:13] PROBLEM - puppet last run on lvs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:20] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:20] PROBLEM - puppet last run on elastic1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:20] PROBLEM - puppet last run on mc1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:24] !log restart nfs-exportd on labstore1004 [15:38:30] PROBLEM - puppet last run on elastic1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:30] PROBLEM - puppet last run on labvirt1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:30] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:38:30] PROBLEM - puppet last run on mw1300 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:30] PROBLEM - puppet last run on relforge1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:31] PROBLEM - puppet last run on dubnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:31] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:32] PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:32] PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:33] PROBLEM - puppet last run on ores1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:33] PROBLEM - puppet last run on ores1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:34] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:40] PROBLEM - puppet last run on dbmonitor2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:40] PROBLEM - puppet last run on install2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:40] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:40] PROBLEM - puppet last run on lithium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:38:55] madhuvishy: same [15:39:00] PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:00] PROBLEM - puppet last run on lvs1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:05] hmm https://www.mediawiki.org is not loading for me [15:39:10] PROBLEM - puppet last run on rdb1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:10] PROBLEM - puppet last run on restbase1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:10] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:10] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:10] PROBLEM - puppet last run on restbase1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:11] PROBLEM - puppet last run on cp1073 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:11] it's slow [15:39:11] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:12] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:12] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:20] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:20] PROBLEM - puppet last run on db1074 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:39:20] PROBLEM - puppet last run on labcontrol1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:21] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:30] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:31] PROBLEM - puppet last run on relforge1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:31] PROBLEM - puppet last run on labvirt1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:31] PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:31] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:31] PROBLEM - puppet last run on praseodymium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:31] PROBLEM - puppet last run on graphite1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:40] PROBLEM - puppet last run on elastic1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:40] PROBLEM - puppet last run on ms-be1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:50] PROBLEM - puppet last run on db1092 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:39:50] PROBLEM - puppet last run on mw1302 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:39:51] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:10] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:12] PROBLEM - puppet last run on mw1303 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:12] PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:12] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:12] PROBLEM - puppet last run on maps1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:20] PROBLEM - puppet last run on db1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:30] PROBLEM - puppet last run on ms-be1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:30] PROBLEM - puppet last run on mw1187 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:30] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:30] PROBLEM - puppet last run on kafka1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:31] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:31] PROBLEM - puppet last run on db1070 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:40:31] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:32] PROBLEM - puppet last run on kubernetes1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:32] PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:33] PROBLEM - puppet last run on maps1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:50] PROBLEM - puppet last run on labsdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:50] PROBLEM - puppet last run on db1075 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:58] !log disable puppet on all of eqiad/esams, problems with ganeti and puppetdb [15:41:00] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:10] PROBLEM - puppet last run on mc1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:10] PROBLEM - puppet last run on mw1241 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:10] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:10] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:10] PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:41:11] PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:11] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:20] PROBLEM - puppet last run on db1081 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:20] PROBLEM - puppet last run on rcs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:30] PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:30] PROBLEM - puppet last run on wdqs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:31] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:31] PROBLEM - puppet last run on mc1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:31] PROBLEM - puppet last run on mw1267 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:31] PROBLEM - puppet last run on aqs1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:32] PROBLEM - puppet last run on kubestage1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:32] PROBLEM - puppet last run on es1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:33] PROBLEM - puppet last run on mw1227 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:41:33] PROBLEM - puppet last run on analytics1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:34] PROBLEM - puppet last run on dbproxy1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:40] RECOVERY - Check Varnish expiry mailbox lag on cp1049 is OK: OK: expiry mailbox lag is 0 [15:41:40] PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:51] PROBLEM - puppet last run on cp3030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:41:51] PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:00] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:10] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:10] PROBLEM - puppet last run on mw1271 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:10] PROBLEM - puppet last run on mw1268 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:11] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:11] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:11] PROBLEM - puppet last run on notebook1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:11] PROBLEM - puppet last run on restbase1011 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:42:30] PROBLEM - puppet last run on ms-be1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:30] PROBLEM - puppet last run on ms-fe3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:30] PROBLEM - puppet last run on ganeti1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:30] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:30] PROBLEM - puppet last run on kafka1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:31] PROBLEM - puppet last run on wtp1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:31] PROBLEM - puppet last run on chlorine is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:33] PROBLEM - puppet last run on wtp1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:33] PROBLEM - puppet last run on kubestage1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:33] PROBLEM - puppet last run on es1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:33] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:34] PROBLEM - puppet last run on es1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:34] PROBLEM - puppet last run on db1053 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:42:35] PROBLEM - puppet last run on mc1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:40] RECOVERY - Host ganeti1002 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [15:42:40] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:50] PROBLEM - puppet last run on ms-be1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:50] PROBLEM - mediawiki-installation DSH group on mw1228 is CRITICAL: Host mw1228 is not in mediawiki-installation dsh group [15:43:10] PROBLEM - puppet last run on mw1255 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:10] PROBLEM - puppet last run on ores1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:11] PROBLEM - puppet last run on mw1265 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:11] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:11] PROBLEM - puppet last run on mw1223 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:11] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:20] PROBLEM - puppet last run on db1076 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:20] PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:20] PROBLEM - puppet last run on mw1254 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:43:30] PROBLEM - puppet last run on elastic1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:30] PROBLEM - puppet last run on ms-be1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:30] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:30] PROBLEM - puppet last run on maps1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:31] PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:31] PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:31] PROBLEM - puppet last run on aqs1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:32] PROBLEM - puppet last run on etcd1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:32] PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:33] PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:33] PROBLEM - puppet last run on radon is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:34] PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:34] PROBLEM - puppet last run on logstash1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:43:41] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:41] PROBLEM - puppet last run on mw1296 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:41] PROBLEM - puppet last run on mc1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:41] PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:50] PROBLEM - puppet last run on hassium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:43:50] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:03] ooooh [15:44:10] PROBLEM - puppet last run on mw1239 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:10] PROBLEM - puppet last run on analytics1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:10] PROBLEM - puppet last run on mw1200 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:10] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:10] PROBLEM - puppet last run on cp1074 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:10] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:11] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:44:11] PROBLEM - puppet last run on rdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:12] PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:12] PROBLEM - puppet last run on db1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:13] PROBLEM - puppet last run on ms-be1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:20] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:21] PROBLEM - puppet last run on db1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:30] PROBLEM - puppet last run on mc1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:30] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:30] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:30] PROBLEM - puppet last run on ms-be1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:30] PROBLEM - puppet last run on db1099 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:30] PROBLEM - puppet last run on elastic1038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:31] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:44:31] PROBLEM - puppet last run on oresrdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:44:32] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:00] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:01] (03CR) 10Mobrovac: [C: 031] scap3 - deployment of package requires configuration to already exist [puppet] - 10https://gerrit.wikimedia.org/r/362155 (https://phabricator.wikimedia.org/T169011) (owner: 10Gehel) [15:45:10] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:10] PROBLEM - puppet last run on db1091 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:10] PROBLEM - puppet last run on mw1299 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:12] PROBLEM - puppet last run on mw1283 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:12] PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:12] PROBLEM - puppet last run on mw1269 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:12] PROBLEM - puppet last run on thorium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:12] PROBLEM - puppet last run on oresrdb1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:12] PROBLEM - puppet last run on logstash1004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:45:13] PROBLEM - puppet last run on rdb1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:20] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:20] PROBLEM - puppet last run on cp1099 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:20] PROBLEM - puppet last run on eventlog1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:20] PROBLEM - puppet last run on mw1259 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:20] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:20] PROBLEM - puppet last run on db1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:21] PROBLEM - puppet last run on db1085 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:21] PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:22] PROBLEM - puppet last run on mw1252 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:30] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:30] PROBLEM - puppet last run on mw1281 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:30] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:45:31] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:31] PROBLEM - puppet last run on wtp1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:31] PROBLEM - puppet last run on cobalt is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:31] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:32] PROBLEM - puppet last run on analytics1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:32] PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:33] PROBLEM - puppet last run on prometheus1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:33] PROBLEM - puppet last run on ores1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:40] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:50] RECOVERY - LibreNMS HTTPS on netmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8461 bytes in 0.035 second response time [15:46:00] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:00] RECOVERY - Host seaborgium is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [15:46:00] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:11] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:46:11] PROBLEM - puppet last run on ganeti1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:11] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:11] PROBLEM - puppet last run on ocg1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:11] PROBLEM - puppet last run on snapshot1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:11] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:11] PROBLEM - puppet last run on cp3047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:20] PROBLEM - puppet last run on ms-be1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:20] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:30] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:30] PROBLEM - puppet last run on ms-be1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:30] PROBLEM - puppet last run on kubestagetcd1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:30] PROBLEM - puppet last run on wdqs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:31] PROBLEM - puppet last run on labvirt1017 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:46:31] PROBLEM - puppet last run on restbase-dev1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:32] PROBLEM - puppet last run on db1097 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:32] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:46:40] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:10] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:10] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:10] PROBLEM - puppet last run on thumbor1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:10] PROBLEM - puppet last run on analytics1061 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:10] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:11] PROBLEM - puppet last run on labcontrol1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:11] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:20] PROBLEM - puppet last run on mwdebug1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:20] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:47:20] PROBLEM - puppet last run on mw1229 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:30] PROBLEM - puppet last run on elastic1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:30] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:30] PROBLEM - puppet last run on ores1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:31] PROBLEM - puppet last run on db1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:40] PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:40] PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:40] PROBLEM - puppet last run on ganeti1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:50] PROBLEM - puppet last run on ms-be1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:47:50] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:00] PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:00] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:48:05] (03CR) 10jenkins-bot: Revert "Revert "Depool poolcounter1001"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362218 (owner: 10Alexandros Kosiaris) [15:48:10] PROBLEM - puppet last run on db1096 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:10] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:10] PROBLEM - puppet last run on labtestservices2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:20] PROBLEM - puppet last run on labvirt1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:20] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:20] PROBLEM - puppet last run on labvirt1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:20] PROBLEM - puppet last run on db1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:20] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:21] PROBLEM - puppet last run on mw1231 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:21] PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:22] PROBLEM - puppet last run on stat1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:22] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:48:30] PROBLEM - puppet last run on seaborgium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:30] PROBLEM - puppet last run on analytics1058 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:30] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:31] PROBLEM - puppet last run on analytics1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:31] PROBLEM - puppet last run on mw1209 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:31] PROBLEM - puppet last run on thumbor1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:31] PROBLEM - puppet last run on restbase-dev1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:32] PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:40] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:40] PROBLEM - puppet last run on elastic1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:40] PROBLEM - puppet last run on restbase-dev1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:50] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:48:50] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:49:10] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:10] PROBLEM - puppet last run on db1041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:10] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:10] PROBLEM - puppet last run on rdb1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:11] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:20] PROBLEM - puppet last run on scb1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:20] PROBLEM - puppet last run on db1087 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:21] PROBLEM - puppet last run on mc1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:21] (03PS1) 10Filippo Giunchedi: thumbor: use poolcounter1002 [puppet] - 10https://gerrit.wikimedia.org/r/362225 [15:49:30] PROBLEM - puppet last run on aqs1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:30] PROBLEM - puppet last run on kafka1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:30] PROBLEM - puppet last run on analytics1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:31] PROBLEM - puppet last run on wtp1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:31] PROBLEM - puppet last run on wtp1022 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:49:31] PROBLEM - puppet last run on analytics1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:31] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:32] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:40] PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:40] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:50] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] thumbor: use poolcounter1002 [puppet] - 10https://gerrit.wikimedia.org/r/362225 (owner: 10Filippo Giunchedi) [15:49:50] PROBLEM - puppet last run on ms-be1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:49:50] PROBLEM - Check for valid instance states on labnodepool1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:49:53] PROBLEM - puppet last run on elastic1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:11] PROBLEM - puppet last run on mw1285 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:11] PROBLEM - puppet last run on bohrium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:11] PROBLEM - puppet last run on mw1276 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:11] PROBLEM - puppet last run on mw1277 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:50:11] PROBLEM - puppet last run on elastic1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:13] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:20] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:20] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:30] PROBLEM - puppet last run on poolcounter1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:31] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:31] PROBLEM - puppet last run on elastic1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:31] PROBLEM - puppet last run on labvirt1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:31] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:31] PROBLEM - puppet last run on db1103 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:31] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:32] PROBLEM - puppet last run on analytics1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:32] PROBLEM - puppet last run on db1061 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:50:33] PROBLEM - puppet last run on restbase1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:40] PROBLEM - puppet last run on mc1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:40] PROBLEM - puppet last run on db1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:50] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:50:50] PROBLEM - puppet last run on ms-fe1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:00] PROBLEM - puppet last run on aqs1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:10] PROBLEM - puppet last run on mw1260 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:10] PROBLEM - puppet last run on mw1245 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:10] PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:10] PROBLEM - puppet last run on mw1247 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:10] PROBLEM - puppet last run on analytics1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:11] PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:11] PROBLEM - puppet last run on fermium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:51:12] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:12] PROBLEM - puppet last run on ms-be1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:20] PROBLEM - puppet last run on db1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:20] PROBLEM - puppet last run on db1084 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:21] PROBLEM - puppet last run on db1080 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:30] PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:30] PROBLEM - puppet last run on mw1225 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:30] PROBLEM - puppet last run on analytics1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:31] PROBLEM - puppet last run on ms-be1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:31] PROBLEM - puppet last run on analytics1064 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:31] PROBLEM - puppet last run on analytics1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:31] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:32] PROBLEM - puppet last run on kubernetes1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:51:40] PROBLEM - puppet last run on lvs1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:40] PROBLEM - puppet last run on ms-be1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:41] PROBLEM - puppet last run on prometheus2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:41] PROBLEM - puppet last run on labtestcontrol2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:41] RECOVERY - Check for valid instance states on labnodepool1001 is OK: nodepool state management is OK [15:52:00] PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:00] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:10] PROBLEM - puppet last run on etcd1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:10] PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:11] PROBLEM - puppet last run on labsdb1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:20] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:20] PROBLEM - puppet last run on db1083 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:30] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:52:30] PROBLEM - puppet last run on ms-be1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:30] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:30] PROBLEM - puppet last run on mw1163 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:31] PROBLEM - puppet last run on labvirt1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:31] PROBLEM - puppet last run on ores1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:40] PROBLEM - puppet last run on dbproxy1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:40] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:52:50] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.150 second response time [15:52:50] PROBLEM - puppet last run on wtp1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:00] PROBLEM - puppet last run on restbase1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:01] PROBLEM - puppet last run on ms-be1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:10] PROBLEM - puppet last run on stat1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:53:10] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:10] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:10] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:10] PROBLEM - puppet last run on restbase1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:10] PROBLEM - puppet last run on wtp1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:11] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:11] PROBLEM - puppet last run on ms-be1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:20] PROBLEM - puppet last run on mc1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:20] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:20] PROBLEM - puppet last run on mc1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:30] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:30] PROBLEM - puppet last run on mw1230 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:30] PROBLEM - puppet last run on elastic1033 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:53:30] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:31] PROBLEM - puppet last run on mw1186 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:31] PROBLEM - puppet last run on mw1264 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:31] PROBLEM - puppet last run on mw1274 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:32] PROBLEM - puppet last run on kubernetes1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:32] PROBLEM - puppet last run on es1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:33] PROBLEM - puppet last run on labvirt1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:40] PROBLEM - puppet last run on cp3038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:40] PROBLEM - puppet last run on cp3046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:40] PROBLEM - puppet last run on kraz is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:50] PROBLEM - puppet last run on mc1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:51] PROBLEM - puppet last run on mc1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:00] PROBLEM - puppet last run on snapshot1007 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:54:00] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:00] PROBLEM - puppet last run on conf1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:11] PROBLEM - puppet last run on rcs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:11] PROBLEM - puppet last run on elastic1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:11] PROBLEM - puppet last run on install1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:11] PROBLEM - puppet last run on wtp1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:11] PROBLEM - puppet last run on mw1221 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:11] PROBLEM - puppet last run on labsdb1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:11] PROBLEM - puppet last run on analytics1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:20] PROBLEM - puppet last run on elastic1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:20] RECOVERY - Host ganeti1003 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [15:54:20] PROBLEM - puppet last run on db1089 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:30] PROBLEM - puppet last run on restbase1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:30] PROBLEM - puppet last run on restbase1016 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:54:30] PROBLEM - puppet last run on mw1301 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:30] PROBLEM - puppet last run on db1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:30] PROBLEM - puppet last run on wtp1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:30] PROBLEM - puppet last run on db1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:31] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:31] PROBLEM - puppet last run on ores1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:40] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:40] PROBLEM - puppet last run on rdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:40] PROBLEM - puppet last run on mc1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:50] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:54:50] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:55:10] PROBLEM - puppet last run on mw1253 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:55:10] PROBLEM - puppet last run on mw1257 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:55:10] PROBLEM - puppet last run on mc1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:55:20] PROBLEM - puppet last run on elastic1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:55:30] PROBLEM - puppet last run on elastic1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:55:30] PROBLEM - puppet last run on mw1279 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:55:30] PROBLEM - puppet last run on analytics1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:55:30] PROBLEM - puppet last run on es1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:55:40] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:55:40] RECOVERY - Host etherpad1001 is UP: PING OK - Packet loss = 0%, RTA = 4.39 ms [15:55:40] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [15:55:50] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy [15:55:50] RECOVERY - Host mx1001 is UP: PING OK - Packet loss = 0%, RTA = 2.23 ms [15:55:50] RECOVERY - Host bromine is UP: PING OK - Packet loss = 0%, RTA = 2.20 ms [15:55:50] RECOVERY - Host kubestagetcd1001 is UP: PING OK - Packet loss = 0%, RTA = 1.76 ms [15:55:50] PROBLEM - puppet last run on mw1293 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:56:00] RECOVERY - Host aluminium is UP: PING OK - Packet loss = 0%, RTA = 1.28 ms [15:56:00] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [15:56:00] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy [15:56:01] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy [15:56:04] 10Operations, 10Office-IT, 10netops: Some BGP sessions to the SF Office down - https://phabricator.wikimedia.org/T167281#3391437 (10ayounsi) 05Open>03Resolved Figured it out over IRC, broken hardware has been replaced. [15:56:10] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:10] PROBLEM - puppet last run on db1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:10] PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:10] PROBLEM - puppet last run on notebook1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:11] RECOVERY - url_downloader on alsafi is OK: TCP OK - 0.003 second response time on url-downloader.wikimedia.org port 8080 [15:56:11] PROBLEM - puppet last run on thumbor1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:11] PROBLEM - puppet last run on wdqs1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:20] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:30] RECOVERY - Host ganeti1004 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [15:56:31] PROBLEM - puppet last run on analytics1068 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:56:31] PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:31] PROBLEM - puppet last run on ores1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:31] PROBLEM - puppet last run on mw1298 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:31] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:41] PROBLEM - puppet last run on ganeti1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:56:50] PROBLEM - puppet last run on elastic1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:57:00] PROBLEM - puppet last run on mw1295 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:57:20] RECOVERY - Host sca1003 is UP: PING OK - Packet loss = 0%, RTA = 3.09 ms [15:57:20] RECOVERY - Host poolcounter1001 is UP: PING OK - Packet loss = 0%, RTA = 3.82 ms [15:57:20] RECOVERY - Host nitrogen is UP: PING OK - Packet loss = 0%, RTA = 3.54 ms [15:57:20] RECOVERY - Host roentgenium is UP: PING OK - Packet loss = 0%, RTA = 3.97 ms [15:57:50] PROBLEM - puppet last run on etherpad1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:57:50] RECOVERY - Host argon is UP: PING OK - Packet loss = 0%, RTA = 6.12 ms [15:57:50] RECOVERY - Host darmstadtium is UP: PING OK - Packet loss = 0%, RTA = 6.37 ms [15:57:50] RECOVERY - Postgres Replication Lag on nihal is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 [15:57:51] RECOVERY - puppet last run on db1079 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [15:58:00] RECOVERY - Host dbmonitor1001 is UP: PING OK - Packet loss = 0%, RTA = 8.66 ms [15:58:00] RECOVERY - Host etcd1001 is UP: PING OK - Packet loss = 0%, RTA = 8.44 ms [15:58:00] RECOVERY - Host etcd1006 is UP: PING OK - Packet loss = 0%, RTA = 8.32 ms [15:58:00] PROBLEM - puppet last run on bromine is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:58:00] PROBLEM - spamassassin on mx1001 is CRITICAL: PROCS CRITICAL: 0 processes with args spamd [15:58:10] RECOVERY - Host krypton is UP: PING OK - Packet loss = 0%, RTA = 6.30 ms [15:58:10] RECOVERY - puppet last run on labvirt1006 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [15:58:10] RECOVERY - puppet last run on rdb1006 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [15:58:10] RECOVERY - puppet last run on wtp1005 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [15:58:11] RECOVERY - puppet last run on db1088 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [15:58:20] RECOVERY - puppet last run on cp1064 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [15:58:20] RECOVERY - Host mendelevium is UP: PING OK - Packet loss = 0%, RTA = 5.08 ms [15:58:20] RECOVERY - Host mwdebug1001 is UP: PING OK - Packet loss = 0%, RTA = 5.25 ms [15:58:20] RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 23 seconds ago 
with 0 failures [15:58:21] RECOVERY - puppet last run on db1090 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [15:58:30] RECOVERY - puppet last run on kubestagetcd1002 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [15:58:30] RECOVERY - puppet last run on seaborgium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [15:58:30] RECOVERY - puppet last run on chromium is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [15:58:30] RECOVERY - puppet last run on ms-be1024 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [15:58:30] RECOVERY - puppet last run on elastic1041 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [15:58:31] RECOVERY - puppet last run on wtp1008 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [15:58:31] RECOVERY - puppet last run on analytics1055 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [15:58:32] RECOVERY - puppet last run on analytics1043 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [15:58:32] RECOVERY - puppet last run on analytics1053 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [15:58:33] RECOVERY - puppet last run on etcd1005 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [15:58:34] RECOVERY - puppet last run on aqs1009 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [15:58:34] RECOVERY - puppet last run on es1012 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [15:58:34] RECOVERY - puppet last run on logstash1006 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [15:58:35] RECOVERY - puppet last run on db1016 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [15:58:40] 
PROBLEM - puppet last run on mx1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:58:40] RECOVERY - puppet last run on etcd1002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [15:58:40] RECOVERY - puppet last run on dbproxy1007 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:58:40] RECOVERY - puppet last run on elastic1040 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [15:58:40] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [15:58:41] RECOVERY - puppet last run on elastic1048 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [15:59:00] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [15:59:00] PROBLEM - puppet last run on ganeti1004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:59:00] RECOVERY - puppet last run on bromine is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [15:59:01] RECOVERY - puppet last run on prometheus2003 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [15:59:10] RECOVERY - puppet last run on puppetmaster2001 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [15:59:10] RECOVERY - puppet last run on aqs1006 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [15:59:10] RECOVERY - puppet last run on mw1242 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:59:10] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [15:59:10] RECOVERY - puppet last run on wtp1024 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [15:59:10] RECOVERY - puppet last run on rutherfordium is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [15:59:11] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [15:59:11] RECOVERY - puppet last run on cp1058 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [15:59:12] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [15:59:20] RECOVERY - puppet last run on mw1280 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [15:59:21] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [15:59:21] RECOVERY - puppet last run on elastic1044 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [15:59:21] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 
0 failures [15:59:21] RECOVERY - puppet last run on labvirt1004 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:59:21] RECOVERY - puppet last run on cp3039 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [15:59:21] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [15:59:21] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:59:22] RECOVERY - puppet last run on ms-be1019 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [15:59:23] RECOVERY - puppet last run on elastic1017 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [15:59:23] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [15:59:30] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [15:59:30] RECOVERY - puppet last run on scb1004 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [15:59:30] RECOVERY - puppet last run on mw1306 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [15:59:30] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [15:59:30] RECOVERY - puppet last run on mw1261 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [15:59:30] RECOVERY - puppet last run on analytics1059 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:59:31] RECOVERY - puppet last run on es1018 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [15:59:31] RECOVERY - puppet last run on es1011 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [15:59:32] RECOVERY 
- puppet last run on wtp1011 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [15:59:32] RECOVERY - puppet last run on pc1004 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:59:40] RECOVERY - puppet last run on mx1001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [15:59:40] RECOVERY - puppet last run on lvs1009 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [15:59:40] RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [15:59:40] RECOVERY - puppet last run on elastic1024 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [15:59:50] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack [16:00:00] RECOVERY - puppet last run on labnodepool1001 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:00:00] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [16:00:00] RECOVERY - spamassassin on mx1001 is OK: PROCS OK: 3 processes with args spamd [16:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170629T1600). [16:00:04] Smalyshev: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:10] PROBLEM - puppet last run on aluminium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:00:10] RECOVERY - puppet last run on mc1010 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:00:10] RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:00:11] RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:00:11] RECOVERY - puppet last run on mw1262 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:00:11] RECOVERY - puppet last run on wtp1014 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:00:11] RECOVERY - puppet last run on labnet1001 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:00:12] RECOVERY - puppet last run on db1038 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [16:00:12] RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [16:00:13] RECOVERY - puppet last run on labservices1002 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:00:14] (CR) Nuria: "Thanks for doing changes, looks good will wait for @elukey to merge." 
[puppet] - https://gerrit.wikimedia.org/r/362148 (https://phabricator.wikimedia.org/T168614) (owner: Joal) [16:00:15] here [16:00:30] RECOVERY - puppet last run on db1009 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:00:31] RECOVERY - puppet last run on lvs1012 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:00:31] RECOVERY - puppet last run on ms-be1038 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [16:00:31] RECOVERY - puppet last run on ms-fe1005 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:00:31] RECOVERY - puppet last run on analytics1042 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:00:31] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:00:31] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:00:32] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [16:00:32] RECOVERY - puppet last run on ores1002 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:00:33] RECOVERY - puppet last run on es1016 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:00:33] RECOVERY - puppet last run on labvirt1009 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:00:34] RECOVERY - puppet last run on analytics1001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:00:40] RECOVERY - puppet last run on elastic1022 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:00:40] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [16:00:50] PROBLEM - Check systemd state 
on krypton is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:01:00] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:01:01] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:01:10] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:01:10] RECOVERY - puppet last run on restbase1008 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:01:10] RECOVERY - puppet last run on druid1002 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:01:10] RECOVERY - puppet last run on mc1025 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:01:10] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:01:11] RECOVERY - puppet last run on kafka1012 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:01:11] RECOVERY - puppet last run on dumpsdata1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:01:12] RECOVERY - puppet last run on pc1005 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:01:12] RECOVERY - puppet last run on cp3032 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:01:13] RECOVERY - puppet last run on mc1018 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:01:19] did JS just broke? 
[16:01:23] RECOVERY - puppet last run on mc1012 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:01:23] RECOVERY - puppet last run on dbproxy1004 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:01:31] RECOVERY - puppet last run on wtp1023 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:01:31] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:01:31] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:01:31] RECOVERY - puppet last run on db1055 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:01:31] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:01:41] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:01:45] (CR) Elukey: "Yes please Joseph, let's use another file just to keep things separate :)" [puppet] - https://gerrit.wikimedia.org/r/362148 (https://phabricator.wikimedia.org/T168614) (owner: Joal) [16:01:50] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:02:00] RECOVERY - puppet last run on db1095 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:02:01] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:02:10] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:02:10] RECOVERY - puppet last run on mc1026 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:02:10] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 9 seconds ago 
with 0 failures [16:02:10] RECOVERY - puppet last run on mw1201 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:02:10] RECOVERY - puppet last run on restbase1013 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:02:11] RECOVERY - puppet last run on wtp1016 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:02:11] RECOVERY - puppet last run on wtp1010 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:02:12] RECOVERY - puppet last run on puppetmaster1002 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:02:12] RECOVERY - puppet last run on mc1001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:02:13] RECOVERY - puppet last run on cp1045 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:02:13] RECOVERY - puppet last run on labvirt1018 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:02:14] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:02:20] RECOVERY - puppet last run on db1031 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:02:20] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:02:21] RECOVERY - puppet last run on db1036 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:02:21] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:02:21] RECOVERY - puppet last run on mc1036 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:02:30] RECOVERY - puppet last run on elastic1026 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:02:30] RECOVERY - puppet last 
run on elastic1031 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:02:30] RECOVERY - puppet last run on labvirt1011 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [16:02:30] RECOVERY - puppet last run on mw1286 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:02:31] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:02:31] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:02:32] RECOVERY - puppet last run on dubnium is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:02:32] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:02:32] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:02:40] RECOVERY - puppet last run on dbproxy1011 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [16:02:40] RECOVERY - puppet last run on dbproxy1003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [16:02:40] RECOVERY - puppet last run on dbmonitor2001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:02:41] RECOVERY - puppet last run on lithium is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:02:41] RECOVERY - puppet last run on install2002 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [16:03:00] RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:03:10] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:03:10] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is 
currently enabled, last run 11 seconds ago with 0 failures [16:03:10] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:03:10] RECOVERY - puppet last run on rdb1005 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:03:11] RECOVERY - puppet last run on mc1008 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:03:11] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:03:20] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:03:20] RECOVERY - puppet last run on db1074 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:03:20] RECOVERY - puppet last run on achernar is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:03:20] RECOVERY - puppet last run on ms-fe1006 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:03:30] RECOVERY - puppet last run on elastic1042 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:03:31] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:03:31] RECOVERY - puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:03:31] RECOVERY - puppet last run on mw1167 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:03:31] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:03:31] RECOVERY - puppet last run on analytics1050 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:03:31] RECOVERY - puppet last run on analytics1056 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 
failures [16:03:32] RECOVERY - puppet last run on labvirt1002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:03:32] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:03:33] RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [16:03:40] RECOVERY - puppet last run on graphite1003 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [16:03:46] (PS1) Alexandros Kosiaris: Revert "Revert "Revert "Depool poolcounter1001""" [mediawiki-config] - https://gerrit.wikimedia.org/r/362228 [16:04:00] RECOVERY - puppet last run on db1075 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [16:04:00] RECOVERY - puppet last run on db1092 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:04:03] (CR) Alexandros Kosiaris: [V: 2 C: 2] Revert "Revert "Revert "Depool poolcounter1001""" [mediawiki-config] - https://gerrit.wikimedia.org/r/362228 (owner: Alexandros Kosiaris) [16:04:10] RECOVERY - puppet last run on cp3037 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:04:10] RECOVERY - puppet last run on mw1303 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:04:10] RECOVERY - puppet last run on labvirt1007 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:04:10] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:04:10] RECOVERY - puppet last run on thumbor1003 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:04:11] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:04:11] RECOVERY - puppet last run on mw1250 is OK: OK: Puppet is currently 
enabled, last run 10 seconds ago with 0 failures [16:04:12] RECOVERY - puppet last run on wtp1015 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [16:04:12] RECOVERY - puppet last run on francium is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:04:13] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:04:13] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:04:14] RECOVERY - puppet last run on ms1001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:04:20] RECOVERY - puppet last run on thumbor1002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:04:20] RECOVERY - puppet last run on maps1003 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:04:21] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:04:21] RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [16:04:21] RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:04:21] RECOVERY - puppet last run on db1030 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:04:21] RECOVERY - puppet last run on labcontrol1002 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:04:22] RECOVERY - puppet last run on elastic1025 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:04:30] RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:04:31] RECOVERY - puppet last run on thumbor1001 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 
failures [16:04:31] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:04:31] RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:04:31] RECOVERY - puppet last run on ms-be1037 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:04:32] RECOVERY - puppet last run on mw1300 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:04:32] RECOVERY - puppet last run on mc1013 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:04:32] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:04:32] RECOVERY - puppet last run on kubernetes1004 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:04:33] RECOVERY - puppet last run on thumbor1004 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:04:38] mobrovac: things are stable [16:04:40] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:04:41] RECOVERY - puppet last run on elastic1046 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:04:46] specially if it is codfw [16:04:50] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:05:00] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:05:00] RECOVERY - puppet last run on lvs1010 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:05:00] RECOVERY - puppet last run on db1082 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:05:04] !log akosiaris@tin Synchronized wmf-config/ProductionServices.php: (no 
justification provided) (duration: 00m 46s) [16:05:05] thnx for the ping jynus, and yup, it's in codfw [16:05:10] RECOVERY - puppet last run on mw1241 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [16:05:10] RECOVERY - puppet last run on cp1073 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:05:11] RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:05:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:05:18] papaul: we are good to go! [16:05:20] RECOVERY - puppet last run on rcs1001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:05:30] RECOVERY - puppet last run on relforge1002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:05:30] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:05:31] RECOVERY - puppet last run on kubestage1001 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:05:31] RECOVERY - puppet last run on aqs1007 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:05:31] RECOVERY - puppet last run on ores1001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:05:31] RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:05:31] RECOVERY - puppet last run on ores1006 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:05:32] RECOVERY - puppet last run on es1013 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:05:32] RECOVERY - puppet last run on es1014 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:05:33] RECOVERY - puppet last run on maps1001 is OK: OK: Puppet is currently 
enabled, last run 22 seconds ago with 0 failures [16:05:33] mobrovac: sorry for the interruption [16:05:42] no worries [16:06:00] RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:06:10] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:06:10] RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:06:11] RECOVERY - puppet last run on mw1271 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:06:11] RECOVERY - puppet last run on mw1268 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:06:11] RECOVERY - puppet last run on restbase1015 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:06:11] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:06:11] (CR) jenkins-bot: Revert "Revert "Revert "Depool poolcounter1001""" [mediawiki-config] - https://gerrit.wikimedia.org/r/362228 (owner: Alexandros Kosiaris) [16:06:20] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:06:20] RECOVERY - puppet last run on restbase1011 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:06:20] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:06:20] RECOVERY - puppet last run on db1081 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:06:20] Is this affecting personal javascript?
[16:06:30] RECOVERY - puppet last run on kafka1013 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [16:06:31] RECOVERY - puppet last run on db1070 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:06:31] RECOVERY - puppet last run on ganeti1007 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:06:31] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:06:31] RECOVERY - puppet last run on kafka1020 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [16:06:31] RECOVERY - puppet last run on wtp1002 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:06:31] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:06:32] RECOVERY - puppet last run on analytics1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:06:32] RECOVERY - puppet last run on db1053 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:06:40] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:06:40] RECOVERY - puppet last run on mc1017 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:06:50] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:06:50] RECOVERY - puppet last run on ms-be1031 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:07:00] RECOVERY - puppet last run on mw1302 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:07:10] RECOVERY - puppet last run on cp3030 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:07:10] RECOVERY - puppet last run on cp1061 is OK: OK: 
Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:07:10] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:07:10] RECOVERY - puppet last run on mc1019 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [16:07:10] RECOVERY - puppet last run on mw1265 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:07:20] RECOVERY - puppet last run on notebook1002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:07:20] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:07:20] RECOVERY - puppet last run on db1076 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:07:25] sjoerddebruin: no, should be all recovered now [16:07:30] RECOVERY - puppet last run on mw1254 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:07:30] RECOVERY - puppet last run on ms-be1025 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:07:32] RECOVERY - puppet last run on maps1004 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:07:32] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:07:32] RECOVERY - puppet last run on analytics1065 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:07:32] RECOVERY - puppet last run on aqs1008 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:07:32] RECOVERY - puppet last run on wtp1018 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:07:32] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:07:32] RECOVERY - puppet last run on mw1267 is OK: OK: 
Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:07:32] RECOVERY - puppet last run on etcd1004 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:07:33] RECOVERY - puppet last run on mw1227 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:07:40] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:07:40] RECOVERY - puppet last run on dbproxy1006 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:07:41] I'm still having problems, godog [16:07:50] RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:07:50] RECOVERY - puppet last run on hassium is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:07:50] RECOVERY - puppet last run on labsdb1001 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:07:53] godog: is puppet swat happening? [16:07:59] sjoerddebruin: can you be more specific? [16:08:00] RECOVERY - puppet last run on cp3035 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [16:08:00] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:08:10] Personal javascript and gadgets are not working. 
[16:08:10] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:08:11] RECOVERY - puppet last run on ores1007 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:08:11] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:08:11] RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:08:11] RECOVERY - puppet last run on rdb1004 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:08:20] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:08:20] RECOVERY - puppet last run on mw1223 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:08:20] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:08:20] RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:08:30] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [16:08:30] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:08:31] RECOVERY - puppet last run on wdqs1001 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [16:08:31] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:08:31] RECOVERY - puppet last run on chlorine is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:08:31] RECOVERY - puppet last run on ms-be1016 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:08:31] RECOVERY - puppet last run on ms-fe3002 is 
OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:08:32] RECOVERY - puppet last run on radon is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:08:32] RECOVERY - puppet last run on elastic1021 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:08:33] RECOVERY - puppet last run on logstash1003 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:08:40] RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:08:41] RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:08:45] Also edit statement links on Wikidata are gone. [16:08:50] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:09:00] RECOVERY - puppet last run on labsdb1005 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:09:00] PROBLEM - IPMI Temperature on mw1228 is CRITICAL: Return code of 255 is out of bounds [16:09:00] RECOVERY - puppet last run on ms-be1028 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:09:01] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:09:10] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:09:10] RECOVERY - puppet last run on mw1299 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:09:11] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:09:11] RECOVERY - puppet last run on analytics1062 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:09:11] RECOVERY - puppet last run on cp1074 is OK: OK: Puppet is currently enabled, last run 
54 seconds ago with 0 failures [16:09:11] RECOVERY - puppet last run on thorium is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:09:11] RECOVERY - puppet last run on oresrdb1002 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:09:20] SMalyshev: I think so yeah, sec I'm trying to understand if sjoerddebruin's problem is somehow related [16:09:20] RECOVERY - puppet last run on db1001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:09:20] RECOVERY - puppet last run on rdb1007 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:09:20] RECOVERY - puppet last run on cp1099 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:09:20] RECOVERY - puppet last run on mw1259 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:09:21] RECOVERY - puppet last run on ms-be1030 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [16:09:21] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:09:30] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:09:30] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:09:30] RECOVERY - puppet last run on db1066 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:09:30] RECOVERY - puppet last run on elastic1051 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:09:31] RECOVERY - puppet last run on kubestagetcd1003 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:09:31] RECOVERY - puppet last run on cobalt is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:09:31] RECOVERY - puppet last run 
on wtp1006 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:09:33] RECOVERY - puppet last run on labvirt1017 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:09:33] RECOVERY - puppet last run on mw1263 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:09:33] RECOVERY - puppet last run on analytics1057 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:09:33] RECOVERY - puppet last run on kubestage1002 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:09:34] RECOVERY - puppet last run on kafka1001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [16:09:34] RECOVERY - puppet last run on prometheus1003 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:09:35] RECOVERY - puppet last run on oresrdb1001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:09:41] godog: I have to go offline for about 20 mins, so we could do it later if you prefer [16:10:00] RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:10:04] SMalyshev: ok, thanks! 
[16:10:10] RECOVERY - puppet last run on db1091 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:10:12] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:10:12] RECOVERY - puppet last run on mw1255 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:10:12] RECOVERY - puppet last run on mw1283 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:10:12] RECOVERY - puppet last run on ganeti1001 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:10:12] RECOVERY - puppet last run on mw1269 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:10:12] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:10:20] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:10:20] RECOVERY - puppet last run on eventlog1001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:10:20] RECOVERY - puppet last run on ms-be1022 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:10:22] sjoerddebruin: can you give me a way to reproduce the problem and what the error is? 
[16:10:30] RECOVERY - puppet last run on mw1282 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:10:30] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:10:30] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:10:30] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:10:30] RECOVERY - puppet last run on restbase-dev1001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [16:10:30] RECOVERY - puppet last run on wdqs1002 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:10:31] RECOVERY - puppet last run on elastic1038 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:10:32] RECOVERY - puppet last run on analytics1046 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:10:32] RECOVERY - puppet last run on db1097 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:10:40] RECOVERY - puppet last run on mc1004 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:10:40] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:10:40] RECOVERY - puppet last run on mc1031 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:11:05] godog: no error is given, try a random item on Wikidata (i'm looking at https://www.wikidata.org/wiki/Q2103111), no edit links appearing for the statements. 
[16:11:10] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:11:10] RECOVERY - puppet last run on snapshot1006 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:11:11] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:11:11] RECOVERY - puppet last run on logstash1004 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:11:20] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:11:20] RECOVERY - puppet last run on labvirt1005 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:11:21] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:11:30] RECOVERY - puppet last run on db1085 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [16:11:30] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:11:30] RECOVERY - puppet last run on mw1229 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:11:30] RECOVERY - puppet last run on mc1035 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:11:30] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [16:11:30] RECOVERY - puppet last run on db1099 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:11:31] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:11:31] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:11:31] RECOVERY - puppet last run on 
ms-be1017 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:11:32] RECOVERY - puppet last run on ores1005 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:11:40] RECOVERY - puppet last run on db1054 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:11:40] RECOVERY - puppet last run on ms-be1034 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:11:40] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:11:40] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [16:12:10] RECOVERY - puppet last run on db1096 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:12:10] RECOVERY - puppet last run on analytics1061 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:12:11] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:12:11] RECOVERY - puppet last run on ocg1003 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:12:20] RECOVERY - puppet last run on mwdebug1002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:12:20] RECOVERY - puppet last run on labvirt1015 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [16:12:20] RECOVERY - puppet last run on db1035 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:12:30] RECOVERY - puppet last run on graphite1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:12:30] RECOVERY - puppet last run on mw1231 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [16:12:30] RECOVERY - puppet last run on stat1006 is OK: OK: Puppet is 
currently enabled, last run 28 seconds ago with 0 failures [16:12:30] RECOVERY - puppet last run on mw1162 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:12:30] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:12:30] RECOVERY - puppet last run on mw1281 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:12:31] sjoerddebruin: works for me? http://esaurito.net/~godog/sshot/screenshot_5LytWD.png [16:12:31] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:12:31] RECOVERY - puppet last run on mw1209 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:12:32] RECOVERY - puppet last run on analytics1067 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:12:32] RECOVERY - puppet last run on restbase-dev1003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [16:12:40] RECOVERY - puppet last run on mw1296 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:12:40] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:12:45] godog: i think it's a cache problem. I'm not having the problem anymore on for example Commons. 
[16:13:00] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:13:00] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [16:13:01] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [16:13:10] RECOVERY - puppet last run on db1041 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:13:10] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:13:14] sjoerddebruin: ack, thanks, yeah try force-reloading in the browser [16:13:20] RECOVERY - puppet last run on rdb1008 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:13:20] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:13:20] RECOVERY - puppet last run on labcontrol1001 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [16:13:20] RECOVERY - puppet last run on labtestservices2001 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:13:20] RECOVERY - puppet last run on cp3047 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:13:30] RECOVERY - puppet last run on db1087 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:13:30] RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:13:30] RECOVERY - puppet last run on mc1023 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:13:30] RECOVERY - puppet last run on aqs1005 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:13:30] RECOVERY - puppet last run on analytics1058 is OK: OK: Puppet is currently enabled, last run 
19 seconds ago with 0 failures [16:13:31] RECOVERY - puppet last run on wtp1003 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:13:31] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:13:33] I've disabled my local cache, no change. [16:13:40] RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:13:40] RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:13:40] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:13:50] RECOVERY - puppet last run on restbase-dev1002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:13:51] RECOVERY - puppet last run on ms-fe1008 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:14:00] RECOVERY - puppet last run on ms-be1029 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:14:00] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:14:01] Hm, sometimes they do appear but mostly not. 
[16:14:10] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:14:11] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:14:20] RECOVERY - puppet last run on elastic1028 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:14:20] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:14:20] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:14:20] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:14:20] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:14:30] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:14:30] RECOVERY - puppet last run on scb1003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:14:30] RECOVERY - puppet last run on poolcounter1002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:14:30] RECOVERY - puppet last run on labvirt1001 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:14:31] RECOVERY - puppet last run on mw1215 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:14:31] RECOVERY - puppet last run on analytics1052 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:14:31] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:14:31] RECOVERY - puppet last run on wtp1022 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:14:32] RECOVERY - puppet last run on 
db1061 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:14:40] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:14:40] RECOVERY - puppet last run on db1044 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:14:50] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:15:00] RECOVERY - puppet last run on elastic1052 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:15:00] RECOVERY - puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:15:00] RECOVERY - puppet last run on ganeti1004 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:15:08] SMalyshev: ping me when you're back [16:15:10] RECOVERY - puppet last run on mw1260 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:15:10] RECOVERY - puppet last run on db1037 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [16:15:10] RECOVERY - puppet last run on mw1285 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:15:10] RECOVERY - puppet last run on mw1247 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [16:15:10] RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:15:11] RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:15:11] RECOVERY - puppet last run on mw1276 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:15:20] RECOVERY - puppet last run on db1047 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:15:20] RECOVERY - puppet last run on db1084 is 
OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [16:15:32] RECOVERY - puppet last run on mw1225 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:15:32] RECOVERY - puppet last run on elastic1039 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:15:32] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:15:32] RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:15:32] RECOVERY - puppet last run on analytics1064 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:15:32] RECOVERY - puppet last run on analytics1063 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:15:32] RECOVERY - puppet last run on analytics1066 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:15:32] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:15:32] RECOVERY - puppet last run on kubernetes1002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:15:40] RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:15:40] RECOVERY - puppet last run on lvs1004 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:15:41] RECOVERY - puppet last run on elastic1050 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:15:51] RECOVERY - puppet last run on ms-be1033 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:16:00] RECOVERY - puppet last run on wtp1009 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:16:00] PROBLEM - Host ps1-c8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% 
[16:16:00] RECOVERY - puppet last run on aqs1004 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:16:10] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [16:16:10] RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:16:10] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:16:11] RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:16:11] RECOVERY - puppet last run on bohrium is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:16:11] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:16:11] RECOVERY - puppet last run on fermium is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:16:11] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:16:20] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:16:20] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:16:20] PROBLEM - Host ps1-d6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:20] PROBLEM - Host ps1-a7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:20] PROBLEM - Host ps1-d1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:21] PROBLEM - Host ps1-d8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:21] PROBLEM - Host ps1-d5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:21] PROBLEM - Host msw1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:22] PROBLEM - Host ps1-c6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% 
[16:16:22] PROBLEM - Host ps1-b4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:23] PROBLEM - Host ps1-c5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:23] PROBLEM - Host ps1-a2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:24] PROBLEM - Host ps1-b5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:24] PROBLEM - Host ps1-b7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:40] RECOVERY - puppet last run on labvirt1003 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:16:40] RECOVERY - puppet last run on ms-be1035 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:16:40] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:16:43] RECOVERY - puppet last run on dbproxy1008 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:16:43] PROBLEM - Host asw-b-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:43] PROBLEM - Host asw-c-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:43] PROBLEM - Host ps1-a4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:43] PROBLEM - Host asw-a-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:43] PROBLEM - Host ps1-b1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:43] PROBLEM - Host ps1-b8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:43] PROBLEM - Host ps1-c3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:44] PROBLEM - Host ps1-b3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [16:16:50] RECOVERY - puppet last run on ms-be1018 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:16:50] RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:17:10] RECOVERY - Host ps1-c3-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.50 ms [16:17:10] RECOVERY - Host 
ps1-b6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.78 ms [16:17:10] RECOVERY - Host ps1-d2-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.87 ms [16:17:10] RECOVERY - Host ps1-a7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.91 ms [16:17:10] RECOVERY - Host ps1-d1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.88 ms [16:17:15] godog: I'm getting "Invalid file type" on pages as https://www.wikidata.org/w/resources/lib/oojs-ui/oojs-ui-core.js.map [16:18:23] RECOVERY - puppet last run on labsdb1009 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:18:23] RECOVERY - puppet last run on labsdb1010 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:18:23] RECOVERY - puppet last run on analytics1037 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:18:23] RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:18:23] RECOVERY - puppet last run on db1049 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:18:30] RECOVERY - puppet last run on db1050 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:18:30] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:18:40] RECOVERY - puppet last run on db1063 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:18:40] RECOVERY - puppet last run on mw1264 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:18:40] RECOVERY - puppet last run on analytics1054 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:18:40] RECOVERY - puppet last run on labvirt1008 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:18:40] RECOVERY - puppet last run on rdb1001 is OK: OK: Puppet is currently enabled, last run 40 
seconds ago with 0 failures [16:18:41] RECOVERY - puppet last run on mc1006 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:18:50] RECOVERY - puppet last run on prometheus2004 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:19:00] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:19:10] RECOVERY - puppet last run on aluminium is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:19:10] RECOVERY - puppet last run on elastic1037 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [16:19:10] RECOVERY - puppet last run on wtp1021 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:19:10] RECOVERY - puppet last run on wtp1019 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:19:18] am I right in guessing we are going to have recovery for puppet on all hosts across the fleet? i.e. it was down everywhere? [16:19:20] RECOVERY - puppet last run on mc1003 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:19:20] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:19:20] RECOVERY - puppet last run on elastic1049 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:19:20] RECOVERY - puppet last run on mc1032 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:19:30] godog: seems like some gadget issues, working on it. 
[16:19:30] RECOVERY - puppet last run on elastic1020 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:19:30] RECOVERY - puppet last run on wtp1017 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:19:30] RECOVERY - puppet last run on analytics1029 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:19:40] RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:19:40] RECOVERY - puppet last run on ores1003 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:19:40] RECOVERY - puppet last run on es1017 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:19:40] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:19:50] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:20:00] RECOVERY - puppet last run on mc1024 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:20:00] RECOVERY - puppet last run on restbase1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:20:10] RECOVERY - puppet last run on mw1253 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:20:11] RECOVERY - puppet last run on db1068 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:20:11] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:20:11] RECOVERY - puppet last run on install1002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:20:11] RECOVERY - puppet last run on db1073 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:20:11] RECOVERY - puppet last run on db1029 is OK: 
OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:20:20] RECOVERY - puppet last run on wdqs1003 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:20:20] RECOVERY - puppet last run on notebook1001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:20:21] RECOVERY - puppet last run on db1089 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:20:25] apergos: yep :( [16:20:30] RECOVERY - puppet last run on restbase1016 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:20:30] RECOVERY - puppet last run on restbase1018 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:20:30] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:20:30] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:20:31] RECOVERY - puppet last run on mw1279 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:20:31] RECOVERY - puppet last run on analytics1049 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:20:40] RECOVERY - puppet last run on ores1004 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:20:41] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:20:41] RECOVERY - puppet last run on es1015 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:20:41] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:20:41] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [16:20:50] RECOVERY - puppet last run on cp3046 is OK: OK: Puppet is 
currently enabled, last run 55 seconds ago with 0 failures [16:20:50] RECOVERY - puppet last run on kraz is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:20:50] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:21:00] RECOVERY - puppet last run on elastic1036 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [16:21:00] RECOVERY - puppet last run on elastic1034 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:21:00] RECOVERY - puppet last run on elastic1035 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:21:00] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:21:14] !log temporarily stop ircecho, puppet spam [16:21:21] heh [16:21:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:33] papaul: ping? status? [16:30:26] mobrovac: i am waiting on you [16:30:48] oh, i pinged you earlier that we are good to go, might have missed it due to noise [16:31:08] papaul: do you have an ETA on the time it will take? 
[16:31:17] maint window ends in 1h [16:31:22] godog: back now [16:32:02] mobrovac: i think i did not understand you you said give me 5 minutes so i was waiting on you i did not start anything [16:32:09] SMalyshev: ack, I guess we're going ahead with 17:27 godog: re: thumbor, three actionables I can think of: 1) should have a backup poolcounter configured 2) shouldn't fail when poolcounters are failing 3) should alert/page when thumbs aren't rendered [16:32:13] no [16:32:19] https://gerrit.wikimedia.org/r/#/c/358783 SMalyshev [16:32:32] yup [16:32:32] papaul: kk, we are good now, got held up due to problems [16:32:59] (03PS2) 10Filippo Giunchedi: Add "latest" links to TTL dumps [puppet] - 10https://gerrit.wikimedia.org/r/358783 (https://phabricator.wikimedia.org/T164783) (owner: 10Smalyshev) [16:33:05] mobrovac: ok will take it down and troubleshoot the problem thanks [16:33:20] cool, thnx papaul, keep me posted [16:33:54] mobrovac: will do [16:36:20] (03CR) 10Filippo Giunchedi: [C: 032] Add "latest" links to TTL dumps [puppet] - 10https://gerrit.wikimedia.org/r/358783 (https://phabricator.wikimedia.org/T164783) (owner: 10Smalyshev) [16:38:55] SMalyshev: merged! I suppose we wait the next dump run in this case [16:39:11] godog: yes, thanks! [16:39:38] SMalyshev: neat, thanks [16:44:12] Warning: OutputPage::transformFilePath: Failed to hash /srv/mediawiki/php-1.30.0-wmf.7/extensions/WikibaseQualityConstraints/modules/gadget.js [Called from OutputPage::transformFilePath in /srv/mediawiki/php-1.30.0-wmf.7/includes/OutputPage.php at line 37 [16:49:33] 10Operations, 10Cassandra, 10RESTBase, 10RESTBase-Cassandra, and 2 others: Option: Consider switching back to leveled compaction (LCS) - https://phabricator.wikimedia.org/T153703#3391633 (10Eevans) >>! In T153703#3382361, @GWicke wrote: > @eevans, I know you have switched some keyspaces on the dev cluster... [16:49:58] sjoerddebruin: ^ [16:50:45] Yeah, that is the same issue I saw.
I'm also having problems with https://tools-static.wmflabs.org being down [16:50:47] (03CR) 10Nschaaf: Stop reader surveys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360849 (https://phabricator.wikimedia.org/T131949) (owner: 10Nschaaf) [16:50:49] (one of the scripts I'm using was pointing to that domain, blocking further js load) [16:53:24] (03PS1) 10Nuria: Adding mailto to camus job ` [puppet] - 10https://gerrit.wikimedia.org/r/362237 [16:53:35] hi, is something wrong with WikibaseQualityConstraints? is it my fault that it fails to hash? [16:55:47] (03PS2) 10Nuria: Adding mailto to camus job [puppet] - 10https://gerrit.wikimedia.org/r/362237 (https://phabricator.wikimedia.org/T169248) [17:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170629T1700). [17:00:22] Nothing for ORES today. [17:00:41] _joe_ or akosiaris: if you have time, would you mind commenting about what else needs done by us from an Ops perspective for productionizing the Recommendation API? Thanks T148129 [17:00:41] T148129: Productization of Recommendation API - https://phabricator.wikimedia.org/T148129 [17:01:14] schana: I think _joe_ said he wanted to do some review. Is it up and running in beta btw ? [17:01:20] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.270 second response time [17:02:37] not yet akosiaris, we're just trying to get a complete picture of the potential scope of the remaining work [17:03:31] 10Operations, 10ops-codfw, 10Services (watching): Troubleshoot scb2005 NICs - https://phabricator.wikimedia.org/T167763#3391696 (10Papaul) I spoke with Dell support team they will send a replacement board on Monday. Please see below for case information Dell Service Request#: 950264222 System is back on li... 
[17:03:43] 10Operations, 10Performance: /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam - https://phabricator.wikimedia.org/T169249#3391697 (10fgiunchedi) [17:03:44] mobrovac: https://phabricator.wikimedia.org/T167763 [17:04:59] schana: yeah getting running in beta is kind of a blocker for running it in production [17:05:52] akosiaris: understood, but is there any other work we should be expecting to need to perform between getting it running in beta and going to production? [17:06:11] kk papaul, thank you! [17:06:38] schana: no, unless something comes up after _joe_ reviews it, beta should be the last step before landing into production [17:07:05] okay, thanks akosiaris [17:08:22] !log scb2005 repooling back the services - T167763 [17:08:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:08:31] T167763: Troubleshoot scb2005 NICs - https://phabricator.wikimedia.org/T167763 [17:12:11] !log arlolra@tin Started deploy [parsoid/deploy@717df08]: Updating Parsoid to b4187f18 [17:12:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:30] PROBLEM - HHVM rendering on mw1297 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:15:20] RECOVERY - HHVM rendering on mw1297 is OK: HTTP OK: HTTP/1.1 200 OK - 75822 bytes in 0.136 second response time [17:19:36] (03PS1) 10Cmjohnson: Fixing mgmt dns for several servers that did not have matching forward and reverse entries. The IP's physically on server match these entries. [dns] - 10https://gerrit.wikimedia.org/r/362240 [17:21:08] 10Operations, 10Patch-For-Review: Ops Onboarding for Keith Herron - https://phabricator.wikimedia.org/T166587#3391762 (10Dzahn) 05Resolved>03Open As was pointed out to me, we did not do the Icinga paging part yet. Just permissions on the web ui.. [17:21:14] Hi all! We are struggling to get a rewrite rule for wikidata.org right. Anyone around who could help?
[17:21:17] Config patch is here: https://gerrit.wikimedia.org/r/#/c/357985/ [17:21:34] I suspect the rule is correct, but in the wrong place, so it's applied too late [17:21:52] !log arlolra@tin Finished deploy [parsoid/deploy@717df08]: Updating Parsoid to b4187f18 (duration: 09m 41s) [17:22:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:22:26] (03CR) 10Cmjohnson: [C: 032] Fixing mgmt dns for several servers that did not have matching forward and reverse entries. The IP's physically on server match these entrie [dns] - 10https://gerrit.wikimedia.org/r/362240 (owner: 10Cmjohnson) [17:22:34] * DanielK_WMDE waves at the various bots [17:23:11] ticket: https://phabricator.wikimedia.org/T119536 [17:28:25] !log Updated Parsoid to b4187f18 (T168900, T168675, T168404, T153203) [17:28:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:28:38] T153203: Improve detection of missing quotes in HTML tag attributes - https://phabricator.wikimedia.org/T153203 [17:28:38] T168675: Param in template argument gets parsed incorrectly - https://phabricator.wikimedia.org/T168675 [17:28:38] T168404: BUG: Snippets of styling code for certain East-Asian text styles are shown in the app - https://phabricator.wikimedia.org/T168404 [17:28:38] T168900: Notice: Undefined index: dsr in /extensions/Linter/includes/ApiRecordLint.php on line 65 - https://phabricator.wikimedia.org/T168900 [17:29:11] 10Operations, 10ops-codfw, 10Services (watching): Troubleshoot scb2005 NICs - https://phabricator.wikimedia.org/T167763#3391851 (10Papaul) Papaul, The DPS number is: 326998165 This has been set for Monday, July 3rd. Thank you, Dustin Crawford Enterprise Tech Support Senior Analyst, Linux & Virtual...
[17:30:00] (03PS1) 10Jgreen: adjust monitoring for fundraising.wikimedia.org to payments-listener [puppet] - 10https://gerrit.wikimedia.org/r/362244 [17:31:04] 10Operations, 10ops-codfw, 10Labs, 10Labs-Infrastructure: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3391861 (10Papaul) [17:31:36] 10Operations, 10ops-codfw, 10Labs, 10Labs-Infrastructure, 10Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3391862 (10Papaul) [17:32:49] (03CR) 10Jgreen: [C: 032] adjust monitoring for fundraising.wikimedia.org to payments-listener [puppet] - 10https://gerrit.wikimedia.org/r/362244 (owner: 10Jgreen) [17:33:33] 10Operations, 10ops-codfw, 10Labs, 10Labs-Infrastructure, 10Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3391868 (10Papaul) [17:35:20] 10Operations, 10ops-codfw, 10Labs, 10Labs-Infrastructure, 10Patch-For-Review: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3391869 (10Papaul) [17:37:07] 10Operations, 10ops-codfw, 10Labs, 10Labs-Infrastructure, 10Patch-For-Review: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3379777 (10Papaul) Port information eth0 ge-8/0/0 eth1 ge-8/0/3 [17:39:20] 10Operations, 10ops-codfw, 10Labs, 10Labs-Infrastructure: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3391879 (10Papaul) Port information ge-1/0/13 [17:40:55] 10Operations, 10ops-codfw, 10Labs, 10Labs-Infrastructure, 10Patch-For-Review: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3391882 (10Papaul) port information ge-1/0/17 [17:42:10] 10Operations, 10ops-codfw, 10Labs, 10Labs-Infrastructure, 10Patch-For-Review: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3391883 
(10Papaul) port information ge-1/0/13 [17:50:15] PROBLEM - Host wtp1039 is DOWN: PING CRITICAL - Packet loss = 100% [17:50:15] PROBLEM - Host wtp1040 is DOWN: PING CRITICAL - Packet loss = 100% [17:55:25] RECOVERY - Host wtp1040 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [17:55:25] RECOVERY - Host wtp1039 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [17:57:35] PROBLEM - SSH on wtp1039 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:57:35] PROBLEM - SSH on wtp1040 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:59:15] bah [17:59:18] wtp is me [17:59:28] and those arent deployed and are in maint mode, not sure why they echoed [18:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170629T1800). [18:00:05] Jdlrobson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [18:02:05] PROBLEM - Host wtp1040 is DOWN: PING CRITICAL - Packet loss = 100% [18:05:05] is anyone able to swat it? [18:05:35] PROBLEM - nutcracker process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:05:55] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:05:55] jdlrobson: On it [18:06:07] RainbowSprinkles: awesome [18:06:22] (since I've been following on the task and know what's up already :)) [18:06:25] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:06:50] (03PS1) 10Jgreen: adjust DNS for *-fundraising.frack.*.wmnet [dns] - 10https://gerrit.wikimedia.org/r/362256 [18:07:22] (03CR) 10Jgreen: [C: 032] adjust DNS for *-fundraising.frack.*.wmnet [dns] - 10https://gerrit.wikimedia.org/r/362256 (owner: 10Jgreen) [18:07:35] RECOVERY - SSH on wtp1039 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [18:08:35] RECOVERY - nutcracker process on thumbor1002 is OK: PROCS OK: 1 process with UID = 115 (nutcracker), command name nutcracker [18:08:45] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient [18:09:05] PROBLEM - High load average on labstore1004 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0] [18:09:15] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [18:10:22] !log demon@tin Synchronized php-1.30.0-wmf.7/extensions/TextExtracts/extension.json: T107206 (duration: 00m 47s) [18:10:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:10:51] jdlrobson: Live everywhere now ^ [18:13:52] 10Operations, 10Performance: /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam - https://phabricator.wikimedia.org/T169249#3392045 (10Krinkle) This must've been triggered by which is the only change Xenon has seen since several months. {6dfacc0111c922de8... [18:14:02] RainbowSprinkles: hey! can you help us out with getting a rewrite rule right? or rather, finding the correct place for it? 
This does not seem to work for some reason (don't ask me how exactly Ladsgroup went about testing it): https://gerrit.wikimedia.org/r/#/c/357985/ [18:14:02] 10Operations, 10Performance-Team: /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam - https://phabricator.wikimedia.org/T169249#3392049 (10Krinkle) [18:14:52] DanielK_WMDE: Rewrite rules are black magic, but I can look :) [18:15:50] (03CR) 10Krinkle: Make /entity/ redirect internal (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/357985 (https://phabricator.wikimedia.org/T119536) (owner: 10Ladsgroup) [18:20:10] 10Operations, 10ops-eqiad: rack and setup wtp1025-1048 - https://phabricator.wikimedia.org/T165520#3392074 (10RobH) wtp-1029-1038 good to go, puppet/salt signed. [18:20:36] DanielK_WMDE: Soooo, Krinkle pointed you to how we do it normally, but that seems like the wrong place to put things. [18:21:08] ...and we only want this on one domain. well, two, counting test.wikidata.org. [18:21:10] not all domains [18:21:56] RainbowSprinkles: if you have ideas or advice, please put it on the ticket, so Ladsgroup can find it later [18:28:15] DanielK_WMDE: Not really tbh. Rewrites are basically trial-and-error for me :\ [18:30:43] !log restart nfs on labstore1004 (primary) [18:31:40] RainbowSprinkles: yay, trial and error on the live site :) Anyway, the rule itself is easy enough, and i can test it locally. [18:31:48] but it needs to be done in the correct order. 
[18:31:56] and i have no idea how all these files get combined [18:34:40] !log demon@tin Synchronized README: Forcing co-master sync (duration: 00m 46s) [18:35:56] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds [18:37:46] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3570 bytes in 0.006 second response time [18:39:16] RECOVERY - High load average on labstore1004 is OK: OK: Less than 50.00% above the threshold [16.0] [18:40:05] 10Operations, 10ops-eqiad: rack and setup wtp1025-1048 - https://phabricator.wikimedia.org/T165520#3392169 (10RobH) ``` [ (1*installer) 2 shell 3 shell 4- log ][ Jun 29 18:39 ]... [18:42:23] 10Operations, 10ops-eqiad, 10Analytics: Smartctl errors for one kafka1012 disk - https://phabricator.wikimedia.org/T168927#3381297 (10RobH) This system is out of warranty, and will require onsite spare disks to be used as replacement. [18:48:08] PROBLEM - High load average on labstore1004 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [24.0] [18:52:56] (03CR) 10Herron: icinga/role:mail::mx: add monitoring of exim queue size (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/361023 (https://phabricator.wikimedia.org/T133110) (owner: 10Dzahn) [18:54:59] (03PS1) 10MarcoAurelio: Add 'WP' namespace alias to ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362267 (https://phabricator.wikimedia.org/T166035) [18:57:54] 10Operations, 10Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#3392248 (10Jgreen) [18:57:57] 10Operations, 10Mail, 10fundraising-tech-ops: (re)move problemsdonating aliases - https://phabricator.wikimedia.org/T127488#3392245 (10Jgreen) 05Resolved>03Open p:05Normal>03Low The ZenDesk task is #13464, haven't heard back yet. I'll reopen this task as a reminder to clean up the privateexim entries.
[18:58:26] (03PS2) 10MarcoAurelio: Add 'WP' namespace alias to ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362267 (https://phabricator.wikimedia.org/T168164) [18:58:55] 10Operations, 10Mail, 10fundraising-tech-ops: (re)move problemsdonating aliases - https://phabricator.wikimedia.org/T127488#3392252 (10Jgreen) a:05Jgreen>03None [18:58:57] (03CR) 10Daniel Kinzler: "From IRC:" [puppet] - 10https://gerrit.wikimedia.org/r/357985 (https://phabricator.wikimedia.org/T119536) (owner: 10Ladsgroup) [19:01:08] PROBLEM - DRBD role on labstore1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:01:58] PROBLEM - Host labstore1004 is DOWN: PING CRITICAL - Packet loss = 100% [19:02:20] jouncebot: next [19:02:38] jouncebot: are you asleep? [19:02:38] oh oh. jouncebot may be sick [19:03:27] In 3 hour(s) and 56 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170629T2300) [19:03:27] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170629T1900). Please do the needful. 
[19:03:29] so I guess I'm going ahead with wmf.7, logspam situation appears to be under control with the exception of the luasandbox warnings [19:03:52] thcipriani: fyi ^ [19:03:53] (03PS1) 10Chad: Scap clean: Provide better logging on failed commands [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362269 [19:04:46] !log demon@tin Pruned MediaWiki: 1.30.0-wmf.4 [keeping static files] (duration: 02m 07s) [19:04:48] RECOVERY - Host labstore1004 is UP: PING OK - Packet loss = 16%, RTA = 0.25 ms [19:06:21] !log demon@tin Pruned MediaWiki: 1.30.0-wmf.5 [keeping static files] (duration: 01m 16s) [19:06:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:08] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1004 is CRITICAL: CRITICAL - Expecting active but unit maintain-dbusers is inactive [19:07:25] (03CR) 10Chad: [C: 032] Scap clean: Provide better logging on failed commands [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362269 (owner: 10Chad) [19:07:38] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1004 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is inactive [19:07:48] PROBLEM - drbd service on labstore1004 is CRITICAL: CRITICAL - Expecting active but unit drbd is inactive [19:09:12] (03Merged) 10jenkins-bot: Scap clean: Provide better logging on failed commands [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362269 (owner: 10Chad) [19:09:21] (03CR) 10jenkins-bot: Scap clean: Provide better logging on failed commands [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362269 (owner: 10Chad) [19:09:53] twentyafterfour and others: FYI we are having some serious problems with Kubernetes in Cloud Services/Tool Labs now. 
stashbot may disappear at any time [19:10:08] RECOVERY - High load average on labstore1004 is OK: OK: Less than 50.00% above the threshold [16.0] [19:11:00] bd808: no biggie, we can live without stashbot, I suppose ;) [19:11:41] (03PS1) 10Chad: Scap clean: Fix syntax you dummy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362271 [19:11:43] (03CR) 10Chad: [C: 032] Scap clean: Fix syntax you dummy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362271 (owner: 10Chad) [19:13:11] !log twentyafterfour@tin Synchronized php-1.30.0-wmf.7/extensions/RevisionSlider/src/RevisionSliderHooks.php: sync https://gerrit.wikimedia.org/r/#/c/362131/ prior to promoting wmf.7 (duration: 00m 46s) [19:13:38] PROBLEM - High load average on labstore1005 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [24.0] [19:14:34] (03Merged) 10jenkins-bot: Scap clean: Fix syntax you dummy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362271 (owner: 10Chad) [19:14:36] (03PS1) 10MarcoAurelio: Fix nowikisource template namespace subpages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362272 (https://phabricator.wikimedia.org/T166035) [19:15:01] such helpful errors, hhvm: "<11>Jun 29 19:13:25 mw1162 hhvm:" [19:15:05] (03PS2) 10MarcoAurelio: Fix nowikisource template namespace subpages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362272 (https://phabricator.wikimedia.org/T166035) [19:16:19] !log deploying wmf/1.30.0-wmf.7 to all wikis refs T167536 [19:16:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:16:29] T167536: MW-1.30.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T167536 [19:16:58] RECOVERY - DRBD role on labstore1004 is OK: DRBD role OK [19:17:42] (03PS1) 1020after4: all wikis to 1.30.0-wmf.7 refs T167536 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362273 [19:17:44] (03CR) 1020after4: [C: 032] all wikis to 1.30.0-wmf.7 refs T167536 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/362273 (owner: 1020after4) [19:17:48] RECOVERY - drbd service on labstore1004 is OK: OK - drbd is active [19:21:25] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.7 refs T167536 [19:21:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:21:35] T167536: MW-1.30.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T167536 [19:21:37] twentyafterfour: I have a fix for that warning on scap clean's syntax [19:21:42] Was waiting for you to finish [19:21:57] RainbowSprinkles: I think it just got pulled by deploy-promote [19:22:04] Ah I see that [19:22:12] I'll sync it [19:22:18] I'm done with the train assuming nothing crazy pops up [19:23:03] This was just some logging additions so I can better debug why clean with --delete fails [19:23:17] (fails sometimes) [19:23:20] !log demon@tin Synchronized scap/plugins/clean.py: Because I need to learn basic python syntax before trying stuff (duration: 00m 42s) [19:23:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:15] (03CR) 10Daniel Kinzler: "Oh hm... in this patch, you can see where the standard rewrite rules for wikidata.org are: https://gerrit.wikimedia.org/r/#/c/361801/2/mod" [puppet] - 10https://gerrit.wikimedia.org/r/357985 (https://phabricator.wikimedia.org/T119536) (owner: 10Ladsgroup) [19:30:40] (03CR) 10Daniel Kinzler: [C: 031] "I agree that having wikiba.se outside the production cluster is a problem, though I don't think it's a big one.
I also agree that this is" [puppet] - 10https://gerrit.wikimedia.org/r/361801 (https://phabricator.wikimedia.org/T169023) (owner: 10Krinkle) [19:32:18] (03CR) 10jenkins-bot: Scap clean: Fix syntax you dummy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362271 (owner: 10Chad) [19:32:20] (03CR) 10jenkins-bot: all wikis to 1.30.0-wmf.7 refs T167536 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362273 (owner: 1020after4) [19:39:38] RECOVERY - High load average on labstore1005 is OK: OK: Less than 50.00% above the threshold [16.0] [19:40:08] (03CR) 10Daniel Kinzler: [C: 031] "oh, but for the record: if we can't have the redirect, I'm also good with just removing the /ontology path. It's dead wood, I believe. Bes" [puppet] - 10https://gerrit.wikimedia.org/r/361801 (https://phabricator.wikimedia.org/T169023) (owner: 10Krinkle) [19:40:10] (03CR) 10Jdlrobson: [C: 031] Stop reader surveys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360849 (https://phabricator.wikimedia.org/T131949) (owner: 10Nschaaf) [20:06:00] (03CR) 10Dzahn: "ok, thanks for pointing that out. i wasn't expecting us to upgrade just yet, but once we have 7.1 this puppet code should still be just fi" [puppet] - 10https://gerrit.wikimedia.org/r/362124 (owner: 10Dzahn) [20:07:04] (03CR) 10Paladox: [C: 031] apache: add class for mod_php with PHP 7.0 for stretch [puppet] - 10https://gerrit.wikimedia.org/r/362119 (https://phabricator.wikimedia.org/T159756) (owner: 10Dzahn) [20:07:52] (03CR) 10Paladox: [C: 031] "We can still merge this. 
Just noting phabricator will not work on php 7.0 but does on php 7.1 :)" [puppet] - 10https://gerrit.wikimedia.org/r/362124 (owner: 10Dzahn) [20:10:39] 10Operations, 10Electron-PDFs, 10Services, 10Patch-For-Review, 10Reading-Web-Backlog (Tracking): pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922#3392368 (10Jdlrobson) [20:13:42] 10Operations, 10OfflineContentGenerator, 10Reading-Web-Backlog (Tracking), 10Services (watching): Confirm attribution needs - https://phabricator.wikimedia.org/T150875#3392430 (10Jdlrobson) [20:15:19] (03PS2) 10Dzahn: netmon: use existing role::network::monitor, clean up site.pp [puppet] - 10https://gerrit.wikimedia.org/r/362127 [20:22:40] (03CR) 10Dzahn: [C: 032] netmon: use existing role::network::monitor, clean up site.pp [puppet] - 10https://gerrit.wikimedia.org/r/362127 (owner: 10Dzahn) [20:28:41] 10Operations, 10OfflineContentGenerator, 10Reading-Community-Engagement, 10Reading-Web-Backlog (Tracking), 10Services (watching): Collate wikimedia pages into a single html wikimedia page that can then be rendered into a single pdf - https://phabricator.wikimedia.org/T150874#3392584 (10Jdlrobson) [20:28:57] 10Operations, 10Collection, 10OfflineContentGenerator, 10Reading-Community-Engagement, and 2 others: Replace OCG in collection extension with Electron - https://phabricator.wikimedia.org/T150872#3392594 (10Jdlrobson) [20:30:48] PROBLEM - Check Varnish expiry mailbox lag on cp1049 is CRITICAL: CRITICAL: expiry mailbox lag is 2011759 [20:31:18] PROBLEM - puppet last run on netmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[prometheus-snmp-exporter] [20:32:06] (03CR) 10Paladox: [C: 031] apache: add class for mod_php with PHP 7.0 for stretch (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/362119 (https://phabricator.wikimedia.org/T159756) (owner: 10Dzahn) [20:32:38] 10Operations, 10OCG-General, 10Reading-Community-Engagement, 10Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3392616 (10Jdlrobson) [20:33:14] (03CR) 10Dzahn: [C: 04-1] "you just told me about the problem you found with it. please add that error or feel free to amend" [puppet] - 10https://gerrit.wikimedia.org/r/362119 (https://phabricator.wikimedia.org/T159756) (owner: 10Dzahn) [20:36:42] (03CR) 10Paladox: [C: 031] "error i get is" [puppet] - 10https://gerrit.wikimedia.org/r/362119 (https://phabricator.wikimedia.org/T159756) (owner: 10Dzahn) [20:36:49] (03PS4) 10Paladox: apache: add class for mod_php with PHP 7.0 for stretch [puppet] - 10https://gerrit.wikimedia.org/r/362119 (https://phabricator.wikimedia.org/T159756) (owner: 10Dzahn) [20:37:21] (03CR) 10Dzahn: "E: Unable to locate package prometheus-snmp-exporter :(" [puppet] - 10https://gerrit.wikimedia.org/r/362127 (owner: 10Dzahn) [20:38:17] !log ppchelko@tin Started deploy [changeprop/deploy@350076c]: Config: Enable red links processing. T133221 [20:38:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:28] T133221: Support red links updates in change-propagation - https://phabricator.wikimedia.org/T133221 [20:39:19] !log ppchelko@tin Finished deploy [changeprop/deploy@350076c]: Config: Enable red links processing. 
T133221 (duration: 01m 01s) [20:39:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:42:18] RECOVERY - puppet last run on netmon1002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [20:46:30] !log APT - reprepro copy stretch-wikimedia jessie-wikimedia prometheus-snmp-exporter (to make it available on stretch for netmon1002) (T159756) [20:46:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:46:41] T159756: setup netmon1002.wikimedia.org - https://phabricator.wikimedia.org/T159756 [20:52:38] PROBLEM - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [3000.0] [20:53:28] PROBLEM - mediawiki originals uploads -hourly- for codfw-prod on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [3000.0] [20:54:18] PROBLEM - pdfrender on scb1002 is CRITICAL: connect to address 10.64.16.21 and port 5252: Connection refused [20:54:48] PROBLEM - High load average on labstore1005 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0] [20:56:38] 10Operations, 10Performance-Team, 10Reading-Infrastructure-Team-Backlog, 10TemplateStyles, and 3 others: Deploy TemplateStyles to WMF production - https://phabricator.wikimedia.org/T133410#3392775 (10Jdlrobson) [21:00:38] RECOVERY - mediawiki originals uploads -hourly- for eqiad-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0] [21:00:52] !log mobrovac@tin Started deploy [restbase/deploy@bcb83f4]: Fix special char handling in PDF back-end requests - T169223 [21:01:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:01:03] T169223: Electron
not rendering titles with questions marks on Italian wikipedia - https://phabricator.wikimedia.org/T169223 [21:01:29] RECOVERY - mediawiki originals uploads -hourly- for codfw-prod on graphite1001 is OK: OK: Less than 80.00% above the threshold [2000.0] [21:01:46] 10Operations, 10MobileFrontend, 10Traffic, 10Patch-For-Review, 10Reading-Web-Backlog (Tracking): Remove disableImages handling from VCL - https://phabricator.wikimedia.org/T168013#3392786 (10Jdlrobson) [21:01:51] 10Operations, 10MobileFrontend, 10Traffic, 10Patch-For-Review, 10Reading-Web-Backlog (Tracking): Remove disableImages handling from VCL - https://phabricator.wikimedia.org/T168013#3392786 (10Jdlrobson) [21:02:01] 10Operations, 10Performance-Team: Understand APC size increase after HHVM upgrade/restart - https://phabricator.wikimedia.org/T168540#3392787 (10Krinkle) [21:02:04] 10Operations, 10Performance-Team: Understand APC size increase after HHVM upgrade/restart - https://phabricator.wikimedia.org/T168540#3392787 (10Krinkle) [21:04:07] !log mobrovac@tin Finished deploy [restbase/deploy@bcb83f4]: Fix special char handling in PDF back-end requests - T169223 (duration: 03m 14s) [21:04:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:05:00] !log mobrovac@tin Started deploy [restbase/deploy@bcb83f4]: (no justification provided) [21:05:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:06:02] !log mobrovac@tin Finished deploy [restbase/deploy@bcb83f4]: (no justification provided) (duration: 01m 02s) [21:06:09] !log mobrovac@tin Started deploy [restbase/deploy@bcb83f4]: (no justification provided) [21:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:06:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:09:51] 10Operations, 10Incident-20150423-Commons, 10RESTBase, 10ArchCom-RfC (ArchCom-Approved), and 6 others: RFC: Request timeouts and retries - 
https://phabricator.wikimedia.org/T97204#3392825 (10GWicke) [21:09:51] 10Operations, 10Incident-20150423-Commons, 10MediaWiki-API, 10Parsoid, and 7 others: HHVM request timeouts not working; support lowering the API request timeout per request - https://phabricator.wikimedia.org/T97192#3392822 (10GWicke) 05Open>03Resolved a:03GWicke Okay, I verified that the test cases... [21:09:54] 10Operations, 10Incident-20150423-Commons, 10RESTBase, 10ArchCom-RfC (ArchCom-Approved), and 6 others: RFC: Request timeouts and retries - https://phabricator.wikimedia.org/T97204#3392825 (10GWicke) [21:10:30] !log mobrovac@tin Finished deploy [restbase/deploy@bcb83f4]: (no justification provided) (duration: 04m 21s) [21:10:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:14:28] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds [21:17:08] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3570 bytes in 0.006 second response time [21:31:48] RECOVERY - High load average on labstore1005 is OK: OK: Less than 50.00% above the threshold [16.0] [21:36:48] PROBLEM - High load average on labstore1005 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0] [21:37:48] RECOVERY - High load average on labstore1005 is OK: OK: Less than 50.00% above the threshold [16.0] [21:40:56] !log reboot labstore1004 with grub set to gnulinux-advanced-1773f282-5a1b-441e-865c-8b70a0ebc925>gnulinux-4.4.0-3-amd64-advanced-1773f282-5a1b-441e-865c-8b70a0ebc925 [21:41:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:41:48] PROBLEM - High load average on labstore1005 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0] [21:44:48] RECOVERY - High load average on labstore1005 is OK: OK: Less than 50.00% above the threshold [16.0] [21:51:13] !log APT - uploading python-django-south from jessie to 
wikimedia-stretch for librenms on stretch (T159756) [21:51:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:51:23] T159756: setup netmon1002.wikimedia.org - https://phabricator.wikimedia.org/T159756 [21:51:32] dang, "servermon", not librenms.. fixing in wiki, heh [21:51:46] should have used subtasks for each tool [21:52:38] lol [21:59:17] @seen grrrit-wm [21:59:17] mutante: Last time I saw grrrit-wm they were quitting the network with reason: Remote host closed the connection N/A at 12/20/2016 2:08:22 AM (191d19h50m55s ago) [21:59:29] lol [21:59:31] wrong bot [21:59:33] wikibugs [22:00:09] it felt like it was missing a gerrit comment [22:00:49] i think the bot is really slow tonight. [22:00:56] and wikibugs has the nick with _ underscore [22:01:02] the altnick [22:01:49] yep [22:02:21] anyways, now the servermon role should work on stretch, since the package is there. let's see.. or at least the next error [22:02:39] tries on a 'Wikimedia VPS' [22:05:58] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/1/3: down - Core: cr2-esams:xe-0/1/3 (Level3, BDFS2448, 84ms) {#2013} [10Gbps wave]BR [22:06:08] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-0/1/3: down - Core: cr2-eqiad:xe-4/1/3 (Level3, BDFS2448, 84ms) {#A0010621} [10Gbps wave]BR [22:07:58] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [22:08:08] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 59, down: 0, dormant: 0, excluded: 0, unused: 0 [22:10:48] PROBLEM - High load average on labstore1005 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0] [22:11:06] yea, so the VPS currently tells me: Failed to open TCP 
connection to puppet:8140 (getaddrinfo: Name or service not known) [22:12:25] mutante: I think the bot should be working again. [22:12:38] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [22:13:01] legoktm: ah :) thanks [22:14:48] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [22:16:08] !log set cfq scheduler on labstore1005 [22:16:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:16:48] RECOVERY - High load average on labstore1005 is OK: OK: Less than 50.00% above the threshold [16.0] [22:19:38] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [22:19:48] PROBLEM - High load average on labstore1005 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0] [22:20:48] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [22:20:48] RECOVERY - High load average on labstore1005 is OK: OK: Less than 50.00% above the threshold [16.0] [22:23:48] PROBLEM - High load average on labstore1005 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [24.0] [22:30:48] RECOVERY - Check Varnish expiry mailbox lag on cp1049 is OK: OK: expiry mailbox lag is 883 [22:30:48] RECOVERY - High load average on labstore1005 is OK: OK: Less than 50.00% above the threshold [16.0] [23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170629T2300). Please do the needful. [23:00:04] mooeypoo, dbrant, jan_drewniak, and jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. 
[23:00:27] o/ [23:00:33] Here and ready o7 [23:00:34] \o [23:01:48] I'll SWAT today [23:02:28] (03PS4) 10Catrope: Stop reader surveys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360849 (https://phabricator.wikimedia.org/T131949) (owner: 10Nschaaf) [23:02:31] (03CR) 10Catrope: [C: 032] Stop reader surveys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360849 (https://phabricator.wikimedia.org/T131949) (owner: 10Nschaaf) [23:03:02] dbrant_: Did you create a cherry-pick for https://gerrit.wikimedia.org/r/#/c/361862/1 or would you like me to create it for you? [23:04:10] Urgghhh, the gate-and-submit-swat queue starves mw-config jobs in the gate-and-submit queue [23:08:23] (Filed T169279) [23:08:24] T169279: Add mediawiki-config to the gate-and-submit-swat pipeline - https://phabricator.wikimedia.org/T169279 [23:12:07] (03Merged) 10jenkins-bot: Stop reader surveys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360849 (https://phabricator.wikimedia.org/T131949) (owner: 10Nschaaf) [23:12:09] Yeah, those should be in the same queue [23:12:21] (I think I suggested that in passing but it didn't happen yet) [23:12:23] (03CR) 10jenkins-bot: Stop reader surveys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360849 (https://phabricator.wikimedia.org/T131949) (owner: 10Nschaaf) [23:13:58] (03CR) 10Reedy: [C: 031] Change lists.wikimedia.org SPF record to soft fail (~all) [dns] - 10https://gerrit.wikimedia.org/r/361501 (https://phabricator.wikimedia.org/T167703) (owner: 10Herron) [23:14:50] jdlrobson: Your change is on mwdebug1002, please test [23:17:05] RoanKattouw: please sync looks good [23:19:12] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Stop reader surveys (T131949) (duration: 00m 43s) [23:19:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:22] T131949: [Epic] Repeat the big English reader survey in one or two more languages - https://phabricator.wikimedia.org/T131949 [23:20:58] 
mooeypoo: jan_drewniak: Your changes are now on mwdebug1002, please test [23:23:58] PROBLEM - High load average on labstore1005 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [24.0] [23:24:03] RoanKattouw: looks good to sync :) [23:26:46] !log catrope@tin Synchronized php-1.30.0-wmf.7/extensions/CirrusSearch/: "Explore similar" widget for CirrusSearch (T149809) (duration: 00m 54s) [23:26:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:56] T149809: [A/B Test] Add 'explore similar' pages, categories and suggested languages in search results - https://phabricator.wikimedia.org/T149809 [23:27:49] RoanKattouw, all good on my end [23:28:19] !log catrope@tin Synchronized php-1.30.0-wmf.7/extensions/WikimediaEvents/: Add event logging for explode-similar on SRP (T149809) (duration: 00m 42s) [23:28:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:31:17] !log catrope@tin Synchronized php-1.30.0-wmf.7/resources/src/mediawiki.rcfilters/: RCFilters fixes (T169169, T169107, T169042) (duration: 00m 42s) [23:31:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:31:28] T169107: [regression] Can't remove a saved filter - https://phabricator.wikimedia.org/T169107 [23:31:28] T169042: The dialog to save filters on Recent Changes needs adjustments to better communicate the initial status - https://phabricator.wikimedia.org/T169042 [23:31:42] dbrant_: Are you here for your SWAT? 
If you want https://gerrit.wikimedia.org/r/#/c/361862/1 to be SWATed you need to 1) be present and 2) respond to my earlier question about a cherry-pick [23:32:15] !log Sorry I meant T169163 [23:32:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:32:25] T169163: [wmf.7] The new advanced filters 'Namespaces' and 'Tagged edits' displayed with old filter options - https://phabricator.wikimedia.org/T169163 [23:50:58] PROBLEM - High load average on labstore1005 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [24.0] [23:59:32] (03PS1) 10Niharika29: Config changes for LoginNotify [mediawiki-config] - 10https://gerrit.wikimedia.org/r/362323 (https://phabricator.wikimedia.org/T107707)