[00:27:39] 06Operations, 06Release-Engineering-Team, 05Gitblit-Deprecate, 13Patch-For-Review: write Apache rewrite rules for gitblit -> diffusion migration - https://phabricator.wikimedia.org/T137224#2391744 (10Paladox)
[00:59:28] PROBLEM - puppet last run on wtp2017 is CRITICAL: CRITICAL: puppet fail
[01:07:24] (03PS1) 10Yuvipanda: k8s: Specify user groups in k8s auth [puppet] - 10https://gerrit.wikimedia.org/r/295102
[01:07:39] (03PS2) 10Yuvipanda: k8s: Specify user groups in k8s auth [puppet] - 10https://gerrit.wikimedia.org/r/295102
[01:26:57] RECOVERY - puppet last run on wtp2017 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[01:34:53] (03PS1) 10Yuvipanda: k8s: Rename infrastructure user types to infrastructure-readwrite [puppet] - 10https://gerrit.wikimedia.org/r/295103
[01:35:05] (03PS2) 10Yuvipanda: k8s: Rename infrastructure user types to infrastructure-readwrite [puppet] - 10https://gerrit.wikimedia.org/r/295103
[01:42:32] (03CR) 10Yuvipanda: [C: 032] k8s: Specify user groups in k8s auth [puppet] - 10https://gerrit.wikimedia.org/r/295102 (owner: 10Yuvipanda)
[01:42:45] (03CR) 10Yuvipanda: [C: 032] k8s: Rename infrastructure user types to infrastructure-readwrite [puppet] - 10https://gerrit.wikimedia.org/r/295103 (owner: 10Yuvipanda)
[01:44:12] (03PS1) 10Yuvipanda: k8s: Call tool account types 'tool' rather than 'namespaced' [puppet] - 10https://gerrit.wikimedia.org/r/295106
[01:46:58] (03CR) 10Yuvipanda: [C: 032] k8s: Call tool account types 'tool' rather than 'namespaced' [puppet] - 10https://gerrit.wikimedia.org/r/295106 (owner: 10Yuvipanda)
[01:57:09] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 80755 MB (15% inode=99%)
[02:03:37] RECOVERY - Disk space on elastic1024 is OK: DISK OK
[02:25:38] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.6) (duration: 10m 50s)
[02:25:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:25] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Jun 19 02:31:25 UTC 2016 (duration 5m 47s)
[02:31:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:51:23] 06Operations, 10MediaWiki-extensions-UniversalLanguageSelector, 10Wikimedia-SVG-rendering, 07I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2391834 (10mehtab.ahmed) Author has asked for a few days to resolve the matter.
[03:37:17] PROBLEM - puppet last run on mw1097 is CRITICAL: CRITICAL: Puppet has 2 failures
[03:48:26] PROBLEM - puppet last run on mw2116 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:02:36] RECOVERY - puppet last run on mw1097 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:05:27] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: puppet fail
[04:13:48] RECOVERY - puppet last run on mw2116 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:33:06] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:55:47] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: puppet fail
[04:57:57] PROBLEM - Apache HTTP on mw1138 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.007 second response time
[04:58:57] PROBLEM - HHVM rendering on mw1138 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.016 second response time
[05:21:17] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:31:07] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:37] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:57] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:08] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:26] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:36] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:07] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Not Available - 530 bytes in 0.040 second response time
[06:33:07] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:36] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:46] PROBLEM - puppet last run on elastic1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:56:16] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3669 bytes in 0.028 second response time
[06:56:26] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:16] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:57:25] RECOVERY - puppet last run on elastic1010 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:57:26] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:57:36] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:06] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:56] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:05] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:16] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:51:56] PROBLEM - puppet last run on cp2010 is CRITICAL: CRITICAL: puppet fail
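
The "puppet last run" alerts above come from a check of each host's most recent Puppet agent run. As a rough illustration only (this is not the actual script behind these Icinga alerts), such a check can be derived from the agent's last run summary; the sketch below assumes the default /var/lib/puppet/state/last_run_summary.yaml path and its standard events/time keys.

# Illustrative "puppet last run" style check; not the real Wikimedia plugin.
import sys
import time

import yaml  # PyYAML

SUMMARY = "/var/lib/puppet/state/last_run_summary.yaml"
MAX_AGE = 3600  # seconds before the last run counts as stale (assumed threshold)

def main():
    try:
        with open(SUMMARY) as f:
            summary = yaml.safe_load(f)
        failures = summary.get("events", {}).get("failure", 0)
        last_run = summary.get("time", {}).get("last_run", 0)
    except Exception as exc:
        # Unreadable/missing summary is treated like a failed run.
        print(f"CRITICAL: puppet fail ({exc})")
        return 2

    age = int(time.time() - last_run)
    if failures:
        print(f"CRITICAL: Puppet has {failures} failures")
        return 2
    if age > MAX_AGE:
        print(f"CRITICAL: puppet last ran {age} seconds ago")
        return 2
    print(f"OK: last run {age} seconds ago with 0 failures")
    return 0

if __name__ == "__main__":
    sys.exit(main())
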
[07:55:45] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: Puppet has 15 failures
[08:19:16] RECOVERY - puppet last run on cp2010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:21:05] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[08:26:54] PROBLEM - puppet last run on maerlant is CRITICAL: CRITICAL: puppet fail
[08:38:54] PROBLEM - puppet last run on mw2154 is CRITICAL: CRITICAL: puppet fail
[08:52:15] RECOVERY - puppet last run on maerlant is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[09:04:35] RECOVERY - puppet last run on mw2154 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[10:51:32] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:16:31] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:27:02] RECOVERY - HHVM rendering on mw1114 is OK: HTTP OK: HTTP/1.1 200 OK - 66346 bytes in 9.932 second response time
[12:27:03] RECOVERY - Apache HTTP on mw1114 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.351 second response time
[12:27:18] (03PS1) 10KartikMistry: apertium-arg-cat: Initial Debian packaging [debs/contenttranslation/apertium-arg-cat] - 10https://gerrit.wikimedia.org/r/295121 (https://phabricator.wikimedia.org/T124369)
[12:27:36] !log restarted hhvm on mw1114 - trace in /tmp/hhvm.11092.bt, hhvm killed by OOM
[12:27:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:29:22] RECOVERY - Apache HTTP on mw1138 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 4.024 second response time
[12:29:24] !log restarted hhvm on mw1138 - trace in /tmp/hhvm.25048.bt, hhvm killed by OOM
[12:29:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:29:42] RECOVERY - HHVM rendering on mw1138 is OK: HTTP OK: HTTP/1.1 200 OK - 66336 bytes in 0.591 second response time
[12:32:19] from what I can see the threads are stuck in
[12:32:19] #01 0x00007f8e41880414 in __pthread_cond_wait () from /build/eglibc-3GlaMS/eglibc-2.19/nptl/../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
[12:32:22] #02 0x0000000001cfd4c0 in folly::EventBase::runInEventBaseThreadAndWait () from /usr/bin/hhvm
[12:32:23] (03PS1) 10KartikMistry: apertium-spa-age: Initial Debian packaging [debs/contenttranslation/apertium-spa-arg] - 10https://gerrit.wikimedia.org/r/295122 (https://phabricator.wikimedia.org/T124370)
[12:38:52] (03PS2) 10KartikMistry: apertium-spa-arg: Initial Debian packaging [debs/contenttranslation/apertium-spa-arg] - 10https://gerrit.wikimedia.org/r/295122 (https://phabricator.wikimedia.org/T124370)
[12:41:32] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: puppet fail
[13:09:01] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[13:22:23] elukey: do you have a minute to check whether the job queue is stuck for global rename stuff?
[13:29:37] Steinsplitter: I have no idea how to do it except checking Grafana :(
[13:29:59] ok. thx anyway
[13:30:58] sorry!
[13:31:37] I can see from https://grafana.wikimedia.org/dashboard/db/job-queue-health that the queue size has had some variations over the past hours
[13:32:41] but nothing weird if I look at the past week
[13:33:08] is there any specific event/issue that concerns you?
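
On the job queue question above: besides the Grafana job-queue-health dashboard, a rough sanity check of the overall pending job count is available from MediaWiki's siteinfo statistics. A minimal sketch, assuming you query the public API of the wiki you care about (metawiki is used here purely as an example); note this is a global counter, not a per-job-type (e.g. global rename) breakdown.

# Rough check of the pending job count via MediaWiki's siteinfo statistics.
import requests

API = "https://meta.wikimedia.org/w/api.php"  # example wiki; adjust as needed

def pending_jobs(api_url=API):
    resp = requests.get(
        api_url,
        params={
            "action": "query",
            "meta": "siteinfo",
            "siprop": "statistics",
            "format": "json",
        },
        headers={"User-Agent": "jobqueue-check/0.1 (example)"},
        timeout=10,
    )
    resp.raise_for_status()
    # The statistics block includes an approximate "jobs" counter.
    return resp.json()["query"]["statistics"]["jobs"]

if __name__ == "__main__":
    print(f"pending jobs: {pending_jobs()}")
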
[13:42:38] (going afk, will try to read later :)
[14:24:31] PROBLEM - puppet last run on mw2062 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:37:40] 07Blocked-on-Operations, 10Continuous-Integration-Infrastructure, 10Packaging, 05Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2392278 (10Paladox)
[14:49:39] (03PS1) 10Nicko: T137422 Refactor the default cassandra monitoring into a separate class [puppet] - 10https://gerrit.wikimedia.org/r/295123
[14:50:38] RECOVERY - puppet last run on mw2062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:50:42] (03CR) 10jenkins-bot: [V: 04-1] T137422 Refactor the default cassandra monitoring into a separate class [puppet] - 10https://gerrit.wikimedia.org/r/295123 (owner: 10Nicko)
[15:01:26] (03CR) 10Edomaur: [C: 031] T137422 Refactor the default cassandra monitoring into a separate class [puppet] - 10https://gerrit.wikimedia.org/r/295123 (owner: 10Nicko)
[15:07:57] (03PS2) 10Nicko: T137422 Refactor the default cassandra monitoring into a separate class [puppet] - 10https://gerrit.wikimedia.org/r/295123
[15:13:50] (03PS1) 10Nicko: T137422 Include cassandra monitoring in all the ::cassandra calls [puppet] - 10https://gerrit.wikimedia.org/r/295125
[15:20:39] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:20:40] (03CR) 10Edomaur: [C: 031] "still good for me, but I am perhaps a bit lazy" [puppet] - 10https://gerrit.wikimedia.org/r/295123 (owner: 10Nicko)
[15:45:05] (03PS1) 10Maturain: fix puppet unit test in elastic search [puppet] - 10https://gerrit.wikimedia.org/r/295127
[16:04:47] PROBLEM - puppet last run on graphite2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:06:01] (03PS1) 10Nicko: T136996 Including a .policy file to grant permission to send logs to logstash [puppet] - 10https://gerrit.wikimedia.org/r/295129
[16:06:50] (03CR) 10Nicko: [C: 04-1] "This firstly needs to be tested" [puppet] - 10https://gerrit.wikimedia.org/r/295129 (owner: 10Nicko)
[16:09:56] (03PS2) 10Nicko: T136696 Including a .policy file to grant permission to send logs to logstash [puppet] - 10https://gerrit.wikimedia.org/r/295129
[16:22:40] Hi, can somebody have a look at T129743? This breaks WMCZ's mails sent to info@wikimedia.cz.
[16:22:41] T129743: E-mail incorrectly forwarded to wm-cz OTRS e-mail - https://phabricator.wikimedia.org/T129743
[16:29:06] 06Operations, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Link "current" to last dump set on cyrrussearch get a 404 - https://phabricator.wikimedia.org/T138176#2392343 (10Edomaur)
[16:29:47] RECOVERY - puppet last run on graphite2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:46:57] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:50:09] PROBLEM - puppet last run on mw2079 is CRITICAL: CRITICAL: puppet fail
[16:51:49] (03PS1) 10Maturain: fix puppet unit test for squid3 [puppet] - 10https://gerrit.wikimedia.org/r/295130
[17:02:25] (03CR) 10Nicko: [C: 031] fix puppet unit test for squid3 [puppet] - 10https://gerrit.wikimedia.org/r/295130 (owner: 10Maturain)
[17:17:38] RECOVERY - puppet last run on mw2079 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:20:29] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[17:51:38] PROBLEM - puppet last run on mw1143 is CRITICAL: CRITICAL: Puppet has 72 failures
[18:15:37] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[18:31:58] PROBLEM - puppet last run on druid1002 is CRITICAL: CRITICAL: puppet fail
[18:57:18] RECOVERY - puppet last run on druid1002 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[20:47:14] PROBLEM - HHVM rendering on mw1143 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:48:24] PROBLEM - Apache HTTP on mw1143 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:48:44] PROBLEM - dhclient process on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:48:55] PROBLEM - Disk space on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:49:04] PROBLEM - nutcracker process on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:49:13] PROBLEM - HHVM processes on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:50:04] PROBLEM - nutcracker port on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:50:15] PROBLEM - configured eth on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:50:15] PROBLEM - Check size of conntrack table on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:50:35] PROBLEM - DPKG on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:50:43] PROBLEM - salt-minion processes on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:52:54] RECOVERY - Disk space on mw1143 is OK: DISK OK
[20:53:13] RECOVERY - nutcracker process on mw1143 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[20:53:13] RECOVERY - HHVM processes on mw1143 is OK: PROCS OK: 6 processes with command name hhvm
[20:53:35] PROBLEM - SSH on mw1143 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:54:43] RECOVERY - DPKG on mw1143 is OK: All packages OK
[20:54:44] RECOVERY - salt-minion processes on mw1143 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[20:57:03] PROBLEM - dhclient process on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:59:14] PROBLEM - Disk space on mw1143 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:59:25] PROBLEM - nutcracker process on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:59:34] PROBLEM - HHVM processes on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:01:14] PROBLEM - DPKG on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:01:14] PROBLEM - salt-minion processes on mw1143 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:14:33] RECOVERY - HHVM processes on mw1143 is OK: PROCS OK: 6 processes with command name hhvm
[21:14:43] RECOVERY - SSH on mw1143 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[21:15:14] RECOVERY - nutcracker port on mw1143 is OK: TCP OK - 0.000 second response time on port 11212
[21:15:33] RECOVERY - configured eth on mw1143 is OK: OK - interfaces up
[21:15:33] RECOVERY - Check size of conntrack table on mw1143 is OK: OK: nf_conntrack is 0 % full
[21:15:53] RECOVERY - salt-minion processes on mw1143 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[21:15:54] RECOVERY - DPKG on mw1143 is OK: All packages OK
[21:16:03] RECOVERY - dhclient process on mw1143 is OK: PROCS OK: 0 processes with command name dhclient
[21:16:13] RECOVERY - Disk space on mw1143 is OK: DISK OK
[21:16:23] RECOVERY - nutcracker process on mw1143 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[21:17:43] PROBLEM - Host payments2003 is DOWN: PING CRITICAL - Packet loss = 100%
[21:17:51] PROBLEM - Host saiph is DOWN: PING CRITICAL - Packet loss = 100%
[21:19:00] PROBLEM - Host fdb2001 is DOWN: PING CRITICAL - Packet loss = 100%
[21:19:20] PROBLEM - Host pay-lvs2001 is DOWN: PING CRITICAL - Packet loss = 100%
[21:19:28] PROBLEM - Host heka is DOWN: PING CRITICAL - Packet loss = 100%
[21:20:10] PROBLEM - Router interfaces on pfw-codfw is CRITICAL: CRITICAL: host 208.80.153.195, interfaces up: 51, down: 8, dormant: 0, excluded: 0, unused: 0BRfab1.0: down - BRswfab0.0: down - BRswfab0: down - BRvlan.2133: down - Subnet frack-bastion-codfwBRvlan.2140: down - Subnet frack-management-codfwBRfab1: down - BRvlan.2137: down - Subnet frack-listenerdmz-codfwBRxe-15/0/1: down - BR
[21:20:30] PROBLEM - Host payments2001 is DOWN: PING CRITICAL - Packet loss = 100%
[21:20:39] PROBLEM - Host alnilam is DOWN: PING CRITICAL - Packet loss = 100%
[21:21:09] PROBLEM - Host rigel is DOWN: PING CRITICAL - Packet loss = 100%
[21:21:58] RECOVERY - HHVM rendering on mw1143 is OK: HTTP OK: HTTP/1.1 200 OK - 65193 bytes in 2.799 second response time
[21:23:00] RECOVERY - Apache HTTP on mw1143 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.036 second response time
[21:23:08] RECOVERY - puppet last run on mw1143 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[21:23:58] RECOVERY - Host payments2003 is UP: PING OK - Packet loss = 0%, RTA = 37.24 ms
[21:24:09] RECOVERY - Host saiph is UP: PING OK - Packet loss = 0%, RTA = 36.94 ms
[21:24:40] PROBLEM - Host pay-lvs2002 is DOWN: PING CRITICAL - Packet loss = 100%
[21:25:09] PROBLEM - Host mintaka is DOWN: PING CRITICAL - Packet loss = 100%
[21:25:18] RECOVERY - Router interfaces on pfw-codfw is OK: OK: host 208.80.153.195, interfaces up: 90, down: 0, dormant: 0, excluded: 0, unused: 0
[21:26:18] RECOVERY - Host heka is UP: PING OK - Packet loss = 0%, RTA = 38.36 ms
[21:27:10] RECOVERY - Host payments2001 is UP: PING OK - Packet loss = 0%, RTA = 38.44 ms
[21:27:27] q
[21:28:04] RECOVERY - Host rigel is UP: PING WARNING - Packet loss = 86%, RTA = 37.78 ms
[21:30:13] PROBLEM - check_puppetrun on betelgeuse is CRITICAL: CRITICAL: Puppet has 4 failures
[21:30:14] PROBLEM - check_puppetrun on alnitak is CRITICAL: CRITICAL: Puppet has 6 failures
[21:30:14] PROBLEM - check_puppetrun on saiph is CRITICAL: CRITICAL: Puppet has 1 failures
[21:30:24] RECOVERY - Host pay-lvs2002 is UP: PING OK - Packet loss = 0%, RTA = 40.13 ms
[21:30:33] RECOVERY - Host pay-lvs2001 is UP: PING OK - Packet loss = 0%, RTA = 36.89 ms
[21:30:40] RECOVERY - Host fdb2001 is UP: PING OK - Packet loss = 0%, RTA = 36.66 ms
[21:30:50] RECOVERY - Host mintaka is UP: PING OK - Packet loss = 0%, RTA = 40.24 ms
[21:31:00] RECOVERY - Host alnilam is UP: PING OK - Packet loss = 0%, RTA = 37.29 ms
[21:34:21] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[21:35:10] PROBLEM - check_puppetrun on betelgeuse is CRITICAL: CRITICAL: Puppet has 4 failures
[21:35:10] PROBLEM - check_puppetrun on rigel is CRITICAL: CRITICAL: Puppet has 4 failures
[21:35:11] PROBLEM - check_puppetrun on alnitak is CRITICAL: CRITICAL: Puppet has 6 failures
[21:35:11] PROBLEM - check_puppetrun on saiph is CRITICAL: CRITICAL: Puppet has 1 failures
[21:35:11] PROBLEM - check_puppetrun on alnilam is CRITICAL: CRITICAL: Puppet has 5 failures
[21:35:11] PROBLEM - check_puppetrun on mintaka is CRITICAL: CRITICAL: Puppet has 8 failures
[21:37:15] (03PS3) 10Ladsgroup: ores: fix workers and config [puppet] - 10https://gerrit.wikimedia.org/r/293904
[21:38:36] (03CR) 10jenkins-bot: [V: 04-1] ores: fix workers and config [puppet] - 10https://gerrit.wikimedia.org/r/293904 (owner: 10Ladsgroup)
[21:38:49] (03PS4) 10Ladsgroup: ores: fix workers and config [puppet] - 10https://gerrit.wikimedia.org/r/293904
[21:39:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[21:39:50] (03CR) 10jenkins-bot: [V: 04-1] ores: fix workers and config [puppet] - 10https://gerrit.wikimedia.org/r/293904 (owner: 10Ladsgroup)
[21:40:10] PROBLEM - check_puppetrun on rigel is CRITICAL: CRITICAL: Puppet has 4 failures
[21:40:10] PROBLEM - check_puppetrun on saiph is CRITICAL: CRITICAL: Puppet has 1 failures
[21:40:10] PROBLEM - check_puppetrun on betelgeuse is CRITICAL: CRITICAL: Puppet has 4 failures
[21:40:11] PROBLEM - check_puppetrun on mintaka is CRITICAL: CRITICAL: Puppet has 8 failures
[21:40:11] PROBLEM - check_puppetrun on alnilam is CRITICAL: CRITICAL: Puppet has 5 failures
[21:40:11] PROBLEM - check_puppetrun on alnitak is CRITICAL: CRITICAL: Puppet has 6 failures
[21:44:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[21:45:10] PROBLEM - check_puppetrun on betelgeuse is CRITICAL: CRITICAL: Puppet has 4 failures
[21:45:10] PROBLEM - check_puppetrun on rigel is CRITICAL: CRITICAL: Puppet has 4 failures
[21:45:10] PROBLEM - check_puppetrun on alnitak is CRITICAL: CRITICAL: Puppet has 6 failures
[21:45:11] PROBLEM - check_puppetrun on alnilam is CRITICAL: CRITICAL: Puppet has 5 failures
[21:45:11] PROBLEM - check_puppetrun on saiph is CRITICAL: CRITICAL: Puppet has 1 failures
[21:45:11] RECOVERY - check_puppetrun on mintaka is OK: OK: Puppet is currently enabled, last run 184 seconds ago with 0 failures
[21:46:39] (03PS5) 10Ladsgroup: ores: fix workers and config [puppet] - 10https://gerrit.wikimedia.org/r/293904
[21:49:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[21:49:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[21:50:10] PROBLEM - check_puppetrun on rigel is CRITICAL: CRITICAL: Puppet has 4 failures
[21:50:10] PROBLEM - check_puppetrun on betelgeuse is CRITICAL: CRITICAL: Puppet has 4 failures
[21:50:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[21:50:11] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[21:50:11] RECOVERY - check_puppetrun on alnitak is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[21:50:11] RECOVERY - check_puppetrun on saiph is OK: OK: Puppet is currently enabled, last run 133 seconds ago with 0 failures
[21:50:11] PROBLEM - check_puppetrun on alnilam is CRITICAL: CRITICAL: Puppet has 5 failures
[21:54:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[21:54:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[21:55:10] RECOVERY - check_puppetrun on betelgeuse is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[21:55:10] RECOVERY - check_puppetrun on rigel is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[21:55:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[21:55:11] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[21:55:11] RECOVERY - check_puppetrun on alnilam is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[21:56:21] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.48.133, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[21:57:20] PROBLEM - Restbase root url on restbase1014 is CRITICAL: Connection refused
[21:59:21] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[21:59:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[21:59:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:00:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:00:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:04:22] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:04:23] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:04:23] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:04:23] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:05:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:05:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:09:22] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:09:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:09:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:09:22] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:10:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:10:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
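
The restbase1014 alerts above are a failed HTTP probe of RESTBase's ?spec endpoint. Below is a minimal manual probe along the same lines, with the host and port taken from the alert text; this is only a sketch, not the actual monitoring plugin.

# Manually probe the RESTBase endpoint named in the restbase1014 alert.
import requests

HOST = "10.64.48.133"  # restbase1014, per the alert text
URL = f"http://{HOST}:7231/en.wikipedia.org/v1/?spec"

try:
    r = requests.get(URL, timeout=5)
    r.raise_for_status()
    print(f"OK: HTTP {r.status_code} - {len(r.content)} bytes")
except requests.RequestException as exc:
    # Connection refused / timeout shows up here, as in the alert.
    print(f"CRITICAL: {exc}")
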
[22:10:57] 06Operations, 06Commons, 10media-storage: Update rsvg on the image scalers - https://phabricator.wikimedia.org/T112421#2392592 (10Tgr) @Andrew / @fgiunchedi wanna try again with 2.40.16?
[22:14:21] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:14:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:14:22] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:14:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:15:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:15:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:18:20] RECOVERY - Restbase root url on restbase1014 is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.016 second response time
[22:19:21] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:19:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:19:22] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:19:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[22:19:31] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy
[22:20:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:20:10] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:20:11] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:24:22] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:24:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:24:22] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:24:23] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[22:25:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:25:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:25:10] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:29:22] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:29:22] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:29:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:29:23] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[22:30:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:30:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:30:10] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:34:21] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:34:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:34:22] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:34:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[22:35:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:35:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:35:10] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:39:21] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:39:21] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[22:39:21] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:39:22] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:40:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:40:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:40:10] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:44:21] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[22:44:21] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:44:22] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:44:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:45:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:45:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:45:10] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:49:21] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:49:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:49:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[22:49:22] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:50:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:50:10] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:50:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:54:21] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[22:54:21] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:54:22] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:54:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:55:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[22:55:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[22:55:10] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:56:51] PROBLEM - MegaRAID on db1009 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[22:59:21] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[22:59:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:59:22] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[22:59:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[23:00:10] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:00:10] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:00:10] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:04:18] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:04:19] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[23:04:19] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:04:19] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:05:17] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:05:17] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:05:17] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:09:19] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:09:19] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:09:19] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: puppet fail
[23:09:19] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:10:08] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:10:08] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:10:17] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:14:19] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:14:19] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:14:19] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:14:19] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:15:07] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:15:08] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:15:17] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:19:18] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:19:19] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:19:19] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:19:19] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:20:17] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:20:17] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:20:17] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:24:18] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:24:19] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:24:19] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:24:19] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:25:07] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:25:17] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:25:17] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:29:19] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:29:19] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:29:19] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:29:19] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:30:08] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:30:17] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:30:17] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:34:23] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:34:23] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:34:23] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:34:23] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:35:11] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:35:11] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:35:11] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:39:22] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:39:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:39:23] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:39:23] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:40:11] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:40:11] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:40:12] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:44:22] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:44:23] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:44:23] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:44:23] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:45:11] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:45:11] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:45:12] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:49:22] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:49:22] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:49:23] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:49:23] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:50:11] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:50:11] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:50:12] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:54:22] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:54:23] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:54:23] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:54:23] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:55:11] PROBLEM - check_puppetrun on pay-lvs2001 is CRITICAL: CRITICAL: puppet fail
[23:55:11] PROBLEM - check_puppetrun on pay-lvs2002 is CRITICAL: CRITICAL: puppet fail
[23:55:11] PROBLEM - check_mysql on fdb2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:59:23] PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[23:59:23] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:59:23] PROBLEM - check_puppetrun on payments2003 is CRITICAL: CRITICAL: Puppet has 20 failures
[23:59:23] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 20 failures
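
The recurring check_mysql output above ("Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)") corresponds to SHOW SLAVE STATUS reporting an IO thread that is still trying to connect to its master, so replication lag cannot be computed. Below is a sketch of how such output can be derived, assuming a MySQL account with REPLICATION CLIENT privileges and the pymysql library; it is illustrative only, not the actual Nagios plugin used for these alerts.

# Interpret SHOW SLAVE STATUS the way the check_mysql alerts above do:
# IO thread "Connecting" plus a NULL Seconds_Behind_Master means the replica
# cannot reach its master, so lag is unknown.
import pymysql

def slave_status(host, user, password):
    conn = pymysql.connect(host=host, user=user, password=password,
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            row = cur.fetchone() or {}
    finally:
        conn.close()

    io = row.get("Slave_IO_Running", "No")
    sql = row.get("Slave_SQL_Running", "No")
    lag = row.get("Seconds_Behind_Master")
    if io != "Yes" or lag is None:
        print(f"CRITICAL: Slave IO: {io} Slave SQL: {sql} "
              f"Seconds Behind Master: {lag if lag is not None else '(null)'}")
        return 2
    print(f"OK: replication running, {lag}s behind master")
    return 0
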