[00:01:10] (CR) Tim Landscheidt: "a) Tested on a Trusty instance in Toolsbeta." [puppet] - https://gerrit.wikimedia.org/r/181789 (owner: Tim Landscheidt)
[00:04:25] (PS2) Yuvipanda: Fix motd on Trusty instances [puppet] - https://gerrit.wikimedia.org/r/181789 (owner: Tim Landscheidt)
[00:07:48] (CR) Yuvipanda: [C: 1] "Welcome back :)" [puppet] - https://gerrit.wikimedia.org/r/181789 (owner: Tim Landscheidt)
[00:12:35] YuviPanda: don't merge that just yet
[00:12:56] paravoid: yeah, not doing that. don’t want to mess with ssh at this time
[00:12:58] I want to review this because I wanted to create a larger patchset for jessie as well
[00:13:07] ah, right
[00:54:25] PROBLEM - Disk space on analytics1015 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/a 72343 MB (3% inode=99%):
[01:01:56] PROBLEM - Disk space on analytics1015 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/a 72005 MB (3% inode=99%):
[01:09:59] PROBLEM - Host mw1230 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[01:10:16] RECOVERY - Host mw1230 is UP: PING OK - Packet loss = 0%, RTA = 1.25 ms
[03:21:58] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:35:20] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:17:27] PROBLEM - Disk space on analytics1034 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/b 149288 MB (3% inode=99%):
[04:21:18] PROBLEM - Disk space on analytics1011 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/a 87005 MB (4% inode=99%): /var/lib/hadoop/data/c 83733 MB (4% inode=99%): /var/lib/hadoop/data/e 84445 MB (4% inode=99%): /var/lib/hadoop/data/g 84753 MB (4% inode=99%): /var/lib/hadoop/data/i 82402 MB (4% inode=99%): /var/lib/hadoop/data/k 74682 MB (3% inode=99%):
[05:01:38] RECOVERY - Disk space on analytics1034 is OK: DISK OK
[05:02:09] (PS2) KartikMistry: Fix trailing spaces [puppet] - https://gerrit.wikimedia.org/r/181549
[05:02:31] (PS4) KartikMistry: Fixed spacing and alignment [puppet] - https://gerrit.wikimedia.org/r/181404
[06:34:13] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:34:32] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:34:42] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:35:56] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:03] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:36:58] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:46:24] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:24] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:47:25] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:18] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:48:25] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:48:35] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:54:35] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 565.284413104
[07:28:30] PROBLEM - Disk space on analytics1033 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/b 146228 MB (3% inode=99%):
[08:07:57] PROBLEM - Disk space on analytics1020 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/i 66360 MB (3% inode=99%):
[08:22:45] RECOVERY - Disk space on analytics1033 is OK: DISK OK
[08:33:36] RECOVERY - Disk space on analytics1015 is OK: DISK OK
[08:35:06] RECOVERY - Disk space on analytics1020 is OK: DISK OK
[08:36:05] RECOVERY - Disk space on analytics1014 is OK: DISK OK
[08:36:11] RECOVERY - Disk space on analytics1011 is OK: DISK OK
[08:36:45] RECOVERY - Disk space on analytics1019 is OK: DISK OK
[09:46:25] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[09:49:55] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[09:49:55] PROBLEM - SSH on protactinium is CRITICAL: Server answer:
[09:52:05] PROBLEM - SSH on lvs1006 is CRITICAL: Server answer:
[09:53:15] RECOVERY - SSH on protactinium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[09:55:24] RECOVERY - SSH on lvs1006 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[10:06:12] hmm
[10:06:20] I suppose a lot of these are just neon being sick
[10:27:37] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 0.0003913123989
[11:02:25] goodnight
[11:02:27] PROBLEM - Host 2620:0:861:2:208:80:154:157 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[11:05:16] RECOVERY - Host 2620:0:861:2:208:80:154:157 is UP: PING OK - Packet loss = 0%, RTA = 1.91 ms
[13:29:38] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: puppet fail
[13:43:17] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:22:48] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: puppet fail
[14:36:21] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[16:41:22] (PS1) Nemo bis: Remove "Gill Sans" from error page: not rendered for most users [puppet] - https://gerrit.wikimedia.org/r/181868
[16:43:58] (PS1) Glaisher: Apply workaround for chrome font bug in 404 error page [mediawiki-config] - https://gerrit.wikimedia.org/r/181869
[16:45:49] err
[16:46:31] (CR) Glaisher: "This bug is not present on this error page, just the 404: https://gerrit.wikimedia.org/r/#/c/181869/" [puppet] - https://gerrit.wikimedia.org/r/181868 (owner: Nemo bis)
[16:47:32] (PS2) Glaisher: Apply workaround for chrome font bug in 404 error page [mediawiki-config] - https://gerrit.wikimedia.org/r/181869
[16:54:39] (Abandoned) Nemo bis: Remove "Gill Sans" from error page: not rendered for most users [puppet] - https://gerrit.wikimedia.org/r/181868 (owner: Nemo bis)
[18:05:46] (PS1) Tpt: Display links to Wikidata in the other project sidebar [mediawiki-config] - https://gerrit.wikimedia.org/r/181871
[18:10:56] (CR) Hoo man: [C: 1] "Looks like we missed this earlier on..." [mediawiki-config] - https://gerrit.wikimedia.org/r/181871 (owner: Tpt)
[18:54:41] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Puppet has 1 failures
[19:08:20] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:05:52] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 6.97000500876e-21
[20:13:15] PROBLEM - Host elastic1012 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[20:13:23] RECOVERY - Host elastic1012 is UP: PING OK - Packet loss = 0%, RTA = 2.73 ms
[22:00:23] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[22:01:52] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: Puppet has 2 failures
[22:07:12] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[22:14:55] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:01:13] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 5.87795007853e-26