[00:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 00:02:37 UTC 2013 [00:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [00:17:54] Ryan_Lane: see the logo on this page: http://en.wikipedia.org/wiki/Girl_Guides_Association_of_the_United_Arab_Emirates [00:18:03] Ryan_Lane: if you click it, the logo claims it is not in use on any pages [00:18:20] ok? [00:18:37] this seems like something that should be a bugzilla bug, rather than an ops issue [00:18:48] I'm lazy, now you know [00:19:02] and I'm lazy too, so now no one else knows [00:19:23] I honestly don't know what it's supposed to do, so I'm not going to enter a bug [00:19:37] well I'm going to fix it with a null edit now [00:19:41] but there may be some issue there [00:19:45] enter a bug [00:21:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [00:24:56] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 00:24:50 UTC 2013 [00:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [00:27:47] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 00:27:41 UTC 2013 [00:28:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [00:29:05] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 00:28:56 UTC 2013 [00:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [00:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 00:32:40 UTC 2013 [00:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [00:36:57] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [00:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.145 second response time [00:54:45] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 00:54:41 UTC 2013 [00:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [00:57:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 00:57:38 UTC 2013 [00:58:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [00:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 00:58:45 UTC 2013 [00:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [01:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 01:02:43 UTC 2013 [01:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [01:04:14] (PS2) Ryan Lane: Use grains for deployment targets [operations/puppet] - https://gerrit.wikimedia.org/r/74108 [01:04:51] (CR) jenkins-bot: [V: -1] Use grains for deployment targets [operations/puppet] - https://gerrit.wikimedia.org/r/74108 (owner: Ryan Lane) [01:07:14] (PS3) Ryan Lane: Use grains for deployment targets [operations/puppet] - https://gerrit.wikimedia.org/r/74108 [01:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL 
- Socket timeout after 10 seconds [01:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [01:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 01:24:45 UTC 2013 [01:24:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [01:28:15] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 01:28:05 UTC 2013 [01:28:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [01:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 01:28:45 UTC 2013 [01:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [01:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 01:32:44 UTC 2013 [01:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [01:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 01:54:46 UTC 2013 [01:55:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [01:58:15] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 01:58:02 UTC 2013 [01:58:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [01:58:46] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 01:58:43 UTC 2013 [01:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [02:02:22] (PS2) Ottomata: Fixing automated hue SSL generation and permissions [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/74686 [02:02:36] (PS4) Ryan Lane: Use grains for deployment targets [operations/puppet] - https://gerrit.wikimedia.org/r/74108 [02:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 02:02:42 UTC 2013 [02:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [02:04:30] (CR) Ottomata: "Ergh, had to do some hacky puppet things to make that happen. Check it out." 
[operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/74686 (owner: Ottomata) [02:06:01] !log LocalisationUpdate completed (1.22wmf10) at Mon Jul 22 02:06:01 UTC 2013 [02:06:12] Logged the message, Master [02:10:17] !log LocalisationUpdate completed (1.22wmf11) at Mon Jul 22 02:10:16 UTC 2013 [02:10:27] Logged the message, Master [02:17:56] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 22 02:17:55 UTC 2013 [02:18:06] Logged the message, Master [02:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [02:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 02:24:51 UTC 2013 [02:25:58] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [02:29:02] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 02:28:56 UTC 2013 [02:29:12] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 02:29:02 UTC 2013 [02:29:25] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [02:29:25] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [02:29:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:33:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.155 second response time [02:34:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 02:34:43 UTC 2013 [02:35:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [02:55:02] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 02:55:00 UTC 2013 [02:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [02:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 02:57:37 UTC 2013 [02:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [02:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 02:58:49 UTC 2013 [02:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [03:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 03:02:39 UTC 2013 [03:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [03:25:02] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 03:24:53 UTC 2013 [03:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [03:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 03:28:46 UTC 2013 [03:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [03:30:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 03:30:34 UTC 2013 [03:31:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [03:32:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 03:32:40 UTC 2013 [03:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [03:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 03:54:45 UTC 2013 
[03:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [03:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 03:58:49 UTC 2013 [03:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [03:59:52] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 03:59:50 UTC 2013 [04:00:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [04:03:02] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 04:02:59 UTC 2013 [04:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [04:25:02] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 04:24:53 UTC 2013 [04:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [04:27:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 04:27:36 UTC 2013 [04:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [04:29:02] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 04:28:52 UTC 2013 [04:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [04:32:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 04:32:46 UTC 2013 [04:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [04:39:22] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours [04:46:42] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:47:42] RECOVERY - Disk space on labstore3 is OK: DISK OK [04:49:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:50:33] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [04:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 04:54:48 UTC 2013 [04:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [04:58:22] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 04:58:16 UTC 2013 [04:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [04:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 04:58:46 UTC 2013 [04:59:23] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [05:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 05:02:41 UTC 2013 [05:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [05:05:22] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: No successful Puppet run in the last 10 hours [05:09:22] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: No successful Puppet run in the last 10 hours [05:10:22] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: No successful Puppet run in the last 10 hours [05:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 05:24:49 UTC 2013 [05:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [05:28:12] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 05:28:11 UTC 2013 [05:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [05:29:02] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 05:28:57 UTC 2013 [05:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [05:32:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 05:32:47 UTC 2013 [05:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [05:39:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:40:32] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [05:45:22] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [05:45:22] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [05:45:22] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [05:45:23] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [05:45:23] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [05:45:23] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [05:45:23] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [05:50:22] PROBLEM - Puppet freshness on ms-fe1002 is CRITICAL: No successful Puppet run in the last 10 hours [05:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 05:54:48 UTC 2013 [05:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [05:56:22] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: No successful Puppet run in the last 10 hours [05:58:12] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 05:58:07 UTC 2013 [05:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [05:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 05:58:48 UTC 2013 [05:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [06:01:22] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: No successful Puppet run in the last 10 hours [06:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 06:02:39 UTC 2013 [06:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [06:17:22] PROBLEM - Puppet freshness on ms-fe1001 is CRITICAL: No successful Puppet run in the last 10 hours [06:18:22] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 10 hours [06:19:39] (PS4) Faidon: (power)dns: support multiple listen addresses [operations/puppet] - https://gerrit.wikimedia.org/r/74615 [06:20:23] (CR) Faidon: [C: 2] (power)dns: support multiple listen addresses [operations/puppet] - https://gerrit.wikimedia.org/r/74615 (owner: Faidon) [06:20:24] (Merged) Faidon: (power)dns: support multiple listen addresses [operations/puppet] - https://gerrit.wikimedia.org/r/74615 (owner: Faidon) [06:23:05] grr puppet broken [06:26:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 06:26:45 UTC 2013 [06:27:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [06:27:52] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 06:27:42 UTC 2013 [06:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [06:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 06:28:44 UTC 2013 [06:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [06:31:19] ffs [06:32:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 06:32:39 UTC 2013 [06:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [06:53:16] (PS1) Faidon: 
Workaround fallout from sysctlfile [operations/puppet] - https://gerrit.wikimedia.org/r/75065 [06:54:45] (CR) Faidon: [C: 2] "Ihatemyself" [operations/puppet] - https://gerrit.wikimedia.org/r/75065 (owner: Faidon) [06:54:46] (Merged) Faidon: Workaround fallout from sysctlfile [operations/puppet] - https://gerrit.wikimedia.org/r/75065 (owner: Faidon) [06:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 06:54:46 UTC 2013 [06:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [06:58:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:58:52] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 06:58:45 UTC 2013 [06:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 06:58:50 UTC 2013 [06:59:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [06:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [06:59:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [07:01:02] RECOVERY - Puppet freshness on dobson is OK: puppet ran at Mon Jul 22 07:00:58 UTC 2013 [07:02:52] (PS1) Faidon: Undecom cp104[1234] [operations/puppet] - https://gerrit.wikimedia.org/r/75067 [07:02:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 07:02:44 UTC 2013 [07:03:13] (CR) Faidon: [C: 2] Undecom cp104[1234] [operations/puppet] - https://gerrit.wikimedia.org/r/75067 (owner: Faidon) [07:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [07:03:30] (Merged) Faidon: Undecom cp104[1234] [operations/puppet] - https://gerrit.wikimedia.org/r/75067 (owner: Faidon) [07:17:54] !log restarting pybal and manually ipvsadm removing dns_auth services from lvs1/lvs5/lvs1002/lvs1005 [07:18:05] Logged the message, Master [07:19:51] (PS1) Jalexander: Make FlaggedRev rights available to global groups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75070 [07:22:02] RECOVERY - Puppet freshness on mchenry is OK: puppet ran at Mon Jul 22 07:21:55 UTC 2013 [07:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 07:24:47 UTC 2013 [07:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [07:28:58] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 07:28:50 UTC 2013 [07:28:58] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 07:28:50 UTC 2013 [07:29:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [07:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [07:31:18] (PS1) Faidon: Add new ns0/ns1 service IPs to dobson & linne [operations/puppet] - https://gerrit.wikimedia.org/r/75071 [07:32:40] morning [07:32:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:32:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 07:32:41 UTC 2013 [07:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [07:33:33] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [07:34:05] good morning [07:38:37] (CR) Faidon: [C: 2] Add new ns0/ns1 service IPs to dobson & linne [operations/puppet] - https://gerrit.wikimedia.org/r/75071 (owner: Faidon) [07:38:38] (Merged) Faidon: Add new ns0/ns1 service IPs to dobson & linne [operations/puppet] - https://gerrit.wikimedia.org/r/75071 (owner: Faidon) [07:42:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:43:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [07:44:59] (PS2) Hashar: set some paths to use $wmfHostnames['bits'] [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/71774 [07:46:17] paravoid: i should have caught that in the sysctlfile module [07:46:38] the fact that init.pp is a resource and not a class is also a bit wtf. [07:48:55] (CR) Addshore: [C: 1] Move property-create for * to after loading of Wikibase [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/74620 (owner: Aude) [07:49:10] the whole thing is pretty crazy [07:49:27] base.pp defining a file with source => module/sysctlfile/... [07:49:31] etc. [07:49:38] for some definition of module [07:53:52] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:54:42] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [07:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 07:54:44 UTC 2013 [07:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [07:56:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:57:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [07:57:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
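A rough sketch of the layout criticized above (base.pp declaring a file whose source reaches into the sysctlfile module) next to the usual module-encapsulated form; the class name, file name and mode here are illustrative guesses, not the actual contents of base.pp or the sysctlfile module:

    # Criticized pattern: a site-wide manifest (e.g. base.pp) owns the resource
    # but pulls its payload out of another module's files/ directory.
    file { '/etc/sysctl.d/60-wikimedia-base.conf':
        ensure => present,
        owner  => 'root',
        group  => 'root',
        mode   => '0444',
        source => 'puppet:///modules/sysctlfile/60-wikimedia-base.conf',
    }

    # Conventional alternative: the module declares its own resources and
    # callers simply include the class.
    class sysctlfile::defaults {
        file { '/etc/sysctl.d/60-wikimedia-base.conf':
            ensure => present,
            owner  => 'root',
            group  => 'root',
            mode   => '0444',
            source => 'puppet:///modules/sysctlfile/60-wikimedia-base.conf',
        }
    }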
[07:57:52] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 07:57:42 UTC 2013 [07:57:52] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [07:58:42] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [07:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 07:58:48 UTC 2013 [07:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [08:00:32] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [08:01:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:02:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [08:03:22] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 08:03:12 UTC 2013 [08:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:32] PROBLEM - search indices - check lucene status page on search1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 189 bytes in 0.002 second response time [08:08:33] I have not received any bugmail from bugzilla.wikimedia.org since 22:49UTC. Is some mail infrastructure down? [08:09:48] (as I do see changes in Bugzilla after 22:49 when I query it) [08:10:59] (PS1) Hashar: fix system_role for role::protoproxy::ssl::beta [operations/puppet] - https://gerrit.wikimedia.org/r/75074 [08:11:45] (PS1) Eloquence: Disable "Mark as helpful" extension on English Wikipedia. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75075 [08:12:10] andre__: I have no clue [08:13:21] I cannot trigger a new bugmail either by commenting right now. Checked on gmail.com so it's not my local MUA. So I expect something is broken :-/ [08:13:44] I can't lookup the mail queue on bugzilla server :( [08:13:49] pity :) [08:14:08] I wonder if apergos could. Or anybody else in European timezones [08:14:21] any root could :) [08:14:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:14:45] could ? [08:14:47] yeah, but they need to be awake :P [08:14:53] bugzilla is not sending emails [08:15:01] apergos, see backlog here [08:15:01] oh [08:15:03] apergos: so maybe kaulen.wikimedia.org has some troubles to send emails [08:15:54] ah I managed to send myself an email :-] [08:15:57] using 'mail' command [08:16:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [08:17:10] I see an aklapper message having been processed in the log [08:17:43] 2013-07-22 08:16:09 (utc) [08:17:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:18:22] Uhm. Maybe the problem is with GMail then [08:18:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [08:20:05] ...which is also unlikely. 
Guess I should reboot my machine, though that still wouldn't explain why gmail.com in my browser does not show any bugmail either [08:20:26] well it would be easy to check if it's gmail or not [08:20:29] (I don't have gmail) [08:20:55] plus I can receive other "normal" email perfectly via my work account (which is gmail) [08:21:05] like mailing lists or private mail [08:23:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:24:27] * apergos looks irritatedly at the labstore3 alert [08:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 08:24:46 UTC 2013 [08:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [08:26:22] 2013-07-22 08:21:38 1V1BNC-0001aR-1f => aklapper@wikimedia.org R=smart_route T=remote_smtp S=2744 H=mchenry.wikimedia.org [2620:0:860:2:219:b9ff:fedd:c027] C="250 OK id=1V1BNC-0008Mh-5z" DT=0s [08:26:27] 2013-07-22 08:21:38 1V1BNC-0008Mh-5z => aklapper@wikimedia.org R=ldap_account T=remote_smtp S=3017 H=aspmx.l.google.com [2607:f8b0:400d:c02::1a] C="250 2.0.0 OK 1374481298 q4si10939151qag.112 - gsmtp" DT=0s [08:26:46] delivered to google [08:27:03] check your spam folder [08:27:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 08:27:38 UTC 2013 [08:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [08:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 08:28:45 UTC 2013 [08:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [08:29:58] RECOVERY - Disk space on labstore3 is OK: DISK OK [08:32:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 08:32:44 UTC 2013 [08:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [08:33:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:35:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [08:37:16] damn, it's really just my work account. I do receive bugmail for my testing account. [08:37:24] * andre__ totally puzzled [08:37:55] my work account is still set as globalwatcher in Bugzilla, and my email preferences are as usual [08:38:18] oh fuck. GMail spam folder. [08:38:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:38:43] that's the solution. apergos: sorry for the noise, got all bugmail in my gmail spam folder, but no idea yet why [08:39:00] * andre__ grumbles [08:39:00] okey dokey [08:39:27] 11:27 < paravoid> check your spam folder [08:39:29] :) [08:39:42] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [08:41:02] still wondering what happened. Gmail, the usual mystery. [08:42:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:43:32] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [08:51:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:54:32] PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:54:42] PROBLEM - DPKG on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 08:54:45 UTC 2013 [08:54:52] PROBLEM - SSH on labstore3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:55:23] RECOVERY - RAID on labstore3 is OK: OK: State is Optimal, checked 1 logical device(s) [08:55:32] RECOVERY - DPKG on labstore3 is OK: All packages OK [08:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [08:55:42] RECOVERY - SSH on labstore3 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [08:56:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [08:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 08:57:37 UTC 2013 [08:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [08:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 08:58:47 UTC 2013 [08:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [09:00:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:01:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time [09:04:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 09:04:51 UTC 2013 [09:05:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [09:14:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:15:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [09:22:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 09:24:46 UTC 2013 [09:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [09:25:47] !log adding new ns0/ns1 service ip static routes to dobson/linne on cr1-sdtpa/cr2-pmtpa [09:25:57] Logged the message, Master [09:27:43] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 09:27:38 UTC 2013 [09:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [09:28:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [09:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 09:28:50 UTC 2013 [09:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [09:32:12] (PS1) Hashar: beta: set $wg.*Server for loginwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75080 [09:32:40] (CR) Hashar: [C: 2] beta: set $wg.*Server for loginwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75080 (owner: Hashar) [09:32:43] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 09:32:38 UTC 2013 [09:32:48] (Merged) jenkins-bot: beta: set $wg.*Server for loginwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75080 (owner: Hashar) [09:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [09:49:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[09:51:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [09:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 09:54:49 UTC 2013 [09:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [09:56:32] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: HTTP CRITICAL - No data received from host [09:57:32] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 0.002 second response time [09:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 09:57:41 UTC 2013 [09:57:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [09:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 09:58:46 UTC 2013 [09:58:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [09:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [10:02:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 10:02:45 UTC 2013 [10:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [10:07:42] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: HTTP CRITICAL - No data received from host [10:07:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:08:32] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 0.005 second response time [10:20:45] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: HTTP CRITICAL - No data received from host [10:20:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [10:21:33] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 2.134 second response time [10:24:37] (PS1) Hashar: varnish: backends trust 127.0.0.1 for XFF [operations/puppet] - https://gerrit.wikimedia.org/r/75085 [10:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 10:24:44 UTC 2013 [10:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [10:27:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:28:02] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 10:27:55 UTC 2013 [10:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [10:28:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [10:28:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[10:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 10:28:51 UTC 2013 [10:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [10:29:42] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: HTTP CRITICAL - No data received from host [10:30:32] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 3.708 second response time [10:30:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [10:31:55] i am tired [10:32:16] the nginx/varnish/mediawiki X-Forwarded-Proto stuff is giving me headaches [10:32:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 10:32:35 UTC 2013 [10:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [10:33:27] (CR) Hashar: "I did it manually on the instance that does not fix the issue :-(" [operations/puppet] - https://gerrit.wikimedia.org/r/75085 (owner: Hashar) [10:35:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:36:32] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [10:36:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:37:23] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [10:39:52] RECOVERY - Disk space on labstore3 is OK: DISK OK [10:44:05] (PS2) Hashar: varnish: backends trust 127.0.0.1 for XFF [operations/puppet] - https://gerrit.wikimedia.org/r/75085 [10:45:04] (CR) Hashar: "I have edited the accesslist on deployment-cache-text1.pmtpa.wmflabs to include 127.0.0.0/8, that let us access https://login.wikimedia.be" [operations/puppet] - https://gerrit.wikimedia.org/r/75085 (owner: Hashar) [10:45:33] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: HTTP CRITICAL - No data received from host [10:46:33] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 0.003 second response time [10:50:42] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: HTTP CRITICAL - No data received from host [10:50:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:51:36] paravoid: https://gerrit.wikimedia.org/r/#/c/75087/ [10:51:42] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 8.762 second response time [10:51:49] i may have gotten slightly carried away [10:52:22] (PS1) Ori.livneh: Refactor sysctl [operations/puppet] - https://gerrit.wikimedia.org/r/75087 [10:52:37] even grrrit-wm couldn't handle it [10:52:58] ori-l: I blame toollabs [10:53:12] which seems down atm [10:54:03] i'd like to believe it was overeager to review my changeset [10:54:10] but to each his own theory, you know. [10:54:20] hmm, grrrit-wm should be made to review changes. 
[10:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 10:54:48 UTC 2013
[10:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[10:56:36] ori-l: oh wow
[10:56:42] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: HTTP CRITICAL - No data received from host
[10:56:49] actually I have a completely different take tbh
[10:57:00] first of all role::sysctl sounds wrong
[10:57:19] setting a sysctl value is not a role
[10:57:32] yeah, i thought about that, probably true
[10:57:38] I would actually put sysctl calls inside role classes
[10:57:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:57:52] and get rid of all those "advanced-routing" files or whatever
[10:57:52] RECOVERY - Disk space on labstore3 is OK: DISK OK
[10:58:52] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 10:58:46 UTC 2013
[10:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 10:58:51 UTC 2013
[10:58:56] just inline them, i.e. sysctl { 'net.ipv6.conf.all.accept_ra': value => '0' }
[10:59:10] well, in the interest of not making a complicated change more complicated, i just reproduced the pattern that already existed in each manifest
[10:59:17] i did that for ceph, for example, since that's what you had in place
[10:59:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours
[10:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[10:59:42] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 8.464 second response time
[10:59:51] right
[11:00:04] but I think the issues are more fundamental
[11:00:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[11:00:40] andrew put those options in the sysctlfile module, which isn't pretty; you put them in role classes, which is wrong
[11:00:58] !log restarting Jenkins, some threads are deadlocked ( see {{bug|51802}} )
[11:01:01] so maybe, *maybe* the answer is that these belong in the individual role classes
[11:01:07] Logged the message, Master
[11:01:21] i think i agree, but if you split it into two change sets the job of reviewing it is easier, since the diff now more or less maintains a 1:1 line mapping
[11:01:45] btw, sysctlfile was merged on friday or so
[11:02:00] funny how it's being reworked twice in two business days :)
[11:02:26] maybe that's why I'm reluctant on another incremental change
[11:02:34] but maybe you're right too
[11:03:12] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 11:03:05 UTC 2013
[11:03:16] i don't mind going all the way, it just seems more likely that a mistake will slip through and cause headaches
[11:03:21] hm
[11:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours
[11:03:33] what about https://gerrit.wikimedia.org/r/#/c/75087/1/modules/sysctl/files/procps-puppet.conf & the recursive dir management?
[11:03:34] I also don't like the puppet-managed thingy too much
[11:03:37] heh
[11:03:38] heh
[11:04:03] let's just recurse/purge => true /etc/sysctl.d?
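A minimal sketch of the inline pattern suggested above ("just inline them"); the define name sysctl::parameter, the 60- filename prefix and the reload exec are assumptions for illustration, not the contents of the change under review:

    # Hypothetical sysctl definition, usable directly from role classes.
    # Each parameter gets its own file under /etc/sysctl.d, so procps re-applies
    # the value at boot, and a refreshonly exec applies it immediately on change.
    define sysctl::parameter($value, $ensure = 'present') {
        $sanitized = regsubst($title, '[^0-9A-Za-z._-]', '_', 'G')

        file { "/etc/sysctl.d/60-${sanitized}.conf":
            ensure  => $ensure,
            owner   => 'root',
            group   => 'root',
            mode    => '0444',
            content => "${title} = ${value}\n",
            notify  => Exec["apply-sysctl-${title}"],
        }

        exec { "apply-sysctl-${title}":
            command     => "/sbin/sysctl -w ${title}=${value}",
            refreshonly => true,
        }
    }

    # Inline use from a role class, per the suggestion:
    # sysctl::parameter { 'net.ipv6.conf.all.accept_ra': value => '0' }

One file per parameter keeps puppet's diffs small and lets an individual value be removed with ensure => absent without touching the rest of /etc/sysctl.d.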
[11:04:22] I mean we don't really do non-puppetized configs for such things
[11:04:26] but ubuntu ships with some conf files there
[11:05:09] and iirc sysctl values aren't sticky, so it's relying on the files being continuously there and the procps job re-setting them on boot
[11:05:11] that's correct
[11:05:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:06:15] This directory contains settings similar to those found in /etc/sysctl.conf.
[11:06:15] what if an ubuntu / debian package update add a file to that directory? the next puppet run would purge it and trigger a refresh
[11:06:18] In general, files in the 10-*.conf range come from the procps package and
[11:06:21] serve as system defaults. Other packages install their files in the
[11:06:24] 30-*.conf range, to override system defaults. End-users can use 60-*.conf
[11:06:27] and above, or use /etc/sysctl.conf directly, which overrides anything in
[11:06:30] this directory.
[11:06:32] that's an ubuntu-ism
[11:06:36] I don't think Debian does that
[11:07:02] nope, it doesn't
[11:07:27] (CR) jenkins-bot: [V: -1] varnish: backends trust 127.0.0.1 for XFF [operations/puppet] - https://gerrit.wikimedia.org/r/75085 (owner: Hashar)
[11:08:07] I wonder how can we purge => true only 60-* :)
[11:09:02] with execs, but that's a bit gross
[11:09:50] with execs how?
[11:09:56] and i don't think you'd be able to reproduce the desired effect of triggering a single refresh for multiple updates after all of them have completed
[11:10:21] (CR) jenkins-bot: [V: -1] Refactor sysctl [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh)
[11:10:23] yeah, scratch that, wouldn't work.
[11:10:42] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:11:47] oh puppet...
[11:11:51] even the simplest things... :)
[11:11:53] i liked it because it allows you to have the ancillary service that runs on stopping procps
[11:12:09] which tolerates there not being a procps service
[11:12:39] and means that our puppet sysctl settings always run after any defaults
[11:12:43] not having a procps service is really <= 8.04
[11:12:51] which is EOLed now
[11:13:18] but this could be mitigated even with puppet
[11:13:44] make the resources virtual, have a class sysctl that realizes them and include that conditional todistribution in base.pp
[11:14:31] i went with https://dpaste.de/1Jk85/raw/
[11:14:37] that was my first attempt at the problem
[11:15:02] i.e. define sysctl::param(...) { @file { ..., tag => 'sysctl' } }; class sysctl { File <| tag == 'sysctl' |> }
[11:15:42] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 6.261 second response time
[11:15:51] that's nicer and cleaner than what i pasted above, but it isn't worth adding these kinds of abstractions to work around an edge case introduced by a platform that is moribund
[11:15:52] RECOVERY - Disk space on labstore3 is OK: DISK OK
[11:16:15] true
[11:16:58] note that it breaks with 8.04 anyway
[11:17:02] sysctl.d doesn't exist
[11:17:12] i ensure => directory'd it
[11:17:14] even if you shortcut procps to /bin/true (as I did)
[11:17:16] ah
[11:18:16] hm, importing ubuntu's sysctl defaults into our module and exclusively managing /etc/sysctl.d sounds wrong, doesn't it?
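Expanded a little, the virtual-resource variant quoted above at 11:15:02 could look roughly like this; everything beyond that one-liner (file names, mode, the directory resource) is an assumption added for illustration:

    # Each parameter declares a virtual, tagged file; nothing is managed until
    # a single collection point realizes it.
    define sysctl::param($value) {
        @file { "/etc/sysctl.d/60-${name}.conf":
            ensure  => present,
            owner   => 'root',
            group   => 'root',
            mode    => '0444',
            content => "${name} = ${value}\n",
            tag     => 'sysctl',
        }
    }

    # The collector lives in one place (e.g. included from base.pp), so any
    # distribution conditional (such as skipping releases without /etc/sysctl.d,
    # like 8.04) is applied once rather than by every caller.
    class sysctl {
        file { '/etc/sysctl.d':
            ensure => directory,
        }
        File <| tag == 'sysctl' |>
    }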
[11:18:20] just thinking loud here :) [11:18:32] there's also random sysctl that packages may ship too [11:18:52] PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:19:01] !log jenkins restarted [11:19:05] which is, in Debian: [11:19:06] corekeeper: /etc/sysctl.d/corekeeper.conf [11:19:06] postgresql-common: /etc/sysctl.d/30-postgresql-shm.conf [11:19:06] procps: /etc/sysctl.d/README.sysctl [11:19:06] tracker-miner-fs: /etc/sysctl.d/30-tracker.conf [11:19:08] uhd-host: /etc/sysctl.d/uhd-usrp2.conf [11:19:11] Logged the message, Master [11:19:18] i was just going to search for that [11:19:21] how did you generate that list? [11:20:04] apt-get install apt-file; apt-file update; apt-file search /etc/sysctl.d [11:21:26] !log Jenkins: deleting -merge jobs ({{bug|51395}} sequentially to avoid deadlocking jenkins {{bug|51802}} [11:21:28] probably OK then [11:21:36] Logged the message, Master [11:22:23] i can't imagine a package absolutely *depends* on the sysctl value being set to some non-default value; i figure those are optimizations for the workload the software expects to impose. [11:22:58] and they're very rare anyhow [11:23:32] i need to be up and in a meeting in four hours [11:24:06] * ori-l waves [11:24:20] ouch [11:24:21] bye! [11:24:50] (PS3) Hashar: varnish: backends trust 127.0.0.1 for XFF [operations/puppet] - https://gerrit.wikimedia.org/r/75085 [11:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 11:24:42 UTC 2013 [11:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [11:28:12] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 11:28:10 UTC 2013 [11:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [11:28:42] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 11:28:51 UTC 2013 [11:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [11:30:42] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 8.927 second response time [11:32:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 11:32:34 UTC 2013 [11:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [11:40:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:41:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.122 second response time [11:46:42] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: HTTP CRITICAL - No data received from host [11:47:42] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 5.625 second response time [11:52:42] PROBLEM - Varnish HTTP mobile-frontend on cp1046 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 11:54:47 UTC 2013 [11:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [11:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 11:57:40 UTC 2013 [11:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [11:58:52] RECOVERY - Puppet 
freshness on cp1041 is OK: puppet ran at Mon Jul 22 11:58:48 UTC 2013 [11:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [12:00:44] PROBLEM - Varnish HTTP mobile-frontend on cp1059 is CRITICAL: HTTP CRITICAL - No data received from host [12:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 12:02:38 UTC 2013 [12:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [12:03:32] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL - No data received from host [12:03:42] RECOVERY - Varnish HTTP mobile-frontend on cp1059 is OK: HTTP OK: HTTP/1.1 200 OK - 261 bytes in 7.857 second response time [12:03:52] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:04:33] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 22068 bytes in 0.014 second response time [12:04:42] PROBLEM - Varnish HTTP mobile-frontend on cp1047 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:04:52] PROBLEM - LVS HTTPS IPv4 on mobile-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:05:52] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 22110 bytes in 9.006 second response time [12:06:32] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: Connection timed out [12:06:42] PROBLEM - Varnish HTTP mobile-frontend on cp1059 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:06:43] PROBLEM - Varnish HTTP mobile-frontend on cp1060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:06:52] PROBLEM - LVS HTTP IPv4 on mobile-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:07:01] hmm [12:07:32] RECOVERY - LVS HTTP IPv4 on m.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 22068 bytes in 9.012 second response time [12:07:42] RECOVERY - Varnish HTTP mobile-frontend on cp1060 is OK: HTTP OK: HTTP/1.1 200 OK - 261 bytes in 9.136 second response time [12:07:52] RECOVERY - LVS HTTPS IPv4 on mobile-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 22110 bytes in 9.263 second response time [12:08:42] RECOVERY - Varnish HTTP mobile-frontend on cp1059 is OK: HTTP OK: HTTP/1.1 200 OK - 261 bytes in 9.390 second response time [12:09:52] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:10:18] hashar: did you do the zuul thingi last week? 
[12:10:32] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: Connection timed out [12:10:36] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL - No data received from host [12:10:42] PROBLEM - Varnish HTTP mobile-frontend on cp1060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:10:43] hai btw [12:10:52] PROBLEM - LVS HTTPS IPv4 on mobile-lb.eqiad.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 7.045 second response time [12:11:32] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 22068 bytes in 0.009 second response time [12:12:33] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: Connection timed out [12:14:42] PROBLEM - Varnish HTTP mobile-frontend on cp1059 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:16:05] ori-l: there could be cases where, although the sysctl parameter is "just" an optimization, the software depends on it in the sense that (a) With the software's current configuration, it will not start up without those sysctl settings, or (b) It will start, but will fail to do its job well enough to matter if the sysctl setting wasn't in effect before daemon start [12:16:58] wth is going on [12:17:25] I don't know. my phone is mad at me and going beep beep beep beep, that's why I'm awake :P [12:18:23] AzaToth: nop [12:18:51] AzaToth: unlikely to be achieved before mid september :D [12:19:08] varnish frontends aren't happy [12:19:51] Jul 22 12:19:06 cp1041 vhtcpd[1958]: TCP conn to 127.0.0.1:80: response too large, dropping request [12:20:09] I thought we had fixed this before, bodies in purges? [12:23:27] k [12:23:28] hashar: why are you happy? [12:23:28] about that ヾ [12:23:28] /wiki/Special:BannerRandom? [12:23:28] that would have cascaded though [12:23:32] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: Connection timed out [12:23:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:23:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:24:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [12:24:34] 1548 SYN_RECV, 172367 CLOSE_WAIT [12:24:42] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [12:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 12:24:45 UTC 2013 [12:25:32] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: Connection timed out [12:25:35] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [12:26:32] RECOVERY - Varnish HTTP mobile-frontend on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 261 bytes in 0.002 second response time [12:26:36] ori-l: arth you there? [12:26:41] okay, I restart cp1046 mobile-frontend [12:26:45] AzaToth: he left a while ago, to get some sleep. [12:26:52] RECOVERY - LVS HTTP IPv4 on mobile-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 22060 bytes in 0.008 second response time [12:26:55] k [12:26:59] at least that's what he said. Unsure if he's gotten his sleep <-> IRC interface working yet [12:27:06] hehe [12:27:23] RECOVERY - LVS HTTP IPv4 on m.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 22060 bytes in 0.002 second response time [12:27:51] LVS is load balancers right? 
[12:28:03] !log restarting cp1046/cp1060's varnish-frontend [12:28:07] AzaToth: yes [12:28:12] Logged the message, Master [12:28:28] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 12:28:13 UTC 2013 [12:28:28] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [12:28:32] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL - No data received from host [12:28:36] RECOVERY - Varnish HTTP mobile-frontend on cp1060 is OK: HTTP OK: HTTP/1.1 200 OK - 261 bytes in 0.003 second response time [12:28:40] [2331559.040628] Out of socket memory [12:28:41] [2331559.280384] Out of socket memory [12:28:42] perfect [12:28:52] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 22103 bytes in 0.025 second response time [12:28:55] heh [12:28:55] RECOVERY - LVS HTTPS IPv4 on mobile-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 22100 bytes in 0.014 second response time [12:28:59] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 12:28:48 UTC 2013 [12:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [12:30:32] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 22051 bytes in 0.025 second response time [12:32:10] paravoid: not so ironically, the first google result for Out of socket memory is a blog entry from a guy running Varnish :) [12:32:13] http://blog.tsunanet.net/2011/03/out-of-socket-memory.html [12:32:22] hey [12:32:28] hey mark [12:32:35] sorry was on the road [12:32:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 12:32:36 UTC 2013 [12:32:54] so, something's strange happening [12:33:19] no load at all, nothing except varnish frontend is noticing anything, graphs show no spikes or anything [12:33:23] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [12:33:40] I'm looking for evidence of a syn flood [12:33:43] but couldn't find any [12:34:10] (syncookies are on but no warnings have been printed, SYN_RECV amount isn't that large) [12:34:51] out of socket memory happened after I restarted varnish and I'm guessing that's the kernel cleaning up after all those sockets the dying varnish left orphan [12:35:29] mark: and I restarted cp1046/cp1060 after a while, hoping that maybe that mkaes a difference [12:35:36] ok [12:35:55] and it did it seems [12:36:08] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Mobile+caches+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [12:36:14] traffic back to normal levels [12:36:34] a socket fd leak on varnish's side maybe? 
[12:37:00] progressively exhausting all sockets on the system [12:37:24] I restarted two frontends because of depool threshold but I left the rest for investigation [12:37:52] PROBLEM - LVS HTTP IPv4 on mobile-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out [12:38:09] i'm not seeing out of socket memory on all mobile servers [12:38:13] no [12:38:17] 15:34 < paravoid> out of socket memory happened after I restarted varnish and I'm guessing that's the kernel cleaning up after all those sockets the dying varnish left orphan [12:38:33] the only occurence is immediately after varnish-frontend restart [12:38:48] root@cp1047:~# cat /proc/net/sockstat [12:38:49] sockets: used 200270 [12:38:49] TCP: inuse 198147 orphan 1121 tw 7049 alloc 205357 mem 409158 [12:39:02] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:39:19] root@cp1060:~# cat /proc/net/sockstat [12:39:19] sockets: used 4874 [12:39:19] TCP: inuse 5752 orphan 1164 tw 31072 alloc 5909 mem 9752 [12:39:29] right [12:39:52] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 22103 bytes in 0.043 second response time [12:40:07] n_sess 199990 . N struct sess [12:40:14] on cp1047 [12:40:36] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Mobile%20caches%20eqiad&h=cp1047.eqiad.wmnet&r=day&z=default&jr=&js=&st=1374496691&v=199997&m=frontend.n_sess&vl=N&ti=N%20struct%20sess&z=large [12:40:58] i think we've seen this before [12:41:34] we have? [12:41:39] yes [12:41:45] but on bits mostly... [12:41:51] maybe it's different [12:41:52] RECOVERY - LVS HTTP IPv4 on mobile-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 22062 bytes in 0.005 second response time [12:42:26] 2013-07-22 11:50:57.571899 [mobilelb6] Could not depool server cp1059.eqiad.wmnet because of too many down! [12:42:29] blergh [12:42:33] oh that's 40' ago though [12:43:02] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:43:10] so I think pybal is pooling/depooling cp1059/cp1047 [12:43:15] flapping [12:43:20] I'm going to disable them [12:43:29] ack? 
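A side note on the sockstat figures pasted above: the contrast between cp1047 (roughly 198k TCP sockets in use) and the freshly restarted cp1060 (roughly 5k) is what points at progressive socket exhaustion rather than a traffic spike. A minimal sketch of that kind of check, assuming a frontend varnishd instance is the process under suspicion; the pgrep match pattern is illustrative, not the exact command line used on these hosts:

    # Kernel-wide TCP socket usage versus the tcp_mem thresholds (pages: low / pressure / max).
    cat /proc/net/sockstat
    sysctl net.ipv4.tcp_mem

    # Count file descriptors held by the suspect varnishd, to see whether
    # sockets are accumulating inside one process (a possible fd/session leak).
    pid=$(pgrep -o -f 'varnishd.*frontend')   # match pattern is a guess
    if [ -n "$pid" ]; then
        echo "open fds: $(ls /proc/$pid/fd | wc -l)"
        grep 'Max open files' /proc/$pid/limits
    fi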
[12:43:52] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 22104 bytes in 0.037 second response time [12:44:46] just disable one [12:44:48] restart the other [12:45:01] so we have one for debugging [12:45:08] it'll be interesting to see if those sessions go away [12:45:13] we have four, I restarted two [12:45:20] I was thinking disable in pybal conf the other two [12:45:21] restart another one [12:45:27] that works too, two should be enough [12:46:10] !log depooling cp1047/cp1059 for further investigation [12:46:19] Logged the message, Master [12:46:32] RECOVERY - Varnish HTTP mobile-frontend on cp1059 is OK: HTTP OK: HTTP/1.1 200 OK - 261 bytes in 1.694 second response time [12:46:33] RECOVERY - Varnish HTTP mobile-frontend on cp1047 is OK: HTTP OK: HTTP/1.1 200 OK - 261 bytes in 2.194 second response time [12:46:35] bblack: feel free to ask questions if you lack info on some of these steps btw :) [12:47:25] inuse on cp1060 jumped to 10k (from 6k) [12:47:37] but seems to be holding off there [12:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 12:54:44 UTC 2013 [12:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [12:56:52] PROBLEM - SSH on sq41 is CRITICAL: Server answer: [12:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 12:57:36 UTC 2013 [12:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [12:59:02] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 12:58:52 UTC 2013 [12:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [13:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 13:02:36 UTC 2013 [13:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [13:11:52] RECOVERY - SSH on sq41 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:14:42] PROBLEM - Backend Squid HTTP on sq41 is CRITICAL: Connection refused [13:15:15] guys, can someone please, please merge https://gerrit.wikimedia.org/r/#/c/73565/ before VE is enabled on all those non-en-WP wikis? [13:18:12] pff [13:18:14] the triple negations there make it hard to understand what this patch is about :D [13:19:30] twkozlowski: there isno urgency really [13:19:38] twkozlowski: we can deploy that whenever needed [13:19:58] twkozlowski: that patch essentially would reopen bug https://bugzilla.wikimedia.org/show_bug.cgi?id=48666 [13:20:04] hashar: it's needed since Jul 13 [13:20:24] twkozlowski: so you would want to bring this on the wikitech-l mailing list and make sure James Forrester (bug 48666 solver + VE product manager) is in CC [13:20:34] hashar: and close 50929 [13:20:44] twkozlowski: but reopens 48666 :D [13:21:02] so you want to bring the discussion to the tech community so we don't enable/disable that option every week [13:21:09] "For the record, I asked him and James said that he doesn't have a plan to respond to this patch." [13:21:35] James_F|Away: ^^ [13:22:17] also perhaps Elsie [13:23:12] https://bugzilla.wikimedia.org/show_bug.cgi?id=48666#c6 offers some background information [13:24:43] (CR) Hashar: [C: -1] "That would reopen bug 48666 that explicitly made VisualEditor a hidden preference." 
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/73565 (owner: Odder) [13:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 13:24:42 UTC 2013 [13:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [13:25:40] twkozlowski: I recommend discussing it on wikitech-l and bug 50929. Meanwhile you can abandon the gerrit change https://gerrit.wikimedia.org/r/#/c/73565/ since it is not going to be deployed :-] [13:25:42] :( [13:25:43] ror [13:25:54] well not deployed until we figure out whether this is actually wanted. [13:26:13] There is a thread about that on en.wp somewhere. [13:26:17] hashar: you know, this has already been discussed at length on multiple venues [13:26:25] including that bug and [[WP:VPT]] [13:27:16] hashar: and james has been blocking any progress by refusing to acknowledge that this is wanted :/ [13:27:27] so bring the topic to wikitech-l [13:27:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 13:27:39 UTC 2013 [13:28:13] the whole point of that feature was to opt in to VE [13:28:16] (PS4) Helder.wiki: (bug 50929) Remove 'visualeditor-enable' from $wgHiddenPrefs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/73565 (owner: Odder) [13:28:17] not to opt out later on :-D [13:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [13:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 13:28:46 UTC 2013 [13:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [13:29:49] hashar: i'm pretty sure that a "position has been reached" on that bug [13:30:21] but, as i said, james is refusing to acknowledge it or even reply [13:30:33] frankly i have no fucking idea why he's doing that [13:30:37] MatmaRex, so start a public thread [13:30:57] wikitech-l is where such issues should be resolved [13:31:05] this isn't a code issue [13:31:05] Why. [13:31:14] this is a wikimedia-specific configuration issue [13:31:17] There is community consensus not to hide it. [13:33:12] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 13:33:05 UTC 2013 [13:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [13:34:11] MatmaRex: if james does not ack/reply, then bring the issue to more people by using wikitech-l :-] [13:34:47] wikitech is not an appropriate list for this [13:34:47] and you can talk about it with the visual editor people on IRC (maybe #wikimedia-visualeditor ) (all of them in SF though) [13:35:00] but if this is going to make this get merged, okay, fuck appropriate lists [13:35:11] visualeditor people ignore me when i talk to them about this [13:35:14] i told you already [13:35:57] ok let me put this in another way: [13:36:01] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#.22Opt_out.22_of_VE_needed_under_preferences [13:36:05] 1) Gerrit / code review is not a place to argue [13:36:07] 'complaining on IRC won't change anything' [13:36:10] 2) it is none of our business [13:36:13] 3) get James to reply [13:36:15] period [13:36:23] None of our business?
[13:36:41] that should be handled by the VisualEditor team [13:36:56] by "our" I was referring to the general mediawiki team sorry [13:37:16] that's why I'm not discussing this in #mediawiki [13:37:25] hashar: mail sent [13:37:29] MatmaRex: thanks :) [13:38:00] hope you like it [13:38:27] and really IRC is a horrible place to talk about such issues [13:38:49] since that is only a handful of people interacting when you want many more people to be involved (and at least the proper people such as James :D ) [13:39:21] hashar: it's a last resort. it's harder to outright ignore people on irc [13:39:39] and enough people were involved in this already, see the link above [13:39:40] MatmaRex: You sent this to wikitech-l? Not seeing anything. [13:39:41] also stuff said here, while yes it's publicly logged, in practice it's not part of the public record, almost never do we point to something said here as part of, say, an on-wiki discussion or an email thread [13:40:02] twkozlowski: ah fuck, send from wrong e-mail address. fixing [13:40:05] sent* [13:40:32] done [13:42:05] nice. [13:48:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:50:42] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [13:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 13:54:47 UTC 2013 [13:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [13:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 13:57:39 UTC 2013 [13:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [13:58:42] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 13:58:41 UTC 2013 [13:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [14:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 14:02:39 UTC 2013 [14:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [14:09:20] hey paraoid [14:09:24] paravoid [14:09:31] why "never source a file from a module?" [14:23:03] Anybody around who could restart git? See https://bugzilla.wikimedia.org/show_bug.cgi?id=51769 [14:23:15] as discussed in #mediawiki right now [14:24:42] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 14:24:41 UTC 2013 [14:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [14:25:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:25:54] as usual wondering who's alive on ops at this time [14:26:01] apergos ? Sorry to bother again ^ git down [14:26:26] ah it is? grrr [14:26:50] apergos: Chad not around :( [14:26:50] (PS1) Hashar: contint: deny webspider from accessing Jenkins [operations/puppet] - https://gerrit.wikimedia.org/r/75105 [14:27:02] andre__: Chad is in SF timezone nowadays :D [14:27:14] gimme a sec, looking at it [14:27:18] hashar, I know! That's not good!!!!
;) [14:27:27] andre__: git.wikimedia.org has been dying a few times per day :( [14:27:27] this is what happens when you don't meet the striking server kittehs' demands for fresher tuna [14:27:34] I know [14:27:41] https://bugzilla.wikimedia.org/show_bug.cgi?id=51769 [14:27:43] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 14:27:38 UTC 2013 [14:28:08] great [14:28:15] java at 100% over there, looking to see how to restart it [14:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [14:28:35] apergos: Could you try taking a stack trace? [14:28:36] andre__: I am not sure whether that bug should be a blocker. That is not really preventing any work from being done :] I would set it to annoying [14:28:40] apergos: Maybe that would help chad? [14:29:00] jstack !! [14:29:02] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 14:28:59 UTC 2013 [14:29:21] well two things [14:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [14:29:32] first, how do I get a java stacktrace? [14:29:42] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [14:30:06] apergos: jstack $PID should give you a stack trace. [14:30:14] hashar, yeah I might overreact. However it's at least for some folks a major entry point to quickly look up some code or changes (if they don't have a complete checkout ready) [14:30:33] well, this *is* part of the reason we replicate to GitHub. [14:30:41] apergos: or jstack $EXECUTABLE [14:30:56] apergos: (if you cannot see the java process) [14:32:06] oh I was able to see it ok, I can email this to chad I guess [14:32:11] now however [14:32:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 14:32:40 UTC 2013 [14:32:47] apergos: probably easier just to stick it on the bug and private it [14:32:57] since chad will be seeing the bug report anyway [14:33:00] where does he run this from? [14:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [14:33:27] I see no init.d script with gitblit in it [14:33:55] no upstart either [14:34:01] apergos: Do you see something like tomcat, jetty, jboss? [14:34:43] (PS1) Ottomata: Adding hasrestart and hasstatus to hadoop services [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/75107 [14:35:13] (CR) Ottomata: [C: 2 V: 2] Adding hasrestart and hasstatus to hadoop services [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/75107 (owner: Ottomata) [14:35:14] (Merged) Ottomata: Adding hasrestart and hasstatus to hadoop services [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/75107 (owner: Ottomata) [14:35:25] (PS3) Ottomata: Fixing automated hue SSL generation and permissions [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/74686 [14:36:11] nope [14:36:27] it runs as java -jar gitblit.jar and it's in its own little process tree [14:36:51] That looks fine as well. [14:36:59] Do you get errors when you run that command? [14:37:04] /var/lib/gitblit I guess [14:37:16] I needed to see where to run it out of [14:37:25] sec, I'm gonna shoot it now and see [14:37:33] Do you see a gitblit.properties file somewhere? [14:38:09] wonder if I should have backgrounded that or if it will [14:39:10] Jars typically do not background on their own. Then again, gitblit might be different...
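For reference, the stack-trace step being discussed is just a thread dump of the running JVM; a minimal sketch, assuming the JDK's jstack tool is installed on the host (the output file name is only an example):

    # Find the gitblit JVM and dump all of its thread stacks to a file
    # that can be attached (privately) to the bug report.
    pid=$(pgrep -f 'java -jar gitblit.jar')
    jstack "$pid" > /tmp/gitblit-threads-$(date +%Y%m%dT%H%M%S).txt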
[14:39:13] well I backgrounded it and I see some stuff instead of a proxy server whine [14:39:49] cpu is at 99% again [14:39:49] But git.wikimedia.org is up again. [14:39:55] so maybe that's its typical useage over here [14:40:22] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours [14:41:16] !log shot and restarted gitblit: on antinomy, cd /var/lib/gitblit, java -jar gitblit.jar & (see bug 51769) [14:41:24] Logged the message, Master [14:41:37] !log (btw docs would be nice, is that really the right way to kick it?) [14:41:47] Logged the message, Master [14:42:02] yeah, I was hoping for https://wikitech.wikimedia.org/view/git.wikimedia.org to exist, but it doesn't [14:42:14] (as we have https://wikitech.wikimedia.org/view/bugzilla.wikimedia.org with some nuggets of wisdom) [14:42:51] I searched for the string gitblit and got an abysmally small number of search results [14:42:54] (total: 3) [14:43:16] we shouldn't use the urls like that for page tables >.> [14:43:22] *page titles [14:43:53] well bugzilla is the exception, because the whole product name in url thing [14:46:27] somebody came up with it before my time [14:47:11] just because it has been done in the past, doesn't mean it should repeat it self [14:47:57] (and bugzilla, one of the very few that do it iirc, is one of the exceptions that it does work for) [14:49:14] (PS1) Ottomata: Fixing sqoop path based on sqoop or sqoop2 [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/75112 [14:49:50] (CR) Ottomata: [C: 2 V: 2] Fixing sqoop path based on sqoop or sqoop2 [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/75112 (owner: Ottomata) [14:49:51] (Merged) Ottomata: Fixing sqoop path based on sqoop or sqoop2 [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/75112 (owner: Ottomata) [14:50:22] well I dunno if this stack trace is any use, since cpu usage was "normal" [14:50:23] (PS4) Ottomata: Fixing automated hue SSL generation and permissions [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/74686 [14:51:39] (PS1) Hashar: jenkins: logrotate access.log [operations/puppet] - https://gerrit.wikimedia.org/r/75113 [14:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 14:54:42 UTC 2013 [14:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:29] apergos: Errm, thanks for quickly looking at the git problem, by the way [14:57:34] yw [14:58:02] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 14:57:56 UTC 2013 [14:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [14:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 14:58:42 UTC 2013 [14:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [15:00:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:01:10] https://wikitech.wikimedia.org/wiki/Git.wikimedia.org exists now [15:01:24] do we have documentation on wikitech about our actual git setup? 
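Since the only documented restart procedure at this point is the !log line above, here is roughly what a runbook entry might contain, assuming gitblit keeps running as a bare java -jar out of /var/lib/gitblit; the log path and the sleep are assumptions, not the actual setup:

    # Sketch of a gitblit restart, mirroring what was done by hand for bug 51769.
    pid=$(pgrep -f 'java -jar gitblit.jar')
    [ -n "$pid" ] && kill "$pid" && sleep 5
    cd /var/lib/gitblit
    nohup java -jar gitblit.jar >> /var/log/gitblit.log 2>&1 &   # log path is a guess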
[15:01:42] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [15:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 15:02:36 UTC 2013 [15:03:02] PROBLEM - LVS HTTPS IPv4 on wikisource-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:03:02] PROBLEM - LVS HTTPS IPv4 on foundation-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:03:25] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [15:04:02] RECOVERY - LVS HTTPS IPv4 on foundation-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 67148 bytes in 7.319 second response time [15:04:02] RECOVERY - LVS HTTPS IPv4 on wikisource-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 67150 bytes in 9.154 second response time [15:05:02] PROBLEM - LVS HTTPS IPv4 on wikipedia-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:05:40] (PS2) Hashar: jenkins: logrotate access.log [operations/puppet] - https://gerrit.wikimedia.org/r/75113 [15:05:52] RECOVERY - LVS HTTPS IPv4 on wikipedia-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 98250 bytes in 4.773 second response time [15:06:22] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: No successful Puppet run in the last 10 hours [15:06:34] peachey|laptop__: git.wikimedia.org == gitblit :] [15:06:47] peachey|laptop__: the rest is in Gerrit :) [15:09:30] hashar: git.* is undescriptive and could easily change to another platform (eg: gitorious *runs*) [15:09:52] so a gitblit page is better for documentation that is directly related to it [15:10:13] (PS3) Hashar: jenkins: logrotate access.log [operations/puppet] - https://gerrit.wikimedia.org/r/75113 [15:10:22] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: No successful Puppet run in the last 10 hours [15:11:17] peachey|laptop__: yup. 
The current layout seems ok to me :-) [15:11:22] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: No successful Puppet run in the last 10 hours [15:11:27] that being a bad example >.> [15:12:05] (CR) ArielGlenn: [C: 2] jenkins: logrotate access.log [operations/puppet] - https://gerrit.wikimedia.org/r/75113 (owner: Hashar) [15:12:06] (Merged) ArielGlenn: jenkins: logrotate access.log [operations/puppet] - https://gerrit.wikimedia.org/r/75113 (owner: Hashar) [15:15:32] (CR) ArielGlenn: [C: 2] contint: deny webspider from accessing Jenkins [operations/puppet] - https://gerrit.wikimedia.org/r/75105 (owner: Hashar) [15:15:33] (Merged) ArielGlenn: contint: deny webspider from accessing Jenkins [operations/puppet] - https://gerrit.wikimedia.org/r/75105 (owner: Hashar) [15:16:58] (PS1) Hashar: jenkins: fix indent in logrotate file [operations/puppet] - https://gerrit.wikimedia.org/r/75116 [15:17:39] !log gallium / jenkins : blacklisting a bunch of user agent {{gerrit|75105}} [15:17:50] Logged the message, Master [15:17:51] (CR) ArielGlenn: [C: 2] jenkins: fix indent in logrotate file [operations/puppet] - https://gerrit.wikimedia.org/r/75116 (owner: Hashar) [15:17:52] (Merged) ArielGlenn: jenkins: fix indent in logrotate file [operations/puppet] - https://gerrit.wikimedia.org/r/75116 (owner: Hashar) [15:21:05] (PS4) Hashar: contint: explicitly require php5-dev [operations/puppet] - https://gerrit.wikimedia.org/r/70182 [15:23:42] (CR) ArielGlenn: [C: 2] contint: explicitly require php5-dev [operations/puppet] - https://gerrit.wikimedia.org/r/70182 (owner: Hashar) [15:23:43] (Merged) ArielGlenn: contint: explicitly require php5-dev [operations/puppet] - https://gerrit.wikimedia.org/r/70182 (owner: Hashar) [15:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 15:24:46 UTC 2013 [15:25:37] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [15:26:33] (PS1) Hashar: jenkins: fix pid path in logrotate script [operations/puppet] - https://gerrit.wikimedia.org/r/75117 [15:28:12] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 15:28:06 UTC 2013 [15:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [15:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 15:28:47 UTC 2013 [15:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [15:30:01] (CR) ArielGlenn: [C: 2] jenkins: fix pid path in logrotate script [operations/puppet] - https://gerrit.wikimedia.org/r/75117 (owner: Hashar) [15:30:02] (Merged) ArielGlenn: jenkins: fix pid path in logrotate script [operations/puppet] - https://gerrit.wikimedia.org/r/75117 (owner: Hashar) [15:32:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 15:32:47 UTC 2013 [15:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [15:39:20] (CR) Demon: [C: 1 V: 1] "This seems like a completely reasonable change to me, but someone from the VE team should weigh in (or at least acknowledge they've seen t" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/73565 (owner: Odder) [15:44:02] PROBLEM - LVS HTTPS IPv4 on wikipedia-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:44:02] PROBLEM - LVS HTTPS IPv4 on wikimedia-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:44:19] (PS1) Mark Bergsma: Revert "Don't run the 
default vcl_fetch function on mobile caches" [operations/puppet] - https://gerrit.wikimedia.org/r/75119 [15:44:59] mark: hey [15:45:01] right on time :) [15:45:02] RECOVERY - LVS HTTPS IPv4 on wikipedia-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 98248 bytes in 7.042 second response time [15:45:02] RECOVERY - LVS HTTPS IPv4 on wikimedia-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 98250 bytes in 7.054 second response time [15:45:12] not you I presume? [15:45:21] hm? [15:45:24] https, no [15:45:32] right on time for what? [15:45:42] (CR) Mark Bergsma: [C: 2] Revert "Don't run the default vcl_fetch function on mobile caches" [operations/puppet] - https://gerrit.wikimedia.org/r/75119 (owner: Mark Bergsma) [15:46:02] a critical LVS and the same minute a commit from you :) [15:46:02] (CR) jenkins-bot: [V: -1] Revert "Don't run the default vcl_fetch function on mobile caches" [operations/puppet] - https://gerrit.wikimedia.org/r/75119 (owner: Mark Bergsma) [15:46:22] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [15:46:22] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [15:46:22] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [15:46:22] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [15:46:22] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [15:46:23] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [15:46:23] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [15:46:26] (CR) Mark Bergsma: [V: 2] Revert "Don't run the default vcl_fetch function on mobile caches" [operations/puppet] - https://gerrit.wikimedia.org/r/75119 (owner: Mark Bergsma) [15:46:38] that's the leak you think? 
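The Ganglia graph linked above tracks Varnish's n_sess counter; while testing the revert, the same counters can be watched directly on a suspect host. A small sketch, assuming Varnish 3 counter names and that the frontend instance is named "frontend" (as the ganglia metric prefix suggests):

    # Poll session-related counters on the frontend varnish instance every 5 seconds.
    watch -n 5 "varnishstat -1 -n frontend | egrep 'n_sess|n_sess_mem'"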
[15:47:03] (Merged) Mark Bergsma: Revert "Don't run the default vcl_fetch function on mobile caches" [operations/puppet] - https://gerrit.wikimedia.org/r/75119 (owner: Mark Bergsma) [15:47:48] i have no idea [15:47:55] but it started around the time that VCL change was applied [15:48:04] given that it's not a critical change, I'm gonna try it ;) [15:49:27] (PS1) Hashar: jenkins: revert access log rotation [operations/puppet] - https://gerrit.wikimedia.org/r/75121 [15:50:38] (CR) jenkins-bot: [V: -1] jenkins: revert access log rotation [operations/puppet] - https://gerrit.wikimedia.org/r/75121 (owner: Hashar) [15:51:22] PROBLEM - Puppet freshness on ms-fe1002 is CRITICAL: No successful Puppet run in the last 10 hours [15:52:09] !log jenkins accidentally killed Jenkins :( [15:52:19] Logged the message, Master [15:54:42] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 15:54:41 UTC 2013 [15:55:18] btw, mobile has quite a lot of requests to Special:BannerRandom [15:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [15:55:37] I found out by accident of course :) [15:56:20] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Mobile+caches+eqiad&h=cp1046.eqiad.wmnet&jr=&js=&v=50188&m=frontend.n_sess_mem&vl=N&ti=N+struct+sess_mem [15:56:24] it does look like it's leveling off [15:57:22] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: No successful Puppet run in the last 10 hours [15:57:38] nod [15:57:49] but why [15:58:33] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 15:58:23 UTC 2013 [15:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 15:58:48 UTC 2013 [15:59:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [15:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [16:02:17] (PS2) Hashar: jenkins: revert access log rotation [operations/puppet] - https://gerrit.wikimedia.org/r/75121 [16:02:18] perhaps thats what happens when you return (deliver) on a ttl <= 0 object [16:02:22] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: No successful Puppet run in the last 10 hours [16:02:22] or something [16:02:39] I guess I can test with parts of the default vcl_fetch function [16:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 16:02:34 UTC 2013 [16:02:43] and see what's doing it [16:02:45] !log Jenkins back up [16:02:55] Logged the message, Master [16:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [16:08:15] (CR) ArielGlenn: [C: 2] jenkins: revert access log rotation [operations/puppet] - https://gerrit.wikimedia.org/r/75121 (owner: Hashar) [16:08:16] (Merged) ArielGlenn: jenkins: revert access log rotation [operations/puppet] - https://gerrit.wikimedia.org/r/75121 (owner: Hashar) [16:08:30] (CR) Parent5446: [C: 1] "Agreed on this as well." 
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/73565 (owner: Odder) [16:12:45] (PS1) Petr Onderka: LZMA-compressed revision text [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/75126 [16:13:12] (PS1) Mark Bergsma: Use hit_for_pass if object's TTL is <= 0 [operations/puppet] - https://gerrit.wikimedia.org/r/75127 [16:13:13] (CR) Petr Onderka: [C: 2 V: 2] LZMA-compressed revision text [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/75126 (owner: Petr Onderka) [16:13:14] (Merged) Petr Onderka: LZMA-compressed revision text [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/75126 (owner: Petr Onderka) [16:14:03] (CR) Mark Bergsma: [C: 2] Use hit_for_pass if object's TTL is <= 0 [operations/puppet] - https://gerrit.wikimedia.org/r/75127 (owner: Mark Bergsma) [16:14:04] (Merged) Mark Bergsma: Use hit_for_pass if object's TTL is <= 0 [operations/puppet] - https://gerrit.wikimedia.org/r/75127 (owner: Mark Bergsma) [16:18:22] PROBLEM - Puppet freshness on ms-fe1001 is CRITICAL: No successful Puppet run in the last 10 hours [16:18:48] (PS1) BBlack: add per-connection purging limits for sanity [operations/software/varnish/vhtcpd] - https://gerrit.wikimedia.org/r/75128 [16:19:22] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 10 hours [16:24:33] (PS2) BBlack: add per-connection purging limits for sanity [operations/software/varnish/vhtcpd] - https://gerrit.wikimedia.org/r/75128 [16:25:12] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 16:25:05 UTC 2013 [16:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [16:27:44] so that seems to be it [16:27:52] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 16:27:41 UTC 2013 [16:27:59] objects with TTL <= 0 and return(deliver) in fetch causes the problem [16:28:07] perhaps due to the small memory caches on the frontend or something [16:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [16:28:33] (CR) Andrew Bogott: [C: -1] "I've never seen Jenkins report 'Lost' before!" [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh) [16:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 16:28:47 UTC 2013 [16:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [16:32:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 16:32:40 UTC 2013 [16:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [16:46:52] ori-l, can you explain what problem /etc/sysctl.d/puppet-managed is solving? 
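For readers wondering what /etc/sysctl.d/puppet-managed refers to: a sysctl.d drop-in is just a key = value file whose settings are loaded into the kernel at boot, and the usual gotcha is that a daemon started before the file is applied never sees the values, which is ori-l's earlier point about software depending on sysctls being in effect at start-up. A minimal illustration; the filename and the keys below are made up, not the actual contents of that file:

    # Hypothetical puppet-managed drop-in; keys are examples only.
    printf '%s\n' \
        'net.core.somaxconn = 4096' \
        'net.ipv4.tcp_max_orphans = 262144' > /etc/sysctl.d/60-puppet-managed.conf

    # Apply it immediately rather than waiting for the next boot,
    # so a subsequently (re)started daemon sees the intended values.
    sysctl -p /etc/sysctl.d/60-puppet-managed.conf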
[16:48:34] (PS1) Mark Bergsma: Don't run default vcl_fetch on mobile backend caches [operations/puppet] - https://gerrit.wikimedia.org/r/75130 [16:49:39] (CR) Mark Bergsma: [C: 2] Don't run default vcl_fetch on mobile backend caches [operations/puppet] - https://gerrit.wikimedia.org/r/75130 (owner: Mark Bergsma) [16:49:40] (Merged) Mark Bergsma: Don't run default vcl_fetch on mobile backend caches [operations/puppet] - https://gerrit.wikimedia.org/r/75130 (owner: Mark Bergsma) [16:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 16:54:43 UTC 2013 [16:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [16:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 16:57:36 UTC 2013 [16:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [16:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 16:58:47 UTC 2013 [16:58:55] (CR) Demon: "I would note that using $wgHiddenPrefs is fundamentally flawed here. If the VE team truly wants to remove the option, they should remove i" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/73565 (owner: Odder) [16:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [17:00:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:01:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [17:02:48] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 17:02:38 UTC 2013 [17:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [17:05:21] (PS1) ArielGlenn: turn off rsync cron for dumps temporarily [operations/puppet] - https://gerrit.wikimedia.org/r/75132 [17:07:05] (CR) ArielGlenn: [C: 2] turn off rsync cron for dumps temporarily [operations/puppet] - https://gerrit.wikimedia.org/r/75132 (owner: ArielGlenn) [17:07:06] (Merged) ArielGlenn: turn off rsync cron for dumps temporarily [operations/puppet] - https://gerrit.wikimedia.org/r/75132 (owner: ArielGlenn) [17:18:57] (PS2) Andrew Bogott: Replace uses of generic::sysctl with sysctlfile module [operations/puppet] - https://gerrit.wikimedia.org/r/74852 [17:19:46] (CR) Andrew Bogott: [C: 2] Replace uses of generic::sysctl with sysctlfile module [operations/puppet] - https://gerrit.wikimedia.org/r/74852 (owner: Andrew Bogott) [17:19:47] (Merged) Andrew Bogott: Replace uses of generic::sysctl with sysctlfile module [operations/puppet] - https://gerrit.wikimedia.org/r/74852 (owner: Andrew Bogott) [17:21:24] (CR) Andrew Bogott: "ok, that dependency is merged now." [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh) [17:21:29] (CR) Andrew Bogott: "recheck" [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh) [17:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 17:24:47 UTC 2013 [17:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [17:27:06] <^demon> manybubbles: Sooo, I talked it over with AaronSchulz. Jobqueue is most likely the right approach for bulk invalidations, but hooking into HTMLCacheUpdateJob feels kind of icky :) [17:28:12] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 17:28:11 UTC 2013 [17:28:13] ^demon: it does. 
but it provides all the right partitioning logic :) it was too tempting for me not to try [17:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [17:28:52] <^demon> manybubbles: So, the ideal world would be for me to write an abstract link-invalidation-job that handles this sort of thing (other people need it too it seems). [17:29:00] <^demon> Then we'd have all the titles and so forth on hand already. [17:29:22] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 17:29:13 UTC 2013 [17:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [17:29:59] ^demon: essentially rename the job we have and move everything but the partitioning to a hook? [17:30:17] ^demon: I could probably do that if others think it makes sense. [17:30:50] AaronSchulz: what do you think of ^^^^ [17:32:26] (PS1) ArielGlenn: mwbzutils package for precise snapshots, don't use 'latest' for packages [operations/puppet] - https://gerrit.wikimedia.org/r/75136 [17:33:02] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 17:32:54 UTC 2013 [17:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [17:34:41] (CR) ArielGlenn: [C: 2] mwbzutils package for precise snapshots, don't use 'latest' for packages [operations/puppet] - https://gerrit.wikimedia.org/r/75136 (owner: ArielGlenn) [17:34:42] (Merged) ArielGlenn: mwbzutils package for precise snapshots, don't use 'latest' for packages [operations/puppet] - https://gerrit.wikimedia.org/r/75136 (owner: ArielGlenn) [17:35:58] ^demon, manybubbles, is LinksUpdate not enough? [17:36:33] (PS1) ArielGlenn: what was I thinking. dump servers don't need mwbzutils. [operations/puppet] - https://gerrit.wikimedia.org/r/75137 [17:37:33] (CR) ArielGlenn: [C: 2] what was I thinking. dump servers don't need mwbzutils. [operations/puppet] - https://gerrit.wikimedia.org/r/75137 (owner: ArielGlenn) [17:37:34] (Merged) ArielGlenn: what was I thinking. dump servers don't need mwbzutils. [operations/puppet] - https://gerrit.wikimedia.org/r/75137 (owner: ArielGlenn) [17:39:49] (PS1) ArielGlenn: snapshots with precise get mwbzutils package [operations/puppet] - https://gerrit.wikimedia.org/r/75138 [17:41:15] (CR) ArielGlenn: [C: 2] snapshots with precise get mwbzutils package [operations/puppet] - https://gerrit.wikimedia.org/r/75138 (owner: ArielGlenn) [17:41:16] (Merged) ArielGlenn: snapshots with precise get mwbzutils package [operations/puppet] - https://gerrit.wikimedia.org/r/75138 (owner: ArielGlenn) [17:42:08] MaxSem: looking but it doesn't seem the same [17:46:13] MaxSem: I see it now. [17:47:07] (PS1) Cmcmahon: enable VisualEditor for all users on test2wiki, experimental also [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75140 [17:47:11] basically, this hook gives you access to parser output and is run when pages are refreshed after template editing [17:48:57] MaxSem: very close to exactly what I added to the html one. 
I think I'd prefer to get it in bulk but this is very much what I needed [17:54:42] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 17:54:41 UTC 2013 [17:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [17:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 17:57:38 UTC 2013 [17:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [17:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 17:58:50 UTC 2013 [17:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [18:01:02] (CR) Cmcmahon: "Not sure I did that right, but would like VE for anons on test2wiki" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75140 (owner: Cmcmahon) [18:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 18:02:38 UTC 2013 [18:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [18:26:42] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 18:26:41 UTC 2013 [18:27:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [18:27:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 18:27:33 UTC 2013 [18:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [18:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 18:28:44 UTC 2013 [18:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [18:31:24] (CR) Anomie: [C: -1] "(1 comment)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/71774 (owner: Hashar) [18:33:32] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 18:33:22 UTC 2013 [18:34:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [18:37:08] (PS1) Asher: only randomly profile http requests if $wmfDatacenter == 'eqiad' [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75146 [18:37:31] AaronSchulz: ^^ [18:39:17] (PS1) Ottomata: Installing libdclass-java on analytics nodes [operations/puppet] - https://gerrit.wikimedia.org/r/75148 [18:40:30] (PS2) Ottomata: Installing libdclass-java on analytics nodes [operations/puppet] - https://gerrit.wikimedia.org/r/75148 [18:40:54] (CR) Ottomata: [C: 2 V: 2] Installing libdclass-java on analytics nodes [operations/puppet] - https://gerrit.wikimedia.org/r/75148 (owner: Ottomata) [18:40:55] (Merged) Ottomata: Installing libdclass-java on analytics nodes [operations/puppet] - https://gerrit.wikimedia.org/r/75148 (owner: Ottomata) [18:41:38] (CR) Asher: [C: -1] "StartProfiler.php is required in WebStart.php before "require_once MW_CONFIG_FILE" occurs, so I don't think $wmfDatacenter is actually def" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75146 (owner: Asher) [18:42:52] RECOVERY - Puppet freshness on analytics1018 is OK: puppet ran at Mon Jul 22 18:42:51 UTC 2013 [18:49:02] RECOVERY - Puppet freshness on analytics1019 is OK: puppet ran at Mon Jul 22 18:49:00 UTC 2013 [18:52:12] RECOVERY - Puppet freshness on analytics1020 is OK: puppet ran at Mon Jul 22 18:52:06 UTC 2013 [18:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 18:54:44 UTC 2013 [18:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run 
in the last 10 hours [18:56:33] (CR) Ottomata: "I generally agree with Ryan. I haven't looked, but I'd assume since so much effort went in to this it is more complete and robust than gi" [operations/puppet] - https://gerrit.wikimedia.org/r/74099 (owner: Andrew Bogott) [18:58:02] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 18:57:52 UTC 2013 [18:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [18:59:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 18:59:43 UTC 2013 [19:00:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [19:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 19:02:36 UTC 2013 [19:03:27] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [19:17:48] (CR) Ottomata: "Andrew, would you be willing to use a git submodule for this module, if you import it? It would allow us to maintain this module separate" [operations/puppet] - https://gerrit.wikimedia.org/r/74099 (owner: Andrew Bogott) [19:20:32] Thanks folks :) [19:20:44] ottomata: re: libdclass-java [19:20:50] we had a long discussion with average on saturday [19:21:01] this won't work as-is with oracle java, which I think you use there [19:21:08] (speaking of using open source ;)) [19:21:14] it needs a symlink [19:21:19] that couldn't really be placed in the package [19:21:27] so I suggested adding it in puppet instead [19:21:33] oh [19:21:34] ok [19:21:36] it's a compatibility symlink so that oracle java will look for it [19:21:45] ? -> ? [19:21:46] s/look for/find/ [19:22:19] /usr/lib/jni/libdclass.so -> /usr/lib/x86_64-linux-gnu/jni/libdclass.so (iirc) [19:22:45] oracle java doesn't support multiarch [19:22:49] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Move 3 wikis with UW to 1.22wmf11 [19:22:59] Logged the message, Master [19:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 19:24:45 UTC 2013 [19:25:14] !log restarting & repooling cp1047/cp1059 [19:25:25] Logged the message, Master [19:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [19:25:54] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: special, wikimedia, private, fishbowl and closed to 1.22wmf11 [19:26:03] Logged the message, Master [19:26:09] paravoid: should I link both libdclass.so.0 and /usr/lib/x86_64-linux-gnu/jni/libdclassjni.so ? [19:26:37] not sure [19:27:56] oh probably just the jni one [19:27:58] i see [19:28:12] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 19:28:06 UTC 2013 [19:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [19:28:27] there are these two: [19:28:27] /usr/lib/x86_64-linux-gnu/libdclass.so.0 [19:28:27] /usr/lib/x86_64-linux-gnu/jni/libdclassjni.so.0 [19:28:33] java probably just needs the jni/libdclassjni [19:28:34] one [19:28:43] dunno though [19:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 19:28:46 UTC 2013 [19:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [19:29:30] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikiversity, wikivoyage and wiktionary to 1.22wmf11 [19:29:40] Logged the message, Master [19:30:38] (CR) Andrew Bogott: "Yep, I'd prefer it to be a submodule...
do we have any existing examples in the puppet repo?" [operations/puppet] - https://gerrit.wikimedia.org/r/74099 (owner: Andrew Bogott) [19:31:23] (PS1) Ottomata: Symlinking libdclassjni.so into /usr/lib/jni [operations/puppet] - https://gerrit.wikimedia.org/r/75153 [19:31:30] * average is reading backlog [19:31:45] (CR) Ottomata: "Yup!" [operations/puppet] - https://gerrit.wikimedia.org/r/74099 (owner: Andrew Bogott) [19:32:01] average, basically, i'm doing the symlinking into /usr/lib/jni [19:32:05] i guess I just need to do [19:32:32] /usr/lib/jni/libdclassjni.so -> /usr/lib/x86_64-linux-gnu/jni/libdclassjni.so.0 [19:32:33] is that right? [19:32:35] ottomata: what you did was also written in patchset https://gerrit.wikimedia.org/r/#/c/74651 [19:32:45] from saturday [19:33:16] Faidon just told me [19:33:16] ok cool [19:33:16] great, didn't see this link [19:33:16] ottomata: sorry I haven't mentioned it [19:33:16] this coment [19:33:16] np [19:33:25] great ok [19:36:39] i just saw that the x86_64 one was in a jni subdir [19:38:11] average, paravoid ^ [19:38:29] look ok? [19:38:29] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikiquote, wikinews and wikibooks to 1.22wmf11 [19:38:29] https://gerrit.wikimedia.org/r/75153 [19:38:34] Logged the message, Master [19:38:35] looking [19:38:56] ottomata: looks good [19:39:05] ottomata: should I V+2 ? [19:39:10] (CR) Stefan.petrea: [C: 1] "looks good" [operations/puppet] - https://gerrit.wikimedia.org/r/75153 (owner: Ottomata) [19:39:32] (CR) Ottomata: [C: 2 V: 2] Symlinking dclass shared object files into /usr/lib [operations/puppet] - https://gerrit.wikimedia.org/r/75153 (owner: Ottomata) [19:39:33] k danke [19:39:33] (Merged) Ottomata: Symlinking dclass shared object files into /usr/lib [operations/puppet] - https://gerrit.wikimedia.org/r/75153 (owner: Ottomata) [19:39:47] ottomata, do our various puppetmasters automatically do submodule init/submodule update somehow? [19:39:56] ^demon: ready to merge some of these changes? :) [19:40:11] yes/no [19:40:18] andrewbogott [19:40:18] so [19:40:21] ^demon: I'm going to merge https://gerrit.wikimedia.org/r/#/c/74687 [19:40:24] (CR) Ryan Lane: [C: 2] Remove gitweb, we don't use it anymore [operations/puppet] - https://gerrit.wikimedia.org/r/74687 (owner: Demon) [19:40:24] sockpuppet, yes, if you use puppet-merge [19:40:25] (Merged) Ryan Lane: Remove gitweb, we don't use it anymore [operations/puppet] - https://gerrit.wikimedia.org/r/74687 (owner: Demon) [19:40:29] but, stafford isn't smart enough to do this yet [19:40:44] <^demon> Ryan_Lane: Sounds good to me. [19:40:47] ottomata, if I use puppet merge on sockpuppet does it cause the corresponding update to happen on stafford? [19:40:48] so, when you make as submodule change, you need to cd to /var/lib/git/operations/puppet and to git-submodule update --init [19:41:00] i'd like to turn that into a git hook [19:41:19] i mentioned it to mark/paravoid once, but since i've been the only one using submodules so far, i think it hasn't been a priority [19:41:23] Does 'puppet merge' already use sockpuppet's merge hook? 
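Back to the libdclassjni change merged a few lines up: stripped of the puppet wrapping, the whole fix is one compatibility symlink so that Oracle's JVM, which does not search the multiarch jni directory, can still load the library. Roughly the equivalent shell step, following the paths pasted above (whether the .so or .so.0 name should be linked is exactly the detail being discussed):

    # Compatibility symlink for Oracle Java, which ignores /usr/lib/x86_64-linux-gnu/jni.
    mkdir -p /usr/lib/jni
    ln -sf /usr/lib/x86_64-linux-gnu/jni/libdclassjni.so.0 /usr/lib/jni/libdclassjni.so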
[19:41:46] yeah, it's just a review wrapper around git merge origin/production [19:41:59] it will show you actual submodule diffs, if you change the sha the submodule points at [19:42:02] (CR) Ryan Lane: [C: -1] "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/74688 (owner: Demon) [19:42:02] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Everything else non wikipedia to 1.22wmf11 [19:42:08] and if you say 'yes', it will merge and then run submodule update --init [19:42:12] Logged the message, Master [19:42:27] (PS1) Reedy: Everything non wikipedia to 1.22wmf11 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75156 [19:42:33] ^demon: gave you a −1 on the other [19:42:45] but, no, it currently does not cause submodules to update on stafford [19:42:45] it should [19:42:59] i think the merge hook on stafford should know to run git submodule update --init [19:43:03] ottomata, ok, lemme fix. [19:43:04] (CR) Reedy: [C: 2] Everything non wikipedia to 1.22wmf11 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75156 (owner: Reedy) [19:43:06] <^demon> Ryan_Lane: Because a couple other things use that file as well. [19:43:12] (Merged) jenkins-bot: Everything non wikipedia to 1.22wmf11 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75156 (owner: Reedy) [19:43:13] <^demon> It's designed as a generic "Deny all bots" file :) [19:43:22] k andrewbogott, danke [19:43:22] ^demon: make a new file ;) [19:43:27] hm [19:43:38] <^demon> We don't need a file at all anymore though. [19:43:44] I guess that's true [19:44:01] I guess we only have gerrit running on that system [19:44:50] (CR) Ryan Lane: [C: 2] "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/74688 (owner: Demon) [19:44:51] (Merged) Ryan Lane: Allow spiders to index gerrit again [operations/puppet] - https://gerrit.wikimedia.org/r/74688 (owner: Demon) [19:45:09] ^demon: ok. both merged in [19:45:23] <^demon> ty [19:45:26] my review queue for once is actually manageable. it's nice :) [19:45:46] lo :) [19:46:16] manybubbles: no, the hook would go in the code that I mentioned to chad (HtmlCacheUpdate.php) and would add its own job [19:46:29] that job would subclass the backlink job class and would do its own stuff [19:46:47] there would not be hooks in a Job class [19:48:17] bah, I'm confused! ottomata, are the files in .git/ not actually checked into git? [19:48:30] AaronSchulz: another option that MaxSem mentioned was with LinksUpdate - which works for me as well but doesn't offer me the ability to do bulk changes [19:48:55] I'm sure that in a previous project I have altered a .git/config file and checked it in. But right now git is telling me that .git/config is untracked [19:49:08] <^demon> Um, what? [19:49:10] no, .git is not in git [19:49:12] .git is local [19:49:46] hm… where are submodules tracked then? [19:49:51] .gitmodules [19:49:53] i think [19:50:04] <^demon> Yes [19:50:11] …I don't have one of those [19:50:12] (Abandoned) Ryan Lane: Add fundraising components to #wm-fundraising [operations/puppet] - https://gerrit.wikimedia.org/r/64012 (owner: MarkTraceur) [19:50:14] oh but no sha in there [19:50:18] andrewbogott [19:50:21] run [19:50:25] oh, wait, yes I do.
[19:50:27] git submodule update —init [19:50:28] locally [19:50:45] (CR) Ryan Lane: [C: 2] rv tab in the middle of a line [operations/puppet] - https://gerrit.wikimedia.org/r/74363 (owner: Jeremyb) [19:50:46] (Merged) Ryan Lane: rv tab in the middle of a line [operations/puppet] - https://gerrit.wikimedia.org/r/74363 (owner: Jeremyb) [19:51:05] Hmph. I really want the post-merge hook to be tracked in git, but I guess that will require a hack [19:51:23] yeah, that should really be tracked by the installer of the repo though [19:51:24] so [19:51:36] if the /var/lib/git/operations/puppet clone is puppetized [19:51:39] that's where you'd add it [19:51:40] i think [19:51:47] not as part of the actual puppet repo [19:51:59] since everyone's .git dir could be different [19:52:45] hm [19:53:27] if it isn't puppetized, i'd just add it manually on stafford [19:54:28] or [19:54:28] actually [19:54:28] andrewbogott [19:54:29] maybe just ad dit to the post-merge hook on sockpuppet [19:54:32] currently it does: [19:54:34] ssh root@stafford.pmtpa.wmnet 'cd /var/lib/git/operations/puppet && git pull' [19:54:37] you could jsut do [19:54:42] That's what I'm doing, but I want that file tracked someplace first [19:54:43] ssh root@stafford.pmtpa.wmnet 'cd /var/lib/git/operations/puppet && git pull && git submodule update —init' [19:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 19:54:49 UTC 2013 [19:55:29] it is! [19:55:31] hmm [19:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [19:55:32] i think? [19:55:48] puppetmaster.pp [19:55:52] puppetmaster::gitclone [19:56:18] looks like maybe that class isn't used anywhere though [19:56:30] I think it is… I will try. [19:56:55] i'm grepping codebase, i don't see it included anywhere [19:57:32] buh, this file is utterly different from the one on sockpuppet [19:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 19:57:32 UTC 2013 [19:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [19:58:32] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [19:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 19:58:48 UTC 2013 [19:58:52] right [19:58:59] which is why I think it isn't being used at all [19:59:05] i think mark would know what's up with .git on sockpuppet [19:59:07] probably no one else [19:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [19:59:32] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [20:00:50] (CR) Yuvipanda: "Merged and done! https://github.com/yuvipanda/lolrrit-wm/commit/7c3b2345b05882199a493ca4b2e502022c21bc0e" [operations/puppet] - https://gerrit.wikimedia.org/r/64012 (owner: MarkTraceur) [20:01:46] (PS1) Ottomata: Supporting both short hostnames and fqdn for labs role::puppet::self [operations/puppet] - https://gerrit.wikimedia.org/r/75158 [20:02:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 20:02:42 UTC 2013 [20:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [20:07:40] (PS1) Andrew Bogott: Half-assed attempt to track the post-merge hook on sockpuppet. [operations/puppet] - https://gerrit.wikimedia.org/r/75159 [20:08:07] (PS1) Andrew Bogott: Submodule update --init on stafford post merge. 
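To round off the submodule discussion: the change being talked through is essentially one extra command in sockpuppet's post-merge hook, so that the checkout stafford's puppetmaster serves gets its submodules updated along with the pull. A sketch of what the hook could look like, assuming it currently does nothing beyond the quoted ssh pull (the real hook may well do more):

    #!/bin/bash
    # .git/hooks/post-merge on sockpuppet (sketch): keep submodules in sync
    # locally and on the puppetmaster after every merge.
    git submodule update --init
    ssh root@stafford.pmtpa.wmnet \
        'cd /var/lib/git/operations/puppet && git pull && git submodule update --init'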
[operations/puppet] - https://gerrit.wikimedia.org/r/75160 [20:08:12] ottomata: ^^ and ^ [20:11:34] (CR) Ottomata: "Some of these (sorta) already exist in puppet/files/git/puppet/, perhaps these should live there?" [operations/puppet] - https://gerrit.wikimedia.org/r/75159 (owner: Andrew Bogott) [20:12:29] andrewbogott: ^ [20:13:22] PROBLEM - Disk space on analytics1010 is CRITICAL: DISK CRITICAL - free space: / 710 MB (3% inode=85%): [20:18:20] ^demon: Hi! It turned out I created a repo in the wrong spot. Can I just use the "delete-project" plugin to remove the repo at the wrong spot, or do we need to take other actions as well? [20:18:53] <^demon> It'll delete on the gerrit box, but I need to delete manually from github as well. What's the repo? [20:18:54] ^demon: (Like you had some fancy github stuff) [20:19:11] ^demon: you should delete lolrrit too from github then [20:19:15] <^demon> I did. [20:19:32] ^demon: The repo to delete is mediawiki/extensions/LaTeXML [20:19:47] !log deploying bmc-config fix across all pmtpa/eqiad mgmt [20:19:57] Logged the message, Master [20:20:22] RECOVERY - Disk space on analytics1010 is OK: DISK OK [20:20:27] ^demon: I nuked the repo on gerrit. [20:21:25] <^demon> Deleted on github [20:21:33] ^demon: Thanks :-) [20:21:38] ^demon: want to look at https://gerrit.wikimedia.org/r/#/c/71966/ ? :) [20:21:50] ^demon: Btw... anything I can do to help you on the gitblit side? [20:22:02] ^demon: Looks like it's causing some harm :-( [20:22:11] <^demon> Not really. Logs are completely unhelpful. [20:22:23] <^demon> I'm going to send the logs to upstream and see if he can figure it out [20:22:36] Ok. [20:22:42] <^demon> AaronSchulz: Define "want" ;-) [20:24:05] Is gitblit supoosed to be running right now, or are we leaving it off for now until upstream comments on the problem? [20:25:02] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 20:24:52 UTC 2013 [20:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [20:25:36] ^demon: any reason grrrit-wm isn't gerrit-vm atm? [20:25:36] ^demon: as in "will this not kill me" ;) [20:25:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:26:09] <^demon> AzaToth: Maybe YuviPanda can answer...he might be still having nick problems [20:26:17] YuviPanda: ↑ [20:26:26] nice arrow there, AzaToth [20:26:36] do you know the unicode code points by heart? [20:26:38] ←↓→↑ [20:26:52] ottomata, there's a bit of a chicken-and-egg issue. I want to fix puppetmaster::gitclone but first I want to switch to vcsrepo (or something) and before I do that I want submodules to be working, etc. etc. [20:26:56] or do you have a clipboard of some sort? [20:26:59] YuviPanda: Alt-Gr + Shift + U is ↑ [20:27:29] YuviPanda: on debian this is [20:27:41] LeslieCarr, what can you tell me about puppetmaster::gitclone? Is it meant to describe sockpuppet, or was it for something else? [20:27:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 20:27:39 UTC 2013 [20:28:10] AzaToth: ah, right. I'm sadly stuck on a stupid OS though [20:28:13] <^demon> AzaToth: ↑↑↓↓←→←→BA [20:28:14] should find another solution [20:28:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [20:28:25] ^demon: cheater [20:28:42] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [20:28:50] ^demon: you know, I don't know where that is *actually* from. 
Only know that as a secret reference to some game [20:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 20:28:46 UTC 2013 [20:29:04] YuviPanda: anyway... [20:29:08] YuviPanda: grrrit-wm [20:29:15] the name of the game is? [20:29:15] <^demon> YuviPanda: [[w:Konami Code]] [20:29:17] my distractions do not seem to work :P [20:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [20:29:33] hehe [20:29:38] AzaToth: it's on +q here, and also I love the new name. can we please keep it that way? [20:29:42] [20:30:11] YuviPanda, try it here: http://www.vogue.co.uk/ [20:30:37] YuviPanda: what are you not trying to say? [20:30:44] andrewbogott: cute [20:30:50] AzaToth: i'm trying to say let us keep it as grrrit-wm [20:31:27] YuviPanda: that's not up to me to decide [20:31:42] so let's forget the nick problem, then :) [20:31:48] YuviPanda: remember though the risk of st [20:31:55] YuviPanda: remember though the risk of stick + ass + stuck [20:32:05] andrewbogott: it is to make sure it pulls all of the repositories that we want to call from puppet [20:32:07] that... makes no sense to me? [20:32:32] PROBLEM - Packetloss_Average on gadolinium is CRITICAL: CRITICAL: packet_loss_average is 8.30644534091 (gt 8.0) [20:32:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 20:32:45 UTC 2013 [20:33:07] LeslieCarr: it looks to not be applied anywhere, though… can you give me an example of where it might be applied if it were? [20:33:17] haha [20:33:18] oh [20:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [20:33:31] LeslieCarr: Background is, I want to change the post-merge hook on sockpuppet, and wondering if that is puppetized anyplace, or if it should be [20:34:03] andrewbogott: ok… honestly i haven't looked at this in forever … i didn't even remember making it :) [20:34:28] ok then :) [20:35:00] ah it's included on line 388 [20:35:10] (i think in the dashboard class [20:35:32] (CR) Ottomata: [C: 2 V: 2] Supporting both short hostnames and fqdn for labs role::puppet::self [operations/puppet] - https://gerrit.wikimedia.org/r/75158 (owner: Ottomata) [20:35:33] or maybe the main puppetmaster class (god i hate when people make confusing sets of {} ) [20:35:33] (Merged) Ottomata: Supporting both short hostnames and fqdn for labs role::puppet::self [operations/puppet] - https://gerrit.wikimedia.org/r/75158 (owner: Ottomata) [20:35:49] (CR) Aaron Schulz: "I think it's fine." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75146 (owner: Asher) [20:36:14] ah, so /every/ puppetmaster does it [20:36:20] hm [20:37:54] eh? [20:37:58] my irc cut out for a sec [20:38:00] what did I miss [20:38:11] I don't see the puppetmaster::gitclone included anywhere...right? [20:38:22] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [20:38:36] ottomata, it is included in the puppetmaster class [20:38:52] ottomata: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20130722.txt [20:38:54] which means that sockpuppet has /two/ puppet repos, one in /var/lib/git/operations/puppet/ and one in /root/puppet. 
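For anyone retracing the search above, whether a class such as puppetmaster::gitclone is actually included anywhere can be checked with a recursive grep over a checkout of operations/puppet. A minimal sketch; the clone path is illustrative, not a path from the log:

    cd /path/to/operations-puppet                      # any local clone of operations/puppet
    # list every place the class is referenced, with file name and line number
    grep -rn --include='*.pp' 'puppetmaster::gitclone' .
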
[20:39:03] * andrewbogott is a little upset by this [20:39:19] !log hot swapping disk 13 on db78 storage array [20:39:30] Logged the message, Master [20:41:11] ottomata, lesliecarr, is it possible that when a patch is merged it is merged into /root/puppet and then pushed onto stafford by the post-merge hook and then copied back into /var/lib/git/operations/puppet/ on sockpuppet? This definitely needs a chart with boxes and arrows [20:41:39] it is totally possible [20:41:46] though sockpuppet is only looking at root/puppet [20:42:04] the puppetmaster on sockpuppet derives from root/puppet and not from /var/lib/git/operations/puppet/? Are you sure? [20:43:09] stafford seems to only have /var/lib/git/operations/puppet/, so that's a relief. [20:43:25] andrewbogott: the puppetmaster on sockpuppet is actually in /etc/puppet [20:43:26] i think [20:43:35] the post-merge there rsyncs stuff from /root/puppet to /etc/puppet [20:43:36] AaronSchulz: thanks for the comment on 75146. do you think a global $wmfDatacenter is required there then? I don't think it's been defined as global in the function that requires StartProfiler.php [20:44:14] damn [20:44:17] check /etc/puppet/puppet.conf [20:44:25] to see what the puppetmaster daemon actually uses [20:44:32] looks like yeah, stafford uses /var/lib/git/operations/puppet [20:44:39] manifestdir = /var/lib/git/operations/puppet/manifests [20:44:46] (/etc/puppet/manifests is the default) [20:45:34] ok, so to fix this mess, sockpuppet should also use /var/lib/git/operations/puppet/ and /etc/puppet should be abolished. [20:45:35] y'think? [20:45:45] binasher: which function? I'm looking at 'require "$IP/StartProfiler.php";' in WebStart.php [20:46:18] you mean getMediaWiki/getMediaWikiCli? [20:46:21] AaronSchulz: is it all in global land still? [20:48:11] errrrrrrrrr [20:48:15] i dunno andrewbogott [20:48:21] if i was doing this from scratch [20:48:41] binasher: yeah [20:48:41] i would make /etc/puppet be the place puppetmasters run from [20:48:47] and maybe some other location the git clone [20:48:48] wherever [20:48:55] /var/lib… is fine [20:48:58] or /root/... is fine [20:49:01] doesn't matter to me [20:49:08] looks like that's where those variables come from right now as used [20:49:17] wait, why would the clone be a different place from where the puppetmasters run? [20:49:17] but the /etc/puppet would be cloned from wherever the main repo is [20:49:32] well, IF i was doing this on my own servers, that's what I'd do [20:49:40] but I think mark has it this way for sanity and security reasons [20:49:55] the main clone (where merges are done) is kinda like a last hold staging area for review [20:49:57] reviews [20:50:17] Ah, sure. [20:50:24] binasher: did you see the 'global $wmfDatacenter, $wmfRealm;' in MWRealm.php ? [20:50:29] OK, it still seems like there's a third repo on sockpuppet for no reason [20:50:34] Which is presumably the one in /var/lib/git/operations/puppet/ [20:51:18] AaronSchulz: ah! nope, i was only looking in core [20:52:00] * andrewbogott draws some pictures [20:52:01] AaronSchulz: do you +1 the change aside from the "will this work" question? 
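Putting the pieces of this exchange together, a rough sketch of what sockpuppet's post-merge hook appears to do, plus the submodule step proposed earlier; the ssh line is quoted from this log, while the hook shape, set -e, and the rsync flags are assumptions rather than the deployed file:

    #!/bin/bash
    # sketch of sockpuppet's post-merge hook as described above -- illustrative only
    set -e
    # sync the freshly merged tree into the directory the local puppetmaster serves from
    rsync -a /root/puppet/ /etc/puppet/
    # tell the puppetmaster on stafford to update its own clone, now also initialising submodules
    ssh root@stafford.pmtpa.wmnet \
      'cd /var/lib/git/operations/puppet && git pull && git submodule update --init'

The manifestdir line quoted above is what ties stafford's running master to that clone; if the setting were absent, Puppet would fall back to the /etc/puppet/manifests default mentioned in the log.
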
[20:53:02] PROBLEM - Auth DNS on ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [20:54:05] (CR) Aaron Schulz: [C: 1] only randomly profile http requests if $wmfDatacenter == 'eqiad' [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75146 (owner: Asher) [20:54:10] the idea is fine, yes [20:54:29] (CR) Asher: [C: 2 V: 2] only randomly profile http requests if $wmfDatacenter == 'eqiad' [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75146 (owner: Asher) [20:54:30] (Merged) Asher: only randomly profile http requests if $wmfDatacenter == 'eqiad' [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75146 (owner: Asher) [20:54:33] well then [20:54:52] graphs gonna be pretty again [20:55:06] (PS1) Ryan Lane: Add ssl1005/6 config [operations/puppet] - https://gerrit.wikimedia.org/r/75245 [20:57:22] binasher: heh, so apache.log is still in tampa? heh [20:57:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 20:57:50 UTC 2013 [20:57:52] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 20:57:50 UTC 2013 [20:58:20] (PS1) Ottomata: Fixing multi instance hostname issue [operations/puppet] - https://gerrit.wikimedia.org/r/75247 [20:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [20:58:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [20:58:40] (CR) Ottomata: [C: 2 V: 2] Fixing multi instance hostname issue [operations/puppet] - https://gerrit.wikimedia.org/r/75247 (owner: Ottomata) [20:58:41] (Merged) Ottomata: Fixing multi instance hostname issue [operations/puppet] - https://gerrit.wikimedia.org/r/75247 (owner: Ottomata) [20:58:57] AaronSchulz: "syslog.eqiad.wmnet is an alias for nfs-home.pmtpa.wmnet." heh :/ [20:59:22] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 20:59:19 UTC 2013 [20:59:37] (CR) Ryan Lane: [C: 2] Add ssl1005/6 config [operations/puppet] - https://gerrit.wikimedia.org/r/75245 (owner: Ryan Lane) [20:59:37] (Merged) Ryan Lane: Add ssl1005/6 config [operations/puppet] - https://gerrit.wikimedia.org/r/75245 (owner: Ryan Lane) [20:59:37] I assume that could be on fluorine ideally? [20:59:37] (PS7) Ottomata: Adding role::analytics::hue [operations/puppet] - https://gerrit.wikimedia.org/r/74388 [20:59:42] so much random stuff in tampa :( [21:00:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [21:00:30] AaronSchulz: i would think so. need to verify if any code is consuming apache.log [21:02:17] (PS1) Brian Wolff: Change fa wikis to use uca-fa sort order [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75248 [21:02:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 21:02:50 UTC 2013 [21:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [21:04:41] !log asher synchronized wmf-config/StartProfiler.php 'only randomly profile web reqs in eqiad' [21:04:50] Logged the message, Master [21:05:50] and "tcpdump port 3811 and udp" shows that change had the desired effect [21:06:28] andrewbogott: also, we don't keep puppet.conf and fileserver.conf files in operations/puppet [21:06:49] i think another reason to keep the main clone separate from the puppetmaster's conf dir is so that the clone can be clean [21:06:55] hrm, or maybe it didn't [21:09:18] ah, pmtpa hosts are still logging wfIncrStats calls, of course. 
but not profiling any more. [21:12:32] RECOVERY - Packetloss_Average on gadolinium is OK: OK: packet_loss_average is 2.65400275 [21:17:50] k i gotta run, laters [21:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 21:24:45 UTC 2013 [21:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [21:26:03] (PS1) Cmjohnson: adding ssl1005/6 to dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/75250 [21:27:37] (CR) Cmjohnson: [C: 2 V: 2] adding ssl1005/6 to dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/75250 (owner: Cmjohnson) [21:27:38] (Merged) Cmjohnson: adding ssl1005/6 to dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/75250 (owner: Cmjohnson) [21:29:02] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 21:28:57 UTC 2013 [21:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [21:32:02] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 21:31:58 UTC 2013 [21:32:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [21:33:22] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 21:33:15 UTC 2013 [21:34:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [21:47:02] RECOVERY - Puppet freshness on bast1001 is OK: puppet ran at Mon Jul 22 21:46:59 UTC 2013 [21:47:29] oh noes, dns froze up on ns1 [21:47:35] !log restarting powerdns on ns1 [21:47:46] Logged the message, Mistress of the network gear. [21:47:53] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:48:20] Ryan_Lane, what systems (if any) use sockpuppet as their puppet master? [21:48:32] all systems use sockpuppet for certs [21:48:52] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [21:49:21] Ryan_Lane, why use a different box for certs vs. everything else? [21:49:42] historical -- when we moved from sockpuppet to stafford didn't want to redo all the certs [21:49:42] because no one has finished moving stuff [21:49:50] Ah, ok. [21:49:52] salt is also on sockpuppet [21:49:52] RECOVERY - Auth DNS on ns1.wikimedia.org is OK: DNS OK: 0.035 seconds response time. www.wikipedia.org returns 208.80.154.225 [21:50:15] I need to enable multi-master salt and put one in eqiad [21:52:01] So, as far as the public repo is concerned, is this diagram accurate? https://wikitech.wikimedia.org/wiki/File:Lifeofpuppetpatch.png [21:52:56] needs more tigers and jackals [21:53:28] it's partially wrong [21:53:35] virt0 pulls directly from gerrit [21:53:45] on a cron [21:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 21:54:51 UTC 2013 [21:55:21] ok, fixing that part... [21:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [21:57:27] Ryan_Lane: OK, updated. My question is: why do we have both /etc/puppet and /root/puppet on sockpuppet? [21:57:42] historical reasons [21:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 21:57:38 UTC 2013 [21:58:12] any *good* reason? :P [21:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [21:58:27] not really. 
I set it up that way and no one has ever changed it [21:58:38] (sorry, I can only say that since I'm a non-interested/affected party) [21:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 21:58:49 UTC 2013 [21:58:53] Ryan_Lane, actually, I just changed the diagram properly to reflect my understanding, which is... [21:58:53] it was changed on new systems [21:59:21] puppet merge updates /root/puppet, then copies everything into /etc/puppet then tells stafford to pull, which pulls from the repo in /etc/puppet [21:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [21:59:26] correct? [21:59:45] !log dns update [21:59:55] Logged the message, Master [22:01:35] andrewbogott: yep [22:01:50] it's kind of silly :) [22:02:10] puppet-merge should probably just update /var/lib/git/operations [22:02:32] PROBLEM - Auth DNS on ns2.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [22:02:37] Nah, that part is right, since /var/lib/git/operations is the waiting room for patches that aren't officially 'merged' yet [22:02:48] I mean, it adds an additional, possibly silly, security layer. [22:02:52] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 22:02:50 UTC 2013 [22:03:02] not really, because it's doing a fetch and then a compare [22:03:14] it only applies if you say yes [22:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [22:03:43] (PS1) Cmjohnson: changing ssl1006 dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/75257 [22:03:50] !log restarting ns2 [22:03:53] um… but right now doesn't puppet merge merge from /var/lib/git/operations into /root/puppet? [22:04:00] Logged the message, Master [22:04:08] I think the original reason it was in /root was that /etc/puppet wasn't a git repo [22:04:12] So it doesn't make sense to have it merge into /var/lib/git/operations 'cause it would just be merging it with itself [22:04:25] and that I had a hook to rsync everything on merge [22:04:44] (CR) Cmjohnson: [C: 2 V: 2] changing ssl1006 dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/75257 (owner: Cmjohnson) [22:04:45] (Merged) Cmjohnson: changing ssl1006 dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/75257 (owner: Cmjohnson) [22:04:51] I mean just get rid of /root/puppet [22:05:09] /root/puppet isn't really used for anything as far as I can tell [22:05:22] RECOVERY - Auth DNS on ns2.wikimedia.org is OK: DNS OK: 0.096 seconds response time. www.wikipedia.org returns 208.80.154.225 [22:05:49] it really didn't like the large amount of changes [22:05:50] Ah, I see what you're saying. Yeah, I think that's right. [22:06:52] So, I started looking at this because I wanted to change one of the hooks in /root/puppet. But now I think I should just replace everything in that hook with stuff in puppet-merge. [22:07:13] yep [22:07:33] down with my 3 year old cruft! :D [22:09:44] So, we would end up with this: https://wikitech.wikimedia.org/wiki/File:Lifeofpuppetpatchfuture.png [22:11:08] How does sockpuppet's /var/lib/git/operations/puppet repo get updated when gerrit merges a patch? 
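To make the fetch/compare/confirm behaviour described above concrete, here is a heavily simplified sketch of the kind of flow puppet-merge implements; it is not the real script, and the branch name, prompt wording, and everything beyond "fetch, show the diff, apply only on yes" are assumptions:

    #!/bin/bash
    # simplified sketch of the puppet-merge flow described in this conversation
    set -e
    cd /root/puppet                            # clone where merges are applied
    git fetch origin                           # fetch only; nothing is applied yet
    git log --stat HEAD..origin/production     # show what would be merged ('production' branch name assumed)
    read -r -p 'Merge these changes? [y/N] ' answer
    [ "$answer" = "y" ] || exit 1              # it only applies if you say yes
    git merge --ff-only origin/production
    git submodule update --init                # plus submodule handling
    # the hand-off (copy into /etc/puppet, tell stafford to pull) then happens in the
    # post-merge hook, per the sketch earlier in this log
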
[22:11:21] andrewbogott: /etc/puppet should have links to /var/lib/git/operations/puppet [22:12:00] Ryan_Lane, Uhoh, I'm confused again [22:12:16] puppet is configured to use /etc/puppet [22:12:27] but the repo doesn't match up well with that filesystem layout [22:12:40] does 'puppet merge' merge from sockpuppet (in one place) to sockpuppet (in another place)? Or does it merge from gerrit? [22:12:44] so, we link directories in /etc/puppet to directories in /var/lib/git/operations/puppet [22:12:56] just merges from gerrit, I believe [22:13:03] it does a fetch, then a compare [22:13:09] then when you say yes, a merge [22:13:19] + logic for submodules [22:14:02] Ryan_Lane, it doesn't look to me, right now, like /etc/puppet is linked to /var/lib/git/ on sockpuppet. They seem totally disjoint. [22:14:05] Am I missing a link? [22:14:16] yo MaxSem, do you know who is working on getting solr off of vanadium? [22:14:20] hm. maybe that's never been fixed? [22:15:12] that's why I'm so confused… I can't tell what each of those dirs do [22:15:20] hence my weird diagram, which is still not right :( [22:15:22] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:40] puppet is served from /etc/puppet [22:15:42] for sure [22:15:53] maybe /var/lib/... updates /etc? [22:16:09] I bet that /var/lib/... is updated but just sits there untouched by anything. 
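A minimal sketch of the linking scheme described just above, with directories inside /etc/puppet pointing at the checkout in /var/lib/git; which subdirectories actually get linked is not stated in the log, so the list here is illustrative:

    # illustrative only -- run on a puppetmaster that serves from /etc/puppet
    ln -sfn /var/lib/git/operations/puppet/manifests /etc/puppet/manifests
    ln -sfn /var/lib/git/operations/puppet/modules   /etc/puppet/modules
    ln -sfn /var/lib/git/operations/puppet/templates /etc/puppet/templates
    ln -sfn /var/lib/git/operations/puppet/files     /etc/puppet/files
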
[22:35:38] notpeter, just apply the same role as vanadium and poke nikerabbit to index it and make it used by MW config [22:36:06] MaxSem: cool, can do [22:37:36] (PS1) Aaron Schulz: Added DB performance log [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75260 [22:38:10] (CR) jenkins-bot: [V: -1] Added DB performance log [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75260 (owner: Aaron Schulz) [22:39:14] (PS2) Aaron Schulz: Added DB performance log [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/75260 [22:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 22:54:44 UTC 2013 [22:55:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [22:58:12] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 22:58:11 UTC 2013 [22:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [22:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 22:58:47 UTC 2013 [22:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [23:02:12] Yay, lightning deploy! [23:02:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 23:02:41 UTC 2013 [23:03:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [23:07:23] !log anomie synchronized php-1.22wmf10/extensions/CentralAuth 'Update CentralAuth to fix bug 51644' [23:07:52] !log anomie synchronized php-1.22wmf11/extensions/CentralAuth 'Update CentralAuth to fix bug 51644' [23:09:51] (PS1) Andrew Bogott: Simplify our puppet master setup. [operations/puppet] - https://gerrit.wikimedia.org/r/75263 [23:10:07] (CR) jenkins-bot: [V: -1] Simplify our puppet master setup. [operations/puppet] - https://gerrit.wikimedia.org/r/75263 (owner: Andrew Bogott) [23:10:21] (CR) Andrew Bogott: [C: -1] "Do not merge yet, this will need some hand-holding when it applies." 
[operations/puppet] - https://gerrit.wikimedia.org/r/75263 (owner: Andrew Bogott) [23:11:14] Ryan_Lane, regarding ^, I could use a hand sorting out how to handle the private repo [23:16:02] (PS2) Ori.livneh: Refactor sysctl [operations/puppet] - https://gerrit.wikimedia.org/r/75087 [23:16:06] (CR) Lcarr: "(2 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/75263 (owner: Andrew Bogott) [23:16:17] (CR) jenkins-bot: [V: -1] Refactor sysctl [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh) [23:18:57] (PS3) Ori.livneh: Refactor sysctl [operations/puppet] - https://gerrit.wikimedia.org/r/75087 [23:24:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 23:24:46 UTC 2013 [23:25:32] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [23:28:22] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 23:28:20 UTC 2013 [23:28:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 23:28:51 UTC 2013 [23:29:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [23:29:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [23:32:42] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Mon Jul 22 23:32:37 UTC 2013 [23:33:22] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [23:42:47] (PS1) Pyoungmeister: adding zinc as rtt solr host [operations/puppet] - https://gerrit.wikimedia.org/r/75265 [23:50:28] (CR) MaxSem: [C: 1] adding zinc as rtt solr host [operations/puppet] - https://gerrit.wikimedia.org/r/75265 (owner: Pyoungmeister) [23:53:21] (Abandoned) Andrew Bogott: Half-assed attempt to track the post-merge hook on sockpuppet. [operations/puppet] - https://gerrit.wikimedia.org/r/75159 (owner: Andrew Bogott) [23:53:38] (Abandoned) Andrew Bogott: Submodule update --init on stafford post merge. [operations/puppet] - https://gerrit.wikimedia.org/r/75160 (owner: Andrew Bogott) [23:54:52] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Jul 22 23:54:50 UTC 2013 [23:55:33] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [23:57:42] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jul 22 23:57:34 UTC 2013 [23:58:22] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [23:58:52] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Jul 22 23:58:42 UTC 2013 [23:59:22] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours