[00:01:09] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 21:59:40 UTC
[00:19:56] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Sun Jul 13 00:19:52 UTC 2014
[02:12:44] !log migratePass0.php finished a while back
[02:12:54] Logged the message, Master
[02:14:35] !log LocalisationUpdate completed (1.24wmf12) at 2014-07-13 02:13:32+00:00
[02:14:40] Logged the message, Master
[02:24:59] !log LocalisationUpdate completed (1.24wmf13) at 2014-07-13 02:23:56+00:00
[02:25:04] Logged the message, Master
[02:54:39] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 13 02:53:33 UTC 2014 (duration 53m 32s)
[02:54:45] Logged the message, Master
[05:41:22] (03CR) 10Giuseppe Lavagetto: [C: 031] "I erroneously did not remove this cipher. Apart from having !DH at the end of the chiphers list, we do not set a dh_param, so that for ngi" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145688 (owner: 10Dzahn)
[06:08:38] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:29:04] PROBLEM - puppet last run on search1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:24] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.005 second response time
[06:30:14] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:24] PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:24] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:44] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:24] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:44:05] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:45:25] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:46:15] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[06:46:25] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[06:46:25] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[06:46:45] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[10:39:39] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:40:09] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:40:19] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:43:19] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[10:46:20] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:46:20] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:58:09] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[10:58:09] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[10:58:20] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[10:58:21] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
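The review comment at 05:41 above points out that nginx cannot actually use DHE cipher suites unless a dh_param is configured, which is why keeping !DH at the end of the cipher list is the consistent choice. As a purely hypothetical Puppet-flavoured sketch of what providing one would involve (file paths and module names here are assumptions, not the actual operations/puppet nginx module):

    # Hypothetical sketch only; not the real operations/puppet nginx code.
    # For DHE suites to be negotiable, nginx needs an explicit dhparam file:
    file { '/etc/nginx/dhparam.pem':
      ensure => file,
      mode   => '0444',
      source => 'puppet:///modules/nginx/dhparam.pem',  # assumed file location
    }
    # ...and the vhost template would then have to reference it:
    #   ssl_dhparam /etc/nginx/dhparam.pem;
    # Without that directive, DHE ciphers left in the list would silently
    # never be offered, so excluding them with !DH is the safer default.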
[11:03:22] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[11:04:21] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[12:21:27] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures
[12:39:25] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[13:16:28] (03PS1) 10Matanya: gitblit: fully qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/145894
[14:46:37] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds
[15:33:44] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[17:17:42] PROBLEM - puppetmaster https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:18:32] RECOVERY - puppetmaster https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.095 second response time
[18:31:58] (03PS1) 10Ori.livneh: apache: on service refresh, do a graceful reload instead of start/stop [operations/puppet] - 10https://gerrit.wikimedia.org/r/145908
[20:01:58] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures
[20:20:45] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[20:20:45] PROBLEM - Packetloss_Average on analytics1003 is CRITICAL: packet_loss_average CRITICAL: 10.4707055
[20:30:45] RECOVERY - Packetloss_Average on analytics1003 is OK: packet_loss_average OKAY: 1.88484675
[20:46:18] PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 20.2849271667
[20:46:48] PROBLEM - Packetloss_Average on analytics1003 is CRITICAL: packet_loss_average CRITICAL: 21.0789915833
[20:53:12] (03PS1) 10BryanDavis: labs_vagrant: Install to /srv/vagrant [operations/puppet] - 10https://gerrit.wikimedia.org/r/145974
[20:53:14] (03PS1) 10BryanDavis: labs_vagrant: cleanup sudoers config [operations/puppet] - 10https://gerrit.wikimedia.org/r/145975
[20:56:01] (03PS2) 10BryanDavis: labs_vagrant: Install to /srv/vagrant [operations/puppet] - 10https://gerrit.wikimedia.org/r/145974
[20:59:06] (03PS3) 10BryanDavis: labs_vagrant: Install to /srv/vagrant [operations/puppet] - 10https://gerrit.wikimedia.org/r/145974
[21:00:55] (03PS4) 10BryanDavis: labs_vagrant: Install to /srv/vagrant [operations/puppet] - 10https://gerrit.wikimedia.org/r/145974
[21:03:53] !log git-deploy: Deploying integration/slave-scripts I7f2b476807465
[21:03:59] Logged the message, Master
[21:16:16] RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.22043441667
[21:16:56] (03PS1) 10QChris: Reflect move of refinery script to drop partitions [operations/puppet] - 10https://gerrit.wikimedia.org/r/145980
[21:17:36] (03CR) 10QChris: "This change depends on" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145980 (owner: 10QChris)
[21:30:43] PROBLEM - Packetloss_Average on analytics1003 is CRITICAL: packet_loss_average CRITICAL: 10.8208922689
[21:32:14] PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 13.9927621008
[21:33:00] (03PS5) 10BryanDavis: labs_vagrant: Install to /srv/vagrant [operations/puppet] - 10https://gerrit.wikimedia.org/r/145974
[21:36:14] RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.36625383333
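The change proposed at 18:31 (https://gerrit.wikimedia.org/r/145908) concerns how Puppet refreshes the Apache service. A minimal sketch of the general mechanism, not the patch itself: a service resource can be given an explicit restart command, which Puppet then uses on refresh instead of a stop/start cycle. The resource name and command path below are assumptions for illustration only.

    # Sketch under assumed names/paths; not the actual apache module change.
    service { 'apache2':
      ensure     => running,
      enable     => true,
      hasrestart => true,
      # When another resource notifies this service, Puppet runs this command
      # instead of stopping and starting the daemon.
      restart    => '/usr/sbin/apache2ctl graceful',  # assumed binary path
    }

A graceful reload lets in-flight requests finish, so config-triggered refreshes avoid the brief outage a full stop/start causes.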
[21:40:50] RECOVERY - Packetloss_Average on analytics1003 is OK: packet_loss_average OKAY: 1.80645991597
[21:44:50] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[21:46:21] (03CR) 10BryanDavis: "Tested via cherry-pick to self-hosted puppetmaster on bd808-vagrant.wmflabs.net instance in wikimedia-support project." [operations/puppet] - 10https://gerrit.wikimedia.org/r/145974 (owner: 10BryanDavis)
[21:46:51] (03PS2) 10BryanDavis: labs_vagrant: cleanup sudoers config [operations/puppet] - 10https://gerrit.wikimedia.org/r/145975
[21:52:11] PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 23.4265182353
[21:56:49] bd808: Do you know anything about 'cobalt'?
[21:56:50] PROBLEM - Packetloss_Average on analytics1003 is CRITICAL: packet_loss_average CRITICAL: 42.41308275
[21:56:58] Doesn't show up in http://ganglia.wikimedia.org/latest/ and not documented on wikitech
[21:57:03] bd808: ty for the patches!
[21:57:12] Noticed it got an IP recently; https://gerrit.wikimedia.org/r/#/c/139080/
[21:57:31] Krinkle: Hmm.. no I haven't heard about it
[21:57:51] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[21:58:10] YuviPanda: yw! I was playing with labs-vagrant and ran my disk up to 98% :)
[21:58:19] :D
[21:59:43] YuviPanda: I was wondering if the labs_vagrant role should run `labs-vagrant provision` too. Thoughts?
[22:00:01] bd808: right, but that would mean running provision every time puppet parent runs (30m?)
[22:00:44] yeah, it would, but "usually" it should be a slow no-op right?
[22:01:16] It would set up the wiki the first time. Maybe it could be done with a notify for the initial provision instead.
[22:01:28] I hate notify though. It makes puppet non-deterministic
[22:04:17] Krinkle: cobalt doesn't seem to be in site.pp, so I'd guess it's a new misc server for something that isn't provisioned yet.
[22:04:32] bd808: right. I'm 50/50 on that, but don't mind either way.
[22:05:04] https://github.com/search?q=cobalt.wikimedia.org+%40wikimedia&type=Code&ref=searchresults
[22:05:28] bd808: Krinkle It's currently awaiting a new hard drive according to RT
[22:05:35] https://github.com/wikimedia/operations-puppet/search?q=cobalt&ref=cmdform
[22:05:38] so it's likely unassigned and down as being dead
[22:05:39] Yeah
[22:05:45] k
[22:12:50] !log stopping puppet on rcs1001 to debug nginx issue
[22:12:55] Logged the message, Master
[22:36:18] RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.67081932773
[22:41:53] RECOVERY - Packetloss_Average on analytics1003 is OK: packet_loss_average OKAY: 0.146342857143
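To make the "notify for the initial provision" idea from the 21:59-22:01 exchange concrete, one possible shape is a refresh-only exec in the labs_vagrant role. This is purely illustrative: the resource names, command path, and the subscribed resource are assumptions, not the real operations/puppet labs_vagrant module.

    # Hypothetical sketch of running `labs-vagrant provision` once, on notify.
    exec { 'labs_vagrant_initial_provision':
      command     => '/usr/local/bin/labs-vagrant provision',  # assumed path
      refreshonly => true,   # only runs when a subscribed resource changes
      timeout     => 3600,   # the first provision can take a while
      subscribe   => Git::Clone['vagrant'],  # hypothetical resource that creates the checkout
    }

With refreshonly the half-hourly agent run stays a cheap no-op, but the trade-off is exactly the non-determinism objected to at 22:01: the exec only fires when the subscribed resource reports a change, not on every run.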