[00:12:36] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [00:23:08] yurikR: yes [00:23:31] bblack, ok, submitting in a min [00:28:31] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [00:31:53] jgonera: [00:31:58] dammit [00:32:10] is Jeff_Green around? [00:36:17] (03PS1) 10Yurik: Added 404-01 https support [operations/puppet] - 10https://gerrit.wikimedia.org/r/152835 [00:36:18] bblack, ^ [00:37:47] (03CR) 10BBlack: [C: 032 V: 032] Added 404-01 https support [operations/puppet] - 10https://gerrit.wikimedia.org/r/152835 (owner: 10Yurik) [00:37:50] thx [00:38:07] np [00:38:13] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [00:40:11] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 96 seconds ago with 0 failures [00:40:12] PROBLEM - Puppet freshness on virt1009 is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 18:37:41 UTC [00:50:35] Jeff_Green: when you see this: please revert putting nrpe on ocg. it is already called via base class which is included in standard which is already declared on ocg hosts in site [00:50:43] .pp [00:51:31] if you won't push such a patch or don't see this message i'll push such a patch tomorrow [00:53:32] (03PS2) 10Gage: Deb for logstash-gelf.jar: liblogstash-gelf-java [operations/debs/logstash-gelf] - 10https://gerrit.wikimedia.org/r/151615 [00:57:03] (03CR) 10Gage: "Fixed trailing whitespace, clarified README.source." 
[operations/debs/logstash-gelf] - 10https://gerrit.wikimedia.org/r/151615 (owner: 10Gage) [01:05:09] PROBLEM - puppet last run on amssq43 is CRITICAL: CRITICAL: Epic puppet fail [01:06:48] (03PS1) 10Yurik: Fix kafka & udp2log filtering of ZERO [operations/puppet] - 10https://gerrit.wikimedia.org/r/152836 [01:17:35] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [01:19:01] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [01:24:55] akosiaris, andrewbogott_afk: apologies for breaking puppet on labsdb1003 earlier [01:25:03] (and thanks for the revert) [01:32:10] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 15:28:39 UTC [01:33:26] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [01:33:26] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 13 data above and 8 below the confidence bounds [01:44:04] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [02:07:48] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [02:12:49] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [02:17:16] !log LocalisationUpdate completed (1.24wmf15) at 2014-08-08 02:16:13+00:00 [02:17:27] Logged the message, Master [02:23:05] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [02:29:55] !log LocalisationUpdate completed (1.24wmf16) at 2014-08-08 02:28:39+00:00 [02:30:02] Logged the message, Master [02:41:02] PROBLEM - Puppet freshness on virt1009 is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 18:37:41 UTC [02:48:40] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly 
detected [03:13:28] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Aug 8 03:12:21 UTC 2014 (duration 12m 20s) [03:13:32] Logged the message, Master [03:18:48] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [03:30:08] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [03:33:20] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 15:28:39 UTC [03:35:11] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 01:34:30 UTC [04:43:16] PROBLEM - Puppet freshness on virt1009 is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 18:37:41 UTC [05:13:07] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Fri Aug 8 05:13:02 UTC 2014 [05:34:51] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 15:28:39 UTC [05:56:16] (03PS1) 10BBlack: Switch uploads to temporary map for ulsfo turnup [operations/dns] - 10https://gerrit.wikimedia.org/r/152844 [06:02:30] (03CR) 10BBlack: [C: 032] Switch uploads to temporary map for ulsfo turnup [operations/dns] - 10https://gerrit.wikimedia.org/r/152844 (owner: 10BBlack) [06:03:02] !log ongoing schema changes: rev_content_model, rev_content_format. on terbium, osc_host.sh processes ok to kill in emergency [06:03:08] Logged the message, Master [06:06:58] (03PS1) 10BBlack: move canada uploads back to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/152845 [06:08:45] (03CR) 10BBlack: [C: 032] move canada uploads back to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/152845 (owner: 10BBlack) [06:23:00] (03PS1) 10Springle: Remove mysql_multi_instance from labsdb1003, for upgrade. [operations/puppet] - 10https://gerrit.wikimedia.org/r/152846 [06:24:00] (03CR) 10Springle: [C: 032] Remove mysql_multi_instance from labsdb1003, for upgrade. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/152846 (owner: 10Springle) [06:29:34] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: Epic puppet fail [06:29:34] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:54] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Epic puppet fail [06:29:54] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Epic puppet fail [06:30:55] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:05] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:05] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:47] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:47] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:47] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:48] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:48] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:48] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:48] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:49] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:49] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:50] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:50] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:51] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:57] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:58] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 1 
failures [06:32:58] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:07] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:08] PROBLEM - puppet last run on mw1150 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:08] PROBLEM - puppet last run on mw1217 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:08] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:17] PROBLEM - puppet last run on mw1088 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:18] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:38] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:39:07] ^? [06:39:52] eh seems to be 500 server error on palladium puppet http server [06:44:33] PROBLEM - Puppet freshness on virt1009 is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 18:37:41 UTC [06:44:43] RECOVERY - puppet last run on mw1217 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [06:44:57] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [06:46:05] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:46:05] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:46:05] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [06:46:05] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:46:05] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 80 seconds ago with 0 failures [06:46:06] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 17 seconds 
ago with 0 failures [06:46:06] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 68 seconds ago with 0 failures [06:46:06] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:46:14] RECOVERY - puppet last run on mw1150 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [06:46:14] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [06:46:15] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [06:46:15] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 97 seconds ago with 0 failures [06:46:15] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 83 seconds ago with 0 failures [06:46:15] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:46:25] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 87 seconds ago with 0 failures [06:46:34] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [06:47:05] RECOVERY - puppet last run on mw1088 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [06:47:05] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:47:06] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [06:47:06] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:47:14] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [06:47:15] RECOVERY - puppet last run on mw1123 
is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [06:47:15] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [06:47:24] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [06:48:54] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [07:01:28] (03PS1) 10BBlack: move california uploads back to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/152847 [07:02:14] (03CR) 10BBlack: [C: 032] move california uploads back to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/152847 (owner: 10BBlack) [07:15:24] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Unluckily this does not work, due to strange behaviours in both apache and hhvm" [operations/puppet] - 10https://gerrit.wikimedia.org/r/152753 (owner: 10Ori.livneh) [07:23:05] (03PS2) 10Giuseppe Lavagetto: hhvm: fix status [operations/puppet] - 10https://gerrit.wikimedia.org/r/152079 [07:35:34] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 15:28:39 UTC [07:39:11] (03PS3) 10Giuseppe Lavagetto: hhvm: fix status [operations/puppet] - 10https://gerrit.wikimedia.org/r/152079 [07:39:32] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] hhvm: fix status [operations/puppet] - 10https://gerrit.wikimedia.org/r/152079 (owner: 10Giuseppe Lavagetto) [07:40:31] doh [07:40:39] i was just about to amend [07:40:41] <_joe_> ? [07:40:47] but yeah, +1 for the overall approach [07:41:01] <_joe_> oh, wait, what did you want to amend? 
[07:41:24] i'll submit as a separate patch [07:41:27] good morning btw :) [07:41:30] <_joe_> yes please [07:41:35] <_joe_> good morning :) [07:44:39] (03PS1) 10Ori.livneh: hhvm::status -> hhvm::admin + small lint fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/152849 [07:44:54] ^ _joe_ [07:45:23] (03CR) 10jenkins-bot: [V: 04-1] hhvm::status -> hhvm::admin + small lint fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/152849 (owner: 10Ori.livneh) [07:45:29] grr [07:46:18] (03PS2) 10Ori.livneh: hhvm::status -> hhvm::admin + small lint fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/152849 [07:50:26] (03CR) 10Giuseppe Lavagetto: [C: 032] hhvm::status -> hhvm::admin + small lint fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/152849 (owner: 10Ori.livneh) [07:55:46] (03PS2) 10Giuseppe Lavagetto: hhvm: add process monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/152080 [07:57:14] (03PS3) 10Giuseppe Lavagetto: hhvm: add process monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/152080 [07:57:47] <_joe_> ori: ^ [07:58:59] (03PS2) 10Giuseppe Lavagetto: mediawiki: basic HHVM monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/152081 [07:59:21] (03PS1) 10Ori.livneh: hhvm::admin: bring back $ensure [operations/puppet] - 10https://gerrit.wikimedia.org/r/152850 [07:59:30] shhh [08:00:19] _joe_: https://gerrit.wikimedia.org/r/#/c/152080/ looks good but -- [08:00:38] the ':100' is a bit funny, i think you can just make it '${process_count}:' to indicate no upper limit' [08:01:05] <_joe_> ori: mmmh the docs suggest otherwise [08:01:14] <_joe_> but let me try it [08:01:26] RANGEs are specified 'min:max' or 'min:' or ':max' (or 'max'). 
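A minimal sketch, in Python, of the RANGE forms quoted from the check_procs documentation above. This is not the plugin's actual implementation: the function names are hypothetical, and it assumes the usual reading that a check alerts when the process count falls outside the range, with a bare 'max' treated like ':max'. Other forms the plugin may accept are not handled.

```python
def parse_range(spec):
    """Parse 'min:max', 'min:', ':max' or 'max' into (minimum, maximum).

    None means "no limit on that side". Hypothetical helper, for illustration only.
    """
    if ":" not in spec:
        # Bare 'max' form: only an upper limit is given.
        return (None, int(spec))
    low, high = spec.split(":", 1)
    return (int(low) if low else None, int(high) if high else None)


def is_outside(count, spec):
    """Return True if count falls outside the range described by spec."""
    minimum, maximum = parse_range(spec)
    if minimum is not None and count < minimum:
        return True
    if maximum is not None and count > maximum:
        return True
    return False


if __name__ == "__main__":
    # '1:' has no upper limit, so only a count of zero is outside the range.
    for count in (0, 1, 500):
        print(count, is_outside(count, "1:"))
```

Under this reading, a spec such as '1:' means "at least one process, no upper limit", which is the form suggested in the channel in place of an explicit ':100' ceiling.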
[08:01:28] https://www.monitoring-plugins.org/doc/man/check_procs.html [08:01:53] <_joe_> mmmh, I looked at the inline help on one server [08:02:35] <_joe_> and yes, as usual, the site is more updated than the code [08:02:37] manifests/swift.pp:312: nrpe_command => "/usr/lib/nagios/plugins/check_procs -c 1: --ereg-argument-array='^/usr/bin/python /usr/bin/${title}'", [08:02:42] <_joe_> yes, it works [08:02:52] <_joe_> btw, ereg is not needed here [08:03:02] <_joe_> luckily [08:03:23] yeah just grepped for other instances of 'min:' [08:03:46] (03CR) 10Giuseppe Lavagetto: [C: 032] hhvm::admin: bring back $ensure [operations/puppet] - 10https://gerrit.wikimedia.org/r/152850 (owner: 10Ori.livneh) [08:06:59] sorry about that :P [08:07:04] (03PS3) 10Giuseppe Lavagetto: mediawiki: basic HHVM monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/152081 [08:07:19] (03CR) 10Ori.livneh: [C: 031] mediawiki: basic HHVM monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/152081 (owner: 10Giuseppe Lavagetto) [08:09:35] (03PS4) 10Giuseppe Lavagetto: hhvm: add process monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/152080 [08:14:49] (03CR) 10Giuseppe Lavagetto: [C: 032] hhvm: add process monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/152080 (owner: 10Giuseppe Lavagetto) [08:15:07] (03PS2) 10Giuseppe Lavagetto: hhvm::admin: bring back $ensure [operations/puppet] - 10https://gerrit.wikimedia.org/r/152850 (owner: 10Ori.livneh) [08:15:08] <_joe_> ouch [08:15:23] <_joe_> I did only 'publish' and not 'publish and submit' [08:15:24] <_joe_> my bad [08:15:31] (03CR) 10Giuseppe Lavagetto: [V: 032] hhvm::admin: bring back $ensure [operations/puppet] - 10https://gerrit.wikimedia.org/r/152850 (owner: 10Ori.livneh) [08:15:44] (03PS4) 10Giuseppe Lavagetto: mediawiki: basic HHVM monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/152081 [08:15:54] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki: basic HHVM 
monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/152081 (owner: 10Giuseppe Lavagetto) [08:21:14] (03PS1) 10BBlack: move the rest of the US uploads back to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/152851 [08:22:25] (03PS2) 10Giuseppe Lavagetto: mediawiki: move testwiki to HAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/152082 [08:22:32] (03CR) 10jenkins-bot: [V: 04-1] mediawiki: move testwiki to HAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/152082 (owner: 10Giuseppe Lavagetto) [08:22:34] (03PS3) 10Giuseppe Lavagetto: mediawiki: move testwiki to HAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/152082 [08:33:41] (03PS1) 10Ori.livneh: wmflib: add apply_format() [operations/puppet] - 10https://gerrit.wikimedia.org/r/152852 [08:33:48] ..breakfast for real now [08:43:52] (03CR) 10BBlack: [C: 032] move the rest of the US uploads back to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/152851 (owner: 10BBlack) [08:44:29] PROBLEM - puppetmaster https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:13] PROBLEM - Puppet freshness on virt1009 is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 18:37:41 UTC [08:45:13] <_joe_> which is not true [08:45:19] RECOVERY - puppetmaster https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.038 second response time [08:45:25] <_joe_> oh there is one defunct apache child [09:04:59] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: Epic puppet fail [09:08:03] <_joe_> mmmh that doesn't look good [09:08:38] traffic's ok at routers still [09:08:48] <_joe_> yes it's a puppet master failure [09:09:06] Could not retrieve catalog from remote server: Error 502 on SERVER: [09:09:11] <_joe_> yes [09:09:13] been seeing those more and more lately [09:09:14] <_joe_> thank you ruby [09:09:30] <_joe_> bblack: it's the puppet runs every 20 mins I guess [09:09:40] (03Abandoned) 10Dzahn: remove 
chwiki.wordpress from French planet feed [operations/puppet] - 10https://gerrit.wikimedia.org/r/152786 (owner: 10Dzahn) [09:09:43] <_joe_> plus us still running the puppetmaster on ruby 1.8 [09:10:12] yeah looking at palladium, it just plain runs out of CPU power during little spikes [09:10:26] <_joe_> yes [09:10:42] <_joe_> trusty, thus ruby 2.0, will heal that probably [09:11:00] ruby 2.0 will fix the world! [09:11:22] <_joe_> no, but is supposed to be less horribly inefficient than ruby 1.8 [09:12:06] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: move testwiki to HAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/152082 (owner: 10Giuseppe Lavagetto) [09:16:37] This is a recurring topic ^^ [09:17:25] (03PS1) 10Dzahn: fix Apache site setup for Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/152854 [09:19:05] <_joe_> Nemo_bis: I know [09:19:20] <_joe_> buongiorno, btw :) [09:21:59] PROBLEM - Puppet freshness on mw1130 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 07:21:24 UTC [09:23:16] <_joe_> I'll reinstall testwiki in 5 minutes tops [09:23:25] <_joe_> just FYI [09:23:59] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [09:29:17] <_joe_> !log reimaging mw1017 aka testwiki. [09:29:24] Logged the message, Master [09:35:59] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 15:28:39 UTC [09:37:35] (03PS1) 10Giuseppe Lavagetto: test.wp.org: move to mw1018 [operations/puppet] - 10https://gerrit.wikimedia.org/r/152858 [09:39:40] the osmium puppet freshness is me, i forgot to re-enable puppet after experimenting with apache [09:39:40] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: Puppet last ran 65407 seconds ago, expected 14400 [09:39:42] fixing now [09:40:37] <_joe_> ori: yes figured that. 
osmium is a playground [09:40:44] <_joe_> so no need to log it [09:40:54] <_joe_> if you do, it's nice [09:40:59] RECOVERY - Puppet freshness on osmium is OK: puppet ran at Fri Aug 8 09:40:53 UTC 2014 [09:41:40] RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [09:52:07] (03PS1) 10Giuseppe Lavagetto: HAT: fix class inclusion [operations/puppet] - 10https://gerrit.wikimedia.org/r/152862 [09:52:34] (03CR) 10Giuseppe Lavagetto: [C: 032] HAT: fix class inclusion [operations/puppet] - 10https://gerrit.wikimedia.org/r/152862 (owner: 10Giuseppe Lavagetto) [09:52:44] (03CR) 10Giuseppe Lavagetto: [V: 032] HAT: fix class inclusion [operations/puppet] - 10https://gerrit.wikimedia.org/r/152862 (owner: 10Giuseppe Lavagetto) [09:55:19] (03PS2) 10Alexandros Kosiaris: osm.planet sync up [operations/puppet] - 10https://gerrit.wikimedia.org/r/136740 [10:00:09] 'giorno _joe_ [10:00:19] (03PS1) 10Alexandros Kosiaris: Handle daemon restarts [operations/debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/152863 [10:01:38] (03CR) 10Alexandros Kosiaris: [C: 032] osm.planet sync up [operations/puppet] - 10https://gerrit.wikimedia.org/r/136740 (owner: 10Alexandros Kosiaris) [10:04:55] (03PS1) 10BBlack: move oceania uploads back to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/152865 [10:05:27] (03CR) 10BBlack: [C: 032] move oceania uploads back to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/152865 (owner: 10BBlack) [10:10:08] (03PS2) 10RobH: decom tantalum, former OCG QA box [operations/puppet] - 10https://gerrit.wikimedia.org/r/152739 (owner: 10Dzahn) [10:11:37] (03CR) 10RobH: [C: 032] decom tantalum, former OCG QA box [operations/puppet] - 10https://gerrit.wikimedia.org/r/152739 (owner: 10Dzahn) [10:12:59] PROBLEM - puppet last run on mw1159 is CRITICAL: CRITICAL: Puppet has 1 failures [10:15:04] (03PS2) 10Ori.livneh: wmflib: add apply_format() [operations/puppet] - 
10https://gerrit.wikimedia.org/r/152852 [10:15:06] (03PS1) 10Ori.livneh: apache: add env-{available,enabled} [operations/puppet] - 10https://gerrit.wikimedia.org/r/152866 [10:15:22] ^ akosiaris: very interested in your feedback on the last one especially :) [10:15:51] (03CR) 10jenkins-bot: [V: 04-1] apache: add env-{available,enabled} [operations/puppet] - 10https://gerrit.wikimedia.org/r/152866 (owner: 10Ori.livneh) [10:16:04] jenkins has excellent comedic timing [10:16:13] <_joe_> lol [10:16:14] <_joe_> :) [10:16:17] hahaha [10:17:29] (03CR) 10Filippo Giunchedi: [C: 031] wmflib: add apply_format() [operations/puppet] - 10https://gerrit.wikimedia.org/r/152852 (owner: 10Ori.livneh) [10:17:53] (03PS2) 10Ori.livneh: apache: add env-{available,enabled} [operations/puppet] - 10https://gerrit.wikimedia.org/r/152866 [10:19:05] yuvipanda: the migration you were expecting regarding postgresql in labs is done. [10:20:35] PROBLEM - Host mw1130 is DOWN: PING CRITICAL - Packet loss = 100% [10:20:56] ? 
[10:23:52] (03PS1) 10Ori.livneh: mediawiki::hhvm: require 'apache' user [operations/puppet] - 10https://gerrit.wikimedia.org/r/152867 [10:31:56] RECOVERY - puppet last run on mw1159 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [10:33:28] (03PS1) 10Ori.livneh: apache::def: port to env-{enabled,disabled} [operations/puppet] - 10https://gerrit.wikimedia.org/r/152868 [10:34:07] (03CR) 10jenkins-bot: [V: 04-1] apache::def: port to env-{enabled,disabled} [operations/puppet] - 10https://gerrit.wikimedia.org/r/152868 (owner: 10Ori.livneh) [10:37:33] (03PS1) 10Ori.livneh: Partially revert 9393f239ff, moving dist-check in-line [operations/puppet] - 10https://gerrit.wikimedia.org/r/152869 [10:38:04] (03CR) 10Ori.livneh: [C: 032 V: 032] Partially revert 9393f239ff, moving dist-check in-line [operations/puppet] - 10https://gerrit.wikimedia.org/r/152869 (owner: 10Ori.livneh) [10:41:11] (03PS2) 10Ori.livneh: apache::def: port to env-{enabled,disabled} [operations/puppet] - 10https://gerrit.wikimedia.org/r/152868 [10:43:03] (03CR) 10Giuseppe Lavagetto: [C: 031] mediawiki::hhvm: require 'apache' user [operations/puppet] - 10https://gerrit.wikimedia.org/r/152867 (owner: 10Ori.livneh) [10:45:31] PROBLEM - Puppet freshness on virt1009 is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 18:37:41 UTC [10:47:07] (03PS2) 10Ori.livneh: mediawiki::hhvm: require 'apache' user [operations/puppet] - 10https://gerrit.wikimedia.org/r/152867 [10:47:12] (03CR) 10Ori.livneh: [C: 032 V: 032] mediawiki::hhvm: require 'apache' user [operations/puppet] - 10https://gerrit.wikimedia.org/r/152867 (owner: 10Ori.livneh) [10:52:11] (03CR) 10Alexandros Kosiaris: [C: 04-1] "There is a syntax error. The function looks fine to me, not sure about the use cases yet though" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/152852 (owner: 10Ori.livneh) [10:52:56] any updates on how long before test wiki is live again? 
[10:53:07] we need to run some user tests at wikimania [10:53:40] <_joe_> moizsyed: we're working on it, we hope to get back in an hour tops [10:53:46] moizsyed: can't you use test2wiki or beta? [10:54:02] (03CR) 10Alexandros Kosiaris: "Heh. looking at the use cases in https://gerrit.wikimedia.org/r/#/c/152866/2/modules/apache/manifests/init.pp right now" [operations/puppet] - 10https://gerrit.wikimedia.org/r/152852 (owner: 10Ori.livneh) [10:55:18] we're trying to test editing on apps, and test wiki is the only wiki the tests are setup on [10:55:41] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [10:55:41] we'll wait an hour i guess [10:55:57] (didn't really follow up) [10:56:16] yuvipanda: how hard would it be to enable editing on test2? [10:56:30] is it hard-coded in the app, or is it in mediawiki-config? [10:56:41] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [10:56:49] test2wiki doesnt even show up on apps, so maybe it is hard coded [10:56:53] does syncing to mw1017 work? [10:57:06] hoo: not yet [10:57:10] mh [10:57:12] ori: its all good, do your thing, we'll run user tests in an hour when you guys are done [10:57:18] ok, so you'll need to run sync-common on it :P [10:57:24] hoo: i know :) [10:57:32] :) [10:57:42] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [10:57:42] moizsyed: if enabling editing on test2 is just a matter of toggling a config var, that's easier [10:59:10] ori: im not sure if i can get to test2 on apps at all, at least not right now [11:00:27] (03CR) 10Alexandros Kosiaris: [C: 031] "I like this. 
A cleaner approach to maintaining environment variables suits me fine" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/152866 (owner: 10Ori.livneh) [11:01:00] !log hoo Synchronized php-1.24wmf15/extensions/CentralAuth/: Another shot towards bug 39996 (duration: 01m 04s) [11:01:06] Logged the message, Master [11:01:15] mw1130 also is awry it seems [11:02:40] !log hoo Synchronized php-1.24wmf16/extensions/CentralAuth/: Another shot towards bug 39996 (duration: 01m 04s) [11:02:45] Logged the message, Master [11:03:03] (03CR) 10Giuseppe Lavagetto: [C: 031] "love this." [operations/puppet] - 10https://gerrit.wikimedia.org/r/152866 (owner: 10Ori.livneh) [11:03:50] thanks hoo, let's cross fingers :) [11:03:59] (03CR) 10Alexandros Kosiaris: [C: 04-1] apache::def: port to env-{enabled,disabled} (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/152868 (owner: 10Ori.livneh) [11:04:18] Nemo_bis: Yeah :) I hope that was THE race condition causing all this trouble [11:05:00] <_joe_> hoo: I think mw1130 has some issues, checking [11:05:14] _joe_: can't ssh into it... :S [11:05:23] _joe_: I'll do that. go back to getting test wiki running [11:05:27] <_joe_> mmmh ok [11:05:35] packet loss even, might be totally down [11:05:41] <_joe_> akosiaris: I'm waiting for the code deploy to finish [11:06:56] got in console... doesn't look too good [11:07:07] all I get is [434531.619083] [11:07:25] i'll powercycle [11:09:21] !log running rsync-common on mw1017 [11:09:26] Logged the message, Master [11:09:41] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [11:12:14] (03CR) 10Ori.livneh: apache::def: port to env-{enabled,disabled} (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/152868 (owner: 10Ori.livneh) [11:25:31] hashar: hi! [11:25:51] legoktm: hello [11:25:52] hashar: is there a way to make a certain jenkins job non-voting for a branch? 
[11:26:18] specifically the "RL2" branch of the Gadgets repo is failing jslint, and we want to fix it in a follow-up commit [11:26:34] but we don't want to turn it off for master obviously [11:28:44] legoktm: no way I know of :-/ [11:29:12] legoktm: if you know you are going to fix jshint, just force merge the failing patch [11:29:20] and whenever the jshint commit lands, problem solved :] [11:29:41] !log mw1130 has broken disk [11:29:46] Logged the message, Master [11:30:44] legoktm: though we can restrict a job to only run on certain branches apparently. In Zuul, the jobs can accept a branch: parameter ( http://ci.openstack.org/zuul/zuul.html#jobs ) [11:30:50] RECOVERY - Host mw1130 is UP: PING OK - Packet loss = 0%, RTA = 4.59 ms [11:32:43] <_joe_> !log rebooting mw1017 [11:32:48] Logged the message, Master [11:33:00] PROBLEM - Apache HTTP on mw1130 is CRITICAL: Connection refused [11:33:10] PROBLEM - SSH on mw1130 is CRITICAL: Connection refused [11:33:10] PROBLEM - puppet disabled on mw1130 is CRITICAL: Connection refused by host [11:33:11] PROBLEM - check if dhclient is running on mw1130 is CRITICAL: Connection refused by host [11:33:11] PROBLEM - nutcracker process on mw1130 is CRITICAL: Connection refused by host [11:33:11] PROBLEM - RAID on mw1130 is CRITICAL: Connection refused by host [11:33:30] PROBLEM - Disk space on mw1130 is CRITICAL: Connection refused by host [11:33:31] PROBLEM - puppet last run on mw1130 is CRITICAL: Connection refused by host [11:34:00] PROBLEM - nutcracker port on mw1130 is CRITICAL: Timeout while attempting connection [11:34:00] PROBLEM - check configured eth on mw1130 is CRITICAL: Timeout while attempting connection [11:34:00] PROBLEM - DPKG on mw1130 is CRITICAL: Timeout while attempting connection [11:34:25] (03CR) 10Ori.livneh: wmflib: add apply_format() (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/152852 (owner: 10Ori.livneh) [11:36:40] ACKNOWLEDGEMENT - Host mw1130 is DOWN: PING CRITICAL - 
Packet loss = 100% alexandros kosiaris bad disk [11:37:15] ACKNOWLEDGEMENT - Apache HTTP on mw1130 is CRITICAL: Connection timed out alexandros kosiaris bad disk [11:37:15] ACKNOWLEDGEMENT - DPKG on mw1130 is CRITICAL: Timeout while attempting connection alexandros kosiaris bad disk [11:37:15] ACKNOWLEDGEMENT - Disk space on mw1130 is CRITICAL: Timeout while attempting connection alexandros kosiaris bad disk [11:37:15] ACKNOWLEDGEMENT - NTP on mw1130 is CRITICAL: NTP CRITICAL: No response from NTP server alexandros kosiaris bad disk [11:37:16] ACKNOWLEDGEMENT - Puppet freshness on mw1130 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 07:21:24 UTC alexandros kosiaris bad disk [11:37:16] ACKNOWLEDGEMENT - RAID on mw1130 is CRITICAL: Timeout while attempting connection alexandros kosiaris bad disk [11:37:16] ACKNOWLEDGEMENT - SSH on mw1130 is CRITICAL: Connection timed out alexandros kosiaris bad disk [11:37:17] ACKNOWLEDGEMENT - check configured eth on mw1130 is CRITICAL: Timeout while attempting connection alexandros kosiaris bad disk [11:37:17] ACKNOWLEDGEMENT - check if dhclient is running on mw1130 is CRITICAL: Timeout while attempting connection alexandros kosiaris bad disk [11:37:18] ACKNOWLEDGEMENT - nutcracker port on mw1130 is CRITICAL: Timeout while attempting connection alexandros kosiaris bad disk [11:37:18] ACKNOWLEDGEMENT - nutcracker process on mw1130 is CRITICAL: Timeout while attempting connection alexandros kosiaris bad disk [11:37:19] ACKNOWLEDGEMENT - puppet disabled on mw1130 is CRITICAL: Timeout while attempting connection alexandros kosiaris bad disk [11:37:19] ACKNOWLEDGEMENT - puppet last run on mw1130 is CRITICAL: Timeout while attempting connection alexandros kosiaris bad disk [11:42:29] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: Puppet has 1 failures [11:45:11] (03PS1) 10Ori.livneh: mediawiki: fix pidfile location in apache2.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/152896 [11:45:32] (03CR) 
10Ori.livneh: [C: 032 V: 032] "per joe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/152896 (owner: 10Ori.livneh) [11:46:09] (03CR) 10QChris: [C: 031] Fix kafka & udp2log filtering of ZERO (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/152836 (owner: 10Yurik) [11:48:28] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [11:51:07] hashar: errr, did jenkins stop? [11:52:20] legoktm: ah yeah [11:54:05] legoktm: unlocked somehow [11:55:38] PROBLEM - Apache HTTP on mw1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:56:28] RECOVERY - Apache HTTP on mw1017 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 444 bytes in 0.995 second response time [12:03:28] !log ori rebuilt wikiversions.cdb and synchronized wikiversions files: (no message) [12:03:34] Logged the message, Master [12:04:28] bd808: how do i rebuild the localisation cache again? [12:04:40] :P [12:04:54] hoo: ..? [12:05:12] mw-update-l10n [12:05:27] oh, right [12:07:39] !log ifconfig br0 0.0.0.0 on platinum to get rid of the IP on that interface and have facter work more reliably. This does not matter right now as it is an evaluation machine but logging it for completeness [12:07:44] Logged the message, Master [12:13:37] bd808: Exception from line 1327 of /usr/local/apache/common-local/php-1.24wmf15/includes/cache/LocalisationCache.php: Unable to open CDB file "/a/common/php-1.24wmf15/cache/l10n/l10n_cache-en.cdb.tmp.1186995746" for write. [12:13:42] wat? [12:13:42] !log reloading Jenkins [12:13:49] Logged the message, Master [12:15:50] <_joe_> ori: that dir is owned by mwdeploy [12:16:29] the update script sudos to mwdeploy [12:17:08] ori: Where and how are you running the l10n generation? [12:17:24] bd808: where and how should i run it? [12:17:47] :) Do you just need to build the cdbs from the json files? [12:17:54] yes [12:18:05] * bd808 goes to read scap code... 
[12:18:07] i fucking hate localisation update so much [12:20:32] sudo -u mwdeploy -n -- scap-rebuild-cdbs [12:20:45] is localisation update powered by hhvm yet? ;) [12:20:54] RECOVERY - Host platinum is UP: PING OK - Packet loss = 0%, RTA = 1.21 ms [12:21:14] ok, working now [12:21:19] <_joe_> :)) [12:21:22] where is this documented? [12:21:25] "sudo -u mwdeploy -n -- scap-rebuild-cdbs" i mean [12:21:35] not that it's anything but extremely obvious and intuitive [12:21:47] line 226 of scap/main.py :) [12:21:52] oh, right [12:21:58] silly me [12:22:04] <_joe_> lol [12:22:18] <_joe_> another 10 minutes then? [12:22:27] yes [12:22:35] hopefully less [12:23:02] <_joe_> hhvm running from cli will be slower than plain php [12:23:08] <_joe_> well, marginally so at least [12:23:10] https://test.wikipedia.org/wiki/Main_Page [12:23:30] is it many invocations again? [12:23:35] w00t [12:23:36] <_joe_> wow [12:23:55] https://test.wikipedia.org/wiki/Special:Version says it's running hhvm! [12:23:55] mark: just one for the localisation cache rebuild IIRC [12:23:57] HHVM 5.6.99-hhvm (srv) [12:23:58] \o/ [12:24:06] weeee! [12:24:12] why is it slower than zend then? [12:24:15] _joe_ is the man of the hour [12:24:32] * bd808 pats many people on the back [12:24:40] mark: it isn't; for very short-running jobs the overhead of jitting never has a chance to pay off [12:24:58] <_joe_> ori: but we turn off jitting in cli [12:25:00] yeah, hence my question. isn't localisation cache a long running job? :) [12:25:08] can't turn it on for that job? [12:25:16] ah, i guess we could [12:25:20] <_joe_> yes [12:25:28] <_joe_> ok so [12:25:31] but it didn't take all that long anyhow once bd808 helped me figure out how to actually run it [12:25:55] sebkac [12:26:15] the usual [12:26:19] l10n generation on tin is the long one. [12:26:56] too bad that testwiki has enough javascript on it to make the wiki still feel sluggish [12:27:14] is moodbar still enabled anywhere else?
[12:27:41] <_joe_> I'll take a short break [12:27:48] yes, me too [12:27:53] <_joe_> ping me here or by phone if something goes wrong [12:28:01] _joe_: dude, seriously, you're awesome [12:28:03] <_joe_> ori: good job :) [12:28:04] thanks very much for everything [12:28:15] <_joe_> thank you I guess [12:28:32] <_joe_> we've got an interesting week ahead [12:28:32] somebody around for creating a new wikitech instance? [12:28:46] <_joe_> jobrunner on fastcgi, first of all [12:28:48] !log Jenkins: somehow the ArtifactDeployer plugin got upgraded on Aug 7th 20:57 UTC despite it being broken {{bug|69197}}. Attempting manual downgrade [12:28:53] Logged the message, Master [12:29:01] _joe_: mw1053 is already on fastcgi ;) [12:29:11] actually, maybe not [12:29:15] * ori checks [12:29:18] <_joe_> ori: it's stopped [12:29:38] <_joe_> ori: someone *cough* put there an unpuppetized and wrong vhost *cough* [12:29:55] <_joe_> didn't you read my PMs the other day I assume [12:30:01] hah, must've missed them [12:30:05] <_joe_> I wasn't very pleased at the time [12:30:16] <_joe_> :) [12:30:22] must've missed good judgement again [12:30:41] i set it up for aaron to test against, it wouldn't have worked on osmium [12:32:18] look over there! a three-headed monkey! [12:32:21] * ori flees [12:33:26] !log testwiki up, judgement poor [12:33:31] Logged the message, Master [12:35:20] <_joe_> two-headed-squirrel >> three-headed-monkey [12:35:33] NOT THAT WORD AGAIN [12:35:42] <_joe_> and I think you're too young to have played that ori [12:36:55] how appropriate, you fight like a cow! 
[12:37:29] true story: my friends and i played monkey island before we really knew english [12:37:41] so we basically had to brute-force the sword fights, and i didn't get any of the humor until years later [12:38:14] <_joe_> ori: http://i20.photobucket.com/albums/b241/ATMachine/zaktowns/zakeng7i.png [12:38:26] <_joe_> you _are_ too young [12:38:45] <_joe_> (I learned english playing lucasgames adventures, and leisure-suit larry) [12:41:45] (03PS3) 10Ori.livneh: wmflib: add apply_format() [operations/puppet] - 10https://gerrit.wikimedia.org/r/152852 [12:41:54] ok, food time [12:45:54] PROBLEM - Puppet freshness on virt1009 is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 18:37:41 UTC [12:49:11] !log uploaded ruby-jsduck 5.3.4-1wmftrusty1 and ruby-rkelly-remix 0.0.6-1trusty1 on apt.wikimedia.org [12:49:17] Logged the message, Master [12:49:32] <_joe_> rkelly-remix ? [12:49:36] * yuvipanda waves at _joe_ and ori [12:49:47] <_joe_> :)) [12:51:28] _joe_: you don't want to know [12:52:09] <_joe_> akosiaris: no I think it's an untasteful name, just that [12:52:12] (03CR) 10QChris: Log when Internet.org in X-Analytics with proxy tag (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/151687 (owner: 10Dr0ptp4kt) [12:52:55] what is even more untasteful is that it stinks of ruby community behaviour [12:54:19] <_joe_> oh my. I googled it, not without being redirected to R Kelly twice [12:54:59] ori: there was some patches to hack Monkey Island and win the pirate fights automatically :-D [12:55:24] Monkey Island 2 got revamped with a nicer look'n feel. Should definitely attempt to play it again [12:55:38] <_joe_> hashar: I did, with my step-daughter [12:55:54] <_joe_> it was both 1 and 2 [13:07:38] (03PS1) 10Mark Bergsma: Separate HHVM app servers backend. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/152903 [13:07:57] (03PS1) 10Mark Bergsma: Allocate IP addresses for HHVM app servers [operations/dns] - 10https://gerrit.wikimedia.org/r/152904 [13:08:10] _joe_: ^ [13:08:18] (03CR) 10jenkins-bot: [V: 04-1] Separate HHVM app servers backend. [operations/puppet] - 10https://gerrit.wikimedia.org/r/152903 (owner: 10Mark Bergsma) [13:08:24] <_joe_> mark: thanks :) [13:09:35] (03PS2) 10Mark Bergsma: Separate HHVM app servers backend. [operations/puppet] - 10https://gerrit.wikimedia.org/r/152903 [13:10:20] (03CR) 10jenkins-bot: [V: 04-1] Separate HHVM app servers backend. [operations/puppet] - 10https://gerrit.wikimedia.org/r/152903 (owner: 10Mark Bergsma) [13:12:21] (03PS3) 10Mark Bergsma: Separate HHVM app servers backend. [operations/puppet] - 10https://gerrit.wikimedia.org/r/152903 [13:15:53] (03CR) 10Giuseppe Lavagetto: Separate HHVM app servers backend. (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/152903 (owner: 10Mark Bergsma) [13:19:50] (03CR) 10Mark Bergsma: Separate HHVM app servers backend. (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/152903 (owner: 10Mark Bergsma) [13:22:20] _joe_: i can setup the LVS service also, but I think you've done it before? [13:22:43] <_joe_> mark: nope [13:23:02] <_joe_> that was my work for today actually [13:23:09] oh hehe [13:23:20] <_joe_> installing testwiki proved worse than anticipated [13:23:31] no worries, i'll do it [13:30:04] (03PS1) 10Mark Bergsma: Create internal LVS cluster 'hhvm_appservers' [operations/puppet] - 10https://gerrit.wikimedia.org/r/152908 [13:30:06] (03PS1) 10Mark Bergsma: Add monitoring for LVS service hhvm-appservers.svc.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/152909 [13:31:54] tyg [13:35:15] <_joe_> mark: I was looking at the vcl_hash you added for marking the appservers [13:35:32] yes? 
[13:35:48] <_joe_> no I think I got it [13:36:18] alternatively we can just hash the string "hhvm" [13:36:23] <_joe_> it wasn't clear to me how hash_data works [13:36:36] hash data just adds a string to the hash [13:36:39] <_joe_> (it isn't from the docs) [13:36:40] <_joe_> yes [13:36:41] the default vcl_hash does the usual [13:36:45] and this just adds something [13:36:50] which makes it different from the default backend [13:37:05] <_joe_> ok, I got it right from the examples [13:37:34] this vcl_hash runs first, then the default built in one runs after that [13:37:38] <_joe_> I pretty much forgot most of the varnish DSL [13:37:43] hehe [13:37:44] <_joe_> yes that I know [13:37:49] yeah that's one reason why i'm doing that [13:38:04] i might get rusty if I don't do that every once in a while ;-) [13:38:09] <_joe_> eheh [13:38:35] (03CR) 10Giuseppe Lavagetto: [C: 031] Separate HHVM app servers backend. [operations/puppet] - 10https://gerrit.wikimedia.org/r/152903 (owner: 10Mark Bergsma) [13:38:59] so this is not tested [13:39:01] you should probably do that [13:39:06] <_joe_> I will [13:39:08] i used to have a nice labs instance for that [13:39:13] but it got killed in the eqiad migration ;) [13:39:25] also something needs to set that cookie [13:39:30] <_joe_> I will set up one somewhere :) [13:39:36] :) [13:39:40] <_joe_> like, curl :) [13:39:41] beta could work too of course [13:39:58] <_joe_> mark: I think we should anyway inspect the value of the cookie [13:40:16] what do you mean? [13:40:34] <_joe_> oh sorry [13:40:35] <_joe_> =true [13:40:44] <_joe_> I also checked that before [13:40:49] (03CR) 10Hashar: "I guess it can be cherry picked on the beta cluster to test it out. 
The Zend PHP instances (deployment-apache{01,02}) are locked though" [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma) [13:40:52] <_joe_> nevermind [13:42:09] <_joe_> mark: I'll test this in beta, probably [13:42:45] ok [13:44:39] _joe_: the Zend PHP instances on beta are locked [13:44:44] PROBLEM - puppet last run on amslvs2 is CRITICAL: CRITICAL: Epic puppet fail [13:44:52] due to nslcd.conf being overridden when some package got upgraded [13:45:41] (CR) Hashar: Separate HHVM app servers backend. (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma) [13:46:13] and I don't know how to fix them using salt :/ [13:46:37] <_joe_> hashar: I'll help you later [13:46:39] <_joe_> sorry [13:46:41] <_joe_> not now [13:47:11] yeah just commented on the Gerrit change to separate hhvm app servers backend [13:49:50] mark: hey. is it ok to create a new ldap group for graphite access? (it currently is restricted to the ldap group wmf, but i guess there should only be employees of the foundation in that...; these tools have a similar problem: ishmael, tendril, kibana, icinga) [13:49:52] (CR) Hashar: "Apparently eventlogging02 no longer has a public address bound to it :-] So that sounds good. Ori might know why the event logging inter" [operations/puppet] - https://gerrit.wikimedia.org/r/152791 (owner: Tim Landscheidt) [13:50:00] (CR) Hashar: [C: 1] beta: Fix IP mapping for stream.wmflabs.org [operations/puppet] - https://gerrit.wikimedia.org/r/152791 (owner: Tim Landscheidt) [13:50:13] an 'NDA' group would make sense [13:50:24] i talked to apergos about that [13:50:36] !log RT - granted permission to show ticket summary for role requestor in queue access-requests [13:50:42] Logged the message, Master [13:50:49] (that was some kind of regression as if somebody accidentally removed it) [13:52:28] There are quite a few RT tickets blocked on NDA/ldap policy.
I thought we were waiting on having phab keep track of NDAs. [13:53:36] andrewbogott: how would phab tracking of ndas block that? [13:53:52] Let me find the ticket... [13:54:44] i am for creating new LDAP groups, maybe 2 of them, one for the tools that give access to logs and one for the dba tools [13:55:03] rather than just calling it 'volunteers' or having one for each tool [13:55:51] afaict the issue in Jan's case is not the missing NDA, but that we were not supposed to add volunteers to the wmf group [13:56:01] hence the suggestion for new group(s) [13:56:54] welp, I can't find the comment that I thought was blocking this, so I guess I have no opinion :) [13:58:54] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [14:00:54] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 12:00:08 UTC [14:03:35] (03CR) 10Filippo Giunchedi: [C: 031] wmflib: add apply_format() [operations/puppet] - 10https://gerrit.wikimedia.org/r/152852 (owner: 10Ori.livneh) [14:03:45] RECOVERY - puppet last run on amslvs2 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [14:04:12] mutante: yeah [14:04:21] (03PS4) 10Ori.livneh: wmflib: add apply_format() [operations/puppet] - 10https://gerrit.wikimedia.org/r/152852 [14:04:24] so an NDA group can encompass everyone who signed an nda, whether they are wmf or not [14:04:26] (03CR) 10Ori.livneh: [C: 032 V: 032] wmflib: add apply_format() [operations/puppet] - 10https://gerrit.wikimedia.org/r/152852 (owner: 10Ori.livneh) [14:06:19] mark: ok, cool, Jan is about to create an RT for the creation of (a) new group, and then we'll handle it separately from his personal access [14:07:58] mark: so you want just one group for all different services? [14:08:29] mutante: I could actually use a 1-minute lesson on how/when ldap groups affects access (as vs. 
puppet admin module, which I pretty much understand) [14:08:41] jzerebecki: most services can probably be using the same group yeah [14:08:46] until we hit something really sensitive [14:09:12] ok thx [14:11:26] andrewbogott: so far it's mostly just one group called "wmf", and being a member of that lets you log in to several tools, such as graphite, icinga, logstash... [14:11:40] andrewbogott: so it's all just web based logins, not related to shell access [14:11:46] andrewbogott: one instance where ldap groups are used is via apache http-basic auth, another one is wikitech/nova/labs (each project has a group) [14:11:48] Ah, ok, so it's not ever about system logins, it's just a mishmash of tools that use ldap auth? [14:11:53] apache auth [14:11:57] yes [14:12:11] yea my current problem is only apache auth [14:12:12] ok, makes sense. [14:12:46] I had mentally conflated some shell requests with some graphite requests -- it makes sense that they're totally unrelated though. [14:13:27] Hm, speaking of which, this could use an op review: https://gerrit.wikimedia.org/r/#/c/150850/ [14:13:30] so we did not want to add volunteers into the WMF group, independent from the NDA tracking question [14:13:54] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 12:13:38 UTC [14:14:14] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [14:20:16] ori: _joe_: do we have the jobrunner service on production?
/etc/jobrunner/jobrunner.conf is an invalid json file for PHP json_decode() because of inline comments [14:21:58] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 14:18:49 UTC [14:22:47] !log RT - reverted permission change for access requests requestors per robh [14:22:53] Logged the message, Master [14:23:28] RECOVERY - Puppet freshness on tmh1002 is OK: puppet ran at Fri Aug 8 14:23:23 UTC 2014 [14:23:47] ori: _joe_ : the related bug is https://bugzilla.wikimedia.org/show_bug.cgi?id=69272 [14:23:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [14:24:04] (03CR) 10Dzahn: [C: 032] "want proper symlinks from sites-avail to sites-enabled again" [operations/puppet] - 10https://gerrit.wikimedia.org/r/152854 (owner: 10Dzahn) [14:24:58] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 14:23:23 UTC [14:25:09] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [14:25:09] (03PS3) 10Ori.livneh: apache::def: port to env-{enabled,disabled} [operations/puppet] - 10https://gerrit.wikimedia.org/r/152868 [14:25:11] (03PS3) 10Ori.livneh: apache: add env-{available,enabled} [operations/puppet] - 10https://gerrit.wikimedia.org/r/152866 [14:25:38] mutante: https://rt.wikimedia.org/Ticket/Display.html?id=8102 [14:26:58] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 14:23:23 UTC [14:27:18] ori: question.. there is that 00-dummy.conf created now, but i want to use the priority parameter to load my site first.. should i use priority "000" ? 
[14:27:41] mutante: 00-dummy.conf is empty, so you shouldn't have a need to load before it [14:27:44] <_joe_> mutante: no [14:28:02] i just want to make sure clients who don't speak SNI still get Bugzilla [14:28:19] ah, empty file, ok, i used prio 10 [14:28:58] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 14:23:23 UTC [14:30:58] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 14:23:23 UTC [14:31:42] hmm.. re: icinga .. it should not report the same status within 2 minutes in the first place [14:32:58] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 14:23:23 UTC [14:33:09] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Fri Aug 8 14:33:06 UTC 2014 [14:33:23] there we go, just created the LDAP group [14:33:24] dn: cn=nda,ou=groups,dc=wikimedia,dc=org [14:33:42] \o/ [14:33:54] (PS4) Ori.livneh: apache: add env-{available,enabled} [operations/puppet] - https://gerrit.wikimedia.org/r/152866 [14:34:53] (PS5) Ori.livneh: apache: add env-{available,enabled} [operations/puppet] - https://gerrit.wikimedia.org/r/152866 [14:34:58] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 14:23:23 UTC [14:35:03] (CR) Ori.livneh: [C: 2 V: 2] apache: add env-{available,enabled} [operations/puppet] - https://gerrit.wikimedia.org/r/152866 (owner: Ori.livneh) [14:36:58] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 14:23:23 UTC [14:38:38] RECOVERY - Puppet freshness on tmh1002 is OK: puppet ran at Fri Aug 8 14:38:31 UTC 2014 [14:38:58] PROBLEM - puppet last run on mw1073 is CRITICAL: CRITICAL: Puppet has 1 failures [14:39:08] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: Puppet has 1 failures [14:39:18] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: Puppet has 1 failures [14:39:28]
PROBLEM - puppet last run on mw1047 is CRITICAL: CRITICAL: Puppet has 1 failures [14:39:29] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: Puppet has 1 failures [14:39:48] PROBLEM - puppet last run on mw1020 is CRITICAL: CRITICAL: Puppet has 1 failures [14:39:48] PROBLEM - puppet last run on lanthanum is CRITICAL: CRITICAL: Puppet has 1 failures [14:39:58] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 14:38:31 UTC [14:39:58] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: Puppet has 1 failures [14:39:58] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 1 failures [14:39:58] PROBLEM - puppet last run on mw1041 is CRITICAL: CRITICAL: Puppet has 1 failures [14:40:08] PROBLEM - puppet last run on mw1102 is CRITICAL: CRITICAL: Puppet has 1 failures [14:40:09] PROBLEM - puppet last run on mw1070 is CRITICAL: CRITICAL: Puppet has 1 failures [14:40:09] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: Puppet has 1 failures [14:40:18] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: Puppet has 1 failures [14:40:32] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: Puppet has 1 failures [14:40:52] PROBLEM - puppet last run on mw1157 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:01] PROBLEM - puppet last run on mw1085 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:01] PROBLEM - puppet last run on mw1095 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:01] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:01] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:03] blech, that's me. 
ignorable, fix incoming [14:41:11] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:21] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:21] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:21] PROBLEM - puppet last run on virt0 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:26] when there are no files in env-enabled the for loop returns 1 and the config check freaks out [14:41:31] PROBLEM - puppet last run on nickel is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:32] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:32] PROBLEM - puppet last run on mw1075 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:32] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:41] PROBLEM - puppet last run on mw1083 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:41] PROBLEM - puppet last run on mw1094 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:51] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:51] PROBLEM - puppet last run on mw1058 is CRITICAL: CRITICAL: Puppet has 1 failures [14:41:51] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Puppet has 1 failures [14:42:01] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 1 failures [14:42:31] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 1 failures [14:42:31] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures [14:42:32] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: Puppet has 1 failures [14:42:36] (03PS1) 10BBlack: move japan uploads back to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/152914 [14:42:42] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: Puppet has 1 failures [14:42:42] PROBLEM - puppet last run on mw1147 is CRITICAL: CRITICAL: Puppet has 1 failures 
[14:42:51] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Puppet has 1 failures [14:42:51] PROBLEM - puppet last run on mw1035 is CRITICAL: CRITICAL: Puppet has 1 failures [14:42:58] (03CR) 10BBlack: [C: 032] move japan uploads back to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/152914 (owner: 10BBlack) [14:43:01] PROBLEM - puppet last run on mw1062 is CRITICAL: CRITICAL: Puppet has 1 failures [14:43:01] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Puppet has 1 failures [14:43:01] PROBLEM - puppet last run on mw1132 is CRITICAL: CRITICAL: Puppet has 1 failures [14:43:01] PROBLEM - puppet last run on mw1036 is CRITICAL: CRITICAL: Puppet has 1 failures [14:43:12] PROBLEM - puppet last run on ytterbium is CRITICAL: CRITICAL: Puppet has 1 failures [14:43:21] PROBLEM - puppet last run on zirconium is CRITICAL: CRITICAL: Puppet has 1 failures [14:43:21] PROBLEM - puppet last run on mw1096 is CRITICAL: CRITICAL: Puppet has 1 failures [14:43:41] PROBLEM - puppet last run on mw1124 is CRITICAL: CRITICAL: Puppet has 1 failures [14:43:51] (03PS1) 10Ori.livneh: add '|| true' to env-enabled/* glob [operations/puppet] - 10https://gerrit.wikimedia.org/r/152915 [14:43:52] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:10] (03CR) 10Ori.livneh: [C: 032 V: 032] add '|| true' to env-enabled/* glob [operations/puppet] - 10https://gerrit.wikimedia.org/r/152915 (owner: 10Ori.livneh) [14:44:11] PROBLEM - puppet last run on mw1038 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:11] PROBLEM - puppet last run on mw1072 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:32] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:32] PROBLEM - puppet last run on mw1048 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:32] PROBLEM - puppet last run on mw1089 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:41] PROBLEM - puppet last run on caesium is CRITICAL: CRITICAL: 
Puppet has 1 failures [14:44:41] PROBLEM - puppet last run on mw1109 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:41] PROBLEM - puppet last run on mw1031 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:41] PROBLEM - puppet last run on mw1067 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:41] PROBLEM - puppet last run on mw1178 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:42] PROBLEM - puppet last run on mw1134 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:43] it should stop in a sec [14:44:51] PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:52] PROBLEM - puppet last run on logstash1003 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:52] PROBLEM - puppet last run on mw1028 is CRITICAL: CRITICAL: Puppet has 1 failures [14:44:52] PROBLEM - puppet last run on magnesium is CRITICAL: CRITICAL: Puppet has 1 failures [14:45:00] apologies for spam [14:45:01] PROBLEM - puppet last run on mw1115 is CRITICAL: CRITICAL: Puppet has 1 failures [14:45:01] PROBLEM - puppet last run on mw1040 is CRITICAL: CRITICAL: Puppet has 1 failures [14:45:01] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: Puppet has 1 failures [14:45:11] PROBLEM - puppet last run on mw1080 is CRITICAL: CRITICAL: Puppet has 1 failures [14:45:31] PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: Puppet has 1 failures [14:45:41] PROBLEM - puppet last run on mw1063 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:01] PROBLEM - puppet last run on mw1106 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:01] PROBLEM - puppet last run on mw1140 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:11] PROBLEM - puppet last run on mw1200 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:21] PROBLEM - puppet last run on mw1145 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:31] PROBLEM - Puppet freshness on virt1009 is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 18:37:41 UTC [14:46:31] PROBLEM - 
puppet last run on mw1187 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:41] PROBLEM - puppet last run on mw1059 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:41] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:41] PROBLEM - puppet last run on mw1082 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:42] PROBLEM - puppet last run on mw1141 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:42] PROBLEM - puppet last run on mw1160 is CRITICAL: CRITICAL: Puppet has 1 failures [14:46:52] PROBLEM - puppet last run on mw1045 is CRITICAL: CRITICAL: Puppet has 1 failures [14:47:01] PROBLEM - puppet last run on mw1026 is CRITICAL: CRITICAL: Puppet has 1 failures [14:47:02] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: Puppet has 1 failures [14:47:10] (03PS1) 10Dzahn: scholarships - convert to use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/152916 [14:47:21] PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: Puppet has 1 failures [14:47:21] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 1 failures [14:47:41] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures [14:47:41] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures [14:47:51] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: Puppet has 1 failures [14:47:51] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: Puppet has 1 failures [14:47:51] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:01] PROBLEM - puppet last run on mw1088 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:01] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:01] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:01] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:01] PROBLEM - puppet last run on mw1150 
is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:02] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:02] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:12] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:12] PROBLEM - puppet last run on mw1217 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:14] legoktm: thanks! [14:48:41] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:41] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [14:48:45] (03CR) 10Dzahn: [C: 032] scholarships - convert to use apache::site [operations/puppet] - 10https://gerrit.wikimedia.org/r/152916 (owner: 10Dzahn) [14:48:53] robh: https://wikimediafoundation.org/w/index.php?title=Home&diff=prev&oldid=98943 https://wikimediafoundation.org/wiki/User_talk:Tbayer#Blog_on_the_main_page - any insights on the best way to fix that? [14:49:33] HaeB: I'll have a patch in like 30 seconds [14:50:29] (03PS1) 10Legoktm: Re-enable $wgRSSProxy since blog.wikimedia.org is on an external host [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/152917 [14:53:01] RECOVERY - puppet last run on mw1041 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [14:53:09] (03CR) 10Dzahn: "this finished the cleanup on zirconium - all sites-enabled are links again" [operations/puppet] - 10https://gerrit.wikimedia.org/r/152916 (owner: 10Dzahn) [14:53:21] RECOVERY - puppet last run on zirconium is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [14:53:40] icinga-wm: you're getting weirder, there wasnt a fail? [14:53:58] ah, no there was [14:56:51] legoktm: thanks! 
[14:57:01] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [14:57:01] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [14:57:11] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [14:57:11] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:22] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [14:57:22] PROBLEM - puppet last run on mw1010 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:22] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [14:57:31] RECOVERY - puppet last run on mw1047 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [14:57:41] PROBLEM - puppet last run on mw1209 is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:41] PROBLEM - puppet last run on iodine is CRITICAL: CRITICAL: Puppet has 1 failures [14:57:51] RECOVERY - Puppet freshness on tmh1002 is OK: puppet ran at Fri Aug 8 14:57:43 UTC 2014 [14:58:01] RECOVERY - puppet last run on mw1073 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [14:58:01] PROBLEM - puppet last run on mw1091 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:11] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [14:58:21] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [14:58:31] RECOVERY - puppet last run on nickel is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [14:58:31] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [14:58:32] RECOVERY - 
puppet last run on mw1075 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [14:58:32] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [14:58:32] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [14:58:51] RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [14:58:51] RECOVERY - puppet last run on lanthanum is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [14:58:52] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [14:58:52] RECOVERY - puppet last run on mw1157 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [14:59:01] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [14:59:01] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [14:59:02] RECOVERY - puppet last run on mw1085 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [14:59:02] RECOVERY - puppet last run on mw1095 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [14:59:12] RECOVERY - puppet last run on mw1102 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [14:59:12] RECOVERY - puppet last run on mw1070 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [14:59:21] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [14:59:21] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [14:59:21] RECOVERY - puppet last run on virt0 is OK: OK: Puppet is currently enabled, last run 12 
seconds ago with 0 failures [14:59:31] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [14:59:41] RECOVERY - puppet last run on mw1182 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [14:59:41] RECOVERY - puppet last run on mw1094 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [14:59:41] RECOVERY - puppet last run on mw1083 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [14:59:42] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [14:59:51] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [14:59:51] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [14:59:52] RECOVERY - puppet last run on mw1035 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [15:00:01] ACKNOWLEDGEMENT - Puppet freshness on virt1009 is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 18:37:41 UTC daniel_zahn Duplicate declaration: Package[coreutils] is already declared - #8004: virt1009 - RAID/disk fail - used for something other than labs [15:00:02] ACKNOWLEDGEMENT - puppet last run on virt1009 is CRITICAL: CRITICAL: Epic puppet fail daniel_zahn Duplicate declaration: Package[coreutils] is already declared - #8004: virt1009 - RAID/disk fail - used for something other than labs [15:00:02] RECOVERY - puppet last run on mw1036 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [15:00:21] RECOVERY - puppet last run on ytterbium is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [15:00:22] RECOVERY - puppet last run on mw1096 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [15:00:32] RECOVERY - puppet last run on 
neon is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [15:00:32] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [15:00:41] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [15:00:51] RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [15:00:52] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [15:01:01] RECOVERY - puppet last run on mw1062 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [15:01:01] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [15:01:48] ACKNOWLEDGEMENT - Puppet freshness on virt1009 is CRITICAL: Last successful Puppet run was Thu 07 Aug 2014 18:37:41 UTC daniel_zahn #intentional breakage per site.pp [15:01:51] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [15:01:52] RECOVERY - puppet last run on logstash1003 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [15:02:01] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [15:02:01] RECOVERY - puppet last run on mw1115 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [15:02:02] RECOVERY - puppet last run on mw1040 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [15:02:02] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [15:02:11] RECOVERY - puppet last run on mw1038 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [15:02:12] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is 
currently enabled, last run 17 seconds ago with 0 failures [15:02:31] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [15:02:32] RECOVERY - puppet last run on mw1048 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [15:02:41] RECOVERY - puppet last run on mw1089 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [15:02:42] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [15:02:42] RECOVERY - puppet last run on mw1031 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [15:02:42] RECOVERY - puppet last run on mw1178 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [15:02:42] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [15:02:42] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [15:02:51] RECOVERY - puppet last run on mw1028 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [15:03:12] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [15:03:22] (PS4) Dzahn: bugzilla: use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/152282 (owner: Giuseppe Lavagetto) [15:03:32] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [15:03:41] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [15:03:41] RECOVERY - puppet last run on caesium is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:03:41] RECOVERY - puppet last run on mw1059 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with
0 failures [15:03:42] RECOVERY - puppet last run on mw1063 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [15:03:42] RECOVERY - puppet last run on mw1067 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [15:03:42] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [15:03:51] RECOVERY - puppet last run on mw1045 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [15:03:52] RECOVERY - puppet last run on magnesium is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [15:04:01] RECOVERY - puppet last run on mw1026 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [15:04:01] RECOVERY - puppet last run on mw1106 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [15:04:01] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [15:04:01] RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:04:11] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [15:04:21] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [15:04:29] (CR) Dzahn: [C: 2] bugzilla: use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/152282 (owner: Giuseppe Lavagetto) [15:05:12] RECOVERY - puppet last run on mw1217 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [15:05:21] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [15:05:31] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [15:05:41] RECOVERY - puppet last
run on mw1082 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [15:05:41] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [15:05:41] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [15:05:41] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [15:05:42] RECOVERY - puppet last run on mw1160 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [15:05:51] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:05:51] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [15:05:51] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [15:06:01] RECOVERY - puppet last run on mw1088 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [15:06:01] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:06:01] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [15:06:01] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [15:06:01] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [15:06:02] RECOVERY - puppet last run on mw1150 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [15:06:11] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [15:06:12] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 43 seconds ago 
with 0 failures [15:06:22] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [15:07:02] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [15:07:42] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [15:07:44] (CR) Filippo Giunchedi: apache: add env-{available,enabled} (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/152866 (owner: Ori.livneh) [15:09:09] (PS1) BBlack: move KP, KR, TW uploads back to ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/152919 [15:09:11] (PS1) BBlack: move HK, PH, SG back to ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/152920 [15:09:13] (PS1) BBlack: move ID, TH uploads back to ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/152921 [15:09:15] (PS1) BBlack: Move the remainder of ex-ulsfo upload traffic back to ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/152922 [15:09:28] (CR) BBlack: [C: 2] move KP, KR, TW uploads back to ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/152919 (owner: BBlack) [15:09:50] _joe_: ssl_ciphersuite .. it works :) [15:10:18] and unrelated: we have SSLCompression Off in one place but not all, fwiw [15:10:40] 17:07 JBlack: This server could not prove that it is wikisource.com; its security certificate is from *.wikipedia.org. This may be caused by a misconfiguration or an attacker intercepting your connection. [15:10:44] mutante: ^ [15:11:05] while visiting https://wikisource.com with Google Chrome [15:11:24] probably needs a ticket or a bug? [15:12:56] odder: wikisource.org , it actually is *.wikipedia.org AND *.wikisource.org as an alt. name [15:13:16] https://bugzilla.wikimedia.org/show_bug.cgi?id=40998 is probably about that?
[15:13:24] <_joe_> mutante: ewww [15:13:27] <_joe_> it should be off [15:13:30] odder: yes, it is [15:13:51] Do we need a certificate for a redirect? unsure. [15:14:01] RECOVERY - puppet last run on mw1091 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [15:14:41] RECOVERY - puppet last run on iodine is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [15:15:11] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [15:15:16] odder: i'd agree with MZ in "It'd be nice to know how common this is (usage stats, I guess)" [15:15:42] RECOVERY - puppet last run on mw1209 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:16:56] odder: it's a matter of $$$, it's gonna be quite a bit, we'd have to also support all that stuff like m.wikimedia.net and whatnot [15:17:07] or i guess what Tim said on the ticket [15:17:18] "go to a separate set of public IPs, and then refuse connections on port 443" [15:19:29] (PS1) Dzahn: bugzilla - retab Apache template [operations/puppet] - https://gerrit.wikimedia.org/r/152924 [15:20:11] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Fri Aug 8 15:20:07 UTC 2014 [15:25:41] (CR) Dzahn: [C: 2] bugzilla - retab Apache template [operations/puppet] - https://gerrit.wikimedia.org/r/152924 (owner: Dzahn) [15:25:50] (PS1) Dzahn: bugzilla - consistently use SSLCompression Off [operations/puppet] - https://gerrit.wikimedia.org/r/152925 [15:26:02] (CR) BBlack: [C: 2] move HK, PH, SG back to ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/152920 (owner: BBlack) [15:26:57] (CR) Dzahn: [C: 2] "08:15 < _joe_> mutante: ewww" [operations/puppet] - https://gerrit.wikimedia.org/r/152925 (owner: Dzahn) [15:29:46] (CR) Dzahn: [C: 1] "tested with Apache on Bugzilla - worked" [operations/puppet] - https://gerrit.wikimedia.org/r/152248
(owner: Giuseppe Lavagetto) [15:39:59] (PS1) JanZerebecki: graphite: replace ldap group restriction with ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/152928 [15:40:31] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 13:40:14 UTC [15:40:39] (CR) jenkins-bot: [V: -1] graphite: replace ldap group restriction with ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/152928 (owner: JanZerebecki) [15:41:14] (PS4) Dzahn: restrict access to puppet logs to root users [operations/puppet] - https://gerrit.wikimedia.org/r/150273 [15:42:19] (CR) Dzahn: [C: 1] restrict access to puppet logs to root users [operations/puppet] - https://gerrit.wikimedia.org/r/150273 (owner: Dzahn) [15:46:46] (PS2) JanZerebecki: graphite: replace ldap group restriction with ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/152928 [15:49:41] (CR) BBlack: [C: 2] move ID, TH uploads back to ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/152921 (owner: BBlack) [15:55:28] (CR) JanZerebecki: [C: -1] "Should use the nda group instead, which was just created for this. See Ie621a1f2732fcad872b8b84396b02cc1d4563de5" [operations/puppet] - https://gerrit.wikimedia.org/r/140881 (owner: Hoo man) [15:56:11] how to check if we can shutdown the ganglia aggregator on tarin (pmtpa) [15:56:14] (CR) JanZerebecki: "RequireAny is not yet available in Apache 2.2" [operations/puppet] - https://gerrit.wikimedia.org/r/140881 (owner: Hoo man) [15:59:33] (CR) Andrew Bogott: [C: 2] "The logrotate docs say that when a file is rotated the new file is created with the same owner/group/perms as the moved file.
So this isn" [operations/puppet] - https://gerrit.wikimedia.org/r/150273 (owner: Dzahn) [16:00:21] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Fri Aug 8 16:00:11 UTC 2014 [16:01:30] andrewbogott: :) [16:01:58] !log merging https://gerrit.wikimedia.org/r/#/c/150273/ which affects every puppet log everywhere... [16:02:51] !log merging https://gerrit.wikimedia.org/r/#/c/150273/ which affects every puppet log everywhere... [16:02:57] Logged the message, Master [16:04:33] (CR) BBlack: [C: 2] Move the remainder of ex-ulsfo upload traffic back to ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/152922 (owner: BBlack) [16:06:12] !log datacenter traffic mapping back to normal, varnish fix/wipe/restart/etc work on pause for the weekend in a stable state [16:06:16] Logged the message, Master [16:09:10] legoktm: not sure if it's in my purview to review https://gerrit.wikimedia.org/r/152917 , but i would be happy to ;) [16:12:10] (CR) Andrew Bogott: [C: 1] Disable access for mwalker, who is leaving WMF. [operations/puppet] - https://gerrit.wikimedia.org/r/150263 (owner: Andrew Bogott) [16:14:33] (CR) Dzahn: [C: 1] Disable access for mwalker, who is leaving WMF. [operations/puppet] - https://gerrit.wikimedia.org/r/150263 (owner: Andrew Bogott) [16:14:46] (CR) Dzahn: "needs manual rebase though" [operations/puppet] - https://gerrit.wikimedia.org/r/150263 (owner: Andrew Bogott) [16:15:51] (PS1) BryanDavis: beta: Set runners_* for role::beta::jobrunner [operations/puppet] - https://gerrit.wikimedia.org/r/152931 [16:16:39] (PS2) Hashar: beta: Set runners_* for role::beta::jobrunner [operations/puppet] - https://gerrit.wikimedia.org/r/152931 (https://bugzilla.wikimedia.org/69272) (owner: BryanDavis) [16:18:48] (CR) Dzahn: [C: 1] "re: the concern that it's also a ganglia aggregator.. well..
class ganglia::aggregator sets up service "ganglia-monitor-aggrs" but on tari" [operations/puppet] - https://gerrit.wikimedia.org/r/152154 (owner: Dzahn) [16:19:14] (PS7) Andrew Bogott: Disable access for mwalker, who is leaving WMF. [operations/puppet] - https://gerrit.wikimedia.org/r/150263 [16:19:25] (CR) Hashar: "I guess it is good as is. We can easily bump the numbers later on." [operations/puppet] - https://gerrit.wikimedia.org/r/152931 (https://bugzilla.wikimedia.org/69272) (owner: BryanDavis) [16:20:09] (CR) Andrew Bogott: [C: 2] Disable access for mwalker, who is leaving WMF. [operations/puppet] - https://gerrit.wikimedia.org/r/150263 (owner: Andrew Bogott) [16:21:10] (CR) Hashar: [C: 1] beta: Set runners_* for role::beta::jobrunner [operations/puppet] - https://gerrit.wikimedia.org/r/152931 (https://bugzilla.wikimedia.org/69272) (owner: BryanDavis) [16:23:45] (CR) BryanDavis: "Cherry-picked to beta to get the job runner running again." [operations/puppet] - https://gerrit.wikimedia.org/r/152931 (https://bugzilla.wikimedia.org/69272) (owner: BryanDavis) [16:26:12] (CR) Legoktm: [C: 1] beta: Set runners_* for role::beta::jobrunner [operations/puppet] - https://gerrit.wikimedia.org/r/152931 (https://bugzilla.wikimedia.org/69272) (owner: BryanDavis) [16:39:46] (PS1) BryanDavis: beta: Change auth realmmessage for logstash-beta [operations/puppet] - https://gerrit.wikimedia.org/r/152932 (https://bugzilla.wikimedia.org/69267) [16:46:07] (PS2) Hoo man: Allow ldap "nda" users to access ishmael [operations/puppet] - https://gerrit.wikimedia.org/r/140881 [16:46:44] mutante: https://gerrit.wikimedia.org/r/140881 [16:48:13] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Epic puppet fail [16:52:43] (PS3) JanZerebecki: graphite: add access for ldap groups ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/152928 [16:52:56] (CR) Dzahn: [C: 1] "please just also add to
the message that you are adding new sudo privs, still +1 though, since you already have all this other critical ac" [operations/puppet] - https://gerrit.wikimedia.org/r/152724 (owner: Hoo man) [16:54:17] (PS2) Hoo man: Allow "hoo" to sudo into datasets [operations/puppet] - https://gerrit.wikimedia.org/r/152724 [17:07:51] !log jenkins/puppet-compiler - granting new LDAP group "nda" the same rights already given to matanya (and wmde even has more) [17:07:56] Logged the message, Master [17:08:23] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [17:21:52] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 17:18:48 UTC [17:23:52] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 17:18:48 UTC [17:24:10] (PS4) JanZerebecki: graphite: add access for ldap groups ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/152928 [17:25:52] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 17:18:48 UTC [17:27:40] (CR) JanZerebecki: "http://puppet-compiler.wmflabs.org/199/change/152928/html/tungsten.eqiad.wmnet.html" [operations/puppet] - https://gerrit.wikimedia.org/r/152928 (owner: JanZerebecki) [17:27:52] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 17:18:48 UTC [17:29:52] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 17:18:48 UTC [17:31:52] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 17:18:48 UTC [17:33:32] RECOVERY - Puppet freshness on mw1191 is OK: puppet ran at Fri Aug 8 17:33:26 UTC 2014 [17:35:52] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 17:33:26 UTC [17:37:52] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last
successful Puppet run was Fri 08 Aug 2014 17:33:26 UTC [17:39:52] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 17:33:26 UTC [17:40:47] RECOVERY - Puppet freshness on mw1191 is OK: puppet ran at Fri Aug 8 17:40:36 UTC 2014 [17:41:36] (PS1) Rush: allow public option setting in phab [operations/puppet] - https://gerrit.wikimedia.org/r/152939 [17:41:56] (CR) Rush: [C: 2 V: 2] allow public option setting in phab [operations/puppet] - https://gerrit.wikimedia.org/r/152939 (owner: Rush) [18:00:47] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 16:00:11 UTC [18:02:28] (PS1) Ori.livneh: RCStream: use recommended Nginx config for WebSockets [operations/puppet] - https://gerrit.wikimedia.org/r/152942 [18:03:08] (CR) Ori.livneh: [C: 2 V: 2] "Service hasn't launched yet and this seems like a reasonable (and cheap) thing to try." [operations/puppet] - https://gerrit.wikimedia.org/r/152942 (owner: Ori.livneh) [18:05:42] (PS1) BryanDavis: Revert "apache::monitoring: add diamond support; ensure mod_status is enabled" [operations/puppet] - https://gerrit.wikimedia.org/r/152943 [18:05:51] (CR) jenkins-bot: [V: -1] Revert "apache::monitoring: add diamond support; ensure mod_status is enabled" [operations/puppet] - https://gerrit.wikimedia.org/r/152943 (owner: BryanDavis) [18:08:59] (CR) BryanDavis: "Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Dupli" [operations/puppet] - https://gerrit.wikimedia.org/r/152943 (owner: BryanDavis) [18:19:57] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Fri Aug 8 18:19:52 UTC 2014 [18:53:18] (CR) JanZerebecki: [C: 1] Allow ldap "nda" users to access ishmael [operations/puppet] - https://gerrit.wikimedia.org/r/140881 (owner: Hoo man) [18:53:47] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014
16:52:55 UTC [18:55:10] ori, hey [19:01:05] jzerebecki: If I merge https://gerrit.wikimedia.org/r/#/ are you available and able to verify that it works? [19:04:09] andrewbogott: there's no commit :P [19:04:57] Reedy: Good point! jzerebecki, I think https://gerrit.wikimedia.org/r/#/c/140881/ is what I meant. [19:13:47] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Fri Aug 8 19:13:38 UTC 2014 [19:53:01] (PS1) BBlack: switch rcstream to source hash LVS scheduling [operations/puppet] - https://gerrit.wikimedia.org/r/152960 [19:54:20] (PS2) BBlack: switch rcstream to source hash LVS scheduling [operations/puppet] - https://gerrit.wikimedia.org/r/152960 [20:05:07] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 724 [20:10:07] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1024 [20:11:31] (CR) Giuseppe Lavagetto: [C: 1] "all in all this solves the problem quickly and it's most probably not an issue."
[operations/puppet] - https://gerrit.wikimedia.org/r/152960 (owner: BBlack) [20:30:07] RECOVERY - check_mysql on lutetium is OK: Uptime: 3803013 Threads: 1 Questions: 21546029 Slow queries: 22156 Opens: 29365 Flush tables: 2 Open tables: 64 Queries per second avg: 5.665 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [20:38:30] (Abandoned) Andrew Bogott: Remove duplicate nrpe::monitor_service def [operations/puppet] - https://gerrit.wikimedia.org/r/152766 (owner: Andrew Bogott) [21:20:14] (PS1) Reedy: Update interwiki cache [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152975 [21:20:40] (CR) Reedy: [C: 2] Update interwiki cache [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152975 (owner: Reedy) [21:20:58] (Merged) jenkins-bot: Update interwiki cache [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152975 (owner: Reedy) [21:21:26] !log reedy Synchronized wmf-config/interwiki.cdb: (no message) (duration: 01m 04s) [21:21:32] Logged the message, Master [21:24:16] !log mw1130 seems to be dead (unresponsive to ping) [21:24:21] Logged the message, Master [21:28:01] again?! -.- [21:34:02] Reedy: Alex logged it in icinga as having a bad disk [21:35:16] Yeah, I saw in the SAL after [21:35:25] Can I remove it from dsh seeing as it's gonna be down a while? [21:35:47] sure [21:37:44] (PS2) Reedy: sync-common-file is no more, use sync-file [operations/puppet] - https://gerrit.wikimedia.org/r/143279 [21:37:46] Also, could someone please merge https://gerrit.wikimedia.org/r/#/c/143279/ ?
[21:38:08] remember all, if removing a server from pybal, remove it from dsh, and vice versa [21:40:38] (PS1) Reedy: Remove mw1130 from mediawiki-installation dsh as it's offline [operations/puppet] - https://gerrit.wikimedia.org/r/152987 [21:42:03] (CR) Yurik: [C: 1] Log when Internet.org in X-Analytics with proxy tag (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/151687 (owner: Dr0ptp4kt) [21:42:17] bblack, when you have a chance, this is good to go ^ [21:44:43] (CR) BBlack: [C: 2] sync-common-file is no more, use sync-file [operations/puppet] - https://gerrit.wikimedia.org/r/143279 (owner: Reedy) [21:45:38] (PS12) BBlack: Log when Internet.org in X-Analytics with proxy tag [operations/puppet] - https://gerrit.wikimedia.org/r/151687 (owner: Dr0ptp4kt) [21:45:48] (CR) BBlack: [C: 2] Log when Internet.org in X-Analytics with proxy tag [operations/puppet] - https://gerrit.wikimedia.org/r/151687 (owner: Dr0ptp4kt) [21:45:53] thx [21:46:03] bblack: thanks [21:46:09] np [21:46:19] bblack, yurikR, (qchris when you read the logs), thx [21:46:41] (CR) BBlack: [V: 2] Log when Internet.org in X-Analytics with proxy tag [operations/puppet] - https://gerrit.wikimedia.org/r/151687 (owner: Dr0ptp4kt) [21:50:58] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: Puppet has 1 failures [21:52:18] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Puppet has 1 failures [21:52:27] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures [21:52:27] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: Puppet has 1 failures [21:52:37] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: Puppet has 1 failures [21:52:47] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: Puppet has 1 failures [21:52:58] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: Puppet has 1 failures [21:53:17] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL:
Puppet has 1 failures [21:53:57] PROBLEM - puppet last run on amssq42 is CRITICAL: CRITICAL: Puppet has 1 failures [21:53:58] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: Puppet has 1 failures [21:54:28] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures [21:54:38] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: Puppet has 1 failures [21:54:51] <_joe_> Reedy: is that the reason why ori had to run the cache update manually today? [21:54:57] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: Puppet has 1 failures [21:55:06] <_joe_> bblack: some puppetmaster is failing i'd say [21:55:10] I don't think so... [21:55:15] Been broken for a while [21:55:27] <_joe_> Reedy: we did install a server anew [21:55:51] <_joe_> but you may want to ask him [21:56:07] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: Puppet has 1 failures [21:56:11] Aug 8 21:49:31 cp1046 frontend[14098]: CLI telnet 127.0.0.1 9602 127.0.0.1 6082 Wr 106 Message from VCC-compiler:#012Cannot read file 'via.inc.vcl': No such file or directory#012('mobile-frontend.inc.vcl' Line 5 Pos 9)#012include "via.inc.vcl";#012--------#############-#012#012Running VCC-compiler failed, exit 1#012VCL compilation failed [21:56:27] PROBLEM - puppet last run on cp1037 is CRITICAL: CRITICAL: Puppet has 1 failures [21:56:27] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Puppet has 1 failures [21:56:57] yurikR: ^ [21:56:57] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:18] PROBLEM - puppet last run on cp1040 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:28] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:34] dr0ptp4kt, ^ [21:57:43] I should've noticed that in review. 
It adds a new file via.inc.vcl to puppet, but there's no file stanza to distribute it :) [21:57:48] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:57] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:57] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: Puppet has 1 failures [21:57:59] bblack: sorry [21:58:01] we distribute vcls separately? [21:58:17] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:17] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:19] its good that we now track it though ) [21:58:27] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:38] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:48] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: Puppet has 1 failures [21:58:58] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Puppet has 1 failures [21:59:07] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: Puppet has 1 failures [21:59:18] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: Puppet has 1 failures [21:59:38] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Puppet has 1 failures [21:59:42] I'm putting a fix patch through, it's pretty trivial [21:59:52] thx :) [21:59:54] bblack: thank you! 
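The failure discussed above (a VCL `include` of a file that Puppet never shipped to the cache hosts) is the kind of thing a `file` resource addresses. A minimal sketch of such a stanza follows; the target path, ownership, mode, and module source path are assumptions made for illustration and are not taken from the actual operations/puppet change.

```puppet
# Hypothetical sketch only: the destination path and the module source
# path below are assumed, not copied from the real fix patch.
file { '/etc/varnish/via.inc.vcl':
    ensure => file,
    owner  => 'root',
    group  => 'root',
    mode   => '0444',
    # Assumed location of the file inside a Puppet module.
    source => 'puppet:///modules/varnish/vcl/via.inc.vcl',
}
```

Once a resource like this is in the catalog, the included file exists on disk before the VCC compiler runs, which would clear the "Cannot read file 'via.inc.vcl'" error quoted earlier.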
[22:00:17] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Puppet has 1 failures [22:00:37] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: Puppet has 1 failures [22:00:47] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: Puppet has 1 failures [22:01:37] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 1 failures [22:01:57] (PS1) BBlack: Distribute via.inc.vcl to Varnishes [operations/puppet] - https://gerrit.wikimedia.org/r/152990 [22:02:27] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Puppet has 1 failures [22:02:37] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Puppet has 1 failures [22:02:55] (CR) BBlack: [C: 2] Distribute via.inc.vcl to Varnishes [operations/puppet] - https://gerrit.wikimedia.org/r/152990 (owner: BBlack) [22:02:57] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: Puppet has 1 failures [22:02:58] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: Puppet has 1 failures [22:03:57] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: Puppet has 1 failures [22:03:58] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: Puppet has 1 failures [22:04:38] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: Puppet has 1 failures [22:04:58] PROBLEM - puppet last run on amssq44 is CRITICAL: CRITICAL: Puppet has 1 failures [22:05:37] PROBLEM - puppet last run on amssq43 is CRITICAL: CRITICAL: Puppet has 1 failures [22:05:38] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: Puppet has 1 failures [22:05:58] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Puppet has 1 failures [22:06:17] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: Puppet has 1 failures [22:06:37] PROBLEM - puppet last run on cp1039 is CRITICAL: CRITICAL: Puppet has 1 failures [22:06:38] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: Puppet has 1 failures [22:06:47] PROBLEM - puppet last run on amssq53 is
CRITICAL: CRITICAL: Puppet has 1 failures [22:06:57] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: Puppet has 1 failures [22:09:27] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [22:09:59] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [22:10:37] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [22:10:37] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [22:10:47] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [22:11:37] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [22:11:38] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:11:57] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [22:11:58] RECOVERY - puppet last run on amssq42 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [22:11:58] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [22:11:58] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [22:12:18] RECOVERY - puppet last run on cp1038 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [22:12:37] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [22:14:07] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:14:28] RECOVERY - puppet last run on cp1037 is OK: OK: Puppet 
is currently enabled, last run 11 seconds ago with 0 failures [22:15:18] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:15:29] RECOVERY - puppet last run on cp1040 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [22:15:29] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:15:37] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [22:15:57] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [22:15:57] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [22:15:58] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [22:15:58] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [22:16:17] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:16:27] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [22:16:47] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [22:16:48] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:17:07] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:17:37] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [22:17:37] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [22:18:07] 
RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [22:18:17] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [22:18:18] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:18:48] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [22:18:57] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [22:19:38] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [22:20:27] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [22:20:37] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [22:21:07] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [22:21:48] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [22:22:07] RECOVERY - puppet last run on amssq44 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [22:22:38] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [22:22:57] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [22:23:07] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [22:24:07] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:24:17] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently 
enabled, last run 55 seconds ago with 0 failures [22:24:37] RECOVERY - puppet last run on cp1039 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [22:24:38] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [22:24:47] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [22:24:48] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:25:07] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [22:41:08] (PS2) BBlack: Remove mw1130 from mediawiki-installation dsh as it's offline [operations/puppet] - https://gerrit.wikimedia.org/r/152987 (owner: Reedy) [22:42:25] (CR) BBlack: [C: 2] Remove mw1130 from mediawiki-installation dsh as it's offline [operations/puppet] - https://gerrit.wikimedia.org/r/152987 (owner: Reedy) [23:00:47] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Fri 08 Aug 2014 20:59:47 UTC [23:55:57] PROBLEM - puppet last run on mw1022 is CRITICAL: CRITICAL: Puppet has 1 failures