[01:09:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[02:14:03] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 00:13:18 UTC
[02:17:41] !log LocalisationUpdate completed (1.24wmf15) at 2014-08-06 02:16:38+00:00
[02:17:50] Logged the message, Master
[02:30:04] !log LocalisationUpdate completed (1.24wmf16) at 2014-08-06 02:29:00+00:00
[02:30:09] Logged the message, Master
[02:32:54] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Wed Aug 6 02:32:52 UTC 2014
[02:34:13] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 842.140015609
[03:10:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[03:11:12] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Aug 6 03:10:06 UTC 2014 (duration 10m 5s)
[03:11:18] Logged the message, Master
[04:20:56] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 02:20:00 UTC
[04:33:51] jenkins is having problems with https://gerrit.wikimedia.org/r/#/c/151808/
[04:40:15] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Wed Aug 6 04:40:11 UTC 2014
[05:10:56] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[05:18:01] (CR) Swalling: "Please do. :)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151639 (https://bugzilla.wikimedia.org/69103) (owner: Phuedx)
[05:56:55] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[06:03:55] there are some interesting spikes and trends in the 5xx rate over the past week
[06:04:35] are they all accounted for by brandon's varnish deployment?
[06:06:03] when we get 5xx alerts it's usually mediawiki or varnish, and the mediawiki error rate is pretty flat
[06:07:14] the time intervals in gdash aren't great
[06:09:55] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[06:27:36] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:45] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:55] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:26] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:26] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:57] (PS1) Ori.livneh: HHVM: set a 5s graceful shutdown timeout [operations/puppet] - https://gerrit.wikimedia.org/r/152001
[06:45:35] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:45:36] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:45:45] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[06:45:55] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:46:35] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:49:46] PROBLEM - puppet last run on db1042 is CRITICAL: CRITICAL: Puppet has 2 failures
[07:04:37] !log upgrading Jenkins to latest LTS
[07:04:41] Logged the message, Master
[07:06:46] RECOVERY - puppet last run on db1042 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[07:06:54] <_joe_> oh so I should V+2 my changes and blame you, great
[07:07:04] <_joe_> :D
[07:07:46] aren't you all having breakfast at wikimania ?
:D
[07:07:55] <_joe_> I'm not at wikimania
[07:08:23] I tend to do CI upgrades during european morning since that is solo quiet
[07:08:54] <_joe_> well, techops are active during the european mornings
[07:11:56] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[07:13:18] !log Upgrading Jenkins plugin and restarting.
[07:13:24] Logged the message, Master
[07:13:52] (PS2) Giuseppe Lavagetto: Update refreshDomainRedirects with port number [operations/puppet] - https://gerrit.wikimedia.org/r/151858 (owner: Alexandros Kosiaris)
[07:14:22] (CR) Giuseppe Lavagetto: [C: 2] Update refreshDomainRedirects with port number [operations/puppet] - https://gerrit.wikimedia.org/r/151858 (owner: Alexandros Kosiaris)
[07:15:55] (CR) Hashar: "recheck" [operations/puppet] - https://gerrit.wikimedia.org/r/151858 (owner: Alexandros Kosiaris)
[07:16:25] <_joe_> hashar: uh?
[07:16:37] commenting 'recheck' cause the tests to be run again
[07:16:52] that change now has V+2 from Jenkins hehe
[07:19:15] <_joe_> oh that trick I didn't knew
[07:24:51] <_joe_> hashar: how long will it take for jenkins to be back?
[07:25:02] <_joe_> no pressure, just so that I can organize my work
[07:25:06] it is back
[07:25:12] well at least it is proceeding jobs :-]
[07:25:17] i thought so, or recheck wouldn't have worked
[07:25:18] <_joe_> https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/
[07:25:21] the web GUI takes a while to load though
[07:25:29] <_joe_> ok so it's "a while
[07:25:34] <_joe_> not "at lunch"
[07:25:35] <_joe_> :)
[07:25:38] <_joe_> thanks
[07:26:05] it is proceeding a bunch of history for some reason :(
[07:27:22] processing* :-)
[07:32:43] pfff
[07:32:45] lets stop restart
[07:33:11] !log restarting Jenkins. It apparently like to parse the whole history on reload, so aborting that.
[07:33:17] Logged the message, Master
[07:37:49] AH
[07:38:17] _joe_: so Jenkins works, but the GUI is blocked until some plugin completed its migration :/
[07:38:45] <_joe_> eheheh gotta love java enterprise architectures
[07:39:13] the upgrade is registered during jenkins init :-(
[07:39:15] and before the ui
[07:39:20] but the job are there bah
[07:40:17] _joe_: have you got the puppet compiler instance fixed up ?
[07:40:23] <_joe_> yes
[08:20:55] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 06:20:41 UTC
[08:23:27] (CR) Alexandros Kosiaris: [C: 2] Allow overriding postgres datadir [operations/puppet] - https://gerrit.wikimedia.org/r/151864 (owner: Alexandros Kosiaris)
[08:26:21] (PS1) Alexandros Kosiaris: Fix typo in ebcb7cd [operations/puppet] - https://gerrit.wikimedia.org/r/152007
[08:27:15] (CR) Alexandros Kosiaris: [C: 2 V: 2] Fix typo in ebcb7cd [operations/puppet] - https://gerrit.wikimedia.org/r/152007 (owner: Alexandros Kosiaris)
[08:28:57] hi _joe_
[08:29:18] is stream.wikimedia.org meant to be working?
[08:29:24] if so, it doesn't
[08:29:33] yields a 404
[08:29:33] @notify _joe_
[08:29:33] This user is now online in #wikimedia-operations. I'll let you know when they show some activity (talk, etc.)
[08:29:58] the 404 is for /, which is deliberate
[08:30:06] the 502s are known but not yet resolved, but the site hasn't launched yet
[08:30:08] ori: I am using that python example
[08:30:13] mhm
[08:30:17] I get the 502 error
[08:30:23] ok so is there any working one?
[08:30:34] because without that I can hardly implement parser...
[08:30:44] so wait
[08:30:54] I hope IRC will not get disabled before you actually get it working :P
[08:31:01] of course not!
[08:31:06] it won't
[08:31:36] is there a ticket for 502 error I could subscribe to
[08:32:25] hmm stream.wmflabs.org looks better I guess
[08:32:37] it doesn't really do anything but at least it doesn't crash
[08:32:44] petan: https://bugzilla.wikimedia.org/show_bug.cgi?id=66989
[08:32:48] ok thanks
[08:33:43] ori: transparency.wikimedia.org is up. Nice work!
[08:34:02] weee. i forgot to check, thanks!
[08:35:13] petan: i know you have reservations about it, but please have a tiny bit of patience and good faith
[08:35:20] i promise it won't suck
[08:37:05] I think that if you actually had the experience of implementing this in a C / C++ solution with limited possibilities to use any 3rd libraries, you would understand why simple solutions like IRC are sometimes very powerful
[08:37:20] parsing that IRC text isn't simple, but it's much easier than writing parser for JSON
[08:39:06] you've made your feelings about this known, and i hear you
[08:40:25] but you were almost gloating a moment ago, and i wonder if your opinion about the data encoding isn't spilling over into bad faith about stability
[08:42:24] <_joe_> petan: sorry I was ignoring IRC for a few minutes before
[08:43:07] <_joe_> but yeah, stream.wm.org is not "launched"
[08:52:52] any clue how/which apache config is loaded on beta cluster ?
[08:52:57] I mean on mediawiki01 / 02
[08:52:59] curl --verbose -L http://deployment.wikimedia.beta.wmflabs.org/wiki/Meta:Huggle/List 2>&1 |fgrep Location
[08:53:03] yields a redirect loop :-/
[08:54:32] ah /data/project/apache/conf
[08:55:58] <_joe_> hashar: everything seems to return a redirect loop
[08:56:17] <_joe_> :/
[08:56:26] <_joe_> it didn't last I checked
[08:56:33] <_joe_> this makes me sad
[08:56:49] I am sure it is related to x-forwarded-proto
[09:00:35] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Wed Aug 6 09:00:32 UTC 2014
[09:01:34] <_joe_> hashar: it redirects to itself
[09:01:38] <_joe_> how weird
[09:02:56] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[09:07:14] bah
[09:07:21] on mediawiki01 I can't even manage to hit the URLs
[09:07:29] curl --verbose -x 127.0.0.1:80 'http://en.wikipedia.beta.wmflabs.org/w/index.php'
[09:07:29] yields a 404 /wiki/ not found hehe
[09:08:42] that broken labstore1003 thing is causing problems with puppet in labs
[09:09:09] (PS1) Alexandros Kosiaris: osm: Move postgres datadir to /srv/postgres [operations/puppet] - https://gerrit.wikimedia.org/r/152013
[09:10:30] it's Include sites-enabled/*.conf
[09:10:39] puppet should be disabled on those hosts
[09:10:47] beta still has to load from the old location
[09:12:30] gross hack: apache::conf { 'load-beta-sites': content => "Include /usr/local/apache/conf/main.conf" }
[09:12:56] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[09:13:02] ori: _joe_: opinions on this ? https://rt.wikimedia.org/Ticket/Display.html?id=8068
[09:13:12] never done that before btw
[09:16:06] akosiaris: the ticket could stand to be a bit clearer
[09:17:21] * _joe_ concurs
[09:17:32] yup
[09:17:41] which is why I am asking :-)
[09:18:16] (CR) Nuria: [C: 1] "I cannot +2 this change. +1 on my end."
[operations/puppet] - https://gerrit.wikimedia.org/r/150850 (owner: Andrew Bogott)
[09:19:23] (CR) QChris: "The Analytics infrastructure can deal with an" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/151687 (owner: Dr0ptp4kt)
[09:19:29] <_joe_> well, I'd say the DNS solution (which I didn't know about) would be better
[09:23:39] heh
[09:23:48] https://support.google.com/webmasters/answer/176792?hl=en&ref_topic=4564314
[09:24:06] the links for DNS TXT and CNAME records point to https://en.wikipedia.org/wiki/TXT_record#TXT and https://en.wikipedia.org/wiki/CNAME_record :)
[09:26:56] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[09:28:51] * hoo whispers "restart JSON dumps"
[09:28:57] * hoo hides
[09:29:33] ^^
[09:52:39] (CR) Chad: [C: 2] Enable more features for fawiki AbuseFilter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151421 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[09:53:13] (Merged) jenkins-bot: Enable more features for fawiki AbuseFilter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151421 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[09:54:46] !log demon Synchronized wmf-config/abusefilter.php: abuse filter settings for fawiki (duration: 00m 21s)
[09:54:51] Logged the message, Master
[09:56:19] ^d: Hey! I want my patch deployed too! Boo!
[09:57:03] * ^d hides
[10:10:26] !log Jenkins web interface is back up
[10:10:31] _joe_: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/ is back :)
[10:10:32] Logged the message, Master
[10:11:28] <_joe_> hashar: thanks
[10:25:15] (CR) Bene: "@Thiemo: that's actually not correct. This patch allows badges to be added to Wikidata."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149918 (https://bugzilla.wikimedia.org/40810) (owner: Bene)
[10:37:03] (PS1) Alexandros Kosiaris: Revert "Labs: switch dumps to the new server" [operations/puppet] - https://gerrit.wikimedia.org/r/152022
[10:38:51] (CR) Giuseppe Lavagetto: [C: 1] Revert "Labs: switch dumps to the new server" [operations/puppet] - https://gerrit.wikimedia.org/r/152022 (owner: Alexandros Kosiaris)
[10:39:33] (CR) Alexandros Kosiaris: [C: 2] Revert "Labs: switch dumps to the new server" [operations/puppet] - https://gerrit.wikimedia.org/r/152022 (owner: Alexandros Kosiaris)
[11:06:06] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[11:13:56] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[11:17:53] (PS1) Calak: Try to fix a minor bug [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152031 (https://bugzilla.wikimedia.org/69073)
[11:18:06] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[11:20:06] (PS1) Brion VIBBER: Followup fix to fawiki AbuseFilter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152033 (https://bugzilla.wikimedia.org/69073)
[11:21:02] (CR) Brion VIBBER: [C: 1] "Looks like right fix, need someone who can push the update to merge."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152031 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[11:22:10] (Abandoned) Brion VIBBER: Followup fix to fawiki AbuseFilter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152033 (https://bugzilla.wikimedia.org/69073) (owner: Brion VIBBER)
[11:23:09] (PS2) Chad: Fix up fawiki abusefilter settings [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152031 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[11:23:20] (CR) Chad: [C: 2] Fix up fawiki abusefilter settings [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152031 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[11:23:59] (Merged) jenkins-bot: Fix up fawiki abusefilter settings [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152031 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[11:25:59] !log demon Synchronized wmf-config/abusefilter.php: (no message) (duration: 00m 19s)
[11:26:05] Logged the message, Master
[11:26:56] ^d: You done?
[11:27:00] <^d> alldone.
[11:27:18] ok
[11:27:29] (PS3) Hoo man: Grant 'centralauth-rename' right to stewards [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139655 (owner: Gerrit Patch Uploader)
[11:28:53] (CR) Hoo man: [C: 2] "This makes sense to me... will remove the global right immediately after." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139655 (owner: Gerrit Patch Uploader)
[11:28:56] (Merged) jenkins-bot: Grant 'centralauth-rename' right to stewards [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139655 (owner: Gerrit Patch Uploader)
[11:30:33] !log hoo Synchronized wmf-config/CommonSettings.php: Grant 'centralauth-rename' to 'steward' (duration: 00m 24s)
[11:30:40] Logged the message, Master
[11:36:27] (CR) Calak: "Thank you very much."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152031 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[11:39:34] hoo: I want my patch merged too! Boo!
[11:40:50] mh?
[11:40:58] if it's trivial that might be possible
[11:41:19] Didn't Greg say it was supposed to be emergency deploys only?
[11:41:27] It's Hackathon
[11:41:45] So?
[11:42:06] trivial stuff usually is fine ;)
[11:42:39] https://gerrit.wikimedia.org/r/#/c/147922/ has been waiting for three weeks now
[11:43:10] I saw that
[11:43:22] but I'm not into that code enough to be able to tell it's trivial
[11:43:32] so not going to happen, sorry :(
[11:43:43] Pfff.
[11:44:08] are you at the hackathon right now?
[11:57:46] ori: around by any chance ? :-D
[11:59:48] ACKNOWLEDGEMENT - Host elastic1019 is DOWN: PING CRITICAL - Packet loss = 100% Chris Johnson Added new disks..needs reinstall
[12:18:45] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: Puppet has 1 failures
[12:27:59] (PS1) Calak: Change upload settings on fa.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152042 (https://bugzilla.wikimedia.org/69171)
[12:28:05] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures
[12:30:14] hoo: I did not remember to start the wikidata dumps yesterday during the right window :-( but I have just started them today, at the right time.
[12:35:45] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[12:39:44] (PS1) Alexandros Kosiaris: postgres: Followup commit for ebcb7cd [operations/puppet] - https://gerrit.wikimedia.org/r/152046
[12:41:30] (CR) jenkins-bot: [V: -1] postgres: Followup commit for ebcb7cd [operations/puppet] - https://gerrit.wikimedia.org/r/152046 (owner: Alexandros Kosiaris)
[12:46:05] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[12:49:06] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:01:39] apergos: Awesome :)
[13:01:58] do you think it might be a good idea to tell me which window that is and give me access to the datasets user maybe?
[13:02:03] so that I can start stuff myself
[13:02:09] might be needed more often :S
[13:06:06] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[13:07:33] hashar: yt?
[13:07:46] yih
[13:07:58] StevenW: Chris McMahon told me you were looking for me :]
[13:08:02] tip: I am not in London
[13:08:08] Yeah he told me :)
[13:08:33] Did you see the earlier ping from superm401 or phuedx about Beta Labs not updating to master?
[13:09:38] StevenW: nop
[13:09:47] I guess some job is broken :/
[13:09:53] Prbly
[13:09:57] (CR) QChris: Set up passive icinga for webrequest data imports in HDFS and Hive (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/151963 (owner: Ottomata)
[13:10:00] StevenW: do you know whether a bug has been filed?
[13:10:15] I don't think so
[13:10:21] ah https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ is broken
[13:12:47] StevenW: some git directory is dirty
[13:12:49] I am fixing it
[13:13:20] merci beaucoup
[13:14:04] (PS2) Alexandros Kosiaris: postgres: Followup commit for ebcb7cd [operations/puppet] - https://gerrit.wikimedia.org/r/152046
[13:14:04] StevenW: fixed up
[13:14:56] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[13:17:36] (PS1) Yuvipanda: quarry: Specify uid of user explicitly [operations/puppet] - https://gerrit.wikimedia.org/r/152062
[13:17:42] Coren_WM2014: ^ merge? :)
[13:20:24] Yay! Seems to be working now.
[13:20:26] Thanks hashar
[13:20:51] (PS1) Hashar: beta: fix ansi escapes for wmf-beta-autoupdater [operations/puppet] - https://gerrit.wikimedia.org/r/152063
[13:22:14] StevenW: do file a bug next time there is an issue on beta :-]
[13:22:27] there is a few folks in the default CC: list that can fix up issues
[13:24:18] (CR) Hashar: "Cherry picked on beta cluster puppetmaster.
Should see something slightly different on https://integration.wikimedia.org/ci/job/beta-code-" [operations/puppet] - https://gerrit.wikimedia.org/r/152063 (owner: Hashar)
[13:26:54] (CR) coren: [C: -1] "This is unlikely to do what you expect if you are using any part of NFS: the user needs to be known in LDAP (we can add users in LDAP at n" [operations/puppet] - https://gerrit.wikimedia.org/r/152062 (owner: Yuvipanda)
[13:28:06] hashar: will do
[13:29:08] (PS3) Alexandros Kosiaris: postgres: Followup commit for ebcb7cd [operations/puppet] - https://gerrit.wikimedia.org/r/152046
[13:33:48] (CR) Alexandros Kosiaris: [C: 2] postgres: Followup commit for ebcb7cd [operations/puppet] - https://gerrit.wikimedia.org/r/152046 (owner: Alexandros Kosiaris)
[13:34:40] (PS2) Hashar: beta: fix ansi escapes for wmf-beta-autoupdater [operations/puppet] - https://gerrit.wikimedia.org/r/152063
[13:37:24] (CR) Hashar: "fixed an indentation error: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/18829/console" [operations/puppet] - https://gerrit.wikimedia.org/r/152063 (owner: Hashar)
[13:58:39] Coren_WM2014: so... id *does* work. I can read and write with matching uids
[13:59:20] YuviPanda: The thing you say, it makes no sense to me. Lemme see why.
[13:59:31] Coren_WM2014: I just tried touching things and it's finnne
[14:00:23] YuviPanda: Can you point me at an example file you have touched?
[14:00:32] Coren_WM2014: /data/project/quarry/results/hi
[14:02:25] Oh! Ew! Eeeew! You just randomly hit upon the default system user. Not only is this horrible (because that's going to not actually do access control) but you'll get random group membership.
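Editor's note on the uid exchange above: one way to see which numeric uid and gid a touched file actually landed under, and whether the host can map them to names, is a small check like the one below. This is only a sketch using the Python standard library; the helper name `describe_owner` is made up for illustration, and it assumes the name lookup goes through whatever NSS sources (local passwd, LDAP) the host has configured. It does not reproduce what Coren did to diagnose the problem.

```python
import grp
import os
import pwd


def describe_owner(path):
    """Report the numeric owner of `path` and the names they resolve to, if any."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = None  # uid is not known to this host's user database
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = None  # gid is not known to this host's group database
    return {"uid": st.st_uid, "gid": st.st_gid, "user": user, "group": group}
```

Calling `describe_owner("/data/project/quarry/results/hi")` on the instance would show whether the file is owned by the uid the puppet change intended or by some other account that happens to share the number.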
[14:12:52] (PS1) Giuseppe Lavagetto: hhvm: fix status [operations/puppet] - https://gerrit.wikimedia.org/r/152079
[14:12:54] (PS1) Giuseppe Lavagetto: hhvm: add process monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/152080
[14:12:56] (PS1) Giuseppe Lavagetto: mediawiki: basic HHVM monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/152081
[14:20:49] (PS1) Giuseppe Lavagetto: mediawiki: move testwiki to HAT [DO_NOT_MERGE] [operations/puppet] - https://gerrit.wikimedia.org/r/152082
[14:21:43] <_joe_> was this clear enough? :)
[14:22:08] (CR) Giuseppe Lavagetto: [C: -2] "Just to be sure no-one gets fancy..." [operations/puppet] - https://gerrit.wikimedia.org/r/152082 (owner: Giuseppe Lavagetto)
[14:40:57] (CR) Hashar: [C: 1] beta: Clone mediawiki/vendor instead of mediawiki/core/vendor [operations/puppet] - https://gerrit.wikimedia.org/r/151131 (https://bugzilla.wikimedia.org/68485) (owner: BryanDavis)
[14:53:10] (CR) Filippo Giunchedi: "will puppet change the remote on an existing repo or it needs to be manually fixed?" [operations/puppet] - https://gerrit.wikimedia.org/r/151131 (https://bugzilla.wikimedia.org/68485) (owner: BryanDavis)
[14:56:36] Coren_WM2014: any idea what was happening?
[14:57:25] <_joe_> YuviPanda: regarding what?
[14:57:59] _joe_: the nfs uid issue, which was working when apparently it should've not been
[14:58:22] <_joe_> oh no idea
[14:59:58] :D
[15:01:59] Reedy: so, wanna deploy https://gerrit.wikimedia.org/r/#/c/151425/ ?
[15:04:13] (CR) Yurik: Log when Internet.org in X-Analytics iorg when appropriate (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/151687 (owner: Dr0ptp4kt)
[15:15:56] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[15:16:01] (PS1) Terrrydactyl: Added CentralAuth url and testing databases [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/152108
[15:33:41] (CR) Dzahn: [C: -1] "what Filippo said, since we start puppet agent via cron, i think we don't want the service dependency here" [operations/puppet] - https://gerrit.wikimedia.org/r/150273 (owner: Dzahn)
[15:36:38] (CR) Dzahn: "since this has moved to I398127fe8e853 this is duplicate/deprecated now, please abandon" [operations/apache-config] - https://gerrit.wikimedia.org/r/108465 (https://bugzilla.wikimedia.org/60238) (owner: Tim Landscheidt)
[15:41:02] (CR) Dzahn: [C: 2] otrs: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/151787 (owner: Matanya)
[15:42:43] eh, Google UK forces https to http ?
[15:42:45] sweet
[15:43:02] (PS1) Hashar: zuul: raise Gearman related log level [operations/puppet] - https://gerrit.wikimedia.org/r/152118
[15:43:20] mutante: UK has a big firewall now with deep packet inspection
[15:43:31] sounds like China
[15:44:37] mutante: would you mind merging some Zuul config changes for me please?
https://gerrit.wikimedia.org/r/#/c/152118/ :D
[15:44:42] (CR) Giuseppe Lavagetto: wmflib: add ssl_ciphersuite (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/150781 (owner: Giuseppe Lavagetto)
[15:44:49] (PS3) Giuseppe Lavagetto: wmflib: add ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/150781
[15:45:11] (PS1) Bartosz Dziewoński: Remove unnecessary file_exists checks for skin requires [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152119
[15:45:13] (PS1) Bartosz Dziewoński: Add skins requires for Vector and MonoBook [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152120
[15:45:15] (PS4) Giuseppe Lavagetto: wmflib: add ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/150781
[15:45:18] (PS2) Hashar: Added CentralAuth url and testing databases [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/152108 (owner: Terrrydactyl)
[15:45:34] (CR) Hashar: "Dummy patch to retrigger tests." [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/152108 (owner: Terrrydactyl)
[15:45:52] mutante: mmhh worked for me, https://www.google.co.uk
[15:45:55] (CR) Dzahn: [C: 2] zuul: raise Gearman related log level [operations/puppet] - https://gerrit.wikimedia.org/r/152118 (owner: Hashar)
[15:46:13] mutante: I will run puppet where needed :]
[15:46:15] Danke!
[15:46:19] godog: hmm. getting http://www.google.co.uk/?gws_rd=ssl
[15:46:22] hashar: de rien
[15:48:26] godog: i hear it's the Barbican doing this ?
[15:49:23] mutante: ouch :( I have "https everywhere" on chrome but on ff just typing google.com does the right thing
[15:49:24] (Abandoned) Tim Landscheidt: Set up redirects for toolserver.org [operations/apache-config] - https://gerrit.wikimedia.org/r/108465 (https://bugzilla.wikimedia.org/60238) (owner: Tim Landscheidt)
[15:49:27] (PS2) Bartosz Dziewoński: Add skin requires for Vector and MonoBook [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152120
[15:49:34] godog: https://encrypted.google.com/ works
[15:49:41] got the hint on #wikimania
[15:50:07] mutante: have you merged the change on the puppetmaster ?
[15:50:34] hashar: now i did :p
[15:50:47] :-D
[15:52:13] !log Restarted Zuul and Zuul-merger on gallium to tweak logging settings {{gerrit|152118}}
[15:52:18] Logged the message, Master
[15:52:20] <_joe_> mutante: hey, hi
[15:52:56] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 13:52:37 UTC
[15:52:57] (PS5) Giuseppe Lavagetto: wmflib: add ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/150781
[15:53:06] (CR) Giuseppe Lavagetto: [C: 2 V: 2] wmflib: add ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/150781 (owner: Giuseppe Lavagetto)
[15:53:14] <_joe_> mutante: ^^
[15:53:21] <_joe_> we should test it and use it everywhere
[15:53:33] <_joe_> so we can change configs in one single place
[15:53:46] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Wed Aug 6 15:53:43 UTC 2014
[15:53:50] <_joe_> and next ciphersuite change will work everywhere with 1 gerrit change and not 100
[15:54:55] all good I am off
[16:01:06] _joe_: sounds cool, i've been thinking if thats possible in the past
[16:03:49] (CR) Giuseppe Lavagetto: [C: -1] "The virtualhost seems ok, but... why should we add this virtualhost on all mediawikis?
Shouldn't be toolserver.org be managed on other ser" [operations/puppet] - https://gerrit.wikimedia.org/r/151523 (https://bugzilla.wikimedia.org/60238) (owner: Tim Landscheidt)
[16:09:53] (CR) Giuseppe Lavagetto: [C: -1] "This requires some additional research before it gets a green light." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/145616 (https://bugzilla.wikimedia.org/68553) (owner: JanZerebecki)
[16:10:30] (CR) Filippo Giunchedi: "I think it'd be helpful to get examples of the two approaches to clear things up" [operations/puppet] - https://gerrit.wikimedia.org/r/151605 (owner: Giuseppe Lavagetto)
[16:13:56] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 14:13:18 UTC
[16:14:36] (CR) Giuseppe Lavagetto: "An example of the new approach is https://gerrit.wikimedia.org/r/#/c/151606/1/modules/mediawiki/manifests/web/modules.pp" [operations/puppet] - https://gerrit.wikimedia.org/r/151605 (owner: Giuseppe Lavagetto)
[16:15:15] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[16:27:15] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[16:33:41] _joe_: what would be the value of mailman queue to alert
[16:37:52] using 42 :)
[16:38:40] <_joe_> eheh
[16:38:56] <_joe_> matanya: I really have no clue at the moment :)
[16:39:43] thanks _joe_ and mutante :)
[16:41:52] (CR) BryanDavis: "It needs to be manually fixed, but that has already been done in beta.
This change will just make it "do the right thing" in the event tha" [operations/puppet] - https://gerrit.wikimedia.org/r/151131 (https://bugzilla.wikimedia.org/68485) (owner: BryanDavis)
[16:43:58] (PS5) Dzahn: mailman: monitor queue size [operations/puppet] - https://gerrit.wikimedia.org/r/146756 (owner: Matanya)
[16:44:34] we all hate submodules
[16:45:46] to be fair, lots to hate :)
[16:48:16] (PS6) Dzahn: mailman: monitor queue size [operations/puppet] - https://gerrit.wikimedia.org/r/146756 (owner: Matanya)
[16:56:18] (CR) Dzahn: [C: 1] "recheck" [operations/puppet] - https://gerrit.wikimedia.org/r/146756 (owner: Matanya)
[16:58:35] (CR) Filippo Giunchedi: [C: 2 V: 2] beta: Clone mediawiki/vendor instead of mediawiki/core/vendor [operations/puppet] - https://gerrit.wikimedia.org/r/151131 (https://bugzilla.wikimedia.org/68485) (owner: BryanDavis)
[16:58:45] (PS2) Filippo Giunchedi: beta: Clone mediawiki/vendor instead of mediawiki/core/vendor [operations/puppet] - https://gerrit.wikimedia.org/r/151131 (https://bugzilla.wikimedia.org/68485) (owner: BryanDavis)
[16:58:52] (CR) Filippo Giunchedi: [V: 2] beta: Clone mediawiki/vendor instead of mediawiki/core/vendor [operations/puppet] - https://gerrit.wikimedia.org/r/151131 (https://bugzilla.wikimedia.org/68485) (owner: BryanDavis)
[17:01:51] leeeroy jeeeenkins.. come on
[17:02:13] akosiaris: does backup::client need ferm rule to allow access ?
[17:02:36] <^d> !log jenkins restarted, was stuck
[17:02:38] Logged the message, Master
[17:02:41] thanks ^d
[17:02:52] \o/ :)
[17:03:03] <^d> yw
[17:03:45] (CR) Dzahn: "recheck" [operations/puppet] - https://gerrit.wikimedia.org/r/146756 (owner: Matanya)
[17:07:03] (PS7) Dzahn: mailman: monitor queue size [operations/puppet] - https://gerrit.wikimedia.org/r/146756 (owner: Matanya)
[17:07:18] let's see if jenkins will make us get to ps20
[17:08:01] (CR) Dzahn: [C: 2] [Italian Planet] Update fcvg.it [operations/puppet] - https://gerrit.wikimedia.org/r/151362 (owner: Nemo bis)
[17:08:59] (CR) Dzahn: [C: 2] mailman: monitor queue size [operations/puppet] - https://gerrit.wikimedia.org/r/146756 (owner: Matanya)
[17:09:23] \o/
[17:10:36] I am fed up with Zuul / Jenkins.
[17:12:08] <_joe_> !log stopped the jobrunner on mw1053, was running in fcgi mode unpuppetized and with a broken vhost. Fixed it, it started spawning exceptions. DO NOT enable puppet again
[17:12:13] Logged the message, Master
[17:13:36] ^d: the Jenkins artifact deployer plugin cause jenkins to take a while to boot :-(
[17:14:20] !log killed Jenkins
[17:14:25] Logged the message, Master
[17:15:02] <^d> hashar: Boo sorry :(
[17:15:14] ^d: I have upgraded it yesterday
[17:15:19] that is an upstream issue :-:
[17:16:04] <^d> Was I right that it was jenkins stuck and not zuul?
[17:16:07] <^d> zuul logs looked ok.
[17:16:33] there is some crazy problem with Jenkins suddenly disappearing from Zuul
[17:16:36] or something along that way
[17:16:49] I have parsed gigabytes of log but haven't found any serious clue :-/
[17:16:56] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[17:17:01] <^d> Kicking jenkins the way to get it happy? Or kicking zuul?
[17:17:16] (03PS2) 10Ottomata: Set up passive icinga for webrequest data imports in HDFS and Hive [operations/puppet] - 10https://gerrit.wikimedia.org/r/151963 [17:17:19] usually I disconnect Jenkins Gearman client [17:17:22] I should document it [17:17:39] icinga is broken due to gwicke not being a contact [17:18:12] ^d: and here is the slowdown : http://paste.debian.net/114007/ [17:18:17] will report it [17:20:01] !log Jenkins process jobs again, the UI will take a bunch of hours to load though due to some issue when initializing [17:20:05] Logged the message, Master [17:23:54] ^d: thanks for the restart :] [17:24:17] <^d> yw. Function of people being 10 feet from me. [17:24:29] <^d> Ran over, "CHAD, CAN YOU RESTART JENKINS? WE KILLED IT" [17:25:11] mutante: I77f412fc202c187fa37710c6e2c004c7ca94f6c6 [17:25:27] (03PS1) 10Dzahn: remove gwicke from icinga contactgroups [operations/puppet] - 10https://gerrit.wikimedia.org/r/152144 [17:26:50] (03CR) 10Dzahn: "this broke icinga - Error: Could not find any contact matching 'gwicke' etc" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 (owner: 10Physikerwelt) [17:27:42] ^d: yeah that was the way to handle it :] [17:27:53] web interface will come back eventually [17:28:00] at least jobs are being run [17:28:09] any patchset pending need to be resent as a new ps [17:28:42] (03PS2) 10Dzahn: remove gwicke from icinga contactgroups [operations/puppet] - 10https://gerrit.wikimedia.org/r/152144 [17:34:14] (03CR) 10Dzahn: "reverted contactgroup part in Id1406c28db329193" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 (owner: 10Physikerwelt) [17:35:11] (03CR) 10Dzahn: [C: 032 V: 032] "override jenkins" [operations/puppet] - 10https://gerrit.wikimedia.org/r/152144 (owner: 10Dzahn) [17:41:28] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 15:40:01 UTC [18:01:03] (03PS1) 10Dzahn: decom tarin (pmtpa poolcounter) [operations/puppet] - 
10https://gerrit.wikimedia.org/r/152154 [18:03:23] bblack, around? [18:04:04] (03PS1) 10Dzahn: retab icinga checkcommands template [operations/puppet] - 10https://gerrit.wikimedia.org/r/152155 [18:05:11] (03CR) 10Dzahn: "something is missing, did not show up in icinga yet (but also no error anymore)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146756 (owner: 10Matanya) [18:07:45] (03CR) 10Dzahn: "Ori, wanna take it? path conflict though" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148544 (owner: 10MaxSem) [18:08:55] " Please wait while Jenkins is getting ready to work" [18:09:06] on f.e. https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/ [18:10:24] !log puppet-catalog-compiler says to "wait while Jenkins is getting ready to work" [18:10:30] Logged the message, Master [18:11:09] mutante: hashar said earlier "web interface will come back eventually \ at least jobs are being run" [18:11:14] Jenkins Continuous Integration Server is running with the pid 23374 [18:11:46] spagewmf: ok, i'll just let it run [18:11:54] thx, will check again later [18:14:28] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 14:13:18 UTC [18:15:25] yurikR: yeah [18:16:29] oh, eh [18:16:42] yurikR: pong [18:16:53] do i need to do something like puppet-merge after merging in "analytics/wikistats"? [18:17:13] bblack, hi. I was looking at lots of log entries for zero, seems like we are getting tons of bad XFF [18:17:28] e.g., with intermixed internal and external ip ranges, etc [18:17:43] possibly some system is even prepending IPs instead of appending them [18:17:49] have any examples? [18:18:18] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [18:18:37] (03CR) 10Dzahn: "Tim, i'll run it for you, jenkins was just busy right now. 
You can also try though if you have permission, i'm not sure, but the login wou" [operations/puppet] - 10https://gerrit.wikimedia.org/r/124001 (owner: 10Tim Landscheidt) [18:29:46] (03PS1) 10Dzahn: wikimedia.org, TXT records for google verification [operations/dns] - 10https://gerrit.wikimedia.org/r/152159 [18:30:18] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [18:32:59] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Wed Aug 6 18:32:56 UTC 2014 [18:36:43] (03CR) 10Dzahn: "please rebase if still desired" [operations/dns] - 10https://gerrit.wikimedia.org/r/133275 (owner: 10Ori.livneh) [18:37:00] (03CR) 10Dzahn: "please rebase if still desired" [operations/dns] - 10https://gerrit.wikimedia.org/r/115093 (owner: 10coren) [18:40:19] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Wed Aug 6 18:40:17 UTC 2014 [18:44:00] (03CR) 10Jalexander: [C: 031] wikimedia.org, TXT records for google verification [operations/dns] - 10https://gerrit.wikimedia.org/r/152159 (owner: 10Dzahn) [18:48:59] !log Stopping Jenkins. 
Reverting upgrade of artifact deployer plugin [18:49:04] Logged the message, Master [18:50:29] !log restarting jenkins [18:50:35] Logged the message, Master [18:53:14] !log Jenkins slow startup is {{bug|69197}} [18:53:19] Logged the message, Master [18:53:58] looks like I understand Java stack traces after all [19:09:13] (03CR) 10BBlack: [C: 031] "I double-checked (in case I remembered incorrectly): it's perfectly legal to have other arbitrary TXT records alongside the SPF record (de" [operations/dns] - 10https://gerrit.wikimedia.org/r/152159 (owner: 10Dzahn) [19:10:46] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 17:10:26 UTC [19:17:46] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC [19:42:15] bblack, hmm, as i was making an example for you, it occurred to me that it should be ok if XFF has a mixture of internal & external IPs on both ends because those might belong to the internal network of the carrier [19:50:35] (03PS1) 10Jgreen: nagios plugin to check OCG server health [operations/puppet] - 10https://gerrit.wikimedia.org/r/152168 [19:56:31] (03PS2) 10Jgreen: nagios plugin to check OCG server health [operations/puppet] - 10https://gerrit.wikimedia.org/r/152168 [20:06:21] !log I have broke Zuul/Jenkins :-] [20:06:23] again [20:06:27] Logged the message, Master [20:07:18] hashar: good job :D [20:21:36] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [20:26:21] !log Jenkins: downgraded ansicolor plugin from 0.4 to 0.3.1 Some colors.js function emits ANSI codes to reset the color which are not properly understood [20:26:27] Logged the message, Master [20:29:26] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:32:16] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.009 second response time 
[20:32:36] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [20:32:37] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 30 data above and 0 below the confidence bounds [20:32:37] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 30 data above and 0 below the confidence bounds [20:34:36] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [20:46:21] Alchimista: Fix your connection. [20:47:48] (03CR) 10JanZerebecki: "Do the inline comments provide enough additional information?" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/145616 (https://bugzilla.wikimedia.org/68553) (owner: 10JanZerebecki) [21:00:45] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 20:57:55 UTC [21:02:45] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 20:57:55 UTC [21:04:45] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 20:57:55 UTC [21:06:45] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 20:57:55 UTC [21:08:46] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 20:57:55 UTC [21:10:45] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 20:57:55 UTC [21:11:45] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 17:10:26 UTC [21:12:45] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 20:57:55 UTC [21:14:45] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 20:57:55 UTC [21:16:45] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful 
Puppet run was Wed 06 Aug 2014 20:57:55 UTC [21:18:05] RECOVERY - Puppet freshness on tmh1002 is OK: puppet ran at Wed Aug 6 21:17:58 UTC 2014 [21:18:45] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC [21:19:45] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 21:17:58 UTC [21:29:52] (03PS1) 10Jgreen: puppetize check_ocg_health nagios check [operations/puppet] - 10https://gerrit.wikimedia.org/r/152180 [21:33:07] !log Jenkins: moved mediawiki-core-regression-hhvm-master to run on Trusty instance [21:33:12] Logged the message, Master [21:37:02] (03CR) 10JanZerebecki: "Adding a virtualhost on all mediawikis is an expression that I don't understand technically. It would just be another virtual host for red" [operations/puppet] - 10https://gerrit.wikimedia.org/r/151523 (https://bugzilla.wikimedia.org/60238) (owner: 10Tim Landscheidt) [21:38:15] RECOVERY - Puppet freshness on tmh1002 is OK: puppet ran at Wed Aug 6 21:38:06 UTC 2014 [21:59:21] (03CR) 10Dr0ptp4kt: "See question in comment." 
(031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/151687 (owner: 10Dr0ptp4kt) [21:59:36] ^ yurikR [22:00:35] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 19:59:50 UTC [22:25:25] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [22:26:48] (03CR) 10Yurik: Log when Internet.org in X-Analytics iorg when appropriate (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/151687 (owner: 10Dr0ptp4kt) [22:28:18] (03CR) 10Dr0ptp4kt: Log when Internet.org in X-Analytics iorg when appropriate (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/151687 (owner: 10Dr0ptp4kt) [22:37:25] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [23:12:35] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Wed 06 Aug 2014 17:10:26 UTC [23:19:35] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC [23:39:55] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Wed Aug 6 23:39:50 UTC 2014 [23:45:14] (03PS9) 10Dr0ptp4kt: Log when Internet.org in X-Analytics with proxy tag [operations/puppet] - 10https://gerrit.wikimedia.org/r/151687