[00:00:10] alright, LD time! [00:04:10] !log maxsem synchronized php-1.23wmf6/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/99551' [00:04:26] Logged the message, Master [00:05:45] i wonder when we shall fix the warning Recursion detected in RequestContext::getLanguage [00:06:17] is there a way to get meta info about a warning? like url, stacktrace, etc? [00:10:21] my LD is over [00:12:27] thanks MaxSem [00:14:20] Reedy: ori-l if you "touch" a js file, does that invalidate caches and create a new "version" (timestamp)? [00:15:20] aude, it does [00:15:42] can someone try... [00:15:43] i prefer to touch all files in extension though, to be sure [00:15:43] DataValues/ValueView/resources/jquery.valueview/valueview.experts/experts.CommonsMediaType.js [00:15:49] even better [00:15:57] aude, wikibase? [00:16:00] which branch? [00:16:04] it's an issue on test.wikidata since switching to wmf6 [00:16:15] DataValues (maybe Wikibase too) [00:17:27] the js works fine in debug mode or in a fresh browser [00:17:52] yeah, this crap sometimes happens [00:18:02] * aude nods [00:18:10] that's why we have test.wikidata [00:18:35] * aude would panic if real wikidata was broken [00:18:38] !log maxsem synchronized php-1.23wmf6/extensions/DataValues 'touch' [00:18:47] thanks [00:18:55] Logged the message, Master [00:19:03] seems to work! [00:19:53] hello [00:19:57] I just got back [00:20:19] ok, when i go to a new page, i see the problem [00:20:29] it might be an issue with the code and resource loader [00:20:35] RobH: paging is broken, I got no pages at all [00:20:40] * aude investigates more tomorrow [00:20:57] ok [00:20:59] found the root cause [00:23:14] paravoid: hah, within a minute? jerk [00:23:19] paravoid: what was it? 
[00:23:33] (PS1) Faidon Liambotis: base: don't ensure => latest for rsyslog [operations/puppet] - https://gerrit.wikimedia.org/r/99576 [00:23:57] (CR) Faidon Liambotis: [C: 2] base: don't ensure => latest for rsyslog [operations/puppet] - https://gerrit.wikimedia.org/r/99576 (owner: Faidon Liambotis) [00:24:02] (CR) Faidon Liambotis: [V: 2] base: don't ensure => latest for rsyslog [operations/puppet] - https://gerrit.wikimedia.org/r/99576 (owner: Faidon Liambotis) [00:24:17] there [00:24:27] I'll have a look at the underlying swift bug tomorrow [00:24:30] huh [00:24:50] paravoid: thanks :) [00:25:12] paravoid: also, I called your office extension, was that just worthless? does it forward anywhere? [00:25:21] it rings the phone on my desktop [00:25:24] but I was out [00:25:27] * greg-g nods [00:25:28] cool [00:25:35] use my mobile next time :-) [00:25:35] (CR) Aaron Schulz: "Is the job queue one the only one that has problems?" [operations/puppet] - https://gerrit.wikimedia.org/r/99410 (owner: Hashar) [00:25:51] paravoid: I don't have international calling :) [00:26:01] but ariel should have texted you, no? [00:26:12] greg-g, why so 20th century?:) [00:26:25] MaxSem: I'm cheap [00:26:36] bah [00:26:37] Much of the world is still in the 19th century. [00:26:41] yeah, but after it was over + I saw it an hour later [00:26:46] it was very noisy where I was [00:26:51] calling would have worked [00:26:52] but you're working at an international org [00:26:58] * MaxSem bites greg-g [00:27:05] * ori-l is living in a material world [00:27:22] * AaronSchulz is living in a lonely world [00:27:30] I wonder if I have hangouts open on my phone :-) [00:27:36] * Elsie is living on a prayer. [00:27:36] anyway [00:27:42] oh, I do have international, it'd be 35cents/min: https://ting.com/international_calling [00:27:44] * ^d mmbops [00:27:47] * AaronSchulz should take the midnight train going anywhere [00:27:54] greg-g: Brevity is ... wit. 
[00:27:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [00:28:09] Elsie: :) [00:28:16] "hi, get on irc, bye" [00:28:39] "Erk! Erk!" [00:28:50] 00:40 < AaronSchulz> paravoid: are all the mc servers on the same row? [00:29:01] the boolean answer to this question is "yes" [00:29:08] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:29:12] the more verbose even worse answer to this question is [00:29:19] "no, they're all on the same rack" [00:29:57] I was just curious since some config can have multiple servers for availability, so it's good to choose well [00:30:06] * AaronSchulz ended up using the jq servers though [00:30:08] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [00:30:17] paravoid: actually what about rdb1001/rdb1003, same rack? [00:30:44] no, different rack/row, fortunately [00:30:51] the mc* issue I pointed out many months ago [00:31:25] not on a ticket though [00:41:44] gwicke: hey [00:44:09] * AaronSchulz is very confused about swift in tampa now [00:44:31] (PS2) Tim Starling: Update trusted-xff.cdb for I4fd360a6 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98036 [00:44:43] (CR) Tim Starling: [C: 2] Update trusted-xff.cdb for I4fd360a6 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98036 (owner: Tim Starling) [00:45:27] (Merged) jenkins-bot: Update trusted-xff.cdb for I4fd360a6 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98036 (owner: Tim Starling) [00:48:02] paravoid: hi back [00:50:44] oh hey [00:51:09] so as expected node 0.10 is not an issue in itself [00:51:34] just a bit annoying to backport all the bits & pieces [00:55:10] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:55:30] PROBLEM - Puppet freshness on srv193 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 08:44:23 PM UTC [00:57:00] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [01:12:04] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:13:54] RECOVERY - Puppet freshness on srv193 is OK: puppet ran at Fri Dec 6 01:13:53 UTC 2013 [01:15:04] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [01:24:53] paravoid: the test runners now all have your packages installed, and a new run is started [01:25:25] from my side all looks good for a deploy to prod next week [01:37:11] fatalmonitor: 1 Segmentation fault (11) [01:37:18] haven't seen that one before [01:37:22] only 1 though [01:38:23] PROBLEM - Puppet freshness on cp1046 is CRITICAL: Last successful Puppet run was Thu 05 Dec 2013 10:38:05 PM UTC [01:42:30] (PS1) Tholam: Update favicon wikispecies.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99590 [01:56:13] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:59:13] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [02:02:14] !log LocalisationUpdate completed (1.23wmf5) at Fri Dec 6 02:02:14 UTC 2013 [02:02:36] Logged the message, Master [02:10:40] so I'm missing ganglia data for all the hosts I care about again, someone remind me of a common fix for that other than restarting ganglia-monitor on the hosts? 
[02:10:51] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Mobile%2520caches%2520esams&tab=m&vn=&hide-hf=false [02:11:37] I seem to remember I've hit this once before and it was easy, but apparently it was easily forgettable [02:15:21] !log LocalisationUpdate completed (1.23wmf6) at Fri Dec 6 02:15:21 UTC 2013 [02:15:37] Logged the message, Master [02:22:06] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Dec 6 02:22:06 UTC 2013 [02:22:30] Logged the message, Master [02:27:36] !log tstarling synchronized wmf-config/trusted-xff.cdb [02:27:36] (PS1) Springle: Direct 'vslow' query group to LB=0 snapshot slaves. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99594 [02:27:52] Logged the message, Master [02:28:01] (PS1) Tholam: Update favicon wikiveristy.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99595 [02:28:26] (CR) Springle: [C: 2] Direct 'vslow' query group to LB=0 snapshot slaves. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99594 (owner: Springle) [02:29:43] !log springle synchronized wmf-config/db-eqiad.php 'direct vslow query group to LB=0 snapshot slaves' [02:29:58] Logged the message, Master [02:43:56] (PS1) Springle: depool es1001 for upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99599 [02:45:06] (CR) Springle: [C: 2] depool es1001 for upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99599 (owner: Springle) [02:46:07] !log springle synchronized wmf-config/db-eqiad.php 'depool es1001 for upgrade' [02:46:23] Logged the message, Master [02:48:02] (PS1) Mattflaschen: Test under-review English Wikipedia draft namespace on Labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99600 [02:48:07] (PS1) Tholam: Update favicon wmf.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99601 [02:48:07] PROBLEM - check_job_queue on terbium is CRITICAL: 
CHECK_NRPE: Socket timeout after 10 seconds. [02:49:07] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [02:50:08] (PS2) Mattflaschen: Test under-review English Wikipedia draft namespace on Labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99600 [02:51:17] !log set es1004 max_connections back to default (should have been reset previously, hence dberror log noise) [02:51:38] Logged the message, Master [02:55:44] (CR) Mattflaschen: "Legoktm, it's no longer an issue since MZMcBride changed the patch to "noindex,nofollow". However, http://www.robotstxt.org/faq/relnofollo" (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/97675 (owner: MZMcBride) [02:58:07] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:07] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [03:06:12] (CR) Reedy: Test under-review English Wikipedia draft namespace on Labs (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99600 (owner: Mattflaschen) [03:06:20] (PS1) Tholam: Update favicon wikimania.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99602 [03:11:03] PROBLEM - mysqld processes on es1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [03:12:10] oh [03:16:13] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:17:09] (CR) Brian Wolff: Test under-review English Wikipedia draft namespace on Labs (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99600 (owner: Mattflaschen) [03:19:13] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [03:27:16] https://bugzilla.wikimedia.org/show_bug.cgi?id=58074 looks fairly serious (namespace aliases on mlwiki disappeared) [03:31:11] (PS1) Reedy: Fix pa_uswikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99604 [03:31:24] (CR) Brian Wolff: "I'm not sure if this matters, but I think this hook would allow oAuth apps with only "Edit existing pages" rights, also the ability to cre" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99600 (owner: Mattflaschen) [03:31:36] (CR) Reedy: [C: 2] Fix pa_uswikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99604 (owner: Reedy) [03:32:37] (Merged) jenkins-bot: Fix pa_uswikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99604 (owner: Reedy) [03:32:47] (CR) Mattflaschen: "Replied inline." (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99600 (owner: Mattflaschen) [03:33:23] !log reedy synchronized multiversion/MWMultiVersion.php [03:33:39] Logged the message, Master [03:35:52] Stupid wiki is stupid [03:38:26] (PS1) Tholam: Update favicon chapcom.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99605 [03:40:40] (PS1) Reedy: Fix comment about . to _ replacement in setSiteInfoForWiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99606 [03:47:38] (CR) Mattflaschen: [C: -1] "Brian Wolff brought up that it might interfere with OAuth. 
This can be avoided (and in general the scope is narrowed) by adding an isAnon" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/97675 (owner: MZMcBride) [03:48:58] beta labs' selenium readership thanks you for being circumspect [03:55:32] (PS3) Mattflaschen: Test under-review English Wikipedia draft namespace on Labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99600 [03:58:47] (PS1) Springle: switch es1001 to mariadb [operations/puppet] - https://gerrit.wikimedia.org/r/99608 [04:00:58] (CR) Springle: [C: 2] switch es1001 to mariadb [operations/puppet] - https://gerrit.wikimedia.org/r/99608 (owner: Springle) [04:01:20] (CR) Ori.livneh: [C: 2] Test under-review English Wikipedia draft namespace on Labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99600 (owner: Mattflaschen) [04:02:28] (PS1) Ori.livneh: Revert "added groups::wikidev and accounts::bd808 to zirconium for scholarship app" [operations/puppet] - https://gerrit.wikimedia.org/r/99610 [04:02:47] (Merged) jenkins-bot: Test under-review English Wikipedia draft namespace on Labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99600 (owner: Mattflaschen) [04:04:22] (PS2) Ori.livneh: Revert "added groups::wikidev and accounts::bd808 to zirconium for scholarship app" [operations/puppet] - https://gerrit.wikimedia.org/r/99610 [04:06:25] (CR) Ori.livneh: [C: 2] Revert "added groups::wikidev and accounts::bd808 to zirconium for scholarship app" [operations/puppet] - https://gerrit.wikimedia.org/r/99610 (owner: Ori.livneh) [04:22:14] (CR) Mattflaschen: Create "Draft" namespace on the English Wikipedia (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/97675 (owner: MZMcBride) [04:26:13] (PS1) Springle: repool es1001 after upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99611 [04:26:41] (CR) Springle: [C: 2] repool es1001 
after upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99611 (owner: Springle) [04:26:49] (Merged) jenkins-bot: repool es1001 after upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99611 (owner: Springle) [04:27:51] !log springle synchronized wmf-config/db-eqiad.php 'repool es1001 after upgrade, max_connections lowered during warm up' [04:28:06] Logged the message, Master [04:39:00] PROBLEM - Puppet freshness on cp1046 is CRITICAL: Last successful Puppet run was Thu 05 Dec 2013 10:38:05 PM UTC [05:17:07] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:19:06] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [05:27:07] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:06] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [05:30:30] (Abandoned) Tholam: Update favicon wikispecies.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99590 (owner: Tholam) [05:30:46] (Abandoned) Tholam: Update favicon wikimania.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99602 (owner: Tholam) [05:30:58] (Abandoned) Tholam: Update favicon wmf.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99601 (owner: Tholam) [05:31:13] (Abandoned) Tholam: Update favicon wikiveristy.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99595 (owner: Tholam) [05:48:30] wonder why all abandoned... [06:12:14] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:13:05] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [06:27:14] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:27:24] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: CRITICAL: log files /a/log/lucene/lucene.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [06:29:04] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [06:30:24] RECOVERY - udp2log log age for lucene on oxygen is OK: OK: all log files active [06:45:13] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:46:03] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [06:56:13] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:58:13] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [07:15:12] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:16:11] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [07:28:11] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:29:11] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [07:39:43] PROBLEM - Puppet freshness on cp1046 is CRITICAL: Last successful Puppet run was Thu 05 Dec 2013 10:38:05 PM UTC [07:40:13] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:41:13] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [08:06:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:08:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:10:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:12:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:14:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:16:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:18:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:20:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:22:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:24:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:26:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:01:34 AM UTC [08:27:54] RECOVERY - Puppet freshness on search1020 is OK: puppet ran at Fri Dec 6 08:27:51 UTC 2013 [08:30:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:27:51 AM UTC [08:30:14] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:31:04] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [08:31:24] RECOVERY - Puppet freshness on search1020 is OK: puppet ran at Fri Dec 6 08:31:17 UTC 2013 [08:33:04] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 08:31:17 AM UTC [08:42:14] RECOVERY - Puppet freshness on search1020 is OK: puppet ran at Fri Dec 6 08:42:09 UTC 2013 [08:44:34] PROBLEM - LVS HTTPS IPv4 on text-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:36] PROBLEM - LVS HTTPS IPv4 on wikipedia-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:36] PROBLEM - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:36] PROBLEM - LVS HTTPS IPv4 on mediawiki-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:44] PROBLEM - LVS HTTPS IPv6 on wikisource-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:44] PROBLEM - LVS HTTP IPv6 on mediawiki-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:44] PROBLEM - LVS HTTPS IPv6 on wikibooks-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:44] PROBLEM - LVS HTTPS IPv4 on wikivoyage-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:54] PROBLEM - LVS HTTP IPv4 on wikipedia-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:54] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:54] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:44:54] PROBLEM - LVS HTTP IPv4 on wikimedia-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:04] PROBLEM - 
LVS HTTP IPv4 on mediawiki-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:04] PROBLEM - LVS HTTPS IPv6 on mediawiki-lb.esams.wikimedia.org_ipv6 is CRITICAL: Connection timed out [08:45:14] PROBLEM - LVS HTTPS IPv4 on wikinews-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:14] PROBLEM - LVS HTTPS IPv6 on wikinews-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:24] PROBLEM - LVS HTTPS IPv6 on wikivoyage-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:24] I hope those are not true [08:45:24] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:24] PROBLEM - LVS HTTP IPv4 on wikibooks-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:34] PROBLEM - LVS HTTPS IPv6 on mobile-lb.esams.wikimedia.org_ipv6 is CRITICAL: Connection timed out [08:45:36] PROBLEM - Frontend Squid HTTP on amssq35 is CRITICAL: Connection timed out [08:45:36] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [08:45:36] PROBLEM - LVS HTTPS IPv6 on wiktionary-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:36] PROBLEM - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:36] RECOVERY - LVS HTTP IPv6 on mediawiki-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 63488 bytes in 7.400 second response time [08:45:44] PROBLEM - LVS HTTP IPv6 on foundation-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:44] PROBLEM - LVS HTTPS IPv4 on wikimedia-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:48] PROBLEM - LVS HTTP IPv6 on wiktionary-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 
10 seconds [08:45:54] PROBLEM - LVS HTTP IPv4 on text-lb.esams.wikimedia.org is CRITICAL: Connection timed out [08:45:55] PROBLEM - LVS HTTPS IPv4 on wikiquote-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:46:04] RECOVERY - LVS HTTPS IPv6 on mediawiki-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 63488 bytes in 3.205 second response time [08:46:14] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 63188 bytes in 0.632 second response time [08:46:14] RECOVERY - LVS HTTPS IPv6 on wikivoyage-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 45166 bytes in 1.071 second response time [08:46:14] RECOVERY - LVS HTTP IPv4 on wikibooks-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 63186 bytes in 0.649 second response time [08:46:24] RECOVERY - Frontend Squid HTTP on amssq35 is OK: HTTP OK: HTTP/1.0 200 OK - 1420 bytes in 0.256 second response time [08:46:24] RECOVERY - LVS HTTPS IPv4 on text-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 63619 bytes in 1.058 second response time [08:46:26] RECOVERY - LVS HTTPS IPv4 on wikipedia-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 86541 bytes in 1.191 second response time [08:46:26] RECOVERY - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 63622 bytes in 1.417 second response time [08:46:26] RECOVERY - LVS HTTPS IPv6 on mobile-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 23477 bytes in 0.808 second response time [08:46:28] RECOVERY - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 63188 bytes in 1.497 second response time [08:46:28] RECOVERY - LVS HTTPS IPv4 on mediawiki-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 63617 bytes in 1.437 second response time [08:46:28] RECOVERY - LVS HTTPS IPv6 on wiktionary-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 63623 bytes in 2.829 
second response time [08:46:34] RECOVERY - LVS HTTP IPv6 on foundation-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 63638 bytes in 0.660 second response time [08:46:34] RECOVERY - LVS HTTPS IPv6 on wikisource-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 63488 bytes in 1.069 second response time [08:46:34] RECOVERY - LVS HTTPS IPv6 on wikibooks-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 63617 bytes in 1.112 second response time [08:46:34] RECOVERY - LVS HTTPS IPv4 on wikivoyage-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 45196 bytes in 1.279 second response time [08:46:34] RECOVERY - LVS HTTPS IPv4 on wikimedia-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 86529 bytes in 1.525 second response time [08:46:36] RECOVERY - LVS HTTP IPv6 on wiktionary-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 63629 bytes in 0.489 second response time [08:46:44] RECOVERY - LVS HTTP IPv4 on wikipedia-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 86050 bytes in 0.558 second response time [08:46:44] RECOVERY - LVS HTTP IPv4 on text-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 63482 bytes in 0.505 second response time [08:46:46] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 63489 bytes in 0.810 second response time [08:46:46] RECOVERY - LVS HTTPS IPv4 on wikiquote-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 63489 bytes in 0.810 second response time [08:46:46] RECOVERY - LVS HTTP IPv4 on wikimedia-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 86050 bytes in 0.512 second response time [08:46:48] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 63617 bytes in 0.864 second response time [08:46:54] RECOVERY - LVS HTTP IPv4 on mediawiki-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 63186 bytes in 0.509 second response time [08:47:04] RECOVERY - LVS 
HTTPS IPv6 on wikinews-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 63489 bytes in 0.878 second response time [08:47:04] RECOVERY - LVS HTTPS IPv4 on wikinews-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 63489 bytes in 0.882 second response time [08:48:44] some kind of network hiccup? [08:48:54] maybe [08:49:30] http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20esams&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [08:50:21] http://ganglia.wikimedia.org/latest/?c=Text%20caches%20esams&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [08:51:08] ouch [08:51:28] pretty annoying [08:54:15] for whatever reason the mobile caches haven't been showing any data in ganglia for hours now [08:54:21] (in esams) [08:54:30] which boxes? [08:54:58] ugh [08:55:32] I'll have a look [08:55:53] I found some stuck (old) gmonds earlier today and have been clearing them out [08:56:01] but not on these hosts (yet) [08:56:02] I restarted gmond on a few of them [08:56:09] didn't change anything [08:56:14] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:56:44] I see [08:57:04] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [08:59:50] most of the esams upload ones, too: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Upload%2520caches%2520esams&tab=m&vn=&hide-hf=false [09:24:12] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:25:02] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [09:47:39] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [10:02:14] !log shot and restarted gmond processes on hooft [10:02:30] Logged the message, Master [10:10:16] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[10:11:06] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [10:12:57] hey [10:13:06] of course I didn't get paged today either [10:16:49] !log hashar synchronized php-1.23wmf5/extensions/ProofreadPage 'ProofreadPage bug fixes {{gerrit|99522}}' [10:17:06] Logged the message, Master [10:21:58] figures [10:22:19] but it was a short enough hiccup there was no thought of calling in reinforcements or anything [10:27:12] it was GTT [10:27:13] again [10:27:16] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:27:44] apparently I'm not going to get paged from now on [10:27:56] because there's no way in hell I'm going to start dealing with that kind of vendors now [10:28:05] !log hashar synchronized php-1.23wmf6/extensions/ProofreadPage 'ProofreadPage bug fixes {{gerrit|99627}}' [10:28:20] Logged the message, Master [10:29:16] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [10:30:39] do [10:30:52] today I learned tinet was worth 54 millions dollars https://en.wikipedia.org/wiki/Tinet [10:31:06] find that to be quite cheap for a worldwide network [10:40:16] PROBLEM - Puppet freshness on cp1046 is CRITICAL: Last successful Puppet run was Thu 05 Dec 2013 10:38:05 PM UTC [10:40:16] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:41:16] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [10:50:24] I get a patch to raise nrpe_check timeout, might solve that check_job_queue which is spamming us [10:50:39] https://gerrit.wikimedia.org/r/#/c/99410/ implements passing the timeout [10:50:54] and https://gerrit.wikimedia.org/r/#/c/99411/ raises it for check_job_queue to 30 seconds [10:50:58] (untested though) [10:51:16] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[10:55:06] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [11:03:15] hi, a question concerning cdh4: does anyone know how to set up zookeeper [11:07:23] I created an issue: https://github.com/wikimedia/operations-puppet-cdh4/issues/5 [11:13:36] physikerwelt__: I doubt anyone looks at github issues [11:13:43] our bugtracker is at bugzilla [11:17:32] ok good point. But it's not a bug. I think I'm just too stupid [11:19:00] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:19:10] physikerwelt__: you probably want ottomata, he is NYC based so should show up in a couple hours or so [11:20:00] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [11:26:00] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:26:51] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [11:36:14] physikerwelt__: we don't use CDH4's zookeeper [11:36:20] physikerwelt__: we use the Ubuntu packages [11:36:50] ok my goal was to set up hbase [11:36:53] the "zookeeper" module is for the regular Debian/Ubuntu packages, not the CDH4 packages (so that's why it's not under the "cdh4" module) [11:37:51] but is there a way to configure zookeeper and hdfs to run on the same server? [11:38:02] not with our module, no [11:38:34] ok. bad luck for vagrant [11:38:51] you'd have to either do it manually or make heavy modifications to the module [11:39:01] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[11:39:52] ok thank you I think I'll go with manual installation for now [11:40:28] maybe, /maybe/, you can get away with it by creating some dummy packages to satisfy the hadoop dependencies [11:40:36] it's going to get tricky, though [11:41:43] for now installing the packages zookeeper and zookeeper-server seems to work [11:41:53] I just had to run service zookeeper-server init [11:42:10] the CDH4 packages are annoying [11:42:21] they've redone everything with different package names, different paths etc. [11:43:24] ok I'll write a guide how to get it working with vagrant. This would be a good starting point for further package development I think [11:43:45] ottomata has been driving this work, you might want to ping him [11:43:59] yes I was very annoyed by the cdh4 packages in the past [11:44:01] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [11:44:15] what do you need HBase for? [11:44:18] is it for Wikimedia? [11:44:24] or just reusing our modules for some other purpose? [11:44:50] (ottomata would be so happy to hear others use the CDH4 module :)) [11:45:06] no it's not directly connected to wikipedia. It's for Stratosphere [11:45:22] but I use Stratosphere for my MathSearch project [11:45:50] however, in the long run I might switch to casandra [11:46:20] but that depends on parsoid [11:46:31] I have to leave [11:46:32] cu [11:46:37] bye! [12:01:11] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:02:01] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [12:27:54] PROBLEM - Disk space on virt11 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 44145 MB (3% inode=99%): [12:36:17] qchris: hey [12:36:24] Hey paravoid [12:38:05] qchris: we had a big traffic spike last night [12:38:12] about 45% by my count [12:38:20] Whooops! 
[12:38:22] :-) [12:38:31] starting at 21:45 UTC, reaching its peak at 22:00 and ending at 23:20 UTC [12:38:34] my guess is [12:38:37] news about mandela's death [12:39:30] I'll check if that matches the squid logs [12:39:51] (or ... I guess you already did that?) [12:39:59] I didn't [12:40:07] Ok. I'll have a look then. [12:40:19] it didn't cause any problems [12:40:24] (afaik) [12:40:30] it's just interesting :) [12:40:31] That's good to hear. [12:40:43] and thought you might care :) [12:40:53] https://graphite.wikimedia.org/render/?title=HTTP%20Requests/sec%20%28excludes%20bits.wikimedia.org:%20css/js%29%20-1day&from=-1%20day&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=1&lineMode=connected&target=color%28cactiStyle%28alias%28scale%28reqstats.requests,%220.01666%22%29,%20%22requests/sec%22%29%29,%22blue%22%29 [12:41:12] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:41:44] Interesting. [12:42:18] the most interesting part is [12:43:12] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [12:43:28] that percentage-wise, mobile was much more than desktop [12:44:40] :-D [12:52:12] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:53:12] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [12:55:58] manybubbles: hey, we have this https://mingle.corp.wikimedia.org/projects/scrum_of_scrums/cards/8 on the SoS; it's currently listed as "blocked"; it's not very clear to me if that's still the case, is there anything else you guys need? [13:13:11] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:14:01] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [13:23:11] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:25:11] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [13:31:41] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [13:36:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:37:30] (03CR) 10Hashar: [C: 031] "Limited to beta, so I guess it is fine :]" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98684 (owner: 10MarkTraceur) [13:38:04] paravoid: do we have a graph of varnish HTTP connection that timed-out ? [13:38:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:38:27] it's part of the reqerror [13:39:17] so it is mixed with 503 emitted by backend :( [13:39:28] for some reason on beta, I got one timing out after 50-60 seconds on the text cache although the first byte timeout is supposed to be 180s [13:40:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:41:16] PROBLEM - Puppet freshness on cp1046 is CRITICAL: Last successful Puppet run was Thu 05 Dec 2013 10:38:05 PM UTC [13:42:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:44:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:46:18] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:48:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:50:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:51:06] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:52:06] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [13:52:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:54:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:56:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [13:57:21] i have 'server unavailble' errors on gerrit [13:58:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [14:00:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 01:30:59 PM UTC [14:00:46] RECOVERY - Puppet freshness on pdf3 is OK: puppet ran at Fri Dec 6 14:00:43 UTC 2013 [14:02:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 02:00:43 PM UTC [14:04:16] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 02:00:43 PM UTC [14:13:34] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:14:34] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [14:31:24] RECOVERY - Puppet freshness on pdf3 is OK: puppet ran at Fri Dec 6 14:31:20 UTC 2013 [14:33:18] paravoid: Thanks again for the heads up around the Nelson Mandela spike. All the checks that I did also concluded with the spike being just normal traffic. [14:33:31] cool [14:36:36] RIP. :-( [14:49:01] (03CR) 10Faidon Liambotis: [C: 032] Revert "turn off logging for parsoid for now, was filling /" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99251 (owner: 10GWicke) [14:50:42] (03PS4) 10Faidon Liambotis: Automatically pull proxies from Wikipedia Zero's config namespace on META. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/97004 (owner: 10Dr0ptp4kt) [14:50:47] (03PS5) 10Faidon Liambotis: varnish: automatically pull Zero proxies from meta [operations/puppet] - 10https://gerrit.wikimedia.org/r/97004 (owner: 10Dr0ptp4kt) [14:50:56] (03CR) 10Faidon Liambotis: [C: 032] varnish: automatically pull Zero proxies from meta [operations/puppet] - 10https://gerrit.wikimedia.org/r/97004 (owner: 10Dr0ptp4kt) [14:51:57] (03CR) 10Faidon Liambotis: [V: 032] varnish: automatically pull Zero proxies from meta [operations/puppet] - 10https://gerrit.wikimedia.org/r/97004 (owner: 10Dr0ptp4kt) [14:55:32] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:56:32] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [14:58:09] paravoid: regarding my mingle card. We've merged the stuff. We have a strategy for deploying plugins (make a subrepo, put the jars in that, get salt to deploy it, switch to trebuche when that is ready and nuke sub repo) [14:58:45] (03PS1) 10Hashar: parsoid: startup script now has cleared out FDs [operations/puppet] - 10https://gerrit.wikimedia.org/r/99656 [14:58:45] okay [14:58:51] and finally we have a plan for better elasticsearch monitoring - I'm going to implent some improvements upstream and then we'll use them when we release it. At least, that is the plan right now because it seems the most helpful. [14:58:56] so you can clear the card [14:59:31] (03CR) 10Hashar: "Gabriel, Roan, I am not sure what it is going to cause in production. Seems to work fine for me on beta though." [operations/puppet] - 10https://gerrit.wikimedia.org/r/99656 (owner: 10Hashar) [14:59:40] it was more of a "what do you need", not "I want to clear the card" :-) [15:00:17] paravoid: I understand. I think I have all I need. Atleast, everything is moving along. 
[15:00:21] thanks for asking [15:01:04] cool [15:10:24] (03CR) 10Dan-nl: "who knows how to properly set-up a filebackend source on the beta cluster?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98684 (owner: 10MarkTraceur) [15:13:39] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:14:29] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [15:15:08] (03Restored) 10Tholam: Update favicon wikimania.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99602 (owner: 10Tholam) [15:15:26] (03Restored) 10Tholam: Update favicon wikispecies.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99590 (owner: 10Tholam) [15:15:42] (03Restored) 10Tholam: Update favicon wmf.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99601 (owner: 10Tholam) [15:15:56] (03Restored) 10Tholam: Update favicon wikiveristy.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99595 (owner: 10Tholam) [15:16:59] PROBLEM - MySQL Recent Restart Port 3308 on db1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:18:39] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:19:29] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [15:21:49] RECOVERY - MySQL Recent Restart Port 3308 on db1054 is OK: OK 928605 seconds since restart [15:34:39] (03PS1) 10Hashar: beta: properly connect to parsoid instance [operations/puppet] - 10https://gerrit.wikimedia.org/r/99659 [15:35:35] (03CR) 10jenkins-bot: [V: 04-1] beta: properly connect to parsoid instance [operations/puppet] - 10https://gerrit.wikimedia.org/r/99659 (owner: 10Hashar) [15:36:41] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[15:37:31] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [15:40:41] (03CR) 10Hashar: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99659 (owner: 10Hashar) [15:50:46] (03CR) 10Hashar: "Aaron Schulz at least would know. Each time I have to tweak file backends settings, I end up reverse engineering the documentation by rea" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98684 (owner: 10MarkTraceur) [15:51:41] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:52:31] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [15:58:41] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:59:31] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [16:08:33] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:09:33] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [16:09:48] heya paravoid, you there? [16:09:58] yes [16:10:01] got a weird graph in ganglia i'm trying to make sense of [16:10:01] http://ganglia.wikimedia.org/latest/index.php?r=day&title=&vl=&x=&n=&hreg[]=%28cp1046%7Ccp1047%7Ccp1059%7Ccp1060%7Ccp3011%7Ccp3012%7Ccp4011%7Ccp4012%7Ccp4019%7Ccp4020%29.%2A&mreg[]=pkts_in&gtype=line&glegend=show&aggregate=1&embed=1&_=1386346021577&dg=1&tab=m [16:10:15] this is all mobile hosts pkts_in for hte last 24 hours [16:10:24] oh, dan and I are in a hangout talking if you want to join [16:10:26] ganglia was broken on them [16:10:38] you know about this?
[16:10:39] bblack noticed, apergos restarted gmond [16:10:43] oh, hm [16:10:49] i'm trying to figure out if this is related to logster [16:10:58] / varnihskafka deployment [16:11:00] apparently it was the one on hooft that matter [16:11:02] i've not got that to work yet [16:11:04] hm [16:11:05] but [16:11:23] cp1046 and cp1047 dno't have recent data at all [16:11:35] if you want: https://plus.google.com/hangouts/_/calendar/d2lraW1lZGlhLm9yZ19jYjM3bXU0OGNuaHRkN2hybmE4czI3b25hb0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t.c6j7qidqs491nhi7ovk9pi4h14?authuser=1 :) [16:12:02] yeah hooft is the aggregator for most of esams, if not all of it [16:12:07] but [16:12:10] hangout about what? [16:12:12] cp1046 and cp1047 are not in esams [16:12:17] oh, dan and I are just there talking about this [16:12:20] he's helping me trouble shoot [16:12:23] no worries, IRC is fine [16:12:52] ganglia hiccup seems more likely to me [16:13:23] i think so too, [16:13:28] i haven't touched cp1047 [16:13:32] so please don't muck with the amssq ganglias, I'm looking at those [16:13:38] ok [16:13:39] or I will be again probably tomorrow [16:13:45] all the graphs for the host are NaN [16:13:50] yes [16:14:02] all recent data is [16:14:03] I had to restart a bunch of these in esams by hand because the pid was wrong and the puppetized restart didn't [16:14:26] apergos, I have been touching cp3011 and cp3003 in the last couple of days [16:14:26] ganglia_new didn't write a pid file until yesterday [16:14:29] i haven't messed with anything anywhere else [16:14:30] I fixed this yesterday [16:14:31] that might mean more issues around that setup, but on the amssqs the ones that work are using multicast still [16:14:33] maybe it's broken? 
[16:14:36] in esams [16:14:36] and the two 'broken' ones are not so [16:14:43] yeah I saw you pushed that out [16:14:53] I was going to look into that after sorting out the multicast/unicast thing [16:14:58] but that won't happen tonight [16:15:03] cp3003 is a test upload, it has some manual things on it that I haven't reverted yet [16:15:08] i'm still using it to figure out why ganglia things aren't working [16:15:29] ok, so you guys are telling me there are lots of weird ganglia problems right now, that probably are not related to me :p :) [16:15:29] ? [16:16:23] apergos, are you looking into ganglia problems in just esams, or eqiad too? [16:16:34] these were esams only [16:16:48] not lookin at eqiad at all [16:16:56] those apparently all restarted fine [16:17:10] and/or work well after restart :-D [16:17:17] k, the two i'm looking at right now are cp1046 and cp1047, although i'm seeing weirdness on some esams hosts too [16:17:18] so in eqiad go to town if you like [16:17:32] I just want to treat the esams ones as a [16:17:39] *cough* learning opportunity *cough* [16:17:49] hm [16:17:54] (03PS1) 10BBlack: varnish (3.0.3plus~rc1-wm25) precise; urgency=low [operations/debs/varnish] (testing/3.0.3plus-rc1) - 10https://gerrit.wikimedia.org/r/99670 [16:17:57] ganglia is not running on cp1046. [16:17:58] hm. [16:18:01] starting [16:18:36] (03CR) 10BBlack: [C: 032 V: 032] varnish (3.0.3plus~rc1-wm25) precise; urgency=low [operations/debs/varnish] (testing/3.0.3plus-rc1) - 10https://gerrit.wikimedia.org/r/99670 (owner: 10BBlack) [16:20:13] was cp1046 one of the hosts you have disabled puppet on? 
[16:20:19] yes [16:20:23] i think the only one [16:20:24] it is still ther [16:20:25] thought I remembered it from earlier [16:20:26] disabled [16:20:29] yes the only one right no [16:20:31] w [16:20:33] yea [16:20:43] i guess that would keep it from starrting back up [16:20:45] :p [16:20:48] :-D [16:20:56] * apergos whistles cheerfully [16:21:06] ok, but gmond is running on cp1047 [16:21:09] and it has the same problems [16:21:09] no data [16:21:10] (03PS1) 10Andrew Bogott: Revert "Point the puppet freshness check to nagios.wmflabs.org" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99671 [16:22:45] (03PS2) 10Andrew Bogott: Revert "Point the puppet freshness check to nagios.wmflabs.org" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99671 [16:22:51] what other cp host is it 'like'? cp10...? [16:23:06] cp1046 and cp1047 are both mobiles [16:23:14] they should be identical [16:24:08] but 1046 has the same problem yes? [16:24:10] apergos: what is the multicast/unicast thing you have to sort out? [16:24:20] (03CR) 10Andrew Bogott: [C: 032] Revert "Point the puppet freshness check to nagios.wmflabs.org" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99671 (owner: 10Andrew Bogott) [16:24:39] the amssqs are sending to 239 something on 8649 [16:24:42] (03CR) 10Odder: [C: 031] "Looks great, thanks much!" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99602 (owner: 10Tholam) [16:24:46] the 'old' ones that work [16:25:01] on restart they use unicast to a specific port (which we expect), it doesn't [16:25:36] so I gotta see what's wrong on hooft, for someone who knows ganglia it's probably 2 minutes, for me it's going to take a bit [16:27:28] amssqs are esams squids? [16:27:35] (03CR) 10Odder: [C: 031] "The favicon looks great, thank you!" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99590 (owner: 10Tholam) [16:27:36] (03PS1) 10Andrew Bogott: Added an additional snmp trap. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/99672 [16:28:06] andrewbogott: fwiw, I hate the snmp trap [16:28:40] andrewbogott: I want us to kill it with fire, possibly with either some nrpe checks or (akosiaris' idea) using the puppetmaster's reports [16:28:54] paravoid: I'm starting to hate it too :) Do you have something in mind you'd prefer for puppet monitoring? Just a query rather than a trap? [16:29:06] oh, you beat me to it :) [16:29:13] (03CR) 10Odder: [C: 031] "The favicon looks great, thanks!" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99601 (owner: 10Tholam) [16:29:37] hm, actually turning on puppetmaster reporting on labs might get me what I want anyway… I'll give that a look. [16:29:48] ok, I gotta go [16:30:07] :-) [16:30:34] see folks later, hope everything stays quiet [16:30:41] stopping gmond on cp1046, running it in fg in debug mode... [16:30:43] laters! [16:31:05] (you could see if it's actually sending data, then you could go to the aggregator and see if it's coming in) [16:31:13] cp1046 is the aggregator [16:31:18] oh :-D [16:31:22] yeah i've been watching it send [16:31:36] how about nickel (I guess that's where it goes)? [16:31:46] ugh if I start this conversation I will not leave! [16:31:46] (03CR) 10Odder: [C: 031] "Again, the favicon is great; thanks for your contribution, Thomas." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99595 (owner: 10Tholam) [16:31:53] i mean, this looks good [16:31:54] Processing a metric metadata message from cp1046.eqiad.wmnet [16:31:54] ***Allocating metadata packet for host--cp1046.eqiad.wmnet-- and metric --kafka.rdkafka.brokers.analytics1021-eqiad-wmnet_9092.21.rtt.avg-- **** [16:31:58] yeah [16:32:10] but in the rrd directory no updates? [16:32:48] right. I am really sorry. but I have to be gone... good luck, I'll look at it with you when I return if you are not long done by then (most likely) [16:32:49] tah! 
[16:32:54] right, no updates [16:32:56] s'ok [16:32:59] thanks apergos, laters [16:33:13] sent message 'load_one' of length 44 with 0 errors [16:41:35] PROBLEM - Puppet freshness on cp1046 is CRITICAL: Last successful Puppet run was Thu 05 Dec 2013 10:38:05 PM UTC [16:50:35] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:35] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [16:54:10] (03PS3) 10Greg Grossmeier: Enable GWToolset on betacommons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98684 (owner: 10MarkTraceur) [16:54:43] (03PS4) 10Greg Grossmeier: Enable GWToolset on betacommons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98684 (owner: 10MarkTraceur) [16:56:35] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:57:35] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [17:06:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:08:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:10:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:11:43] greg-g: tgr pointed out https://bugzilla.wikimedia.org/show_bug.cgi?id=58100, we should maybe snag a deploy window today [17:12:16] (03CR) 10GWicke: "Time to kill the init script for good I'd say." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/99656 (owner: 10Hashar) [17:12:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:14:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:16:14] (03PS5) 10BryanDavis: Enable GWToolset on betacommons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98684 (owner: 10MarkTraceur) [17:16:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:18:41] gwicke: hey [17:18:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:20:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:21:31] paravoid: good evening [17:21:41] did you see my patch? :) [17:21:50] yes, amended it a bit [17:21:57] awesome [17:22:03] good idea to just set the proxy [17:22:39] the requests are not exactly the same [17:22:39] should go live on Monday [17:22:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:22:54] they're going to be GET http://.../api.php HTTP/1.1 [17:22:57] instead of GET /api.php [17:23:02] but that's fine [17:23:06] RFC2606 allows both [17:24:01] btw, the requests emitted now have "Cookie: undefined" [17:24:18] I saw on a different codepath that you set Cookie conditionally [17:24:28] but I didn't want to do too many unrelated changes [17:24:38] afk for a moment- phone call [17:24:41] k [17:24:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:26:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:27:39] paravoid: Is it important that puppet checks are still handled by 
icinga? I see a ready-made solution here to log to IRC (but it would bypass any of our existing monitoring tools.) [17:28:00] I think it is, yes [17:28:10] ETOOMANYTOOLS already :) [17:28:22] 'k [17:28:34] I think akosiaris was interested very much on fixing these checks [17:28:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:28:45] he might be able to help [17:28:50] yeah, I'll coordinate with him before doing anything in production. [17:28:52] if you need it :) [17:29:14] akosiaris: (these checks = freshness snmp traps) [17:29:53] Getting notice of a failed run is pretty easy… that's a different problem from freshness though [17:30:07] yeah. I have a ruby script already working through nrpe that works rather well for "freshness" [17:30:17] but not all hosts have nrpe :-( [17:30:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:02:53 PM UTC [17:30:59] So I now I modify it to run on the master collecting the reports and parse them and blah blah you get the picture [17:31:22] hopefully we can then kill that ugly thing that uses snmptraps [17:32:03] RECOVERY - Puppet freshness on wtp1003 is OK: puppet ran at Fri Dec 6 17:32:01 UTC 2013 [17:33:42] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Fri 06 Dec 2013 05:32:01 PM UTC [17:34:21] !log reedy synchronized php-1.23wmf5/extensions/Wikibase 'I06994427b780cd0b66dc0b0279045df7699aef1c' [17:34:38] Logged the message, Master [17:35:30] marktraceur: eek, sorry, yeah, please backport [17:36:06] greg-g: Any particular time? Can do it right after our standup. 
[17:36:44] I think Reedy is done fixing wikibase, so yeah, after that is fine [17:36:55] couple of minutes and I'll be done [17:36:55] marktraceur: ^ [17:36:56] !log reedy synchronized php-1.23wmf6/extensions/DataValues 'I7cf7af5525f3223dbd044e7676b5c0255b45928c' [17:36:58] ah [17:36:59] 1 to go [17:37:11] oh yeah, two of 'em [17:37:21] Logged the message, Master [17:37:35] There was talk of updating CentralAuth in wmf5 too... [17:37:38] !log reedy synchronized php-1.23wmf6/extensions/Wikibase 'I7cf7af5525f3223dbd044e7676b5c0255b45928c' [17:37:52] Logged the message, Master [17:41:41] (03PS1) 10Ottomata: Fixing bug where JsonLogster would only report metrics for last line in file [operations/debs/logster] - 10https://gerrit.wikimedia.org/r/99678 [17:41:55] (03PS2) 10Ottomata: Fixing bug where JsonLogster would only report metrics for last line in file [operations/debs/logster] - 10https://gerrit.wikimedia.org/r/99678 [17:42:01] (03CR) 10Ottomata: [C: 032 V: 032] Fixing bug where JsonLogster would only report metrics for last line in file [operations/debs/logster] - 10https://gerrit.wikimedia.org/r/99678 (owner: 10Ottomata) [17:45:03] (03PS1) 10Ottomata: Updating changelog with changes from master for version 0.0.4-1 [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/99680 [17:45:11] (03CR) 10Ottomata: [C: 032 V: 032] Updating changelog with changes from master for version 0.0.4-1 [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/99680 (owner: 10Ottomata) [17:45:45] cookie [17:45:53] gr [17:46:37] (03CR) 10Hashar: "Yup should. I am not sure why I spend so much time this afternoon debugging that issue and writing a workaround for it." [operations/puppet] - 10https://gerrit.wikimedia.org/r/99656 (owner: 10Hashar) [17:47:39] gwicke: don't forget to add the config option on monday's deploy :) [17:48:33] alright...... 
Reedy greg-g the parser function bug still appears but i can't reproduce on my test wikis (on the branch) [17:48:44] aude: lame [17:48:53] (03PS1) 10Aude: disable display of wikibasse parser function errors and debug log group [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99681 [17:49:06] i am submitting patch that 1) adds debug logging 2) disable display of the errors [17:49:26] can be displayed with css, for debugging [17:49:47] and sure i never saw it on test2 [17:51:18] the debugging on every page view won't cause use to DOS ourselves (again), right? [17:51:25] no idea how to do it [17:51:40] (03PS1) 10Ottomata: Fixing changelog version to include ~precise1 [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/99683 [17:51:55] (03CR) 10Ottomata: [C: 032 V: 032] Fixing changelog version to include ~precise1 [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/99683 (owner: 10Ottomata) [17:53:34] greg-g: i think it is sampled, but not sure [17:53:42] if we want, it could be tried on one less popular wiki [17:54:19] unfortunatly not test2 [17:54:54] :/ [17:55:29] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[17:56:40] (03PS1) 10Ottomata: Ignoring fetch_state and ts metrics [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/99684 [17:56:52] (03CR) 10Ottomata: [C: 032 V: 032] Ignoring fetch_state and ts metrics [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/99684 (owner: 10Ottomata) [17:57:17] (03PS1) 10Ottomata: Updating varnishkafka module with metric changes [operations/puppet] - 10https://gerrit.wikimedia.org/r/99686 [17:57:19] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [17:57:28] (03CR) 10Ottomata: [C: 032 V: 032] Updating varnishkafka module with metric changes [operations/puppet] - 10https://gerrit.wikimedia.org/r/99686 (owner: 10Ottomata) [17:58:59] RECOVERY - Puppet freshness on cp1046 is OK: puppet ran at Fri Dec 6 17:58:57 UTC 2013 [18:00:33] grmbl @ freenode disconnects re [18:00:59] greg-g: at least the js on test.wikidata works now :) [18:02:09] RECOVERY - Puppet freshness on wtp1003 is OK: puppet ran at Fri Dec 6 18:02:07 UTC 2013 [18:09:25] (03PS2) 10Aude: disable display of wikibasse parser function errors [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99681 [18:10:43] hey akosiaris, do you have time to help with a small packaging task? [18:13:50] (03CR) 10Dzahn: [C: 031] "correct, it's been removed from Apache in I91124 because before it was already "No wiki found" since $unknown" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99030 (owner: 10MZMcBride) [18:14:18] ori-l: tell me. 
I don't have too much time though [18:16:10] akosiaris: we need to rebuild imagemagick with this patch: https://bugzilla.wikimedia.org/show_bug.cgi?id=55541#c13 [18:16:40] well, this is the patch: http://paste.tstarling.com/p/IKYsdr.html [18:16:45] but the bugzilla thread provides context [18:17:35] (03PS2) 10Dzahn: Add Maria Pacana to the English Planet Wikimedia [operations/puppet] - 10https://gerrit.wikimedia.org/r/99177 (owner: 10Odder) [18:17:41] ori-l: that's PHP, not imagemagick [18:18:09] exactly. Still easy enough [18:18:38] ah, right. i misread, sorry. [18:18:52] (03CR) 10Dzahn: [C: 032] "feed works, has content about "Contributing to Parsoid"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99177 (owner: 10Odder) [18:19:05] ori-l: also see https://bugzilla.wikimedia.org/show_bug.cgi?id=49118 [18:19:08] very relevant [18:19:50] heh, the 3D helix animated gif is mocking us [18:20:23] have you seen it too? [18:20:45] yeah, it was all over the logs yesterday, along with some other repeat offenders [18:20:51] https://commons.wikimedia.org/wiki/File:Light_dispersion_conceptual_waves.gif [18:21:33] if you don't succeed, try, try, try, try, try, try, try, ... [18:21:46] s/if/if at first/ [18:25:29] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:26:01] (03CR) 10Dzahn: [C: 031] icinga: raise timeout of check_job_queue nrpe command [operations/puppet] - 10https://gerrit.wikimedia.org/r/99411 (owner: 10Hashar) [18:29:18] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [18:30:58] (03CR) 10Dzahn: [C: 031] "confirmed check_nrpe has -t for timeout and the default is 10, having the option is not a bad thing, but on the other hand, yeah, 10 is al" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99410 (owner: 10Hashar) [18:35:51] (03CR) 10GWicke: "We have both upstart and systemd configs at https://www.mediawiki.org/wiki/Parsoid/Setup#Run_the_server, need to test those and check them" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99656 (owner: 10Hashar) [18:36:44] (03CR) 10GWicke: [C: 031] parsoid: startup script now has cleared out FDs [operations/puppet] - 10https://gerrit.wikimedia.org/r/99656 (owner: 10Hashar) [18:47:32] paravoid, I am running one more overwrite test currently in case you want to check out stats while it is ongoing [18:47:39] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=ch&vn=&hide-hf=false&hreg%5B%5D=%28cerium%7Cxenon%7Cpraseodymium%29 [18:47:42] greg-g: Sanity check, I have two backport commits ready to go out [18:47:47] Go? 
[18:48:08] marktraceur: engage [18:48:20] Aye aye cap'n [18:48:22] * greg-g points forward [18:48:57] (PS3) Aude: disable display of wikibasse parser function errors [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99681 [18:49:12] (PS4) Aude: disable display of wikibase parser function errors [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99681 [18:49:17] Reedy ^ [18:49:55] that doesn't solve the issue but minimizes impact on users until it is solved [18:54:25] !log mholmquist synchronized php-1.23wmf5/extensions/UploadWizard/resources/mw.UploadWizardDetails.js 'Fix UploadWizard for IE8' [18:54:56] !log mholmquist synchronized php-1.23wmf6/extensions/UploadWizard/resources/mw.UploadWizardDetails.js 'Fix UploadWizard for IE8' [18:54:56] Logged the message, Master [18:55:01] greg-g: Done ^^ [18:55:15] Logged the message, Master [18:55:20] marktraceur: things broken? [18:55:27] Not that I could see [18:55:46] But I'm not an IE8 user, so... [18:55:55] marktraceur: you aren't?! [18:56:25] Hey MatmaRex, can you try uploading something with UploadWizard in IE8 to make sure that fix was useful? rillke isn't around to halp test his change. 
[18:56:51] (I think Matma's a Windows user) [18:57:38] i am, but i reaaaally don't feel like it [18:57:41] the change looks good tho [18:57:53] old IE can't change the type of a button or an input after creating it [18:58:01] i think our CCs mentions this [18:58:05] mention* [19:02:36] (CR) Dzahn: "bump, what MZMcBride said ;)" [operations/puppet] - https://gerrit.wikimedia.org/r/90117 (owner: Nemo bis) [19:03:01] PROBLEM - RAID on virt2 is CRITICAL: Timeout while attempting connection [19:03:51] RECOVERY - RAID on virt2 is OK: OK: optimal, 1 logical, 2 physical [19:05:30] (PS1) Vldandrew: Update favicon office.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 [19:07:31] (CR) Dzahn: "out of curiosity, what's the bug being fixed, at a glance they appear the same to me" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 (owner: Vldandrew) [19:08:54] (PS3) Manybubbles: Enable Cirrus as a beta feature [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [19:10:52] (CR) Manybubbles: "Chad and I had a talk and figured this'd be the simplest way to maintain which wikis don't get cirrus as a BetaFeature." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [19:11:57] (CR) Chad: [C: 1] "lgtm! Once we have a window, let's merge :D" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [19:12:09] (PS1) Akosiaris: Add a content parameter to ferm::conf [operations/puppet] - https://gerrit.wikimedia.org/r/99704 [19:12:10] (PS1) Akosiaris: Template ferm's defs.production [operations/puppet] - https://gerrit.wikimedia.org/r/99705 [19:12:40] (PS1) Dr0ptp4kt: Give zeroadmin autoreview on Zero: configs to make first revs stable. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99706 [19:16:02] akosiaris: should I file an RT ticket for the PHP patch? 
[19:16:29] ^^yurik [19:17:40] ori-l: that be great if you could :-) [19:17:53] sure [19:19:26] greg-g, can i push out https://gerrit.wikimedia.org/r/#/c/99706/ ? [19:19:35] minor config change of the flagged revs [19:20:32] yurik: what does that fix? [19:20:35] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [19:20:55] (CR) Manybubbles: "Please please please triple check it! I'm always nervous when making large changes to this repository after trying to set wgCirrusSearch " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [19:20:57] it's friday, only other stuff that's gone out have been decently important bug fixes ;) [19:21:31] on meta site, all zero admins gain autoreview right - its only affects meta and its a configuration only [19:21:57] (CR) Manybubbles: "I think we'll do it during out Monday window." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [19:22:03] no rush i guess [19:22:05] so, it isn't that high priority is what you're saying ;) [19:22:37] but... in this case, it isn't that high risk either [19:22:41] we can live without it for a few days, yes - i will depl it on monday than - i still want to make sure it works right before i push out code that relies on it [19:22:43] yurik: you wanna just get it out of the way? [19:23:18] that patch would allow us to review flaggedrevs properly before changing code behavior in zero [19:23:31] * greg-g nods [19:23:35] (which in turn IS a much more dangerous change :)) [19:23:35] go for it [19:23:39] oki [19:23:55] ok, will file-sync it [19:25:44] (CR) Yurik: [C: 2 V: 2] Give zeroadmin autoreview on Zero: configs to make first revs stable. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99706 (owner: Dr0ptp4kt) [19:26:00] yurik you deploying that? [19:26:07] (CR) Faidon Liambotis: [C: -1] "That's some hairy templating! 
Three things:" [operations/puppet] - https://gerrit.wikimedia.org/r/99705 (owner: Akosiaris) [19:26:07] yurik, nm, i see you are :) [19:26:08] LOL [19:28:47] !log yurik synchronized wmf-config/flaggedrevs.php [19:28:56] dr0ptp4kt, ^ [19:29:00] Logged the message, Master [19:29:48] yurik, no more 'accept this revision' option on blob creation [19:30:08] dr0ptp4kt, awesome :) [19:32:15] yurik, you want me to accept current revs on real configs? or do you want to do that? i guess we could accept a few on meta, make sure it's good (which it should be), and then do the others 30 seconds at a time or something. you're not seeing any fatals, correct? [19:33:22] dr0ptp4kt, sec [19:36:50] (PS2) Ori.livneh: logstash: add Ganglia group and specify aggregators [operations/puppet] - https://gerrit.wikimedia.org/r/99330 [19:37:07] (PS3) Ori.livneh: logstash: add Ganglia group and specify aggregators [operations/puppet] - https://gerrit.wikimedia.org/r/99330 [19:40:23] OK, I have a very basic programming question: I want my puppet report to store the status of instances as they come in, and later I want wikitech to query those statuses. [19:40:42] I can think of a ton of ways to handle that -- redis, mysql, file system… [19:40:47] but, is there an obvious right way to do this? [19:41:07] Currently the only place I have an actual list of instances is in ldap, and I'm pretty sure I don't want to use /that/ [19:43:44] (CR) Faidon Liambotis: [C: 2] logstash: add Ganglia group and specify aggregators [operations/puppet] - https://gerrit.wikimedia.org/r/99330 (owner: Ori.livneh) [19:43:49] thanks [19:44:24] andrewbogott: how is the wikitech interface going to look? [19:44:43] is it going to be a summary of all hosts, or will you be adding this as an annotation to each host's page? [19:45:11] ori-l, I want to add a column to this page: https://wikitech.wikimedia.org/wiki/Special:NovaResources [19:45:35] (loading... loading... loading... 
) [19:45:46] Yeah, should probably add some caching to that page :) [19:45:55] andrewbogott: wondering if when you say reports you mean using this stuff "Puppet creates reports as Puppet::Transaction::Report objects, which have changed format several times over the course of Puppet’s history." [19:46:13] http://docs.puppetlabs.com/guides/reporting.html#report-formats [19:46:19] mutante: I'm going to write a custom, tiny ruby script to intercept reports and stash the status. [19:46:24] anyways, you could construct a query to the icinga API via javascript [19:46:44] http://docs.icinga.org/latest/en/icinga-web-api.html [19:46:48] andrewbogott: aha.. just curious how [19:46:50] (PS1) Jeremyb: add throttle exception for queens public lib event [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99716 [19:46:53] Yeah, although at the moment icinga labs is pretty broken. [19:47:29] Also the info I want is slightly different from what icinga gathers at the moment. [19:47:44] anyone feel like doing a deploy for me? :) ( https://gerrit.wikimedia.org/r/99716 ) (any windows in progress?) [19:47:51] you could ask icinga's backend db [19:47:55] as opposed to nagios it's easy [19:47:55] I'm on the fence… fixing icinga and then querying it is maybe the 'right' solution, but… feeling massive. [19:48:42] (PS2) Vldandrew: Update favicon office.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 [19:48:51] andrewbogott: did you find out how labs icinga gets those host names? [19:49:01] it's the script by petan, right [19:49:24] mutante: I stopped caring, for the moment. [19:49:26] IMO it's the 'right' not in an abstract, puritanical way, but because you fix a comprehensive, standard, and already-deployed tool for caching and reporting host statuses instead of adding another one to scratch a specific itch [19:49:52] andrewbogott: k, one by one.. 
nod [19:50:23] Except for the 'already deployed' part :) [19:50:49] (CR) Odder: "This is part of a Google Code-In task which aims to provide us with multi-layer favicons for all (or, as many as posible) projects. See bu" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 (owner: Vldandrew) [19:51:08] So… at the moment, the (broken) labs monitoring is /on/ a labs host. Whereas the puppet master runs on virt0. [19:51:18] Getting virt0 to know about puppet status is pretty easy, via reporting. [19:52:00] So that's making my head spin a bit. Should labs monitoring be a production service or a labs service? If it runs within labs, can puppet status be conveyed from virt0 to the labs instance? etc. etc. [19:52:36] IMO it makes sense for it to be a production service [19:52:41] (PS3) Jeremyb: Update favicon office.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 (owner: Vldandrew) [19:53:14] * andrewbogott wishes petan would answer my pages, just to give me the slightest idea how things work now [19:53:39] presumably its role is not so much to watch the specific services that are provisioned on various labs nodes but to watch the overall software patchlevel and security setup on nodes that share a network with production hosts [19:54:13] mutante: has to be no blank line between bug and change-id i guess [19:54:33] greg-g: window in progress? [19:54:58] greg-g: i.e. can i has a deploy? (I asked above about https://gerrit.wikimedia.org/r/99716 ) [19:55:06] jeremyb: you should file a bug, first [19:55:14] ori-l: i can... [19:56:03] (CR) Odder: [C: -1] "This icon consists of four layers instead of three -- please remove the 256x256 pixel layer, and resubmit the file again. 
Thanks :-)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 (owner: Vldandrew) [19:56:13] if you do, I can sync it; we don't usually schedule deployment windows for throttle exemptions because it's understood that in a perfect world this is the sort of thing that would have a config interface anyway [19:56:32] (CR) Dzahn: "i downloaded old and new .ico and used imagemagick's identify command. results below:" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 (owner: Vldandrew) [19:57:06] ori-l: someone bothered to get IP addresses ahead of time but they didn't bother to ask for exemption (they just got it so they could unblock the IP) :( :( [19:57:11] After spending several days getting nowhere with icinga, I'm sad to hear that it's the correct solution to my problem :( [19:57:52] andrewbogott: it might be worthwhile just starting a new instance in the monitoring project and putting production icinga classes on it, emulate neon [19:57:59] instead of trying to fix existing one [19:58:08] mutante, yes, that's what I'm doing. [19:58:11] it never matched prod anyways. and this way we actually have something to test [19:58:17] before putting it on neon [19:58:25] But neon works with naggen which requires puppet exported resources, not supported on labs [19:58:26] that's awesome.:) [19:58:34] Which means that I'm more-or-less nowhere. [19:58:45] oooh.. i see [19:58:55] And also the only thing I /care/ about (puppet freshness) everyone agrees is handled stupidly on neon and should not be replicated. [19:59:00] So, I'm exactly nowhere. [19:59:04] sigh [19:59:13] Which is why I'm tempted to ignore icinga [19:59:22] yea, understand [19:59:24] andrewbogott: how is wikitech getting information about hosts at the moment? [19:59:34] openstack API calls made in mediawiki, right? 
[19:59:38] yep [20:00:42] (PS4) Vldandrew: Update favicon office.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 [20:00:44] (PS2) Jeremyb: add throttle exception for queens public lib event [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99716 [20:00:53] i think that's the same labs icinga does then, using a script by petan [20:00:57] andrewbogott: http://docs.openstack.org/api/openstack-compute/2/content/Create_or_Replace_Metadata-d1e5358.html [20:00:57] ori-l: amended with the bug # [20:01:11] andrewbogott: have the puppet reporter set a metadata item on the node via the openstack API [20:01:18] and have wikitech query for that metadata item [20:02:12] hm… if our version of OS has proper metadata support that might work [20:03:01] of course I'll have to write yet another OS client in ruby… *grumble* [20:03:11] why? [20:03:47] Well… maybe not, I can use a wrapper. [20:04:22] (CR) Odder: [C: 1] "The favicon looks OK to me now, thanks much!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 (owner: Vldandrew) [20:05:00] (CR) Ori.livneh: [C: 2 V: 2] add throttle exception for queens public lib event [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99716 (owner: Jeremyb) [20:05:24] !log ori updated /a/common to {{Gerrit|Ifbd6417a2}}: add throttle exception for queens public lib event [20:05:45] Logged the message, Master [20:05:56] ori-l: toda! [20:06:09] !log ori synchronized wmf-config/throttle.php 'Ifbd6417a2: add throttle exception for queens public lib event' [20:06:30] Logged the message, Master [20:06:45] (CR) Dzahn: [C: 1] "256x layer has been removed." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 (owner: Vldandrew) [20:13:55] !log Restarting gmetad on nickel for Ib111e53b6 [20:14:12] Logged the message, Master [20:14:40] ori-l: this is looking promising, except… I need to know the project name to write to the metadata. 
So, given an instance id... [20:16:14] why do you need to know the project name? [20:17:12] Well, that's if virt0 does the setting. Maybe I can just have everything happen on the instance. [20:17:34] (PS4) Manybubbles: Enable Cirrus as a beta feature [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [20:18:11] andrewbogott: why does virt0 need to know it? [20:18:38] Everything in OpenStack is multi-tenant… instance IDs aren't even necessarily unique across an install. Only within a project. [20:19:37] (CR) Manybubbles: "Had to make sure Cirrus is loaded after beta features. Now it needs even more careful review!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [20:20:08] andrewbogott: virt0 is puppetmaster, right? [20:20:20] right [20:21:07] is the project name available as a facter fact? [20:21:41] Hm… yeah, it probably is, I'll look at that. [20:22:06] If/after I exhaust the possibility of instances logging to metadata themselves. [20:22:15] unclear if that's allowed [20:28:15] (PS1) Dzahn: fix parameter order in bugzilla logmail [operations/puppet] - https://gerrit.wikimedia.org/r/99727 [20:29:29] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:30:29] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [20:34:58] (PS1) Ori.livneh: Specify $cluster = 'logstash' on logstash hosts [operations/puppet] - https://gerrit.wikimedia.org/r/99728 [20:35:07] (CR) jenkins-bot: [V: -1] Specify $cluster = 'logstash' on logstash hosts [operations/puppet] - https://gerrit.wikimedia.org/r/99728 (owner: Ori.livneh) [20:35:08] (PS2) Ori.livneh: Specify $cluster = 'logstash' on logstash hosts [operations/puppet] - https://gerrit.wikimedia.org/r/99728 [20:35:15] (CR) Ori.livneh: [C: 2 V: 2] Specify $cluster = 'logstash' on logstash hosts [operations/puppet] - https://gerrit.wikimedia.org/r/99728 (owner: Ori.livneh) [20:39:23] ah, right, sorry for missing that too [20:40:51] paravoid: it didn't work, somehow [20:41:08] gmond.conf wasn't updated [20:41:39] i touched it to see if it was simply the resource that needed to be refreshed by puppet reset it to 'Miscellaneous eqiad' [20:43:34] maybe the order matters, since it's a variable assignment [20:43:51] (CR) Dzahn: [C: 2] fix parameter order in bugzilla logmail [operations/puppet] - https://gerrit.wikimedia.org/r/99727 (owner: Dzahn) [20:45:07] (PS1) Ori.livneh: logstash nodes: assign $cluster before includes [operations/puppet] - https://gerrit.wikimedia.org/r/99730 [20:45:17] (CR) Ori.livneh: [C: 2 V: 2] logstash nodes: assign $cluster before includes [operations/puppet] - https://gerrit.wikimedia.org/r/99730 (owner: Ori.livneh) [20:46:58] ori-l: Are facts buried in the puppet report someplace? Or are they somehow in scope when reports are processed? [20:49:02] andrewbogott: you should be able to access them via Facter.value('foo') [20:50:08] ping [20:50:26] can someone hook me up with a recent copy of the maxmind database? [20:50:55] or our login to their site, and I can go grab one. [20:51:21] cajoel: do you need the proprietary database? 
[20:51:30] yes [20:51:34] the free one should be sufficient for testing, shouldn't it? [20:51:34] the paid one [20:51:45] hrm -- maybe.. [20:51:50] they're compatible [20:51:52] ori-l: on the master? [20:52:14] include the puppet class "geoip" is all you need to do [20:52:20] it'll do the right thing in both labs & production [20:52:24] free & paid respectively [20:52:25] lovely [20:52:29] merci [20:52:42] but we can give you the paid one too if needed [20:52:57] I'd rather test against the full -- what's the footprint delta? (ram) [20:53:04] nah, it's nothing [20:53:06] and lookup performance (relatively the same?) [20:53:12] yeah it's the same [20:53:47] -rw-r--r-- 1 root root 43M Dec 1 03:35 GeoIPCity.dat [20:53:49] -rw-r--r-- 1 root root 16M Nov 10 03:39 GeoLiteCity.dat [20:53:52] paid vs. free [20:54:03] that's the city databases, do you need that? [20:54:05] the country is [20:54:10] -rw-r--r-- 1 root root 954K Dec 1 03:35 GeoIP.dat [20:54:12] -rw-r--r-- 1 root root 582K Nov 10 03:35 GeoLite.dat [20:54:44] not seeing geoip in glocal groups list for puppet on 'configure instance' [20:54:46] there's also an IP->ASN one, I'm guessing you don't need that :-) [20:55:02] hm, I think you need to add it in the puppet groups first [20:55:06] andrewbogott: ^ ? [20:55:30] I /will/ need long format ASN names, I'll check it that's in there. [20:55:41] I'd rather not pull that from a different source. [20:55:55] cajoel: Yeah, if you're using a new puppet class you need to add it to the list before you can select it. [20:56:10] https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup [20:56:15] csteipp: heh, beta labs still uses a copy of that old thumb-handler.php script we stopped using since swift [20:56:29] !log Restarting gmetad on nickel [20:56:43] andrew/para: lost [20:56:50] Logged the message, Master [20:56:51] can I come shoulder surf someone? [20:57:26] cajoel, what puppet class are you wanting to add, and in what project? I'll add it for you. 
[20:57:31] it's going to take about 17-18h for the flight, but sure :P [20:57:32] geoip [20:57:44] which is that, the class or the project? [20:57:45] project == netflow [20:57:54] (PS5) Reedy: Update favicon office.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 (owner: Vldandrew) [20:57:59] (CR) Reedy: [C: 2] Update favicon office.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 (owner: Vldandrew) [20:58:15] (Merged) jenkins-bot: Update favicon office.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99702 (owner: Vldandrew) [20:58:17] I want the geoip puppet class added to netflow project [20:58:24] https://wikitech.wikimedia.org/wiki/Nova_Resource:I-0000096d [20:59:11] it looks like you already added a group called 'geoip'? [20:59:17] I did that on accident [20:59:17] !log reedy synchronized docroot/bits/favicon/office.ico 'Ib5332726fd02874b25213dd7813b59a684197ac3' [20:59:28] Just add the class to your group, and you should be set. [20:59:32] Logged the message, Master [20:59:33] I /believe/ there's already a puppet class for geoip (see paravoid above) [20:59:42] there's a module, yes [20:59:49] which has a geoip class [21:00:58] hm, ok, now you deleted the group... [21:01:19] (class/group/module) !!#!@ [21:01:23] :) [21:02:07] so… now configure your instance, and tick the 'geoip' box up top. [21:02:24] ah [21:02:34] it's the arbitrary group name that tripped me up [21:02:53] yeah, groups on that page are maybe unnecessary sugar. [21:03:42] delightful [21:04:01] * andrewbogott has a migraine; retires to sit in the dark for a bit [21:06:51] puppet error on labs stage box. 
[21:06:52] err: /Stage[main]/Geoip::Data::Puppet/File[/usr/share/GeoIP]: Failed to generate additional resources using 'eval_generate: Error 400 on SERVER: Not authorized to call search on /file_metadata/volatile/GeoIP with {:recurse=>true, :checksum_type=>"md5", :links=>"manage"} [21:06:57] err: /Stage[main]/Geoip::Data::Puppet/File[/usr/share/GeoIP]: Could not evaluate: Error 400 on SERVER: Not authorized to call find on /file_metadata/volatile/GeoIP Could not retrieve file metadata for puppet:///volatile/GeoIP: Error 400 on SERVER: Not authorized to call find on /file_metadata/volatile/GeoIP at /etc/puppet/modules/geoip/manifests/data/puppet.pp:22 [21:23:41] (CR) Dan-nl: [C: 1] Enable GWToolset on betacommons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98684 (owner: MarkTraceur) [21:27:10] (PS6) BryanDavis: Enable GWToolset on betacommons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98684 (owner: MarkTraceur) [21:28:16] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:28:58] (CR) BryanDavis: [C: 2] "Sending GWToolset to beta for the first time! Thanks to everyone who helped." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98684 (owner: MarkTraceur) [21:29:06] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [21:29:06] (Merged) jenkins-bot: Enable GWToolset on betacommons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98684 (owner: MarkTraceur) [21:29:38] ottomata: hey [21:29:42] ottomata: how's varnishkafka going? [21:29:44] (PS1) Jgreen: remove sahar's account from aluminium, he's migrated to lutetium [operations/puppet] - https://gerrit.wikimedia.org/r/99736 [21:29:52] heya [21:29:53] paravoid, good! 
[21:30:15] the ganglia thing i think was caused by gmetad deciding not to talk to aggregator data sources that it thought was bad [21:30:19] i just restarted the offending gmonds [21:30:22] and then restarted gmetad [21:30:24] is better [21:30:32] i'm getting the logster stuff a little bit better [21:30:42] setting ganglia slopes properly, filtering out irrlevant stuff, etc. [21:30:46] I care less about logster/ganglia and more about varnishkafka itself tbh :) [21:31:05] is it pushing logs alright? [21:31:13] you're doing 1:1 right now, right? [21:32:01] (CR) Jgreen: [C: 2 V: 1] remove sahar's account from aluminium, he's migrated to lutetium [operations/puppet] - https://gerrit.wikimedia.org/r/99736 (owner: Jgreen) [21:32:44] (PS1) Ebrahim: Enabling Persian Wikipedia Education Program [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99739 [21:34:01] (PS1) Chad: Configure commons support for Cirrus in beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99741 [21:34:44] (CR) Chad: "Needs I42d944c7 to land before this is fully functional, but can't hurt to merge & go ahead and rebuild labs commonswiki now." 
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99741 (owner: Chad) [21:35:53] (CR) Manybubbles: [C: 1] Configure commons support for Cirrus in beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99741 (owner: Chad) [21:36:23] (CR) Ebrahim: "All Extension messages are translated http://translatewiki.net/w/i.php?title=Special:Translate&group=ext-educationprogram&filter=%21transl" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99739 (owner: Ebrahim) [21:37:03] (CR) Calak: [C: 1] Enabling Persian Wikipedia Education Program [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99739 (owner: Ebrahim) [21:37:59] (CR) Chad: [C: 2] Configure commons support for Cirrus in beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99741 (owner: Chad) [21:38:14] (CR) Legoktm: [C: 1] "Actually, the loading order of CirrusSearch vs BetaFeatures doesn't matter, it should work either way." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [21:42:23] (PS1) Chad: Fix undefined variable [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99748 [21:42:52] (Merged) jenkins-bot: Configure commons support for Cirrus in beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99741 (owner: Chad) [21:43:46] (CR) Chad: [C: 2 V: 2] Fix undefined variable [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99748 (owner: Chad) [21:44:30] (PS1) Ottomata: DRYing up a little in JsonLogster [operations/debs/logster] - https://gerrit.wikimedia.org/r/99750 [21:44:40] !log demon synchronized wmf-config/InitialiseSettings.php 'Fix undefined variable' [21:44:47] (PS1) Manybubbles: Remove pool counter setting for Cirrus updates [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99752 [21:44:49] (CR) Ottomata: [C: 2 V: 2] DRYing up a little in JsonLogster 
[operations/debs/logster] - https://gerrit.wikimedia.org/r/99750 (owner: Ottomata) [21:44:56] Logged the message, Master [21:45:27] (PS1) Ottomata: Filtering out more irrevant varnishkafka metrics [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/99753 [21:45:28] (PS1) Ottomata: Filtering out more irrelevant metrics and setting slope to positive on some [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/99754 [21:45:42] paravoid: is there any real follow up needed for the rsyslog/switft stupidness (on our part) right now? [21:45:55] greg-g: yes [21:46:09] paravoid, sorry, talkign with dan, doing python review [21:46:30] I found https://bugs.launchpad.net/swift/+bug/780025 [21:46:45] and https://bugs.launchpad.net/swift/+bug/1094230 [21:46:50] and the abandoned https://review.openstack.org/#/c/24871 [21:47:12] plus the completely and utterly broken "fix" https://github.com/openstack/swift/commit/f0eb25a973585e6a6cdb7c69a342fc6a38055e0d [21:47:34] (CR) Manybubbles: "If it doesn't matter I'll revert it but I was having trouble getting Cirrus to work out when it was first on my sandbox. I can try again " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [21:48:12] manybubbles: What wasn't working? [21:48:39] paravoid: ugh, died because of too old? 
sad [21:48:48] legoktm: the option wasn't appearing [21:48:52] hmmm [21:48:59] I can try again right now [21:49:10] well it should work since its just a hook...lemme try too [21:49:57] (PS1) Vldandrew: Update favicon office.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 [21:50:37] (PS1) Dzahn: add puppet doc compatible comments to bugzilla module [operations/puppet] - https://gerrit.wikimedia.org/r/99757 [21:51:00] (PS2) Ottomata: VarnishkafkaLogster.py - setting slope to positive for some Ganglia metrics [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/99754 [21:51:10] (CR) Ottomata: [C: 2 V: 2] Filtering out more irrevant varnishkafka metrics [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/99753 (owner: Ottomata) [21:51:19] legoktm: yeah, doesn't work because BetaFeatures sets $wgHooks['GetBetaFeaturePreferences'] = array(); [21:51:26] wut [21:51:26] which clears anything I've already set [21:51:31] * legoktm slaps marktraceur  [21:51:41] it shouldnt do that [21:51:57] sigh [21:52:13] (PS1) Chad: Missing Cirrus settings for beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99758 [21:52:22] (CR) Chad: [C: 2 V: 2] Missing Cirrus settings for beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99758 (owner: Chad) [21:53:39] legoktm: Things I didn't know [21:55:04] (CR) Qgil: [C: -1] "You have used a MediaWiki logo with text underneath, while the original has no text." 
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 (owner: Vldandrew) [21:56:32] (CR) Odder: [C: -1] "The current MediaWiki favicon does not include the name of the software; please remove it and leave just the sunflower in square brackets " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 (owner: Vldandrew) [21:57:42] (CR) Qgil: "Also, please use the commit message description to explain what exactly have you changed. Thank you!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 (owner: Vldandrew) [21:58:58] (PS2) Vldandrew: Update favicon mediawiki.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 [22:01:26] (CR) Qgil: [C: -1] "The original 16x16 icon has better definition than the current one, which looks clearly blurry." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 (owner: Vldandrew) [22:04:19] (PS5) Manybubbles: Enable Cirrus as a beta feature [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [22:04:32] (PS3) Ottomata: VarnishkafkaLogster.py - setting slope to positive for some Ganglia metrics [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/99754 [22:06:19] (CR) Manybubbles: "Reverted ordering. 
This requires that a change to BetaFeatures be deployed but that is less scary than moving things around in CommonSett" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98046 (owner: Legoktm) [22:08:52] (PS1) Ottomata: Updating to 0.0.5-1 [operations/debs/logster] (debian) - https://gerrit.wikimedia.org/r/99763 [22:09:23] (PS2) Ottomata: Updating to 0.0.5-1 [operations/debs/logster] (debian) - https://gerrit.wikimedia.org/r/99763 [22:09:37] (CR) Ottomata: [C: 2 V: 2] Updating to 0.0.5-1 [operations/debs/logster] (debian) - https://gerrit.wikimedia.org/r/99763 (owner: Ottomata) [22:13:01] (CR) Ottomata: [C: 2 V: 2] VarnishkafkaLogster.py - setting slope to positive for some Ganglia metrics [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/99754 (owner: Ottomata) [22:13:54] (PS1) Ottomata: Updating varnishkafka module with filter and proper ganglia slope [operations/puppet] - https://gerrit.wikimedia.org/r/99765 [22:14:07] (CR) Ottomata: [C: 2 V: 2] Updating varnishkafka module with filter and proper ganglia slope [operations/puppet] - https://gerrit.wikimedia.org/r/99765 (owner: Ottomata) [22:14:40] PROBLEM - MySQL Slave Delay on db73 is CRITICAL: CRIT replication delay 309 seconds [22:14:53] paravoid: yeah, i want to be able to answer your question confidently, hence the logster/ganglia stuff [22:14:59] but yes, as far as I can tell things look good :) [22:15:00] PROBLEM - MySQL Replication Heartbeat on db73 is CRITICAL: CRIT replication delay 321 seconds [22:15:10] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[22:16:10] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [22:19:45] (PS1) M4tx: Update favicon wikipedia.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 [22:21:01] paravoid: http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=(cp1046%7Ccp1047%7Ccp1059%7Ccp1060%7Ccp3011%7Ccp3012%7Ccp4011%7Ccp4012%7Ccp4019%7Ccp4020).*&mreg%5B%5D=kafka.rdkafka.brokers..*.rtt.avg&gtype=line&glegend=show&aggregate=1 [22:21:23] i think that is in microseconds [22:21:23] so [22:21:33] unsurprisingly, esams takes longer in rtt [22:23:48] (CR) Odder: [C: -1] "The Wikipedia favicon is the stylized "W" letter, not the project logo . please improve!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 (owner: M4tx) [22:25:00] RECOVERY - MySQL Replication Heartbeat on db73 is OK: OK replication delay 146 seconds [22:26:09] where would I find the ganglia frontend for a labs host? [22:26:33] http://ganglia.wmflabs.org/latest/ [22:26:39] guessed.. :) [22:26:40] RECOVERY - MySQL Slave Delay on db73 is OK: OK replication delay 131 seconds [22:27:10] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:28:10] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [22:35:18] (PS3) Vldandrew: Update favicon mediawiki.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 [22:37:40] PROBLEM - MySQL Slave Delay on db73 is CRITICAL: CRIT replication delay 309 seconds [22:38:00] PROBLEM - MySQL Replication Heartbeat on db73 is CRITICAL: CRIT replication delay 310 seconds [22:40:39] (PS2) M4tx: Update favicon wikipedia.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 [22:41:49] (CR) M4tx: "Oh, I see the new version is incorrect, too. Will fix in the moment."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 (owner: M4tx) [22:44:00] (PS4) Vldandrew: Update favicon mediawiki.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 [22:44:56] !log mediawiki 1.22 has been released - http://dumps.wikimedia.org/mediawiki/1.22/ [22:45:20] Logged the message, Master [22:48:49] (PS3) M4tx: Update favicon wikipedia.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 [22:49:44] (CR) M4tx: "OK. I think this one is okay." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 (owner: M4tx) [22:55:41] (PS2) Reedy: Fix comment about . to _ replacement in setSiteInfoForWiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99606 [22:55:43] (CR) Reedy: [C: 2] Fix comment about . to _ replacement in setSiteInfoForWiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99606 (owner: Reedy) [22:56:10] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:57:06] (Merged) jenkins-bot: Fix comment about . to _ replacement in setSiteInfoForWiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99606 (owner: Reedy) [22:57:10] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [23:00:10] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:01:10] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [23:03:01] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [23:06:21] (PS12) Dr0ptp4kt: WIP: Add an extra header for cache variance of W0 banners for proxies.
[operations/puppet] - https://gerrit.wikimedia.org/r/88261 [23:08:06] (CR) Odder: "CC-ing Isarra, the author of the new Wikipedia favicon so that she can have a look at the new icon, as there is some difference between he" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 (owner: M4tx) [23:10:01] (PS5) Vldandrew: Update favicon mediawiki.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 [23:12:19] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:13:19] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [23:15:09] (CR) Dr0ptp4kt: [C: -1] "Do not merge yet. See commit message." [operations/puppet] - https://gerrit.wikimedia.org/r/88261 (owner: Dr0ptp4kt) [23:16:18] (PS6) Vldandrew: Update favicon mediawiki.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 [23:16:20] (CR) Isarra: "I can't say one way or another because I can't figure out how to get the real files out of gerrit. Sorry." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 (owner: M4tx) [23:19:34] (PS1) Dan-nl: open-up-wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99775 [23:22:27] (CR) Odder: "@Isarra: please refer to - files with the "new" part in them are the ones that Mateusz created, t" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 (owner: M4tx) [23:23:13] (CR) M4tx: "@Isarra: just click the filename above in Patch set 3 and you'll have 2 "Download" links there... What's the problem?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 (owner: M4tx) [23:26:19] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:27:49] (CR) Odder: [C: -1] "As far as I can see, the 32px layer uses a different sunflower than the 16px and 48px layers (with the inside part of the sunflower being " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99756 (owner: Vldandrew) [23:28:19] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [23:32:19] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:33:09] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [23:34:14] (CR) Brian Wolff: "For reference, if we did this in production, it would fix bug 45735" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99775 (owner: Dan-nl) [23:39:28] (CR) GWicke: "I think this is no longer needed, the plan is to use mathoid instead." [operations/puppet] - https://gerrit.wikimedia.org/r/61767 (owner: Physikerwelt) [23:40:41] (CR) BryanDavis: [C: 1] "$wgCopyUploadsDomains is the whitelist for UploadFromUrl::isAllowedHost(). If the list is empty then all hosts are allowed." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99775 (owner: Dan-nl) [23:40:42] RECOVERY - MySQL Slave Delay on db73 is OK: OK replication delay 134 seconds [23:41:02] RECOVERY - MySQL Replication Heartbeat on db73 is OK: OK replication delay 112 seconds [23:42:04] ori-l: no:) [23:42:22] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:43:12] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [23:51:11] (PS2) Dzahn: add puppet doc compatible comments to bugzilla module [operations/puppet] - https://gerrit.wikimedia.org/r/99757 [23:51:22] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:52:22] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [23:53:43] (PS2) Dan-nl: open-up-wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99775 [23:54:09] (CR) Dzahn: [C: 2] "/puppet/modules/bugzilla/manifests$ puppet doc init.pp" [operations/puppet] - https://gerrit.wikimedia.org/r/99757 (owner: Dzahn) [23:56:51] (CR) Dzahn: add puppet doc compatible comments to bugzilla module (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/99757 (owner: Dzahn) [23:57:34] (PS1) Ori.livneh: graphite: correct STORAGE_DIR [operations/puppet] - https://gerrit.wikimedia.org/r/99782 [23:57:44] (CR) Greg Grossmeier: [C: 1] "I'm fine with this, especially for testing on betalabs. And those domains are reputable, more so than flickr, even." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99775 (owner: Dan-nl) [23:58:01] (CR) Ori.livneh: [C: 2 V: 2] graphite: correct STORAGE_DIR [operations/puppet] - https://gerrit.wikimedia.org/r/99782 (owner: Ori.livneh) [23:58:30] (CR) Dzahn: "this made it show up on https://doc.wikimedia.org/puppet/classes/bugzilla/bugzilla.html" [operations/puppet] - https://gerrit.wikimedia.org/r/99757 (owner: Dzahn) [23:58:40] (CR) Aaron Schulz: [C: 1] open-up-wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99775 (owner: Dan-nl) [23:59:01] (CR) Legoktm: open-up-wgCopyUploadsDomains (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99775 (owner: Dan-nl) [23:59:02] (CR) BryanDavis: [C: 1] "After discussion on irc with Greg G, we may have found a more production friendly approach where sites would be selectively added via bug " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99775 (owner: Dan-nl)