[03:50:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 13 seconds
[03:50:42] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused
[03:53:24] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[03:53:24] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[03:53:24] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[03:53:24] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[03:55:30] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.015 second response time on port 11000
[06:03:40] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[06:48:57] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[06:48:57] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[07:06:17] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[08:16:19] hello :)
[08:25:27] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[08:25:54] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 184 seconds
[08:30:06] moin
[08:47:31] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[09:36:24] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 8 seconds
[12:44:55] PROBLEM - Host ms-be3 is DOWN: PING CRITICAL - Packet loss = 100%
[13:12:28] hashar: do you know who can help us with wikidata ssl bug?
[13:12:29] https://bugzilla.wikimedia.org/show_bug.cgi?id=41437
[13:12:45] and squid stuff
[13:14:24] New patchset: Jgreen; "add jgreen nagios admin privs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30580
[13:15:33] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30580
[13:15:57] aude: ops I guess :-]
[13:16:23] aude: it sounds like the wikidata.org virtual domain is wrongly configured in apache
[13:16:38] New patchset: Demon; "Disabling wikidata.org for SUL until we fix SSL" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30581
[13:16:38] which ops person?
[13:17:04] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30581
[13:17:36] ^demon: thanks for looking at this
[13:17:43] <^demon> No problem.
[13:18:14] do we also have to disable the central auth icon?
[13:18:15] aude: I am not sure. I guess wikidata using https pass via an nginx proxy which terminates the SSL connection. I have no idea who is the nginx xpert
[13:18:31] hmmm
[13:19:55] I have no knowledge about nginx setup, most probably something in manifests/protoproxy.pp
[13:19:55] !log demon synchronized wmf-config/CommonSettings.php 'Disabling wikidata for SUL until we fix SSL'
[13:20:10] Logged the message, Master
[13:20:25] <^demon> aude: The logo is what does the remote login. We don't want the logo to show until SSL is fixed.
[13:20:49] agree
[13:22:16] <^demon> You don't have to disable the icon setting. It won't show the icon if the domain isn't configured.
[13:22:28] <^demon> (Just confirmed by logging out/in)
[13:23:59] ok
[13:24:45] <^demon> Who was configuring ULS earlier?
[13:24:51] <^demon> http://p.defau.lt/?RBwd_BU7aoUWkgQVlgqQcA was uncommitted on fenari :\
[13:24:54] reedy
[13:24:55] i think
[13:25:05] * ^demon glares at Reedy
[13:25:16] yes reedy
[13:25:35] we have some issues with uls and squid
[13:26:03] https://bugzilla.wikimedia.org/show_bug.cgi?id=41451
[13:29:09] Change abandoned: CSteipp; "Added by Chad" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28862
[13:29:34] PROBLEM - Apache HTTP on mw38 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:30:47] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:30:55] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:31:17] RECOVERY - Apache HTTP on mw38 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.966 second response time
[13:32:17] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time
[13:32:25] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.269 second response time
[13:42:24] New patchset: Jgreen; "attempting to put aluminium,grosley,erzurumi,loudon in "fundraising" nagios host group" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30583
[13:43:01] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30583
[13:54:01] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[13:54:01] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[13:54:01] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[13:54:01] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[13:55:31] New patchset: Hydriz; "(bug 40474) Set Transwiki namespace on Chinese (zh) wikimedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30585
[14:17:59] Who would be able to see if an IP was being denied access to sites? There is an email on OTRS from an ISP saying they have a number of complaints.
[14:18:14] > 121.54.13.51
[14:18:14] > 121.54.2.188
[14:18:23] Ticket 2012102210000612
[14:20:24] not at the squid level
[14:20:53] can they view the material but not edit?
[14:21:43] From what I gather, they can't load the site
[14:21:52] I always try to make sure it's not just an editing block
[14:22:12] well like I say not at the squid level
[14:22:22] OK. Thanks
[14:22:24] sure
[14:29:44] New patchset: Hydriz; "(bug 38134) Enable Extension:GoogleNewsSitemap on es wikinews" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30589
[14:54:19] New patchset: Silke Meyer; "Added puppet files for Wikidata on labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30593
[14:57:34] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:59:08] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.881 second response time
[15:10:36] New patchset: Mark Bergsma; "Add stanzas for new esams cp and object storage servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30595
[15:29:59] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30595
[15:39:07] New review: jan; "I would create a new module like "mediawiki" for this with wikidata as a part. This would be much ni..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/30593
[15:39:52] PROBLEM - Host db42 is DOWN: PING CRITICAL - Packet loss = 100%
[15:40:11] !log reedy synchronized php-1.21wmf3 'Initial file sync-out'
[15:40:24] Logged the message, Master
[15:41:15] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: test2wiki to 1.21wmf3
[15:41:28] Logged the message, Master
[15:41:39] New patchset: Mark Bergsma; "Add cp30*" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30600
[15:42:52] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30600
[15:46:43] !log reedy Started syncing Wikimedia installation... : Rebuilding localisation cache for 1.21wmf3
[15:47:03] Logged the message, Master
[15:47:47] :)
[15:49:52] PROBLEM - Host storage3 is DOWN: PING CRITICAL - Packet loss = 100%
[15:50:31] !log reedy synchronized php-1.21wmf3/extensions/EventLogging/
[15:50:44] Logged the message, Master
[15:51:20] !log reedy synchronized php-1.21wmf3/extensions/PostEdit/
[15:51:22] RECOVERY - Host db42 is UP: PING OK - Packet loss = 0%, RTA = 0.41 ms
[15:51:33] Logged the message, Master
[15:52:12] * aude can't stay logged into wikidata at the moment
[15:52:35] !log Running sync-common on searchidx1001
[15:52:47] Logged the message, Master
[15:54:25] !log reedy synchronized wmf-config/
[15:54:34] Logged the message, Master
[15:56:11] Reedy: did you just take down test2?
[15:56:17] Yeah
[15:56:22] Scap takes an age
[15:56:24] OK
[15:56:29] zeljkof: ^^
[15:56:30] And we have to have a wiki on a version to get a cache built
[15:56:32] shouldn't be too long
[15:56:51] thanks
[15:57:06] Reedy, chrismcmahon: thanks for letting me know :)
[15:57:15] if you're lucky, on refresh you might hit an apache with the cache ;)
[15:57:36] I was just scanning the irc channels trying to figure out if it is a known problem
[15:57:37] Does anyone have any objections to using "y" as wikivoyage interwiki prefix?
[15:59:42] zeljkof: either here or wikimedia-tech usually has that information, and it's almost always Reedy 's fault. :)
[15:59:55] chrismcmahon: good to know :)
[16:01:43] !log reedy Finished syncing Wikimedia installation... : Rebuilding localisation cache for 1.21wmf3
[16:01:56] Logged the message, Master
[16:03:47] New patchset: Silke Meyer; "Added puppet files for Wikidata on labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30593
[16:04:46] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[16:06:30] New review: Silke Meyer; "addressed comments by Jan concerning tabs etc." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/30593
[16:08:08] !log reedy Started syncing Wikimedia installation... : Rebuilding localisation cache for 1.21wmf3
[16:08:22] Logged the message, Master
[16:09:51] New patchset: Mark Bergsma; "Make cp3019-22 bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30607
[16:12:29] !log reedy Started syncing Wikimedia installation... : Rebuilding localisation cache for 1.21wmf3
[16:12:44] Logged the message, Master
[16:23:55] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused
[16:24:55] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30607
[16:26:58] New patchset: Mark Bergsma; "Add cp3019-3022 as new bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30609
[16:27:15] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30609
[16:27:51] Ganglia is broken
[16:27:51] There was an error collecting ganglia data (127.0.0.1:8654): XML error: Invalid document end at 1
[16:30:42] !log mark synchronized wmf-config/CommonSettings.php 'Add new bits server IPs'
[16:30:55] Logged the message, Master
[16:32:46] New patchset: Mark Bergsma; "Only make the first two hosts Ganglia aggregators" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30611
[16:33:15] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30611
[16:38:12] !log reedy synchronized php-1.21wmf3/cache/l10n/ 'Sync'
[16:38:24] why didn't watchmouse detect dead nagios ?
[16:38:24] Logged the message, Master
[16:41:25] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikidatawiki to 1.21wmf3
[16:41:38] Logged the message, Master
[16:42:48] Jeff_Green: nagios is upset about not having a @monitor_group defined for fundraising_pmtpa
[16:43:05] well well well
[16:43:38] for host erzurumi
[16:43:38] !log reedy synchronized php-1.21wmf3/includes/Revision.php
[16:43:42] I was trying to set up a new group but it took forever for puppet to get that to nagios
[16:43:48] Logged the message, Master
[16:44:01] !log reedy synchronized php-1.21wmf3/includes/WikiPage.php
[16:44:15] Logged the message, Master
[16:44:38] LeslieCarr: where are you seeing the error?
[16:44:38] so i don't see a @monitor_group statement in manifests here
[16:45:14] well nagios died on spence, so /etc/nagios/puppet_hosts.cfg line 7868
[16:45:23] blargh
[16:45:47] which is the first time ez (i can't type that name out all the time, my fingers get tangled) is "mentioned"
[16:46:23] so what I did was to add $cluster and $nagios_group to several hosts in site.pp
[16:46:26] but, you could make a monitor_group statement in site.pp
[16:46:27] and then puppetblast that
[16:46:41] it's not my favorite place, however since you don't appear to have any role statements
[16:46:52] New review: Faidon; "I think all of these classes are an overkill. Can't we have a php::extension definition that install..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/29975
[16:47:01] I can add role statements too
[16:47:07] actually you could use this opportunity to make a role/fundraising.pp , make a comment that it's mostly in private repos, then put the monitor_group statement there
[16:47:14] hehehe, brain sync :)
[16:47:45] my brain can't be sync'd today, i'm underslept
[16:49:00] oh noes
[16:49:59] let's do a quick fix first and then I'll go back and role-ify
[16:50:36] !log reedy synchronized php-1.21wmf3/includes/
[16:50:49] Logged the message, Master
[16:52:02] cool, you got it or want me to fix'er up
[16:52:19] if i understand correctly I can just do like:
[16:52:54] @monitor_group { "${cluster}_${::site}": description => "${cluster} ${::site}"}
[16:52:54] right where I've got $nagios_group right?
[16:53:03] if that's right, I got it
[16:54:25] yeah
[16:54:32] k.
[16:54:41] woot
[16:55:04] I'm trying to get to a point where I can separate out the fundraising boxes for an additional notification victim
[16:55:17] b/c fr-tech wants to suffer the pains
[16:57:01] New patchset: Jgreen; "adding monitor_group for fundraising_* boxes, temporarily in site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30613
[16:57:59] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30613
[16:58:52] now we wait 4E29 seconds for puppet to get the changes onto the nagios box...
[17:06:24] is anyone looking at the nagios outage?
[17:06:32] paravoid: I think jeff and leslie
[17:06:40] Jeff_Green: LeslieCarr want help?
[17:07:38] yep
[17:07:41] !log reedy synchronized php-1.21wmf2/cache/interwiki.cdb 'Updating interwiki cache'
[17:07:42] we are, it was the monitor group
[17:07:53] Logged the message, Master
[17:08:00] !log reedy synchronized php-1.21wmf3/cache/interwiki.cdb 'Updating interwiki cache'
[17:08:09] LeslieCarr: where are you on it currently?
[17:08:12] Logged the message, Master
[17:08:22] i'm waiting for puppetd -tv to blast the config onto spence
[17:08:43] Jeff_Green: want me to fix by hand until that finishes?
[17:09:11] actually, nvm, you got this under control :)
[17:10:50] if you want to that's fine
[17:11:35] i'm working on a new role/fundraising.pp while we wait for the magic of efficient puppet power
[17:12:09] heh
[17:21:33] New patchset: Jgreen; "moving fundraising config out of site.pp into a new role/fundraising.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30620
[17:23:02] mark: sq37 is back up...replaced the controller card
[17:26:51] New review: Mark Bergsma; "This is the correct fix." [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/24797
[17:26:51] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24797
[17:27:37] nagios still doesn't like the config
[17:30:53] New patchset: Jgreen; "moving fundraising config out of site.pp into a new role/fundraising.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30620
[17:34:03] RECOVERY - Host sq37 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms
[17:34:28] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30620
[17:34:48] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[17:34:48] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[17:34:48] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[17:35:15] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.006 second response time on port 11000
[17:49:32] New patchset: Demon; "Revert "Disabling wikidata.org for SUL until we fix SSL"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30623
[17:49:51] Can someone please graceful all the apaches?
[17:49:56] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30623
[17:50:10] mutante pushed a CORS change for me on friday, but as it was via puppet, it'd take ages to propogate...
[17:50:59] catrope is doing a graceful restart of all apaches
[17:51:07] !log catrope gracefulled all apaches
[17:51:13] Well, "all"
[17:51:20] notpeter: still up for helping with the puppet/nagios? i have a fail-to-understand
[17:51:20] Logged the message, Master
[17:51:20] mw1: System failed sanity check: VIP not configured on lo
[17:51:23] For mw1-mw15
[17:51:36] srv283: 29 Oct 17:50:58 ntpdate[30669]: no server suitable for synchronization found
[17:51:37] srv283: Error: unable to contact NTP server
[17:51:49] Also for srv277, srv275, mw34, mw59
[17:52:01] notpeter: Are the failures above related to precise maybe? --^^^
[17:52:17] !log demon synchronized wmf-config/CommonSettings.php 'Turning SUL back on for wikidata.org'
[17:52:29] Logged the message, Master
[17:52:50] RoanKattouw: those are now only jobrunners, thus don't have vips
[17:53:06] as for ntp, nfi
[17:53:15] Jeff_Green: sure
[17:53:20] what's going on
[17:53:33] Ah so wikidata SSL is working now
[17:54:11] the issue is getting puppet to create entries for puppet_hostgroups.cfg and puppet_servicegroups.cfg
[17:54:25] <^demon> Krenair: For wikidata.org & www.wikidata.org. Lang subdomains need a little further tweaking.
[17:54:30] ok
[17:54:55] notpeter: Ah, I see, so mw1-15 aren't running Apache any more?
[17:55:24] notpeter: In that case, could you update /etc/dsh/group/apaches ?
[17:55:26] notpeter: I added @monitor_group { "${cluster}_${::site}": description => "${cluster}_${::site}"} to each of the classes in role/fundraising.pp that are used, as far as I can tell that's what should be necessary, but puppet didn't update those two files
[17:55:38] RoanKattouw: sure
[17:55:40] Awesome
[17:56:45] done. sorry about that. didn't know that that group was used
[17:57:11] what are the names of the hosts in question?
[17:57:52] Jeff_Green: er, wait
[17:57:56] nagios is running
[17:57:59] what is the goal here
[17:58:11] it's running because I hand-edited the config
[17:58:22] ah, ok :)
[17:58:37] goal is to add fundraising-{realm} groups so I can work on split notification
[17:58:44] fr-tech wants to get SMS when things explode
[17:59:38] what is a host that's having issues?
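The recurring failure mode in this stretch of the log is nagios dying on spence because a puppet-generated config referenced a hostgroup (fundraising_pmtpa) that did not exist, and the fix being hand-edits to the live config. A way to avoid that is to verify the generated config before bouncing the daemon. The wrapper below is a hypothetical sketch (the `preflight_restart` helper name and the paths in the comment are assumptions, not anything from the log):

```shell
# Hypothetical preflight wrapper: run a config-check command and only
# restart the service if the check passes, so a bad puppet-generated
# config (e.g. a missing hostgroup) cannot take down the running daemon.
preflight_restart() {
  check="$1"; restart="$2"
  if $check; then
    $restart && echo "restarted"
  else
    echo "config check failed; leaving running service alone" >&2
    return 1
  fi
}

# Real-world usage would look something like (paths are assumptions):
#   preflight_restart 'nagios -v /etc/nagios/nagios.cfg' '/etc/init.d/nagios restart'
preflight_restart true 'echo restarting'
```

Nagios itself ships a config-verification mode (`nagios -v <config file>`) that exits non-zero on errors, which is what the first argument would be in practice.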
[18:01:35] strictly speaking, spence
[18:01:39] * aude needs help with setting up access-origin with wikidata and ULS
[18:01:42] https://bugzilla.wikimedia.org/show_bug.cgi?id=41489
[18:01:50] anyone can help us, please? :)
[18:01:58] Jeff_Green: sure...
[18:02:06] what was it barfing on
[18:02:21] notpeter: sec. dialing in
[18:02:50] ottomata ?
[18:03:06] notpeter: something like "no hostgroup in puppet_hostgroups.cfg for fundraising_pmtpa" for example
[18:03:20] I added them by hand
[18:03:34] ooo!
[18:03:34] sorry
[18:04:07] aude: I think I've found it
[18:04:12] !log reedy synchronized wmf-config/CommonSettings.php 'Add wikidata to wgCrossSiteAJAXdomains'
[18:04:18] aude: ^^
[18:04:24] Logged the message, Master
[18:06:50] aude: Looks better in a chrome incognito window now...
[18:07:08] Reedy: yay
[18:07:18] My current chrome window is caching it to hell and bakc
[18:07:26] That's Chrome for ya
[18:08:33] * aude learns new stuff everyday
[18:08:54] ULS: Unknown language als. load.php:267
[18:08:54] ULS: Unknown language bat-smg. load.php:267
[18:08:54] ULS: Unknown language fiu-vro. load.php:267
[18:09:00] There's just those appearing now
[18:09:30] can't get in via sip
[18:09:30] i've seen that
[18:09:34] those files don't exist yet and it's a bug
[18:09:44] multiple tries = fail to connect whatsoever
[18:10:23] aude: Also, CreateItem is now quite a bit faster when tabbing! :D
[18:11:29] ^demon, I'm looking through the apache config now and the only things that I see are configured differently to the other wiki domains are ServerName and ServerAlias
[18:11:36] Gerrit 503'ing again.
[18:11:53] Reedy: I filed a bug for ULS issues earlier today
[18:12:06] And I just fixed it
[18:12:06] Jeff_Green: monitor group doesn't go in node def
[18:12:24] it shouldn't require CORS since that doesn't work in all browsers, it shouldn't re-try the same json file 16 times and the unkwown language codes
[18:12:30] Jeff_Green: look at role/applicationserver.pp
[18:12:35] Reedy: no you didn't, that only hides the problem in some browsers.
[18:12:47] notpeter: yes
[18:13:01] <^demon> Krenair: Apache config is correct. It needs further DNS work.
[18:13:14] I see
[18:13:14] <^demon> Krinkle: This has been going on all morning.
[18:13:19] <^demon> I don't know why, yet.
[18:14:05] The apache restart earlier - was that wikidata ssl-related?
[18:14:29] no
[18:15:58] notpeter: Have you been messing with searchidx1 by any chance?
[18:15:58] Warning: the RSA host key for 'searchidx1001' differs from the key for the IP address '10.64.0.119'
[18:16:03] !log catrope synchronized php-1.21wmf3/extensions/VisualEditor/ 'Update VisualEditor'
[18:16:06] notpeter: i see the @monitor_group stuff at the top . . . I have that per-class in role/fundraising.pp
[18:16:13] RoanKattouw: nope
[18:16:16] Logged the message, Master
[18:16:17] hah
[18:16:28] notpeter: is there something else in this I'm missing?
[18:16:31] Oh hmm maybe it's just my homedir that's weird?
[18:16:34] Offending key for IP in /home/catrope/.ssh/known_hosts:17
[18:16:36] Matching host key in /etc/ssh/ssh_known_hosts:4753
[18:17:51] Jeff_Green: shouldn't be
[18:18:02] (although that's not what I see in git. has this been merged?)
[18:18:44] yep
[18:19:15] it's in the copy on sockpuppet
[18:21:14] Jeff_Green: in what I have in my checkout, I see @monitor_group defs inside of class role::fundraiser::blah
[18:21:20] yah
[18:21:34] ok, pull those out of the role defs
[18:21:47] just throw them in the base of fundraising.pp
[18:21:58] really? ok
[18:22:27] should work
[18:22:32] our stuff is weird....
[18:22:56] it really is
[18:25:38] !log aaron synchronized php-1.21wmf3/extensions/TimedMediaHandler 'deployed 05118462e3ddbfc507772cc3c4e1117a8527e0b5'
[18:25:52] Logged the message, Master
[18:26:37] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[18:27:43] New patchset: Jgreen; "add system_role for fundraising hosts, attempt to fix nagios fundraiser host groups" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30632
[18:28:25] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30632
[18:34:51] chmod g+w /home/wikipedia/common/docroot/www.wikidata.org/favicon.ico
[18:34:58] chmod g+w /home/wikipedia/common/docroot/www.wikidata.org/robots.txt
[18:35:03] ^ Can someone do that on fenari please?
[18:35:49] done
[18:36:25] thanks
[18:37:09] apergos: Oh, can you do g+w for the www.wikidata.org directory too please?
[18:37:23] done
[18:37:47] thanks
[18:37:52] could have done that and let you fix the other stuff by yerself, oh well :-D
[18:37:54] notpeter: did you get me ping about searchidx1001 host key whining?
[18:38:42] I pinged him last week about it ;)
[18:42:15] nope
[18:42:21] meant roan
[18:42:42] er
[18:42:42] no
[18:42:47] gah
[18:43:00] not sure why it's not getting picked up
[18:43:01] ta
[18:43:03] uh
[18:43:05] watz? :)
[18:43:13] exactly
[18:46:05] New patchset: MaxSem; "WIP: support for multiple Solr cores" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29827
[18:46:05] New patchset: MaxSem; "Solr replication" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26571
[18:46:25] New review: MaxSem; "Just a rebase." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/26571
[18:47:45] notpeter: now I'm getting these "Could not retrieve catalog from remote server: Error 400 on SERVER: Exported resource Nagios_host[mw1017] cannot override local resource on node spence.wikimedia.org"
[18:48:09] what the fuck?
[18:48:38] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[18:48:43] oh, yeah, I have seen that error before. it's seemed to be transient
[18:48:43] i'm assuming it's unrelated
[18:48:46] I don't fully understand it
[18:48:49] yeah
[18:49:17] we should just unpuppetize nagios
[18:49:59] it's 1000% worse to try to get puppet to do your bidding than it would be to edit the machine over 2400 baud dialup from a 386 running windows 3.0
[18:52:25] lulz
[18:57:51] that's bashworthy
[18:58:03] ah 1017
[18:58:04] nice
[18:58:19] there's like 4-5 hosts it keeps whining about, in random order
[19:00:13] hehehe
[19:00:27] so naginator works really well
[19:00:41] i just hav ebeen putting off fixing neon
[19:00:41] :-/
[19:06:34] New patchset: Lcarr; "admins.pp: annotate the include as disabled" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23789
[19:06:52] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23789
[19:08:00] what about the stuff that gets up around the side of the gerrit queue?
[19:08:07] Eh.. I just got a 404 error on www.wikidata.org for /w/index.php
[19:08:07] hah, LeslieCarr
[19:08:15] with a default 404 DocumentHandler response from Apache.
[19:08:21] it is fixed now, but how is that even possible?
[19:08:37] jeremyb: if you wanna rebase https://gerrit.wikimedia.org/r/#/c/8344/ i'll merge that too
[19:09:40] LeslieCarr: in a bit? working on a bug i just noticed
[19:09:45] no hurries
[19:09:48] danke!
[19:09:55] just irc ping me when it's ready ?
[19:09:56] sure
[19:09:59] kewlio
[19:11:07] and an officewiki favicon on https://www.wikidata.org/favicon.ico
[19:11:43] LeslieCarr, I will ping you too :)
[19:11:47] New patchset: Jgreen; "adjusting banner filters for the fundraisers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30638
[19:11:48] got any time in the next couple of hours for ganglia?
[19:12:22] * LeslieCarr hides
[19:12:22] when the meeting is done, i go eat lunch, then ganglia sure
[19:12:31] notpeter: ah HA. fourth puppetd -tv is finally fetching the config
[19:12:36] and if Jeff_Green will be not underwater, also lock down some firewall stuff on payments eqiad
[19:13:12] LeslieCarr: ha ok
[19:13:32] after lunch, don't wanna do something and then walk away ;)
[19:13:46] * Jeff_Green keeps fingers crossed we all don't end up in Oz
[19:13:52] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30638
[19:14:44] Jeff_Green: woo! kinda...
[19:14:59] spence:/etc/nagios3 is just cruft, right?
[19:15:28] every time I get on spence i get confused about which /etc/nagios* dir is used
[19:16:01] nagios3 is cruft
[19:16:02] spence is insane
[19:16:14] notpeter: your suggestion worked, nagios+puppet are once again not enemies sort of
[19:16:18] !log reedy synchronized php-1.21wmf3/includes/Revision.php
[19:16:31] Logged the message, Master
[19:16:32] LeslieCarr: thx. I'm renaming it /etc/nagios3-unused
[19:16:38] cool
[19:17:07] heh
[19:36:43] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: testwiki and mediawikiwiki to 1.21wmf3
[19:36:56] Logged the message, Master
[19:37:38] New patchset: Jgreen; "undo added system_role in role/fundraising.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30642
[19:38:07] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30642
[19:39:21] !log reedy synchronized live-1.5/
[19:39:35] Logged the message, Master
[19:52:17] mutante: oh for the mysql wikivoyage sizing, make sure the tables are all innodb too
[20:16:43] New patchset: Jgreen; "switch fundraising hosts to fundraising ganglia and nagios clusters, additional banner logging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30646
[20:19:46] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30646
[20:24:16] Jeff_Green: ready to firewall lockdown ?
[20:30:00] ottomata: ready to figure out why ganglia is being bad ?
[20:30:29] yeahhhhhhhh
[20:31:10] last we left it, we had an03 as an aggregator
[20:31:25] and it wasn't getting any ganglia traffic on the mulitcast addy, other than its own
[20:32:31] leslie actually ... i'm sorta stuck on a ganglia problem too, let's hold on the firewall thing until tomorrow?
[20:32:41] ok
[20:33:05] ottomata: so there was something weird … when puppet set it, it didn't set, when we set it manually and restarted, it did listen to all the traffic
[20:33:06] right ?
[20:33:47] right, puppet was weird, so you had me set the $ganglia_aggregator variable before including the role class
[20:33:50] and that seemed to fix the puppet weirdness
[20:34:01] i don't remember when or why it did listen to all traffic
[20:34:18] that fixed the fact that it wasn't setting anything to be an aggregator
[20:34:24] right
[20:34:33] that made it do deaf = no
[20:34:33] properly
[20:34:37] thought - maybe we need the gmond service to be listening via puppet if the config file changes
[20:34:39] lemme look and see
[20:35:01] i think it is
[20:35:15] when it changes the file I see that gmond is refreshed
[20:35:52] damn it is subscribed
[20:35:58] yeah
[20:36:13] hrm ...
[20:36:31] it must be doing something differently ...
[20:38:36] i wonder, lemme try a service ganglia-monitor stop/start -- maybe when it does a reload it doesn't do something important
[20:38:50] that worked
[20:39:42] so it must be that reload doesn't actually do something important
[20:40:04] hahaha
[20:40:09] " reload)
[20:40:10] ;;"
[20:40:18] yeah, it doesn't do anything
[20:40:21] this explains so much
[20:41:20] so i believe "hasrestart => true" should theoretically do this ?
[20:41:34] restart it?
[20:41:42] i think unless hasrefresh => true
[20:41:49] hrm
[20:41:54] it will restart when the subscribe resources changes
[20:42:00] argh. ganglia, nagios, puppet, frack, and fundraising is the ugliest tangle I've dealt with in a long time
[20:42:23] but, LeslieCarr
[20:42:29] do we have this working outside of puppet?
[20:42:50] lets just stop puppet on the machine and make sure it works as an aggregator manually
[20:42:50] yeah, check it out on analytics1003 - i did a manual stop/start and it's receiving all the traffic
[20:42:50] hm
[20:43:34] how do you know it is receiving all traffic?
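The empty `reload)` arm quoted in the exchange above is the whole story: Puppet's subscribe triggered a service refresh, the init script's reload action did nothing, so gmond never re-read its config and `deaf = no` never took effect until a manual stop/start. A minimal sketch of that dispatch (the `service_ctl` helper is made up for illustration; this is not the actual Debian ganglia-monitor init script):

```shell
# Sketch of an init-script dispatch whose "reload" arm is an empty
# case, as quoted in the log. Only stop/start (or restart) ever
# touches the daemon, so config changes sit unread until someone
# restarts it by hand.
service_ctl() {
  case "$1" in
    start)   echo "gmond started" ;;
    stop)    echo "gmond stopped" ;;
    restart) echo "gmond stopped"; echo "gmond started" ;;
    reload)  ;;   # no-op: the daemon is never signalled
  esac
}

service_ctl reload    # prints nothing
service_ctl restart   # actually cycles the daemon
```

This is also why the `hasrestart => true` suggestion in the log points in the right direction: if Puppet issues a full restart on refresh instead of a reload, the broken reload arm is never exercised.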
[20:43:39] http://ganglia.wikimedia.org/latest/?c=Analytics%20cluster%20eqiad&m=load_one&r=hour&s=by%20name&hc=4&mc=2
[20:43:48] first off that, secondly tcpdump
[20:44:05] tcpdump -n -i eth0 host 239.192.1.32 and not host 10.64.21.103
[20:44:41] i only see analytics1003
[20:44:41] 10.64.21.103
[20:44:53] !log mlitn synchronized php-1.21wmf2/extensions/ArticleFeedbackv5
[20:45:08] Logged the message, Master
[20:45:32] gah, it was getting all the traffic a minute ago
[20:45:32] grrrr
[20:45:35] i did a netcat, maybe that busted it
[20:45:40] check that out - http://ganglia.wikimedia.org/latest/?c=Analytics%20cluster%20eqiad&m=load_one&r=hour&s=by%20name&hc=4&mc=2
[20:45:46] maybe it did ?
[20:46:03] restarted gmond
[20:46:15] ahhh yeah
[20:46:15] that is better
[20:46:22] i wonder if my netcats have been messing this up the whole time!
[20:46:39] i was just about to ask if you've been using netcat every time
[20:46:46] use tcpdump to listen in :)
[20:48:21] which one is the other aggregator ? 1010 ?
[20:50:06] yeah that'd be cool
[20:50:15] 1003 and 1010
[20:50:15] it isn't in puppet right now
[20:50:16] i'll fix that
[20:50:21] (I can do that, right? 2 aggregators?)
[20:53:37] New patchset: Ottomata; "site.pp - using both an03 and an10 as ganglia aggregator for analytics cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30652
[20:54:12] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30652
[20:56:00] yes 2 aggregators
[20:59:08] New review: Hashar; "Awesome! Thanks a ton mark :-]" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24797
[21:00:41] erghhhh, LeslieCarr, +2 aggregators == unhappy?
[21:01:02] i see all the traffic still
[21:01:02] but ganglia web gui says they're all down
[21:01:02] cept an03
[21:03:05] grrr
[21:03:11] other ones have two aggregators ..
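The "use tcpdump to listen in" advice above comes down to passive versus active observation: tcpdump captures at the packet level without binding the UDP port, so unlike a netcat listener it cannot contend with gmond for the multicast traffic it is supposed to receive. A tiny sketch that builds the same pcap filter the log uses (`mcast_filter` is a hypothetical helper name, not a real tool):

```shell
# Build the pcap filter from the log: all traffic for the ganglia
# multicast group, minus the local host's own packets.
mcast_filter() {
  group="$1"; self="$2"
  printf 'host %s and not host %s' "$group" "$self"
}

mcast_filter 239.192.1.32 10.64.21.103
# which feeds the actual command shown in the log (requires root):
#   tcpdump -n -i eth0 "$(mcast_filter 239.192.1.32 10.64.21.103)"
```

Seeing only the local host in that capture is exactly the symptom discussed above: the aggregator's own multicast packets arrive, but nothing from the rest of the cluster.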
[21:03:15] y u hate us ganglia [21:05:43] i just restarted gmond on an10 as non-aggregator [21:07:50] ok [21:08:30] yeah now its happier [21:08:31] i unno [21:09:24] ah well, 1 is fine for now [21:10:36] New patchset: Ottomata; "Boo, ganglia doesn't like me with 2 aggregators!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30655 [21:11:34] !log reedy synchronized wmf-config/CommonSettings.php 'Disable lucene on wikidatawiki for now' [21:11:48] Logged the message, Master [21:11:52] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30655 [21:15:26] AHHHHH no [21:15:26] LeslieCarr [21:15:30] something really weird just happens after a while [21:15:41] i didn't change anything this time [21:15:41] and you didn't do any netcat ? [21:15:43] and now they are all down [21:15:43] nope [21:15:51] y u kill it [21:16:40] :-/ [21:18:31] hrm .... [21:19:50] i'm a killah, oh man, i gotta stop looking at my computer though, it is hurricaning outside and I am missing it! [21:20:00] nothing is working today for me! yaaaaaaa [21:20:13] so, hurricane playtime is now I think [21:20:28] if you figure anything out, lemme know, [21:20:29] !log reedy synchronized wmf-config/CommonSettings.php 'Change wikidata search config even more' [21:20:35] binasher: i got this: en wiki about 60GB, de and it about 10GB [21:20:43] ottomata: is this your first? [21:20:43] Logged the message, Master [21:20:48] first, what? [21:20:49] hurricane? [21:20:50] hmmm, no [21:20:53] yes! [21:20:58] i went boogie boarding in a hurricane once [21:21:03] on the james river [21:21:11] the waves were up in the trees [21:21:11] what zone are you?
[21:21:11] really fun [21:21:12] dunno [21:21:16] but I have some friends who live in dumob [21:21:19] dumbo [21:21:25] which is zone A [21:21:26] and they are having a slumber party [21:21:30] miiiight go over there :p [21:21:39] don't worry though, they have canoes [21:21:46] otto's kinda crazy ;) [21:22:13] for the uninitiated: http://project.wnyc.org/news-maps/hurricane-zones/hurricane-zones.html [21:24:54] !log reedy synchronized wmf-config/CommonSettings.php 'Change wikidata search config even more' [21:25:07] Logged the message, Master [21:25:34] !log reedy synchronized wmf-config/CommonSettings.php 'Revert' [21:25:47] Logged the message, Master [21:26:10] csteipp: /win 26 [21:26:17] arr, typo :) [21:28:24] !log reedy synchronized wmf-config/CommonSettings.php [21:28:24] PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:34] PROBLEM - Apache HTTP on mw30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:34] PROBLEM - Apache HTTP on mw34 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:34] PROBLEM - Apache HTTP on mw54 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:34] PROBLEM - Apache HTTP on mw18 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:37] Logged the message, Master [21:28:42] PROBLEM - Apache HTTP on mw45 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:42] PROBLEM - Apache HTTP on mw25 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:51] PROBLEM - Apache HTTP on mw47 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:57] Reedy: ^ ? [21:29:10] idk what the latency is on those alerts...
[21:29:33] and i guess the first one which you reverted is related to the complaints in #-tech [21:29:54] RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time [21:30:03] RECOVERY - Apache HTTP on mw54 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.058 second response time [21:30:03] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.056 second response time [21:30:03] RECOVERY - Apache HTTP on mw18 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.292 second response time [21:30:03] RECOVERY - Apache HTTP on mw30 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.071 second response time [21:30:15] RECOVERY - Apache HTTP on mw45 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.043 second response time [21:30:15] RECOVERY - Apache HTTP on mw25 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time [21:30:21] RECOVERY - Apache HTTP on mw47 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time [21:38:40] Arrrrgh! [21:39:09] Why does wikitech.wikimedia.7val.com/index.php?title=Scap&redirect=no rank higher in Google than wikitech.wikimedia.org/view/Scap [21:39:11] for wikitech scap [21:39:16] !google wikitech scap [21:39:17] http://google.com/?q=wikitech+scap [21:39:27] !google del [21:39:28] Successfully removed google [21:39:44] !google is https://www.google.com/search?q=$url_encoded_* [21:39:44] i'm guessing that 7val.com has some SEO done to it ? [21:39:44] Key was added [21:39:59] !google wikitech scap [21:39:59] https://www.google.com/search?q=wikitech+scap [21:40:09] slow bot [21:40:14] yep [21:40:47] !wikitech scap [21:40:47] do we even provide dumps for wikitech? [21:40:48] http://wikitech.wikimedia.org/view/scap [21:41:22] jeremyb: it isn't hosted in the Wikimedia cluster, so no. [21:41:31] (which is intentional and important) [21:41:41] Krinkle: well that much i knew. 
it's linode [21:41:52] k [21:41:52] Krinkle: doesn't preclude providing dumps [21:42:00] yep, 7val is SEO'ed [21:42:01] woo [21:42:06] sure, it just means more work. [21:42:28] and last time somebody spent time on wikitech was when it upgraded to 1.17wmf1 [21:42:38] so odd that they'd put effort into SEO and then get ugly URLs (not pretty!) [21:42:40] I guess that says it all [21:42:52] Krinkle: nah, it's been upgraded since [21:43:16] i just figured it would be 1 less option for how to mirror if there's not a dump [21:43:16] jeremyb: really? looks like 1.17 to me. [21:43:37] Krinkle: i was pretty damned sure [21:43:48] it claims 1.17wmf1 on [[special:version]] [21:44:06] mutante should know, i think he's the one that did it. ;) [21:44:13] the db schema has been upgraded, the files have been rolled back [21:44:44] yes, we will have a newer one, but by installing current mw from scratch and importing data [21:44:57] (on another instance, on precise) [21:51:16] aha [21:51:21] Krinkle: ^ [21:51:30] ok [21:55:37] hashar: Can you explain what you intend to do with Zuul? [21:55:42] re "Will magically fix itself when we switch to Zuul." [21:55:49] Krinkle: replace Gerrit Trigger plugin [21:56:20] and overall a system which lets us fine-tune how the jobs are triggered [21:56:54] how is that related to https://bugzilla.wikimedia.org/show_bug.cgi?id=39742 ?
[21:57:16] ohh [21:57:23] yeah should have elaborated a bit more I guess [21:57:34] zuul crafts a reference with the latest master + the patch [21:57:44] so you just have to fetch that instead of doing a clone [21:57:49] or a git archive [21:57:53] or git clone -l something [21:58:01] zuul basically takes care of it for us [21:59:28] heh, anyone who thinks this is too early?:) [21:59:34] #3801: Kill prototype.wikimedia.org [22:00:03] mutante: there was a discussion about that one less than a week ago [22:00:09] mutante: something about aftv5 [22:00:12] iirc [22:00:24] mutante: talk about prototype with chrismcmahon [22:00:31] I think he got the most up to date information regarding the prototype box [22:00:38] iirc it is still being used [22:01:07] thanks guys! [22:01:21] copy/pastes your comments:) [22:01:52] mutante hashar afaik, the last project to use prototype was AFTv5, and right now AFTv5 is supported on beta labs. everything on prototype is out of date right now, and ops has wanted to get rid of it for some time now [22:02:23] we got a ticket for it from Andre, because he is cleaning up Bugzilla [22:02:32] chrismcmahon: would you mind sending a mail to engineering list to make it clear ? [22:02:35] but it was just "Assuming that Labs has superseded this" [22:02:42] chrismcmahon: I am sure that Ryan will be more than happy to shut it down :-] [22:05:23] mutante: Ryan Lane said something last week about taking down prototype, he's probably the final voice [22:07:18] chrismcmahon: ok, multiple people want to get rid of it :) [22:07:36] pretty sure it was just about making sure it is not being used [22:08:59] did something change with the imagescalers in the last hour? [22:10:38] mutante: you could just (virtually) unplug its ethernet and see who complains...
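The Zuul mechanism hashar describes above — a job fetching a prepared ref (latest master plus the patch) instead of cloning fresh — looks roughly like this in a job script. The URL, project, and ref below are placeholder values; in a real job Zuul exports the actual ones to the environment, so treat this as a sketch of the flow rather than the production job:

```shell
# Placeholder values standing in for what Zuul would export to the job.
ZUUL_URL="https://gerrit.wikimedia.org/r/p"
ZUUL_PROJECT="mediawiki/core"
ZUUL_REF="refs/zuul/master/Zabc123"
# The job fetches the ref Zuul crafted into an existing checkout and
# checks it out, avoiding a full clone (or git archive) per build.
FETCH_CMD="git fetch $ZUUL_URL/$ZUUL_PROJECT $ZUUL_REF && git checkout FETCH_HEAD"
echo "$FETCH_CMD"
```

The script only prints the git invocation, since the ref exists solely on the Zuul merger that prepared it.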
[22:10:41] ;-) [22:13:13] chrismcmahon: checked, anomie has both sudo and netadmin rights on beta :-] all fine I guess [22:13:17] will cover the rest tomorrow [22:14:25] http://commonsprototype.tesla.usability.wikimedia.org/wiki/Main_Page [22:15:09] http://commonsprototype.tesla.usability.wikimedia.org/wiki/Special:RecentChanges :/ [22:15:14] user account creation spam [22:15:24] at least it is used by chinese bots! [22:15:24] is this also on the list to be killed? [22:15:28] hehee [22:15:38] probably need to be killed too [22:16:16] http://prototype.tesla.usability.wikimedia.org/ [22:16:32] lets get rid of all this:) [22:17:56] good night hashar [22:17:56] ;-)))))))) [22:17:56] thx! [22:17:56] cya [22:17:58] cya [22:19:21] ah, tesla [22:19:35] yeah, they should all die [22:23:21] mutante: commonsprototype is no longer used [22:23:37] Yeah, all of tesla is dead afaik [22:24:06] (those facing as *.tesla that is) [22:24:07] a few on prototype.wikimedia.org are still used by E2/E3 I believe [22:24:11] MoodBar/AFT maybe [22:26:16] thanks [22:28:25] mutante, have you already emailed fabrice? if not I can shoot him a quick note [22:28:45] I think they're using ee-prototype.wmflabs.org for most everything by now. [22:36:29] Eloquence: i havent mailed yet [22:51:17] mutante: Do you have a list and/or plan to make a list of stuff running on prototype.wikimedia.org? [22:52:01] mutante: I have an outdated list from a few months ago that i could turn into a wikitech-l/engineering-l mailing. And nuke whatever doesn't get comments in 2-3 weeks. 
[22:52:09] And address whatever does get reploes [22:52:15] replies [23:14:50] New patchset: Pyoungmeister; "using proper syntax for searchidx rsync conf" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30726 [23:19:06] New patchset: Pyoungmeister; "using proper syntax for searchidx rsync conf" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30726 [23:19:22] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30726 [23:19:36] ooo yay, cables breaking [23:19:50] where? [23:20:19] ac2 as you linked? [23:20:21] yep [23:20:32] that's pretty crazy [23:20:32] well the cable may be fine, it may be the landing [23:20:34] probably the landing [23:20:39] heh [23:20:40] yes [23:20:47] if the equipment loses power/goes underwater, it's the same effect in the short term :) [23:21:34] truuuuuueeeee [23:21:57] bit.ly/wikitechLatest [23:22:02] mutante: http://lists.wikimedia.org/pipermail/wikitech-l/2012-October/064128.html [23:32:11] haha - best quote of the day -- "Hurricane Electric suffering due to lack of Electric due to Hurricane?" [23:32:29] lulz [23:32:34] haha [23:55:06] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours [23:55:06] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [23:55:06] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours