[00:11:32] !log demon synchronized php-1.23wmf10/extensions/CirrusSearch 'Cirrus to master, with better job queues'
[00:11:39] Logged the message, Master
[00:15:23] !log demon synchronized php-1.23wmf10/extensions/CirrusSearch 'Rolling back Cirrus -- fatals'
[00:15:30] Logged the message, Master
[00:18:20] !log demon synchronized php-1.23wmf10/extensions/Elastica 'Elastica to master'
[00:18:27] Logged the message, Master
[00:21:20] !log demon synchronized php-1.23wmf10/extensions/CirrusSearch 'Cirrus to master, with better job queues'
[00:22:42] !log demon synchronized php-1.23wmf11/extensions/CirrusSearch 'Cirrus to master, with better job queues'
[00:22:49] Logged the message, Master
[00:25:59] ^d: all good?
[00:26:07] <^d> Yep, all dandy now.
[00:26:19] <^d> I had a file not appear, and I forgot to update Elastica for wmf10.
[00:26:27] * greg-g nods
[00:27:44] <^d> cirrusSearchLinksUpdate: 1492161 queued; 41 claimed (25 active, 16 abandoned); 0 delayed
[00:27:44] <^d> cirrusSearchLinksUpdatePrioritized: 0 queued; 802 claimed (802 active, 0 abandoned); 0 delayed
[00:27:44] <^d> cirrusSearchLinksUpdateSecondary: 2 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
[00:27:46] <^d> cirrusSearchUpdatePages: 9 queued; 1572 claimed (1553 active, 19 abandoned); 0 delayed
[00:27:56] <^d> cirrusSearchLinksUpdate is slowly going down, for the first time.
[00:31:58] cool
[00:43:30] (CR) Faidon Liambotis: "Bryan, you said this isn't needed anymore, right?"
[operations/puppet] - https://gerrit.wikimedia.org/r/107609 (owner: BryanDavis)
[00:49:31] mark: can you look at this and lmk if okay...https://rt.wikimedia.org/Ticket/Display.html?id=6682
[00:53:27] (Abandoned) BryanDavis: kibana: Block access to /status via Varnish [operations/puppet] - https://gerrit.wikimedia.org/r/107609 (owner: BryanDavis)
[00:55:39] !log deployed config for jsduck jobs for MultimediaViewer
[00:55:47] Logged the message, Master
[01:08:28] (PS1) Cmjohnson: removing loudon and erzurumi from puppet [operations/puppet] - https://gerrit.wikimedia.org/r/108872
[01:10:49] (CR) Cmjohnson: [C: 2] removing loudon and erzurumi from puppet [operations/puppet] - https://gerrit.wikimedia.org/r/108872 (owner: Cmjohnson)
[01:33:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[01:54:48] (PS2) Chad: Lower account-related caches from infinity to 1 week [operations/puppet] - https://gerrit.wikimedia.org/r/108715
[02:12:29] (PS1) Tim Landscheidt: WIP: Add test suite [operations/apache-config] - https://gerrit.wikimedia.org/r/108880
[02:25:20] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:28:37] !log LocalisationUpdate completed (1.23wmf10) at 2014-01-22 02:28:36+00:00
[02:28:48] RobH: Still around? ^ concerns me, labs puppet updates seem to be hanging as well.
[02:29:09] (PS1) Aude: Maintain extension-list-wikidata with Wikidata build (beta only now) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108882
[02:29:22] I'm not sure wikitech.wikimedia.org is accessible.
[02:29:36] Gloria: Yep, same problem that I'm pinging RobH about
[02:29:37] I think...
[02:29:56] Database error after it finally loaded.
[02:30:54] Although I don't know why a broken cert would...
[02:31:14] hey, it wouldnt
[02:31:18] and i never touched virt0
[02:31:22] just virt1000
[02:31:23] Gloria, better now?
[02:31:39] the cert is ldap auth stuff and i never changed it, merely have patch waiting for tomorrow
[02:31:57] RobH, I see a change to files/ssl/virt0.wikimedia.org.key on palladium
[02:32:10] But, ok, I just restarted Apache on virt0 and maybe it's happy now...
[02:32:12] yes
[02:32:15] i added the key
[02:32:17] https://wikitech.wikimedia.org/wiki/SAL isn't loading for me.
[02:32:19] its not actually called yet though
[02:32:24] Ah, I see.
[02:32:29] Ok, so unrelated, sorry for the ping RobH
[02:32:32] np
[02:32:50] better to check after all =]
[02:34:15] (PS2) Aude: Maintain extension-list-wikidata with Wikidata build (beta only now) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108882
[02:34:36] Gloria, is it the case that wikitech works for you when logged out but not when logged in?
[02:34:55] Ummm...
[02:35:03] Yeah, logged out seems fine.
[02:35:08] I'm logged in.
[02:35:10] Or trying to be!
[02:35:21] login seems to hang for me
[02:35:22] Reedy: if you are around or when you get a chance, can you please review https://gerrit.wikimedia.org/r/#/c/108882/ ?
[02:36:35] (CR) Tim Landscheidt: [C: -1] "Okay, I give up for the next. http://stackoverflow.com/questions/21273104/how-to-set-up-an-apache-httpd-test-instance" [operations/apache-config] - https://gerrit.wikimedia.org/r/108880 (owner: Tim Landscheidt)
[02:38:40] PROBLEM - HTTP on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:39:24] Hm, could use help troubleshooting this if any ops are still up
[02:39:25] <^d> andrewbogott: I'm already logged in. I'm only getting pages that are locally cached. Everything else is timing out.
[02:39:43] Can't even spell night correctly. FWIW, wikitech doesn't work for me either logged out or in.
[02:39:50] ^d: Yeah, I can't tell what's going on...
[02:40:09] <^d> Good first place would be to check apache error logs maybe?
[02:40:37] <^d> Oh, I see the note earlier about puppet. Might be more than just wikitech?
[02:41:04] palladium = virt0?
[02:41:13] no, palladium is virt0's puppet master
[02:42:10] Anything in the logs?
[02:42:13] apache log says 'server reached MaxClients setting, consider raising the MaxClients setting'
[02:42:27] So could be a dos I guess? Not sure how to diagnose that.
[02:42:41] Didn't we have that same scenario some days ago?
[02:42:51] similar, yes.
[02:42:53] * andrewbogott looks at ganglia
[02:43:05] Hotlinking an image on wikitech was the cause then IIRC?
[02:43:49] Hm, very spiky graph here
[02:44:22] <^d> what's access.log look like
[02:44:22] Well, there were too things. We recently had a plain old dos on wikitech...
[02:44:23] <^d> ?
[02:44:41] We also recently had a thing where italian wp hit tool-labs on every page load. That didn't break wikitech though
[02:45:05] ^d, can you see https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Virtualization%2520cluster%2520pmtpa&tab=m&vn=&hide-hf=false
[02:45:14] access.log is quiet and boring
[02:45:33] oh, silent actually
[02:45:53] <^d> Weird drop off.
[02:46:08] <^d> Right before that big spike.
[02:46:11] Crazy spikes in CPU usage, followed by a drop-off in network access.
[02:46:29] I guess the spike could be me restarting apache?
[02:47:31] <^d> I don't *think* so
[02:47:32] 152 apache threads currently. That seems like a lot considering the logs are silent :/
[02:47:59] RobH: any chance pmtpa is suffering another network failure?
[02:48:48] <^d> I just ssh'd from eqiad to tampa just fine.
[02:48:56] Hm, I can get page loads from other tampa boxes
[02:50:01] scfc_de: I don't remember the hotlink-an-image thing but that doesn't mean it didn't happen. Do you remember who it was that linked?
[02:50:28] ^d: I'm going to restart apache again, just to see what the log says.
[02:51:14] andrewbogott: GeoHack linked the logo I believe.
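Editor's sketch: one way to test the "could be a dos" theory raised above is to tally requests per user agent in the Apache access log. This is a minimal sketch only, assuming the log is in Apache's Combined Log Format (user agent as the last quoted field). The function name `top_user_agents` and the sample lines are hypothetical and are not taken from virt0.

```python
import re
from collections import Counter

# Assumption: Combined Log Format, where each line ends with
#   "referer" "user-agent"
UA_AT_END = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def top_user_agents(lines, n=5):
    """Return the n most frequent user-agent strings with their counts."""
    counts = Counter()
    for line in lines:
        match = UA_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


# Hypothetical sample lines (documentation IP range), for illustration only.
SAMPLE = [
    '192.0.2.1 - - [22/Jan/2014:02:40:00 +0000] "GET /wiki/A HTTP/1.1" 200 512 "-" "bingbot/2.0"',
    '192.0.2.1 - - [22/Jan/2014:02:40:01 +0000] "GET /wiki/B HTTP/1.1" 200 512 "-" "bingbot/2.0"',
    '192.0.2.7 - - [22/Jan/2014:02:40:02 +0000] "GET /wiki/C HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [22/Jan/2014:02:40:03 +0000] "GET /wiki/D HTTP/1.1" 200 512 "-" "bingbot/2.0"',
]

if __name__ == "__main__":
    for agent, hits in top_user_agents(SAMPLE):
        print(hits, agent)
```

On a real host the lines would come from the access log file rather than `SAMPLE`; a single agent dominating the tally would point at a crawler rather than a distributed attack.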
[02:51:30] RECOVERY - HTTP on virt0 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 0.073 second response time
[02:51:55] andrewbogott: http://permalink.gmane.org/gmane.org.wikimedia.labs/2001
[02:52:10] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.155 second response time
[02:53:19] wikitech loads for me again (logged in).
[02:53:54] Hm, looks like web crawlers, maybe we lost our robots.txt
[02:55:01] ^d, on a normal MW install where would I keep my robots.txt?
[02:55:22] <^d> Wherever the web root is.
[02:55:51] !log LocalisationUpdate completed (1.23wmf11) at 2014-01-22 02:55:50+00:00
[02:55:57] <^d> It should be served from foo.bar.org/robots.txt
[02:56:27] sorry, was afk making dinner
[02:56:29] ^d, yeah, 'wherever the web root is' is my question :) All this URL-munging...
[02:56:32] just saw pings
[02:56:32] But, I'll sort it out.
[02:56:41] so what happened?
[02:56:55] webcrawler dos?
[02:56:55] RobH, wikitech outage… I suspect we were just getting crawled and couldn't handle the load.
[02:57:02] Yeah, that's my guess.
[02:57:03] eww
[02:57:08] Lots of 'bingbot' in the log.
[03:01:19] <^d> andrewbogott: I've had to block bingbot from gerrit too.
[03:01:23] <^d> Crawls way too much.
[03:01:57] Hm, can I use url wildcards like Disallow: /wiki/Special:*
[03:01:59] ?
[03:02:45] <^d> Yeah
[03:02:49] <^d> Should be able to.
[03:03:34] ^d, this look sensible? https://wikitech.wikimedia.org/robots.txt
[03:03:37] <^d> andrewbogott: Here's the bots I've had issue with in gerrit/gitweb and had to block before: http://p.defau.lt/?Va_822DMrvu6JvvktWO7fg
[03:04:38] <^d> I'd disallow all of /w/ too
[03:04:55] <^d> Take a look at en.wikipedia.org/robots.txt for comparison for what we deal with in prod.
[03:04:56] Seems like we want people to google search our docs, though?
[03:05:14] <^d> /w/ isn't /wiki/ :)
[03:05:25] ah!
[03:05:37] <^d> Anyway, take a look at that pastebin + what we do on enwiki.
Should be enough to get you going
[03:05:40] * ^d wanders to find food
[03:05:42] Hm, any reason for me not to just c/p the en robots.txt?
[03:06:09] <^d> Prolly overkill, but couldn't hurt :p
[03:06:24] <^d> Can always just c/p then trim it down
[03:08:10] wget wget https://en.wikipedia.org/robots.txt ftw!
[03:08:36] Gloria, scfc_de, wikitech access OK now?
[03:08:40] well, as long as it gets put there normally
[03:08:48] ie: when we move wikitech
[03:08:59] our docroot stuff is not puppet... nm.
[03:09:01] ignore meeeee
[03:09:02] andrewbogott: Yes, better now. Thanks.
[03:09:05] andrewbogott: Yes.
[03:10:07] <^d> RobH: Ideally we'd setup wikitech to use the dynamic robots.txt php script we use in prod, so it doesn't have to be maintained ever.
[03:10:11] <^d> :)
[03:10:22] yes
[03:10:39] someone put in rt ticket that isnt in middle of eating stir fry they made!
[03:10:45] <^d> I shall put it in.
[03:10:50] <^d> But then I really am going to find food
[03:10:53] just so its not forgotten
[03:13:26] <^d> rt 6689
[03:16:53] thx ^d
[03:19:48] indeed, thx
[03:33:21] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-01-22 03:33:21+00:00
[03:33:29] Logged the message, Master
[03:33:57] <^demon|away> andrewbogott, RobH: yw
[03:34:00] * ^demon|away noms
[04:02:29] (CR) Reedy: [C: 1] Maintain extension-list-wikidata with Wikidata build (beta only now) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108882 (owner: Aude)
[04:34:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[04:35:40] PROBLEM - RAID on db1057 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[04:58:57] (CR) Ori.livneh: [C: 2] webperf: add dispatch_stat helper to navtiming.py [operations/puppet] - https://gerrit.wikimedia.org/r/108825 (owner: Ori.livneh)
[05:02:34] (PS1) Ori.livneh: txstatsd: specify tungsten as graphite host (rather than 'localhost') [operations/puppet] -
https://gerrit.wikimedia.org/r/108889
[05:02:55] (CR) Ori.livneh: [C: 2 V: 2] txstatsd: specify tungsten as graphite host (rather than 'localhost') [operations/puppet] - https://gerrit.wikimedia.org/r/108889 (owner: Ori.livneh)
[05:19:16] (PS1) Ori.livneh: webperf: fix-up for navtiming.py bug [operations/puppet] - https://gerrit.wikimedia.org/r/108891
[05:19:30] (CR) Ori.livneh: [C: 2 V: 2] webperf: fix-up for navtiming.py bug [operations/puppet] - https://gerrit.wikimedia.org/r/108891 (owner: Ori.livneh)
[05:37:24] !log restarting hafnium for kernel upgrade
[05:37:31] Logged the message, Master
[05:39:40] PROBLEM - Host hafnium is DOWN: PING CRITICAL - Packet loss = 100%
[05:41:00] RECOVERY - Host hafnium is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[05:55:30] PROBLEM - NTP on hafnium is CRITICAL: NTP CRITICAL: Offset unknown
[06:00:30] RECOVERY - NTP on hafnium is OK: NTP OK: Offset -0.002031087875 secs
[06:18:12] (PS1) BryanDavis: logstash: Replace deprecated checksum filter [operations/puppet] - https://gerrit.wikimedia.org/r/108897
[06:22:15] (PS1) BryanDavis: logstash: Update fatal grok pattern [operations/puppet] - https://gerrit.wikimedia.org/r/108898
[06:24:16] (PS1) Reedy: loginwiki and votewiki are labelled as 'wiki' [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108899
[06:25:04] (CR) Reedy: [C: 2] loginwiki and votewiki are labelled as 'wiki' [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108899 (owner: Reedy)
[06:25:10] (Merged) jenkins-bot: loginwiki and votewiki are labelled as 'wiki' [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108899 (owner: Reedy)
[06:31:25] !log reedy synchronized wmf-config/InitialiseSettings.php
[06:31:32] Logged the message, Master
[07:24:22] (PS4) Reedy: Update CentralAuth RC2UDP config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92463
[07:35:50] PROBLEM - Puppet freshness on
hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[07:50:04] (PS5) Reedy: Update CentralAuth RC2UDP config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/92463
[07:55:47] (CR) Andrew Bogott: "Yeah, I'm convinced -- when I wrote this patch I thought that the bot was spamming #wikimedia-labs but it turned out to be a semi-legitima" [operations/puppet] - https://gerrit.wikimedia.org/r/108654 (owner: Andrew Bogott)
[07:56:01] (Abandoned) Andrew Bogott: Point the managehome chatbot to #wikimedia-labs-icinga [operations/puppet] - https://gerrit.wikimedia.org/r/108654 (owner: Andrew Bogott)
[08:52:50] PROBLEM - MySQL Slave Delay on db69 is CRITICAL: CRIT replication delay 334 seconds
[08:52:50] PROBLEM - MySQL Replication Heartbeat on db69 is CRITICAL: CRIT replication delay 339 seconds
[08:59:50] RECOVERY - MySQL Slave Delay on db69 is OK: OK replication delay 0 seconds
[08:59:50] RECOVERY - MySQL Replication Heartbeat on db69 is OK: OK replication delay -1 seconds
[09:20:55] (PS2) Andrew Bogott: When complaining about mount failures, include hostname! [operations/puppet] - https://gerrit.wikimedia.org/r/108655
[09:22:28] (CR) Andrew Bogott: [C: 2] When complaining about mount failures, include hostname!
[operations/puppet] - https://gerrit.wikimedia.org/r/108655 (owner: Andrew Bogott)
[09:29:10] RECOVERY - Disk space on mw1206 is OK: DISK OK
[09:29:20] RECOVERY - Host mw1206 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[09:29:30] RECOVERY - puppet disabled on mw1206 is OK: OK
[09:29:30] RECOVERY - SSH on mw1206 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[09:29:40] RECOVERY - twemproxy process on mw1206 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[09:29:40] RECOVERY - RAID on mw1206 is OK: OK: no RAID installed
[09:30:00] RECOVERY - DPKG on mw1206 is OK: All packages OK
[09:32:00] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.103 second response time
[10:02:32] !log restarted mw1206 a little while ago, it was inaccessible even via mgmt
[10:02:40] Logged the message, Master
[10:06:00] PROBLEM - Disk space on elastic1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 10790 MB (3% inode=99%):
[10:07:59] (PS12) Physikerwelt: Add Mathoid module (TeX -> MathML / SVG conversion web service) [operations/puppet] - https://gerrit.wikimedia.org/r/90733
[10:36:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[11:07:41] PROBLEM - SSH on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:09:10] PROBLEM - HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:11:10] ^ apergos, bugzilla died
[11:11:21] oh oh
[11:11:23] sec
[11:15:16] watching it boot now
[11:15:46] Meh. Want Bugzilla back.
[11:15:49] Thanks :)
[11:16:40] I bet you do...
hopefully this kick will do it (fsck now)
[11:17:00] RECOVERY - HTTP on kaulen is OK: HTTP OK: HTTP/1.1 302 Found - 489 bytes in 0.080 second response time
[11:17:30] RECOVERY - SSH on kaulen is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[11:17:58] !log powercycled kaulen, it had died the swap death
[11:18:05] Logged the message, Master
[11:18:27] wonder what job set it off
[11:18:32] http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20pmtpa&h=kaulen.wikimedia.org&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[11:33:53] apergos: ugh, wow, just came back
[11:34:06] thanks
[11:45:27] (CR) JanZerebecki: [C: 1] Set up redirects for toolserver.org [operations/apache-config] - https://gerrit.wikimedia.org/r/108465 (owner: Tim Landscheidt)
[11:49:19] (CR) Matanya: [C: 1] Lower account-related caches from infinity to 1 week [operations/puppet] - https://gerrit.wikimedia.org/r/108715 (owner: Chad)
[11:52:38] (PS1) Ori.livneh: route navtiming stats directly to statsd instance on tungsten [operations/puppet] - https://gerrit.wikimedia.org/r/108904
[11:52:48] (PS2) Ori.livneh: route navtiming stats directly to statsd instance on tungsten [operations/puppet] - https://gerrit.wikimedia.org/r/108904
[11:53:04] (CR) Ori.livneh: [C: 2 V: 2] route navtiming stats directly to statsd instance on tungsten [operations/puppet] - https://gerrit.wikimedia.org/r/108904 (owner: Ori.livneh)
[12:42:10] (PS1) Dzahn: (DO NOT MERGE) switch Bugzilla to zirconium [operations/dns] - https://gerrit.wikimedia.org/r/108906
[12:45:58] (PS2) Dzahn: (DO NOT MERGE) switch Bugzilla to zirconium [operations/dns] - https://gerrit.wikimedia.org/r/108906
[12:48:53] (PS3) Dzahn: (DO NOT MERGE) switch Bugzilla to zirconium [operations/dns] - https://gerrit.wikimedia.org/r/108906
[12:51:24] (CR) Alexandros Kosiaris: [C: 2] Lower account-related caches from infinity to 1 week [operations/puppet] -
https://gerrit.wikimedia.org/r/108715 (owner: Chad)
[12:51:29] (CR) Dzahn: [C: -2] "-2 per commit message. on a related note: sigh, 3 PS because of actual tabs and in DNS i never know how to do them right, one version will" [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[12:54:40] (PS1) Odder: Add two new user groups for Chinese Wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108907
[12:59:49] (CR) Dzahn: "-2 per commit message. on a related note: sigh, 3 PS because of actual tabs and in DNS i never know how to do them right, one version will" (1 comment) [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[13:22:13] (CR) Dzahn: "freenode/dns says we should be using A records for all those service names per http://www.rfc-editor.org/rfc/rfc1912.txt anyways?!, see in" (1 comment) [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[13:25:41] (CR) Dzahn: "/join #dns ; !tell no_cname" [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[13:37:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[14:19:31] (PS1) Ottomata: Adding LICENSE [operations/puppet/jmxtrans] - https://gerrit.wikimedia.org/r/108911
[14:20:31] (CR) Ottomata: [C: 2 V: 2] Adding LICENSE [operations/puppet/jmxtrans] - https://gerrit.wikimedia.org/r/108911 (owner: Ottomata)
[14:33:48] !log csteipp synchronized php-1.23wmf11/includes/media 'bug60339'
[14:33:55] Logged the message, Master
[14:34:15] !log csteipp synchronized php-1.23wmf10/includes/media 'bug60339'
[14:34:23] Logged the message, Master
[14:34:26] (PS1) Matanya: bugzilla: remove hardcoded db host [operations/puppet] - https://gerrit.wikimedia.org/r/108917
[14:40:05] (CR) Ottomata: [C: 2] "Should I merge?"
[operations/puppet] - https://gerrit.wikimedia.org/r/108852 (owner: Manybubbles)
[14:40:58] (CR) Dzahn: [C: -1] "oh, yes yes, that was on some todo (no db9), thanks! very good. BUT: you're also changing the entire connection method in one go to MySQLi" [operations/puppet] - https://gerrit.wikimedia.org/r/108917 (owner: Matanya)
[14:41:52] (CR) Matanya: "ok, i'll separate it into two patches." [operations/puppet] - https://gerrit.wikimedia.org/r/108917 (owner: Matanya)
[14:41:55] (CR) coren: [C: 1] "Soft +1; it'd have been better if the canonical names were all FQDN or all hostnames, but doing it in this patch would just confuse clarit" (1 comment) [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[14:44:41] (CR) Dzahn: (DO NOT MERGE) switch Bugzilla to zirconium (1 comment) [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[15:03:00] RECOVERY - Disk space on elastic1008 is OK: DISK OK
[15:07:06] (PS2) Matanya: bugzilla: remove hardcoded db host [operations/puppet] - https://gerrit.wikimedia.org/r/108917
[15:12:35] mutante: what is the varnish misc backend?
[15:13:08] ytterbium?
[15:15:09] there is no such thing
[15:15:09] what are you looking for?
[15:16:42] i guess something refers to being "behind misc varnish"
[15:16:56] paravoid: manifests/role/cache.pp has a FIXME in regarding antimony
[15:17:09] so the backend would be which ever webserver is behind it
[15:18:10] in this case i'm not sure i understand the fixme
[15:19:09] 'default_backend' => 'antimony', # FIXME
[15:19:16] you mean that?
[15:19:23] ehm..
[15:19:42] yes, that
[15:19:44] the FIXME part is that there should not be a default backend
[15:20:14] what should be instead? round rubin?
[15:24:12] (PS3) Ottomata: Puppetizing wikimetrics [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/108643
[15:25:08] (PS4) Ottomata: Puppetizing wikimetrics [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/108643
[15:25:59] oh paravoid, was just looking at my gerrit
[15:26:05] you need to undo your −2 on this
[15:26:05] https://gerrit.wikimedia.org/r/#/c/107723/
[15:26:14] in order for it to actually merge
[15:26:52] matanya: that "backend_options =>" hash looks promising, lines 1193ff, but i don't know exactly how it will balance
[15:27:10] yeah, was glacing at it
[15:27:26] looks risky to break it
[15:28:02] i think i should stop breaking paravoid's caching systems :)
[15:33:22] (CR) Ottomata: [C: 2 V: 2] Puppetizing wikimetrics [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/108643 (owner: Ottomata)
[15:45:42] ottomata: just submit a new patch (like change slightly the commit message) of https://gerrit.wikimedia.org/r/#/c/107723/
[15:46:13] that is the one that had both a -2 and a +2 eh ?
[15:46:23] yeah ha
[15:46:27] right right
[15:49:56] (PS3) Matanya: bugzilla: remove hardcoded db host [operations/puppet] - https://gerrit.wikimedia.org/r/108917
[15:53:39] (CR) coren: (DO NOT MERGE) switch Bugzilla to zirconium (1 comment) [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[16:12:02] (CR) Dzahn: [C: 2] "thanks! removing db9 was definitely a FIXME on a todo list.
and it just influences that reporter script not bugzilla itself, and also just" [operations/puppet] - https://gerrit.wikimedia.org/r/108917 (owner: Matanya)
[16:38:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[16:54:53] (CR) Legoktm: [C: -1] "You also need to add "cewiki" to flaggedrevs.dblist" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108302 (owner: Gerrit Patch Uploader)
[16:59:10] !log rolling certificate changes to virt1000 and virt0. virt0 puppet agent disabled for moment
[16:59:18] Logged the message, RobH
[17:05:00] !log issues with change, don't re-enable puppet agent on virt0 yet
[17:05:07] Logged the message, RobH
[17:13:18] So what do I put in my commit message to link a change to another change?
[17:13:28] ie: change A did 90% but had typo, change B fixes typo
[17:15:04] You can reference either the Git SHA1 or the Gerrit Change-Id.
[17:15:38] (Both are searchable in the repo, the short sequential Gerrit ID (1xxxxx) is not.)
[17:15:55] well, i can put change id just fine
[17:16:02] but what is syntax for that in commit message
[17:16:08] so it doesnt think its my changeid
[17:16:15] (if someone has example that is fine =)
[17:16:29] I'm trying to make my commit messages more useful =P
[17:16:49] ^d: you know yer my go to person for this stuff right? so pPPPPPIIIINNNGGGG
[17:17:22] <^d> huh?
[17:17:41] What is the syntax for gerrit to have my new patchset reference an older one?
[17:17:46] in its commit message i suppose
[17:17:54] (sorry if stupid question ;)
[17:17:55] ^d: syntax for linking one gerrit to another, the preferred way
[17:18:06] the one that actually makes it a link
[17:18:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[17:18:10] but isn;t full URL
[17:18:17] <^d> I[a-z0-9]{8,40}
[17:18:20] Everyone hates my commit messages, I'm making an actual attempt to improve them.
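Editor's sketch: the pattern ^d quotes above can be tried out locally to see which tokens in a commit message would be picked up as references to other changes. This is a minimal sketch under stated assumptions, not Gerrit's actual link configuration. The function name `referenced_change_ids` is hypothetical, and the pattern is narrowed from `[a-z0-9]` to hex digits (an assumption, on the basis that Change-Ids are hex strings) because the pattern as quoted would also match ordinary words such as "Integration". The commit's own `Change-Id:` footer line is skipped so that only references to other changes are returned.

```python
import re

# Assumption: a Change-Id is "I" followed by hex digits. The chat quotes
# I[a-z0-9]{8,40}; hex-only is used here to avoid matching plain words.
CHANGE_ID = re.compile(r"\bI[0-9a-f]{8,40}\b")


def referenced_change_ids(message):
    """Return Change-Id-like tokens in a commit message body,
    ignoring the commit's own 'Change-Id:' footer line."""
    refs = []
    for line in message.splitlines():
        if line.startswith("Change-Id:"):
            continue  # this is the commit's own ID, not a reference
        refs.extend(CHANGE_ID.findall(line))
    return refs


# Hypothetical commit message; the footer ID is made up for illustration.
EXAMPLE = """fixing change for virt0/1000 certificate install

Follow-up to Ie26b1b6549ca02edb06f6bc6575508c7e7cbb712, which had a typo.
Integration hosts are unaffected.

Change-Id: I0123456789abcdef0123456789abcdef01234567
"""

if __name__ == "__main__":
    print(referenced_change_ids(EXAMPLE))
```

Skipping the footer line mirrors the concern raised in the channel about a pasted ID being taken for the commit's own Change-Id.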
[17:18:22] that:)
[17:18:28] what d said
[17:18:38] so I[largechangeid]?
[17:18:49] <^d> Well I is the first letter of the changeid.
[17:18:50] <^d> Always.
[17:18:59] well Ie26b1b6549ca02edb06f6bc6575508c7e7cbb712
[17:19:04] <^d> Right :)
[17:19:11] so just paste that anyplace in my commit message?
[17:19:16] <^d> Yep
[17:19:18] <^d> It'll auto link
[17:19:22] it wont take it as my normal changeid, as long as i dont preface it then
[17:19:24] ?
[17:19:26] cool.
[17:19:28] <^d> "bug 1234" and "rt 1345" also auto link
[17:19:31] <^d> :)
[17:19:36] ack!:)
[17:19:41] they are great
[17:19:42] <^d> Tons of auto-link magic.
[17:19:57] i kneow the bug and rt i use the hell out of them now
[17:20:00] but not the changeid stuff
[17:20:01] and then you even have automatically updated BZ ticket
[17:20:09] cool, thank you ^d and scfc_de
[17:20:10] _just_ because you uploaded something to gerrit that mentioned it
[17:20:12] <^d> yw
[17:20:30] (PS1) RobH: fixing change for virt0/1000 certificate install [operations/puppet] - https://gerrit.wikimedia.org/r/108926
[17:20:42] lookit all that fancy linking!
[17:22:24] (CR) RobH: [C: 2] fixing change for virt0/1000 certificate install [operations/puppet] - https://gerrit.wikimedia.org/r/108926 (owner: RobH)
[17:23:28] <^d> RobH: Fancy indeed :)
[17:23:38] (CR) Siebrand: "Should be able to merge this now." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/96472 (owner: Siebrand)
[17:29:22] ok, service seems ok on virt1000
[17:29:30] now how the heck do i test it, just seeing the service running isnt good enough.
[17:30:29] (PS1) Faidon Liambotis: apache: bump MaxClients from 40 to 100 for bits [operations/puppet] - https://gerrit.wikimedia.org/r/108929
[17:31:01] RobH: new cert?
with openssl
[17:32:19] I mean check that LDAP still is working and my stuff didnt break it
[17:32:28] i can see the cert is there and chain is valid yep
[17:32:45] but i wanna confirm actual ldap use before i roll same change to virt0
[17:32:52] ok, then , for LDAP, ping Coren is better
[17:32:54] i dont feel like breaking everything that touches ldap today ;]
[17:32:56] or AndrewB
[17:32:58] i did in PM heh
[17:33:02] k:)
[17:33:06] andrewbogott_afk:
[17:33:14] durn, he is afk.
[17:33:16] RobH: well, you can try something like on formey
[17:33:31] <^d> Gerrit login works fine.
[17:33:50] <^d> I just used formey a bit ago (didn't know ldap was down)
[17:34:08] like.f.e. ldaplist -l passwd | grep RobH
[17:34:27] <^d> mutante: Even shorter, `ldaplist -l passwd robh`
[17:34:34] well, coren just hacked his etc/ldap
[17:34:36] <^d> :)
[17:34:38] ^d: +1
[17:34:42] and tested against virt1000 for me, and it worked
[17:34:47] cool
[17:35:16] I didn't expect the LDAP client libs to mind the certificate changing so long as it was still to a valid one, and it doesn't look like it causes issues.
[17:35:19] <^d> mutante: Thought for rt ticket. "Replacement for formey ldap operations in eqiad"
[17:35:45] <^d> (Unless there is a place in eqiad I can use)
[17:35:49] Coren: agreed, but i like to be paranoid
[17:36:04] ^d: good! can be included on "#6134: shutdown formey" ?:)
[17:36:17] i mean, just use it for all services on that host
[17:36:22] until it's gone
[17:36:23] (PS2) Faidon Liambotis: Varnish: disable WAP on mobile frontends [operations/puppet] - https://gerrit.wikimedia.org/r/108738
[17:38:29] (CR) Faidon Liambotis: "So, this completely removes WAP support and WAP clients are going to get normal HTML responses as a response." [operations/puppet] - https://gerrit.wikimedia.org/r/108738 (owner: Faidon Liambotis)
[17:38:31] argh, puppet slow to run on virt0
[17:38:35] Paranoid is good on something like a directory service that is used *everywhere*.
I like paranoid, and I approve of it.
[17:38:41] all those iptable rules ;]
[17:38:55] <^d> mutante: That's the only thing left on it :p
[17:38:59] Coren: yea, im distinctly less paranoid on other services, but this one would get a lot of folks really pissed at me
[17:39:13] 100% paranoia on things like ldap, varnish....
[17:39:20] ^d: ok:) great, then let's use that one
[17:39:25] Coren: if you approve paranoid, can you also finally take care of theopenstack ferm patch that has been sitting there for some mosnth now?
[17:39:45] paravoid: I can take a look at it. Linky?
[17:40:04] I've pinged you like a half dozen times about it at least :)
[17:40:12] https://gerrit.wikimedia.org/r/#/c/98307/
[17:40:32] Oh, THAT one!
[17:40:38] !log virt0 puppet agent re-enabled and resumed normal service
[17:40:46] Logged the message, RobH
[17:40:49] woot, ok, virt0/1000 no longer have wildcards in use
[17:41:00] now just down to integration which we're moving after lunch today
[17:41:06] paravoid: It already has a +2 before the merge, and I did the merge. What else did you need? :-)
[17:41:07] and then a salt sweep for remainders
[17:41:31] I thought you'd have merged that one long ago. :-)
[17:41:41] it's a labs change
[17:41:50] I need labs people to merge it and babysit it, make sure nothing breaks etc.
[17:42:03] and you haven't merged it, it's still under "review in progress" :)
[17:42:12] Ah! I can do it now then, I'm idle waiting on a maintenance run on the replicas. :-)
[17:42:34] I meant merge the conflict.
[17:42:36] carefully review it first to make sure I haven't introduced a bug
[17:43:06] Oh, FFS, it needs another manual merge now.
[17:45:36] bleh, more cert replacements appeared
[17:45:44] i think many can move behind misc-web-lb though.
[17:45:46] like blog.
[17:46:27] yes
[17:47:08] (PS4) coren: openstack: convert iptables to ferm [operations/puppet] - https://gerrit.wikimedia.org/r/98307 (owner: Faidon Liambotis)
[17:47:41] A wild certificate appears!
[17:48:42] (PS1) Dzahn: bugzilla reporter needs to be template not file [operations/puppet] - https://gerrit.wikimedia.org/r/108934
[17:50:13] (CR) coren: [C: 2] "After some merging, this should be okay to deploy." [operations/puppet] - https://gerrit.wikimedia.org/r/98307 (owner: Faidon Liambotis)
[17:51:59] paravoid: From what I can tell, the ferm rules are strictly equivalent. I'll keep a close eye on it.
[17:55:08] (CR) Dzahn: [C: 2] bugzilla reporter needs to be template not file [operations/puppet] - https://gerrit.wikimedia.org/r/108934 (owner: Dzahn)
[18:30:10] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[18:50:49] bblack: hi! you are missed in SF :)
[18:57:08] hi :)
[19:09:58] (PS2) Reedy: Adding Extension:Babel configuration for simplewiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108663 (owner: Hydriz)
[19:10:03] (CR) Reedy: [C: 2] Adding Extension:Babel configuration for simplewiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108663 (owner: Hydriz)
[19:10:10] (Merged) jenkins-bot: Adding Extension:Babel configuration for simplewiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108663 (owner: Hydriz)
[19:11:17] (PS2) Reedy: Add two new user groups for Chinese Wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108907 (owner: Odder)
[19:11:22] (CR) Reedy: [C: 2] Add two new user groups for Chinese Wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108907 (owner: Odder)
[19:11:29] (Merged) jenkins-bot: Add two new user groups for Chinese Wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108907 (owner: Odder)
[19:11:38] reedy@fluorine:/a/mw-log$ grep SolrUpdateWork fatal.log -c
[19:11:38] 0
[19:11:39] Yay
[19:12:39] (PS6) Reedy: Update CentralAuth RC2UDP config [operations/mediawiki-config] -
10https://gerrit.wikimedia.org/r/92463 [19:12:44] (03CR) 10Reedy: [C: 032] Update CentralAuth RC2UDP config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92463 (owner: 10Reedy) [19:12:51] (03Merged) 10jenkins-bot: Update CentralAuth RC2UDP config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92463 (owner: 10Reedy) [19:14:17] (03PS4) 10Reedy: Resetting legacy channel names on labs and enabling IRC-RC echo again [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108163 (owner: 10Se4598) [19:14:21] (03CR) 10Reedy: [C: 032] Resetting legacy channel names on labs and enabling IRC-RC echo again [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108163 (owner: 10Se4598) [19:14:28] (03Merged) 10jenkins-bot: Resetting legacy channel names on labs and enabling IRC-RC echo again [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108163 (owner: 10Se4598) [19:15:07] (03PS2) 10Reedy: Add namespace aliases for NS_PROJECT on zhwikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108641 (owner: 10Odder) [19:15:13] (03CR) 10Reedy: [C: 032] Add namespace aliases for NS_PROJECT on zhwikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108641 (owner: 10Odder) [19:15:20] (03Merged) 10jenkins-bot: Add namespace aliases for NS_PROJECT on zhwikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108641 (owner: 10Odder) [19:15:44] (03PS3) 10Reedy: Maintain extension-list-wikidata with Wikidata build (beta only now) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108882 (owner: 10Aude) [19:15:49] (03CR) 10Reedy: [C: 032] Maintain extension-list-wikidata with Wikidata build (beta only now) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108882 (owner: 10Aude) [19:15:56] (03Merged) 10jenkins-bot: Maintain extension-list-wikidata with Wikidata build (beta only now) [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/108882 (owner: 10Aude) [19:16:37] reedy-spam-hour [19:16:50] #word [19:17:38] (03PS2) 10Reedy: Make UploadWizard respect the Flickr blacklist on Commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108167 (owner: 10Gergő Tisza) [19:17:43] (03CR) 10Reedy: [C: 032] Make UploadWizard respect the Flickr blacklist on Commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108167 (owner: 10Gergő Tisza) [19:17:50] (03Merged) 10jenkins-bot: Make UploadWizard respect the Flickr blacklist on Commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108167 (owner: 10Gergő Tisza) [19:18:10] !!reedy-spam-hour is "A short period of time when Reedy merges helluva lot of patches." [19:18:10] Key was added [19:18:14] damn. [19:18:25] :) [19:18:30] I thought it would not accept the double exclamation mark :) [19:18:31] (03CR) 10Reedy: [C: 04-1] "Indenting is in spaces, should be tabs" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108690 (owner: 10John F. Lewis) [19:18:40] !reedy-spam-hour [19:18:43] !!reedy-spam-hour [19:18:43] "A short period of time when Reedy merges helluva lot of patches." [19:18:47] :) [19:18:50] nice [19:18:52] !del !!reedy-spam-hour [19:18:52] If you want to remove a key, type !!!reedy-spam-hour del [19:19:02] !!!reedy-spam-hour del [19:19:02] hahaha [19:19:02] Unable to find the specified key in db [19:19:06] hahaha [19:19:09] !!reedy-spam-hour del [19:19:09] Successfully removed !reedy-spam-hour [19:19:10] needs more exclamation points! [19:19:20] !reedy-spam-hour is A short period of time when Reedy merges helluva lot of patches. [19:19:21] Key was added [19:19:23] here. 
[19:19:24] (03PS2) 10Reedy: More rights for 'patroller' user group on hewikibooks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108832 (owner: 10Odder) [19:19:29] (03CR) 10Reedy: [C: 032] More rights for 'patroller' user group on hewikibooks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108832 (owner: 10Odder) [19:19:36] (03Merged) 10jenkins-bot: More rights for 'patroller' user group on hewikibooks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108832 (owner: 10Odder) [19:20:30] (03PS3) 10Reedy: Disable wgMaxBacklinksInvalidate now that throttling is used [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106465 (owner: 10Aaron Schulz) [19:20:36] (03CR) 10Reedy: [C: 032] Disable wgMaxBacklinksInvalidate now that throttling is used [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106465 (owner: 10Aaron Schulz) [19:20:42] (03Merged) 10jenkins-bot: Disable wgMaxBacklinksInvalidate now that throttling is used [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106465 (owner: 10Aaron Schulz) [19:20:53] by the greg-g, nice post [19:21:01] *way [19:21:21] (03PS2) 10Reedy: Adjust linkpurge and renderfile limits [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108824 (owner: 10Aaron Schulz) [19:21:31] (03CR) 10Reedy: [C: 032] Adjust linkpurge and renderfile limits [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108824 (owner: 10Aaron Schulz) [19:21:38] (03Merged) 10jenkins-bot: Adjust linkpurge and renderfile limits [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108824 (owner: 10Aaron Schulz) [19:32:17] !log reedy synchronized wmf-config/ [19:32:25] Logged the message, Master [19:33:57] (03PS1) 10Ottomata: Adding icinga alert if primary Hadoop NameNode is not in active state [operations/puppet] - 10https://gerrit.wikimedia.org/r/108948 [19:36:04] (03CR) 10Ottomata: [C: 032 V: 032] Adding icinga alert if primary Hadoop 
NameNode is not in active state [operations/puppet] - 10https://gerrit.wikimedia.org/r/108948 (owner: 10Ottomata) [19:38:41] https://git.wikimedia.org/ gives 500 [19:39:06] No Chad [19:39:17] qchris_away is away [19:39:29] so no git [19:39:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC [19:39:53] ticket, i guess? or do you want to page someone? [19:40:05] !log git.wikimedia.org is down [19:40:13] Logged the message, Master [19:40:38] Not sure it's necessary to page someone. It's not so critical infrastructure [19:40:46] ok, thanks [19:40:51] That, and opsen are about/online [19:43:28] manybubbles: hiiii [19:43:55] (03CR) 10Hashar: "Pending ops review / merge. Then one can apply the role::beta::fatalmonitor class on deployment-bastion.pmtpa.wmflabs and that would insta" [operations/puppet] - 10https://gerrit.wikimedia.org/r/108041 (owner: 10Hashar) [19:44:18] ah, the one i was looking for. hi hashar [19:44:29] quick question? [19:44:31] matanya: be quick :D [19:44:55] hashar: modules/applicationserver/manifests/pybal_check.pp line 3 [19:45:11] what is that weird key + command ? [19:47:28] matanya: I guess it grants the authorization for pybal to poll app servers [19:47:47] yes, but in a weird syntax [19:48:07] wouldn't it be better to separate command from key? [19:48:11] matanya: I am not familiar with ssh key format, might only allow the user to run the uptime and touch commands [19:48:35] ottomata: hi? [19:48:38] hi! [19:48:44] you in sf this week? [19:48:58] yeah. are you here? [19:49:01] I just got in [19:49:08] ayyeee, ok [19:49:10] matanya: anyway, I just did puppet lint on this change and I am not sure who wrote that ssh key stuff [19:49:17] i have some bandwidth for jvm stuff now [19:49:27] you did hashar that is why i asked you [19:49:31] wanted to sync with you on that, although i bet you will be busy with summit stuff eh? 
[19:50:11] and for general knowledge: http://www.ietf.org/rfc/rfc4716.txt [19:50:22] matanya: according to git blame, that has been there since day 1 (aka since the first commit of operations/puppet ) [19:51:32] what? is my git blame broken? cc2d56e7 (Antoine Musso 2014-01-02 10:09:54 +0100 3) $authorized_key = 'command="uptime; touch /var/tmp/pybal-check.stamp" ssh-rsa AAAAB3NzaC1yc2 [19:54:02] oh, sorry. the lint merge is yours [19:54:03] (03CR) 10Umherirrender: "Was fixed/adjust with Ie95fdf8e9a84bc68a1827fb297e647ff7c036a9e" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90265 (owner: 10Aaron Schulz) [19:54:33] matanya: yeah :-D [19:54:36] matanya: git blame -w [19:54:41] matanya: -w would ignore whitespace changes [19:55:24] so it is pyoungmeister, who is no longer with us. [19:55:40] yup [19:55:52] but if you look at that change, it is merely creating a module [19:55:57] so you have to git blame from the commit before [19:56:07] and you will eventually end up at the very first commit in the repo [19:56:17] yes, did that [19:56:40] might want to file an RT to figure out what is happening or get it refactored [19:56:42] mark might know [19:56:51] he wrote pybal iirc [19:57:10] will do that, thanks a lot [20:00:20] lunch time [20:02:16] (03CR) 10Manybubbles: "This has been running in production for a few days and seems safe enough to fossilize in puppet." [operations/puppet] - 10https://gerrit.wikimedia.org/r/107920 (owner: 10Manybubbles) [20:02:20] matanya: You found the "command=" specification in sshd(8)? [20:03:41] scfc_de: i know it exists, just pointed out it is a weird syntax for puppet [20:09:47] apergos: what php version is running on application servers? 
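Note on the "key + command" string discussed above: in the authorized_keys format described in sshd(8), an entry may begin with an options field (here a quoted command="..." forced command), followed by the key type and the base64 key. The sketch below only illustrates how such an entry splits into those parts. The function name is hypothetical, the key blob is a made-up placeholder (the real one is truncated in the log), and the sketch assumes the entry starts with an options field.

```python
def split_authorized_key(entry):
    """Split an authorized_keys entry into (options, key_type, key_blob).

    Assumes the entry begins with an options field; whitespace inside
    double quotes belongs to the options and does not end the field.
    """
    in_quotes = False
    escaped = False
    for i, ch in enumerate(entry):
        if escaped:
            escaped = False          # character after a backslash is literal
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch.isspace() and not in_quotes:
            fields = entry[i:].split()
            if len(fields) < 2:
                raise ValueError("expected key type and key after options")
            return entry[:i], fields[0], fields[1]
    raise ValueError("no key fields found after options")


# Placeholder key material, not the real key from the manifest.
entry = 'command="uptime; touch /var/tmp/pybal-check.stamp" ssh-rsa AAAAEXAMPLE'
options, key_type, key_blob = split_authorized_key(entry)
print(options)   # the forced command that sshd runs instead of the client's request
print(key_type)
```

Seen this way, the "weird syntax" is a single string in which the forced command is an option attached to the key, which matches hashar's guess that the key is only allowed to run uptime and touch.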
[20:14:56] 5.3.10-1ubuntu3.9+wmf1 [20:17:22] thanks apergos [20:17:45] that's eqiad random app server, I didn't check them all [20:17:46] yw [20:21:17] (03PS1) 10Matanya: Application server pcre: remove limit [operations/puppet] - 10https://gerrit.wikimedia.org/r/108961 [20:36:30] (03PS1) 10Ori.livneh: Logstash: fixes and tweaks [operations/puppet] - 10https://gerrit.wikimedia.org/r/108964 [20:36:46] (03PS2) 10Ori.livneh: Logstash: fixes and tweaks [operations/puppet] - 10https://gerrit.wikimedia.org/r/108964 [20:37:38] (03PS2) 10Ori.livneh: Updating git::clone so that gerrit urls can be assumed by default [operations/puppet] - 10https://gerrit.wikimedia.org/r/108537 (owner: 10Ottomata) [20:37:48] (03CR) 10Ori.livneh: [C: 032 V: 032] Updating git::clone so that gerrit urls can be assumed by default [operations/puppet] - 10https://gerrit.wikimedia.org/r/108537 (owner: 10Ottomata) [20:38:26] (03PS5) 10Ori.livneh: Make Elasticsearch less exciting [operations/puppet] - 10https://gerrit.wikimedia.org/r/107920 (owner: 10Manybubbles) [20:38:32] (03CR) 10Ori.livneh: [C: 032 V: 032] Make Elasticsearch less exciting [operations/puppet] - 10https://gerrit.wikimedia.org/r/107920 (owner: 10Manybubbles) [20:39:42] thanks ori! [20:39:51] thoughts on this? [20:39:51] https://gerrit.wikimedia.org/r/#/c/108640/ [20:40:08] (03CR) 10Ori.livneh: "logstash::output::elasticsearch should also be resource-ified" [operations/puppet] - 10https://gerrit.wikimedia.org/r/108964 (owner: 10Ori.livneh) [20:40:35] ottomata: looks good, I just need to test it [20:40:37] give me a minute [20:41:05] k [20:41:20] ^d: Just coming back, I see I've been pinged about git.wikimedia.org being down. [20:41:36] <^d> Blah, nobody pinged me. [20:41:40] ^d: Are you aware of that already and silently working on it? [20:41:44] Ah ok. 
[20:41:50] hi christian :-] [20:42:05] ^d: No worries I just came back myself :-) [20:42:09] Hi hashar :-) [20:42:41] qchris: I ported its init.d script to an upstart job a few days ago, I hope it's not related. Let me know if you want me to take a look. [20:42:48] !log blog updated [20:42:52] <^d> ori: Yeah, I saw where you did that [20:42:56] Logged the message, RobH [20:43:00] <^d> Sadly, we don't log gitblit. [20:43:15] /var/log/upstart/gitblit [20:43:30] <^d> Ah ok, didn't know that. [20:43:32] I was not aware (about git) [20:43:52] I could have at least restarted it [20:44:09] <^d> Ew. [20:44:10] <^d> Ugly log. [20:44:15] <^d> Tons of NPEs. [20:44:21] :-( [20:44:26] at least gitblit process is monitored in icinga isn't it ? [20:44:31] !log all plugins updated except caching cuz i hate it and i plan to move it behind misc-web-lb [20:44:38] Logged the message, RobH [20:45:03] <^d> ori: How do I kick the upstart'd gitblit to restart it? [20:45:14] service gitblit restart [20:45:23] <^d> Ah, obvious [20:46:01] someone should add check_http on git.wikimedia.org , making sure it serves 200 :D [20:46:44] <^d> Seems up. [20:46:51] <^d> Still seeing more NPEs than I'm comfortable with. [20:47:25] <^d> http://git.wikimedia.org/commitdiff/mediawiki/core/62205b509323dfaca9d628e4dfae0b7b1a713922 [20:47:30] <^d> ^ Something's generating bad urls. [20:47:40] <^d> That / in mediawiki/core should be percent encoded. [20:49:30] !gitblit mediawiki/core [20:49:34] .. [20:53:07] ^d: owner:"Jenkins-mwext-sync " [21:03:50] (03PS1) 10Ori.livneh: Add service check for https://git.wikimedia.org/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/108966 [21:03:54] ^ hashar [21:04:24] brion: RoanKattouw_away hey, able to make it? 
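Note on ^d's remark that the "/" in mediawiki/core should be percent encoded: a minimal sketch of building such a path with the repository name encoded as one segment. The function name is hypothetical, and the /commitdiff/<repo>/<sha1> layout is only inferred from the URL quoted in the log, not from gitblit documentation.

```python
from urllib.parse import quote


def commitdiff_path(repo, sha1):
    """Build a commitdiff path with the repository name as a single path segment."""
    # safe="" makes quote() encode "/" too, so "mediawiki/core" stays one segment.
    return "/commitdiff/" + quote(repo, safe="") + "/" + sha1


print(commitdiff_path("mediawiki/core", "62205b509323dfaca9d628e4dfae0b7b1a713922"))
```

With the default safe="/" the slash would be left alone, which reproduces the kind of URL ^d flagged as bad.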
[21:05:06] greg-g: mobile's sending max since he does most of our deploys [21:06:02] ori: you are so fast [21:06:30] kkkkk [21:08:55] (03CR) 10Hashar: [C: 031] Add service check for https://git.wikimedia.org/ (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/108966 (owner: 10Ori.livneh) [21:10:05] (03CR) 10Ori.livneh: [C: 032] Add service check for https://git.wikimedia.org/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/108966 (owner: 10Ori.livneh) [21:10:40] hasharMeeting: oops, i missed your comment [21:10:48] not a big deal [21:10:48] but -- actually that parameter just decides the Host: header value [21:10:52] so it won't go through misc varnish [21:10:59] it'll go to antimony directly [21:11:15] so that is good to me :D [21:11:16] it expands to /usr/lib/nagios/plugins/check_http --ssl -H git.wikimedia.org -I antimony -u / [21:11:41] semantic is confusing, I should have looked at the declaration [21:11:47] thank you to have double checked that [21:11:52] mooaar monitoring [21:17:03] ottomata: i want to squint at that a liiitle bit more if that's ok [21:17:17] it's a flaw in the puppet code, i think you are right about that [21:17:47] ok cool [21:17:47] but i need to find an elegant way to have the mediawiki apache site listen on both 80 and 8080 without a) adding a port param to the mediawiki class (seems orthogonal), b) hard-coding 80 and 8080 [21:18:01] two virtual hosts? [21:18:06] i'm inclined to think that the apache site config should move out of the mediawiki module entirely [21:18:10] why does it need to listen on 80? [21:18:15] or both [21:18:15] ? [21:18:27] hrm i forget to be honest [21:18:28] yeah, i wasn't sure what you were trying to do with the site.d stuff [21:18:41] the site.d stuff is nice, it lets you add optional config snippets [21:19:02] oh because it includes it…hm, but [21:19:14] why bnot just build that into your vhost template? 
[21:19:28] at the current moment, i'm of the opinion that puppetizing a generic vhost template is kinda impossible [21:19:28] so [21:19:30] because they're not always part of the same thing [21:19:40] easiest to just use apache module to install apache and set up modules, etc. [21:19:44] vagrant enable-role role::multimedia enables thumb.php on 404, removing the role removes it [21:19:55] hmm [21:19:56] well, look at the site.d thing [21:20:19] i think it's useful, but i think there are definitely other places where the vagrant apache config is warty [21:20:29] ok, but you could still do that with your vhost template [21:20:36] rather than expecting every other apache site to do that [21:20:40] not sure how but have to run [21:20:41] make your mediawiki one include some directory [21:20:45] just like you are doing [21:20:50] very unlikely other vhosts will have to do that [21:21:00] ottomata: hm, that's persuasive [21:21:13] okay have to run to a doc appt, but will rereview as soon as i can [21:21:18] ok cool [21:21:20] lateaaas [21:21:23] bye [21:26:35] !log git.wikimedia.org is back up [21:26:43] Logged the message, Master [21:39:46] Wikimedia Platform operations, serious stuff | Log: http://bit.ly/wikisal | Channel logs: http://ur1.ca/edq22 | MediaWiki error counts: http://ur1.ca/edq1f | Requests: mailto:ops-requests@rt.wikimedia.org | on RT duty: ottomata [21:39:47] Thanks, ottomata, for being awesome. :) [21:39:48] ack [21:39:54] there we go [21:42:27] so ottomata this means i can bug you now? [21:43:05] bug sure! [21:44:03] what'sup? [21:45:29] regarding https://rt.wikimedia.org/Ticket/Display.html?id=6143 [21:46:02] basically ottomata i just want to understand where it should move in eqiad [21:50:54] matanya: just commented [21:52:36] thanks ottomata, should i now go and seek out those people? or will you, since you are on duty? :P [21:52:48] wasn't planning on it! 
:p [21:57:28] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Last successful Puppet run was Wed 22 Jan 2014 09:50:42 PM UTC [21:57:28] PROBLEM - Puppet freshness on amssq51 is CRITICAL: Last successful Puppet run was Wed 22 Jan 2014 09:48:21 PM UTC [21:57:28] PROBLEM - Puppet freshness on amssq56 is CRITICAL: Last successful Puppet run was Wed 22 Jan 2014 09:48:41 PM UTC [21:57:28] PROBLEM - Puppet freshness on amssq59 is CRITICAL: Last successful Puppet run was Wed 22 Jan 2014 09:51:17 PM UTC [21:57:28] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Last successful Puppet run was Wed 22 Jan 2014 09:40:21 PM UTC [21:57:28] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Wed 22 Jan 2014 09:28:06 PM UTC [21:57:28] PROBLEM - Puppet freshness on carbon is CRITICAL: Last successful Puppet run was Wed 22 Jan 2014 09:31:55 PM UTC [21:58:41] (03PS1) 10Ottomata: Removing lcarr from icinga contactgroups and permissions [operations/puppet] - 10https://gerrit.wikimedia.org/r/108978 [22:00:08] PROBLEM - Hadoop NameNode Primary Is Active on analytics1010 is CRITICAL: Hadoop.NameNode.FSNamesystem.tag_HAState CRITICAL: active [22:02:15] (03CR) 10Matanya: [C: 031] Removing lcarr from icinga contactgroups and permissions [operations/puppet] - 10https://gerrit.wikimedia.org/r/108978 (owner: 10Ottomata) [22:02:44] (03PS1) 10Ottomata: Need to escape ! in icinga check [operations/puppet] - 10https://gerrit.wikimedia.org/r/108980 [22:03:05] (03CR) 10Ottomata: [C: 032 V: 032] Need to escape ! in icinga check [operations/puppet] - 10https://gerrit.wikimedia.org/r/108980 (owner: 10Ottomata) [22:04:50] TimStarling: can ## Log for bug 47807 [22:04:50] pipe 10 /usr/bin/udp-filter -F '\t' -p /wiki/Missing_wiki >> <%= log_directory %>/missing-wiki.tsv.log [22:05:02] be moved off udp2log? [22:05:15] moved? [22:05:26] deleted, erased [22:05:33] yes, it can be deleted [22:06:08] yay, thanks. 
emery is closer to death now [22:07:23] (03CR) 10RobH: [C: 031] Removing lcarr from icinga contactgroups and permissions [operations/puppet] - 10https://gerrit.wikimedia.org/r/108978 (owner: 10Ottomata) [22:07:47] (03PS2) 10Ottomata: Removing lcarr from icinga contactgroups and permissions [operations/puppet] - 10https://gerrit.wikimedia.org/r/108978 [22:07:53] (03CR) 10Ottomata: [C: 032 V: 032] Removing lcarr from icinga contactgroups and permissions [operations/puppet] - 10https://gerrit.wikimedia.org/r/108978 (owner: 10Ottomata) [22:09:11] Why is Faidon's name highlighted in ? [22:09:40] (03PS1) 10Anomie: Reset key for anomie (Brad Jorsch) [operations/puppet] - 10https://gerrit.wikimedia.org/r/108981 [22:09:44] !log reedy updated /a/common to {{Gerrit|Ie95fdf8e9}}: Adjust linkpurge and renderfile limits [22:09:48] haha [22:09:53] Logged the message, Master [22:09:55] because he is a really important dude, duh [22:09:58] or at least, gerrit thinks so [22:10:13] paravoid ^ haha [22:10:31] And Gerrit's probably right :-), but it hides the core of the change. [22:16:17] RECOVERY - Hadoop NameNode Primary Is Active on analytics1010 is OK: Hadoop.NameNode.FSNamesystem.tag_HAState OKAY: active [22:24:43] ori, how the heck does this work for puppet freshness check? [22:24:47] command_line date --date @$LASTSERVICEOK$ +"Last successful Puppet run was %c" && exit 2 [22:24:58] wouldn't that always just print the LASTSERVICEOK [22:25:02] how is it actually checking? 
[22:25:07] when puppet last ran [22:27:18] (03PS1) 10Reedy: Add formatter parameter to wgCentralAuthRC [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108991 [22:27:42] (03CR) 10Reedy: [C: 032] Add formatter parameter to wgCentralAuthRC [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108991 (owner: 10Reedy) [22:27:48] (03Merged) 10jenkins-bot: Add formatter parameter to wgCentralAuthRC [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/108991 (owner: 10Reedy) [22:32:48] ottomata: On Puppet runs, Puppet sends stats to Icinga. [22:33:44] *successful Puppet runs :-) [22:33:55] (03PS1) 10RobH: doc.wikimedia.org and doc.mediawiki.org to use misc-web-lb [operations/puppet] - 10https://gerrit.wikimedia.org/r/108994 [22:34:40] i'm noticing that for a few minutes after i import pages, the beta cluster is exceptionally slow [22:37:27] oh hm [22:37:36] so why hasn't it sent stats in while, hmm [22:37:43] oh, i restarted icinga a bit ago [22:37:48] maybe puppet has to run everywhere for this to fix itself? [22:39:36] ottomata: Don't know; Puppet is a "passive" check in Icinga. I've seen some SNMP stuff in the repository, but the details are rather obscure to me (Bug #1: More documentation!!!eleven!). 
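Note on ottomata's question about the command that always prints and exits 2: a minimal sketch, assuming the usual Nagios/Icinga freshness model for passive checks. In that model hosts push results in, and the configured command only runs when no result has arrived within the freshness threshold, so it can unconditionally report CRITICAL. The function name and the threshold used in the example are hypothetical, not taken from the puppet repository.

```python
from datetime import datetime, timezone

OK = 0
CRITICAL = 2  # plugin exit code that Nagios/Icinga treats as CRITICAL


def puppet_freshness(last_ok_epoch, now_epoch, threshold_seconds):
    """Return (exit_code, message) for a passive check guarded by a freshness threshold."""
    age = now_epoch - last_ok_epoch
    if age <= threshold_seconds:
        # A passive result arrived recently, so the stale-command never runs.
        return OK, "passive result is fresh"
    # Stale: mimic `date --date @$LASTSERVICEOK$ +"Last successful ..." && exit 2`.
    last_ok = datetime.fromtimestamp(last_ok_epoch, tz=timezone.utc)
    return CRITICAL, "Last successful Puppet run was " + last_ok.strftime("%c")


# Hypothetical 10-hour threshold; the last OK result is a day old.
code, message = puppet_freshness(1390340000, 1390340000 + 86400, 36000)
print(code, message)
```

Under this model the command does no checking of its own. The freshness comparison happens in the monitoring daemon, which would also explain a burst of alerts if last-result state was lost when icinga was restarted.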
[22:40:27] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC [22:41:09] (03Abandoned) 10RobH: moving integration and doc behind misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/106321 (owner: 10RobH) [22:41:18] ottomata: files/snmp/snmptt.conf{,.icinga} [22:41:41] (03PS1) 10RobH: moving doc.wikimedia.org to misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/108997 [22:42:19] (03CR) 10RobH: [C: 032] moving doc.wikimedia.org to misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/108997 (owner: 10RobH) [22:42:39] (03CR) 10RobH: [C: 032] doc.wikimedia.org and doc.mediawiki.org to use misc-web-lb [operations/puppet] - 10https://gerrit.wikimedia.org/r/108994 (owner: 10RobH) [22:43:18] !log doc.wikimedia.org and doc.mediawiki.org relocating behind misc-web-lb. dns ttl was set to 5 minutes yesterday, so this change should result in less than 5 minutes of unreachability by anyone [22:43:20] ottomata: it is passively checked with snmp and forwarded using submit_check_result [22:44:11] how does puppet know to run that snmp thing? [22:44:54] ah! [22:44:55] found it [22:45:04] puppet just runs an exec [22:45:05] hm [22:45:08] ottomata: yes [22:45:13] notice: /Stage[main]/Base::Puppet/Exec[neon puppet snmp trap]/returns: executed successfully [22:45:17] (03PS1) 10RobH: fix cname for doc.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/108999 [22:45:21] but icinga is not getting it [22:45:22] hm [22:45:42] try to run EXEC /usr/lib/nagios/plugins/eventhandlers/submit_check_result $r "Puppet freshness" 0 "puppet ran at `date`" [22:45:48] by hand and see what happens [22:46:00] (03PS2) 10RobH: fix cname for doc.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/108999 [22:46:32] ok, im off. 
night all [22:46:42] (03CR) 10RobH: [C: 032] fix cname for doc.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/108999 (owner: 10RobH) [22:47:26] Ok, dns updated, and gallium is running puppet updaes [22:48:02] what's $r there though? [22:48:16] laters matanya [22:51:36] !log doc.wikimedia.org updates complete, once dns re-propagates it should be again reachable over https [22:51:42] cuz my hosts hack makes it work. [22:51:44] Logged the message, RobH [22:52:24] now lets see how long it really takes to work again without a host hack [22:52:33] then i'll feel more comfortable doing integrations move [22:53:45] heya jgage [22:53:49] hola [22:53:53] ori wants to install varnishkafka on bits varnishes [22:54:00] he wants to use it to consume eventlogging data [22:54:01] allegedly [22:54:05] hah [22:54:08] j/k :) [22:54:08] allegedly! [22:54:10] heh ok [22:54:25] he's talked about this with mark and faidon, and they seem cool w it [22:54:25] so i could submit a patch for this and make it happen [22:54:26] ORRRR [22:54:27] you could! [22:54:32] itllbefunipromise! [22:54:34] :D [22:54:36] heheh [22:54:42] ok. ori, are you in sf? [22:54:51] i am! [22:55:07] i just waved at you [22:55:18] hmmm, ok, i am due for some more kafka admin documentation on wikitech [22:55:23] unless i have it [22:55:24] lemme check [22:55:38] ottomata: http://snmptt.sourceforge.net/docs/snmptt.shtml#Variable-substitutions suggests "$r" = "Trap hostname". [22:55:41] naw, i don't cool [22:55:42] ok [22:55:44] so i'll have to add some [22:55:58] jgage, the only manual part will be adding the topic, the rest will be done in puppet [22:56:01] it'll be so fun! 
[22:56:03] :) [22:56:14] ok you boys talk and submit a patch :) [22:56:35] hehe ok, we have made a plan to talk in ~1hr [22:56:53] oh unless i invite myself to mark+joel's networking meeting [22:56:59] hmm scfc_de [22:57:07] /usr/lib/nagios/plugins/eventhandlers/submit_check_result: No such file or directory [22:57:46] hmm i dunno, i may have broken this since the alerts triggered after I restarted icinga [22:57:47] dunno [22:57:49] but i have to run :/ [22:57:56] fingers crossed this will fix itself [22:57:59] otherwise i'm on it tomorrow [22:58:16] !log restarted icinga (a while ago) and puppet freshness checks are all boogery(?) will check up on this tomorrow [22:58:23] Logged the message, Master [23:03:35] (03PS2) 10Anomie: Reset key for anomie (Brad Jorsch) [operations/puppet] - 10https://gerrit.wikimedia.org/r/108981 [23:06:20] i'm out, laters all! [23:11:45] (03PS1) 10RobH: integration.wikimedia.org relocation to misc-web-lb [operations/puppet] - 10https://gerrit.wikimedia.org/r/109002 [23:15:53] (03PS1) 10RobH: integration.w.o and m.o moved cname to misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/109003 [23:18:08] (03CR) 10RobH: [C: 032] integration.wikimedia.org relocation to misc-web-lb [operations/puppet] - 10https://gerrit.wikimedia.org/r/109002 (owner: 10RobH) [23:18:15] (03CR) 10RobH: [C: 032] integration.w.o and m.o moved cname to misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/109003 (owner: 10RobH) [23:19:07] (03PS1) 10RobH: turns out zuul needs one more change [operations/dns] - 10https://gerrit.wikimedia.org/r/109004 [23:19:19] (03PS1) 10RobH: Revert "integration.wikimedia.org relocation to misc-web-lb" [operations/puppet] - 10https://gerrit.wikimedia.org/r/109005 [23:19:58] (03CR) 10RobH: [C: 032] "zuul needs something done first, lets pretend this never happened. since i didnt merge on nameservers, it really didnt." 
[operations/dns] - 10https://gerrit.wikimedia.org/r/109004 (owner: 10RobH) [23:20:59] (03CR) 10RobH: [C: 032] Revert "integration.wikimedia.org relocation to misc-web-lb" [operations/puppet] - 10https://gerrit.wikimedia.org/r/109005 (owner: 10RobH) [23:21:20] ok, successfully undone [23:36:56] (03PS1) 10RobH: setup misc-web-lb to cache for blog.w.o server holmium [operations/puppet] - 10https://gerrit.wikimedia.org/r/109008 [23:39:55] (03CR) 10RobH: [C: 032] Reset key for anomie (Brad Jorsch) [operations/puppet] - 10https://gerrit.wikimedia.org/r/108981 (owner: 10Anomie) [23:40:56] !log Going to upgrade Zuul, need a patch to make it send Cache-Control: no-cache so we can migrate Zuul service behind misc varnish (being done with RobH) [23:41:01] RobH: I got the patch ready [23:41:04] Logged the message, Master [23:41:12] the upgrade is a couple of minutes, will do that after current meeting [23:42:01] hasharMeeting: im gonna totally revert my reversions [23:42:06] :D [23:42:08] revert to the third [23:42:12] cherry-pick the reverted changes [23:42:38] to avoid having a commit summary that shows as: "Revert "Revert "Revert feature1" "" [23:42:39] :D [23:42:43] hehe [23:46:23] bleh, well, i fubar'd my git repo trying to do that [23:47:04] git checkout -b hack399 origin/production && git cherry-pick 'whatever sha1' [23:47:10] :D [23:50:09] so i do that and it says its an empty commit [23:50:41] you will need to change the change-id too [23:50:46] ie remove it so it's re-added [23:50:59] yea, i just thought it would show pending changes is all [23:51:30] Oh [23:51:31] I know [23:51:38] Cherry-pick thinks it's already in the branch [23:51:40] because it is [23:51:46] so it can't be cherry picked [23:51:46] yea, it was merged [23:51:47] yeah sorry [23:51:49] and reverted [23:51:56] You need to use one of the other... [23:52:02] i thought i was doing something wrong... bleeeeeh [23:52:05] im reverting my revert! 
[23:53:41] (03PS1) 10RobH: integration.wikimedia.org relocation to misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/109013 [23:54:03] (03PS1) 10RobH: Revert "integration.w.o and m.o moved cname to misc-web-lb" [operations/dns] - 10https://gerrit.wikimedia.org/r/109014 [23:55:48] (03Abandoned) 10RobH: Revert "integration.w.o and m.o moved cname to misc-web-lb" [operations/dns] - 10https://gerrit.wikimedia.org/r/109014 (owner: 10RobH) [23:57:04] (03PS1) 10RobH: integration.w.o and m.o to misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/109015 [23:59:11] (03CR) 10RobH: [C: 032] setup misc-web-lb to cache for blog.w.o server holmium [operations/puppet] - 10https://gerrit.wikimedia.org/r/109008 (owner: 10RobH)