[00:07:34] PROBLEM - Check rp_filter disabled on lvs3004 is CRITICAL: Connection refused by host [00:07:34] PROBLEM - DPKG on lvs3004 is CRITICAL: Connection refused by host [00:07:45] PROBLEM - puppet disabled on lvs3004 is CRITICAL: Connection refused by host [00:07:45] PROBLEM - Disk space on lvs3004 is CRITICAL: Connection refused by host [00:07:45] PROBLEM - check if dhclient is running on lvs3004 is CRITICAL: Connection refused by host [00:07:54] PROBLEM - RAID on lvs3004 is CRITICAL: Connection refused by host [00:07:54] PROBLEM - check configured eth on lvs3004 is CRITICAL: Connection refused by host [00:08:05] PROBLEM - SSH on lvs3004 is CRITICAL: Connection refused [00:08:25] ^d: what's this unmerged interwiki.cbd stuff today? [00:08:39] <^d> Someone updated the interwiki cache. [00:08:44] <^d> Didn't commit the updated file. [00:09:01] <^d> Happened yesterday too? Day before maybe? [00:09:01] ^d: oh, maybe nvm, my scrollback was stuck back in time and I keep thinking I saw you saying the same thing multiple times today [00:09:03] <^d> 2nd time this week. [00:09:08] gotcha [00:12:35] PROBLEM - Host lvs3004 is DOWN: PING CRITICAL - Packet loss = 100% [00:13:52] will those lvs's ever work :) [00:17:44] RECOVERY - Host lvs3004 is UP: PING OK - Packet loss = 0%, RTA = 96.01 ms [00:19:07] ^ not for long! 
[00:37:30] (PS1) BBlack: squid.php: move cp301[34] addrs to private1-esams [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129346 [00:39:12] (CR) BBlack: [C: 2 V: 2] squid.php: move cp301[34] addrs to private1-esams [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129346 (owner: BBlack) [00:39:54] PROBLEM - Host lvs3004 is DOWN: PING CRITICAL - Packet loss = 100% [01:04:04] RECOVERY - SSH on lvs3004 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0) [01:04:14] RECOVERY - Host lvs3004 is UP: PING OK - Packet loss = 0%, RTA = 97.06 ms [01:25:45] RECOVERY - puppet disabled on lvs3004 is OK: OK [01:25:45] RECOVERY - Disk space on lvs3004 is OK: DISK OK [01:25:45] RECOVERY - check if dhclient is running on lvs3004 is OK: PROCS OK: 0 processes with command name dhclient [01:25:54] RECOVERY - check configured eth on lvs3004 is OK: NRPE: Unable to read output [01:25:54] RECOVERY - RAID on lvs3004 is OK: OK: optimal, 2 logical, 2 physical [01:26:34] RECOVERY - DPKG on lvs3004 is OK: All packages OK [01:30:34] RECOVERY - Check rp_filter disabled on lvs3004 is OK: OK: kernel parameters are set to expected value.
[01:33:34] PROBLEM - Host lvs3004 is DOWN: PING CRITICAL - Packet loss = 100% [01:34:44] RECOVERY - Host lvs3004 is UP: PING OK - Packet loss = 0%, RTA = 98.30 ms [01:41:54] PROBLEM - Host db1016 is DOWN: PING CRITICAL - Packet loss = 100% [01:42:05] RECOVERY - Host db1016 is UP: PING OK - Packet loss = 0%, RTA = 1.27 ms [02:13:05] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3468 MB (3% inode=99%): [02:18:05] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3622 MB (3% inode=99%): [02:28:45] http://git-man-page-generator.lokaltog.net/ :) [02:29:16] !log LocalisationUpdate completed (1.23wmf22) at 2014-04-24 02:29:14+00:00 [02:29:25] Logged the message, Master [02:46:00] !log LocalisationUpdate completed (1.24wmf1) at 2014-04-24 02:45:58+00:00 [02:46:06] Logged the message, Master [03:00:44] (03PS1) 10BryanDavis: Exec saltutil.sync_all when adding deployment_server grain [operations/puppet] - 10https://gerrit.wikimedia.org/r/129368 [03:01:05] RECOVERY - Disk space on virt0 is OK: DISK OK [03:05:34] PROBLEM - Puppet freshness on fenari is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 12:04:49 AM UTC [03:07:34] PROBLEM - Puppet freshness on tin is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 12:06:40 AM UTC [03:14:34] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 12:14:05 AM UTC [03:15:02] (03CR) 10Ryan Lane: [C: 031] Exec saltutil.sync_all when adding deployment_server grain [operations/puppet] - 10https://gerrit.wikimedia.org/r/129368 (owner: 10BryanDavis) [03:18:30] (03CR) 10Ori.livneh: Exec saltutil.sync_all when adding deployment_server grain (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129368 (owner: 10BryanDavis) [03:19:49] Ryan_Lane: I wasn't sure if there were other things that the deploy.deployment_server_init should require to make sure that it works on the first puppet run. 
[03:22:07] (PS2) BryanDavis: Exec saltutil.sync_all when adding deployment_server grain [operations/puppet] - https://gerrit.wikimedia.org/r/129368 [03:27:09] (CR) BryanDavis: Exec saltutil.sync_all when adding deployment_server grain (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/129368 (owner: BryanDavis) [03:29:18] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 24 03:29:12 UTC 2014 (duration 29m 11s) [03:29:25] Logged the message, Master [05:11:28] matanya: Logstash test instance is up and running on logstash-dev.eqiad.wmflabs. logstash-deploy is the salt+puppet+trebuchet master node. No data being indexed there yet. [05:11:38] * bd808 sleeps now [05:20:51] !log xtrabackup dbstore1001 to dbstore1002 [05:20:58] Logged the message, Master [05:37:07] <_joe_> springle: this week is very short for me (monday was easter monday, friday is liberation day), so I'm not going to bug you about databases, but I will soon [05:40:21] springle: I'll be glad when https://gerrit.wikimedia.org/r/#/c/129079/ is merged...that always gave me the willies [05:40:53] horrible SPOF [05:42:34] <_joe_> AaronSchulz: looks like it, in fact.
[05:43:54] we should do chaos monkey at least in labs from time to time :) [05:45:49] <_joe_> AaronSchulz: chaos monkey makes sense when you have really geodistributed your app (which we don't really do, at the moment, apart from the caches), IMHO [05:46:31] failing slaves should be handled well, and failing master should not break "read" requests [05:46:54] <_joe_> that's because geodistribution is a *hard* problem and it's going to rely on a lot of moving parts and everything - systems and software - will have to strictly take it into consideration [05:46:57] though if no one actually tests that then it might not actually work due to some silly code in some hook somewhere [05:47:13] <_joe_> AaronSchulz: yes I agree with that, and we should test specifically for that [05:47:33] part of the problem with a monolithic app [05:47:58] one bit of code throws an I/O error due to a partition and a bunch of stuff doesn't work that should [05:48:21] hopefully manageable with testing though [06:02:39] (PS1) ArielGlenn: remove last vestiges of capella, decommed in rt #6160 [operations/dns] - https://gerrit.wikimedia.org/r/129375 [06:03:52] (CR) ArielGlenn: [C: 2] remove last vestiges of capella, decommed in rt #6160 [operations/dns] - https://gerrit.wikimedia.org/r/129375 (owner: ArielGlenn) [06:05:46] <_joe_> capella is a star, did we have stars in pmtpa/sdtpa as well?
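[editor's note] The 05:46 exchange argues that a failed slave or master should not break read requests, and that this only holds if someone actually tests it. A minimal sketch of such a test follows. Everything in it is hypothetical and for illustration only: `Node`, `ReadRouter` and `chaos_round` are made-up names, not MediaWiki or Wikimedia code, and the "databases" are in-memory dictionaries.

```python
import random


class NodeDown(Exception):
    """Raised by a simulated database node that has been 'killed'."""


class Node:
    """Hypothetical stand-in for a database server holding a copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise NodeDown(self.name)
        return self.data[key]


class ReadRouter:
    """Serve reads from any live replica, trying the master last."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def read(self, key):
        # Try replicas first, then the master; skip any node that is down.
        for node in self.replicas + [self.master]:
            try:
                return node.read(key)
            except NodeDown:
                continue
        raise NodeDown("no node could serve the read")


def chaos_round(router, rng):
    """Kill one randomly chosen node and check that a read still succeeds."""
    victim = rng.choice([router.master] + router.replicas)
    victim.alive = False
    try:
        return router.read("page:1")
    finally:
        victim.alive = True  # restore the node for the next round
```

Running `chaos_round` in a loop with a seeded `random.Random` gives a repeatable version of the "chaos monkey in labs" idea: each round takes out one node and asserts that the read path still answers.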
[06:06:34] PROBLEM - Puppet freshness on fenari is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 12:04:49 AM UTC [06:07:31] only Chris and Rob [06:07:54] (03PS1) 10Dzahn: disable account aengels [operations/puppet] - 10https://gerrit.wikimedia.org/r/129376 [06:08:19] <_joe_> ori: I took a look at your patch for diamond, but chasemp was much better than me in rviewing [06:08:34] PROBLEM - Puppet freshness on tin is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 12:06:40 AM UTC [06:10:25] Duplicate definition: Sshkey[cp3013] is already defined in file /etc/puppet/modules/ssh/manifests/hostkey.pp at line 14; .. [06:10:43] <_joe_> mutante: looking into it [06:10:59] _joe_: :) [06:11:09] (03PS1) 10ArielGlenn: remove sql-textxx names from dsh groups (they're long gone from dns) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129377 [06:15:34] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 12:14:05 AM UTC [06:16:04] (03CR) 10ArielGlenn: [C: 032] remove sql-textxx names from dsh groups (they're long gone from dns) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129377 (owner: 10ArielGlenn) [06:20:05] (03PS2) 10Dzahn: disable account aengels [operations/puppet] - 10https://gerrit.wikimedia.org/r/129376 [06:22:39] (03CR) 10Dzahn: [C: 032] disable account aengels [operations/puppet] - 10https://gerrit.wikimedia.org/r/129376 (owner: 10Dzahn) [06:29:35] <_joe_> puppet running again on fenari [06:29:54] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Thu Apr 24 06:29:47 UTC 2014 [06:30:03] :) [06:34:45] PROBLEM - Host cp3014 is DOWN: PING CRITICAL - Packet loss = 100% [06:34:45] RECOVERY - Host cp3014 is UP: PING OK - Packet loss = 0%, RTA = 96.63 ms [06:34:45] PROBLEM - Varnish HTTP mobile-frontend on cp3014 is CRITICAL: Connection refused [06:34:55] PROBLEM - Varnish HTTP mobile-backend on cp3013 is CRITICAL: Connection refused [06:35:06] PROBLEM - Varnish HTTP mobile-frontend on 
cp3013 is CRITICAL: Connection refused [06:36:01] <_joe_> mmmh it seems this did not go well with cp3013. [06:37:25] RECOVERY - Puppet freshness on tin is OK: puppet ran at Thu Apr 24 06:37:19 UTC 2014 [06:37:45] PROBLEM - Varnish HTTP mobile-backend on cp3014 is CRITICAL: Connection refused [06:38:00] <_joe_> !log ran puppetstoredconfigclean.rb for cp3013.esams.wikimedia.org cp3014.esams.wikimedia.org [06:38:06] Logged the message, Master [06:45:25] RECOVERY - Puppet freshness on bast1001 is OK: puppet ran at Thu Apr 24 06:45:22 UTC 2014 [06:47:12] <_joe_> !log also ran puppet node clean to revoke certs, facts, etc (cp3013.esams.wikimedia.org cp3014.esams.wikimedia.org) [06:47:19] Logged the message, Master [07:24:43] (03PS1) 10ArielGlenn: remove pascal (long since dead, maybe from 2008!) [operations/dns] - 10https://gerrit.wikimedia.org/r/129379 [07:32:30] (03PS2) 10ArielGlenn: remove pascal (long since dead, maybe from 2008!) [operations/dns] - 10https://gerrit.wikimedia.org/r/129379 [07:34:38] (03CR) 10ArielGlenn: [C: 032] remove pascal (long since dead, maybe from 2008!) [operations/dns] - 10https://gerrit.wikimedia.org/r/129379 (owner: 10ArielGlenn) [07:47:42] (03PS4) 10Matanya: tcpircbot: add a role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129135 [08:15:48] (03PS2) 10Dzahn: remove labstore 1-4 [operations/dns] - 10https://gerrit.wikimedia.org/r/127264 [08:15:50] (03CR) 10jenkins-bot: [V: 04-1] remove labstore 1-4 [operations/dns] - 10https://gerrit.wikimedia.org/r/127264 (owner: 10Dzahn) [08:22:30] (03PS1) 10Dzahn: remove tampa.tech.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/129383 [08:24:16] (03CR) 10Dzahn: [C: 031] "tampa.tech.wikimedia.org has address 208.80.152.139" [operations/dns] - 10https://gerrit.wikimedia.org/r/129383 (owner: 10Dzahn) [08:27:36] (03CR) 10Dzahn: "used to be: " This is the laptop IP for our on site tech, leave in place. 
"" [operations/dns] - 10https://gerrit.wikimedia.org/r/129383 (owner: 10Dzahn) [08:30:07] apergos: there also was https://wikitech.wikimedia.org/wiki/Tampa_cluster#Misc._Services_Pending_Migration [08:33:20] (03PS2) 10Nemo bis: Set $wgBabelCategoryNames for betawikiversity [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129103 (owner: 10Withoutaname) [08:39:09] (03PS2) 10Dzahn: remove tampa.tech.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/129383 [08:43:59] (03CR) 10Dzahn: [C: 04-2] ""These servers will be relocating. Please contact coren before proceeding with decommissioning "" [operations/dns] - 10https://gerrit.wikimedia.org/r/127264 (owner: 10Dzahn) [08:44:05] (03Abandoned) 10Dzahn: remove labstore 1-4 [operations/dns] - 10https://gerrit.wikimedia.org/r/127264 (owner: 10Dzahn) [08:58:20] (03PS3) 10Giuseppe Lavagetto: Tools to compare compiled puppet catalogs. [operations/software] - 10https://gerrit.wikimedia.org/r/129189 [08:59:13] <_joe_> BTW, I'm gonna merge this ^ as I don't expect anyone to code-review thousands of LOC of bad python and worse shell [09:00:14] (03CR) 10Giuseppe Lavagetto: [C: 032] Tools to compare compiled puppet catalogs. [operations/software] - 10https://gerrit.wikimedia.org/r/129189 (owner: 10Giuseppe Lavagetto) [09:01:36] _joe_: ahhhhh here is the catalog comparator :] [09:02:39] <_joe_> hashar: you'll see it live soon :) [09:02:53] _joe_: what do you mean ? :-] [09:03:02] are you guys building a user interface on top of it? 
[09:04:57] <_joe_> hashar: no, I've just made a lame script that produces html-browsable output [09:05:08] will give it a shot this afternoon [09:05:24] I am wondering whether we can run it on labs instance and use Jenkins has a frontend to it [09:05:45] this way people can trigger the job, be asked the change they want to test out and jenkins will report back with the diff [09:06:27] <_joe_> hashar: I planned on talking with you about this exactly :) [09:06:59] <_joe_> hashar: the whole thing is still hacky, I spent most of last week to work around some puppet crazyness [09:07:11] <_joe_> and a whole lot of ruby WTFs [09:08:31] _joe_: you can probably set it up on a labs instance and we can then pair together to have it registered as a Jenkins slave [09:08:41] then craft a dummy Jenkins job to trigger the script on that labs instance [09:09:12] <_joe_> hashar: I'm doing it now, but right now this script takes a LOT of time to run [09:09:19] well [09:09:21] <_joe_> it's basically compiling all catalogs twice [09:09:26] at least you will get a frontend to it :] [09:09:30] <_joe_> in puppet 2.7 and puppet 3 [09:09:31] can work on making it faster later on [09:09:33] oh [09:10:11] a potential speed up is to save a copy of the generated catalog. 
This way if you later compare another change based on the same parent you already have the catalog for it [09:12:07] <_joe_> hashar: I'm comparing the same catalog (from production head) between different puppet versions [09:12:15] ahhh [09:12:32] <_joe_> this is now for the 2.7 -> 3.x migration [09:14:22] _joe_: ah I was mixing up with Alexandros script that compare catalogs between changes [09:14:34] <_joe_> yes, that is way easier btw [09:14:47] <_joe_> and the script Alex created should do the work [09:17:47] heading out for nap, desperately need some sleep [09:17:58] will give a shot at your script post lunch :) [09:18:18] <_joe_> hashar: I'll have something better by then I hope [09:18:30] :-} [09:18:31] <_joe_> like, a showcase for the output :) [09:18:43] <_joe_> then I'll try to make it faster [09:18:52] we can talk about it over video if you want :] [09:19:12] <_joe_> yep, but please have your nap :) [09:19:48] * hashar sets alarm and sleep(40 * 60) [09:23:00] YAY! _joe_ i admire you [09:34:37] (03Abandoned) 10QChris: Remove unused metrics and metrics-api [operations/dns] - 10https://gerrit.wikimedia.org/r/129134 (owner: 10QChris) [09:44:59] <_joe_> matanya: try to use it with vagrant on your home box [09:47:32] <_joe_> you may need a few yaml facts to use in order for it to work. [09:48:24] thanks _joe_ i have no vagrant atm, and far from having time :) but i will surely try to load it on a labs box [09:49:01] <_joe_> matanya: I'm on it [09:49:09] <_joe_> so, no need for that [09:49:21] <_joe_> I'll grant you access once this works "as expected" [09:49:31] i meant after you are done ... 
:) [10:03:04] (03PS3) 10Matanya: applicationserver: lint and tidy [operations/puppet] - 10https://gerrit.wikimedia.org/r/122269 [10:11:11] (03PS3) 10Dzahn: decom the decom script [operations/puppet] - 10https://gerrit.wikimedia.org/r/125986 [10:13:53] (03PS1) 10ArielGlenn: Revert "turn off rsyncs to/from dataset2, prep for 12th floor move" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129394 [10:18:04] hi all, is it still possible to have a look at icinga.wikimedia.org (like nagios.wikimedia.org a while ago?)? [10:18:42] look at what? [10:19:44] look at icinga? [10:31:38] (03CR) 10Odder: "I believe this can be merged any time since I own the wikisource.pl domain, and I can change DNS settings on my side." [operations/apache-config] - 10https://gerrit.wikimedia.org/r/126969 (owner: 10Odder) [10:40:51] (03PS2) 10ArielGlenn: Revert "turn off rsyncs to/from dataset2, prep for 12th floor move" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129394 [10:42:33] (03CR) 10ArielGlenn: [C: 032] Revert "turn off rsyncs to/from dataset2, prep for 12th floor move" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129394 (owner: 10ArielGlenn) [10:51:07] (03CR) 10Dzahn: [C: 032] add wikisource.pl, link to wikisource.org [operations/dns] - 10https://gerrit.wikimedia.org/r/126968 (owner: 10Dzahn) [10:56:15] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:56:15] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
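[editor's note] At 09:10:11 hashar suggests saving a copy of each generated catalog so that a later comparison against the same parent can reuse it. As _joe_ points out, that helps the change-versus-parent comparison (Alexandros' script) rather than the 2.7 versus 3.x comparison. A rough sketch of the caching idea, under stated assumptions: `CatalogCache` and the `compile_fn` callback are hypothetical names, the catalog is assumed to be JSON-serialisable, and nothing here reflects the actual scripts in operations/software.

```python
import hashlib
import json
from pathlib import Path


class CatalogCache:
    """On-disk cache of compiled catalogs keyed by commit, host and Puppet version."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, commit, host, puppet_version):
        # Hash the key so it is always a safe file name.
        key = f"{commit}:{host}:{puppet_version}".encode()
        return self.root / (hashlib.sha1(key).hexdigest() + ".json")

    def get_or_compile(self, commit, host, puppet_version, compile_fn):
        """Return the cached catalog, compiling and storing it on a miss."""
        path = self._path(commit, host, puppet_version)
        if path.exists():
            return json.loads(path.read_text())
        # compile_fn is a caller-supplied (hypothetical) function that does
        # the slow catalog compilation and returns a JSON-serialisable dict.
        catalog = compile_fn(commit, host, puppet_version)
        path.write_text(json.dumps(catalog))
        return catalog
```

Keying on the Puppet version as well as the commit keeps the 2.7 and 3.x catalogs for the same commit separate, so the cache would not mask the differences the migration comparison is looking for.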
[11:02:06] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [11:02:06] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [11:29:45] RECOVERY - Varnish HTTP mobile-backend on cp3014 is OK: HTTP OK: HTTP/1.1 200 OK - 188 bytes in 0.193 second response time [11:29:46] RECOVERY - Varnish HTTP mobile-frontend on cp3014 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 0.192 second response time [11:29:55] RECOVERY - Varnish HTTP mobile-backend on cp3013 is OK: HTTP OK: HTTP/1.1 200 OK - 188 bytes in 0.191 second response time [11:30:06] RECOVERY - Varnish HTTP mobile-frontend on cp3013 is OK: HTTP OK: HTTP/1.1 200 OK - 261 bytes in 0.193 second response time [11:50:53] !log fixing private4/private6 ACLs to be consistent across all routers [11:51:01] Logged the message, Master [11:53:11] !log reenabling ospf3 between cr1-eqiad/cr2-knams [11:53:17] Logged the message, Master [11:54:00] (03PS1) 10Dzahn: WIP - put apache sync scripts into module [operations/puppet] - 10https://gerrit.wikimedia.org/r/129399 [11:59:53] !log Running deleteEqualMessages.php on nahwiktionary (bug 43917) [11:59:53] !log Running deleteEqualMessages.php on tlwiki (bug 43917) [11:59:53] !log Running deleteEqualMessages.php on suwiki (bug 43917) [11:59:55] !log Running deleteEqualMessages.php on alswiki (bug 43917) [12:00:00] Logged the message, Master [12:00:06] Logged the message, Master [12:00:12] Logged the message, Master [12:00:16] mutante: are the pdf servers still in Tampa? [12:00:19] Logged the message, Master [12:00:24] * Nemo_bis kudos Krinkle  [12:00:35] nahwiktionary is a fantastic name [12:00:55] nah, ya think? [12:01:06] It goes right up the list of nowiki, and dawiki [12:01:11] there's quite a few like that :) [12:01:23] do we have xxxwikipedia yet? 
[12:02:04] two pdf servers are in tampa, the third is dead, and now lunch break [12:02:13] I believe it aliases to enwikipedia, Krinkle [12:02:18] thanks Arial [12:02:24] Ariel [12:02:28] Arial is bad [12:02:48] Well, we have zerowiki and qualitywiki [12:02:49] :P [12:02:57] :D [12:03:01] https://github.com/wikimedia/operations-mediawiki-config/blob/master/all.dblist [12:03:09] I like wuuwiki as well [12:03:23] warwiki, and, for Germans, wowiki [12:05:16] warwiki is a botpedia [12:05:28] tankbotpedia, I suppose [12:09:33] re [12:22:18] !log restarted morebots after upgrade of adminbot to 1.7.5 [12:22:25] Logged the message, Master [12:24:30] <_joe_> morebots logging himself [12:24:31] I am a logbot running on tools-exec-09. [12:24:31] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [12:24:31] To log a message, type !log . [12:24:52] <_joe_> uh, sorry :) [12:52:09] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor stuff. otherwise LGTM" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129135 (owner: 10Matanya) [13:09:01] (03PS5) 10Matanya: tcpircbot: add a role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129135 [13:11:42] (03PS6) 10Matanya: tcpircbot: add a role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129135 [13:12:21] (03PS2) 10Alexandros Kosiaris: Add IPv6 address to carbon [operations/puppet] - 10https://gerrit.wikimedia.org/r/127895 [13:13:52] (03CR) 10Alexandros Kosiaris: [C: 032] tcpircbot: add a role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129135 (owner: 10Matanya) [13:14:20] (03PS3) 10Alexandros Kosiaris: Add IPv6 address to carbon [operations/puppet] - 10https://gerrit.wikimedia.org/r/127895 [13:14:29] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Add IPv6 address to carbon [operations/puppet] - 10https://gerrit.wikimedia.org/r/127895 (owner: 10Alexandros Kosiaris) [13:18:07] akosiaris: anything else needed for enabling ferm on neon? [13:18:33] I think not. 
I was thinking about it right now. Wanna try and see what happens ? [13:18:55] i would like to, i feel brave (or stupid) [13:19:07] be bold then :-) [13:19:29] (the not too bold part, we already covered with the previous patchsets I believe) [13:20:15] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:20:15] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:20:27] <_joe|away> brave and stupid, not OR [13:21:05] <_joe|away> braveness almost always implies stupidity :) [13:21:06] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [13:21:07] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [13:21:29] (03PS1) 10Hoo man: Add two languages not supported by MediaWiki to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 [13:21:34] <_joe_> but then be brave! [13:24:21] so matanya ? wanna do the honours of submitting an inclusion of base::firewall on neon ? [13:24:38] (03CR) 10Amire80: [C: 04-1] Add two languages not supported by MediaWiki to testwikidata (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 (owner: 10Hoo man) [13:24:40] already doing :) [13:24:45] had the manager issue [13:24:57] OK. 
I 'll merge https://gerrit.wikimedia.org/r/#/c/117674/ and that one together and see what happens [13:26:14] (03CR) 10Hoo man: Add two languages not supported by MediaWiki to testwikidata (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 (owner: 10Hoo man) [13:26:37] (03PS2) 10Hoo man: Add two languages not supported by MediaWiki to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 [13:27:13] (03PS1) 10Matanya: firewall: enable on neon [operations/puppet] - 10https://gerrit.wikimedia.org/r/129410 [13:27:18] here we go [13:28:51] (03PS2) 10Matanya: firewall: enable on neon [operations/puppet] - 10https://gerrit.wikimedia.org/r/129410 [13:30:44] (03CR) 10Alexandros Kosiaris: [C: 032] firewall: enable on neon [operations/puppet] - 10https://gerrit.wikimedia.org/r/129410 (owner: 10Matanya) [13:31:09] <_joe_> akosiaris: I was giving -1 for style [13:31:19] ahahaha [13:31:31] <_joe_> matanya: when including classes without parameters, you should use include module::class [13:31:34] <_joe_> matanya: As the class is included without parameters, it's better to just [13:31:37] <_joe_> include base::firewall [13:31:38] <_joe_> see http://docs.puppetlabs.com/puppet/latest/reference/lang_classes.html#include-like-vs-resource-like [13:31:55] yes it is [13:32:12] but I think he followed ottomata's example on stat1003 [13:32:21] <_joe_> ok :) [13:32:22] so let's just change both at some point [13:32:40] and as the comment says there... hopefully some day move it to standard [13:32:57] !log reedy updated /a/common to {{Gerrit|I543df75e3}}: Remove $wgDisableTextSearch and $wgDisableSearchUpdate overrides. 
[13:33:00] i chose to follow that to flag this is temp [13:33:04] Logged the message, Master [13:33:08] and should be moved some day [13:33:29] i agree it is better to use the normal way [13:34:01] a in case you had not noticed _joe_, logmsgbot logs to morebots [13:35:04] (03PS1) 10Reedy: Add 1.24wmf2 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129411 [13:35:06] (03PS1) 10Reedy: testwiki to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129412 [13:35:08] (03PS1) 10Reedy: Wikipedias to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129413 [13:35:10] (03PS1) 10Reedy: group0 wikis to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129414 [13:35:14] (03PS2) 10Reedy: Add 1.24wmf2 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129411 [13:35:31] (03CR) 10Reedy: [C: 032] Add 1.24wmf2 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129411 (owner: 10Reedy) [13:35:39] (03Merged) 10jenkins-bot: Add 1.24wmf2 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129411 (owner: 10Reedy) [13:36:13] akosiaris: i'm leaving town for a few although it is against the rules. please keep an eye :) [13:36:20] (03PS2) 10Reedy: testwiki to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129412 [13:36:30] PROBLEM - tcpircbot_service_running on neon is CRITICAL: PROCS CRITICAL: 3 processes with args tcpircbot [13:36:30] :-) [13:36:32] (03CR) 10Reedy: [C: 032] testwiki to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129412 (owner: 10Reedy) [13:36:39] (03Merged) 10jenkins-bot: testwiki to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129412 (owner: 10Reedy) [13:36:44] here is one fix i need to do [13:36:51] <_joe_> ok this is the first issue :) [13:37:04] heh ? 
[13:37:11] PROBLEM - tcpircbot_service_running on neon is CRITICAL: PROCS CRITICAL: 3 processes with args tcpircbot [13:37:21] !log reedy Started scap: testwiki to 1.24wmf2 build l10n cache [13:37:24] no the heh was at the 3 [13:37:26] Logged the message, Master [13:37:32] yesterday it was happily 1 [13:37:43] needs to be 1:3 not 1:1 in check command? [13:38:04] let me check, cause I did that already yesterday [13:44:10] PROBLEM - Host barium is DOWN: PING CRITICAL - Packet loss = 100% [13:44:20] PROBLEM - Host boron is DOWN: PING CRITICAL - Packet loss = 100% [13:44:23] (03CR) 10Ottomata: "Some comments inline." (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129368 (owner: 10BryanDavis) [13:44:30] PROBLEM - Host tellurium is DOWN: PING CRITICAL - Packet loss = 100% [13:44:56] those must be wrong... [13:45:37] !log reedy Finished scap: testwiki to 1.24wmf2 build l10n cache (duration: 08m 16s) [13:45:44] Logged the message, Master [13:46:41] !log reedy Started scap: testwiki to 1.24wmf2 build l10n cache (take 2) [13:46:47] Logged the message, Master [13:47:06] helps if you git pull first [13:47:49] <_joe_> akosiaris: mmmh [13:49:45] (03PS1) 10Alexandros Kosiaris: Fix ferm syntax errors on icinga [operations/puppet] - 10https://gerrit.wikimedia.org/r/129416 [13:54:10] (03PS2) 10Alexandros Kosiaris: Fix ferm syntax errors on icinga [operations/puppet] - 10https://gerrit.wikimedia.org/r/129416 [13:54:15] so, 3 frack hosts... with both internal and public IPs in DNS... something is not right here... 
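[editor's note] The 13:37 question ("needs to be 1:3 not 1:1 in check command?") is about the threshold range passed to check_procs: the check goes CRITICAL when the process count falls outside the range. The sketch below re-implements that range logic in Python for illustration, following the range format in the Nagios plugin guidelines as the editor understands it (`N`, `N:`, `~:N`, `N:M`, with a leading `@` inverting the test). It is not the check_procs source, and `parse_range` and `alerts` are made-up helper names.

```python
def parse_range(spec):
    """Parse a Nagios-style threshold range into (start, end, inside_alerts)."""
    inside_alerts = spec.startswith("@")
    body = spec[1:] if inside_alerts else spec
    if ":" in body:
        start_s, end_s = body.split(":", 1)
    else:
        # A bare number N means the range 0..N.
        start_s, end_s = "0", body
    start = float("-inf") if start_s == "~" else float(start_s or 0)
    end = float("inf") if end_s == "" else float(end_s)
    return start, end, inside_alerts


def alerts(value, spec):
    """Return True if value should raise an alert for the given range."""
    start, end, inside_alerts = parse_range(spec)
    inside = start <= value <= end
    # Normally an alert fires outside the range; '@' flips that.
    return inside if inside_alerts else not inside
```

With three tcpircbot processes matching, `alerts(3, "1:1")` is true (the CRITICAL seen on neon) while `alerts(3, "1:3")` is false, which is the change discussed in the channel.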
[13:55:36] (03PS1) 10coren: Labs: Remove dead hosts from dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129417 [13:56:33] (03CR) 10RobH: [C: 031] Labs: Remove dead hosts from dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129417 (owner: 10coren) [13:56:42] woo cruft cleanup \o/ [13:57:41] (03CR) 10Alexandros Kosiaris: [C: 032] Fix ferm syntax errors on icinga [operations/puppet] - 10https://gerrit.wikimedia.org/r/129416 (owner: 10Alexandros Kosiaris) [13:59:25] (03PS1) 10Alexandros Kosiaris: snmptraps are UDP not TCP [operations/puppet] - 10https://gerrit.wikimedia.org/r/129418 [13:59:57] (03PS1) 10coren: Labs: remove dead hosts from DNS [operations/dns] - 10https://gerrit.wikimedia.org/r/129419 [14:01:05] (03CR) 10Alexandros Kosiaris: [C: 032] snmptraps are UDP not TCP [operations/puppet] - 10https://gerrit.wikimedia.org/r/129418 (owner: 10Alexandros Kosiaris) [14:01:08] (03PS2) 10coren: Labs: remove dead hosts from DNS [operations/dns] - 10https://gerrit.wikimedia.org/r/129419 [14:02:27] !log reedy Finished scap: testwiki to 1.24wmf2 build l10n cache (take 2) (duration: 15m 46s) [14:02:34] Logged the message, Master [14:03:22] (03CR) 10coren: [C: 032] Labs: remove dead hosts from DNS [operations/dns] - 10https://gerrit.wikimedia.org/r/129419 (owner: 10coren) [14:06:14] (03CR) 10coren: [C: 032] Labs: Remove dead hosts from dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129417 (owner: 10coren) [14:06:45] RobH: I'm gonna have some of that cleanup goodness for you and chris too! [14:08:19] our boot files are going to be half as large [14:08:27] boot config files that is [14:14:54] akosiaris: so you added http/s and removed snmp? 
[14:15:20] RECOVERY - Host tellurium is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms [14:15:20] RECOVERY - Host barium is UP: PING OK - Packet loss = 0%, RTA = 0.71 ms [14:15:30] RECOVERY - Host boron is UP: PING OK - Packet loss = 0%, RTA = 0.71 ms [14:16:26] matanya: yes [14:16:56] akosiaris: there is no need for snmp on neon? [14:17:28] nope [14:17:36] neon is the client not the server [14:25:20] PROBLEM - RAID on stat1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:26:10] RECOVERY - RAID on stat1003 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [14:26:42] matanya: snmptrapd [14:30:54] (03PS1) 10Alexandros Kosiaris: Fix ferm syntax error in role::tcpircbot [operations/puppet] - 10https://gerrit.wikimedia.org/r/129428 [14:31:15] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Fix ferm syntax error in role::tcpircbot [operations/puppet] - 10https://gerrit.wikimedia.org/r/129428 (owner: 10Alexandros Kosiaris) [14:31:37] let's inform the rest of ops now about the change [14:33:14] sorry for the trouble akosiaris :/ [14:33:25] nothing to be sorry about [14:33:52] we moved neon to ferm with minimal problems. It could be way worse [14:34:01] at last, ferm on the host! [14:34:14] which host next? :) [14:35:20] good question [14:36:46] i would say carbon, as it is relatively new, and has important stuff on it, what do you think? [14:39:33] antimony would be a good candidate for me too, but i fear too sensitive. 
[14:39:39] (03PS1) 10Dzahn: allow 1-3 procs for tcpircbot check_procs [operations/puppet] - 10https://gerrit.wikimedia.org/r/129430 [14:41:38] (03CR) 10Matanya: [C: 031] allow 1-3 procs for tcpircbot check_procs [operations/puppet] - 10https://gerrit.wikimedia.org/r/129430 (owner: 10Dzahn) [14:42:35] (03PS1) 10Andrew Bogott: Remove access for four former analytics users: [operations/puppet] - 10https://gerrit.wikimedia.org/r/129434 [14:42:37] (03PS1) 10Andrew Bogott: Remove access for Jeremy Postlethwaite [operations/puppet] - 10https://gerrit.wikimedia.org/r/129435 [14:42:52] (03Abandoned) 10Andrew Bogott: Disable Andre Engels shell accounts. [operations/puppet] - 10https://gerrit.wikimedia.org/r/129321 (owner: 10Andrew Bogott) [14:42:54] (03CR) 10Dzahn: [C: 031] "root@neon:~# /usr/lib/nagios/plugins/check_procs -c 1:3 -a tcpircbot" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129430 (owner: 10Dzahn) [14:44:04] Coren and/or mutante, can I get a quick review of those two access patches? ^ and ^^ [14:44:16] * Coren takes a look. [14:45:25] Wait, I expect you mean the not abandonned ones. :-) [14:45:49] Yeah, not the abandoned one -- mutante wrote an identical patch while mine was pending :) [14:45:52] (03PS2) 10Dzahn: use --ereg-argument-array for tcpircbot check_procs [operations/puppet] - 10https://gerrit.wikimedia.org/r/129430 [14:45:55] (03PS1) 10ArielGlenn: remove dead ps1-xxx in sdtpa, gone [operations/dns] - 10https://gerrit.wikimedia.org/r/129437 [14:46:02] (03CR) 10coren: [C: 031] "Straightforward" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129435 (owner: 10Andrew Bogott) [14:46:57] (03CR) 10coren: [C: 031] "Straightforward." [operations/puppet] - 10https://gerrit.wikimedia.org/r/129434 (owner: 10Andrew Bogott) [14:47:10] (03CR) 10Dzahn: [C: 031] "you are checking against the etherpad.. or ?" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129434 (owner: 10Andrew Bogott) [14:47:58] mutante: I don't know what the etherpad is, in this context. [14:48:15] Last night I wrote all the engineering managers with a list of users, suspecting that many of them were stale. [14:48:18] Acting on their responses. [14:48:24] andrewbogott: the one ottomata made during stat1 audit [14:48:47] Oh… well anyway the answer is 'no' :) [14:49:00] If you link me I can update it [14:50:25] (03CR) 10Andrew Bogott: [C: 032] Remove access for four former analytics users: [operations/puppet] - 10https://gerrit.wikimedia.org/r/129434 (owner: 10Andrew Bogott) [14:51:00] andrewbogott: i linked it on the ticket you created [14:51:06] thx [14:51:39] this is basically reopening 6789 [14:51:56] reopens it [14:52:24] sanjayb: if you write something we should just integrate it into the bot so it will always be up to date [14:52:28] gah, sorry, wrong channel [14:52:45] JFYI fundraising is running a banner campaign now, in case of any site-load surprises [14:55:23] (03PS1) 10Alexandros Kosiaris: Add frack to icinga ncsa/nrpe firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/129441 [14:55:49] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Add frack to icinga ncsa/nrpe firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/129441 (owner: 10Alexandros Kosiaris) [14:58:06] (03CR) 10Dzahn: [C: 032] "root@neon:~# /usr/lib/nagios/plugins/check_procs -c 1:1 --ereg-argument-array "tcpircbot.py logmsgbot.json"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129430 (owner: 10Dzahn) [15:09:29] (03CR) 10Dzahn: [V: 032] "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129430 (owner: 10Dzahn) [15:13:01] (03PS2) 10Dzahn: Remove access for four former analytics users: [operations/puppet] - 10https://gerrit.wikimedia.org/r/129434 (owner: 10Andrew Bogott) [15:13:38] (03CR) 10Alexandros Kosiaris: use --ereg-argument-array for tcpircbot check_procs (031 comment) 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129430 (owner: 10Dzahn) [15:13:47] (03CR) 10Dzahn: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129434 (owner: 10Andrew Bogott) [15:13:50] ah... you already merged... [15:14:02] akosiaris: i waited for jenkins.. but... [15:14:09] it did not reply yet [15:14:29] so I already tried your approach, it fails because it also matches itself (funny heh?) [15:14:42] akosiaris: it works [15:14:59] well, i mean, i tried it on neon before [15:15:13] root@neon:~# /usr/lib/nagios/plugins/check_procs -c 1:1 --ereg-argument-array "tcpircbot.py logmsgbot.json" PROCS OK: 1 process with regex args 'tcpircbot.py logmsgbot.json' [15:15:31] ah yes [15:15:33] so that works [15:15:39] and the previous one worked as well [15:15:45] but try invoking it via nrpe [15:15:55] and it fails [15:16:01] arg..ah [15:16:13] cause nrpe will spawn a shell and blah blah [15:17:15] hmmm,, i see, still surprised a bit because i used it in the past to fix exactly this [15:17:32] ok, let's use your suggestion [15:18:02] f.e. nrpe_command => "/usr/lib/nagios/plugins/check_procs -w 2:2 -c 2:2 --ereg-argument-array '^/usr/bin/python /usr/local/bin/zuul-server'" etc [15:18:32] ah, i need to run to get to bank before it closes [15:19:13] mutante: /usr/lib/nagios/plugins/check_nrpe -H neon.wikimedia.org -c check_tcpircbot2 [15:19:13] CMD: /bin/ps axwwo 'stat uid pid ppid vsz rss pcpu etime comm args' [15:19:13] Matched: uid=108 vsz=4404 rss=612 pid=21216 ppid=21215 pcpu=0.00 stat=S etime=00:00 prog=sh args=sh -c /usr/lib/nagios/plugins/check_procs -vv -w 1:1 -c 1:1 --ereg-argument-array 'tcpircbot.py logmsgbot.py' [15:19:30] so it works because you match the matcher [15:19:44] and not the actual process ? 
[15:20:27] because it is 2:2 , yea [15:20:38] so both need to be there for it to be OK [15:20:49] heh [15:21:10] should i upload the change you suggested?:) [15:21:18] i guess you dont want -vv though [15:21:58] a yes [15:22:05] quite right, quite right [15:24:02] (03PS1) 10Dzahn: fix check_procs for tcpircbot [operations/puppet] - 10https://gerrit.wikimedia.org/r/129445 [15:28:48] Does anyone know how to reset jenkins, or are we all waiting for hashar to return? [15:29:12] andrewbogott: service jenkins restart ? :) [15:29:36] akosiaris: want to pick a host? i have some time to hack on one if you want to suggest any [15:30:11] is it jenkins for starters ? it might be zuul... [15:31:03] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] fix check_procs for tcpircbot [operations/puppet] - 10https://gerrit.wikimedia.org/r/129445 (owner: 10Dzahn) [15:31:44] yeah, I don't know which bit is broken [15:31:51] Could be w/in gerrit even [15:31:58] matanya: I got https://gerrit.wikimedia.org/r/#/c/96980/ for ytterbium [15:32:11] whooOooO. icinga is happy again. [15:32:25] Jeff_Green: :-) [15:32:47] I'm going to add sms to the notify list for some of these passive checks as you suggested [15:33:01] akosiaris: that won't work, you have svn on that host as well [15:33:19] Jeff_Green: cool [15:33:35] matanya: I know, it needs more stuff as well [15:33:53] ah, ok [15:33:57] I can tell you which ports are open and which services [15:34:13] please do :) [15:36:38] 10080 for amanda (ignore this one), 22 ssh (done via base::firewall), 80,443 (this is probably needed, apache), 29418 (gerrit's ssh) [15:36:41] greg-g: ping [15:38:01] (03CR) 10BryanDavis: ">Won't this automatically deploy everything anytime a new deployment server is added?" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129368 (owner: 10BryanDavis) [15:38:06] zuul seems to be getting behind.... [15:38:13] amanda is being phased out by bacula, right? 
[15:38:31] yes [15:38:44] (03CR) 10Daniel Kinzler: [C: 031] "We want to experiment with extra languages, please." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 (owner: 10Hoo man) [15:38:48] (03PS1) 10Alexandros Kosiaris: Update frack ncsa/nrpe icinga rules [operations/puppet] - 10https://gerrit.wikimedia.org/r/129448 [15:39:35] manybubbles: Shall I do my swat change myself? [15:39:48] so your patch is enough and covers all ? [15:40:24] is it ? [15:40:31] hoo: oh, sure, if you'd like. [15:41:02] heh... [15:41:02] sorry, I figured someone'd ping me if they wanted one [15:41:03] manybubbles: :) [15:41:04] but if you feel up to it, go ahead [15:41:06] feared you forgot it, to be honest [15:41:42] (03PS3) 10Hoo man: Add two languages not supported by MediaWiki to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 [15:42:27] (03CR) 10Hoo man: [C: 032] Add two languages not supported by MediaWiki to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 (owner: 10Hoo man) [15:43:24] (03CR) 10Matanya: [C: 031] Add gerrit ferm rules for production [operations/puppet] - 10https://gerrit.wikimedia.org/r/96980 (owner: 10Alexandros Kosiaris) [15:43:41] hoo: do you want me to sync it? [15:43:51] manybubbles: Nah, am already on tin [15:44:03] hoo: sweet. [15:44:05] matanya: heh, I did notice however something cause of that. bacula is not included on that host [15:44:26] yes, i saw that too [15:44:30] Is jenkins taking a nap? :P [15:44:52] was wondering if it was intentional [15:45:02] !log restarted jenkins and zuul on gallium [15:45:09] Logged the message, Master [15:45:19] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Update frack ncsa/nrpe icinga rules [operations/puppet] - 10https://gerrit.wikimedia.org/r/129448 (owner: 10Alexandros Kosiaris) [15:46:11] (03CR) 10Ottomata: "Ah, ok phew. I was getting that confused with git deploy sync. Thanks!" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129368 (owner: 10BryanDavis) [15:46:15] hoo: yeah, its a bit out to lunch. [15:46:38] you can verify the change if the syntax looks good to you. I glanced at it but didn't syntax check it [15:46:52] also, sync-file will check the syntax too [15:46:54] it already passed the syntax check [15:47:04] (03CR) 10Hoo man: [V: 032] Add two languages not supported by MediaWiki to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 (owner: 10Hoo man) [15:47:34] matanya: not it wasn't... [15:47:36] !log zuul on gallium is dead and I don't know why [15:47:43] Logged the message, Master [15:47:57] only has include backup::client [15:48:01] Oh, but jenkins lives! [15:48:09] !log hoo synchronized wmf-config/InitialiseSettings.php 'Add two languages not supported by MediaWiki to testwikidata' [15:48:15] Logged the message, Master [15:48:28] !log hoo updated /a/common to {{Gerrit|I53de8d84b}}: Add two languages not supported by MediaWiki to testwikidata [15:48:35] Logged the message, Master [15:48:37] oh well [15:49:02] (03PS3) 10Andrew Bogott: Remove access for four former analytics users: [operations/puppet] - 10https://gerrit.wikimedia.org/r/129434 [15:49:11] Ignore the second thing... I triggered the hook per hand [15:49:15] * hoo is done [15:49:51] ori: Looks like the post-merge is not being run on fast forward :( [15:49:59] !log scheduled a safe restart of jenkins [15:50:06] Logged the message, Master [15:50:30] but given it is building mediawiki-core-code-coverage it is probably gonna take some time [15:51:48] akosiaris: was my restart of jenkins 'unsafe'? [15:52:42] andrewbogott: I have no idea... I followed advice from a mail by antoine some time ago when you log the zuul problem [15:52:49] ok [15:52:58] hoping it might help... 
[15:53:58] (03CR) 10RobH: [C: 031] remove dead ps1-xxx in sdtpa, gone [operations/dns] - 10https://gerrit.wikimedia.org/r/129437 (owner: 10ArielGlenn) [15:54:40] RECOVERY - tcpircbot_service_running on neon is OK: PROCS OK: 1 process with command name python, args tcpircbot.py [15:57:02] hoo: pong, but actually, redirect ping to Deskana [15:57:09] I'll be mostly if not entirely offline today [15:57:32] greg-g: The swat stuff has already been answered... who's going to do the MW deploy tonight? [15:57:46] Reedy [15:58:58] Reedy: https://gerrit.wikimedia.org/r/129446 should be pulled in before the deploy [15:59:03] greg-g: thanks :) [16:00:07] (03PS1) 10Alexandros Kosiaris: Backup gerrit review site [operations/puppet] - 10https://gerrit.wikimedia.org/r/129453 [16:00:35] (03PS3) 10BryanDavis: Exec saltutil.sync_all when adding deployment_server grain [operations/puppet] - 10https://gerrit.wikimedia.org/r/129368 [16:13:25] !log Killed a leftover jenkins process on gallium [16:13:30] Logged the message, Master [16:14:59] !log Jenkins ended up being stalled due to a known unfigured out issue :/ [16:15:05] Logged the message, Master [16:15:42] hashar: Thanks. akosiaris and I each tried to restart things but to no avail [16:18:00] andrewbogott: akosiaris: yeah the Jenkins init script is bugged. I have been too lazy to fix it up properly [16:18:06] basically it does not track the proper pid [16:18:12] and does not wait for the targeted process to die [16:18:26] I should really fix it up and use our own init script instead of the one provided by the package :/ [16:18:58] and of course, fix the bug that causes Zuul/Jenkins to stall [16:19:33] (03PS1) 10Giuseppe Lavagetto: Use jinja2 templates, various fixes. [operations/software] - 10https://gerrit.wikimedia.org/r/129456 [16:25:09] hashar: is jenkins stuck again, or just seriously backlogged? 
[16:25:20] * hashar looks at https://integration.wikimedia.org/zuul/ [16:25:24] it dies [16:25:28] err Zuul dies [16:25:51] valar morghulis [16:25:52] !log restarting Zuul. Got asked to be stopped. [16:25:56] rip zuul [16:25:59] Logged the message, Master [16:26:20] PROBLEM - Puppet freshness on mw1199 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:25:51 PM UTC [16:26:28] andrewbogott: Zuul is up again, it is now processing queued events [16:27:16] hoo: Does it have i18n updates? [16:27:32] Reedy: my Wikidata update? Nope, only a JS fix [16:27:37] great [16:29:39] !log reedy synchronized php-1.24wmf2/extensions/Wikidata 'I43988505ea0fd7ac6b1278a50237e0e1d3ee0e9e' [16:29:46] Logged the message, Master [16:36:18] (03PS2) 10Andrew Bogott: Remove access for Jeremy Postlethwaite [operations/puppet] - 10https://gerrit.wikimedia.org/r/129435 [16:41:07] (03CR) 10Alexandros Kosiaris: [C: 032] Backup gerrit review site [operations/puppet] - 10https://gerrit.wikimedia.org/r/129453 (owner: 10Alexandros Kosiaris) [16:42:03] !log restarted both Zuul and Jenkins [16:42:09] Logged the message, Master [16:42:40] (03CR) 10Cmjohnson: [C: 031] "I agree that this should go away in it's current state. 
We should come up with another automated way to do this but for now running puppet" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125986 (owner: 10Dzahn) [16:43:34] (03CR) 10Cmjohnson: [C: 031] remove dead ps1-xxx in sdtpa, gone [operations/dns] - 10https://gerrit.wikimedia.org/r/129437 (owner: 10ArielGlenn) [16:44:06] (03CR) 10Cmjohnson: [C: 031] remove tampa.tech.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/129383 (owner: 10Dzahn) [16:44:20] PROBLEM - Puppet freshness on search1012 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:43:59 PM UTC [16:45:20] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:44:44 PM UTC [16:45:20] PROBLEM - Puppet freshness on mw1173 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:44:24 PM UTC [16:46:06] akosiaris: Is there something I can test regarding jsduck? [16:46:17] Haven't done with it since the failed install yesterday [16:46:20] PROBLEM - Puppet freshness on db1051 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:46:00 PM UTC [16:46:20] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:45:24 PM UTC [16:46:20] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:46:05 PM UTC [16:46:20] PROBLEM - Puppet freshness on lvs1002 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:45:19 PM UTC [16:46:20] PROBLEM - Puppet freshness on logstash1002 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:46:00 PM UTC [16:46:20] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:45:55 PM UTC [16:46:20] PROBLEM - Puppet freshness on mw1166 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:45:45 PM UTC [16:46:21] PROBLEM - Puppet freshness on search1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:45:29 PM UTC 
[16:46:21] PROBLEM - Puppet freshness on ms-be1015 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:45:35 PM UTC [16:46:22] PROBLEM - Puppet freshness on wtp1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:45:14 PM UTC [16:46:22] PROBLEM - Puppet freshness on ssl1003 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:45:50 PM UTC [16:46:27] ohhhhhh grrrr [16:47:03] (03CR) 10Hashar: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129435 (owner: 10Andrew Bogott) [16:47:09] (03CR) 10ArielGlenn: [C: 032] remove dead ps1-xxx in sdtpa, gone [operations/dns] - 10https://gerrit.wikimedia.org/r/129437 (owner: 10ArielGlenn) [16:47:14] ok I got this [16:47:20] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:46:15 PM UTC [16:47:20] PROBLEM - Puppet freshness on antimony is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:46:51 PM UTC [16:47:20] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:46:25 PM UTC [16:47:20] PROBLEM - Puppet freshness on mw1194 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:46:15 PM UTC [16:47:20] PROBLEM - Puppet freshness on mw1208 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:46:56 PM UTC [16:47:20] PROBLEM - Puppet freshness on search1003 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:46:10 PM UTC [16:47:20] PROBLEM - Puppet freshness on search1007 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:46:20 PM UTC [16:48:05] (03CR) 10Hashar: "recheck" [operations/software] - 10https://gerrit.wikimedia.org/r/129456 (owner: 10Giuseppe Lavagetto) [16:48:20] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:47:47 PM UTC [16:48:20] PROBLEM - Puppet freshness on db1058 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:47:47 PM UTC [16:48:20] PROBLEM 
- Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:47:12 PM UTC [16:48:20] PROBLEM - Puppet freshness on elastic1016 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:47:37 PM UTC [16:48:20] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:47:32 PM UTC [16:48:20] PROBLEM - Puppet freshness on mw1169 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:48:02 PM UTC [16:48:20] PROBLEM - Puppet freshness on mw1179 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:47:12 PM UTC [16:48:21] PROBLEM - Puppet freshness on nickel is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:48:07 PM UTC [16:48:21] PROBLEM - Puppet freshness on ssl1005 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:47:42 PM UTC [16:49:02] (03CR) 10jenkins-bot: [V: 04-1] Use jinja2 templates, various fixes. [operations/software] - 10https://gerrit.wikimedia.org/r/129456 (owner: 10Giuseppe Lavagetto) [16:49:20] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:49:02 PM UTC [16:49:20] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:48:47 PM UTC [16:49:20] PROBLEM - Puppet freshness on erbium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:48:17 PM UTC [16:49:20] PROBLEM - Puppet freshness on mw1165 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:48:47 PM UTC [16:49:20] PROBLEM - Puppet freshness on mw1180 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:48:12 PM UTC [16:49:20] PROBLEM - Puppet freshness on search1005 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:48:22 PM UTC [16:49:21] PROBLEM - Puppet freshness on search1021 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:48:32 PM UTC [16:50:17] ok off for dinner [16:50:20] PROBLEM - Puppet freshness on 
db1055 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:49:53 PM UTC [16:50:20] PROBLEM - Puppet freshness on es1009 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:49:43 PM UTC [16:50:20] PROBLEM - Puppet freshness on hafnium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:49:28 PM UTC [16:50:20] PROBLEM - Puppet freshness on labsdb1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:49:53 PM UTC [16:50:20] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:49:38 PM UTC [16:50:20] PROBLEM - Puppet freshness on mw1181 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:49:48 PM UTC [16:50:21] PROBLEM - Puppet freshness on search1023 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:49:48 PM UTC [16:51:20] PROBLEM - Puppet freshness on aluminium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:50:58 PM UTC [16:51:20] PROBLEM - Puppet freshness on db1062 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:50:18 PM UTC [16:51:20] PROBLEM - Puppet freshness on mw1198 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:50:13 PM UTC [16:51:20] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:50:33 PM UTC [16:51:20] PROBLEM - Puppet freshness on wtp1002 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:50:28 PM UTC [16:52:20] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:51:23 PM UTC [16:52:20] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:52:03 PM UTC [16:52:20] PROBLEM - Puppet freshness on caesium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:51:48 PM UTC [16:52:20] PROBLEM - Puppet freshness on db1053 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:52:03 PM UTC [16:52:20] PROBLEM - 
Puppet freshness on lvs1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:51:43 PM UTC [16:52:20] PROBLEM - Puppet freshness on manutius is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:51:43 PM UTC [16:52:20] PROBLEM - Puppet freshness on mw1178 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:51:33 PM UTC [16:52:21] PROBLEM - Puppet freshness on mw1200 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:51:58 PM UTC [16:52:21] PROBLEM - Puppet freshness on netmon1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:52:08 PM UTC [16:52:22] PROBLEM - Puppet freshness on ssl1007 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:51:53 PM UTC [16:53:20] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:09 PM UTC [16:53:20] PROBLEM - Puppet freshness on cp1039 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:04 PM UTC [16:53:20] PROBLEM - Puppet freshness on cp1055 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:52:54 PM UTC [16:53:20] PROBLEM - Puppet freshness on magnesium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:52:14 PM UTC [16:53:20] PROBLEM - Puppet freshness on db1063 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:52:14 PM UTC [16:53:20] PROBLEM - Puppet freshness on search1011 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:04 PM UTC [16:53:21] PROBLEM - Puppet freshness on mw1174 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:09 PM UTC [16:53:21] PROBLEM - Puppet freshness on mw1185 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:52:34 PM UTC [16:53:22] PROBLEM - Puppet freshness on wtp1006 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:52:39 PM UTC [16:53:22] PROBLEM - Puppet freshness on ssl1002 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:52:54 PM UTC [16:54:20] 
PROBLEM - Puppet freshness on dbstore1002 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:19 PM UTC [16:54:20] PROBLEM - Puppet freshness on es1008 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:19 PM UTC [16:54:20] PROBLEM - Puppet freshness on elastic1012 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:29 PM UTC [16:54:20] PROBLEM - Puppet freshness on search1006 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:14 PM UTC [16:54:20] PROBLEM - Puppet freshness on search1016 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:19 PM UTC [16:54:20] PROBLEM - Puppet freshness on stat1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:54:09 PM UTC [16:54:20] PROBLEM - Puppet freshness on ssl1006 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:29 PM UTC [16:54:21] PROBLEM - Puppet freshness on wtp1016 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:53:39 PM UTC [16:54:56] (03CR) 10Andrew Bogott: [C: 032] Remove access for Jeremy Postlethwaite [operations/puppet] - 10https://gerrit.wikimedia.org/r/129435 (owner: 10Andrew Bogott) [16:55:20] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:54:51 PM UTC [16:55:20] PROBLEM - Puppet freshness on db1056 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:54:56 PM UTC [16:55:20] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:54:46 PM UTC [16:55:20] PROBLEM - Puppet freshness on mw1203 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:54:25 PM UTC [16:55:20] PROBLEM - Puppet freshness on search1004 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:54:15 PM UTC [16:55:20] PROBLEM - Puppet freshness on terbium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:54:51 PM UTC [16:55:21] PROBLEM - Puppet freshness on wtp1008 is CRITICAL: Last 
successful Puppet run was Thu 24 Apr 2014 01:54:35 PM UTC [16:56:20] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:55:47 PM UTC [16:56:20] PROBLEM - Puppet freshness on cp1053 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:55:11 PM UTC [16:56:20] PROBLEM - Puppet freshness on cp1054 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:55:17 PM UTC [16:56:20] PROBLEM - Puppet freshness on db1059 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:55:32 PM UTC [16:56:20] PROBLEM - Puppet freshness on chromium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:55:22 PM UTC [16:56:20] PROBLEM - Puppet freshness on mw1177 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:56:02 PM UTC [16:56:20] PROBLEM - Puppet freshness on holmium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:55:22 PM UTC [16:56:21] PROBLEM - Puppet freshness on mw1170 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:55:42 PM UTC [16:56:21] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:55:42 PM UTC [16:57:20] PROBLEM - Puppet freshness on elastic1009 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:56:58 PM UTC [16:57:20] PROBLEM - Puppet freshness on elastic1010 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:56:48 PM UTC [16:57:20] PROBLEM - Puppet freshness on ms-be1014 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:56:42 PM UTC [16:57:20] PROBLEM - Puppet freshness on gallium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:56:53 PM UTC [16:57:20] PROBLEM - Puppet freshness on mw1172 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:56:48 PM UTC [16:57:20] PROBLEM - Puppet freshness on silver is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:56:58 PM UTC [16:57:20] PROBLEM - Puppet freshness on virt1009 is 
CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:08 PM UTC [16:57:21] PROBLEM - Puppet freshness on wtp1005 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:56:48 PM UTC [16:57:21] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:56:32 PM UTC [16:58:20] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:58 PM UTC [16:58:20] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:58 PM UTC [16:58:20] PROBLEM - Puppet freshness on cp1057 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:53 PM UTC [16:58:20] PROBLEM - Puppet freshness on db1060 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:48 PM UTC [16:58:20] PROBLEM - Puppet freshness on es1007 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:18 PM UTC [16:58:20] PROBLEM - Puppet freshness on hydrogen is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:13 PM UTC [16:58:20] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:13 PM UTC [16:58:21] PROBLEM - Puppet freshness on mw1184 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:58 PM UTC [16:58:21] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:58 PM UTC [16:58:22] PROBLEM - Puppet freshness on mw1195 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:28 PM UTC [16:58:22] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:58:03 PM UTC [16:58:23] PROBLEM - Puppet freshness on search1002 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:38 PM UTC [16:58:23] PROBLEM - Puppet freshness on virt0 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:57:23 PM UTC [16:58:58] is this related to ferm on neon? 
[16:59:20] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:58:18 PM UTC [16:59:20] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:58:13 PM UTC [16:59:20] PROBLEM - Puppet freshness on cp1048 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:58:59 PM UTC [16:59:20] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:58:39 PM UTC [16:59:20] PROBLEM - Puppet freshness on oxygen is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:58:54 PM UTC [16:59:20] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:58:29 PM UTC [16:59:20] PROBLEM - Puppet freshness on search1009 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:58:34 PM UTC [16:59:21] PROBLEM - Puppet freshness on ssl1009 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:59:05 PM UTC [16:59:21] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:58:18 PM UTC [16:59:22] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:58:59 PM UTC [16:59:45] matanya: It could be; the puppet checks probably use a different port from the other bits [16:59:54] And the timing fits [16:59:57] * matanya is looking [17:00:20] PROBLEM - Puppet freshness on elastic1011 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:00:01 PM UTC [17:00:20] PROBLEM - Puppet freshness on elastic1014 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:59:56 PM UTC [17:00:20] PROBLEM - Puppet freshness on mw1183 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:59:30 PM UTC [17:00:20] PROBLEM - Puppet freshness on rdb1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 01:59:25 PM UTC [17:00:20] PROBLEM - Puppet freshness on rhenium is CRITICAL: 
Last successful Puppet run was Thu 24 Apr 2014 01:59:10 PM UTC [17:00:53] snmptt [17:01:20] PROBLEM - Puppet freshness on logstash1003 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:01:06 PM UTC [17:01:20] PROBLEM - Puppet freshness on ms-be1013 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:01:01 PM UTC [17:01:20] PROBLEM - Puppet freshness on mw1192 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:00:41 PM UTC [17:01:20] PROBLEM - Puppet freshness on search1015 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:01:01 PM UTC [17:01:20] PROBLEM - Puppet freshness on search1020 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:00:51 PM UTC [17:01:20] PROBLEM - Puppet freshness on wtp1017 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:00:51 PM UTC [17:01:20] PROBLEM - Puppet freshness on ssl1008 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:01:01 PM UTC [17:01:21] PROBLEM - Puppet freshness on ytterbium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:00:51 PM UTC [17:02:20] PROBLEM - Puppet freshness on db1054 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:01:26 PM UTC [17:02:20] PROBLEM - Puppet freshness on db1057 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:01:26 PM UTC [17:02:20] PROBLEM - Puppet freshness on ms1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:01:21 PM UTC [17:02:20] PROBLEM - Puppet freshness on nitrogen is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:01:41 PM UTC [17:03:20] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:02:56 PM UTC [17:03:20] PROBLEM - Puppet freshness on cp1047 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:02:46 PM UTC [17:03:20] PROBLEM - Puppet freshness on mw1167 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:02:30 PM UTC [17:03:20] PROBLEM - Puppet 
freshness on elastic1007 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:02:41 PM UTC [17:03:20] PROBLEM - Puppet freshness on db1061 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:02:46 PM UTC [17:03:20] PROBLEM - Puppet freshness on mw1186 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:02:30 PM UTC [17:03:20] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:02:30 PM UTC [17:03:21] PROBLEM - Puppet freshness on search1013 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:02:30 PM UTC [17:03:28] akosiaris: ^ [17:03:55] (03PS1) 10Alexandros Kosiaris: Update ncsa/snmp rules to include more LANs [operations/puppet] - 10https://gerrit.wikimedia.org/r/129460 [17:04:14] he said he was on it and so there we are [17:04:20] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:03:11 PM UTC [17:04:20] PROBLEM - Puppet freshness on cp1052 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:03:31 PM UTC [17:04:20] PROBLEM - Puppet freshness on es1010 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:03:31 PM UTC [17:04:20] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:03:36 PM UTC [17:04:20] PROBLEM - Puppet freshness on mw1164 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:03:56 PM UTC [17:04:20] PROBLEM - Puppet freshness on mw1187 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:03:36 PM UTC [17:04:20] PROBLEM - Puppet freshness on mw1193 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:03:56 PM UTC [17:04:21] PROBLEM - Puppet freshness on mw1209 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:03:21 PM UTC [17:04:21] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:03:21 PM UTC [17:04:30] (03CR) 10Alexandros Kosiaris: [C: 
032 V: 032] Update ncsa/snmp rules to include more LANs [operations/puppet] - 10https://gerrit.wikimedia.org/r/129460 (owner: 10Alexandros Kosiaris) [17:05:20] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:04:31 PM UTC [17:05:20] PROBLEM - Puppet freshness on cp1045 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:04:46 PM UTC [17:05:20] PROBLEM - Puppet freshness on labsdb1003 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:04:51 PM UTC [17:05:20] PROBLEM - Puppet freshness on mw1176 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:04:31 PM UTC [17:05:20] PROBLEM - Puppet freshness on search1010 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:04:11 PM UTC [17:05:20] PROBLEM - Puppet freshness on virt1008 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:04:16 PM UTC [17:06:20] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:06:07 PM UTC [17:06:20] PROBLEM - Puppet freshness on cp1056 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:05:37 PM UTC [17:06:20] PROBLEM - Puppet freshness on iron is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:05:31 PM UTC [17:06:20] PROBLEM - Puppet freshness on search1018 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:05:26 PM UTC [17:06:20] PROBLEM - Puppet freshness on virt1006 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:05:37 PM UTC [17:06:42] (03PS1) 10Dr0ptp4kt: Support HTTPS for 429-02. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129461 [17:07:20] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:06:17 PM UTC [17:07:20] PROBLEM - Puppet freshness on db1052 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:07:07 PM UTC [17:07:20] PROBLEM - Puppet freshness on labnet1001 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:06:37 PM UTC [17:07:20] PROBLEM - Puppet freshness on mw1162 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:06:42 PM UTC [17:07:20] PROBLEM - Puppet freshness on mw1175 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:06:27 PM UTC [17:07:20] PROBLEM - Puppet freshness on search1019 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:06:52 PM UTC [17:07:20] PROBLEM - Puppet freshness on search1008 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:06:57 PM UTC [17:07:21] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:06:42 PM UTC [17:08:20] PROBLEM - Puppet freshness on cp1046 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:07:57 PM UTC [17:08:20] PROBLEM - Puppet freshness on gadolinium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:08:07 PM UTC [17:08:20] PROBLEM - Puppet freshness on labsdb1002 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:07:27 PM UTC [17:08:20] PROBLEM - Puppet freshness on mw1190 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:07:47 PM UTC [17:08:20] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:07:27 PM UTC [17:08:35] bblack, when you have a moment, would you please review and, if appropriate, +2 merge & deploy ^ [17:09:20] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:09:09 PM UTC [17:09:20] PROBLEM - Puppet freshness on carbon is 
CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:09:09 PM UTC [17:09:20] PROBLEM - Puppet freshness on cp1050 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:08:28 PM UTC [17:09:20] PROBLEM - Puppet freshness on cp1051 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:08:38 PM UTC [17:09:20] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:08:33 PM UTC [17:09:20] PROBLEM - Puppet freshness on mw1182 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:08:18 PM UTC [17:09:21] PROBLEM - Puppet freshness on rubidium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:08:38 PM UTC [17:09:21] PROBLEM - Puppet freshness on wtp1018 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:08:38 PM UTC [17:10:20] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:09:35 PM UTC [17:10:20] PROBLEM - Puppet freshness on mw1196 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:09:25 PM UTC [17:10:20] PROBLEM - Puppet freshness on search1014 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:09:14 PM UTC [17:10:20] PROBLEM - Puppet freshness on search1024 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:09:56 PM UTC [17:10:20] PROBLEM - Puppet freshness on search1017 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:09:14 PM UTC [17:10:20] PROBLEM - Puppet freshness on wtp1004 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:10:01 PM UTC [17:10:20] PROBLEM - Puppet freshness on zirconium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:09:19 PM UTC [17:11:20] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:10:36 PM UTC [17:11:20] PROBLEM - Puppet freshness on mw1188 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:10:26 PM UTC [17:11:20] PROBLEM - Puppet 
freshness on mw1210 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:10:31 PM UTC [17:11:20] PROBLEM - Puppet freshness on rdb1002 is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:10:36 PM UTC [17:11:20] PROBLEM - Puppet freshness on titanium is CRITICAL: Last successful Puppet run was Thu 24 Apr 2014 02:11:01 PM UTC [17:12:40] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Thu Apr 24 17:12:30 UTC 2014 [17:12:54] ok fixed [17:13:19] Krinkle: I am packages rkelly-remix, I 'll update both packages and let you know [17:13:24] I am packaging* [17:13:29] OK :) [17:13:45] (03PS1) 10ArielGlenn: remove dhcp entries for stdpa cams (already gone from dns) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129462 [17:14:06] (03PS2) 10ArielGlenn: remove dhcp entries for sdtpa cams (already gone from dns) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129462 [17:14:10] RECOVERY - Puppet freshness on mw1173 is OK: puppet ran at Thu Apr 24 17:14:01 UTC 2014 [17:14:20] RECOVERY - Puppet freshness on search1012 is OK: puppet ran at Thu Apr 24 17:14:16 UTC 2014 [17:14:51] RECOVERY - Puppet freshness on search1001 is OK: puppet ran at Thu Apr 24 17:14:47 UTC 2014 [17:15:00] RECOVERY - Puppet freshness on mw1166 is OK: puppet ran at Thu Apr 24 17:14:52 UTC 2014 [17:15:10] RECOVERY - Puppet freshness on wtp1001 is OK: puppet ran at Thu Apr 24 17:15:07 UTC 2014 [17:15:20] RECOVERY - Puppet freshness on bast1001 is OK: puppet ran at Thu Apr 24 17:15:17 UTC 2014 [17:15:20] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Apr 24 17:15:17 UTC 2014 [17:15:29] akosiaris: if you don't mind, push it upstream too [17:15:30] RECOVERY - Puppet freshness on ms-be1015 is OK: puppet ran at Thu Apr 24 17:15:22 UTC 2014 [17:15:40] RECOVERY - Puppet freshness on labsdb1005 is OK: puppet ran at Thu Apr 24 17:15:32 UTC 2014 [17:15:48] (03CR) 10ArielGlenn: [C: 032] remove dhcp entries for sdtpa cams (already gone from dns) 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129462 (owner: 10ArielGlenn) [17:15:51] RECOVERY - Puppet freshness on lvs1005 is OK: puppet ran at Thu Apr 24 17:15:47 UTC 2014 [17:15:51] RECOVERY - Puppet freshness on lvs1002 is OK: puppet ran at Thu Apr 24 17:15:47 UTC 2014 [17:16:00] RECOVERY - Puppet freshness on analytics1009 is OK: puppet ran at Thu Apr 24 17:15:52 UTC 2014 [17:16:00] RECOVERY - Puppet freshness on db1051 is OK: puppet ran at Thu Apr 24 17:15:52 UTC 2014 [17:16:06] matanya: the ruby package you mean ? upstream being debian ? [17:16:10] RECOVERY - Puppet freshness on search1003 is OK: puppet ran at Thu Apr 24 17:16:02 UTC 2014 [17:16:16] akosiaris: yes and yes [17:16:20] RECOVERY - Puppet freshness on ssl1003 is OK: puppet ran at Thu Apr 24 17:16:12 UTC 2014 [17:16:21] ok [17:16:30] RECOVERY - Puppet freshness on logstash1002 is OK: puppet ran at Thu Apr 24 17:16:27 UTC 2014 [17:16:40] RECOVERY - Puppet freshness on labstore1001 is OK: puppet ran at Thu Apr 24 17:16:38 UTC 2014 [17:16:40] RECOVERY - Puppet freshness on search1007 is OK: puppet ran at Thu Apr 24 17:16:38 UTC 2014 [17:16:40] RECOVERY - Puppet freshness on mw1194 is OK: puppet ran at Thu Apr 24 17:16:38 UTC 2014 [17:16:42] i spoke to ruby team at debian and they want it quite a lot [17:17:00] RECOVERY - Puppet freshness on elastic1013 is OK: puppet ran at Thu Apr 24 17:16:58 UTC 2014 [17:17:10] RECOVERY - Puppet freshness on antimony is OK: puppet ran at Thu Apr 24 17:17:08 UTC 2014 [17:17:10] RECOVERY - Puppet freshness on mw1208 is OK: puppet ran at Thu Apr 24 17:17:08 UTC 2014 [17:17:20] RECOVERY - Puppet freshness on carbon is OK: puppet ran at Thu Apr 24 17:17:13 UTC 2014 [17:17:20] RECOVERY - Puppet freshness on lvs1004 is OK: puppet ran at Thu Apr 24 17:17:13 UTC 2014 [17:17:30] RECOVERY - Puppet freshness on mw1179 is OK: puppet ran at Thu Apr 24 17:17:24 UTC 2014 [17:17:40] RECOVERY - Puppet freshness on ssl1005 is OK: puppet ran at Thu Apr 24 17:17:34 UTC 2014 
[17:17:40] RECOVERY - Puppet freshness on dataset1001 is OK: puppet ran at Thu Apr 24 17:17:39 UTC 2014 [17:17:51] RECOVERY - Puppet freshness on erbium is OK: puppet ran at Thu Apr 24 17:17:49 UTC 2014 [17:18:10] RECOVERY - Puppet freshness on elastic1016 is OK: puppet ran at Thu Apr 24 17:18:09 UTC 2014 [17:18:20] RECOVERY - Puppet freshness on search1005 is OK: puppet ran at Thu Apr 24 17:18:14 UTC 2014 [17:18:20] RECOVERY - Puppet freshness on db1058 is OK: puppet ran at Thu Apr 24 17:18:14 UTC 2014 [17:18:30] RECOVERY - Puppet freshness on mw1180 is OK: puppet ran at Thu Apr 24 17:18:19 UTC 2014 [17:18:30] RECOVERY - Puppet freshness on nickel is OK: puppet ran at Thu Apr 24 17:18:24 UTC 2014 [17:18:40] RECOVERY - Puppet freshness on search1021 is OK: puppet ran at Thu Apr 24 17:18:35 UTC 2014 [17:19:02] RECOVERY - Puppet freshness on mw1169 is OK: puppet ran at Thu Apr 24 17:18:40 UTC 2014 [17:19:02] RECOVERY - Puppet freshness on hafnium is OK: puppet ran at Thu Apr 24 17:18:45 UTC 2014 [17:19:02] PROBLEM - Host db1016 is DOWN: PING CRITICAL - Packet loss = 100% [17:19:02] RECOVERY - Puppet freshness on analytics1022 is OK: puppet ran at Thu Apr 24 17:18:55 UTC 2014 [17:19:10] RECOVERY - Puppet freshness on mw1165 is OK: puppet ran at Thu Apr 24 17:19:06 UTC 2014 [17:19:40] RECOVERY - Puppet freshness on analytics1019 is OK: puppet ran at Thu Apr 24 17:19:31 UTC 2014 [17:19:40] RECOVERY - Host db1016 is UP: PING OK - Packet loss = 0%, RTA = 1.64 ms [17:19:45] also akosiaris there is : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730424 [17:19:50] RECOVERY - Puppet freshness on mw1181 is OK: puppet ran at Thu Apr 24 17:19:41 UTC 2014 [17:20:00] RECOVERY - Puppet freshness on db1062 is OK: puppet ran at Thu Apr 24 17:19:51 UTC 2014 [17:20:03] which i started in the past, but didn't find time to finish [17:20:10] RECOVERY - Puppet freshness on es1009 is OK: puppet ran at Thu Apr 24 17:20:01 UTC 2014 [17:20:10] RECOVERY - Puppet freshness on mw1198 is OK: 
puppet ran at Thu Apr 24 17:20:07 UTC 2014 [17:20:20] RECOVERY - Puppet freshness on labsdb1001 is OK: puppet ran at Thu Apr 24 17:20:12 UTC 2014 [17:20:20] RECOVERY - Puppet freshness on ms-be1007 is OK: puppet ran at Thu Apr 24 17:20:12 UTC 2014 [17:20:20] RECOVERY - Puppet freshness on search1023 is OK: puppet ran at Thu Apr 24 17:20:17 UTC 2014 [17:20:30] RECOVERY - Puppet freshness on wtp1002 is OK: puppet ran at Thu Apr 24 17:20:27 UTC 2014 [17:20:30] RECOVERY - Puppet freshness on ms-be1009 is OK: puppet ran at Thu Apr 24 17:20:27 UTC 2014 [17:20:30] RECOVERY - Puppet freshness on db1055 is OK: puppet ran at Thu Apr 24 17:20:27 UTC 2014 [17:20:55] matanya: http://apt.wikimedia.org/wikimedia/pool/main/r/ruby-parallel/ [17:21:10] RECOVERY - Puppet freshness on mw1178 is OK: puppet ran at Thu Apr 24 17:21:07 UTC 2014 [17:21:20] RECOVERY - Puppet freshness on aluminium is OK: puppet ran at Thu Apr 24 17:21:12 UTC 2014 [17:21:30] RECOVERY - Puppet freshness on manutius is OK: puppet ran at Thu Apr 24 17:21:28 UTC 2014 [17:21:30] RECOVERY - Puppet freshness on db1053 is OK: puppet ran at Thu Apr 24 17:21:28 UTC 2014 [17:21:40] RECOVERY - Puppet freshness on analytics1011 is OK: puppet ran at Thu Apr 24 17:21:33 UTC 2014 [17:21:51] RECOVERY - Puppet freshness on mw1200 is OK: puppet ran at Thu Apr 24 17:21:48 UTC 2014 [17:21:56] yes akosiaris that is very old, v1 is out already [17:22:00] RECOVERY - Puppet freshness on magnesium is OK: puppet ran at Thu Apr 24 17:21:58 UTC 2014 [17:22:00] RECOVERY - Puppet freshness on db1063 is OK: puppet ran at Thu Apr 24 17:21:58 UTC 2014 [17:22:20] RECOVERY - Puppet freshness on caesium is OK: puppet ran at Thu Apr 24 17:22:13 UTC 2014 [17:22:20] RECOVERY - Puppet freshness on netmon1001 is OK: puppet ran at Thu Apr 24 17:22:18 UTC 2014 [17:22:20] RECOVERY - Puppet freshness on ssl1007 is OK: puppet ran at Thu Apr 24 17:22:18 UTC 2014 [17:22:30] RECOVERY - Puppet freshness on mw1185 is OK: puppet ran at Thu Apr 24 17:22:23 UTC 
2014 [17:22:30] RECOVERY - Puppet freshness on lvs1001 is OK: puppet ran at Thu Apr 24 17:22:23 UTC 2014 [17:22:40] RECOVERY - Puppet freshness on dbstore1002 is OK: puppet ran at Thu Apr 24 17:22:33 UTC 2014 [17:22:40] RECOVERY - Puppet freshness on analytics1017 is OK: puppet ran at Thu Apr 24 17:22:38 UTC 2014 [17:22:51] RECOVERY - Puppet freshness on cp1039 is OK: puppet ran at Thu Apr 24 17:22:48 UTC 2014 [17:23:10] RECOVERY - Puppet freshness on ssl1002 is OK: puppet ran at Thu Apr 24 17:23:08 UTC 2014 [17:23:10] RECOVERY - Puppet freshness on wtp1006 is OK: puppet ran at Thu Apr 24 17:23:08 UTC 2014 [17:23:19] heh.. ruby gems and their versions... worse than java [17:23:20] RECOVERY - Puppet freshness on analytics1020 is OK: puppet ran at Thu Apr 24 17:23:13 UTC 2014 [17:23:20] RECOVERY - Puppet freshness on search1011 is OK: puppet ran at Thu Apr 24 17:23:18 UTC 2014 [17:23:20] RECOVERY - Puppet freshness on mw1174 is OK: puppet ran at Thu Apr 24 17:23:18 UTC 2014 [17:23:20] RECOVERY - Puppet freshness on cp1055 is OK: puppet ran at Thu Apr 24 17:23:18 UTC 2014 [17:23:30] RECOVERY - Puppet freshness on es1008 is OK: puppet ran at Thu Apr 24 17:23:24 UTC 2014 [17:23:40] RECOVERY - Puppet freshness on search1016 is OK: puppet ran at Thu Apr 24 17:23:34 UTC 2014 [17:23:40] RECOVERY - Puppet freshness on ssl1006 is OK: puppet ran at Thu Apr 24 17:23:34 UTC 2014 [17:23:40] RECOVERY - Puppet freshness on search1006 is OK: puppet ran at Thu Apr 24 17:23:39 UTC 2014 [17:23:42] yeah, well. 
what can we do [17:24:00] RECOVERY - Puppet freshness on stat1001 is OK: puppet ran at Thu Apr 24 17:23:59 UTC 2014 [17:24:10] RECOVERY - Puppet freshness on mw1203 is OK: puppet ran at Thu Apr 24 17:24:09 UTC 2014 [17:24:20] RECOVERY - Puppet freshness on elastic1012 is OK: puppet ran at Thu Apr 24 17:24:14 UTC 2014 [17:24:30] RECOVERY - Puppet freshness on search1004 is OK: puppet ran at Thu Apr 24 17:24:20 UTC 2014 [17:24:40] RECOVERY - Puppet freshness on wtp1016 is OK: puppet ran at Thu Apr 24 17:24:30 UTC 2014 [17:24:40] RECOVERY - Puppet freshness on terbium is OK: puppet ran at Thu Apr 24 17:24:35 UTC 2014 [17:25:00] RECOVERY - Puppet freshness on analytics1018 is OK: puppet ran at Thu Apr 24 17:24:51 UTC 2014 [17:25:00] RECOVERY - Puppet freshness on chromium is OK: puppet ran at Thu Apr 24 17:24:51 UTC 2014 [17:25:00] RECOVERY - Puppet freshness on cp1054 is OK: puppet ran at Thu Apr 24 17:24:51 UTC 2014 [17:25:10] RECOVERY - Puppet freshness on holmium is OK: puppet ran at Thu Apr 24 17:25:01 UTC 2014 [17:25:10] RECOVERY - Puppet freshness on ms-fe1004 is OK: puppet ran at Thu Apr 24 17:25:01 UTC 2014 [17:25:10] RECOVERY - Puppet freshness on wtp1008 is OK: puppet ran at Thu Apr 24 17:25:01 UTC 2014 [17:25:10] RECOVERY - Puppet freshness on db1059 is OK: puppet ran at Thu Apr 24 17:25:06 UTC 2014 [17:25:20] RECOVERY - Puppet freshness on db1056 is OK: puppet ran at Thu Apr 24 17:25:16 UTC 2014 [17:25:20] RECOVERY - Puppet freshness on cp1053 is OK: puppet ran at Thu Apr 24 17:25:16 UTC 2014 [17:25:20] RECOVERY - Puppet freshness on mw1170 is OK: puppet ran at Thu Apr 24 17:25:16 UTC 2014 [17:26:10] RECOVERY - Puppet freshness on gallium is OK: puppet ran at Thu Apr 24 17:26:01 UTC 2014 [17:26:10] RECOVERY - Puppet freshness on virt1005 is OK: puppet ran at Thu Apr 24 17:26:06 UTC 2014 [17:26:10] RECOVERY - Puppet freshness on mw1199 is OK: puppet ran at Thu Apr 24 17:26:06 UTC 2014 [17:26:30] RECOVERY - Puppet freshness on ms-be1010 is OK: puppet ran at 
Thu Apr 24 17:26:26 UTC 2014 [17:26:40] RECOVERY - Puppet freshness on mw1177 is OK: puppet ran at Thu Apr 24 17:26:31 UTC 2014 [17:26:40] RECOVERY - Puppet freshness on analytics1021 is OK: puppet ran at Thu Apr 24 17:26:36 UTC 2014 [17:26:50] RECOVERY - Puppet freshness on mw1172 is OK: puppet ran at Thu Apr 24 17:26:41 UTC 2014 [17:26:51] RECOVERY - Puppet freshness on virt1009 is OK: puppet ran at Thu Apr 24 17:26:46 UTC 2014 [17:26:51] RECOVERY - Puppet freshness on mw1195 is OK: puppet ran at Thu Apr 24 17:26:46 UTC 2014 [17:27:00] RECOVERY - Puppet freshness on elastic1009 is OK: puppet ran at Thu Apr 24 17:26:51 UTC 2014 [17:27:00] RECOVERY - Puppet freshness on analytics1016 is OK: puppet ran at Thu Apr 24 17:26:56 UTC 2014 [17:27:00] RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Thu Apr 24 17:26:56 UTC 2014 [17:27:10] RECOVERY - Puppet freshness on elastic1010 is OK: puppet ran at Thu Apr 24 17:27:01 UTC 2014 [17:27:10] RECOVERY - Puppet freshness on es1007 is OK: puppet ran at Thu Apr 24 17:27:07 UTC 2014 [17:27:20] RECOVERY - Puppet freshness on wtp1005 is OK: puppet ran at Thu Apr 24 17:27:12 UTC 2014 [17:27:20] RECOVERY - Puppet freshness on silver is OK: puppet ran at Thu Apr 24 17:27:17 UTC 2014 [17:27:20] RECOVERY - Puppet freshness on virt0 is OK: puppet ran at Thu Apr 24 17:27:17 UTC 2014 [17:27:20] RECOVERY - Puppet freshness on ms-be1014 is OK: puppet ran at Thu Apr 24 17:27:17 UTC 2014 [17:27:30] RECOVERY - Puppet freshness on hydrogen is OK: puppet ran at Thu Apr 24 17:27:27 UTC 2014 [17:27:40] RECOVERY - Puppet freshness on db1060 is OK: puppet ran at Thu Apr 24 17:27:32 UTC 2014 [17:27:51] RECOVERY - Puppet freshness on mw1191 is OK: puppet ran at Thu Apr 24 17:27:47 UTC 2014 [17:28:00] RECOVERY - Puppet freshness on analytics1026 is OK: puppet ran at Thu Apr 24 17:27:52 UTC 2014 [17:28:00] RECOVERY - Puppet freshness on ms-be1005 is OK: puppet ran at Thu Apr 24 17:27:52 UTC 2014 [17:28:00] RECOVERY - Puppet freshness on 
search1002 is OK: puppet ran at Thu Apr 24 17:27:52 UTC 2014 [17:28:00] RECOVERY - Puppet freshness on ssl1004 is OK: puppet ran at Thu Apr 24 17:27:57 UTC 2014 [17:28:00] RECOVERY - Puppet freshness on mw1168 is OK: puppet ran at Thu Apr 24 17:27:57 UTC 2014 [17:28:20] RECOVERY - Puppet freshness on mw1184 is OK: puppet ran at Thu Apr 24 17:28:12 UTC 2014 [17:28:20] RECOVERY - Puppet freshness on search1009 is OK: puppet ran at Thu Apr 24 17:28:12 UTC 2014 [17:28:20] RECOVERY - Puppet freshness on analytics1013 is OK: puppet ran at Thu Apr 24 17:28:17 UTC 2014 [17:28:20] RECOVERY - Puppet freshness on cp1057 is OK: puppet ran at Thu Apr 24 17:28:17 UTC 2014 [17:28:30] RECOVERY - Puppet freshness on analytics1024 is OK: puppet ran at Thu Apr 24 17:28:22 UTC 2014 [17:28:30] RECOVERY - Puppet freshness on mw1202 is OK: puppet ran at Thu Apr 24 17:28:27 UTC 2014 [17:28:40] RECOVERY - Puppet freshness on virt1007 is OK: puppet ran at Thu Apr 24 17:28:32 UTC 2014 [17:28:40] RECOVERY - Puppet freshness on cp1048 is OK: puppet ran at Thu Apr 24 17:28:37 UTC 2014 [17:29:00] RECOVERY - Puppet freshness on ssl1009 is OK: puppet ran at Thu Apr 24 17:28:57 UTC 2014 [17:29:20] RECOVERY - Puppet freshness on rhenium is OK: puppet ran at Thu Apr 24 17:29:12 UTC 2014 [17:29:20] RECOVERY - Puppet freshness on mw1183 is OK: puppet ran at Thu Apr 24 17:29:12 UTC 2014 [17:29:30] RECOVERY - Puppet freshness on rdb1001 is OK: puppet ran at Thu Apr 24 17:29:22 UTC 2014 [17:29:30] RECOVERY - Puppet freshness on oxygen is OK: puppet ran at Thu Apr 24 17:29:23 UTC 2014 [17:30:10] RECOVERY - Puppet freshness on ytterbium is OK: puppet ran at Thu Apr 24 17:30:08 UTC 2014 [17:30:20] RECOVERY - Puppet freshness on elastic1011 is OK: puppet ran at Thu Apr 24 17:30:13 UTC 2014 [17:30:30] RECOVERY - Puppet freshness on elastic1014 is OK: puppet ran at Thu Apr 24 17:30:24 UTC 2014 [17:30:30] RECOVERY - Puppet freshness on mw1192 is OK: puppet ran at Thu Apr 24 17:30:29 UTC 2014 [17:30:40] RECOVERY 
- Puppet freshness on db1054 is OK: puppet ran at Thu Apr 24 17:30:34 UTC 2014 [17:30:40] RECOVERY - Puppet freshness on db1057 is OK: puppet ran at Thu Apr 24 17:30:39 UTC 2014 [17:30:51] RECOVERY - Puppet freshness on search1020 is OK: puppet ran at Thu Apr 24 17:30:44 UTC 2014 [17:30:51] RECOVERY - Puppet freshness on logstash1003 is OK: puppet ran at Thu Apr 24 17:30:49 UTC 2014 [17:30:51] RECOVERY - Puppet freshness on search1015 is OK: puppet ran at Thu Apr 24 17:30:49 UTC 2014 [17:31:00] RECOVERY - Puppet freshness on ssl1008 is OK: puppet ran at Thu Apr 24 17:30:54 UTC 2014 [17:31:00] RECOVERY - Puppet freshness on ms1001 is OK: puppet ran at Thu Apr 24 17:30:54 UTC 2014 [17:31:10] RECOVERY - Puppet freshness on wtp1017 is OK: puppet ran at Thu Apr 24 17:31:04 UTC 2014 [17:31:30] RECOVERY - Puppet freshness on ms-be1013 is OK: puppet ran at Thu Apr 24 17:31:20 UTC 2014 [17:32:00] RECOVERY - Puppet freshness on nitrogen is OK: puppet ran at Thu Apr 24 17:31:50 UTC 2014 [17:32:00] RECOVERY - Puppet freshness on mw1186 is OK: puppet ran at Thu Apr 24 17:31:50 UTC 2014 [17:32:00] RECOVERY - Puppet freshness on wtp1003 is OK: puppet ran at Thu Apr 24 17:31:55 UTC 2014 [17:32:00] RECOVERY - Puppet freshness on mw1167 is OK: puppet ran at Thu Apr 24 17:31:55 UTC 2014 [17:32:10] RECOVERY - Puppet freshness on search1013 is OK: puppet ran at Thu Apr 24 17:32:01 UTC 2014 [17:32:50] RECOVERY - Puppet freshness on analytics1027 is OK: puppet ran at Thu Apr 24 17:32:41 UTC 2014 [17:32:51] RECOVERY - Puppet freshness on elastic1007 is OK: puppet ran at Thu Apr 24 17:32:46 UTC 2014 [17:32:51] RECOVERY - Puppet freshness on es1010 is OK: puppet ran at Thu Apr 24 17:32:46 UTC 2014 [17:33:10] RECOVERY - Puppet freshness on cp1047 is OK: puppet ran at Thu Apr 24 17:33:02 UTC 2014 [17:33:10] RECOVERY - Puppet freshness on ms-be1011 is OK: puppet ran at Thu Apr 24 17:33:02 UTC 2014 [17:33:10] RECOVERY - Puppet freshness on analytics1025 is OK: puppet ran at Thu Apr 24 17:33:02 
UTC 2014 [17:33:10] RECOVERY - Puppet freshness on db1061 is OK: puppet ran at Thu Apr 24 17:33:08 UTC 2014 [17:33:10] RECOVERY - Puppet freshness on mw1187 is OK: puppet ran at Thu Apr 24 17:33:08 UTC 2014 [17:33:11] RECOVERY - Puppet freshness on mw1209 is OK: puppet ran at Thu Apr 24 17:33:08 UTC 2014 [17:33:11] RECOVERY - Puppet freshness on cp1052 is OK: puppet ran at Thu Apr 24 17:33:08 UTC 2014 [17:33:30] RECOVERY - Puppet freshness on stat1003 is OK: puppet ran at Thu Apr 24 17:33:28 UTC 2014 [17:33:40] RECOVERY - Puppet freshness on virt1008 is OK: puppet ran at Thu Apr 24 17:33:38 UTC 2014 [17:34:10] RECOVERY - Puppet freshness on search1010 is OK: puppet ran at Thu Apr 24 17:34:03 UTC 2014 [17:34:20] RECOVERY - Puppet freshness on mw1164 is OK: puppet ran at Thu Apr 24 17:34:18 UTC 2014 [17:34:30] RECOVERY - Puppet freshness on cp1040 is OK: puppet ran at Thu Apr 24 17:34:23 UTC 2014 [17:34:30] RECOVERY - Puppet freshness on mw1176 is OK: puppet ran at Thu Apr 24 17:34:28 UTC 2014 [17:34:40] RECOVERY - Puppet freshness on iron is OK: puppet ran at Thu Apr 24 17:34:33 UTC 2014 [17:34:40] RECOVERY - Puppet freshness on labsdb1003 is OK: puppet ran at Thu Apr 24 17:34:38 UTC 2014 [17:34:50] RECOVERY - Puppet freshness on mw1193 is OK: puppet ran at Thu Apr 24 17:34:43 UTC 2014 [17:35:10] RECOVERY - Puppet freshness on search1018 is OK: puppet ran at Thu Apr 24 17:35:03 UTC 2014 [17:35:20] RECOVERY - Puppet freshness on cp1056 is OK: puppet ran at Thu Apr 24 17:35:18 UTC 2014 [17:35:20] RECOVERY - Puppet freshness on cp1045 is OK: puppet ran at Thu Apr 24 17:35:18 UTC 2014 [17:35:30] RECOVERY - Puppet freshness on virt1006 is OK: puppet ran at Thu Apr 24 17:35:23 UTC 2014 [17:36:10] RECOVERY - Puppet freshness on analytics1012 is OK: puppet ran at Thu Apr 24 17:36:04 UTC 2014 [17:36:10] RECOVERY - Puppet freshness on mw1175 is OK: puppet ran at Thu Apr 24 17:36:09 UTC 2014 [17:36:20] RECOVERY - Puppet freshness on analytics1010 is OK: puppet ran at Thu Apr 
24 17:36:19 UTC 2014 [17:36:40] RECOVERY - Puppet freshness on virt1003 is OK: puppet ran at Thu Apr 24 17:36:34 UTC 2014 [17:36:51] RECOVERY - Puppet freshness on db1052 is OK: puppet ran at Thu Apr 24 17:36:49 UTC 2014 [17:36:51] RECOVERY - Puppet freshness on labsdb1002 is OK: puppet ran at Thu Apr 24 17:36:49 UTC 2014 [17:37:00] RECOVERY - Puppet freshness on labnet1001 is OK: puppet ran at Thu Apr 24 17:36:59 UTC 2014 [17:37:10] RECOVERY - Puppet freshness on search1019 is OK: puppet ran at Thu Apr 24 17:37:09 UTC 2014 [17:37:20] RECOVERY - Puppet freshness on mw1162 is OK: puppet ran at Thu Apr 24 17:37:14 UTC 2014 [17:37:20] RECOVERY - Puppet freshness on virt1004 is OK: puppet ran at Thu Apr 24 17:37:14 UTC 2014 [17:37:20] RECOVERY - Puppet freshness on search1008 is OK: puppet ran at Thu Apr 24 17:37:14 UTC 2014 [17:37:51] RECOVERY - Puppet freshness on cp1050 is OK: puppet ran at Thu Apr 24 17:37:44 UTC 2014 [17:38:00] RECOVERY - Puppet freshness on mw1182 is OK: puppet ran at Thu Apr 24 17:37:55 UTC 2014 [17:38:30] RECOVERY - Puppet freshness on cp1046 is OK: puppet ran at Thu Apr 24 17:38:20 UTC 2014 [17:38:30] RECOVERY - Puppet freshness on mw1190 is OK: puppet ran at Thu Apr 24 17:38:20 UTC 2014 [17:38:30] RECOVERY - Puppet freshness on gadolinium is OK: puppet ran at Thu Apr 24 17:38:25 UTC 2014 [17:38:40] RECOVERY - Puppet freshness on wtp1018 is OK: puppet ran at Thu Apr 24 17:38:35 UTC 2014 [17:38:40] RECOVERY - Puppet freshness on rubidium is OK: puppet ran at Thu Apr 24 17:38:35 UTC 2014 [17:39:00] RECOVERY - Puppet freshness on elastic1015 is OK: puppet ran at Thu Apr 24 17:38:50 UTC 2014 [17:39:20] RECOVERY - Puppet freshness on search1014 is OK: puppet ran at Thu Apr 24 17:39:15 UTC 2014 [17:39:20] RECOVERY - Puppet freshness on cp1051 is OK: puppet ran at Thu Apr 24 17:39:15 UTC 2014 [17:39:20] RECOVERY - Puppet freshness on analytics1023 is OK: puppet ran at Thu Apr 24 17:39:15 UTC 2014 [17:39:30] RECOVERY - Puppet freshness on zirconium is 
OK: puppet ran at Thu Apr 24 17:39:20 UTC 2014 [17:39:40] RECOVERY - Puppet freshness on mw1196 is OK: puppet ran at Thu Apr 24 17:39:30 UTC 2014 [17:39:41] akosiaris: nothing's worse than Java. I find ruby gems pretty straightforward. [17:39:50] RECOVERY - Puppet freshness on lvs1006 is OK: puppet ran at Thu Apr 24 17:39:40 UTC 2014 [17:39:50] RECOVERY - Puppet freshness on search1017 is OK: puppet ran at Thu Apr 24 17:39:40 UTC 2014 [17:40:30] RECOVERY - Puppet freshness on wtp1004 is OK: puppet ran at Thu Apr 24 17:40:25 UTC 2014 [17:40:30] RECOVERY - Puppet freshness on search1024 is OK: puppet ran at Thu Apr 24 17:40:25 UTC 2014 [17:40:40] RECOVERY - Puppet freshness on mw1188 is OK: puppet ran at Thu Apr 24 17:40:35 UTC 2014 [17:40:50] RECOVERY - Puppet freshness on rdb1002 is OK: puppet ran at Thu Apr 24 17:40:40 UTC 2014 [17:40:50] RECOVERY - Puppet freshness on mw1210 is OK: puppet ran at Thu Apr 24 17:40:45 UTC 2014 [17:41:00] RECOVERY - Puppet freshness on analytics1015 is OK: puppet ran at Thu Apr 24 17:40:50 UTC 2014 [17:42:19] chrismcmahon: heh, wont argue on the first point, the second one though... well maybe you have not met all the nice problems yet, like gem authors breaking api/abi compatibility a little bit too often, people bundling 80+ gems with the app just because it is the only sane way it will work. 
And of course security updates :-) [17:43:14] akosiaris: I am aware of the strife between the Debian community and the Ruby community over a lot of that [17:46:48] (PS2) Hoo man: Run rebuildEntityPerPage.php on Wikidata (once per week) [operations/puppet] - https://gerrit.wikimedia.org/r/120535 [17:47:14] (PS3) Hoo man: Run rebuildEntityPerPage.php on Wikidata (once per week) [operations/puppet] - https://gerrit.wikimedia.org/r/120535 [17:53:03] (PS1) Ricordisamoa: minor changes to InitialiseSettings.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129464 [17:55:15] (PS2) Ricordisamoa: minor changes to InitialiseSettings.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129464 [17:56:22] (PS1) BryanDavis: Fix README references to wikiversions.dat [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129473 [17:56:51] (CR) Hoo man: [C: -1] "Looks like you overlooked some stuff" (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129464 (owner: Ricordisamoa) [18:00:00] (PS1) Andrew Bogott: Disable account for Peter Gehres [operations/puppet] - https://gerrit.wikimedia.org/r/129474 [18:00:24] Coren, mutante: Last one for now: ^ [18:00:48] (CR) Ricordisamoa: "I did not change hashes to slashes where the former were more used in that part of code." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129464 (owner: Ricordisamoa) [18:01:43] apergos: I just donated wikisource.pl to the WMF, I'm hoping MarkMonitor will remember to change DNS this time [18:03:27] hope so, good luck (or perhaps I should wish coren good luck, as he's on rt this week :-D) [18:08:10] (CR) Odder: "That's no longer true." [operations/apache-config] - https://gerrit.wikimedia.org/r/126969 (owner: Odder) [18:08:16] (CR) coren: [C: 1] "Also sane." 
[operations/puppet] - https://gerrit.wikimedia.org/r/129474 (owner: Andrew Bogott) [18:16:04] (PS1) Ori.livneh: Add EventLogging consumer for db1029 [operations/puppet] - https://gerrit.wikimedia.org/r/129481 [18:21:40] (PS2) Ori.livneh: Add EventLogging consumer for db1048 [operations/puppet] - https://gerrit.wikimedia.org/r/129481 [18:22:39] (PS1) Manybubbles: Update highlighter [operations/software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/129484 [18:22:58] (CR) Manybubbles: "Deploying to beta." [operations/software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/129484 (owner: Manybubbles) [18:24:30] (CR) Hoo man: [C: 1] "Ok, I'm fine with this change then..." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129464 (owner: Ricordisamoa) [18:26:30] (CR) Ori.livneh: [C: 2] Add EventLogging consumer for db1048 [operations/puppet] - https://gerrit.wikimedia.org/r/129481 (owner: Ori.livneh) [18:27:34] (CR) Andrew Bogott: [C: 2] Disable account for Peter Gehres [operations/puppet] - https://gerrit.wikimedia.org/r/129474 (owner: Andrew Bogott) [18:32:19] Hey mark and others - next week, sometime Tuesday afternoon North America time, we'll be discussing https://www.mediawiki.org/wiki/Requests_for_comment/Third-party_components and https://www.mediawiki.org/wiki/Requests_for_comment/MediaWiki_libraries on IRC, and it feels like a conversation that should include an Ops representative. I was thinking ~ 2200 UTC [18:33:03] !log begin eventlogging migration db1047 to db1048 (m1), RT #7081 [18:33:10] Logged the message, Master [18:33:17] whom should I be inviting? 
[18:33:24] (03PS1) 10Odder: Add peacepalacelibrary.nl to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129489 [18:34:24] (03PS1) 10Milimetric: Remove EventLogging consumer for db1047 [operations/puppet] - 10https://gerrit.wikimedia.org/r/129490 [18:35:59] and the time could be flexible - what times would work? This would be April 29th [18:36:15] (03PS2) 10Reedy: Wikipedias to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129413 [18:36:55] (03CR) 10Reedy: [C: 032] Wikipedias to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129413 (owner: 10Reedy) [18:37:02] (03Merged) 10jenkins-bot: Wikipedias to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129413 (owner: 10Reedy) [18:37:05] what what [18:37:24] Maybe a North American like Jeff_Green or andrewbogott or Coren would be good? (for the meeting I mentioned) [18:37:28] for timezone reasons [18:37:43] whut me? [18:37:47] bah, only the commit summary is wrong... I already panicked :P [18:37:55] Jeff_Green: (backscroll, a few min ago) [18:38:01] looking [18:38:05] Thanks [18:38:11] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.24wmf1 [18:38:16] (03PS2) 10Odder: Add peacepalacelibrary.nl to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129489 [18:38:17] Logged the message, Master [18:38:31] (03CR) 10Jforrester: "This was "Wikipedias to 1.24wmf1" actually. :-)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129413 (owner: 10Reedy) [18:38:48] sumanah: I'm not really qualified to opine; I've got all of one commit to core. [18:38:48] thanks, James_F [18:39:03] * James_F was panicking a bit. :-) [18:39:24] James_F: For a moment I thought I backported to the wrong branch earlier and now stuff will break horribly [18:39:33] sumanah: I'm pretty good at making mediawiki run, but I'm hardly a dev. 
[18:39:42] hoo: Given we just found a wmf2 regression… ;-) [18:40:26] (03CR) 10Reedy: "Translatewiki doesn't even know about those languages..." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 (owner: 10Hoo man) [18:41:05] (03CR) 10Hoo man: "That's why we define them here. According to Amir they shouldn't go into Names.php" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 (owner: 10Hoo man) [18:41:08] (03PS2) 10Reedy: group0 wikis to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129414 [18:41:16] * andrewbogott agrees with Coren [18:41:26] sumanah, I'm willing but will almost certainly have no opinion :) [18:42:00] (03CR) 10Amire80: "No. Names.php should be only for languages with localizations. This may be relevant for rwr some day, and probably not for ota." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129408 (owner: 10Hoo man) [18:42:32] (03CR) 10Reedy: [C: 032] group0 wikis to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129414 (owner: 10Reedy) [18:42:38] andrewbogott: Right. You get why I'd like an Ops perspective, though, right? 
These are sort of dependency management questions and might include a "hell no we are never doing that kind of package management/installing that library"-type moment [18:42:39] (03Merged) 10jenkins-bot: group0 wikis to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129414 (owner: 10Reedy) [18:43:15] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Rest of group0 to 1.24wmf2 [18:43:21] Logged the message, Master [18:43:28] sumanah: sure, makes sense… I'm trying to think of which Ops would actually have opinions :) [18:43:31] yeah [18:43:38] and would not mind being in that chat around that time [18:43:44] it can move a bit (temporally) [18:43:59] paravoid is surely qualified but also maybe stretched too thin already [18:44:02] (03PS3) 10Reedy: Set FileRender pool counter config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129306 (owner: 10Aaron Schulz) [18:44:07] (03CR) 10Reedy: [C: 032] Set FileRender pool counter config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129306 (owner: 10Aaron Schulz) [18:44:15] (03Merged) 10jenkins-bot: Set FileRender pool counter config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129306 (owner: 10Aaron Schulz) [18:44:30] yeah, I hesitate to ask Faidon for yet more time. [18:45:06] Ryan might have some interest... But not ops now :( [18:45:35] (03PS3) 10Reedy: Add peacepalacelibrary.nl to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129489 (owner: 10Odder) [18:45:48] (03CR) 10Reedy: [C: 032] Add peacepalacelibrary.nl to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129489 (owner: 10Odder) [18:45:56] Reedy: Well, Ryan's going to be there, as he is the author of one of the RfCs! 
[18:46:00] (03Merged) 10jenkins-bot: Add peacepalacelibrary.nl to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129489 (owner: 10Odder) [18:46:26] (03PS2) 10Reedy: Fix README references to wikiversions.dat [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129473 (owner: 10BryanDavis) [18:46:50] (03CR) 10Reedy: [C: 032] Fix README references to wikiversions.dat [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129473 (owner: 10BryanDavis) [18:47:05] (03Merged) 10jenkins-bot: Fix README references to wikiversions.dat [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129473 (owner: 10BryanDavis) [18:47:35] (03PS2) 10Reedy: FUTURE: Third batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125033 (owner: 10MarkTraceur) [18:47:39] (03CR) 10Reedy: [C: 032] FUTURE: Third batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125033 (owner: 10MarkTraceur) [18:47:50] (03Merged) 10jenkins-bot: FUTURE: Third batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125033 (owner: 10MarkTraceur) [18:48:58] chasemp and bblack are also in NA [18:49:21] (03PS3) 10Reedy: Set $wgBabelCategoryNames for betawikiversity [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129103 (owner: 10Withoutaname) [18:49:27] and sometimes mutante [18:49:34] ? 
[18:49:45] (03CR) 10Reedy: [C: 032] Set $wgBabelCategoryNames for betawikiversity [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129103 (owner: 10Withoutaname) [18:49:52] (03Merged) 10jenkins-bot: Set $wgBabelCategoryNames for betawikiversity [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129103 (owner: 10Withoutaname) [18:49:54] (03PS2) 10Milimetric: Remove EventLogging consumer for db1047 [operations/puppet] - 10https://gerrit.wikimedia.org/r/129490 [18:50:28] (03PS6) 10Reedy: Use $wgTranslatePageTranslationULS [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115879 (owner: 10Nemo bis) [18:50:32] andrewbogott: I'd be happy to invite either of them, sure - maybe one of them would be more likely to have dependency management opinions? [18:50:33] (03CR) 10Reedy: [C: 032] Use $wgTranslatePageTranslationULS [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115879 (owner: 10Nemo bis) [18:50:46] (03Merged) 10jenkins-bot: Use $wgTranslatePageTranslationULS [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115879 (owner: 10Nemo bis) [18:51:12] chasemp: see backscroll from ~20 min ago [18:51:13] cmjohnson1: what about https://rt.wikimedia.org/Ticket/Display.html?id=5918 ? [18:52:24] sumanah: great post on your blog btw :) [18:52:34] matanya: that server is not doing anything. It needs to be set as a spare. [18:52:38] Thank you matanya! Glad it was good [18:53:22] cmjohnson1: it is in eqiad, right? [18:54:07] yes it is...i need to move it to a 10G rack when I get back [18:54:21] andrewbogott: I will, for now, invite you, and you can feel free to make it a transitive invite & bow out - sound good? [18:54:52] Sounds good. 
I'm interested in the topic, just don't feel like I'm sufficiently jaded to bring a proper Ops perspective :) [18:56:19] (03PS3) 10Ori.livneh: Remove EventLogging consumer for db1047 [operations/puppet] - 10https://gerrit.wikimedia.org/r/129490 (owner: 10Milimetric) [18:56:26] (03PS4) 10Ori.livneh: Remove EventLogging consumer for db1047 [operations/puppet] - 10https://gerrit.wikimedia.org/r/129490 (owner: 10Milimetric) [18:56:57] (03CR) 10Ori.livneh: [C: 032 V: 032] Remove EventLogging consumer for db1047 [operations/puppet] - 10https://gerrit.wikimedia.org/r/129490 (owner: 10Milimetric) [18:59:06] !log reedy synchronized database lists files: [18:59:13] Logged the message, Master [19:00:15] !log reedy synchronized wmf-config/ [19:00:21] Logged the message, Master [19:00:28] Thanks Andrew [19:00:49] !log eventlogging data streaming into db1048; db1047 consumer decom'd. [19:00:56] Logged the message, Master [19:17:20] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:17:20] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:19:10] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [19:19:10] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [19:22:24] (03CR) 10Nemo bis: "Thanks!" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115879 (owner: 10Nemo bis) [19:23:20] ugh, typo in commit message [19:35:15] (03PS1) 10Rush: admin module for user/group/permissions cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 [19:39:32] (03PS1) 10Rush: one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 [19:39:32] mutante: https://rt.wikimedia.org/Ticket/Display.html?id=6266 ? 
[19:41:11] (03CR) 10jenkins-bot: [V: 04-1] one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 (owner: 10Rush) [19:41:23] (03PS1) 10Milimetric: Add second test wiki database [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/129542 [19:41:50] (03Abandoned) 10Rush: old 'mortals' now 'deployment' role per comment in admins.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/121249 (owner: 10Rush) [19:42:02] (03Abandoned) 10Rush: breaking out parsoid admin role [operations/puppet] - 10https://gerrit.wikimedia.org/r/121109 (owner: 10Rush) [19:42:12] (03Abandoned) 10Rush: jenkins admin user breakout [operations/puppet] - 10https://gerrit.wikimedia.org/r/121091 (owner: 10Rush) [19:42:20] (03Abandoned) 10Rush: ops under new admin [operations/puppet] - 10https://gerrit.wikimedia.org/r/120972 (owner: 10Rush) [19:42:38] (03Abandoned) 10Rush: shell for proposed admin module [operations/puppet] - 10https://gerrit.wikimedia.org/r/120724 (owner: 10Rush) [19:46:25] (03CR) 10Aaron Schulz: [C: 032] Removed redundant pool counter config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127646 (owner: 10Aaron Schulz) [19:46:30] PROBLEM - Host ps1-c2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [19:46:30] (03CR) 10jenkins-bot: [V: 04-1] Removed redundant pool counter config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127646 (owner: 10Aaron Schulz) [19:46:40] PROBLEM - Host db72 is DOWN: PING CRITICAL - Packet loss = 100% [19:46:40] PROBLEM - Host ps1-c1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [19:46:50] PROBLEM - Host ps1-d1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [19:46:50] PROBLEM - Host ps1-c3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [19:46:50] PROBLEM - Host fenari is DOWN: PING CRITICAL - Packet loss = 100% [19:46:50] PROBLEM - Host virt0 is DOWN: PING CRITICAL - Packet loss = 100% [19:46:51] PROBLEM - Host mchenry is DOWN: PING CRITICAL - 
Packet loss = 100% [19:46:51] PROBLEM - Host tarin is DOWN: PING CRITICAL - Packet loss = 100% [19:46:51] PROBLEM - Host ps1-a1-sdtpa is DOWN: PING CRITICAL - Packet loss = 100% [19:46:52] PROBLEM - Host db71 is DOWN: PING CRITICAL - Packet loss = 100% [19:46:52] PROBLEM - Host ekrem is DOWN: PING CRITICAL - Packet loss = 100% [19:46:53] PROBLEM - Host manutius is DOWN: PING CRITICAL - Packet loss = 100% [19:46:53] PROBLEM - Host ps1-d3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [19:46:54] PROBLEM - Host es10 is DOWN: PING CRITICAL - Packet loss = 100% [19:46:54] PROBLEM - Host db74 is DOWN: PING CRITICAL - Packet loss = 100% [19:46:55] PROBLEM - Host es4 is DOWN: PING CRITICAL - Packet loss = 100% [19:46:55] PROBLEM - Host sanger is DOWN: PING CRITICAL - Packet loss = 100% [19:46:56] PROBLEM - Host es7 is DOWN: PING CRITICAL - Packet loss = 100% [19:46:56] PROBLEM - Host linne is DOWN: PING CRITICAL - Packet loss = 100% [19:46:57] PROBLEM - Host db60 is DOWN: PING CRITICAL - Packet loss = 100% [19:46:57] PROBLEM - Host db73 is DOWN: PING CRITICAL - Packet loss = 100% [19:46:58] PROBLEM - Host db69 is DOWN: PING CRITICAL - Packet loss = 100% [19:47:20] PROBLEM - Host dataset2 is DOWN: PING CRITICAL - Packet loss = 100% [19:47:22] ... 
[19:47:30] PROBLEM - Host ps1-d2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [19:47:30] PROBLEM - Host dobson is DOWN: PING CRITICAL - Packet loss = 100% [19:47:30] PROBLEM - Host ns1.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [19:47:30] PROBLEM - Host pdf3 is DOWN: PING CRITICAL - Packet loss = 100% [19:47:30] PROBLEM - Host 208.80.152.131 is DOWN: PING CRITICAL - Packet loss = 100% [19:47:30] PROBLEM - Host nfs1 is DOWN: PING CRITICAL - Packet loss = 100% [19:47:32] that's me [19:47:40] PROBLEM - Host labs-ns0.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [19:47:40] PROBLEM - Host 208.80.152.132 is DOWN: PING CRITICAL - Packet loss = 100% [19:47:41] config error, wait a bit [19:48:32] (03PS2) 10Aaron Schulz: Removed redundant pool counter config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127646 [19:48:40] (03CR) 10Aaron Schulz: [C: 032] Removed redundant pool counter config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127646 (owner: 10Aaron Schulz) [19:51:00] PROBLEM - Host mexia is DOWN: PING CRITICAL - Packet loss = 100% [19:51:16] (03Merged) 10jenkins-bot: Removed redundant pool counter config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127646 (owner: 10Aaron Schulz) [19:54:44] !log aaron synchronized wmf-config/PoolCounterSettings-eqiad.php 'Removed redundant pool counter config' [19:54:54] Logged the message, Master [19:56:10] RECOVERY - Host mexia is UP: PING WARNING - Packet loss = 37%, RTA = 27.51 ms [19:56:20] RECOVERY - Host tarin is UP: PING OK - Packet loss = 0%, RTA = 26.72 ms [19:56:20] RECOVERY - Host es10 is UP: PING OK - Packet loss = 0%, RTA = 26.78 ms [19:56:20] RECOVERY - Host db72 is UP: PING OK - Packet loss = 0%, RTA = 26.77 ms [19:56:20] RECOVERY - Host ps1-d3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 29.00 ms [19:56:20] RECOVERY - Host db74 is UP: PING OK - Packet loss = 0%, RTA = 26.81 ms [19:56:20] RECOVERY - Host es4 is UP: PING OK - 
Packet loss = 0%, RTA = 26.82 ms [19:56:20] RECOVERY - Host db73 is UP: PING OK - Packet loss = 0%, RTA = 26.78 ms [19:56:21] RECOVERY - Host ps1-a1-sdtpa is UP: PING OK - Packet loss = 0%, RTA = 29.84 ms [19:56:21] RECOVERY - Host ps1-d1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 28.70 ms [19:56:22] RECOVERY - Host fenari is UP: PING OK - Packet loss = 0%, RTA = 26.76 ms [19:56:22] RECOVERY - Host db69 is UP: PING OK - Packet loss = 0%, RTA = 26.75 ms [19:56:23] RECOVERY - Host virt0 is UP: PING OK - Packet loss = 0%, RTA = 26.72 ms [19:56:23] RECOVERY - Host db60 is UP: PING OK - Packet loss = 0%, RTA = 26.79 ms [19:56:24] RECOVERY - Host ps1-c3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 29.25 ms [19:56:24] RECOVERY - Host ps1-c2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 28.85 ms [19:56:25] RECOVERY - Host db71 is UP: PING OK - Packet loss = 0%, RTA = 26.78 ms [19:56:25] RECOVERY - Host ps1-c1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 29.12 ms [19:56:26] RECOVERY - Host linne is UP: PING OK - Packet loss = 0%, RTA = 26.77 ms [19:56:26] RECOVERY - Host es7 is UP: PING OK - Packet loss = 0%, RTA = 26.77 ms [19:56:27] RECOVERY - Host sanger is UP: PING OK - Packet loss = 0%, RTA = 26.74 ms [19:56:27] RECOVERY - Host ekrem is UP: PING OK - Packet loss = 0%, RTA = 26.78 ms [19:56:28] RECOVERY - Host manutius is UP: PING OK - Packet loss = 0%, RTA = 26.82 ms [19:56:28] RECOVERY - Host mchenry is UP: PING OK - Packet loss = 0%, RTA = 26.73 ms [19:56:40] RECOVERY - Host dataset2 is UP: PING OK - Packet loss = 0%, RTA = 37.35 ms [19:56:51] RECOVERY - Host dobson is UP: PING OK - Packet loss = 0%, RTA = 36.62 ms [19:56:51] RECOVERY - Host ps1-d2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 38.87 ms [19:56:51] RECOVERY - Host ns1.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 35.68 ms [19:56:51] RECOVERY - Host 208.80.152.131 is UP: PING OK - Packet loss = 0%, RTA = 35.46 ms [19:56:51] RECOVERY - Host pdf3 is UP: PING OK - Packet loss = 0%, RTA = 
35.41 ms [19:57:00] RECOVERY - Host nfs1 is UP: PING OK - Packet loss = 0%, RTA = 38.35 ms [19:57:00] RECOVERY - Host labs-ns0.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 36.66 ms [19:57:00] RECOVERY - Host 208.80.152.132 is UP: PING OK - Packet loss = 0%, RTA = 36.45 ms [19:57:10] (03PS3) 10Reedy: Added Markus Glaser's GPG keys [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121397 (owner: 10Mglaser) [19:57:15] (03CR) 10Reedy: [C: 032] Added Markus Glaser's GPG keys [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121397 (owner: 10Mglaser) [19:58:20] Coren: mind having a look at https://gerrit.wikimedia.org/r/#/c/126969/ ? [19:58:42] * Coren looks [19:58:54] Coren: That domain already has the WMF's DNS [19:59:04] Doneva just kindly did that for me. [19:59:05] (03Merged) 10jenkins-bot: Added Markus Glaser's GPG keys [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121397 (owner: 10Mglaser) [19:59:48] (03CR) 10coren: [C: 031] "Pretty standard stuff." [operations/apache-config] - 10https://gerrit.wikimedia.org/r/126969 (owner: 10Odder) [19:59:49] !log reedy synchronized docroot and w [19:59:55] Logged the message, Master [20:00:01] Do you need me to +2 and deploy? [20:00:28] Yes, the redirects won't work without this [20:00:49] If you're doing apache changes.. Could you please do https://gerrit.wikimedia.org/r/#/c/91339/ too as it's trivially simple? [20:12:24] PHP Warning: gzinflate(): data error in /usr/local/apache/common-local/php-1.24wmf2/includes/Revision.php on line 1296 [20:16:41] There's been a few of those [20:16:51] 125 in the last hour [20:16:53] Do we log what revisions they actually are? [20:17:19] Apparently not. I con only find it in the apache2.log [20:17:26] s/con/can/ [20:20:04] Looks like something worth logging [20:20:44] We apparently don't check the return result of gzinflate either. 
[20:21:29] Ah, so it would just pass out of Revision::decompressRevisionText as false [20:22:50] Reedy: I don't think it's order dependent; did you intend for this to be strictly cosmetic? [20:23:05] Pretty much [20:23:11] Everything else has it near the start [20:23:29] (03CR) 10coren: [C: 032] "Linting." [operations/apache-config] - 10https://gerrit.wikimedia.org/r/91339 (owner: 10Reedy) [20:23:37] (03PS2) 10coren: Redirect wikisource.pl to pl.wikisource.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/126969 (owner: 10Odder) [20:25:34] * Coren waits on Jenkins. [20:26:37] (03CR) 10coren: [C: 032] "Simple rebase." [operations/apache-config] - 10https://gerrit.wikimedia.org/r/126969 (owner: 10Odder) [20:33:18] Hm, wait, that obviously shouldn't be done from fenari anymore [20:33:30] PROBLEM - Host ps1-c1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [20:33:40] PROBLEM - Host ps1-d1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [20:33:40] PROBLEM - Host ps1-c3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [20:33:40] PROBLEM - Host ps1-d3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [20:33:40] PROBLEM - Host ps1-a1-sdtpa is DOWN: PING CRITICAL - Packet loss = 100% [20:33:48] Obviously? [20:33:53] AFAIK it hasn't been moved... [20:34:20] PROBLEM - Host ps1-d2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [20:34:28] Hasn't it? [20:34:30] PROBLEM - Host ps1-c2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [20:34:57] One of a few remnants still on fenari, hence it being moved upstairs [20:35:18] Ah, right. [20:35:50] https://rt.wikimedia.org/Ticket/Display.html?id=6085 [20:37:35] !log sync-apache for 126969 and 91339 [20:37:41] Logged the message, Master [20:39:00] \o/ [20:39:25] !log shutting down sdtpa, cr1-sdtpa, csw1-sdtpa, msw1-sdtpa and other sdtpa hosts gone forever [20:39:31] Logged the message, Master [20:50:18] twkozlowski: Should be pushed. 
[20:51:40] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection refused [20:51:40] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection refused [20:51:40] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection refused [20:51:50] PROBLEM - Apache HTTP on mw1153 is CRITICAL: Connection refused [20:51:50] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection refused [20:51:51] PROBLEM - Apache HTTP on mw1159 is CRITICAL: Connection refused [20:51:51] PROBLEM - Apache HTTP on mw1160 is CRITICAL: Connection refused [20:52:10] PROBLEM - Apache HTTP on mw1156 is CRITICAL: Connection refused [20:52:30] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection refused [20:52:51] Thanks Coren [20:53:05] what's going on? [20:53:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [20:53:11] Coren: that you? [20:53:42] paravoid: Probably; something went funky during the all-graceful, I'm about to fix by hand now. [20:54:20] what went funky? 
[20:54:30] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 65674 bytes in 0.579 second response time [20:54:40] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.224 second response time [20:54:40] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.095 second response time [20:54:40] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.154 second response time [20:54:48] Got a bunch of system failed sanity check followed by 'httpd not running, trying to start' [20:54:51] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.124 second response time [20:54:51] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.153 second response time [20:54:51] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.067 second response time [20:54:51] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.128 second response time [20:55:10] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.095 second response time [20:56:01] Which clearly didn't work. I've just done a round of apache2ctl start on the boxen and back they are. [20:57:08] It doesn't look like the outage was long enough to break the site. Damn, I'm still not a real ops. :-) [20:57:42] <^d> Coren: Just walk away from the keyboard. I'm sure it'll blow up for you then ;-) [21:18:46] !log restarting Zuul [21:18:51] Logged the message, Master [21:18:53] I wanna sleep!! [21:22:23] !log eventlogging dump loading on db1048 m2 master in screen. 
ok to kill if necessary [21:22:30] Logged the message, Master [21:30:20] (03PS1) 10Ottomata: Adding 15 minute load average and iowait to Kafka ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129579 [21:31:17] (03PS2) 10Ottomata: Adding 15 minute load average and iowait to Kafka ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129579 [21:33:40] PROBLEM - MySQL Replication Heartbeat on db1046 is CRITICAL: CRIT replication delay 307 seconds [21:34:00] PROBLEM - MySQL Slave Delay on db1046 is CRITICAL: CRIT replication delay 315 seconds [21:34:36] ACKNOWLEDGEMENT - MySQL Replication Heartbeat on db1046 is CRITICAL: CRIT replication delay 307 seconds Sean Pringle Loading eventlogging dump. - The acknowledgement expires at: 2014-04-26 21:33:53. [21:34:36] ACKNOWLEDGEMENT - MySQL Slave Delay on db1046 is CRITICAL: CRIT replication delay 315 seconds Sean Pringle Loading eventlogging dump. - The acknowledgement expires at: 2014-04-26 21:33:53. [21:35:58] (03CR) 10Chad: [C: 031] Update highlighter [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/129484 (owner: 10Manybubbles) [21:38:02] (03PS3) 10Ottomata: Adding 15 minute load average and iowait to Kafka ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129579 [21:38:07] (03CR) 10Ottomata: [C: 032 V: 032] Adding 15 minute load average and iowait to Kafka ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129579 (owner: 10Ottomata) [21:44:44] <^d> springle: That pp_sortkey change got merged to master. It's off by default, but I guess it can be applied to production wikis at your leisure. [21:44:48] <^d> You want a bug for tracking it? [21:54:10] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [21:54:42] (03PS1) 10Bsitu: Update flow cache version to 4.2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129589 [22:31:57] (03PS2) 10Dr0ptp4kt: Support HTTPS for 429-02 & 428-98. 
Add OM for 428-98, too. [operations/puppet] - 10https://gerrit.wikimedia.org/r/129461 [22:53:27] !log Running PopulateImageSha1.php for all multi-versioned files on all wikis to fix broken SHA-1s [22:53:34] Logged the message, Master [23:00:08] (03PS3) 10Dr0ptp4kt: Support HTTPS for 429-02 & 428-98. [operations/puppet] - 10https://gerrit.wikimedia.org/r/129461 [23:01:12] I guess I'll do the SWAT today [23:02:17] mwalker: one more patch, adding it to list now [23:02:18] https://gerrit.wikimedia.org/r/129604 [23:02:24] spagewmf, your swat request references bug https://bugzilla.wikimedia.org/show_bug.cgi?id=64386 but that bug doesn't have a patch associated? [23:02:31] ebernhardson, kk [23:02:34] mwalker: thats the one i just linked [23:02:36] i think [23:04:14] ebernhardson, hmm; the bug description and the patch commit message dont match [23:04:22] but I'll deploy both the patches [23:04:29] if you dig up the third patch I'll do that one too [23:05:10] mwalker: its just not clear, basically the we write to cache when we save to db, the multiPut patch changed whats written to cache which caused the missing content [23:05:27] so reverting the multiPut to fix the missing content(right content is in db) [23:05:29] ah; kk [23:06:30] (03PS4) 10Ori.livneh: Exec saltutil.sync_all when adding deployment_server grain [operations/puppet] - 10https://gerrit.wikimedia.org/r/129368 (owner: 10BryanDavis) [23:08:13] (03CR) 10Ori.livneh: [C: 032] Exec saltutil.sync_all when adding deployment_server grain [operations/puppet] - 10https://gerrit.wikimedia.org/r/129368 (owner: 10BryanDavis) [23:10:10] OK, Deployments has the right gerrits [23:12:03] (03CR) 10Mwalker: [C: 032] Update flow cache version to 4.2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129589 (owner: 10Bsitu) [23:16:42] (03PS4) 10BBlack: Support HTTPS for 429-02 & 428-98. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129461 (owner: 10Dr0ptp4kt) [23:16:45] !log mwalker synchronized php-1.24wmf2/extensions/Flow 'Updating flow for {{gerrit|129589}} and {{gerrit|129604}}' [23:16:48] (03CR) 10BBlack: [C: 032 V: 032] Support HTTPS for 429-02 & 428-98. [operations/puppet] - 10https://gerrit.wikimedia.org/r/129461 (owner: 10Dr0ptp4kt) [23:16:52] Logged the message, Master [23:16:53] bblack, thx [23:17:19] mwalker thanks for doing this while ebernhardson is being a reality TV star :) [23:17:31] hehe [23:17:34] !log mwalker synchronized wmf-config/CommonSettings.php 'Updating flow configuration {{gerrit|129589}}' [23:17:35] spagewmf, your code is live [23:17:39] if you want to test [23:17:41] Logged the message, Master [23:17:53] mwalker will do, thanks [23:18:34] no fatals or exceptions yet [23:18:36] which is good :) [23:18:57] (03PS2) 10BBlack: Add cp3013 to esams mobile cache backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/128228 [23:19:24] (03PS3) 10BBlack: Add cp3013 to esams mobile cache backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/128228 [23:19:33] bblack: \o/ [23:21:16] (03CR) 10BBlack: [C: 032 V: 032] Add cp3013 to esams mobile cache backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/128228 (owner: 10BBlack) [23:24:56] spagewmf, are we looking good? [23:25:28] mwalker: sorry, yes, both bugs addressed. Thanks again. Now streak in front of the camera [23:25:47] hehe [23:25:57] wikidata just blew the heck up; I wonder what happened [23:27:21] greg-g, I'm done with the deploy [23:27:24] mwalker: annnnnnnd you're being taped [23:27:29] :) [23:28:14] anyone filmed, load this up and start typing http://fediafedia.com/neo/blackmesa/index.html [23:29:10] paravoid: yeah so, cp3013 as a backend is working, I don't think the pybal frontend bit is working right for some reason [23:29:27] what's wrong? 
[23:29:32] pybal.log says it's successful at connecting and it's pooled [23:29:45] but I'm just not seeing the connection traffic on cp3013 varnishlog -n frontend [23:29:50] not sure yet [23:30:21] oh [23:30:22] I know.. [23:30:29] it just occured to me [23:31:04] depool it :) [23:31:06] depool it, it's broken and we're going to lose traffic [23:31:18] aw shit, why didn't I think of this before [23:31:34] ok it's coming out [23:31:39] greg-g: Let me know when I can run a test to make sure scap-rebuild-cdbs still works now that Ori merged my python version. [23:32:17] bd808: pretty sure swat's over (right mwalker ?), so, now [23:32:31] so.. we're in a bit of a trouble :) [23:32:32] * greg-g is still catching up mentally [23:32:35] not just with the cp* [23:32:36] paravoid: ??? [23:32:45] not that kind of trouble greg-g [23:32:58] * greg-g breathes [23:33:02] yepyep; I'm out of the swat [23:33:05] bblack: so, we do LVS-DR [23:33:16] paravoid: sorry, just thought you were responding to me [23:33:31] bblack: which means that the LVS server needs to be on the same LAN(s) as the backend servers [23:33:51] amslvs are not on the private network, so this won't work [23:34:00] right, ok [23:34:04] you could of course say that we could put the cp* back to the public network [23:34:07] but [23:34:12] the new lvs servers are only on the *private* network [23:34:24] so they'll be broken with all public servers [23:34:33] i.e. with all of our caches there [23:34:34] can't we put the lvs on both networks? [23:34:39] right, that's what we do at eqiad [23:34:40] vlans [23:34:44] right [23:34:53] using lan support in linux over a single link, right? 
[23:34:57] *vlan support [23:34:58] ulsfo is easier, since everything is on the private lan [23:35:00] yeah [23:35:03] 802.1q [23:35:08] ok [23:35:18] hm, I'm not sure if that's we do at eqiad [23:35:19] well, we can still use cp301[34] as backends for now and then sort this mess out [23:35:24] yes it is [23:35:29] yes [23:35:38] the mess needs to be sorted for lvs30xx as well [23:35:44] yeah [23:35:45] it's a good thing we're doing this together [23:35:50] :) [23:35:50] the two transitions :) [23:36:08] it'd be way more error-prone to switch the ports to tagged [23:36:12] on the existing LVSes :) [23:36:16] moving cp301[34] to private1 has proved to be an enlightening and informative adventure :) [23:36:32] !log Running scap-rebuild-cdbs on tin to test python port [23:36:33] well, we'd deal with that even without this step [23:36:38] Logged the message, Master [23:36:38] because of lvs30xx->public [23:37:06] yeah but that would've been a much worse way to find out [23:37:12] :) [23:37:18] indeed [23:37:28] sorry for not thinking about this sooner :( [23:37:40] well I could've thought of it too :P [23:38:34] I think for eqiad we use a separate port to do trunking [23:39:19] yeah I'll dig around and look at eqiad's vlan setup and figuring out how to make lvs300x do something similar [23:39:59] note that eqiad is a bit more complicated anyway [23:40:12] because each of the 4 rows of racks is a separate L3 domain [23:40:23] with a separate switch (stack) [23:40:33] so each LVS server connects with 4 different cross-row cables to each stack [23:40:43] so eth0 is row A, eth1 is row B, eth2 is row C and eth3 is row D [23:41:01] and then there's eth3.1011, which is vlan 1011 (say, analytics vlan) for row D [23:41:24] layer3 domains are basically a NxM where N = row, M = purpose [23:41:38] greg-g: Ok for a "no-op" scap? The quick test on tin looked good but I'd like to see the whole run [23:41:49] (does that make any sense?) 
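[editor's note] The interface naming described above (eth0 through eth3 for rows A through D at eqiad, plus eth3.1011 for VLAN 1011 on row D) follows the usual Linux "parent.vlan-id" convention for 802.1q sub-interfaces. Below is a minimal sketch of how such names break down. The helper names parse_vlan_interface and describe are hypothetical and not part of any tool mentioned in the log, and the row mapping is taken only from the description in the conversation.

```python
import re

# Hypothetical helpers for illustration only; not part of pybal, puppet,
# or any other tool named in the log.
_VLAN_IFACE = re.compile(r"^(?P<parent>[A-Za-z][A-Za-z0-9]*)(?:\.(?P<vid>\d+))?$")

# Row mapping as described in the conversation for eqiad (assumption:
# it applies exactly as stated there and nowhere else).
EQIAD_ROW_BY_PARENT = {"eth0": "A", "eth1": "B", "eth2": "C", "eth3": "D"}


def parse_vlan_interface(name):
    """Split 'eth3.1011' into ('eth3', 1011); plain 'eth0' gives ('eth0', None)."""
    match = _VLAN_IFACE.match(name)
    if match is None:
        raise ValueError(f"not a recognised interface name: {name!r}")
    parent = match.group("parent")
    vid = match.group("vid")
    if vid is None:
        return parent, None
    vlan_id = int(vid)
    # 802.1q VLAN IDs are 12 bits; 0 and 4095 are reserved.
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    return parent, vlan_id


def describe(name):
    """Return a one-line summary of the row and tagging for an interface name."""
    parent, vlan_id = parse_vlan_interface(name)
    row = EQIAD_ROW_BY_PARENT.get(parent, "unknown")
    tag = "untagged" if vlan_id is None else f"vlan {vlan_id}"
    return f"{name}: row {row}, {tag}"


if __name__ == "__main__":
    for iface in ("eth0", "eth3", "eth3.1011"):
        print(describe(iface))
```

This only models the naming; actually creating a tagged sub-interface on an LVS host is a separate configuration step that the log does not spell out.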
[23:41:55] bd808|deploy: yessir [23:43:23] mostly [23:43:57] but esams I should be able to just stick with the existing eth0 and make eth0.100 and eth0.103 or whatever they are [23:44:46] yes [23:45:06] https://ishmael.wikimedia.org/sample/?host=db1052 :( [23:45:19] breaksy wakesy [23:52:30] !log bd808 Started scap: no-op scap to validate I24149ab and Ie967901 [23:52:36] Logged the message, Master [23:53:59] <^d> springle: Went ahead and filed https://bugzilla.wikimedia.org/64411 for the pp_sortkey schema change. [23:55:21] !log bd808 Finished scap: no-op scap to validate I24149ab and Ie967901 (duration: 02m 51s) [23:55:28] Logged the message, Master [23:56:18] Hmmmm [23:56:38] ori: "scap-rebuild-cdbs: 100% (ok: 1; fail: 228; left: 0)" [23:56:52] "scap-rebuild-cdbs: 100% (ok: 1; fail: 228; left: 0)" [23:57:57] fscking file permissions :/ [23:58:57] bd808|deploy: let me know if you want me to help [23:59:22] I think I can fix it. Typing dsh command now