[09:43:03] (PS2) Andrew Bogott: Puppetize the new_install key on palladium, add to iron. [operations/puppet] - https://gerrit.wikimedia.org/r/115566
[09:45:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[09:45:53] (CR) Andrew Bogott: [C: 2] Puppetize the new_install key on palladium, add to iron. [operations/puppet] - https://gerrit.wikimedia.org/r/115566 (owner: Andrew Bogott)
[09:49:46] (CR) Ori.livneh: [C: 2] logstash: Parse and store scap logs [operations/puppet] - https://gerrit.wikimedia.org/r/115857 (owner: BryanDavis)
[10:03:09] RECOVERY - Puppet freshness on virt1004 is OK: puppet ran at Thu Feb 27 10:03:01 UTC 2014
[10:04:59] RECOVERY - DPKG on virt1003 is OK: All packages OK
[10:05:09] RECOVERY - Disk space on virt1003 is OK: DISK OK
[10:05:29] RECOVERY - puppet disabled on virt1003 is OK: OK
[10:05:29] RECOVERY - RAID on virt1003 is OK: OK: Active: 16, Working: 16, Failed: 0, Spare: 0
[10:06:29] RECOVERY - puppet disabled on virt1004 is OK: OK
[10:06:29] RECOVERY - Disk space on virt1004 is OK: DISK OK
[10:06:49] RECOVERY - RAID on virt1004 is OK: OK: Active: 16, Working: 16, Failed: 0, Spare: 0
[10:06:59] RECOVERY - DPKG on virt1004 is OK: All packages OK
[10:08:19] RECOVERY - NTP on virt1003 is OK: NTP OK: Offset -0.01093828678 secs
[10:09:19] RECOVERY - NTP on virt1004 is OK: NTP OK: Offset -0.01491391659 secs
[10:17:41] (PS1) Nemo bis: Use $wgTranslatePageTranslationULS [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115879
[10:17:43] (PS1) Nemo bis: Remove dead ULS configs after I49e812eae32266f165591c75fd67b86ca06b13f0 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115880
[10:17:45] (PS1) Nemo bis: Bring together all Translate/Language configurations and dependencies [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115881
[10:17:47] (CR) jenkins-bot: [V: -1] Use $wgTranslatePageTranslationULS [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115879 (owner: Nemo bis)
[10:19:09] PROBLEM - Host virt1005 is DOWN: PING CRITICAL - Packet loss = 100%
[10:19:09] PROBLEM - Host virt1006 is DOWN: PING CRITICAL - Packet loss = 100%
[10:20:29] (PS1) Andrew Bogott: Add two more compute nodes to eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/115882
[10:22:35] (PS2) Andrew Bogott: Add two more compute nodes to eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/115882
[10:23:18] (PS2) Nemo bis: Use $wgTranslatePageTranslationULS [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115879
[10:24:19] RECOVERY - Host virt1005 is UP: PING OK - Packet loss = 0%, RTA = 5.62 ms
[10:24:19] RECOVERY - Host virt1006 is UP: PING OK - Packet loss = 0%, RTA = 4.11 ms
[10:24:28] andrewbogott: got senil, how do i restore a file or a change i did and commited to gerrit?
[10:24:37] (CR) Andrew Bogott: [C: 2] Add two more compute nodes to eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/115882 (owner: Andrew Bogott)
[10:24:48] matanya, what do you mean 'restore'?
[10:25:05] master's version
[10:25:40] If you want to throw out local changes, just 'git reset --hard origin'
[10:25:42] is that what you mean?
[10:26:03] maybe
[10:26:05] or move to a new branch and leave your changes behind… git checkout -b newbranch origin
[10:26:07] i'll try :)
[10:26:19] PROBLEM - Disk space on virt1005 is CRITICAL: Connection refused by host
[10:26:19] PROBLEM - puppet disabled on virt1006 is CRITICAL: Connection refused by host
[10:26:19] PROBLEM - SSH on virt1005 is CRITICAL: Connection refused
[10:26:40] I'm reinstalling virt1005 and 1006, all those warnings are me.
[10:26:43] andrewbogott: git checkout master was it
[10:26:57] Oh yeah, for a single file.
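The exchange above mixes two different git operations. The first discards every local change (`git reset --hard origin`). The second restores a single file from another branch, which is what "git checkout master" did here (`git checkout master -- <file>`). The sketch below wraps both in Python with `subprocess`. It assumes `git` is on `PATH`, and the function names are hypothetical helpers, not part of any Wikimedia tool:

```python
import subprocess


def restore_file(repo, path, ref="master"):
    """Replace one file in the working tree (and index) with its version at `ref`.

    Same as `git checkout <ref> -- <path>`; other local edits are left alone.
    """
    subprocess.run(["git", "checkout", ref, "--", path], cwd=repo, check=True)


def discard_local_changes(repo, ref="origin"):
    """Reset the current branch, index and working tree to `ref`.

    Same as `git reset --hard <ref>`; any uncommitted work is lost.
    """
    subprocess.run(["git", "reset", "--hard", ref], cwd=repo, check=True)
```

The single-file form is the safer choice when only one file went wrong. The hard reset throws away everything not yet committed.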
[10:26:59] PROBLEM - DPKG on virt1006 is CRITICAL: Connection refused by host
[10:26:59] PROBLEM - puppet disabled on virt1005 is CRITICAL: Connection refused by host
[10:26:59] PROBLEM - SSH on virt1006 is CRITICAL: Connection refused
[10:26:59] PROBLEM - DPKG on virt1005 is CRITICAL: Connection refused by host
[10:26:59] PROBLEM - Disk space on virt1006 is CRITICAL: Connection refused by host
[10:27:09] PROBLEM - RAID on virt1005 is CRITICAL: Connection refused by host
[10:27:16] (PS2) Nemo bis: Remove dead ULS configs after I49e812eae32266f165591c75fd67b86ca06b13f0 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115880
[10:27:19] PROBLEM - RAID on virt1006 is CRITICAL: Connection refused by host
[10:27:32] (PS2) Nemo bis: Bring together all Translate/Language configurations and dependencies [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115881
[10:27:35] (PS2) Matanya: remove locke, decom [operations/dns] - https://gerrit.wikimedia.org/r/115583
[10:29:37] (PS3) Matanya: remove locke, decom [operations/dns] - https://gerrit.wikimedia.org/r/115583
[10:31:39] (CR) Nemo bis: Bring together all Translate/Language configurations and dependencies (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115881 (owner: Nemo bis)
[10:32:01] damit, rebase
[10:35:05] hm… who is merging "Bryan Davis: logstash: Parse and store scap logs" ?
[10:37:10] (PS2) Matanya: remove sockpuppet, decom [operations/dns] - https://gerrit.wikimedia.org/r/115688
[10:38:19] PROBLEM - NTP on virt1006 is CRITICAL: NTP CRITICAL: No response from NTP server
[10:38:49] PROBLEM - NTP on virt1005 is CRITICAL: NTP CRITICAL: No response from NTP server
[10:43:09] PROBLEM - Host virt1007 is DOWN: PING CRITICAL - Packet loss = 100%
[10:43:19] PROBLEM - Host virt1008 is DOWN: PING CRITICAL - Packet loss = 100%
[10:43:19] (CR) Alexandros Kosiaris: "LGTM. Just reiterating to make sure this will work" [operations/puppet] - https://gerrit.wikimedia.org/r/115821 (owner: DamianZaremba)
[10:43:59] PROBLEM - Host virt1009 is DOWN: PING CRITICAL - Packet loss = 100%
[10:46:20] RECOVERY - SSH on virt1005 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[10:46:29] PROBLEM - Host virt1006 is DOWN: PING CRITICAL - Packet loss = 100%
[10:47:00] RECOVERY - SSH on virt1006 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[10:47:09] RECOVERY - Host virt1006 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[10:48:19] RECOVERY - Host virt1007 is UP: PING OK - Packet loss = 0%, RTA = 1.95 ms
[10:48:25] * andrewbogott apologizes, again, for the noise
[10:48:29] RECOVERY - Host virt1008 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[10:49:09] RECOVERY - Host virt1009 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[10:50:19] PROBLEM - puppet disabled on virt1007 is CRITICAL: Connection refused by host
[10:50:29] PROBLEM - DPKG on virt1007 is CRITICAL: Connection refused by host
[10:50:29] PROBLEM - SSH on virt1008 is CRITICAL: Connection refused
[10:50:29] PROBLEM - RAID on virt1008 is CRITICAL: Connection refused by host
[10:50:30] PROBLEM - Disk space on virt1007 is CRITICAL: Connection refused by host
[10:50:59] PROBLEM - DPKG on virt1008 is CRITICAL: Connection refused by host
[10:51:00] PROBLEM - Disk space on virt1008 is CRITICAL: Connection refused by host
[10:51:09] PROBLEM - puppet disabled on virt1008 is CRITICAL: Connection refused by host
[10:51:19] PROBLEM - SSH on virt1007 is CRITICAL: Connection refused
[10:51:19] PROBLEM - Disk space on virt1009 is CRITICAL: Connection refused by host
[10:51:19] PROBLEM - RAID on virt1007 is CRITICAL: Connection refused by host
[10:51:39] PROBLEM - DPKG on virt1009 is CRITICAL: Connection refused by host
[10:51:39] PROBLEM - RAID on virt1009 is CRITICAL: Connection refused by host
[10:51:39] PROBLEM - puppet disabled on virt1009 is CRITICAL: Connection refused by host
[10:51:49] PROBLEM - SSH on virt1009 is CRITICAL: Connection refused
[10:54:59] RECOVERY - puppet disabled on virt1005 is OK: OK
[10:54:59] RECOVERY - DPKG on virt1005 is OK: All packages OK
[10:55:19] RECOVERY - RAID on virt1005 is OK: OK: Active: 16, Working: 16, Failed: 0, Spare: 0
[10:55:19] RECOVERY - Disk space on virt1005 is OK: DISK OK
[10:56:59] RECOVERY - DPKG on virt1006 is OK: All packages OK
[10:56:59] RECOVERY - Disk space on virt1006 is OK: DISK OK
[10:57:06] (CR) Hashar: "@Odder please STOP adding me as a reviewer on this change. Thanks." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[10:57:19] RECOVERY - RAID on virt1006 is OK: OK: Active: 16, Working: 16, Failed: 0, Spare: 0
[10:57:19] RECOVERY - puppet disabled on virt1006 is OK: OK
[11:02:30] PROBLEM - NTP on virt1008 is CRITICAL: NTP CRITICAL: No response from NTP server
[11:02:59] PROBLEM - NTP on virt1007 is CRITICAL: NTP CRITICAL: No response from NTP server
[11:03:39] PROBLEM - NTP on virt1009 is CRITICAL: NTP CRITICAL: No response from NTP server
[11:08:49] PROBLEM - Puppet freshness on fenari is CRITICAL: Last successful Puppet run was Wed 26 Feb 2014 08:04:36 PM UTC
[11:10:29] PROBLEM - Host virt1007 is DOWN: PING CRITICAL - Packet loss = 100%
[11:10:39] PROBLEM - Host virt1008 is DOWN: PING CRITICAL - Packet loss = 100%
[11:11:19] RECOVERY - SSH on virt1007 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[11:11:19] PROBLEM - Host virt1009 is DOWN: PING CRITICAL - Packet loss = 100%
[11:11:30] RECOVERY - Host virt1007 is UP: PING OK - Packet loss = 0%, RTA = 1.77 ms
[11:11:49] PROBLEM - Puppet freshness on tin is CRITICAL: Last successful Puppet run was Wed 26 Feb 2014 08:07:13 PM UTC
[11:12:29] RECOVERY - SSH on virt1008 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[11:12:39] RECOVERY - Host virt1008 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[11:12:49] RECOVERY - NTP on virt1005 is OK: NTP OK: Offset -0.03917109966 secs
[11:12:49] RECOVERY - SSH on virt1009 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[11:12:59] RECOVERY - Host virt1009 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms
[11:13:26] andrewbogott: I assume all these are you, right?
[11:13:33] yes, sorry.
[11:13:46] There will be a flood of recoveries in a moment, then blessed silence.
[11:13:56] heh, okay :)
[11:15:09] RECOVERY - NTP on virt1006 is OK: NTP OK: Offset -0.0413364172 secs
[11:16:18] paravoid, is there a way to tell icinga to disable all warnings for a host, temporarily?
[11:16:28] yes, silence notifications
[11:16:32] I poked around the web interface briefly and didn't feel confident that I wouldn't be disabling things forever.
[11:16:38] * andrewbogott looks
[11:16:57] iirc, you can give it a time period, or until it recovers, or until you manually reenable it
[11:19:49] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Last successful Puppet run was Wed 26 Feb 2014 08:14:54 PM UTC
[11:20:00] (PS1) Alexandros Kosiaris: Fix syntax error in osmlabsdb.cfg [operations/puppet] - https://gerrit.wikimedia.org/r/115884
[11:20:24] (CR) Alexandros Kosiaris: [C: 2 V: 2] Fix syntax error in osmlabsdb.cfg [operations/puppet] - https://gerrit.wikimedia.org/r/115884 (owner: Alexandros Kosiaris)
[11:20:27] I see… 'delay next host notification'?
[11:20:59] RECOVERY - DPKG on virt1008 is OK: All packages OK
[11:20:59] RECOVERY - Disk space on virt1008 is OK: DISK OK
[11:21:09] RECOVERY - puppet disabled on virt1008 is OK: OK
[11:21:19] RECOVERY - RAID on virt1007 is OK: OK: Active: 16, Working: 16, Failed: 0, Spare: 0
[11:21:19] RECOVERY - Disk space on virt1009 is OK: DISK OK
[11:21:19] RECOVERY - puppet disabled on virt1007 is OK: OK
[11:21:29] RECOVERY - DPKG on virt1007 is OK: All packages OK
[11:21:29] RECOVERY - RAID on virt1008 is OK: OK: Active: 16, Working: 16, Failed: 0, Spare: 0
[11:21:30] RECOVERY - Disk space on virt1007 is OK: DISK OK
[11:21:39] RECOVERY - DPKG on virt1009 is OK: All packages OK
[11:21:39] RECOVERY - puppet disabled on virt1009 is OK: OK
[11:21:39] RECOVERY - RAID on virt1009 is OK: OK: Active: 16, Working: 16, Failed: 0, Spare: 0
[11:22:34] hashar_: ?
[11:22:48] no andrewbogott Disable notifications for all services on this host
[11:22:59] you will need to reenable once you are done
[11:23:32] Ah, yeah, I was hoping for something timed
[11:23:36] but, yeah, I see how to do that.
[11:23:45] and if you want to disable notifications for the host itselfm :Disable notifications for this host
[11:23:54] which you also need to reenable
[11:24:23] andrewbogott: i had a handy script once, that changes this on demand
[11:24:41] i can look fot it if it is worth something for you
[11:25:46] odder: hi! I dont want to be involved in the "remove Flow from metawiki" drama :-]
[11:26:31] hashar_: You will notice I had nothing to do with that Gerrit patch since my comment at 18:45 UTC on Feb 27. Thank you.
[11:26:40] matanya: Hopefully I won't be doing this again anytime soon
[11:26:47] Feb 25, of course
[11:26:49] amen
[11:27:33] odder: might have confused who asked me to review so :D
[11:28:16] MZMcBride updated the commit message, maybe that's why you got a notification
[11:29:32] akosiaris: pdf1 mgmt ip is 10.1.8.19 ?
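What andrewbogott wanted, a silence that expires on its own, corresponds to scheduled downtime in Icinga, a feature inherited from Nagios. Unlike the "Disable notifications" links, scheduled downtime needs no manual re-enable. The sketch below builds the external-command lines for that. Several details are assumptions. It targets Icinga 1.x's external command interface. The command-file path is believed to be the Debian default and should be checked against `command_file` in icinga.cfg. The helper names are hypothetical, not the script matanya mentions:

```python
import time

# Assumption: Debian's Icinga 1.x default; check `command_file` in icinga.cfg.
COMMAND_FILE = "/var/lib/icinga/rw/icinga.cmd"


def downtime_commands(host, minutes, author, comment, now=None):
    """Build fixed-downtime commands covering a host and all of its services."""
    for field in (host, author, comment):
        # ';' separates fields and '\n' ends a command, so neither may appear.
        if ";" in field or "\n" in field:
            raise ValueError("command fields may not contain ';' or newlines")
    start = int(time.time() if now is None else now)
    end = start + minutes * 60
    # Field order: host;start;end;fixed;trigger_id;duration;author;comment
    args = f"{host};{start};{end};1;0;{minutes * 60};{author};{comment}"
    return [
        f"[{start}] SCHEDULE_HOST_DOWNTIME;{args}",
        f"[{start}] SCHEDULE_HOST_SVC_DOWNTIME;{args}",
    ]


def submit(commands, path=COMMAND_FILE):
    """Write commands to the Icinga command pipe (needs write access to it)."""
    with open(path, "w") as pipe:
        for line in commands:
            pipe.write(line + "\n")
```

With `fixed` set to 1, the downtime runs exactly from start to end, and Icinga lifts it automatically when it ends. For example, `submit(downtime_commands("virt1005", 60, "andrewbogott", "reinstall"))` would cover one reinstall window.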
[11:30:29] that is what DNS says
[11:30:31] paravoid: can you please confirm: https://gerrit.wikimedia.org/r/#/c/114136/ ?
[11:30:40] ok, i'll fix it thanks akosiaris
[11:34:20] (PS2) Tim Landscheidt: standard-packages: remove absent statements of unused packages [operations/puppet] - https://gerrit.wikimedia.org/r/114136 (owner: Matanya)
[11:37:21] (CR) Tim Landscheidt: labs_lvm: use subshell instead of trust puppet (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/115811 (owner: coren)
[11:38:29] RECOVERY - NTP on virt1008 is OK: NTP OK: Offset -0.02925932407 secs
[11:38:29] RECOVERY - NTP on virt1009 is OK: NTP OK: Offset -0.03124427795 secs
[11:38:59] RECOVERY - NTP on virt1007 is OK: NTP OK: Offset -0.04845142365 secs
[11:43:09] (PS2) Matanya: removed pdf1, decom [operations/dns] - https://gerrit.wikimedia.org/r/115581
[11:48:16] (Abandoned) Hashar: retab mail.pp [operations/puppet] - https://gerrit.wikimedia.org/r/104806 (owner: Hashar)
[11:48:24] (Abandoned) Hashar: certs.pp puppet lint fixes [operations/puppet] - https://gerrit.wikimedia.org/r/104743 (owner: Hashar)
[11:48:31] (Abandoned) Hashar: mail.pp puppet lint fixes [operations/puppet] - https://gerrit.wikimedia.org/r/104807 (owner: Hashar)
[11:48:37] (Abandoned) Hashar: retab realm.pp [operations/puppet] - https://gerrit.wikimedia.org/r/104808 (owner: Hashar)
[11:48:44] (Abandoned) Hashar: realm.pp puppet lint fixes [operations/puppet] - https://gerrit.wikimedia.org/r/104809 (owner: Hashar)
[11:49:24] (PS9) Hashar: sanity test for refreshWikiversionsCDB [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/105698
[11:51:44] (PS2) Hashar: contint: webproxy for maven on CI production slaves [operations/puppet] - https://gerrit.wikimedia.org/r/114597
[11:52:35] (PS2) Hashar: contint: slaves now have openjdk-{6,7}-jdk [operations/puppet] - https://gerrit.wikimedia.org/r/114619
[11:52:49] (CR) Hashar: "Already installed manually iirc :-]" [operations/puppet] - https://gerrit.wikimedia.org/r/114619 (owner: Hashar)
[12:07:59] PROBLEM - DPKG on virt1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[12:09:59] RECOVERY - DPKG on virt1006 is OK: All packages OK
[12:46:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[13:35:35] !log Deploy integration/slave-scripts Icd2b25fd882b7953ee
[13:35:43] Logged the message, Master
[14:09:49] PROBLEM - Puppet freshness on fenari is CRITICAL: Last successful Puppet run was Wed 26 Feb 2014 08:04:36 PM UTC
[14:12:49] PROBLEM - Puppet freshness on tin is CRITICAL: Last successful Puppet run was Wed 26 Feb 2014 08:07:13 PM UTC
[14:20:49] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Last successful Puppet run was Wed 26 Feb 2014 08:14:54 PM UTC
[14:26:57] (PS1) Hashar: beta: fix upload cache directors [operations/puppet] - https://gerrit.wikimedia.org/r/115910
[14:29:04] (CR) Matanya: [C: 1] contint: slaves now have openjdk-{6,7}-jdk [operations/puppet] - https://gerrit.wikimedia.org/r/114619 (owner: Hashar)
[14:36:58] (CR) Matanya: contint: webproxy for maven on CI production slaves (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/114597 (owner: Hashar)
[14:37:31] hashar: i know you didn't ask for it^ , but well :)
[14:40:47] (CR) Ottomata: [C: 2 V: 2] Explicitly link with zlib [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/115867 (owner: Edenhill)
[14:42:29] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[14:43:09] (CR) Ottomata: [C: 2 V: 2] Dont require whitespace-separated Hdr:fields [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/115868 (owner: Edenhill)
[14:44:06] (CR) Hashar: "A possible solution for git::clone is that one can use git init to set the appropriate file permissions. Ie AFTER the git clone / git pull" [operations/puppet] - https://gerrit.wikimedia.org/r/115851 (owner: Dzahn)
[14:54:52] (PS1) ArielGlenn: Amir Aharoni access to stat1002, rt #6760 [operations/puppet] - https://gerrit.wikimedia.org/r/115918
[14:55:33] (CR) jenkins-bot: [V: -1] Amir Aharoni access to stat1002, rt #6760 [operations/puppet] - https://gerrit.wikimedia.org/r/115918 (owner: ArielGlenn)
[14:55:58] figures
[14:57:20] (PS2) ArielGlenn: Amir Aharoni access to stat1002, rt #6760 [operations/puppet] - https://gerrit.wikimedia.org/r/115918
[15:02:24] (CR) ArielGlenn: [C: 2] Amir Aharoni access to stat1002, rt #6760 [operations/puppet] - https://gerrit.wikimedia.org/r/115918 (owner: ArielGlenn)
[15:09:23] scfc_de: you did it!
[15:16:27] is there a way to be subscribed to all puppet related patches?
[15:18:12] mark: can i subscribe to ops-l?
[15:19:39] akosiaris: I'm in the market for project-migration guinea pigs. Perhaps https://wikitech.wikimedia.org/wiki/Nova_Resource:Akosiaristests is a good candidate?
[15:19:52] (PS4) BBlack: Set ZeroOpts=tls cookie for HTTPS-enabled Zero clients [operations/puppet] - https://gerrit.wikimedia.org/r/115669
[15:20:06] andrewbogott: how much ?
[15:20:39] I can offer… the satisfaction of a job well done.
[15:20:39] Or, possibly, a job poorly done.
[15:20:49] it's a deal!
[15:20:53] andrewbogott: you can move the Varnish project if you want
[15:21:19] it's got two instances, zero-vmod that I use (but can fairly easily rebuild if it's hosed), and some old instance I don't think anyone's using (varnish-precise)
[15:21:31] anyway, feel free to move them
[15:21:35] bblack, want to just delete the instances that are defunct?
[15:21:54] yeah I guess I should, that other one isn't "mine", but I'm pretty sure nobody's used it in months or cares
[15:22:11] varnish-precise is mine
[15:22:17] feel free to move it anytime
[15:22:47] ah there we go! someone does care about it :)
[15:23:08] akosiaris: are all those instances still used? Or can some be deleted?
[15:23:30] I'll start with akosiaris if he doesn't mind downtime...
[15:23:48] andrewbogott: they are indeed used so I 'd rather you did not delete them
[15:23:54] especially the package building one
[15:24:00] akosiaris: ok -- downtime's ok though?
[15:24:06] yeah sure
[15:24:13] as long as i don't notice anything :P
[15:24:22] hah
[15:24:26] just ddos him
[15:24:29] he won't notice anything
[15:24:38] andrewbogott: what prevents instances stay around for ever?
[15:24:59] matanya: resources ?
[15:25:14] i mean how takes care to delete un used ones?
[15:25:21] matanya: nothing at the moment. Someday I hope to have some way of detecting a year-long idle instance...
[15:25:32] just start charging
[15:25:35] wikicoins
[15:25:45] pay in edit count
[15:25:46] :-)
[15:25:49] we could put some kind of cron in the default puppet setup to send an alert if nobody's logged into a machine in a month
[15:25:57] you can check uptime with nagios, and kill long running stuff
[15:26:37] matanya: https://whispersystems.org/blog/bithub/
[15:26:38] er
[15:26:40] mark: ^
[15:27:36] btw paravoid can you please confirm: https://gerrit.wikimedia.org/r/#/c/114136/ ?
[15:29:00] (PS3) Faidon Liambotis: base: remove redundant ensure => absent Packages [operations/puppet] - https://gerrit.wikimedia.org/r/114136 (owner: Matanya)
[15:29:04] (PS4) Matanya: base: remove redundant ensure => absent Packages [operations/puppet] - https://gerrit.wikimedia.org/r/114136
[15:29:11] (CR) Faidon Liambotis: [C: 2 V: 2] base: remove redundant ensure => absent Packages [operations/puppet] - https://gerrit.wikimedia.org/r/114136 (owner: Matanya)
[15:29:20] thank you
[15:31:03] oof, andrewbogott, ariel
[15:31:17] do you know how to clear puppet stored data?
[15:31:23] the problem is that ssh::hostkey does this
[15:31:28] $host = regsubst($title, '^([^\.]+)\..*$', '\1')
[15:31:31] sshkey { $host:
[15:31:35] and it is called like this:
[15:31:40] (apergos: ^)
[15:31:46] @@ssh::hostkey { $::fqdn:
[15:31:46] That makes sense as th explanation, but I don't know how to clear it
[15:32:04] what's the problem?
[15:32:14] I see
[15:32:26] paravoid, i switched analytics1003 and 1004 from wikimedia.org -> eqiad.wmnet yesterday
[15:32:31] you can clean all exported resources with a script we have
[15:32:39] oo, have to do all of them eh?
[15:32:39] you should had done that when you switched hostnames
[15:32:46] it might toss exported resources for both hostnames
[15:32:48] also revoke the old puppet certificate & salt certificate
[15:33:00] but that doesn't matter, puppet will just put them back with the right name on the next run
[15:33:06] i revoked the old puppet cert
[15:33:54] akosiaris: plan merging https://gerrit.wikimedia.org/r/#/c/112885/ ?
[15:34:15] * matanya is on an ops hunt for reviews/merges
[15:34:47] yes but i wanna test something on it first (just to make sure). but not available to be hunted right now
[15:34:55] https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Reclaim_or_Decommission the steps you want here are: run puppetstoredconfigclean.rb on the pupetmaster
[15:35:08] ah it's the cron job that does them both, this does only th speciic fqdn
[15:35:28] and... manually revoke keys form salt
[15:35:30] akosiaris: Ok, I'm going to run a script which will shut down your instances one by one, and bring up duplicates in eqiad. Please leave the pmtpa versions shut down if you can stand it...
[15:35:32] (CR) Hashar: "The daemon is installed by timidity-daemon which s apparently not installed on the labs instance on which I applied role::applicationserve" [operations/puppet] - https://gerrit.wikimedia.org/r/115618 (owner: Hashar)
[15:35:50] andrewbogott: Oh can I create new ones in eqiad ?
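The `regsubst()` call quoted above keeps only the text before the first dot, so each `sshkey` resource is named after the short hostname. When analytics1003 moved from wikimedia.org to eqiad.wmnet, the stale export and the new one presumably both map to `sshkey { 'analytics1003': }` until the old exported resource is cleaned out. The Python sketch below illustrates that collision with the same regular expression. Python's `re` and Puppet's Ruby regexps treat this simple pattern the same way. The helper names and the list of stored titles are illustrative assumptions:

```python
import re
from collections import defaultdict

# Same pattern and replacement as the regsubst() call in ssh::hostkey.
SHORTNAME_RE = re.compile(r'^([^\.]+)\..*$')


def sshkey_name(title):
    """Return the sshkey resource name derived from an ssh::hostkey title."""
    return SHORTNAME_RE.sub(r'\1', title)


def find_collisions(exported_titles):
    """Group exported ssh::hostkey titles by derived sshkey name; keep clashes."""
    by_name = defaultdict(list)
    for title in exported_titles:
        by_name[sshkey_name(title)].append(title)
    return {name: titles for name, titles in by_name.items() if len(titles) > 1}


stored = [
    "analytics1003.wikimedia.org",  # stale export from before the rename
    "analytics1003.eqiad.wmnet",    # export made after the rename
    "analytics1004.eqiad.wmnet",
]
print(find_collisions(stored))
```

Here only analytics1003 is reported, because both of its FQDNs reduce to the same short name. Removing the stale wikimedia.org export, as puppetstoredconfigclean.rb does, leaves a single title per name.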
[15:35:52] At the end you'll want to change the dns setting for your one public IP to point to the new eqiad instance. You know how to do that, right?
[15:36:04] matanya: Re subscribing, have you looked at Gerrit => settings => watching something?
[15:36:21] andrewbogott: I think I do
[15:36:29] akosiaris: You'll automatically get duplicate instances in eqiad. They'll have different IDs and IPs. But should otherwise be the same...
[15:36:38] I am hoping that you will verify/confirm that they are the same :)
[15:36:43] no I mean completely news one
[15:36:43] thanks scfc_de
[15:36:44] ok apergosi just ran that for an03 and 04 .wm.org
[15:36:47] new ones*
[15:36:51] If it goes awry we can just restart the old ones.
[15:37:06] akosiaris: Wikitech can't do it yet, but you can on virt1000 if you want to.
[15:37:16] yeah never mind ...
[15:37:22] Wikitech will be able to create eqiad instances in a couple of days, Ryan_Lane willing
[15:37:34] thanks. Ok go on moving my VMs at will :-)
[15:37:56] Anyway… ready for me to throw the switch? I will most likely go to sleep before it finishes but hopefully it'll be clear w/not chaos is ensuing before I punch out.
[15:38:36] ok done, thanks apergos, andrewbogott, didn't know about those steps
[15:38:39] is the salt key thing new?
[15:38:43] i've never done that one before
[15:38:49] RECOVERY - Puppet freshness on bast1001 is OK: puppet ran at Thu Feb 27 15:38:44 UTC 2014
[15:38:53] ottomata: neither did I -- I rebuild some servers today so you probably fixed things for me too :)
[15:39:02] woo Recovery!
[15:40:29] thank you for that fix
[15:40:42] !log disabled puppet on carbon, testing some autoinstall stuff
[15:40:43] not that new, it's been there for... well since salt has been on the cluster
[15:40:50] Logged the message, Master
[15:41:10] and I see a clean run on the bastion I was capming on so yay
[15:44:29] akosiaris: do any of your instances use nfs for /data/project or /home?
[15:44:37] nope
[15:44:44] cool
[15:44:47] (PS1) Manybubbles: Configure Elasticsearch for safe full restart [operations/puppet] - https://gerrit.wikimedia.org/r/115924
[15:44:59] (CR) Manybubbles: [C: -1] "not yet tested." [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[15:47:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[15:50:28] (PS6) Matanya: Torrus: add torrus to netmon1001 [operations/puppet] - https://gerrit.wikimedia.org/r/108314
[15:52:48] (PS1) Cmjohnson: Adding dns entries for elastic1013-1016 [operations/dns] - https://gerrit.wikimedia.org/r/115926
[15:53:52] i'm willing to pay. who wants money for reviews ?
[15:54:31] (CR) Cmjohnson: [C: 2] Adding dns entries for elastic1013-1016 [operations/dns] - https://gerrit.wikimedia.org/r/115926 (owner: Cmjohnson)
[15:55:16] !log Eleven mediawiki-installation dsh group hosts have stale /src/scap checkouts: fenari, mw1010, mw1066, mw1091, mw1107, mw1143, mw1150, mw1189, mw1204, mw1205, mw43
[15:55:24] Logged the message, Master
[15:55:41] !log /srv/scap, not /src/scap
[15:55:49] Logged the message, Master
[15:56:47] akosiaris: do you mind if I update the puppet repo on amaster0?
[15:57:04] hmm that one you can actually delete
[15:57:10] the entire VM that is
[15:57:30] wait, isn't it the puppet master for postgres1 and postgres0?
[15:58:01] yeah but my changes are merged now so I will just point them back to virt0
[15:58:28] OK… I need a clean, recent puppet run before I can migrate. Want me to fuss with their puppet configs or do you want to do it
[15:58:30] ?
[15:58:36] Actually --
[15:58:48] tell you what
[15:58:49] I'm pretty interested in seeing how this works :) So I'm going to update amaster0 and keep it for now
[15:58:52] To see if they survive.
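The `!log` above lists hosts whose /srv/scap checkout was stale. The log does not say how that list was built. One plausible approach collects each host's revision with `dsh -M`, which prefixes every output line with the host name (the mw1189 lines later in this log show that format). A hypothetical per-host command would be `cd /srv/scap && git rev-parse --short HEAD`. The sketch below then flags the hosts that disagree with the expected revision, or with the most common one. The function names are hypothetical:

```python
from collections import Counter


def parse_dsh_output(text):
    """Map host -> output from `dsh -M` lines of the form 'host: output'."""
    results = {}
    for line in text.splitlines():
        host, sep, value = line.partition(": ")
        if sep and value.strip():
            results[host.strip()] = value.strip()
    return results


def stale_hosts(text, expected=None):
    """Hosts whose revision differs from `expected` (default: the most common one)."""
    revs = parse_dsh_output(text)
    if not revs:
        return []
    if expected is None:
        expected = Counter(revs.values()).most_common(1)[0][0]
    return sorted(host for host, rev in revs.items() if rev != expected)
```

For example, suppose mw1017 and mw1018 report `51ea36d` and mw1189 reports `7dacf5b`. Only mw1189 is returned. The two host names mw1017 and mw1018 are made up; the revision hashes come from this log.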
[15:58:56] If that's ok [15:58:59] ah ok [15:59:03] cool [15:59:04] fine with me [15:59:35] Would a root fix the stale /srv/scap checkouts for me? Bug 61970 is keeping me from fixing this myself. dsh -g mediawiki-installation -M -F 40 -- 'cd /srv/scap ; git pull --ff-only' [16:01:55] Also the 11 hosts I logged above are apparently not pulling updates via puppet. I'm not sure if that is due to puppet being disabled on those hosts, the mediawiki::sync class not being applied on them, or so other issue [16:03:58] RECOVERY - Host analytics1004 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [16:04:08] RECOVERY - Host analytics1003 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [16:06:18] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Thu Feb 27 16:06:11 UTC 2014 [16:06:39] bd808: I assume the /src mentions in 61970 are all referring to /srv right ? [16:07:01] akosiaris: Yes. Bad typing on my part [16:07:40] The brain sends /srv but the left hand says that's a typo and aut-corrects [16:07:46] mw1189: error: Your local changes to the following files would be overwritten by merge: [16:07:46] mw1189: Updating 7dacf5b..51ea36d [16:07:46] mw1189: scap/log.py [16:07:46] mw1189: Please, commit your changes or stash them before you can merge. [16:07:50] meh [16:08:01] what local changes ? [16:08:04] That's the permissions issue [16:08:18] RECOVERY - Puppet freshness on tin is OK: puppet ran at Thu Feb 27 16:08:12 UTC 2014 [16:08:20] It's a partially applied pull [16:09:20] Running `git checkout -- .` to clear should be safe [16:09:32] so it 'cd /srv/scap ; git checkout -- . ; git pull --ff-only' then [16:09:59] Yes. That should fix things [16:10:33] ok done [16:11:04] \o/ Thanks akosiaris [16:11:21] I'm seeing all nodes at 51ea36d now [16:11:31] you are welcome. 
I am a little bit unclear on the 11 hosts not pulling updates via puppet though
[16:12:11] I was going to scan site.pp and see if it's obvious
[16:13:27] I don't remember how deep in the roles the application of mediawiki::sync is buried
[16:14:45] (CR) BryanDavis: "role::logstash will need associated changes to set the new variables appropriately for a 3 node cluster." [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[16:16:29] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[16:16:40] (CR) Manybubbles: "Is it always a three node cluster? I was confused because role::elasticsearch seemed to only contain production configuration." [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[16:18:16] (CR) BryanDavis: "At the moment role::logstash is only used in production. The labs instance is hand built. Bug 61753 will eventually fix that." [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[16:18:48] PROBLEM - NTP on analytics1004 is CRITICAL: NTP CRITICAL: Offset unknown
[16:29:23] ok finally:
[16:29:24] root@palladium:~# salt -t 120 -G 'cluster:cache_*' cmd.run 'dpkg-query -W varnish' |grep varnish|sort |uniq -c 84 varnish 3.0.5plus~wmftest-wm4
[16:29:44] well that paste sucked, but they're all updated
[16:30:58] thanks brandon :)
[16:35:27] (PS4) Gerrit Patch Uploader: Enable VisualEditor on OTRS_wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[16:35:29] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[16:35:43] (PS2) Manybubbles: Configure Elasticsearch for safe full restart [operations/puppet] - https://gerrit.wikimedia.org/r/115924
[16:36:37] (CR) PiRSquared17: [C: 1] "Maybe it will work now?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[16:36:48] (CR) Manybubbles: "Fixed a bug in my new erb, added configuration for logstash, and verified it on a self hosted puppetmaster." [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[16:37:40] (CR) Jforrester: "What were you trying to achieve with PS4?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[16:38:31] (CR) PiRSquared17: "jenkins said it couldn't be merged." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[16:39:01] (CR) PiRSquared17: "@Jforrester: it can be merged now" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[16:40:43] (CR) BryanDavis: "One spelling error. See comment" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[16:42:00] (PS3) Manybubbles: Configure Elasticsearch for safe full restart [operations/puppet] - https://gerrit.wikimedia.org/r/115924
[16:42:07] (CR) Manybubbles: "fixed speeling" [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[16:43:44] (CR) BryanDavis: [C: 1] Configure Elasticsearch for safe full restart [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[16:46:40] (CR) Jforrester: [C: -1] "Per my comments on PS1." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[16:46:58] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100%
[16:47:38] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 35.43 ms
[16:48:13] akosiaris: have a look at buildmachine.eqiad.wmflabs when you have a moment, let me know if it seems right
[16:48:30] there's a hangup with puppet -- it will throw errors because it kind of wants you to have a shared homedir but you don't.
[17:02:05] (PS1) Cmjohnson: Adding dhcp info for elastic1013-6 [operations/puppet] - https://gerrit.wikimedia.org/r/115932
[17:08:40] (CR) Cmjohnson: [C: 2] Adding dhcp info for elastic1013-6 [operations/puppet] - https://gerrit.wikimedia.org/r/115932 (owner: Cmjohnson)
[17:17:20] andrewbogott: I don't think dns for this exists, host buildmachine.eqiad.wmflabs does not return anything
[17:17:52] and I frankly have no idea in which project it is to at least find the id
[17:28:31] (PS1) Alexandros Kosiaris: Add mod_accesslog to install-server lighttpd [operations/puppet] - https://gerrit.wikimedia.org/r/115935
[17:30:20] (CR) Alexandros Kosiaris: [C: 2] Add mod_accesslog to install-server lighttpd [operations/puppet] - https://gerrit.wikimedia.org/r/115935 (owner: Alexandros Kosiaris)
[17:32:42] (PS1) BryanDavis: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115936
[17:32:44] (PS1) BryanDavis: Wikipedias to 1.23wmf15 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115937
[17:32:46] (PS1) BryanDavis: Group0 wikis to 1.23wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115938
[17:33:07] oooh :)
[17:33:53] bd808|deploy: greg-g we have another fix for the js bug we found last night
[17:34:04] * bd808|deploy is not reedy
[17:34:05] think i discovered a new bug and the other thing is fixed
[17:34:06] ok
[17:34:30] whenever is a good time for us to get the fix out....
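The salt one-liner at 16:29 relies on `sort | uniq -c` to collapse the per-host `dpkg-query -W varnish` output into one count per distinct version, so a single output line (84 hosts, one version) means every cache host reported the same package. A rough Python equivalent, assuming salt prints a `hostname:` header followed by an indented `package<TAB>version` line; the sample host names are made up:

```python
from collections import Counter


def tally_versions(lines, package="varnish"):
    """Mimic `grep PACKAGE | sort | uniq -c` over salt cmd.run output.

    Whitespace is normalised first, so a tab-separated and a
    space-separated copy of the same version count as one entry
    (plain uniq would treat them as different lines).
    """
    counts = Counter(
        " ".join(line.split())   # collapse indentation and tabs
        for line in lines
        if package in line       # the `grep varnish` step
    )
    return sorted(counts.items())  # sorted for stable output, like `sort`


if __name__ == "__main__":
    # Hypothetical salt output for two hosts; real host names will differ.
    sample = [
        "host1.example:",
        "    varnish\t3.0.5plus~wmftest-wm4",
        "host2.example:",
        "    varnish\t3.0.5plus~wmftest-wm4",
    ]
    for version, count in tally_versions(sample):
        print(count, version)
```

More than one entry in the result would mean some hosts are still on a different package version.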
[17:34:46] If you get it all queued up we'll see if we can get you in, but greg-g is out today too
[17:34:55] ok
[17:35:02] earlier the better
[17:35:19] * aude thinks deploying at 1 am is not something to make a habit of
[17:36:10] * bd808|deploy agrees
[17:38:21] (CR) BryanDavis: [C: 2] Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115936 (owner: BryanDavis)
[17:38:28] (Merged) jenkins-bot: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115936 (owner: BryanDavis)
[17:46:21] aude: Can you help ragesoss backport a fix for the EdProgram extension? I'm … busy :)
[17:46:27] sure
[17:46:53] hold on, lemme summon Andy.
[17:47:02] ok
[17:47:13] * aude back in 2 min
[17:47:44] Hi aude
[17:49:16] hi
[17:50:05] Here's the patch: https://gerrit.wikimedia.org/r/#/c/115922/
[17:50:17] ok
[17:50:56] It fixes several issues, including one security issue
[17:51:18] does EducationProgram run on master?
[17:51:40] Yeah it's all master branch
[17:51:44] ok
[17:52:33] Let me know what I should do to facilitate :) and thanks
[17:52:35] give me a minute
[17:52:50] (cherry pick button in gerrit)
[17:53:07] there is no cherry pick
[17:53:21] They've all been picked!
[17:53:23] it's update to master
[17:53:54] hmmm, "Creating new wmf/1.23wmf15 branch"
[17:53:58] I do have a cherry pick button for that patch
[17:54:17] so there is a branch
[17:54:48] Ah aude sorry I may be wrong
[17:55:07] every extension has a branch now
[17:55:15] Reedy: ok
[17:55:18] created from master at time of branching the deployment branch
[17:55:23] makes sense
[17:55:31] Sorry about that https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/EducationProgram,branches
[17:55:50] i'm doing wmf15, then wmf16 + 14
[17:56:10] no point doing 14
[17:56:14] ok
[17:57:03] (CR) PiRSquared17: "@MZMcBride: I'm not sure I agree with "no consensus to add, so we don't need consensus to remove". If we ask on Babel, we will have defin" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder)
[17:57:23] cherry picked to wmf15 and 16
[17:58:27] aude: and Reedy: cool, thanks a ton!
[17:58:39] happy to help
[17:59:19] https://gerrit.wikimedia.org/r/#/c/115943/
[17:59:22] now wmf16
[18:01:48] PROBLEM - Puppet freshness on carbon is CRITICAL: Last successful Puppet run was Thu 27 Feb 2014 03:01:37 PM UTC
[18:04:16] * aude not used to committing the submodule update in tin yet
[18:04:23] doing it the way i always do instead
[18:04:31] manybubbles: elastic1013-16, going to base install but will not add puppet certs. you can do this when you are ready
[18:04:41] aude: Are you going to prep tin already?
[18:04:52] I haven't ran the first scap yet
[18:04:56] whatever is wanted
[18:05:02] fine if you or reedy do
[18:05:09] manybubbles|away ^^
[18:05:51] aude: If the changes are ready in gerrit let's hold there until I get testwiki up on wmf16
[18:05:57] sounds good
[18:06:37] Deskana: Now is the time when you use your faux-greg-g powers and say it's ok for me to push wmf16 to testwiki. :)
[18:07:18] Uh, sure. :)
[18:08:25] apergos: around?
[18:09:10] !log bd808 Started scap: testwiki to php-1.23wmf16 and rebuild l10n cache
[18:09:16] yes but eating, what's up?
[18:09:18] Logged the message, Master
[18:09:33] (I just reheated my food cause I didn't get to start eating it the first time it was hot :-/)
[18:10:48] apergos: one step above just eating it cold, tho!
[18:11:08] it's good, as long as I geet to eat it this time
[18:11:19] heating a third time in the same hour = fail
[18:11:49] AndyRussG: https://gerrit.wikimedia.org/r/#/c/115944/
[18:12:02] there whenever bd808|deploy is ready for them
[18:12:31] aude: Thanks.
[18:12:33] * aude also wants https://gerrit.wikimedia.org/r/#/c/115919/ :)
[18:12:43] AaronSchulz: ping?
[18:12:52] I sohuld have included your name in the reply, sorry
[18:13:04] aude: Yup, noted
[18:14:35] apergos: seems like not enough dprioprocs job runners are active, I was curious what restarting the runners would do
[18:14:49] Look what I made: https://logstash.wikimedia.org/#/dashboard/elasticsearch/scap
[18:15:02] bd808|deploy: aude: \o/
[18:15:05] one of those links i can't click
[18:15:09] :(
[18:15:14] I see 5 sub-"coordinator" loops running, and there are some php procs, but it's way less than it should be on the boxes I check
[18:15:18] AaronSchulz: where do they run from?
[18:15:22] the parsoid loops seem to be fine though
[18:15:27] mw1001-mw1016
[18:15:38] aude: Sorry. You can tail on fluorine now at least.
[18:15:45] * aude nods
[18:15:51] ddsh -g job-runners "/etc/init.d/mw-job-runner restart" ;)
[18:16:21] sorry, I should know which host th job runners are but I'm pretty dang fuzzy still
[18:16:32] well I was going to check on all the boxes, sec
[18:16:38] aude: The log is json and I haven't made a script to make it pretty yet
[18:16:50] bd808|deploy: ok
[18:18:25] how many do we expect, AaronSchulz?
[18:18:36] oh, you said. 5
[18:18:43] gah. like I was saying...
[18:18:48] bd808|deploy: can you hold off on pushing changes for a few? we have an issue with MobileFrontend just discovered on betalabs, we're close to af ix
[18:19:19] hey bd808|deploy I think we might be shy a MobileFrontend bug fix for testwikis
[18:19:31] awjr: Just doing testwiki now. I can hold there.
[18:19:31] cmjohnson1: sweet! I'll ask andrew otto to set them up like the others
[18:19:33] dprioprocs is 17
[18:19:38] thanks bd808|deploy
[18:20:31] aude: Sounds like you and AndyRussG may get to ride an early patch train ^^
[18:21:12] ok
[18:21:59] bd808|deploy: think i have a half hour or so before we deploy?
[18:22:02] we*
[18:22:18] aude: Yeah, at least. Need foods?
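At 18:16 bd808 notes that the scap log on fluorine is JSON and that nothing prettifies it yet. A minimal sketch of such a filter, assuming one JSON object per line; the field names `@timestamp`, `level` and `message` are hypothetical placeholders, not the real scap log schema:

```python
import json


def pretty_line(raw, fields=("@timestamp", "level", "message")):
    """Render one JSON log record as a single readable line.

    The field names are hypothetical; adjust them to the keys the real
    log records contain. Missing fields are shown as "-", and lines that
    are not a JSON object are passed through unchanged.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return raw.rstrip("\n")
    if not isinstance(record, dict):
        return raw.rstrip("\n")
    return " ".join(str(record.get(name, "-")) for name in fields)


def pretty_stream(lines):
    """Yield a formatted line for every non-blank input line."""
    for raw in lines:
        if raw.strip():
            yield pretty_line(raw)
```

Wrapping `pretty_stream(sys.stdin)` in a few lines of script and piping a `tail -f` of the log into it would give a readable live view; passing malformed lines through, rather than dropping them, keeps partial writes visible.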
[18:22:24] yeah and go home
[18:22:29] * bd808|deploy nods
[18:22:30] * aude begins to get hungry
[18:22:37] back in a bit
[18:22:52] We won't be pushing until 19:00 at the absolute earliest
[18:23:04] Probably more like 19:30
[18:26:21] AaronSchulz: I'm not sure I'm able to tell the difference in raw numbers of php jobs but the few hosts I saw with 2 now have 30-40
[18:26:36] the rest were all in the mid 30s, I restarted them all anyways
[18:28:10] AndyRussG. ragesoss: Now that your patch is in master it would be a good time to test it in beta.
[18:29:13] bd808|deploy: test2 is where the extension is deployed. looks like it's still on wmf15 and not patched.
[18:29:25] bd808|deploy, mobile's fine, you can go ahead
[18:29:32] !log bd808 Finished scap: testwiki to php-1.23wmf16 and rebuild l10n cache (duration: 20m 22s)
[18:29:39] Logged the message, Master
[18:29:42] ragesoss: Not setup in beta labs?
[18:29:46] bd808|deploy, in the worst case we might need to touch files to refresh RL cache
[18:29:51] bd808|deploy: no.
[18:30:13] ragesoss: Then now would be a great time to fix that! :)
[18:31:25] (PS1) Addshore: Add integration/php-coveralls to git-deploy [operations/puppet] - https://gerrit.wikimedia.org/r/115947
[18:31:59] RECOVERY - Puppet freshness on carbon is OK: puppet ran at Thu Feb 27 18:31:56 UTC 2014
[18:32:02] (PS2) Addshore: Add integration/php-coveralls to git-deploy [operations/puppet] - https://gerrit.wikimedia.org/r/115947
[18:36:04] The week of March 3 (next week) is going to be a normal deployment week, right?
[18:36:25] All of the MediaWiki rotations (group 0, 1, 2) are the same as the previous week.
[18:36:39] I think it's just since someone forgot to change it after copying, but I wanted to double check before I fix it.
[18:36:50] ^ greg-g, Deskana
[18:36:52] superm401: It should be normal AFAIK
[18:38:01] bd808|deploy: it looks like the extension is set up there, so I could test if I could get the admin bit with User:Ragesoss on en.wiki beta.
[18:38:03] greg-g: Around?
[18:38:23] hoo: Deskana is acting greg-g for the day
[18:38:27] hoo: and bd808|deploy is acting Reedy
[18:38:52] yuvipanda: Ah :)
[18:39:25] bd808|deploy: Deskana: Did aude contact you that we want to push a small (JS only) wikidata change today?
[18:39:56] hoo: She did not. More info, please? :)
[18:40:55] Deskana: We want this one to get live https://gerrit.wikimedia.org/r/115919 today
[18:40:58] hoo: if you mean this? https://gerrit.wikimedia.org/r/#/c/115919/
[18:41:16] ragesoss: Exactly
[18:41:17] Deskana, hoo: Bryan is aware, hoo.
[18:42:56] Where do I turn for an admin bit on a beta labs wiki?
[18:44:10] ragesoss: Ryan_Lane2 or Andrew Bogott, I guess
[18:44:38] ragesoss: hashar or chrismcmahon probably can too. Do you need shell access or user rights on wiki?
[18:44:55] just user rights. adminship should suffice. http://en.wikipedia.beta.wmflabs.org/wiki/Special:Contributions/Ragesoss
[18:45:18] oh, beta
[18:45:24] damn, sorry just woke up
[18:45:26] Can do that also
[18:45:34] though wikitech
[18:45:43] * thought
[18:46:21] 18:45, 27 February 2014 Reedy (Talk | contribs | block) changed group membership for User:Ragesoss from (none) to administrator
[18:46:59] awesome, thank.s
[18:47:38] hoo ragesoss let me know if I can help on beta labs
[18:48:11] apergos: https://gdash.wikimedia.org/dashboards/jobq/
[18:48:32] I guess that helped
[18:48:33] looking
[18:48:45] indeed it diid, good!
[18:48:48] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[18:49:02] bd808|deploy: the patch is working as expected on beta labs.
[18:49:16] chrismcmahon: I'm all set, thanks!
[18:49:19] ragesoss: Awesome.
[18:49:37] hoo: I lack the technical understanding, so I'll have to ask bd808|LUNCH about your patch. :)
[18:49:45] Time to eat before the big show
[18:49:47] hoo: When he is not LUNCH.
[18:50:04] Deskana: If you prefer that, I can approve it myself
[18:50:13] it's just a small JS fix
[18:50:19] hoo: It's just the follow up to the changes you tried last night right?
[18:51:00] bd808|LUNCH: It's in the same file, so kind of yes... we didn't catch all problems yesterday
[18:51:06] * bd808|LUNCH nods
[18:51:54] food. Then more deploy fun
[18:52:01] heh
[18:57:47] apergos: If I want to go from restricted to mortals... do I create a ticket in access-requests? Or another queue?
[18:58:01] acces requests it would be
[18:58:05] ok :)
[18:58:33] and give what you want it for, get 'manager' approval (well point to previous ticket with that approval in it and cite it).
[18:58:59] Ok, will do
[19:11:05] (CR) Dzahn: [C: -1] "it's still removing mgmt too (what you remove is in the $ORIGIN mgmt.pmtpa.{{ zonename }}. block). in this case you just don't have to tou" [operations/dns] - https://gerrit.wikimedia.org/r/115581 (owner: Matanya)
[19:13:49] (PS3) Dzahn: removed pdf1, decom [operations/dns] - https://gerrit.wikimedia.org/r/115581 (owner: Matanya)
[19:17:05] * aude back
[19:17:15] * bd808|deploy me too
[19:17:32] So...
[19:18:45] Any opinions from the crowd on whether I should switch everything to wmf15 and group0 to wmf16 first or pull in the extension patches first?
[19:18:51] * bd808|deploy is new at this bit
[19:19:23] * ragesoss has no opinion.
[19:19:26] I'd probably go first and do the extension ones after to keep the changes small at a time
[19:19:42] I like the way hoo thinks :)
[19:19:51] (CR) Dzahn: [C: -1] "eh, but don't merge while it's still in puppet :p" [operations/dns] - https://gerrit.wikimedia.org/r/115581 (owner: Matanya)
[19:20:14] Deskana: Ready for all wikis to wmf15?
[19:20:19] https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcRKQ3eDFdH3whBBaf1t1BqOZHR7SGtavjXoGSg3_-XMCxJMUraf-Q
[19:20:20] agree with hoo
[19:20:29] we can see the education program thing on test2
[19:20:41] although i am sure it's good :)
[19:20:57] bd808|deploy: I suppose? :P
[19:21:06] whatever
[19:21:13] See previous image.
[19:21:34] aude: The education patch isn't actually in the wmf16 branch yet, but we'll do that before adding to wmf15
[19:21:46] ok
[19:22:02] Deskana: :) I'm following a checklist at least
[19:22:46] (CR) BryanDavis: [C: 2] Wikipedias to 1.23wmf15 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115937 (owner: BryanDavis)
[19:22:54] (Merged) jenkins-bot: Wikipedias to 1.23wmf15 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115937 (owner: BryanDavis)
[19:23:33] !log Manually purged the job queue aggregator
[19:23:40] Logged the message, Master
[19:23:57] !log bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.23wmf15
[19:24:06] Logged the message, Master
[19:24:12] bd808|deploy: I'm mostly the safeguard against people coming in and going "HAY CAN WE HAVE OUR EXTENSION TURNED ON EVERYWHERE?". :)\
[19:24:24] Aw, damn
[19:24:31] There goes that strategy
[19:24:42] bd808|deploy: did you ever look at that active versions fix? I can't remember.
[19:24:46] Thought for sure I could get Extension:MarkTraceurIsFuckingAwesome deployed everywhere
[19:24:48] !log bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.23wmf15 (second try)
[19:24:55] Logged the message, Master
[19:25:04] not that it matters wmf15
[19:25:10] AaronSchulz: It's needs to be done but hasn't been fix yet
[19:25:43] * rdwrer watches silently in solidarity
[19:25:57] !log sync-wikiversions + dsh doesn't like my ssh key
[19:26:06] Logged the message, Master
[19:26:55] !log bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.23wmf15 (third try)
[19:27:05] Logged the message, Master
[19:28:14] !log sync-wikiversions + dsh still doesn't like my ssh key, but I can ssh to mw1010 from tin
[19:28:21] Logged the message, Master
[19:29:14] bd808|deploy: can I try? ;)
[19:29:23] AaronSchulz: Yes pelase
[19:29:36] *please
[19:29:40] how many boxes are getting it?
[19:29:59] This is all to wmf15
[19:30:00] !log aaron rebuilt wikiversions.cdb and synchronized wikiversions files:
[19:30:08] Logged the message, Master
[19:30:26] I mean how many servers are actually syncing?
[19:30:33] * AaronSchulz didn't see any errors
[19:31:30] apergos: heh, purging the wikis-with-jobs hash gave a little boost too https://gdash.wikimedia.org/dashboards/jobq
[19:31:42] $ wc -l files/dsh/group/mediawiki-installation
[19:31:42] 427 files/dsh/group/mediawiki-installation
[19:31:44] so many things you have to kick every few months :/
[19:32:08] heh
[19:32:17] AaronSchulz: Apparently only tin, hume, terbium and fenari. wtf?
[19:32:19] should have a crn that kicks once a month just cause
[19:32:53] bd808|deploy: that's not very cool :)
[19:33:21] bd808|deploy: https://graphite.wikimedia.org/render/?width=588&height=309&_salt=1393380027.43&from=-24hours&target=MediaWiki.SwiftFileBackend.getContainerStat-local-swift-eqiad-miss.count&target=MediaWiki.SwiftFileBackend.getContainerStat-local-swift-eqiad.count
[19:33:33] yay, my cache fix went out elsewhere \o/
[19:35:06] Swift can't handle 1 container request every 3 seconds (even to different container DBs) without dozens of 503s a day
[19:35:11] * AaronSchulz blames sqlite3
[19:35:30] webscale
[19:36:35] :p
[19:36:58] you should tell binasher
[19:37:11] he likes webscale:)
[19:37:12] now I need to fix the 404s on PUT (also caused by sqlite contention...known bug)
[19:37:27] AaronSchulz: Here's the failures I'm getting. It's better than I thought at first but not good: https://gist.github.com/bd808/3ab3912093f03c5a6a9b
[19:37:52] are you on windows?
[19:38:03] OS X
[19:38:21] * hoo could troll no... must...resist...trolling
[19:38:22] heh, that looks like the errors you get using git-bash
[19:38:24] * now
[19:38:25] But I've scaped a lot of times before with no problems. Maybe batch size?
[19:39:11] they are more likely with high fanout I've noticed
[19:39:16] e.g. full 430
[19:39:19] * bd808|deploy confirms it's related to batch size
[19:39:27] yup
[19:39:29] though I never get it on linux or with putty (windows)
[19:39:39] If I limit to 80 at a time it works fine
[19:40:01] what if we didn't require agent forwarding [/troll]?
[19:40:08] What if...
[19:42:31] How long do we wait before switching group0 to wmf16?
[19:42:52] * bd808|deploy goes to look at old logs in SAL
[19:44:02] A few minutes
[19:44:05] !log The problem with me and sync-wikiversions is my ssh-agent. It croaks when all 400+ connections happen at once.
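The batch-size finding between 19:38 and 19:44 (failures with the full ~430-host fan-out, clean runs when limited to 80 at a time) amounts to capping how many ssh sessions, and therefore how many ssh-agent signing requests, are in flight at once. A sketch of that idea in Python, assuming plain `ssh` with agent-based auth; this is not how sync-wikiversions or dsh are implemented, and the default of 80 simply mirrors the number that worked in the log:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(host, command, timeout=60):
    """Run one command on one host over ssh; return (host, exit code).

    BatchMode=yes makes ssh fail instead of prompting for a password.
    A timeout is reported as exit code -1.
    """
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True, timeout=timeout,
        )
        return host, proc.returncode
    except subprocess.TimeoutExpired:
        return host, -1


def fan_out(hosts, command, runner=ssh_run, max_parallel=80):
    """Run `command` on every host with at most `max_parallel` in flight.

    The thread pool size is the concurrency cap: the remaining hosts
    wait in the queue until a worker frees up.
    Returns {host: exit_code} for the hosts that failed.
    """
    failures = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = pool.map(lambda host: runner(host, command), hosts)
        for host, code in results:
            if code != 0:
                failures[host] = code
    return failures
```

The `runner` parameter exists so the concurrency cap can be exercised without real hosts. The alternative suggested in the log at 20:13 (a key on the cluster so the agent runs on tin) removes the laptop-side agent bottleneck instead of throttling around it.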
[19:44:13] Logged the message, Master
[19:44:17] As long as moving the 'pedias has settled down
[19:44:28] ie the shit didn't hit the fan
[19:44:37] :)
[19:45:08] then they're be APC spam in the logs which can be ignored
[19:45:26] (PS1) Cmjohnson: Fixing typo for ps1-d1-eqiad wmnet file [operations/dns] - https://gerrit.wikimedia.org/r/115956
[19:45:39] APC cache size to smallish for three versions? :P
[19:45:54] (CR) Cmjohnson: [C: 2] Fixing typo for ps1-d1-eqiad wmnet file [operations/dns] - https://gerrit.wikimedia.org/r/115956 (owner: Cmjohnson)
[19:45:55] yup
[19:46:34] 2 versions is more than enough for most of the time
[19:46:40] bedtime for me (early), hope to be back to normal tomorrow. have a good rest of the day. also, move to salt because dsh + ssh agents sucks majorly.. just sayin
[19:46:45] aude: apergos: Send out the mail
[19:46:55] hoo: yay
[19:47:17] apergos: :) good night
[19:47:25] night!
[19:47:43] good night, apergos ;)
[19:48:13] Everything looks good to me. Let's bump group0 to wmf16
[19:48:26] Deskana: Do you concur? ;)
[19:48:45] bd808|deploy: Uh, sure.
[19:48:55] (CR) BryanDavis: [C: 2] Group0 wikis to 1.23wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115938 (owner: BryanDavis)
[19:49:02] (Merged) jenkins-bot: Group0 wikis to 1.23wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115938 (owner: BryanDavis)
[19:49:08] Deskana: If I ever want to have a crazy house party you're going to be the bouncer.
[19:49:22] rdwrer: "Can I come in?" "Uh, sure."
[19:49:52] Deskana: You'll hopefully have the common decency to be like "!log let some people in, I dunno who they are, they said they know Carlos?"
[19:50:06] rdwrer: "Can I come in?" "I don't know. I was told to ask you how long you've been running on the Beta Cluster." "What?" "Never mind. You can come in."
[19:50:24] AaronSchulz: Would you please run `sync-wikiversions 'group0 wikis to 1.23wmf16'`
[19:50:40] sec
[19:51:28] json looks fine, syncing
[19:51:29] !log aaron rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.23wmf16
[19:51:36] Logged the message, Master
[19:52:13] English <pt-createaccount> <pt-login>
[19:52:21] * AaronSchulz hrms at the chrome
[19:53:32] strange what messages are missing and what ones aren't
[19:53:36] on mw.org
[19:53:59] AaronSchulz: Links?
[19:54:18] * bd808|deploy sees header message issues too
[19:54:19] https://www.mediawiki.org/wiki/Special:RecentChanges
[19:56:04] AaronSchulz: Should I try a full scap to see if that rebuilds l10n correctly?
[19:56:51] you can even use --version, heh
[19:56:52] We saw some weirdness last week on the new branch too but it got lost in the mass of other fubar with scap rsync slave selection problems
[19:57:10] * bd808|deploy will try
[19:58:38] !log bd808 Started scap: rebuild php-1.23wmf16 l10n cache
[19:58:45] Logged the message, Master
[19:59:41] https://en.wikipedia.org/wiki/Special:Version
[19:59:49] heh, I never removed swiftcloudfiles
[20:01:22] (PS1) Aaron Schulz: Removed unused SwiftCloudFiles extension [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115959
[20:01:33] l10n cache is definitely rebuilding on tin. There's something not quite right with my fix for the new branch initialization bug.
[20:01:34] (PS1) Jgreen: add users mah and mglaser to releases webserver, w/o enabling keys pending final verification [operations/puppet] - https://gerrit.wikimedia.org/r/115960
[20:04:10] (CR) Hoo man: [C: -1] "tabs vs. spaces!" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/115960 (owner: Jgreen)
[20:04:31] what!?! where did I miss a tab
[20:04:43] :o
[20:04:46] Jeff_Green: You used spaces, rest of the file uses tabs
[20:04:53] puppet is spaces
[20:04:53] oh kill me
[20:04:55] bd808|deploy: If it hasn't started syncing to the apaches yet...
[20:05:07] Reedy: it's 40% done
[20:05:11] puppet doesn't care but afaik our standard is to gently migrate to spaces
[20:05:12] rest of file is wrong
[20:05:14] A copy of the "old" json files might be helpful for a diff
[20:05:34] Jeff_Green: he just wants you to retab admins.pp really quick and then rebase on top of it, *gg:)
[20:05:36] * bd808|deploy slaps head
[20:05:36] be consistent, anyone?
[20:05:57] but it's true, we want spaces
[20:05:59] lol, last time I retabbed something I got in trouble for merging a reformat at the same time as an actual change
[20:06:10] but consistency is more important than spaces even
[20:06:14] within one file
[20:06:33] Jeff_Green: :p , if ta retab ends up in 4 spaces it should be fine nowadays :)
[20:06:41] but ,,i know
[20:06:57] admins.pp is a trainwreck in ways that are far more important than tabs vs spaces
[20:07:15] i also would't want to change the entire admins.pp.. yes, that
[20:07:25] and for the record it's also mixed in terms of tabs vs spaces
[20:07:34] i'm sure matanya already made a change
[20:07:38] that sits somewhere
[20:08:12] oh, it already is? oh well. it doesnt matter.. abstain :)
[20:08:22] it's a mess
[20:08:27] :set list and enjoy
[20:08:42] every real tab character you kill means one less lint warning from jenkins/puppet-lint
[20:08:45] :)
[20:08:51] !log bd808 scap-rebuild-cdbs failed on 422 hosts
[20:08:58] Logged the message, Master
[20:08:58] !log bd808 Finished scap: rebuild php-1.23wmf16 l10n cache (duration: 10m 20s)
[20:09:06] wah wah
[20:09:07] Logged the message, Master
[20:09:29] * AaronSchulz needs a tuba
[20:09:38] or a wah wah pedal
[20:09:43] " Wikimedia style uses 4 spaces for indentation
[20:09:43] autocmd BufRead */wmf/puppet/* set sw=4 ts=4 et
[20:09:49] !log messing with serial connections ps1-b5 and ps1-b6-eqiad
[20:09:52] !log scap-rebuild-cdbs failed with "Could not open directory '/upstream'."
[20:09:53] "open the file, :retab, :wq, done. "
[20:09:58] Logged the message, Master
[20:10:06] Logged the message, Master
[20:10:26] and so a simple commit goes down the rathole
[20:10:39] * bd808|deploy reads source code
[20:10:45] we need a policy for already-broken files :-)
[20:11:31] * aude bikesheds
[20:11:33] bd808|deploy: still deploying I guess? :D
[20:11:51] hashar: apparently
[20:12:04] Trying to fix group0 l10n cache
[20:12:13] (CR) Ottomata: "Looks fine to me. Shall I merge?" [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[20:12:19] :-/
[20:12:28] I think running scap again fixed that last week
[20:12:50] (CR) Dzahn: [C: 1] "as long as the keys get verified and the rest works i don't vote it down for tabs/spaces discussion. the thing is, we want spaces, the fil" [operations/puppet] - https://gerrit.wikimedia.org/r/115960 (owner: Jgreen)
[20:12:58] (CR) Manybubbles: [C: 1] "Please do. It won't do anything until the next full cluster restart but that is kind of the point." [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[20:13:02] Jeff_Green: there, +1, not stopping you
[20:13:10] bd808|deploy: your 400+ connections issue with ssh-agent can be solved by having a ssh key on the cluster so you would your agent on tin
[20:13:18] but.. it wasnt even blocking, i see the keys arent in yet
[20:13:19] mutante: ha, thanks
[20:13:53] (PS4) Manybubbles: Configure Elasticsearch for safe full restart [operations/puppet] - https://gerrit.wikimedia.org/r/115924
[20:13:59] (CR) Ottomata: [C: 2 V: 2] Configure Elasticsearch for safe full restart [operations/puppet] - https://gerrit.wikimedia.org/r/115924 (owner: Manybubbles)
[20:14:13] mutante: yeah i just wanted to get the tedious part done.
[20:14:17] Jeff_Green: though, technically, it shouldn't be merged like this of course.. :p
[20:14:20] because of the keys
[20:14:31] yep
[20:14:47] why? they're not enabled so the accounts will be generated without auth key files
[20:15:00] then it's a trivial thing to enable the key
[20:15:49] jdlrobson, so if the issue was due to modules stuck, do we need to do the touching in prod?
[20:16:31] I wonder why we don't have seperate bastion only class for then someone only needs access to one (or a few) machines
[20:16:34] MaxSem: I suspect not if we can guarantee that i18n messages will also update when stuff is pushed to production which i assume they will :)
[20:16:35] Jeff_Green: oh, well, true. i just expected for some reason you are preparing this to amend PS2 later..
[20:16:55] !log bd808 Started scap: full scap; rebuild php-1.23wmf16 l10n cache
[20:17:04] Logged the message, Master
[20:17:27] No l10n updates built on tin this time...
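The advice between 20:05 and 20:09 (4-space indentation for Puppet, `:retab` in vim, each hard tab removed is one less puppet-lint warning) can also be applied outside an editor. A sketch, assuming only leading indentation should change; unlike vim's `:retab`, it leaves tabs after the first non-blank character alone, and per the 20:05:59 remark the result is best committed on its own rather than mixed with a functional change:

```python
def retab_leading(text, tabstop=4):
    """Expand tabs in each line's leading indentation to spaces.

    Only the indentation is rewritten, so tabs inside quoted strings or
    comments later on the line are left untouched. Tab stops are
    honoured, so "  <TAB>" becomes 4 spaces with tabstop=4, not 6.
    """
    out = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        out.append(indent.expandtabs(tabstop) + body)
    return "".join(out)


def count_hard_tabs(text):
    """Number of tab characters left (vim's `:set list` shows them as ^I)."""
    return text.count("\t")


def retab_file(path, tabstop=4):
    """Rewrite one file in place; return (tabs_before, tabs_after)."""
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    fixed = retab_leading(original, tabstop)
    if fixed != original:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(fixed)
    return count_hard_tabs(original), count_hard_tabs(fixed)
```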
[20:17:49] mutante: i was planning to do a new commit once I manage to sync up with each user and visually verify them [20:17:51] (03PS3) 10Ottomata: Setting up rsync job to copy kafkatee generated webrequest logs to stat1002 [operations/puppet] - 10https://gerrit.wikimedia.org/r/115677 [20:17:58] (03CR) 10Ottomata: [C: 032 V: 032] Setting up rsync job to copy kafkatee generated webrequest logs to stat1002 [operations/puppet] - 10https://gerrit.wikimedia.org/r/115677 (owner: 10Ottomata) [20:18:28] Jeff_Green: i'm just letting you do it as you see fit :) last comment is we try to match UIDs to Labs users nowadays [20:18:55] so i would have looked on formey and taken those [20:18:58] ah, that's good to know. I went looking for that bit of information earlier and came up with a different answer [20:19:20] it's just easy to find one that is not taken. and it matches [20:19:49] so, like root@formey:~# ldaplist -l passwd mah [20:19:52] yeah this would have been good to know [20:19:56] uidNumber: 1232 [20:20:11] you can still use those [20:20:27] mglaser = uidNumber: 1229 [20:20:38] i'd rather match them. I was wondering why the uids in admins.pp have suddenly gone insane [20:21:00] they jump from the 600's into seemingly random multi-thousand ranges [20:21:13] because we started taking the ones from labs LDAP, yes [20:21:21] but before we made too many mistakes [20:21:28] when we manually counted up [20:21:35] and the users were not added in order in the file [20:21:46] and people did grep | sort kinda things to just find next UID [20:21:52] https://wikitech.wikimedia.org/wiki/RT_Triage_Duty#Creating_new_shell_users <-- this shoudl be updated [20:22:02] and once you have a duplicate UID.. 
gotta clean up a lot with find -exec [20:22:20] true [20:22:22] oh yes I know all about that, having cleaned up other peoples messes [20:22:23] (PS3) Hashar: Add integration/php-coveralls to git-deploy [operations/puppet] - https://gerrit.wikimedia.org/r/115947 (owner: Addshore) [20:22:25] can do [20:22:28] (PS4) Addshore: Add integration/php-coveralls to git-deploy [operations/puppet] - https://gerrit.wikimedia.org/r/115947 [20:22:37] mutante: what do we usually do to verify the key these days? [20:22:45] (CR) Hashar: [C: 1] "Good for me. Lets get that merged and deployed on tin :-D" [operations/puppet] - https://gerrit.wikimedia.org/r/115947 (owner: Addshore) [20:22:57] we used to ask people to post it in officewiki as well as in RT [20:22:59] Jeff_Green: let them paste it on their office wiki user page, then you can link to the revision from wiki in the gerrit change [20:23:09] but for volunteers? [20:23:11] Jeff_Green: if the users are more tech-savvy, ask them to gpg sign it [20:23:17] !log bd808 Finished scap: full scap; rebuild php-1.23wmf16 l10n cache (duration: 06m 21s) [20:23:26] Logged the message, Master [20:23:35] Jeff_Green: oh, yea, that is harder, volunteer shells are new [20:23:45] AaronSchulz: https://www.mediawiki.org/wiki/Special:RecentChanges looks better to me now [20:23:54] Jeff_Green: as I said earlier today, the easiest is probably to do a hangout with them :-D [20:24:01] Jeff_Green: well for hexmode / mah it's easy, he has gpg key and you can trust it because he signs mediawiki packages [20:24:27] hashar: yes, that's what I was planning to do, but if there's another standard that the rest of Ops uses that's good to know [20:24:39] mutante: yep [20:24:45] Lunch time for me now. :)
[20:24:51] Jeff_Green: I dont think we added any volunteers recently so you are in unknown lands :-] [20:24:58] Jeff_Green: i think in this specific case you happen to have GPG users [20:25:19] mutante: Jeff_Green i amended my patch in gerrit to add my key [20:25:38] aude: great, thanks [20:25:39] ifyou can trust that [20:25:43] Jeff_Green: so, you can gpg --verify after asking them to sign their key [20:25:48] (i was also present in irc) [20:25:58] aude: hahahaha, well, no, this could all be a hallucination actually... [20:25:58] Ok. group0 at 1.23mwf16 and I'm not seeing obvious l10n errors anymore [20:26:15] :) [20:28:08] gpg --search-keys --keyserver pgp.mit.edu mah@everybody.org [20:29:06] My release manager snuck out in the middle of the release :( [20:29:19] import the first one, then when he sends you a .,sig file, you gpg --verify file.sig [20:29:36] he has done it before when he sent me mw reelase tarbalss to upload to prod [20:29:52] you'll see a "GOod signature" line you can paste ,and confirmed [20:30:10] and you can compare that key to the one that signed all the mw releases in the past [20:30:55] that's hexmode, i can also confirm [20:32:26] he's also posted his key on enwiki and it matches what he sent to RT [20:33:28] I like the do-your-own commit approach idea for established gerrit users [20:33:42] yes, that sounds good too [20:34:09] (PS1) Ottomata: Capturing zero request logs via kafkatee on analytics1003 [operations/puppet] - https://gerrit.wikimedia.org/r/115962 [20:34:17] i add them as reviewers, you can also have him +1 on gerrit [20:34:42] which is the same user from LDAP [20:34:48] right [20:34:51] and he had to login [20:34:58] rght [20:35:32] (CR) Ottomata: [C: 2 V: 2] Capturing zero request logs via kafkatee on analytics1003 [operations/puppet] - https://gerrit.wikimedia.org/r/115962 (owner: Ottomata) [20:35:44] (PS2) Jgreen: add users mah and mglaser to releases webserver, w/o enabling keys pending final verification [operations/puppet] - https://gerrit.wikimedia.org/r/115960
[20:35:52] Deskana: I hear by declare group0 to 1.23wmf16 done. [20:36:30] aude: Ready to deploy some extension updates? [20:36:45] go ahead [20:37:14] So I was hoping you would :) You've done it once and I've done it zero times. [20:37:34] ok :) [20:37:35] (CR) Dzahn: [C: 1] "UIDs confirmed. +1 for now matching with labs LDAP. can update docs about that" [operations/puppet] - https://gerrit.wikimedia.org/r/115960 (owner: Jgreen) [20:37:36] Also my ssh-agent and sync-* aren't getting along today [20:38:16] i'll do wmf16 first [20:38:16] (CR) Jgreen: [C: 2 V: 1] add users mah and mglaser to releases webserver, w/o enabling keys pending final verification [operations/puppet] - https://gerrit.wikimedia.org/r/115960 (owner: Jgreen) [20:38:23] then wmf15 (wikidata and education) [20:38:34] * bd808 nods [20:38:45] Both to both right? [20:39:04] education on both [20:39:10] wmf16 has the wikidata update [20:39:15] Ok [20:39:24] * bd808 looks around for ragesoss [20:39:52] (PS1) Ottomata: Need to specify type => pipe for processes [operations/puppet] - https://gerrit.wikimedia.org/r/115963 [20:40:10] (PS2) Ottomata: Need to specify type => pipe for processes [operations/puppet] - https://gerrit.wikimedia.org/r/115963 [20:40:10] aude: Going to eat, stuff looks quiet from fluorine [20:40:16] (CR) Ottomata: [C: 2 V: 2] Need to specify type => pipe for processes [operations/puppet] - https://gerrit.wikimedia.org/r/115963 (owner: Ottomata) [20:40:21] hoo|away: ok [20:40:42] ragesoss, AndyRussG : aude is going to push your patch to wmf16 (group0 wikis) [20:41:13] It will need verification before she moves on to the wmf15 updates [20:41:26] * bd808 gives the floor to aude [20:43:55] wait for jenkins [20:45:42] bd808: thanks!
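The key check discussed around 20:29 (search the keyserver, import the key, run `gpg --verify` on the detached signature, then compare the key with the one that signed earlier MediaWiki releases) can be scripted. Below is a minimal Python sketch, assuming GnuPG's machine-readable `--status-fd` output. It accepts a signature only when gpg reports `GOODSIG` with no `BADSIG`/`ERRSIG`, and a `VALIDSIG` fingerprint equal to one you already trust out of band. A "Good signature" line on its own only proves the file matches *some* key in your keyring. The function names are hypothetical helpers, not part of any Wikimedia tooling.

```python
import subprocess

STATUS_PREFIX = "[GNUPG:] "


def parse_status(status_text):
    """Turn gpg --status-fd output into a list of (keyword, args) tuples."""
    records = []
    for line in status_text.splitlines():
        if not line.startswith(STATUS_PREFIX):
            continue  # skip human-readable lines mixed into the stream
        fields = line[len(STATUS_PREFIX):].split()
        if fields:
            records.append((fields[0], fields[1:]))
    return records


def normalize_fpr(fpr):
    """Uppercase and drop spaces so grouped and ungrouped fingerprints compare equal."""
    return fpr.replace(" ", "").upper()


def signature_matches(status_text, expected_fpr):
    """True only if gpg reported a good signature made by the expected key."""
    expected = normalize_fpr(expected_fpr)
    records = parse_status(status_text)
    keywords = {kw for kw, _ in records}
    if "GOODSIG" not in keywords or keywords & {"BADSIG", "ERRSIG"}:
        return False
    for kw, args in records:
        if kw != "VALIDSIG" or not args:
            continue
        # args[0] is the signing (sub)key fingerprint; args[9], when present,
        # is the fingerprint of the primary key.
        candidates = {normalize_fpr(args[0])}
        if len(args) >= 10:
            candidates.add(normalize_fpr(args[9]))
        if expected in candidates:
            return True
    return False


def verify_detached(sig_path, data_path):
    """Run gpg (must be on PATH) and return its status lines, written to stdout via fd 1."""
    result = subprocess.run(
        ["gpg", "--status-fd", "1", "--verify", sig_path, data_path],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `signature_matches(verify_detached("file.sig", "file"), trusted_fpr)` returns `True` only when both checks pass. `trusted_fpr` would come from an independent source, such as the key that signed past release tarballs or the copy posted on a wiki user page.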
[20:46:47] wait and wait [20:47:28] * bd808 asks Jenkins to get with the program [20:48:39] (PS1) Jgreen: enable Mark Hershberger's ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/115966 [20:49:46] (CR) Dzahn: [C: 2] remove locke, decom [operations/dns] - https://gerrit.wikimedia.org/r/115583 (owner: Matanya) [20:50:45] !log aude synchronized php-1.23wmf16/extensions/EducationProgram [20:50:47] (CR) MarkAHershberger: [C: 1] "yep" [operations/puppet] - https://gerrit.wikimedia.org/r/115966 (owner: Jgreen) [20:50:54] Logged the message, Master [20:50:59] AndyRussG: you can look on test2 now [20:51:03] !log DNS update - removing locke and squidlog which was a locke cname, locke is dead [20:51:09] or test [20:51:12] Logged the message, Master [20:51:18] (CR) Jgreen: [C: 2 V: 1] enable Mark Hershberger's ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/115966 (owner: Jgreen) [20:51:26] * aude do wikidata change on wmf15 then education [20:51:27] aude: Still stuck on jenkins? [20:51:31] bd808: no [20:52:01] heh. Saw that [20:54:52] here goes wikidata [20:55:18] !log aude synchronized php-1.23wmf15/extensions/Wikidata [20:55:26] Logged the message, Master [20:56:32] wikidata looks good [20:56:57] AndyRussG: i'll deploy the educaiton fix to wikipedias now [20:56:58] aude: looks good on test2, the bugs are fixed and I don't see any issues, thanks again! [20:57:03] ragesoss: ^ [20:58:01] again wait for jenkins [20:58:42] If I was paying that butler I'd have to have a talk with him about being more prompt.
[20:58:49] hehh [20:58:57] lots of tests [20:59:00] * aude wants more [21:00:23] Yeah we need more tests and probably more slaves too [21:00:35] both [21:03:05] !log aude synchronized php-1.23wmf15/extensions/EducationProgram [21:03:11] done [21:03:12] Logged the message, Master [21:03:22] AndyRussG: ^ [21:04:17] Thanks aude [21:04:53] thanks for getting us in [21:05:00] Excellent, aude, everything cheks out :) [21:05:05] * aude hands over to next person [21:05:18] I think that's it [21:05:30] Thanks aude and bd808 [21:05:36] yw [21:06:13] jgonera: Did you end up with a mobile fix that needs to go out? [21:07:06] yeah, thanks! [21:07:14] phew! [21:07:55] now anons won't be able to delete course by adding action=delete to a url. [21:08:14] yikes [21:08:49] Permissions checks are hard yo [21:08:57] like they could have any time since we started using it, but no one ever did. [21:09:05] :) [21:09:14] bd808, no, everything should be fine [21:09:16] bd808: Everything going okay? :) [21:10:07] Deskana: We're all done. wmf15 to all; wmf16 to group0 and extension updates for Ed and Wikidata [21:10:34] bd808: And nothing's on fire yet! Yay! 
[21:10:45] :) [21:10:48] * bd808 knocks on wood [21:10:56] I guess failing an exam is worse a punishment than being blocked on a wiki [21:11:01] (PS1) Ottomata: Adding hasrestart to kafkatee service [operations/puppet/kafkatee] - https://gerrit.wikimedia.org/r/115971 [21:11:09] Or deflagged [21:12:11] (CR) Ottomata: [C: 2 V: 2] Adding hasrestart to kafkatee service [operations/puppet/kafkatee] - https://gerrit.wikimedia.org/r/115971 (owner: Ottomata) [21:12:42] (PS1) Ottomata: Updating kafkatee module with hasrestart change [operations/puppet] - https://gerrit.wikimedia.org/r/115972 [21:13:01] (PS2) Ottomata: Updating kafkatee module with hasrestart change [operations/puppet] - https://gerrit.wikimedia.org/r/115972 [21:13:05] (CR) Ottomata: [C: 2 V: 2] Updating kafkatee module with hasrestart change [operations/puppet] - https://gerrit.wikimedia.org/r/115972 (owner: Ottomata) [21:25:02] ottomata: would you mind merging a git-deploy config for me please? https://gerrit.wikimedia.org/r/#/c/115947/ :-] [21:26:43] (PS5) Addshore: Add integration/php-coveralls to git-deploy [operations/puppet] - https://gerrit.wikimedia.org/r/115947 [21:26:48] (CR) Ottomata: [C: 2 V: 2] Add integration/php-coveralls to git-deploy [operations/puppet] - https://gerrit.wikimedia.org/r/115947 (owner: Addshore) [21:27:20] ottomata: thanks :-] [21:28:30] yup! [21:28:36] running puppet on tin ow [21:28:37] now [21:28:44] awesome [21:28:58] :) [21:29:02] smile all round! [21:29:17] smiles... [21:29:26] * aude cheers [21:30:22] heh manybubbles [21:30:42] At some point in the next two weeks wikis running CirrusSearch will lose it for a few hours while we upgrade to Elasticsearch 1.0. They'll fall back to the old search for the duration. [13] [21:33:21] ottomata: puppet finished I guess? [21:33:52] yes, not sure what output I am expecting [21:33:56] but ja [21:34:01] no clue [21:34:01] does it need run on a target?
[21:34:23] ottomata: I have no idea :-] I think it is broken anyway, dont waste your time with that will figure out later :D [21:34:27] thx for the merge! [21:36:25] bug fillinnggggg [21:39:02] (CR) Dzahn: [C: 2] add cron for bugzilla reporter script [operations/puppet] - https://gerrit.wikimedia.org/r/115865 (owner: Dzahn) [21:49:48] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [21:59:43] (PS1) Ottomata: Initial 2.0.0-1 debian release [operations/debs/archiva] - https://gerrit.wikimedia.org/r/115984 [22:03:06] (Abandoned) Ottomata: Initial 2.0.0-1 debian release [operations/debs/archiva] - https://gerrit.wikimedia.org/r/115984 (owner: Ottomata) [22:04:04] (CR) Matanya: Initial 2.0.0-1 debian release (2 comments) [operations/debs/archiva] - https://gerrit.wikimedia.org/r/115984 (owner: Ottomata) [22:04:23] (PS3) Ottomata: Initial 2.0.0-1 debian release [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323 [22:05:17] (PS4) Ottomata: Initial 2.0.0-1 debian release [operations/debs/archiva] (debian) - https://gerrit.wikimedia.org/r/115323 [22:05:31] matanya, the . is the format [22:05:39] it continues the same paragraph [22:05:51] I removed the trailing space in the correct changeset [22:05:53] https://gerrit.wikimedia.org/r/#/c/115323/ [22:05:59] oh, i see now. sorry for the noise
[22:06:00] the one you commented on was pushed to the wrong branch (master) [22:06:03] no pob [22:06:04] prob [22:14:20] (PS1) Jgreen: enable mglaser's ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/115987 [22:15:21] (CR) Mglaser: [C: 1] "perfect :)" [operations/puppet] - https://gerrit.wikimedia.org/r/115987 (owner: Jgreen) [22:15:55] (CR) Jgreen: [C: 1 V: 1] enable mglaser's ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/115987 (owner: Jgreen) [22:16:03] (CR) Jgreen: [C: 2] enable mglaser's ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/115987 (owner: Jgreen) [22:35:40] (PS1) BryanDavis: Fix documentation of `--home` option for activeMWVersions.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115992 [23:17:25] (PS5) Reedy: Remove ContactPageFundraiser from testwiki and donatewiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113896 [23:17:43] (PS6) Reedy: Remove ContactPageFundraiser from testwiki and donatewiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113896 [23:18:19] (CR) Dzahn: fix wrong scap/doc permissions.live hack->puppet (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/115851 (owner: Dzahn) [23:22:10] (CR) Dzahn: [C: 2] "lgtm, per Alex, and reasoning by Damian on I8cdee5204a25e82ee14e1a06b2dd4e7740bd248f also unblocks the latter change" [operations/puppet] - https://gerrit.wikimedia.org/r/115821 (owner: DamianZaremba) [23:23:48] (CR) Dzahn: "hmm, usually the bot verifies" [operations/puppet] - https://gerrit.wikimedia.org/r/115821 (owner: DamianZaremba) [23:25:31] (CR) BryanDavis: "+1 for Antoine's suggestion of fixing the git repo permanently." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/115851 (owner: Dzahn)
[23:25:33] (CR) Dzahn: "thanks for your comments, i tried to merge one of the 2 changes this depends on but it didnt get verified, and the other one i could +1 bu" [operations/puppet] - https://gerrit.wikimedia.org/r/110880 (owner: Matanya) [23:28:35] (CR) Dzahn: "we recently had a discussion about keeping the mgmt entries or not in these cases. per revised Server Lifecycle, they should stay" [operations/dns] - https://gerrit.wikimedia.org/r/112669 (owner: Matanya) [23:29:54] (CR) Dzahn: [C: -1] "we recently had a discussion about keeping the mgmt entries or not in these cases. per revised Server Lifecycle, they should stay" [operations/dns] - https://gerrit.wikimedia.org/r/112669 (owner: Matanya) [23:32:51] (PS4) Matanya: remove streber from DNS [operations/dns] - https://gerrit.wikimedia.org/r/112669 [23:32:56] odder: yes, not sure exactly when yet [23:33:26] mutante|away: ^^ [23:35:23] (CR) Dzahn: [C: 2] "thanks, yes streber is gone and unreachable" [operations/dns] - https://gerrit.wikimedia.org/r/112669 (owner: Matanya) [23:36:28] !log DNS update - removing streber (RT 2186) [23:36:36] Logged the message, Master [23:37:12] (CR) Reedy: [C: 2] Remove ContactPageFundraiser from testwiki and donatewiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113896 (owner: Reedy) [23:37:20] (Merged) jenkins-bot: Remove ContactPageFundraiser from testwiki and donatewiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113896 (owner: Reedy) [23:38:02] !log streber - revoke puppet cert and salt key [23:38:09] !log reedy synchronized wmf-config/InitialiseSettings.php 'Remove ContactPageFundraiser from testwiki and donatewiki' [23:38:10] Logged the message, Master [23:38:16] Logged the message, Master [23:44:04] (CR) Matanya: "There is a newer version that might achieve what you are looking for:" [operations/puppet] - https://gerrit.wikimedia.org/r/62955 (owner: Faidon Liambotis)
[23:44:36] thanks a lot mutante [23:45:15] !log reedy updated /a/common to {{Gerrit|Ibe70ebb6d}}: Remove ContactPageFundraiser from testwiki and donatewiki [23:45:15] (PS1) Reedy: Update ContactPage config for nlwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116002 [23:45:21] Logged the message, Master [23:45:25] (CR) jenkins-bot: [V: -1] Update ContactPage config for nlwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116002 (owner: Reedy) [23:46:51] (PS2) Reedy: Update ContactPage config for nlwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116002 [23:48:07] (CR) Reedy: [C: 2] Update ContactPage config for nlwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116002 (owner: Reedy) [23:48:13] (Merged) jenkins-bot: Update ContactPage config for nlwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116002 (owner: Reedy) [23:48:45] (PS2) Ori.livneh: Get rid of remaining references to $wmfExtendedVersionNumber [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114915 [23:48:57] Reedy: what is the fate of https://gerrit.wikimedia.org/r/#/c/74592/ ? [23:49:10] (CR) Reedy: [C: 2] Get rid of remaining references to $wmfExtendedVersionNumber [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114915 (owner: Ori.livneh) [23:49:17] (Merged) jenkins-bot: Get rid of remaining references to $wmfExtendedVersionNumber [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114915 (owner: Ori.livneh) [23:50:16] matanya: Run frequency needs deciding and then that patchset updating. Then review/deploy [23:50:49] Reedy: who cand define frequency? [23:50:50] !log reedy synchronized wmf-config/ [23:50:58] Logged the message, Master [23:52:10] Jeff_Green: anything with https://gerrit.wikimedia.org/r/#/c/72157/ ?
[23:52:38] It can just be run on the first of every month [23:52:49] Or even once a week.. [23:52:58] it only deletes based on $wgSecurePollKeepPrivateInfoDays [23:53:08] (CR) Dzahn: "this is another case where Damian needs to be added to the trusted users or we have to override jenkins" [operations/puppet] - https://gerrit.wikimedia.org/r/115821 (owner: DamianZaremba) [23:53:33] Reedy: mind fixing that? [23:54:59] (CR) Dzahn: [V: 2] Define root disk space check for labs. [operations/puppet] - https://gerrit.wikimedia.org/r/115821 (owner: DamianZaremba) [23:55:53] !log reedy updated /a/common to {{Gerrit|Iade498a40}}: Get rid of remaining references to $wmfExtendedVersionNumber [23:55:57] (PS1) Reedy: Swap 2 other $wmfExtendedVersionNumber uses [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116003 [23:56:02] Logged the message, Master [23:56:09] (CR) Reedy: [C: 2] Swap 2 other $wmfExtendedVersionNumber uses [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116003 (owner: Reedy) [23:56:24] (Merged) jenkins-bot: Swap 2 other $wmfExtendedVersionNumber uses [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116003 (owner: Reedy) [23:56:38] !log reedy synchronized wmf-config/ [23:56:46] Logged the message, Master [23:57:50] (CR) Dzahn: "ok, the 2 changes listed above that should be merged first are now merged" [operations/puppet] - https://gerrit.wikimedia.org/r/110880 (owner: Matanya)
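Reedy's point is that the purge job can run monthly or weekly because it only deletes based on `$wgSecurePollKeepPrivateInfoDays`. A small simulation can check that reasoning. This is a hypothetical Python sketch, not SecurePoll's code. The records are invented `(id, timestamp)` pairs, and `keep_days=90` is an assumed value standing in for the real setting. The sketch shows that the run interval changes only how late a record is removed (at most one interval past its retention date), not which records are removed.

```python
from datetime import datetime, timedelta


def purge(records, now, keep_days):
    """Split hypothetical (record_id, timestamp) pairs into (kept, purged).

    A record is purged once it is older than keep_days at the time of the run,
    mirroring a job that deletes purely by age.
    """
    cutoff = now - timedelta(days=keep_days)
    kept = [r for r in records if r[1] >= cutoff]
    purged = [r for r in records if r[1] < cutoff]
    return kept, purged


def simulate(records, start, days, interval_days, keep_days):
    """Run purge() every interval_days from start; return {record_id: removal time}."""
    remaining = list(records)
    removed_on = {}
    for offset in range(0, days + 1, interval_days):
        now = start + timedelta(days=offset)
        remaining, purged = purge(remaining, now, keep_days)
        for record_id, _ in purged:
            removed_on[record_id] = now
    return removed_on
```

With `keep_days=90`, a record created on 2014-01-01 is removed on 2014-04-02 by a weekly run and on 2014-05-01 by a 30-day run. Both schedules eventually remove the same records. The only trade-off is how long private data lingers past the retention window.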