[00:02:26] (CR) Dzahn: [C: 2 V: 2] "sorry it took so long, i actually checked the diffs of the /template/en/custom files now, i just saw one left in the index.html.tmpl, fixe" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/119726 (https://bugzilla.wikimedia.org/61499) (owner: 01tonythomas)
[00:08:14] (PS1) MaxSem: Fix Amazon Silk detection [operations/puppet] - https://gerrit.wikimedia.org/r/137838
[00:10:05] (CR) Kaldari: [C: 1] Fix Amazon Silk detection [operations/puppet] - https://gerrit.wikimedia.org/r/137838 (owner: MaxSem)
[00:12:11] crap, i broke something in Bugzilla, looking
[00:16:02] !log fixed permissions on bugzilla's index.cgi, sry
[00:16:07] Logged the message, Master
[00:22:49] ACKNOWLEDGEMENT - RAID on es1006 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Sean Pringle RT 7639
[00:24:47] mutante: I meant against the upstream project, which is me :)
[00:27:24] bblack: which ticket system:)
[00:29:16] https://github.com/blblack/gdnsd/issues?state=open
[00:29:52] ok:)
[00:33:40] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 21:33:08 UTC
[00:33:50] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Fri Jun 6 00:33:42 UTC 2014
[00:38:33] !log nginx restarted on ssl*
[00:38:37] Logged the message, Master
[00:44:39] (CR) BBlack: [C: 2 V: 2] Fix Amazon Silk detection [operations/puppet] - https://gerrit.wikimedia.org/r/137838 (owner: MaxSem)
[01:13:40] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 19:12:41 UTC
[01:38:40] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[01:51:23] ori: hmm.
when I go to http://codepen.io/Krinkle/full/laucI I get the same thing, even with http://
[01:51:35] let me try disabling https-everywhere
[01:51:50] or, I'll just use a different browser
[01:57:46] well, it works in safari
[01:57:47] weird
[02:19:01] (PS1) Yurik: LABS: Disabled wmgZeroBanner on zerowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137854
[02:19:54] greg-g, can i sync ^
[02:20:47] (CR) Yurik: [C: 2] "production NOOP" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137854 (owner: Yurik)
[02:20:53] (Merged) jenkins-bot: LABS: Disabled wmgZeroBanner on zerowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137854 (owner: Yurik)
[02:40:29] (PS6) Mwalker: Adding GELF support to LogStash [operations/puppet] - https://gerrit.wikimedia.org/r/137811
[02:43:32] !log LocalisationUpdate completed (1.24wmf7) at 2014-06-06 02:42:28+00:00
[02:43:37] Logged the message, Master
[02:53:13] bblack, around?
[03:01:31] (PS1) Yurik: netmapper URL based on realm (prod/labs) [operations/puppet] - https://gerrit.wikimedia.org/r/137856
[03:02:50] bblack, please change the script to use "action=zeroportal" (works in prod too), and I submitted this to work on beta: https://gerrit.wikimedia.org/r/#/c/137856/
[03:13:22] !log LocalisationUpdate completed (1.24wmf8) at 2014-06-06 03:12:19+00:00
[03:13:26] Logged the message, Master
[03:19:11] (PS2) Ori.livneh: Puppetize /home/tstarling/.bashrc [operations/puppet] - https://gerrit.wikimedia.org/r/76678 (owner: Tim Starling)
[03:19:38] (CR) Ori.livneh: "Updated for Chase's new admin module pattern" [operations/puppet] - https://gerrit.wikimedia.org/r/76678 (owner: Tim Starling)
[04:14:40] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 19:12:41 UTC
[04:24:15] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 6 04:23:08 UTC 2014 (duration 23m 7s)
[04:24:20]
Logged the message, Master
[04:35:37] (PS1) Ori.livneh: Add my dotfiles [operations/puppet] - https://gerrit.wikimedia.org/r/137864
[04:37:23] ^ +1 ?
[04:39:40] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[05:29:25] got akosiaris ?
[05:52:30] hola cajoel
[05:52:39] nighto
[05:53:23] cloudy day at the acropolis
[05:53:24] http://www.greece-athens.com/webcamera/acropolis.php
[05:54:58] jgage: found from lsof that current active TCP sessions are initiated from inside the NAT
[05:55:08] so I also need to PBR to the same ports outbound
[05:55:12] but that's an easy fix
[05:55:21] all ready to go
[05:56:30] oh interesting
[05:57:37] aside: Do you know anyone at deviantArt?
[05:57:44] I know Chase worked there
[05:57:56] one of our interns is interested in applying there -- IT
[05:59:03] we have two other people on staff from DA i believe, but i don't personally know anyone there
[06:00:40] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 03:00:13 UTC
[06:04:44] i'm on the verge of zzz. are you planning to wait for akosiaris for that change?
[06:05:00] i am here
[06:05:08] good evening :)
[06:05:46] hi hi
[06:06:09] if you have some time, take a look at my notes?
[06:06:48] would also be nice if you could lsof on sanger
[06:06:54] wait, maybe I can login to sanger..
[06:07:16] hot damn -- I can
[06:07:58] looking at the notes right now
[06:08:48] java 1276 opendj 78u IPv6 5665 0t0 TCP sanger.wikimedia.org:41461->sanger.wikimedia.org:8989 (ESTABLISHED)
[06:08:48] java 1276 opendj 79u IPv6 5655 0t0 TCP sanger.wikimedia.org:41454->sanger.wikimedia.org:8989 (ESTABLISHED)
[06:08:48] java 1276 opendj 80u IPv6 48442688 0t0 TCP sanger.wikimedia.org:8989->216.38.130.189:44754 (ESTABLISHED)
[06:08:48] java 1276 opendj 82u IPv6 48462983 0t0 TCP sanger.wikimedia.org:8989->sfo-intranet.corp.wikimedia.org:33394 (ESTABLISHED)
[06:08:48] java 1276 opendj 83u IPv6 48442686 0t0 TCP sanger.wikimedia.org:8989->sfo-intranet.corp.wikimedia.org:33374 (ESTABLISHED)
[06:08:57] connections are being initiated from SF
[06:09:14] (CR) Ori.livneh: "Giuseppe, I think you're right. I'll amend the change to undo that particular part." [operations/puppet] - https://gerrit.wikimedia.org/r/137470 (owner: Ori.livneh)
[06:09:26] might work both ways, but current open ports are initiated on sf side
[06:10:33] ok so my only point is for WAN in ${WANLIST}
[06:10:35] I don't see WANLIST defined somewhere in that script
[06:10:44] yeah, it's much higher
[06:10:49] eth2 eth8 eth 9 etc..
[06:10:50] ah ok
[06:10:59] I use it for other parts of the script
[06:11:16] seems fine then
[06:11:25] spent some time reading iptables, and it looks like you can specify multiple intefaces in one go, but that's for another time
[06:11:39] I have James Alexandar online too
[06:11:47] sugarcrm uses LDAP via labs
[06:12:11] it uses dns, but he's ready to kick apache if it doesn't clean up
[06:12:15] the labs LDAP is completely separate though
[06:12:16] !log on osmium installed nodejs for testing
[06:12:21] Logged the message, Master
[06:12:26] (PS8) Mwalker: Adding GELF support to LogStash [operations/puppet] - https://gerrit.wikimedia.org/r/137811
[06:12:35] I'm going to grab a glass of water and come back and get started.
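Editor's note on the lsof paste above: the "initiated from SF" reading follows from which end of each connection holds the service port 8989. A minimal Python sketch of that heuristic is below. The function names are hypothetical, and it assumes the side using the well-known port is the listener, which is a convention rather than a guarantee.

```python
def connection_name(lsof_line):
    """Return the NAME column (local->remote) of an lsof TCP line.

    Assumes the line ends with the state in parentheses, as in the paste.
    """
    fields = lsof_line.split()
    return fields[-2]


def direction(name, service_port=8989):
    """Guess who opened a connection from its local->remote pair.

    Heuristic only: the end holding the service port is taken to be
    the listener, so the other end is taken to be the initiator.
    """
    local, remote = name.split("->")
    local_port = int(local.rsplit(":", 1)[1])
    remote_port = int(remote.rsplit(":", 1)[1])
    if local_port == service_port and remote_port != service_port:
        return "inbound"   # the remote side connected to us
    if remote_port == service_port and local_port != service_port:
        return "outbound"  # we connected to the remote side
    return "unknown"
```

Applied to the pasted lines, the entries whose local end is sanger.wikimedia.org:8989 come out as inbound, which matches the conclusion drawn in the channel.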
[06:12:39] ok
[06:12:50] akosiaris: it's an instance running on labs that talks to OIT LDAP
[06:12:53] for user auth
[06:13:01] ah ok
[06:13:07] gelf support for logstash!
[06:13:14] yep; it now works too
[06:13:20] just tested that final revision
[06:14:21] i have recently done the same thing, for hadoop. nice to see your use of filters.
[06:15:58] akosiaris: ok, ready to roll
[06:16:08] ok
[06:16:13] one at a time or both at once?
[06:16:31] one at a time ?
[06:16:58] jgage, how did you make it HA? we were considering LVS + pybal
[06:17:00] ok, sfo-intranet1 first
[06:17:59] first dns
[06:18:03] (CR) Mwalker: "Now tested in labs -- see https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/gelf_test" [operations/puppet] - https://gerrit.wikimedia.org/r/137811 (owner: Mwalker)
[06:18:56] jgage, also if you want to +2 and puppet merge that would be really cool
[06:19:10] (PS3) Ori.livneh: mediawiki::php: restore php5-igbinary; delete some unused files [operations/puppet] - https://gerrit.wikimedia.org/r/137470
[06:20:06] dns looks ok to me
[06:20:08] you?
[06:20:20] ;; ANSWER SECTION:
[06:20:20] sfo-intranet1.corp.wikimedia.org. 300 IN A 198.73.209.13
[06:20:27] (PS1) Ori.livneh: remove unused files in files/apache [operations/puppet] - https://gerrit.wikimedia.org/r/137883
[06:20:29] mwalker, i haven't done anything with HA so far
[06:20:41] mwalker, looks good. gonna +2 and merge.
[06:20:42] I just saw the change in your ns, the recursors have not picked it up yet
[06:20:47] (CR) Gage: [C: 2] "Cool! I was about to make a nearly-identical change." [operations/puppet] - https://gerrit.wikimedia.org/r/137811 (owner: Mwalker)
[06:21:13] 71 secs left, not purging, will wait it out
[06:21:23] akosiaris: i'm happy to see any work on the apache pile-o'-crap, plenty of mess for everyone to get some action :)
[06:21:36] akosiaris: I was seeing it on 4.2.2.1
[06:21:36] dig @4.2.2.1 sfo-intranet1.corp.wikimedia.org
[06:21:48] mwalker, done!
[06:21:53] whooo!
[06:21:54] ori: yes indeed. let's clear that mess please :)
[06:21:55] thanks :)
[06:22:01] :D
[06:22:19] ori: got a plan ? I just got started to try and solve some puppet issues on zirconium
[06:22:30] but I am up for bigger changes
[06:22:47] akosiaris: making PBR changes
[06:22:49] cajoel: ok it changed on the recursors too
[06:23:00] akosiaris: well, i don't think we're worried about keeping up with upstream, so i figure maybe start by removing all the pieces of that module that we aren't using
[06:23:58] akosiaris: i don't really have a final architecture in mind, i usually get a handle on the problem by doing small lint-type commits
[06:24:03] ori: ok so some housecleaning first. That reminds me of the scap approach. I am up for it
[06:24:11] yep
[06:24:42] that is mostly why I started it anyway. To understand what the @@#$ is going on with that set of classes
[06:25:33] cajoel: SSL connection attempt from 198.73.209.2 (198.73.209.2) failed: Received close_notify during handshake
[06:25:36] is that you ?
[06:25:41] akosiaris: [05/Jun/2014:23:23:29 -0700] category=SYNC severity=SEVERE_ERROR msgID=14942389 msg=Replication Server 12440 sanger:8989 dc=corp,dc=wikimedia,dc=org has badly disconnected from this replication server 30547
[06:25:53] yeah, that's one of the IPs
[06:26:28] ok, looks like opendj hasn't pick up yet the DNS change
[06:26:32] I will restart my end
[06:27:13] ok
[06:28:09] Could not connect to any replication server on suffix dc=corp,dc=wikimedia,dc=org among the following RS candidates {12440=Url:sanger:8989 ServerId:12440, 12701=Url:sfo-aaa1.corp.wikimedia.org:8989 ServerId:12701, 30547=Url:sfo-intranet1:8989 ServerId:30547}, retrying...
[06:28:19] it is complaining about the second one, not the intranet1 one
[06:28:45] odd
[06:28:59] checking rules
[06:29:28] [06/Jun/2014:06:27:31 +0000] category=SYNC severity=NOTICE msgID=15138878 msg=Replication is up and running for domain cn=schema with replication server id 12701 sfo-aaa1.corp.wikimedia.org/216.38.130.188:8989 - local server id is 1243 - data generation is 8408
[06:29:35] ok so the cn=schema repl is working
[06:29:44] it only complained about the actual tree
[06:29:53] weird
[06:29:54] port is open
[06:30:00] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Fri Jun 6 06:29:55 UTC 2014
[06:30:18] shall we try a data change with the half brain?
[06:30:23] or do the full conversion now?
[06:31:35] ok it recovered so let's go on
[06:31:52] ok
[06:31:58] next dns
[06:33:24] applied
[06:33:31] 4.2.2.2 seeing it
[06:33:57] (PS1) Ori.livneh: ::apache: delete everything we're not already using [operations/puppet] - https://gerrit.wikimedia.org/r/137884
[06:35:19] yup it changed on the resolvers too
[06:35:29] (PS2) Ori.livneh: ::apache: delete everything we're not already using [operations/puppet] - https://gerrit.wikimedia.org/r/137884
[06:37:41] [05/Jun/2014:23:29:45 -0700] category=SYNC severity=NOTICE msgID=15138921 msg=SSL connection attempt from sanger.wikimedia.org (208.80.152.187) failed: Unrecognized SSL message, plaintext connection?
[06:39:36] cajoel: yes it needed again a restart
[06:39:44] it did not complain at all now
[06:40:42] ok
[06:40:53] want to do a app level test?
[06:41:10] sure
[06:41:36] changed my title
[06:41:50] are you hitting queries on the command line?
[06:41:57] on sanger?
[06:42:34] I am dumping the entire ldif
[06:42:55] cli ?
[06:43:20] title: System and Network Engineer
[06:43:31] ds-sync-hist: title:000001466fe935f157ef000044f8:repl:System and Network Engineer
[06:43:47] so we are ok it seems, right ?
[06:44:10] yes indeed
[06:44:16] sudo -u opendj /usr/opendj/bin/export-ldif -l lala.ldif -n userRoot
[06:44:26] if you are interested in the command
[06:44:33] yes, please and thanks
[06:45:06] well sweet
[06:47:19] ok
[06:47:28] confirmed that sugar+labs is functional
[06:47:39] failed once and then connected (didn't need to kick apache)
[06:47:53] _joe_: https://gerrit.wikimedia.org/r/#/c/137470/ <-- updated to remove the recursive dir, per yr suggestion
[06:48:04] cajoel: cool. Seems like you can go to bed then ?
[06:48:09] akosiaris: I think so!
[06:48:18] <_joe_> ori: great
[06:48:34] gotta be at work in 8.5 hours for a fireside chat!
[06:48:37] <_joe_> sorry to be a pain sometimes, but those are paths I already went down in the past
[06:48:45] akosiaris: thanks for your help
[06:48:58] _joe_: i wouldn't have taken the advice if i didn't end up agreeing
[06:49:04] cajoel: you are welcome. Have a nice night :-)
[06:49:17] next is the pbx
[06:49:23] do you have a sip phone?
[06:49:27] I know faidon does
[06:49:42] not tonight -- next week
[06:49:48] yes I do
[06:50:00] are you using dns names or ips?
[06:50:06] dns names
[06:50:13] asterisk also sends that helper address thingy
[06:50:19] hardcoded in the asterisk config
[06:50:58] externalIP in sip.conf
[06:51:00] something like that
[06:51:04] some other night
[06:51:07] cheers!
[06:51:13] cheers!
[06:51:22] nice job dudes
[06:52:10] thx gage
[06:52:16] happy ulfso touring tomorrow
[06:52:27] jgage: still up ? thanks :-)
[06:53:40] (PS1) Giuseppe Lavagetto: puppetmaster: acls only in apache for puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/137886
[06:53:56] akosiaris: sent a couple of patches your way. i'm permitted to merge changes that get a +1 from opsen, so if any look sane, i can shepherd them into prod and make sure they don't break
[06:54:04] hi alex :) sure, happy to stick around and observe
[06:55:31] ori: I can merge them too if you feel like going to sleep.
I am looking at them now
[06:56:02] <_joe_> um gerrit bot where art thou?
[06:56:05] akosiaris: happy to have you merge them, but i'll be up either way, have to write up a document
[06:56:16] <_joe_> oh I'm blind
[07:00:09] (CR) Alexandros Kosiaris: [C: 2] remove unused files in files/apache [operations/puppet] - https://gerrit.wikimedia.org/r/137883 (owner: Ori.livneh)
[07:00:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:01:00] dur, the dependent patch has a typo
[07:02:08] (PS4) Ori.livneh: mediawiki::php: restore php5-igbinary; delete some unused files [operations/puppet] - https://gerrit.wikimedia.org/r/137470
[07:02:34] (CR) Alexandros Kosiaris: [C: 2] Add my dotfiles [operations/puppet] - https://gerrit.wikimedia.org/r/137864 (owner: Ori.livneh)
[07:02:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:03:00] (PS2) Ori.livneh: remove unused files in files/apache [operations/puppet] - https://gerrit.wikimedia.org/r/137883
[07:03:18] (CR) Ori.livneh: [C: 2 V: 2] "(rebase)" [operations/puppet] - https://gerrit.wikimedia.org/r/137883 (owner: Ori.livneh)
[07:03:47] puppet's fine on mw1058
[07:03:53] <_joe_> yeah it is
[07:03:55] we've been having these false alarms, must be an snmp issue
[07:03:56] <_joe_> look at the time
[07:04:15] <_joe_> Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:04:22] <_joe_> so 7 minutes ago
[07:04:28] <_joe_> I do hate that check
[07:04:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:05:57] <_joe_> shut up you icinga
[07:06:13] akosiaris: thank you
[07:06:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:06:58] ah yes, that crappy thing
[07:07:30] so, icinga gets the snmp fine, turns to OK and then again
CRITICAL within minutes
[07:07:55] <_joe_> yeah we need to change this
[07:07:59] the one way I found out on how to solve it is to disable notifications, wait a bit, submit a passive ok check, wait some more time and then reenable notifications
[07:08:09] <_joe_> grr
[07:08:17] but I hate that check so much...
[07:08:31] <_joe_> akosiaris: or, we can write the timestamp and the exit status of the puppet run to a file with a wrapper
[07:08:37] <_joe_> and check that via nrpe
[07:08:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:08:41] <_joe_> or whatever
[07:08:59] <_joe_> that would work a LOT better, and won't tamper with manual runs
[07:09:00] so I have had a python script almost ready, then some creeping featuritis came in through the window and I 've never finished it
[07:09:09] <_joe_> :)
[07:09:27] I am gonna go back and revisit that thing while closing the window at the same time
[07:09:28] <_joe_> featuritis is a bad disease, I almost always suffer from it
[07:10:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:10:49] (PS1) Giuseppe Lavagetto: puppet: ensure defining $puppet_version does upgrade [operations/puppet] - https://gerrit.wikimedia.org/r/137889
[07:12:12] <_joe_> akosiaris: can I ask this couple of CRs?
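Editor's note on the wrapper idea floated above (record the timestamp and exit status of each puppet run in a file, then check that file via NRPE): a rough Python sketch of the checking half might look like this. It is not the script mentioned in the channel. The JSON layout, the function names and the three-hour threshold are all assumptions made for illustration, and it assumes the wrapper records 0 for a successful run.

```python
import json
import time

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(state, now, max_age=3 * 3600):
    """Map a recorded puppet run to (exit_code, message).

    `state` is the dict a hypothetical wrapper would dump after each run:
    {"timestamp": <epoch seconds>, "exit_status": <int>}.
    """
    try:
        finished = float(state["timestamp"])
        status = int(state["exit_status"])
    except (KeyError, TypeError, ValueError):
        return UNKNOWN, "UNKNOWN: state file is missing or malformed"
    age = int(now - finished)
    if age > max_age:
        return CRITICAL, "CRITICAL: last puppet run finished %d seconds ago" % age
    if status != 0:
        # Assumption: the wrapper stores 0 for success, anything else is a failure.
        return CRITICAL, "CRITICAL: last puppet run exited with status %d" % status
    return OK, "OK: puppet ran %d seconds ago" % age


def check(path, now=None):
    """Read the state file at `path` (hypothetical location) and evaluate it."""
    try:
        with open(path) as handle:
            state = json.load(handle)
    except (OSError, ValueError):
        return UNKNOWN, "UNKNOWN: cannot read %s" % path
    return evaluate(state, time.time() if now is None else now)
```

Because the result depends only on what the wrapper last wrote, a manual run that goes through the same wrapper would refresh the state rather than confuse the check, which is the property asked for in the discussion.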
[07:12:32] <_joe_> when those are done, we are officially puppet3-enabled :)
[07:12:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:14:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:15:17] ok
[07:15:34] <_joe_> and I trust you as a reviewer :P
[07:15:38] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 19:12:41 UTC
[07:16:36] <_joe_> yeah that, I must ack
[07:16:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:17:25] ACKNOWLEDGEMENT - Puppet freshness on virt1000 is CRITICAL: Last successful Puppet run was Thu 05 Jun 2014 19:12:41 UTC Giuseppe Lavagetto working on puppet 3 migration.
[07:18:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:20:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:22:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:24:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:26:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 06:57:51 UTC
[07:27:28] RECOVERY - Puppet freshness on mw1058 is OK: puppet ran at Fri Jun 6 07:27:27 UTC 2014
[07:29:38] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 07:27:27 UTC
[07:30:37] (CR) Alexandros Kosiaris: [C: -1] puppetmaster: acls only in apache for puppet 3 (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/137886 (owner: Giuseppe Lavagetto)
[07:30:47] (CR) Alexandros Kosiaris: [C: 2] puppet: ensure defining $puppet_version does upgrade
[operations/puppet] - https://gerrit.wikimedia.org/r/137889 (owner: Giuseppe Lavagetto)
[07:35:36] <_joe_> akosiaris: I did not think puppetmaster::passenger had any use outside of the puppetmaster class :)
[07:35:55] (CR) Giuseppe Lavagetto: "I agree completely." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/137886 (owner: Giuseppe Lavagetto)
[07:40:03] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[07:40:27] all this puppet freshness is giving me headaches
[07:40:27] and I have and reviews today
[07:40:41] <_joe_> oh the performance review process
[07:40:51] <_joe_> I did not understand if I should participate
[07:40:55] <_joe_> or how
[07:41:06] <_joe_> I'm lazy as hell with those things
[07:41:23] <_joe_> (still need to ask for the reimbursement for Athens)
[07:42:17] ask then. I would expect you to not but don't take my word for it
[07:46:34] there was a wmfall email about reimbursements before end of fiscal
[07:46:56] _joe_: might want to be quick :)
[07:48:20] <_joe_> when is the end of fiscal?
[07:48:31] <_joe_> god I'm a trainwreck
[07:48:33] june 30 i guess
[07:48:37] <_joe_> oh ok
[07:48:42] <_joe_> "I have time"
[07:48:43] <_joe_> :P
[07:48:46] heh
[07:49:03] <_joe_> why waste time earning money when I can play with tech?
[07:51:39] (PS2) Giuseppe Lavagetto: puppetmaster: acls only in apache for puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/137886
[07:54:08] (PS3) Giuseppe Lavagetto: puppetmaster: acls only in apache for puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/137886
[07:55:46] (PS4) Giuseppe Lavagetto: puppetmaster: acls only in apache for puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/137886
[07:55:59] (CR) jenkins-bot: [V: -1] puppetmaster: acls only in apache for puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/137886 (owner: Giuseppe Lavagetto)
[07:57:15] <_joe_> wat?
[07:57:24] <_joe_> that was after the rebase
[07:57:33] RECOVERY - Puppet freshness on mw1058 is OK: puppet ran at Fri Jun 6 07:57:31 UTC 2014
[07:59:51] (PS5) Giuseppe Lavagetto: puppetmaster: acls only in apache for puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/137886
[08:01:48] (CR) Giuseppe Lavagetto: [C: 2] puppetmaster: acls only in apache for puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/137886 (owner: Giuseppe Lavagetto)
[08:03:10] (PS1) Yurik: LABS: set zerobanner to use lab's zerowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137892
[08:05:15] good i can't remember
[08:08:13] RECOVERY - Puppet freshness on virt1000 is OK: puppet ran at Fri Jun 6 08:08:02 UTC 2014
[08:08:58] <_joe_> \o/
[08:09:03] <_joe_> |o/
[08:10:21] (CR) Yurik: [C: 2] "Another prod noop, updating zero config for labs."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137892 (owner: Yurik)
[08:10:27] (Merged) jenkins-bot: LABS: set zerobanner to use lab's zerowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/137892 (owner: Yurik)
[08:11:46] Just +2 my own wmf-config patch - only touches labs files ^
[08:12:16] no production sync (unless there are ops here who say its ok to do sync-dir on wmf-config at this time)
[08:13:59] (PS2) Giuseppe Lavagetto: puppet: ensure defining $puppet_version does upgrade [operations/puppet] - https://gerrit.wikimedia.org/r/137889
[08:21:38] (CR) Alexandros Kosiaris: [C: -1] "Two files still actively referenced in the tree. The rest LGTM" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/137470 (owner: Ori.livneh)
[08:21:40] yurik: +1 :D
[08:22:36] hashar, you think its a good idea to do sync-dir or even scap now? :)
[08:23:14] yurik: when I do change to the -labs.php files I just rebase the repo on tine
[08:23:19] tin
[08:23:30] I don't bother running scap since that is a noop for prod
[08:24:13] (PS1) Giuseppe Lavagetto: puppet3: correct resource title for removal [operations/puppet] - https://gerrit.wikimedia.org/r/137893
[08:24:25] hashar, you mean i should just run git pull in wmf-config dir on tin?
[08:25:13] yurik: git fetch then compare current head with whatever you fetched: git log HEAD..FETCH_HEAD
[08:25:17] or something like that
[08:25:21] then git rebase if you are happy :D
[08:25:39] i haven't used git rebase without params
[08:25:53] i.e.
make sure you are not going to bring in another change that might have been merged but remains undeployed
[08:25:56] (CR) Giuseppe Lavagetto: [C: 2] puppet3: correct resource title for removal [operations/puppet] - https://gerrit.wikimedia.org/r/137893 (owner: Giuseppe Lavagetto)
[08:26:21] (PS5) Ori.livneh: mediawiki::php: restore php5-igbinary; delete some unused files [operations/puppet] - https://gerrit.wikimedia.org/r/137470
[08:26:50] <_joe_> ow, how stupid I am
[08:26:52] akosiaris: ^ amended. we don't have any lucid snapshot hosts, so modules/snapshot/manifests/phpfiles.pp could be killed
[08:27:08] ori: yes it could :-)
[08:27:10] hashar, right, gotcha
[08:27:16] akosiaris: good catch, thanks for that
[08:29:21] (PS1) Giuseppe Lavagetto: puppet3: remove 2.7 prefs, not the actual ones [operations/puppet] - https://gerrit.wikimedia.org/r/137895
[08:30:00] (CR) Giuseppe Lavagetto: [C: 2 V: 2] puppet3: remove 2.7 prefs, not the actual ones [operations/puppet] - https://gerrit.wikimedia.org/r/137895 (owner: Giuseppe Lavagetto)
[08:35:39] ms100{1,2,4} ? what are those machines ?
[08:35:53] site.pp is not really forthcoming
[08:36:21] "maybe something"
[08:37:02] I sure hope you are joking
[08:37:58] cause they are running nginx and HTCPpurger and are supposedly serving thumbs ?
[08:38:33] i am joking :) i don't know what they are
[08:38:48] you had me for a minute there
[08:39:21] heh, truth is not much better: per ganglia, ms is "miscellaneous"
[08:40:09] sigh
[08:40:27] well they are lucids, which is why I picked them up
[08:41:00] I am starting to wonder what will happen if I power them down
[08:41:33] (PS1) Giuseppe Lavagetto: labs: default all clients to puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/137898
[08:42:08] (CR) Giuseppe Lavagetto: "I'll leave it to labs people when and if to merge this."
[operations/puppet] - https://gerrit.wikimedia.org/r/137898 (owner: Giuseppe Lavagetto)
[08:42:21] maybe something :)
[08:42:59] <_joe_> ori: are you still awake? so I can ask you something about rcstream
[08:43:02] <_joe_> :)
[08:43:03] https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=&vl=&x=&n=&hreg[]=ms100%5B1-9%5D&mreg[]=bytes_%28in%7Cout%29&gtype=stack&glegend=show&aggregate=1&embed=1&_=1402044083307
[08:43:12] I will be damn if those machines actually do anything
[08:43:13] _joe_: what's up?
[08:43:34] _joe_: You must be new here; ori goes to bed at 11 AM UTC
[08:43:47] * twkozlowski winks at ori
[08:43:48] <_joe_> ori: I want to introduce some performance instrumentation, and I was asking myself which could be the best way to do that
[08:43:48] twkozlowski: shh, no outing
[08:44:14] what do you want to measure?
[08:44:15] <_joe_> twkozlowski: I share the pain of high insomnia with ori, so I always want not to keep him awake :)
[08:45:13] <_joe_> ori: things like the number of changes dispatched, number of clients per subscription type, "response times" do not make that much sense probably
[08:45:55] <_joe_> ori: I was thinking of storing counters in redis, it's easy to make counts per minute using it
[08:46:21] are you sure it's not featuritis? i thought about that, but i don't see much value
[08:46:32] <_joe_> it probably is :P
[08:46:45] <_joe_> it's just that I love metrics
[08:47:07] no shortage of things to instrument that really do need more instrumentation :P
[08:47:12] <_joe_> but, if the number of changes dispatched drops
[08:47:21] <_joe_> there is a problem somewhere
[08:47:31] <_joe_> and we could set up an alert on that
[08:47:43] <_joe_> I want to have a way to check that easily
[08:48:57] mm, i'd prefer to wait for some indication that it is failure-prone
[08:49:46] bytes_out in ganglia is probably as good indication as any
[08:50:34] hashar, is there a way to specify an extra header in eval.php ?
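Editor's note on the counts-per-minute idea for rcstream mentioned above: the usual pattern is to bucket each event under a key that embeds the minute and increment it. The Python sketch below uses a plain in-memory dict as a stand-in for Redis, so it runs without a server; with a real Redis the increment would be an INCR on the same key, normally paired with an expiry. The class, key format and drop threshold are hypothetical and not taken from rcstream.

```python
from collections import defaultdict


class MinuteCounters:
    """Per-minute event counters; a dict stands in for the Redis keyspace."""

    def __init__(self):
        self.store = defaultdict(int)

    @staticmethod
    def key(metric, timestamp):
        # One bucket per metric per minute, e.g. "changes:23367348".
        return "%s:%d" % (metric, int(timestamp) // 60)

    def incr(self, metric, timestamp):
        bucket = self.key(metric, timestamp)
        self.store[bucket] += 1
        return self.store[bucket]

    def count(self, metric, timestamp):
        # .get() avoids creating empty buckets on read.
        return self.store.get(self.key(metric, timestamp), 0)


def has_dropped(counters, metric, timestamp, ratio=0.5):
    """True if the previous full minute fell below `ratio` of the one before it."""
    last = counters.count(metric, timestamp - 60)
    before = counters.count(metric, timestamp - 120)
    return before > 0 and last < before * ratio
```

A check along the lines of `has_dropped` is one way the "alert if the number of changes dispatched drops" idea could be expressed; whether it is worth the extra moving parts is exactly what the channel was debating.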
[08:51:04] i couldn't find any doc on eval.php for some reason
[08:51:17] yurik: there is no doc for eval.php afaik. Read the source :D
[08:52:06] are you trying to do a web request ?
[08:52:21] <_joe_> ori: yeah it is
[08:52:26] <_joe_> agreed :)
[08:53:02] _joe_: the job queue is a black pit of darkness, btw
[08:53:09] lots to instrument there
[08:53:14] you've already done some good work there
[08:53:16] <_joe_> yeah I kinda noticed
[08:53:20] <_joe_> :P
[08:53:35] <_joe_> ori: first I must give some love to the varnishes stats
[08:53:58] (CR) Alexandros Kosiaris: [C: 2] mediawiki::php: restore php5-igbinary; delete some unused files [operations/puppet] - https://gerrit.wikimedia.org/r/137470 (owner: Ori.livneh)
[08:54:02] <_joe_> which is probably what I'm going to do staring today
[08:54:37] paravoid: ping
[08:55:37] akosiaris: thanks!
[08:56:13] ori: thank you as well.
[08:56:23] * akosiaris still baffled by ms100X
[08:57:48] <_joe_> akosiaris: eheh ghosts form the past
[08:58:49] for a moment I was like, heh swift
[08:59:09] guess what, swift is not installed
[08:59:26] but for some reason 24TB of spaces are used by thumbs
[08:59:28] meh...
[09:00:01] Is there some trick that I’m missing for making Ubuntu 14.04 work on a Dell 1950?
[09:01:04] <_joe_> preilly: a poweredge 1950 seems quite old, maybe some drivers are not in the standard ubuntu kernel anymore?
[09:01:14] <_joe_> (I know this is very generic)
[09:01:23] 2006 machines ?
[09:01:39] _joe_: well the installer works just fine just doesn’t find disk after reboot
[09:01:44] 2007 model
[09:02:17] <_joe_> preilly: as I said above, you're probably missing some module from the kernel
[09:02:57] probably initramfs
[09:03:23] well I’m loading megaraid_sas in initramfs
[09:04:36] <_joe_> maybe support for that generation of percs has been moved to another module
[09:05:03] PROBLEM - DPKG on vanadium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:06:03] RECOVERY - DPKG on vanadium is OK: All packages OK
[09:15:05] (CR) Hashar: "_joe_ the beta cluster and integration labs projects have their own puppetmaster so we can probably attempt the migration to puppet 3 on b" [operations/puppet] - https://gerrit.wikimedia.org/r/137898 (owner: Giuseppe Lavagetto)
[09:15:53] <_joe_> hashar: at this point, it's really up to labs people
[09:16:06] <_joe_> I know too little about labs to make any kind of call
[09:16:53] <_joe_> althoug it seems much more time given how much I'm annoying, I'm just here since two months :)
[09:16:57] I know to little about puppet 2.7 -> 3.0 migration
[09:17:08] <_joe_> hashar: should be seamless
[09:17:29] <_joe_> it was seamless (mostly) in all cases where we tried
[09:17:46] \O/
[09:17:58] I am going to break the integration labs project so !
[09:18:03] s/break/upgrade/
[09:18:05] <_joe_> it could still break things for individual machines
[09:18:11] <_joe_> on friday?
[09:18:12] <_joe_> mmmh
[09:18:26] it is technically still thursday in hawai
[09:18:32] so I can pretend I was working from there
[09:18:33] <_joe_> You did not had plans for this weekend?
[09:18:45] <_joe_> :P
[09:18:54] beside working and the usual on call ? Not much planned :-D
[09:18:59] it is ok, it just puppet!
[09:21:15] speaking of puppet!
I have also https://gerrit.wikimedia.org/r/#/c/136128/ up for review if somebody wants to take a stab :) (mini-dinstall for releases.wikimedia.org)
[09:25:25] <_joe_> godog: I do have some time now
[09:25:32] <_joe_> I can take a look
[09:25:38] thanks _joe_ !
[09:29:09] <_joe_> godog: the first question that comes to mind is: why not reprepro?
[09:29:15] <_joe_> :)
[09:29:30] <_joe_> oh sorry this is the build pipeline
[09:30:19] the main reason is that reprepro doesn't support multiple versions in a suite, however that was a requirement gwicke mentioned
[09:32:13] <_joe_> so we're releasing mediawiki and the rest as debs and we want to have multiple versions in a single suite
[09:32:56] <_joe_> which is not exactly something apt will care about unless someone pins the version
[09:33:20] <_joe_> (another one of the uncountable reasons why debs are a bad way to distribute webapps)
[09:33:29] <_joe_> but still, I do see the point
[09:33:33] <_joe_> kind of
[09:34:01] <_joe_> more software to support/install, yay!
[09:39:49] _joe_: will do puppet 3 migration later on. My instances puppet are broken cause of some dupe definition i have to figure out :D
[09:42:25] <_joe_> :) ok
[10:00:01] (CR) Giuseppe Lavagetto: [C: -1] "Some small comments, apart from that LGTM."
(033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136128 (owner: 10Filippo Giunchedi) [10:34:03] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 07:33:22 UTC [10:35:09] (03PS1) 10Hashar: mediawiki::packages learned to disable php-apc [operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 [10:41:03] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC [10:44:58] (03PS2) 10Hashar: mediawiki::packages learned to disable php-apc [operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 [10:44:59] (03PS1) 10Hashar: Qualify $hostname in network manifest [operations/puppet] - 10https://gerrit.wikimedia.org/r/137911 [10:46:41] _joe_: puppet catalog compiler gave me an error: cp: cannot create regular file `/usr/local/bin/naggen': Permission denied :D [10:46:43] https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/59/console [10:46:56] <_joe_> yeah discard that [10:47:08] <_joe_> it means nothing [10:47:11] <_joe_> :) [10:47:25] :D [10:49:27] (03CR) 10Hashar: "I ran it via the puppet catalog compiler against node mw1042 but that is still failing:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137911 (owner: 10Hashar) [10:49:51] _joe_: and apparently neither $hostname nor $::hostname is set in puppet http://puppet-compiler.wmflabs.org/change/137911/compiled/puppet_catalogs_2.7_137911/mw1042.warnings [10:49:53] PROBLEM - Disk space on analytics1015 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/f 74307 MB (3% inode=99%): [10:50:36] neither $hostname or $::hostname seems to be set :/ [10:50:37] <_joe_> hashar: probably missing some facts files [10:51:43] (03PS3) 10Hashar: mediawiki::packages learned to disable php-apc [operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 [11:03:33] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Fri Jun 6 11:03:25 UTC 2014 [11:30:54] 
PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:32:54] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:34:54] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:36:54] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:38:54] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:41:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:43:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:45:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:47:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:49:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:51:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:53:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:55:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:57:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:27:58 UTC [11:57:22] PROBLEM - Disk space on analytics1010 is CRITICAL: DISK CRITICAL - free space: / 10 MB (0% inode=92%): [11:59:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 
06 Jun 2014 11:27:58 UTC [11:59:02] RECOVERY - Puppet freshness on analytics1024 is OK: puppet ran at Fri Jun 6 11:58:56 UTC 2014 [11:59:22] RECOVERY - Disk space on analytics1010 is OK: DISK OK [12:01:02] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 11:58:56 UTC [12:04:48] (03PS2) 10BBlack: netmapper URL based on realm (prod/labs) [operations/puppet] - 10https://gerrit.wikimedia.org/r/137856 (owner: 10Yurik) [12:04:55] (03CR) 10BBlack: [C: 032 V: 032] netmapper URL based on realm (prod/labs) [operations/puppet] - 10https://gerrit.wikimedia.org/r/137856 (owner: 10Yurik) [12:09:02] (03PS1) 10BBlack: simplify zero update url, remove outdated $zero_realm [operations/puppet] - 10https://gerrit.wikimedia.org/r/137916 [12:10:45] (03CR) 10BBlack: [C: 032 V: 032] simplify zero update url, remove outdated $zero_realm [operations/puppet] - 10https://gerrit.wikimedia.org/r/137916 (owner: 10BBlack) [12:28:35] RECOVERY - Puppet freshness on analytics1024 is OK: puppet ran at Fri Jun 6 12:28:26 UTC 2014 [12:29:54] (03Abandoned) 10Hashar: Qualify $hostname in network manifest [operations/puppet] - 10https://gerrit.wikimedia.org/r/137911 (owner: 10Hashar) [12:30:50] (03PS1) 10BBlack: zerofetch: action name s/zeroconfig/zeroportal/ (per Yuri) [operations/puppet/varnish] - 10https://gerrit.wikimedia.org/r/137919 [12:31:26] (03CR) 10BBlack: [C: 032 V: 032] zerofetch: action name s/zeroconfig/zeroportal/ (per Yuri) [operations/puppet/varnish] - 10https://gerrit.wikimedia.org/r/137919 (owner: 10BBlack) [12:37:26] (03PS1) 10Hashar: contint: reduce duplication with mediawiki::packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 [12:40:44] (03PS4) 10Hashar: mediawiki::packages learned to disable php-apc [operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 [12:40:53] (03PS2) 10Hashar: contint: reduce duplication with mediawiki::packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 
[12:41:49] (03PS1) 10Giuseppe Lavagetto: puppetmaster-labs: make puppetsigner work with puppet 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137922 [12:41:52] (03PS1) 10BBlack: Bump modules/varnish to d6e27194 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137923 [12:42:15] (03CR) 10BBlack: [C: 032 V: 032] Bump modules/varnish to d6e27194 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137923 (owner: 10BBlack) [12:42:23] <_joe_> oh no! [12:42:31] <_joe_> I wanted to burn you to the merge :P [12:42:31] what? [12:42:46] <_joe_> paravoid: no nothing [12:42:51] <_joe_> I just have to rebase [12:43:00] <_joe_> because brandon has been faster than me [12:43:05] lol [12:43:19] (03PS2) 10Giuseppe Lavagetto: puppetmaster-labs: make puppetsigner work with puppet 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137922 [12:43:39] that's what you get for waiting on jenkins! :) [12:44:03] heheh [12:44:05] <_joe_> bblack: no I was actually wondering why they did this change [12:44:24] <_joe_> (drop aliases for puppetca and puppetd in 3.x) [12:44:39] <_joe_> I was concluding that they want us ops to suffer [12:44:51] <_joe_> that's the only LOGICAL explanation [12:45:02] churn = confusion, confusion = consulting $$ [12:45:38] <_joe_> bblack: so you agree :) [12:45:51] ahha [12:45:57] (03CR) 10Giuseppe Lavagetto: [C: 032] puppetmaster-labs: make puppetsigner work with puppet 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137922 (owner: 10Giuseppe Lavagetto) [12:48:26] <_joe_> I'm thinking I should go at the next puppet conference [12:48:36] I went to one [12:49:03] a bunch of us did, it was in SF during an allstaff or something [12:49:11] <_joe_> with a pair of pliers and a blow torch? 
[12:49:16] <_joe_> (cit) [12:51:12] <_joe_> oh my, it seems like the puppetsigner.py that is in puppet is not the file we use on virt1000 [12:52:04] <_joe_> paravoid: /usr/local/lib/instance-management/puppetsigner.py, do you have any idea where this comes from? [12:52:26] nioe [12:52:28] nope [12:52:55] <_joe_> ok so, for now I'll modify it by hand, when Coren is around I'll ask him [12:58:09] !log replacing raid controller db1020 [12:58:14] Logged the message, Master [13:08:50] _joe_: puppetsigner.py used to be in the mw extension OpenStackManager and should be in puppet now [13:09:21] <_joe_> hashar: unluckily, on virt1000 it's still used from an SVN copy of OpenStackManager [13:09:27] ahah [13:09:35] <_joe_> so maybe I change that [13:10:08] removed by https://gerrit.wikimedia.org/r/#/c/81151/ [13:10:17] now in puppet at ./modules/ldap/files/scripts/puppetsigner.py [13:10:34] <_joe_> yeah, well [13:10:45] <_joe_> nothing installs it so... [13:11:10] ah that is not helpful [13:11:31] _joe_: puppet has /usr/local/sbin/puppetsigner.py [13:11:33] different path [13:11:49] I guess the one in /usr/local/lib/instance-management can be phased out [13:12:59] <_joe_> hashar: that is a symlink, as defined in puppet [13:13:11] <_joe_> I'll fix that in a few [13:14:24] meanwhile on labs, I am migrating to puppet3 (hopefully) [13:15:39] * hashar throws consulting $$ at _joe_ for bash: /usr/sbin/puppetd: No such file or directory [13:19:27] <_joe_> hashar: ahah that is exactly the same problem as with puppetsigner [13:19:39] <_joe_> http://docs.puppetlabs.com/guides/tools.html#puppet-cert-or-puppetca [13:19:55] <_joe_> or, more properly, http://docs.puppetlabs.com/guides/tools.html#puppet-agent-or-puppetd [13:20:11] <_joe_> hashar: use the line you find in /etc/cron.d/puppet [13:20:35] <_joe_> puppet agent --onetime --no-daemonize --show_diff [13:20:45] <_joe_> if my memory serves me well [13:20:59] I went with puppet agent -tv [13:21:06] do you have any guide on wikitech 
about it ? [13:21:21] <_joe_> of course not [13:21:25] <_joe_> :) [13:21:39] <_joe_> should we copy that page from puppetlabs? [13:21:46] you are a good fit for our culture ( see bug 1 ), 10/10 will hire again [13:22:07] a puppet 3 overview short guide would be nice [13:22:59] ori: Krinkle|detached http://codepen.io/Krinkle/full/laucI is cool! :) I guess RCStream doesn't send rc tags? [13:24:55] <_joe_> hashar: ? [13:26:10] _joe_: forget it. Just need to update some commands on https://wikitech.wikimedia.org/wiki/Puppet [13:26:15] i.e. there is no more puppetca [13:26:54] (03CR) 10Hashar: [C: 031] "I cherry picked this patch on the labs project 'integration' since it has its own puppetmaster. Everything went fine." [operations/puppet] - 10https://gerrit.wikimedia.org/r/137898 (owner: 10Giuseppe Lavagetto) [13:29:37] (03CR) 10Andrew Bogott: "I'd like to sit on this for a few days and make sure that nothing interesting happens in beta." [operations/puppet] - 10https://gerrit.wikimedia.org/r/137898 (owner: 10Giuseppe Lavagetto) [13:30:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:32:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:34:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:36:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:37:28] cmjohnson1: i may have missed this, but am just double checking [13:37:38] ssds on those old solrs are checked out, and they are ready for install, ja?
[13:38:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:40:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:41:43] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC [13:42:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:44:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:46:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:48:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:50:32] (03PS6) 10Rush: dns for phabricator.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/137825 [13:50:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:28:08 UTC [13:51:43] RECOVERY - Puppet freshness on mw1202 is OK: puppet ran at Fri Jun 6 13:51:39 UTC 2014 [13:52:13] dunno what mw1202's problem was but as soon as I ran puppet manually it was ok, so that's fun [13:52:30] (03CR) 10Rush: [C: 032] dns for phabricator.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/137825 (owner: 10Rush) [13:53:00] <_joe_> chasemp: _happens_continuously_ [13:53:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:51:39 UTC [13:54:00] and back to goofy? 
[13:54:17] <_joe_> It's plainly stupid [13:55:22] !log Gerrit having some troubles: error: RPC failed; result=22, HTTP code = 503 (while cloning CirrusSearch ) [13:55:27] Logged the message, Master [13:55:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:51:39 UTC [13:55:45] hashar: same problem here, auth-dnsupdate has hung because of it I think [13:55:52] yeah Gerrit is in trouble [13:56:46] box itself seems ok, any ideas? [13:57:26] antimony, which hosts gitblit, has high cpu [13:57:29] guess that is related [13:57:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:51:39 UTC [13:57:50] I'm on it now with htop and it's not that high? [13:58:03] RECOVERY - Puppet freshness on mw1202 is OK: puppet ran at Fri Jun 6 13:57:53 UTC 2014 [13:58:05] <_joe_> hashar: gitblit suffers every thursday-friday [13:59:03] anyway Gerrit is on ytterbium.wikimedia.org [13:59:30] hmm [13:59:43] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 13:57:53 UTC [14:00:58] chasemp: I don't know that much about Gerrit though :( [14:01:05] maybe restart the process on ytterbium? [14:01:52] locked since 13:38:00 UTC apparently [14:02:09] http://paste.openstack.org/show/83116/ [14:02:27] restarted the gerrit process [14:02:27] qchris: if you are around, Gerrit is not happy :-( Some 503 are thrown when cloning [14:02:30] seems to be back now [14:02:56] that would break the replications but it is smart enough to catch up [14:03:38] there really wasn't any way around it I think [14:03:41] solved [14:03:46] thanks chase! [14:04:01] !log Gerrit back.
chase rebooted it :) [14:04:01] <_joe_> remember the SAL, chasemp [14:04:02] <_joe_> :) [14:04:06] Logged the message, Master [14:04:58] gah yes thank you hashar [14:05:14] yw :) [14:07:18] (03PS1) 10Giuseppe Lavagetto: puppetsigner: actually use the file in the repository [operations/puppet] - 10https://gerrit.wikimedia.org/r/137929 [14:08:37] (03CR) 10jenkins-bot: [V: 04-1] puppetsigner: actually use the file in the repository [operations/puppet] - 10https://gerrit.wikimedia.org/r/137929 (owner: 10Giuseppe Lavagetto) [14:09:55] <_joe_> oh f*** you jenkins, I'm just moving a file around [14:10:08] bblack, aagghhh, still getting 'No disk drive was detected.' [14:20:34] ottomata: that's odd [14:21:32] does dmesg give anything useful? [14:21:52] (I think you can get out of that "no disk drive" back to some manual installer-menu and launch a shell) [14:22:31] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] "if we want a pep8-compliant script, we must rewrite it." [operations/puppet] - 10https://gerrit.wikimedia.org/r/137929 (owner: 10Giuseppe Lavagetto) [14:22:54] k will check [14:24:23] bblack, looking at dmesg, any clues as to what to look for? [14:28:18] RECOVERY - Puppet freshness on mw1202 is OK: puppet ran at Fri Jun 6 14:28:12 UTC 2014 [14:29:44] ottomata: any indication of a storage card or devices? or errors talking to one? [14:30:02] still reading, just got to some stuff about megaraid and megasas... [14:30:05] lspci might be helpful to if they included it [14:30:16] *too [14:30:47] [ 1153.601628] megaraid_sas 0000:02:00.0: PCI INT A -> GSI 42 (level, low) -> IRQ 42 [14:30:47] [ 1153.601633] megaraid_sas 0000:02:00.0: setting latency timer to 64 [14:30:47] [ 1153.601723] megasas: FW now in Ready state [14:30:47] ... 
[14:30:47] [ 1153.622962] megasas:IOC Init cmd success [14:30:47] [ 1153.646982] megasas: INIT adapter done [14:30:48] [ 1153.719008] scsi0 : LSI SAS based MegaRAID driver [14:30:50] hmm [14:31:32] <_joe_> andrewbogott: the version of puppetsigner.py we had in the repository does not work, so, rewriting it from scratch so that it makes sense [14:31:35] I believe that is the correct driver for the Perc 710 [14:31:41] (megaraid_sas) [14:31:48] <_joe_> I *hate* half-baked changes [14:31:52] bblack, yeah, megaraid and megaraid_sas are options in the device menu [14:31:58] _joe_: 'does not work'? [14:32:03] should I just select megaraid_sas and continue install? [14:32:07] Surely the version that was actually running on virt1000 worked...? [14:32:11] try it and see I guess [14:32:15] k [14:32:22] you'd think if the driver came up in dmesg, it would've already done that [14:32:35] yeah [14:32:58] hashar: Sorry. Was in a meeting. I read that Gerrit is fine now again? [14:33:07] hm, megaraid_sas just hung a bit and then gave me the same device menu again [14:33:09] trying megaraid... [14:33:22] nope, same [14:33:30] maybe there's another step to do in that control-R bios after defining the VDs [14:33:41] I donno, it's a newer PERC card than the hosts I've done this on before [14:33:44] qchris: yes chase rebooted it. sorry [14:33:50] some kind of "initialize VD device" step [14:33:57] qchris: something got stuck, maybe the replication to the hosts [14:34:07] hashar: Looking through the log files ... [14:34:23] bblack, that ctrl-r menu is from the perc card? [14:34:24] hashar, chasemp: Thanks for fixing gerrit. [14:34:25] want to google.. [14:34:47] qchris: apparently locked around 13:38:00 UTC.
A partial show queue output http://paste.openstack.org/show/83116/ [14:35:38] ottomata: yes [14:35:57] ottomata: I'm reading up in the manual for the card, it sounds like you do have to explicitly initialize somewhere in there, after creating the VD [14:36:07] are you looking at the same manual I am [14:36:09] i just googled for perc [14:36:15] poweredge-rc-h310_User's%20Guide_en-us.pdf [14:36:15] ? [14:36:16] <_joe_> andrewbogott: as in, does not sign the certs [14:36:22] yes [14:36:28] With v3 you mean? [14:36:37] Anyway, I have no objection to you rewriting, as long as it's in python :) [14:37:09] * andrewbogott wants to fight about perl v python, it will distract from writing team reviews [14:37:09] (from lspci: 02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05) ) [14:37:19] (03PS1) 10Rush: chase settings [operations/puppet] - 10https://gerrit.wikimedia.org/r/137932 [14:37:20] <_joe_> andrewbogott: I love both [14:37:32] dang [14:37:34] (03CR) 10Rush: [C: 032 V: 032] chase settings [operations/puppet] - 10https://gerrit.wikimedia.org/r/137932 (owner: 10Rush) [14:37:41] <_joe_> "we play both kinds of music, country and western" [14:37:43] ah i have to press f2 to do this! [14:37:44] :) [14:37:48] ottomata: also, there seems to be something under the PD section of the bios to "convert a PD to non-raid" [14:37:55] yeah i'm reading that too [14:37:56] might be preferable to this "create a fake 1-disk raid" [14:37:57] we want non raid? [14:38:10] yeah all we're trying to do here is get the card the hell out of the way so we can access the disks [14:38:16] <_joe_> andrewbogott: the signer is disabled at the moment btw. [14:38:40] funny how much magic is involved in getting a disk controller to just serve disks :P [14:39:04] I like that you sent an email saying it was fixed right before disabling it :) [14:39:45] ottomata: did you sort out your F2 issues yet btw?
if not I can go stab at it [14:40:08] maybe, gonna try again [14:40:18] i just missed my ctrl-r opportunity... [14:40:18] :p [14:40:31] yeah there's like a 43ms window for that :) [14:40:40] i thought i got it too! [14:40:44] then all of a sudden..BOOTING [14:40:45] psshhh [14:41:16] the way it works on my screen is you see the network card message for Ctrl+S, then some junk spams the screen over the top of that and you can't read it, but that's your cue to hit ctrl+R [14:41:34] got it [14:42:23] ahh, ok! i disabled my mission control f2 key [14:42:26] and now esc 2 works as f2 [14:42:37] I bet fn-f2 does too [14:42:43] uhhh, haha [14:42:50] naw fn-f2 is brightness! [14:42:52] anyway uh [14:42:56] i don't have a convert to non-raid option! [14:43:10] force offline? [14:43:13] or led blinking [14:43:13] hmm [14:43:18] well you don't have any free PDs right now [14:43:20] maybe because we initialized them before? [14:43:21] right [14:43:24] you probably have to delete the VDs [14:44:07] hmm, ok deleted the disk groups [14:44:12] now i can only select [14:44:14] LED blinking [14:44:14] or [14:44:16] Make Global HS [14:44:27] guess that is hot spare [14:44:27] ghm [14:44:45] http://cl.ly/image/2d2q2v3H2Z06 [14:44:54] rebuild maybe? [14:44:59] nah [14:45:47] should I try the initialization stuff, rather than the convert to non raid stuff? [14:46:10] what does the PD status stuff say on the right with the F2 menu not hiding it? [14:46:18] (03PS1) 10coren: add -u option to specify forcible refreshes [operations/software] - 10https://gerrit.wikimedia.org/r/137934 [14:46:33] http://cl.ly/image/0b0I2R2t2B1H [14:47:03] I saw "Error" earlier, but it was "No Error" heh [14:47:15] (03CR) 10coren: [C: 032] "Reflects status quo." [operations/software] - 10https://gerrit.wikimedia.org/r/137934 (owner: 10coren) [14:47:20] I guess maybe go back and create 1-disk VDs again, and try explicitly doing init (fast init) afterwards?
[14:47:44] ok [14:48:35] (03PS1) 10Giuseppe Lavagetto: puppetsigner: only fetch unsigned certs [operations/puppet] - 10https://gerrit.wikimedia.org/r/137935 [14:48:55] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] puppetsigner: only fetch unsigned certs [operations/puppet] - 10https://gerrit.wikimedia.org/r/137935 (owner: 10Giuseppe Lavagetto) [14:48:56] apparently non-raid option is only on the H310 variant of the controller [14:50:09] hm ok, i think i fast-inited them [14:50:10] going to reboot [14:50:14] and try install [14:50:14] wait [14:50:16] ok [14:50:29] you might want to check in controller options and make sure the first VD is bootable, too [14:50:35] Select Bootable Device [14:50:51] ok, done [14:50:58] vd 0 [14:51:25] anything else? [14:51:32] not that I know of [14:51:37] ok [14:51:39] let's try! [14:51:51] I give your install attempt 27% odds of success now :) [14:51:55] haha [14:51:58] i feel the same way [14:51:59] hah [14:52:21] <_joe_> andrewbogott: solved, for now [14:53:42] _joe_: And you fixed the situation with puppet installing the script and such? (Probably that's in the backscroll...) [14:54:05] <_joe_> yes, I did [14:54:15] <_joe_> now the file is managed in puppet [14:54:18] bblack, around? lets create you an account on zero.wikimedia betalabs [14:54:38] (03CR) 10Ori.livneh: "1) I can't find commit 00ca778140" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 (owner: 10Hashar) [14:54:44] _joe_: right on, thank you! [14:54:45] (03CR) 10Ori.livneh: [C: 04-1] mediawiki::packages learned to disable php-apc [operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 (owner: 10Hashar) [14:56:27] (03PS5) 10Hashar: mediawiki::packages learned to disable php-apc [operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 [14:57:31] (03CR) 10Hashar: "Bah Gerrit is dumb and can't lookup a short commit :D git show would though." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 (owner: 10Hashar) [14:57:47] (03CR) 10Giuseppe Lavagetto: "I second Ori's comment. My preference goes to a file in conf.d" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 (owner: 10Hashar) [14:59:26] (03Abandoned) 10Ori.livneh: Disenroll rcstream from Ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/136820 (owner: 10Ori.livneh) [14:59:30] <_joe_> ori: I very ingloriously ended up fixing issues with puppet 3 master [14:59:58] <_joe_> and also, don't abandon that change, I'm building the diamond package for trusty :) [15:02:25] _joe_: ok, i'll unabandon it [15:06:16] (03PS1) 10coren: Labs: new replication views [operations/software] - 10https://gerrit.wikimedia.org/r/137938 (https://bugzilla.wikimedia.org/61300) [15:07:21] _joe_: thanks! may want to make sure the diamond package has revision 2 in full package name to match precise [15:07:23] but really your call [15:07:28] chasemp: the admin module is the best thing since sliced bread [15:07:40] hey thanks [15:08:08] it's so nice to have a sane vim everywhere [15:08:52] ori, why aren't you asleep? ) [15:09:23] it's morning, i'm up [15:11:14] (03Abandoned) 10Hashar: mediawiki::packages learned to disable php-apc [operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 (owner: 10Hashar) [15:12:00] <_joe_> ori: you sleep less than me... [15:12:31] <_joe_> yeah I'll add my .emacs.d to that soon [15:12:52] <_joe_> making the puppet repo 20 MB larger in a single move :P [15:12:54] i feel ori is never asleep. Didn't you guys have a kid recently? [15:16:37] _joe_: ori never actually sleeps [15:17:21] ori sleeping? citation needed [15:17:59] ori never sleeps. Evil never sleeps. Therefore, by logical fallacy, evil has a new name!
:D [15:23:09] (03PS2) 10coren: Labs: new replication views [operations/software] - 10https://gerrit.wikimedia.org/r/137938 (https://bugzilla.wikimedia.org/61300) [15:23:21] (03PS3) 10Hashar: contint: reduce duplication with mediawiki::packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 [15:23:25] (03PS1) 10Filippo Giunchedi: update default_gateway.rb module to ruby 1.9 [operations/puppet] - 10https://gerrit.wikimedia.org/r/137940 [15:23:36] (03CR) 10Hashar: "Integrated in https://gerrit.wikimedia.org/r/#/c/137921/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137910 (owner: 10Hashar) [15:24:15] (03CR) 10Hashar: "Make it a bit simpler by including mediawiki::packages as is which brings php-apc. From a review on https://gerrit.wikimedia.org/r/#/c/13" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 (owner: 10Hashar) [15:24:49] (03CR) 10Dzahn: [C: 032] kiwix use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137394 (owner: 10Rush) [15:24:53] that last review should make facter happy with ruby 1.9 (i.e. trusty) [15:25:04] \o/ [15:25:33] grr, bblack, dunno what this install is doing, i reboot, and it seems to netboot, but nothing shows on console for a long time [15:25:41] i guess i'm going to just leave it for a while [15:25:54] \o/ indeed! I think there's light at the end of the tunnel [15:25:58] maybe its just taking a long time to install across the atlantic [15:26:00] dunno [15:26:14] <_joe_> godog: thanks man! [15:26:29] <_joe_> now we have a 100% functional puppet-3 enabled labs :) [15:26:55] ottomata: partman setting up partitions? [15:27:20] (03CR) 10Dzahn: "User[mirror]/groups: groups changed 'www-data' to 'systemusers,www-data'" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137394 (owner: 10Rush) [15:27:37] maybe? [15:27:40] hopefully! 
[15:28:37] (03PS4) 10Hashar: contint: reduce duplication with mediawiki::packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 [15:28:59] (03CR) 10Hashar: "Comments in PHP ini files should use ; instead of deprecated # :D" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 (owner: 10Hashar) [15:30:58] _joe_: andrewbogott does the puppet3 work mean that trusty images are fully usable without any caveats? [15:31:00] (03CR) 10Hashar: [C: 031] "Deployed on integration puppetmaster" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 (owner: 10Hashar) [15:31:22] YuviPanda: Almost [15:31:22] <_joe_> YuviPanda: almost [15:31:26] hah [15:31:34] <_joe_> wait for the change by godog to be merged I'd say [15:31:39] ah, of course. [15:31:51] should add some trusty exec nodes to toollabs next week or so, I guess [15:31:57] I think we need https://gerrit.wikimedia.org/r/#/c/137898/ also [15:32:02] which I'm not going to merge until next week [15:32:11] ottomata: going to try switching the limn user on stat1003 [15:32:19] (03CR) 10Giuseppe Lavagetto: [C: 031] "LGTM" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137940 (owner: 10Filippo Giunchedi) [15:32:32] probably also upgrading the trusty base image to have puppet3 [15:32:47] andrewbogott: cool! I'm probably not going to have any free time until end of next week, will remember to poke before doing anything [15:33:03] 'k [15:33:13] (03CR) 10Dzahn: [C: 032] limn use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137388 (owner: 10Rush) [15:33:47] <_joe_> godog: well once the other change (default to puppet 3 on labs) goes live that would not make much difference [15:35:04] mutante, ok [15:35:05] actually [15:35:13] hmmm, not sure we even need a limn user on stat1003 [15:36:42] (03CR) 10coren: [C: 032] "Deployed." 
[operations/software] - 10https://gerrit.wikimedia.org/r/137938 (https://bugzilla.wikimedia.org/61300) (owner: 10coren) [15:37:13] ottomata: hmm, oh, well, it worked and did not break [15:37:21] User[stats]/groups: groups changed 'systemusers' to 'systemusers,wikidev [15:37:24] btw [15:37:34] even though that wasn't related? [15:38:04] _joe_: true, not urgent in that case [15:38:57] <_joe_> godog, according to andrewbogott, we should make the switch next week [15:39:04] <_joe_> like, tuesday-ish [15:39:17] Yeah, I just want beta to be happy for a while. [15:39:29] Although it's very unlikely that anything will happen in a day that didn't happen in an hour [15:39:31] ok! [15:39:33] <_joe_> andrewbogott, so beta is on puppet3? [15:39:47] _joe_: I think it is already… hashar, that right? [15:39:49] ok phew, this machine is installing! [15:39:49] yay! [15:39:59] <_joe_> wow. [15:40:08] <_joe_> you guys are amazing. [15:40:24] ottomata: :) consider yourself lucky not having to write a new partman recipe:) [15:41:45] oh, i know [15:44:03] beta-hhvm is really broken [15:44:11] it's letting me download php scripts instead of running them [15:44:34] (03PS1) 10Ori.livneh: own dotfiles: don't set push.default=simple in .gitconfig (unsupported) [operations/puppet] - 10https://gerrit.wikimedia.org/r/137943 [15:45:10] i feel silly asking, but could someone +1 that ^ [15:45:12] _joe_: andrewbogott I have no idea what beta cluster is using.
I haven't changed it this afternoon [15:45:14] (03CR) 10Dzahn: [C: 032] librenms use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137389 (owner: 10Rush) [15:45:30] hashar: oh… ok, let me catch up then [15:45:36] _joe_: andrewbogott can switch it on monday though [15:45:45] I just migrate the 'integration' labs project [15:45:50] migrated [15:47:03] (03CR) 10Dzahn: "User[librenms]/groups: groups changed '' to 'systemusers'" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137389 (owner: 10Rush) [15:47:33] hashar: oh, ok, that's my confusion then [15:47:46] hashar, so integration uses puppet3 client and master both? [15:48:04] (03CR) 10Dzahn: [C: 032] tcpircbot use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137379 (owner: 10Rush) [15:48:07] andrewbogott: should be :) [15:48:18] integration-puppetmaster:~$ puppet --version [15:48:18] 3.4.3 [15:48:23] do you have a minute to make sure that's true, and verify that things are working? [15:48:41] yeah I ran puppet agent -tv on all the instances and they work fine [15:48:56] you can give it a try the master is integration-puppetmaster.eqiad.wmflabs [15:49:12] have to rush out to grab my daughter back from the nanny but will be back later this evening [15:49:20] (03CR) 10Ori.livneh: [C: 031] contint: reduce duplication with mediawiki::packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 (owner: 10Hashar) [15:50:17] andrewbogott: overall that went flawlessly [15:50:22] great. [15:50:41] So, ok, in that case… _joe_, hashar, lets upgrade beta to 3 on Monday and then if that goes well roll out everywhere. [15:50:51] and I can build new images shortly thereafter. 
[15:51:18] (03CR) 10Dzahn: [C: 032] icinga use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137397 (owner: 10Rush) [15:51:27] andrewbogott: sounds good to me [15:51:36] will be back in 4 hours or so [15:51:36] I'll send a warning email to labs-l [15:51:41] yeah even better [15:51:44] <_joe_> well, leaving for the weekend [15:51:48] I am out to get my daughter back home [15:51:52] _joe_: enjoy! [15:51:59] <_joe_> have a good weekend everyone! [16:06:57] (03PS1) 10Ori.livneh: role::mediawiki::webserver: set maxclients to 100, dissolve bits role [operations/puppet] - 10https://gerrit.wikimedia.org/r/137947 [16:07:45] (03PS1) 10Dzahn: icinga - fix dependency cycle for system user [operations/puppet] - 10https://gerrit.wikimedia.org/r/137949 [16:08:46] (03PS1) 10Dzahn: Revert "Revert "ocg use generic::systemuser"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137950 [16:09:18] (03CR) 10Dzahn: [C: 032] icinga - fix dependency cycle for system user [operations/puppet] - 10https://gerrit.wikimedia.org/r/137949 (owner: 10Dzahn) [16:12:21] (03CR) 10Dzahn: "Systemuser[tcpircbot]/User[tcpircbot]/groups: groups changed '' to 'systemusers'" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137379 (owner: 10Rush) [16:13:36] (03CR) 10Dzahn: "User[icinga]/groups: groups changed 'dialout,nagios,icinga' to 'dialout,icinga,nagios,systemusers'" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137397 (owner: 10Rush) [16:15:14] (03PS2) 10Dzahn: ocg use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137950 [16:18:19] (03CR) 10Dzahn: [C: 032] ocg use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137950 (owner: 10Dzahn) [16:20:03] (03CR) 10Dzahn: "User[ocg]/groups: groups changed '' to 'systemusers'" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137950 (owner: 10Dzahn) [16:25:21] (03CR) 10Dzahn: [C: 031] own dotfiles: don't set push.default=simple in .gitconfig (unsupported) 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/137943 (owner: 10Ori.livneh) [16:25:48] PROBLEM - Disk space on analytics1015 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/f 90992 MB (4% inode=99%): /var/lib/hadoop/data/j 73601 MB (3% inode=99%): [16:26:38] ottomata: heya ^ [16:26:52] heh [16:27:06] i have a script to make that not happen soon! also more nodes coming soon! [16:27:20] 'k :) [16:27:33] Oliver asked me to not delete some things this week, because he was still runnign a query [16:27:37] probably ok now, will ask him when he comes online [16:34:23] YuviPanda: Yeah, no changetags [16:34:37] YuviPanda: Because changetags are fundamentally fucked (technical term) there is no way to get them in real-time [16:34:46] Krinkle: ah, hmm. [16:34:53] They are, by utter stupid design, added after the fact. [16:35:05] Krinkle: oh wow, I thought the hooks were called before save and not after? [16:35:26] I'm not going to spoil it, look up an example of VE or AbuseFilter where they are used. [16:35:40] Create the change, then stuff some data into the ChangeTags table for that rc_id. [16:36:10] While that on itself could allow for it to be available, it is not. [16:41:30] (03PS3) 10Dzahn: ipython use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137391 (owner: 10Rush) [16:42:28] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC [16:42:57] paravoid: ping [16:43:35] Krinkle: wow. just realized. [16:43:36] ugh [16:43:41] (03CR) 10Dzahn: [C: 032] ipython use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137391 (owner: 10Rush) [16:43:48] * preilly notes that it’s 7:43 PM Friday, June 6, 2014 (EEST) [16:44:13] mutante: we have an ipython module?! 
[16:44:15] * YuviPanda checks [16:44:32] mutante: do you have any suggestions to getting Ubuntu 14.04 working on a Dell PowerEdge 1950 [16:44:35] YuviPanda: yes, Ori seems to be the author [16:44:42] :D [16:45:01] mutante: can’t see the drives after reboot from liveCD installation [16:45:40] preilly: sorry, haven't really installed the 14.04 [16:45:53] mutante: okay no worries [16:46:19] Oh and for anybody that cares I have 400 Dell PowerEdge 1950’s for sale for $50.00 a piece [16:46:48] (03CR) 10Dzahn: "User[ipython]/groups: groups changed '' to 'systemusers'" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137391 (owner: 10Rush) [16:50:29] YuviPanda: However it could be done using a similar hack as how patrollers are able to process when some edit is marked as patrolled [16:50:33] if anybody wants to have a server to play around with that is [16:50:35] (or a page is deleted) [16:50:38] YuviPanda: Another event [16:50:48] We could add a hidden RC_ type that "adds" change tags [16:51:02] They'll come async, and you can't anticipate them. [16:51:08] But that's only accurate.. [16:51:15] they really can be added/removed at any time [16:52:24] hmm, poweredge 1950's. i suppose they could make good space heaters ;) [16:52:39] ebernhardson: ha ha [16:53:15] ebernhardson: well for $50.00 USD it seemed like a decent box for playing around 2 quad cores and 32 GB ram [16:53:18] Krinkle: I remember an RfC floating around a long time ago to be able to fix it [16:53:36] yes tons of discussion with no outcome [16:53:47] triggered by the alleged oversizeness of the logging table [16:57:07] preilly: I'd buy a couple but it's the shipping that kills. :-) [16:57:20] Coren: yeah where are you based? [16:57:43] preilly: Canada. 
[16:57:59] Yeah that shipping would stink [16:58:12] bug bounty, users who resolve one of the X oldest bugs get a free Dell server :) [16:58:24] ha ha yeah I’m down for that [16:58:55] They’re just sitting on the floor at 365 Main right now [16:59:29] ottomata: are you re-installing elastics? [17:01:12] cmjohnson1: not yet, but will soon [17:01:43] oh....they're installed but looping..I need to fix bios [17:03:09] k [17:03:13] eating lunch a nyway.. [17:12:28] _joe_: have you tried creating a self-hosted instance with puppet 2 and then switching it to 3? [17:12:49] <_joe_> andrewbogott: I did and it was flawless [17:12:53] great. [17:12:57] <_joe_> and I think godog did too [17:13:11] <_joe_> but try for yourself [17:13:24] <_joe_> upgrade your whole swift labs cluster to puppet 3 :P [17:30:00] (03PS2) 10Dzahn: rcstream use generic::systemusers [operations/puppet] - 10https://gerrit.wikimedia.org/r/137382 (owner: 10Rush) [17:31:59] (03PS1) 10Rush: phabricator trial [operations/puppet] - 10https://gerrit.wikimedia.org/r/137956 [17:32:01] (03CR) 10Faidon Liambotis: [C: 04-2] "Why? It's fine as it is and in fact I prefer the current way rather than generic::systemuser." [operations/puppet] - 10https://gerrit.wikimedia.org/r/137382 (owner: 10Rush) [17:32:27] (03CR) 10Faidon Liambotis: [C: 04-2] "Why?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137383 (owner: 10Rush) [17:32:44] oh wow this is a larger patch series [17:32:49] chasemp: why are you doing that? [17:32:55] (03PS2) 10Rush: phabricator trial [operations/puppet] - 10https://gerrit.wikimedia.org/r/137956 [17:34:04] I actually think we should go the other way around and kill generic:: entirely [17:35:21] paravoid: so we had talked about system users needing to be in a supplementary group if they are greater than uid 500 to avoid removal by enforce-user-and-groups [17:35:42] this is basically funneling all service users through that in order to do so [17:35:58] ? 
[17:36:14] I don't remember this conversation [17:37:26] why would system users have a uid > 500? [17:37:33] was months ago and at Athens, doesn't matter, want to talk about what you want to do? [17:37:47] gmetric does for example [17:38:01] gmetric is not a system user, and that's a bug [17:38:08] what is it? [17:38:17] the systems are cluttered with /home/gmetric too for example.. [17:39:14] $ grep SYSTEM /etc/adduser.conf [17:39:16] FIRST_SYSTEM_UID=100 [17:39:16] LAST_SYSTEM_UID=999 [17:39:24] 999, meh [17:39:34] and [17:39:37] FIRST_SYSTEM_GID=100 [17:39:37] LAST_SYSTEM_GID=999 [17:39:56] these should match our numbering [17:40:03] user { system => true } would do the right thing then [17:41:59] too much to type out, would rather just have a quick convo, in essence it wasn't set on all things I would say are "service" or "system" users and the cleanup logic in the admin module will kill anything above 500 or not in a supplementary group so the sanest thing seemed to be to funnel them through a common defined type that will ensure system => true and/or add accounts to a systemusers group [17:42:44] no, a supplementary group for system users is conceptually wrong [17:42:48] (03CR) 10Dzahn: [C: 031] Tools: Remove faulty exim configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/127481 (owner: 10Tim Landscheidt) [17:43:05] and, additionally, using an abstraction to add system users is error-prone and ugly too [17:43:19] I didn't create it :) [17:43:20] greg-g et al, can we update/redirect https://wikitech.wikimedia.org/wiki/Bits.wikimedia.org as appropriate? 
[17:43:21] finally, there are system users that get added by packages, not puppet [17:43:42] I would assume package users are all in the correct system range, or that was my assumption [17:43:56] we are duplicating some users I believe that packages create in puppet [17:44:00] so having system users added by puppet & packages be treated equally is more correct [17:44:25] right, the users added by packages would be in the correct range, but would not be in the supplementary group [17:44:36] andrewbogott: you did the bits conversion, right? see ^^ [17:44:40] which is fine from this standpoint [17:44:47] I mean safety wise for cleanup [17:44:50] so having a suppl group "systemusers" that really means "system users that we added via puppet and screwed up by forgetting system => true" is wrong :) [17:44:58] greg-g: bits? I don't think that was me. [17:44:58] it doesn't mean that tho [17:45:29] how it's setup now it would mean any service user added via puppet, I hit all of them to standardize [17:45:38] I frankly don't care, just want it to be a. safe b. consistent [17:45:42] and it was none of those things [17:45:51] greg-g, https://gerrit.wikimedia.org/r/#/c/136317/ filippo [17:46:06] if you would rather reverse and clean out the generic::systemuser entirely that's a convo [17:46:12] Eloquence: yep! [17:46:14] but I went what seemed to be the existing standard [17:46:28] yeah, sorry about that confusion [17:46:32] finally! cp3018 has booted! [17:46:34] i twas in a loop too! 
[17:46:38] we haven't been using generic::systemuser for new stuff generally [17:46:58] that's part of the problem with your change though [17:47:07] the other part is the suppl group which I dislike [17:47:20] and the third part, is not renumbering system users to the 1-999 range [17:47:32] as long as users are correctly within uid boundaries it's not a thing at all to me or to cleanup [17:48:04] well we had agreed in 0-500 [17:48:05] that bits.wikimedia.org wikitech page is weird ;) [17:48:14] the real solution would be to renumber those users that are in the wrong-range (i.e. the system users that were created as regular users) [17:48:29] Eloquence {{done}} [17:48:35] I don't think peter ever had anything to do with bits.wikimedia.org ;) [17:48:53] I removed that :) [17:48:57] haha, hmmmm [17:49:00] this is new for a new install for me: [17:49:03] root@cp3018:~# puppetd --test [17:49:03] The program 'puppetd' is currently not installed. You can install it by typing: [17:49:03] apt-get install puppet [17:49:08] what did it install!? [17:49:09] hmm [17:49:22] paravoid: what's ugly and error prone about an abstraction of creating system users? [17:50:13] it's a thin layer on top of an adequate existing abstraction for creating system users [17:50:33] ottomata: it's like me when I used to apt-get install epiphany but really want epiphany-browser (not some random game) [17:50:41] what ori said [17:50:49] a thin layer that does magic too [17:50:55] $always_groups=['systemusers'], [17:50:55] OH [17:50:56] puppet 3! [17:51:02] interesting! 
[17:51:10] ottomata: that's a bug :) [17:51:25] ottomata: it's supposed to be pinned, but I assume pinning was added in puppet but not the installer [17:51:32] pinned to 2.7 [17:51:32] ha [17:51:43] paravoid look at it from my perspective, I want to do user cleanup which is going to be a lot, lot of people and there are anomalies like gmetric and inconsistent use of user { 'foo': } and it's the wild west [17:51:45] welp, i suppose this is as good a place as any to test puppet 3 in production :p [17:51:45] haha [17:51:45] puppet3 it won't work for you, as the master hasn't been upgraded yet [17:51:49] AHHH [17:51:50] right [17:51:50] so I'm trying to do what is safest [17:51:51] only in labs [17:51:59] i think it was fine until people extended it to become basically the same as 'user' ;) [17:52:05] ok, i'll remove and just specify version.. [17:52:19] plus abstractions that I guess exist that no one wants used but are in use still with no notes about it [17:52:23] so it's frustrating and confusing [17:52:28] chasemp: do you have a list of puppet-provisioned users that are not admin accounts but have no system => true set? [17:52:40] chasemp: some of the anomalies are deliberate [17:52:52] greg-g, thanks :) [17:53:00] ottomata: elastics101[7-9] are installed. I did not add puppet certs/salt etc. 
I assume you will want to do that when you're ready [17:53:05] oh, ok awesome, thanks [17:53:06] * ori will bbiab [17:53:18] hopefully the partman did the right thing [17:53:20] it probably did [17:53:23] paravoid: I don't but there all there in that topic branch, i'm sure I could come up with a list if that's what we want to do [17:53:27] it should have [17:53:41] ori: i don't understand deliberate anomalies comment [17:53:52] these are different than the other boxes..they're 2.5" 250GB disks [17:53:59] chasemp: no, rcstream had system => true, so that list is at least a superset of what I asked :) [17:54:11] sure I didn't mean 1:1 just contained within [17:54:16] hm, we have an older version of puppet in our own apt [17:54:20] should we remove that? [17:54:32] I guess it's immaterial, what should the end result be? that's all that matters [17:54:33] ottomata: no, you should *use* that [17:54:47] all instances os generic::systemuser should be gone? [17:55:04] ottomata: the others are 2 300GB disk (3.5 disk). Shouldn't be an issue [17:55:07] there shouldn't be any users defined in puppet now who _aren't_ service users [17:55:25] hm, paravoid, i don't think other hosts are using that [17:55:40] palladium for instance [17:55:46] Version: 2.7.11-1ubuntu2.7 [17:55:58] our apt is http://apt.wikimedia.org/wikimedia/pool/universe/p/puppet/puppet_2.7.6%2b2.7.7rc2-1wm1.dscm1 [17:55:59] ack [17:56:00] link [17:56:06] 2.7.6+2.7.7rc2-1wm1 [17:56:26] chasemp: so, from https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+branch:production+topic:systemuser,n,z [17:56:35] (welcome to ops chat where you get to have 3 conversations at once!) 
[17:56:55] chasemp: I only see one: "mysql" [17:57:12] all the others are system => true (with the exception of nova, that is just changing a preexisting system user) [17:57:49] I don't know what you are saying, I'm saying all system users should be the same, since generic::systemuser existed I was making them the same under that container [17:58:01] if you don't want generic::systemuser at all then ok [17:58:19] I'm asking which users are actually problematic for your cleanup script [17:59:05] yes ok, but my efforts were to standardize so I could make sense of service accounts so that I could ensure they would be? [17:59:37] the fact that service users are not defined the same everywhere is the problem I was trying to fix [17:59:48] the outcome of that is I would ensure they wouldn't be nuked on account cleanup [17:59:54] but it's two overlapping problems [18:00:04] (03Abandoned) 10Ottomata: solr100[1-3] are now elastic101[7-9] [operations/dns] - 10https://gerrit.wikimedia.org/r/137720 (owner: 10Ottomata) [18:00:18] manybubbles: https://gerrit.wikimedia.org/r/#/c/137721/ [18:00:58] PROBLEM - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 2313 MB (3% inode=89%): [18:05:52] (03CR) 10Ottomata: [C: 031] Setting up solr100[1-3] as elastic101[7-9] [operations/puppet] - 10https://gerrit.wikimedia.org/r/137721 (owner: 10Ottomata) [18:08:05] (03PS2) 10Ottomata: Setting up solr100[1-3] as elastic101[7-9] [operations/puppet] - 10https://gerrit.wikimedia.org/r/137721 [18:08:47] manybubbles: is that ok to merge and to install? [18:08:56] that would add those nodes into the elasticsearch cluster [18:11:34] ACKNOWLEDGEMENT - Disk space on analytics1015 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/j 60252 MB (3% inode=99%): ottomata on it [18:12:11] that was the first time i was able to successfully ack something in icinga, woohoo! 
[18:14:49] (03PS1) 10Faidon Liambotis: Revert recent generic::systemuser conversions [operations/puppet] - 10https://gerrit.wikimedia.org/r/137963 [18:15:31] chasemp: ^^ [18:16:07] that's cool, tbh I wasn't overly fond of it but it was consistent at least [18:16:31] all of them had system => true, so it shouldn't be an issue for the cleanup script [18:17:00] mutante|away: ^ too, I saw you did some of the merges [18:17:08] are you arguing against standardizing system user definitions? I don't understand [18:17:30] (03CR) 10Faidon Liambotis: [C: 032] Revert recent generic::systemuser conversions [operations/puppet] - 10https://gerrit.wikimedia.org/r/137963 (owner: 10Faidon Liambotis) [18:18:14] chasemp: what do you mean? [18:18:21] my process was, I modified the cleanup script to dump a list of tobe-cleaned-ups, ran some reports, realized it was going to be a massive thing, saw service accounts, tried to make sense of what going on in puppet, and realized the best thing would be to do the right thing and fix service accounts across the board [18:18:38] do you have that list anywhere? [18:19:20] all of the ones I changed + the ones that were existing generic::systemuser children already, but I don't have a list outside of grepping [18:19:23] a user { 'foo': [...], system => true, } account shouldn't make your list, so the list of the service accounts that you found is probably underlying puppet bugs [18:19:49] well it did when I was changing them all to generic::systemuser [18:20:23] I'm not generating a list, I'm fixing them first so that none should show up on a list [18:20:32] or at least that is my intention [18:21:02] it seems like removing generic::systemuser and all things using it, and fixing the mysql user to user system => true [18:21:12] is what we want to do? [18:21:37] you made it sound like it was more than just mysql, no? 
[18:22:05] it is if you consider some being generic::systemuser and some not part of not being consistent [18:22:16] the idea that it's purely system => true is just a misunderstanding [18:22:19] I believe [18:22:55] you had an indication that the cleanup script was going to remove some service accounts [18:23:00] that's your end-goal, I believe, right? [18:23:09] you're killing me smalls [18:23:11] ok so [18:23:20] I want to standardize how we create service users [18:23:22] is that cool? [18:23:33] let's table the cleanup for a minute, it will be a natural byproduct [18:23:58] ok, that's fine [18:24:16] your thought is all should use system => true [18:24:24] and we should remove generic::systemuser stuff across the board? [18:24:26] yes [18:24:33] ok i can do that [18:24:39] and the uid and gid ranges you mentioned [18:24:46] (03CR) 10Manybubbles: [C: 031] Setting up solr100[1-3] as elastic101[7-9] [operations/puppet] - 10https://gerrit.wikimedia.org/r/137721 (owner: 10Ottomata) [18:24:55] what ranges? [18:24:57] that was 100 - 999 ? [18:24:58] the adduser.conf ranges? [18:25:02] that's what system => true does [18:25:06] cool, manybubbles then I shall proceed! [18:25:13] ok so that's outside of the bounds of the cleanup range we had merged [18:25:15] as long as this won't make the cluster explode (it won't, right?) 
:) [18:25:22] we merged logic to cleanup anything above 500 [18:25:26] seems like that is not what we want [18:25:34] so I was probably getting lots of bad info there [18:25:44] system => true passes --system to adduser, which in turn uses /etc/adduser.conf to pick an id [18:25:53] makes sense [18:26:01] so do we change adduser.conf to be 0 - 500 [18:26:10] or do we change the cleanup script to ignore above 999 instead of 500 [18:26:18] if we do the former, we'll have to go back and renumber all the users that were previously added [18:26:30] so, I think the latter should be easier [18:26:39] yes that makes sense to me [18:26:40] ok [18:27:08] both would be equally correct, the second is just the path of least resistance :) [18:27:12] (I think) [18:27:23] yup, good with me, this is already crazy enough [18:28:44] alright I think I get it, sorry for the confusion about accounts and system => true, I was not intentionally being confusing [18:28:59] shall I abandon all of https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+branch:production+topic:systemuser,n,z ? [18:29:16] sorry it took me so long to notice :/ [18:30:41] (03PS5) 10Faidon Liambotis: contint: reduce duplication with mediawiki::packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 (owner: 10Hashar) [18:30:43] those are all dead then yes, and I think mutante|away got 4 or 5 in already I will either revert or see if he has time, hung up on a weird ruby 1.8 vs 1.9 issue atm so may be Monday? 
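(Editor's sketch, not part of the log.) The option settled on above, keeping the adduser.conf system range and moving the cleanup cutoff from 500 to 999, could look roughly like the following. This is a minimal illustration under stated assumptions: the function names and the sample accounts are hypothetical, and the real cleanup logic in the admin module is not shown in this log.

```python
import re

def parse_system_uid_range(adduser_conf_text):
    """Return (FIRST_SYSTEM_UID, LAST_SYSTEM_UID) from adduser.conf-style text."""
    values = {}
    for line in adduser_conf_text.splitlines():
        match = re.match(r"\s*(FIRST_SYSTEM_UID|LAST_SYSTEM_UID)\s*=\s*(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values["FIRST_SYSTEM_UID"], values["LAST_SYSTEM_UID"]

def cleanup_candidates(passwd_text, cutoff_uid):
    """Names of accounts whose UID is strictly above cutoff_uid.

    Only these accounts would even be considered for removal; anything
    at or below the cutoff is treated as a system account and left alone.
    """
    names = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip malformed passwd lines
        if int(fields[2]) > cutoff_uid:
            names.append(fields[0])
    return names

# The two values quoted in the log from /etc/adduser.conf.
SAMPLE_ADDUSER_CONF = "FIRST_SYSTEM_UID=100\nLAST_SYSTEM_UID=999\n"

# Hypothetical passwd entries, invented purely for illustration.
SAMPLE_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "exampled:x:120:130::/var/lib/exampled:/bin/false\n"
    "misnumbered:x:600:600::/home/misnumbered:/bin/false\n"
    "alice:x:1500:1500::/home/alice:/bin/bash\n"
)

if __name__ == "__main__":
    first, last = parse_system_uid_range(SAMPLE_ADDUSER_CONF)
    print("system range:", first, last)
    print("cutoff 500:", cleanup_candidates(SAMPLE_PASSWD, 500))
    print("cutoff from adduser.conf:", cleanup_candidates(SAMPLE_PASSWD, last))
```

With a cutoff of 500 the hypothetical service account numbered 600 is swept up alongside the human account; reading the cutoff from LAST_SYSTEM_UID leaves it alone without renumbering anything.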
[18:30:46] (03CR) 10Faidon Liambotis: [C: 032] contint: reduce duplication with mediawiki::packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 (owner: 10Hashar) [18:31:19] chasemp: no, that was the revert above [18:31:23] I reverted all of the merged ones [18:31:40] ah I don't know for sure that won't have some odd effect I was going to do them piecemeal and make sure [18:53:36] (03CR) 10Ori.livneh: [C: 032] own dotfiles: don't set push.default=simple in .gitconfig (unsupported) [operations/puppet] - 10https://gerrit.wikimedia.org/r/137943 (owner: 10Ori.livneh) [18:56:25] rsyslog rsyslog rsyslog [18:56:26] (03CR) 10Ottomata: "So, that means I should proceed, right? Doing this will add the new nodes to the cluster. Everything should be fine if I do this, ja?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137721 (owner: 10Ottomata) [18:56:30] * ori whistles [18:57:08] brb... [19:02:43] (03Abandoned) 10Yurik: * INCOMPLETE * Enable ZeroBanner & ZeroPortal in production [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137457 (owner: 10Yurik) [19:21:12] (03CR) 10Nemo bis: Meta: automatic translation workflow state changes (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137804 (owner: 10Awight) [19:24:51] (03CR) 10Manybubbles: "The only funky thing will be that they are on a different version of java because I haven't upgraded the other nodes. It should be ok tho" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137721 (owner: 10Ottomata) [19:25:01] ottomata: do you want to do this now? [19:25:31] If not, on Monday morning I plan to do a rolling restart of Elasticsearch anyway to upgrade to 1.2.1 - oh! That'll requires your intervention for apt! 
[19:25:51] anyway, if you want to wait until the upgrade then the cluster will be normal - but adding the three extra nodes now should be ok too [19:26:06] (03PS2) 10Awight: Meta: automatic translation workflow state changes [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137804 [19:30:58] (03CR) 10GWicke: [C: 031] "@Giuseppe: Reprepo doesn't support multiple versions per package, which makes it impossible to revert to an earlier version. This matters " [operations/puppet] - 10https://gerrit.wikimedia.org/r/136128 (owner: 10Filippo Giunchedi) [19:31:30] manybubbles: sure, i can do for one of them [19:31:32] are you around? [19:31:38] ottomata: yeah [19:31:53] are they all partitioned and stuff? [19:33:17] should be, double checking.. [19:33:31] ja looks good [19:33:48] sweet go ahead and try one pleas1!] [19:34:00] slightly smaller disks that the others it loooks like [19:34:07] 402G vs 494G [19:34:30] (03CR) 10Ottomata: [C: 032 V: 032] Setting up solr100[1-3] as elastic101[7-9] [operations/puppet] - 10https://gerrit.wikimedia.org/r/137721 (owner: 10Ottomata) [19:34:41] (03PS1) 10Jforrester: Enable TemplateData GUI on Portuguese Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137986 (https://bugzilla.wikimedia.org/66255) [19:36:22] (03PS1) 10Jgreen: no-op formatting cleanup of role::otrs manifest [operations/puppet] - 10https://gerrit.wikimedia.org/r/137987 [19:36:57] ok, running puppet on elastic1017 [19:37:21] ottomata: did it get the git-deployed plugins? [19:37:28] is that something puppet does? I forget [19:37:46] hmmm, i don't think so, but you just reminded me that I need to get salt setup too [19:37:55] it might get them, but def not on the first puppet run [19:38:00] chicken/egg problem there [19:38:18] ottomata: ah, well, it can't join the cluster until it has them. it needs to stay out [19:38:40] hmmmmm [19:38:51] gonna be funkymaybe... 
[19:39:00] we should make that a requirement [19:39:02] salt working or something [19:39:03] in puppet [19:39:05] a dep [19:39:08] so the elastic install fails unless salt is working [19:40:12] (03CR) 10Jgreen: [C: 032 V: 031] no-op formatting cleanup of role::otrs manifest [operations/puppet] - 10https://gerrit.wikimedia.org/r/137987 (owner: 10Jgreen) [19:40:24] ottomata: ok, I'm not sure how to do that. [19:40:31] rather, I'm sure I could but it'd take me hours [19:40:59] ottomata: I can add the plugins being present as a requirement so elasticsearch starting [19:41:02] which might do it [19:41:15] I was thinking we should have that any way [19:41:29] and then we wouldn't have any trouble- elasticsearch could install no trouble [19:41:49] (03PS1) 10Rush: wikistats sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137988 [19:41:51] i think i can do it... [19:41:51] (03PS1) 10Rush: pmacct sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137989 [19:41:52] if that is the best thing then lets revert that change you merged, let me write that requirement on monday, then try again [19:41:53] (03PS1) 10Rush: planet sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137990 [19:41:55] (03PS1) 10Rush: modules/mysql_multi_instance/ sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137991 [19:41:57] (03PS1) 10Rush: jenkins sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137992 [19:41:59] (03PS1) 10Rush: deployment sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137993 [19:42:01] (03PS1) 10Rush: modules/coredb_mysql/ sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137994 [19:42:02] oh starting... 
[19:42:02] hmmm [19:42:03] (03PS1) 10Rush: bugzilla sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137995 [19:42:05] (03PS1) 10Rush: search sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137996 [19:42:07] (03PS1) 10Rush: parsoid sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137997 [19:42:09] (03PS1) 10Rush: otrs sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137998 [19:42:11] (03PS1) 10Rush: logging sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137999 [19:42:13] (03PS1) 10Rush: dataset sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138000 [19:42:15] (03PS1) 10Rush: install-server sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138001 [19:42:17] (03PS1) 10Rush: openstack sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138002 [19:42:18] that's tough because the deploy target and the main elasticsearch class are included at the same level [19:42:19] (03PS1) 10Rush: nfs sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138003 [19:42:21] (03PS1) 10Rush: statistics sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138004 [19:42:23] (03PS1) 10Rush: rancid sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138005 [19:42:25] (03PS1) 10Rush: icinga sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138006 [19:42:27] (03PS1) 10Rush: fundraising sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138007 [19:42:29] (03PS1) 10Rush: gerrit sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138008 [19:42:31] (03PS1) 10Rush: facilities sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138009 [19:42:32] i don't think we shoudl make a resource in the module itself depend [19:42:33] (03PS1) 10Rush: ganglia sans systemuser [operations/puppet] - 
10https://gerrit.wikimedia.org/r/138010 [19:42:33] AHHH [19:42:35] (03PS1) 10Rush: removing systemuser definition [operations/puppet] - 10https://gerrit.wikimedia.org/r/138011 [19:42:37] (03Abandoned) 10Rush: webperf use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137377 (owner: 10Rush) [19:42:39] (03Abandoned) 10Rush: txstatsd use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137378 (owner: 10Rush) [19:42:39] i will wait... [19:42:40] :p [19:42:42] (03Abandoned) 10Rush: spamassassin use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137380 (owner: 10Rush) [19:42:47] (03Abandoned) 10Rush: fundraiding user add to systemusers [operations/puppet] - 10https://gerrit.wikimedia.org/r/137381 (owner: 10Rush) [19:42:51] (03Abandoned) 10Rush: rcstream use generic::systemusers [operations/puppet] - 10https://gerrit.wikimedia.org/r/137382 (owner: 10Rush) [19:42:54] (03Abandoned) 10Rush: puppetmaster use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137383 (owner: 10Rush) [19:42:57] ottomata: sorry! 
thanks [19:42:57] (03Abandoned) 10Rush: mysql_wmf use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137385 (owner: 10Rush) [19:42:59] (03Abandoned) 10Rush: mwprof use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137386 (owner: 10Rush) [19:43:01] and, good news manybubbles, puppet will deploy plugins on its first run [19:43:03] :) [19:43:05] (03Abandoned) 10Rush: mediawiki use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137387 (owner: 10Rush) [19:43:07] once salt is working [19:43:09] nice [19:43:10] (03Abandoned) 10Rush: jenkins user add to systemusers [operations/puppet] - 10https://gerrit.wikimedia.org/r/137390 (owner: 10Rush) [19:43:15] (03Abandoned) 10Rush: eventlogging use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137393 (owner: 10Rush) [19:43:16] but making salt work is a manual step (have to sign key) [19:43:20] [2014-06-06 19:41:47,122][INFO ][cluster.service ] [elastic1002] added {[elastic1017][4-0VUDgxS8W_4v2kZ4UQ6w][elastic1017][inet[/10.64.48.39:9300]]{rack=D3, row=D, master=false},}, reason: zen-disco-receive(join from node[[elastic1017][4-0VUDgxS8W_4v2kZ4UQ6w][elastic1017][inet[/10.64.48.39:9300]]{rack=D3, row=D, master=false}]) [19:43:22] (03Abandoned) 10Rush: authdns use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137395 (owner: 10Rush) [19:43:23] so, puppet will have to be run twice [19:43:28] (03Abandoned) 10Rush: nova use generic::systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137396 (owner: 10Rush) [19:43:28] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC [19:43:37] i think we can make the class include depend on the deploy target [19:44:10] ottomata: ok [19:44:50] ottomata: looks like it didn't get the plugins in time - I'm going to fix it manually [19:45:08] ok.. 
[19:45:39] (03CR) 10jenkins-bot: [V: 04-1] bugzilla sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137995 (owner: 10Rush) [19:45:51] ah you are right manybubbles [19:45:56] i'm adding a dependency now [19:45:57] (03CR) 10jenkins-bot: [V: 04-1] search sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137996 (owner: 10Rush) [19:46:03] ottomata: sweet [19:46:03] hopefully it will help for the next one [19:46:13] It was pretty simple for this one - just had to bounce elasticsearch [19:46:15] (03CR) 10jenkins-bot: [V: 04-1] parsoid sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137997 (owner: 10Rush) [19:46:17] (03CR) 10jenkins-bot: [V: 04-1] otrs sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137998 (owner: 10Rush) [19:46:30] it tried to take some shards but failed to get them because it didn't have the plugin [19:46:40] (03CR) 10jenkins-bot: [V: 04-1] logging sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137999 (owner: 10Rush) [19:46:57] (03CR) 10jenkins-bot: [V: 04-1] dataset sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138000 (owner: 10Rush) [19:47:14] (03CR) 10jenkins-bot: [V: 04-1] install-server sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138001 (owner: 10Rush) [19:47:33] (03PS1) 10Ottomata: Add dependency for elasticsearch on deployment target for elasticsearchplugins [operations/puppet] - 10https://gerrit.wikimedia.org/r/138012 [19:47:35] (03CR) 10jenkins-bot: [V: 04-1] openstack sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138002 (owner: 10Rush) [19:47:52] (03CR) 10jenkins-bot: [V: 04-1] nfs sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138003 (owner: 10Rush) [19:47:57] i think that will work [19:47:59] not 100% sure [19:48:09] (03CR) 10jenkins-bot: [V: 04-1] statistics sans systemuser [operations/puppet] - 
10https://gerrit.wikimedia.org/r/138004 (owner: 10Rush) [19:48:10] manybubbles: https://gerrit.wikimedia.org/r/#/c/138012 [19:48:20] (03CR) 10jenkins-bot: [V: 04-1] rancid sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138005 (owner: 10Rush) [19:48:37] (03CR) 10jenkins-bot: [V: 04-1] icinga sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138006 (owner: 10Rush) [19:48:50] ottomata: look at: manybubbles@elastic1017:~$ ls -lrtha /srv/deployment/elasticsearch/plugins/analysis-icu/ [19:48:56] (03PS3) 10Jforrester: Create a dblist for non-Beta Features wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/120171 [19:48:59] (03CR) 10jenkins-bot: [V: 04-1] fundraising sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138007 (owner: 10Rush) [19:49:04] (03CR) 10Helder.wiki: [C: 031] Enable TemplateData GUI on Portuguese Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137986 (https://bugzilla.wikimedia.org/66255) (owner: 10Jforrester) [19:49:15] (03CR) 10Jforrester: "PS3 is a rebase." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/120171 (owner: 10Jforrester) [19:50:10] ottomata: you wanted to see when a server doesn't get the right files [19:50:24] just git-fat placeholders [19:50:30] woah [19:50:36] cluster is doing ok though - no problem there [19:50:37] I've never seen jenkins work so fast [19:50:49] ooo git fat didn't get run for initial checkout! [19:50:50] ! [19:50:51] hm [19:51:04] bblack: I wonder if I trust it.... 
[19:51:14] doing a deploy [19:51:14] (03CR) 10jenkins-bot: [V: 04-1] gerrit sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138008 (owner: 10Rush) [19:51:14] (03CR) 10jenkins-bot: [V: 04-1] facilities sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138009 (owner: 10Rush) [19:51:14] (03CR) 10jenkins-bot: [V: 04-1] removing systemuser definition [operations/puppet] - 10https://gerrit.wikimedia.org/r/138011 (owner: 10Rush) [19:51:32] (03CR) 10jenkins-bot: [V: 04-1] ganglia sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138010 (owner: 10Rush) [19:51:34] bblack: it's a chain [19:51:50] ok, just did a deploy manybubbles, better now [19:51:51] hm [19:52:01] I think it tries to build the whole chain together somehow and if it fails then many fail [19:52:01] i guess there will be a chicken/egg problem after all, even with this dependency :/ [19:53:08] ottomata: I can write the "don't start unless you have the plugins" check [19:53:12] it's just a property file [19:53:15] hm [19:53:17] that would help [19:54:02] ottomata: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-plugins.html#_mandatory_plugins [19:54:06] you wanna do it:) [19:54:27] the required ones are experimental highlighter, analysis-icu [19:54:32] but you'd have to test it somewhere [19:54:46] I'm probably in the best place to test it what with mediawiki-vagrant [19:55:16] bwaahhhh [19:55:21] don't want to do that right now :) [19:55:23] i mean, test it [19:55:39] testing it [19:55:40] right now [19:56:01] i can do the puppet stuff :) [19:56:07] add params, edit the template, etc. [19:56:18] oh you said you are testing it [19:56:19] ok cool [19:56:34] i would probably do something like [19:56:38] add a param on elasticsearch class [19:56:59] plugin_mandatory => ['highlighter', 'analysis-icu', ...] [19:57:08] and make that do the right thing in the yaml erb template [19:57:41] ^d are you about? 
[19:58:09] ^d: is on vacation [19:58:16] manybubbles: i gotta run for a bit, but i'll be back on for an hour or so more work [19:58:33] if you tell me the exact values that should be rendered out there (maybe email me?) i can do the puppet work and make it happen for the next node [19:58:37] ottomata: cool - let's see if we can knock that out but not throw the new templates online [19:58:40] ha, or, maybe we should wait til monday anyway...it is friday eve [19:58:42] yeah [19:58:52] can do work today, deploy monday :) [19:59:36] ok cool, ja send me an email, back in a bit [19:59:39] laters [20:01:37] manybubbles: thanks [20:01:37] (03CR) 10jenkins-bot: [V: 04-1] Add dependency for elasticsearch on deployment target for elasticsearchplugins [operations/puppet] - 10https://gerrit.wikimedia.org/r/138012 (owner: 10Ottomata) [20:01:41] (03PS2) 10Ottomata: Add dependency for elasticsearch on deployment target for elasticsearchplugins [operations/puppet] - 10https://gerrit.wikimedia.org/r/138012 [20:01:59] chasemp: sorry - can I help? [20:03:30] so I committed a chain of dependent changesets [20:03:40] say 5 out of 30 had a syntax error, so i fixed it [20:03:43] trying to resubmit [20:04:06] ! 
[remote rejected] HEAD -> refs/publish/production/bye_systemuser (no changes made) [20:04:22] and I checked 10 times I'm sure I amended the bad commit in the stack [20:04:29] not sure how to make gerrit behave [20:05:14] chasemp: I _think_ when you amend a commit in a chain all the commits down the chain have to be changed to point at the amended commit/to rebuild the chain [20:05:15] I think [20:05:26] that....is insane [20:05:34] but thank you, that at least sheds light [20:05:36] chasemp: it's not really built for chains and amend [20:05:44] it's built for really one or the other [20:06:08] like, some folks will keep adding more and more commits but they'll change things by adding more commits [20:06:31] or other folks (our canonical model) is to have no chains [20:06:43] the other model is actually the standard github model, I think [20:07:19] (03CR) 10Nemo bis: "Thanks! Commented at https://meta.wikimedia.org/w/index.php?diff=8796526&oldid=8796307&rcid=5327490" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/137804 (owner: 10Awight) [20:07:34] so I'm not upset with you, or really at all, but it seems like if the only way to have commit dependencies is chains [20:07:45] and you of course can't know you will never need to amend a commit [20:07:57] it seems unwieldy at best [20:08:48] (03PS1) 10BBlack: Add public->private mappings for labs to dnsmasq aliases [operations/puppet] - 10https://gerrit.wikimedia.org/r/138017 [20:10:39] (03PS2) 10BBlack: Add public->private mappings for labs to dnsmasq aliases [operations/puppet] - 10https://gerrit.wikimedia.org/r/138017 [20:11:15] chasemp: yeah, both models are really designed for "I work on one feature and then when it is ready for a review I push it, and work on something else while I wait for a review, then I come back and change it after a review, repeat". chains aren't built into either model properly. 
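Editor's note on the chain discussion above: a minimal sketch of why amending one commit in the middle of a stack forces everything stacked on top of it to be rebuilt. It assumes a toy model in which a commit's id is a hash of its parent's id plus its own content. The helper names (`commit_id`, `build_chain`) and the commit contents are hypothetical, and this is not git's real object format; only the base name `origin/production` comes from the log.

```python
import hashlib

def commit_id(parent_id, content):
    """Toy commit id: a hash over the parent's id plus this commit's content."""
    return hashlib.sha1(f"{parent_id}\n{content}".encode()).hexdigest()

def build_chain(contents, base="origin/production"):
    """Return the list of commit ids for a linear chain stacked on `base`."""
    ids, parent = [], base
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids

original = build_chain(["change A", "change B (syntax error)", "change C"])
amended = build_chain(["change A", "change B (fixed)", "change C"])

# Commits below the amended one keep their ids; the amended commit and
# everything stacked on top of it get new ids, so they must be re-pushed.
unchanged = [old == new for old, new in zip(original, amended)]
print(unchanged)  # [True, False, False]
```

In this model, change C has identical content in both chains but still ends up with a different id, because its parent changed.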
[20:11:40] chasemp: when i took a change out of the "middle" of one of those chains i did git rebase -i origin/production, removed all the lines except the last one and uploaded again [20:11:40] having big chains looks like it works, and it does if you never have to amend them, but I don't think either model really works properly for chains [20:20:44] Nemo_bis: thank you for your additions to https://wikitech.wikimedia.org/wiki/HTTPS/Future_work [20:21:42] jzerebecki: :) thank you for concrete work ;) [20:24:04] (03PS2) 10Dzahn: bugzilla sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137995 (owner: 10Rush) [20:24:37] (03PS2) 10Rush: search sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137996 [20:25:01] (03PS2) 10Rush: parsoid sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137997 [20:29:29] (03CR) 10Hashar: "So that is the first step toward bringing DNS split horizon to labs which is totally awesome!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/138017 (owner: 10BBlack) [20:36:59] (03PS1) 10Faidon Liambotis: rancid: add asw-ulsfo, mr1-ulsfo [operations/puppet] - 10https://gerrit.wikimedia.org/r/138022 [20:37:20] (03CR) 10Faidon Liambotis: [C: 032] rancid: add asw-ulsfo, mr1-ulsfo [operations/puppet] - 10https://gerrit.wikimedia.org/r/138022 (owner: 10Faidon Liambotis) [20:38:15] (03CR) 10Faidon Liambotis: [V: 032] rancid: add asw-ulsfo, mr1-ulsfo [operations/puppet] - 10https://gerrit.wikimedia.org/r/138022 (owner: 10Faidon Liambotis) [20:39:07] (03PS2) 10Dzahn: otrs sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137998 (owner: 10Rush) [20:45:02] (03PS2) 10Dzahn: logging sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137999 (owner: 10Rush) [20:46:57] (03PS2) 10Dzahn: dataset sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138000 (owner: 10Rush) [20:48:50] (03PS2) 10Dzahn: install-server sans systemuser 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/138001 (owner: 10Rush) [20:49:15] (03PS2) 10Dzahn: openstack sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138002 (owner: 10Rush) [20:49:25] (03PS2) 10Dzahn: nfs sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138003 (owner: 10Rush) [20:49:35] (03PS2) 10Dzahn: statistics sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138004 (owner: 10Rush) [20:50:00] (03PS2) 10Dzahn: rancid sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138005 (owner: 10Rush) [20:50:09] (03PS2) 10Dzahn: icinga sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138006 (owner: 10Rush) [20:50:27] (03PS2) 10Dzahn: fundraising sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138007 (owner: 10Rush) [20:50:51] (03PS2) 10Dzahn: gerrit sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138008 (owner: 10Rush) [20:51:00] (03PS2) 10Dzahn: facilities sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138009 (owner: 10Rush) [20:51:08] (03PS2) 10Dzahn: ganglia sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/138010 (owner: 10Rush) [20:51:26] (03PS2) 10Dzahn: removing systemuser definition [operations/puppet] - 10https://gerrit.wikimedia.org/r/138011 (owner: 10Rush) [20:57:49] paravoid: ^ is the general direction ok here? 
[20:57:56] i saw the earlier discussion and revert [20:58:08] now those are removing generic::systemuser [21:03:54] (03CR) 10Dzahn: [C: 031] "well, yea, if refactoring broke scap on beta it should likely be reverted" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137274 (owner: 10BryanDavis) [21:04:26] (03CR) 10Faidon Liambotis: [C: 04-1] removing systemuser definition (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/138011 (owner: 10Rush) [21:04:48] mutante: the direction is correct, although I haven't reviewed the specifics [21:05:23] hey I'll fix up that one, you are no doubt correct that template password file shouldn't be there [21:05:43] paravoid: ok, just fixed the part that jenkins didn't like so far [21:05:46] also, better commit messages in general :) [21:06:11] "icinga: replace generic::systemuser with user" instead of "icinga sans systemuser", for example [21:07:12] but yes, /me likes :) [21:07:47] chasemp: generic::wikidev-umask might interest you too, btw [21:07:56] I've seen it [21:08:00] not sure what it's about [21:08:00] (03CR) 10BryanDavis: "I don't think any revert for the change that caused this change is warranted. Ori is cleaning up what can best be described as "a pile of " [operations/puppet] - 10https://gerrit.wikimedia.org/r/137274 (owner: 10BryanDavis) [21:08:22] it sets the umask for all wikidevs, so that users create files that are accessible by the group [21:08:23] chasemp: I've got some changes for role/admin.pp that were going to follow my formatting cleanup. want me to deal with the systemuser fix too, since I'm already watching puppet for the OTRS server? [21:08:37] Jeff_Green: sure man, sounds good [21:08:43] ok [21:09:03] paravoid: ah, should that not be handled now that everyone has 500 has their PUG? 
[21:09:15] no [21:09:23] the umask is being used for permissions of new files [21:09:32] (03CR) 10Dzahn: "ah, don't get me wrong, that comment was _for_ merging this change, it is a (partial) revert of something" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137274 (owner: 10BryanDavis) [21:10:16] 0002 means that files will be g+rw, rather than g+r per the default 0022 [21:10:38] mutante: Ah. Thanks for clarifying. Context is hard sometimes. [21:10:43] ah I see [21:11:01] you want to change it up / do away with it? [21:11:06] dunno yet :) [21:11:23] I'll think about it :) [21:11:29] okay I'll take a peek [21:13:16] says "we don't want another layer of abstraction" is the reason for this [21:13:25] right [21:13:41] (03PS3) 10Dzahn: ganglia - replace generic::systemuser with user [operations/puppet] - 10https://gerrit.wikimedia.org/r/138010 (owner: 10Rush) [21:13:55] hm, maybe we could create a profile.d file that would source e.g. /etc/bashrc.d/$group for each of the user's groups [21:14:19] so that we could have per-group bashrcs as well, instead of just per user [21:14:30] maybe, dunno [21:14:34] I'll think about it some more [21:14:38] (03CR) 10Jgreen: [C: 032 V: 031] otrs sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137998 (owner: 10Rush) [21:15:36] I am now starving, going to grab a bite, paravoid thanks for circling back on this, mutante thanks for being you, Jeff_Green thanks just because [21:15:39] later on [21:16:03] (03CR) 10Dzahn: [C: 031] wikistats sans systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/137988 (owner: 10Rush) [21:16:24] chasemp: cya! 
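Editor's note on the umask exchange above: a small sketch that may help show the difference between 0022 and 0002. It assumes a POSIX system and a tool that requests mode 0666 for new files, as most do. The helper `mode_of_new_file` is hypothetical and is not part of generic::wikidev-umask.

```python
import os
import stat
import tempfile

def mode_of_new_file(umask):
    """Create a file under the given umask and return its permission bits."""
    previous = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example")
            # Request 0o666; the kernel clears whichever bits the umask sets.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        # Always restore the caller's umask.
        os.umask(previous)

default_mode = mode_of_new_file(0o022)  # group gets read only
group_mode = mode_of_new_file(0o002)    # group gets read and write
print(oct(default_mode), oct(group_mode))
```

On a typical Linux filesystem without default ACLs this should report 0o644 for the 0022 case and 0o664 for the 0002 case, matching the "g+r" versus "g+rw" description in the log.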
[21:17:03] hm, that umask issue was mostly for svn [21:17:09] I wonder what git does [21:17:35] obeys the default umask apparently [21:20:34] paravoid: https://gerrit.wikimedia.org/r/#/c/22111/ [21:20:49] and https://gerrit.wikimedia.org/r/#/c/34223/ [21:20:52] (03PS1) 10Jgreen: add Config.pm template for OTRS [operations/puppet] - 10https://gerrit.wikimedia.org/r/138039 [21:20:54] (03PS1) 10Jgreen: add template for OTRS main config file Kernel/Config.pm [operations/puppet] - 10https://gerrit.wikimedia.org/r/138040 [21:21:36] there was a difference between lucid and precise [21:21:44] (i forgot most of this of course :p) [21:22:12] like in lucid the ubuntu profile would have a umask line but in precise it did not [21:24:53] (03PS3) 10Ottomata: Make sure that elasticsearch won't start if required plugins aren't available [operations/puppet] - 10https://gerrit.wikimedia.org/r/138012 [21:25:37] (03CR) 10Dzahn: Add roles for testing swift in labs (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/137803 (owner: 10Andrew Bogott) [21:26:07] andrewbogott: it's a labs role but includes passwords::swift::eqiad_prod [21:26:13] labs vs. prod? [21:26:27] it pulls the dummy password from labs private... [21:26:34] aaah [21:26:41] so, it works, and uses a labs-specific password. 
I agree it's confusing though [21:26:46] PROBLEM - Puppet freshness on gallium is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 18:26:05 UTC [21:26:57] it sounds confusing if it's called "eqiad_prod", ok, yea, just that [21:30:31] (03CR) 10Dzahn: contint: localhost.mediawiki vhost on ci labs slave (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135529 (owner: 10Hashar) [21:31:16] (03CR) 10Manybubbles: [C: 031] Make sure that elasticsearch won't start if required plugins aren't available [operations/puppet] - 10https://gerrit.wikimedia.org/r/138012 (owner: 10Ottomata) [21:31:31] (03CR) 10Dzahn: [C: 032] zuul: bring in python babel and prettytable modules [operations/puppet] - 10https://gerrit.wikimedia.org/r/137552 (owner: 10Hashar) [21:33:12] cool, manybubbles, let's try that out monday, eh? [21:33:23] ottomata: sounds great! [21:33:26] Duplicate definition: Package[apache2-mpm-prefork] is already defined in file /etc/puppet/manifests/webserver.pp [21:33:34] uhm.. recent change on that?^ [21:33:51] looks [21:34:01] wherefore art thou jenkins? [21:34:38] cannot redefine at /etc/puppet/modules/mediawiki/manifests/packages.pp:15 [21:34:53] pretty please, let's fix it before weekend [21:36:59] (03CR) 10Dzahn: "Duplicate definition: Package[apache2-mpm-prefork] is already defined in file /etc/puppet/manifests/webserver.pp at line 75; cannot redefi" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 (owner: 10Hashar) [21:37:14] paravoid: ^ that one merged earlier today broke puppet on gallium [21:37:18] revert or nah? [21:38:31] ori: [21:38:43] mutante: looking [21:39:00] i wonder how that didnt break on integration puppetmaster [21:39:50] ori: it's this part "Make it a bit simpler by including mediawiki::packages as is which brings php-apc." [21:40:00] mediawiki::packages also gets apache2-mpm-prefork [21:40:04] because manifests/webserver.pp:72: if ! 
defined( Package['apache2-mpm-prefork'] ) { [21:40:08] that's not robust [21:40:12] it relies on the parse order, which is undefined [21:40:16] so it'll work sometimes, but not others [21:40:20] :p [21:40:22] depending on the order in which manifests are evaluated [21:41:15] mutante: let me spend a minute or three trying to disentangle it and if i can't let's revert [21:42:19] is gerrit/jenkins known-dead? [21:42:48] (03PS1) 10Yurik: LABS: Enabled ZeroPortal on betalabs instead of ZRMA [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/138049 [21:43:02] Jeff_Green: no, it was still doing things a couple minutes ago.. which change are you on [21:43:09] greg-g, i'll git pull ^^ on tin, it's a labs-only [21:43:09] ori: ok! [21:43:26] seems just busy according to graphs at bottom of https://integration.wikimedia.org/zuul/ ? [21:43:45] mutante: https://gerrit.wikimedia.org/r/#/c/138039/ and the one just after [21:43:56] committed ~25 min ago [21:44:02] mutante: there is no simple fix -- at least nothing that i'd want to attempt on a friday afternoon. so i think we should revert. i can add a comment on the patch explaining the issue, if you like. [21:44:02] go jenkins go! [21:44:11] though it says "Queue lengths: 0 events, 0 results." which doesn't seem true [21:44:24] Jeff_Green: it's working on some parsoid related thing, 3 changes before yours [21:44:42] i've never seen this report page! [21:44:59] ori: sounds good to me [21:45:19] Jeff_Green: probably because one only gets more confused by looking at it :P [21:45:37] yeah, I'm not sure how to read it [21:45:41] ugh, that graph doesn't look good though [21:45:56] see the "gate pipeline" one [21:46:21] the y axis should really range from 0 to Fail [21:46:32] (03CR) 10Yurik: [C: 032] LABS: Enabled ZeroPortal on betalabs instead of ZRMA [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/138049 (owner: 10Yurik) [21:46:53] wtf does it do that can possibly take this long? 
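Editor's note on the `if ! defined( Package[...] )` guard discussed at 21:40: a toy simulation, not Puppet itself, of why a guard on only one of two declarations depends on evaluation order. The `Catalog` class and the `guarded`/`unguarded` functions are hypothetical stand-ins for the compiler, for the guarded declaration in webserver.pp, and for the unconditional one in mediawiki::packages.

```python
class DuplicateDefinition(Exception):
    """Raised when a resource is declared twice, as the Puppet compiler does."""

class Catalog:
    def __init__(self):
        self.resources = set()

    def defined(self, name):
        # True only if the resource was declared *before* this call.
        return name in self.resources

    def declare(self, name):
        if name in self.resources:
            raise DuplicateDefinition(f"{name} is already defined")
        self.resources.add(name)

PACKAGE = "Package[apache2-mpm-prefork]"

def guarded(catalog):
    # Stand-in for webserver.pp: declare only if nobody has yet.
    if not catalog.defined(PACKAGE):
        catalog.declare(PACKAGE)

def unguarded(catalog):
    # Stand-in for mediawiki::packages: declares unconditionally.
    catalog.declare(PACKAGE)

def compile_in_order(*manifests):
    catalog = Catalog()
    try:
        for manifest in manifests:
            manifest(catalog)
    except DuplicateDefinition:
        return "error"
    return "ok"

results = {
    "unguarded first": compile_in_order(unguarded, guarded),
    "guarded first": compile_in_order(guarded, unguarded),
}
print(results)  # {'unguarded first': 'ok', 'guarded first': 'error'}
```

The failing order corresponds to the error quoted in the log, where the package was already defined in webserver.pp when mediawiki/manifests/packages.pp tried to declare it again.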
[21:46:55] yurikR2: yah [21:48:38] (03CR) 10Dzahn: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/138039 (owner: 10Jgreen) [21:49:42] what is 'recheck'? [21:49:58] a magic word to make jenkins run again [21:50:06] where do you speak it? [21:50:13] in a gerrit comment [21:50:26] on the same commit? interesting [21:50:50] that's sort of like crawling under a dark bridge to talk to a troll [21:50:57] (03CR) 10Ori.livneh: "webserver::php5 declares the same package with an if ! defined( Package['apache2-mpm-prefork'] ) guard, which is not robust (it depends on" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137921 (owner: 10Hashar) [21:51:05] hrmm.. it really seems stuck [21:51:17] what shall we do.. i know just restarting things wasn't a good idea in the past [21:51:20] jenkins is not happy... [21:51:29] java is doing stuff [21:51:31] i'm just going to force mine. they pass puppet-lint and parser validate [21:52:04] (03CR) 10Jgreen: [C: 032 V: 032] "sigh jenkins: "java is doing stuff"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/138040 (owner: 10Jgreen) [21:52:14] (03PS1) 10Ori.livneh: Revert "contint: reduce duplication with mediawiki::packages" [operations/puppet] - 10https://gerrit.wikimedia.org/r/138097 [21:52:30] All the slaves at https://integration.wikimedia.org/ci/ are showing idle, so zuul or gearman is stuck [21:52:37] it seems it got in trouble on the [21:52:40] https://integration.wikimedia.org/ci/job/parsoidsvc-deploy-parsertests-run-harder/69/ [21:52:47] parsoid "run harder" thing [21:52:48] (03CR) 10Jgreen: [C: 032 V: 032] "sigh jenkins: "java is doing stuff"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/138039 (owner: 10Jgreen) [21:53:52] i hit rebuild on #69 [21:53:55] it started #70 [21:54:07] Krinkle: Around? 
zull/gearman isn't sending jobs into jenkins [21:54:17] s/zull/zuul/ [21:54:54] (03CR) 10Yurik: "recheck" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/138049 (owner: 10Yurik) [21:55:03] bd808: Hi [21:55:12] nah, it's dead jim [21:56:17] * Jeff_Green checked the OTRS template and it deployed correctly, so punting for the weekend [21:56:18] ah, gearman crash? [21:56:27] error: [Errno 104] Connection reset by peer [21:56:59] !log Took Jenkins slave on gallium temporarily offline and back online to resolve possible stagnation [21:57:04] Logged the message, Master [21:57:46] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.37 [22:00:45] (03CR) 10Dzahn: [C: 031] "yes, on gallium: Package[apache2-mpm-prefork] is already defined in file /etc/puppet/manifests/webserver.pp at line 75; cannot redefine at" [operations/puppet] - 10https://gerrit.wikimedia.org/r/138097 (owner: 10Ori.livneh) [22:00:51] zuul.log: [22:00:53] raise NoConnectedServersError("No connected Gearman servers") [22:00:53] NoConnectedServersError: No connected Gearman servers [22:01:08] Krinkle: yea, and gearman.log [22:01:17] eh gearman-server.log [22:01:23] error: [Errno 104] Connection reset by peer [22:01:25] Yep, saw it just now [22:01:41] I don't know gearman [22:03:30] it's a plugin in jenkins, not it's own service it seems [22:04:22] afaik it is its own service, Jenkins' plugin connects to it to use it [22:04:26] it's a generic job schedule system [22:04:33] https://bugzilla.wikimedia.org/show_bug.cgi?id=63760 [22:04:34] but maybe it bundles it [22:05:09] "Disconnecting and reconnecting the gearman client does unleash a few jobs." 
[22:05:48] (03CR) 10Ori.livneh: [C: 032 V: 032] Revert "contint: reduce duplication with mediawiki::packages" [operations/puppet] - 10https://gerrit.wikimedia.org/r/138097 (owner: 10Ori.livneh) [22:05:57] Krinkle: yea, i just said that because hashar mentioned a bug in "the Jenkins Gearman plugin" somewhere in SAL [22:06:36] how to disconnect and reconnect the client? [22:07:07] and it looks like hashar restarted jenkins/zuul (on that bug) [22:07:11] so let's try that? [22:08:26] RECOVERY - Puppet freshness on gallium is OK: puppet ran at Fri Jun 6 22:08:18 UTC 2014 [22:09:07] ori: :) [22:11:02] Krinkle: let's restart jenkins/zuul-server ? [22:11:12] I just restarted zuul [22:11:14] i was hesitant because in some cases hashar said it made things worse [22:11:22] but on the other hand. the bug sounds like that was done last time [22:11:27] Doing a gearman health check from within jenkins yields 'ok' [22:11:28] ok [22:11:32] cool [22:11:37] Jenkins restart is last resort, is very slow, and will nuke the queue. [22:11:49] but may be needed. This setup is hell unstable. [22:11:56] I want to strangle it, every day. [22:17:04] hella * [22:17:16] !log upgraded ssl packages on zirconium [22:17:21] Logged the message, Master [22:22:57] (03CR) 10Yurik: "recheck" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/138049 (owner: 10Yurik) [22:35:04] !log same for holmium, hafnium, silver, netmon1001, magnesium, neon, antimony [22:35:09] Logged the message, Master [22:35:39] mutante is a machine :) [22:36:41] !log Restarting stuck Jenkins [22:36:46] Logged the message, Mr. Obvious [22:38:14] (03CR) 10Yurik: "recheck" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/138049 (owner: 10Yurik) [22:38:51] bleh, oh well, i guess i will sync it with tin later :) [22:42:57] !log Restarting Jenkins didn't help, jobs still aren't making it across from Zuul into Jenkins [22:43:02] Krinkle|detached: ---^^ [22:43:03] Logged the message, Mr. 
Obvious [22:43:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC [22:46:52] James_F: Sorry for the wrong email assign :p [22:50:47] JohnLewis: No worries. :-) [22:51:59] !log same for rhenium, titanium, bast1001, calcium, carbon, ytterbium, stat1003 [22:52:03] Logged the message, Master [22:55:00] RoanKattouw: well fuck [22:55:25] Krinkle: At this point I have no idea what's going on [22:55:30] RoanKattouw: cron-timed jobs work fine [22:55:37] this sucket is working along all the time https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ [22:55:39] Yeah the beta jobs are running [22:55:48] That's the only thing Jenkins is running [22:55:53] though to be fair, those circumvent every single part of the infrastructure [22:55:57] Both before and after I restarted it [22:55:58] haha [22:56:07] remember, we don't use jenkins for anything other than storing build logs for jobs from zuul [22:56:16] that's all gerrit, zuul and gearman [22:56:34] so those timed jobs actually use jenkins [23:01:47] right [23:22:05] Krinkle: Any luck with Jenkins? [23:22:13] debugging.. [23:24:35] Looks like it is at least complaining about one thing, prolly unrelated [23:24:44] "(X) It appears that your reverse proxy set up is broken. " [23:24:50] redirect isn't set up properly [23:24:55] and ssl cert is bad too [23:25:04] XMLHttpRequest cannot load http://integration.wikimedia.org/ci/administrativeMonitor/hudson.diagnosis.ReverseProxySetupMonitor/test-for-reverse-proxy-setup. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'https://integration.wikimedia.org' is therefore not allowed access. [23:25:06] that wasn't there before [23:25:10] http/https redirect is bad [23:41:26] RoanKattouw: Hm.. 
looking through gearman-server.log.2014-* [23:41:32] they are mostly empty [23:41:37] containing only this very error [23:41:41] so it's definitely happened before [23:42:07] every other day or so [23:42:08] wtf [23:51:13] !log Restarted Jenkins, force stopped Zuul, started Zuul, configured Jenkins via web interface (disable Gearman, save, enable Gearman); Seems to be back up now, finally. [23:51:18] Logged the message, Master