[00:00:16] (PS5) Yuvipanda: toollabs: Initial work for the mongo role [operations/puppet] - https://gerrit.wikimedia.org/r/135442
[00:00:18] (PS3) Yuvipanda: mongo: Support newer yaml style configuration [operations/puppet] - https://gerrit.wikimedia.org/r/135499
[00:02:07] (PS4) Yuvipanda: mongo: Support newer yaml style configuration [operations/puppet] - https://gerrit.wikimedia.org/r/135499
[00:06:19] ori: yeah, I don't think the stdlib merge() supports deep merges, and even then lists would be a problem
[00:06:21] brrr
[00:06:43] (CR) Tim Landscheidt: ""echo Test | pastebinit -b tools.wmflabs.org/paste" => "Unknown website, please post a bugreport to request this pastebin to be added (too" [operations/puppet] - https://gerrit.wikimedia.org/r/135500 (owner: Tim Landscheidt)
[00:06:57] scfc_de: https
[00:07:03] or http
[00:08:34] Now I wanted to complain about it being hanging, when I remembered something about tools.wmflabs.org :-).
[00:09:56] Worked: https://tools.wmflabs.org/paste/view/60474ad3
[00:12:59] (CR) Tim Landscheidt: ""echo Test | pastebinit -b http://tools.wmflabs.org/paste" is the magic phrase. Tested on Toolsbeta." [operations/puppet] - https://gerrit.wikimedia.org/r/135500 (owner: Tim Landscheidt)
[00:15:38] (PS5) Yuvipanda: mongo: Support newer yaml style configuration [operations/puppet] - https://gerrit.wikimedia.org/r/135499
[00:15:57] scfc_de: :)
[00:19:00] (PS6) Yuvipanda: mongo: Support newer yaml style configuration [operations/puppet] - https://gerrit.wikimedia.org/r/135499
[00:20:14] (PS7) Yuvipanda: mongo: Support newer yaml style configuration [operations/puppet] - https://gerrit.wikimedia.org/r/135499
[00:27:37] (PS6) Yuvipanda: toollabs: Initial work for the mongo role [operations/puppet] - https://gerrit.wikimedia.org/r/135442
[00:27:39] (PS8) Yuvipanda: mongo: Support newer yaml style configuration [operations/puppet] - https://gerrit.wikimedia.org/r/135499
[00:35:16] (PS7) Yuvipanda: toollabs: Initial work for the mongo role [operations/puppet] - https://gerrit.wikimedia.org/r/135442
[00:35:18] (PS9) Yuvipanda: mongo: Support newer yaml style configuration [operations/puppet] - https://gerrit.wikimedia.org/r/135499
[00:37:01] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:40:21] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:40:51] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.011 second response time
[00:41:11] RECOVERY - gdash.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 8758 bytes in 0.022 second response time
[00:52:56] (PS3) Springle: Set $wgCategoryCollation to 'uca-cs' on cswiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134103 (owner: Manybubbles)
[00:55:01] (CR) Springle: [C: 2] Set $wgCategoryCollation to 'uca-cs' on cswiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134103 (owner: Manybubbles)
[00:58:45] (CR) BryanDavis: [C: 1] "Sounds good to me. Let's see if this works as expected." [operations/puppet] - https://gerrit.wikimedia.org/r/125184 (owner: Ottomata)
[00:59:24] (PS8) Yuvipanda: toollabs: Initial work for the mongo role [operations/puppet] - https://gerrit.wikimedia.org/r/135442
[01:02:31] going webscale, YuviPanda?:P
[01:02:43] MaxSem: :D yeah, tools is going webscale :)
[01:05:49] MaxSem: needed to build something new to take my mind off all this visa crap, so...
[01:05:58] MaxSem: plus I think having mongo accessible on tools would do good
[01:06:20] now you just need to make joins with mysql...
[01:07:12] MaxSem: :P
[01:07:29] MaxSem: this would run analogous to tools-db, which can't join with mysql anyway
[01:08:00] what about federated tables?
[01:08:52] MaxSem: not that either :)
[01:09:05] this is kinda like making redis available on tools
[01:09:15] not a replacement for mysql, but to augment it for different use cases
[01:10:37] (PS9) Yuvipanda: toollabs: Initial work for the mongo role [operations/puppet] - https://gerrit.wikimedia.org/r/135442
[01:10:39] (PS10) Yuvipanda: mongo: Support newer yaml style configuration [operations/puppet] - https://gerrit.wikimedia.org/r/135499
[01:12:37] YuviPanda: would be interesting to see labs test TokuMX
[01:12:48] springle: checking
[01:12:54] see if the hype about tokumx vs mongodb is true
[01:12:58] oooh, interesting
[01:13:11] we're testing TokuDB in place of InnoDB for mysql
[01:13:23] fractal indexes are impressive there
[01:13:31] nice!
[01:13:40] potential replacement at some point in the far future?
[01:13:47] though it cannot yet beat innodb for sheer speed if data is already in memory
[01:14:01] well, don't know
[01:14:07] i've not used tokumx yet
[01:14:07] right
[01:14:21] or do you mean tokudb? that's only on analytics boxes so far
[01:14:25] I meant tokudb, yeah
[01:14:35] not tokumx.
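The 00:06:19 remark about stdlib merge() is easier to see with nested config data. puppetlabs-stdlib's merge() combines hashes shallowly: when keys collide, the rightmost hash wins, so overriding one nested key replaces the whole nested hash. The Python stand-in below is only an illustration. The mongod-style keys are example values rather than the role's real data, and replacing lists wholesale is one assumed answer to "lists would be a problem" (concatenating them is the other obvious choice).

```python
def shallow_merge(*hashes):
    """Mimic a shallow merge: later hashes win on duplicate top-level keys,
    and a nested dict is replaced wholesale rather than merged."""
    result = {}
    for h in hashes:
        result.update(h)
    return result


def deep_merge(base, override):
    """Return a new dict with `override` merged recursively into `base`.

    Nested dicts are merged key by key. Any other value in `override`,
    including a list, replaces the value in `base` (an assumed policy;
    concatenating lists would be the alternative).
    """
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


# Example values only, shaped like mongod's YAML config (net.port, net.bindIp, storage.dbPath).
defaults = {"net": {"port": 27017, "bindIp": "127.0.0.1"}, "storage": {"dbPath": "/srv/mongod"}}
overrides = {"net": {"bindIp": "0.0.0.0"}}

print(shallow_merge(defaults, overrides))  # net.port is lost
print(deep_merge(defaults, overrides))     # net.port is kept
```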
[01:16:18] springle: for toollabs, I think I'll stick to Mongo for now - since it is just starting to be a service and I'd rather stick to something with a bigger community. But I'll definitely keep an eye on TokuMX and replace it at some point...
[01:16:24] springle: thanks for pointing it out!
[01:16:27] sure :)
[01:16:37] wise approach
[01:17:06] springle: :)
[01:17:16] springle: do review https://gerrit.wikimedia.org/r/#/c/135499/8 *if* you have the time :)
[01:19:02] ok
[01:19:09] springle: if you get a minute, can you look at parent5446's concern on https://gerrit.wikimedia.org/r/#/c/135283/ ?
[01:27:42] jackmcbarn: done. i think your answer is correct
[01:27:59] kk, thanks
[01:28:15] or at least a logical assumption until we see real api traffic results
[01:29:49] springle: what's the usual backup strategy for databases like these? dump from a slave?
[01:30:18] which databases?
[01:31:00] springle: in this case I'm about to write one for mongodb, but what do we do for mysql?
[01:32:02] in production we have 8h lvm snapshots, weekly logical dumps from slaves, and the public xml dumps (don't know their frequency - monthly?)
[01:32:36] yeah, monthly
[01:32:37] hmm
[01:32:54] springle: do you know if tools-db is backed up in the same way? it used to be on a VM but is on real hardware now...
[01:33:32] no, i don't know about tools-db. we'd have to ask Coren
[01:33:56] It's not.
[01:34:08] right. so the status quo from pmtpa remains
[01:34:13] Though I'm planning on having weekly dumps.
[01:34:26] hmm, doesn't it have some form of lvm on it we can use to just do snapshots?
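One way to carry the "weekly logical dumps from slaves" approach over to the new mongo service is sketched below in Python. It assumes mongodump is pointed at a secondary (the Mongo counterpart of a MySQL slave) and that each dump lands in a date-stamped directory. The host name, backup root, and retention count are hypothetical, and the functions only build the command and choose directories to prune; they run nothing.

```python
import datetime
import posixpath


def mongodump_command(secondary_host, backup_root, when=None):
    """Build (but do not run) a mongodump invocation that writes a
    date-stamped dump taken from `secondary_host` under `backup_root`."""
    when = when or datetime.date.today()
    out_dir = posixpath.join(backup_root, when.strftime("%Y-%m-%d"))
    return ["mongodump", "--host", secondary_host, "--out", out_dir]


def dumps_to_prune(dump_dirs, keep):
    """Return the YYYY-MM-DD dump directories older than the newest `keep`.

    ISO dates sort chronologically as plain strings.
    """
    newest_first = sorted(dump_dirs, reverse=True)
    return sorted(newest_first[keep:])
```

A weekly cron job could hand the list to `subprocess.run(cmd, check=True)` and then delete whatever `dumps_to_prune()` returns.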
[01:36:00] !log springle synchronized wmf-config/InitialiseSettings.php '$wgCategoryCollation to uca-cs on cswiki'
[01:36:10] Logged the message, Master
[01:39:29] !log starting updateCollation on s2 cs-wiki from tin
[01:39:35] Logged the message, Master
[02:09:07] (PS1) Tim Landscheidt: Tools: Install user-requested packages [operations/puppet] - https://gerrit.wikimedia.org/r/135505 (https://bugzilla.wikimedia.org/61445)
[02:14:11] !log LocalisationUpdate completed (1.24wmf5) at 2014-05-27 02:13:08+00:00
[02:14:20] Logged the message, Master
[02:16:27] (CR) Tim Landscheidt: "Tested on Toolsbeta; Tesseract OCR takes up about 661 MByte, so bearable." [operations/puppet] - https://gerrit.wikimedia.org/r/135505 (https://bugzilla.wikimedia.org/61445) (owner: Tim Landscheidt)
[02:25:55] !log LocalisationUpdate completed (1.24wmf6) at 2014-05-27 02:24:51+00:00
[02:25:59] Logged the message, Master
[02:53:51] (CR) Ori.livneh: [C: -1] mongo: Support newer yaml style configuration (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/135499 (owner: Yuvipanda)
[03:13:36] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue May 27 03:12:29 UTC 2014 (duration 12m 28s)
[03:13:41] Logged the message, Master
[03:24:04] (CR) Phe: [C: 1] Tools: Install user-requested packages [operations/puppet] - https://gerrit.wikimedia.org/r/135505 (https://bugzilla.wikimedia.org/61445) (owner: Tim Landscheidt)
[05:00:39] (PS1) Springle: The job control approach doesn't work if a script is in non-interactive mode, which seems to be the case for /bin/sh (though not bash?). Also that method waited for the oldest job to die which is not efficient if the oldest job is a large wiki (which is m [operations/puppet] - https://gerrit.wikimedia.org/r/135517
[05:05:08] (PS2) Springle: The job control approach doesn't work if a script is in non-interactive mode, which seems to be the case for /bin/sh (though not bash?). [operations/puppet] - https://gerrit.wikimedia.org/r/135517
[05:15:08] (CR) Springle: "I went to find a cswiki dump today and found the files empty. I think this jobs/wait bug caused all the dumps to start at once with most f" [operations/puppet] - https://gerrit.wikimedia.org/r/135517 (owner: Springle)
[05:26:41] PROBLEM - Parsoid on wtp1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:27:31] PROBLEM - Parsoid on wtp1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:28:21] PROBLEM - Parsoid on wtp1015 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:29:31] PROBLEM - Parsoid on wtp1016 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:29:31] PROBLEM - Parsoid on wtp1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:29:41] PROBLEM - Parsoid on wtp1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:29:41] PROBLEM - Parsoid on wtp1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:30:01] PROBLEM - Parsoid on wtp1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:30:01] PROBLEM - Parsoid on wtp1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:30:01] PROBLEM - Parsoid on wtp1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:30:11] PROBLEM - Parsoid on wtp1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:31:01] PROBLEM - Parsoid on wtp1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:31:01] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:31:31] RECOVERY - Parsoid on wtp1005 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.446 second response time
[05:31:41] PROBLEM - Parsoid on wtp1018 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:31:51] PROBLEM - Parsoid on wtp1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:31:51] PROBLEM - Parsoid on wtp1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:31:58] hmmm
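Springle's commit message says the old method "waited for the oldest job to die", which is slow when the oldest job is a large wiki. The Python sketch below is a simplified model, not the dumps script itself. It assumes a fixed number of parallel slots and known job durations, and it compares blocking on the oldest running job with starting the next job as soon as any slot frees up. In bash 4.3 and later, `wait -n` gives the second behaviour, but it is a bash extension.

```python
import heapq


def makespan_wait_oldest(durations, slots):
    """Finish time when, with every slot busy, the scheduler blocks on the
    OLDEST running job before starting the next one."""
    running = []  # finish times, in start order
    clock = 0.0   # time at which the next job can start
    for d in durations:
        if len(running) == slots:
            # Block on the oldest job even if a younger one already finished.
            clock = max(clock, running.pop(0))
        running.append(clock + d)
    return max(running, default=0.0)


def makespan_wait_any(durations, slots):
    """Finish time when the next job starts as soon as ANY running job ends."""
    running = []  # min-heap of finish times
    clock = 0.0
    for d in durations:
        if len(running) == slots:
            clock = max(clock, heapq.heappop(running))
        heapq.heappush(running, clock + d)
    return max(running, default=0.0)


# Hypothetical durations: one large wiki first, then four small ones, two slots.
jobs = [10, 1, 1, 1, 1]
print(makespan_wait_oldest(jobs, 2))  # 12.0
print(makespan_wait_any(jobs, 2))     # 10.0
```

When all jobs take the same time, the two policies finish together. The gap only appears when a long job sits at the head of the queue.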
[05:32:01] PROBLEM - Parsoid on wtp1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:32:17] parsoid load is *very* high
[05:32:21] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:33:01] PROBLEM - Parsoid on wtp1008 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:33:01] PROBLEM - Parsoid on wtp1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:33:14] wow
[05:33:22] PROBLEM - Parsoid on wtp1014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:33:31] PROBLEM - Parsoid on wtp1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:33:36] load climbing for last 30 mins
[05:33:41] PROBLEM - Parsoid on wtp1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:33:41] PROBLEM - LVS HTTP IPv4 on parsoid.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:33:51] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.018 second response time
[05:34:11] RECOVERY - gdash.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 8758 bytes in 0.021 second response time
[05:34:21] PROBLEM - Parsoid on wtp1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:34:41] PROBLEM - Parsoid on wtp1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:34:41] PROBLEM - Parsoid on wtp1010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:35:11] RECOVERY - Parsoid on wtp1014 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 1.002 second response time
[05:35:41] PROBLEM - Parsoid on wtp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:37:11] hmm. what can be done for parsoid?
[05:37:31] RECOVERY - Parsoid on wtp1021 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.003 second response time
[05:37:39] ok, i just got the page that came through
[05:38:09] I'll restart the parsoids
[05:38:24] wow, lookit that spike
[05:38:29] didn't see anything obvious in the logs
[05:38:33] of course springle already mentioned that
[05:38:41] might be worth looking into the varnish logs
[05:39:31] RECOVERY - Parsoid on wtp1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.010 second response time
[05:39:40] looks like somebody flooded parsoid with expensive requests
[05:40:01] PROBLEM - LVS HTTP IPv4 on parsoid-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:40:14] restarts are slow
[05:40:21] RECOVERY - Parsoid on wtp1002 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.008 second response time
[05:40:24] 2/24 so far
[05:40:41] i take it its sequential, hence the clearing so far?
[05:41:21] RECOVERY - Parsoid on wtp1003 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.009 second response time
[05:41:31] robh, switched to parallel now
[05:42:11] they are still slow, as our restarts are normally graceful (wait for requests to finish)
[05:42:11] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:42:21] RECOVERY - Parsoid on wtp1016 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.008 second response time
[05:42:21] RECOVERY - Parsoid on wtp1013 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.014 second response time
[05:42:31] RECOVERY - Parsoid on wtp1012 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.009 second response time
[05:42:31] RECOVERY - Parsoid on wtp1024 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.004 second response time
[05:42:31] RECOVERY - Parsoid on wtp1010 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.005 second response time
[05:42:31] RECOVERY - Parsoid on wtp1005 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.009 second response time
[05:42:31] RECOVERY - Parsoid on wtp1018 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.008 second response time
[05:42:32] RECOVERY - LVS HTTP IPv4 on parsoid.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.009 second response time
[05:42:41] RECOVERY - Parsoid on wtp1004 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.008 second response time
[05:42:41] RECOVERY - Parsoid on wtp1017 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.015 second response time
[05:42:50] !log restarted parsoids after load surge
[05:42:51] RECOVERY - Parsoid on wtp1011 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.011 second response time
[05:42:51] RECOVERY - LVS HTTP IPv4 on parsoid-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 1289 bytes in 0.002 second response time
[05:42:54] RECOVERY - Parsoid on wtp1020 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.006 second response time
[05:42:54] RECOVERY - Parsoid on wtp1019 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.007 second response time
[05:42:54] RECOVERY - Parsoid on wtp1008 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.008 second response time
[05:42:54] RECOVERY - Parsoid on wtp1009 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.012 second response time
[05:42:55] RECOVERY - Parsoid on wtp1023 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.020 second response time
[05:42:55] Logged the message, Master
[05:43:02] RECOVERY - Parsoid on wtp1007 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.014 second response time
[05:43:02] RECOVERY - Parsoid on wtp1006 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.005 second response time
[05:43:11] RECOVERY - Parsoid on wtp1015 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.003 second response time
[05:43:11] RECOVERY - Parsoid on wtp1022 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.016 second response time
[05:43:21] gwicke: well, that certainly fixed the symptoms =]
[05:43:21] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:44:11] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 8.441 second response time
[05:44:11] RECOVERY - gdash.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 8758 bytes in 0.013 second response time
[05:45:59] I've copied the current parsoid logs to /tmp on all machines
[05:46:13] can investigate those later
[05:47:12] good, cuz i was actually mostly asleep when the page woke me up
[05:47:31] it pages till 11pm and silly me falling asleep minutes before!
[05:48:15] meh
[05:48:20] I was also about to head to bed
[05:49:02] still no clue what / who caused this
[05:50:00] might also be hard to figure out without persistent logging in the front-end varnishes
[05:50:56] well, thank you for responding and restarting the parsoid boxen
[05:51:25] its nice to walk into a page and someone is already taking point on a remedy
[05:52:03] yeah, np
[05:52:31] saw the highlight on IRC when I was just about to shut my laptop down
[05:55:09] well, its logged that it happened and what was done
[05:55:16] so i think the idea of shutting down is an excellent one
[05:55:21] so doing just that
[05:55:45] robh, see you tomorrow!
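If somebody did flood parsoid with expensive requests, as suspected at 05:39:40, a reasonable first pass over the logs copied to /tmp is to rank clients by the total time their requests consumed. The Python sketch below assumes a made-up, whitespace-separated line format of client, path, and duration in milliseconds. Real Parsoid or Varnish log lines look different, so the parsing step would need adapting.

```python
from collections import defaultdict


def top_clients(lines, n=3):
    """Rank clients by total request time.

    Assumes each line is '<client> <path> <duration_ms>' separated by
    whitespace (a made-up format); malformed lines are skipped.
    """
    totals = defaultdict(float)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        client, _path, duration = parts
        try:
            totals[client] += float(duration)
        except ValueError:
            continue
    # Most expensive first; ties broken by client name for stable output.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


sample = [
    "192.0.2.10 /wiki/A 100",
    "192.0.2.20 /wiki/B 5000",
    "192.0.2.10 /wiki/C 200",
]
print(top_clients(sample, n=2))
```

Summing durations rather than counting requests matters here: a handful of very slow requests can load the cluster more than many cheap ones.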
[05:55:56] & sorry for the interrupted sleep
[06:19:59] (CR) devunt: [C: 1] Tools: Install user-requested packages [operations/puppet] - https://gerrit.wikimedia.org/r/135505 (https://bugzilla.wikimedia.org/61445) (owner: Tim Landscheidt)
[08:05:42] (PS1) Odder: Update Stu West feed address for English Planet [operations/puppet] - https://gerrit.wikimedia.org/r/135525
[08:22:23] (CR) Nemo bis: [C: 1] Update Stu West feed address for English Planet [operations/puppet] - https://gerrit.wikimedia.org/r/135525 (owner: Odder)
[08:25:04] (PS1) Nemo bis: Add Mike Linksvayer [operations/puppet] - https://gerrit.wikimedia.org/r/135526
[08:25:31] (PS2) Nemo bis: Add Mike Linksvayer to the English Planet [operations/puppet] - https://gerrit.wikimedia.org/r/135526
[08:40:24] (PS1) Hashar: contint: localvhost requires mod_rewrite [operations/puppet] - https://gerrit.wikimedia.org/r/135527
[08:49:09] (CR) Hashar: [C: 1 V: 1] "Cherry picked on labs puppetmaster for the integration project. Works :-)" [operations/puppet] - https://gerrit.wikimedia.org/r/135527 (owner: Hashar)
[09:07:38] (PS1) Hashar: contint: localhost.mediawiki vhost on ci labs slave [operations/puppet] - https://gerrit.wikimedia.org/r/135529
[09:09:11] Cannot alias Apache_module[mediawiki_apache_mod_rewrite] to ["rewrite"] at /etc/puppet/modules/contint/manifests/localvhost.pp:38; resource ["Apache_module", "rewrite"] already defined
[09:09:14] I am doooomed
[09:09:24] (CR) QChris: [C: 1] Remove more noise. [operations/puppet] - https://gerrit.wikimedia.org/r/134984 (owner: Dr0ptp4kt)
[09:10:24] * qchris offers a chicken to hashar. Sacrificing that will undoom you.
[09:11:18] I can't believe how much time I am wasting with puppet
[09:11:28] (PS2) Giuseppe Lavagetto: puppet_compiler: add ferm rule to allow web access [operations/puppet] - https://gerrit.wikimedia.org/r/135050
[09:11:28] :-)
[09:12:24] <_joe_> hashar: it's all time I save
[09:12:43] <_joe_> since I don't have to keep consistency around by hand :)
[09:13:42] _joe_: you didn't reply to my comment on the ferm rule patch
[09:14:12] <_joe_> matanya: wait a second, the answer arrives :)
[09:14:21] :)
[09:15:06] (CR) Giuseppe Lavagetto: puppet_compiler: add ferm rule to allow web access (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/135050 (owner: Giuseppe Lavagetto)
[09:21:19] (Abandoned) Hashar: contint: localvhost requires mod_rewrite [operations/puppet] - https://gerrit.wikimedia.org/r/135527 (owner: Hashar)
[09:21:28] (PS2) Hashar: contint: localhost.mediawiki vhost on ci labs slave [operations/puppet] - https://gerrit.wikimedia.org/r/135529
[09:23:24] (PS3) Hashar: contint: localhost.mediawiki vhost on ci labs slave [operations/puppet] - https://gerrit.wikimedia.org/r/135529
[09:26:19] (CR) Hashar: [C: 1 V: 2] "cherry picked on labs :)" [operations/puppet] - https://gerrit.wikimedia.org/r/135529 (owner: Hashar)
[09:47:34] computer should have a built-in feature to prevent us from copy pasting code
[09:55:50] lunch
[09:56:50] (PS1) QChris: Add hcatalog-core to hive path [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/135539
[09:59:12] (CR) QChris: "I guess we cannot force other consumers of this repo to also" [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/135539 (owner: QChris)
[10:00:16] (CR) JanZerebecki: "Do you mean the cpu load on the servers nginx terminates SSL on might get too high?" [operations/puppet] - https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: JanZerebecki)
[10:31:25] (PS1) Filippo Giunchedi: ship {texvc,texvccheck} via mediawiki-math-texvc [operations/puppet] - https://gerrit.wikimedia.org/r/135544
[10:32:41] (CR) Filippo Giunchedi: "probably ocaml and make can be removed, not sure if something else uses those though" [operations/puppet] - https://gerrit.wikimedia.org/r/135544 (owner: Filippo Giunchedi)
[11:29:11] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:29:21] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:34:35] mhh there are many /usr/bin/python /srv/deployment/reporter/reporter/report.py spawned by apache on tungsten
[11:40:01] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.012 second response time
[11:40:11] RECOVERY - gdash.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 8758 bytes in 0.016 second response time
[11:40:15] !log restart apache2 on tungsten, many report.py hung
[11:40:20] Logged the message, Master
[12:05:18] looks like they were stuck fetching xml from mwprof, some were quite old too
[12:06:00] <_joe_> godog: why are they called from apache2?
[12:07:05] _joe_: it is aliased to performance.wikimedia.org/profiler/report
[12:07:31] <_joe_> oh yeah, report.py
[12:07:37] <_joe_> sorry
[12:08:06] <_joe_> well if that's still supported, we should include a timeout for calls to mwprof
[12:08:32] <_joe_> (or, make it a cronjob and not make it run on demand)
[12:08:39] <_joe_> (or both)
[12:12:57] ori: ^
[12:18:49] (CR) Hashar: "I am nitpicking but would you mind prefixing the groups with 'contint' instead of 'jenkins'? Jenkins is only one brick." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/134739 (owner: Dzahn)
[12:19:25] !log Created SecurePoll tables on zerowiki, legalteamwiki, zhwikivoyage, viwikivoyage, tyvwiki
[12:19:31] Logged the message, Master
[12:37:18] (CR) Ottomata: "Hm, ok, but let's parameterize it then. Make a parameter that defaults to undef for this property on the cdh4::hive class, and then only " [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/135539 (owner: QChris)
[12:51:50] (PS1) Phuedx: Disable the anonymous signup invite experiment [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135560
[13:52:18] (PS1) Cmjohnson: adding bonded ports to labstore1001 [operations/puppet] - https://gerrit.wikimedia.org/r/135568
[13:53:51] coren please review that ^
[13:54:04] On it
[13:54:59] cmjohnson1: Is it just me or is this mixed in with a decommission?
[13:55:18] it is mixed...i don't know where that happened
[13:56:46] okay..forgot I was working on something last week
[13:58:26] (Abandoned) Cmjohnson: adding bonded ports to labstore1001 [operations/puppet] - https://gerrit.wikimedia.org/r/135568 (owner: Cmjohnson)
[14:00:12] (PS1) Cmjohnson: adding bonded port labstore1001 [operations/puppet] - https://gerrit.wikimedia.org/r/135571
[14:03:09] (CR) coren: [C: 1] "The change, she makes sense. (But needs to be applied during Friday's scheduled downtime)." [operations/puppet] - https://gerrit.wikimedia.org/r/135571 (owner: Cmjohnson)
[14:03:36] WMF, Chris.
[14:03:51] I take it it's physically wired now?
[14:04:01] coren: yes
[14:04:06] and nic is enabled
[14:05:02] cmjohnson1: ip links sees the electrons at the other end.
[14:05:13] So we be all good.
[14:05:44] cool...it's yours to merge whenever ready
[14:52:28] James_F, jackmcbarn, aharoni, legoktm, JohnLewis, anomie: SWAT in 10 minutes. Please confirm you're here and ready to check your respective patches.
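_joe_'s suggestion at 12:08:06 of a timeout for calls to mwprof can be sketched in Python, which is what report.py runs under. This assumes report.py reads its XML over a plain TCP connection until the server closes it; the function name, host, and port are hypothetical. With a socket timeout set, a stalled collector makes the call raise `socket.timeout` instead of leaving an Apache child hanging indefinitely.

```python
import socket


def fetch_profile_xml(host, port, timeout=5.0):
    """Read everything a (hypothetical) mwprof-style TCP endpoint sends
    until it closes the connection.

    Every blocking connect/recv gives up after `timeout` seconds and
    raises socket.timeout, so a stalled collector cannot hang the caller
    forever. The limit applies per call, not to the whole transfer.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```

Because the timeout bounds each blocking call rather than the total runtime, a collector that trickles bytes slowly could still keep the request open. Running the report from cron, the other option _joe_ raised, avoids tying that cost to web requests.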
[14:52:31] anomie: I'm here
[14:52:36] anomie: here
[14:52:41] anomie: here
[14:53:41] (PS3) Hashar: planet: Add Mike Linksvayer to the English Planet [operations/puppet] - https://gerrit.wikimedia.org/r/135526 (owner: Nemo bis)
[14:54:23] anomie: Yes.
[14:57:42] * anomie hopes Jenkins isn't slow this morning, we have a lot of patches for SWAT
[14:58:47] Shalom
[14:58:52] * Nemo_bis runs to comment "recheck" on a hundred patches
[15:00:14] James_F: Ok, let's start with the VE regression fix
[15:00:23] Cool.
[15:01:33] anomie: I'm here
[15:01:59] legoktm: Good. We'll do your fatal fixes (and mine too) after the VE fix.
[15:02:32] sounds good
[15:03:37] Then jackmcbarn's message fix, then aharoni's ULS fix, then JohnLewis's config change.
[15:03:51] kk
[15:04:06] Busy morning.
[15:04:09] anomie: Sounds good, best for last? ;)
[15:04:22] Something to do with all the SWATs for yesterday needed to be done now, I suppose.
[15:04:50] JohnLewis: Bug fixes before config changes ;)
[15:05:41] anomie: Can we say best for last to give me a confidence boost? :p
[15:05:46] !log anomie synchronized php-1.24wmf6/extensions/VisualEditor/modules/ve-mw/ 'SWAT: Fix for VisualEditor image alignment regression [[gerrit:135171]]'
[15:05:47] JohnLewis: Sure
[15:05:48] James_F: ^ Test please
[15:05:49] Logged the message, Master
[15:07:05] * James_F waits for bits to purge.
[15:08:25] anomie: Looks to be working. Thanks.
[15:08:31] James_F: Thanks
[15:13:53] !log anomie synchronized php-1.24wmf5/includes/revisiondelete/ 'SWAT: Revert another visibility change that causes fatal errors [[bugzilla:65733]] [[gerrit:135388]]'
[15:13:57] legoktm: ^ Check wmf5 wikis, please
[15:13:58] Logged the message, Master
[15:14:21] * legoktm finds something to delete
[15:15:17] anomie: works!
[15:15:29] 15:15, 27 May 2014 Legoktm (talk | contribs | block) changed visibility of a revision on page File:A Theory of Everything.jpg: content hidden (Orphaned non-free file(s) deleted per F5) (more...)
[15:15:49] !log anomie synchronized php-1.24wmf6/includes/revisiondelete/ 'SWAT: Revert another visibility change that causes fatal errors [[bugzilla:65733]] [[gerrit:135389]]'
[15:15:53] legoktm: Good! Now check wmf6 please
[15:15:54] Logged the message, Master
[15:18:46] anomie: (change visibility) 15:18, 27 May 2014 Legoktm (talk | contribs | block) changed visibility of a revision on page File:Iiiii.jpeg: content hidden (change visibility) on testwiki
[15:18:51] thanks!
[15:21:10] !log anomie synchronized php-1.24wmf6/includes/HistoryBlob.php 'SWAT: Revert another visibility change that causes errors [[bugzilla:65665]] [[gerrit:135574]]'
[15:21:13] anomie: ^ Check please
[15:21:15] Logged the message, Master
[15:21:20] anomie: Looks good
[15:23:08] halloo Nikerabbit
[15:25:42] anomie: crap, V-1 on one of mine
[15:25:52] aharoni: iltaa
[15:26:26] 15:23:24 ERROR: Couldn't find any revision to build. Verify the repository and branch configuration for this job.
[15:26:36] jackmcbarn: Looks like Jenkins screwed up, let me try merging it again
[15:27:13] !log anomie synchronized php-1.24wmf5/includes/Title.php 'SWAT: Check correct message in catego
[15:27:17] jackmcbarn: ^ Test on wmf5, please
[15:27:18] Logged the message, Master
[15:27:37] thanks for your swating this morning, you were/are busy
[15:28:02] so much for the "max 8 patches", eh? ;)
[15:28:16] anomie: works
[15:28:17] greg-g: I think we're right at 8, actually.
[15:28:44] jackmcbarn: Good! Looks like Jenkins found the revision for the wmf6 version this time, too
[15:29:01] I count 9, but splitting hairs :)
[15:29:04] legoktm: can you G6 delete [[Category:Id1f03240c203f32a12953f49a075cfd5c25f0f31]] and [[Category:Id1f03240c203f32a12953f49a075cfd5c25f0f31 moved]] before the WP:AN angry mob comes after me?
[15:29:07] greg-g: We were at 7 when I put my patches on, andrewbogott added 1 after me :p
[15:29:13] :)
[15:29:16] anomie ^ I should say
[15:29:24] yeah
[15:29:28] jackmcbarn: doing
[15:30:05] greg-g: You're right. But a number are the same patch going to both wmf5 and wmf6
[15:30:20] (in which case I still miscounted)
[15:30:24] greg-g: If we have more, slap the person going 'don't be mean to others, schedule at once!' :p
[15:32:02] !log anomie synchronized php-1.24wmf6/includes/Title.php 'SWAT: Check correct message in category moving [[gerrit:135211]]'
[15:32:03] greg-g: And everyone patching extensions had their update-extension-in-core patches queued up, which speeds things up. And Jenkins isn't being too slow.
[15:32:06] Logged the message, Master
[15:32:07] jackmcbarn: ^ Test wmf6 please
[15:32:32] anomie: works
[15:34:36] hi anomie
[15:34:40] I see you started merging.
[15:34:49] aharoni: Yes, yours are going now
[15:35:36] I put what you taught me three weeks or so ago to good use. I think, you tell me :)
[15:36:18] aharoni: Yeah, you did them good
[15:36:25] \o/
[15:39:53] awesome
[15:40:01] !log anomie synchronized php-1.24wmf6/extensions/UniversalLanguageSelector/resources/ 'SWAT: Update ULS to fix beta feature [[gerrit:135310]]'
[15:40:05] aharoni: ^ Test wmf6 please
[15:40:05] Logged the message, Master
[15:40:13] looking
[15:42:03] !log anomie synchronized php-1.24wmf5/extensions/UniversalLanguageSelector/resources/ 'SWAT: Update ULS to fix beta feature [[gerrit:135535]]'
[15:42:07] Logged the message, Master
[15:42:09] aharoni: ^ wmf5 too
[15:42:18] ack, looking at that, too
[15:42:53] wmf5 is good
[15:43:44] hmmm
[15:43:49] commons is good, but not enwiki
[15:44:00] both are wmf5, aren't they?
[15:44:20] anomie: ^
[15:44:55] aharoni: Yes, both are wmf5 (until this afternoon)
[15:45:15] so, just a sec.
[15:45:28] mediawiki.org is wmf6, right? it works well.
[15:45:32] aharoni: "not enwiki" as in it doesn't seem to have the fix, or that it's broken in a new and different way?
[15:45:38] aharoni: Yes, mediawiki.org is wmf6
[15:45:48] doesn't seem to have the fix
[15:45:58] it doesn't seem to work.
[15:46:06] let me check whether the code appears...
[15:46:34] (CR) Anomie: [C: 2] Move-categorypages permission changes on fawiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135426 (https://bugzilla.wikimedia.org/65728) (owner: John F. Lewis)
[15:46:50] anomie: it works now
[15:46:52] (Merged) jenkins-bot: Move-categorypages permission changes on fawiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135426 (https://bugzilla.wikimedia.org/65728) (owner: John F. Lewis)
[15:47:01] maybe it needed a few moments to sync or something
[15:47:09] thanks, everything seems to be done
[15:47:27] aharoni: Good! Must have been caches or something
[15:48:29] !log anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: move-categorypages permission changes on fawiki [[gerrit:135426]]'
[15:48:32] JohnLewis: ^ Test please
[15:48:34] Logged the message, Master
[15:48:46] anomie: {{confirmed}}
[15:48:55] * anomie is done with SWAT!
[15:49:21] Thanks everyone
[15:49:38] Thanks anomie!
[15:51:09] Thanks anomie
[15:54:11] PROBLEM - Parsoid on wtp1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:55:41] PROBLEM - Parsoid on wtp1010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:57:31] PROBLEM - Parsoid on wtp1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:57:41] PROBLEM - Parsoid on wtp1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:57:59] <^d> Something up with graphite?
[15:58:51] PROBLEM - Parsoid on wtp1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:01] PROBLEM - Parsoid on wtp1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:31] PROBLEM - Parsoid on wtp1016 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:41] PROBLEM - Parsoid on wtp1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:01:01] PROBLEM - Parsoid on wtp1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:01:31] PROBLEM - Parsoid on wtp1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:02:01] PROBLEM - Parsoid on wtp1008 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:02:02] PROBLEM - Parsoid on wtp1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:02:08] this happened a few hours ago (05:42) and gwicke responded by restarting the parsoids. we should probably do that again. [16:02:31] PROBLEM - Parsoid on wtp1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:02:41] PROBLEM - Parsoid on wtp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:02:45] gwicke captured the logs from the previous surge, so i wouldn't be too worried about losing the ability to debug [16:03:01] PROBLEM - Parsoid on wtp1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:03:17] damn, it happens again [16:03:21] restarting parsoids.. 
[16:03:21] PROBLEM - Parsoid on wtp1014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:03:21] PROBLEM - Parsoid on wtp1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:03:22] PROBLEM - Parsoid on wtp1015 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:03:42] ahhhh [16:03:51] PROBLEM - Parsoid on wtp1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:04:11] PROBLEM - Parsoid on wtp1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:04:11] RECOVERY - Parsoid on wtp1015 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.009 second response time [16:04:11] RECOVERY - Parsoid on wtp1014 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.009 second response time [16:04:11] RECOVERY - Parsoid on wtp1022 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.057 second response time [16:04:13] Hi guys, I was wondering if I'm missing something about Varnish and $u = User::newFromName('Myuser', 'creatable'); $u->load(); /* setters... */ $u->saveSettings(); $u>setCookies(); [16:04:21] RECOVERY - Parsoid on wtp1016 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.011 second response time [16:04:21] RECOVERY - Parsoid on wtp1003 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.031 second response time [16:04:22] RECOVERY - Parsoid on wtp1002 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.010 second response time [16:04:22] RECOVERY - Parsoid on wtp1013 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.016 second response time [16:04:26] !log restarted parsoids after another surge in load [16:04:31] Logged the message, Master [16:04:31] RECOVERY - Parsoid on wtp1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.008 second response time [16:04:31] RECOVERY - Parsoid on wtp1010 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.013 second response time [16:04:31] RECOVERY - Parsoid on wtp1005 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.017 second response time [16:04:31] RECOVERY - Parsoid on wtp1021 is OK: 
HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.013 second response time [16:04:38] gwicke: parsoid quickdraw champ! [16:04:41] RECOVERY - Parsoid on wtp1004 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.004 second response time [16:04:41] RECOVERY - Parsoid on wtp1017 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.012 second response time [16:04:43] =P [16:04:51] RECOVERY - Parsoid on wtp1011 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.012 second response time [16:04:51] RECOVERY - Parsoid on wtp1008 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.003 second response time [16:04:51] RECOVERY - Parsoid on wtp1019 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.003 second response time [16:04:51] RECOVERY - Parsoid on wtp1020 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.016 second response time [16:04:51] RECOVERY - Parsoid on wtp1009 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.007 second response time [16:05:01] RECOVERY - Parsoid on wtp1007 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.004 second response time [16:05:01] RECOVERY - Parsoid on wtp1006 is OK: HTTP OK: HTTP/1.1 200 OK - 1112 bytes in 0.008 second response time [16:05:18] robh, we were just looking into last night's logs [16:06:10] (PS1) Jgreen: enable spamassassin SPF debug logging [operations/puppet] - https://gerrit.wikimedia.org/r/135579 [16:06:32] nothing definite yet though [16:06:54] This one is interesting [16:06:55] in both cases the spikes were quite sudden: http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Parsoid+eqiad&m=cpu_report&s=descending&mc=2&g=cpu_report [16:07:33] (PS1) Jforrester: Enable TemplateData GUI on Catalan Wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135580 (https://bugzilla.wikimedia.org/65785) [16:07:35] my current working theory is that the requests are coming from the job queue [16:07:38] (PS1) Jforrester: Remove outdated eswiki config disabling VE for anons
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135581 [16:07:45] renoirb: where are you doing that? seems a question for #wikimedia-dev [16:09:02] On my own deployment, but thought you are aware of things such as Varnish. More than devs, generally ! [16:09:06] likely some combination between a bug in parsoid & the job queue retrying hundreds of requests at once [16:09:10] (CR) Jgreen: [C: 2 V: 1] enable spamassassin SPF debug logging [operations/puppet] - https://gerrit.wikimedia.org/r/135579 (owner: Jgreen) [16:09:18] Thanks Nemo_bis [16:10:58] (CR) Yuvipanda: mongo: Support newer yaml style configuration (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/135499 (owner: Yuvipanda) [16:13:02] (PS2) Rush: grant cscott deployment access [operations/puppet] - https://gerrit.wikimedia.org/r/135418 (owner: Filippo Giunchedi) [16:13:21] (CR) Rush: [C: 1] "looks good change wise, seems approved and post 3 days to me per" [operations/puppet] - https://gerrit.wikimedia.org/r/135418 (owner: Filippo Giunchedi) [16:15:21] (CR) Krinkle: "Are we ready to allow this being enabled in production wikis?"
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135580 (https://bugzilla.wikimedia.org/65785) (owner: Jforrester) [16:15:58] (PS1) Nemo bis: Typofix [operations/puppet] - https://gerrit.wikimedia.org/r/135582 [16:16:57] (CR) Jforrester: "It's been in production for months (since February), on MediaWiki.org…" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135580 (https://bugzilla.wikimedia.org/65785) (owner: Jforrester) [16:17:49] (PS2) Nemo bis: Typofix [operations/puppet] - https://gerrit.wikimedia.org/r/135582 [16:20:04] (CR) Ori.livneh: mongo: Support newer yaml style configuration (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/135499 (owner: Yuvipanda) [16:24:05] (CR) Yuvipanda: mongo: Support newer yaml style configuration (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/135499 (owner: Yuvipanda) [16:24:41] (PS1) Rush: redeploy admin to fenari [operations/puppet] - https://gerrit.wikimedia.org/r/135583 [16:25:23] (CR) Rush: [C: 2 V: 2] "go" [operations/puppet] - https://gerrit.wikimedia.org/r/135583 (owner: Rush) [16:42:06] (CR) Dzahn: [C: 2] Add README [operations/puppet] - https://gerrit.wikimedia.org/r/135157 (owner: Ori.livneh) [16:43:16] mutante: \o/ https://github.com/wikimedia/operations-puppet [16:44:55] ori: ah, yea :) i figured they include it in a special way [16:45:17] ori: have you seen the new labs-vagrant logo? :) http://pagemigration.wmflabs.org/ [16:45:30] (CR) Ori.livneh: mongo: Support newer yaml style configuration (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/135499 (owner: Yuvipanda) [16:45:43] btw, inside the puppet modules it should be README.md [16:45:51] for puppet doc etc [16:45:52] YuviPanda: haha! [16:46:04] md for markdown [16:46:09] mutante: it can be README.md in the top-level too, github will render it as markdown [16:46:32] e.g.
https://github.com/wikimedia/mediawiki-vagrant [16:46:37] ori: ah, ok [16:46:42] ori: can't really move it to an inline rb file since none exist :P [16:46:47] or README.mediawiki [16:46:48] YuviPanda: you can add one [16:47:09] ori: hmm, can I call ordered_json from there templates as well? [16:47:14] yes [16:47:17] we have such a symlink in mw/core (README.mediawiki -> README [16:47:27] ori: alright. I like that since it hides the ugliness some more :) [16:47:59] YuviPanda: https://github.com/wikimedia/operations-puppet/blob/production/modules/statsd/templates/localConfig.js.erb#L12 [16:48:24] YuviPanda: note slightly different calling convention (scope.function_ordered_json([@config])) [16:48:45] ori: ah, right. cool. I can't probably use merge even there, will have to use ruby's version of is_hash I guess [16:48:49] (CR) Dzahn: [C: 2] "requested by Stu" [operations/puppet] - https://gerrit.wikimedia.org/r/135525 (owner: Odder) [16:49:32] YuviPanda: i'd add a recursive merge func in the erb [16:49:41] ori: but arrays! [16:49:43] err, lists! [16:49:51] you don't need a general solution, you just need one that works here [16:50:35] isn't the is_hash one (when moved to erb) specific enough for this situation? :) [16:50:36] hashar: you saw the admin lint related changes got merged up to the ones you had -2ed..etc? [16:51:01] YuviPanda: yeah, fine. i wish it were more elegant, but maybe that's best [16:51:06] (PS4) Dzahn: planet: Add Mike Linksvayer to the English Planet [operations/puppet] - https://gerrit.wikimedia.org/r/135526 (owner: Nemo bis) [16:51:08] mutante: yup and deployed the Jenkins job yesterday iirc.
I posted an announcement on ops list (though I forgot to thank you in that email) [16:51:28] hashar: awesome :) i should read mail first, heh [16:52:42] (CR) Dzahn: [C: 2] planet: Add Mike Linksvayer to the English Planet [operations/puppet] - https://gerrit.wikimedia.org/r/135526 (owner: Nemo bis) [16:57:01] mutante: and thanks to both of you :] [17:04:15] (CR) Filippo Giunchedi: send user and channel count to statsd for ircd (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/135074 (owner: Rush) [17:14:04] chasemp, mutante: hi, i want to move files/sudo/sudoers.appserver to the new admin.yaml is this a needed work, don't want to step on toes [17:15:03] so I was looking at that, I'm unsure how that should look actually [17:15:15] most of it will be moved out, some of it is service account privs [17:15:29] and that isn't handled in the new scheme atm on purpose, so it will be a mixed bag [17:15:56] and somewhat complex as some of those perms are overlapping I believe and unclear as to the whom with an actual deployment group and not everyone stuffed into a general mortals
.hold on :) [17:20:13] mutante: I believe this was hosing up fenari https://gerrit.wikimedia.org/r/#/c/135589/ [17:20:23] that should fix it and is needed anyways for consolidation [17:20:42] matanya: i have something completely unrelated.. [17:20:58] i'm listening [17:21:48] matanya: maintenance crons, i recently added the /var/log/mediawiki. now we want all crons to log there, instead of randomly having some in /home/mwdeploy , some in /tmp and so on [17:22:05] so i restored Reedys old change [17:22:14] and it needs rebasing [17:22:22] (03PS2) 10Rush: remove dupe embedded user/group management [operations/puppet] - 10https://gerrit.wikimedia.org/r/135589 [17:22:31] matanya: https://gerrit.wikimedia.org/r/#/c/83574/ [17:22:32] patch link ? [17:22:38] (03CR) 10Rush: [C: 032 V: 032] "self reviewing as dupe logic, and I need to see if this is _all_ the duplicate logic" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135589 (owner: 10Rush) [17:22:39] ha [17:23:35] chasemp: ah:) and i was looking in that second [17:23:55] hey I'm sorry I didn't mean to jump too soon, wasn't like that at all [17:24:16] I figured duplicate logic can be removed for certain, no logic change [17:24:17] no, i just didnt want to mess with that before the long weekend [17:24:22] no doubt [17:24:32] took me far too long to track this down [17:24:42] but I'm glad I did because I discovered a bunch of nested users and groups assignments [17:24:43] so it was that and caesium [17:24:45] that need to be rooted out [17:24:48] all the others worked fine [17:25:14] i see. yea [17:25:27] role/mediawiki including admins ... 
[17:25:52] expected mortals to already handle that [17:26:30] yeah there were a few levels of duplicate logic that got circular included [17:26:39] now that it's a sin, it was a pita to find [17:27:01] nods [17:28:50] cscott: notice: /Stage[main]/Accounts::Cscott/Ssh_authorized_key[cananian@skiffserv]/ensure: created [17:29:07] !log welcome new deployer cscott [17:29:12] Logged the message, Master [17:29:18] <^d> Something up with graphite? [17:29:23] whoop whoop [17:29:29] you have an account on tin now, you should talk to another deployer [17:29:32] greg-g: ^:) [17:29:53] mwalker|away is going to hold my hand to deploy ocg [17:30:05] presumably not until he's mwalker, not mwalker|away ;) [17:30:05] alright, perfect [17:30:12] hehe, yea [17:30:45] i'll also be a backup deployer for parsoid, but at the moment subbu and gwicke seem to have that well in hand [17:31:20] greg-g: around? [17:31:39] ^d: seems up, problems? [17:32:45] chasemp: mediawiki data isn't going in; mwprof might be hung [17:32:54] mwprofctl status / mwprofctl restart [17:33:10] would anyone care if i quickly jump in and deploy https://gerrit.wikimedia.org/r/#/c/135592/ ? [17:33:10] ori: ah, thank you! [17:33:13] mutante: i looked into it [17:33:18] so it's on test.wikidata first [17:33:26] chasemp: np, thanks for looking [17:33:30] it will be faster and nicer from scratch, i'll do it [17:33:57] mwprof thinks it was running ok, but I restarted it [17:34:24] * aude waits a few minutes [17:36:36] definitely looks like some large portion of metrics dropped out [17:36:37] https://graphite.wikimedia.org/render/?width=1027&height=485&_salt=1400076727.454&from=-3d&target=carbon.relays.tungsten-a.metricsReceived&target=secondYAxis%28sum%28carbon.relays.*.destinations.*.fullQueueDrops%29%29 [17:36:45] and there are not drops to speak of [17:37:05] so either mwprof was choking or they aren't getting there or something strange [17:37:16] <^d> chasemp: Ah, might be what ori said :) [17:37:17] ori: no greg-g ? 
[17:37:22] <^d> Yeah, looking at mw stuff. [17:37:30] hey, what's up, sorry [17:37:33] ah, [17:37:45] i'd like to deploy https://gerrit.wikimedia.org/r/#/c/135592/ to test.wikidata [17:37:52] cscott, deploying ocg eh? [17:37:52] before wikidata gets on wmf6 [17:38:07] tiny fix but fixes a bunch of js issues [17:38:08] mwalker: now *able* to deploy ocg ;) [17:38:09] aude: ok, Reedy's your man [17:38:15] ok [17:38:42] no one else seems to be doing anything now and nothign on calendar for now [17:39:04] always best to verify on test.wikidata first when possible :) [17:39:14] greg-g: common deploy today, right? we're pretty certain that the square thumbs patch will have no effect, but i'm online in case. [17:39:22] cscott: yes [17:39:57] cscott: awesome, thank you sir [17:40:08] aude: oh, I see, wanna do it? [17:40:28] yes [17:40:28] greg-g: did your task fire? my 'at' job hasn't yet -- it's set for teatime today. ;) [17:40:45] will take no more than couple minutes (fast as jenkins allows) [17:40:46] it's well past tea time [17:40:56] not edt, nor pdt [17:41:30] there are more than two timezones in the world [17:42:04] yes, but i was on pdt when i entered the 'at' command, and my laptop is now on edt (when the command should trigger) [17:42:27] cscott: it just yells at me that it's due today whenever I do a "task list" [17:42:30] recapping, the command in question was: [17:42:36] cscott@x201s:~$ echo "echo talk to greg-g | wall" | at teatime next tue [17:43:03] vs greg-g's: [17:43:04] greg@x200s:~$ task add "check in with cscott re status of thumbnail pregening" proj:wmf.deploy pri:H due:tuesday [17:43:30] the proj: and pri: are optional (and really due: is as well, but what's the point) [17:44:39] greg-g: last time i was giving the author of 'task' a hard time because the command didn't use GNU getopt: task --add "foo" --proj=wmf.deploy --pri=H --due=tuesday [17:44:58] bah, too many dashes [17:44:59] or, presumably in the GNU spirit: task -afoo -pwmf.deploy -PHdT 
[17:46:19] much less readable! wwsd? [17:46:40] !log aude synchronized php-1.24wmf6/extensions/Wikidata 'JS fixes for Wikidata' [17:46:40] done [17:46:40] Logged the message, Master [17:47:43] test.wikidata is much better [17:48:01] * aude eager for wikidata to get these, but shall wait [17:54:12] aude: glad to hear it worked [18:09:15] (PS1) Dzahn: include vs. require wikidev in releasers group [operations/puppet] - https://gerrit.wikimedia.org/r/135596 [18:10:19] (PS1) Dzahn: Revert "caesium - revert yaml admin include" [operations/puppet] - https://gerrit.wikimedia.org/r/135598 [18:11:14] (CR) Dzahn: [C: 2] include vs. require wikidev in releasers group [operations/puppet] - https://gerrit.wikimedia.org/r/135596 (owner: Dzahn) [18:11:32] andrewbogott: adding these users to the wikidev group would be adequate for my purposes too [18:11:54] ori: So, I tried an experiment with mwalker, changing his uid to 500. That went fairly well... [18:12:08] it meant that he could no longer read his own files, but he had the prives to chgrp them so that he could read them. [18:12:22] So, that's OK for a one-off but too ugly to apply widely [18:12:42] but this is about changing gid [18:12:54] um, yes, sorry, typo [18:12:54] omg [18:12:58] s/uid/gid/ [18:13:14] what about just adding the users to the wikidev group? [18:13:18] or even better [18:13:34] changing the gid to 500, but adding membership in 550 [18:13:42] andrewbogott, I don't know if you were in -operations on thurs when we discovered an ugly problem -- there are hosts that didn't get the new config and I was forbidden from ssh'ing into them because my authorized_keys file was owned incorrectly [18:14:08] but for some reason; in general it worked OK -- so I was able to login to tin and what not [18:14:13] mwalker: what hosts for instance?
[18:14:25] That… shouldn't be possible, everything derives from ldap [18:14:26] all hosts [18:15:02] andrewbogott, osmium and mw1151 [18:15:13] wait, you're talking about production now? [18:15:14] osmium has puppet disabled atm [18:15:23] ah; but it wasn't the gid that was the problem; "osmium,mw1151 fixed UID of mwalker (605->2454)" [18:15:24] mw1151 too, possibly [18:15:24] mwalker: I think we must be talking about two unrelated things [18:15:50] aye; that problem was the uid change you did -- so I still don't have any blocking problems with the gid change in labs [18:16:00] * mwalker goes back into hiding [18:16:18] andrewbogott: so what about the suggestion above? [18:16:32] mwalker: any boxes without puppet are… well, them being broken in any and all ways comes as no surprise :) [18:16:57] ori: I think it's a good suggestion and also I don't know how to do it. Need to research [18:17:11] that is, I know where the primary group is stored but not how group membership is structured in ldap [18:17:39] (CR) Chad: [C: 2] Allow to add/remove the autoreview group on mediawiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134994 (owner: Legoktm) [18:17:49] andrewbogott: kk, can i ask you to prioritize it? i'm cleaning hacks and workarounds from the mediawiki puppetization and this is one [18:17:59] sure, ok. [18:18:11] thank you [18:18:17] ori, you don't by chance already know the answer to ^^ ?
[18:18:59] nope, sorry [18:19:01] (CR) Dzahn: [C: 2] "revert revert = enable, try again now without requiring wikidev" [operations/puppet] - https://gerrit.wikimedia.org/r/135598 (owner: Dzahn) [18:20:49] (PS2) Dzahn: Revert "caesium - revert yaml admin include" [operations/puppet] - https://gerrit.wikimedia.org/r/135598 [18:21:19] (Merged) jenkins-bot: Allow to add/remove the autoreview group on mediawiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134994 (owner: Legoktm) [18:22:34] (CR) Dzahn: [C: 2] Revert "caesium - revert yaml admin include" [operations/puppet] - https://gerrit.wikimedia.org/r/135598 (owner: Dzahn) [18:23:14] !log demon synchronized wmf-config/InitialiseSettings.php [18:23:20] Logged the message, Master [18:23:50] thanks [18:27:36] Hi, I wan't to proceed too a mass upload with the GWToolset [18:28:09] tounoki: IIRC it's temporarily disabled because of some issues; gi11es can confirm or deny I think [18:28:51] what's is IIRC ? [18:29:19] tounoki: If I Recall Correctly [18:31:29] marktraceur: only GWTollset is disabled ? [18:31:33] !log reedy synchronized php-1.24wmf6/includes/SkinTemplate.php [18:31:37] Logged the message, Master [18:31:54] (CR) Chad: [C: 2 V: 2] Update hooks-bugzilla to 5edd392d926daaa58917b1c8bb174cdb022e4c76 [operations/gerrit/plugins] - https://gerrit.wikimedia.org/r/133732 (https://bugzilla.wikimedia.org/65370) (owner: QChris) [18:31:56] I think so.
[18:32:04] tounoki: I'm not totally sure, but...sec [18:32:47] (PS1) Dzahn: remove wikidev requirement from releases role [operations/puppet] - https://gerrit.wikimedia.org/r/135602 [18:33:00] hum, sorry, my file is on my other hard drive :'( [18:33:20] tounoki: I lied, it's not disabled [18:33:37] tounoki: The official line is to not use it for large files [18:34:18] (PS1) Reedy: Non Wikipedias to 1.24wmf6 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135603 [18:34:37] ori: Just as a second opinion… can you think of a reason why /any/ user in ldap should have a primary gid of 550? [18:34:49] marktraceur: yes but the goal of our project is to use it with big files and we work on some issues in order to do that [18:35:24] tounoki: Oh, you're here to offer help? [18:35:31] * marktraceur can dig up bugs for you [18:36:58] marktraceur: see https://bugzilla.wikimedia.org/show_bug.cgi?id=63864 [18:37:00] (CR) Reedy: [C: 2] Non Wikipedias to 1.24wmf6 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135603 (owner: Reedy) [18:37:08] (Merged) jenkins-bot: Non Wikipedias to 1.24wmf6 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135603 (owner: Reedy) [18:37:09] tounoki: I was about to point you at https://bugzilla.wikimedia.org/show_bug.cgi?id=65217 [18:37:50] But I think I've seen what you're talking about [18:38:23] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.24wmf6 [18:38:28] Logged the message, Master [18:39:48] !log reedy synchronized docroot and w [18:39:52] Logged the message, Master [18:39:57] marktraceur: thx, I'm reading... [18:40:46] bd [18:46:40] marktraceur: I don't understand everything / does it mean we have to wait fixing this bug to upload large file with GWToolset ? [18:46:57] andrewbogott: 550 is wikidev or svn? [18:47:02] svn [18:48:53] tounoki: Definitely.
[18:49:01] At the very least mitigating its effects [18:49:13] DoSing Commons is not something we really want to do... [18:55:56] (PS2) Dzahn: remove wikidev requirement from releases role [operations/puppet] - https://gerrit.wikimedia.org/r/135602 [18:56:05] andrewbogott: no reason at all [18:56:13] ori: excellent [18:57:05] (CR) Dzahn: [C: 2] remove wikidev requirement from releases role [operations/puppet] - https://gerrit.wikimedia.org/r/135602 (owner: Dzahn) [18:57:09] w/in 6 [19:01:51] (CR) Rush: [C: 1] "seems reasonable to me in regards to a new group, let's do it here" [operations/puppet] - https://gerrit.wikimedia.org/r/135134 (owner: Dzahn) [19:02:32] (PS3) Dzahn: remove wikidev requirement from releases role [operations/puppet] - https://gerrit.wikimedia.org/r/135602 [19:03:56] (PS1) Cmjohnson: changing dhcpd for mw1163 [operations/puppet] - https://gerrit.wikimedia.org/r/135608 [19:05:03] (PS1) RobH: rcs1001-rcs1002 dns setup [operations/dns] - https://gerrit.wikimedia.org/r/135609 [19:08:42] (CR) Cmjohnson: [C: 2] changing dhcpd for mw1163 [operations/puppet] - https://gerrit.wikimedia.org/r/135608 (owner: Cmjohnson) [19:08:48] (CR) RobH: [C: 2] rcs1001-rcs1002 dns setup [operations/dns] - https://gerrit.wikimedia.org/r/135609 (owner: RobH) [19:10:21] (PS1) RobH: missing trailing period on last change, opps! [operations/dns] - https://gerrit.wikimedia.org/r/135610 [19:10:25] goddamnit [19:10:45] (CR) RobH: [C: 2 V: 2] missing trailing period on last change, opps! [operations/dns] - https://gerrit.wikimedia.org/r/135610 (owner: RobH) [19:16:21] <^d> mutante: On solr100[1-3] again. Where does racktables say they're located? Shouldn't matter much as we've spread elastic over 3 rows anyway, just curious.
[19:21:50] (PS3) Krinkle: Move rcstream server implementation to external repo [operations/puppet] - https://gerrit.wikimedia.org/r/132429 (owner: Ori.livneh) [19:21:55] (CR) Krinkle: [C: 1] Move rcstream server implementation to external repo [operations/puppet] - https://gerrit.wikimedia.org/r/132429 (owner: Ori.livneh) [19:25:29] ^d: solr1001 - eqiad row A,rack A6 - solr 1002 - rack A7, solr 1003 - row B, rack B6 [19:27:13] (PS1) Rush: admin yaml to dataset hosts [operations/puppet] - https://gerrit.wikimedia.org/r/135613 [19:28:44] (CR) jenkins-bot: [V: -1] admin yaml to dataset hosts [operations/puppet] - https://gerrit.wikimedia.org/r/135613 (owner: Rush) [19:30:10] <^d> mutante: thx! [19:30:33] (PS1) Rush: admin yaml for dbstore hosts [operations/puppet] - https://gerrit.wikimedia.org/r/135614 [19:32:02] (CR) jenkins-bot: [V: -1] admin yaml for dbstore hosts [operations/puppet] - https://gerrit.wikimedia.org/r/135614 (owner: Rush) [19:38:42] (PS1) Dzahn: still require the wikidev resource but not class [operations/puppet] - https://gerrit.wikimedia.org/r/135615 [19:38:58] (CR) jenkins-bot: [V: -1] still require the wikidev resource but not class [operations/puppet] - https://gerrit.wikimedia.org/r/135615 (owner: Dzahn) [19:39:57] (PS2) Dzahn: still require the wikidev resource but not class [operations/puppet] - https://gerrit.wikimedia.org/r/135615 [19:40:12] (CR) jenkins-bot: [V: -1] still require the wikidev resource but not class [operations/puppet] - https://gerrit.wikimedia.org/r/135615 (owner: Dzahn) [19:40:25] (PS3) Dzahn: still require the wikidev resource but not class [operations/puppet] - https://gerrit.wikimedia.org/r/135615 [19:42:29] (CR) Dzahn: [C: 2] still require the wikidev resource but not class [operations/puppet] - https://gerrit.wikimedia.org/r/135615 (owner: Dzahn) [19:46:33] (PS3) MaxSem: Kill GeoData Solr, decom servers
[operations/puppet] - https://gerrit.wikimedia.org/r/133886 [20:04:04] chasemp: There are still a few (three or four?) users who I've been unable to reach about their UIDs. I'm guess that if/when I need to change them I should now make the chance in json rather than in admins.pp? [20:04:11] Or should I just stand down until things stabilize? [20:05:12] (PS1) BryanDavis: Revert "Labs: Add deployment related sudoer rules for svn group" [operations/puppet] - https://gerrit.wikimedia.org/r/135622 [20:05:45] (PS1) MaxSem: Completely remove misc::maintenance::geodata [operations/puppet] - https://gerrit.wikimedia.org/r/135623 [20:05:56] andrewbogott: could either change it in both if they are migrated or wait [20:06:03] I haven't been looking back to see on UID's? [20:06:10] but either way is cool w/ me [20:06:16] chasemp: at the moment I'm waiting anyway since I can't contact people :) [20:06:28] I'll just stop trying until you're ready for phase 2 [20:06:42] (CR) BryanDavis: "Cherry-picked in beta for testing."
[operations/puppet] - https://gerrit.wikimedia.org/r/135622 (owner: BryanDavis) [20:08:04] (PS2) BryanDavis: Revert "Labs: Add deployment related sudoer rules for svn group" [operations/puppet] - https://gerrit.wikimedia.org/r/135622 (https://bugzilla.wikimedia.org/65548) [20:08:12] (PS4) MaxSem: Kill GeoData Solr, decom servers [operations/puppet] - https://gerrit.wikimedia.org/r/133886 [20:10:55] (PS1) Dzahn: add new yaml groups for release uploaders [operations/puppet] - https://gerrit.wikimedia.org/r/135624 [20:12:03] (PS1) Rush: admin yaml for dobson [operations/puppet] - https://gerrit.wikimedia.org/r/135625 [20:12:05] (PS1) Rush: admin yaml for ekrem [operations/puppet] - https://gerrit.wikimedia.org/r/135626 [20:12:07] (PS1) Rush: admin yaml for dysprosium.eqiad.wmnet [operations/puppet] - https://gerrit.wikimedia.org/r/135627 [20:12:09] (PS1) Rush: admin yaml for eeden.esams.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/135628 [20:12:11] (PS1) Rush: admin yaml for tarin.pmtpa.wmnet [operations/puppet] - https://gerrit.wikimedia.org/r/135629 [20:12:13] (PS1) Rush: admin yaml for erbium.eqiad.wmnet [operations/puppet] - https://gerrit.wikimedia.org/r/135630 [20:12:15] (PS1) Rush: admin yaml for es([569]|10)\.pmtpa\.wmnet/ [operations/puppet] - https://gerrit.wikimedia.org/r/135631 [20:12:17] (PS1) Rush: admin yaml for labstore100[12]\.eqiad\.wmnet/ [operations/puppet] - https://gerrit.wikimedia.org/r/135632 [20:12:19] (PS1) Rush: admin yaml for linne.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/135633 [20:12:21] aude: When you get a minute I'd love for you to try a `sudo -u mwdeploy ...` command on deployment-bastion to verify that the special svn group permissions are no longer needed. [20:13:08] (CR) Rush: "I think I'm just missing MW context, what do these ppl do that someone in teh deployment wouldn't do?"
[operations/puppet] - https://gerrit.wikimedia.org/r/135624 (owner: Dzahn) [20:13:47] (PS3) BryanDavis: Revert "Labs: Add deployment related sudoer rules for svn group" [operations/puppet] - https://gerrit.wikimedia.org/r/135622 (https://bugzilla.wikimedia.org/65548) [20:13:52] (PS2) Dzahn: add new yaml groups for release uploaders [operations/puppet] - https://gerrit.wikimedia.org/r/135624 [20:15:07] (CR) jenkins-bot: [V: -1] admin yaml for dobson [operations/puppet] - https://gerrit.wikimedia.org/r/135625 (owner: Rush) [20:15:09] (CR) jenkins-bot: [V: -1] admin yaml for ekrem [operations/puppet] - https://gerrit.wikimedia.org/r/135626 (owner: Rush) [20:15:11] (CR) jenkins-bot: [V: -1] admin yaml for dysprosium.eqiad.wmnet [operations/puppet] - https://gerrit.wikimedia.org/r/135627 (owner: Rush) [20:15:44] (CR) jenkins-bot: [V: -1] admin yaml for eeden.esams.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/135628 (owner: Rush) [20:15:46] (CR) jenkins-bot: [V: -1] admin yaml for tarin.pmtpa.wmnet [operations/puppet] - https://gerrit.wikimedia.org/r/135629 (owner: Rush) [20:16:10] (CR) jenkins-bot: [V: -1] admin yaml for erbium.eqiad.wmnet [operations/puppet] - https://gerrit.wikimedia.org/r/135630 (owner: Rush) [20:16:48] (CR) jenkins-bot: [V: -1] admin yaml for es([569]|10)\.pmtpa\.wmnet/ [operations/puppet] - https://gerrit.wikimedia.org/r/135631 (owner: Rush) [20:16:58] (CR) jenkins-bot: [V: -1] admin yaml for labstore100[12]\.eqiad\.wmnet/ [operations/puppet] - https://gerrit.wikimedia.org/r/135632 (owner: Rush) [20:17:10] (PS2) Rush: admin yaml for eeden.esams.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/135628 [20:17:12] (PS2) Rush: admin yaml for tarin.pmtpa.wmnet [operations/puppet] - https://gerrit.wikimedia.org/r/135629 [20:17:14] (PS2) Rush: admin yaml for erbium.eqiad.wmnet [operations/puppet] -
10https://gerrit.wikimedia.org/r/135630 [20:17:16] (03PS2) 10Rush: admin yaml for es([569]|10)\.pmtpa\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135631 [20:17:18] (03PS2) 10Rush: admin yaml for dobson [operations/puppet] - 10https://gerrit.wikimedia.org/r/135625 [20:17:20] (03PS2) 10Rush: admin yaml for dbstore hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135614 [20:17:22] (03PS2) 10Rush: admin yaml for ekrem [operations/puppet] - 10https://gerrit.wikimedia.org/r/135626 [20:17:24] (03PS2) 10Rush: admin yaml to dataset hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135613 [20:17:26] (03PS2) 10Rush: admin yaml for dysprosium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135627 [20:17:28] (03PS2) 10Rush: admin yaml for linne.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/135633 [20:17:30] (03PS2) 10Rush: admin yaml for labstore100[12]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135632 [20:17:38] (03CR) 10Dzahn: "deployers can deploy mediawiki on the cluster, they have a shell on tin, they can run all the deployment scripts and change what is runnin" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135624 (owner: 10Dzahn) [20:17:52] (03PS3) 10Rush: admin yaml to dataset hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135613 [20:18:16] (03CR) 10BryanDavis: "I purged /etc/sudoers.d/svn in the beta cluster using `salt '*' cmd.run 'rm /etc/sudoers.d/svn'` from deployment-salt. 
As soon as Katie, A" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135622 (https://bugzilla.wikimedia.org/65548) (owner: 10BryanDavis) [20:18:46] (03CR) 10Ori.livneh: "Thanks :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135622 (https://bugzilla.wikimedia.org/65548) (owner: 10BryanDavis) [20:20:04] (03CR) 10jenkins-bot: [V: 04-1] admin yaml for erbium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135630 (owner: 10Rush) [20:21:10] (03CR) 10jenkins-bot: [V: 04-1] admin yaml for es([569]|10)\.pmtpa\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135631 (owner: 10Rush) [20:22:40] (03CR) 10jenkins-bot: [V: 04-1] admin yaml for linne.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/135633 (owner: 10Rush) [20:23:43] (03CR) 10jenkins-bot: [V: 04-1] admin yaml for labstore100[12]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135632 (owner: 10Rush) [20:25:30] (03CR) 10Rush: [C: 032 V: 032] "go" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135613 (owner: 10Rush) [20:27:04] (03PS3) 10Rush: admin yaml for dbstore hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135614 [20:28:21] (03CR) 10Rush: [C: 032 V: 032] "go" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135614 (owner: 10Rush) [20:30:53] (03CR) 10Rush: [C: 031] "I'm already using gid 10 and probably 11 too ;) otherwise gtg" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135624 (owner: 10Dzahn) [20:37:49] (03PS3) 10Dzahn: add new yaml groups for release uploaders [operations/puppet] - 10https://gerrit.wikimedia.org/r/135624 [20:39:03] (03CR) 10Dzahn: [C: 032] add new yaml groups for release uploaders [operations/puppet] - 10https://gerrit.wikimedia.org/r/135624 (owner: 10Dzahn) [20:39:53] (03PS1) 10Rush: admin yaml dedup account logic for dataset hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135689 [20:40:00] (03PS2) 10Rush: admin yaml dedup account logic for dataset 
hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135689 [20:40:13] (03CR) 10Rush: [C: 032 V: 032] "fix broken puppet due to dupes on dataset" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135689 (owner: 10Rush) [20:42:40] (03PS3) 10Rush: admin yaml for dobson [operations/puppet] - 10https://gerrit.wikimedia.org/r/135625 [20:43:00] (03CR) 10Rush: [C: 032 V: 032] "go" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135625 (owner: 10Rush) [20:47:29] (03PS1) 10Dzahn: add releaser roles to node caesium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135690 [20:48:06] (03CR) 10Faidon Liambotis: [C: 04-1] "Thanks for embarking on this! I've been wanting to refactor the shit out of our syslogging for a really long time :)" (0312 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135447 (owner: 10Ori.livneh) [20:48:36] (03PS3) 10Rush: admin yaml for ekrem [operations/puppet] - 10https://gerrit.wikimedia.org/r/135626 [20:48:43] (03PS1) 10Yurik: zero update - all langs for 416-03 [operations/puppet] - 10https://gerrit.wikimedia.org/r/135691 [20:48:54] (03CR) 10Rush: [C: 032 V: 032] admin yaml for ekrem [operations/puppet] - 10https://gerrit.wikimedia.org/r/135626 (owner: 10Rush) [20:50:22] (03PS3) 10Rush: admin yaml for dysprosium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135627 [20:50:30] (03CR) 10Rush: [C: 032 V: 032] admin yaml for dysprosium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135627 (owner: 10Rush) [20:51:06] (03PS2) 10Dzahn: add releaser roles to node caesium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135690 [20:51:14] (03CR) 10Dzahn: [C: 032] add releaser roles to node caesium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135690 (owner: 10Dzahn) [20:51:58] (03PS3) 10Rush: admin yaml for eeden.esams.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/135628 [20:52:03] (03CR) 10Rush: [C: 032 V: 032] admin yaml for 
eeden.esams.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/135628 (owner: 10Rush) [20:53:05] (03PS3) 10Dzahn: add releaser roles to node caesium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135690 [20:53:14] (03CR) 10Dzahn: [C: 032] add releaser roles to node caesium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135690 (owner: 10Dzahn) [20:55:24] (03CR) 10Rush: [C: 032 V: 032] admin yaml for erbium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135630 (owner: 10Rush) [21:00:14] (03PS3) 10Rush: admin yaml for tarin.pmtpa.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135629 [21:00:20] (03CR) 10Rush: [C: 032 V: 032] admin yaml for tarin.pmtpa.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135629 (owner: 10Rush) [21:00:29] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Tue 27 May 2014 05:59:45 PM UTC [21:00:31] (03CR) 10Ori.livneh: "Many thanks for the review!" (039 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135447 (owner: 10Ori.livneh) [21:03:36] (03PS4) 10Dzahn: add releaser roles to node caesium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135690 [21:05:18] (03CR) 10Dzahn: [C: 032] add releaser roles to node caesium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135690 (owner: 10Dzahn) [21:05:29] (03PS3) 10Rush: admin yaml for erbium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135630 [21:05:31] (03PS3) 10Rush: admin yaml for es([569]|10)\.pmtpa\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135631 [21:05:33] (03PS3) 10Rush: admin yaml for linne.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/135633 [21:05:35] (03PS3) 10Rush: admin yaml for labstore100[12]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135632 [21:05:39] (03PS4) 10Rush: admin yaml for erbium.eqiad.wmnet [operations/puppet] - 
10https://gerrit.wikimedia.org/r/135630 [21:05:43] (03CR) 10Rush: [C: 032 V: 032] admin yaml for erbium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135630 (owner: 10Rush) [21:13:08] (03PS1) 10Rush: fix dupe user logic for erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135694 [21:13:33] (03CR) 10Rush: [C: 032 V: 032] "fix puppet on erbium" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135694 (owner: 10Rush) [21:15:51] (03PS4) 10Rush: admin yaml for es([569]|10)\.pmtpa\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135631 [21:15:56] (03CR) 10Rush: [C: 032 V: 032] admin yaml for es([569]|10)\.pmtpa\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135631 (owner: 10Rush) [21:19:50] (03PS4) 10Rush: admin yaml for labstore100[12]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135632 [21:19:54] (03CR) 10Rush: [C: 032 V: 032] admin yaml for labstore100[12]\.eqiad\.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135632 (owner: 10Rush) [21:21:55] (03PS4) 10Rush: admin yaml for linne.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/135633 [21:34:10] (03PS1) 10Rush: remove admin for labstore tmp [operations/puppet] - 10https://gerrit.wikimedia.org/r/135697 [21:34:24] (03CR) 10Rush: [C: 032 V: 032] remove admin for labstore tmp [operations/puppet] - 10https://gerrit.wikimedia.org/r/135697 (owner: 10Rush) [21:36:49] bd808: still works [21:37:21] aude: w00t. Wanna give a +1 on https://gerrit.wikimedia.org/r/#/c/135622/ ? 
[21:39:15] ok
[21:40:41] (CR) Aude: [C: 1] "works for me" [operations/puppet] - https://gerrit.wikimedia.org/r/135622 (https://bugzilla.wikimedia.org/65548) (owner: BryanDavis)
[21:46:17] (PS1) Gergő Tisza: Limit the number of expensive thumbnails processed at the same time [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135704 (https://bugzilla.wikimedia.org/65691)
[22:00:57] !log caesium, release files: changed file owner groups mwupld->releasers-mediawiki, mobileupld->releasers-mobile (to match switch to yaml groups)
[22:01:02] Logged the message, Master
[22:05:09] that switched the way permissions are handled on caesium
[22:05:13] also in yaml now
[22:05:24] permissions to upload mw releases i mean
[22:06:37] if uploaders report any unexpected issues you can send them to me, but fixed the files with find / exec
[22:07:16] mwupld is now called releasers-mediawiki, mobileupld is now called releasers-mobile
[22:09:07] (CR) Dzahn: [C: 2] "per Bug 63028 being resolved" [operations/puppet] - https://gerrit.wikimedia.org/r/135622 (https://bugzilla.wikimedia.org/65548) (owner: BryanDavis)
[22:09:22] (PS4) Dzahn: Revert "Labs: Add deployment related sudoer rules for svn group" [operations/puppet] - https://gerrit.wikimedia.org/r/135622 (https://bugzilla.wikimedia.org/65548) (owner: BryanDavis)
[22:12:10] (CR) Dzahn: [C: 2] Revert "Labs: Add deployment related sudoer rules for svn group" [operations/puppet] - https://gerrit.wikimedia.org/r/135622 (https://bugzilla.wikimedia.org/65548) (owner: BryanDavis)
[22:12:56] (CR) Dzahn: "yep, a workaround for a bug that is now resolved, thanks" [operations/puppet] - https://gerrit.wikimedia.org/r/135622 (https://bugzilla.wikimedia.org/65548) (owner: BryanDavis)
[22:16:48] (PS3) Dzahn: let reseacher group read researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/135134
[22:19:08] (PS4) Dzahn: let reseacher group read researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/135134
[22:19:57] (CR) Dzahn: [C: 2] let reseacher group read researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/135134 (owner: Dzahn)
[22:21:09] andrewbogott: that's cool! (re: primary groups, all the svn users fixed)
[22:35:10] mutante: yeah, so far seems to have worked fine
[22:39:15] nice, yep
[22:42:38] (PS1) Dzahn: add researchers yaml group to stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/135712
[22:44:04] (PS2) Dzahn: add researchers yaml group to stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/135712
[22:44:59] (PS3) Dzahn: add researchers yaml group to stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/135712
[22:55:32] (CR) Dzahn: [C: 2] add researchers yaml group to stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/135712 (owner: Dzahn)
[22:55:58] RoanKattouw_away, mwalker, MaxSem: I can do SWAT if you like
[22:56:15] no objectons from /me
[22:57:03] ori, that would be wonderful
[22:57:09] oh, there's only one
[22:57:10] easy peasy
[22:58:59] (PS1) Dzahn: Revert "add researchers yaml group to stat1003" [operations/puppet] - https://gerrit.wikimedia.org/r/135715
[22:59:14] I'm new to SWAT windows.
[22:59:21] k
[22:59:25] Am I supposed to merge my own, or let the SWAT people do it.
[22:59:41] It's to mediawiki-config.
[23:01:58] superm401: you have deploy rights, right?
[23:02:05] greg-g, yeah.
[23:02:09] (CR) Dzahn: [C: 2] "can't do this yet because of yet another wikidev group dependency" [operations/puppet] - https://gerrit.wikimedia.org/r/135715 (owner: Dzahn)
[23:02:35] you can JFDI in that case, especially for something like this (disabling something you enabled as an experiment)
[23:02:55] Okay, wasn't sure if I would be stepping on people's toes.
[23:03:02] Will do it now, then.
[23:03:06] nah, should be ok
[23:03:18] no one's peeking in right now anyways
[23:03:20] :)
[23:03:29] (CR) Mattflaschen: [C: 2] Disable the anonymous signup invite experiment [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135560 (owner: Phuedx)
[23:03:32] (PS2) Ori.livneh: Disable the anonymous signup invite experiment [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135560 (owner: Phuedx)
[23:03:35] (CR) Ori.livneh: [C: 2] Disable the anonymous signup invite experiment [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135560 (owner: Phuedx)
[23:03:46] oh there :)
[23:03:46] (Merged) jenkins-bot: Disable the anonymous signup invite experiment [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135560 (owner: Phuedx)
[23:03:57] superm401: ^^ ori's on it ;)
[23:04:08] sorry, i tabbed out
[23:04:15] whoa, jouncebot!
[23:04:15] no worries
[23:04:19] :)
[23:04:19] what is this?
[23:04:26] the evil genius of mwalker
[23:04:28] that's very useful
[23:05:23] ori, okay, let me know if you have any questions, and I'll verify when you're done.
[23:06:14] !log ori synchronized wmf-config/InitialiseSettings.php 'I7fdafede7: Disable the anonymous signup invite experiment'
[23:06:25] Logged the message, Master
[23:09:45] superm401: ^
[23:11:14] Yep, thanks, ori
[23:14:23] Looks good
[23:16:45] np
[23:30:29] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Tue May 27 23:30:19 UTC 2014
[23:56:00] Is https://wikitech.wikimedia.org/wiki/User:Phuzion/Documentation_Initiative anything (semi-) official? He uses big words, yet I don't remember him from Gerrit/wikitech-l/elsewhere.
[23:59:29] PROBLEM - Puppet freshness on oxygen is CRITICAL: Last successful Puppet run was Tue 27 May 2014 08:59:17 PM UTC
[23:59:29] <^d> Page hasn't been edited since 2010. Never heard of it, lol.