[00:01:24] also makes sense... we would have probably at least gotten a tcp connection, even if the bot dislikes us :P [00:01:58] (03CR) 10Ori.livneh: [C: 04-1] "Change the value in manifests/role/tcpircbot.pp, not the module" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134279 (owner: 10Dzahn) [00:02:03] ok, so did anyone say it should actually work from terbium? [00:02:24] mutante: mh, not sure... the script is where, so I would at least expect it [00:02:28] * there [00:03:02] I was just idling on terbium at that time, that's why I tested there [00:03:11] It think it's only meaningful from tin (eg the deploy server) [00:05:05] include misc::deployment::scap_scripts [00:05:10] why are those on terbium even? [00:06:31] because scap works by sshing into each target nodes and running a script to pull code changes [00:07:08] so there isn't a clear-cut separation of code meant for the deployment server and code meant for the deployment targets [00:07:29] ori: The application servers don't seem to have this class [00:07:34] only tin and terbium have it [00:07:36] AFAIS [00:08:28] The mediawiki::sync class has the files that need to be on the scap target hosts. [00:09:09] yep, but the scap scripts don't need to be on terbium [00:09:29] Not for any reason that I'm aware of, no. [00:10:47] git-blame the line in site.pp [00:11:49] dangerous territory [00:12:54] dates back from fb609118 [00:13:09] doesn't give a hint [00:13:25] sure it does [00:13:41] the ability to set explicitly will also be helpful if the maint [00:13:42] host in the primary site dies. 
[00:14:25] that refers to the misc:maint classes [00:14:53] I guess he just copied that over from hume back then [00:16:32] (03PS1) 10Hoo man: Remove misc::deployment::scap_scripts from terbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 [00:16:40] yeah, that sounds right [00:19:48] https://www.mail-archive.com/mediawiki-commits@lists.wikimedia.org/msg67233.html [00:20:05] the IPv4-mapped IPv6 doesn't map on tin i suppose [00:20:20] it currently has the ::ffff compat format again [00:21:40] but tin doesn't have a v6 interface added via interface::add_ip6_mapped [00:21:58] so either first add the actual interface to make it mapped [00:22:05] or back to your change making it use v4 [00:22:42] '::ffff:10.64.0.196/128', # tin this isn't it [00:26:05] (03PS1) 10Dzahn: let tin have a proper IPv6 address [operations/puppet] - 10https://gerrit.wikimedia.org/r/134284 [00:27:35] (03CR) 10Dzahn: "totally true, should have touched the role class only, but looking at that i think the fix is rather Change-Id: Iadd9b978" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134279 (owner: 10Dzahn) [00:27:47] (03Abandoned) 10Dzahn: allow tin and terbium Ipv6 IP to talk to tcpircbot [operations/puppet] - 10https://gerrit.wikimedia.org/r/134279 (owner: 10Dzahn) [00:28:39] (03CR) 10Dzahn: "also see: Change-Id: Iadd9b978" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134284 (owner: 10Dzahn) [00:29:19] (03CR) 10Dzahn: "also see: Id688a26dbf5c71" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134284 (owner: 10Dzahn) [00:29:53] ori: gah, i forgot 100% of that [00:30:07] we even reverted that with "Connection from ('::ffff:10.64.0.196' matches tin and is what is trying to connect in /var/log/upstart/tcpircbot-logmsgbot.log" [00:30:26] i'd say the root cause is that the v6 address is not mapped [00:30:33] but gotta run for now [00:33:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 
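For reference, the '::ffff:10.64.0.196' form seen in the tcpircbot log is an IPv4-mapped IPv6 address: the IPv4 address 10.64.0.196 embedded in the ::ffff:0:0/96 range. The sketch below is only an illustration in Python using the standard-library ipaddress module; the ALLOWED list and the helper names normalize and is_allowed are hypothetical and are not taken from tcpircbot. It shows one way a listener could treat the mapped and plain forms as the same peer.

```python
import ipaddress

# Hypothetical allow-list for illustration only (10.64.0.196 is tin in the log).
ALLOWED = [ipaddress.ip_network("10.64.0.196/32")]

def normalize(addr):
    """Return the embedded IPv4 address for IPv4-mapped IPv6 input,
    otherwise the parsed address unchanged."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip

def is_allowed(addr):
    """True if the (normalized) peer address falls inside an allowed network."""
    ip = normalize(addr)
    return any(ip.version == net.version and ip in net for net in ALLOWED)

if __name__ == "__main__":
    for peer in ("::ffff:10.64.0.196", "10.64.0.196", "::ffff:10.64.0.197"):
        print(peer, normalize(peer), is_allowed(peer))
```

Normalizing on the listener side is one option; listing the mapped form in the allow-list or giving the host a real IPv6 address, as discussed in the channel, are others.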
[00:43:27] (03CR) 10Hoo man: "bump" [operations/puppet] - 10https://gerrit.wikimedia.org/r/126027 (owner: 10Hoo man) [00:49:14] (03PS3) 10Withoutaname: Reduce string URLs to defined constant [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131914 (https://bugzilla.wikimedia.org/48618) [01:00:35] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Mon May 19 22:00:21 2014 [01:25:05] PROBLEM - MySQL Processlist on db1064 is CRITICAL: CRIT 86 unauthenticated, 0 locked, 0 copy to table, 0 statistics [01:26:05] RECOVERY - MySQL Processlist on db1064 is OK: OK 27 unauthenticated, 0 locked, 0 copy to table, 0 statistics [01:51:07] anyone happen to know why a patch would not have made it to the deploy schedule? expected patch https://gerrit.wikimedia.org/r/#/c/127839/ to be part of https://www.mediawiki.org/wiki/MediaWiki_1.24/wmf5#GWToolset ... [02:05:28] dan-nl: it should be, gerrit thinks it is [02:06:15] hmm, if something is on testwiki today but not enwiki, it should be on enwiki on thursday, right? [02:06:40] dan-nl: so does gitblit: http://git.wikimedia.org/log/mediawiki%2Fextensions%2FGWToolset.git/refs%2Fheads%2Fwmf%2F1.24wmf5 [02:06:48] YuviPanda: right [02:06:59] cool, ty grrrit-wm [02:07:02] gah [02:07:03] greg-g: [02:07:05] :) [02:07:19] dan-nl: apparently the script that generates that wikipage is finicky [02:07:22] * YuviPanda remembers to never write a bot with name yuppybot [02:07:37] thanks for checking, just didn't see it on https://www.mediawiki.org/wiki/MediaWiki_1.24/wmf5#GWToolset. where would i see it in gerrit? 
[02:07:54] dan-nl: click on the "Included In" arrow on that gerrit change [02:08:05] it'll take a second, but it'll show all branches that it thinks the change is in [02:08:22] ah, cool, thanks greg-g [02:09:46] np :) [02:13:25] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3789 MB (3% inode=99%): [02:20:36] !log LocalisationUpdate completed (1.24wmf4) at 2014-05-20 02:19:33+00:00 [02:20:43] Logged the message, Master [02:21:25] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3433 MB (3% inode=99%): [02:30:15] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Tue May 20 02:30:07 UTC 2014 [02:34:34] !log LocalisationUpdate completed (1.24wmf5) at 2014-05-20 02:33:31+00:00 [02:34:39] Logged the message, Master [03:00:25] RECOVERY - Disk space on virt0 is OK: DISK OK [03:31:17] Coren: I'm considering writing a tool to make handling Oversight requests that sucks less than OTRS. Think more AFTish, rather than EmailUser. [03:31:30] Coren: is this something to host on wmflabs, or... [03:32:16] lfaraone: It's a reasonable place to host it, but you'll have to be careful about logging and permissions. [03:32:36] lfaraone: We'd sure as hell rather you host it here than on some random third party hosting. :-) [03:33:38] lfaraone: Also there are significant privacy issues, so you'll need to (a) make sure only oversighters have access and (b) authenticate properly. [03:34:06] speaking of permissions… is there a bugzilla bug for "please enable 2FA on enwp and friends and make it mandatory for functionaries"? :P [03:34:41] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [03:38:38] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue May 20 03:37:32 UTC 2014 (duration 37m 31s) [03:38:42] Logged the message, Master [03:54:52] lfaraone: What's wrong with OTRS? 
[03:56:12] Gloria: its mostly a workflow problem for us; we often miss requests, users are surprisingly bad at identifying the actual items to be Oversighted, its hard to escalate to "discuss with the Oversight team" when an individual OS looks at a ticket and goes "uhm, not sure" [04:08:27] lfaraone: Perhaps the Oversight extension needs love? [04:08:45] It could probably better integrate with Flow and Echo. [04:11:41] Gloria: sure. the problem is I really don't want to learn how to PHP :) [04:12:24] Well, Oversight already has a supported and approved authentication scheme. [04:12:31] And it's already live in production and has been for some time. [04:12:43] So it's likely a much better starting place than a brand new tool. [04:22:03] you can do exciting things with OAuth these days [04:22:25] plus, isn't suppression MW core these days? Not sure a request queue is really in scope for that. [04:26:48] (03CR) 10Ori.livneh: [C: 031] Remove misc::deployment::scap_scripts from terbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 (owner: 10Hoo man) [04:27:09] (03CR) 10Ori.livneh: "should be accompanied by the removal of the scripts. this won't actually purge them." [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 (owner: 10Hoo man) [06:00:37] (03PS1) 10Giuseppe Lavagetto: icinga: stop pages from 5xx alerts [operations/puppet] - 10https://gerrit.wikimedia.org/r/134301 [06:02:36] (03CR) 10Giuseppe Lavagetto: [C: 032] icinga: stop pages from 5xx alerts [operations/puppet] - 10https://gerrit.wikimedia.org/r/134301 (owner: 10Giuseppe Lavagetto) [06:07:14] morning [06:07:32] * springle waves [06:07:37] _joe_: isn't it like 8am over there ? [06:07:45] springle: I won't even ask ... [06:07:47] <_joe_> akosiaris: it is [06:07:58] <_joe_> akosiaris: yesterday I started at around 7 am [06:08:24] <_joe_> I am moving and I try to begin early to be free in the afternoon [06:08:55] ok that makes sense. 
As long as it does not become a habbit :P [06:09:02] <_joe_> also, I'm at my new flat waiting for Telecom Italia to come and connect my DSL, as they 'forgot' [06:09:22] pfff DSL... who needs DSL ? you got fibers over there :P [06:09:29] you didn't bribe the right people? [06:09:36] <_joe_> akosiaris: it will be upgraded [06:09:47] springle: ahahah :-) [06:09:48] <_joe_> springle: no, it's not like that. this is Telecom [06:10:07] <_joe_> no way to make them work correctly, no matter how much you bribe someone [06:10:13] you always need to bribe [06:10:37] IIRC Italy had a company that had FTTH (but the rest of the network was...) [06:11:01] <_joe_> yesterday the technician called me 5 (f i v e) times and every call started with "so... can you explain me what your problem is?" [06:11:15] I even remember having a public IP at my friend's home and the next hop being a 10.x.x.x [06:11:30] or maybe the next-next hop [06:11:35] <_joe_> akosiaris: my DSL will be upgraded to fiber [06:11:49] i tried to ignore it the first time you said it [06:11:55] THAT IS UNFAIR!!! [06:11:56] <_joe_> akosiaris: did I tell you there is *no* ipv6 offering for businesses here? [06:12:03] <_joe_> no serious one at least [06:12:25] I think we got that now [06:12:49] some ISPs do provide IPv6 [06:12:55] <_joe_> akosiaris: it will be a shitty fiber with the Iran-style internet filters and the ransom-based peering italian ISPs are famous for :) [06:13:00] some in pilot, some not in pilot [06:13:05] <_joe_> so no reason to hate me :) [06:13:48] heh. Still it gives you hope [06:13:52] <_joe_> oh yes telecom has an "ipv6 pilot", a friend who worked as a consultant setting it up told me it's one router in a basement :P [06:14:09] internet filters are only one court ruling away from being called illegal [06:14:20] <_joe_> in greece? 
[06:14:23] well a "final" court ruling [06:14:33] <_joe_> in Italy they're all loved by almost everyone [06:14:38] ah, in greece that translates to around 12-15 years [06:14:45] <_joe_> ahhh ok [06:14:56] <_joe_> akosiaris: they should unify greece and italy, it seems [06:14:58] <_joe_> :P [06:15:02] ahaha [06:15:17] <_joe_> btw, I was amazed by how clean Athens is compared to Rome [06:15:41] is it ? [06:15:43] <_joe_> and no, this is not a compliment for Athens :) [06:15:51] ah, ok [06:15:54] <_joe_> akosiaris: it is, Rome has become filthy [06:16:09] <_joe_> well it's not-so-dirty anyway [06:16:40] <_joe_> it seems you don't consider the public streets as your private litter/ashtray as much as we do [06:17:02] (03CR) 10Alexandros Kosiaris: [C: 032] let tin have a proper IPv6 address [operations/puppet] - 10https://gerrit.wikimedia.org/r/134284 (owner: 10Dzahn) [06:18:05] somehow I think you got a skewed version of reality while you were here. Good, we managed to hide the truth well! [06:19:21] <_joe_> eheh [06:19:52] (03CR) 10Alexandros Kosiaris: [C: 04-2] "Let's fix whatever problem IPv6 has instead of working around it." [operations/puppet] - 10https://gerrit.wikimedia.org/r/134277 (owner: 10Hoo man) [06:22:09] akosiaris: i saw some reality last time. this trip was much better :) you local guides did well [06:22:42] :-) :-) :-) [06:25:40] <_joe_> akosiaris: athens looks a lot like big sicilian cities, btw [06:25:54] <_joe_> only, slightly less chaotic [06:26:14] <_joe_> (yes, they *are* that chaotic) [06:29:55] hmmm did we upgrade RT or something ? [06:30:08] I was logged out and when I logged in again [06:30:19] it started displaying everything in greek [06:30:33] which is nice and all... but still ... [06:31:00] oh my god I know what it is ... my 1 year cookie just expired... 
[06:31:44] i like the new greek enabled RT better :P [06:35:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [06:40:02] <_joe_> lol [06:40:24] <_joe_> akosiaris: so it's 1 year today? [06:40:32] <_joe_> linkedin told me, in fact [06:41:14] yup. one year and 1 week to be exact [06:55:26] (03CR) 10Alexandros Kosiaris: [C: 032] puppet:self::master use newer package names [operations/puppet] - 10https://gerrit.wikimedia.org/r/134143 (owner: 10Alexandros Kosiaris) [07:04:18] akosiaris: firewall question please ? [07:04:54] matanya: sure [07:05:16] (03CR) 10Alexandros Kosiaris: [C: 032] Avoid connection tracking for DNS recursors [operations/puppet] - 10https://gerrit.wikimedia.org/r/134071 (owner: 10Alexandros Kosiaris) [07:05:56] looking at hafnium.wikimedia.org - it has role::eventlogging::graphite which has input and output, does those ports need to be open too ? [07:10:42] matanya: not sure where the too goes but in this specific case, neither [07:11:14] the software works by connecting to input, consuming and then connecting to output and writing them [07:11:42] not the best description but the point I want to make is that both connections are outgoing and not incoming [07:12:12] so no listening ports on hafnium due to that role [07:12:26] akosiaris: a suggestion to make it simpler: can get a list of all iptables current rules for the host listed in the etherpad ? i'm just guessing in the fog now [07:14:21] hafnium ? It has no rules right now if this is what you are asking [07:15:36] i was asking about all hosts. [07:15:57] <_joe_> matanya: man iptables [07:16:03] <_joe_> that usually tend to work [07:16:05] <_joe_> :) [07:16:06] ? [07:16:44] <_joe_> "can get a list of all iptables current rules" -> as in, you need someone to run iptables in prod for you? 
[07:16:53] yes [07:17:18] <_joe_> ok did not get that, it seemed you were asking how to get the rules :) [07:17:31] <_joe_> matanya: I have absolutely no time for that, sorry [07:17:40] I am not sure I understand what that would accomplish [07:18:09] i can look at what has allow in the rules and transform that into ferm rules [07:18:22] <_joe_> akosiaris: having a list of rules already in place, wich given we're moving from default-> all open to default -> all dropped does not serve its purpose [07:18:58] _joe_: my point exactly [07:19:19] matanya: so the answer is easy. Do not assume any rules exist for any machine unless puppet says so [07:19:33] good answer [07:20:21] so my next question would be role::webperf - is that incoming or out going ? [07:22:01] outgoing [07:22:35] so just apply ferm on the host, no rules needed :) [07:22:51] for example if you follow the path from role::webperf you will probably end up to ve.py for example [07:22:58] modules/webperf/files/ve.py [07:23:15] which clearly has sock.sendto calls inside [07:23:56] so it is sending udp packets [07:24:04] and not receiving [07:24:18] hence outgoing [07:25:10] thank you, too tired to think about stuff more complicated than "include something" :P [07:25:20] so to verify your sentence, yes no rules needed, just apply ferm [07:26:44] (03PS1) 10Matanya: hafnium: add firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/134304 [07:39:54] (03PS1) 10Giuseppe Lavagetto: compare-puppet-catalogs: bugfixes [operations/software] - 10https://gerrit.wikimedia.org/r/134305 [07:39:57] (03CR) 10jenkins-bot: [V: 04-1] compare-puppet-catalogs: bugfixes [operations/software] - 10https://gerrit.wikimedia.org/r/134305 (owner: 10Giuseppe Lavagetto) [07:51:30] (03PS2) 10Giuseppe Lavagetto: compare-puppet-catalogs: bugfixes [operations/software] - 10https://gerrit.wikimedia.org/r/134305 [08:00:57] (03PS2) 10Giuseppe Lavagetto: erbium: fix template variable scoping [operations/puppet] - 
10https://gerrit.wikimedia.org/r/134060 [08:12:27] <_joe_> Oh snap, the old catalog differ did not take into account missing resources [08:12:43] <_joe_> the new one does otoh [08:12:55] <_joe_> so, finding new issues [08:12:57] <_joe_> argh [09:02:16] (03CR) 10Filippo Giunchedi: [C: 031] "I'll merge this on Wed if no objections arise" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134258 (owner: 10Dzahn) [09:02:51] (03PS1) 10Ottomata: Allowing root and hdfs as well as hue to submit oozie -doas commands as any user [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/134315 [09:03:26] (03CR) 10Ottomata: [C: 032 V: 032] Allowing root and hdfs as well as hue to submit oozie -doas commands as any user [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/134315 (owner: 10Ottomata) [09:04:09] (03PS1) 10Ottomata: Updating cdh4 module with oozie -doas change [operations/puppet] - 10https://gerrit.wikimedia.org/r/134317 [09:04:18] (03PS2) 10Ottomata: Updating cdh4 module with oozie -doas change [operations/puppet] - 10https://gerrit.wikimedia.org/r/134317 [09:04:26] (03CR) 10Ottomata: [C: 032 V: 032] Updating cdh4 module with oozie -doas change [operations/puppet] - 10https://gerrit.wikimedia.org/r/134317 (owner: 10Ottomata) [09:04:53] (03CR) 10Filippo Giunchedi: [C: 031] "to be merged on Wed" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134251 (owner: 10Dzahn) [09:08:56] <_joe_> ottomata: you may want to take a look at http://puppet-compiler.wmflabs.org/html/analytics1027.eqiad.wmnet.html [09:09:15] <_joe_> it's not clear to me if those number changes around is a problem [09:11:41] um, first of all [09:11:43] THAT IS AWESOME [09:11:45] 2nd of all [09:12:15] random nums diff [09:12:18] oo, brb... 
[09:13:44] <_joe_> ok np :) [09:18:33] hm yea so hm [09:19:06] ah, hm, _joe_, its ok [09:19:28] that will be different every time, but will only run if something is missing in hdfs [09:19:40] <_joe_> ottomata: ok that's fair [09:19:48] actually, i should change that to use tempfile [09:19:57] <_joe_> I figured that, but I wanted a confirmation [09:20:00] $oozie_sharelib_tmpdir = inline_template('/tmp/oozie_sharelib_install.<%= rand() %>') [09:20:07] pretty silly now that I'm looking at it [09:20:52] _joe_, so, i tried to use puppet comparator for this [09:20:52] https://gerrit.wikimedia.org/r/#/c/133695/ [09:21:00] but it was kinda hard, because it was a submodule change [09:21:09] <_joe_> oh, god [09:21:11] <_joe_> :) [09:21:14] hehe [09:21:16] i copied down a catalog from palladium [09:21:25] then manually updated the submodule the vm's checkuot [09:21:27] <_joe_> ottomata: I think it should handle it [09:21:29] <_joe_> let me check [09:21:42] <_joe_> which nodes should this affect? [09:21:49] cp1052 is a good one [09:21:53] it shouldn't change anything [09:21:58] but that's the one i was testing with [09:22:47] <_joe_> ottomata: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/lastBuild/ [09:23:05] ? [09:23:15] <_joe_> ottomata: it's in jenkins now! [09:23:23] coool! [09:23:27] <_joe_> you can trigger builds of the compiler from jenkins [09:23:27] but wait, this change hasn't been merged... [09:23:29] ohhh [09:23:30] <_joe_> it's not finished [09:23:32] hm [09:23:43] <_joe_> yes, I do fetch the change from gerrit [09:23:57] hm, yeah, probably it would work when ops/puppet actually changes the sha that the submodule points to [09:24:00] but, this is just the submodule commit [09:24:04] so nothing has changed in ops/puppet yet [09:24:15] <_joe_> oh, ok [09:24:34] <_joe_> right. 
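The "random nums diff" above comes from a temp path that embeds a fresh rand() value on every catalog compile. As a rough illustration only (Python standing in for the Puppet template; compile_catalog_random, compile_catalog_stable and apply_catalog are hypothetical names, not part of the cdh4 module), the sketch below shows why such a value makes two otherwise identical compiles differ, and how keeping only a fixed prefix in the compiled data and creating the unique directory at apply time, for example with tempfile.mkdtemp, avoids the noise.

```python
import os
import random
import tempfile

def compile_catalog_random():
    # Mimics inline_template('/tmp/oozie_sharelib_install.<%= rand() %>'):
    # the random value is baked into the compiled result.
    return {"tmpdir": "/tmp/oozie_sharelib_install.%s" % random.random()}

def compile_catalog_stable():
    # Only a fixed prefix is recorded, so repeated compiles are identical.
    return {"tmpdir_prefix": "oozie_sharelib_install."}

def apply_catalog(catalog):
    # The unique directory is created when the catalog is applied,
    # not when it is compiled.
    return tempfile.mkdtemp(prefix=catalog["tmpdir_prefix"])

if __name__ == "__main__":
    print(compile_catalog_random() == compile_catalog_random())  # differs per compile
    print(compile_catalog_stable() == compile_catalog_stable())  # identical
    path = apply_catalog(compile_catalog_stable())
    print(os.path.isdir(path))
    os.rmdir(path)
```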
[09:24:44] <_joe_> yeah, no way to test submodules ATM [09:24:55] <_joe_> create me a ticket, I'll add that ability :) [09:25:16] <_joe_> it will require some time and is not high on my priority list now [09:25:32] <_joe_> for now, just create a dependent change on operations/puppet and test that [09:26:50] yeah, but i can't until the submodule change is merged! [09:27:14] <_joe_> fair enough. [09:27:24] <_joe_> ottomata: yeah I don't have an instant solution for that [09:28:04] <_joe_> we should add command-line switches for this, also we would need to code a little around [09:30:55] (03PS3) 10Giuseppe Lavagetto: erbium: fix template variable scoping [operations/puppet] - 10https://gerrit.wikimedia.org/r/134060 [09:31:09] (03CR) 10Giuseppe Lavagetto: [C: 032] erbium: fix template variable scoping [operations/puppet] - 10https://gerrit.wikimedia.org/r/134060 (owner: 10Giuseppe Lavagetto) [09:31:29] ok well, at least the submodule won't go live in prod until we make a specific merge for it [09:31:35] just gotta get a few folks to review this first [09:36:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [09:40:56] <_joe_> ottomata: I have a trick for you [09:41:51] (03CR) 10Hashar: [C: 04-1] "I have already packaged a statsd pure python module. It is available in apt.wikimedia.org and being used by Zuul." 
[operations/debs/python-statsd] - 10https://gerrit.wikimedia.org/r/131449 (owner: 10Gage) [09:42:44] <_joe_> ottomata: oh sorry, nevermind, it won't [09:43:02] ha [09:43:06] well, i mean, i think i got it to work locally [09:43:14] by manually checking out that patch inthe submodule [09:43:21] and then running comparator [09:43:28] although..i think i had to mess with the python cod emaybe [09:43:38] to get it to compare just 2.7 [09:44:12] hm, i ust did +PUPPET_VERSIONS = [('2.7', 'production')] [09:44:17] because I didn't want to compare with 3.0 now [09:45:43] <_joe_> ottomata: no that is wrong [09:46:03] oh? [09:46:13] (03CR) 10Hashar: "Forgot to add, adding this package with the name python-statsd will conflict with the version Zuul is depending upon and break CI :-(" [operations/debs/python-statsd] - 10https://gerrit.wikimedia.org/r/131449 (owner: 10Gage) [09:46:15] <_joe_> ottomata: do you have the latest version of the code? [09:46:23] hmm, as of late last week [09:46:27] haven't tried this wek [09:46:28] week [09:46:32] will pull now [09:46:55] oo many changes ) [09:46:56] :) [09:47:11] <_joe_> ottomata: yeah sorry, you will also need to reprovision vagrant [09:47:30] ah, s'ok np [09:47:32] <_joe_> as they (as in puppetlabs) changed the catalog differ in an incompatible way [09:47:46] <_joe_> gotta love them [09:52:21] <_joe_> ottomata: working on your needs now [09:52:32] haha :) [09:52:41] <_joe_> it's gonna be kind of a workaround for now [09:52:54] <_joe_> we'll add proper submodule support later. [10:15:10] (03PS3) 10Giuseppe Lavagetto: compare-puppet-catalogs: bugfixes, jenkins integration [operations/software] - 10https://gerrit.wikimedia.org/r/134305 [10:22:26] (03CR) 10Giuseppe Lavagetto: [C: 032] compare-puppet-catalogs: bugfixes, jenkins integration [operations/software] - 10https://gerrit.wikimedia.org/r/134305 (owner: 10Giuseppe Lavagetto) [10:45:43] (03CR) 10Mark Bergsma: "One minor comment. Feel free to merge after you've resolved that." 
(031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/132495 (owner: 10Rush) [10:48:06] (03PS1) 10Giuseppe Lavagetto: creation of role::puppet_compiler [operations/puppet] - 10https://gerrit.wikimedia.org/r/134327 [10:50:25] if only that meant we were going to natively compiled puppet code ;-) [10:50:52] <_joe_> mark: he. [10:51:21] <_joe_> mark: on a positive note, the new puppet compiler showed clearly some differences we were not shown before [10:51:33] mark: i saw you said something about firewall yesterday, are you not pleased with firewalling some of the hosts other than lvs ? [10:53:36] godog: https://rt.wikimedia.org/Ticket/Display.html?id=6845 does this mean i should mail each and every one of them ? [10:55:33] (03PS1) 10Yuvipanda: tools: Tune nginx to handle higher load [operations/puppet] - 10https://gerrit.wikimedia.org/r/134328 [10:55:39] hello! [10:55:43] anyone who can merge ^? [10:55:48] tools proxy is having perf issues [10:55:51] these should help [10:55:56] am going to test them live now [10:58:13] <_joe_> YuviPanda: reviewing [10:59:05] _joe_: actually no, test failed. fixing [10:59:26] <_joe_> YuviPanda: I said reviewing, not merging :) [10:59:31] ah :) [11:00:11] hmm, it's a bit weird. I did put them in the right context but nginx tells me they aren't [11:00:36] matanya: if that's an updated list of people (and they still require prod access) then yes they should be poked. 
I think an email will be fine [11:04:14] ah, hmm [11:19:16] (03PS2) 10Yuvipanda: tools: Tune nginx to handle higher load [operations/puppet] - 10https://gerrit.wikimedia.org/r/134328 [11:20:12] _joe_: better now ^ [11:20:13] I should eventually just use the nginx module in ops/puppet, but that's a fair chunk of work [11:21:54] (03CR) 10Giuseppe Lavagetto: [C: 04-1] tools: Tune nginx to handle higher load (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134328 (owner: 10Yuvipanda) [11:22:11] <_joe_> YuviPanda: see my comment, apart from that it seems fair [11:22:24] <_joe_> still need to review the syntax of the whole file [11:22:56] _joe_: I just picked a randomly high number. I know that this is causing issues because I keep seeing errors about not enough workers in the error logs. [11:22:59] _joe_: let me look into it [11:23:40] <_joe_> YuviPanda: ok that is a little high, but my main comment was about its effectiveness [11:23:54] _joe_: yeah, true. let me both reduce the number and look into the upstart file [11:24:28] _joe_: hmm, I don't think nginx comes with an upstart script. I just see a normal init script. [11:24:47] <_joe_> oh ok sorry I assumed it was ported to upstart [11:27:38] <_joe_> YuviPanda: then it's /etc/security/limits.conf [11:27:48] <_joe_> sorry gotta go, lunch! [11:28:25] _joe_: I'll poke around. Thanks for the review! [12:02:52] (03CR) 10TTO: "Although this is a true constant unlike $stdlogo and friends, I wonder if it would be neater to do this via CommonSettings.php line 210ff?" 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131914 (https://bugzilla.wikimedia.org/48618) (owner: 10Withoutaname) [12:02:55] (03PS1) 10Giuseppe Lavagetto: compare-puppet-catalogs: jenkins integration [operations/software] - 10https://gerrit.wikimedia.org/r/134335 [12:04:32] (03CR) 10Giuseppe Lavagetto: [C: 032] compare-puppet-catalogs: jenkins integration [operations/software] - 10https://gerrit.wikimedia.org/r/134335 (owner: 10Giuseppe Lavagetto) [12:04:36] (03CR) 10TTO: "This is surely not redundant. If you want to deny a permission, don't you use $wgRevokePermissions?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134155 (owner: 10Nemo bis) [12:05:14] (03CR) 10TTO: [C: 031] Disable query pages for closed wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130609 (https://bugzilla.wikimedia.org/42436) (owner: 10Withoutaname) [12:05:53] (03CR) 10TTO: [C: 04-1] Disable query pages for closed wikis (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130609 (https://bugzilla.wikimedia.org/42436) (owner: 10Withoutaname) [12:08:33] (03CR) 10Giuseppe Lavagetto: [C: 032] "This does not affect any other role." [operations/puppet] - 10https://gerrit.wikimedia.org/r/134327 (owner: 10Giuseppe Lavagetto) [12:11:06] (03PS3) 10Yuvipanda: tools: Tune nginx to handle higher load [operations/puppet] - 10https://gerrit.wikimedia.org/r/134328 [12:11:08] _joe_: updated [12:11:39] _joe_: /etc/security/limits.conf doesn't actually have any content by default, so this should be safe. 
[12:11:45] and I didn't find a limits module in our puppet [12:22:00] (03Abandoned) 10Nemo bis: Remove plwiki permission for 'autoconfirmed' denied for 'user' [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134155 (owner: 10Nemo bis) [12:27:35] (03PS1) 10Gilles: Introduce finer-grained Media Viewer EventLogging sampling [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134343 [12:28:00] (03CR) 10Gilles: [C: 04-2] "Depends on https://gerrit.wikimedia.org/r/134064" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134343 (owner: 10Gilles) [12:37:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [12:41:36] (03CR) 10Andrew Bogott: tools: Tune nginx to handle higher load (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134328 (owner: 10Yuvipanda) [12:42:22] (03PS4) 10Yuvipanda: tools: Tune nginx to handle higher load [operations/puppet] - 10https://gerrit.wikimedia.org/r/134328 [12:42:27] andrewbogott: fixed [12:43:35] (03CR) 10Andrew Bogott: [C: 032] tools: Tune nginx to handle higher load [operations/puppet] - 10https://gerrit.wikimedia.org/r/134328 (owner: 10Yuvipanda) [12:46:36] (03CR) 10Hashar: [C: 031] adding .gitreview [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/95144 (owner: 10AzaToth) [12:51:10] (03PS1) 10Giuseppe Lavagetto: role::puppet_compiler - lint and fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/134349 [12:52:52] (03PS2) 10Ottomata: adding .gitreview [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/95144 (owner: 10AzaToth) [12:53:04] (03CR) 10Ottomata: [C: 032 V: 032] adding .gitreview [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/95144 (owner: 10AzaToth) [12:56:41] (03PS2) 10Giuseppe Lavagetto: role::puppet_compiler - lint and fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/134349 [12:56:52] (03CR) 10Giuseppe Lavagetto: [C: 032] 
role::puppet_compiler - lint and fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/134349 (owner: 10Giuseppe Lavagetto) [13:20:41] !log git-deploy: Deploying integration/slave-scripts I4a4e2a4c90fb6 [13:20:46] Logged the message, Master [13:47:02] (03PS1) 10Giuseppe Lavagetto: puppet_compiler: easier jenkins launch with parameters [operations/puppet] - 10https://gerrit.wikimedia.org/r/134354 [13:48:44] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet_compiler: easier jenkins launch with parameters [operations/puppet] - 10https://gerrit.wikimedia.org/r/134354 (owner: 10Giuseppe Lavagetto) [13:51:00] (03PS2) 10Andrew Bogott: Renumber many users to match labs: [operations/puppet] - 10https://gerrit.wikimedia.org/r/133968 [13:55:08] hellp anskar [13:55:11] *o [13:55:19] hello [13:55:39] ops, anskar is here to get help with an rc bot flooding [13:56:11] :) thanks matanya [13:56:54] May 17 15:44:54 [13:24:34] Greetings, can someone help me with a bot in #da.wikipedia ? - It keeps getting "Excess Flood", so that mean it keep joining and leaveing. :) [13:56:54] May 17 15:44:54 [13:26:38] There's nothing we can do about it [13:56:54] May 17 15:44:54 [13:26:55] Oka, have a nice day then. [13:57:26] Reedy, that wasn't the most helpful of answers... Why can't anyone fix it? This sounds like a normal IRC ops task. Who can oper on irc.wm.o? [13:57:46] the bot now is wikichanges-bf5a6c68-7bd8-4fcf-b331-2912f3fb013a (~nodebot@anonymous.user) [13:57:59] * matanya pokes _joe_ ^ [13:58:10] but change nick [13:58:31] it's possible block wikichanges-*-*-*-*-*!~nodebot@anonymous.user ? 
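A quick way to sanity-check the proposed mask against the bot's current hostmask is sketched below in Python. This is an approximation only: it assumes shell-style wildcards via the standard-library fnmatch module, whereas the IRC server applies its own mask-matching rules, and the matches helper is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Mask proposed in the channel: nick!user@host with wildcards in the nick part.
MASK = "wikichanges-*-*-*-*-*!~nodebot@anonymous.user"

def matches(hostmask, mask=MASK):
    """Case-sensitive wildcard match of a full nick!user@host string."""
    return fnmatchcase(hostmask, mask)

if __name__ == "__main__":
    current = ("wikichanges-bf5a6c68-7bd8-4fcf-b331-2912f3fb013a"
               "!~nodebot@anonymous.user")
    print(matches(current))
    # Hypothetical unrelated client, should not be caught by the mask.
    print(matches("somebot!~other@anonymous.user"))
```

Because the nick portion is wildcarded, the mask would keep matching if the bot reconnects with a different UUID-style suffix, as long as the user and host parts stay the same.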
[14:05:58] (03CR) 10Andrew Bogott: [C: 032] Renumber many users to match labs: [operations/puppet] - 10https://gerrit.wikimedia.org/r/133968 (owner: 10Andrew Bogott) [14:07:34] (03PS4) 10Rush: ircd-ratbox and udpmxircecho puppetized [operations/puppet] - 10https://gerrit.wikimedia.org/r/132495 [14:08:24] (03PS5) 10Rush: ircd-ratbox and udpmxircecho puppetized [operations/puppet] - 10https://gerrit.wikimedia.org/r/132495 [14:09:17] (03CR) 10Rush: [C: 032 V: 032] "fixed the last outlier." [operations/puppet] - 10https://gerrit.wikimedia.org/r/132495 (owner: 10Rush) [14:26:11] (03PS1) 10Rush: wm-rc-irc echo and not relay typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/134358 [14:27:10] (03CR) 10Rush: [C: 032 V: 032] "typo" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134358 (owner: 10Rush) [14:27:35] (03PS6) 10Hashar: Describe Math related packages in a class [operations/puppet] - 10https://gerrit.wikimedia.org/r/115133 (https://bugzilla.wikimedia.org/61090) [14:33:22] (03PS1) 10Giuseppe Lavagetto: puppet_compiler: add naggen installation [operations/puppet] - 10https://gerrit.wikimedia.org/r/134359 [14:34:36] (03CR) 10jenkins-bot: [V: 04-1] puppet_compiler: add naggen installation [operations/puppet] - 10https://gerrit.wikimedia.org/r/134359 (owner: 10Giuseppe Lavagetto) [14:37:18] (03PS2) 10Giuseppe Lavagetto: puppet_compiler: add naggen installation [operations/puppet] - 10https://gerrit.wikimedia.org/r/134359 [14:39:11] (03PS3) 10Giuseppe Lavagetto: puppet_compiler: add naggen installation [operations/puppet] - 10https://gerrit.wikimedia.org/r/134359 [14:41:14] (03CR) 10Giuseppe Lavagetto: [C: 032] "Adding forgotten naggen installation." [operations/puppet] - 10https://gerrit.wikimedia.org/r/134359 (owner: 10Giuseppe Lavagetto) [14:43:05] PROBLEM - RAID on mw1151 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:43:15] PROBLEM - Disk space on mw1151 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[14:43:15] PROBLEM - puppet disabled on mw1151 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:43:25] PROBLEM - check if dhclient is running on mw1151 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:43:35] PROBLEM - twemproxy port on mw1151 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:43:35] PROBLEM - DPKG on mw1151 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:44:25] PROBLEM - check configured eth on mw1151 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:44:25] PROBLEM - twemproxy process on mw1151 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:44:32] likely down for disk replacement [14:45:18] PROBLEM - RAID on ms-be3003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:45:18] cmjohnson1: ^ ? [14:45:39] godog: looking [14:45:56] cmjohnson1: oh ok, just curious if you were replacing mw1151's disk [14:47:35] !log running 'find' commands on many hosts to chown files for users with new UIDs. [14:47:40] Logged the message, Master [14:47:55] godog: no I am not. 
working on mw1163 for the millionth time though ...different rack [14:47:56] RECOVERY - RAID on ms-be3003 is OK: OK: optimal, 12 logical, 12 physical [14:50:20] cmjohnson1: kk, thanks [14:51:05] PROBLEM - SSH on mw1151 is CRITICAL: Server answer: [14:52:25] RECOVERY - twemproxy port on mw1151 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [14:52:35] RECOVERY - DPKG on mw1151 is OK: All packages OK [14:53:05] RECOVERY - SSH on mw1151 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [14:53:15] RECOVERY - Disk space on mw1151 is OK: DISK OK [14:53:15] RECOVERY - puppet disabled on mw1151 is OK: OK [14:53:32] ok so ms-be3003 hiccup might have been andrewbogott's find, I found sdk on it was complaining though [14:53:56] RECOVERY - RAID on mw1151 is OK: OK: no RAID installed [14:54:01] <_joe_> godog: if swat is running, it will kill mw1151 as it has a disk failure [14:54:09] <_joe_> so, expected [14:54:12] godog: I didn't run the find on the ms-be boxes [14:54:19] or… at least I tried not to [14:54:25] salt -E '^(.(?!ms-be|labstore|snapshot))*$' [14:54:51] mh there are several of these [14:54:52] root 9804 0.0 0.0 4404 612 ? S 14:41 0:00 \_ /bin/sh -c find / -user 605 -print0 | xargs -0 chown -h 2454 [14:55:11] <_joe_> we may decide to shut it down and put it in scheduled downtime [14:55:15] RECOVERY - check configured eth on mw1151 is OK: NRPE: Unable to read output [14:55:17] dammit! [14:55:25] I tested that regext to death [14:55:31] godog: feel free to kill that [14:55:35] PROBLEM - twemproxy port on mw1151 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:55:35] PROBLEM - DPKG on mw1151 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
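(Annotation, not part of the log.) A likely reason the pasted target regex still hit the ms-be hosts: in `(.(?!X))*` the negative lookahead is only evaluated after a character has been consumed, so a forbidden prefix at the very start of the name is never tested. The sketch below assumes Salt's `-E` targeting behaves like Python's `re` module; the `ALTERNATIVE` pattern and the host list are illustrative, not something proposed in the channel.

```python
import re

# Pattern as pasted in the channel: the lookahead runs only AFTER each
# consumed character, so a forbidden prefix at position 0 is never checked.
AS_PASTED = re.compile(r'^(.(?!ms-be|labstore|snapshot))*$')

# Hypothetical alternative: evaluate the lookahead BEFORE consuming each character.
ALTERNATIVE = re.compile(r'^((?!ms-be|labstore|snapshot).)*$')

# Illustrative host names (two of them appear in the log).
hosts = ["ms-be3003", "mw1151", "labstore1001", "snapshot1", "tin"]

pasted_hits = [h for h in hosts if AS_PASTED.match(h)]
alternative_hits = [h for h in hosts if ALTERNATIVE.match(h)]

print("as pasted:  ", pasted_hits)
print("alternative:", alternative_hits)
```

With the pasted pattern every host in the list is selected, including the ones it was meant to exclude; with the alternative only `mw1151` and `tin` remain.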
[14:56:15] RECOVERY - twemproxy process on mw1151 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [14:56:15] RECOVERY - check if dhclient is running on mw1151 is OK: PROCS OK: 0 processes with command name dhclient [14:56:19] andrewbogott: can salt do that? likely it is running elsewhere too? [14:56:25] RECOVERY - twemproxy port on mw1151 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [14:56:25] RECOVERY - DPKG on mw1151 is OK: All packages OK [14:56:37] godog: I can 'killall find'... [14:56:44] <_joe_> godog: if it's running on mw1151 [14:56:48] <_joe_> andrewbogott: no! [14:57:05] <_joe_> your finds are not the only one running at a given moment [14:57:11] _joe_: yeah, that's what I figured :) [14:57:28] lemme see what I can do... [14:59:39] _joe_: can you suggest a way for me to selectively kill jobs like '/bin/sh -c find / -user 603 -print0 | xargs -0 chown -h 2400' [14:59:41] ? [14:59:56] (03PS1) 10Giuseppe Lavagetto: puppet_compiler: fix user in differ [operations/puppet] - 10https://gerrit.wikimedia.org/r/134367 [15:00:15] <_joe_> andrewbogott: not really... some ugly grep on ps -ef maybe [15:00:31] <_joe_> killall is unreliable IMO [15:00:56] andrewbogott: I suspect salt doesn't write anywhere its minion's pids? [15:01:22] godog: I don't know. 
I don't think so [15:02:03] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet_compiler: fix user in differ [operations/puppet] - 10https://gerrit.wikimedia.org/r/134367 (owner: 10Giuseppe Lavagetto) [15:02:06] (03CR) 10Physikerwelt: [C: 031] Describe Math related packages in a class [operations/puppet] - 10https://gerrit.wikimedia.org/r/115133 (https://bugzilla.wikimedia.org/61090) (owner: 10Hashar) [15:03:16] (03PS1) 10Cmjohnson: Adding dns entries for test HP servers ...called it hp1001 [operations/dns] - 10https://gerrit.wikimedia.org/r/134369 [15:03:38] trivia: salt-minion won't create a new process group, so its process group is its father's, so killing all the related process groups will do the trick [15:05:02] godog: so you mean if I kill /usr/bin/salt-minion... [15:05:10] that seems about as bad as killing all 'find' jobs [15:05:59] andrewbogott: no, collecting all salt-minion processes that have spawned find and kill their process group (i.e. likely the pid-1) [15:06:20] godog: i assigned a bunch of RT to you, don't be shocked.. it doesn't mean you have to resolve all of them and they should be more like just following up with people if things can be closed or not yet [15:06:32] godog: and it also means "please ask me" about any of them [15:06:40] if any questions [15:06:47] mutante: yup, will do! [15:07:10] godog: ok… I still don't follow. How would I 'collect all salt-mion processes that have...' [15:10:31] andrewbogott: kill $(ps aux | grep find | awk '{print $2}') ? 
[15:10:37] andrewbogott: heh, good question [15:11:44] or something better than "grep find", but the print $2 gets you the PIDs [15:15:34] !log running salt "ms-be*" cmd.run "kill $(ps aux | grep 'find / -user' | awk '{print $2}')" to kill runaway 'finds' on swifts [15:15:36] (03CR) 10Cmjohnson: [C: 032] Adding dns entries for test HP servers ...called it hp1001 [operations/dns] - 10https://gerrit.wikimedia.org/r/134369 (owner: 10Cmjohnson) [15:15:40] Logged the message, Master [15:17:25] well, that didn't work at all [15:19:39] andrewbogott: try to find "saltutil.killjob" [15:19:59] http://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.saltutil.html [15:20:15] * andrewbogott is in quoting hell [15:20:20] ah [15:20:27] <_joe_> ok, done. puppet-compiler can be run via jenkins now [15:20:42] \o/ ! [15:20:45] !:)) [15:21:21] <_joe_> https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/13/console for a live stream of compilation of 'production' in puppet 2.7 vs 3 for a smart subset of all nodes [15:21:48] <_joe_> it runs one job at a time, so don't expect to be able to use it in the near future [15:21:52] <_joe_> :) [15:22:12] _joe_: does it needs rights to be run for changes ? [15:24:57] <_joe_> matanya: no [15:25:05] <_joe_> matanya: not that I'm aware of [15:25:21] <_joe_> but it's limited to 1 concurrent run [15:25:21] (03PS2) 10Dzahn: let tin have a proper IPv6 address [operations/puppet] - 10https://gerrit.wikimedia.org/r/134284 [15:25:31] <_joe_> so for now any new job will be queued [15:25:36] yes, i don't have rights to run it [15:25:54] <_joe_> matanya: we should speak with hashar about this :) [15:26:06] i guess with greg-g [15:26:10] <_joe_> I am deep down the $mariadb dynamic lookup hole [15:26:14] based on membership in ops LDAP group? 
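(Annotation, not part of the log.) Two things probably worked against the logged one-liner: inside double quotes the `$(...)` is expanded by the local shell before Salt ever sees the command, and a bare `grep` over `ps` output can match itself. A minimal sketch of a narrower selection, assuming the runaway jobs all look like the `find / -user <uid> -print0` line quoted earlier. Only PID 9804 comes from the log; the other sample rows are made up for illustration, and the function only selects PIDs, it does not kill anything.

```python
import re
from typing import List

# Match only the chown-related finds, not every process with "find" in its name.
RUNAWAY = re.compile(r'find / -user \d+ -print0')


def runaway_pids(ps_output: str) -> List[int]:
    """Return PIDs from 'pid args' formatted ps output whose command matches RUNAWAY."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # header line or malformed row
        pid, args = parts
        if RUNAWAY.search(args):
            pids.append(int(pid))
    return pids


# Hypothetical sample in the shape of `ps -eo pid,args`.
SAMPLE = """\
  PID COMMAND
 9804 /bin/sh -c find / -user 605 -print0 | xargs -0 chown -h 2454
 9805 find / -user 605 -print0
 9806 xargs -0 chown -h 2454
12001 find /var/log -name *.gz
12050 grep find / -user
"""

selected = runaway_pids(SAMPLE)
print(selected)
```

The unrelated `find /var/log` job and the `grep` itself are left alone, which addresses the concern raised above that these finds are not the only ones running at a given moment.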
[15:26:18] nda issues and the like [15:26:30] <_joe_> matanya: in case just ask me :( [15:26:35] matanya: it comes up at the meetings though...:) [15:26:40] it did again yesterday [15:27:00] i'm blocked in so many ways :) [15:27:30] i'll push a patch to ldap: group : volunteer ou : wikimedia [15:27:45] matanya: Scott Lee .. do you know him? [15:28:03] he is trying to contribute [15:28:26] known by irc nick pancake so something similar [15:28:55] didn't see any conrib yet [15:29:15] don't talk to me about NDAs :P [15:29:27] (kidding, it's just an annoying situation right now) [15:30:24] matanya: thanks, i wanted to confirm it is indeed pancake [15:30:31] matanya: tell him how you got yours? :p [15:30:49] matanya: how did you get your NDA? [15:30:50] no, i'll lose it :P [15:30:53] haha [15:38:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [15:39:07] can someone bother looking at this ^ around for a few days already [15:44:52] <_joe_> matanya: that server is down, atm [15:45:27] do we know why/how long? [15:45:49] so please ack the alert [15:45:59] no point in ignoring alerts [15:46:33] the worst thing is if you get used to ignoring your alerting system [15:46:41] <_joe_> mutante: look at icinga :) [15:46:47] (03PS1) 10Giuseppe Lavagetto: puppet3: fix $mariadb dynamic lookup [operations/puppet] - 10https://gerrit.wikimedia.org/r/134374 [15:46:49] _joe_: doesn't tell me why it's down [15:46:57] <_joe_> matanya: well I didn't. [15:47:03] if there was a ticket ... 
[15:47:07] then that's ACKed [15:47:16] <_joe_> mutante: site.pp can give you a hint :) [15:48:01] <_joe_> that server is still not in service, I asked if someone knew why it was down [15:48:16] <_joe_> nobody answered --> mentally acked [15:48:55] i don't see hints in site.pp [15:49:01] creates a ticket [15:49:08] mutante: there is one [15:49:31] <_joe_> mutante: maerlant just includes standard [15:49:34] oh, come on, so you all now what is going on [15:49:43] i didnt [15:50:00] mutante: it is the toolserver box [15:50:02] what does that mean besides [15:50:06] not being puppetized [15:50:51] matanya: do toolserver people care? [15:51:07] nosy said keep up until june [15:51:19] but no one complains, so who knows [15:51:24] matanya: ticket number? [15:51:33] i don't see anything with maerlant in subject [15:52:37] looking [15:53:34] dammit, got confused with amaranth [15:54:07] right, amaranth is tampa [15:54:11] maerlant is esams [15:55:19] 19:37 mutante: maerlant - re-adding 2620:0:862:1::80:2 to eth0, starting nginx [15:55:25] mutante: RT ticket please [15:55:42] 21:24 RobH: maerlant still offline (200 days?). dropped esams ticket for onsite troubleshooting, as mgmt is offline. [15:55:49] those are all OLD entries from SAL [15:55:51] many months ago [15:56:06] matanya: yes [15:56:21] so it is down 200 days, but only complaining few days, i doubt it [15:57:24] just search wikitech for it and you will see ..it's old [15:57:29] old log entries [15:58:24] ACKNOWLEDGEMENT - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 daniel_zahn RT #7539 [16:00:17] thanks mutante [16:05:05] (03PS1) 10Filippo Giunchedi: fix analytics servicegroup name [operations/puppet] - 10https://gerrit.wikimedia.org/r/134375 [16:06:16] (03CR) 10Filippo Giunchedi: "triggered by this on neon:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134375 (owner: 10Filippo Giunchedi) [16:08:12] (03CR) 10Ottomata: [C: 031] "Hm, weird, ok!" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/134375 (owner: 10Filippo Giunchedi) [16:10:10] godog: sure it's not the extra invisible character in there like recently? [16:10:49] godog: https://gerrit.wikimedia.org/r/#/c/133369/ [16:11:58] godog: ehmm.. it seems unrelated but then maybe not, because it worked after that fix [16:12:21] puppet gets angry at dashes [16:15:14] (03CR) 10Dzahn: "i think it's wrong in that one place in the kafka role instead of in all the other places: role/analytics/kafka.pp: $nagios_servicegrou" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134375 (owner: 10Filippo Giunchedi) [16:15:51] (03CR) 10Dzahn: "related to Change-Id: I0969fe80e3c06 ?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134375 (owner: 10Filippo Giunchedi) [16:23:37] mutante: yeah I saw that, see my comment in the code review [16:23:43] Error: Could not find any servicegroup matching 'analytics_eqiad' (config file '/etc/icinga/puppet_services.cfg', starting on line 8603) [16:26:33] godog: i think you can fix the one place, vs. fixing all the other places [16:29:14] mutante: not sure I'm following, what do you mean? 
[16:31:58] godog: it is analytics_eqiad in a single place, it's analytics-eqiad in like 8 places [16:32:38] so i would just fix that single places [16:32:46] instead of renaming the entire group [16:33:44] mutante: though every @monitor_group has _ not - [16:35:11] yeah [16:35:14] i like _ better than - i think [16:35:20] especially if other spots have it that way usually [16:36:21] <_joe_> please NO [16:36:31] <_joe_> do NOT use '-' in class names [16:36:54] <_joe_> it will NOT work in puppet 3 [16:36:54] <_joe_> ok, it will work if you include the class [16:36:55] I don't mind either (but - doesn't require shift on us keyboards) [16:37:06] <_joe_> but you will be unable to reference its variables [16:37:23] <_joe_> see https://wikitech.wikimedia.org/wiki/Puppet_migration#Dashes_in_class_names_are_not_allowed [16:38:50] (03CR) 10Giuseppe Lavagetto: [C: 031] "https://wikitech.wikimedia.org/wiki/Puppet_migration#Dashes_in_class_names_are_not_allowed" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134375 (owner: 10Filippo Giunchedi) [16:39:34] (03CR) 10Filippo Giunchedi: "I think we should stick with _ for @monitor_group, it seems to be the standard, plus puppet (3) is fussy about - in class names for reason" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134375 (owner: 10Filippo Giunchedi) [16:39:35] <_joe_> godog: that is a by-by product of a change I made [16:40:57] _joe_: this isn't about a class name [16:41:06] its just something that the icinga module uses to group nodes together (i think) [16:41:14] a value of a variable [16:41:52] <_joe_> it's only the value? [16:42:36] <_joe_> ottomata: it's also the title of a define [16:43:03] <_joe_> which should not be a problem, either. [16:43:04] oh ja, then no - please! 
[16:43:07] naw, i think it might [16:43:15] hmmmm [16:43:18] wait [16:43:21] <_joe_> ottomata: no it doesn't but then again [16:43:29] -> My::Define::Nmae['instance-name'] [16:43:29] eah [16:43:30] its ok [16:43:31] you are right [16:43:37] i was as thinking about that case, but i think its fine [16:43:38] <_joe_> all servicegroups in nagios are $cluster_$place [16:43:48] <_joe_> ehm $cluster_$::site [16:43:58] <_joe_> so it's standardization [16:44:18] <_joe_> so I was wrong but I gave the correct advice :P [16:44:43] <_joe_> ok, going away for real :) [16:49:58] byeeee [16:54:39] mutante: if that looks good to you I'll go ahead and merge [16:57:19] (03PS1) 10Rush: wgRCFeedschanges/updates via udp to second destination [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134384 [16:58:14] (03CR) 10Dzahn: [C: 031] "if that's a standard and just about consistency then i'm fine with that, yea" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134375 (owner: 10Filippo Giunchedi) [17:13:35] PROBLEM - Puppet freshness on ekrem is CRITICAL: Last successful Puppet run was Tue May 20 14:12:50 2014 [17:14:53] warning: Not using cache on failed catalog [17:14:53] err: Could not retrieve catalog; skipping run [17:16:01] Could not find class role::ircd for ekrem.wikimedia.org [17:16:18] i remember somebody said somewhere to make it "ircd" instead of "irc" [17:16:30] right [17:16:36] Coren: ? [17:18:32] chasemp: https://gerrit.wikimedia.org/r/#/c/132495/ deleted role/ircd [17:18:39] but ekrem is still trying to use that role [17:18:46] ah [17:18:47] k [17:19:13] mutante: That wasn't me, but I'd agree with it. [17:21:01] it looks like now there is "role::mw-rc-irc" instead [17:21:05] mutante: I will address it in just a moment, is that cool? 
[17:21:21] mutante: the compromise w/ mark was not using ircd as a module [17:21:27] yea, sure, i was just reacting to icinga [17:21:28] since it's not a generic ircd but a custom one [17:21:47] I didn't get anything from icinga on it weird [17:22:09] +icinga-wm> PROBLEM - Puppet freshness on ekrem is CRITICAL: [17:22:10] from the bot [17:22:17] ah [17:36:34] bblack, pls comment - https://www.mediawiki.org/wiki/Requests_for_comment/Unfragmented_ZERO_design [17:36:47] let me know if there are any obvious blockers on your side [17:42:50] (03PS2) 10Rush: initial data.yaml for users/groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 [17:42:52] (03PS1) 10Rush: standardize a few things in admins.pp for conversion [operations/puppet] - 10https://gerrit.wikimedia.org/r/134394 [17:43:57] (03CR) 10Rush: "to ignore the main offenders and check for sanity:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134394 (owner: 10Rush) [17:44:12] (03CR) 10jenkins-bot: [V: 04-1] initial data.yaml for users/groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 (owner: 10Rush) [17:44:23] (03CR) 10jenkins-bot: [V: 04-1] standardize a few things in admins.pp for conversion [operations/puppet] - 10https://gerrit.wikimedia.org/r/134394 (owner: 10Rush) [17:46:32] (03CR) 10Rush: [C: 04-2] "WIP" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134394 (owner: 10Rush) [17:51:45] (03PS1) 10BryanDavis: Group1 wikis to 1.24wmf5 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134399 [17:52:16] bd808|deploy: we'll have a small update to do for wikidata [17:52:26] * bd808|deploy runs in fear [17:52:30] you can deploy w/o it but we'll want it soonish [17:52:50] waiting for jenkins [17:53:16] aude: Should we do the normal group1 update and then your patch or vice versa? 
[17:53:23] either way [17:53:43] it just means there are no error details in our error tooltip when a user gets an error [17:53:53] no console errors though, just does nothing [17:54:27] and i fear more issues may appear after deployment, although hope not / htink not [17:55:45] Well... [17:56:21] Let's do the group1 update and then you or hoo can jump on tin to do your update. [17:56:35] ok [17:57:30] * bd808|deploy waits for the clock to strike 18:00Z [17:57:33] :) [17:57:38] yeah, agree with bd808|deploy [17:59:51] * aude not used to deploying during lunch time ;) [18:00:01] (03PS1) 10Nemo bis: Gather all soft-disabled uploads wikis in one config item [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134400 [18:00:03] have to prepare earlier [18:00:30] Blerg. "error: Unable to append to .git/logs/refs/remotes/origin/master: Permission denied" [18:00:45] grah [18:00:56] Any roots available to fix permissions in /srv/deployment/scap/scap for me? [18:01:17] Apparently trebuchet is just as crappy as git:clone was for file permissions. [18:01:32] akosiaris: still around? [18:01:45] "-rw-r--r-- 1 root wikidev 147 May 7 22:49 .git/logs/refs/remotes/origin/master" [18:01:52] Should be 0664 I guess [18:01:54] godog: still available? [18:02:09] * greg-g pings the RT people first, for good measure [18:02:36] Is my buddy ori still feeling poorly? [18:02:40] i can [18:02:42] (03PS1) 10Rush: restore old ircd role/module for ekrem [operations/puppet] - 10https://gerrit.wikimedia.org/r/134402 [18:02:48] bd808|deploy: getting it now [18:02:53] Thanks RobH [18:02:57] this on tin? [18:03:09] mutante: yes? https://gerrit.wikimedia.org/r/#/c/134402/ [18:03:18] RobH: Yes, on tin [18:03:31] thanks RobH [18:03:55] I'm trying to update scap and `git fetch` dies with that file ownership error [18:04:08] bd808|deploy: try now pls [18:04:18] group should have write now [18:04:47] RobH: No joy. Now I can't even cd into /srv/deployment/scap/scap [18:05:19] wow... 
i had something off there [18:05:29] now should be group accessible [18:05:30] off by one sucks with chmod [18:05:35] yes, yes it does [18:05:50] Closer but "error: Unable to append to .git/logs/refs/remotes/origin/master: Permission denied" [18:06:07] (back to where you started ;) ) [18:06:11] I hate shared git clones :/ [18:06:14] <_joe_> do you need assistance? I should REALLY be off now [18:06:27] just unix file perms being hard [18:06:35] _joe_: RobH is on it I believe [18:06:42] <_joe_> ok :) [18:07:56] it says its group writeable [18:07:58] bleh [18:08:35] "-rw-r--r-- 1 root wikidev 147 May 7 22:49 /srv/deployment/scap/scap/.git/logs/refs/remotes/origin/master" [18:08:50] -R, RobH, -R [18:08:57] ok [18:08:58] (03CR) 10Nemo bis: Gather all soft-disabled uploads wikis in one config item (035 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134400 (owner: 10Nemo bis) [18:09:01] check now [18:09:06] sorry, i had a brain fart [18:09:21] too many other open windows and such [18:09:53] it's ok, if it's more than 2 weeks before use, I have to refresh my memory on octal vs rwx-display [18:09:57] RobH: Thanks! fatch worked. Running `git deploy sync` now [18:10:07] huzzah, deployment train rolls onward [18:10:18] choochoo [18:10:41] "0/230 minions completed fetch" [18:10:53] we never said how fast said train was ;] [18:13:05] !log `git deploy sync` for scap ended with "0/230 minions completed fetch" [18:13:05] Logged the message, Master [18:13:21] I bet the perms on the apaches are busted? [18:13:40] Maybe... I'll ssh to one and see what I can see. [18:13:58] Anybody want to look at slat logs to see if there's any clues there? [18:14:10] RobH: mind still staying on point? [18:14:50] yea if you need me to change a permission just lemme know [18:15:01] i'm around working on stuff for the forseeable next hour [18:15:10] Things on the apaches shouldn't have permissions issues. salt runs the checkout as root. 
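(Annotation, not part of the log.) For readers who also have to refresh their memory on octal versus rwx display: a short sketch mapping the two, plus a recursive "add group write" in the spirit of the `-R` hint above. It runs against a throwaway temporary directory; the function name and the directory layout are made up for the demo and are not the real paths under /srv/deployment.

```python
import os
import stat
import tempfile


def rwx(mode: int) -> str:
    """Render permission bits of a regular file the way `ls -l` shows them."""
    return stat.filemode(stat.S_IFREG | mode)


def add_group_write(root: str) -> None:
    """Recursively add the group-write bit to root, its subdirectories and files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            mode = stat.S_IMODE(os.stat(path).st_mode)
            os.chmod(path, mode | stat.S_IWGRP)


before_display = rwx(0o644)  # mode quoted in the log
after_display = rwx(0o664)   # mode suggested in the log

# Demo on a temporary tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "logs", "refs")
    os.makedirs(nested)
    target = os.path.join(nested, "master")
    with open(target, "w") as handle:
        handle.write("demo\n")
    os.chmod(target, 0o644)

    add_group_write(tmp)
    fixed_mode = stat.S_IMODE(os.stat(target).st_mode)

print(before_display, after_display, oct(fixed_mode))
```

Fixing only the top-level directory leaves nested files such as the reflog unchanged, which matches the "back to where you started" symptom seen before the recursive change.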
[18:15:12] RobH: Anybody want to look at slat logs to see if there's any clues there [18:15:24] bd808|deploy: oh right, of course [18:15:26] slat logs? [18:15:38] (03CR) 10Nemo bis: Gather all soft-disabled uploads wikis in one config item (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134400 (owner: 10Nemo bis) [18:15:40] I don't know where to look to see why salt isn't triggering the fetch/checkout [18:15:50] oh, salt logs? [18:15:57] (i dunno how you guys deploy now =P) [18:15:59] RobH: yep, sorry [18:16:00] The "returner" is to redis on the salt master [18:16:41] RobH: the scap program is deployed with trebuchet. Then it still uses rsync to push out mw-core [18:16:59] Right now I'm trying to update scap to the latest master [18:17:12] greg-g: where are the salt logs stored for this? [18:17:32] The would be... on the salt master (paladium?) [18:17:34] just palladium? [18:17:37] oh, ok, yea [18:18:33] huh /var/log/salt has no updated logs with content [18:18:46] on palladium [18:18:46] 0k log files [18:18:46] (03CR) 10Dzahn: [C: 031] "i support this because ekrem is still up and the puppet run breaks there, so it's good to fix that and only delete it after ekrem is decom" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134402 (owner: 10Rush) [18:20:00] greg-g: we may need to pull mor ethan me on this [18:20:09] cuz honestly im not sure what it should look like normally [18:20:20] RobH: https://wikitech.wikimedia.org/wiki/Trebuchet#Troubleshooting suggests restarting the minions [18:20:23] and the fact im reading the wikitech pages on trebuchet now isnt good [18:20:43] huh [18:20:52] bd808|deploy: so should just apply that with the regex for mw servers? [18:20:54] RobH: who do you suggest? mutante, you've helped with parsoid stuff right? 
[18:21:04] i think bd is on right track [18:21:05] parsoid+salt stuff [18:21:08] k [18:21:15] RobH: I don't think it would hurt anything [18:21:17] mutante: stand-down, nvm ;) [18:21:19] but yea i dont mind if other opsen gets involved [18:21:29] nah, no stand down, blind is being lead by half blind here! ;] [18:21:38] i'll restart salt minions on mw servers [18:22:03] I pinged Ryan in another channel, but he's been idle for over an hour so... [18:22:15] what's going on [18:22:23] !log restarting salt minions on mw servers [18:22:28] Logged the message, RobH [18:22:29] i can say this, the file "master" is owned by root [18:22:33] root:wikidev that is [18:22:34] the mw servers are failing the initial fetches [18:22:38] NOT by trebuchet [18:22:59] while higher up in /srv/deployment there is trebuchet as an owner [18:23:00] mutante: Trying to `get deploy sync` scap from tin with result of "0/230 minions completed fetch" [18:23:03] bd808|deploy: try now [18:23:09] i restarted salt minions on all mw servers in eqiad [18:23:37] which matches the description you linked in https://wikitech.wikimedia.org/wiki/Trebuchet#Troubleshooting [18:23:54] RobH: Unfortunately "0/230 minions completed fetch" still [18:23:58] hrmm [18:24:09] bd808|deploy: is Unable to append to .git/logs/refs/remotes/origin/master: Permission denied" solved? [18:24:25] my understanding is yes [18:24:30] i fixed that one and now its just minions failing fetch [18:24:34] bd808|deploy: ? [18:24:36] mutante: Yes. RobH fixed that and `git fetch` worked [18:24:38] if wikidev needs to write to it, yea [18:24:52] alright [18:25:08] Ok. We can table this for now. I don't need updated scap python to do the group1 rollout [18:25:25] We should figure out what's wrong soon though. 
[18:25:34] next steps is torubleshooting from multiple location steps i think [18:25:38] blah, typos [18:25:41] context conveyed [18:26:02] my memory is hazy, but IIRC when we started deploy Ryan ran some salt command manually to make the first fetch work [18:26:04] I'll assume RobH is on point (whether he delegates is up to him) for further troubleshooting post deploy [18:26:12] really? [18:26:14] i dont wanna beeee [18:26:20] then delegate :) [18:26:23] damn it the rt duty folks are sleeping [18:26:28] This has been fetched a few times already [18:26:29] yeah, I tried them first, honest [18:26:30] (03PS2) 10Nemo bis: Gather all soft-disabled uploads wikis in one config item [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134400 [18:26:31] rt duty should shift hours for deployment [18:27:03] ok, so salt can call from the mw hosts [18:27:13] i just did a test repo fetch just fine [18:27:27] hrmm [18:27:34] does this involve submodules? [18:27:36] (03PS3) 10Nemo bis: Gather all soft-disabled uploads wikis in one config item [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134400 [18:27:53] gwicke: Nope. Simple repo [18:27:57] greg-g: by delegate you mean put all the details as we have them into an RT ticket and assign it to godog as the rt triage person? [18:28:03] (03CR) 10Nemo bis: "PS2-3 adds some more wikis which did the same but leaving sysop upload permissions implicit." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134400 (owner: 10Nemo bis) [18:28:17] RobH: `git deploy sync` failed for me in test/testrepo as well [18:28:23] bd808|deploy, thx [18:28:23] cuz thats what im totally goign to do rather than spend all afternoon poking in a system i dunno (normally i'd say thats cool and do it but i have to do procurement stuff today!) 
[18:28:24] RobH: as long as it is resolved before Thursday morning Pacific [18:28:30] RobH: so, yeah [18:28:33] im not willing to promise that ;] [18:28:42] RobH: cc Ryan on it [18:28:51] or... bug report? [18:28:53] bd808|deploy: wait, are you speaking in terms of troubleshooting or in terms of its halting your deploy? [18:28:54] public? [18:29:15] what were the last changes to this (gerrit etc) [18:29:20] RobH: I can skip this step and finish the MediaWiki deploy [18:29:21] this is just for updating scap, we can use the current scap version for now [18:29:31] * bd808|deploy agrees with greg-g  [18:30:08] you guys are pushing this on me, ill take it and make a report [18:30:08] agrees with greg-g and bd808|deploy , skip this window, user current scap, make Bugzilla [18:30:08] but konw that while im doing this [18:30:11] im not getting codfw orders done. [18:31:05] (not skip the window, we're still doing the normal mw rollout to non-wikipedias) [18:31:12] (03CR) 10BryanDavis: [C: 032] Group1 wikis to 1.24wmf5 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134399 (owner: 10BryanDavis) [18:31:16] So I don't understand how you guys deploy [18:31:21] (03Merged) 10jenkins-bot: Group1 wikis to 1.24wmf5 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134399 (owner: 10BryanDavis) [18:31:22] for me to write up whats wrong I have to I suppose [18:31:25] "it's complicated" [18:31:34] so... quickly/enough for the bug report: [18:31:47] (or someone else could file the bug) [18:31:57] you're right [18:32:02] * greg-g goes to write in bz instead of here [18:32:09] sorry man not to push it off [18:32:12] no no, you're right [18:32:42] !log bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.24wmf5 [18:32:45] Logged the message, Master [18:33:50] !log Gave up on updating scap with trebuchet [18:33:55] Logged the message, Master [18:34:48] when did you do this the last time? 
(updating scap with trebuchet) [18:35:48] mutante: On or after April 28th [18:36:03] That's the version checked out on mw1173 (randomly sampled) [18:36:07] oh, could this be crazy firewall stuff? (that would be nice) [18:36:24] (not saying it is, im just shooting in dark, so not blaming firewalls!) [18:36:26] It looks like .git was touched there on May 7th [18:36:44] * bd808|deploy always blames firewalls until they are proben innocent [18:36:48] *broven [18:36:52] *proven [18:36:58] sheesh [18:37:02] bd808|deploy: srsly, why cant we just assume good faith? ;] [18:37:23] whenever we can jump in.... [18:37:28] I worry about deployers who typo [18:37:29] Firewalls (and router acls) have been my enemy for 15+ years [18:37:29] ;) [18:37:42] tab-complete ftw [18:37:47] i'm still getting cached js though and only see the bug with debug=true [18:37:47] aude: Go for it. Fatalmonitor looks good [18:37:50] ok [18:38:28] * aude waits for jenkins [18:39:48] mutante: I would say that May 7 22:49 was the last time `git deploy sync` was used on the scap repo [18:40:06] I was wondering who the best person to ask about the open Operations Security Engineer job rec would be? [18:40:22] (03PS2) 10Rush: restore old ircd role/module for ekrem [operations/puppet] - 10https://gerrit.wikimedia.org/r/134402 [18:40:36] npcomp: mark probably [18:41:06] bd808: successfully, right? [18:41:11] (03CR) 10Rush: [C: 032 V: 032] "go" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134402 (owner: 10Rush) [18:41:12] * greg-g doesn't want to assume [18:41:27] greg-g: Yes. sucessfully [18:42:48] exception log is full with stuff from commonswiki [18:42:55] UploadFromStash [18:43:36] marktraceur: tgr_ ^^ related to UploadWizard or some such? 
[18:44:01] (03PS1) 10RobH: setup mgmt dns for virt server testing [operations/dns] - 10https://gerrit.wikimedia.org/r/134407 [18:44:18] greg-g: yes, it's an API only used by UploadWizard [18:44:52] !log aude synchronized php-1.24wmf5/extensions/Wikidata 'Fix jquery error tooltip issue' [18:44:53] "Error: 1213 Deadlock found when trying to get lock; try restarting transaction (10.64.16.29)" [18:44:55] Logged the message, Master [18:45:10] tgr: can you or marktraceur take a look to say whether it's worrisome or not? [18:45:23] greg-g and others - thoughts on getting wikibugs in here? [18:45:25] Looks to be database sandness. [18:45:29] in here? no [18:45:37] (03PS1) 10Rush: restore old ircd role/module for ekrem [operations/puppet] - 10https://gerrit.wikimedia.org/r/134408 [18:45:44] ah, forgot icinga-wm is here :) [18:45:55] (03PS2) 10Rush: restore old ircd role/module for ekrem [operations/puppet] - 10https://gerrit.wikimedia.org/r/134408 [18:45:59] yurikR: definitely a step in the right direction. re: "All m. & zero. results are varied on X-CS, X-SUBDOMAIN headers" - the main content urls are rewritten internally without m/zero, so this really means just the banner content links/api stuff right? [18:46:03] greg-g: it's fairly programmable now, so it will be just a tiny subset of things, but I guess makes sense to not get it here [18:46:15] (03CR) 10Rush: [C: 032 V: 032] "ircd forgot a file" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134408 (owner: 10Rush) [18:46:28] * marktraceur tries running through commons uploadwizard [18:46:42] YuviPanda: yeah, it's already loud in here with what they have, maybe in the future when all of Ops related things are in phab anyways [18:47:08] YuviPanda: but I have only partial say officially :) [18:47:08] greg-g: are these errors in logstash? [18:47:08] RECOVERY - Puppet freshness on ekrem is OK: puppet ran at Tue May 20 18:46:54 UTC 2014 [18:47:08] that fixed it :) [18:47:28] * greg-g looks [18:47:35] aude: ? 
[18:47:52] (PS2) RobH: setup dns for virt server testing [operations/dns] - https://gerrit.wikimedia.org/r/134407 [18:47:58] jquery issue [18:48:28] * aude hopes that was the last and only issue [18:48:30] ah [18:48:44] (PS3) RobH: setup dns for virt server testing [operations/dns] - https://gerrit.wikimedia.org/r/134407 [18:48:50] tgr: I don't see it in logstash [18:49:06] well, we had another one yesterday, so hope only 2 issues [18:49:18] greg-g: I probably won't be able to take a look then [18:49:34] tgr: All I see is https://logstash.wikimedia.org/#dashboard/temp/JO2W0QQ1SzKT5x-XVobJcA [18:49:44] hoo: got any more details re UploadfromStash? [18:50:24] I'm running an upload now, we'll see how it goes [18:50:38] is there a window to merge another deployment.pp style change [18:50:46] greg-g: :) I'll let it be now [18:50:52] just got one of them in yesterday [18:50:54] greg-g, tgr: The logstash exception logs are very unreliable. It's a transport problem where udp2log truncates the json events. [18:51:35] greg-g: Error: 1213 Deadlock found when trying to get lock; try restarting transaction (10.64.16.29) [18:51:51] mutante: maybe in an hour?
I'd like marktraceur & tgr etc to finish debugging this uploadstash error first if possible [18:52:17] unless it's really really really a no-op :) [18:52:53] greg-g: i'll wait, i'm not a huge fan of "should be no-op" merge message myself:) [18:53:00] :) :) [18:54:00] greg-g: I didn't see any issue with a vanilla UW run; dunno about what the errors are saying [18:54:25] Also "only used by UploadWizard" isn't necessarily accurate; the stash upload is available for any client to use AFAIK [18:54:40] (CR) RobH: [C: 2] setup dns for virt server testing [operations/dns] - https://gerrit.wikimedia.org/r/134407 (owner: RobH) [18:54:56] yeah, it's a public API [18:55:11] but only used by UW in practice AFAIK [18:55:25] anyway, the log greg-g just linked is definitely from UW [18:56:19] not GLAM/ [18:56:26] s\/\?\ [18:56:42] greg-g: I think GLAM is a straight upload from a job... [18:56:49] gotcha [18:58:30] (PS3) Rush: initial data.yaml for users/groups [operations/puppet] - https://gerrit.wikimedia.org/r/134242 [18:58:32] (PS2) Rush: standardize a few things in admins.pp for conversion [operations/puppet] - https://gerrit.wikimedia.org/r/134394 [18:59:38] well [18:59:50] there is an insert command with this comment [18:59:52] # Test to see if the row exists using INSERT IGNORE [18:59:52] # This avoids race conditions by locking the row until the commit, and also [18:59:55] # doesn't deadlock. SELECT FOR UPDATE causes a deadlock for every race condition. [18:59:56] bd808: can you do a test git-deploy of scap? [19:00:02] and it deadlocks [19:00:11] tgr: :/ [19:00:13] greg-g: Yup. [19:01:28] greg-g: \o/ [19:01:39] weee [19:02:13] !log Updated scap to 7b6fc47 [19:02:17] Logged the message, Master [19:04:41] bblack, not sure what you mean by the main content urls are rewritten - all together, there will be desktop, non-zero mobile (m.), zero mobile (m.), and no images zero mobile (zero.)
[19:05:06] greg-g: on a completely wild guess I would flag https://gerrit.wikimedia.org/r/#/c/133268/ [19:07:21] is beta running old code by any chance? [19:07:33] aude: "Class 'ComposerAutoloaderInitb40c909e83d8b70e6daa93d0830bc [19:07:33] e9e' not found in /usr/local/apache/common-local/php-1.24wmf5/extensions/Wikidat [19:07:33] AaronSchulz: we're getting Error: 1213 Deadlock found when trying to get lock; try restarting transaction (10.64.16.29) [19:07:33] a/vendor/autoload.php on line 7" [19:07:50] AaronSchulz: tgr thinks it might be related to https://gerrit.wikimedia.org/r/#/c/133268/ ? [19:08:05] what? [19:08:07] (sorry for the paste collision) [19:08:13] aude: Looks like it's all from mw1138. I'll sync it [19:08:31] this happend before when hoo synced [19:08:58] i don't know how to prevent, as the code is correct in git [19:09:09] oh crap [19:09:30] which mw? [19:09:30] blockUI requires jQuery v1.3 or later! You are using v1.11.1 <-- is this known ? [19:09:38] on every page @ commons [19:09:40] !log ran sync-common on mw1138 [19:09:45] Logged the message, Master [19:09:55] greg-g: how common? [19:09:56] mw1128 [19:09:57] 38 [19:09:59] 1138 [19:10:07] is that the same one? [19:10:11] nope [19:10:17] hoo: how common were those uploadstash errors? [19:10:26] greg-g: sorry, busy [19:10:28] yeah [19:10:32] just realized [19:10:42] marktraceur: ^ [19:11:03] aude, hoo: I ran sync-common on mw1138. Not sure why it was out of step with the rest of the cluster. [19:11:19] bd808: I'll do what Reedy did [19:11:25] greg-g: was there any deploy to commons last few hours ? [19:11:26] it still is [19:11:48] matanya: yeah [19:11:55] Uh not sure [19:11:58] hoo: I think I just did that (what Sam did) [19:11:59] hoo: which was? [19:12:01] I didn't get it with my one upload [19:12:02] every page on commons show blockUI requires jQuery v1.3 or later! You are using v1.11.1 [19:12:09] marktraceur: can you see the logs? 
[19:12:13] aude: rm -rf wikidata and then sync [19:12:14] I haven't looked, no [19:12:18] marktraceur: I assume if hoo can you can ;) [19:12:21] sounds evil [19:12:22] wait [19:12:23] greg-g: see #wikimedia-commons for all complains [19:12:30] ok :/ [19:13:03] matanya: yeah, figured as much :( [19:13:07] (re jquery update) [19:14:05] aude: Stopped now AFAIS [19:14:21] hoo: ok [19:14:31] scary this happens [19:14:34] !log fixed Wikidata for php-1.24wmf5 on mw1138 by manually removing it and then running sync-common [19:14:37] Logged the message, Master [19:15:11] do we do rsync file diff base on file size or so? [19:15:11] bd808: ^ [19:15:46] only happened twice now... first time was during hackathon [19:15:46] aude: Would be nice, if we could force composer to not do this [19:15:54] hoo: +1 [19:16:36] hoo: Let me look at the rsync flags [19:17:21] aude: Well, we could just checkout HEAD all these files after composer update [19:17:45] hoo: the sync is basically `rsync -a --delete-delay --delay-updates --compress --delete --no-perms` [19:17:53] but that can be error prone if you forget one (unit tests will cathc that, thouhh) [19:18:24] i checked the diff carefully including for that [19:18:36] i see what you mean [19:18:45] matanya: sounds like someone did not realize there are numbers larger than 9 [19:18:52] what is using blockUI? [19:19:06] a lot of gadgets on commons [19:19:32] so, when folks have a chance, we have issues on beta (only, it seems) [19:19:34] is that a 3rd party library? can we just upgrade it? [19:21:16] tgr: http://jquery.malsup.com/block/ [19:22:18] tgr: post about this 9 month ago : http://wordpress.org/support/topic/blockui-requires-jquery-v13-or-later-you-are-using-v1102 [19:22:46] bd808: That looks totally sane... now idea how that can happen [19:24:23] matanya: I suppose this is for the 1.x branch? 
the 2.x which is on GitHub does not seem to make such version checks anywhere [19:24:39] correct [19:24:43] greg-g: Is the jquery noise the same as the stash error noise? [19:24:50] * marktraceur loads up the logs [19:25:02] marktraceur: that'd be weird, I don't think so [19:25:19] bd808: mh... can we maybe add --checksum just for sync-common? [19:25:21] i suggest to upgrade that lib to v2, if possible [19:25:35] not sure how much worse that would be [19:25:42] (performance wise) [19:25:55] matanya: is blockui in the gadget or in mwcore? [19:26:07] * greg-g hasn't pulled down mwcore since reinstalling [19:26:14] i wonder why we have such an old version of wikidata code on beta? [19:26:27] * matanya is looking [19:26:48] aude: really? file a bug about that please. [19:26:49] greg-g: https://commons.wikimedia.org/wiki/MediaWiki:Gadget-jquery.blockUI.js [19:26:58] and it's v2.66 actually [19:27:10] or did the code move? [19:27:12] bd808: ? [19:27:30] /data/project/apache/common/php-master ? [19:27:33] april 21 [19:27:48] also core and visual editor (can't be for real) [19:27:57] csteipp: hey, would be nice if you'd answer on https://gerrit.wikimedia.org/r/#/c/132393/ [19:27:58] tgr: Do you want to revert and deploy for that patch? I'm not sure if it's the cause but your guess is as good as mine [19:28:03] aude: The code in /data/project isn't used anymore [19:28:11] oh, ok! [19:28:17] i see /a/common [19:28:52] Yeah. 
That's a symlink to the files that Jenkins preps in beta now [19:29:08] ok [19:29:14] greg-g: all i see is https://gerrit.wikimedia.org/r/#/c/41938/ [19:29:18] And /usr/local/apache/common-local is the scap deploy target [19:31:08] ok [19:31:40] matanya: pretty sure we don't have that on wmf wikis [19:32:07] greg-g: seems like the issue was spotted [19:32:10] matanya: yep, not on commons [19:32:17] cool /me goes back over to that channel [19:32:23] seems we're getting an old cached version of a js file that's causing errors on beta (only) [19:32:36] and missing css (per the parser output issue we had last week) [19:33:07] i don't know if touching a file there would fix and apparently don't have permission (or sudo mwdeploy) [19:33:33] matanya: I am at a loss, I don't see any jQuery version check happening in the bockUI code [19:33:43] aude: on beta? ask in -qa [19:33:57] tgr: thedj fix it [19:34:08] oh, ok [19:34:25] greg-g, it's still very rough; but introducing jouncebot... it'll ping people when their window is up (and you can ask it what the next window is with the 'next' command) [19:34:35] tgr: for ref: https://commons.wikimedia.org/w/index.php?title=MediaWiki:Gadget-jquery.blockUI.js&diff=124461872&oldid=84405899 [19:34:41] greg-g: maybe [19:34:43] greg-g: ^ [19:34:56] mwalker: niiice [19:35:16] matanya: interesting [19:35:18] (PS1) Legoktm: Remove NS_USER_TALK from $wmgNamespacesToPostIn [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134472 (https://bugzilla.wikimedia.org/65524) [19:35:20] jouncebot: next [19:35:21] In 1 hour(s) and 39 minute(s): Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140520T2100) [19:35:43] oooh, a bot announcing stuff? [19:35:48] mwalker: btw, will it break tomorrow morning since Megan doesn't have an IRC name?
[19:35:56] hopefully not [19:36:02] :) [19:36:20] jouncebot: next2 [19:36:25] :) [19:36:37] asking for the moon here arentcha [19:36:54] marktraceur: dunno, seems like it was done to prevent some sort of race condition [19:37:04] mwalker: it's lovely, thank you :) [19:37:05] can you check how often this happens? [19:37:30] mwalker: There's no github.org :P [19:37:53] * [jouncebot] (tools.joun@208.80.155.145): https://github.org/mattofak/jouncebot [19:37:56] heh, yeah [19:38:06] jouncebot: help [19:38:15] eek [19:38:27] jouncebot: next [19:38:27] In 1 hour(s) and 32 minute(s): Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140520T2100) [19:38:27] tgr: Looks like a couple times a minute [19:38:33] nice one :) [19:39:02] marktraceur: we should probably roll it back then [19:39:15] AaronSchulz: ^^ any objections? [19:39:54] AaronSchulz: still talking about https://gerrit.wikimedia.org/r/#/c/133268/1 [19:40:22] Hm maybe more like once every couple of minutes but still [19:42:55] * AaronSchulz has a hard time following what's being talked about in the channel [19:43:30] there were about 3 difference conversations at one point. 
[19:43:36] s/ce/t/ [19:44:21] AaronSchulz: there are lots of database deadlock errors generated on commons [19:44:40] in LocalFile:recordUpload2, when it attempts to write the image table [19:45:14] https://gerrit.wikimedia.org/r/#/c/133268/1 seems like the only related change that was deployed to Commons today [19:45:17] that has been the case for months (as with waves of recent changes ones) [19:45:27] mwalker: Shame on you, jouncebot doesn't reply to CTCP SOURCE :) [19:45:34] * AaronSchulz looks at rates [19:45:44] marktraceur, ctcp is hard [19:45:59] *to test [19:46:39] The weird thing is, we don't seem to be getting reports of UW failures [19:46:47] Maybe people are too used to it failing [19:47:22] marktraceur: or maybe it is successful but messes up file/page history [19:47:32] upload errors have a tendency to do that [19:47:47] https://commons.wikimedia.org/w/index.php?title=File:Lindy_Hop_at_Washington,_DC%27s_DuPont_Circle.jpg&action=history seems fine [19:49:53] rate is up though [19:53:33] bblack, was disconnecting, if you replied [19:55:38] greg-g, for pdf work, it would be lovely if cscott had deploy rights -- what is the long arduous process for that? [19:55:59] mwalker: Ask, manager approves, wait 3 days, done. [19:56:05] via RT [19:56:13] working on that [19:56:23] https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Deployment_requirements [19:56:29] James_F: :* [19:56:30] cscott: see ^^ [19:56:44] marktraceur: according to the last deadlock error in innodb, it's two threads doing the INSERT of similarly named files waiting on each other [19:57:46] twkozlowski: :-) [19:58:37] Hm. 
[19:58:53] bd808|LUNCH: assuming since |LUNCH you aren't mid-bug report for tracking the follow up needed re git-deploy/scap, so i'll do it right now [20:00:08] AaronSchulz: sounds like https://bugzilla.wikimedia.org/show_bug.cgi?id=64883#c5 but that bug does mess with the file histories and here the affected files seem to be all right [20:00:11] (PS1) Jgreen: modify apache-fast-test to fetch config from web [operations/puppet] - https://gerrit.wikimedia.org/r/134478 [20:00:30] * AaronSchulz is still making sense out of the innodb output [20:03:16] (CR) Jgreen: [C: 2 V: 1] modify apache-fast-test to fetch config from web [operations/puppet] - https://gerrit.wikimedia.org/r/134478 (owner: Jgreen) [20:04:33] AaronSchulz, marktraceur: this seems to affect users who are doing batch uploads with UploadWizard, hence the similarly named files [20:07:01] (PS2) Nemo bis: Remove NS_USER_TALK from $wmgNamespacesToPostIn [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134472 (https://bugzilla.wikimedia.org/65524) (owner: Legoktm) [20:07:04] it's clearly some sort of gap locking problem [20:07:37] it would be odd for the FOR UPDATE in recordUpload2() to be a problem when recordUpload2() is wrapped in LocalFile::lock(), which also selects the row FOR UPDATE by primary key [20:08:56] yurikR: in varnish, the main content url for /wiki/Whatever gets the m. or zero. stripped from the hostname before passing on to the apache backends (and replaced with the Subdomain header) [20:09:30] yurikR: that's what I meant about varying... when you say you'll only vary on X-CS + X-Subdomain on m./zero., that means not on the main content, just on the zero API stuff for banners, etc?
[20:10:21] (at least, that's what I seem to remember reading before) [20:10:57] otherwise things don't make sense if the main content is still varying on X-CS [20:11:23] bblack, of course we will vary on the URL's path - that's a path, language, x-cs ("enabled" or nothing), x-subdomain (nothing, m, zero) [20:11:37] that would be the full list of vary on [20:11:40] that's not what I meant :) [20:12:20] Are you saying the textual article content for the mobile english version of the article on Coffee will vary on X-CS still? because that doesn't solve anything. [20:12:34] oh [20:12:41] it will vary on it, but X-CS becomes either "enabled" or unset [20:12:47] which is only two variants [20:12:48] I missed the enabled-or-nothing bit, you're redefining what X-CS means :P [20:12:54] exactly! [20:13:00] let's pick a new header name? it will probably break something if we don't [20:13:17] i would rather not - this way we can gradually introduce the new system [20:13:20] without any new headers [20:13:42] we can slip it in one carrier at a time if we want to [20:13:47] and roll back easily [20:14:04] ok [20:14:43] so, the new meaning of X-CS (in terms of varying) is: if it's a special zero page like api/banner stuff, it's the carrier ID, but for other pages it's a boolean flag, and either way we vary on it? [20:14:52] yep [20:14:54] you got it :) [20:15:39] I think it will work for us, assuming we have a realistic plan for getting analytics on-board to kill the if/else [20:17:00] bblack, the great thing about analytics -- we can totally roll out this new system and reduce variance, while keeping analytics happy with if/else. [20:17:00] once they are done, they can remove it :) [20:17:00] no dependencies [20:21:00] hm, what's the policy for user accounts on RT. as an WMF employee i should have one, right? [20:21:29] i can't seem to login, trying to figure out the 1st order q of whether i've forgotten my login info, or never had it to begin with. 
[20:22:06] cscott: I had that same question, I think I wound up not having one initially [20:22:17] RT accous are auto created when you email RT [20:22:48] https://wikitech.wikimedia.org/wiki/RT#RT_access [20:22:53] "The best way to request access to RT is to use it. :) Just mail ops-requests@rt.wikimedia.org. You already have an autogenerated user by just mailing it; if you need more permissions, request them and we'll handle it. " [20:23:03] greg-g: ok, yay [20:23:35] greg-g, what would be the best way to roll out zero refactoring? I already mentioned it to you - I am refactoring zero extension into 3 extensions - JsonConfig, Zero-Banner, and Zero-Config [20:23:36] tgr, AaronSchulz - what's our status? [20:23:46] yurikR: carefully [20:23:50] thank you!!! [20:23:54] :) [20:24:04] yurikR: you mean time-wise? [20:24:10] what's the procedure to add a new ext to prod? [20:24:11] Heh, was about to say "I think refactoring zero extensions into three is just called 'writing three extensions'" [20:24:23] marktraceur: :P [20:24:28] marktraceur, not exactly - the code is mostly the same - its called refactoring :)) [20:24:31] yurikR: you mean technically/the keys you type? [20:24:39] (PS1) Nemo bis: Remove unused AFT config for AbuseFilter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134482 [20:24:42] is it documented anywhere? [20:24:44] https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Case_1d:_new_extension [20:24:54] ooo! we have that :) [20:24:57] :) [20:25:05] MaxSem is also pretty good at it :) [20:25:18] ? [20:25:20] MaxSem hates me now :) [20:25:25] deploying new extensions on prod [20:25:48] basically, yu follow the manual:) [20:25:51] heh [20:25:53] marktraceur: it would be nice to remove the SELECT from LocalFile::lock and update the one caller that uses that value [20:25:59] carefully, right?
[20:26:16] yurikR: dot your t's, cross your i's [20:26:18] just rely in $wgMemc alone (maybe use LockManager later for more robustness) [20:26:30] ok, gotcha. as always. and watch fatalmonitor [20:26:58] yurikR: also, best to do it in phases (ie: just enable it on one wiki first before going to all wikis) [20:27:14] not sure if that applies in your case [20:27:23] the new 'FOR UPDATE' would also have to go [20:27:42] yes it does, i can do staged. I might even obsolete the zero ext all together :). Is there an easy way to [20:27:43] greg-g, for him it's harder because Zero is already deployed [20:27:46] maybe the row could be read by changing the isolation level and back instead (READ-COMMITTED) [20:28:07] is there an easy way to beta cluster deploy without prod? [20:28:11] yes [20:28:23] we have separate config for beta [20:28:40] yurikR: yeah, what max said [20:28:44] the -labs files [20:28:52] oh, yes, that would work. So basically deploy it as part of prod, but only change the beta configs to enable it [20:29:05] or even without it - because beta runs on master [20:29:21] right, just edit the -labs.php settings files [20:29:35] and extension-list-labs [20:29:39] doing a select FOR UPDATE for a non-existing row blocks out the INSERT for similarly named images (unless the closest existing rows were lexicographically super close to the non-existing one) [20:29:40] to break/test on beta cluster, then do the same on the non -labs ones to break/rollout to prod :) [20:29:44] thx! [20:29:55] <^d> Break all the clusters [20:29:56] tgr: I'm not really familiar with this area of the code, are you able to fix it up or should I dive in? [20:30:15] ^d, starting from scratch is always nice [20:30:15] ^d: :) [20:30:34] * MaxSem runs TRUNCATE TABLE revision; [20:31:02] <^d> I had a maintenance script that did that a long time ago. [20:31:11] <^d> Was called destroyWiki.php [20:31:19] <^d> Useful for when you fubar your own wiki. 
[20:31:21] PUT IT IN CRON [20:32:20] performance would go up [20:33:31] <^d> smaller tables -> less data to query [20:33:33] <^d> EVERYBODY WINS [20:35:58] !log Reload zuul to deploy I80496db747a8668be [20:36:00] ^d, vagrant destroy + vagrant up ;) [20:36:02] Logged the message, Master [20:36:46] marktraceur: I'll making a patch now...I'll see how far it gets [20:37:00] 'kay [20:38:56] hashar: ping [20:39:10] hashar: Why is it that jobs always go LOST when you create them and then trigger from Zuul? [20:39:21] Is there some kind of undocumented initiation ritual? [20:39:33] https://gerrit.wikimedia.org/r/#/c/132821/ [20:39:40] https://integration.wikimedia.org/ci/job/rcstream-pep8/ [20:39:40] i can abandon and resubmit [20:39:49] yurikR: it would still be nice to have a viable plan for analytics first, even if they're not done implementing it yet. otherwise we could get stuck with the if/else as an additional constraint on top of the new system [20:39:50] No, it'll fail either way [20:39:58] Krinkle: yeah i noticed that during the hackathon. Work around is to save one job via the Jenkins GUI [20:40:00] the problem is in jenkins/zuul somewhere [20:40:10] hashar: Yeah, build with parameters [20:40:25] but that's hard for a new project, there is no existing job build to copy from [20:40:34] Is there a bug for this? This is bad. [20:40:38] Krinkle: seems creating a job via the Jenkins API does not trigger a registration of the corresponding function in gearman [20:40:52] ah [20:40:54] wait, save one job? [20:40:55] you can save ANY job [20:40:57] or build a job? [20:41:02] that triggers the registration process [20:41:03] job config [20:41:06] Oh, that's better. [20:41:20] you can debug it in Zuul gearman server [20:41:27] to check whether the job is properly registered [20:41:28] What do I look for? 
[20:41:34] I looked there, but it's all greek to me [20:42:36] Krinkle: https://www.mediawiki.org/wiki/Zuul#Debugging $ echo status|nc -q 3 localhost 4730|grep whatever [20:42:45] that will show you the Jenkins job registered in Zuul [20:42:59] the one you just created is probably not there [20:43:35] it is there now :] [20:43:51] hashar: i'll look into the two ci bugs you flagged today btw, sorry i haven't replied yet [20:44:11] bblack, we have a plan for them - they will implement post-filtering on hadoop [20:44:21] ori: you might want to delegate to some other folks :] [20:46:10] ^d: I'm trying to match up an existing wikitech user with their old svn username. Things look generally right, but gerrit is still committed to his old shell name, even though there's no longer an ldap entry for that name at all. Does gerrit cache this someplace? Or have its own user db? [20:46:27] old name: grabovsky, new name: mgrabovsky [20:47:26] <^d> It's....in its own table. [20:47:48] <^d> Renaming users is a pain in the rear. If they just login as the new name it'll work but it won't be attached to the old one. [20:50:01] <^d> andrewbogott: I have docs on this. One sec. [20:50:18] Thanks. And, I know it's a pain, I'm not quite sure how I got lured into this :( [20:50:31] <^d> https://wikitech.wikimedia.org/wiki/Renaming_users, #3 is what you're wanting [20:50:49] cool, thank you! 
[20:50:53] * andrewbogott breaks things more [20:50:56] <^d> yw [20:52:09] hashar: https://gerrit.wikimedia.org/r/#/c/131466/5 [20:53:53] ^d: Oh, that's about how to change their full name (aka 'on wiki' name) whereas I want to change his shell name which gerrit calls 'Username' [20:54:44] Krinkle: I havent caught up with those patches yet [20:54:58] Just checkout and verify colors are fine [20:55:02] (for oyu) [20:55:15] not asking to merge [20:55:48] !seen pancakes [20:56:08] .seen pancake [20:56:19] .seen panforte [20:56:26] it all got eaten :< [20:56:44] :) it's an actual person [20:56:49] Scott [20:56:52] afaict [20:56:56] :P [20:57:13] (PS1) Aaron Schulz: Removed old tampa config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134488 [20:59:10] matanya: maerlant is going to be decom ->repurpose [21:02:34] ^d: OK, I think I got it [21:04:53] <^d> andrewbogott: Sounds good. If you get stuck lemme know. [21:05:15] (CR) Bsitu: [C: 2] Enable Flow on 3 mediawiki talk pages [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133267 (owner: Spage) [21:05:43] (Merged) jenkins-bot: Enable Flow on 3 mediawiki talk pages [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133267 (owner: Spage) [21:06:46] (PS4) Nemo bis: Gather all soft-disabled uploads wikis in one config item [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134400 [21:08:19] !log bsitu updated /a/common to {{Gerrit|I549967ca2}}: Group1 wikis to 1.24wmf5 [21:08:22] Logged the message, Master [21:09:19] (CR) Dzahn: [C: 1] "this all looks good to me, and i totally understand for conversion this needs to be a bit more standard. the only thing left i was wonderi" [operations/puppet] - https://gerrit.wikimedia.org/r/134394 (owner: Rush) [21:10:38] andrewbogott, apparently when you re-uid'd me this morning you broke parsoid [21:10:54] per RoanKattouw [21:11:04] on beta? I didn't change anything on beta...
[21:11:12] You changed things in LDAP [21:11:13] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Enable Flow on 3 mediawiki talk pages' [21:11:13] beta uses LDAP [21:11:16] Logged the message, Master [21:11:18] Even for service users [21:11:21] Except… I didn't? [21:11:25] But, let me see... [21:11:26] Because you cannot have local users in beta at all [21:11:58] andrewbogott: All I know is I had files that were owned by parsoid that were suddenly owned by mwalker instead, which puppet half-corrected but not well enough for the log file to be writable, and so the service didn't come back up the next time it got restarted [21:12:27] RoanKattouw: This is all possible, but can you please clarify where this is happening? [21:12:32] deployment-prep /data/project/parsoid/* [21:12:41] I've chowned everything already so you can't tell now [21:12:46] hm... [21:12:52] But that directory and everything in it magically changed ownership [21:12:53] Ah, I bet I know what happened. [21:12:58] RoanKattouw, keep in mind that the parsoid user's uid was changed too [21:13:21] (PS1) Yurik: added Opera support to 514-02 [operations/puppet] - https://gerrit.wikimedia.org/r/134490 [21:13:22] Note that service users in labs use LDAP because things explode horribly if you try to create local users and have them own files, for whatever reason [21:13:22] so maybe the missing chown was then, and mwalker just happened to get the recycled uid later [21:13:30] That's possible [21:13:38] In any case, it looks like parsoid and mwalker swapped UIDs [21:13:46] without their files being updated [21:13:48] Here's what happened: [21:13:52] (CR) Dzahn: "needs the key for AndrewB.
and wondering if naming it just "admin" is specific enough because we have those other admin groups" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/134242 (owner: Rush) [21:13:54] I wonder if Matt now has a bunch of files owned by parsoid [21:13:59] parsoid (in labs) has the same uid that mwalker used to have (in production) [21:14:11] (CR) Dzahn: [C: -1] initial data.yaml for users/groups [operations/puppet] - https://gerrit.wikimedia.org/r/134242 (owner: Rush) [21:14:15] So when I chowned all of mwalker's files to the new uid... [21:14:34] (PS1) BryanDavis: Labs: Add deployment related sudoer rules for svn group [operations/puppet] - https://gerrit.wikimedia.org/r/134491 (https://bugzilla.wikimedia.org/65548) [21:14:42] andrewbogott, aha! [21:14:44] parsoid (on labs) got chowned as well. Despite my carefully not running the chown on labstore it ran there anyway. (due to me being a regexp dunce, apparently) [21:14:52] Oooh [21:15:00] You ran this in prod and accidentally ran it on labstore [21:15:02] OK that makes more sense [21:15:05] yep [21:15:14] So, are things still broken, or have you already chowned everything back? [21:15:21] The service is up [21:15:25] Since this is presumably not an ongoing thing :) [21:15:28] There might still be things owned by mwalker [21:15:37] the only thing that needs to be owned by parsoid is the logs [21:15:39] but not things that fatally break the startup of the service, such as the log directory [21:15:44] Yeah, hm... [21:15:58] And yeah as gwicke says, the log dir is the only Parsoid-related thing that lives in project storage [21:16:02] Everything else is on the instance itself [21:16:03] the above conflated parsoid and mwalker ownership. So there's probably not any good automated way to separate them.
[21:16:53] all the old parsoid logs are now mwalker's to clean up ;) [21:17:20] sorry mwalker :( [21:17:47] (PS1) Giuseppe Lavagetto: puppet3: remove dashes from ssh::hostkeys-collect class [operations/puppet] - https://gerrit.wikimedia.org/r/134493 [21:18:16] murp; better me than someone outside the foundation [21:20:24] (PS1) Aaron Schulz: Mark internal backend as using "redisLockManager" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134494 [21:20:54] Mercifully there are only four more users that need their uids changed. Which means only five more of this kind of fuckup [21:21:53] (CR) Dzahn: "don't get me wrong, this looks like it could work, but i'm thinking maybe the better fix is to convert all the old SVN users, like per htt" [operations/puppet] - https://gerrit.wikimedia.org/r/134491 (https://bugzilla.wikimedia.org/65548) (owner: BryanDavis) [21:22:23] mutante: +1 for fixing ldap instead of my hack [21:22:43] bd808: also because we just talked about that today and andrewbogott fixed the bug i had submitted [21:22:52] for the missing docs to convert those [21:23:13] mutante: except that I'm trying that for a user now and it is going poorly :( [21:23:20] i guess we could run a script that just changes the primary group for all that are svn to wikidev? [21:23:31] Hm, maybe a different case since in his case he already had a new ldap entry and wanted me to switch it over [21:23:43] andrewbogott: is "add-labs-user" replacing the primary group? [21:23:51] i thought it was [21:24:16] 'labs' => ['wikidev', 'svn'], 80 [21:24:17] default => 'wikidev', [21:24:23] (PS1) Bsitu: Undo "enable flow on Talk:Design" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134495 [21:24:25] if we could get rid of "svn" in here completely.. you know [21:25:03] wow, so my ldap account is from svn days?
[21:25:20] aude: yea:) [21:25:25] (CR) Bsitu: [C: 2] Undo "enable flow on Talk:Design" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134495 (owner: Bsitu) [21:25:26] get rid of it :) [21:25:33] if it's easy enough [21:25:33] (Merged) jenkins-bot: Undo "enable flow on Talk:Design" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134495 (owner: Bsitu) [21:25:39] aude: it's more "convert it" [21:25:43] yeah [21:26:51] so if your primary group isnt svn anymore we also don't need to work around the sudo issue [21:27:02] (CR) Krinkle: [C: 1] Move rcstream server implementation to external repo [operations/puppet] - https://gerrit.wikimedia.org/r/132429 (owner: Ori.livneh) [21:27:23] i'm not sure yet if modify-ldap-group .. changes the primary one or adds a new one [21:27:50] (as it's used by the add-labs-user script which was made for this conversion) [21:27:53] (CR) Nemo bis: "Do we really/still want this?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83566 (owner: Reedy) [21:28:24] mutante: might be you can just use ldapvi. I worry about file ownership though. [21:28:46] !log bsitu updated /a/common to {{Gerrit|Idde23abd3}}: Undo "enable flow on Talk:Design" [21:28:50] Logged the message, Master [21:29:37] andrewbogott: can you think of a user we already converted? [21:29:49] the gid you mean? I don't think so. [21:29:56] So far it hasn't much mattered.
[21:29:57] I think you would want those users to retain membership in the svn group [21:29:58] no, i mean running the add-labs-user script [21:30:04] because it was an old svn account [21:30:20] Just for the sake of permissions as andrewbogott mentioned [21:30:35] andrewbogott: RoanKattouw_away thanks for the parsoid/mwalker UID fix on beta cluster :-] [21:30:46] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Disable flow on mediawiki:Talk:Design' [21:30:51] Logged the message, Master [21:31:00] mutante: then I don't understand your question… isn't every user with gid 550 one that was converted? [21:31:08] bd808: does that mean back to your fix or "change primary group but keep svn as secondary" [21:31:22] mutante: change + svn [21:31:25] andrewbogott: no, not all users existed since svn [21:33:03] like aude and my user both have gid 550 [21:33:28] but i have not been converted. i just created it on wikitech [21:34:09] So… what's the question then? [21:34:28] Does this mean that wikitech used to create users with 550 and now creates them with 500? 
[21:35:06] the actual question is how to avoid needing this patch [21:35:14] https://gerrit.wikimedia.org/r/#/c/134491/1/modules/mediawiki/manifests/users.pp [21:35:30] "Older ldap accounts in labs have 'svn' as the default group" [21:35:49] That patch seems way safer than trying to modify 1000 users :) [21:35:50] (03PS1) 10Bsitu: Re-enable flow on Talk:Design ( Revmoed LQT code ) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134496 [21:35:51] now if aude already has 550 in LDAP [21:36:05] how does that even cause this problem [21:36:20] problem = https://bugzilla.wikimedia.org/show_bug.cgi?id=65548 [21:36:22] 550(svn) 500(wikidev) [21:36:27] (03PS2) 10Bsitu: Re-enable flow on Talk:Design ( Removed LQT code ) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134496 [21:36:31] then why am i 550 ?:) [21:36:49] (03PS2) 10Ori.livneh: Remove obsoleted debug code [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134053 [21:36:51] (03PS1) 10Ori.livneh: Update hostname of rcstream destination for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134497 [21:36:58] mutante: you sure that Ryan didn't create your account by hand? I assume that's true of me. [21:37:11] $ id dzahn -- uid=2075(dzahn) gid=550(svn) [21:37:12] andrewbogott: not sure enough about anything now:) [21:37:13] (03CR) 10Bsitu: [C: 032] Re-enable flow on Talk:Design ( Removed LQT code ) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134496 (owner: 10Bsitu) [21:37:24] (03Merged) 10jenkins-bot: Re-enable flow on Talk:Design ( Removed LQT code ) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134496 (owner: 10Bsitu) [21:37:26] ok, i officially requested deploy access. let the three day clock begin. [21:37:32] id bd808 -- uid=3518(bd808) gid=500(wikidev) [21:37:51] why doesn't' bd808 have uid 808? ;) [21:38:09] That would be sweet. 808's all around [21:38:16] sshh.. 
don't make people write scripts to wait for the "cool" UIDs before creating wikitech users [21:38:52] * mwalker creates script to create bogus users in order to get the 31337 uid [21:38:56] hehe, when wikidata started that happened [21:39:00] the low Q's [21:39:02] I will shamefully admit to having uid 1337 on at least one host I maintain [21:39:43] !log bsitu updated /a/common to {{Gerrit|I037cd0a42}}: Re-enable flow on Talk:Design ( Removed LQT code ) [21:39:47] Logged the message, Master [21:40:34] mutante: My patch doesn't seem to work anyway. Beta must be using another class to get that sudoers file [21:41:13] bd808: i was about to say the patch starts to look better again the more we talk about it :) [21:41:53] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Re-enable flow on mediawiki:Talk:Design' [21:41:55] I'll try to track down the right class to modify just in case [21:41:56] Logged the message, Master [21:41:56] (03Abandoned) 10Giuseppe Lavagetto: puppet3: remove dashes from ssh::hostkeys-collect class [operations/puppet] - 10https://gerrit.wikimedia.org/r/134493 (owner: 10Giuseppe Lavagetto) [21:42:52] andrewbogott: but work-around or not, we should still just run the !add-labs-user thing for aude? [21:43:47] mutante: Sorry, I'm lacking context. Is this a different problem you're talking about now? [21:43:48] whatever works [21:44:22] (03CR) 10Ori.livneh: [C: 032] Remove obsoleted debug code [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134053 (owner: 10Ori.livneh) [21:44:29] no, it's still the same problem that i linked to. can you just change aude's gid ? [21:44:41] just like for the other guy we talked about today [21:44:50] by running the script.. 
you created the docs today [21:44:54] (03CR) 10Ori.livneh: [C: 032] Update hostname of rcstream destination for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134497 (owner: 10Ori.livneh) [21:45:03] (03Merged) 10jenkins-bot: Update hostname of rcstream destination for labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134497 (owner: 10Ori.livneh) [21:45:06] (03Merged) 10jenkins-bot: Remove obsoleted debug code [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134053 (owner: 10Ori.livneh) [21:47:09] (03PS1) 10Steinsplitter: gtitmsg [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134501 [21:47:17] (03CR) 10jenkins-bot: [V: 04-1] gtitmsg [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134501 (owner: 10Steinsplitter) [21:47:36] mutante: I can try changing aude's gid to 500, if that's what you mean. [21:47:44] But as I said, I think that will cause problems and hilarity. [21:47:50] The bug is confusing because it says 'changed in LDAP from 550 (prod: wikidev labs: svn) to 550 (prod: svn labs: wikidev)' [21:48:18] which, I assume is a typo? [21:49:06] andrewbogott: ok, if you think it causes problems don't do it, i assumed it's the same as for jayvdb [21:49:09] gid 500 is wikidev in prod and labs [21:49:16] andrewbogott: then let's wait for the work around instead... 
[21:49:31] bd808: +1 for the patch now , heh [21:49:39] once we found the right file that is [21:49:41] mutante: if you're willing to be a test subject, I can change /your/ gid and we can see what happens :) [21:49:48] (03PS4) 10Rush: initial data.yaml for users/groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 [21:50:09] mutante: Now I just have to figure out why it's not applying on deployment-bastion [21:50:18] andrewbogott: no thanks i'm fine, i just say whatever works [21:50:50] just is irritating that i'm getting old cached js on beta [21:51:00] if i can 'touch' some of them, it might help :) [21:53:47] (03CR) 10Rush: initial data.yaml for users/groups (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 (owner: 10Rush) [21:54:10] (03PS3) 10Rush: standardize a few things in admins.pp for conversion [operations/puppet] - 10https://gerrit.wikimedia.org/r/134394 [21:54:20] (03CR) 10Rush: [C: 032 V: 032] standardize a few things in admins.pp for conversion [operations/puppet] - 10https://gerrit.wikimedia.org/r/134394 (owner: 10Rush) [21:54:38] (03CR) 10Aaron Schulz: [C: 032] Removed old tampa config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134488 (owner: 10Aaron Schulz) [21:56:00] (03Merged) 10jenkins-bot: Removed old tampa config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134488 (owner: 10Aaron Schulz) [21:57:49] !log aaron synchronized wmf-config/filebackend.php 'Removed old tampa config' [21:57:54] Logged the message, Master [21:57:55] (03CR) 10Aaron Schulz: [C: 032] Mark internal backend as using "redisLockManager" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134494 (owner: 10Aaron Schulz) [21:58:25] (03Merged) 10jenkins-bot: Mark internal backend as using "redisLockManager" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134494 (owner: 10Aaron Schulz) [21:59:46] mutante: Interestingly, the /etc/sudoers.d file I was hoping to replicate 
for %svn on deployment-bastion apparently isn't really controlled by puppet. I edited it and re-ran puppet to see what rule changed it and it didn't change :/ [22:00:11] !log aaron synchronized wmf-config/filebackend.php '69201b4caf703ef1ab52b38be29c80b4e939fdc2 - no-op' [22:00:16] Logged the message, Master [22:00:24] aude: Can you try to sudo something as mwdeploy on deployment-bastion now? [22:00:36] bd808: there's a gui to manage sudo policy in labs projects... [22:00:47] https://wikitech.wikimedia.org/wiki/Special:NovaSudoer [22:00:56] Maybe I'm missing the point though [22:02:12] (03CR) 10Dzahn: [C: 031] initial data.yaml for users/groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 (owner: 10Rush) [22:02:36] ok [22:02:59] andrewbogott: Yeah. It's not managed by that either. deployment-bastion:/etc/sudoers.d/wikidev_deploy has a header that says "This file is managed by Puppet!" and it looks like a Sudo_group[] managed file, but apparently it's not. [22:03:35] works! [22:03:40] bd808: i renamed it in one of my changes over the weekend [22:03:44] bd808: and i cleared it from prod [22:03:53] but not labs [22:04:01] ori: ah. [22:04:48] bd808: wow, it says it's managed by puppet but then it's not? :p [22:05:00] it was managed by puppet [22:05:01] or is it a role that is applied via wikitech [22:05:22] aude: You are live hacked for the win. I'll try to track down the right way to fix this now. [22:07:33] bd808: thanks [22:08:41] then when i 'touch' something, i do sync-dir ? [22:08:42] do i need to ? 
[22:08:50] jouncebot, die [22:09:11] aude: Run /usr/local/bin/wmf-beta-scap [22:09:22] mwalker: :( [22:09:26] ok [22:09:27] THat does a full scap but it typically takes <45s in beta [22:09:36] greg-g, had to increase it's memory on the job runner [22:09:38] it's coming back [22:09:45] hopefully to stay for longer than 20 minutes [22:09:45] ohai jouncebot [22:09:51] jouncebot: next [22:09:51] In 0 hour(s) and 50 minute(s): SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140520T2300) [22:10:01] geez, it's already past 3? :( [22:11:22] bd808: and touching particular file worked to fix the cache issue :) [22:11:36] ClaimGuidGenerator (which had GuidGenerator as a dependency) [22:11:47] but seems to depend on ClaimGuidGenerator modified time [22:11:50] greg-g: that's pretty slick, that jouncebot [22:12:09] chrismcmahon: mwalker is my fav [22:13:12] * aude eats now [22:15:03] bd808: does that mean all fixed? [22:15:54] mutante: Not quite yet. I hand edited a sudoers file on deployment-bastion. I need to figure out how to fix it properly. [22:17:43] ori: ^ you said you renamed it? [22:18:20] bd808: i'd say re-add to puppet [22:18:23] It is the sudo_group I changed in that patch. It was named 'wikidev_deploy' and now it's 'wikidev' [22:18:28] (03PS3) 10Withoutaname: Disable query pages for closed wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130609 (https://bugzilla.wikimedia.org/42436) [22:18:53] But the strange thing is that the new sudoers files isn't being created on deployment-bastion [22:19:21] So I'm looking to see if another rename detached the role/class from deployment-bastion [22:19:49] The beta overrides are a bit of a jumble [22:30:14] greg-g: i see 30 mins to deploy.. now that style change ... [22:30:18] ? 
[22:30:31] sure [22:30:34] well, and i could change tin's IPv6 but maybe later [22:30:39] sorry, I keep forgetting to ping you when there's a good time [22:31:09] no worries, you actually did and then i had something else.. [22:31:19] (03CR) 10Dzahn: [C: 032] fix quoting, arrows etc in deployment.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/133644 (owner: 10Dzahn) [22:38:13] (03CR) 10Dzahn: "no puppet change on tin. only 12 errors/warnings left (vs. 414 before retab and 153 before this one)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133644 (owner: 10Dzahn) [22:42:14] nice [22:42:39] can also delete that scap stuff from terbium now [22:46:36] (03CR) 10Dzahn: [C: 031] "makes sense. there are quite a few things to delete on terbium, what ori said" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 (owner: 10Hoo man) [22:53:07] anyone good with iptables + conntrack? [22:56:20] (03CR) 10BryanDavis: [C: 04-1] Remove misc::deployment::scap_scripts from terbium (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 (owner: 10Hoo man) [22:56:24] I can do SWAT [22:56:33] or there are other pretenders?:) [22:57:19] pretenders? [22:57:51] * bd808 already had one deploy today [22:57:54] I haven't paid my penance this week if you dont want to do it [22:58:11] mwalker: you made jouncebot though! [22:58:46] * mwalker is waiting 2 minutes to see if it works without crashing in the real world this time [23:00:14] (03CR) 10Dzahn: Add usne DNS (031 comment) [operations/dns] - 10https://gerrit.wikimedia.org/r/133992 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. 
Lewis) [23:00:41] greg-g, nope; didn't work -- it didn't crash; but for some reason it's just not playing nice [23:01:56] :/ [23:02:03] MaxSem: godspeed [23:02:06] I'll work on it tonight again [23:02:12] beat it into submission [23:02:17] :P [23:02:43] (03CR) 10Dzahn: [C: 031] "good, but better after the Apache change, to avoid having to purge the URL from varnish (it will cache the error page)" [operations/dns] - 10https://gerrit.wikimedia.org/r/133980 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [23:06:09] (03CR) 10Dzahn: [C: 031] "easy enough, thanks to Reedy refactoring this (afair)" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133981 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [23:06:21] bd808: misc::deployment::vars is already in mediawiki::sync [23:06:41] but your point about include misc::deployment::common_scripts is probably correct [23:06:48] (https://gerrit.wikimedia.org/r/#/c/134282/1/manifests/site.pp) [23:06:53] hoo: Oh good. I didn't dig deep [23:08:51] bd808: How do we purge of old stuff? Will a root do that per hand or shall I mess with puppet (that will be messy) [23:09:06] (03CR) 10Dzahn: [C: 04-1] "usne.newikimedia seems wrong/double "ne"" (032 comments) [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133991 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [23:09:14] hoo: I think mutante was resigned to doing it by hand [23:09:18] (03PS2) 10Hoo man: Remove misc::deployment::scap_scripts from terbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 [23:09:28] hoo: yea, a root will do it [23:09:49] (03CR) 10Hoo man: Remove misc::deployment::scap_scripts from terbium (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 (owner: 10Hoo man) [23:10:05] hoo: the alternative would be reverting the entire old class, ensure => absent instead of present .. 
[23:10:15] but i didn't think we want that in in this case [23:10:31] by hand is easiest in this case [23:10:45] but hoo you can prepare the script or list of paths to make life easier for mutante [23:10:56] mutante: That's what I meant with messy [23:11:56] !log maxsem synchronized php-1.24wmf5/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/134405/' [23:12:01] Logged the message, Master [23:13:41] ori: I guess mutant.e is good enough at copy and pasting on his own :D [23:14:19] !log maxsem synchronized php-1.24wmf4/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/134405/' [23:14:24] Logged the message, Master [23:16:27] (03CR) 10Dzahn: "wait, afair from us moving "pa.us" to "pa-us", please copy pa-us, so that would mean the entire thing is "ne-us" instead of "us-ne"" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133982 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [23:20:03] i don't think we should make "us-ne" for New England [23:20:16] when we have "pa-us" for Pennsylvania [23:22:00] (03CR) 10Dzahn: [C: 04-1] Add us_newikimedia set up configs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133982 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [23:22:58] (03CR) 10Dzahn: [C: 04-1] "sorry, i think it should be "ne-us", not "us-ne" because Pennsylvania is already http://pa-us.wikimedia.org/wiki/Main_Page" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133981 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [23:23:55] (03CR) 10Dzahn: [C: 04-1] "sorry, i think it should be "ne-us", not "us-ne" because Pennsylvania is already http://pa-us.wikimedia.org/wiki/Main_Page" [operations/dns] - 10https://gerrit.wikimedia.org/r/133980 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. 
Lewis) [23:25:15] PROBLEM - MySQL Processlist on db1064 is CRITICAL: CRIT 150 unauthenticated, 0 locked, 0 copy to table, 2 statistics [23:26:05] RECOVERY - MySQL Processlist on db1064 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 1 statistics [23:29:17] (03PS1) 10BryanDavis: [WIP] labs: Fix beta to work with role::mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/134519 [23:31:50] (03PS5) 10Rush: initial data.yaml for users/groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 [23:31:59] (03CR) 10Rush: [C: 032 V: 032] initial data.yaml for users/groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 (owner: 10Rush) [23:34:00] (03PS2) 10BryanDavis: [WIP] labs: Fix beta to work with role::mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/134519 [23:36:38] (03CR) 10Ori.livneh: "the if labs / if production branching in the modules is awful, it'd be good not to reintroduce it if we can avoid it. why are mwdeploy / l" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134519 (owner: 10BryanDavis) [23:37:29] (03PS1) 10Rush: manage wikidev group exists [operations/puppet] - 10https://gerrit.wikimedia.org/r/134524 [23:38:39] (03PS1) 10Yurik: Remove optional alpha character at the end of the X-CS [operations/puppet] - 10https://gerrit.wikimedia.org/r/134525 [23:38:54] (03CR) 10BryanDavis: "> why are mwdeploy / l10nupdate in ldap in the first place?" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/134519 (owner: 10BryanDavis) [23:39:11] !log maxsem synchronized php-1.24wmf4/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/134517' [23:39:15] Logged the message, Master [23:40:30] (03PS2) 10Rush: manage wikidev group exists [operations/puppet] - 10https://gerrit.wikimedia.org/r/134524 [23:40:35] !log maxsem synchronized php-1.24wmf5/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/134517' [23:40:40] Logged the message, Master [23:41:23] (03CR) 10Ori.livneh: "Could we then either configure Puppet to talk with LDAP (http://docs.puppetlabs.com/references/latest/type.html#user-provider-ldap) or mov" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134519 (owner: 10BryanDavis) [23:41:33] (03CR) 10jenkins-bot: [V: 04-1] manage wikidev group exists [operations/puppet] - 10https://gerrit.wikimedia.org/r/134524 (owner: 10Rush) [23:41:40] (03PS3) 10Rush: manage wikidev group explicitly [operations/puppet] - 10https://gerrit.wikimedia.org/r/134524 [23:42:24] (03PS4) 10Rush: manage wikidev group explicitly [operations/puppet] - 10https://gerrit.wikimedia.org/r/134524 [23:42:54] (03CR) 10Rush: [C: 032 V: 032] "no new users or anything, just making sure group 500 exists as we have to carry over using it as the PIG for users from admins.pp" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134524 (owner: 10Rush) [23:43:57] (03CR) 10Dzahn: "good! thinking about the recent discussion regarding 500 vs. 550 for historic reasons etc.." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/134524 (owner: 10Rush) [23:52:19] !log maxsem synchronized php-1.24wmf5/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/#/c/134504/' [23:52:22] Logged the message, Master [23:53:38] !log maxsem synchronized php-1.24wmf4/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/#/c/134504/' [23:53:42] Logged the message, Master [23:54:17] (03PS1) 10Rush: fix sudo stuff for ops/sudoers [operations/puppet] - 10https://gerrit.wikimedia.org/r/134533 [23:54:35] (03CR) 10jenkins-bot: [V: 04-1] fix sudo stuff for ops/sudoers [operations/puppet] - 10https://gerrit.wikimedia.org/r/134533 (owner: 10Rush)