[00:00:07] and each different network can have a different network node [00:00:09] dan-nl: don't think so, as long as things aren't broken, I'm happy :) [00:00:20] I'll be honest in that I haven't looked at how to configure that [00:00:23] nothing is broken as far as i know :) [00:00:27] hm [00:00:30] that might not be so bad [00:00:35] is that possible now? [00:00:52] mutante: have you seen my mail about bugzilla audit mails? (another one just arrived) [00:01:04] I haven't used neutron at all [00:01:11] cool, david and i will test further later today [00:01:11] well, i think that the multiple rows is because we just didn't plan out and have servers randomly around ;) [00:01:21] it should be possible in nova-network, since it supports network node per compute node [00:01:22] thanks for all your help! [00:01:28] but... multiple network nodes would be more awesome [00:01:32] but we want to switch to neutron which does not currently support that [00:01:36] paravoid: no i havent, is it very annoying? [00:01:38] oh [00:01:46] paravoid: i can make sure to turn it off, it's on my plate [00:01:57] mutante: it is, it's also wrong, mails --@ and some other weird aliases like that [00:02:10] dan-nl: thank you. g'night! [00:02:11] it's pretty late in the game to be changing this kind of stuff up ;) [00:02:12] mutante: also mails RT and continues appending to a what is now a monstrous RT ticket [00:02:25] paravoid: sorry about that, its from non-prod , i'll stop it [00:02:38] my billable time might be a little low to investigate this [00:02:43] paravoid: the plan was once that it'd be smart about using RT like that because i have the tickets and it nags me [00:02:45] I'd say get mhoover on it [00:02:46] if there is an issue [00:03:03] (03CR) 10jenkins-bot: [V: 04-1] Collection Renderer (Now a module!) 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/102352 (owner: 10Mwalker) [00:03:25] there must be a difference to kaulen, because the same ones there have been running all the time [00:03:33] without doing this part wrong [00:03:47] so its me changing the bugzilla puppet to module [00:04:23] (03PS6) 10Mwalker: Collection Renderer (Now a module!) [operations/puppet] - 10https://gerrit.wikimedia.org/r/102352 [00:04:47] so, if the vlan is split this means that traffic between instances will need to be routed through the network node [00:05:09] !log Reloading zuul to deploy I6df1f4acccd7a9 [00:05:23] Logged the message, Master [00:05:27] is it smart in the way instances are allocated? e.g. each tenant into one physical network/network node? [00:05:41] if so, I doubt we have much tenant-to-tenant traffic [00:06:02] right now we have one large flat network [00:06:24] and one vlan [00:06:51] eqiad's network design is very different than tampa's [00:06:53] it does not assign instances into a single physical network or network node [00:07:28] so [00:07:45] virt1000 & virt1009 are in A4 & A2 respectively [00:07:55] virt1001-virt1008 are in the same *rack* [00:07:55] virt1000 doesn't matter [00:07:58] not row, *rack* [00:08:18] assuming the compute nodes and the network node are in the same rack we're fine [00:08:42] or row [00:08:42] lose a switch = lose labs [00:08:42] scary [00:08:42] otoh, we have all of memcached in one rack, so... [00:08:42] (03CR) 10jenkins-bot: [V: 04-1] Collection Renderer (Now a module!) 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/102352 (owner: 10Mwalker) [00:08:52] if we lose any single rack we're likely fucked [00:09:23] and labs isn't currently designed with redundancy [00:09:32] ok [00:09:39] I have a feeling that what we're after is possible, but I haven't really looked into it [00:09:56] so I think the simplest solution for now would be to just swap virt1009 with one of the two unallocated boxes in B3 [00:09:57] (03PS7) 10Mwalker: Collection Renderer (Now a module!) [operations/puppet] - 10https://gerrit.wikimedia.org/r/102352 [00:10:14] and be done with this [00:11:32] virt1000 is labsconsole, right? [00:11:35] yes [00:11:39] and ldap I guess [00:11:58] nova-scheduler, keystone, ldap, dns, mediawiki, mysql [00:12:00] way too many things [00:12:08] right [00:12:18] !log ongoing schema changes on slaves, indexing only, logging gerrit 85508 [00:12:20] yeah, it's slowly coming back :P [00:12:23] heh [00:12:33] Logged the message, Master [00:12:42] hm [00:16:32] (03CR) 10Mattflaschen: [C: 032] Create "Draft" namespace on the English Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97675 (owner: 10MZMcBride) [00:17:14] (03CR) 10Qgil: [C: 04-1] "The structure looks valid, good! I would say the font is also pretty close. Just curious, which font is it? However, at least the 16x16 ic" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 (owner: 10RashiqAhmad) [00:17:27] ori-l: if you have a free moment today; I'd appreciate you taking a look at https://gerrit.wikimedia.org/r/#/c/102352/ and giving me a +/-1 on if that's how I should create a puppet module [00:17:45] (03Merged) 10jenkins-bot: Create "Draft" namespace on the English Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97675 (owner: 10MZMcBride) [00:19:56] !log mflaschen synchronized wmf-config 'Deploy Draft namespace to English Wikipedia. Remove old Labs overrides that were used for testing this.' 
[00:20:12] Logged the message, Master [00:20:30] greg-g, done deploying. [00:22:17] springle: have you seen the alert of es1003 having a failed disk? [00:22:24] paravoid: yes [00:22:26] ok [00:22:28] getting there :) [00:22:39] ok, just making sure, sorry :) [00:24:24] paravoid: yeah, it may just be easier to move servers and maybe even designate a rack as labs [00:24:33] it's just one server to move :) [00:24:50] and there's space in the rack, plus WMFnnnn servers (unallocated) [00:25:10] hehe [00:25:11] ok [00:25:34] know the funny part? i had andrew switch the ip of labnet1001 from row b to row c ;) [00:25:40] haha [00:25:51] oh right, I didn't check labnet [00:26:04] it's row c right now [00:26:06] right [00:26:09] so we have to move that too [00:26:39] that was silly anyway even without eqiad's design [00:26:41] b5 is basically empty [00:26:53] if we don't mind having something with the netapps [00:26:55] having 100% of labs' traffic being cross rack [00:33:16] LeslieCarr, Ryan_Lane, paravoid, reading backscroll... [00:33:32] So is labnet pointless until it's moved to a different rack? [00:34:07] mwalker: reviewing [00:34:35] (CR) RashiqAhmad: "Qgil: I used the font "PT Sans Bold", which comes pretty close to the original font." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102354 (owner: RashiqAhmad) [00:35:04] yeah... [00:36:02] welp [00:37:14] LeslieCarr: so you opened a ticket about expanding a vlan across racks… is that still the plan or do we instead need Chris to haul the box across the datacenter? [00:37:38] haul across the dc [00:37:43] if it's urgent we can do smarthands [00:37:55] cuz chris is out here all week [00:39:10] ok, so… when I originally requested this box did I need to specify 'should be able to connect to labs'? Does that not go without saying? 
[00:39:12] * andrewbogott is peeved [00:39:29] (03PS3) 10RashiqAhmad: Updated internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 [00:40:05] well [00:40:10] i think nobody was thinking about that [00:40:11] since labs is special [00:40:26] honestly i wasn't thinking about it until today [00:41:25] So, just to reconfirm -- the current agreement is that we should have all labs hosts in the same row? [00:41:41] Should I make an RT about that? [00:43:30] LeslieCarr, not mad at you, just… mhoover's contract is going to run out before we even get him hardware :( [00:43:51] oh sad [00:43:51] yes [00:45:25] What about e.g. labsdb and labstore? I feel like they are intentionally distributed right now. Do they need to be consolidated as well? [00:45:54] Coren? [00:46:19] Aye? [00:46:53] Coren: So, keeping in mind that I don't know a single thing about networking... [00:47:08] andrewbogott: to be clear, it's just virt1009 + labnet1001 that need to be moved to B3 [00:47:08] I'm told that labnet isn't useful because it's on a different row and vlan from the labs nodes. [00:47:23] and the former can probably wait :) [00:47:33] but labnet being far away isn't great [00:47:42] paravoid: the db and storage hosts can be elsewhere because… we don't care about latency so much for them? [00:48:24] andrewbogott: That latency wouldn't matter much, because it doesn't live /inside/ the networking stack. [00:48:39] 'k [00:49:04] labnet is a virtual "switch" that needs to be able to exchange frames a lot with the instances; its latency is literally multiplicative. [00:49:36] OK, renamed https://rt.wikimedia.org/Ticket/Display.html?id=6524 [00:50:09] I'll make another ticket for virt1009 [00:50:54] this has nothing to do with latency [00:51:04] it's all the same DC, latency doesn't matter [00:51:21] different row means cross row traffic (i.e. filling uplinks) [00:51:24] paravoid: Well, I'd have thought that was the primary issue. What is, then? 
[00:51:26] ok, then… why does sharing a row matter? [00:51:33] and for labnet, it also means that it's impossible due to eqiad's network design [00:51:41] since you can't have layer 2 cross-row traffic [00:51:52] (each row is a separate l2 domain) [00:51:54] paravoid: Really? Huh. [00:52:00] layer 2 = ethernet? [00:52:06] yes [00:52:32] And openstack happens at l2, it's not just plain old ip? [00:52:53] andrewbogott: It's an entire virtual network at that level. [00:52:55] labnet is a router, you need to be on the same broadcast domain to route [00:53:07] to be the default gateway, basically [00:53:32] paravoid: Well, it /could/ be worked around but it'd be a mostrosity. (read: happy fun GRE) [00:53:36] Hm, ok. I knew that it /acted/ like a virtual network but assumed that it was… virtual :) [00:54:16] Anyway, ok, tickets created. I guess there's no point in my nagging since we're waiting for Chris to return to Virginia? [00:54:24] yes, there are multiple workarounds, but we're talking about moving a single box two rows over :) [00:54:37] andrewbogott: if it's urgent, you could use smart hands [00:54:40] DC staff [00:55:04] and if you don't trust them to move a server, you could just ask them to lay a cable cross racks/rows for now [00:55:29] although I'd prefer you to not do that actually [00:55:36] on second thought :) [00:55:38] I don't know if it's urgent, will defer to ryan and mike. 
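The layer 2 constraint discussed above (each row is a separate L2 domain, and a default gateway has to share a broadcast domain with the hosts it serves) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the row names, the subnets, and the helper names `row_of` and `can_be_default_gateway` are hypothetical, the real eqiad addressing does not appear in this log, and "same broadcast domain" is approximated as "same row subnet".

```python
import ipaddress

# Hypothetical per-row subnets. Assumption: one subnet per row, and each
# row is its own layer 2 (broadcast) domain, as described in the log.
ROW_SUBNETS = {
    "row-b": ipaddress.ip_network("10.0.2.0/24"),
    "row-c": ipaddress.ip_network("10.0.3.0/24"),
}


def row_of(address):
    """Return the name of the row whose subnet contains address, or None."""
    ip = ipaddress.ip_address(address)
    for row, subnet in ROW_SUBNETS.items():
        if ip in subnet:
            return row
    return None


def can_be_default_gateway(gateway, host):
    """A gateway is only usable if it sits in the same row subnet as the host."""
    gateway_row = row_of(gateway)
    return gateway_row is not None and gateway_row == row_of(host)


if __name__ == "__main__":
    # Network node in the same row as the compute host: usable as gateway.
    print(can_be_default_gateway("10.0.2.1", "10.0.2.50"))
    # Network node in another row: not reachable at layer 2.
    print(can_be_default_gateway("10.0.3.1", "10.0.2.50"))
```

Under these assumptions, a network node addressed in row C cannot act as the gateway for instances hosted in row B, which matches the conclusion in the log that the box has to move rather than be reconfigured.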
[00:55:42] they should be able to move a server [00:55:48] I'm not worried about trusting them with the server, it's empty now anyway [00:56:00] yeah, let's not have them potentially plug into a different port and bring the DC down [00:56:52] dunno, I've never seen eqiad smart hands and how smart they are :) [01:00:03] they're all right [01:03:21] (03PS1) 10Krinkle: role/gerrit.pp: Clean up extra line breaks [operations/puppet] - 10https://gerrit.wikimedia.org/r/102362 [01:03:59] (03PS1) 10Krinkle: role/gerrit.pp: Mirror VisualEditor as wikimedia/VisualEditor [operations/puppet] - 10https://gerrit.wikimedia.org/r/102363 [01:04:18] (03PS2) 10Krinkle: role/gerrit.pp: Mirror VisualEditor as wikimedia/VisualEditor [operations/puppet] - 10https://gerrit.wikimedia.org/r/102363 [01:06:05] ok, I've made some tickets and sent some emails. [01:06:24] If we decide we need the smarthands, how do I go about getting them to do stuff? Is that something Rob mediates? [01:07:07] who can get me admin wiki privileges on http://en.m.wikipedia.beta.wmflabs.org/ ? anyone? [01:07:13] PROBLEM - Puppet freshness on cp1048 is CRITICAL: Last successful Puppet run was Mon 16 Dec 2013 06:59:39 PM UTC [01:07:19] i want to edit MediaWiki:Common.js [01:07:32] most anyone with ops can --- let me check the access list [01:08:18] (03PS1) 10Dzahn: temp disable bugzilla auditlog/metrics mails [operations/puppet] - 10https://gerrit.wikimedia.org/r/102365 [01:08:21] jdlrobson: on the wiki or on labs [01:08:30] just the wiki mutante [01:08:42] admin of wikipedia.beta, not admin of wikitech wiki [01:08:52] then i guess hashar is good to ping [01:08:55] correct mutante :) [01:09:02] or reedy maybe? [01:09:05] andrewbogott: ariel or faidon can [01:09:11] i can give daniel privileges as well [01:09:13] ok [01:09:24] I will wait for feedback from mhoover, then bug one of them if necessary. 
[01:09:55] This whole thing is kind of weird, I'm scrambling to get hardware ready for mike and ryan but am totally out of the loop in terms of their timeline. [01:10:16] yeah [01:10:21] oh ryan lane also has privileges [01:10:53] createAndPromote.php [01:11:05] (03CR) 10Dzahn: [C: 032] "paravoid, this disabled the annoying mails" [operations/puppet] - 10https://gerrit.wikimedia.org/r/102365 (owner: 10Dzahn) [01:13:12] jdlrobson, done - why didn't you just ask me?:) [01:13:22] MaxSem: i didn't know - but thanks :) [01:13:28] Reedy: ah, i just remembered that from one single time.. wikivoyage wiki and making the first user ever:) [01:13:44] ah, it's resolved, thx Max [01:13:49] I seemingly can't login... [01:13:55] Presumably need a password from my vault [01:15:24] andrewbogott: hey andrew, thanks. im checking out your email [01:16:18] andrewbogott: as far as timeline, i'm not sure what the deadlines are. before xmas for the migration? or at least the network up? [01:16:50] mhoover: well, ideally the migration would be done before your contract runs out [01:17:00] (03CR) 10Reedy: "Caused bug 58612" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97675 (owner: 10MZMcBride) [01:17:28] andrewbogott: as far as the networking is concerned, i haven't mapped out or looked at eqiad in detail yet, so i can't be sure that moving this machine is a problem or not. i would think, if it's all on the same flat network, should be cool [01:18:12] mhoover: It has to be moved. [01:18:26] Currently it can't be used. [01:18:37] The question is when it being unusable will become a blocker for you. [01:18:41] andrewbogott: i would say leave the server move for the chris guy - if it needs to be moved before monday, we can go the smart hands route [01:18:55] (03CR) 10Ori.livneh: [C: 04-1] "Pretty good." (0320 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/102352 (owner: 10Mwalker) [01:19:04] OK. I'll drop it for now. 
[01:19:10] mwalker: ^ [01:19:25] andrewbogott: i don't think it would be a blocker until next week [01:19:34] 'k [01:19:42] andrewbogott: we're still figure out the openstack-ish details [01:20:25] (03CR) 10Dzahn: [C: 032] Fetch only "Wikipedia" label from Netha Hussain's blog to Planet [operations/puppet] - 10https://gerrit.wikimedia.org/r/102210 (owner: 10Nemo bis) [01:20:44] (03PS2) 10Dzahn: Fetch only "Wikipedia" label from Netha Hussain's blog to Planet [operations/puppet] - 10https://gerrit.wikimedia.org/r/102210 (owner: 10Nemo bis) [01:20:44] ok -- please let me know if there's anything I can do to help. [01:21:47] for now I'm out. [01:21:57] (03CR) 10Dzahn: [C: 04-1] "I'm totally copying Ori on this review: "unrated changes are easily overlooked so i'm giving it a -1, even though it's more of a -0.5"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100760 (owner: 10Matanya) [01:22:22] heh [01:22:31] :) [01:22:37] it fit perfectly [01:22:38] andrewbogott: thank you for the help/heads-up andrew/leslie/paravoid :) [01:22:45] http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Mobile+caches+ulsfo&m=cpu_report&s=by+name&mc=2&g=mem_report hm ? [01:23:19] there's a spike in bytes_in http://ganglia.wikimedia.org/latest/graph.php?r=month&z=xlarge&c=Mobile+caches+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report [01:23:29] for mobile caches eqiad [01:24:13] PROBLEM - Puppet freshness on manutius is CRITICAL: Last successful Puppet run was Tue 17 Dec 2013 01:21:21 PM UTC [01:24:39] (03PS1) 10Reedy: Simplify Drafts related TitleQuickPermissions hook subscriber [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102366 [01:24:41] varnishkafka? 
[01:24:55] Looking at the Drafts problem [01:25:13] paravoid: that was my thought when i saw the memory spike, but the change in network traffic is in bytes *in* [01:25:35] ignore the first graph, ulsfo is not in production, this is just slowly rampup of page cache following an OOM coming from a netmapper bug [01:25:55] (03CR) 10Mattflaschen: [C: 04-2] "Not equivalent." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102366 (owner: 10Reedy) [01:25:57] superm401: Stupid typo I bet [01:26:03] ulsfo network is measured in kilobytes per second, if you look more closely [01:26:24] (03CR) 10Reedy: "Needlessly verbose?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102366 (owner: 10Reedy) [01:26:37] mem on eqiad mobile caches also kinda funky, though: http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Mobile+caches+eqiad&m=cpu_report&s=by+name&mc=2&g=mem_report [01:26:40] and the second graph, eqiad, hmm, let me see [01:27:11] that funkyness is ottomata messing up [01:27:25] cp3011.esams & cp4011.ulsfo appear in that group [01:27:31] as separate entries from the proper ones [01:27:41] http://ganglia.wikimedia.org/latest/?c=Mobile%20caches%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [01:27:46] probably manual gmetric invocation [01:28:12] so if you go to search and type cp3011 you'll see two entries [01:28:41] this would explain the network spike as well [01:28:55] heh [01:29:02] Reedy, is it possible that wmg variables can not be arrays? [01:29:17] hm, maybe [01:29:32] * ori-l has to run [01:29:32] I'm pretty sure I tested the exact same thing, just with a different name, on Labs. [01:29:32] ok [01:29:40] will investigate, although not too worried [01:29:43] thanks for the heads up :) [01:30:26] superm401: Nope... We do that, a lot [01:30:26] '+arwikiversity' => array( 0 => 1, 6 => 1, 10 => 0 ), [01:31:04] Yeah, there's even one right above. 
[01:33:53] It's not an obvious typo (it's the exact same string in both places) [01:34:07] noc.wikimedia.org looks right, so it doesn't seem like only one file got deployed. [01:36:24] (03PS1) 10Manybubbles: Cirrus config update [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102369 [01:41:26] (03PS2) 10Reedy: Simplify Drafts related TitleQuickPermissions hook subscriber [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102366 [01:41:32] (03CR) 10Reedy: Simplify Drafts related TitleQuickPermissions hook subscriber (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102366 (owner: 10Reedy) [01:42:33] superm401: var_dump( $wmgExemptFromUserRobotsControlExtra ); is right for enwiki and also for !enwiki [01:42:52] let's dig a little deeper [01:43:10] It's running way after the extract. [01:45:52] I wonder what it is if not an array [01:46:00] (03CR) 10Mattflaschen: [C: 04-1] "Equivalent now, but update the comment." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102366 (owner: 10Reedy) [01:46:49] !log reedy synchronized wmf-config/ 'Ensuring consistency...' [01:47:06] Logged the message, Master [01:49:38] Reedy, $wgExemptFromUserRobotsControl is right too, according to mwscript eval.php on enwiki. [01:50:04] And a random other wiki (ruwiki). [01:50:40] The other thing could be caching [01:50:42] ruwiki is just 0, of course. [01:51:41] * Reedy kicks things [01:52:02] !log reedy synchronized wmf-config/ 'touch' [01:52:16] Logged the message, Master [01:52:36] * Reedy waits [01:52:37] Reedy, it's not on every request, right? [01:52:44] If it were, it would be a lot more, I think. [01:52:48] Nope [01:53:34] tail has fallen silent [01:54:21] I do wonder if when we're syncing wmf-config and/or InitialiseSettings.php explicitly we should touch it [01:54:38] None since Dec 18 01:51:27 [01:54:55] I actually changed all four files, though. 
[01:55:18] Did you happen to check the mtime before you did the touches? [01:55:22] if ( @filemtime( $filename ) >= filemtime( "$wmfConfigDir/InitialiseSettings.php" ) ) { [01:55:43] -rw-rw-r-- 1 mflaschen wikidev 431778 Dec 18 00:18 InitialiseSettings.php [01:55:53] -rw-rw-r-- 1 mflaschen wikidev 431778 Dec 18 01:51 InitialiseSettings.php [01:55:57] That's UTC, right? [01:56:11] Yeah [01:56:15] The beginning of the lightning window was 00:00 on the dot. [01:56:18] I think we're on UTC [01:56:21] So that mtime should be fine. [01:56:27] Yup [01:56:28] Wed Dec 18 01:56:22 UTC 2013 [01:56:44] It fixed it though (coincidence?) [01:57:03] I've seen this happen a few times [01:57:29] Usually pulling in "older" changes [01:57:48] date +'%Z' confirms UTC [01:57:59] Blech, I hate when the problem stops but I have no idea why. [01:58:33] Thanks for helping my troubleshoot, and for fixing it if it was the touch. [01:59:42] I hate computers [02:02:12] Reedy, could it just be that some of the Apaches have really off time? [02:02:16] Are we monitoring that? [02:02:41] It could be the config caching then (BTW, didn't know about that). [02:05:34] Looking at the apache log entries and running date... [02:05:38] At most it's seconds [02:06:46] Yeah, never mind, the log entries do look about right. [02:08:48] Might almost be worth making sync-file touch by default [02:12:03] i actually did a sync-dir. [02:12:30] But yeah, it should DTRT [02:13:38] ori-l: Do I recall you saying something about js issues not needing files touched now to recache? [02:18:18] I think Krinkle's d3bdda3 change should have fixed that, so e.g. if you remove the oldest file in a module, it Just Works [02:25:17] !log LocalisationUpdate completed (1.23wmf7) at Wed Dec 18 02:25:17 UTC 2013 [02:25:35] Logged the message, Master [02:29:55] Reedy: ori-l: superm401: indeed. 
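The `filemtime` comparison quoted at 01:55:22 is a modification-time freshness check: the cached configuration is reused only while it is at least as new as `InitialiseSettings.php`. A minimal Python sketch of that idea follows. It is not the production code; the function name `cache_is_fresh` and the handling of a missing cache file are assumptions chosen for illustration.

```python
import os


def cache_is_fresh(cache_path, source_path):
    """Return True if the cache file is at least as new as the source file.

    Assumption for this sketch: a missing cache file is treated as stale.
    """
    try:
        cache_mtime = os.path.getmtime(cache_path)
    except OSError:
        return False
    return cache_mtime >= os.path.getmtime(source_path)
```

With a check of this shape, touching the source file raises its mtime above the cache's, so the next comparison reports the cache as stale and it gets rebuilt. That would be consistent with the touch appearing to fix the problem, although the log does not establish the actual cause.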
[02:41:17] (Abandoned) Dzahn: add virtual language subdomain redirects for wikidata [operations/apache-config] - https://gerrit.wikimedia.org/r/65443 (owner: Dzahn) [02:42:19] longest abandon message ever [02:43:24] zero gerrit sounds like a pretty good goal [02:47:41] !log LocalisationUpdate completed (1.23wmf6) at Wed Dec 18 02:47:41 UTC 2013 [02:47:53] legoktm: tell me when you reach it >.> <.< [02:47:58] Logged the message, Master [02:48:07] p858snake|l: start reviewing my patches plz [02:48:23] -1 for all \o. [03:09:14] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Dec 18 03:09:14 UTC 2013 [03:09:30] Logged the message, Master [04:03:29] (PS1) Aquifacae: changed wiktionary/en.ico favicon to proper image from wikimedia commons and added 16x16, 32x32, and 48x48 resolutions to the file. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102382 [04:08:13] PROBLEM - Puppet freshness on cp1048 is CRITICAL: Last successful Puppet run was Mon 16 Dec 2013 06:59:39 PM UTC [04:25:13] PROBLEM - Puppet freshness on manutius is CRITICAL: Last successful Puppet run was Tue 17 Dec 2013 01:21:21 PM UTC [04:35:58] Reedy: why -- are you finding that you do need to touch files? [05:05:42] (PS1) Tim Landscheidt: Tools: Unify Tools and Toolsbeta configuration [operations/puppet] - https://gerrit.wikimedia.org/r/102385 [05:20:05] (PS1) Aquifacae: Changed wiktionary/en.ico favicon to proper image and resolutions [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102386 [05:20:59] (Abandoned) Aquifacae: changed wiktionary/en.ico favicon to proper image from wikimedia commons and added 16x16, 32x32, and 48x48 resolutions to the file. 
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102382 (owner: Aquifacae) [05:21:34] (PS2) Ori.livneh: Add scholarships.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/98849 (owner: BryanDavis) [05:23:12] (CR) Ori.livneh: [C: 2] Add scholarships.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/98849 (owner: BryanDavis) [05:32:39] !log DNS update: scholarships.wikimedia.org CNAME misc-web-lb.eqiad [05:32:57] Logged the message, Master [06:04:08] (PS1) BryanDavis: Configure twig cache directory for Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/102393 [06:05:38] (CR) Ori.livneh: [C: 2] Configure twig cache directory for Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/102393 (owner: BryanDavis) [06:07:40] please don't deploy such changes without ops present (I'm hardly present) [06:07:49] it looks safe, but you never know [06:08:00] which? [06:08:05] the cache, or DNS? [06:08:05] dns for starters :) [06:09:03] well, okay. [06:09:47] at least do it on business hours when people are present [06:10:57] !log Reloading zuul to deploy Ib05ad8d180d5239ac [06:11:24] okay, my bad. noted. i thought it was trivial. [06:13:43] it is, true [06:16:33] (PS2) Aquifacae: Changed wiktionary/en.ico favicon to proper image and resolutions [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102386 [06:19:50] anything I can help with now? [06:20:42] bd808: ^ ? [06:20:55] i think we're good, but maybe bryan needs something [06:21:02] are you both working at this hour?! 
[06:21:10] I'd like a hot cup of cocoa :) [06:21:19] OTher than that I think I'm golden [06:21:20] I'm used to ori-l by now [06:21:21] paravoid: you're one to talk :P [06:21:47] :P [06:22:05] I have a deploy tomorrow and was still getting new requirements this afternoon [06:22:29] xmas messed up the schedule a bit [06:23:06] paravoid: I'd like to talk to you about logstash next week if you have some time [06:23:25] yup, I can do that [06:23:37] I'd love to [06:23:55] Mostly I'd like your thoughts about syslog and other data you would like to see in there [06:24:25] I leave on the 26th [06:24:36] so early next week I suppose [06:24:53] Sure. Whatever time you can find [06:34:02] the trivial ones are usually the surprising ones /me isnt here [06:34:13] mutante: availability bias :P [06:34:23] the surprising ones are the ones you remember [06:35:53] /me changes nick to schrodinger's mutante [06:48:23] (03PS1) 10Ori.livneh: Re-disable Scholarship app until scheduled deployment [operations/puppet] - 10https://gerrit.wikimedia.org/r/102397 [06:48:39] oh? [06:48:45] I hope that's not because of my comments [06:49:23] no, I pushed for us to test it in prod so that there are no nasty surprises tomorrow [06:49:29] ('today' for you) [06:49:50] it looks good, though [06:50:30] (03PS2) 10Ori.livneh: Re-disable Scholarship app until scheduled deployment [operations/puppet] - 10https://gerrit.wikimedia.org/r/102397 [06:51:55] (03CR) 10Ori.livneh: [C: 032] Re-disable Scholarship app until scheduled deployment [operations/puppet] - 10https://gerrit.wikimedia.org/r/102397 (owner: 10Ori.livneh) [06:55:39] paravoid: do you mind if i restart apache on zirconium so that it picks up the vhost drop? 
[06:56:19] it's etherpad, soon it will be bugzilla, now it's not the one in use [06:56:20] reload should suffice, but yes, go ahead [06:56:44] ok, reloaded [06:57:09] doesn't seem !log-worthy [06:57:44] and it's that civicrm stuff, meh [07:06:24] how do you debug the memory usage of a running process? [07:06:39] pmap + gdb? [07:09:13] PROBLEM - Puppet freshness on cp1048 is CRITICAL: Last successful Puppet run was Mon 16 Dec 2013 06:59:39 PM UTC [07:26:13] PROBLEM - Puppet freshness on manutius is CRITICAL: Last successful Puppet run was Tue 17 Dec 2013 01:21:21 PM UTC [07:27:55] (03CR) 10Faidon Liambotis: "Didn't we explicitly agree to *not* do this for now?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/102316 (owner: 10Yurik) [07:29:08] (03CR) 10Faidon Liambotis: [C: 032] "Thanks, the inconsistency was bugging me too." [operations/puppet] - 10https://gerrit.wikimedia.org/r/102333 (owner: 10Yurik) [07:29:29] paravoid, we had a meeting with dfoy yesterday, apparently HTTPS is a big blocker on his plate - hence wanted to get it in, as it won't fragment cache or add much to processing time [07:29:44] blocker for what? [07:30:06] it's not fragmenting the cache, but the config is already horrible to read & understand [07:30:36] for the partners that have already whitelisted us using ip -- mobile often forces HTTPS, which hides banners [07:30:50] we already have two of them (listed in that config [07:30:50] often? [07:31:15] "often" shouldn't happen [07:31:16] i don't have data - apparently we have forceHTTPS, which switches users [07:31:33] plus i think mobile really wants to push everyone to https by default [07:31:59] how did we get from "a big blocker" to "mobile really wants to push this!" ? 
[07:32:02] and this will provide a mitigating path - as we can set some "allow https" magic header [07:32:08] hehe :) [07:32:18] sorry, misspoke - dan asked for it [07:32:22] blocker is the wrong word [07:32:28] I'm all for https btw [07:32:40] but let's work on one piece at a time [07:32:46] and work on simplifying our config first [07:33:03] sure - but the only real simplification is ESI, right? [07:33:06] we keep creeping changes in because they're blockers, and we've created a monster [07:33:14] have you had a chance to talk to bblack ? [07:33:15] esi or the vmod we were discussing the other day [07:33:21] yes we did talk about it last week [07:33:31] he's working on 3.0.5 (he was before our meeting even) [07:34:10] right, so i would rather him not spend time on a vmod that will be thrown out if it works :( [07:34:15] do you know his ETA? [07:34:17] it's a three-way merge [07:34:28] our patches? [07:34:44] upstream's -plus branch (which is stuck at 3.0.3), upstream's 3.0.5, our patches [07:36:09] https://github.com/varnish/varnish-cache-plus + https://github.com/varnish/varnish-cache/tree/3.0 + operations/debs/varnish [07:36:26] (03PS2) 10Yurik: Handle HTTPS for Zero traffic [operations/puppet] - 10https://gerrit.wikimedia.org/r/102316 [07:36:37] that was a rebase [07:37:23] !log online schema changes on masters with sql_log_bin=0, indexes only, gerrit 85508 [07:37:41] Logged the message, Master [07:39:30] (03CR) 10Dan-nl: [C: 031] "- this can be merged now" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102347 (owner: 10Brian Wolff) [07:40:00] paravoid, so here are my thoughts - correct me if i'm wrong: ESI path - has two sub-paths -- migrate to 3.0.5 and backport gzip patch to our 3.0.3+. The non-ESI path is vmod. VMOD requires a new API module feature to generate mapping, a new vmod to do string->string (in simplest form), and some complex VCL code to parse that string into "supported features", and match those features... 
[07:40:01] ...against the incoming request. [07:40:23] don't get into the details of how to do the vmod [07:40:37] ? [07:40:48] we brainstormed a few ideas with brandon [07:40:52] it doesn't matter much [07:41:28] paravoid, i'm ok with vmod, but we also wanted to ask if bblack thinks it would be fairly simple to backport that one gzip patch? [07:41:38] we talked about having a vmod that would be able to load properties for carriers, if they support certain proxies, https etc. [07:41:39] i think it would take much less time to try it out [07:42:07] I think he did already? [07:42:17] ping him I guess [07:42:52] not that i know - he backported a tiny sub-part of the patch (i incorrectly gave him the link to the code from mark, and later found another URL from mark explaining which it is) [07:44:11] that's why i think we really want to try backporting first, then vmod or 305 [07:44:28] timewise might be much much quicker [07:44:35] and then we are golden! [07:44:41] (maybe) [07:56:39] paravoid, btw, most of that patch for zero would be identical for vmod [07:56:47] (https) [07:57:01] because we still have to parse out ssl proxy ip [07:57:13] so the only difference would be inside the IF statements [08:00:51] (PS2) Matanya: mediawiki_singlenode : lint cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/100790 [08:02:41] sorry, doctor's appt, gotta go [08:03:22] why is yurik awake? [08:04:20] jeremyb, one of those strange things... [08:04:26] (PS1) ArielGlenn: fix up torrus squid template for case when there are no squids [operations/puppet] - https://gerrit.wikimedia.org/r/102400 [08:06:10] (CR) Matanya: "Agreed. i'll refactor it to be a static svn server with a role class. This will take some time though." 
[operations/puppet] - https://gerrit.wikimedia.org/r/100760 (owner: Matanya) [08:06:27] (CR) ArielGlenn: [C: 2] fix up torrus squid template for case when there are no squids [operations/puppet] - https://gerrit.wikimedia.org/r/102400 (owner: ArielGlenn) [08:12:02] (PS3) Nemo bis: Changed wiktionary/en.ico favicon to proper image and resolutions [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102386 (owner: Aquifacae) [08:12:22] (CR) Nemo bis: "Thanks for your patch! I've tweaked the wrapping in the commit message a bit." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102386 (owner: Aquifacae) [08:13:06] (CR) Yurik: "Faidon, most of this patch is needed regardless - we will need it for both VMOD and the ESI." [operations/puppet] - https://gerrit.wikimedia.org/r/102316 (owner: Yurik) [08:13:22] (PS4) Nemo bis: Updated internal.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102354 (owner: RashiqAhmad) [08:14:05] (CR) Nemo bis: "Thanks for the patch, glad to see you with us here in gerrit. :) I've just made "Wikimedia" uppercase in the commit message." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102354 (owner: RashiqAhmad) [08:14:23] (PS6) Nemo bis: Updated wikibooks.ico.
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102297 (owner: RashiqAhmad) [08:38:43] (PS1) ArielGlenn: more torrus conf generation fixup for case of no active squids [operations/puppet] - https://gerrit.wikimedia.org/r/102402 [08:39:40] (CR) jenkins-bot: [V: -1] more torrus conf generation fixup for case of no active squids [operations/puppet] - https://gerrit.wikimedia.org/r/102402 (owner: ArielGlenn) [08:39:59] meh [08:41:14] (PS2) ArielGlenn: more torrus conf generation fixup for case of no active squids [operations/puppet] - https://gerrit.wikimedia.org/r/102402 [08:43:26] (CR) ArielGlenn: [C: 2] more torrus conf generation fixup for case of no active squids [operations/puppet] - https://gerrit.wikimedia.org/r/102402 (owner: ArielGlenn) [08:45:23] RECOVERY - Puppet freshness on manutius is OK: puppet ran at Wed Dec 18 08:45:13 UTC 2013 [08:52:24] PROBLEM - MySQL Replication Heartbeat on db69 is CRITICAL: CRIT replication delay 307 seconds [08:53:13] PROBLEM - MySQL Slave Delay on db69 is CRITICAL: CRIT replication delay 341 seconds [08:55:04] good morning [08:55:13] RECOVERY - MySQL Slave Delay on db69 is OK: OK replication delay 0 seconds [08:55:23] RECOVERY - MySQL Replication Heartbeat on db69 is OK: OK replication delay -1 seconds [08:59:24] apergos: hi :-D [08:59:32] morning [08:59:42] apergos: I paired with Gabriel yesterday evening to polish up the Parsoid upstart script [08:59:49] excellent [08:59:57] I had an issue with nodejs being run as a background task using an ampersand & [09:00:16] that caused some weird issue with upstart respawning the process over and over hu [09:00:42] anyway that is fixed , patchset 6 of https://gerrit.wikimedia.org/r/#/c/99656/ should be fine [09:00:59] top thing to review is making sure it is not going to hit production :-D [09:01:58] I had some discussion with Gabriel about copytruncate, it is not ideal but we don't have any other option right now [09:02:21] though
about letting upstart log to /var/log/upstart/parsoid.log but then that is only readable by roots :/ [09:02:26] so in short, ready to merge [09:02:33] (PS1) ArielGlenn: remove cp1001-1020 from dsh/dhcp, reclaim rt #6530 [operations/puppet] - https://gerrit.wikimedia.org/r/102405 [09:05:42] (CR) ArielGlenn: [C: 2] remove cp1001-1020 from dsh/dhcp, reclaim rt #6530 [operations/puppet] - https://gerrit.wikimedia.org/r/102405 (owner: ArielGlenn) [09:18:48] (PS3) ArielGlenn: Fetch only "Wikipedia" label from Netha Hussain's blog to Planet [operations/puppet] - https://gerrit.wikimedia.org/r/102210 (owner: Nemo bis) [09:20:17] (CR) ArielGlenn: [C: 2] Fetch only "Wikipedia" label from Netha Hussain's blog to Planet [operations/puppet] - https://gerrit.wikimedia.org/r/102210 (owner: Nemo bis) [09:23:50] apergos: could you get the parsoid upstart job for beta in please? :D https://gerrit.wikimedia.org/r/#/c/99656/ [09:24:14] ah everybody's back [09:24:20] that was a weird little netsplit [09:24:36] yes, but it's going to be about 15 to 20 mins, I'm in the middle of something right now [09:26:45] okkk [09:29:35] hashar, I can haz jenkins support for TextExtracts?:) [09:30:19] MaxSem: do you have the Jenkins job builder config change and triggers added to zuul-config ?
:-] [09:30:46] if I knew how to make them...:P [09:31:01] that is like [09:31:01] hmm [09:31:04] very easy :-] [09:31:30] MaxSem: https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Adding_a_MediaWiki_extension [09:31:54] MaxSem: or you can just clone ssh://gerrit.wikimedia.org:29418/integration/jenkins-job-builder-config.git [09:32:37] then in mediawiki-extensions.yaml there is huge list of all extensions, add TextExtracts there [09:32:47] then send for review, will create the jobs for you :] [09:33:03] unless you feel brave and follow the tutorial, that would let you create the jobs directly \O/ [09:34:15] ok [09:41:28] coffee [09:41:29] brb [09:50:45] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 06:49:39 AM UTC [09:55:33] did someone disable puppet on tungsten? I don't see it logged [09:56:43] which is a polite way of saying 'someone disabled it because I see it in the bash history, please log it.' [09:58:40] (CR) Ori.livneh: Rewrite for multithreading (1 comment) [operations/software/mwprof] - https://gerrit.wikimedia.org/r/101793 (owner: Ori.livneh) [09:59:54] as indicated in pm, it was me [09:59:56] sorry 'bout that [10:00:20] !log disabled puppet on tungsten to test mwprof changes; re-enabling now [10:00:34] Logged the message, Master [10:00:38] thank you! [10:02:31] (PS8) Ori.livneh: Rewrite for multithreading [operations/software/mwprof] - https://gerrit.wikimedia.org/r/101793 [10:05:17] hashar: can you please add a maxsize param to the parsoid logrotate? [10:06:00] even with daily rotation we would have had disk full from the onlyinclude bug in an hour or so [10:09:45] PROBLEM - Puppet freshness on cp1048 is CRITICAL: Last successful Puppet run was Mon 16 Dec 2013 06:59:39 PM UTC [10:09:52] I figure 1gb oughta be plenty large enough for it [10:10:58] apergos: will do [10:11:05] thanks! [10:13:26] apergos: any clue what is the disk space of /var on parsoid servers ?
[10:13:32] cause we have rotate 15 [10:13:59] though the 1G files are going to be compressed so probably not that much of an issue [10:14:24] (CR) Ori.livneh: "PS8: Added asserts for rewind, g_output_stream_close, and g_input_stream_close, fixed a small bug (& instead of | to combine flags), added" [operations/software/mwprof] - https://gerrit.wikimedia.org/r/101793 (owner: Ori.livneh) [10:14:45] there's a lot of room [10:15:08] 1 gb and then compress it, even if we keep a pile of them, ought to be fine [10:15:37] i guess [10:15:43] (PS7) Hashar: beta: manage parsoid using upstart [operations/puppet] - https://gerrit.wikimedia.org/r/99656 [10:15:52] apergos: amended adding 'size 1GB' [10:17:05] uh oh [10:18:15] hashar: you shouldn't configure logrotate [10:18:33] 6 gb total. I still think that's going to be ok after compression [10:18:38] er total free [10:19:02] instead of redirecting the output of parsoid, let it write to stdout; upstart will capture it and redirect it to /var/log/upstart/parsoid.log [10:19:04] and it will rotate [10:19:14] na can't do that [10:19:18] cause /var/log/upstart is root only [10:19:25] had that discussion with Gabriel yesterday :/ [10:19:42] ugh, makes sense [10:19:44] but annoying! [10:19:45] and we haven't found a way to change upstart log destinations nor how to change the ownership of resulting log files [10:19:46] we do want to configure it, because we need things like 'rotate if it gets too big' [10:20:19] i think the whole idea of upstart logging is giving clue to the sysadmin about some system going while [10:20:31] while the application is supposed to handle the logging itself to /var/log/ [10:20:48] +1 [10:21:42] well, [10:21:45] that make sense, actually [10:21:50] *makes [10:21:57] i concede the point. [10:23:01] but if this is an actual application log, as opposed to just stuff streaming into stdout, it should be, well, an actual application log [10:23:13] that would be nice, but for right now...
[10:23:23] feel free to poke g wicke about it however :-D [10:24:33] (PS1) ArielGlenn: remove mgmt and prod ips for cp1001-1020, reclaim rt #6530 [operations/dns] - https://gerrit.wikimedia.org/r/102412 [10:24:41] ah ori-l you are a grrrit-wm person! [10:24:44] oh nm it is back [10:24:46] so weird [10:25:03] i got roped into it, no idea how it's implemented really [10:25:09] ah ok [10:25:19] I just noticed your name on 'have access, can restart' on wikitech [10:26:20] (CR) ArielGlenn: [C: 2] remove mgmt and prod ips for cp1001-1020, reclaim rt #6530 [operations/dns] - https://gerrit.wikimedia.org/r/102412 (owner: ArielGlenn) [10:27:23] blargh [10:27:26] /* User job logging not currently available */ [10:27:26] nih_assert (log->uid == 0); [10:27:39] (upstart/init/log.c) [10:29:52] nice [10:30:37] * User jobs by necessity are handled differently to system jobs. Since [10:30:37] * a user job must log their data to files owned by a non-root user, the [10:30:37] * safest technique is for a process running as that user to create the [10:30:39] * log file. [10:30:41] so what do we do for parsoid logs ? [10:30:45] from the horse's mouth [10:30:51] hashar: what you have in your patch [10:30:52] I really only need the upstart part for beta, logrotate I don't care about [10:30:53] i was wrong [10:30:59] you were right, etc [10:31:10] :D [10:31:23] yep but we want to be able to steal it for prod :-) [10:31:33] and I think parsoid is going to learn how to send syslog messages [10:31:39] or maybe some new json based system [10:31:58] it can be the first producer to logstash!
[10:32:12] yeah i understand for prod though now I end up doing all the "integrate parsoid properly in production" when I just want to be able to restart it over ssh :D [10:32:14] i hope i don't regret suggesting that [10:33:06] :-D [10:33:33] i'll poke gwicke tomorrow to see if he'd be interested in that [10:34:23] today, rather [10:35:07] (CR) Steinsplitter: [C: 1] Have sysops add and remove users from the 'gwtoolset' group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102347 (owner: Brian Wolff) [10:35:11] ori-l: he would for sure [10:35:27] ori-l: I think he started an RFC to structure logging [10:35:36] no, that was bd808 [10:35:37] syslog|text files must die! [10:36:00] but bd808 is just finishing up the scholarship app and was looking to start piping data into logstash [10:36:51] (a.pergos: enabled puppet on tungsten now) [10:42:42] cool [10:42:57] I admit to being a fan of text files [10:43:04] when you are in the middle of omg it's broken [10:43:10] I wanna look at flat text easily [10:43:30] I want other stuff in parallel if I can get it, to be sure [10:45:32] (CR) Steinsplitter: "we can merge it immediately to prevent confusion on commons." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102347 (owner: Brian Wolff) [10:49:45] RECOVERY - Puppet freshness on tungsten is OK: puppet ran at Wed Dec 18 10:49:42 UTC 2013 [10:50:10] (CR) Alexandros Kosiaris: "So it is a middleware layer (an abstraction lets say).
We live and breath abstractions as all who work in the (very very broad) "Computer"" [operations/puppet] - https://gerrit.wikimedia.org/r/83768 (owner: Dzahn) [10:51:24] (PS8) ArielGlenn: beta: manage parsoid using upstart [operations/puppet] - https://gerrit.wikimedia.org/r/99656 (owner: Hashar) [10:53:19] (CR) ArielGlenn: [C: 2] beta: manage parsoid using upstart [operations/puppet] - https://gerrit.wikimedia.org/r/99656 (owner: Hashar) [10:53:54] thank you for slogging through the prod stuff hashar [10:55:27] nice [10:55:31] deploying on the labs instance [10:56:56] info: /Service[parsoid]: Provider upstart does not support features enableable; not managing attribute enable [10:56:57] ... [10:58:14] wtffffff [10:58:48] (PS1) Hashar: parsoid: upstart provider does not support 'enable' [operations/puppet] - https://gerrit.wikimedia.org/r/102416 [10:58:55] yeah [10:59:13] and now I have two parsoid daemons running ... [10:59:15] upstart jobs start on events, not rc [10:59:36] well, not necessarily runlevels [10:59:42] so "enabled" is ambiguous [11:00:23] (PS1) ArielGlenn: move brion from roots to mortals, rt #4798 [operations/puppet] - https://gerrit.wikimedia.org/r/102417 [11:00:54] aargh [11:01:33] any possibility someone can deploy https://gerrit.wikimedia.org/r/#/c/102347/ and https://gerrit.wikimedia.org/r/#/c/102347/ at the same time? [11:01:40] (CR) ArielGlenn: [C: 2] move brion from roots to mortals, rt #4798 [operations/puppet] - https://gerrit.wikimedia.org/r/102417 (owner: ArielGlenn) [11:01:51] (PS2) Hashar: parsoid: upstart provider does not support 'enable' [operations/puppet] - https://gerrit.wikimedia.org/r/102416 [11:02:28] (CR) Hashar: "Was removing the enable parameter from the production service (which is init script). PS2 remove it from the labs service."
[operations/puppet] - https://gerrit.wikimedia.org/r/102416 (owner: Hashar) [11:04:57] Reedy: https://gerrit.wikimedia.org/r/#/c/102347/ is ready if you're okay with it [11:05:25] apergos: https://gerrit.wikimedia.org/r/#/c/102416/ would get rid of the puppet notice about upstart service not supporting enableable :D [11:05:30] yes I saw [11:05:37] I also saw the ps2 comment :-D [11:05:58] hehe [11:06:01] (PS3) ArielGlenn: parsoid: upstart provider does not support 'enable' [operations/puppet] - https://gerrit.wikimedia.org/r/102416 (owner: Hashar) [11:06:19] something weird was that some reason I ended up with Parsoid being started by the old init script AND by the upstart job [11:06:26] ouch [11:06:43] so I guess puppet refreshed the service before updating the /etc/init.d/parsoid to point to upstart :( [11:06:56] yuck [11:07:02] long story short: ended up with two servers running [11:07:28] (CR) ArielGlenn: [C: 2] parsoid: upstart provider does not support 'enable' [operations/puppet] - https://gerrit.wikimedia.org/r/102416 (owner: Hashar) [11:07:33] hmm [11:07:41] and I don't have the puppet run log :-( [11:08:50] maybe require should be made after [11:09:02] to ensure the /etc/init.d/parsoid symlink is realized before the Service is monitored [11:09:17] not sure whether "after" exists first [11:10:28] (PS3) Hashar: beta: properly connect to parsoid instance [operations/puppet] - https://gerrit.wikimedia.org/r/99659 [11:12:31] apergos: and https://gerrit.wikimedia.org/r/99659 can go in. That adapt the beta python script which updates beta continuously [11:12:38] I"m already lookin at it [11:12:42] basically added a tiny shell wrapper to remotely restart Parsoid. [11:12:56] tested it out a few weeks ago. should still be fine.
The sudo policies are nasty though [11:13:12] but will get rid of them when I refactor the way Parsoid is updated on beta :D [11:13:12] I would recommend retesting now [11:14:15] ahhh [11:14:26] can't we merge amend later on if broken ? :D [11:14:48] gotta stop puppet, apply the sudo permissions manually, scp the files on the instance then test [11:14:54] * apergos gives hashar the old hairy eyeball [11:23:37] hashar: shouldn't the file stanza for $beta_parsoid_remote_script have a source attribute? [11:28:07] apergos: ah yeah hmm [11:29:55] (PS4) Hashar: beta: properly connect to parsoid instance [operations/puppet] - https://gerrit.wikimedia.org/r/99659 [11:30:41] why do everyone add me as reviewers of random changes .. :( [11:30:47] no idea [11:30:50] take yourself off [11:30:54] I can't empty up my Gerrit dashboard :D [11:31:02] ah but you can :-D [11:31:05] yeah I should do remove myself more often [11:32:22] jenkins-deploy will only ever restart or status (not stop)? [11:33:13] yup [11:33:23] though I can add in stop [11:33:35] I don't know if you will want stop/start [11:33:37] just asking [11:33:39] but automatically jenkins would just restart, added in status in case I need it later on [11:33:45] ok [11:33:48] (PS2) Mark Bergsma: Rename role::cache::varnish::text/upload to role::cache::text/upload [operations/puppet] - https://gerrit.wikimedia.org/r/102164 [11:34:11] that is all a bit messy :( [11:35:47] (CR) ArielGlenn: [C: 2] beta: properly connect to parsoid instance [operations/puppet] - https://gerrit.wikimedia.org/r/99659 (owner: Hashar) [11:36:34] (CR) Mark Bergsma: [C: 2] Rename role::cache::varnish::text/upload to role::cache::text/upload [operations/puppet] - https://gerrit.wikimedia.org/r/102164 (owner: Mark Bergsma) [11:36:34] thx :-D [11:36:40] well don't thank me yet [11:37:28] well I am sure the sudo policies are fine, I spent countless hours figuring them out :D [11:37:57] .me resists the urge to ask 'and so
how many hours was that?' :-P [11:38:11] 'undef' [11:38:14] . -> / off by one... [11:39:02] yeah so that is it [11:39:08] i hate puppet [11:39:11] and sdo [11:39:12] sudo [11:39:41] (PS6) Mark Bergsma: Update Icinga cache groups [operations/puppet] - https://gerrit.wikimedia.org/r/101859 [11:39:44] i thought your name was sudo -u hashar [11:40:54] (CR) Mark Bergsma: [C: 2] Update Icinga cache groups [operations/puppet] - https://gerrit.wikimedia.org/r/101859 (owner: Mark Bergsma) [11:45:41] (PS1) Mark Bergsma: Remove old Ganglia groups [operations/puppet] - https://gerrit.wikimedia.org/r/102424 [11:46:47] (CR) Mark Bergsma: [C: 2] Remove old Ganglia groups [operations/puppet] - https://gerrit.wikimedia.org/r/102424 (owner: Mark Bergsma) [11:47:20] mark, i took a first stab at the HTTPS, let me know what you think [11:47:31] off to bed, need some sleep :) [11:47:55] (Abandoned) Mark Bergsma: Revert "dumps: Copy pagecounts data to public labs nfs too" [operations/puppet] - https://gerrit.wikimedia.org/r/91604 (owner: Mark Bergsma) [11:50:14] (PS2) Nemo bis: Make logscale in reqerror graphs actually work [operations/puppet] - https://gerrit.wikimedia.org/r/101065 [11:50:29] (PS3) Nemo bis: Make logscale in reqerror graphs actually work [operations/puppet] - https://gerrit.wikimedia.org/r/101065 [11:53:10] (PS2) Nemo bis: Enable collect_exim_stats_via_gmetric cron for mail relay [operations/puppet] - https://gerrit.wikimedia.org/r/101117 [12:16:53] (CR) Alexandros Kosiaris: [C: 2] Setting up nrpe alert for varnishkafka process [operations/puppet] - https://gerrit.wikimedia.org/r/102325 (owner: Ottomata) [12:17:00] (CR) Physikerwelt: [C: -1] "There are some manual steps required."
[operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/99381 (owner: Physikerwelt) [12:21:35] (PS1) Mark Bergsma: Consolidate all eqiad text IPs into text-lb [operations/dns] - https://gerrit.wikimedia.org/r/102426 [12:45:06] (PS2) Hashar: beta: fix up parsoid restarter [operations/puppet] - https://gerrit.wikimedia.org/r/102429 [12:45:49] apergos: and I am back with the necessary "follow up change" :D https://gerrit.wikimedia.org/r/#/c/102429 [12:46:00] got confused by the multiple levels of sudo / ssh / users :( [12:46:06] that one tried on labs :D [12:46:53] yay for that ;-) [12:48:36] hahaha [12:48:42] I read that line with mwdeplyo multiple times [12:49:01] and sure didn't say 'but wait a minute' ! [12:49:08] yeah that is hard to spot [12:49:15] no, it should have been easy! [12:49:24] anyways.. *testing* spots that right? :-P :-D [12:49:27] for mw/core we usually have multiple levels of review [12:49:35] (CR) ArielGlenn: [C: 2] beta: fix up parsoid restarter [operations/puppet] - https://gerrit.wikimedia.org/r/102429 (owner: Hashar) [12:55:13] apergos: looks good enough now. Thank you very much! [12:55:33] glad it's working [12:56:00] bath udp2log still using augeas :D [12:56:06] * hashar resists [12:56:52] feel free! [12:57:01] don't do icinga, I have a draft I am slowly slugging away at [12:57:15] I mean, if you would suddenly be seized by the urge :-P [12:58:55] time for a break [12:59:02] * apergos looks at the clock... and time to eat!
[13:10:22] PROBLEM - Puppet freshness on cp1048 is CRITICAL: Last successful Puppet run was Mon 16 Dec 2013 06:59:39 PM UTC [13:28:13] (CR) Mark Bergsma: [C: 2] Consolidate all eqiad text IPs into text-lb [operations/dns] - https://gerrit.wikimedia.org/r/102426 (owner: Mark Bergsma) [13:40:30] (PS1) Mark Bergsma: Add new bits-lb IPs (new Zero scheme) [operations/puppet] - https://gerrit.wikimedia.org/r/102432 [13:42:58] (PS2) Mark Bergsma: Add new bits-lb IPs (new Zero scheme) [operations/puppet] - https://gerrit.wikimedia.org/r/102432 [13:46:59] (PS3) Mark Bergsma: Add new bits-lb IPs (new Zero scheme) [operations/puppet] - https://gerrit.wikimedia.org/r/102432 [13:50:36] (PS4) Mark Bergsma: Add new bits-lb IPs (new Zero scheme) [operations/puppet] - https://gerrit.wikimedia.org/r/102432 [13:51:50] !log setting up new transit peering in eqiad [13:52:05] Logged the message, Master [13:52:32] (CR) Mark Bergsma: [C: 2] Add new bits-lb IPs (new Zero scheme) [operations/puppet] - https://gerrit.wikimedia.org/r/102432 (owner: Mark Bergsma) [13:52:32] done :) [13:54:19] hm [14:05:45] (PS1) Mark Bergsma: Add new bits IPs to the https section [operations/puppet] - https://gerrit.wikimedia.org/r/102435 [14:07:35] (CR) Mark Bergsma: [C: 2] Add new bits IPs to the https section [operations/puppet] - https://gerrit.wikimedia.org/r/102435 (owner: Mark Bergsma) [14:24:00] (PS1) Mark Bergsma: Move eqiad text IPv6 addresses to text, update LVS monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/102437 [14:25:05] (PS2) Mark Bergsma: Move eqiad text IPv6 addresses to text, update LVS monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/102437 [14:25:28] mark: hey btw, we can simplify config-geo a lot now, do you want me to prepare a patchset perhaps?
[14:25:47] sure [14:29:20] (PS1) Faidon Liambotis: Simplify config-geo with just one text-lb resource [operations/dns] - https://gerrit.wikimedia.org/r/102438 [14:29:21] (CR) jenkins-bot: [V: -1] Simplify config-geo with just one text-lb resource [operations/dns] - https://gerrit.wikimedia.org/r/102438 (owner: Faidon Liambotis) [14:29:27] damn :) [14:29:36] oh [14:29:52] (CR) Faidon Liambotis: "recheck" [operations/dns] - https://gerrit.wikimedia.org/r/102438 (owner: Faidon Liambotis) [14:30:14] ok, it passed now [14:31:11] mark: want to review that? [14:31:22] it's trivial, but there's the potential for an epic fail :) [14:32:58] (CR) Mark Bergsma: [C: 1] Simplify config-geo with just one text-lb resource [operations/dns] - https://gerrit.wikimedia.org/r/102438 (owner: Faidon Liambotis) [14:34:11] so much cleanup this week [14:37:25] (CR) Faidon Liambotis: [C: 2] Simplify config-geo with just one text-lb resource [operations/dns] - https://gerrit.wikimedia.org/r/102438 (owner: Faidon Liambotis) [14:38:09] (PS1) Mark Bergsma: Remove now redundant LVS service 'ipv6' [operations/puppet] - https://gerrit.wikimedia.org/r/102439 [14:39:15] yeah [14:39:17] your machete [14:39:24] and my small pocket knife :P [14:40:59] (CR) Andrew Bogott: [C: -1] "One typo -- looks good otherwise" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/100790 (owner: Matanya) [14:42:14] i actually carry a pocket knife, always [14:42:33] https://wiki.debian.org/Merchandise/SwissKnives [14:42:33] :P [14:43:46] http://www.leatherman.com/7.html [14:44:21] it's very nice when in the dc [14:44:40] i have a swiss knife also, not debian branded ;p but I don't really use that anymore [14:46:40] (PS3) Matanya: mediawiki_singlenode : lint cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/100790 [14:48:28] (PS2) Ottomata: Let Mingle card references point to Thoughtworks' instance [operations/puppet] -
https://gerrit.wikimedia.org/r/102306 (owner: QChris) [14:48:36] (CR) Ottomata: [C: 2 V: 2] Let Mingle card references point to Thoughtworks' instance [operations/puppet] - https://gerrit.wikimedia.org/r/102306 (owner: QChris) [14:49:52] (CR) Andrew Bogott: [C: 2] mediawiki_singlenode : lint cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/100790 (owner: Matanya) [14:50:11] I think you've shown that to me before [14:50:16] looks familiar [14:50:35] thanks andrewbogott_afk [14:50:36] yeah, it's a leatherman ;p [14:50:42] (PS2) Ottomata: Log X-Analytics header of response instead of request [operations/puppet] - https://gerrit.wikimedia.org/r/102107 (owner: QChris) [14:50:49] (CR) Ottomata: [C: 2 V: 2] Log X-Analytics header of response instead of request [operations/puppet] - https://gerrit.wikimedia.org/r/102107 (owner: QChris) [14:51:34] (PS2) Ottomata: role/gerrit.pp: Clean up extra line breaks [operations/puppet] - https://gerrit.wikimedia.org/r/102362 (owner: Krinkle) [14:51:41] (CR) Ottomata: [C: 2 V: 2] role/gerrit.pp: Clean up extra line breaks [operations/puppet] - https://gerrit.wikimedia.org/r/102362 (owner: Krinkle) [14:52:07] (PS3) Ottomata: role/gerrit.pp: Mirror VisualEditor to github as wikimedia/VisualEditor [operations/puppet] - https://gerrit.wikimedia.org/r/102363 (owner: Krinkle) [14:52:13] (CR) Ottomata: [C: 2 V: 2] role/gerrit.pp: Mirror VisualEditor to github as wikimedia/VisualEditor [operations/puppet] - https://gerrit.wikimedia.org/r/102363 (owner: Krinkle) [14:54:52] (CR) Mark Bergsma: [C: 2] Move eqiad text IPv6 addresses to text, update LVS monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/102437 (owner: Mark Bergsma) [14:56:30] oook morning manybubbles [14:56:45] morebots: [14:56:45] I am a logbot running on tools-exec-04. [14:56:45] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[14:56:45] To log a message, type !log . [14:56:48] morning [14:57:10] ottomata: everything is waiting on chad to wake up and approve the changes before our window [15:00:32] ok, so [15:00:35] there's this titlekey change [15:00:37] the config changes [15:00:45] !log Gerrit dead :( [15:00:53] !log Gerrit back :-) [15:00:55] and the submodule updates for wmf6 and wmf7 [15:00:59] right? [15:01:02] Logged the message, Master [15:01:18] Logged the message, Master [15:01:48] what's the process we go through manybubbles? [15:01:55] merge those, deploy, build indexes? [15:01:59] orrrr have indexes already been built? [15:02:05] hashar: lol [15:02:25] ottomata: can't build indexes until we deploy the change [15:02:45] ottomata: merge wmf7, deploy that. basic smoke test on test2wiki [15:02:52] merge wmf6, deploy that [15:03:06] wait a few moments to make sure we haven't caused more fatals in the logs [15:03:21] merge the config changes (both) and then deploy those [15:03:46] I have a mapping change that I'll deploy right at the beginning of the window. that email I sent. [15:04:00] that process isn't worth persisting because I'll make it no longer required soon [15:04:41] after config change, build indexes ala the wiki page [15:04:45] then dump data into them [15:05:09] manybubbles: yeah got some 503 for a few seconds, that was about it ;) [15:05:11] none of them are _that_ big today. enwikinews is the largest, followed by specieswiki [15:05:14] ok cool [15:05:31] I'm still building commonswiki in four screen sessions [15:05:44] manybubbles: so someone asked me whether ElasticSearch has a build in OCR to detect text in images it index [15:05:51] aye ok cool [15:06:01] manybubbles: and thus let us search on the text supposed to be in picture. Is that science fiction? 
[15:06:13] hashar: 10% science fiction [15:06:30] * hashar is 63,9% happy [15:06:32] mediawiki has a way to yank text out of some file formats (pdf and another) [15:06:41] we're going to be indexing that soon [15:06:48] that is crazy :-D [15:06:48] like, I'm working on the commit that searches it [15:06:51] (PS1) Dan-nl: beta: i18n messages [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102444 [15:06:57] I'll add you as a reviewer so you can see it [15:07:01] but there is no build in OCR in elastic search is there ? [15:07:07] no [15:07:16] I wouldn't want to send it the whole image any way [15:07:17] (CR) Chad: [C: 1] Cirrus config update [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102369 (owner: Manybubbles) [15:07:27] yeah was just some random speculation with someone here [15:07:31] better to use stuff like imaginemagic to grab it [15:07:41] so yeah get MediaWiki to extract some metadata and index that [15:07:48] if imagemagic can't do it, no big ass java application will be able to either [15:07:53] imaginemagic is a lovely name :D [15:08:02] hashar: not just metadata. text too [15:08:18] ahhhh [15:08:44] Special:Search [ incategory:porn metadata:"color=pink" ] [15:08:57] so you would get exif search as well ? [15:09:19] (PS3) Ottomata: Setting up nrpe alert for varnishkafka process [operations/puppet] - https://gerrit.wikimedia.org/r/102325 [15:09:23] (CR) Ottomata: [C: 2 V: 2] Setting up nrpe alert for varnishkafka process [operations/puppet] - https://gerrit.wikimedia.org/r/102325 (owner: Ottomata) [15:09:57] I guess so [15:10:00] manybubbles: thank you :-] [15:10:20] hashar: exif is coming too, I believe [15:10:25] just not as soon as file text [15:10:49] take your time :-] [15:13:50] paravoid WHAA CHECK_GRAPHITE?!
with holt winters confidence bands [15:13:52] aahhhhhhhh :) [15:14:01] :) [15:15:41] ahhhhhHHHHH [15:15:54] now i want to switch back to logster and just use statsd, [15:15:57] ahhhhhHHHH [15:16:04] lol [15:16:42] not gonna do it this year! hehehh [15:17:21] paravoid, so the part in the JVM application deployment proposal you don't like is jenkins building the jars? [15:17:35] mostly, yes [15:17:51] this idea came from a discussion with Ryan_Lane and uhhh, some other folks? maybe manybubbles was involved too [15:17:53] it's still very terse as a draft proposal, but I immediately flagged that [15:17:56] yeah [15:18:08] it was just documenting an iRC discussion that came up with that [15:18:21] so we wouldn't have to remember what we talked about [15:18:52] PROBLEM - Host text-lb.eqiad.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:861:ed1a::1 [15:19:02] hmz [15:19:12] that doesn't sound very good [15:19:24] yeah, so jenkins builds jars and uploads them is pretty standard. I figured we could wrap it in the appropriate amount of paranoia (whitelisted dependencies, etc) and then we wouldn't be too confusing. [15:19:48] i'd also be ok if there was a way for us to manually upload jars, rather than allowing jenkins to do it [15:19:51] if that is the worry [15:19:55] that's similar to how we use apt [15:20:05] we could build the jars on tin itself [15:20:07] from the repo [15:20:07] manybubbles: is having jenkins and gerrit open to the internet & volunteers pretty standard too? :) [15:20:21] (CR) Qgil: "The proposed favicon is departing from a different graphic than the one currently deployed.
Are you doing this following some instructions" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102386 (owner: Aquifacae) [15:20:22] and have it stored on the system, which would also be the fetch location [15:21:24] or we could have a second jenkins that only picks up certain types of jobs [15:21:33] * RoanKattouw wonders why he didn't get paged [15:21:40] paravoid: not too standard, but I've seen it before. The thing you have to do is make sure you control who can click the "build release" button. [15:21:49] It's paging hours CET but not paging hours PST, and Alex did change my hours to CET the other day [15:22:10] openness/transparency and security are often in conflict, bridging these worlds usually involves more complicated solutions than what's standard out there [15:22:14] Oh, no nvm [15:22:17] Ryan_Lane: normally jenkins uploads the results to a maven repository manager. I believe my proposal was to have it do that [15:22:38] I have pages, they're just sent as MMS apparently and so my phone can't read them while overseas [15:22:48] yes, but we'd still need to trust jenkins and gerrit for that [15:22:53] RoanKattouw: ??? MMS ??? [15:22:54] manybubbles: anyone who can run tests for any other repository can just upload a fake test suite that opens a reverse shell [15:22:56] so ipv6 is broken atm [15:22:59] (CR) Qgil: [C: 1] "Odder and me had already +1 the favicon itself. It is ready to be merged." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102297 (owner: RashiqAhmad) [15:23:03] mark: oh? [15:23:06] and i'm not sure yet why [15:23:14] I thought it was a broken check [15:23:16] looking [15:23:40] paravoid: fair enough. no reason a real person can't build the release artifact if required. hell, if we want a real person to do it all the time that isn't that bad. [15:24:09] the reason it isn't normally done is because jenkins requires a clean slate [15:24:13] mark: did you fix it?
[15:24:20] it just came back here [15:24:21] akosiaris: http://i.imgur.com/4pTn9mK.png [15:24:40] akosiaris: On Dec 5 I was in the US, this week I'm in Europe [15:25:03] RECOVERY - Host text-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [15:25:12] icinga@neon ? lol ... mine show up only as Wikimedia [15:25:36] paravoid: now I see. having a blessed jenkins instance that only runs the release builds seems like it'd be overkill. [15:25:44] Hmm I have roaming auto-retrieve turned off [15:25:56] Presumably because there are charges for retrieving MMS messages while roaming [15:26:07] But even when I click the Download button I just get an error message [15:26:13] yes [15:26:14] Once I'm back in the US it'll work again [15:26:22] stupid ipv6 lvs service setup again [15:26:32] oh [15:26:32] ok [15:27:02] PROBLEM - Varnish HTTP text-backend on cp1065 is CRITICAL: Connection timed out [15:27:30] so, for this proposal then, we don't need to worry about jenkins right now [15:27:31] ? [15:27:32] yay for kmem_alloc [15:27:32] PROBLEM - Varnish HTCP daemon on cp1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:27:33] PROBLEM - Varnish traffic logger on cp1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:27:43] we just need to be able to upload jars to a repo? [15:27:51] and then make git deploy able to deploy from it? [15:28:08] it's really not a great time to discuss this while random things are breaking [15:28:17] let's do it via wikitech/email? [15:28:20] ha, sorry, discussion started before that and just kept going [15:28:28] k [15:28:29] you looking at cp1065? 
[15:28:42] I logged in, saw the kmem_alloc in dmesg [15:28:44] but now it hanged [15:29:00] I thought it was you and echo b >, but apparently not [15:29:08] I'm going to reset it now [15:29:12] ok [15:29:42] RECOVERY - Puppet freshness on cp1048 is OK: puppet ran at Wed Dec 18 15:29:35 UTC 2013 [15:29:42] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100% [15:29:50] !log powercycling cp1065, kmem_alloc deadlock [15:30:07] Logged the message, Master [15:30:18] I never upgraded those text varnishes for the jemalloc bug, either. Not that anything at that layer should explain a kernel deadlock, but still [15:31:02] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [15:32:42] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 1.04 ms [15:33:22] RECOVERY - Varnish HTCP daemon on cp1065 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [15:33:22] RECOVERY - Varnish traffic logger on cp1065 is OK: PROCS OK: 2 processes with command name varnishncsa [15:39:11] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100% [15:39:18] (CR) Mark Bergsma: [C: 2] Remove now redundant LVS service 'ipv6' [operations/puppet] - https://gerrit.wikimedia.org/r/102439 (owner: Mark Bergsma) [15:39:22] aw crap [15:40:22] [ 480.683140] BUG: soft lockup - CPU#12 stuck for 22s! 
[varnishd:2765] [15:40:25] yay [15:40:50] !log powercycling cp1065 again, kernel BUG, soft lockup [15:41:07] Logged the message, Master [15:43:51] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 0.57 ms [15:46:31] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100% [15:52:01] (PS1) Mark Bergsma: Remove unused 'specials' LVS class [operations/puppet] - https://gerrit.wikimedia.org/r/102447 [15:53:27] (CR) Mark Bergsma: [C: 2] Remove unused 'specials' LVS class [operations/puppet] - https://gerrit.wikimedia.org/r/102447 (owner: Mark Bergsma) [15:55:18] (CR) Hashar: [C: 2] beta: i18n messages [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102444 (owner: Dan-nl) [15:55:29] (Merged) jenkins-bot: beta: i18n messages [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102444 (owner: Dan-nl) [16:13:47] (CR) Odder: [C: -1] "Wow, that's totally the wrong icon. Please have a look at https://bits.wikimedia.org/favicon/wiktionary/en.ico -- you only need to add the" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102386 (owner: Aquifacae) [16:16:31] hashar: in the mood to deploy two favicons created by GCI students? [16:23:49] have things calmed down or should I call off the cirrus deploy that is in half an hour? [16:24:34] (CR) Odder: [C: -1] "Wow, great improvement!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102354 (owner: RashiqAhmad) [16:25:03] Odder: [C: -1] "Wow, great improvement!" 
=> o_0 [16:28:52] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [16:31:34] (Abandoned) Ottomata: debian/varnishkafka.logrotate - fixing postrotate reload command [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/100385 (owner: Ottomata) [16:32:38] (PS1) Ottomata: debian/varnishkafka.logrotate - fixing postrotate reload command [operations/software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/102452 [16:32:54] (CR) Ottomata: [C: 2 V: 2] debian/varnishkafka.logrotate - fixing postrotate reload command [operations/software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/102452 (owner: Ottomata) [16:35:15] (PS1) Ottomata: Updating changelog for release 1.0.0-2 [operations/software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/102453 [16:35:35] (CR) Ottomata: [C: 2 V: 2] Updating changelog for release 1.0.0-2 [operations/software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/102453 (owner: Ottomata) [16:36:53] (PS1) Ebrahim: Use local Wiki.png for Persian Wikipedia Logo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102454 [16:40:29] (CR) Odder: [C: -1] "Why?" 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [16:44:33] (03CR) 10Ebrahim: "As enwiki is using local file for its logo http://en.wikipedia.org/wiki/File:Wiki.png we want keep history of wiki logo changes instead ch" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [16:46:02] !log updated varnishkafka in apt to 1.0.0-2 and installed on mobile varnishes (this includes fix for logrotate postrotate command) [16:46:12] PROBLEM - DPKG on cp1046 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:46:20] Logged the message, Master [16:46:42] PROBLEM - DPKG on cp4011 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:50:04] (03CR) 10Odder: "Config changes are generally required to pass through Bugzilla and show on-wiki community consensus." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [16:54:45] twkozlowski: not today sorry :( [16:57:36] hashar: kk, they've been waiting for a while, can wait a bit more :-) [17:01:27] about to start today's cirrus deployment. [17:08:42] (03PS1) 10Dan-nl: beta: adding gwtoolset jobs to $wgJobTypesExcludedFromDefaultQueue [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102464 [17:10:20] (03CR) 10Dalba: [C: 031] "As a user and admin on fawiki I think this a good idea and I'm pretty sure community will agree with it." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [17:13:18] (03PS2) 10Manybubbles: Keep TitleKey from stealing Cirrus' prefix search [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102217 (owner: 10Chad) [17:13:22] (03CR) 10Manybubbles: [C: 032] Keep TitleKey from stealing Cirrus' prefix search [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102217 (owner: 10Chad) [17:16:29] (03CR) 10Aquifacae: "The favicon I uploaded is the one used on all of the other language subdomains. 
I thought that it was peculiar that the .en subdomain shou" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102386 (owner: 10Aquifacae) [17:17:39] (03Merged) 10jenkins-bot: Keep TitleKey from stealing Cirrus' prefix search [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102217 (owner: 10Chad) [17:20:07] (03CR) 10Odder: "Yeah, that's the problem with Wiktionary projects; they use two different favicons, not to mention the different logos!" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102386 (owner: 10Aquifacae) [17:23:47] !log Zuul : speedy deploy of a Wikimedia hack to prevent Zuul from updating build description. Seems to slowing down everything. Hack is {{gerrit|102465}}. Deployed using https://www.mediawiki.org/wiki/Continuous_integration/Zuul#upgrading [17:24:03] Logged the message, Master [17:26:25] !g 99637,5 [17:26:25] https://gerrit.wikimedia.org/r/#q,,n,z [17:26:29] pff [17:26:30] !g del [17:26:31] Unable to find the specified key in db [17:26:32] !del g [17:26:32] If you want to remove a key, type !g del [17:26:44] !g 99637,5 [17:26:44] https://gerrit.wikimedia.org/r/#q,99637,5,n,z [17:26:49] * hashar whistles [17:27:12] !log otto synchronized php-1.23wmf7/extensions/CirrusSearch 'updating php-1.23wmf7/extensions/CirrusSearch to master' [17:27:25] Logged the message, Master [17:27:45] after something like 7 months, I think I got Zuul slowness fixed :-D [17:27:59] hashar: yay! [17:32:12] pff [17:32:23] I miss being in an office to exchange moods with folks in real time [17:32:33] Zuul got unleashed by commenting two lines [17:32:46] and I have known the cause for months and months :-( [17:33:44] (03PS2) 10Manybubbles: Cirrus config update [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102369 [17:34:02] (03CR) 10Ladsgroup: [C: 031] "As an admin of fa.wp I agree with Ebrahim." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [17:35:06] (03CR) 10Manybubbles: [C: 032] Cirrus config update [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102369 (owner: 10Manybubbles) [17:35:34] (03Merged) 10jenkins-bot: Cirrus config update [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102369 (owner: 10Manybubbles) [17:46:07] !log Zuul upgrade working as expected as far as I monitored it. Mail sent to wikitech-l and eng list. [17:46:08] I am off [17:46:24] Logged the message, Master [17:47:29] (03CR) 10Aquifacae: "Would you not think it better that the wiktionary projects at least have a uniform favicon? Since the english subdomain is the only one to" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102386 (owner: 10Aquifacae) [17:48:41] is there someone online that could +2 this https://gerrit.wikimedia.org/r/#/c/102464/1? [17:48:52] and merge it? [17:51:04] !log otto synchronized php-1.23wmf6/extensions/CirrusSearch 'updating php-1.23wmf6/extensions/CirrusSearch to master' [17:51:20] Logged the message, Master [17:51:23] (03CR) 10Odder: "It's much more complicated than that; there is significant political objection against changing stuff without gaining prior approval from " [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102386 (owner: 10Aquifacae) [17:51:29] hm, my i think my ssh key expired before I ran that sync-dir [17:51:36] odd [17:51:38] it gave me key errors [17:51:44] well, i have it expiring locally every 5 minutes [17:51:50] ah [17:51:51] i usually ssh to stuff all at once, and then just work [17:51:55] i don't forward me key often [17:52:00] logging back in and trying again [17:52:01] yeah, only on tin [17:52:24] !log otto synchronized php-1.23wmf6/extensions/CirrusSearch 'updating php-1.23wmf6/extensions/CirrusSearch to master' [17:52:32] that's looking better [17:52:41] Logged the message, Master [17:53:02] (03PS2) 10Reedy: beta: 
adding gwtoolset jobs to $wgJobTypesExcludedFromDefaultQueue [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102464 (owner: 10Dan-nl) [17:53:06] (03CR) 10Reedy: [C: 032] beta: adding gwtoolset jobs to $wgJobTypesExcludedFromDefaultQueue [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102464 (owner: 10Dan-nl) [17:53:21] (03Merged) 10jenkins-bot: beta: adding gwtoolset jobs to $wgJobTypesExcludedFromDefaultQueue [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102464 (owner: 10Dan-nl) [17:53:22] thanks! [17:53:48] (03CR) 10Nemo bis: "Aquifacae, yes, it would be logical. However en.wikt recently explicitly decided to have its own special favicon and some other Wiktionari" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102386 (owner: 10Aquifacae) [17:54:19] (03PS4) 10Reedy: Have sysops add and remove users from the 'gwtoolset' group [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102347 (owner: 10Brian Wolff) [17:54:23] (03CR) 10Reedy: [C: 032] Have sysops add and remove users from the 'gwtoolset' group [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102347 (owner: 10Brian Wolff) [17:54:33] (03Merged) 10jenkins-bot: Have sysops add and remove users from the 'gwtoolset' group [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102347 (owner: 10Brian Wolff) [17:54:35] \o/ [17:57:27] RoanKattouw: I wonder if I should use the opportunity and add 102347 for labs too [17:57:38] I meant Reedy :-) [17:57:46] OK :) [17:57:49] * RoanKattouw was already wtf-ing :) [18:00:03] https://plus.google.com/hangouts/_/72cpicv6a7c4un0v5fmbhff79k?authuser=1&hl=en [18:00:04] hangout link ^ [18:01:29] oh, I see this imports stuff from the regular InitialiseSettings.php, good :) [18:05:03] (03CR) 10Qgil: "Just a bit of context for Aquifacae: in Wikimedia each project is quite sovereign. 
Wikimedia in English can decide one thing, Wikipedia in" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102386 (owner: 10Aquifacae) [18:08:43] (03CR) 10Andrew Bogott: "This is a brutal hack, but has proved useful for the last couple of new employees. Any objection to my merging this for use in the meanti" [operations/puppet] - 10https://gerrit.wikimedia.org/r/98700 (owner: 10Andrew Bogott) [18:10:57] (03CR) 10Andrew Bogott: "Sorry I've neglected this... if you want to rebase and resolve conflicts I'll have a look." [operations/puppet] - 10https://gerrit.wikimedia.org/r/98377 (owner: 10Matanya) [18:19:49] !log otto synchronized cirrus.dblist 'Cirrus config update, adding new wikis' [18:20:06] Logged the message, Master [18:20:45] !log otto synchronized wmf-config 'Syncing wmf-config for Cirrus config updates' [18:21:00] Logged the message, Master [18:22:14] (03PS8) 10Matanya: toollabs: lint cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/98377 [18:23:32] (03CR) 10jenkins-bot: [V: 04-1] toollabs: lint cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/98377 (owner: 10Matanya) [18:26:42] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:28:33] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical [18:39:53] (03CR) 10Calak: [C: 031] "This is a good idea, Thank you Ebrahim." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [18:49:43] (03PS9) 10Matanya: toollabs: lint cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/98377 [18:52:08] bblack, around? [18:52:14] yup [18:52:24] any status yet on the gzip esi stuff? :P [18:53:14] bblack, hehe :) [18:53:15] greg-g: we're done syncing files [18:53:18] if you are here [18:53:47] (03CR) 10Matanya: "conflicts resloved and patch rebased." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/98377 (owner: 10Matanya) [18:53:59] andrewbogott: ^ [18:54:08] yurik: I'm working on merging 3.0.5-plus and our patches. Some of the gzip/esi -relevant fixes are in there. Once we test that we'll see what else is or isn't still not good enough. [18:54:47] (03PS5) 10RashiqAhmad: Updated internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 [18:54:52] (03PS1) 10Ebrahim: Rename Persian Wikibooks per community consensus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102482 [18:55:04] (03PS6) 10RashiqAhmad: Updated internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 [18:55:18] bblack, basically i am wondering if an attempt to backport gzip patch to 303 would take much less time, assuming that 305 will take yet a few more months? or how complex do you think is the 305 migration? [18:56:23] bblack, also, in the meantime, could you review https://gerrit.wikimedia.org/r/#/c/102316/ -- because that will instantly enable HTTPS for us, and we won't be blocked on that front :) [18:56:34] manybubbles: thanks! :) [18:57:02] greg-g: not sure what is going on, but enwikisource is still using lucenesearch even though we don't expect it. the rest of what we wanted to do happened though. [18:57:12] I'll figure that out and reword schedule from there [18:57:19] huh [18:57:20] bblack, in reality, if we get the 102316 in, we can wait months for the migration to happen [18:57:33] manybubbles: that's, weird. [18:57:51] (03CR) 10Calak: [C: 031] Rename Persian Wikibooks per community consensus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102482 (owner: 10Ebrahim) [18:58:00] yeah [18:58:04] but I'll figure it out [18:58:18] oh, I know :) [18:58:27] yurik: I think I'll have a 3.0.5-plus~wm1 package today, but we'll need to test it [18:58:41] I'm just cleaning up the last bits for it [18:58:45] bblack, oh, wow!!! 
awesome :) [18:59:01] please let me know, i will try it out on the betalabs [18:59:17] greg-g: figured it out. I was checking enwiktionary instead of enwikisource. enwikisource is good. [18:59:22] so realistically, if it's not horribly broken, maybe we get it thoroughly tested on betalabs by sometime during next week, although it's holiday time for many [18:59:23] yay. no mysteries [18:59:39] (me included) [19:00:50] (03PS2) 10Ebrahim: Rename Persian Wikibooks per community consensus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102482 [19:00:57] (03PS7) 10RashiqAhmad: Updated internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 [19:01:13] (03PS8) 10RashiqAhmad: Updated internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 [19:02:04] bblack, sounds good! But let me know about https://gerrit.wikimedia.org/r/#/c/102316/ because we will have to most of that code regardless (the first portion of it), and this way there won't be any pressure on me from Dan to get HTTPS support, and we can do testing slowly :) [19:03:58] !log reedy synchronized wmf-config/ [19:04:15] Logged the message, Master [19:05:36] !log reedy synchronized php-1.23wmf7/extensions/GWToolset/ [19:05:43] (03PS9) 10RashiqAhmad: Updated internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 [19:05:52] Logged the message, Master [19:06:53] !log Created EducationProgram tables on fawiki [19:07:09] Logged the message, Master [19:09:14] manybubbles: hah! [19:13:38] Coren, bblack: are either of you available to review some C code? if so [19:17:15] ori-l: hardcore. 
:D [19:17:33] MatmaRex, no, ccode [19:17:44] sorry, typo - ccore [19:17:57] nevermind, doesn't make much sense ( [19:18:07] (03CR) 10Andrew Bogott: [C: 04-1] "Two typos, looks good otherwise" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/98377 (owner: 10Matanya) [19:18:18] (03CR) 10Calak: [C: 031] Rename Persian Wikibooks per community consensus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102482 (owner: 10Ebrahim) [19:18:53] ori-l: Sure. [19:19:23] sweet, thanks [19:19:59] ori-l: I added myself as reviewer; I'll take a peek at it later today. [19:20:35] yep yep thanks! [19:21:23] Reedy: wee, thanks for deploying gwtoolset :) [19:24:25] (03PS1) 10Reedy: Enable EducationProgram on arwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102484 [19:26:09] (03PS10) 10Matanya: toollabs: lint cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/98377 [19:33:36] (03CR) 10Odder: "I'm aware the new favicon differs from the current one, but I think it's even better; certainly the 16px version is much more visible than" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 (owner: 10RashiqAhmad) [19:44:40] (03CR) 10Ragesoss: [C: 031] Enable EducationProgram on arwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102484 (owner: 10Reedy) [19:45:19] mutante: around? [19:48:35] (03PS10) 10RashiqAhmad: Updated internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 [19:49:43] (03CR) 10RashiqAhmad: "Just pushed a version, where the font of the 32px and 48px version should the same :)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 (owner: 10RashiqAhmad) [20:03:49] Reedy: fyi, this commit seems to have been deployed to commons without this one https://gerrit.wikimedia.org/r/#/c/102347/. as a result no one can add a user to the gwtoolset group. i don't know if this was intended or not. 
if intended no need to deploy https://gerrit.wikimedia.org/r/#/c/102347/. in any case let me know [20:04:14] this is the one that appears to have been deployed https://gerrit.wikimedia.org/r/#/c/102343/ [20:11:01] dan-nl: It's deployed [20:11:42] Reedy: just now or earlier? [20:11:48] Or maybe not [20:12:20] the way i'm testing it is on https://commons.wikimedia.org/wiki/Special:ListGroupRights … i'm looking to see that admins can add/remove gwtoolset [20:12:22] nah, I can't add people to gwtoolset at the moment [20:12:31] [20:11:45] Or maybe not [20:12:34] !log reedy synchronized wmf-config/ [20:12:42] Is now [20:12:50] Logged the message, Master [20:13:02] yep [20:13:04] thanks Reedy [20:13:05] cool, that took care of it thanks Reedy [20:16:28] (03CR) 10Andrew Bogott: [C: 032] toollabs: lint cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/98377 (owner: 10Matanya) [20:16:42] thanks andrewbogott [20:21:30] (03CR) 10Dzahn: "some time you should try getting Chad/demon to glance at it. that would be partly to let him know of changes to svn stuff as the person wh" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100760 (owner: 10Matanya) [20:24:24] mutante: if you have time, i'd like you to share with me your view of this svn server [20:24:58] !log reedy updated /a/common to {{Gerrit|I33c68abb1}}: Have sysops add and remove users from the 'gwtoolset' group [20:25:03] (03PS1) 10Reedy: Remove duplicate configuration from CommonSettings-labs.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102510 [20:25:15] Logged the message, Master [20:26:01] matanya: saw my comment above? that was meant to be my view right now. 
and yea, thanks for considering to move it [20:26:12] it was just about a module without init.pp [20:26:33] (03PS1) 10Ottomata: Making drerr regex more precise in ganglia varnishkafka view [operations/puppet] - 10https://gerrit.wikimedia.org/r/102527 [20:26:46] and the second comment about your questions regarding the status of the service in general, aside from puppet nickpicking [20:27:04] mutante: yes, i saw that comment. I was intersted if this was your last viewpoint [20:27:08] (03CR) 10Ottomata: [C: 032 V: 032] Making drerr regex more precise in ganglia varnishkafka view [operations/puppet] - 10https://gerrit.wikimedia.org/r/102527 (owner: 10Ottomata) [20:27:13] i got my answer there :) [20:27:19] matanya: well, just added it 5 minutes ago:) [20:27:37] which other specific view were you after [20:28:23] added Chad because i suggested to do so [20:28:35] mutante: should i build it as a read only public svn, without the blows and whistles of svn [20:28:53] i'm just sure that it will stay read-only [20:29:10] forever or until we have some fancy apache redirects to something else in place, but i dont think so [20:29:33] matanya: i think the answer is yes, but get more opinions [20:29:40] wouldn't be easier to redirect svn patches to their git replacments? [20:29:52] that's what i meant with the fancy redirects part [20:30:01] and i dont really have the answer if it is [20:30:05] jee, i'm slow today [20:30:25] (03CR) 10Ebrahim: "I guess community would ask why this is changed without their consensus once (the moving to Wikimedia Commons)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [20:30:31] i'll try to unboard chad [20:31:17] thanks, i just added him as reviewer [20:31:17] *onboard [20:31:23] i figured:) [20:32:17] ok, i'll wait for chad before i break it apart. 
[20:32:33] matanya: you definitely don't need to add anything that is for writing, especially not if it doesnt exist right now [20:32:39] cool [20:40:05] (03CR) 10Dzahn: "this is not a code review at all yet, but thanks for the idea, that we agreed on now always syncing the UID with labs for new shells and s" [operations/puppet] - 10https://gerrit.wikimedia.org/r/98700 (owner: 10Andrew Bogott) [20:54:12] (03CR) 10Andrew Bogott: lint applicationserver::cron (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/100784 (owner: 10Hashar) [20:56:30] hey, any ideas how can I found out who edited a particular line in our repos? [20:56:40] git blame? [20:56:41] :P [20:56:41] and I'm thinking SVN era, not git [20:56:54] what file? [20:56:55] (03PS1) 10Aquifacae: Changed the favicon back to original design. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102546 [20:57:02] InitialiseSettings.php Reedy [20:57:29] I think the svn repo is still there (I seem to recall trying to find something for someone else not too long ago) [20:57:31] Why? [20:57:38] curiosity. [20:57:47] It is, but I have no idea where the file resides there :) [20:57:59] http://svn.wikimedia.org/viewvc o_0 [20:58:04] (03Abandoned) 10Aquifacae: Changed wiktionary/en.ico favicon to proper image and resolutions [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102386 (owner: 10Aquifacae) [20:58:05] it's not a public repo [20:58:11] as it ha[sd] passwords in it [20:58:52] (03Abandoned) 10Aquifacae: Changed the favicon back to original design. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102546 (owner: 10Aquifacae) [21:01:38] (03CR) 10Odder: "I can understand the curiosity, but that setting has been there for a long time, even BGE (Before Git Era)." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [21:04:32] (03PS1) 10Dzahn: create stat1 shell account for Nuria Ruiz [operations/puppet] - 10https://gerrit.wikimedia.org/r/102551 [21:06:13] (03CR) 10Aklapper: "I wasn't annoyed by this one email. Hey, it's not even getting close to random internal mailing lists. ;)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/102365 (owner: 10Dzahn) [21:07:05] (03PS1) 10Aquifacae: Added new 48x48 resolution to original favicon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102554 [21:07:36] (03PS2) 10Aquifacae: Added new 48x48 resolution to original favicon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102554 [21:11:10] (03PS1) 10BryanDavis: Fix bad erb syntax in wikimania_scholarships template [operations/puppet] - 10https://gerrit.wikimedia.org/r/102558 [21:11:25] (03CR) 10Andrew Bogott: [C: 031] create stat1 shell account for Nuria Ruiz [operations/puppet] - 10https://gerrit.wikimedia.org/r/102551 (owner: 10Dzahn) [21:11:32] (03CR) 10Dzahn: "thanks Andre:) I'm afraid roots may have gotten a few more than you did the last couple days.as stated above there is something odd about " [operations/puppet] - 10https://gerrit.wikimedia.org/r/102365 (owner: 10Dzahn) [21:13:37] (03Abandoned) 10Mwalker: Stating that OCG is an upstart job and setting config repo [operations/puppet] - 10https://gerrit.wikimedia.org/r/97681 (owner: 10Mwalker) [21:13:59] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix bad erb syntax in wikimania_scholarships template [operations/puppet] - 10https://gerrit.wikimedia.org/r/102558 (owner: 10BryanDavis) [21:14:17] (03CR) 10Dzahn: "fwiw, i have given up on doing users alphabetically, at some point i started sorting something here but nobody does it the same anyways, a" [operations/puppet] - 10https://gerrit.wikimedia.org/r/102551 (owner: 10Dzahn) [21:15:00] (03CR) 10Nemo bis: "As odder says, this happened before 015f5b7, i.e. 
in 2010 when all the projects were pointed to the new logo on Commons. https://commons.w" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102454 (owner: 10Ebrahim) [21:25:03] (03CR) 10Odder: [C: 04-1] "You seem to be using a different font; the current version is much more condensed and the height of the letters is bigger." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102554 (owner: 10Aquifacae) [21:27:18] (03CR) 10Dzahn: [C: 032] create stat1 shell account for Nuria Ruiz [operations/puppet] - 10https://gerrit.wikimedia.org/r/102551 (owner: 10Dzahn) [21:27:50] (03CR) 10Odder: [C: 031] "Looks good! \o/" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102354 (owner: 10RashiqAhmad) [21:33:57] (03PS3) 10Qgil: Added new 48x48 resolution to original favicon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/102554 (owner: 10Aquifacae) [21:36:04] (03CR) 10Hashar: lint applicationserver::cron (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/100784 (owner: 10Hashar) [21:42:21] (03PS1) 10Ottomata: Changing topic to webrequest_mobile, now subscribing to config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/102562 [21:43:21] (03PS2) 10Ottomata: Changing topic to webrequest_mobile, now subscribing to config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/102562 [21:43:26] (03CR) 10Ottomata: [C: 032 V: 032] Changing topic to webrequest_mobile, now subscribing to config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/102562 (owner: 10Ottomata) [21:45:09] !log stopping kafka in order to cleanup old topic 'webrequest-mobile'. Usually we won't have to do cleanups like this, but since this topic is new and no one is using it yes, I'd rather clean up now rather than never.' 
[21:45:25] Logged the message, Master [21:52:51] (03PS1) 10Ottomata: Fixing bug where subscribe happened even if should_subscribe was false [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/102563 [21:53:10] (03CR) 10Ottomata: [C: 032 V: 032] Fixing bug where subscribe happened even if should_subscribe was false [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/102563 (owner: 10Ottomata) [21:53:55] (03PS1) 10Ottomata: Updating varnishkafka module to master with bugfix [operations/puppet] - 10https://gerrit.wikimedia.org/r/102564 [21:54:29] (03CR) 10Ottomata: [C: 032 V: 032] Updating varnishkafka module to master with bugfix [operations/puppet] - 10https://gerrit.wikimedia.org/r/102564 (owner: 10Ottomata) [21:57:36] (03CR) 10Nuria: "thanks" [operations/puppet] - 10https://gerrit.wikimedia.org/r/102551 (owner: 10Dzahn) [21:58:12] RECOVERY - DPKG on cp1046 is OK: All packages OK [21:59:42] RECOVERY - DPKG on cp4011 is OK: All packages OK [22:03:28] (03CR) 10Dzahn: "I don't know what to make of his, is this really just supposed to be 1 char on one line?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99369 (owner: 10Matanya) [22:05:12] mutante: didn't understand the question [22:05:30] (03CR) 10Dzahn: [C: 04-1] "if yea, i think that's overdoing the lint changes, even being a fan and creator of many lint changes myself, and small changes being a goo" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99369 (owner: 10Matanya) [22:07:23] (03CR) 10Matanya: "You are right. though i was thought not to mix lint and functional changes. Isn't this correct?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99369 (owner: 10Matanya) [22:14:07] (03CR) 10Dzahn: "that is also correct, you got me/us there :) afraid i don't have more than a vague "use good judgement" there, sry" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99369 (owner: 10Matanya) [22:17:18] (03CR) 10Matanya: "yeah, well you are right. 
give me a black point and merge :)" [operations/puppet] - https://gerrit.wikimedia.org/r/99369 (owner: Matanya)
[22:20:14] (CR) Dzahn: [C: 2] "ok, you are also right that it is redundant to talk longer about it not being worth the time than actually merging it, but i think point h" [operations/puppet] - https://gerrit.wikimedia.org/r/99369 (owner: Matanya)
[22:21:19] (PS1) Ottomata: Adding auto_create_topics_enable parameter [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/102573
[22:21:41] where is hashar when you need him
[22:21:47] jenkins-bot stopped merging changes
[22:21:52] it just votes V+2
[22:22:07] this is a feature, not a bug MatmaRex
[22:22:09] sounds like repo specific
[22:22:14] wfm in ops/puppet
[22:22:45] (CR) Ottomata: [C: 2 V: 2] Adding auto_create_topics_enable parameter [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/102573 (owner: Ottomata)
[22:22:53] https://gerrit.wikimedia.org/r/#/q/status:open,n,z
[22:23:04] just look how many rows with two checks are there
[22:23:07] that's not normal
[22:23:36] (PS1) Ottomata: Updating kafka module to disable topic creation by default [operations/puppet] - https://gerrit.wikimedia.org/r/102575
[22:23:46] (PS2) Ottomata: Updating kafka module to disable topic creation by default [operations/puppet] - https://gerrit.wikimedia.org/r/102575
[22:23:51] (CR) Ottomata: [C: 2 V: 2] Updating kafka module to disable topic creation by default [operations/puppet] - https://gerrit.wikimedia.org/r/102575 (owner: Ottomata)
[22:24:05] https://gerrit.wikimedia.org/r/#/q/status:open+label:verified%252B2+label:code-review%252B2,n,z
[22:24:18] matanya: please do explain how that is a feature.
:(
[22:26:50] (PS1) Ottomata: Properly outputting string from boolean [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/102576
[22:26:59] MatmaRex: are you talking mediawiki or globally all gerrit
[22:27:05] (CR) Ottomata: [C: 2 V: 2] Properly outputting string from boolean [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/102576 (owner: Ottomata)
[22:27:18] mutante: all gerrit afaics, Krinkl.e is looking into it
[22:27:30] (PS1) Ottomata: Updating kafka module to master [operations/puppet] - https://gerrit.wikimedia.org/r/102577
[22:27:44] (CR) Ottomata: [C: 2 V: 2] Updating kafka module to master [operations/puppet] - https://gerrit.wikimedia.org/r/102577 (owner: Ottomata)
[22:27:47] MatmaRex: k,thx
[22:30:14] (CR) Dzahn: "Mark, this one could use your input when you get to it.thx I'd rather not decide any mail logging things without a couple reviews, your's " [operations/puppet] - https://gerrit.wikimedia.org/r/101117 (owner: Nemo bis)
[22:32:07] !log Zuul seems to be unresponsive to gate-and-submit events for mediawiki/* projects
[22:32:23] Logged the message, Master
[22:36:10] [22:32:07] !log Zuul seems to be unresponsive to gate-and-submit events for mediawiki/* projects
[22:36:20] that's just the norm for everyone, Krinkle :P
[22:38:16] !log Reloading Zuul to deploy Ic8cf602a0aa72
[22:38:34] Logged the message, Master
[22:40:38] (PS7) Dzahn: install various Perl modules needed by Bugzilla [operations/puppet] - https://gerrit.wikimedia.org/r/101174
[22:47:24] since my nick is listed as being on RT duty: /away food, checkout 'memoserv' if you dont know it yet, and feel free to use that for a reliable IRC ping in general
[22:48:52] :)
[22:52:37] (CR) RashiqAhmad: "thanks!
:)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102354 (owner: RashiqAhmad)
[22:55:47] (CR) Catrope: "This broke because tyvwiki was never added to the Parsoid configuration" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79321 (owner: Amire80)
[22:57:35] (PS1) Ori.livneh: Launch Wikimania Scholarships app [operations/puppet] - https://gerrit.wikimedia.org/r/102580
[22:58:22] (CR) BryanDavis: [C: 1] Launch Wikimania Scholarships app [operations/puppet] - https://gerrit.wikimedia.org/r/102580 (owner: Ori.livneh)
[22:58:34] bd808: is that tantamount to a green light?
[22:59:08] greg-g: Are we good to go for scholarships?
[22:59:35] bd808: yessir
[22:59:40] (CR) Tim Landscheidt: "@Matanya: What script/settings did you use?" [operations/puppet] - https://gerrit.wikimedia.org/r/98377 (owner: Matanya)
[22:59:43] (CR) Ori.livneh: [C: 2] Launch Wikimania Scholarships app [operations/puppet] - https://gerrit.wikimedia.org/r/102580 (owner: Ori.livneh)
[23:02:04] (PS1) Ottomata: Updating hive-partitioner cron job to use hive-serdes [operations/puppet] - https://gerrit.wikimedia.org/r/102583
[23:02:08] !log zirconium: service apache2 reload for wikimania scholarships app
[23:02:23] bd808: http://scholarships.wikimedia.org/apply
[23:02:25] Logged the message, Master
[23:02:29] Puppet doesn't do that in the notify step?
[23:02:52] (PS2) Ottomata: Updating hive-partitioner cron job to use hive-serdes [operations/puppet] - https://gerrit.wikimedia.org/r/102583
[23:02:58] (CR) Ottomata: [C: 2 V: 2] Updating hive-partitioner cron job to use hive-serdes [operations/puppet] - https://gerrit.wikimedia.org/r/102583 (owner: Ottomata)
[23:02:59] Front page looks good
[23:03:06] bd808: no.
it's often left out to prevent critical services from being restarted while they are not being attended
[23:03:19] I guess that makes sense
[23:03:33] !log kafka brokers on an21 and an22 are now running with cleaned up topics
[23:03:47] Logged the message, Master
[23:03:59] bd808: ok, multitasking between this & a meeting with the analytics folks so ping me if you need anything
[23:04:01] i'm available
[23:04:10] can focus on scholarships if the need arises, just let me know
[23:04:26] Thanks ori
[23:06:31] Where can I find apache error logs from zirconium?
[23:07:17] The app's dispatcher is not working correctly there for some reason and I'm hoping there might be info in the apache error log
[23:07:24] ori-l: ^
[23:08:04] bd808: do you not have shell?
[23:08:18] I do.
[23:08:33] cd: /var/log/apache2/: Permission denied
[23:08:54] Did I get sudo there too?
[23:09:17] bd808: fluorine.eqiad.wmnet:/home/bd808/zirc-access.log & zirc-error.log
[23:09:59] zirc-access.log: Permission denied
[23:10:49] (PS1) Tim Landscheidt: Tools: Add key for MariaDB repository [operations/puppet] - https://gerrit.wikimedia.org/r/102585
[23:13:29] ori-l: Those logs are root:root 0640
[23:15:15] The apache rewrite rules are working for GET requests obviously because the root URL redirects to /apply and displays the teaser text
[23:16:06] But when I go to the login screen and submit a POST request for /login.post I get routed to the 404 handler.
[23:16:59] Would the misc varnish cluster be converting my POST to a GET for some reason?
[23:17:33] (PS1) RashiqAhmad: Updated community.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102586
[23:17:52] bd808: should be readable now
[23:18:08] (PS2) RashiqAhmad: Updated community.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102586
[23:18:21] i can run varnishlog on the varnish host to see what it's doing
[23:19:19] ori-l: That would be helpful
[23:20:06] (PS3) RashiqAhmad: Updated community.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102586
[23:20:34] My requests should be from [2001:470:d:1045:7448:20fa:6676:eecf]
[23:22:38] bd808: k, sec
[23:25:24] (PS2) Tim Landscheidt: Tools: Unify Tools and Toolsbeta configuration [operations/puppet] - https://gerrit.wikimedia.org/r/102385
[23:26:46] bd808: how do you get to the login screen?
[23:27:02] https://scholarships.wikimedia.org/login
[23:27:12] So sneaky :)
[23:32:08] bd808: pm
[23:32:18] I'm trying to debug with curl and I see that a 302 response is getting in the response stream
[23:34:07] !log Zuul is once again not processing any CR+2 events
[23:34:22] Logged the message, Master
[23:36:47] !log Reloading Zuul
[23:37:03] Logged the message, Master
[23:39:40] greg-g: I'm deployed but there are some issues with the app not interacting properly with varnish
[23:40:45] bd808: yeah, saw, anything you need more ops help with?
[23:40:57] The only content that anyone is expected to see as of today is being served but I can't login to the backend screens
[23:41:08] which is kind of a needed feature ;)
[23:41:32] I think Ori is looking at some things but he said something above about being in another meeting
[23:42:18] I'm "good for now" but will need help figuring out what my app is doing wrong behind varnish
[23:44:41] !log Zuul pipeline for gate-and-submit fixed (the revert was someone undone after the reload, deployed the same git commit again)
[23:45:00] Logged the message, Master
[23:45:00] bd808: ok, don't feel bad for poking too much
[23:45:29] greg-g: Ori and I are working on it in a PM :)
[23:45:33] cool
[23:48:59] Krinkle: Undid, not undone. :-)
[23:50:00] "Thank you, but I prefer it my way." – Andre Baptiste Sr.
[23:50:21] (Lord of War)
[23:50:22] Thx
[23:50:50] * James_F grins.
[23:52:00] eiii! was wmf7 just deployed?
[23:52:18] no...?
[23:52:38] greg-g, i just saw for the first time fatalmonitor's warning for the invalid message param
[23:52:40] Invalid parameter for message "parentheses": a:1:{s:3:"raw";N;}
[23:52:42] (and do you mean wmf8?)
[23:52:50] no, 7
[23:53:04] now we know which message causes this issue :)
[23:53:31] there were Cirrus updates this morning, and some config changes for the gwtoolset
[23:53:50] its just that i saw lots of those warnings, but they were all coming from 6
[23:54:02] and 6 didn't show which message was failing
[23:54:20] i'm happy it went live, now can start figuring it out
[23:56:57] greg-g, spoke too soon :((( 'parentheses' is one of the most common messages in use :((((
[23:57:00] (it went out yesterday)
[23:57:17] guess i will have to add stacktracing to it
[23:57:19] bleh