[00:00:24] it connects as root though [00:00:46] I think I have a meeting starting now [00:02:53] Reedy: arr, i need to rush to a bus, i can be back in half an hour [00:03:45] TimStarling: yup. :) [00:04:41] TimStarling: if you're still working with Reedy, though, being late is fine [00:06:05] Change merged: Tim Starling; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/67190 [00:07:22] robla said I should push it out before the meeting starts, so I'm doing that [00:08:31] !log aaron synchronized wmf-config/InitialiseSettings.php 'Enabled message cache debug log' [00:08:39] Logged the message, Master [00:10:04] !log deployed apache configuration for vote.wikimedia.org [00:10:11] Logged the message, Master [00:10:44] New patchset: Aaron Schulz; "Enabled message cache debug log." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67197 [00:11:22] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67197 [00:12:34] Reedy: Jamesofur: have the voter qualification scripts been run yet? [00:12:45] robla: they have [00:12:51] whew :) [00:13:00] * Jamesofur nods, yeah lol [00:13:03] Tim was just explaining this to me :) [00:18:07] https://vote.wikimedia.org/wiki/Main_Page [00:18:37] Is https://vote.wikimedia.org/ not taking anyone else to the main page? [00:19:01] I get a Forbidden error. [00:19:13] I was getting the www.wikimedia.org content until I bypassed my cache. [00:19:13] That's what apache-fast-test said too [00:19:36] There was also some vote.wikimedia.org --> vote.wikipedia.org wonkiness earlier. [00:19:54] Firefox keeps trying to do that [00:20:22] Yeah, I'm getting a 301 at https://vote.wikimedia.org/w/index.php [00:20:33] Location: https://vote.wikipedia.org/wiki/Main_Page [00:20:43] Maybe cache, though I suspect a typo. [00:21:07] reedy@tin:/a/common$ mwscript eval.php votewiki [00:21:07] > echo $wgCanonicalServer [00:21:07] http://vote.wikipedia.org [00:21:07] > echo $wgServer [00:21:08] /vote.wikipedia.org [00:21:17] Not set specifically [00:21:32] I guess it's identiying as a wikipedia for some sillyness [00:21:34] I think it's the default [00:21:40] You didn't specify wgServer? [00:22:12] Easy enough to fix. [00:22:42] Because the suffix was 'wiki' instead of 'wikimedia'? [00:22:44] I sorta hoped 'wikimedia' => '//$lang.wikimedia.org', would cater for it [00:22:46] !log reedy synchronized wmf-config/InitialiseSettings.php [00:22:55] Logged the message, Master [00:22:55] Krenair: Probably. :-) [00:25:13] Guess we're missing a couple of rewrite rules [00:25:39] New patchset: Reedy; "Set wgServer and wgCanonicalServer for votewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67202 [00:27:10] New patchset: Reedy; "Add a couple of rewrites for votewiki" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/67203 [00:30:52] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67202 [00:48:30] AaronSchulz: twemproxy actually can't reload its conf, needs to be restarted. so i'm not going to touch the scap scripts at all [00:50:29] AaronSchulz: but will instead add a "restart twemproxy" script. i don't think the restart need is a problem. timed a restart via upstart. from the shutdown to the new copy establishing connections to all memcached instances and then accepting connections took 59ms. 
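A minimal way to confirm the votewiki fix once Reedy's change (67202) is synced, reusing the same eval.php session quoted above; the expected output is an assumption based only on the patch summary ("Set wgServer and wgCanonicalServer for votewiki"):

    # sketch only, run from a deploy host exactly as in the session above
    mwscript eval.php votewiki <<'EOF'
    echo $wgServer . "\n";
    echo $wgCanonicalServer . "\n";
    EOF
    # expected after the fix (assumed values):
    #   //vote.wikimedia.org
    #   https://vote.wikimedia.org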
[00:51:45] AaronSchulz: ideally, mediawiki would retry requests in this case though [01:00:27] New patchset: Yurik; "Minor fix to base decision on X-CS, not X-Carrier" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67214 [01:02:17] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.005628585815 secs [01:02:18] Reedy: re..still need something? [01:02:37] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.003818631172 secs [01:32:04] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:32:21] ignore that [01:32:32] geez [01:32:54] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:32:54] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 2.157 second response time [01:32:56] paravoid: it will be time for you to wake up soon [01:33:44] RECOVERY - HTTP radosgw on ms-fe1001 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 1.044 second response time [01:53:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [02:04:32] Aaron|home: so, it needs about an hour to upgrade every box (pgs need "upgrading") [02:04:49] Aaron|home: that leaves 9 boxes and I don't want to do them at the same time [02:05:01] Aaron|home: plus waiting for .3, and the bug I reported [02:05:14] !log Graceful reload of Zuul to deploy {{gerrit|I4e7978f8a2a770}} [02:05:26] Aaron|home: and friday is off, so all in all, it's unlikely that we'll put ceph back into production this week [02:05:31] Logged the message, Master [02:05:36] Aaron|home: hopefully early next week [02:05:40] ok [02:05:57] I'll finish up the cuttlefish upgrades tomorrow [02:06:02] !log LocalisationUpdate completed (1.22wmf5) at Thu Jun 6 02:06:01 UTC 2013 [02:06:06] * Aaron|home stopped his copy scripts earlier due to errors [02:06:10] Logged the message, Master [02:06:11] that can be finished later [02:06:16] yeah [02:06:20] I also have copy scripts copying thumbs [02:06:28] running since earlier today [02:07:48] let's say monday for now [02:07:57] based on past experience, I don't think we need a deployment window [02:15:33] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 6 02:15:33 UTC 2013 [02:15:42] Logged the message, Master [02:18:02] TimStarling: Writing up all of our next steps for the election. I'm assuming we have to flip SecurePoll to true by default, it looks like we turn it off for most wikis in between elections? [02:35:27] Jamesofur: https://bugzilla.wikimedia.org/show_bug.cgi?id=42464 [02:36:21] amen :-/ we need to set aside resources to tie up some of the loose ends after this election [02:36:48] especially in case Tim gets hit by a bus [02:46:25] Jamesofur: not sure why it is switched off [02:46:32] it can be on everywhere all the time [02:46:44] * Jamesofur nods, that's what I expected.  [02:46:47] easy enough [02:50:38] <^demon|away> Jamesofur: Speaking of elections, I never heard back from anyone after I got you guys that info you were looking for. [02:50:47] <^demon|away> Was that sufficient, or do we need more info? 
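A rough sketch of the twemproxy restart timing described above; the upstart job name here is an assumption (the process shows up as nutcracker in the monitoring checks later in this log), so adjust it to whatever the "restart twemproxy" script ends up invoking:

    time sudo restart twemproxy      # upstart stop + start; the log above measured ~59ms end to end
    pgrep -u nobody nutcracker       # confirm the new copy is running as 'nobody'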
[02:51:50] I think it's likely to be as close as we're going to get, the real question is if we want to turn SecurePoll on for wikitech [02:51:56] if we do that we can just have them start voting there [02:52:18] otherwise we might do the "if you can't vote otherwise and you're a developer ask the election committee and they'll check the list and add you" meathod [02:52:23] method too [02:52:28] <^demon|away> Either way you have to end up reconciling the data with normal wiki voting. [02:52:33] yes [02:52:42] <^demon|away> I for example would qualify to vote either way, and I imagine I'm not alone. [02:52:44] we'll have to do that with staff too [02:52:47] me too [02:53:00] That's why I'm thinking the exception method may actually be better overall [02:53:06] if they don't qualify the normal ways they can ask [02:53:11] and we can add them to the meta list [02:53:14] and then they can vote [02:53:35] the 2 weeks and some staff attention from my side makes that relatively painless (relatively) [02:53:46] as long as we have the list you gave us [02:54:03] <^demon|away> Well its up to you guys ;-) Just lemme know if you need anything else from me. [02:54:11] will do [02:54:15] thanks much for the help [02:54:19] <^demon|away> yw. [03:00:48] New patchset: Jalexander; "Enable SecurePoll globally and VoteWiki logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67227 [03:27:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:29:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [03:45:46] !log tstarling synchronized php-1.22wmf5/includes/cache/MessageCache.php 'temp debug log for high master query load' [03:45:54] Logged the message, Master [03:48:02] !log tstarling synchronized php-1.22wmf5/includes/cache/MessageCache.php 'revert temp hack' [03:48:09] Logged the message, Master [03:54:10] !log tstarling synchronized php-1.22wmf5/includes/Revision.php 'temp debug hack' [03:54:18] Logged the message, Master [03:56:58] AaronSchulz: it wasn't MessageCache::getMsgFromNamespace() after all [04:18:36] !log tstarling synchronized php-1.22wmf5/includes/Revision.php 'remove debug hack' [04:18:44] Logged the message, Master [04:22:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:23:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [04:53:40] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [04:57:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.740 second response time [05:02:00] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [05:12:15] !log tstarling synchronized php-1.22wmf5/extensions/GuidedTour/modules/tours/test.js [05:12:23] Logged the message, Master [05:12:53] !log tstarling synchronized php-1.22wmf5/extensions/GuidedTour/GuidedTourHooks.php [05:13:00] Logged the message, Master [05:14:08] !log tstarling synchronized php-1.22wmf5/extensions/GuidedTour [05:14:16] Logged the message, Master [05:15:11] !log fixed high enwiki master query load due to GuidedTour [05:15:20] Logged the message, Master [05:15:39] nice [05:15:41] superm401: ^ [05:15:46] * greg-g was watching and 
curious ;) [05:15:46] * ori-l catches up [05:16:13] ori-l, yeah, TimStarling already filled me in. [05:16:15] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=db1056.eqiad.wmnet&m=cpu_report&s=by+name&mc=2&g=network_report&c=MySQL+eqiad [05:16:28] He IDed a big perf problem in GuidedTour then fixed it, and then I cherry-picked it for him. [05:16:29] wow [05:17:41] also user CPU: http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=MySQL+eqiad&h=db1056.eqiad.wmnet&jr=&js=&v=0.8&m=cpu_user&vl=%25&ti=CPU+User [05:18:45] well done. [05:19:39] more than I was expecting, the site is definitely up isn't it? [05:19:41] ;) [05:20:05] appears so ;) [05:20:44] good catch [05:20:47] also good morning [05:20:59] oh right, look at the connection counts: http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=MySQL%20eqiad&h=db1056.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1370495751&v=6204&m=mysql_connections&vl=conns&ti=mysql_connections&z=large [05:21:24] see, we didn't just kill a major source of queries, we removed master connections from parser cache hits entirely [05:21:31] g'morning apergos [05:21:35] so the associated BEGIN queries are gone too [05:21:44] Wow [05:21:56] very nice [05:22:00] *Really* good catch, TimStarling [05:22:37] meh, i've seen better [05:23:02] * ori-l kids [06:25:12] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [06:25:12] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:12] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:12] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:12] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:12] PROBLEM - Puppet freshness on ms-be1 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:13] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:13] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:14] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:14] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:15] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:15] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [07:04:36] PROBLEM - Swift HTTP on ms-fe1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:12:41] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection timed out [07:12:42] PROBLEM - Apache HTTP on mw1156 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:12:42] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection timed out [07:12:42] PROBLEM - Apache HTTP on mw1160 is CRITICAL: Connection timed out [07:12:49] PROBLEM - Apache HTTP on mw1159 is CRITICAL: Connection timed out [07:12:49] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection timed out [07:13:01] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection timed out [07:13:09] PROBLEM - Swift HTTP on ms-fe3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:13:09] PROBLEM - Apache HTTP on mw1153 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:13:29] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 
747 bytes in 0.068 second response time [07:13:29] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:13:59] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.290 second response time [07:13:59] RECOVERY - Swift HTTP on ms-fe3 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.060 second response time [07:13:59] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [07:14:19] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 61954 bytes in 0.207 second response time [07:14:23] !log swift init restart all on ms-fe1 and 3 [07:14:33] Logged the message, Master [07:14:47] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.052 second response time [07:14:47] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.066 second response time [07:14:48] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [07:14:57] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.072 second response time [07:14:57] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.058 second response time [07:14:57] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time [07:17:36] !log restarted swift on most backends as well [07:17:46] Logged the message, Master [07:19:02] wee [07:20:51] PROBLEM - Host wtp1008 is DOWN: CRITICAL - Plugin timed out after 15 seconds [07:20:52] none too exciting [07:21:01] PROBLEM - Swift HTTP on ms-fe2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:21:17] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms [07:21:40] !log and the rest of the ms-fes [07:21:48] Logged the message, Master [07:21:51] RECOVERY - Swift HTTP on ms-fe2 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.060 second response time [07:22:00] next time I'll just do them all [07:22:34] parsoid... ugh [07:25:28] it rebooted itself, well that's not too nice [07:27:42] !log wtp1008 rebooted itself a few minutes ago (hence the icinga whine), no idea why [07:27:50] Logged the message, Master [08:02:10] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 0.006365776062 secs [08:13:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:14:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [08:23:06] mark: on beta is very still any use for deployment-cache-text1 ? IIRC it was to setup a text varnish cache on beta :) [08:27:05] mark when can you provide Snaps the profiling information for varnishncsa? [08:32:56] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.005370020866 secs [09:29:49] drdee: i'm hoping to get to that today [09:34:58] sweet! 
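The "swift init restart all" entries above most plausibly correspond to swift-init; a minimal sketch of that restart on a frontend such as ms-fe1 (the per-service variant is illustrative):

    swift-init all restart            # bounce every Swift daemon on the host
    swift-init proxy-server restart   # or only the proxy, which is what the ms-fe boxes serve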
[10:02:29] ...and if not, it'll be tomorrow [10:04:22] and else the day after that :D but if you are very busy maybe you can give me headsup so we can think of a solution, i rather not have him blocked (he isn't yet but eventually he will get blocked) [10:11:27] no worries, I'm actually planning to get it done this week ;) [10:11:53] New review: Diederik; "(1 comment)" [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/53170 [10:22:59] apergos: moin [10:23:14] hello [10:23:31] how's swift? [10:23:39] ceph might take a few more days after all [10:24:26] the boxes fell over this morning with similar symptoms to the last time [10:24:29] and I am off for some extended week-end. See you all on monday [10:24:29] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:24:29] PROBLEM - HTTP radosgw on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:24:35] ignore those [10:24:58] PROBLEM - HTTP radosgw on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:01] they've been chugging along since then as though nothing ever happened :-/ [10:25:09] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:20] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:24] sorry for the page [10:25:38] heh [10:26:40] what's the status of the replacement boxes? [10:26:50] RECOVERY - HTTP radosgw on ms-fe1003 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.005 second response time [10:27:01] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.006 second response time [10:27:32] RECOVERY - HTTP radosgw on ms-fe1001 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.005 second response time [10:27:41] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.007 second response time [10:27:41] RECOVERY - HTTP radosgw on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.006 second response time [10:28:18] ms-be1 is racked and has he base install but not the first puppet run I believe, and I have no idea about the others (I guess they will go in the same racks as the existig servers, this may mean they can't go in til the live server comes out, depends on space) [10:29:13] ms-be4 is picking up the data for the disks that didn't match the old partition layout now [10:29:25] did you push rings recently? [10:29:26] ms-be9 has one disk not recognized and there's a ticket [10:29:33] this morning [10:29:48] so that's about the state of it [10:30:02] before the leaks? [10:30:03] 1 box + 1 disk unavailable [10:30:17] yes, before [10:30:21] (I !log every time I push rings) [10:31:53] they fell over about 2.5 hours later [10:31:54] so you can tell the age of a log by the number of rings [10:32:44] bad puns == bedtime. bye! [10:33:04] the host with the changes didn't actually exhibit any of the symptoms... [10:33:06] good night [10:33:07] New review: Daniel Kinzler; "(1 comment)" [operations/apache-config] (master) C: -1; - https://gerrit.wikimedia.org/r/65443 [10:33:12] did you also push rings before the other time we had the outage too? 
[10:33:29] nope [10:33:29] ori-l: i appreciated your comment ;) [10:33:43] heh, it was worth a try [10:33:53] if I had I would have mentioned it in the log :-D [10:34:32] and right now as I say it's only playing catchup for 4 disks (q for obj, 2 for account and container that are likely already done) [10:34:52] s/q for/2 for/ [10:36:11] and what's the plan for ms-be1? [10:36:26] once ms-be4 is happy then we'll bing ms-be1 up at 33% etc [10:36:28] same old same old [10:36:34] heh [10:36:45] when it's all the way in then we start on the ones with the h310 controllers [10:36:53] right [10:37:22] then the race will be on: does ceph become primary front end before we get all the bad controllers out [10:37:30] er back end [10:38:09] wanna place any bets? [10:38:21] hah [10:39:29] I'll take that as a no! [12:02:18] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:03:09] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [12:13:37] Help please... [12:13:51] I'm trying to push a change to gerrit, but I'm getting a weird errors. [12:14:00] To ssh://siebrand@gerrit.wikimedia.org:29418/operations/puppet.git [12:14:00] ! [remote rejected] HEAD -> refs/for/master (branch master not found) [12:14:01] error: failed to push some refs to 'ssh://siebrand@gerrit.wikimedia.org:29418/operations/puppet.git' [12:14:05] Any ideas? [12:14:13] I haven't pushed to that repo before. [12:16:16] push to production, not master [12:18:18] siebrand: ^^ [12:18:58] drdee: ty [12:25:12] <^demon> !log bringing gerrit down for a few minutes to pick up some changes [12:25:20] Logged the message, Master [12:26:08] :( [12:26:35] * aude hopes for shiny new features in gerrit [12:26:42] mmm [12:26:46] shiny! [12:27:01] gitblit is cool [12:27:09] <^demon> Tons of shiny features. [12:27:12] <^demon> And bugfixes [12:27:16] yay! [12:27:30] 'tons' you say? [12:27:34] <^demon> Should be back in a minute, waiting for puppet to finish [12:27:38] ok [12:27:46] <^demon> YuviPanda: I believe the technical term is "a metric shit-ton" :) [12:27:57] metric system ftw [12:28:07] +1 to sane systems of measurements [12:28:21] though I'd reccomend against visiting the metric shit-ton reference brick in Paris... [12:28:24] * apergos isn't sure the metric shit-ton is any more sane than the shit-ton in other systems [12:28:36] <^demon> And we're back with gerrit 2.7-rc1-424-gef469ac [12:28:37] <^demon> :) [12:28:43] nice [12:29:10] oooh, there is a draft / status column [12:29:34] for merged or abandoned I guess [12:29:39] <^demon> !log gerrit's back, running 2.7-rc1-424-gef469ac [12:29:47] Logged the message, Master [12:29:47] yes [12:30:28] als someone reviewed my rev_len change leaving demon off the hook [12:31:33] now the search bar is even more outside my screen :( [12:31:52] <^demon> They got rid of the silly expanding bar at least. [12:32:41] <^demon> Yayayayayayayayay! [12:32:50] <^demon> The stupid "can't expand crap" bug seems fixed. [12:33:02] <^demon> Trying to replicate that's always been a pain in the ass. 
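What the rejected push above comes down to: operations/puppet has no master branch (hence "branch master not found"), its development branch is production, so the change needs to target refs/for/production:

    # same remote as in the error message above, different target ref
    git push ssh://siebrand@gerrit.wikimedia.org:29418/operations/puppet.git HEAD:refs/for/production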
[12:33:23] New patchset: Siebrand; "Enable auto commit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67249 [12:33:39] * aude wonder if i can paste links into gerrit comments again [12:33:46] and stack traces and such [12:35:04] links work :) https://gerrit.wikimedia.org/r/#/c/67083/ [12:37:04] why does https://gerrit.wikimedia.org/r/#/q/Iaedbea9b5f43ac,n,z show endless "Working" [12:39:29] ^demon: any idea why puppet just hangs when i'm running it on my labs instance? [12:39:35] worked last night [12:40:39] New review: Manybubbles; "Autocommits in Solr are a normal and this will allow us to use this Solr without committing. If we ..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/67249 [12:41:23] <^demon> aude: Not a clue :\ [12:41:37] hmmm [12:41:46] * aude moves on to do other stuff and come back later [12:42:12] trying to submit a patch for puppet, though :/ [12:43:40] ^demon: Was this possibly your change, or is this local. When trying to push a change: [12:43:40] remote: Resolving deltas: 100% (711/711) [12:43:41] error: unpack failed: error Missing blob 83aff0fd25c82dd440dfd95165bec61a6b945908 [12:43:56] <^demon> Yeah, that just showed up in the error log...looking [12:44:11] <^demon> Ugh, that's probably because of Gustaf's change. [12:46:04] while there are gerrit experts in the room.. I was trying to pull down a change (so I could git commit --amend it) after having synced up with some changes in master. It really really wanted to merge the thing in and I could only work around it by rebase via the gerrit gui and then pull the change. what's the "right" way? [12:46:28] by "merge the thing in" I mean a new commit with "merging blah blah" [12:47:10] <^demon> qchris: Ugh, I think Gustaf's stuff is maybe not ready for primetime. [12:47:12] <^demon> http://p.defau.lt/?4B2BZFLIKsBfW4604_mbvw [12:47:24] ^demon: Push just worked. You fixed something? [12:47:28] <^demon> Nope. [12:47:46] <^demon> But I'm concerned about these stacktraces, so I'm rolling some of the experimental changes out of our build. [12:48:33] * aude wouldn't be surprised if this is related to puppet not working [12:48:38] if it's trying to fetch something from gerrit [12:48:47] <^demon> I doubt it [12:48:51] hmmmm [12:50:14] <^demon> Anyway, we're just deploying master without our experimental changes. They seem a bit wonky from siebrand's experience. [12:50:16] * aude takes a nap while puppet runs [12:50:22] <^demon> Gonna roll back now [12:51:04] ^demon: You want some traffic on Gerrit let me know. [12:51:20] ^demon: I start 2 scripts and you have up to 20 repo pulls at the same time... [12:52:49] <^demon> !log gerrit rollback to 2.7-rc1-420-g5d5c5c3, some of the new work wasn't quite ready for prime time [12:52:57] Logged the message, Master [12:53:19] rats [12:53:27] <^demon> Just 4 changes. We still have hundreds of others :) [12:53:39] ah ok then :-) [12:53:46] <^demon> It's mainly a series of changes that do a lot of work to tighten security on drafts. [12:55:05] I have never used the draft feaure [12:55:09] or feature either [12:55:15] * aude assumes nothing is private that i put on the internet [12:55:40] it's still a nice feature but i expect no privacy [12:56:10] <^demon> Yeah, with drafts they've always shot for security through obscurity. [12:56:16] <^demon> We were trying to fix that. 
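One conventional answer to apergos's question above about pulling a change down to amend it without dragging in a merge commit: fetch the change ref straight from Gerrit and work on that, instead of pulling it into an already-updated local branch (the change and patchset numbers below are illustrative, borrowing Siebrand's 67249):

    git fetch origin refs/changes/49/67249/1   # refs/changes/<last two digits>/<change>/<patchset>
    git checkout FETCH_HEAD
    # ...edit files...
    git commit --amend                         # keep the Change-Id footer so Gerrit updates the same change
    git push origin HEAD:refs/for/production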
[12:56:17] yep [12:56:20] and we all know how well that works [12:56:40] * apergos votes for job security through obscurity, proven to actually work :-D [12:56:49] :) [12:57:06] <^demon> siebrand: Thanks for the offer. Rebooting gerrit replicates all projects, so I get a nice queue of ~2000 jobs to complete already :) [12:57:36] ^demon: Just tried to open https://gerrit.wikimedia.org/r/#/c/48121/ and after 30 seconds, it's still going. Expected? [12:57:36] <^demon> apergos: Work on projects that nobody else understands (or wants to do)? ;-) [12:57:55] <^demon> siebrand: Link wfm, maybe you clicked while it was restarting? [12:58:04] * siebrand sighs. [12:58:07] possibly. [12:58:30] that and don't document anything [12:58:31] it loads in chromium but never ends for me on FF [12:58:36] then you become indispensible :-P [12:59:10] mm link not quite working for me [12:59:17] same for https://gerrit.wikimedia.org/r/#/q/Iaedbea9b5f43ac,n,z (note that in FF it doesn't even get redirected to https://gerrit.wikimedia.org/r/#/c/58082/ ) [12:59:19] <^demon> Wfm in firefox too [12:59:19] "Working" it says [12:59:28] heh [12:59:36] obviously I loaded it after you ^demon so it wasn't a startup issue [12:59:40] maybe ^demon meant "Working..." For Me [12:59:44] I"m in ff something [12:59:54] 21.0 [12:59:58] stupid versioning [13:00:13] <^demon> I'm on 19.something [13:00:25] siebrand's URL loaded after some minutes or so but still shows "Working ..." [13:00:40] <^demon> Just updated to 21.0, wfm too [13:00:42] here it still has not loaded [13:01:36] <^demon> :\ [13:02:03] <^demon> Nothing's hitting the error log... [13:02:09] <^demon> Caches are always cold after a restart. [13:02:09] there are so many warnings in error console that I don't know what may be relevant [13:02:54] <^demon> I've got the usual warnings about the avatars 404'ing, but that shouldn't be it (it's due to the batshit way the avatar plugin point is implemented, which we don't even use) [13:03:45] does cache affect time to redirect from https://gerrit.wikimedia.org/r/#/q/Iaedbea9b5f43ac,n,z to https://gerrit.wikimedia.org/r/#/c/58082/ ? [13:04:02] might be indeed faster now *shrugs* [13:04:05] still no changeset for me [13:04:19] should I reload or wait? decisions decisions [13:04:35] open a new tab :p [13:04:43] or a new browser window [13:04:55] the new tab loaded [13:06:03] <^demon> silly gerrit. [13:06:38] <^demon> Ok....so is anyone seeing any other problems? [13:08:23] <^demon> Silence is consent. Yay! [13:08:27] * aude still napping but probably unrelated [13:08:39] shall wait for ryan or andrew [13:08:56] not yet, I clicked around some [13:09:08] * aude didn't change anything in my puppet since last time it ran successfully [13:12:29] morning akosiaris! [13:15:56] ottomata: more like well into the afternoon but good morning to you!! [13:16:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:17:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [13:19:50] New patchset: Siebrand; "Increase ramBufferSizeMB from 32 to 100" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67252 [13:25:57] good afternoon to you akosiaris! [13:26:00] good morning to me! [13:26:11] so, I have (and have had) postrm and postinst scripts for you [13:26:11] and [13:29:03] how shoudl I get them to you? [13:29:10] i don't htink I can push to your github repo [13:29:29] github? do a pull request? 
[13:29:39] baaah then I have to fork :) [13:29:50] ok ok [13:30:19] have akosiaris put it in gerrit instead of github [13:30:23] oh but akosiaris, ja you have a new repo, right? [13:30:54] paravoid: we wil do that i think, its probably on github because its annoying to figure out branching and pushing on gerrit, espeically if you are afraid you won't be able to remote delete branches withtout special permissions [13:31:32] the script are on gerrit though, in the old deb attempt [13:31:39] https://gerrit.wikimedia.org/r/#/c/53170/10/debian/postinst [13:31:39] https://gerrit.wikimedia.org/r/#/c/53170/10/debian/postrm [13:31:39] * YuviPanda mentions the GitHub to Gerrit bot, wonders if any other repositories would want to participate  [13:31:50] github to gerrit bot? [13:32:02] ottomata: cool that's fine thanx [13:32:09] ottomata: open a pull request in GitHub, automatically creates a patchset in Gerrit [13:32:15] updating it in GitHub also updates gerrit patchset [13:32:29] the github repo is already way to old now and i have done some rebases and changes [13:32:33] (mediawiki.org/wiki/User:Yuvipanda/G2G) enabled on 6-8 reposories now. [13:32:38] i will destroy the repo within the day [13:32:54] unrelated to the current issue, but is going to be useful way to have people work on GitHub while still having things on Gerrit :) [13:32:56] <^demon> YuviPanda: Fwiw, the actual gerrit plugin for github is supposed to be open sourced "real soon now." :) [13:33:01] the workflows of the two are very different so I doubt how well it would work [13:33:01] <^demon> Granted, he said the same thing 3 weeks ago [13:33:07] so the 2 links will suffice for now. Thanks ottomata [13:33:09] ^demon: yes, that's what I heard last time too :P [13:33:20] <^demon> It was repeated this week :p [13:33:33] paravoid: true, but no harm experimenting. Plus there are other advantages to GitHUb [13:33:54] ^demon: do let me know if real time soon happens sometime soon :D [13:37:45] YuviPanda: that's cool [13:37:53] does it allow arbitrary repo name mapping? :D [13:37:58] from github to gerrit? [13:38:02] i would love to try it if so [13:38:04] ottomata: no reason it couldn't. [13:38:12] ottomata: let me know which repos you want to map to [13:38:14] and i'll set it up [13:38:21] <^demon> ottomata: We've trying to push some changes to the replication plugin that allows that. [13:38:47] yeah, thanks ^demon! i talked to christian real briefly about that at AMS [13:39:02] but I think YuviPanda is talking about github -> gerrit, which is real curious [13:39:10] yes, I am [13:39:32] it does only code sync now, but I'll work on comments too once my exams end (Saturday!) [13:39:37] Hmm [13:39:58] actually i'm really not sure which repo I'd want to try right now…hmmmmm [13:40:24] <^demon> !log rolling gerrit back to 2.6-rc0-144-gb1dadd2 (version before any updates). Too many rough edges we didn't catch in testing, zuul is not happy [13:40:33] Logged the message, Master [13:40:52] can we change 'Master' to 'Lord' or 'm'lord' or soemthing similar? [13:43:37] <^demon> !log zuul seems to not be yelling about invalid json anymore [13:43:45] Logged the message, Master [13:44:53] YuviPanda: wouldn't that be sexist [13:44:58] <^demon> Back to the devil you know, I suppose. 
[13:45:19] Nemo_bis: we could make it alternate between "m'lord" and "m'lady" with 50/50 splits [13:45:38] YuviPanda: that would just make it fairly sexist [13:45:47] well, master is already that way [13:45:48] * Nemo_bis knows no devils [13:45:57] hmm [13:46:11] <^demon> You know me :) [13:46:30] you're a demon [13:46:32] <^demon> I was referring to the old build of gerrit though as the devil we knew :) [13:46:32] difference! [13:47:00] <^demon> There weren't any backward-incompatible changes, so rolling back was easier than playing whack a mole. [13:47:25] YuviPanda: I'm not sur "master" is male-only for this meaning [13:47:41] <^demon> Much as we all like new features and bugfixes, the old version *is* very stable :) [13:49:23] Nemo_bis: why so [13:52:19] O_O Those well-known devils of worth and zeal, Belial Croker an d Beelzebub Peel, With all the minor infernal crew, Were off at the very first holla-balloo ; [13:52:19] bleh, also Carlyle, Thackeray and other disgustings [13:52:46] we should make it say 'Mr. Saavik' [14:00:40] soooo akosiaris1, ja lemme know when I can clone/pull something [14:01:17] yeah i will [14:01:21] hopefully soon [14:17:07] PROBLEM - Packetloss_Average on analytics1003 is CRITICAL: STALE [14:17:08] PROBLEM - Packetloss_Average on analytics1006 is CRITICAL: STALE [14:17:21] PROBLEM - Packetloss_Average on analytics1005 is CRITICAL: STALE [14:17:36] PROBLEM - Packetloss_Average on analytics1004 is CRITICAL: STALE [14:17:47] PROBLEM - Packetloss_Average on analytics1008 is CRITICAL: STALE [14:19:10] ottomata: ^^ [14:22:30] HMMMM [14:22:45] PROBLEM - Packetloss_Average on analytics1009 is CRITICAL: STALE [14:37:32] hmm, anyway around to help me look at those? [14:37:39] it looks like icinga can't get the proper value from ganglia [14:37:50] ./check_ganglios_generic_value -m packet_loss_average -H emery.wikimedia.org -g [14:37:50] STALE [14:37:56] same for all hosts [14:42:51] LeslieCarr: you around? [14:46:55] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67214 [14:53:35] New patchset: Ottomata; "Initial commit of zookeeper module." [operations/puppet/zookeeper] (master) - https://gerrit.wikimedia.org/r/66882 [14:57:32] ottomata: maybe the hash should be the other way around? [14:57:36] id => fqdn? [14:58:12] i need to key by fqdn [14:58:18] i could search the values for fqdn i guess [14:58:22] but that seems less elegant [14:58:41] ah, the lookup you mean [14:59:13] what would happen if someone makes a typo and you assign two fqdns the same id? [14:59:27] like copy/pasting a line, changing the fqdn and forgetting to bump the id? [14:59:59] would the cluster still work minus that misconfigured server? [15:06:14] ees good question, i do not know! hm. [15:06:59] is that something I should deal with though? I mean you could do the same if you were editing the config file manually [15:08:54] just wondering all the implications [15:10:54] <^demon> If memory serves, if you assign duplicate ids to servers in the ensemble, its first come first served. [15:11:05] <^demon> The second (third, fourth) duplicate id's box wouldn't be allowed to join. [15:27:54] ehm, search in Ganglia doesn't work [15:28:22] it does [15:28:27] as long as you keep it in javascript [15:28:30] and never hit enter [15:28:31] :) [15:29:33] New review: Manybubbles; "It looks like we use the default max and min heap size for solr. On some machines this could lead t..." 
[operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/67252 [15:29:44] not the case for me in 2 browsers [15:30:15] and yes, I know how this search works:) [15:30:21] wfm with firefox 21 [15:31:13] grrr, started working for me too. slowwwwwwwly [15:31:23] manybubbles: the existing solr stuff need some work in general, if you're interested in reusing them for our new search platform, a redesign would be most welcome [15:34:34] <^demon> paravoid: We talked a fair bit yesterday. I think we're going to start iterating with the solr cloud stuff. [15:34:49] perfect [15:35:02] are you collaborating with anyone from ops already? [15:35:16] <^demon> Not yet, just started playing around yesterday in labs. [15:35:25] okay [15:35:27] <^demon> But I was starting to think about what we'd need in terms of ops support + hardware. [15:35:55] I don't feel sufficiently informed about our search infrastructure, but someone from us should be involved in that I think [15:36:19] also we have a few more solr use cases such as the ones mobile/MaxSem have worked on [15:36:46] <^demon> Translation stuff, Max's stuff for mobile, Wikidata. [15:36:48] and it'd be great to find some common ground in all that [15:37:08] <^demon> I'm pretty sure we'll be able to design it as a generally available resource like databases. [15:37:14] <^demon> Just add new cores + their schema and you're set. [15:38:49] cool [15:39:08] we're planning to have some ops member help with that [15:39:46] <^demon> It scales very nicely. Just spin up a new box, have it point to the zookeeper ensemble, and the cloud takes care of the rest. [15:39:58] <^demon> Plus 4.3 has online re-sharding, which was the biggest feature from elastic that I liked. [15:41:21] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67227 [15:41:51] ^demon, is adding a new core is as PITA in S 4 as it is in S 3.6? [15:42:10] <^demon> No, you just use the web admin panel (or the rest API) :) [15:43:03] cuz in 3.6 you use the API but then you still need to create the dir and copy the config files manually [15:43:19] <^demon> Does all that magic now :) [15:44:06] <^demon> mark: Back of the napkin calculations...if our indexes are roughly the same size and search traffic stays roughly the same...~24 boxes for Solr + probably 5 for zookeeper. [15:44:14] so is 4.0 coming?:) [15:44:14] !log reedy synchronized wmf-config/InitialiseSettings.php [15:44:21] <^demon> MaxSem: 4.3 :) [15:44:22] Logged the message, Master [15:44:31] how soon? [15:44:45] <^demon> We just started experimenting and iterating yesterday, no timelines yet. [15:44:52] meh [15:44:55] <^demon> Plus like mark said, gotta work with ops :) [15:45:11] ^demon: ops != just field ops :-) [15:45:29] <^demon> :) [15:45:48] <^demon> But I think the plan was to start with some of the smaller wikis on Solr, then work our way up...so we wouldn't need the full scale cluster to start. [15:45:57] <^demon> We won't do enwiki first :) [15:46:06] what I mean with that is that ops involvement shouldn't just be about the hardware [15:46:08] Change abandoned: MaxSem; "Screw Solr 3.6, long live Solr 4!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29827 [15:46:31] lol [15:46:34] <^demon> paravoid: Indeed. 
Hardware + advice on what to do (since I'm not ops, I just pretend) [15:46:46] cross-functional teams and everything [15:47:05] <^demon> Well, I like you guys and don't mind working with you all :) [15:47:36] hehe, I think the sentiments go both ways :) [15:48:32] Change abandoned: MaxSem; "If nobody cares, why should I?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35864 [15:49:31] MaxSem: it's not that we don't care, it's that our efforts are concentrated on getting rid of all that [15:49:42] MaxSem: and just switch to varnish already [15:50:39] i've never even seen that change [16:06:53] heya notpeter, you there? [16:07:12] he's on vacation [16:07:52] MaxSem: that too: changes without explicit reviewers tend to get missed [16:08:13] I was trying to find a way to query gerrit for operations/puppet changes without reviewers (excluding jenkins-bot) and I couldn't find a way [16:08:31] ahh, hmk [16:08:33] heh so you ops never look at open changes? [16:09:33] that no reviewers were added? I don't think so [16:09:49] is there any way to do so without looking at ALL changes? [16:11:12] <^demon> paravoid: "is:open project:operations/puppet" [16:11:18] <^demon> Or if you're lazy, "is:open puppet" [16:11:48] <^demon> Projects can also define dashboards. These show up when you click that search icon for your project. [16:12:16] this also has changes under review [16:12:36] or with explicit reviewers attached [16:12:40] paravoid, https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+-reviewerin:ldap/ops,n,z [16:12:51] <^demon> That works :) [16:13:03] <^demon> "Things in puppet not reviewed by ops" [16:13:08] hmmmm [16:13:20] that's actually what I want [16:13:22] thanks! [16:13:49] oh heh OSM module there [16:14:07] MaxSem: that's one of the cases where you *should* add a reviewer [16:14:48] <^demon> https://gerrit.wikimedia.org/r/#/c/60302/ could use review. [16:31:48] ^demon: what do these dashboards look like? Just another list of changes? [16:39:55] <^demon> More like your personal dashboard [16:40:30] New patchset: Andrew Bogott; "Fix wikidata singlenode manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67144 [16:40:36] <^demon> greg-g: https://gerrit-review.googlesource.com/#/admin/projects/gerrit,dashboards - eg these. [16:40:49] <^demon> I'd like to copy those, but the config is hidden to me :( [16:41:18] ask Google?:P [16:42:17] <^demon> https://gerrit.googlesource.com/gerrit/+log/refs/meta/dashboards :( [16:43:08] <^demon> Well, they inherit. What I really want is https://gerrit.googlesource.com/All-Projects/+/refs/meta/dashboards [16:43:41] that last one wants me to login [16:44:02] no access [16:44:11] but yeah, gotcha, thanks :) [16:44:46] <^demon> I'm gonna prod someone. [16:44:51] !log reedy synchronized php-1.22wmf6/ [16:44:58] Logged the message, Master [16:45:45] !log reedy synchronized docroot [16:45:45] w00t one week starting :) [16:45:47] Logged the message, Master [16:46:39] mutante: Any chance you could merge and deal with https://gerrit.wikimedia.org/r/#/c/67203/ ? I missed some of the more basic rewrites due to the place I copied them from [16:46:46] !log reedy synchronized w [16:46:53] Logged the message, Master [16:48:20] ^demon: https://gerrit.wikimedia.org/r/#/c/67194/ one line [16:48:35] <^demon> lol. [16:48:59] <^demon> What if someone assigns $wgGroupPermissions['*']['review'] = true;? 
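On the ZooKeeper side of the above (both the "point it at the zookeeper ensemble" model and the earlier question about duplicate server ids being first come, first served), a hedged way to check which servers actually joined, using ZooKeeper's four-letter commands; the hostnames are placeholders and 2181 is just the default client port:

    for h in zk1001 zk1002 zk1003; do
        printf '%s: ' "$h"
        echo ruok | nc "$h" 2181     # a live, healthy member answers "imok"
        echo
    done
    echo stat | nc zk1001 2181       # shows mode (leader/follower) and connected clients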
[16:49:00] !log reedy synchronized docroot [16:49:02] <^demon> Other than being an idiot ;-) [16:49:07] Logged the message, Master [16:50:02] !log reedy synchronized w [16:50:10] Logged the message, Master [16:50:17] New patchset: Reedy; "1.22wmf6 stuff" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67269 [16:50:58] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67269 [16:53:58] ^demon: yeah, that might miss the point of the ext :) [16:55:03] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: test2wiki to 1.22wmf6 and rebuild l10n cache [16:55:12] Logged the message, Master [16:56:33] ok [16:56:41] time for BBQ [16:57:46] !log reedy Started syncing Wikimedia installation... : test2wiki to 1.22wmf6 and rebuild l10n cache [16:57:53] Logged the message, Master [17:08:39] !log reedy Finished syncing Wikimedia installation... : test2wiki to 1.22wmf6 and rebuild l10n cache [17:08:47] Logged the message, Master [17:09:27] Uhh [17:09:27] -rw-rw-r-- 1 reedy wikidev 21239 Jun 6 16:55 ExtensionMessages-1.22wmf5.php [17:09:27] -rw-r--r-- 1 reedy wikidev 0 Jun 6 16:55 ExtensionMessages-1.22wmf6.php [17:12:43] paravoid, how's zookeeper looking? [17:12:46] approvy? [17:17:27] New patchset: Demon; "Turn off lucene for the time being" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67273 [17:17:52] <^demon> Anyone who could take a look at that for me ^ [17:18:27] !log reedy Started syncing Wikimedia installation... : Take 2 [17:18:34] Logged the message, Master [17:18:59] New patchset: Reedy; "Fix permissions of ExtensionMessages-1.XXwmfX.php (needs group write)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67274 [17:24:57] !log reedy Finished syncing Wikimedia installation... : Take 2 [17:25:05] Logged the message, Master [17:29:00] ottomata: hey [17:35:08] * ^demon tries to get someone's attention [17:35:22] New review: Faidon; "4 spaces, not 2 :) Sorry for not catching it earlier." [operations/puppet/zookeeper] (master) C: -1; - https://gerrit.wikimedia.org/r/66882 [17:36:29] <^demon> paravoid: Can you look at https://gerrit.wikimedia.org/r/#/c/67273/? [17:39:38] is lucene killing gitblit ? [17:41:47] New review: Faidon; "if you say so :)" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/67273 [17:41:48] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67273 [17:42:46] <^demon> Thanks, on sockpuppet now? [17:42:55] yep [17:44:14] <^demon> The performance on it is atrocious right now. [17:44:19] <^demon> And affecting general usage [17:50:37] but lucene is such a maintained clean search engine.... [17:51:51] LeslieCarr: yeah something weird with ganglia / icinga [17:52:14] things that use check_ganglios_generic_value to get stuff into icinga are returning stale [17:52:15] ok, what's up ? or should i actually investigate the backscroll ? [17:52:18] ah [17:52:22] but the values are in ganglia fine [17:52:41] ahha [17:52:47] / is full [17:52:51] fire alarm [17:52:52] bbiab [18:01:16] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: testwiki and mediawikiwiki to 1.22wmf6 [18:01:24] Logged the message, Master [18:02:31] uh so manny ULS commits live [18:03:31] Reedy: does PostEdit need to be undeployed? [18:05:45] Undeployed? [18:06:04] it was integrated into core [18:07:11] Presumably [18:08:37] New patchset: Ottomata; "Initial commit of zookeeper module." 
[operations/puppet/zookeeper] (master) - https://gerrit.wikimedia.org/r/66882 [18:08:55] back [18:13:10] Change merged: Ottomata; [operations/puppet/zookeeper] (master) - https://gerrit.wikimedia.org/r/66882 [18:16:02] someone installed lilypond which had like 2gig of docs [18:16:13] oh neon or nickel? [18:16:14] we need lilypond for Score [18:16:16] neon [18:16:19] on neon ? [18:16:25] oh, don't know about neon :) [18:16:27] yeah [18:16:35] hopefully you aren't using it for anything on the site ;) [18:18:08] New patchset: Reedy; "testwiki, test2wiki and mediawikiwiki to 1.22wmf6" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67279 [18:18:09] also i don't see lilypond anywhere in puppet ? [18:18:25] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67279 [18:19:26] RECOVERY - Disk space on ms-be1011 is OK: DISK OK [18:19:38] PROBLEM - Packetloss_Average on gadolinium is CRITICAL: STALE [18:19:42] !log reedy synchronized database lists files: [18:19:50] Logged the message, Master [18:20:27] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: [18:20:36] Logged the message, Master [18:21:50] RECOVERY - Packetloss_Average on gadolinium is OK: OK: packet_loss_average is -0.481355978261 [18:21:50] aude: http://test.wikidata.org/wiki/Main_Page [18:21:53] Step 1, hurrah! [18:22:06] well, not step 1... [18:22:36] New patchset: Reedy; "Add testwikidatawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67281 [18:22:46] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67281 [18:22:58] New patchset: Reedy; "testwikidatawiki config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65602 [18:23:04] New patchset: Reedy; "testwikidatawiki config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65602 [18:23:26] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65602 [18:23:45] gah Ryan_Lane i am having google fail and brain fail -- do you remember off the top of your head the salt pkg command to see all machines with a certain package installed ? [18:24:03] yeah, but give me a bit [18:24:17] we're going to be doing a presentation for metrics [18:24:54] LeslieCarr: it should be in the grains: http://docs.saltstack.com/ref/modules/all/salt.modules.grains.html [18:25:05] ah yay [18:25:06] !log reedy synchronized wmf-config/ [18:25:11] this, I think: salt '*' grains.get pkg:apache ? [18:25:13] err [18:25:15] Logged the message, Master [18:25:19] this, I think? salt '*' grains.get pkg:apache [18:25:22] bookmarked! [18:26:04] hrm, except that actually doesn't work ... [18:26:14] I upgraded salt yesterday. 
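On the salt question above: installed packages aren't grains, so grains.get won't find them, but the pkg execution module will; a sketch (the package name is only an example):

    salt '*' pkg.version apache2     # per-minion version string, empty where it isn't installed
    salt '*' pkg.list_pkgs           # or the full installed-package list per minion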
let me know if you have issues [18:26:26] !log reedy synchronized database lists files: [18:26:34] Logged the message, Master [18:27:31] !log reedy synchronized database lists files: [18:27:40] Logged the message, Master [18:29:03] !log Created translate tables on testwikidatawiki [18:29:11] Logged the message, Master [18:35:22] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [18:36:22] PROBLEM - Puppet freshness on mw1033 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] !log removed some extraneous packages from neon which had filled its root partition [18:38:14] PROBLEM - Puppet freshness on ms-be1 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:14] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:15] PROBLEM - Puppet freshness on mw63 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:15] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:16] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:16] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:17] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:17] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:17] !log reedy synchronized wikidata.dblist [18:38:22] Logged the message, Mistress of the network gear. 
[18:38:30] Logged the message, Master [18:43:09] !log reedy synchronized database lists files: [18:43:17] Logged the message, Master [18:47:31] New patchset: Reedy; "Add testwikidatawiki to the wikidata dblist" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67284 [18:47:54] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67284 [18:52:17] !log reedy synchronized wmf-config/ [18:52:25] Logged the message, Master [18:52:57] !log reedy synchronized database lists files: [18:53:05] Logged the message, Master [19:00:54] !log reedy synchronized w/auth-api.php [19:01:01] Logged the message, Master [19:03:41] !log reedy synchronized wmf-config/CommonSettings.php 'Set wgSecurePollScript' [19:03:49] Logged the message, Master [19:04:35] New patchset: Reedy; "Add auth-api entrypoint for SecurePoll in w" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67287 [19:08:17] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67287 [19:16:53] !log reedy synchronized wmf-config/InitialiseSettings.php [19:17:00] Logged the message, Master [19:30:47] RECOVERY - twemproxy process on mw6 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:07] RECOVERY - twemproxy process on mw37 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:18] RECOVERY - twemproxy process on mw115 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:19] RECOVERY - twemproxy process on mw55 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:27] RECOVERY - twemproxy process on mw1050 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:27] RECOVERY - twemproxy process on mw61 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:27] RECOVERY - twemproxy process on mw99 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:37] RECOVERY - twemproxy process on mw113 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:47] RECOVERY - twemproxy process on srv235 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:17] RECOVERY - twemproxy process on mw2 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:17] RECOVERY - twemproxy process on mw10 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:17] RECOVERY - twemproxy process on mw4 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:20] !log reedy synchronized php-1.22wmf5/extensions/WikimediaMaintenance/ [19:33:27] RECOVERY - twemproxy process on mw8 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:27] RECOVERY - twemproxy process on mw14 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:27] RECOVERY - twemproxy process on mw5 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:27] RECOVERY - twemproxy process on mw3 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:28] Logged the message, Master [19:33:38] RECOVERY - twemproxy process on mw12 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:38] RECOVERY - twemproxy process on mw7 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name 
nutcracker [19:33:38] RECOVERY - twemproxy process on mw9 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:48] RECOVERY - twemproxy process on mw15 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:48] RECOVERY - twemproxy process on mw13 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:48] RECOVERY - twemproxy process on mw11 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:48] RECOVERY - twemproxy process on mw1 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:34:18] !log reedy synchronized wmf-config/interwiki.cdb 'Updating interwiki cache' [19:34:26] Logged the message, Master [19:34:35] New patchset: Reedy; "Update interwiki cache" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67296 [19:34:49] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67296 [19:38:13] !log reedy synchronized wmf-config/InitialiseSettings.php [19:38:21] Logged the message, Master [19:38:46] New patchset: Reedy; "Enable import from wikidatawiki to testwikidatawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67297 [19:40:04] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67297 [19:40:26] RECOVERY - twemproxy process on fenari is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:40:36] RECOVERY - twemproxy process on hume is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:40:46] RECOVERY - twemproxy process on tmh1 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:40:56] RECOVERY - twemproxy process on srv193 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:41:14] !log reedy synchronized php-1.22wmf6/extensions/WikimediaMaintenance/ [19:41:29] Logged the message, Master [19:43:54] New patchset: Faidon; "Ceph: upgrade to Cuttlefish" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67299 [19:44:13] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67299 [19:52:55] New patchset: awjrichards; "Remove unused 'site' key from wgMFCustomLogos" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67300 [19:54:12] New review: MaxSem; "To be deployed after https://gerrit.wikimedia.org/r/67159" [operations/mediawiki-config] (master); V: 2 C: -2; - https://gerrit.wikimedia.org/r/67300 [19:59:07] New patchset: Ottomata; "Adding zookeeper submodule" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67302 [20:02:27] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67302 [20:06:05] New patchset: Ottomata; "Renaming role::hadoop classes to role::analytics::hadoop" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66976 [20:06:12] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66976 [20:08:06] PROBLEM - Host wtp1008 is DOWN: PING CRITICAL - Packet loss = 100% [20:08:16] RECOVERY - Puppet freshness on mw1033 is OK: puppet ran at Thu Jun 6 20:08:08 UTC 2013 [20:08:16] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Thu Jun 6 20:08:09 UTC 2013 [20:09:33] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [20:09:46] PROBLEM - NTP on wtp1008 is CRITICAL: NTP CRITICAL: Offset 
unknown [20:10:36] New patchset: Ottomata; "Adding role/analytics/zookeeper.pp for Analytics zookeeper puppetization in labs and production" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67306 [20:11:10] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67306 [20:12:10] New patchset: Ottomata; "Running zookeeper on labs kraken-puppet instance (so we don't conflict with cdh4 zookeeper packages)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67307 [20:12:42] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67307 [20:12:44] RECOVERY - NTP on wtp1008 is OK: NTP OK: Offset 0.003912568092 secs [20:21:33] LeslieCarr: Random informal question: how would you feel about taking the Parsoid setup in Tampa and killing it with fire? [20:22:46] It can't support the load that we're gonna put on the eqiad cluster starting a few weeks from now, so we'd have to either expand it (which seems silly) or not use it, in which case we might as well repurpose the boxes or whatever [20:23:10] RobH: Wondering what you think too ---^^ [20:26:09] New patchset: Andrew Bogott; "Don't override $wgServer if $_SERVER["SERVER_NAME"] is undefined." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67311 [20:26:12] New patchset: Ottomata; "Using hash for zookeeper_hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67312 [20:26:25] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67312 [20:31:12] New patchset: Catrope; "Send all Parsoid traffic to eqiad, even when running out of Tampa" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67314 [20:33:19] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67314 [20:33:20] New review: Manybubbles; "Thank you so much for making this change! It helps folks like me who aren't yet used to PHP and get..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/67311 [20:33:59] !log reedy synchronized php-1.22wmf6/extensions/ [20:34:07] Logged the message, Master [20:35:34] !log catrope synchronized wmf-config/CommonSettings.php 'Send all Parsoid requests to eqiad' [20:35:42] Logged the message, Master [20:36:02] New patchset: Reedy; "Kill 1.22wmf1 and 1.22wmf2" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67317 [20:36:14] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67317 [20:37:11] greg-g, could i push out zero ext (minor change) -- adding some logging for weird cases [20:37:58] In the puppet configs, if a maintenance script is set to run every "monthday => '*/3'", what does that mean? The 3rd of each month? Every 3rd day? [20:37:58] RoanKattouw: hehe [20:38:36] RoanKattouw: first off…. FIRE!!!!! secondly, sure :) [20:38:40] more to decom, the better [20:38:53] if it can't handle it anyways, better not make it appear as if it could [20:38:58] Yeah exactly [20:39:01] kaldari: IIRC it's 3rd of the month [20:39:06] No [20:39:10] It's the 3rd, 6th, 9th, ... [20:39:11] LeslieCarr: Thanks! 
[20:39:13] oops [20:39:25] ok [20:39:27] "monthday" => "3" would be the third [20:39:28] RoanKattouw wins :) [20:39:32] so any day divisible by 3 [20:39:37] Yes [20:39:41] got it [20:40:00] Which means every three days except occasionally more or less frequently around month boundaries [20:40:11] and includes today :) [20:40:20] Indeed [20:40:20] which is what I was wondering [20:40:27] kaldari, it's 7th there:) [20:40:40] rats :P [20:40:46] New patchset: Mattflaschen; "Remove PostEdit configuration on all wikis." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67318 [20:40:48] where? [20:41:26] I assume the puppet servers are on UTC [20:41:45] haven't we switched them to moscow time? [20:41:47] but I don't actually know [20:42:06] The puppet server doesn't make that determination [20:42:19] It just writes */3 into the crontab, then cron on the local machine schedules things [20:42:58] New patchset: Reedy; "Update size related dblists" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67321 [20:44:04] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67321 [20:44:41] !log reedy synchronized database lists files: [20:44:48] Logged the message, Master [20:44:51] Reedy, are you deploying anything? [20:45:02] Random things [20:45:18] ok, so any objections if i push out a few minor changes to zero config? [20:45:23] i mean - zero extension [20:45:39] New patchset: CSteipp; "Add test.wikidata.org to SUL2" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67322 [20:45:41] no scaping, just dir-sync [20:45:54] Fine by me [20:46:08] I don't think it's still my window [20:46:23] greg-g: UTC is confusing. I can never remember if I'm in UTC or not due to the time of year :p [20:46:39] It's supposed to be e3 right now [20:46:44] Reedy, there's a gadget that displays UTC [20:47:13] We don't have it on wikitech currently [20:47:19] and it's also not so useful at the top of the screen [20:50:22] New patchset: Jalexander; "wmf favicon for votewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67327 [20:50:31] * Reedy pets Jamesofur [20:50:46] * Jamesofur purrs [20:51:26] Reedy: blame siebrand, he makes me do everything in UTC ;) [20:51:45] I blame siebrand for lots of things ;) [20:51:55] we need a super awesome cache breaking function to display everything in the users timezone [20:52:04] :) [20:52:16] isn't there a gadget for that on en? [20:52:29] Probably [20:52:56] "Change UTC-based times and dates, such as those used in signatures, to be relative to local time." [20:53:05] yup lol [20:53:21] Reedy: the wiki looks great [20:54:05] * aude got user id 28 :/ [20:54:11] on wikidata i am user #13 :) [20:54:45] I'm still suprised so few people have logged into https://login.wikimedia.org [20:54:49] dir-syncing zero ext... [20:54:50] 116 [20:54:53] Is Jenkins down? Or just very busy? [20:55:11] "The Wikidata Test Wiki may also be used in a classroom environment for teaching basic wiki editing to avoid the great confusion resulting from creating pages and breaking a regular project" :) [20:55:27] New review: Spage; "no postedit left in mediawiki-config after this change" [operations/mediawiki-config] (master) C: 2; - https://gerrit.wikimedia.org/r/67318 [20:55:31] good luck :) [20:55:37] aude: Guess it needs a config update for test2 to use it as a repo. 
Then any relevant duplication of cronjobs etc [20:55:52] sure, [20:55:58] i can look at it more tomorrow [20:56:00] I enabled import from wikidata... No idea whow well that may work [20:56:07] should be ok [20:57:03] !log yurik synchronized php-1.22wmf5/extensions/ZeroRatedMobileAccess [20:57:10] New patchset: GWicke; "Clamp parsoid host to "parsoid"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67328 [20:57:10] Logged the message, Master [20:57:33] No doubt a few wrinkles to fix later [20:57:38] sure [20:57:40] !log Restarting zuul on gallium [20:57:48] Logged the message, Mr. Obvious [20:59:14] !log yurik synchronized php-1.22wmf6/extensions/ZeroRatedMobileAccess [20:59:23] Logged the message, Master [20:59:24] New review: Spage; "recheck" [operations/mediawiki-config] (master); V: 1 - https://gerrit.wikimedia.org/r/67318 [21:01:26] Hrmph [21:01:32] Zuul is not receiving events from Gerrit at all [21:01:33] Btw, greg-g, can I get a time today to make testwikidatawiki use the new SUL? [21:01:48] ^demon|away, Krinkle|detached: Zuul is totally broken ---^^ [21:02:14] csteipp: sure thing, between 3-4 is open [21:02:26] greg-g: Cool, I'll do that [21:02:33] It should be quick [21:03:04] cool [21:04:13] New review: Spage; "forcing, Zuul/Jenkins broken" [operations/mediawiki-config] (master); V: 2 - https://gerrit.wikimedia.org/r/67318 [21:04:14] Change merged: Spage; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67318 [21:04:53] greg-g, i'm done with deployment, hope i haven't stepped too hard on anyone's toes :) [21:05:01] csteipp: would be cool to have new SUL [21:05:28] greg-g: Please be aware that Zuul/Jenkins is dead right now, this may impede deployments. I've tried to get people's attention but Antoine, Timo and Chad are all AWOL [21:05:36] now if only i could figure out where all my log entries are coming from :(\ [21:05:44] I also tried restarting the service as documented, but that didn't do anything [21:06:02] yurik, thanks, E3 is deploying. greg-g, we're forcing our merges. [21:06:25] weee :( [21:07:15] spagewmf, did i break it? :( [21:09:20] !log Running updateSpecialPages.php on testwiki to test new Disambigutor special pages [21:09:28] Logged the message, Master [21:10:18] !log completed updateSpecialPages.php on testwiki [21:10:26] Logged the message, Master [21:13:51] New patchset: Krinkle; "wgRC2UDPPrefix: Use hostname-".org" instead of lang.site" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47307 [21:14:18] RoanKattouw: Checking now [21:14:28] User::addToDatabase: hit a key conflict attempting to insert user 'Doruta micutza' row, but it was not present in select! [21:14:37] csteipp: strange that there are still some of those [21:14:37] RoanKattouw: Did you do restart or reload of zuul? [21:14:40] restart [21:15:17] Aaron|home: how many per day are we getting? [21:15:28] RoanKattouw: anything interesting from stdout of that command? [21:15:39] Nope [21:15:54] Oh! [21:15:56] In the logs: [21:15:58] 2013-06-06 21:07:36,994 ERROR gerrit.GerritWatcher: Exception on ssh event stream: [21:16:09] RoanKattouw: When did it stop? Within last hour? [21:16:17] s/stop/fail [21:16:18] I think so [21:16:19] Krinkle: Longer ago, I think. 
[21:16:26] raise ValueError("No JSON object could be decoded") [21:16:35] Gerrit change was rollbacked earlier today, that's why I'm wondering [21:16:40] csteipp: ~68 [21:16:57] That usally means gerrit is sending invalid JSON [21:17:01] Hmmm [21:17:04] Let me try kicking it again [21:17:11] As in sending it another Gerrit event [21:17:18] Cause it has been responding to a few now [21:17:36] RoanKattouw: invalid json in the past has been the result of the connection being terminated between gerrit and zuul. Can we try restarting Gerrit? [21:19:08] Yup and more invalid JSON errors in the log [21:19:39] Jenkins is idling and waiting, so no likely issues on that end. This is definitely Zuul/Gerrit not communicating, e.g. the json packages not being delivered. [21:19:58] invalid json is a bit of an odd error, more like absence of json and some low-level connection borking [21:20:24] 13:44 ^demon: zuul seems to not be yelling about invalid json anymore [21:20:25] 13:41 ^demon: rolling gerrit back to 2.6-rc0-144-gb1dadd2 (version before any updates). Too many rough edges we didn't catch in testing, zuul is not happy [21:20:25] 12:53 ^demon: gerrit rollback to 2.7-rc1-420-g5d5c5c3, some of the new work wasn't quite ready for prime time [21:20:26] 12:30 ^demon: gerrit's back, running 2.7-rc1-424-gef469ac [21:22:11] https://github.com/openstack-infra/zuul/blob/master/zuul/lib/gerrit.py?source=cc#L37-L43 [21:23:31] The invalid json errors are not appeared as a result of gerrit events [21:23:49] Oh OK [21:23:53] I think it is the error appearing when it tries to establish the socket [21:23:57] I don't really want to touch restarting Gerrit right now [21:24:01] But you could ask Ryan to [21:24:08] I didn't get any new json errors when triggering gerrit events [21:24:11] fxing gitdeploy right now [21:24:13] what is the normal method to set up extension configurations when that extension is not yet deployed to all wikis? [21:24:20] I *really* need a new keyboard [21:24:34] my keys are all screwed up and keypresses don't always go through :( [21:24:45] Parsoid is in wmf6, but not in wmf5 [21:24:56] * Damianz wonders if shipping Ryan_Lane a keyboard with no letters on would be evil [21:25:04] Damianz: it would work for me [21:25:08] I don't look at the keyboard [21:25:16] gwicke: 'wmgFoobarConfigThing' => array( 'default' => false, 'enwiki' => true ), etc [21:25:23] gwicke: in $wgConf in InitializeSettings.php, create a config key the maps to an array( default => false, [21:25:27] Reedy, damn you. [21:25:35] !log Restarting zuul on gallium [21:25:41] He was talking about an extension that's in wmf6 but not wmf5 [21:25:43] Logged the message, Master [21:25:50] I told him he needs to put it in wmf5 as well or things will break [21:25:53] ah, ok- is there a way to key on the branch too? [21:26:06] or do I have to list the wikis the branch might be deployed to explicitly? [21:26:17] No, you can't do any such thing [21:26:24] It has to be present in both branches, or it will break [21:26:46] Unless you depend on something that's new in core in wmf6, it's best to just enable it on all wikis that have VE [21:27:11] the code won't be there in wikis that still run wmf5 [21:27:23] (and by "depend" I mean "horrible things will happen when running on wmf6", not "it doesn't work but doesn't actively break anything") [21:27:23] RoanKattouw: ..what will break? 
you don't require_once unless the configuration array indicates it's enabled for the wiki, which it will only for wmf6 wikis [21:27:35] ori-l: Localization and extension-list stuff [21:27:49] That's all global [21:28:23] I was thinking about simply checking if the extension is there, but that won't likely work in extension-list [21:28:57] No, extension-list is global [21:28:57] New patchset: Catrope; "Fix port for Parsoid health check" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67336 [21:29:06] So you're gonna have to add the extension to both branches [21:29:37] And unless there's a really good reason not to, I recommend you enable it on all wikis that have VE (by adding to the $wmgUseVisualEditor block in CommonSettings) [21:29:42] LeslieCarr: ping [21:29:52] RoanKattouw: that is what I have right now [21:29:54] LeslieCarr: One-liner: https://gerrit.wikimedia.org/r/67336 [21:29:56] gwicke: OK good [21:30:04] That should work fine, as long as you also add the ext to wmf5 [21:30:17] Which I or Reedy can help you with if you like [21:30:25] that would be helpful [21:31:04] preilly: what's up ? [21:31:21] the wmf6 bit was done by adding Parsoid to the list of default extensions in release/make-wmf-branch [21:31:43] New patchset: Ryan Lane; "git-deploy updates for salt upgrade" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67337 [21:31:46] Hmm, this is scary, there was no monitoring for parsoidcache eqiad LVS [21:31:47] LeslieCarr: can I send you a PM [21:31:53] hehe sure [21:31:54] gwicke: I'll do that then [21:32:00] i always find it hilarious when people ask me if they can send me a pm [21:32:08] bleh. I really need a test deploy repo [21:32:08] RoanKattouw: cool, thanks! [21:32:56] Ryan_Lane: you can use EventLogging, if you like [21:33:09] will it not cause issues? [21:33:10] gwicke: Bear with me while I perform some time-consuming git operations :) [21:33:29] I need to merge these changes and deploy them all the way through first [21:33:46] Ryan_Lane: no; I've been lazy and haven't configured the binaries to run from the deploy target dir, so it still requires an explicit run of python setup.py install. My negligence is your gain :P [21:33:52] oh [21:33:53] cool [21:34:05] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67337 [21:34:37] soon I'll set up a sham repo and deployment target [21:35:10] Ryan_Lane: RoanKattouw: I think I've reached a blank. I'm not getting anything from Zuul, all I can think of is giving Gerrit another kick. I've got various tabs on logs to see what it does. [21:35:28] ok. want me to do so? [21:35:34] yes [21:35:43] openstack folks told us increasing the ssh connections would help [21:35:55] Yes, that's what they said last time [21:35:56] I'm not sure if ^demon went through those changes [21:36:12] maybe I'll apply some myself next week [21:36:31] restarting gerrit [21:37:20] Ryan_Lane: From the regualr log I onyl see invalid json (it started again now that you did the restart) [21:37:26] Ryan_Lane: RoanKattouw: From debug.log I see "error: [Errno 111] Connection refused " [21:37:39] New patchset: GWicke; "Enable Parsoid for all VisualEditor wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [21:37:48] https://gist.github.com/anonymous/eb876b0e818e368e323c [21:38:02] It is unable to make a connection to Gerrit it seems [21:38:20] it's restarted now [21:38:40] Ah, I see. 
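The per-wiki configuration pattern discussed above (a wmg* switch defined in InitialiseSettings.php, consumed by a conditional require_once in CommonSettings.php) is roughly the following. This is a hedged sketch, not the actual wmf-config files: the setting name wmgUseFoobar, the Foobar extension path and the extra option are invented for illustration.

<?php
// Sketch only -- not the real wmf-config. "wmgUseFoobar" and the Foobar
// extension are hypothetical names standing in for the pattern described
// in the channel ('wmgFoobarConfigThing' => array( 'default' => false, ... )).

// wmf-config/InitialiseSettings.php: the per-wiki switch lives in $wgConf.
// 'default' applies everywhere unless a more specific dbname key overrides it.
$wgConf->settings['wmgUseFoobar'] = array(
	'default' => false,       // off everywhere...
	'mediawikiwiki' => true,  // ...except wikis listed explicitly
	'enwiki' => true,
);

// wmf-config/CommonSettings.php: by the time this runs, the wmg* value for
// the current wiki has been extracted into a plain global, so the extension
// is only loaded where the switch is true. Wikis still on a branch that does
// not ship the extension simply keep the switch at false.
if ( $wmgUseFoobar ) {
	require_once "$IP/extensions/Foobar/Foobar.php";
	$wgFoobarSomeOption = true;  // extension-specific settings go here
}

The switch alone is not sufficient, though: as noted above, extension-list and the localisation cache are global, so the extension's files have to be present in every deployed branch (here both 1.22wmf5 and 1.22wmf6) even on wikis where the switch stays false.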
It was because Gerrit was temprarily down [21:38:45] New patchset: GWicke; "Enable Parsoid for all VisualEditor wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [21:38:48] connection no longer being refused [21:38:53] and... it seems Zuul is back up [21:38:58] New patchset: GWicke; "Enable Parsoid for all VisualEditor wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [21:39:20] New patchset: GWicke; "Enable Parsoid for all VisualEditor wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [21:39:35] !log catrope synchronized php-1.22wmf5/extensions/Parsoid 'Adding Parsoid to wmf5' [21:39:43] Logged the message, Master [21:39:50] New patchset: Reedy; "Fixup docroot code to work for wikimanias all from one docroot folder" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67341 [21:41:31] gwicke: Your change looks fine, let's go for it [21:41:41] Krinkle: Thanks so much man [21:41:46] hm. well, deploy works, but reporting isn't [21:41:51] * Ryan_Lane grumbles [21:42:15] New patchset: Reedy; "Fixup docroot code to work for wikimanias all from one docroot folder" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67341 [21:42:22] !log Zuul gave up on connecting to Gerrit. Restarting Zuul made it try again. Restarting Gerrit again made it connect and it seems back up. [21:42:30] Logged the message, Master [21:42:55] RoanKattouw: wmf5 would need to be synced before the config can go live [21:43:06] New patchset: Reedy; "Fixup docroot code to work for wikimanias all from one docroot folder" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67341 [21:43:13] gwicke: Already done [21:43:14] oh, you already did- nm [21:43:14] I'll hang out for a few minutes to see if it stays. Otherwise I'll be off again until I get home. [21:43:21] gwicke: logmsgbot !log catrope synchronized php-1.22wmf5/extensions/Parsoid 'Adding Parsoid to wmf5' [21:44:14] RoanKattouw: hold off for a moment.. [21:44:36] New patchset: Reedy; "Simplify wikimania apache conf" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/62566 [21:45:03] gwicke: I'm not doing anything [21:45:12] Do you have deployer shell access? [21:45:52] New patchset: GWicke; "Enable Parsoid for all VisualEditor wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [21:46:32] RoanKattouw: I believe so [21:46:40] New patchset: Reedy; "Simplify wikimania apache conf" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/62566 [21:48:54] a bit scary that there is no way to test a config first [21:50:57] oh. reporting is working after all, it was broken before I fixed the other stuff. heh. horray [21:51:00] *hooray too [21:52:55] The E3 deploy is running late, but we should be done before lightning starts. [21:53:15] DocumentRoot "/usr/local/apache/common/docroot/arbcom_nlwiki" [21:53:15] ServerName arbcom.de.wikimedia.org [21:53:18] I wonder how that ever worked [21:54:22] hahaha [21:55:43] RoanKattouw: btw, your parsoid cluster is marked with a grain, so you can target all of them at once: salt -G 'cluster:parsoid' saltutil.sync_all [21:56:02] all system are actually marked with grains, based on their ganglia cluster [22:00:32] Oh cool [22:00:34] !log mflaschen Started syncing Wikimedia installation... 
: Deploying PostEdit change to core, GettingStarted and GuidedTour for E3 deployment [22:00:43] Logged the message, Master [22:02:17] New patchset: Catrope; "Remove misc::parsoid::cache, unused now" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67347 [22:02:31] superm401: Can you ping me when you're done? I had a deploy before 4 too. [22:02:34] New patchset: Catrope; "Clean up DSH list for Parsoid" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67348 [22:02:45] I have an LD window but I'll be patient [22:03:05] New patchset: Catrope; "Actually monitor parsoidcache in eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67349 [22:03:19] New patchset: Catrope; "Kill Parsoid in Tampa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67350 [22:03:32] csteipp, yes, sorry, I didn't know that. [22:03:34] New patchset: Reedy; "Move/redirect arbcom.XX wikis to arbcom-XX" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/67351 [22:03:52] gwicke: Looks good. Let me know when you're actually deploying it (not now, because superm401 is deploying) and I'll +2 it then [22:03:54] I think csteipp is next [22:04:07] If there's no one else after csteipp, gwicke could go after him, and then me [22:05:32] ok [22:05:52] New review: Alex Monk; "(1 comment)" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/47307 [22:06:48] New patchset: Catrope; "Decommission the Parsoid boxes in Tampa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67354 [22:07:35] New review: Alex Monk; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47307 [22:08:30] New review: Alex Monk; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47307 [22:09:58] PROBLEM - Puppet freshness on mw1143 is CRITICAL: No successful Puppet run in the last 10 hours [22:09:58] PROBLEM - Puppet freshness on mw23 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:29] PROBLEM - Host wtp1008 is DOWN: PING CRITICAL - Packet loss = 100% [22:10:58] PROBLEM - Puppet freshness on amssq51 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:58] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:58] PROBLEM - Puppet freshness on amssq56 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:58] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:58] PROBLEM - Puppet freshness on cp1017 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:58] PROBLEM - Puppet freshness on es1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:59] PROBLEM - Puppet freshness on lvs1002 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:59] PROBLEM - Puppet freshness on mc14 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:00] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:00] PROBLEM - Puppet freshness on mw1021 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:01] PROBLEM - Puppet freshness on mw1104 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:01] PROBLEM - Puppet freshness on mw1131 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:02] PROBLEM - Puppet freshness on mw1154 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:02] PROBLEM - Puppet 
freshness on mw1155 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:03] PROBLEM - Puppet freshness on mw1208 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:03] PROBLEM - Puppet freshness on mw74 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:03] !log mflaschen Finished syncing Wikimedia installation... : Deploying PostEdit change to core, GettingStarted and GuidedTour for E3 deployment [22:11:04] PROBLEM - Puppet freshness on mw93 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:04] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours [22:11:05] PROBLEM - Puppet freshness on sq84 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:05] PROBLEM - Puppet freshness on srv261 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:06] PROBLEM - Puppet freshness on srv283 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:06] PROBLEM - Puppet freshness on ssl1003 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:07] gwicke: Are you messing with Parsoid (as in actual Parsoid, not the MW part of it) right now? Mind if I deploy a config change? [22:11:10] Logged the message, Master [22:11:16] Yay fun, puppet breakage [22:11:20] RoanKattouw: no, I'm not doing anything yet [22:11:23] OK [22:11:24] csteipp, done. [22:11:28] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 0.46 ms [22:11:33] Then I'm gonna do a config change in /srv/deployment/parsoid/config [22:11:55] hooray. batch runs are working in this version of salt :) [22:11:58] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:58] PROBLEM - Puppet freshness on capella is CRITICAL: No successful Puppet run in the last 10 hours [22:11:58] PROBLEM - Puppet freshness on cp1034 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:58] PROBLEM - Puppet freshness on cp1018 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:58] PROBLEM - Puppet freshness on db1005 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:58] PROBLEM - Puppet freshness on db1026 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:59] PROBLEM - Puppet freshness on db1027 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:59] PROBLEM - Puppet freshness on db32 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:00] PROBLEM - Puppet freshness on db69 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:00] PROBLEM - Puppet freshness on db67 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:01] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:01] PROBLEM - Puppet freshness on grosley is CRITICAL: No successful Puppet run in the last 10 hours [22:12:02] PROBLEM - Puppet freshness on hume is CRITICAL: No successful Puppet run in the last 10 hours [22:12:02] PROBLEM - Puppet freshness on mw1180 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:03] PROBLEM - Puppet freshness on mw1049 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:03] PROBLEM - Puppet freshness on mw1084 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:03] superm401: thanks, RoanKattouw it should be about 10-20 mins [22:12:04] PROBLEM - Puppet freshness on ms6 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:04] PROBLEM - Puppet freshness on mw1194 is 
CRITICAL: No successful Puppet run in the last 10 hours [22:12:05] PROBLEM - Puppet freshness on mw13 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:05] PROBLEM - Puppet freshness on palladium is CRITICAL: No successful Puppet run in the last 10 hours [22:12:05] ewwww [22:12:06] PROBLEM - Puppet freshness on sanger is CRITICAL: No successful Puppet run in the last 10 hours [22:12:06] PROBLEM - Puppet freshness on snapshot1002 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:07] PROBLEM - Puppet freshness on sq62 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:07] PROBLEM - Puppet freshness on srv299 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:08] PROBLEM - Puppet freshness on ssl4 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:08] PROBLEM - Puppet freshness on tmh1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:14] !log Running puppet on srv283 to figure out what broke [22:12:22] Logged the message, Mr. Obvious [22:12:47] Change merged: CSteipp; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67322 [22:13:03] PROBLEM - Puppet freshness on amssq31 is CRITICAL: No successful Puppet run in the last 10 hours [22:13:03] PROBLEM - Puppet freshness on amssq35 is CRITICAL: No successful Puppet run in the last 10 hours [22:13:03] PROBLEM - Puppet freshness on amssq37 is CRITICAL: No successful Puppet run in the last 10 hours [22:13:03] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: No successful Puppet run in the last 10 hours [22:13:03] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: No successful Puppet run in the last 10 hours [22:13:19] RoanKattouw: what do you plan to change? [22:13:33] (our code is not updated yet) [22:13:36] !log WTF, puppet ran just fine on srv283 [22:13:43] gwicke: Just adding a bunch of interwiki prefixes in localsettings.js [22:13:44] Logged the message, Mr. Obvious [22:13:58] RoanKattouw: for wikipedias? 
[22:14:37] wikipedia languages are all predefined by default [22:14:51] New patchset: Ottomata; "Fixing public-datasets rsync job on stat1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67355 [22:14:58] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:58] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:58] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:58] PROBLEM - Puppet freshness on db1010 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:58] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:00] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67355 [22:15:34] New review: RobH; "looks good, but dependent changes need merge first" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/67354 [22:15:59] PROBLEM - Puppet freshness on aluminium is CRITICAL: No successful Puppet run in the last 10 hours [22:15:59] PROBLEM - Puppet freshness on amssq40 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:59] PROBLEM - Puppet freshness on amssq59 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:59] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:59] PROBLEM - Puppet freshness on cp3006 is CRITICAL: No successful Puppet run in the last 10 hours [22:16:22] RoanKattouw: if you used $wgLanguageCode then our default prefixes would work as-is [22:16:48] We use $wgDBname for the prefixes now [22:16:54] Because we need to be able to support non-Wikipedias [22:17:35] New review: RobH; "seems fine to me" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/67350 [22:18:58] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: No successful Puppet run in the last 10 hours [22:18:58] PROBLEM - Puppet freshness on cp1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:18:58] PROBLEM - Puppet freshness on cp1039 is CRITICAL: No successful Puppet run in the last 10 hours [22:18:58] PROBLEM - Puppet freshness on db1045 is CRITICAL: No successful Puppet run in the last 10 hours [22:18:58] PROBLEM - Puppet freshness on db1057 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:07] AaronSchulz: did you see the conclusion to that master query thing yesterday? 
[22:19:58] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [22:19:58] PROBLEM - Puppet freshness on cp1009 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:58] PROBLEM - Puppet freshness on db1015 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:58] PROBLEM - Puppet freshness on db1018 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:58] PROBLEM - Puppet freshness on db1046 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:43] RoanKattouw: we'll have to tweak our extension then before you can use cached versions [22:20:56] OK [22:20:58] PROBLEM - Puppet freshness on amssq36 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:58] PROBLEM - Puppet freshness on amssq41 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:58] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:58] PROBLEM - Puppet freshness on amssq60 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:59] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:59] PROBLEM - Puppet freshness on ersch is CRITICAL: No successful Puppet run in the last 10 hours [22:20:59] PROBLEM - Puppet freshness on db1028 is CRITICAL: No successful Puppet run in the last 10 hours [22:21:00] Just let me know when it's ready [22:21:44] !log csteipp synchronized wmf-config/InitialiseSettings.php 'enable SUL2 on testwikidatawiki' [22:21:52] Logged the message, Master [22:21:58] PROBLEM - Puppet freshness on amssq58 is CRITICAL: No successful Puppet run in the last 10 hours [22:21:58] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [22:21:58] PROBLEM - Puppet freshness on cp1008 is CRITICAL: No successful Puppet run in the last 10 hours [22:21:58] PROBLEM - Puppet freshness on cp1019 is CRITICAL: No successful Puppet run in the last 10 hours [22:21:58] PROBLEM - Puppet freshness on cp3021 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:21] is there documentation on how to efficiently deploy config changes to both deployed branches? [22:22:50] Config changes only happen in one place... [22:22:58] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:58] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:58] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:58] PROBLEM - Puppet freshness on db1007 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:58] PROBLEM - Puppet freshness on db1039 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:03] git pull in wmf-config and then sync-file on the files that changed? [22:23:07] RoanKattouw: I'm done. That was easier than I remembered.. [22:23:20] Yup gwicke [22:23:31] ok, thanks! 
[22:23:55] RoanKattouw: https://gerrit.wikimedia.org/r/#/c/67338/ [22:23:58] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:59] PROBLEM - Puppet freshness on cp1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:59] PROBLEM - Puppet freshness on cp1011 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:59] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:59] PROBLEM - Puppet freshness on cp3022 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:59] PROBLEM - Puppet freshness on db1030 is CRITICAL: No successful Puppet run in the last 10 hours [22:24:05] csteipp: heya, is SUL going to be on wmf5 as well as 6? [22:24:25] gwicke: Want me to merge that now? [22:24:27] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67347 [22:24:43] RoanKattouw, csteipp: if csteipp is done & I am next in line then yes [22:24:43] greg-g: The SUL2 stuff is probably not going to go out until after a UX signoff on the 19th [22:24:45] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67348 [22:25:07] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67349 [22:25:07] OK here we go hten [22:25:09] gwicke: I'm done [22:25:15] gwicke: +2ed, Jenkins will merge [22:25:17] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67350 [22:25:22] Once merged, log into tin and run git pull in /a/common [22:25:32] New patchset: Ryan Lane; "Decommission the Parsoid boxes in Tampa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67354 [22:25:33] Then scap [22:25:46] really? I though scap was way too slow [22:25:50] *thought [22:25:56] It is but you're touching extension-list [22:25:58] PROBLEM - Puppet freshness on amssq38 is CRITICAL: No successful Puppet run in the last 10 hours [22:25:58] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [22:25:58] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours [22:25:58] PROBLEM - Puppet freshness on db1006 is CRITICAL: No successful Puppet run in the last 10 hours [22:25:58] PROBLEM - Puppet freshness on db1008 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:06] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [22:26:14] csteipp: ah, I may have misunderstood jdlrobson's question/context then. [22:26:37] ah, ok- yes. We want the credits message to be localized, at least in English ;) [22:26:38] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67354 [22:26:58] ^ MaxSem [22:26:59] PROBLEM - Puppet freshness on amssq33 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:59] PROBLEM - Puppet freshness on amssq42 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:59] PROBLEM - Puppet freshness on amssq45 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:59] PROBLEM - Puppet freshness on amssq49 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:59] PROBLEM - Puppet freshness on amssq54 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:14] csteipp: UX sign off?! 
:( [22:27:28] hey ops people: who wants to be the whistleblower and reveal if WMF has been tapped by the NSA as well? http://www.guardian.co.uk/world/2013/jun/06/us-tech-giants-nsa-data [22:27:31] we just want the tokenzzzzzz [22:27:57] !log deploying https://gerrit.wikimedia.org/r/#/c/67338/ [22:27:58] PROBLEM - Puppet freshness on amssq39 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:59] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:59] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:59] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:59] PROBLEM - Puppet freshness on amssq61 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:59] PROBLEM - Puppet freshness on db1037 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:05] Logged the message, Master [22:28:58] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:58] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:59] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:59] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:59] PROBLEM - Puppet freshness on db52 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:59] PROBLEM - Puppet freshness on mc1003 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:59] PROBLEM - Puppet freshness on maerlant is CRITICAL: No successful Puppet run in the last 10 hours [22:29:58] PROBLEM - Puppet freshness on amssq52 is CRITICAL: No successful Puppet run in the last 10 hours [22:29:58] PROBLEM - Puppet freshness on cerium is CRITICAL: No successful Puppet run in the last 10 hours [22:29:58] PROBLEM - Puppet freshness on cp1007 is CRITICAL: No successful Puppet run in the last 10 hours [22:29:58] PROBLEM - Puppet freshness on db40 is CRITICAL: No successful Puppet run in the last 10 hours [22:29:58] PROBLEM - Puppet freshness on es1 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:57] * gwicke waits for LocalisationCache updates [22:30:58] PROBLEM - Puppet freshness on amssq55 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:58] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:58] PROBLEM - Puppet freshness on cp1020 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:58] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:58] PROBLEM - Puppet freshness on cp3007 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:58] PROBLEM - Puppet freshness on amssq34 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:58] PROBLEM - Puppet freshness on amssq57 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:59] PROBLEM - Puppet freshness on caesium is CRITICAL: No successful Puppet run in the last 10 hours [22:31:59] PROBLEM - Puppet freshness on cp1003 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:59] PROBLEM - Puppet freshness on cp1013 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:59] PROBLEM - Puppet freshness on db1052 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:59] PROBLEM - Puppet freshness on 
cp3010 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on amssq44 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on db1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on db43 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on es4 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on mc1009 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on ms-be1012 is CRITICAL: No successful Puppet run in the last 10 hours [22:35:58] greg-g / jdlrobson: Ah, sorry... that's another part of "SUL". The part mobile wants (centralauth token) should be in wmf6 [22:36:02] !log gwicke Started syncing Wikimedia installation... : [22:36:07] PROBLEM - Puppet freshness on amssq53 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:07] PROBLEM - Puppet freshness on cp1002 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:07] PROBLEM - Puppet freshness on cp1005 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:07] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:07] PROBLEM - Puppet freshness on db1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:10] Logged the message, Master [22:36:28] csteipp: gotcha, my bad [22:38:07] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:07] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:07] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:07] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:07] PROBLEM - Puppet freshness on db1002 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on amssq47 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on amssq48 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on db1034 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on db1051 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on mw1010 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on mw1091 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on mw1092 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:08] PROBLEM - Puppet freshness on mw116 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:08] PROBLEM - Puppet freshness on mw60 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:09] PROBLEM - Puppet freshness on search30 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:09] PROBLEM - Puppet freshness on sockpuppet is CRITICAL: No successful Puppet run in the last 10 hours [22:43:55] !log gwicke Finished syncing Wikimedia installation... 
: [22:44:03] Logged the message, Master [22:44:08] that's it for me [22:44:17] Excellent [22:44:24] I'll take over in a bit [22:44:37] New patchset: Yurik; "Updated Orange Tunisia IPs per carrier request" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67360 [22:44:44] mw.org still seems to work, so good [22:44:44] gwicke: congrats. btw, the dangling colon is there because the sync scripts take an optional second argument for a log message [22:45:37] ah, I see [22:45:43] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67151 [22:45:51] New patchset: Reedy; "Added a comment for the wikidata hostname" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65861 [22:45:58] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65861 [22:46:32] !log catrope synchronized visualeditor.dblist 'Enabling VisualEditor on the remaining 270 Wikipedias' [22:46:40] Logged the message, Master [22:47:29] New patchset: Reedy; "(bug 49176) Increase account creation limit for itwiki GLAM event" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67087 [22:47:32] that is a fast ramp-up for the Parsoid extension [22:47:36] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67087 [22:47:58] New patchset: Reedy; "wmf favicon for votewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67327 [22:48:04] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67327 [22:48:26] New patchset: Ryan Lane; "Suppress salt logging in cli operations" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67362 [22:50:27] New patchset: Ryan Lane; "Suppress salt logging in cli operations" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67362 [22:50:34] New patchset: Reedy; "I fixed a few typographical errors." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64635 [22:50:40] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64635 [22:50:47] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [22:52:30] RoanKattouw: is there a chance to get sufficient rights on the varnish machines so that I can see the logs? [22:52:42] New review: Reedy; "Why would you follow Python guidelines when writing PHP?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65860 [22:52:48] gwicke: Are they owned by root or something? [22:53:02] ls -ld /var/log/varnish/ [22:53:03] drwxr-x--- 2 varnishlog varnishlog 4096 Nov 23 2012 /var/log/varnish/ [22:53:49] Hmph [22:54:28] making that directory world-readable could be a temporary workaround [22:55:19] We'll probably just want to place admins::parsoid users in the varnishlog group [22:55:20] !log catrope synchronized wmf-config/InitialiseSettings.php 'touch' [22:55:28] Logged the message, Master [22:55:31] But you should ask binasher what he thinks we should do [22:58:21] binasher: ^^ [22:58:22] varnish doesn't write its logs out to the filesystem [22:58:43] even the Parsoid ones? [22:59:24] what's a hostname there? [22:59:29] titanium [22:59:40] and cerium [23:00:37] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [23:00:39] as non-root you can use /usr/bin/varnishlog [23:01:16] gwicke: what sort of data are you looking for? 
[23:02:39] binasher: the varnish logs on the Parsoid caches [23:02:55] those don't exist. what actual data are you looking for? [23:02:59] I just deployed the Parsoid extension that updates those caches and would like to verify that it works [23:03:37] so if there is a way to get a log of requests from those varnishes then that would be helpful [23:03:41] gwicke: would you like to see the status of requests as they're happening? [23:03:56] !log reedy synchronized php-1.22wmf5/extensions/Parsoid/php/Parsoid.php 'fix hook call' [23:04:04] Logged the message, Master [23:04:19] binasher: it does not necessarily have to be live, but some data from the last minutes or so would be helpful [23:04:45] gwicke: you want either the varnishncsa or varnishlog commands [23:04:55] gwicke: varnish logs to shared memory [23:05:01] those commands read from it [23:05:10] binasher: and those are available to mere mortals? [23:05:39] unfortunately not. for other varnish clusters, we udplog the varnishncsa output though [23:05:47] we could do that for the parsoid caches as well [23:06:03] though that doesn't help you in the immediate term [23:06:21] wait, I am getting output from vanishncsa [23:06:31] gwicke: see a lot of purges [23:06:33] ? [23:06:48] binasher: did you see the conclusion to my master query hunting yesterday? [23:07:25] gwicke: varnishncsa with no option = the backend instance. add "-n frontend" for the frontend instance [23:07:25] !log tstarling cleared profiling data [23:07:25] binasher: yeah, I guess the htcp purge daemon is running on those machines [23:07:32] Logged the message, Master [23:07:42] gwicke: -m 'RxRequest:^(?!PURGE$)' will filter the purges [23:07:56] TimStarling: i didn't, do tell! [23:08:18] http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=MySQL+eqiad&h=db1056.eqiad.wmnet&jr=&js=&v=7130&m=mysql_connections&vl=conns&ti=mysql_connections [23:08:52] binasher: thanks, it seems that our requests are arriving fine [23:08:59] 80% of enwiki master queries and about 80% of master connections also were due to a testing hack in GuidedTour [23:09:26] the other wikis where GuidedTour was enabled were affected also [23:09:51] I submitted a patch to fix it, by removing the test, Matt Flaschen merged it, and I deployed it [23:09:59] gwicke: fwiw, the new regexp daemon per-domain filtering support, so we could filter irrelevant purges [23:10:09] gwicke: that's bblack's work [23:10:09] TimStarling: :O [23:10:12] you can see the result in ganglia [23:10:27] TimStarling: were those Revision::fetchFromConds queries? [23:10:31] yes [23:10:35] that ganglia graph is awesome [23:10:41] paravoid: 100% of them are irrelevant ;) [23:10:45] omg, the enwiki qps is down to 819 [23:10:50] * paravoid tries to imagine binasher's face [23:10:52] note that it does have a broken scale [23:10:57] it was >3000 yesterday [23:11:09] ganglia is very unscientific, we were taught in school to never ever use a broken vertical scale [23:11:14] except if we went into marketing [23:11:19] gwicke: right; mark is also planning on spliting the htcp multicast groups, so these wouldn't even get to vhtcpd [23:11:36] TimStarling: what do you mean by broken scale? [23:11:44] I mean it doesn't start at zero [23:11:44] paravoid: I'm looking at the Parsoid varnishes- those only handle the internal Parsoid service [23:11:57] it starts at 5k [23:12:08] so it looks like almost all the queries are gone when actually it was only 80% [23:12:10] really?! [23:12:33] Reedy: thanks for the fix! 
[23:12:34] TimStarling: not looking at the actual graphs, but what ganglia is currently reporting as queries/sec for the enwiki master is accurate. so that's good even if the graphs are screwy [23:12:39] just count down on the vertical axis [23:12:54] wow you're right [23:13:04] I didn't immediately notice, that's even scarier [23:13:23] about a 500% reduction (still not looking at graphs) [23:13:47] yes, if you use the reciprocal [23:14:08] * TimStarling is not into reciprocals [23:14:15] fair [23:14:41] !log reedy synchronized php-1.22wmf6/extensions/Parsoid/ [23:14:48] Logged the message, Master [23:15:03] https://ishmael.wikimedia.org/sample/more.php?host=db1056&hours=24&checksum=17570832276017652668 [23:15:06] gwicke: no purges needed there at all? [23:15:15] paravoid: not of articles [23:15:26] TimStarling: and that's since when, apr? [23:15:40] our URLs contain the oldid [23:15:45] TimStarling: i'm surprised the guided tour extension was issuing that query much at all [23:16:05] and are refreshed implicitly on template update [23:16:29] binasher: it was a MakeGlobalVariablesScript hook [23:16:32] !log reedy synchronized php-1.22wmf6/extensions/Parsoid/ [23:16:40] Logged the message, Master [23:17:10] similar to https://gerrit.wikimedia.org/r/#/c/65009/ which is the change that got me looking at master queries [23:17:33] I wanted to do some statistics so I could tell Matt how disasterous that change would be [23:17:43] https://git.wikimedia.org/commit/mediawiki%2Fextensions%2FGuidedTour.git/4ab986cb23327aecf80738653001479509447b9e [23:17:49] that's Tim's change [23:18:11] 2012-01-02? [23:18:12] really? [23:18:27] TimStarling: and that's since when, apr? [23:18:33] ganglia seems to imply that it was april [23:18:41] they developed it a long time ago and enabled it recently, I think [23:18:51] yeah, it took ages to get out [23:18:53] yes, that's what where I got apr [23:19:40] i guess this was deployed around the time the jobq was moved off mysql [23:20:03] !log reedy synchronized php-1.22wmf5/extensions/Parsoid/ [23:20:11] Logged the message, Master [23:20:23] New patchset: Catrope; "Enable VE experimental mode on mediawiki.org and beta labs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67371 [23:20:36] i love how the patch was to just delete a line of code with nothing added [23:21:46] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67371 [23:22:33] hey, apart from guided tour, we actually were successful at moving nearly all read traffic to slaves [23:22:43] !log catrope synchronized wmf-config/InitialiseSettings.php 'Set $wmgVisualEditorExperimental = true on mediawiki.org' [23:22:50] Logged the message, Master [23:23:25] New patchset: GWicke; "Ramp up the Parsoid load" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67373 [23:23:26] !log catrope synchronized wmf-config/CommonSettings.php 'Plumbing for $wmgVisualEditorExperimental' [23:23:33] Logged the message, Master [23:26:15] TimStarling: the rate of several other master query types increased by 2-3x on enwiki right after the guidedtour fix was deployed [23:26:28] thanks to being faster [23:27:27] https://ishmael.wikimedia.org/sample/more.php?host=db1056&hours=24&checksum=4058300124652897056 [23:27:39] https://ishmael.wikimedia.org/sample/more.php?host=db1056&hours=24&checksum=484571594445205663 [23:28:02] * binasher buys TimStarling all the drinks [23:29:39] binasher: the precedent calls for a 35 year old whiskey [23:30:14] 
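To make the GuidedTour diagnosis above concrete: MakeGlobalVariablesScript runs while the JavaScript configuration variables are assembled for every page view, so any database fetch placed inside it is multiplied by the full page-view rate. The sketch below only illustrates that shape and is not GuidedTour's actual code; the hook name and the Title/Revision/OutputPage APIs are real 1.22-era MediaWiki, but the page name, variable name and hook body are invented.

<?php
// Illustration of the anti-pattern, NOT the real GuidedTour code: a database
// lookup inside MakeGlobalVariablesScript, which fires on every page view
// while OutputPage builds the client-side config vars.

$wgHooks['MakeGlobalVariablesScript'][] = function ( array &$vars, OutputPage $out ) {
	// Hypothetical leftover "testing hack": fetch the latest revision of some
	// tour definition page so its ID can be exposed to client-side code.
	$title = Title::newFromText( 'MediaWiki:Example-tour-definition' );
	$rev = $title ? Revision::newFromTitle( $title ) : null;

	// If that page does not exist, the revision lookup retries on the master
	// (the Revision::newFromConds fallback discussed a bit further down), so
	// every page view turns into a master read -- hence the ~80% of enwiki
	// master queries mentioned above. The actual fix was just deleting the
	// offending line.
	$vars['wgExampleTourRevId'] = $rev ? $rev->getId() : null;

	return true;
};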
!log updating Parsoid to 020c7b2 [23:30:22] Logged the message, Master [23:30:28] i dunno, how about just really good whisky? [23:30:55] i have a 25 year coal isla that is significantly better than that 31 year in amsterdam [23:31:07] which I didn't get a chance to try btw! [23:31:26] that's okay, I guess I'll deserve it only after ceph finishes [23:31:29] i wonder what happened to it [23:31:41] paravoid: are you coming to the US soon to meet ken? [23:31:51] I don't think so [23:34:33] binasher: it was a template fetch, the reason it hit the master is because the page didn't exist [23:34:47] so the fallback case in Revision::newFromConds was hit [23:35:40] wow. [23:35:51] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67373 [23:35:52] wtf :) [23:37:01] I'd like to push out another config change, is anybody deploying right now? [23:38:56] !log gwicke synchronized wmf-config/CommonSettings.php [23:39:04] Logged the message, Master [23:39:35] I'm done [23:39:43] OK [23:39:48] I'm doing a bit more in a minute [23:41:47] binasher: do you want half the remaining queries to go away as well? [23:41:59] TimStarling: sure! [23:42:46] lol [23:42:59] i'm surprised GeoData::getAllCoordinates is a master query [23:43:06] binasher: Ima have the CentralAuth review done tomorrow. Will you want to plug it in for S7 before we buckle down for redactatron rewrite, or after? [23:43:29] specifically I am looking at the ipblocks query [23:43:31] Coren: before, let's get s7 up in labs [23:43:36] * Coren nods. [23:43:56] Block::newLoad? [23:43:59] it is there because someone accidentally removed the hack that made it go to the slave when they refactored the code in 2007 [23:44:04] yes, Block::newLoad [23:44:15] We've got two other labs project wanting access too, so I'll corner Ryan to do the auth stuff cleanly. A shell script running in a screen on the NFS server won't cut it. :-) [23:45:37] binasher, it's a master query only on save, when we need up to date coordinates to figure out which of them need to die [23:45:46] *page save [23:47:05] MaxSem: makes sense. is that on every page save? [23:48:00] yes, because every page can potentially contain coordinates [23:48:41] I guess we could decrease the maximum number of coordinates per page to make these queries faster [23:48:57] I think I'll just file some bugs for now [23:49:51] * Krenair wonders why ishmael does not accept his login [23:51:35] Coren, why do we need separate login details for the MySQL stuff? [23:52:08] Krenair: Not sure I get your question? [23:52:21] Ishmael should be your labs login [23:52:29] But I think it's also restricted access [23:52:32] ah. [23:52:35] it's wmf [23:52:39] okay, that explains it then [23:52:49] the queries contain private data [23:52:52] Krenair: The DB doesn't share auth with LDAP, for one, but more importantly the service groups don't (and cannot) have credentials to begin with. [23:53:30] Coren: Why the replica.my.cnf on non-service accounts? Why can't we just use our labs logins? [23:53:34] it really should NDA rather than wmf, but noone ever figured where/how to get this information [23:53:45] security or something? [23:54:11] Krenair: security is one aspect, simple accountability is another -- it's important to be able to match queries to the actual tools.
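The Revision::newFromConds fallback and Block::newLoad exchange above comes down to one read pattern. Here is a rough sketch using the 1.22-era wfGetDB()/DB_SLAVE/DB_MASTER interfaces; the helper function and the table/column choices are illustrative, not a copy of the actual MediaWiki code.

<?php
// Rough sketch of the "replica first, master on miss" read pattern being
// discussed; exampleFetchRevisionRow() is an invented helper, not
// Revision::newFromConds itself.

function exampleFetchRevisionRow( array $conds ) {
	// Normal case: read from a replica so the master only handles writes.
	$dbr = wfGetDB( DB_SLAVE );
	$row = $dbr->selectRow( 'revision', '*', $conds, __METHOD__ );

	if ( !$row ) {
		// Fallback: the row might simply not have replicated yet, so retry on
		// the master. That is harmless for a row written milliseconds ago, but
		// for a page that does not exist at all (the GuidedTour template fetch
		// above) every request takes this branch and the master eats the read.
		$dbw = wfGetDB( DB_MASTER );
		$row = $dbw->selectRow( 'revision', '*', $conds, __METHOD__ );
	}
	return $row;
}

The Block::newLoad case is the opposite failure mode: a read that should default to the replicas but, per the discussion above, lost its go-to-the-slave switch in a 2007 refactor and has queried the master ever since. GeoData's on-save query is the legitimate variant: when saving a page you genuinely need up-to-date coordinates, so reading the master there is intentional.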