[00:00:24] it connects as root though [00:00:46] I think I have a meeting starting now [00:02:53] Reedy: arr, i need to rush to a bus, i can be back in half an hour [00:03:45] TimStarling: yup. :) [00:04:41] TimStarling: if you're still working with Reedy, though, being late is fine [00:06:05] Change merged: Tim Starling; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/67190 [00:07:22] robla said I should push it out before the meeting starts, so I'm doing that [00:08:31] !log aaron synchronized wmf-config/InitialiseSettings.php 'Enabled message cache debug log' [00:08:39] Logged the message, Master [00:10:04] !log deployed apache configuration for vote.wikimedia.org [00:10:11] Logged the message, Master [00:10:44] New patchset: Aaron Schulz; "Enabled message cache debug log." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67197 [00:11:22] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67197 [00:12:34] Reedy: Jamesofur: have the voter qualification scripts been run yet? [00:12:45] robla: they have [00:12:51] whew :) [00:13:00] * Jamesofur nods, yeah lol [00:13:03] Tim was just explaining this to me :) [00:18:07] https://vote.wikimedia.org/wiki/Main_Page [00:18:37] Is https://vote.wikimedia.org/ not taking anyone else to the main page? [00:19:01] I get a Forbidden error. [00:19:13] I was getting the www.wikimedia.org content until I bypassed my cache. [00:19:13] That's what apache-fast-test said too [00:19:36] There was also some vote.wikimedia.org --> vote.wikipedia.org wonkiness earlier. [00:19:54] Firefox keeps trying to do that [00:20:22] Yeah, I'm getting a 301 at https://vote.wikimedia.org/w/index.php [00:20:33] Location: https://vote.wikipedia.org/wiki/Main_Page [00:20:43] Maybe cache, though I suspect a typo. [00:21:07] reedy@tin:/a/common$ mwscript eval.php votewiki [00:21:07] > echo $wgCanonicalServer [00:21:07] http://vote.wikipedia.org [00:21:07] > echo $wgServer [00:21:08] /vote.wikipedia.org [00:21:17] Not set specifically [00:21:32] I guess it's identiying as a wikipedia for some sillyness [00:21:34] I think it's the default [00:21:40] You didn't specify wgServer? [00:22:12] Easy enough to fix. [00:22:42] Because the suffix was 'wiki' instead of 'wikimedia'? [00:22:44] I sorta hoped 'wikimedia' => '//$lang.wikimedia.org', would cater for it [00:22:46] !log reedy synchronized wmf-config/InitialiseSettings.php [00:22:55] Logged the message, Master [00:22:55] Krenair: Probably. :-) [00:25:13] Guess we're missing a couple of rewrite rules [00:25:39] New patchset: Reedy; "Set wgServer and wgCanonicalServer for votewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67202 [00:27:10] New patchset: Reedy; "Add a couple of rewrites for votewiki" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/67203 [00:30:52] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67202 [00:48:30] AaronSchulz: twemproxy actually can't reload its conf, needs to be restarted. so i'm not going to touch the scap scripts at all [00:50:29] AaronSchulz: but will instead add a "restart twemproxy" script. i don't think the restart need is a problem. timed a restart via upstart. from the shutdown to the new copy establishing connections to all memcached instances and then accepting connections took 59ms. 
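A minimal way to confirm the votewiki fix once Reedy's change (67202) is synced, reusing the same eval.php session quoted above; the expected output is an assumption based only on the patch summary ("Set wgServer and wgCanonicalServer for votewiki"):

    # sketch only, run from a deploy host exactly as in the session above
    mwscript eval.php votewiki <<'EOF'
    echo $wgServer . "\n";
    echo $wgCanonicalServer . "\n";
    EOF
    # expected after the fix (assumed values):
    #   //vote.wikimedia.org
    #   https://vote.wikimedia.org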
[00:51:45] AaronSchulz: ideally, mediawiki would retry requests in this case though [01:00:27] New patchset: Yurik; "Minor fix to base decision on X-CS, not X-Carrier" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67214 [01:02:17] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.005628585815 secs [01:02:18] Reedy: re..still need something? [01:02:37] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.003818631172 secs [01:32:04] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:32:21] ignore that [01:32:32] geez [01:32:54] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:32:54] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 2.157 second response time [01:32:56] paravoid: it will be time for you to wake up soon [01:33:44] RECOVERY - HTTP radosgw on ms-fe1001 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 1.044 second response time [01:53:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [02:04:32] Aaron|home: so, it needs about an hour to upgrade every box (pgs need "upgrading") [02:04:49] Aaron|home: that leaves 9 boxes and I don't want to do them at the same time [02:05:01] Aaron|home: plus waiting for .3, and the bug I reported [02:05:14] !log Graceful reload of Zuul to deploy {{gerrit|I4e7978f8a2a770}} [02:05:26] Aaron|home: and friday is off, so all in all, it's unlikely that we'll put ceph back into production this week [02:05:31] Logged the message, Master [02:05:36] Aaron|home: hopefully early next week [02:05:40] ok [02:05:57] I'll finish up the cuttlefish upgrades tomorrow [02:06:02] !log LocalisationUpdate completed (1.22wmf5) at Thu Jun 6 02:06:01 UTC 2013 [02:06:06] * Aaron|home stopped his copy scripts earlier due to errors [02:06:10] Logged the message, Master [02:06:11] that can be finished later [02:06:16] yeah [02:06:20] I also have copy scripts copying thumbs [02:06:28] running since earlier today [02:07:48] let's say monday for now [02:07:57] based on past experience, I don't think we need a deployment window [02:15:33] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 6 02:15:33 UTC 2013 [02:15:42] Logged the message, Master [02:18:02] TimStarling: Writing up all of our next steps for the election. I'm assuming we have to flip SecurePoll to true by default, it looks like we turn it off for most wikis in between elections? [02:35:27] Jamesofur: https://bugzilla.wikimedia.org/show_bug.cgi?id=42464 [02:36:21] amen :-/ we need to set aside resources to tie up some of the loose ends after this election [02:36:48] especially in case Tim gets hit by a bus [02:46:25] Jamesofur: not sure why it is switched off [02:46:32] it can be on everywhere all the time [02:46:44] * Jamesofur nods, that's what I expected.  [02:46:47] easy enough [02:50:38] <^demon|away> Jamesofur: Speaking of elections, I never heard back from anyone after I got you guys that info you were looking for. [02:50:47] <^demon|away> Was that sufficient, or do we need more info? 
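A rough sketch of the twemproxy restart timing described above; the upstart job name here is an assumption (the process shows up as nutcracker in the monitoring checks later in this log), so adjust it to whatever the "restart twemproxy" script ends up invoking:

    time sudo restart twemproxy      # upstart stop + start; the log above measured ~59ms end to end
    pgrep -u nobody nutcracker       # confirm the new copy is running as 'nobody'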
[02:51:50] I think it's likely to be as close as we're going to get, the real question is if we want to turn SecurePoll on for wikitech [02:51:56] if we do that we can just have them start voting there [02:52:18] otherwise we might do the "if you can't vote otherwise and you're a developer ask the election committee and they'll check the list and add you" meathod [02:52:23] method too [02:52:28] <^demon|away> Either way you have to end up reconciling the data with normal wiki voting. [02:52:33] yes [02:52:42] <^demon|away> I for example would qualify to vote either way, and I imagine I'm not alone. [02:52:44] we'll have to do that with staff too [02:52:47] me too [02:53:00] That's why I'm thinking the exception method may actually be better overall [02:53:06] if they don't qualify the normal ways they can ask [02:53:11] and we can add them to the meta list [02:53:14] and then they can vote [02:53:35] the 2 weeks and some staff attention from my side makes that relatively painless (relatively) [02:53:46] as long as we have the list you gave us [02:54:03] <^demon|away> Well its up to you guys ;-) Just lemme know if you need anything else from me. [02:54:11] will do [02:54:15] thanks much for the help [02:54:19] <^demon|away> yw. [03:00:48] New patchset: Jalexander; "Enable SecurePoll globally and VoteWiki logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67227 [03:27:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:29:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [03:45:46] !log tstarling synchronized php-1.22wmf5/includes/cache/MessageCache.php 'temp debug log for high master query load' [03:45:54] Logged the message, Master [03:48:02] !log tstarling synchronized php-1.22wmf5/includes/cache/MessageCache.php 'revert temp hack' [03:48:09] Logged the message, Master [03:54:10] !log tstarling synchronized php-1.22wmf5/includes/Revision.php 'temp debug hack' [03:54:18] Logged the message, Master [03:56:58] AaronSchulz: it wasn't MessageCache::getMsgFromNamespace() after all [04:18:36] !log tstarling synchronized php-1.22wmf5/includes/Revision.php 'remove debug hack' [04:18:44] Logged the message, Master [04:22:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:23:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [04:53:40] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [04:57:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.740 second response time [05:02:00] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [05:12:15] !log tstarling synchronized php-1.22wmf5/extensions/GuidedTour/modules/tours/test.js [05:12:23] Logged the message, Master [05:12:53] !log tstarling synchronized php-1.22wmf5/extensions/GuidedTour/GuidedTourHooks.php [05:13:00] Logged the message, Master [05:14:08] !log tstarling synchronized php-1.22wmf5/extensions/GuidedTour [05:14:16] Logged the message, Master [05:15:11] !log fixed high enwiki master query load due to GuidedTour [05:15:20] Logged the message, Master [05:15:39] nice [05:15:41] superm401: ^ [05:15:46] * greg-g was watching and 
curious ;) [05:15:46] * ori-l catches up [05:16:13] ori-l, yeah, TimStarling already filled me in. [05:16:15] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=db1056.eqiad.wmnet&m=cpu_report&s=by+name&mc=2&g=network_report&c=MySQL+eqiad [05:16:28] He IDed a big perf problem in GuidedTour then fixed it, and then I cherry-picked it for him. [05:16:29] wow [05:17:41] also user CPU: http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=MySQL+eqiad&h=db1056.eqiad.wmnet&jr=&js=&v=0.8&m=cpu_user&vl=%25&ti=CPU+User [05:18:45] well done. [05:19:39] more than I was expecting, the site is definitely up isn't it? [05:19:41] ;) [05:20:05] appears so ;) [05:20:44] good catch [05:20:47] also good morning [05:20:59] oh right, look at the connection counts: http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=MySQL%20eqiad&h=db1056.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1370495751&v=6204&m=mysql_connections&vl=conns&ti=mysql_connections&z=large [05:21:24] see, we didn't just kill a major source of queries, we removed master connections from parser cache hits entirely [05:21:31] g'morning apergos [05:21:35] so the associated BEGIN queries are gone too [05:21:44] Wow [05:21:56] very nice [05:22:00] *Really* good catch, TimStarling [05:22:37] meh, i've seen better [05:23:02] * ori-l kids [06:25:12] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [06:25:12] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:12] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:12] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:12] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:12] PROBLEM - Puppet freshness on ms-be1 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:13] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:13] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:14] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:14] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:15] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [06:25:15] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [07:04:36] PROBLEM - Swift HTTP on ms-fe1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:12:41] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection timed out [07:12:42] PROBLEM - Apache HTTP on mw1156 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:12:42] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection timed out [07:12:42] PROBLEM - Apache HTTP on mw1160 is CRITICAL: Connection timed out [07:12:49] PROBLEM - Apache HTTP on mw1159 is CRITICAL: Connection timed out [07:12:49] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection timed out [07:13:01] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection timed out [07:13:09] PROBLEM - Swift HTTP on ms-fe3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:13:09] PROBLEM - Apache HTTP on mw1153 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:13:29] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 
747 bytes in 0.068 second response time [07:13:29] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:13:59] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.290 second response time [07:13:59] RECOVERY - Swift HTTP on ms-fe3 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.060 second response time [07:13:59] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [07:14:19] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 61954 bytes in 0.207 second response time [07:14:23] !log swift init restart all on ms-fe1 and 3 [07:14:33] Logged the message, Master [07:14:47] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.052 second response time [07:14:47] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.066 second response time [07:14:48] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [07:14:57] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.072 second response time [07:14:57] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.058 second response time [07:14:57] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time [07:17:36] !log restarted swift on most backends as well [07:17:46] Logged the message, Master [07:19:02] wee [07:20:51] PROBLEM - Host wtp1008 is DOWN: CRITICAL - Plugin timed out after 15 seconds [07:20:52] none too exciting [07:21:01] PROBLEM - Swift HTTP on ms-fe2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:21:17] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms [07:21:40] !log and the rest of the ms-fes [07:21:48] Logged the message, Master [07:21:51] RECOVERY - Swift HTTP on ms-fe2 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.060 second response time [07:22:00] next time I'll just do them all [07:22:34] parsoid... ugh [07:25:28] it rebooted itself, well that's not too nice [07:27:42] !log wtp1008 rebooted itself a few minutes ago (hence the icinga whine), no idea why [07:27:50] Logged the message, Master [08:02:10] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 0.006365776062 secs [08:13:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:14:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [08:23:06] mark: on beta is very still any use for deployment-cache-text1 ? IIRC it was to setup a text varnish cache on beta :) [08:27:05] mark when can you provide Snaps the profiling information for varnishncsa? [08:32:56] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.005370020866 secs [09:29:49] drdee: i'm hoping to get to that today [09:34:58] sweet! 
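The "swift init restart all" entries above most plausibly correspond to swift-init; a minimal sketch of that restart on a frontend such as ms-fe1 (the per-service variant is illustrative):

    swift-init all restart            # bounce every Swift daemon on the host
    swift-init proxy-server restart   # or only the proxy, which is what the ms-fe boxes serve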
[10:02:29] ...and if not, it'll be tomorrow [10:04:22] and else the day after that :D but if you are very busy maybe you can give me headsup so we can think of a solution, i rather not have him blocked (he isn't yet but eventually he will get blocked) [10:11:27] no worries, I'm actually planning to get it done this week ;) [10:11:53] New review: Diederik; "(1 comment)" [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/53170 [10:22:59] apergos: moin [10:23:14] hello [10:23:31] how's swift? [10:23:39] ceph might take a few more days after all [10:24:26] the boxes fell over this morning with similar symptoms to the last time [10:24:29] and I am off for some extended week-end. See you all on monday [10:24:29] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:24:29] PROBLEM - HTTP radosgw on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:24:35] ignore those [10:24:58] PROBLEM - HTTP radosgw on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:01] they've been chugging along since then as though nothing ever happened :-/ [10:25:09] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:20] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:24] sorry for the page [10:25:38] heh [10:26:40] what's the status of the replacement boxes? [10:26:50] RECOVERY - HTTP radosgw on ms-fe1003 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.005 second response time [10:27:01] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.006 second response time [10:27:32] RECOVERY - HTTP radosgw on ms-fe1001 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.005 second response time [10:27:41] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.007 second response time [10:27:41] RECOVERY - HTTP radosgw on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.006 second response time [10:28:18] ms-be1 is racked and has he base install but not the first puppet run I believe, and I have no idea about the others (I guess they will go in the same racks as the existig servers, this may mean they can't go in til the live server comes out, depends on space) [10:29:13] ms-be4 is picking up the data for the disks that didn't match the old partition layout now [10:29:25] did you push rings recently? [10:29:26] ms-be9 has one disk not recognized and there's a ticket [10:29:33] this morning [10:29:48] so that's about the state of it [10:30:02] before the leaks? [10:30:03] 1 box + 1 disk unavailable [10:30:17] yes, before [10:30:21] (I !log every time I push rings) [10:31:53] they fell over about 2.5 hours later [10:31:54] so you can tell the age of a log by the number of rings [10:32:44] bad puns == bedtime. bye! [10:33:04] the host with the changes didn't actually exhibit any of the symptoms... [10:33:06] good night [10:33:07] New review: Daniel Kinzler; "(1 comment)" [operations/apache-config] (master) C: -1; - https://gerrit.wikimedia.org/r/65443 [10:33:12] did you also push rings before the other time we had the outage too? 
[10:33:29] nope [10:33:29] ori-l: i appreciated your comment ;) [10:33:43] heh, it was worth a try [10:33:53] if I had I would have mentioned it in the log :-D [10:34:32] and right now as I say it's only playing catchup for 4 disks (q for obj, 2 for account and container that are likely already done) [10:34:52] s/q for/2 for/ [10:36:11] and what's the plan for ms-be1? [10:36:26] once ms-be4 is happy then we'll bing ms-be1 up at 33% etc [10:36:28] same old same old [10:36:34] heh [10:36:45] when it's all the way in then we start on the ones with the h310 controllers [10:36:53] right [10:37:22] then the race will be on: does ceph become primary front end before we get all the bad controllers out [10:37:30] er back end [10:38:09] wanna place any bets? [10:38:21] hah [10:39:29] I'll take that as a no! [12:02:18] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:03:09] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [12:13:37] Help please... [12:13:51] I'm trying to push a change to gerrit, but I'm getting a weird errors. [12:14:00] To ssh://siebrand@gerrit.wikimedia.org:29418/operations/puppet.git [12:14:00] ! [remote rejected] HEAD -> refs/for/master (branch master not found) [12:14:01] error: failed to push some refs to 'ssh://siebrand@gerrit.wikimedia.org:29418/operations/puppet.git' [12:14:05] Any ideas? [12:14:13] I haven't pushed to that repo before. [12:16:16] push to production, not master [12:18:18] siebrand: ^^ [12:18:58] drdee: ty [12:25:12] <^demon> !log bringing gerrit down for a few minutes to pick up some changes [12:25:20] Logged the message, Master [12:26:08] :( [12:26:35] * aude hopes for shiny new features in gerrit [12:26:42] mmm [12:26:46] shiny! [12:27:01] gitblit is cool [12:27:09] <^demon> Tons of shiny features. [12:27:12] <^demon> And bugfixes [12:27:16] yay! [12:27:30] 'tons' you say? [12:27:34] <^demon> Should be back in a minute, waiting for puppet to finish [12:27:38] ok [12:27:46] <^demon> YuviPanda: I believe the technical term is "a metric shit-ton" :) [12:27:57] metric system ftw [12:28:07] +1 to sane systems of measurements [12:28:21] though I'd reccomend against visiting the metric shit-ton reference brick in Paris... [12:28:24] * apergos isn't sure the metric shit-ton is any more sane than the shit-ton in other systems [12:28:36] <^demon> And we're back with gerrit 2.7-rc1-424-gef469ac [12:28:37] <^demon> :) [12:28:43] nice [12:29:10] oooh, there is a draft / status column [12:29:34] for merged or abandoned I guess [12:29:39] <^demon> !log gerrit's back, running 2.7-rc1-424-gef469ac [12:29:47] Logged the message, Master [12:29:47] yes [12:30:28] als someone reviewed my rev_len change leaving demon off the hook [12:31:33] now the search bar is even more outside my screen :( [12:31:52] <^demon> They got rid of the silly expanding bar at least. [12:32:41] <^demon> Yayayayayayayayay! [12:32:50] <^demon> The stupid "can't expand crap" bug seems fixed. [12:33:02] <^demon> Trying to replicate that's always been a pain in the ass. 
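What the rejected push above comes down to: operations/puppet has no master branch (hence "branch master not found"), its development branch is production, so the change needs to target refs/for/production:

    # same remote as in the error message above, different target ref
    git push ssh://siebrand@gerrit.wikimedia.org:29418/operations/puppet.git HEAD:refs/for/production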
[12:33:23] New patchset: Siebrand; "Enable auto commit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67249 [12:33:39] * aude wonder if i can paste links into gerrit comments again [12:33:46] and stack traces and such [12:35:04] links work :) https://gerrit.wikimedia.org/r/#/c/67083/ [12:37:04] why does https://gerrit.wikimedia.org/r/#/q/Iaedbea9b5f43ac,n,z show endless "Working" [12:39:29] ^demon: any idea why puppet just hangs when i'm running it on my labs instance? [12:39:35] worked last night [12:40:39] New review: Manybubbles; "Autocommits in Solr are a normal and this will allow us to use this Solr without committing. If we ..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/67249 [12:41:23] <^demon> aude: Not a clue :\ [12:41:37] hmmm [12:41:46] * aude moves on to do other stuff and come back later [12:42:12] trying to submit a patch for puppet, though :/ [12:43:40] ^demon: Was this possibly your change, or is this local. When trying to push a change: [12:43:40] remote: Resolving deltas: 100% (711/711) [12:43:41] error: unpack failed: error Missing blob 83aff0fd25c82dd440dfd95165bec61a6b945908 [12:43:56] <^demon> Yeah, that just showed up in the error log...looking [12:44:11] <^demon> Ugh, that's probably because of Gustaf's change. [12:46:04] while there are gerrit experts in the room.. I was trying to pull down a change (so I could git commit --amend it) after having synced up with some changes in master. It really really wanted to merge the thing in and I could only work around it by rebase via the gerrit gui and then pull the change. what's the "right" way? [12:46:28] by "merge the thing in" I mean a new commit with "merging blah blah" [12:47:10] <^demon> qchris: Ugh, I think Gustaf's stuff is maybe not ready for primetime. [12:47:12] <^demon> http://p.defau.lt/?4B2BZFLIKsBfW4604_mbvw [12:47:24] ^demon: Push just worked. You fixed something? [12:47:28] <^demon> Nope. [12:47:46] <^demon> But I'm concerned about these stacktraces, so I'm rolling some of the experimental changes out of our build. [12:48:33] * aude wouldn't be surprised if this is related to puppet not working [12:48:38] if it's trying to fetch something from gerrit [12:48:47] <^demon> I doubt it [12:48:51] hmmmm [12:50:14] <^demon> Anyway, we're just deploying master without our experimental changes. They seem a bit wonky from siebrand's experience. [12:50:16] * aude takes a nap while puppet runs [12:50:22] <^demon> Gonna roll back now [12:51:04] ^demon: You want some traffic on Gerrit let me know. [12:51:20] ^demon: I start 2 scripts and you have up to 20 repo pulls at the same time... [12:52:49] <^demon> !log gerrit rollback to 2.7-rc1-420-g5d5c5c3, some of the new work wasn't quite ready for prime time [12:52:57] Logged the message, Master [12:53:19] rats [12:53:27] <^demon> Just 4 changes. We still have hundreds of others :) [12:53:39] ah ok then :-) [12:53:46] <^demon> It's mainly a series of changes that do a lot of work to tighten security on drafts. [12:55:05] I have never used the draft feaure [12:55:09] or feature either [12:55:15] * aude assumes nothing is private that i put on the internet [12:55:40] it's still a nice feature but i expect no privacy [12:56:10] <^demon> Yeah, with drafts they've always shot for security through obscurity. [12:56:16] <^demon> We were trying to fix that. 
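One conventional answer to apergos's question above about pulling a change down to amend it without dragging in a merge commit: fetch the change ref straight from Gerrit and work on that, instead of pulling it into an already-updated local branch (the change and patchset numbers below are illustrative, borrowing Siebrand's 67249):

    git fetch origin refs/changes/49/67249/1   # refs/changes/<last two digits>/<change>/<patchset>
    git checkout FETCH_HEAD
    # ...edit files...
    git commit --amend                         # keep the Change-Id footer so Gerrit updates the same change
    git push origin HEAD:refs/for/production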
[12:56:17] yep [12:56:20] and we all know how well that works [12:56:40] * apergos votes for job security through obscurity, proven to actually work :-D [12:56:49] :) [12:57:06] <^demon> siebrand: Thanks for the offer. Rebooting gerrit replicates all projects, so I get a nice queue of ~2000 jobs to complete already :) [12:57:36] ^demon: Just tried to open https://gerrit.wikimedia.org/r/#/c/48121/ and after 30 seconds, it's still going. Expected? [12:57:36] <^demon> apergos: Work on projects that nobody else understands (or wants to do)? ;-) [12:57:55] <^demon> siebrand: Link wfm, maybe you clicked while it was restarting? [12:58:04] * siebrand sighs. [12:58:07] possibly. [12:58:30] that and don't document anything [12:58:31] it loads in chromium but never ends for me on FF [12:58:36] then you become indispensible :-P [12:59:10] mm link not quite working for me [12:59:17] same for https://gerrit.wikimedia.org/r/#/q/Iaedbea9b5f43ac,n,z (note that in FF it doesn't even get redirected to https://gerrit.wikimedia.org/r/#/c/58082/ ) [12:59:19] <^demon> Wfm in firefox too [12:59:19] "Working" it says [12:59:28] heh [12:59:36] obviously I loaded it after you ^demon so it wasn't a startup issue [12:59:40] maybe ^demon meant "Working..." For Me [12:59:44] I"m in ff something [12:59:54] 21.0 [12:59:58] stupid versioning [13:00:13] <^demon> I'm on 19.something [13:00:25] siebrand's URL loaded after some minutes or so but still shows "Working ..." [13:00:40] <^demon> Just updated to 21.0, wfm too [13:00:42] here it still has not loaded [13:01:36] <^demon> :\ [13:02:03] <^demon> Nothing's hitting the error log... [13:02:09] <^demon> Caches are always cold after a restart. [13:02:09] there are so many warnings in error console that I don't know what may be relevant [13:02:54] <^demon> I've got the usual warnings about the avatars 404'ing, but that shouldn't be it (it's due to the batshit way the avatar plugin point is implemented, which we don't even use) [13:03:45] does cache affect time to redirect from https://gerrit.wikimedia.org/r/#/q/Iaedbea9b5f43ac,n,z to https://gerrit.wikimedia.org/r/#/c/58082/ ? [13:04:02] might be indeed faster now *shrugs* [13:04:05] still no changeset for me [13:04:19] should I reload or wait? decisions decisions [13:04:35] open a new tab :p [13:04:43] or a new browser window [13:04:55] the new tab loaded [13:06:03] <^demon> silly gerrit. [13:06:38] <^demon> Ok....so is anyone seeing any other problems? [13:08:23] <^demon> Silence is consent. Yay! [13:08:27] * aude still napping but probably unrelated [13:08:39] shall wait for ryan or andrew [13:08:56] not yet, I clicked around some [13:09:08] * aude didn't change anything in my puppet since last time it ran successfully [13:12:29] morning akosiaris! [13:15:56] ottomata: more like well into the afternoon but good morning to you!! [13:16:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:17:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [13:19:50] New patchset: Siebrand; "Increase ramBufferSizeMB from 32 to 100" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67252 [13:25:57] good afternoon to you akosiaris! [13:26:00] good morning to me! [13:26:11] so, I have (and have had) postrm and postinst scripts for you [13:26:11] and [13:29:03] how shoudl I get them to you? [13:29:10] i don't htink I can push to your github repo [13:29:29] github? do a pull request? 
[13:29:39] baaah then I have to fork :) [13:29:50] ok ok [13:30:19] have akosiaris put it in gerrit instead of github [13:30:23] oh but akosiaris, ja you have a new repo, right? [13:30:54] paravoid: we wil do that i think, its probably on github because its annoying to figure out branching and pushing on gerrit, espeically if you are afraid you won't be able to remote delete branches withtout special permissions [13:31:32] the script are on gerrit though, in the old deb attempt [13:31:39] https://gerrit.wikimedia.org/r/#/c/53170/10/debian/postinst [13:31:39] https://gerrit.wikimedia.org/r/#/c/53170/10/debian/postrm [13:31:39] * YuviPanda mentions the GitHub to Gerrit bot, wonders if any other repositories would want to participate  [13:31:50] github to gerrit bot? [13:32:02] ottomata: cool that's fine thanx [13:32:09] ottomata: open a pull request in GitHub, automatically creates a patchset in Gerrit [13:32:15] updating it in GitHub also updates gerrit patchset [13:32:29] the github repo is already way to old now and i have done some rebases and changes [13:32:33] (mediawiki.org/wiki/User:Yuvipanda/G2G) enabled on 6-8 reposories now. [13:32:38] i will destroy the repo within the day [13:32:54] unrelated to the current issue, but is going to be useful way to have people work on GitHub while still having things on Gerrit :) [13:32:56] <^demon> YuviPanda: Fwiw, the actual gerrit plugin for github is supposed to be open sourced "real soon now." :) [13:33:01] the workflows of the two are very different so I doubt how well it would work [13:33:01] <^demon> Granted, he said the same thing 3 weeks ago [13:33:07] so the 2 links will suffice for now. Thanks ottomata [13:33:09] ^demon: yes, that's what I heard last time too :P [13:33:20] <^demon> It was repeated this week :p [13:33:33] paravoid: true, but no harm experimenting. Plus there are other advantages to GitHUb [13:33:54] ^demon: do let me know if real time soon happens sometime soon :D [13:37:45] YuviPanda: that's cool [13:37:53] does it allow arbitrary repo name mapping? :D [13:37:58] from github to gerrit? [13:38:02] i would love to try it if so [13:38:04] ottomata: no reason it couldn't. [13:38:12] ottomata: let me know which repos you want to map to [13:38:14] and i'll set it up [13:38:21] <^demon> ottomata: We've trying to push some changes to the replication plugin that allows that. [13:38:47] yeah, thanks ^demon! i talked to christian real briefly about that at AMS [13:39:02] but I think YuviPanda is talking about github -> gerrit, which is real curious [13:39:10] yes, I am [13:39:32] it does only code sync now, but I'll work on comments too once my exams end (Saturday!) [13:39:37] Hmm [13:39:58] actually i'm really not sure which repo I'd want to try right now…hmmmmm [13:40:24] <^demon> !log rolling gerrit back to 2.6-rc0-144-gb1dadd2 (version before any updates). Too many rough edges we didn't catch in testing, zuul is not happy [13:40:33] Logged the message, Master [13:40:52] can we change 'Master' to 'Lord' or 'm'lord' or soemthing similar? [13:43:37] <^demon> !log zuul seems to not be yelling about invalid json anymore [13:43:45] Logged the message, Master [13:44:53] YuviPanda: wouldn't that be sexist [13:44:58] <^demon> Back to the devil you know, I suppose. 
[13:45:19] Nemo_bis: we could make it alternate between "m'lord" and "m'lady" with 50/50 splits [13:45:38] YuviPanda: that would just make it fairly sexist [13:45:47] well, master is already that way [13:45:48] * Nemo_bis knows no devils [13:45:57] hmm [13:46:11] <^demon> You know me :) [13:46:30] you're a demon [13:46:32] <^demon> I was referring to the old build of gerrit though as the devil we knew :) [13:46:32] difference! [13:47:00] <^demon> There weren't any backward-incompatible changes, so rolling back was easier than playing whack a mole. [13:47:25] YuviPanda: I'm not sur "master" is male-only for this meaning [13:47:41] <^demon> Much as we all like new features and bugfixes, the old version *is* very stable :) [13:49:23] Nemo_bis: why so [13:52:19] O_O Those well-known devils of worth and zeal, Belial Croker an d Beelzebub Peel, With all the minor infernal crew, Were off at the very first holla-balloo ; [13:52:19] bleh, also Carlyle, Thackeray and other disgustings [13:52:46] we should make it say 'Mr. Saavik' [14:00:40] soooo akosiaris1, ja lemme know when I can clone/pull something [14:01:17] yeah i will [14:01:21] hopefully soon [14:17:07] PROBLEM - Packetloss_Average on analytics1003 is CRITICAL: STALE [14:17:08] PROBLEM - Packetloss_Average on analytics1006 is CRITICAL: STALE [14:17:21] PROBLEM - Packetloss_Average on analytics1005 is CRITICAL: STALE [14:17:36] PROBLEM - Packetloss_Average on analytics1004 is CRITICAL: STALE [14:17:47] PROBLEM - Packetloss_Average on analytics1008 is CRITICAL: STALE [14:19:10] ottomata: ^^ [14:22:30] HMMMM [14:22:45] PROBLEM - Packetloss_Average on analytics1009 is CRITICAL: STALE [14:37:32] hmm, anyway around to help me look at those? [14:37:39] it looks like icinga can't get the proper value from ganglia [14:37:50] ./check_ganglios_generic_value -m packet_loss_average -H emery.wikimedia.org -g [14:37:50] STALE [14:37:56] same for all hosts [14:42:51] LeslieCarr: you around? [14:46:55] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67214 [14:53:35] New patchset: Ottomata; "Initial commit of zookeeper module." [operations/puppet/zookeeper] (master) - https://gerrit.wikimedia.org/r/66882 [14:57:32] ottomata: maybe the hash should be the other way around? [14:57:36] id => fqdn? [14:58:12] i need to key by fqdn [14:58:18] i could search the values for fqdn i guess [14:58:22] but that seems less elegant [14:58:41] ah, the lookup you mean [14:59:13] what would happen if someone makes a typo and you assign two fqdns the same id? [14:59:27] like copy/pasting a line, changing the fqdn and forgetting to bump the id? [14:59:59] would the cluster still work minus that misconfigured server? [15:06:14] ees good question, i do not know! hm. [15:06:59] is that something I should deal with though? I mean you could do the same if you were editing the config file manually [15:08:54] just wondering all the implications [15:10:54] <^demon> If memory serves, if you assign duplicate ids to servers in the ensemble, its first come first served. [15:11:05] <^demon> The second (third, fourth) duplicate id's box wouldn't be allowed to join. [15:27:54] ehm, search in Ganglia doesn't work [15:28:22] it does [15:28:27] as long as you keep it in javascript [15:28:30] and never hit enter [15:28:31] :) [15:29:33] New review: Manybubbles; "It looks like we use the default max and min heap size for solr. On some machines this could lead t..." 
[operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/67252 [15:29:44] not the case for me in 2 browsers [15:30:15] and yes, I know how this search works:) [15:30:21] wfm with firefox 21 [15:31:13] grrr, started working for me too. slowwwwwwwly [15:31:23] manybubbles: the existing solr stuff need some work in general, if you're interested in reusing them for our new search platform, a redesign would be most welcome [15:34:34] <^demon> paravoid: We talked a fair bit yesterday. I think we're going to start iterating with the solr cloud stuff. [15:34:49] perfect [15:35:02] are you collaborating with anyone from ops already? [15:35:16] <^demon> Not yet, just started playing around yesterday in labs. [15:35:25] okay [15:35:27] <^demon> But I was starting to think about what we'd need in terms of ops support + hardware. [15:35:55] I don't feel sufficiently informed about our search infrastructure, but someone from us should be involved in that I think [15:36:19] also we have a few more solr use cases such as the ones mobile/MaxSem have worked on [15:36:46] <^demon> Translation stuff, Max's stuff for mobile, Wikidata. [15:36:48] and it'd be great to find some common ground in all that [15:37:08] <^demon> I'm pretty sure we'll be able to design it as a generally available resource like databases. [15:37:14] <^demon> Just add new cores + their schema and you're set. [15:38:49] cool [15:39:08] we're planning to have some ops member help with that [15:39:46] <^demon> It scales very nicely. Just spin up a new box, have it point to the zookeeper ensemble, and the cloud takes care of the rest. [15:39:58] <^demon> Plus 4.3 has online re-sharding, which was the biggest feature from elastic that I liked. [15:41:21] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67227 [15:41:51] ^demon, is adding a new core is as PITA in S 4 as it is in S 3.6? [15:42:10] <^demon> No, you just use the web admin panel (or the rest API) :) [15:43:03] cuz in 3.6 you use the API but then you still need to create the dir and copy the config files manually [15:43:19] <^demon> Does all that magic now :) [15:44:06] <^demon> mark: Back of the napkin calculations...if our indexes are roughly the same size and search traffic stays roughly the same...~24 boxes for Solr + probably 5 for zookeeper. [15:44:14] so is 4.0 coming?:) [15:44:14] !log reedy synchronized wmf-config/InitialiseSettings.php [15:44:21] <^demon> MaxSem: 4.3 :) [15:44:22] Logged the message, Master [15:44:31] how soon? [15:44:45] <^demon> We just started experimenting and iterating yesterday, no timelines yet. [15:44:52] meh [15:44:55] <^demon> Plus like mark said, gotta work with ops :) [15:45:11] ^demon: ops != just field ops :-) [15:45:29] <^demon> :) [15:45:48] <^demon> But I think the plan was to start with some of the smaller wikis on Solr, then work our way up...so we wouldn't need the full scale cluster to start. [15:45:57] <^demon> We won't do enwiki first :) [15:46:06] what I mean with that is that ops involvement shouldn't just be about the hardware [15:46:08] Change abandoned: MaxSem; "Screw Solr 3.6, long live Solr 4!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29827 [15:46:31] lol [15:46:34] <^demon> paravoid: Indeed. 
Hardware + advice on what to do (since I'm not ops, I just pretend) [15:46:46] cross-functional teams and everything [15:47:05] <^demon> Well, I like you guys and don't mind working with you all :) [15:47:36] hehe, I think the sentiments go both ways :) [15:48:32] Change abandoned: MaxSem; "If nobody cares, why should I?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35864 [15:49:31] MaxSem: it's not that we don't care, it's that our efforts are concentrated on getting rid of all that [15:49:42] MaxSem: and just switch to varnish already [15:50:39] i've never even seen that change [16:06:53] heya notpeter, you there? [16:07:12] he's on vacation [16:07:52] MaxSem: that too: changes without explicit reviewers tend to get missed [16:08:13] I was trying to find a way to query gerrit for operations/puppet changes without reviewers (excluding jenkins-bot) and I couldn't find a way [16:08:31] ahh, hmk [16:08:33] heh so you ops never look at open changes? [16:09:33] that no reviewers were added? I don't think so [16:09:49] is there any way to do so without looking at ALL changes? [16:11:12] <^demon> paravoid: "is:open project:operations/puppet" [16:11:18] <^demon> Or if you're lazy, "is:open puppet" [16:11:48] <^demon> Projects can also define dashboards. These show up when you click that search icon for your project. [16:12:16] this also has changes under review [16:12:36] or with explicit reviewers attached [16:12:40] paravoid, https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+-reviewerin:ldap/ops,n,z [16:12:51] <^demon> That works :) [16:13:03] <^demon> "Things in puppet not reviewed by ops" [16:13:08] hmmmm [16:13:20] that's actually what I want [16:13:22] thanks! [16:13:49] oh heh OSM module there [16:14:07] MaxSem: that's one of the cases where you *should* add a reviewer [16:14:48] <^demon> https://gerrit.wikimedia.org/r/#/c/60302/ could use review. [16:31:48] ^demon: what do these dashboards look like? Just another list of changes? [16:39:55] <^demon> More like your personal dashboard [16:40:30] New patchset: Andrew Bogott; "Fix wikidata singlenode manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67144 [16:40:36] <^demon> greg-g: https://gerrit-review.googlesource.com/#/admin/projects/gerrit,dashboards - eg these. [16:40:49] <^demon> I'd like to copy those, but the config is hidden to me :( [16:41:18] ask Google?:P [16:42:17] <^demon> https://gerrit.googlesource.com/gerrit/+log/refs/meta/dashboards :( [16:43:08] <^demon> Well, they inherit. What I really want is https://gerrit.googlesource.com/All-Projects/+/refs/meta/dashboards [16:43:41] that last one wants me to login [16:44:02] no access [16:44:11] but yeah, gotcha, thanks :) [16:44:46] <^demon> I'm gonna prod someone. [16:44:51] !log reedy synchronized php-1.22wmf6/ [16:44:58] Logged the message, Master [16:45:45] !log reedy synchronized docroot [16:45:45] w00t one week starting :) [16:45:47] Logged the message, Master [16:46:39] mutante: Any chance you could merge and deal with https://gerrit.wikimedia.org/r/#/c/67203/ ? I missed some of the more basic rewrites due to the place I copied them from [16:46:46] !log reedy synchronized w [16:46:53] Logged the message, Master [16:48:20] ^demon: https://gerrit.wikimedia.org/r/#/c/67194/ one line [16:48:35] <^demon> lol. [16:48:59] <^demon> What if someone assigns $wgGroupPermissions['*']['review'] = true;? 
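On the ZooKeeper side of the above (both the "point it at the zookeeper ensemble" model and the earlier question about duplicate server ids being first come, first served), a hedged way to check which servers actually joined, using ZooKeeper's four-letter commands; the hostnames are placeholders and 2181 is just the default client port:

    for h in zk1001 zk1002 zk1003; do
        printf '%s: ' "$h"
        echo ruok | nc "$h" 2181     # a live, healthy member answers "imok"
        echo
    done
    echo stat | nc zk1001 2181       # shows mode (leader/follower) and connected clients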
[16:49:00] !log reedy synchronized docroot [16:49:02] <^demon> Other than being an idiot ;-) [16:49:07] Logged the message, Master [16:50:02] !log reedy synchronized w [16:50:10] Logged the message, Master [16:50:17] New patchset: Reedy; "1.22wmf6 stuff" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67269 [16:50:58] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67269 [16:53:58] ^demon: yeah, that might miss the point of the ext :) [16:55:03] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: test2wiki to 1.22wmf6 and rebuild l10n cache [16:55:12] Logged the message, Master [16:56:33] ok [16:56:41] time for BBQ [16:57:46] !log reedy Started syncing Wikimedia installation... : test2wiki to 1.22wmf6 and rebuild l10n cache [16:57:53] Logged the message, Master [17:08:39] !log reedy Finished syncing Wikimedia installation... : test2wiki to 1.22wmf6 and rebuild l10n cache [17:08:47] Logged the message, Master [17:09:27] Uhh [17:09:27] -rw-rw-r-- 1 reedy wikidev 21239 Jun 6 16:55 ExtensionMessages-1.22wmf5.php [17:09:27] -rw-r--r-- 1 reedy wikidev 0 Jun 6 16:55 ExtensionMessages-1.22wmf6.php [17:12:43] paravoid, how's zookeeper looking? [17:12:46] approvy? [17:17:27] New patchset: Demon; "Turn off lucene for the time being" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67273 [17:17:52] <^demon> Anyone who could take a look at that for me ^ [17:18:27] !log reedy Started syncing Wikimedia installation... : Take 2 [17:18:34] Logged the message, Master [17:18:59] New patchset: Reedy; "Fix permissions of ExtensionMessages-1.XXwmfX.php (needs group write)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67274 [17:24:57] !log reedy Finished syncing Wikimedia installation... : Take 2 [17:25:05] Logged the message, Master [17:29:00] ottomata: hey [17:35:08] * ^demon tries to get someone's attention [17:35:22] New review: Faidon; "4 spaces, not 2 :) Sorry for not catching it earlier." [operations/puppet/zookeeper] (master) C: -1; - https://gerrit.wikimedia.org/r/66882 [17:36:29] <^demon> paravoid: Can you look at https://gerrit.wikimedia.org/r/#/c/67273/? [17:39:38] is lucene killing gitblit ? [17:41:47] New review: Faidon; "if you say so :)" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/67273 [17:41:48] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67273 [17:42:46] <^demon> Thanks, on sockpuppet now? [17:42:55] yep [17:44:14] <^demon> The performance on it is atrocious right now. [17:44:19] <^demon> And affecting general usage [17:50:37] but lucene is such a maintained clean search engine.... [17:51:51] LeslieCarr: yeah something weird with ganglia / icinga [17:52:14] things that use check_ganglios_generic_value to get stuff into icinga are returning stale [17:52:15] ok, what's up ? or should i actually investigate the backscroll ? [17:52:18] ah [17:52:22] but the values are in ganglia fine [17:52:41] ahha [17:52:47] / is full [17:52:51] fire alarm [17:52:52] bbiab [18:01:16] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: testwiki and mediawikiwiki to 1.22wmf6 [18:01:24] Logged the message, Master [18:02:31] uh so manny ULS commits live [18:03:31] Reedy: does PostEdit need to be undeployed? [18:05:45] Undeployed? [18:06:04] it was integrated into core [18:07:11] Presumably [18:08:37] New patchset: Ottomata; "Initial commit of zookeeper module." 
[operations/puppet/zookeeper] (master) - https://gerrit.wikimedia.org/r/66882 [18:08:55] back [18:13:10] Change merged: Ottomata; [operations/puppet/zookeeper] (master) - https://gerrit.wikimedia.org/r/66882 [18:16:02] someone installed lilypond which had like 2gig of docs [18:16:13] oh neon or nickel? [18:16:14] we need lilypond for Score [18:16:16] neon [18:16:19] on neon ? [18:16:25] oh, don't know about neon :) [18:16:27] yeah [18:16:35] hopefully you aren't using it for anything on the site ;) [18:18:08] New patchset: Reedy; "testwiki, test2wiki and mediawikiwiki to 1.22wmf6" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67279 [18:18:09] also i don't see lilypond anywhere in puppet ? [18:18:25] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67279 [18:19:26] RECOVERY - Disk space on ms-be1011 is OK: DISK OK [18:19:38] PROBLEM - Packetloss_Average on gadolinium is CRITICAL: STALE [18:19:42] !log reedy synchronized database lists files: [18:19:50] Logged the message, Master [18:20:27] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: [18:20:36] Logged the message, Master [18:21:50] RECOVERY - Packetloss_Average on gadolinium is OK: OK: packet_loss_average is -0.481355978261 [18:21:50] aude: http://test.wikidata.org/wiki/Main_Page [18:21:53] Step 1, hurrah! [18:22:06] well, not step 1... [18:22:36] New patchset: Reedy; "Add testwikidatawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67281 [18:22:46] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67281 [18:22:58] New patchset: Reedy; "testwikidatawiki config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65602 [18:23:04] New patchset: Reedy; "testwikidatawiki config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65602 [18:23:26] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65602 [18:23:45] gah Ryan_Lane i am having google fail and brain fail -- do you remember off the top of your head the salt pkg command to see all machines with a certain package installed ? [18:24:03] yeah, but give me a bit [18:24:17] we're going to be doing a presentation for metrics [18:24:54] LeslieCarr: it should be in the grains: http://docs.saltstack.com/ref/modules/all/salt.modules.grains.html [18:25:05] ah yay [18:25:06] !log reedy synchronized wmf-config/ [18:25:11] this, I think: salt '*' grains.get pkg:apache ? [18:25:13] err [18:25:15] Logged the message, Master [18:25:19] this, I think? salt '*' grains.get pkg:apache [18:25:22] bookmarked! [18:26:04] hrm, except that actually doesn't work ... [18:26:14] I upgraded salt yesterday. 
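On the salt question above: installed packages aren't grains, so grains.get won't find them, but the pkg execution module will; a sketch (the package name is only an example):

    salt '*' pkg.version apache2     # per-minion version string, empty where it isn't installed
    salt '*' pkg.list_pkgs           # or the full installed-package list per minion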
let me know if you have issues [18:26:26] !log reedy synchronized database lists files: [18:26:34] Logged the message, Master [18:27:31] !log reedy synchronized database lists files: [18:27:40] Logged the message, Master [18:29:03] !log Created translate tables on testwikidatawiki [18:29:11] Logged the message, Master [18:35:22] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [18:36:22] PROBLEM - Puppet freshness on mw1033 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:13] !log removed some extraneous packages from neon which had filled its root partition [18:38:14] PROBLEM - Puppet freshness on ms-be1 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:14] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:15] PROBLEM - Puppet freshness on mw63 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:15] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:16] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:16] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:17] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:17] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:17] !log reedy synchronized wikidata.dblist [18:38:22] Logged the message, Mistress of the network gear. 
[18:38:30] Logged the message, Master [18:43:09] !log reedy synchronized database lists files: [18:43:17] Logged the message, Master [18:47:31] New patchset: Reedy; "Add testwikidatawiki to the wikidata dblist" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67284 [18:47:54] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67284 [18:52:17] !log reedy synchronized wmf-config/ [18:52:25] Logged the message, Master [18:52:57] !log reedy synchronized database lists files: [18:53:05] Logged the message, Master [19:00:54] !log reedy synchronized w/auth-api.php [19:01:01] Logged the message, Master [19:03:41] !log reedy synchronized wmf-config/CommonSettings.php 'Set wgSecurePollScript' [19:03:49] Logged the message, Master [19:04:35] New patchset: Reedy; "Add auth-api entrypoint for SecurePoll in w" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67287 [19:08:17] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67287 [19:16:53] !log reedy synchronized wmf-config/InitialiseSettings.php [19:17:00] Logged the message, Master [19:30:47] RECOVERY - twemproxy process on mw6 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:07] RECOVERY - twemproxy process on mw37 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:18] RECOVERY - twemproxy process on mw115 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:19] RECOVERY - twemproxy process on mw55 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:27] RECOVERY - twemproxy process on mw1050 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:27] RECOVERY - twemproxy process on mw61 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:27] RECOVERY - twemproxy process on mw99 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:37] RECOVERY - twemproxy process on mw113 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:31:47] RECOVERY - twemproxy process on srv235 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:17] RECOVERY - twemproxy process on mw2 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:17] RECOVERY - twemproxy process on mw10 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:17] RECOVERY - twemproxy process on mw4 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:20] !log reedy synchronized php-1.22wmf5/extensions/WikimediaMaintenance/ [19:33:27] RECOVERY - twemproxy process on mw8 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:27] RECOVERY - twemproxy process on mw14 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:27] RECOVERY - twemproxy process on mw5 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:27] RECOVERY - twemproxy process on mw3 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:28] Logged the message, Master [19:33:38] RECOVERY - twemproxy process on mw12 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:38] RECOVERY - twemproxy process on mw7 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name 
nutcracker [19:33:38] RECOVERY - twemproxy process on mw9 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:48] RECOVERY - twemproxy process on mw15 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:48] RECOVERY - twemproxy process on mw13 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:48] RECOVERY - twemproxy process on mw11 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:33:48] RECOVERY - twemproxy process on mw1 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:34:18] !log reedy synchronized wmf-config/interwiki.cdb 'Updating interwiki cache' [19:34:26] Logged the message, Master [19:34:35] New patchset: Reedy; "Update interwiki cache" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67296 [19:34:49] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67296 [19:38:13] !log reedy synchronized wmf-config/InitialiseSettings.php [19:38:21] Logged the message, Master [19:38:46] New patchset: Reedy; "Enable import from wikidatawiki to testwikidatawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67297 [19:40:04] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67297 [19:40:26] RECOVERY - twemproxy process on fenari is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:40:36] RECOVERY - twemproxy process on hume is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:40:46] RECOVERY - twemproxy process on tmh1 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:40:56] RECOVERY - twemproxy process on srv193 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [19:41:14] !log reedy synchronized php-1.22wmf6/extensions/WikimediaMaintenance/ [19:41:29] Logged the message, Master [19:43:54] New patchset: Faidon; "Ceph: upgrade to Cuttlefish" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67299 [19:44:13] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67299 [19:52:55] New patchset: awjrichards; "Remove unused 'site' key from wgMFCustomLogos" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67300 [19:54:12] New review: MaxSem; "To be deployed after https://gerrit.wikimedia.org/r/67159" [operations/mediawiki-config] (master); V: 2 C: -2; - https://gerrit.wikimedia.org/r/67300 [19:59:07] New patchset: Ottomata; "Adding zookeeper submodule" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67302 [20:02:27] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67302 [20:06:05] New patchset: Ottomata; "Renaming role::hadoop classes to role::analytics::hadoop" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66976 [20:06:12] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66976 [20:08:06] PROBLEM - Host wtp1008 is DOWN: PING CRITICAL - Packet loss = 100% [20:08:16] RECOVERY - Puppet freshness on mw1033 is OK: puppet ran at Thu Jun 6 20:08:08 UTC 2013 [20:08:16] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Thu Jun 6 20:08:09 UTC 2013 [20:09:33] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [20:09:46] PROBLEM - NTP on wtp1008 is CRITICAL: NTP CRITICAL: Offset 
unknown [20:10:36] New patchset: Ottomata; "Adding role/analytics/zookeeper.pp for Analytics zookeeper puppetization in labs and production" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67306 [20:11:10] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67306 [20:12:10] New patchset: Ottomata; "Running zookeeper on labs kraken-puppet instance (so we don't conflict with cdh4 zookeeper packages)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67307 [20:12:42] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67307 [20:12:44] RECOVERY - NTP on wtp1008 is OK: NTP OK: Offset 0.003912568092 secs [20:21:33] LeslieCarr: Random informal question: how would you feel about taking the Parsoid setup in Tampa and killing it with fire? [20:22:46] It can't support the load that we're gonna put on the eqiad cluster starting a few weeks from now, so we'd have to either expand it (which seems silly) or not use it, in which case we might as well repurpose the boxes or whatever [20:23:10] RobH: Wondering what you think too ---^^ [20:26:09] New patchset: Andrew Bogott; "Don't override $wgServer if $_SERVER["SERVER_NAME"] is undefined." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67311 [20:26:12] New patchset: Ottomata; "Using hash for zookeeper_hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67312 [20:26:25] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67312 [20:31:12] New patchset: Catrope; "Send all Parsoid traffic to eqiad, even when running out of Tampa" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67314 [20:33:19] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67314 [20:33:20] New review: Manybubbles; "Thank you so much for making this change! It helps folks like me who aren't yet used to PHP and get..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/67311 [20:33:59] !log reedy synchronized php-1.22wmf6/extensions/ [20:34:07] Logged the message, Master [20:35:34] !log catrope synchronized wmf-config/CommonSettings.php 'Send all Parsoid requests to eqiad' [20:35:42] Logged the message, Master [20:36:02] New patchset: Reedy; "Kill 1.22wmf1 and 1.22wmf2" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67317 [20:36:14] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67317 [20:37:11] greg-g, could i push out zero ext (minor change) -- adding some logging for weird cases [20:37:58] In the puppet configs, if a maintenance script is set to run every "monthday => '*/3'", what does that mean? The 3rd of each month? Every 3rd day? [20:37:58] RoanKattouw: hehe [20:38:36] RoanKattouw: first off…. FIRE!!!!! secondly, sure :) [20:38:40] more to decom, the better [20:38:53] if it can't handle it anyways, better not make it appear as if it could [20:38:58] Yeah exactly [20:39:01] kaldari: IIRC it's 3rd of the month [20:39:06] No [20:39:10] It's the 3rd, 6th, 9th, ... [20:39:11] LeslieCarr: Thanks! 
[20:39:13] oops [20:39:25] ok [20:39:27] "monthday" => "3" would be the third [20:39:28] RoanKattouw wins :) [20:39:32] so any day divisible by 3 [20:39:37] Yes [20:39:41] got it [20:40:00] Which means every three days except occasionally more or less frequently around month boundaries [20:40:11] and includes today :) [20:40:20] Indeed [20:40:20] which is what I was wondering [20:40:27] kaldari, it's 7th there:) [20:40:40] rats :P [20:40:46] New patchset: Mattflaschen; "Remove PostEdit configuration on all wikis." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67318 [20:40:48] where? [20:41:26] I assume the puppet servers are on UTC [20:41:45] haven't we switched them to moscow time? [20:41:47] but I don't actually know [20:42:06] The puppet server doesn't make that determination [20:42:19] It just writes */3 into the crontab, then cron on the local machine schedules things [20:42:58] New patchset: Reedy; "Update size related dblists" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67321 [20:44:04] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67321 [20:44:41] !log reedy synchronized database lists files: [20:44:48] Logged the message, Master [20:44:51] Reedy, are you deploying anything? [20:45:02] Random things [20:45:18] ok, so any objections if i push out a few minor changes to zero config? [20:45:23] i mean - zero extension [20:45:39] New patchset: CSteipp; "Add test.wikidata.org to SUL2" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67322 [20:45:41] no scaping, just dir-sync [20:45:54] Fine by me [20:46:08] I don't think it's still my window [20:46:23] greg-g: UTC is confusing. I can never remember if I'm in UTC or not due to the time of year :p [20:46:39] It's supposed to be e3 right now [20:46:44] Reedy, there's a gadget that displays UTC [20:47:13] We don't have it on wikitech currently [20:47:19] and it's also not so useful at the top of the screen [20:50:22] New patchset: Jalexander; "wmf favicon for votewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67327 [20:50:31] * Reedy pets Jamesofur [20:50:46] * Jamesofur purrs [20:51:26] Reedy: blame siebrand, he makes me do everything in UTC ;) [20:51:45] I blame siebrand for lots of things ;) [20:51:55] we need a super awesome cache breaking function to display everything in the users timezone [20:52:04] :) [20:52:16] isn't there a gadget for that on en? [20:52:29] Probably [20:52:56] "Change UTC-based times and dates, such as those used in signatures, to be relative to local time." [20:53:05] yup lol [20:53:21] Reedy: the wiki looks great [20:54:05] * aude got user id 28 :/ [20:54:11] on wikidata i am user #13 :) [20:54:45] I'm still suprised so few people have logged into https://login.wikimedia.org [20:54:49] dir-syncing zero ext... [20:54:50] 116 [20:54:53] Is Jenkins down? Or just very busy? [20:55:11] "The Wikidata Test Wiki may also be used in a classroom environment for teaching basic wiki editing to avoid the great confusion resulting from creating pages and breaking a regular project" :) [20:55:27] New review: Spage; "no postedit left in mediawiki-config after this change" [operations/mediawiki-config] (master) C: 2; - https://gerrit.wikimedia.org/r/67318 [20:55:31] good luck :) [20:55:37] aude: Guess it needs a config update for test2 to use it as a repo. 
Then any relevant duplication of cronjobs etc [20:55:52] sure, [20:55:58] i can look at it more tomorrow [20:56:00] I enabled import from wikidata... No idea whow well that may work [20:56:07] should be ok [20:57:03] !log yurik synchronized php-1.22wmf5/extensions/ZeroRatedMobileAccess [20:57:10] New patchset: GWicke; "Clamp parsoid host to "parsoid"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67328 [20:57:10] Logged the message, Master [20:57:33] No doubt a few wrinkles to fix later [20:57:38] sure [20:57:40] !log Restarting zuul on gallium [20:57:48] Logged the message, Mr. Obvious [20:59:14] !log yurik synchronized php-1.22wmf6/extensions/ZeroRatedMobileAccess [20:59:23] Logged the message, Master [20:59:24] New review: Spage; "recheck" [operations/mediawiki-config] (master); V: 1 - https://gerrit.wikimedia.org/r/67318 [21:01:26] Hrmph [21:01:32] Zuul is not receiving events from Gerrit at all [21:01:33] Btw, greg-g, can I get a time today to make testwikidatawiki use the new SUL? [21:01:48] ^demon|away, Krinkle|detached: Zuul is totally broken ---^^ [21:02:14] csteipp: sure thing, between 3-4 is open [21:02:26] greg-g: Cool, I'll do that [21:02:33] It should be quick [21:03:04] cool [21:04:13] New review: Spage; "forcing, Zuul/Jenkins broken" [operations/mediawiki-config] (master); V: 2 - https://gerrit.wikimedia.org/r/67318 [21:04:14] Change merged: Spage; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67318 [21:04:53] greg-g, i'm done with deployment, hope i haven't stepped too hard on anyone's toes :) [21:05:01] csteipp: would be cool to have new SUL [21:05:28] greg-g: Please be aware that Zuul/Jenkins is dead right now, this may impede deployments. I've tried to get people's attention but Antoine, Timo and Chad are all AWOL [21:05:36] now if only i could figure out where all my log entries are coming from :(\ [21:05:44] I also tried restarting the service as documented, but that didn't do anything [21:06:02] yurik, thanks, E3 is deploying. greg-g, we're forcing our merges. [21:06:25] weee :( [21:07:15] spagewmf, did i break it? :( [21:09:20] !log Running updateSpecialPages.php on testwiki to test new Disambigutor special pages [21:09:28] Logged the message, Master [21:10:18] !log completed updateSpecialPages.php on testwiki [21:10:26] Logged the message, Master [21:13:51] New patchset: Krinkle; "wgRC2UDPPrefix: Use hostname-".org" instead of lang.site" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47307 [21:14:18] RoanKattouw: Checking now [21:14:28] User::addToDatabase: hit a key conflict attempting to insert user 'Doruta micutza' row, but it was not present in select! [21:14:37] csteipp: strange that there are still some of those [21:14:37] RoanKattouw: Did you do restart or reload of zuul? [21:14:40] restart [21:15:17] Aaron|home: how many per day are we getting? [21:15:28] RoanKattouw: anything interesting from stdout of that command? [21:15:39] Nope [21:15:54] Oh! [21:15:56] In the logs: [21:15:58] 2013-06-06 21:07:36,994 ERROR gerrit.GerritWatcher: Exception on ssh event stream: [21:16:09] RoanKattouw: When did it stop? Within last hour? [21:16:17] s/stop/fail [21:16:18] I think so [21:16:19] Krinkle: Longer ago, I think. 
[21:16:26] raise ValueError("No JSON object could be decoded") [21:16:35] Gerrit change was rollbacked earlier today, that's why I'm wondering [21:16:40] csteipp: ~68 [21:16:57] That usally means gerrit is sending invalid JSON [21:17:01] Hmmm [21:17:04] Let me try kicking it again [21:17:11] As in sending it another Gerrit event [21:17:18] Cause it has been responding to a few now [21:17:36] RoanKattouw: invalid json in the past has been the result of the connection being terminated between gerrit and zuul. Can we try restarting Gerrit? [21:19:08] Yup and more invalid JSON errors in the log [21:19:39] Jenkins is idling and waiting, so no likely issues on that end. This is definitely Zuul/Gerrit not communicating, e.g. the json packages not being delivered. [21:19:58] invalid json is a bit of an odd error, more like absence of json and some low-level connection borking [21:20:24] 13:44 ^demon: zuul seems to not be yelling about invalid json anymore [21:20:25] 13:41 ^demon: rolling gerrit back to 2.6-rc0-144-gb1dadd2 (version before any updates). Too many rough edges we didn't catch in testing, zuul is not happy [21:20:25] 12:53 ^demon: gerrit rollback to 2.7-rc1-420-g5d5c5c3, some of the new work wasn't quite ready for prime time [21:20:26] 12:30 ^demon: gerrit's back, running 2.7-rc1-424-gef469ac [21:22:11] https://github.com/openstack-infra/zuul/blob/master/zuul/lib/gerrit.py?source=cc#L37-L43 [21:23:31] The invalid json errors are not appeared as a result of gerrit events [21:23:49] Oh OK [21:23:53] I think it is the error appearing when it tries to establish the socket [21:23:57] I don't really want to touch restarting Gerrit right now [21:24:01] But you could ask Ryan to [21:24:08] I didn't get any new json errors when triggering gerrit events [21:24:11] fxing gitdeploy right now [21:24:13] what is the normal method to set up extension configurations when that extension is not yet deployed to all wikis? [21:24:20] I *really* need a new keyboard [21:24:34] my keys are all screwed up and keypresses don't always go through :( [21:24:45] Parsoid is in wmf6, but not in wmf5 [21:24:56] * Damianz wonders if shipping Ryan_Lane a keyboard with no letters on would be evil [21:25:04] Damianz: it would work for me [21:25:08] I don't look at the keyboard [21:25:16] gwicke: 'wmgFoobarConfigThing' => array( 'default' => false, 'enwiki' => true ), etc [21:25:23] gwicke: in $wgConf in InitializeSettings.php, create a config key the maps to an array( default => false, [21:25:27] Reedy, damn you. [21:25:35] !log Restarting zuul on gallium [21:25:41] He was talking about an extension that's in wmf6 but not wmf5 [21:25:43] Logged the message, Master [21:25:50] I told him he needs to put it in wmf5 as well or things will break [21:25:53] ah, ok- is there a way to key on the branch too? [21:26:06] or do I have to list the wikis the branch might be deployed to explicitly? [21:26:17] No, you can't do any such thing [21:26:24] It has to be present in both branches, or it will break [21:26:46] Unless you depend on something that's new in core in wmf6, it's best to just enable it on all wikis that have VE [21:27:11] the code won't be there in wikis that still run wmf5 [21:27:23] (and by "depend" I mean "horrible things will happen when running on wmf6", not "it doesn't work but doesn't actively break anything") [21:27:23] RoanKattouw: ..what will break? 
you don't require_once unless the configuration array indicates it's enabled for the wiki, which it will only for wmf6 wikis [21:27:35] ori-l: Localization and extension-list stuff [21:27:49] That's all global [21:28:23] I was thinking about simply checking if the extension is there, but that won't likely work in extension-list [21:28:57] No, extension-list is global [21:28:57] New patchset: Catrope; "Fix port for Parsoid health check" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67336 [21:29:06] So you're gonna have to add the extension to both branches [21:29:37] And unless there's a really good reason not to, I recommend you enable it on all wikis that have VE (by adding to the $wmgUseVisualEditor block in CommonSettings) [21:29:42] LeslieCarr: ping [21:29:52] RoanKattouw: that is what I have right now [21:29:54] LeslieCarr: One-liner: https://gerrit.wikimedia.org/r/67336 [21:29:56] gwicke: OK good [21:30:04] That should work fine, as long as you also add the ext to wmf5 [21:30:17] Which I or Reedy can help you with if you like [21:30:25] that would be helpful [21:31:04] preilly: what's up ? [21:31:21] the wmf6 bit was done by adding Parsoid to the list of default extensions in release/make-wmf-branch [21:31:43] New patchset: Ryan Lane; "git-deploy updates for salt upgrade" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67337 [21:31:46] Hmm, this is scary, there was no monitoring for parsoidcache eqiad LVS [21:31:47] LeslieCarr: can I send you a PM [21:31:53] hehe sure [21:31:54] gwicke: I'll do that then [21:32:00] i always find it hilarious when people ask me if they can send me a pm [21:32:08] bleh. I really need a test deploy repo [21:32:08] RoanKattouw: cool, thanks! [21:32:56] Ryan_Lane: you can use EventLogging, if you like [21:33:09] will it not cause issues? [21:33:10] gwicke: Bear with me while I perform some time-consuming git operations :) [21:33:29] I need to merge these changes and deploy them all the way through first [21:33:46] Ryan_Lane: no; I've been lazy and haven't configured the binaries to run from the deploy target dir, so it still requires an explicit run of python setup.py install. My negligence is your gain :P [21:33:52] oh [21:33:53] cool [21:34:05] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67337 [21:34:37] soon I'll set up a sham repo and deployment target [21:35:10] Ryan_Lane: RoanKattouw: I think I've reached a blank. I'm not getting anything from Zuul, all I can think of is giving Gerrit another kick. I've got various tabs on logs to see what it does. [21:35:28] ok. want me to do so? [21:35:34] yes [21:35:43] openstack folks told us increasing the ssh connections would help [21:35:55] Yes, that's what they said last time [21:35:56] I'm not sure if ^demon went through those changes [21:36:12] maybe I'll apply some myself next week [21:36:31] restarting gerrit [21:37:20] Ryan_Lane: From the regualr log I onyl see invalid json (it started again now that you did the restart) [21:37:26] Ryan_Lane: RoanKattouw: From debug.log I see "error: [Errno 111] Connection refused " [21:37:39] New patchset: GWicke; "Enable Parsoid for all VisualEditor wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [21:37:48] https://gist.github.com/anonymous/eb876b0e818e368e323c [21:38:02] It is unable to make a connection to Gerrit it seems [21:38:20] it's restarted now [21:38:40] Ah, I see. 
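The per-wiki configuration pattern discussed above (a wmg* switch defined in InitialiseSettings.php, consumed by a conditional require_once in CommonSettings.php) is roughly the following. This is a hedged sketch, not the actual wmf-config files: the setting name wmgUseFoobar, the Foobar extension path and the extra option are invented for illustration.

<?php
// Sketch only -- not the real wmf-config. "wmgUseFoobar" and the Foobar
// extension are hypothetical names standing in for the pattern described
// in the channel ('wmgFoobarConfigThing' => array( 'default' => false, ... )).

// wmf-config/InitialiseSettings.php: the per-wiki switch lives in $wgConf.
// 'default' applies everywhere unless a more specific dbname key overrides it.
$wgConf->settings['wmgUseFoobar'] = array(
	'default' => false,       // off everywhere...
	'mediawikiwiki' => true,  // ...except wikis listed explicitly
	'enwiki' => true,
);

// wmf-config/CommonSettings.php: by the time this runs, the wmg* value for
// the current wiki has been extracted into a plain global, so the extension
// is only loaded where the switch is true. Wikis still on a branch that does
// not ship the extension simply keep the switch at false.
if ( $wmgUseFoobar ) {
	require_once "$IP/extensions/Foobar/Foobar.php";
	$wgFoobarSomeOption = true;  // extension-specific settings go here
}

The switch alone is not sufficient, though: as noted above, extension-list and the localisation cache are global, so the extension's files have to be present in every deployed branch (here both 1.22wmf5 and 1.22wmf6) even on wikis where the switch stays false.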
It was because Gerrit was temprarily down [21:38:45] New patchset: GWicke; "Enable Parsoid for all VisualEditor wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [21:38:48] connection no longer being refused [21:38:53] and... it seems Zuul is back up [21:38:58] New patchset: GWicke; "Enable Parsoid for all VisualEditor wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [21:39:20] New patchset: GWicke; "Enable Parsoid for all VisualEditor wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [21:39:35] !log catrope synchronized php-1.22wmf5/extensions/Parsoid 'Adding Parsoid to wmf5' [21:39:43] Logged the message, Master [21:39:50] New patchset: Reedy; "Fixup docroot code to work for wikimanias all from one docroot folder" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67341 [21:41:31] gwicke: Your change looks fine, let's go for it [21:41:41] Krinkle: Thanks so much man [21:41:46] hm. well, deploy works, but reporting isn't [21:41:51] * Ryan_Lane grumbles [21:42:15] New patchset: Reedy; "Fixup docroot code to work for wikimanias all from one docroot folder" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67341 [21:42:22] !log Zuul gave up on connecting to Gerrit. Restarting Zuul made it try again. Restarting Gerrit again made it connect and it seems back up. [21:42:30] Logged the message, Master [21:42:55] RoanKattouw: wmf5 would need to be synced before the config can go live [21:43:06] New patchset: Reedy; "Fixup docroot code to work for wikimanias all from one docroot folder" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67341 [21:43:13] gwicke: Already done [21:43:14] oh, you already did- nm [21:43:14] I'll hang out for a few minutes to see if it stays. Otherwise I'll be off again until I get home. [21:43:21] gwicke: logmsgbot !log catrope synchronized php-1.22wmf5/extensions/Parsoid 'Adding Parsoid to wmf5' [21:44:14] RoanKattouw: hold off for a moment.. [21:44:36] New patchset: Reedy; "Simplify wikimania apache conf" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/62566 [21:45:03] gwicke: I'm not doing anything [21:45:12] Do you have deployer shell access? [21:45:52] New patchset: GWicke; "Enable Parsoid for all VisualEditor wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [21:46:32] RoanKattouw: I believe so [21:46:40] New patchset: Reedy; "Simplify wikimania apache conf" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/62566 [21:48:54] a bit scary that there is no way to test a config first [21:50:57] oh. reporting is working after all, it was broken before I fixed the other stuff. heh. horray [21:51:00] *hooray too [21:52:55] The E3 deploy is running late, but we should be done before lightning starts. [21:53:15] DocumentRoot "/usr/local/apache/common/docroot/arbcom_nlwiki" [21:53:15] ServerName arbcom.de.wikimedia.org [21:53:18] I wonder how that ever worked [21:54:22] hahaha [21:55:43] RoanKattouw: btw, your parsoid cluster is marked with a grain, so you can target all of them at once: salt -G 'cluster:parsoid' saltutil.sync_all [21:56:02] all system are actually marked with grains, based on their ganglia cluster [22:00:32] Oh cool [22:00:34] !log mflaschen Started syncing Wikimedia installation... 
: Deploying PostEdit change to core, GettingStarted and GuidedTour for E3 deployment [22:00:43] Logged the message, Master [22:02:17] New patchset: Catrope; "Remove misc::parsoid::cache, unused now" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67347 [22:02:31] superm401: Can you ping me when you're done? I had a deploy before 4 too. [22:02:34] New patchset: Catrope; "Clean up DSH list for Parsoid" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67348 [22:02:45] I have an LD window but I'll be patient [22:03:05] New patchset: Catrope; "Actually monitor parsoidcache in eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67349 [22:03:19] New patchset: Catrope; "Kill Parsoid in Tampa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67350 [22:03:32] csteipp, yes, sorry, I didn't know that. [22:03:34] New patchset: Reedy; "Move/redirect arbcom.XX wikis to arbcom-XX" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/67351 [22:03:52] gwicke: Looks good. Let me know when you're actually deploying it (not now, because superm401 is deploying) and I'll +2 it then [22:03:54] I think csteipp is next [22:04:07] If there's no one else after csteipp, gwicke could go after him, and then me [22:05:32] ok [22:05:52] New review: Alex Monk; "(1 comment)" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/47307 [22:06:48] New patchset: Catrope; "Decommission the Parsoid boxes in Tampa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67354 [22:07:35] New review: Alex Monk; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47307 [22:08:30] New review: Alex Monk; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47307 [22:09:58] PROBLEM - Puppet freshness on mw1143 is CRITICAL: No successful Puppet run in the last 10 hours [22:09:58] PROBLEM - Puppet freshness on mw23 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:29] PROBLEM - Host wtp1008 is DOWN: PING CRITICAL - Packet loss = 100% [22:10:58] PROBLEM - Puppet freshness on amssq51 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:58] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:58] PROBLEM - Puppet freshness on amssq56 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:58] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:58] PROBLEM - Puppet freshness on cp1017 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:58] PROBLEM - Puppet freshness on es1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:59] PROBLEM - Puppet freshness on lvs1002 is CRITICAL: No successful Puppet run in the last 10 hours [22:10:59] PROBLEM - Puppet freshness on mc14 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:00] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:00] PROBLEM - Puppet freshness on mw1021 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:01] PROBLEM - Puppet freshness on mw1104 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:01] PROBLEM - Puppet freshness on mw1131 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:02] PROBLEM - Puppet freshness on mw1154 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:02] PROBLEM - Puppet 
freshness on mw1155 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:03] PROBLEM - Puppet freshness on mw1208 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:03] PROBLEM - Puppet freshness on mw74 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:03] !log mflaschen Finished syncing Wikimedia installation... : Deploying PostEdit change to core, GettingStarted and GuidedTour for E3 deployment [22:11:04] PROBLEM - Puppet freshness on mw93 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:04] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours [22:11:05] PROBLEM - Puppet freshness on sq84 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:05] PROBLEM - Puppet freshness on srv261 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:06] PROBLEM - Puppet freshness on srv283 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:06] PROBLEM - Puppet freshness on ssl1003 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:07] gwicke: Are you messing with Parsoid (as in actual Parsoid, not the MW part of it) right now? Mind if I deploy a config change? [22:11:10] Logged the message, Master [22:11:16] Yay fun, puppet breakage [22:11:20] RoanKattouw: no, I'm not doing anything yet [22:11:23] OK [22:11:24] csteipp, done. [22:11:28] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 0.46 ms [22:11:33] Then I'm gonna do a config change in /srv/deployment/parsoid/config [22:11:55] hooray. batch runs are working in this version of salt :) [22:11:58] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:58] PROBLEM - Puppet freshness on capella is CRITICAL: No successful Puppet run in the last 10 hours [22:11:58] PROBLEM - Puppet freshness on cp1034 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:58] PROBLEM - Puppet freshness on cp1018 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:58] PROBLEM - Puppet freshness on db1005 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:58] PROBLEM - Puppet freshness on db1026 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:59] PROBLEM - Puppet freshness on db1027 is CRITICAL: No successful Puppet run in the last 10 hours [22:11:59] PROBLEM - Puppet freshness on db32 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:00] PROBLEM - Puppet freshness on db69 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:00] PROBLEM - Puppet freshness on db67 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:01] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:01] PROBLEM - Puppet freshness on grosley is CRITICAL: No successful Puppet run in the last 10 hours [22:12:02] PROBLEM - Puppet freshness on hume is CRITICAL: No successful Puppet run in the last 10 hours [22:12:02] PROBLEM - Puppet freshness on mw1180 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:03] PROBLEM - Puppet freshness on mw1049 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:03] PROBLEM - Puppet freshness on mw1084 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:03] superm401: thanks, RoanKattouw it should be about 10-20 mins [22:12:04] PROBLEM - Puppet freshness on ms6 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:04] PROBLEM - Puppet freshness on mw1194 is 
CRITICAL: No successful Puppet run in the last 10 hours [22:12:05] PROBLEM - Puppet freshness on mw13 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:05] PROBLEM - Puppet freshness on palladium is CRITICAL: No successful Puppet run in the last 10 hours [22:12:05] ewwww [22:12:06] PROBLEM - Puppet freshness on sanger is CRITICAL: No successful Puppet run in the last 10 hours [22:12:06] PROBLEM - Puppet freshness on snapshot1002 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:07] PROBLEM - Puppet freshness on sq62 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:07] PROBLEM - Puppet freshness on srv299 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:08] PROBLEM - Puppet freshness on ssl4 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:08] PROBLEM - Puppet freshness on tmh1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:12:14] !log Running puppet on srv283 to figure out what broke [22:12:22] Logged the message, Mr. Obvious [22:12:47] Change merged: CSteipp; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67322 [22:13:03] PROBLEM - Puppet freshness on amssq31 is CRITICAL: No successful Puppet run in the last 10 hours [22:13:03] PROBLEM - Puppet freshness on amssq35 is CRITICAL: No successful Puppet run in the last 10 hours [22:13:03] PROBLEM - Puppet freshness on amssq37 is CRITICAL: No successful Puppet run in the last 10 hours [22:13:03] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: No successful Puppet run in the last 10 hours [22:13:03] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: No successful Puppet run in the last 10 hours [22:13:19] RoanKattouw: what do you plan to change? [22:13:33] (our code is not updated yet) [22:13:36] !log WTF, puppet ran just fine on srv283 [22:13:43] gwicke: Just adding a bunch of interwiki prefixes in localsettings.js [22:13:44] Logged the message, Mr. Obvious [22:13:58] RoanKattouw: for wikipedias? 
[22:14:37] wikipedia languages are all predefined by default [22:14:51] New patchset: Ottomata; "Fixing public-datasets rsync job on stat1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67355 [22:14:58] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:58] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:58] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:58] PROBLEM - Puppet freshness on db1010 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:58] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:00] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67355 [22:15:34] New review: RobH; "looks good, but dependent changes need merge first" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/67354 [22:15:59] PROBLEM - Puppet freshness on aluminium is CRITICAL: No successful Puppet run in the last 10 hours [22:15:59] PROBLEM - Puppet freshness on amssq40 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:59] PROBLEM - Puppet freshness on amssq59 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:59] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:59] PROBLEM - Puppet freshness on cp3006 is CRITICAL: No successful Puppet run in the last 10 hours [22:16:22] RoanKattouw: if you used $wgLanguageCode then our default prefixes would work as-is [22:16:48] We use $wgDBname for the prefixes now [22:16:54] Because we need to be able to support non-Wikipedias [22:17:35] New review: RobH; "seems fine to me" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/67350 [22:18:58] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: No successful Puppet run in the last 10 hours [22:18:58] PROBLEM - Puppet freshness on cp1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:18:58] PROBLEM - Puppet freshness on cp1039 is CRITICAL: No successful Puppet run in the last 10 hours [22:18:58] PROBLEM - Puppet freshness on db1045 is CRITICAL: No successful Puppet run in the last 10 hours [22:18:58] PROBLEM - Puppet freshness on db1057 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:07] AaronSchulz: did you see the conclusion to that master query thing yesterday? 
[22:19:58] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [22:19:58] PROBLEM - Puppet freshness on cp1009 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:58] PROBLEM - Puppet freshness on db1015 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:58] PROBLEM - Puppet freshness on db1018 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:58] PROBLEM - Puppet freshness on db1046 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:43] RoanKattouw: we'll have to tweak our extension then before you can use cached versions [22:20:56] OK [22:20:58] PROBLEM - Puppet freshness on amssq36 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:58] PROBLEM - Puppet freshness on amssq41 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:58] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:58] PROBLEM - Puppet freshness on amssq60 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:59] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: No successful Puppet run in the last 10 hours [22:20:59] PROBLEM - Puppet freshness on ersch is CRITICAL: No successful Puppet run in the last 10 hours [22:20:59] PROBLEM - Puppet freshness on db1028 is CRITICAL: No successful Puppet run in the last 10 hours [22:21:00] Just let me know when it's ready [22:21:44] !log csteipp synchronized wmf-config/InitialiseSettings.php 'enable SUL2 on testwikidatawiki' [22:21:52] Logged the message, Master [22:21:58] PROBLEM - Puppet freshness on amssq58 is CRITICAL: No successful Puppet run in the last 10 hours [22:21:58] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [22:21:58] PROBLEM - Puppet freshness on cp1008 is CRITICAL: No successful Puppet run in the last 10 hours [22:21:58] PROBLEM - Puppet freshness on cp1019 is CRITICAL: No successful Puppet run in the last 10 hours [22:21:58] PROBLEM - Puppet freshness on cp3021 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:21] is there documentation on how to efficiently deploy config changes to both deployed branches? [22:22:50] Config changes only happen in one place... [22:22:58] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:58] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:58] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:58] PROBLEM - Puppet freshness on db1007 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:58] PROBLEM - Puppet freshness on db1039 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:03] git pull in wmf-config and then sync-file on the files that changed? [22:23:07] RoanKattouw: I'm done. That was easier than I remembered.. [22:23:20] Yup gwicke [22:23:31] ok, thanks! 
[22:23:55] RoanKattouw: https://gerrit.wikimedia.org/r/#/c/67338/ [22:23:58] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:59] PROBLEM - Puppet freshness on cp1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:59] PROBLEM - Puppet freshness on cp1011 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:59] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:59] PROBLEM - Puppet freshness on cp3022 is CRITICAL: No successful Puppet run in the last 10 hours [22:23:59] PROBLEM - Puppet freshness on db1030 is CRITICAL: No successful Puppet run in the last 10 hours [22:24:05] csteipp: heya, is SUL going to be on wmf5 as well as 6? [22:24:25] gwicke: Want me to merge that now? [22:24:27] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67347 [22:24:43] RoanKattouw, csteipp: if csteipp is done & I am next in line then yes [22:24:43] greg-g: The SUL2 stuff is probably not going to go out until after a UX signoff on the 19th [22:24:45] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67348 [22:25:07] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67349 [22:25:07] OK here we go hten [22:25:09] gwicke: I'm done [22:25:15] gwicke: +2ed, Jenkins will merge [22:25:17] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67350 [22:25:22] Once merged, log into tin and run git pull in /a/common [22:25:32] New patchset: Ryan Lane; "Decommission the Parsoid boxes in Tampa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67354 [22:25:33] Then scap [22:25:46] really? I though scap was way too slow [22:25:50] *thought [22:25:56] It is but you're touching extension-list [22:25:58] PROBLEM - Puppet freshness on amssq38 is CRITICAL: No successful Puppet run in the last 10 hours [22:25:58] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [22:25:58] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours [22:25:58] PROBLEM - Puppet freshness on db1006 is CRITICAL: No successful Puppet run in the last 10 hours [22:25:58] PROBLEM - Puppet freshness on db1008 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:06] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67338 [22:26:14] csteipp: ah, I may have misunderstood jdlrobson's question/context then. [22:26:37] ah, ok- yes. We want the credits message to be localized, at least in English ;) [22:26:38] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67354 [22:26:58] ^ MaxSem [22:26:59] PROBLEM - Puppet freshness on amssq33 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:59] PROBLEM - Puppet freshness on amssq42 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:59] PROBLEM - Puppet freshness on amssq45 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:59] PROBLEM - Puppet freshness on amssq49 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:59] PROBLEM - Puppet freshness on amssq54 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:14] csteipp: UX sign off?! 
:( [22:27:28] hey ops people: who wants to be the whistleblower and reveal if WMF has been tapped by the NSA as well? http://www.guardian.co.uk/world/2013/jun/06/us-tech-giants-nsa-data [22:27:31] we just want the tokenzzzzzz [22:27:57] !log deploying https://gerrit.wikimedia.org/r/#/c/67338/ [22:27:58] PROBLEM - Puppet freshness on amssq39 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:59] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:59] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:59] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:59] PROBLEM - Puppet freshness on amssq61 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:59] PROBLEM - Puppet freshness on db1037 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:05] Logged the message, Master [22:28:58] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:58] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:59] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:59] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:59] PROBLEM - Puppet freshness on db52 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:59] PROBLEM - Puppet freshness on mc1003 is CRITICAL: No successful Puppet run in the last 10 hours [22:28:59] PROBLEM - Puppet freshness on maerlant is CRITICAL: No successful Puppet run in the last 10 hours [22:29:58] PROBLEM - Puppet freshness on amssq52 is CRITICAL: No successful Puppet run in the last 10 hours [22:29:58] PROBLEM - Puppet freshness on cerium is CRITICAL: No successful Puppet run in the last 10 hours [22:29:58] PROBLEM - Puppet freshness on cp1007 is CRITICAL: No successful Puppet run in the last 10 hours [22:29:58] PROBLEM - Puppet freshness on db40 is CRITICAL: No successful Puppet run in the last 10 hours [22:29:58] PROBLEM - Puppet freshness on es1 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:57] * gwicke waits for LocalisationCache updates [22:30:58] PROBLEM - Puppet freshness on amssq55 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:58] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:58] PROBLEM - Puppet freshness on cp1020 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:58] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:58] PROBLEM - Puppet freshness on cp3007 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:58] PROBLEM - Puppet freshness on amssq34 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:58] PROBLEM - Puppet freshness on amssq57 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:59] PROBLEM - Puppet freshness on caesium is CRITICAL: No successful Puppet run in the last 10 hours [22:31:59] PROBLEM - Puppet freshness on cp1003 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:59] PROBLEM - Puppet freshness on cp1013 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:59] PROBLEM - Puppet freshness on db1052 is CRITICAL: No successful Puppet run in the last 10 hours [22:31:59] PROBLEM - Puppet freshness on 
cp3010 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on amssq44 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on db1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on db43 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on es4 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on mc1009 is CRITICAL: No successful Puppet run in the last 10 hours [22:33:59] PROBLEM - Puppet freshness on ms-be1012 is CRITICAL: No successful Puppet run in the last 10 hours [22:35:58] greg-g / jdlrobson: Ah, sorry... that's another part of "SUL". The part mobile wants (centralauth token) should be in wmf6 [22:36:02] !log gwicke Started syncing Wikimedia installation... : [22:36:07] PROBLEM - Puppet freshness on amssq53 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:07] PROBLEM - Puppet freshness on cp1002 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:07] PROBLEM - Puppet freshness on cp1005 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:07] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:07] PROBLEM - Puppet freshness on db1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:10] Logged the message, Master [22:36:28] csteipp: gotcha, my bad [22:38:07] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:07] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:07] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:07] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:07] PROBLEM - Puppet freshness on db1002 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on amssq47 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on amssq48 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on db1034 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on db1051 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on mw1010 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on mw1091 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:07] PROBLEM - Puppet freshness on mw1092 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:08] PROBLEM - Puppet freshness on mw116 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:08] PROBLEM - Puppet freshness on mw60 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:09] PROBLEM - Puppet freshness on search30 is CRITICAL: No successful Puppet run in the last 10 hours [22:39:09] PROBLEM - Puppet freshness on sockpuppet is CRITICAL: No successful Puppet run in the last 10 hours [22:43:55] !log gwicke Finished syncing Wikimedia installation... 
: [22:44:03] Logged the message, Master [22:44:08] that's it for me [22:44:17] Excellent [22:44:24] I'll take over in a bit [22:44:37] New patchset: Yurik; "Updated Orange Tunisia IPs per carrier request" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67360 [22:44:44] mw.org still seems to work, so good [22:44:44] gwicke: congrats. btw, the dangling colon is there because the sync scripts take an optional second argument for a log message [22:45:37] ah, I see [22:45:43] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67151 [22:45:51] New patchset: Reedy; "Added a comment for the wikidata hostname" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65861 [22:45:58] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65861 [22:46:32] !log catrope synchronized visualeditor.dblist 'Enabling VisualEditor on the remaining 270 Wikipedias' [22:46:40] Logged the message, Master [22:47:29] New patchset: Reedy; "(bug 49176) Increase account creation limit for itwiki GLAM event" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67087 [22:47:32] that is a fast ramp-up for the Parsoid extension [22:47:36] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67087 [22:47:58] New patchset: Reedy; "wmf favicon for votewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67327 [22:48:04] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67327 [22:48:26] New patchset: Ryan Lane; "Suppress salt logging in cli operations" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67362 [22:50:27] New patchset: Ryan Lane; "Suppress salt logging in cli operations" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67362 [22:50:34] New patchset: Reedy; "I fixed a few typographical errors." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64635 [22:50:40] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64635 [22:50:47] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [22:52:30] RoanKattouw: is there a chance to get sufficient rights on the varnish machines so that I can see the logs? [22:52:42] New review: Reedy; "Why would you follow Python guidelines when writing PHP?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65860 [22:52:48] gwicke: Are they owned by root or something? [22:53:02] ls -ld /var/log/varnish/ [22:53:03] drwxr-x--- 2 varnishlog varnishlog 4096 Nov 23 2012 /var/log/varnish/ [22:53:49] Hmph [22:54:28] making that directory world-readable could be a temporary workaround [22:55:19] We'll probably just want to place admins::parsoid users in the varnishlog group [22:55:20] !log catrope synchronized wmf-config/InitialiseSettings.php 'touch' [22:55:28] Logged the message, Master [22:55:31] But you should ask binasher what he thinks we should do [22:58:21] binasher: ^^ [22:58:22] varnish doesn't write its logs out to the filesystem [22:58:43] even the Parsoid ones? [22:59:24] what's a hostname there? [22:59:29] titanium [22:59:40] and cerium [23:00:37] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [23:00:39] as non-root you can use /usr/bin/varnishlog [23:01:16] gwicke: what sort of data are you looking for? 
[23:02:39] binasher: the varnish logs on the Parsoid caches [23:02:55] those don't exist. what actual data are you looking for? [23:02:59] I just deployed the Parsoid extension that updates those caches and would like to verify that it works [23:03:37] so if there is a way to get a log of requests from those varnishes then that would be helpful [23:03:41] gwicke: would you like to see the status of requests as they're happening? [23:03:56] !log reedy synchronized php-1.22wmf5/extensions/Parsoid/php/Parsoid.php 'fix hook call' [23:04:04] Logged the message, Master [23:04:19] binasher: it does not necessarily have to be live, but some data from the last minutes or so would be helpful [23:04:45] gwicke: you want either the varnishncsa or varnishlog commands [23:04:55] gwicke: varnish logs to shared memory [23:05:01] those commands read from it [23:05:10] binasher: and those are available to mere mortals? [23:05:39] unfortunately not. for other varnish clusters, we udplog the varnishncsa output though [23:05:47] we could do that for the parsoid caches as well [23:06:03] though that doesn't help you in the immediate term [23:06:21] wait, I am getting output from vanishncsa [23:06:31] gwicke: see a lot of purges [23:06:33] ? [23:06:48] binasher: did you see the conclusion to my master query hunting yesterday? [23:07:25] gwicke: varnishncsa with no option = the backend instance. add "-n frontend" for the frontend instance [23:07:25] !log tstarling cleared profiling data [23:07:25] binasher: yeah, I guess the htcp purge daemon is running on those machines [23:07:32] Logged the message, Master [23:07:42] gwicke: -m 'RxRequest:^(?!PURGE$)' will filter the purges [23:07:56] TimStarling: i didn't, do tell! [23:08:18] http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=MySQL+eqiad&h=db1056.eqiad.wmnet&jr=&js=&v=7130&m=mysql_connections&vl=conns&ti=mysql_connections [23:08:52] binasher: thanks, it seems that our requests are arriving fine [23:08:59] 80% of enwiki master queries and about 80% of master connections also were due to a testing hack in GuidedTour [23:09:26] the other wikis where GuidedTour was enabled were affected also [23:09:51] I submitted a patch to fix it, by removing the test, Matt Flaschen merged it, and I deployed it [23:09:59] gwicke: fwiw, the new regexp daemon per-domain filtering support, so we could filter irrelevant purges [23:10:09] gwicke: that's bblack's work [23:10:09] TimStarling: :O [23:10:12] you can see the result in ganglia [23:10:27] TimStarling: were those Revision::fetchFromConds queries? [23:10:31] yes [23:10:35] that ganglia graph is awesome [23:10:41] paravoid: 100% of them are irrelevant ;) [23:10:45] omg, the enwiki qps is down to 819 [23:10:50] * paravoid tries to imagine binasher's face [23:10:52] note that it does have a broken scale [23:10:57] it was >3000 yesterday [23:11:09] ganglia is very unscientific, we were taught in school to never ever use a broken vertical scale [23:11:14] except if we went into marketing [23:11:19] gwicke: right; mark is also planning on spliting the htcp multicast groups, so these wouldn't even get to vhtcpd [23:11:36] TimStarling: what do you mean by broken scale? [23:11:44] I mean it doesn't start at zero [23:11:44] paravoid: I'm looking at the Parsoid varnishes- those only handle the internal Parsoid service [23:11:57] it starts at 5k [23:12:08] so it looks like almost all the queries are gone when actually it was only 80% [23:12:10] really?! [23:12:33] Reedy: thanks for the fix! 
[23:12:34] TimStarling: not looking at the actual graphs, but what ganglia is currently reporting as queries/sec for the enwiki master is accurate. so that's good even if the graphs are screwy [23:12:39] just count down on the vertical axis [23:12:54] wow you're right [23:13:04] I didn't immediately notice, that's even scarier [23:13:23] about a 500% reduction (still not looking at graphs) [23:13:47] yes, if you use the reciprocal [23:14:08] * TimStarling is not into reciprocals [23:14:15] fair [23:14:41] !log reedy synchronized php-1.22wmf6/extensions/Parsoid/ [23:14:48] Logged the message, Master [23:15:03] https://ishmael.wikimedia.org/sample/more.php?host=db1056&hours=24&checksum=17570832276017652668 [23:15:06] gwicke: no purges needed there at all? [23:15:15] paravoid: not of articles [23:15:26] TimStarling: and that's since when, apr? [23:15:40] our URLs contain the oldid [23:15:45] TimStarling: i'm surprised the guided tour extension was issuing that query much at all [23:16:05] and are refreshed implicitly on template update [23:16:29] binasher: it was a MakeGlobalVariablesScript hook [23:16:32] !log reedy synchronized php-1.22wmf6/extensions/Parsoid/ [23:16:40] Logged the message, Master [23:17:10] similar to https://gerrit.wikimedia.org/r/#/c/65009/ which is the change that got me looking at master queries [23:17:33] I wanted to do some statistics so I could tell Matt how disasterous that change would be [23:17:43] https://git.wikimedia.org/commit/mediawiki%2Fextensions%2FGuidedTour.git/4ab986cb23327aecf80738653001479509447b9e [23:17:49] that's Tim's change [23:18:11] 2012-01-02? [23:18:12] really? [23:18:27] TimStarling: and that's since when, apr? [23:18:33] ganglia seems to imply that it was april [23:18:41] they developed it a long time ago and enabled it recently, I think [23:18:51] yeah, it took ages to get out [23:18:53] yes, that's what where I got apr [23:19:40] i guess this was deployed around the time the jobq was moved off mysql [23:20:03] !log reedy synchronized php-1.22wmf5/extensions/Parsoid/ [23:20:11] Logged the message, Master [23:20:23] New patchset: Catrope; "Enable VE experimental mode on mediawiki.org and beta labs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67371 [23:20:36] i love how the patch was to just delete a line of code with nothing added [23:21:46] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67371 [23:22:33] hey, apart from guided tour, we actually were successful at moving nearly all read traffic to slaves [23:22:43] !log catrope synchronized wmf-config/InitialiseSettings.php 'Set $wmgVisualEditorExperimental = true on mediawiki.org' [23:22:50] Logged the message, Master [23:23:25] New patchset: GWicke; "Ramp up the Parsoid load" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67373 [23:23:26] !log catrope synchronized wmf-config/CommonSettings.php 'Plumbing for $wmgVisualEditorExperimental' [23:23:33] Logged the message, Master [23:26:15] TimStarling: the rate of several other master query types increased by 2-3x on enwiki right after the guidedtour fix was deployed [23:26:28] thanks to being faster [23:27:27] https://ishmael.wikimedia.org/sample/more.php?host=db1056&hours=24&checksum=4058300124652897056 [23:27:39] https://ishmael.wikimedia.org/sample/more.php?host=db1056&hours=24&checksum=484571594445205663 [23:28:02] * binasher buys TimStarling all the drinks [23:29:39] binasher: the precedent calls for a 35 year old whiskey [23:30:14] 
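To make the GuidedTour diagnosis above concrete: MakeGlobalVariablesScript runs while the JavaScript configuration variables are assembled for every page view, so any database fetch placed inside it is multiplied by the full page-view rate. The sketch below only illustrates that shape and is not GuidedTour's actual code; the hook name and the Title/Revision/OutputPage APIs are real 1.22-era MediaWiki, but the page name, variable name and hook body are invented.

<?php
// Illustration of the anti-pattern, NOT the real GuidedTour code: a database
// lookup inside MakeGlobalVariablesScript, which fires on every page view
// while OutputPage builds the client-side config vars.

$wgHooks['MakeGlobalVariablesScript'][] = function ( array &$vars, OutputPage $out ) {
	// Hypothetical leftover "testing hack": fetch the latest revision of some
	// tour definition page so its ID can be exposed to client-side code.
	$title = Title::newFromText( 'MediaWiki:Example-tour-definition' );
	$rev = $title ? Revision::newFromTitle( $title ) : null;

	// If that page does not exist, the revision lookup retries on the master
	// (the Revision::newFromConds fallback discussed a bit further down), so
	// every page view turns into a master read -- hence the ~80% of enwiki
	// master queries mentioned above. The actual fix was just deleting the
	// offending line.
	$vars['wgExampleTourRevId'] = $rev ? $rev->getId() : null;

	return true;
};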
!log updating Parsoid to 020c7b2 [23:30:22] Logged the message, Master [23:30:28] i dunno, how about just really good whisky? [23:30:55] i have a 25 year coal isla that is significantly better than that 31 year in amsterdam [23:31:07] which I didn't get a chance to try btw! [23:31:26] that's okay, I guess I'll deserve it only after ceph finishes [23:31:29] i wonder what happened to it [23:31:41] paravoid: are you coming to the US soon to meet ken? [23:31:51] I don't think so [23:34:33] binasher: it was a template fetch, the reason it hit the master is because the page didn't exist [23:34:47] so the fallback case in Revision::newFromConds was hit [23:35:40] wow. [23:35:51] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67373 [23:35:52] wtf :) [23:37:01] I'd like to push out another config change, is anybody deploying right now? [23:38:56] !log gwicke synchronized wmf-config/CommonSettings.php [23:39:04] Logged the message, Master [23:39:35] I'm done [23:39:43] OK [23:39:48] I'm doing a bit more in a minute [23:41:47] binasher: do you want half the remaining queries to go away as well? [23:41:59] TimStarling: sure! [23:42:46] lol [23:42:59] i'm surprised GeoData::getAllCoordinates is a master query [23:43:06] binasher: Ima have the CentralAuth review done tomorrow. Will you want to plug it in for S7 before we buckle down for redactatron rewrite, or after? [23:43:29] specifically I am looking at the ipblocks query [23:43:31] Coren: before, let's get s7 up in labs [23:43:36] * Coren nods. [23:43:56] Block::newLoad? [23:43:59] it is there because someone accidentally removed the hack that made it go to the slave when they refactored the code in 2007 [23:44:04] yes, Block::newLoad [23:44:15] We've got two other labs project wanting access too, so I'll corner Ryan to do the auth stuff cleanly. A shell script running in a screen on the NFS server won't cut it. :-) [23:45:37] binasher, it's a master query only on save, when we need up to date coordinates to figure out which of them need to die [23:45:46] *page save [23:47:05] MaxSem: makes sense. is that on every page save? [23:48:00] yes, because every page can potentially contain coordinates [23:48:41] I guess we could decrease the maximum number of coordinates per page to make these queries faster [23:48:57] I think I'll just file some bugs for now [23:49:51] * Krenair wonders why ishmael does not accept his login [23:51:35] Coren, why do we need separate login details for the MySQL stuff? [23:52:08] Krenair: Not sure I get your question? [23:52:21] Ishmael should be your labs login [23:52:29] But I think it's also restricted access [23:52:32] ah. [23:52:35] it's wmf [23:52:39] okay, that explains it then [23:52:49] the queries contain private data [23:52:52] Krenair: The DB doesn't share auth with LDAP, for one, but more importantly the service groups don't (and cannot) have credentials to begin with. [23:53:30] Coren: Why the replica.my.cnf on non-service accounts? Why can't we just use our labs logins? [23:53:34] it really should NDA rather than wmf, but noone ever figured where/how to get this information [23:53:45] security or something? [23:54:11] Krenair: security is one aspect, simple accountability is another -- it's important to be able to match queries to the actual tools.
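The Revision::newFromConds fallback and Block::newLoad exchange above comes down to one read pattern. Here is a rough sketch using the 1.22-era wfGetDB()/DB_SLAVE/DB_MASTER interfaces; the helper function and the table/column choices are illustrative, not a copy of the actual MediaWiki code.

<?php
// Rough sketch of the "replica first, master on miss" read pattern being
// discussed; exampleFetchRevisionRow() is an invented helper, not
// Revision::newFromConds itself.

function exampleFetchRevisionRow( array $conds ) {
	// Normal case: read from a replica so the master only handles writes.
	$dbr = wfGetDB( DB_SLAVE );
	$row = $dbr->selectRow( 'revision', '*', $conds, __METHOD__ );

	if ( !$row ) {
		// Fallback: the row might simply not have replicated yet, so retry on
		// the master. That is harmless for a row written milliseconds ago, but
		// for a page that does not exist at all (the GuidedTour template fetch
		// above) every request takes this branch and the master eats the read.
		$dbw = wfGetDB( DB_MASTER );
		$row = $dbw->selectRow( 'revision', '*', $conds, __METHOD__ );
	}
	return $row;
}

The Block::newLoad case is the opposite failure mode: a read that should default to the replicas but, per the discussion above, lost its go-to-the-slave switch in a 2007 refactor and has queried the master ever since. GeoData's on-save query is the legitimate variant: when saving a page you genuinely need up-to-date coordinates, so reading the master there is intentional.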