[00:07:44] * robla checks gerrit's pulse [00:08:34] ...and it's dead [00:10:00] any opsen around with a defibrillator? [00:12:01] anyone? anyone? [00:12:19] * robla thumps microphone [00:12:22] is this thing on? [00:12:54] * TimStarling will look [00:13:30] thanks! [00:14:15] it looks busy [00:15:05] it's faking it....I know it's just playing Minesweeper [00:17:12] ohai! [00:17:18] * maplebed just got paged. [00:17:21] sorry, wasn't watching IRC. [00:17:43] there's a request going that's asking for a tarball, maybe that's it [00:18:11] just saw a gerrit page ? [00:18:18] yeah, TimStarling's on it so far. [00:18:28] I just joined teh party. [00:18:36] ok cool [00:18:45] whee! http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20eqiad&h=manganese.wikimedia.org&m=cpu_report&r=hour&s=descending&hc=4&mc=2 [00:18:49] it's pretty quiet now, is it working again? [00:19:07] it's working again for me. [00:19:14] TimStarling: didy ou kill something? [00:19:18] or did it finish on its own? [00:19:27] it finished on its own [00:19:32] looks better now [00:19:50] tarball creation, huh? [00:19:51] I attached to a couple of apache processes with gdb [00:20:13] * robla wonders if that's a gitweb thing maybe [00:20:17] I only had time for two before it started working again [00:20:34] one was /r/ and one was /r/gitweb?p=mediawiki/core.git;a=snapshot;h=7a4db900deaf235bd729cd0557b2d2a7ef3d9a40;sf=tgz [00:20:55] maybe there is access log information [00:22:02] * robla wonders if anyone dare try that again [00:22:12] links are here: https://gerrit.wikimedia.org/r/gitweb?p=mediawiki%2Fcore.git;a=shortlog;h=HEAD [00:23:03] there's something called YandexBot crawling it apparently [00:23:39] New patchset: Bhartshorne; "bumping object replication concurrency to 2 to decrease the time necessary to get ms-be5 fully loaded." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13100 [00:24:19] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/13100 [00:24:19] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13100 [00:24:22] ah, but it's 80legs that's hitting the tarball links [00:24:47] funny that the crawlers can find the gitweb links even though our developers can't [00:26:25] <^demon> TimStarling: So gerrit got overloaded due to a spider? [00:27:05] there are several crawlers hitting gitweb [00:27:14] probably one or a combination of them caused the overload [00:27:47] <^demon> Should be pretty trivial to slap a robots.txt on manganese. [00:28:36] dear robots, please don't simultaneously request tarballs for every revision of every repo simultaneously. kthxbai [00:28:44] Boring [00:29:27] pity there's no timing in the access log [00:29:31] anyone mind if I add it? [00:29:52] I say go for it [00:30:04] we don't have any scripts processing these logs do we? [00:30:27] * robla isn't aware of any [00:30:36] ^demon: ? [00:30:46] <^demon> Nope. [00:31:23] it'll just take half an hour [00:31:37] 1 minute for the config change and 29 minutes to get it into puppet [00:35:19] New patchset: Demon; "Stop robots from trying to index gerrit." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13103 [00:35:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13103 [00:37:11] oh gerrit [00:37:19] <^demon> TimStarling: I've already got the puppet repo open in front of me, what's the change? [00:37:53] someone get the paddles again [00:38:50] <^demon> Granted, this probably isn't gerrit's fault anyway, gitweb is installed via the package, it's not bundled or anything. [00:38:57] maybe I was exaggerating slightly [00:39:10] I had to spend a bit of time reading the apache manual [00:41:40] and it's still not working [00:42:15] ah, no, there we go, actually the requests were just taking so long that they started before I made the config change [00:42:53] like 146 seconds [00:43:59] or 294 seconds [00:48:15] it looks like only one thing can run git at a time [00:49:14] so it's blocking and waiting? [00:50:03] <^demon> Should be more than one, but we could up the number of processes available to jgit. [00:50:22] <^demon> I think the default is something like ~4 simultaneous clones [00:50:25] if it's gitweb, it's not using jgit, right? [00:50:43] no, no point in increasing gitweb processes [00:51:29] there's 20 already and they're just sitting around doing nothing [00:54:39] actually most of the gitweb seem to be waiting for a write to stdout to complete [00:54:55] waiting for java [00:55:24] <^demon> Heh, silly gerrit docs. "Existing installations have successfully processed change reviews with more than 16,000 files per change. However, since 16,000 modified/new files is a massive amount of code to review, it is more typical to see less than 10 files modified in any single change." [00:58:59] <^demon> If gitweb's performance sucks, we could look at using cgit. Gerrit has integration for that out of the box as well. [00:59:33] seems more like a deadlock [00:59:49] the classic sort of deadlock where you have to read from both stdout and stderr [01:00:35] http://paste.tstarling.com/p/MDxjvJ.html [01:00:47] see, this gerrit thread is reading from gitweb's stderr [01:00:57] but the gitweb is writing to stdout (FD=1) [01:02:19] !log on manganese: killing all gitweb.cgi processes [01:02:25] Logged the message, Master [01:02:51] <^demon> TimStarling: So is this something to be fixed in gerrit? [01:03:20] yes [01:03:36] maybe gerrit allows 20 subprocesses [01:04:08] eventually say 17 get used up with hung gitweb processes and we start to wonder why it is slow [01:04:24] I will see if I can find it in the gerrit source [01:04:42] <^demon> So we could up the number of gerrit processes, but it would just prolong the time before they all get tied up? [01:06:24] copyStderrToLog(proc.getErrorStream()); [01:06:24] if (0 < req.getContentLength()) { [01:06:25] copyContentToCGI(req, proc.getOutputStream()); [01:06:25] } else { [01:06:25] proc.getOutputStream().close(); [01:06:25] } [01:06:36] that is the bug [01:07:35] hmm, ok maybe not [01:07:54] it actually starts a new thread to read stderr [01:10:40] I regret killing those gitweb processes now [01:10:45] they were definitely hung [01:11:44] the bot requests are much faster now [01:22:52] I'll restart gerrit [01:23:16] !log on manganese: restarting gerrit [01:23:22] Logged the message, Master [01:41:08] <^demon> gerrit's error log is huge from today (800M and counting). I should look at some kind of monitoring for those logs...having the same error spammed 110149999 times should've been a hint that gerrit wasn't feeling well. [01:41:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 235 seconds [01:43:03] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 268 seconds [01:48:34] <^demon> TimStarling: I'm calling it a night as long as the immediate fire is out. Is there anything re:gerrit you'd like me to look into tomorrow? Or perhaps could you send an e-mail summarizing anything you find? [01:49:30] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 656s [01:49:32] I'll send you an email, good night [01:49:55] <^demon> Thanks. [01:50:24] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 11 seconds [01:52:03] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 16 seconds [01:52:30] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 43s [01:55:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:55:29] New review: Reedy; "Needs adding to wmf-config/extension-list too, so the messages get into the localisations cache with..." [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/13099 [01:57:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.057 seconds [02:18:48] TimStarling: Did you get about regarding ExtensionDistributor and REL1_19 / Git ? [02:21:41] yes I talked with sam about it a couple of hours ago, it's cloning now [02:21:55] with REL1_19 from subversion and master from git [02:22:19] nice [02:23:02] TimStarling: What about extensions that are only in svn or only in git? Not a problem, just curious how it is handled. [02:23:52] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [02:25:26] New patchset: Jalexander; "Add WikimediaShopLink settings" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13099 [02:26:08] probably if they're only in svn they'll be broken [02:26:15] I'd have to check [02:26:42] so you use clone-all.php and then the svn dir name as dir in that clone? [02:27:41] I used the extensions.git submodules [02:27:44] New patchset: Tim Starling; "Log time elapsed in gerrit access logs for incident analysis" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13107 [02:28:25] New review: Tim Starling; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/13107 [02:28:25] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13107 [02:31:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:38:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.054 seconds [03:29:43] New patchset: Krinkle; "Add WikimediaShopLink settings" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13099 [03:31:07] New patchset: Krinkle; "Add WikimediaShopLink settings" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13099 [03:31:58] New review: Krinkle; "Spotted several issues in the extension (mostly small issues). Submitted Ifaf0e574f736264b3d27d376b6..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/13099 [03:57:53] New review: Tim Starling; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/13103 [03:57:55] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13103 [04:35:15] New review: Jalexander; "(no comment)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/13099 [04:36:25] New review: Jalexander; "dependency on extension merged in as well." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/13099 [04:43:51] mornin [04:44:03] what the hell happened [04:44:12] I only slept for like 5 hours [04:44:25] and all hell broke loose apparently [04:58:07] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [04:59:11] huh I guess I got 5.5 hours [04:59:23] looking at my email now [05:01:22] what you said about bad luck yesterday. surely we shoudl be due for some good luck soon? :-/ [05:02:10] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [06:25:03] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [06:30:54] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:21:09] good morning [07:22:03] New review: Faidon; "First of all, a minor nitpick: the file won't have a final newline, which will make it a bit unpleas..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/12377 [07:22:13] good morning :) [07:22:44] oh the newline :-D [07:23:03] I did some housework in my gerrit change list, so you might have received review requests from me [07:23:36] you did? [07:24:15] I had several draft changes I abandoned and some that did not have a reviewer [07:27:54] paravoid: for the $realm, my commit message is misleading. I am indeed expecting /etc/wikimedia-realm to let us know on which puppet $::realm we are running under [07:29:36] New patchset: Hashar; "/etc/wikimedia-realm containing $::realm" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12377 [07:29:53] * hashar find out how to add a newline [07:30:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12377 [07:33:32] New patchset: Hashar; "/etc/wikimedia-realm containing $::realm" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12377 [07:34:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12377 [07:35:53] New review: Hashar; "I am assuming realm to be the one from puppet. Aka either 'labs' or 'production'. My comment about '..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/12377 [07:43:26] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12377 [07:43:29] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12377 [07:45:07] paravoid: is puppet labs running form production branch nowadays or should I cherry pick that change to the test branch ? [07:50:55] Should be running from prod [07:51:28] seems so :-D [07:51:39] I still have to read the mail about puppetmaster::self [07:55:46] Reedy: ping? [07:56:17] paravoid: he is usually not there during the morning (idle time: 6:45) [07:58:02] PROBLEM - Puppet freshness on mw56 is CRITICAL: Puppet has not run in the last 10 hours [07:58:40] New review: Hashar; "Do not submit this yet. Need to arrange $cluster / $realm in our Mediawiki config files." [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/12583 [08:01:43] sure, nothing urgent [08:02:07] maybe you can help too: I'm wondering which extensions should I push to Debian [08:02:11] php5-parsekit is one candidate [08:02:36] but I should probably join the PHP packaging team and it'd be nice to have a list of things to do beforehand [08:03:25] hashar: hi, i'll be right there for jenkins [08:03:43] paravoid: I am not sure how many custom extensions we use. [08:04:04] paravoid: I know we have a wikidiff extension, which is a C implementation of mediawiki PHP differ. But it is most probably already upstream [08:04:22] mutante: take your time :-] [08:04:35] it is [08:07:24] paravoid/hashar: yesterday wondering about "mediawiki-math" which already is a Debian package and appears to be used on PDF servers, but "This is a transitional package and can safely be removed. " [08:08:28] http://packages.debian.org/sid/mediawiki-math [08:08:58] mutante: that's because it's replaced by mediawiki-extensions-math [08:09:31] http://qa.debian.org/developer.php?login=pkg-mediawiki-devel%40lists.alioth.debian.org [08:10:34] looking at that qa page now.. i guess their mailing list and our mailing list for mediawiki package maintainers should exchange messages or something [08:12:29] New patchset: Hashar; "detect cluster with /etc/wikimedia-realm" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12583 [08:13:21] paravoid: pkg-mediawiki-devel@lists.alioth.debian.org <-> mediawiki-distributors@lists.wm ? you think we should invite list members vice versa? [08:14:04] New review: Hashar; "The loaded value from /etc/wikimedia-cluster was not at all what we expected for $cluster. Thanks to..." [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/12583 [08:17:16] mutante: I think they're talking already [08:17:38] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/12583 [08:17:59] mutante: getting a coffee and I am ready whenever you are :-] [08:21:09] hashar: ok. soo. first question: you really still want 1.470? i see 1.472 [08:21:32] mutante: seen that yesterday [08:22:15] mutante: yeah just get 1.472 [08:22:22] hmm, release days are 06/13, 06/18, 06/24 [08:22:30] jenkins is doing contint, releasing a new version every week or so [08:22:42] just like we tag a new MediaWiki wmf branch every 2 weeks [08:22:57] k, i see from the dates, lots of versions yea [08:23:17] I already reviewed the past changelogs [08:23:34] there is some bug fix I am interested in, would most probably fix issues I have with our inst [08:23:44] so might as well get the 1.471 and 1.472 [08:24:16] mutante: can you also have a look at https://wikitech.wikimedia.org/view/Jenkins while doing change ? [08:24:16] wget http://pkg.jenkins-ci.org/debian/binary/jenkins_1.472_all.deb [08:24:20] we might want to update our doc [08:24:35] i got that open, yea [08:24:42] wooster told me about having Diederik trained on doing Jenkins upgrade [08:24:53] (will be for next time though :pD ) [08:25:04] being, lazy, wget direct to brewster [08:25:47] yea, uhm, it needs brewster access [08:25:59] which you dont? [08:26:00] ? [08:26:10] bah lame question [08:26:14] i do, but about training others [08:26:23] ah you are right [08:26:44] !log importing jenkins_1.472_all.deb into lucid-wikimedia using reprepro [08:28:02] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [08:28:11] updating package lists on gallium [08:28:21] hashar: shall i upgrade on gallium using apt? [08:28:26] it offers it now [08:28:44] according to our doc apt-get update && apt-get upgrade jenkins [08:28:49] Inst jenkins [1.458] (1.472 Wikimedia:10.04/lucid-wikimedia) [08:29:07] yeah, did the update, just asking if you want me to right the second [08:29:18] right ? [08:29:20] delete ? [08:29:25] yeah just upgrade :) [08:29:55] !log apt-get upgrade on gallium, installs newer jenkins [08:30:00] puppet just have ensure => present [08:30:10] so we don't have it upgraded "by mistake" [08:30:27] hmm, well, you could change to "latest" since we control when there are new packages [08:30:39] but yea, it works just fine like this too [08:31:07] and you can separate the steps import to repo and upgrade on server [08:31:30] "Please wait while Jenkins is getting ready to work..." [08:31:34] + I need to be there to check that Jenkins still work [08:31:53] i see that message on integration.mw/ci/ now [08:32:02] it is loading :) [08:32:32] there it is, yay [08:32:33] INFO: Jenkins is fully up and running [08:33:12] now I am going to run a test [08:33:16] hashar: so that wasn't that much to do after all:) a) wget .. b) reprepro .. c) apt-get upgrade . done [08:33:24] kk, please test [08:34:22] https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/2532/ testing in progress :-] [08:36:17] mutante: works for me :-] [08:36:29] :) [08:36:39] mutante: so that is a success for me as far as I can tell [08:36:53] though we might have some surprise later on. But I guess I can fix them myself [08:37:01] cool! [08:37:13] I love when maintenance run well [08:37:25] hashar: console output looks nicer :) [08:37:33] heh,yeah, just because it breaks sometimes, doesnt mean it has to all the time :) [08:37:34] its indented now, nice [08:37:43] you are right [08:37:51] looks like they replace \t with some or something [08:38:16] so you can close the RT ticket [08:38:23] will you be there this afternoon? [08:38:30] in case something is screwed ? [08:38:49] are you talking to me? [08:39:50] hashar: RT resolved, pasted the important lines from IRC [08:39:57] thought so :) [08:40:26] argh sorry [08:40:35] your nicks use the same color [08:40:41] lol [08:40:42] so I thought Krinkle was mutante :-] [08:40:50] hashar: yea, ehm, afternoon, at some point i need to do some errands but i can be a bit flexible about it [08:40:55] I don't mutate ;-) [08:41:18] colors? i dont think i have IRC colors [08:41:44] I am pretty sure your cli client could colorize nicknames [08:42:13] yea, guess so, but i never picked any color for me, guess you can pick a color for me in your client [08:42:34] Colloquy only distinguishes between "you" and "the rest" (red/orange) [08:43:31] hashar: Their anchor links are borked though [08:43:32] https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/2532/console#ant-target-4 [08:43:37] ^ goes to "clean" [08:43:44] but you see "create-dirs" [08:43:48] because of that hover toolbar [08:44:07] ohnice, they even right-align the labels (exec vs mkdir) so that the response is left-aligned [08:44:16] someone's been busy [08:44:30] the wonders of doing contint? heh [08:44:59] my client does some kind of hashing of the nickname then select a color out of a few possibility [08:45:25] mutante: http://scripts.irssi.org/html/nickcolor.pl.html [08:45:39] that one does add ord() of each character of the nick [08:45:50] then does a modulo 11 to pick a color :-] [08:46:05] yeah, when you talk to me directly, your nick turns yellow, it stays gray if you don't use a nick: prefix [08:46:40] hehe, ok, i might check it out [08:46:47] hashar: Any idea what could cause this error on a fresh* labs instance in a project I own? *fresh= created yesterday, everything is done and set up "running" now, including in ganglia [08:46:48] hashar: > krinkle is not allowed to run sudo on i-000002eb. This incident will be reported. [08:47:01] Why can't I sudo? Without it can't even run puppet or whatever. [08:47:11] probably forgot to add you as a sysadmin [08:47:11] https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-000002eb [08:47:27] hmm, i heard one other report about not being able to sudo anymore [08:47:30] lets switch to #wikimedia-labs [08:47:36] but couldnt reproduce on instance i use [08:47:40] yep [08:48:32] mutante: again thanks for the Jenkins upgrade :-] [08:49:14] yw! glad it worked smoothly. wouldnt really know what needs to be fixed in docs [08:50:28] i mean in a good way, they already said what i did [08:51:34] I think I wrote that doc based on your input [08:51:37] and you even reviewed it [08:51:47] indeed : https://wikitech.wikimedia.org/history/Jenkins [08:51:48] ;) [08:52:02] so you are merely congratulating yourself for writing your own doc :-] [08:52:08] * hashar loves doc [08:52:53] heh. ok [09:23:40] New patchset: Dereckson; "(bug 37981) Babel categories configuration" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13121 [09:29:58] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [09:33:07] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:36:27] !log starting swift-container-auditor on ms-be3 [09:37:28] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:57:40] morebots is dead [10:04:09] he's known as lessbots [10:06:22] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:07:43] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:18:49] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:23:10] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:38:44] Damianz: should be $(PAGE)bots [10:38:51] so we can use our favorite pager [10:39:04] less is more ;) [10:43:45] New patchset: Hashar; "subscribe memcached service to /etc/memcached.conf" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13129 [10:44:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13129 [11:08:25] paravoid: hi [11:08:50] Reedy: hi :-) [11:11:15] hiiii [11:13:40] hm [11:13:47] how do you push a fast forward merge into gerrit... [11:13:58] it just says "no new changes" [11:18:55] I dunno, its always worked for me [11:19:22] i can push directly, but that's not how it should be :) [11:19:35] why a no-ff? [11:19:47] so that all the changes come in as separate commits? [11:19:48] a what? [11:19:55] what do you mean? [11:19:56] oh [11:20:01] nevermind [11:20:13] it's a straight ff, all it needs to do is update the ref :) [11:20:36] but that means there's no merge commit on my end [11:20:49] right [11:20:50] no clue [11:21:04] I guess I should use -no-ff [11:21:20] are all the virtN nodes configured with the same profile? [11:21:26] even the new ones? [11:21:28] same profile? [11:21:30] i.e. do they get the VLAN? [11:21:41] I'd hope they do [11:21:55] mark: how about that router access? :) [11:22:27] http://code.google.com/p/gerrit/issues/detail?id=1145 [11:22:33] right [11:26:11] mark: yep. that kind of sucks [11:28:21] mark: turned out to be a "not cabled properly" issue [11:28:26] so, no access needed for that [11:39:03] hm [11:39:12] instances with public IPs can't talk to the outside world [11:39:57] at the moment? [11:40:00] yes [11:40:11] I can, however, ping en.wikipedia.org [11:40:53] have you changed anything? [11:41:12] with the exception of the stuff we did yesterday, no [11:41:51] what did you do yesterday? [11:42:16] basically nothing [11:42:25] I brought virt1 up as a network node [11:42:30] which moved two IP addresses [11:42:38] and the snat/dnat rules [11:42:48] when I took it back down, it moved them back [11:42:59] Ryan_Lane: my wikistats instance has a public IP and can also still fetch data with a shell script from external mediawikis [11:43:00] here's the snat rule for bastion-restricted: SNAT all -- 10.4.0.85 0.0.0.0/0 to:208.80.153.232 [11:43:12] I wonder if it's just instances on virt1 [11:43:18] lemme look at virt1 and virt3's rules [11:43:39] Ryan_Lane: well - you know the network is just setup for virt2 right now, right? [11:43:46] yes [11:43:51] so [11:44:03] nothing can use virt1 as a network node right now [11:44:07] right [11:44:10] everything moved back [11:44:24] I think I see the issue [11:45:50] fixed [11:45:58] virt1 had a bad SNAT rule [11:47:09] New patchset: Mark Bergsma; "Merge branch 'mp-bgp'" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/13131 [11:47:34] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/13131 [11:47:36] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/13131 [11:49:22] \o/ [11:49:33] gonna deploy that one one lvs machine in a bit [11:49:44] then when it works well for a day or two, on the other ipv6 boxes [11:49:52] then when that remains working well for a week, we can finally upgrade the rest ;) [11:50:28] conveniently, lvs1004 serves no traffic atm [11:50:33] which is perfect for pybal testing [11:50:42] as pybal behaves exactly the same, traffic on it or not [11:51:04] Can someone kick morebots? Thanks [11:51:46] lvs1005 I mean [11:53:28] 208.80.153.192 64666 2647 2708 0 296 22:02:57 Establ [11:53:28] inet.0: 0/0/0/0 [11:53:28] inet6.0: 1/1/1/0 [11:53:35] 22 hour uptime, good [11:53:49] hehe [11:53:58] paravoid: you used some 32 bit ASNs in the default pybal config file [11:54:10] and bgp.py didn't handle that well ;) [11:54:55] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/13099 [11:54:58] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13099 [11:55:03] #bgp-local-asn = 65551 [11:55:04] #bgp-peer-address = 192.0.2.254 [11:55:04] #bgp-as-path = 65551 65552 [11:55:12] was that intentional? :) [11:55:33] that's not 32 bit [11:55:43] well it's not 16 bit either :P [11:57:17] ah, my bad [11:57:26] hehe I didn't see it during review either [11:57:26] that's ASNs from rfc5398 [11:57:34] and it took me a minute to realize as well [11:57:42] when struct.encode was throwing errors ;) [11:57:43] IANA has reserved a contiguous block of 16 Autonomous System numbers from the unallocated number range within the "16-bit" number set for documentation purposes, namely 64496 - 64511, and a contiguous block of 16 Autonomous System numbers from the "32-bit" number set for documentation, namely 65536 - 65551. [11:57:55] ok [11:57:59] i'll change that to 16 bit for now ;) [11:58:02] yep [11:58:11] i'll add 32 bit asn support soon, shouldn't be hard [11:58:17] that's why I pasted the 16-bit ASN for documentation [11:58:18] but I need to get back to other things now ;) [11:58:30] I can commit/push that if you prefer. [11:58:35] i'm already on it [11:59:21] !log kicked morebots [11:59:27] Logged the message, Master [11:59:33] gah, nano being the default editor in labs :P [11:59:35] what's up with that [11:59:52]