[00:00:24] JohnLewis: +1, thx
[00:01:02] deploying 4.4 modification changes now after andre_ did some testing that looked all good
[00:13:03] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 158.46666
[00:13:03] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 138.333328
[00:13:32] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 587.666687
[00:13:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 588.266663
[00:24:12] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[00:45:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[00:46:03] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[00:46:03] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[00:46:12] (CR) Dzahn: [C: 1] switch Bugzilla to zirconium [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[00:46:32] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[00:49:07] (CR) Aklapper: [C: 1] switch Bugzilla to zirconium [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[00:50:04] !log xtrabackup clone db1018 to db1036
[00:50:12] Logged the message, Master
[00:53:18] (PS6) Dzahn: switch Bugzilla to zirconium [operations/dns] - https://gerrit.wikimedia.org/r/108906
[00:55:11] (CR) Dzahn: [C: 2] "there you go, new Bugzilla coming up" [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[00:55:26] !log DNS update - switching Bugzilla to zirconium
[00:55:33] Logged the message, Master
[00:56:57] !log Welcome to Bugzilla 4.4.1 in eqiad, served by puppet module bugzilla on zirconium
[00:57:05] Logged the message, Master
[00:59:19] ^d: DNS updated. I guess you can slowly allow Gerrit to report into Bugzilla again :)
[00:59:51] (PS1) Ori.livneh: add 'mwgrep' common script [operations/puppet] - https://gerrit.wikimedia.org/r/113061
[01:01:28] mutante, whee :)
[01:02:06] :) , yea, yay, it took a while
[01:02:34] all the modifications stuff etc, we had to make some last minute revert/fix too
[01:02:57] *nod* lgtm so far. the saved searches layout in the sidebar is a bit messed up presumably due to some changes in the new version
[01:03:00] i really like the new landing page
[01:03:18] (the "Common queries for open reports")
[01:04:23] yep - very convenient
[01:05:30] Eloquence, yeah, the one regression I have, see https://bugzilla.wikimedia.org/show_bug.cgi?id=61288
[01:06:15] (PS2) Ori.livneh: add 'mwgrep' common script [operations/puppet] - https://gerrit.wikimedia.org/r/113061
[01:06:19] Eloquence: I decided to keep the small UI regression to be able to have the new "Saved Reports" being listed in the sidebar, as I know that some devs use reports in Bugzilla
[01:06:31] * andre__ off to send an announcement email
[01:11:49] mutante, andre__ great job regarding bugzilla!
[01:13:00] drdee: thanks, really
[01:18:07] drdee: thx, we're glad
[01:18:18] closing those RTs is fun
[01:18:28] very rewarding :)
[01:25:26] drdee: do i resolve the tracking ticket when all children are closed?:)
[01:36:02] Hmph, https://integration.wikimedia.org/ci/job/mwext-VisualEditor-npm/869/console and similar URLs are 504ing
[01:36:05] * RoanKattouw doesn't see a hashar
[01:36:49] (CR) MZMcBride: "As I said on IRC, I think this removal was fine. It's more important to me that bugs.wikimedia.org and bugs.mediawiki.org continue to func" [operations/dns] - https://gerrit.wikimedia.org/r/112932 (owner: Dzahn)
[01:36:59] mutanta; yeah totally!
[01:47:55] (PS2) MZMcBride: Enable VisualEditor on OTRS_wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[01:48:51] (CR) TTO: Add transwiki import options for zh.wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110876 (owner: Ebe123)
[01:49:00] (PS3) MZMcBride: Enable VisualEditor on OTRS_wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[01:49:02] (CR) jenkins-bot: [V: -1] Enable VisualEditor on OTRS_wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[01:49:09] (CR) jenkins-bot: [V: -1] Enable VisualEditor on OTRS_wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[01:51:32] <^d> !log gerrit: turned BZ plugin back on since downtime is over. No less than 5 pings to do so ;-)
[01:51:43] Logged the message, Master
[01:51:45] ^d: !:) ty
[01:52:07] (CR) BryanDavis: [C: 1] "Thanks for taking the time to add all of this." [operations/puppet] - https://gerrit.wikimedia.org/r/112855 (owner: Ryan Lane)
[01:52:21] <^d> mutante: BZ was still down when I switched office -> home :p
[01:53:06] ^d: yea, we used the maximum window but just up to it, had some last minute issue
[01:53:18] hope you enjoy new version
[01:53:58] ^d: Is the BZ plugin Gerrit Notification Bot?
[01:54:03] <^d> Yes.
[01:54:06] Nice.
[01:54:52] * ^d dreams for a future where all of these things are one system
[01:55:42] ^d: Speaking of which, Phabricator seems to enjoy telling me about pending access requests.
[01:55:58] <^d> Indeed it does.
[01:56:11] <^d> I'm sure all admins are getting that spam :p
[01:56:23] I wonder if it has a stagger or tier option.
[01:56:26] ^d: i actually told people before how the same discussion from BZ could also be simply on gerrit, just suggest a change, since code is the best summary anyways, and then discuss there
[01:56:46] well, we have this giant project.. to unify everything , heh
[01:57:02] <^d> The downside of that plan is that we'd have to keep using gerrit :\
[01:57:08] :p
[01:57:28] Andre and Guillaume are doing an evaluation.
[01:57:31] Of course.
[01:57:33] yea
[01:57:48] We'll see what the recommendation is. But my personal preference is to stabilize.
[01:57:54] And focus on other parts of the stack.
[01:58:11] i also tend to "don't change too much here, now that you just got some volunteers used to it"
[01:58:59] Yeah. It's not a small cost to move to something else.
[01:59:25] <^d> I'll move Mediawiki tomorrow probably.
[01:59:29] <^d> I'm done with gerrit :)
[01:59:55] Move Mediawiki?
[02:00:07] <^d> Yeah out of gerrit and to phabricator.
[02:00:35] Jenkins has been hanging for a while: https://integration.wikimedia.org/ci/job/mwext-EducationProgram-jslint/647/console for example
[02:00:38] Gerrit needs a maintainer.
[02:00:50] ^d: Have you tried installing phabricator?
[02:00:51] Though the most recent bug got fixed pretty quickly.
[02:00:54] So I can't really complain.
[02:00:59] <^d> awight: Yes. It took about 2 minutes
[02:01:03] I was all gung-ho until I looked at their source code
[02:01:29] <^d> I'll avoid looking then! :)
[02:01:35] ^d: interesting. I was not so lucky. There are a lot of assumptions about absolute paths and stuff
[02:01:43] that just reminds me how i looked up "gung-ho" last time:)
[02:01:53] and the theories where the term comes from
[02:01:55] * awight looks up
[02:01:56] <^d> awight: I followed the guide line-by-line and it Just Worked.
[02:02:31] <^d> Gloria: Good luck bribing someone to do that.
[02:02:45] ^d: I think it's called a salary, not a bribe. ;-)
[02:02:53] <^d> Oh no, it's a bribe!
[02:02:59] * ^d prepares some health warning pamphlets to hand out to such an unlucky person
[02:03:05] I'm not asking for someone to volunteer. I'm asking for it to be actively maintained.
[02:03:07] Well I don't want to open that wormbin necessarily, but the actual tools we need from a CR system are pretty minimal, IMO. Home-rolled should be on the table...
[02:03:14] <^d> "The Surgeon General has warned that extensive Gerrit exposure is poor for your health"
[02:03:20] you could put bounty money on BZ tickets
[02:03:24] <^d> Gloria: That's a waste of money :)
[02:03:25] and mail wikitech
[02:03:26] There's money available if necessary. It's a pretty important piece of software.
[02:03:55] Gerrit code is a total write-off...
[02:03:58] ^d: I could say the same of many full-time staff members. The costs are lower to maintain Gerrit and it's demonstrably useful.
[02:04:05] !log Jenkins has been unresponsive to urls that retrieve build results for the past few hours (e.g. https://integration.wikimedia.org/ci/job/mediawiki-core-jslint/11129/console)
[02:04:05] <^d> lolol
[02:04:11] <^d> Paying someone to work on Gerrit doesn't change the fact that the upstream is *impossible* to work with.
[02:04:13] Logged the message, Master
[02:04:20] Are they?
[02:04:32] <^d> Yes.
[02:04:36] If that's the only issue, just fork?
[02:04:39] ^d: yep. code duplication hell
[02:04:46] <^d> Gloria: That's an even worse idea.
[02:04:48] -just- fork?
[02:04:59] gerrit? um.
[02:05:04] to "tigger"
[02:05:05] <^d> Fork and maintain our own JGit-based code review server?
[02:05:11] <^d> Talk about *waste of freaking money*
[02:05:24] ^d: You mean our central code repository manager?
[02:05:40] I think of a lot worse uses of money.
[02:05:47] I can think *
[02:05:52] ^d: did you make the phabricator instance public? I'
[02:05:56] I'd love to try it out.
[02:06:00] <^d> awight: fab.wmflabs.org
[02:06:01] http://fab.wmflabs.org
[02:06:05] rad, thanks
[02:06:16] <^d> Gloria: One waste of money doesn't preclude another :)
[02:06:18] ^d: I don't believe the upstream thing.
[02:06:24] <^d> We can waste money on multiple things.
[02:06:26] I've not seen any evidence of it, anyway.
[02:06:46] Christian didn't seem to have difficulty modifying the code and making small improvements.
[02:06:51] ..just make a poll ? ...
[02:07:10] <^d> Anyway, I can not in good faith advocate putting any work into gerrit other than keeping the lights on.
[02:07:24] <^d> It was a mistake to force it down everyone's throats 2 years ago, and I regret it.
[02:07:27] So we'll have a code repo tool that's unmaintained?
[02:07:39] <^d> Who said anything about unmaintained?
[02:07:50] "putting any work into"
[02:08:03] <^d> Well, I'm advocating dropping it too.
[02:08:05] i dunno, for me it does the job
[02:08:12] ^d: In favor of what?
[02:08:19] Gloria: take any UI element and try to chase the dependencies around the Gerrit codebase.
[02:08:30] <^d> Gloria: Phabricator...?
[02:08:41] Whoever wrote it did not understand the dependency injection framework it's built on.
[02:08:53] ^d, you have an interesting definition of "forcing down people's throats". the choice of git/gerrit tooling was a pretty massive elaborate process, probably more so than any other engineering org goes through for this kind of thing :)
[02:09:25] I think it's an OK solution for now, but sure, Phabricator looks cool and might eventually be a viable replacement
[02:09:26] ^d: I wasn't sure if you were advocating for Phabricator or it just happened to be the latest shiny toy installed quickly on Labs.
[02:09:40] <^d> I'm playing with Phabricator.
[02:09:44] <^d> So far I'm impressed.
[02:09:45] Eloquence: that's true, but we were voting in the dark. Very few of us actually had the opportunity to try each system?
[02:10:11] <^d> Eloquence: I think people who were in a position to drive the discussion (like myself) pushed harder for gerrit and basically sidelined a lot of the very valid "it's a pain in the ass to use" complaints that haven't changed since day 1.
[02:10:20] <^d> Was Phabricator a viable alternative at the time? No.
[02:10:26] ori did set up a phabricator instance fairly early in the process, and we met with the lead developer to see if it was at all close to being able to meet our needs. it was not.
[02:10:26] <^d> Doesn't mean Gerrit was right.
[02:10:51] nothing is ever right, chad. all tools suck :)
[02:11:00] <^d> +1 to that!
[02:11:37] <^d> Eloquence: Anyway, it's all water under the bridge. We're two years in and I don't think it's a bad discussion to revisit at this point.
[02:11:38] I wrote the Phacility team (hosted phabricator) recently and they can't seem to get their own code to run ;) That's something to consider...
[02:12:08] for sure - haven't followed phab much since then, definitely time to take a closer look again
[02:12:09] I actually vastly prefer gerrit's code review workflow as compared to Apache's (Jira based), Sourceforge's (basically just push / patch request), and GitHubs -- once I wrapped my head around it; I found it made a lot of sense
[02:12:27] *I don't know how phabricator handles review requests though so I can't judge it
[02:12:35] <^d> Eloquence: It's *not* the product we declined to use 2 years ago.
[02:13:02] <^d> mwalker: What I'm liking (and I want to play around with in practice) is that it (in theory) supports both pre *and* post commit review.
[02:13:05] <^d> You review before commit.
[02:13:11] <^d> Then can "audit" stuff that was committed.
[02:13:12] ^d: Does Phabricator include a bug tracker or would we continue to use Bugzilla?
[02:13:13] Maniphest...
[02:13:22] <^d> Yep, Maniphest.
[02:14:14] ^d: hmm; interesting; how does it ensure that the post review stuff doesn't get lost? (or do you have an example you can point me to so I can explore?)
[02:14:26] <^d> I do not know yet. This is just theory based on docs.
[02:14:28] Does anyone run a public Phabricator instance in production?
[02:14:29] I'd like to see a real example, this Labs instance doesn't do much for me.
[02:14:30] <^d> I haven't tried it yet.
[02:16:04] <^d> Also, `arcanist` is way easier to install than `git-review` I don't know why we thought "Requiring PHP to submit patches" was an undue burden but "Require python + pip to use git-review" wasn't :)
[02:17:26] ^d: would you mind approving my account request on the labs instance?
[02:17:32] <^d> Just did.
[02:17:37] thanks
[02:18:48] http://phabricator.khanacademy.org/ maybe.
[02:18:50] Oh, nope.
[02:18:50] Must have a khanacademy.org e-mail address.
[02:18:52] https://developer.blender.org
[02:20:05] https://developer.blender.org/T37954
[02:20:17] ^d: Replacing Gerrit and replacing Gerrit+Bugzilla are rather different.
[02:20:32] (CR) Dzahn: "we _can_ also move bugs.wm to cluster redirects later but these 3 VirtualHosts are already puppetized to be on the BZ host itself, all the" [operations/dns] - https://gerrit.wikimedia.org/r/108906 (owner: Dzahn)
[02:20:44] I'd recommend to anyone wanting to pursue this to figure out which you want before embarking.
[02:21:00] <^d> I want ponies
[02:21:02] <^d> And rainbows.
[02:21:11] <^d> And a cookie
[02:21:19] <^d> (Chocolate chip plz)
[02:22:00] reprepro GerritBugzillaRTMingleZenDesk-stable-unicorn-wmf-edition-0.42.deb
[02:22:08] ugh, netsplit, ttyl:)
[02:23:00] Workflow versus interface aren't quite the same. :-)
[02:23:01] It depends how configurable whatever's being considered is. I think Phabricator previously wasn't very configurable for workflow, but might be better now.
[02:23:01] Gerrit's interface is awful. I think there's universal agreement about that.
[02:23:02] Though I'm still not sure it's completely unfixable... we could make (and have made!) small improvements.
[02:23:34] (Abandoned) Dzahn: add account for Sahar Massachi (stat servers) [operations/puppet] - https://gerrit.wikimedia.org/r/112613 (owner: Dzahn)
[02:23:58] One of us is lagged.
[02:24:20] what happened to Special:Newpages on beta?
[02:24:21] <^d> No lag here?
[02:24:25] (CR) Ori.livneh: [C: 2] add 'mwgrep' common script [operations/puppet] - https://gerrit.wikimedia.org/r/113061 (owner: Ori.livneh)
[02:24:30] can't see what's patrolled anymore
[02:25:12] (PS1) Springle: prepare db9 for decom [operations/puppet] - https://gerrit.wikimedia.org/r/113074
[02:25:27] <^d> Gloria: Gerrit's workflow isn't configurable without programming.
[02:25:45] <^d> Whether it's Java or Prolog or something.
[02:26:27] ^d: Even if Phabricator is the greatest software ever, it still needs a maintainer. We have Andre and Daniel for Bugzilla. I think Gerrit or Phabricator or whatever else needs maintainers.
[02:27:01] ^d: We customized it.
[02:27:05] !log LocalisationUpdate completed (1.23wmf13) at 2014-02-13 02:27:05+00:00
[02:27:05] it --> Gerrit's workflow
[02:27:10] <^d> How?
[02:27:12] Logged the message, Master
[02:27:17] jenkins-bot? :-)
[02:27:48] <^d> That's glue built around Gerrit. Workflow doesn't realize the difference.
[02:28:01] <^d> Anyway, I think we'll find Phabricator maintainers easier than Gerrit. The whole "it's PHP" isn't without argument.
[02:28:09] (PS2) Springle: prepare db9 for decom [operations/puppet] - https://gerrit.wikimedia.org/r/113074
[02:28:20] <^d> (I skimmed the auth plugins before, to see what it'd be like for us to tie into things. It's not rocket science)
[02:28:26] (never mind about my issue, solved)
[02:28:31] (CR) Jalexander: [C: 1] Initial setup for legalteamwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112850 (owner: TTO)
[02:28:45] I'm not sure what additional customization the workflow might need.
[02:29:16] <^d> Who knows :)
[02:29:47] (CR) Springle: [C: 2] prepare db9 for decom [operations/puppet] - https://gerrit.wikimedia.org/r/113074 (owner: Springle)
[02:30:01] <^d> Gloria: For project-manager-y types, things like http://fab.wmflabs.org/project/board/3/ are kind of cool.
[02:30:14] <^d> (Potentially provides Trello/Mingle-like features they'd want)
[02:30:28] <^d> That are most certainly lacking in BZ.
[02:30:45] I guess we're still considering "written in PHP" a feature. ;-)
[02:31:23] ^d: If James F. were around, perhaps he'd suggest using Flow.
[02:31:30] <^d> lol
[02:31:45] <^d> Flow's gonna replace everything yo ;-)
[02:32:37] I'd consider not reimplementing git in java the bigger feature...
[02:32:44] a request/bug is a thing, a thing has a Q on wikidata, same for software, what else you neeed ?:O
[02:33:41] <^d> mwalker: I was just explaining that to Nik earlier today. He was like "wait wtf? gerrit has an /implementation of git in java/ in it? why would java people want to....?"
[02:33:48] !log db9 mysqld stopped for decom, db1001 slave stopped
[02:33:49] <^d> I responded: gplv2 vs apache
[02:33:52] <^d> Nik: Ah, makes sense.
[02:33:55] Logged the message, Master
[02:34:02] springle_: :)
[02:35:09] <^d> Gloria: When people suggest doing things like that, I'm reminded of Tim's quote about simulating nuclear launches & cat feeding.
[02:35:27] ^d: Nice.
[02:35:40] <^d> The important corollary of which is: that doesn't mean it's a good idea to do it though
[02:35:44] <^d> :D
[02:37:04] (PS1) Dzahn: remove old db9 bugzilla_report script from kaulen [operations/puppet] - https://gerrit.wikimedia.org/r/113075
[02:38:11] Yeah.
[02:38:11] That'd be great.
[02:38:12] (And any bug tracker.)
[02:38:22] Well, not any.
[02:38:22] But I don't think such a view is common in most.
[02:38:45] <^d> Gloria: One of us is definitely lagging.
[02:38:57] (PS2) Dzahn: remove old db9 bugzilla_report script from kaulen [operations/puppet] - https://gerrit.wikimedia.org/r/113075
[02:41:48] (CR) Dzahn: [C: 2] remove old db9 bugzilla_report script from kaulen [operations/puppet] - https://gerrit.wikimedia.org/r/113075 (owner: Dzahn)
[02:41:59] (PS1) Dzahn: remove old Bugzilla puppet class/role [operations/puppet] - https://gerrit.wikimedia.org/r/113076
[02:43:42] (PS2) Dzahn: remove old Bugzilla puppet class/role [operations/puppet] - https://gerrit.wikimedia.org/r/113076
[02:45:15] (PS3) Dzahn: remove old Bugzilla puppet class/role [operations/puppet] - https://gerrit.wikimedia.org/r/113076
[02:45:28] (CR) Dzahn: "springle, no more db9 here either" [operations/puppet] - https://gerrit.wikimedia.org/r/113075 (owner: Dzahn)
[02:45:49] Was db9 previously an enwiki master?
[02:46:01] The name is familiar, but I'm not sure why.
[02:46:08] Gloria: it was for all the misc services
[02:46:10] <^d> db9 was a misc db I thought
[02:46:19] <^d> Yeah ^
[02:46:53] !log LocalisationUpdate completed (1.23wmf12) at 2014-02-13 02:46:53+00:00
[02:47:01] It all begins and ends with MediaWiki.
[02:47:02] Logged the message, Master
[02:48:14] <^d> Gloria: I think http://gregoryszorc.com/blog/2013/10/14/phabricator-is-awesome/ is a good read.
[02:48:22] (PS4) Dzahn: remove old Bugzilla puppet class/role [operations/puppet] - https://gerrit.wikimedia.org/r/113076
[02:48:57] ^d is just anti-flow >.>
[02:49:10] <^d> ;-)
[02:49:33] <^d> Also, here's phabricator's phabricator: https://secure.phabricator.com/
[02:49:50] lqt4lyfe
[02:51:02] (CR) Dzahn: [C: 2] remove old Bugzilla puppet class/role [operations/puppet] - https://gerrit.wikimedia.org/r/113076 (owner: Dzahn)
[02:55:10] (PS1) Dzahn: Bugzilla TTL back to normal 1H [operations/dns] - https://gerrit.wikimedia.org/r/113078
[02:56:22] (CR) Dzahn: [C: 2] "finishes the migration for now" [operations/dns] - https://gerrit.wikimedia.org/r/113078 (owner: Dzahn)
[02:57:30] !log DNS update - Bugzilla TTL back to 1H, migration over
[02:57:38] Logged the message, Master
[02:57:40] crap, due to netsplit , but yea:)
[03:01:49] using Australian server now. meh
[03:02:13] Welcome to banks.freenode.net in Perth, AU.
[03:02:20] <^d> Can't stay on dickson?
[03:02:34] we're not supposed to ALL be on dickson :)
[03:02:42] but yea, i guess
[03:02:44] <^d> That's the point, so we never split ;-)
[03:02:52] i know .. we've been there
[03:02:59] yea, i agree even
[03:03:24] the only part that bugs me is when i !log as Guest1235
[03:03:27] the rest isn't a big deal
[03:03:30] fixing on wiki
[03:03:32] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:03:40] <^d> Well crap.
[03:03:44] oops?
[03:03:44] <^d> I was just about to step away
[03:03:49] <^d> I have no clue, let's find out
[03:04:23] <^d> $mistakes[] = 'gitblit';
[03:04:31] (PS1) Ori.livneh: mwgrep: specify 'keyword' analyzer in query [operations/puppet] - https://gerrit.wikimedia.org/r/113080
[03:04:44] ^d: that's not git.wm.org
[03:05:12] <^d> gitblit on antimony?
[03:05:12] (CR) Ori.livneh: [C: 2 V: 2] mwgrep: specify 'keyword' analyzer in query [operations/puppet] - https://gerrit.wikimedia.org/r/113080 (owner: Ori.livneh)
[03:05:17] <^d> that's git.wm.o
[03:05:37] ^d: 2 things
[03:05:42] 1) it's really busy
[03:05:45] shall i kill the java
[03:05:54] 2) gitblit.wm.org doesnt exist, the monitoring is misleading
[03:06:11] <^d> I'm already on it.
[03:06:16] <^d> Was going to kick the service.
[03:06:18] ok, cool
[03:06:21] yea, same here
[03:06:26] 12421 gitblit 20 0 7735m 4.3g 8128 S 587 27.5 7807:27 java
[03:06:30] etc
[03:06:40] leave it to you
[03:07:19] for a second it scared me because of the gitblit.wm.org and that doesnt resolve , and right after DNS update, you know
[03:07:31] <^d> Yeah, dunno why that says that.
[03:11:53] <^d> ori: ping
[03:12:01] hey
[03:12:05] <^d> You rewrote the upstart job for gitblit. Does it log somewhere now?
[03:12:21] it logs to /var/log/upstart/gitblit.log, as upstart jobs do by default
[03:12:30] <^d> Ah gotcha, thx
[03:12:32] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 215345 bytes in 8.761 second response time
[03:13:27] <^d> Oh shut up no it's not.
[03:14:03] (PS1) Ori.livneh: mwgrep: do not quote search term [operations/puppet] - https://gerrit.wikimedia.org/r/113081
[03:15:12] (CR) Ori.livneh: [C: 2 V: 2] mwgrep: do not quote search term [operations/puppet] - https://gerrit.wikimedia.org/r/113081 (owner: Ori.livneh)
[03:15:22] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 516 bytes in 0.017 second response time
[03:15:52] PROBLEM - gitblit process on antimony is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar gitblit.jar
[03:16:20] ^d: everything ok? need me to look at something?
[03:16:37] oh, mutante is here
[03:16:44] * ori helpfully dumps it on mutante
[03:17:16] <^d> Nothing weird happening when it whigged out
[03:17:24] <^d> Not wanting to get starting right now.
[03:17:27] what is "it"
[03:17:30] <^d> gitblit.
[03:17:44] i know, but what else besides you starting it
[03:17:51] why did you say its not ok when Icinga said it was
[03:18:20] <^d> I don't know why icinga thought it was up.
[03:18:52] RECOVERY - gitblit process on antimony is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar gitblit.jar
[03:19:08] wait, and ori , you just rewrote the way this is started
[03:19:17] and expect me to fix .. eh. how
[03:19:26] <^d> Recently, like a few weeks back.
[03:19:28] <^d> No harm there :)
[03:19:30] oh
[03:19:31] <^d> It's back up now.
[03:19:32] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 215229 bytes in 8.655 second response time
[03:19:32] ok
[03:19:34] :)
[03:19:37] i am happy to look at it
[03:19:44] i converted the init script to an upstart script a few weeks ago
[03:19:48] is there an issue with it?
[03:19:54] alright, it sounded more "just" than that
[03:19:59] <^d> No problem with the script.
[03:20:10] <^d> It started trashing as it tends to do about once a month.
[03:20:20] <^d> It didn't come up the first time immediately. Came up the second.
[03:20:20] yea, i think we did this once before
[03:20:22] <^d> :)
[03:20:25] all there was was restarting it
[03:20:32] ok, relieved
[03:20:43] <^d> I'm now repacking mw/core pretty aggressively. It's 1.2G on disk.
[03:20:58] :) yea, wfm
[03:21:00] <^d> I think it trashes when jgit tries to gc it every so often.
[03:21:02] now
[03:28:38] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-02-13 03:28:37+00:00
[03:28:45] Logged the message, Master
[03:29:35] (PS1) Dzahn: remove db9 from dsh and dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/113082
[03:36:32] (PS1) Andrew Bogott: Switch eqiad network host to use eth4 and eth5. [operations/puppet] - https://gerrit.wikimedia.org/r/113083
[03:41:52] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 305 seconds
[03:42:32] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 352 seconds
[03:44:52] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay -1 seconds
[03:45:32] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds
[03:48:32] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:48:42] (CR) Andrew Bogott: "There are a few other places in puppet that refer to $::ipaddress_eth0. snmp puppet monitoring, and a reference in realm.pp." [operations/puppet] - https://gerrit.wikimedia.org/r/113083 (owner: Andrew Bogott)
[03:49:50] (PS1) Springle: s1 repool db1056 as slave [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113084
[03:50:16] (CR) Springle: [C: 2] s1 repool db1056 as slave [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113084 (owner: Springle)
[03:50:22] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 217984 bytes in 6.515 second response time
[03:50:26] (Merged) jenkins-bot: s1 repool db1056 as slave [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113084 (owner: Springle)
[03:51:11] !log springle synchronized wmf-config/db-eqiad.php 's1 repool db1056 as slave'
[03:51:19] Logged the message, Master
[03:58:32] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:00:32] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 219304 bytes in 9.672 second response time
[04:06:33] ^demon|away: My after-discussion thought was that if we believe you that the decision to go with Gerrit was a bit rushed and a bit forceful, we should be doubly or triply skeptical of you this time when you're recommending Phabricator. :-)
[04:06:35] (PS1) Springle: s2 repool db1036 as slave [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113085
[04:07:04] (CR) Springle: [C: 2] s2 repool db1036 as slave [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113085 (owner: Springle)
[04:07:13] (Merged) jenkins-bot: s2 repool db1036 as slave [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113085 (owner: Springle)
[04:10:23] !log springle synchronized wmf-config/db-eqiad.php 's2 repool db1036 as slave'
[04:10:31] Logged the message, Master
[04:13:49] (PS1) Springle: sectionsLoads white space [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113086
[04:14:22] (CR) Springle: [C: 2] sectionsLoads white space [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113086 (owner: Springle)
[04:14:29] (Merged) jenkins-bot: sectionsLoads white space [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113086 (owner: Springle)
[04:14:41] <^demon|away> Glaisher|away: Or, I'm cutting edge and every 2 years I'll take us to the cusp of greatness ;-)
[04:40:28] (PS1) Springle: db1056 and db1036 full steam [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113087
[04:40:56] (CR) Springle: [C: 2] db1056 and db1036 full steam [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113087 (owner: Springle)
[04:41:05] (Merged) jenkins-bot: db1056 and db1036 full steam [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113087 (owner: Springle)
[04:41:52] !log springle synchronized wmf-config/db-eqiad.php 'db1056 db1036 full steam'
[04:42:00] Logged the message, Master
[05:22:28] (PS3) Santhosh: Add shell account for santhosh, admins restricted + stats1002 [operations/puppet] - https://gerrit.wikimedia.org/r/112912 (owner: ArielGlenn)
[05:47:37] (CR) Ryan Lane: Code documentation for trebuchet's deployment module (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/112855 (owner: Ryan Lane)
[05:47:59] (PS2) Ryan Lane: Code documentation for trebuchet's deployment module [operations/puppet] - https://gerrit.wikimedia.org/r/112855
[05:48:42] (CR) Ryan Lane: "Totally welcome Bryan. Let me know the next place you'd like docs added and I'll get on it." [operations/puppet] - https://gerrit.wikimedia.org/r/112855 (owner: Ryan Lane)
[06:00:43] (CR) Ryan Lane: [C: -1] "Some minor issues, but otherwise this looks great, thanks for adding this!" (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/112944 (owner: Ottomata)
[06:15:15] (CR) Ryan Lane: "As discussed on IRC, the master calling this function is automated and can fail. There's no human there to see errors, so it's necessary t" [operations/puppet] - https://gerrit.wikimedia.org/r/111749 (owner: Ryan Lane)
[07:06:22] PROBLEM - Host db1016 is DOWN: PING CRITICAL - Packet loss = 100%
[07:06:32] RECOVERY - Host db1016 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms
[07:07:10] that's springle's way of making sure we're paying attention
[07:16:00] heh
[07:59:25] then again, who needs bugzilla anyway... :)))
[07:59:32] * yurik done trolling
[08:13:57] (PS1) Patchon: Replacing spaces with tabs. [operations/debs/lucene-search-2] - https://gerrit.wikimedia.org/r/113091
[08:22:08] (CR) Patchon: ">> I see no harm in merging this if it helps someone in the short term, once the style nits are taken care of that Max pointed out." [operations/debs/lucene-search-2] - https://gerrit.wikimedia.org/r/112871 (owner: Patchon)
[09:02:50] (PS1) Ryan Lane: Ensure resolv.conf is generated properly in labs images [operations/puppet] - https://gerrit.wikimedia.org/r/113092
[09:02:57] andrewbogott: ^^
[09:04:02] maybe remove all files from that dir just to be sure?
[09:04:30] unless that is the only file, to start?
[09:05:52] the files are important [09:05:56] all the other files are empty [09:06:11] and original is prepopulated when you do certain things [09:06:21] ok [09:06:32] tail just points to original [09:06:33] (CR) Andrew Bogott: [C: 1] Ensure resolv.conf is generated properly in labs images [operations/puppet] - https://gerrit.wikimedia.org/r/113092 (owner: Ryan Lane) [09:06:46] I removed my merge privileges [09:07:06] well, I added them back, but I'm going to continue not merging things :) [09:07:17] since I added the change, you should merge through [09:07:30] ok [09:08:01] (CR) Andrew Bogott: [C: 2] Ensure resolv.conf is generated properly in labs images [operations/puppet] - https://gerrit.wikimedia.org/r/113092 (owner: Ryan Lane) [09:13:52] morning [09:15:19] yes it is [09:22:36] (CR) Nemo bis: "Indeed. I wanted to publish my reply on bugzilla yesterday, but bugzilla was down. :) I've continued there. I asked se4598 to comment on t" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112639 (owner: MZMcBride) [09:27:11] (PS4) Hashar: Initial debian release [operations/debs/git-fat] (debian) - https://gerrit.wikimedia.org/r/113018 (owner: Ottomata) [09:42:46] (CR) Mark Bergsma: [C: 1] "Let's see how much trouble it will be to use eth4 as primary NIC. If needed, we can always disable the GigE NICs, so the 10G interfaces be" [operations/puppet] - https://gerrit.wikimedia.org/r/113083 (owner: Andrew Bogott) [09:44:16] mark, are you thinking that eth0 and eth4 will both have an IP, or just eth4? [09:44:49] just eth4 I think [09:44:57] so also /etc/network/interfaces needs to be changed [09:45:04] eth4 would simply be what eth0 is now [09:45:25] ok -- /etc/network/interfaces isn't puppetized, is it? 
[09:45:33] no, it's done by the installer [09:45:47] so we do some _modifications_ (with augeas) to it in puppet [09:45:54] but this we'll need to change manually then [09:45:55] or reinstall [09:46:03] let's do manually first I'd say, we'll do the reinstall later anyway [09:46:22] Sure. But, in the case of reinstall... [09:46:36] Oh, I guess I put a different nic in the dhcp setup? [09:46:44] um… mac address, I mean [09:47:50] iirc, the installer checks which nics have link [09:47:53] (PS5) Hashar: Initial debian release [operations/debs/git-fat] (debian) - https://gerrit.wikimedia.org/r/113018 (owner: Ottomata) [09:47:53] and one with a link becomes eth0 [09:47:55] but i'm not entirely sure [09:48:03] oh, that's easy then. [09:48:04] and indeed, after that it's based on mac address [09:48:34] (CR) Hashar: "Changed settings in debian/gbp.conf to be in the [DEFAULT] section instead of the [git-buildpackage], that affects all commands." [operations/debs/git-fat] (debian) - https://gerrit.wikimedia.org/r/113018 (owner: Ottomata) [09:50:08] (PS2) Andrew Bogott: Switch eqiad network host to use eth4 and eth5. [operations/puppet] - https://gerrit.wikimedia.org/r/113083 [09:51:08] Oh, hm, so if on a reinstall our current eth4 will become eth0, then... [09:51:19] i'm not entirely sure really [09:52:22] if that doesn't work, like I said, we can just disable eth0-eth3 in bios [09:52:26] and act as if they don't exist [09:52:30] since we wouldn't be using them anyway [09:52:33] that may be the easiest option [09:53:07] Is that something I would do from mgmt? [09:53:15] yes [09:53:20] during reboot, entering bios [09:53:24] If we disable 0-3 then my patch is wrong since everything will be named 0 and 1 [09:53:31] yup [09:53:37] OK, I'll see if I can do that now. Might as well get the naming right the first time through. 
[09:53:42] Or try at least :) [09:53:51] disabling eth0-3 seems safest at this point [09:53:54] meanwhile do you need the MACs to set up the links? [09:54:01] it's just that being able to have eth4 as primary is nice to have [09:54:05] but maybe not your main worry right now :) [09:54:09] no I don't [09:54:21] I just need to alter the switch config, which i can do in 5 mins [09:54:27] I agree that using just 10g is worth some trouble. Lemme tinker with the bios and see if that's easy. [09:55:19] (CR) Hashar: "And the Jenkins job works! The main page is: https://integration.wikimedia.org/ci/job/operations-debs-git-fat-debian-glue/2/ you will find" [operations/debs/git-fat] (debian) - https://gerrit.wikimedia.org/r/113018 (owner: Ottomata) [09:57:42] PROBLEM - Host labnet1001 is DOWN: PING CRITICAL - Packet loss = 100% [10:05:14] hm, I wonder how we'll pxe boot without the embedded nic enabled... [10:05:29] usually that works with any nic [10:06:34] hehe [10:06:42] each and every nic of labnet1001 is connected to the switch ;) [10:07:46] ok, rebooting, we'll see what this looks like when it comes up [10:08:23] huh [10:08:31] something is wrong [10:08:34] how can it have worked now [10:08:40] ? [10:08:42] eth4 and eth5 don't seem setup for tagging on the switch side [10:09:11] so if worked now, that probably means that labnet1001 was not really setup to use the tagged interface [10:09:24] and instead used eth4 instead of eth4.1102? [10:09:38] Boot is hanging on'Waiting for network configuration...' [10:09:55] But, let me know when you're ready and I'll try a reinstall. [10:10:38] ok [10:10:43] let's first try to boot back into what you had without changes [10:10:47] and figure out why this is working at all :) [10:11:10] eth4 is setup as a normal untagged interface on the switch, so eth4.1102 can't be doing anything [10:11:39] oh? Want me to re-enable those interfaces then? [10:12:03] what exactly have you done so far? 
[10:12:09] disabled eth0-3 and rebooted into linux? [10:12:14] right. [10:12:22] So, now ip -a shows eth4 and eth5 [10:12:24] did you modify /etc/networking/interfaces at all? [10:12:30] and the eth4.1102 [10:12:32] ok so [10:12:40] linux keeps the existing naming based on mac address [10:12:53] to remove that, find a file called something like persistent-net.rules in /etc/udev/ or subdirs somewhere [10:12:57] and remove that file [10:13:54] /etc/network/interfaces is the same as it was, so still refers to eth0. [10:13:59] right [10:14:02] ok. [10:14:20] root@labnet1001:/etc/udev/rules.d# ls [10:14:21] 70-persistent-cd.rules 70-persistent-net.rules [10:14:25] yes [10:14:29] so, just rm -net.rules? [10:14:32] just remove that file and reboot [10:14:38] ok [10:14:41] i think eth4 will become eth0 then [10:14:55] ok, rebooting [10:17:27] weird, eth4 isn't put in any vlan on the switch [10:17:35] so how anything was working at all is a mystery :) [10:17:57] dang [10:18:06] boot is still unhappy about the network config [10:18:12] but, it'll timeout and then I can see what we have [10:18:17] yep [10:19:53] ok, now I have eth0 and eth1 [10:20:01] neither has an ip [10:20:15] what is the content of /etc/network/interfaces? [10:21:06] same as before, eth0 and eth4.1102 [10:21:19] want to see all of it? [10:21:34] so why doesn't eth0 have an ip [10:21:45] i can understand why it wouldn't /work/ [10:21:48] because the switch is not setup right [10:21:54] but it should still have an ip if it's setup for static ip [10:21:59] so just paste the eth0 section [10:22:16] well, our new eth0 has a different mac than before, and nothing knows about the new mac... [10:22:18] I think? 
[10:22:31] nothing needs to know about the mac [10:22:37] this is messy so I'm going to paste the whole thing [10:22:42] the only thing the mac was used for before was to keep that nic at eth4 [10:22:45] ok [10:22:47] https://dpaste.de/iLE4 [10:22:59] ok [10:23:02] so that is wrong and weird [10:23:05] eth0 is setup for dhcp [10:23:34] normally, the debian installer, gets the ip address on eth0 during the installation, and then converts that into a static ip [10:23:41] so after install, eth0 should be setup statically [10:23:48] ok [10:23:57] seems like that didn't happen here or get changed somehow [10:24:04] which is bad, because it makes this server dependent on our dhcp server [10:24:06] Would be interesting to do a reinstall and see what it does. But that's a long wait [10:24:15] any idea how this happened? [10:24:36] Not really. I've mucked with that file a bit, although I'm pretty sure I haven't touched the eth0 entries. [10:24:43] i'm going to fix the switch configuration now [10:24:54] so, does the server now have eth0 and eth1? [10:25:08] it does. [10:25:23] alright [10:25:29] so, you should modify your existing patch [10:25:31] keep eth0 at eth0 [10:25:39] and change eth4. to eth1. [10:25:43] ok [10:25:49] i'll modify the switch config accordingly [10:27:49] (PS3) Andrew Bogott: Switch eqiad network host to use eth4 and eth5. [operations/puppet] - https://gerrit.wikimedia.org/r/113083 [10:28:00] ooh, that needs a new title [10:29:00] (PS4) Andrew Bogott: Switch eqiad network host to use eth0 and eth1. [operations/puppet] - https://gerrit.wikimedia.org/r/113083 [10:30:01] mark, ^ [10:30:37] ok [10:30:44] fixed the switch config [10:30:57] (CR) Andrew Bogott: [C: 2] Switch eqiad network host to use eth0 and eth1. 
[operations/puppet] - https://gerrit.wikimedia.org/r/113083 (owner: Andrew Bogott) [10:31:10] can you tell me what the difference is [10:31:14] between those two parameters [10:31:26] network_flat_interface and network_flat_interface_name? [10:31:39] i wonder why nova-network needs to know about eth4 (now eth1) ever [10:32:26] I don't know what the difference is, but I see both of them in the database. [10:32:36] It probably works to only set one of them… [10:32:38] that's probably part of what was wrong [10:32:46] oh? [10:32:55] if you use eth4 (now eth1) anywhere but interface::tagged in puppet [10:32:58] then probably that's not right [10:33:14] interface::tagged sets up the tagged interface eth4.1102, so that's why it needs eth4 [10:33:23] but other than that, the system should just use eth4.1102 [10:33:47] lemme see if I can find any docs about those settings [10:35:02] OK, you're right. network_flat_interface_name is not used anywhere except when making that interface. [10:35:06] It's not used in the nova config. [10:35:21] what do you mean by "except when making that interface"? [10:35:32] it's just a dummy hash entry that's not actually used? [10:36:03] it's passed in as base_interface to interface::tagged [10:36:04] that's it. [10:36:24] but in the puppet patch you just did, interface::tagged params are hardcoded [10:36:29] not coming from network_flat_interface_name [10:36:44] it's used for compute nodes. [10:36:49] ahh [10:36:55] but compute nodes don't even have eth4 [10:37:00] Um… some refactoring would be in order, there are two different places where the tagged interfaces are... [10:37:00] so how did any of this work at all? [10:37:03] it looks broken on multiple levels :) [10:37:09] yes, but in my patch it's selectively set on node name. [10:37:17] So different for the network node than for the compute note. [10:37:20] *node [10:37:24] right [10:37:25] So probably it's not actually ever used... 
[10:37:26] we can probably unify that now [10:37:40] perhaps refactor puppet then [10:37:44] and we can do a reinstall after [10:37:47] with fixed switch config [10:37:52] probably it's broken then [10:37:55] but we'll have to make it work [10:37:55] OK. I'm convinced that the right values are getting to the right places now, though. [10:38:06] possibly, but it's confusing [10:38:19] i'd rather get it fixed now we have the opportunity [10:38:21] ok :) [10:38:37] and i still don't understand how it worked with the switch config being incorrect, so that's a big red flag [10:38:49] perhaps it has been using eth0 all along or something like that :) [10:39:01] which is bad, which will break soon, and which we need to fix now [10:39:32] so... let's refactor puppet, then do a reinstall [10:42:09] if you would prefer to keep the network node separate, that might make sense though [10:42:15] it's not the same as the compute nodes after all [10:42:47] perhaps we can rename network_flat_interface_name [10:43:10] to something like network_flat_tagged_base_interface [10:44:23] that would fix the confusion I think [10:44:42] (PS1) Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/debs/vips] - https://gerrit.wikimedia.org/r/113098 [10:45:12] (PS2) Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/debs/vips] - https://gerrit.wikimedia.org/r/113098 [10:45:54] (PS1) Andrew Bogott: Small reorg around reuse of network_flat_interface_name and network_flat_interface [operations/puppet] - https://gerrit.wikimedia.org/r/113099 [10:46:04] I'm slightly reluctant because of trying not to break tamp in the meantime… but ^ should be pretty safe [10:46:39] ok [10:47:28] ah, shit, that patch breaks vlan_id which isn't... [10:47:29] hang on... 
[10:47:43] (CR) Mark Bergsma: [C: -2] Small reorg around reuse of network_flat_interface_name and network_flat_interface (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/113099 (owner: Andrew Bogott) [10:47:45] yep :) [10:49:01] (PS2) Andrew Bogott: Small reorg around reuse of network_flat_interface_name and network_flat_interface [operations/puppet] - https://gerrit.wikimedia.org/r/113099 [10:49:31] (PS3) Hashar: git buildpackage basic configuration [operations/debs/vips] - https://gerrit.wikimedia.org/r/113098 [10:50:08] ok [10:50:13] just use $::realm and $::site from now on [10:50:25] (CR) Mark Bergsma: [C: 1] Small reorg around reuse of network_flat_interface_name and network_flat_interface [operations/puppet] - https://gerrit.wikimedia.org/r/113099 (owner: Andrew Bogott) [10:50:49] i'll be back in ~ 20 mins [10:51:01] ok. Are you all set for me to do a reinstall? [10:51:45] (PS4) Hashar: git buildpackage basic configuration [operations/debs/vips] - https://gerrit.wikimedia.org/r/113098 [10:54:06] (CR) Andrew Bogott: [C: 2] Small reorg around reuse of network_flat_interface_name and network_flat_interface [operations/puppet] - https://gerrit.wikimedia.org/r/113099 (owner: Andrew Bogott) [11:11:13] andrewbogott: yes [11:11:31] ok, trying. It doesn't want to pxe boot from the 10g adapter, so… fussing with the bios [11:12:04] usually NICs bring their own pxe firmware [11:12:10] although hm [11:12:14] lunch out again, back in a little while [11:12:16] i do recall one server model where PXE on 10G didn't work [11:12:19] that could be annoying [11:12:28] yeah that rings a bell to me as well [11:12:40] It said something like 'no-pxe-compatible devices available' [11:12:41] what kind of dell server is this? [11:12:43] R510? [11:12:45] or 610 [11:12:49] hmmm [11:12:55] maybe it is something in the bios for booting over external cards? 
[11:13:02] yes, usually there is [11:13:08] but could be that these 10G nics don't support that [11:13:15] wasn't that an issue we had with those first memcached servers? [11:13:19] where we retrofitted 10G cards [11:13:41] and yet they booted in the end I think :) [11:14:32] I welcome suggestions! [11:15:23] it's a 610 [11:16:30] you sure we didn't install them via GigE and switched to 10G afterwards? [11:16:36] yeah those were 610s too I think [11:18:09] I am not, but I do remember writing doing the whole ixgbe patched kernel module dance with d-i as well (concatenating it to the initramfs) [11:18:31] so maybe I had a reason? I'm not sure, Asher was the one provisioning them [11:19:41] yes [11:20:08] mark, paravoid: sorry, network hiccup, lost the backscroll. [11:20:25] Would one of you like to use mgmt and see what you can find in the bios settings? [11:20:58] iirc, dell r610 has a bios update to allow pxe boot over 10GB NIC's [11:21:15] I dealt with it in the past [11:23:07] matanya, any idea as to a good search string? I haven't turned up much. [11:23:24] let mo look a sec [11:23:26] *e [11:23:41] thanks [11:25:04] found it andrewbogott [11:25:09] it is in the Integrated Devices Screen [11:25:28] under Embedded Gb NIC [11:25:43] ok, that's what I just turned off... [11:25:50] need to change it to: Enabled with PXE [11:26:10] That will let us boot from the gb interface, and at the moment our goal is to disable the gb interfaces and just use 10g [11:26:25] unless I'm misunderstanding what that setting is for [11:26:39] oh, you don't want the on board NIC? [11:27:00] right, we're trying to switch over to a 10g card. [11:27:24] The onboard is 1g [11:29:25] Although, mark, I can certainly configure us to have one onboard nic just for install. I don't know what that will do with device assignment and IPs and such though... [11:29:25] and the external one won't boot from PXE, yeah? [11:29:25] won't, or I can't find the setting to ask it to. 
[11:29:25] andrewbogott: yeah that works [11:29:25] you'll need to remove the persistent-net file again [11:29:25] and I need to reenable the switch port [11:29:25] i'll do that now [11:29:25] best also not run puppet before you reboot [11:29:41] ok... [11:30:01] ok, first GigE port should work again on the switch [11:30:01] I don't think there's any danger of puppet running atm :) [11:31:46] And anyway, I'm doing a pxe boot, right? So puppet/persistent-net/whatever is moot. [11:31:46] ah, console is in use. Is that one of y'all? [11:31:46] or did I manage to lock it somehow? [13:03:28] (CR) Matanya: [C: 1] remove db9 from dsh and dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/113082 (owner: Dzahn) [13:05:26] grr trusty [13:05:45] oopses on rhodium [13:05:46] somewhere in mptsas [13:07:44] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705 [13:07:46] there it is [13:14:52] (PS1) Hashar: testing for Jenkins / debian-glue [operations/debs/vips] - https://gerrit.wikimedia.org/r/113114 [13:27:56] (CR) Hashar: [C: -1] "export-dir/tarball-dir should be in a user config, see inline comment :-)" (1 comment) [operations/debs/vips] - https://gerrit.wikimedia.org/r/113098 (owner: Hashar) [13:29:19] (PS1) Aude: Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 [13:41:56] (PS2) Aude: Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 [13:42:25] Reedy: ^ [14:24:20] paravoid, ready for me to reinstall labnet1001? 
[14:24:38] I read the backscroll but still can't tell :) [14:25:06] * andrewbogott tries it [14:25:39] yes [14:26:03] 'k thanks [14:28:03] (PS1) Jgreen: makes template symlink for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/113120 [14:30:05] (CR) Jgreen: [C: 2 V: 1] makes template symlink for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/113120 (owner: Jgreen) [14:39:07] :) [14:41:50] Jeff_Green: did you see the launchpad link above? [14:41:56] Jeff_Green: I replied to that bug [14:42:06] Jeff_Green: that's the bug that we have on rhodium [14:42:22] basically trusty + Dell SAS 6/iR is completely broken, kernel oopses [14:43:13] hmm [14:43:19] does someone here know how our frontend caching works... namely how long it will take for anonymous users to get new html if the html changes due to code change? [14:43:25] I could allocate a private /16 to labs instances [14:43:32] that should be enough for our float network layout [14:43:37] probably not when we move to neutron though [14:43:45] Nikerabbit: up to 30 days iirc [14:44:05] (CR) Addshore: [C: 1] Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 (owner: Aude) [14:44:29] paravoid: I see, is there any reasonable way to speed it up? [14:44:38] nope [14:44:43] okay [14:44:54] I heard some horror stories [14:45:01] of how it was done in the past, when we switched to vector [14:45:17] grandpa tim & grandpa mark were telling it [14:45:33] there were very recent ones too [14:45:42] they wrote something that did staggered PURGEs of all pages [14:45:55] what's so horror about that? [14:45:59] that was quite nice [14:46:02] mark, surely /16 is enough even for neutron? 
[14:46:13] andrewbogott: it's hard [14:46:19] every network needs its own subnet [14:46:24] so every project needs its own subnet [14:46:25] https://bugzilla.wikimedia.org/show_bug.cgi?id=46956 [14:46:36] Nikerabbit asked if there'sa "reasonable" way of speeding it up; I guess the answer is "it depends" [14:46:59] So each project would need e.g. 16 or 32 ips… even still [14:47:08] some projects could need a lot [14:47:14] we could support different size subnets of course [14:48:21] apparently we have a /16 allocated to "wifi at eqiad" [14:48:26] paravoid: can you repost it? i don't think I have backlogs that far back [14:48:38] Jeff_Green: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705 [14:49:30] cool. thanks [14:50:35] it's an LTS release, hopefully they'll prioritize a bug that makes it completely broken in half the Dell systems out there :) [14:56:10] (PS1) Wpmirrordev: Extend maximum allowed mediawiki version to 1.23 [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/113124 [14:57:05] (CR) Alexandros Kosiaris: Initial debian release (1 comment) [operations/debs/git-fat] (debian) - https://gerrit.wikimedia.org/r/113018 (owner: Ottomata) [15:03:25] paravoid: i have arranged you a test kernel, mind testing? [15:03:39] what kind of kernel? [15:04:08] one with a fix to this bug [15:04:29] what kind of fix? [15:04:31] commit? [15:05:05] yes, jumped into the #ubuntu-kernel channel, and urged them to prioritize it [15:05:31] they are building a test kernel now, and looking for testers once it is done building [15:06:31] it is only one of 95 commits that broke it :P [15:06:49] i'm not going to run a test kernel unless I know what exactly I'm running [15:06:59] is this a test for a fix, or is this a git bisect? [15:07:26] bisesct [15:07:34] yeah, no thanks [15:07:38] :) [15:07:53] I'm not gonna bisect this, this is way too much work [15:08:19] ok, i'll power on server for this. 
[15:08:41] Bisecting: 6034 revisions left to test after this (roughly 13 steps) [15:09:11] not really, it is a scsi commit that broke it, most likely [15:09:28] and from those there are only 95 [15:09:39] I read the whole log for drivers/scsi, didn't find anything relevant [15:10:19] i think it is one of : http://dpaste.com/1614107/ [15:10:30] Bisecting: 69 revisions left to test after this (roughly 6 steps) [15:10:33] is for drivers/scsi [15:10:58] yeah [15:11:17] well, i'm out. good luck with this :) [15:11:36] qla2xxx/4xxx is FC, fcoe is FCoE, be2iscsi is iSCSI, bnx2fc is FC [15:11:45] all these are unrelated [15:11:51] agreed [15:12:10] so I read the code for all of the relevant ones, didn't find anything [15:12:38] well, just a nasty one [15:12:47] bye [15:12:57] I'm sure canonical has a few SAS 6/iR lying around [15:13:41] I don't think it's in drivers/scsi, the backtrace points to an uninitialized mutex for rescan [15:13:58] could be module load ordering or some kind of race like that [15:17:34] !log all of this week's Cirrus index updates are done except those for the wikipedias to which cirrus is deployed and commons. I [15:17:42] Logged the message, Master [15:17:47] !log I'll start both this evening, time permitting. [15:17:55] Logged the message, Master [15:30:43] mark: /16 for wifi at eqiad? Were we considering hosting the Hackaton to end all Hackatons at the DC? 
:-) [15:31:35] Coren, I would support a hackathon to end pmtpa at pmtpa [15:32:02] ...ending with savage dances around the burning DC [15:34:16] (CR) Faidon Liambotis: [C: -1] "First round :)" (7 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/112469 (owner: Alexandros Kosiaris) [15:38:11] it's a thing with realms [15:38:18] basically we have a /16 set aside for each realm like production [15:38:24] but that's why labs could have a /16 [15:50:31] (PS1) Andrew Bogott: Switch to a bigger fixed IP range for eqiad labs [operations/puppet] - https://gerrit.wikimedia.org/r/113132 [15:50:37] * werdna is pung [15:50:51] (PS1) Mark Bergsma: Increase subnet labs-instances1-b-eqiad from a /24 to a /21 [operations/dns] - https://gerrit.wikimedia.org/r/113133 [15:53:17] mark, is there a corresponding change needed in manifests/network.pp? [15:53:30] i'll have a look [15:54:08] if only I could remember how git worked, i'm getting out of touch ;-p [15:54:27] seriously though [15:54:32] I had to think hard about git diff --cached [15:54:35] i worry :) [15:54:48] (CR) Andrew Bogott: [C: 2] Switch to a bigger fixed IP range for eqiad labs [operations/puppet] - https://gerrit.wikimedia.org/r/113132 (owner: Andrew Bogott) [15:55:24] andrewbogott: so network.pp only has the hosts subnets, not the instances ones [15:55:25] so no [15:55:36] (CR) Mark Bergsma: [C: 2] Increase subnet labs-instances1-b-eqiad from a /24 to a /21 [operations/dns] - https://gerrit.wikimedia.org/r/113133 (owner: Mark Bergsma) [15:56:39] mark, 'labs-instances1-b-eqiad' => { 'ipv4' => '10.68.16.0/24', [15:56:43] not instance ips? [15:56:53] I mean, maybe that doesn't do anything... [15:56:59] hrm [15:57:05] that's actually in the wrong realm [15:57:20] I guess, yes, change it, no, it doesn't really do anything :) [15:58:55] ok [16:00:03] (PS1) Andrew Bogott: Change eqiad instance IP range. 
[operations/puppet] - https://gerrit.wikimedia.org/r/113136 [16:08:12] (CR) Cmjohnson: "typo 1001-1020 are too be removed." [operations/puppet] - https://gerrit.wikimedia.org/r/111132 (owner: Cmjohnson) [16:08:19] (PS2) Cmjohnson: Removing cp1001/1002 from ganglia.pp and cache.pp [operations/puppet] - https://gerrit.wikimedia.org/r/111132 [16:10:10] is it just me, or is https://doc.wikimedia.org/mediawiki-core/master/js/ a redirect to itself? [16:10:15] (301 redirect) [16:10:53] (CR) Cmjohnson: [C: 2] Removing cp1001/1002 from ganglia.pp and cache.pp [operations/puppet] - https://gerrit.wikimedia.org/r/111132 (owner: Cmjohnson) [16:26:52] RECOVERY - Host labnet1001 is UP: PING OK - Packet loss = 0%, RTA = 0.98 ms [16:30:32] (PS1) Cmjohnson: Removing dns entries for db78 [operations/dns] - https://gerrit.wikimedia.org/r/113140 [17:36:56] Reedy: can you have a quick look at https://rt.wikimedia.org/Ticket/Display.html?id=1840 and let me know what to do what that ticket? [17:40:00] (PS1) Odder: Add templateeditor user group, protection to rowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113146 [17:42:27] (PS1) RobH: adding tendril.wikimedia.org.pem to repo [operations/puppet] - https://gerrit.wikimedia.org/r/113147 [17:44:32] (CR) RobH: [C: 2] adding tendril.wikimedia.org.pem to repo [operations/puppet] - https://gerrit.wikimedia.org/r/113147 (owner: RobH) [17:45:01] (CR) CSteipp: [C: -1] "+1 on updating the permission formats. Yes, the previous way was wrong. I missed that." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 (owner: Aude) [17:45:59] csteipp: even if this means showing wikidata rights on wikipedias? 
[17:47:01] i could put it with oauth, but wrap in if ( $wmgUseWikibaseRepo ) [17:49:03] (PS3) Aude: Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 [17:49:10] aude: Yeah, it would be best for them to be on all lists [17:49:26] ok, since it's managed in mediawiki.org? [17:49:27] (CR) jenkins-bot: [V: -1] Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 (owner: Aude) [17:49:50] Otherwise if a user looks at that grant on the central wiki (meta/mwwiki), it looks like they haven't given out those rights, when in fact they have... [17:49:55] ok [17:50:34] i think jenkins is having a bad day [17:50:35] (PS4) Aude: Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 [17:50:35] Yikes... same mess we have with CentralAuth [17:51:00] note, we will be consolodating our permissions soon to have just a few [17:51:23] silly to have permissions at such granularity we do now [17:58:17] !log reedy updated /a/common to {{Gerrit|I5cde3917b}}: db1056 and db1036 full steam [17:58:24] Logged the message, Master [18:01:47] (PS1) Reedy: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113150 [18:01:47] (PS1) Reedy: Move Wikipedias to 1.23wmf13 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113151 [18:01:47] (PS1) Reedy: Group 0 wikis to 1.23wmf14 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113152 [18:01:47] (PS1) Reedy: Point php symlink to 1.23wmf13 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113153 [18:01:48] (CR) Reedy: [C: 2] Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113150 (owner: Reedy) [18:02:22] drdee: WONTFIX or whatever we do on RT [18:02:50] (Merged) jenkins-bot: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113150 (owner: 
Reedy) [18:03:08] Reedy: thanks! [18:05:55] !log reedy started scap: testwiki to 1.23wmf14 and build l10n cache [18:06:00] !log reedy scap aborted: testwiki to 1.23wmf14 and build l10n cache (duration: 00m 04s) [18:06:03] Logged the message, Master [18:06:10] Logged the message, Master [18:06:20] !log reedy started scap: testwiki to 1.23wmf14 and build l10n cache [18:06:28] Logged the message, Master [18:09:25] !log reedy scap failed: CalledProcessError Command '/usr/local/bin/mw-update-l10n' returned non-zero exit status 1 (duration: 03m 04s) [18:09:33] Logged the message, Master [18:13:19] !log reedy started scap: testwiki to 1.23wmf14 and build l10n cache [18:13:26] Logged the message, Master [18:17:24] !log reedy scap aborted: testwiki to 1.23wmf14 and build l10n cache (duration: 04m 05s) [18:17:32] Logged the message, Master [18:17:39] !log reedy started scap: testwiki to 1.23wmf14 and build l10n cache [18:17:47] Logged the message, Master [18:24:50] (PS1) Odder: Enable Translate extension on OTRS wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113160 [18:25:38] is scap broken, Reedy? 
[18:26:11] No [18:28:32] PROBLEM - Host labs-ns1.wikimedia.org is DOWN: CRITICAL - Host Unreachable (208.80.154.19) [18:29:12] PROBLEM - Host virt1000 is DOWN: CRITICAL - Host Unreachable (208.80.154.18) [18:29:51] ^ that's me, just rebooting [18:30:03] PROBLEM - Host labnet1001 is DOWN: PING CRITICAL - Packet loss = 100% [18:30:12] RECOVERY - Host virt1000 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [18:30:22] PROBLEM - Host virt1001 is DOWN: PING CRITICAL - Packet loss = 100% [18:30:53] RECOVERY - Host labnet1001 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms [18:31:12] it's eqiad labs, wee [18:31:12] RECOVERY - Host labs-ns1.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [18:34:03] RECOVERY - Host virt1001 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms [18:37:21] mark, a new nova release appeared in the upstream repo just before I reinstalled labnet1001. So weird behavior I was seeing was because of rpc version disagreements. [18:41:11] unlucky! But, Some upgrades and reboots later, things are mostly behaving. So, bedtime for me. 
[18:58:20] paravoid: ping [18:59:39] !log reedy finished scap: testwiki to 1.23wmf14 and build l10n cache (duration: 41m 59s) [18:59:47] Logged the message, Master [19:04:43] !log mw1185 is segfaulting a lot [19:04:50] Logged the message, Master [19:05:13] !log mw1094 is segfaulting too, but not so much [19:05:20] Logged the message, Master [19:05:52] (CR) Reedy: [C: 2] Move Wikipedias to 1.23wmf13 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113151 (owner: Reedy) [19:06:01] (Merged) jenkins-bot: Move Wikipedias to 1.23wmf13 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113151 (owner: Reedy) [19:07:15] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.23wmf13, testwiki back to 1.23wmf13 too [19:07:19] !log mw1163: ssh: connect to host mw1163 port 22: Connection timed out [19:07:24] Logged the message, Master [19:07:32] Logged the message, Master [19:08:46] (CR) Reedy: [C: 2] Group 0 wikis to 1.23wmf14 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113152 (owner: Reedy) [19:08:53] (Merged) jenkins-bot: Group 0 wikis to 1.23wmf14 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113152 (owner: Reedy) [19:08:56] (CR) Reedy: [C: 2] Point php symlink to 1.23wmf13 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113153 (owner: Reedy) [19:09:03] (Merged) jenkins-bot: Point php symlink to 1.23wmf13 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113153 (owner: Reedy) [19:10:20] ori, the ball is back in your court regarding https://rt.wikimedia.org/Ticket/Display.html?id=4958 :D [19:14:08] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.23wmf14 [19:14:16] Logged the message, Master [19:16:12] everything ok on the prod? Lots of "Unable to allocate memory for pool." 
[19:16:24] Yup
[19:16:41] Normall APC bitching when we stop using a mw version and start using a new one
[19:16:48] not enough space for 3 MW versions simultaneously
[19:20:05] thx Reedy
[19:50:00] (PS2) Jforrester: Enable VisualEditor for Projekt: NS on sewikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113141 (owner: Odder)
[19:50:58] reedy: mw1163 is under going a memtest rt6741
[19:51:53] (CR) CSteipp: [C: 1] Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 (owner: Aude)
[19:54:46] Hi, Reedy
[19:55:57] I'm looking at the error you reported (https://bugzilla.wikimedia.org/show_bug.cgi?id=61330)
[19:56:39] I'm assuming it's only happening on Special:Notifiactions?
[19:57:34] (CR) Jforrester: [C: 1] Enable VisualEditor for Projekt: NS on sewikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113141 (owner: Odder)
[19:57:35] I didn't look
[19:58:20] AndyRussG: Nope
[19:58:35] There's seemingly a stack trace via the api too
[19:58:45] Reedy: Whihc API?
[19:58:53] The MediaWiki API?
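The request behind the API stack trace is an ordinary MediaWiki API query against Echo's notifications module (`meta=notifications`) with flyout-style parameters. As a minimal sketch using only Python's standard `urllib.parse`, the same query string can be assembled as below; the helper name is illustrative, and the trailing `_` parameter of the pasted URL (apparently a client-side cache-busting timestamp) is left out.

```python
from urllib.parse import urlencode


def notifications_query_url(host: str = "en.wikipedia.org", limit: int = 8) -> str:
    """Build an Echo notifications query like the one pasted in the log."""
    params = {
        "action": "query",
        "format": "json",
        "meta": "notifications",        # Echo's notifications query module
        "notformat": "flyout",          # format the results for the notifications flyout
        "notlimit": limit,              # maximum number of notifications to return
        "notprop": "index|list|count",
    }
    # urlencode percent-encodes "|" as %7C, matching the pasted URL
    return f"http://{host}/w/api.php?" + urlencode(params)


print(notifications_query_url())
```

Building the query from a dict keeps the `&` separators out of hand-written text, which is also where the `&not…` parameters are easy to mangle when a URL passes through HTML.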
[19:59:06] URL: http://en.wikipedia.org/w/api.php?action=query&format=json&meta=notifications&notformat=flyout&notlimit=8&notprop=index%7Clist%7Ccount&_=1392321414644
[19:59:34] Ah OK got it
[20:00:07] K that still fits with the theory I have so far
[20:00:33] I should have a patch ina little while
[20:04:09] (PS1) QChris: Add zero tag for carrier 413-02 for simlpewiki on zerodot [operations/puppet] - https://gerrit.wikimedia.org/r/113167
[20:04:18] (PS1) QChris: Add zero tag for carrier 623-03 for yowiki on mdot [operations/puppet] - https://gerrit.wikimedia.org/r/113168
[20:06:12] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[20:06:22] PROBLEM - Varnish traffic logger on amssq47 is CRITICAL: Timeout while attempting connection
[20:06:42] PROBLEM - Varnish traffic logger on amssq48 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:07:03] PROBLEM - Varnishkafka log producer on cp3019 is CRITICAL: Timeout while attempting connection
[20:07:03] PROBLEM - Varnish traffic logger on cp3010 is CRITICAL: Timeout while attempting connection
[20:07:32] PROBLEM - Varnish HTCP daemon on amssq48 is CRITICAL: Timeout while attempting connection
[20:07:32] RECOVERY - Varnish traffic logger on amssq48 is OK: PROCS OK: 2 processes with command name varnishncsa
[20:07:32] PROBLEM - Varnish HTTP mobile-frontend on cp3014 is CRITICAL: Connection timed out
[20:07:53] RECOVERY - Varnish traffic logger on cp3010 is OK: PROCS OK: 2 processes with command name varnishncsa
[20:07:53] RECOVERY - Varnishkafka log producer on cp3019 is OK: PROCS OK: 1 process with command name varnishkafka
[20:08:22] RECOVERY - Varnish traffic logger on amssq47 is OK: PROCS OK: 2 processes with command name varnishncsa
[20:08:32] RECOVERY - Varnish HTTP mobile-frontend on cp3014 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 0.597 second response time
[20:08:32] RECOVERY - Varnish HTCP daemon on amssq48 is OK: PROCS OK: 1 process with UID = 110 (vhtcpd), args vhtcpd
[20:14:46] greg-g, do you know who is the person to ask about gerrit permissions?
[20:14:53] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 86.73333
[20:14:58] I was guessing maybe Chad
[20:15:00] !log Rebuilding GeoData index
[20:15:08] Logged the message, Master
[20:15:57] yeah, chad
[20:17:02] PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 9.36392773196
[20:17:03] PROBLEM - Packetloss_Average on emery is CRITICAL: packet_loss_average CRITICAL: 8.64815796875
[20:18:53] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[20:20:03] PROBLEM - Auth DNS on ns2.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call
[20:20:03] PROBLEM - HTTPS on amssq47 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:20:22] PROBLEM - Varnish HTTP text-backend on amssq50 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:20:36] 502 Bad Gateway, yeah
[20:20:53] RECOVERY - HTTPS on amssq47 is OK: OK - Certificate will expire on 01/20/2016 12:00.
[20:21:02] PROBLEM - HTTPS on ssl3001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:21:52] RECOVERY - Auth DNS on ns2.wikimedia.org is OK: DNS OK: 0.118 seconds response time. www.wikipedia.org returns
[20:21:53] RECOVERY - HTTPS on ssl3001 is OK: OK - Certificate will expire on 01/20/2016 12:00.
[20:22:02] PROBLEM - Varnish HTTP text-frontend on amssq54 is CRITICAL: Connection timed out
[20:22:02] PROBLEM - Varnish HTTP bits on cp3022 is CRITICAL: Connection timed out
[20:22:12] RECOVERY - Varnish HTTP text-backend on amssq50 is OK: HTTP OK: HTTP/1.1 200 OK - 190 bytes in 0.238 second response time
[20:22:22] PROBLEM - Varnish HTTP upload-frontend on cp3007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:22:36] Not looking good.
[20:22:52] RECOVERY - Varnish HTTP text-frontend on amssq54 is OK: HTTP OK: HTTP/1.1 200 OK - 199 bytes in 0.224 second response time
[20:22:53] RECOVERY - Varnish HTTP bits on cp3022 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 1.593 second response time
[20:23:02] PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 8.41106622449
[20:23:09] !log reedy synchronized php-1.23wmf14/extensions/FlaggedRevs 'Fix syntax errors'
[20:23:12] RECOVERY - Varnish HTTP upload-frontend on cp3007 is OK: HTTP OK: HTTP/1.1 200 OK - 230 bytes in 1.596 second response time
[20:23:16] Logged the message, Master
[20:23:32] PROBLEM - Varnish traffic logger on amssq60 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:23:42] PROBLEM - Varnish traffic logger on cp3006 is CRITICAL: Timeout while attempting connection
[20:24:22] PROBLEM - Varnish HTTP upload-frontend on cp3005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:24:22] RECOVERY - Varnish traffic logger on amssq60 is OK: PROCS OK: 2 processes with command name varnishncsa
[20:24:22] PROBLEM - SSH on cp3022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:24:32] PROBLEM - LVS HTTPS IPv4 on upload-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:24:38] Reedy: was that sync addressing the stuff going on in here?
[20:24:42] RECOVERY - Varnish traffic logger on cp3006 is OK: PROCS OK: 2 processes with command name varnishncsa
[20:24:45] Nope
[20:24:49] Completely unrelated
[20:24:54] so, yeah, opsen, what's going on?
[20:25:07] no idea
[20:25:12] RECOVERY - Varnish HTTP upload-frontend on cp3005 is OK: HTTP OK: HTTP/1.1 200 OK - 230 bytes in 0.174 second response time
[20:25:12] RECOVERY - SSH on cp3022 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[20:25:15] i didn't touch it
[20:25:18] The Wiki is just a little bit slowwww.
[20:25:22] PROBLEM - Varnish HTCP daemon on cp3008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:25:22] RECOVERY - LVS HTTPS IPv4 on upload-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 573 bytes in 2.887 second response time
[20:25:34] sjoerddebruin: yeah, european dc is looking bad
[20:25:37] Network graphs look pretty weird for the last half an hour or so
[20:25:44] Link?
[20:25:58] bits.wikimedia.org is sending HTTP 502 errors
[20:26:01] It’s getting usual the last time.
[20:26:12] RECOVERY - Varnish HTCP daemon on cp3008 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd
[20:26:12] PROBLEM - Varnish HTTP text-frontend on amssq55 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:26:20] the ganglia graphs are pretty messed up all right
[20:26:22] https://ganglia.wikimedia.org/latest/
[20:27:03] RECOVERY - Varnish HTTP text-frontend on amssq55 is OK: HTTP OK: HTTP/1.1 200 OK - 198 bytes in 1.674 second response time
[20:27:15] weird dropouts for esams: https://ganglia.wikimedia.org/latest/?c=LVS%20loadbalancers%20esams&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[20:28:03] PROBLEM - LVS HTTPS IPv4 on text-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:28:16] corresponding to network drop from eqiad: https://ganglia.wikimedia.org/latest/?c=Bits%20application%20servers%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[20:29:02] why.... https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&me=Wikimedia&m=cpu_report&s=by+name&mc=2&g=network_report
[20:29:03] RECOVERY - LVS HTTPS IPv4 on text-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 64557 bytes in 7.434 second response time
[20:29:19] yikes
[20:29:47] paravoid: about?
[20:29:54] sorry for the late ping, paravoid :/
[20:30:03] PROBLEM - Varnish HTCP daemon on amssq62 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:30:14] PROBLEM - Varnish HTCP daemon on amssq49 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:30:46] 33% packet loss pinging from palladium to amslvs4
[20:30:53] RECOVERY - Varnish HTCP daemon on amssq62 is OK: PROCS OK: 1 process with UID = 110 (vhtcpd), args vhtcpd
[20:30:57] two BGP sessions came UP at 12:24, same time i got paged. not sure why sessions coming up would cause this though.
[20:31:12] PROBLEM - Varnish HTTP upload-frontend on cp3010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:31:18] maybe they go over a flaky route?
[20:31:28] well, at least people are getting paged....
[20:32:01] they are?
[20:32:02] PROBLEM - Varnish HTTP mobile-frontend on cp3013 is CRITICAL: Connection timed out
[20:32:03] RECOVERY - Varnish HTCP daemon on amssq49 is OK: PROCS OK: 1 process with UID = 110 (vhtcpd), args vhtcpd
[20:32:09] jgage did
[20:32:11] why hasnt my phone gone off.
[20:32:12] RECOVERY - Varnish HTTP upload-frontend on cp3010 is OK: HTTP OK: HTTP/1.1 200 OK - 230 bytes in 4.991 second response time
[20:32:22] PROBLEM - Varnish HTCP daemon on amssq47 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:32:37] (CR) Cmjohnson: [C: 1] remove db9 from dsh and dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/113082 (owner: Dzahn)
[20:32:41] I did
[20:32:46] librenms says that cr2-knams went down and came back up, but it's unclear if that means reboot or just unreaechable
[20:32:53] RECOVERY - Varnish HTTP mobile-frontend on cp3013 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 2.882 second response time
[20:32:54] I got pages which is why I looked in
[20:32:55] should I sms mark?
[20:33:03] seems appropriate
[20:33:12] RECOVERY - Varnish HTCP daemon on amssq47 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd
[20:33:18] i'll sms then
[20:33:19] yes, about 2 minutes ago Device Up: cr2-knams.wikimedia.org
[20:33:22] RobH: yea
[20:33:39] xe-1-1-0.cr2-knams.wikimedia.org 50% packet loss watching mtr on palladium, meh
[20:34:05] Port saturation threshold alarm: cr1-esams.
[20:34:18] txt'd mark to hop online
[20:34:23] * greg-g stands back
[20:34:46] several more bgp sessions just came up
[20:35:30] (CR) Yurik: [C: -1] Add zero tag for carrier 623-03 for yowiki on mdot (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/113168 (owner: QChris)
[20:35:33] core router2 went down like 9 minutes ago and came back 2 minutes ago,, well that's what the mails say
[20:35:51] librenms lies a lot too though
[20:35:59] cuz its told me pdu's rebooted, when i checked they had not.
[20:36:25] but seems like it may have in this case.
[20:37:02] PROBLEM - Varnish traffic logger on cp3009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:37:26] On a huge lag, I guess you all know the site is slow as hell
[20:37:39] oh my
[20:37:46] i haven't learned how to login to the routers yet, but the junos command we need to confirm a reboot is 'show system uptime'.
[20:37:53] RECOVERY - Varnish traffic logger on cp3009 is OK: PROCS OK: 2 processes with command name varnishncsa
[20:39:02] PROBLEM - Varnish HTCP daemon on cp3005 is CRITICAL: Timeout while attempting connection
[20:39:22] PROBLEM - Varnish HTCP daemon on amssq47 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:39:22] yes well, the network admin pass i have isnt workin
[20:39:32] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 93.333336
[20:40:03] cr2-knams?
[20:40:05] 8:39PM up 189 days, 20:51, 2 users, load averages: 0.12, 0.22, 0.16
[20:40:10] that's what it tells me
[20:40:12] PROBLEM - Varnish traffic logger on cp3004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:40:13] RECOVERY - Varnish HTCP daemon on amssq47 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd
[20:40:13] yea, so not reboot
[20:40:22] but some odd issue
[20:40:36] i've texted mark, no answer yet, so i guess time to page someone else for netowrk, faidon
[20:40:50] no need
[20:40:52] i was here
[20:40:55] k
[20:40:58] i hadnt yet, thx
[20:41:00] i was just interested in seeing your reactions
[20:41:02] RECOVERY - Varnish HTCP daemon on cp3005 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd
[20:41:03] RECOVERY - Varnish traffic logger on cp3004 is OK: PROCS OK: 2 processes with command name varnishncsa
[20:41:32] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:41:33] if netierh of you showed up, woudl have just been a push traffic away from there
[20:41:41] but since i havent done that in a long time
[20:41:45] would have been... interesting.
[20:42:03] our link between knams and eqiad had issues
[20:43:06] So, worst case and you and faidon both were not available. I'm not sure anyone else is comfortable troubleshooting link issues; so the failover to route away from esams entirely would have been next step?
[20:44:16] still has, it seems
[20:44:38] yeah not looking good
[20:44:39] http://smokeping.wikimedia.org/?target=DNS.ns2
[20:45:21] So is failover just editing the config-geo and pushing update?
[20:45:31] yes
[20:45:32] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[20:45:56] i would love to see standard procedures like that documented on the wiki
[20:46:01] (how) does this relate to all the varnish alerts?
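The "yea, so not reboot" conclusion rests on reading the uptime line from cr2-knams: 189 days of uptime cannot fit inside the roughly nine-minute window in which the alert mails reported the router down. A minimal Python sketch of that check, assuming only the `up N days, HH:MM` shape pasted above (other BSD-style forms such as `up 5 mins` are not handled); `parse_uptime` and `rebooted_within` are illustrative names, not part of any Junos or Wikimedia tool.

```python
import re
from datetime import timedelta

# Matches the "up N days, HH:MM" part of a BSD-style uptime line (days are optional).
UPTIME_RE = re.compile(r"\bup\s+(?:(\d+)\s+days?,\s+)?(\d+):(\d+)")


def parse_uptime(line: str) -> timedelta:
    match = UPTIME_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised uptime line: {line!r}")
    days = int(match.group(1) or 0)
    hours, minutes = int(match.group(2)), int(match.group(3))
    return timedelta(days=days, hours=hours, minutes=minutes)


def rebooted_within(line: str, window: timedelta) -> bool:
    """True if the reported uptime is no longer than the suspected outage window."""
    return parse_uptime(line) <= window


line = "8:39PM up 189 days, 20:51, 2 users, load averages: 0.12, 0.22, 0.16"
print(parse_uptime(line))                            # 189 days, 20:51:00
print(rebooted_within(line, timedelta(minutes=9)))   # False
```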
[20:46:12] in what manner, just replacing all 'esams' to 'eqiad' in the resources?
[20:46:13] jgage: they are
[20:46:19] oh?
[20:46:21] (or at least, they were?)
[20:46:26] dns has a mention of it
[20:46:26] * jgage looks around
[20:46:29] but its not specific
[20:46:35] https://wikitech.wikimedia.org/wiki/Dns#HOWTO
[20:46:43] its why im asking now =]
[20:46:53] cuz i havent done it in awhile, so i wanna do it if needed now.
[20:47:14] might be a good thing to do some hands-on training with, i.e. at an ops hackathon
[20:47:19] it's certainly not very specific since we've been using gdnsd
[20:47:30] good thing we have the gdnsd author around, at least :P
[20:47:35] well, i think i update the resources to force it to eqiad
[20:47:36] but not sure
[20:47:55] updating the map is relatively straightforward
[20:48:02] oh, wait
[20:48:04] only primary map
[20:48:14] but in the next major release, we'll have a mechanism to disable a datacenter at runtime
[20:48:16] that forces to just eqiad though
[20:48:28] (CR) QChris: Add zero tag for carrier 623-03 for yowiki on mdot (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/113168 (owner: QChris)
[20:48:42] so we can construct the map such that each country (or whatever) has a fallback option or two, and we just "disable esams" as a runtime command to let them fall over, without changing config and/or restarting
[20:48:52] that would be nice brandon
[20:49:03] that's what we had before gdnsd (except less flexible)
[20:49:44] so not sure on the config-geo change now though.
[20:49:45] technically, gdnsd already has that behavior, but only for address-based decisions, not CNAMEs
[20:49:57] and we use CNAMEs
[20:50:13] but the stuff in the new version will handle CNAMEs and logical datacenters for manual force-down
[20:50:27] * RobH is tired of asking.
[20:50:36] i guess we arent in outage or someone else is fixing.
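The runtime mechanism described at 20:48 gives each country or region an ordered fallback list, so that a single "disable esams" command lets its clients fall over without a config edit or restart. The idea fits in a few lines; this is a sketch only, the region names and preference lists are hypothetical, and it is not gdnsd's configuration syntax or API.

```python
from typing import Dict, List, Set

# Hypothetical region -> ordered datacenter preferences (most preferred first).
# Illustrative only: not Wikimedia's real geo map and not gdnsd syntax.
PREFERENCES: Dict[str, List[str]] = {
    "europe": ["esams", "eqiad"],
    "us-east": ["eqiad", "ulsfo"],
    "us-west": ["ulsfo", "eqiad"],
}


def pick_datacenter(region: str, disabled: Set[str],
                    prefs: Dict[str, List[str]] = PREFERENCES) -> str:
    """Return the first datacenter in the region's list that is not disabled."""
    for dc in prefs[region]:
        if dc not in disabled:
            return dc
    raise RuntimeError(f"every datacenter for {region!r} is disabled")


# Normal operation, then a runtime "disable esams" with no map edit:
print(pick_datacenter("europe", disabled=set()))      # esams
print(pick_datacenter("europe", disabled={"esams"}))  # eqiad
```

The point of the design is that the failover target per region is decided ahead of time, in the lists, instead of during an outage by editing the map.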
[20:51:24] currently, updating config-geo to "disable esams" is somewhat problematic because we don't have a predetermined idea of what to fail them to
[20:51:45] s/esams/eqiad/
[20:51:51] I mean, you could s/esams/eqiad/ in the map, but that sends it all to eqiad. maybe for load reasons we want to share some of it to ulsfo instead
[20:52:10] id think all eqiad.
[20:52:16] ulsfo is the opposite way for that traffic
[20:52:38] but then im just changing all the coded entries for the resources stanza at the bottom
[20:52:59] well, the question is whether eqiad has spare capacity for all esams load. I'm guessing from the above the answer is yes, but I wouldn't have known that for sure without looking around at stuff.
[20:53:04] it should
[20:53:11] but yeah, ideally we'll do better in the future
[20:53:34] did some testing around the globe
[20:53:53] from tokyo, and asia lots of packet loss
[20:54:03] PROBLEM - SSH on hooft is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:54:15] (PS1) RobH: re-routing all esams to eqiad [operations/dns] - https://gerrit.wikimedia.org/r/113178
[20:54:18] london going to eqiad, slow, but no packet loss
[20:54:19] one of you folks review that please?
[20:54:24] (CR) jenkins-bot: [V: -1] re-routing all esams to eqiad [operations/dns] - https://gerrit.wikimedia.org/r/113178 (owner: RobH)
[20:54:27] bah, nm
[20:54:30] jenkins hated me.
[20:54:44] new york is ok
[20:54:45] oh, i see
[20:54:49] i fucked that up, disregard
[20:54:53] RECOVERY - SSH on hooft is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[20:54:55] RobH: yeah I was gonna say :)
[20:55:01] leave the keys on the left alone, if you go that route
[20:55:04] my find replace was a bit overmuch!
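"Leave the keys on the left alone" is the difference between a blanket find-and-replace and repointing only the targets. A minimal Python sketch, assuming as a simplification that each resource carries a datacenter-name to target mapping (the real config-geo syntax is not reproduced here, and both function names are hypothetical). In a Python dict the colliding key silently drops an entry; in the actual config text the duplicate key would more likely make the file invalid, which is presumably the kind of error jenkins-bot's `V: -1` reflected.

```python
from typing import Dict

# Simplified, hypothetical shape: datacenter name (key) -> address target (value).
DCMAP: Dict[str, str] = {
    "eqiad": "text-lb.eqiad.wikimedia.org",
    "esams": "text-lb.esams.wikimedia.org",
}


def naive_replace(dcmap: Dict[str, str], src: str, dst: str) -> Dict[str, str]:
    # Replaces src everywhere, keys included, so the esams key collides with
    # eqiad and one entry is silently lost.
    return {dc.replace(src, dst): target.replace(src, dst) for dc, target in dcmap.items()}


def repoint_values(dcmap: Dict[str, str], src: str, dst: str) -> Dict[str, str]:
    # Leaves every key alone and only points src's target at dst.
    return {dc: target.replace(f".{src}.", f".{dst}.") for dc, target in dcmap.items()}


print(naive_replace(DCMAP, "esams", "eqiad"))
# {'eqiad': 'text-lb.eqiad.wikimedia.org'}  (the esams entry is gone)
print(repoint_values(DCMAP, "esams", "eqiad"))
# {'eqiad': 'text-lb.eqiad.wikimedia.org', 'esams': 'text-lb.eqiad.wikimedia.org'}
```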
[20:55:08] and hong kong is horrible
[20:55:36] luckily faidon's jenkings+gdnsd caught that anyways
[20:55:42] (PS2) RobH: re-routing all esams to eqiad [operations/dns] - https://gerrit.wikimedia.org/r/113178
[20:55:53] stop that rob
[20:56:09] ?
[20:56:19] sorry to say that i must now afk due to familial obligation. wish i could stay. here's hoping for a quick resolution, i look forward to reading the incident report.
[20:56:20] i wasnt merging.
[20:56:46] i was preparing if it needed to be done, but if its not needed then i'll stop.
[20:57:32] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 77.76667
[21:00:03] PROBLEM - Varnish HTTP text-frontend on amssq51 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:00:20] !log Disabling OSPF on cr2-knams:xe-1/1/0.0
[21:00:27] Logged the message, Master
[21:00:32] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[21:00:53] RECOVERY - Varnish HTTP text-frontend on amssq51 is OK: HTTP OK: HTTP/1.1 200 OK - 198 bytes in 0.189 second response time
[21:01:28] are the varnish flaps just nrpe failures?
[21:03:09] wikiquote is down for me...
[21:03:59] * mark is looking
[21:04:42] k
[21:05:12] prefer_ipv6 is off...
[21:05:22] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 440382 bytes in 7.917 second response time
[21:05:46] Request: GET http://es.wikipedia.org/w/, from 91.198.174.64 via amssq60 amssq60 ([91.198.174.70]:3128), Varnish XID 732858715
[21:05:54] Error: 503, Service Unavailable
[21:07:29] better now?
[21:07:41] doesn't seem so
[21:09:12] now i have a response
[21:09:12] PROBLEM - Varnishkafka Delivery Errors on cp3014 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 0.016667
[21:09:53] PROBLEM - Varnishkafka Delivery Errors on cp3013 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 0.016667
[21:09:54] mark: seems to be ok again from my side.
[21:09:55] !log Reenabled OSPF on cr2-knams:xe-1/1/0.0
[21:09:56] palladium to amslvs4 looks better
[21:10:04] Logged the message, Master
[21:10:41] mutante: fyi, i have an iceweasel that works with https://sni.velox.ch/ (and the subdomain variants listed there which i tried some of) but *not* with bugzilla.w.o
[21:10:45] wikipedia is working now for me
[21:11:02] RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: 1.3758927551
[21:11:03] RECOVERY - Packetloss_Average on emery is OK: packet_loss_average OKAY: 3.62182234375
[21:11:12] RECOVERY - Varnishkafka Delivery Errors on cp3014 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[21:11:12] PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1119.93335
[21:11:34] seeing no packet loss anymore
[21:11:53] RECOVERY - Varnishkafka Delivery Errors on cp3013 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[21:12:22] (PS2) Odder: Add templateeditor user group, protection to rowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113146
[21:12:32] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 6422.933105
[21:12:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2591.233398
[21:12:53] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 425.866669
[21:13:03] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3125.699951
[21:13:03] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3045.800049
[21:13:12] RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[21:14:35] (CR) Reedy: [C: -1] "Per checking" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110668 (owner: Hoo man)
[21:15:00] (Abandoned) RobH: re-routing all esams to eqiad [operations/dns] - https://gerrit.wikimedia.org/r/113178 (owner: RobH)
[21:15:24] (PS3) Reza: Disable minor edit on NewUserMessage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110953
[21:15:28] (CR) Reedy: [C: 2] Disable minor edit on NewUserMessage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110953 (owner: Reza)
[21:15:35] (Merged) jenkins-bot: Disable minor edit on NewUserMessage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110953 (owner: Reza)
[21:17:02] RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.43932234694
[21:18:53] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[21:20:14] (CR) Reedy: "Where is this used?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111434 (owner: Odder)
[21:20:57] (PS3) Odder: Add National Library of Wales to wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111827
[21:21:04] (CR) Reedy: [C: 2] Add National Library of Wales to wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111827 (owner: Odder)
[21:21:07] (CR) Odder: "I have no idea. The Labs People have been unusually silent." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111434 (owner: Odder)
[21:21:27] (Merged) jenkins-bot: Add National Library of Wales to wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111827 (owner: Odder)
[21:22:35] (PS2) Tim Landscheidt: Fix typo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112239
[21:22:41] (CR) Reedy: [C: 2] Fix typo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112239 (owner: Tim Landscheidt)
[21:22:52] (Merged) jenkins-bot: Fix typo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112239 (owner: Tim Landscheidt)
[21:28:09] Coren: https://gerrit.wikimedia.org/r/111827 & https://bugzilla.wikimedia.org/show_bug.cgi?id=60964
[21:28:53] or rather https://gerrit.wikimedia.org/r/#/c/111434/ & https://bugzilla.wikimedia.org/show_bug.cgi?id=52583
[21:29:05] I was about to say, Reedy already merged that. :-)
[21:29:06] Coren: you were assigned to 52583 at some point
[21:29:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[21:30:02] (PS2) Odder: Add Apple Touch icon for Labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111434
[21:30:03] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[21:31:35] (PS2) Odder: Let admins add users to 'accountcreator' on itwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112337
[21:31:49] (CR) Reedy: [C: 2] Let admins add users to 'accountcreator' on itwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112337 (owner: Odder)
[21:32:00] (Merged) jenkins-bot: Let admins add users to 'accountcreator' on itwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112337 (owner: Odder)
[21:32:35] I guess https://bugzilla.wikimedia.org/show_bug.cgi?id=61319 is caused by the cluster failover
[21:33:02] purge all the things
[21:33:22] they say purge does not help
[21:34:19] I guess the time will put things where they belong :P
[21:36:03] (CR) Reedy: [C: -1] "Still referenced on the noc conf screen. Needs removing from there and symlink creation script" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111766 (owner: TTO)
[21:37:03] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[21:37:12] (PS3) Odder: Add templateeditor user group, protection to rowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113146
[21:37:17] (CR) Reedy: [C: 2] Add templateeditor user group, protection to rowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113146 (owner: Odder)
[21:37:25] (Merged) jenkins-bot: Add templateeditor user group, protection to rowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113146 (owner: Odder)
[21:37:50] (PS3) Odder: Enable VisualEditor for Projekt: NS on sewikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113141
[21:37:54] (CR) Reedy: [C: 2] Enable VisualEditor for Projekt: NS on sewikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113141 (owner: Odder)
[21:38:04] (Merged) jenkins-bot: Enable VisualEditor for Projekt: NS on sewikimedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113141 (owner: Odder)
[21:38:20] Reedy: Thanks. :-)
[21:39:29] (CR) Reedy: [C: -1] Fix OAuth rights for Wikidata (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 (owner: Aude)
[21:39:43] eh, James_F and his scrupulousness :-)
[21:40:11] (PS2) Odder: Set wmgBabelCategoryNames for Esperanto Wiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113142
[21:40:12] twkozlowski: Given that Certain Other People™ have broken VE with config changes, I watch them like a hawk. :-)
[21:40:23] (CR) Reedy: [C: 2] Set wmgBabelCategoryNames for Esperanto Wiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113142 (owner: Odder)
[21:40:32] (Merged) jenkins-bot: Set wmgBabelCategoryNames for Esperanto Wiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113142 (owner: Odder)
[21:40:36] Reedy: You doing https://gerrit.wikimedia.org/r/#/c/112850/ (legalteamwiki) too?
[21:40:46] James_F: sure; it's you who'll be using the config, not me :-P
[21:41:05] twkozlowski: On Swedish Wikimedia wiki? I think it's unlikely. :-)
[21:41:27] James_F: the VE config in general, I meant
[21:41:40] See your scrupulousness, you killjoy? :-P
[21:41:55] * twkozlowski hugs James_F
[21:42:17] * James_F grins.
[21:43:52] (CR) coren: [C: 2] "Ze icon, she is prettie." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111434 (owner: Odder)
[21:46:14] 'wgAppleTouchIcon' => array(
[21:46:14] 'default' => false, // iOS searches for icons in docroot by default
[21:46:30] docroot/bits/apple-touch/labs.png
[21:46:37] That isn't the docroot for labswiki...
[21:46:48] ask Krinkle|detached :)
[21:47:11] or Reedy
[21:47:29] one of them promised to write appletouch.php magic
[21:47:42] I think I did
[21:48:09] Reedy: My understanding is that that changeset was only intended to make the icon available, and that it'd then be used from elsewhere.
[21:48:23] Yes, because I don't know what you want to use it for, Coren
[21:48:27] Why not do it in the same changeset?
[21:48:40] No one answered my question in BZ :-(
[21:49:01] https://bugzilla.wikimedia.org/show_bug.cgi?id=52583#c6
[21:49:08] 'labswikiwhateveritscalledwiki' => '//bits.beta.wmflabs.org/apple-touch/apple-labs.png',
[21:49:22] 'labswikiwhateveritscalledwiki' => '//bits.beta.wmflabs.org/apple-touch/labs.png',
[21:49:51] Reedy: I'd ask Odder. I just +2'ed the innocuous patch he wrote. :-)
[21:49:59] https://github.com/wikimedia/operations-mediawiki-config/blob/master/w/touch.php
[21:50:07] if ( $wgAppleTouchIcon === false ) {
[21:50:08] # That's not very helpful, that's where we are already
[21:50:08] header( 'HTTP/1.1 404 Not Found' );
[21:50:24] And was about to ask if there were caveats to sync-docroot before I went ahead.
[21:50:36] * twkozlowski admits to not being able to write such icons as SVGs yet
[21:50:52] You don't even need to run sync-docroot if it's only for labs
[21:52:28] Reedy: that bit of config was added before you decided to simplify docroots
[21:52:51] the 'default' => false, I mean
[21:53:29] and?
[21:53:46] so for a moment we had those icons in docroots
[21:54:00] and?
[21:54:09] it needs to be not false for the image to be served for a wiki
[21:54:18] then they got moved to bits, but the config remained as it was
[21:54:36] Doesn't make any difference
[21:55:11] whatever, we serve Wikipedias with wikipedia.png already
[21:55:26] And we have entry for that
[21:55:27] 'wiki' => '//bits.wikimedia.org/apple-touch/wikipedia.png',
[21:57:00] The point is still adding it to a bits docroot will essentially make no difference
[21:57:37] is this the right place to point to something that might be broken?
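The quoted fragment of w/touch.php shows the rule behind "it needs to be not false for the image to be served for a wiki": when the resolved wgAppleTouchIcon value is `false`, the script answers 404, so dropping a PNG into a bits docroot changes nothing until a matching config entry exists. A minimal Python sketch of that decision, assuming a simplified lookup order (exact database name, then suffix such as `wiki`, then `default`; MediaWiki's real per-wiki resolution has more rules) and assuming a redirect when an icon URL is configured, since the log does not show that branch. `resolve` and `touch_response` are hypothetical names.

```python
from typing import Dict, Optional, Tuple, Union

Setting = Dict[str, Union[str, bool]]

# Mirrors the two entries quoted in the log; everything else is omitted.
APPLE_TOUCH_ICON: Setting = {
    "default": False,
    "wiki": "//bits.wikimedia.org/apple-touch/wikipedia.png",
}


def resolve(setting: Setting, dbname: str, suffix: str) -> Union[str, bool, None]:
    """Simplified lookup: exact database name, then suffix, then 'default'."""
    for key in (dbname, suffix, "default"):
        if key in setting:
            return setting[key]
    return None


def touch_response(setting: Setting, dbname: str, suffix: str) -> Tuple[int, Optional[str]]:
    icon = resolve(setting, dbname, suffix)
    if not isinstance(icon, str):
        # No icon URL configured (false or missing): 404, like the quoted `=== false` branch.
        return 404, None
    # Assumption: redirect to the configured icon URL (this branch is not shown in the log).
    return 302, icon


print(touch_response(APPLE_TOUCH_ICON, "enwiki", "wiki"))
print(touch_response(APPLE_TOUCH_ICON, "enwikivoyage", "wikivoyage"))
```

Adding a `'labswiki' => '//bits.beta.wmflabs.org/apple-touch/labs.png'` entry, as proposed at 22:03, is what would flip the second kind of result from 404 to a served icon for that wiki.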
[21:58:03] Possibly
[21:58:11] Depends what the something is
[21:58:13] Reedy: Not the point
[21:58:15] well - this http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Biography/Infoboxes
[21:58:16] We can't help if facebook is down
[21:58:23] Reedy: Labs people wanted an icon, I gave it to them
[21:59:01] Next you'll have them telling you it doesn't work
[21:59:33] it works. it's a prettie.
[22:00:18] http://deployment.wikimedia.beta.wmflabs.org/apple-touch.png
[22:00:19] 404
[22:00:22] (PS1) Dzahn: load Bugzilla virtual host first [operations/puppet] - https://gerrit.wikimedia.org/r/113265
[22:00:59] oi, I'm getting Wikimedia Foundation errors too
[22:03:04] 'wgAppleTouchIcon' => array( 'labswiki' => '//bits.beta.wmflabs.org/apple-touch/labs.png', ),
[22:03:40] Reedy: as I told you, I don't know what they want to use it for
[22:03:48] I asked, no one answered.
[22:04:03] So they can configure it for themselves, they have the icon ready
[22:04:13] right Coren
[22:04:34] + I'm seeing Error: 503, Service Unavailable at Thu, 13 Feb 2014 22:04:21 GMT
[22:05:04] twkozlowski, same problem for me
[22:06:30] (PS11) Yurik: Handle HTTPS for Zero traffic [operations/puppet] - https://gerrit.wikimedia.org/r/102316
[22:12:28] !log restarting EL on vanadium
[22:12:35] Logged the message, Master
[22:18:39] (PS3) Ebe123: Add transwiki import options for zh.wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110876
[22:18:53] PROBLEM - Check status of defined EventLogging jobs on vanadium is CRITICAL: CRITICAL: Stopped EventLogging jobs: consumer/vanadium
[22:18:55] (CR) Reedy: [C: 2] Add transwiki import options for zh.wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110876 (owner: Ebe123)
[22:19:02] (Merged) jenkins-bot: Add transwiki import options for zh.wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110876 (owner: Ebe123)
[22:20:15] (PS4) Ebe123: Remove ability of admins to give import userright on zh.wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112824
[22:20:22] (CR) Reedy: [C: 2] Remove ability of admins to give import userright on zh.wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112824 (owner: Ebe123)
[22:20:30] (Merged) jenkins-bot: Remove ability of admins to give import userright on zh.wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112824 (owner: Ebe123)
[22:21:00] (PS2) Chad: Lower search cache expiry to 12 hours on all wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112577
[22:21:06] (CR) Reedy: [C: 2] Lower search cache expiry to 12 hours on all wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112577 (owner: Chad)
[22:22:12] (Merged) jenkins-bot: Lower search cache expiry to 12 hours on all wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112577 (owner: Chad)
[22:22:28] <^d> Reedy: Oh cool, thanks
[22:22:46] <^d> manybubbles: prefix cache -> 12h now :)
[22:23:07] Where do we file Jenkins bugs?
[22:23:12] sweet
[22:23:16] Wikimedia -> QA?
[22:23:34] <^d> WM -> CI
[22:23:37] (PS4) Ebe123: Add namespace aliases for ang.wikipedia and wiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112409
[22:24:04] Duh, thanks
[22:25:02] (PS1) Ori.livneh: Disable EventLogging's MongoDB writer [operations/puppet] - https://gerrit.wikimedia.org/r/113272
[22:25:08] (CR) Reedy: [C: 2] Add namespace aliases for ang.wikipedia and wiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112409 (owner: Ebe123)
[22:25:20] (Merged) jenkins-bot: Add namespace aliases for ang.wikipedia and wiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112409 (owner: Ebe123)
[22:25:29] (CR) Ori.livneh: [C: 2 V: 2] Disable EventLogging's MongoDB writer [operations/puppet] - https://gerrit.wikimedia.org/r/113272 (owner: Ori.livneh)
[22:26:02] (PS2) Ebe123: Add el.wikivoyage to it.wikivoyage transwiki import [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110822
[22:26:07] (CR) Reedy: [C: 2] Add el.wikivoyage to it.wikivoyage transwiki import [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110822 (owner: Ebe123)
[22:26:34] soo.... what's the status of esams?
[22:26:38] RobH: ^ ?
[22:26:42] (PS2) Jforrester: Enable wgTemplateDataUseGUI on MediaWiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112156
[22:26:46] * greg-g pings the SF-based person
[22:26:52] (Merged) jenkins-bot: Add el.wikivoyage to it.wikivoyage transwiki import [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110822 (owner: Ebe123)
[22:26:54] (CR) Reedy: [C: 2] Enable wgTemplateDataUseGUI on MediaWiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112156 (owner: Jforrester)
[22:27:11] RobH: apparently people are still experiencing 503s?
[22:27:17] greg-g: i dunno man
[22:27:22] i wasnt involved with the fix
[22:27:22] (Merged) jenkins-bot: Enable wgTemplateDataUseGUI on MediaWiki.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112156 (owner: Jforrester)
[22:27:25] mark fixed it.
[22:27:53] 16:11 < mark> seeing no packet loss anymore
[22:27:58] (timezone on that is eastern)
[22:28:19] i dunno what exactly broke or what was done to fix it, some routing stuff.
[22:28:27] yeah, that stuff is way over my head
[22:28:31] (PS2) Odder: Remove redundant GroupOverride entries [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112339
[22:28:34] (CR) Reedy: [C: 2] Remove redundant GroupOverride entries [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112339 (owner: Odder)
[22:28:59] (Merged) jenkins-bot: Remove redundant GroupOverride entries [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112339 (owner: Odder)
[22:29:12] (PS3) Gerrit Patch Uploader: Changes to patrol settings for shwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112306
[22:29:17] (CR) Reedy: [C: 2] Changes to patrol settings for shwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112306 (owner: Gerrit Patch Uploader)
[22:30:08] (PS2) Matanya: (bug 61014) add he.wiki checkusers additional rights [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111985
[22:30:10] (Merged) jenkins-bot: Changes to patrol settings for shwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112306 (owner: Gerrit Patch Uploader)
[22:30:12] (CR) Reedy: [C: 2] (bug 61014) add he.wiki checkusers additional rights [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111985 (owner: Matanya)
[22:30:49] (PS3) Matanya: (bug 61014) add he.wiki checkusers additional rights [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111985
[22:30:53] RECOVERY - Check status of defined EventLogging jobs on vanadium is OK: OK: All defined EventLogging jobs are runnning.
[22:30:54] (CR) Reedy: [C: 2] (bug 61014) add he.wiki checkusers additional rights [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111985 (owner: Matanya)
[22:31:11] Reedy: Did the scap for wmf14 fail? Seeing a few failed messages for VE's new English ones on test2…
[22:32:21] The only one that ran to completion didn't
[22:32:29] (didn't fail)
[22:32:30] (Merged) jenkins-bot: (bug 61014) add he.wiki checkusers additional rights [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111985 (owner: Matanya)
[22:32:38] Reedy: Hmm. :-(
[22:33:23] !log reedy synchronized wmf-config/ 'All of the changes'
[22:33:28] :-)
[22:33:30] Logged the message, Master
[22:34:50] Wonder if localisation update will fix it
[22:36:13] urllib2.HTTPError: HTTP Error 503: Service Unavailable
[22:36:55] (PS1) Ori.livneh: eventlogging: manage /etc/eventlogging.d recursively [operations/puppet] - https://gerrit.wikimedia.org/r/113277
[22:38:03] is it known that english-language messages are showing up on non-english wikis?
[22:38:15] e.g. https://pl.wikipedia.org/w/index.php?title=Zbigniew_Romaszewski&diff=38696121&oldid=38683613 "(6 intermediate revisions by 6 users not shown)", this was previously translated
[22:38:37] I think i10n upate is running
[22:41:32] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[22:41:56] Not for another few hours yet hoo
[22:42:44] i'm complaining because this is apparnetly visible enough that people raised the issue on our village pump
[22:51:55] !log ori synchronized php-1.23wmf13/extensions/NavigationTiming 'Update NavigationTiming to master for I4aa367e96: Round 'mediaWikiLoadComplete' to comply with schema'
[22:52:04] Logged the message, Master
[22:54:32] !log ori synchronized php-1.23wmf14/extensions/NavigationTiming 'Update NavigationTiming to master for I4aa367e96: Round 'mediaWikiLoadComplete' to comply with schema'
[22:54:39] Logged the message, Master
[22:56:14] !log ori synchronized php-1.23wmf13/extensions/WikimediaEvents 'Update WikimediaEvents to master for If3d214319: Don't log NewEditorEdit for anons'
[22:56:22] Logged the message, Master
[23:04:40] !log Created Translate tables on otrs_wikiwiki
[23:04:46] Logged the message, Master
[23:06:09] Reedy: thank you for merging the he.wiki CU patch
[23:07:46] (PS1) Dzahn: remove mw1163 from dsh groups [operations/puppet] - https://gerrit.wikimedia.org/r/113287
[23:07:59] see matanya, now you have a reason to join my appreciation thread
[23:08:17] twkozlowski: i have like 30 reasons
[23:08:33] then go :-) add some love to the mailing list
[23:08:34] (PS2) Dzahn: remove mw1163 from dsh groups [operations/puppet] - https://gerrit.wikimedia.org/r/113287
[23:08:53] mutante: :) thank you
[23:09:02] but i prefer saying thank you directly to people rather than noise on ML :)
[23:09:09] why did i change the line with tmh2 ?:p
[23:09:15] i didnt even touch it
[23:10:26] (CR) Dzahn: [C: 2] remove mw1163 from dsh groups [operations/puppet] - https://gerrit.wikimedia.org/r/113287 (owner: Dzahn)
[23:11:21] meh, newlines
[23:11:44] (PS1) Yurik: Removed obsolete carrier 405-25 [operations/puppet] - https://gerrit.wikimedia.org/r/113289
[23:12:14] (PS2) Odder: Enable Translate extension on OTRS wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113160
[23:12:57] (CR) Reedy: [C: 2] Enable Translate extension on OTRS wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113160 (owner: Odder)
[23:13:04] (Merged) jenkins-bot: Enable Translate extension on OTRS wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113160 (owner: Odder)
[23:13:39] (PS2) Chad: Adding new configuration option "listenLocalOnly" [operations/debs/lucene-search-2] - https://gerrit.wikimedia.org/r/112871 (owner: Patchon)
[23:13:54] (Abandoned) Chad: Replacing spaces with tabs. [operations/debs/lucene-search-2] - https://gerrit.wikimedia.org/r/113091 (owner: Patchon)
[23:15:37] (CR) Chad: [C: 2] Adding new configuration option "listenLocalOnly" [operations/debs/lucene-search-2] - https://gerrit.wikimedia.org/r/112871 (owner: Patchon)
[23:15:50] (Merged) jenkins-bot: Adding new configuration option "listenLocalOnly" [operations/debs/lucene-search-2] - https://gerrit.wikimedia.org/r/112871 (owner: Patchon)
[23:17:54] (PS5) Chad: Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 (owner: Aude)
[23:23:59] (PS6) Aude: Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117
[23:24:05] (CR) Reedy: [C: 2] Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 (owner: Aude)
[23:24:12] (Merged) jenkins-bot: Fix OAuth rights for Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113117 (owner: Aude)
[23:25:57] (PS2) SPQRobin: Enable VisualEditor on Wikimedia Incubator [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112410
[23:26:02] (CR) Reedy: [C: 2] Enable VisualEditor on Wikimedia Incubator [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112410 (owner: SPQRobin)
[23:26:12] (Merged) jenkins-bot: Enable VisualEditor on Wikimedia Incubator [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112410 (owner: SPQRobin)
[23:26:47] (PS2) TTO: Add local interwiki for metawiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111426
[23:26:50] (CR) Reedy: [C: 2] Add local interwiki for metawiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111426 (owner: TTO)
[23:26:58] (Merged) jenkins-bot: Add local interwiki for metawiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111426 (owner: TTO)
[23:28:01] !log reedy synchronized database lists files:
[23:28:09] Logged the message, Master
[23:28:27] still cannot login my bot
[23:28:32] what's up?
[23:28:32] !log reedy synchronized wmf-config/
[23:28:40] Logged the message, Master
[23:29:24] Vito: what error(s) are you getting where?
[23:29:49] Reedy: urllib2.HTTPError: HTTP Error 503: Service Unavailable on it.wiki from Italy
[23:30:05] Retrieving watchlist for wikipedia:it via API.
[23:30:06] Result: 503 Service Unavailable
[23:30:16] then traceback
[23:31:46] Where is it running from?
[23:33:04] !log ori synchronized php-1.23wmf13/extensions/NavigationTiming 'Update NavigationTiming to master for Ic0c9060c5: Don't log 'desktop-beta' as mobileMode'
[23:33:12] Logged the message, Master
[23:33:49] (PS2) Dzahn: load Bugzilla virtual host first [operations/puppet] - https://gerrit.wikimedia.org/r/113265
[23:33:59] !log ori synchronized php-1.23wmf14/extensions/NavigationTiming 'Update NavigationTiming to master for Ic0c9060c5: Don't log 'desktop-beta' as mobileMode'
[23:34:07] Logged the message, Master
[23:35:59] Reedy: err...it's a pywiki bot running on my debian box from AS3269
[23:36:10] though I don't if you meant this
[23:36:23] (PS1) Chad: Remove optional inclusion of Elastica. Not like it would do any good if it was missing [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113294
[23:36:56] (PS2) Yurik: Updated 623-03 whitelisted language list to match config [operations/puppet] - https://gerrit.wikimedia.org/r/113168 (owner: QChris)
[23:36:56] !log ori synchronized php-1.23wmf13/extensions/EventLogging 'Update EventLogging for I3de7c406f: Have mtime as calculated by startup module increase on schema change'
[23:37:03] Logged the message, Master
[23:37:16] (CR) Yurik: Updated 623-03 whitelisted language list to match config (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/113168 (owner: QChris)
[23:37:41] Reedy: some people are also complaining about overall performances
[23:37:55] (CR) Dzahn: [C: 2] "for clients who don't know Server Name Indication to get Bugzilla first so they can report bugs. if it was just sorted alpha you'd get ar." [operations/puppet] - https://gerrit.wikimedia.org/r/113265 (owner: Dzahn)
[23:37:58] and someone also got the infamous "wikimedia error"
[23:40:41] Request: GET http://it.wikipedia.org/, from 91.198.174.70 via amssq57 amssq57 ([91.198.174.67]:3128), Varnish XID 904253382
[23:40:41] Forwarded for: 93.58.107.156, 91.198.174.70
[23:40:41] Error: 503, Service Unavailable at Thu, 13 Feb 2014 23:39:11 GMT
[23:40:47] that's another user btw
[23:41:29] !log restarting apache on zirconium, moved BZ site first and deleted old site
[23:41:36] Logged the message, Master
[23:55:55] (PS1) Yurik: Zero: 470-01 now handles M & Zero, on both Opera & regular [operations/puppet] - https://gerrit.wikimedia.org/r/113299
[23:59:07] we'll deploy parsoid stuff soon, and might need ops help for restarting parsoids
[23:59:19] mutante, can I draft you once more?
[23:59:57] or ori?