[00:22:15] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [00:28:34] New review: Platonides; "(1 comment)" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/69982 [01:31:27] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.00767147541 secs [01:32:37] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.006761908531 secs [02:07:09] !log LocalisationUpdate completed (1.22wmf7) at Mon Jun 24 02:07:08 UTC 2013 [02:07:19] Logged the message, Master [02:12:57] !log LocalisationUpdate completed (1.22wmf8) at Mon Jun 24 02:12:57 UTC 2013 [02:13:05] Logged the message, Master [02:24:40] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 24 02:24:40 UTC 2013 [02:24:49] Logged the message, Master [03:55:05] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [03:55:05] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [03:55:05] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [03:55:05] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [03:55:05] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [03:55:05] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [03:55:06] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [03:55:06] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [03:55:07] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [03:55:07] PROBLEM - Puppet freshness on spence is CRITICAL: No successful Puppet run in the last 10 hours [04:23:55] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours [04:28:55] PROBLEM - 
Puppet freshness on magnesium is CRITICAL: No successful Puppet run in the last 10 hours [04:52:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:53:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [05:04:51] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [05:07:07] New patchset: Tim Starling; "Shell access for Sean Pringle" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70121 [05:11:26] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [05:21:34] apergos: morning [05:24:55] apergos: ping me when you get online :) [06:24:06] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 10 hours [06:38:58] PROBLEM - search indices - check lucene status page on search1009 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 345 bytes in 0.002 second response time [07:11:24] apergos: ah I just realized that it's a holiday here, silly me :P [07:29:23] New review: Nemo bis; ""overwhelming community consensus" seems an overstatement to me." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69982 [07:41:21] hello [07:45:57] hi hashar [07:46:09] :-) [07:48:39] ori-l: I like your "polemical" style :-] [07:48:43] regarding bugzilla admins [07:51:28] hashar: :P no comments from me on the subject [07:52:25] well that triggered a reply from rob that clarified the potential bike shed [07:52:32] so it was definitely useful [07:52:51] I think the whole point is that we are trying to make things a bit more professional, which needs a ton of cleanup beforehand [07:53:04] just like we started Gerrit for mediawiki/core with only a few people with 2 [07:53:05] cr +2 [07:56:07] I guess so. I don't really see the point, though. 
[07:56:45] Bugzilla mostly relegates admin actions to a separate interface, so the risk for casual abuse is low compared to other software platforms [07:58:27] Rob mentioned that people were changing things without logging them, but that's not tied to admins. Just two days ago I created two components for two new extensions and was vaguely wondering if I should e-mail Andre about it or not waste his time. [07:59:11] I guess the main point was to make sure Andre would be made aware of most of the changes [07:59:22] such as new products / workflow etc [07:59:30] and to avoid having a couple people doing a change on their own without anybody being informed [07:59:45] that will be surely softened later on I guess [08:00:20] Yeah, I'm not especially upset by it. :) [08:04:50] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.0001595020294 secs [08:24:19] paravoid: holiday or no, I'm on [08:24:38] but starting a bit later because we have the evening meeting [08:32:46] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 0.00207722187 secs [08:50:00] apergos: oh, hey [08:50:06] apergos: I put some initial docs up in https://wikitech.wikimedia.org/wiki/Ceph [08:50:09] oh yay [08:50:30] could you have a look and make corrections (whether it's english or technical) and/or tell me what's unclear so I can complete it? [08:50:38] I sure will [08:50:47] that will help to make sure I know what's going on on those boxes [08:50:52] that's the hope [08:51:00] I want to create a media storage page as well [08:51:12] plus cleanup all of swift which is full of old stuff [08:51:22] old and inaccurate I mean, like references to ms5/ms7 etc. [08:51:25] ugh [08:51:45] what a great idea [08:51:54] if you want to clean up any of that, I'd be happy to share the work [08:52:01] heh sure [08:52:27] but first things first, have a look on /Ceph when you get the time :) [08:52:41] tab's open and waiting [08:52:57] (I do like mark: open tabs have pending work in them. 
My ff is often unhappy ;-) ) [09:03:06] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [09:46:13] apergos: https://wikitech.wikimedia.org/wiki/Media_storage too [09:46:38] that was fast [09:50:12] do you think that I should just get rid of https://wikitech.wikimedia.org/wiki/Media_server & associated bits [09:50:17] like https://wikitech.wikimedia.org/wiki/Media_server/2011_Media_Storage_plans [09:51:30] we sure don't need them in the image handling category [09:51:59] I like to have a past record of things we've done, but at the least they should be marked with a big fat obsolete template [09:53:03] these obsolete pages under generically named titles always confused me [09:53:15] if we want to keep obsolete docs, maybe move them under Obsolete/ ? [09:53:25] but your wiki skills are much better than mine I'm guessing [09:53:51] having their own namespace would be pretty great but Obsolete/ is fine too (and quicker) [09:54:16] having their own namespace would take them out of the default search is what I'm thinking [09:55:46] apergos: you can use the magic keyword __NOINDEX__ [09:56:14] * apergos goes to look at the template [09:56:20] ahh no [09:56:21] sorry [09:56:25] that is for webcrawlers [09:57:30] I was wondering.. I didn't think lucene was smart enough to check things like that [09:57:32] https://wikitech.wikimedia.org/w/index.php?title=Template:Archive&action=edit [09:57:35] there's the template [10:04:09] sent email to ops for comments, if no one cares I'll make that happen [10:05:11] huh japanese spam.. first time for everything [10:05:50] about Viagra apparently :-D [10:08:13] paravoid: just remove them [10:09:07] an obsolete namespace wfm [10:22:30] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [10:58:03] paravoid: can I run these ceph health etc commands on either frontend or backend hosts to get info about the whole cluster? 
[10:58:19] ah nm you say here on a frontend [10:58:36] apergos: technically you can run them from any host that has the ceph package installed, ceph.conf and the admin key [10:58:46] for simplicity, I'm installing the admin key on all hosts, backends & frontends [10:59:01] ok [10:59:03] but you could limit that on just frontends, or one frontend or even e.g. just fenari [10:59:11] right [11:19:20] demon [11:24:17] RD: ping [12:20:17] New patchset: Mark Bergsma; "Use port 3128 for newer mobile backend Varnish servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70148 [12:21:24] New patchset: Mark Bergsma; "Use port 3128 for newer mobile backend Varnish servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70148 [12:21:53] New patchset: Mark Bergsma; "Use port 3128 for newer mobile backend Varnish servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70148 [12:22:40] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70148 [12:23:52] yay [12:29:29] New patchset: Mark Bergsma; "Set backend weight to 100 for new mobile servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70149 [12:29:44] is that for chash? [12:30:08] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70149 [12:32:12] yes [12:38:48] New patchset: Mark Bergsma; "Send purges to the right port" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70151 [12:39:52] New patchset: Mark Bergsma; "Send purges to the right port" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70151 [12:40:17] and are we migrating to new boxes? 
[12:40:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70151 [12:45:19] yes [12:46:00] i'm installing 4 in esams as we speak [12:46:04] and hopefully 4 in eqiad later today [12:46:16] and hopefully 4 in ulsfo in the next few weeks [12:47:00] 4 in eqiad to replace the current ones or supplement them? [12:48:09] replace [12:48:15] these new ones are vastly more powerful [12:48:37] what are their specs? [12:48:40] dysprosium-like? [12:48:48] in eqiad, yes [12:48:51] in esams, half the memory [12:48:57] but double the SSD size [12:49:29] and H710s :/ [12:53:19] that's good, isn't it [12:55:15] New patchset: Mark Bergsma; "Prepare mobile VCL for multi-tier setup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70155 [12:55:15] no [12:55:21] but better than H310s [12:55:31] i have a big box of H310s next to me [13:00:09] why is not good? JBOD? [13:02:43] btw, the H710P (but not the H710) has an interesting feature especially designed for SSDs [13:03:27] fastpath/cut through IO, optimizations for random workloads with small blocks [13:03:34] bypasses the BBU completely afaik [13:03:49] which makes it kind of ridiculous that it's offered on the H710P but not the H710 [13:03:52] Dell... [13:06:04] you mean, the expensive H710P has advanced options that make it just like a SATA controller? [13:06:10] as we have in dysprosium, and is much faster ;) [13:06:36] I mean that the difference between the H710 and the H710P is just more BBU memory [13:06:54] mind you, these systems have onboard sata controllers [13:06:56] and probably a tiny flag in the firmware that allows you to use a feature that needs the BBU disabled [13:06:58] we just can't really use them [13:07:20] but in dysprosium you could? [13:07:21] how come? 
[13:07:26] different disk backplane [13:07:34] ah [13:08:03] apparently two of the 4 boxes in esams have unreachable mgmt [13:08:10] I should complain about that onsite engineer [13:08:10] oops [13:08:20] that bastard [13:08:44] he's lazy as hell, and it'll take weeks to get him to fix this [13:09:38] ssl3004 *g* [13:19:41] New patchset: Mark Bergsma; "Add cp3011-3014 as esams mobile caches" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70157 [13:20:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70155 [13:21:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70157 [13:38:01] New patchset: Mark Bergsma; "Add LVS service IPs for esams mobile" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70159 [13:38:58] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70159 [13:41:03] New patchset: Faidon; "Ceph: abstract ceph::key from ceph::bootstrap_key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70161 [13:41:15] we have IPs? [13:41:43] but I thought we didn't add wikivoyage in esams because we didn't have IPs left in that subnet [13:42:18] we have plenty ips now [13:42:48] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70161 [13:43:17] say whatever you want about our ipv6 hacks in puppet [13:43:20] but they do work well ;) [13:43:33] I haven't said anything about them for over a year :) [13:43:55] because they weren't giving any issues ;p [13:47:35] haha [13:47:36] "mobile" => { [13:47:36] 'description' => "MediaWiki based mobile site", [13:47:36] 'class' => "testing", [13:48:34] lol [13:49:23] that's LVS servers, right? 
[13:55:31] yes [13:55:33] New patchset: Mark Bergsma; "Put the mobile LVS service in the 'high-traffic1' class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70165 [13:55:33] New patchset: Mark Bergsma; "Configure the esams LVS servers for mobile" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70166 [13:56:33] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70165 [13:57:19] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70166 [13:58:54] manybubbles, ottomata: we never confirmed that meeting, are we doing it? [13:58:59] yes! [13:59:00] hiya [13:59:06] if you can make it [13:59:18] I don't see much point tbh as I haven't seen the solr requirements at all [13:59:51] ... I'm not sure if I can get you anything unless I know that kind of thing you need. in other words - please come [14:08:27] !log jenkins raising 'gallium' slave # of executors from 1 to 4. I have migrated some jobs to run on that slave. [14:08:36] Logged the message, Master [14:09:15] Change restored: Hashar; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60866 [14:10:31] New patchset: Hashar; "Jenkins job validation (DO NOT SUBMIT)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60866 [14:11:13] Change abandoned: Hashar; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60866 [14:26:09] New patchset: Demon; "Updating with some of the new options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69789 [14:30:28] <^demon> manybubbles: Which meeting is this? :) [14:31:02] ^demon: the one you just roped yourself into by the looks [14:31:10] <^demon> Ohman [14:31:19] ^demon: sorry! we're talking about packaging solr in debs. [14:31:32] <^demon> Ah ok [14:39:18] <^demon> paravoid: Would you mind looking at https://gerrit.wikimedia.org/r/#/c/69789/ and its parent? 
[14:39:24] <^demon> The parent is especially trivial. [14:39:49] ^demon: sorry I didn't invite you - I figured you were busy - but yeah, we talked about packaging solr. upshot: I'm going to figure out exactly how much work it'd be to update the 3.X branch to 4.X and we'll go from there. [14:40:03] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69771 [14:40:29] I also left a comment on the CirrusSearch talk page [14:40:39] fearing that I'd sound like a broken record, to ^demon especially [14:40:48] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/69789 [14:41:38] the summary is that I'm not very convinced about the arguments against elasticsearch [14:41:51] I don't know much about search in general, or solr & elasticsearch in particular [14:42:25] but the argument "we already have a half-baked solr 3 infrastructure built for a different purpose, so we should use solr 4" doesn't sound very convincing to me [14:42:56] nor is the "we have more experience with solr" -- it surely is an important data point, but not one to pick one piece of software over another [14:46:28] manybubbles, ^demon? [14:47:12] New patchset: Mark Bergsma; "Use chash for mobile backend cache -> backend cache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70172 [14:47:12] New patchset: Mark Bergsma; "Set an appropriate number of 'retries' for the chash director" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70173 [14:47:28] New review: Nemo bis; "Two more comments on the patch, see inline -> -1." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/69982 [14:48:15] Frankly elasticsearch may be worth another look especially given the speet at which we've been able to get everything working properly with solr. [14:48:32] hm? [14:48:51] *speed*, rather. 
[14:49:10] in other words if we can throw together an elasticsearch prototype in a few days why shouldn't we? [14:49:32] elasticsearch provides a nice deb btw :-) [14:49:56] !log jenkins: migrating pyflakes and pep8 jobs to slaves having the label 'hasContintPackages' [14:49:56] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70172 [14:50:01] this isn't why I suggested it in the first place, it just ties in well with the discussion we were just having [14:50:07] Logged the message, Master [14:50:16] did you see http://solr-vs-elasticsearch.com/ ? [14:51:05] <^demon> I did. [14:51:27] New patchset: Mark Bergsma; "Set an appropriate number of 'retries' for the chash director" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70173 [14:51:31] <^demon> "Schema change requires restart. Workaround possible using MultiCore." -- mostly not true anymore, will be fixed in 4.4 if memory serves. [14:51:52] it's not like I'm able to judge the repercussions of 90% of that [14:52:24] but it seems like a detailed comparison by multiple people (open to contributions too), so it may be worthy [14:52:43] puppet you piece of shit [14:52:46] New patchset: Petrb; "inserted missing motd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70175 [14:52:55] Coren|DayOff can I haz +2 ^ [14:54:20] New review: coren; "Yeay ASCII Art!" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/70175 [14:54:21] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70175 [14:54:27] paravoid, ^demon: I've looked at these before but I'll have another look. From the last time I looked neither had a compelling advantage over the other beyond ElasticSearch's website being prettier. [14:54:38] <^demon> The lack of dependency on zookeeper would be nice, if we weren't already using zookeeper for analytics. [14:54:47] * Coren|DayOff is off. 
[14:54:49] still is nice [14:54:51] even with analytics [14:55:16] If you're planning a large installation that requires running distributed search instances, I suspect you're going to be happier with ElasticSearch. [14:55:19] As Matt Weber points out below, ElasticSearch was built to be distributed from the ground up, not tacked on as an 'afterthought' like it was with Solr. This is totally evident when examining the design and architecture of the 2 products, and also when browsing the source code. [14:55:24] that's what that thing says at the end [14:56:05] <^demon> Yeah, but for an afterthought it's working pretty dang well. [14:56:06] New patchset: Mark Bergsma; "Set an appropriate number of 'retries' for the chash director" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70173 [14:56:15] <^demon> And has improved greatly with each 4.x release [14:56:31] and you have tested this how? ;) [14:57:16] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70173 [14:57:17] <^demon> Well it's all worked out of the box, I've been able to scale it as designed, and we got a prototype spun up in a week and a half. [14:57:24] <^demon> That's not to say we can't do the same with elastic. [14:57:57] we haven't tested the getting better part but we've tested that the replication works well in our labs instances with reasonably large data sets. [14:58:51] I'm not saying elasticsearch is better, I wouldn't know. all I'm saying is that the arguments on the CirrusSearch page don't sound very compelling to me [14:59:41] s/complelling/convincing/ ? [14:59:49] probably. [15:00:19] also, the we set this up in a week and a half doesn't account for the 101 dependencies that we have to deploy somehow, does it :) [15:00:22] they aren't that compelling and I wrote them. 
I still think ElasticSearch should see a prototype [15:00:49] fair enough [15:00:56] paravoid: if ElasticSearch doesn't have on the order of that many dependencies I wouldn't trust it. [15:01:36] there is something wrong with java software that doesn't depend on a billion libraries. The libraries are really the only reason to use Java. [15:02:03] :( [15:02:04] 20, as far as I can see [15:02:19] 12 of which are lucene [15:02:23] that does not offer a lot of place to reinvent the wheel :/ [15:02:31] 20 is believable if low. [15:02:37] sigar + jan + jts + log4j + spatial4j [15:03:04] New patchset: Mark Bergsma; "Frontends always use chash, regardless of tier..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70180 [15:05:17] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70180 [15:07:28] 32 dependencies once you blow them all out to catch transitive ones. surprisingly few, really. but not few enough that it looks like I'm missing something [15:07:46] where did you see the transitive ones? [15:07:59] mvn dependency:tree will tell you [15:08:08] I was looking at their .deb, it includes just 20 jars and has no libraries on the Depends: line [15:08:20] so unless there are jars within the jars, it shouldn't be more [15:09:28] there certainly could be jars within jars but there are certainly more than 20 declared in their pom. 5 are testing only so they probably don't package them. 
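Note on the dependency counts discussed above: the gap between "20 jars in the .deb" and "32 once you catch transitive ones" is easier to reason about when direct, transitive and test-scoped entries are tallied separately. The following is only a minimal sketch in Python, assuming text in the usual shape of `mvn dependency:tree` output. The sample tree and every artifact name in it are hypothetical placeholders, not the real Elasticsearch or Solr dependency lists, and `summarize` is an illustrative helper, not part of any tool mentioned in the log.

```python
import re

# Hypothetical sample shaped like `mvn dependency:tree` output.
# The artifact names are made up for illustration only.
SAMPLE_TREE = r"""
[INFO] org.example:search-demo:jar:1.0
[INFO] +- org.example:core-lib:jar:4.3.0:compile
[INFO] |  \- org.example:codec-lib:jar:4.3.0:compile
[INFO] +- org.example:logging-lib:jar:1.2.17:compile
[INFO] \- org.example:test-lib:jar:4.10:test
[INFO]    \- org.example:matcher-lib:jar:1.1:test
"""

# Each nesting level is drawn with a three-character token:
# "+- ", "\- ", "|  " or three spaces.
LINE = re.compile(
    r"^\[INFO\] (?P<indent>(?:\+- |\\- |\|  |   )*)"
    r"(?P<coords>[^\s:]+(?::[^\s:]+)+)\s*$"
)


def summarize(tree_text):
    """Count direct, transitive and test-scoped entries in a dependency tree."""
    counts = {"direct": 0, "transitive": 0, "test_scoped": 0}
    for line in tree_text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # blank lines, banners, anything that is not a tree row
        depth = len(match.group("indent")) // 3
        if depth == 0:
            continue  # the project itself, not a dependency
        # The last colon-separated field of a dependency row is its scope.
        scope = match.group("coords").rsplit(":", 1)[-1]
        counts["direct" if depth == 1 else "transitive"] += 1
        if scope == "test":
            counts["test_scoped"] += 1
    return counts


if __name__ == "__main__":
    print(summarize(SAMPLE_TREE))
```

Fed the real output of `mvn dependency:tree` for a given tag, a tally like this would show how many entries are test-only and therefore unlikely to be shipped in a package, which is the distinction being drawn in the conversation.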
[15:09:53] okay [15:10:20] we might be seeing a difference in version - I'm looking at head and haven't checked their tags [15:10:29] New patchset: Mark Bergsma; "Calculate based on number of backends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70181 [15:10:31] I'm looking at 0.90.1 [15:10:48] released May 30th [15:11:16] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70181 [15:14:35] New patchset: Hashar; "contint: explicitly require php5-dev" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70182 [15:27:59] * AzaToth tries dragging RD out of his hole [15:29:17] New patchset: Cmjohnson; "adding cp1066-70 to netboot.cfg" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70183 [15:31:31] New patchset: Cmjohnson; "adding cp1066-70 to netboot.cfg update" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70183 [15:31:56] apergos: wow, you did go through all of it [15:32:05] even spotted the log file location typo [15:32:09] I'm impressed :) [15:32:24] well still doing, I have a few qs I am trying to work out [15:32:37] I'll come back to you with them if I can't figure it out from the reading [15:32:46] I saw the question mark for the odd number of monitors [15:32:53] yeah see the talk page [15:33:09] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70183 [15:36:00] New patchset: Reedy; "Add Wikivoyage to wgCrossSiteAJAXdomains" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70187 [15:36:50] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70187 [15:40:47] New patchset: Petrb; "changed colors for root prompt, red on production, green on beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70188 [15:42:04] apergos: responded there [15:42:08] thanks [15:42:14] for some reason I didn't get an email [15:42:28] New 
patchset: Mark Bergsma; "Use IPv6 for esams -> eqiad communication" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70189 [15:43:31] why don't I see the icinga checks when I look at the host entry on icinga? [15:43:40] that was my question actually [15:45:39] paravoid: [15:46:09] which host entry? [15:46:26] the http checks run on the frontends, maybe you're checking a backend? [15:46:28] well I took a random hmm I forget if backend or frontend [15:46:42] the backends should have the raid check which is pretty accurate so far [15:46:50] the LVS checks won't be in either [15:46:53] right [15:47:02] and the upcoming ceph health check will be on the monitors [15:47:08] grrr [15:47:11] I won't check icinga [15:47:16] I'll kick it instead [15:47:59] !log Created WikiLove tables on enwikivoyage [15:48:08] Logged the message, Master [15:48:52] is gerrit down or is it just me? [15:49:05] It's down [15:49:10] New patchset: Reedy; "Install wikilove extension at en.wikivoyage.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70190 [15:49:10] I just pinged Chad over in #wikimedia-dev [15:49:12] Oh look [15:49:17] It just did something for me [15:49:18] New patchset: Andrew Bogott; "Move mail manifests to a module called 'exim'" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68584 [15:49:31] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70189 [15:49:52] <^demon> !log restarting gerrit [15:49:59] Logged the message, Master [15:50:59] how many dependencies does gerrit have? [15:51:52] New patchset: Petrb; "changed colors for root prompt, red on production, green on beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70188 [15:52:06] I don't think it has any dependencies that run on different machines [15:52:15] no [15:52:22] so we can check if it's a java app that we can trust [15:52:27] <^demon> A lot. 
[15:52:33] Oh, right [15:52:34] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70190 [15:52:37] <^demon> And we can't, really. Which is why we're working on saner packaging. [15:52:56] Error: Could not find any hostgroup matching 'cache_mobile_esams' (config file '/etc/icinga/puppet_hosts.cfg', starting on line 1777) [15:53:05] New patchset: Hashar; "beta: purge bits cache" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70191 [15:53:08] hehe [15:53:11] let me fix that apergos [15:53:15] thankye [15:53:47] andrewbogott: so, what did you think on the answers on the roles thread? [15:53:54] !log reedy synchronized wmf-config/ [15:54:02] Logged the message, Master [15:54:04] * apergos lurks in the roles conversation [15:54:14] New patchset: Mark Bergsma; "Add Nagios group for esams mobile caches" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70192 [15:56:19] paravoid: I don't feel all that strongly… that last patchset (that I just put up for review) moves the roles into a common role module. [15:56:20] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70192 [15:56:35] I did get some weird issues with loading, though, doing it that way… [15:57:08] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70191 [15:57:33] paravoid, I was getting a duplicate-class-definition error for a class defined in a role that wasn't defined in the node. [15:57:44] hm? [15:58:10] I'm about to go into a meeting, so I can't have a closer look [15:58:39] It might be a red herring… I'll worry about it if that kind of thing happens a second time. [15:59:00] but my take would be that I'd prefer it if mail/exim commit put the roles under manifest/roles and the introduction of modules/mwrole was (debated) in a separate commit [15:59:57] paravoid: OK, that's easy enough. 
I made one big patch partly as a proof-of-concept so we could discuss it. I'll respond to the email thread. [16:04:55] mark: "how many dependencies does gerrit have?" Naive grep showed 104 dependencies (including source dependencies) [16:06:22] <^demon> How many of those are because of gwt? [16:06:34] :-D No idea. [16:06:39] [16:06:59] also can someone review / merge https://gerrit.wikimedia.org/r/#/c/70064/ [16:09:47] ^demon: lib/gwt/BUCK lists just 5 jars... many things that come with gwt in their name are not really gwt related IIRC (gwtexpui etc) [16:10:48] <^demon> Ya. [16:10:48] But don't you dare to take that as defense for gwt. [16:30:05] ottomata: is there a C library for avro? [16:34:30] not sure, i don't actually know much about it [16:34:36] i just know that david was talking about it a lot [16:34:49] not sure if it is better/worse than protobufs [16:39:15] and of course there's thrift, that's popular in the analytics space isn't it [16:39:57] qchris: ok, I feel much better about Gerrit already [16:41:51] yeah, thrift is very common [16:42:03] i think kafka uses thrift [16:42:22] maybe not... 
[16:45:42] ottomata: kafka messages are byte arrays; it doesn't know/care about the data format [16:46:15] ah ok, dunno where I got that then [16:46:25] !log krinkle synchronized php-1.22wmf8/extensions/VisualEditor/ 'I73522f63d9a49a4b7' [16:46:34] Logged the message, Master [16:47:00] !log krinkle synchronized php-1.22wmf7/extensions/VisualEditor/ 'I73522f63d9a49a4b7' [16:47:08] Logged the message, Master [16:49:45] New patchset: Ottomata; "Moving java package def into a class for analytics role" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70206 [16:51:16] New patchset: Ottomata; "Moving java package def into a class for analytics role" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70206 [16:52:09] New patchset: Krinkle; "Run VisualEditor default-on A/B split test for new accounts on enwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68846 [16:52:09] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70206 [16:52:36] New patchset: Ottomata; "Puppetizing analytics1018 as hadoop worker" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70208 [16:53:30] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68846 [16:53:31] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70208 [16:54:21] !log krinkle synchronized wmf-config/InitialiseSettings.php 'I94d74fd8abeba465f' [16:54:28] Logged the message, Master [16:54:52] !log krinkle synchronized wmf-config/CommonSettings.php 'I94d74fd8abeba465f' [16:54:56] uuh [16:55:01] Logged the message, Master [17:05:22] PROBLEM - RAID on analytics1018 is CRITICAL: Connection refused by host [17:06:12] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [17:06:12] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [17:06:12] PROBLEM - 
Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [17:06:12] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [17:06:12] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [17:06:12] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [17:06:13] PROBLEM - Puppet freshness on magnesium is CRITICAL: No successful Puppet run in the last 10 hours [17:06:13] PROBLEM - Puppet freshness on spence is CRITICAL: No successful Puppet run in the last 10 hours [17:06:14] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours [17:06:14] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 10 hours [17:06:15] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [17:06:15] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [17:06:16] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [17:06:23] RECOVERY - Puppet freshness on tin is OK: puppet ran at Mon Jun 24 17:06:15 UTC 2013 [17:06:50] New patchset: Cmjohnson; "fixing dhcp for cp1058/59" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70210 [17:07:42] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70210 [17:07:54] PROBLEM - DPKG on analytics1018 is CRITICAL: Connection refused by host [17:08:03] PROBLEM - Disk space on analytics1018 is CRITICAL: Connection refused by host [17:08:03] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [17:08:35] !log while kicking icinga into working again on neon, also cleared out a pile of inodes in /var/spool/snmptt again, show many many dead processes too [17:08:43] Logged the message, Master [17:08:50] !log er, "shot" many dead 
snmptt processes [17:08:58] Logged the message, Master [17:09:05] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho [17:11:48] apergos: yuck, glad you caught that again [17:12:07] sure would be nice to know the underlying cause [17:12:13] or even fix it [17:20:15] PROBLEM - NTP on analytics1018 is CRITICAL: NTP CRITICAL: No response from NTP server [17:22:47] New patchset: Yuvipanda; "Add initial role for redis server for tool labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70212 [17:27:10] New patchset: Yuvipanda; "Add initial role for redis server for tool labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70212 [17:28:55] RECOVERY - DPKG on analytics1018 is OK: All packages OK [17:28:55] RECOVERY - Disk space on analytics1018 is OK: DISK OK [17:43:26] New patchset: Yuvipanda; "Add support for renaming Redis commands to the redis class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70064 [17:45:08] RECOVERY - NTP on analytics1018 is OK: NTP OK: Offset -0.02663362026 secs [17:57:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:58:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [18:05:16] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: special, wikimedia and closed to 1.22wmf8 [18:05:25] Logged the message, Master [18:08:37] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikivoyage and wiktionary to 1.22wmf8 [18:08:46] Logged the message, Master [18:10:47] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikiquote, wikiversity, wikinews, wiktionary to 1.22wmf8 [18:10:56] Logged the message, Master [18:12:38] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikisource and wikibooks to 1.22wmf8 [18:12:46] Logged the message, 
Master [18:13:22] New patchset: Reedy; "Everything non 'pedia to 1.22wmf8" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70218 [18:13:44] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70218 [18:21:24] !log updated Parsoid to b3a872fa7 [18:21:34] Logged the message, Master [18:51:58] greg-g, is it ok to do a small deployment? [18:53:19] oh, it will probably conflict with the main MW depl window, so nv [18:53:25] *never mind [19:03:41] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [19:17:22] New patchset: Andrew Bogott; "Switch RT/exim from http to https" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70237 [19:20:11] New review: Dzahn; "RT-5169, merge after upgrade" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/70237 [19:33:00] New patchset: Yurik; "Renamed incorrect carrier ID from 623-01 to 623-03" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70239 [19:38:35] !log temp shutting down webserver on streber (RT scheduled downtime) [19:38:45] Logged the message, Master [19:40:37] New patchset: Andrew Bogott; "Turn of rt/lighttpd on streber." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70242 [19:40:44] PROBLEM - Lighttpd HTTP on streber is CRITICAL: Connection refused [19:41:11] New patchset: Andrew Bogott; "Turn off rt/lighttpd on streber." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70242 [19:42:23] binasher: did IIya get back to you? 
[19:42:32] PROBLEM - Exim SMTP on streber is CRITICAL: Connection refused [19:45:15] !log moving backups from db9 to tridge [19:45:24] Logged the message, Master [19:46:22] !log updated Parsoid to 6ab78c8dd8a [19:46:30] Logged the message, Master [19:51:42] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 252 seconds [19:52:31] !log reedy synchronized php-1.22wmf8/extensions/ProofreadPage/ [19:52:40] Logged the message, Master [19:52:42] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [20:07:39] New patchset: Andrew Bogott; "Explicitly set db name to 'rt'" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70309 [20:08:02] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70242 [20:09:14] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70309 [20:10:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:11:09] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70237 [20:11:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [20:12:21] RECOVERY - Puppet freshness on magnesium is OK: puppet ran at Mon Jun 24 20:12:18 UTC 2013 [20:15:02] New patchset: Yuvipanda; "Add initial role for redis server for tool labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70212 [20:18:32] ^demon, is jenkins on the blink? [20:18:41] or hashar? I don't know the boss of jenkins is [20:18:47] *who [20:23:21] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [20:25:42] <^demon> andrewbogott: Don't think so? [20:25:44] <^demon> Lemme look [20:26:55] <^demon> Zuul seems to be receiving events ok. 
[20:28:33] ^demon, https://gerrit.wikimedia.org/r/#/c/68584/ seems unchecked, hours later [20:28:42] But, I'm multitasking, maybe missing something dumb [20:28:56] New review: Demon; "recheck" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68584 [20:29:44] <^demon> It rechecked, won't merge cleanly. [20:37:21] it wasn't supposed to merge... [20:38:47] New patchset: Andrew Bogott; "Set rt domain and port explicitly" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70313 [20:38:52] <^demon> No, it won't run tests since it won't merge. [20:39:04] Oh, I see. OK. [20:39:11] thanks, I'll rebase [20:39:26] New review: Dzahn; "yep, fix the CSRF warning on create" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/70313 [20:39:26] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70313 [20:42:04] hello :) [20:42:43] I got a super trivial change for you guys: https://gerrit.wikimedia.org/r/#/c/70182/ that documents in puppet we need php5-dev on continuous integration server. It is already installed as a dependency of another package (php-pear) [20:44:20] is rt broken? [20:44:46] i see andrewbogott recently working on it in gerrit spam backscroll [20:45:01] jeremyb: yes, down for upgrade/migrate. [20:45:04] oh, and mutante too. well there's my answer [20:45:22] k, danke [20:47:32] New patchset: Andrew Bogott; "Switch from http to https for rt url." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70317 [20:47:41] jeremyb: it'll be new and shiny soon:) [20:47:56] high gloss? [20:48:06] New review: Dzahn; "yep, that was it" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/70317 [20:48:07] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70317 [20:50:47] 'new and shiny' = 'the menu tree will be rearranged in a way that will annoy you at first' [20:51:21] <^demon> I love those kinds of upgrades. 
[20:52:02] whoopsey... [20:52:18] * odder can't load mw.org [20:52:37] <^demon> Wfm [20:53:12] !log DNS update - switching RT over to magnesium and 4.x [20:53:21] Logged the message, Master [20:54:22] tried literally mw.org , but you didnt mean that, heh [20:55:18] <^demon> I wanted us to try and get mw.org. [20:55:53] same for wm.org and wmf.org [20:57:39] ^demon: you're gone from the east already? [20:58:03] <^demon> jeremyb: almost. [20:58:30] jeremyb, try rt now? [20:59:13] ^demon: trying to think of who knows something about labs for a workshop in DC [20:59:23] jeremyb: wanna try?:) [20:59:52] having not modified a ticket yet it seems sane [20:59:58] <^demon> jeremyb: I won't be able to make it between now and moving day. Far too much to do here. [21:00:03] jeremyb: cool [21:00:41] ^demon: but it would probably be after you move i guess. if you think of other labs users nearby let me know [21:00:57] ^demon: have a good trip! [21:01:10] * jeremyb just moved himself. but only like a mile down the road [21:05:44] New patchset: Pyoungmeister; "putting self back into icinga" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70318 [21:07:03] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70318 [21:10:39] New patchset: Andrew Bogott; "Use magnesium instead of streber for incoming rt mail" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70319 [21:10:49] mutante: ^ [21:11:55] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70319 [21:12:41] andrewbogott: :) ack [21:13:00] New review: Dzahn; "yep:)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70319 [21:15:48] Man, mchenry is in such a state… I hope it's slated for upgrade/replacement sometime soon [21:16:07] Reedy: any idea what is in 10.111.0.0/16 ? :P [21:16:17] Is it editing? 
[21:17:06] no, but anomie blocked a lot of ip ranges for local IPs the other day, I unblocked the range I knew to have squids in just in case, but apparently this one also caused a problem [21:17:09] https://en.wikipedia.org/wiki/User_talk:Anomie#10.111.0.0.2F16 [21:18:08] They look to be labs internal ip addresses [21:18:40] really? [21:18:45] addshore, Reedy: At a glance, it looks like the block apparently applies to user->10.x.x.x->proxy->public internet->WMF IP, if "proxy" includes an XFF header. [21:19:15] addshore: Well, it's 10. so it's definitely internal ;) [21:19:21] that's a hilarious and slightly messed up way to access wp :P [21:19:30] but Reedy https://en.wikipedia.org/w/index.php?title=User_talk:82.132.213.150&diff=561351712&oldid=561346634 ;p [21:20:08] Well, resolving en.wikipedia.org is going to give an external IP.. and if you don't remap them somehow [21:21:57] Reedy: the ISP does remapping by adding an XFF header with the ISP's internal IP for that user [21:24:26] xff blocking fun! [21:24:27] Yes [21:24:28] New patchset: Hashar; "beta: tweak $wgLoadScript to use the bits cache" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70322 [21:24:43] But are they coming from an ISP failing? Or tool labs? [21:25:00] isp failing, that ip looked as if it were a genuine use [21:25:02] *user [21:25:45] Someone needs to shout at the ISP then [21:25:54] which ISP? :> [21:26:22] tell the ISP what? switch to IPv6? get more public IPs? [21:27:37] Both, either [21:27:44] Fix that they're setting headers [21:27:46] "Stop reporting 10.0.0.0/8 IPs in XFF", I guess. [21:29:44] * jeremyb runs away :) [21:30:50] :> [21:30:58] does anyone recall where the parsoid team requested full varnish flush capabilities? i'm not finding it in email or rt [21:31:18] PROBLEM - Disk space on analytics1018 is CRITICAL: DISK CRITICAL - free space: / 639 MB (2% inode=95%): [21:32:12] binasher: email [21:32:15] lemme see [21:32:35] thx! 
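Editor's sketch of the XFF point discussed above (the 10.111.0.0/16 block catching users whose ISP reports an internal address in X-Forwarded-For). This is only an illustration of skipping private addresses when reading an XFF header, not how MediaWiki's block check actually works. The function name first_public_xff and the address 10.111.4.20 are hypothetical; only Python's standard ipaddress module is assumed.

```python
import ipaddress


def first_public_xff(header):
    """Return the first non-private address in an X-Forwarded-For value.

    Hypothetical helper: entries that are not valid IP addresses, or that
    fall in private ranges such as 10.0.0.0/8, are skipped. Returns None
    when no usable address is found.
    """
    for part in header.split(","):
        candidate = part.strip()
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            # Not an IP address at all (e.g. "unknown"); ignore it.
            continue
        if not addr.is_private:
            return str(addr)
    return None


# The ISP's internal address comes first, the public one second.
print(first_public_xff("10.111.4.20, 82.132.213.150"))
```

With an approach like this, a block on a 10.x range would never match a request that merely passed through a proxy reporting an internal address; whether that is the right policy is a separate question from the one raised in the channel.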
[21:33:03] ah [21:33:04] it's rt [21:33:13] binasher: Root for gwicke (or some other way for Gabriel to clear the Parsoid caches) [21:33:27] ahh, thanks [21:33:29] yw [21:35:53] binasher: For now the way I've been doing full purges for them is stop Varnish, delete the varnish.persist files, start Varnish [21:36:14] I warned that would destroy absolutely everything in the cache and they said that was fine [21:36:47] RoanKattouw: if you have permission to do that, you could do it a lot easier with the ban command via varnishadm [21:54:16] RECOVERY - Disk space on analytics1018 is OK: DISK OK [22:25:08] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [22:26:08] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:08] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [22:29:12] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [22:40:55] New patchset: Yuvipanda; "Add initial role for redis server for tool labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70212 [22:43:02] !log Graceful reload of Zuul to deploy 'Ie51c7ce943e8be2 [22:43:12] Logged the message, Master [22:48:31] thanks Krinkle [22:53:43] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [22:54:32] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [22:55:52] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 279 seconds [22:55:57] New patchset: Yuvipanda; "Add initial role for redis server for tool labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70212 [22:57:00] New patchset: Ori.livneh; "Set common rsync and dsh parameters in mw-deployment-vars" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57890 [23:01:52] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 
seconds [23:05:07] !log olivneh synchronized php-1.22wmf7/extensions/GettingStarted 'I329f59929: Don't log when schemaAction is falsy. (Bug: 50065)' [23:05:17] Logged the message, Master [23:05:27] !log olivneh synchronized php-1.22wmf8/extensions/GettingStarted 'I329f59929: Don't log when schemaAction is falsy. (Bug: 50065)' [23:05:36] Logged the message, Master [23:21:47] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 262 seconds [23:22:33] New review: Andrew Bogott; "I'm reluctant to merge this because I don't know anything about redis, but it looks perfectly reason..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/70064 [23:23:47] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [23:27:13] New review: Andrew Bogott; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70212 [23:31:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:32:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [23:33:40] New patchset: MZMcBride; "Enable CAPTCHA for all edits of non-confirmed users on pt.wikipedia in order to reduce editing activity" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69982 [23:34:13] New patchset: Yuvipanda; "Add initial role for redis server for tool labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70212 [23:40:05] New patchset: Yuvipanda; "Add initial role for redis server for tool labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70212 [23:45:11] New patchset: Yuvipanda; "Add initial role for redis server for tool labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/70212 [23:45:53] New patchset: Ori.livneh; "Disable ArticleFeedback (v4)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70349 [23:46:57] 
Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/70349 [23:48:46] !log olivneh synchronized wmf-config/InitialiseSettings.php 'I82fe6063a: Disable AFTv4 (Bug 43892)' [23:48:56] Logged the message, Master [23:51:20] New patchset: MZMcBride; "Enable CAPTCHA for all edits of non-confirmed users on pt.wikipedia in order to reduce editing activity" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69982 [23:52:21] New review: MZMcBride; "Gerrit continues to mangle my inputs." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/69982