[00:13:48] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [01:32:45] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.0009511709213 secs [01:57:59] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [02:01:59] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.001435279846 secs [02:02:49] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.0005353689194 secs [02:06:36] !log LocalisationUpdate completed (1.22wmf4) at Sun May 26 02:06:36 UTC 2013 [02:06:46] Logged the message, Master [02:08:55] PROBLEM - Puppet freshness on stat1 is CRITICAL: No successful Puppet run in the last 10 hours [02:13:55] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [02:13:55] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [02:13:55] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [02:15:56] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [02:27:21] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun May 26 02:27:21 UTC 2013 [02:27:30] Logged the message, Master [04:51:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:52:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.141 second response time [05:02:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [05:16:03] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:16:53] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [05:53:21] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [05:53:51] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [06:11:00] !log De-duplicated base refreshLinks2 jobs on enwiki, didn't do much though [06:11:08] Logged the message, Master [06:20:35] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [06:20:35] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [06:20:35] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [06:23:35] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [06:23:35] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [06:26:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:27:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [06:35:35] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:36:30] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [07:08:12] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [07:27:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:28:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [07:28:32] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:30:32] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [07:31:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:32:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [07:34:31] heh [07:38:39] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:29] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [08:01:29] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.001762032509 secs [08:02:29] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.002439618111 secs [08:14:55] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [08:15:55] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [08:18:55] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [08:48:03] PROBLEM - Puppet freshness on stat1002 is CRITICAL: No successful Puppet run in the last 10 hours [08:48:12] !log tstarling cleared profiling data [08:48:20] Logged the message, Master [08:48:38] New review: JanZerebecki; "This needs to be done in the vhost www.wikidata.org ." [operations/apache-config] (master) C: -1; - https://gerrit.wikimedia.org/r/65463 [08:51:21] TimStarling: is https://gerrit.wikimedia.org/r/#/c/61743/ ok now? [08:52:12] yes, looks fine [08:53:06] you should have come to amsterdam [08:54:05] little old me? puh [08:54:58] yeah, robla said he tried to convince you [08:56:20] Ryan_Lane: re yesterdays image scaler puppet patch -- can we apply it? [08:56:48] which one? 
[08:56:51] mwalker: ^ [08:57:03] paravoid: https://gerrit.wikimedia.org/r/#/c/65423/ [08:58:33] hm [08:59:22] indeed [08:59:36] /tmp isn't the same partition as /a [08:59:43] but it's not tmpfs and seems to have enough space [09:01:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:01:45] we dont have to put image magicks temp dir in temp -- it just cant be in /a [09:01:55] or anything else that we commonly use as short mounts [09:02:19] I don't like /a at all, but I wonder, why do you care? [09:02:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [09:02:37] :) this is part of my grand master plan to symlink /a to /apache on the appservers [09:02:45] TimStarling: I was reading over the design talk docs yesterday [09:02:46] what? [09:02:48] ... so that Special:Version can actually get treeish information [09:02:58] that's a -2 from me on that :) [09:03:09] paravoid: why? [09:03:28] I could picture being there and agreeing with much of what you said [09:03:45] why do you need something in / ? [09:04:11] paravoid: because we were silly and on tin we deploy from /a -- which means that all the git submodules point to having a git root at /a/common [09:04:21] paravoid: when in reality the git root is at /apache/common [09:04:39] paravoid: (on the appservers) [09:05:21] Aaron|home: yeah, feel free to add your own sections or comments there [09:05:27] /apache? [09:05:29] aaargh [09:05:37] yep! [09:05:40] i.e. to https://www.mediawiki.org/wiki/Architecture_guidelines [09:06:19] why can't people just use /srv/mediawiki or whatever instead of polluting / [09:06:29] *shrugs* [09:06:35] by people you mean Lee Crocker? 
[09:06:58] I'm guessing the answer to that is yes [09:07:08] paravoid: the problem mwalker is hitting is that the deployment location is different than source location [09:07:15] paravoid: git-deploy is going to confine us in /srv/deployment/ [09:07:29] yes, because people use /a for all kinds of random stuff [09:07:35] paravoid: so, the submodule git repos point to the wrong place [09:07:49] yeah, I mentioned to mwalker that it may be best to wait for git-deploy [09:08:02] and that it'll probably be 6 months before it's back on the roadmap [09:08:06] and you know what would help? actually using more meaningful names than "a" [09:08:11] :D [09:08:12] yep [09:09:52] New patchset: RobH; "updating smokeping puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65540 [09:10:05] so -- because I'm impatient and dont want to wait for git deploy -- can we change where we deploy from? [09:10:14] (if symlinking /a is a bad idea) [09:10:35] RoanKattouw: 213.127.161.46 [09:11:29] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65423 [09:11:56] using the same paths as git deploy would have made migration to it more difficult [09:13:47] TimStarling: we could change /a to /apache on tin [09:13:47] Reedy: Can you run the script to create REL1_21 branches in extensions? Looks like we forgot it again. [09:14:00] Reedy: We'll need to make it fake it again by date when core master branched [09:14:11] Reedy: e.g. 
Echo master is depending on 1.22-alpha, breaks on 1.21.0 [09:14:19] New patchset: Mark Bergsma; "Cookie/Vary munging for text, Cache-Control header replacements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65541 [09:14:31] well, like I say, /apache is ancient and does not reflect current practices [09:14:47] whereas /a does reflect current practices (whether faidon likes it or not) [09:14:50] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65540 [09:15:10] TimStarling: yeah. I agree that using the same directory would have made it more difficult [09:15:17] /srv is also current [09:15:24] I was just describing mwalker's issue, mostly [09:15:31] TimStarling: ... both /apache and /a are current -- otherwise why do all our appservers serve from /apache? [09:16:02] New patchset: Catrope; "Raise account creation throttle for the Amsterdam Hackathon IPs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65544 [09:16:06] just because that symlink has always existed and nobody has gotten around to removing it yet [09:16:11] Right, /apache is /usr/local/apache (live), /a on tin is more like /home/wikipedia/common on fenari, working copy. [09:16:15] it's been there for at least 10 years [09:16:26] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65544 [09:16:28] I too thought /a was a symlink to /apache, but it makes sense now that it isn't. 
[09:16:58] New patchset: Mark Bergsma; "Cookie/Vary munging for text, Cache-Control header replacements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65541 [09:17:01] originally we had a source install of apache which was in /usr/local/apache [09:17:07] and /apache was a shortcut for that [09:17:19] gotcha [09:17:19] I'm sure it made sense at the time [09:17:37] !log catrope synchronized wmf-config/throttle.php 'Raise account creation throttle for Amsterdam hackathon' [09:17:47] Logged the message, Master [09:21:12] ok! so what if I have puppet create the /a directory if it doesn't exist; and create the symlink /a/common -> /apache/common (?) [09:23:02] no [09:23:57] New patchset: Mark Bergsma; "Cookie/Vary munging for text, Cache-Control header replacements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65541 [09:24:19] New patchset: Faidon; "Varnish: move std.collect X-Varnish/Via to wikimedia" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65545 [09:26:07] you know, /a is mostly a partitioning thing [09:26:24] the idea is to have a small root partition, and mount the bulk of the space at /a [09:26:36] and then anything which needs a lot of disk space goes in the /a hierarchy [09:27:10] so it doesn't tell you anything about the function of the things under /a, because it's a system thing not a function thing [09:27:34] Krinkle: we also forgot to rename the version in the tarball as well [09:27:37] /srv is specifically for content that is served [09:28:28] p858snake: rename version of what in what tarball? 
[09:28:29] if you mounted the bulk of the available storage at /srv, you might need to put things in there that aren't actually served, which would be inappropriate [09:28:37] New patchset: Yurik; "Allow XFF spoofing from the trusted IPs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62103 [09:28:57] Krinkle: mw last night :) still listed as a rc (there was a patchset to fix that today) [09:29:14] TimStarling: makes sense -- I dont really see whats wrong with a symlink under /a though [09:30:02] if the concern is collisions -- my argument is that all these servers are under puppet -- so we should know exactly what's on them, /a being a separate partition aside [09:30:51] gtg [09:33:12] paravoid: thoughts on ^ ? I am actually OK with waiting until git-deploy is deployed; but not until I know that I've exhausted all the options for solving my problem inside the current setup [09:33:26] because i fail to see why I should wait 6 months if we can solve it now [09:34:18] New patchset: BBlack; "fixed a bad assertion + bumped version to 0.0.2" [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65547 [09:35:42] New patchset: RobH; "further tweaking for smokeping" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65549 [09:36:32] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65278 [09:38:52] New patchset: BBlack; "fixed a bad assertion + bumped version to 0.0.2" [operations/software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/65550 [09:38:53] New patchset: BBlack; "upstream 0.0.2 + debug packaging stuff" [operations/software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/65551 [09:40:50] New patchset: Mark Bergsma; "Cookie/Vary munging for text, Cache-Control header replacements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65541 [09:41:23] Change merged: BBlack; [operations/software/varnish/vhtcpd] (master) - 
https://gerrit.wikimedia.org/r/65547 [09:41:48] Change merged: BBlack; [operations/software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/65550 [09:42:02] Change merged: BBlack; [operations/software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/65551 [09:42:21] !log upgrading some pmtpa mw* machines - may be a few alerts [09:42:29] Logged the message, Mistress of the network gear. [09:42:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65541 [09:43:17] New patchset: RobH; "further tweaking for smokeping" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65549 [09:45:46] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65545 [09:48:37] Jeff_Green: access to the ubuntu repo should already exist [09:49:46] huh...it times out [09:50:14] on this host: 67% [Connecting to security.ubuntu.com (91.189.91.14)] [09:50:29] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65549 [09:51:13] hello dschoon [09:51:15] http://www.door-74.com/ [09:51:25] hello LeslieCarr [09:51:57] it appears this website has not yet discovered html { background-color: black; } [09:52:23] PROBLEM - DPKG on mw1118 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:52:23] PROBLEM - DPKG on mw1085 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:52:33] PROBLEM - DPKG on mw1126 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:52:33] PROBLEM - DPKG on mw1127 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:52:33] PROBLEM - DPKG on mw1025 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:52:48] Krinkle: I've no idea what script that is [09:53:03] PROBLEM - DPKG on mw1114 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:03] PROBLEM - DPKG on mw1056 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:03] PROBLEM - DPKG on mw1115 is CRITICAL: DPKG CRITICAL dpkg reports 
broken packages [09:53:03] PROBLEM - DPKG on mw1057 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:13] PROBLEM - DPKG on mw1038 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:23] PROBLEM - DPKG on mw1077 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:23] PROBLEM - DPKG on mw1023 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:23] PROBLEM - DPKG on mw1029 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:23] PROBLEM - DPKG on mw1054 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:23] PROBLEM - DPKG on mw1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:24] PROBLEM - DPKG on mw1010 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:24] RECOVERY - DPKG on mw1118 is OK: All packages OK [09:53:25] RECOVERY - DPKG on mw1085 is OK: All packages OK [09:53:25] PROBLEM - DPKG on mw1031 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:26] PROBLEM - DPKG on mw1063 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:26] PROBLEM - DPKG on mw1034 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:33] RECOVERY - DPKG on mw1126 is OK: All packages OK [09:53:33] PROBLEM - DPKG on mw1068 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:33] PROBLEM - DPKG on mw1046 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:34] RECOVERY - DPKG on mw1127 is OK: All packages OK [09:53:34] PROBLEM - DPKG on mw1045 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:53:55] paravoid, i fixed https://gerrit.wikimedia.org/r/#/c/62103/ [09:54:03] RECOVERY - DPKG on mw1114 is OK: All packages OK [09:54:03] RECOVERY - DPKG on mw1115 is OK: All packages OK [09:54:09] are those the changes you wanted? 
[09:54:13] PROBLEM - Apache HTTP on mw1118 is CRITICAL: Connection refused [09:54:13] PROBLEM - Apache HTTP on mw1126 is CRITICAL: Connection refused [09:54:13] PROBLEM - Apache HTTP on mw1115 is CRITICAL: Connection refused [09:54:26] New patchset: RobH; "more smokeping, if this isnt the last one I am totally swapping to puppetmaster::self" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65556 [09:54:33] PROBLEM - Apache HTTP on mw1114 is CRITICAL: Connection refused [09:54:33] PROBLEM - Apache HTTP on mw1127 is CRITICAL: Connection refused [09:54:51] Reedy: Oh, I thought you created them last time. [09:54:57] Reedy: Maybe that was Aaron|home [09:55:33] PROBLEM - Apache HTTP on mw1029 is CRITICAL: Connection refused [09:55:43] PROBLEM - Apache HTTP on mw1068 is CRITICAL: Connection refused [09:55:43] PROBLEM - Apache HTTP on mw1034 is CRITICAL: Connection refused [09:56:03] PROBLEM - Apache HTTP on mw1057 is CRITICAL: Connection refused [09:56:03] PROBLEM - Apache HTTP on mw1025 is CRITICAL: Connection refused [09:56:03] PROBLEM - Apache HTTP on mw1063 is CRITICAL: Connection refused [09:56:03] RECOVERY - DPKG on mw1057 is OK: All packages OK [09:56:13] RECOVERY - DPKG on mw1038 is OK: All packages OK [09:56:23] RECOVERY - DPKG on mw1077 is OK: All packages OK [09:56:24] RECOVERY - DPKG on mw1023 is OK: All packages OK [09:56:24] RECOVERY - DPKG on mw1054 is OK: All packages OK [09:56:24] RECOVERY - DPKG on mw1029 is OK: All packages OK [09:56:24] RECOVERY - DPKG on mw1002 is OK: All packages OK [09:56:24] RECOVERY - DPKG on mw1010 is OK: All packages OK [09:56:24] RECOVERY - DPKG on mw1031 is OK: All packages OK [09:56:25] PROBLEM - Apache HTTP on mw1077 is CRITICAL: Connection refused [09:56:25] PROBLEM - Apache HTTP on mw1056 is CRITICAL: Connection refused [09:56:25] New patchset: Asher; "respawn twemproxy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65558 [09:56:26] PROBLEM - Apache HTTP on mw1038 is CRITICAL: 
Connection refused [09:56:26] PROBLEM - Apache HTTP on mw1023 is CRITICAL: Connection refused [09:56:27] PROBLEM - Apache HTTP on mw1054 is CRITICAL: Connection refused [09:56:27] RECOVERY - DPKG on mw1063 is OK: All packages OK [09:56:28] RECOVERY - DPKG on mw1034 is OK: All packages OK [09:56:31] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65556 [09:56:33] RECOVERY - DPKG on mw1068 is OK: All packages OK [09:56:34] RECOVERY - DPKG on mw1046 is OK: All packages OK [09:56:34] PROBLEM - Apache HTTP on mw1046 is CRITICAL: Connection refused [09:56:34] PROBLEM - Apache HTTP on mw1045 is CRITICAL: Connection refused [09:56:34] RECOVERY - DPKG on mw1025 is OK: All packages OK [09:56:34] RECOVERY - DPKG on mw1045 is OK: All packages OK [09:56:45] PROBLEM - Apache HTTP on mw1031 is CRITICAL: Connection refused [09:57:02] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65558 [09:57:03] RECOVERY - DPKG on mw1056 is OK: All packages OK [09:58:43] RECOVERY - Apache HTTP on mw1031 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.051 second response time [09:59:44] sorry [10:00:13] RECOVERY - Apache HTTP on mw1118 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.096 second response time [10:00:13] RECOVERY - Apache HTTP on mw1126 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.092 second response time [10:00:13] RECOVERY - Apache HTTP on mw1115 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.096 second response time [10:00:23] RECOVERY - Apache HTTP on mw1023 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.057 second response time [10:00:23] RECOVERY - Apache HTTP on mw1038 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.073 second response time [10:00:23] RECOVERY - Apache HTTP on mw1054 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.053 second response time [10:00:23] RECOVERY - Apache 
HTTP on mw1077 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time [10:00:23] RECOVERY - Apache HTTP on mw1056 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.062 second response time [10:00:32] dschoon: well html is such a new technology.. [10:00:33] RECOVERY - Apache HTTP on mw1114 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.131 second response time [10:00:33] RECOVERY - Apache HTTP on mw1045 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.090 second response time [10:00:33] RECOVERY - Apache HTTP on mw1127 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.058 second response time [10:00:33] RECOVERY - Apache HTTP on mw1046 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time [10:00:33] RECOVERY - Apache HTTP on mw1029 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.065 second response time [10:00:43] RECOVERY - Apache HTTP on mw1068 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [10:00:43] RECOVERY - Apache HTTP on mw1034 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.056 second response time [10:01:03] RECOVERY - Apache HTTP on mw1057 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time [10:01:03] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time [10:01:03] RECOVERY - Apache HTTP on mw1063 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.051 second response time [10:01:29] New patchset: RobH; "totally lied, more smokeping!" 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/65559 [10:01:43] New patchset: Mark Bergsma; "Add the Squid error page to Varnish" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65560 [10:02:08] Jeff_Green: hey, so thulium should be happy now [10:02:30] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65559 [10:02:40] LeslieCarr: indeed it is! thanks [10:03:36] New patchset: Mark Bergsma; "Add the Squid error page to Varnish" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65560 [10:05:14] New patchset: Faidon; "Varnish: fix wikimedia.vcl.erb's std.collect()" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65561 [10:07:35] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65561 [10:07:49] New patchset: RobH; "splitting recuse directory from file link creation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65563 [10:08:21] hehe, recurse directory ;) [10:10:20] New patchset: RobH; "smokeping: splitting recuse directory from file link creation & adding dependency chain" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65563 [10:11:18] New review: Faidon; ""Unknown"?" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/62103 [10:11:40] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65563 [10:12:15] New patchset: Asher; "twemproxy stats service should listen on lo only" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65566 [10:15:28] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65566 [10:18:36] New patchset: RobH; "smokeping: cannot recurse when file isnt copied down, having to use ugly bracketing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65569 [10:19:33] New patchset: Mark Bergsma; "Make the error page nicer." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/65570 [10:19:59] paravoid, yes, its unknown - It was there before, treated as a test range, and now I kept it because I am not sure what tests are running from there. I could delete it if you want [10:21:27] New patchset: Asher; "twemproxy start=>true, to support reload when upstart twemproxy.conf changes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65571 [10:22:00] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65569 [10:22:31] New review: Yurik; "yes, its unknown - It was there before, treated as a test range, and now I kept it because I am not ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62103 [10:23:05] New patchset: Mark Bergsma; "Make the error page nicer." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65570 [10:23:15] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65571 [10:24:01] New review: Faidon; "This is a new ACL for new functionality that's being made specifically for your needs." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/62103 [10:29:04] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65560 [10:29:17] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65570 [10:29:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:30:21] New patchset: Catrope; "New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [10:30:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [10:30:52] New patchset: Catrope; "Parsoid VCL refinements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64008 [10:36:09] New patchset: Mark Bergsma; "Add the error page include file to common-vcl" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65576 [10:36:35] New patchset: Catrope; "New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890 [10:37:03] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65576 [10:38:10] # md5sum /usr/bin/varnishncsa [10:38:10] b87e487430f56df12138862eadab07c6 /usr/bin/varnishncsa [10:38:21] ^ mark-, is this the version running on bits? [10:38:45] err, mark [10:39:06] i'll check [10:39:28] yes it is [10:39:29] New patchset: Faidon; "Allow XFF spoofing from the trusted IPs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62103 [10:39:58] thanks for checking [10:40:18] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/62103 [10:41:03] Change abandoned: Catrope; "Squashed into https://gerrit.wikimedia.org/r/#/c/63890/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64008 [10:44:29] New patchset: Mark Bergsma; "VCL is not C..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/65577 [10:44:29] RoanKattouw: thanks :) [10:45:26] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65577 [11:15:16] New patchset: BBlack; "Timeout improvements" [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65582 [11:17:35] Change merged: BBlack; [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65582 [11:35:07] New patchset: BBlack; "kill ndebug unused warning" [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65583 [11:35:07] New patchset: BBlack; "better queue empty detection" [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65584 [11:36:04] Change merged: BBlack; [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65583 [11:37:25] Change merged: BBlack; [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65584 [11:39:35] New patchset: Dzahn; "add "/entity/" redirects for wikidata per" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/65463 [11:52:32] Anyone care where we put a wikidata test repo? [11:52:43] test.wikidata.org? 
test-wikidata.wikimedia.org [11:54:17] New patchset: Dzahn; "add virtual language subdomain redirects for wikidata" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/65443 [11:54:22] New patchset: RobH; "smokeping updates" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65585 [11:54:28] That's quirky, el.wikidata.org resolves to eqiad, en.wikidata.org resolves to esams [11:56:02] New patchset: Ottomata; "Getting oxygen ready for precise upgrade" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65586 [11:56:37] Can't be doing with that unprecise oxygen [11:56:53] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65586 [11:57:59] no way man [11:58:12] i like my oxygen real precise [12:03:19] mutante: Hey. Can you add test.wikidata.org to the DNS servers please? [12:06:56] New patchset: Reedy; "Don't redirect test.wikidata.org to the language redirects" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/65589 [12:08:14] PROBLEM - Host oxygen is DOWN: CRITICAL - Plugin timed out after 15 seconds [12:09:42] PROBLEM - Puppet freshness on stat1 is CRITICAL: No successful Puppet run in the last 10 hours [12:14:42] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [12:14:42] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [12:14:42] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [12:15:14] Reedy: to point where? wikidata-lb or else? 
[12:15:27] yeah, wikidata-lb should be fine [12:15:49] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65585 [12:15:51] should be setting a wiki up there [12:16:43] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [12:16:43] RECOVERY - Host oxygen is UP: PING OK - Packet loss = 0%, RTA = 0.16 ms [12:16:53] !log taking oxygen down for precise upgrade [12:17:01] Logged the message, Master [12:20:03] New patchset: Mark Bergsma; "Pass requests with a LoggedOut cookie instead" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65597 [12:20:48] PROBLEM - Host oxygen is DOWN: PING CRITICAL - Packet loss = 100% [12:20:53] oh noes, no more oxygen! i can't breathe! [12:21:15] PROBLEM - DPKG on ersch is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:21:35] !log DNS update - add test.wikidata cname [12:21:43] Logged the message, Master [12:22:11] Reedy: test.wikidata.org. 3600 IN CNAME wikidata-lb.wikimedia.org. 
[12:24:15] RECOVERY - DPKG on ersch is OK: All packages OK [12:26:48] Thanks [12:27:28] yw [12:27:53] AaronSchulz: ping [12:28:52] New patchset: Reedy; "Add wikidata (repo) dblist" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65599 [12:29:14] New patchset: Asher; "run twemproxy as nobody" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65600 [12:29:31] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65599 [12:29:49] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65600 [12:30:31] !log reedy synchronized wikidata.dblist [12:30:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:30:39] Logged the message, Master [12:32:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.146 second response time [12:34:30] !log reedy synchronized wmf-config/CommonSettings.php [12:34:39] Logged the message, Master [12:36:18] New patchset: Dzahn; "add virtual language subdomain redirects for wikidata" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/65443 [12:40:03] New patchset: Reedy; "testwikidatawiki (is that even going to be the right dbname?) config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65602 [12:41:16] Hmm [12:41:23] Is apache just going to redirect test -> www [12:43:51] New patchset: Dzahn; "add virtual language subdomain redirects for wikidata" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/65443 [12:43:58] !log authdns-update [12:43:58] http://test.wikidata.org [12:43:58] * 301 Moved Permanently http://www.wikidata.org/wiki/Wikidata:Main_Page [12:44:06] Logged the message, Master [12:44:33] Oh, I see [12:44:38] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65597 [12:44:43] Reedy: hmm.. 
so if it's an actual wiki, would that mean it has to be in langlist? [12:44:48] ah, you do? [12:45:06] RewriteCond %{HTTP_HOST} =wikidata.org [12:45:06] RewriteRule ^(.*)$ http://www.wikidata.org$1 [R=301,L,NE] [12:45:27] Shouldn't need to be [12:45:43] that is for the "naked" domain [12:45:53] can't use that with loadbalancer [12:45:57] New patchset: Asher; "twemproxy proc monitor, don't run in labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65604 [12:45:59] gotta be a CNAME [12:46:06] So that's only http://wikidata.org? [12:46:12] yea, should be [12:46:14] not somesubdomain.wikidata.org? [12:46:17] yea [12:46:52] eh, right now it's in main.conf and in redirects.conf [12:47:05] but one has the ServerAlias wikidata.org and the other *.wikidata.org [12:47:07] mutante: for https://gerrit.wikimedia.org/r/#/c/65443/4/main.conf can we put the special page there instead of the redirector? [12:47:18] i was going to merge that in one of those pending patches, too [12:48:11] aude: Presumably if you tell him where it needs to go.. ;) [12:48:12] heh [12:48:40] mutante: I'll abandon my not test patch then [12:48:43] aude: http:// or /w ? 
[12:48:52] Reedy: ok [12:49:15] Change abandoned: Reedy; "Daniel has added this to 65443" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/65589 [12:50:38] Reedy: i should have actually merged that somehow :p [12:50:49] Merge yours first [12:51:11] Rebase mine, submit [12:51:16] eh, i meant merge 2 pending changes into a 1 pending change [12:51:24] git squash [12:51:29] yea, squash [12:51:37] should have said squash [12:52:48] mutante: probably http:// [12:53:06] New patchset: Asher; "twemproxy proc monitor, don't run in labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65604 [12:56:39] New patchset: Mark Bergsma; "Add amssq47-62 as text (varnish) proxies to the trusted XFF list" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65605 [12:57:41] aude: ok, yea, trying, but needs different rules as well then if i cant just pass the whole request_uri and http_host over but it needs to be actually in /site/title format already [12:58:31] hmmmm [12:58:40] New review: Reedy; "You're using an old version of mediawiki-config" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/65605 [12:58:48] SpecialItemByTitle won't just take %{HTTP_HOST}&uri=%{REQUEST_URI} [12:59:17] right, i need to already do the formatting in apache then [12:59:18] ah, i see [12:59:44] to have Special:ItemByTitle/{site}/{title} [13:01:21] New patchset: Mark Bergsma; "Add amssq47-62 as text (varnish) proxies to the trusted XFF list" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65606 [13:01:55] Change abandoned: Mark Bergsma; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65605 [13:02:17] Change merged: Mark Bergsma; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65606 [13:04:35] Reedy: still fenari now, right [13:04:46] for noc, yes [13:04:55] for anything else, tin:/a/common [13:04:55] noc? 
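Note on the 12:57-12:59 exchange above: the point is that Special:ItemByTitle cannot take the raw host and request URI, so the redirect has to build the /{site}/{title} form itself. A minimal Python sketch of that mapping follows. It is not the Apache rule from change 65443; the function name, the "language code + wiki" site-id convention, and the list of excluded subdomains are assumptions for illustration only.

```python
from typing import Optional

# Target prefix taken from the discussion (Special:ItemByTitle/{site}/{title}).
BASE = "http://www.wikidata.org/wiki/Special:ItemByTitle"

# Assumption: these subdomains are real wikis, not virtual language subdomains.
NOT_LANGUAGES = {"www", "test"}


def item_by_title_url(host: str, request_uri: str) -> Optional[str]:
    """Map e.g. el.wikidata.org + /wiki/Title to the ItemByTitle URL.

    Returns None when the request is not a language-subdomain page view.
    """
    labels = host.lower().split(".")
    # Only handle exactly <lang>.wikidata.org; the naked domain has two labels.
    if len(labels) != 3 or labels[1:] != ["wikidata", "org"]:
        return None
    lang = labels[0]
    if lang in NOT_LANGUAGES:
        return None
    prefix = "/wiki/"
    if not request_uri.startswith(prefix):
        return None
    title = request_uri[len(prefix):]
    if not title:
        return None
    # Assumption: the site id is the language code followed by "wiki".
    return f"{BASE}/{lang}wiki/{title}"


if __name__ == "__main__":
    print(item_by_title_url("el.wikidata.org", "/wiki/Athens"))
```

The title is passed through unchanged here, so percent-escapes and query strings (the edge cases raised later in the log) are deliberately left unhandled.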
[13:04:58] sync-file as usual [13:04:58] ah [13:05:02] tnx [13:05:03] https://noc.wikimedia.org/conf/ [13:05:07] right [13:05:30] so wmf-config is noc [13:07:47] /h/w/common/wmf-config/ on fenari [13:08:55] PROBLEM - DPKG on search31 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:04] PROBLEM - DPKG on search24 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:04] PROBLEM - DPKG on search27 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:05] !log mark synchronized wmf-config/squid.php 'Add amssq47-62' [13:09:13] Logged the message, Master [13:09:14] PROBLEM - DPKG on search18 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:14] PROBLEM - DPKG on search17 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:15] PROBLEM - DPKG on searchidx1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:15] PROBLEM - DPKG on search19 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:15] PROBLEM - DPKG on search26 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:24] PROBLEM - DPKG on search15 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:34] PROBLEM - DPKG on search13 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:44] PROBLEM - DPKG on search14 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:44] PROBLEM - DPKG on search1018 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:45] PROBLEM - DPKG on search25 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:09:45] PROBLEM - DPKG on searchidx2 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:10:00] RECOVERY - DPKG on search31 is OK: All packages OK [13:10:04] RECOVERY - DPKG on search24 is OK: All packages OK [13:10:04] RECOVERY - DPKG on search27 is OK: All packages OK [13:10:14] RECOVERY - DPKG on search18 is OK: All packages OK [13:10:14] RECOVERY - DPKG on search17 is OK: All packages OK [13:10:14] RECOVERY - DPKG on search19 is OK: All packages 
OK [13:10:14] RECOVERY - DPKG on search26 is OK: All packages OK [13:10:24] RECOVERY - DPKG on search15 is OK: All packages OK [13:10:35] RECOVERY - DPKG on search13 is OK: All packages OK [13:10:45] RECOVERY - DPKG on search14 is OK: All packages OK [13:10:45] RECOVERY - DPKG on search1018 is OK: All packages OK [13:10:45] RECOVERY - DPKG on search25 is OK: All packages OK [13:11:37] ok so I'm an idiot [13:11:40] those squids WERE in there [13:15:47] RECOVERY - DPKG on searchidx2 is OK: All packages OK [13:16:44] New patchset: Mark Bergsma; "Remove duplicate (upload) squids, update cache list with reality" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65609 [13:17:57] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100% [13:18:32] Change merged: Mark Bergsma; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65609 [13:18:48] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 27.08 ms [13:19:09] Coren: question you might know the answer to -- how do I change my password on labs/integration? every time I change it via wikitech it seems like something sets it back to whatever it was before... 
[13:20:20] New patchset: BBlack; "some basic unit-test stuff for strq, + coverage" [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65610 [13:20:20] New patchset: BBlack; "fix another bad assert constraint" [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65611 [13:20:49] !log mark synchronized wmf-config/squid.php 'Update cache list' [13:20:57] Logged the message, Master [13:20:59] Change merged: BBlack; [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65610 [13:21:19] Change merged: BBlack; [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65611 [13:22:48] New patchset: BBlack; "bump version to 0.0.3" [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65612 [13:23:12] Change merged: BBlack; [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65612 [13:27:17] RECOVERY - DPKG on searchidx1001 is OK: All packages OK [13:29:42] andre__: doc = /h/w/doc/bugzilla on fenari [13:33:17] PROBLEM - NTP on mw31 is CRITICAL: NTP CRITICAL: Offset unknown [13:33:56] New patchset: Faidon; "Varnish: separate Zero-related carrier stuff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65614 [13:34:47] RECOVERY - Host oxygen is UP: PING OK - Packet loss = 0%, RTA = 1.68 ms [13:37:29] New review: Dr0ptp4kt; "Added Yuri, Max, and Arthur just for visibility on the change." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/65614 [13:38:17] RECOVERY - NTP on mw31 is OK: NTP OK: Offset -0.000208735466 secs [13:43:53] New patchset: BBlack; "Merge branch 'master' into debian" [operations/software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/65616 [13:43:53] New patchset: BBlack; "bump pkg version" [operations/software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/65617 [13:44:34] Change merged: BBlack; [operations/software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/65616 [13:44:51] Change merged: BBlack; [operations/software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/65617 [13:54:28] New patchset: BBlack; "gbp ignore coverage.sh" [operations/software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/65618 [13:54:44] Change merged: BBlack; [operations/software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/65618 [13:55:28] PROBLEM - Host oxygen is DOWN: PING CRITICAL - Packet loss = 100% [14:01:59] New patchset: RobH; "smokeping: fix labs only cert syntax error" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65620 [14:06:00] New patchset: Mark Bergsma; "Allow requests with LoggedOut cookies to be cached" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65621 [14:06:32] New patchset: Dzahn; "add virtual language subdomain redirects for wikidata" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/65443 [14:08:04] RECOVERY - Host oxygen is UP: PING WARNING - Packet loss = 58%, RTA = 2.36 ms [14:10:04] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: Connection refused by host [14:10:44] PROBLEM - udp2log log age for oxygen on oxygen is CRITICAL: Connection refused by host [14:11:10] PROBLEM - SSH on oxygen is CRITICAL: Connection refused [14:12:25] New patchset: Mark Bergsma; "Allow requests with LoggedOut cookies to be cached" [operations/puppet] (production) - 
https://gerrit.wikimedia.org/r/65621 [14:15:04] New review: Aude; "this works fine, generally. (tested)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/65443 [14:15:27] mutante: comments on https://gerrit.wikimedia.org/r/#/c/65443/ [14:15:45] PROBLEM - DPKG on tarin is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:15:59] it works nicely though wonder if/how weird but somewhat common edge cases can be handled [14:16:40] RECOVERY - DPKG on tarin is OK: All packages OK [14:19:10] New patchset: Jeremyb; "add "/entity/" redirects for wikidata per" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/65463 [14:20:19] aude: glad to hear it works:) but hmm... /([^/]* doesn't match % ? ehm,... [14:20:44] New patchset: RobH; "claiming db1014 for assignment as netmon1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65623 [14:21:14] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65620 [14:21:53] greg-g: "life"? [14:21:55] :-) [14:22:08] New patchset: RobH; "claiming db1014 for assignment as netmon1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65623 [14:22:40] PROBLEM - NTP on oxygen is CRITICAL: NTP CRITICAL: No response from NTP server [14:22:57] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65623 [14:23:05] jeremyb: yeah yeah :P I saw it after I hit submit. My x200s' screen is really dirty :) [14:23:28] so the s is part of the model #? [14:23:37] periods look like commas, etc [14:23:40] yeah [14:23:43] huh [14:23:53] sounds fixable :) [14:24:14] http://www.thinkwiki.org/wiki/Category:X200s [14:24:28] yeah, just need to find my camera cleaning cloth [14:26:01] New review: Jeremyb; "simplified the /entity/ redirect. also as it was, this would have worked for /entity/ but done nothi..." 
[operations/apache-config] (master) - https://gerrit.wikimedia.org/r/65463 [14:26:58] New patchset: Pyoungmeister; "comment for bookkeeping" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65624 [14:27:21] mutante: if the ? can't be covered in the first patch, suppose it's okay [14:27:36] step in the right direction, though would be nice to figure it out [14:28:34] New patchset: Pyoungmeister; "comment for bookkeeping" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65624 [14:29:36] i'm not really understanding 65443 :( [14:30:10] RECOVERY - SSH on oxygen is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [14:30:59] aude: ok, yea [14:31:07] jeremyb: http://meta.wikimedia.org/wiki/Wikidata/Notes/URI_scheme#Planned_implementation [14:31:42] mutante: nope :( [14:31:44] !log authdns-update for netmon1001 [14:31:52] Logged the message, RobH [14:31:57] mutante: how could recursion be triggered? [14:32:11] is this installed in labs somewhere? [14:32:22] anyway, not looking now, have to get ready for lunch [14:32:34] mutante: did you see 65463? [14:34:51] have to make sure that these special pages are sending the right cache headers [14:35:05] jeremyb: thanks, will get back to those later [14:35:09] k [14:35:21] are the php parts of all this already deployed? [14:35:34] jeremyb: they are [14:35:37] k [14:35:41] * jeremyb runs away [14:36:01] oh, also, do you have adequate stroopwafel supplies? [14:36:11] yes! 
[14:40:36] RECOVERY - NTP on oxygen is OK: NTP OK: Offset -0.01303434372 secs [14:40:43] New review: Nikerabbit; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65620 [14:43:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:44:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [14:48:29] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65624 [14:49:34] New patchset: RobH; "mistakenly changed out regex" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65629 [14:50:21] New patchset: RobH; "mistakenly changed out regex" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65629 [14:51:25] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65629 [14:52:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:54:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [14:55:45] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65614 [14:56:37] RECOVERY - Varnish HTCP daemon on cp1029 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker [14:57:21] bblack: that you? [14:57:31] yes [14:57:35] it's a recovery, but still :) [14:57:59] I have bugs to hunt and new debs to make, so I started back up the old daemon on the test host for now [14:59:46] RECOVERY - udp2log log age for oxygen on oxygen is OK: OK: all log files active [15:00:08] bugs! [15:02:04] well, at least one [15:17:05] !log reedy synchronized wmf-config/InitialiseSettings.php [15:17:15] Logged the message, Master [15:18:04] notpeter, oxygen is back up and running, thank youuu! [15:18:48] yay! thanks guys! 
[15:24:40] New patchset: Petrb; "new tool for easy sql replica access" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65634 [15:25:41] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:27:15] New patchset: Reedy; "Set wgProofreadPageNamespaceIds for thwikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65635 [15:27:41] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [15:30:06] New patchset: Reedy; "Set wgProofreadPageNamespaceIds for thwikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65635 [15:30:35] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65635 [15:32:46] !log reedy synchronized wmf-config/InitialiseSettings.php [15:32:55] Logged the message, Master [15:38:37] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:39:27] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [15:42:58] !log reedy synchronized wmf-config/InitialiseSettings.php [15:43:07] Logged the message, Master [15:43:26] New patchset: Reedy; "Added english proofreadpage aliases to thwikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65636 [15:44:47] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65636 [16:08:23] PROBLEM - Host amssq47 is DOWN: PING CRITICAL - Packet loss = 100% [16:09:20] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65621 [16:10:09] New patchset: Mark Bergsma; "Add the error page to the backends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65637 [16:12:23] New patchset: Mark Bergsma; "Add the error page to the backends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65637 [16:13:08] Change merged: Mark Bergsma; 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/65637 [16:15:35] RECOVERY - Host amssq47 is UP: PING OK - Packet loss = 16%, RTA = 89.13 ms [16:20:33] PROBLEM - Host amssq47 is DOWN: PING CRITICAL - Packet loss = 100% [16:20:43] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [16:20:44] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [16:20:44] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [16:23:43] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [16:23:43] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [16:51:39] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:53:39] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [17:09:08] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [17:55:53] New patchset: BBlack; "doc received bytes on recv timeout" [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65639 [17:55:54] New patchset: BBlack; "bugfix on queue growth, more strq sanity checks" [operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/65640 [18:04:41] PROBLEM - Varnish HTCP daemon on cp1029 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (varnishhtcpd), args varnishhtcpd worker [18:04:54] ^ that's me as well, don't worry [18:15:53] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [18:16:53] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [18:19:53] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [18:46:15] New patchset: Dereckson; "Throttle rule for NIH/NLM/NCBI May editing 
session" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [18:48:56] PROBLEM - Puppet freshness on stat1002 is CRITICAL: No successful Puppet run in the last 10 hours [19:00:45] PROBLEM - Disk space on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:01:35] RECOVERY - Disk space on mc15 is OK: DISK OK [19:26:56] Change abandoned: Dereckson; "The NIH Campus have several outgoing addresses." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [19:33:12] bblack: ping [19:33:33] ori-l: pong [19:51:44] Change restored: Dereckson; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [19:51:47] New patchset: Dereckson; "Throtle now handles IP ranges. NIH throttle rule." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [19:53:05] New patchset: Dereckson; "Throtle now handles IP ranges. NIH throttle rule." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [19:53:44] New review: Dereckson; "P2: adding IP ranges support" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [20:20:02] New patchset: Reedy; "testwikidatawiki (is that even going to be the right dbname?) config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65602 [20:49:38] New review: Se4598; "file intro needs update to reflect the new option" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/65644 [20:58:25] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: CRITICAL: log files /a/log/lucene/lucene.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [21:08:29] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: CRITICAL: log files /a/log/lucene/lucene.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. 
[21:19:01] New patchset: Dereckson; "Throtle now handles IP ranges. NIH throttle rule." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [21:19:50] New patchset: Nemo bis; "Throttle now handles IP ranges. NIH throttle rule." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [21:19:50] New review: Dereckson; "PS4. Adding documentation." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65644 [21:54:02] New review: Daniel Kinzler; "looks to me like it should do what we want." [operations/apache-config] (master) C: 1; - https://gerrit.wikimedia.org/r/65463 [22:07:32] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: CRITICAL: log files /a/log/lucene/lucene.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [22:10:11] PROBLEM - Puppet freshness on stat1 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:11] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:11] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:15:12] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [22:17:12] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [22:29:31] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: CRITICAL: log files /a/log/lucene/lucene.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. 
[22:52:12] New patchset: Wpmirrordev; "Fix for compatibility with help2man and Debian Policy" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64343 [22:53:32] New patchset: Wpmirrordev; "Fix for compatibility with help2man and Debian Policy" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64343 [23:10:14] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [23:15:44] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [23:49:29] New patchset: Wpmirrordev; "Fix for compatibility with help2man and Debian Policy" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64343