[00:01:07] $wgTiffMaxMetaSize = 64*1024; [00:01:19] * AaronSchulz wonders what explodes when that is higher [00:01:19] so...the TIFF metadata thing may be related to a recent shell bug [00:01:34] * robla looks up the bug he's thinking of [00:02:04] the max thumb area? [00:02:20] AaronSchulz: why would that change with precise deployment? [00:02:26] these seem to be below 25 Mpx anyway [00:03:37] the overall rate of "thumbnail failed" messages in thumbnail.log is actually a bit lower than a week ago [00:04:19] Nemo_bis: yeah, that's the one: https://bugzilla.wikimedia.org/show_bug.cgi?id=41125 [00:04:54] TimStarling: well it does shell out in retrieveMetadata(), not sure why the metadata would enlarge [00:05:09] it probably isn't related...probably [00:05:26] binasher: which is always quite high :) [00:05:42] the obvious oggThumb error is "OggHandler requires oggThumb version 0.9 or later" [00:05:52] which it does, and the code that generates that error is pretty specific [00:06:08] that's all I get for the ogg files after a few refreshes [00:06:36] it's not something we can fix easily in MW [00:06:52] what is the version being used? Is it not present? [00:07:05] it's present [00:07:13] if ( count( $lines ) > 0 [00:07:13] && preg_match( '/invalid option -- \'n\'$/', $lines[0] ) ) [00:07:13] { [00:07:13] return wfMessage( 'ogg-oggThumb-version', '0.9' )->inContentLanguage()->text(); [00:07:19] see, very specific [00:07:27] it checks for an "invalid option" message [00:07:33] TimStarling: did we have a backported version in Lucid or something? [00:07:37] yes [00:08:32] root@srv220:~# dpkg-query -W oggvideotools [00:08:32] oggvideotools 0.8-1 [00:08:35] binasher: paravoid: would it be faster to build the new package, or downgrade the scalers? [00:09:00] new package [00:09:01] or set up existing lucid apaches as scalers [00:09:30] add them to the puppet group, create /a/magick-tmp [00:09:38] let me check quickly [00:09:46] modify pybal conf, then you're pretty much done, right? [00:10:00] lucid-wikimedia|main|amd64: oggvideotools 0.8a-1 [00:10:01] lucid-wikimedia|main|amd64: oggvideotools-dbg 0.8a-1 [00:10:01] lucid-wikimedia|main|source: oggvideotools 0.8a-1 [00:10:12] that's what we had in lucid [00:10:19] let's check what 0.8"a" is [00:10:29] maybe we had a patch from 9? [00:10:31] a==awesome [00:11:17] i made the packaging for 0.8a from scratch [00:11:27] it should just build directly on precise [00:11:29] oggvideotools (0.8a-1) lucid-wikimedia; urgency=low [00:11:29] * Initial packaging of CMake build based oggtools [00:11:29] - all new debian/* [00:11:29] * Minor bugfixes and enhancements [00:11:29] -- Asher Feldman Fri, 09 Sep 2011 14:06:00 -0700 [00:11:32] oggvideotools (0.8-1) unstable; urgency=low [00:12:54] so.....someone starting to build? [00:12:55] so why MW says that it needs 0.9? [00:13:05] 0.9 doesn't exist [00:13:12] yes robla [00:13:20] cool, thanks! [00:13:33] oh you beat me to it? [00:13:34] oh well [00:13:46] no [00:14:10] because I thought that they might get around to releasing some time in the two years after they made that change [00:14:29] and 0.9 is the version our changes would be released into [00:14:34] oh....we're using the 0.9 alpha [00:15:04] which is called 0.8a because....I'm assuming because it's alpha software [00:15:15] binasher: did you modify 0.8a in any way? 
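A minimal sketch of the package check behind the exchange above (the dpkg-query call is quoted at 00:08; the apt-cache call and the idea of comparing against the lucid-wikimedia build are assumptions added for illustration):

    # On an image scaler, confirm which oggvideotools build is installed and which
    # versions the configured repositories offer. The log shows plain 0.8-1 on the
    # precise scalers, while lucid-wikimedia carried the patched 0.8a-1 build that
    # OggHandler's "requires 0.9" probe accepts.
    dpkg-query -W oggvideotools
    apt-cache policy oggvideotools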
[00:15:19] 0.8a was an official release [00:15:27] no [00:15:28] I know [00:15:29] okay [00:15:45] I'll backport the quantal package then if you don't mind [00:16:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:16:14] not that anything's wrong with yours, it's just that if anything's wrong with the quantal ones I'd like them to get fixed before our next upgrade :) [00:19:36] I'm isolating that TIFF metadata thing [00:20:27] seems to be 450KB either way [00:20:32] I'll just increase the limit [00:21:23] Now it would be funny if stuff blew up after saying "I'll just increase the limit" :) [00:21:45] * AaronSchulz wonders why the limit had the value it did [00:23:10] New patchset: Tim Starling; "Increase $wgTiffMaxMetaSize" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/29913 [00:23:34] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/29913 [00:24:17] !log tstarling synchronized wmf-config/CommonSettings.php [00:24:25] Logged the message, Master [00:25:15] TimStarling: time to start testing tiffs? [00:25:46] I loaded a few of the test cases on commons, they all worked [00:25:50] but you can test too if you like [00:25:59] yeah, it seems to work [00:27:12] !log deploying updated squid mobile redirector [00:27:22] Logged the message, Master [00:27:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.404 seconds [00:28:52] the PDF issue could be OOM [00:33:11] maybe not, I tried a test case and it seems to be broken on both lucid and precise [00:34:10] with no memory limit [00:34:20] it's possible that the pdfs are an old problem [00:34:56] !log upgrading oggvideotools to 0.8a on all imagescalers, fixing regression on the lucid->precise upgrade [00:35:09] Logged the message, Master [00:35:09] ok, lets have our one-on-one meeting now [00:35:19] and done [00:35:35] TimStarling: yup [00:36:36] paravoid: already? [00:36:36] yes [00:36:36] Error creating thumbnail: oggThumb failed to create the thumbnail. [00:36:39] yay... 
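A hedged sketch of the kind of configuration bump synced above (Gerrit change 29913): the log gives the old value and the ~450KB metadata size, but the exact new limit is an assumption made for illustration.

    // wmf-config/CommonSettings.php (sketch, not the actual diff)
    // Old value quoted at the top of the log; 64KB was smaller than the ~450KB
    // of metadata seen on the failing TIFFs.
    // $wgTiffMaxMetaSize = 64 * 1024;
    $wgTiffMaxMetaSize = 1024 * 1024; // illustrative new value, comfortably above ~450KB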
[00:36:50] well no version error anymore [00:55:45] New patchset: Asher; "remove duplicate nagios grp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29920 [00:56:08] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29920 [01:02:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:03:46] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [01:14:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.439 seconds [01:26:52] PROBLEM - Puppet freshness on srv222 is CRITICAL: Puppet has not run in the last 10 hours [01:27:46] PROBLEM - Puppet freshness on srv221 is CRITICAL: Puppet has not run in the last 10 hours [01:40:40] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 295 seconds [01:49:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:51:55] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [01:52:13] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds [02:00:41] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 306 seconds [02:03:05] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.020 seconds [02:35:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:52] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [02:37:52] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [02:38:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.028 seconds [02:42:10] RECOVERY - Puppet freshness on brewster is OK: puppet ran at Thu Oct 25 02:41:35 UTC 2012 [02:42:10] RECOVERY - Puppet freshness on srv222 is OK: puppet ran at Thu Oct 25 02:41:36 UTC 2012 [02:44:17] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [02:48:10] RECOVERY - Puppet freshness on srv221 is OK: puppet ran at Thu Oct 25 02:48:02 UTC 2012 [02:50:43] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 214 seconds [02:50:43] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.004 second response time on port 11000 [02:50:43] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 217 seconds [02:54:46] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [02:57:19] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [02:57:19] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [03:11:52] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours [03:24:46] PROBLEM - Puppet freshness on mw40 is CRITICAL: Puppet has not run in the last 10 hours [03:31:40] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 206 seconds [03:31:40] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 208 seconds [03:36:38] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [03:36:38] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [03:45:28] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 13 seconds [04:33:10] RECOVERY - Puppet freshness on spence is 
OK: puppet ran at Thu Oct 25 04:33:00 UTC 2012 [04:36:55] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [07:12:50] !log Stopping backend squid on sq82, sda I/O errors [07:13:11] Logged the message, Master [07:25:46] New patchset: Mark Bergsma; "Install ngrep on all machines" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29929 [07:29:58] New patchset: Mark Bergsma; "Fix memcached monitoring mess" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29930 [07:35:26] New patchset: Mark Bergsma; "Install ngrep on all machines" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29929 [07:35:26] New patchset: Mark Bergsma; "Fix memcached monitoring mess" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29930 [07:35:56] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29929 [07:36:25] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29930 [08:33:29] !log Added LVS service IPs for wikidata and wikivoyage (pmtpa/eqiad) to DNS [08:33:41] Logged the message, Master [08:35:18] ack, they look okay [08:35:41] ? [08:35:55] the DNS entries :) [08:36:51] New patchset: Mark Bergsma; "Fix nagios group description" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29934 [08:37:27] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29934 [08:39:27] paravoid: mark: I would like to extract a PHP linting script out of misc::deployment::scripts, I am wondering if should create a puppet module such as wmfscripts which would have a wmfscripts::phplinter or simply add a new manifest under misc (such as misc::phplinter ). [08:39:41] the change itself is pretty straightforward, simply need to require a package and copy two files :-] [08:40:10] I just can't make a choice between module or an additional class in the main config [08:41:19] we're slowly moving into modules but if you're not willing to clean things up and move other things besides phplinter to the module, I think a manifest under misc is ok. [08:43:04] I guess I should write the module so. No point in continuing pilling stuff I guess [08:43:18] would it be acceptable to write a base wmf scripts module that simply provide the PHP linter for now? [08:43:25] we could move the other scripts over time [08:43:54] !log Added georecords for wikidata-lb.wikimedia.org and wikivoyage-lb.wikimedia.org, their geomaps containing just a default entry pointing to eqiad [08:44:07] Logged the message, Master [08:45:50] mark: want me to do anything wrt wikidata/voyage? [08:46:07] no [08:46:08] I thought Daniel was doing most of it, but I'd be happy to if you want [08:46:29] yeah well but yesterday evening it was clear to me he didn't really understand how it (dns mostly) works [08:46:38] so I wasn't comfortable with him doing it without supervision [08:46:44] so I offered to do it today and tell him what I did [08:47:28] aha, okay [08:47:41] I think I know how DNS works, but I also think you did everything already, so... 
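A minimal sketch of the module layout hashar is weighing at 08:39–08:43 (the wmfscripts/phplinter names and the "one package plus copied files" shape come from the conversation; the package name, file path and mode are assumptions):

    # modules/wmfscripts/manifests/init.pp
    class wmfscripts {
    }

    # modules/wmfscripts/manifests/phplinter.pp
    class wmfscripts::phplinter {
        package { 'php5-cli':
            ensure => present,
        }
        # the second copied file mentioned in the log would follow the same pattern
        file { '/usr/local/bin/lint-php':
            source => 'puppet:///modules/wmfscripts/lint-php',
            owner  => 'root',
            group  => 'root',
            mode   => '0555',
        }
    }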
[08:51:04] New patchset: Mark Bergsma; "Add IPv6 LVS service IPs for wikidata/wikivoyage" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29936 [08:54:51] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29936 [08:57:15] grr slow puppet [09:03:15] New patchset: Hashar; "move PHP linter to a new `wmfscripts` module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29937 [09:03:39] paravoid: here the lame change to move the PHP linter out of the main manifests to a new 'wmfscripts' module https://gerrit.wikimedia.org/r/29937 [09:30:48] New patchset: Mark Bergsma; "Add LVS realserver IPs to protoproxy hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29938 [09:31:21] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29938 [09:35:35] New patchset: Mark Bergsma; "Add IPv6 service IP to wikidata SSL service, enable IPv6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29939 [09:36:09] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29939 [09:42:36] PROBLEM - Puppet freshness on cp1043 is CRITICAL: Puppet has not run in the last 10 hours [09:42:36] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours [09:42:36] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [09:42:36] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [09:42:57] !log Hacked Nagios back up [09:43:01] it'll break again [09:43:06] Logged the message, Master [09:44:42] PROBLEM - LVS HTTPS IPv4 on wikivoyage-lb.pmtpa.wikimedia.org is CRITICAL: Connection refused [09:45:11] PROBLEM - LVS HTTPS IPv4 on wikidata-lb.pmtpa.wikimedia.org is CRITICAL: Connection refused [09:45:18] which brilliant mind setup LVS monitoring of lvs services that hadn't even been created yet [09:45:36] PROBLEM - Backend Squid HTTP on sq82 is CRITICAL: Connection refused [09:45:36] PROBLEM - SSH on nickel is CRITICAL: Server answer: [09:45:54] PROBLEM - HTTP on nickel is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:48:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:53:57] ok can I put some traffic on eqiad now paravoid? [09:55:18] hm? [09:55:27] varnish [09:55:50] i.e. more swift traffic due to misses [09:55:55] caches are empty [09:56:11] ganglia doesn't work for me [09:56:33] indeed [09:56:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.041 seconds [09:56:52] nickel is down [09:57:06] is it "no monitoring day"? 
[09:57:34] people have screwed it up quite well with that memcached stuff [09:57:40] might be ganglia related also [09:58:30] RECOVERY - LVS HTTPS IPv4 on wikidata-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 2528 bytes in 0.030 seconds [09:58:45] looks like swapping [09:58:52] Inickel login: root [09:58:52] Password: [09:58:52] Last login: Wed Oct 24 21:39:27 UTC 2012 from ool-45755507.dyn.optonline.net on pts/1 [09:58:58] and can't get a shell [10:01:57] New patchset: Mark Bergsma; "Fix eqiad IPv6 addresses" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29941 [10:02:31] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29941 [10:03:57] !log powercycling nickel, OOM & unable to login [10:04:11] Logged the message, Master [10:05:08] mark: I don't see a problem with loading swift more, although I'd like to have ganglia back before you do that [10:05:31] me too [10:05:51] RECOVERY - SSH on nickel is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [10:06:47] RECOVERY - HTTP on nickel is OK: HTTP OK - HTTP/1.1 302 Found - 0.063 second response time [10:08:04] why did we lose 2½ hours? [10:08:40] (of ganglia data) [10:13:35] New patchset: Mark Bergsma; "Add LVS service monitoring for wikidata/wikivoyage HTTP" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29944 [10:14:23] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29944 [10:24:55] !log Sending Canadian upload traffic to eqiad [10:25:11] Logged the message, Master [10:25:17] yay [10:25:46] I should totally replace the prompt with "\o/, master" [10:29:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:31:35] I'm *so* looking forward to ditching squid [10:32:33] me too [10:32:43] there's almost no load now, as it's night [10:32:46] I think i'll add some spanish traffic as well [10:32:57] not that it really matters for upload anyway [10:37:11] !log Sending Brazil upload traffic to eqiad [10:37:24] Logged the message, Master [10:38:00] (ok ok portugese ;) [10:38:58] > 20% hit rate now [10:39:48] more like 44% actually [10:41:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.787 seconds [10:42:11] PROBLEM - Auth DNS on ns2.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [10:42:39] restarted [10:43:39] RECOVERY - Auth DNS on ns2.wikimedia.org is OK: DNS OK: 0.137 seconds response time. www.wikipedia.org returns 208.80.154.225 [10:56:52] !log Sending Argentina upload traffic to eqiad [10:57:04] Logged the message, Master [11:00:31] added mexico [11:02:28] j^: btw, https://bugs.launchpad.net/bugs/1071085 (openstack-docs bug about the missing documentation for arbitrary headers) [11:03:43] ok [11:03:51] perhaps in an hour or so, i'll see if i can add the US [11:04:21] bbl [11:04:34] mark: I'm leaving for a few hours [11:04:44] I'll be back at 14:00 UTC for the gallium upgrade [11:05:17] (and after the upgrade booked with meetings until 1am localtime...) [11:05:31] don't hesitate to call me if anything's wrong with swift. 
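One hypothetical way to sanity-check the hit-rate figures quoted at 10:38–10:39 directly on an eqiad upload varnish while ganglia is unavailable (this command is an assumption, not something run in the log):

    # Ratio of cache hits to total lookups since varnishd started.
    varnishstat -1 | awk '$1 ~ /^cache_(hit|miss)$/ { v[$1] = $2 }
        END { printf "hit rate: %.0f%%\n", 100 * v["cache_hit"] / (v["cache_hit"] + v["cache_miss"]) }'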
[11:15:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:01] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [11:22:21] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [11:30:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.028 seconds [11:32:42] ok, all is well. see you in ~2h. [11:32:52] yep [11:47:35] ori-l, it's a bad idea to read in the bath about syslogd and get ideas. [11:50:52] 1. the well-known system fields should all be prefixed by one underscore. _browser_time , _user_anon_token, _page_id, etc. Mention any of these in your data model, and logEvent() will supply it. Two underscores are for PHP system fields for clicktracking on the server. [11:52:39] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [11:54:01] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/29809 [11:59:20] ^demon: thanks for the integration gerrit user rights :) [11:59:42] PROBLEM - Puppet freshness on sq76 is CRITICAL: Puppet has not run in the last 10 hours [12:00:51] <^demon> hashar: yw [12:01:05] !log demon synchronized wmf-config/wgConf.php 'Syncing out new prefixes for wikidata/wikivoyage' [12:01:10] !log Sending US upload traffic to eqiad [12:01:20] Logged the message, Master [12:01:34] Logged the message, Master [12:03:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:08:15] well well [12:08:18] swift is doing 1000 req/s [12:17:43] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.045 seconds [12:38:42] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [12:38:42] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [12:49:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:55:48] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [13:04:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.053 seconds [13:14:03] New patchset: Hashar; "jenkins: OpenStack jenkins-job-builder" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24620 [13:15:27] paravoid: ^demon aren't we doing the upgrades to day , [13:15:28] ? [13:15:31] or was it friday? [13:15:40] <^demon> Today, I thought. [13:16:13] been swamped in some puppet manifest, haven't seen the time yet ;-] [13:16:29] <^demon> In 45m :) [13:16:54] oh my god [13:16:56] I am so bad [13:16:58] with timezone [13:17:08] I though I was already in GMT+1 but still in +2 hehe [13:17:31] I need to get out by 14:50 UTC :-( [13:17:45] so left me only 50 minutes $$$$$ [13:17:59] <^demon> Well you've only got 1 box to dist-upgrade. I've got 2 :) [13:18:01] so cold outside that I already adjusted to the non DST time [13:18:07] true [13:18:55] once Gerrit has restarted, we will have to verify the Gerrit Trigger plugin is still communicating with Gerrit [13:19:09] it might need to be restarted via the Jenkins web interface [13:19:12] <^demon> If paravoid is willing, you could start gallium a bit earlier. I don't want to start gerrit early since I announce the window (and it affects more people) [13:19:37] yup [13:19:42] New review: TheDJ; "what about svgs with png thumbnails. Would that be a problem ?" 
[operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/29805 [13:19:44] <^demon> But that's up to him :) [13:21:02] <^demon> Add this to the long list of reasons I *hate* DST :) [13:21:32] I love having an extra hour of light in the evening [13:27:56] I have soooo many pending changes https://gerrit.wikimedia.org/r/#/q/owner:hashar+is:open,n,z [13:36:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:38:53] <^demon> hashar: I merged all your tweaks to sql.php [13:38:59] <^demon> Looked nice :) [13:39:08] ^demon: thanks :-] [13:45:57] fresh air, brb [13:49:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.913 seconds [13:53:08] hashar: you mean, fresh air, mixed with tobacco? ;) [13:55:28] Reedy: indeed :/ [13:55:35] I must quit smoking [13:56:24] heh [13:56:33] Any more thoughts about the nl hackathon? [13:59:08] Reedy: are you coming ? [13:59:37] we had a few mail exchanges [14:00:10] I think I'm going to [14:00:59] I transferred you 4 mails [14:01:06] <^demon> hashar, paravoid: You guys ready to start? [14:01:19] Reedy: would be mostly about CI stuff / Jenkins :-] [14:01:33] I am [14:01:42] but I think paravoid disappeared :-] [14:03:15] <^demon> uh oh [14:05:09] !log Sending all non-European upload traffic to eqiad [14:05:23] Logged the message, Master [14:06:19] heya [14:06:28] <^demon> Ah there he is :) [14:07:33] <^demon> I was going to step out for 5 minutes to the store across the street, then I'll be ready. [14:08:15] <^demon> Feel free to start gallium though, hashar's on a short timeline today [14:08:32] yeah I'm starting [14:09:09] i'm finishing [14:09:17] hmm? [14:09:42] I'm impressed by swift. [14:09:51] yes [14:09:56] it behaved quite well today [14:14:16] New patchset: Hashar; "jenkins: OpenStack jenkins-job-builder" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24620 [14:16:39] <^demon> paravoid: I'm going to start on formey now (gerrit slave) before doing manganese (master) [14:21:32] puppet I hate you :( [14:22:42] grr [14:23:24] hashar: why the hell do we have postgres on gallium? [14:23:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:23:38] you can remove it for now [14:23:53] we originally wanted to run unit tests using a postgre backend [14:24:01] but it is not used yet, so feel free to remove it [14:24:10] <^demon> Well, rephrase: "we wanted to run it on alternate backends" [14:24:21] <^demon> There wasn't much about wanting postgres itself :p [14:24:30] oh [14:24:36] forgot you are not part of that cabal [14:24:38] sorry ;-] [14:24:48] PROBLEM - HTTPS on formey is CRITICAL: Connection refused [14:25:21] PROBLEM - HTTP on formey is CRITICAL: Connection refused [14:25:30] pff [14:25:33] <^demon> Yes yes, hang tight nagios. [14:26:04] one day I will have to look at the nagios suite and make the service checks dependent on the host check [14:26:14] so the service stop reporting they are down when the host does not even ping [14:28:15] <^demon> Why the heck is exim running on formey? [14:28:20] <^demon> Probably something stupid for svn :\ [14:28:33] local mta perhaps? [14:28:36] <^demon> Possibly. [14:28:37] every server runs exim [14:28:39] isn't it installed by default on all instances so the box can send mails by themselves ? [14:28:47] aka smtp( host => localhost ) [14:28:48] just usually not as a deamon [14:29:07] <^demon> It stopped...started during the upgrade. 
[14:30:34] !g v [14:30:34] https://gerrit.wikimedia.org/r/#q,v,n,z [14:30:40] !g 26420 [14:30:40] https://gerrit.wikimedia.org/r/#q,26420,n,z [14:33:05] New review: Hashar; "being tested on instance `integration-jobbuilder`. The password is not expanded despite it being the..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/24620 [14:33:45] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [14:34:39] PROBLEM - Puppet freshness on db9 is CRITICAL: Puppet has not run in the last 10 hours [14:34:52] poor db9 [14:35:10] paravoid: how it is going on on gallium? [14:35:47] progressing. [14:36:17] New patchset: Hashar; "jenkins: OpenStack jenkins-job-builder" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24620 [14:36:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.850 seconds [14:36:43] New review: Hashar; "Fixed ;] Simply dont put a dollar sign in templates!" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/24620 [14:37:48] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [14:37:49] <^demon> Done on formey, rebooting. [14:39:51] <^demon> Hmm, can't seem to SSH to it. Pinging ok. [14:42:21] <^demon> paravoid: ^ [14:42:27] PROBLEM - SSH on formey is CRITICAL: Connection refused [14:43:01] there you go paravoid [14:43:06] they can fix sq82 now ;-) [14:43:42] bah more broken packages :-( [14:44:14] jeff_green: db78 is ready for you [14:44:22] mark: ? [14:44:36] cmjohnson1: great--thank you! [14:44:39] ^demon: okay, let me login to the mgmt [14:45:06] well i got people off squid [14:45:12] ah [14:46:01] paravoid: according to apt log the gallium upgrade does work that well [14:46:06] we can resume after formey/ gerrit [14:46:16] must grab my daughter :/ [14:46:23] I have though we were out of DST already duh [14:47:05] it's progressing, I'm fixing problems as I go [14:47:20] I'm not worried [14:47:42] i noticed there is no candidate for the testswarm package not an issue though since we are no more using it [14:47:54] can be removed from the box [14:48:14] will reconnect in about 50 minutes [14:48:28] ^demon: nothing on the console. what did you do? [14:48:37] brb [14:48:58] <^demon> paravoid: I just did do-release-upgrade. Nothing notable happened. Got to the end and rebooted. [14:49:19] tried SSHing to 1022? [14:49:33] huh, doesn't work [14:49:36] ok then, I'll powercycle it [14:49:36] <^demon> Yes, didn't work. [14:52:43] !log shutting down srv194 to replace disk [14:52:55] Logged the message, Master [14:54:23] !log shutting down sq82 to replace /dev/sda [14:54:35] Logged the message, Master [14:55:30] PROBLEM - Host formey is DOWN: CRITICAL - Host Unreachable (208.80.152.147) [14:56:06] PROBLEM - LVS HTTP IPv4 on wikivoyage-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:33] The disk drive for /svnroot is not ready yet or not present. [14:56:49] The disk drive for /var/lib/gerrit2 is not ready yet or not present. [14:56:53] Continue to wait, or Press S to skip mounting or M for manual recovery [14:56:59] SSh should be back. 
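A hypothetical recovery sequence for the console prompt quoted at 14:56 ("The disk drive for /svnroot is not ready yet or not present"); none of these commands appear in the log, they only illustrate the usual way to bring LVM-backed mounts up by hand:

    lvs            # are the logical volumes behind /svnroot and /var/lib/gerrit2 active?
    vgchange -ay   # activate any volume groups that were skipped at boot
    mount -a       # retry the fstab entries, then let the boot continue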
[14:57:27] RECOVERY - SSH on formey is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [14:57:36] RECOVERY - Host formey is UP: PING OK - Packet loss = 0%, RTA = 0.15 ms [14:57:40] I'm going to silence that [14:57:54] !log silencing wikivoyage checks in nagios until deployed [14:58:08] Logged the message, notpeter [14:58:30] PROBLEM - Host sq82 is DOWN: PING CRITICAL - Packet loss = 100% [14:59:24] ^demon: are you fixing those mount errors or do you need help? [15:00:05] notpeter: kudos [15:00:40] <^demon> paravoid: I'm not sure how, I'm afraid :\ [15:00:59] don't worry, I'll have a look :-) [15:01:26] <^demon> Also, it's not letting me ssh (prompting for password) [15:01:30] <^demon> But at least ssh is up. [15:01:40] it doesn't? [15:01:47] it lets me just fine [15:05:21] <^demon> paravoid: I've got my key in /home/demon/.ssh/authorized_keys, right? [15:06:07] Oct 25 15:05:52 formey sshd[13950]: input_userauth_request: invalid user gerrit2 [preauth] [15:06:12] Oct 25 15:05:52 formey sshd[13906]: Failed password for invalid user demon from 208.80.152.165 port 40703 ssh2 [15:06:26] oh, ldap [15:06:28] yay [15:06:58] <^demon> Oh *fun* [15:07:23] <^demon> I kept all the current settings for LDAP, rather than distro-installed crap. [15:08:19] let me see about that lvm problem first [15:11:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:15:27] RECOVERY - Host sq82 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [15:17:57] friggin puppet [15:18:18] super extra slow [15:18:27] PROBLEM - SSH on sq82 is CRITICAL: Connection refused [15:19:30] PROBLEM - Frontend Squid HTTP on sq82 is CRITICAL: Connection refused [15:21:50] <^demon> Why is gerrit-wm_ joining? [15:21:59] <^demon> formey's not supposed to have the stupid irc bot. [15:22:02] <^demon> I'll fix that. [15:22:15] <^demon> Oh, can't still. [15:23:08] I'm waiting for puppet to run for ages now. [15:23:22] there's a ldap config change that's conditional on the ubuntu version [15:23:24] puppet should fix it [15:23:50] <^demon> Ah, gotcha. [15:24:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.347 seconds [15:25:27] but i-uri ldap://nfs1.pmtpa.wmnet:389 [15:25:27] +uri ldap://virt0.wikimedia.org:389 ldap://virt1000.wikimedia.org:389 [15:25:31] hmm [15:25:33] puppet has been extra painful today, dunno why [15:27:07] <^demon> It's supposed to use virt0/virt1000 since it's labs ldap. [15:27:43] let's see. [15:28:21] PROBLEM - Host sq82 is DOWN: PING CRITICAL - Packet loss = 100% [15:29:40] <^demon> paravoid: I'm in now. [15:30:12] the lvm issue isn't fixed yet unfortunately [15:38:58] sigh, ubuntu bug [15:43:38] I just loooove Ubuntu's QA [15:45:45] PROBLEM - SSH on formey is CRITICAL: Connection refused [15:47:06] RECOVERY - SSH on formey is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:47:30] * Starting MySQL Server [fail] [15:47:32] Cannot find a JRE or JDK. Please set JAVA_HOME to a >=1.6 JRE [15:47:35] otherwise done [15:47:45] i.e. lvm works, ssh should work [15:48:18] RECOVERY - HTTP on formey is OK: HTTP OK HTTP/1.1 200 OK - 3596 bytes in 0.012 seconds [15:48:27] RECOVERY - HTTPS on formey is OK: OK - Certificate will expire on 08/22/2015 22:23. [15:49:11] ^demon: is formey supposed to run a mysql? [15:49:26] it has 5.1 removed but no 5.5 installed [15:49:26] <^demon> Not for any purpose that I can remember. [15:49:47] <^demon> Yeah, I'll fix the java issue for gerrit too. 
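A hedged sketch (not the actual manifest) of the release-conditional LDAP client setting described at 15:23–15:27, where precise hosts get the labs LDAP servers shown in the diff while older hosts keep the previous one:

    $ldap_uri = $::lsbdistcodename ? {
        'lucid' => 'ldap://nfs1.pmtpa.wmnet:389',
        default => 'ldap://virt0.wikimedia.org:389 ldap://virt1000.wikimedia.org:389',
    }
    # A file/template resource would then interpolate $ldap_uri into the
    # LDAP client configuration's "uri" line.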
[15:50:00] great [15:50:09] also, you seem to have a newer gerrit than what is in apt, I presume on purpose. [15:51:20] <^demon> Yes. [15:51:34] <^demon> Which is why we just use ensure=>present now rather than pin it. [15:52:22] gallium is also rebooted but I have no idea what to check :-) [15:52:34] <^demon> Nor do I. [15:53:55] <^demon> And gerrit's back up on the slave. [15:54:00] :-) [15:54:08] <^demon> paravoid: If you could merge https://gerrit.wikimedia.org/r/#/c/28232/, it'll keep the next puppet run from killing it. [15:56:01] hashar: welcome back. gallium is back from the reboot for a while, can you check how's jenkins [15:56:09] and if anything's broken? [15:56:21] ^demon: did you do manganese too? [15:56:33] <^demon> Not yet. I didn't want to risk bringing them both down at once. [15:56:44] but that changeset is safe to merge? :) [15:57:01] back [15:57:03] <^demon> We can start manganese now, so it won't run puppet again :) [15:57:06] sorry about the timezone madness :/ [15:58:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:58:54] <^demon> paravoid: Although, we're at the end of our window. Maybe I should just do manganese another time? [15:59:01] your call [15:59:03] <^demon> Ugh, but then they'll be inconsistent. [15:59:10] <^demon> And puppet will break one or the other [15:59:17] <^demon> Let's just do it now. [15:59:56] paravoid: seems to be working fine from my first checks [16:01:16] paravoid: so Ubuntu "just" replace packages in place ? [16:01:19] <^demon> Ryan_Lane: Do you remember why on earth we might've had mysql-server running on formey? [16:01:24] ^demon: 503 on gerrit [16:01:30] ^demon: so, can't merge that :-) [16:02:26] <^demon> Bringing it back up for a moment. [16:02:33] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28232 [16:02:35] <^demon> Ok, up. [16:03:02] <^demon> Lemme know when you've pulled to puppetmaster. [16:04:05] I did. [16:04:32] I'm also on a conference call that's starting right about now [16:04:34] ^demon: it may have been for gerrt [16:04:36] so I'll be lagging [16:04:38] *gerrit [16:04:52] <^demon> Before we used db9 and then db1048? [16:05:00] <^demon> That was my best thought. [16:05:07] ok meeting [16:07:59] ^demon: are you still upgrading gerrit ? [16:08:08] <^demon> Master right now. [16:08:22] that explain the connection down in jenkins :] [16:08:40] <^demon> paravoid: http://p.defau.lt/?aS8ZtygqZuAObko2tz_0tg [16:09:47] I forced it with apt-get -f install [16:10:09] <^demon> Mmk. [16:10:16] I'm not sure if you can resume do-release-upgrade [16:11:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.540 seconds [16:12:17] <^demon> As long as you fail at a sane point. [16:13:37] New patchset: jan; "Add puppet config for PHP" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29975 [16:14:22] <^demon> paravoid: forcing the install worked. [16:14:47] so gallium seems fine to me :-] Will check up a bit more after dinner [16:14:51] thank you paravoid !!! [16:20:51] <^demon> Ok, done with the upgrade. Rebooting manganese. [16:23:14] <^demon> paravoid: manganese rebooted, needs puppet run again so I can login :) [16:23:19] <^demon> (like formey) [16:23:47] running puppet now [16:29:08] <^demon> paravoid: Everything seems stable now. [16:29:10] <^demon> Thanks for your help. 
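A tiny illustration of the packaging approach ^demon describes at 15:51 (the resource title is an assumption; the point is only present-versus-pinned):

    package { 'gerrit':
        # Pinning a specific version here would force apt's older build back on;
        # "present" leaves a hand-upgraded gerrit alone while still guaranteeing
        # the package exists on a fresh install.
        ensure => present,
    }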
[16:29:16] great [16:45:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:49:41] mw45 and mw 34 are spamming fwrite() expects parameter 1 to be resource, boolean warnings.. [16:50:08] New review: Nikerabbit; "I don't quite understand what exit $? does but that file was just renamed." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/29937 [16:53:48] http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=copper.wikimedia.org&m=load_one&r=hour&s=by%20name&hc=4&mc=2&st=1351184005&g=load_report&z=large&c=Swift%20eqiad [16:53:54] lol [16:54:17] (it's a test box afaik) [16:56:26] !log powercycling copper, load 700 [16:56:38] Logged the message, Master [16:58:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.340 seconds [17:00:13] New review: Andrew Bogott; "This looks good to me! It'd be good to get someone who contributed to webserver.pp to review as well." [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/29975 [17:01:54] New review: Faidon; "This looks like a good, much-needed abstraction. Please make it into a puppet module though, as this..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/29975 [17:06:02] New patchset: Jgreen; "fixing ganglia last-octet snafu, sort hash by last octet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29983 [17:15:35] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29983 [17:25:28] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/25939 [17:27:13] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24432 [17:28:30] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29328 [17:31:46] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28377 [17:33:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:33:47] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28497 [17:48:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.051 seconds [17:52:53] !log preilly synchronized php-1.21wmf1/extensions/MobileFrontend 'update after deploy' [17:53:05] Logged the message, Master [17:53:17] !log preilly synchronized php-1.21wmf2/extensions/MobileFrontend 'update after deploy' [17:53:32] Logged the message, Master [18:01:03] New review: Pyoungmeister; "will also require a line in:" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/28741 [18:09:20] !log restarted gmetad on nickel [18:09:36] Logged the message, Master [18:10:11] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid parameter check_command at /var/lib/git/operations/puppet/manifests/lvs.pp:976 on node spence.wikimedia.org [18:15:33] New patchset: Pyoungmeister; "fixing monitoring definitions for wikivoyage and wikidata lb's" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29995 [18:16:26] binasher: ^ [18:16:29] that should fix it [18:16:46] !log flushed mobile varnish cache per preilly [18:16:59] Logged the message, Master [18:17:10] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29995 [18:17:27] !log updated 
OpenStackManager on labsconsole to master version [18:17:35] Logged the message, Master [18:20:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:29:43] !log installing base OS on srv194 [18:29:49] Logged the message, Master [18:31:58] !log Put in a live-hack on labsconsole to remove m1.tiny from the list of instance types [18:32:14] Logged the message, Master [18:34:11] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/29571 [18:34:48] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.083 seconds [18:36:04] !log preilly synchronized php-1.21wmf1/extensions/MobileFrontend 'update after deploy' [18:36:20] Logged the message, Master [18:36:38] !log preilly synchronized php-1.21wmf2/extensions/MobileFrontend 'update after deploy' [18:36:40] notpeter: thanks for your merges :-]  I have replied to your question about the udp2log reload [18:36:44] Logged the message, Master [18:39:01] LeslieCarr, you available to help me with ganglia some more today :) [18:40:15] hashar: yep! just saw. will poke around at it [18:42:16] ottomata: not at the moment - am at the datacenter [18:42:45] notpeter: and thanks for the other merges :-] [18:44:38] notpeter: are you on console srv194? [18:45:03] hashar: definitely [18:45:05] cmjohnson1: no [18:45:06] it's the bug [18:45:38] okay thx [18:45:39] http://wikitech.wikimedia.org/view/Dell_PowerEdge_1950 [18:45:53] mmmk, thanks [18:46:39] PROBLEM - LVS on payments.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:47:31] Jeff_Green: ^^ [18:47:48] hmmm [18:48:00] that's not a very informative message [18:48:48] i'm rebooting payments3 but I've already taken it out of lvs config [18:53:49] ah ha. payments3 is #2 of 4 on the db list in the mediawiki conf, apparently it doesn't fail out gracefully [18:53:52] Jeff_Green: hey! have you sorted out your mediawiki database balancing ? [18:54:23] hashar: conceptually yes, but the fr-tech folks have been too crazy-busy to implement the change yet [18:55:19] Jeff_Green: at least you are not blocked pending some information :-] [18:55:21] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [18:55:29] yes totally, thanks for your help with that [18:55:34] New patchset: Pyoungmeister; "patching up ipv6 monitoring for wikidata and wikivoyage" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29999 [18:55:57] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29999 [18:57:12] New patchset: jan; "Add puppet config for PHP" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29975 [19:00:19] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [19:02:45] New patchset: Hashar; "admins.pp: annotate the include as disabled" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23789 [19:03:08] New review: Hashar; "Rebased." [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/23789 [19:03:43] Jeff_Green: mind merging the ultra trivial change https://gerrit.wikimedia.org/r/#/c/23789/ ? It simply adds a comment :-} [19:04:24] can it wait a bit? 
i'm in the middle of rebooting the payments cluster [19:04:35] yeah sure :-] [19:04:38] and testing to see if they survive updates :-) [19:04:39] thx [19:04:41] Jeff_Green: even forget about it, it is not important hehe [19:05:15] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [19:06:37] speakign of which . . . [19:06:53] LVS is still down [19:06:54] do you need any help? [19:07:16] if it's down then I think LVS itself is broken [19:07:42] were there changes to lvs.pp earlier? [19:08:12] New patchset: Ryan Lane; "Add pam_mkhomedir" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30004 [19:08:12] ya--if you could take a look at lvs itself that would be good [19:09:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:10:21] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [19:11:18] paravoid: i'm not sure what test is failing--payments through lvs is responsive [19:11:55] New review: Hashar; "Yeahhh a module :-]" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/29975 [19:12:27] I know what this is [19:12:29] New patchset: Kaldari; "Adding $wgCentralBannerDispatcher for CentralNotice" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30005 [19:12:36] we have sh instead of wrr for payment [19:12:41] *payments [19:12:45] don't know why [19:12:51] legacy? [19:12:59] it's SSL? [19:13:06] but this means that it takes more for a server to be depooled iirc [19:13:09] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30005 [19:13:11] it hasn't in the past [19:13:36] in the past it togglign status in the pybal conf quickly worked [19:13:36] oh it's just SSL, it got truncated in the output, doh [19:13:52] TCP payments-lb.pmtpa.wikimedia. sh -> payments1.wikimedia.org:http Route 10 1 7 -> payments2.wikimedia.org:http Route 10 6 8 [19:13:58] the "s" got truncated, heh [19:14:07] ha [19:14:38] hm, telnet to 443 from spence works [19:15:12] meanwhile squid didn't like the latest package updates [19:15:26] squid on payments3 itself [19:15:27] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [19:16:05] it's in a 404 state right now [19:16:13] what is? [19:16:18] [1351190871] SERVICE ALERT: payments.wikimedia.org;LVS;WARNING;HARD;20;HTTP WARNING: HTTP/1.1 404 Not Found [19:16:30] before it was [19:16:30] [1351190799] SERVICE ALERT: payments.wikimedia.org;LVS;CRITICAL;HARD;20;CRITICAL - Socket timeout after 10 seconds [19:16:42] what's the test exactly? [19:16:46] muttw [19:16:57] to what URL [19:16:57] & from where [19:18:33] 404s here too [19:19:51] yeah I see it too [19:20:20] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 0 processes with command name squid [19:22:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.785 seconds [19:25:12] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 0 processes with command name squid [19:30:19] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 0 processes with command name squid [19:31:14] New patchset: Ori.livneh; "Enable PostEdit on 15 additional wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30009 [19:31:53] Reedy: what's the question? 
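The "sh" in the truncated output at 19:13:52 is LVS's source-hashing scheduler (clients stick to one realserver by source IP), as opposed to the weighted round-robin ("wrr") mentioned at 19:12. A hypothetical way to confirm this on the active balancer (not a command run in the log):

    # -L lists virtual services and their realservers; -n skips DNS so the
    # scheduler column ("sh", "wrr", ...) is easy to spot next to the service IP.
    ipvsadm -L -n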
[19:32:13] Where the hell is the current docroot for www.wikidata.org [19:33:13] docroot/wikidata.org ? [19:33:25] what are you trying to do? [19:34:03] No, it isn't [19:34:26] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30009 [19:34:33] New patchset: Jgreen; "adjusting LVS url test for payments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30010 [19:35:13] Reedy: ahh, right you added that [19:35:15] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 0 processes with command name squid [19:35:19] Reedy: it's docroot/www.wikidata.org [19:35:30] No, it isn't [19:36:00] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30010 [19:37:16] Reedy: how do you figure? + DocumentRoot /usr/local/apache/common/docroot/www.wikidata.org [19:37:48] http://www.wikidata.org/ [19:37:53] That landing page doesn't exist there [19:38:02] so it can't be using that [19:38:07] the landing page is in meta [19:38:17] it's extract2.php [19:38:38] + RewriteRule ^/$ /w/extract2.php?title=Www.wikidata.org_portal&template=Www.wikidata.org_template [L] [19:38:53] ffs [19:38:54] thanks [19:39:55] hey i've forgotten too. and it was my idea to use extract2 for wikidata.org in the first place [19:39:59] ;-) [19:40:19] nice that wikidata.org is editable, even as a landing page [19:40:21] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 0 processes with command name squid [19:41:30] !log pgehres synchronized php-1.21wmf2/extensions/LandingCheck/ 'Updating LandingCheck to master' [19:41:42] Logged the message, Master [19:43:48] PROBLEM - Puppet freshness on cp1043 is CRITICAL: Puppet has not run in the last 10 hours [19:43:48] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours [19:43:48] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [19:43:48] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [19:44:01] aude: can you create a new one? [19:44:45] ok [19:44:50] !log kaldari synchronized wmf-config/CommonSettings.php 'adding new var for CentralNotice' [19:45:03] Logged the message, Master [19:45:06] stupid buggy openstack [19:45:18] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [19:45:32] build state [19:46:29] -_- [19:46:31] stupid nova client cache [19:47:15] !log pgehres synchronized php-1.21wmf2/extensions/ContributionTracking/ 'Updating ContributionTracking to master' [19:47:15] :( [19:47:20] ^demon: regarding formey upgrade, we might want to check the doxygen doc is still running properly. Or we can wait for the cron to kick in 4hours :-] [19:47:21] same IP [19:47:24] still in build state [19:47:31] Logged the message, Master [19:47:38] ^demon: will probably move that service to gallium anyway [19:47:49] listed as active [19:48:02] that build work [19:48:02] *worked [19:48:02] <^demon> hashar: Will wait, and yes, we should. [19:48:03] aude: under what domain are we supposed to be creating this wikidata wiki? https://bugzilla.wikimedia.org/show_bug.cgi?id=40137 "Contradictory to comment 0 we try to set up wikidata.org from the start." [19:48:04] no clue why [19:48:08] or why the other one failed [19:48:11] some bug in nova [19:48:20] (www\.)?wikidata\.org [19:48:25] * aude looks [19:48:32] ^demon: ok will check the rendering tomorrow morning. 
Will work on doc.wikimedia.org on monday under 20% [19:49:03] Reedy: yes, wikidata.org [19:49:39] our language setup may be different at some point than other wikis, but for now i think it's okay to assume english for the language [19:49:50] and wikidata as the site/project [19:49:58] or www = en maybe [19:50:21] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [19:50:46] The apache configs need updating slightly then ;) [19:50:51] + # www -> en [19:50:51] + RewriteCond %{HTTP_HOST} www.wikidata.org [19:50:51] + RewriteRule ^/(.*$) http://en.wikidata.org/$1 [R=301,L] [19:50:54] assume you are using the mediawiki-multiversion stuff [19:50:55] New patchset: Pyoungmeister; "adding sudoers defs for code deploy to search indexer role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30011 [19:50:58] <^demon> Yes, we'll need to rewrite the whole thing [19:51:14] Reedy: it's okay i think for a start [19:51:22] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30011 [19:51:25] We can probably steal from the mediawiki.org config.. [19:51:32] * aude doesn't like to make these decisions and not sure denny really understands all the details [19:52:08] itsafuckingtestwiki.wikidata.org [19:52:17] Reedy: yes :) [19:52:24] whatever it's called is okay for now [19:52:41] just (whatever).wikidata.org [19:53:15] RECOVERY - Puppet freshness on spence is OK: puppet ran at Thu Oct 25 19:52:46 UTC 2012 [19:53:32] Dennys comment 7 suggests we're just gonna create the final one first [19:54:43] Reedy: ok [19:54:51] so, just www.wikidata.org? [19:55:21] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 0 processes with command name squid [19:56:17] New patchset: jan; "Add puppet config for PHP" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/29975 [19:56:37] MWMultiVersion.php looks scary :o [19:57:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:57:33] Yeah, like mediawiki.org [19:57:40] if you give it no www, it takes you there [19:59:11] New review: Andrew Bogott; "This looks reasonable. Is it a module 'best practice' to have every class in a separate file? It s..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/29975 [19:59:57] Reedy: i think that works for now [20:00:18] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [20:00:44] later on, we probably want de.wikidata.org/wiki/Q100 to redirect to wikidata.org/wiki/Q100?setlang=de or something fancy like that [20:01:02] but not to worry about that now [20:01:19] Q100 being an item page [20:02:44] brutal pagenames you have:) [20:03:00] http://meta.wikimedia.org/wiki/Wikidata/Notes/URI_scheme has notes but they are not that current [20:03:00] MaxSem: yes [20:03:54] New review: Hashar; "looks good to me." 
[operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/29975 [20:04:04] we also have pages like http://wikidata.org/wiki/Special:ItemByTitle/en/Berlin and those can have a short url of some schema [20:04:18] ultimately redirect to Q100 or whatever, magically [20:04:42] New review: jan; "The seperation in many files is needed by the puppet autoloader" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/29975 [20:05:19] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [20:07:26] New patchset: Demon; "Adjust wikidata.org apache config" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/30056 [20:08:15] PROBLEM - LVS on payments.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:08:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.863 seconds [20:10:43] RECOVERY - check_squid on payments3 is OK: OK [20:11:39] Is there gerrit maintenance planned today? It's been up/down randomly today [20:11:48] got a 503 a couple of times a few hours ago, and again now. [20:25:14] PROBLEM - Memcached on marmontel is CRITICAL: Connection refused [20:25:28] PROBLEM - Swift HTTP on copper is CRITICAL: Connection refused [20:25:43] PROBLEM - LVS HTTP IPv6 on wikivoyage-lb.pmtpa.wikimedia.org_ipv6 is CRITICAL: Connection refused [20:25:43] PROBLEM - LVS HTTP IPv6 on wikivoyage-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection refused [20:26:06] Krinkle: some boxes were to be ugraded, the gerrit host among them [20:26:13] PROBLEM - LVS HTTPS IPv6 on wikivoyage-lb.pmtpa.wikimedia.org_ipv6 is CRITICAL: Connection refused [20:26:13] PROBLEM - LVS HTTPS IPv6 on wikivoyage-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection refused [20:26:17] PROBLEM - Swift HTTP on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:26:17] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:26:17] PROBLEM - Swift HTTP on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:26:17] PROBLEM - Swift HTTP on magnesium is CRITICAL: Connection refused [20:27:50] New patchset: Hashar; "rake disable colors on non TTY" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/30062 [20:27:56] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:28:25] no I need to find someone from op with some ruby knowledge :-] [20:29:23] hashar: I'm sure they all have knowledge it sucks ;) [20:30:18] what do you need? 
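A hedged illustration (not a deployed rule) of the "something fancy like that" idea from 20:00:44, written in the same mod_rewrite style as the snippets quoted earlier; the subdomain list, the Q-item pattern and the 302 are assumptions:

    # de.wikidata.org/wiki/Q100  ->  www.wikidata.org/wiki/Q100?setlang=de
    RewriteCond %{HTTP_HOST} ^(de|fr|ja)\.wikidata\.org$
    RewriteRule ^/wiki/(Q[0-9]+)$ http://www.wikidata.org/wiki/$1?setlang=%1 [R=302,L]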
[20:30:25] PROBLEM - check_apache2 on payments3 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2 [20:30:25] PROBLEM - check_nginx on payments3 is CRITICAL: PROCS CRITICAL: 0 processes with command name nginx [20:30:42] !log reedy synchronized php-1.21wmf2/extensions/ 'Dark deploy Wikibase, Diff and ULS' [20:30:48] * aude eager to see our wiki :) [20:30:58] Logged the message, Master [20:31:28] paravoid: a change to the ops/puppet rakefile https://gerrit.wikimedia.org/r/30062 [20:31:33] paravoid: to disable color on non tty [20:32:32] moreover, the Puppet::Util::Color was introduced in puppet 2.7.12 and Precise got 2.7.11 :-( [20:32:46] but since Jenkins is not interactive it will not have any troubles ;-] [20:35:26] RECOVERY - check_apache2 on payments3 is OK: PROCS OK: 9 processes with command name apache2 [20:35:26] RECOVERY - check_nginx on payments3 is OK: PROCS OK: 49 processes with command name nginx [20:36:11] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:36:11] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:36:26] Reedy: for this one, https://gerrit.wikimedia.org/r/#/c/29979/2 [20:36:41] does it matter if it's "WikiBase" here or "Wikibase" as the extension is named? [20:37:04] and the fact that we have them in subdirectories... WikibaseLib and Wikibase [20:37:36] New patchset: Demon; "wikidata needs special treatment by het deploy" [operations/mediawiki-multiversion] (master) - https://gerrit.wikimedia.org/r/30064 [20:43:38] New review: Siebrand; "Just checking, but can you confirm that a full Localisation level for the extension was required for..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30009 [20:44:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:44:41] New review: Aude; "or, no i think it will work" [operations/mediawiki-multiversion] (master) C: 0; - https://gerrit.wikimedia.org/r/30064 [20:44:47] ^demon: when can we kill that preg_match( '/^(.*)\.prototype\.wikimedia\.org$/', $serverName, $matches ) stuff? :) [20:45:01] <^demon> I dunno. [20:45:04] <^demon> Prototype was never me. [20:45:18] and then there is the if( $secure ) cruft [20:45:27] AaronSchulz: prototype is still around. chrismcmahon might know Ryan_Lane too [20:45:27] the multiversion stuff is scary [20:45:35] AaronSchulz: I would keep it around until it is confirmed. [20:45:45] prototype is definitely way out of date [20:45:49] well, yes, but can we try to get rid of some cruft [20:45:59] it's like we never clean anything up almost [20:45:59] aude: I had renamed it once. Grrr [20:46:02] chrismcmahon: isn't prototype still used by some people ? [20:46:07] hashar: beta labs should be a replacement for prototype [20:46:08] The subdirectories don't matter, it's still one git repo [20:46:34] <^demon> aude: For example, en.wikidata.org, it would've set $lang = 'en', $site = 'wikidata', $docroot = '.../wikidata.org/' [20:46:36] hashar: afaik, the only thing hosted on prototype now is an old version of AFTv5 and a REALLY old version of MediaWiki [20:46:48] AaronSchulz: I am all in when it comes to cleaning our conf. It is just that prototype is not dead yet AFAIK. [20:46:55] AaronSchulz: speaking of which, https://meta.wikimedia.org/wiki/Test_wikis [20:47:03] hashar: if we can guarantee an easy update for AFTv5 on beta labs, we are done with prototype. 
[20:47:06] Reedy: ok [20:47:18] <^demon> Reedy: $wgDBname will be "wikidata." [20:47:18] chrismcmahon: so we still need prototype :-] [20:47:18] (there's dozens of prototype wikis) [20:47:18] <^demon> Wanna go ahead and create? [20:47:27] chrismcmahon: I got a weird permission error with the beta autoupdater. [20:47:36] ^demon: makes sense [20:47:36] <^demon> AaronSchulz: Can you https://gerrit.wikimedia.org/r/#/c/30064/? [20:47:38] chrismcmahon: I somehow found the root cause, need to find out a workaround. [20:47:40] one database [20:47:41] ^demon: will wikidata have any uploads? [20:47:49] Not wikidatawiki [20:47:57] <^demon> Reedy: No, no need afaict. [20:48:05] so help icons and anything go to commons I guess [20:48:05] AaronSchulz: please keep the prototype stuff for now on :/ AFTv5 will still need it till beta is able to self update properly. [20:48:05] heh [20:48:09] AaronSchulz: i really doubt it [20:48:20] hashar: I'm not killing, I'm saying we should think about it [20:48:20] * aude doesn't make decisions though [20:48:24] <^demon> Reedy: Or it could be wikidatawiki, and it would be special-cased like commons. Either way it gets special-cased. [20:48:35] ^demon: well it better not, unless you want to change rewrite.py :) [20:48:35] AaronSchulz: we are very close to kill it :-] [20:48:37] I don't think we need the extra suffix [20:48:39] it was just amusing ;) [20:48:41] hashar: I'm really hoping we can deploy to beta using Jenkins builds now that gallium, formey etc are upgraded [20:48:42] <^demon> AaronSchulz: Btw, how does MWMultiversion handle mediawiki.org? I couldn't find any special casing. [20:49:12] break MWMultiversion and everything breaks :o [20:49:29] interesting how it works [20:49:52] ^demon: site=wikipedia, lang=mediawiki, so it's mediawikiwiki [20:50:07] that kind of pattern already works [20:50:17] for uploads [20:50:26] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 0 processes with command name squid [20:50:40] let me see how it works for page views [20:50:55] <^demon> AaronSchulz: So if we name it wikidatawiki, it should *just work* without the special-casing I added? [20:51:11] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:51:11] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:52:04] ^demon: wikidatawiki would certainly be easier to handle than wikidata, yes [20:52:13] New patchset: Reedy; "Wikidata config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30068 [20:52:16] chrismcmahon: was the request to apply a specific patchset before it get merged ? [20:52:59] <^demon> Reedy: Go ahead and run addWiki. [20:52:59] aude: Are we likely to have other non wikidata NS on the wiki? Just wondering if we should start from way NS 120? [20:53:00] <^demon> $wgDBname = "wikidatawiki" [20:53:09] ^demon: but we won't be able to see it ;) [20:53:15] <^demon> Yeah, let's pester ops. 
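The mediawiki.org case AaronSchulz describes is just the generic lang-plus-site naming rule, with 'wikipedia' mapping to the bare 'wiki' suffix. Illustratively (not the literal multiversion code):

    // Sketch: derive a database name from ( $lang, $site ).
    // lang 'mediawiki' + site 'wikipedia' => 'mediawikiwiki';
    // lang 'en' + site 'wikisource' => 'enwikisource'.
    $suffix = ( $site === 'wikipedia' ) ? 'wiki' : $site;
    $dbName = str_replace( '-', '_', $lang ) . $suffix;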
[20:53:47] Change abandoned: Demon; "Shouldn't be necessary if we name the database "wikidatawiki"" [operations/mediawiki-multiversion] (master) - https://gerrit.wikimedia.org/r/30064 [20:54:19] mwscript addWiki.php --wiki=aawiki en wikidata wikidatawiki wikidata.org [20:54:24] Reedy: not that i know of, but i could be forgetting something obvious or something i can't foresee now [20:54:37] aude: I guess it wouldn't hurt either way :) [20:54:52] Any advances on mwscript addWiki.php --wiki=aawiki en wikidata wikidatawiki wikidata.org [20:55:06] <^demon> AaronSchulz: ^? [20:55:12] elseif ( preg_match( "/^\/usr\/local\/apache\/(?:htdocs|common\/docroot)\/([a-z0-9\-_]*)$ [20:55:14] /", $docRoot, $matches ) ) { [20:55:23] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [20:55:24] I guess mw.org must use that case right? [20:55:31] <^demon> I believe so. [20:56:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.357 seconds [20:56:10] <^demon> And actually, wikidata won't be a suffix anymore in wgConf. [20:56:12] <^demon> I can remove that too. [20:56:43] so yeah, users would be directed to docroot/wikidata and it could fall under that case I suppose [20:57:03] mw.org is probably a good example indeed :D [20:57:27] Ryan_Lane: is secure almost nuked yet? [20:57:47] AaronSchulz: someone just needs to jfdi [20:57:50] * Reedy smiles at paravoid [20:57:58] New patchset: Demon; "More wikidata fixes" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30069 [20:58:13] Error: 1007 Can't create database 'wikidatawiki'; database exists (10.0.6.44) [20:58:13] Seriously!? [20:58:38] wtf [20:58:51] <^demon> Maybe I did something earlier. [20:59:00] * AaronSchulz hands Reedy the swear jar [20:59:00] http://noc.wikimedia.org/conf/all.dblist :P [20:59:15] <^demon> hoo: I thought I had deleted it though. [20:59:23] DROB DB zOmmgs [20:59:30] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:59:30] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:59:31] Did you delete the ES stuff too? [20:59:35] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30069 [20:59:35] :o [20:59:45] ^demon: maybe you accidentally dropped some other wiki? [21:00:00] oO [21:00:03] lol [21:00:05] preilly, commented out WikiMiniAtlas for now until we know why it degrades site performance [21:00:20] PROBLEM - check_squid on payments3 is CRITICAL: PROCS CRITICAL: 1 process with command name squid [21:00:23] New patchset: Brion VIBBER; "Disable $wgMFEnableResourceLoader since it's broken currently" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30070 [21:00:32] AaronSchulz: I think someone just needs to apply the redirects [21:00:39] to the new https cluster [21:00:47] !log demon synchronized wmf-config/wgConf.php 'Syncing I39c9693a' [21:01:03] Logged the message, Master [21:01:28] !log demon synchronized wmf-config/CommonSettings.php 'Syncing I39c9693a' [21:01:42] Logged the message, Master [21:03:02] Ryan_Lane: I'd like to find a MLP picture of a unicorn with a bunch of nukes [21:03:12] Eloquence: what's the problem with WMA? 
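Applied by hand, the docroot fallback AaronSchulz pastes above is the case a wikidata docroot would hit; a quick sketch of what it matches:

    // Sketch: the pasted regex extracts the site name from the document root,
    // so /usr/local/apache/common/docroot/wikidata captures 'wikidata'.
    $docRoot = '/usr/local/apache/common/docroot/wikidata';
    if ( preg_match(
        "/^\/usr\/local\/apache\/(?:htdocs|common\/docroot)\/([a-z0-9\-_]*)$/",
        $docRoot, $matches
    ) ) {
        $site = $matches[1]; // 'wikidata'
    }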
[21:03:16] New patchset: Reedy; "Initial config for wikidatawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30073 [21:03:56] Eloquence: Dispenser has been requesting/testing a few millions thumbnails (finding a dozen bugs) [21:04:01] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30073 [21:05:15] Nemo_bis, it uses an mw.loader directive that pulls from meta.wikimedia.org, which appears to result in problematic data-center roundtripping based on the request analysis we did. [21:05:47] Eloquence: yes I saw your edits now [21:05:48] thanks [21:05:50] I'm aware of Dispenser's analysis, it's very helpful [21:06:55] Eloquence: we have plenty of wikis loading scripts from Meta, so clues on how to do this better will be very helpful [21:08:02] New patchset: Pgehres; "Adding French territories to priority countries list for LandingCheck." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30075 [21:09:13] AaronSchulz: :D [21:09:32] Ryan_Lane: I bet at least something similar exists [21:09:42] but I don't want to be seen searching for it at the office [21:10:11] Eloquence: okay thanks [21:10:27] New patchset: Reedy; "Wikidata config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30068 [21:10:28] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30075 [21:10:36] Ryan_Lane: it would be perfect for when prototype dies [21:10:49] prototype needs to die [21:10:51] badly [21:11:12] so does internproxy [21:11:12] wtf is internproxy? [21:11:12] for the analinterns [21:11:45] * AaronSchulz had to reread that a few times [21:12:14] ;-] [21:14:02] https://gerrit.wikimedia.org/r/#/c/30056/ <- Can someone please review, deploy and graceful that for wikidata stuffs? [21:15:09] New patchset: Reedy; "Wikidata config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30068 [21:19:04] New review: Swalling; "@Siebrand" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30009 [21:19:14] Reedy: looking [21:19:51] New patchset: Reedy; "Wikidata config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30068 [21:21:07] RECOVERY - LVS on payments.wikimedia.org is OK: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error [21:22:03] RECOVERY - check_squid on payments3 is OK: OK [21:22:49] New review: Aude; "Looks to me like this should work, even if not 100% perfect. Perfection can come later and I think i..." [operations/apache-config] (master) C: 1; - https://gerrit.wikimedia.org/r/30056 [21:24:54] Reedy: when Timo says "(same)", it usually means "as above" [21:25:10] ^ indeed [21:25:19] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:25:25] (or below, if I type asynchronously) [21:25:41] :) [21:25:41] why did the parallel chicken cross the road? [21:25:45] below makes sense [21:25:56] well, above does [21:25:56] after you've made a comment [21:26:21] New patchset: Reedy; "Wikidata config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30068 [21:27:00] Eloquence: commented out WikiMiniAtlas where? [21:27:47] Krinkle: on enwiki [21:27:48] common.js [21:27:50] Why? 
[21:28:18] http://en.wikipedia.org/wiki/MediaWiki:Common.js [21:28:18] https://en.wikipedia.org/w/index.php?title=MediaWiki:Common.js&diff=prev&oldid=519826982 [21:28:18] some performance issue with mw.loader from meta wiki [21:28:18] https://en.wikipedia.org/w/index.php?title=User_talk:Dschwen&diff=prev&oldid=519827444 [21:28:22] * aude cries :( [21:28:24] Why would it matter which wiki? [21:28:34] no idea [21:28:35] New review: Hashar; "This change is still pending some logic simplifications per Ryan comment on patchset 6 https://gerri..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/8120 [21:28:53] last I checked load balancing is done based on geo of user, not of what wiki it is. [21:29:29] did commenting it out help? I'd like some data on this. [21:29:35] I doubt it. [21:29:39] Krinkle, we're doing some request analysis based on monitoring data collected via Nimsoft monitoring stations and are finding that the connect times for the meta.wikimedia.org request are often several seconds long. Patrick and Leslie are going to look a bit more at this this PM. [21:29:50] Reedy: almost 100% sure for phase 1, we don't need the property or query namespaces [21:30:05] definitely not query and 99% sure not property [21:30:14] lol [21:30:21] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:30:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:30:38] Eloquence: ok, strange. Only when referring from en.wiki or meta slow in general (i.e. direct page views on meta)? [21:30:45] i know those are in the example settings and not sure it matters, but probably best not to have them if we don't use the namespaces yet [21:30:59] Eloquence: anyway, that needs to be taken care of soon, as cross-wiki gadgets are coming up not so far from now [21:31:14] there is at leas half a dozen more of these cross-wiki loads in the wild, which should be fine. [21:31:20] i can double check with denny tomorrow [21:31:40] Krinkle, the request data we have right now is based on en.wp article views loaded in real browsers, not individual GETs to that object. [21:31:40] some to en.wiki, some to meta.wiki others to medawiki.org. Usually grabbing the central version from one of those. [21:31:56] yeah I know [21:32:26] Krinkle: I'll loop you in once Leslie and I have a change to talk about it a bit more [21:32:47] preilly: k, please do :) These should be caught by squid for anonymous users [21:33:03] even a cache miss will be fairly cheap as it is action=raw with high smaxage [21:33:34] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:33:36] Krinkle: yeah that is how it should be working for sure [21:33:48] Krinkle: but that doesn't seem to be the case in the real world right now [21:33:58] preilly: ok [21:34:17] preilly: what kind of 'loop' can I expect btw. So I can put a watch on it. [21:34:29] bugzilla, rt, mailling list, .. [21:35:15] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:35:15] Krinkle, do you want to help debug this? [21:35:17] New review: Siebrand; "That's wonderful, Steven. Thanks for getting it!" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30009 [21:35:21] Eloquence: If I can, sure. [21:35:33] notpeter, can you make Krinkle (Timo) a Watchmouse account? [21:35:50] ori-l: btw, I just got a PostEdit notification earlier today when working in CentralNotice today. 
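The cross-wiki loads under discussion are the long-standing Common.js pattern of pulling a central copy of a gadget from Meta over action=raw; roughly like this, using the WikiMiniAtlas URL Eloquence pastes a little further down:

    // The usual shape of a Common.js cross-wiki load. A cache miss is still
    // fairly cheap because action=raw is served with a high s-maxage, as
    // Krinkle notes above.
    mw.loader.load(
        'http://meta.wikimedia.org/w/index.php?title=MediaWiki:Wikiminiatlas.js' +
        '&action=raw&ctype=text/javascript&smaxage=21600&maxage=86400',
        'text/javascript'
    );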
[21:35:51] sure [21:36:01] Krinkle: via email or IRC [21:36:52] guys, can I deploy a quick configuration change? [21:37:07] Krinkle, prepare for wanting to stab your own face [21:37:19] as you get exposure to the wonderful watchmouse UI [21:37:19] eh.. [21:37:25] MaxSem: give us a minute, please [21:37:42] Krinkle: sorry, can you elaborate? are you reporting a bug? [21:37:50] ori-l: Looks like it should be guarded against the scenario of making an edit without being on action=edit (e.g. a script using the API, or Special pages like in CentralNotice that make an edit internally to store data) [21:38:14] ori-l: probably just needs an additional if check to see what $context>Action::getActionName is (or wherever that thing is) [21:38:27] Krinkle: it's *so bad* [21:38:31] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:38:31] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:38:48] notpeter: their frond-end is pretty modern though [21:38:53] (status.wm.o) [21:38:57] Krinkle: i thought the fact that the cookie is set from a BeforePageRedirect hook handler would ensure that it didn't affect API calls [21:39:09] here's an example request for WikiMiniAtlas: http://meta.wikimedia.org/w/index.php?title=MediaWiki:Wikiminiatlas.js&action=raw&ctype=text/javascript&smaxage=21600&maxage=86400 [21:39:39] notpeter: oww. cleartext password e-mail ftw [21:39:39] the connect time on that is 5-7 secs for a significant number of pageviews [21:39:55] I expected a "click here to set a password" link [21:39:59] instead I got my current password sent in plain text [21:40:10] headers: http://pastebin.com/MyS6Wvxw [21:40:19] for an example request to that URL [21:40:26] from Europe [21:40:41] Krinkle: hurray..... [21:40:49] anyway, changed the default and logged in [21:40:56] notpeter: where do I go from here? [21:41:14] lots of buttons to press ;-) [21:41:33] Krinkle: probably head to logs [21:41:50] k, I'm on logviewer [21:41:55] Krinkle, there's multiple monitors setup in watchmouse including the status monitors. K4-713 has been running a monitor called "banner load testing" [21:42:03] that loads an en.wiki article [21:42:33] here's an example probe from that monitor: https://dashboard.cloudmonitor.nimsoft.com/en/rootcause.php?mid=41159&vrid=271418&vlogid=12744824 [21:42:35] there's alos now banner load testing FF [21:42:41] which is firefox as opposed to chrome [21:42:49] right [21:43:20] The ones from India can't seem to even connect to en.wikipedia.org? [21:43:35] hm.. not all, seems random [21:44:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.061 seconds [21:45:11] notpeter: please let me know once I have an account [21:45:23] notpeter: thanks [21:45:23] Eloquence: ok, on the link you sent there, I see an error from WikiMiniAtlas indeed. [21:45:24] a dom exception [21:45:39] New patchset: Reedy; "Wikidata config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30068 [21:45:44] Krinkle: that is with local storage right? 
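The guard Krinkle suggests for PostEdit amounts to checking the action name before the cookie is set. A sketch, assuming the cookie really is set from the BeforePageRedirect handler as ori-l says (not the actual extension code):

    // Sketch only: bail out unless the redirect follows a real edit-form submit,
    // so API writes and internal saves (e.g. from CentralNotice) never set the cookie.
    public static function onBeforePageRedirect( OutputPage $out, &$redirect, &$code ) {
        if ( Action::getActionName( $out->getContext() ) !== 'edit' ) {
            return true;
        }
        // ... existing cookie-setting logic ...
        return true;
    }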
[21:45:48] yeah [21:45:55] !log mlitn synchronized php-1.21wmf2/extensions/ArticleFeedbackv5 'desc' [21:45:56] note the old chrome version as well [21:46:03] so it's possible that this only happens with very ancient chrome versions [21:46:04] Logged the message, Master [21:46:10] !log mlitn synchronized php-1.21wmf2/extensions/AbuseFilter 'desc' [21:46:16] but I doubt that this would cause the connect time on those meta.wiki resources to shoot up [21:46:25] Logged the message, Master [21:46:29] And Wiki.png takes 6 seconds to load? [21:46:30] Reedy: d before m? [21:46:41] ? [21:46:46] https://gerrit.wikimedia.org/r/#/c/30068/6/special.dblist [21:46:46] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:46:46] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:46:48] Or am I interpreting that waterfall chart wrong? [21:46:56] well, it is a waterfall chart [21:47:01] paravoid: what is with those swift notices? [21:47:11] AaronSchulz: I just added them all to the end [21:47:13] so that request may just be blocked on others [21:47:14] I'll alphasort them laster [21:47:26] preilly: ok, good to go [21:47:34] you better, or the ocd mob will get you [21:49:50] Reedy: can rename $baseNs to $wmfWikiBaseNs or something? [21:50:03] Krinkle, I suspect the India results are watchmouse artifacts. that's why we need additional monitoring services to compare [21:50:04] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:09] Feel free :p [21:50:22] I feel free to tell you to rename it, yes [21:50:49] note also that watchmouse overrepresents IPv6 massively [21:51:03] about half their monitoring stations are IPv6 [21:54:16] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [21:55:01] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:55:01] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:55:11] !log mlitn synchronized wmf-config 'desc' [21:55:16] !log olivneh synchronized php-1.21wmf2/extensions/PostEdit [21:55:20] Logged the message, Master [21:55:27] !log mlitn synchronized php-1.21wmf2/extensions/ArticleFeedbackv5 'desc' [21:55:32] Logged the message, Master [21:55:44] !log mlitn synchronized php-1.21wmf2/extensions/AbuseFilter 'desc' [21:55:44] Logged the message, Master [21:55:58] Logged the message, Master [21:59:29] AaronSchulz: Reedy it could probably be $wgWBBaseNS ? or really don't care what it is :) [21:59:43] It doesn't need to be $wg anything really [21:59:53] It's not going to be used by anything else [21:59:59] we could just unset it at the end of the code block [22:00:00] ok :) [22:00:01] Reedy: you could just call unset() then ;) [22:00:24] * AaronSchulz is jinxed [22:01:19] PROBLEM - Puppet freshness on sq76 is CRITICAL: Puppet has not run in the last 10 hours [22:02:29] Reedy: jeroen suggests we not have the property and query namespaces enabled just yet (for phase 1) [22:02:47] * aude thinks it's trivial to enable them later [22:03:02] It was in Dennys config/setup email [22:03:09] really? 
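What Reedy and AaronSchulz settle on above is a throwaway local in the config file rather than a real $wg* global, cleared with unset() once it has done its job. For example (names and numbers illustrative, not the merged change):

    // Sketch: a scratch variable in the wmf-config style, unset when the block ends.
    $wmfWikiBaseNs = 120; // hypothetical base namespace number
    $wgExtraNamespaces[$wmfWikiBaseNs] = 'Item';
    $wgExtraNamespaces[$wmfWikiBaseNs + 1] = 'Item_talk';
    unset( $wmfWikiBaseNs );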
[22:03:16] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:03:38] i think it was an oversight but he gets the final say [22:03:50] I can forward you the email if you want to see it [22:03:54] ok [22:04:08] * aude really doesn't think he thought of that [22:04:27] just copied our example settings i think [22:04:33] heh [22:04:37] forwarded to your gmail [22:04:49] ok [22:06:33] Reedy: if he says so, ok [22:06:51] Either way, we can add/remove/fix/whatever it later [22:07:02] for the beta wiki, but i suppose it's trivial to remove them [22:07:02] yeah [22:07:24] i sent him an email asking aobut it but it's rather late here to expect an answer now [22:14:00] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:14:16] New patchset: Ori.livneh; "Deploy EventLogging to en and test" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30087 [22:14:58] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30087 [22:15:28] Need the apache config updated first [22:16:39] New patchset: Ori.livneh; "Add wgEventLoggingBaseUri config var" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30088 [22:17:01] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:17:10] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:17:18] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:17:23] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30088 [22:17:25] the PHP libxml patch is pretty much done, I just need to do some more testing [22:18:24] I have to try to think of tests which would reliably break if I had done something wrong [22:18:56] !log olivneh synchronized php-1.21wmf2/extensions/EventLogging [22:19:10] Logged the message, Master [22:21:44] !log aaron synchronized php-1.21wmf2/includes/Revision.php 'temp query debug logging' [22:21:59] Logged the message, Master [22:22:16] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:22:19] !log aaron synchronized php-1.21wmf2/includes/Revision.php [22:22:33] Logged the message, Master [22:23:12] !log aaron synchronized php-1.21wmf2/includes/Revision.php 'tweak wfGetCaller param a level' [22:23:25] Logged the message, Master [22:24:06] ori-l: Would you mind making a follow-up commit to https://gerrit.wikimedia.org/r/#/c/29894/ that passes the jshint configuration you added? [22:24:07] !log aaron synchronized php-1.21wmf2/includes/Revision.php 'done' [22:24:21] Logged the message, Master [22:24:42] Ah, so it's LuceneResult::__construct and ApiOpenSearchXml::getExtract to a lesser extent spamming the master :) [22:24:56] ori-l: There is also "modules/dataModels.js: line 5, col 27, Extra comma." that will make IE crash. [22:25:15] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:25:42] Krinkle: good catch, thanks. i'll look after both, sec. [22:28:11] Krinkle: do you have a suggestion regarding jshint failing the 'enum' property? [22:28:21] (not sure if you saw my comment) [22:28:33] I didn't see it [22:28:38] ori-l: It depends on whether it is a reserved word in ES3 [22:29:11] ori-l: it isn't in ES5. 
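The "Extra comma" warning Krinkle quotes is a trailing comma in an object literal: modern engines ignore it, but old IE (ES3-era JScript) treats it as a syntax error. Schematically, not the actual dataModels.js contents:

    // Fine in ES5 browsers, a hard syntax error in IE 8 and older:
    var model = {
        id: 5,
        title: 'Example',   // <-- trailing comma after the last property
    };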
And as of ES5 javascript allows use of reserved words in variable names (which make sense, hitting these random words is super annoying) [22:29:36] ori-l: however in ES3 the script will fail if a reserved word is there, so if that is the case, you'll have to rename it. [22:29:56] it is; but i'm using it as an object attribute, not an identifier. i'd drop its use altogether, but i'm trying to conform to the json schema draft spec. i could silence jshint with ES5:true, but that would be too broad. [22:29:59] ES3 is bascically IE6-9, Firefox < 4 and older Opera. [22:30:10] !log aaron synchronized php-1.21wmf2/includes/Revision.php 'logging one more thing.' [22:30:24] Logged the message, Master [22:30:45] ori-l: that wouldn't just be too broad, it would ignore the fact that the script doesn't work IE6-8, Firefox < 4 and older Opera. [22:30:52] ok thats weird [22:30:56] !log aaron synchronized php-1.21wmf2/includes/Revision.php 'done' [22:31:02] ori-l: if you want to use that name, you'll have to quote it [22:31:10] Logged the message, Master [22:31:11] accessing the property: foo['enum'] [22:31:16] in object literal: { 'enum' : .. } [22:31:41] Krinkle: really? i thought "Reserved Words actually only apply to Identifiers (vs. IdentifierNames)" (https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Reserved_Words) [22:31:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.064 seconds [22:32:00] though it's not clear from MDN article if they're referring to ES5 there. [22:32:12] ori-l: Look at the example at the bottom of that article [22:32:26] ori-l: in ES5 it is allowed in (almost) all places. [22:33:03] I admit, its a bit messy on the MDN article. [22:33:15] Let me see if an can pull up the es3 spec [22:33:29] (too bad it doesn't have a nice human readable version like the es5 spec: es5.github.com )_ [22:33:30] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:34:25] Krinkle: curious to you know the answer, but i think i'm going to quote the property regardless, just as another way of flagging the issue. [22:34:40] er, curious to know the answer, i meant. [22:36:57] !log aaron synchronized php-1.21wmf2/extensions/MWSearch/MWSearch_body.php 'use slave DB for revision queries.' [22:37:11] Logged the message, Master [22:39:11] ori-l: also, I'd recommend droppping the mw.foo = mw.foo || {}; pattern. It can be useful sometimes, but it is out of place in this case. [22:39:17] There are 2 files, one extends the other. [22:39:26] they are mis-ordered in the definition file [22:39:40] * ori-l takes a look. [22:39:40] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [22:39:40] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [22:39:40] that's probably why you added the line because it didn't work first [22:39:46] Krinkle: no, automatic habit [22:39:56] aii, dangerous habbit [22:40:02] it masks errors [22:40:04] !log aaron synchronized php-1.21wmf2/extensions/OpenSearchXml/ApiOpenSearchXml.php 'use slave DB for revision queries.' [22:40:18] Logged the message, Master [22:40:35] dataModels is useless without the rest of the module, and it if it to be used outside the module, then it can be namespaces differently. [22:40:42] namespaced* [22:41:18] ori-l: btw, is EventLogging only used in modern browsers? 
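Quoting the property, as agreed above, keeps the script parseable by ES3 engines; in context the two forms Krinkle pasted look like this (illustrative):

    // ES3-safe: 'enum' is a reserved word, so quote it in the literal and use
    // bracket access rather than dot notation. ES5 engines accept either form.
    var field = { 'enum': [ 'draft', 'published' ] };
    var allowed = field['enum'];   // not field.enum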
[22:41:31] I also see JSON.stringify which obviously doesn't exist yet in older browsers (new in HTML5) [22:41:45] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:42:07] binasher: enwiki master should be in euphoria now ;) [22:42:16] Krinkle: that's another excellent point :). [22:42:22] ok [22:42:33] binasher: https://graphite.wikimedia.org/dashboard/temporary-18 [22:42:54] ori-l: actually, I was about to start on another me-style clean up commit. But maybe its best if I let you do it. I don't want to be intrusive. [22:43:08] I'll send you my commit message I formulated so far for what I was about to do. [22:43:25] I thought changes to core made it in 1.21wmf2, but I guess they were after the branch point [22:43:32] use your own judgement of course, they're raw notes :) [22:43:46] Krinkle, that'd be awesome. thanks! [22:45:03] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:47:29] New patchset: CSteipp; "Enable InstantCommons on Wikivoyage beta" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30092 [22:50:00] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:53:19] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:55:19] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:56:36] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [22:58:15] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:58:38] !log olivneh synchronized php-1.21wmf2/extensions/E3Experiments [22:58:53] Logged the message, Master [23:00:52] New review: Kaldari; "Pending update of a couple of the logo graphics (wowiki for example). Nemo says these will be ready ..." [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/23985 [23:01:33] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:03:26] binasher, did https://gerrit.wikimedia.org/r/#/c/29883/ get deployed? [23:03:30] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:03:53] MaxSem: yes [23:05:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:08:05] binasher: http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=MySQL%20pmtpa&h=db63.pmtpa.wmnet&v=11872&m=mysql_table_locks_immediate&r=hour&z=default&jr=&js=&st=1351206309&vl=count&ti=mysql_table_locks_immediate&z=large [23:08:11] heh :) [23:09:00] binasher, hmm. I see old behaviour [23:09:20] AaronSchulz: :D [23:09:48] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:10:59] about to run scap [23:11:14] binasher, the manifest doesn't require automatic service restart on redirector changes - maybe, that's the problem? [23:11:45] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:38] MaxSem: try now? 
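On Krinkle's JSON.stringify point near the top of this exchange: it is an ES5 addition, so it is missing from the same ES3-era browsers (old IE, Firefox < 4), and code relying on it needs a feature check or a shim. A minimal sketch, not what EventLogging actually ships:

    // Sketch: guard the call so old browsers fail quietly instead of throwing.
    function serializeEvent( event ) {
        if ( typeof JSON === 'undefined' || typeof JSON.stringify !== 'function' ) {
            return null; // or load a json2.js shim before calling this
        }
        return JSON.stringify( event );
    }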
[23:13:31] binasher, now it works, thanks [23:14:03] running scap [23:16:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.986 seconds [23:18:03] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:20:01] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:21:01] !log kaldari Started syncing Wikimedia installation... : [23:21:10] Logged the message, Master [23:23:18] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:26:18] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:28:20] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:31:20] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:31:36] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:36:30] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:41:27] RECOVERY - MySQL Slave Delay on es1001 is OK: OK replication delay NULL seconds [23:42:48] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:44:45] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:46:15] PROBLEM - MySQL Slave Delay on es1001 is CRITICAL: CRIT replication delay 2963988 seconds [23:48:03] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:49:02] !log installing base OS on sq82 [23:49:11] Logged the message, Master [23:51:06] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:51:06] RECOVERY - Host sq82 is UP: PING OK - Packet loss = 0%, RTA = 2.55 ms [23:53:00] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:55:39] !log kaldari Finished syncing Wikimedia installation... : [23:55:54] Logged the message, Master [23:56:00] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:56:18] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds