[00:00:42] !log running patch-job_token.sql migration against all wikis
[00:00:55] Logged the message, Master
[00:03:40] !log completed patch-job_token.sql migrations
[00:03:43] AaronSchulz: ^^
[00:03:51] Logged the message, Master
[00:08:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.003 seconds
[00:09:59] \o/
[00:16:24] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[00:42:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:44:48] error: insufficient permission for adding an object to repository database .git/objects
[00:44:50] * Reedy looks around
[00:45:10] fatal: failed to write object
[00:45:10] fatal: unpack-objects failed
[00:45:14] Really useful error, thanks git
[00:46:50] Did someone screw up with a bad umask?
[00:46:53] * RoanKattouw looks at spage
[00:46:57] lol
[00:46:59] Possibly
[00:47:11] I thought mutante had committed something to the puppet repo to fix that
[00:47:23] Unfortunately, I've no idea where..
[00:48:29] !log Created ~spage/.bashrc with 'umask 0002'
[00:48:41] Logged the message, Mr. Obvious
[00:48:44] ~log I mean 002
[00:48:47] !log I mean 002
[00:48:58] Logged the message, Mr. Obvious
[00:51:43] Reedy: Which git command did you try to run and where?
[00:52:06] git pull in /home/wikipedia/common/php-1.21wmf2
[00:53:15] OK
[00:53:46] Aha!
[00:53:51] I was blaming spage, but it's maxsem
[00:54:27] But .... he has umask 002 in his .bashrc
[00:56:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.584 seconds
[00:57:34] !log chmod -R g+w /home/wikipedia/common/php-1.21wmf2/.git
[00:57:37] Reedy: Try now
[00:57:45] Logged the message, Mr. Obvious
[00:58:01] cheers
[00:58:06] Gotta go now
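(Editor's sketch: the usual full repair for a group-shared checkout that throws "insufficient permission for adding an object" is to make the .git tree group-owned and group-writable and tell git to keep it that way, on top of the per-user umask. The `wikidev` group name below is an assumption; the path is the one from the log.)

```bash
# One-time repair of a shared checkout; group name is an assumption.
cd /home/wikipedia/common/php-1.21wmf2
sudo chgrp -R wikidev .git                     # objects/refs owned by the shared group
sudo chmod -R g+w .git                         # group-writable, like the chmod above
sudo find .git -type d -exec chmod g+s {} +    # new files inherit the group
git config core.sharedRepository group         # git creates new objects group-writable

# Each committer also needs a permissive umask, e.g. in ~/.bashrc:
umask 002
```
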
[01:04:24] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[01:10:24] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[01:29:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:40:15] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 277 seconds
[01:44:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.033 seconds
[01:46:24] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds
[02:17:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:28:10] !log LocalisationUpdate completed (1.21wmf1) at Tue Oct 16 02:28:10 UTC 2012
[02:28:29] Logged the message, Master
[02:31:24] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[02:33:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.020 seconds
[02:53:28] !log LocalisationUpdate completed (1.21wmf2) at Tue Oct 16 02:53:28 UTC 2012
[02:53:40] Logged the message, Master
[03:36:12] RECOVERY - Puppet freshness on lvs1001 is OK: puppet ran at Tue Oct 16 03:35:45 UTC 2012
[03:53:27] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[03:53:27] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[03:58:24] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours
[04:17:27] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[06:44:35] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[07:21:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:23:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.777 seconds
[07:55:32] PROBLEM - Puppet freshness on srv220 is CRITICAL: Puppet has not run in the last 10 hours
[07:55:32] PROBLEM - Puppet freshness on sq42 is CRITICAL: Puppet has not run in the last 10 hours
[07:56:35] PROBLEM - Puppet freshness on srv297 is CRITICAL: Puppet has not run in the last 10 hours
[08:00:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:06:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.838 seconds
[08:17:35] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[08:17:35] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[08:39:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:52:32] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours
[08:55:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.553 seconds
[09:00:05] Change merged: Nikerabbit; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/27962
[09:06:30] !log nikerabbit synchronized wmf-config/InitialiseSettings.php '(bug 41015) Enable Narayam on hi.wiktionary'
[09:06:43] Logged the message, Master
[09:13:18] hello
[09:29:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:44:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.229 seconds
[10:17:35] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[10:18:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:34:05] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.039 seconds
[10:58:06] New patchset: Hashar; "Gerrit notifications for Wikidata to their channel" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26042
[10:59:12] New patchset: Hashar; "cleanup/refactor gerrit logging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8120
[11:00:22] New patchset: Hashar; "Gerrit hook tests extended coverage" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26041
[11:01:27] New review: Hashar; "Fixed french typo." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26041
[11:01:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/26042
[11:01:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8120
[11:01:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/26041
[11:05:35] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[11:06:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:11:35] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[11:22:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.034 seconds
[11:55:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:10:03] New patchset: J; "add nfs::apache::labs to mediawiki::videoscaler" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28208
[12:10:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds
[12:11:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28208
[12:27:39] Change abandoned: J; "will merge with new patch introducing role::applicationserver::videoscaler" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28008
[12:31:54] !log reedy synchronized php-1.21wmf2/includes/
[12:32:06] Logged the message, Master
[12:32:35] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[12:33:15] New patchset: J; "add role::applicationserver::videoscaler" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28208
[12:34:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28208
[12:43:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:48:52] mark: still around? I got a varnish related change for you. it simply moves a gcc parameter ;) https://gerrit.wikimedia.org/r/#/c/24797/
[12:48:58] (tested on labs)
[12:52:18] New patchset: J; "add role::applicationserver::videoscaler" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28208
[12:52:41] sorry hashar, can't approve that without testing
[12:52:54] mark: fully agree ;-]
[12:53:08] you don't want to have all varnish to die suddenly
[12:53:10] no
[12:53:18] :-(
[12:53:22] it's already weird that that would fix it
[12:53:22] New review: Mark Bergsma; "If this fixes it, it means that parameter ordering matters, which means that I can't assume it still..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/24797
[12:53:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28208
[12:53:25] so therefore we need to know why
[12:54:35] I have no idea beside the error reported in commit message
[12:55:00] I simply reproduced the command line generated by the init script and tweaked it until I got varnish to compile.
[12:55:13] definitely weird :/
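(Editor's sketch: the "reproduce the command line and tweak it" approach above can be done standalone, because varnishd can compile a VCL file without starting the daemon. The VCL path and the cc_command value below are assumptions for illustration.)

```bash
# -C compiles the VCL to C, prints it on success, and exits non-zero on failure,
# so you can bisect compiler flags without touching the running varnish.
varnishd -C -f /etc/varnish/mobile-backend.vcl \
    -p 'cc_command=exec cc -fpic -shared -Wl,-x -o %o %s' \
    && echo "VCL compiles with this flag ordering"
```
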
[12:58:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.022 seconds
[13:00:52] paravoid: if you are around, I made a change to git::clone to checkout a specific sha1 version https://gerrit.wikimedia.org/r/#/c/27175/ I have tested it on labs ;)
[13:12:12] New patchset: ArielGlenn; "fix up swift drive audit so it doesn't barf on bad log lines" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28210
[13:13:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28210
[13:31:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:46:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds
[13:54:31] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[13:54:31] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[13:59:14] 16 09:51:50 < j^> are some imagescalers already updated to precise?
[13:59:27] i think so. talk to notpeter or paravoid
[13:59:35] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours
[13:59:50] notpeter is working on it
[14:00:05] last I heard he's blocked by someone porting some custom patches of ours to librsvg
[14:00:24] to the newer version I mean
[14:00:31] otherwise I think he's very close
[14:05:05] New patchset: Mark Bergsma; "First attempt to support passed (high) range requests on Varnish backends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28213
[14:06:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28213
[14:09:41] New patchset: Mark Bergsma; "First attempt to support passed (high) range requests on Varnish backends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28213
[14:10:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28213
[14:18:29] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[14:19:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:22:15] New patchset: Hashar; "module androidsdk (WORK IN PROGRESS)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28216
[14:23:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28216
[14:23:26] paravoid: can you possibly get a look at a change I made to git::clone to make it support cloning at an arbitrary sha1 ? https://gerrit.wikimedia.org/r/#/c/27175/
[14:23:51] paravoid: it only supports being given a branch such as "master"
[14:24:20] reset --hard?
[14:24:24] no, that's evil
[14:24:28] it will override local changes
[14:24:34] i.e. lose data
[14:24:47] why you want a specific commit anyway?
[14:25:28] cause you don't want to install a third party "master" version
[14:25:41] which can be just any code ;-D
[14:25:51] just like we used to do svn up -r somerevision
[14:26:09] you don't want to install a third party, period
[14:26:09] or whenever someone merges a change in "master" puppet will end up deploying it by itself :/
[14:26:51] I wouldn't trust a third-party git server in general
[14:27:00] by third party, I meant a "non op" repository
[14:27:08] ah
[14:27:17] such as software written by another group like platform engineering
[14:27:17] or features
[14:27:23] a typical example would be the wikibugs perl script
[14:28:03] that should be packaged, not deployed in such evil ways
[14:28:13] example: https://gerrit.wikimedia.org/r/#/c/26325/10/manifests/misc/wikibugs.pp,unified
[14:28:16] especially not such hacky ways
[14:28:41] so I will open a RT to get all that stuff packaged ;-D
[14:29:01] sure
[14:29:06] doesn't mean we're necessarily gonna do that
[14:29:09] which will never get done cause nobody cares about wikibugs hehe
[14:29:16] you can do it
[14:29:16] if you care
[14:29:22] I am not doing any more packaging
[14:29:35] it ends up being a huge time sink to me cause I lack basic knowledge about debian packages
[14:29:49] I always end up spending a full day to build a package.
[14:29:50] :-/
[14:30:54] unless you want me to do packaging full-time, I don't have the time to build packages for everyone...
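(Editor's sketch: for context, the basic Debian packaging loop being discussed is short once the debian/ boilerplate exists. The `wikibugs` package name and version are used hypothetically; the output filename depends on what debian/control declares.)

```bash
sudo apt-get install build-essential devscripts debhelper dh-make
cd wikibugs/
dh_make --native --single -p wikibugs_1.0   # generate debian/ skeleton, then edit
                                            # debian/control, rules and changelog
debuild -us -uc                             # build an unsigned .deb one level up
sudo dpkg -i ../wikibugs_1.0_all.deb        # filename depends on debian/control
```
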
[14:31:07] I'm always happy to help people though [14:31:57] but yes, calling git from puppet to deploy stuff is hackish in my opinion too [14:33:29] I find it to be nice way to deploy a set of files [14:33:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.712 seconds [14:33:51] whenever non ops want to get the package updated they can send a a new sha1 to be merged [14:34:05] or maybe I should just create a "production" branch in the repository and use git::clone { branch => 'production' } [14:34:21] branch/tag sounds more sensible [14:34:23] then whenever I want some head to be deployed I could push that reference to production and puppet will handle all of that [14:34:36] but still sounds hackish to use puppet to do that [14:34:48] I remember Andrew trying to git clone mediawiki via puppet [14:34:51] save us from maintaining a package [14:34:59] he got timeouts and it was hell in general :) [14:35:01] and then beg for ops to upload it on apt :-] [14:35:41] you're tying stuff deep into our infrastructure and then you complain about begging us for stuff? :) [14:35:42] yeah mediawiki/core.git is a bit too big when one request a full clone. A few hundred MB :( [14:35:57] na I am not really complaining [14:36:15] don't use puppet for that, work on other ways to deploy things. e.g. help Ryan with his deployment system [14:36:24] it is just that make you do some additional paperwork that have no real added value in my opinion [14:36:38] it does have added value [14:36:49] it's not a totally hacky and unreliable way to deploy things [14:39:57] anyway that git::clone at a specific sha1 comes from wikibugs that used to be deployed using "svn co -r12345 svn.../trunk/tools/wikibugs" [14:40:05] New patchset: Ottomata; "Updating zero filters for Saudi Telecom and Tata India" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28218 [14:40:18] so I merely converted that svn code to a git call https://gerrit.wikimedia.org/r/#/c/26325/10/manifests/misc/wikibugs.pp,unified [14:40:40] that is what caused me to adapt git:clone to be able to use some commit sha1 [14:41:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28218 [14:41:43] oh, that makes it ok then ;-) [14:41:56] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28218 [14:42:23] ... [14:42:26] ;-D [14:42:38] need to fix the "git reset --hard" stuff anyway [14:46:00] git has cheap branching/tagging though [14:46:04] which isn't the case for svn [14:47:05] New patchset: Hashar; "git::clone now support a specific sha1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/27175 [14:48:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/27175 [14:48:57] paravoid: so https://gerrit.wikimedia.org/r/#/c/27175/ would let us specify a tag instead of a branch [14:49:09] New patchset: Dereckson; "(bug 41069) Delhi 2012-10-17 workshop throttle rule" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28219 [14:49:12] so we can track in puppet which version is being deployed [14:50:07] I don't see it [14:52:24] it ? [14:52:24] New patchset: Hashar; "wikibugs migrated from svn to git" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26325 [14:52:25] the change ? 
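(Editor's sketch: the non-destructive way to pin a working copy to a commit — what the "git reset --hard stuff" above needs to become — is a fetch plus a detached checkout, which fails instead of silently discarding local edits. Repository URL, destination path, and sha1 below are all hypothetical.)

```bash
REPO=https://gerrit.wikimedia.org/r/p/example/wikibugs.git   # hypothetical
DEST=/var/lib/wikibugs                                       # hypothetical
SHA1=0123456789abcdef0123456789abcdef01234567                # hypothetical

[ -d "$DEST/.git" ] || git clone "$REPO" "$DEST"
cd "$DEST"
if [ "$(git rev-parse HEAD)" != "$SHA1" ]; then
    git fetch origin
    # checkout refuses to proceed if it would clobber local changes,
    # unlike 'git reset --hard', which throws them away
    git checkout --detach "$SHA1"
fi
```
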
[14:50:07] I don't see it
[14:52:24] it ?
[14:52:24] New patchset: Hashar; "wikibugs migrated from svn to git" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26325
[14:52:25] the change ?
[14:53:24] New review: Hashar; "rebased" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/26325
[14:53:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/26325
[14:55:05] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28219
[14:56:05] !log hashar synchronized wmf-config/throttle.php '(bug 41069) Delhi 2012-10-17 workshop throttle rule'
[14:56:17] Logged the message, Master
[14:56:20] !log hashar synchronized wmf-config/throttle.php '(bug 41069) Delhi 2012-10-17 workshop throttle rule'
[14:56:32] Logged the message, Master
[14:57:12] !log hashar synchronized wmf-config/InitialiseSettings.php 'touch'
[14:57:17] New review: Hashar; "deployed live. Thanks for the patch!" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28219
[14:57:24] Logged the message, Master
[14:59:32] New patchset: Ottomata; "Removing log2udp relay for an11." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28220
[15:00:38] New patchset: Dereckson; "Cleaning wmf-config/throttle.php: - removing old entries - making more clear IP could be a string or an array - typo - removing duplicate start here line" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28221
[15:00:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28220
[15:00:43] New patchset: Dereckson; "Cleaning wmf-config/throttle.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28221
[15:01:31] New review: Dereckson; "PS1: Initial change" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/28221
[15:03:22] paravoid: mark: I will face a similar issue to deploy the Android SDK . It is required to build the mobile applications. The installer downloads a bunch of files directly from Google :-D So that would most probably require packaging :-]
[15:03:29] New review: Dereckson; "There is a test/mediawiki repository for demos." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/27449
[15:03:44] are you suggesting to have puppet fetch the android sdk?
[15:03:49] c'mon
[15:04:07] I started a puppet class to install from Google indeed. Forgot about it being a third party hehe
[15:04:17] push for a proper deployment system already
[15:04:25] it doesn't have to be packaged imho
[15:04:33] just deployed outside of puppet
[15:04:37] like scap
[15:05:19] it's not just about being a third-party, it's just abusing puppet
[15:05:32] it's not like puppet scales so well that we can shove things into it
[15:06:26] my idea was to download the SDK from google and ask puppet to execute the command to untar it somewhere and then run the updater
[15:06:55] https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=modules/androidsdk/manifests/bootstrap.pp;h=4c93dded64a4fc3c7873816b20085ea15489da94;hb=c5480c9c6e549902443c209825517f9319373281
[15:08:05] instead of puppet should I write a documentation about how to install the Android SDK manually ? Or maybe write a shell script that takes care of the steps.
[15:08:08] and just provide that script via puppet ?
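(Editor's sketch: what such a standalone installer script might look like, assuming the 2012-era SDK tarball layout and the old `android update sdk --no-ui` interface; the download URL, release number, install prefix, and filter list are all assumptions.)

```bash
#!/bin/bash
# Fetch and unpack the Android SDK, then pull the needed components.
set -e
SDK_RELEASE=r20.0.3        # assumption
PREFIX=/opt/android-sdk    # assumption

wget -O /tmp/android-sdk.tgz \
    "https://dl.google.com/android/android-sdk_${SDK_RELEASE}-linux.tgz"
mkdir -p "$PREFIX"
tar -xzf /tmp/android-sdk.tgz -C "$PREFIX" --strip-components=1
# Non-interactive component install; 'yes' answers the license prompts.
yes | "$PREFIX/tools/android" update sdk --no-ui \
    --filter platform-tools,android-16
```
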
[15:08:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:08:19] PROBLEM - BGP status on cr2-eqiad is CRITICAL: CRITICAL: No response from remote host 208.80.154.197,
[15:08:46] PROBLEM - Packetloss_Average on oxygen is CRITICAL: CRITICAL: packet_loss_average is 8.81806448 (gt 8.0)
[15:15:12] jeremyb: paravoid yes, I'm waiting on a porting of a patch to librsvg. other than that, the test box was performing just fine.
[15:22:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.224 seconds
[15:25:07] paravoid: any tip about how I could provide the Android SDK ? :-D
[15:27:29] it's ALSO about third party repos
[15:30:37] I understand it is nasty
[15:30:49] not sure how to get the file deployed now :-]
[15:31:10] I originally downloaded the installer from Google and ran it from cli which made it download a bunch of stuff from google.com
[15:38:16] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28220
[15:46:59] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is 0.0
[15:56:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:09:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.973 seconds
[16:14:42] mark: paravoid sent an email to ops-l about the Android SDK. Will be easier to follow up on that.
[16:45:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:45:29] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[16:50:53] PROBLEM - Host db42 is DOWN: PING CRITICAL - Packet loss = 100%
[16:59:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.106 seconds
[17:02:35] RECOVERY - Host db42 is UP: PING WARNING - Packet loss = 73%, RTA = 0.26 ms
[17:27:41] New patchset: Demon; "Update Java path for Precise install" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28232
[17:28:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28232
[17:31:59] RECOVERY - SSH on db42 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[17:32:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:38:44] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused
[17:46:07] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.438 seconds
[17:48:02] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.005 second response time on port 11000
[17:56:35] PROBLEM - Puppet freshness on srv220 is CRITICAL: Puppet has not run in the last 10 hours
[17:56:35] PROBLEM - Puppet freshness on sq42 is CRITICAL: Puppet has not run in the last 10 hours
[17:57:29] PROBLEM - Puppet freshness on srv297 is CRITICAL: Puppet has not run in the last 10 hours
[17:59:49] New patchset: Matthias Mullie; "Init Wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28238
[18:00:01] New review: Matthias Mullie; "WIP" [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/28238
[18:11:28] New patchset: Dereckson; "Cleaning wmf-config/throttle.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28221
[18:12:10] New review: Dereckson; "PS1: Initial change" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/28221
[18:16:58] Change abandoned: Demon; "Not doing it this way. It'll be more like commons, so we'll use special.dblist." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28040
[18:18:29] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[18:18:29] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[18:18:44] ^demon: so it will be like datawiki or something?
[18:19:08] <^demon> Yeah, I was thinking that.
[18:19:18] <^demon> It can just be a special.dblist entry
[18:20:09] <^demon> So that side's easy. Harder part is the domain structure. wikidata.org is the actual wiki, and the subdomains will redirect (somehow)
[18:20:21] <^demon> (Or be transparent, maybe, dunno yet)
[18:20:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:22:05] ^demon: there is no wikimedia.org?
[18:22:15] <^demon> No, we have wikidata.org
[18:22:19] ok
[18:22:32] <^demon> And en.wikidata.org -> wikidata.org?setlang=en or something
[18:23:16] <^demon> Although, it might be easier to set it up as data.wikimedia.org, and still have the *wikidata.org stuff redirect.
[18:23:28] <^demon> It'd be way easier that way I'd imagine.
[18:24:18] where's all that being discussed?
[18:24:26] besides here I mean
[18:25:07] <^demon> I'm basing it on the conversations we've had with the wikidata team. There's some gaps in what I know, and I'm trying to figure it out.
[18:25:13] <^demon> So I can ask ops for the proper things :)
[18:26:37] <^demon> aude: Your input is appreciated :) I'm pinging DanielK now.
[18:27:05] sure
[18:27:12] maybe there should be mails about that?
[18:27:20] the plan re: wikidata domains and constraints
[18:27:28] to ops or even engineering
[18:27:39] ^demon: Clearly it should be data.commons.wikimedia.org. :-)
[18:27:51] hi
[18:27:51] James_F: no!
[18:27:56] * James_F laughs.
[18:28:00] aude: Not a serious suggestion.
[18:28:05] :)
[18:28:10] <^demon> paravoid: Yes. Let's try and get some clarification now, then we can discuss it.
[18:28:35] so, what's the question?
[18:28:40] aude: Though of course Commons is a good example of a 'real' multi-lingual project that started as a "service" project, so data.wikimedia.org wouldn't be stupid.
[18:29:06] James_F: that could redirect to wikidata.org?
[18:29:29] aude: Or the other way around.
[18:29:30] <^demon> DanielK_WMDE: So I understand the language subdomains redirect. But is the central wiki going to be at wikidata.org (the TLD) or something like data.wikimedia.org (much easier to setup)
[18:29:57] ^demon: at wikidata.org
[18:30:11] data.wikimedia.org could be a redirect
[18:30:25] i think de.wikidata.org would redirect to wikidata.org/wiki/Main_Page/de
[18:30:28] (or localized maybe)
[18:30:48] or actually, Project:Main_Page/de or something
[18:30:56] since items are in the main namespace
[18:31:20] ^demon: if using wikidata.org is holding us back, we can use data.wikimedia.org for a few days, and turn it into an alias later
[18:31:26] ^demon: what would the "lang" and "site" be for wikidata.org?
[18:31:37] <^demon> default lang is en, I assume.
[18:31:44] i think en is okay
[18:31:49] it's the default?
[18:31:54] yes, in lieu of anything better, content language is "en".
[18:32:04] hm...
[18:32:09] ^demon: but it won't be en.wikidata.org
[18:32:19] I see things that kind of assume something like that
[18:32:30] like MWMultiWrite or rewrite.py
[18:32:34] <^demon> DanielK_WMDE: I'm thinking of doing data.wikimedia.org. The infrastructure is already in place and we can setup new *.wm.o wikis much much faster (and will let us soft-launch on time)
[18:32:35] ^demon: can the config scripts deal with a domain that has no subdomain? if that is a problem, use "www" for now, but let's get rid of it as soon as possible.
[18:32:47] <^demon> That will give us time to discuss with ops/engineering how best to set up the rest of it.
[18:33:39] ^demon: as a temporary solution, that's fine. it should just be clear that we really want wikidata.org to work.
[18:33:44] agree with daniel
[18:33:48] <^demon> Indeed.
[18:33:53] <^demon> We definitely want it working.
[18:34:13] it would be tough to convince denny to use another domain, he loves this one :)
[18:34:22] realize we can't exactly copy the settings from some other wikimedia wiki
[18:34:30] <^demon> paravoid: I'll send an e-mail to engineering@ so we can start trying to figure it out. Thanks for reminding me on that.
[18:34:43] it's a little different for our usecase yet shouldn't be that hard to setup
[18:34:49] but i can see that a language-like subdomain is assumed in some places. having no subdomain is unprecedented, i think?
[18:34:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.542 seconds
[18:35:06] <^demon> no subdomain typically takes you to the landing page.
[18:35:16] right, like we have now
[18:35:22] our lovely landing page :)
[18:35:47] ^demon: just saying :) it sounded interesting and I wanted to read about the background and it seemed more proper to ask for a mail to everyone rather than someone explaining it to me :-)
[18:36:25] <^demon> paravoid: There's some uncharted territory here. Should be fun for everyone :)
[18:36:30] :)
[18:36:38] huh. wikispecies.org is registered to Keyword Acquisitions, Inc.
[18:37:21] <^demon> I always forget wikispecies exists until someone mentions wikispecies.
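(Editor's sketch: on the redirect scheme discussed above — de.wikidata.org pointing at wikidata.org/wiki/Main_Page/de — once DNS and the redirects exist the scheme is easy to smoke-test from outside. Nothing here was live at the time of this log; the URL and expected target are taken from the conversation, not from a running setup.)

```bash
# Follow redirects and print each hop's status line and Location header.
curl -sIL http://de.wikidata.org/ | grep -Ei '^(HTTP/|Location:)'
# Per the plan above, this should end at wikidata.org/wiki/Main_Page/de
# (or a localized Project:Main_Page/de).
```
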
[18:37:27] heh
[18:53:38] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours
[18:58:38] !log shutting down db67 to add h800 card
[18:58:50] Logged the message, Master
[19:09:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:19:31] !log adding dns and geodns entries for donate-lb
[19:19:45] Logged the message, Master
[19:20:04] !log deployed squid refresh_pattern tweaks for testing
[19:20:16] Logged the message, Master
[19:24:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds
[19:52:27] Alright, no one has a deploy window right now so I'm gonna deploy two VisualEditor fixes, should be quick
[19:57:15] !log Ran namespaceDupes.php on azwikibooks
[19:57:27] Logged the message, Master
[19:57:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:08:35] !log storage3 down for troubleshooting
[20:08:46] Logged the message, Master
[20:13:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds
[20:14:08] PROBLEM - Host storage3 is DOWN: PING CRITICAL - Packet loss = 100%
[20:18:29] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[20:46:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:50:14] Alright, *actually* gonna do that VE change now
[20:59:27] is someone looking at the redirect loop issues?
[20:59:40] Yes, see private channel
[20:59:41] probably not
[21:00:08] Jeff and Faidon are poking at it
[21:01:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.040 seconds
[21:06:29] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[21:12:31] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[21:17:26] New patchset: Adamw; "UNTESTED assume geoip is thread-safe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28295
[21:18:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28295
[21:34:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:40:43] !log reedy synchronized php-1.21wmf2/includes/specials/SpecialUndelete.php
[21:40:57] Logged the message, Master
[21:43:01] RobH: what do you think about putting both of the 3u shelves in the front of each cabinet - i ask because the horizontal pdu's are half depth, and then we can use all that rack space since the 3u storage shelves are half depth
[21:45:08] !log deploying squid configs, reverting a change that was too aggressive on caching content and broke https-by-default wikis
[21:45:20] Logged the message, Master
[21:47:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.401 seconds
[22:05:43] LeslieCarr: Getting excited for all the new toys? :D
[22:15:28] yes, however my finger which performed the blood sacrifice to the datacenter gods is no
[22:15:29] not
[22:21:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:33:38] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[22:33:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.364 seconds
[22:46:43] !log Running mw-update-l10n and friends to catch up newest MobileFrontend messages
[22:46:56] Logged the message, Master
[22:47:27] New patchset: Pyoungmeister; "coredb class. for review by asher and anyone else :)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28311
[22:48:27] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/28311
[22:50:15] !log redeployed squid refresh_pattern no-cache tweaks
[22:50:27] Logged the message, Master
[22:54:42] New patchset: Pyoungmeister; "coredb class. for review by asher and anyone else :)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28311
[22:55:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28311
[22:57:17] New patchset: Asher; "db67 is replacing db42" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28313
[22:58:16] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28313
[23:06:12] binasher: lots of "srv289 ruwiki OldLocalFile::upgradeRow 10.0.6.41 1205 Lock wait timeout exceeded; try restarting transaction (10.0.6.41) UPDATE `oldimage` SET oi_size = '45247',oi_width = '0',oi_height = '0',oi_bits = '0',oi_media_type = 'DRAWING',oi_major_mime = 'image',oi_minor_mime = 'svg+xml',oi_metadata = '0',oi_sha1 = 'q7215g7nys8i5c93p17e4jt955lkg8l' WHERE...
[23:06:12] ...oi_name = 'Soccerball_current_event.svg' AND oi_archive_name = '20121016092141!Soccerball_current_event.svg'"
[23:09:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:18:05] AaronSchulz: binasher has just been trying to work out why MW redirects have squid caching suppressed
[23:18:24] it seems to be due to https://gerrit.wikimedia.org/r/#/c/8622/
[23:18:43] for HistoryAction it didn't really matter that the canonical title was used, instead of the requested title
[23:18:56] because the history link from the redirect page goes to the canonical title
[23:19:05] but it kind of does matter for ViewAction
[23:19:22] Liangent moved your check to Wiki.php but the function is still the same
[23:25:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.021 seconds
[23:29:09] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 203 seconds
[23:32:12] AaronSchulz: what calls OldLocalFile::upgradeRow?
[23:32:38] I had a look at some of those during the swift video problem
[23:32:56] you get those lock wait timeouts as a result of swift slowness because the lock is held while swift requests are done
[23:33:11] !log maxsem synchronized php-1.21wmf2/extensions/MobileFrontend/ 'https://www.mediawiki.org/wiki/Extension:MobileFrontend/Deployments/2012-10-15%2616'
[23:33:19] most of the upgradeRow() requests seemed to be coming from invalid SVGs
[23:33:23] Logged the message, Master
[23:33:31] i'm not seeing any OldLocalFile::upgradeRow calls making it into the binlog on s7 where the current errors are, 100% fail
[23:33:38] and doesn't seem to be lock contention within mysql
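(Editor's sketch: the two checks being described above, written out. The binlog path, datetime, and database name are assumptions; the master IP and the WHERE clause come from the pasted error. MediaWiki embeds the calling function as an SQL comment, which is what makes the binlog grep work.)

```bash
# 1) Does the UPDATE ever reach the binlog? (path and time window hypothetical)
mysqlbinlog --start-datetime='2012-10-16 23:00:00' /a/sqldata/db-bin.001234 \
    | grep -c 'OldLocalFile::upgradeRow'

# 2) Does the target row exist at all on the master named in the error?
mysql -h 10.0.6.41 commonswiki -e "
    SELECT oi_name, oi_archive_name, oi_sha1 FROM oldimage
    WHERE oi_name = 'Soccerball_current_event.svg'
      AND oi_archive_name = '20121016092141!Soccerball_current_event.svg';"

# 3) Who is holding locks right now?
mysql -h 10.0.6.41 -e 'SHOW ENGINE INNODB STATUS\G' | grep -A20 'TRANSACTIONS'
```
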
[23:33:42] the code is such that every thumbnail request will cause upgradeRow() to be triggered again
[23:33:51] so yah, just holding the lock while swift is being swift would do it
[23:34:05] so it's broken, but not normally a problem when the rest of the system is working efficiently
[23:34:25] oi_sha1 = 'q7215g7nys8i5c93p17e4jt955lkg8l' WHERE oi_name = 'Soccerball_current_event.svg' AND oi_archive_name = '20121016092141!Soccerball_current_event.svg'
[23:34:32] same image with the same sha1 across a bunch of wikis
[23:34:57] yay
[23:35:05] "UPDATE `oldimage` SET oi_size = '45247',oi_width = '0',oi_height = '0',oi_bits = '0',oi_media_type = 'DRAWING',oi_major_mime = 'image',oi_minor_mime = 'svg+xml',oi_metadata = '0',oi_sha1 = 'q7215g7nys8i5c93p17e4jt955lkg8l' WHERE oi_name = 'Soccerball_current_event.svg' AND oi_archive_name = '20121016092141!Soccerball_current_event.svg'"
[23:35:19] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 9 seconds
[23:35:29] there aren't rows in oldimage that meet the where clause in any of these wikis
[23:35:47] perhaps it's inserted as part of the same txn that gets rolled back
[23:36:01] the solution we use e.g. in OggHandler is to have a validation error generate metadata which is valid for the purposes of isMetadataValid()
[23:36:28] but the display side is hooked to relay the validation error message to the user on the image description page
[23:36:44] instead of saying something stupid like "0x0 pixel video file, 0 seconds"
[23:37:14] in SVG, a parse error just results in the string "0", which is invalid and causes regeneration
[23:37:26] fun
[23:40:37] there aren't rows in oldimage that meet the where clause in any of these wikis
[23:40:53] commons images?
[23:41:47] binasher: http://paste.tstarling.com/p/efFOBF.html
[23:41:54] !log maxsem synchronized php-1.21wmf1/extensions/MobileFrontend/ 'https://www.mediawiki.org/wiki/Extension:MobileFrontend/Deployments/2012-10-15%2616'
[23:42:06] Logged the message, Master
[23:42:44] AaronSchulz: I can probably fix that redirect issue, if you're busy
[23:42:56] go ahead
[23:43:12] it just needs to let them be cached for some brief time period
[23:43:36] they can be cached for any amount of time, redirects are purged when the page is updated
[23:43:50] TimStarling: ah, thanks! was looking at the wiki name in the dberror log, not seeing that the ip is the commons master
[23:44:12] well, wikitext redirects, yeah I guess then
[23:44:17] assuming the job queue works ;)
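(Editor's sketch: the fix being handed off above amounts to letting Squid cache the redirect response again. Whether a given redirect is cacheable is visible from its response headers; a hypothetical spot check, using any known wikitext redirect title in place of WP:VP.)

```bash
# A cacheable redirect should carry an s-maxage > 0 and eventually show
# cache hits in X-Cache; 'Cache-Control: private' means caching is suppressed.
curl -sI 'https://en.wikipedia.org/wiki/WP:VP' \
    | grep -Ei '^(HTTP/|Cache-Control:|X-Cache)'
```
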
[23:53:41] New review: Faidon; "Nak, this is git, not svn, tags/branches are cheap and should be used instead. Also, as discussed on..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/27175
[23:55:32] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[23:55:32] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[23:57:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds