[00:00:05] !log reedy synchronized php-1.21wmf7/maintenance/mcc.php [00:00:18] Logged the message, Master [00:02:17] Reedy should rename himself to Synchronizer [00:04:36] So, wait, why'd the Apaches go ape-shit? [00:05:06] post-mortem forthcoming [00:05:29] Susan: [00:05:30] The last time that search was failed over from eqiad to pmtpa, the search_prefix pool was not failed over, so some small amount of traffic was still going to eqiad. I started java upgrading in the "inactive" datacenter, eqiad. Apparently, this needs some more attention (haven't done this bit yet), and all queries to the search_prefix vip hung, which caused search opensearch queries to hang, which caused the api to go down. Ain't no f [00:07:58] RobH: I'm sat in a room at 11.7C/53F.. And you're cold? :p [00:08:09] ? [00:08:09] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [00:08:13] its cold outside. [00:08:18] Reedy: cold aisle? [00:08:24] notpeter: "Ain't no f ..." # You got cut off. [00:08:34] Ain't no failure like a cascading failure, because a cascading failure don't stop... [00:08:37] "Office" [00:08:41] Reedy, no heating? [00:08:42] Ain't no failure like a cascading failure, because a cascading failure don't stop...\ [00:08:42] Susan: ^ [00:08:47] :D [00:08:59] it was 8.4C earlier [00:09:04] inside [00:09:42] Interesting. I didn't think search load was high enough to take anything down. [00:10:06] Susan, it's a prefix search [00:10:12] Ah. [00:10:21] normal search was fine, prefix was misrouted to eqiad. [00:10:22] I do love how the biggest threat to Wikimedia is almost always Wikimedia, though. [00:10:28] when you type something in the search bar, prefix search gets called several times [00:10:32] Right, right. 
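[editor's note] The point above, that typing in the search bar calls prefix search several times, can be made concrete with a small sketch. It assumes one prefix query per keystroke with no debouncing or client-side caching; the log does not say how the real search box batches requests, and `prefix_queries` is a hypothetical helper, not part of MediaWiki.

```python
def prefix_queries(typed):
    """Return the prefix-search queries issued while `typed` is entered.

    Assumption (not from the log): one request per keystroke.
    """
    return [typed[:i] for i in range(1, len(typed) + 1)]


queries = prefix_queries("wiki")
print(queries)       # ['w', 'wi', 'wik', 'wiki']
print(len(queries))  # 4 prefix requests for a single 4-character search
```

Under that assumption a single user search produces one prefix request per character typed, which is why a hung prefix pool can tie up far more API workers than the normal search load would suggest.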
[00:11:22] Susan: I know some fool gave me root on this thing ;) [00:11:27] PROBLEM - Puppet freshness on mw40 is CRITICAL: Puppet has not run in the last 10 hours [00:11:59] Reedy: your review queue is ludicrous :) [00:12:27] Most are unread [00:15:00] hehe [00:15:31] well, there's both the us messing with ourselves and then the "this would have worked if there were 0 external users using the site" [00:15:44] so to be safe … we must revoke all of our root access and unplug the servers from the outside world [00:15:54] hahaha [00:16:07] progress is a lie. we must protect the data at all costs [00:16:37] exactly! [00:19:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 289 seconds [00:21:30] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 17 seconds [00:22:14] We should get people to send back all the images they've borrowed [00:22:17] We need them [00:37:31] !log reloading unresponsive ms1002 [00:37:42] Logged the message, Mistress of the network gear. [00:38:41] !log powercycling ms1002 [00:38:50] Logged the message, Master [00:41:13] PROBLEM - Host ms1002 is DOWN: PING CRITICAL - Packet loss = 100% [00:41:50] udevd-work[443]: open /dev/null failed: No such file or directory [00:42:07] just a timing thing.i suppose [00:42:20] /dev/null not existing yet when udevd wants it [00:42:27] ms1002 back at login: [00:42:36] yay [00:42:40] let's see if puppet works [00:42:52] it's actually starting, which is better than it's ever done [00:43:01] RECOVERY - Host ms1002 is UP: PING OK - Packet loss = 0%, RTA = 26.51 ms [00:43:04] we can also install package upgrades while on it [00:43:07] or? [00:43:19] RECOVERY - Puppet freshness on ms1002 is OK: puppet ran at Thu Jan 17 00:43:05 UTC 2013 [00:43:22] not because we want to not change it to find out the issue.. shrug [00:43:23] yay [00:43:27] sounds good [00:43:38] you want to get that ? 
[00:44:08] !log installing package upgrades on ms1002 [00:44:17] Logged the message, Master [00:44:21] hey, among them are also salt packages [00:45:29] LeslieCarr: hey [00:45:34] you were looking for me? [00:45:43] she _just_ logged off [00:45:51] but i can still relay a message [00:46:09] she pinged me a few hours ago [00:46:10] I was out [00:46:22] she cant remember:) [00:46:37] so probably not that urgent [00:46:43] thanks :) [00:46:47] sure [00:47:16] paravoid: do you know about ms1002 ? [00:47:37] it uses misc::images::rsyncd and misc::images::rsync [00:47:43] ariel would know [00:47:45] I don't for sure [00:48:16] alright, because i was wondering if we can remove that whole class [00:48:23] let me add Ariel to a gerrit change then [00:48:31] heh, already on it:)ok [00:49:53] Ryan_Lane: when installing package upgrades and getting newer salt packages, i am being asked if i want to keep current or install maintainer version for /etc/salt/minion [00:50:11] mutante: always use the default [00:50:30] New patchset: Reedy; "Make mctest wikiless" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44368 [00:50:32] ok, that is keeping the current one [00:50:37] yep [00:50:51] sockpuppet in it. 
looks good [00:51:08] * Ryan_Lane nods [00:51:15] New patchset: Reedy; "Make mctest and mcc wikiless" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44368 [00:51:42] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44368 [01:01:31] New patchset: Asher; "ruby + string concat doesn't work in erb without to_s casting, so nevermind that" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44369 [01:02:13] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44369 [01:10:42] !log DNS update to remove some useless stuff from esams zone (RT-4331) [01:12:21] Logged the message, Master [01:15:43] RECOVERY - mysqld processes on db1001 is OK: PROCS OK: 1 process with command name mysqld [01:15:52] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [01:15:52] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [01:20:31] PROBLEM - MySQL Slave Running on db1001 is CRITICAL: CRIT replication Slave_IO_Running: No Slave_SQL_Running: No Last_Error: Rollback done for prepared transaction because its XID was not in the [01:21:25] PROBLEM - MySQL Replication Heartbeat on db1001 is CRITICAL: CRIT replication delay 68609 seconds [01:26:58] RECOVERY - MySQL Replication Heartbeat on db1001 is OK: OK replication delay seconds [01:27:34] RECOVERY - MySQL Slave Running on db1001 is OK: OK replication [01:32:22] PROBLEM - MySQL Replication Heartbeat on db1001 is CRITICAL: CRIT replication delay 68785 seconds [01:32:55] PROBLEM - MySQL Slave Delay on db1001 is CRITICAL: CRIT replication delay 68789 seconds [01:52:52] New patchset: Asher; "issues with remote secondary master expansion and account pws" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44372 [01:53:20] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44372 [02:05:01] New patchset: 
Asher; "adding mha config for the datacenter failover case make configs 0400" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44374 [02:11:11] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44374 [02:26:58] !log LocalisationUpdate completed (1.21wmf7) at Thu Jan 17 02:26:58 UTC 2013 [02:27:10] Logged the message, Master [02:38:37] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [02:38:37] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours [02:38:37] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [02:38:37] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours [02:45:28] !log LocalisationUpdate completed (1.21wmf8) at Thu Jan 17 02:45:26 UTC 2013 [02:45:39] Logged the message, Master [02:54:39] PROBLEM - Puppet freshness on ms-fe1001 is CRITICAL: Puppet has not run in the last 10 hours [03:21:45] New patchset: Lwelling; "Add PHP Pear Mail_mime class to Apache manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44377 [03:49:19] !log stopped replication on db67.pmtpa (enwiki research slave), re-replicating off the pmtpa master instead of cross datacenter. original config isn't well supported by mha. 
[03:49:29] Logged the message, Master [03:51:16] PROBLEM - Varnish HTCP daemon on cp1043 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [03:52:42] PROBLEM - MySQL Replication Heartbeat on db67 is CRITICAL: CRIT replication delay 341 seconds [03:56:36] !log resumed replication on db67 [03:56:46] Logged the message, Master [03:58:06] RECOVERY - MySQL Replication Heartbeat on db67 is OK: OK replication delay 0 seconds [04:04:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:08:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.066 seconds [04:08:39] On what PHP version runs our wikis? [04:09:13] -our wikis +our wikis infrastructure [04:09:16] Dereckson: 5.3.10; we support 5.3+ [04:09:49] Dereckson: isn't that on [[special:version]] ? [04:09:55] * jeremyb runs off to sleep [04:10:09] Thank you. [04:11:33] np! and yes, i think it's also on special:version [04:16:06] RECOVERY - Varnish HTCP daemon on cp1043 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [04:17:19] New patchset: Asher; "reducing pt-heartbeat interval to 500ms" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44378 [04:18:13] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44378 [04:52:59] New patchset: Lwelling; "Add PHP Pear Mail_mime class to Apache manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44377 [05:06:35] fyi: Thu Jan 17 5:05:01 UTC 2013 hume foundationwiki Error connecting to db1013.eqiad.wmnet: Can't connect to MySQL server on 'db1013.eqiad.wmnet' (111) [05:06:42] something fundraising related [05:16:42] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours [05:27:39] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours [05:29:09] New patchset: 
Asher; "stashing $::mw_primary under /etc/mha/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44380 [05:29:55] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44380 [05:45:48] RECOVERY - MySQL Slave Delay on db1001 is OK: OK replication delay 0 seconds [05:46:06] RECOVERY - MySQL Replication Heartbeat on db1001 is OK: OK replication delay 0 seconds [06:02:21] New review: ArielGlenn; "Pretty sure this can go. Feel free to: merge it in yourself, or nag me when you are around and I'll ..." [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/36146 [07:26:01] PROBLEM - LVS HTTP IPv4 on wikiquote-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:27:40] RECOVERY - LVS HTTP IPv4 on wikiquote-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 54939 bytes in 0.616 seconds [07:27:49] PROBLEM - LVS HTTPS IPv4 on wikibooks-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:30:13] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:31:25] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.32704181159 (gt 8.0) [07:31:52] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 51838 bytes in 0.747 seconds [07:31:52] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:33:04] RECOVERY - LVS HTTPS IPv4 on wikibooks-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 47607 bytes in 0.763 seconds [07:33:31] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 66326 bytes in 0.881 seconds [07:35:39] er? [07:37:10] apergos: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=LVS%2520loadbalancers%2520esams&tab=m&vn= hrm [07:37:48] network eh? 
[07:38:15] looks like it to me [07:58:10] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.44014175182 [08:36:16] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [09:13:46] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 196 seconds [09:14:40] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 212 seconds [09:20:58] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [09:22:01] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [09:24:16] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [09:33:06] RECOVERY - Puppet freshness on mw40 is OK: puppet ran at Thu Jan 17 09:32:43 UTC 2013 [09:48:33] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [09:53:31] PROBLEM - Puppet freshness on virt1008 is CRITICAL: Puppet has not run in the last 10 hours [09:55:12] New review: Hashar; "As I said on IRC, wmf-config/InitialiseSettings-labs.php is loaded on labs AFTER InitialiseSettings...." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44278 [09:56:30] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours [09:58:50] New review: Hashar; "For mobile-labs.php, maybe consider including the mobile.php file, then override settings for labs." 
[operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44278 [10:38:14] PROBLEM - Puppet freshness on cp1026 is CRITICAL: Puppet has not run in the last 10 hours [10:47:05] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 184 seconds [10:48:26] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 232 seconds [10:52:11] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [10:52:38] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [10:55:11] New patchset: Ori.livneh; "(Bug 43879) Update eowikiquote logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44390 [11:00:27] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44390 [11:17:14] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [11:17:14] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [11:18:53] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 182 seconds [11:20:32] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 228 seconds [11:25:56] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds [11:26:05] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds [11:43:30] !log "mwscriptwikiset cleanupUploadStash.php all.dblist" on hume (kill it if swift overloads) [11:43:40] Logged the message, Master [11:45:49] New patchset: Denny Vrandecic; "Remove wikidata portal (not used anymore)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44400 [11:49:41] paravoid: https://gerrit.wikimedia.org/r/#/c/37968/ [11:49:56] foreachwiki cleanupUploadStash.php works and is shorter too ;) [11:50:12] oh [11:50:48] ok, I'll leave it run for now [11:51:09] and when that first run completes, I'll merge 
that commit anyway [11:51:18] thanks [11:51:25] I've been doing it manually now and again when I remember [12:03:36] New patchset: Reedy; "RT #2295: Run cleanupUploadStash across all wikis daily" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37968 [12:39:25] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [12:39:25] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours [12:39:25] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [12:39:25] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours [12:45:36] paravoid: good afternoon :-] Are you aware of any tool to convert a python pip package to a Debian package? [12:46:00] paravoid: I thought about copy pasting an existing one and just change the names around :-] [12:46:32] the tool is called Hashar! [12:47:30] yeah I noticed that :-D [12:47:46] ahh I need to learn about uupdate [12:55:28] PROBLEM - Puppet freshness on ms-fe1001 is CRITICAL: Puppet has not run in the last 10 hours [12:56:43] !log Gerrit: created repository operations/debs/python-voluptuous to package http://pypi.python.org/pypi/voluptuous/ [12:56:47] Logged the message, Master [12:56:51] wich me luck [12:57:13] wish [12:57:15] whatever [13:13:47] hmm [13:13:53] less than 15 minutes to build it [13:14:04] I found out a script that does everything for us [13:14:05] https://wikitech.wikimedia.org/view/Debianize_python_package [13:14:12] (based on fpm ) [13:14:23] but then I don't have debian directory to push in Gerit [13:42:16] New patchset: Denny Vrandecic; "(bug 41847) Drop the www. from the domain wikidata.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44406 [13:42:39] fpm? ewww [13:42:49] New patchset: Denny Vrandecic; "(bug 41847) Drop the www. 
on wikidata.org" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/44407 [13:42:51] %: [13:42:56] dh $@ --with python2 [13:42:59] should be enough :) [13:43:52] hashar: http://wiki.debian.org/Python/Packaging [13:44:33] ahh [13:44:38] weird [13:44:50] I see varnish not honouring If-Range: at all [13:45:20] New patchset: Denny Vrandecic; "(bug 41847) Drop the www. on wikidata.org" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/44407 [13:47:06] where? [13:47:08] everywhere? [13:47:25] as far as I can tell it doesn't inspect the header anywhere in the source [13:47:35] hm that reminds me [13:47:44] which matches with my testing, I did If-Range: requests and it ignored the header [13:48:53] paravoid: yeah doing that right now [13:49:21] is that the opera bugzilla bug? [13:49:24] yes [13:49:32] i can't reproduce the results in the bug at all [13:51:26] and of course upstream does not properly tag its versions grmblblbl [13:54:39] New patchset: Hashar; "initialize release" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408 [13:55:24] mark: paravoid: here is my packaging :-] [14:08:15] New patchset: Hashar; "initialize release" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44411 [14:08:42] bah [14:10:07] New patchset: Hashar; "(bug 44061) initialize release" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408 [14:10:19] Change abandoned: Hashar; "dupe of https://gerrit.wikimedia.org/r/#/c/44408/" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44411 [14:11:55] New review: Hashar; "our bug is https://bugzilla.wikimedia.org/show_bug.cgi?id=44061" [operations/debs/python-voluptuous] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44408 [14:17:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:18:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: 
HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.038 seconds [14:45:06] New patchset: Mark Bergsma; "Add missing create_container param dstconn" [operations/software] (master) - https://gerrit.wikimedia.org/r/44418 [14:45:06] New patchset: Mark Bergsma; "Remove gets from the stats, not very useful" [operations/software] (master) - https://gerrit.wikimedia.org/r/44419 [14:45:43] Change merged: Mark Bergsma; [operations/software] (master) - https://gerrit.wikimedia.org/r/44418 [14:45:58] Change merged: Mark Bergsma; [operations/software] (master) - https://gerrit.wikimedia.org/r/44419 [14:48:40] paravoid: filling an ITP :-D [14:53:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:00:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.481 seconds [15:02:52] New patchset: Hashar; "(bug 44061) initialize release" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408 [15:03:25] !log demon synchronized php-1.21wmf8/extensions/Wikibase/repo/resources/Resources.php 'Deploying Ie22a7213' [15:03:34] Logged the message, Master [15:03:45] New review: Hashar; "Fixed various lintian errors from PS2." 
[operations/debs/python-voluptuous] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/44408 [15:03:49] !log demon synchronized php-1.21wmf8/extensions/Wikibase/repo/resources/wikibase.ui.entityViewInit.js 'Deploying Ie22a7213' [15:03:59] Logged the message, Master [15:06:53] New patchset: Mark Bergsma; "Sync deletes on the source to the destination" [operations/software] (master) - https://gerrit.wikimedia.org/r/44422 [15:13:08] !log reedy synchronized docroot [15:13:18] Logged the message, Master [15:16:22] !log reedy synchronized live-1.5/ [15:16:32] Logged the message, Master [15:17:01] New patchset: Reedy; "Update php symlink" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44423 [15:17:17] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44400 [15:17:50] !log reedy synchronized extract2.php [15:18:00] Logged the message, Master [15:18:07] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours [15:28:21] New review: Hashar; "You should be able to listen on port 80 as well :-D" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44164 [15:29:04] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours [15:30:44] New patchset: Mark Bergsma; "Sync deletes on the source to the destination" [operations/software] (master) - https://gerrit.wikimedia.org/r/44422 [15:32:12] New review: Dereckson; "Shellpolicy issue mitigated." 
[operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/42774 [15:33:19] New patchset: Hashar; "(bug 44061) initialize release" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408 [15:33:26] PROBLEM - Host virt1008 is DOWN: PING CRITICAL - Packet loss = 100% [15:33:49] New review: Hashar; "IPT #698354" [operations/debs/python-voluptuous] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44408 [15:36:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:40:55] New patchset: Mark Bergsma; "Sync deletes on the source to the destination" [operations/software] (master) - https://gerrit.wikimedia.org/r/44422 [15:42:49] New patchset: Reedy; "(bug 43760) Enable Collection on is.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42976 [15:43:06] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42976 [15:44:09] New patchset: Reedy; "(bug 43834) Enable WebFonts on en.wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43351 [15:44:23] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43351 [15:45:17] RECOVERY - Host virt1008 is UP: PING OK - Packet loss = 0%, RTA = 26.59 ms [15:45:24] New patchset: Reedy; "(bug 29692) Per-wiki namespace aliases shouldn't override (remove) global ones" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/25737 [15:45:26] New patchset: Mark Bergsma; "Sync deletes on the source to the destination" [operations/software] (master) - https://gerrit.wikimedia.org/r/44422 [15:45:37] New patchset: Reedy; "Remove wgUseTagFilter. 
Same as default and no longer needed" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43004 [15:45:49] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43004 [15:46:19] New patchset: Reedy; "(bug 43830) Namespace configuration for de.wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43359 [15:46:24] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43359 [15:46:43] New patchset: Reedy; "(bug 43851) Allow enwikivoyage bureaucrats to remove sysops" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44276 [15:46:53] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44276 [15:47:27] New review: Reedy; "Apache change: https://gerrit.wikimedia.org/r/#/c/44407/" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44406 [15:48:15] New patchset: Reedy; "Update php symlink" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44423 [15:48:21] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44423 [15:49:32] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44091 [15:49:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.031 seconds [15:49:55] New patchset: Reedy; "(bug 43769) Close ik.wiktionary and zh-min-nan.wikiquote" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42978 [15:50:00] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42978 [15:50:39] New review: Reedy; "Needs rebasing" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/42995 [15:50:48] New patchset: Reedy; "(bug 43760) Enable WikiLove on is.wikipedia" [operations/mediawiki-config] (master) - 
https://gerrit.wikimedia.org/r/42915 [15:51:46] !log Created wikilove tables on iswiki [15:51:56] Logged the message, Master [15:52:05] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42915 [15:52:22] New patchset: Reedy; "Ensure confirmed has the same rights as autoconfirmed." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43703 [15:52:57] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43703 [15:53:29] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44085 [15:53:43] New patchset: Reedy; "(bug 43659) Import sources for eo.wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42774 [15:53:58] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42774 [15:55:08] !log reedy synchronized wmf-config/ [15:55:18] Logged the message, Master [15:59:14] PROBLEM - Host virt1008 is DOWN: PING CRITICAL - Packet loss = 100% [15:59:50] RECOVERY - SSH on virt1008 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:59:59] RECOVERY - Host virt1008 is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms [16:17:28] New patchset: Reedy; "Bug 44039 - Add a category to $wgArticleFeedbackBlacklistCategories for Portuguese Wikibooks" [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/44428 [16:18:00] Change merged: Reedy; [operations/mediawiki-config] (newdeploy) - https://gerrit.wikimedia.org/r/44428 [16:18:27] Damn it. 
[16:19:15] New patchset: Reedy; "Bug 44039 - Add a category to $wgArticleFeedbackBlacklistCategories for Portuguese Wikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44429 [16:19:38] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44429 [16:20:07] !log reedy synchronized wmf-config/InitialiseSettings.php [16:20:18] Logged the message, Master [16:22:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:26:14] RECOVERY - NTP on virt1008 is OK: NTP OK: Offset -0.03484725952 secs [16:33:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.680 seconds [16:35:18] New patchset: Cmjohnson; "adding virt1008 to decom.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44431 [16:35:55] New review: Cmjohnson; "Looks good to me" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/44431 [16:35:55] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44431 [16:38:47] apergos, around? [16:38:57] yep, what's going on? [16:39:16] apergos, remember you merged https://gerrit.wikimedia.org/r/#/c/43218/ - it doesn't seem to be running [16:39:46] what hosst is that again? [16:40:15] hume. currently running these scripts manually there - see a lot of unindexed data [16:40:17] ah hume [16:40:29] well there was some problem with it that I fixed up [16:40:38] mm or was it another hume cron [16:40:41] lemme look [16:41:40] I see no emails, so that's good [16:42:32] I see em in the crontab over there [16:42:44] weird [16:42:56] yep [16:43:24] oh you know what [16:43:29] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 250 seconds [16:43:30] I bet .... 
[16:43:36] foreachwikiindblist /home/wikipedia/common/wikipedia.dblist extensions/GeoData/solrupdate.php [16:43:45] that it doesn't know where foreachwikiindblist is [16:43:49] and we never hear about it cause [16:44:05] I'm running it right now [16:44:05] PROBLEM - MySQL Replication Heartbeat on db58 is CRITICAL: CRIT replication delay 231 seconds [16:44:07] command => "/usr/local/bin/update-geodata >/dev/null 2>&1" [16:44:14] PROBLEM - MySQL Replication Heartbeat on db56 is CRITICAL: CRIT replication delay 295 seconds [16:44:15] yeah but you're running from cmd line [16:44:18] not out of cron [16:44:24] owcie, path? [16:44:41] PROBLEM - MySQL Replication Heartbeat on db1024 is CRITICAL: CRIT replication delay 322 seconds [16:44:42] try taking the redirects to /dev/null off the puppet change [16:44:52] and let's see what it complains about [16:44:59] PROBLEM - MySQL Slave Delay on db56 is CRITICAL: CRIT replication delay 341 seconds [16:45:26] PROBLEM - MySQL Replication Heartbeat on db1028 is CRITICAL: CRIT replication delay 367 seconds [16:45:59] mark: i am racking 10 DBs today and want to put them in b5 where the netapp is located. Do you have any future plans for that rack or okay to put DBs there? [16:46:02] RECOVERY - MySQL Replication Heartbeat on db56 is OK: OK replication delay 0 seconds [16:46:02] RECOVERY - MySQL Replication Heartbeat on db58 is OK: OK replication delay 0 seconds [16:46:27] on it [16:46:29] RECOVERY - MySQL Replication Heartbeat on db1024 is OK: OK replication delay 0 seconds [16:46:47] RECOVERY - MySQL Slave Delay on db56 is OK: OK replication delay 0 seconds [16:47:04] cmjohnson1: no don't do that [16:47:14] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds [16:47:15] that rack is reserved for the netapp [16:47:22] why not put it in row C, isn't there plenty of space there? 
[16:47:24] k [16:48:22] if there isn't one already, designate one rack to be the DB rack [16:49:02] RECOVERY - MySQL Replication Heartbeat on db1028 is OK: OK replication delay 21 seconds [16:49:40] New patchset: MaxSem; "GeoData scripts debugging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44432 [16:49:47] apergos, ^^^ [16:50:05] mark: okay...was thinking of filling in all the space but...i will use row c [16:50:08] thx [16:50:34] looking [16:51:28] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44432 [16:51:48] Reedy: was https://gerrit.wikimedia.org/r/#/c/25737/ supposed to be merged? [16:51:57] No [16:52:02] oki [16:52:38] trying to stop it becoming too stale (less manual rebasing) [16:52:42] till it gets reviewed again [16:54:12] updated, now we wait for next cron run [16:58:29] New patchset: Demon; "Enable micro design for itwikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44434 [17:01:33] New patchset: Demon; "Also match "bug: 123"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44436 [17:02:55] New patchset: Demon; "Also match "bug: 123"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44436 [17:03:26] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44434 [17:06:15] !log demon synchronized wmf-config/InitialiseSettings.php 'micro design for itwikivoyage' [17:06:25] Logged the message, Master [17:09:35] MaxSem: [17:09:37] /usr/local/bin/update-geodata: line 8: foreachwikiindblist: command not found [17:09:38] /usr/local/bin/update-geodata: line 9: foreachwikiindblist: command not found [17:09:43] just sayin... 
:-) [17:10:07] apergos, thanks - fixing [17:10:25] yw [17:17:47] New patchset: MaxSem; "Use full path in GeoData scripts to fix cron" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44439 [17:18:44] apergos, ^^ [17:19:24] 2>&1 I suppose you want? [17:19:52] or yu want stderr to be emailed to us? [17:19:52] MaxSem: [17:21:20] apergos, yes - if there are errors, they'd better be known. so if possible, I'd like to keep it this way unless it creates noise. well, if it creates noise you can always kick me to fix it:) [17:22:22] ok, fine by me [17:22:49] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44439 [17:23:00] thanks!:) [17:23:34] it could be cool if it were possible to send cronspam to specific emails [17:23:39] it is [17:23:44] MAILTO= [17:24:27] in env? [17:24:44] environment => 'MAILTO=someting@wikimedia.org" [17:24:53] if you do that in a root crontb it's a problem [17:25:09] but if you do it in one that you want all the mail from all the jobs to go the same place it's grea [17:25:09] t [17:25:13] and my mailbox is happier [17:25:59] mmm, so to be fully isolated it has to be run as a unique user? 
[17:26:22] well otherwise you have to make sure that [17:26:34] people who add other cron jobs after yours have set MAILTO appropriately [17:26:47] in the same crontab I mean [17:27:11] cool [17:27:43] New patchset: Reedy; "Remove $urlprotocol as it's set to """ [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42995 [17:28:38] PROBLEM - mysqld processes on labsdb1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [17:29:12] New patchset: Reedy; "Remove $urlprotocol as it's set to """ [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42995 [17:31:32] !log asher synchronized wmf-config/db.php 'setting s5/dewiki to read-only for several minutes' [17:31:41] Logged the message, Master [17:32:25] !log asher synchronized wmf-config/db.php 'setting s5/dewiki to writeable' [17:32:35] Logged the message, Master [17:34:57] MaxSem: "no news is good news" [17:35:04] :) [17:35:27] which is good cause I'm about done for the day :-) [17:35:34] New patchset: Reedy; "Bug 43979 - Enable MoodBar on it.wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44441 [17:38:24] !log Created moodbar tables for itwikivoyage [17:38:34] Logged the message, Master [17:38:54] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44441 [17:41:52] !log moved dewiki db master to eqiad with mha and back to pmtpa in < 6 seconds, only one http request wanting to write hit the read-only window [17:41:57] Logged the message, Master [17:43:32] \o/ [17:45:35] mark: putting finishing touches on a single script that will do all the db+es failovers in one go in either direction [17:45:38] <^demon> binasher: What was the one? 
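Note on the cron failure above (17:09, "foreachwikiindblist: command not found"): cron usually runs jobs with a much shorter PATH than an interactive shell, which would explain why the script worked from the command line but not from cron. The Python sketch below shows only that lookup difference. The minimal PATH value, the temporary directory, and the stand-in tool name are assumptions for illustration, not values taken from these hosts, and the demo assumes a POSIX system.

```python
import os
import shutil
import stat
import tempfile

# Assumed minimal PATH, similar to what many cron daemons use by default.
CRON_LIKE_PATH = "/usr/bin:/bin"

# Hypothetical stand-in name; the real tool is not touched by this demo.
DEMO_TOOL = "foreachwikiindblist-demo"


def find_command(name, search_path):
    """Return the full path of `name` if it is on `search_path`, else None."""
    return shutil.which(name, path=search_path)


def make_fake_tool(directory, name):
    """Create a small executable script so the lookup has something to find."""
    tool = os.path.join(directory, name)
    with open(tool, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)
    return tool


def demo():
    """Look the tool up with an interactive-style PATH and a cron-style PATH."""
    with tempfile.TemporaryDirectory() as bindir:
        tool = make_fake_tool(bindir, DEMO_TOOL)
        interactive_path = bindir + os.pathsep + CRON_LIKE_PATH
        found_interactive = find_command(DEMO_TOOL, interactive_path)
        found_cron = find_command(DEMO_TOOL, CRON_LIKE_PATH)
        return tool, found_interactive, found_cron


if __name__ == "__main__":
    _, interactive, cron = demo()
    print("interactive PATH:", interactive)
    print("cron-like PATH:  ", cron)
```

The change merged at 17:22 (44439) took the other route and used full paths in the scripts, so the jobs no longer depend on whatever PATH cron provides.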
[17:45:43] awesome [17:46:44] nice :) [17:47:04] New patchset: Reedy; "Bug 43488 - Set $wgBabelCategoryNames on it.wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44443 [17:47:52] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44443 [17:48:06] ^demon: actually was wrong, there was both a flagged revs related requests (FlaggedRevision::insert) and one from aft, but the aft issue was seen yesterday and i think will be ignored since its an async request that shouldn't show an exception to users [17:48:50] !log reedy synchronized wmf-config/InitialiseSettings.php [17:49:00] Logged the message, Master [17:49:57] New review: Reedy; "Also needs rebasing" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/39578 [18:08:08] ASchulz|away: ping [18:08:41] RECOVERY - MySQL Slave Delay on es1001 is OK: OK replication delay NULL seconds [18:11:37] mark: the |away part isn't very promising :) [18:11:51] AaronSchulz: ping? :) [18:13:16] just say what it is :) [18:14:06] hey [18:14:14] PROBLEM - MySQL Slave Delay on es1001 is CRITICAL: CRIT replication delay 5099022 seconds [18:14:45] we want to find out a bit more about the replication journal [18:14:55] possibly enable it and possibly turn off writes to nas* [18:15:30] well it has never been turned off [18:16:02] how back does it go? [18:16:05] 8 months [18:16:09] oh wow [18:16:13] awesome [18:16:14] that's... great! 
[18:16:28] so my delete script isn't even needed [18:16:35] aaron's stuff can fix it up for originals [18:16:58] * mark turns it off [18:17:12] haha [18:17:28] so, Aaron, swiftrepl that has been copying originals doesn't take deletes into account [18:17:43] since it's much more complicated [18:17:47] updates are handled though [18:17:55] well [18:17:56] mark can tell you all about it :-) [18:17:57] not as good as journal [18:18:06] so also updates should be done by aaron's journal fixup thing [18:18:15] but yeah, swiftrepl does updates when it sees them [18:20:55] New patchset: Silke Meyer; "Puppet files to install Wikidata repo / client on different labs instances" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42786 [18:21:08] i'm going offline now, dinne [18:21:10] mark: how does it detect changes? Is it scrapping some log or is there a trigger hook added somewhere in swift? [18:21:12] dinner [18:21:23] it doesn't detect changes, it's just a copy [18:21:30] it does detect etag differences and does updates [18:21:37] but we still need your script to go over the journal and fix it up [18:21:42] as there's always a race [18:22:03] it's just not stupid enough to copy already existing files with the right etag/contents [18:22:26] but it doesn't have a log or journal of changes from mediawiki, so it can't stay up to date [18:22:43] are you sharding containers like in swift? [18:22:55] it's just copying the contents of containers from swift to ceph [18:22:59] exactly as is [18:23:01] anyone against adding the packages "php-mail" and "php-mail-mime" into class apaches::packages (along with other php- packages already in it)? 
Lwelling requests it for email notifications from the Echo project [18:23:06] so I guess that is a yes [18:23:09] yes [18:23:21] * AaronSchulz was hoping we wouldn't need to do that [18:23:35] well we thought of not doing that [18:23:54] but it's yet another risky change which we don't wanna do now [18:24:00] we can do that later with efficient server side copies [18:24:11] I guess [18:24:21] the idea was to change one thing at a time, yes [18:25:05] does mediawiki actually support sharded for one repo (swift), and not sharded for ceph at the same time? [18:25:18] for multiwrite? yes [18:25:43] ah cool [18:25:52] that stuff is hidden behind the interface [18:26:47] paravoid: I should work out the MW ceph config [18:27:08] sure [18:27:18] we need to turn off nas* until tuesday [18:27:26] not until [18:27:28] before [18:27:37] so is today ok? :) [18:27:39] as we've decided to not do the NFS switchover [18:27:41] yes [18:28:08] yes [18:29:09] ok, bbl [18:29:16] see you [18:32:06] New patchset: Reedy; "Bug 37608 - Enable Extension:GoogleNewsSitemap on el wikinews" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44450 [18:32:25] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44450 [18:33:18] AaronSchulz: hey, quick favor [18:33:26] AaronSchulz: https://gerrit.wikimedia.org/r/#/c/42600/ vs. https://gerrit.wikimedia.org/r/#/c/37968/ [18:33:29] pick one :) [18:34:00] or Reedy [18:34:08] heh [18:34:14] let me see if my internet will let me open it [18:35:00] Mines older [18:35:01] ;) [18:36:25] Reedy: why does the summary day "daily"? [18:36:59] Which summary where? 
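Sketch of the copy behaviour described at 18:21-18:22: a listing-based copy that skips objects whose ETag already matches, re-copies objects whose ETag differs, and cannot notice deletes because it has no change journal. This is not swiftrepl's code. The function name and the dictionary-shaped listings are assumptions made for illustration.

```python
def plan_copy(source_listing, dest_listing):
    """Decide what a one-way, listing-based copy would do.

    Both arguments map object name -> ETag (hypothetical shape).
    Returns (to_copy, stale):
      to_copy - objects missing on the destination or with a different ETag
      stale   - objects present only on the destination, for example because
                they were deleted at the source; a plain copy never removes
                these, so a journal replay would have to.
    """
    to_copy = sorted(
        name
        for name, etag in source_listing.items()
        if dest_listing.get(name) != etag
    )
    stale = sorted(name for name in dest_listing if name not in source_listing)
    return to_copy, stale


if __name__ == "__main__":
    source = {"a.jpg": "e1", "b.jpg": "e2-new", "c.jpg": "e3"}
    dest = {"a.jpg": "e1", "b.jpg": "e2-old", "d.jpg": "e4"}
    print(plan_copy(source, dest))
```

In this toy run, `d.jpg` stays on the destination indefinitely, which is the gap the journal fix-up script is meant to close.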
[18:37:10] Reedy: you didn't put any ops as reviewers and I wasn't aware of it :( [18:37:20] Change abandoned: Aaron Schulz; "Duplicate of https://gerrit.wikimedia.org/r/#/c/37968/8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42600 [18:37:22] I guess we ops suck at going through the unreviewed queue [18:37:45] It's even got an RT ticket! [18:37:59] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [18:38:00] Reedy: https://gerrit.wikimedia.org/r/#/c/37968/8 [18:38:47] !log reedy synchronized wmf-config/InitialiseSettings.php [18:38:56] Logged the message, Master [18:40:53] Reedy: that's every 2 hours [18:41:58] a bit disappointing [18:42:02] it didn't free up much space [18:43:18] hm, I see no noticeable differences [18:43:18] New review: Aaron Schulz; "This should be weekly or daily at the most, it is every 2 hours here." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/37968 [18:43:33] e.g. wikipedia-commons-local-temp.5b is still 20GB [18:44:02] it might help to check the listings to see what type of file names are in there [18:44:30] 5/5b/10k1ey9qu0f0.ca5dz.8041.pdf.10199 [18:44:30] 5/5b/10k1ey9qu0f0.ca5dz.8041.pdf.1032 [18:44:42] 5/5b/20120426120308!phpZ8CKX4.jpg [18:44:43] 5/5b/20120426122223!20120426122044!phpR0Nm8j.jpg_1a576f2-1.jpg [18:44:43] 5/5b/20120426124451!20120426124407!phpEHGZUo.jpg_36d27a8-1.jpg [18:45:29] Is there a load still with old timestamps? [18:45:40] I don't understand the question [18:46:03] Are there any files left in the temporary dir that shouldn't be there now? [18:46:16] there must be [18:46:19] ie is the script actually doing its job? 
[18:46:39] so the script already traverses the thumb listing and nukes the old ones, maybe it should do the same with the other temp files instead of using the db [18:46:53] since the upload code is a piece of shit, it may be that they are out of sync [18:47:10] lols [18:47:16] the cron should definitely be weekly in that case [18:47:19] say, every tue [18:49:33] is this easily fixable aaron? [18:49:56] commons temp alone is 5TB, out of 45TB of total swift storage [18:50:02] if someone fixes that cron to be weekly and merges it, I'll change core [18:50:06] so it kinda matters for the replication to eqiad [18:50:50] which won't be complete until next week, but let's do our best to shorten it as much as possible [18:52:24] New patchset: Dereckson; "(no subject)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44452 [18:52:44] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44377 [18:56:03] New patchset: Aaron Schulz; "Disable multiwrite to nas-1." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44453 [18:56:10] paravoid: ^ [18:56:15] Reedy: are you doing that or should I? 
[18:56:20] the cron changes [18:56:27] Feel free [18:56:33] I'm poking at shell bugs [19:01:58] hmm, maybe it can still be daily [19:02:22] meh [19:03:26] New patchset: Dzahn; "need git-core installed on mchenry to deploy wikibugs changes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44454 [19:04:04] New patchset: Dereckson; "(no subject)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44455 [19:06:27] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44454 [19:06:36] New patchset: Aaron Schulz; "RT #2295: Run cleanupUploadStash across all wikis daily" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37968 [19:06:41] paravoid: ^ [19:07:06] hey [19:08:06] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37968 [19:08:33] thanks! [19:08:35] very much appreciated [19:09:43] E: Couldn't find package salt-minion [19:09:45] sigh [19:10:03] Reedy: do you agree with that request to drop the "www" from wikidata? [19:10:15] there is a mediawiki-config and an Apache config change [19:10:24] Yeah [19:10:26] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44453 [19:10:30] mutante: I'm indifferent. I think they hinted to that originally they wanted that [19:10:31] the Apache part just changes ServerName to wikidata.org [19:10:33] New patchset: Asher; "deploying mha powered db datacenter failover script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44456 [19:10:56] AaronSchulz: I'm going to run it on hume now [19:11:12] Reedy: hmm.. which should be merged first? [19:11:48] Not sure, doing the sync and the graceful quickly after each other should be fine [19:11:56] alright, ty [19:12:04] AaronSchulz: ping me when you fix core so that I can run it. [19:12:22] binasher: \o/ mha is ready? 
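Sketch of the idea floated at 18:46 (clean temp files by walking the container listing rather than relying on the database). It assumes each listing entry carries a last-modified time. The tuple shape, the function name, and the sample dates are hypothetical, and this is not the cleanupUploadStash implementation.

```python
from datetime import datetime, timedelta


def expired_objects(listing, now, max_age):
    """Return names of listed objects last modified before `now - max_age`.

    `listing` is an iterable of (name, last_modified) tuples, where
    last_modified is a datetime (assumed shape, for illustration only).
    """
    cutoff = now - max_age
    return [name for name, last_modified in listing if last_modified < cutoff]


if __name__ == "__main__":
    sample = [
        ("5/5b/old-upload.jpg", datetime(2012, 4, 26, 12, 3, 8)),
        ("5/5b/fresh-upload.jpg", datetime(2013, 1, 17, 18, 0, 0)),
    ]
    print(expired_objects(sample, datetime(2013, 1, 17, 19, 0, 0), timedelta(hours=6)))
```

Because the decision uses only what the listing reports, a row that is missing or out of sync in the database would not keep an old file alive.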
[19:12:57] E: Couldn't find package python-mwclient [19:13:16] paravoid: binasher already briefly failed over dewiki ;) [19:13:25] oh? [19:14:02] (09:41:54 AM) binasher: !log moved dewiki db master to eqiad with mha and back to pmtpa in < 6 seconds, only one http request wanting to write hit the read-only window [19:14:43] wow. [19:14:48] paravoid: i figured if it's ok to put enwiki in read-only mode for 5 minutes at peak, no one should mind too much if i moved the dewiki master to eqiad for a few seconds [19:15:15] hehe [19:15:42] i was tempted to do every db and es shard to do the actual full migration day test, but that seemed a little too brazen even for me [19:16:11] !log aaron synchronized wmf-config/filebackend.php 'Disabled multiwrite to nas-1' [19:16:16] too brazen for binasher? I thought there was no such thing! [19:16:21] Logged the message, Master [19:16:45] notpeter: it's just the actually caring about the users bit… [19:17:09] you are well known to fight for the users. [19:17:10] if i didn't care about the users, i'd deploy services that can push upwards of 6 requests per second left and right, believe you me [19:17:35] 10.0.2.191 apache2[13025]: PHP Warning: fopen(/tmp/mw-cache-1.21wmf7/conf-enwiki) [function.fopen]: failed to open stream: Read-only file system in /usr/local/apache/common-local/wmf-config/CommonSettings.php on line 211 [19:17:39] just that box [19:19:45] ugh, installing git-core isnt even enough to be able to git clone.. missing curl [19:21:08] New patchset: MaxSem; "Enable MobileFrontend on labs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44278 [19:23:22] "Cannot get remote repository information. [19:23:26] Perhaps git-update-server-info needs to be run there? 
[19:25:41] New patchset: Asher; "deploying mha powered db datacenter failover script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44456 [19:26:00] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [19:27:16] notpeter: Does the presence of the coredb and coredb_mysql modules mean that a bunch of stuff in manifests/mysql.pp is dead code? [19:27:34] mostly. [19:28:02] I have noted the exceptions in the email I sent out recently [19:28:07] why do you ask? [19:28:27] I'm thinking about refactoring some of those classes, wondering which ones are still used. [19:28:40] which ones [19:28:41] ? [19:29:01] everything in the generic section is neutral in my books [19:29:09] the rest of it isn't worth touching, as it's nearly dead [19:29:10] notpeter: you should probably just delete the dead [19:29:20] binasher: yes... but it's not quite dead yet :) [19:29:23] Yeah, it's the generic stuff I want to fix. [19:29:24] but yeah, i can [19:29:55] andrewbogott: fine by me. although the right thing to do is to just grab the mysql module off of puppet forge and mod that as needed [19:30:03] binasher: but yes, I shall [19:30:21] notpeter: Yep, I agree. [19:30:38] So, I'll just bandaid this problem for now, and come back to this after you've scrubbed the core bits out of that file. [19:31:07] paravoid: locally, the script seems to clean everything up [19:31:29] notpeter: btw, do you remember the subject line of the email? [19:31:36] in terms of our prod es and S[1-7] dbs, you can do whatever you want after like 612 [19:31:57] pmtpa and eqiad using coredb module [19:32:22] cool, I see it. 
thanks [19:36:09] AaronSchulz: feel free to run it on hume [19:36:30] it's just odd that it does that without changing the code yet [19:36:49] I guess something different happens on prod somewhere [19:36:57] New patchset: Pyoungmeister; "lucene.php: moving en search traffic ot eqiad for java upgrades in pmtpa" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44460 [19:38:10] !log fixed eqiad root + wikiuser grants on es shards [19:38:21] Logged the message, Master [19:41:11] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44460 [19:41:52] RECOVERY - MySQL Slave Delay on es3 is OK: OK replication delay 3 seconds [19:42:10] RECOVERY - MySQL Slave Delay on es2 is OK: OK replication delay 21 seconds [19:42:10] RECOVERY - MySQL Slave Delay on es1001 is OK: OK replication delay 21 seconds [19:42:25] !log py synchronized wmf-config/lucene.php 'moving enwiki search traffic to eqiad' [19:42:35] Logged the message, Master [19:45:13] New patchset: Pyoungmeister; "lucene.php: moving all search traffic to eqiad" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44461 [19:46:30] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44461 [19:46:38] !log stopped all replication on es1 (es1-4,1001-1004) and ensured all nodes are read-only. replication was enabled there briefly for mha testing. 
[19:46:48] Logged the message, Master [19:47:09] !log py synchronized wmf-config/lucene.php 'moving all search traffic to eqiad' [19:47:19] Logged the message, Master [19:49:40] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [19:53:56] paravoid: nope wait it doesn't clear a lot of stuff even locally [19:54:06] * AaronSchulz was didn't notice a short-circuit [19:54:11] anyway, it does with the changes [19:57:37] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours [20:03:00] !log bsitu synchronized php-1.21wmf8/extensions/Echo 'Update Echo to master' [20:03:10] Logged the message, Master [20:36:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:38:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.040 seconds [20:39:37] PROBLEM - Puppet freshness on cp1026 is CRITICAL: Puppet has not run in the last 10 hours [20:51:37] PROBLEM - Varnish traffic logger on cp1043 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [20:53:25] RECOVERY - Varnish traffic logger on cp1043 is OK: PROCS OK: 3 processes with command name varnishncsa [21:10:20] !log manually deployed changes to wikibugs bot, which is in git meanwhile [21:10:31] Logged the message, Master [21:16:57] New review: Catrope; "I don't think there's much sense in doing so, because the IPs are DC-specific." 
[operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44160 [21:18:37] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [21:18:37] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [21:19:52] New patchset: Tim Starling; "Vary the Parsoid IP by datacenter" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44160 [21:19:59] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44160 [21:20:50] so, do we feel ready to have double paging yet ? [21:20:55] i mean not paging [21:20:58] irc notifications [21:21:01] want to test out icinga [21:21:50] LeslieCarr: we can switch the frack hosts to icinga if you want [21:21:55] :) [21:22:12] ok, lemme fix up the "contacts_fake" file to have the real fundraising paging info [21:22:22] ok [21:22:43] does icinga already get the payments_nsca config file? [21:23:12] Thehelpfulone: got an available help slot?:) [21:24:07] not for long today unfortunately - if it's quick, else I should be available tomorrow :) [21:24:56] 4364.. but only if you got the time. no rush at all [21:25:14] it's about telling mailman admins how to transfer to another admin [21:25:19] (and not needing ops for it) [21:26:10] ah yeah I'll take a look at that one tomorrow, "taken" on RT [21:26:26] tyvm:) [21:27:24] Jeff_Green: where would this be ? [21:27:39] just found it--/etc/icinga/nsca_payments.cfg [21:27:42] Jeff_Green: oh i need to update the file on payments as well [21:27:43] cool [21:28:16] don't update it yet, or we'll start getting paged by nagios [21:28:41] we need the nsca listener running on neon [21:28:54] New patchset: Andrew Bogott; "Compute the mysql version only if it's not specified externally." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/44503 [21:29:01] ok [21:29:33] LeslieCarr: Can you look at ^^ and make sure it still does what you needed that original patch to do? [21:31:09] ok, jeff, will you still be around in about 15 minutes ? [21:31:25] yep. [21:31:30] LeslieCarr: no rush, I need to run another test on that patch which will take 20 minutes or so [21:31:33] at lesat [21:31:35] i gotta go around 5:15 my time [21:32:12] ok [21:32:24] New review: Lcarr; "quick overview LGTM" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/44503 [21:33:19] Change merged: Dzahn; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44406 [21:33:36] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/44407 [21:35:23] New patchset: Lcarr; "snmptt restart + have fundraising page via icinga" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44515 [21:38:44] New review: Andrew Bogott; "If y'all need someone to merge this I'm happy to do so... I'm not qualified to comment as to the cor..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44157 [21:39:07] !log dzahn synchronized ./wmf-config/InitialiseSettings.php [21:39:17] Logged the message, Master [21:39:36] !log dzahn synchronized ./wmf-config/CommonSettings.php [21:39:46] Logged the message, Master [21:40:01] New patchset: Dereckson; "(no subject)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44516 [21:40:46] The page you requested was not found, or you do not have permission to view this page. wth [21:41:26] dzahn is doing a graceful restart of all apaches [21:41:31] what eqiad servers need MW apart from mw1001-1160? 
[21:41:48] !log dropping www from wikidata in mw and apache configs as requested [21:41:50] !log dzahn gracefulled all apaches [21:41:59] Logged the message, Master [21:42:08] Logged the message, Master [21:42:52] New patchset: Dereckson; "(bug 44052) Namespace configuration for sk.wikiquote" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44455 [21:43:19] New patchset: Lcarr; "snmptt restart" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44515 [21:44:50] TimStarling: I know searchidx1001 does as well [21:45:27] it is already in mediawiki-installation [21:46:08] as well as snapshot1001-1004 [21:48:59] ah, cool [21:50:14] New review: Danny B.; "Please sync. Thanks." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/44455 [21:52:25] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44515 [21:56:27] New patchset: Lcarr; "allowing irc bot to read the proper file for icinga" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44517 [21:58:25] Change merged: Dzahn; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44455 [21:59:08] !log dzahn synchronized ./wmf-config/InitialiseSettings.php [21:59:15] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44517 [21:59:18] Logged the message, Master [21:59:33] thanks dzahn :) [22:00:25] thanks mutante [22:02:06] yw yw [22:08:19] AaronSchulz: hey [22:08:22] you there? [22:12:22] notpeter: yay for no nfs :) [22:12:34] woo! 
[22:13:21] hey RobH [22:13:31] I come to you seaking boxes [22:15:20] RobH: like so: http://img.pokemondb.net/artwork/seaking.jpg [22:18:01] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44503 [22:18:10] !log reedy synchronized php-1.21wmf7/cache/interwiki.cdb 'Updating 1.21wmf7 interwiki cache' [22:18:20] Logged the message, Master [22:18:27] !log reedy synchronized php-1.21wmf8/cache/interwiki.cdb 'Updating 1.21wmf8 interwiki cache' [22:18:37] Logged the message, Master [22:20:34] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42786 [22:21:03] New patchset: Tim Starling; "Network-aware scap" [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/44520 [22:22:27] Reedy: you there? [22:22:33] Ya [22:23:12] you made: https://rt.wikimedia.org/Ticket/Display.html?id=4216 I aksed a question, it never got asnwered. I think that tim took care of it on 2013-01-16. can you confirm? [22:23:49] yeah, let me check [22:25:50] Reedy: https://gerrit.wikimedia.org/r/#/c/44523/ [22:25:55] hume and fenari aren't consistent with thumbs [22:26:05] RECOVERY - Puppet freshness on ms-fe1001 is OK: puppet ran at Thu Jan 17 22:26:00 UTC 2013 [22:26:27] fenari: [22:26:28] ms5.pmtpa.wmnet:/export/thumbs on /mnt/thumbs type nfs (rw,bg,soft,tcp,rsize=8192,wsize=8192,intr,nfsvers=3,addr=10.0.0.252) [22:26:28] nas1-a.pmtpa.wmnet:/vol/thumbs on /mnt/thumbs2 type nfs (rw,bg,intr,addr=10.0.0.253) [22:26:31] hume: [22:26:43] oh, wait [22:27:03] ms5.pmtpa.wmnet:/export/thumbs is missing from hume [22:27:39] notpeter: what for? [22:27:48] I added upload mounts on hume just two days ago [22:28:05] I guess there was a class I missed [22:28:18] TimStarling: cool, thank you. 
i wasn't sure what all was needed [22:28:21] Reedy: meh, nothing should touch those now [22:28:31] I don't know why we can't just includes those mounts on all MediaWiki servers [22:28:46] Reedy: I was going to say "I guess I'll include those, but I guess now I don't have to per AaronSchulz :) [22:28:58] RobH: tmh [22:29:12] do we have 2 boxes in eqiad that are roughly equiv to tmh1/2 ? [22:29:13] Thanks [22:30:04] TimStarling: I have no opinion on this one way or the other, personally. [22:34:53] RobH: although, temporary, different spec boxes would do, too [22:40:20] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [22:40:20] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours [22:40:20] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [22:40:20] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours [22:41:48] notpeter: ahh, woosters pinged me about this [22:41:52] lemme take a gander [22:41:56] sweet! thanks [22:42:30] paravoid: have you edited PrivateSettings.php before? [22:43:11] New patchset: Tim Starling; "Moved scap scripts to their own directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44525 [22:43:11] New patchset: Tim Starling; "Network-aware scap" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44526 [22:43:55] notpeter: not only do i have ones like it, it appears tmh1/2 are high performance misc servers, r610 [22:44:00] i have two of them in eqiad now. [22:44:09] we can rename them from elements to tmh1001/1002 [22:44:16] notpeter: these would be permanent use right? [22:44:25] I would assume so? [22:44:31] so, yeah sounds good to me! [22:44:36] awesome! [22:44:49] RobH: you renamed praseodymium, right? 
[22:45:25] yeah, to titanium [22:46:07] notpeter: graphite seems no worky, the graphs have nothing [22:46:16] notpeter: no [22:46:18] not yet [22:46:21] titanium is another machine [22:46:24] ah, ok [22:46:47] AaronSchulz: ok, gimme a sec and I'll poke around at them [22:46:54] for some unknown reason praeodsfuipeafu3eqr3 is still fucking up bast1001's puppet [22:47:04] LeslieCarr: I'm looking at that [22:47:38] nice nick [22:47:40] add a cname [22:48:09] ah I was gonna do that tomorrow on awakign (I notivced it this evening after clocking out) [22:48:27] apergos: I suggest damnyourob.wikimedia.org CNAME praesodfddkajfdkljlfasdjfadklf.wikimedia.org [22:48:42] hee hee [22:49:27] I swear this is just because he misses eiximenis [22:49:39] notpeter: i have no freaking clue what's up [22:49:57] praesodymium (sp?) is in a really f-ing weird state [22:50:01] notpeter: i tried puppetstoredconfigclean on bast and praji4oh3rhifdsifjda [22:50:07] New review: Tim Starling; "Deployed already." [operations/debs/wikimedia-task-appserver] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/44520 [22:50:07] Change merged: Tim Starling; [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/44520 [22:50:14] Last I tried to work with it, ssh would kick me out every minute, and the SSH host key would change every minute [22:50:19] LeslieCarr: I think that it's a duplicate exported resource def [22:50:23] I'm poking around on db9 [22:50:32] So every time I tried to reconnect after being kicked out, I'd get a warning about how the host key changed [22:50:53] anyone want to do a scap any time in the next half hour or so? 
[22:51:14] * AaronSchulz just wants to sync-file [22:51:17] because I want to test my changes and I'll probably break it for a while [22:51:28] sync-file should still work [22:52:07] Mobile might want to soon [22:52:10] awjr: MaxSem ^ [22:52:33] TimStarling yeah, we will be scapping [22:52:39] probably in 30-60mins [22:52:44] Reedy, TimStarling, though probably a bit later [22:52:59] actually yeah, probably closer to 60+ mins [22:53:04] !log aaron synchronized php-1.21wmf8/includes/Block.php 'deployed d69578ee88a2542cbaec3bd7904026a4edba86b1' [22:53:04] if history is any guide. [22:53:13] Logged the message, Master [22:53:24] will you be changing the NFS code before then? [22:53:32] I will need to do my own scap to test it [22:54:00] if anyone is planning to scap anytime sooner, please tell me now because we are about to start our deployment window which will involve having code sit on fenari for a while while we test on testwiki [22:54:08] TimStarling now, just MobileFrontend changes [22:54:12] s/now/no [22:54:25] So you mean yes [22:54:27] I'll wait [22:54:30] Because MobileFrontend lives on NFS [22:54:37] but i am not changing nfs code... [22:54:40] oh i see what you mean [22:54:45] the code on NFS [22:54:49] sorry, i misunderstood - yeah [22:55:58] RobH: another question: re: https://rt.wikimedia.org/Ticket/Display.html?id=4343 [22:56:05] msfe1002 and msbe1002 are both no more? [22:56:14] mark: when can we pop the nfs champagne? [22:56:51] notpeter: they are no more, but they are not similar to tmh1/2 [22:57:01] renamed them back to their original element names [22:57:02] RobH: cool. [22:57:05] thanks! [22:58:18] RobH: the numbering is inconsistent in that ticket [22:58:25] can you tell me which names have been reverted? [22:58:35] msfe1002, and.... [22:58:47] !log olivneh synchronized php-1.21wmf7/extensions/EventLogging [22:58:49] did we have an msfe1000? [22:58:57] Logged the message, Master [22:59:42] New patchset: Lcarr; "doh, 0, not 24 for cron!" 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/44533 [23:00:28] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44533 [23:00:38] RoanKattouw: uhh, were you still using cadmium? [23:00:41] cuz i need it back. [23:00:51] (was temp transcoding, then you guys needed it for something else) [23:01:23] notpeter: how soon do you need both tmh1001/1002? [23:01:29] uh [23:01:30] I mean [23:01:38] asap i assume ;] [23:01:38] for eqiad cutover would be good [23:01:47] so if you see roan wandering around afk [23:01:51] ask him about cadmium, as i want it back [23:01:56] and he had it on a very temp loan [23:02:03] (i wanna make sure me taking it wont lose any data) [23:02:26] TimStarling: 10.0.2.191 apache2[1930]: PHP Warning: fopen(/tmp/mw-cache-1.21wmf7/conf-enwiki) [function.fopen]: failed to open stream: Read-only file system in /usr/local/apache/common-local/wmf-config/CommonSettings.php on line 211 [23:02:34] not sure what is up with that [23:02:35] RobH: he's working from elsewhere today [23:02:42] New patchset: Lcarr; "forgot hour 20" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44535 [23:02:43] RobH: I'm at the Wikia office today [23:02:51] usually means the server has to be powered down [23:02:53] RobH: cadmium is all yours [23:03:11] LeslieCarr: puppet seems to be working on fenari. probably also bast1001 [23:03:16] (am starting a run now) [23:03:37] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Thu Jan 17 23:03:29 UTC 2013 [23:03:48] what'd you fix up notpeter ? [23:03:49] RoanKattouw: awesome thx! [23:03:54] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44535 [23:03:57] curious minds want to know [23:04:14] LeslieCarr: http://24.media.tumblr.com/tumblr_mc6pgbPWf81reby8oo1_400.gif [23:04:24] !log olivneh synchronized php-1.21wmf8/extensions/EventLogging [23:04:33] Logged the message, Master [23:04:38] but srsly. 
I looked at host table and resource table in puppet db on db9 [23:04:47] strange to have /tmp on a separate partition [23:04:53] saw that there were two hosts with that ip [23:04:57] one was msfe1002 [23:05:00] which is no more [23:05:08] re: the ticket that I asked rob about above [23:05:08] it used to be common to put /tmp on a ramdisk, back when creating a file on disk took about a second [23:05:11] well, not strange in general [23:05:23] it's on ext4 [23:05:26] hehehe [23:05:30] it still had an sshhostkey in puppet.resources [23:05:34] deleted that [23:06:09] no dupe hostkeys now, is fine [23:06:10] RECOVERY - Puppet freshness on bast1001 is OK: puppet ran at Thu Jan 17 23:05:56 UTC 2013 [23:06:45] anyway, since it's not the root partition, I can probably just fix it [23:08:02] ok, who's next [23:08:04] !log fixing broken /tmp partition on srv191. Stopping apache [23:08:08] AaronSchulz: graphite borken? [23:08:13] Logged the message, Master [23:08:23] I think so [23:12:42] AaronSchulz: I see graphs now [23:12:46] can you confirm? [23:12:49] !log on srv191: wiped /tmp and started apache [23:12:58] Logged the message, Master [23:13:23] oh, hrm [23:13:24] maybe not [23:17:14] Reedy: https://gerrit.wikimedia.org/r/#/c/44537/ [23:17:34] RECOVERY - swift-account-reaper on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:21:10] !log authdns-update [23:21:19] Logged the message, RobH [23:23:07] PROBLEM - swift-account-reaper on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:25:22] RobH: can you start those reinstalling when you're done? [23:25:44] I can if you don't want to [23:26:48] in process of doing so [23:26:58] * RobH was swapping vlans on them [23:27:29] woo!
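(Editor's note on the duplicate-host fix discussed above: a minimal sketch, assuming the host records can be read out as (name, ip) pairs. The actual puppet db schema on db9 is not shown in the log, and the host names and addresses below are hypothetical placeholders, not the real entries.)

```python
from collections import defaultdict


def find_duplicate_ips(hosts):
    """Return {ip: [hostnames]} for every IP claimed by more than one host."""
    by_ip = defaultdict(list)
    for name, ip in hosts:
        by_ip[ip].append(name)
    # Keep only addresses that appear on two or more host records.
    return {ip: sorted(names) for ip, names in by_ip.items() if len(names) > 1}


# Hypothetical rows: a decommissioned host whose record was never removed,
# the live host that inherited its address, and an unrelated host.
hosts = [
    ("old-host", "192.0.2.10"),
    ("new-host", "192.0.2.10"),
    ("other-host", "192.0.2.11"),
]

duplicates = find_duplicate_ips(hosts)
print(duplicates)
```

(Any address that shows up here points at a stale record, such as a leftover ssh host key resource, that would need cleaning up before the live host can check in cleanly.)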
[23:27:30] thank you [23:31:19] TimStarling: i'll be scapping in just a few minutes [23:31:30] ok [23:33:01] !log rolled wikibugs back to last working version from SVN ..Revision 93776 [23:33:04] New patchset: Pyoungmeister; "adding tmh boxes to eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44541 [23:33:11] Logged the message, Master [23:34:04] Reedy: Are you still up? [23:34:05] notpeter: cya :) [23:34:24] What do you think? ;) [23:34:27] haha [23:34:31] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44452 [23:35:05] Reedy: Could you do an urgent-ish config change for nlwikivoyage? [23:35:09] mutante: wouldn't it be a good idea to get rid of that box if you can't even git clone on it asap? [23:35:21] Their patrol system is getting overwhelmed and they want to get it fixed before the Lockout of Doom next week due to the eqiad migration [23:35:27] haha [23:35:42] Few days behind enwikivoyage it seems [23:35:52] TimStarling: did you see https://gerrit.wikimedia.org/r/#/c/43395/ ? [23:35:59] /msg Romaine and he'll explain [23:35:59] RoanKattouw: What do they want doing? [23:36:05] I figured you were probably the only other person with ceph setup [23:36:09] They want a patrolled group, I think [23:36:15] *patroller [23:36:18] Or something along those lines [23:36:42] But talk to Romaine for the exact specifics [23:36:47] New patchset: RobH; "adding tmh1001 & tmh1002" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44542 [23:37:13] Reedy: Actually, ignore what I said re what he wants, he filed https://bugzilla.wikimedia.org/show_bug.cgi?id=44082 [23:37:15] AaronSchulz: is this something we need for eqiad? [23:37:22] notpeter: https://gerrit.wikimedia.org/r/#/c/44542/1 [23:37:29] wanna review my change so i dont self review? 
[23:37:54] TimStarling: not for the short-term switchover since we will still use swift [23:38:16] though eventually when we use ceph, yes [23:40:23] New review: RobH; "self review never backfires" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/44542 [23:40:24] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44542 [23:41:01] anyone knows who last deployed wikibugs before? [23:43:33] RobH: you need a bank of comments [23:43:45] AaronSchulz: yep [23:44:52] New patchset: Reedy; "Bug 44082 - patrol extension for Dutch Wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44544 [23:45:00] AzaToth: yes, there is just a tiny problem. it's the mail server [23:45:25] icinga-wm - why do you hate me ? [23:45:30] why do you not read from irc.log ? [23:46:58] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44544 [23:46:59] perhaps because my tests have devolved into strings of curses [23:47:41] oh [23:47:58] !log reedy synchronized wmf-config/InitialiseSettings.php [23:48:07] Logged the message, Master [23:48:11] AaronSchulz: so it looks like the S3 API does support a kind of temporary URL [23:48:37] PROBLEM - Host cadmium is DOWN: PING CRITICAL - Packet loss = 100% [23:48:44] well done for working out what to put inside that hash_hmac() [23:48:55] PROBLEM - Host yttrium is DOWN: PING CRITICAL - Packet loss = 100% [23:50:32] !log awjrichards Started syncing Wikimedia installation...
: Updating MobileFrontend per https://www.mediawiki.org/wiki/Extension:MobileFrontend/Deployments/2013-01-17 [23:50:42] Logged the message, Master [23:51:20] New patchset: RobH; "renaming old servers, added to decom to remove from nagios" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44545 [23:51:55] PHP Warning: file_put_contents(/home/wikipedia/common/wmf-config/ExtensionMessages-1.21wmf8.php): failed to open stream: Permission denied in /home/wikipedia/common/php-1.21wmf8/maintenance/mergeMessageFileList.php on line 119 [23:51:55] Warning: file_put_contents(/home/wikipedia/common/wmf-config/ExtensionMessages-1.21wmf8.php): failed to open stream: Permission denied in /home/wikipedia/common/php-1.21wmf8/maintenance/mergeMessageFileList.php on line 119 [23:52:12] New review: RobH; "this isnt the code review you are looking for" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/44545 [23:52:12] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44545 [23:52:19] noticed that just before updating localisation cache for 1.21wmf8 during scap ^^ [23:53:05] New patchset: Dzahn; "add redirect for www.wikidata to wikidata" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/44546 [23:54:43] did you enable a new extension? [23:54:46] RECOVERY - Host yttrium is UP: PING OK - Packet loss = 0%, RTA = 26.49 ms [23:54:47] no [23:55:27] New review: Dzahn; "bug 41847" [operations/apache-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/44546 [23:55:28] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/44546 [23:55:58] it shouldn't be a problem then [23:56:06] sweet thanks [23:56:09] oh, except that it would have aborted the LC update [23:57:39] hmm it wrapped up with Updating LocalisationCache for 1.21wmf8... done [23:57:53] before starting to copy stuff to the apaches; should i be worried?
[23:58:08] we did have new english messages going out, hence the scap [23:58:18] New patchset: Dzahn; "add ServerAlias for wikidata (bug 41847)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/44547 [23:58:32] I guess if it showed that message then it should be ok [23:58:46] PHP must have given a zero exit status [23:58:57] New patchset: Dzahn; "add ServerAlias for wikidata (bug 41847)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/44547 [23:58:59] PROBLEM - SSH on yttrium is CRITICAL: Connection refused [23:59:10] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/44547 [23:59:25] notpeter: so tmh1001 and tmh1002 are mid install [23:59:36] ok, it's not a huge deal if things are slightly wonky on mw.o and test/test2 - im pretty sure the new messages were only in the 'beta' version of MobileFrontend anyway, which is expected to have the occasional minor issue and i think it's not used much on mw.o [23:59:42] we can clean it up later [23:59:48] will be handing to you shortly, cuz going to look at new place in a bit
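(Editor's note on the "zero exit status" remark above: a minimal sketch, assuming a deploy wrapper that decides whether to continue purely from a child process's exit status. The scap script itself is not shown in the log; `run_step` and the two snippets are hypothetical stand-ins.)

```python
import subprocess
import sys


def run_step(code):
    """Run a Python snippet as a child process; return (exit_status, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr.strip()


# Hypothetical step that prints a warning but still exits normally.
warn_only = "import sys; sys.stderr.write('Warning: permission denied\\n')"
# Hypothetical step that fails hard with a non-zero exit status.
fatal = "import sys; sys.stderr.write('Fatal error\\n'); sys.exit(1)"

status, err = run_step(warn_only)
fatal_status, fatal_err = run_step(fatal)

# A wrapper that only checks the exit status carries on after the warning
# and stops only after the fatal case.
print(status, err)
print(fatal_status, fatal_err)
```

(This is why seeing "done" after a warning suggests the step was not aborted: a non-fatal warning leaves the exit status at zero, so a status-only check lets the run continue.)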