[00:01:42] wikidata status update for anyone interested: the migration to the S5 shard is done on 6 out 8 effected db servers. i expect to be fully done in < 45min [00:03:54] binasher: thanks for the updates :) [00:04:01] ping robla : see above wikidata status [00:04:27] Ryan_Lane: wanna review https://gerrit.wikimedia.org/r/#/c/32924/ ? [00:04:40] you're more authoritative on that area :) [00:04:56] lemme see [00:05:23] I'm pretty sure that file should just get deleted [00:05:26] not renamed [00:05:41] Reedy: ^^ [00:05:47] heh [00:06:09] I'll do that then [00:07:00] New patchset: Reedy; "Delete *.wikimedia.org.crt" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/32924 [00:14:09] I'm getting really annoyed by having copies of configs between nagios and icinga [00:14:24] New patchset: Asher; "moving wikidatawiki to s5, disabling wgReadOnly" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50127 [00:16:00] New patchset: Faidon; "Fix check_solr for Nagios, not just Icinga" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50129 [00:16:23] New review: Asher; "Patch Set 1: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50127 [00:16:24] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50127 [00:17:38] !log asher synchronized s3.dblist 'removing wikidatawiki' [00:17:40] Logged the message, Master [00:17:48] New review: Faidon; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50119 [00:17:57] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50119 [00:18:10] !log asher synchronized s5.dblist 'adding wikidatawiki' [00:18:11] Logged the message, Master [00:18:12] New review: Faidon; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50129 [00:18:18] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50129 [00:18:27] binasher: sync-dblist ;) [00:19:36] New review: Ryan Lane; "Patch Set 7: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/32924 [00:19:45] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/32924 [00:20:10] woosters: thanks for the ping [00:20:58] !log asher synchronized wmf-config 'enabling wikidatawiki on shard s5' [00:20:59] Logged the message, Master [00:21:26] Wheee [00:21:28] binasher: is that it? [00:21:39] New patchset: Reedy; "Add enwiki to wikidata dblist" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50130 [00:22:13] * robla prepares "w00t" and other celebratory comments [00:22:33] New review: Reedy; "Patch Set 1: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50130 [00:22:34] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50130 [00:22:39] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 207 seconds [00:22:49] robla: should be! wikidata db writes are flowing to the s5 master and so far, everything looks good [00:22:54] Reedy: so, those jobqueue alerts... [00:22:55] ignore them? [00:22:57] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 214 seconds [00:23:09] Err [00:23:15] excellent, thanks binasher! w00t! 
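The shard move above comes down to editing two plain-text dblist files (one wiki name per line) and syncing them out — wikidatawiki is dropped from s3.dblist and appended to s5.dblist. A rough sketch of that edit, assuming only the one-name-per-line format; the actual deployment used sync-file / sync-dblist rather than an ad-hoc script:

```python
def move_wiki(wiki, src_dblist, dst_dblist):
    # e.g. move_wiki("wikidatawiki", "s3.dblist", "s5.dblist")
    with open(src_dblist) as f:
        keep = [w for w in f.read().split() if w and w != wiki]
    with open(src_dblist, "w") as f:
        f.write("\n".join(keep) + "\n")
    with open(dst_dblist, "a") as f:
        f.write(wiki + "\n")
```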
\o/ [00:23:25] https://www.wikidata.org/wiki/Special:RecentChanges [00:23:27] :D [00:23:34] going to watch a bit longer but yup, still looks good [00:23:34] paravoid: Which alerts? [00:23:50] the nagios ones, we were looking at them the other day [00:23:54] you verified they're true [00:24:04] "w00t" :) [00:24:13] JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , lmowiki (11414), svwiki (63550), Total (90760) [00:24:17] Oh [00:24:21] JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (661642), lmowiki (11414), svwiki (63552), Total (750686) [00:24:26] apparently different, no idea why [00:24:30] the fist writes to immiediately start flowing are related to the job runner / update client (i.e. UPDATE /* Wikibase\DispatchChanges::trySelectClient */ `wb_changes_dispatch` SET chd_site = 'enwiki',chd_db = 'enwiki',chd_seen = '6148311',chd_touched = '20130221002303',chd_lock = 'enwiki.WikiBase.dispatchChanges',chd_disabled = '0' WHERE chd_site = 'enwiki') [00:24:52] hmmm [00:25:00] binasher: That seems to be just over 3GB smaller now.. [00:25:03] maybe we should have stopped the cronjobs [00:25:08] lolol [00:25:21] let's see if they pickup again [00:25:44] 20.986434936523 vs 24.061431884766 earlier [00:25:55] woo [00:25:57] nice [00:25:59] or there goes some data ;) hahaha [00:27:29] having explicitly defined small primary keys is a good thing, or innodb makes ones up that are unusable and generally take up a lot more more space per row than an auto-inc int would [00:27:33] did the changes table get pruned? [00:27:44] there is a cron job for that [00:27:46] dump/load always uses space more efficiently too [00:28:00] aude: nothing was explicitly pruned [00:28:15] binasher: that's fine but should automatically happen like once a day [00:28:18] hmm, job runners should probably be restarted though [00:28:19] makes stuff smaller [00:28:23] yeah [00:28:25] paravoid: But yes, those numbers are pretty useless now we keep old jobs around and prune them at some later point [00:28:37] uhm [00:28:47] so? remove the checks? [00:28:50] old/failed [00:28:54] it'd be nice to have /some/ check [00:29:01] Indeed, and before it worked fine (mostly) [00:29:28] paravoid: We need to do something with job_attempts [00:29:58] just curious, did we do rebuild the term search key? or add that column [00:30:05] (assume they can be done later) [00:30:18] the column with the OSC thing [00:30:27] AaronSchulz: ^ Should we count where job_attempts = 0 for the job queue counts? (I know, it's not indexed) [00:30:28] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 184 seconds [00:30:34] and rebuild does not require read only, i think [00:30:41] Reedy: what counts? [00:30:47] depends what you want to count [00:30:53] JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (661642), lmowiki (11414), svwiki (63552), Total (750686) [00:31:07] | 109017665 | 20130213224350 | refreshLinks | 3 | [00:31:07] | 109018333 | 20130213224918 | refreshLinks | 3 | [00:31:18] Which somewhat articifically inflate the count that we care about.. 
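The refreshLinks rows quoted just above carry job_attempts = 3, i.e. they are finished or failed jobs kept around for later pruning, which is what inflates the raw COUNT(*). A minimal sketch of the narrower count being discussed here, assuming direct access to a slave and that unclaimed rows are the ones whose job_token has not been set (the column Reedy and Aaron bring up); as noted later in the conversation, that column is unindexed, so the query still scans the table much like the plain count did (~1.7 s on enwiki):

```python
import MySQLdb  # host/user/password below are placeholders

def unclaimed_jobs(wikidb):
    conn = MySQLdb.connect(host="db-slave.example", user="ro", passwd="...", db=wikidb)
    cur = conn.cursor()
    # Count only rows no runner has claimed yet; claimed/finished rows keep a
    # job_token, so (per the discussion above) they are excluded here.
    cur.execute("SELECT COUNT(*) FROM job WHERE job_token = ''")
    return cur.fetchone()[0]
```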
[00:31:39] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 193 seconds [00:31:39] having tonnes of ones with 0 count would suggest new jobs possibly not being processed [00:32:43] the check calls extensions/WikimediaMaintenance/getJobQueueLengths.php [00:32:47] fwiw [00:33:04] Yeah [00:33:19] heh, I bet that won't work with redis :) [00:33:21] and fires off if at least one is > 10k [00:33:31] one wiki [00:34:46] !log moved wikidatawiki tables on s3 to wikidatawiki_old to keep around for a time just in case; then "drop database wikidatawiki" [00:34:47] Logged the message, Master [00:35:00] :) [00:35:10] just in case..... [00:35:36] That means they'll stay for everer [00:35:56] would everything be up to date on the toolserver once replication catches up? [00:36:01] or do they have to make changes as well? [00:36:18] They'll need to make changes [00:36:37] AaronSchulz: what's the lightest effective check we can do to make sure the job queue isn't spiralling out of control? [00:36:38] duh: looks like changes are needed there [00:36:52] ok [00:36:52] they need to change where they're replicating from [00:37:13] Reedy: is it doing COUNT() now? [00:37:22] Yeah [00:37:29] * aude no idea how toolserver replication works [00:37:51] i asked dab in -toolserver [00:37:56] Reedy: so it can do the same thing JobQueueDB::doGetSize does [00:37:56] ah, okay [00:38:47] Which is? [00:39:03] exclude rows with job_token set? [00:39:06] We aren't filtering by type.. [00:39:20] unindexed [00:39:29] isn't this scanning everything anyway? [00:40:06] 1 row in set (1.73 sec) on enwiki [00:40:07] meh, WFM [00:40:15] it's not like this is myisam and COUNT(*) was fast [00:45:29] just for curiousity can someone do show create table for wb_terms in wikidata? [00:45:45] * aude just wonders what, if anything remains to do [00:45:56] since i can't see on toolserver [00:47:30] aude: http://p.defau.lt/new.html [00:47:34] http://p.defau.lt/?wuvc0ouLdZ3NmR_e6YowkQ [00:47:49] awesome [00:48:10] can you tell if the term search key got populated or not? [00:48:18] not a big deal either way at this point [00:48:22] I'll have a look in a minute [00:48:30] thanks [00:48:39] it looks good though [00:48:41] !log reedy synchronized php-1.21wmf10/extensions/WikimediaMaintenance/ [00:48:42] Logged the message, Master [00:48:46] RobH: mw85-mw125 yours? [00:49:26] what they do? [00:49:28] !log reedy synchronized php-1.21wmf9/extensions/WikimediaMaintenance/ [00:49:29] they should be working. [00:49:29] Logged the message, Master [00:49:34] mw125: rsync: mkdir "/apache/common-local/php-1.21wmf10/extensions/WikimediaMaintenance" failed: No such file or directory (2) [00:49:34] mw125: rsync error: error in file IO (code 11) at main.c(605) [Receiver=3.0.9] [00:49:44] Fine for php-1.21wmf9 though... [00:49:44] is that on all of them ? [00:49:51] paravoid: The check should be of some use now [00:49:52] or just the one? [00:50:01] i did a bunch of syncs, but not a scap, bleh. [00:50:08] sync-common locally [00:50:20] down to mw82 [00:50:21] mw110: ssh: connect to host mw110 port 22: Connection refused [00:50:21] Reedy: wow, thanks :) [00:50:23] ^ that's dead [00:50:24] Reedy: so the entire range is throwing errors? 
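For reference, the check_job_queue alert discussed a few minutes earlier wraps the maintenance script named above and goes critical as soon as any single wiki crosses the 10,000-job line. A sketch of that threshold logic, assuming the script prints one "wiki count" pair per line — the real output format may differ:

```python
import subprocess, sys

THRESHOLD = 10000

def check_job_queue():
    out = subprocess.check_output(
        ["php", "extensions/WikimediaMaintenance/getJobQueueLengths.php"])
    offenders = []
    for line in out.decode().splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit() and int(parts[1]) > THRESHOLD:
            offenders.append("%s (%s)" % tuple(parts))
    if offenders:
        print("JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: "
              + ", ".join(offenders))
        return 2  # Nagios CRITICAL
    print("JOBQUEUE OK - all job queues below 10,000")
    return 0      # Nagios OK

if __name__ == "__main__":
    sys.exit(check_job_queue())
```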
[00:50:32] i know mw110 is dead [00:50:33] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [00:50:35] there is a note on it [00:50:45] looks not populated, as far as i can tell from editing properties on wikidata [00:50:46] Yup, without checking every number, yes [00:50:51] Reedy: but now you have me paranoid, so they are all bad [00:50:56] ok, let me pull them out of pybal. [00:51:01] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [00:51:10] Like I say, seem ok for php-1.21wmf9, but not for php-1.21wmf10 [00:51:21] So that means? [00:51:29] I would imagine it means they shouldnt be serving the site. [00:51:47] Yeah [00:51:54] They can serve wikipedias fine, the rest not [00:51:55] aude: '' [00:52:24] aude: We can run the script for that tomorrow (well, today, post sleep) if you want [00:52:57] Reedy: ok [00:53:11] And anything else that needs tidying up [00:53:19] !log issues with new mw servers 86+, took back out of pybal until i can troubleshoot [00:53:20] Logged the message, RobH [00:53:23] Reedy: Thanks for the spot =] [00:53:35] I'll look into it and get them back to working [00:53:37] well i'm travelling tomorrow and friday, but if daniel k is around (in case of any problems, unlikely) [00:53:39] then that's fine [00:54:01] RobH: Everything else looks fine, just looks like they're out of date [00:54:21] i just copied them to live today [00:54:24] :/ [00:54:28] and they ran the normal syncs, but bleh [00:54:33] so you ran sync-common and no go [00:54:41] i wonder whats up with them [00:54:49] reedy@mw113:/usr/local/apache/common$ sync-common [00:54:50] Copying to mw113 from 10.0.5.8... [00:54:50] * Reedy waits [01:01:12] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 186 seconds [01:21:32] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [01:23:03] RECOVERY - MySQL disk space on neon is OK: DISK OK [01:23:29] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 30 seconds [01:24:59] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [01:41:56] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours [01:58:35] RECOVERY - mysqld processes on db1030 is OK: PROCS OK: 1 process with command name mysqld [01:59:29] RECOVERY - mysqld processes on db1031 is OK: PROCS OK: 1 process with command name mysqld [02:02:38] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 190 seconds [02:02:56] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 194 seconds [02:03:59] PROBLEM - mysqld processes on db1030 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [02:04:53] PROBLEM - mysqld processes on db1031 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [02:04:54] PROBLEM - Puppet freshness on db40 is CRITICAL: Puppet has not run in the last 10 hours [02:12:03] RECOVERY - mysqld processes on db1031 is OK: PROCS OK: 1 process with command name mysqld [02:12:57] RECOVERY - mysqld processes on db1030 is OK: PROCS OK: 1 process with command name mysqld [02:23:54] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [02:25:33] New patchset: Andrew Bogott; "Turn manage-volumes into a daemon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49916 [02:26:07] New review: Andrew Bogott; "Patch Set 4:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49916 
[02:28:05] New patchset: Asher; "new extension1 shard (as an externalload config) - initially for AFTv5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50139 [02:28:21] !log LocalisationUpdate completed (1.21wmf10) at Thu Feb 21 02:28:20 UTC 2013 [02:28:25] Logged the message, Master [02:28:44] New review: Asher; "Patch Set 1: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50139 [02:28:46] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50139 [02:30:13] !log asher synchronized wmf-config/db-eqiad.php 'new extension1 shard (as an externalload config) - initially for AFTv5' [02:30:15] Logged the message, Master [02:30:58] !log asher synchronized wmf-config/db-pmtpa.php 'new extension1 shard (as an externalload config) - initially for AFTv5' [02:31:00] Logged the message, Master [02:35:18] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [02:35:18] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [02:54:17] !log LocalisationUpdate completed (1.21wmf9) at Thu Feb 21 02:54:17 UTC 2013 [02:54:19] Logged the message, Master [03:02:54] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [03:06:03] RECOVERY - MySQL disk space on neon is OK: DISK OK [03:06:39] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [03:16:54] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 181 seconds [03:18:24] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 199 seconds [03:50:48] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [03:51:15] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [03:56:30] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [03:57:06] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [04:05:39] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [04:15:42] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [04:17:39] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [04:19:45] PROBLEM - Puppet freshness on labstore3 is CRITICAL: Puppet has not run in the last 10 hours [04:47:04] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [04:48:34] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [05:20:37] RECOVERY - MySQL disk space on neon is OK: DISK OK [05:21:04] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [05:25:32] New patchset: Tim Starling; "Move all favicons to bits" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49802 [05:29:17] New review: Tim Starling; "Patch Set 2:" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49802 [05:49:17] New patchset: Tim Starling; "Use a cgroup for command execution" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49000 [05:49:26] New review: Tim Starling; "Patch Set 2: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/49000 [05:49:27] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49000 [05:50:42] !log tstarling 
synchronized wmf-config/CommonSettings.php 'shell cgroup' [05:50:45] Logged the message, Master [05:53:01] PROBLEM - Puppet freshness on cp1035 is CRITICAL: Puppet has not run in the last 10 hours [05:56:55] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [05:57:22] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [06:08:28] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [06:10:07] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [06:11:33] New patchset: Tim Starling; "Increase $wgMaxImageArea to 75 Mpx" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50149 [06:11:46] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [06:13:26] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 9.020 second response time on port 8123 [06:18:59] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out [06:20:28] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.027 second response time on port 8123 [06:28:09] RECOVERY - MySQL disk space on neon is OK: DISK OK [06:28:21] TimStarling: hadn't noticed the cgroup changes, they look great [06:28:36] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [06:28:38] from a glance [06:28:54] PROBLEM - Puppet freshness on sq71 is CRITICAL: Puppet has not run in the last 10 hours [06:29:26] yeah, it should give us a few benefits beyond just avoiding imagemagick deadlocks [06:29:45] limiting on vmsize was getting tedious [06:30:07] since adding a library or deploying a new binary that uses lots of libraries would throw out the estimate [06:30:36] that was one of the problems with lilypond deployment, come to think of it -- massive vsize usage [06:31:25] yeah, even for avconv we were inflating the limits because of vsize [06:31:41] https://bugzilla.wikimedia.org/show_bug.cgi?id=43188 I guess this can be closed? [06:32:56] yes [06:33:41] done [06:34:35] oh was about to too :) [06:35:05] we should merge https://gerrit.wikimedia.org/r/#/c/38307/ too... 
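The "shell cgroup" change synced above replaces per-process vsize limits with a cgroup memory limit, which is what makes the limit independent of how many shared libraries a binary happens to map. A minimal sketch of the underlying mechanism using the cgroup-v1 memory controller; the cgroup path and limit below are made up, and MediaWiki's real implementation lives in its shell-exec wrapper rather than a standalone script like this:

```python
import os, subprocess

CGROUP = "/sys/fs/cgroup/memory/mediawiki/shellexec"  # hypothetical cgroup path

def run_limited(cmd, mem_bytes):
    os.makedirs(CGROUP, exist_ok=True)
    with open(os.path.join(CGROUP, "memory.limit_in_bytes"), "w") as f:
        f.write(str(mem_bytes))  # limits actual memory use, not address space
    def enter_cgroup():
        # Runs in the child between fork and exec: join the cgroup first.
        with open(os.path.join(CGROUP, "tasks"), "w") as f:
            f.write(str(os.getpid()))
    return subprocess.call(cmd, preexec_fn=enter_cgroup)

# run_limited(["convert", "in.tif", "out.jpg"], 400 * 1024 * 1024)
```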
[06:35:15] oh you're not a reviewer there [06:35:26] it's about apparmor, you might be interested :) [06:37:54] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours [07:12:24] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [07:17:30] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 3.033 second response time on port 8123 [07:26:57] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [07:28:16] New review: Faidon; "Patch Set 4:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42791 [07:28:36] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 9.020 second response time on port 8123 [07:40:39] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [07:40:39] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [07:40:48] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [07:45:54] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123 [07:58:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:01:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.846 seconds [08:11:15] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [08:11:42] RECOVERY - MySQL disk space on neon is OK: DISK OK [08:29:06] PROBLEM - MySQL Slave Delay on db46 is CRITICAL: CRIT replication delay 191 seconds [08:29:15] PROBLEM - MySQL Replication Heartbeat on db46 is CRITICAL: CRIT replication delay 195 seconds [08:29:51] PROBLEM - MySQL Replication Heartbeat on db1022 is CRITICAL: CRIT replication delay 206 seconds [08:30:18] PROBLEM - MySQL Slave Delay on db1022 is CRITICAL: CRIT replication delay 219 seconds [08:47:25] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , frwiki (1892892), Total (1896226) [08:47:51] RECOVERY - MySQL Replication Heartbeat on db1022 is OK: OK replication delay 0 seconds [08:48:18] RECOVERY - MySQL Slave Delay on db1022 is OK: OK replication delay 0 seconds [08:48:54] RECOVERY - MySQL Slave Delay on db46 is OK: OK replication delay 28 seconds [08:49:03] RECOVERY - MySQL Replication Heartbeat on db46 is OK: OK replication delay 21 seconds [08:49:57] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , frwiki (1928065), Total (1930900) [09:15:30] mark: are you around by any chance to get some changes reviewed ? :-D [09:15:33] or maybe this afternoon [09:32:23] !log Jenkins: removing git branch specifier from all mediawiki-core jobs. [09:32:24] Logged the message, Master [09:42:15] New patchset: Hashar; "adapt role::cache::upload for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50064 [10:06:16] !log Jenkins: reseting git branch specifier :-/ [10:06:18] Logged the message, Master [10:51:31] !log huge CPU spike on manganese (gerrit host) [10:51:33] Logged the message, Master [10:53:31] apergos: mark: mutante: manganese has some huge CPU spike, would you be able to get us some log informations please ? :-} [10:53:34] qchris: ^^^^ [10:53:40] qchris: don't you have access on the cluster? [10:53:47] Thanks hashar [10:53:51] hashar: [10:54:03] hashar: Don't think so. [10:54:22] which log? 
[10:54:26] I"m on the host [10:54:55] top says java and python, that must be you. where do I look? [10:54:59] they are in /var/lib/gerrit2/review_site/logs [10:55:05] apergos: Last time ^demon noticed lots and lots of "Dispatched Failed!" lines ... I'll see if he mentioned which logs that were... [10:55:11] (then a pile of little gits stacked up oo) [10:55:13] *too [10:55:14] java is the Gerrit process, python is probably some hook. [10:55:51] [2013-02-21 10:55:29,395] WARN org.eclipse.jetty.io.nio : Dispatched Failed! SCEP@3c5da61{l(/127.0.0.1:27509)<->r(/127.0.0.1:8080),d=false,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=1r}-{AsyncHttpConnection@32841836,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0} to org.eclipse.jetty.server.nio.SelectChannelConnector$ConnectorSelectorManager@4189cd3c [10:55:51] so either: /var/lib/gerrit2/review_site/logs/error_log or /var/lib/gerrit2/review_site/logs/sshd_log [10:55:53] lots [10:55:56] really lots and lots [10:55:56] good! [10:56:16] so it seems to be the same issue it had a few days ago [10:56:25] 2.8gb worth I guess [10:56:25] Restarting gerrit should make the problem disapear [10:56:29] apergos: what is the python command line ? [10:56:45] I mean there is some python process having some huge CPU, do you get the full command line? [10:56:51] I am wondering if that could be some hook getting wild [10:56:52] thats the entire log entry [10:56:54] as I posted it [10:56:59] oh. just a sec [10:57:12] ircecho [10:57:19] irrelevant [10:57:20] ah ok :-] [10:57:26] there's a salt-minion too [10:57:39] no idea what it is for [10:57:47] anything else before I restart (I assume i/etc/init.d/gerrit restart suffices)? [10:57:56] so I guess simply reboot gerrit. Hopefully restart on the init script will be fine [10:58:03] doing [10:58:08] :-] [10:58:19] qchris: I really hate that error message. That is not really helpful. [10:58:32] :-) [10:58:42] Looks like we'll have to switch away from jetty then. [10:58:48] should have tossed the log file rats [10:58:48] ^demon will not like this [11:00:18] Looks like gerrit is not there yet, is it still starting up? [11:00:26] probably [11:00:36] + there is an apache in front of it acting as a reverse proxy [11:00:45] Apache is up [11:00:51] yeah but it has some timeout [11:00:52] err [11:00:59] going to do so, I will stop it again, sec [11:01:14] when gerrit is unreachable, apache send the "Service Temporarily Unavailable" error [11:01:16] and start a timer [11:01:24] it will keep serving that page until the timer is expired [11:01:26] startup had failed no idea why, I ketp the recent additions to the error log [11:01:32] one way is to restart gerrit then restart apache to clear the timer [11:01:40] doh [11:01:51] waiting [11:02:26] it spurts the errors in /var/lib/gerrit2/review_site/logs/error_log with a huge long stacktrace (seems like java people love long traces) [11:03:19] so unhappy because it claims it didn't start [11:03:31] but I see a GerritCodeReview running with a new timestamp [11:03:41] ah [11:03:45] working again [11:03:46] http://gerrit.wikimedia.org/r/ [11:03:47] :-] [11:03:49] It's up again [11:03:50] yay [11:03:51] so it seems you saved it [11:03:53] \O/ [11:03:53] \o/ [11:03:54] and log fle is tiny now [11:04:02] !log Ariel saved Gerrit by restarting it! [11:04:03] Logged the message, Master [11:04:07] hahaha [11:04:10] :-))) [11:04:12] thank you Ariel! [11:04:14] yw [11:04:39] would be nice to know what's broken [11:04:50] Jetty is broken as it seems. 
[11:05:00] We hit this problem the other day [11:05:18] that's unfortunate [11:05:34] Fortunately, tomcat does not seem to have this problem. [11:05:43] ah [11:05:47] so that's next up is it? [11:05:59] We'll have to duscuss that with ^demon [11:06:03] good luck [11:06:08] Thanks :-) [11:33:58] ok mark is going to hate me [11:34:22] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [11:34:40] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [11:37:58] New patchset: Hashar; "adapt role::cache::upload for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50064 [11:43:05] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours [12:00:01] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 194 seconds [12:00:19] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 196 seconds [12:05:16] RECOVERY - MySQL disk space on neon is OK: DISK OK [12:06:10] PROBLEM - Puppet freshness on db40 is CRITICAL: Puppet has not run in the last 10 hours [12:06:47] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [12:22:41] RECOVERY - MySQL Slave Delay on db44 is OK: OK replication delay 0 seconds [12:23:52] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay 0 seconds [12:24:46] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [13:03:46] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [13:11:43] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 190 seconds [13:12:19] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 199 seconds [13:14:32]
Error 403 Access denied
[13:14:33] I love it [13:19:31] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [13:20:43] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [13:33:15] apergos: hey buddy, is there a way of checking whether there is anything that is causing abnormally long cache update times for banners in central notice? [13:36:47] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [13:37:14] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [13:45:07] Seddon: I really have no idea, let me hunt around some [13:45:23] apergos: its ok, its a banner code problem [13:45:32] ah hm [13:45:46] New patchset: Hashar; "adapt role::cache::upload for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50064 [13:45:58] that's an area I don't really know [13:46:40] I've hunted down the issue, cant fix the cause but I should be able to get a work around for the time being [13:47:01] ill get our guys to look at it later [14:07:05] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [14:07:32] RECOVERY - MySQL disk space on neon is OK: DISK OK [14:07:50] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [14:08:08] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours [14:12:11] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 188 seconds [14:12:47] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 195 seconds [14:17:17] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [14:19:14] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [14:21:11] PROBLEM - Puppet freshness on labstore3 is CRITICAL: Puppet has not run in the last 10 hours [14:26:48] can't manage to get varnish to hit my backends :( [14:26:50] bohhh [14:32:25] New review: Milimetric; "Patch Set 3: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/49710 [14:34:59] Thanks gerrit bot :) What a nice bot. 
[14:57:05] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 22 seconds [14:58:44] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 1 seconds [15:19:17] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [15:20:11] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [15:27:48] New review: ArielGlenn; "Patch Set 1: Code-Review+2" [operations/dumps] (ariel) C: 2; - https://gerrit.wikimedia.org/r/47427 [15:28:00] New review: ArielGlenn; "Patch Set 1: Verified+2" [operations/dumps] (ariel); V: 2 - https://gerrit.wikimedia.org/r/47427 [15:28:00] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/47427 [15:31:06] New patchset: ArielGlenn; "support multiple output files in one pass, bug fixes, documentation - write multiple sql output files for different mw versions at the same time - fix make install (referenced nonexistent file) - fix some issues with mw version comparison macros and funct" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/50166 [15:31:34] New review: ArielGlenn; "Patch Set 1: Verified+2 Code-Review+2" [operations/dumps] (ariel); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50166 [15:31:34] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/50166 [15:31:39] gah [15:31:46] ariel's too fast :( [15:32:28] and now i'm trying to tab-complete ariel. must be sleepy [15:32:56] apergos: commit message line 2 always must be a blank line [15:33:02] ugh [15:33:09] I can git ammend that [15:33:12] grr [15:33:17] s/mm/m [15:33:41] or not? [15:33:51] uhhh [15:34:22] becaue it's merged I mean [15:35:05] you'd have to bypass review i guess [15:35:08] :( [15:35:10] no [15:35:16] no good. [15:35:27] * apergos will just do better about future commits [15:37:07] k :) [15:39:15] New patchset: ArielGlenn; "sql2txt: convert sql dumps to input format suitable for LOAD DATA INFILE" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/50167 [15:39:24] 65 chars grrr [15:39:26] whatever [15:41:02] New review: ArielGlenn; "Patch Set 1: Verified+2 Code-Review+2" [operations/dumps] (ariel); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50167 [15:41:02] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/50167 [15:43:05] <^demon> apergos: I think that's going to be split to a plugin, which would make the length configurable/disableable. [15:43:24] 65 just seems a wee bit short [15:43:30] in this day and age [15:43:31] <^demon> Yeah, 65's a bit short. [15:43:57] seems like gerrit is slowly growing up :-) [15:43:57] <^demon> Something like 100 keeps people from making really really long lines, but wouldn't yell at you for doing 66 :p [15:44:06] yeah 100 is a fine length [15:44:07] i disagree [15:44:29] 65 is so that you can have leading stuff prepended and still fit within 75-80 chars [15:44:41] and that's the thing. 76-80. :-P [15:45:02] so that you can make something like release notes with one line per commit and also include part of the commit id [15:45:40] should be a summary that makes sense alone. starting with line 3 you can elaborate [15:46:09] (and then still don't get too long per line) [15:46:20] <^demon> Well yeah, but yelling at someone for making it like 66 or 70 characters is kind of silly. [15:46:48] is this the feedback on push? or something else? 
[15:46:53] <^demon> Yeah [15:46:57] (i mean automated feedback) [15:47:19] well i guess they could have multiple levels of scolding [15:47:24] :P [15:47:37] depending on how long the input was [15:48:11] <^demon> The BZ plugin (will be enabled soon) has configurable levels of warning. You can require bugs, suggest bugs if ones not mentioned, or just accept them either way. [15:48:30] <^demon> I was thinking suggest, but not everything has a bug. Probably a superfluous warning. [15:48:52] right [15:49:04] does it pull out bug summaries? [15:49:10] e.g. to display on hover [15:49:19] or notify bugzilla? [15:50:48] Some aspects of this jobs are going to be "fun". [15:51:07] RECOVERY - MySQL disk space on neon is OK: DISK OK [15:51:34] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [15:51:43] New patchset: ArielGlenn; "sql2txt bug fixes" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/50172 [15:52:20] New review: ArielGlenn; "Patch Set 1: Verified+2 Code-Review+2" [operations/dumps] (ariel); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50172 [15:52:20] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/50172 [15:54:43] PROBLEM - Puppet freshness on cp1035 is CRITICAL: Puppet has not run in the last 10 hours [16:00:06] New patchset: ArielGlenn; "build static binaries, makefile fixes, sha1 field fix" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/50175 [16:00:31] New review: ArielGlenn; "Patch Set 1: Verified+2 Code-Review+2" [operations/dumps] (ariel); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50175 [16:00:32] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/50175 [16:25:17] New patchset: ArielGlenn; "bugfixes, more documentation, good enough for v0.0.1 now." [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/50180 [16:25:39] New review: ArielGlenn; "Patch Set 1: Verified+2 Code-Review+2" [operations/dumps] (ariel); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50180 [16:25:39] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/50180 [16:27:38] New patchset: Dereckson; "(bug 45233) Groups permissions on pt.wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50181 [16:30:43] PROBLEM - Puppet freshness on sq71 is CRITICAL: Puppet has not run in the last 10 hours [16:39:43] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours [16:46:55] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [16:47:31] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [16:57:57] could somebody please tell me which *exact* version of OTRS we're using for ticket.wikimedia.org? Basically the x in "2.4.x" [17:06:25] andre__: there's not an exact versions [17:06:28] version* [17:06:55] * jeremyb_ looks up what he read before [17:10:10] andre__: see https://rt.wikimedia.org/Ticket/Display.html?id=452#txn-76819 and the attachment on https://rt.wikimedia.org/Ticket/Display.html?id=452#txn-50143 [17:17:32] RECOVERY - MySQL disk space on neon is OK: DISK OK [17:18:16] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [17:42:24] Oh, yeah, some aspects of this jobs are going to be /really/ "fun". Why do I keep getting myself into these things? :-) [17:43:02] LeslieCarr: feeling better? 
[17:44:15] matanya: better than yesterday not good still [17:44:35] glad to hear better, sad to hear not 100% [17:50:36] Coren: I've indeed asked myself that question when I heard the news ;) [17:50:58] http://www.bbc.scotlandshire.co.uk/index.php/city-news/219-clever-people-not-needed-says-idiot.html [17:51:05] ooops [17:51:07] wrong channel :p [17:51:19] that is not seriuz stuffz [17:52:32] Seddon: Is that a Scottish Onion I see? [17:54:43] indeed [18:01:01] PROBLEM - Puppet freshness on stafford is CRITICAL: Puppet has not run in the last 10 hours [18:34:27] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours [18:39:52] !log enabled Extension:OpenID as a provider on labsconsole [18:39:55] Logged the message, Master [18:40:59] interesting [18:41:01] Ryan_Lane, o_0 thumbs up! [18:41:19] yeah, all the needed fixes went in [18:41:34] there's one more fix I'd like before we can use this on wikimedia projects, too [18:41:48] awwsum [18:41:49] right now it's necessary for the user's page to have content [18:41:56] I'd like for that to not be necessary [18:42:16] that or for all users pages to automatically have content :) [18:43:16] also doable [18:46:06] so where can we use it? i guess for some tools @ labs. maybe even toolserver. but most would want to use SUL openid i guess [18:49:27] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 184 seconds [18:50:30] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 191 seconds [18:51:18] Ryan_Lane: I had a strangely related thought, incidentally. [18:51:24] New patchset: Andrew Bogott; "Turn manage-volumes into a daemon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49916 [18:51:45] New review: Andrew Bogott; "Patch Set 5: -Code-Review" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49916 [18:52:33] Ryan_Lane: Many of the tools will want to have user databases, and we're worried about password reuse enough to have to warn the users about not sharing credentials. What do you think of offering a central auth for tool users and passing auth tokens to the tools instead of them having to have their own? [18:52:58] we need OAuth [18:53:19] Ryan_Lane: That'd seem like the natural fit. :-) [18:53:29] I've been asking for it for ages ;) [18:53:37] I count it as a blocker for tools, honestly [18:53:47] So would I, given the option. [18:53:56] Should I dumpt this on my TODO then? [18:55:01] Ryan_Lane: I'm pretty sure that this will work now: https://gerrit.wikimedia.org/r/#/c/49916/ [19:04:45] RECOVERY - MySQL Slave Delay on db44 is OK: OK replication delay 0 seconds [19:04:45] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [19:04:54] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay 0 seconds [19:05:57] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [19:07:56] RoanKattouw: you here? [19:09:51] PROBLEM - MySQL Slave Delay on db66 is CRITICAL: CRIT replication delay 74754 seconds [19:10:35] Ryan_Lane: Hey, btw, when you say "We need OAuth", do you mean "We'd like to have an OAuth server at the labs for end-user credentials" or "We'd like SUL to be an OAuth server"? [19:11:08] Ryan_Lane: Because the latter would be cool, but the former seems closer to "Yeah, that can be done in this lifetime". 
:-) [19:13:16] New patchset: Jalexander; "Adding CentralNotice user right to meta and testwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50196 [19:17:35] Coren: we need wikimedia SUL to provide OAth [19:17:37] OAuth [19:17:59] then for tools/bots to use it to authenticate [19:18:07] Ryan_Lane: Expecting this to be feasible (politically) this lifetime? [19:18:23] it's been on the roadmap for about 1.5 years [19:18:31] Ah, so the will is there. [19:18:56] yes, and someone is technically assigned to it [19:18:59] that someone being csteipp [19:19:21] "technically" as in it's on his list but he never managed to reach it yet? [19:19:54] Coren: The good news is that we're planning a sprint to finish it out, as soon as lua is deployed [19:19:54] New patchset: Jalexander; "Adding CentralNotice user right to meta and testwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50196 [19:20:11] csteipp: Yeay! So I can actually plan according to its presence? [19:20:12] Ryan_Lane: Yeah I'm here [19:20:15] (at the Wikia office) [19:20:17] Coren: you need to talk to csteipp about OAuth [19:20:20] RoanKattouw: ah. ok [19:20:28] csteipp: is SUL not a blocker for OAuth? [19:20:46] RoanKattouw: we're having a meeting about caching and resources that 404 [19:20:53] pgehres: You mean unified accounts? [19:20:56] Not really [19:21:01] RoanKattouw: what resources 404 on mediawiki changes? [19:21:03] Ryan_Lane: When, now, or later today? [19:21:03] It would help, but not a blocker. [19:21:21] csteipp: ah, k. well i have the db and am multitasking playing with pt-table-sync now [19:21:23] Coren: I think you can plan for OAuth around May [19:21:24] csteipp: What protected resources do you plan on supporting? [19:21:37] RoanKattouw: having one right now, we don't need you here or anything, but wanted to ask some questions [19:21:38] Coren: also, be very, very careful when talking to ryan. OAuth != OATHAuth. :) [19:21:44] Ryan_Lane: So most assets are loaded via load.php and will never 404 [19:21:57] Coren: API only. Authz based on the api module. That is probably all we can get done for now. [19:22:07] OAuth 1 [19:22:20] Some things (and if you're in debug mode: all things) are loaded via bits.wm.o/static-1.21wmfN/ and can 404 once N is sufficiently far in the past [19:22:56] RobLa and I talked about this at some point and figured out that to prevent 404s from Squid-cached pages, we'd need to keep around these static-1.21wmfN dirs down to N = latest - 3 or something [19:23:15] I don't think that was ever actually done, and these dirs are often removed earlier than they should be [19:23:44] (static-1.21wmfN is really a bunch of symlinks into php-1.21wmfN, and those dirs are removed fairly aggressively) [19:24:01] csteipp: What would be extra juicy nice for tool /end users/ is to have "username" as protected resources so that bots-as-clients can have end-users prove they are project users without giving credentials to give API access. [19:24:03] RoanKattouw: why are those based on mw versions? [19:24:34] If we do actually do a good job keeping those dirs around until at least 30 days after they've been obsoleted, we shouldn't have any 404s from MW itself. 
That just leaves random people that copy URLs and use them, not noticing they're versioned, but we can't do much about that [19:24:41] Well resources can change between versions [19:25:00] csteipp: "FooBot would like to access your user information" [19:25:03] In particular, these paths are used to load JS in debug mode, and JS obviously changes between versions [19:25:11] We can also have multiple versions live at any given time, and often do [19:25:23] right [19:25:37] we're discussing this with brad [19:25:41] This would be a bit less inconvenient with slots, git-deploy style [19:25:47] indeed [19:26:04] csteipp: I can think of about 20 major tools offhand that would greatly benefit from this. [19:26:10] Coren: So if FooBot wants to access a wiki, the user will grant it access to whatever api modues it needs, and the bot will hold the tokens to access the api on behalf of the user [19:26:13] Because slot0 can still be expected to point to something, although it might be more recent that what the page is expecting but that theoretically shouldn't matter [19:26:33] csteipp: Ah, you'd have module granularity? [19:26:34] But are you saying you want the user to authenticate to the bot also? [19:26:43] csteipp: Webtools, mostly. [19:26:44] Coren: Yes [19:27:07] Coren: For that, OpenID is probably a better service [19:27:09] csteipp: So that a user can login in a tool with his SUL without disclosing their credentials to the tool [19:27:10] Then we'd just have 404s from the case where MW removes/renames a file, and a page generated by version N-2 ends up requesting that resource from version N (because the slots have cycled through), which 404s [19:27:38] csteipp: Possibly, but OAuth would permit it if it's already going to be in place. [19:29:19] Coren: Yes, the oauth psudo-auth will probably be possible. If you can give a web *server* a token to act on your behalf, the web server can be very certain you are that user. [19:29:35] This will NOT work for javascript apps [19:29:48] csteipp: Yeah, I was only thinking webservers here. [19:29:50] (need OAuth 2 for that...) [19:29:52] Cool [19:30:32] csteipp: Admitedly, Having SUL be an OpenID service would be even simpler for that use case. But is that even on the roadmap? [19:30:52] Coren: No, not officially [19:31:06] Coren: well see what Ryan just deployed [19:31:34] The next things that would be awesome to implement would be OpenID and SAML [19:31:44] We only need one thing to deploy openid to production [19:32:09] Ryan_Lane: What is that? I've been meaning to look at his new work, but haven't yet.. [19:32:12] Ryan_Lane: Tell me what it is and I'll do it. That would solve all my problems in one fell swoop! [19:32:19] I'd *really* like to have a centralized domain for openid identity urls [19:32:29] but that's not the issue [19:32:46] the openid extension needs user pages to have content to work [19:32:52] because otherwise the pages are 404s [19:32:59] so, either, we need to ensure user pages have content [19:33:08] or we need to not deliver 404s for empty user pages [19:33:22] but only if the user actually exists [19:33:26] yep [19:34:13] Until those things are done, that would just mean that "If you want to login you need to have created your user page on the project"? [19:34:21] yes, but that sucks [19:34:31] and I'd prefer not to have that in production [19:34:35] I'm willing to deal with it in labs [19:35:42] Why not just return a 204 on a user page that doesn't exist but where the user does? 
[19:36:03] what does the OpenID spec say about that? [19:36:23] RECOVERY - MySQL disk space on neon is OK: DISK OK [19:36:29] jeremyb_: It's a 200. Success. [19:36:39] i mean a 204 [19:36:41] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [19:36:58] jeremyb_: Well, HTTP sez it should count as success. Lemme go check the spec. [19:37:50] Coren: anyway, that means web browsers (humans) would get white screen of death? [19:38:28] jeremyb_: No, you're allowed to return content on a 204, and the UA must display it because it's 2XX [19:38:35] * Coren tries this. [19:40:07] The 204 response MUST NOT include a message-body, and thus is always [19:40:11] terminated by the first empty line after the header fields. [19:40:24] Ah. I was confused then. [19:41:36] jeremyb_: Darn, because both Chrome and Firefox behave like I just said. [19:41:38] http://www.uberbox.org/test.php [19:42:29] Ah. No they don't. Apache is "helpfully" changing my 204 into a 200. [19:42:37] hah [19:45:06] ^demon: Any news from the gerrit front? [19:46:07] There he goes :-) [19:47:03] Hi ^demon [19:47:17] <^demon> Hi. [19:47:42] After reading irc logs and checking ganglia, it seems gerrit behaved nicely ever since? [19:48:23] <^demon> Yeah, hasn't freaked out since this morning. [19:48:49] Have you thought about which of the possible options we go for? [19:49:58] Coren: I think 200 is appropriate for users that have accounts [19:50:02] assuming we are using openid [19:50:08] otherwise 404s make sense [19:50:26] after all, we are serving content when using openid [19:50:32] Ryan_Lane: Yeah, that'd work. :-) [19:50:39] <^demon> qchris: I'd like to play around with tomcat a bit in labs, and I setup an instance to do that. [19:51:11] ^demon: So nothing I can assist you with for now? [19:52:08] <^demon> Not this minute, I'm kinda in the middle of a few things. [19:52:19] ok. Thanks. [19:54:23] hey RoanKattouw, what do you mean when you say "For that request I see: Content-Type: application/json; charset=utf-8"? do you have a separate log file where you checked that? [19:55:18] <^demon> qchris: Actually, could you help ori-l? He's got a user who's rights don't seem to be inheriting properly :\ [19:55:54] I'll try. But I am just a normal user on gerrit.wm [19:56:31] That might not work, then :/. The user is 'Mattflaschen'. [19:57:20] Oh :-( [19:59:24] ori-l What is the problem exactly? [20:01:04] drdee: I just copied the URL into my browser and used Firebug to look at the headers [20:02:13] !log some wikis have myisam page_props tables. converting all to innodb via osc [20:02:15] Logged the message, Master [20:03:08] yay, I didn't know there were any of those left [20:07:04] binasher: I can't wait till the default is innodb ;) [20:07:36] qchris: as a WMF employee and a deployer, Mattflaschen ought to have (at minimum, I think) +2 in extensions, but he does not. [20:08:49] oril-l: I'll see if I can find anything. [20:10:02] thanks. [20:12:05] $extdb->query( "SET table_type=InnoDB" ); [20:12:09] apparently that doesn't work then [20:14:16] ori-l: Shouldn't Mattflaschen be on https://gerrit.wikimedia.org/r/#/admin/groups/uuid-4cdcb3a1ef2e19d73bc9a97f1d0f109d2e0209cd,members [20:15:31] qchris: yes! [20:15:42] but I don't have the perms to add him. [20:15:53] Neither do I :-( [20:16:09] <^demon> No, if he's in the included group he shouldn't have to manually be there. 
[20:16:49] He is not included as far as I can tell [20:17:18] right, but I'd bet that that fixes things, and I don't think it's fair to block him on Gerrit/LDAP issues. [20:17:36] AaronSchulz: i'm not going to explicitly change default_storage_engine, but it is innodb in the version of mariadb we're slowly migrating to [20:17:58] <^demon> ori-l: Well that won't help him in any other groups he should be having included membership in. [20:18:09] <^demon> Plus, this was the whole reason we held off upgrading, was for this behavior. [20:18:32] binasher: actually I see, the sql file says to use MyISAM [20:18:36] that should probably just be changed [20:19:04] ^demon: OK, so what should I do? Bugzilla? RT? just wait? [20:19:17] RECOVERY - MySQL Slave Delay on db66 is OK: OK replication delay 0 seconds [20:19:22] AaronSchulz: heh, definitely. i wasn't sure which was used for es table creation [20:19:26] <^demon> Bah. [20:19:31] <^demon> Why is he not in the wmf group. [20:19:34] <^demon> He totally should be. [20:19:48] * ori-l shrugs. [20:19:51] Shruggery. [20:20:20] <^demon> For some reason I swore he was there. [20:20:47] RECOVERY - MySQL Replication Heartbeat on db66 is OK: OK replication delay 0 seconds [20:21:43] how would I get him added? [20:26:51] RoanKattouw: okay, then it's an error in somewhere in the logging [20:29:49] <^demon> ori-l: I already did. He should be set now. [20:29:53] <^demon> May need to log out/in. [20:31:09] ^demon: very much obliged! [20:31:17] <^demon> yw. [20:36:44] who is going to plus 2 that is the question (xt store) [20:59:32] New patchset: Ori.livneh; "EventLogging: fix test2wiki configs; +CodeEditor" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49991 [21:00:04] New review: Ori.livneh; "Patch Set 2: Code-Review+2" [operations/mediawiki-config] (master) C: 2; - https://gerrit.wikimedia.org/r/49991 [21:00:51] New patchset: Hashar; "gerrit: qa/* IRC notification to #wikimedia-dev" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50268 [21:03:04] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49991 [21:04:54] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 185 seconds [21:05:12] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 188 seconds [21:05:24] !log olivneh synchronized wmf-config/CommonSettings.php [21:05:26] Logged the message, Master [21:05:38] New review: Cmcmahon; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/50268 [21:10:48] New review: Ryan Lane; "Patch Set 5: Code-Review-1" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/49916 [21:10:56] andrewbogott: ^^ [21:17:12] RECOVERY - MySQL Slave Running on db58 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [21:17:53] New review: Zfilipin; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/50268 [21:20:03] PROBLEM - MySQL Slave Delay on db58 is CRITICAL: CRIT replication delay 157786 seconds [21:21:25] RECOVERY - Puppet freshness on labstore2 is OK: puppet ran at Thu Feb 21 21:21:20 UTC 2013 [21:21:46] New review: Catrope; "Patch Set 2:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47103 [21:23:50] !log olivneh synchronized php-1.21wmf9/extensions/SyntaxHighlight_GeSHi 'Disable highlighting of very large files.' 
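On the page_props cleanup logged above: "SET table_type=InnoDB" no longer takes effect, likely because newer MySQL releases dropped that old alias in favour of storage_engine / default_storage_engine, so stray MyISAM tables are easiest to find via information_schema and converted with a plain ALTER. A sketch of that, with the caveat that the straight ALTER locks the table for the duration; the production conversion mentioned in the log went through an online-schema-change tool ("osc") instead:

```python
import MySQLdb  # connection handling omitted; pass in an open connection

def myisam_tables(conn, schema):
    cur = conn.cursor()
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = %s AND engine = 'MyISAM'", (schema,))
    return [row[0] for row in cur.fetchall()]

def convert_to_innodb(conn, table):
    # Table name comes straight from information_schema above; ALTER cannot
    # take a bound parameter, hence the simple interpolation.
    conn.cursor().execute("ALTER TABLE `%s` ENGINE=InnoDB" % table)
```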
[21:23:51] Logged the message, Master [21:29:52] easy merge to update some Gerrit IRC notifications : https://gerrit.wikimedia.org/r/50268 ;-] [21:31:23] !log olivneh synchronized php-1.21wmf10/extensions/SyntaxHighlight_GeSHi 'Disable highlighting of very large files.' [21:31:23] Logged the message, Master [21:32:37] New patchset: Pyoungmeister; "mariadb test boxes for pmtpa s2-5" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50276 [21:34:18] New patchset: Pyoungmeister; "pulling dbs 52, 39, 51, and 35 for upgrade mariadb testing upgrade" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50278 [21:35:10] New patchset: Andrew Bogott; "Turn manage-volumes into a daemon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49916 [21:37:52] !log restarting mysql on db1047 to pick up conf changes [21:37:53] Logged the message, notpeter [21:47:34] !log olivneh synchronized php-1.21wmf9/extensions/CodeEditor [21:47:35] Logged the message, Master [21:48:00] New review: Pyoungmeister; "Patch Set 1: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50278 [21:48:01] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50278 [21:48:10] !log olivneh synchronized php-1.21wmf9/extensions/EventLogging [21:48:12] Logged the message, Master [21:48:29] Ryan_Lane: paravoid: $role::ldap::config::production::ldapconfig::servernames still points to nfs1 and nfs2, but paravoid took down ldap there. what should it be? [21:48:41] oh [21:48:42] crap [21:48:45] ryan actually did [21:48:55] I didn't see any references left to it [21:48:58] !log py synchronized wmf-config/db-pmtpa.php 'removing 1 slave for s2-5 from db-secondary for upgrades and maria' [21:48:59] Logged the message, Master [21:49:00] I obviously missed that [21:49:06] I just merely asked "why do we have ldap on nfs1/2?" [21:49:09] binasher: virt0.wikimedia.org virt1000.wikimedia.org [21:49:23] !log olivneh synchronized php-1.21wmf10/extensions/EventLogging [21:49:24] Logged the message, Master [21:50:52] ok, thanks. i just need to update the graphite/ishmael auth, that's why they're down [21:51:01] oh [21:51:02] heh [21:51:22] I see other references too [21:51:24] but now that ldap registration is open, is there a wmf ldap group or something similar that can be required instead of just a valid user? [21:51:34] there was a has-signed-nda group [21:51:36] binasher: there's a wmf group [21:51:43] that was created specifically for analytics [21:51:44] paravoid: we don't have one of those yet [21:51:52] oh it wasn't created yet? [21:51:58] let's see [21:52:01] does this look right? Require ldap-group cn=wmf,ou=groups,dc=wikimedia,dc=org [21:52:18] I think so. let me look at the docs [21:52:23] thanks [21:52:46] this is mod_authz_ldap or mod_auth_ldap? [21:53:20] hm. must be mod_auth_ldap [21:53:37] uhm? [21:53:48] mod_authnz_ldap [21:53:54] there's no mod_auth_ldap anymore [21:53:59] it's authnz now [21:54:02] heh. way too many of these damn things [21:54:03] now being apache 2.2 iirc [21:54:23] binasher: yes, that's correxct [21:54:25] *correct [21:54:51] assuming group support is setup [21:55:06] PROBLEM - mysqld processes on db51 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:55:08] the default for AuthLDAPGroupAttribute is correct [21:55:16] same with AuthLDAPGroupAttributeIsDN [21:56:02] yep. 
that by itself should work [21:56:09] PROBLEM - mysqld processes on db52 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:56:20] !log running do-release-upgrade on dbs 52, 39, 51, and 35 [21:56:22] Logged the message, notpeter [21:57:03] PROBLEM - mysqld processes on db35 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:57:12] "If the attribute specified by AuthzLDAPMemberKey only holds the login names of group members, rather than the full DN, change the AuthzLDAPSetGoupAuth directive to…" what does the wmf group actually contain? [21:57:12] PROBLEM - mysqld processes on db39 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:57:35] dns [21:57:55] * Ryan_Lane is a fan of referential integrity [21:57:55] ok, hopefully this just works [22:00:56] New review: Pyoungmeister; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50276 [22:00:57] PROBLEM - Host db52 is DOWN: PING CRITICAL - Packet loss = 100% [22:01:05] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50276 [22:01:51] RECOVERY - Host db52 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [22:02:18] binasher: notpeter : is there a firm plan to switch to mariadb in production? I am wondering if I should setup mariadb instances in labs (for beta) [22:03:10] hashar: it's more of a soft plan [22:03:30] but that does sound like a good idea [22:03:33] I tried to set one up this week, got stuck with the custom apt repository :-] [22:04:02] I guess I will either wait a bit or get you guys to install mariadb on an instance :-] [22:04:55] hashar: how did that get you stuck? [22:05:20] I can't remember the details, but mostly that you have to use some puppet class which is only meant for production [22:05:36] PROBLEM - Full LVS Snapshot on db51 is CRITICAL: Connection refused by host [22:05:53] hashar: well, those apt repos are all reachable from the whole internet [22:05:54] PROBLEM - MySQL Slave Delay on db51 is CRITICAL: Connection refused by host [22:06:00] so I can take a look if you point me at the right instances [22:06:12] PROBLEM - MySQL Slave Running on db51 is CRITICAL: Connection refused by host [22:06:15] ah the coredb_mysql puppet module :-] It is full of production only settings hehe [22:06:21] PROBLEM - MySQL Recent Restart on db51 is CRITICAL: Connection refused by host [22:06:30] PROBLEM - MySQL Idle Transactions on db35 is CRITICAL: Connection refused by host [22:06:31] PROBLEM - MySQL disk space on db35 is CRITICAL: Connection refused by host [22:06:31] PROBLEM - MySQL Replication Heartbeat on db51 is CRITICAL: Connection refused by host [22:06:37] notpeter: I might poke you next week about it :-] [22:06:39] PROBLEM - MySQL Recent Restart on db39 is CRITICAL: Connection refused by host [22:06:39] PROBLEM - MySQL disk space on db51 is CRITICAL: Connection refused by host [22:06:42] ok, sounds good [22:06:48] PROBLEM - MySQL Recent Restart on db35 is CRITICAL: Connection refused by host [22:06:48] PROBLEM - MySQL Idle Transactions on db51 is CRITICAL: Connection refused by host [22:06:57] PROBLEM - MySQL Replication Heartbeat on db35 is CRITICAL: Connection refused by host [22:06:58] PROBLEM - MySQL Replication Heartbeat on db39 is CRITICAL: Connection refused by host [22:07:03] and will probably poke binasher about graphite :-] [22:07:06] PROBLEM - Puppet freshness on db40 is CRITICAL: Puppet has not run in the last 10 hours [22:07:06] PROBLEM - MySQL disk space on db39 
is CRITICAL: Connection refused by host [22:07:24] PROBLEM - MySQL Slave Delay on db35 is CRITICAL: Connection refused by host [22:07:24] PROBLEM - MySQL Slave Delay on db39 is CRITICAL: Connection refused by host [22:07:33] PROBLEM - Full LVS Snapshot on db39 is CRITICAL: Connection refused by host [22:07:42] PROBLEM - MySQL Slave Running on db39 is CRITICAL: Connection refused by host [22:07:42] PROBLEM - Full LVS Snapshot on db35 is CRITICAL: Connection refused by host [22:07:51] PROBLEM - MySQL Idle Transactions on db39 is CRITICAL: Connection refused by host [22:07:52] PROBLEM - MySQL Slave Running on db35 is CRITICAL: Connection refused by host [22:07:56] New review: awjrichards; "Patch Set 1: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50109 [22:07:57] Change merged: awjrichards; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50109 [22:08:18] RECOVERY - MySQL Idle Transactions on db35 is OK: OK longest blocking idle transaction sleeps for seconds [22:08:18] RECOVERY - MySQL disk space on db35 is OK: DISK OK [22:08:18] RECOVERY - MySQL Replication Heartbeat on db51 is OK: OK replication delay seconds [22:08:27] RECOVERY - MySQL disk space on db51 is OK: DISK OK [22:08:36] RECOVERY - MySQL Recent Restart on db35 is OK: OK seconds since restart [22:08:36] RECOVERY - MySQL Idle Transactions on db51 is OK: OK longest blocking idle transaction sleeps for seconds [22:08:45] RECOVERY - MySQL Replication Heartbeat on db35 is OK: OK replication delay seconds [22:08:45] RECOVERY - MySQL Replication Heartbeat on db39 is OK: OK replication delay seconds [22:08:45] RECOVERY - MySQL Recent Restart on db39 is OK: OK seconds since restart [22:08:54] RECOVERY - MySQL disk space on db39 is OK: DISK OK [22:09:12] RECOVERY - MySQL Slave Delay on db35 is OK: OK replication delay seconds [22:09:12] RECOVERY - Full LVS Snapshot on db51 is OK: OK no full LVM snapshot volumes [22:09:12] RECOVERY - MySQL Slave Delay on db39 is OK: OK replication delay seconds [22:09:30] RECOVERY - MySQL Slave Running on db39 is OK: OK replication [22:09:30] RECOVERY - Full LVS Snapshot on db35 is OK: OK no full LVM snapshot volumes [22:09:30] Jeff_Green, poke [22:09:30] RECOVERY - MySQL Slave Delay on db51 is OK: OK replication delay seconds [22:09:35] off for now *wave* [22:09:39] RECOVERY - MySQL Idle Transactions on db39 is OK: OK longest blocking idle transaction sleeps for seconds [22:09:39] RECOVERY - MySQL Slave Running on db35 is OK: OK replication [22:09:52] RECOVERY - MySQL Slave Running on db51 is OK: OK replication [22:09:57] RECOVERY - MySQL Recent Restart on db51 is OK: OK seconds since restart [22:10:51] RECOVERY - Full LVS Snapshot on db39 is OK: OK no full LVM snapshot volumes [22:11:40] New patchset: Asher; "fix ldap auth for ishmael/graphite, require wmf group membership" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50297 [22:12:27] !log awjrichards synchronized wmf-config/mobile.php 'Add photo upload schema for event logging' [22:12:29] Logged the message, Master [22:13:05] New review: Asher; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50297 [22:13:13] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50297 [22:13:24] PROBLEM - Host db35 is DOWN: PING CRITICAL - Packet loss = 100% [22:13:42] PROBLEM - Host db51 is DOWN: PING CRITICAL - Packet loss = 100% [22:14:08] binasher: merged your 
stuff [22:14:15] thanks [22:14:45] RECOVERY - Host db51 is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms [22:15:30] RECOVERY - Host db35 is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms [22:16:06] PROBLEM - Host db39 is DOWN: PING CRITICAL - Packet loss = 100% [22:18:03] RECOVERY - Host db39 is UP: PING OK - Packet loss = 0%, RTA = 0.41 ms [22:25:06] RECOVERY - mysqld processes on db51 is OK: PROCS OK: 1 process with command name mysqld [22:26:00] RECOVERY - mysqld processes on db35 is OK: PROCS OK: 1 process with command name mysqld [22:26:00] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [22:26:00] PROBLEM - Host db52 is DOWN: PING CRITICAL - Packet loss = 100% [22:26:22] RECOVERY - mysqld processes on db39 is OK: PROCS OK: 1 process with command name mysqld [22:27:12] RECOVERY - Host db52 is UP: PING OK - Packet loss = 0%, RTA = 0.59 ms [22:29:09] PROBLEM - MySQL Slave Delay on db35 is CRITICAL: CRIT replication delay 1513 seconds [22:29:10] PROBLEM - MySQL Slave Delay on db39 is CRITICAL: CRIT replication delay 2308 seconds [22:29:27] PROBLEM - MySQL Slave Delay on db51 is CRITICAL: CRIT replication delay 1591 seconds [22:31:51] New patchset: Mattflaschen; "Enable PostEdit on Commons and GuidedTour on that + ko, nl, vi" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50301 [22:33:02] New review: Ori.livneh; "Patch Set 1: Code-Review+2" [operations/mediawiki-config] (master) C: 2; - https://gerrit.wikimedia.org/r/50301 [22:33:25] !log awjrichards synchronized php-1.21wmf10/extensions/MobileFrontend [22:33:26] Logged the message, Master [22:33:45] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50301 [22:33:56] i just got a bunch of rsync errors from that sync-dir: [22:33:57] mw125: rsync: mkdir "/apache/common-local/php-1.21wmf10/extensions/MobileFrontend" failed: No such file or directory (2) [22:33:57] mw125: rsync error: error in file IO (code 11) at main.c(605) [Receiver=3.0.9] [22:35:39] !log awjrichards synchronized php-1.21wmf9/extensions/MobileFrontend/ [22:35:41] Logged the message, Master [22:36:21] RECOVERY - MySQL Slave Delay on db35 is OK: OK replication delay 0 seconds [22:36:39] RECOVERY - MySQL Slave Delay on db51 is OK: OK replication delay 0 seconds [22:38:18] RECOVERY - mysqld processes on db52 is OK: PROCS OK: 1 process with command name mysqld [22:41:54] PROBLEM - MySQL Slave Delay on db52 is CRITICAL: CRIT replication delay 2479 seconds [22:43:41] * RoanKattouw looks around for the on-duty person [22:43:43] apergos! [22:43:53] it is so after midnight [22:43:58] apergos: Could you look at and possibly merge https://gerrit.wikimedia.org/r/#/c/47103/ please? It's got approval from CT now [22:44:03] Oh crap sorry [22:44:04] now? 
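[Editor's note] For context on the ldap-group requirement discussed above (requiring cn=wmf for ishmael/graphite via mod_authnz_ldap, with the group holding member DNs): a rough sketch of the same "is this user in cn=wmf?" check done outside Apache, e.g. for debugging. It assumes anonymous reads are allowed, that the member attribute is literally named "member", and that python-ldap is available -- all assumptions to verify; the user DN below is purely hypothetical.

    import ldap

    GROUP_DN = "cn=wmf,ou=groups,dc=wikimedia,dc=org"
    # Server name taken from the channel discussion; port/TLS details assumed.
    SERVER = "ldap://virt0.wikimedia.org"

    def is_wmf_member(user_dn: str) -> bool:
        conn = ldap.initialize(SERVER)
        conn.simple_bind_s()  # anonymous bind; a real check may need credentials
        results = conn.search_s(GROUP_DN, ldap.SCOPE_BASE,
                                "(objectClass=*)", ["member"])
        if not results:
            return False
        _dn, attrs = results[0]
        members = [m.decode("utf-8") for m in attrs.get("member", [])]
        return user_dn in members

    # Hypothetical user DN, for illustration only.
    print(is_wmf_member("uid=example,ou=people,dc=wikimedia,dc=org"))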
[22:44:08] Nah tomorrow is fine [22:44:14] ok, I will open the tab [22:44:19] I sometimes forget you're ten whole hours ahead of us [22:44:25] heh [22:44:32] * RoanKattouw installs FoxClocks on his new machine [22:44:39] :-) [22:44:55] I know I shouldn't even be answering emails at this hour, it sets a bad precedent [22:45:05] just rying to get a few sent and then off to bed [22:46:03] Started the E3 scap [22:47:58] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 1 seconds [22:48:34] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [22:50:58] RECOVERY - MySQL Slave Delay on db52 is OK: OK replication delay 0 seconds [22:54:21] New patchset: Ori.livneh; "Add ack-grep to standard packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50306 [22:56:31] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [22:57:25] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [23:00:53] RECOVERY - MySQL Slave Delay on db39 is OK: OK replication delay 0 seconds [23:02:33] !log mflaschen Started syncing Wikimedia installation... : Deploy E3Experiments, GettingStarted, and GuidedTour [23:02:34] Logged the message, Master [23:05:13] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [23:07:43] New review: Mattflaschen; "Patch Set 1: Code-Review+1" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/50306 [23:11:22] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Thu Feb 21 23:10:56 UTC 2013 [23:11:49] RECOVERY - Puppet freshness on labstore1 is OK: puppet ran at Thu Feb 21 23:11:22 UTC 2013 [23:12:57] New patchset: Pyoungmeister; "increasing granularity of users in mediawiki_new module and making statistics.pp use module user def" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50310 [23:18:53] !log mflaschen Finished syncing Wikimedia installation... : Deploy E3Experiments, GettingStarted, and GuidedTour [23:18:55] Logged the message, Master [23:21:41] !log olivneh synchronized php-1.21wmf10/extensions/CodeEditor 'Syncing patch that disables background linting' [23:21:42] Logged the message, Master [23:23:03] New patchset: Pyoungmeister; "increasing granularity of users in mediawiki_new module and making statistics.pp use module user def" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50310 [23:24:18] New review: Pyoungmeister; "Patch Set 2: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50310 [23:24:31] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50310 [23:24:33] scap is done, but there were problems at the end with wikidiff2.so and texvc: [23:24:35] http://pastebin.com/gX4tP1Qa [23:25:00] ori-l, I'm not sure if that's a known issue. [23:26:08] superm401, spence has given me problems with texvc before [23:26:54] spagewmf, good to know. What about wikidiff2? [23:26:55] New patchset: Faidon; "Another fix for check_solr (duh!)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50312 [23:27:07] hrm, no idea. Reedy? [23:27:46] superm401 the question is what is spence doing. If it's serving wiki pages then working extensions matter, but I think it's for some other purpose [23:28:37] there's an open bz bug, IIRC, for packaging texvc and putting it in our debian repo [23:29:30] Anyone know what ishmael is? 
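[Editor's note] The sync/scap problems mentioned above surface as per-host prefixed lines (e.g. "mw125: rsync: ... failed"), so one quick way to triage a noisy run is to bucket the output by host. A small sketch of that, assuming output lines in the "host: message" form shown earlier; the script and file names are hypothetical and not part of scap itself.

    import re
    import sys
    from collections import defaultdict

    HOST_LINE = re.compile(
        r"^(?P<host>[a-z0-9-]+):\s+(?P<msg>.*(error|failed).*)$",
        re.IGNORECASE)

    def summarize(lines):
        """Group error/failure lines from a sync run by the reporting host."""
        per_host = defaultdict(list)
        for line in lines:
            m = HOST_LINE.match(line.strip())
            if m:
                per_host[m.group("host")].append(m.group("msg"))
        return per_host

    if __name__ == "__main__":
        # e.g. python summarize_sync_errors.py scap-output.log
        with open(sys.argv[1]) as f:
            for host, msgs in sorted(summarize(f).items()):
                print(f"{host}: {len(msgs)} error line(s)")
                for msg in msgs[:3]:
                    print(f"  {msg}")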
[23:29:32] New review: Faidon; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50312 [23:29:41] http://wikitech.wikimedia.org/view/Ishmael [23:29:42] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50312 [23:29:55] When I ssh in, it just keeps saying "Call me ishmael". [23:29:58] heh [23:30:02] thats from [23:30:04] Moby Dick [23:30:11] duh [23:30:18] ;) [23:30:28] > "Moby-Dick" begins with the line "Call me Ishmael." According to the American Book Review's rating in 2011, this is one of the most recognizable opening lines in Western literature.[4] [23:31:43] RECOVERY - Puppet freshness on labstore3 is OK: puppet ran at Thu Feb 21 23:31:33 UTC 2013 [23:39:40] RECOVERY - Puppet freshness on search13 is OK: puppet ran at Thu Feb 21 23:39:15 UTC 2013 [23:42:40] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [23:43:43] New review: Ryan Lane; "Patch Set 6: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/49916 [23:44:23] TimStarling: http://en.wikipedia.org/w/index.php?title=Module:Convert&action=edit [23:44:29] superm401: Ignore the spence errors. As for ishmael, it's a DB stats server https://ishmael.wikimedia.org/ [23:44:30] I like the comment in shallow_copy(t) [23:44:46] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [23:44:53] Reedy, thanks. [23:44:59] Worth filing a bug or RT to fix? [23:46:57] No [23:54:44] AaronSchulz: yeah, it's nice [23:55:28] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [23:55:37] that Convertdata module will be slow though, until we merge mw.loadData() [23:56:48] RECOVERY - MySQL disk space on neon is OK: DISK OK [23:57:37] New patchset: Pyoungmeister; "another swithc away from old mediawiki::user class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50314 [23:58:32] New review: Pyoungmeister; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/50314 [23:58:42] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50314
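[Editor's note] On the mw.loadData() remark above: the general idea is to build a large, read-only data table once and reuse it across invocations, instead of re-evaluating the data module every time it is used. A rough Python analogy of that caching pattern follows; it is not the Scribunto implementation, and the data file name and structure are made up for illustration.

    import json
    from functools import lru_cache
    from types import MappingProxyType

    @lru_cache(maxsize=None)
    def load_data(path: str):
        """Parse the data file once per process and return a read-only view."""
        with open(path, encoding="utf-8") as f:
            data = json.load(f)  # assumed to be a JSON object (a dict)
        return MappingProxyType(data)

    def convert(value, unit, path="convert_data.json"):
        # Every call shares the same cached table; only the first call pays
        # the parsing cost, which is the kind of saving mw.loadData() aims at.
        table = load_data(path)
        factor = table["units"][unit]["factor"]
        return value * factor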