[00:00:09] I'm not sure it is [00:00:19] I think it's making sure a username isn't used anywhere [00:00:36] Sorry. Yeah, you're right [00:02:59] well, I'll cut off my pinky if my first deploy broke CentralAuth :-( , but it's just client-side JavaScript fiddling with the Account Creation form. [00:03:14] !log olivneh synchronized php-1.21wmf1/extensions/PostEdit [00:03:22] Logged the message, Master [00:03:48] Yep, its in $wgLocalDatabases [00:03:52] Grr.... [00:04:58] !log olivneh Started syncing Wikimedia installation... : [00:05:07] Logged the message, Master [00:06:39] Haha [00:06:39] Snap [00:06:57] csteipp: hack time! [00:06:57] wfRunHooks( 'CentralAuthWikiList', array( &$wikiList ) ); [00:07:17] foreach wiki in array, if it contains wikivoyage, remove it from the array [00:07:28] You mean pluck them out with the hook [00:07:28] array(2) { [00:07:28] ["sha1base36"]=> [00:07:28] string(262) "r2baidcig82irc3n9xfozku4mgo8b96, r2baidcig82irc3n9xfozku4mgo8b96, r2baidcig82irc3n9xfozku4mgo8b96, r2baidcig82irc3n9xfozku4mgo8b96, r2baidcig82irc3n9xfozku4mgo8b96, r2baidcig82irc3n9xfozku4mgo8b96, r2baidcig82irc3n9xfozku4mgo8b96, r2baidcig82irc3n9xfozku4mgo8b96" [00:07:28] ["Sha1base36"]=> [00:07:30] string(31) "r2baidcig82irc3n9xfozku4mgo8b96" [00:07:31] } [00:07:33] binasher: hope that's not a ceph bug :/ [00:07:43] Reedy: Yeah, we can do that... [00:07:47] And also stick it in memcached [00:08:13] Orr.... [00:08:15] csteipp: we make an all-wmflabs.dblist [00:08:28] I think regex would be fast enough... [00:08:31] I think that's the better idea [00:08:43] I think we should revert the all.dblist part [00:08:44] $wgConf->wikis = array_map( 'trim', file( "$IP/../all.dblist" ) ); [00:09:09] let me make a commit for this [00:09:20] !log olivneh Finished syncing Wikimedia installation... : [00:09:31] Logged the message, Master [00:09:33] ori-l: that was quick [00:11:10] shoot, that reminds me... [00:12:02] New patchset: Reedy; "Use all-wmflabs.dblist for wmflabs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28642 [00:12:08] csteipp: https://gerrit.wikimedia.org/r/28642 [00:12:10] Reedy: i ran scap --turbo [00:13:28] works for now... [00:13:49] did you mean to leave all.dblist in labs? [00:14:19] instead of the new all-wmflabs.dblist? [00:14:55] hah [00:14:57] But yeah, otherwise +2 [00:15:23] New patchset: Reedy; "Use all-wmflabs.dblist for wmflabs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28642 [00:16:48] argh [00:16:48] fail [00:17:09] New patchset: Reedy; "Use all-wmflabs.dblist for wmflabs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28642 [00:17:17] csteipp: some dollar signs would be useful too [00:18:33] dollar signs? [00:18:52] $all = "IP/../all.dblist"; [00:19:01] Oh! Yeah... [00:19:27] AaronSchulz: ceph bug? 
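A minimal sketch of the two approaches being discussed above — the hook handler is hypothetical (it is not the code that was merged in Gerrit change 28642), and the dblist lines only illustrate what the labs branch of the config might look like:

```php
// (a) Hook approach: CentralAuth hands the wiki list out by reference via
//     wfRunHooks( 'CentralAuthWikiList', array( &$wikiList ) ), so a handler
//     can pluck unwanted wikis back out of it. Hypothetical handler:
$wgHooks['CentralAuthWikiList'][] = function ( &$wikiList ) {
	$wikiList = array_values( array_filter( $wikiList, function ( $wiki ) {
		// Drop any wiki whose database name mentions wikivoyage.
		return strpos( $wiki, 'wikivoyage' ) === false;
	} ) );
	return true;
};

// (b) dblist approach (the one taken): give labs its own all-wmflabs.dblist
//     instead of production's all.dblist. Note the "$" that was missing in the
//     paste above -- "IP/../all.dblist" would be read as a literal path.
$all = "$IP/../all-wmflabs.dblist";
$wgConf->wikis = array_map( 'trim', file( $all ) );
```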
[00:21:10] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28642 [00:21:23] !log disabling puppet on (the already depooled) ms-fe1 [00:21:34] Logged the message, Master [00:21:38] New patchset: Ori.livneh; "Enable PostEdit for dewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28645 [00:23:00] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28645 [00:23:00] !log reedy synchronized wmf-config/ [00:23:10] Logged the message, Master [00:23:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:26:12] !log olivneh synchronized php-1.21wmf1/extensions/PostEdit [00:26:24] Logged the message, Master [00:26:52] !log olivneh synchronized wmf-config/InitialiseSettings.php [00:27:08] Logged the message, Master [00:31:22] AaronSchulz: ping? [00:34:55] paravoid: want me to throw something at him? [00:35:17] hehe, sure [00:35:27] unless he's doing something important [00:35:37] mine can probably wait [00:36:34] paravoid: it helps to just say what the thing is briefly ;) [00:36:39] * AaronSchulz was filing a tracker report [00:36:55] I guess it does, sorry :) [00:37:01] so, ms-fe1 has swift 1.7.4 now [00:37:05] it's of course depooled [00:37:17] everything seems to be working with my rudimentary tests [00:37:22] do you want to give it a try at some point too? [00:38:01] if all goes well, I'll probably pool it and monitor it for a few hours to see if the leak's gone [00:38:15] what did you test? auth, container head/get, object head/get? [00:38:16] (then depool it so we can have a proper weekend :) [00:38:18] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [00:39:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.017 seconds [00:39:28] yes, but also our own unauthenticated URLs to make sure rewrite.py still works. [00:39:44] yeah, that seems fine then :) [00:41:03] if you're busy I can wait until tomorrow your morning too [00:41:25] I'd like you to at least do an attempt too, and also try e.g. a PUT [00:41:40] before pooling into production [00:42:11] paravoid: there is also the script fileOpPerfTest.php [00:42:29] * AaronSchulz should put some small test files in his user dir ;) [00:42:32] where and what does it do? [00:42:45] and do I run it with "php fileOpPerfTest.php" or does it need something else? [00:42:47] it is in maintenance/ for mediawiki [00:43:04] it would need arguments, you can run it with --help to see [00:43:28] do you have a canonical place from which you run maintenance scripts? [00:43:38] is it fenari? [00:43:45] I'd prefer hume [00:43:57] I run it locally, like php maintenance/fileOpPerfTest.php --b1 local-swift --srcdir /var/www/SwiftWiki/core/dump --maxfiles 100 --quick [00:44:21] (/dump has a bunch of small random jpg files in it) [00:44:35] the go to a "unittest-cont1" container or something and get deleted afterwards [00:44:55] so do you use a socks or something? [00:45:07] ssh -D etc.? [00:45:18] for what? [00:45:33] for connecting to ms-fe1? [00:45:48] when you said "locally" you meant locally on hume or locally on your laptop? [00:45:57] the later [00:46:27] ok, so the question stands :) how do you connect to the proxy then? 
[00:46:32] if I was running it against the production clusters (as I have before a few times) it would be from hume [00:46:34] aha [00:46:47] but yeah, you'd have to cludge the auth url [00:46:49] otherwise it would just hit fe1-3 [00:47:09] no, it'd just hit the squids, and they'd block random methods etc. [00:47:19] PUT/POST [00:47:31] doh....computer just locked up [00:47:44] paravoid: heh, I still see my trick in PrivateSettings [00:47:47] hm? [00:48:21] I defined another filebackend instance and pointed it to the target box [00:48:40] it only works if you run the script for testwiki [00:49:17] care to give an example? :) [00:49:22] so you pass in unittest-swift2 or something instead of local-swift to the php script [00:50:19] paravoid: php maintenance/fileOpPerfTest.php --b1 unittest-swift3 --srcdir /home/aaron/somefiles --maxfiles 100 --quick [00:50:50] you could define unittest-swift3 in PrivateSettings just like unittest-swift2 [00:52:06] is there stuff to render a graph of a wiki's category tree (of production wikis)? [00:52:34] I have no idea what PrivateSettings is or how to change it [00:53:31] it's in mediawiki-config repo [00:55:25] Reedy: how are we versioning that file anyway? [00:57:12] PROBLEM - Puppet freshness on search36 is CRITICAL: Puppet has not run in the last 10 hours [00:59:18] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [01:00:09] isn't it still in the old svn repo? [01:00:16] PROBLEM - Puppet freshness on search1012 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on search1017 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on sq52 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on search18 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on search27 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on sq73 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on sq77 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on sq72 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on sq80 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on sq83 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on virt8 is CRITICAL: Puppet has not run in the last 10 hours [01:00:16] PROBLEM - Puppet freshness on yttrium is CRITICAL: Puppet has not run in the last 10 hours [01:00:17] btw it won't test much if fe1 auth requests give fe2-4 storage urls [01:00:22] paravoid: ^ [01:00:37] ah, right [01:00:48] last time you did some hack to test it though, didn't you? 
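A rough sketch of what such a one-off backend entry in PrivateSettings.php could look like — the option names follow SwiftFileBackend's configuration of that era, but the backend name, auth URL and credentials below are placeholders, not the real values:

```php
// Hypothetical extra backend, defined alongside unittest-swift2 in
// PrivateSettings.php, pointing auth straight at the upgraded proxy instead
// of the LVS service address.
$wgFileBackends[] = array(
	'name'         => 'unittest-swift3',
	'class'        => 'SwiftFileBackend',
	'wikiId'       => 'testwiki',            // only usable when running as testwiki
	'lockManager'  => 'nullLockManager',
	'swiftAuthUrl' => 'http://ms-fe1.pmtpa.wmnet/auth/v1.0',  // placeholder host/path
	'swiftUser'    => 'mw:media',                             // placeholder credentials
	'swiftKey'     => 'not-the-real-key',
);
```

It could then be exercised with the command pasted above, e.g. `php maintenance/fileOpPerfTest.php --b1 unittest-swift3 --srcdir /home/aaron/somefiles --maxfiles 100 --quick`, run for testwiki since that is the wiki the backend is bound to.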
[01:01:13] I may have use various CF_* function calls directly [01:01:15] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours [01:01:15] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours [01:01:15] PROBLEM - Puppet freshness on sq53 is CRITICAL: Puppet has not run in the last 10 hours [01:01:15] PROBLEM - Puppet freshness on sq51 is CRITICAL: Puppet has not run in the last 10 hours [01:01:15] PROBLEM - Puppet freshness on sq75 is CRITICAL: Puppet has not run in the last 10 hours [01:01:16] PROBLEM - Puppet freshness on virt1008 is CRITICAL: Puppet has not run in the last 10 hours [01:01:16] PROBLEM - Puppet freshness on sq84 is CRITICAL: Puppet has not run in the last 10 hours [01:01:17] PROBLEM - Puppet freshness on sq79 is CRITICAL: Puppet has not run in the last 10 hours [01:01:17] PROBLEM - Puppet freshness on virt4 is CRITICAL: Puppet has not run in the last 10 hours [01:01:23] yeah, in svn [01:01:23] Last Changed Date: 2012-08-23 17:56:02 +0000 (Thu, 23 Aug 2012) [01:01:23] Text Last Updated: 2012-08-23 17:19:46 +0000 (Thu, 23 Aug 2012) [01:01:32] I don't even have an svn account [01:01:38] and I don't want any [01:01:42] or maybe I do, hrm. [01:01:47] rings a bell [01:02:18] PROBLEM - Puppet freshness on sq74 is CRITICAL: Puppet has not run in the last 10 hours [01:02:30] AaronSchulz: I can temporarily change the URL for the tests [01:02:34] I may still have that test code lying on a laptop somewhere [01:02:42] paravoid: maybe we set the hosts in srv193 [01:03:12] * AaronSchulz forgot if the storage url is to the lvs or to a specific box [01:03:14] lvs [01:04:16] paravoid: that's what you got with authing via curl? [01:04:41] no, that's what I get by seeing "default_swift_url" in the swauth config [01:08:31] yeah, it's the lvs [01:08:46] so we can run testwiki on that proxy before the other wikis [01:09:22] * AaronSchulz needs to go home :) [01:09:31] sure [01:09:36] we can continue that tomorrow [01:09:39] don't worry [01:09:43] go home :) [01:12:36] TimStarling: thanks for replying on RT #2108. I wasn't sure if my msg got through before my computer choked, but I'm guessing it did ;-) [01:12:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:15:12] PROBLEM - Puppet freshness on srv238 is CRITICAL: Puppet has not run in the last 10 hours [01:20:33] !log kaldari Started syncing Wikimedia installation... : [01:20:48] Logged the message, Master [01:26:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds [01:34:33] paravoid: you don't need a svn account [01:35:27] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 16 seconds [01:35:42] have the wikivoyage extensions been reviewed for security? [01:36:07] Chris has reviewed most (all?) 
of them [01:36:28] https://www.mediawiki.org/wiki/Wikivoyage_migration/Extensions [01:38:52] heh, you guys saw the youtube live stream from 6th floor;) [01:39:16] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours [01:41:36] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 263 seconds [01:43:24] New patchset: Dereckson; "Cleaning InitialiseSettings.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28655 [01:44:41] !log tstarling synchronized php-1.21wmf2/extensions/CentralAuth/CentralAuthHooks.php [01:44:57] Logged the message, Master [01:45:58] !log kaldari Finished syncing Wikimedia installation... : [01:46:08] Change abandoned: Dereckson; "Contains also a change already submitted in Ib8f36c49" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28655 [01:46:10] Logged the message, Master [01:46:25] !log tstarling synchronized php-1.21wmf2/extensions/CentralAuth/CentralAuthHooks.php [01:46:37] Logged the message, Master [01:48:03] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 13 seconds [01:50:10] !log tstarling synchronized php-1.21wmf2/extensions/CentralAuth/CentralAuthHooks.php [01:50:22] Logged the message, Master [01:53:06] !log tstarling synchronized php-1.21wmf2/extensions/CentralAuth/CentralAuthHooks.php [01:53:13] New patchset: Dereckson; "Cleaning InitialiseSettings.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28656 [01:53:18] Logged the message, Master [01:56:07] !log tstarling synchronized php-1.21wmf2/extensions/CentralAuth/CentralAuthHooks.php [01:56:20] Logged the message, Master [01:59:25] !log tstarling synchronized php-1.21wmf2/includes/User.php [01:59:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:59:37] Logged the message, Master [02:00:12] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [02:00:12] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [02:10:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.629 seconds [02:10:40] !log tstarling synchronized php-1.21wmf2/includes/User.php [02:10:55] Logged the message, Master [02:23:09] !log tstarling synchronized php-1.21wmf2/extensions/AbuseFilter/AbuseFilter.class.php [02:23:22] Logged the message, Master [02:27:46] !log tstarling synchronized php-1.21wmf2/extensions/AbuseFilter/AbuseFilter.class.php [02:27:57] !log LocalisationUpdate completed (1.21wmf2) at Fri Oct 19 02:27:57 UTC 2012 [02:27:58] Logged the message, Master [02:28:11] Logged the message, Master [02:38:09] RECOVERY - Puppet freshness on sq77 is OK: puppet ran at Fri Oct 19 02:37:58 UTC 2012 [02:38:37] RECOVERY - Puppet freshness on sq81 is OK: puppet ran at Fri Oct 19 02:38:11 UTC 2012 [02:40:06] RECOVERY - Puppet freshness on sq48 is OK: puppet ran at Fri Oct 19 02:39:45 UTC 2012 [02:40:06] RECOVERY - Puppet freshness on sq75 is OK: puppet ran at Fri Oct 19 02:39:53 UTC 2012 [02:40:06] RECOVERY - Puppet freshness on search1021 is OK: puppet ran at Fri Oct 19 02:39:54 UTC 2012 [02:40:33] RECOVERY - Puppet freshness on sq53 is OK: puppet ran at Fri Oct 19 02:40:18 UTC 2012 [02:41:36] RECOVERY - Puppet freshness on sq69 is OK: puppet ran at Fri Oct 19 02:41:30 UTC 2012 [02:42:41] RECOVERY - Puppet freshness on williams is OK: puppet ran at Fri Oct 19 02:42:23 UTC 2012 [02:44:10] RECOVERY - Puppet freshness 
on sq73 is OK: puppet ran at Fri Oct 19 02:43:36 UTC 2012 [02:44:18] RECOVERY - Puppet freshness on sq67 is OK: puppet ran at Fri Oct 19 02:44:04 UTC 2012 [02:44:41] RECOVERY - Puppet freshness on sq83 is OK: puppet ran at Fri Oct 19 02:44:24 UTC 2012 [02:44:45] RECOVERY - Puppet freshness on snapshot3 is OK: puppet ran at Fri Oct 19 02:44:35 UTC 2012 [02:44:56] RECOVERY - Puppet freshness on sq82 is OK: puppet ran at Fri Oct 19 02:44:42 UTC 2012 [02:45:57] RECOVERY - Puppet freshness on sq51 is OK: puppet ran at Fri Oct 19 02:45:44 UTC 2012 [02:46:33] RECOVERY - Puppet freshness on search20 is OK: puppet ran at Fri Oct 19 02:46:31 UTC 2012 [02:47:10] RECOVERY - Puppet freshness on erzurumi is OK: puppet ran at Fri Oct 19 02:47:01 UTC 2012 [02:47:18] RECOVERY - Puppet freshness on sq66 is OK: puppet ran at Fri Oct 19 02:47:06 UTC 2012 [02:47:39] RECOVERY - Puppet freshness on yttrium is OK: puppet ran at Fri Oct 19 02:47:20 UTC 2012 [02:47:54] !log LocalisationUpdate completed (1.21wmf1) at Fri Oct 19 02:47:49 UTC 2012 [02:48:02] Logged the message, Master [02:49:07] RECOVERY - Puppet freshness on analytics1015 is OK: puppet ran at Fri Oct 19 02:48:41 UTC 2012 [02:49:37] RECOVERY - Puppet freshness on virt4 is OK: puppet ran at Fri Oct 19 02:49:20 UTC 2012 [02:50:36] RECOVERY - Puppet freshness on search36 is OK: puppet ran at Fri Oct 19 02:50:28 UTC 2012 [02:51:05] RECOVERY - Puppet freshness on sq58 is OK: puppet ran at Fri Oct 19 02:50:40 UTC 2012 [02:51:39] RECOVERY - Puppet freshness on sq86 is OK: puppet ran at Fri Oct 19 02:51:11 UTC 2012 [02:52:06] RECOVERY - Puppet freshness on sq79 is OK: puppet ran at Fri Oct 19 02:51:52 UTC 2012 [02:52:33] RECOVERY - Puppet freshness on lvs4 is OK: puppet ran at Fri Oct 19 02:52:28 UTC 2012 [02:54:43] RECOVERY - Puppet freshness on sq36 is OK: puppet ran at Fri Oct 19 02:54:29 UTC 2012 [02:55:06] RECOVERY - Puppet freshness on search27 is OK: puppet ran at Fri Oct 19 02:54:54 UTC 2012 [02:55:28] RECOVERY - Puppet freshness on sq84 is OK: puppet ran at Fri Oct 19 02:55:08 UTC 2012 [02:55:33] RECOVERY - Puppet freshness on search16 is OK: puppet ran at Fri Oct 19 02:55:29 UTC 2012 [02:56:12] RECOVERY - Puppet freshness on virt8 is OK: puppet ran at Fri Oct 19 02:55:45 UTC 2012 [02:57:01] !log tstarling synchronized php-1.21wmf2/includes/User.php [02:57:13] Logged the message, Master [02:57:40] RECOVERY - Puppet freshness on search1012 is OK: puppet ran at Fri Oct 19 02:57:04 UTC 2012 [02:58:09] RECOVERY - Puppet freshness on snapshot2 is OK: puppet ran at Fri Oct 19 02:57:56 UTC 2012 [02:59:25] !log tstarling synchronized php-1.21wmf2/extensions/AbuseFilter/AbuseFilter.class.php [02:59:39] Logged the message, Master [03:00:39] RECOVERY - Puppet freshness on sq72 is OK: puppet ran at Fri Oct 19 03:00:23 UTC 2012 [03:00:51] RECOVERY - Puppet freshness on mw13 is OK: puppet ran at Fri Oct 19 03:00:35 UTC 2012 [03:01:06] RECOVERY - Puppet freshness on sq68 is OK: puppet ran at Fri Oct 19 03:00:53 UTC 2012 [03:01:34] RECOVERY - Puppet freshness on tarin is OK: puppet ran at Fri Oct 19 03:01:23 UTC 2012 [03:02:09] RECOVERY - Puppet freshness on search1013 is OK: puppet ran at Fri Oct 19 03:01:57 UTC 2012 [03:03:03] RECOVERY - Puppet freshness on sq74 is OK: puppet ran at Fri Oct 19 03:02:54 UTC 2012 [03:03:03] RECOVERY - Puppet freshness on search18 is OK: puppet ran at Fri Oct 19 03:02:56 UTC 2012 [03:03:39] RECOVERY - Puppet freshness on search17 is OK: puppet ran at Fri Oct 19 03:03:26 UTC 2012 [03:03:48] RECOVERY - Puppet freshness on mw14 is OK: 
puppet ran at Fri Oct 19 03:03:36 UTC 2012 [03:03:59] RECOVERY - Puppet freshness on search1017 is OK: puppet ran at Fri Oct 19 03:03:42 UTC 2012 [03:04:33] RECOVERY - Puppet freshness on srv238 is OK: puppet ran at Fri Oct 19 03:04:20 UTC 2012 [03:05:10] RECOVERY - Puppet freshness on sq52 is OK: puppet ran at Fri Oct 19 03:04:57 UTC 2012 [03:06:39] RECOVERY - Puppet freshness on sq80 is OK: puppet ran at Fri Oct 19 03:06:08 UTC 2012 [03:06:39] RECOVERY - Puppet freshness on sq57 is OK: puppet ran at Fri Oct 19 03:06:14 UTC 2012 [03:07:06] RECOVERY - Puppet freshness on sq70 is OK: puppet ran at Fri Oct 19 03:06:49 UTC 2012 [03:07:06] RECOVERY - Puppet freshness on virt1008 is OK: puppet ran at Fri Oct 19 03:06:58 UTC 2012 [03:07:06] RECOVERY - Puppet freshness on capella is OK: puppet ran at Fri Oct 19 03:07:00 UTC 2012 [03:07:51] RECOVERY - Puppet freshness on sq44 is OK: puppet ran at Fri Oct 19 03:07:37 UTC 2012 [03:08:36] RECOVERY - Puppet freshness on search24 is OK: puppet ran at Fri Oct 19 03:08:10 UTC 2012 [03:16:41] New patchset: Dereckson; "Unit testing for InitialiseSettings.php (WIP - DO NOT MERGE)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28627 [03:21:34] New patchset: Dereckson; "Unit testing for InitialiseSettings.php (WIP - DO NOT MERGE)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28627 [03:24:06] New review: Dereckson; "PS1,2: first draft" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/28627 [04:15:10] !log tstarling synchronized php-1.21wmf2/includes/User.php [04:15:23] Logged the message, Master [04:51:12] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [05:37:41] New review: Liangent; "I don't understand what the above comment mean." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28655 [05:58:15] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [06:24:12] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [06:24:12] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [06:58:59] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours [08:24:02] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [08:56:21] New review: Dereckson; "It means I abandoned the change (to republish it properly): this were a difference between master an..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28655 [09:12:03] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [09:18:05] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours [09:48:56] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours [10:39:05] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [11:39:58] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours [12:00:56] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [12:00:56] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [12:31:39] New review: Hashar; "Thanks for taking care of that. 
Looks like you are on the right direction :-]" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/28627 [13:48:45] !log reedy synchronized php-1.21wmf2/includes/EditPage.php [13:48:57] Logged the message, Master [13:50:36] !log shutting down msbe1001-1012 to remove ssd's and package for dell return [13:50:46] Logged the message, Master [13:52:14] PROBLEM - Host ms-be1007 is DOWN: PING CRITICAL - Packet loss = 100% [13:52:35] PROBLEM - Host ms-be1006 is DOWN: PING CRITICAL - Packet loss = 100% [13:53:53] PROBLEM - Host ms-be1005 is DOWN: PING CRITICAL - Packet loss = 100% [13:59:18] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: metawiki to 1.21wmf2 [13:59:26] Logged the message, Master [14:14:36] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: commonswiki to 1.21wmf2 [14:14:51] Logged the message, Master [14:19:45] PROBLEM - Host ms-be1011 is DOWN: PING CRITICAL - Packet loss = 100% [14:20:00] PROBLEM - Host ms-be1009 is DOWN: PING CRITICAL - Packet loss = 100% [14:20:00] PROBLEM - Host ms-be1010 is DOWN: PING CRITICAL - Packet loss = 100% [14:43:39] !log putting ms-fe1 back into the pool [14:43:48] Logged the message, Master [14:51:47] New review: Hashar; "Can you please add a bit more context in the commit message? Thanks!" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/27830 [14:52:05] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [14:54:22] New patchset: Mark Bergsma; "Add Range support to Varnish in streaming mode" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/28379 [14:56:47] New patchset: Hashar; "Removing pt.wikimedia configuration" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28636 [14:57:46] New review: Hashar; "gave a bit more context in commit message and rebased change." [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/28636 [14:57:46] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28636 [14:59:42] New review: Hashar; "Which fix up Bug 41133 - beta all.dblist is a live hack" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28642 [15:08:08] PROBLEM - Host srv223 is DOWN: PING CRITICAL - Packet loss = 100% [15:37:43] how interesting [15:37:45] no leak that I can see. [15:38:44] what did you do? [15:38:46] upgrade to 1.7.4 :) [15:38:52] aha [15:39:06] but it's too early to tell for sure [15:39:44] i'm happy too, i've got the range requests working [15:39:57] oh really? [15:39:57] so next week we should be able to deploy varnish again [15:40:07] with your patch? [15:40:13] yes [15:40:14] cool! [15:40:43] https://gerrit.wikimedia.org/r/#/c/28379/2 [15:40:47] feel free to review/comment/etc [15:40:51] I'm looking at it right now [15:44:28] nitpick [15:44:29] if (content_len >= 0 && content_len != LONG_MAX) [15:44:32] shouldn't this be SSIZE_MAX? 
[15:45:00] it's the same in amd64 obviously [15:45:04] (hence nitpick) [15:45:11] but I'm not sure about other platforms [15:45:29] no, it's strtol [15:45:48] oh hrm [15:45:56] if strtol says it'll return LONG_MAX on overflow, then i should check that [15:45:59] right [15:46:04] as long as it fits into ssize_t :) [15:46:12] yes [15:46:17] but yeah, let's hope no platforms define ssize_t as int [15:46:27] let's hope so ;) [15:47:16] okay, it's not like I understand varnish's code [15:47:26] so I focused on the good ol' stuff that I know and may have gone wrong [15:47:30] data types and such :) [15:47:41] that's good [15:47:56] because I've changed that around a fair bit while working on it, so always good to have someone recheck that afresh [15:49:16] <^demon> mark: Stackoverflow says that strtol returns LONG_MAX on overflow: http://stackoverflow.com/questions/5493235/strtol-returns-an-incorrect-value [15:49:30] that's what he said [15:49:46] <^demon> I was agreeing :) [15:50:00] stackoverflow meh [15:50:10] just look at the manpage man [15:51:23] oh paravoid :-] Good morning! [15:51:42] paravoid: regarding Gallium upgrade, will you be in charge of it ? If so can we schedule it for next week ? [15:54:26] oh gallium [15:54:27] sigh [15:54:53] !log msw-a2-eqiad is coming down for replacement per RT 3683. all mgmt connectivity (only) will be lost to items in A2-EQIAD [15:55:05] mark: so, what the behavior now, after your patch & VCL changes? I got confused again :) [15:55:05] Logged the message, RobH [15:55:07] mark LeslieCarr ^ [15:55:20] (i like to notify you guys when i go yanking switches, even mgmt) [15:55:43] it is a bit confusing paravoid [15:55:52] the idea is: [15:55:57] we define a threshold at say 64 MB [15:56:33] if a range request comes in with a low range below that threshold, it will cause varnish to fetch the entire object from the backend [15:56:36] but it will also enable streaming mode [15:56:49] and that means that with this new varnish patch, it will deliver the range as soon as it comes in [15:57:01] which should be quick enough given that the threshold is set fairly low [15:57:16] ABOVE that threshold, waiting for that range to come in could take far too long [15:57:16] so, if the client seeks to, say, 0:10, it'll wait a few seconds more to see it [15:57:20] especially on very large files [15:57:21] but that's okay [15:57:35] right [15:57:36] so THOSE will be requested from the backend _as range requests_, in pass mode, and won't get cached [15:57:59] this means that low ranges will cause entire objects to get cached, with good performance for the client requesting them [15:57:59] nod [15:58:06] ah, I forgot [15:58:06] high ranges will either be served from the cache, or be passed through [15:58:14] also with good enough performance [15:58:20] I have something that you'll love [15:58:21] https://bugs.launchpad.net/swift/+bug/1065869 [15:58:30] Any GET with a Range always sends two GETs to the backend object server - one with the range and one without. 
[15:58:33] swift bug [15:58:44] heh [15:58:50] nice [15:58:59] PROBLEM - Host ps1-a2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [15:58:59] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [15:59:18] so, yeah, what you say makes sense to me [15:59:29] it's the best we can do now I think [15:59:32] it's also what squid does at the moment [15:59:39] so at least it will no longer be worse than squid [16:02:15] yeah, it'd be nice to finally ditch squid [16:02:17] PROBLEM - Swift HTTP on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:02:26] PROBLEM - Host ms-be1008 is DOWN: PING CRITICAL - Packet loss = 100% [16:02:37] PROBLEM - Host ms-be1001 is DOWN: PING CRITICAL - Packet loss = 100% [16:02:37] PROBLEM - Swift HTTP on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:02:37] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:02:40] RobH: that you? [16:02:44] PROBLEM - Host ms-be1002 is DOWN: PING CRITICAL - Packet loss = 100% [16:03:05] PROBLEM - Host ms-be1012 is DOWN: PING CRITICAL - Packet loss = 100% [16:03:05] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:03:11] PROBLEM - Host ms-be1003 is DOWN: PING CRITICAL - Packet loss = 100% [16:03:19] mark: btw we need to port some of the squid stanzas to varnish too [16:03:28] like IP blocks, UA blocks, regexp URL blocks and such [16:04:03] and more importantly, in the upload varnishes, block PUT/POST/DELETE [16:04:08] well do we still need them [16:04:48] some swift measures like PUT/POST/DELETE and blocking X-Auth-Token it'd be nice to have [16:04:48] for the rest, no idea [16:04:54] they were there before my time :) [16:05:09] if (req.request != "GET" && req.request != "HEAD" && req.request != "POST" && req.request != "PURGE") { [16:05:09] /* We only deal with GET, HEAD and POST by default */ [16:05:09] error 403 "HTTP method not allowed."; [16:05:10] } [16:05:18] so just need to block POST [16:05:22] PUT you mean [16:05:34] ah no POST [16:05:34] and DELETE [16:05:37] mark - just back-scrolled and read range request works for varnish :-) [16:05:38] delete and put are already blocked [16:05:42] woosters: yes [16:05:48] woot! [16:07:32] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:09:40] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:15:47] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:21:16] * mark yawns [16:21:17] i need some fresh air [16:21:20] so I think i'll call it a day [16:21:37] !log msw-a2-eqiad back online [16:21:38] RECOVERY - Host ps1-a2-eqiad is UP: PING OK - Packet loss = 0%, RTA = 28.51 ms [16:21:44] mark: have a nice weekend [16:21:49] thanks [16:21:49] Logged the message, RobH [16:21:52] you too [16:21:53] c'ya [16:22:03] bye mark [16:24:56] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [16:24:56] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [16:25:34] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:25:54] paravoid: so yeah gallium :-]  If only there was less production issues ;-D [16:26:17] paravoid: if you get too busy with swift / network and all, maybe someone else can take care of it ? 
[16:37:43] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:23] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:43:41] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:44:17] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:47:15] New review: MaxSem; "Chris have you done demoing? If so, please abandon." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/27449 [16:51:09] Change abandoned: Cmcmahon; "pointless comment, abandoning" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/27449 [16:53:26] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:55:56] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:56:44] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:00:05] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours [17:06:29] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:11:31] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:12:14] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:17:53] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:47] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:30:11] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:32:36] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 229 seconds [17:34:14] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:32] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:42:29] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:28] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:51:20] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:52:05] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 20 seconds [17:52:14] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:54:02] Change abandoned: CSteipp; "Continued on Matthias'es change https://gerrit.wikimedia.org/r/#/c/28238/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28375 [17:54:49] ^demon: we put a tentative deploy window for wikidata.org on Wednesday from 11am-1pm EDT (8am-10am PDT). Does that work for you? [17:56:51] Jeff_Green: ready to put the firewalls back locked down on the payments <-> administrations zone ? [17:57:49] yah--have you gone through again to figure out why it was breaking logging? [17:58:32] <^demon> robla: Wednesday as in this coming Wednesday? Yeah, sounds fine. [17:58:54] yup. About to send mail to the ops list about this to make sure we're all aligned on this one. [17:59:56] oh that close? [18:00:18] cool [18:00:18] it beats wikivoyage! 
[18:01:17] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:04:27] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:13:55] notpeter: want to send out a little email like "here's some things to look out for/what i learned during rotation week" ? [18:18:32] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:19:16] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:22:25] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:23:17] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:24:56] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [18:25:41] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:56] <^demon> paravoid: Wikidata team told us they'd be sad if wikivoyage beat them :) [18:29:02] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:29:46] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 255 seconds [18:32:17] LeslieCarr: sure! I'm about to write some stuff up :) [18:32:27] woot [18:32:57] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 295 seconds [18:33:02] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:39:29] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 217 seconds [18:41:20] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:43:41] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:45:56] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 184 seconds [18:50:53] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 307 seconds [18:51:02] PROBLEM - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:55:41] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 194 seconds [18:59:03] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 216 seconds [18:59:37] New patchset: CSteipp; "Add wikivoyage docroot" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28699 [19:01:32] PROBLEM - Host srv221 is DOWN: PING CRITICAL - Packet loss = 100% [19:01:46] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:04:06] AaronSchulz: ping? 
[19:04:41] I found an image on the officewiki that doesn't load and I suspect it's the 200 OK header/404 body [19:05:26] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 20 seconds [19:06:26] hm maybe not [19:06:36] thumb works, original isn't [19:06:49] it's over img_auth.php [19:09:27] ok, img_auth.php is buggy [19:11:26] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:12:57] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [19:12:57] ACKNOWLEDGEMENT - Swift HTTP on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn not yet in production [19:13:14] ACKNOWLEDGEMENT - Swift HTTP on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn not yet in production [19:13:14] ACKNOWLEDGEMENT - Swift HTTP on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn not yet in production [19:13:14] ACKNOWLEDGEMENT - Swift HTTP on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn not yet in production [19:14:26] paravoid: buggy? I'm using it on my testwikis right now, or some of them at least [19:14:52] daniel_zahn is not in production? ;) [19:18:15] we lost two image scalers [19:18:23] and the rest are melting [19:18:56] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours [19:20:20] paravoid: lemme know if there's anything i can do to help [19:21:13] thanks, I'm just going to power cycle them [19:21:36] we haven't got an LVS alert for rendering yet [19:21:44] but we got reports that URLOpen times out [19:21:58] AaronSchulz: heh, yeah, that is almost as good as when i put "repeatedly went down in the past" in there meaning the server [19:23:37] ok, srv223 is unreachable in its mgmt too [19:23:41] let's hope srv221 recovers [19:25:11] !log powercycling srv221, locked up, nothing in serial console [19:25:24] Logged the message, Master [19:26:05] !log powercycling srv223 too, locked up, nothing in serial console [19:26:05] wtf. [19:26:13] Logged the message, Master [19:28:15] both back up [19:28:41] RECOVERY - Host srv221 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [19:28:59] RECOVERY - Host srv223 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [19:33:38] PROBLEM - Apache HTTP on srv223 is CRITICAL: Connection refused [19:33:55] !log aaron synchronized php-1.21wmf2/includes/filebackend/FileBackendStore.php 'deployed 65e47037f6e9da2e81db2bc1374cdac375a82db5' [19:34:07] Logged the message, Master [19:35:23] 2012-10-19 19:26:35.923697 [rendering] Leaving previously pooled but down server srv221.pmtpa.wmnet pooled [19:35:37] it appears that two failed imagescalers is one too many [19:37:43] AaronSchulz: btw, I pooled ms-fe1 this evening [19:38:02] yeah I saw, how is the memory doing? [19:38:10] looks okay [19:38:26] I'm looking at the socket count in this weird leftover state [19:38:26] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [19:38:45] and it's increasing but barely so [19:39:12] so now it's 28 in about 5 hours [19:39:23] if fluctuates too [19:39:37] so it's either that it's fixed in most but not all of the cases [19:39:54] or that it's being aggravated by something, e.g. 499s [19:39:58] we'll see [19:44:56] when are you doing the others? 
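For context on the img_auth.php issue above: private wikis such as officewiki serve originals through img_auth.php rather than directly from the web server, roughly along these lines (a generic sketch, not officewiki's actual settings):

```php
// Generic private-wiki sketch: route file URLs through img_auth.php so that
// MediaWiki checks read permission before streaming the original from the
// file backend.
$wgGroupPermissions['*']['read'] = false;            // private wiki: no anonymous read
$wgUploadPath = "$wgScriptPath/img_auth.php";        // originals go via img_auth.php
$wgImgAuthDetails = true;                            // optional: show detailed denial reasons
```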
[19:45:08] next week if this doesn't blow up [19:50:06] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours [20:10:26] RECOVERY - Puppet freshness on stat1 is OK: puppet ran at Fri Oct 19 20:10:10 UTC 2012 [20:18:54] !log reedy synchronized php-1.21wmf2/includes/api/ApiParse.php [20:19:06] Logged the message, Master [20:21:29] New patchset: Dzahn; "let CT send Nagios host and service commands" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28741 [20:22:37] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28741 [20:31:58] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [20:40:05] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [20:49:33] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.002 second response time on port 11000 [21:02:21] !log upgrading percona-toolkit to 2.1.5 [21:02:35] Logged the message, Master [21:10:37] !log pulled srv194 and srv200 from lvs for memc testing [21:10:47] Logged the message, Master [21:24:37] New review: Kaldari; "Switching en.wikinews from ReaderFeedback to ArticleFeedback might not be an easy sell. ReaderFeedba..." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/27830 [21:36:24] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 244 seconds [21:43:31] New review: Aaron Schulz; "The "feedback" form is custom site JS that posts comments to an opinion page." [operations/mediawiki-config] (master); V: 0 C: 1; - https://gerrit.wikimedia.org/r/27830 [21:54:07] binasher: it would really be nice to be able to prune profile calls from the graphite suggestion list that have not occurred in a month or so [21:56:09] hmm [21:56:09] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 11 seconds [21:56:18] is there any reason to keep such data around at all? [21:57:25] how many months do we need? [22:01:55] 3? [22:02:01] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [22:02:01] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [22:10:41] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 281 seconds [22:18:18] binasher: i made you the owner of #3759, but i'm not sure if i was supposed to or if that's something you guys (i.e., ops) do internally. sorry if that's presumptuous. [22:18:24] on rt, that is [22:28:41] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 19 seconds [22:29:29] paravoid: how is the new hardware coming along? [22:33:37] aaronschulz - do u mean the c2100 replacements? The 12 of them for Tampa has arrived...yesterday [22:39:42] Change merged: Asher; [operations/debs/varnish] (master) - https://gerrit.wikimedia.org/r/27662 [22:41:15] woosters: do you know how the eqiad ones are coming along? 
[22:41:30] just sent in to Erik for approval [22:50:23] !log authdns-update, renaming travel-guide-lb to wikivoyage-lb [22:50:37] Logged the message, Master [22:53:41] binasher: http://tracker.newdream.net/issues/3081hmm [23:11:26] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 216 seconds [23:15:18] !log returning srv194/200 to apache lvs pool [23:15:31] Logged the message, Master [23:20:01] New patchset: CSteipp; "Remove SlippyMaps from Wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28782 [23:23:46] Change merged: Matthias Mullie; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28782 [23:27:47] !log reedy synchronized php-1.21wmf2/includes/api/ApiParse.php [23:27:59] Logged the message, Master [23:30:50] !log shutting down sfo-monitor1 [23:30:55] !log reedy synchronized php-1.21wmf2/extensions/SwiftCloudFiles [23:30:59] Logged the message, Mistress of the network gear. [23:31:10] Logged the message, Master [23:36:53] !log reedy synchronized wmf-config/CommonSettings.php 'wgMaxImageArea to 17MP' [23:37:06] Logged the message, Master [23:46:57] LeslieCarr: do you happen to know what's going at UTC 1200 on the esams bits daily? [23:47:06] it's looking like varnish just vomits [23:47:10] really ? [23:47:19] :( [23:47:29] ya; cache misses goes to 12M or so [23:47:40] it's... unique [23:48:11] :( [23:59:56] LeslieCarr: it looks like it happens as eqiad as well
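Regarding the CommonSettings.php sync logged at 23:36 ("wgMaxImageArea to 17MP"): $wgMaxImageArea caps the pixel area (width × height) that the image scalers will attempt to thumbnail. A sketch of that change — the exact value and formatting used in wmf-config may differ:

```php
// Sketch only: raise the scaling cap to roughly 17 megapixels; images with a
// larger pixel area are not thumbnailed by the scalers.
$wgMaxImageArea = 1.7e7;
```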