[00:06:51] whats the most up to date visual of our cluster that runs mw? [00:10:12] Change merged: Aaron Schulz; [operations/mediawiki-multiversion] (master) - https://gerrit.wikimedia.org/r/34231 [00:10:22] tfinc: the one you draw for us. thanks! [00:10:48] binasher: i figured as much [00:10:57] * AaronSchulz reminds binasher about https://gerrit.wikimedia.org/r/#/c/34648/ [00:15:40] New patchset: Aaron Schulz; "Enable transcoding on all wikis that allow uploads" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33542 [00:16:04] New review: Aaron Schulz; "Why does this need to depend on wgEnableUploads, can't the software just check that?" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/33542 [00:18:52] AaronSchulz: that change isn't backwards compatible with the old schema, is it? [00:21:09] Why? It's only adding a column.. [00:21:12] New patchset: Dereckson; "(bug 41785) Logo for ml.wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35334 [00:23:12] preilly - we just created a new queue for access request and it was set to 'limited' access; I think mutante explained to you earlier. [00:23:32] woosters: yeah, he explained it [00:26:18] woosters: thanks for making sure that I knew about it [00:30:42] binasher: it should be [00:30:55] old code will leave out the column and it will be NULL [00:31:09] * AaronSchulz was just double checking the BLOB behavior with a test table [00:31:29] AaronSchulz: i meant new code with old schema [00:31:48] oh, yeah, well then it wouldn't work ;) [00:32:06] the db change would be done first [00:33:42] AaronSchulz: i was just asking since the jobq schema change requirement seemed to upset some people running from master (at translatewiki?).. but whatever, it's in MysqlUpdater [00:34:20] binasher: well one of those people added +1 to this ;) [00:35:32] wasn't the job queue change in the updater too? 
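(Editor's sketch, not part of the log.) The compatibility argument above can be reproduced on a throwaway table: old code that omits a newly added column leaves it NULL, new code fails against the old schema, so the DB change has to go first. This is a minimal illustration under stated assumptions: the table and column names (`job_demo`, `payload`, `token`) are hypothetical, and an in-memory SQLite database stands in for MySQL. It is not the schema change under review.

```python
import sqlite3


def demo():
    """Show why the schema change must land before the new code.

    Hypothetical table/column names; SQLite is a stand-in for MySQL.
    """
    db = sqlite3.connect(":memory:")
    # Old schema: no "token" column yet.
    db.execute("CREATE TABLE job_demo (id INTEGER PRIMARY KEY, payload BLOB)")

    # New code against the OLD schema: the column it writes does not exist.
    try:
        db.execute(
            "INSERT INTO job_demo (payload, token) VALUES (?, ?)", (b"x", "t1")
        )
        new_code_on_old_schema = "ok"
    except sqlite3.OperationalError:
        new_code_on_old_schema = "fails"

    # The DB change is done first: it only adds a nullable column.
    db.execute("ALTER TABLE job_demo ADD COLUMN token TEXT")

    # Old code against the NEW schema: it leaves the column out, so it is NULL.
    db.execute("INSERT INTO job_demo (payload) VALUES (?)", (b"old",))
    old_row_token = db.execute("SELECT token FROM job_demo").fetchone()[0]

    # New code against the NEW schema now works.
    db.execute(
        "INSERT INTO job_demo (payload, token) VALUES (?, ?)", (b"new", "t1")
    )
    row_count = db.execute("SELECT COUNT(*) FROM job_demo").fetchone()[0]
    db.close()
    return new_code_on_old_schema, old_row_token, row_count
```

Calling `demo()` returns a tuple describing the three cases: whether new code worked on the old schema, the value old code left in the new column, and how many rows ended up in the table.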
[00:35:44] PROBLEM - Lucene on search1015 is CRITICAL: Connection refused [00:38:46] though, might aswell wait till I've cleared the crap out of the tables again.. [00:53:22] RECOVERY - Lucene on search1015 is OK: TCP OK - 0.031 second response time on port 8123 [01:22:19] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [01:39:43] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 272 seconds [01:39:52] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 268 seconds [01:44:40] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [01:46:01] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [01:49:39] New patchset: Dereckson; "(bug 41785) Set upload file URL on en.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35341 [01:52:09] New review: Tychay; "I feel your pain. It does not die with you (or this patch). :-)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/27830 [01:52:10] New patchset: Dereckson; "(bug 42263) Set upload file URL on en.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35341 [01:53:37] New review: Dereckson; "PS2: fixing bug number" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/35341 [01:55:51] New patchset: Kaldari; "Switching back to http flickr API per bug 42468" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35343 [01:56:51] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35343 [01:57:58] New patchset: Pyoungmeister; "coredb: more reasonable inhritence model" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35344 [02:00:34] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 279 seconds [02:00:34] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 279 seconds 
[02:51:16] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [02:51:16] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [02:51:16] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [03:07:19] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [03:07:19] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [03:25:46] PROBLEM - Lucene on search1016 is CRITICAL: Connection refused [03:33:35] Lucene issues again? sigh. [03:34:32] Where? [03:35:13] above, 10 minutes ago. [03:35:19] nagios-wm. [03:36:01] Oh. I ignore nagios. [03:53:04] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [03:56:47] do we actually use those eqiad servers? [03:58:14] !log synchronized wmf-config/reporting-setup.php "Setting S:FundraiserStatistics to use db1013 as the slave" [03:59:43] * pgehres misses morebots [04:00:00] !log restarting lucene-search-2 on search1016 due to init script bug [04:00:52] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.027 second response time on port 8123 [04:01:02] looks like I didn't get around to committing my init script fix [04:02:40] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [04:03:00] Ryan_Lane: can you kick morebots? 
[04:08:13] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [04:10:36] TimStarling: no, it's currently inactive [04:10:46] where it = eqiad search [04:11:08] !log synchronized wmf-config/reporting-setup.php "Setting S:FundraiserStatistics to use db1013 as the slave" [04:11:16] Logged the message, Master [04:11:37] !log TimStarling: restarting lucene-search-2 on search1016 due to init script bug [04:11:44] Logged the message, Master [04:11:52] New patchset: Tim Starling; "Fixed init stop to wait for the process to stop" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/35345 [04:12:18] see above for why search1016 was not working [04:13:06] ah, makes sense! [04:13:38] I assume this is one reason for https://bugzilla.wikimedia.org/show_bug.cgi?id=42423 ? [04:16:01] probably [04:16:23] I'll add a comment for myself, thanks [04:27:25] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [04:34:40] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.007 second response time on port 11000 [05:36:21] PROBLEM - Lucene on search13 is CRITICAL: Connection timed out [05:41:00] RECOVERY - Lucene on search13 is OK: TCP OK - 2.997 second response time on port 8123 [06:29:18] New review: Tim Starling; "Rebase fails." [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/33037 [07:07:03] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [07:28:52] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [08:47:21] New review: Nikerabbit; "Forgot to deploy this?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35343 [08:50:38] New patchset: Siebrand; "Revert "Switching back to http flickr API per bug 42468"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35359 [08:50:56] New review: Siebrand; "Simple revert." 
[operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/35359 [08:51:21] aw [08:56:09] New review: Hashar; "Original change has not been deployed so it get reverted :)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/35359 [08:56:09] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35359 [09:03:19] New patchset: Nikerabbit; "(bug 42280) Temporary disable WebFonts on fa.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35145 [09:03:37] Change merged: Nikerabbit; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35145 [09:13:01] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [10:01:40] New review: Hashar; "-1 still needs logic simplification" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/8120 [10:09:00] New patchset: ArielGlenn; "start of role class for media, dump mirrors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35360 [10:10:12] New review: Hashar; "the new git::extension() duplicate codes from git::clone(). What about calling git::clone() instead?" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/35173 [10:19:20] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35360 [10:42:36] Hi! What's wrong with bastion? It keeps disconnecting my ssh sessions. (It's not my internet connection.) [10:44:13] bast1001? I'm on it and it's not bouncing me (and I have a horrible connection) [10:44:30] can it be something between you and there? [10:44:56] bastion1.pmtpa.wmflabs [10:45:01] ah [10:45:10] sorry, I was talking about the production host [10:45:12] I dont have this problem connecting to somewhere else [10:46:59] yeah, I don't know about labs what might be going on [10:47:00] is this the right channel for the labs bastion, too? 
[10:47:06] ah ok [10:47:07] um there's a labs channel [10:47:37] yes, but it's quiet around there. [10:48:42] I'll keep asking. :) [10:48:50] ok, good luck [10:49:02] thx [10:54:13] New review: Silke Meyer; "Hi hashar, in https://gerrit.wikimedia.org/r/#/c/30593/ you proposed to find a more elegant solution..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/35173 [10:54:20] New patchset: ArielGlenn; "add media mirror cron script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35366 [10:55:59] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35366 [11:00:30] New patchset: ArielGlenn; "test mirror role class on ms1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35367 [11:01:28] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35367 [11:11:24] New patchset: ArielGlenn; "fix up the mirror classes structure" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35368 [11:12:06] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35368 [11:13:22] apergos: I'm fine with the plan [11:13:30] γρεατ [11:13:31] are all the servers racked up? [11:13:32] er [11:13:33] great [11:13:49] I bet not, I think they get racked in place of the old servers [11:13:51] and remember to keep 4 servers out for now, for the ceph lab [11:13:59] so we decommision two, rack two, deploy two [11:14:24] wait, 4 in tampa? of msbe1-12? [11:14:32] I saw the "two" part, it's a bit risky [11:14:37] yep [11:14:57] also, put ms-be3 in the front of the queue [11:14:59] it has died twice [11:15:04] ah, didn't know that [11:15:12] yeah, I had to powercycle it [11:15:16] ok [11:15:26] so do you care if your boxes have ssds? 
for ceph [11:15:27] two times in two months, but still [11:15:34] yes, I do :) [11:15:46] it'd be nice to have SSDs in that cluster [11:16:00] maybe we should just do all of em with ssds [11:16:04] in the end [11:16:22] if we have them on hand for all 12. [11:18:43] New patchset: Dereckson; "(bug 40872) Enable NewUserMessage on gu.wikipedia & gu.wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35369 [11:22:54] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [11:23:26] New patchset: ArielGlenn; "environment vars want = in cron stanza" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35373 [11:24:00] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35373 [11:28:20] New patchset: ArielGlenn; "ms10 and ms1001 in one stanza" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35375 [11:29:09] New patchset: Dereckson; "(bug 41880) Import sources configuration on te.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35376 [11:29:56] New patchset: ArielGlenn; "ms10 and ms1001 in one stanza" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35375 [11:30:01] what are ms10/ms1001 used for? [11:30:16] rsync ro external media mirrors [11:30:41] and we'll be building local media bundles on those hosts too [11:31:14] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35375 [11:32:19] how was your paris trip? [11:35:08] good! [11:39:43] New patchset: ArielGlenn; "for 'small' wikis only sync remote (hash)dirs that exist remotely" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/35378 [11:40:23] who's on rt duty this week? [11:40:31] noone? 
:) [11:40:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:41:05] so, I'm trying to watch http://commons.wikimedia.org/wiki/File:The_Impact_Of_Wikipedia.webm [11:41:10] and the video never plays [11:41:23] is it a local issue/pebkac? [11:41:34] or should I start a round of tcpdumps... [11:41:36] works fine for me [11:45:14] i'm still not seeing any banners? [11:45:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.662 seconds [11:45:50] didn't you see the latest mails? [11:46:04] not yet, just reading my email [11:46:12] sigh [11:46:16] non en in april [11:46:30] I cannot get that video to play for me either btw [11:46:35] so apparently only english-speaking countries are going to be in this round [11:46:42] and we'll have another round of FR in April [11:46:44] are you kidding me? [11:46:47] nope [11:46:51] I wish I was. [11:47:00] they want to give lots of time for translating tailored messages etc [11:47:26] this is their plan going forward (i.e. not just this year) but you'll see it in the emails [11:47:39] we've been postponing work for after October, it's almost December [11:47:39] ok good [11:47:45] then i'm continuing my varnish work now [11:47:55] in esams? :) [11:47:59] anywhere [11:48:23] this video plays for me [11:49:01] downloaded it works fine of course, so it's something about my plugins or ff prolly [11:49:34] * apergos tries chrome [11:50:34] chrome works, ff not. meh [11:50:44] huh [11:50:51] FF, Opera, IE WFM [11:50:52] just finished my mail, didn't see anything? whom was it from? [11:51:08] zack? [11:51:19] Zack [11:51:23] where? 
[11:51:29] see wmfall [11:51:46] Fundraiser launch update [11:51:55] it's also archived on wikimedia-l but I don't subscribe there [11:52:21] oh that mail [11:52:24] read that yesterday [11:52:25] that is to say [11:52:32] i didn't really read it except the first line :P [11:52:34] "read" :) [11:52:37] :-D [11:53:14] there was no TLDR in there [11:53:24] nope [11:53:52] lol [11:53:57] I know why FF refuses to play it [11:54:03] oh? [11:54:24] HTTP/1.1 206 Partial Content [11:54:24] Server: nginx/1.1.19 [11:54:24] Date: Tue, 27 Nov 2012 11:50:32 GMT [11:54:24] Content-Type: text/plain [11:54:30] lol [11:54:32] a text/plain video [11:54:35] nice [11:54:55] did I say, "we're not really ready for this yet"? ;) [11:55:01] yes you did [11:55:08] :facepalm: [11:56:35] hm, strange, the 200 OK to curl says audio/webm [11:56:40] (which is still wrong, but better) [11:57:02] where was that nifty little script of yours mark? [11:57:08] what's it say to chrome? something different ? [11:57:11] in my homedir on fenari [11:57:24] there's a range version as well [11:57:38] params are: ip address to use, port, http host header, URI [11:57:47] and the range header content after that if you use the range version [11:57:55] hm, also audio/webm with curl -r [11:57:58] strange [11:58:07] firstbyte.py and firstbyte-range.py [12:07:31] i'm gonna request some ripe ip space [12:11:11] HTTP/1.0 200 OK [12:11:11] Content-Type: text/plain [12:11:11] Content-Length: 33134247 [12:11:12] Accept-Ranges: bytes [12:11:12] Server: nginx/0.7.65 [12:11:15] nginx 0.7.65?! [12:11:36] that's ms5 [12:11:42] why on earth would this be fetched from ms5 [12:11:57] unless it's a borked SSL host? 
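(Editor's sketch, not part of the log.) The header check being done by hand above, comparing the Content-Type a backend returns with what a .webm file should have, can be written as a small offline helper. This is only an illustration: the function names and the one-entry `EXPECTED_TYPES` map are assumptions, it is not the firstbyte.py script mentioned in the log, and it parses a pasted header block rather than making a request.

```python
# Expected MIME type per file extension. Only .webm is listed because
# video/webm is the type discussed in the log; extend as needed.
EXPECTED_TYPES = {".webm": "video/webm"}


def parse_headers(raw):
    """Split a pasted HTTP response head into (status_line, headers dict)."""
    lines = raw.strip().splitlines()
    status = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip anything that is not a "Name: value" line
            headers[name.strip().lower()] = value.strip()
    return status, headers


def check_content_type(path, raw):
    """Compare the returned Content-Type with the one expected for `path`."""
    status, headers = parse_headers(raw)
    ext = path[path.rfind("."):].lower() if "." in path else ""
    expected = EXPECTED_TYPES.get(ext)
    # Drop any parameters such as "; charset=..." before comparing.
    actual = headers.get("content-type", "").split(";")[0].strip().lower()
    return {
        "status": status,
        "server": headers.get("server"),
        "expected": expected,
        "actual": actual,
        "ok": expected is not None and actual == expected,
    }


# Header block as pasted in the log; the file name is used for illustration.
RAW = """HTTP/1.0 200 OK
Content-Type: text/plain
Content-Length: 33134247
Accept-Ranges: bytes
Server: nginx/0.7.65"""

result = check_content_type("The_Impact_Of_Wikipedia.webm", RAW)
```

Here `result["ok"]` is false because the backend answered text/plain, and `result["server"]` carries the Server header that gave away which host the response came from.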
[12:12:13] last-modified 26 Nov, so it can't be ms5 [12:12:40] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [12:12:55] ms6 [12:14:00] oh right [12:14:08] I keep forgetting about ms6 [12:14:20] i'm thinking about turning it off [12:14:26] I have the impression it's causing quite a few issues now [12:14:26] so, a 480p version of a video is a "thumb" [12:14:40] and stored in ms6 [12:15:01] it's actually cool that it worked so long [12:15:15] it failed once, when leslie rebooted it and it didn't come back up [12:15:18] then ms5 was overloaded [12:15:26] but since ms5 is now gone, I think swift may be able to handle it [12:15:38] or not, in which case we'll turn it back on [12:15:41] should be, yes [12:15:48] it would just be nice if swift was in a bit better shape on new servers yet ;) [12:16:03] we're getting there [12:16:11] how many are done? 2? [12:17:03] 4 I think [12:17:08] apergos: ^ [12:17:09] oh [12:17:14] didn't know they arrived yet [12:17:24] 4 is correct, we have a winner [12:17:40] how many c2100s were down? [12:17:52] apergos: remember not to push new rings too often [12:17:52] 4 :-D [12:18:01] how many pushes did you do? [12:18:12] the scedule of deployment is the schedule of new rings [12:18:12] and with one flaky c2100 left [12:18:17] that means we're in fairly good shape? [12:18:55] not bad but the two new ones from last week didn't go in til today (they could have gone in sat or sun but I was never here, and yesterday I was mostly out of commission) [12:19:05] but we are in much better shape than we were [12:19:34] i upgraded all upload varnish boxes yesterday [12:20:18] with the two patches? [12:20:31] with 4 patches [12:20:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:21:08] it was nice to see that persistent storage was working [12:21:19] oh, you got the multiple waiter patch? 
[12:21:22] while I upgraded one box at a time, swift requests spiked, but immediately returned to normal levels after each host came back up [12:21:24] no [12:21:32] i think i'll write my own [12:21:44] heh [12:22:04] i didn't ask [12:22:08] i'm not gonna beg for that stuff [12:25:05] so i'm going to remove ms6 out of the loop now, unless anyone has any concerns [12:25:10] i'll test with one squid first [12:25:38] afaik there is nothing relying on it any more (famous last words though) [12:25:55] nothing has ever been relying on it, so I think that's a true statement [12:27:44] I just added video/webm to its mime.types [12:27:45] heh [12:27:53] how inefficient of you [12:28:30] backin a bit, food run [12:28:53] ms6 would give us trouble with the upcoming X-Content-Duration too fwiw [12:29:10] which found its way to wikis now, not sure if you heard [12:29:27] and Aaron wrote (but hasn't merged nor run) the maint script to retroactively set it to all past ogg videos [12:30:26] deploying to amssq47 [12:30:33] yes I saw [12:30:43] yeah, so it's time for it to go [12:31:31] i'll deploy to all squids now [12:32:11] !log Removed ms6 out of the loop as backend server for esams upload squids [12:32:16] I don't expect any trouble [12:32:18] Logged the message, Master [12:32:26] me neither [12:32:45] ms6 was a nice hack ;) [12:33:14] spike on swift requests [12:33:22] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.002 second response time on port 11000 [12:33:24] doesn't seem very big [12:33:34] but we'll see [12:34:40] yeah no problem [12:36:32] * apergos closes up the graphs and actually goes for lunch now [12:36:57] good [12:37:07] what's the best way to run purges for these webms? [12:37:16] action=purge on MW? 
:) [12:37:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.412 seconds [12:37:24] if it works, probably [12:37:27] there are some scripts [12:37:37] like purgeList in the mw maintenance dir [12:38:13] we could use ms6's content as an initial test for ceph [12:38:21] it's still purging files, just not adding new thumbs [12:38:26] but that should be fine for a first test [12:38:46] i'm just not 100% sure if ms6's purging is working correctly given the amount of problems we have with purging thumbs in esams [12:38:51] one of the reasons I want it gone ;) [12:39:31] nope, action=purge didn't work [12:40:03] on the original size? [12:40:12] we should debug that [12:40:21] the original size didn't have an issue [12:40:25] as it didn't come from ms6 [12:40:32] right [12:40:42] so I wonder if mediawiki is not sending (the right) purge requests [12:40:45] or if they just don't arrive [12:40:49] or if they don't get processed properly [12:41:00] let's debug it now [12:41:10] hehe [12:43:05] okay, how? [12:43:09] sec [12:43:17] I ran ngrep on amssq47 and it hangs [12:43:24] can't log in on the box now [12:43:55] fixed [12:46:23] it started swapping when I ran that... 
[12:48:00] amssq48 same story [12:48:07] perhaps ngrep is like really borked on lucid or something [12:48:12] i'll use tcpdump [12:52:21] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [12:52:21] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [12:52:21] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [13:03:12] it's just not purging the smaller scale videos [13:04:15] the jpeg thumbs are purged, but the scaled videos are not [13:06:07] j^: ping :) [13:06:19] paravoid: hi [13:06:49] hello [13:07:37] action=purge on a page with multiple videos doesn't send cache purges for the scaled videos [13:07:43] purge does not remove transcoded videos, on the page you can reset them if you have the right permissions [13:07:52] purge happens way to often [13:08:15] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [13:08:15] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [13:08:24] what does that mean, "reset" them? [13:09:09] and what "remove" means? we are talking about cache purges, not removing them from swift; are these two in the same codepath in mediawiki? 
[13:09:22] mark: in the transcode table you get a reset link, it removes it and submits a new job to encode it [13:09:28] yeah we don't want that [13:09:35] but we do want the ability to remove them from the caches [13:09:40] like now [13:09:46] the scaled video is fine, the cache objects in squid/varnish are not [13:09:52] purge removes thumbnails from swift [13:10:01] yeah that's a problem here [13:10:08] not so big a deal for images, a much bigger problem for videos [13:10:16] I know that it does that for thumbs, but I have no idea if that's mandated by the internal code architecture [13:10:24] if its just about squid/varnish not sure [13:10:40] i guess I'll purge these manually right now [13:10:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:10:55] in core extensions can remove thumbnails from the list of thumbs that get purged, squid cleanup could happen before that [13:12:29] i don't even know how you run a maint script these days [13:12:34] how does this MWScript.php work ;) [13:13:24] use mwscript --wiki=enwiki somecommand.php [13:13:28] yeah found it [13:13:36] j^: okay, so, I'll file a bug report against TMH [13:13:45] so that others can weight it [13:13:49] just purged the 480p version [13:13:51] let's see if it works [13:14:05] paravoid: ok will look into still purging videos from squid but not removing them [13:14:10] from swift [13:14:21] huh [13:14:23] 404 [13:14:34] so the scaled videos are ogg, not webm :) [13:14:34] wha [13:15:19] oh AND webm [13:15:24] it's purging neither [13:15:53] there's a table at the bottom [13:15:57] on the commons page [13:16:00] with all the variants [13:16:16] oh yeah [13:16:20] works now [13:16:26] 480p webm [13:16:28] 480p should be purged [13:16:32] still says audio/webm [13:16:41] yeah, that's okay apparently [13:16:50] firefox plays the video now yay [13:17:06] what did you run? 
[13:18:04] root@fenari:/home/w/common/multiversion# php MWScript.php purgeList.php commonswiki [13:18:08] then provide full URLs on stdin [13:18:17] oh also, with ms6 we wouldn't get the CORS header [13:18:22] indeed [13:20:27] good [13:20:33] nice [13:20:35] heh [13:21:39] New patchset: Mark Bergsma; "Hardcode GeoIP netmask to 24 in Varnish's GeoIP" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/34280 [13:22:30] just in case? [13:22:41] i want unnecessary mutexes gone :P [13:23:30] and probably i'll remove the netmask altogether [13:23:37] better to not provide it than to lie about it imho [13:23:53] yes, but we need to ask FR people first [13:24:02] https://it.wikipedia.org/w/index.php?title=Speciale%3ARicerca&profile=default&search=incategory%3A%22Compositori+italiani+del+XVIII+secolo%22+incategory%3A%22Morti+nel+1741%22&fulltext=Search <-- why doesn't it work? [13:24:03] to make sure that we don't completely break their js [13:24:50] I did [13:24:56] they said they didn't give a crap about netmask [13:25:01] i don't know if anyone else does [13:25:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.711 seconds [13:25:31] does anyone else uses geoiplookup? 
[13:25:37] probably, no idea [13:25:55] it's probably like https gateway and etherpad [13:26:02] once you set something up these things live their own life ;) [13:26:46] the more hacky and temporary and specific you make a solution, the more sure you can be that the entire world learns about it and starts using it :) [13:28:26] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/34280 [13:59:06] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:09:18] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [14:11:24] PROBLEM - Host snapshot1002 is DOWN: PING CRITICAL - Packet loss = 100% [14:15:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.039 seconds [14:15:52] !log snapshot1002 down for reinstall and testing [14:15:58] Logged the message, Master [14:17:06] RECOVERY - Host snapshot1002 is UP: PING OK - Packet loss = 0%, RTA = 26.51 ms [14:17:28] it's not but nagios can think it is [14:17:41] it pings alright [14:17:52] it's in the installer [14:21:45] PROBLEM - SSH on snapshot1002 is CRITICAL: Connection refused [14:36:29] New review: ArielGlenn; "old changes are old. 
let's see if it meets our needs" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9118 [14:36:35] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9118 [14:44:56] New patchset: ArielGlenn; "snapshot1002 gets snapshot cfg for partition (test)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35386 [14:45:34] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35386 [14:48:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:53:14] New patchset: Demon; "Clean up gerrit's system_role declarations" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35387 [15:03:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.241 seconds [15:05:42] PROBLEM - NTP on snapshot1002 is CRITICAL: NTP CRITICAL: No response from NTP server [15:16:29] New review: Faidon; "This isn't exactly related to this change, but I dislike modifying the behavior with if statements, ..." 
[operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/35344 [15:16:56] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/33554 [15:23:33] RECOVERY - SSH on snapshot1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:38:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:38:47] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/34862 [15:38:56] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/34863 [15:44:59] New patchset: Hashar; "beta: use IP for memcached server" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35318 [15:45:59] New review: Hashar; "Reason for self merge:" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/35318 [15:45:59] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35318 [15:48:43] hmm? [15:48:51] is DNS in labs broken or something? [15:50:20] paravoid: yup a bit [15:50:32] maybe I should have opened a bug [15:50:36] New review: Faidon; "There's nothing generic about git::extension -- even the name is a misnomer (extension of /what/?). ..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/35173 [15:50:52] what's wrong? 
[15:51:04] 2012-11-27 15:50:25 deployment-apache33 enwiki: Memcached error: Error connecting to deployment-mc:11000: php_network_getaddresses: getaddrinfo failed: Temporary failure in name resolution [15:51:17] on BETA, from the udp2log memcached-serious.log [15:53:22] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35387 [15:55:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.033 seconds [16:01:13] off dad duties [16:01:17] might connect later tonight [16:01:48] New review: Faidon; "Is this really supposed to be matching an e.g. "MSIE IEMobile/10" UA, or should there be a .* or | b..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/35298 [16:06:45] !log bastion disabled hazard-sj's cron, as it was accidentally DoSing bastion [16:06:49] argh [16:06:52] Logged the message, Master [16:06:56] wrong channel [16:07:29] !log ignore previous labs-related entry [16:07:35] Logged the message, Master [16:08:12] New review: MaxSem; "Well, there _is_ a | between MSIE and IEMobile:)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/35298 [16:11:35] MaxSem: oh heh :) [16:11:49] yeah, Gerrit diffs suck:) [16:11:58] what about < 10? 
[16:12:12] also as I said, I don't feel authoritative on approving UA matches [16:12:19] paravoid, I'll get someone to review the corresponding PHP changes, then you can be sure [16:12:39] VCL-wise looks fine [16:12:47] (basically, we have the same detection code in PHP and Varnish) [16:12:53] yes I noticed that [16:12:58] and noticed how the other one isn't merged yet :) [16:14:13] and it kind of sucks how we do the same UA detection twice [16:14:59] we don't use the PHP code in WMF, it's for third-party users [16:15:16] also, it is something we can unit-test:) [16:16:23] oh, good to know that [16:27:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:37:15] New review: Andrew Bogott; "Faidon -- this isn't for deployment, it's a refactor to simplify some mediawiki classes. I agree wi..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/35173 [16:43:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.060 seconds [16:56:33] New patchset: Jgreen; "adjust time of day for fundraising db dumps" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35400 [16:58:18] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35400 [17:08:34] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [17:17:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:20:24] paravoid, the rules for https://gerrit.wikimedia.org/r/#/c/35298/ have been reviewed [17:24:48] New review: Demon; "This is basically the default template, other than the minor part I added (noted with inline comment..." 
[operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/32465 [17:25:31] PROBLEM - mysqld processes on db78 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [17:25:49] New review: Demon; "Note this is already in use in production, I'm just puppetizing it. (eg: http://svn.wikimedia.org/vi..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/34561 [17:29:34] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [17:35:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds [17:38:16] RECOVERY - mysqld processes on db78 is OK: PROCS OK: 1 process with command name mysqld [18:00:01] PROBLEM - Host analytics1002 is DOWN: PING CRITICAL - Packet loss = 100% [18:05:43] RECOVERY - Host analytics1002 is UP: PING OK - Packet loss = 0%, RTA = 31.40 ms [18:07:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:20:05] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35376 [18:20:41] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35369 [18:21:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.611 seconds [18:24:04] New patchset: Reedy; "Cleanup Lucene config, removing old globals and unused code" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33037 [18:24:36] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33037 [18:25:00] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33542 [18:25:27] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/32694 [18:31:42] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35300 [18:32:55] Change merged: 
Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35341 [18:33:12] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35334 [18:33:29] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/29623 [18:33:52] snapshot1002: @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ [18:33:58] I guess puppet is on a go slow? [18:34:27] New review: Demon; "Wrong comment." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/34561 [18:34:38] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33329 [18:34:41] Reedy: change is eternal [18:34:42] New review: Demon; "Note this is already in use in production, I'm just puppetizing it. (eg: http://svn.wikimedia.org/vi..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/32465 [18:55:20] New review: Dzahn; "adds wikivoyage" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/34281 [18:55:22] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/34281 [18:55:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:56:36] New patchset: Demon; "Also link shorthand commit sha1s" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35415 [18:57:30] Change merged: Dzahn; [operations/debs/squid] (master) - https://gerrit.wikimedia.org/r/29895 [18:57:59] New review: Dzahn; "added tests / wikivoyage to test-redirector.php" [operations/debs/squid] (master) - https://gerrit.wikimedia.org/r/29895 [19:10:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.794 seconds [19:14:34] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [19:25:39] RobH: hey, so i saw your ticket for labs1001-1002 -- i already have some machines in there labeled as labsdb1001 
labsdb1002 so i wanted to confirm ports and stuff [19:26:21] ok, can you put in ticket what you have and i will confirm them [19:28:43] !log installing a bunch of pkg upgrades on [19:28:49] Logged the message, Master [19:28:49] meh [19:29:08] !log installing a bunch of pkg upgrades on zirconium [19:29:15] Logged the message, Master [19:30:14] New patchset: Dereckson; "(bug 41992) Set timezones for Wikivoyage wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35330 [19:31:07] sure [19:31:52] !log upgrade libc* packages on sodium and singer [19:31:59] Logged the message, Master [19:32:41] New review: Dereckson; "PS2: Removed deb typo" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/35330 [19:32:48] New review: Dzahn; "makes sense yes, time zones for wikivoyage" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/35330 [19:32:48] Change merged: Dzahn; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35330 [19:33:18] oh. wow. timing :p [19:33:23] New patchset: Kaldari; "Switching back to http flickr API per bug 42468" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35421 [19:33:54] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35421 [19:34:01] New review: MaxSem; "Was Europe/Moscow originally used on ru.wv? I'm asking because Russia spans 11 timezones and when I ..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35330 [19:35:15] New review: Dereckson; "Discussed on village pump:" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35330 [19:40:39] MaxSem: can we get the value of $wgLocaltimezone from the old wiki without shell? 
[19:40:57] http://ru.wikivoyage-old.org/wiki/%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:RecentChanges [19:41:17] then it would answer your question if they used Moscow zone before or not [19:41:41] mutante, they used CET: http://ru.wikivoyage-old.org/wiki/Wikivoyage:Пивная_путешественников [19:42:05] which was some rudiment, it seems XD [19:42:37] i see, ok, i guess all the ones hosted on that one server had just CET then [19:42:44] Moscow sounds better [19:45:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:58:13] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 269 seconds [20:01:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds [20:01:22] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [20:07:53] !log temp stopping puppet on brewster [20:08:00] Logged the message, notpeter [20:10:33] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26040 [20:10:47] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26041 [20:17:29] New patchset: Hashar; "Gerrit notifications for Wikidata to their channel" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26042 [20:18:19] !log upgrading salt on all labs minions [20:18:26] Logged the message, Master [20:18:57] well, that surely spiked the load [20:19:29] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26042 [20:19:41] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/32465 [20:19:54] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/34561 [20:20:06] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35415 [20:22:06] New review: Hashar; "I still have to follow up 
on that change so we use a tag / branch instead of a sha1." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/27175 [20:22:16] and hooks are still working :-] [20:24:19] New review: Dzahn; "the puppet part itself looks good, as mentioned above it works like the refreshlinks class which ind..." [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/33713 [20:28:23] New patchset: Pyoungmeister; "correcting mac for mc1015" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35433 [20:29:33] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35433 [20:31:49] New review: Nemo bis; "babababa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35415 [20:33:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:33:24] New patchset: Demon; "Fixup to gerrit::backup removal. Wasn't unused everywhere." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35435 [20:39:29] New review: Dzahn; "yep, this has just been removed." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/35435 [20:39:30] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35435 [20:42:59] thanks mutante for your review :D [20:46:57] !log upgrading salt in production, in batches of 10 [20:47:03] Logged the message, Master [20:49:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.420 seconds [20:58:57] New patchset: Dzahn; "remove mysql server from host iron, remove firewall rules for mysql" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35439 [21:02:16] New patchset: Dzahn; "remove mysql server from host iron, remove firewall rules for mysql" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35439 [21:05:31] Jeff_Green: you still need misc::jenkins on aluminium i suppose? 
[21:05:47] yes [21:06:14] is it otherwise deprecated? [21:06:28] it has "merge with misc::contint::test, or remove" as a comment [21:06:35] yes it is [21:06:36] oic [21:06:46] we could move it into misc/fundraising [21:06:55] actually for now i would just like to move it to a different file [21:07:04] sounds good [21:08:58] huh. I don't think most of what's in misc::jenkins is actually needed on aluminium [21:09:02] New patchset: Ottomata; "Removing opera mini IP addresses from Thailand Wikipedia Zero filter." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35516 [21:09:22] New patchset: Dzahn; "move misc::jenkins (the other jenkins) to fundraiser.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35520 [21:09:36] Jeff_Green: ^ [21:09:45] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35516 [21:11:01] mutante: ok. i'll clean it up and rename it someday [21:11:38] thanks, just separating that file for cleanup purposes like that..ack [21:11:43] New patchset: Anomie; "Merge operations/mediawiki-multiversion into multiversion/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35540 [21:11:58] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35520 [21:19:28] New patchset: Dzahn; "rename misc::extension-distributor to mediawiki::extension-distributor and move from misc-servers to mediawiki.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35558 [21:20:49] New patchset: Dzahn; "rename misc::extension-distributor to mediawiki::extension-distributor and move from misc-servers to mediawiki.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35558 [21:22:19] New patchset: Jgreen; "renamed misc::jenkins to misc::fundraising::jenkins" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35559 [21:22:30] New review: Dzahn; "more RT-720" [operations/puppet] (production); V: 1 C: 2; - 
https://gerrit.wikimedia.org/r/35558 [21:22:31] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35558 [21:22:49] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35559 [21:23:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:23:34] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [21:27:56] New patchset: Dzahn; "move "kiwix-mirror" from misc-servers.pp to download.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35560 [21:30:15] New review: Dzahn; "might wanna rename them all to download::something too" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/35560 [21:30:15] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35560 [21:42:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.019 seconds [21:43:39] New patchset: Demon; "Remove custom hacks for search bar" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35562 [21:47:14] morebots seems to be dead [21:47:29] or at least, not doing its job.. 
[21:47:44] err, logmsgbot even [21:47:53] New patchset: Anomie; "Merge operations/mediawiki-multiversion into multiversion/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35540 [21:48:06] that's bad, we can collide mid-air [21:48:48] "omg, nobody's deploying, lemme do a quick scap" [21:49:50] not to mention the problem that the log isn't showing up here: http://wikitech.wikimedia.org/view/Server_admin_log [21:50:28] * Reedy kicks logmsgbot [21:50:36] !log logmsgbot doesn't seem to be working for sync-dir, scap, etc [21:50:44] Logged the message, Master [21:51:17] Looks like we're missing logmsgbot_ [21:52:08] !log reedy synchronized php-1.21wmf5/extensions/LabeledSectionTransclusion [21:52:14] Logged the message, Master [21:53:07] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 224 seconds [21:54:37] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [22:08:12] New patchset: Dzahn; "move apple-dictionary-bridge out of misc-servers into search.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35568 [22:09:52] New patchset: Dzahn; "move apple-dictionary-bridge out of misc-servers into search.pp and rename it from misc:: to search::" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35568 [22:10:52] New patchset: Dzahn; "move apple-dictionary-bridge out of misc-servers into search.pp and rename it from misc:: to search::" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35568 [22:11:25] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35568 [22:14:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:23:38] !log starting squid backend on knsq18 (cron spam/nagios) [22:23:44] Logged the message, Master [22:25:52] ACKNOWLEDGEMENT - Host analytics1007 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn install issues [22:26:43] ACKNOWLEDGEMENT - Host 
cp1031 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn fan issue / Dell case RT-3614 [22:27:04] !log scapping as a part of usual mobile deployment [22:27:11] Logged the message, Master [22:30:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.112 seconds [22:34:52] ACKNOWLEDGEMENT - Host srv266 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn this one has been going down since September 2011 and caused way too much time.. kill it [22:53:34] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [22:53:34] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [22:53:34] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [23:04:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:05:15] fenari liquidthreads_labswikimedia UploadStashCleanup::execute 10.0.6.49 1146 Table 'liquidthreads_labswikimedia.uploadstash' doesn't exist (10.0.6.49) SELECT us_key FROM `uploadstash` WHERE us_timestamp < '20121127060916' [23:05:49] meh [23:09:37] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [23:09:37] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [23:10:02] ugh, I've got a hanging scap [23:10:44] binasher: I wonder if WikiPage::updateCategoryCounts could use GET_LOCK in some way [23:11:15] dewiki WikiPage::updateCategoryCounts 10.0.6.55 1213 Deadlock found when trying to get lock; try restarting transaction (10.0.6.55) INSERT IGNORE INTO `category` (cat_id,cat_title) VALUES ... [23:11:21] dozens in a row [23:14:53] !log restarted gerrit; was getting WARN org.eclipse.jetty.util.log : Dispatched Failed! 
errors [23:14:59] Logged the message, Master [23:18:36] MaxSem: press enter a few times [23:18:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.587 seconds [23:18:58] ewww, thanks Reedy [23:24:37] PROBLEM - Puppet freshness on snapshot1002 is CRITICAL: Puppet has not run in the last 10 hours [23:29:26] LeslieCarr: I earned that bird [23:29:39] yes you did :) [23:30:18] * preilly hides [23:32:34] New patchset: Reedy; "(bug 39380) Enabling secure login (HTTPS)." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21322 [23:53:52] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds