[00:04:34] PROBLEM - SSH on amslvs2 is CRITICAL: Server answer:
[00:05:19] RECOVERY - SSH on nescio is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[00:06:13] RECOVERY - SSH on ssl3001 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[00:07:34] RECOVERY - SSH on amslvs2 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[00:11:55] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[00:21:03] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer:
[00:21:22] PROBLEM - SSH on nescio is CRITICAL: Server answer:
[00:27:04] RECOVERY - SSH on nescio is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[00:29:46] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[00:31:25] PROBLEM - SSH on nescio is CRITICAL: Server answer:
[00:32:18] PROBLEM - SSH on ssl3001 is CRITICAL: Server answer:
[00:33:40] RECOVERY - SSH on ssl3001 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[00:36:57] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer:
[00:38:00] PROBLEM - SSH on ssl3001 is CRITICAL: Server answer:
[00:38:27] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[00:45:12] RECOVERY - SSH on ssl3001 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[01:01:36] Could someone help me get a couple of debs onto our apt repository? I've already packaged everything but don't have the ability to upload them to apt.wikimedia.org. If this is more involved than I imagine, how should I go about requesting help?
[01:04:48] I can try
[01:04:59] usually it works as documented
[01:05:25] TimStarling: oh, that would be awesome! both packages are in a ppa here: https://launchpad.net/~ori-livneh/+archive/e3
[01:06:04] is there something i could do to make this less work for you?
[01:06:33] do you have a shell account?
[01:07:00] i can ssh to fenari and emery, if that's what you mean
[01:07:11] don't think it extends to any other machines
[01:10:37] can you just drop your PPA public key into your home directory on fenari?
[01:10:53] yes, sec
[01:13:55] do you have the .changes files?
[01:14:09] yes, i'll drop that there too
[01:14:35] thanks
[01:17:34] TimStarling: okay, both there
[01:18:08] I just see zpubsub_0.2-4_source.changes, no changes for czmq
[01:18:23] oh. ah. good point. sec.
[01:19:52] okay, should be there
[01:32:28] RECOVERY - SSH on nescio is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[01:41:55] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 246 seconds
[01:42:49] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 266 seconds
[01:43:33] ori-l: czmq is done, but it's failing on zpubsub, apparently because the .orig.tar.gz file is not listed in the .changes file
[01:44:28] so reprepro doesn't upload the .orig.tar.gz file, and later fails when it starts processing the .dsc file
[01:45:42] hrm, I think I can repackage it quickly, but getting it to build on launchpad will probably take an hour
[01:46:03] I can probably make it upload, I'm reading the reprepro manual
[01:46:52] i don't know why, but packaging for debian has a learning curve that makes haskell look friendly.
[01:48:19] it may or may not have worked
[01:48:28] do you have a server you want it installed on, so I can test?
[01:48:29] heisenapt
[01:48:49] um, there's vanadium, but i think it currently has it installed via the ppa
[01:48:57] let me uninstall it and deregister the ppa
[01:49:16] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 657s
[01:49:40] if you have root there you can test it yourself, it should show up after apt-get update
[01:49:56] trying now
[01:50:45] zpubsub : Depends: libczmq0 but it is not installable
[01:55:25] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 3 seconds
[01:57:02] yeah, it wasn't in the .changes or the .dsc
[01:57:15] TimStarling: i forced it to install using apt-get download and dpkg -i --ignore-depends=libczmq0
[01:57:19] maybe I only have binary packages for it
[01:57:53] http://ppa.launchpad.net/ori-livneh/e3/ubuntu/pool/main/c/czmq/
[01:57:56] but then it fails because it's expecting libczmq.so.1. so at least zpubsub is okay. let me take a look at the other one
[01:57:57] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[01:58:34] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 16 seconds
[01:59:29] it seems to be there: Source: czmq / Binary: libczmq0 libczmq-dev libczmq-dbg
[02:01:24] you mentioned that czmq is done, but i can't find it in apt -- am i simply being daft?
[02:07:47] only the source package was uploaded, again because of the broken .changes file
[02:10:08] could you use the same trick you used with the other package?
[02:11:36] yes, try now
[02:15:29] works! yay
[02:15:37] good
[02:15:45] thank you thank you thank you! :)
[02:15:52] no problem
[02:16:02] you have made a simple peasant happy
[02:16:09] next time you do this, you should probably use a pbuilder instance on labs instead of launchpad
[02:16:41] it'll generate the .changes file for both source and binary, so the upload will just work
[02:16:47] and you won't have to wait an hour (once it's set up)
[02:16:57] that's what the analytics folks did for udp-filter
[02:17:15] ah, that's good to know.
[02:19:19] * ori-l pokes around
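
A minimal sketch of the workflow discussed above, for reference. The upload failed because the Launchpad-generated source .changes didn't list the .orig.tar.gz, so reprepro never imported the original tarball. Package names are taken from the log; the distribution name and repository path are assumptions, not the actual apt.wikimedia.org invocation:

    # Regenerate a source .changes that lists the original tarball
    # (-S = source only, -sa = force inclusion of the .orig.tar.gz):
    debuild -S -sa

    # Or, per the pbuilder suggestion above: build source and binaries in a
    # clean chroot, which yields a .changes covering both (one-time setup first):
    sudo pbuilder create --distribution precise
    pdebuild

    # On the repository host, an import would then look something like this
    # (basedir and distribution name are hypothetical):
    reprepro -Vb /srv/wikimedia include precise-wikimedia zpubsub_0.2-4_amd64.changes
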
[02:20:28] !log LocalisationUpdate completed (1.20wmf9) at Mon Aug 13 02:20:27 UTC 2012
[02:20:42] Logged the message, Master
[02:29:25] TimStarling: btw, are the TODOs for udp2log still relevant? i know analytics is planning to replace it, but if you think it'll be around for a while i might like to dig in for fun
[02:31:31] yes, they are still relevant
[02:37:53] !log LocalisationUpdate completed (1.20wmf8) at Mon Aug 13 02:37:53 UTC 2012
[02:38:03] Logged the message, Master
[03:23:55] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[03:32:16] TimStarling: how did you benchmark the different approaches you've tried with udp2log? were you able to generate sufficient load in a development environment, or was there no alternative to trying things out in production?
[03:33:57] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours
[03:38:00] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[03:38:00] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[03:38:00] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[03:52:00] ori-l: I just tested it in development, I'm pretty sure it was realistic
[03:53:16] it's a simple system, no caching or anything, so the only thing you have to match is the line length and packet arrival rate
[03:54:36] I think I used iperf or some similar tool
[03:56:08] yeah, that seems simple enough
[04:01:52] TimStarling: I've found another instance where the template update failed to propagate
[04:03:56] if udp2log is not cpu bound, why not rewrite it using a language or framework (twisted, nodejs, go, whatever) that makes concurrent network software easier?
[04:04:21] it's CPU bound
[04:05:39] TimStarling: https://commons.wikimedia.org/w/api.php?action=query&prop=extlinks&titles=File:Stoodleigh,_above_Throwcombe_Cross_-_geograph.org.uk_-_243681.jpg gives a different link from https://commons.wikimedia.org/wiki/File:Stoodleigh,_above_Throwcombe_Cross_-_geograph.org.uk_-_243681.jpg
[04:06:01] but single-threaded
[04:06:58] it wouldn't be faster if it was multithreaded
[04:08:30] "So my idea is to split the read side and the write side across a thread or process boundary, using the existing buffer pipe to communicate between the two processes."?
[04:10:51] on the read side, there is only one socket, so you can't distribute the task
[04:11:14] if you have multiple processes on the write side, then the read thread would have to write to all of them
[04:11:47] and that would take about as long as just writing directly to the log filter processes
[04:13:46] yeah, but implementing even just two workers with a single unidirectional channel between them could be made easier
[04:14:21] I don't think you would get a significant benefit, even in C++
[04:14:30] in python I think you'd lose a factor of 10
[04:14:33] (in case it isn't blindingly obvious, btw, i'm interested in this not because i think i know how to solve this but precisely because i don't, which makes this fun)
[04:14:49] you know udp2log does no allocation after startup
[04:14:55] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours
[04:15:19] can you write allocation-free code in python?
[04:15:51] boost::asio makes asynchronous code in C++ really easy
[04:16:24] I wrote a prototype memcached replacement in it, the code was almost as short as it would have been in python
[04:17:29] http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/maxcache/
[04:17:47] my big axe against C++ is that i don't know it, which makes it vastly inferior to most other solutions
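
Tim's point above is that udp2log is easy to load-test: just match the line length and the packet arrival rate. A hedged sketch with iperf — the hostname, port, and traffic figures are made-up placeholders, since the real rates aren't in this log:

    # Send 60 seconds of UDP datagrams shaped like log lines:
    # -u = UDP, -l = datagram length in bytes, -b = target bandwidth.
    # 600-byte lines at ~96 Mbit/s works out to roughly 20k packets/sec.
    iperf -c udp2log-test.example -u -p 8420 -l 600 -b 96M -t 60
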
[05:06:40] TimStarling: did/will you have a chance to take a look at those couple of bugs against wikidiff2? or, who else could help? https://bugzilla.wikimedia.org/buglist.cgi?resolution=---&query_format=advanced&component=wikidiff2&product=MediaWiki%20extensions
[05:08:46] which one are you interested in?
[05:10:53] TimStarling: 13462 mostly
[05:11:06] but I've been told 22559 might be a blocker for it
[05:16:54] wikidiff2 does word-level diffs, except for unified Han characters, which are done character-by-character
[05:17:09] it includes support for Thai segmentation for word-level diffs in that language
[05:17:59] dwdiff looks nice, but it doesn't seem to have any Thai support
[05:22:54] detect if Thai codepoints are present and then choose one based on that?
[05:23:14] * jeremyb has not looked much at the context
[05:33:36] TimStarling: what is required to detect lines better, then?
[05:34:06] dwdiff is indeed slow (compared to diff) but its execution time doesn't explode with the test cases in bug 22559
[05:34:28] so I hope such slowness is not an inherent problem of its approach
[05:34:56] nobody has re-reported bug 22559 so I assume it doesn't happen very often
[05:35:20] TimStarling: it happens only with pages which have 30k+ lines
[05:35:23] you know domas was really angry about you giving him a "reality check"
[05:35:24] it seems
[05:35:33] yeah I noticed
[05:35:48] the page he reported, for instance, was fixed by reducing newlines
[05:35:59] how did you notice?
[05:38:04] TimStarling: that he was angry? well, I know him a bit and his laconic answer was quite clear
[05:41:14] maybe it's better to do the whole thing as a word-level diff
[05:41:36] maybe that can be made to be efficient enough
[05:42:32] I haven't seen anyone suggest an algorithm which would fix either that problem or bug 22559
[05:42:44] it would be wonderful if you could look into it as only you can do
[05:43:40] well, Max Sem claimed that 13462 would be trivial to resolve but could square the execution time in 22559
[05:44:10] * Nemo_bis is clueless
[05:44:53] you obviously can't calculate edit distances between every possible pair of lines
[05:45:04] *shrug* there are lots of diff algorithms
[05:45:13] I guess we could just use a different one
[05:45:24] the algorithm, that is, not the frontend
[05:45:34] sure
[05:46:37] in the original wikidiff, some guy tried to write his own
[05:46:44] it was plainly worse than the original algorithm
[05:46:48] the diff visualization was so broken before 1.20 that the actual diff algorithm didn't matter much, now it would be useful to improve it
[05:46:55] heh
[05:47:20] so in my work, I ported the original algorithm from PHP to C++
[05:47:33] and used some of the formatting code from wikidiff
[05:47:56] as far as I can see, diff algorithms are a complex rigorous *science*
[05:50:01] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[05:50:15] TimStarling: have you ever taken a look at https://en.wikipedia.org/wiki/User:Cacycle/diff ?
[05:50:30] it's JS, but it suggests an algorithm
[05:50:36] heh, 1978
[05:50:47] :p
[05:51:00] you would think some work would have been done since then
[05:51:03] it's actually used by people on en.wiki
[05:51:20] hey, we're doing an encyclopedia, it's an 18th-century dream
[05:51:50] it's not encyclopedia-related, it's computer science
[05:52:06] last I checked, it was still an active field
[05:52:12] orly
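
For anyone following the dwdiff tangent above: word-level diffing is its default mode, and the basic invocation is just two files (the file names here are placeholders):

    # Word-level diff of two revisions; -c colorizes the output:
    dwdiff -c revision-old.wiki revision-new.wiki
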
[05:53:09] do you use that JS thing?
[05:53:53] TimStarling: no I don't
[05:54:05] (I don't use en.wiki much)
[05:54:21] but users seem happy with it, dunno
[05:54:50] the quality of diff output is somewhat subjective
[05:54:56] and it probably got a lot of feedback from the wild
[05:55:24] well, sure, but if it brought some diffs to half an hour execution time I guess someone would have noticed, for instance
[06:12:58] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[06:26:43] !log uploaded czmq and zpubsub packages to brewster for Ori
[06:26:56] Logged the message, Master
[06:52:29] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours
[07:07:29] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[07:22:30] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[07:34:48] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[07:37:30] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[08:59:42] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time
[09:06:27] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[09:06:27] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[10:12:27] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[11:04:17] TimStarling: I asked dwdiff's dev about Thai word segmentation; he says there are no plans to implement it, as he doesn't know what the best approach is. do you have suggestions for him? :)
[11:47:25] Server admin log: "Tuesday, August 14 23:30-01:30 (next day) (4:30pm-6:30pm PDT) - Lua deployment to test2 [Tim] - depends on RT 3365"
[11:47:37] I take it that didn't happen? (RTs are private)
[12:59:24] Jarry1250: august 14 hasn't come yet?
[13:00:03] jeremyb: hmm?
[13:00:12] your quote
[13:00:18] lua
[13:00:57] jeremyb: Ah yes. Well, that would explain it rather :) I suppose that rather prompts the question, will it be happening?
[13:01:48] Jarry1250: you could ask him... also you know about the deployment calendar?
[13:02:24] jeremyb: Well, that's from [[Software deployments]], I think there's an internal one as well?
[13:02:31] Otherwise not.
[13:02:47] i hope there's not an internal one. i meant the one on [[software deployments]]
[13:02:54] i think
[13:02:58] i thought you were quoting something else
[13:03:03] * jeremyb runs away
[13:04:52] Of course, calendars are only as good as their readers' ability to understand the passage of time :P My bad.
[13:09:52] Tim-away: your new instance isn't reachable by symbolic DNS, right?
[13:10:01] only by instance ID
[13:24:27] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[13:34:30] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours
[13:38:32] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[13:38:32] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[13:38:32] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[14:15:27] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours
[15:19:47] PROBLEM - Host srv281 is DOWN: PING CRITICAL - Packet loss = 100%
[15:25:20] RECOVERY - Host srv281 is UP: PING OK - Packet loss = 0%, RTA = 2.11 ms
[15:28:21] hi, I don't really know where else to ask and I googled quite a while, so I'll try here: is there a way, using the api, to detect which image is used as the 'logo' on the right-hand side in the 'darker brown' box (on wikipedia)?
[15:28:45] i know how to get all the images on a page but i'd like to know which one is the 'logo' in the summary on the right
[15:28:57] couldn't find an api call
[15:29:14] PROBLEM - SSH on srv281 is CRITICAL: Connection refused
[15:33:36] RECOVERY - SSH on srv281 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:38:50] wb dungodung
[15:39:12] ty Nemo_bis
[15:50:33] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[15:53:06] PROBLEM - NTP on srv281 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:13:30] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[16:41:08] dzan: There isn't an API for that, you're going to need to parse the HTML or templates
[16:41:48] Nemo_bis: Just saw that the ka input conversion has already been merged into git :)
[16:42:27] hoo: yep
[16:43:06] Dispenser: well, we should give it some consistent id...
[16:43:29] hoo: maybe you could tell the community so that they ask for narayam to be enabled, and it is as soon as the update is deployed
[16:43:34] so you can just search for `div#theid > img` or for img#theid
[16:44:45] Go ahead! I'm tired of arguing with noobs over technical changes
[16:45:12] thx Dispenser
[16:45:23] Nemo_bis: Noted down, I'm going to ask kawiki and maybe the sister projects as well ;)
[16:53:17] hoo: good
[16:53:33] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours
[17:05:33] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[17:08:33] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
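
Following up Dispenser's answer to dzan above ("parse the HTML"): until infobox images get a consistent id, a screen-scrape is the only route. A fragile sketch — the example article and the "infobox" class name are assumptions that vary per wiki:

    # Grab the first image that appears inside the infobox markup:
    curl -s 'https://en.wikipedia.org/wiki/Tiger' \
      | grep -A 10 'class="infobox' \
      | grep -o 'src="[^"]*"' \
      | head -n 1
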
[17:21:07] !log aaron synchronized wmf-config/filebackend.php 'Switched backend reads to swift for testwikis and mw.org'
[17:21:16] Logged the message, Master
[17:21:28] woohoo!!!
[17:23:33] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[17:23:47] Tim-away: Is it alright for me to run the purge script on ~1363?
[17:23:56] robla: we still don't have a scheduled deploy for updating apache rewrite rules to use swift
[17:24:12] AaronSchulz: err...
[17:24:14] we don't need them.
[17:24:16] right?
[17:24:33] that's why we're using thumb_handler.php instead of /thumb/ in the URL.
[17:24:39] or are you talking about something different?
[17:25:45] maplebed: originals
[17:26:05] * robla didn't realize we needed another window
[17:27:05] it was on my list to talk to you about how upload URLs for originals were going to be parsed... but I figured they'd go straight to swift rather than mediawiki (since they go straight to ms7 now).
[17:27:36] so ... what's the rewrite rule you need for originals?
[17:29:21] maplebed: similar to how you changed the rules for thumbs to use swift, we need /site/lang/(archive/)?[0-9a-f] to go to swift
[17:29:43] * AaronSchulz needs to check rewrite.py, I think it already handles these pretty much
[17:30:16] what part of that needs an apache rewrite rule? I think that only needs squid and swift changes.
[17:30:21] * AaronSchulz 's computer keeps stalling
[17:30:45] maplebed: oh, right, squid config, not apache
[17:30:51] k.
[17:31:23] does it need a rewrite to go to the right shard?
[17:31:29] robla: would you add apergos and paravoid to that invitation?
[17:33:54] I guess leaving = no :-D
[17:34:03] bah.
[17:34:08] heh
[17:37:05] those times on the invite are pst right?
[17:37:21] 3pm to 5pm?
[17:37:52] maplebed:
[17:38:07] yeah.
[17:38:25] Thursday, August 16, 22:00-23:00 UTC (3pm-4pm PDT) - Swift default read (reads from Swift, writes to both) everything else
[17:38:26] mm
[17:38:31] (from http://wikitech.wikimedia.org/view/Software_deployments)
[17:38:32] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[17:40:02] I wouldn't object to any change in the schedule - maybe you can convince AaronSchulz to take an earlier slot (and maybe trade with one of the other deploys on thursday)
[17:40:03] that's 1 am til 3 am here... not sure I can hang
[17:44:41] apergos: red bull? :)
[17:44:47] er no
[17:44:56] I wake up at 7:30 am no matter what happens
[17:45:03] unless I'm deathly ill
[17:45:21] I'll try sleeping earlier in the day, that would be the only way to make it work
[17:45:28] I basically never touch caffine
[17:45:31] caffeine
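
The /site/lang/(archive/)?[0-9a-f] pattern Aaron mentions above reflects MediaWiki's hashed upload layout: the first one and first two hex digits of the MD5 of the underscore-normalized title become the two directory levels. An illustration, using a commons path that appears later in this log:

    # The shard directories come from the MD5 of the filename:
    printf '%s' 'Arvalee_Townland_-_geograph.org.uk_-_1222857.jpg' | md5sum
    # The hash starts with "13", so the original is served from
    #   /wikipedia/commons/1/13/Arvalee_Townland_-_geograph.org.uk_-_1222857.jpg
    # and [0-9a-f] in the rewrite pattern is matching that first shard level.
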
[17:48:48] brion: I meant to ask, has the Gerrit (etc) review period been extended, and if so, until when?
[17:49:05] Jarry1250: right, I gotta write up some notes :)
[17:49:28] i was thinking we should play with fabricator a bit more to get a better feel for what's available and at stake
[17:49:39] but for the time being we're going to stick with gerrit as it works
[17:50:23] brion: Cool, so the official review is kinda over, but there's also going to be an ongoing review?
[17:50:47] sounds about right :)
[17:51:37] <^demon> brion: I was just wondering where that e-mail was. Thought I'd have one Friday :)
[17:52:04] yeah, i got distracted
[17:53:04] <^demon> brion: Something shiny? ;-)
[17:53:59] there's always something shiny :)
[17:54:17] want shiny
[17:54:20] is it sharp too??
[17:55:18] RoanKattouw: https://bugzilla.wikimedia.org/show_bug.cgi?id=39221
[17:58:00] Aaaah crap
[17:58:17] I uploaded those years ago, before we figured out that all importImages.php runs should be done as apache
[17:58:44] it would be nice to do a full walk to unwonk all files
[17:59:03] Yeah
[17:59:06] Looking into that now
[17:59:14] I hope root trusts my key on ms7
[17:59:31] Bleh, it doesn't
[17:59:36] OK I need to go to a meeting
[17:59:45] But when I get back I'll bother ops about my root access not working on ms7
[18:00:08] I could do the walk over NFS from fenari but I really don't want to
[18:00:14] oh. yeah, it uses some weird password or other
[18:11:27] hey robla
[18:11:32] hi
[18:11:36] stoopid wifi
[18:11:40] heh
[18:12:28] robla: do you know what the status is for https://bugzilla.wikimedia.org/20512 ? waiting on a particular person/review ? (for just upload.wm.o -> commons)
[18:12:47] * jeremyb can't tell
[18:14:47] given the amount of work that's going on with Swift and everything else, I'd rather not push on that one right now
[18:15:08] hrmmmm, k
[18:15:17] so, after originals -> swift ?
[18:15:18] who wants to poke https://gerrit.wikimedia.org/r/#/c/19240/ for me? quick one-line change to apache config to (hopefully) fix a symlink for Wiki Loves Monuments app
[18:15:22] Ryeah
[18:15:40] seems kinda unrelated i think. but ok
[18:15:55] what's the first step after swift then?
[18:16:27] is it ready to become a vanilla shell request then?
[18:16:52] it's waiting for the dust to settle on swift originals
[18:17:16] right, but then what? does anyone have to sign off on it?
[18:17:19] Reedy: you around now?
[18:17:30] robla: he's 17 hrs idle
[18:18:02] robla: and can i give some kind of forecast of when? say 6 weeks?
[18:18:09] he warned me he was going to be out for a bit, but thought he might be back in time for this
[18:19:00] jeremyb: I'm not sure, honestly. it's been a while since I've thought about that one. could you bring it up on the mailing list?
[18:19:28] i guess...
[18:19:47] * jeremyb didn't want to get so involved today but someone was asking me about the bug ;-)
[18:20:24] robla: how much buffer after originals? 1-2 weeks?
[18:20:25] sorry for putting this off, but I'm trying to shepherd a deployment
[18:20:41] AaronSchulz: can you take over for Reedy?
[18:21:21] jeremyb: I can't give an answer right this second
[18:21:29] where is he?
[18:21:36] robla: k, have fun with your scaps ;)
[18:22:09] AaronSchulz: I'll send mail
[18:23:39] Phase 2 Monday, August 13 English Wikipedia
[18:23:45] alright then
[18:24:11] should be pretty easy :) (famous last words)
[18:25:08] !log aaron rebuilt wikiversions.cdb and synchronized wikiversions files: Moved enwiki to 1.20wmf9
[18:25:17] Logged the message, Master
[19:07:29] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[19:07:29] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[19:36:48] !log preilly synchronized php-1.20wmf8/extensions/ZeroRatedMobileAccess 'update for landing page'
[19:36:57] Logged the message, Master
[19:41:32] !log preilly synchronized php-1.20wmf9/extensions/ZeroRatedMobileAccess 'update for landing page'
[19:41:41] Logged the message, Master
[19:41:57] !log update to zero for banner text link
[19:42:05] Logged the message, Master
[19:42:07] !log preilly synchronized php-1.20wmf8/extensions/ZeroRatedMobileAccess 'update for landing page'
[19:42:16] Logged the message, Master
[20:03:02] !log preilly synchronized php-1.20wmf8/extensions/ZeroRatedMobileAccess 'update for landing page'
[20:03:11] Logged the message, Master
[20:03:37] !log preilly synchronized php-1.20wmf9/extensions/ZeroRatedMobileAccess 'update for landing page'
[20:03:45] Logged the message, Master
[20:04:25] !log updating zero for custom banner colors
[20:04:34] Logged the message, Master
[20:13:30] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[20:18:01] I know hashar's away so he can't answer this, but I take it no-one ever got Jenkins running on, say, an extension's Gerrit repo?
[20:18:46] It isn't just a case of saying "please turn this on for X repo", right?
[20:36:31] !log catrope synchronized php-1.20wmf9/resources/jquery/jquery.tablesorter.js 'Deploy 1fafaef3aa157f49cbbf4b9cffb03544bcf72a08'
[20:36:41] Logged the message, Master
[21:19:57] PROBLEM - SSH on virt1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:21:45] PROBLEM - Host virt1004 is DOWN: PING CRITICAL - Packet loss = 100%
[21:27:18] RECOVERY - Host virt1004 is UP: PING OK - Packet loss = 0%, RTA = 35.38 ms
[21:45:53] PROBLEM - Host virt1004 is DOWN: PING CRITICAL - Packet loss = 100%
[21:49:56] ACKNOWLEDGEMENT - Host virt1004 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn RT #3261, disk fail
[21:57:08] RECOVERY - Host virt1004 is UP: PING OK - Packet loss = 0%, RTA = 35.34 ms
[22:04:31] RoanKattouw: did you get a chance to look at those file perms?
[22:04:47] PROBLEM - Host virt1004 is DOWN: PING CRITICAL - Packet loss = 100%
[22:05:10] AaronSchulz: Yes, sorry, I forgot to update you
[22:05:35] There were some issues with ms7 having an out-of-date authorized keys file because it's not puppetized (because it's Solaris), so it wasn't letting me in
[22:05:45] That was fixed, so I'm now running a find on the bare metal, see SAL
[22:06:12] The file with the list of bad files it's found is now 19MB
[22:06:28] And it hasn't gotten very far yet
[22:06:30] /export/upload/wikipedia/commons/1/13/Arvalee_Townland_-_geograph.org.uk_-_1222857.jpg
[22:07:31] Once it finishes building the list (which will take a while, may have to wait till tomorrow) I'll batch chown them
[22:07:55] sweet
[22:07:56] RECOVERY - SSH on virt1004 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[22:08:05] RECOVERY - Host virt1004 is UP: PING OK - Packet loss = 0%, RTA = 35.36 ms
[22:08:17] Then run another find to locate and clean up any remaining ones, in case files moved around in the 24h between the find and the chown
[22:08:33] I'm also not processing the private wikis' files right now because they're in a separate location
[22:08:48] But they probably have very few to no bad files, and the find should be quick there anyway
[22:09:20] And I'm not doing thumbs because those live on a different box
[22:14:06] I don't care about thumbs, just originals
[22:15:35] Right
[22:17:59] Man this is slow
[22:18:02] Oh it's at 2d now
[22:18:38] So that's 17.5% of the shards in the first hour
[22:19:03] Assuming even distribution across the shards, that means this'll take 5.7 hours total, so another 4.7 hours to go
[22:28:53] Crap, it's not processing them in hex order of course; 22 comes after 2d
[22:29:09] So 22 is really 2*16 + 6 + 2 = 38
[22:29:22] So another ~6h to go
[22:29:24] PROBLEM - NTP on virt1004 is CRITICAL: NTP CRITICAL: No response from NTP server
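
Roan's percentages above fall out of the shard layout: 0x2d is shard 45 of 256 two-hex-digit shards, 45/256 ≈ 17.6%, and one hour divided by 0.176 gives the ~5.7 hour estimate. A sketch of the kind of walk being described — the paths and target user are assumptions, and ms7 is Solaris, so this sticks to POSIX find/xargs:

    # Pass 1: list everything not owned by the web user (runs for hours):
    find /export/upload ! -user apache -print > /root/bad-owner-files.txt

    # Pass 2: batch-chown the list. Plain xargs splits on whitespace, which
    # is safe here because on-disk titles use underscores, never spaces:
    xargs chown apache < /root/bad-owner-files.txt
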
[22:42:55] hello. I've had a VERY log of issues reaching wikimedia with ipv6 recently. It all goes very well with ipv4 (i use another ipv4-only browser to check). It seems the main problem is bits.wikimedia.org : it pings, it nmaps and has port 80 open, but it takes minutes to answer queries. Meanwhile the whole html page can't be displayed by the browser
[22:43:07] is that a known issue? or should i triple-check my ipv6 stuff?
[22:51:05] jeremyb: Damn it, just mis-used "master" again.
[22:51:43] orzel: What's a VERY log?
[22:52:43] binasher: http://radar.oreilly.com/2006/04/database-war-stories-3-flickr.html <-- is that last quote truthy?
[22:53:16] orzel: If you think bits.wikimedia.org is intermittently unresponsive, please file a bug at . Any browser load time info you can attach to the bug is always appreciated as well.
[22:53:24] I think Safari and Chrome can graph load time now.
[22:53:37] If it's really locking up for a minute, something is broken. :-)
[22:54:21] AaronSchulz: it may have been when made in 2006, but it hasn't been for years
[22:54:50] orzel: It'd also be helpful to know just how intermittent, if you have any details... like does it typically happen only at a particular time in the day? And does it ever happen with IPv4?
[22:59:59] Brooke: s/log/lot/, sorry
[23:00:19] No problem. :-)
[23:00:30] Brooke: as said: never had any problem with ipv4, and it has consistently been very slow to respond for the last few weeks
[23:01:04] I believe you. If you can file a bug, that'll be very helpful in getting the issue resolved.
[23:01:11] IRC is very hit-or-miss.
[23:01:37] i'll try to get some timechart
[23:01:38] !log killing nagios-wm process which isn't supposed to be on #wikimedia-tech anymore
[23:01:47] Logged the message, Master
[23:02:15] mutante: Thanks for that. :-) Any idea how the process spawned? It seems like it was loading an old version of the file or something.
[23:07:19] mm, it works better with firefox than with chromium. For some reason ff manages to display most of the page even while bits.wikimedia.org doesn't answer. Chromium waits for it to answer...
[23:07:54] Brooke: no. indeed.
[23:08:15] Brooke: and i can't get it to start with the correct channel either, grrr
[23:08:34] anyone with debian packaging experience?
[23:08:37] the url failing is quite a long one: http://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=ext.articleFeedback%7Cext.articleFeedback.ratingi18n%7Cjquery.appear%2CarticleFeedback%2Clocalize%2Ctipsy%7Cmediawiki.language%7Cmediawiki.language.data%2Cinit&skin=vector&version=20120813T230113Z&*
[23:09:55] mutante: Bots are evil.
[23:17:35] Brooke: there's no category related to hosting in the bugzilla? that seems weird
[23:17:53] orzel: Just file it under any category you'd like.
[23:17:58] Someone will move it if they disagree.
[23:18:08] Brooke: i found it :p
[23:18:15] Wikimedia --> General/Unnown?
[23:18:18] Unknown
[23:18:22] mutante: \o/
[23:18:22] !log removed #wikimedia-tech from /usr/local/bin/start-nagios-bot
[23:18:31] Logged the message, Master
[23:18:46] Brooke: it was in that alternative script to start the bot
[23:18:59] some used that, others did not, it seems
[23:36:29] gn8 folks
[23:53:56] !log uploaded a new version of the php5-wmerrors package for lucid and precise
[23:54:05] Logged the message, Master
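
For anyone wanting to reproduce orzel's bits.wikimedia.org symptom above, curl can time the same fetch over each address family (the URL is a trimmed version of the one orzel pasted):

    # -4/-6 force the address family; time_total is total seconds:
    for v in 4 6; do
      curl -$v -s -o /dev/null -w "ipv$v: %{time_total}s\n" \
        'http://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=mediawiki.language&skin=vector'
    done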