[01:37:04] are the devs aware that the API is currently acting funny? [01:52:07] oh dear I've found it: [01:52:20] it is a bug indeed [01:52:43] the 'repository' characteristic of images is not being returned [03:05:11] ugh it's only appearing after so many characters in the query [03:05:16] so hard to pin down [03:05:18] c'est la vie [04:14:30] ok, who broke the site ? [04:15:33] what's broken? [04:15:47] rendering cluster paged me [04:16:07] ipv6 en.wikipedia works for me [04:16:17] how much caching does wikipedia have? [04:16:25] lots [04:16:31] like, if it went down, would anyone notice :P [04:16:46] it would only knock offline new rescaling of images [04:18:29] swift is unhappy :( [04:21:48] more unhappy than usual? [04:21:58] what was the fix last time? [04:22:59] i have no idea ... [04:23:04] yeah, the ms-fe's are all totally in swap [04:23:07] like hardcore [04:24:06] also i just got back from dinner with a fair amount of wine .. worst time to have to debug [04:24:42] I'll look [04:24:52] i just restarted swift proxy-server on ms-fe4 [04:24:56] ms-fe3 is having the same problem though [04:24:59] and has been untouched [04:25:12] i just logged out of its serial [04:28:39] !log on ms-fe3: restarting swift-proxy due to swap [04:28:51] Logged the message, Master [04:31:19] still trying to get a shell on ms-fe2 [04:31:21] TimStarling: i love we came to the same conclusion - did you figure out anything ? 
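The "ms-fe's are all totally in swap" diagnosis above can be confirmed programmatically on Linux by reading the VmSwap field from /proc/&lt;pid&gt;/status. A minimal sketch; the function names and the 512 MiB threshold are illustrative inventions, not anything swift or the ops tooling actually uses:

```python
def swap_kb_from_status(status_text):
    """Extract VmSwap (in kB) from the text of /proc/<pid>/status.

    Returns 0 if the kernel does not report a VmSwap line.
    """
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            # Format: "VmSwap:     123456 kB"
            return int(line.split()[1])
    return 0


def badly_swapped(status_text, threshold_kb=512 * 1024):
    """Heuristic: flag a process whose swapped-out size exceeds the threshold."""
    return swap_kb_from_status(status_text) > threshold_kb
```

In practice you would read `/proc/<pid>/status` for each swift-proxy worker and restart the service when `badly_swapped` fires.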
[04:31:30] http://ganglia.wikimedia.org/latest/?r=month&cs=&ce=&m=mem_report&s=by+name&c=Swift+pmtpa&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=3 [04:31:37] this is obviously a regular problem [04:31:44] yeah [04:31:51] very sawtoothed graph [04:31:51] nobody restarted it over the weekend, so it exploded [04:32:50] TimStarling: interesting , so ms-fe1 is precise and there's no sawtoothedness/swapdeath [04:33:22] also no CPU or network traffic [04:33:32] probably easier to avoid leaking memory when you're not doing anything ;) [04:34:16] !log on ms-fe2: restarted swift-proxy due to mem leak [04:34:27] Logged the message, Master [04:35:07] it's nice that they were reasonably responsive while they were swapping [04:35:22] often I just give up waiting and power cycle machines that are in swapdeath [04:36:00] it's easy enough to fix this semi-permanently, you know [04:36:30] just run swift-proxy from a restart loop, and disable swap [04:36:42] restart loop -- [04:36:50] disabling swap is not necessarily a bad idea [04:36:50] yeah, [04:36:53] #!/bin/bash [04:36:55] while true; do [04:36:59] hah [04:37:00] swift-proxy [04:37:03] sleep 1 [04:37:05] done [04:38:06] some people don't like restart loops, they think we should fix the applications [04:38:13] it's not very elegant [04:38:33] but it's like 3 lines of code and it'll fix the problem so well that nobody will even notice it's broken [04:39:32] i prefer the fix the problem solution myself [04:40:29] anyways, i'm off - thanks tim :) [04:43:25] bye [04:54:49] https://commons.wikimedia.org/wiki/Special:NewFiles [04:55:39] https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Thandie_Newton_2%2C_2010.jpg/92px-Thandie_Newton_2%2C_2010.jpg [04:55:47] I'm getting a Python traceback there. [04:57:13] TimStarling: ^ [04:58:48] seems kind of wrong to cache a 500 error [05:10:18] I'm also getting "HTTP Error 504: Gateway Time-out" when trying to upload through the API. 
[05:11:39] Brooke: it fixed itself [05:12:12] For that image, yes. [05:12:26] I tried loading an image on Commons and got: [05:12:30] A database error has occurred. Did you forget to run maintenance/update.php after upgrading? See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script [05:12:30] Query: SELECT 1 FROM `image` WHERE img_name = 'New_Government_of_the_U.S._-_NARA_-_5730035.jpg' LIMIT 1 FOR UPDATE [05:12:31] Function: LocalFile::lock [05:12:32] Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.0.6.41) [05:18:28] Hi all, if anyone's around, I'm unable to upload a small 3 MB file to Commons at the moment. I'm using the basic upload form and I get a timeout. Is this a known issue? [05:18:50] Request: POST http://commons.wikimedia.org/wiki/Special:Upload, from 69.214.171.3 via cp1011.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.131 (10.64.0.131) [05:18:50] Error: ERR_READ_TIMEOUT, errno [No Error] at Mon, 15 Oct 2012 05:03:19 GMT [05:19:03] I'm looking at it [05:19:09] Dmcdevit reported it also [05:19:10] Okay thx :-) [05:19:44] FYI russavia is reporting the same issue [06:26:44] TimStarling: FYI some thumbnails are also failing to generate, I get the following stack trace: http://pastebin.com/Jk8Ayjnq [06:52:57] bbl [07:06:39] I get a message that the servers are overloaded [07:26:20] yes, they are overloaded [07:31:38] ms-be11 is clearly out of workers [07:32:01] sorry for not getting to this earlier, this is my first time looking at swift ops [07:33:45] !log experimentally doubling the worker count on ms-be11 since wchan indicates that the worker pool is exhausted [07:33:56] Logged the message, Master [07:37:55] !log ms-be11 showed an immediate improvement in bandwidth out, but wchan still indicates that 48 is not enough, increasing to 100 [07:38:06] Logged the message, Master [07:53:38] !log increasing worker count to 100 on all swift backends, via puppet [07:53:49] Logged the message, Master [08:11:06] any NoSQL 
experts present? [08:18:25] mystery solved, it's all about Gertie the Dinosaur [08:20:32] hi TimStarling , sorry I know nothing about NoSQL / Swift / reddis etc [08:20:57] it's ok, I was only asking so I could troll them [08:21:03] just heard and looked at the concept [08:21:29] aren't you working on migrating us from memcached to Reddis ? [08:21:34] what's it called when you get slashdotted from one of those special event google banners? [08:21:41] because I think "googled" is already taken [08:22:56] go to google.com, what do you see? [08:23:16] it takes a while, it's a little game [08:23:38] I see some comic strip coming from the 1920's [08:23:41] little nemo [08:23:43] googledotted? [08:23:52] those banners have a name I think [08:24:04] doodles [08:24:20] anyway when you click it enough times it takes you to http://en.wikipedia.org/wiki/Winsor_McCay [08:24:21] so you could probably say that we have been "doodled" ;D [08:24:33] and now millions of people are trying to play the video there, which is Gertie the Dinosaur [08:24:43] and that is hosted on sdd of ms-be11 [08:24:54] and as a result, sdd of ms-be11 is massively overloaded [08:25:00] apparently we don't have any caching or anything [08:25:11] so we just have to serve the file straight out of disk, uncached [08:25:13] I think we used to have cache of videos [08:25:20] but got disabled cause of some trouble with varnish [08:25:40] no [08:25:48] ahh mark will tell :-] [08:25:55] ah yes, it's 57MB, that is the size of the object file on the backend [08:26:03] http://paste.tstarling.com/p/HlwRSm.html [08:26:05] holy f*ck [08:26:13] tim the investigator [08:26:14] see, ms-be11 has 1300 FDs open for this video [08:26:37] amazing [08:27:08] well, it would have had a lot less, but I quadrupled the number of workers on it [08:29:15] i'm getting a cached hit from the squids, even in the frontends [08:29:24] but it's loading super slow for some reason [08:29:26] try with a Range? 
[08:29:57] cheers Tim, it would have taken me quite a while to find that out [08:30:05] among other reasons because I never visit google's frontpage [08:30:35] i'm still waiting for the full object to come in [08:30:42] I have been working on it for a few hours [08:30:58] I didn't visit google's front page until I found it in the squid referrer logs [08:31:24] could it be that squid request the file in parallel and never manage to cache it ? [08:31:55] hmm, rendering.svc is still down according to nagios [08:31:59] what I want to know is: why isn't ms-be11 serving it out of the kernel cache? [08:32:06] the load on image scaler is nothing [08:32:13] I think the backends are the root cause there [08:33:23] iostat shows it pumping out 40MB/s from the underlying device [08:34:14] rendering doesn't work [08:34:38] sure, but I think if you fix the Gertie issue, rendering will start working [08:41:07] TimStarling: so imagescalers have too many requests waiting because swift's slow because ms-be11 is slow? [08:41:25] is that your working theory? [08:41:37] swift is certainly very slow to respond to any queries [08:42:45] shouldn't maybe gertie the dinosaur be removed from the article for now...? [08:46:35] I see swift doing a lot of fadvise64(.., POSIX_FADV_DONTNEED) calls [08:46:44] yeah, I've noticed those too [08:46:46] not sure yet on what kind of FDs [08:46:56] and asked the swiftstack people when I met them a month ago [08:47:11] they said that they do that sometimes, depending on the file size [08:47:16] argh [08:47:17] didn't exactly understand why [08:47:34] perhaps we should hack them out in swift on ms-be11 [08:49:36] read += len(chunk) [08:49:36] if read - dropped_cache > (1024 * 1024): [08:49:36] self.drop_cache(self.fp.fileno(), dropped_cache, [08:49:36] read - dropped_cache) [08:49:36] dropped_cache = read [08:50:39] are you kidding me [08:51:10] TimStarling: so imagescalers have too many requests waiting because swift's slow because ms-be11 is slow? 
[08:51:34] yes, on ms-be11 I saw a lot of established connections from rendering.svc [08:52:04] ok, maybe I saw those on the frontend, come to think of it [08:52:11] yeah [08:52:29] so, 57M is certainly bigger than 1M [08:52:30] but yes, my theory is that a single slow hard drive will eventually suck up all available rendering threads and a good deal of general swift cluster resources [08:52:32] hence the fadvise [08:52:41] hence the dropped cache [08:52:42] due to long timeouts and lack of concurrency limits [08:52:48] what the fuck [08:52:52] let's hack that out [08:52:56] okay. [08:52:58] doing it [08:52:58] yes, hack it [08:54:17] done, swift restarting [08:54:21] done [08:54:26] sdd utilisation dropped [08:54:35] it's similar to the other drives now [08:54:58] iowait down [08:55:04] no, it takes a few minutes to repool [08:55:21] ok, then it dropped for being not in use ;) [08:55:28] I think the frontends must declare it down [08:56:09] so now probably some other backend is having the problem [08:56:23] it serves more traffic than other backends atm [08:56:53] it's repooled now, there was a jump in the network out [08:57:15] indeed [08:57:26] iowait is still normal [08:57:29] sdd utilisation back at 100% [08:57:37] ah now it's not [08:57:48] yeah it looks fine now [08:58:00] I'm looking at ganglia [08:58:06] iostat with a short polling interval is always noisy [08:58:09] right [08:58:12] was about to say that [08:58:18] http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=ms-be11.pmtpa.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2&st=1350291301&g=cpu_report&z=large&c=Swift%20pmtpa [08:58:20] sometimes I use iostat -xd 30 [08:58:35] well it did that for about 30s [08:58:43] perhaps to load that file once ;) [08:58:45] so, mark, add that to the list of how we're not prepared for videos [08:58:49] i did [08:59:52] KEEP_CACHE_SIZE = (5 * 1024 * 1024) [09:00:00] if response.content_length < KEEP_CACHE_SIZE and \ [09:00:01] 'X-Auth-Token' not in 
request.headers and \ [09:00:01] 'X-Storage-Token' not in request.headers: [09:00:01] file.keep_cache = True [09:00:21] which shortcuts self.drop_cache to nothing [09:00:32] but in this case it's a noop, since we do have an X-Auth-Token [09:00:42] not that it would matter, as we're above 5M too [09:01:22] so... ms-be11 is normal again, but swift is still slow as hell [09:01:25] and rendering is still down [09:01:44] the netapp copy finished btw, and the netapps are now doing snapmirror in sync mode [09:01:52] which possibly slows down nfs access [09:02:10] ms-be12 needs the same trick, sdf there is overloaded [09:02:39] doing ~53 MB/s out of sdf alone [09:03:43] done [09:03:50] (and thanks) [09:03:58] my test when I started: [09:03:59] Connect time: 0 ms [09:03:59] Request to response headers: 7 ms [09:03:59] Request to first data byte: 8 ms [09:03:59] Received 59646368 bytes, at 54000 bytes/s average [09:04:00] Request to end of data: 1099700 ms [09:04:00] Total time: 1099700 ms [09:04:18] i'm quite liking my new http test script, I've used it for various purposes already ;) [09:04:20] it's Gertie also: "GET /sdf1/24874/AUTH_43651b15-ed7a-40b6-b745-47666abf8dfe/wikipedia-commons-local-public.3b/3/3b/Gertie_the_Dinosaur.ogv" [09:05:42] and ms-be1: /sdl1/24874/AUTH_43651b15-ed7a-40b6-b745-47666abf8dfe/wikipedia-commons-local-public.3b/3/3b/Gertie_the_Dinosaur.ogv [09:06:17] debstack can get to work on that package ;) [09:11:27] mark: is that script in git? [09:11:32] no [09:11:47] it's only in /home/mark/firstbyte.py [09:12:21] it was the quick and dirty script I wrote after our http test tools discussion [09:13:21] i can clean it up a little and put it in git if you want [09:14:31] I think it would be useful [09:15:03] ok [09:19:42] that video still loads very slowly [09:20:01] maybe we are hitting another limit [09:20:18] you mean from swift? [09:20:22] or squids? 
[09:20:25] no from the squids [09:20:31] swift is very slow to load in everything [09:20:32] yeah, the CARP-balanced squid [09:20:39] ah better now [09:20:40] Received 59646368 bytes, at 1007000 bytes/s average [09:20:47] 1 MB/s [09:21:00] still not great, from fenari [09:21:32] swift timeouts completely argh [09:21:33] sq85 is the carp balanced squid [09:22:54] ms-be1 is also i/o waiting like crazy [09:22:57] live hacking there too [09:23:02] yeah tim said tht [09:23:14] oh I missed it [09:24:05] !log live-hacking swift on ms-be10, ms-be12, ms-be1 to remove fadvise calls [09:24:16] Logged the message, Master [09:24:17] i'm hungry, i need breakfast [09:24:31] I need coffee [09:24:37] yes that too [09:24:37] TimStarling: around? [09:24:42] breakfast includes coffee [09:24:50] liangent: yes [09:25:02] hmm now i'm getting a MISS from sq85 for some reason [09:25:10] it doesn't seem overloaded [09:25:15] no [09:25:20] TimStarling: some users are racing article creation with bots [09:25:37] i wonder if it's just evicting from its cache really quickly [09:25:40] including users from zh, vietnam, swedish [09:26:07] liangent: malicious users? [09:26:08] not sure whether this is affecting job queue or system load [09:26:20] or just normal articles? [09:26:27] TimStarling: normal articles [09:26:39] usually created from database [09:27:04] most are about towns currently [09:27:33] they just don't want to see their language get lower ranked in wikipedia rankings by article number [09:27:54] I think it's ok, as long as they don't use country data templates [09:28:06] and as long as the bots are single-threaded [09:28:21] TimStarling: zhwiki job queue rised by factor 12 since we talk about it last time. my bot wasn't active in template namespace since that time. can you have a look why so many new jobs were added? 
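The CARP balancing mentioned above (every request for a given URL lands on the same parent squid, here sq85) can be illustrated with a highest-random-weight (rendezvous) hash, the scheme CARP is based on: each frontend hashes (cache, url) and picks the cache with the highest score, so the mapping is deterministic without any shared state. Real squid CARP additionally applies per-cache load-factor weights, omitted in this sketch:

```python
import hashlib


def carp_pick(url, caches):
    """Pick the parent cache for a URL by highest-random-weight hashing:
    score every (cache, url) pair and take the maximum. Every frontend
    computes the same answer, so one hot URL always maps to one parent."""
    def score(cache):
        digest = hashlib.md5((cache + url).encode()).hexdigest()
        return int(digest, 16)
    return max(caches, key=score)
```

This determinism is also the downside seen in the log: one viral video concentrates all its traffic on a single CARP parent.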
[09:28:24] we don't want people attempting to insert large numbers of articles concurrently [09:29:17] HTTP/1.0 504 Gateway Time-out [09:29:17] Server: squid/2.7.STABLE9 [09:29:17] Date: Mon, 15 Oct 2012 09:28:55 GMT [09:29:17] Content-Type: text/html [09:29:17] Content-Length: 1346 [09:29:18] X-Squid-Error: ERR_CANNOT_FORWARD 11 [09:29:18] X-Cache: MISS from sq85.wikimedia.org [09:29:19] X-Cache-Lookup: MISS from sq85.wikimedia.org:3128 [09:29:55] yeah swift is dead [09:29:57] no idea why [09:30:02] looking [09:30:45] Merlissimo: we don't have logs of job queue insertions, only job queue removals [09:30:55] TimStarling: hmm some or articles created by them are using that french "database" templates [09:31:36] TimStarling: but the account name is added to the database table, i think [09:32:19] sq85 isn't caching that video for very long, that's for sure [09:32:29] looking at the database is more helpful, no username though [09:32:49] but there is an insertion time [09:32:52] then another bot is chasing the creation bot to add iw links... [09:36:03] swift latency seems to be normal, it's imagescalers that are not responding [09:36:34] I'm looking at srv220, 465 established connections, 20 apache processes (MaxClients is 20) [09:37:45] probably a backlog of not yet thumbed images [09:39:00] could be, although they're relatively idle in CPU [09:39:23] seems better than before though [09:39:25] yeah very [09:41:50] i restarted apache on srv224 [09:42:02] to see what effect it would have [09:44:01] there are lots of errors about curl getting from swift in the logs [09:44:02] but not anymore [09:44:12] backlog is sane [09:44:12] Oct 15 08:43:21 srv224 apache2[9598]: PHP Warning: SwiftFileBackend::getLocalCopy: Invalid response (): (curl error: 18) transfer closed with 3915839 bytes remaining to read: Failed to obtain [09:44:13] valid HTTP response. 
in /usr/local/apache/common-local/php-1.21wmf1/includes/filebackend/SwiftFileBackend.php on line 1364 [09:44:46] okay, I'm going to restart the rest [09:44:47] then [09:45:12] i assume it's gonna be fine soon [09:46:19] and there is the text [09:46:56] yeah I'm restarting apaches [09:48:26] alright [09:48:33] we'll have to investigate the caching of videos more [09:48:51] but breakfast doesn't need to wait for that [09:48:56] !log restarting all imagescaler apaches, did not recover after swift outage [09:48:57] so i'll be back later ;) [09:49:07] Logged the message, Master [09:49:22] are you gonna patch out the fadvise thing in the package? [09:49:44] probably [09:49:56] I was actually thinking of upgrading swift this week to a newer version [09:50:06] the swiftstack people promised to help with the leaks too :) [09:50:20] ok [09:50:29] brb too [09:50:35] want coffee. need coffee. [09:50:40] same here [09:53:12] Merlissimo: refreshLinks2 jobs can be split into smaller jobs if they have more than 10 pages in them [09:54:19] for example there was a [[Template:Country_data_United_Kingdom]] that was split into 50 jobs [09:56:06] that's the usual split, actually, $wgUpdateRowsPerJob / RefreshLinksJob2::MAX_TITLES_RUN = 50 [09:56:37] so as the job runners hit expensive jobs, the job queue size appears to expand by a factor of 50 [09:59:23] TimStarling: would it be possible to delete jobs manually caused by me after i moved all country data iws to subpages? i could do this in a way that jobs created by this change won't have any effect.
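The job-splitting arithmetic above ($wgUpdateRowsPerJob / RefreshLinksJob2::MAX_TITLES_RUN = 50) can be sketched as a simple batching function. This mirrors the numbers quoted in the log, not MediaWiki's actual implementation:

```python
def split_refresh_links_job(titles, max_titles_run=10):
    """Model how an expensive refreshLinks2 job fans out when a runner
    picks it up: a job carrying more than max_titles_run titles is split
    into batches of that size, so a 500-title job ($wgUpdateRowsPerJob)
    becomes 500 / 10 = 50 queue entries."""
    if len(titles) <= max_titles_run:
        return [titles]
    return [titles[i:i + max_titles_run]
            for i in range(0, len(titles), max_titles_run)]
```

This is why the queue size "appears to expand by a factor of 50" once runners reach the expensive template jobs: the backlog was always there, just counted as fewer rows.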
[10:00:14] it would be possible to remove all the refreshLinks jobs for country data templates [10:00:20] it's not so easy to tell why each was inserted [10:01:33] but I could remove them all and then reinsert a single copy [10:01:37] as a kind of duplicate removal [10:02:39] I guess all refreshLinks2 insertions should do that kind of duplicate removal [10:03:04] TimStarling: then i'll first write an update script that moves the langlinks all at once and then you could do this. [10:03:16] ok [10:04:06] liangent: should i announce this on zhwiki first? [10:05:39] not sure, anyway it won't affect most users, and all users affected should be technical [10:05:59] Merlissimo: or let me announce it in Chinese. can you give some examples? [10:06:05] of your bot changes [10:06:32] it's like the one you did last time manually [10:06:58] Merlissimo: and it's not running currently? [10:08:58] bbl [10:09:23] i have added automatic moving to subpages for country data to my bot framework when my bot wants to update langlinks. but now i'll do the change once. [10:10:08] currently the job queue size blocks my automatically running bot from doing changes on zhwiki template namespace [10:16:15] and back [10:19:43] TimStarling: is there a job queue size graph available ? [10:19:50] like the ts replag graph [10:20:38] Merlissimo: I just want to confirm it's working fine [10:20:56] liangent: probably [10:21:57] liangent: i am still working on my bot code. [10:22:37] but i am doing test edits first, of course [10:23:26] Merlissimo: ok let me say it with my example first [10:24:20] TimStarling: probably = you have data to generate one but nothing is available currently?
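The duplicate removal Tim describes above, removing all pending copies of a job and reinserting a single one, amounts to de-duplicating the queue on a job key. A toy model, with jobs reduced to (type, title) tuples, which real job queue rows are not:

```python
def deduplicate_jobs(queue):
    """Collapse every pending job with the same (type, title) key into a
    single entry, keeping first-inserted order, as sketched in the log
    for the country-data refreshLinks jobs."""
    seen = set()
    deduped = []
    for job in queue:
        key = (job[0], job[1])  # (job type, target title)
        if key not in seen:
            seen.add(key)
            deduped.append(job)
    return deduped
```

MediaWiki later grew exactly this kind of de-duplication for root jobs; here it is only meant to illustrate the "remove all, reinsert one" idea.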
[10:24:29] well bbl again [10:32:52] probably as in probably someone somewhere makes one of those [10:32:59] maybe on toolserver [10:33:09] maybe also asher made one [12:38:43] paravoid: thanks for the update and resolution to the issue [12:39:06] sorry for taking so long :) [12:39:31] no problem [12:59:29] liangent: job queue size is only available for enwiki. Reedy created it i think. but not for zhwiki. i requested such a graph last time, but nobody set one up i think. [12:59:54] I didn't create it [13:00:22] One of the first things maplebed did.. [13:00:46] Reedy: yes you asked me to ping other people because you had to sleep. but nobody did it. [13:01:05] I still didn't create it :p [13:42:09] hello there :-] [14:42:45] bminish: mark contacted tele2 and they've fixed the problem [14:42:49] not that you care anymore, but still [15:17:05] Howdy,.. what kind of MPM model is Wikipedia using, anybody here that knows? [15:17:28] And how large is a typical process/thread? [15:17:34] probably mod_php [15:17:40] or however it's called in apache2 [15:17:45] *whatever [15:18:00] not that,.. [15:18:27] Apache has several models for how it processes requests [15:18:36] prefork, threaded, and so forth [15:19:00] I don't think it's prefork [15:19:38] ii apache2 2.2.14-5ubuntu8.9 Apache HTTP Server metapackage [15:19:38] ii apache2-mpm-prefork 2.2.14-5ubuntu8.9 Apache HTTP Server - traditional non-threaded model [15:19:42] It doesn't seem very likely.. but memory usage is more critical in the threaded model [15:20:01] * jeblad_WMDE is chockolated [15:20:04] looks like prefork to me [15:20:33] well, that answers the question [15:20:42] He,.. you can speed up the webservers drastically by choosing another model [15:20:57] I'm sure there's good reason... [15:21:05] But nice, then we can do more ugly stuff.. 8D [15:21:16] mod_php5 works well with the threaded model?
[15:21:34] a threaded php is typically much slower [15:21:42] Not 100% sure, I usually play with ModPerl [15:21:44] and php is the important piece in wmf setup [15:21:53] is someone able to reliably answer http://lists.wikimedia.org/pipermail/wikimedia-l/2012-October/122332.html ? [15:22:32] do the 'ip' ratelimits in $wgRateLimits also affect (new?) users editing from the same IP, by cumulating their edits? [15:22:34] Prefork spins off a process and uses it once before it is killed. Maximum security, but it costs. [15:22:50] But thanks. [15:22:54] :) [15:24:48] Nemo_bis, the ip limit is per action and per ip [15:25:05] which means that when there are multiple newbies under the same ip, they are aggregated [15:30:28] Reedy, can you check another thing for me? Please? becauseyouaresoverykindandhelpfull.. :D [15:30:36] sure [15:30:38] :p [15:30:46] How large is a process for Wikipedia typically? [15:30:56] [ ] [15:30:58] ^ this big [15:31:03] hehee [15:31:33] I wonder if that's in our profiler/graphite [15:31:42] We have memory use for a single request about 35-45 MB [15:32:06] 'default' => 128 * 1024 * 1024, // 128MB [15:32:07] Seems like 43-44MB is typical [15:32:11] # Extra 60MB for zh wikis for converter tables [15:32:56] <^demon> (And those still OOM) [15:33:01] yup [15:33:09] we're nearly there for dedicated apaches... [15:33:30] jeblad_WMDE: I get a feeling we might have this somewhere. Probably better to ask Asher [15:33:43] hehe, seems like our processes are well behaved then.. We just thought we were growing out of bounds [15:34:06] <^demon> Well, each release of MediaWiki tends to consume more memory than the previous. [15:34:11] <^demon> Since more is better, of course :D [15:34:57] * jeblad_WMDE once had a linux box with 64MB ... [15:37:19] <^demon> I have 16GB on this laptop.
And Eclipse eats about 8 of those :p [15:38:52] Nemo_bis, I replied on foundation-l [15:50:44] Platonides: thanks [15:52:32] Platonides: does it also follow the local definition of autoconfirmed? [15:52:39] en.wiki's is stricter [16:26:57] paravoid: thanks again, now if only commercial and government entities could resolve issues as quickly and as transparently.. [17:52:21] Hi, is there a Steward who speaks Spanish? [17:57:32] Help!! [18:00:23] Deivismaster: #wikipedia-es ? [18:00:40] Yes... [18:01:18] Platonides: about? [18:02:18] Reedy: ready for the 1.21wmf2? [18:08:43] Reedy? [18:09:38] Deivismaster: try #wikimedia-stewards [18:12:31] 1 Fatal error: Call to a member function truncate() on a non-object in /usr/local/apache/common-local/php-1.21wmf2/extensions/CodeReview/ui/CodeRevisionListView.php on line 453 [18:12:35] Casualty no 1 [18:13:15] Aha [18:17:20] was $wgLang recently removed? [18:17:45] <^demon|lunch> Shouldn't have been. [18:17:51] <^demon|lunch> Not without major release-notes. [18:18:00] <^demon|lunch> (Big b/c break) [18:18:43] just wondering why https://gerrit.wikimedia.org/r/28063 was necessary [18:19:57] not that it's bad to do this sort of cleanup, but we shouldn't be doing it as part of a deploy [18:20:14] <^demon|lunch> ack -c 'wgLang[^a-z]' | grep -v ':0' | wc -l [18:20:14] (rather, we shouldn't need to) [18:20:15] <^demon|lunch> says 47. [18:20:35] <^demon|lunch> (Probably an easier way, but meh) [18:21:21] Probably part of siebrand's maintenance [18:22:54] oh, right, I see that now [18:23:10] https://gerrit.wikimedia.org/r/#/c/24651/ [18:23:29] 1 Catchable fatal error: Argument 1 passed to EditPage::toEditText() must implement interface Content, boolean given, called in /usr/local/apache/common-local/php-1.21wmf2/includes/EditPage.php on line 779 and defined in /usr/local/apache/common-local/php-1.21wmf2/includes/EditPage.php on line 1908 [18:24:59] $content = $this->getContentObject( false ); #TODO: track content object?!
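Returning to the $wgRateLimits question answered earlier: the 'ip' bucket keys its counter on (action, ip) with no user component, which is why several newbies editing from one IP are aggregated and can together exhaust a single allowance. A toy sliding-window model of that behaviour; the limit and window values are made up for illustration, and MediaWiki's real implementation differs in its storage and windowing:

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Toy model of a per-(action, ip) rate limit: the key deliberately
    contains no user name, so distinct accounts behind one IP share the
    same sliding-window counter."""

    def __init__(self, limit=8, window_s=60.0):
        self.limit = limit
        self.window_s = window_s
        self.hits = defaultdict(deque)  # (action, ip) -> hit timestamps

    def allow(self, action, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[(action, ip)]
        while q and now - q[0] > self.window_s:  # expire old hits
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

With limit=2, a third edit from a second account on the same IP inside the window is refused, while a different IP is unaffected.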
[18:24:59] $this->textbox1 = $this->toEditText( $content ); [18:25:15] Reedy: siebrand: when siebrand makes a change, and Reedy approves it, has anyone actually run the code in question? [18:26:36] Yeah, hence finding stuff that was removed that shouldn't have been... [18:29:16] Logged that error above as a bug.. It has gone away in the error logs again.. [18:46:32] Reedy: What's the RL-related stuff you were cherry-picking in then reverting? [18:47:08] RoanKattouw: Trying to fix teh array_map warning spam from the php suppression of the exception [18:47:23] I found a couple of bad paths (leading to 19 broken modules) in mobile frontend [18:47:33] Ala https://gerrit.wikimedia.org/r/#/c/28030/ [18:47:48] ha [18:47:52] Which I deployed to 1.21wmf1, not realising they had a bunch of untested features in it, so had to revert it [18:48:02] RoanKattouw: [18:58:20] Can someone please create on flourine: /a/mw-log/udp2log/resourceloader.log chmod 644 and owned by udp2log:udp2log [18:48:11] Yes, already on it :) [18:48:13] ^ so we can find out which the other offenders are [18:48:14] thanks :) [18:48:51] Reedy: /a/mw-log/udp2log doesn't exist [18:49:07] Sounds like you just want /a/mw-log/resourceloader.log ? [18:49:35] hah, yes please [18:49:49] !log Created fluorine:/a/mw-log/udp2log/resourceloader.log chmod 644 and owned by udp2log:udp2log as root per Reedy's request [18:50:01] Logged the message, Mr. Obvious [18:55:53] Now there is just the question is why the log is still empty... [19:01:15] Reedy: sitting with dominic and he's asking about 41028. (this morning's uploads) [19:01:37] Reedy: you say last file won't upload but actually it was in the middle of the list right? [19:03:06] !b 41028 [19:03:06] https://bugzilla.wikimedia.org/show_bug.cgi?id=41028 [19:03:55] anyway, what next? if it's a swift error then it's not a problem with the format or content of the file? [19:12:20] robla: any idea? [19:12:57] i guess we wait for aaron & co. ? 
[19:14:02] jeremyb: yeah, looks like one for AaronSchulz to poke [19:14:17] (who is afk at the moment) [19:16:54] jeremyb: it's the remaining file [19:16:57] so therefore it is the last file [19:17:27] Reedy: ok, thought maybe that was it [22:51:02] gn8 folks [23:13:04] spagewmf: hey, quick question about your keys [23:13:27] notpeter yes? Use the deploy one please [23:13:38] for deploy and access to stat1? [23:13:52] or do you want separate keys for deploy and stat1? [23:14:27] notpeter I can already ssh to stat1.wikimedia.org [23:17:04] yes. we have a key for you that is currently on stat1. do you want me to replace that key with the one you provided today and add it to fenari for deployment as well? or do you want to have separate keys, one for deployment on fenari, and another one (the one you currently use) for access to stat1? [23:19:47] notpeter, I think the latter. mutante in IM said "yes, something different from labs would be good" so I made a new id_rsa key just for deployment. [23:22:07] spagewmf: yea, labs keys should be different from production keys. but at that point we did not think of the one for stat-1. if that is the labs key, then it's better to use the new one for fenari and stat1 and keep the other one just for labs/gerrit [23:22:27] ..also, the new one is longer [23:23:14] mutante, notpeter, fine use the new deploy for stat1. Thanks! [23:25:27] spagewmf: cool. sounds good. sorry for the bother