[04:01:51] any chance of some lovin for bug 64596? not being able to submit patches is getting pretty old
[07:47:59] Hello. I made a whois tool, but I think whois tools may be violating the Tool Labs Terms of use, because APNIC has "Whois data copyright terms". Have WMF (Legal/Staff) or the Labs administrators thought about this? And can we not have a whois tool in WMF Labs?
[09:01:08] Wikimedia Labs / tools: Please remove project local-maps - https://bugzilla.wikimedia.org/65250 (nosy) NEW p:Unprio s:normal a:Marc A. Pelletier I added 2 projects - maps & local-maps because I was not sure if adding the project maps worked. I cannot find how I can delete local-maps. Please...
[09:22:27] hi, is there an overview of scientific publications using / based on wikipedia dumps or stats somewhere?
[09:23:41] like a survey paper "how is wikipedia's data used by science"?
[09:27:59] !log deployment-prep Logstash events stop at 2014-05-11T18:36:35Z; Log file shows many "Failed parsing date from field" errors which probably triggered the known upstream memory leak bug
[09:28:01] Logged the message, Master
[09:28:37] !log deployment-prep Restarted logstash service on deployment-logstash1
[09:28:39] Logged the message, Master
[09:48:15] petan|hack: are you there?
[10:21:01] jorn: something like http://wikilit.referata.com/ ; or http://wikilit.referata.com/wiki/A_Wikipedia_literature_review , but more recent?
[10:24:43] https://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia may also be of use
[10:37:46] jayvdb: thanks
[11:54:37] Wikimedia Labs / tools: Please remove project local-maps - https://bugzilla.wikimedia.org/65250#c1 (Tim Landscheidt) p:Unprio>Normal You're correct, tools can only be deleted by admins (cf. also [[wikitech:Nova Resource:Tools/Help#Can I delete a tool?]]). But [[wikitech:Special:NovaServiceGroup]...
[11:55:36] !add-labs-user
[11:58:56] rxy: You mean a web application that offers a whois service? What do you need that for?
[12:15:06] Wikimedia Labs / tools: Please remove project local-maps - https://bugzilla.wikimedia.org/65250#c2 (nosy) Waiting is perfectly fine. You can see both projects (maps and local-maps) on the tools.wmflabs.org page. The one I want to have removed is the one with only kolossos and nosy as members.
[12:40:54] (PS1) JanZerebecki: Add dhparam file which will be used by at least nginx. [labs/private] - https://gerrit.wikimedia.org/r/133066
[13:07:36] Wikimedia Labs / deployment-prep (beta): false "wiki is read-only mode" message in beta labs - https://bugzilla.wikimedia.org/65228#c5 (Antoine "hashar" Musso) There are a few errors such as: Mon May 12 5:32:25 UTC 2014 deployment-apache01 testwiki Error connecting to 10.68.17.94: :real_connec...
[13:29:33] (PS1) Giuseppe Lavagetto: Change in class names for puppet 3.x compat. [labs/private] - https://gerrit.wikimedia.org/r/133071
[13:31:40] (PS2) Giuseppe Lavagetto: Change in class names for puppet 3.x compat. [labs/private] - https://gerrit.wikimedia.org/r/133071
[13:31:51] (CR) Giuseppe Lavagetto: [C: 2] Change in class names for puppet 3.x compat. [labs/private] - https://gerrit.wikimedia.org/r/133071 (owner: Giuseppe Lavagetto)
[13:32:05] (CR) Giuseppe Lavagetto: [V: 2] Change in class names for puppet 3.x compat. [labs/private] - https://gerrit.wikimedia.org/r/133071 (owner: Giuseppe Lavagetto)
[15:26:51] Wikimedia Labs / deployment-prep (beta): false "wiki is read-only mode" message in beta labs - https://bugzilla.wikimedia.org/65228#c6 (Chris McMahon) Several failed builds with this error over the last few days, this is the most recent from last night 13 May: https://wmf.ci.cloudbees.com/job/Visual...
[18:53:21] How do I use the shared pywikibot?
[18:55:52] a930913: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Pywikibot
[18:59:52] valhallasw: Yeah, I still need help.
[19:00:02] I've done all that.
[19:01:06] :~$ python -c "import pywikibot" ImportError: No module named pywikibot
[19:01:39] a930913: what does echo $PYTHONPATH return?
[19:01:43] Nout.
[19:02:07] ?
[19:02:20] Ok, I ran the export again and now it's changed :/
[19:02:35] Ok, weirdness.
[19:02:51] I ran the export a number of times before that.
[19:03:16] And you're using bash? other shells might need a different syntax
[19:04:14] Whatever I'm logged into when I SSH.
[19:08:14] FFS, grrrit-wm1
[19:09:31] !log Restarted grrrit because it had a stupid nick
[19:09:32] Restarted is not a valid project.
[19:09:37] !log grrrit Restarted grrrit because it had a stupid nick
[19:09:37] grrrit is not a valid project.
[19:09:42] !log tools Restarted grrrit because it had a stupid nick
[19:09:53] Logged the message, Master
[19:11:50] a930913: that should be bash
[19:12:24] Wikimedia Labs / tools: Large files hang during download - https://bugzilla.wikimedia.org/65272 (Merlijn van Deen) NEW p:Unprio s:major a:Marc A. Pelletier For the pywikibot project, at least. Try downloading 'core.tar.gz' or 'core.zip' from http://tools.wmflabs.org/pywikibot/ ; All downlo...
[19:12:33] valhallasw: Mmm.
[19:12:38] As I said, weirdness.
[19:12:45] a930913: yeah - not sure why that would happen :/
[19:20:07] Wikimedia Labs / tools: Large files hang during download - https://bugzilla.wikimedia.org/65272#c1 (Maarten Dammers) I have the same problem. Firefox on Windows.
[19:23:37] Wikimedia Labs / tools: Large files hang during download - https://bugzilla.wikimedia.org/65272 (Merlijn van Deen)
[19:23:37] Wikimedia Labs / tools: Random 503 Service Temporarily Unavailable errors from tools-webproxy - https://bugzilla.wikimedia.org/65179#c5 (Merlijn van Deen) There are also issues with Pywikibot's nightlies stopping transfer after ~50kB. Might be related, but there are no 500's involved. https://bugzilla...
[19:40:22] Wikimedia Labs / tools: Random 503 Service Temporarily Unavailable errors from tools-webproxy - https://bugzilla.wikimedia.org/65179#c6 (Yuvi Panda) Is this still happening? I rolled back the nginx change right after making that comment (and mentioned on IRC, but didn't get time to respond here - sorr...
[20:20:38] 503 in all channels :/
[20:20:58] hedonil?
[20:21:10] oh, wow.
[20:21:10] http://tools.wmflabs.org/catscan2/server-status
[20:21:12] YuviPanda: ^
[20:21:19] valhallasw: labs webservers are dead
[20:21:44] they were working an hour ago :-p
[20:21:52] Wikimedia Labs / tools: Random 503 Service Temporarily Unavailable errors from tools-webproxy - https://bugzilla.wikimedia.org/65179#c7 (Merlijn van Deen) It's definitely happening right now: 503 Service Temporarily Unavailable nginx/1.5.0
[20:22:44] * Damianz wonders at which point his hatred of toolserver will be transferred fully to tools and he will go back to using a separate project.
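A quick way to tell whether a 503 like the ones above comes from tools-webproxy or from a tool's own lighttpd is to compare what the public URL returns with what the tool's own logs record, which is what the responders do next. A minimal sketch, assuming the tool's lighttpd writes access.log and error.log into the tool account's home directory (the paths are an assumption, not something this log confirms):

    # Does the proxy answer at all, and with what status line?
    curl -sI http://tools.wmflabs.org/catscan2/ | head -n 1
    # Did the request ever reach the tool's lighttpd? (log paths assumed)
    tail -n 20 ~/access.log ~/error.log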
[20:23:09] Existing connection to webservice is still working.
[20:23:19] well, the lighty's are running. Seems to be tools-webproxy again...
[20:23:25] Damianz: probably at the point someone from the WMF repeats the 'LABS IS BETTER THAN THE TOOLSERVER' mantra.
[20:23:54] yeah, server's running but nothing in the (access|error).log, proxy problem?
[20:24:33] I think labs in general is better than toolserver (granted toolserver only REALLY sucked in the last few years), but tools seems to be inheriting lots of things... probably because of the 'WE WANT TOOLSERVER IN LABS' mantra... sadly key things like db replication are still tools heavy.
[20:25:19] Like moving crons to grid... for people abusing cron, a good idea; for those using cron to check/trigger jobs, breaking them is the WORST experience ever. The amount of times working systems become broken systems because of SOMEONE ELSE is annoying.
[20:25:37] Wikimedia Labs / tools: Random 503 Service Temporarily Unavailable errors from tools-webproxy - https://bugzilla.wikimedia.org/65179#c8 (metatron) Here's some feedback: After the rollback everything went back to usual (not normal). Right now 503 on all channels.
[20:26:41] Damianz: well, the 'let's waste volunteers time by breaking things' is the most annoying part
[20:27:22] Wikimedia Labs / tools: Random 503 Service Temporarily Unavailable errors from tools-webproxy - https://bugzilla.wikimedia.org/65179#c9 (Morten Wang) Same problem, and I see no entries in the access.log or error.log. lighttpd appears to be running, thus it seems HTTP requests don't make it past the pr...
[20:27:30] first by proposing to kill off the toolserver, asking people to migrate to labs -- to the bots project, at that moment, then to tools, then from pmtpa to eqiad (and with that from apache to lighttpd), then the cron stuff... ugh.
[20:28:14] * Nemo_bis seeing 503 too
[20:28:24] valhallasw: ugh, I don't know what's happening.
[20:28:26] valhallasw: let me look at logs
[20:28:45] we should get a nagios shouting :-p
[20:29:10] Tbh, a staggered forced migration every year wouldn't be too bad in terms of spring cleaning.
[20:29:12] valhallasw: I'm about to take a shotgun to the servers
[20:29:40] a930913: if with 'spring cleaning' you mean 'wasting even more of everyone's time', you're right.
[20:29:42] Betacommand: Think of the server kitties!
[20:30:03] Damianz: probably using a separate service is best, if you can :) tools/labs is good for those who can't afford or don't want or don't already have a vserv or other hosting or whatever, or aren't sure their things are going to last (this is most people after all)
[20:30:16] a930913: those were already sent home
[20:31:04] now it is all dead
[20:31:06] I just restarted it
[20:31:33] ok back to 503
[20:31:48] My stats just flatlined :p
[20:33:42] Nemo_bis: Main problem is certain things via the api are /really/ inefficient vs direct db access - used to have a remote endpoint on ts, but it was soooo unreliable.
[20:33:48] * Betacommand is about to send a nasty email to labs, and wikimedia-l
[20:34:01] ok, should be back now
[20:34:31] YuviPanda: that also magically fixed the pywikibot issue
[20:34:33] YuviPanda: any idea on the cause?
[20:34:40] redis was flailing
[20:34:43] It is, thanks :)
[20:34:49] restarting *that* fixed it
[20:35:28] YuviPanda: yet wikibugs was still functional. Strange.
[20:35:54] valhallasw: this is a different redis. the webproxy has a local redis that stores routing info
[20:35:54] !log redis failed, causing tools-webproxy to throw 503s
[20:35:54] redis is not a valid project.
[20:36:01] YuviPanda: ahh.
[20:36:03] !log labs redis failed, causing tools-webproxy to throw 503s
[20:36:03] labs is not a valid project.
[20:36:14] !log tools-labs redis failed, causing tools-webproxy to throw 503s
[20:36:14] !log tools redis failed, causing tools-webproxy to throw 503s
[20:36:14] tools-labs is not a valid project.
[20:36:17] Logged the message, Master
[20:36:18] :-p
[20:36:37] !log tools restarting redis on tools-webproxy fixed 503s
[20:36:38] Logged the message, Master
[20:36:40] valhallasw: thanks, knew it was something like that
[20:36:54] BTW, does anyone about the connectivity tool, or Lvova?
[20:36:55] needs root cause analysis, to see why that was happening.
[20:37:02] know about*
[20:37:10] * Betacommand makes a point of starting to log issues
[20:37:32] ok, it looks like it is a scaling issue.
[20:37:53] Betacommand: A tool that says "It's been x days since the last incident"? :D
[20:37:56] but redis is web scale!
[20:37:59] * valhallasw will shut up now
[20:38:12] it has a default 511 connection backlog limit
[20:38:23] and looks like the nginx lua connection pooling isn't working properly
[20:38:34] a930913: no, just better documentation so that I can show that labs sucks horribly
[20:39:24] Betacommand: Make a tool that just has a single web page that says that in large. :)
[20:39:56] but then, what to do when instead of that page labs presents an error page?
[20:40:35] 'that page'?
[20:40:43] oh
[20:40:44] nevermind
[20:41:15] a930913: Oh trust me, I'm going to get picky over this. I've been looking for Coren, the person who is paid to support labs, for over a week with no luck. I had to hack the webservice script to get my tools functional
[20:41:43] Betacommand: have you missed the announcement?
[20:41:56] there was this thing called a 'hackathon' in Zürich
[20:42:38] valhallasw: he stated he would have wider availability
[20:43:08] valhallasw: going MIA and not leaving others with root access is really annoying
[20:43:31] Betacommand: err, no, "but vastly increased physical availability in Zürich
[20:43:34] itself."
[20:43:39] does anyone know how long it was out?
[20:43:43] also, "After the Hackaton, I'm taking a few day's worth of vacation, and will return to full availability starting May 16th."
[20:43:48] YuviPanda: maybe an hour or so
[20:43:51] oh wow
[20:43:59] yeah, we definitely need some monitoring
[20:44:03] YuviPanda: it was still functioning when I posted the pywikibot bug
[20:44:04] "Oh I can't do" or "only coren can do that"
[20:44:13] I need to make something ping my phone or something when the proxy is down
[20:44:47] YuviPanda: Do you know if we could run into a kernel connection limit or would it be safe to increase the 511 connections limit of Redis to, say, 1024?
[20:44:53] YuviPanda: thanks, this time you're not the one to blame :P
[20:44:56] YuviPanda: it's called nagios :-p
[20:45:06] (If that's possible.)
[20:45:14] scfc_de: our version doesn't actually have 511, I think. we are running 2.6
[20:45:19] scfc_de: https://github.com/openresty/lua-resty-redis/issues/27 is the upstream discussion
[20:45:21] that we are hitting
[20:45:23] YuviPanda: how about redundancy?
[20:45:41] two proxies that are DNS managed sounds like a good idea
[20:45:44] and is fairly easy to do
[20:46:17] But if Redis is the bottleneck, that would mean the limit would decrease to 255 :-).
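The working hypothesis above is that the proxy's local redis is hitting its connection backlog, and the workaround logged above was simply to restart redis on tools-webproxy. A minimal sketch of that check and workaround follows; the redis-cli command is standard, but the service name and the need for sudo are assumptions not confirmed by this log:

    # How many clients are currently connected to the local redis?
    redis-cli info clients | grep connected_clients
    # Workaround used above: restart redis on tools-webproxy (service name assumed)
    sudo service redis-server restart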
[20:46:37] Or Redis master -> slave. Hmmm.
[20:46:40] scfc_de: no, it's a local redis instance. so that won't be a problem, since each machine has its own redis instance :)
[20:46:55] the proxy is mission-critical infrastructure, so at least 2 would be a really good idea ;)
[20:47:00] I agree.
[20:47:52] I don't :-). Complexity has always bitten us in the ass. On Toolserver, the HA stuff caused more outages than anything else.
[20:48:39] I am first looking into fixing it on the redis side. connection pooling + more aggressive in-memory caching should help
[20:49:11] scfc_de: we would 1. modify 'webservice' to make two calls to the two proxy machines rather than one, 2. set dns to have two entries, 3. done
[20:49:41] scfc_de: alternative to 1 is to have (2) be a redis slave
[20:49:55] YuviPanda: how would DNS fix it when one of them throws 503s?
[20:50:14] YuviPanda: why not keep them totally separate, just replicate redis
[20:50:16] how about a plain text file to store the configuration? :-p
[20:50:49] valhallasw: they were throwing 503s because they were hitting connection limits on redis (current hypothesis). So if there are two, then obviously less load, etc.
[20:51:22] hedonil: yeah, that's the second option.
[20:51:35] hedonil: but then if one machine goes down then both are down, and that is useless
[20:51:47] YuviPanda: if the connection limits were due to high load on the http end, and not just because they hit because of something on a longer timescale
[20:52:14] e.g. connections that stay open indefinitely for some reason
[20:52:14] valhallasw: I agree. First order of work is to figure out if the lua code can be optimized, not to set up a new machine :)
[20:52:18] YuviPanda: hmm, one is hot standy, switching to its local redis clone if the main proxy goes down
[20:52:42] hedonil: yeah, that kinda stuff adds complexity that might cause downtime by itself :D Let's see if we can figure out another solution.
[20:52:43] *standby
[20:52:48] I'm reading through the docs again now
[20:53:29] I might lose power in a while. solution to 503s is to restart redis
[20:53:35] no data will be lost if you restart
[20:53:39] scfc_de: ^
[20:53:44] petan: ^
[20:53:45] YuviPanda: k
[20:53:53] scfc_de: petan: redis in tools-webproxy, not tools-redis :)
[20:54:09] :-)
[20:55:10] YuviPanda: another suggestion, 2x nginx, 2x redis (replication), each service on its own box (4 in total)
[20:55:38] hedonil: no. nginx and redis should be on the same box. shouldn't require a network request to fetch routing info
[20:57:26] valhallasw: Does https://bugzilla.wikimedia.org/65272 work for you now (again)?
[20:57:56] scfc_de: yep.
[20:59:52] Wikimedia Labs / tools: Large files hang during download - https://bugzilla.wikimedia.org/65272#c2 (Merlijn van Deen) NEW>RES/DUP Same core cause as #65179: nginx failing due to an overloaded redis server *** This bug has been marked as a duplicate of bug 65179 ***
[20:59:52] Wikimedia Labs / tools: Random 503 Service Temporarily Unavailable errors from tools-webproxy - https://bugzilla.wikimedia.org/65179#c10 (Merlijn van Deen) *** Bug 65272 has been marked as a duplicate of this bug. ***
[21:06:04] errors again. I am dealing with it
[21:06:45] should be back now
[21:08:36] scfc_de: valhallasw: my fault. I wasn't setting the connection pool properly. let me make a patch
[21:28:46] Python isn't finding pywikibot from jsub.
[21:33:50] (Where) do you set PYTHONPATH (or whatever it is called)?
[21:34:33] scfc_de: ~/.bash_profile, right?
[21:35:47] a930913: I don't know if that is sourced by the grid; try "jsub env" to see if it is set.
[21:36:33] scfc_de: Aye, not there.
[21:51:23] Wikimedia Labs / tools: Random 503 Service Temporarily Unavailable errors from tools-webproxy - https://bugzilla.wikimedia.org/65179 (Tim Landscheidt) a:Marc A. Pelletier>Yuvi Panda
[22:30:00] hey scfc_de / valhallasw`cloud
[22:30:15] my laptop is dead, and I don't have an adapter
[22:30:21] can you tell me if I missed something?
[22:33:56] YuviPanda: Nothing happened as far as I know. I assume you tested the changes submitted to Gerrit on the live system?
[22:41:36] scfc_de: I did, yeah. I checked the connection re-use with it as well and it was being reused after the patch and not before
[22:41:38] let me find someone to merge
[22:41:46] mutante: around? care to merge a patch for toollabs?
[22:41:56] yuvipanda_: depends what it does
[22:42:01] hmmm, actually, no, not now
[22:42:08] not the one that "introduces new service model" eh
[22:42:13] heh, ok
[22:42:14] I don't have access to my primary machine, and shouldn't do that
[22:42:31] mutante: no, it introduces connection pooling for the proxy so it doesn't kill the proxy after a long time of it being in use
[22:42:47] ah
[22:42:56] well..
[22:43:04] there are some current problems
[22:43:13] but just intermittent
[22:44:58] mutante: no, it fully went down for a while
[22:45:03] mutante: restarting redis fixed it
[22:45:21] yuvipanda_: ah
[22:45:36] mutante: and then I looked at the logs and the problem was that there were just too many connections hanging around, since redis is single-threaded but nginx has multiple workers and I had set a 1s connection timeout but not set a connection pool
[22:46:00] mutante: so now I've a connection pool with 32s timeouts for purging from the pool plus a 128 max connections limit, which should work
[22:47:15] scfc_de: can you comment on the bug saying this was the problem and the solution is to restart redis on tools-webproxy, for now at least? I don't have my primary machine with me now
[22:48:35] yuvipanda_: that sounds reasonable
[22:51:56] mutante: yeah. only, if shit goes down I don't even have my key on me. I am in the UK and have no compatible plug, despite having a US plug, an Indian plug *and* a European plug I could use. oh well
[22:52:42] yuvipanda_: hah, I see, problems of a world traveller
[22:54:06] YuviPanda: Will do. I'm sure there are European -> UK adapters, but probably not at this hour :-).
[22:55:10] scfc_de: yeah :)
[22:55:13] I am off now :)
[22:55:15] cya guys!
[22:56:16] cya yuvi
[22:59:01] Bye!
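A minimal sketch of the workaround implied by the jsub/PYTHONPATH exchange above: since ~/.bash_profile may not be sourced by grid jobs and "jsub env" showed PYTHONPATH unset, the export can go into the submitted script itself. The script name and the pywikibot checkout path below are placeholders; the real path is whatever the Tools/Help#Pywikibot page gives.

    #!/bin/bash
    # pwb-job.sh - submit with: jsub pwb-job.sh
    # Placeholder path to the shared pywikibot core checkout (see the Help page)
    export PYTHONPATH="/path/to/shared/pywikibot/core"
    python -c "import pywikibot"   # sanity check that the import now works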