[01:00:00] MediaWiki / File management: Find a better solution for a "clickable image" on file pages - https://bugzilla.wikimedia.org/71485#c2 (Bawolff (Brian Wolff)) > > Some of our users have limited data plans and this can be quite costly for > them. Given we have an `original file` link it seems surplus. I... [08:17:45] MediaWiki extensions / UploadWizard: asterisk * to denote 'required' is missing from description field on Special:UploadWizard - https://bugzilla.wikimedia.org/71475#c1 (Andre Klapper) UploadWizard => UploadWizard component [08:58:33] (CR) Gergő Tisza: [C: 2] Fix repo/details check of E2E test [extensions/MultimediaViewer] - https://gerrit.wikimedia.org/r/163828 (owner: Gilles) [08:59:20] (Merged) jenkins-bot: Fix repo/details check of E2E test [extensions/MultimediaViewer] - https://gerrit.wikimedia.org/r/163828 (owner: Gilles) [13:27:14] (PS1) Gilles: Generate TSV for versus test running on new labs machine [analytics/multimedia] - https://gerrit.wikimedia.org/r/164062 [13:27:51] (CR) Gilles: [C: 2 V: 2] "Query tested by SSHing the SQL server" [analytics/multimedia] - https://gerrit.wikimedia.org/r/164062 (owner: Gilles) [14:05:07] (PS1) Gilles: Add graph for new versus test machine [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/164068 [14:05:25] (CR) Gilles: [C: 2 V: 2] "Tested on limn locally" [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/164068 (owner: Gilles) [14:20:14] gi11es: https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/897 says size dropdown text in the reuse panels should be covered by an e2e test, but we have no e2e tests whatsoever for reuse panels [14:20:34] and https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/445 is set to priority:never [14:21:05] maybe we should merge the tickets and reprioritize? [14:21:33] or cover this with a qunit test?
[14:26:44] tgr: I was just about to work on that [14:27:08] I'll start working on 445 and cover 897 [14:27:37] we've had so much OOUI-induced breakage, we can't afford for the E2E tests not to cover those menus [14:28:00] most of the time we didn't discover the breakage through qunit [14:29:12] https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/445 updated and brought to current sprint, I've moved the other one back to the accepted column [14:48:33] > Firefox can't find the server at multimedia-metrics.wmflabs.org [14:48:35] Fantastic [14:51:08] Fabrice should be able to test 908... [14:52:09] I'm tired of 822 sitting in ACR when it's not...I might move it. [14:52:24] It's in RFD now. [15:08:58] marktraceur: are you by any chance an admin or something on beta enwiki? [15:09:37] Special:Upload is telling me I need to be in one of those groups: Autoconfirmed users, Administrators, Confirmed users. [15:09:40] Nope, but let me look [15:09:42] http://en.wikipedia.beta.wmflabs.org/wiki/Special:Upload [15:10:24] Ugh slow [15:23:28] (PS1) Gergő Tisza: Update schema revision number for NavigationTiming [analytics/multimedia] - https://gerrit.wikimedia.org/r/164084 [15:25:54] what's the process to access beta servers I currently don't have access to? ping an admin on that list? https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep [15:26:22] someone like... marktraceur [15:26:40] Yeah [15:26:42] I can add ye [15:26:51] I need to ssh into deployment-cache-upload02.eqiad.wmflabs to check that the thumbnails are getting prerendered properly [15:27:22] gi11es: are you uploading via the API? [15:27:35] gi11es: Don't break stuff [15:27:38] no, UW on beta commons [15:28:05] be careful not to trigger thumbnail chaining, that will also prerender stuff [15:28:05] it's hard to break something with "ls" but I'll try [15:28:07] gi11es: What's your wikitech username?
[15:28:13] I think [15:28:34] the chaining isn't deployed anywhere yet, afaik [15:28:43] not even on beta? [15:28:53] nope, I was going to do that after this [15:28:54] Oh, I typed "rm -rf /" instead of ls, damn it [15:28:54] I do remember testing it there [15:28:56] Common typo [15:29:04] hmmm [15:29:12] marktraceur: gilles [15:29:27] well even if there's chaining, it's not the same set of sizes [15:29:29] OK, you're added [15:29:46] oh, right, I forgot chaining uses powers of 2 [15:29:52] gi11es: Do you need projectadmin? [15:30:02] marktraceur: can you check if you can ssh into "ssh deployment-cache-upload02.eqiad.wmflabs" yourself? [15:30:12] I'm going to guess not, but I'll try [15:30:16] I still can't, but maybe you giving me powers needs to propagate or something [15:30:32] Might could be. [15:30:33] gi11es: are you using the labs bastion? [15:30:50] I think my issues are related to DNS stupidity [15:30:56] yep, my ssh config should do that for .wmflabs addresses [15:31:34] Host *.eqiad.wmflabs [15:31:35] ProxyCommand ssh -a -W %h:%p bastion1.eqiad.wmflabs [15:32:11] ah, I can ssh now [15:32:19] there is a propagation time, then [15:33:54] ah, but it seems like the machine is only a varnish layer. gotta find where it's pulling the thumbnails from [15:34:26] Maybe there are swift machines in labs [15:34:56] I don't think it's swift, godog said labs probably stored files the classic way [15:35:20] the file setup on beta cluster does not resemble production [15:35:26] remotely [15:35:37] that's not what I'm after [15:35:44] I'd just like to know where on labs the files go [15:35:47] to check their existence [15:36:06] Probably the apaches, right? [15:36:35] gi11es: deployment-cache-text02 IIRC [15:36:50] TIL there is an upload-wizard project and labs machine [15:36:59] /data/project/upload7/wikipedia/commons ? [15:37:24] bawolff: on what machine is that?
[15:37:27] gi11es: Used for Yuvi's experiments with campaigns, IIRC [15:37:31] but the directory is an NFS share so shouldn't matter much [15:37:41] ah, cool [15:37:55] yes, I believe it's an nfs share on the apache [15:38:33] I found that path in filebackend-labs.php file [15:38:50] thanks, that's exactly what I needed [15:39:15] see https://bugzilla.wikimedia.org/show_bug.cgi?id=67525#c5 and the followup for details, I had the same trouble when verifying the chaining patch [15:39:50] and the prerendering didn't work, at least not for all sizes, yay [15:41:39] definitely not working [15:41:55] Aw. [15:42:32] now, what server does beta commons run on? [15:42:58] first thing I'd like to check is whether it can hit the varnish node like the code is supposed to [15:43:20] Hi, I can't view original files [15:44:38] (PS1) Cmcmahon: QA: update Ruby gems [extensions/MultimediaViewer] - https://gerrit.wikimedia.org/r/164090 [15:45:01] Pyb: can you provide more detail? [15:45:54] deployment-mediawiki01 2, 3... one of those? [15:47:10] tgr: on Commons, I can see a file in different resolutions but the link "original file" doesn't work anymore. [15:47:29] Pyb: do you have a link? [15:47:39] gi11es: Keep in mind also, that there used to be a bug where purges were not making it to upload varnish on labs, and I don't think that bug is fixed, so ?action=purge on the thumbnail might not be deleting the old thumbnails, so depending on how you test, you might just be seeing old cached results [15:47:45] tgr: for example https://upload.wikimedia.org/wikipedia/commons/f/fe/Claude_de_Besan%C3%A7on_XVIe_07997.jpg [15:48:16] Pyb: works fine for me [15:48:21] me too [15:48:23] what result do you get? [15:50:11] Pyb: which country are you accessing from? [15:50:58] tgr: http://imgur.com/XCzZzBo [15:51:02] bawolff: France [15:51:21] Pyb: which browser are you using?
[15:51:25] log or not, the result is the same [15:51:40] gi11es: FF 32 [15:51:59] Pyb: same for images which do not have % in their URL? [15:52:08] Pyb: works for me and I'm connecting from france. what's your ISP? [15:52:59] esams gave me same result as eqiad [15:53:00] gi11es: hmm it works on Chrome [15:53:26] gi11es: I'm not that familiar with prod setup, and much less with beta, but isn't it basically a pool of appservers with a bunch of virtual host configs each? [15:53:37] tgr: yes [15:53:40] Pyb: have you tried shift-refreshing in firefox? [15:54:07] tgr: I'd assume so, but I just need to know what one of those boxen is, to run telnet from there towards the varnish machine [15:54:12] to check if that part works [15:54:13] gi11es: yes [15:54:21] ie you connect to commons beta, the load balancer sends you to deployment-mediawikiXX, it looks up the settings in InitalizeConfig-labs.php based on the host header [15:54:40] and can act as commons or enwiki or whatever based on that [15:55:29] gi11es: it has to be one of the boxes with role::beta::appserver [15:55:51] tgr: yep, https://wikitech.wikimedia.org/wiki/Nova_Resource:I-0000044e.eqiad.wmflabs looks like one [15:56:02] commons is definitely in the apache sites-enabled configs [15:56:04] hence, deployment-mediawiki01 or 03 [15:56:06] beta commons, that is [15:56:10] right [15:59:21] telnet with the expected headers works and generates the thumb... [16:01:50] I guess I need to check if the global variables are set correctly [16:02:04] if eval.php or something like that is available on those machines [16:02:17] gi11es: you can check if the job is stuck [16:02:24] how? [16:02:49] is there a special page for that? 
[16:02:56] showJobs.php [16:03:11] I mean [16:03:19] showJobs.php --group [16:03:24] mwscript maintenance/showJobs.php [16:03:39] I can't find the damn maintenance folder [16:04:01] (I'm poking on deployment-mediawiki02) [16:04:15] would be nice to see what jobs have been run in the past, not sure if there is a way to do that [16:04:32] /srv/common I think? [16:04:50] There should be a log file [16:04:57] At least on production there is, no idea about beta [16:05:35] tgr: no such folder [16:05:37] The log file would also include what error the job had if it couldn't be finished due to error [16:07:40] /srv/mediawiki/php-master/maintenance [16:09:04] can't find mwscript, though [16:11:45] a-ha /srv/mediawiki/multiversion/MWScript.php [16:20:27] I've finally managed to run mwscript, the globals are definitely set correctly for beta commons [16:20:39] showJobs echoes "0" [16:20:53] bawolff: do you know what that log file is usually called? [16:21:36] gi11es: actually, aren't jobs running from a separate box? [16:22:38] well, I guess that's where this investigation has to go now... [16:22:43] no, all i know is the config settings on production has lines like udp://$wmfUdp2logDest/jobqueue/web [16:22:57] tgr: yes [16:23:23] deployment-jobrunner01 [16:26:17] the mwscript wrapper is on deployment-bastion. [16:26:30] We don't provision it on all hosts [16:26:33] in prod or beta [16:26:40] I just put it in my home folder [16:27:20] same result for showJobs (0) on deployment-jobrunner01 [16:28:06] $ mwscript showJobs.php --wiki=commonswiki --group [16:28:07] gwtoolsetUploadMediafileJob: 0 queued; 9 claimed (0 active, 9 abandoned); 0 delayed [16:28:21] what does --group do? [16:28:38] GWT is definitely unrelated [16:28:41] --group: Show number of jobs per job type [16:28:48] mwscript showJobs.php --wiki=commonswiki --help [16:29:20] bd808: any idea where I should look for info about jobs that have run already? 
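The per-type line that `showJobs.php --group` printed above (`gwtoolsetUploadMediafileJob: 0 queued; 9 claimed (0 active, 9 abandoned); 0 delayed`) can be split into counts when checking a queue from a script. The following is only a rough Python sketch: it assumes every line of the `--group` output has exactly the shape of that one sample, and `parse_group_line` is a hypothetical helper name, not something MediaWiki provides.

```python
import re

# Pattern modelled on the single sample line quoted in the log; the real
# output format may differ between MediaWiki versions (assumption).
GROUP_LINE = re.compile(
    r"^(?P<type>\S+): (?P<queued>\d+) queued; (?P<claimed>\d+) claimed "
    r"\((?P<active>\d+) active, (?P<abandoned>\d+) abandoned\); "
    r"(?P<delayed>\d+) delayed$"
)


def parse_group_line(line):
    """Return (job_type, counts) for one --group line, or None if it does not match."""
    match = GROUP_LINE.match(line.strip())
    if match is None:
        return None
    counts = {
        name: int(value)
        for name, value in match.groupdict().items()
        if name != "type"
    }
    return match.group("type"), counts
```

A line with a non-zero `abandoned` count, as in the sample, would point at jobs that were claimed but never finished; a bare total such as `0` (the output without `--group`) does not match and yields `None`.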
[16:30:03] I was trying to figure out where jobs logs go in beta... I think they are in logstash. Let me look [16:30:23] facking labs dns [16:30:27] *fracking [16:30:36] "The server at logstash-beta.wmflabs.org can't be found, because the DNS lookup failed." [16:30:42] ugh, beta commons looks pretty broken right now http://commons.wikimedia.beta.wmflabs.org/wiki/Special:UploadWizard [16:31:08] ah, shift-refresh worked [16:32:28] I've just uploaded a new file, I'm not seeing the job appear [16:35:55] but maybe I wasn't fast enough... [16:36:16] gi11es: There are some logs in logstash -- https://logstash-beta.wmflabs.org/#dashboard/temp/zSBntmqHQ3ei5Fg55GfnWQ [16:36:31] bd808, gi11es: not sure if this helps, but this was the previous server i used to go to watch the runJobs.log ssh deployment-jobrunner08.pmtpa.wmflabs [16:37:05] bd808: which credentials does that page use? [16:37:11] Strangely I don't see a runJobs.log in deployment-bastion:/data/project/logs [16:37:23] the weird sudo command has the password? [16:37:26] gi11es: It tells you how to access in the auth prompt message [16:37:30] yeah [16:37:40] it looked too dodgy ;) I'll do that [16:37:46] We can't do proper auth inside labs [16:38:09] because I would be able to sniff everyone's gerrit credentials :) [16:38:26] Sorry, user gilles is not allowed to execute '/bin/cat /root/secrets.txt' as root on deployment-bastion.eqiad.wmflabs. [16:39:00] I should fix that then. :) [16:39:17] not sure my sudo password was correct... I assumed wikitech credentials [16:39:47] It won't prompt you when the rights are granted... and they are now. [16:40:06] excellent, thank you [16:42:00] the job is listed there, perfect [16:42:42] I'm a bit confused that log events are making it into logstash but not also ending up on disk on /data/project/logs [16:43:00] something not quite right about the udp2log service I guess [16:45:01] Hmm... 
log seems to be at deployment-bastion:/data/project/logs/archive/runJobs.log-20141001 [16:45:11] Maybe is just got rotated before I looked [16:45:20] s/is/it/ [16:45:29] apparently the job should fail [16:45:41] but didn't... gotta figure out why [16:46:15] * bd808 wishes gi11es good luck debugging upload jobs [16:46:21] the issue seems to be that the titles include File: on beta when passed to the job [16:46:27] not locally [16:47:31] or maybe the job did fail, I'm not sure where the status would appear [16:47:49] the logstash entry only shows the parameters passed to the job, it seems [16:48:13] "good" at the end? does that mean that the job was pushed correctly or that the job was successful? [16:49:03] * bd808 growls at "2014-10-01 16:02:17 deployment-jobrunner01 wikidatawiki: [98a3681a] [no req] Exception from line 827 of /srv/mediawiki/php-master/includes/jobqueue/JobQueueRedis.php: Unable to connect to redis server." [16:49:10] I thought I had fixed that yesterday [17:07:42] marktraceur: aaron wrote a script to identify missing files, we need to ask him and run it if he didn't [17:08:06] (my hangouts plugin crashed so I can hear you but can't unmute myself :/) [17:09:03] I think the OOM killer randomly kills Chrome plugins for some reason [17:09:47] anyway, we have two different cards about missing files [17:10:36] #827 is swift outage related, bug has been fixed, Aaron wrote a script to identify all missing files, once they are found they can be fixed by commons admins by moving back and forth [17:10:49] bd808: don't know if this helps or not: 2014-10-01 16:02:17 deployment-jobrunner01 wikidatawiki: [98a3681a] [no req] Exception from line 827 of /srv/mediawiki/php-master/includes/jobqueue/JobQueueRedis.php: Unable to connect to redis server.
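On the "good" suffix asked about at 16:48:13: the runner code quoted later in this log (17:28) writes the job's string form followed by ` t=<ms> good` on success, or ` t=<ms> error=<message>` when the job's status is false. A small Python sketch for pulling that apart when grepping runJobs.log is shown here; `parse_result` and the sample job strings in the usage note are made up for illustration, and any prefix the log adds before the job string (timestamp, host, wiki) would simply end up in the `job` field.

```python
import re

# Matches "<job string> t=<ms> good" or "<job string> t=<ms> error=<message>",
# following the two runJobsLog() calls quoted in the log.
RESULT_LINE = re.compile(
    r"^(?P<job>.*) t=(?P<ms>\d+) (?:good|error=(?P<error>.*))$"
)


def parse_result(line):
    """Return a dict describing one job result line, or None for other lines."""
    match = RESULT_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    error = match.group("error")
    return {
        "job": match.group("job"),
        "ms": int(match.group("ms")),
        "ok": error is None,  # "good" only means the job did not report failure
        "error": error,
    }
```

For example, `parse_result("someJob File:Example.jpg t=2792 good")` reports `ms` as 2792 and `ok` as true. As the rest of the discussion shows, `ok` only says the job did not report failure, not that it did what it was meant to do.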
[17:12:03] #877 is not swift outage related, bug is not fixed, as I understand there is no way to tell what happened since swift logs are not kept for long [17:13:02] so either ops have some idea or we might want to just give up on that (and make sure swift logging times are increased) [17:13:21] pginer: Hate to do this to you, but the post on multimedia-l from gnangarra was actually about the layout of the viewer, not a technical question - would you mind responding? [17:14:45] ok [17:17:23] bd808: sorry, irc lag, see that you already posted the exception [17:17:57] I’m not sure which post it is, I searched the list and I only see posts from May by gnangarra [17:19:33] bd808: I've been poking further and I don't see the difference between locally and on beta, in terms of parameters passed to the job. the only piece of the puzzle I'm missing right now is how to check whether the job run on beta was successful or not [17:22:53] gi11es: for what it's worth, i used to ssh into the job runner and tail -f the runJobs.log | grep GWToolset. kick off a batch and just wait until something showed up [17:23:50] gi11es: The jobs are actually run on deployment-jobrunner01 via runJobs.php. The logs should end up in /data/project/logs on all hosts (NFS mount). [17:24:36] But things look a little strange to me there at the moment. There should be more log files than I'm seeing in the directory. [17:24:55] nyep [17:25:02] a lot more are showing up in the archives [17:25:13] including runJobs, which is missing [17:26:07] Yeah. Which makes me think something is wrong with the udp2log service on deployment-bastion. I can try restarting it I guess. [17:26:55] STARTING +. job starting, I presume [17:27:01] t=2792 good [17:27:13] does that mean it took 2792ms and was successful? [17:27:23] our logging is horrible. :( [17:28:09] yep [17:28:14] if ( $status === false ) { [17:28:14] $this->runJobsLog( $job->toString() . 
" t=$timeMs error={$error}" ); [17:28:14] } else { [17:28:15] $this->runJobsLog( $job->toString() . " t=$timeMs good" ); [17:28:17] } [17:29:34] seems like the problem is in the realm of my code, then [17:29:44] job returns as successful but didn't do what it was supposed to [17:30:48] I restarted the udp2log service on deployment-bastion (service udp2log stop; service udp2log-mw stop; service udp2log-mw start) so hopefully the log files will start to look right. [17:31:07] Don't ask me why we have 2 services one of which should not be running :/ [17:31:34] Someday my monolog patches will land and I will understand the logging layer [17:31:50] Keegan|Away pginer: Hi Keegan and Pau, I just filed this ticket #933 to ‘Warn users if they click to enlarge huge images’: https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/933 [17:32:56] Keegan: So you can mention that in your response to Geni on our talk page — and ask him what he thinks might be a reasonable threshold: https://www.mediawiki.org/wiki/Talk:Multimedia/About_Media_Viewer#Media_Viewer_Update:_First_Improvements [17:33:59] gi11es or tgr : Do any of you have data on what the threshold might be for identifying file sizes that might crash your browser? [18:45:19] MediaWiki extensions / GWToolset: GWToolset uploads files without file description pages - https://bugzilla.wikimedia.org/71527 (Jean-Fred) NEW p:Unprio s:critic a:None Created attachment 16641 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16641&action=edit XML file containing the... [18:46:31] MediaWiki extensions / GWToolset: GWToolset uploads files without file description pages - https://bugzilla.wikimedia.org/71527 (Jean-Fred) s:critic>major [18:49:31] fabriceflorin: it completely depends on the computer opening it... very hard to make a good guess about a limit [18:50:08] it's probably going to have to be arbitrary [18:51:51] gi11es: Thanks! 
Would you mind if I asked our public multimedia list if they have any data of the type that Geni is asking for? or if they know of best practices on that point? It would be good if we could agree on a limit that is at least partly informed by data. [18:52:04] sure, go ahead [19:09:44] it would be nice to measure in what fraction of uses showing-original-on-click actually fulfills user expectations, although I have no idea how that could be done [19:10:44] from a UX point of view it is really confusing: you click on the article thumbnail, you get a screen-sized thumbnail; you click on that to get another screen-sized thumbnail; you click on that to get a huge image [20:24:00] MediaWiki extensions / GWToolset: GWToolset uploads files without file description pages - https://bugzilla.wikimedia.org/71527#c1 (dan) was it uploaded recently? after the new Special:Log for GWToolset was added? i didn’t see any of the titles above listed in the current Special:Log, this month or las... [20:37:42] figured out what was wrong with the prerender job... due to beta's config the url rewriting doesn't work properly, and as a result it hits the right server but on a wrong url [20:38:07] and that wrong url serves the body of a 404 page, but the MWHttpRequest sees a 200 http status code [20:39:45] and I've checked with telnet, varnish does indeed serve a 200 status code on beta for a 404 page [20:39:59] at least for that particular 404 [20:42:43] ah, and it's because of < 5.4.7 parse_url behaviour... [20:43:42] the vagrant vm runs 5.5.9-1ubuntu4.3 on the command line [20:44:01] deployment-jobrunner01 runs 5.3.10-1ubuntu3.14 [20:44:54] bd808: what's up with that version discrepancy in CLI? is the beta version the one we run in production?
[20:45:28] gi11es: Ubuntu 14.04 on vagrant, 12.04 on beta right now [20:45:56] I see that in production it's the old stuff as well [20:46:20] alright, I guess I'll have to find a polyfill of some kind for the old parse_url way of handling schema-less urls [20:46:41] scheme-less [20:47:30] (PS1) Cmcmahon: QA: update Ruby gems [extensions/UploadWizard] - https://gerrit.wikimedia.org/r/164217 [20:48:20] (CR) Cmcmahon: [C: 2] "maintenance" [extensions/UploadWizard] - https://gerrit.wikimedia.org/r/164217 (owner: Cmcmahon) [20:48:55] (Merged) jenkins-bot: QA: update Ruby gems [extensions/UploadWizard] - https://gerrit.wikimedia.org/r/164217 (owner: Cmcmahon) [20:50:40] (CR) Cmcmahon: [C: 2] "maintenance" [extensions/MultimediaViewer] - https://gerrit.wikimedia.org/r/164090 (owner: Cmcmahon) [20:51:24] (Merged) jenkins-bot: QA: update Ruby gems [extensions/MultimediaViewer] - https://gerrit.wikimedia.org/r/164090 (owner: Cmcmahon) [21:12:31] MediaWiki extensions / GWToolset: GWToolset uploads files without file description pages - https://bugzilla.wikimedia.org/71527#c2 (Jean-Fred) (In reply to dan from comment #1) > was it uploaded recently? after the new Special:Log for GWToolset was added? These files were uploaded in July, hence befor... [22:07:15] MediaWiki extensions / GWToolset: GWToolset uploads files without file description pages - https://bugzilla.wikimedia.org/71527#c3 (dan) whoops, should have noticed the date/time stamp. because someone else edited the pages, GWToolset won’t upload to the page again. is it possible to delete the pages...
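The scheme-less URL workaround mentioned at 20:46 is not shown in the log. One common shape for such a polyfill is to give a protocol-relative URL (`//host/path`) a placeholder scheme before parsing and to drop it again afterwards. The sketch below only illustrates that idea in Python, whose `urlsplit` already copes with `//host/path` on its own; it is not the PHP code that was eventually written. `parse_schemeless`, `PLACEHOLDER_SCHEME` and the example host in the usage note are hypothetical.

```python
from urllib.parse import urlsplit

# Placeholder used only while parsing; it never appears in the result (assumption).
PLACEHOLDER_SCHEME = "http"


def parse_schemeless(url):
    """Parse a URL that may be protocol-relative ("//host/path").

    A placeholder scheme is prepended before parsing so that the host is
    recognised, then reported as absent in the returned parts.
    """
    is_protocol_relative = url.startswith("//")
    candidate = PLACEHOLDER_SCHEME + ":" + url if is_protocol_relative else url
    parts = urlsplit(candidate)
    return {
        "scheme": None if is_protocol_relative else (parts.scheme or None),
        "host": parts.hostname,
        "path": parts.path,
        "query": parts.query,
    }
```

Called on something like `//upload.example.org/thumb/Foo.jpg/640px-Foo.jpg`, this reports the host and path separately instead of folding the host into the path, which is the kind of misparse that would send a request to the right server with a wrong URL.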