[00:02:55] Negative24: what setup issues? [00:03:34] twentyafterfour: phab has new "setup issues" in its interface that it now is enforcing [00:04:01] I didn't run into any errors on phab-01 [00:04:30] Negative24: new as of today? I updated phab-01 to HEAD a few hours ago [00:04:46] hrm [00:05:56] just got it when I pulled phab-02 to release/2015-05-06/1 at about the same time [00:06:54] I'll be back a bit later [00:21:59] Negative24: did you run phabricator/bin/storage upgrade? [00:39:21] twentyafterfour: I did [00:39:31] phab wouldn't even load without it [00:39:54] twentyafterfour: its a php post_max_size config [00:41:37] PROBLEM - Puppet failure on tools-checker-02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [00:41:49] PROBLEM - Puppet failure on tools-checker-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [00:42:08] I’m looking [00:42:11] Negative24: interesting [00:42:33] twentyafterfour: can you login to phab-02? [00:43:38] Negative24: http isn't loading on phab02.wmflabs or phab-02.wmflabs [00:44:05] ah yes you didn't see that whole ordeal me and yuvi went through :P [00:44:14] do you use chrome? 
[00:44:28] chrome and firefox [00:44:33] tried both [00:45:10] short story: phab-02 now goes through http but phab served http://phab-02.wmflabs.org as a permanent redirect so you'll have to clear that cache [00:45:46] in chrome open up dev tools on the error page, hold down refresh and move your cursor over clear cache and reload [00:46:00] *hold down the refresh button on screen [00:47:16] twentyafterfour: ^ [01:01:47] RECOVERY - Puppet failure on tools-checker-01 is OK: OK: Less than 1.00% above the threshold [0.0] [01:26:34] RECOVERY - Puppet failure on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0] [01:31:25] 6Labs, 10Tool-Labs, 6operations: NFS file corruption - https://phabricator.wikimedia.org/T96488#1259659 (10coren) I don't think that's likely to be possible in the general case; we might be able - at some cost - to gather a list of files that were written around the right timeframe but, unless we know what w... [01:51:26] Negative24: hmm, still redirecting to https [02:00:53] ok clearing all cache did the trick in chrome, but not firefox :-/ [02:01:41] twentyafterfour: clear it again. Sometimes it takes two tries. [02:03:57] caches are evil :P [02:04:25] chrome still redirects me to https when I type only phab-02 in the address bar [02:04:37] I have to go to the fully justified url [02:12:28] yuvipanda: what is star.wikimedia.org? 
[02:51:35] PROBLEM - Host tools-webproxy-test is DOWN: CRITICAL - Host Unreachable (10.68.16.113) [02:54:18] twentyafterfour: try ssl again [02:54:38] you may get a bunch of security warnings [02:58:17] RECOVERY - Host tools-webproxy-test is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms [03:14:53] 6Labs: Pagelinks table contains a row having pl_from = 0 - https://phabricator.wikimedia.org/T98110#1259693 (10PleaseStand) [03:27:12] !log phabricator Generated self-signed cert on phab-02 and enabled https serving as well as http to https redirect (if anyone has any idea on getting a good, signed cert, let me know) [03:27:18] Logged the message, Master [03:27:44] Negative24: there's a *.wmflabs.org cert [03:27:56] oh is there [03:28:11] Negative24: is this behind the proxy? [03:28:26] nope. That's why I have to do this [03:28:54] this was a nice learning expedition in ssl, though :) [03:28:57] can it go behind the proxy? [03:29:01] it can't [03:29:12] why not? :/ [03:29:21] its serving through https and git through 22 [03:29:31] ah [03:30:18] so glad its working now :) twentyafterfour won't have to worry about clearing his cache now :P [03:30:50] legoktm: I heard yuvi say something about star.wikimedia.org [03:32:09] that's for production [03:32:26] but that's the *.wikimedia.org cert [03:32:28] well that's a no go [03:32:50] :/ [03:34:38] This is ok for now. I'm heading out. [03:36:03] o/ [03:54:07] sweeet [03:55:06] lols: "This site uses HTTP Strict Transport Security (HSTS) to specify that Firefox only connect to it securely. As a result, it is not possible to add an exception for this certificate." [05:21:00] Hi. I am a little upset. I thought job scripts can contain -M and -m options to email me when the job begins and ends, but it does not email me, and I have no clue how to find what job scheduler this cluster uses. [05:58:57] ... 
[06:14:45] svetlana: that worked previously … [06:15:11] 10Tool-Labs: Grid doesn't send mail anymore - https://phabricator.wikimedia.org/T98112#1259783 (10Sitic) 3NEW [07:21:19] sitic: oh, oops. :-) turns out I just had to peek at ``man qsub'' and figure out that the prefix is "#$" [07:22:11] sitic: still no mail though after a successful job start ... I have a "#$ -m abe" and "#$ -M svetlana@blabla.com" lines in it [07:22:22] (with a real email) [08:38:13] PROBLEM - Puppet staleness on tools-shadow is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [08:40:09] 10Tool-Labs: Tool "listeria" has no DB replica access file - https://phabricator.wikimedia.org/T98116#1259889 (10Magnus) 3NEW [10:20:30] PROBLEM - Puppet staleness on tools-exec-catscan is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [10:24:15] PROBLEM - Puppet staleness on tools-exec-cyberbot is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [10:31:51] PROBLEM - Puppet staleness on tools-exec-15 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [10:32:15] PROBLEM - Puppet staleness on tools-exec-gift is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [10:33:12] PROBLEM - Puppet staleness on tools-exec-wmt is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [10:34:34] PROBLEM - Puppet staleness on tools-exec-13 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [10:36:09] PROBLEM - Puppet staleness on tools-exec-08 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [10:36:49] PROBLEM - Puppet staleness on tools-exec-14 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [10:37:03] PROBLEM - Puppet staleness on tools-exec-07 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [10:40:26] PROBLEM - Puppet staleness on tools-exec-02 is CRITICAL: 
CRITICAL: 100.00% of data above the critical threshold [43200.0] [11:06:02] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [11:26:37] 6Labs, 10OpenStreetMap, 5Patch-For-Review: Block OruxMaps app from hitting labs proxy - https://phabricator.wikimedia.org/T97841#1260178 (10akosiaris) @yuvipanda, yeah they have been made aware since yesterday. I think we can close this for now and if we got any important updates we can always re-open it [11:31:06] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [11:49:15] 6Labs, 10OpenStreetMap, 5Patch-For-Review: Block OruxMaps app from hitting labs proxy - https://phabricator.wikimedia.org/T97841#1260230 (10MaxSem) 5Open>3Resolved a:3MaxSem [11:49:37] 6Labs, 10OpenStreetMap: Block OruxMaps app from hitting labs proxy - https://phabricator.wikimedia.org/T97841#1253184 (10MaxSem) [12:23:03] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [12:59:24] twentyafterfour: ah that's a problem. Let me fix that. [12:59:48] leave it up to firefox to be all up in security [12:59:56] :) [13:01:39] twentyafterfour: should work now [13:10:55] coren when you get back give me a ping [13:11:10] Betacommand: pong. [13:11:23] your up early [13:12:09] Coren: [[User:10.68.17.88]] is a labs bot that is logged out. The op is MIA, can you take a look and kill it? [13:12:28] Betacommand: Do you already know which bot that is? [13:13:02] Coren: [[User:HBC AIV helperbot]] or [[User:HBC AIV helperbot5]] [13:14:26] Betacommand: I'll track down the responsible tool. TY for raising the issue. [13:15:20] Coren: Just passing the request along, it popped up in -bag [13:47:30] NFS in any kind of known-bad state? 
[13:47:58] (I'm just noticing my regular filesystem access slowing down progressively for simple commands) [13:54:17] Needs restart: https://tools.wmflabs.org/xtools/pages/ [13:59:17] 10Tool-Labs-xTools: xtools restarted - https://phabricator.wikimedia.org/T98144#1260497 (10valhallasw) 3NEW [14:00:31] Nemo_bis: ^ seems online again [14:00:54] more chance than wisdom, though [14:02:31] what the heck? why does it have php servers (?) configured for tools-service-01 O_o [14:04:55] 10Tool-Labs-xTools: xtools restarted - https://phabricator.wikimedia.org/T98144#1260528 (10valhallasw) Adding Yuvi and Coren because the .lighttpd.conf looks wrong to me: ``` "PHPADD1" => ( "host" => "10.68.16.29", "port" => "20002", "disabl... [14:07:53] 10Tool-Labs-xTools: xtools restarted - https://phabricator.wikimedia.org/T98144#1260546 (10coren) `.lighttpd.conf` is user-managed; I expect that whatever that IP points to now wasn't the same when it was written. Not only has tools had a lot of instances rebuilt, but cold migrations over the past several days... [14:08:35] Coren: any guess as to what host it should have referred to? [14:09:11] we don't have a specific xtools host as far as I could see [14:19:25] 10Tool-Labs-xTools: xtools restarted - https://phabricator.wikimedia.org/T98144#1260561 (10valhallasw) 10.68.16.29 was webgrid-tomcat at some point, but I'm at a loss why xtools would try to use that host for fcgi... In any case, I have commented out the offending lines, so at least xtools should work again (bu... [15:45:50] 6Labs, 3Labs-Q4-Sprint-2, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Disable LDAP and enable admin puppet module on labstore100[12] - https://phabricator.wikimedia.org/T95559#1260806 (10faidon) I don't have any comments on the issue at hand yet, but a couple of meta-issues: - This description should be in the... 
[16:36:22] Tool-Labs: Tool "listeria" has no DB replica access file - https://phabricator.wikimedia.org/T98116#1260938 (scfc) Open>Resolved a:scfc This has been created since: ``` scfc@tools-bastion-01:~$ ls -l /data/project/listeria/replica.my.cnf -rw------- 1 tools.listeria tools.listeria 51 Mai 5 12:42 /d... [16:36:29] Tool-Labs: Tool "listeria" has no DB replica access file - https://phabricator.wikimedia.org/T98116#1260941 (scfc) a:scfc>None [16:37:33] Tool-Labs: Grid doesn't send mail anymore - https://phabricator.wikimedia.org/T98112#1260949 (scfc) Please provide a job number or a time when this occurred so that it can be debugged. [17:04:05] A public wiki that can be allowed to act as a honeypot for spammers would be of great help for my GSoC project: an extension to identify and delete spam pages. Is there a chance that a Labs instance could be given for this or are there better places for such experiments? [17:05:24] Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1261014 (scfc) ``` From: root@tools.wmflabs.org (Cron Daemon) Subject: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) To: root@tools.wmflabs.org... [17:18:19] Um, hello? [17:18:53] polybuildr: be patient [17:19:15] * polybuildr sighs [17:19:22] gifti: Fair enough. [17:19:35] you can also use the mailing list [17:36:40] polybuildr: Are you familiar with tool labs? If you can fit it there, I think that is generally recommended (though I vaguely recall running open wikis is discouraged). [17:40:04] afeder: No, I'm not. I'll take a look, though. (Also, if open wikis are discouraged, spam honeypot wikis are going to be very much so) [17:41:20] polybuildr: Indeed, but maybe you can get special permission from ops, since your use case is sensible. [17:43:20] polybuildr: afeder I think if you’re looking for honeypots, you should get your own labs instance / project. shouldn’t be hard.
[17:43:26] polybuildr: ask csteipp too, though. [17:43:38] polybuildr: you can find the request a project link on sidebar of wikitech.wikimedia.org [17:43:48] alright [17:44:19] yuvipanda: So you think it's possible that the request will be granted? Also, where do I contact csteipp? [17:44:37] In labs, should be fine [17:44:44] polybuildr: yeah, I think it would be. [17:44:57] well, I’m one of the people who creates projects, so… :) [17:45:07] polybuildr: What's the gsoc project? [17:45:15] yuvipanda, csteipp: Awesome. :D Thanks! [17:45:25] csteipp: An extension to identify and delete spam pages [17:46:24] csteipp: https://phabricator.wikimedia.org/T93498 [17:52:06] 6Labs, 7Tracking: Create spam honeypot Labs project for extension to identify and delete spam pages - https://phabricator.wikimedia.org/T98174#1261247 (10polybuildr) 3NEW [17:52:16] yuvipanda: done :) [17:54:11] speaking of honeypots... yuvipanda, can you allocate ip addresses in labs? T98038 [17:54:25] csteipp: sure! [17:55:39] 6Labs: Add public ip to security-tools - https://phabricator.wikimedia.org/T98038#1261284 (10yuvipanda) @csteipp Which project? [17:55:54] csteipp: There are other honeypots too? [17:56:03] 6Labs, 7Tracking: Create spam honeypot Labs project for extension to identify and delete spam pages - https://phabricator.wikimedia.org/T98174#1261291 (10yuvipanda) 5Open>3Resolved a:3yuvipanda Done. [17:56:04] polybuildr: done [17:56:06] 6Labs, 7Tracking: New Labs project requests (Tracking) - https://phabricator.wikimedia.org/T76375#1261294 (10yuvipanda) [17:57:01] 6Labs: Add public ip to security-tools - https://phabricator.wikimedia.org/T98038#1261309 (10csteipp) @yuvipanda, "security-tools" [17:57:51] 6Labs: Add public ip to security-tools - https://phabricator.wikimedia.org/T98038#1261328 (10yuvipanda) 5Open>3Resolved a:3yuvipanda Done. [17:58:01] polybuildr: I'm setting up mhn for testing, and eventually would like to an instrumented mediawiki running to capture web exploits. 
Not for spam though. [17:58:04] csteipp: done [17:59:44] csteipp: nice :D [18:05:15] yuvipanda: okay, I tried looking around, but couldn't find any guides to Labs. I haven't ever used Labs before, so could you point me to something that could get me started? [18:14:32] Hello everyone, I have a question concerning Java on the tool server. Somebody changed the configuration. I get the following error: [18:14:34] missing `jamvm' JVM at `/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/jamvm/libjvm.so'. [18:14:36] Please install or use the JRE or JDK that contains these missing components. [18:14:38] What shall I do? Thank you. [18:22:01] Labs, Labs-Infrastructure, operations, ops-eqiad: labstore1002 issues while trying to reboot - https://phabricator.wikimedia.org/T98183#1261827 (coren) NEW [18:22:53] Erm. [18:23:24] tkaspar: Might be a change in defaults - Precise installed the jamvm by default but perhaps Trusty doesn't. Checking. [18:24:59] Coren: Thank you. [18:25:21] That looks to be the case. Lemme test by hand on -login [18:26:08] Try again on -login? [18:27:22] Coren: did you track down the rogue bot? [18:27:57] Betacommand: I'm still pending an okay from Legal for doing a checkuser (more complicated when you are staff). Lemme poke 'em again. [18:28:22] Coren: for the UA? [18:28:27] * Coren nods. [18:28:52] Coren: want me to poke a CU? [18:30:00] ... that's kind of a runaround, but since I can't accidentally see something that opens liability then I suppose that'd work. I hadn't considered that. I'll do it. :-) [18:30:59] Coren: Given we work in IT, finding workarounds is SOP [18:31:37] Lawyers tend to dislike runarounds, but the reason why that rule exists is compatible with that particular one. :-) [18:33:02] Coren: if a CU declines to release it, that's their call [18:33:14] we already know it's from tools [18:35:01] thus UA shouldn't be a big deal [18:35:28] what are you looking for? [18:35:36] Coren: does that IP map to an exec node?
[18:35:51] gifti: rogue bot [18:36:16] that's very specific [18:36:24] I don't expect it to be an issue; it's really just a question of layer-of-insulation. And now I have the UA and know for a fact the author needs a trouting. :-) [18:37:16] Coren: generic FF or IE browser? [18:38:25] Betacommand: "MediaWiki/1.11" [18:38:37] ouch [18:38:43] even worse [18:40:13] It's not the actual aivhelperbot tool as far as I can tell. [18:40:23] No running job and no crontab. [18:41:00] Coren: like I said before, does that IP map to an exec node? [18:41:05] It does. [18:41:30] Coren: which node? [18:41:54] 1204. I think I found the culprit. [18:42:03] Coren: so, I’m running pdns and it recurses for requests outside of .wmflabs. Does that mean I am running ‘pdns recursor’ or is that something else? (I ask because I’m trying to run rec_control and it’s telling me I need to install the recursor package.) [18:42:28] Coren: helperbot11 ? [18:42:38] (Trying to write a custom Lua script to remap IPs) [18:43:00] andrewbogott: I'm no pdns expert, but I think so. [18:44:20] Betacommand: helperbot; it's a Perl bot written with the MediaWiki Perl lib which - iirc - puts out that UA by default. [18:44:31] hm, apparently I am not, that package runs a different service called ‘pdns-recursor’ [18:44:57] Coren: Ah, is there a public list of nodes and assigned IP addresses? [18:45:02] Huh. A subset of the whole pdns used when you /only/ want to recurse, maybe? [18:45:12] I guess. [18:45:23] Unfortunate because it seems to be scriptable and the normal service seems not [18:45:35] Betacommand: Not one that's easy to get to. [18:45:49] Betacommand: This is why reverse DNS is on our plate. [18:46:06] Betacommand: Killed it. [18:46:38] Coren: thanks [18:46:49] checked cron? [18:49:25] Betacommand: commented out.
[18:50:48] !log tools helperbot WP:AVI bot running logged out owner is MIA, Coren killed job from 1204 and commented out crontab [18:50:54] Logged the message, Master [18:56:49] Coren: can https://phabricator.wikimedia.org/T75384 get some love? [18:57:14] * Coren hugs the ticket. [18:57:29] Coren: Being cheeky? [18:57:50] Betacommand: Fiddling with the web interface is pretty low on the priority list I fear, but if you have a changeset I'd be happy to merge it in. [18:58:53] Betacommand: You can find the source in the labs/toollabs repo under www [18:59:00] Coren: is that php? [18:59:07] It is. [18:59:21] With a bit of js spicing for the table. [18:59:31] * Betacommand shudders and runs from the monster [19:00:55] * Betacommand thinks of re-writing it in python [19:09:48] 10Tool-Labs: Grid doesn't send mail anymore - https://phabricator.wikimedia.org/T98112#1262043 (10Sitic) ``` $ jsub -N test - jan.lebert@online.de -m abe test.sh $ qstat -j 212741 ============================================================== job_number: 212741 exec_file: job_scr... [19:22:10] Coren: did you fix the jamvm thing? [19:22:15] was there a patch? [19:22:46] 10Tool-Labs: Grid doesn't send mail anymore - https://phabricator.wikimedia.org/T98112#1262091 (10Sitic) The last email I have from when this was still working is this one: ``` Subject: Job 39912 (rabbit) Killed Date: Fri, 17 Apr 2015 19:10:07 +0000 From: root To: jan.lebert@online.de... [19:22:50] yuvipanda: Ima make one. I was waiting for confirmation from tkaspar first [19:22:58] Coren: cool [19:23:09] tkaspar: Speaking of... :-) Did it work? 
[19:25:56] 10Tool-Labs: Install jamvm under Trusty (which doesn't have it by default) - https://phabricator.wikimedia.org/T98195#1262103 (10coren) 3NEW a:3coren [19:38:14] 6Labs: allow routing between labs instances and public labs ips - https://phabricator.wikimedia.org/T96924#1262147 (10Andrew) Someone on IRC just claimed that there was a setting to fix this in new nova-network implementations. The closest thing I can find is force_snat_range: https://review.openstack.org/#/c/... [19:43:29] 6Labs: allow routing between labs instances and public labs ips - https://phabricator.wikimedia.org/T96924#1262172 (10Andrew) For the record, our current floating ip range is 208.80.155.128/25 [19:53:32] Can someone please point me to a guide for Labs? I've been given a Labs instance, but I don't know how to get into it yet. [19:57:23] polybuildr: here’s the faq: https://wikitech.wikimedia.org/wiki/Help:FAQ [19:57:23] did labs just go down? [19:57:24] (tools-login ssh at least) [19:57:41] Platonides: network seemed slow for a second, but looks ok to me... 
[19:57:47] finally entered [19:58:13] polybuildr: also https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_public_and_private_instances may be useful [19:58:21] it was weird, I had an interactive session and seemed to freeze [19:58:32] then on reconnect, it stayed on debug2: we sent a publickey packet, wait for reply for a long time [20:00:04] andrewbogott: Coren ugh, toollabs failure reports from catchpoint [20:00:10] hmm [20:00:11] back up now [20:00:15] stupid 5min resolution [20:00:19] but yeah there was a mini outage there [20:00:23] yuvipanda: that fits with Platonides seeing a hiccup [20:00:31] yeah [20:01:08] 6Labs, 10Labs-Infrastructure, 5Continuous-Integration-Isolation: Include Base::Standard-packages in labs images - https://phabricator.wikimedia.org/T94995#1262253 (10hashar) [20:01:10] I'm seeing a minor bit of symptomatic dip in network io on NFS, but no matching burp of disk I/O [20:01:43] Usage has been pretty high for the past 90 minutes though. [20:03:54] 10Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1262265 (10Teslaton) Plenty of dropouts last two days: http://tools.freeside.sk/monitor/http- kmlexport.html The overall availability still very very poor. [20:03:57] I’ve been migrating stuff all day but that should be throttled. [20:05:53] 10Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1262276 (10yuvipanda) (We also have http://p.catchpoint.com/ui/Entry/PD/V/A.RNP-Ov-jSUbDu8Jdg/ErLK now) [20:06:40] Okay, so I'm currently behind a proxy server and access gerrit through corkscrew. [20:07:01] ProxyCommand corkscrew 8080 %h %p [20:07:07] Is the command I use. [20:07:30] I was reading https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_public_and_private_instances and attempted to log in to my Labs instance, [20:08:04] but the proxy server (probably) refused a connection over 22. 
Could someone please tell me how I could modify the ssh/config commands given in that Help page to also include a proxy? [20:14:47] Tool-Labs: Grid doesn't send mail anymore - https://phabricator.wikimedia.org/T98112#1262342 (scfc) The job was executed on one of the newly set up boxes (`tools-exec-1210`). These were missing the script `/usr/local/bin/gridengine-mailer`. I have manually copied it on all `tools-exec-*` and `tools-webgrid... [20:14:59] Tool-Labs: Grid doesn't send mail anymore - https://phabricator.wikimedia.org/T98112#1262344 (scfc) [20:15:00] Tool-Labs, Patch-For-Review: Error mails from SGE are encoded as application/octet-stream - https://phabricator.wikimedia.org/T63160#1262345 (scfc) [20:15:25] andrewbogott: I doubt it's networking - I think someone's being naughty again. [20:15:55] Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1262347 (Teslaton) Nice, but it would be more relevant to monitor actual availability of each particular service (tool) alone. The availability of kmlexport was < 90% during last 24h, I guess. And even worse during week 17 (... [20:18:45] Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1262356 (yuvipanda) So there are two things here - one is availability of the underlying tools infrastructure, and then of the individual tool itself. If the infrastructure is up and the tool is 'down' - I would think that's... [20:19:00] There's a 'massdeletebot' that's being rough [20:20:56] But it accounts for, maybe, 10% [20:22:17] Hm. And there's a phpunit running that counts for another 7% [20:22:55] And maps-tiles3 for another 10% or so. [20:23:02] No single culprit. [20:24:19] Looks like the load is high because of lots of users, not because of any one of them. [20:30:01] Can anyone help me with the ProxyCommand that also uses corkscrew, please?
[20:56:26] 6Labs, 10Labs-Infrastructure, 6operations, 10ops-eqiad: labvirt1005 memory errors - https://phabricator.wikimedia.org/T97521#1262455 (10Cmjohnson) Hi Christopher, This is Regarding the Case Number:4651331170 I have made arrangements to ship a replacement System board along with an onsite engineer. Part... [21:04:59] polybuildr: What help do you need, exactly? [21:08:33] 6Labs: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1262527 (10Andrew) a:5Andrew>3Springle [21:09:14] 6Labs: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1118300 (10Andrew) Sean -- this is easy, right? I'd like to do this soon so that I can move the nova controller to a new server. Thanks! [21:10:31] 6Labs: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1262541 (10Andrew) [21:10:34] 6Labs: Replicate or backup virt1000 DBs - https://phabricator.wikimedia.org/T90627#1262539 (10Andrew) [21:12:54] 6Labs, 3ToolLabs-Goals-Q4: Glance image files are only stored on virt1000 - https://phabricator.wikimedia.org/T98226#1262575 (10Andrew) 3NEW a:3Andrew [21:14:39] 6Labs: Remove puppet and salt keys on instance deletion - https://phabricator.wikimedia.org/T95911#1262620 (10Andrew) 5Open>3Resolved Resolved by https://gerrit.wikimedia.org/r/#/c/205897/ [21:14:40] 6Labs: Abolish use of ec2 ids - https://phabricator.wikimedia.org/T95910#1262622 (10Andrew) [21:16:02] 6Labs: Sort out labs user privs in Horizon vs. Wikitech - https://phabricator.wikimedia.org/T91830#1262649 (10Andrew) [21:16:15] search on wikitech seems dead [21:16:22] https://wikitech.wikimedia.org/w/index.php?search=x&title=Special%3ASearch&go=Go [21:18:52] tgr: So I see, thanks for reporting. [21:20:04] Coren: I'm currently behind a proxy server, and I use corkscrew to connect to machines outside the server (eg. gerrit). I need to get into a Labs instance the same way. 
[21:20:48] I looked at the commands for sshing into a proxy server and I'm not sure how to include the corkscrew command in them. [21:21:40] I use ProxyCommand corkscrew 8080 %h %p to connect to gerrit, for example, in my .ssh/config. I was wondering if someone knows how to combine this with the ProxyCommand for getting into a Labs instance. [21:23:35] polybuildr: https://bpaste.net/show/40fe5e91215d should work, I think [21:33:06] valhallasw`nuage: This is what happens https://gist.github.com/polybuildr/a7564b422fc9c932f994 [21:34:26] polybuildr: try ssh -vvv bastion1.eqiad.wmflabs ? [21:34:31] polybuildr: You'll probably have to have more than one level of proxying - labs instance normally need to be reached via one of the bastions. [21:35:33] valhallasw`nuage: Pretty much the same output. [21:35:41] Coren: Meaning bastion1, bastion2 and so on? [21:36:22] polybuildr: Yes, unless you are using tool labs which has its own set of bastions for convenience. [21:37:07] polybuildr: nothing inbetween "debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2" and "Proxy could not open connection to bastion.wmflabs.org: Forbidden" ? [21:37:30] valhallasw`nuage: nope, I didn't edit the output. [21:37:40] polybuildr: when you ssh to bastion1 directly, I mean [21:37:51] valhallasw`nuage: Oh, hold on. Rechecking. [21:38:03] maybe try just running corkscrew, i.e. corkscrew 8080 bastion.wmflabs.org 22 [21:38:17] that should show something like SSH-2.0-OpenSSH_5.9p1 Debian-5ubuntu1.4 [21:38:55] valhallasw`nuage: That just shows "Proxy could not open connection to bastion.wmflabs.org: Forbidden" [21:39:03] And no, no new lines in between those two. [21:39:14] "Forbidden"? [21:39:16] polybuildr: that = corkscrew? [21:39:22] or that = ssh -vvv bastion1? [21:39:36] Coren: it's proxying over an http(s) proxy [21:39:53] valhallasw`nuage: that = corkscrew [21:39:56] polybuildr: It's entirely possible your proxy does not allow a connect to port 22. 
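The "Forbidden" reply quoted above comes from the HTTP proxy, not from ssh: a tunnelling client such as corkscrew asks the proxy to open a connection with an HTTP CONNECT request, and the proxy answers with a status line. Below is a minimal Python sketch of that exchange, assuming a plain HTTP proxy without authentication. The function names are hypothetical, and this is not corkscrew's own code (corkscrew is a separate C program that also relays the data afterwards).

```python
import socket


def build_connect_request(host, port):
    """Return the HTTP CONNECT request a tunnelling client sends to the proxy."""
    target = f"{host}:{port}"
    return (
        f"CONNECT {target} HTTP/1.1\r\n"
        f"Host: {target}\r\n"
        "\r\n"
    ).encode("ascii")


def parse_status_line(line):
    """Split a reply such as b'HTTP/1.1 403 Forbidden' into (code, reason)."""
    parts = line.decode("iso-8859-1").strip().split(" ", 2)
    if len(parts) < 2:
        raise ConnectionError("proxy closed the connection without a status line")
    reason = parts[2] if len(parts) > 2 else ""
    return int(parts[1]), reason


def probe(proxy_host, proxy_port, host, port, timeout=10.0):
    """Ask the proxy for a tunnel to host:port and return its (code, reason)."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_connect_request(host, port))
        reply = sock.makefile("rb").readline()
    return parse_status_line(reply)
```

Calling `probe` with the proxy's address and `("bastion.wmflabs.org", 22)` would show what the proxy thinks of port 22: a 2xx code means the tunnel was opened, while a 403 matches the "Forbidden" message in the log and points at the proxy's port policy rather than at the ssh configuration.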
[21:40:02] polybuildr: I *think* (but am not 100% sure) your proxy blocks stuff over port 22, then [21:40:04] and no new lines for "ssh -vvv bastion1" [21:40:07] (The proxy itself, not corkscrew) [21:40:09] gerrit works because it uses some weird non-standard port [21:40:41] valhallasw`nuage: That'd be a first. Some of the weird non-standard crap from gerrit being an advantage. :-) [21:40:44] Oh. Right, I'd forgotten about the weird port. [21:41:05] I'm wondering whether we can make ssh available over port 443 [21:41:12] Coren: I got the sysadmins to open this port just to let my requests to gerrit work. :P [21:41:26] I'll check whether it blocks all of port 22. [21:44:47] valhallasw`nuage, Coren: thanks for the help! :) [21:45:55] valhallasw`nuage: I rarely saw sysadmins block port 22 by design outside of MLS environments; it's probably better if polybuildr just asks. :-) [21:46:19] Coren: well, it's an https proxy, so it's not weird to block everything not-443 by default :-p [21:46:58] valhallasw`nuage: Right, but that's generally it; defaults to closed as opposed to "we don't want ssh" [21:47:07] Yeah, it's an http proxy, blocks all other connections by design. [21:47:21] Yep, the sysadmin confirmed. 22 is blocked. :P [21:47:35] Well, at least you have a sysadmin that's helpful :-D [21:47:36] I can get an exception for my IP, though, so that should be fine. [21:47:52] valhallasw`nuage: Student sysadmins. :') Very kind individuals. [21:48:29] Yeah, I was about to say it doesn't seem right for us to encourage users to circumvent deliberate policy. [21:49:17] Coren: Well, the policy itself is rather ill-designed. So much so that they enable exceptions for anyone who has a legitimate purpose. You wouldn't be doing much harm. :) [21:49:45] Tool-Labs, Patch-For-Review: Install jamvm under Trusty (which doesn't have it by default) - https://phabricator.wikimedia.org/T98195#1262705 (coren) Open>Resolved This should now be available on Trusty and Precise both.
[21:50:34] polybuildr: I still would be uneasy doing so. If you ask nicely and they say yes then all is well (I was referring to moving ssh to port 443 to work around an explicit decline to open 220 [21:50:40] 22)* [21:52:58] Coren: Fair enough. But yeah, I shall ask nicely and they'll say yes, so it should work out just fine for me. :) [21:53:17] Coren: Also, the sysadmin also told me I could use a VPN if I was in a hurry. :P [21:53:43] That's also something we don't support but that may well be a fairly reasonable idea. [21:54:09] I'm not sure how it would solve the problem though, as you would still need to get through that proxy [21:55:12] valhallasw`nuage: I'm not sure either, but it certainly does work (I've heard). Tunnels through and makes an HTTP request maybe. [21:55:30] valhallasw`nuage: Well, presumably, if the sysadmins explicitly said polybuildr could use a vpn then I expect they support at least some vpn schemes. :-) [21:57:31] I guess I just have had a few too many bad experiences with sysadmins :P [21:58:23] ignoring requests, freaking out, dismissing because 'you could also do this in this other way even though that would take you a gazillion times longer' [22:04:00] and, arguing from our side, every form that needs to be filled makes it less likely for someone to contribute [22:05:40] Hi all... I'm trying to run a query on the wiki db (login.tools.wmflabs.org) and it's taking a few hours. Could someone help me optimize it? Query: select * from user where user_registration >'$START_TIME' and user_registration < '$END_TIME' and (select count(*) from revision where rev_user = user_id) = user_editcount; [22:09:16] Coren: valhallasw`nuage not the first time I’ve seen port 22 blocked. (Indian colleges and their terribleness) [22:10:01] samudra: replace revision with revision_userindex [22:10:04] samudra: First, use revision_userindex not revision. [22:10:07] samudra: hi! two things: for simple queries, try using quarry.wmflabs.org. 
second, try using revision_userindex instead of revision [22:10:08] heh [22:10:32] :) [22:10:37] {{editconflict}} [22:11:22] Coren: any luck on the lvm testing? [22:11:27] eranroz: thanks! [22:11:51] yuvipanda: Semi. I got mostly sidetracked today by triage and labstore1002 going dead again. [22:11:58] heh, ok [22:14:47] "Unable to initialize environment because of error: denied: host "tools-webgrid-lighttpd-1210.eqiad.wmflabs" is neither submit nor admin host Exiting." <-- What does this mean? [22:15:20] it from the .err output of a jsub job [22:15:30] it's* [22:16:14] hmm [22:16:32] I might have missed adding that as a submit host? [22:17:09] anything i can do to evade it? restart webservice? [22:17:58] afeder: yeah, am doing it again [22:18:02] ok [22:18:03] yuvipanda: thanks for the inputs! Is there any resource which I could use for information like that, e.g. indexes? I couldnt find anything helpful on the wiki. [22:19:13] Coren: ^ is the _userindex stuff documented anywhere? [22:20:15] yuvipanda: In several places. https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs being the most prominent. :-) [22:20:24] samudra: ^ [22:21:00] Oops. [22:21:15] Thanks a lot yuvipanda and Coren :) [22:25:37] Coren: I’ve a 1.5h scheduled with spage tomorrow to go over tools docs ;) [23:00:28] 6Labs, 6operations: Investigate ways of getting off raid6 for labs store - https://phabricator.wikimedia.org/T96063#1262969 (10yuvipanda) p:5Low>3Normal [23:01:44] 6Labs, 6operations: Investigate ways of getting off raid6 for labs store - https://phabricator.wikimedia.org/T96063#1207452 (10yuvipanda) So right now, we have five shelves of disks, and ```/dev/mapper/store-now 40T 11T 30T 27% /srv/project``` So about 72% free. What's preventing us from moving to RA... 
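The advice given to samudra earlier (use `revision_userindex` instead of `revision`) can be tried out on a toy dataset. The Python sketch below runs the query from the log, with only the table name swapped and the shell-style `$START_TIME` / `$END_TIME` turned into placeholders, against an in-memory SQLite database. The table definitions, the sample rows, and the helper names are invented for illustration. SQLite is only a stand-in: it shows what the query selects, not how fast it runs on the Labs replicas, where the linked help page is the reference for the `_userindex` views.

```python
import sqlite3

# Query from the log, with `revision` replaced by `revision_userindex`.
QUERY = """
SELECT *
FROM user
WHERE user_registration > ?
  AND user_registration < ?
  AND (SELECT COUNT(*) FROM revision_userindex WHERE rev_user = user_id)
      = user_editcount
"""


def find_users(conn, start_time, end_time):
    """Return names of users registered in the window whose stored edit
    count equals the number of revisions attributed to them."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(QUERY, (start_time, end_time))
    return [row["user_name"] for row in rows]


def demo():
    """Build hypothetical sample tables and run the query against them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (
            user_id INTEGER PRIMARY KEY,
            user_name TEXT,
            user_registration TEXT,
            user_editcount INTEGER
        );
        CREATE TABLE revision_userindex (
            rev_id INTEGER PRIMARY KEY,
            rev_user INTEGER
        );
        CREATE INDEX rev_user_idx ON revision_userindex (rev_user);
        INSERT INTO user VALUES
            (1, 'Alice', '20150501000000', 2),
            (2, 'Bob',   '20150502000000', 5),
            (3, 'Carol', '20140101000000', 0);
        INSERT INTO revision_userindex (rev_user) VALUES (1), (1), (2);
    """)
    return find_users(conn, "20150101000000", "20150601000000")
```

In this sample, only Alice is returned: Bob's stored edit count does not match his single revision, and Carol registered outside the window.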
[23:44:07] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [23:58:33] 6Labs, 6operations: Investigate ways of getting off raid6 for labs store - https://phabricator.wikimedia.org/T96063#1263282 (10yuvipanda) [23:59:06] heya labs folk! I need a labs instance. I'm on the Reading team but wikitech.wikimedia.org only shows me as an administrator for the triply-obsolete editor-engagement project. Is there a Reading project I should be part of? [23:59:23] hi spagewmf_ [23:59:32] projects should not be tied to WMF teams but specific to what you are doing :) [23:59:35] so, what do you want to do? :)