[00:29:58] Reedy: What can you tell me about Typoscan? Is there a reason a regular bot doesn't maintain the list? Maybe Betacommand could write a db bot to do it?
[00:30:42] Db stuff is his speciality it seems
[00:33:38] Reedy: can I be added as a maintainer to toollabs:awb/typoscan or at least get a restart since it's offline?
[03:07:41] T13|detached: I can run a bot, but it won't use the db, it would have to use dumps
[03:08:21] I was thinking once a month or even quarterly.
[03:09:01] Still need to wait to hear from Reedy about getting added as a maintainer to the existing project.
[03:11:06] T13|detached: anything more than once a month wouldn't be doable
[03:11:32] and if you can get the AWB typo format converted to python it's trivial to write a bot to scan for them
[03:11:46] I've already got the code for it
[03:11:46] It might take a week to process the query
[03:12:10] T13|detached: I can do a dump scan in about a day or two
[03:12:38] Last I checked for a simple single item run was ~12 hours
[03:12:49] or less
[03:13:50] I'll have to see if I can get the code tomorrow.
[03:13:58] In bed now
[03:14:14] T13|detached: converting the regexes will take quite a bit of time
[06:56:34] PROBLEM - Puppet failure on tools-trusty is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[07:21:37] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0]
[10:03:25] (PS1) Alexandros Kosiaris: Add passwords::statistics::user [labs/private] - https://gerrit.wikimedia.org/r/190423
[10:16:22] (CR) Alexandros Kosiaris: [C: 2 V: 2] Add passwords::statistics::user [labs/private] - https://gerrit.wikimedia.org/r/190423 (owner: Alexandros Kosiaris)
[14:02:39] any chance one of the admins could restart the webservice for https://tools.wmflabs.org/commonscategorycount/ ? the one-and-only maintainer is out of office and not reachable at the moment unfortunately.
[14:10:37] I'm getting 502's for ores-test.wmflabs.org (https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000865.eqiad.wmflabs). The docs say to wait a bit. https://wikitech.wikimedia.org/wiki/Help:LAMP_issues#502_Bad_Gateway_nginx.2F1.1.19 I've waited at least 12 hours.
[14:10:52] Is there anything else that I could try?
[14:19:22] Tobi_WMDE_SW: ok
[14:19:38] petan: thx!
[14:21:00] Tobi_WMDE_SW: done
[14:24:09] petan: thanks a lot!!
[14:25:45] (PS1) BBlack: add frontend-hooks VCL for labs [labs/private] - https://gerrit.wikimedia.org/r/190455
[14:26:07] (CR) BBlack: [C: 2 V: 2] add frontend-hooks VCL for labs [labs/private] - https://gerrit.wikimedia.org/r/190455 (owner: BBlack)
[15:30:40] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<37.50%)
[15:40:40] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK
[15:54:33] Well, any idea why XTools is down?
[16:25:37] QEDK: I'm not entirely certain what you mean; the only thing xtools seems to be doing atm is a redirect to xtools-articleinfo and not much else
[16:44:24] as it turns out I didn't need toollabs access until now, anyone that can approve https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Filippo_Giunchedi ? thanks!
[16:44:43] (trying to poke grrrit-wm)
[16:44:53] Sure, you are cool enough. :-)
[16:45:15] haha thanks Coren
[17:25:55] !log restartting grrrit-wm because it's not behaving
[17:25:56] restartting is not a valid project.
[17:26:26] !log lolrrit-wm restarting grrrit-wm because it's not behaving
[17:26:27] lolrrit-wm is not a valid project.
[17:26:32] !log grrrit-wm restarting grrrit-wm because it's not behaving
[17:26:32] grrrit-wm is not a valid project.
[17:26:38] * marktraceur implodes
[17:26:57] !log tools restarting grrrit-wm because it's not behaving
[17:27:00] THERE.
[17:27:00] Logged the message, Master
[17:30:58] !log tools.wikibugs legoktm: Deployed 0941e5af42ab1c035b023246da5dde30b17c0f63 Remove Phabricator and Code-Review from -releng wb2-irc
[17:31:01] Logged the message, Master
[17:35:17] Coren: tools-redis has run out of memory
[17:35:23] tools-redis:6379> RPUSH legotest "help"
[17:35:23] (error) OOM command not allowed when used memory > 'maxmemory'.
[17:38:22] !log tools redis on tools-redis is OOMing?
[17:38:25] Logged the message, Master
[17:41:36] Gah. Needs a bigger instance, but the downtime would be a pain.
[17:44:22] It's already down
[17:44:31] It's dead, Coren. Take its stuff.
[17:45:11] Yeah, I was about to kick it; but making it bigger is longer. I suppose it's worthwhile though.
[17:46:51] I vote yes, so we don't need to do this again :)
[17:47:25] PROBLEM - Host tools-redis is DOWN: CRITICAL - Host Unreachable (10.68.16.26)
[17:47:55] RIP
[17:48:30] !log tools rebuilding tools-redis with moar ramz
[17:48:32] Logged the message, Master
[17:50:05] :D
[17:57:08] RECOVERY - Host tools-redis is UP: PING OK - Packet loss = 0%, RTA = 0.94 ms
[17:57:27] That was... impressively painless.
[17:58:02] PROBLEM - Puppet failure on tools-redis is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[18:00:12] Coren: is it ready to be used now?
[18:00:24] legoktm: It looks like it is.
[18:00:33] woot! thanks for the quick fix :)
[18:01:06] !log tools.wikibugs legoktm: Deployed 0941e5af42ab1c035b023246da5dde30b17c0f63 Remove Phabricator and Code-Review from -releng wb2-phab
[18:01:09] Logged the message, Master
[18:01:32] !log tools tools-redis is dead, long live tools-redis
[18:01:34] Logged the message, Master
[18:01:47] Coren, xtools has no more instances or quotas. What happened to them?
[18:02:18] Nothing is showing up on Wikitech.
[18:02:22] Cyberpower678: Nothing, chances are your keystone token is expired - log off and on again and you should see them
[18:03:04] RECOVERY - Puppet failure on tools-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[18:04:59] Coren, thanks now "ssh: Could not resolve hostname login.wmflabs.org: nodename nor servname provided, or not known"
[18:05:07] For the login instance.
[18:05:59] That name doesn't exist - did you mean login.tools.wmflabs.org instead?
[18:06:25] I'm trying to access the login instance of xtools.
[18:06:47] You need to go through a bastion.
[18:06:58] Meaning?
[18:07:20] (And its name would then be login.eqiad.wmflabs)
[18:07:50] o_O proxy agent through bastion.wmflabs.org - the normal process for all non-tools projects.
[18:08:11] Well, I've never been outside of the tools project.
[18:09:58] Cyberpower678: Ah! :-) https://wikitech.wikimedia.org/wiki/Help:Access
[18:11:45] I'm getting 502's for ores-test.wmflabs.org (https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000865.eqiad.wmflabs). The docs say to wait a bit. https://wikitech.wikimedia.org/wiki/Help:LAMP_issues#502_Bad_Gateway_nginx.2F1.1.19 I've waited at least 12 hours. Is there something else I could try?
[18:12:10] Coren: ^
[18:13:35] halfak: Well, presuming that the web server on the instance is actually up and running, I'd take a look at its logs.
[18:13:52] Coren, do you think the request is making it to the machine?
[18:14:08] The logs will tell you for sure.
[18:14:26] Well, it doesn't seem to matter whether the webserver is running on the machine or not.
[18:14:33] So I think that tells me.
[18:14:38] Also, I'm not running nginx
[18:14:45] And the error message is from nginx
[18:14:57] halfak: That's because the proxy is running nginx
[18:15:02] Indeed.
[18:15:10] So I suspect the proxy is the problem :)
[18:15:32] I doubt that - this would have broken all the things, and that is too many. :-)
[18:16:18] Well... what could I try? Can I view the proxy logs?
[18:16:27] Make sure your project security rules actually allow connection on port 80, also.
[18:16:41] halfak: Not unless you have a signed NDA; the proxy logs have PII
[18:16:55] halfak: But I can search for specific things in it if you want.
[18:16:58] I'm staff
[18:17:05] Oh, duh.
[18:17:27] My IRC handle to people hash has too many collisions. :-)
[18:17:41] :)
[18:18:16] But 90% of issues people have with the proxy is that their project blocks port 80. Did you check that?
[18:18:50] Gotcha. Yes. There's a security group enabled for port 80.
[18:22:56] halfak: I don't see anything running on port 80 on that instance.
[18:23:15] Oh yes. I specifically took it down for testing.
[18:23:31] I should get something other than a 502, yes?
[18:23:55] Well no, a 502 is exactly "the backend server don't work"
[18:24:12] Oh. Seems like it says something about "gateway" to me.
[18:24:45] * halfak starts up the port 80 server
[18:24:59] The error message is from the perspective of the frontend
[18:25:06] which /is/ a gateway. :-)
[18:25:40] So, when I try localhost:80 from the machine, I get a 404 (which is what I expect)
[18:26:06] halfak: Your server is explicitly listening on 127.0.0.1
[18:26:07] !log tools.wikibugs valhallasw: Deployed 0941e5af42ab1c035b023246da5dde30b17c0f63 Remove Phabricator and Code-Review from -releng wb2-irc
[18:26:08] :-)
[18:26:13] Logged the message, Master
[18:26:38] halfak: More to the point, it's explicitly /not/ listening to the actual network. :-)
[18:27:14] I see.
[18:27:56] In other words, that 502 is normal and expected.
[18:27:59] Wikibugs: wb2-irc: reconnect to redis after errors - https://phabricator.wikimedia.org/T89480#1037170 (valhallasw) NEW
[18:28:02] legoktm: ^
[18:28:24] Well, I wouldn't expect to receive a 502 if I was hitting the server directly.
[18:28:33] :DDDDDDD
[18:28:48] But I appreciate that is what you would expect.
[18:29:05] valhallasw`cloud: so we should probably restart grrrit-wm and maybe gerrit-to-redis?
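The 502 diagnosis above comes down to the bind address: a backend listening only on 127.0.0.1 answers requests made on the instance itself but cannot be reached by the proxy. A minimal Python sketch of that distinction follows. The function names are hypothetical, the listener stands in for halfak's web server, and the check ignores security groups and firewalls, which are a separate question.

```python
import ipaddress
import socket


def reachable_from_network(bind_address: str) -> bool:
    """Return True if a server bound to bind_address can accept
    connections arriving from other hosts, such as a reverse proxy."""
    addr = ipaddress.ip_address(bind_address)
    # Loopback addresses (127.0.0.0/8, ::1) only accept local connections.
    return not addr.is_loopback


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Attempt a TCP connection, roughly what a proxy does to reach a backend."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def start_listener(bind_address: str) -> socket.socket:
    """Open a listening TCP socket on an ephemeral port (stand-in backend)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_address, 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    return srv


if __name__ == "__main__":
    srv = start_listener("127.0.0.1")
    port = srv.getsockname()[1]
    # Local request works, which matches the 404 seen via localhost.
    print("local connect:", can_connect("127.0.0.1", port))
    # The proxy, coming from another host, would not get through.
    print("visible to proxy:", reachable_from_network("127.0.0.1"))
    srv.close()
```

Binding to 0.0.0.0 (all interfaces) or to the instance's own network address instead is what makes the server visible to the proxy.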
[18:29:22] I wonder if this could be added to the docs rather than telling me that "this is expected".
[18:29:33] I'd like to expect it in the future :)
[18:30:30] \o/ it works.
[18:30:38] Anyway, thanks for your help Coren.
[18:30:44] halfak: No worries.
[18:31:08] legoktm: let me remember how to do that
[18:31:21] valhallasw`cloud: i think it's just grrrit-wm
[18:31:26] gerrit-to-redis logs look fine:
[18:31:27] Coren, ssh login yields "If you are having access problems, please see:https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
[18:31:27] Permission denied (publickey)."
[18:31:27] 2015-02-13 18:30:07,044 Pushed to 0 clients
[18:31:49] yeah lolrrit-wm wasn't running
[18:31:49] !log tools.lolrrit-wm restarted
[18:31:51] Logged the message, Master
[18:32:04] hurray
[18:33:04] valhallasw`cloud: eh...not working though :/
[18:33:10] meh.
[18:33:26] 2015-02-13 18:32:46,229 Pushed to 0 clients
[18:33:30] that should be 1 client :/
[18:33:56] !log restarting gerrit-to-redis
[18:33:56] restarting is not a valid project.
[18:33:57] Cyberpower678: That means you probably don't have your key forwarded or aren't using proxycommand.
[18:34:02] legoktm: ugh
[18:34:11] legoktm: that's because we killed redis
[18:34:17] lol
[18:34:28] Coren: https://wikitech.wikimedia.org/w/index.php?title=Help%3ALAMP_issues&diff=144204&oldid=85127 look good?
[18:34:39] I wrote a script for that ;p
[18:35:21] Coren, have a manual on how to set that up so the keys stored on Wikitech preferences work? :p
[18:35:21] halfak: Feels a bit overly specific to me, but if it helps people in the future it's all good.
[18:35:31] Overly specific?
[18:36:34] 8:09‧ Cyberpower678: Ah! :-) https://wikitech.wikimedia.org/wiki/Help:Access
[18:36:53] UGGGH
[18:37:00] I must've disconnected when you said that. :p
[18:37:11] Coren, Is this not a "Help" page. Should it not try to "help"?
[18:37:32] halfak: 502 just means "can't reach whatever is behind that proxy"; those are only two of the umptillion reasons why that could happen. :-)
[18:37:37] legoktm: but the script is broken :<
[18:37:54] Yeah, having an error message that is the same for many different types of errors == :(
[18:38:23] Also, note that the docs didn't say anything about the machine being unreachable until I got here.
[18:38:23] halfak: Well, in all fairness the proxy can't possibly know why it can't reach the backend.
[18:38:52] Coren, fair enough. Seems like listing common reasons is a fine thing for a "Help page"
[18:39:03] valhallasw`cloud: :|
[18:39:05] Oh, no objection from me.
[18:39:12] legoktm: and I have no clue why :<
[18:39:36] I wonder if you could update that 502 page to say, "couldn't connect to instance via port 80" or something
[18:43:23] legoktm: it's as if there's a completely different version there :|
[18:43:39] there is.
[18:43:40] wtf.
[18:43:49] o.O
[18:46:13] ....
[18:47:34] Coren, https://wikitech.wikimedia.org/w/index.php?title=Help%3ALAMP_issues&diff=144207&oldid=85127 better?
[18:48:01] valhallasw`cloud: we should just re-write all of this in python
[18:48:07] maybe
[18:48:20] it's the python part that's breaking though
[18:48:24] oh?
[18:48:34] (PS3) Merlijn van Deen: hacky script to dump mysql subscriptions to redis [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/185644
[18:48:35] which part is in python? :P
[18:48:40] halfak: Sure. It was okay before too.
[18:48:43] :OOOOOOOOOOOO
[18:48:47] legoktm: the part which does the subscriptions
[18:48:48] it's alive!
[18:58:35] operations, Labs, hardware-requests, ops-eqiad: Can virt1000 take more ram? - https://phabricator.wikimedia.org/T89266#1037265 (RobH) a:Cmjohnson>Andrew @andrew & @coren: When did you guys want to take virt1000 down to have more memory installed? It sound like you guys are not in agreement if this is...
[18:58:51] operations, Labs, hardware-requests, ops-eqiad: virt1000 memory upgrade - https://phabricator.wikimedia.org/T89266#1037268 (RobH)
[19:15:39] operations, Labs, hardware-requests, ops-eqiad: virt1000 memory upgrade - https://phabricator.wikimedia.org/T89266#1037376 (coren) No, don't worry about it - it's definitely wanted, I'm just wondering whether it is //sufficient//. At any rate, since this may make labs clunky for a while we probably want to av...
[19:22:40] Coren, successfully accessed login.
[19:38:00] Coren, successfully accessed the other instances.
[19:38:07] legoktm: it's a weird combination of node, python and lua :/
[19:38:24] Cyberpower678: you can haz suksess.
[19:38:37] valhallasw`cloud: I'm fine with python and lua :P
[19:38:37] Coren, successfully planted a virus on labs. >:D
[19:38:42] legoktm: :p
[19:38:43] :-)
[19:39:09] Coren, which means my config file has no BOM. :p
[20:04:46] Labs, operations: Fix php5 cli conf.d symlinks on silver - https://phabricator.wikimedia.org/T89468#1037475 (Krenair)
[21:28:13] Coren, what does puppet do?
[21:29:35] Cyberpower678: In general you mean? It's configuration management. You describe the system state in its language, and as it runs it applies the necessary changes to make the system match.
[21:30:33] Apply the changes where?
[21:30:59] Wherever needed on the system depending on what you configure. It can do pretty much everything root does.
[21:31:23] How do you configure it is what I'm asking.
[21:32:33] That's... a seriously long subject for a one-liner answer. https://docs.puppetlabs.com/learning/ should be a fairly good starting point.
[21:32:59] hi folks, any word on when the ldap problem will be fixed? i have labs nodes that still can't mount /home.
[21:34:38] or is there a phab ticket where i can read more about it?
[21:36:16] Coren, where are the DB host files on Tools?
[21:37:10] jgage: I don't see one? https://phabricator.wikimedia.org/maniphest/query/iUjHz2S.Z11x/#R
[21:38:06] i found https://phabricator.wikimedia.org/T87870 which is about nfs and ldap but it doesn't mention my problem
[21:39:04] all i know is the nfs server isn't exporting /home for my project to all clients in that project
[21:39:15] and coren said it was "probably" a known ldap issue
[21:40:00] <^demon|busy> jgage: You're possibly looking for https://phabricator.wikimedia.org/T89001
[21:40:07] <^demon|busy> (which has affected several of my instances too)
[21:40:30] jgage: What instance is this?
[21:40:38] ipsec-c5, ipsec-c6
[21:40:42] thanks ^demon
[21:40:44] <^demon|busy> yw
[21:41:30] coren, those hosts have the same problem with /data/project
[21:41:40] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<12.50%)
[21:41:50] jgage: Same export, the opposite would be stunning.
[21:45:41] jgage: I can fix your issue with those instances by hand, but the underlying cause is currently mysterious.
[21:46:03] thanks. otherwise it looks like i can just create new ones and they should probably be ok now that wikitech has moved?
[21:46:41] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK
[21:47:06] I made an instance 2 days ago(?) that had this issue I think
[21:47:31] jgage: It's not clear that it will; it appears to be random-ish but it may be related to how virt1000 still has issues.
[21:48:25] jgage: At any rate, I manually fixed those issues in LDAP and the exports should appear within a few minutes.
[21:48:34] thank you sir
[21:55:27] hmm ipsec-c6 is ok now, but ipsec-c5 still isn't even though i see its ip in the exports file. i'll get lunch and check on it in a lil while.
[21:58:53] Coren, Coren, where are the DB host files on Tools?
[21:59:07] "DB host files"?
[22:00:04] Whatever defines the hosts for the DBs.
[22:00:15] such as tools-db and enwiki.labsdb
[22:00:22] Ah! /etc/hosts
[22:00:59] Does each instance need its own hosts file, or is it global?
[22:01:10] This is quite a learning experience.
[22:01:21] It's per-instance; it's a local config file.
[22:01:31] Ok.
[22:02:02] Labs: fix http://openmeetings.wmflabs.org/; currently times out - https://phabricator.wikimedia.org/T86698#1037873 (Dzahn) p:Triage>Low
[23:41:10] hi... any tool labs root around who'd like to (re)start the webservice of a project for me? I'm not a member of it...
[23:41:22] It was built for WMDE, but is down now
[23:43:03] petan: ^
[23:43:20] tree-of-life
[23:45:02] 37:e5:17:6b:d9:53:e7:f6:53:66:89:7f:14:f7:7f:3b
[23:45:32] sorry..