[00:24:53] !log phabricator role::phabricator_server renamed to role::phabricator - adjusted role on instance: phabricator (the others don't seem to use it; if there are more instances in other projects using it, please adjust)
[00:24:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[00:29:47] !log wikilabels adding self to project and admin to adjust puppet style and convert roles to profiles
[00:29:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL
[00:42:04] !log wikilabels - all instances using role::wikilabels already had "Could not find declared class ::profile::wikilabels::server" puppet error before gerrit:400252, attempting to fix that too
[00:42:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL
[00:51:22] !log wikilabels - puppet runs fixed on wikilabels-staging-01, wikilabels-01, wikilabels-experiment after https://gerrit.wikimedia.org/r/#/c/403572/
[00:51:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL
[12:24:17] Hey y'all. I'm having a bit of a hiccup with horizon. I created a new instance in a project and in the instance config, I enabled the mediawiki_vagrant class. But I don't see the mediawiki-vagrant directory or any vagrant files in the instance when I SSH in. The log doesn't indicate anything relating to this either. What am I missing?
[12:24:38] instance puppet config*
[12:25:18] Niharika: did puppet run to pick up the change?
[12:25:20] did you try forcing puppet to run?
[12:25:39] I don't think it did. How/where can I do that?
[12:25:49] Niharika: sudo puppet agent -t -v
[12:26:09] Not possible from horizon, right?
[12:26:19] from inside the instance in SSH
[12:26:36] Doing it now.
[12:27:17] It failed.
[12:27:19] https://www.irccloud.com/pastebin/O7Qe5F8h/
[12:27:48] Because it's a stretch instance? Is mw-vagrant incompatible with stretch?
[12:27:53] Niharika: perhaps there is no support for stretch in that class
[12:28:26] could you please open a phabricator task with your case so the team can review it in depth?
[12:28:32] I'm pretty sure I have another instance using stretch and running vagrant.
[12:28:36] I will. Thanks.
[14:46:46] !log tools install meltdown kernel and reboot workers 1011-1016 as jessie pilot
[14:46:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:53:47] !log tools reboot tools-exec-1401
[14:53:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:57:42] !log tools reboot tools-exec-1401 again...
[14:57:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:02:17] !log tools reboot tools-exec-1402
[15:02:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:06:19] !log tools reboot tools-exec-1403
[15:06:21] !log tools reboot tools-exec-1404
[15:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:06:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:08:43] !log tools reboot tools-exec-1405
[15:08:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:07:41] Niharika: T180377 -- I don't think that role::labs::mediawiki_vagrant works on a Stretch guest at the moment.
[16:07:42] T180377: Does role::labs::mediawiki_vagrant provision cleanly on Debian Stretch hosts? - https://phabricator.wikimedia.org/T180377
[16:08:37] which becomes super funny as soon as I land the stretch-migration branch in MediaWiki-Vagrant and make the VM/container it builds be Stretch based
[17:03:37] bd808: For grantreview, I created a new instance (jessie this time), grantreview-03. I applied the mediawiki_vagrant role, which did make a mediawiki-vagrant directory appear. I'm able to enable roles but I can't provision it or ssh in. It says "==> default: The container hasn't been created yet." I've not seen this error before. Should I also be applying the vagrant lxc role? It's not there for the scholarships instances we have.
[17:04:32] it should be picked automatically. Did you start the VM using `vagrant up` yet?
[17:04:46] the role doesn't do that automatically.
[17:05:25] it will restart an existing LXC container when the host is rebooted, but it won't make one the first time
[17:09:26] bd808: Ah, I thought the role does that.
[17:09:28] * Niharika does
[17:50:27] !log deployment-prep added Groovier1 to project members for T158909
[17:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[17:50:37] T158909: Automatically detect spambot registration using machine learning (like invisible reCAPTCHA) - https://phabricator.wikimedia.org/T158909
[17:56:44] bd808: Vagrant booted and provisioned fine. I'm still getting a 502. The web proxy is set up correctly, as far as I can see. http://grantreview-dev.wmflabs.org Do you have any idea what might be wrong?
[19:00:53] !log tools reboot tools-worker-1015
[19:00:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:30:21] Niharika: I'll take a look. It might be firewall stuff
[19:32:41] !log grantreview Added grant for port 8080 to default security group
[19:32:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Grantreview/SAL
[19:33:46] !log grantreview Changed grantreview-dev proxy to point to grantreview-03:8080
[19:33:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Grantreview/SAL
[19:34:49] Niharika: all better now I think. Looks like we need to do some hiera changes to make Apache serve up the review app instead of the wiki for that vhost name
[19:57:20] !log grantreview Deleted grantreview-01 and grantreview-02 VMs
[19:57:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Grantreview/SAL
[19:59:51] !log grantreview Set up puppet/hieradata/local.yaml on grantreview-03 so that grantreview app is served for grantreview-dev.wmflabs.org vhost
[19:59:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Grantreview/SAL
[20:01:29] Niharika: I think we can nuke the grantreview-dev.grantreview.eqiad.wmflabs VM now, but I'll wait until you give the ok that you have pulled anything you have there locally off first.
[20:02:39] !log tools depooling tools-exec-1411, tools-exec-1440, tools-webgrid-lighttpd-1419, tools-webgrid-lighttpd-1420, tools-webgrid-lighttpd-1421
[20:02:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:06:24] !log tools cordoning tools-worker-1012 and tools-worker-1017
[20:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:06:40] !log ores created ores-web-01 as Stretch instance
[20:06:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL
[20:33:17] !log tools uncordoning tools-worker-1012 and tools-worker-1017
[20:33:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:33:52] !log tools repooling tools-exec-1411, tools-exec-1440, tools-webgrid-lighttpd-1419, tools-webgrid-lighttpd-1420, tools-webgrid-lighttpd-1421
[20:33:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:01:07] !help
[21:01:08] tomthirteen: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[21:01:30] tomthirteen: what's your question?
[21:02:42] !ask
[21:02:42] Hi, how can we help you? Just ask your question.
[21:03:30] Hello, I am running SQL scripts in WinSCP that are querying and making files in Wikipedia's cloud. However, the script crashed, and now it doesn't show files in WinSCP when they are listed in the shell. I get constant error messages when I try to refresh WinSCP (see attachment). Your help is much appreciated. Tom
[21:04:14] where to see the attachment?
[21:05:01] sorry the error message reads Unexpected directory listing line 'sql enwiki_p select'. Invalid rights description 'ql enwiki'
[21:06:14] It worked for part of the script, then crashed
[21:06:26] Now I can't get WinSCP to refresh properly
[21:08:29] tomthirteen: you are executing scripts over scp?
[21:09:27] I'm loading scripts in WinSCP and executing them in the shell
[21:11:27] oh, that makes more sense :)
[21:11:53] so edit script locally, upload to your $HOME on toolforge, and then run the script
[21:12:42] so let's start at the beginning. Is your biggest problem that the script failed to run or that winscp is not reading your directory now?
[21:14:12] Thank you for your help. I can get the scripts to run, but WinSCP does not properly show the files in it. They list correctly in the shell but not in WinSCP
[21:15:10] What is toolforge?
[21:15:16] !log git upgraded gerrit on gerrit-new.wmflabs.org to include the new edit preference (merged upstream).
[21:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL
[21:17:40] andrewbogott: for dwl:taxonbot I get ssh taxonbot -> Permission denied (publickey). Was anything wring with the reboot
[21:18:04] *wrong
[21:18:16] doctaxon: that's a tool or a VM?
[21:18:20] if a VM what's the hostname?
[21:18:37] taxonbot@bastion-01
[21:20:46] andrewbogott: I try: taxonbot@bastion-01:~$ ssh taxonbot
[21:20:55] Permission denied (publickey).
[21:20:56] doctaxon: I don't think that can be related; I didn't reboot that host and it works for me.
[21:21:07] wait, you're sshing from on bastion-01?
[21:21:14] yes
[21:21:20] and taxonbot is…
[21:21:25] in what project?
[21:21:27] dwl
[21:22:05] when did you last connect to it?
[21:22:29] tomthirteen: Toolforge is the current name of "Tool Labs".
[21:23:33] doctaxon: taxonbot.dwl.eqiad.wmflabs was indeed rebooted. But it works for me, so I'm wondering if this is just a garden-variety access issue
[21:23:45] tomthirteen: there is a file in your $HOME named "Five_Nights_at_Freddys.txt;\n\nsql\ enwiki_p\ select" -- that is probably breaking WinSCP's directory parsing routines
[21:23:57] yes
[21:24:06] but I can't delete it
[21:24:07] but the ssh doesn't work
[21:24:12] tomthirteen: try renaming it on the command line.
[21:24:35] tomthirteen: would you like me to rename it for you? I think I can figure it out.
[21:24:52] one moment. let me try
[21:25:43] You will probably need to quote the filename. something like `mv "Five_N` and then hitting tab should complete the name as the shell sees it
[21:26:08] doctaxon: how about now?
[21:26:12] then close the quotes and type a new name
[21:26:42] andrewbogott: fine
[21:26:47] what was wrong
[21:27:02] that semicolon is really screwing things up
[21:27:09] I restarted nslcd which caches ldap info
[21:27:20] I don't know why it was broken but it would've reloaded itself in a few minutes either way.
[21:27:24] (well, or in < one hour)
[21:27:37] I don't know why a reboot would've done that
[21:28:27] doctaxon: anyway, let me know if you run into other troubles
[21:28:47] andrewbogott: thank you, I've been died once more
[21:29:24] sorry, I don't understand what 'I've been died once more' means
[21:29:28] andrewbogott: thanks for taking care of the reboot stuff and keeping us updated!
[21:29:38] I lost one life
[21:29:47] ^^
[21:30:05] Nettrom: sure thing. There's a lot more to come unfortunately, it's been a bad few weeks for security :(
[21:30:13] * bd808 throws doctaxon a spare 1-up mushroom
[21:30:17] doctaxon: ok :)
[21:30:28] ^^
[21:30:36] andrewbogott: yeah, sorry to hear that :(
[21:30:47] tomthirteen: yeah, it's being weird for sure. I haven't found a way to change it either yet.
[21:31:45] i've tried $ mv "Five_Nights_at_Freddys.txt;" "text.txt" mv: cannot stat ‘Five_Nights_at_Freddys.txt;’: No such file or directory
[21:31:49] it's not working
[21:32:17] the full name of that file also has a newline in it
[21:32:40] I can't get a * splat to pick it up yet either
[21:33:49] What should I do?
[21:33:51] tomthirteen: victory! for f in Five*; do mv "$f" Five_Nights_at_Freddys.txt.badname; done
[21:34:12] it's renamed now as Five_Nights_at_Freddys.txt.badname
[21:34:26] neat. What did you do?
[21:34:51] I ran `for f in Five*; do mv "$f" Five_Nights_at_Freddys.txt.badname; done`
[21:35:16] Ok if this happens again, and I think it will I need to run
[21:35:29] "$f" and then rename?
[21:36:12] the trick I used there was to let bash's "for" construct properly expand the file name
[21:36:52] tomthirteen: inspired by this stackexchange answer -- https://unix.stackexchange.com/a/218299/10171
[21:38:59] Ok, I'll "try" to understand that
[21:39:14] Anyway, thanks again for all your help. This forum is the best.
[21:39:24] All the best
[21:52:41] is there some network maintenance going on? I'm trying to connect to labs machine and get no route to host 80% of the time and very bad connection other 20%
[21:54:24] SMalyshev: that would likely be outside of the tiny part of the network we control... or bad timing as the recent meltdown reboots happened
[21:54:54] yeah but it already booted. the host is up, but the network is being bad...
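Note on the rename at [21:33:51]: the loop works because the glob `Five*` makes the shell supply the file's exact name, so nobody has to type the semicolon or the embedded newlines. What follows is a minimal sketch of that technique, not a record of what was run on Toolforge. The scratch directory and the reconstructed file name are assumptions made for illustration; only the loop itself comes from the log.

```shell
#!/bin/sh
# Sketch: recreate a file whose name contains ';', newlines and spaces,
# then rename it by letting glob expansion supply the exact name.
set -eu

# Hypothetical scratch directory, so nothing in $HOME is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Assumed reconstruction of the awkward name quoted at [21:23:45].
badname=$(printf 'Five_Nights_at_Freddys.txt;\n\nsql enwiki_p select')
touch -- "$badname"

# The glob expands to the real name; quoting "$f" keeps it a single argument,
# and "--" stops mv from treating an odd name as an option.
for f in Five*; do
    mv -- "$f" Five_Nights_at_Freddys.txt.badname
done

ls
```

If more than one file matched `Five*`, each would be moved onto the same target and overwrite the previous one, so it is worth checking what the glob matches before running a loop like this.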
[21:55:38] SMalyshev: https://wikitech.wikimedia.org/wiki/Reporting_a_connectivity_issue may have some debugging tips that would help trace the cause
[21:55:42] also looks like it's host-specific - I'm fine with wdsearch.eqiad but wdqs-test.eqiad is bad
[21:58:21] SMalyshev: what project are those hosts in?
[21:58:37] wdsearch is in search, wdqs-test is in wikidata-query
[22:00:45] groups: cannot find name for group ID 1003
[22:00:52] something weird is going on there
[22:01:41] the vm does seem sick. I got in with ssh but now the session is unresponsive
[22:02:01] or at least very slow ...
[22:02:56] ok let me try to reboot it
[22:03:36] that VM is running on labvirt1014.eqiad.wmnet. andrewbogott is that one of the labvirts that has the new kernel?
[22:04:45] looks like no
[22:04:57] bd808: no it wasn't
[22:05:09] * bd808 found the etherpad
[22:11:02] okie I rebooted it... but doesn't look like it fixed it
[22:11:10] still no route to host
[22:13:25] it gets to bastion fine, so the problem seems to be later than that
[22:15:31] SMalyshev: I got right in :/
[22:15:44] Reedy: how would i make wikibugs notify about phab repos (https://phabricator.wikimedia.org/T160973 )
[22:16:33] SMalyshev: ah, but i came in through a different bastion. Maybe that's related
[22:16:37] it connects sometimes... like 20% of tries... and then connection stalls and breaks
[22:16:47] I'm using bastion-01.bastion.eqiad.wmflabs
[22:17:29] but I'm using the same one for other hosts and it's fine
[22:18:27] (Draft2) Zppix: add #wikimedia-ve to wikibugs [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/403766 (https://phabricator.wikimedia.org/T160973)
[22:23:39] kernel:[ 952.558178] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [sudo:2508]
[22:23:45] huh this is new
[22:23:49] from the same host
[22:39:55] wdqs-test.wikidata-query.eqiad.wmflabs is among the hosts that had super broken puppet and/or other things and I was logged in to it a couple of days ago...
[22:41:31] /srv is 100% full which probably isn't helping things
[22:45:06] SMalyshev: systemd-journal is using almost 100% of the CPU on that box. I don't know what it's doing.
[22:45:11] Can you suggest some things I can delete from /srv?
[22:51:55] wdqs-test kernel: serial8250: too much work for irq4
[22:53:49] andrewbogott: yeah there are a zillion log messages about serial8250 things
[22:54:42] I'm going to update the kernel, because we'll be doing that shortly anyway
[22:55:20] Old bugs in the trackers like " With virtual machines like qemu, it's pretty common to see "too much work for irq4" messages nowadays. This happens when a bunch of output is printed on the emulated serial console. This is caused by too low PASS_LIMIT. When ISR loops more than the limit, it spits the message."
[22:55:38] OK, so, this VM has the meltdown-patched kernel
[22:55:44] but isn't running on a labvirt with a patched kernel
[22:55:50] so, that could be something...
[22:56:00] Although I guess it wasn't running the new kernel until SMalyshev rebooted it half an hour ago
[22:56:54] i see lots of this Jan 11 09:46:41 jenkins-slave-01 rc.local[670]: + '[' '!' -f /var/lib/cloud/instance/boot-finished ']' in syslog
[22:58:23] looks like it's settling down now
[22:59:06] So, I think that was maybe a red herring… blazegraph-service-0.3.0-SNAPSHOT.war is eating way more CPU
[22:59:08] yeah /srv is a problem...
[22:59:16] so, SMalyshev, would say the problem is /srv and blazegraph-service-0.3.0-SNAPSHOT.war both being unhealthy
[22:59:24] sort those things out and then see if you still have a problem
[22:59:44] if I can log in there finally I'll try to see what's up with that...
[23:00:49] ok seems that network is back to working there
[23:01:16] it comes and goes, I think depending on CPU starvation
[23:01:25] (for me, it just went again)
[23:02:21] 50M Jan 11 23:01 daemon.log
[23:02:23] wow 50m
[23:02:31] 107M Jan 8 06:25 daemon.log.1
[23:02:34] oh 107
[23:48:19] NFS requires a host-only network to be created.
[23:48:19] Please add a host-only network to the machine (with either DHCP or a
[23:48:19] static IP) for NFS to work.
[23:48:21] Thanks bd808! There's nothing important on the grantreview-dev instance so I'll just nuke it myself.
[23:48:25] why am I suddenly getting this?
[23:50:04] SMalyshev try mwvagrant reload
[23:50:08] or vagrant reload
[23:50:11] until it works
[23:50:52] reload seems to be stuck on shutdown
[23:51:04] it shuts down all lxc processes and then just sits there
[23:53:35] I have rotten luck today with VMs :(
[23:55:21] SMalyshev ah you're using jessie?
[23:55:29] that's fixed with the stretch image for mwv
[23:55:30] yes
[23:55:38] SMalyshev reboot the instance
[23:55:38] hmm I am using vagrant repo master
[23:55:41] as the workaround
[23:56:12] SMalyshev there's a stretch-migration branch but you would need to start from fresh, i.e. you won't be able to migrate your content from jessie mwv to stretch mwv.
[23:57:44] I just want to run the default setup...
[23:58:01] SMalyshev ok, the workaround is reboot the instance
[23:58:07] then mwvagrant up or vagrant up
[23:58:18] if it comes up with nfs error again, you need to repeat this.
[23:58:23] it should eventually work.
[23:58:44] ok, I think I got it to proceed to provision stage... that's better
[23:59:42] but then I get ==> default: Error: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install hhvm' returned 100: Reading pack
[23:59:52] and everything is broken from there on