[09:05:58] (CR) MarcoAurelio: "recheck" [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/436936 (owner: Libraryupgrader)
[09:06:21] (CR) jerkins-bot: [V: -1] build: Updating mediawiki/mediawiki-codesniffer to 19.0.0 [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/436936 (owner: Libraryupgrader)
[09:10:50] (CR) MarcoAurelio: "Filed T196530." [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/436936 (owner: Libraryupgrader)
[09:14:54] hello, cannot ssh into etytree-b.etytree.eqiad.wmflabs
[09:15:06] anyone who can give me tips? thanks!!
[09:15:33] I'm doing ssh -F ~/.ssh/config_labs etytree-b.etytree.eqiad.wmflabs
[09:16:19] following Accessing instances with ProxyCommand ssh option (recommended)
[09:16:19] in https://wikitech.wikimedia.org/wiki/Help:Access
[09:16:54] I get Permission denied (publickey)
[09:18:15] and added my public key to https://wikitech.wikimedia.org/wiki/Special:NovaKey
[09:18:30] sorry to wikitech
[09:19:01] Ester: did you specify ssh -i explicitly?
[09:19:15] where?
[09:19:20] ssh -F ~/.ssh/config_labs etytree-b.etytree.eqiad.wmflabs
[09:19:22] here?
[09:19:41] ah
[09:20:04] -F ~/.ssh/config_labs is missing from the proxy command
[09:20:34] what do you mean?
[09:20:43] ProxyCommand ssh -a -W %h:%p @primary.bastion.wmflabs.org
[09:20:49] ^ uses default config
[09:20:51] no -F
[09:21:42] I really don't understand why people choose to use -i or -F in the command line instead of writing it to default ~/.ssh/config
[09:22:04] sometimes you have multiple proxies
[09:22:09] could that be an answer?
[09:23:19] you can separate the config with Host or Match blocks
[09:23:25] ok
[09:23:38] eg. I have a:
[09:23:39] so, I'm not that advanced but in other words
[09:23:42] https://www.irccloud.com/pastebin/fqC8pqIr/
[09:24:02] I should just move config_labs to config?
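For reference, the two options discussed above could look roughly like the fragment below. This is only a sketch, not the configuration from wikitech: the host pattern and the `epantaleo` user name are taken from the log, while `~/.ssh/id_rsa` as the identity file is an assumption.

```
# ~/.ssh/config (sketch; adjust the user name and key path)
Host primary.bastion.wmflabs.org
    User epantaleo
    IdentityFile ~/.ssh/id_rsa

# Instances matching this pattern are reached through the bastion.
Host *.eqiad.wmflabs
    User epantaleo
    IdentityFile ~/.ssh/id_rsa
    ProxyCommand ssh -a -W %h:%p epantaleo@primary.bastion.wmflabs.org

# Alternative when the settings stay in a separate file: the inner ssh
# started by ProxyCommand does not inherit -F from the outer command,
# so the option has to be repeated there.
#   ProxyCommand ssh -F ~/.ssh/config_labs -a -W %h:%p epantaleo@primary.bastion.wmflabs.org
```

With the settings in the default file, a plain `ssh etytree-b.etytree.eqiad.wmflabs` picks up both blocks, and so does the inner `ssh` that the ProxyCommand starts.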
[09:24:12] that applies only to github uris
[09:24:19] I recommend that way
[09:24:43] but yes you can set the -F argument in ProxyCommand
[09:27:51] after that I just do ssh -etytree-b.etytree.eqiad.wmflabs
[09:27:53] ?
[09:28:07] ssh etytree-b.etytree.eqiad.wmflabs
[09:28:28] I still get Permission denied (publickey).
[09:28:29] :(
[09:29:00] can you `ssh -vvv etytree-b.etytree.eqiad.wmflabs` and paste the logs?
[09:29:58] OpenSSH_7.3p1, OpenSSL 1.0.2k 26 Jan 2017
[09:29:58] debug1: Reading configuration data /Users/esterpantaleo/.ssh/config
[09:29:58] debug1: /Users/esterpantaleo/.ssh/config line 1: Applying options for *
[09:29:58] debug1: /Users/esterpantaleo/.ssh/config line 2: Deprecated option "useroaming"
[09:29:58] debug1: /Users/esterpantaleo/.ssh/config line 4: Applying options for *.eqiad.wmflabs
[09:29:59] debug1: /Users/esterpantaleo/.ssh/config line 7: Applying options for *.wmflabs
[09:30:02] debug1: /Users/esterpantaleo/.ssh/config line 10: Applying options for *
[09:30:04] debug3: kex names ok: [curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256]
[09:30:06] debug1: Reading configuration data /usr/local/etc/ssh/ssh_config
[09:30:08] debug1: Executing proxy command: exec ssh -a -W etytree-b.etytree.eqiad.wmflabs:22 epantaleo@primary.bastion.wmflabs.org
[09:30:10] debug1: identity file /Users/esterpantaleo/.ssh/id_rsa type 1
[09:30:13] debug1: key_load_public: No such file or directory
[09:30:15] debug1: identity file /Users/esterpantaleo/.ssh/id_rsa-cert type -1
[09:30:17] debug1: Enabling compatibility mode for protocol 2.0
[09:30:19] debug1: Local version string SSH-2.0-OpenSSH_7.3
[09:30:21] debug1: permanently_drop_suid: 501
[09:30:23] Enter passphrase for key '/Users/esterpantaleo/.ssh/id_rsa':
[09:30:26] Permission denied (publickey).
[09:30:32] after I enter the passphrase
[09:30:37] i get Permission denied (publickey)
[09:31:38] next time, can you paste it somewhere? eg. pastebin.com, dpaste.de, phab paste, or toolforge paste?
[09:31:54] sure!! sorry!
[09:32:02] it looks like the proxycommand still failed
[09:32:22] can you `ssh -vvv epantaleo@primary.bastion.wmflabs.org`
[09:34:06] https://pastebin.com/V89NTyzT
[09:34:10] I'm interested in stuff that happens after `debug1: Authentications that can continue: publickey`
[09:34:15] ok
[09:35:01] debug3: receive packet: type 60 is okay
[09:35:12] I mean it means the key is accepted
[09:35:23] 51 is declined
[09:36:29] after the password input a successful auth should have type 52
[09:37:46] so the passphrase is wrong?
[09:37:50] I don't think I can check the logs on the bastion project
[09:38:20] I don't think the passphrase is wrong. in that case it would usually re ask for the passphrase
[09:38:30] if it can't decrypt the key
[09:38:49] does ssh-ing to toolforge work?
[09:39:24] how?
[09:39:44] to the tools
[09:39:46] ok I'll try
[09:41:06] https://www.irccloud.com/pastebin/9PctgnZs/
[09:41:15] does this match your public key?
[09:41:34] it's about twice as long as mine
[09:44:03] yes it does match
[09:44:22] the slashes are ok? they look weird to me
[09:44:55] yes the slashes are part of base64 encoding
[09:44:58] ok
[09:45:22] maybe there is some weird setting on my computer
[09:45:25] could it be?
[09:46:12] idk
[09:46:39] anyways, does toolforge work? I'll check the logs on toolforge if not
[09:47:03] so I just changed the ssh key
[09:47:10] do I need to update anything
[09:47:10] ?
[09:47:22] ?
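The packet types mentioned at 09:35 (60 accepted, 51 declined, 52 success) are the user authentication message numbers from RFC 4252. A minimal sketch of a helper that picks them out of `ssh -vvv` output follows; the function name and the sample text are made up for illustration, and only the `debug3: ... packet: type N` line format is taken from the log.

```python
import re

# SSH user authentication message numbers (RFC 4252).
# Type 60 is method-specific; for the publickey method it is PK_OK.
USERAUTH_MESSAGES = {
    50: "SSH_MSG_USERAUTH_REQUEST",
    51: "SSH_MSG_USERAUTH_FAILURE",
    52: "SSH_MSG_USERAUTH_SUCCESS",
    53: "SSH_MSG_USERAUTH_BANNER",
    60: "SSH_MSG_USERAUTH_PK_OK",
}

PACKET_RE = re.compile(r"debug3: (send|receive) packet: type (\d+)")


def annotate_auth_packets(log_text):
    """Return (direction, type, name) for each auth packet in the log text."""
    found = []
    for match in PACKET_RE.finditer(log_text):
        number = int(match.group(2))
        if number in USERAUTH_MESSAGES:
            found.append((match.group(1), number, USERAUTH_MESSAGES[number]))
    return found


# Hypothetical excerpt: the key is offered and accepted (60), then the
# signed request is rejected (51).
sample = """\
debug3: send packet: type 50
debug3: receive packet: type 60
debug3: send packet: type 50
debug3: receive packet: type 51
"""
for direction, number, name in annotate_auth_packets(sample):
    print(direction, number, name)
```

A 60 followed by a 51, as in the sample, means the server recognised the public key but then rejected the signed authentication request.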
[09:47:43] so this morning I generated a new ssh key pair
[09:47:45] you need to update your public key on striker or wikitech
[09:47:50] and I updated it on wikitech
[09:47:51] that should be it
[09:48:06] ssh -i ~/.ssh/id_toolforge epantaleo@login.tools.wmflabs.org
[09:48:11] gives me permission denied
[09:48:28] * zhuyifei1999_ looks at the logs
[09:49:36] !log tools T196137 aborrero@tools-clushmaster-01:~$ clush -w@all 'sudo service prometheus-node-exporter restart' <-- procs using the old uid
[09:49:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:49:39] T196137: toolforge: prometheus issue is filling up email queue - https://phabricator.wikimedia.org/T196137
[09:50:47] https://www.irccloud.com/pastebin/tnEJTPeE/
[09:50:50] Ester: ^
[09:51:24] can I ask why is '-i ~/.ssh/id_toolforge' here?
[09:56:34] 17:50:46 https://www.irccloud.com/pastebin/tnEJTPeE/
[09:56:34] 17:50:49 Ester: ^
[09:56:34] 17:51:24 can I ask why is '-i ~/.ssh/id_toolforge' here?
[09:56:50] that's how i used to connect
[09:58:33] I don't really know what 'error: key_verify: error in libcrypto' means
[09:58:50] if I do ssh epantaleo@login.tools.wmflabs.org I get Permission denied (publickey,hostbased).
[09:58:53] I still find it weird that your RSA key is really long
[09:59:08] I can generate a new one
[09:59:31] ssh-keygen -t rsa -b 4096 -C
[09:59:38] I use this command
[09:59:52] https://www.irccloud.com/pastebin/bf54srsw/
[10:00:01] oh 4096 bit key
[10:01:38] I don't see any mentions of maximum key length
[10:03:02] does 2048 bit work?
[10:03:14] oh now it works
[10:03:18] sorryyy
[10:03:26] o.O?
[10:03:31] I regenerated the key
[10:03:35] not sure...
[10:05:12] well thanks!!
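On the key-length question above: the size of an RSA public key and its SHA256 fingerprint can both be read from the `ssh-rsa AAAA...` line, which is what `ssh-keygen -l -f` reports. The following is a minimal sketch of that calculation; the function names are illustrative and it handles RSA keys only.

```python
import base64
import hashlib
import struct


def read_fields(blob):
    """Split an SSH public key blob into its length-prefixed fields."""
    fields, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack(">I", blob[offset:offset + 4])
        fields.append(blob[offset + 4:offset + 4 + length])
        offset += 4 + length
    return fields


def describe_rsa_key(public_key_line):
    """Return (modulus_bits, sha256_fingerprint) for an 'ssh-rsa AAAA...' line."""
    blob = base64.b64decode(public_key_line.split()[1])
    fields = read_fields(blob)
    # An RSA blob holds three fields: key type, public exponent, modulus.
    if len(fields) != 3 or fields[0] != b"ssh-rsa":
        raise ValueError("not an RSA public key")
    bits = int.from_bytes(fields[2], "big").bit_length()
    # OpenSSH prints the unpadded base64 of the SHA-256 digest of the blob.
    digest = hashlib.sha256(blob).digest()
    fingerprint = "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
    return bits, fingerprint
```

Passing the contents of a `.pub` file to `describe_rsa_key` returns the modulus size (for example 2048 or 4096) together with a fingerprint in the same `SHA256:...` form that sshd writes to its auth log. A 4096-bit key has a modulus twice as long as a 2048-bit one, which is why its base64 text looks about twice as long.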
[10:05:15] for your help
[10:05:32] I restarted and regenerated the key
[10:07:11] o.O
[10:07:45] Jun 6 10:03:01 tools-bastion-03 sshd[21624]: Accepted publickey for epantaleo from X.X.X.X port 43044 ssh2: RSA SHA256:7ydhLzLdL9RVJRNHLF4ia5P+aijpmYMmyPgWDWqag+k
[10:07:51] k
[10:07:59] well, that was odd
[10:08:03] Ester_: that is the key fingerprint you have in wikitech, right?
[10:09:36] zhuyifei1999_: Hi, Do you have a few minutes ?
[10:09:46] yes?
[10:09:50] arturo: not sure what you ean
[10:09:51] mean
[10:10:22] * zhuyifei1999_ is switching back and forth between IRC and watching youtube
[10:10:32] so yeah, kind of
[10:10:45] zhuyifei1999_: In toolsbeta, I am unable to run any tool on gridengine. I tried admin, test and test2. I think I may have messed up the code. Should I rebuild it ?
[10:11:17] I think it's because it's only able toone one job per queue (I should fix that)
[10:11:31] *to run
[10:11:54] use `qstat -f -u \*` to see what's going on
[10:12:07] but feel free to fix that
[10:12:12] Ester_: nevermind, I'm glad it works now
[10:13:01] thanks arturo
[10:14:12] zhuyifei1999_: ok got it. Also, can we write python 3 code in webservices ? I was getting errors on using python3 modules
[10:14:24] yes, use k8s
[10:14:34] the 'python' type is python 3
[10:14:52] oh wait, do you mean the webservice command?
[10:15:29] yes, I meant the source code of webservices command
[10:16:13] idk about the internals of the building process, so ask bd.808 :P
[10:16:36] which module are you trying to use?
[10:17:17] zhuyifei1999_: I wanted to use configparser of python 3. It was giving me errors so for the time being I switched to configparser of python 2
[10:17:28] what error?
[10:18:46] zhuyifei1999_: I think it was 'no module named 'configparser'. And the syntax was correct, since I have already tried the same code locally.
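An import error like this one usually means the code is being run by Python 2, where the module is named `ConfigParser`; Python 3 renamed it to `configparser`. A minimal sketch of an import that works under either interpreter follows; the section and option names are placeholders, not anything from the webservice source.

```python
try:
    import configparser  # Python 3 module name
except ImportError:
    import ConfigParser as configparser  # Python 2 module name

# RawConfigParser exists under both names and behaves the same for this use.
parser = configparser.RawConfigParser()
parser.add_section("example")  # placeholder section
parser.set("example", "backend", "kubernetes")  # placeholder option and value
print(parser.get("example", "backend"))
```

The fallback keeps the rest of the code identical on both versions, at the cost of being limited to the API the two modules share.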
[10:19:11] https://docs.python.org/2/library/configparser.html
[10:19:19] it's camelcased in py2
[10:19:41] zhuyifei1999_: Yes, I have switched to that
[10:20:18] zhuyifei1999_: Thanks a lot for your help :)
[10:20:22] np
[11:17:27] can I do sco to copy files from the instance to my computer?
[11:17:32] scp
[11:17:40] is there a link with instructions?
[12:25:34] (CR) Reedy: "recheck" [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/436936 (owner: Libraryupgrader)
[14:05:54] !log shinken stopping shinken for now; too noisy
[14:05:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL
[14:23:28] Guys, I'm having a problem here. Namely...
[14:23:29] ssh: connect to host tools-login.wmflabs.org port 22: Connection refused
[14:24:17] bd808: ^
[14:25:31] Now I'm getting Network is unreachable
[14:27:07] ssh cyberbot-exec-iabot-01.eqiad.wmflabs
[14:27:07] ssh: connect to host primary.bastion.wmflabs.org port 22: Network is unreachable
[14:27:07] ssh_exchange_identification: Connection closed by remote host
[14:27:20] I can't access my VPS's either. What's going on?
[14:28:26] zhuyifei1999_: ^
[14:28:32] andrewbogott: ^
[14:28:41] Cyberpower678: we are doing rolling reboots of the servers
[14:28:43] * zhuyifei1999_ looks
[14:28:47] Cyberpower678: there is an outage inducing maint in progress
[14:28:55] Oh
[14:28:58] see status in topic or emails to -cloud mailing list
[14:29:24] Why are the VM's getting rebooted?
[14:29:49] And are ALL VM's getting rebooted including those in my Cyberbot project?
[14:30:31] All presumably includes all
[14:30:32] Cyberpower678: please subscribe to https://lists.wikimedia.org/mailman/listinfo/cloud-admin
[14:31:59] Ugh. Another list.
[14:32:40] digest ftw
[14:32:46] andrewbogott: The email address you supplied is banned from this mailing list. If you think this restriction is erroneous, please contact the list owners at cloud-admin-owner@lists.wikimedia.org.
[14:32:49] WTF??
[14:33:06] What domain?
[14:33:10] gmail
[14:33:42] My Yahoo inbox sees literally hundreds of emails a day. Emails get lost in there.
[14:34:31] cloud-announce is probably what you want, not cloud-admin
[14:34:57] I think I'm already subscribed to it.
[14:35:27] the current maintenance is https://lists.wikimedia.org/pipermail/cloud-announce/2018-May/000050.html
[14:36:28] we are updating software on the servers that host the VMs as well as on the VMs themselves.
[14:36:39] bd808: has cyberbot been rebooted yet?
[14:36:57] bd808: oh, good point, thank you
[14:37:03] I'm a bit distracted :)
[14:37:34] andrewbogott: *nod* I can answer questions while you work on the servers :)
[14:37:55] yay, managers
[14:37:57] * Reedy hides
[14:38:44] Cyberpower678: I do not think so. It looks like things are just starting.
[14:39:56] Oh good. I shut down https://tools.wmflabs.org/iabot
[14:40:02] :S
[14:40:31] it seems i can't reach my tool
[14:40:43] mmecor: See the topic and https://lists.wikimedia.org/pipermail/cloud-announce/2018-May/000050.html
[14:40:50] Maintenance ongoing
[14:41:02] bd808: any etas
[14:41:14] "several hours"
[14:41:39] i guess this means the entire afternoon-evening
[14:42:04] Cyberpower678, mmecor: the first server that we rebooted is not starting back up as easily as expected. We are figuring out what is wrong and how to proceed
[14:42:15] Okay
[14:42:17] no prob. ok
[14:42:32] * Cyberpower678 really needs to get his job manager up and running on the tool
[14:55:43] andrewbogott: is it normal, that the reboot of the instance dwl and dwl.taxonbot lasts so long?
[14:56:09] doctaxon2: there have been issues we are working to debug, so no but it's known
[14:58:11] chasemp: okay, please keep me up to date here
[15:08:50] Technical Advice IRC meeting starting now in channel #wikimedia-tech - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:19:49] status update on the reboots: we have 2 labvirt hosts (the machines that actually run the VMs) down right now. We are working through kernel configuration issues that are keeping them from starting back up cleanly.
[15:52:08] tools-login is back up, finally
[16:29:26] status update: The issues with rebooting have been fixed (we needed another package of kernel modules installed). Reboots continue to roll across the OpenStack cluster
[17:22:55] because I can query `revision` by `rev_user_text` on production and it's fast, but on Toolforge it is not, you need to use `revision_userindex`
[17:23:13] this makes me feel like `revision` doesn't have the same indexes as production, no?
[17:23:45] or at least I wouldn't want to suggest that it does and that it would be efficient
[17:26:43] It actually has more indexes. However, in toolforge there are joins in it right now.
[17:27:09] The joins are there for backward compatibility during some refactors of the database. We've had problems with it already.
[17:27:21] The revision table has a temp table or two that are joined in as well.
[17:28:16] you can use `EXPLAIN SELECT *rest of the query*` to see if there is something you can point to as well.
[17:28:47] It's always possible that our replicas are missing something that was specially-added in production that I don't know about :)
[17:29:07] well that's what my application is for, to allow you to run EXPLAIN on Toolforge queries https://tools.wmflabs.org/sql-optimizer
[17:29:09] I tend to think that the joins are likely lagging things
[17:29:24] HAH! nice :)
[17:29:32] next I was going to add a "schema browser", which dumbs it down and lists the indexes for each table
[17:30:14] Here's some of those joins: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/templates/labs/db/views/maintain-views.yaml#541
[17:30:52] Where people have had issues so far is that I was joining revision into page. I had to remove that. It was too slow.
[17:31:09] There should be an index for each foreign key...there could be something missing!
[17:31:17] But even with an index, it's more work than a straight select.
[17:32:26] information_schema_p contains already metadata information
[17:32:50] Good point!
[17:33:24] I guess not the index table, but that can be added
[17:33:34] I didn't even consider that we had that :)
[17:37:33] it seems the `statistics` table isn't there, which is what I think would give me the indexes of every table
[17:38:07] ask for it :-)
[17:38:15] nobody asked for it before!
[17:39:10] musikanimal: this explain tool is rather nice. :)
[17:39:21] thanks! work in progress
[17:40:01] my goal is to make it easy to figure out why your query is slow, and suggest ways to improve it
[17:40:35] bstorm_: honestly, I gave my opinion about the views- while not breaking compatibility is nice
[17:40:54] :)
[17:40:58] if production changes and that makes the query slow, I prefer we have the bare tables
[17:41:07] this is for anomie really, not for you
[17:41:15] he was the mastermind about the views
[17:41:17] :-)
[17:41:24] heh :)
[17:41:46] !log project-proxy switching primary proxy to novaproxy-02
[17:41:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[17:42:02] * anomie looks for context
[17:42:13] anomie: just badmouthing you :-)
[17:42:16] as usual
[17:42:33] ha, ok (:
[17:43:06] in any case, the views are a temporary patch until we have a binary log on-the-fly modifier
[17:43:19] (it just happens to be a 10-year temporary patch)
[17:46:38] hey, tools-bastion-03 (IO?) is super slow - someone's running a script again?
[17:46:46] maxsem@tools-bastion-03:~$ time ls
[17:46:54] real 0m14.909s
[17:48:45] MaxSem: looks like it to me but it's so slow that I can't get a prompt
[17:48:50] I will look if my cursor ever appears
[17:49:00] I have a prompt
[17:49:02] jynus: https://phabricator.wikimedia.org/T196570 I may have worded that wrong, I only a tiny bit know what I'm talking about, but hopefully it's clear what I'm asking for
[17:49:10] it does seem slow but I don't see why
[17:49:33] chasemp: what is the magic ps result?
[17:51:14] maybe the split log?
[17:52:29] the load on its host doesn't seem that high
[17:52:43] iops is usually the culprit there
[17:54:53] https://graphite-labs.wikimedia.org/render/?width=586&height=308&_salt=1528307681.196&target=tools.tools-bastion-03.cpu.total.iowait
[17:56:34] !log project-proxy switching primary proxy to novaproxy-01
[17:56:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[18:01:10] tools-login seems better now
[18:07:10] still very slow to connect for me
[18:07:25] Hi, tools.wmflabs.org gives 502, is it known/related to today's reboots?
[18:07:37] and a lot of iowait andrewbogott https://graphite-labs.wikimedia.org/render/?width=586&height=308&_salt=1528308220.324&target=tools.tools-bastion-03.cpu.total.iowait&from=-1hours
[18:08:00] the io intensive tasks there need to killed
[18:08:08] it seems like as soon as I interact with the grid it slows down...
[18:08:09] s/to/to be/
[18:08:18] chicocvenancio: how about now, better? (My grid requests are done)
[18:09:19] Hi?
[18:09:23] jem: it's not expected… probably needs a restart. bd808 is that something you can do?
[18:09:37] Ok, thanks
[18:09:56] andrewbogott: can do. did you fail over the tools proxy too, or just the vps one?
[18:10:10] I did but it's back — that was > than an hour ago
[18:10:15] other things (e.g. openstack-browser) are fine
[18:10:35] andrewbogott: still slow (5s per ls)
[18:10:55] what is the output of iotop?
[18:10:58] chicocvenancio: ok, probably not me then since I'm logged out :)
[18:11:21] !log tools.admin Pod in CrashLoopBackOff, investigating
[18:11:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL
[18:11:46] it's usually a user doing something that seems benign, but hogs all iops
[18:12:09] iotops was all 0's when I looked before
[18:12:38] there has to be something, the iowait can't be coming from nowhere...
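On Linux, the iowait share can also be measured on the host itself from the aggregate `cpu` line of `/proc/stat`, independently of the graphs linked above. A minimal sketch follows, assuming two snapshots of that file taken a short interval apart; the sample counter values are made up.

```python
def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(value) for value in parts[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two snapshots."""
    old, new = parse_cpu_line(before), parse_cpu_line(after)
    # Field order: user nice system idle iowait irq softirq steal.
    # Guest time is already included in user/nice, so only eight are summed.
    deltas = [b - a for a, b in zip(old[:8], new[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


# Made-up snapshots standing in for two reads of /proc/stat.
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0\n"
sample_after = "cpu  110 0 55 830 105 0 0 0 0 0\n"
print(iowait_percent(sample_before, sample_after))
```

A high value here with nothing visible in `iotop` is consistent with the waiting being on remote storage rather than on a local process, though the log itself does not establish the cause.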
[18:13:04] !log tools.admin Restarting webservice (webservice --backend kubernetes php5.6 start)
[18:13:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL
[18:13:36] andrewbogott: from the graph and the start times in the ps, I'd wager on one of the dexbot's processes
[18:13:43] jem: its back for me. The cause of the crash was not obvious in the logs, but I'll try to keep an eye on it
[18:14:58] Ok, my tool is working after webservice restart, thanks, bd808
[18:15:21] chicocvenancio: there's still nothing at all on iotop or top
[18:15:31] !log tools.replag Restarting webservice, pod in CrashLoopBackOff state
[18:15:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.replag/SAL
[18:15:42] I'm seeing a trend here :/
[18:15:55] wmopbot keeps leaving and coming back.
[18:16:20] andrewbogott: now its good, zero iowait
[18:16:37] ls taking under 10ms
[18:16:51] paladox: the "(Nickname regained by services)" message makes me think there are multiple copies running? I don't know who operates that bot
[18:17:01] ah ok
[18:17:03] thanks
[18:17:40] danilo: is wmopbot yours?
[18:21:42] chicocvenancio: it is
[18:21:55] looks like it is stable now
[18:22:08] I issued a restart command in IRC
[18:22:58] Sagan: I think there still may be 2 copies. "Guest5872 (tools.wmop@wikimedia/bot/wmopbot)" and "wmopbot (tools.wmop@wikimedia/bot/wmopbot)"
[18:35:15] zhuyifei1999_: someone on Facebook appears to have gotten a 502 error on v2c
[18:35:20] bd808: andrewbogott: How goes it so far?
[18:36:02] !log tools.anni-me Restarted webservice, 25720 CrashLoopBackOff retries logged
[18:36:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.anni-me/SAL
[18:37:03] I assume the reboots are still going on?
[18:37:50] !log tools.wikibugs qdel current jobs
[18:37:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[18:37:55] !log tools.anni-me Stopped webservice. App trying to run django, but failing to load MySQLdb module. This causes an infinite start, crash, restart loop
[18:37:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.anni-me/SAL
[18:38:25] Cyberpower678: yup, still going on. I know we are over half way done at this point
[18:38:44] !tool anni-me
[18:38:54] bd808: sweet. Just ping me when I can log back on, and reactivate IABot.
[18:39:17] Cyberpower678: you can give it a shot now. many things are up, some are down
[18:39:52] Bastion was down two minutes ago. :p
[18:40:55] Cyberpower678: which bastion?
[18:41:13] The one I proxy to get to cyberbot, but it's up now
[18:41:40] bastion-01.bastion.eqiad.wmflabs has been running for almost 3 hours now
[18:43:03] !log tools.video2commons-socketio Webservice in CrashLoopBackOff due to failure to reach video-redis.video.eqiad.wmflabs:6379
[18:43:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.video2commons-socketio/SAL
[18:44:31] chicocvenancio, zhuyifei1999_: looks like tools.video2commons-socketio is yours? Its not having a fun time. 5721 CrashLoopBackOff restarts
[18:44:57] I'm not yet familiar with it
[18:45:03] but I can take a look
[18:45:40] its dying because of a failure to connect to a redis in the video project
[18:47:46] !log tools.wmf-task-samtar Restarted webservice, stuck in CrashLoopBackOff
[18:47:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wmf-task-samtar/SAL
[18:49:36] Reboots are all done.
[18:49:51] Cyberpower678: ^
[18:49:59] Yay
[18:50:12] Thanks :-)
[18:52:33] * halfak just came here to ask about when the reboots would be done :)
[18:52:48] bd808: I'm getting this
[18:52:51] https://www.irccloud.com/pastebin/iF0BytMh/
[18:54:55] tools-bastion-03 is slow again
[18:56:00] something is weird there
[18:56:02] and I don't know what
[18:56:27] * chicocvenancio is moving debugging v2c to tools-bastion-02
[18:56:53] yay rolling robots!
[18:56:55] reboots
[18:56:56] roboots
[18:58:47] o_O
[18:58:52] NFS requires a host-only network to be created.
[18:59:01] i should probably just wipe this machine and replace it with a newer vm
[18:59:07] brion: mw-vagrant startup failure?
[18:59:15] yeah, on jessie-compat branch so it's old
[19:00:04] I have seen this before and never found a reliable fix. I usually try `vagrant reload` 4-5 times and then if that doesn't work I reboot the underlying VM and cross my fingers
[19:00:10] heh
[19:00:28] its some problem with the loopback networking and LXC not playing nice
[19:00:51] there we go, it's happier on another run
[19:00:52] thx :D
[19:01:14] bd808: this error doesn't seem related to redis, did something change with webservice already?
[19:01:48] chicocvenancio: yeah, that one is a host mount from the k8s exec node not being seen in the Container.
[19:02:12] Hi.
[19:02:16] I honestly don't know why that would happen
[19:02:28] "tools-bastion-03" is basically unusable.
[19:02:30] hey Marybelle. what tool are you here to ask about? ;)
[19:02:37] Is there some other bastion host that's better?
[19:02:46] Marybelle: dev.tools.wmflabs.org
[19:02:53] I usually use whatever tools-login.wmflabs.org is pointing to.
[19:03:06] I'm thinking I want to reboot tools-bastion-03
[19:03:10] it's got issues
[19:03:15] and tools-bastion-05 seems ok
[19:03:18] so I believe it's local
[19:03:34] chasemp: +1. lots of looking has not found the cause and 02 and 05 do seem to be ok
[19:03:36] A whole three hours of uptime. :-/
[19:03:50] Thanks, I'll try one of the other hosts.
[19:04:31] Marybelle: we just finished a kernel upgrade across all the vms and hosts. A few things are being goofy on restart
[19:04:32] !log tools tools-bastion-03 is virtually unusable
[19:04:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:04:39] chasemp: Does tools-bastion-05 have an external host name?
[19:04:46] bd808: Ah, okay. :-)
[19:05:05] Marybelle: no probably not, but it can be reached through normal VPS bastion jump hosts
[19:05:10] !log tools.video2commons-socketio restarting webservice to see if it fixes host mount issue
[19:05:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.video2commons-socketio/SAL
[19:05:14] hopefully tools-bastion-03 comes back here pretty quick
[19:05:15] bd808: that did it
[19:05:39] dev.tools.wmflabs.org got me into tools-bastion-02. I'll take it.
[19:06:11] :)
[19:06:24] tools is down again.
[19:07:10] Cyberpower678: it's difficult to know what that means to you, a more specific report would be helpful
[19:07:23] login.tools.wmflabs.org seems back now and is working better for me
[19:07:25] ^I imagine it was the bastion reboot
[19:07:33] $ ssh tools-login.wmflabs.org
[19:07:33]