[05:33:33] !log tools migrating tools-worker-1012 to labvirt1017 (CPU load balancing)
[05:33:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:59:18] (CR) Lokal Profil: [C: -1] "this hides the error but does not fix the underlying issue." [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378975 (https://phabricator.wikimedia.org/T174503) (owner: Jean-Frédéric)
[13:46:49] !log tools moving tools-webgrid-lighttpd-1419.tools.eqiad.wmflabs to labvirt1017 (CPU balancing)
[13:46:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:49:17] tx andrewbogott
[13:49:44] overall the CPU situation seems fine, unless we think that grafana is lying to us due to hyperthreading
[13:49:50] (in which case it is not so fine)
[13:50:41] andrewbogott: we'll find out what the damage is next week I imagine and doing shuffling now to give us a few days of baseline seems good
[13:51:28] We can get everything squarely under 60% usage with just a few more migrations, so that's my current plan
[13:51:48] k
[13:52:40] could you share the link to the graph?
[13:54:29] andrewbogott: fyi https://phabricator.wikimedia.org/T183937#3878217
[13:54:29] arturo: I'm using https://grafana.wikimedia.org/dashboard/db/labs-capacity-planning
[13:54:54] thanks andrewbogott
[13:55:13] chasemp: looks good. It's 19 and 20 that are reserved for DB hosting?
[13:55:19] andrewbogott: yes
[13:55:29] we could in a pinch steal them and definitely temporarily
[13:55:41] andrewbogott: is 60% w/ spares preserved or without?
[13:55:45] without
[13:55:50] well that hurts
[13:55:52] k
[13:55:59] oh, sorry I misread
[13:56:05] 60% is without using the spares
[13:56:09] so, spares are still spares
[13:56:09] oh sweet
[13:56:13] ok that makes me feel much better
[13:56:20] yeah, it's not too scary
[13:56:51] good timing on 21 and 22 and we can eat spares and forward procurement timelines I imagine if we have to
[13:57:24] * andrewbogott nods
[13:58:50] andrewbogott: any clue on tools-webgrid-lighttpd-1419?
[13:58:56] oh migration ...
[13:58:57] right
[13:59:02] yeah :)
[13:59:06] heh
[14:02:11] andrewbogott: I'm going to push a handful of instances to 1018 and kick off some stress testing to try to get a baseline
[14:03:27] sounds good
[14:03:48] !log testlabs OS_TENANT_NAME=testlabs openstack server create --flavor 2 --image 85e8924b-b25d-4341-ad3e-56856d4de2cc --availability-zone host:labvirt1018 labvirt1018stresstest-1
[14:04:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[14:05:07] !log tools moving tools-webgrid-lighttpd-1417.tools.eqiad.wmflabs to labvirt1015 (CPU balancing)
[14:05:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:07:30] (PS1) Elukey: Move varnishkafka.key.pem to varnishkafka.key.private.pem [labs/private] - https://gerrit.wikimedia.org/r/402357
[14:07:46] (CR) Elukey: [V: 2 C: 2] Move varnishkafka.key.pem to varnishkafka.key.private.pem [labs/private] - https://gerrit.wikimedia.org/r/402357 (owner: Elukey)
[14:08:26] !log testlabs OS_TENANT_NAME=testlabs openstack server create --flavor 2 --image 85e8924b-b25d-4341-ad3e-56856d4de2cc --availability-zone host:labvirt1018 labvirt1018stresstest-2
[14:08:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[14:08:33] !log testlabs OS_TENANT_NAME=testlabs openstack server create --flavor 4 --image 85e8924b-b25d-4341-ad3e-56856d4de2cc --availability-zone host:labvirt1018 labvirt1018stresstest-3
[14:08:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[14:08:41] !log testlabs OS_TENANT_NAME=testlabs openstack server create --flavor 4 --image 85e8924b-b25d-4341-ad3e-56856d4de2cc --availability-zone host:labvirt1018 labvirt1018stresstest-4
[14:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[14:25:20] !log tools moving tools-webgrid-lighttpd-1420.tools.eqiad.wmflabs to labvirt1015 (CPU balancing)
[14:25:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:37:35] !log testlabs sudo cumin --force D{labvirt1018stresstest-[1-5].testlabs.eqiad.wmflabs} 'sudo screen -d -m "stress-ng --timeout 600 --fork 4 --cpu 1 --io 2 --vm 1 --vm-bytes 1G --switch 5"'
[14:37:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[14:47:10] !log tools moving tools-webgrid-lighttpd-1421.tools.eqiad.wmflabs to labvirt1017 (CPU balancing)
[14:47:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:02:26] !log tools moving tools-exec-1440.tools.eqiad.wmflabs to labvirt1017 (CPU balancing)
[15:02:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:05:31] !log tools.zppixbot restarted for config changes
[15:05:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[15:18:10] !log tools moving tools-exec-1411.tools.eqiad.wmflabs to labvirt1017 (CPU balancing)
[15:18:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:32:40] !log tools moving tools-exec-1420.tools.eqiad.wmflabs to labvirt1015 (CPU balancing)
[15:32:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:45:06] bd808: could you get a gdb python traceback of PID 4762 on tools-exec-1415 ?
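The plan discussed above, getting every labvirt host squarely under 60% CPU with a few more migrations, can be illustrated with a small greedy planner. This is a hypothetical sketch, not the procedure the admins actually followed: it assumes identical hosts, treats each VM's CPU use as a fixed integer percentage of one host, and the host names, VM names and loads are all invented.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    # Hypothetical: VM name -> CPU use as an integer percentage of one host.
    vms: dict[str, int] = field(default_factory=dict)

    @property
    def load(self) -> int:
        return sum(self.vms.values())


def plan_migrations(hosts: list[Host], threshold: int = 60) -> list[tuple[str, str, str]]:
    """Greedily move VMs off overloaded hosts until every host is <= threshold.

    Mutates `hosts` in place and returns (vm, source, destination) moves.
    Raises RuntimeError when a needed move would overload its destination.
    """
    moves = []
    # Drain the busiest hosts first.
    for src in sorted(hosts, key=lambda h: h.load, reverse=True):
        while src.load > threshold:
            excess = src.load - threshold
            by_size = sorted(src.vms.items(), key=lambda kv: kv[1])
            # Smallest VM that clears the excess in one move, else the largest VM.
            fits = [kv for kv in by_size if kv[1] >= excess]
            vm, cpu = fits[0] if fits else by_size[-1]
            # Send it to the least-loaded other host, if that host stays under the threshold.
            dst = min((h for h in hosts if h is not src), key=lambda h: h.load)
            if dst.load + cpu > threshold:
                raise RuntimeError(f"no host can take {vm} ({cpu}%) and stay <= {threshold}%")
            dst.vms[vm] = src.vms.pop(vm)
            moves.append((vm, src.name, dst.name))
    return moves


if __name__ == "__main__":
    hosts = [
        Host("virt-a", {"vm1": 30, "vm2": 25, "vm3": 20}),  # 75%
        Host("virt-b", {"vm4": 35, "vm5": 30}),             # 65%
        Host("virt-c", {"vm6": 5}),                         # 5%
    ]
    for move in plan_migrations(hosts):
        print(move)
    print({h.name: h.load for h in hosts})
```

With these made-up numbers two moves suffice (vm3 and vm5 both land on virt-c), leaving the hosts at 55%, 35% and 55%. Integer percentages avoid floating-point rounding right at the threshold. A real plan would also have to weigh RAM, disk, and the hosts reserved for DB hosting (19 and 20), and a greedy heuristic can fail where a smarter packing would succeed.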
[15:45:33] it looks like deadlocked somewhere
[15:49:01] (https://commons.wikimedia.org/wiki/User_talk:Zhuyifei1999#SignBot_stopped_working)
[16:01:43] !log tools moving tools-worker-1017 to labvirt1017 (CPU balancing)
[16:01:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:15:50] zhuyifei1999_: I can't get it to give me a py-bt, but the raw bt is at https://phabricator.wikimedia.org/P6539
[16:16:08] ok
[16:16:18] could you also show the other threads?
[16:17:01] bd808: did you install -dev packages? for example python2.7-dev
[16:17:15] /usr/share/doc/python2.7/README.debug
[16:17:15] arturo: yes, they are installed
[16:17:28] I also tried the virtualenv trick
[16:17:29] https://www.irccloud.com/pastebin/wML3pq6u/
[16:18:06] the py-bt macro is loaded, it just reports "#12 (unable to read python frame information)"
[16:18:18] -_-
[16:19:39] zhuyifei1999_: the paste is updated with "info threads"
[16:19:51] ok thanks
[16:20:47] * zhuyifei1999_ is finding that backtrace weird, I'll compare it with other traces (not that familiar with cpython internals)
[16:22:30] !log tools moving tools-worker-1027 to labvirt1015 (CPU balancing)
[16:22:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:26:45] !log ores deleting ores-staging-02 to recreate as a stretch instance.
[18:26:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL
[19:31:15] Steinsplitter: I see your bot was the last to update https://commons.wikimedia.org/w/index.php?title=File:WikiMunzee.png&action=info -- The code is no longer valid and needs to be updated with the wiki code in https://69852172-a-62cb3a1a-s-sites.googlegroups.com/site/armchairmunzees1234/armchair-munzees/munzees8.png?attachauth=ANoY7cquDB_7nmhtcz2nJ22yIEZgqEFm3kDoj__MoQzVIiEjHEr3cW4v50xeU7gu9EnDUqy6XESdIXyp59eUBZZgAdgP8jdCVVG9UJ-rFasF
[19:31:16] pT-v5jzsR0ye7N6zQWAfyTMd8r_72dWThX2Jb8ctU5EUOtbcSpV7i-QzaaVdJMgnfCaduxF-3_yDWx_em_IwYHn7CoyCUmmM0GwNmI32RxJEuctbRo9wTfkIf6s79YeYpPelPNveJp7LQNawlBZCIvr870zbuN2L&attredirects=0
[19:31:40] How do I get the bot to get the updated QR code and update it?
[19:33:28] my bot did not upload the file: https://commons.wikimedia.org/w/index.php?title=File:WikiMunzee.png#filehistory please contact the original uploader.
[20:38:54] (PS1) ArielGlenn: add scap keys for dumpsdeploy for beta [labs/private] - https://gerrit.wikimedia.org/r/402426
[21:11:04] (CR) ArielGlenn: [V: 2 C: 2] add scap keys for dumpsdeploy for beta [labs/private] - https://gerrit.wikimedia.org/r/402426 (owner: ArielGlenn)
[21:59:54] !log tools.zppixbot restarted for config changes
[21:59:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[22:46:14] !help Hello, can someone help me? I have an error when I connect to my instance.
[22:46:14] Lofhi: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[22:46:41] Lofhi: sure! what's the instance and what's the error?
[22:49:01] bd808: thanks, I use lophi as login and when I open a session, I get the error server unexpectedly closed network connection :(
[22:49:25] I followed the instructions on https://wikitech.wikimedia.org/wiki/Help:Putty
[22:50:20] Lofhi: what server are you trying to connect to?
[22:50:54] bastion.wmflabs.org as written, I did wrong?
[23:00:20] Lofhi, so you can't successfully log into bastion?
[23:00:34] you always get unexpectedly closed network connection?
[23:00:51] could be someone trying to block SSH
[23:01:25] It seems not, yet I think I followed the instructions correctly. I regenerated a key twice to check!
[23:01:30] Lofhi, can you try SSHing to gerrit.wikimedia.org port 29418 with the same username and key?
[23:01:38] of course
[23:01:53] do you use pageant.exe with putty to load your SSH key and minimize to desktop tray?
[23:02:05] Yes
[23:02:19] and confirmed the (right) key is loaded in there?
[23:02:45] I see the key in "View keys"
[23:02:49] the gerrit key and the wikitech key can be different keys
[23:02:53] or the same
[23:03:26] Lofhi, so can you successfully get to gerrit?
[23:04:08] Strange, I get another error on gerrit
[23:04:20] which one?
[23:04:28] No supported authentication methods available
[23:04:32] uhhh
[23:04:33] hm
[23:04:43] that might be because you haven't uploaded a key to gerrit
[23:04:53] mh
[23:05:04] can you log in using your wikitech username and password to gerrit.wikimedia.org and go to https://gerrit.wikimedia.org/r/#/settings/ssh-keys and add the key there too?
[23:05:09] then try again?
[23:05:17] yes
[23:05:47] !log ores staged ores-wmflabs-deploy:8d252de
[23:05:57] the reason I'm checking gerrit is it listens for SSH connections on a different port
[23:06:11] ...stashbot?
[23:06:20] Bueller?
[23:06:21] stashbot is dead
[23:06:26] hmmm
[23:06:32] Nooo. Whyyy :(
[23:06:35] let me poke at stashbot
[23:06:37] Poor bot
[23:06:41] thanks madhuvishy
[23:06:57] let’s start a GoFundMe to send Stashbot a “Get well soon” Card
[23:07:15] I'm back, success with gerrit, using login: lofhi
[23:07:20] * Hauskatze sends some €
[23:07:36] Lofhi: are you a member of tools or any other VPS project?
[23:08:01] Lofhi, what do you see when sshing to gerrit?
[23:08:13] madhuvishy, looks like they're not
[23:08:42] Yes, the tool is here : https://toolsadmin.wikimedia.org/tools/id/lebot
[23:08:53] I'm on PAWS too
[23:08:57] that would explain why they can't log in to vps bastion - I think bastion membership is granted implicitly
[23:09:10] hmmm
[23:09:25] yes
[23:09:27] I think so too
[23:09:39] Lofhi: can you try logging into username@login.tools.wmflabs.org
[23:09:49] krenair@bastion-01:~$ groups lofhi
[23:09:49] lofhi : wikidev
[23:09:52] no project-tools
[23:10:07] wait, i have a question
[23:10:32] huh weird
[23:10:54] hold on
[23:10:54] When I created an account on "tool", I couldn't use my real username "Lofhi", maybe it's because of that?
[23:11:02] So I'm using Lophi
[23:11:03] Hi there, I'm in the middle of a Discourse installation. It is asking me for SMTP details.
[23:11:06] I don't know why
[23:11:08] your name on that tool is Lophi
[23:11:12] not Lofhi
[23:11:14] Yes
[23:11:20] krenair@tools-bastion-03:~$ groups lophi
[23:11:21] lophi : wikidev project-tools tools.lebot
[23:11:26] you should be able to log in
[23:11:26] But why Lofhi is taken?
[23:11:30] I see that https://discourse.wmflabs.org seems to have mx1001.wikimedia.org defined as SMTP server
[23:11:46] Lofhi, someone got there before you
[23:11:52] strange, but okay
[23:11:53] but then it is also asking for an SMTP username and password. I wonder what to add there.
[23:12:07] Lofhi, you can run `ldapsearch -LLLx cn=Lofhi mail` and ask them :)
[23:12:40] labs instances using prod SMTP?
[23:12:50] wouldn't expect that to work
[23:13:26] Lofhi, okay so
[23:13:27] https://discourse.wmflabs.org does send notifications
[23:13:30] madhuvishy, it works with lophi@login.tools.wmflabs.org
[23:13:40] cool
[23:13:41] No error
[23:13:46] what were you doing when it was erroring then?
[23:13:48] that's the right place to go for tools
[23:13:56] he was trying to get to bastion.wmflabs
[23:14:05] well that should be fine
[23:14:05] ok, if there is no supersimple answer I can look deeper to https://discourse.wmflabs.org configuration
[23:14:18] In fact I don't even know why tools is directly exposed like it is right now
[23:14:49] with my config I can get to the same host that login.tools.wmflabs.org does simply by 'ssh tools-bastion-03'
[23:15:02] which takes me via the proper bastion
[23:15:08] bastion is used for what?
[23:15:24] I followed the wiki page so...
[23:15:25] bd808: stashbot seems to be failing to talk to elasticsearch
[23:15:27] SSHing into labs instances that don't have public IPs and exposed SSH
[23:15:41] ideally everything in labs would block SSH apart from bastion
[23:16:08] madhuvishy: hmmm... seeing that. dns failures
[23:16:08] Lofhi: right, for tools I recommend going to https://wikitech.wikimedia.org/wiki/Portal:Toolforge
[23:16:43] "In fact I don't even know why tools is directly exposed like it is right now" -- because jump servers were thought to be difficult to explain mostly I think
[23:17:04] every other project manages
[23:17:20] Toolforge is not every other project :)
[23:17:29] and these decisions were made long ago
[23:17:42] we could revisit them
[23:17:48] root pts/34 bastion-restrict Mon04 4days 0.14s 0.14s -bash
[23:17:50] idle for 4 days, pff
[23:17:52] Ok, I will remember madhuvishy, thanks
[23:18:02] wish my SSH sessions would stay open idle for more than 10 minutes
[23:18:03] Thanks bd808 and Krenair too
[23:18:05] np
[23:18:47] Lofhi: yw!
[23:19:47] madhuvishy: I killed the pod. let's hope the DNS failures were transient
[23:19:53] * bd808 tails the log
[23:20:30] bd808: I did try killing the pod once
[23:20:58] madhuvishy: I think you restarted the webservice rather than the bot
[23:21:04] but it seems slightly happier this time? stashbot isn't back yet though
[23:21:09] I did both :)
[23:21:46] hmmm... I asw 2d uptime on the pod
[23:21:50] *saw
[23:22:12] but yea something is not right very early in the startup
[23:22:15] Can I bother you a little bit more?
[23:22:35] Lofhi: sure! I'll even try not to wander away this time
[23:23:01] It seems that I created Lofhi on Wikitech, but I never had access to it.
[23:23:04] https://wikitech.wikimedia.org/wiki/Special:Log/Lofhi
[23:23:16] The date matches my attempts
[23:23:58] madhuvishy: sal is busted too. :/ Either tools-elastic is down or something worse is up in k8s land
[23:24:08] hmmm
[23:24:33] * bd808 tries a tool that doesn't use ES but does need DNS
[23:25:27] bd808: openstack browser is also busted
[23:25:49] madhuvishy: yeah, versions can't talk to noc
[23:25:57] we have problems
[23:26:14] Lofhi: that account exists in LDAP. It seems to have an email address that will make it unrecoverable however
[23:26:40] mh
[23:26:44] https://www.irccloud.com/pastebin/CfVHDBVa/
[23:27:09] I assume usurping it would be extremely complicated?
[23:27:52] bd808, do we need a Wikitech account to use "PAWS"?
[23:28:12] No, just a Wikimedia account (Meta or otherwise)
[23:28:14] Lofhi: no, PAWS uses your normal Wikimedia account
[23:28:26] mh...
[23:28:34] (also called an SUL account)
[23:28:52] bd808: are you able to restart Stashbot?
[23:28:57] madhuvishy and I need to duck out to figure out what's wrong with Kubernetes
[23:29:12] Hauskatze: no, it's not starting properly. see the topic :/
[23:29:21] ah, oh
[23:29:24] to be fair the topic doesn't specify stashbot
[23:29:38] I believe they're working on fixing a broader problem that affects stashbot, Hauskatze
[23:29:46] When I tried to create an account with "Lofhi" to access PAWS, I had a fatal error, could that explain it? With https://tools.wmflabs.org/paws/hub
[23:29:49] Krenair: ack
[23:30:01] whatever kubernetes is :)
[23:30:14] a system that stashbot runs inside
[23:30:53] All right, page gives a 504 Gateway Time-out error now
[23:30:53] ...
[23:31:09] same
[23:31:20] I had a fatal error when I create Lofhi
[23:31:24] So
[23:31:49] let's let them fix the current thing they're on and we'll see if paws starts working. if not they can look at that too
[23:31:53] Hauskatze: https://kubernetes.io/ as for what kubernetes is
[23:32:07] heh, BIOT domain
[23:32:20] British Indian Ocean Territory
[23:32:22] .io gets abused like mad
[23:32:35] okay, ty
[23:32:37] iirc BIOT is mostly military use
[23:33:19] from the wikipedia article on BIOT: "The only inhabitants are US and British military personnel and associated contractors, who collectively number around 2,500 (2012 figures).[3] "
[23:33:19] https://biot.gov.io/
[23:33:35] .io does get abused yes, but lots of legitimate open source software uses it
[23:33:43] for their websites
[23:34:13] "The British Indian Ocean Territory (BIOT), an archipelago of 58 islands covering some 640,000 sq km of ocean, is a British Overseas Territory. It is administered from London and is located approximately halfway between East Africa and Indonesia.
[23:34:14] Access is restricted and a permit is required in advance of travel."
[23:34:48] Also not a very trustworthy TLD: https://www.theregister.co.uk/2017/07/10/io_hijacking_in_transition_cockup/
[23:35:15] guy managed to register some domains that were actually listed as .io's NS records
[23:45:18] Hauskatze
[23:47:15] Krenair: :)
[23:49:20] !log tools Run clush -w @k8s-worker -x tools-worker-1001.tools.eqiad.wmflabs 'sudo service docker restart; sudo service flannel restart; sudo service kubelet restart; sudo service kube-proxy restart' on tools-clushmaster-01
[23:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:50:17] bd808: On T181925, you killed one too much. alswiki is still a thing.
[23:50:18] T181925: Remove als.wik(ibooks|iquote|tionary), mo.wik(ipedia|tionary) views from replicas - https://phabricator.wikimedia.org/T181925
[23:51:11] alswikibooks, alswikiquote and alswiktionary now redirect to alswiki; the views for the former three were meant to be dropped, the database for alswiki wasn't.
[23:53:37] eddiegp: hello, thanks for reporting that! we are in the middle of a different outage, so bd808 will probably respond in a bit
[23:55:50] No problem, but I was about to go to bed when I read the bugmail and decided I should mention that sooner rather than later, so I'll not wait ;) I'll reopen and leave a comment on the task.
[23:56:43] eddiegp: oops. It should be easy for me to fix it
[23:58:46] eddiegp: please do reopen the ticket and I'll get to it as soon as the rest of the fire we are fighting is out
[23:58:50] bd808: I've just reopened the task.
[23:59:14] thanks!
[23:59:17] bd808: Well, good luck for that from me, and good night ;)
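The T181925 report is about a drop list that caught one database too many. The log does not say how the view list was actually produced, so the following is only a hypothetical guard, assuming the task title's domain patterns are the source of truth. It expands each pattern into domains, maps them to database names using the usual MediaWiki convention (Wikipedia projects take a plain `wiki` suffix, so mo.wikipedia becomes `mowiki`, while other projects keep their name, e.g. `alswiktionary`), and refuses to proceed if a name on an explicit keep list, here `alswiki`, would be dropped. The pattern syntax handled and all function names are assumptions made for illustration.

```python
import re


def expand(pattern: str) -> list[str]:
    """Expand a pattern like 'als.wik(ibooks|iquote|tionary)' into domain names."""
    m = re.fullmatch(r"([a-z-]+)\.(\w+)\(([^)]*)\)", pattern)
    if m is None:
        raise ValueError(f"unrecognised pattern: {pattern!r}")
    lang, stem, alternatives = m.groups()
    return [f"{lang}.{stem}{alt}" for alt in alternatives.split("|")]


def dbname(domain: str) -> str:
    """Map 'lang.project' to a database name ('wikipedia' becomes the 'wiki' suffix)."""
    lang, project = domain.split(".", 1)
    suffix = "wiki" if project == "wikipedia" else project
    # Hyphens in language codes become underscores in database names.
    return lang.replace("-", "_") + suffix


def views_to_drop(patterns: list[str], keep: set[str]) -> set[str]:
    """Return the database names whose views should go, refusing to touch `keep`."""
    drop = {dbname(d) for p in patterns for d in expand(p)}
    clash = drop & keep
    if clash:
        raise ValueError(f"refusing to drop views for live wikis: {sorted(clash)}")
    return drop


if __name__ == "__main__":
    patterns = ["als.wik(ibooks|iquote|tionary)", "mo.wik(ipedia|tionary)"]
    print(sorted(views_to_drop(patterns, keep={"alswiki"})))
```

With the two patterns from the task title this yields alswikibooks, alswikiquote, alswiktionary, mowiki and mowiktionary. Any pattern that also expands to als.wikipedia, for example `als.wik(ipedia|ibooks)`, fails loudly instead of silently dropping the alswiki views.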