[00:34:17] !log tools.zoomviewer Deleting grid job log files older than 7 days directly on the NFS backend server (T248188)
[00:34:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zoomviewer/SAL
[00:34:20] T248188: Zoomviewer has ~450,000 files in NFS home directory - https://phabricator.wikimedia.org/T248188
[00:49:00] wow, 395G on .err logs
[03:38:52] !log tools.toolviews Hard stop/start cycle to enable --canonical
[03:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.toolviews/SAL
[10:10:34] !log tools.zppixbot-test drop remind from exclude && git pull && reboot twice for tests when travis passes
[10:10:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot-test/SAL
[10:40:27] !log tools.zppixbot drop remind and use sopel default
[10:40:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[12:34:10] !log paws created and transferred DNS zone `svc.paws.eqiad1.wikimedia.cloud` (T211096)
[12:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[12:34:12] T211096: PAWS: Rebuild and upgrade Kubernetes - https://phabricator.wikimedia.org/T211096
[12:48:46] !log paws created record `k8s.svc.paws.eqiad1.wikimedia.cloud` pointing to `172.16.0.191` (which is paws-k8s-haproxy-1) (T211096)
[12:48:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[12:48:49] T211096: PAWS: Rebuild and upgrade Kubernetes - https://phabricator.wikimedia.org/T211096
[14:39:46] !log paws point record `k8s.svc.paws.eqiad1.wikimedia.cloud` to `172.16.1.186` (which is paws-k8s-control-1, for the initial bootstrap) (T211096)
[14:39:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[14:39:50] T211096: PAWS: Rebuild and upgrade Kubernetes - https://phabricator.wikimedia.org/T211096
[14:53:37] !log paws adding the hiera values to horizon for bootstrapping k8s T211096
[14:53:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[14:53:41] T211096: PAWS: Rebuild and upgrade Kubernetes - https://phabricator.wikimedia.org/T211096
[15:41:32] !log admin failing neutron over from cloudnet1003 to cloudnet1004
[15:48:25] !log admin re-imaging cloudnet1003 with Buster
[15:48:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:01:30] !log tools.stewardbots Investigating StewardBot's outage
[16:01:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[16:04:16] !log tools.stewardbots Restart StewardBot
[16:04:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[16:07:53] cteam: herron is here to help us with the email thing. We have a VM in toolsbeta that he can't ssh to because of some weirdness in the SSH config
[16:08:09] which host?
[16:08:23] trying to ssh to toolsbeta-mail-01.toolsbeta.eqiad.wmflabs
[16:08:31] ok, checking...
[16:09:02] I have some labs blocks in my .ssh/config from a while back that are possibly stale
[16:09:46] could use a quick run-down of which bastion should be used and which key has been deployed to the host itself, I have a wmf_lab and a wmf_lab_root key that are used for different hostname patterns (not sure if that's still needed or if I even still have that access)
[16:10:39] herron: can you try ssh -i ~/path/to/root/key root@toolsbeta-mail-01.tools.eqiad.wmflabs ?
[16:10:59] I see things in auth.log that look like your config is correct and you maybe just don't have permission to ssh to that host
[16:11:03] which the root key should override
[16:12:30] hmmm no dice
[16:12:45] ok, /that/ one didn't even get as far as reaching the host I think
[16:13:15] mind is...
[16:13:18] *mine
[16:13:21] https://www.irccloud.com/pastebin/adwGLfvq/
[16:13:57] I also have
[16:13:59] https://www.irccloud.com/pastebin/bIhk4Wi4/
[16:14:15] oh whoops the host is toolsbeta-mail-01.toolsbeta.eqiad.wmflabs instead of toolsbeta-mail-01.tools.eqiad.wmflabs
[16:14:53] using the root key works!
[16:15:22] so in this env should I ssh in as root normally?
[16:15:41] In particular for toolforge, since we have restricted access to most VMs
[16:15:55] Most cloud-vps projects have permissive any-member-can-do-anything policies
[16:16:04] but not toolforge, where most members only ever use the bastions
[16:16:21] I don't have in mind what the actual policies are there, I just know using the root key will skip them all :)
[16:16:35] outside of toolforge you should use your normal/personal key
[16:16:44] * andrewbogott hopes this is making sense
[16:16:50] awesome, ssh_config updated to do that
[16:17:03] yes makes sense, thanks!
[17:41:12] hello everybody. Please, can you tell me which mysql server can I use from the Cloud instances? I'm trying the ones on the documentation, but none of them seems to work..
[17:41:42] what is the server you are trying to use?
[17:42:02] and what error are you getting?
[17:42:07] tools.db.svc.eqiad.wmflabs
[17:42:20] OperationalError: (2003, "Can't connect to MySQL server on
[17:42:39] I've also tried the IP address that I get from PAWS 10.110.98.159
[17:42:41] same error
[17:42:52] also tried enwiki.labsdb
[17:42:56] same error
[17:43:25] are you using Toolforge or Cloud VPS?
[17:43:39] dsaez: where are you trying to connect from?
[17:44:13] * bd808 just verified that `sql toolsdb` works from a bastion
[17:44:42] cloud instance covid-data.wmf-research-tools.eqiad.wmflabs
[17:45:42] I need to connect from python, so I'm trying this:
[17:45:57] import pymysql.cursors
[17:45:57] connection = pymysql.connect(host='tools.db.svc.eqiad.wmflabs:3306',
[17:45:57] user=myuser,
[17:45:57] password=mypass,
[17:45:57] db='enwiki_p',
[17:45:58] charset='utf8',
[17:46:00] cursorclass=pymysql.cursors.DictCursor)
[17:46:12] myuser and mypass I got from toolforge replica.cnf
[17:46:17] please use a pastebin for multiple lines of text next time
[17:46:40] * dsaez googling pastebin
[17:47:16] dsaez: `telnet tools.db.svc.eqiad.wmflabs 3306` from that instance gives me a MariaDB connection
[17:47:36] so the tcp connection is not blocked
[17:47:58] dsaez: and easy pastebin is https://phabricator.wikimedia.org/paste/edit/form/14/
[17:48:04] *an easy
[17:48:29] got it, thx
[17:48:56] try host='tools.db.svc.eqiad.wmflabs' without the port on the end
[17:49:08] the default is 3306 anyway
[17:49:15] ok
[17:50:20] AntiComposite, that works!!
[17:50:26] amazing.
[17:50:27] yeah, I think AntiComposite is on the right track. The docs for pymysql.connections.Connection have separate host and port params
[17:51:00] Ok, I'll search back that piece of code and fix it, I copied from some wikitech documentation
[17:51:23] great, thank you very much AntiComposite and bd808
[17:54:54] https://wikitech.wikimedia.org/wiki/User:Legoktm/toolforge_library can make connecting to the databases easier
[17:57:21] wow, cool, I'll check that
[19:07:50] hello! I noticed the images for k8s changed tags... when did they change?
[19:13:48] davidwbarratt: https://wikitech.wikimedia.org/wiki/News/2020_Kubernetes_cluster_migration
[19:16:18] !log admin systemctl disable block_sync-tools-project.service on cloudbackup2001.codfw.wmnet to avoid stepping on current upgrade
[19:16:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:18:32] so it looks like the node10 shell has a current version of npm. I'm running npm install which seems to work fine, but I guess my project's bin folder isn't added to the PATH because I get undefined postinstall: `webpack`
[19:19:17] which seems to imply that webpack can't be found (even though it's in node_modules/.bin/webpack)
[19:19:35] maybe I should manually add node_modules/.bin to the PATH?
[19:23:32] !log admin disabling puppet on cloudbackup2001 to prevent the backup job from starting during maintenance
[19:23:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[21:20:01] !log tools.ldap Stopping webservice for T253346
[21:20:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ldap/SAL
[21:20:34] I have this GitHub Action and it works beautifully, but when I try to do the same thing on toolforge it fails. :( https://github.com/wikimedia/InteractionTimeline/blob/deb65a4ffc06fff7e6fa8a9b4bbf1c15ee3e7b15/.github/workflows/build.yml#L21-L29
[21:24:25] !log tools.ldap refs. T253346
[21:24:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ldap/SAL
[21:25:22] !log tools.ldap "webservice --backend=kubernetes --canonical python3.7 start" refs. T253346
[21:25:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ldap/SAL
[21:25:59] legoktm: cc re T253346 :)
[21:25:59] T253346: Prepare to work with toolforge.org domain - https://phabricator.wikimedia.org/T253346
[21:26:12] can I copy files to a tool using SCP?
[21:27:48] hauskatze: awesome!
[21:28:09] davidwbarratt: yes. You may need to use `take` after to fix permissions though
[21:28:34] davidwbarratt: yes, but you have to do it as your user rather than the tool and then you will have to run `take` on the files to fix the ownership. Something like `scp /local/path me@login.toolforge.org:/data/project/$TOOL/some/dir`
[21:28:53] davidwbarratt: Yup. I used WinSCP on some of my tools in the past, but you'll need to `take` or `chmod` files afterwards
[21:28:54] ah! thanks
[21:29:12] FYI these docs are out of date: https://www.mediawiki.org/wiki/Toolserver:Transferring_files
[21:29:39] davidwbarratt: that's not the place to look for anything
[21:29:41] davidwbarratt: that's the old, dead Toolserver, not Toolforge
[21:29:54] https://wikitech.wikimedia.org/wiki/Help:Toolforge
[21:29:55] right, I just googled it and that's what came up
[21:30:15] you googled for the wrong project name too I'd guess
[21:30:26] toolserver !== toolforge
[21:31:00] hauskatze: fyi, all of the flask tools I've written should be fully compatible with the new domain, no changes needed.
[21:31:02] davidwbarratt: "Toolserver has been replaced by Toolforge. As such, the instructions here may no longer work, but may still be of historical interest." -- seems well marked
[21:31:48] legoktm: awesome, less work :-) Although I could get this one done 'cause I had access to it
[21:32:27] I'll leave the others up to you or for bd808 when he deactivates the old URLs :-)
[21:33:02] hauskatze: when we flip that switch things will either work, or be broken until folks come to fix them
[21:33:16] I'm not going to hand fix 1000 webservices
[21:33:28] I'm just a jerk like that :)
[21:33:36] It's totally understandable bd808
[21:33:49] No jerkiness (does this word exist?) there IMHO
[21:34:24] bd808 I know, there just isn't an equivalent page for Toolforge.
[21:34:27] or not one that I could find
[21:35:20] davidwbarratt: probably https://wikitech.wikimedia.org/wiki/Help:Access_to_Toolforge_instances_with_PuTTY_and_WinSCP
[21:35:38] that's bent towards gui clients, but covers the content
[21:36:21] legoktm: also, maybe wrong channel but since you're here... There's a week-old mw+2 access request that could use a gerrit admin :-)
[21:36:30] * hauskatze hides
[21:36:44] davidwbarratt: I found that by going to https://wikitech.wikimedia.org/wiki/Help:Toolforge and typing 'scp' into the sidebar search
[21:36:54] yeah I saw that
[21:37:18] Great... https://ldap.toolforge.org/user/l10nupdate
[21:37:26] "The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."
[21:37:48] bd808 ok, so I split this up into a build script, but when I try to run `npm run build` within the container, either locally or in a GitHub action (also within the container) it works perfectly, on toolforge it fails
[21:38:26] I'm assuming that when I do node10 shell I am getting this image: docker-registry.tools.wmflabs.org/toolforge-node10-sssd-web
[21:39:31] davidwbarratt: can you paste logs of what exactly is failing?
[21:40:35] hauskatze: oh let me see
[21:40:37] legoktm yep, though it's not very helpful: https://phabricator.wikimedia.org/P11271
[21:41:21] probably you're hitting a memory limit
[21:41:31] oooo, is there a way to increase it?
[21:41:32] yeah, I bet it is an oom
[21:41:40] almost. :/
[21:41:52] The code exists but I have not pushed the package yet
[21:41:53] legoktm: maybe it's an issue with l10nupdate ldap account
[21:42:02] because it works for, e.g. maurelio
[21:42:24] ugh... ok, I guess my only option for now is to build it locally and copy the files
[21:42:45] there is a super hacky hack documented in T252700
[21:42:47] T252700: `pip install fasttext` fails inside `webservice shell` for lack of RAM - https://phabricator.wikimedia.org/T252700
[21:42:50] I guess I could automate it from GitHub Actions? 🤔
[21:43:40] hauskatze: oh, I'm looking at the +2 request, not the broken ldap account :)
[21:43:51] ah, lol, ok
[21:51:34] bd808 I'll copy this from my local for now and follow that task
[22:14:53] !log toolsbeta Building tools-webservice 0.70 via wmcs-package-build.py
[22:14:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[22:29:42] !log tools Building tools-webservice 0.70 via wmcs-package-build.py
[22:29:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:36:21] !log tools Updated tools-webservice to 0.70 across instances (T252700)
[22:36:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:36:26] T252700: `pip install fasttext` fails inside `webservice shell` for lack of RAM - https://phabricator.wikimedia.org/T252700
[22:40:32] !log tools Rebuilding all Docker containers for tools-webservice 0.70 (T252700)
[22:40:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:04:09] !log paws added profile::wmcs::kubeadm::k8s::encryption_key and profile::wmcs::kubeadm::k8s::node_token to labs/private T211096
[23:04:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[23:04:13] T211096: PAWS: Rebuild and upgrade Kubernetes - https://phabricator.wikimedia.org/T211096
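
A minimal sketch of the fix suggested at 17:48:56 in the log above: PyMySQL takes the host and the port as separate parameters, so a `host:port` string has to be split before it is passed to `pymysql.connect`. The helper names `split_host_port` and `connect_kwargs` are hypothetical (they do not come from the log or from PyMySQL), the credentials are placeholders, and the sketch assumes plain hostnames or IPv4 addresses rather than IPv6 literals.

```python
DEFAULT_MYSQL_PORT = 3306  # the default port mentioned in the discussion


def split_host_port(hostspec, default_port=DEFAULT_MYSQL_PORT):
    """Split 'host' or 'host:port' into a (host, port) tuple.

    Hypothetical helper; assumes no IPv6 literal in hostspec.
    """
    host, _, port = hostspec.partition(":")
    # Fall back to the default when no port was given after the host.
    return host, (int(port) if port else default_port)


def connect_kwargs(hostspec, user, password, database):
    """Build keyword arguments for pymysql.connect (hypothetical helper)."""
    host, port = split_host_port(hostspec)
    return {
        "host": host,          # hostname only, no ':3306' suffix
        "port": port,          # passed separately as an integer
        "user": user,
        "password": password,
        "database": database,
        "charset": "utf8",
    }


# With PyMySQL installed, the connection would then be opened roughly as:
#   import pymysql
#   import pymysql.cursors
#   kwargs = connect_kwargs("tools.db.svc.eqiad.wmflabs:3306",
#                           myuser, mypass, "enwiki_p")
#   connection = pymysql.connect(
#       cursorclass=pymysql.cursors.DictCursor, **kwargs)
```

Keeping the split in a small pure function means the host handling can be checked without a database connection; `myuser` and `mypass` stand for the values read from the tool's replica.cnf, as in the original paste.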