[05:50:24] <Andy96>	 hello
[05:52:40] <Andy96>	 Hey, everyone, please, please be an operation and maintenance engineer. I want to join the wiki technology infrastructure team to provide services for the wiki. How can I join? Thank you.
[09:33:08] <arturo>	 !log admin force update ferm cloud-wide (in all VMs) for T153468
[09:33:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:33:12] <stashbot>	 T153468: Ferm's upstream Net::DNS Perl library questionable handling of NOERROR responses without records causing puppet errors when we try to @resolve AAAA in labs - https://phabricator.wikimedia.org/T153468
[09:38:57] <arturo>	 !log admin downtime toolschecker for 24h
[09:38:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:47:52] <musikanimal>	 andrewbogott: still getting 504 gateway timeouts across all of VPS :(
[09:48:33] <musikanimal>	 Is there a Phab ticket already? If not I can create one later when I'm home
[09:48:55] <qgil>	 (Ah, ok, I thought it was just our VPS)
[09:50:23] <arturo>	 musikanimal: qgil we just updated the firewall and may have some widespread issues
[09:51:19] <arturo>	 mmm
[09:51:22] <arturo>	 toolforge is down!
[09:53:21] <Wurgl>	 I am on tools-sgebastion, but cannot connect to my user database … timeout during connect
[09:53:37] * hauskatze tests his
[09:56:45] <qgil>	 https://space.wmflabs.org/blog/ works but veeeerrryyyy slowly. https://discuss-space.wmflabs.org/ gives a consistent 504.
[09:59:07] <qgil>	 (About the blog, it could be that the slowness is caused by the wait for elements fro, Discuss, the server giving 504.)
[10:14:07] <andrewbogott>	 musikanimal: any better?
[10:14:13] <andrewbogott>	 (this was a totally unrelated thing btw)
[10:14:47] <musikanimal>	 still timing out on my end
[10:14:53] <andrewbogott>	 musikanimal: wait, don't answer yet, I didn't fix the right thing :)
[10:15:02] <andrewbogott>	 musikanimal: what's the hostname/project for your backend?
[10:15:17] <musikanimal>	 This is happening for all of VPS
[10:15:29] <musikanimal>	 Quarry, XTools, etc.
[10:16:06] <andrewbogott>	 I know, but that doesn't help me patch your specific problem
[10:16:48] <musikanimal>	 xtools-prod06 is an example
[10:17:20] <andrewbogott>	 ok, how's that?
[10:19:09] <musikanimal>	 It is loading now, but very slow. Looks like the Toolforge CDN isn't working (where XTools pulls in some assets)
[10:20:00] <musikanimal>	 My plane is about to take off, if you don't get a reply from me that's why!
[10:20:10] <andrewbogott>	 cool, the other bits should come back online shortly (arturo is running a fleet-wide fix)
[10:21:32] <arturo>	 !log admin we installed ferm in every VM by mistake. Deleting it and forcing a puppet agent run to try to go back to a clean state.
[10:21:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:22:21] <Wurgl>	 tools.persondata@tools-sgebastion-07:~$ mysql --defaults-file=$HOME/replica.my.cnf --database=s51412__data --host=tools.db.svc.eqiad.wmflabs
[10:22:26] <Wurgl>	 … waiting …
[10:22:40] <Wurgl>	 ERROR 2003 (HY000): Can't connect to MySQL server on 'tools.db.svc.eqiad.wmflabs' (110 "Connection timed out")
[10:23:20] <andrewbogott>	 Wurgl: a fix is rolling out now, the error should go away in a few minutes
[10:26:15] <Wurgl>	 works again
[10:26:56] <andrewbogott>	 cool
[10:33:49] <arturo>	 qgil: is everything better now?
[10:41:02] <Urbanecm>	 Hello, are there any problems with submitting jobs? I have a lot of error message like:
[10:41:06] <Urbanecm>	 https://www.irccloud.com/pastebin/BjTI2cyC/
[10:43:58] <qgil>	 arturo, YES!!! Everything seems to work normally. Thank you very much!
[10:49:31] <arturo>	 Urbanecm: yes, please try again
[10:54:01] <Urbanecm>	 arturo: still getting a lot of emails with the error message listed above, the last one came at 12:52 CEST (your msg was sent at 12:49).
[10:54:27] <Urbanecm>	 submitting jobs manually works, this is from cron-scheduled jobs
[10:57:21] <arturo>	 ok, that may make sense
[10:57:47] <arturo>	 due to a flooding of emails
[10:58:00] <arturo>	 reporting failing jobs
[14:36:42] <qgil>	 Mmm can it be that this firewall problem is still hitting our Space backend? See https://phabricator.wikimedia.org/T234218
[14:36:57] <qgil>	 arturo, if you are around ^^^
[14:43:31] * arturo loking
[14:43:35] <arturo>	 looking*
[14:52:32] <tgr>	 could T234225 be related to the recent ferm issue?
[14:52:33] <stashbot>	 T234225: Wikimedia Space fails to start with "iptables: No chain/target/match by that name." - https://phabricator.wikimedia.org/T234225
[14:55:58] <jeh>	 tgr: it does appear to be related. I'd recommend restarting docker, which will reapply the iptables rules maintained by docker. This should do it: `systemctl restart docker` 
[14:56:26] <jeh>	 restarting with `sudo` :)
[14:57:07] <arturo>	 yes!
[14:57:17] <arturo>	 that should be related to T234218
[14:57:17] <stashbot>	 T234218: Can't login into Wikimedia Space - https://phabricator.wikimedia.org/T234218
[14:57:48] <tgr>	 thanks, that helped
[14:59:55] <tgr>	 !log discourse restarted docker on discuss-space for T234218
[14:59:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Discourse/SAL
[15:42:37] <Amir1>	 !log codesearch 
[15:42:37] <stashbot>	 Amir1: Missing project or message? Expected !log <project> <message>
[15:43:02] <Amir1>	 !~log codesearch ladsgroup@codesearch4:~$ sudo MODE=restart /srv/codesearch/manage.sh (T234211)
[15:43:04] <stashbot>	 T234211: tool-codesearch: Unable to contact hound - https://phabricator.wikimedia.org/T234211
[15:46:57] <Lucas_WMDE>	 Amir1: was the ~ between ! and log intentional? I think your message wasn’t logged
[15:49:37] <Amir1>	 !log codesearch ladsgroup@codesearch4:~$ sudo MODE=restart /srv/codesearch/manage.sh (T234211)
[15:49:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[15:49:42] <stashbot>	 T234211: tool-codesearch: Unable to contact hound - https://phabricator.wikimedia.org/T234211
[15:49:50] <Amir1>	 Lucas_WMDE: thanks
[15:50:09] <Lucas_WMDE>	 np
[16:09:20] <Amir1>	 didn't work, going to stop everything and start over again
[16:10:57] <bd808>	 Amir1: does that project use Docker? If so you may need to restart the Docker service as fallout from T234231
[16:10:58] <stashbot>	 T234231: Toolforge ingress: decide on how ingress configuration objects will be managed - https://phabricator.wikimedia.org/T234231
[16:11:14] <bd808>	 err.. not that one... T153468
[16:11:15] <stashbot>	 T153468: Ferm's upstream Net::DNS Perl library questionable handling of NOERROR responses without records causing puppet errors when we try to @resolve AAAA in labs - https://phabricator.wikimedia.org/T153468
[16:11:20] <Amir1>	 bd808: I have restrated docker there like 5 times already :(
[16:11:40] <Amir1>	 it's around ten different docker-based systemd services
[16:11:55] <bd808>	 yuck. ok
[16:12:18] <bd808>	 have you tried the big hammer "fix" of restarting the instance?
[16:12:52] <Amir1>	 hmm, let me try that
[16:14:29] <Amir1>	 !log codesearch soft-rebooting codesearch4
[16:14:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[16:17:07] <Amir1>	 it seems is back sorta
[17:15:24] <bstorm_>	 !log clouddb-services restart osmdb on clouddb1003
[17:15:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[17:15:53] <bstorm_>	 !log clouddb-services restart wikilabels postgresql
[17:15:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[17:43:12] <audiodude>	 good morning. For toolforge DBs, is the connection limit always 10? Can we increase it for our app/user?
[17:43:19] <audiodude>	 !help
[17:43:19] <wm-bot>	 audiodude: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[17:44:11] <bd808>	 audiodude: the concurrent connection limit is 10 for everyone, yes.
[17:44:15] <audiodude>	 the use case is, we have 8 worker processes and just recently launched a web tool which needs a connection for each request. We also have the legacy web tool running which probably (I haven't looked into it) also needs a connection for each request.
[17:45:18] <audiodude>	 are per-user exceptions a possibility?
[17:46:00] <bd808>	 audiodude: it sounds like you have multiple tools running under a shared account? That's the first thing that is "easy" to change to get you more connections. Each tool account gets it's own limit of 10.
[17:46:44] <audiodude>	 and we could get read/write access to the same db? from different users?
[17:46:46] <bd808>	 making per-tool exceptions for the limit is technically possible, but we do not currently have any tooling or process for doing that
[17:47:18] <audiodude>	 I know it's the same Maria instance, but I mean the actual db/tables inside
[17:48:05] <audiodude>	 that makes sense about the exceptions, I imagine it would be a nightmare to manage and you don't want to set a precedent
[17:50:35] <bd808>	 audiodude: global read access is possible with the '*_p' naming convention. Letting more than one tool account have write access to a table could be accomplished on the mysql server with custom grants. I am honestly not sure if we have done that for existing things on toolsdb or not.
[17:51:24] <bd808>	 Some tools end up outgrowing ToolsDB and needing their own dedicated mysql server(s) too.
[17:51:36] <audiodude>	 interesting
[17:52:31] <bd808>	 The way that has been handled up to now is with a Cloud VPS project request and then the tool maintainers building and managing their own personal database server instance.
[17:52:48] <audiodude>	 ah well we already have VPS resources
[17:53:08] <audiodude>	 there just wasn't a rush to admin our own database while we could still use toolforge
[17:53:21] <bd808>	 totally understood :)
[17:53:23] <audiodude>	 uptime, upgrades, backups etc
[17:53:33] <audiodude>	 yeah
[17:54:45] <bd808>	 someday™ we would like to offer https://wiki.openstack.org/wiki/Trove or something similar to make managing db servers easier, but that is still in the vague future
[17:55:51] <audiodude>	 ain't no future like a vague future
[18:00:11] <bd808>	 audiodude: I guess I could start saying "As part of our 2030 strategy" instead of "vague future", but really just as vague ;)
[18:00:34] <bd808>	 I am also fond of "before the heat death of the universe"
[18:31:48] <rluts>	 Hi all. Can anybody tell me, can I run my script in python 3.6 interpreter in toolforge server?
[18:32:46] <rluts>	 My script not compatible with python 3.5
[18:44:17] <phamhi>	 rluts: I don't think python3.6 is supported yet... you can create a phab ticket requesting this feature and we can look into it
[19:17:36] <bd808>	 phamhi: T230961 is probably the closest task. bstorm_ and I would like to start making Debian Buster based Docker images for Toolforge's Kubernetes cluster "soon" and that will make it simple for us to provide a Python 3.7.3 runtime.
[19:17:37] <stashbot>	 T230961: Install a version of Python newer than 3.5.3 in Toolforge - https://phabricator.wikimedia.org/T230961
[19:17:38] <mutante>	 !log wikistats - deleted old instances T128642 and T21008 
[19:17:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[19:17:44] <stashbot>	 T21008: Search index outdated (articles 9 months), became worse with LuceneSearch 2.1 activation - https://phabricator.wikimedia.org/T21008
[19:17:44] <stashbot>	 T128642: role::simplelamp fails to start mysql due to apparmor - https://phabricator.wikimedia.org/T128642
[19:18:16] <mutante>	 !log wikistats - created new instance dancing-goat with buster as backup and to replace stretch
[19:18:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[19:21:22] <phamhi>	 bd808: I don't mind taking this on
[19:22:21] <bd808>	 phamhi: cool. Check in with bstorm_ to see if she has active reasons to be holding it back right now, but otherwise +1
[19:24:18] <bstorm_>	 phamhi: Go ahead!  :)  The upstream buster image is up there.  Having supported python with f-strings for k8s users sounds awesome!
[19:24:49] <bstorm_>	 There's a lot of packages that are not supporting python < 3.6 because of that now
[19:25:00] <bstorm_>	 that and a few other things...
[19:25:23] * bd808 should really merge or abandon the pile of `webservice` code changes he has had stacked up in gerrit for 7 months :/
[19:26:32] <bstorm_>	 bd808: let me know if you want some reviews for that whatnot
[19:27:05] <bd808>	 I *think* we reviewed them all and I was waiting for a "better time" to drop the changes they include.
[19:27:15] <bd808>	 but I will poke if needed :)
[19:27:17] <bd808>	 thanks
[19:28:29] <bstorm_>	 I wouldn't be surprised
[19:30:13] <bd808>	 I know one of the things sitting in that stack of patches is changing the default backend from grid to k8s and I expected that would cause some bumps for folks.
[19:30:32] <bd808>	 I could just leave that one off easily though I think...
[19:43:37] <codezee>	 i'm looking at revisions of some articles (very early revisions, around ~2006) in the revision table but the rev_len field for all of them is 0. Is this field no longer used?
[19:47:42] <bd808>	 codezee: per https://www.mediawiki.org/wiki/Manual:Revision_table#rev_len it seems to be an active field, but I would not be surprised to hear that it is not populated correctly for 100% of revisions (and especially really old ones)
[19:50:24] <bd808>	 codezee: heh. that field was added in MediaWiki 1.10 which was released in May 2007 -- https://lists.wikimedia.org/pipermail/mediawiki-announce/2007-May/000063.html -- That might explain it being empty for revisions in the 2006 era and before.
[19:53:28] <codezee>	 bd808: ok, but i just saw recent edits(2019) for the set of articles i'm looking at , and still rev-len is 0
[19:54:44] <bd808>	 interesting. I don't have an active theory about that. Somebody in #wikimedia-tech might know better when it ends up being empty and what that means.
[20:06:17] <Platonides>	 oh, you were also asking here
[20:44:18] <wm-bot>	 !log tools.stewardbots <maurelio> Restarting SULWatcher and StewardBot
[20:44:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[21:14:06] <Zppix>	 I'm curious, is it possible for toolforge users to create a systemd.timer to git pull? 
[21:19:22] <bd808>	 Zppix: technically, yes. But really no because we do not have a system for managing user systemd timers similar to the system we have for managing crontabs in Toolforge.
[21:20:29] <bd808>	 The main problem is that the systemd timer will be tied to the host you create it on (probably a bastion) and there is nothing that backs that data up or ensures that the work is done on a grid engine exec node rather than the bastion.
[21:20:40] <Zppix>	 bd808:  so any good alt for automatic git pull that is set to occour when a push happens on github?
[21:22:48] <bd808>	 Zppix: some tools use cron to pull once or more per day. Others have setup webhook based notifications. There is a "not recommended but documented" method on wikitech somewhere too.. let me see if I can find it
[21:23:38] <Zppix>	 bd808:  i saw the webhook one on wikitech but im not to sure im 100% sure on how to set that up and thats the kind of thing i want 
[21:23:50] <bd808>	 https://wikitech.wikimedia.org/wiki/Help:Toolforge/Auto-update_a_tool_from_GitHub
[21:24:42] <Zppix>	 bd808:  but what if the git repo i need to be pulled isnt in public_html?
[21:24:46] <bd808>	 The main concern with the webhook system, or a timer or cron actually, is that if the github repo is compromised then the compromised code is automatically deployed
[21:25:22] <bd808>	 Zppix: I think that would just be a matter of changing the `git pull` to a more complex shell command
[21:25:32] <Zppix>	 ok
[21:25:48] * bd808 is now wincing at the use of PHP's backtick operator in that code
[21:26:55] <bd808>	 Zppix: something like the `git -C ../www/static pull` example lower on the page with the path tweaked to match your target git clone
[21:27:33] <Zppix>	 so simply just change the path to /example/repodir/ ?
[21:28:08] <Zppix>	 correct
[21:28:09] <Zppix>	 ?
[21:29:08] <bd808>	 Zppix: correct. If you are using PHP to handle the incoming message from github, then the current working directory of the PHP process will be $HOME/public_html and you can use either a relative or absolute path to target a particular git clone
[21:29:38] <Zppix>	 bd808:  okay ill try that thanks, sorry for all the questions its just git isnt my strong suit
[21:30:12] <bd808>	 Just to preserve my public history on this topic... do any automatic update of code at your own risk. If the upstream git repo is compromised then your tool is compromised.
[22:39:15] <wm-bot>	 !log tools.lexeme-forms <lucaswerkmeister> deployed 19bf4e3347 (remove PHP_ENGINE cookie)
[22:39:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL