[00:35:09] <Dispenser>	 Why is /home/stuem007 world writable?
[00:55:17] <bd808>	 Dispenser: yuck. because the user chmoded it 0777 presumably
[00:57:01] <bd808>	 I see some files in there owned by the tools.suggestbot group. I bet they were trying to make some file copies, had problems, and did the worst/easiest thing to fix them
[01:00:28] <Dispenser>	 File ownership permissions are a bitch.  Can't give ownership for security reasons
[01:02:10] <wikibugs>	 (PS1) Ayounsi: Add non-secret password for netbox DB replication [labs/private] - https://gerrit.wikimedia.org/r/389654
[01:03:15] <wikibugs>	 (CR) Ayounsi: [V: 2 C: 2] Add non-secret password for netbox DB replication [labs/private] - https://gerrit.wikimedia.org/r/389654 (owner: Ayounsi)
[01:05:11] <Dispenser>	 BTW, why are there random files by root in /home?  cbench.test, foo, test, hi -> /bin/sh, and wat.bash
[01:07:42] <bd808>	 Dispenser: probably various tests done by some admin at a point when debugging NFS issues or file caches that have never been cleaned up. I'll take a look in a bit.
[01:19:00] <Dispenser>	 We probably should have a script that periodically checks permissions, no world writeable, no wikidev group writable, $USER can write to all files under /home/$USER/, etc.
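A periodic check along the lines Dispenser describes could start out like the sketch below. It is only an illustration, not an existing Toolforge script: the function names are hypothetical, the numeric gid of the wikidev group is assumed to be looked up by the caller, and the "$USER can write to everything under /home/$USER" check is left out.

```python
import os
import stat


def _loose_reason(path, group_gid):
    """Return why path looks too permissive, or None if it seems fine."""
    try:
        st = os.lstat(path)
    except OSError:
        return None  # entry vanished or cannot be inspected; skip it
    if stat.S_ISLNK(st.st_mode):
        return None  # mode bits on a symlink itself are not meaningful
    if st.st_mode & stat.S_IWOTH:
        return "world-writable"
    if group_gid is not None and st.st_gid == group_gid and st.st_mode & stat.S_IWGRP:
        return "group-writable by audited group"
    return None


def find_loose_permissions(root, group_gid=None):
    """Yield (path, reason) for root and everything below it that is too open.

    group_gid is the numeric id of a group that should not have write access
    (for example wikidev); None disables that check.
    """
    reason = _loose_reason(root, group_gid)
    if reason:
        yield root, reason
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            reason = _loose_reason(path, group_gid)
            if reason:
                yield path, reason
```

Something like `for path, why in find_loose_permissions("/home"): print(why, path)` could then be run from a scheduled job, with the output mailed or logged rather than acted on automatically.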
[01:21:38] <bd808>	 !log tools Removed all non-directory files from /home (via labstore1004 direct access)
[01:21:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[01:24:58] <bd808>	 I have mixed feelings about some of those tests, but we have seen people accidentally cause chaos with an rm -rf from the wrong dir and o+w directories.
[01:25:45] <bd808>	 Unix directory permissions are a bit scary in that o+w grants everyone delete permission on files even if they can't read them.
[01:27:06] <bd808>	 I wonder if we made a task about this after the great delete that happened in July?
[01:27:08] * bd808 looks
[01:30:53] <bd808>	 It doesn't look like we did :/
[01:34:19] <Dispenser>	 I suspect part of the reason is that `become tool` discards all shell customizations (e.g. my green highlight for world writable files)
[01:35:39] <bd808>	 `become tool` is a thin wrapper around `sudo su tool` so the change of default environment is expected, but I get your point
[01:35:41] <Dispenser>	 Took far too long to figure out how, but eventually I created /data/project/dispenser/.profile that'll load /home/$SUDO_USER/.bashrc
[01:36:47] <bd808>	 The real bug in Toolforge is that you can get a shell as a non-tool
[01:36:49] <Dispenser>	 I don't know if there are security issues with it (like exec something)
[01:37:21] <bd808>	 but there are auditing issues with direct auth as a tool that haven't been well thought out
[01:38:46] <bd808>	 that .profile trick is kind of neat
[01:39:23] <bd808>	 you can move it to .bash_profile and get rid of the need for the $BASH_VERSION guard
[01:40:38] <Dispenser>	 I kept as close to /etc/skel so it wouldn't break anything
[01:40:47] <bd808>	 *nod*
[02:08:26] <Krenair>	 Dispenser, it's possible to 'sudo -su tools.<toolname>'
[02:09:26] <Krenair>	 which if you have world-readable .bashrc should keep your $PS1 etc.
[02:10:52] <Krenair>	 'become' uses -i instead of -s
[02:12:39] <Krenair>	 (doesn't strictly have to be world-readable of course but it'll need to be readable by whatever tool you switch to)
[02:24:35] <bd808>	 !log tools.tedbot Killed /data/project/tedbot/bot_data/wiki.pl process running on tools-bastion-03. See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid for instructions on running jobs on the grid.
[02:24:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.tedbot/SAL
[02:26:10] <bd808>	 !log tools.congressedits Killed /data/project/congressedits/congresseditors/congresseditors.js process running on tools-bastion-03. See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid for instructions on running jobs on the grid.
[02:26:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.congressedits/SAL
[02:27:12] <bd808>	 I wish I could force all tool maintainers to watch their tool's SAL
[02:27:35] * bd808 is lazy about sending emails for rogue tools
[02:52:26] <Dispenser>	 You could ping them with echo
[03:27:33] <Dispenser>	 So the geohack log is 53 GB now... Probably shouldn't even attempt to tail it...
[03:28:26] <Dispenser>	 rm /data/project/geohack/access.log # Log too big @ 53 GB
[04:30:10] <bd808>	 Dispenser: rm'ing the access log while the webservice is still running actually doesn't work. lighttpd keeps the file handle open so it just makes an orphan inode on the NFS server
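The effect bd808 describes can be reproduced on a small scale. The sketch below is an illustration with hypothetical function names, not a Toolforge tool: it deletes a file while a descriptor is still open and shows that the data remains reachable through that descriptor, then shows truncation as an alternative that keeps the same inode. Truncating only behaves well if the writer opened the log in append mode, which is an assumption here.

```python
import os


def unlink_while_open(path):
    """Delete path while holding it open.

    Returns (link_count, size) as seen through the still-open descriptor.
    The name is gone, but the data stays allocated until the last handle closes.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        os.unlink(path)
        st = os.fstat(fd)
        return st.st_nlink, st.st_size
    finally:
        os.close(fd)


def truncate_in_place(path):
    """Empty the file without replacing its inode, so open handles stay valid."""
    with open(path, "r+b") as fh:
        fh.truncate(0)
    return os.stat(path).st_size
```

With a running webservice, the space is only released once the process holding the handle closes it, so restarting the service or truncating the file is what actually frees the disk.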
[06:11:48] <Esther>	 tail should work fine with a 53G file, as far as I know.
[11:27:41] <wikibugs>	 (PS1) Alexandros Kosiaris: Change k8s_infrastructure_users structure [labs/private] - https://gerrit.wikimedia.org/r/389696
[11:31:09] <wikibugs>	 (CR) Alexandros Kosiaris: [V: 2 C: 2] Change k8s_infrastructure_users structure [labs/private] - https://gerrit.wikimedia.org/r/389696 (owner: Alexandros Kosiaris)
[11:51:46] <wikibugs>	 (PS1) Alexandros Kosiaris: Fix k8s_infrastructure_users [labs/private] - https://gerrit.wikimedia.org/r/389700
[11:55:14] <wikibugs>	 (PS2) Alexandros Kosiaris: Fix k8s_infrastructure_users [labs/private] - https://gerrit.wikimedia.org/r/389700
[11:56:40] <wikibugs>	 (CR) Alexandros Kosiaris: [V: 2 C: 2] Fix k8s_infrastructure_users [labs/private] - https://gerrit.wikimedia.org/r/389700 (owner: Alexandros Kosiaris)
[15:42:57] <Dispenser>	 max_user_connections=10 for cluster=web is too low.  The default pipeline count in a browser is 6, so two users (12 connections) can easily overload a tool.
[15:52:33] <Nettrom>	 !log suggestbot Moved inlink count tables to s51172__ilc_p on tools.labsdb, restarted SuggestBot
[15:52:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Suggestbot/SAL
[15:55:53] <Nettrom>	 !log suggestbot Dropped database p50380g50553__ilc from c3.labsdb
[15:55:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Suggestbot/SAL
[16:16:13] <bd808>	 !log testlabs Running 3rd round of stress tests on labvirt1015 (T171473)
[16:16:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[16:16:15] <stashbot>	 T171473: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473
[18:23:13] <Nettrom>	 gosh darn the new database servers are fast! thank you, thank you, dear administrators for providing us with more hardware! :)
[18:47:20] <bd808>	 Nettrom: :) yw. Our DBAs did all of the hard work.
[18:51:38] <Nettrom>	 bd808: I figured it’s a group effort, the DBAs get my applause as well :)
[20:03:06] <wikibugs>	 (CR) Lokal Profil: "a though. is this actually due to old table entries? I.e. entries which were created before we patched to ensure they always have all part" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378975 (https://phabricator.wikimedia.org/T174503) (owner: Jean-Frédéric)
[20:19:23] <musikanimal>	 hi, I'm trying to have a bash script, run by cron, restart a k8s pod
[20:19:41] <musikanimal>	 however it keeps complaining `kubectl: not found`
[20:19:55] <musikanimal>	 I can run the bash script directly, without a cronjob, and it works
[20:20:15] <musikanimal>	 any ideas?
[20:21:49] <musikanimal>	 the cron entry is like `*/10 * * * * jlocal ~/restart_survey_cop.sh`
[20:22:35] <musikanimal>	 the script is executable, with `#!/bin/bash` at the top
[20:22:56] <musikanimal>	 it just runs `kubectl delete deployment <name>` and `create`s another 
[20:29:32] <zhuyifei1999_>	 musikanimal: the cron host might have no knowledge of the k8s cluster
[20:29:40] <zhuyifei1999_>	 idk how to solve it though
[20:29:48] <musikanimal>	 evidently not
[20:30:38] <musikanimal>	 boo
[20:33:29] <bd808>	 musikanimal: first question is why do you need to cron a restart
[20:33:40] <musikanimal>	 yeah... haha
[20:33:46] <bd808>	 is the k8s deployment not restarting the pod?
[20:34:18] <musikanimal>	 no it does, for certain errors
[20:34:23] <bd808>	 the k8s reconciliation loop *should* take care of this sort of thing if the process inside the container is fully dying
[20:34:51] <musikanimal>	 well here it's erroring out because the login session died, e.g. invalid edit token
[20:34:56] <musikanimal>	 I've tried everything
[20:35:06] <musikanimal>	 you can't re-login every time, at least not with this library
[20:35:16] <musikanimal>	 but I did a try/catch type thing, still no dice
[20:35:33] <bd808>	 have you read about pywikibot? ;)
[20:36:06] <musikanimal>	 k8s doesn't actually restart when that error happens though, for some reason. Doesn't matter though, it needs to not die when it attempts to edit
[20:36:32] <bd808>	 the cron host may not be set up to talk to kubernetes as zhuyifei1999_ pointed out. Probably worth a phab task
[20:36:50] <musikanimal>	 okay I can create that
[20:37:07] <musikanimal>	 in the meantime, I can do something really hacky...
[20:37:14] <bd808>	 I would not be surprised to find out that we only have the bastions set up to talk to the api
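A `kubectl: not found` message means the shell could not find the binary in the job's PATH, either because cron runs with a smaller PATH than a login shell or because the program is not installed on that host at all. A small check like the one below, run from the cron job, could help tell the two cases apart. It is a hedged sketch: `locate_command` is a hypothetical helper, and any extra directories passed to it are guesses the caller supplies, not known install locations.

```python
import os
import shutil


def locate_command(name, extra_dirs=()):
    """Return the full path of name, or None if it cannot be found.

    Searches the current PATH first, then any extra_dirs the caller suggests.
    """
    found = shutil.which(name)
    if found:
        return found
    if extra_dirs:
        return shutil.which(name, path=os.pathsep.join(extra_dirs))
    return None
```

If the helper returns a path only when extra directories are given, calling the binary by that full path in the script is enough; if it returns None everywhere, the host simply lacks the client, which matches the suspicion voiced above.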
[20:40:07] <musikanimal>	 bd808: do you know if just the API's edit token expires, or could your login session as a whole expire?
[20:40:21] <musikanimal>	 the docs aren't clear on that
[20:41:40] <bd808>	 the long session can expire too, but it takes a long period of inactivity typically. A robust bot script should be prepared to deal with that though, because it does happen
[20:42:03] <bd808>	 writing a bot framework that is well behaved is quite a bit of work
[20:42:11] <bd808>	 lots of little edge cases
[20:42:17] <musikanimal>	 yeah I'm gonna make some PRs to this one once I figure it out
[20:42:24] <bd808>	 sweet
[20:42:51] <musikanimal>	 I know in my case it won't go idle for a very long time, so I'm going to focus on refreshing the edit token for now
[20:43:15] <musikanimal>	 that might actually solve my problem, and I won't need to restart the pod
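The token refresh musikanimal has in mind can be sketched as a retry wrapper. Everything named here is hypothetical: `BadTokenError` stands in for whatever the bot library raises on an invalid edit token, and `do_edit` and `fetch_token` are callables the bot would supply. It does not handle an expired login session, which would need a re-login step as well.

```python
class BadTokenError(Exception):
    """Hypothetical error a client library might raise for an invalid edit token."""


def edit_with_token_refresh(do_edit, fetch_token, max_attempts=3):
    """Call do_edit(token), fetching a fresh token and retrying on BadTokenError.

    do_edit and fetch_token are supplied by the caller; the last failure is
    re-raised once max_attempts edits have been tried.
    """
    token = fetch_token()
    for attempt in range(1, max_attempts + 1):
        try:
            return do_edit(token)
        except BadTokenError:
            if attempt == max_attempts:
                raise
            token = fetch_token()  # the old token is stale; ask for a new one
```

Bounding the number of attempts keeps a persistent failure from turning into a tight loop against the API, and lets the process exit so the k8s deployment can restart it.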
[21:08:59] <wikibugs>	 (CR) Lokal Profil: "I've created the landing page on Commons which will get linked to from each of the report pages." [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380060 (https://phabricator.wikimedia.org/T176528) (owner: Lokal Profil)