[00:35:09] Why is /home/stuem007 world writable?
[00:55:17] Dispenser: yuck. because the user chmoded it 0777 presumably
[00:57:01] I see some files in there owned by the tools.suggestbot group. I bet they were trying to make some file copies, had problems, and did the worst/easiest thing to fix them
[01:00:28] File ownership permissions are a bitch. Can't give ownership for security reasons
[01:02:10] (PS1) Ayounsi: Add non-secret password for netbox DB replication [labs/private] - https://gerrit.wikimedia.org/r/389654
[01:03:15] (CR) Ayounsi: [V: 2 C: 2] Add non-secret password for netbox DB replication [labs/private] - https://gerrit.wikimedia.org/r/389654 (owner: Ayounsi)
[01:05:11] BTW, why are there random files by root in /home? cbench.test, foo, test, hi -> /bin/sh, and wat.bash
[01:07:42] Dispenser: probably various tests done by some admin at some point when debugging NFS issues or file caches, and never cleaned up. I'll take a look in a bit.
[01:19:00] We probably should have a script that periodically checks permissions: nothing world writable, nothing writable by the wikidev group, $USER can write to all files under /home/$USER/, etc.
[01:21:38] !log tools Removed all non-directory files from /home (via labstore1004 direct access)
[01:21:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[01:24:58] I have mixed feelings about some of those tests, but we have seen people accidentally cause chaos with an rm -rf from the wrong dir and o+w directories.
[01:25:45] Unix directory permissions are a bit scary in that o+w grants everyone delete permission on files even if they can't read them.
[01:27:06] I wonder if we made a task about this after the great delete that happened in July?
[01:27:08] * bd808 looks
[01:30:53] It doesn't look like we did :/
[01:34:19] I suspect part of the reason is that become tool discards all shell customizations (e.g. my green highlight for world writable files)
[01:35:39] `become tool` is a thin wrapper around `sudo su tool` so the change of default environment is expected, but I get your point
[01:35:41] Took far too long to figure out how, but eventually I created /data/project/dispenser/.profile that'll load /home/$SUDO_USER/.bashrc
[01:36:47] The real bug in Toolforge is that you can get a shell as a non-tool
[01:36:49] I don't know if there are security issues with it (like exec something)
[01:37:21] but there are auditing issues with direct auth as a tool that haven't been well thought out
[01:38:46] that .profile trick is kind of neat
[01:39:23] you can move it to .bash_profile and get rid of the need for the $BASH_VERSION guard
[01:40:38] I kept it as close to /etc/skel as possible so it wouldn't break anything
[01:40:47] *nod*
[02:08:26] Dispenser, it's possible to 'sudo -su tools.'
[02:09:26] which if you have world-readable .bashrc should keep your $PS1 etc.
[02:10:52] 'become' uses -i instead of -s
[02:12:39] (doesn't strictly have to be world-readable of course but it'll need to be readable by whatever tool you switch to)
[02:24:35] !log tools.tedbot Killed /data/project/tedbot/bot_data/wiki.pl process running on tools-bastion-03. See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid for instructions on running jobs on the grid.
[02:24:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.tedbot/SAL
[02:26:10] !log tools.congressedits Killed /data/project/congressedits/congresseditors/congresseditors.js process running on tools-bastion-03. See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid for instructions on running jobs on the grid.
[02:26:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.congressedits/SAL
[02:27:12] I wish I could force all tool maintainers to watch their tool's SAL
[02:27:35] * bd808 is lazy about sending emails for rogue tools
[02:52:26] You could ping them with echo
[03:27:33] So the geohack log is 53 GB now... Probably shouldn't even attempt to tail it...
[03:28:26] rm /data/project/geohack/access.log # Log too big @ 53 GB
[04:30:10] Dispenser: rm'ing the access log while the webservice is still running actually doesn't work. lighttpd keeps the file handle open so it just makes an orphan inode on the NFS server
[06:11:48] tail should work fine with a 53G file, as far as I know.
[11:27:41] (PS1) Alexandros Kosiaris: Change k8s_infrastructure_users structure [labs/private] - https://gerrit.wikimedia.org/r/389696
[11:31:09] (CR) Alexandros Kosiaris: [V: 2 C: 2] Change k8s_infrastructure_users structure [labs/private] - https://gerrit.wikimedia.org/r/389696 (owner: Alexandros Kosiaris)
[11:51:46] (PS1) Alexandros Kosiaris: Fix k8s_infrastructure_users [labs/private] - https://gerrit.wikimedia.org/r/389700
[11:55:14] (PS2) Alexandros Kosiaris: Fix k8s_infrastructure_users [labs/private] - https://gerrit.wikimedia.org/r/389700
[11:56:40] (CR) Alexandros Kosiaris: [V: 2 C: 2] Fix k8s_infrastructure_users [labs/private] - https://gerrit.wikimedia.org/r/389700 (owner: Alexandros Kosiaris)
[15:42:57] max_user_connections=10 for cluster=web is too low. The default pipeline count in browsers is 6, so two users can easily overload a tool.
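Note on the 04:30:10 message: because a running writer keeps its file handle, unlinking the log does not release the space. A common alternative is to truncate the file in place. The sketch below demonstrates this on a throwaway temp file, not the real geohack log. File descriptor 3 stands in for the web server, and the sketch assumes the writer opened the log in append mode. Whether that holds for a given lighttpd setup is an assumption to verify before relying on it.

```shell
#!/bin/bash
# Sketch: empty a log file that another process still holds open.
# "$log" is a temporary stand-in, not /data/project/geohack/access.log.
log=$(mktemp)

# Stand-in for the web server: keep the log open in append mode on fd 3.
exec 3>>"$log"
echo "request 1" >&3

# Truncate in place. The inode is unchanged, so the writer's descriptor
# stays valid, and the file's blocks are released immediately.
: > "$log"
size_after_truncate=$(( $(wc -c < "$log") ))

# With an append-mode descriptor the next write lands at the new end of
# file (offset 0) rather than at the old offset.
echo "request 2" >&3
size_after_write=$(( $(wc -c < "$log") ))

echo "after truncate: $size_after_truncate bytes"
echo "after next write: $size_after_write bytes"

# Clean up the demo file.
exec 3>&-
rm -f "$log"
```

If the writer had not opened the file in append mode, its next write would land at the old offset and leave a sparse file, so restarting the service after rotating the log is the more conservative route.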
[15:52:33] !log suggestbot Moved inlink count tables to s51172__ilc_p on tools.labsdb, restarted SuggestBot
[15:52:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Suggestbot/SAL
[15:55:53] !log suggestbot Dropped database p50380g50553__ilc from c3.labsdb
[15:55:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Suggestbot/SAL
[16:16:13] !log testlabs Running 3rd round of stress tests on labvirt1015 (T171473)
[16:16:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[16:16:15] T171473: labvirt1015 crashes - https://phabricator.wikimedia.org/T171473
[18:23:13] gosh darn the new database servers are fast! thank you, thank you, dear administrators for providing us with more hardware! :)
[18:47:20] Nettrom: :) yw. Our DBAs did all of the hard work.
[18:51:38] bd808: I figured it’s a group effort, the DBAs get my applause as well :)
[20:03:06] (CR) Lokal Profil: "a thought. is this actually due to old table entries? I.e. entries which were created before we patched to ensure they always have all part" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/378975 (https://phabricator.wikimedia.org/T174503) (owner: Jean-Frédéric)
[20:19:23] hi, I'm trying to have a bash script restart a k8s pod, that's run by cron
[20:19:41] however it keeps complaining `kubectl: not found`
[20:19:55] I can run the bash script directly, without a cronjob, and it works
[20:20:15] any ideas?
[20:21:49] the cron entry is like `*/10 * * * * jlocal ~/restart_survey_cop.sh`
[20:22:35] the script is executable, with `#!/bin/bash` at the top
[20:22:56] it just runs `kubectl delete deployment ` and `create`s another
[20:29:32] musikanimal: the cron host might have no knowledge of the k8s cluster
[20:29:40] idk how to solve it though
[20:29:48] evidently not
[20:30:38] boo
[20:33:29] musikanimal: first question is why do you need to cron a restart
[20:33:40] yeah... haha
[20:33:46] is the k8s deployment not restarting the pod?
[20:34:18] no it does, for certain errors
[20:34:23] the k8s reconciliation loop *should* take care of this sort of thing if the process inside the container is fully dying
[20:34:51] well here it's erroring out because the login session died, e.g. invalid edit token
[20:34:56] I've tried everything
[20:35:06] you can't re-login every time, at least not with this library
[20:35:16] but I did a try/catch type thing, still no dice
[20:35:33] have you read about pywikibot? ;)
[20:36:06] k8s doesn't actually restart when that error happens though, for some reason. Doesn't matter though, it needs to not die when it attempts to edit
[20:36:32] the cron host may not be set up to talk to kubernetes as zhuyifei1999_ pointed out. Probably worth a phab task
[20:36:50] okay I can create that
[20:37:07] in the meantime, I can do something really hacky...
[20:37:14] I would not be surprised to find out that we only have the bastions set up to talk to the api
[20:40:07] bd808: do you know if just the API's edit token expires, or could your login session as a whole expire?
[20:40:21] the docs aren't clear on that
[20:41:40] the long session can expire too, but it takes a long period of inactivity typically. A robust bot script should be prepared to deal with that though, because it does happen
[20:42:03] writing a bot framework that is well behaved is quite a bit of work
[20:42:11] lots of little edge cases
[20:42:17] yeah I'm gonna make some PRs to this one once I figure it out
[20:42:24] sweet
[20:42:51] I know in my case it won't go idle for a very long time, so I'm going to focus on refreshing the edit token for now
[20:43:15] that might actually solve my problem, and I won't need to restart the pod
[21:08:59] (CR) Lokal Profil: "I've created the landing page on Commons which will get linked to from each of the report pages." [labs/tools/heritage] - https://gerrit.wikimedia.org/r/380060 (https://phabricator.wikimedia.org/T176528) (owner: Lokal Profil)
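Note on the `kubectl: not found` thread (20:19 to 20:37): the log does not settle the cause. The participants suspected the cron host was not set up to talk to Kubernetes at all. A separate and common reason a script works interactively but fails under cron is that cron supplies a shorter PATH. The sketch below is a diagnostic only, and it can help tell the two cases apart. The helper names `resolves_under_path` and `report_missing` are hypothetical and appear nowhere in the discussion.

```shell
#!/bin/bash
# Diagnostic sketch with hypothetical helper names.

# Succeeds if "$1" can be found as a command when PATH is set to "$2".
# Runs in a subshell so the caller's PATH and command hash are untouched.
resolves_under_path() {
    local cmd=$1 search_path=$2
    (
        PATH=$search_path
        hash -r
        command -v "$cmd" >/dev/null 2>&1
    )
}

# Writes one line to stderr if the command is missing from the current
# PATH, so the cron mail or job log shows what the job actually saw.
report_missing() {
    local cmd=$1
    if ! resolves_under_path "$cmd" "$PATH"; then
        echo "$cmd not found on ${HOSTNAME:-unknown host}; PATH=$PATH" >&2
    fi
}

# Example: call this at the top of the restart script.
report_missing kubectl
```

If the message shows the binary is simply absent on that host, changing PATH will not help, which matches the suspicion voiced in the channel. If it is present elsewhere on the filesystem, calling it by absolute path in the script is one option.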