[02:22:56] !log tools.wm-commons-emoji-bot Stopped webservice. Kubernetes pod stuck in CrashLoopBackOff with 43007 restarts
[02:22:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wm-commons-emoji-bot/SAL
[02:26:17] !log tools.unique Stopped webservice. Kubernetes pod stuck in CrashLoopBackOff with 12325 restarts
[02:26:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.unique/SAL
[02:28:01] !log tools.sparrow Stopped webservice. Kubernetes pod stuck in CrashLoopBackOff with 22338 restarts
[02:28:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sparrow/SAL
[02:29:33] !log tools.deprecated-fixer-bot Stopped webservice. Kubernetes pod stuck in CrashLoopBackOff with 22317 restarts
[02:29:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.deprecated-fixer-bot/SAL
[02:52:37] !log tools.sparrow Deleted orphan ReplicaSet that was restarting webservice pod
[02:52:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sparrow/SAL
[02:54:45] !log tools.etytree Stopped webservice. Kubernetes pod stuck in CrashLoopBackOff with 8746 restarts
[02:54:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.etytree/SAL
[13:47:01] !log tools T213421 create tools-services-03 and tools-services-04 (stretch) they will use the new puppet role `role::wmcs::toolforge::services`
[13:47:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:47:04] T213421: Toolforge: move services nodes from eqiad to eqiad1 - https://phabricator.wikimedia.org/T213421
[13:53:19] !log tools T213421 delete tools-services-03/04 and create them with another prefix: tools-sge-services-03/04 to actually use the new role
[13:53:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:53:23] T213421: Toolforge: move services nodes from eqiad to eqiad1 - https://phabricator.wikimedia.org/T213421
[14:00:26] !log tools T213421 disable updatetools in the new services nodes while building them
[14:00:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:00:31] T213421: Toolforge: move services nodes from eqiad to eqiad1 - https://phabricator.wikimedia.org/T213421
[16:44:54] !log tools T213418 docker-registry.tools.wmflabs.org point floating IP to tools-docker-registry-02
[16:44:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:44:57] T213418: Toolforge: move docker nodes from eqiad to eqiad1 - https://phabricator.wikimedia.org/T213418
[19:05:41] !log social-tools deleting some proxy configs with non-existent backends: gadgets.wmflabs.org, gadgetsclient-gadgets.wmflabs.org
[19:05:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Social-tools/SAL
[21:22:40] Suddenly tools-bastion-03 seems to be being really slow. As in taking minutes to execute commands like `crontab -r` and return to the prompt.
[21:27:01] * zhuyifei1999_ looks
[21:27:39] confirmed, it takes ages to authenticate alone
[21:29:55] NFS lag, checking which one I should kill
[21:33:45] !log tools SIGTERM PID 12542 24780 875 14569 14722. `tail`s with parent as init, belonging to user maxlath. they should submit to grid.
[21:33:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:34:46] there is one scp and one egrep that's likely causing more trouble
[21:35:42] scp finished I'm killing egrep
[21:36:37] !log tools killed an egrep using too much NFS bandwidth on tools-bastion-03
[21:36:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:37:59] !log tools that command belonged to tools.scholia (with fnielsen as the ssh user)
[21:37:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:41:16] anomie Hauskatze: faster now?
[21:41:44] zhuyifei1999_: Yes, thanks
[21:41:50] I logged out a second ago, but it was faster yes
[21:42:03] thanks
[21:42:12] !log tools also `write`-ed to them (as root). auth on my personal account would take a long time
[21:42:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:58:06] zhuyifei1999_: thanks for taking care of that
[22:00:39] I am currently running a job on tools-exec-1442 with a login via qrsh. Before I did run it on tools-bastion-03 which was wrong and it was killed. I wonder if the qrsh method is ok? Or whether there is still an NFS issue?
[22:02:13] fnielsen: qrsh is better, yes. Things that cause a lot of NFS traffic can be problematic generally, but are better when the load is on a grid worker node rather than a bastion.
[22:03:01] I am doing this job: egrep "^\[pid:" uwsgi.log | grep -v "GET /scholia/static/" | grep -v "GET /scholia/images/" | grep -v "HEAD /scholia" | awk '{print $12 " " $13 " " $14 " " $15 " " $16 " " $18 }' > stats-`date -I`.txt
[22:03:04] !log tools T213711 Added ports needed for etcd-flannel to work on the etcd security group in eqiad
[22:03:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:03:07] T213711: move tools proxy nodes to eqiad1 - https://phabricator.wikimedia.org/T213711
[22:03:09] I do not know if that can be improved
[22:03:53] !log tools T213711 Added UDP port needed for flannel packets to work to k8s worker sec groups in both eqiad and eqiad1-r
[22:03:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:04:48] fnielsen: the main problem with things like that on the bastion is that we limit the total IO over NFS from each host and that can make doing anything at all slow on the bastion where lots of people share the same origin as far as the NFS server is concerned.
[22:06:21] Are there any rules on whether it is ok to copy the uwsgi.log out of Toolforge?
[22:08:34] it should be fine to do that generally. You should treat the file as potentially sensitive if it contains user-agent strings or messages that show OAuth usernames
[22:09:32] fnielsen: you might be able to mess around with a custom $HOME/www/python/uwsgi.ini file to get the trimmed down log output you want as well -- https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Default_uwsgi_configuration
[22:10:45] I do not seem to have any header or other sensitive information, unless users accidentally copy-paste sensitive information into the URL.
[22:13:56] I am now doing a scp via login.tools.wmflabs.org. I wonder if that is a problem?
[22:16:18] fnielsen: it can be. You could use tools-dev.wmflabs.org instead. That is a lesser known bastion that may have lower contention for NFS resources
[22:16:32] Ok.
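For reference, the egrep/grep/awk job quoted at [22:03:01] could also be written as one Python function. This is only a sketch of the same filtering logic, not something anyone in the channel proposed: the names `filter_requests`, `EXCLUDED` and `FIELDS` are made up for illustration, and the field numbers are copied from the awk stage without knowing what each uwsgi field holds. It still reads every byte of uwsgi.log, so it is unlikely to reduce NFS load by itself; the advice above about running on a grid node or trimming the log via uwsgi.ini still applies.

```python
from typing import Iterable, Iterator

# Substrings that exclude a line, mirroring the three `grep -v` stages.
EXCLUDED = ("GET /scholia/static/", "GET /scholia/images/", "HEAD /scholia")

# 1-based whitespace-separated field numbers, as printed by the awk stage.
FIELDS = (12, 13, 14, 15, 16, 18)


def filter_requests(lines: Iterable[str]) -> Iterator[str]:
    """Yield the selected fields of each uwsgi request line that passes the filters."""
    for line in lines:
        # Same as egrep "^\[pid:": keep only lines that start with "[pid:".
        if not line.startswith("[pid:"):
            continue
        # Same as the chained `grep -v` calls: drop static, image and HEAD requests.
        if any(marker in line for marker in EXCLUDED):
            continue
        parts = line.split()
        # Like awk, a missing field becomes an empty string rather than an error.
        yield " ".join(parts[n - 1] if n <= len(parts) else "" for n in FIELDS)
```

To reproduce the original job, iterate over `filter_requests(open("uwsgi.log"))` and write each result as one line to a file named with `datetime.date.today().isoformat()`, which gives the same YYYY-MM-DD suffix as `date -I`.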
[22:16:41] it really all depends on how much data you are trying to move in/out
[22:16:47] I did shut down the scp just a minute ago.
[22:16:54] hmm, i can't connect to the old maps nodes...
[22:17:14] I will use the tools-dev for the small file then (which hopefully will finish)
[22:17:25] thedj: which ones in particular?
[22:18:10] maps-tiles2.eqiad.wmflabs
[22:18:29] maps-tiles1.eqiad.wmflabs of eqiadr works just fine
[22:19:01] * bd808 tries
[22:21:26] thedj: I'm not getting in either, but openstack tells me the instance is up. I bet this is a cross-region firewall problem. We moved the shared bastions to the new eqiad1-r region and that host is in the old eqiad region
[22:21:45] yeah suspected something like that
[22:22:34] i was getting a permission error with the pqsql access on the renderer, but that might just be normal. I really wouldn't know without checking what the query does on the old server.
[22:23:06] !log maps Added BryanDavis (self) as project admin to debug cross-region ssh access
[22:23:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL
[22:25:54] !log maps Added ssh from 172.16.0.0/21 to default security group in eqiad region
[22:25:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL
[22:26:03] thedj: I think it will work for you now
[22:26:42] bd808: yup awesome !
[22:26:55] * bd808 likes the easy fixes
[22:32:01] k. i'll try to see this weekend if the new server is able to properly render new tiles
[22:33:04] if so, we can switch traffic to that one. And then i'll get back to my pet project of puppetizing it, so we have additional nodes
[22:33:23] bedtime !
[22:39:35] !log upgraded packages and MW version on wikitech-static
[22:39:35] andrewbogott: Unknown project "upgraded"
[22:39:46] hm, wrong channel
[23:31:43] bd808: np :)