[00:02:45] !log tools depooled tools-sgewebgrid-lighttpd-0918 and 0919 to move to cloudvirt1004 to improve spread
[00:02:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[00:15:26] !log tools moving tools-sgewebgrid-lighttpd-0918 and -0919 to cloudvirt1004 from cloudvirt1029 to rebalance load
[00:15:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[00:17:06] !log tools repooled tools-sgewebgrid-lighttpd-0918
[00:17:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[00:26:13] !log tools repooled tools-sgewebgrid-lighttpd-0919
[00:26:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[01:20:33] !log tools.citations Restarted webservice https://en.wikipedia.org/wiki/User_talk:Krenair#Citation_bot
[01:20:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.citations/SAL
[05:02:09] !log tools Creating tools-k8s-worker-[6-14]
[05:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[05:20:56] !log tools Deleting busted tools-k8s-worker-[6-14]
[05:20:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[05:28:18] !log tools Creating tools-k8s-worker-[6-14] (again)
[05:28:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:08:30] hmm, looks like Hay's tools directory is down
[09:27:00] Erutuon: do you have a link for me to check?
[09:49:48] https://tools.wmflabs.org/hay/directory/
[09:49:55] arturo, ^
[09:50:10] looking
[09:51:50] !log tools.hay stop/start webservice. It was showing 503 for whatever reason
[09:51:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.hay/SAL
[09:52:06] andre__: seems better now? cc Erutuon
[09:58:24] BTW /me sends a big hello wave to andre__
[10:02:24] !log admin icinga downtime cloudvirt1009 for 30 minutes to re-create canary VM (T242078)
[10:02:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:02:30] T242078: CloudVPS: prometheus-openstack-exporter producing bogus metrics values - https://phabricator.wikimedia.org/T242078
[10:07:52] !log testlabs delete canary1009-01 VM to re-create it (T242078)
[10:07:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[10:07:55] T242078: CloudVPS: prometheus-openstack-exporter producing bogus metrics values - https://phabricator.wikimedia.org/T242078
[10:08:26] !log testlabs delete VM stretch-boot-arturo-01 no longer in use
[10:08:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[10:08:33] !log testlabs delete VM buster-boot-arturo-01 no longer in use
[10:08:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[10:08:37] !log testlabs delete VM stretch-boot-arturo-02 no longer in use
[10:08:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[10:24:33] !log testlabs created canary1009-01 VM using Horizon, then cold-migrated it to cloudvirt1009 from cloudvirt1006 (T242078)
[10:24:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[10:24:35] T242078: CloudVPS: prometheus-openstack-exporter producing bogus metrics values - https://phabricator.wikimedia.org/T242078
[10:37:06] go ema
[10:37:09] nope!
[11:12:33] !log admin icinga-downtime everything cloud* for 30 minutes to merge nova scheduler changes
[11:12:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:17:38] * Erutuon thanks arturo
[11:18:04] now I have to remember what kind of tool I was going to look for...
[11:18:07] o/
[13:23:00] !log tools [new k8s] doing changes to kube-state-metrics and metrics-server trying to relocate them to the 'metrics' namespace (T241853)
[13:23:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:23:03] T241853: Move metrics-server and kube-state-metrics into the new metrics namespace - https://phabricator.wikimedia.org/T241853
[13:31:24] !log tools upload docker-registry.tools.wmflabs.org/metrics-server-amd64:v0.3.6 copied from k8s.gcr.io/metrics-server-amd64:v0.3.6 (T241853)
[13:31:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:31:27] T241853: Move metrics-server and kube-state-metrics into the new metrics namespace - https://phabricator.wikimedia.org/T241853
[13:33:49] !log tools upload docker-registry.tools.wmflabs.org/coreos/kube-state-metrics:v1.8.0 copied from quay.io/coreos/kube-state-metrics:v1.8.0 (T241853)
[13:33:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:02:12] !log tools `root@tools-k8s-control-3:~# wmcs-k8s-secret-for-cert -n metrics -s metrics-server-certs -a metrics-server` (T241853)
[14:02:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:02:16] T241853: Move metrics-server and kube-state-metrics into the new metrics namespace - https://phabricator.wikimedia.org/T241853
[14:07:14] Could a Toolforge admin start the `userviews` tool? Just `webservice start` should do it. I'm on holiday and can't do it myself. Thanks!
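The two image uploads logged above (copying upstream images into docker-registry.tools.wmflabs.org) follow the standard docker pull/tag/push pattern. A minimal dry-run sketch of that pattern; the `mirror_image` helper is hypothetical and only prints the commands it would run, rather than executing docker:

```shell
#!/bin/sh
# Hypothetical helper illustrating the pull/tag/push pattern used to
# copy an upstream image into a private registry. Dry run: prints the
# docker commands instead of executing them.
mirror_image() {
    src="$1"
    dst="$2"
    echo "docker pull $src"
    echo "docker tag $src $dst"
    echo "docker push $dst"
}

mirror_image "k8s.gcr.io/metrics-server-amd64:v0.3.6" \
    "docker-registry.tools.wmflabs.org/metrics-server-amd64:v0.3.6"
```

In a real run you would drop the `echo`s (and authenticate to the destination registry first); the source and destination tags here are taken from the log entries above.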
[14:09:57] musikanimal:
[14:10:01] 14:08 [production]tools.userviews@tools-sgebastion-07:~ (master)🍺 webservice start
[14:10:01] Your job is already running
[14:10:49] musikanimal: I did this
[14:10:51] https://www.irccloud.com/pastebin/EYrhYXVq/
[14:11:12] !log tools.userviews stop/start webservice per musikanimal request on IRC
[14:11:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.userviews/SAL
[14:34:31] hey our wm-bot stopped logging our channel on 2019-12-16. I looked over https://wikitech.wikimedia.org/wiki/Wm-bot but I don't have access to wm-bot2.wm-bot.eqiad.wmflabs so I don't think I can troubleshoot further. Any help is appreciated
[14:42:24] milimetric: according to https://tools.wmflabs.org/openstack-browser/project/wm-bot only andrewbogott and petan have access
[14:42:45] thanks hauskatze, petan: sorry to ping, you're the only maintainer around, any advice ^?
[14:43:10] and WMCS admins via sudo I guess
[14:57:37] milimetric: Hi, I'll take a look at the wm-bot2.wm-bot.eqiad.wmflabs instance
[14:58:20] jeh: thank you very much, it does seem to only affect our logs: https://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics
[14:58:20] Hey milimetric, you are welcome!
[14:59:00] * milimetric is afraid because a bot is talking to him
[15:09:40] milimetric: helle
[15:09:42] * hello
[15:09:49] !log wm-bot log archive missing for some channels since 2019-12-16, restarted wm-bot service on wm-bot2
[15:09:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wm-bot/SAL
[15:09:53] all sorted petan, thank you
[15:10:01] interesting, how?
[15:10:20] j e h said: looks like one of the wm-bot processes was hung, did a clean stop/start on the service
[15:10:55] `service wm-bot stop` ensured everything was shut down, then `service wm-bot start`
[15:11:14] ok interesting
[15:11:24] I am wondering what happened to the log files though...
[15:11:38] btw there is also an SQL-based logging backend
[15:11:41] I checked the /opt/wm-bot/*logs, but didn't see anything interesting (to me at least)
[15:12:07] it would be in wmib.log I guess, I will check it later
[15:14:54] https://wm-bot.wmflabs.org/browser/
[15:16:20] that link points to the SQL-based channel logs, so if you are missing any logs they might be present in there
[15:17:32] might want to logrotate this one though lol: -rw-r--r-- 1 wm-bot wm-bot 221M Jan 7 15:05 wmib.log
[15:17:51] looks like `2019-12-16 00:32:42` was the last message in wikimedia-analytics before the service restart
[15:18:03] that log file starts at 2018
[15:31:20] arturo: thank you!
[15:33:55] petan: how do we logrotate?
[15:34:25] (since I don't have access to that cloud vm)
[15:35:35] !log tools changed kubeadm-config to use a list instead of a hash for extravols on the apiserver in the new k8s cluster T242067
[15:35:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:35:39] T242067: Error joining new worker node to Toolforge Kubernetes cluster - https://phabricator.wikimedia.org/T242067
[15:46:29] !log tools Rebooting tools-k8s-worker-[6-14]
[15:46:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:51:23] !log toolsbeta changed kubeadm-config to use a list instead of a hash for extravols on the apiserver in the new k8s cluster T242067
[15:51:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[16:33:33] !log tools deleted by hand pod metrics/cadvisor-5pd46 due to prometheus having issues scraping it
[16:33:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:20:26] !log openstack create commons-corruption-checker project T241635
[21:20:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[21:20:28] T241635: Request creation of commons-corruption-checker VPS project - https://phabricator.wikimedia.org/T241635
[22:34:14] !log tools.replacer Shut down the webservice because it was taking down a node with excessively verbose STDOUT/ERR logging.
[22:34:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.replacer/SAL
[22:40:32] !log tools rebooted tools-worker-1007 to recover it from disk full and general badness
[22:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
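For the 221M wmib.log discussed earlier, a logrotate drop-in is the usual answer to "how do we logrotate?". A sketch of what such a stanza could look like; the file path, retention, and every directive here are assumptions for illustration, not the actual wm-bot configuration:

```
# /etc/logrotate.d/wm-bot -- hypothetical example, not the real config.
# Rotate the wm-bot debug log weekly, keeping 8 compressed copies.
# copytruncate lets the running bot keep its open file handle, so no
# service restart is needed when the log is rotated.
/opt/wm-bot/wmib.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```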