[01:56:11] Is "sudo: a password is required" normal when trying to `become` a tool that was created too recently?
[01:57:29] I think that means all the background processes haven't finished
[01:57:32] Or have got stuck
[01:58:47] Apparently it was, and I was just being too impatient :P
[01:58:52] thanks
[05:57:20] !log admin schedule 4h downtime for cloudvirts and other openstack components due to upgrade ops
[05:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[06:11:23] !log admin schedule 4h downtime for labstores
[06:11:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[06:39:34] zhuyifei1999_: 🎉 thanks! :-)
[07:07:39] Hello. Toolforge seems to be having trouble, is someone working on it?
[07:08:10] When I log in to my account there, I get "Stale file handle" error messages for my home directory or even /home
[07:08:51] NicoV: an upgrade is in progress
[07:09:09] ok, thanks!
[07:44:26] when is the maint supposed to finish?
[07:49:09] revi: it's done when it's done. It was scheduled for two hours (which isn't up) but we've encountered an unexpected networking issue so it may take longer
[07:50:00] ack
[07:54:18] Hi, has anyone noticed problems with mx-out02.wmflabs.org? I'm seeing timeout fatal errors for jobs related to ending emails from https://discuss-space.wmflabs.org
[07:54:53] This seems to be quite recent, from an hour ago or so.
[07:57:02] ("sending" email, not "ending") :)
[08:00:08] qgil: https://lists.wikimedia.org/pipermail/cloud-announce/2019-December/000242.html
[08:03:16] Ah, thank you andrewbogott. I'll tell Sidekick to relax. :) Best wishes for the completion of this task.
[08:17:18] Emails sent. Job queue empty. Thank you very much andrewbogott !!
[08:18:26] Hmm … my webservices do not start? webservice --backend=kubernetes php7.2 start … both, wikihistory & persondata?
[08:18:57] tried 7.3 (which is new?) too, does not work either
[08:22:35] !log tools.wikiloves Moved logs/crontabl.log following emails from Cron Daemon: /bin/sh: 1: cannot create /data/project/wikiloves/logs/crontab.log: Stale file handle
[08:22:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikiloves/SAL
[08:34:15] !log tools reboot tools-worker-1033/1034 and tools-sgebastion-08 to try to correct NFS mount issues
[08:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:46:17] !log tools doing `run-puppet-agent` in all VMs to see state of NFS
[08:46:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:22:04] !log tools reboot tools-sgeexec-0911 to try fixing weird NFS state
[09:22:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:20:21] !log tools rolling reboot for all grid & k8s worker nodes due to NFS staleness
[11:20:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:03:05] !log wikibugs restart to fix double posting to irc
[13:03:07] Reedy: Unknown project "wikibugs"
[13:03:07] Reedy: Did you mean to say "tools.wikibugs" instead?
[13:03:13] !log tools.wikibugs restart to fix double posting to irc
[13:03:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[21:24:38] !log openstack schedule downtime until Jan 6th 2020 on cloudvirt1015 (bad hardware) T220853
[21:24:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[21:24:41] T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory - https://phabricator.wikimedia.org/T220853
[21:38:51] I'm seeing some "Connection timed out" issues on Wikimedia Space. Could this be related to the recent cloud-vps maintenance?
[21:39:24] I've restarted my instance and haven't made any changes in any configurations, so I'm kind of at a loss.
[21:41:31] Hi CKoerner_WMF, I can help take a look. I'm not familiar with the Wikimedia Space though, what is your instance name?
[21:42:13] Hey @jeh blog-news is the name of the instance.
[21:43:14] thanks, I'll take a look
[21:44:00] Thank you.
[21:47:54] CKoerner_WMF: The instance and proxy https://space.wmflabs.org appear to be OK. Where are you seeing "Connection timed out" at?
[21:49:14] I have a WordPress site that talks to another cloud instance running Discourse. When I try to publish from WordPress to Discourse I get a "cURL error 28: Connection timed out after 10001 milliseconds" error.
[21:57:25] The Discourse instance is discuss-space.discourse.eqiad.wmflabs which @qgil mentioned earlier as experiencing timeout issues.
[21:58:56] Ok, that's helpful. Do you know if the connection is timing out between blog-news and discourse, or is the user's client timing out to discourse?
[21:59:28] connectivity tests look good between blog-news and discourse
[22:01:11] https://discuss-space.wmflabs.org (Discourse) seems to be running fine and all interfaces say the connection is good
[22:01:35] https://space.wmflabs.org (WordPress) is running sluggish and is where I'm getting the issue. :/
[22:04:09] it doesn't look like the problems we had earlier with NFS, but I'll keep looking to see if anything sticks out
[22:04:55] I appreciate you looking into this.
[22:53:52] !log tools rebooting the cron server, tools-sgecron-01 as it wasn't recovered from last night's maintenance
[22:53:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:58:12] !log rebooting tools-acme-chief-01
[22:58:12] bstorm_: Unknown project "rebooting"
[22:58:18] !log tools rebooting tools-acme-chief-01
[22:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:15:29] CKoerner_WMF: It looks like it's something in wordpress PHP causing the delay. I've narrowed down pretty much everything else, even ruled out the proxy with `$ curl -H "X-Forwarded-Proto: https" http://127.0.0.1`
[23:16:03] Hrm, Ok. I'll look some more on my end.
[23:16:13] maybe it's expecting a 100 Continue?
[23:22:55] maybe, I didn't see any hints of that with curl or web browser consoles. The long delay is only seen on blog pages or /blog, which is ~24KB and consistently hangs.
[23:23:54] !log tools.para restarting service because it is using massive amounts of CPU
[23:23:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.para/SAL
[23:25:40] jeh: are they POST requests?
[23:25:51] no, just GET
[23:25:57] hmm
[23:26:27] Wordpress does a GET to discourse to publish content?
[23:28:25] It uses Discourse's API to retrieve a list of Discourse categories and to publish.
[23:30:04] it will probably be an uncached expensive query to discourse
[23:30:20] still not sure if such timeout should be 'expected'
[23:30:31] I didn't see any connection timeouts, only long delays to the blog pages with GET
[23:36:58] !log tools rebooting toolschecker after downtiming the services
[23:37:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:43:30] I just tried disabling plugins and that seems to make no difference either. Thinking it might be a conflict somewhere.
[23:52:07] I updated all php packages and checked everything I could think of. Uninstalled WordPress plugins, rebooted the server, etc. I'm afraid I'm well past the end of my day and have a family activity to attend. I'll pick this back up in the (my) morning. Thanks @jeh for taking a look at this.
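A note on the proxy check discussed above: the curl command in the log requests the local web server directly with an `X-Forwarded-Proto: https` header, which takes the front proxy out of the picture. A minimal Python sketch of the same idea, with timing added, might look like the following. The function name `timed_get`, the 10-second default (chosen to resemble the roughly 10-second cURL error 28 limit quoted earlier), and the `/blog` path in the usage comment are illustrative assumptions, not something the participants ran.

```python
import time
import urllib.error
import urllib.request


def timed_get(url, timeout_s=10.0, headers=None):
    """Issue one GET and return (status, elapsed_seconds).

    status is the HTTP status code, or None if the request timed out
    or the connection failed before a response arrived.
    (Hypothetical helper, not part of any tool mentioned in the log.)
    """
    request = urllib.request.Request(url, headers=headers or {}, method="GET")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            response.read()  # include the body transfer in the measurement
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with a 4xx/5xx
    except OSError:
        # Covers URLError, socket timeouts and refused/reset connections.
        status = None
    return status, time.monotonic() - start


# Example call, assuming a web server listening locally on port 80:
#   status, elapsed = timed_get("http://127.0.0.1/blog",
#                               headers={"X-Forwarded-Proto": "https"})
#   print(f"status={status} elapsed={elapsed:.2f}s")
```

If a direct local request is just as slow as one made through the public proxy, the delay is more likely inside the application than in the proxy, which matches the conclusion reached in the conversation.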