[05:39:25] Can anyone help me with T246197? I've got a script on our VPS at /etc/cron.daily/calibre-cleanup that's seemingly not being run daily.
[05:39:25] T246197: "No space left on device" on wsexport-prod01.eqiad - https://phabricator.wikimedia.org/T246197
[16:20:38] hey, why am I getting all these gateway timeouts on wmflabs projects?
[16:21:56] is something wrong?
[16:38:42] hello! could I pester anyone into helping me debug https://phabricator.wikimedia.org/T249035 further? it seems I can't get `tcpdump` to give me more info on *outgoing* requests from the VPS instance. All I see for outgoing are DNS lookups
[16:50:41] !log tools launching tools-redis-03 (Buster) to see what happens
[16:50:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:51:22] musikanimal: it's likely that tc ratelimiting causes the outgoing traffic to be strange
[16:52:51] hmm okay, I had been meaning to ask if it were possible for a VPS/Toolforge tool to have too many outgoing requests, if that's what you mean
[16:53:36] something along those lines would seemingly be the only other explanation for this, aside from a prod network issue (which was my theory)
[16:53:38] I'm the sort that googles my tcpdump commands, so I'm not the best person to help. However, I know that bit mangles traffic a bit on one end or the other
[16:54:19] I don't think we limit outgoing requests per se from VPS
[16:54:46] The ratelimiting drops packets not full requests, per se
[16:55:26] yeah here the requests are timing out, and I couldn't find them in logstash which suggests the connection was never made
[16:56:13] Requests timing out are not by design on the TF/VPS end. It would suggest that something is broken or overloaded on the other end or something isn't right in the VPS network.
[16:57:00] well I have the same issue on Toolforge, which is what led me to believe it might be production
[16:57:32] It could be! It could also be the WMCS network because that's going to be affected the same way by a network issue in VPS.
[16:57:46] also the https://xtools.wmflabs.org/authorship tool is used frequently, and that tool makes requests to https://wikiwho.net. Those requests are not timing out, only requests to Wikimedia services
[16:58:18] I feel like if it were a WMCS thing the WikiWho requests would be timing out too, since that tool is used so much
[16:58:36] Ah. Then yeah, that sounds like production or the Cloud->production connection
[16:59:57] Checking a thing...
[17:00:39] okay, good that we're narrowing it down, I guess. I'm not sure how to debug further! But the issue is becoming rampant, 10-15 requests to production each minute are timing out. I can't be alone on this, since my bots on Toolforge are seeing the same issue
[17:01:45] Hmm, yeah, we haven't changed the hacks on that end. Was just checking if we'd applied things yet.
[17:05:52] Checking with the others on tcpdump commands and things
[17:06:09] as well as in case something comes to mind
[17:08:44] might help to limit the tcpdump command to a specific host and port, something like `tcpdump -i eth0 host en.wikipedia.org and port 443`
[17:14:44] musikanimal: do you have an existing pcap file that captured a timeout?
[17:17:53] yes, /home/musikanimal/webserver.pcap on xtools-prod07. I didn't see the outgoing request in there, though https://phabricator.wikimedia.org/T249035#6016364
[17:53:36] tgr: if you're still involved with wikispore, can you have a look at roebling.wikispore.eqiad.wmflabs? Its disk is full, which is breaking various things
[17:53:43] probably just need to wipe out vagrant logs or the like
[18:23:05] !log tools spin up tools-redis-1003 on stretch and connect to the cluster T248929
[18:23:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:23:08] T248929: Rebuild tools-redis servers as stretch - https://phabricator.wikimedia.org/T248929
[18:31:31] andrewbogott: I have no idea what that box is for, probably a good time to kill it
[18:37:32] tgr: ok :) Will you do the honors?
[18:38:11] it was created by Jeremyb
[18:46:15] https://openstack-browser.toolforge.org/server/roebling.wikispore.eqiad.wmflabs
[18:53:11] !log tools spin up tools-redis-1004 on stretch and connect to cluster T248929
[18:53:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:53:15] T248929: Rebuild tools-redis servers as stretch - https://phabricator.wikimedia.org/T248929
[18:55:55] !log wikispore deleting roebling and appleton, not used and triggering alarms
[18:55:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikispore/SAL
[18:59:54] there we go.
[19:00:26] it's a little scary that when you delete an instance a random subset of the remaining instances also disappears
[19:24:29] tgr: "it's a little scary that when you delete an instance a random subset of the remaining instances also disappears" -- in the Horizon UI?
[19:29:10] yeah
[20:18:54] !log techblog Syncing prod techblog db to develop server (upstream, not Cloud VPS)
[20:18:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Techblog/SAL
[20:41:40] !log tools deleting tools-redis-1003/4 to attach them to an anti-affinity group T248929
[20:41:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:41:43] T248929: Rebuild tools-redis servers as stretch - https://phabricator.wikimedia.org/T248929
[21:15:54] !log tools.zppixbot START - MediaWiki Upgrade - T249368
[21:15:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[21:15:57] T249368: ZppixBot Wiki Upgrade - 10:15 BST 2020-04-01 - https://phabricator.wikimedia.org/T249368
[21:19:42] !log tools.zppixbot tools.zppixbot@tools-sgebastion-07:~/ZppixBot/public_html/wiki$ git pull
[21:19:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[21:35:01] !log tools.zppixbot reset and force the above due to conflict crazy
[21:35:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[21:57:33] !log tools.zppixbot stopping webservice for now
[21:57:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[22:12:22] !log tools.zppixbot restarted webservice due to update after fiddling with git - now updating submodules, then composer
[22:12:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[22:32:51] !log tools switch tools-redis-1003 to the active redis server T248929
[22:32:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:32:54] T248929: Rebuild tools-redis servers as stretch - https://phabricator.wikimedia.org/T248929
[22:40:55] !log tools shut down tools-redis-1001/2 T248929
[22:40:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:40:59] T248929: Rebuild tools-redis servers as stretch - https://phabricator.wikimedia.org/T248929
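
Editor's note on the 16:38-17:17 timeout thread: the missing logstash entries suggested that the connections were never established. One rough way to test that from the instance is to time bare TCP connects, with no HTTP involved, and compare a Wikimedia host against an external one such as wikiwho.net. The Python sketch below is only an illustration: `probe_connect` is a hypothetical helper name, and the port and timeout defaults are assumptions, not values from the discussion.

```python
import socket
import time


def probe_connect(host, port=443, timeout=5.0):
    """Attempt a single TCP connect.

    Returns (ok, seconds_elapsed, error_text). No HTTP request is sent,
    so a failure here points at the network path rather than the service.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and opens the TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # OSError covers timeouts, refused connections and DNS failures.
        elapsed = time.monotonic() - start
        return False, elapsed, f"{type(exc).__name__}: {exc}"
```

Calling `probe_connect("en.wikipedia.org")` and `probe_connect("wikiwho.net")` in a loop and logging the results would show whether failures cluster on one destination. A connect that fails only after the full timeout would be consistent with dropped packets, while an immediate failure would point to a refusal or a DNS problem.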