[02:40:19] <_Somnifuguist> Hi- has something changed with the toolforge login server? I'm getting "Permission denied (publickey,hostbased)" when trying to ssh into it when it was working perfectly before. Might be something to do with having updated my os however.
[03:05:45] Somnifuguist, double-check that the username you're trying to log into is correct and that your SSH public key matches the one in https://toolsadmin.wikimedia.org/profile/settings/ssh-keys/
[03:06:29] remember, the correct username is the UNIX shell username at https://toolsadmin.wikimedia.org/profile/settings/accounts/
[03:35:26] Thanks, have tried both those things but still get permission denied. Is it possible I've been blocked? I often use a vpn that is globally blocked by wikimedia, and would have logged into toolforge with it at some point. Other than that, it could be an os thing but am able to ssh into my own server without problem, so not sure.
[11:24:44] !log admin [codfw1dev] enable puppet in puppetmaster01.cloudinfra-codfw1dev (disabled for unspecified reasons)
[11:24:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:44:48] !log toolsbeta taking down one of the test-k8s etcd nodes to reimage (T267140)
[14:44:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:44:52] T267140: [toolsbeta] Rebuild servers to learn how to take down the services without downtime - https://phabricator.wikimedia.org/T267140
[15:57:33] !log tools.quickcategories made tool readonly to avoid issues during ToolsDB maintenance
[15:57:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
[16:24:20] !log clouddb-services set toolsdb to read-only T266587
[16:24:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[16:24:23] T266587: ToolsDB replication is broken - https://phabricator.wikimedia.org/T266587
[16:25:44] bstorm: best of luck!
[16:26:06] Thanks!
[16:37:58] !log admin icinga downtime toolschecker for 2h because toolsdb maintenance (T266587)
[16:38:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:38:01] T266587: ToolsDB replication is broken - https://phabricator.wikimedia.org/T266587
[16:41:21] !log paws set paws in sqlite mode because T266587 (kubectl --namespace prod edit configmap hub-config)
[16:41:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[17:15:18] !log toolsbeta removing unused toolsbeta-k8s-etcd prefix (we use toolsbeta-test-k8s-etcd) (T267140)
[17:15:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[17:15:23] T267140: [toolsbeta] Rebuild servers to learn how to take down the services without downtime - https://phabricator.wikimedia.org/T267140
[17:18:27] !log toolsbeta launching instance toolsbeta-test-k8s-etcd-4 (T267140)
[17:18:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:27:06] !log toolsbeta creating toolsbeta-docker-imagebuilder-01 (T267616)
[18:27:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:27:10] T267616: Set up docker-registry and image builder infra in toolsbeta - https://phabricator.wikimedia.org/T267616
[18:49:19] !log toolsbeta associated floating IP to toolsbeta-docker-registry-01 and pointed DNS docker-registry.toolsbeta.wmflabs.org. at it
[18:49:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[19:42:48] !log toolsbeta safelisted "argocd" namespace with namespaceSelector for registry-admission controller
[19:42:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[19:45:12] !log tools rebooting tools-sgeexec-0950; OOM
[19:45:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:16:44] !log paws restart hub to apply move to sqlite. T267667
[20:16:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[20:16:47] T267667: 500 Internal Server Error while trying to log into PAWS - https://phabricator.wikimedia.org/T267667
[20:25:43] Thanks, chicocvenancio, for restarting the PAWS server.
[20:26:09] you're welcome.
[20:40:32] Daniel_Mietchen: I just commented on T266948. Maybe you're not saving your notebooks manually?
[20:40:33] T266948: Allow users to trigger re-syncing of live and public versions of a PAWS notebook - https://phabricator.wikimedia.org/T266948
[21:07:19] hello. what's wrong with labsdb? all my connections are hanging with "Waiting for global read lock"
[21:17:45] maintenance
[21:18:44] leloiandudu, https://lists.wikimedia.org/pipermail/cloud/2020-November/001287.html
[21:23:43] the last message on that thread was 2 days ago? https://lists.wikimedia.org/pipermail/cloud-announce/2020-November/000339.html
[21:28:09] that is a different thread
[21:35:58] leloiandudu: We are trying to restore replication, which broke and needs a rebuilt replica
[21:36:19] That requires an extended period of the database in read-only mode to get a clean dump from it
[21:36:31] It's grown to over a TB, so it's taken most of the day so far
[21:41:36] bstorm: thanks. do you have any thoughts on how much longer it might take? there's a global wikipedia event going on (Asian Month) and people are complaining because my tool is central to this event
[21:42:30] I wish I had a good estimate. I am not entirely sure. It should complete today, and I am watching it. When it completes, I'll undo the read lock as soon as I can.
[21:42:59] The thing is that without replication, there's no way to recover the database if anything happens to it.
[21:43:32] So right now, it's a single point of failure. I wasn't aware about the event or I might have tried to schedule this differently.
[21:44:20] I certainly couldn't let it sit for a month, though. The DB has crashed once in the past few months as is. We need a replica to keep things reasonably safe.
[21:44:37] well, it's a whole month event, so it's probably impossible anyway
[21:44:46] 👍🏻
[21:45:24] I'll send an email to the cloud-announce list as soon as I can get things back to normal
[21:58:39] bstorm: thank you
[23:52:18] hello! where can I find the implementation for `jsub`? I want to use the `-l h_rt=` but it seems it's unsupported, so I guess I need to use `qsub` instead. I wanted to make sure I'm otherwise doing the same thing jsub is