[04:44:20] morning, I'm trying to create a DNS zone in horizon under the traffic project but I get a pretty generic error "Error: Unable to create the zone.". If I check the browser request to the API, I got an HTTP 500 response with the following content: '"403 Client Error: FORBIDDEN for url: http://cloudservices1003.wikimedia.org:9001/v2/zones/"'
[05:35:19] and regarding /var/lib/git/labs/private, I'm seeing some commits tagged as "[LOCAL]" that of course are not being pushed to https://gerrit.wikimedia.org/g/labs/private, are those secrets being backed up somehow?
[07:01:05] arturo: ^^ let me know when you are around to check the DNS zone config please :)
[07:57:39] o/ vgutierrez
[07:58:02] morning sir :)
[07:58:25] RE: DNS zone, I wonder if you need quota
[07:58:54] I can check in a few minutes
[07:59:02] dunno, I'm setting an acme-chief server for the traffic project, and I need a DNS zone to fulfill dns-01 challenges
[07:59:24] something pretty similar to what Kren.air set up for deployment-prep with beta.wmflabs.org.
[07:59:39] regarding labs/private, I assume you are talking about a project puppetmaster?
[07:59:49] yeah, I've figured that out already :)
[07:59:59] ok :-P
[08:00:03] I've used the same approach that they're using in deployment-prep puppetmaster
[08:00:32] ok let me investigate about the zone, and will get back to you
[08:00:36] thx
[08:28:28] vgutierrez: what is the name of the DNS zone you are trying to create?
[08:28:42] traffic.wmflabs.org.
[08:33:30] I may have a clue of what is happening
[08:33:36] vgutierrez: could you please create a phab task?
[08:36:22] arturo: https://phabricator.wikimedia.org/T229783
[08:37:23] !log wmflabsdotorg promote myself to user & projectadmin to handle T229783
[08:37:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wmflabsdotorg/SAL
[08:37:33] T229783: Unable to create DNS zone traffic.wmflabs.org. in Horizon - https://phabricator.wikimedia.org/T229783
[08:39:05] arturo: hmm have you seen Krenair comment on the task?
[08:39:23] yup
[08:43:54] !log wmflabsdotorg create `traffic.wmflabs.org` subdomain T229783 using the `wmcs-makedomain` script
[08:43:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wmflabsdotorg/SAL
[08:43:57] T229783: Unable to create DNS zone traffic.wmflabs.org. in Horizon - https://phabricator.wikimedia.org/T229783
[08:44:19] !log traffic created the `traffic.wmflabs.org` subdomain (T229783)
[08:44:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Traffic/SAL
[08:47:29] arturo: yeah, now I already see the DNS zone under the traffic project, thanks :)
[08:47:58] cool
[08:48:17] will update our docs, because we don't have much
[08:48:36] arturo: BTW, what's the procedure to get a service user to update that DNS zone via API?
[08:49:22] you will have to generate a credentials file and then load it/use it in your script when interacting with the API
[08:49:55] in horizon I think you can click in your name in the top right corner to download your credentials file
[08:50:15] yeah, but I don't want to user my personal user
[08:50:20] *use
[08:50:30] the service should be able to work even if I leave WMF :)
[08:51:31] then a new robot/service account should be created I think. That would be another phab task :-)
[08:51:43] yeah, I need a service account
[08:51:47] do you have a template for that kind of task?
[08:51:54] I don't think so
[08:52:04] that request is not common at all
[08:52:08] ack
[08:56:32] arturo: https://phabricator.wikimedia.org/T229786 <3
[09:30:30] !log tools `root@tools-checker-03:~# toolscheckerctl restart` (T229787)
[09:30:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:30:34] T229787: Toolforge: sudden issues in both gridengine and k8s webservices - https://phabricator.wikimedia.org/T229787
[09:38:34] !log tools.svgtranslate Set TRUSTED_PROXIES=127.0.0.1,172.16.6.39
[09:38:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.svgtranslate/SAL
[09:39:03] !log tools `root@tools-checker-03:~# toolscheckerctl restart` again (T229787)
[09:39:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:39:06] T229787: Toolforge: sudden issues in both gridengine and k8s webservices - https://phabricator.wikimedia.org/T229787
[09:42:34] !log tools.svgtranslate Re-enabled auto deployments.
[09:42:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.svgtranslate/SAL
[10:07:43] arturo: so.. when is this WMCS team meeting happening? :)
[10:09:32] tomorrow :-)
[10:09:47] ack, thx
[10:10:23] vgutierrez: do you need something sooner? like right now?
[10:10:35] err it would be nice yes :)
[10:10:57] as I mentioned in the task, I'm not aware of the current procedure/rules/workflow
[10:11:21] I guess I could create an account for you as long as you are fine if we delete it tomorrow to introduce a proper account
[10:11:50] on the other hand you can unblock your work a bit if using your personal user account meanwhile?
[10:11:58] later, is just a matter of replacing API tokens?
[10:12:48] nah.. I think we can wait
[14:11:00] Is something wrong with the job grid?
[14:11:28] Getting lots of cron errors on tools about unable to send messages to qmaster.
[14:17:46] ORES in Cloud VPS just went down.
[14:18:51] Looks like our redis node just went offline
[14:19:42] andrewbogott, ^ could this be related?
[14:19:48] I also got icinga notification for gerrit-test going down
[14:19:53] though it just recovered
[14:20:21] 'PROBLEM - Host gerrit-test.git.eqiad.wmflabs is DOWN: CRITICAL - Host Unreachable (gerrit-test.git.eqiad.wmflabs)'
[14:20:31] https://lists.wikimedia.org/pipermail/cloud-announce/2019-August/000201.html
[14:21:30] oh
[14:21:41] i missed that email
[14:23:32] Aha! Thank you.
[14:26:24] halfak: it looks like it's only taking 2-3 minutes per ores host, so far
[14:26:31] so things should be back to normal soon
[14:29:09] Cyberpower678: I moved the grid master this morning (https://lists.wikimedia.org/pipermail/cloud-announce/2019-July/000200.html) but things should be back to normal now
[14:29:43] andrewbogott: ah. Thanks.
[15:19:32] I can't `become anticompositebot` on Toolforge; it says there is no such tool. I created the tool an hour or two ago.
[16:10:23] !log tools `tools-k8s-master-01: systemctl restart maintain-kubeusers` T229846
[16:10:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:10:27] T229846: anticompositebot tool missing project directory - https://phabricator.wikimedia.org/T229846
[16:12:34] @AntiComposite there was an issue with the tool creation (maybe related to the maintenance). Could you try the become again?
[16:12:49] tracking the issue at https://phabricator.wikimedia.org/T229846
[16:25:52] jeh, It worked now. Thanks!
[17:17:06] !log admin Set downtime on gridengine and kubernetes webservice checks in icinga until 2019-09-02 (flaky tests)
[17:17:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:42:27] hi. it seems that the server i am working closed my ssh connection spontaneously. the nginx inside is not running either.
[17:42:34] is it because of these VM migrations?
[17:47:46] there’s a full list of affected instances in https://lists.wikimedia.org/pipermail/cloud-announce/2019-August/000201.html
[17:47:52] you can check if your instance is one of the ones that were migrated
[17:48:49] yup
[17:49:59] how long is it expected to last? the two days?
[17:51:03] I assumed the downtime for each individual VM would be fairly short – only as long as it takes to move to a new hypervisor
[17:51:18] though I don’t really know how long that would be
[17:55:06] thanks. let's hope it's a few hours
[20:36:29] !log tools rebooting oom tools-worker-1026
[20:36:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:24:36] !log deployment-prep deploying ores 4270244d4a54c520f581dd33b312aa52f9a4c736
[22:24:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[22:49:45] !log tools launching tools-worker-1040
[22:49:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL