[10:17:26] hi, FYI the ceph alerts in icinga are UNKNOWN; prometheus-labmon.eqiad.wmnet still points to cloudmetrics1002
[10:23:45] mmmm
[10:23:58] ok, let me fix that!
[10:26:12] that name is unfortunate
[10:26:32] I was only looking for cloud* stuff when failovering cloudmetrics1002 the other day
[10:28:02] yeah I noticed it is a mixture of names :|
[10:28:09] anyways thanks for investigating arturo
[10:28:20] grafana-labs now asks me to login, but does not take my ldap creds
[10:28:36] it should allow anon logins
[10:28:39] however: `grafana-server[902]: t=2021-05-10T10:28:19+0000 lvl=eror msg="Anonymous access organization error: 'WMCS': organization not found"`
[10:28:56] not sure how to fix that yet
[10:29:16] this is merged now: https://gerrit.wikimedia.org/r/c/operations/dns/+/688213
[10:29:51] server admin -> orgs -> select current -> rename at least on my home grafana installation
[10:31:01] I don't have any 'server admin' option
[10:31:44] :/
[10:32:03] OK I think I fixed that one
[10:32:06] then I have
[10:32:09] `grafana-server[902]: t=2021-05-10T10:31:43+0000 lvl=eror msg="Could not find plugin definition for data source: datasource-plugin-genericdatasource"`
[10:32:21] but all this makes me wonder, is this config not in puppet? :-S
[10:32:35] if a database, isn't that replicated from the other server?
[10:33:47] https://grafana-labs.wikimedia.org/d/000000012/tools-basic-alerts?orgId=1&refresh=5m works, which is enough for me :P
[10:34:59] why are there so many erroring queues, I cleared those yesterday
[10:38:01] godog: I see this:
[10:38:26] https://usercontent.irccloud-cdn.com/file/fFGoYnpr/image.png
[10:38:44] any idea how to update that tools.wmflabs.org URL
[10:38:45] ?
[10:40:35] actually something was wrong with the replication, that tools basic alerts board has a graph about k8s pod restarts but it's missing
[10:41:03] ok
[10:41:27] with replication you mean... between cloudmetrics servers?
[10:42:15] I guess so? that dashboard is different on 1001 than what it was on 1002
[10:42:54] I'm afraid all changes we made to grafana as it was on 1002 are now lost. I doubt we had any kind of grafana-specific replication between 1002 <-> 1002
[10:42:59] 1001 <-> 1002
[10:43:42] not a big deal, those can probably be recreated
[10:43:43] honestly this whole cloudmetrics observability stack needs love and attention
[10:43:57] * Majavah wonders what would be needed to be able to edit grafana dashboards
[10:44:06] * arturo brb
[10:44:44] https://github.com/wikimedia/puppet/blob/76e1f7cffcce02d4c303800eaa21a98698fdcd74/hieradata/role/common/wmcs/monitoring.yaml#L49
[11:00:17] o/
[11:00:51] ( wrong channel ^^' )
[11:02:23] Majavah: aren't you in that group'
[11:02:24] ?
[11:02:57] arturo: no
[11:04:04] I've signed some sort of nda, but since toolforge root does not directly require anything in it, I'm not in it automatically
[11:04:54] s/it/cn=nda/
[11:06:48] ok
[11:50:15] !log toolsbeta testing ingress-nginx update https://gerrit.wikimedia.org/r/c/operations/puppet/+/685715 on toolsbeta T264221
[11:50:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:50:20] T264221: Upgrade the nginx ingress controller in Toolforge (and likely PAWS) - https://phabricator.wikimedia.org/T264221
[12:03:36] arturo: mmhh if the datasource isn't editable from the UI then must be in /etc/grafana
[12:53:28] !log tools creating tools-k8s-haproxy-[3-4] to rebuild current ones without nfs and with keepalived
[12:53:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:04:39] !log deployment-prep Forward renamed config name for improved template search features (T277028)
[14:04:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[14:04:44] T277028: Add description to search and change title of component - https://phabricator.wikimedia.org/T277028
[14:04:53] !log deployment-prep Improve comment around ReferencePreviews beta cluster default (T271206)
[14:04:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[14:04:56] T271206: Enable RefPreviews on first wikis - https://phabricator.wikimedia.org/T271206
[14:14:39] !log traffic fixing merge conflict in /labs/private
[14:14:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Traffic/SAL
[14:15:05] !log traffic fixing some confusion about puppetmaster name (.wmflabs vs. .cloud)
[14:15:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Traffic/SAL
[14:30:11] tarrow: Docker has filled up the drive on orig-01.wikibase-registry.eqiad1.wikimedia.cloud and it's failing in various sad ways. Can you do a cleanup there please? (or delete the VM if it's no longer good for anything)
[14:30:22] right
[14:30:52] andrewbogott: I feel I had this task to do a few months ago and I failed to finish it in some way :(
[14:30:55] sorry :/
[14:31:17] It's not a problem, it just came up in a routine check of puppet breakage :)
[14:34:15] andrewbogott: I hit the nuke button
[14:34:18] thanks :)
[14:34:21] that works!
[14:35:53] yeah, the people who wanted it claimed that they don't need it any more. I'm happy to have less burden on us
[14:57:38] !log tools allow tools-k8s-haproxy-[3-4] to use the tools-k8s-haproxy-keepalived-vip address (172.16.6.113) (T252239)
[14:57:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:57:42] T252239: Rebuild tools-k8s-haproxy-* as an anti-affinity server group - https://phabricator.wikimedia.org/T252239
[15:03:33] !log tools clear all error states caused by overloaded exec nodes
[15:03:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:06:35] !log tools carefully rolling out keepalived to tools-k8s-haproxy-[3-4] while making sure [1-2] do not have changes
[15:06:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:22:43] !log tools change k8s.svc.tools.eqiad1.wikimedia.cloud. to point to the tools-k8s-haproxy-keepalived-vip address 172.16.6.113 (T252239)
[15:22:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:22:47] T252239: Rebuild tools-k8s-haproxy-* as an anti-affinity server group - https://phabricator.wikimedia.org/T252239
[15:29:27] !log tools.fourohfour restarting, unresponsive
[15:29:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.fourohfour/SAL
[16:05:50] Was there any change in the last couple days that could explain why email sent from cloud VPS would not arrive ..but now it does?
[16:06:01] anything with mail servers lately?
[16:06:19] It's a good thing, it works now. just dont know why
[16:07:00] mutante: we replaced the cloud vps outbound mail servers a week or two ago, but I don't think we did anything in the last day or two that would explain it starting again
[16:08:09] Majavah: sometime between April 30 and today made it work. so I guess that was it. Though normally wouldn't expect changes in mail routing behaviour
[16:08:39] thanks, I will just accept it is resolved
[16:38:11] !log tools.lexeme-forms deployed 248527544d (l10n updates)
[16:38:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[17:43:33] !log toolsbeta testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/688361 in toolsbeta T264221
[17:43:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[17:43:36] T264221: Upgrade the nginx ingress controller in Toolforge (and likely PAWS) - https://phabricator.wikimedia.org/T264221
[17:59:28] !log tools.ranker deployed 3c78f9f4b5 (remove dead code)
[17:59:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[19:31:23] I'm going to do a switchover of the backend instance for outreachdashboard.wmflabs.org. Is that something I can do just by updating the existing web proxy from Horizon? When I click 'update proxy' for it, both the 'Record' and 'Domain' fields in the mobile and non-editable, but the Domain field incorrectly shows "wmcloud.org" rather than "wmflabs.org".
[19:33:27] ragesoss: looking...
[19:33:52] ragesoss: what project is that in?
[19:34:00] ("in the modal are non-editable" I meant)
[19:34:05] globaleducation
[19:35:08] I can still set up a new proxy for wmflabs, and maybe showing the default wmcloud domain instead of the actual domain of the edited proxy is just a UI bug?
[19:36:02] yeah, it's definitely a UI bug I'm not sure if it will therefore create a new record or update the proper one
[19:36:13] I will look at the cause of the bug... in the meantime you can try it or not as you prefer :)
[19:36:31] I guess I can test it out with a not-in-use proxy that's already there.
[19:37:11] oh, it errors: Error: Unable to update proxy: No such domain Details
[19:37:11] can only concatenate str (not "int") to str
[19:37:22] T282489
[19:37:23] T282489: Horizon proxy dashboard: edit dialog shows wmcloud.org even if a wmflabs.org proxy is being edited - https://phabricator.wikimedia.org/T282489
[19:37:37] ragesoss: that's all terrible, I will see if I can reproduce in a test env
[19:42:22] !log toolsbeta setting profile::wmcs::kubeadm::docker_vol: false on ingress nodes
[19:42:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[19:54:20] ragesoss: try now?
[19:56:08] andrewbogott: yep, that works now!
[19:56:26] great, sorry for the bug
[19:56:44] no prob, thanks for the fix!
[22:58:28] !log tools setting `profile::wmcs::kubeadm::docker_vol: false` on ingress nodes
[22:58:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:58:49] !log tools cleared error state on a grid queue
[22:58:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:08:58] !log tools.jouncebot Restart to revert 560a22a (T243394)
[23:09:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.jouncebot/SAL