[00:05:55] FIRING: MaxConntrack: Max conntrack at 82.1% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:28:27] Toolforge (Quota-requests): Request increased build quota for toc Toolforge tool - https://phabricator.wikimedia.org/T398780#10977306 (JJMC89)
[00:50:55] RESOLVED: MaxConntrack: Max conntrack at 80.37% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:52:56] FIRING: MaxConntrack: Max conntrack at 80.18% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:57:55] RESOLVED: MaxConntrack: Max conntrack at 80.18% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:38:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-53 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:18:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-53 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:18:35] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-53 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:23:35] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-53 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:24:35] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-53 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:29:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[03:34:35] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-53 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:36:04] Tool-translatetagger: Create Gadget to Simplify Workflow for Adding Translation Tag - https://phabricator.wikimedia.org/T393170#10977356 (Super_nabla) Well done! Nice tool. @Gopavasanth, I forked your project and tried to add support for "tvar"s: https://github.com/super-nabla/translatable-wikitext-converter...
[03:37:51] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29361 bytes in 4.056 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[03:39:10] Tool-translatetagger: Adding tvars for links - https://phabricator.wikimedia.org/T393258#10977359 (Super_nabla) I tried to address this in a fork: https://github.com/super-nabla/translatable-wikitext-converter. I also posted this in T393170
[03:42:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[03:43:51] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29769 bytes in 2.880 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[03:46:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[03:47:55] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29771 bytes in 7.267 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[03:50:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[03:51:49] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29770 bytes in 0.837 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[04:20:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[04:21:51] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29769 bytes in 3.216 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[04:24:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[04:29:55] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29772 bytes in 7.457 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[04:35:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[04:44:51] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29507 bytes in 3.630 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:17:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:22:55] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29769 bytes in 6.472 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:25:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:26:55] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29872 bytes in 6.695 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:29:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:30:51] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29771 bytes in 2.899 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:53:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:58:53] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29771 bytes in 4.479 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[06:02:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[06:12:49] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29770 bytes in 0.496 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[06:26:23] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/22 (owner: l10n-bot)
[06:26:28] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/22 (owner: l10n-bot)
[06:30:16] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/39 (owner: l10n-bot)
[06:30:20] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/39 (owner: l10n-bot)
[06:32:23] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/3 (owner: l10n-bot)
[06:32:31] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/3 (owner: l10n-bot)
[07:01:32] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 4 deleted instances on gitlab-runners-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[07:22:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance runner-1035 in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[07:24:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[07:25:49] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29769 bytes in 1.375 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[07:32:28] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance runner-1035 in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[07:38:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:48:34] (open) taavi: logging: loki: Add network policy for jobs-api read access [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/868 (https://phabricator.wikimedia.org/T398645)
[07:48:36] (update) taavi: logging: loki: Add network policy for jobs-api read access [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/868 (https://phabricator.wikimedia.org/T398645)
[07:50:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[07:54:01] (open) taavi: logging: alloy: Deploy to double the workers [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/869 (https://phabricator.wikimedia.org/T386480)
[07:54:04] (update) taavi: logging: alloy: Deploy to double the workers [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/869 (https://phabricator.wikimedia.org/T386480)
[07:56:12] (update) taavi: logging: alloy: Deploy to double the workers [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/869 (https://phabricator.wikimedia.org/T386480)
[07:56:14] (update) taavi: logging: loki: Add network policy for jobs-api read access [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/868 (https://phabricator.wikimedia.org/T398645)
[07:58:55] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29781 bytes in 7.826 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[08:01:49] (update) dcaro: [maintain-harbor] remove "environment" from charts [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/867 (https://phabricator.wikimedia.org/T396504) (owner: raymond-ndibe)
[08:02:57] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[08:05:34] (approved) dcaro: logging: loki: Add network policy for jobs-api read access [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/868 (https://phabricator.wikimedia.org/T398645) (owner: taavi)
[08:05:48] (approved) dcaro: logging: alloy: Deploy to double the workers [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/869 (https://phabricator.wikimedia.org/T386480) (owner: taavi)
[08:08:50] (approved) dcaro: logs: Move multi-pod fix from jobs-api to here [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/82 (https://phabricator.wikimedia.org/T398647) (owner: taavi)
[08:09:49] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29855 bytes in 0.721 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[08:14:23] Toolforge (Toolforge iteration 21), Patch-For-Review: [maintain-harbor,ci] the version number does not get bumped on every release - https://phabricator.wikimedia.org/T396504#10977659 (dcaro) I think this was because of the CI settings in the repo: https://gitlab.wikimedia.org/repos/cloud/toolforge/toolf...
[08:15:41] Toolforge (Quota-requests): Request increased build quota for toc Toolforge tool - https://phabricator.wikimedia.org/T398780#10977672 (Kanashimi) I checked the other tools and found that some of them need to increase quota: signature-check needs to increase CPU mgp-cewbot needs to increase memory to 32G
[08:17:08] Toolforge (Quota-requests): Request increased build quota for toc Toolforge tool - https://phabricator.wikimedia.org/T398780#10977674 (dcaro) @Kanashimi Do you have an estimation of the extra CPU that you need? (double the current, 50% increase, 10CPUs, ...)
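The MaxConntrack alerts above report how full the netfilter connection-tracking table is on cloudvirt1067, firing around 80%. A minimal sketch of that calculation, assuming the percentage is simply the current entry count divided by the table limit and that the threshold is 80% (the logged values are consistent with this, but the actual alert rule is not shown here). The two `/proc/sys/net/netfilter/` files are standard Linux kernel interfaces; the function names and the threshold constant are hypothetical.

```python
from pathlib import Path

# Standard kernel files with the current and maximum number of tracked
# connections (present when the nf_conntrack module is loaded).
COUNT_FILE = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_FILE = Path("/proc/sys/net/netfilter/nf_conntrack_max")

# Hypothetical threshold: the real MaxConntrack rule is not shown in the log.
THRESHOLD_PERCENT = 80.0


def conntrack_usage_percent(count: int, maximum: int) -> float:
    """Return conntrack table usage as a percentage of its maximum size."""
    if maximum <= 0:
        raise ValueError("conntrack maximum must be positive")
    return 100.0 * count / maximum


def should_fire(percent: float, threshold: float = THRESHOLD_PERCENT) -> bool:
    """Assumed firing condition: usage strictly above the threshold."""
    return percent > threshold


def read_host_usage() -> float:
    """Read the live count and limit from /proc on the current host."""
    count = int(COUNT_FILE.read_text().strip())
    maximum = int(MAX_FILE.read_text().strip())
    return conntrack_usage_percent(count, maximum)


if __name__ == "__main__":
    try:
        pct = read_host_usage()
    except OSError:
        print("nf_conntrack values are not readable on this host")
    else:
        print(f"conntrack at {pct:.2f}% (firing={should_fire(pct)})")
```

On a real host the alert is more likely evaluated in Prometheus from exporter metrics (the `:9100` port is node_exporter's default) than by reading `/proc` directly; the arithmetic is the same either way.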
[08:17:33] (update) taavi: logs: Move multi-pod fix from jobs-api to here [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/82 (https://phabricator.wikimedia.org/T398647)
[08:17:47] (merge) taavi: logs: Move multi-pod fix from jobs-api to here [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/82 (https://phabricator.wikimedia.org/T398647)
[08:17:55] (merge) taavi: logging: loki: Add network policy for jobs-api read access [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/868 (https://phabricator.wikimedia.org/T398645)
[08:18:16] (update) taavi: logging: alloy: Deploy to double the workers [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/869 (https://phabricator.wikimedia.org/T386480)
[08:19:00] (merge) taavi: logging: alloy: Deploy to double the workers [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/869 (https://phabricator.wikimedia.org/T386480)
[08:19:15] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component logging
[08:19:33] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging
[08:21:12] Toolforge (Quota-requests): Request increased build quota for toc Toolforge tool - https://phabricator.wikimedia.org/T398780#10977710 (Kanashimi) This will allow the robots to perform safely: toc 16 CPUs signature-check 12 CPUs
[08:21:18] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component logging
[08:21:25] (update) dcaro: [jobs-cli] quota refactor [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/97 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[08:24:34] VPS-project-Phabricator, collaboration-services, Release-Engineering-Team: Add the 'other assignee' field to the Phabricator test instance - https://phabricator.wikimedia.org/T398732#10977729 (LSobanski) @brennen, @Dzahn for awareness
[08:25:14] (open) taavi: Tag v1.6.10 release [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/83
[08:25:18] (update) taavi: Tag v1.6.10 release [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/83
[08:25:26] (update) taavi: Tag v1.6.10 release [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/83
[08:26:31] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
[08:32:55] (PS1) Klausman: hiera/deployment-server: change name of MT AWS user [labs/private] - https://gerrit.wikimedia.org/r/1166754 (https://phabricator.wikimedia.org/T335491)
[08:42:44] (CR) Elukey: [C:+1] hiera/deployment-server: change name of MT AWS user [labs/private] - https://gerrit.wikimedia.org/r/1166754 (https://phabricator.wikimedia.org/T335491) (owner: Klausman)
[08:43:11] (CR) Klausman: [V:+2 C:+2] hiera/deployment-server: change name of MT AWS user [labs/private] - https://gerrit.wikimedia.org/r/1166754 (https://phabricator.wikimedia.org/T335491) (owner: Klausman)
[08:55:34] (open) dcaro: Draft: toolforge_get_versions: fix cli version detection [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/870
[08:56:39] (update) dcaro: Draft: toolforge_get_versions: fix cli version detection [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/870
[09:00:45] cloud-services-team, Toolforge, Epic: [Epic] Toolforge UI: Discovery - https://phabricator.wikimedia.org/T375914#10977950 (Sarai-WMF)
[09:05:16] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[48-51] - https://phabricator.wikimedia.org/T394333#10977972 (dcaro) >>! In T394333#10973902, @ayounsi wrote: > There is currently only one switch per rack, so I suggest we only use one uplink for now, and...
[09:06:10] Toolforge (Quota-requests): Request increased build quota for toc Toolforge tool - https://phabricator.wikimedia.org/T398780#10977973 (dcaro) signature-check is a different tool? Can you open a task for it if so? +1 on toc
[09:18:00] Toolforge (Quota-requests): Request increased build quota for toc Toolforge tool - https://phabricator.wikimedia.org/T398780#10978025 (dcaro) Actually, @Kanashimi, it seems that your jobs are hitting the default per-job limit: https://grafana.wmcloud.org/d/TJuKfnt4z/kubernetes-namespace?orgId=1&from=now-24h&...
[09:41:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-53 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:52:06] (approved) dcaro: [jobs-cli] quota refactor [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/97 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[09:52:10] (update) dcaro: [jobs-cli] quota refactor [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/97 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[10:19:15] cloud-services-team, Toolforge: `information_schema.views` takes long time to query - https://phabricator.wikimedia.org/T398808 (Hamishcn) NEW
[10:21:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-53 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:28:55] FIRING: PawsJupyterHubDown: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[10:29:28] FIRING: TargetDown: Job jupyterhub is unreachable in project paws instance hub-paws.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[10:33:56] RESOLVED: PawsJupyterHubDown: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[10:34:28] RESOLVED: TargetDown: Job jupyterhub is unreachable in project paws instance hub-paws.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[10:45:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance runner-1038 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:48:09] cloud-services-team, Toolforge: toolsbeta paging - https://phabricator.wikimedia.org/T396038#10978423 (taavi) The logged alert data from the last time looks like this: `lang=json { "alertname": "HarborComponentDown", "component": "redis", "description": "Component redis is reporting as down, toolfo...
[10:51:19] cloud-services-team, Toolforge, Patch-For-Review: toolsbeta paging - https://phabricator.wikimedia.org/T396038#10978437 (taavi) a:taavi
[10:55:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance runner-1038 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:56:23] cloud-services-team, Cloud-VPS: metricsinfra: Karma now showing some alerts twice - https://phabricator.wikimedia.org/T398812 (taavi) NEW p:Triage→Medium
[10:57:27] cloud-services-team, Cloud-VPS, Patch-For-Review: metricsinfra: Karma now showing some alerts twice - https://phabricator.wikimedia.org/T398812#10978483 (taavi) a:taavi
[10:58:00] (PS1) Majavah: alertmanager: Add explicit blackhole rule for remaining alerts [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1166795 (https://phabricator.wikimedia.org/T398812)
[11:08:35] (merge) taavi: Tag v1.6.10 release [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/83
[11:10:32] (update) taavi: Tag v1.6.10 release [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/83
[11:11:31] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
[11:22:47] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
[11:23:00] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
[11:23:17] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
[11:23:35] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
[11:23:55] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
[11:28:01] (update) taavi: Query logs from Loki [repos/cloud/toolforge/jobs-api] (taavi/logging) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/180 (https://phabricator.wikimedia.org/T398645)
[11:28:01] (update) taavi: Draft: Use logging multi-pod fix moved to toolforge-weld [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/179 (https://phabricator.wikimedia.org/T398647)
[11:28:40] (update) taavi: Use logging multi-pod fix moved to toolforge-weld [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/179 (https://phabricator.wikimedia.org/T398647)
[11:28:45] (update) taavi: Use logging multi-pod fix moved to toolforge-weld [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/179 (https://phabricator.wikimedia.org/T398647)
[11:38:56] FIRING: CloudVPSDesignateLeaks: Detected 8 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:16:05] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-53 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[12:16:28] Toolforge (Toolforge iteration 21), Patch-For-Review: [jobs-api] expose health_check.type deprecation metrics - https://phabricator.wikimedia.org/T396236#10978684 (Raymond_Ndibe) we decided to no longer not deprecate the health_check.type key anymore for the api, and only change this inside the code. As...
[12:16:47] Toolforge (Toolforge iteration 21), Patch-For-Review: [jobs-api] expose health_check.type deprecation metrics - https://phabricator.wikimedia.org/T396236#10978686 (Raymond_Ndibe) In progress→Declined
[12:17:43] cloud-services-team, Toolforge (Toolforge iteration 21), Patch-For-Review: [jobs-api] Introduce deprecation metrics - https://phabricator.wikimedia.org/T390137#10978688 (Raymond_Ndibe) In progress→Resolved
[12:17:44] Toolforge (Toolforge iteration 21), Patch-For-Review: [jobs-api] expose health_check.type deprecation metrics - https://phabricator.wikimedia.org/T396236#10978691 (Raymond_Ndibe) Declined→Resolved
[12:24:38] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/40
[12:26:00] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1166814 (owner: L10n-bot)
[12:26:35] (close) raymond-ndibe: [maintain-harbor] remove "environment" from charts [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/867 (https://phabricator.wikimedia.org/T396504)
[12:30:56] (update) taavi: Query logs from Loki [repos/cloud/toolforge/jobs-api] (taavi/logging) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/180 (https://phabricator.wikimedia.org/T398645)
[12:40:28] FIRING: InstanceDown: Project gitlab-runners instance runner-1030 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:43:56] (update) taavi: Query logs from Loki [repos/cloud/toolforge/jobs-api] (taavi/logging) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/180 (https://phabricator.wikimedia.org/T398645)
[12:45:28] RESOLVED: InstanceDown: Project gitlab-runners instance runner-1030 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:47:30] (update) dcaro: runtime.k8s.image: periodically refresh image-config data [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/160 (https://phabricator.wikimedia.org/T357112) (owner: raymond-ndibe)
[12:48:36] (update) taavi: Query logs from Loki [repos/cloud/toolforge/jobs-api] (taavi/logging) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/180 (https://phabricator.wikimedia.org/T398645)
[12:48:45] (update) taavi: Query logs from Loki [repos/cloud/toolforge/jobs-api] (taavi/logging) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/180 (https://phabricator.wikimedia.org/T398645)
[13:13:07] (update) raymond-ndibe: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[13:13:12] (update) raymond-ndibe: Draft: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[13:13:20] (update) raymond-ndibe: Draft: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[13:13:34] (update) raymond-ndibe: Draft: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[13:17:05] (CR) David Caro: [C:+1] alertmanager: Add explicit blackhole rule for remaining alerts [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1166795 (https://phabricator.wikimedia.org/T398812) (owner: Majavah)
[13:17:40] (CR) Majavah: [C:+2] alertmanager: Add explicit blackhole rule for remaining alerts [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1166795 (https://phabricator.wikimedia.org/T398812) (owner: Majavah)
[13:18:25] cloud-services-team, Toolforge: [toolsdb] `information_schema.views` takes long time to query - https://phabricator.wikimedia.org/T398808#10978877 (dcaro)
[13:19:15] (Merged) jenkins-bot: alertmanager: Add explicit blackhole rule for remaining alerts [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1166795 (https://phabricator.wikimedia.org/T398812) (owner: Majavah)
[13:19:16] (approved) dcaro: Deploy logging stack by default [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/253 (https://phabricator.wikimedia.org/T386480) (owner: taavi)
[13:32:38] cloud-services-team, Toolforge: [components-api,beta] Generated configs should contain cpu values as numbers, not strings - https://phabricator.wikimedia.org/T398497#10978960 (dcaro) Had a quick look, and it seems k8s might reply with a string instead of a float sometimes (not clear when though): https:/...
[13:47:42] (PS1) David Caro: toolforge.component.deploy: support multiarch packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016)
[13:51:20] (CR) CI reject: [V:-1] toolforge.component.deploy: support multiarch packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016) (owner: David Caro)
[13:51:45] RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 10 deleted instances on gitlab-runners-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[14:01:38] (PS2) David Caro: toolforge.component.deploy: support multiarch packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016)
[14:04:46] cloud-services-team, Cloud-VPS: metricsinfra: Karma now showing some alerts twice - https://phabricator.wikimedia.org/T398812#10979146 (taavi) Open→Resolved
[14:05:09] (CR) CI reject: [V:-1] toolforge.component.deploy: support multiarch packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016) (owner: David Caro)
[14:08:19] (update) raymond-ndibe: Draft: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[14:14:42] (update) raymond-ndibe: Draft: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[14:17:22] (merge) taavi: Deploy logging stack by default [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/253 (https://phabricator.wikimedia.org/T386480)
[15:00:39] (update) raymond-ndibe: Draft: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[15:04:52] cloud-services-team, Toolforge, Patch-For-Review: toolsbeta paging - https://phabricator.wikimedia.org/T396038#10979650 (taavi) Open→Resolved Optimistically closing.
[15:25:02] VPS-project-Phabricator, collaboration-services, Release-Engineering-Team: Add the 'other assignee' field to the Phabricator test instance - https://phabricator.wikimedia.org/T398732#10979764 (LSobanski) a:Dzahn
[15:34:27] (PS3) David Caro: toolforge.component.deploy: support multiarch packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016)
[15:38:14] (CR) CI reject: [V:-1] toolforge.component.deploy: support multiarch packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016) (owner: David Caro)
[15:49:44] (PS4) David Caro: toolforge.component.deploy: support multiarch packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016)
[15:58:33] (PS5) David Caro: toolforge.component.deploy: support multiarch packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016)
[15:59:37] (PS6) David Caro: toolforge.component.deploy: support multiarch packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016)
[16:00:19] (CR) David Caro: "I think this is ready for review, I tested some, but I'll fully test when doing the first deployment of misctools before merging." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016) (owner: David Caro)
[16:19:23] cloud-services-team, Toolforge: [toolsdb] `information_schema.views` takes long time to query - https://phabricator.wikimedia.org/T398808#10980017 (dcaro) In the help for mysql they seem to point only to optimizing the querying the client does, as in avoid querying all the tables and such: https://dev.m...
[16:27:19] (update) dcaro: tool-config: export the config schema [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/98 (https://phabricator.wikimedia.org/T397724)
[16:27:47] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/40 (owner: l10n-bot)
[16:27:49] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/40 (owner: l10n-bot)
[16:30:25] (update) raymond-ndibe: Draft: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[16:31:09] cloud-services-team, Toolforge (Toolforge iteration 21), Patch-For-Review: [jobs-api] Jobs API should query logs from Loki - https://phabricator.wikimedia.org/T398645#10980069 (dcaro)
[16:31:20] cloud-services-team, Toolforge (Toolforge iteration 21): [components-api,beta] CI pipelines should wait until Toolforge deployment is 100% successful - https://phabricator.wikimedia.org/T398485#10980071 (dcaro)
[16:39:46] (update) raymond-ndibe: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[16:45:02] (PS1) Btullis: Add fake cephx key data for the new cephosd cluster in codfw [labs/private] - https://gerrit.wikimedia.org/r/1166871 (https://phabricator.wikimedia.org/T374923)
[16:49:50] (update) raymond-ndibe: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[16:54:48] cloud-services-team, Cloud-VPS, Patch-For-Review: Store logs for Cloud VPS egress NAT mappings - https://phabricator.wikimedia.org/T273734#10980129 (taavi) Open→Resolved Deployed, and added a note to https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Network#routing_source_ip.
[16:54:55] (update) raymond-ndibe: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[16:55:51] (open) dcaro: deploy: allow retrieving a deploy with a token [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/109
[17:02:19] (update) dcaro: deploy: allow retrieving a deploy with a token [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/109
[17:02:59] (CR) Btullis: [V:+2 C:+2] Add fake cephx key data for the new cephosd cluster in codfw [labs/private] - https://gerrit.wikimedia.org/r/1166871 (https://phabricator.wikimedia.org/T374923) (owner: Btullis)
[17:07:34] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-53
[17:07:56] VPS-project-Phabricator, collaboration-services, Release-Engineering-Team: Add the 'other assignee' field to the Phabricator test instance - https://phabricator.wikimedia.org/T398732#10980167 (Aklapper) I'd say we shouldn't spend time manually setting stuff but fix {T398460} and make the test instanc...
[17:12:25] (update) dcaro: toolforge_get_versions: fix cli version detection [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/870
[17:13:41] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-53
[17:15:44] (update) dcaro: build: Upgrade Poetry dependencies [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/26 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:43:38] (update) raymond-ndibe: [jobs-api] check services diff [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/158 (https://phabricator.wikimedia.org/T392717)
[17:49:42] (update) raymond-ndibe: [jobs-api] check services diff [repos/cloud/toolforge/jobs-api] (fix_diff_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/158 (https://phabricator.wikimedia.org/T392717)
[18:01:45] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-53 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[18:12:02] VPS-project-Phabricator, collaboration-services, Release-Engineering-Team: Add the 'other assignee' field to the Phabricator test instance - https://phabricator.wikimedia.org/T398732#10980302 (Dzahn) To me the problem to fix is "test instance config is different from prod instance" regardless of the...
[18:14:24] VPS-project-Phabricator, collaboration-services, Release-Engineering-Team: Add the 'other assignee' field to the Phabricator test instance - https://phabricator.wikimedia.org/T398732#10980307 (Dzahn) This gets back to the old issue of: "a test instance is supposed to be like production, otherwise it...
[18:14:41] cloud-services-team, Data-Services, Data-Engineering, Patch-For-Review: Add global_edit_count to wikireplicas - https://phabricator.wikimedia.org/T344108#10980310 (SD0001) This will be useful for T355594 too. My script for generating voter lists for global elections can run quite faster if it can...
[18:53:31] (update) raymond-ndibe: [jobs-api] check services diff [repos/cloud/toolforge/jobs-api] (fix_diff_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/158 (https://phabricator.wikimedia.org/T392717)
[18:56:28] (update) raymond-ndibe: [jobs-api] check services diff [repos/cloud/toolforge/jobs-api] (fix_diff_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/158 (https://phabricator.wikimedia.org/T392717)
[19:11:11] (update) raymond-ndibe: [jobs-api] check services diff [repos/cloud/toolforge/jobs-api] (fix_diff_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/158 (https://phabricator.wikimedia.org/T392717)
[19:14:17] (update) raymond-ndibe: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[19:38:08] VPS-project-Phabricator, collaboration-services, Release-Engineering-Team: Add the 'other assignee' field to the Phabricator test instance - https://phabricator.wikimedia.org/T398732#10980682 (A_smart_kitten) If I've accidentally opened a can of worms with this task(!), let it be known that I would a...
[19:38:24] (update) raymond-ndibe: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[19:39:41] (update) raymond-ndibe: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[19:40:09] VPS-project-Phabricator, collaboration-services, Release-Engineering-Team: Add the 'other assignee' field to the Phabricator test instance - https://phabricator.wikimedia.org/T398732#10980701 (A_smart_kitten) In case I've accidentally opened a can of worms with this task (apologies if so!), I'd be fi...
[19:41:21] (update) raymond-ndibe: [jobs-cli] refactor job payload [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/98 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[19:51:26] (update) raymond-ndibe: [jobs-cli] refactor job payload [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/98 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[19:51:39] (update) raymond-ndibe: [jobs-cli] refactor job payload [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/98 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[19:58:06] (update) raymond-ndibe: openapi: added several servers [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/107 (owner: dcaro)
[19:58:57] (approved) raymond-ndibe: openapi: added several servers [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/107 (owner: dcaro)
[20:01:21] (update) raymond-ndibe: [envvars-cli] print error string and not dict [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/81 (https://phabricator.wikimedia.org/T360147)
[20:06:48] (approved) raymond-ndibe: toolforge_get_versions: fix cli version detection [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/870 (owner: dcaro)
[20:06:48] (update) raymond-ndibe: toolforge_get_versions: fix cli version detection [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/870 (owner: dcaro)
[20:17:22] VPS-project-Codesearch: Add the "Wikimedia Portals" project to Codesearch - https://phabricator.wikimedia.org/T398871 (Amire80) NEW
[20:17:34] VPS-project-Codesearch, Wikimedia-Portals: Add the "Wikimedia Portals" project to Codesearch - https://phabricator.wikimedia.org/T398871#10980889 (Amire80)
[20:30:55] (PS1) Ladsgroup: Add portals [labs/codesearch] - https://gerrit.wikimedia.org/r/1166924 (https://phabricator.wikimedia.org/T398871)
[20:35:39] (CR) Ladsgroup: [C:+2] Add portals [labs/codesearch] - https://gerrit.wikimedia.org/r/1166924 (https://phabricator.wikimedia.org/T398871) (owner: Ladsgroup)
[20:36:28] (Merged) jenkins-bot: Add portals [labs/codesearch] - https://gerrit.wikimedia.org/r/1166924 (https://phabricator.wikimedia.org/T398871) (owner: Ladsgroup)
[20:36:33] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[20:36:44] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[20:36:58] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[20:37:04] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[20:37:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[20:37:27] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[20:38:31] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[20:38:39] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[20:38:46] VPS-project-Codesearch, Wikimedia-Portals, Patch-For-Review: Add the "Wikimedia Portals" project to Codesearch - https://phabricator.wikimedia.org/T398871#10980961 (Ladsgroup) Open→Resolved a:Ladsgroup It'll be added by the next 24 hours. If it doesn't, ping me.
[20:44:02] (open) raymond-ndibe: [quota] inc deprecated counter for quota requests [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/181 (https://phabricator.wikimedia.org/T389118)
[20:44:15] (update) raymond-ndibe: [quota] inc deprecated counter for quota requests [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/181 (https://phabricator.wikimedia.org/T389118)
[20:44:30] (update) raymond-ndibe: [quota] inc deprecated counter for quota requests [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/181 (https://phabricator.wikimedia.org/T389118)
[20:44:45] (update) raymond-ndibe: [quota] inc deprecated counter for quota requests [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/181 (https://phabricator.wikimedia.org/T389118)
[20:50:32] (update) raymond-ndibe: [jobs-cli] quota refactor [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/97 (https://phabricator.wikimedia.org/T389118)
[20:59:58] (PS1) Andrew Bogott: wmcs_libs/ceph.py: support changes in --ok-to-stop output in Octopus [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166929
[21:01:50] cloud-services-team, Data-Services, Wikifunctions, Abstract Wikipedia team (25Q4 (Apr–Jun)), Essential-Work: Make wikifunctionsclient_usage table available on cloud wiki replicas - https://phabricator.wikimedia.org/T392475#10981024 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.upda...
[21:03:18] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[21:05:17] (PS2) Andrew Bogott: wmcs_libs/ceph.py: support changes in ok-to-stop output in Pacific [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166929
[21:05:55] cloud-services-team, Data-Services, Wikifunctions, Abstract Wikipedia team (25Q4 (Apr–Jun)), Essential-Work: Make wikifunctionsclient_usage table available on cloud wiki replicas - https://phabricator.wikimedia.org/T392475#10981034 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.upda...
[21:06:03] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=97)
[21:06:08] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[21:06:49] cloud-services-team, Data-Services, Wikifunctions, Abstract Wikipedia team (25Q4 (Apr–Jun)), Essential-Work: Make wikifunctionsclient_usage table available on cloud wiki replicas - https://phabricator.wikimedia.org/T392475#10981035 (Ladsgroup) In progress→Resolved This should be d...
[21:19:07] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[21:21:07] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[21:24:19] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[22:48:55] FIRING: PawsJupyterHubDown: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[22:49:28] FIRING: TargetDown: Job jupyterhub is unreachable in project paws instance hub-paws.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[22:51:21] Toolforge (Quota-requests): Request increased build quota for toc Toolforge tool - https://phabricator.wikimedia.org/T398780#10981423 (Kanashimi) > signature-check is a different tool? Can you open a task for it if so? Yes, they are different tools. also I need to add quota to a total of three tools: toc, s...
[23:31:04] VPS-project-Phabricator, collaboration-services, Release-Engineering-Team: Add the 'other assignee' field to the Phabricator test instance - https://phabricator.wikimedia.org/T398732#10981512 (Dzahn) @A_smart_kitten You can be bold with this one. If you just want to add it, please go ahead! And thank...