[03:03:28] FIRING: GanetiMemoryPressure: Ganeti: High memory usage (94.75%) on ganeti1036:9100 - https://wikitech.wikimedia.org/wiki/Ganeti#Memory_pressure - https://grafana.wikimedia.org/d/gd6vep5Iz/ganeti-memory-pressure?orgId=1&var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DGanetiMemoryPressure
[03:33:28] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[05:28:28] FIRING: SystemdUnitFailed: wmf_auto_restart_ipmiseld.service on netmon1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:54:10] netops, Infrastructure-Foundations: Enable gNMI on SRX devices and fasw - https://phabricator.wikimedia.org/T390052#10921326 (ayounsi) From JTAC: > Engineering stated that JSD the process code that manages gRPC missed being shipped to this platform and they are working to push the grpc library code to th...
[07:03:28] FIRING: GanetiMemoryPressure: Ganeti: High memory usage (94.73%) on ganeti1036:9100 - https://wikitech.wikimedia.org/wiki/Ganeti#Memory_pressure - https://grafana.wikimedia.org/d/gd6vep5Iz/ganeti-memory-pressure?orgId=1&var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DGanetiMemoryPressure
[07:33:28] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[09:28:28] FIRING: SystemdUnitFailed: wmf_auto_restart_ipmiseld.service on netmon1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:08:28] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_ipmiseld.service on netmon1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:17:44] fwiw netbox-deploy (and all the other deploy repos) does build fine on my recent mac
[10:17:52] slyngs: ^^^
[10:19:53] Only if you have forced it to always use AMD64 images in docker.
[10:20:51] I looked and didn't find any specific setting, but might be
[10:20:53] You can set DOCKER_DEFAULT_PLATFORM
[10:21:01] I haven't, I checked
[10:21:22] was just fyi, not that the change looks bad :)
[10:22:28] We could also just add the DOCKER_DEFAULT_PLATFORM to the README, that's fine too. I just want to avoid having people think too much about it.
[10:23:27] It's also not the only bug, I'm not really sure how the build worked for 4.0.8 as Django 5.0 isn't supported on Python 3.9 that ships in Bullseye.
[10:24:57] Aaah, how do you install Docker?
[10:25:58] I have docker desktop
[10:27:00] Ah, my guess was that installing docker with brew or something might change something, but I also just installed Docker Desktop...
[10:27:24] * volans trying to rebuild 4.0.8 locally
[10:27:46] Thanks :-)
[10:27:50] netbox is bookworm
[10:28:13] But the build is using bullseye containers?
[10:28:23] did you just run "make all" as the readme says?
[10:28:28] Yes
[10:28:36] make clean and then make all
[10:28:51] it does
[10:28:52] docker build --build-arg DEBIAN_VER=bookworm -f Dockerfile.build -t netbox-bookworm:local .
[10:29:14] gives me a warning for the amd stuff
[10:29:14] - InvalidBaseImagePlatform: Base image docker-registry.wikimedia.org/python3-build-bookworm was pulled with platform "linux/amd64", expected "linux/arm64" for current build (line 1)
[10:29:53] so it's ok to fix it properly in the code
[10:29:57] build completed, all fine
[10:30:07] created artifacts/artifacts.bookworm.tar.gz
[10:30:22] Makefile:DEBIAN_VER ?= bookworm
[10:30:30] do you have the local code up-to-date?
[10:30:51] I'll just do a new checkout
[10:31:33] netops, Infrastructure-Foundations, SRE: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153 (cmooney) NEW p:Triage→Low
[10:32:10] Aargh, I have two checkouts for some reason.
[10:40:25] make sure also that src/ is checked out at the right tag
[10:40:53] https://wikitech.wikimedia.org/wiki/Netbox#Build_deploy_repository
[10:41:22] my suggestion is to try to build the current checkout first
[10:45:57] It's working now, you were correct that my netbox-deploy checkout was out of date. I'll just try to remove the platform stuff and see if it now figures it out automatically.
[10:48:30] Yeah, that fixes the platform stuff as well. Not sure why though.
[10:50:14] dunno :)
[10:51:08] That's a little weird, but I'll take it.
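The build discussion above touches on two things that could be checked before running `make all`. The Python sketch below is a hypothetical helper, not part of netbox-deploy: the function names and the release table are assumptions. It encodes the stock python3 of Debian 11 (bullseye, 3.9) and Debian 12 (bookworm, 3.11) against Django 5.0's minimum of Python 3.10, and it sets Docker's `DOCKER_DEFAULT_PLATFORM` to `linux/amd64` on non-x86-64 hosts such as Apple Silicon Macs. In the thread the override turned out to be unnecessary once the local checkout was up to date, so treat it as a fallback only.

```python
"""Hypothetical pre-flight checks before `make all` in a local netbox-deploy
checkout. Names and the release table are illustrative assumptions."""
import os
import platform
from typing import Dict, Optional

# Stock python3 (major, minor) shipped by each Debian release.
DEBIAN_PYTHON = {
    "bullseye": (3, 9),   # Debian 11
    "bookworm": (3, 11),  # Debian 12
}

# Django 5.0 requires Python 3.10 or newer.
DJANGO_5_MIN_PYTHON = (3, 10)


def stock_python_supports_django5(debian_ver: str) -> bool:
    """Return True if the release's default python3 can run Django 5.0."""
    try:
        return DEBIAN_PYTHON[debian_ver] >= DJANGO_5_MIN_PYTHON
    except KeyError:
        raise ValueError(f"unknown Debian release: {debian_ver!r}") from None


def build_env(machine: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of the environment that defaults Docker to linux/amd64
    images when the host is not x86-64 (e.g. a Mac reporting 'arm64').
    An explicitly exported DOCKER_DEFAULT_PLATFORM is left untouched."""
    machine = (machine or platform.machine()).lower()
    env = dict(os.environ)
    if machine not in ("x86_64", "amd64"):
        env.setdefault("DOCKER_DEFAULT_PLATFORM", "linux/amd64")
    return env


if __name__ == "__main__":
    for release in ("bullseye", "bookworm"):
        print(release, stock_python_supports_django5(release))
    print(build_env().get("DOCKER_DEFAULT_PLATFORM", "<not set>"))
```

For example, `subprocess.run(["make", "all"], env=build_env(), check=True)` would run the build with the fallback applied; the `DEBIAN_VER ?= bookworm` default in the Makefile already targets the release whose Python can run Django 5.0.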
[10:52:10] I'll get lunch and move on
[10:59:31] FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_atftpd.service on install7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:03:28] FIRING: [2x] GanetiMemoryPressure: Ganeti: High memory usage (95.06%) on ganeti1036:9100 - https://wikitech.wikimedia.org/wiki/Ganeti#Memory_pressure - https://grafana.wikimedia.org/d/gd6vep5Iz/ganeti-memory-pressure?orgId=1&var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DGanetiMemoryPressure
[11:08:28] FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_atftpd.service on install7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:09:31] FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_atftpd.service on install7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:23:09] FYI, I'm upgrading cuminunpriv1001 to Bookworm
[11:33:28] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[11:34:31] FIRING: [2x] SystemdUnitFailed: prometheus-puppet-agent-stats.service on cuminunpriv1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:03:28] FIRING: [2x] SystemdUnitFailed: prometheus-puppet-agent-stats.service on cuminunpriv1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:16:00] cuminunpriv1001 update is completed
[14:58:44] folks, is this normal? Tue Jun 17 13:55:37 2025 - INFO: No-installation mode selected, disabling startup
[14:58:49] during a ganeti VM creation
[14:58:57] it is 14:58 UTC
[14:59:25] I am looking at the docs but I am unsure where to begin
[15:00:25] and gnt console gives Instance 'wikikube-worker-exp2001.codfw.wmnet' does not exist
[15:00:52] I wonder if I made any typos in the cookbook command, which wouldn't be a first for me
[15:03:28] FIRING: GanetiMemoryPressure: Ganeti: High memory usage (94.81%) on ganeti1036:9100 - https://wikitech.wikimedia.org/wiki/Ganeti#Memory_pressure - https://grafana.wikimedia.org/d/gd6vep5Iz/ganeti-memory-pressure?orgId=1&var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DGanetiMemoryPressure
[15:07:29] effie: disk operations are blocking and there's currently a large VM being moved for some hardware maintenance, your instance creation should resume in approx an hour
[15:08:13] think https://xkcd.com/303/ for Ganeti
[15:08:38] s/disk operations/disk allocations
[15:09:23] are you saying that there is a VM more important than my VM?
[15:09:30] lol
[15:09:45] effie: did you buy a premium VM or just a regular one?
[15:10:17] please get in touch with our sales rep to upgrade to premium VMs :-P
[15:10:23] I got a password from my brother, I assumed he bought the premium service
[15:10:44] I will complain to management
[15:10:46] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10924062 (xcollazo) CC @BTullis
[15:16:00] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10924112 (xcollazo) @cmooney: +1 to the change. Can you please share the link to this dashboard?
[15:21:48] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10924152 (cmooney) >>! In T397153#10924112, @xcollazo wrote: > @cmooney: +1 to the change. > > Can you please share the link to this dashboard?...
[15:33:28] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[15:36:14] ^ this could be me, there is wikikube-worker-exp2001 on site.pp but host is still being created :/
[15:40:00] don't worry there is a bug that causes that alert to fire much more often than it should
[15:42:23] I will need to sign off in a bit, and the VM is still en route, what are my options?
[15:42:36] i.e. how can I cause less trouble for anyone else
[15:52:38] just wait it out, the cookbook will automatically resume once your job gets its turn in the queue
[15:53:12] the "Report parity errors between PuppetDB and Netbox for physical devices" basically alerts all the time for various reasons, it's most likely not even your VM
[16:03:28] FIRING: SystemdUnitFailed: wmf_auto_restart_ipmiseld.service on netmon1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:36:55] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10924650 (xcollazo) Now that I think more about this: I don't know where in puppet, but I am aware that we throttle any individual download to 3-6...
[16:55:52] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10924754 (cmooney) >>! In T397153#10924650, @xcollazo wrote: > Perhaps this also includes rsync traffic? Yeah the throughput graphs include all...
[19:03:28] FIRING: GanetiMemoryPressure: Ganeti: High memory usage (94.75%) on ganeti1036:9100 - https://wikitech.wikimedia.org/wiki/Ganeti#Memory_pressure - https://grafana.wikimedia.org/d/gd6vep5Iz/ganeti-memory-pressure?orgId=1&var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DGanetiMemoryPressure
[19:33:28] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[20:03:28] FIRING: [2x] GanetiMemoryPressure: Ganeti: High memory usage (95.02%) on ganeti1036:9100 - https://wikitech.wikimedia.org/wiki/Ganeti#Memory_pressure - https://grafana.wikimedia.org/d/gd6vep5Iz/ganeti-memory-pressure?orgId=1&var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DGanetiMemoryPressure
[20:03:28] FIRING: SystemdUnitFailed: wmf_auto_restart_ipmiseld.service on netmon1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:52:32] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10925689 (xcollazo) >That said we you can see that many of the busiest times - as seen on the Grafana throughput graph - correlate with times when...
[21:15:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[21:30:25] RESOLVED: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[23:33:28] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
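The NetboxPhysicalHosts alert fires repeatedly in this log, and at 15:53 it is described as alerting "all the time for various reasons". Conceptually, a parity report between PuppetDB and Netbox compares two host inventories in both directions. The Python sketch below is hypothetical: the dataclass, the function, the hostnames, and the choice to leave out hosts that Netbox records as virtual machines are assumptions, not the actual Netbox report code. Under that assumption, a new VM such as the one discussed at 15:36 would not by itself trip a physical-device comparison.

```python
"""Hypothetical illustration of a PuppetDB/Netbox parity check for physical
hosts. The data model and function are assumptions, not the real report."""
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ParityReport:
    # Hosts reporting to PuppetDB that Netbox does not list as physical devices.
    missing_in_netbox: FrozenSet[str]
    # Physical devices in Netbox that have no matching PuppetDB record.
    missing_in_puppetdb: FrozenSet[str]

    @property
    def ok(self) -> bool:
        return not self.missing_in_netbox and not self.missing_in_puppetdb


def physical_parity(
    puppetdb_hosts: Iterable[str],
    netbox_physical: Iterable[str],
    netbox_virtual: Iterable[str] = (),
) -> ParityReport:
    """Compare the two host sets; hosts Netbox knows as VMs are excluded
    from the physical comparison (an assumption mirroring the alert's scope)."""
    puppet = set(puppetdb_hosts) - set(netbox_virtual)
    netbox = set(netbox_physical)
    return ParityReport(
        missing_in_netbox=frozenset(puppet - netbox),
        missing_in_puppetdb=frozenset(netbox - puppet),
    )


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    report = physical_parity(
        puppetdb_hosts={"host1001", "host1002", "vm1001"},
        netbox_physical={"host1001", "host1002", "spare1001"},
        netbox_virtual={"vm1001"},
    )
    print(report.ok, sorted(report.missing_in_puppetdb))  # False ['spare1001']
```

Taking the set difference in both directions surfaces both kinds of drift in one pass, which is also why a report like this can stay noisy: any host in transition on either side shows up until both inventories agree.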