[04:26:17] Fundraising-Backlog, fundraising-tech-ops, Wikimedia-Fundraising-CiviCRM: Create audit directories for chariot (incoming, completed/ignored) - https://phabricator.wikimedia.org/T425186#11888304 (Dwisehaupt) Open→Resolved chariot has been added to the audit config and directories have been...
[05:00:32] PROBLEM - Host frmon2002 is DOWN: PING CRITICAL - Packet loss = 100%
[05:05:10] RECOVERY - Host frmon2002 is UP: PING OK - Packet loss = 0%, RTA = 30.43 ms
[05:11:23] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [extensions/DonationInterface] (REL1_43) - https://gerrit.wikimedia.org/r/1282545 (owner: L10n-bot)
[06:27:10] PROBLEM - check_memory on fransw2001 is CRITICAL: CRIT Memory 99% used. Largest process: trino-server-co (1176121) = 97.8% https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=fransw2001&service=check_memory
[06:37:08] RECOVERY - check_memory on fransw2001 is OK: OK Memory 92% used https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=fransw2001&service=check_memory
[09:03:18] fundraising-tech-ops, Observability-Alerting: Update firewall rules to allow frtech hosts to send alerts to production alertmanger - https://phabricator.wikimedia.org/T422888#11888746 (Dwisehaupt) Open→Resolved a:Dwisehaupt This has been merged and verified from the frmon hosts. closing.
[09:16:04] FIRING: DiskSpaceWarn: Disk space WARNING for (instance civi1002.frack.eqiad.wmnet:9100):/tmp - https://alerts.wikimedia.org/?q=alertname%3DDiskSpaceWarn
[09:21:04] FIRING: [2x] DiskSpaceWarn: Disk space WARNING for (instance civi1002.frack.eqiad.wmnet:9100):/tmp - https://alerts.wikimedia.org/?q=alertname%3DDiskSpaceWarn
[09:23:42] (PS1) Lars SG: Setting to exclude location types from Email::Clean [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/1282926 (https://phabricator.wikimedia.org/T425195)
[09:23:59] Fundraising-Backlog, fundraising-tech-ops, Wikimedia-Fundraising-CiviCRM: Create audit directories for chariot (incoming, completed/ignored) - https://phabricator.wikimedia.org/T425186#11888813 (Eileenmcnaughton) So my yaml looks like api_key: sk_live_5490 # Default directory where reports will...
[09:24:14] (PS2) Lars SG: Setting to exclude location types from Email::Clean [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/1282926 (https://phabricator.wikimedia.org/T425195)
[09:29:03] (PS1) Lars SG: Don't merge identical email addresses with different location types [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/1282929 (https://phabricator.wikimedia.org/T425195)
[11:03:04] (PS10) Eileen: Get chariot tests working & various updates in this pre-alpha stage [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/1281749 (https://phabricator.wikimedia.org/T419044)
[11:03:07] RESOLVED: SystemdService: Systemd service nginx.service stopped on frdata1003.frack.eqiad.wmnet:9100 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DSystemdService
[11:03:11] RESOLVED: SystemdService: Systemd service nginx.service stopped on frdata1003.frack.eqiad.wmnet:9100 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DSystemdService
[11:03:24] RESOLVED: [2x] SSLExpiryWarning: SSL expiry warning donor-wiki expires in -20578.45810090278 days - https://alerts.wikimedia.org/?q=alertname%3DSSLExpiryWarning
[11:03:24] RESOLVED: [2x] SSLExpiryCritical: SSL expiry critical donor-wiki expires in -20578.45810090278 days - https://alerts.wikimedia.org/?q=alertname%3DSSLExpiryCritical
[11:03:36] (CR) CI reject: [V:-1] Get chariot tests working & various updates in this pre-alpha stage [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/1281749 (https://phabricator.wikimedia.org/T419044) (owner: Eileen)
[11:04:17] (PS11) Eileen: Get chariot tests working & various updates in this pre-alpha stage [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/1281749 (https://phabricator.wikimedia.org/T419044)
[11:24:16] FIRING: ContextSwitchingSpike: Host context switching high (instance frdb1007.frack.eqiad.wmnet:9100) - https://alerts.wikimedia.org/?q=alertname%3DContextSwitchingSpike
[11:59:16] RESOLVED: ContextSwitchingSpike: Host context switching high (instance frdb1007.frack.eqiad.wmnet:9100) - https://alerts.wikimedia.org/?q=alertname%3DContextSwitchingSpike
[12:38:39] FIRING: SystemdService: Systemd service nginx.service stopped on frdata1003.frack.eqiad.wmnet:9100 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DSystemdService
[12:39:24] FIRING: [2x] SSLExpiryWarning: SSL expiry warning donor-wiki expires in -20578.526850902777 days - https://alerts.wikimedia.org/?q=alertname%3DSSLExpiryWarning
[12:39:24] FIRING: [2x] SSLExpiryCritical: SSL expiry critical donor-wiki expires in -20578.526850902777 days - https://alerts.wikimedia.org/?q=alertname%3DSSLExpiryCritical
[12:43:39] FIRING: SystemdService: Systemd service nginx.service stopped on frdata1003.frack.eqiad.wmnet:9100 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DSystemdService
[13:21:04] FIRING: DiskSpaceWarn: Disk space WARNING for (instance frmon1002.frack.eqiad.wmnet:9100):/srv - https://alerts.wikimedia.org/?q=alertname%3DDiskSpaceWarn
[13:22:41] (PS1) Ejegg: Use trixie for donut and core containers [wikimedia/fundraising/dev] - https://gerrit.wikimedia.org/r/1282980 (https://phabricator.wikimedia.org/T424774)
[13:23:17] Fundraising-Backlog, fundraising-tech-ops, Observability-Alerting: Shift frack alerting to use prometheus-alertmanager instead of icinga - https://phabricator.wikimedia.org/T367370#11889537 (Dwisehaupt)
[13:23:39] RESOLVED: SystemdService: Systemd service nginx.service stopped on frdata1003.frack.eqiad.wmnet:9100 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DSystemdService
[13:29:24] RESOLVED: [2x] SSLExpiryCritical: SSL expiry critical donor-wiki expires in -20578.558100902777 days - https://alerts.wikimedia.org/?q=alertname%3DSSLExpiryCritical
[13:29:24] RESOLVED: [2x] SSLExpiryWarning: SSL expiry warning donor-wiki expires in -20578.558100902777 days - https://alerts.wikimedia.org/?q=alertname%3DSSLExpiryWarning
[13:29:45] Fundraising-Backlog, fundraising-tech-ops, Observability-Alerting: Shift frack alerting to use prometheus-alertmanager instead of icinga - https://phabricator.wikimedia.org/T367370#11889568 (Dwisehaupt) Adjusted the prometheus config in frack to sent alerts to the production alertmanager. Alerts are...
[13:35:09] RESOLVED: SystemdService: Systemd service nginx.service stopped on frdata1003.frack.eqiad.wmnet:9100 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DSystemdService
[13:36:59] fundraising-tech-ops, observability: Deprecate Fundraising nsca icinga alert collection - https://phabricator.wikimedia.org/T425424 (Jgreen) NEW
[13:37:31] (PS1) Ejegg: Update python containers to use trixie images [wikimedia/fundraising/dev] - https://gerrit.wikimedia.org/r/1282987 (https://phabricator.wikimedia.org/T424774)
[13:55:05] Fundraising-Backlog, fundraising-tech-ops, Observability-Alerting: Shift frack alerting to use prometheus-alertmanager instead of icinga - https://phabricator.wikimedia.org/T367370#11889697 (Dwisehaupt)
[14:07:55] fundraising-tech-ops, observability: Deprecate Fundraising nsca icinga alert collection - https://phabricator.wikimedia.org/T425424#11889756 (Jgreen)
[14:57:57] FIRING: [2x] MariadbReplicationLag: MariaDB replica frdb1008.frack.eqiad.wmnet:9004 behind origin: 364 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DMariadbReplicationLag
[15:06:57] FIRING: [2x] MariadbReplicationLag: MariaDB replica frdb1008.frack.eqiad.wmnet:9004 behind origin: 904 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DMariadbReplicationLag
[17:21:04] FIRING: DiskSpaceWarn: Disk space WARNING for (instance frmon1002.frack.eqiad.wmnet:9100):/srv - https://alerts.wikimedia.org/?q=alertname%3DDiskSpaceWarn
[17:31:57] FIRING: [2x] MariadbReplicationLag: MariaDB replica frdb1008.frack.eqiad.wmnet:9004 behind origin: 2437 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DMariadbReplicationLag
[17:32:57] RESOLVED: MariadbReplicationLag: MariaDB replica frdb1008.frack.eqiad.wmnet:9004 behind origin: 272 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DMariadbReplicationLag
[18:35:23] (CR) Bartosz Dziewoński: [C:+2] Localisation updates from https://translatewiki.net. [extensions/DonationInterface] (REL1_46) - https://gerrit.wikimedia.org/r/1281132 (owner: L10n-bot)
[18:41:57] RESOLVED: MariadbReplicationLag: MariaDB replica frdb2005.frack.codfw.wmnet:9004 behind origin: 1457 - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DMariadbReplicationLag
[21:21:04] FIRING: DiskSpaceWarn: Disk space WARNING for (instance frmon1002.frack.eqiad.wmnet:9100):/srv - https://alerts.wikimedia.org/?q=alertname%3DDiskSpaceWarn
[21:58:47] (CR) Bartosz Dziewoński: [V:+2 C:+2] Localisation updates from https://translatewiki.net. [extensions/DonationInterface] (REL1_46) - https://gerrit.wikimedia.org/r/1281132 (owner: L10n-bot)
[22:00:19] (CR) Bartosz Dziewoński: [C:+2] Localisation updates from https://translatewiki.net. [extensions/DonationInterface] (REL1_43) - https://gerrit.wikimedia.org/r/1282545 (owner: L10n-bot)
[22:00:32] (CR) Bartosz Dziewoński: [V:+2 C:+2] Localisation updates from https://translatewiki.net. [extensions/DonationInterface] (REL1_43) - https://gerrit.wikimedia.org/r/1282545 (owner: L10n-bot)
[22:02:32] (CR) Bartosz Dziewoński: [V:+2 C:+2] Localisation updates from https://translatewiki.net. [extensions/DonationInterface] (REL1_45) - https://gerrit.wikimedia.org/r/1279760 (owner: L10n-bot)