[02:00:47] <wikibugs>	 (PS4) Faidon Liambotis: Add "accounting" report [software/netbox-reports] - https://gerrit.wikimedia.org/r/506663
[02:07:55] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 210371904 and 15 seconds
[02:08:35] <wikibugs>	 Operations, ops-codfw: scs-a1-codfw: update serial in netbox - https://phabricator.wikimedia.org/T221984 (faidon)
[02:10:31] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 213751000 and 12 seconds
[02:14:29] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 188318176 and 9 seconds
[02:15:47] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 13736 and 20 seconds
[02:23:32] <wikibugs>	 (PS1) DannyS712: Add namespace aliases on zhwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/506892 (https://phabricator.wikimedia.org/T222024)
[02:27:29] <wikibugs>	 (PS2) DannyS712: Add namespace aliases on zhwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/506892 (https://phabricator.wikimedia.org/T222024)
[03:05:19] <icinga-wm>	 PROBLEM - puppet last run on wtp1040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:05:26] <chaomodus>	 Yes
[03:23:39] <icinga-wm>	 PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:31:51] <icinga-wm>	 RECOVERY - puppet last run on wtp1040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[03:34:15] <icinga-wm>	 PROBLEM - puppet last run on mw1346 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:36:27] <icinga-wm>	 PROBLEM - puppet last run on mw2266 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz],File[/usr/share/GeoIP/GeoIP2-City.mmdb.test]
[03:36:31] <icinga-wm>	 PROBLEM - puppet last run on mw1309 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz],File[/usr/share/GeoIP/GeoIP2-City.mmdb.test]
[03:50:09] <icinga-wm>	 RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[03:52:27] <icinga-wm>	 RECOVERY - HP RAID on ms-be1037 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[04:02:57] <icinga-wm>	 RECOVERY - puppet last run on mw2266 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:03:03] <icinga-wm>	 RECOVERY - puppet last run on mw1309 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:06:03] <icinga-wm>	 RECOVERY - puppet last run on mw1346 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[04:17:13] <icinga-wm>	 PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:43:43] <icinga-wm>	 RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:30:21] <yannf>	 "PHP fatal error: 
[05:30:21] <yannf>	 entire web request took longer than 60 seconds and timed out"
[05:30:37] <yannf>	 this is a new error message
[05:42:11] <icinga-wm>	 PROBLEM - Mediawiki Cirrussearch update rate - codfw on icinga1001 is CRITICAL: CRITICAL: 30.00% of data under the critical threshold [50.0] https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[05:43:29] <icinga-wm>	 PROBLEM - puppet last run on dbproxy1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:43:53] <icinga-wm>	 PROBLEM - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is CRITICAL: CRITICAL: 30.00% of data under the critical threshold [50.0] https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[05:46:01] <icinga-wm>	 PROBLEM - puppet last run on mw1271 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:50:05] <icinga-wm>	 PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:55:07] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:57:43] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1015 is OK: OK - running: The system is fully operational
[06:07:03] <icinga-wm>	 RECOVERY - Mediawiki Cirrussearch update rate - codfw on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[06:07:25] <icinga-wm>	 RECOVERY - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[06:09:59] <icinga-wm>	 RECOVERY - puppet last run on dbproxy1006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:16:35] <icinga-wm>	 RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:17:49] <icinga-wm>	 RECOVERY - puppet last run on mw1271 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[06:30:47] <icinga-wm>	 PROBLEM - puppet last run on mw2285 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:57:13] <icinga-wm>	 RECOVERY - puppet last run on mw2285 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[08:38:21] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1014 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:02:01] <icinga-wm>	 PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:10:59] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1014 is OK: OK - running: The system is fully operational
[09:28:27] <icinga-wm>	 RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[10:10:30] <wikibugs>	 (PS1) Revi: Change kr.wikimedia.org destination [puppet] - https://gerrit.wikimedia.org/r/506895 (https://phabricator.wikimedia.org/T222033)
[10:11:33] <wikibugs>	 (CR) Revi: [C: -1] "Please ping `revi` on #wm-operations before merging this because I need to move the page on Meta." [puppet] - https://gerrit.wikimedia.org/r/506895 (https://phabricator.wikimedia.org/T222033) (owner: Revi)
[12:45:45] <wikibugs>	 (CR) Reedy: [C: -1] "You need to compile it too" [puppet] - https://gerrit.wikimedia.org/r/506895 (https://phabricator.wikimedia.org/T222033) (owner: Revi)
[12:47:04] <revi>	 Reedy: how do I do that? :-)
[12:47:15] <Reedy>	 revi: IIRC, there's a ruby script to run
[12:47:33] <revi>	 I'll look around tomorrow
[12:47:54] <Reedy>	 unless it's been made part of the puppet side...
[12:48:22] <Reedy>	 https://github.com/wikimedia/puppet/blob/95e34bdf87b89480115a1cc068549988f6ee9fd6/modules/mediawiki/manifests/web/prod_sites.pp#L5
[12:48:26] <Reedy>	 It has, nvm then
[12:48:40] <wikibugs>	 (CR) Reedy: "Or not anymore, apparently" [puppet] - https://gerrit.wikimedia.org/r/506895 (https://phabricator.wikimedia.org/T222033) (owner: Revi)
[13:47:07] <icinga-wm>	 PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:03:07] <icinga-wm>	 PROBLEM - puppet last run on labpuppetmaster1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:13:39] <icinga-wm>	 RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:29:37] <icinga-wm>	 RECOVERY - puppet last run on labpuppetmaster1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:35:21] <wikibugs>	 (CR) Jforrester: [C: +2] Provide a temporary trwiki logo marking two years of censorship [mediawiki-config] - https://gerrit.wikimedia.org/r/506849 (owner: Jforrester)
[14:36:25] <wikibugs>	 (Merged) jenkins-bot: Provide a temporary trwiki logo marking two years of censorship [mediawiki-config] - https://gerrit.wikimedia.org/r/506849 (owner: Jforrester)
[14:38:58] <wikibugs>	 (CR) jenkins-bot: Provide a temporary trwiki logo marking two years of censorship [mediawiki-config] - https://gerrit.wikimedia.org/r/506849 (owner: Jforrester)
[14:40:08] <wikibugs>	 (CR) Jforrester: [C: +2] Provide a temporary trwiki logo marking two years of censorship [mediawiki-config] - https://gerrit.wikimedia.org/r/506849 (owner: Jforrester)
[14:44:18] <logmsgbot>	 !log jforrester@deploy1001 Synchronized static/images/project-logos/trwiki-2x.png: trwiki: Update logo for 2 year anniversary, part I (duration: 00m 55s)
[14:44:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:32] <logmsgbot>	 !log jforrester@deploy1001 Synchronized static/images/project-logos/trwiki-1.5x.png: trwiki: Update logo for 2 year anniversary, part II (duration: 00m 53s)
[14:45:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:48] <logmsgbot>	 !log jforrester@deploy1001 Synchronized static/images/project-logos/trwiki.png: trwiki: Update logo for 2 year anniversary, part III (duration: 00m 53s)
[14:47:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:16] <James_F>	 !log Manually purged the trwiki logos from Varnish as part of updating them for 2 year anniversary.
[14:53:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:38] <James_F>	 !log Updated trwiki's MediaWiki:Common.css to not over-ride the logo.
[14:55:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:58:33] <wikibugs>	 (PS1) Jforrester: Revert "Provide a temporary trwiki logo marking two years of censorship" [mediawiki-config] - https://gerrit.wikimedia.org/r/506903
[14:58:44] <wikibugs>	 (CR) Jforrester: [C: -1] "Not yet." [mediawiki-config] - https://gerrit.wikimedia.org/r/506903 (owner: Jforrester)
[15:53:55] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:57:51] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1015 is OK: OK - running: The system is fully operational
[16:57:27] <icinga-wm>	 PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:00:47] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[17:01:59] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[17:03:13] <icinga-wm>	 PROBLEM - Host cp3037 is DOWN: PING CRITICAL - Packet loss = 100%
[17:08:27] <icinga-wm>	 PROBLEM - Host cp3037.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[17:10:39] <icinga-wm>	 PROBLEM - IPsec on cp1080 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6
[17:10:43] <icinga-wm>	 PROBLEM - IPsec on cp2020 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:10:57] <icinga-wm>	 PROBLEM - IPsec on cp1090 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6
[17:10:57] <icinga-wm>	 PROBLEM - IPsec on cp1088 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6
[17:10:59] <icinga-wm>	 PROBLEM - IPsec on cp2002 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:10:59] <icinga-wm>	 PROBLEM - IPsec on cp2008 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:10:59] <icinga-wm>	 PROBLEM - IPsec on cp2017 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:10:59] <icinga-wm>	 PROBLEM - IPsec on cp2005 is CRITICAL: Strongswan CRITICAL - ok: 60 connecting: cp3037_v4 not-conn: cp3037_v6
[17:10:59] <icinga-wm>	 PROBLEM - IPsec on cp2018 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:11:03] <icinga-wm>	 PROBLEM - IPsec on cp2022 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:11:19] <icinga-wm>	 PROBLEM - IPsec on cp1078 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6
[17:11:19] <icinga-wm>	 PROBLEM - IPsec on cp1082 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6
[17:11:21] <icinga-wm>	 PROBLEM - IPsec on cp1086 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6
[17:11:23] <icinga-wm>	 PROBLEM - IPsec on cp2011 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:11:23] <icinga-wm>	 PROBLEM - IPsec on cp2024 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:11:33] <icinga-wm>	 PROBLEM - IPsec on cp1084 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6
[17:11:33] <icinga-wm>	 PROBLEM - IPsec on cp1076 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6
[17:11:49] <icinga-wm>	 PROBLEM - IPsec on cp2026 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:11:49] <icinga-wm>	 PROBLEM - IPsec on cp2025 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:11:49] <icinga-wm>	 PROBLEM - IPsec on cp2014 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6
[17:23:57] <icinga-wm>	 RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:46:03] <jijiki>	 !log Depooling cp3037
[17:46:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:32] <logmsgbot>	 !log jiji@cumin1001 conftool action : set/pooled=no; selector: name=cp3037.esams.wmnet
[17:46:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:52:11] <wikibugs>	 Operations, Traffic: cp3037 is currently unreachable - https://phabricator.wikimedia.org/T222041 (Vgutierrez)
[17:52:47] <wikibugs>	 Operations, Traffic: cp3037 is currently unreachable - https://phabricator.wikimedia.org/T222041 (Vgutierrez) p:Triage→Normal
[17:54:27] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1076 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:27] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1078 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:27] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1080 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:27] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1082 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:27] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1084 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:27] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1086 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:27] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1088 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:28] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp1090 is CRITICAL: Strongswan CRITICAL - ok: 68 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:28] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2002 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:29] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2005 is CRITICAL: Strongswan CRITICAL - ok: 60 connecting: cp3037_v4 not-conn: cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:29] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2008 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:30] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2011 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:30] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2014 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:31] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2017 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:31] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2018 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:32] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2020 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:32] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2022 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:33] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2024 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:33] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2025 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:54:34] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on cp2026 is CRITICAL: Strongswan CRITICAL - ok: 60 not-conn: cp3037_v4, cp3037_v6 Effie Mouzeli cp3037 is down - T222041
[17:55:32] <icinga-wm>	 ACKNOWLEDGEMENT - Host cp3037 is DOWN: PING CRITICAL - Packet loss = 100% Effie Mouzeli cp3037 is down - T222041
[17:55:32] <icinga-wm>	 ACKNOWLEDGEMENT - Host cp3037.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Effie Mouzeli cp3037 is down - T222041
[18:37:11] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on kafka1023 is CRITICAL: 4.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=kafka1023&var-datasource=eqiad+prometheus/ops
[18:37:13] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1281 is CRITICAL: CRITICAL - load average: 62.99, 27.72, 19.99
[18:38:31] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1281 is OK: OK - load average: 21.67, 22.70, 18.83
[19:22:20] <wikibugs>	 Operations, ops-eqiad, RESTBase, Core Platform Team Backlog (Watching / External), and 2 others: rack/setup/install restbase10[19-27].eqiad.wmnet - https://phabricator.wikimedia.org/T219404 (mobrovac) @Cmjohnson any movement on this? Do you have an ETA on when the machines will be installed?
[19:59:55] <icinga-wm>	 PROBLEM - puppet last run on oresrdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:26:25] <icinga-wm>	 RECOVERY - puppet last run on oresrdb1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:57:25] <icinga-wm>	 RECOVERY - HP RAID on ms-be1030 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[22:42:13] <icinga-wm>	 PROBLEM - exim queue on mx1001 is CRITICAL: CRITICAL: 3167 mails in exim queue.