[00:00:32] <wikibugs>	 10Operations, 10Icinga, 10decommission, 10monitoring: decom einsteinium - https://phabricator.wikimedia.org/T209738 (10Dzahn)
[00:00:39] <wikibugs>	 10Operations, 10Icinga, 10decommission, 10monitoring: decom einsteinium - https://phabricator.wikimedia.org/T209738 (10Dzahn) a:03Dzahn
[00:01:27] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn)
[00:01:34] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn)
[00:02:29] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn) 05Open>03Resolved   this ticket is resolved, einsteinium has been replaced by icinga1001 on stretch.  the rest of the steps will be part of the de...
[00:03:28] <wikibugs>	 (03PS3) 10Dzahn: icinga: remove einsteinium as an alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/473278 (https://phabricator.wikimedia.org/T209738)
[00:03:50] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] icinga: remove einsteinium as an alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/473278 (https://phabricator.wikimedia.org/T209738) (owner: 10Dzahn)
[00:04:10] <wikibugs>	 (03PS4) 10Dzahn: icinga: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782)
[00:04:36] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] icinga: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn)
[00:08:18] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Concerns about icinga1001 check latency - https://phabricator.wikimedia.org/T208066 (10colewhite) I'll re-title the case and claim it to implement the metrics collection.
[00:08:59] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Push check latency and check execution time to Prometheus - https://phabricator.wikimedia.org/T208066 (10colewhite) p:05Normal>03Low a:03colewhite
[00:12:15] <wikibugs>	 (03PS4) 10Dzahn: icinga: remove einsteinium as an alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/473278 (https://phabricator.wikimedia.org/T202782)
[00:12:17] <wikibugs>	 (03PS5) 10Dzahn: icinga: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782)
[00:12:19] <wikibugs>	 (03PS1) 10Dzahn: decom einsteinium remove from netboot and DHCP [puppet] - 10https://gerrit.wikimedia.org/r/474390 (https://phabricator.wikimedia.org/T209738)
[00:13:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] icinga: remove einsteinium as an alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/473278 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn)
[00:13:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] icinga: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn)
[00:16:42] <wikibugs>	 (03PS2) 10Dzahn: decom einsteinium remove from netboot and DHCP [puppet] - 10https://gerrit.wikimedia.org/r/474390 (https://phabricator.wikimedia.org/T209738)
[00:17:21] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:17:28] <wikibugs>	 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown: Disable caching on the main page for anonymous users - https://phabricator.wikimedia.org/T119366 (10BBlack) >>! In T119366#4754978, @Bawolff wrote: > Fwiw: im of the opinion that date magic words should reduce varnish cache to at least 24 hours, maybe...
[00:17:36] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] decom einsteinium remove from netboot and DHCP [puppet] - 10https://gerrit.wikimedia.org/r/474390 (https://phabricator.wikimedia.org/T209738) (owner: 10Dzahn)
[00:19:27] <wikibugs>	 (03PS5) 10Dzahn: icinga: remove einsteinium as an alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/473278 (https://phabricator.wikimedia.org/T202782)
[00:20:37] <wikibugs>	 (03PS6) 10Paladox: WIP: Update gerrit to 2.16 [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/463509
[00:20:58] <wikibugs>	 (03PS7) 10Paladox: WIP: Update gerrit to 2.16 [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/463509
[00:23:59] <wikibugs>	 (03PS6) 10Dzahn: icinga: remove jessie support [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782)
[00:28:16] <wikibugs>	 (03PS6) 10Dzahn: icinga: remove einsteinium as an alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/473278 (https://phabricator.wikimedia.org/T202782)
[00:36:24] <wikibugs>	 (03PS1) 10Dzahn: remove icinga-old.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/474392 (https://phabricator.wikimedia.org/T209738)
[00:37:48] <wikibugs>	 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown: Disable caching on the main page for anonymous users - https://phabricator.wikimedia.org/T119366 (10kruusamagi) >>! In T119366#4754973, @Bawolff wrote: >>>! In T119366#4754971, @kruusamagi wrote: >> For me, it seems that the issue has grown even bigger...
[00:42:07] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[00:52:14] <wikibugs>	 (03PS11) 10Dzahn: phabricator: add data types to all parameters [puppet] - 10https://gerrit.wikimedia.org/r/471325
[00:54:44] <wikibugs>	 (03PS12) 10Dzahn: phabricator: add data types to all parameters [puppet] - 10https://gerrit.wikimedia.org/r/471325
[00:57:20] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] "https://puppet-compiler.wmflabs.org/compiler1002/13566/phab1001.eqiad.wmnet/change.phab1001.eqiad.wmnet.err" [puppet] - 10https://gerrit.wikimedia.org/r/471325 (owner: 10Dzahn)
[01:04:51] <wikibugs>	 (03PS2) 10Herron: kafka_shipper: use mmrm1stspace to remove leading space in msg field [puppet] - 10https://gerrit.wikimedia.org/r/474317 (https://phabricator.wikimedia.org/T206454)
[01:05:11] <wikibugs>	 (03PS8) 10Dzahn: icinga/planet: add generic check_lastmod plugin and check planet updates [puppet] - 10https://gerrit.wikimedia.org/r/472713 (https://phabricator.wikimedia.org/T203208)
[01:06:06] <wikibugs>	 (03CR) 10Herron: [C: 032] kafka_shipper: use mmrm1stspace to remove leading space in msg field [puppet] - 10https://gerrit.wikimedia.org/r/474317 (https://phabricator.wikimedia.org/T206454) (owner: 10Herron)
[01:06:31] <wikibugs>	 (03Abandoned) 10Dzahn: icinga: on stretch, use fping instead of ping for faster host checks [puppet] - 10https://gerrit.wikimedia.org/r/469333 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn)
[01:11:25] <wikibugs>	 (03PS2) 10Herron: kafka_shipper: update syslog json template [puppet] - 10https://gerrit.wikimedia.org/r/474319 (https://phabricator.wikimedia.org/T206454)
[01:12:28] <wikibugs>	 (03CR) 10Herron: [C: 032] kafka_shipper: update syslog json template [puppet] - 10https://gerrit.wikimedia.org/r/474319 (https://phabricator.wikimedia.org/T206454) (owner: 10Herron)
[01:15:35] <wikibugs>	 (03PS2) 10Herron: kafka_shipper: add apache2 to lookup table with kafka output [puppet] - 10https://gerrit.wikimedia.org/r/474320 (https://phabricator.wikimedia.org/T205852)
[01:16:56] <wikibugs>	 (03CR) 10Herron: [C: 032] kafka_shipper: add apache2 to lookup table with kafka output [puppet] - 10https://gerrit.wikimedia.org/r/474320 (https://phabricator.wikimedia.org/T205852) (owner: 10Herron)
[01:48:29] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Wikidata-Query-Service-Sprint: Define an SLO for Wikidata Query Service public endpoint and communicate it - https://phabricator.wikimedia.org/T199228 (10Smalyshev) > ensuring that the data in the WDQS nodes accurately reflects the data upstre...
[01:55:09] <icinga-wm>	 PROBLEM - puppet last run on db1107 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:55:16] <wikibugs>	 (03PS1) 10Bstorm: sonofgridengine: configure grid hosts from OpenStack [puppet] - 10https://gerrit.wikimedia.org/r/474400 (https://phabricator.wikimedia.org/T200557)
[01:58:52] <wikibugs>	 (03CR) 10Bstorm: [C: 032] sonofgridengine: configure grid hosts from OpenStack [puppet] - 10https://gerrit.wikimedia.org/r/474400 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm)
[02:25:53] <icinga-wm>	 RECOVERY - puppet last run on db1107 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[02:54:16] <RoanKattouw>	 !log Deployed patches for T208112, T208109, T208110
[02:54:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:56:03] <wikibugs>	 (03PS1) 10Catrope: Add default for new CN variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474406 (https://phabricator.wikimedia.org/T208112)
[02:56:05] <wikibugs>	 (03PS1) 10Catrope: Add and grant banner-protect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474407 (https://phabricator.wikimedia.org/T208109)
[03:01:35] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:07:27] <wikibugs>	 (03CR) 10Catrope: [C: 032] Add default for new CN variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474406 (https://phabricator.wikimedia.org/T208112) (owner: 10Catrope)
[03:07:32] <wikibugs>	 (03CR) 10Catrope: [C: 032] Add and grant banner-protect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474407 (https://phabricator.wikimedia.org/T208109) (owner: 10Catrope)
[03:08:28] <wikibugs>	 (03Merged) 10jenkins-bot: Add default for new CN variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474406 (https://phabricator.wikimedia.org/T208112) (owner: 10Catrope)
[03:08:35] <wikibugs>	 (03Merged) 10jenkins-bot: Add and grant banner-protect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474407 (https://phabricator.wikimedia.org/T208109) (owner: 10Catrope)
[03:11:45] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[03:15:44] <wikibugs>	 (03CR) 10jenkins-bot: Add default for new CN variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474406 (https://phabricator.wikimedia.org/T208112) (owner: 10Catrope)
[03:15:46] <wikibugs>	 (03CR) 10jenkins-bot: Add and grant banner-protect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474407 (https://phabricator.wikimedia.org/T208109) (owner: 10Catrope)
[03:30:55] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:31:09] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 900.39 seconds
[03:42:13] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[04:15:05] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 192.09 seconds
[05:20:37] <wikibugs>	 (03PS1) 10Jayprakash12345: Enable NewUserMessage Extension on tcy.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474409
[05:23:14] <wikibugs>	 (03PS2) 10Jayprakash12345: Enable NewUserMessage Extension on tcy.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474409 (https://phabricator.wikimedia.org/T209432)
[05:34:45] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:41:33] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[05:42:08] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s3 on db1078 is CRITICAL: CRITICAL slave_io_state could not connect
[05:50:09] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db1078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 637.23 seconds
[06:17:35] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:21:39] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: db-eqiad: depool db1078 from s3, it crashed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474417
[06:22:40] <_joe_>	 apergos: I'll check grafana for a minute then merge this
[06:25:55] <wikibugs>	 10Operations, 10DBA: MariaDB killed by systemd with ABRT6 - https://phabricator.wikimedia.org/T209754 (10Joe)
[06:26:00] <apergos>	 that means most regular traffic will go to the master but I don't see what choice we have
[06:26:23] <_joe_>	 no, a lot will go to the vslow and recentchanges hosts
[06:26:26] <_joe_>	 see the weights
[06:26:31] <_joe_>	 but yes, no alternative
[06:26:42] <_joe_>	 and this is an ongoing outage, thanks mediawiki loadbalancer
[06:27:23] <_joe_>	 ok, merging
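_joe_'s "see the weights" point can be made concrete. The sketch below assumes a load balancer that picks a replica at random with probability proportional to its configured weight, so when one host is depooled its share is spread over the remaining hosts in proportion to their weights rather than landing on one host. The host names and weight values are hypothetical, not the real s3 entries in db-eqiad.php.

```python
"""Hypothetical sketch: how depooling a replica shifts reads under
weighted random selection. Host names and weights are invented."""
import random
from typing import Dict


def traffic_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Expected fraction of reads each host gets: its weight over the total."""
    total = sum(weights.values())
    return {host: w / total for host, w in weights.items()}


def depool(weights: Dict[str, int], host: str) -> Dict[str, int]:
    """Return a copy of the weight table without `host`."""
    return {h: w for h, w in weights.items() if h != host}


def pick_host(weights: Dict[str, int], rng: random.Random) -> str:
    """Choose one host with probability proportional to its weight."""
    hosts = list(weights)
    return rng.choices(hosts, weights=[weights[h] for h in hosts], k=1)[0]


if __name__ == "__main__":
    # Invented weights for one section; not the production values.
    section = {"master": 50, "replica-a": 200, "replica-b": 300, "vslow-rc": 100}
    pooled = depool(section, "replica-b")
    before = traffic_shares(section)
    after = traffic_shares(pooled)
    for host in after:
        print(f"{host}: {before[host]:.0%} -> {after[host]:.0%}")

    # Sample actual picks to see that they track the expected shares.
    rng = random.Random(42)
    picks = [pick_host(pooled, rng) for _ in range(10_000)]
    print({h: picks.count(h) for h in pooled})
```

With these invented numbers every remaining host's share grows by the same factor (650/350), so whichever hosts carry the largest weights, here the hypothetical vslow/recentchanges host and the other replica, absorb most of the depooled traffic.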
[06:27:35] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] db-eqiad: depool db1078 from s3, it crashed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474417 (owner: 10Giuseppe Lavagetto)
[06:28:44] <wikibugs>	 10Operations, 10DBA: MariaDB killed by systemd with ABRT6 - https://phabricator.wikimedia.org/T209754 (10colewhite) The server was depooled: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/474417/
[06:29:16] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad: depool db1078 from s3, it crashed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474417 (owner: 10Giuseppe Lavagetto)
[06:29:45] <icinga-wm>	 PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:34:03] <wikibugs>	 10Operations, 10DBA: MariaDB killed by systemd with ABRT6 - https://phabricator.wikimedia.org/T209754 (10Marostegui) Thank you for letting us know Thanks also @Joe for calling me up.  We will take it from here :-)
[06:37:37] <wikibugs>	 10Operations, 10DBA: db1078 (s3 candidate master) crashed  - https://phabricator.wikimedia.org/T209754 (10Marostegui) p:05Triage>03High
[06:38:28] <logmsgbot>	 !log oblivian@deploy1001 Synchronized wmf-config/db-eqiad.php: Depooling db1078 (duration: 00m 59s)
[06:38:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:39] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad: depool db1078 from s3, it crashed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474417 (owner: 10Giuseppe Lavagetto)
[06:41:15] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[06:42:08] <wikibugs>	 10Operations, 10DBA: db1078 (s3 candidate master) crashed - https://phabricator.wikimedia.org/T209754 (10Marostegui) MySQL got corrupted  - this host needs to be rebuilt.
[06:43:53] <wikibugs>	 (03PS1) 10Marostegui: db1078: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/474447 (https://phabricator.wikimedia.org/T209754)
[06:44:39] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db1078: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/474447 (https://phabricator.wikimedia.org/T209754) (owner: 10Marostegui)
[06:49:00] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: db1078 (s3 candidate master) crashed - https://phabricator.wikimedia.org/T209754 (10Marostegui) I haven't found anything on HW logs that might indicate a HW malfunction
[06:51:58] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s3 on db1078 is OK: OK slave_io_state Slave_IO_Running: Yes
[06:55:33] * apergos eyes the recovery page skeptically
[06:55:39] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:56:15] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db1078 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[06:56:34] <_joe_>	 marostegui: ^^ uh? shouldn't it break like soon?
[06:56:58] * volans|off here
[06:57:11] <_joe_>	 volans|off: late to the party :P
[06:57:41] <marostegui>	 It was me starting replication again - just in case we really need that host before recloning (which we shouldn't)
[06:58:14] <apergos>	 that was a very fast catchup
[06:58:39] <marostegui>	 It wasn't delayed too much and that host has SSDs :)
[06:59:21] <apergos>	 impressive, the power of ssds
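marostegui's explanation (a small backlog plus fast SSDs) follows from simple arithmetic: while a replica replays its backlog, new writes keep arriving on the primary, so lag only shrinks at the rate by which replay outpaces the primary. A minimal sketch, assuming replay speed is expressed as a multiple of real time (`speedup`, a hypothetical parameter: 4.0 means four seconds of primary activity applied per wall-clock second) and that the primary's write rate stays constant.

```python
import math


def catch_up_seconds(lag_s: float, speedup: float) -> float:
    """Wall-clock seconds for a replica to go from `lag_s` seconds of lag to zero.

    `speedup` is how many seconds of primary activity the replica replays per
    wall-clock second. New writes keep arriving at 1x, so the backlog shrinks
    at (speedup - 1) seconds per second; at speedup <= 1 it never catches up.
    """
    if lag_s <= 0:
        return 0.0
    if speedup <= 1:
        return math.inf
    return lag_s / (speedup - 1)


if __name__ == "__main__":
    # Hypothetical numbers: 10 minutes of lag on a replica replaying at 4x.
    print(f"{catch_up_seconds(600, 4.0):.0f} s")
```

Faster storage raises `speedup` and a shorter outage lowers `lag_s`; both shorten the catch-up, which fits the lag check on db1078 returning to 0.00 seconds within minutes of replication being restarted.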
[06:59:50] <apergos>	 and we will get no pages if it decides to crash and burn again, yes?
[07:00:13] <icinga-wm>	 RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:02:12] <volans|off>	 !log 'reset modified attributes' on IcingaUI for db1078 (and mgmt) and all its services
[07:02:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:53] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[07:12:39] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Marostegui)
[07:16:25] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Banyek) When I was stasrted to run puppet on the new Parsercache hosts on the other day I disabled notifications in the same way, but the hosts were reporting error.
[07:17:52] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Marostegui) >>! In T209757#4755397, @Banyek wrote: > When I was stasrted to run puppet on the new Parsercache hosts on the other day I disabled notifications in the...
[07:17:56] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Volans)
[07:20:55] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Banyek) There was an error with pt-heartbeat indeed, but that error was reported to IRC which shouldn't happened if disabling notifications would work.  (Or maybe I...
[07:22:37] <icinga-wm>	 PROBLEM - DPKG on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[07:22:39] <icinga-wm>	 PROBLEM - dhclient process on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[07:22:41] <icinga-wm>	 PROBLEM - Check systemd state on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[07:22:45] <icinga-wm>	 PROBLEM - MD RAID on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[07:22:49] <icinga-wm>	 PROBLEM - configured eth on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[07:23:25] <icinga-wm>	 PROBLEM - Disk space on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[07:24:09] <icinga-wm>	 PROBLEM - puppet last run on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[07:25:45] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
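Every check on notebook1004 fails with the same message because these checks appear to go through the NRPE agent on TCP port 5666 (the NRPE default): while the agent is not listening, for example during a reboot or reimage, each check reports "Connection refused" regardless of the real state of the service it monitors. A minimal sketch of a plain TCP probe that separates "agent unreachable" from an actual check result; the function `probe` is hypothetical and this is not how `check_nrpe` itself is implemented.

```python
import socket

NRPE_PORT = 5666  # default NRPE port, the one seen in the notebook1004 alerts


def probe(host: str, port: int = NRPE_PORT, timeout: float = 3.0) -> str:
    """Try a TCP connect and describe the outcome in the alerts' wording."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "OK"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing is listening.
        return f"connect to address {host} port {port}: Connection refused"
    except OSError as exc:
        # Timeouts, unreachable networks, name resolution failures, etc.
        return f"connect to address {host} port {port}: {exc}"


if __name__ == "__main__":
    print(probe("127.0.0.1"))
```

A refused connection means the host is up but the agent is not, whereas a timeout points at the host or network; that distinction is why a burst of identical "Connection refused" alerts usually resolves on its own once the agent is back.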
[07:25:46] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Marostegui) For the record I have downtimed db1078 (without touching notifications anymore to avoid messing with any investigation).
[07:40:25] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:41:33] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[07:56:39] <icinga-wm>	 PROBLEM - IPMI Sensor Status on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[08:04:49] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:09:15] <icinga-wm>	 RECOVERY - DPKG on notebook1004 is OK: All packages OK
[08:09:15] <icinga-wm>	 RECOVERY - dhclient process on notebook1004 is OK: PROCS OK: 0 processes with command name dhclient
[08:09:17] <icinga-wm>	 RECOVERY - Check systemd state on notebook1004 is OK: OK - running: The system is fully operational
[08:09:21] <icinga-wm>	 RECOVERY - MD RAID on notebook1004 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[08:09:27] <icinga-wm>	 RECOVERY - configured eth on notebook1004 is OK: OK - interfaces up
[08:10:04] <elukey>	 (should be recovered in a bit)
[08:10:07] <icinga-wm>	 RECOVERY - Disk space on notebook1004 is OK: DISK OK
[08:14:08] <apergos>	 elukey: what was up? and do I need to check on this when I see it?
[08:15:07] <icinga-wm>	 RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:16:59] <wikibugs>	 10Operations, 10Parsoid: parsoid-rt repeated failures on ruthenium (parsoid::testing) - https://phabricator.wikimedia.org/T209758 (10Volans)
[08:17:22] <volans|off>	 this is for the random systemd unit failures on ruthenium ^^^
[08:24:56] <wikibugs>	 10Operations, 10Wikimedia-Apache-configuration: Redirect from zh-yue.wiktionary.org is not working properly - https://phabricator.wikimedia.org/T209693 (10Hello903hello) We just need to add proper redirect rules for zh-yue.wiktionary.org to yue.wiktionary.org at the current stage, period.
[08:25:55] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on notebook1004 is OK: OK: synced at Sat 2018-11-17 08:25:54 UTC.
[08:26:45] <icinga-wm>	 RECOVERY - IPMI Sensor Status on notebook1004 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK
[08:29:25] <icinga-wm>	 PROBLEM - SSH ganeti2005.mgmt on ganeti2005.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:41:36] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Volans) p:05Triage>03High Things that I've found so far, some may be unrelated but still need a fix anyway.  === Permissions It seems that https://gerrit.wikimed...
[08:41:39] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[08:56:11] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:01:43] <icinga-wm>	 PROBLEM - dhclient process on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[09:01:45] <icinga-wm>	 PROBLEM - DPKG on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[09:01:45] <icinga-wm>	 PROBLEM - Check systemd state on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[09:01:49] <icinga-wm>	 PROBLEM - MD RAID on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[09:01:53] <icinga-wm>	 PROBLEM - configured eth on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[09:02:29] <icinga-wm>	 PROBLEM - Disk space on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[09:03:19] <icinga-wm>	 PROBLEM - puppet last run on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[09:11:45] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[09:28:21] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[09:29:25] <icinga-wm>	 RECOVERY - SSH ganeti2005.mgmt on ganeti2005.mgmt is OK: SSH OK - OpenSSH_5.8 (protocol 2.0)
[09:39:23] <icinga-wm>	 RECOVERY - dhclient process on notebook1004 is OK: PROCS OK: 0 processes with command name dhclient
[09:39:25] <icinga-wm>	 RECOVERY - DPKG on notebook1004 is OK: All packages OK
[09:39:27] <icinga-wm>	 RECOVERY - Check systemd state on notebook1004 is OK: OK - running: The system is fully operational
[09:39:31] <icinga-wm>	 RECOVERY - MD RAID on notebook1004 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[09:39:35] <icinga-wm>	 RECOVERY - configured eth on notebook1004 is OK: OK - interfaces up
[09:40:11] <icinga-wm>	 RECOVERY - Disk space on notebook1004 is OK: DISK OK
[09:44:05] <icinga-wm>	 RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[09:51:36] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: package_builder: Switch to class declaration syntax [puppet] - 10https://gerrit.wikimedia.org/r/473782
[09:54:20] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: ores::redis: Set maxmemory-policy: volatile-lur [puppet] - 10https://gerrit.wikimedia.org/r/474450 (https://phabricator.wikimedia.org/T209628)
[09:58:29] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on notebook1004 is OK: OK: synced at Sat 2018-11-17 09:58:27 UTC.
[11:36:20] <wikibugs>	 (03CR) 10Zoranzoki21: [C: 031] "Hi, patch looks good. But please fix commit message per my comment. Thanks!" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474372 (owner: 10Takidelfin)
[13:01:30] <wikibugs>	 (03PS1) 10Zoranzoki21: Add tboverride permission to extendedmover group on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474458 (https://phabricator.wikimedia.org/T209753)
[14:06:04] <wikibugs>	 (03Abandoned) 10Zoranzoki21: Remove duplicates of comments about task T206935 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/472919 (https://phabricator.wikimedia.org/T206935) (owner: 10Zoranzoki21)
[14:26:27] <wikibugs>	 10Operations, 10Icinga, 10monitoring: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Dzahn) >>! In T209757#4755427, @Volans wrote:  > === Init files > Those two init files seems to still have the old paths for jessie and are not compatible with stret...
[14:27:41] <wikibugs>	 (03PS1) 10Dzahn: fix path to puppet_hosts/services in default_icinga.sh [puppet] - 10https://gerrit.wikimedia.org/r/474463 (https://phabricator.wikimedia.org/T209757)
[14:33:46] <wikibugs>	 (03PS1) 10Dzahn: icinga: do not manage retention.dat in puppet [puppet] - 10https://gerrit.wikimedia.org/r/474464 (https://phabricator.wikimedia.org/T209757)
[14:34:16] <wikibugs>	 (03PS2) 10Dzahn: icinga: fix path to puppet_hosts/services in default_icinga.sh [puppet] - 10https://gerrit.wikimedia.org/r/474463 (https://phabricator.wikimedia.org/T209757)
[14:35:17] <wikibugs>	 (03PS3) 10Dzahn: icinga: fix path to puppet_hosts/services in default_icinga.sh [puppet] - 10https://gerrit.wikimedia.org/r/474463 (https://phabricator.wikimedia.org/T202782)
[14:37:30] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "/etc/icinga/puppet_hosts.cfg: cannot open `/etc/icinga/puppet_hosts.cfg' (No such file or directory)" [puppet] - 10https://gerrit.wikimedia.org/r/474463 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn)
[14:38:55] <mutante>	 akosiaris: hi, i see a pending change on puppetmaster
[14:40:02] <mutante>	 can merge both and take a look at package_builder. checking how that compiles
[14:41:38] <mutante>	 oh.. also used in CI/labs
[14:43:22] <mutante>	 merging both, looks harmless syntax change.. ack
[14:45:39] <wikibugs>	 (03CR) 10Dzahn: "merged on puppetmaster. noop on boron." [puppet] - 10https://gerrit.wikimedia.org/r/473782 (owner: 10Alexandros Kosiaris)
[14:49:07] <wikibugs>	 10Operations, 10Commons, 10Multimedia, 10media-storage: Damaged uploads interrupted with reaching of 5 MB - https://phabricator.wikimedia.org/T201379 (10SJu) The problem is still continuing... I propose to switch cross-wiki uploads off and forbid it until the problem is solved.
[14:50:41] <wikibugs>	 (03PS2) 10Dzahn: icinga: do not manage retention.dat in puppet [puppet] - 10https://gerrit.wikimedia.org/r/474464 (https://phabricator.wikimedia.org/T209757)
[14:51:15] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Dzahn)
[14:51:17] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn)
[14:51:42] <wikibugs>	 (03PS3) 10Dzahn: icinga: do not manage retention.dat in puppet [puppet] - 10https://gerrit.wikimedia.org/r/474464 (https://phabricator.wikimedia.org/T209757)
[14:53:24] <wikibugs>	 (03PS4) 10Dzahn: icinga: do not manage retention.dat in puppet [puppet] - 10https://gerrit.wikimedia.org/r/474464 (https://phabricator.wikimedia.org/T209757)
[14:54:19] <wikibugs>	 (03CR) 10Dzahn: [C: 032] icinga: do not manage retention.dat in puppet [puppet] - 10https://gerrit.wikimedia.org/r/474464 (https://phabricator.wikimedia.org/T209757) (owner: 10Dzahn)
[14:57:13] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Dzahn) >>! In T209757#4755427, @Volans wrote: > Notice: /Stage[main]/Icinga/File[/var/lib/icinga/retention.dat]/group: group changed 'nagios' t...
[15:05:12] <wikibugs>	 (03PS1) 10Dzahn: test disabling icinga notifications on ununpentium [puppet] - 10https://gerrit.wikimedia.org/r/474465
[15:06:09] <wikibugs>	 (03PS2) 10Dzahn: test disabling icinga notifications on ununpentium [puppet] - 10https://gerrit.wikimedia.org/r/474465
[15:06:55] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "T209757" [puppet] - 10https://gerrit.wikimedia.org/r/474465 (owner: 10Dzahn)
[15:18:08] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Dzahn) To summarize:  - permissions on retention.dat:   They are now:   56M -rw-r--r--  1 nagios nagios    56M Nov 17 14:53 retention.dat  and...
[15:19:36] <bawolff>	 interesting. I'm getting session errors when I save, even when I do it over again
[15:19:44] <bawolff>	 I wonder how my session became broken like that
[15:21:52] <bawolff>	 And there's no log data from the session mismatch. I would kind of expect something going wrong with my session to trigger a logging event
[15:26:52] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Marostegui) And db1078 picked it up too and has all notifications disabled. I guess this is fixed then or is there any other follow up needed?
[15:27:10] <bawolff>	 wtf, The tokens on the edit page are invalid for me, but the tokens on other pages (e.g. js csrfToken are valid)
[15:30:00] <bawolff>	 meh, upon further experimentation, all the tokens seem invalid
[15:31:07] <bawolff>	 which would be more consistent with my session being borked
[15:34:01] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Dzahn) Also db1078 specifically is now fixed without further steps.  Before the changes above just some services had notifications disabled in...
[15:41:51] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn)
[15:41:56] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Dzahn) 05Open>03Resolved a:03Dzahn >>! In T209757#4755730, @Marostegui wrote: > And db1078 picked it up too and has all notifications dis...
[15:45:07] <wikibugs>	 (03PS1) 10Dzahn: Revert "test disabling icinga notifications on ununpentium" [puppet] - 10https://gerrit.wikimedia.org/r/474468
[15:45:48] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "this was just a test for T209757" [puppet] - 10https://gerrit.wikimedia.org/r/474468 (owner: 10Dzahn)
[15:51:21] * bawolff gave up on trying to debug what was wrong with my session and just logged in and out again
[15:51:47] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Notifications disablement via puppet not working on icinga - https://phabricator.wikimedia.org/T209757 (10Dzahn) reverted my test patch on ununpentium and all notifications are enabled again, while db1078 still has all disabled. so that worked too.....
[16:28:41] <wikibugs>	 10Operations, 10Citoid, 10Patch-For-Review, 10Service-deployment-requests, and 3 others: Deploy translation-server-v2 - https://phabricator.wikimedia.org/T201611 (10akosiaris) This has now been deployed to the kubernetes staging cluster.   ` akosiaris@deploy1001:~$ curl -d 'http://www.nytimes.com/2018/06/1...
[16:34:12] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on wtp2020 is CRITICAL: 7.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2020&var-datasource=codfw%2520prometheus%252Fops
[16:38:38] <wikibugs>	 10Operations, 10ops-eqiad, 10Dumps-Generation: Move dumpsdata1001 - https://phabricator.wikimedia.org/T207278 (10ayounsi) 05Open>03Resolved a:03ayounsi Correct, thanks!
[17:04:55] <wikibugs>	 (03PS1) 10Andrew Bogott: update the toolserver.org IP to point to eqiad1-r [dns] - 10https://gerrit.wikimedia.org/r/474475 (https://phabricator.wikimedia.org/T209769)
[17:06:29] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] update the toolserver.org IP to point to eqiad1-r [dns] - 10https://gerrit.wikimedia.org/r/474475 (https://phabricator.wikimedia.org/T209769) (owner: 10Andrew Bogott)
[17:24:20] <icinga-wm>	 PROBLEM - HTTPS-toolserver on www.toolserver.org is CRITICAL: SSL CRITICAL - Certificate stable.toolserver.org expired
[17:25:26] <icinga-wm>	 RECOVERY - HTTPS-toolserver on www.toolserver.org is OK: SSL OK - Certificate stable.toolserver.org valid until 2018-12-25 21:52:38 +0000 (expires in 38 days)
[17:38:10] <wikibugs>	 (03CR) 10Urbanecm: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/472917 (https://phabricator.wikimedia.org/T209250) (owner: 10Zoranzoki21)
[17:41:20] <icinga-wm>	 PROBLEM - HTTPS-toolserver on www.toolserver.org is CRITICAL: SSL CRITICAL - Certificate stable.toolserver.org expired
[17:46:55] <wikibugs>	 (03CR) 10Urbanecm: [C: 031] Disable FlaggedRevs, enable RC patrol and add rights on srwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/472745 (https://phabricator.wikimedia.org/T209251) (owner: 10Zoranzoki21)
[17:48:04] <wikibugs>	 (03CR) 10Urbanecm: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/472918 (https://phabricator.wikimedia.org/T209252) (owner: 10Zoranzoki21)
[17:48:12] <icinga-wm>	 RECOVERY - HTTPS-toolserver on www.toolserver.org is OK: SSL OK - Certificate stable.toolserver.org valid until 2018-12-25 21:52:38 +0000 (expires in 38 days)
[17:48:19] <wikibugs>	 (03CR) 10Urbanecm: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474458 (https://phabricator.wikimedia.org/T209753) (owner: 10Zoranzoki21)
[17:51:34] <icinga-wm>	 PROBLEM - HTTPS-toolserver on www.toolserver.org is CRITICAL: SSL CRITICAL - Certificate stable.toolserver.org expired
[18:00:01] <wikibugs>	 (03PS1) 10Zoranzoki21: IS.php: Cosmetic changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474478
[18:32:20] <icinga-wm>	 RECOVERY - HTTPS-toolserver on www.toolserver.org is OK: SSL OK - Certificate stable.toolserver.org valid until 2019-02-15 17:31:48 +0000 (expires in 89 days)
[18:38:06] <wikibugs>	 (03PS1) 10Andrew Bogott: Toolserver: fix ErrorDocument rule in apache config [puppet] - 10https://gerrit.wikimedia.org/r/474481 (https://phabricator.wikimedia.org/T209769)
[18:39:34] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] Toolserver: fix ErrorDocument rule in apache config [puppet] - 10https://gerrit.wikimedia.org/r/474481 (https://phabricator.wikimedia.org/T209769) (owner: 10Andrew Bogott)
[19:04:09] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Need to shut down a list - https://phabricator.wikimedia.org/T209726 (10Aklapper) @Beeblebrox: Which exact list on https://lists.wikimedia.org/mailman/listinfo is this about?
[19:28:42] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[19:38:52] <wikibugs>	 10Operations, 10Parsoid: parsoid-rt repeated failures on ruthenium (parsoid::testing) - https://phabricator.wikimedia.org/T209758 (10ssastry) p:05Triage>03High
[19:42:18] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[19:56:56] <icinga-wm>	 PROBLEM - puppet last run on mwmaint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:27:40] <icinga-wm>	 RECOVERY - puppet last run on mwmaint2001 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[21:30:58] <wikibugs>	 (03CR) 10Takidelfin: "> Patch Set 2: Code-Review+1" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474372 (owner: 10Takidelfin)
[21:31:46] <wikibugs>	 (03PS3) 10Takidelfin: InitialiseSettings: Remove redundant namespace talks definitions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/474372 (https://phabricator.wikimedia.org/T206952)
[21:45:48] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[22:11:54] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[22:16:26] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[22:41:26] <icinga-wm>	 RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational
[22:42:36] <icinga-wm>	 PROBLEM - Disk space on analytics1039 is CRITICAL: DISK CRITICAL - free space: / 1366 MB (2% inode=97%)
[23:00:28] <icinga-wm>	 PROBLEM - DPKG on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:00:28] <icinga-wm>	 PROBLEM - dhclient process on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:00:40] <icinga-wm>	 PROBLEM - Check systemd state on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:00:42] <icinga-wm>	 PROBLEM - Disk space on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:00:50] <icinga-wm>	 PROBLEM - MD RAID on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:00:52] <icinga-wm>	 PROBLEM - configured eth on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:03:32] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:03:36] <icinga-wm>	 PROBLEM - puppet last run on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
[23:09:32] <icinga-wm>	 RECOVERY - DPKG on notebook1004 is OK: All packages OK
[23:09:34] <icinga-wm>	 RECOVERY - dhclient process on notebook1004 is OK: PROCS OK: 0 processes with command name dhclient
[23:09:44] <icinga-wm>	 RECOVERY - Check systemd state on notebook1004 is OK: OK - running: The system is fully operational
[23:09:48] <icinga-wm>	 RECOVERY - Disk space on notebook1004 is OK: DISK OK
[23:09:56] <icinga-wm>	 RECOVERY - MD RAID on notebook1004 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[23:09:58] <icinga-wm>	 RECOVERY - configured eth on notebook1004 is OK: OK - interfaces up
[23:13:52] <icinga-wm>	 RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[23:33:40] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on notebook1004 is OK: OK: synced at Sat 2018-11-17 23:33:38 UTC.
[23:40:36] <icinga-wm>	 PROBLEM - Disk space on analytics1039 is CRITICAL: DISK CRITICAL - free space: / 1657 MB (3% inode=97%)
[23:43:52] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10cloud-services-team, 10User-Smalyshev: Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service - https://phabricator.wikimedia.org/T206636 (10Smalyshev) Loading finished, overall took 8 days and 9 hours...
[23:55:14] <icinga-wm>	 PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.