[00:27:08] RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:36:58] PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:37:50] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 553906800 and 36 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:37:50] PROBLEM - Postgres Replication Lag on maps1005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 4764420936 and 769 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:39:20] PROBLEM - Postgres Replication Lag on maps1006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1205764104 and 64 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:40:34] PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 8147460112 and 934 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:40:34] PROBLEM - Postgres Replication Lag on maps1010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1094098976 and 60 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:41:22] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 638290272 and 46 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:41:32] PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1083556056 and 76 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:41:54] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2572763496 and 161 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:48:48] RECOVERY - Postgres Replication Lag on maps1010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 145728 and 135 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:49:36] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 992 and 182 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:49:46] RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 284632 and 193 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:50:08] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 217840 and 215 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:50:52] RECOVERY - Postgres Replication Lag on maps1006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 346504 and 258 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:50:58] RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 56928 and 266 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:52:58] PROBLEM - Postgres Replication Lag on maps2005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 140873880 and 10 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:53:26] PROBLEM - Postgres Replication Lag on maps2009 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 65637856 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:53:32] PROBLEM - Postgres Replication Lag on maps2010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 190155144 and 9 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:53:32] PROBLEM - Postgres Replication Lag on maps2008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1178733344 and 66 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:53:42] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 151791368 and 10 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:53:46] RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 433 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:54:02] PROBLEM - Postgres Replication Lag on maps2006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 507147904 and 29 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:54:18] RECOVERY - Postgres Replication Lag on maps1005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 465 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:55:10] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 265022856 and 12 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:02:14] RECOVERY - Postgres Replication Lag on maps2006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 720 and 46 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:02:50] RECOVERY - Postgres Replication Lag on maps2005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 83 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:03:18] RECOVERY - Postgres Replication Lag on maps2009 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 23544 and 110 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:03:22] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 26832 and 115 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:03:24] RECOVERY - Postgres Replication Lag on maps2008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1008 and 116 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:03:24] RECOVERY - Postgres Replication Lag on maps2010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 68360 and 116 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:03:32] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 31280 and 125 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:09:14] RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:17:12] PROBLEM - Check systemd state on sodium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:12:29] (PS1) Andrew Bogott: Cinder: allow api filtering on 'bootable' [puppet] - https://gerrit.wikimedia.org/r/648840 (https://phabricator.wikimedia.org/T269511)
[03:13:33] (CR) Andrew Bogott: [C: +2] Cinder: allow api filtering on 'bootable' [puppet] - https://gerrit.wikimedia.org/r/648840 (https://phabricator.wikimedia.org/T269511) (owner: Andrew Bogott)
[03:15:54] PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[03:17:34] RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[03:18:00] PROBLEM - Prometheus prometheus2004/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus2004 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/global
[03:19:52] PROBLEM - Prometheus prometheus2003/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus2003 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/global
[03:22:32] PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[03:24:36] PROBLEM - Prometheus prometheus2004/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus2004 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/global
[03:24:52] PROBLEM - Prometheus prometheus2003/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus2003 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/global
[03:25:48] RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[03:29:34] (PS1) Andrew Bogott: Horizon: update LAUNCH_INSTANCE_DEFAULTS to prepare for Cinder [puppet] - https://gerrit.wikimedia.org/r/648847 (https://phabricator.wikimedia.org/T269511)
[03:33:47] (PS2) Andrew Bogott: Horizon: update LAUNCH_INSTANCE_DEFAULTS to prepare for Cinder [puppet] - https://gerrit.wikimedia.org/r/648847 (https://phabricator.wikimedia.org/T269511)
[03:35:44] (PS3) Andrew Bogott: Horizon: update LAUNCH_INSTANCE_DEFAULTS to prepare for Cinder [puppet] - https://gerrit.wikimedia.org/r/648847 (https://phabricator.wikimedia.org/T269511)
[03:38:03] (PS4) Andrew Bogott: Horizon: update LAUNCH_INSTANCE_DEFAULTS to prepare for Cinder [puppet] - https://gerrit.wikimedia.org/r/648847 (https://phabricator.wikimedia.org/T269511)
[03:40:52] (PS1) Andrew Bogott: Glance: disable the 'file' backend in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/648852
[03:41:22] (CR) Andrew Bogott: [C: +2] Horizon: update LAUNCH_INSTANCE_DEFAULTS to prepare for Cinder [puppet] - https://gerrit.wikimedia.org/r/648847 (https://phabricator.wikimedia.org/T269511) (owner: Andrew Bogott)
[03:41:50] (CR) Andrew Bogott: [C: +2] Glance: disable the 'file' backend in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/648852 (owner: Andrew Bogott)
[03:52:48] RECOVERY - Prometheus prometheus2003/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus2003 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/global
[03:54:12] RECOVERY - Prometheus prometheus2004/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus2004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/global
[03:57:03] (PS1) Andrew Bogott: Glance: make glance active/active in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/648853
[03:57:05] (PS1) Andrew Bogott: Remove obsolete glance image_sync code [puppet] - https://gerrit.wikimedia.org/r/648854
[04:02:37] (PS1) Andrew Bogott: Glance: remove the glance_image_dir param [puppet] - https://gerrit.wikimedia.org/r/648857
[04:03:08] (CR) Andrew Bogott: [C: +2] Glance: make glance active/active in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/648853 (owner: Andrew Bogott)
[04:31:48] PROBLEM - Prometheus prometheus2003/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus2003 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/global
[04:35:02] (CR) Andrew Bogott: [C: +2] Glance: remove the glance_image_dir param [puppet] - https://gerrit.wikimedia.org/r/648857 (owner: Andrew Bogott)
[04:35:20] (CR) Andrew Bogott: [C: +2] Remove obsolete glance image_sync code [puppet] - https://gerrit.wikimedia.org/r/648854 (owner: Andrew Bogott)
[04:57:02] RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:58:46] (CR) Jeena Huneidi: [C: -1] "I like the DRY idea but I think it could be a bit hard to read. The chown container also has bunch of environment variables it doesn't nee" [deployment-charts] - https://gerrit.wikimedia.org/r/648304 (owner: Ahmon Dancy)
[04:58:54] RECOVERY - Prometheus prometheus2003/global -or a Prometheus it scrapes- was restarted: beware possible monitoring artifacts on prometheus2003 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=codfw+prometheus/global
[05:01:52] PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:13:50] RECOVERY - Check systemd state on kubestagemaster2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:18:46] PROBLEM - Check systemd state on kubestagemaster2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:34] PROBLEM - ores on ores2006 is CRITICAL: connect to address 10.192.32.174 and port 8081: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/ores
[06:29:46] PROBLEM - ores on ores2009 is CRITICAL: connect to address 10.192.48.90 and port 8081: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/ores
[06:31:12] RECOVERY - ores on ores2006 is OK: HTTP OK: HTTP/1.0 200 OK - 6397 bytes in 0.083 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/ores
[06:36:16] RECOVERY - ores on ores2009 is OK: HTTP OK: HTTP/1.0 200 OK - 6397 bytes in 0.089 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/ores
[07:13:38] RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:23:02] RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:27:54] PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201213T0800)
[08:43:46] RECOVERY - Check systemd state on kubestagemaster2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:46:58] Operations, ops-eqsin, DC-Ops: cr2-eqsin: fan failure - https://phabricator.wikimedia.org/T267544 (Volans) FWIW we're getting one email every hour from rancid about this. Is there any quick way to prevent/disable them by any chance?
[08:48:40] PROBLEM - Check systemd state on kubestagemaster2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:08:37] (PS1) ArielGlenn: fix up ssh key entry for gtzatchkova [puppet] - https://gerrit.wikimedia.org/r/648970 (https://phabricator.wikimedia.org/T269930)
[09:13:48] (CR) ArielGlenn: "The cross-validate-accounts cron job fails without the key type being in there, even if the key does make it onto the hosts for use. While" [puppet] - https://gerrit.wikimedia.org/r/648970 (https://phabricator.wikimedia.org/T269930) (owner: ArielGlenn)
[09:14:57] (CR) ArielGlenn: [C: +2] fix up ssh key entry for gtzatchkova [puppet] - https://gerrit.wikimedia.org/r/648970 (https://phabricator.wikimedia.org/T269930) (owner: ArielGlenn)
[09:44:10] RECOVERY - Check systemd state on kubestagemaster2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:49:04] PROBLEM - Check systemd state on kubestagemaster2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:23:02] RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:27:56] PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:17:54] (PS1) ArielGlenn: add platform engineering folks to snapshot and dumpsdata server access [puppet] - https://gerrit.wikimedia.org/r/649077
[14:18:17] (CR) jerkins-bot: [V: -1] add platform engineering folks to snapshot and dumpsdata server access [puppet] - https://gerrit.wikimedia.org/r/649077 (owner: ArielGlenn)
[14:21:29] (CR) ArielGlenn: "The managers should weigh in, adding them as reviewers. Also would like Moritz's thoughts on this, in particular setting up a group that's" [puppet] - https://gerrit.wikimedia.org/r/649077 (owner: ArielGlenn)
[14:23:58] (PS2) ArielGlenn: add platform engineering folks to snapshot and dumpsdata server access [puppet] - https://gerrit.wikimedia.org/r/649077
[15:27:26] Operations, netops: Upgrade Routinator 3000 to 0.8.2 - https://phabricator.wikimedia.org/T269738 (ayounsi) https://www.ripe.net/ripe/mail/archives/routing-wg/2020-December/004206.html
[16:26:45] Operations, Diff-blog, Traffic, HTTPS: Send HSTS header on diff.wikimedia.org - https://phabricator.wikimedia.org/T270034 (Nintendofan885)
[16:36:24] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 142 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:39:40] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 36 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:13:53] Operations, Platform Engineering, Wikidata, serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (Volans) @jijiki given that the hosts that gets reimaged are changing interface name from ethN to enoN, we also need to run [[ https://wikit...
[17:55:38] Operations, Platform Engineering, Wikidata, serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (jijiki)
[17:56:26] Operations, Platform Engineering, Wikidata, serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (jijiki) @Volans Should I run it for the ones I have already reimaged?
[18:40:50] Operations, Platform Engineering, Wikidata, serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (Volans) >>! In T213089#6687500, @jijiki wrote: > @Volans So I should run it for all of them? Should we add a notice about this on the reim...
[21:06:30] (CR) Krinkle: "@Bryan Thanks for the ping. I'm happy to decom the tool if the Grafana dash is recommended nowadays." [puppet] - https://gerrit.wikimedia.org/r/632471 (https://phabricator.wikimedia.org/T210993) (owner: Muehlenhoff)
[21:22:19] (CR) Krinkle: "btw, is it documented somewhere how to get patches merged here? I'm cc-ing you two based on previous commits and based on us not having +2" [labs/private] - https://gerrit.wikimedia.org/r/635859 (https://phabricator.wikimedia.org/T262962) (owner: Dave Pifke)
[21:57:40] (CR) QChris: [C: -1] "> Patch Set 15:" [puppet] - https://gerrit.wikimedia.org/r/556270 (https://phabricator.wikimedia.org/T240266) (owner: Paladox)
[23:44:58] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 484 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[23:46:36] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 15 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops