[00:01:15] RECOVERY - mediawiki originals uploads -hourly- for codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[00:02:01] RECOVERY - mediawiki originals uploads -hourly- for eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[00:17:41] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:59:35] PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 553161552 and 57 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:00:09] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2350092296 and 171 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:00:33] PROBLEM - Postgres Replication Lag on maps1006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 5411409648 and 336 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:00:37] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1851303024 and 173 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:00:53] PROBLEM - Postgres Replication Lag on maps1010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2233167632 and 210 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:00:55] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1959130160 and 196 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:01:13] PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2088610448 and 221 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:01:15] RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 64784 and 123 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:01:49] RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 29632 and 156 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:02:33] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 40520 and 201 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:03:55] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 138200 and 283 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:04:09] RECOVERY - Postgres Replication Lag on maps1010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 370600 and 298 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:04:33] RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 548800 and 322 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
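The POSTGRES_HOT_STANDBY_DELAY lines above report two numbers per standby: the WAL byte lag behind the primary, and the seconds since the last replayed transaction. A minimal sketch of how such a check can be computed on the standby itself; this is not the production Icinga plugin (which is check_postgres-style), and it assumes PostgreSQL 10+ function names, psycopg2, and an illustrative byte threshold:

    import psycopg2  # assumption: psycopg2 available; host and threshold are illustrative

    BYTE_LAG_CRIT = 16 * 1024 * 1024  # hypothetical critical threshold in bytes

    conn = psycopg2.connect(host="localhost", dbname="template1")
    cur = conn.cursor()
    # Byte lag = WAL received from the primary minus WAL replayed locally;
    # seconds = age of the last replayed transaction. Both are NULL off-standby.
    cur.execute("""
        SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()),
               EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
    """)
    byte_lag, seconds = cur.fetchone()
    state = "CRITICAL" if byte_lag > BYTE_LAG_CRIT else "OK"
    print(f"POSTGRES_HOT_STANDBY_DELAY {state}: DB template1 (host:localhost) "
          f"{int(byte_lag)} and {int(seconds)} seconds")

Note how the alert/recovery pairs above resolve within minutes: the byte lag drops from the gigabyte range back to tens of kilobytes once the standby catches up on replay.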
[01:05:31] RECOVERY - Postgres Replication Lag on maps1006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 61888 and 379 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:08:58] (CR) Jforrester: "recheck" [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/593240 (owner: Ssingh)
[01:17:09] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 387417344 and 19 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:17:19] PROBLEM - Postgres Replication Lag on maps2009 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 5782695664 and 327 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:17:25] PROBLEM - Postgres Replication Lag on maps2005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 638628552 and 34 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:17:25] PROBLEM - Postgres Replication Lag on maps2008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2785217320 and 156 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:17:25] PROBLEM - Postgres Replication Lag on maps2010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1177351528 and 58 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:20:29] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1168960 and 77 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:20:43] RECOVERY - Postgres Replication Lag on maps2010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 87832 and 92 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:20:43] RECOVERY - Postgres Replication Lag on maps2005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 87832 and 92 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:22:15] RECOVERY - Postgres Replication Lag on maps2009 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 583864 and 184 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:22:23] RECOVERY - Postgres Replication Lag on maps2008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 284296 and 192 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:10:37] (CR) Jforrester: "recheck" [software/homer] - https://gerrit.wikimedia.org/r/644872 (owner: Ayounsi)
[05:29:51] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[05:31:25] PROBLEM - Router interfaces on mr1-eqsin is CRITICAL: CRITICAL: No response from remote host 103.102.166.128 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:32:55] PROBLEM - Host mr1-eqsin IPv6 is DOWN: PING CRITICAL - Packet loss = 60%, RTA = 4561.26 ms
[05:36:13] PROBLEM - Juniper alarms on mr1-eqsin is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 103.102.166.128 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[05:37:49] RECOVERY - Juniper alarms on mr1-eqsin is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
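The "Router interfaces" check above polls OID 1.3.6.1.2.1.2.2.1.8 (IF-MIB::ifOperStatus) over SNMP v2c and counts interface states, which is why the PROBLEM text names that exact OID when the host stops answering. A rough Python equivalent using pysnmp; the community string is a placeholder and the production check is an Icinga SNMP plugin, not this script:

    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity, nextCmd)

    # IF-MIB::ifOperStatus column: 1=up, 2=down, 5=dormant
    OID = "1.3.6.1.2.1.2.2.1.8"
    counts = {}
    for err_ind, err_stat, _, var_binds in nextCmd(
            SnmpEngine(),
            CommunityData("public", mpModel=1),   # placeholder community; mpModel=1 is v2c
            UdpTransportTarget(("103.102.166.128", 161), timeout=2, retries=1),
            ContextData(),
            ObjectType(ObjectIdentity(OID)),
            lexicographicMode=False):             # stop at the end of the ifOperStatus column
        if err_ind or err_stat:
            print(f"CRITICAL: No response from remote host 103.102.166.128 for {OID}")
            break
        for _, value in var_binds:
            counts[int(value)] = counts.get(int(value), 0) + 1
    else:
        print(f"OK: interfaces up: {counts.get(1, 0)}, down: {counts.get(2, 0)}, "
              f"dormant: {counts.get(5, 0)}")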
[05:37:57] RECOVERY - Router interfaces on mr1-eqsin is OK: OK: host 103.102.166.128, interfaces up: 38, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:38:37] RECOVERY - Host mr1-eqsin IPv6 is UP: PING OK - Packet loss = 0%, RTA = 235.10 ms
[05:41:21] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 232.33 ms
[06:15:27] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[07:16:39] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:18:19] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:27:24] Operations, MediaWiki-General, serviceops, MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 3 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (Aklapper)
[11:45:21] (PS1) Majavah: Add fiwiki 500k temporary logos [mediawiki-config] - https://gerrit.wikimedia.org/r/652687 (https://phabricator.wikimedia.org/T270974)
[11:45:23] (PS1) Majavah: Config for fiwiki 500k temporary logo [mediawiki-config] - https://gerrit.wikimedia.org/r/652688 (https://phabricator.wikimedia.org/T270974)
[14:44:46] Operations, Wikimedia-Mailing-lists: Publish statistics about number of held messages per mailing list (Jan 2021) - https://phabricator.wikimedia.org/T270977 (Aklapper) p:Triage→Low
[17:32:07] PROBLEM - Host ms-be2050 is DOWN: PING CRITICAL - Packet loss = 100%
[17:32:13] RECOVERY - Host ms-be2050 is UP: PING OK - Packet loss = 0%, RTA = 33.41 ms
[21:44:29] (PS1) Andrew Bogott: Keystone: update otp auth code for Stein [puppet] - https://gerrit.wikimedia.org/r/652832 (https://phabricator.wikimedia.org/T261134)
[21:46:58] (CR) Andrew Bogott: [C: +2] Keystone: update otp auth code for Stein [puppet] - https://gerrit.wikimedia.org/r/652832 (https://phabricator.wikimedia.org/T261134) (owner: Andrew Bogott)
[23:04:17] PROBLEM - Varnish traffic drop between 30min ago and now at esams on alert1001 is CRITICAL: 53.63 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[23:07:09] PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[23:17:03] RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
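The Zuul Gearman alert that opened at 00:17:41 clears here at 06:15:27; it tracks work requests queued in the Gearman server behind Zuul. The production alert reads the zuul-gearman Graphite series linked above, but Gearman's plain-text admin protocol yields the same count directly; a minimal sketch (the contint2001 host/port and the 150-job threshold mirror the alert text, not the real plugin):

    import socket

    # Gearman admin protocol: sending "status\n" returns one line per function,
    # "FUNCTION\tTOTAL\tRUNNING\tAVAILABLE_WORKERS", terminated by a "." line.
    sock = socket.create_connection(("contint2001.wikimedia.org", 4730), timeout=5)
    sock.sendall(b"status\n")
    data = b""
    while not data.endswith(b".\n"):
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    sock.close()

    waiting = 0
    for line in data.decode().splitlines():
        if line == ".":
            break
        _name, total, running, _workers = line.split("\t")
        waiting += int(total) - int(running)  # jobs queued but not yet running
    print(f"{'CRITICAL' if waiting > 150 else 'OK'}: {waiting} work requests waiting")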
[23:17:27] RECOVERY - Varnish traffic drop between 30min ago and now at esams on alert1001 is OK: (C)60 le (W)70 le 71.29 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
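Reading the "Varnish traffic drop" figures: the check expresses the current 30-minute request volume as a percentage of the previous 30-minute window, critical at or below 60 and warning at or below 70, hence "53.63 le 60" in the 23:04:17 PROBLEM and "(C)60 le (W)70 le 71.29" in this recovery. A sketch of that arithmetic against a Graphite-style render API; the metric path and exact query are assumptions, not the production check definition:

    import requests

    GRAPHITE = "https://graphite.wikimedia.org/render"
    TARGET = "varnish.esams.frontend.request.client.rate"  # hypothetical metric path

    def window_sum(frm, until):
        """Sum the series over [frm, until], e.g. frm='-30min', until='now'."""
        resp = requests.get(GRAPHITE, params={
            "target": TARGET, "from": frm, "until": until, "format": "json"})
        points = resp.json()[0]["datapoints"]  # list of [value, timestamp] pairs
        return sum(v for v, _ in points if v is not None)

    now_win = window_sum("-30min", "now")
    prev_win = window_sum("-60min", "-30min")
    pct = 100.0 * now_win / prev_win if prev_win else 100.0
    state = "CRITICAL" if pct <= 60 else "WARNING" if pct <= 70 else "OK"
    print(f"{state}: {pct:.2f} (current traffic as % of the previous 30min window)")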