[00:52:55] PROBLEM - Postgres Replication Lag on maps1010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 4964575696 and 413 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:53:11] PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 7205717128 and 544 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:53:17] PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 306555712 and 208 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:54:55] RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 35648 and 299 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:56:09] RECOVERY - Postgres Replication Lag on maps1010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 263936 and 374 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:56:27] RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 349168 and 390 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:05:35] PROBLEM - Postgres Replication Lag on maps2010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2066473208 and 136 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:05:43] PROBLEM - Postgres Replication Lag on maps2009 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2702763376 and 179 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:06:23] PROBLEM - Postgres Replication Lag on maps2008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1554727344 and 93 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:07:21] PROBLEM - Postgres Replication Lag on maps2005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 79923072 and 49 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:08:49] RECOVERY - Postgres Replication Lag on maps2010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 71032 and 131 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:08:57] RECOVERY - Postgres Replication Lag on maps2009 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 408 and 139 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:08:57] RECOVERY - Postgres Replication Lag on maps2005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 408 and 139 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:09:39] RECOVERY - Postgres Replication Lag on maps2008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 3760 and 181 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[06:44:43] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[06:47:55] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[08:49:59] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:51:37] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:23:22] 10Operations, 10Puppet: Multiple puppet/apt errors - https://phabricator.wikimedia.org/T270940 (10jijiki)
[09:23:47] 10Operations, 10Puppet: Multiple puppet/apt errors - https://phabricator.wikimedia.org/T270940 (10jijiki) I have not debugged the issue any further, but it appears to be affecting a small number of hosts
[09:24:19] 10Operations, 10Puppet: Multiple puppet/apt errors - https://phabricator.wikimedia.org/T270940 (10jijiki)
[09:43:14] 10Operations, 10Puppet: Multiple puppet/apt errors - https://phabricator.wikimedia.org/T270940 (10MoritzMuehlenhoff) This is a long standing race condition in apt updating itself, which is now exposed by the tzdata update which was released for Stretch (and we keep this one updated via package=>latest). Runnin...
[10:19:59] PROBLEM - Host ms-be2050 is DOWN: PING CRITICAL - Packet loss = 100%
[10:28:57] RECOVERY - Host ms-be2050 is UP: PING OK - Packet loss = 0%, RTA = 33.37 ms
[11:36:42] (03CR) 10Jbond: [C: 04-1] "see inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/651448 (https://phabricator.wikimedia.org/T269150) (owner: 10Elukey)
[12:45:16] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10JavaScript: Display map markers on Kartographer maps even in case of mapserver failures - https://phabricator.wikimedia.org/T270865 (10jbond) p:05Triage→03Medium
[12:48:28] 10Operations, 10Wikimedia-Mailing-lists: wikipedia-mai & wikiur-l mail archives are empty after August 2018 & January 2019 respectively - https://phabricator.wikimedia.org/T270837 (10jbond) p:05Triage→03Medium
[12:48:48] 10Operations, 10Maps, 10Product-Infrastructure-Team-Backlog, 10JavaScript: Display map markers on Kartographer maps even in case of mapserver failures - https://phabricator.wikimedia.org/T270865 (10jbond) @RKemper can you double check i have tagged this correctly?
[12:52:34] 10Operations, 10WVUI: Import npm 6.14.8 to buster dist. on apt.wikimedia.org - https://phabricator.wikimedia.org/T270321 (10jbond) p:05Triage→03Medium
[12:55:32] 10Operations, 10Puppet: Multiple puppet/apt errors - https://phabricator.wikimedia.org/T270940 (10jbond) 05Open→03Resolved a:03jbond > sudo dpkg --configure -a This wasn't enough, i had to run `apt-get install -f`, either way fixed now
[12:56:40] 10Operations, 10Performance-Team, 10Traffic: Enable webp thumbnails on all images for non-Commons wikis - https://phabricator.wikimedia.org/T269946 (10jbond)
[12:57:21] 10Operations, 10Performance-Team, 10Traffic: Enable webp thumbnails on all images for non-Commons wikis - https://phabricator.wikimedia.org/T269946 (10jbond) It seems this is being tracked by performance team so i have removed the operations tag but please add back if you feel this was an error.
[13:29:07] (03PS1) 10Jbond: pki: add default date to cloud [puppet] - 10https://gerrit.wikimedia.org/r/652551
[13:30:48] (03CR) 10Jbond: [C: 03+2] pki: add default date to cloud [puppet] - 10https://gerrit.wikimedia.org/r/652551 (owner: 10Jbond)
[13:34:23] (03PS1) 10Jbond: pki: add default vhost for cloud [puppet] - 10https://gerrit.wikimedia.org/r/652554
[13:39:49] (03CR) 10Jbond: [C: 03+2] pki: add default vhost for cloud [puppet] - 10https://gerrit.wikimedia.org/r/652554 (owner: 10Jbond)
[14:20:45] (03CR) 10Elukey: admin: deprecate the analytics-users posix group (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/651448 (https://phabricator.wikimedia.org/T269150) (owner: 10Elukey)
[14:30:33] 10Operations, 10Performance-Team, 10Traffic: Enable webp thumbnails on all images for non-Commons wikis - https://phabricator.wikimedia.org/T269946 (10Peachey88) This is potentially a dup of {T27611} which is effectively stalled on {T211661}
[14:32:10] (03CR) 10Jbond: [C: 03+1] admin: deprecate the analytics-users posix group (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/651448 (https://phabricator.wikimedia.org/T269150) (owner: 10Elukey)
[14:33:15] (03CR) 10Jbond: [C: 03+1] admin: deprecate the analytics-users posix group (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/651448 (https://phabricator.wikimedia.org/T269150) (owner: 10Elukey)
[18:31:07] PROBLEM - Host ms-be2050 is DOWN: PING CRITICAL - Packet loss = 100%
[18:31:31] RECOVERY - Host ms-be2050 is UP: PING OK - Packet loss = 0%, RTA = 33.35 ms
[19:01:21] PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[19:01:57] PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[20:20:47] PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[20:21:25] PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[21:11:39] RECOVERY - mediawiki originals uploads -hourly- for codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[21:20:55] RECOVERY - mediawiki originals uploads -hourly- for eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[22:11:47] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[22:40:51] PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[22:41:43] PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[22:59:41] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[23:30:07] PROBLEM - Host ms-be2050 is DOWN: PING CRITICAL - Packet loss = 100%
[23:30:15] RECOVERY - Host ms-be2050 is UP: PING OK - Packet loss = 0%, RTA = 35.56 ms