[00:02:29] Operations, serviceops: upgrade planet.wikimedia.org backends to buster - https://phabricator.wikimedia.org/T247651 (Dzahn)
[00:02:38] Operations, serviceops: upgrade planet.wikimedia.org backends to buster - https://phabricator.wikimedia.org/T247651 (Dzahn) a: Dzahn
[00:05:54] Operations, serviceops: replace backends for releases.wikimedia.org with buster VMs - https://phabricator.wikimedia.org/T247652 (Dzahn)
[00:07:10] Operations, Release-Engineering-Team, serviceops: replace doc1001.eqiad.wmnet with a buster VM - https://phabricator.wikimedia.org/T247653 (Dzahn)
[00:10:54] Operations, Epic: Migrate all of production metal to Buster or later - https://phabricator.wikimedia.org/T247045 (Dzahn)
[00:10:54] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[00:10:56] Operations, Release-Engineering-Team, serviceops: replace doc1001.eqiad.wmnet with a buster VM - https://phabricator.wikimedia.org/T247653 (Dzahn)
[00:10:58] Operations, serviceops: upgrade planet.wikimedia.org backends to buster - https://phabricator.wikimedia.org/T247651 (Dzahn)
[00:11:01] Operations, serviceops: replace backends for releases.wikimedia.org with buster VMs - https://phabricator.wikimedia.org/T247652 (Dzahn)
[00:11:04] Operations, serviceops: replace bromine and vega with buster VMs - https://phabricator.wikimedia.org/T247650 (Dzahn)
[00:11:07] Operations, serviceops: upgrade people.wikimedia.org backend to buster - https://phabricator.wikimedia.org/T247649 (Dzahn)
[00:11:10] Operations, serviceops: miscweb1001/2001 - upgrade to buster or decom - https://phabricator.wikimedia.org/T247648 (Dzahn)
[00:11:13] Operations, Epic: Migrate all of production metal to Buster or later - https://phabricator.wikimedia.org/T247045 (Dzahn) @Muehlenhoff ^
[00:12:19] Operations, Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban): Migrate labstore1006/1007 to Stretch/Buster - https://phabricator.wikimedia.org/T224583 (Bstorm) Ok, labstore1006 is now buster. Failing things back to their steady state.
[00:12:26] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[00:12:58] Operations, Epic: Migrate all of production metal to Buster or later - https://phabricator.wikimedia.org/T247045 (Dzahn)
[00:13:00] Operations, Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban): Migrate labstore1006/1007 to Stretch/Buster - https://phabricator.wikimedia.org/T224583 (Dzahn)
[00:13:31] (PS1) Bstorm: Revert "dumps-distribution: move all NFS traffic to labstore1007" [puppet] - https://gerrit.wikimedia.org/r/579670 (https://phabricator.wikimedia.org/T224583)
[00:14:11] (PS1) Bstorm: Revert "dumps-distribution: fail over to labstore1007 for dumps.wikimedia.org" [dns] - https://gerrit.wikimedia.org/r/579671 (https://phabricator.wikimedia.org/T224583)
[00:14:43] (PS1) Bstorm: Revert "dumps-distribution: switch which host does acme" [puppet] - https://gerrit.wikimedia.org/r/579672 (https://phabricator.wikimedia.org/T224583)
[00:16:36] (CR) Bstorm: [C: +2] Revert "dumps-distribution: move all NFS traffic to labstore1007" [puppet] - https://gerrit.wikimedia.org/r/579670 (https://phabricator.wikimedia.org/T224583) (owner: Bstorm)
[00:17:14] (CR) Bstorm: [C: +2] Revert "dumps-distribution: fail over to labstore1007 for dumps.wikimedia.org" [dns] - https://gerrit.wikimedia.org/r/579671 (https://phabricator.wikimedia.org/T224583) (owner: Bstorm)
[00:18:34] (CR) Bstorm: [C: +2] Revert "dumps-distribution: switch which host does acme" [puppet] - https://gerrit.wikimedia.org/r/579672 (https://phabricator.wikimedia.org/T224583) (owner: Bstorm)
[00:19:16] Operations, Epic: Migrate all of production metal to Buster or later - https://phabricator.wikimedia.org/T247045 (Dzahn)
[00:19:31] Operations, Patch-For-Review: Upgrade install servers to Buster - https://phabricator.wikimedia.org/T224576 (Dzahn)
[00:19:34] Operations, Thumbor, serviceops, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (Dzahn)
[00:19:39] Operations, serviceops, Patch-For-Review, Performance-Team (Radar), and 2 others: Upgrade memcached for Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (Dzahn)
[00:21:09] Operations, Cloud-VPS (Debian Jessie Deprecation), Patch-For-Review, cloud-services-team (Kanban): Migrate labstore1006/1007 to Stretch/Buster - https://phabricator.wikimedia.org/T224583 (Bstorm)
[00:23:17] (PS1) Bstorm: Revert "dumps-distribution: set the TTL to 5M for dumps.wikimedia.org" [dns] - https://gerrit.wikimedia.org/r/579674 (https://phabricator.wikimedia.org/T224583)
[00:24:05] (PS2) Bstorm: Revert "dumps-distribution: set the TTL to 5M for dumps.wikimedia.org" [dns] - https://gerrit.wikimedia.org/r/579674 (https://phabricator.wikimedia.org/T224583)
[00:24:07] (PS1) Dzahn: racktables: remove port 80 firewall hole [puppet] - https://gerrit.wikimedia.org/r/579675
[00:24:50] (CR) Bstorm: [C: +2] Revert "dumps-distribution: set the TTL to 5M for dumps.wikimedia.org" [dns] - https://gerrit.wikimedia.org/r/579674 (https://phabricator.wikimedia.org/T224583) (owner: Bstorm)
[00:25:25] (PS1) Dzahn: iegreview: remove port 80 firewall hole [puppet] - https://gerrit.wikimedia.org/r/579677
[00:26:23] Operations, Epic: Migrate all of production metal to Buster or later - https://phabricator.wikimedia.org/T247045 (Bstorm)
[00:26:25] Operations, Cloud-VPS (Debian Jessie Deprecation), Patch-For-Review, cloud-services-team (Kanban): Migrate labstore1006/1007 to Stretch/Buster - https://phabricator.wikimedia.org/T224583 (Bstorm) Open→Resolved Everything is failed back to how it normally is except it's now buster.
[00:26:27] Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (Bstorm)
[00:28:21] Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (Dzahn)
[00:28:49] (PS1) Dzahn: planet: remove port 80 firewall hole [puppet] - https://gerrit.wikimedia.org/r/579678
[00:34:40] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:42:20] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:59:59] PROBLEM - puppet last run on stat1008 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[01:01:23] Operations: apt config on planet1001 would install systemd from backports - https://phabricator.wikimedia.org/T247592 (Dzahn) on planet1001, `/etc/apt/sources.list` looks like this: ` 1 # deb http://mirrors.wikimedia.org/debian/ stretch main 2 3 ## Wikimedia APT repository 4 # deb http://apt1001.w...
[01:05:11] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:06:19] !log planet1001 - copying /etc/apt/sources.list from planet2001 to planet1001 - apt-get update - apt-get install openssh-server T247592
[01:06:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:06:25] T247592: apt config on planet1001 would install systemd from backports - https://phabricator.wikimedia.org/T247592
[01:08:00] Operations: apt config on planet1001 would install systemd from backports - https://phabricator.wikimedia.org/T247592 (Dzahn) Open→Resolved Copied the sources.list from 2001 to 1001 and installed newer version of openssh-server and client. libpam-systemd now 232-25+deb9u12 installed and candidate....
[01:13:21] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:15:45] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[01:17:57] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[01:51:33] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:55:43] PROBLEM - Check systemd state on boron is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:06:55] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:14:41] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:14:47] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:40:29] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:48:11] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:01:05] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[03:01:23] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[03:11:27] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:19:05] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:07:51] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:23:09] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:46:29] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:54:11] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:56:31] PROBLEM - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 102 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[04:58:53] PROBLEM - Old JVM GC check - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 101.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[05:01:55] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[05:02:17] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[05:03:53] PROBLEM - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[05:06:59] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:14:45] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:17:19] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:21:49] RECOVERY - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is OK: (C)100 gt (W)80 gt 72.2 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[05:22:09] RECOVERY - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is OK: (C)100 gt (W)80 gt 75.25 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[05:24:57] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:27:01] RECOVERY - Old JVM GC check - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 72.2 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[05:37:57] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:40:29] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:15:41] (PS1) KartikMistry: apertium-cy-en: Fix FTBFS with apertium >= 3.6 [debs/contenttranslation/apertium-cy-en] - https://gerrit.wikimedia.org/r/579683 (https://phabricator.wikimedia.org/T247585)
[06:21:31] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:11] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:31:41] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[06:32:07] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[06:52:09] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:58:18] Operations, Fundraising-Backlog, MediaWiki-Vagrant: Package XDebug 2.7.2 for apt.wikimedia.org - https://phabricator.wikimedia.org/T220406 (Tgr)
[06:58:30] Operations, Fundraising-Backlog, MediaWiki-Vagrant: Package XDebug 2.7.2 for apt.wikimedia.org - https://phabricator.wikimedia.org/T220406 (Tgr) Reframed the task since it seems the PECL route was discarded (probably for the better). OTOH we are about to support PHP 7.4 in MediaWiki, and testing that...
[06:59:57] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:07:35] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:15:17] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:56:03] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:03:43] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:29:35] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 57 probes of 541 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[08:32:03] !log run systemctl restart systemd-timedated.service on stat1008
[08:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:45] !log run kafka preferred-replica-election on kafka-jumbo1001 - T247561
[08:33:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:50] T247561: kafka-jumbo1006 network issues - https://phabricator.wikimedia.org/T247561
[08:34:09] Operations, Analytics, DC-Ops, netops: kafka-jumbo1006 and stat1005 network issues - https://phabricator.wikimedia.org/T247561 (elukey)
[08:36:03] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 47 probes of 541 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[08:45:23] PROBLEM - Host stat1008 is DOWN: PING CRITICAL - Packet loss = 100%
[08:47:21] RECOVERY - Host stat1008 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[08:49:18] this was me --^
[08:49:31] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:50:59] RECOVERY - puppet last run on stat1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:51:46] Operations, Analytics, DC-Ops, netops: kafka-jumbo1006 and stat1005 network issues - https://phabricator.wikimedia.org/T247561 (elukey) @Papaul @Jclark-ctr can we try to move stat1005 to a different switch port again?
[09:30:59] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[09:40:47] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[10:05:01] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 89 probes of 541 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[10:11:35] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 37 probes of 541 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[10:31:25] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[10:38:49] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 100 probes of 541 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[10:49:05] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22102 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[10:58:23] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 43 probes of 541 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[12:17:03] Operations, Mail, Wikimedia-Mailing-lists: Email to WikimediaUA mailing list from base-w[at]yandex.ru does not get delivered - https://phabricator.wikimedia.org/T247603 (Aklapper)
[14:31:15] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[14:31:47] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[16:02:31] PROBLEM - traffic_server tls process restarted on cp4025 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=ulsfo+prometheus/ops&var-instance=cp4025&var-layer=tls
[16:05:25] PROBLEM - Varnish frontend child restarted on cp4025 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp4025&var-datasource=ulsfo+prometheus/ops
[16:39:05] Hi, Zuul started to show these errors in tests: "Permission denied"
[16:39:18] Example: https://integration.wikimedia.org/ci/job/mwgate-node10-docker/93511/
[17:21:06] (PS1) KartikMistry: apertium-en-es: Fix FTBFS with apertium 3.6.1 [debs/contenttranslation/apertium-en-es] - https://gerrit.wikimedia.org/r/579757 (https://phabricator.wikimedia.org/T247585)
[17:28:21] (PS1) CRusnov: puppetdb uservice: Add individual host queries, expand for interface automation [puppet] - https://gerrit.wikimedia.org/r/579758
[17:30:03] (CR) CRusnov: "This has been tested on production PuppetDB using the flask test server for expected queries." [puppet] - https://gerrit.wikimedia.org/r/579758 (owner: CRusnov)
[17:31:14] (PS2) CRusnov: puppetdb uservice: Add individual host queries, expand for interface automation [puppet] - https://gerrit.wikimedia.org/r/579758 (https://phabricator.wikimedia.org/T244153)
[17:31:16] (CR) jerkins-bot: [V: -1] puppetdb uservice: Add individual host queries, expand for interface automation [puppet] - https://gerrit.wikimedia.org/r/579758 (https://phabricator.wikimedia.org/T244153) (owner: CRusnov)
[17:33:08] (PS3) CRusnov: puppetdb uservice: Add individual host queries, expand for interface automation [puppet] - https://gerrit.wikimedia.org/r/579758 (https://phabricator.wikimedia.org/T244153)
[17:52:43] > Pausing due to database lag: Waiting for all: 31.35 seconds lagged.
[17:52:46] @wikidata
[17:52:55] 31.35 sec... is it normal? (probably not?)
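The "Pausing due to database lag" message quoted above is MediaWiki's maxlag back-off: well-behaved write clients send a `maxlag` parameter and pause while reported replica lag exceeds it. A minimal sketch of that client-side logic, with hypothetical function names and a maxlag=5 default (not taken from any particular bot framework):

```python
def should_pause(reported_lag_s: float, maxlag_s: float = 5.0) -> bool:
    """Pause writes while reported replica lag exceeds the maxlag threshold."""
    return reported_lag_s > maxlag_s


def backoff_delay(reported_lag_s: float, cap_s: float = 60.0) -> float:
    """Wait roughly as long as the reported lag, bounded between 1 s and a cap."""
    return min(max(reported_lag_s, 1.0), cap_s)
```

Under these assumptions, the 31.35 s lag seen in the log is far above the usual maxlag=5 default, so a bot pauses and retries rather than piling more writes onto lagged replicas.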
[18:11:22] It's not that much
[18:11:28] If there's a bot going on a rampage (like there often is)
[18:12:49] indeed :P
[18:13:21] it went better for moments and now sucking again
[18:15:41] Check RC and probably cry
[18:16:46] I don't see someone flooding RC tho
[19:03:23] (PS1) DannyS712: trwiki: Grant interface admins editprotected & editsemiprotected [mediawiki-config] - https://gerrit.wikimedia.org/r/579772 (https://phabricator.wikimedia.org/T247672)
[19:05:55] (PS2) DannyS712: trwiki: Grant interface admins editprotected & editsemiprotected [mediawiki-config] - https://gerrit.wikimedia.org/r/579772 (https://phabricator.wikimedia.org/T247672)
[19:06:55] (PS3) DannyS712: trwiki: Grant interface admins editprotected & editsemiprotected [mediawiki-config] - https://gerrit.wikimedia.org/r/579772 (https://phabricator.wikimedia.org/T247672)
[19:10:52] hmm?
[19:11:39] ah
[19:13:03] (CR) Urbanecm: [C: -1] trwiki: Grant interface admins editprotected & editsemiprotected (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/579772 (https://phabricator.wikimedia.org/T247672) (owner: DannyS712)
[20:05:51] RECOVERY - Memory correctable errors -EDAC- on mw1239 is OK: (C)4 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1239&var-datasource=eqiad+prometheus/ops
[20:12:21] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[20:23:42] (CR) DannyS712: trwiki: Grant interface admins editprotected & editsemiprotected (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/579772 (https://phabricator.wikimedia.org/T247672) (owner: DannyS712)
[20:24:27] (PS4) DannyS712: trwiki: Grant interface editors editprotected & editsemiprotected [mediawiki-config] - https://gerrit.wikimedia.org/r/579772 (https://phabricator.wikimedia.org/T247672)
[20:26:06] Operations, Release-Engineering-Team, serviceops: replace doc1001.eqiad.wmnet with a buster VM - https://phabricator.wikimedia.org/T247653 (Dzahn) p: Triage→Medium
[20:26:11] Operations, serviceops: replace backends for releases.wikimedia.org with buster VMs - https://phabricator.wikimedia.org/T247652 (Dzahn) p: Triage→Medium
[20:26:20] Operations: decom racktables? - https://phabricator.wikimedia.org/T247646 (Dzahn) p: Triage→Medium
[20:45:29] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[20:51:31] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is CRITICAL: 1.219e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[20:59:05] RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[21:05:38] (PS1) Alex Monk: Add public replica view for oauth_registered_consumer [puppet] - https://gerrit.wikimedia.org/r/579800
[21:13:02] (CR) Reedy: "Guessing https://github.com/wikimedia/puppet/blob/e0da161490eed41d9b446b825add7bb2747b9ce7/manifests/realm.pp#L209 probably needs removing" [puppet] - https://gerrit.wikimedia.org/r/579800 (owner: Alex Monk)
[21:15:03] (PS2) Alex Monk: Add public replica view for oauth_registered_consumer [puppet] - https://gerrit.wikimedia.org/r/579800
[22:08:49] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[22:11:19] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[22:51:41] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is CRITICAL: 1.396e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[22:59:19] RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
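Many of the icinga alerts throughout this log print their thresholds as "(C)<crit> gt (W)<warn> gt <value>": the check goes CRITICAL when the metric exceeds the critical threshold, WARNING when it only exceeds the warning threshold, and OK otherwise (so "102 gt 100" on the JVM GC check is CRITICAL, while "(C)100 gt (W)80 gt 72.2" in the recovery message is OK). A minimal sketch of that comparison, illustrative only and not the actual check plugin:

```python
def check_state(value: float, warn: float, crit: float) -> str:
    """Map a metric value to an icinga-style state using the
    "(C)crit gt (W)warn gt value" convention seen in the log above."""
    if value > crit:
        return "CRITICAL"
    if value > warn:
        return "WARNING"
    return "OK"

# "102 gt 100" from the cloudelastic1001 GC alert (warn=80 taken from the
# recovery line) evaluates to CRITICAL; the recovery value 72.2 is OK.
print(check_state(102, 80, 100))   # prints "CRITICAL"
print(check_state(72.2, 80, 100))  # prints "OK"
```

The same shape covers the Kafka MirrorMaker lag alert: a value of 1.219e+05 against "(C)1e+05 gt (W)1e+04" is CRITICAL.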