[00:00:28] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=wtp2015.codfw.wmnet
[00:00:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:01:04] <icinga-wm>	 RECOVERY - MD RAID on wtp2015 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[00:02:12] <icinga-wm>	 RECOVERY - puppet last run on wtp2015 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[00:02:38] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on wtp2015 is OK: OK: synced at Fri 2020-07-31 00:02:35 UTC. https://wikitech.wikimedia.org/wiki/NTP
[00:02:42] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (bad URL) is CRITICAL: Test bad URL returned the unexpected status 200 (expecting: 404) https://wikitech.wikimedia.org/wiki/Citoid
[00:06:17] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.36.0-wmf.1/extensions/Echo/modules/mobile/notificationsFilterOverlay.js: T258954 (duration: 01m 10s)
[00:06:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:06:24] <stashbot>	 T258954: Special:Notifications filter overlay never closes in Minerva - https://phabricator.wikimedia.org/T258954
[00:06:54] <icinga-wm>	 RECOVERY - Check the last execution of php7.2-fpm_check_restart on wtp2015 is OK: OK: Status of the systemd unit php7.2-fpm_check_restart https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:07:24] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.36.0-wmf.2/extensions/Echo/modules/mobile/notificationsFilterOverlay.js: T258954 (duration: 01m 06s)
[00:07:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:23] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for...
[00:13:56] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:15:48] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:19:32] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on wtp2015 is OK: OK https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[00:40:48] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 138 probes of 572 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:44:24] <wikibugs>	 (PS1) Tim Starling: Revert "Re-enable LilyPond/Score in safe mode (2nd attempt)" [mediawiki-config] - https://gerrit.wikimedia.org/r/617566
[00:44:45] <wikibugs>	 (PS2) Tim Starling: Revert "Re-enable LilyPond/Score in safe mode (2nd attempt)" [mediawiki-config] - https://gerrit.wikimedia.org/r/617566
[00:44:59] <wikibugs>	 (CR) Tim Starling: [C: +2] Revert "Re-enable LilyPond/Score in safe mode (2nd attempt)" [mediawiki-config] - https://gerrit.wikimedia.org/r/617566 (owner: Tim Starling)
[00:46:00] <wikibugs>	 (Merged) jenkins-bot: Revert "Re-enable LilyPond/Score in safe mode (2nd attempt)" [mediawiki-config] - https://gerrit.wikimedia.org/r/617566 (owner: Tim Starling)
[00:46:36] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 48 probes of 572 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:47:50] <logmsgbot>	 !log tstarling@deploy1001 Synchronized wmf-config/CommonSettings.php: disable lilypond execution again (duration: 01m 10s)
[00:47:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:48:51] <wikibugs>	 Operations, MediaWiki-extensions-Score, Security-Team, Wikimedia-General-or-Unknown, and 3 others: Extension:Score / Lilypond is disabled on all wikis - https://phabricator.wikimedia.org/T257066 (tstarling)
[00:49:43] <wikibugs>	 Operations, MediaWiki-extensions-Score, Security-Team, Wikimedia-General-or-Unknown, and 3 others: Extension:Score / Lilypond is disabled on all wikis - https://phabricator.wikimedia.org/T257066 (tstarling) Resolved→Open It's disabled again, since I found a new vulnerability.
[01:20:38] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1031.eqiad.wmnet'] `  Of which...
[02:34:46] <wikibugs>	 (PS1) Andrew Bogott: cloudvirt103[1-9]: use a simple one-drive raid config [puppet] - https://gerrit.wikimedia.org/r/617590 (https://phabricator.wikimedia.org/T251627)
[02:34:48] <wikibugs>	 (Abandoned) Andrew Bogott: Add records for cloudvirt103[1-9] [dns] - https://gerrit.wikimedia.org/r/617558 (https://phabricator.wikimedia.org/T251627) (owner: Andrew Bogott)
[02:35:28] <wikibugs>	 (PS2) Andrew Bogott: cloudvirt103[1-9]: use a simple one-volume raid config [puppet] - https://gerrit.wikimedia.org/r/617590 (https://phabricator.wikimedia.org/T251627)
[02:36:19] <wikibugs>	 (CR) Andrew Bogott: [C: +2] cloudvirt103[1-9]: use a simple one-volume raid config [puppet] - https://gerrit.wikimedia.org/r/617590 (https://phabricator.wikimedia.org/T251627) (owner: Andrew Bogott)
[02:38:53] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for...
[02:54:19] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1031.eqiad.wmnet'] `  Of which...
[02:55:50] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[02:55:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:56:21] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for...
[02:57:53] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[02:57:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:10:52] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1032.eqiad.wmnet'] `  Of which...
[03:12:20] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[03:12:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:14:24] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[03:14:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:15:34] <wikibugs>	 (PS1) Andrew Bogott: cloudvirt103[1-9]: puppetize as thinvirts [puppet] - https://gerrit.wikimedia.org/r/617593 (https://phabricator.wikimedia.org/T251627)
[03:17:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] cloudvirt103[1-9]: puppetize as thinvirts [puppet] - https://gerrit.wikimedia.org/r/617593 (https://phabricator.wikimedia.org/T251627) (owner: Andrew Bogott)
[03:22:12] <wikibugs>	 (PS2) Andrew Bogott: cloudvirt103[1-9]: puppetize as thinvirts [puppet] - https://gerrit.wikimedia.org/r/617593 (https://phabricator.wikimedia.org/T251627)
[03:23:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] cloudvirt103[1-9]: puppetize as thinvirts [puppet] - https://gerrit.wikimedia.org/r/617593 (https://phabricator.wikimedia.org/T251627) (owner: Andrew Bogott)
[03:24:16] <wikibugs>	 (PS3) Andrew Bogott: cloudvirt103[1-9]: puppetize as thinvirts [puppet] - https://gerrit.wikimedia.org/r/617593 (https://phabricator.wikimedia.org/T251627)
[03:26:03] <wikibugs>	 (CR) Andrew Bogott: [C: +2] cloudvirt103[1-9]: puppetize as thinvirts [puppet] - https://gerrit.wikimedia.org/r/617593 (https://phabricator.wikimedia.org/T251627) (owner: Andrew Bogott)
[03:31:03] <wikibugs>	 (PS1) Andrew Bogott: cloudvirt103[1-9] -> debian stretch [puppet] - https://gerrit.wikimedia.org/r/617595 (https://phabricator.wikimedia.org/T251627)
[03:31:51] <wikibugs>	 (CR) Andrew Bogott: [C: +2] cloudvirt103[1-9] -> debian stretch [puppet] - https://gerrit.wikimedia.org/r/617595 (https://phabricator.wikimedia.org/T251627) (owner: Andrew Bogott)
[03:34:41] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for...
[03:34:51] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1031.eqiad.wmnet'] `  Of which...
[03:38:13] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for...
[03:51:49] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1031.eqiad.wmnet'] `  Of which...
[03:53:13] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[03:53:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:55:14] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[03:55:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:00:56] <wikibugs>	 (PS1) Andrew Bogott: nova-compute: Remove a reference to a (now-not-always-present) mountpoint [puppet] - https://gerrit.wikimedia.org/r/617596 (https://phabricator.wikimedia.org/T251627)
[04:01:44] <icinga-wm>	 PROBLEM - Ensure hosts are not performing a change on every puppet run on puppetdb1002 is CRITICAL: CRITICAL: the following (5) node(s) change every puppet run: analytics1039.eqiad.wmnet, cloudvirt1032.eqiad.wmnet, wdqs1009.eqiad.wmnet, testreduce1001.eqiad.wmnet, cloudvirt1031.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes
[04:02:37] <wikibugs>	 (CR) Andrew Bogott: [C: +2] nova-compute: Remove a reference to a (now-not-always-present) mountpoint [puppet] - https://gerrit.wikimedia.org/r/617596 (https://phabricator.wikimedia.org/T251627) (owner: Andrew Bogott)
[04:05:03] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for...
[04:20:01] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[04:20:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:22:07] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[04:22:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:35:55] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1031.eqiad.wmnet'] `  Of which...
[04:39:29] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for...
[04:54:28] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[04:54:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:56:34] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[04:56:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:04:36] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:09:41] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:09:51] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1032.eqiad.wmnet'] `  Of which...
[05:45:27] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/617529 (https://phabricator.wikimedia.org/T257016) (owner: Herron)
[05:49:17] <wikibugs>	 (PS2) Muehlenhoff: Enable managed adduser/sysusers config also for WMCS [puppet] - https://gerrit.wikimedia.org/r/602286 (https://phabricator.wikimedia.org/T235162)
[05:51:45] <wikibugs>	 (PS1) Privacybatm: Sphinx: Resolve unexpected intend error [software/transferpy] - https://gerrit.wikimedia.org/r/617600 (https://phabricator.wikimedia.org/T257601)
[05:54:29] <wikibugs>	 (CR) Privacybatm: "This resolves the sphinx doc build error." [software/transferpy] - https://gerrit.wikimedia.org/r/617600 (https://phabricator.wikimedia.org/T257601) (owner: Privacybatm)
[05:57:10] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[05:59:35] <moritzm>	 !log installing qemu updates on stretch
[05:59:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:31] <wikibugs>	 (CR) Elukey: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/617479 (https://phabricator.wikimedia.org/T234826) (owner: Elukey)
[06:02:48] <wikibugs>	 (PS5) Privacybatm: [POC4 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/616282 (https://phabricator.wikimedia.org/T259327)
[06:02:50] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (bad URL) is CRITICAL: Test bad URL returned the unexpected status 200 (expecting: 404) https://wikitech.wikimedia.org/wiki/Citoid
[06:02:55] <wikibugs>	 (CR) jerkins-bot: [V: -1] [POC4 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/616282 (https://phabricator.wikimedia.org/T259327) (owner: Privacybatm)
[06:03:59] <wikibugs>	 (PS8) Privacybatm: [POC3 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/615179 (https://phabricator.wikimedia.org/T259327)
[06:04:07] <wikibugs>	 (CR) jerkins-bot: [V: -1] [POC3 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/615179 (https://phabricator.wikimedia.org/T259327) (owner: Privacybatm)
[06:04:39] <wikibugs>	 (Restored) Privacybatm: [POC2 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/614744 (https://phabricator.wikimedia.org/T257601) (owner: Privacybatm)
[06:04:50] <wikibugs>	 (PS2) Privacybatm: [POC2 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/614744 (https://phabricator.wikimedia.org/T259327)
[06:04:58] <wikibugs>	 (CR) jerkins-bot: [V: -1] [POC2 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/614744 (https://phabricator.wikimedia.org/T259327) (owner: Privacybatm)
[06:05:58] <wikibugs>	 (Abandoned) Privacybatm: [POC2 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/614744 (https://phabricator.wikimedia.org/T259327) (owner: Privacybatm)
[06:06:18] <wikibugs>	 (Restored) Privacybatm: [POC1 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/614745 (https://phabricator.wikimedia.org/T257601) (owner: Privacybatm)
[06:06:31] <wikibugs>	 (PS3) Privacybatm: [POC1 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/614745 (https://phabricator.wikimedia.org/T259327)
[06:06:39] <wikibugs>	 (CR) jerkins-bot: [V: -1] [POC1 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/614745 (https://phabricator.wikimedia.org/T259327) (owner: Privacybatm)
[06:06:49] <wikibugs>	 (Abandoned) Privacybatm: [POC1 WIP] transferpy: Multiprocess the transfers [software/transferpy] - https://gerrit.wikimedia.org/r/614745 (https://phabricator.wikimedia.org/T259327) (owner: Privacybatm)
[06:22:08] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1031 is CRITICAL: connect to address 10.64.20.73 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[06:22:08] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1031 is CRITICAL: connect to address 10.64.20.73 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[06:22:09] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1031 is CRITICAL: connect to address 10.64.20.73 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[06:24:50] <wikibugs>	 (PS1) Ayounsi: Configure transport links OSPF based on Netbox data [homer/public] - https://gerrit.wikimedia.org/r/617603 (https://phabricator.wikimedia.org/T200277)
[06:26:51] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[06:26:52] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[06:26:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:26:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:29:35] <wikibugs>	 (PS1) Elukey: druid: add cache monitoring for 0.19 clusters [puppet] - https://gerrit.wikimedia.org/r/617604 (https://phabricator.wikimedia.org/T244482)
[06:30:06] <wikibugs>	 (CR) Elukey: [C: +2] druid: add cache monitoring for 0.19 clusters [puppet] - https://gerrit.wikimedia.org/r/617604 (https://phabricator.wikimedia.org/T244482) (owner: Elukey)
[06:30:28] <wikibugs>	 (CR) Ayounsi: "I'm not 100% satisfied with the Jinja code, so please let me know if you have suggestions on how to improve it." [homer/public] - https://gerrit.wikimedia.org/r/617603 (https://phabricator.wikimedia.org/T200277) (owner: Ayounsi)
[06:32:46] <elukey>	 !log roll restart of druid brokers on druid100[4-8] to pick up new changes
[06:32:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:28] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on cloudvirt1031 is CRITICAL: connect to address 10.64.20.73 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP
[06:50:19] <wikibugs>	 (CR) Elukey: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/617479 (https://phabricator.wikimedia.org/T234826) (owner: Elukey)
[06:55:01] <wikibugs>	 (CR) Elukey: [C: +2] profile::mariadb::misc::analytics::multiinstance: change ports [puppet] - https://gerrit.wikimedia.org/r/617479 (https://phabricator.wikimedia.org/T234826) (owner: Elukey)
[06:57:24] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1032 is CRITICAL: connect to address 10.64.20.74 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[06:57:34] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1032 is CRITICAL: connect to address 10.64.20.74 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[06:57:41] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1032 is CRITICAL: connect to address 10.64.20.74 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[07:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200731T0700)
[07:04:34] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on cloudvirt1032 is CRITICAL: connect to address 10.64.20.74 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP
[07:07:12] <elukey>	 !log stop mysql replication on db1108; update port config for mysql instances and restart them; restart replication on instances
[07:07:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:51] <wikibugs>	 (PS1) Elukey: analytics-in[46]: add new ports for term mysql-replica [homer/public] - https://gerrit.wikimedia.org/r/617649 (https://phabricator.wikimedia.org/T234826)
[07:22:57] <wikibugs>	 (CR) Jcrespo: [C: +2] "Thank you for the quick fix!" [software/transferpy] - https://gerrit.wikimedia.org/r/617600 (https://phabricator.wikimedia.org/T257601) (owner: Privacybatm)
[07:23:25] <wikibugs>	 (Merged) jenkins-bot: Sphinx: Resolve unexpected intend error [software/transferpy] - https://gerrit.wikimedia.org/r/617600 (https://phabricator.wikimedia.org/T257601) (owner: Privacybatm)
[07:39:30] <wikibugs>	 (Restored) Jcrespo: mariadb: Create ugly exception for port assignment for db1108 [puppet] - https://gerrit.wikimedia.org/r/617077 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[07:39:41] <wikibugs>	 (PS2) Jcrespo: mariadb: Match port 3351 and 3352 to 2 analytics sections [puppet] - https://gerrit.wikimedia.org/r/617077 (https://phabricator.wikimedia.org/T234826)
[07:39:43] <wikibugs>	 (PS1) Jcrespo: mariadb-backups: Move db1108 (analytics db) backups' ports [puppet] - https://gerrit.wikimedia.org/r/617650 (https://phabricator.wikimedia.org/T234826)
[07:40:12] <wikibugs>	 (CR) Jcrespo: "New option." [puppet] - https://gerrit.wikimedia.org/r/617077 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[07:41:48] <wikibugs>	 (CR) Elukey: [C: +1] mariadb: Match port 3351 and 3352 to 2 analytics sections [puppet] - https://gerrit.wikimedia.org/r/617077 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[07:42:18] <wikibugs>	 (CR) Jcrespo: [C: +2] mariadb: Match port 3351 and 3352 to 2 analytics sections [puppet] - https://gerrit.wikimedia.org/r/617077 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[07:42:20] <wikibugs>	 (CR) Elukey: [C: +1] mariadb-backups: Move db1108 (analytics db) backups' ports [puppet] - https://gerrit.wikimedia.org/r/617650 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[07:42:33] <wikibugs>	 (CR) Jcrespo: [C: +2] mariadb-backups: Move db1108 (analytics db) backups' ports [puppet] - https://gerrit.wikimedia.org/r/617650 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[07:43:54] <icinga-wm>	 RECOVERY - MariaDB read only matomo on db1108 is OK: Version 10.4.13-MariaDB-log, Uptime 2715s, read_only: True, event_scheduler: True, 27.33 QPS, connection latency: 0.002670s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[07:48:06] <jynus>	 elukey: can we run backups, and I'll show you how to recover?
[07:48:20] <jynus>	 you can also check that the backups look good
[07:49:36] <elukey>	 jynus: sure!
[07:49:59] <elukey>	 if the recovery is what we have on wikitech for bacula, I have already used it for other stuff (like archiva etc.)
[07:50:20] <moritzm>	 !log uploaded lilypond 2.19.81+really-2.18.2-13~bpo9+1+wmf1 to stretch-wikimedia T256877
[07:50:20] <elukey>	 so I have an idea about the recovery process (but have never done it for mariadb)
[07:50:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:25] <stashbot>	 T256877: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877
[07:50:50] <jynus>	 elukey: indeed https://wikitech.wikimedia.org/wiki/MariaDB/Backups#Recovering_a_logical_backup
[07:50:55] <jynus>	 but it is not bacula
[07:51:03] <moritzm>	 !log updating lilypond on mw* servers
[07:51:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:07] <jynus>	 as bacula is used for storage, but not for automatic database loading
[07:51:24] <elukey>	 ahh nice
[07:51:25] <jynus>	 let me pm
[07:55:52] <wikibugs>	 Operations, Graphoid, Code-Stewardship-Reviews, Release-Engineering-Team (Code Health), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (akosiaris) >>! In T211881#6350152, @kaldari wrote: > I'd like to propose that we close this ticket, since we've dec...
[07:57:01] <wikibugs>	 Operations, Release Pipeline, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Create Graphoid .pipeline files - https://phabricator.wikimedia.org/T203092 (akosiaris) Stalled→Declined With graphoid being undeployed in T242855, this makes no sense anymore. Declining.
[07:57:08] <wikibugs>	 Operations, Release Pipeline, Release-Engineering-Team-TODO, Release-Engineering-Team (Pipeline), Services (watching): Move Graphoid to Kubernetes via the deployment pipeline - https://phabricator.wikimedia.org/T203091 (akosiaris)
[08:02:57] <wikibugs>	 (PS1) Muehlenhoff: Extend snapshot Cumin alias to also include the testbed role [puppet] - https://gerrit.wikimedia.org/r/617651
[08:10:50] <icinga-wm>	 RECOVERY - dump of analytics_meta in eqiad on icinga1001 is OK: Last dump for analytics_meta at eqiad (db1108.eqiad.wmnet:3352) taken on 2020-07-31 07:54:57 (1 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[08:14:51] <wikibugs>	 (PS1) JMeybohm: Switch helmfiles to use chartmuseum repository [deployment-charts] - https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843)
[08:21:58] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:22:41] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -1] Switch helmfiles to use chartmuseum repository (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[08:25:42] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:35:21] <wikibugs>	 (PS1) Jcrespo: mariadb: Increase analytics binlog retention time to 14 days [puppet] - https://gerrit.wikimedia.org/r/617653 (https://phabricator.wikimedia.org/T234826)
[08:38:06] <wikibugs>	 (PS2) Jcrespo: mariadb: Increase analytics binlog retention time to 14 days [puppet] - https://gerrit.wikimedia.org/r/617653 (https://phabricator.wikimedia.org/T234826)
[08:39:51] <wikibugs>	 (CR) Elukey: [C: +1] mariadb: Increase analytics binlog retention time to 14 days [puppet] - https://gerrit.wikimedia.org/r/617653 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[08:42:12] <wikibugs>	 (CR) Jcrespo: [C: +2] mariadb: Increase analytics binlog retention time to 14 days [puppet] - https://gerrit.wikimedia.org/r/617653 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[08:44:27] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] api-gateway: add helmfile.d configuration [deployment-charts] - https://gerrit.wikimedia.org/r/616467 (https://phabricator.wikimedia.org/T254906) (owner: Hnowlan)
[08:51:54] <wikibugs>	 (PS1) Jcrespo: mariadb: Increase misc db binlog retention to 14 days [puppet] - https://gerrit.wikimedia.org/r/617656 (https://phabricator.wikimedia.org/T234826)
[08:51:56] <wikibugs>	 (PS2) JMeybohm: Remove the repository definition from helmfiles [deployment-charts] - https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843)
[08:52:10] <icinga-wm>	 ACKNOWLEDGEMENT - OTRS SMTP on otrs1001 is CRITICAL: connect to address 10.64.16.39 and port 25: Connection refused alexandros kosiaris ignore, migration ongoing. https://wikitech.wikimedia.org/wiki/OTRS%23Troubleshooting
[08:52:17] <wikibugs>	 (CR) Elukey: [C: +1] mariadb: Increase misc db binlog retention to 14 days [puppet] - https://gerrit.wikimedia.org/r/617656 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[08:53:40] <wikibugs>	 (CR) JMeybohm: Remove the repository definition from helmfiles (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[08:53:53] <wikibugs>	 (CR) Jcrespo: "At least until binlog backup is centralized on dbprov hosts." [puppet] - https://gerrit.wikimedia.org/r/617656 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[08:54:26] <wikibugs>	 (CR) Jcrespo: [C: +2] mariadb: Increase misc db binlog retention to 14 days [puppet] - https://gerrit.wikimedia.org/r/617656 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[08:56:21] <wikibugs>	 (CR) Kormat: [C: +1] mariadb: Increase misc db binlog retention to 14 days [puppet] - https://gerrit.wikimedia.org/r/617656 (https://phabricator.wikimedia.org/T234826) (owner: Jcrespo)
[08:59:22] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] logstash7: increase SSD tier JVM heap to 32G [puppet] - https://gerrit.wikimedia.org/r/617526 (https://phabricator.wikimedia.org/T259219) (owner: Herron)
[08:59:41] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] profile: ensure only one webrequest host sends 5xx to logstash [puppet] - https://gerrit.wikimedia.org/r/617388 (https://phabricator.wikimedia.org/T247968) (owner: Filippo Giunchedi)
[08:59:46] <wikibugs>	 (PS3) Filippo Giunchedi: profile: ensure only one webrequest host sends 5xx to logstash [puppet] - https://gerrit.wikimedia.org/r/617388 (https://phabricator.wikimedia.org/T247968)
[08:59:52] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:07:04] <icinga-wm>	 RECOVERY - OTRS SMTP on otrs1001 is OK: SMTP OK - 0.007 sec. response time https://wikitech.wikimedia.org/wiki/OTRS%23Troubleshooting
[09:07:24] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:07:55] <wikibugs>	 10Operations, 10Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (10fgiunchedi)
[09:12:45] <wikibugs>	 (03PS1) 10JMeybohm: chartmuseum: Change repository name to stable [puppet] - 10https://gerrit.wikimedia.org/r/617659 (https://phabricator.wikimedia.org/T253843)
[09:14:39] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] chartmuseum: Change repository name to stable [puppet] - 10https://gerrit.wikimedia.org/r/617659 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[09:16:21] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "PCC https://puppet-compiler.wmflabs.org/compiler1001/24244/" [puppet] - 10https://gerrit.wikimedia.org/r/617083 (owner: 10Filippo Giunchedi)
[09:16:23] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] rsync: listen for stunnel connections on v4/v6 [puppet] - 10https://gerrit.wikimedia.org/r/617083 (owner: 10Filippo Giunchedi)
[09:16:49] <wikibugs>	 10Operations: gerrit.wm.o/r/changes/ has leading garbage in the output - https://phabricator.wikimedia.org/T259333 (10Kormat)
[09:18:03] <wikibugs>	 10Operations, 10Gerrit: gerrit.wm.o/r/changes/ has leading garbage in the output - https://phabricator.wikimedia.org/T259333 (10Majavah)
[09:18:27] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] chartmuseum: Change repository name to stable [puppet] - 10https://gerrit.wikimedia.org/r/617659 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[09:23:19] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM! Haven't tried building the package tho" [debs/prometheus-es-exporter] (debian/sid) - 10https://gerrit.wikimedia.org/r/617250 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[09:28:58] <wikibugs>	 (03CR) 10Filippo Giunchedi: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/617260 (https://phabricator.wikimedia.org/T256418) (owner: 10Cwhite)
[09:31:38] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Add port analytics assignment to wmfmariadbpy and backups [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/617661 (https://phabricator.wikimedia.org/T234826)
[09:33:08] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Add port analytics assignment to wmfmariadbpy and backups [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/617661 (https://phabricator.wikimedia.org/T234826)
[09:33:56] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:35:18] <icinga-wm>	 PROBLEM - Check systemd state on idp1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:36:26] <godog>	 that's probably me ^
[09:36:31] <wikibugs>	 (03CR) 10Kormat: [C: 04-1] mariadb: Add port analytics assignment to wmfmariadbpy and backups (033 comments) [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/617661 (https://phabricator.wikimedia.org/T234826) (owner: 10Jcrespo)
[09:37:59] <wikibugs>	 (03CR) 10Kormat: [C: 03+1] "My review crossed your updated. LGTM :)" (033 comments) [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/617661 (https://phabricator.wikimedia.org/T234826) (owner: 10Jcrespo)
[09:41:31] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good. The package name is too generic from my POV ("es" could be other things besides elasticsearch), but there's also an argument f" (031 comment) [debs/prometheus-es-exporter] (debian/sid) - 10https://gerrit.wikimedia.org/r/617250 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[09:41:33] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb: Add port analytics assignment to wmfmariadbpy and backups [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/617661 (https://phabricator.wikimedia.org/T234826) (owner: 10Jcrespo)
[09:42:02] <wikibugs>	 (03PS1) 10Jcrespo: mariadb-backups: Update backup automation to wmfmariadbpy's HEAD [puppet] - 10https://gerrit.wikimedia.org/r/617662 (https://phabricator.wikimedia.org/T234826)
[09:43:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb-backups: Update backup automation to wmfmariadbpy's HEAD [puppet] - 10https://gerrit.wikimedia.org/r/617662 (https://phabricator.wikimedia.org/T234826) (owner: 10Jcrespo)
[09:44:38] <wikibugs>	 (03PS2) 10Jcrespo: mariadb-backups: Update backup automation to wmfmariadbpy's HEAD [puppet] - 10https://gerrit.wikimedia.org/r/617662 (https://phabricator.wikimedia.org/T234826)
[09:47:11] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Update backup automation to wmfmariadbpy's HEAD [puppet] - 10https://gerrit.wikimedia.org/r/617662 (https://phabricator.wikimedia.org/T234826) (owner: 10Jcrespo)
[09:52:23] <wikibugs>	 10Operations: Rebase patches for VP9 support to ffmpeg 3.2.15 - https://phabricator.wikimedia.org/T259336 (10MoritzMuehlenhoff)
[09:56:34] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:56:59] <wikibugs>	 (03PS1) 10Jcrespo: mariadb-backups: Add _ to the list of characters alowed for section names [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/617668 (https://phabricator.wikimedia.org/T234826)
[09:58:14] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Add _ to the list of characters alowed for section names [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/617668 (https://phabricator.wikimedia.org/T234826) (owner: 10Jcrespo)
[09:58:29] <wikibugs>	 (03PS1) 10Muehlenhoff: Fix typo in sources for older distros on package build host (and remove jessie) [puppet] - 10https://gerrit.wikimedia.org/r/617669
[09:58:36] <wikibugs>	 (03PS15) 10Kormat: Create debian packages. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021)
[09:58:44] <wikibugs>	 (03Merged) 10jenkins-bot: mariadb-backups: Add _ to the list of characters alowed for section names [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/617668 (https://phabricator.wikimedia.org/T234826) (owner: 10Jcrespo)
[10:02:14] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:03:40] <wikibugs>	 (03PS1) 10Jcrespo: mariadb-backups: Update backup_mariadb.py to HEAD [puppet] - 10https://gerrit.wikimedia.org/r/617670 (https://phabricator.wikimedia.org/T234826)
[10:04:01] <wikibugs>	 (03PS2) 10Jcrespo: mariadb-backups: Update backup_mariadb.py to HEAD [puppet] - 10https://gerrit.wikimedia.org/r/617670 (https://phabricator.wikimedia.org/T234826)
[10:04:35] <wikibugs>	 10Operations, 10Gerrit, 10User-Kormat: gerrit.wm.o/r/changes/ has leading garbage in the output - https://phabricator.wikimedia.org/T259333 (10Kormat)
[10:05:40] <wikibugs>	 (03CR) 10Kormat: "I've done some light testing, and this seems to work. Lintian isn't overly thrilled with me, however:" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021) (owner: 10Kormat)
[10:05:47] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Update backup_mariadb.py to HEAD [puppet] - 10https://gerrit.wikimedia.org/r/617670 (https://phabricator.wikimedia.org/T234826) (owner: 10Jcrespo)
[10:07:33] <wikibugs>	 (03CR) 10Jcrespo: "> Patch Set 15:" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021) (owner: 10Kormat)
[10:07:56] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:07:59] <wikibugs>	 (03PS1) 10JMeybohm: helm: Allow multiple helm repositories [puppet] - 10https://gerrit.wikimedia.org/r/617673 (https://phabricator.wikimedia.org/T253843)
[10:10:13] <wikibugs>	 (03CR) 10JMeybohm: "PCC https://puppet-compiler.wmflabs.org/compiler1001/24245/deploy1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/617673 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[10:13:10] <wikibugs>	 (03PS3) 10JMeybohm: Remove the repository definition from helmfiles [deployment-charts] - 10https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843)
[10:13:45] <wikibugs>	 (03CR) 10Kormat: "> Nothing there seems too surprising, but do you know where "out-of-date-standards-version 4.1.2" comes from?" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021) (owner: 10Kormat)
[10:17:48] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/617669 (owner: 10Muehlenhoff)
[10:19:14] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Fix typo in sources for older distros on package build host (and remove jessie) [puppet] - 10https://gerrit.wikimedia.org/r/617669 (owner: 10Muehlenhoff)
[10:23:21] <wikibugs>	 (03PS4) 10JMeybohm: Remove the repository definition from helmfiles [deployment-charts] - 10https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843)
[10:32:44] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:40:18] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:50:25] <wikibugs>	 (03PS1) 10Jbond: profile::gerrit::migrations: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617676 (https://phabricator.wikimedia.org/T247956)
[10:51:31] <wikibugs>	 (03PS2) 10Jbond: profile::gerrit::migrations: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617676 (https://phabricator.wikimedia.org/T247956)
[10:53:54] <wikibugs>	 (03CR) 10Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/24247/" [puppet] - 10https://gerrit.wikimedia.org/r/617676 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[10:58:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] debianization (031 comment) [debs/prometheus-es-exporter] (debian/sid) - 10https://gerrit.wikimedia.org/r/617250 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[10:59:58] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, two minor nits inline" (032 comments) [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021) (owner: 10Kormat)
[11:03:37] <wikibugs>	 10Operations, 10Acme-chief, 10Traffic: do not generate metadata for parts that aren't allowed - https://phabricator.wikimedia.org/T259338 (10Vgutierrez) p:05Triage→03Medium
[11:04:19] <wikibugs>	 (03PS1) 10Vgutierrez: api: Exclude not valid parts from get_directory_metadata output [software/acme-chief] - 10https://gerrit.wikimedia.org/r/617680 (https://phabricator.wikimedia.org/T259338)
[11:11:13] <wikibugs>	 10Operations: Rebase patches for VP9 support to ffmpeg 3.2.15 - https://phabricator.wikimedia.org/T259336 (10MoritzMuehlenhoff) For the record, the steps to validate that VP9 multi-threading support works as expected in the new ffmpeg build:  * Download https://upload.wikimedia.org/wikipedia/commons/6/69/Wall_of...
[11:16:36] <moritzm>	 !log imported ffmpeg 3.2.15-0+deb9u1+wmf1 to component/vp9 for stretch-wikimedia T259336
[11:16:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:16:43] <stashbot>	 T259336: Rebase patches for VP9 support to ffmpeg 3.2.15 - https://phabricator.wikimedia.org/T259336
[11:19:54] <wikibugs>	 (03PS1) 10Jbond: profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617683 (https://phabricator.wikimedia.org/T247956)
[11:19:56] <moritzm>	 !log installing ffmpeg security updates for jessie (standard version from security.debian.org, not the VP9-enabled component)
[11:20:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:19] <jynus>	 !log restart dbstore1004
[11:21:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:50] <wikibugs>	 (03PS2) 10Jbond: profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617683 (https://phabricator.wikimedia.org/T247956)
[11:28:15] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Reduce buffer cache memory footprint to prevent OOMs [puppet] - 10https://gerrit.wikimedia.org/r/617684
[11:29:26] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:29:48] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb: Reduce buffer cache memory footprint to prevent OOMs [puppet] - 10https://gerrit.wikimedia.org/r/617684 (owner: 10Jcrespo)
[11:32:21] <wikibugs>	 (03PS1) 10Ema: atskafka: librdkafka settings tuning [puppet] - 10https://gerrit.wikimedia.org/r/617685 (https://phabricator.wikimedia.org/T254317)
[11:35:03] <wikibugs>	 (03PS3) 10Jbond: profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617683 (https://phabricator.wikimedia.org/T247956)
[11:35:59] <wikibugs>	 (03CR) 10Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/24250/" [puppet] - 10https://gerrit.wikimedia.org/r/617683 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[11:36:45] <wikibugs>	 (03PS1) 10Andrew Bogott: cloudvirt103[1-9]: move to insetup until I can figure out what's happening [puppet] - 10https://gerrit.wikimedia.org/r/617686 (https://phabricator.wikimedia.org/T251627)
[11:37:02] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:37:20] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cloudvirt103[1-9]: move to insetup until I can figure out what's happening [puppet] - 10https://gerrit.wikimedia.org/r/617686 (https://phabricator.wikimedia.org/T251627) (owner: 10Andrew Bogott)
[11:39:29] <wikibugs>	 (03PS2) 10Ema: atskafka: librdkafka settings tuning [puppet] - 10https://gerrit.wikimedia.org/r/617685 (https://phabricator.wikimedia.org/T254317)
[11:42:18] <wikibugs>	 (03CR) 10Ema: [C: 03+1] "Other than for the comment I've left inline, this looks great." (031 comment) [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021) (owner: 10Kormat)
[11:42:41] <wikibugs>	 (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/617685 (https://phabricator.wikimedia.org/T254317) (owner: 10Ema)
[11:51:59] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] atskafka: librdkafka settings tuning [puppet] - 10https://gerrit.wikimedia.org/r/617685 (https://phabricator.wikimedia.org/T254317) (owner: 10Ema)
[11:54:06] <wikibugs>	 (03PS1) 10Filippo Giunchedi: alertmanager: add IRC notifier [puppet] - 10https://gerrit.wikimedia.org/r/617688 (https://phabricator.wikimedia.org/T258948)
[11:54:08] <wikibugs>	 (03PS1) 10Filippo Giunchedi: role: add alertmanager::irc to alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/617689 (https://phabricator.wikimedia.org/T258948)
[11:54:34] <wikibugs>	 (03PS1) 10Jbond: profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617690 (https://phabricator.wikimedia.org/T247956)
[11:55:53] <moritzm>	 !log installing mercurial security updates
[11:55:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:56:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617690 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[11:56:22] <wikibugs>	 (03PS2) 10Jbond: profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617690 (https://phabricator.wikimedia.org/T247956)
[11:57:00] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Things still TODO/pending:" [puppet] - 10https://gerrit.wikimedia.org/r/617688 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi)
[11:57:17] <wikibugs>	 10Operations: Rebase patches for VP9 support to ffmpeg 3.2.15 - https://phabricator.wikimedia.org/T259336 (10MoritzMuehlenhoff) 05Open→03Resolved This is completed
[12:02:52] <icinga-wm>	 RECOVERY - Check systemd state on idp1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:03:33] <wikibugs>	 (03PS3) 10Jbond: profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617690 (https://phabricator.wikimedia.org/T247956)
[12:07:06] <wikibugs>	 (03PS4) 10Jbond: profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617690 (https://phabricator.wikimedia.org/T247956)
[12:08:48] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[12:09:15] <wikibugs>	 (03CR) 10Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/24259/" [puppet] - 10https://gerrit.wikimedia.org/r/617690 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[12:10:36] <wikibugs>	 (03PS16) 10Kormat: Create debian packages. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021)
[12:11:04] <wikibugs>	 (03CR) 10Kormat: Create debian packages. (033 comments) [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021) (owner: 10Kormat)
[12:11:29] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] "Nice!" [puppet] - 10https://gerrit.wikimedia.org/r/617673 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[12:12:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021) (owner: 10Kormat)
[12:14:26] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad URL) is CRITICAL: Test bad URL returned the unexpected status 200 (expecting: 404) https://wikitech.wikimedia.org/wiki/Citoid
[12:15:01] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor nitpick but otherwise LGTM" (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[12:16:11] <wikibugs>	 (03PS1) 10Jbond: profile::gerrit::server: rename profile [puppet] - 10https://gerrit.wikimedia.org/r/617691
[12:18:01] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] helm: Allow multiple helm repositories [puppet] - 10https://gerrit.wikimedia.org/r/617673 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[12:18:47] <wikibugs>	 (03CR) 10Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/24261/" [puppet] - 10https://gerrit.wikimedia.org/r/617691 (owner: 10Jbond)
[12:19:55] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] analytics-in[46]: add new ports for term mysql-replica [homer/public] - 10https://gerrit.wikimedia.org/r/617649 (https://phabricator.wikimedia.org/T234826) (owner: 10Elukey)
[12:23:10] <wikibugs>	 (03PS5) 10JMeybohm: Remove the repository definition from helmfiles [deployment-charts] - 10https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843)
[12:23:12] <wikibugs>	 (03PS1) 10JMeybohm: Remove the repository definition from helmfiles [deployment-charts] - 10https://gerrit.wikimedia.org/r/617693 (https://phabricator.wikimedia.org/T253843)
[12:23:14] <wikibugs>	 (03PS1) 10JMeybohm: changeprop: Update repository URL in requirements [deployment-charts] - 10https://gerrit.wikimedia.org/r/617694 (https://phabricator.wikimedia.org/T253843)
[12:23:17] <wikibugs>	 (03PS1) 10JMeybohm: eventgate: Update repository URL in requirements [deployment-charts] - 10https://gerrit.wikimedia.org/r/617695 (https://phabricator.wikimedia.org/T253843)
[12:25:26] <wikibugs>	 (03CR) 10JMeybohm: Remove the repository definition from helmfiles (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[12:25:59] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-2] "Needs testing" [deployment-charts] - 10https://gerrit.wikimedia.org/r/617693 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[12:27:52] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Remove the repository definition from helmfiles [deployment-charts] - 10https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[12:28:44] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 20897144 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:29:06] <wikibugs>	 (03Merged) 10jenkins-bot: Remove the repository definition from helmfiles [deployment-charts] - 10https://gerrit.wikimedia.org/r/617652 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[12:29:33] <wikibugs>	 (03Abandoned) 10Filippo Giunchedi: WIP prometheus::alertmanager [puppet] - 10https://gerrit.wikimedia.org/r/354976 (owner: 10Filippo Giunchedi)
[12:29:54] <wikibugs>	 (03Abandoned) 10Filippo Giunchedi: role: use alertmanager in beta prometheus [puppet] - 10https://gerrit.wikimedia.org/r/354460 (owner: 10Filippo Giunchedi)
[12:29:58] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:30:38] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 45456 and 90 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:31:39] <wikibugs>	 (03PS2) 10Ayounsi: Netbox: add circuits support [software/homer] - 10https://gerrit.wikimedia.org/r/617418
[12:31:51] <wikibugs>	 (03Abandoned) 10Filippo Giunchedi: profile: install SMART checks after 'raid' fact is available. [puppet] - 10https://gerrit.wikimedia.org/r/428947 (https://phabricator.wikimedia.org/T132324) (owner: 10Filippo Giunchedi)
[12:32:58] <wikibugs>	 (03CR) 10Kormat: [C: 03+2] Create debian packages. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021) (owner: 10Kormat)
[12:33:29] <wikibugs>	 (03Merged) 10jenkins-bot: Create debian packages. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/616846 (https://phabricator.wikimedia.org/T259021) (owner: 10Kormat)
[12:33:46] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:41:09] <wikibugs>	 (03PS2) 10JMeybohm: Remove the repository definition from helmfiles [deployment-charts] - 10https://gerrit.wikimedia.org/r/617693 (https://phabricator.wikimedia.org/T253843)
[12:41:28] <wikibugs>	 (03PS2) 10JMeybohm: changeprop: Update repository URL in requirements [deployment-charts] - 10https://gerrit.wikimedia.org/r/617694 (https://phabricator.wikimedia.org/T253843)
[12:41:30] <wikibugs>	 (03PS2) 10JMeybohm: eventgate: Update repository URL in requirements [deployment-charts] - 10https://gerrit.wikimedia.org/r/617695 (https://phabricator.wikimedia.org/T253843)
[12:41:32] <wikibugs>	 (03PS1) 10JMeybohm: mathoid: Change staging chart back to stable [deployment-charts] - 10https://gerrit.wikimedia.org/r/617699 (https://phabricator.wikimedia.org/T253843)
[12:41:49] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: otrs: Disallow outgoing emails from test instance [puppet] - 10https://gerrit.wikimedia.org/r/617700 (https://phabricator.wikimedia.org/T187984)
[12:44:12] <wikibugs>	 (03PS1) 10JMeybohm: helm: Switch stable chart repository to chartmuseum [puppet] - 10https://gerrit.wikimedia.org/r/617701 (https://phabricator.wikimedia.org/T25384)
[12:46:39] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] otrs: Disallow outgoing emails from test instance [puppet] - 10https://gerrit.wikimedia.org/r/617700 (https://phabricator.wikimedia.org/T187984) (owner: 10Alexandros Kosiaris)
[12:48:15] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] helmfile: strawman refactoring (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/615498 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto)
[12:50:47] <wikibugs>	 (03CR) 10Ema: [C: 03+2] atskafka: librdkafka settings tuning [puppet] - 10https://gerrit.wikimedia.org/r/617685 (https://phabricator.wikimedia.org/T254317) (owner: 10Ema)
[12:53:10] <icinga-wm>	 PROBLEM - Check systemd state on otrs1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:53:58] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:55:52] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:02:31] <wikibugs>	 (03PS1) 10Jbond: standard: move none standard class to profile::standard [puppet] - 10https://gerrit.wikimedia.org/r/617703 (https://phabricator.wikimedia.org/T247956)
[13:02:33] <wikibugs>	 (03PS1) 10Jbond: profile::standard::admin: manage admin groups in profile::standard [puppet] - 10https://gerrit.wikimedia.org/r/617704 (https://phabricator.wikimedia.org/T247956)
[13:02:41] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: otrs: Fix ferm::rule syntax [puppet] - 10https://gerrit.wikimedia.org/r/617705
[13:03:47] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] standard: move none standard class to profile::standard [puppet] - 10https://gerrit.wikimedia.org/r/617703 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[13:04:04] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::standard::admin: manage admin groups in profile::standard [puppet] - 10https://gerrit.wikimedia.org/r/617704 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[13:04:24] <wikibugs>	 10Operations, 10Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (10MoritzMuehlenhoff)
[13:04:31] <kormat>	 !log proudly uploaded version 0.1 of python3-wmfmariadbpy + wmfmariadbpy
[13:04:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Extend snapshot Cumin alias to also include the testbed role [puppet] - 10https://gerrit.wikimedia.org/r/617651 (owner: 10Muehlenhoff)
[13:09:45] <wikibugs>	 (03PS2) 10Jbond: standard: move none standard class to profile::standard [puppet] - 10https://gerrit.wikimedia.org/r/617703 (https://phabricator.wikimedia.org/T247956)
[13:10:51] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] otrs: Fix ferm::rule syntax [puppet] - 10https://gerrit.wikimedia.org/r/617705 (owner: 10Alexandros Kosiaris)
[13:13:01] <wikibugs>	 (03PS2) 10Jbond: profile::standard::admin: manage admin groups in profile::standard [puppet] - 10https://gerrit.wikimedia.org/r/617704 (https://phabricator.wikimedia.org/T247956)
[13:13:58] <icinga-wm>	 RECOVERY - Check systemd state on otrs1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:17:55] <wikibugs>	 (PS1) Jbond: ferm: ensure rules always end in a semi colon [puppet] - https://gerrit.wikimedia.org/r/617706
[13:17:57] <wikibugs>	 (CR) Jbond: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/617704 (https://phabricator.wikimedia.org/T247956) (owner: Jbond)
[13:18:43] <wikibugs>	 (PS2) Jbond: ferm: ensure rules always end in a semi colon [puppet] - https://gerrit.wikimedia.org/r/617706
[13:20:09] <moritzm>	 !log installing openjpeg2 security updates
[13:20:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:46] <wikibugs>	 (CR) Ottomata: [C: +1] eventgate: Update repository URL in requirements [deployment-charts] - https://gerrit.wikimedia.org/r/617695 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[13:26:28] <wikibugs>	 (PS1) Jbond: hieradata: drop apache::logrotate keys [puppet] - https://gerrit.wikimedia.org/r/617708 (https://phabricator.wikimedia.org/T247956)
[13:27:19] <wikibugs>	 (CR) Jbond: [C: +2] hieradata: drop apache::logrotate keys [puppet] - https://gerrit.wikimedia.org/r/617708 (https://phabricator.wikimedia.org/T247956) (owner: Jbond)
[13:30:24] <wikibugs>	 (PS1) Jbond: diamond: remove unused hiera key [puppet] - https://gerrit.wikimedia.org/r/617710 (https://phabricator.wikimedia.org/T247956)
[13:30:55] <wikibugs>	 (CR) Jbond: [C: +2] diamond: remove unused hiera key [puppet] - https://gerrit.wikimedia.org/r/617710 (https://phabricator.wikimedia.org/T247956) (owner: Jbond)
[13:39:37] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] mathoid: Change staging chart back to stable (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/617699 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[13:42:33] <wikibugs>	 (PS1) Jbond: confd: use the default $::domain variable for cofd  srv_dns [puppet] - https://gerrit.wikimedia.org/r/617716 (https://phabricator.wikimedia.org/T247956)
[13:42:44] <wikibugs>	 (PS1) Urbanecm: New throttle rule for Czech editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/617717 (https://phabricator.wikimedia.org/T259352)
[13:43:37] <wikibugs>	 (CR) jerkins-bot: [V: -1] New throttle rule for Czech editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/617717 (https://phabricator.wikimedia.org/T259352) (owner: Urbanecm)
[13:44:20] <wikibugs>	 (PS2) Urbanecm: New throttle rule for Czech editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/617717 (https://phabricator.wikimedia.org/T259352)
[13:45:54] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] changeprop: Update repository URL in requirements [deployment-charts] - https://gerrit.wikimedia.org/r/617694 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[13:46:30] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] eventgate: Update repository URL in requirements [deployment-charts] - https://gerrit.wikimedia.org/r/617695 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[13:47:12] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] Remove the repository definition from helmfiles [deployment-charts] - https://gerrit.wikimedia.org/r/617693 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[13:49:46] <wikibugs>	 (PS2) Jbond: confd: use the default $::domain variable for cofd  srv_dns [puppet] - https://gerrit.wikimedia.org/r/617716 (https://phabricator.wikimedia.org/T247956)
[13:50:57] <wikibugs>	 (CR) jerkins-bot: [V: -1] confd: use the default $::domain variable for cofd  srv_dns [puppet] - https://gerrit.wikimedia.org/r/617716 (https://phabricator.wikimedia.org/T247956) (owner: Jbond)
[13:51:37] <moritzm>	 !log installing cups security updates (client-side tools/libs only)
[13:51:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:50] <wikibugs>	 (CR) Elukey: [C: +2] analytics-in[46]: add new ports for term mysql-replica [homer/public] - https://gerrit.wikimedia.org/r/617649 (https://phabricator.wikimedia.org/T234826) (owner: Elukey)
[13:52:15] <elukey>	 !log update cr1/cr2-eqiad's analytics filters (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/617649/)
[13:52:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:59] <wikibugs>	 (PS3) Jbond: confd: use the default $::domain variable for cofd  srv_dns [puppet] - https://gerrit.wikimedia.org/r/617716 (https://phabricator.wikimedia.org/T247956)
[13:59:41] <wikibugs>	 (CR) Jbond: "Ready for review" [puppet] - https://gerrit.wikimedia.org/r/617716 (https://phabricator.wikimedia.org/T247956) (owner: Jbond)
[14:00:50] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:03:54] <wikibugs>	 (PS4) Jbond: confd: pass srv_dns directly instead of loading confd::srv_dns [puppet] - https://gerrit.wikimedia.org/r/617716 (https://phabricator.wikimedia.org/T247956)
[14:04:33] <wikibugs>	 (PS1) Jbond: hieradata: remove unused hiera file [puppet] - https://gerrit.wikimedia.org/r/617723
[14:05:10] <wikibugs>	 (CR) Jbond: [C: +2] hieradata: remove unused hiera file [puppet] - https://gerrit.wikimedia.org/r/617723 (owner: Jbond)
[14:08:24] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:09:38] <wikibugs>	 (PS1) Jbond: discovery: clean up old hiera values [puppet] - https://gerrit.wikimedia.org/r/617724 (https://phabricator.wikimedia.org/T247956)
[14:10:48] <wikibugs>	 (CR) Jbond: [C: +2] discovery: clean up old hiera values [puppet] - https://gerrit.wikimedia.org/r/617724 (https://phabricator.wikimedia.org/T247956) (owner: Jbond)
[14:11:17] <wikibugs>	 (CR) JMeybohm: [C: +2] mathoid: Change staging chart back to stable [deployment-charts] - https://gerrit.wikimedia.org/r/617699 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[14:11:28] <wikibugs>	 (CR) JMeybohm: [C: +2] Remove the repository definition from helmfiles [deployment-charts] - https://gerrit.wikimedia.org/r/617693 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[14:12:19] <wikibugs>	 (Merged) jenkins-bot: mathoid: Change staging chart back to stable [deployment-charts] - https://gerrit.wikimedia.org/r/617699 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[14:12:43] <wikibugs>	 (Merged) jenkins-bot: Remove the repository definition from helmfiles [deployment-charts] - https://gerrit.wikimedia.org/r/617693 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[14:17:16] <icinga-wm>	 PROBLEM - Widespread puppet agent failures- no resources reported on icinga1001 is CRITICAL: 0.1213 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[14:21:21] <elukey>	 jbond42: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Function lookup() did not find a value for the name 'discovery::app_routes' on node netbox1001.wikimedia.org
[14:21:50] <elukey>	 puppetboard is like a christmas tree :D
[14:22:25] <wikibugs>	 Operations: Integrate Stretch 9.13 point update - https://phabricator.wikimedia.org/T258407 (MoritzMuehlenhoff)
[14:24:25] <elukey>	 there seems to be an app_routes = hiera('discovery::app_routes') in realm.pp, related to aqs, no idea why
[14:25:06] <elukey>	 apparently used by restbase
[14:25:08] <jbond42>	 elukey: thanks looking
[14:26:05] <wikibugs>	 (PS1) Jbond: Revert "discovery: clean up old hiera values" [puppet] - https://gerrit.wikimedia.org/r/617579
[14:26:48] <wikibugs>	 (CR) Jbond: [C: +2] Revert "discovery: clean up old hiera values" [puppet] - https://gerrit.wikimedia.org/r/617579 (owner: Jbond)
[14:27:37] <jbond42>	 elukey thanks have reverted
[14:30:17] <wikibugs>	 (PS1) Jbond: graphite: move graphite paramters under profile namespace [puppet] - https://gerrit.wikimedia.org/r/617725 (https://phabricator.wikimedia.org/T247956)
[14:31:59] <wikibugs>	 (PS1) Jbond: discovery: clean up old hiera values [puppet] - https://gerrit.wikimedia.org/r/617580 (https://phabricator.wikimedia.org/T247956)
[14:40:00] <wikibugs>	 (PS1) MSantos: Enable printBackground to fix style issues [deployment-charts] - https://gerrit.wikimedia.org/r/617728 (https://phabricator.wikimedia.org/T52178)
[14:40:26] <wikibugs>	 (PS2) Jbond: discovery: clean up old hiera values [puppet] - https://gerrit.wikimedia.org/r/617580 (https://phabricator.wikimedia.org/T247956)
[14:40:28] <wikibugs>	 (PS1) Jbond: profile::restbase: update aqs_uri to remove aqs_site variable [puppet] - https://gerrit.wikimedia.org/r/617729 (https://phabricator.wikimedia.org/T247956)
[14:40:54] <wikibugs>	 (CR) Jbond: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/617729 (https://phabricator.wikimedia.org/T247956) (owner: Jbond)
[14:42:30] <wikibugs>	 Operations, OTRS, serviceops, Patch-For-Review, User-notice: Update OTRS to the latest stable version (6.0.x) - https://phabricator.wikimedia.org/T187984 (eyazi) Not sure if you did, but you should also reset the Ticket::SearchIndexModule setting. Can be done on the interface if you have acce...
[14:42:48] <wikibugs>	 (PS3) Jbond: discovery: clean up old hiera values [puppet] - https://gerrit.wikimedia.org/r/617580 (https://phabricator.wikimedia.org/T247956)
[14:45:54] <icinga-wm>	 RECOVERY - Widespread puppet agent failures- no resources reported on icinga1001 is OK: (C)0.01 ge (W)0.006 ge 0.002489 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[14:52:21] <wikibugs>	 (CR) Mholloway: "You'll also need to do a new chart release with this change to get it into production. The process is described in the README, but in a nu" [deployment-charts] - https://gerrit.wikimedia.org/r/617728 (https://phabricator.wikimedia.org/T52178) (owner: MSantos)
[14:54:17] <wikibugs>	 (CR) Mholloway: "Lol, I'm failing badly at Gerrit this morning." [deployment-charts] - https://gerrit.wikimedia.org/r/617728 (https://phabricator.wikimedia.org/T52178) (owner: MSantos)
[14:54:43] <wikibugs>	 (CR) Mholloway: "> Patch Set 1:" [deployment-charts] - https://gerrit.wikimedia.org/r/617728 (https://phabricator.wikimedia.org/T52178) (owner: MSantos)
[15:01:20] <wikibugs>	 (PS10) Ottomata: Initial debian commit [debs/anaconda-wmf] (debian) - https://gerrit.wikimedia.org/r/610880 (https://phabricator.wikimedia.org/T251006)
[15:07:43] <wikibugs>	 (CR) Herron: [C: +2] logstash7: increase SSD tier JVM heap to 32G [puppet] - https://gerrit.wikimedia.org/r/617526 (https://phabricator.wikimedia.org/T259219) (owner: Herron)
[15:10:48] <wikibugs>	 Operations, Release Pipeline, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Create Graphoid .pipeline files - https://phabricator.wikimedia.org/T203092 (kaldari)
[15:10:53] <wikibugs>	 Operations, Graphoid, Code-Stewardship-Reviews, Release-Engineering-Team (Code Health), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (kaldari) Open→Resolved
[15:15:38] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for...
[15:16:41] <wikibugs>	 (PS1) Elukey: Set spark deploy-mode client for all the Analytics Hive to Druid jobs [puppet] - https://gerrit.wikimedia.org/r/617735 (https://phabricator.wikimedia.org/T254493)
[15:25:04] <wikibugs>	 (PS1) Elukey: Swap fake keytabs from an-launcher1001 to 1002 [labs/private] - https://gerrit.wikimedia.org/r/617736
[15:25:21] <wikibugs>	 (CR) Elukey: [V: +2 C: +2] Swap fake keytabs from an-launcher1001 to 1002 [labs/private] - https://gerrit.wikimedia.org/r/617736 (owner: Elukey)
[15:28:41] <wikibugs>	 Operations, CommRel-Specialists-Support (Jul-Sep-2020), User-notice: CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (RLazarus) The only user-impacting section of the process will be a read-only period for all wikis while we move MediaWiki itself -- that s...
[15:29:08] <wikibugs>	 (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1003/24268/an-launcher1002.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/617735 (https://phabricator.wikimedia.org/T254493) (owner: Elukey)
[15:30:37] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[15:30:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:38] <wikibugs>	 (PS11) Ottomata: Initial debian commit [debs/anaconda-wmf] (debian) - https://gerrit.wikimedia.org/r/610880 (https://phabricator.wikimedia.org/T251006)
[15:32:00] <wikibugs>	 (CR) MSantos: "> Patch Set 1:" [deployment-charts] - https://gerrit.wikimedia.org/r/617728 (https://phabricator.wikimedia.org/T52178) (owner: MSantos)
[15:32:43] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[15:32:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:11] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1031.eqiad.wmnet'] `  and were...
[16:05:04] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for...
[16:07:33] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for...
[16:09:59] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:20:04] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[16:20:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:11] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:22:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:31] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[16:22:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:24:37] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:24:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:38] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1032.eqiad.wmnet'] `  and were...
[16:30:05] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2016.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[16:30:46] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudvirt1031.eqiad.wmnet'] `  and were...
[16:31:45] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2017.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[16:37:55] <wikibugs>	 (CR) Ottomata: Initial debian commit (1 comment) [debs/anaconda-wmf] (debian) - https://gerrit.wikimedia.org/r/610880 (https://phabricator.wikimedia.org/T251006) (owner: Ottomata)
[16:38:05] <wikibugs>	 (PS4) Ahmon Dancy: Add mtail program for monitoring the Zuul error log [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821)
[16:39:16] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add mtail program for monitoring the Zuul error log [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[16:41:18] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:42:38] <wikibugs>	 (PS1) Andrew Bogott: Update role for cloudvirt1031 and 1032 [puppet] - https://gerrit.wikimedia.org/r/617742 (https://phabricator.wikimedia.org/T251627)
[16:42:52] <wikibugs>	 (CR) jerkins-bot: [V: -1] Update role for cloudvirt1031 and 1032 [puppet] - https://gerrit.wikimedia.org/r/617742 (https://phabricator.wikimedia.org/T251627) (owner: Andrew Bogott)
[16:43:33] <wikibugs>	 (PS2) Andrew Bogott: Update role for cloudvirt1031 and 1032 [puppet] - https://gerrit.wikimedia.org/r/617742 (https://phabricator.wikimedia.org/T251627)
[16:43:36] <wikibugs>	 (CR) Ahmon Dancy: Add mtail program for monitoring the Zuul error log (1 comment) [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[16:46:09] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Update role for cloudvirt1031 and 1032 [puppet] - https://gerrit.wikimedia.org/r/617742 (https://phabricator.wikimedia.org/T251627) (owner: Andrew Bogott)
[16:49:11] <wikibugs>	 (CR) Dzahn: Add mtail program for monitoring the Zuul error log (1 comment) [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[16:53:19] <wikibugs>	 (CR) Dzahn: "the issue now is "Found hiera call in class 'zuul::monitoring::server' for 'prometheus_nodes'". So you should do the lookup() in the param" [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[16:56:02] <wikibugs>	 (CR) Dzahn: [C: -1] "looks like there is a singular vs plural issue: migration/migrations" [puppet] - https://gerrit.wikimedia.org/r/617676 (https://phabricator.wikimedia.org/T247956) (owner: Jbond)
[16:56:27] <wikibugs>	 (CR) Ahmon Dancy: Add mtail program for monitoring the Zuul error log (1 comment) [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[16:58:42] <Bsadowski1>	 Toolforge bad gateway? :O
[16:58:52] <wikibugs>	 (CR) Dzahn: Add mtail program for monitoring the Zuul error log (1 comment) [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[16:59:24] <mutante>	 Bsadowski1: let's use the !help feature in -cloud
[16:59:38] <Bsadowski1>	 nvm :P
[16:59:49] <RhinosF1>	 Bsadowski1: fine here
[16:59:56] <mutante>	 i could not repro, ack
[17:02:01] <wikibugs>	 (CR) Ahmon Dancy: Add mtail program for monitoring the Zuul error log (1 comment) [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[17:02:09] <wikibugs>	 (PS1) Dzahn: switch xhgui1001 to new xhgui role [puppet] - https://gerrit.wikimedia.org/r/617744 (https://phabricator.wikimedia.org/T259206)
[17:06:13] <wikibugs>	 (CR) Dzahn: "to solve the current reason for jerkins downvote:" [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[17:09:13] <wikibugs>	 (PS1) Krinkle: mediawiki-cache-warmup: Reduce warmup URLs [puppet] - https://gerrit.wikimedia.org/r/617745
[17:09:15] <wikibugs>	 (PS1) Krinkle: mediawiki-cache-warmup: Add "dry" mode [puppet] - https://gerrit.wikimedia.org/r/617746
[17:09:17] <wikibugs>	 (PS1) Krinkle: mediawiki-cache-warmup: Limit warmup URLs to large wikis [puppet] - https://gerrit.wikimedia.org/r/617747
[17:11:56] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[17:11:56] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[17:12:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:12:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:15] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[17:13:15] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[17:13:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:14] <icinga-wm>	 PROBLEM - Check systemd state on wtp2017 is CRITICAL: connect to address 10.192.32.32 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:21:14] <icinga-wm>	 PROBLEM - MD RAID on wtp2017 is CRITICAL: connect to address 10.192.32.32 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[17:21:14] <icinga-wm>	 PROBLEM - php7.2-fpm service on wtp2017 is CRITICAL: connect to address 10.192.32.32 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:21:42] <mutante>	 ACK - wtp2017
[17:21:49] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[17:21:50] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[17:21:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:23:34] <icinga-wm>	 PROBLEM - configured eth on wtp2016 is CRITICAL: connect to address 10.192.32.31 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[17:23:34] <icinga-wm>	 PROBLEM - DPKG on wtp2016 is CRITICAL: connect to address 10.192.32.31 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[17:24:08] <wikibugs>	 (PS1) Andrew Bogott: cloudvirt103[1-9]: rename nics for Stretch [puppet] - https://gerrit.wikimedia.org/r/617748 (https://phabricator.wikimedia.org/T251627)
[17:25:13] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[17:25:13] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[17:25:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:42] <wikibugs>	 (CR) Andrew Bogott: [C: +2] cloudvirt103[1-9]: rename nics for Stretch [puppet] - https://gerrit.wikimedia.org/r/617748 (https://phabricator.wikimedia.org/T251627) (owner: Andrew Bogott)
[17:30:48] <wikibugs>	 (CR) Chad: [C: +1] "Nuke from high orbit!" [puppet] - https://gerrit.wikimedia.org/r/616164 (owner: Dzahn)
[17:31:35] <wikibugs>	 (CR) Dzahn: "Greetings Chad! Hope you are doing well :) and ok, will do" [puppet] - https://gerrit.wikimedia.org/r/616164 (owner: Dzahn)
[17:32:08] <wikibugs>	 (CR) Greg Grossmeier: [C: +1] "per Chad's comment. /me pours one out" [puppet] - https://gerrit.wikimedia.org/r/616164 (owner: Dzahn)
[17:33:27] <wikibugs>	 (CR) Dzahn: [C: +2] admins: remove demon from gerrit and phab root users [puppet] - https://gerrit.wikimedia.org/r/616164 (owner: Dzahn)
[17:33:32] <wikibugs>	 (PS2) Dzahn: admins: remove demon from gerrit and phab root users [puppet] - https://gerrit.wikimedia.org/r/616164
[17:36:01] <wikibugs>	 (CR) Chad: [C: +1] "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/616164 (owner: Dzahn)
[17:40:53] <wikibugs>	 (PS2) Dzahn: switch xhgui1001 to new xhgui role [puppet] - https://gerrit.wikimedia.org/r/617744 (https://phabricator.wikimedia.org/T259206)
[17:42:29] <wikibugs>	 (CR) Dzahn: [C: +2] switch xhgui1001 to new xhgui role [puppet] - https://gerrit.wikimedia.org/r/617744 (https://phabricator.wikimedia.org/T259206) (owner: Dzahn)
[17:43:44] <wikibugs>	 (CR) Dave Pifke: [C: +1] switch xhgui1001 to new xhgui role [puppet] - https://gerrit.wikimedia.org/r/617744 (https://phabricator.wikimedia.org/T259206) (owner: Dzahn)
[17:45:22] <mutante>	 !log rebooting / reinstalling OS on xhgui1001
[17:45:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:51] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[17:45:51] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[17:45:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:48:04] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:51:26] <wikibugs>	 (PS1) Chad: Revoke all remaining group memberships, etc [puppet] - https://gerrit.wikimedia.org/r/617749
[17:54:01] <wikibugs>	 (CR) Greg Grossmeier: [C: +1] Revoke all remaining group memberships, etc [puppet] - https://gerrit.wikimedia.org/r/617749 (owner: Chad)
[17:54:52] <wikibugs>	 (CR) jerkins-bot: [V: -1] Revoke all remaining group memberships, etc [puppet] - https://gerrit.wikimedia.org/r/617749 (owner: Chad)
[17:56:57] <wikibugs>	 (PS2) Greg Grossmeier: Revoke all remaining group memberships, etc [puppet] - https://gerrit.wikimedia.org/r/617749 (owner: Chad)
[17:57:03] <wikibugs>	 Operations, Gerrit, User-Kormat: gerrit.wm.o/r/changes/ has leading garbage in the output - https://phabricator.wikimedia.org/T259333 (dpifke) From https://gerrit-review.googlesource.com/Documentation/rest-api.html#output  > To prevent against Cross Site Script Inclusion (XSSI) attacks, the JSON resp...
[17:57:05] <wikibugs>	 (PS3) Chad: Revoke all remaining group memberships, etc [puppet] - https://gerrit.wikimedia.org/r/617749
[18:01:21] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:13:14] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[18:18:50] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad URL) is CRITICAL: Test bad URL returned the unexpected status 200 (expecting: 404) https://wikitech.wikimedia.org/wiki/Citoid
[18:19:01] <wikibugs>	 (CR) RLazarus: [C: +1] mediawiki-cache-warmup: Reduce warmup URLs [puppet] - https://gerrit.wikimedia.org/r/617745 (owner: Krinkle)
[18:19:06] <wikibugs>	 Operations, serviceops: reinstall xhgui* with buster - https://phabricator.wikimedia.org/T259206 (Dzahn) Open→Resolved both xhgui1001 and xhgui2001 are now on buster, have xhgui package installed and puppet is happy
[18:23:18] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:26:14] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:26:26] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:30:04] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1031 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:33:11] <wikibugs>	 (CR) RLazarus: [C: +1] mediawiki-cache-warmup: Add "dry" mode (1 comment) [puppet] - https://gerrit.wikimedia.org/r/617746 (owner: Krinkle)
[18:36:46] <icinga-wm>	 RECOVERY - Check systemd state on wtp2017 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:36:48] <icinga-wm>	 RECOVERY - MD RAID on wtp2017 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[18:37:24] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:39:01] <wikibugs>	 (PS2) Krinkle: mediawiki-cache-warmup: Add "dry" mode [puppet] - https://gerrit.wikimedia.org/r/617746
[18:39:08] <wikibugs>	 (PS2) Krinkle: mediawiki-cache-warmup: Limit warmup URLs to large wikis [puppet] - https://gerrit.wikimedia.org/r/617747
[18:39:32] <wikibugs>	 Operations, LDAP-Access-Requests: LDAP access to the 'wmf' group for Monte Hurd - https://phabricator.wikimedia.org/T259382 (Mhurd)
[18:40:31] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2017.codfw.wmnet'] `  and were **ALL** successful.
[18:41:14] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:42:13] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2016.codfw.wmnet'] `  and were **ALL** successful.
[18:47:45] <wikibugs>	 (CR) Nuria: [C: +1] "Nice, this will bring piece of mind" [puppet] - https://gerrit.wikimedia.org/r/617735 (https://phabricator.wikimedia.org/T254493) (owner: Elukey)
[18:48:12] <wikibugs>	 Operations, LDAP-Access-Requests: LDAP access to the 'wmf' group for Monte Hurd - https://phabricator.wikimedia.org/T259382 (greg) Approved as his manager's manager.
[18:48:17] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:49:21] <greg-g>	 redis connection errors ^
[18:52:25] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:52:35] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1006 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:53:27] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1006 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute andrew bogott restarting https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:53:39] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:55:27] <icinga-wm>	 RECOVERY - configured eth on wtp2016 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[18:55:27] <icinga-wm>	 RECOVERY - DPKG on wtp2016 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[18:55:50] <wikibugs>	 (CR) RLazarus: [C: +1] mediawiki-cache-warmup: Limit warmup URLs to large wikis [puppet] - https://gerrit.wikimedia.org/r/617747 (owner: Krinkle)
[18:56:57] <wikibugs>	 Operations, LDAP-Access-Requests: LDAP access to the 'wmf' group for Monte Hurd - https://phabricator.wikimedia.org/T259382 (herron) p:Triage→Medium Hi @Mhurd, it looks like you are actually a member of the `wmf` ldap group already, so that should be working with existing memberships unless a dif...
[18:57:26] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=wtp2016.codfw.wmnet
[18:57:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:58:45] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2018.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[19:00:47] <wikibugs>	 Operations, Gerrit, User-Kormat: gerrit.wm.o/r/changes/ has leading garbage in the output - https://phabricator.wikimedia.org/T259333 (herron) p:Triage→Medium
[19:00:53] <wikibugs>	 Operations, SRE-tools: Exception raised while executing cookbook sre.hosts.downtime - https://phabricator.wikimedia.org/T259158 (herron) p:Triage→Medium
[19:02:56] <wikibugs>	 Operations, Gerrit, User-Kormat: gerrit.wm.o/r/changes/ has leading garbage in the output - https://phabricator.wikimedia.org/T259333 (greg) Open→Invalid Boldly marking as invalid as this seems to be intended behavior.
[19:12:39] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:13:04] <wikibugs>	 (CR) RLazarus: [C: +1] mediawiki-cache-warmup: Add "dry" mode [puppet] - https://gerrit.wikimedia.org/r/617746 (owner: Krinkle)
[19:14:43] <icinga-wm>	 PROBLEM - puppet last run on otrs1001 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[19:15:18] <mutante>	 known that there is WIP on otrs
[19:20:59] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=wtp2017.codfw.wmnet
[19:21:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:21:25] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2019.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[19:23:39] <icinga-wm>	 ACKNOWLEDGEMENT - mediawiki-installation DSH group on wtp2017 is CRITICAL: Host wtp2017 is not in mediawiki-installation dsh group daniel_zahn reinstall https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[19:23:59] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:28:09] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2020.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[19:34:15] <wikibugs>	 (PS1) Andrew Bogott: Move cloudvirt1031 from virt_ceph to virt [puppet] - https://gerrit.wikimedia.org/r/617758
[19:34:51] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Move cloudvirt1031 from virt_ceph to virt [puppet] - https://gerrit.wikimedia.org/r/617758 (owner: Andrew Bogott)
[19:34:56] <wikibugs>	 (PS2) Andrew Bogott: Move cloudvirt1031 from virt_ceph to virt [puppet] - https://gerrit.wikimedia.org/r/617758
[19:35:16] <wikibugs>	 Operations, Analytics-Clusters, Analytics-Radar, observability: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (herron)
[19:35:48] <wikibugs>	 Operations, vm-requests: eqiad: 1 VM for kafkamon - https://phabricator.wikimedia.org/T257560 (herron) Open→Resolved
[19:36:01] <wikibugs>	 Operations, vm-requests: codfw: 1 VM for kafkamon - https://phabricator.wikimedia.org/T257561 (herron) Open→Resolved
[19:36:12] <wikibugs>	 Operations, vm-requests: eqiad: 1 VM for kafkamon - kafkamon1002 - https://phabricator.wikimedia.org/T257560 (herron)
[19:36:25] <wikibugs>	 Operations, vm-requests: codfw: 1 VM for kafkamon - kafkamon2002 - https://phabricator.wikimedia.org/T257561 (herron)
[19:39:05] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[19:39:06] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[19:39:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:59] <wikibugs>	 (PS5) Ahmon Dancy: Add mtail program for monitoring the Zuul error log [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821)
[19:50:13] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add mtail program for monitoring the Zuul error log [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[19:51:18] <icinga-wm>	 PROBLEM - Check size of conntrack table on wtp2018 is CRITICAL: connect to address 10.192.32.33 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[19:51:18] <icinga-wm>	 PROBLEM - parsoid on wtp2018 is CRITICAL: connect to address 10.192.32.33 and port 8000: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/parsoid
[19:51:35] <mutante>	 ACK - wtp2018
[19:51:48] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[19:51:50] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[19:51:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:13] <wikibugs>	 (PS6) Ahmon Dancy: Add mtail program for monitoring the Zuul error log [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821)
[19:55:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add mtail program for monitoring the Zuul error log [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[20:01:05] <wikibugs>	 (PS1) Ahmon Dancy: Rplace hiera() with lookup() [puppet] - https://gerrit.wikimedia.org/r/617762
[20:01:07] <wikibugs>	 (PS2) Ahmon Dancy: Replace hiera() with lookup() [puppet] - https://gerrit.wikimedia.org/r/617762
[20:02:58] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[20:02:58] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[20:03:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:24] <wikibugs>	 (PS1) Andrew Bogott: Revert "Move cloudvirt1031 from virt_ceph to virt" [puppet] - https://gerrit.wikimedia.org/r/617584
[20:04:16] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:05:24] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Revert "Move cloudvirt1031 from virt_ceph to virt" [puppet] - https://gerrit.wikimedia.org/r/617584 (owner: Andrew Bogott)
[20:08:34] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1031 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:08:43] <wikibugs>	 (PS7) Ahmon Dancy: Add mtail program for monitoring the Zuul error log [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821)
[20:08:52] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:09:16] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1032 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[20:10:11] <wikibugs>	 Operations, Discovery-Search: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (DVrandecic)
[20:11:12] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[20:11:12] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[20:11:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:11:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:46] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[20:18:46] <wikibugs>	 Operations, Fundraising-Backlog: New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (DStrine) p:Medium→High
[20:21:32] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (bad URL) is CRITICAL: Test bad URL returned the unexpected status 200 (expecting: 404) https://wikitech.wikimedia.org/wiki/Citoid
[20:21:56] <icinga-wm>	 PROBLEM - PHP7 rendering on wtp2020 is CRITICAL: connect to address 10.192.32.35 and port 80: Connection refused https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[20:21:58] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on wtp2020 is CRITICAL: connect to address 10.192.32.35 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[20:21:58] <icinga-wm>	 PROBLEM - mcrouter process on wtp2019 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.32.34: Connection reset by peer https://wikitech.wikimedia.org/wiki/Mcrouter
[20:22:05] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[20:22:06] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[20:22:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:22:11] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[20:22:12] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[20:22:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:22:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:22:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:33:18] <wikibugs>	 Operations, Analytics, SRE-Access-Requests: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (Nuria)
[20:33:31] <wikibugs>	 (PS1) Andrew Bogott: Cloudvirt103[3-9] to Buster [puppet] - https://gerrit.wikimedia.org/r/617767
[20:34:03] <wikibugs>	 Operations, Analytics, SRE-Access-Requests: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (Nuria)
[20:34:59] <wikibugs>	 Operations, Analytics, SRE-Access-Requests: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (Nuria)
[20:35:19] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (Andrew)
[20:35:41] <wikibugs>	 (PS3) Ahmon Dancy: Replace hiera() with lookup() [puppet] - https://gerrit.wikimedia.org/r/617762
[20:35:43] <wikibugs>	 (PS8) Ahmon Dancy: Add mtail program for monitoring the Zuul error log [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821)
[20:36:44] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (Andrew) I have cloudvirt1031 and 1032 running nova-compute, and things look right from the host...
[20:36:47] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Cloudvirt103[3-9] to Buster [puppet] - https://gerrit.wikimedia.org/r/617767 (owner: Andrew Bogott)
[20:37:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add mtail program for monitoring the Zuul error log [puppet] - https://gerrit.wikimedia.org/r/617271 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[20:55:30] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[20:55:30] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[20:55:30] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[20:55:30] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[20:55:30] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[20:55:31] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[20:55:31] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[20:55:33] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[20:55:33] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[20:55:33] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[20:55:33] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[20:55:33] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[20:55:34] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[20:55:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:55:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:55:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:55:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:55:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:55:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:55:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:56:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:56:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:56:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:56:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:56:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:56:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:57:40] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[20:57:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:43] <icinga-wm>	 RECOVERY - parsoid on wtp2018 is OK: HTTP OK: HTTP/1.1 200 OK - 1022 bytes in 0.179 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/parsoid
[21:07:44] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2018.codfw.wmnet'] `  and were **ALL** successful.
[21:13:14] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=wtp2018.codfw.wmnet
[21:13:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:17:39] <icinga-wm>	 RECOVERY - Check size of conntrack table on wtp2018 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[21:22:43] <wikibugs>	 Operations, Mail, OTRS, Trust-and-Safety, and 2 others: Forward emails addressed to privacy@wikidata to privacy@wikimedia - https://phabricator.wikimedia.org/T255733 (Dzahn) So the original request for privacy@wikidata is resolved.    We can either close this ticket or talk about the domains that...
[21:23:42] <wikibugs>	 Operations, Mail, OTRS, Trust-and-Safety, and 2 others: Forward emails addressed to privacy@wikidata to privacy@wikimedia - https://phabricator.wikimedia.org/T255733 (Dzahn) @Emufarmers I guess the corresponding OTRS queues can be disabled or removed if that's appropriate and needed.
[21:23:45] <icinga-wm>	 RECOVERY - mcrouter process on wtp2019 is OK: PROCS OK: 1 process with UID = 113 (mcrouter), command name mcrouter https://wikitech.wikimedia.org/wiki/Mcrouter
[21:24:58] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Dzahn) a:JMeybohm→Dzahn
[21:31:00] <wikibugs>	 Operations, Fundraising-Backlog: New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (Dzahn) > At the moment we're just looking for a short-term solution   Fwiw,  creating a a new wiki involves quite a few steps and people and might not a...
[21:32:09] <icinga-wm>	 RECOVERY - PHP7 rendering on wtp2020 is OK: HTTP OK: HTTP/1.1 302 Found - 645 bytes in 0.475 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[21:35:13] <wikibugs>	 (PS1) Andrew Bogott: Fix comment about cloudvirts and scheduling [puppet] - https://gerrit.wikimedia.org/r/617774
[21:35:15] <wikibugs>	 (PS1) Andrew Bogott: Make cloudvirt103[3-9] into cloudvirts [puppet] - https://gerrit.wikimedia.org/r/617775
[21:36:11] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Make cloudvirt103[3-9] into cloudvirts [puppet] - https://gerrit.wikimedia.org/r/617775 (owner: Andrew Bogott)
[21:36:17] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Fix comment about cloudvirts and scheduling [puppet] - https://gerrit.wikimedia.org/r/617774 (owner: Andrew Bogott)
[21:36:43] <mutante>	 !log [wtp2019:~] $ sudo rm -rf /srv/deployment/parsoid/deploy-cache
[21:36:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:39:21] <wikibugs>	 Operations, Analytics, SRE-Access-Requests: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (Nuria)
[21:40:39] <wikibugs>	 (PS1) Andrew Bogott: Remove some old Ubuntu->Debian migration code [puppet] - https://gerrit.wikimedia.org/r/617777
[21:42:01] <wikibugs>	 (PS2) Andrew Bogott: Nova compute: remove some old Ubuntu->Debian migration code [puppet] - https://gerrit.wikimedia.org/r/617777
[21:42:55] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2019.codfw.wmnet'] `  and were **ALL** successful.
[21:43:15] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Nova compute: remove some old Ubuntu->Debian migration code [puppet] - https://gerrit.wikimedia.org/r/617777 (owner: Andrew Bogott)
[21:46:23] <wikibugs>	 (CR) Dzahn: [C: -1] "in general good intention just the syntax for default values is a bit different and unfortunately can't be directly replaced like that" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/617762 (owner: Ahmon Dancy)
[21:47:20] <wikibugs>	 (CR) Dzahn: [C: -1] "nitpick: please start commit messages with the module name or topic, so  "zuul::server: replace hiera() with lookup()" or so" [puppet] - https://gerrit.wikimedia.org/r/617762 (owner: Ahmon Dancy)
[21:48:47] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on wtp2020 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[21:50:12] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet
[21:50:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:28] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2020.codfw.wmnet'] `  and were **ALL** successful.
[21:55:44] <wikibugs>	 (PS1) Andrew Bogott: cloudvirt103[3-9]: rename nic yet again [puppet] - https://gerrit.wikimedia.org/r/617779
[21:56:07] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[21:56:56] <wikibugs>	 (CR) Andrew Bogott: [C: +2] cloudvirt103[3-9]: rename nic yet again [puppet] - https://gerrit.wikimedia.org/r/617779 (owner: Andrew Bogott)
[21:57:05] <wikibugs>	 (PS21) CRusnov: customscripts/interface_automation.py: Add PuppetDB Importer [software/netbox-extras] - https://gerrit.wikimedia.org/r/588036 (https://phabricator.wikimedia.org/T244153)
[21:57:56] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=wtp2019.codfw.wmnet
[21:58:04] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet
[21:58:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:58:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:59:46] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad URL) is CRITICAL: Test bad URL returned the unexpected status 200 (expecting: 404) https://wikitech.wikimedia.org/wiki/Citoid
[22:03:22] <mutante>	 !log wtp2019 - parsoid could not start after reimaging - was missing /etc/parsoid/config.yaml which is a symbolic link deep onto /srv/deployment/parsoid/deploy-cache/.. like in some other cases before manually deleted deploy-cache dir and ran puppet again  .. T258775
[22:03:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:28] <stashbot>	 T258775: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775
[22:12:41] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s4 on db1145 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 910.14 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[22:13:11] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1033 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:14:53] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1033 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:36:40] <wikibugs>	 Operations, Platform Engineering, Release Pipeline, Release-Engineering-Team-TODO, and 6 others: Kask functional testing with Cassandra via the Deployment Pipeline - https://phabricator.wikimedia.org/T224041 (jeena) We attempted to run the tests using CI, but ran into errors deploying cassandra t...
[22:46:25] <icinga-wm>	 PROBLEM - parsoid on wtp2019 is CRITICAL: connect to address 10.192.32.34 and port 8000: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/parsoid
[22:53:19] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[22:53:20] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[22:53:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:36] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[22:53:36] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[22:53:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:10:33] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Dzahn) All wtp* and parse* servers have been reimaged.   With the exception of wtp2019 they have also been tested with httpbb, parsoid service running,  repooled and look fine in mo...
[23:14:55] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Dzahn) ` [cumin1001:~] $ sudo cumin wtp* 'df -h | grep mapper | cut -d "/" -f1,2' 43 hosts will be targeted: wtp[2001-2004,2006-2020].codfw.wmnet,wtp[1025-1048].eqiad.wmnet Confirm...
[23:15:13] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Dzahn) Open→Resolved
[23:42:42] <wikibugs>	 (PS1) Andrew Bogott: openstack::nova::compute::service::rocky::buster: use legacy ebtables [puppet] - https://gerrit.wikimedia.org/r/617791 (https://phabricator.wikimedia.org/T259399)
[23:43:20] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack::nova::compute::service::rocky::buster: use legacy ebtables [puppet] - https://gerrit.wikimedia.org/r/617791 (https://phabricator.wikimedia.org/T259399) (owner: Andrew Bogott)
[23:44:52] <wikibugs>	 (PS2) Andrew Bogott: openstack::nova::compute::service::rocky::buster: use legacy ebtables [puppet] - https://gerrit.wikimedia.org/r/617791 (https://phabricator.wikimedia.org/T259399)
[23:45:32] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack::nova::compute::service::rocky::buster: use legacy ebtables [puppet] - https://gerrit.wikimedia.org/r/617791 (https://phabricator.wikimedia.org/T259399) (owner: Andrew Bogott)
[23:56:09] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s4 on db1145 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica