[00:00:04] <jouncebot>	 Deploy window No deploys! DC Switchover. See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201028T0000)
[00:02:28] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:02:54] <icinga-wm>	 PROBLEM - Check the last execution of grafana-ldap-users-sync on grafana1002 is CRITICAL: CRITICAL: Status of the systemd unit grafana-ldap-users-sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:03:48] <icinga-wm>	 PROBLEM - Check systemd state on grafana1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:04:20] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:33:44] <icinga-wm>	 PROBLEM - Disk space on maps2002 is CRITICAL: DISK CRITICAL - free space: /srv 59838 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=maps2002&var-datasource=codfw+prometheus/ops
[01:36:40] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1101 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:09:52] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (Reedy)
[02:09:54] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1028 with 10G interfaces - https://phabricator.wikimedia.org/T266514 (Reedy)
[02:10:01] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1026 with 10G interfaces - https://phabricator.wikimedia.org/T266281 (Reedy)
[02:10:03] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1027 with 10G interfaces - https://phabricator.wikimedia.org/T266369 (Reedy)
[02:10:05] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1029 with 10G interfaces - https://phabricator.wikimedia.org/T266206 (Reedy)
[02:10:08] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1025 with 10G interfaces - https://phabricator.wikimedia.org/T266187 (Reedy)
[02:10:11] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1013 with 10G interfaces - https://phabricator.wikimedia.org/T264806 (Reedy)
[02:10:14] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1021 with 10G interfaces - https://phabricator.wikimedia.org/T229873 (Reedy)
[02:10:17] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1022 with 10G interfaces - https://phabricator.wikimedia.org/T229872 (Reedy)
[02:10:20] <wikibugs>	 Operations, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1014 with 10G interfaces - https://phabricator.wikimedia.org/T226188 (Reedy)
[02:10:23] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1023 with 10G interfaces - https://phabricator.wikimedia.org/T229871 (Reedy)
[02:10:26] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1013 with 10G interfaces - https://phabricator.wikimedia.org/T243414 (Reedy)
[02:10:31] <wikibugs>	 Operations, Patch-For-Review, cloud-services-team (Kanban): relocate/reimage cloudvirt1016 with 10G interfaces - https://phabricator.wikimedia.org/T228692 (Reedy)
[02:10:33] <wikibugs>	 Operations, Patch-For-Review, cloud-services-team (Kanban): relocate/reimage cloudvirt1002 with 10G interfaces - https://phabricator.wikimedia.org/T221140 (Reedy)
[02:10:35] <wikibugs>	 Operations, Patch-For-Review, cloud-services-team (Kanban): relocate/reimage cloudvirt1017 with 10G interfaces - https://phabricator.wikimedia.org/T228691 (Reedy)
[02:10:38] <wikibugs>	 Operations, Patch-For-Review, cloud-services-team (Kanban): relocate/reimage cloudvirt1001 with 10G interfaces - https://phabricator.wikimedia.org/T221141 (Reedy)
[02:10:47] <wikibugs>	 Operations, Patch-For-Review, cloud-services-team (Kanban): relocate/reimage cloudvirt1004 with 10G interfaces - https://phabricator.wikimedia.org/T221138 (Reedy)
[02:10:49] <wikibugs>	 Operations, Patch-For-Review, cloud-services-team (Kanban): relocate/reimage cloudvirt1003 with 10G interfaces - https://phabricator.wikimedia.org/T221139 (Reedy)
[02:11:03] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): relocate/reimage cloudvirt1015 with 10G interfaces - https://phabricator.wikimedia.org/T217140 (Reedy)
[02:11:05] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Patch-For-Review, cloud-services-team (Kanban): relocate/reimage cloudvirt1007 with 10G interfaces - https://phabricator.wikimedia.org/T221047 (Reedy)
[02:15:34] <icinga-wm>	 PROBLEM - Disk space on maps2002 is CRITICAL: DISK CRITICAL - free space: /srv 63445 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=maps2002&var-datasource=codfw+prometheus/ops
[02:17:09] <wikibugs>	 (PS1) Gergő Tisza: Suggested edits: Include page ID with task preview data [extensions/GrowthExperiments] (wmf/1.36.0-wmf.14) - https://gerrit.wikimedia.org/r/636787 (https://phabricator.wikimedia.org/T266600)
[02:57:54] <logmsgbot>	 !log ryankemper@cumin2001 START - Cookbook sre.elasticsearch.rolling-restart
[02:57:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:58:43] <ryankemper>	 !log T266492 Beginning rolling restart of codfw cirrus cluster, 3 nodes at a time, on `ryankemper@cumin2001` tmux session `elasticsearch_restart_codfw`
[02:58:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:58:48] <stashbot>	 T266492: Restart elasticsearch clusters to apply readahead changes - https://phabricator.wikimedia.org/T266492
[03:28:42] <icinga-wm>	 PROBLEM - Check systemd state on idp-test2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:32:00] <icinga-wm>	 PROBLEM - Check systemd state on netflow4001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:30:56] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1097 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:38:22] <wikibugs>	 (PS1) Ryan Kemper: cirrus: fix shard_size thresholds [puppet] - https://gerrit.wikimedia.org/r/636811
[04:43:12] <logmsgbot>	 !log ryankemper@cumin2001 END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
[04:43:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:43:45] <ryankemper>	 !log T266492 Finished rolling restart of codfw cirrus cluster
[04:43:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:43:50] <stashbot>	 T266492: Restart elasticsearch clusters to apply readahead changes - https://phabricator.wikimedia.org/T266492
[04:46:05] <wikibugs>	 (PS2) Ryan Kemper: cirrus: fix shard_size thresholds [puppet] - https://gerrit.wikimedia.org/r/636811
[04:53:28] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1100 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:34:29] <wikibugs>	 (PS3) Ryan Kemper: cirrus: fix shard_size thresholds [puppet] - https://gerrit.wikimedia.org/r/636811 (https://phabricator.wikimedia.org/T265908)
[05:35:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] cirrus: fix shard_size thresholds [puppet] - https://gerrit.wikimedia.org/r/636811 (https://phabricator.wikimedia.org/T265908) (owner: Ryan Kemper)
[05:37:46] <wikibugs>	 (PS4) Ryan Kemper: cirrus: fix shard_size thresholds [puppet] - https://gerrit.wikimedia.org/r/636811 (https://phabricator.wikimedia.org/T265908)
[05:39:01] <wikibugs>	 (CR) jerkins-bot: [V: -1] cirrus: fix shard_size thresholds [puppet] - https://gerrit.wikimedia.org/r/636811 (https://phabricator.wikimedia.org/T265908) (owner: Ryan Kemper)
[06:06:07] <wikibugs>	 (CR) Ryan Kemper: [C: +2] "Will ship this Weds" [mediawiki-config] - https://gerrit.wikimedia.org/r/636480 (owner: Ebernhardson)
[06:06:52] <wikibugs>	 (Merged) jenkins-bot: Increase cirrus morelike pool counter by 20% [mediawiki-config] - https://gerrit.wikimedia.org/r/636480 (owner: Ebernhardson)
[06:10:10] <wikibugs>	 (PS1) Ryan Kemper: Revert "cirrus: Hardcode more_like to codfw cirrus cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/636791
[06:11:45] <wikibugs>	 (PS5) Ryan Kemper: cirrus: fix shard_size thresholds [puppet] - https://gerrit.wikimedia.org/r/636811 (https://phabricator.wikimedia.org/T265908)
[06:13:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] cirrus: fix shard_size thresholds [puppet] - https://gerrit.wikimedia.org/r/636811 (https://phabricator.wikimedia.org/T265908) (owner: Ryan Kemper)
[06:43:30] <wikibugs>	 (CR) Elukey: "@Razzi: there are still some leftovers in the webserver.yaml file (see my comments above), can you check it?" [puppet] - https://gerrit.wikimedia.org/r/636514 (https://phabricator.wikimedia.org/T240439) (owner: Razzi)
[06:53:53] <wikibugs>	 (PS1) Marostegui: db2093: Clarify it is active for orchestrator DB [puppet] - https://gerrit.wikimedia.org/r/636816 (https://phabricator.wikimedia.org/T266003)
[06:53:57] <wikibugs>	 (PS1) Elukey: profile::analytics::cluster::packages::statistics: add npm [puppet] - https://gerrit.wikimedia.org/r/636817
[06:54:21] <wikibugs>	 (CR) Marostegui: [C: +2] db2093: Clarify it is active for orchestrator DB [puppet] - https://gerrit.wikimedia.org/r/636816 (https://phabricator.wikimedia.org/T266003) (owner: Marostegui)
[07:05:40] <wikibugs>	 (Abandoned) Elukey: admin: allow users to be removed preserving their home directories [puppet] - https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: Elukey)
[07:05:48] <icinga-wm>	 PROBLEM - Check systemd state on netflow1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:06:51] <wikibugs>	 Operations, Analytics, Patch-For-Review, User-Elukey: Archival of home directories on servers with very large homes - https://phabricator.wikimedia.org/T215171 (elukey) Open→Declined Declining this since we have been following another path over the past year and it worked well, will re-op...
[07:12:50] <wikibugs>	 (PS1) Jcrespo: mariadb-test: Move db1077 from test-s4 to test-s1 [puppet] - https://gerrit.wikimedia.org/r/636818 (https://phabricator.wikimedia.org/T187984)
[07:14:07] <wikibugs>	 (CR) Marostegui: [C: +1] mariadb-test: Move db1077 from test-s4 to test-s1 [puppet] - https://gerrit.wikimedia.org/r/636818 (https://phabricator.wikimedia.org/T187984) (owner: Jcrespo)
[07:15:37] <wikibugs>	 (CR) Jcrespo: [C: +2] mariadb-test: Move db1077 from test-s4 to test-s1 [puppet] - https://gerrit.wikimedia.org/r/636818 (https://phabricator.wikimedia.org/T187984) (owner: Jcrespo)
[07:15:54] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1097 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:17:56] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1100 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:19:38] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1101 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:22:45] <godog>	 !log swift codfw-prod: bump object weight for ms-be2057 - T261633
[07:22:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:52] <stashbot>	 T261633: Put ms-be2057 (Dell R740xd2) in service - https://phabricator.wikimedia.org/T261633
[07:35:24] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM! See inline" (1 comment) [software/ecs] - https://gerrit.wikimedia.org/r/636513 (owner: Cwhite)
[07:36:23] <wikibugs>	 Operations, Data-Persistence-Backup, SRE-swift-storage, Goal, Patch-For-Review: Prepare a proof of concept of the minimum setup capable of backup and recover testwiki media files - https://phabricator.wikimedia.org/T264189 (jcrespo) For archival purposes, this is the (naive) code solution for...
[07:40:40] <godog>	 !log update thanos-fe1002 to thanos 0.16.0 - T261281
[07:40:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:46] <stashbot>	 T261281: Improve performance of Thanos (+ Prometheus) - https://phabricator.wikimedia.org/T261281
[07:53:15] <volans>	 !log upgraded python3-wmflib to 0.0.3 on the cumin hosts - T257905
[07:53:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:20] <stashbot>	 T257905: Spin off common Spicerack modules into a standalone Python library importable anywhere - https://phabricator.wikimedia.org/T257905
[07:53:40] <wikibugs>	 (CR) Volans: [C: +2] Remove modules migrated to wmflib [software/spicerack] - https://gerrit.wikimedia.org/r/636000 (https://phabricator.wikimedia.org/T257905) (owner: Volans)
[07:57:03] <wikibugs>	 (Merged) jenkins-bot: Remove modules migrated to wmflib [software/spicerack] - https://gerrit.wikimedia.org/r/636000 (https://phabricator.wikimedia.org/T257905) (owner: Volans)
[08:04:55] <wikibugs>	 (CR) Elukey: requests: add new module (2 comments) [software/pywmflib] - https://gerrit.wikimedia.org/r/636645 (owner: Volans)
[08:06:08] <wikibugs>	 (PS1) Elukey: zookeeper: use profile::java [puppet] - https://gerrit.wikimedia.org/r/636864
[08:11:10] <wikibugs>	 (PS2) Elukey: zookeeper: use profile::java [puppet] - https://gerrit.wikimedia.org/r/636864
[08:17:47] <icinga-wm>	 PROBLEM - Check systemd state on dumpsdata1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:20:51] <wikibugs>	 (PS3) Elukey: zookeeper: use profile::java [puppet] - https://gerrit.wikimedia.org/r/636864
[08:20:53] <wikibugs>	 (PS1) Elukey: profile::java: add support for Jessie [puppet] - https://gerrit.wikimedia.org/r/636865
[08:23:50] <wikibugs>	 (CR) Volans: "replies to questions/comments" (2 comments) [software/pywmflib] - https://gerrit.wikimedia.org/r/636645 (owner: Volans)
[08:24:39] <icinga-wm>	 PROBLEM - SSH on ms-be2037 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:25:18] <wikibugs>	 (PS2) Elukey: profile::java: add support for Jessie [puppet] - https://gerrit.wikimedia.org/r/636865
[08:25:20] <wikibugs>	 (PS4) Elukey: zookeeper: use profile::java [puppet] - https://gerrit.wikimedia.org/r/636864
[08:29:13] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2037 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:29:43] <icinga-wm>	 RECOVERY - SSH on ms-be2037 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:29:45] <wikibugs>	 (PS5) Elukey: zookeeper: use profile::java [puppet] - https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176)
[08:31:22] <wikibugs>	 (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1003/26180/" [puppet] - https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176) (owner: Elukey)
[08:32:10] <wikibugs>	 (CR) Elukey: "Going to quickly test it manually but overall it looks good!" (1 comment) [software/pywmflib] - https://gerrit.wikimedia.org/r/636645 (owner: Volans)
[08:32:19] <icinga-wm>	 PROBLEM - Check systemd state on netflow3001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:35:17] <wikibugs>	 (PS2) Volans: requests: add new module [software/pywmflib] - https://gerrit.wikimedia.org/r/636645
[08:37:29] <jynus>	 !log updated dump grants on db2093
[08:37:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:32] <wikibugs>	 (PS1) Filippo Giunchedi: alertmanager: add dashboard url to irc messages [puppet] - https://gerrit.wikimedia.org/r/636868 (https://phabricator.wikimedia.org/T266017)
[08:38:31] <wikibugs>	 (CR) Elukey: [C: +1] requests: add new module [software/pywmflib] - https://gerrit.wikimedia.org/r/636645 (owner: Volans)
[08:38:48] <wikibugs>	 (CR) jerkins-bot: [V: -1] alertmanager: add dashboard url to irc messages [puppet] - https://gerrit.wikimedia.org/r/636868 (https://phabricator.wikimedia.org/T266017) (owner: Filippo Giunchedi)
[08:39:01] <wikibugs>	 Operations, Analytics-Clusters: Switch Zookeeper to profile::java - https://phabricator.wikimedia.org/T264176 (elukey) a:razzi→None
[08:40:07] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job={atlas_exporter,swagger_check_citoid_cluster_eqiad} site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:40:22] <wikibugs>	 (CR) Filippo Giunchedi: "Failure is due to this wmf-style violation:" [puppet] - https://gerrit.wikimedia.org/r/636868 (https://phabricator.wikimedia.org/T266017) (owner: Filippo Giunchedi)
[08:40:24] <wikibugs>	 Operations, Analytics-Clusters: Switch Zookeeper to profile::java - https://phabricator.wikimedia.org/T264176 (elukey) a:elukey Going to take over ownership of the task since I need to do some refactoring of some code that I have written :)
[08:40:37] <wikibugs>	 Operations, Analytics-Clusters, Analytics-Kanban: Switch Zookeeper to profile::java - https://phabricator.wikimedia.org/T264176 (elukey)
[08:41:51] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:44:13] <wikibugs>	 (PS2) Filippo Giunchedi: alertmanager: add dashboard url to irc messages [puppet] - https://gerrit.wikimedia.org/r/636868 (https://phabricator.wikimedia.org/T266017)
[08:48:11] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2037 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:49:43] <wikibugs>	 (PS1) Filippo Giunchedi: Revert "hieradata: move swiftrepl to codfw" [puppet] - https://gerrit.wikimedia.org/r/636873
[08:50:58] <wikibugs>	 Operations, DBA, User-Kormat: orchestrator: Get packages into WMF apt - https://phabricator.wikimedia.org/T266023 (Marostegui) Open→Resolved a:Kormat Going to close this as resolved as the packages are uploaded. Thank you Stevie! Per T266023#6570807, medium-term we should take a look at c...
[08:51:00] <wikibugs>	 Operations, DBA, Patch-For-Review, User-Kormat: orchestrator: Puppetize - https://phabricator.wikimedia.org/T265990 (Marostegui)
[08:56:17] <wikibugs>	 (CR) Gehel: [C: +1] "LGTM" [software/cumin] - https://gerrit.wikimedia.org/r/636729 (owner: Volans)
[09:02:16] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[09:02:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:20] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:04:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:12:57] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2042 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:14:07] <wikibugs>	 Operations, Analytics-Clusters, Analytics-Kanban: Switch Zookeeper to profile::java - https://phabricator.wikimedia.org/T264176 (MoritzMuehlenhoff) One gotcha: conf1* is still on jessie (and consequently Java 7), and I don't think anything accounts for Java 7 yet
[09:15:35] <elukey>	 moritzm: you have a cr for java 7 support :)
[09:17:16] <wikibugs>	 Operations, DBA, Orchestrator, User-Kormat: orchestrator: integrate promotion rules into puppet - https://phabricator.wikimedia.org/T266002 (Marostegui)
[09:17:25] <wikibugs>	 Operations, DBA, Orchestrator, Patch-For-Review, User-Kormat: orchestrator: Puppetize - https://phabricator.wikimedia.org/T265990 (Marostegui)
[09:17:41] <moritzm>	 ah, "good" :-)
[09:17:49] <moritzm>	 will look in a bit
[09:20:20] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] toolforge: script to make long-running processes on bastions less good [puppet] - https://gerrit.wikimedia.org/r/635888 (https://phabricator.wikimedia.org/T266300) (owner: Bstorm)
[09:22:22] <wikibugs>	 (CR) Muehlenhoff: "Obsoleted/duplicated by https://gerrit.wikimedia.org/r/c/operations/puppet/+/636730" [puppet] - https://gerrit.wikimedia.org/r/636614 (owner: Muehlenhoff)
[09:22:32] <wikibugs>	 (Abandoned) Muehlenhoff: Only handle auto restart of Jenkins on active instance [puppet] - https://gerrit.wikimedia.org/r/636614 (owner: Muehlenhoff)
[09:23:24] <wikibugs>	 Operations, ops-eqiad: Power supply lost for analytics1072 - https://phabricator.wikimedia.org/T266644 (elukey)
[09:23:57] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2051 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:24:13] <wikibugs>	 Operations, Analytics-Clusters, Analytics-Kanban: Switch Zookeeper to profile::java - https://phabricator.wikimedia.org/T264176 (elukey) >>! In T264176#6584036, @MoritzMuehlenhoff wrote: > One gotcha: conf1* is still on jessie (and consequently Java 7), and I don't think anything accounts for Java 7...
[09:24:49] <wikibugs>	 Operations, SRE-Access-Requests: New prod ssh key for calbon - https://phabricator.wikimedia.org/T266498 (ema) p:Triage→Medium
[09:26:14] <jayme>	 !log imported kubeyaml 0.0.3~20201027+git5f5556c-1 to buster-wikimedia
[09:26:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:55] <wikibugs>	 (CR) Kormat: orchestrator: Install mariadb client [puppet] - https://gerrit.wikimedia.org/r/636616 (https://phabricator.wikimedia.org/T265990) (owner: Kormat)
[09:29:00] <wikibugs>	 (CR) Kormat: [C: +2] orchestrator: Install mariadb client [puppet] - https://gerrit.wikimedia.org/r/636616 (https://phabricator.wikimedia.org/T265990) (owner: Kormat)
[09:29:21] <wikibugs>	 (CR) Kormat: orchestrator: Search both eqiad and codfw dns [puppet] - https://gerrit.wikimedia.org/r/636613 (https://phabricator.wikimedia.org/T265990) (owner: Kormat)
[09:29:23] <wikibugs>	 (CR) Kormat: [C: +2] orchestrator: Search both eqiad and codfw dns [puppet] - https://gerrit.wikimedia.org/r/636613 (https://phabricator.wikimedia.org/T265990) (owner: Kormat)
[09:32:46] <wikibugs>	 Operations, DBA, Orchestrator, User-Kormat: Integrate orchestrator with !log - https://phabricator.wikimedia.org/T266452 (Marostegui)
[09:34:06] <wikibugs>	 Operations, DBA, Orchestrator, User-Kormat: orchestrator: Add service monitoring - https://phabricator.wikimedia.org/T266338 (Marostegui)
[09:34:15] <wikibugs>	 Operations, DBA, Orchestrator, CAS-SSO, User-Kormat: orchestrator: Support SSO - https://phabricator.wikimedia.org/T266106 (Marostegui)
[09:35:02] <wikibugs>	 (PS1) Kormat: pontoon: Add orchestrator role in mariadb104-test [puppet] - https://gerrit.wikimedia.org/r/636880
[09:36:22] <wikibugs>	 (CR) Kormat: [C: +2] pontoon: Add orchestrator role in mariadb104-test [puppet] - https://gerrit.wikimedia.org/r/636880 (owner: Kormat)
[09:46:13] <wikibugs>	 (CR) Jbond: [C: -1] "see inline" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/635656 (owner: Dzahn)
[09:49:11] <wikibugs>	 (PS1) JMeybohm: Test charts/deployments for compatibility with k8s 1.19 [deployment-charts] - https://gerrit.wikimedia.org/r/636881 (https://phabricator.wikimedia.org/T266032)
[09:49:23] <wikibugs>	 (PS1) Urbanecm: [cswiki] Set wgGEHomepageManualAssignmentMentorsList to Wikipedie:Potřebuji pomoc/Mentoři/Manuální [mediawiki-config] - https://gerrit.wikimedia.org/r/636882 (https://phabricator.wikimedia.org/T245639)
[09:49:39] <wikibugs>	 (CR) jerkins-bot: [V: -1] Test charts/deployments for compatibility with k8s 1.19 [deployment-charts] - https://gerrit.wikimedia.org/r/636881 (https://phabricator.wikimedia.org/T266032) (owner: JMeybohm)
[09:49:49] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[09:49:52] <wikibugs>	 (CR) JMeybohm: [C: -1] "Depends on:" [deployment-charts] - https://gerrit.wikimedia.org/r/636881 (https://phabricator.wikimedia.org/T266032) (owner: JMeybohm)
[09:49:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:31] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[09:50:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:27] <wikibugs>	 (CR) Jbond: wmflib:: add data type for puppetmaster server type and use it (1 comment) [puppet] - https://gerrit.wikimedia.org/r/635660 (owner: Dzahn)
[09:51:35] <wikibugs>	 (CR) Jbond: wmflib: add data type for SSLVerifyClient and use it (1 comment) [puppet] - https://gerrit.wikimedia.org/r/635658 (owner: Dzahn)
[09:52:17] <wikibugs>	 Operations, SRE-Access-Requests: New prod ssh key for calbon - https://phabricator.wikimedia.org/T266498 (ema) I've pinged @calbon on Google Chat asking to confirm the public key, taking care of the puppet change once I hear from him.
[09:52:51] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:52:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:34] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/634192 (owner: Alexandros Kosiaris)
[09:54:40] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:54:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:08] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM" (1 comment) [homer/public] - https://gerrit.wikimedia.org/r/636653 (https://phabricator.wikimedia.org/T266561) (owner: Ayounsi)
[10:02:37] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2042 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:04:33] <wikibugs>	 (PS4) Filippo Giunchedi: Grafana config changes for CAS-enabled grafana-rw.w.o vhost [puppet] - https://gerrit.wikimedia.org/r/629122 (https://phabricator.wikimedia.org/T262512) (owner: Muehlenhoff)
[10:05:07] <wikibugs>	 (CR) Jbond: [C: +2] systemd::timer::job: switch monitoring_enabled default to false [puppet] - https://gerrit.wikimedia.org/r/636628 (https://phabricator.wikimedia.org/T265138) (owner: Jbond)
[10:05:44] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] Grafana config changes for CAS-enabled grafana-rw.w.o vhost [puppet] - https://gerrit.wikimedia.org/r/629122 (https://phabricator.wikimedia.org/T262512) (owner: Muehlenhoff)
[10:09:19] <wikibugs>	 (PS1) Muehlenhoff: Also enable cn=grafana-admin for grafana-rw.w.o [puppet] - https://gerrit.wikimedia.org/r/636885 (https://phabricator.wikimedia.org/T262512)
[10:14:17] <wikibugs>	 (CR) JMeybohm: "recheck" [deployment-charts] - https://gerrit.wikimedia.org/r/636881 (https://phabricator.wikimedia.org/T266032) (owner: JMeybohm)
[10:14:29] <wikibugs>	 Operations, Analytics-Clusters, vm-requests: Create a ganeti VM in eqiad: an-test-ui1001.eqiad.wmnet - https://phabricator.wikimedia.org/T266648 (elukey) p:Triage→Medium
[10:17:31] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2051 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:19:22] <wikibugs>	 Operations, Analytics-Clusters, vm-requests: Create a ganeti VM in eqiad: an-test-ui1001.eqiad.wmnet - https://phabricator.wikimedia.org/T266648 (elukey) ` elukey@ganeti1011:~$   sudo gnt-group list Group Nodes Instances AllocPolicy NDParams row_A     4        36 preferred   ovs=False, ssh_port=22, o...
[10:19:44] <wikibugs>	 (CR) Gehel: [C: -1] "Good find! See a few style comments inline" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/636811 (https://phabricator.wikimedia.org/T265908) (owner: Ryan Kemper)
[10:20:35] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.ganeti.makevm
[10:20:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:22:04] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2016 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:23:04] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2016 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:25:00] <ema>	 !log A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 T266567 T264398
[10:25:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:07] <stashbot>	 T264398: 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398
[10:25:07] <stashbot>	 T266567: libvmod-netmapper: must specify ABI stanza - https://phabricator.wikimedia.org/T266567
[10:27:14] <wikibugs>	 (03PS1) 10Jbond: apereo_cas: dont monitor systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/636889
[10:27:55] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] apereo_cas: dont monitor systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/636889 (owner: 10Jbond)
[10:28:21] <wikibugs>	 (03PS2) 10Jbond: apereo_cas: dont monitor systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/636889
[10:28:49] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] Revert "cirrus: Hardcode more_like to codfw cirrus cluster" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636791 (owner: 10Ryan Kemper)
[10:29:59] <wikibugs>	 (03PS1) 10Elukey: sre.ganeti.makevm: ask to review args before DNS allocation [cookbooks] - 10https://gerrit.wikimedia.org/r/636890
[10:32:30] <wikibugs>	 (03PS1) 10Jbond: helm: drop monitoring for systemd::timer::job [puppet] - 10https://gerrit.wikimedia.org/r/636892
[10:35:06] <wikibugs>	 (03PS1) 10Mvolz: Update zotero translators [deployment-charts] - 10https://gerrit.wikimedia.org/r/636896
[10:35:37] <logmsgbot>	 !log elukey@cumin1001 END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
[10:35:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:07] <wikibugs>	 (03PS2) 10Jbond: helm: drop monitoring for systemd::timer::job [puppet] - 10https://gerrit.wikimedia.org/r/636892
[10:38:04] <elukey>	 !log clean up 10.64.5.7 and 2620:0:861:104:10:64:5:7 from Netbox (records mistakenly allocated via the makevm cookbook) - T266648
[10:38:07] <wikibugs>	 10Operations, 10Traffic: varnish crash upon reload after libvmod-netmapper upgrade due to liburcu6 assertion - https://phabricator.wikimedia.org/T266651 (10ema)
[10:38:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:13] <stashbot>	 T266648: Create a ganeti VM in eqiad: an-test-ui1001.eqiad.wmnet - https://phabricator.wikimedia.org/T266648
[10:38:19] <wikibugs>	 10Operations, 10Traffic: varnish crash upon reload after libvmod-netmapper upgrade due to liburcu6 assertion - https://phabricator.wikimedia.org/T266651 (10ema) p:05Triage→03High
[10:39:14] <wikibugs>	 (03PS1) 10Jbond: profile::docker::builder: drop monitoring for systemd::timer::job [puppet] - 10https://gerrit.wikimedia.org/r/636898
[10:39:33] <ema>	 !log due to T266651, cancel the entry above: A:cp upgrade libvmod-netmapper to 1.9-1 T266567 T264398 
[10:39:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:42] <stashbot>	 T266651: varnish crash upon reload after libvmod-netmapper upgrade due to liburcu6 assertion - https://phabricator.wikimedia.org/T266651
[10:39:42] <stashbot>	 T266567: libvmod-netmapper: must specify ABI stanza - https://phabricator.wikimedia.org/T266567
[10:39:42] <stashbot>	 T264398: 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398
[10:41:53] <wikibugs>	 (03PS1) 10Jbond: profile::icinga: drop monitoring for systemd::timer::job [puppet] - 10https://gerrit.wikimedia.org/r/636899
[10:48:56] <wikibugs>	 (03PS1) 10Vgutierrez: ATS: Turn ECDHE-ECDSA-AES128-SHA support off [puppet] - 10https://gerrit.wikimedia.org/r/636901 (https://phabricator.wikimedia.org/T258405)
[10:50:49] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/636765 (https://phabricator.wikimedia.org/T266593) (owner: 10Bstorm)
[10:53:59] <wikibugs>	 (03PS1) 10Vgutierrez: ssl_ciphersuite: Remove CBC based cipher suites [puppet] - 10https://gerrit.wikimedia.org/r/636902 (https://phabricator.wikimedia.org/T258405)
[10:57:24] <wikibugs>	 (03PS3) 10Jbond: helm: drop monitoring for systemd::timer::job [puppet] - 10https://gerrit.wikimedia.org/r/636892
[10:57:41] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Sounds fine" [puppet] - 10https://gerrit.wikimedia.org/r/636817 (owner: 10Elukey)
[10:59:28] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 04-2] "Yeah, this is a legitimate error and it should be handled instead. The fact that a few cases of failing restarts were showing up is simply" [puppet] - 10https://gerrit.wikimedia.org/r/636728 (owner: 10Dzahn)
[11:01:06] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/636889 (owner: 10Jbond)
[11:02:44] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM, one nit inline" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/636890 (owner: 10Elukey)
[11:03:03] <wikibugs>	 (03CR) 10Ema: [C: 03+1] ATS: Turn ECDHE-ECDSA-AES128-SHA support off [puppet] - 10https://gerrit.wikimedia.org/r/636901 (https://phabricator.wikimedia.org/T258405) (owner: 10Vgutierrez)
[11:09:06] <wikibugs>	 (03PS1) 10Kosta Harlan: Define scaffold_version before attempting to use it [deployment-charts] - 10https://gerrit.wikimedia.org/r/636905
[11:09:39] <wikibugs>	 (03PS2) 10Elukey: sre.ganeti.makevm: ask to review args before DNS allocation [cookbooks] - 10https://gerrit.wikimedia.org/r/636890
[11:10:24] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] profile::analytics::cluster::packages::statistics: add npm [puppet] - 10https://gerrit.wikimedia.org/r/636817 (owner: 10Elukey)
[11:11:04] <wikibugs>	 10Operations, 10CommRel-Specialists-Support (Oct-Dec-2020), 10User-notice: CommRel support for FY2020-2021 Q2 DC switchback - https://phabricator.wikimedia.org/T264364 (10Trizek-WMF) Retro item: dealing with the date displayed on the banner, [[ https://meta.wikimedia.org/wiki/MediaWiki_talk:Centralnotice-tem...
[11:11:09] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] sre.ganeti.makevm: ask to review args before DNS allocation [cookbooks] - 10https://gerrit.wikimedia.org/r/636890 (owner: 10Elukey)
[11:14:20] <wikibugs>	 10Operations, 10DBA, 10Orchestrator, 10User-Kormat: Explore orchestrator hooks to integrate them with !log, irc alerts and emails - https://phabricator.wikimedia.org/T266452 (10Marostegui)
[11:15:04] <wikibugs>	 10Operations, 10DBA, 10Orchestrator, 10User-Kormat: Explore orchestrator hooks to integrate them with !log, irc alerts and emails - https://phabricator.wikimedia.org/T266452 (10Marostegui)
[11:15:12] <wikibugs>	 10Operations, 10DBA, 10Orchestrator, 10User-Kormat: Explore orchestrator hooks to integrate them with !log, irc alerts and emails - https://phabricator.wikimedia.org/T266452 (10Marostegui) Along with !log we should include sending an email/irc alerts on some of the most important cases like: PostUnsuccessf...
[11:18:05] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2044 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:18:41] <wikibugs>	 10Operations, 10DBA, 10Orchestrator, 10User-Kormat: Explore orchestrator hooks to integrate them with !log, irc alerts and emails - https://phabricator.wikimedia.org/T266452 (10Peachey88)
[11:24:01] <icinga-wm>	 PROBLEM - Too many messages in kafka logging-eqiad #o11y on alert1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash-codfw instance=kafkamon1002 job=burrow partition={0,1} prometheus=ops site=eqiad topic={udp_localhost-info,udp_localhost-warning} https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-dat
[11:24:01] <icinga-wm>	 r-cluster=logging-eqiad&var-topic=All&var-consumer_group=All
[11:24:37] <wikibugs>	 (03PS1) 10Muehlenhoff: Hide the "Sign out" menu option when using CAS [puppet] - 10https://gerrit.wikimedia.org/r/636907 (https://phabricator.wikimedia.org/T262512)
[11:24:51] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] profile::icinga: drop monitoring for systemd::timer::job [puppet] - 10https://gerrit.wikimedia.org/r/636899 (owner: 10Jbond)
[11:25:15] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Hide the "Sign out" menu option when using CAS [puppet] - 10https://gerrit.wikimedia.org/r/636907 (https://phabricator.wikimedia.org/T262512) (owner: 10Muehlenhoff)
[11:30:04] <wikibugs>	 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Upgrade ELK Stack to version 7 - https://phabricator.wikimedia.org/T234854 (10ayounsi) Trying to load: https://logstash-next.wikimedia.org/app/dashboards#/view/6bcd2a10-7d21-11e7-86fb-51c84229aeb7  My laptop fan starts spinning very hard, everything tim...
[11:30:37] <wikibugs>	 (03PS3) 10Jbond: base::labs: hiera->lookup, add data types [puppet] - 10https://gerrit.wikimedia.org/r/635905 (owner: 10Dzahn)
[11:32:18] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/635406 (https://phabricator.wikimedia.org/T165885) (owner: 10Dzahn)
[11:32:25] <wikibugs>	 (03PS5) 10Jbond: base/labs: add systemd timer to clean puppet client bucket [puppet] - 10https://gerrit.wikimedia.org/r/635406 (https://phabricator.wikimedia.org/T165885) (owner: 10Dzahn)
[11:32:33] <wikibugs>	 (03PS4) 10Jbond: base::labs: hiera->lookup, add data types [puppet] - 10https://gerrit.wikimedia.org/r/635905 (owner: 10Dzahn)
[11:32:41] <wikibugs>	 (03PS1) 10Hashar: ci: run docker with debug logging [puppet] - 10https://gerrit.wikimedia.org/r/636908 (https://phabricator.wikimedia.org/T265615)
[11:33:35] <wikibugs>	 (03PS1) 10Hnowlan: maps: reenable eqiad OSM replication [puppet] - 10https://gerrit.wikimedia.org/r/636909 (https://phabricator.wikimedia.org/T254014)
[11:34:45] <wikibugs>	 (03PS5) 10Jbond: base::labs: hiera->lookup, add data types [puppet] - 10https://gerrit.wikimedia.org/r/635905 (owner: 10Dzahn)
[11:34:47] <wikibugs>	 (03PS6) 10Jbond: base/labs: add systemd timer to clean puppet client bucket [puppet] - 10https://gerrit.wikimedia.org/r/635406 (https://phabricator.wikimedia.org/T165885) (owner: 10Dzahn)
[11:35:45] <wikibugs>	 10Operations, 10DBA, 10Orchestrator: Run orchestrator as non-root - https://phabricator.wikimedia.org/T266656 (10Marostegui)
[11:35:54] <wikibugs>	 (03Abandoned) 10Hnowlan: maps: reenable eqiad OSM replication [puppet] - 10https://gerrit.wikimedia.org/r/636909 (https://phabricator.wikimedia.org/T254014) (owner: 10Hnowlan)
[11:36:12] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] base::labs: hiera->lookup, add data types [puppet] - 10https://gerrit.wikimedia.org/r/635905 (owner: 10Dzahn)
[11:36:33] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "I flipped the relation chain so this can get merged first and will merge" [puppet] - 10https://gerrit.wikimedia.org/r/635905 (owner: 10Dzahn)
[11:36:36] <wikibugs>	 (03CR) 10Hashar: "I have cherry picked it.  That then requires docker to be reloaded 'systemctl reload docker'. I have no idea about the amount of logs that" [puppet] - 10https://gerrit.wikimedia.org/r/636908 (https://phabricator.wikimedia.org/T265615) (owner: 10Hashar)
[11:36:53] <wikibugs>	 10Operations, 10DBA, 10Orchestrator: Run orchestrator as non-root - https://phabricator.wikimedia.org/T266656 (10Marostegui) p:05Triage→03Medium
[11:36:59] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/635406 (https://phabricator.wikimedia.org/T165885) (owner: 10Dzahn)
[11:38:24] <wikibugs>	 10Operations, 10DBA, 10Orchestrator: Run orchestrator as non-root - https://phabricator.wikimedia.org/T266656 (10Marostegui)
[11:38:27] <wikibugs>	 10Operations, 10DBA, 10User-Kormat: orchestrator: Get packages into WMF apt - https://phabricator.wikimedia.org/T266023 (10Marostegui)
[11:39:03] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2044 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:42:10] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+1] Define scaffold_version before attempting to use it [deployment-charts] - 10https://gerrit.wikimedia.org/r/636905 (owner: 10Kosta Harlan)
[11:44:25] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm optional comment" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/636865 (owner: 10Elukey)
[11:46:30] <XioNoX>	 !log configure urpf strict log-only on cr3-ulsfo:et-0/0/1.501 - T266561
[11:46:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:37] <stashbot>	 T266561: Apply uRPF strict mode on Customer links - https://phabricator.wikimedia.org/T266561
[11:47:24] <wikibugs>	 (03CR) 10Hashar: [C: 04-1] ci: run docker with debug logging [puppet] - 10https://gerrit.wikimedia.org/r/636908 (https://phabricator.wikimedia.org/T265615) (owner: 10Hashar)
[11:47:50] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Hide the "Sign out" menu option when using CAS [puppet] - 10https://gerrit.wikimedia.org/r/636907 (https://phabricator.wikimedia.org/T262512) (owner: 10Muehlenhoff)
[11:50:56] <wikibugs>	 (03CR) 10Muehlenhoff: profile::java: add support for Jessie (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/636865 (owner: 10Elukey)
[11:52:35] <wikibugs>	 (03CR) 10Jbond: "lgtm some optional nits" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176) (owner: 10Elukey)
[11:54:47] <wikibugs>	 (03CR) 10Ayounsi: Add uRPF strict mode to Customers links (032 comments) [homer/public] - 10https://gerrit.wikimedia.org/r/636653 (https://phabricator.wikimedia.org/T266561) (owner: 10Ayounsi)
[11:55:19] <wikibugs>	 10Operations, 10DBA, 10Orchestrator: Run orchestrator as non-root - https://phabricator.wikimedia.org/T266656 (10MoritzMuehlenhoff) If this is solely about the need to bind to a privileged port,   ` sudo setcap 'cap_net_bind_service=+ep' $ORCHESTRATORBINARY `  might also simply work out?
[11:56:45] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [software/pywmflib] - 10https://gerrit.wikimedia.org/r/636645 (owner: 10Volans)
[11:57:06] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] profile::icinga: drop monitoring for systemd::timer::job [puppet] - 10https://gerrit.wikimedia.org/r/636899 (owner: 10Jbond)
[11:57:15] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/636898 (owner: 10Jbond)
[11:57:51] <jbond42>	 moritzm: ok to merge
[11:58:58] <jbond42>	 moritzm: merging https://gerrit.wikimedia.org/r/636907
[11:59:04] <moritzm>	 ack, please go ahead
[11:59:07] <jbond42>	 done
[11:59:28] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] apereo_cas: dont monitor systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/636889 (owner: 10Jbond)
[12:03:17] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] profile::docker::builder: drop monitoring for systemd::timer::job [puppet] - 10https://gerrit.wikimedia.org/r/636898 (owner: 10Jbond)
[12:04:26] <wikibugs>	 (03PS4) 10Hnowlan: Isolate eqiad master maps1004 from cluster [puppet] - 10https://gerrit.wikimedia.org/r/608729 (https://phabricator.wikimedia.org/T254014) (owner: 10Ryan Kemper)
[12:04:36] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] "Agreed, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/636892 (owner: 10Jbond)
[12:07:24] <wikibugs>	 10Operations, 10netops, 10Patch-For-Review: Apply uRPF strict mode on Customer links - https://phabricator.wikimedia.org/T266561 (10ayounsi) Pushed the following: `lang=diff [edit interfaces et-0/0/1 unit 501 family inet] +       rpf-check { +           apply-groups-except external-links; +           fail-fi...
[12:09:07] <wikibugs>	 (03PS5) 10Hnowlan: Isolate eqiad master maps1004 from cluster [puppet] - 10https://gerrit.wikimedia.org/r/608729 (https://phabricator.wikimedia.org/T254014) (owner: 10Ryan Kemper)
[12:10:32] <wikibugs>	 (03PS2) 10Ayounsi: Add uRPF strict mode to Customers links [homer/public] - 10https://gerrit.wikimedia.org/r/636653 (https://phabricator.wikimedia.org/T266561)
[12:11:00] <wikibugs>	 (03CR) 10Ayounsi: "Diff on ulsfo routers:" [homer/public] - 10https://gerrit.wikimedia.org/r/636653 (https://phabricator.wikimedia.org/T266561) (owner: 10Ayounsi)
[12:11:51] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:14:20] <wikibugs>	 (03PS1) 10JMeybohm: eventrouter: deploy to codfw and eqiad [deployment-charts] - 10https://gerrit.wikimedia.org/r/636912 (https://phabricator.wikimedia.org/T262675)
[12:19:47] <wikibugs>	 10Operations, 10netops, 10Patch-For-Review: Apply uRPF strict mode on Customer links - https://phabricator.wikimedia.org/T266561 (10ayounsi) Another one: `Oct 28 12:08:13  cr4-ulsfo fpc0 PFE_FW_SYSLOG_ETH_IP: FW: et-0/0/1.501 A 01f5:0800 ac:1f:6b:c4:38:c8 -> ec:38:73:75:34:cf  tcp 64.x 202.y 30799 54744 (1 p...
[12:29:36] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] "Fun." [software/cumin] - 10https://gerrit.wikimedia.org/r/636729 (owner: 10Volans)
[12:31:11] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:32:09] <icinga-wm>	 PROBLEM - SSH on ms-be2039 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[12:33:41] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] "spicerack._lgtm" [software/spicerack] - 10https://gerrit.wikimedia.org/r/634056 (https://phabricator.wikimedia.org/T221212) (owner: 10Volans)
[12:34:23] <icinga-wm>	 RECOVERY - SSH on ms-be2039 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[12:37:19] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:37:38] <wikibugs>	 (03CR) 10Muehlenhoff: zookeeper: use profile::java (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176) (owner: 10Elukey)
[12:38:41] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:39:58] <moritzm>	 !log installing libdatetime-timezone-perl updates
[12:40:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:44:11] <wikibugs>	 10Operations, 10DBA, 10Orchestrator: Run orchestrator as non-root - https://phabricator.wikimedia.org/T266656 (10Kormat) There's no reason it needs a privileged port. It will be behind a reverse proxy anyway. The package doesn't create a user/group, so that's the first thing to fix.
[12:47:51] <wikibugs>	 10Operations, 10serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (10jijiki)
[12:48:47] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:56:51] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:03:08] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Also enable cn=grafana-admin for grafana-rw.w.o [puppet] - 10https://gerrit.wikimedia.org/r/636885 (https://phabricator.wikimedia.org/T262512) (owner: 10Muehlenhoff)
[13:03:38] <wikibugs>	 (03PS1) 10Kosta Harlan: linkrecommendation: Add deployment chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/636916 (https://phabricator.wikimedia.org/T265893)
[13:04:05] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] linkrecommendation: Add deployment chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/636916 (https://phabricator.wikimedia.org/T265893) (owner: 10Kosta Harlan)
[13:07:26] <wikibugs>	 (03PS1) 10Filippo Giunchedi: grafana: fix bytes vs string error in ldap_users_sync [puppet] - 10https://gerrit.wikimedia.org/r/636917 (https://phabricator.wikimedia.org/T265712)
[13:07:35] <wikibugs>	 (03CR) 10Ottomata: "FYI npm and nodejs are included in the anaconda / conda distribution.  If they use anaconda, they can npm install already.  Although...loo" [puppet] - 10https://gerrit.wikimedia.org/r/636817 (owner: 10Elukey)
[13:07:56] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] grafana: fix bytes vs string error in ldap_users_sync [puppet] - 10https://gerrit.wikimedia.org/r/636917 (https://phabricator.wikimedia.org/T265712) (owner: 10Filippo Giunchedi)
[13:10:08] <wikibugs>	 (03PS2) 10Filippo Giunchedi: grafana: fix bytes vs string error in ldap_users_sync [puppet] - 10https://gerrit.wikimedia.org/r/636917 (https://phabricator.wikimedia.org/T265712)
[13:11:26] <wikibugs>	 10Operations, 10Puppet, 10observability, 10Patch-For-Review, and 2 others: Puppet: get row/rack info from Netbox - https://phabricator.wikimedia.org/T229397 (10ayounsi) Let's not over-engineer it.  **Automatic**. For what I understand, that data is to be used for "Grafana dashboards and Cumin", so if Netbo...
[13:12:07] <wikibugs>	 (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/636922
[13:15:37] <wikibugs>	 (03CR) 10Matthias Mullie: "Code seems to make sense; just spotted a few things that I don't know are intentional." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/629121 (https://phabricator.wikimedia.org/T259067) (owner: 10Cparle)
[13:30:03] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] Revert "hieradata: move swiftrepl to codfw" [puppet] - 10https://gerrit.wikimedia.org/r/636873 (owner: 10Filippo Giunchedi)
[13:34:14] <wikibugs>	 (03CR) 10Elukey: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/636817 (owner: 10Elukey)
[13:36:45] <wikibugs>	 (03PS2) 10Niedzielski: admin: remove niedzielski [puppet] - 10https://gerrit.wikimedia.org/r/636671
[13:36:58] <wikibugs>	 (03PS1) 10Jbond: Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923
[13:37:00] <wikibugs>	 (03PS1) 10Jbond: java: add new java version facts [puppet] - 10https://gerrit.wikimedia.org/r/636924
[13:37:42] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] admin: remove niedzielski [puppet] - 10https://gerrit.wikimedia.org/r/636671 (owner: 10Niedzielski)
[13:37:45] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] java: add new java version facts [puppet] - 10https://gerrit.wikimedia.org/r/636924 (owner: 10Jbond)
[13:40:24] <wikibugs>	 (03PS3) 10Niedzielski: admin: remove niedzielski [puppet] - 10https://gerrit.wikimedia.org/r/636671
[13:42:41] <wikibugs>	 (03CR) 10Niedzielski: admin: remove niedzielski (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/636671 (owner: 10Niedzielski)
[13:44:51] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] eventrouter: deploy to codfw and eqiad [deployment-charts] - 10https://gerrit.wikimedia.org/r/636912 (https://phabricator.wikimedia.org/T262675) (owner: 10JMeybohm)
[13:45:13] <wikibugs>	 (03CR) 10Gehel: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/608729 (https://phabricator.wikimedia.org/T254014) (owner: 10Ryan Kemper)
[13:47:39] <wikibugs>	 (03Merged) 10jenkins-bot: eventrouter: deploy to codfw and eqiad [deployment-charts] - 10https://gerrit.wikimedia.org/r/636912 (https://phabricator.wikimedia.org/T262675) (owner: 10JMeybohm)
[13:52:09] <wikibugs>	 (03PS3) 10Filippo Giunchedi: grafana: fix bytes vs string error in ldap_users_sync [puppet] - 10https://gerrit.wikimedia.org/r/636917 (https://phabricator.wikimedia.org/T265712)
[13:52:11] <wikibugs>	 (03PS1) 10Filippo Giunchedi: grafana: make grafana-rw dashboards link work for anonymous users [puppet] - 10https://gerrit.wikimedia.org/r/636927 (https://phabricator.wikimedia.org/T265712)
[13:54:48] <logmsgbot>	 !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
[13:54:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:08] <wikibugs>	 (03PS2) 10Filippo Giunchedi: grafana: make grafana-rw dashboards link work for anonymous users [puppet] - 10https://gerrit.wikimedia.org/r/636927 (https://phabricator.wikimedia.org/T265712)
[13:56:36] <wikibugs>	 (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1001/26182/grafana2001.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/636927 (https://phabricator.wikimedia.org/T265712) (owner: 10Filippo Giunchedi)
[13:58:09] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/636917 (https://phabricator.wikimedia.org/T265712) (owner: 10Filippo Giunchedi)
[13:58:44] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] Revert "hieradata: move swiftrepl to codfw" [puppet] - 10https://gerrit.wikimedia.org/r/636873 (owner: 10Filippo Giunchedi)
[14:00:09] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] grafana: fix bytes vs string error in ldap_users_sync [puppet] - 10https://gerrit.wikimedia.org/r/636917 (https://phabricator.wikimedia.org/T265712) (owner: 10Filippo Giunchedi)
[14:02:47] <wikibugs>	 10Operations, 10Analytics-Radar, 10SRE-Access-Requests: Nuria's volunteer account - https://phabricator.wikimedia.org/T266086 (10gsingers) @MoritzMuehlenhoff Thanks! Approved.
[14:04:59] <wikibugs>	 10Operations, 10DBA, 10User-Kormat: Clean up role::mariadb::ferm and profile::mariadb::ferm - https://phabricator.wikimedia.org/T265901 (10LSobanski)
[14:05:24] <wikibugs>	 10Operations, 10DBA, 10Release-Engineering-Team-TODO, 10Continuous-Integration-Config, 10User-Kormat: Create integration test env for wmfmariadbpy - https://phabricator.wikimedia.org/T265266 (10LSobanski)
[14:05:44] <logmsgbot>	 !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
[14:05:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:53] <icinga-wm>	 PROBLEM - Check systemd state on kubestage1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:11:50] <wikibugs>	 (03PS1) 10Muehlenhoff: Fold Grafana settings for CAS into the Hiera role data [puppet] - 10https://gerrit.wikimedia.org/r/636929 (https://phabricator.wikimedia.org/T265712)
[14:11:52] <wikibugs>	 (03PS1) 10Muehlenhoff: Point grafana-rw to grafana1002 [puppet] - 10https://gerrit.wikimedia.org/r/636930 (https://phabricator.wikimedia.org/T265712)
[14:12:57] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2054 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:15:05] <wikibugs>	 (03CR) 10C. Scott Ananian: Enable parsoid on api_appserver (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/635086 (https://phabricator.wikimedia.org/T265954) (owner: 10Ppchelko)
[14:16:06] <wikibugs>	 (03CR) 10C. Scott Ananian: "This patch shouldn't be necessary if $wgParsoidEnableRESTAPI defaults to true?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/635095 (https://phabricator.wikimedia.org/T265954) (owner: 10Ppchelko)
[14:17:24] <wikibugs>	 (03Abandoned) 10Ppchelko: Enable Parsoid REST API when loading it [mediawiki-config] - 10https://gerrit.wikimedia.org/r/635095 (https://phabricator.wikimedia.org/T265954) (owner: 10Ppchelko)
[14:18:39] <wikibugs>	 (03PS2) 10Muehlenhoff: Fold Grafana settings for CAS into the Hiera role data [puppet] - 10https://gerrit.wikimedia.org/r/636929 (https://phabricator.wikimedia.org/T265712)
[14:26:13] <wikibugs>	 (03PS6) 10Ppchelko: Enable parsoid on api_appserver [mediawiki-config] - 10https://gerrit.wikimedia.org/r/635086 (https://phabricator.wikimedia.org/T265954)
[14:27:13] <wikibugs>	 (03CR) 10Ppchelko: Enable parsoid on api_appserver (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/635086 (https://phabricator.wikimedia.org/T265954) (owner: 10Ppchelko)
[14:31:47] <wikibugs>	 (03CR) 10Subramanya Sastry: "If we can run npm install on testreduce1001 VM, we can probably drop this entire repo. But, we'll need some puppet code to init it on test" [puppet] - 10https://gerrit.wikimedia.org/r/577656 (owner: 10C. Scott Ananian)
[14:36:45] <wikibugs>	 (03CR) 10Ottomata: "Naw I think its ok, we still need to spend some time on making this the default way to use stat boxes, but for that we need to get rid of " [puppet] - 10https://gerrit.wikimedia.org/r/636817 (owner: 10Elukey)
[14:38:35] <wikibugs>	 (03CR) 10MSantos: [C: 03+2] Update mobileapps to 2020-10-26-150740-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/636496 (https://phabricator.wikimedia.org/T264024) (owner: 10Ppchelko)
[14:39:08] <wikibugs>	 10Operations, 10serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (10jijiki)
[14:39:38] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, to be merged once we're GTG" [puppet] - 10https://gerrit.wikimedia.org/r/636929 (https://phabricator.wikimedia.org/T265712) (owner: 10Muehlenhoff)
[14:39:41] <wikibugs>	 (03CR) 10C. Scott Ananian: [C: 04-2] "LGTM, although a comment here would be nice." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/635086 (https://phabricator.wikimedia.org/T265954) (owner: 10Ppchelko)
[14:39:46] <wikibugs>	 (03PS1) 10Muehlenhoff: Update email address for Nuria [puppet] - 10https://gerrit.wikimedia.org/r/636936
[14:41:01] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: safe-service-restart: add optional poolcounter support (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/635991 (https://phabricator.wikimedia.org/T266055) (owner: 10Giuseppe Lavagetto)
[14:41:44] <wikibugs>	 (03Merged) 10jenkins-bot: Update mobileapps to 2020-10-26-150740-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/636496 (https://phabricator.wikimedia.org/T264024) (owner: 10Ppchelko)
[14:41:56] <wikibugs>	 10Operations, 10Analytics-Radar, 10SRE-Access-Requests: Nuria's volunteer account - https://phabricator.wikimedia.org/T266086 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff Excellent, thanks! Closing this task since everything is completed now. I'll merge https://gerrit.wikimedia.org/r/c...
[14:42:24] <wikibugs>	 (03PS7) 10Ppchelko: Enable parsoid on api_appserver [mediawiki-config] - 10https://gerrit.wikimedia.org/r/635086 (https://phabricator.wikimedia.org/T265954)
[14:42:28] <wikibugs>	 (03CR) 10Ppchelko: Enable parsoid on api_appserver (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/635086 (https://phabricator.wikimedia.org/T265954) (owner: 10Ppchelko)
[14:45:14] <wikibugs>	 (03PS1) 10Kormat: debian: Add repack script [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/636937
[14:45:28] <wikibugs>	 (03PS2) 10Jbond: Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923
[14:46:02] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 04-1] "Holding for modifications" [puppet] - 10https://gerrit.wikimedia.org/r/636074 (https://phabricator.wikimedia.org/T243009) (owner: 10Ahmon Dancy)
[14:46:24] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923 (owner: 10Jbond)
[14:46:33] <wikibugs>	 (03CR) 10Kormat: [V: 03+2 C: 03+2] debian: Add repack script [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/636937 (owner: 10Kormat)
[14:48:13] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Point grafana-rw to grafana1002 [puppet] - 10https://gerrit.wikimedia.org/r/636930 (https://phabricator.wikimedia.org/T265712) (owner: 10Muehlenhoff)
[14:50:53] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.ganeti.makevm
[14:50:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:54:01] <wikibugs>	 (03PS6) 10Elukey: zookeeper: use profile::java [puppet] - 10https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176)
[14:57:13] <wikibugs>	 (03CR) 10Elukey: zookeeper: use profile::java (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176) (owner: 10Elukey)
[14:57:30] <wikibugs>	 10Operations, 10observability, 10Epic: Monitor and alarm on SMART attributes [tracking] - https://phabricator.wikimedia.org/T86552 (10fgiunchedi)
[14:58:03] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2054 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:58:15] <wikibugs>	 10Operations, 10observability, 10Patch-For-Review: Open Phab tasks on SMART failure - https://phabricator.wikimedia.org/T196994 (10fgiunchedi)
[14:59:49] <wikibugs>	 (03PS3) 10Elukey: profile::java: add support for Jessie [puppet] - 10https://gerrit.wikimedia.org/r/636865
[14:59:51] <wikibugs>	 (03PS7) 10Elukey: zookeeper: use profile::java [puppet] - 10https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176)
[15:00:20] <wikibugs>	 (03CR) 10Elukey: profile::java: add support for Jessie (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/636865 (owner: 10Elukey)
[15:01:30] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::java: add support for Jessie [puppet] - 10https://gerrit.wikimedia.org/r/636865 (owner: 10Elukey)
[15:04:06] <wikibugs>	 10Operations, 10ops-codfw, 10serviceops, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10Papaul) p:05Triage→03Medium
[15:04:15] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: Add --force flag to safe-service-restart.py [puppet] - 10https://gerrit.wikimedia.org/r/635630 (https://phabricator.wikimedia.org/T243009) (owner: 10Ahmon Dancy)
[15:04:17] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: safe-service-restart: add optional poolcounter support [puppet] - 10https://gerrit.wikimedia.org/r/635991 (https://phabricator.wikimedia.org/T266055)
[15:04:19] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: poolcounter: add client configuration classes [puppet] - 10https://gerrit.wikimedia.org/r/635992 (https://phabricator.wikimedia.org/T266055)
[15:04:21] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: profile::lvs::realserver: add ability to configure poolcounter for pools [puppet] - 10https://gerrit.wikimedia.org/r/635993 (https://phabricator.wikimedia.org/T266055)
[15:04:23] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: restbase: add poolcounter support to safe-service-restart scripts [puppet] - 10https://gerrit.wikimedia.org/r/635994 (https://phabricator.wikimedia.org/T266055)
[15:04:25] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: profile::lvs::realserver: add ability to configure poolcounter for pools (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/635993 (https://phabricator.wikimedia.org/T266055) (owner: 10Giuseppe Lavagetto)
[15:04:31] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:06:13] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:07:26] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission-hardware: decommission wtp2001 through wtp2020 - https://phabricator.wikimedia.org/T265558 (10Papaul)
[15:08:02] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission-hardware: decommission wtp2001 through wtp2020 - https://phabricator.wikimedia.org/T265558 (10Papaul) 05Open→03Resolved complete
[15:08:32] <godog>	 there's lag in logstash5 codfw, I'll take a look
[15:09:11] <wikibugs>	 (03PS3) 10Jbond: Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923
[15:09:13] <wikibugs>	 (03PS1) 10Jbond: pick_initscript_spec: use shared spec helper [puppet] - 10https://gerrit.wikimedia.org/r/636942
[15:10:19] <wikibugs>	 10Operations, 10netops: all network devices must run OpenSSH >= 7.2p1 but != 7.4p1 - https://phabricator.wikimedia.org/T254013 (10ayounsi)
[15:10:22] <wikibugs>	 (03CR) 10Volans: "> Patch Set 3: Code-Review+1" [software/cumin] - 10https://gerrit.wikimedia.org/r/636729 (owner: 10Volans)
[15:10:25] <godog>	 !log roll restart logstash5 in codfw
[15:10:28] <wikibugs>	 (03CR) 10Volans: "reply inline" [software/cumin] - 10https://gerrit.wikimedia.org/r/636729 (owner: 10Volans)
[15:10:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923 (owner: 10Jbond)
[15:11:06] <wikibugs>	 10Operations, 10ops-codfw, 10ops-eqiad, 10DC-Ops: Audit & update spares part tracking for all sites - https://phabricator.wikimedia.org/T243450 (10Papaul)
[15:11:33] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] pick_initscript_spec: use shared spec helper [puppet] - 10https://gerrit.wikimedia.org/r/636942 (owner: 10Jbond)
[15:12:01] <wikibugs>	 (03CR) 10Volans: "> Patch Set 1: Code-Review+1" [software/spicerack] - 10https://gerrit.wikimedia.org/r/634056 (https://phabricator.wikimedia.org/T221212) (owner: 10Volans)
[15:15:42] <wikibugs>	 (03PS4) 10Elukey: profile::java: add support for Jessie [puppet] - 10https://gerrit.wikimedia.org/r/636865
[15:15:44] <wikibugs>	 (03PS8) 10Elukey: zookeeper: use profile::java [puppet] - 10https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176)
[15:19:42] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (10Cmjohnson)
[15:20:10] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/636865 (owner: 10Elukey)
[15:20:43] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (10Cmjohnson) a:05Cmjohnson→03RobH @robh These still need the raid setup, you mentioned you could do that. If not please let me know and I will take c...
[15:23:26] <logmsgbot>	 !log ppchelko@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
[15:23:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:23:39] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install ms-be20[58-61] - https://phabricator.wikimedia.org/T265419 (10Papaul)
[15:24:06] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Thanks, looks good! I'll merge the patch after the 7th." [puppet] - 10https://gerrit.wikimedia.org/r/636671 (owner: 10Niedzielski)
[15:24:53] <logmsgbot>	 !log ppchelko@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
[15:24:53] <logmsgbot>	 !log ppchelko@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
[15:24:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:28] <wikibugs>	 (03PS1) 10Jbond: base: fix spec test [puppet] - 10https://gerrit.wikimedia.org/r/636943
[15:26:10] <wikibugs>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/26185/" [puppet] - 10https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176) (owner: 10Elukey)
[15:27:10] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] base: fix spec test [puppet] - 10https://gerrit.wikimedia.org/r/636943 (owner: 10Jbond)
[15:29:39] <wikibugs>	 (03PS1) 10Elukey: aptrepo: add flink to the bigtop14 package list [puppet] - 10https://gerrit.wikimedia.org/r/636944 (https://phabricator.wikimedia.org/T266495)
[15:33:09] <icinga-wm>	 RECOVERY - Too many messages in kafka logging-eqiad #o11y on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=thanos&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All
[15:33:17] <logmsgbot>	 !log ppchelko@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
[15:33:17] <logmsgbot>	 !log ppchelko@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
[15:33:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:35] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[15:33:46] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] aptrepo: add flink to the bigtop14 package list [puppet] - 10https://gerrit.wikimedia.org/r/636944 (https://phabricator.wikimedia.org/T266495) (owner: 10Elukey)
[15:35:21] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2010 is CRITICAL: instance=kubernetes2010.codfw.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:36:05] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2008 is CRITICAL: instance=kubernetes2008.codfw.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:36:12] <wikibugs>	 (03CR) 10Volans: [C: 03+2] requests: add new module [software/pywmflib] - 10https://gerrit.wikimedia.org/r/636645 (owner: 10Volans)
[15:36:14] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176) (owner: 10Elukey)
[15:37:39] <wikibugs>	 (03Merged) 10jenkins-bot: requests: add new module [software/pywmflib] - 10https://gerrit.wikimedia.org/r/636645 (owner: 10Volans)
[15:38:51] <jayme>	 kubelet stuff is "fine" (consequence of mobileapps deploy...)
[15:38:52] <wikibugs>	 (03PS7) 10Volans: sre.hosts.decommission: import from new library [cookbooks] - 10https://gerrit.wikimedia.org/r/629692 (https://phabricator.wikimedia.org/T257905)
[15:39:40] <wikibugs>	 10Operations, 10Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (10jijiki)
[15:39:47] <wikibugs>	 10Operations, 10Product-Infrastructure-Team-Backlog, 10Proton, 10Traffic, and 2 others: PDF download generates invalid PDF files - https://phabricator.wikimedia.org/T266559 (10LGoto) a:03Jgiannelos
[15:41:02] <wikibugs>	 (03PS1) 10Jbond: node_intel_microcode: fix time spec 1h vs hourly [puppet] - 10https://gerrit.wikimedia.org/r/636968
[15:41:51] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2007 is CRITICAL: instance=kubernetes2007.codfw.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:41:56] <wikibugs>	 (03CR) 10Niedzielski: "> Patch Set 3: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/636671 (owner: 10Niedzielski)
[15:42:07] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] node_intel_microcode: fix time spec 1h vs hourly [puppet] - 10https://gerrit.wikimedia.org/r/636968 (owner: 10Jbond)
[15:43:27] <wikibugs>	 (03CR) 10Volans: [C: 03+2] sre.hosts.decommission: import from new library [cookbooks] - 10https://gerrit.wikimedia.org/r/629692 (https://phabricator.wikimedia.org/T257905) (owner: 10Volans)
[15:43:31] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2007 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:43:39] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2010 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:44:23] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2008 is CRITICAL: instance=kubernetes2008.codfw.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:46:05] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2008 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:46:34] <wikibugs>	 10Operations, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, 10Proton, and 3 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (10LGoto) a:03Jgiannelos
[15:47:46] <wikibugs>	 (03Merged) 10jenkins-bot: sre.hosts.decommission: import from new library [cookbooks] - 10https://gerrit.wikimedia.org/r/629692 (https://phabricator.wikimedia.org/T257905) (owner: 10Volans)
[15:48:55] <wikibugs>	 (03PS1) 10Volans: netbox: add dependency on python3-wmflib [puppet] - 10https://gerrit.wikimedia.org/r/636969
[15:49:47] <logmsgbot>	 !log elukey@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
[15:49:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:20] <wikibugs>	 (03PS3) 10Volans: dns: add retry logic to all Netbox API calls [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/636406
[15:50:57] <wikibugs>	 (03CR) 10Volans: "Updated to use the new feature in the wmflib package. Added the dependency in Puppet in the Depends-On patch." [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/636406 (owner: 10Volans)
[15:51:25] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1013 is CRITICAL: instance=kubernetes1013.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:51:43] <Amir1>	 !log restarting uwsgi on ores in eqiad
[15:51:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:52:47] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1012 is CRITICAL: instance=kubernetes1012.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:55:43] <wikibugs>	 (03CR) 10Ppchelko: [C: 03+2] Add changeprop rules for newcomerTasksCacheRefreshJob [deployment-charts] - 10https://gerrit.wikimedia.org/r/636078 (https://phabricator.wikimedia.org/T260758) (owner: 10Catrope)
[15:56:07] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1012 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:56:23] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1013 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:59:29] <wikibugs>	 (03Merged) 10jenkins-bot: Add changeprop rules for newcomerTasksCacheRefreshJob [deployment-charts] - 10https://gerrit.wikimedia.org/r/636078 (https://phabricator.wikimedia.org/T260758) (owner: 10Catrope)
[16:01:51] <icinga-wm>	 PROBLEM - Check systemd state on netflow2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:03:19] <logmsgbot>	 !log ppchelko@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
[16:03:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:04:20] <wikibugs>	 (03PS1) 10Cmjohnson: Add mac addresses to dhcp for clouddb1013-1020 [puppet] - 10https://gerrit.wikimedia.org/r/636971 (https://phabricator.wikimedia.org/T260441)
[16:05:40] <logmsgbot>	 !log ppchelko@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
[16:05:42] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review, 10cloud-services-team (Hardware): (Need By: ASAP) rack/setup/install clouddb10[13-20] - https://phabricator.wikimedia.org/T260441 (10Cmjohnson)
[16:05:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:54] <wikibugs>	 (03CR) 10Cmjohnson: [C: 03+2] Add mac addresses to dhcp for clouddb1013-1020 [puppet] - 10https://gerrit.wikimedia.org/r/636971 (https://phabricator.wikimedia.org/T260441) (owner: 10Cmjohnson)
[16:06:25] <logmsgbot>	 !log ppchelko@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
[16:06:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:00] <wikibugs>	 (03PS5) 10Jeena Huneidi: [DNM] Experimental King helmfile [deployment-charts] - 10https://gerrit.wikimedia.org/r/634354 (https://phabricator.wikimedia.org/T258572)
[16:07:35] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[16:09:05] <wikibugs>	 (03PS6) 10Jeena Huneidi: King helmfile [deployment-charts] - 10https://gerrit.wikimedia.org/r/634354 (https://phabricator.wikimedia.org/T258572)
[16:09:49] <wikibugs>	 (03PS1) 10Cmjohnson: updating site.pp entry for new ES servers in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/636972 (https://phabricator.wikimedia.org/T260370)
[16:09:51] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:11:28] <wikibugs>	 (03CR) 10Cmjohnson: [C: 03+2] updating site.pp entry for new ES servers in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/636972 (https://phabricator.wikimedia.org/T260370) (owner: 10Cmjohnson)
[16:11:31] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:14:42] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: ASAP) rack/setup/install clouddb10[13-20] - https://phabricator.wikimedia.org/T260441 (10Cmjohnson) a:05Cmjohnson→03RobH @robh these are ready for install, the raid configuration has been completed. Just need to do the fin...
[16:15:07] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[16:15:08] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:15:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:37] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] Temporarily disable tilerator in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/608459 (https://phabricator.wikimedia.org/T254014) (owner: 10Ryan Kemper)
[16:16:02] <hnowlan>	 !log Disabling tilerator in eqiad
[16:16:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:12] <wikibugs>	 (03Abandoned) 10Andrew Bogott: shinkengen for all projects [puppet] - 10https://gerrit.wikimedia.org/r/374897 (https://phabricator.wikimedia.org/T166845) (owner: 10Alex Monk)
[16:18:03] <wikibugs>	 (03CR) 10Marostegui: "This was already done a few weeks ago at https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881 - going to revert this and update the" [puppet] - 10https://gerrit.wikimedia.org/r/636467 (https://phabricator.wikimedia.org/T260370) (owner: 10Cmjohnson)
[16:18:54] <logmsgbot>	 !log hnowlan@puppetmaster1001 conftool action : set/pooled=no; selector: dc=eqiad,cluster=kartotherian,service=kartotherian,name=maps1004.eqiad.wmnet
[16:18:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:48] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] k8s-haproxy: take steps to fix logging [puppet] - 10https://gerrit.wikimedia.org/r/636765 (https://phabricator.wikimedia.org/T266593) (owner: 10Bstorm)
[16:20:58] <wikibugs>	 (03PS1) 10Marostegui: site.pp: Remove duplicate external store entries. [puppet] - 10https://gerrit.wikimedia.org/r/636974 (https://phabricator.wikimedia.org/T260370)
[16:21:33] <wikibugs>	 10Operations, 10DC-Ops, 10Platform Engineering, 10serviceops: Rename wtp* servers to parse* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (10Dzahn) The codfw part of this is done meanwhile. There are only parse2* but no wtp2*.  (T247441 and others)  The eqiad part though is still left t...
[16:21:35] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] site.pp: Remove duplicate external store entries. [puppet] - 10https://gerrit.wikimedia.org/r/636974 (https://phabricator.wikimedia.org/T260370) (owner: 10Marostegui)
[16:21:37] <icinga-wm>	 PROBLEM - Check systemd state on maps1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:22:02] <logmsgbot>	 !log hnowlan@puppetmaster1001 conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
[16:22:04] <marostegui>	 bstorm: ok to merge your change?
[16:22:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:08] <logmsgbot>	 !log hnowlan@puppetmaster1001 conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
[16:22:09] <bstorm>	 Sure!
[16:22:12] <bstorm>	 I was just about to
[16:22:14] <marostegui>	 bstorm: Merging!
[16:22:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:17] <bstorm>	 Thanks!
[16:27:11] <icinga-wm>	 PROBLEM - Check systemd state on idp2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:27:44] <wikibugs>	 10Operations, 10DBA, 10Orchestrator, 10User-Kormat: Explore orchestrator hooks to integrate them with dbctl, !log, irc alerts and emails - https://phabricator.wikimedia.org/T266452 (10Marostegui)
[16:28:01] <wikibugs>	 10Operations, 10DBA, 10Orchestrator: Run orchestrator as non-root - https://phabricator.wikimedia.org/T266656 (10Marostegui)
[16:28:15] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on maps1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Hnowlan Expected for maps resync https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:28:49] <wikibugs>	 (03PS1) 10Bstorm: k8s-haproxy: Fix a typo in the logrotate config [puppet] - 10https://gerrit.wikimedia.org/r/636975 (https://phabricator.wikimedia.org/T266593)
[16:29:09] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission-hardware, 10cloud-services-team (Kanban): decommission cloudvirt100[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T263151 (10Cmjohnson)
[16:29:14] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission-hardware, 10cloud-services-team (Kanban): decommission cloudvirt100[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T263151 (10Cmjohnson) 05Open→03Resolved
[16:29:46] <wikibugs>	 (03PS6) 10Hnowlan: Isolate eqiad master maps1004 from cluster [puppet] - 10https://gerrit.wikimedia.org/r/608729 (https://phabricator.wikimedia.org/T254014) (owner: 10Ryan Kemper)
[16:31:35] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] Isolate eqiad master maps1004 from cluster [puppet] - 10https://gerrit.wikimedia.org/r/608729 (https://phabricator.wikimedia.org/T254014) (owner: 10Ryan Kemper)
[16:31:40] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission-hardware, 10cloud-services-team (Kanban): decommission cloudvirt1015.eqiad.wmnet - https://phabricator.wikimedia.org/T260840 (10Cmjohnson)
[16:31:44] <wikibugs>	 (03PS2) 10Bstorm: k8s-haproxy: Fix a typo in the logrotate config [puppet] - 10https://gerrit.wikimedia.org/r/636975 (https://phabricator.wikimedia.org/T266593)
[16:31:51] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission-hardware, 10cloud-services-team (Kanban): decommission cloudvirt1015.eqiad.wmnet - https://phabricator.wikimedia.org/T260840 (10Cmjohnson) 05Open→03Resolved
[16:32:11] <wikibugs>	 (03PS3) 10Bstorm: k8s-haproxy: Fix a typo in the logrotate config [puppet] - 10https://gerrit.wikimedia.org/r/636975 (https://phabricator.wikimedia.org/T266593)
[16:34:34] <wikibugs>	 (03CR) 10Bstorm: "Just saw this because it conflicts with something I was doing. Is it still a current patch?" [puppet] - 10https://gerrit.wikimedia.org/r/604782 (https://phabricator.wikimedia.org/T195217) (owner: 10Arturo Borrero Gonzalez)
[16:34:40] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[16:34:41] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:34:44] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[16:34:44] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:34:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:46] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[16:34:47] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:34:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:54] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] k8s-haproxy: Fix a typo in the logrotate config [puppet] - 10https://gerrit.wikimedia.org/r/636975 (https://phabricator.wikimedia.org/T266593) (owner: 10Bstorm)
[16:34:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:36:35] <wikibugs>	 10Operations, 10Patch-For-Review: logrotate for visualdiff tests on Parsoid test host (scandium) - https://phabricator.wikimedia.org/T161920 (10Dzahn) Yes, there are 56G in scandium:/srv/visualdiff/pngs but that is just over 50% so not an acute issue here.  @ssastry Do you ever manually delete pngs from /srv/v...
[16:38:48] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "looks good for the production part: https://puppet-compiler.wmflabs.org/compiler1002/26186/" [puppet] - 10https://gerrit.wikimedia.org/r/577043 (owner: 10C. Scott Ananian)
[16:44:38] <wikibugs>	 (03CR) 10Dzahn: "ready for an ACK or non-veto from WMCS team. Since it is opt-in it should not have any consequences until a host or role is added in Hiera" [puppet] - 10https://gerrit.wikimedia.org/r/635406 (https://phabricator.wikimedia.org/T165885) (owner: 10Dzahn)
[16:45:46] <wikibugs>	 10Operations, 10Puppet, 10Cloud-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885 (10Dzahn) @Paladox Please see the change above. still interested in this?
[16:46:23] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] "Fine with me -- since it's a no-op by default it seems harmless." [puppet] - 10https://gerrit.wikimedia.org/r/635406 (https://phabricator.wikimedia.org/T165885) (owner: 10Dzahn)
[16:46:53] <wikibugs>	 (03Abandoned) 10Dzahn: wmf-auto-restart: return 0 if service is not present or running [puppet] - 10https://gerrit.wikimedia.org/r/636728 (owner: 10Dzahn)
[16:49:27] <icinga-wm>	 PROBLEM - Check systemd state on releases2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:50:37] <mutante>	 ^ was working on fixing that, will look again
[16:54:48] <icinga-wm>	 RECOVERY - Check systemd state on releases2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:00:34] <wikibugs>	 (03PS1) 10Ppchelko: Temporary enable 'editpage' warn logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636983 (https://phabricator.wikimedia.org/T251023)
[17:02:53] <wikibugs>	 (03CR) 10DannyS712: Temporary enable 'editpage' warn logging (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636983 (https://phabricator.wikimedia.org/T251023) (owner: 10Ppchelko)
[17:04:09] <wikibugs>	 (03PS2) 10Ppchelko: Temporary enable 'editpage' warn logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636983 (https://phabricator.wikimedia.org/T251023)
[17:11:22] <wikibugs>	 (03CR) 10Jbond: "Have tested this with `bundle exec rake global:spec` and profile fails; however, `bundle exec rake global:spec:profile` succeeds" [puppet] - 10https://gerrit.wikimedia.org/r/636923 (owner: 10Jbond)
[17:12:14] <wikibugs>	 (03PS4) 10Jbond: Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923
[17:12:21] <wikibugs>	 (03CR) 10Paladox: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/635406 (https://phabricator.wikimedia.org/T165885) (owner: 10Dzahn)
[17:12:54] <wikibugs>	 10Operations, 10Puppet, 10Cloud-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885 (10Paladox) +1'd
[17:13:42] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923 (owner: 10Jbond)
[17:15:25] <wikibugs>	 (03PS2) 10Jbond: java: add new java version facts [puppet] - 10https://gerrit.wikimedia.org/r/636924
[17:15:32] <wikibugs>	 (03PS1) 10Andrew Bogott: deployment-prep: unset profile::cassandra::metrics_whitelist [puppet] - 10https://gerrit.wikimedia.org/r/636987
[17:16:29] <wikibugs>	 (03CR) 10Andrew Bogott: "This should fix puppet on at least 6 deployment-prep instances" [puppet] - 10https://gerrit.wikimedia.org/r/636987 (owner: 10Andrew Bogott)
[17:16:46] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] deployment-prep: unset profile::cassandra::metrics_whitelist [puppet] - 10https://gerrit.wikimedia.org/r/636987 (owner: 10Andrew Bogott)
[17:17:35] <wikibugs>	 10Operations, 10serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (10jijiki)
[17:18:24] <wikibugs>	 10Operations, 10Patch-For-Review: logrotate for visualdiff tests on Parsoid test host (scandium) - https://phabricator.wikimedia.org/T161920 (10ssastry) Since we aren't going to be running parsoid-vd and parsoid-vd-client on scandium, that whole dir can be removed and unmounted (iirc, that is an external volum...
[17:19:59] <wikibugs>	 (03CR) 10Jbond: "ignore the previous CR in this chain (unless you know rspec then review most welcome)." [puppet] - 10https://gerrit.wikimedia.org/r/636924 (owner: 10Jbond)
[17:20:24] <wikibugs>	 (03PS3) 10Jbond: java: add new java version facts [puppet] - 10https://gerrit.wikimedia.org/r/636924
[17:22:25] <wikibugs>	 (03PS1) 10Bstorm: paws: Get haproxy logging working [puppet] - 10https://gerrit.wikimedia.org/r/636988 (https://phabricator.wikimedia.org/T266593)
[17:22:49] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:23:49] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:24:24] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] helm: drop monitoring for systemd::timer::job [puppet] - 10https://gerrit.wikimedia.org/r/636892 (owner: 10Jbond)
[17:24:26] <wikibugs>	 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Upgrade ELK Stack to version 7 - https://phabricator.wikimedia.org/T234854 (10herron) >>! In T234854#6584418, @ayounsi wrote: > Trying to load: https://logstash-next.wikimedia.org/app/dashboards#/view/6bcd2a10-7d21-11e7-86fb-51c84229aeb7 >  > My laptop...
[17:24:42] <hnowlan>	 !log removing OSM database on maps1004 
[17:24:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:27:19] <icinga-wm>	 RECOVERY - Host ms-be1057 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms
[17:28:13] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/636865 (owner: 10Elukey)
[17:28:16] <wikibugs>	 10Operations, 10ops-eqiad: Power supply lost for analytics1072 - https://phabricator.wikimedia.org/T266644 (10Cmjohnson) 05Open→03Resolved a:03Cmjohnson fixed
[17:28:33] <wikibugs>	 10Operations, 10Platform Engineering, 10serviceops, 10User-jijiki: Upgrade MediaWiki's Redis cluster to Debian Buster - https://phabricator.wikimedia.org/T265643 (10jijiki)
[17:29:13] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: ms-be1057 down - cable disconnected? - https://phabricator.wikimedia.org/T266604 (10Cmjohnson) 05Open→03Resolved a:03Cmjohnson I am sorry, I am not sure how I did that but I did....it's fixed now.
[17:30:24] <hnowlan>	 !log reimporting OSM data for eqiad 
[17:30:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:42] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/636864 (https://phabricator.wikimedia.org/T264176) (owner: 10Elukey)
[17:34:53] <icinga-wm>	 RECOVERY - IPMI Sensor Status on analytics1072 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[17:34:53] <icinga-wm>	 PROBLEM - Host cloudvirt1030.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[17:37:31] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10Cmjohnson) this server does not have a 10GB nic card
[17:38:00] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10Cmjohnson) @Andrew @Bstorm Do you want me to put this back where it was?
[17:38:20] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] "Since this is the same change as for Toolforge, I'm going to go ahead and merge it." [puppet] - 10https://gerrit.wikimedia.org/r/636988 (https://phabricator.wikimedia.org/T266593) (owner: 10Bstorm)
[17:38:57] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1028 with 10G interfaces - https://phabricator.wikimedia.org/T266514 (10Cmjohnson) @andrew @bstrom This server does not have a 10GB nic card
[17:39:19] <wikibugs>	 (03CR) 10Razzi: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/636514 (https://phabricator.wikimedia.org/T240439) (owner: 10Razzi)
[17:39:30] <wikibugs>	 10Operations, 10DC-Ops, 10netops: patch in FB peering into cr1-eqiad:xe-3/2/1 - https://phabricator.wikimedia.org/T265916 (10Cmjohnson)
[17:39:33] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1028 with 10G interfaces - https://phabricator.wikimedia.org/T266514 (10Bstorm) That's very surprising!
[17:39:39] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: fix/replace cable ID 2648 on FB peering patch - cable report error - https://phabricator.wikimedia.org/T266497 (10Cmjohnson) 05Open→03Resolved Fixed, the cable that we had labeled as 2648 is actually 2649
[17:39:47] <icinga-wm>	 RECOVERY - Host cloudvirt1030.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.01 ms
[17:40:27] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[17:40:56] <wikibugs>	 (03PS2) 10Razzi: stats: Remove nginx from thorium [puppet] - 10https://gerrit.wikimedia.org/r/636514 (https://phabricator.wikimedia.org/T240439)
[17:42:34] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] "Enum['absent', 'present'], got 'enable'" [puppet] - 10https://gerrit.wikimedia.org/r/635406 (https://phabricator.wikimedia.org/T165885) (owner: 10Dzahn)
[17:43:54] <wikibugs>	 (03PS7) 10Dzahn: base/labs: add systemd timer to clean puppet client bucket [puppet] - 10https://gerrit.wikimedia.org/r/635406 (https://phabricator.wikimedia.org/T165885)
[17:44:14] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] stats: Remove nginx from thorium [puppet] - 10https://gerrit.wikimedia.org/r/636514 (https://phabricator.wikimedia.org/T240439) (owner: 10Razzi)
[17:44:26] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 3 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (10jijiki)
[17:44:29] <wikibugs>	 (03CR) 10Razzi: [C: 03+2] stats: Remove nginx from thorium [puppet] - 10https://gerrit.wikimedia.org/r/636514 (https://phabricator.wikimedia.org/T240439) (owner: 10Razzi)
[17:44:33] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 3 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (10jijiki)
[17:46:07] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[17:46:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:49] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.dns.netbox
[17:46:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:47:10] <volans>	 nice race! let's see who wins
[17:47:13] <volans>	 and please don't abort
[17:47:28] <volans>	 perfect way to test the locking mechanism
[17:48:59] <elukey>	 today I am not lucky with DNS
[17:49:09] <wikibugs>	 (03PS5) 10Jbond: Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923
[17:51:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923 (owner: 10Jbond)
[17:51:55] <icinga-wm>	 PROBLEM - Host labstore1005.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[17:52:28] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1028 with 10G interfaces - https://phabricator.wikimedia.org/T266514 (10Bstorm) The quote at T201352 lists the combined QLogic 57800 NIC, which should have 10 and 1 GB ports.
[17:52:33] <icinga-wm>	 PROBLEM - Host cloudvirt1030.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[17:52:50] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10Bstorm) The quote at T201352 lists the combined QLogic 57800 NIC, which should have 10 and 1 GB ports. I do not know if that matches reality...
[17:53:20] <wikibugs>	 (03PS1) 10Legoktm: Look for service.template in various code directories [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/636993 (https://phabricator.wikimedia.org/T266692)
[17:54:03] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/26189/" [puppet] - 10https://gerrit.wikimedia.org/r/635406 (https://phabricator.wikimedia.org/T165885) (owner: 10Dzahn)
[17:54:15] <elukey>	 volans: I see the diff, cmjohnson1 did you get a failure?
[17:54:28] <volans>	 both should see the diff
[17:54:52] <volans>	 the first one that pushes the commit "wins" and the other one should fail when pushing
[17:54:54] <wikibugs>	 10Operations, 10Parsoid, 10Parsoid-Tests, 10serviceops, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10ssastry)
[17:55:12] <volans>	 because history changed
[17:55:26] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 3 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (10aaron) Regarding RedisLockManager (it only needs 2 of the 3 host to be reachable). If one of them is depooled or refuses connection...
[17:55:28] <elukey>	 volans: ok ok, so should I go forward or should I wait for cmjohnson1's feedback?
[17:55:49] <wikibugs>	 (03CR) 10DannyS712: [C: 03+1] Temporary enable 'editpage' warn logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636983 (https://phabricator.wikimedia.org/T251023) (owner: 10Ppchelko)
[17:56:09] <volans>	 for the sake of testing, it's better if we can get feedback, but it's designed to DTRT whatever you do :)
[17:56:59] <icinga-wm>	 RECOVERY - Host labstore1005.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.61 ms
[18:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201028T1800).
[18:00:04] <jouncebot>	 kaldari, tgr, ryankemper, and Pchelolo: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:13] <kaldari>	 here
[18:00:26] <tgr_>	 o/
[18:01:11] <ryankemper>	 \o
[18:01:16] <Pchelolo>	 mine is super simple - just enabling a logging channel
[18:02:00] <ryankemper>	 Oh I just noticed I made a mistake last night - I forgot that +2'ing the patch auto submits
[18:02:05] <ryankemper>	 Should I do a quick revert of https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/636480?
[18:02:51] <Urbanecm>	 ryankemper: if it was not deployed yet, sure
[18:02:51] <logmsgbot>	 !log elukey@cumin1001 END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
[18:02:54] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10Bstorm) From online photos, I'd expect the NIC to have 4 ports, and 2 of those would be 10Gb
[18:02:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:03:04] <Urbanecm>	 tgr_: happy to deploy, unless you want to lead the window?
[18:03:21] <wikibugs>	 (03PS1) 10Ryan Kemper: Revert "Increase cirrus morelike pool counter by 20%" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636955
[18:03:35] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+2] Revert "Increase cirrus morelike pool counter by 20%" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636955 (owner: 10Ryan Kemper)
[18:03:43] <icinga-wm>	 PROBLEM - Host labstore1005.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[18:03:48] <tgr_>	 I don't think there's much point in reverting, we can just deploy it first
[18:03:55] <tgr_>	 Urbanecm: works for me either way
[18:04:24] <Urbanecm>	 tgr_: it's you deploying then :)
[18:04:24] <ryankemper>	 Well I'm great at bad timing :P
[18:04:38] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Increase cirrus morelike pool counter by 20%" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636955 (owner: 10Ryan Kemper)
[18:04:40] <wikibugs>	 (03PS6) 10Jbond: Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923
[18:05:06] <ryankemper>	 It's reverted so you can proceed in order of the queue
[18:06:04] <tgr_>	 ok, thanks
[18:06:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923 (owner: 10Jbond)
[18:06:31] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[18:06:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:10] <wikibugs>	 10Operations, 10Puppet, 10Cloud-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885 (10Dzahn) ` dzahn@wikistats-dancing-goat:~$ sudo systemctl start cleanup_puppet_client_bucket.timer dzahn@wikista...
[18:07:17] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] Removing obsolete license definition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/619880 (owner: 10Kaldari)
[18:07:19] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[18:07:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:35] <wikibugs>	 (03PS1) 10Ryan Kemper: Revert "Revert "Increase cirrus morelike pool counter by 20%"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636956
[18:08:05] <icinga-wm>	 RECOVERY - Host labstore1005.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms
[18:08:07] <wikibugs>	 (03Merged) 10jenkins-bot: Removing obsolete license definition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/619880 (owner: 10Kaldari)
[18:08:14] <wikibugs>	 10Operations, 10Puppet, 10Cloud-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885 (10Dzahn) a:03Dzahn
[18:08:31] <icinga-wm>	 RECOVERY - Host cloudvirt1030.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.04 ms
[18:10:21] <tgr_>	 kaldari: it's on mwdebug2001 if you want to check
[18:10:28] <kaldari>	 checking....
[18:11:28] <kaldari>	 "The Wikimedia Commons database is temporarily in read-only mode for the following reason"?
[18:11:30] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[18:11:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:59] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[18:12:03] <wikibugs>	 (03PS2) 10Legoktm: Look for service.template in various code directories [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/636993 (https://phabricator.wikimedia.org/T266692)
[18:12:22] <Urbanecm>	 tgr_: fetch to 1002
[18:12:22] <tgr_>	 huh
[18:12:32] <Urbanecm>	 wrong mwdebug - we are on eqiad again
[18:12:36] <tgr_>	 ohh, right, we just switched back
[18:12:57] <wikibugs>	 (03CR) 10Legoktm: "Tested on Toolforge: https://phabricator.wikimedia.org/P13091" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/636993 (https://phabricator.wikimedia.org/T266692) (owner: 10Legoktm)
[18:13:19] <wikibugs>	 (03PS7) 10Jbond: Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923
[18:13:34] <kaldari>	 lemme know when I should test on 1002
[18:13:35] <tgr_>	 ok, it's on 1001
[18:13:39] <kaldari>	 or 1001 :)
[18:14:34] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] Suggested edits: Include page ID with task preview data [extensions/GrowthExperiments] (wmf/1.36.0-wmf.14) - 10https://gerrit.wikimedia.org/r/636787 (https://phabricator.wikimedia.org/T266600) (owner: 10Gergő Tisza)
[18:15:03] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923 (owner: 10Jbond)
[18:15:23] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10Cmjohnson) @bstorm you are correct, that is the nic that is in the server but the 10G capability would require 10GB SFP transceiver. I belie...
[18:16:14] <wikibugs>	 (03CR) 10Dzahn: [V: 03+1] puppetmaster: add data types to all remaining parameters (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/635656 (owner: 10Dzahn)
[18:16:34] <kaldari>	 tgr_: All good, feel free to push
[18:18:31] <wikibugs>	 (03PS7) 10Dzahn: puppetmaster: add data types to all remaining parameters [puppet] - 10https://gerrit.wikimedia.org/r/635656
[18:19:37] <logmsgbot>	 !log tgr@deploy1001 Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:619880|Removing obsolete license definition]] (duration: 01m 00s)
[18:19:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:44] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] Revert "Revert "Increase cirrus morelike pool counter by 20%"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636956 (owner: 10Ryan Kemper)
[18:20:05] <wikibugs>	 (03PS1) 10Elukey: sre.dns.netbox: add link to help with --force option [cookbooks] - 10https://gerrit.wikimedia.org/r/636997
[18:20:22] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10Cmjohnson) we will also need cat6 or 6a cable for these 2 each at 7M please.
[18:20:34] <tgr_>	 ryankemper: do you want those two config patches deployed separately, or in one step?
[18:20:35] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "As requested by Jbond I've reviewed the script more or less as if it was a new script, but avoiding to suggest anything drastic that would" (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/634572 (owner: 10Jbond)
[18:21:03] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Revert "Increase cirrus morelike pool counter by 20%"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636956 (owner: 10Ryan Kemper)
[18:21:19] <wikibugs>	 (03PS2) 10Gergő Tisza: Revert "cirrus: Hardcode more_like to codfw cirrus cluster" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636791 (owner: 10Ryan Kemper)
[18:21:32] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM +1 assuming pcc is good" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/635656 (owner: 10Dzahn)
[18:21:36] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM, thanks for creating the doc" [cookbooks] - 10https://gerrit.wikimedia.org/r/636997 (owner: 10Elukey)
[18:22:15] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] sre.dns.netbox: add link to help with --force option [cookbooks] - 10https://gerrit.wikimedia.org/r/636997 (owner: 10Elukey)
[18:22:16] <ryankemper>	 tgr_: they're unrelated, but neither of them requires any testing at the `mwdebug` stage before proceeding to the rest of the fleet, so if you can do them together that'd be great
[18:22:24] <tgr_>	 "1 apaches had sync errors"
[18:22:40] <tgr_>	 but it doesn't say which one...
[18:23:02] <tgr_>	 ['/usr/bin/scap', 'pull', '--no-update-l10n', '--no-php-restart', '--include', 'wmf-config', '--include', 'wmf-config/CommonSettings.php', 'mw1268.eqiad.wmnet', 'mw1319.eqiad.wmnet', 'mw1366.eqiad.wmnet', 'mw2215.codfw.wmnet', 'mw2254.codfw.wmnet', 'mw2289.codfw.wmnet', 'mw1285.eqiad.wmnet', 'mw1313.eqiad.wmnet'] on scandium.eqiad.wmnet returned [255]: Host key verification failed.
[18:23:10] <tgr_>	 so I guess one of those?
[18:23:40] <ryankemper>	 > Host key verification failed
[18:23:42] <ryankemper>	 that's weird
[18:24:16] <wikibugs>	 (03PS1) 10Dzahn: phabricator: replace require_package with ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/636999 (https://phabricator.wikimedia.org/T266479)
[18:24:22] <hnowlan>	 I think scandium just got removed from ssh_known_hosts- no idea why but I saw it in a puppet run
[18:25:04] <volans>	 because puppet has been disabled  (10168 minutes ago)
[18:25:31] <mutante>	 it's been requested to disable it for parsoid tests
[18:25:31] <volans>	 and so it has been evicted from puppetdb after 1 week
[18:25:56] <DannyS712>	 Pchelolo once your patch merges, is there any way for us to test, since it is expected not to be logging anything?
[18:26:06] <volans>	 in fact it shows up in https://netbox.wikimedia.org/extras/reports/puppetdb.PhysicalHosts/
[18:26:27] <tgr_>	 so the failed apache is scandium and all the other hostnames in that message are just distraction?
[18:26:38] <Pchelolo>	 nope, no way to test DannyS712 we hope that code that logs this is never executed 
[18:26:46] <tgr_>	 if that's the case we can ignore the error
[18:26:59] <volans>	 tgr_: no, scandium has been removed from the other hosts so they can't verify scandium's identity
[18:27:09] <wikibugs>	 (03CR) 10Muehlenhoff: "Note this doesn't clean out the nginx* packages (not sure whether known/intentional etc, just mentioning it here)" [puppet] - 10https://gerrit.wikimedia.org/r/636514 (https://phabricator.wikimedia.org/T240439) (owner: 10Razzi)
[18:27:24] <volans>	 now is scandium a source of scap or just a target?
[18:27:38] <wikibugs>	 (03Merged) 10jenkins-bot: Suggested edits: Include page ID with task preview data [extensions/GrowthExperiments] (wmf/1.36.0-wmf.14) - 10https://gerrit.wikimedia.org/r/636787 (https://phabricator.wikimedia.org/T266600) (owner: 10Gergő Tisza)
[18:27:44] <volans>	 if just a target I think it could be ignored
[18:27:53] <mutante>	 yes, it's just scandium. note how the error message gives a list of mw hosts but then it's "on scandium" after the bracket closes
[18:27:56] <tgr_>	 that message makes it sound like a source
[18:28:08] <mutante>	 'mw1313.eqiad.wmnet'] on ...
[18:28:10] <volans>	 yeah it's a bit confusing as a message
[18:28:12] <volans>	 and I have no context
[18:29:09] <tgr_>	 if scap pull failed on scandium, that's a meh. If fanout failed for half a dozen production apaches that would be bad
[18:29:49] <mutante>	 the list of mw servers is part of the command line
[18:31:21] <tgr_>	 mutante: I'm just confused why a scap pull would be run on scandium with a bunch of server names in the parameters
[18:31:25] <wikibugs>	 (03CR) 10Razzi: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/636514 (https://phabricator.wikimedia.org/T240439) (owner: 10Razzi)
[18:32:20] <tgr_>	 but you are saying that no other appserver than scandium is affected in any way, right?
[18:33:12] <mutante>	 tgr_: they are the scap proxies
[18:33:42] <mutante>	 yes, pretty sure it's just scandium 
[18:33:52] <tgr_>	 right, thanks. I see it's mentioned in scap --help, I should just have rtfm
[18:33:55] <mutante>	 and scandium is not pooled and I was just told today we can now remove it
[18:34:31] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: ASAP) rack/setup/install frdb1004.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T265086 (10Cmjohnson)
[18:34:40] <wikibugs>	 (03PS8) 10Jbond: Gemfile: update puppetlabs_spec_helper version and switch to rspec-mock [puppet] - 10https://gerrit.wikimedia.org/r/636923
[18:35:49] <mutante>	 tgr_: ignore it.. and I will make a patch to remove that from the dsh group.. then reimage it later 
[18:36:17] <mutante>	 subbu just told me all parsoid patches are merged which means we can reimage that and no more reason to keep puppet disabled after that
[18:38:19] <subbu>	 mutante, are you going to reimage scandium anytime today? or later in the week?
[18:38:32] <subbu>	 just checking so i know when to start new rt testing runs.
[18:39:39] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] Revert "cirrus: Hardcode more_like to codfw cirrus cluster" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636791 (owner: 10Ryan Kemper)
[18:39:47] <mutante>	 subbu: I can do it today.. but maybe I should first remove it from scap dsh groups to avoid that stuff above from happening
[18:39:56] <mutante>	 that was interesting timing :)
[18:40:02] <logmsgbot>	 !log tgr@deploy1001 Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:636787|Suggested edits: Include page ID with task preview data (T266600)]] (duration: 00m 59s)
[18:40:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:40:08] <stashbot>	 T266600: Newcomer tasks: edit tag not applying to all edits - https://phabricator.wikimedia.org/T266600
[18:40:09] <subbu>	 ok. right.
[18:40:10] <mutante>	 "up to a week" is no problem to disable puppet but after that we run into this
[18:40:44] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "cirrus: Hardcode more_like to codfw cirrus cluster" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636791 (owner: 10Ryan Kemper)
[18:41:05] <subbu>	 ok, i'll hold off kicking off test runs for now. let me know once scandium is ready again.
[18:41:08] <wikibugs>	 (03PS1) 10Dzahn: remove scandium from scap dsh group [puppet] - 10https://gerrit.wikimedia.org/r/637003
[18:41:35] <mutante>	 ok
[18:43:37] <logmsgbot>	 !log volans@cumin1001 START - Cookbook sre.dns.netbox
[18:43:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:08] <mutante>	 jouncebot: next
[18:44:09] <jouncebot>	 In 1 hour(s) and 15 minute(s): Services – Graphoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201028T2000)
[18:44:18] <wikibugs>	 (03PS2) 10Dzahn: remove scandium from scap dsh group [puppet] - 10https://gerrit.wikimedia.org/r/637003
[18:44:56] <wikibugs>	 (03PS3) 10Gergő Tisza: Temporary enable 'editpage' warn logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636983 (https://phabricator.wikimedia.org/T251023) (owner: 10Ppchelko)
[18:45:04] <logmsgbot>	 !log tgr@deploy1001 Synchronized wmf-config/PoolCounterSettings.php: Config: [[gerrit:636956|Revert "Revert "Increase cirrus morelike pool counter by 20%"" ()]] (duration: 00m 57s)
[18:45:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:46:09] <wikibugs>	 (03PS3) 10Dzahn: remove scandium from scap dsh group [puppet] - 10https://gerrit.wikimedia.org/r/637003
[18:46:13] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] Temporary enable 'editpage' warn logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636983 (https://phabricator.wikimedia.org/T251023) (owner: 10Ppchelko)
[18:46:43] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] remove scandium from scap dsh group [puppet] - 10https://gerrit.wikimedia.org/r/637003 (owner: 10Dzahn)
[18:46:49] <logmsgbot>	 !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636791|Revert "cirrus: Hardcode more_like to codfw cirrus cluster"]] (duration: 00m 56s)
[18:46:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:08] <logmsgbot>	 !log volans@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[18:47:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:49] <wikibugs>	 (03PS4) 10Dzahn: remove scandium from scap dsh group [puppet] - 10https://gerrit.wikimedia.org/r/637003 (https://phabricator.wikimedia.org/T257906)
[18:48:05] <mutante>	 always love the -1 for "one space missing after Bug:"
[18:48:17] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10Andrew) >>! In T266623#6585835, @Cmjohnson wrote: > @bstorm you are correct, that is the nic that is in the server but the 10G capability wo...
[18:49:00] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] remove scandium from scap dsh group [puppet] - 10https://gerrit.wikimedia.org/r/637003 (https://phabricator.wikimedia.org/T257906) (owner: 10Dzahn)
[18:49:15] <mutante>	 ^ there, that issue should be gone for now.. 
[18:49:56] <tgr_>	 thx mutante 
[18:50:20] <mutante>	 yep, np, the host will be reinstalled and added later, but then with running puppet and new host keys
[18:51:09] <wikibugs>	 (03Merged) 10jenkins-bot: Temporary enable 'editpage' warn logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636983 (https://phabricator.wikimedia.org/T251023) (owner: 10Ppchelko)
[18:51:32] <DannyS712>	 ^ too late, but probably should have been *temporarily*
[18:51:50] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[18:51:50] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[18:51:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:42] <tgr_>	 I'm sure all zero people who read mw-config git log as a hobby will be sad
[18:53:02] <wikibugs>	 (03CR) 10Dzahn: "this can go ahead now" [puppet] - 10https://gerrit.wikimedia.org/r/634383 (https://phabricator.wikimedia.org/T257906) (owner: 10Dzahn)
[18:54:11] <mutante>	 leaves a "if you can read this you won a barnstar / apply for a job" message to test that hypothesis
[18:55:11] * DannyS712 does read it
[18:55:21] <DannyS712>	 (when really bored)
[18:55:56] <logmsgbot>	 !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636983|Temporary enable 'editpage' warn logging (T251023)]] (duration: 00m 57s)
[18:55:58] <mutante>	 tried to downtime scandium in Icinga but of course that won't work if the host is already dropped from puppet :)
[18:55:59] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:56:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:02] <stashbot>	 T251023: EditPage::getCurrentContent unexpectedly changes $currentModel and $currentFormat - https://phabricator.wikimedia.org/T251023
[18:56:09] <tgr_>	 !log Morning deploys done
[18:56:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:14] <wikibugs>	 10Operations, 10DC-Ops, 10netops: patch in FB peering into cr1-eqiad:xe-3/2/1 - https://phabricator.wikimedia.org/T265916 (10RobH)
[18:56:23] <tgr_>	 ryankemper: Pchelolo: patches are live
[18:56:28] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: fix/replace cable ID 2648 on FB peering patch - cable report error - https://phabricator.wikimedia.org/T266497 (10RobH) 05Resolved→03Open 2649 is also already in use, so now your fix introduced a new error:  https://netbox.wikimedia.org/dcim/cables/1167/ https://netbo...
[18:56:42] <mutante>	 subbu: your -1 is now a +1 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/634383  ? right?
[18:56:47] <Pchelolo>	 thank you tgr_.
[18:56:54] <DannyS712>	 eg merging https://gerrit.wikimedia.org/r/c/mediawiki/core/+/592471
[18:57:12] <wikibugs>	 (03CR) 10Subramanya Sastry: [C: 03+1] parsoid: stop using nodejs parsoid on scandium [puppet] - 10https://gerrit.wikimedia.org/r/634383 (https://phabricator.wikimedia.org/T257906) (owner: 10Dzahn)
[18:57:16] <subbu>	 fixed it.
[18:57:28] <mutante>	 thanks
[18:57:39] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] parsoid: stop using nodejs parsoid on scandium [puppet] - 10https://gerrit.wikimedia.org/r/634383 (https://phabricator.wikimedia.org/T257906) (owner: 10Dzahn)
[18:57:41] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:57:44] <wikibugs>	 (03PS2) 10Dzahn: parsoid: stop using nodejs parsoid on scandium [puppet] - 10https://gerrit.wikimedia.org/r/634383 (https://phabricator.wikimedia.org/T257906)
[18:57:56] <legoktm>	 if you spell a word wrong, just create a page on Wiktionary that says "alternative spelling" and no one will ever know ;-)
[18:58:12] <wikibugs>	 (03PS1) 10Dave Pifke: webperf: move navtiming monitoring back to eqiad [puppet] - 10https://gerrit.wikimedia.org/r/637007
[18:59:12] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 3 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (10jijiki) @aaron If you have any insights regarding the Redis Lock Manager and file upload, it would be much appreciated (+ T265643)
[19:01:23] <wikibugs>	 (03PS1) 10Dzahn: Revert "remove scandium from scap dsh group" [puppet] - 10https://gerrit.wikimedia.org/r/636958
[19:01:46] <wikibugs>	 (03CR) 10Gehel: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/636924 (owner: 10Jbond)
[19:04:03] <wikibugs>	 (03PS3) 10Ahmon Dancy: modules/scap/templates/scap.cfg.erb: Define php_fpm_unsafe_restart_script [puppet] - 10https://gerrit.wikimedia.org/r/636074 (https://phabricator.wikimedia.org/T243009)
[19:05:34] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10wiki_willy) ++ @RobH - can you create a related procurement task and look into getting a quote for what the WMCS team needs?  Much appreciat...
[19:05:57] <wikibugs>	 10Operations, 10Parsoid, 10Parsoid-Tests, 10serviceops, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` scandium.eqi...
[19:06:00] <wikibugs>	 10Operations, 10Parsoid, 10Parsoid-Tests, 10serviceops, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['scandium.eqiad.wmnet'] `  Of which those **FAILED**: ` ['sc...
[19:06:33] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+1] profile::lvs::realserver: add ability to configure poolcounter for pools [puppet] - 10https://gerrit.wikimedia.org/r/635993 (https://phabricator.wikimedia.org/T266055) (owner: 10Giuseppe Lavagetto)
[19:07:27] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+1] safe-service-restart: add optional poolcounter support [puppet] - 10https://gerrit.wikimedia.org/r/635991 (https://phabricator.wikimedia.org/T266055) (owner: 10Giuseppe Lavagetto)
[19:07:34] <wikibugs>	 10Operations, 10Parsoid, 10Parsoid-Tests, 10serviceops, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` scandium.eqi...
[19:10:18] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10RobH) a:05Cmjohnson→03RobH >>! In T266623#6585634, @Cmjohnson wrote: > this server does not have a 10GB nic card  These were ordered wit...
[19:11:25] <wikibugs>	 (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/637011
[19:11:25] <wikibugs>	 10Operations, 10Patch-For-Review: logrotate for visualdiff tests on Parsoid test host (scandium) - https://phabricator.wikimedia.org/T161920 (10Dzahn) Thanks! Reimaging scandium right now as part of T257906.  We can close this ticket then as not needed anymore.
[19:20:12] <wikibugs>	 (03PS1) 10Bstorm: paws-k8s: switch the ingress for https to http logging [puppet] - 10https://gerrit.wikimedia.org/r/637017
[19:20:22] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[19:20:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:22:22] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[19:22:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:26:40] <wikibugs>	 (03PS1) 10Andrew Bogott: cloud-vps instances: include bsd-mailx on all hosts [puppet] - 10https://gerrit.wikimedia.org/r/637018
[19:30:42] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[19:30:43] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[19:30:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:29] <wikibugs>	 (03CR) 10Dzahn: "In the past when I had to do the same fix on prod hosts I was about to do that but ended up using the "s-nail" package instead. apt-cache " [puppet] - 10https://gerrit.wikimedia.org/r/637018 (owner: 10Andrew Bogott)
[19:33:08] <wikibugs>	 (03CR) 10Dzahn: "as far as I remember both packages provide /usr/bin/mail but the latter meant I did not have to change my existing commands and parameters" [puppet] - 10https://gerrit.wikimedia.org/r/637018 (owner: 10Andrew Bogott)
[19:35:57] <wikibugs>	 (03CR) 10Jbond: "> Patch Set 3: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/636924 (owner: 10Jbond)
[19:36:34] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.ganeti.makevm
[19:36:34] <logmsgbot>	 !log herron@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
[19:36:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:47] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.ganeti.makevm
[19:36:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:38:21] <icinga-wm>	 PROBLEM - Check systemd state on mw1381 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:44:26] <wikibugs>	 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Upgrade ELK Stack to version 7 - https://phabricator.wikimedia.org/T234854 (10herron)
[19:48:11] <icinga-wm>	 PROBLEM - Check systemd state on releases2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:50:23] <logmsgbot>	 !log herron@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
[19:50:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:24] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.ganeti.makevm
[19:53:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:34] <wikibugs>	 (03CR) 10Andrew Bogott: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/637018 (owner: 10Andrew Bogott)
[19:53:43] <logmsgbot>	 !log herron@cumin1001 END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
[19:53:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:37] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata Query Builder: Deploy WDQS query builder to microsites - https://phabricator.wikimedia.org/T266703 (10Addshore)
[19:55:56] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata Query UI: Move WDQS UI to microsites - https://phabricator.wikimedia.org/T266702 (10Addshore)
[19:57:21] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10RobH) IRC Update:  * These are copper based 1g/10g NICs and we in 1g racks so it wasn't an issue until now. * We'll need to swap these out e...
[19:57:26] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata Query UI, 10User-Addshore: Move WDQS UI to microsites - https://phabricator.wikimedia.org/T266702 (10Addshore)
[19:57:32] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata Query Builder, 10User-Addshore: Deploy WDQS query builder to microsites - https://phabricator.wikimedia.org/T266703 (10Addshore)
[19:58:31] <wikibugs>	 (03PS2) 10Cwhite: Initial release based on ECS 1.6.0. [software/ecs] - 10https://gerrit.wikimedia.org/r/636513
[20:00:04] <jouncebot>	 chrisalbon and accraze: My dear minions, it's time we take the moon! Just kidding. Time for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201028T2000).
[20:03:40] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 3 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (10jijiki)
[20:04:12] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 3 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (10jijiki) @aaron thank you! I updated the task description
[20:04:23] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] alertmanager: add dashboard url to irc messages [puppet] - 10https://gerrit.wikimedia.org/r/636868 (https://phabricator.wikimedia.org/T266017) (owner: 10Filippo Giunchedi)
[20:12:35] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] Initial release based on ECS 1.6.0. [software/ecs] - 10https://gerrit.wikimedia.org/r/636513 (owner: 10Cwhite)
[20:12:37] <wikibugs>	 (03CR) 10Cwhite: [V: 03+2 C: 03+2] Initial release based on ECS 1.6.0. [software/ecs] - 10https://gerrit.wikimedia.org/r/636513 (owner: 10Cwhite)
[20:14:28] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1025 with 10G interfaces - https://phabricator.wikimedia.org/T266187 (10RobH)
[20:14:39] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1029 with 10G interfaces - https://phabricator.wikimedia.org/T266206 (10RobH)
[20:14:48] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1026 with 10G interfaces - https://phabricator.wikimedia.org/T266281 (10RobH)
[20:15:13] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1027 with 10G interfaces - https://phabricator.wikimedia.org/T266369 (10RobH)
[20:15:20] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1028 with 10G interfaces - https://phabricator.wikimedia.org/T266514 (10RobH)
[20:16:00] <wikibugs>	 (03PS1) 10Ladsgroup: Change logo of Wikidata for the eighth birthday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637024
[20:16:11] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1030 with 10G interfaces - https://phabricator.wikimedia.org/T266623 (10RobH)
[20:17:46] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Change logo of Wikidata for the eighth birthday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637024 (owner: 10Ladsgroup)
[20:19:13] <wikibugs>	 (03CR) 10DannyS712: "is there an on-wiki announcement of this?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637024 (owner: 10Ladsgroup)
[20:19:28] <wikibugs>	 (03Merged) 10jenkins-bot: Change logo of Wikidata for the eighth birthday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637024 (owner: 10Ladsgroup)
[20:22:59] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized static/images/project-logos: Changing logo of Wikidata for the birthday (duration: 00m 58s)
[20:23:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:50] <wikibugs>	 (03CR) 10Ladsgroup: "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637024 (owner: 10Ladsgroup)
[20:30:40] <wikibugs>	 (03PS2) 10Dzahn: httpd/puppetmaster: add data type for SSLVerifyClient and use it [puppet] - 10https://gerrit.wikimedia.org/r/635658
[20:32:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] httpd/puppetmaster: add data type for SSLVerifyClient and use it [puppet] - 10https://gerrit.wikimedia.org/r/635658 (owner: 10Dzahn)
[20:32:10] <wikibugs>	 (03CR) 10Dzahn: "thanks!:)" [puppet] - 10https://gerrit.wikimedia.org/r/635905 (owner: 10Dzahn)
[20:35:34] <wikibugs>	 (03CR) 10Dzahn: wmflib:: add data type for puppetmaster server type and use it (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/635660 (owner: 10Dzahn)
[20:36:54] <wikibugs>	 (03PS2) 10Dzahn: puppetmaster: add data type for server type and use it [puppet] - 10https://gerrit.wikimedia.org/r/635660
[20:37:27] <wikibugs>	 (03CR) 10Dzahn: httpd/puppetmaster: add data type for SSLVerifyClient and use it (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/635658 (owner: 10Dzahn)
[20:37:49] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] puppetmaster: add data type for server type and use it [puppet] - 10https://gerrit.wikimedia.org/r/635660 (owner: 10Dzahn)
[20:43:06] <wikibugs>	 10Operations, 10ops-eqiad, 10Reading Epics (Analytics): an-coord1001 ram upgrade - https://phabricator.wikimedia.org/T266709 (10RobH) p:05Triage→03Medium
[20:43:14] <wikibugs>	 10Operations, 10ops-eqiad, 10Reading Epics (Analytics): an-coord1001 ram upgrade - https://phabricator.wikimedia.org/T266709 (10RobH)
[20:44:00] <wikibugs>	 (03PS8) 10Dzahn: puppetmaster: add data types to all remaining parameters [puppet] - 10https://gerrit.wikimedia.org/r/635656
[20:44:02] <wikibugs>	 (03PS3) 10Dzahn: httpd/puppetmaster: add data type for SSLVerifyClient and use it [puppet] - 10https://gerrit.wikimedia.org/r/635658
[20:44:06] <wikibugs>	 (03CR) 10Dzahn: httpd/puppetmaster: add data type for SSLVerifyClient and use it (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/635658 (owner: 10Dzahn)
[20:44:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] puppetmaster: add data types to all remaining parameters [puppet] - 10https://gerrit.wikimedia.org/r/635656 (owner: 10Dzahn)
[20:45:40] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] httpd/puppetmaster: add data type for SSLVerifyClient and use it [puppet] - 10https://gerrit.wikimedia.org/r/635658 (owner: 10Dzahn)
[20:47:40] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[20:50:41] <logmsgbot>	 !log herron@cumin1001 START - Cookbook sre.ganeti.makevm
[20:50:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:57:24] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:58:58] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:59:44] <wikibugs>	 (03CR) 10Razzi: "https://puppet-compiler.wmflabs.org/compiler1003/26191/" [puppet] - 10https://gerrit.wikimedia.org/r/636517 (https://phabricator.wikimedia.org/T264152) (owner: 10Razzi)
[21:02:56] <icinga-wm>	 PROBLEM - Check systemd state on kubestage1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:03:29] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/26190/" [puppet] - 10https://gerrit.wikimedia.org/r/633857 (https://phabricator.wikimedia.org/T265138) (owner: 10Dzahn)
[21:03:56] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] prometheus: re-enable compaction by default [puppet] - 10https://gerrit.wikimedia.org/r/636362 (https://phabricator.wikimedia.org/T261281) (owner: 10Filippo Giunchedi)
[21:04:26] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[21:04:47] <wikibugs>	 (03CR) 10Bstorm: "We definitely are using bsd-mailx on the bastions." [puppet] - 10https://gerrit.wikimedia.org/r/637018 (owner: 10Andrew Bogott)
[21:05:01] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[21:05:03] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[21:05:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:42] <wikibugs>	 (03CR) 10Bstorm: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/637018 (owner: 10Andrew Bogott)
[21:09:24] <wikibugs>	 (03CR) 10CDanis: pcc: expore posting pcc to gerrit comments (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/636652 (owner: 10Jbond)
[21:13:19] <wikibugs>	 (03CR) 10Dzahn: "confirmed this is working like so:" [puppet] - 10https://gerrit.wikimedia.org/r/633857 (https://phabricator.wikimedia.org/T265138) (owner: 10Dzahn)
[21:13:22] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 04-1] "This mostly looks good to me. I left a few comments on some changes that are needed." (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/636916 (https://phabricator.wikimedia.org/T265893) (owner: 10Kosta Harlan)
[21:16:13] <wikibugs>	 (03Abandoned) 10Dzahn: cdh/hiveserver2: add shebang, fix bashisms [puppet] - 10https://gerrit.wikimedia.org/r/631889 (https://phabricator.wikimedia.org/T95064) (owner: 10Dzahn)
[21:18:18] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] "stalled on https://phabricator.wikimedia.org/T264920" [puppet] - 10https://gerrit.wikimedia.org/r/632570 (https://phabricator.wikimedia.org/T210993) (owner: 10Dzahn)
[21:18:20] <icinga-wm>	 PROBLEM - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is CRITICAL: /api/rest_v1/page/html/{title} (Get html by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[21:19:22] <wikibugs>	 10Operations, 10Patch-For-Review: logrotate for visualdiff tests on Parsoid test host (scandium) - https://phabricator.wikimedia.org/T161920 (10Dzahn) 05Open→03Invalid
[21:19:52] <icinga-wm>	 RECOVERY - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[21:22:01] <logmsgbot>	 !log herron@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[21:22:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:25:12] <wikibugs>	 (03PS10) 10Jbond: diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572
[21:26:50] <wikibugs>	 (03PS1) 10Ppchelko: JobQueue: Increase concurrency for cdnPurge jobs. [deployment-charts] - 10https://gerrit.wikimedia.org/r/637031
[21:27:45] <wikibugs>	 (03CR) 10Jbond: "made some updates but still more to come, thanks for the review as always very useful 😊" (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/634572 (owner: 10Jbond)
[21:37:11] <wikibugs>	 (03CR) 10Dzahn: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/637018 (owner: 10Andrew Bogott)
[21:40:20] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:41:19] <ryankemper>	 !log Disabled elasticsearch "saneitizer" systemd timer in eqiad due to checker jobs falling behind: `sudo systemctl disable mediawiki_job_cirrus_sanitize_jobs.timer` on `mwmaint1002`
[21:41:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:58] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:46:36] <wikibugs>	 10Operations, 10conftool, 10serviceops, 10Datacenter-Switchover: Disable maintenance scripts via conftool - https://phabricator.wikimedia.org/T266717 (10RLazarus)
[21:46:44] <wikibugs>	 10Operations, 10conftool, 10serviceops, 10Datacenter-Switchover: Disable maintenance scripts via conftool - https://phabricator.wikimedia.org/T266717 (10RLazarus) p:05Triage→03Medium
[21:47:25] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to prod cluster for annet - https://phabricator.wikimedia.org/T266718 (10AnneT)
[21:50:53] <wikibugs>	 10Operations, 10Parsoid, 10Parsoid-Tests, 10serviceops, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['scandium.eqiad.wmnet'] `  Of which those **FAILED**: ` ['sc...
[21:58:30] <wikibugs>	 (03PS1) 10Dzahn: site: assign insetup role to scandium, reimaging fails with prod role [puppet] - 10https://gerrit.wikimedia.org/r/637034 (https://phabricator.wikimedia.org/T257906)
[21:59:00] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] site: assign insetup role to scandium, reimaging fails with prod role [puppet] - 10https://gerrit.wikimedia.org/r/637034 (https://phabricator.wikimedia.org/T257906) (owner: 10Dzahn)