[01:29:15] Operations, ops-codfw, Wikimedia-Logstash: (No Need By Date Provided) rack/setup/install logstash202[6-9].codfw.wmnet - https://phabricator.wikimedia.org/T240882 (Papaul)
[02:01:31] (PS3) Andrew Bogott: neutron: update l3_agent_hacks for Pike [puppet] - https://gerrit.wikimedia.org/r/561660 (https://phabricator.wikimedia.org/T241347)
[02:01:33] (PS1) Andrew Bogott: neutron/pike: remove accitdental router_info.py.orig [puppet] - https://gerrit.wikimedia.org/r/561725
[02:07:21] (PS2) Andrew Bogott: neutron/pike: remove accidental router_info.py.orig [puppet] - https://gerrit.wikimedia.org/r/561725
[02:07:22] (PS4) Andrew Bogott: neutron: update l3_agent_hacks for Pike [puppet] - https://gerrit.wikimedia.org/r/561660 (https://phabricator.wikimedia.org/T241347)
[02:08:41] (CR) Andrew Bogott: [C: +2] neutron/pike: remove accidental router_info.py.orig [puppet] - https://gerrit.wikimedia.org/r/561725 (owner: Andrew Bogott)
[02:08:45] (CR) Andrew Bogott: [C: +2] neutron: update l3_agent_hacks for Pike [puppet] - https://gerrit.wikimedia.org/r/561660 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[02:33:51] (PS1) Legoktm: maintain-views: Remove outdated comment about spamblacklist log [puppet] - https://gerrit.wikimedia.org/r/561728 (https://phabricator.wikimedia.org/T241668)
[02:36:47] (CR) BryanDavis: "Same as my I28377bfdeb1f9dc0087b5ba23584169a4cd6b444, but with a less descriptive commit message" [puppet] - https://gerrit.wikimedia.org/r/561728 (https://phabricator.wikimedia.org/T241668) (owner: Legoktm)
[02:51:04] (PS1) BryanDavis: support tools: Add script to rebuild all images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561730
[02:54:39] (PS1) Andrew Bogott: neutron pike: roll back our router_info.py slightly [puppet] - https://gerrit.wikimedia.org/r/561731 (https://phabricator.wikimedia.org/T241347)
[02:56:15] (CR) Andrew Bogott: [C: +2] neutron pike: roll back our router_info.py slightly [puppet] - https://gerrit.wikimedia.org/r/561731 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[03:00:17] (CR) BryanDavis: [C: +1] "We are using `#!/usr/bin/python` (purposefully) in the entry point which means we will actually run under Python 2.7 on all currently supp" [software/tools-webservice] - https://gerrit.wikimedia.org/r/561262 (owner: Legoktm)
[03:10:48] (CR) Legoktm: [C: +1] wiki replicas: Remove outdated comment about spamblacklist [puppet] - https://gerrit.wikimedia.org/r/561352 (https://phabricator.wikimedia.org/T241668) (owner: BryanDavis)
[03:11:03] (Abandoned) Legoktm: maintain-views: Remove outdated comment about spamblacklist log [puppet] - https://gerrit.wikimedia.org/r/561728 (https://phabricator.wikimedia.org/T241668) (owner: Legoktm)
[03:14:29] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 52 probes of 509 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[03:20:21] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 37 probes of 509 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[04:51:32] (CR) Ayounsi: [C: +1] "+1 for network devices" [dns] - https://gerrit.wikimedia.org/r/561679 (https://phabricator.wikimedia.org/T239597) (owner: Volans)
[05:15:03] Operations, ops-codfw, DBA: (No Need By Date Provided) codfw: rack/setup/install es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (Papaul)
[06:14:09] (PS1) Andrew Bogott: keystone.conf: specify pymysql driver for mysql connection [puppet] - https://gerrit.wikimedia.org/r/561744 (https://phabricator.wikimedia.org/T241347)
[06:21:50] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2087:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10028 and previous config saved to /var/cache/conftool/dbconfig/20200103-062148-marostegui.json
[06:21:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:53] T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[06:22:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2089:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10029 and previous config saved to /var/cache/conftool/dbconfig/20200103-062242-marostegui.json
[06:22:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:43] !log Deploy schema change on db2089:3316
[06:23:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:26:48] Operations, ops-codfw, DBA: (Needed By 31st January) codfw: rack/setup/install es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (Marostegui)
[06:27:07] Operations, ops-eqiad, DBA: (Needed by 31st January) eqiad: rack/setup/install es102[0-5].eqiad.wmnet - https://phabricator.wikimedia.org/T241359 (Marostegui)
[06:41:48] (PS1) Marostegui: site.pp: Add new external store hosts as spare [puppet] - https://gerrit.wikimedia.org/r/561750 (https://phabricator.wikimedia.org/T241336)
[06:43:11] (CR) Marostegui: [C: +2] site.pp: Add new external store hosts as spare [puppet] - https://gerrit.wikimedia.org/r/561750 (https://phabricator.wikimedia.org/T241336) (owner: Marostegui)
[06:46:43] (PS2) Andrew Bogott: keystone.conf: specify pymysql driver for mysql connection [puppet] - https://gerrit.wikimedia.org/r/561744 (https://phabricator.wikimedia.org/T241347)
[06:46:44] (PS1) Andrew Bogott: openstack nova userdata.txt: set manage_etc_hosts to false [puppet] - https://gerrit.wikimedia.org/r/561751 (https://phabricator.wikimedia.org/T240899)
[06:48:28] (CR) Andrew Bogott: [C: +2] keystone.conf: specify pymysql driver for mysql connection [puppet] - https://gerrit.wikimedia.org/r/561744 (https://phabricator.wikimedia.org/T241347) (owner: Andrew Bogott)
[06:48:45] (CR) Andrew Bogott: [C: +2] openstack nova userdata.txt: set manage_etc_hosts to false [puppet] - https://gerrit.wikimedia.org/r/561751 (https://phabricator.wikimedia.org/T240899) (owner: Andrew Bogott)
[06:49:08] Operations, ops-codfw, DBA, Patch-For-Review: (Needed By 31st January) codfw: rack/setup/install es202[0-5].codfw.wmnet - https://phabricator.wikimedia.org/T241336 (Marostegui) These hosts have been assigned to the partman recipe and set as initial spare for the installation: https://gerrit.wiki...
[06:51:07] Operations, ops-eqiad, DBA: (Needed by 31st January) eqiad: rack/setup/install es102[0-5].eqiad.wmnet - https://phabricator.wikimedia.org/T241359 (Marostegui) These hosts have been assigned to the partman recipe and set as initial spare for the installation: https://github.com/wikimedia/puppet/blob/...
[06:57:21] !log Deploy schema change on s6 eqiad hosts - T234052
[06:57:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:24] T234052: Add abuse_filter_log.afl_filter_id and afl_global columns - https://phabricator.wikimedia.org/T234052
[07:09:59] !log Deploy schema change on s2 codfw master, lag will appear on codfw - T234052
[07:10:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:02] T234052: Add abuse_filter_log.afl_filter_id and afl_global columns - https://phabricator.wikimedia.org/T234052
[07:11:29] (PS1) Legoktm: Add static-html webservice type that is the same as PHP 7.3 [software/tools-webservice] - https://gerrit.wikimedia.org/r/561753 (https://phabricator.wikimedia.org/T241817)
[07:15:01] (CR) Legoktm: "I was looking to see if I could just alias "static-html" to "php7.3", but there didn't seem to be a good way to do that since KubernetesBa" [software/tools-webservice] - https://gerrit.wikimedia.org/r/561753 (https://phabricator.wikimedia.org/T241817) (owner: Legoktm)
[07:31:13] Operations, ops-codfw, serviceops: (Need By: Jan 15) rack/setup/install mc-gp200[123].codfw.wmnet - https://phabricator.wikimedia.org/T241796 (elukey) @papaul this seems a duplicate of T239249, let me know which task you prefer to keep/close :)
[07:32:32] Operations, ops-codfw: (Need By: Jan 15) codfw: rack/setup/install mc-gp200[123].codfw.wmnet - https://phabricator.wikimedia.org/T239249 (elukey)
[07:56:26] (PS2) Muehlenhoff: Switch mc* to standard recipe [puppet] - https://gerrit.wikimedia.org/r/561599 (https://phabricator.wikimedia.org/T156955)
[07:57:54] (CR) Muehlenhoff: "Updated the hostname globbing in PS2, the pending mc extensions use a different hostname scheme (mc-gcp), but need the same partitioning." [puppet] - https://gerrit.wikimedia.org/r/561599 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[08:02:34] (CR) Elukey: [C: +1] "Looks good to me. Please note though that eventually, when Redis will be removed from mc nodes, there will be no real need for an /srv par" [puppet] - https://gerrit.wikimedia.org/r/561599 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[08:03:55] Operations, LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (Kris_Litson_WMDE) All signed and ready to go. Thank you all for your help!
[08:11:28] Operations, Domains, Traffic: nameserver change for wikimedia.sk - https://phabricator.wikimedia.org/T241084 (Luky001) Stalled→Open Hello, the name servers are as following: ` ns1.websupport.sk ns2.websupport.sk ns3.websupport.sk ` Everything should be working but please keep in mind that I...
[08:17:21] !log Deploy schema change on labswiki (wikitech) T234052
[08:17:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:25] T234052: Add abuse_filter_log.afl_filter_id and afl_global columns - https://phabricator.wikimedia.org/T234052
[08:22:08] (PS2) Muehlenhoff: Add a note to manage_principals for added/removed Kerberos principals [puppet] - https://gerrit.wikimedia.org/r/559731
[08:22:29] (CR) Muehlenhoff: Add a note to manage_principals for added/removed Kerberos principals (1 comment) [puppet] - https://gerrit.wikimedia.org/r/559731 (owner: Muehlenhoff)
[08:35:16] (CR) Elukey: [C: +2] Add a note to manage_principals for added/removed Kerberos principals [puppet] - https://gerrit.wikimedia.org/r/559731 (owner: Muehlenhoff)
[08:41:54] (PS3) Muehlenhoff: Switch mc* to standard recipe [puppet] - https://gerrit.wikimedia.org/r/561599 (https://phabricator.wikimedia.org/T156955)
[08:46:47] (CR) Muehlenhoff: [C: +2] Switch mc* to standard recipe [puppet] - https://gerrit.wikimedia.org/r/561599 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[09:07:49] (PS1) Elukey: amd-rocm: update prometheus exporter to match new rocm-smi output format [puppet] - https://gerrit.wikimedia.org/r/561789 (https://phabricator.wikimedia.org/T236007)
[09:11:23] (PS2) Elukey: amd-rocm: update prometheus exporter to match new rocm-smi output format [puppet] - https://gerrit.wikimedia.org/r/561789 (https://phabricator.wikimedia.org/T236007)
[09:11:57] (CR) Elukey: [C: +2] amd-rocm: update prometheus exporter to match new rocm-smi output format [puppet] - https://gerrit.wikimedia.org/r/561789 (https://phabricator.wikimedia.org/T236007) (owner: Elukey)
[09:21:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1076 schema change', diff saved to https://phabricator.wikimedia.org/P10030 and previous config saved to /var/cache/conftool/dbconfig/20200103-092107-marostegui.json
[09:21:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:57] (PS1) Elukey: amd-rocm: update prometheus exporter to avoid cron spam [puppet] - https://gerrit.wikimedia.org/r/561791
[09:23:31] (CR) Elukey: [C: +2] amd-rocm: update prometheus exporter to avoid cron spam [puppet] - https://gerrit.wikimedia.org/r/561791 (owner: Elukey)
[09:30:37] (PS1) Legoktm: Switch to pytest [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561793
[09:30:39] (PS1) Legoktm: Simplify tox configuration by using tox-wikimedia [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561794
[09:30:41] (PS1) Legoktm: Rewrite webservice-python-bootstrap in Python [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561795
[09:31:01] (PS5) Muehlenhoff: Add a define to install a package from a repository component [puppet] - https://gerrit.wikimedia.org/r/560458 (https://phabricator.wikimedia.org/T240324)
[09:31:03] (CR) Muehlenhoff: Add a define to install a package from a repository component (5 comments) [puppet] - https://gerrit.wikimedia.org/r/560458 (https://phabricator.wikimedia.org/T240324) (owner: Muehlenhoff)
[09:35:57] (PS2) Legoktm: Rewrite webservice-python-bootstrap in Python [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561795
[09:38:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1076 after schema change', diff saved to https://phabricator.wikimedia.org/P10031 and previous config saved to /var/cache/conftool/dbconfig/20200103-093829-marostegui.json
[09:38:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:53] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1074 schema change', diff saved to https://phabricator.wikimedia.org/P10032 and previous config saved to /var/cache/conftool/dbconfig/20200103-094252-marostegui.json
[09:42:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:21] (PS1) Andrew Bogott: haproxy for neutron: As of pike, the healthcheck url returns 405. [puppet] - https://gerrit.wikimedia.org/r/561796
[09:49:51] (CR) Andrew Bogott: [C: +2] haproxy for neutron: As of pike, the healthcheck url returns 405. [puppet] - https://gerrit.wikimedia.org/r/561796 (owner: Andrew Bogott)
[09:54:01] (CR) Jbond: [C: +1] "lgtm" (6 comments) [puppet] - https://gerrit.wikimedia.org/r/560458 (https://phabricator.wikimedia.org/T240324) (owner: Muehlenhoff)
[10:04:42] (PS3) Ema: vcl: block bot aggressively hitting search API [puppet] - https://gerrit.wikimedia.org/r/561322 (https://phabricator.wikimedia.org/T241421) (owner: CDanis)
[10:04:49] Operations, Elasticsearch, Traffic, Discovery-Search (Current work), and 2 others: Sustained periods (2-4h) of bad latency on production-search eqiad - https://phabricator.wikimedia.org/T241421 (ema) p:Triage→High
[10:09:29] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 52 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[10:10:05] Operations, Elasticsearch, Traffic, Discovery-Search (Current work), and 2 others: Sustained periods (2-4h) of bad latency on production-search eqiad - https://phabricator.wikimedia.org/T241421 (jcrespo)
[10:10:12] (PS6) Muehlenhoff: Add a define to install a package from a repository component [puppet] - https://gerrit.wikimedia.org/r/560458 (https://phabricator.wikimedia.org/T240324)
[10:11:42] !log installing cyrus-sasl2 security updates on stretch/buster
[10:11:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:36] (PS1) Legoktm: Add --fresh to webservice-python-bootstrap [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561802
[10:15:13] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 29 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[10:15:17] (PS2) Legoktm: Add --fresh to webservice-python-bootstrap [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561802
[10:16:54] (PS4) Ema: vcl: block bot aggressively hitting search API [puppet] - https://gerrit.wikimedia.org/r/561322 (https://phabricator.wikimedia.org/T241421) (owner: CDanis)
[10:20:20] !log restarting apache on cloudmetrics* to pick up SASL security update
[10:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:22:49] (PS1) Andrew Bogott: haproxy for neutron: As of pike, the healthcheck url returns 405. [puppet] - https://gerrit.wikimedia.org/r/561806 (https://phabricator.wikimedia.org/T241347)
[10:24:40] (CR) Muehlenhoff: [C: +2] Add a define to install a package from a repository component [puppet] - https://gerrit.wikimedia.org/r/560458 (https://phabricator.wikimedia.org/T240324) (owner: Muehlenhoff)
[10:37:57] (PS1) Muehlenhoff: Switch cergen to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561814
[10:46:14] (PS1) Jbond: etcd: add paramater type checking and clean up [puppet] - https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941)
[10:46:16] (PS1) Jbond: etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941)
[10:46:18] (PS1) Jbond: etcd: add cert parameter to enable client auth [puppet] - https://gerrit.wikimedia.org/r/561818 (https://phabricator.wikimedia.org/T240941)
[10:46:20] (PS1) Jbond: etcd: remove username/password [puppet] - https://gerrit.wikimedia.org/r/561819 (https://phabricator.wikimedia.org/T240941)
[10:48:40] (CR) jerkins-bot: [V: -1] etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941) (owner: Jbond)
[10:52:20] (PS5) Ema: vcl: block bot aggressively hitting search API [puppet] - https://gerrit.wikimedia.org/r/561322 (https://phabricator.wikimedia.org/T241421) (owner: CDanis)
[10:52:39] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/20159/" [puppet] - https://gerrit.wikimedia.org/r/561814 (owner: Muehlenhoff)
[10:53:07] (CR) Filippo Giunchedi: install_server: deprecate raid10-gpt-srv-lvm-ext4.cfg (2 comments) [puppet] - https://gerrit.wikimedia.org/r/559553 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[10:53:29] (PS2) Jbond: etcd: add paramater type checking and clean up [puppet] - https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941)
[10:53:54] (PS2) Jbond: etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941)
[10:54:42] (CR) jerkins-bot: [V: -1] etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941) (owner: Jbond)
[10:55:17] (PS4) Filippo Giunchedi: install_server: use raid10-8dev standard recipe [puppet] - https://gerrit.wikimedia.org/r/559548 (https://phabricator.wikimedia.org/T156955)
[10:55:19] (PS4) Filippo Giunchedi: install_server: use raid10-6dev standard recipe [puppet] - https://gerrit.wikimedia.org/r/559549 (https://phabricator.wikimedia.org/T156955)
[10:55:21] (PS4) Filippo Giunchedi: install_server: deprecate raid10-gpt-srv-lvm-ext4.cfg [puppet] - https://gerrit.wikimedia.org/r/559553 (https://phabricator.wikimedia.org/T156955)
[10:55:49] (PS3) Jbond: etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941)
[10:56:13] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=gerrit site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:56:37] (PS6) Ema: vcl: block bot aggressively hitting search API [puppet] - https://gerrit.wikimedia.org/r/561322 (https://phabricator.wikimedia.org/T241421) (owner: CDanis)
[10:57:01] Operations, ops-eqiad, Analytics, User-Elukey: Check if a GPU fits in any of the remaining stat or notebook hosts - https://phabricator.wikimedia.org/T220698 (elukey) Open→Resolved
[10:58:01] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:58:12] (PS3) Jbond: etcd: add paramater type checking and clean up [puppet] - https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941)
[10:58:18] Operations, Analytics, Traffic, User-Elukey: Add VSL error counters to Varnishkafka stats - https://phabricator.wikimedia.org/T164259 (elukey) Open→Declined Given than we are moving to ATS, I'd decline this task :)
[10:59:22] (PS4) Jbond: etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941)
[11:00:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P10033 and previous config saved to /var/cache/conftool/dbconfig/20200103-110028-marostegui.json
[11:00:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:45] (CR) jerkins-bot: [V: -1] etcd: add paramater type checking and clean up [puppet] - https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941) (owner: Jbond)
[11:01:14] (PS1) Muehlenhoff: profile::mediawiki::videoscaler: Switch to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561824
[11:01:46] (PS2) Jbond: etcd: add cert parameter to enable client auth [puppet] - https://gerrit.wikimedia.org/r/561818 (https://phabricator.wikimedia.org/T240941)
[11:01:48] (CR) jerkins-bot: [V: -1] etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941) (owner: Jbond)
[11:08:23] Operations, Elasticsearch, Traffic, Discovery-Search (Current work), and 2 others: Sustained periods (2-4h) of bad latency on production-search eqiad - https://phabricator.wikimedia.org/T241421 (ema) >>! In T241421#5768632, @dcausse wrote: > I believe this is caused by a bot sending a large amoun...
[11:11:21] gehel (or other folks that might apply here?) when you have some time I got https://gerrit.wikimedia.org/r/c/operations/puppet/+/559549 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/559553 for your eyes, thanks!
[11:12:33] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/20161/" [puppet] - https://gerrit.wikimedia.org/r/561824 (owner: Muehlenhoff)
[11:13:52] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/559553 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[11:17:40] (PS4) Jbond: etcd: add paramater type checking and clean up [puppet] - https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941)
[11:18:48] (PS5) Jbond: etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941)
[11:23:09] (PS1) Ema: varnish: update 16-x-connection-properties.vtc [puppet] - https://gerrit.wikimedia.org/r/561828 (https://phabricator.wikimedia.org/T241653)
[11:23:13] PROBLEM - Host backup2001 is DOWN: PING CRITICAL - Packet loss = 100%
[11:23:39] RECOVERY - Host backup2001 is UP: PING OK - Packet loss = 0%, RTA = 36.22 ms
[11:26:01] (PS5) Jbond: etcd: add paramater type checking and clean up [puppet] - https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941)
[11:26:03] (PS6) Jbond: etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941)
[11:26:05] (PS3) Jbond: etcd: add cert parameter to enable client auth [puppet] - https://gerrit.wikimedia.org/r/561818 (https://phabricator.wikimedia.org/T240941)
[11:27:36] (PS6) Arturo Borrero Gonzalez: toolforge: new k8s: deploy cadvisor.yaml [puppet] - https://gerrit.wikimedia.org/r/561654 (https://phabricator.wikimedia.org/T237643)
[11:28:37] Operations, DBA: backup2001 crashed 2019-12-08 - https://phabricator.wikimedia.org/T240177 (Marostegui) Looks like this host crashed again? ` [11:23:13] <+icinga-wm> PROBLEM - Host backup2001 is DOWN: PING CRITICAL - Packet loss = 100% [11:23:39] <+icinga-wm> RECOVERY - Host backup2001 is UP: PING OK -...
[11:28:55] (PS4) Jbond: etcd: add cert parameter to enable client auth [puppet] - https://gerrit.wikimedia.org/r/561818 (https://phabricator.wikimedia.org/T240941)
[11:29:08] (PS2) Jbond: etcd: remove username/password [puppet] - https://gerrit.wikimedia.org/r/561819 (https://phabricator.wikimedia.org/T240941)
[11:29:52] Operations, DBA: backup2001 crashed 2019-12-08 - https://phabricator.wikimedia.org/T240177 (MoritzMuehlenhoff) There's nothing which indicates a cause of crash in SEL, syslog or kernel logs.
[11:31:18] Operations, DBA: backup2001 crashed 2019-12-08 - https://phabricator.wikimedia.org/T240177 (MoritzMuehlenhoff) I think this bug is just another case of T238305
[11:31:21] (CR) jerkins-bot: [V: -1] etcd: add cert parameter to enable client auth [puppet] - https://gerrit.wikimedia.org/r/561818 (https://phabricator.wikimedia.org/T240941) (owner: Jbond)
[11:31:50] (CR) jerkins-bot: [V: -1] etcd: remove username/password [puppet] - https://gerrit.wikimedia.org/r/561819 (https://phabricator.wikimedia.org/T240941) (owner: Jbond)
[11:31:53] (PS2) Ema: varnish: update 16-x-connection-properties.vtc [puppet] - https://gerrit.wikimedia.org/r/561828 (https://phabricator.wikimedia.org/T241653)
[11:35:09] (PS5) Jbond: etcd: add cert parameter to enable client auth [puppet] - https://gerrit.wikimedia.org/r/561818 (https://phabricator.wikimedia.org/T240941)
[11:39:51] Operations, DBA: backup2001 crashed 2019-12-08 - https://phabricator.wikimedia.org/T240177 (jcrespo) a:jcrespo→Papaul @Papaul could you proceed with T240177#5727654 as this is the 3rd crash, and the second since firmware upgrade.
[11:40:14] (PS7) Arturo Borrero Gonzalez: toolforge: new k8s: deploy cadvisor.yaml [puppet] - https://gerrit.wikimedia.org/r/561654 (https://phabricator.wikimedia.org/T237643)
[11:41:44] (PS3) Ema: varnish: update 16-x-connection-properties.vtc [puppet] - https://gerrit.wikimedia.org/r/561828 (https://phabricator.wikimedia.org/T241653)
[11:43:50] (CR) Arturo Borrero Gonzalez: [C: +2] toolforge: new k8s: deploy cadvisor.yaml (1 comment) [puppet] - https://gerrit.wikimedia.org/r/561654 (https://phabricator.wikimedia.org/T237643) (owner: Arturo Borrero Gonzalez)
[11:44:25] (CR) Ema: [C: +2] varnish: update 16-x-connection-properties.vtc [puppet] - https://gerrit.wikimedia.org/r/561828 (https://phabricator.wikimedia.org/T241653) (owner: Ema)
[11:44:51] (PS6) Jbond: etcd: add parameter type checking and clean up [puppet] - https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941)
[11:46:10] Operations, Traffic, Patch-For-Review: two failing upload VTC tests - https://phabricator.wikimedia.org/T241653 (ema) Open→Resolved a:ema ` [*] Finding cluster... cp3051.esams.wmnet is a cache_upload host [*] Running varnishtest (this might take a while)... sudo varnishte...
[11:49:53] (03CR) 10Jbond: "lgtm (minor typo)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/561814 (owner: 10Muehlenhoff) [11:49:57] (03CR) 10Jbond: [C: 03+1] Switch cergen to apt::package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/561814 (owner: 10Muehlenhoff) [11:57:44] (03PS7) 10Jbond: etcd: add parameter type checking and clean up [puppet] - 10https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941) [11:57:54] (03PS1) 10Arturo Borrero Gonzalez: toolforge: prometheus: collect metrics from cadvisor in the new k8s cluster [puppet] - 10https://gerrit.wikimedia.org/r/561831 (https://phabricator.wikimedia.org/T237643) [11:59:24] (03CR) 10Muehlenhoff: Switch cergen to apt::package_from_component (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/561814 (owner: 10Muehlenhoff) [12:00:08] (03PS2) 10Muehlenhoff: Switch cergen to apt::package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/561814 [12:00:18] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: prometheus: collect metrics from cadvisor in the new k8s cluster [puppet] - 10https://gerrit.wikimedia.org/r/561831 (https://phabricator.wikimedia.org/T237643) (owner: 10Arturo Borrero Gonzalez) [12:00:25] (03CR) 10Jbond: [C: 03+1] Switch cergen to apt::package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/561814 (owner: 10Muehlenhoff) [12:01:18] (03PS8) 10Jbond: etcd: add parameter type checking and clean up [puppet] - 10https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941) [12:03:31] (03PS7) 10Jbond: etcd: enable ssl validation [puppet] - 10https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941) [12:09:03] (03PS6) 10Jbond: etcd: add cert parameter to enable client auth [puppet] - 10https://gerrit.wikimedia.org/r/561818 (https://phabricator.wikimedia.org/T240941) [12:09:29] (03PS3) 10Jbond: etcd: remove username/password [puppet] - 10https://gerrit.wikimedia.org/r/561819 
(https://phabricator.wikimedia.org/T240941) [12:11:33] (03CR) 10Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/20172/" [puppet] - 10https://gerrit.wikimedia.org/r/561816 (https://phabricator.wikimedia.org/T240941) (owner: 10Jbond) [12:19:29] (03PS2) 10Muehlenhoff: Switch snapshot100[89] to standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/561593 (https://phabricator.wikimedia.org/T156955) [12:22:18] (03PS3) 10Vgutierrez: ATS: Trigger update-ocsp-all iff non acme-chief certs are deployed [puppet] - 10https://gerrit.wikimedia.org/r/553526 [12:25:13] (03CR) 10Vgutierrez: "rebased, pcc still looks happy: https://puppet-compiler.wmflabs.org/compiler1002/20173/" [puppet] - 10https://gerrit.wikimedia.org/r/553526 (owner: 10Vgutierrez) [12:28:05] (03CR) 10Jbond: [C: 03+1] "LGTM however note it also effects archiva1001.wikimedia.org via archiva::proxy" [puppet] - 10https://gerrit.wikimedia.org/r/551396 (https://phabricator.wikimedia.org/T238518) (owner: 10Vgutierrez) [12:28:39] (03PS1) 10Muehlenhoff: Switch DNS servers and contemporary LVSes to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/561837 (https://phabricator.wikimedia.org/T156955) [12:29:56] (03CR) 10jerkins-bot: [V: 04-1] Switch DNS servers and contemporary LVSes to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/561837 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [12:30:24] (03PS2) 10Vgutierrez: ssl_ciphersuite: Allow TLSv1/TLSv1.1 in compat mode only [puppet] - 10https://gerrit.wikimedia.org/r/551396 (https://phabricator.wikimedia.org/T238518) [12:31:14] (03PS2) 10Muehlenhoff: Switch DNS servers and contemporary LVSes to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/561837 (https://phabricator.wikimedia.org/T156955) [12:36:49] (03PS1) 10Arturo Borrero Gonzalez: toolforge: prometheus: fix label config for cadvisor metrics [puppet] - 10https://gerrit.wikimedia.org/r/561839 
(https://phabricator.wikimedia.org/T237643) [12:42:59] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me, archiva fragments might be fetched from JREs with outdated TLS, but we'll find out only by merging :-)" [puppet] - 10https://gerrit.wikimedia.org/r/551396 (https://phabricator.wikimedia.org/T238518) (owner: 10Vgutierrez) [12:52:59] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: prometheus: fix label config for cadvisor metrics [puppet] - 10https://gerrit.wikimedia.org/r/561839 (https://phabricator.wikimedia.org/T237643) (owner: 10Arturo Borrero Gonzalez) [12:55:04] (03PS7) 10Jbond: etcd: add cert parameter to enable client auth [puppet] - 10https://gerrit.wikimedia.org/r/561818 (https://phabricator.wikimedia.org/T240941) [12:59:40] (03PS8) 10Jbond: etcd: add cert parameter to enable client auth [puppet] - 10https://gerrit.wikimedia.org/r/561818 (https://phabricator.wikimedia.org/T240941) [13:14:38] 10Operations, 10Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (10Marostegui) [13:16:56] (03CR) 10ArielGlenn: [C: 03+1] "Fine by me for the labstore boxes, but Bstorm should have final sign-off." 
[puppet] - https://gerrit.wikimedia.org/r/551396 (https://phabricator.wikimedia.org/T238518) (owner: Vgutierrez)
[13:21:19] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/561286 (https://phabricator.wikimedia.org/T241304) (owner: Majavah)
[13:27:39] (PS1) ArielGlenn: exclude 10wikipedia from dump stats generation [puppet] - https://gerrit.wikimedia.org/r/561849
[13:27:47] (CR) Filippo Giunchedi: [C: +1] Switch DNS servers and contemporary LVSes to standard Partman recipes [puppet] - https://gerrit.wikimedia.org/r/561837 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[13:30:24] (CR) Zoranzoki21: Rearrange of wmgEnableGeoData (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/561658 (owner: Zoranzoki21)
[13:31:03] (CR) Zoranzoki21: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/561658 (owner: Zoranzoki21)
[13:32:46] (CR) ArielGlenn: [C: +2] exclude 10wikipedia from dump stats generation [puppet] - https://gerrit.wikimedia.org/r/561849 (owner: ArielGlenn)
[13:35:06] (PS1) Jbond: profile::tlsproxy::envoy: add type checking and defaults [puppet] - https://gerrit.wikimedia.org/r/561850
[13:36:48] (PS2) Jbond: profile::tlsproxy::envoy: add type checking and defaults [puppet] - https://gerrit.wikimedia.org/r/561850
[13:36:57] (CR) jerkins-bot: [V: -1] profile::tlsproxy::envoy: add type checking and defaults [puppet] - https://gerrit.wikimedia.org/r/561850 (owner: Jbond)
[13:38:46] (CR) jerkins-bot: [V: -1] profile::tlsproxy::envoy: add type checking and defaults [puppet] - https://gerrit.wikimedia.org/r/561850 (owner: Jbond)
[13:42:00] (CR) Phamhi: [C: +1] "I just have two nitpick comments but otherwise LGTM."
(2 comments) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/561730 (owner: BryanDavis)
[13:45:04] (CR) Phamhi: [C: +2] toolforge: replace diamond redis monitoring with prometheus [puppet] - https://gerrit.wikimedia.org/r/561437 (https://phabricator.wikimedia.org/T210993) (owner: BryanDavis)
[13:45:19] (CR) Phamhi: [C: +1] "LGTM too as well." [puppet] - https://gerrit.wikimedia.org/r/561437 (https://phabricator.wikimedia.org/T210993) (owner: BryanDavis)
[13:46:04] !log restarting exim on MXes to pick up SASL security update
[13:46:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:48] !log Deploy schema change on s4 codfw (lag will appear on codfw s4) - T234052
[13:50:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:51] T234052: Add abuse_filter_log.afl_filter_id and afl_global columns - https://phabricator.wikimedia.org/T234052
[13:54:24] (CR) ArielGlenn: [C: +1] "Should be fine."
[puppet] - https://gerrit.wikimedia.org/r/561593 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[13:58:24] (PS3) Jbond: profile::tlsproxy::envoy: add type checking and defaults [puppet] - https://gerrit.wikimedia.org/r/561850
[13:59:39] (PS3) Muehlenhoff: Switch snapshot100[89] to standard recipe [puppet] - https://gerrit.wikimedia.org/r/561593 (https://phabricator.wikimedia.org/T156955)
[14:02:52] (PS1) Jbond: add schema private key [labs/private] - https://gerrit.wikimedia.org/r/561851
[14:03:35] (CR) Jbond: [V: +2 C: +2] add schema private key [labs/private] - https://gerrit.wikimedia.org/r/561851 (owner: Jbond)
[14:04:03] (CR) Muehlenhoff: [C: +2] Switch snapshot100[89] to standard recipe [puppet] - https://gerrit.wikimedia.org/r/561593 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[14:05:37] (PS4) Jbond: profile::tlsproxy::envoy: add type checking and defaults [puppet] - https://gerrit.wikimedia.org/r/561850
[14:10:51] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 17649224 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[14:11:31] (PS1) Muehlenhoff: Deprecate raid1-lvm-ext4-srv-dualboot.cfg [puppet] - https://gerrit.wikimedia.org/r/561852 (https://phabricator.wikimedia.org/T156955)
[14:11:44] (CR) Ammarpad: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/561576 (https://phabricator.wikimedia.org/T241705) (owner: Ammarpad)
[14:12:39] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 43608 and 47 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[14:15:37] !log Run undelete.php on a couple of pages at plwikisource per T241824
[14:15:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:40] T241824: Create "proofread-admin" user group
in plwikisource - https://phabricator.wikimedia.org/T241824
[14:24:31] (CR) Filippo Giunchedi: [C: +1] Deprecate raid1-lvm-ext4-srv-dualboot.cfg [puppet] - https://gerrit.wikimedia.org/r/561852 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[14:27:55] (PS5) Jbond: profile::tlsproxy::envoy: add type checking and defaults [puppet] - https://gerrit.wikimedia.org/r/561850 (https://phabricator.wikimedia.org/T240941)
[14:35:22] (PS1) Muehlenhoff: grafana: Switch to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561855
[14:36:13] (PS2) Muehlenhoff: grafana: Switch to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561855
[14:40:52] (PS1) Volans: eqiad: add missing mgmt asset tag records [dns] - https://gerrit.wikimedia.org/r/561856 (https://phabricator.wikimedia.org/T239597)
[14:40:53] (PS1) Volans: Fix network devices management records [dns] - https://gerrit.wikimedia.org/r/561857 (https://phabricator.wikimedia.org/T239597)
[14:43:24] (PS1) Ema: WIP: puppetvagrant [puppet] - https://gerrit.wikimedia.org/r/561858
[14:45:28] !log clean up old /etc/apt/preferences.d/facter.pref file
[14:45:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:24] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/20178/" [puppet] - https://gerrit.wikimedia.org/r/561855 (owner: Muehlenhoff)
[14:48:05] !log clean up old /etc/apt/preferences.d/puppet_all.pref file
[14:48:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:55] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/561855 (owner: Muehlenhoff)
[14:49:57] (PS1) Ema: cache: hiera setting to apply performance tweaks [puppet] - https://gerrit.wikimedia.org/r/561863
[14:52:44] (PS1) Muehlenhoff: elasticsearch::packages: Switch to apt::package_from_component [puppet] -
https://gerrit.wikimedia.org/r/561866
[14:53:17] (CR) Ema: "pcc seems fine: https://puppet-compiler.wmflabs.org/compiler1001/20179/" [puppet] - https://gerrit.wikimedia.org/r/561863 (owner: Ema)
[14:56:11] (PS1) Elukey: Set Buster for analytics1031 [puppet] - https://gerrit.wikimedia.org/r/561869 (https://phabricator.wikimedia.org/T231067)
[14:56:21] !log clean up old /etc/apt/preferences.d/smartmontools.pref file
[14:56:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:26] (CR) Elukey: [C: +2] Set Buster for analytics1031 [puppet] - https://gerrit.wikimedia.org/r/561869 (https://phabricator.wikimedia.org/T231067) (owner: Elukey)
[14:56:33] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/20180/" [puppet] - https://gerrit.wikimedia.org/r/561866 (owner: Muehlenhoff)
[14:57:04] Operations, Puppet, Patch-For-Review, User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (jbond) >>! In T214605#5756945, @jbond wrote: > I have generated a list of managed sources files with the following snippet > > ` > lang=python > from pypuppetdb import connec...
[14:58:08] !log Deploy schema changes on s2 and s4 eqiad hosts T234052
[14:58:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:58:11] T234052: Add abuse_filter_log.afl_filter_id and afl_global columns - https://phabricator.wikimedia.org/T234052
[15:02:45] Operations, Puppet, Patch-For-Review, User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (jbond) remove the followinf old files * /etc/apt/preferences.d/puppet_all.pref * /etc/apt/preferences.d/facter.pref * /etc/apt/preferences.d/smartmontools.pref The followi...
[15:03:39] (PS1) Muehlenhoff: spicerack: Switch to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561871
[15:04:21] !log Upgrade db2107
[15:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:13] (CR) CDanis: [C: +1] "thanks!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/561855 (owner: Muehlenhoff)
[15:05:44] (CR) Volans: [C: +2] "Double checked them one by one" [dns] - https://gerrit.wikimedia.org/r/561606 (https://phabricator.wikimedia.org/T239597) (owner: Volans)
[15:06:17] (CR) CDanis: [C: +1] Deprecate raid1-lvm-ext4-srv-dualboot.cfg [puppet] - https://gerrit.wikimedia.org/r/561852 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[15:06:26] (CR) Volans: [C: +2] Remove stale management records [dns] - https://gerrit.wikimedia.org/r/561679 (https://phabricator.wikimedia.org/T239597) (owner: Volans)
[15:07:26] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[15:07:28] (PS3) Muehlenhoff: grafana: Switch to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561855
[15:07:31] (CR) Muehlenhoff: grafana: Switch to apt::package_from_component (1 comment) [puppet] - https://gerrit.wikimedia.org/r/561855 (owner: Muehlenhoff)
[15:07:59] (CR) Herron: [C: +1] "LGTM, much neater!"
[puppet] - https://gerrit.wikimedia.org/r/561866 (owner: Muehlenhoff)
[15:12:03] (CR) Filippo Giunchedi: [C: +1] grafana: Switch to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561855 (owner: Muehlenhoff)
[15:12:50] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/20181/" [puppet] - https://gerrit.wikimedia.org/r/561871 (owner: Muehlenhoff)
[15:15:33] (CR) Herron: [C: +1] install_server: use raid10-8dev standard recipe [puppet] - https://gerrit.wikimedia.org/r/559548 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[15:19:08] (PS1) Muehlenhoff: Extend the docs for selecting packages to install from a component [puppet] - https://gerrit.wikimedia.org/r/561875
[15:26:24] (PS1) Jbond: apt::package_from_component: Don't ping if already covered [puppet] - https://gerrit.wikimedia.org/r/561876 (https://phabricator.wikimedia.org/T214605)
[15:30:48] (PS1) Ema: cache: rename cache::text_ats role to cache::text [puppet] - https://gerrit.wikimedia.org/r/561878 (https://phabricator.wikimedia.org/T241239)
[15:31:02] (CR) Gehel: [C: +1] "LGTM (I'm assuming that raid10-4dev does what it seems to do, partman has too much magic for me)" [puppet] - https://gerrit.wikimedia.org/r/559553 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[15:31:41] (CR) Gehel: "LGTM (I'm assuming that raid10-6dev does what it seems to do, partman has too much magic for me)" [puppet] - https://gerrit.wikimedia.org/r/559549 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[15:31:49] (CR) Gehel: [C: +1] install_server: use raid10-6dev standard recipe [puppet] - https://gerrit.wikimedia.org/r/559549 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[15:35:48] (CR) Ema: "pcc looks sane https://puppet-compiler.wmflabs.org/compiler1003/20182/" [puppet] - https://gerrit.wikimedia.org/r/561878
(https://phabricator.wikimedia.org/T241239) (owner: Ema)
[15:36:07] Operations, SRE-Access-Requests: Requesting access to EventLogging data for knissen - https://phabricator.wikimedia.org/T241838 (kai.nissen)
[15:37:09] (PS2) Jbond: apt::package_from_component: Don't ping if already covered [puppet] - https://gerrit.wikimedia.org/r/561876 (https://phabricator.wikimedia.org/T214605)
[15:40:25] Operations, Puppet, Patch-For-Review, User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (jbond) * remove /etc/apt/preferences.d/reprepro.pref from install servers as package is now in jessie-wikimedia/backports * remove /etc/apt/preferences.d/go.pref on kubeernete...
[15:41:52] (CR) Jhedden: [C: +1] install_server: deprecate raid10-gpt-srv-lvm-ext4.cfg [puppet] - https://gerrit.wikimedia.org/r/559553 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[15:42:54] Operations, Puppet, Patch-For-Review, User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (jbond) remaining ` (2) labstore[1004-1005].eqiad.wmnet ----- OUTPUT of 'ls -1 /etc/apt/p...|wi...
[15:43:23] (CR) Muehlenhoff: [C: +1] "Looks good!"
[puppet] - https://gerrit.wikimedia.org/r/561876 (https://phabricator.wikimedia.org/T214605) (owner: Jbond)
[15:44:56] (CR) Jbond: [C: +2] apt::package_from_component: Don't ping if already covered [puppet] - https://gerrit.wikimedia.org/r/561876 (https://phabricator.wikimedia.org/T214605) (owner: Jbond)
[15:45:33] (PS2) Volans: Fix network devices management records [dns] - https://gerrit.wikimedia.org/r/561857 (https://phabricator.wikimedia.org/T239597)
[15:50:18] Operations, Puppet, Patch-For-Review, User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (jbond) * remove python3{-,_}{ldap3,pyasn}.pref and nethogs.pref as package available in jessie-wikimedia/openstack-mitaka-jessie
[15:51:24] (PS2) Ema: cache: rename cache::text_ats role to cache::text [puppet] - https://gerrit.wikimedia.org/r/561878 (https://phabricator.wikimedia.org/T241239)
[15:51:26] (PS1) Ema: cache: make backend_services default to 'ats-be' [puppet] - https://gerrit.wikimedia.org/r/561881 (https://phabricator.wikimedia.org/T241239)
[15:52:29] Operations, Puppet, Patch-For-Review, User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (jbond) remaining unmanaged files under /etc/apt/prefrences.d/ ` ===== NODE GROUP =====...
[15:53:10] (PS1) Muehlenhoff: amd_rocm: Switch to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/561882
[15:58:30] (PS1) Ema: ATS: Deploy acme-chief version of unified certificate on text [puppet] - https://gerrit.wikimedia.org/r/561883 (https://phabricator.wikimedia.org/T234803)
[15:59:26] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/20183/" [puppet] - https://gerrit.wikimedia.org/r/561882 (owner: Muehlenhoff)
[16:02:00] (PS2) Ema: ATS: Deploy acme-chief version of unified certificate on text [puppet] - https://gerrit.wikimedia.org/r/561883 (https://phabricator.wikimedia.org/T234803)
[16:03:57] (CR) Muehlenhoff: "Updated PCC: https://puppet-compiler.wmflabs.org/compiler1002/20184/" [puppet] - https://gerrit.wikimedia.org/r/561871 (owner: Muehlenhoff)
[16:06:47] Operations, Puppet, Patch-For-Review, User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (jbond) * removed /etc/apt/preferences.d/t.pref and /etc/apt/preferences.d/t2.pref as pined version is no longer available * remove mitaka_stretch_nojessiebpo_clientpackages.pr...
[16:07:44] Operations, Puppet, Patch-For-Review, User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (jbond) only managed files now exist in /etc/apt/prefrences.d ` sudo cumin 'A:all' 'ls -1 /etc/apt/preferences.d/ | egrep -v "jessie_mitaka_pinning_python_keystoneclient.pref...
[16:09:50] (CR) Muehlenhoff: "Updated PCC: https://puppet-compiler.wmflabs.org/compiler1003/20185/" [puppet] - https://gerrit.wikimedia.org/r/561866 (owner: Muehlenhoff)
[16:09:51] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 53038472 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:10:50] (CR) Cwhite: [C: +1] "Looks good!"
[puppet] - https://gerrit.wikimedia.org/r/561866 (owner: Muehlenhoff)
[16:11:39] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 61824 and 94 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:11:51] (CR) Muehlenhoff: "Updated PCC: https://puppet-compiler.wmflabs.org/compiler1003/20186/" [puppet] - https://gerrit.wikimedia.org/r/561855 (owner: Muehlenhoff)
[16:13:34] (CR) Muehlenhoff: "Updated PCC: https://puppet-compiler.wmflabs.org/compiler1001/20187/" [puppet] - https://gerrit.wikimedia.org/r/561814 (owner: Muehlenhoff)
[16:17:02] (CR) Muehlenhoff: "Updated PCC: https://puppet-compiler.wmflabs.org/compiler1002/20188/" [puppet] - https://gerrit.wikimedia.org/r/561824 (owner: Muehlenhoff)
[16:17:16] (PS3) Jhedden: lvs ceph: add cloudceph service and cluster [puppet] - https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715)
[16:19:34] (PS1) Jbond: apt::pin: manage the apt preferences directory and purge unmanaged resources [puppet] - https://gerrit.wikimedia.org/r/561884 (https://phabricator.wikimedia.org/T214605)
[16:21:23] (CR) Muehlenhoff: apt::pin: manage the apt preferences directory and purge unmanaged resources (2 comments) [puppet] - https://gerrit.wikimedia.org/r/561884 (https://phabricator.wikimedia.org/T214605) (owner: Jbond)
[16:21:32] (CR) jerkins-bot: [V: -1] apt::pin: manage the apt preferences directory and purge unmanaged resources [puppet] - https://gerrit.wikimedia.org/r/561884 (https://phabricator.wikimedia.org/T214605) (owner: Jbond)
[16:27:13] (PS2) Jbond: apt::pin: manage the apt preferences directory and purge unmanaged resources [puppet] - https://gerrit.wikimedia.org/r/561884 (https://phabricator.wikimedia.org/T214605)
[16:27:16] (CR) Jbond: "Thanks updated" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/561884
(https://phabricator.wikimedia.org/T214605) (owner: Jbond)
[16:33:56] (CR) SBassett: [C: +1] "Seems fine from a security perspective, given the discussion b/w Lego and Bawolff." [puppet] - https://gerrit.wikimedia.org/r/561352 (https://phabricator.wikimedia.org/T241668) (owner: BryanDavis)
[16:34:20] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/561875 (owner: Muehlenhoff)
[16:35:29] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/561855 (owner: Muehlenhoff)
[16:36:05] !log stopping db2084
[16:36:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:00] Operations, LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (herron) Thanks for the update @Kris_Litson_WMDE @RStallman-legalteam is everything in order on your end?
[16:43:36] (CR) Elukey: [C: +1] "wow!" [puppet] - https://gerrit.wikimedia.org/r/561882 (owner: Muehlenhoff)
[16:49:29] (CR) Muehlenhoff: [C: +1] "Looks good to me!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/561884 (https://phabricator.wikimedia.org/T214605) (owner: Jbond)
[16:53:39] Operations, ops-codfw, DC-Ops, decommission: Decommission db2070.codfw.wmnet - https://phabricator.wikimedia.org/T239684 (Papaul) ` papaul@asw-c-codfw# show | compare [edit interfaces interface-range vlan-private1-c-codfw] - member ge-5/0/22; [edit interfaces interface-range disabled] me...
[16:54:49] (PS1) Arturo Borrero Gonzalez: toolforge: prometheus: fix regexp for cadvisor discovery [puppet] - https://gerrit.wikimedia.org/r/561887 (https://phabricator.wikimedia.org/T237643)
[16:56:42] (CR) jerkins-bot: [V: -1] toolforge: prometheus: fix regexp for cadvisor discovery [puppet] - https://gerrit.wikimedia.org/r/561887 (https://phabricator.wikimedia.org/T237643) (owner: Arturo Borrero Gonzalez)
[16:57:52] Operations, ops-codfw, DC-Ops, decommission: Decommission db2070.codfw.wmnet - https://phabricator.wikimedia.org/T239684 (Papaul)
[16:59:19] (PS2) Arturo Borrero Gonzalez: toolforge: prometheus: fix regexp for cadvisor discovery [puppet] - https://gerrit.wikimedia.org/r/561887 (https://phabricator.wikimedia.org/T237643)
[16:59:44] (CR) Jbond: [C: +2] apt::pin: manage the apt preferences directory and purge unmanaged resources [puppet] - https://gerrit.wikimedia.org/r/561884 (https://phabricator.wikimedia.org/T214605) (owner: Jbond)
[17:02:25] (CR) Arturo Borrero Gonzalez: [C: +2] toolforge: prometheus: fix regexp for cadvisor discovery [puppet] - https://gerrit.wikimedia.org/r/561887 (https://phabricator.wikimedia.org/T237643) (owner: Arturo Borrero Gonzalez)
[17:09:11] Operations, ops-eqiad, serviceops: rack/setup/install new eqiad mw systems - https://phabricator.wikimedia.org/T241849 (RobH) p:Triage→Normal
[17:09:14] Operations, ops-eqiad, serviceops: rack/setup/install new eqiad kubenetes systems - https://phabricator.wikimedia.org/T241850 (RobH) p:Triage→Normal
[17:09:31] (PS1) Arturo Borrero Gonzalez: toolforge: new k8s: give prometheus permission to read pod/proxy resources [puppet] - https://gerrit.wikimedia.org/r/561888 (https://phabricator.wikimedia.org/T237643)
[17:10:32] (CR) Jhedden: "PCC results: https://puppet-compiler.wmflabs.org/compiler1003/20190/" [puppet] - https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715)
(owner: Jhedden)
[17:10:58] (CR) Arturo Borrero Gonzalez: [C: +2] toolforge: new k8s: give prometheus permission to read pod/proxy resources [puppet] - https://gerrit.wikimedia.org/r/561888 (https://phabricator.wikimedia.org/T237643) (owner: Arturo Borrero Gonzalez)
[17:18:32] Operations, ops-codfw, DBA: Upgrade BIOS and firmware on db2084 - https://phabricator.wikimedia.org/T241103 (Papaul) a:Papaul→jcrespo Before BIOS Version 2.4.3 Firmware Version 2.40.40.40 After BIOS Version 2.11.0 Firmware Version 2.70.70.70 Upgrade complete I tried to start MYSQL wit...
[17:24:27] Operations, ops-codfw, DC-Ops, decommission: decommission db2065.codfw.wmnet - https://phabricator.wikimedia.org/T239046 (Papaul) ` papaul@asw-d-codfw# show | compare [edit interfaces interface-range vlan-private1-d-codfw] - member ge-6/0/13; [edit interfaces interface-range disabled] me...
[17:26:53] Operations, ops-codfw, DC-Ops, decommission: decommission db2065.codfw.wmnet - https://phabricator.wikimedia.org/T239046 (Papaul)
[17:28:30] (CR) BryanDavis: [C: -1] "I think that if we are going to have this as an option it should actually be a container that is only for static html and not including th" [software/tools-webservice] - https://gerrit.wikimedia.org/r/561753 (https://phabricator.wikimedia.org/T241817) (owner: Legoktm)
[17:30:49] (CR) Bstorm: [C: +1] ssl_ciphersuite: Allow TLSv1/TLSv1.1 in compat mode only [puppet] - https://gerrit.wikimedia.org/r/551396 (https://phabricator.wikimedia.org/T238518) (owner: Vgutierrez)
[17:35:36] (CR) Paladox: [C: +1] ssl_ciphersuite: Allow TLSv1/TLSv1.1 in compat mode only [puppet] - https://gerrit.wikimedia.org/r/551396 (https://phabricator.wikimedia.org/T238518) (owner: Vgutierrez)
[17:36:43] (PS1) Bstorm: toolforge-k8s: switch cluster monitor script to using a new clusterrole [puppet] - https://gerrit.wikimedia.org/r/561889 (https://phabricator.wikimedia.org/T241381)
[17:41:55] (CR) BryanDavis: [C: +1] toolforge-k8s: switch cluster monitor script to using a new clusterrole [puppet] - https://gerrit.wikimedia.org/r/561889 (https://phabricator.wikimedia.org/T241381) (owner: Bstorm)
[17:42:15] Operations, ops-codfw, serviceops: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (RobH) p:Triage→Normal
[17:42:17] (CR) Bstorm: [C: +2] toolforge-k8s: switch cluster monitor script to using a new clusterrole [puppet] - https://gerrit.wikimedia.org/r/561889 (https://phabricator.wikimedia.org/T241381) (owner: Bstorm)
[17:44:48] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db2084 instances T241103', diff saved to https://phabricator.wikimedia.org/P10035 and previous config saved to /var/cache/conftool/dbconfig/20200103-174447-jynus.json
[17:44:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:52] T241103: Upgrade BIOS and firmware on db2084 - https://phabricator.wikimedia.org/T241103
[17:54:10] Operations, ops-codfw, DBA: Upgrade BIOS and firmware on db2084 - https://phabricator.wikimedia.org/T241103 (jcrespo) a:jcrespo→Marostegui I have started mysql instances back again, and replication, as on codfw there is low load. I saw to minor issues @Marostegui , not important in this case...
[18:02:10] (PS1) Bstorm: toolforge-k8s: renamed files don't get changed [puppet] - https://gerrit.wikimedia.org/r/561891 (https://phabricator.wikimedia.org/T241381)
[18:03:48] (CR) Bstorm: [C: +2] toolforge-k8s: renamed files don't get changed [puppet] - https://gerrit.wikimedia.org/r/561891 (https://phabricator.wikimedia.org/T241381) (owner: Bstorm)
[18:04:01] Operations, LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (RStallman-legalteam) Yes, this is fully signed now. Thanks!
[18:04:05] (PS1) Elukey: graphite::wmcs::archiver: move warnings to /dev/null [puppet] - https://gerrit.wikimedia.org/r/561892
[18:10:39] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 17296528 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:12:27] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 10816 and 45 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:25:13] (CR) Bstorm: [C: -1] "One thing I just thought of: This doesn't up date the changelog. Can we get a debian/changelog update in this just to mark it (and allow " [software/tools-webservice] - https://gerrit.wikimedia.org/r/561262 (owner: Legoktm)
[18:38:21] Operations, LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (Dzahn) a:Dzahn
[18:53:25] Operations, ops-codfw: (No Need By Date Provided) codfw: rack/setup/install elastic20{55,56,57,58,59,60}.wikimedia.org - https://phabricator.wikimedia.org/T241337 (wiki_willy)
[18:54:13] Operations, ops-codfw, Core Platform Team: (No Need By Date Provided) rack/setup/install restbase202[123] - https://phabricator.wikimedia.org/T241790 (wiki_willy)
[18:57:17] (PS1) Dzahn: hieradata/labs: add common Hiera values for Phabricator staging [puppet] - https://gerrit.wikimedia.org/r/561897
[18:59:08] (CR) Dzahn: "These are supposed to be the ones that are the same for "stage" and "prod" instance, so phabricator_server and some others are missing on " [puppet] - https://gerrit.wikimedia.org/r/561897 (owner: Dzahn)
[19:00:42] Operations, ops-eqiad, serviceops: (No Need By Date Provided) rack/setup/install new eqiad mw systems - https://phabricator.wikimedia.org/T241849 (wiki_willy)
[19:01:22] Operations, ops-eqiad, serviceops: (No Need By Date
Provided) rack/setup/install new eqiad kubenetes systems - https://phabricator.wikimedia.org/T241850 (wiki_willy)
[19:13:41] Operations, LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (Dzahn) Hi @Franziska_Heine this request is ready to go but pending your approval. Best, Daniel
[19:15:52] Operations, LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (Dzahn) a:Dzahn→Franziska_Heine
[19:15:53] Operations, SRE-Access-Requests: Requesting access to EventLogging data for knissen - https://phabricator.wikimedia.org/T241838 (herron)
[19:15:55] Operations, LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (Dzahn) p:Triage→Normal
[19:21:10] (CR) Dzahn: [C: +2] ""cloud only" https://puppet-compiler.wmflabs.org/compiler1003/20191/" [puppet] - https://gerrit.wikimedia.org/r/561897 (owner: Dzahn)
[19:31:27] Operations, Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (wiki_willy) In going through all the affected systems in this task, I'd like to treat db2125 and backup2001 separately, since they seem like one-offs and could very well be hardware issues. (db2125...
[19:37:27] (CR) Paladox: [C: +1] hieradata/labs: add common Hiera values for Phabricator staging [puppet] - https://gerrit.wikimedia.org/r/561897 (owner: Dzahn)
[19:38:16] Operations, SRE-Access-Requests: Requesting access to EventLogging data for knissen - https://phabricator.wikimedia.org/T241838 (herron)
[19:42:56] Operations, Research, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users and researchers for Aroraakhil - https://phabricator.wikimedia.org/T241096 (herron) Hi @Nuria, a friendly ping/bump for approval on this.
Happy new year!
[19:43:28] Operations, SRE-Access-Requests: Requesting access to EventLogging data for knissen - https://phabricator.wikimedia.org/T241838 (herron) p:Triage→Normal
[19:43:46] Operations, cloud-services-team: Migrate Cloud VPS to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (herron) p:Triage→Normal
[19:44:07] Operations, ops-eqiad: Degraded RAID on cloudvirt1014 - https://phabricator.wikimedia.org/T241494 (herron) p:Triage→High
[19:44:27] Operations, Traffic: Add more detailed instructions to the "sec-advice" page - https://phabricator.wikimedia.org/T241309 (herron) p:Triage→Normal
[19:44:53] Operations: Track services without a native systemd unit - https://phabricator.wikimedia.org/T240843 (herron) p:Triage→Normal
[19:45:37] Operations, serviceops, Patch-For-Review: PHP Fatal error: Allowed memory size of 524288000 bytes exhausted (tried to allocate 20480 bytes) in /var/www/php-monitoring/lib.php on line 35 - https://phabricator.wikimedia.org/T240824 (herron) p:Triage→Normal
[20:07:59] (PS4) CRusnov: tools/import-mgmt-dns.py: General Improvements [software/netbox-extras] - https://gerrit.wikimedia.org/r/560367
[20:08:21] (PS5) CRusnov: tools/import-mgmt-dns.py: General Improvements [software/netbox-extras] - https://gerrit.wikimedia.org/r/560367
[20:08:39] (CR) CRusnov: [C: +2] "I'm self-merging since this has already been used in practice."
[software/netbox-extras] - https://gerrit.wikimedia.org/r/560367 (owner: CRusnov)
[20:08:41] (PS1) Dzahn: hieradata/labs: add Hiera key/values for phabricator cloud VPS [puppet] - https://gerrit.wikimedia.org/r/561898
[20:12:29] (PS2) Dzahn: hieradata/labs: add Hiera key/values for phabricator cloud VPS [puppet] - https://gerrit.wikimedia.org/r/561898
[20:15:45] Operations, LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (Dzahn) ` [mwmaint1002:~] $ /usr/bin/ldapsearch -x -b "dc=wikimedia,dc=org" -s sub "mail=kris.litson*" | grep uid dn: uid=krli,ou=people,dc=wikimedia,dc=org uid: krli ui...
[20:18:43] (PS1) Dzahn: admins: add Kris Litson to ldap_only admins (WMDE) [puppet] - https://gerrit.wikimedia.org/r/561899 (https://phabricator.wikimedia.org/T241722)
[20:21:57] (CR) Dzahn: [C: -1] "pending manager approval" [puppet] - https://gerrit.wikimedia.org/r/561899 (https://phabricator.wikimedia.org/T241722) (owner: Dzahn)
[20:33:32] Operations: investigate making 'notrack' the default on our ferm rules - https://phabricator.wikimedia.org/T240495 (herron) p:Triage→Normal
[20:34:18] Operations, DNS, Traffic: redirect non-existing wikimania2020.wikimedia.org to wikimania.wikimedia.org - https://phabricator.wikimedia.org/T240341 (herron) p:Triage→Normal
[20:42:54] (CR) Dzahn: [C: +2] ""cloud-only"" [puppet] - https://gerrit.wikimedia.org/r/561898 (owner: Dzahn)
[21:09:27] (PS1) Dzahn: hieradata/labs: move Hiera keys from role-based to common.yaml for devtools [puppet] - https://gerrit.wikimedia.org/r/561903
[21:16:08] (PS1) Dzahn: phabricator: simplify Hiera, remove duplicate values [puppet] - https://gerrit.wikimedia.org/r/561904
[21:28:05] (PS1) Papaul: DNS: Remove mgmt asset tag for db2065 and db2070 [dns] - https://gerrit.wikimedia.org/r/561907
[21:28:19] (PS1) Dzahn: hieradata/labs: change sshd_server
listening IP for phabricator-stage [puppet] - 10https://gerrit.wikimedia.org/r/561908 [21:31:40] (03PS2) 10Dzahn: hieradata/labs: change vcs listening IP for phabricator-stage [puppet] - 10https://gerrit.wikimedia.org/r/561908 [21:32:32] (03CR) 10Dzahn: [C: 03+2] hieradata/labs: change vcs listening IP for phabricator-stage [puppet] - 10https://gerrit.wikimedia.org/r/561908 (owner: 10Dzahn) [21:34:53] (03CR) 10Dzahn: [C: 03+2] hieradata/labs: move Hiera keys from role-based to common.yaml for devtools [puppet] - 10https://gerrit.wikimedia.org/r/561903 (owner: 10Dzahn) [22:00:32] (03PS1) 10Dzahn: delete unused fake SSL keys [labs/private] - 10https://gerrit.wikimedia.org/r/561909 [22:11:29] (03PS1) 10Dzahn: add fake SSL keys for phabricator devtools VPSes [labs/private] - 10https://gerrit.wikimedia.org/r/561911 [22:13:25] (03CR) 10Dzahn: [V: 03+2 C: 03+2] "labs/private" [labs/private] - 10https://gerrit.wikimedia.org/r/561911 (owner: 10Dzahn) [22:14:34] !log volker-e@deploy1001 Started deploy [design/style-guide@8054026]: Deploy design/style-guide: [22:14:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:14:42] !log volker-e@deploy1001 Finished deploy [design/style-guide@8054026]: Deploy design/style-guide: (duration: 00m 08s) [22:14:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:17:58] (03PS1) 10Dzahn: hieradata/labs: adjust TLS cert names for phabricator in cloud [puppet] - 10https://gerrit.wikimedia.org/r/561912 [22:23:12] (03CR) 10Dzahn: [C: 03+2] "noop - https://puppet-compiler.wmflabs.org/compiler1001/20192/" [puppet] - 10https://gerrit.wikimedia.org/r/561904 (owner: 10Dzahn) [22:27:03] (03CR) 10Dzahn: [C: 03+2] hieradata/labs: adjust TLS cert names for phabricator in cloud [puppet] - 10https://gerrit.wikimedia.org/r/561912 (owner: 10Dzahn) [22:45:15] (03CR) 10Papaul: [C: 03+2] DNS: Remove mgmt asset tag for db2065 and db2070 [dns] - 10https://gerrit.wikimedia.org/r/561907 (owner: 10Papaul) 
[22:46:26] Operations, ops-codfw, DC-Ops, decommission, Patch-For-Review: decommission db2065.codfw.wmnet - https://phabricator.wikimedia.org/T239046 (Papaul)
[22:46:32] Operations, DBA: Decommission db2043-db2070 - https://phabricator.wikimedia.org/T228258 (Papaul)
[22:46:34] Operations, ops-codfw, DC-Ops, decommission, Patch-For-Review: decommission db2065.codfw.wmnet - https://phabricator.wikimedia.org/T239046 (Papaul) Open→Resolved Complete
[22:47:22] Operations, ops-codfw, DC-Ops, decommission, Patch-For-Review: Decommission db2070.codfw.wmnet - https://phabricator.wikimedia.org/T239684 (Papaul)
[22:47:36] Operations, ops-codfw, DC-Ops, decommission, Patch-For-Review: Decommission db2070.codfw.wmnet - https://phabricator.wikimedia.org/T239684 (Papaul) Open→Resolved complete.
[22:54:56] (PS2) Volans: dns: include all IP addresses with FQDN [software/netbox-extras] - https://gerrit.wikimedia.org/r/561601 (https://phabricator.wikimedia.org/T233183)
[22:54:58] (PS2) Volans: dns: generate correct zone name in all cases [software/netbox-extras] - https://gerrit.wikimedia.org/r/561602 (https://phabricator.wikimedia.org/T233183)
[22:55:00] (PS2) Volans: dns: sort records by the rightmost part [software/netbox-extras] - https://gerrit.wikimedia.org/r/561603 (https://phabricator.wikimedia.org/T233183)
[22:55:02] (PS1) Volans: dns: manage also devices in Inventory state [software/netbox-extras] - https://gerrit.wikimedia.org/r/561917 (https://phabricator.wikimedia.org/T233183)
[22:55:04] (PS1) Volans: dns: manage separately servers from other devices [software/netbox-extras] - https://gerrit.wikimedia.org/r/561918 (https://phabricator.wikimedia.org/T233183)
[23:00:19] Operations, Mail: CA App Synthetic Monitor Mail (SMTP): Connection timed out; connect(): -2 - https://phabricator.wikimedia.org/T240906 (herron) >>! In T240906#5762100, @jcrespo wrote: > There has been multiple of mx1001 issues lately (even if that is unreliable, it is worth noting). My suggestion would...
[23:03:06] Operations, ops-eqiad, decommission, User-jijiki: Decommission rdb1001, rdb1002, rdb1003, rdb1004, rdb1007, rdb1008 - https://phabricator.wikimedia.org/T209181 (Papaul) ` papaul@asw2-c-eqiad# show | compare [edit interfaces] - ge-5/0/30 { - description rdb1008; - } ` ` papaul@asw2-c-eq...
[23:17:19] Operations, ops-eqiad, DC-Ops, Patch-For-Review: Hardware asset tag Netbox/DNS mgmt inconsistencies - https://phabricator.wikimedia.org/T239597 (Jclark-ctr)
[23:20:39] (PS1) Dzahn: phabricator: skip LVS-realserver setup in "labs" [puppet] - https://gerrit.wikimedia.org/r/561922
[23:21:37] Operations, ops-eqiad, DC-Ops, Patch-For-Review: Hardware asset tag Netbox/DNS mgmt inconsistencies - https://phabricator.wikimedia.org/T239597 (Jclark-ctr)
[23:23:51] (CR) Dzahn: [C: +2] "noop - https://puppet-compiler.wmflabs.org/compiler1003/20193/" [puppet] - https://gerrit.wikimedia.org/r/561922 (owner: Dzahn)
[23:24:10] Operations, ops-eqiad, DC-Ops, Patch-For-Review: Hardware asset tag Netbox/DNS mgmt inconsistencies - https://phabricator.wikimedia.org/T239597 (Jclark-ctr)
[23:25:06] (PS1) Papaul: DNS: Remove promethium from DNS [dns] - https://gerrit.wikimedia.org/r/561923
[23:25:08] Operations, ops-eqiad, DC-Ops, Patch-For-Review: Hardware asset tag Netbox/DNS mgmt inconsistencies - https://phabricator.wikimedia.org/T239597 (Jclark-ctr) Verified host asset tags updated ticket . Corrected labstore1007 in netbox.
[23:26:44] (CR) Dzahn: [C: +1] DNS: Remove promethium from DNS [dns] - https://gerrit.wikimedia.org/r/561923 (owner: Papaul)
[23:26:48] (CR) Papaul: [C: +2] DNS: Remove promethium from DNS [dns] - https://gerrit.wikimedia.org/r/561923 (owner: Papaul)
[23:28:53] (CR) Dzahn: "see one remnant left in install_server (DHCP)" [dns] - https://gerrit.wikimedia.org/r/561923 (owner: Papaul)
[23:29:16] Operations, ops-eqiad, DC-Ops, decommission, Patch-For-Review: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (Papaul)
[23:29:24] Operations, ops-eqiad, DC-Ops, decommission: Decommission old and unused/spare servers in eqiad - https://phabricator.wikimedia.org/T187473 (Papaul)
[23:29:27] Operations, ops-eqiad, DC-Ops, decommission, Patch-For-Review: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (Papaul) Open→Resolved Complete
[23:30:44] Operations, ops-eqiad, DC-Ops, decommission, Patch-For-Review: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (Dzahn) It is still in DHCP config.
[23:35:02] (PS1) Dzahn: DHCP: remove promethium [puppet] - https://gerrit.wikimedia.org/r/561924 (https://phabricator.wikimedia.org/T191362)
[23:36:40] (PS1) Volans: mgmt: fix asset tags based on the physical label [dns] - https://gerrit.wikimedia.org/r/561925 (https://phabricator.wikimedia.org/T239597)
[23:37:11] (CR) Dzahn: [C: +2] DHCP: remove promethium [puppet] - https://gerrit.wikimedia.org/r/561924 (https://phabricator.wikimedia.org/T191362) (owner: Dzahn)
[23:41:37] (PS1) Papaul: DHCP: Remove MAC address entry for promethium [puppet] - https://gerrit.wikimedia.org/r/561926 (https://phabricator.wikimedia.org/T191362)
[23:42:23] (PS2) Dzahn: DHCP: Remove MAC address entry for promethium [puppet] - https://gerrit.wikimedia.org/r/561926 (https://phabricator.wikimedia.org/T191362) (owner: Papaul)
[23:42:43] (CR) Dzahn: "already done in https://gerrit.wikimedia.org/r/c/operations/puppet/+/561924 so it rebased to nothing" [puppet] - https://gerrit.wikimedia.org/r/561926 (https://phabricator.wikimedia.org/T191362) (owner: Papaul)
[23:49:43] (PS1) Gergő Tisza: GrowthExperiments: use local search in production [mediawiki-config] - https://gerrit.wikimedia.org/r/561927 (https://phabricator.wikimedia.org/T235717)
[23:50:27] Operations, Domains, Traffic: nameserver change for wikimedia.sk - https://phabricator.wikimedia.org/T241084 (CRoslof) I've updated the nameservers. Let me know if there are any issues and you would like me to change them back.
[23:51:21] (Abandoned) Papaul: DHCP: Remove MAC address entry for promethium [puppet] - https://gerrit.wikimedia.org/r/561926 (https://phabricator.wikimedia.org/T191362) (owner: Papaul)