[00:07:33] (CR) Samwilson: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/651890 (https://phabricator.wikimedia.org/T255790) (owner: Samwilson) [01:01:51] PROBLEM - Postgres Replication Lag on maps2006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 3891481560 and 255 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:02:01] PROBLEM - Postgres Replication Lag on maps2010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2292106640 and 145 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:02:39] PROBLEM - Postgres Replication Lag on maps2008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1387632736 and 128 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:03:41] RECOVERY - Postgres Replication Lag on maps2010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 50832 and 106 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:04:19] RECOVERY - Postgres Replication Lag on maps2008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 242552 and 144 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:05:13] RECOVERY - Postgres Replication Lag on maps2006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 165744 and 197 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [02:45:23] PROBLEM - Ensure local MW versions match expected deployment on deploy1002 is CRITICAL: CRITICAL: Missing 5 sites from wikiversions. 957 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [02:48:23] PROBLEM - Ensure local MW versions match expected deployment on deploy2002 is CRITICAL: CRITICAL: Missing 5 sites from wikiversions. 
957 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [04:18:24] (PS1) Gergő Tisza: Alphabetize ORES settings [mediawiki-config] - https://gerrit.wikimedia.org/r/655301 (https://phabricator.wikimedia.org/T256887) [04:18:26] (PS1) Gergő Tisza: Enable ORES filters on ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/655302 (https://phabricator.wikimedia.org/T256887) [04:50:23] PROBLEM - SSH on logstash1008 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring [04:56:47] PROBLEM - ElasticSearch health check for shards on 9200 on logstash1008 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(requests.packages.urllib3.connection.HTTPConnection object at 0x7f57536b24e0: Failed to establish a new connection: [Errno 111] Connection [04:56:47] ://wikitech.wikimedia.org/wiki/Search%23Administration [04:56:55] RECOVERY - SSH on logstash1008 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [04:57:25] PROBLEM - Check systemd state on logstash1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:08:35] PROBLEM - Check nf_conntrack usage in neutron netns on cloudnet1004 is CRITICAL: CRITICAL: nf_conntrack usage over 80% in netns qrouter-d93771ba-2711-4f88-804a-8df6fd03978a https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [05:15:47] RECOVERY - Check systemd state on logstash1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:16:35] PROBLEM - cassandra service on maps1009 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [05:16:45] RECOVERY - ElasticSearch health check for shards on 9200 on logstash1008 is OK: OK - elasticsearch status production-logstash-eqiad: delayed_unassigned_shards: 0, unassigned_shards: 0, number_of_in_flight_fetch: 0, number_of_nodes: 6, active_shards: 916, number_of_data_nodes: 3, task_max_waiting_in_queue_millis: 0, cluster_name: production-logstash-eqiad, timed_out: False, number_of_pending_tasks: 0, status: green, active_shards_ [05:16:45] : 100.0, active_primary_shards: 483, relocating_shards: 0, initializing_shards: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration [05:16:45] PROBLEM - tilerator on maps1009 is CRITICAL: connect to address 10.64.32.8 and port 6534: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/tilerator [05:17:09] PROBLEM - cassandra CQL 10.64.32.8:9042 on maps1009 is CRITICAL: connect to address 10.64.32.8 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886 [05:17:11] PROBLEM - Check systemd state on maps1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:33:51] PROBLEM - tilerator on maps2007 is CRITICAL: connect to address 10.192.32.46 and port 6534: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/tilerator [05:34:55] PROBLEM - Check systemd state on maps2007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:36:27] PROBLEM - Long running screen/tmux on maps1009 is CRITICAL: CRIT: Long running SCREEN process. (user: root PID: 141381, 3843711s 1728000s). https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens [06:03:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13698 and previous config saved to /var/cache/conftool/dbconfig/20210111-060342-marostegui.json [06:03:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:04:14] !log Deploy schema change on s7 codfw master - T270187 [06:04:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:04:16] T270187: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 [06:04:26] !log Depool db1121 to clone db1155:3314 [06:04:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:05:38] (PS1) Marostegui: db1121: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/655304 [06:06:44] (CR) Marostegui: [C: +2] db1121: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/655304 (owner: Marostegui) [06:13:39] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [06:15:19] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All 
metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [06:21:00] (CR) Marostegui: "> Patch Set 35:" [puppet] - https://gerrit.wikimedia.org/r/627379 (https://phabricator.wikimedia.org/T260389) (owner: Bstorm) [06:31:25] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P13699 and previous config saved to /var/cache/conftool/dbconfig/20210111-063124-marostegui.json [06:31:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:31:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1094', diff saved to https://phabricator.wikimedia.org/P13700 and previous config saved to /var/cache/conftool/dbconfig/20210111-063155-marostegui.json [06:31:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:32:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P13701 and previous config saved to /var/cache/conftool/dbconfig/20210111-063226-marostegui.json [06:32:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:40:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13702 and previous config saved to /var/cache/conftool/dbconfig/20210111-064046-root.json [06:40:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:55:50] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13703 and previous config saved to /var/cache/conftool/dbconfig/20210111-065550-root.json [06:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:56:41] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1136', 
diff saved to https://phabricator.wikimedia.org/P13704 and previous config saved to /var/cache/conftool/dbconfig/20210111-065640-marostegui.json [06:56:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:58:28] (PS1) Marostegui: install_server: Do not reimage db1155 [puppet] - https://gerrit.wikimedia.org/r/655305 [06:59:11] (CR) Marostegui: [C: +2] install_server: Do not reimage db1155 [puppet] - https://gerrit.wikimedia.org/r/655305 (owner: Marostegui) [07:03:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13706 and previous config saved to /var/cache/conftool/dbconfig/20210111-070342-root.json [07:03:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:12:50] !log Deploy schema change on s8 codfw master - T270187 [07:12:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:12:53] T270187: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 [07:14:06] SRE, SRE-Access-Requests: Hue access for Peter Pelberg - https://phabricator.wikimedia.org/T271602 (Joe) p:Triage→Medium [07:16:35] SRE, SRE-Access-Requests: Hue access for Peter Pelberg - https://phabricator.wikimedia.org/T271602 (Joe) While we wait for the manager approval, I will loop in @Ottomata for analytics approval. @ppelberg please provide the *public* ssh key you generated specifically for accessing production services, so... 
[07:18:46] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13707 and previous config saved to /var/cache/conftool/dbconfig/20210111-071846-root.json [07:18:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:21:27] SRE, LDAP: Create auto-populated LDAP group of those who have production shell access - https://phabricator.wikimedia.org/T271587 (Joe) This would practically be a subset of `cn=wmf` + `cn=wmde` + `cn=nda`, which I thought already had access to klaxon, per `profile::klaxon`. Am I missing some detail? [07:23:20] SRE, SRE-Access-Requests: Hue access for Peter Pelberg - https://phabricator.wikimedia.org/T271602 (elukey) @ppelberg Hi! Are you planning to ssh to stat100x hosts to explore data via cli tools like hive/presto/etc.. or are you looking for a way to explore data via a UI? If so we have a new Analytics acc... [07:25:25] SRE, SRE-Access-Requests: Hue access for Peter Pelberg - https://phabricator.wikimedia.org/T271602 (Joe) >>! In T271602#6735021, @elukey wrote: > @ppelberg Hi! Are you planning to ssh to stat100x hosts to explore data via cli tools like hive/presto/etc.. or are you looking for a way to explore data via a... [07:27:11] SRE, serviceops: upgrade conf2* servers to stretch - https://phabricator.wikimedia.org/T271573 (Joe) @elukey we can do the same for etcd, as long as we don't use codfw for clients. So my proposal would be to: - change dns/configuration of every etcd client to point to eqiad - reimage one server at a tim... 
[07:27:16] SRE, serviceops: upgrade conf2* servers to stretch - https://phabricator.wikimedia.org/T271573 (Joe) p:Triage→High [07:33:50] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13708 and previous config saved to /var/cache/conftool/dbconfig/20210111-073349-root.json [07:33:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:35:17] SRE, serviceops: upgrade conf2* servers to stretch - https://phabricator.wikimedia.org/T271573 (elukey) Seems good! For Zookeeper it should be a matter of reimaging one node at the time, conf200x are already running the stretch version backported, so there shouldn't be any upgrade to do in theory. Data f... [07:41:02] !log depooling & restarting blazegraph on wdqs1007 (T242453) [07:41:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:41:05] T242453: Deadlock in blazegraph blocking all queries and updates - https://phabricator.wikimedia.org/T242453 [07:43:55] !log repool wdqs1007 (wrong machine) (T242453) [07:43:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:44:09] SRE, serviceops: upgrade conf2* servers to stretch - https://phabricator.wikimedia.org/T271573 (elukey) Also in the procedure we should remember to stop and downtime `etcdmirror` on conf2002 when we reimage the node. 
[07:44:18] (PS1) Effie Mouzeli: hiera: upgrade mc1030, mc2030 to buster [puppet] - https://gerrit.wikimedia.org/r/655372 (https://phabricator.wikimedia.org/T213089) [07:45:31] (PS1) Effie Mouzeli: hiera: upgrade mc1029, mc2029 to buster [puppet] - https://gerrit.wikimedia.org/r/655373 (https://phabricator.wikimedia.org/T21308) [07:46:35] (PS1) Effie Mouzeli: hiera: upgrade mc1028, mc2037 to buster [puppet] - https://gerrit.wikimedia.org/r/655374 (https://phabricator.wikimedia.org/T213089) [07:48:53] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13709 and previous config saved to /var/cache/conftool/dbconfig/20210111-074853-root.json [07:48:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:49:17] !log depooling & restarting blazegraph on wdqs2007 (T242453) [07:49:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:49:19] T242453: Deadlock in blazegraph blocking all queries and updates - https://phabricator.wikimedia.org/T242453 [07:49:41] PROBLEM - Query Service HTTP Port on wdqs2007 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 7.279 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [07:51:11] RECOVERY - Query Service HTTP Port on wdqs2007 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.024 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [07:53:28] (Abandoned) ZPapierski: Fix deployment for internal wdqs hosts [puppet] - https://gerrit.wikimedia.org/r/654597 (owner: ZPapierski) [07:55:41] (PS5) Muehlenhoff: Stop installing apt-transport-https on Buster and prune it from Stretch installs [puppet] - https://gerrit.wikimedia.org/r/646654 [07:57:41] (PS2) Effie Mouzeli: hiera: upgrade mc1029, mc2029 to buster [puppet] - https://gerrit.wikimedia.org/r/655373 
(https://phabricator.wikimedia.org/T213089) [08:01:37] (CR) Effie Mouzeli: [C: +2] hiera: upgrade mc1030, mc2030 to buster [puppet] - https://gerrit.wikimedia.org/r/655372 (https://phabricator.wikimedia.org/T213089) (owner: Effie Mouzeli) [08:03:33] SRE, Platform Engineering, Wikidata, serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc1030.eqiad.wmnet ` The log can be found i... [08:03:49] SRE, Platform Engineering, Wikidata, serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc2030.codfw.wmnet ` The log can be found i... [08:07:56] (CR) Giuseppe Lavagetto: Add typing support (3 comments) [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655031 (owner: Giuseppe Lavagetto) [08:08:04] (PS3) Giuseppe Lavagetto: Add typing support [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655031 [08:12:06] (PS2) David Caro: wmcs.backup_glance_images: disable the backups on 1003 and 1004 [puppet] - https://gerrit.wikimedia.org/r/655095 (https://phabricator.wikimedia.org/T270478) [08:16:28] SRE, docker-pkg, serviceops: Docker image on the build host seem to ignore apt priority for wikimedia packages - https://phabricator.wikimedia.org/T268612 (Joe) Open→Resolved [08:19:46] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc2030.codfw.wmnet with reason: REIMAGE [08:19:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:20:29] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1030.eqiad.wmnet with reason: REIMAGE [08:20:31] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:22:34] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2030.codfw.wmnet with reason: REIMAGE [08:22:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:22:50] (CR) JMeybohm: [C: +1] "Cool. LGTM!" [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655031 (owner: Giuseppe Lavagetto) [08:24:27] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1030.eqiad.wmnet with reason: REIMAGE [08:24:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:28:02] (CR) David Caro: "> Patch Set 1: Code-Review-1" [puppet] - https://gerrit.wikimedia.org/r/655095 (https://phabricator.wikimedia.org/T270478) (owner: David Caro) [08:31:41] (CR) Muehlenhoff: [C: +2] Stop installing apt-transport-https on Buster and prune it from Stretch installs [puppet] - https://gerrit.wikimedia.org/r/646654 (owner: Muehlenhoff) [08:33:43] (PS4) Muehlenhoff: Enable base::service_auto_restart for prometheus-es-exporter [puppet] - https://gerrit.wikimedia.org/r/647730 (https://phabricator.wikimedia.org/T135991) [08:35:44] (CR) Hashar: [C: +1] "Some random notes about setting tox / testenv but that can be tweaked later :]" (2 comments) [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655031 (owner: Giuseppe Lavagetto) [08:38:41] (CR) Muehlenhoff: [C: +2] Enable base::service_auto_restart for prometheus-es-exporter [puppet] - https://gerrit.wikimedia.org/r/647730 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff) [08:41:34] SRE, Platform Engineering, Wikidata, serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2030.codfw.wmnet'] ` and were **ALL** successful. 
[08:43:19] SRE, ops-eqiad: Please remove sdb from ms-be1022 - https://phabricator.wikimedia.org/T271512 (fgiunchedi) @wiki_willy that's correct yes, no need to replace the drive. The action in this case is to physically pull the disk since it isn't functional. The rationale being that when decom time comes all the... [08:45:33] (Abandoned) Muehlenhoff: Fix directive used in keyholder proxy [puppet] - https://gerrit.wikimedia.org/r/546440 (owner: Muehlenhoff) [08:46:04] (PS4) Muehlenhoff: Enable managed adduser/sysusers config also for WMCS [puppet] - https://gerrit.wikimedia.org/r/602286 (https://phabricator.wikimedia.org/T235162) [08:54:36] !log swift codfw-prod: more weight to ms-be20[58-61] - T269337 [08:55:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:55:11] T269337: Add ms-be20[58-61] to swift - https://phabricator.wikimedia.org/T269337 [08:58:27] PROBLEM - ElasticSearch health check for shards on 9200 on logstash1007 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(requests.packages.urllib3.connection.HTTPConnection object at 0x7fac796944e0: Failed to establish a new connection: [Errno 111] Connection [08:58:27] ://wikitech.wikimedia.org/wiki/Search%23Administration [09:01:08] SRE, CAS-SSO: Update CAS to 6.3 - https://phabricator.wikimedia.org/T271684 (MoritzMuehlenhoff) [09:02:06] !log force puppet on logstash1007 after ES OOM [09:02:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:03:07] RECOVERY - ElasticSearch health check for shards on 9200 on logstash1007 is OK: OK - elasticsearch status production-logstash-eqiad: number_of_in_flight_fetch: 0, active_shards: 916, number_of_pending_tasks: 0, timed_out: False, cluster_name: production-logstash-eqiad, number_of_nodes: 6, 
number_of_data_nodes: 3, delayed_unassigned_shards: 0, active_primary_shards: 483, relocating_shards: 0, initializing_shards: 0, task_max_waiti [09:03:07] s: 0, unassigned_shards: 0, status: green, active_shards_percent_as_number: 100.0 https://wikitech.wikimedia.org/wiki/Search%23Administration [09:22:25] (PS1) Muehlenhoff: Only install base packages list for Stretch and later [puppet] - https://gerrit.wikimedia.org/r/655381 [09:31:35] !log Deploy schema change on s1 codfw master - T270187 [09:31:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:31:38] T270187: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 [09:31:55] !log Sanitize db1155:3314 - T268742 [09:31:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:31:57] T268742: Test upgrading sanitarium hosts to Buster + 10.4 - https://phabricator.wikimedia.org/T268742 [09:33:35] SRE, docker-pkg, serviceops: docker-pkg update cli renders unclear guidance - https://phabricator.wikimedia.org/T253131 (Joe) a:Joe I agree @rzl, your proposal is probably the most user-friendly solution. Rationale here was I felt I had no need to replicate in every subparser the same last argume... [09:55:12] (CR) Muehlenhoff: Enable managed adduser/sysusers config also for WMCS (1 comment) [puppet] - https://gerrit.wikimedia.org/r/602286 (https://phabricator.wikimedia.org/T235162) (owner: Muehlenhoff) [09:55:18] SRE, Traffic, Patch-For-Review: ats-be occasional system CPU usage increase - https://phabricator.wikimedia.org/T265625 (ema) A low-hanging fruit when it comes to Lua overhead seems to be tuning the number of allowed Lua states. By looking at the internal tslua statistics on cp3050, it seems that mos... 
[09:56:35] (PS5) Muehlenhoff: Enable managed adduser/sysusers config also for WMCS [puppet] - https://gerrit.wikimedia.org/r/602286 (https://phabricator.wikimedia.org/T235162) [09:57:04] (CR) Ema: [V: +1 C: +2] ATS: make number of allowed Lua states configurable [puppet] - https://gerrit.wikimedia.org/r/655043 (https://phabricator.wikimedia.org/T265625) (owner: Ema) [09:59:07] SRE, LDAP: Create auto-populated LDAP group of those who have production shell access - https://phabricator.wikimedia.org/T271587 (MoritzMuehlenhoff) >>! In T271587#6735019, @Joe wrote: > This would practically be a subset of `cn=wmf` + `cn=wmde` + `cn=nda`, which I thought already had access to klaxon,... [09:59:42] (CR) Muehlenhoff: [C: +2] Enable managed adduser/sysusers config also for WMCS [puppet] - https://gerrit.wikimedia.org/r/602286 (https://phabricator.wikimedia.org/T235162) (owner: Muehlenhoff) [10:00:06] (PS2) David Caro: wmcs.sge.prometheus Retry getting the job count [puppet] - https://gerrit.wikimedia.org/r/655384 (https://phabricator.wikimedia.org/T271686) [10:00:08] (PS2) David Caro: wmcs.sge.prometheus: blacked and isorted [puppet] - https://gerrit.wikimedia.org/r/655385 [10:01:35] (CR) Muehlenhoff: "The patch for WMCS is now merged, but let's wait a few more days to land this." 
[puppet] - https://gerrit.wikimedia.org/r/644808 (https://phabricator.wikimedia.org/T235162) (owner: Jbond) [10:01:51] (CR) Ema: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27414/console" [puppet] - https://gerrit.wikimedia.org/r/655044 (https://phabricator.wikimedia.org/T265625) (owner: Ema) [10:03:55] (CR) Ema: [V: +1 C: +2] ATS: lower number of allowed Lua states on cp3050 [puppet] - https://gerrit.wikimedia.org/r/655044 (https://phabricator.wikimedia.org/T265625) (owner: Ema) [10:05:43] RECOVERY - Check nf_conntrack usage in neutron netns on cloudnet1004 is OK: OK: everything is apparently fine https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [10:06:16] (CR) Muehlenhoff: "Shouldn't we move the defined check in the base restart class instead?" [puppet] - https://gerrit.wikimedia.org/r/655383 (owner: Elukey) [10:06:22] !log cp3050: restart ats-be to lower lua states from 256 to 64 T265625 [10:06:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:25] T265625: ats-be occasional system CPU usage increase - https://phabricator.wikimedia.org/T265625 [10:12:09] (CR) Elukey: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/655383 (owner: Elukey) [10:15:38] (CR) Muehlenhoff: [C: +1] "Ack, makes sense." 
[puppet] - https://gerrit.wikimedia.org/r/655383 (owner: Elukey) [10:16:17] (CR) Muehlenhoff: [C: +2] Enable base::service_auto_restart for purged [puppet] - https://gerrit.wikimedia.org/r/646971 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff) [10:19:37] (PS1) Arturo Borrero Gonzalez: cloud: neutron: l3_agent: double size of conntrack table [puppet] - https://gerrit.wikimedia.org/r/655407 (https://phabricator.wikimedia.org/T271058) [10:20:25] (CR) Elukey: [C: +2] profile::hadoop::yarn_proxy_testcluster: guard auto-restart inclusion [puppet] - https://gerrit.wikimedia.org/r/655383 (owner: Elukey) [10:21:46] (CR) Arturo Borrero Gonzalez: [C: +2] cloud: neutron: l3_agent: double size of conntrack table [puppet] - https://gerrit.wikimedia.org/r/655407 (https://phabricator.wikimedia.org/T271058) (owner: Arturo Borrero Gonzalez) [10:29:12] jouncebot: now [10:29:12] No deployments scheduled for the next 1 hour(s) and 0 minute(s) [10:29:43] (CR) Urbanecm: [C: +2] Enable anniversary logo for cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/655292 (https://phabricator.wikimedia.org/T271662) (owner: Urbanecm) [10:30:32] (Merged) jenkins-bot: Enable anniversary logo for cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/655292 (https://phabricator.wikimedia.org/T271662) (owner: Urbanecm) [10:30:47] (PS2) David Caro: cloud.encapi: enable ssl nginx vhost [puppet] - https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) [10:32:02] (PS4) Jbond: P:toolforge: migrate to ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639826 (https://phabricator.wikimedia.org/T266479) [10:33:31] (CR) jerkins-bot: [V: -1] P:toolforge: migrate to ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639826 (https://phabricator.wikimedia.org/T266479) (owner: Jbond) [10:35:05] !log urbanecm@deploy1001 Synchronized 
static/images/project-logos/: ab9e80dad5c44ff72a6fa7568a5ba59798df3d4e: Enable anniversary logo for cs.wikipedia (T271662; 1/2) (duration: 01m 00s) [10:35:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:08] T271662: Enable anniversary logo for cs.wikipedia - 20th birthday - https://phabricator.wikimedia.org/T271662 [10:35:41] (PS5) Jbond: P:toolforge: migrate to ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639826 (https://phabricator.wikimedia.org/T266479) [10:36:17] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: ab9e80dad5c44ff72a6fa7568a5ba59798df3d4e: Enable anniversary logo for cs.wikipedia (T271662; 2/2) (duration: 00m 56s) [10:36:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:04] SRE, Wikimedia-Mailing-lists: wikipedia-mai & wikiur-l mail archives are empty after August 2018 & January 2019 respectively - https://phabricator.wikimedia.org/T270837 (Aklapper) @jayantanth: If this is a request to have active mailing list moderators/admins (not sure if it is?): Can you please find, a... [10:45:33] SRE, Wikimedia-Mailing-lists: wikipedia-mai & wikiur-l lists do not seem to have active list admins (mail archives empty after August 2018 & January 2019) - https://phabricator.wikimedia.org/T270837 (Aklapper) [10:46:17] SRE, Wikimedia-Mailing-lists: wikipedia-mai & wikiur-l lists do not seem to have active list admins (mail archives empty after August 2018 & January 2019) - https://phabricator.wikimedia.org/T270837 (Aklapper) p:Medium→Low @jbond: I don't understand how this is medium priority given that WMF cann... 
[10:49:20] (CR) Arturo Borrero Gonzalez: cloud.encapi: enable ssl nginx vhost (1 comment) [puppet] - https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) (owner: David Caro) [10:50:00] SRE, Wikimedia-Mailing-lists: wikipedia-mai & wikiur-l lists do not seem to have active list admins (mail archives empty after August 2018 & January 2019) - https://phabricator.wikimedia.org/T270837 (jbond) >>! In T270837#6735598, @Aklapper wrote: > @jbond: I don't understand how this is medium priority... [11:00:51] (PS1) Giuseppe Lavagetto: Fix exception raised by build process [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655409 (https://phabricator.wikimedia.org/T226728) [11:00:53] (PS1) Giuseppe Lavagetto: Fix UX of the argument parser [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655410 (https://phabricator.wikimedia.org/T253131) [11:00:55] (PS1) Giuseppe Lavagetto: Always refresh the base images [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655411 (https://phabricator.wikimedia.org/T219398) [11:00:57] (PS1) Giuseppe Lavagetto: Add ability to separate the apt and the general http proxy [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655412 (https://phabricator.wikimedia.org/T183545) [11:01:53] (CR) Giuseppe Lavagetto: [C: +2] Add typing support [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655031 (owner: Giuseppe Lavagetto) [11:02:31] (CR) jerkins-bot: [V: -1] Always refresh the base images [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655411 (https://phabricator.wikimedia.org/T219398) (owner: Giuseppe Lavagetto) [11:02:33] (CR) jerkins-bot: [V: -1] Fix UX of the argument parser [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655410 (https://phabricator.wikimedia.org/T253131) (owner: Giuseppe Lavagetto) [11:02:37] (CR) Jbond: [C: +2] P:toolforge: migrate to ensure_packages [puppet] 
- https://gerrit.wikimedia.org/r/639826 (https://phabricator.wikimedia.org/T266479) (owner: Jbond) [11:03:07] (CR) jerkins-bot: [V: -1] Add ability to separate the apt and the general http proxy [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655412 (https://phabricator.wikimedia.org/T183545) (owner: Giuseppe Lavagetto) [11:04:00] (Merged) jenkins-bot: Add typing support [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/655031 (owner: Giuseppe Lavagetto) [11:10:11] (CR) Faidon Liambotis: "FWIW my PR against apt and is in unstable (in time for the freeze!). So the http->https redirects will start to function in bullseye. The " [puppet] - https://gerrit.wikimedia.org/r/651300 (owner: Legoktm) [11:10:19] !log push change to ratelimit vscode-phabricator - T271528 [11:10:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:23] T271528: Excessive queries from vscode-phabricator - https://phabricator.wikimedia.org/T271528 [11:10:26] !log upgrade Routinator to 0.8.2 on rpki2001 - T269738 [11:10:28] (CR) Jbond: [C: +2] varnish: ratelimit vscode-phabricator plugin [puppet] - https://gerrit.wikimedia.org/r/650494 (https://phabricator.wikimedia.org/T270482) (owner: Jbond) [11:10:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:29] T269738: Upgrade Routinator 3000 to 0.8.2 - https://phabricator.wikimedia.org/T269738 [11:14:29] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job={pdu_sentry4,routinator} site={codfw,eqsin} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:17:30] (PS1) Elukey: tox: add support for Python 3.9 [cookbooks] - https://gerrit.wikimedia.org/r/655413 [11:17:32] (PS1) Elukey: sre.hadoop.change-distro-from-cdh: add success threshold for workers [cookbooks] - https://gerrit.wikimedia.org/r/655414 
[11:17:37] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:20:27] (03CR) 10Jbond: "Ready for review" [puppet] - 10https://gerrit.wikimedia.org/r/651174 (https://phabricator.wikimedia.org/T193762) (owner: 10Jbond) [11:21:40] (03CR) 10Jbond: "Ready for review" [puppet] - 10https://gerrit.wikimedia.org/r/651171 (https://phabricator.wikimedia.org/T270618) (owner: 10Jbond) [11:27:12] 10SRE, 10CAS-SSO: Update CAS to 6.3 - https://phabricator.wikimedia.org/T271684 (10MoritzMuehlenhoff) p:05Triage→03Medium [11:29:58] 10Puppet, 10SRE, 10puppet-compiler, 10Patch-For-Review, 10User-jbond: OKR: Work required to prepare for puppet 6 - https://phabricator.wikimedia.org/T265138 (10jbond) [11:30:04] jan_drewniak: It is that lovely time of the day again! You are hereby commanded to deploy Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210111T1130). 
[11:30:44] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655418 (https://phabricator.wikimedia.org/T128546) [11:34:19] (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655418 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [11:35:08] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655418 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [11:35:57] 10SRE, 10Performance-Team, 10Traffic: Enable webp thumbnails on all images for non-Commons wikis - https://phabricator.wikimedia.org/T269946 (10Gilles) p:05Triage→03Medium [11:37:50] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:655418| Bumping portals to master (T128546)]] (duration: 00m 56s) [11:37:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:56] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [11:38:54] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:655418| Bumping portals to master (T128546)]] (duration: 01m 03s) [11:38:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:24] (03CR) 10Volans: [C: 03+1] "LGTM, do they pass locally the tests? They are skipped in CI for now..." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/655413 (owner: 10Elukey) [11:42:36] (03PS1) 10Hnowlan: similar-users: rename sockpuppet-api [labs/private] - 10https://gerrit.wikimedia.org/r/655420 (https://phabricator.wikimedia.org/T268837) [11:44:10] (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/655414 (owner: 10Elukey) [11:47:06] 10Puppet, 10SRE, 10puppet-compiler, 10User-jbond: puppet master command will be removed in puppet 6 - https://phabricator.wikimedia.org/T236373 (10jbond) https://github.com/hlindberg/misc-puppet-docs/blob/PUP-6841_document-parser-api/parser_api/parser_api.md [11:47:29] (03CR) 10Hnowlan: [V: 03+2 C: 03+2] similar-users: rename sockpuppet-api [labs/private] - 10https://gerrit.wikimedia.org/r/655420 (https://phabricator.wikimedia.org/T268837) (owner: 10Hnowlan) [12:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do European mid-day backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210111T1200). [12:00:04] No GERRIT patches in the queue for this window AFAICS. [12:00:30] I'll deploy sth in a minute. [12:01:54] (03CR) 10Hnowlan: [C: 03+2] deployment: rename sockpuppet-api to similar-users [puppet] - 10https://gerrit.wikimedia.org/r/655047 (https://phabricator.wikimedia.org/T268837) (owner: 10Hnowlan) [12:02:59] (03CR) 10Marostegui: "+1 to push. 
Keep in mind that also even if puppet fails or compiles with an unexpected result, the proxies won't reload themselves, they n" [puppet] - 10https://gerrit.wikimedia.org/r/655174 (https://phabricator.wikimedia.org/T271476) (owner: 10Bstorm) [12:16:19] (03CR) 10Hnowlan: "The config in puppet and `private` has now been changed to refer to `similar-users` instead of `sockpuppet-api`" [deployment-charts] - 10https://gerrit.wikimedia.org/r/643721 (https://phabricator.wikimedia.org/T268837) (owner: 10Hnowlan) [12:19:07] PROBLEM - SSH on logstash1008 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring [12:19:25] 10Puppet, 10SRE, 10puppet-compiler, 10Patch-For-Review, 10User-jbond: OKR: Work required to prepare for puppet 6 - https://phabricator.wikimedia.org/T265138 (10jbond) [12:21:31] PROBLEM - ElasticSearch health check for shards on 9200 on logstash1008 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(requests.packages.urllib3.connection.HTTPConnection object at 0x7f6887de14e0: Failed to establish a new connection: [Errno 111] Connection [12:21:31] ://wikitech.wikimedia.org/wiki/Search%23Administration [12:21:35] PROBLEM - Check systemd state on logstash1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:22:19] RECOVERY - SSH on logstash1008 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [12:45:31] RECOVERY - Check systemd state on logstash1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:47:05] RECOVERY - ElasticSearch health check for shards on 9200 on logstash1008 is OK: OK - elasticsearch status production-logstash-eqiad: number_of_data_nodes: 3, number_of_in_flight_fetch: 0, unassigned_shards: 0, relocating_shards: 0, active_shards_percent_as_number: 100.0, delayed_unassigned_shards: 0, timed_out: False, number_of_pending_tasks: 0, cluster_name: production-logstash-eqiad, status: green, number_of_nodes: 6, active_pr [12:47:05] , active_shards: 916, task_max_waiting_in_queue_millis: 0, initializing_shards: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration [13:05:30] (03PS3) 10Alexandros Kosiaris: eventgate, eventstreams: Log with namedlevels [deployment-charts] - 10https://gerrit.wikimedia.org/r/594492 (https://phabricator.wikimedia.org/T239459) [13:07:07] (03CR) 10Alexandros Kosiaris: [C: 03+1] "I think this can be deployed just fine now. @ottomata, any objections?" [deployment-charts] - 10https://gerrit.wikimedia.org/r/594492 (https://phabricator.wikimedia.org/T239459) (owner: 10Alexandros Kosiaris) [13:12:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13711 and previous config saved to /var/cache/conftool/dbconfig/20210111-131206-root.json [13:12:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:10] (03PS2) 10Hnowlan: tegola: Add docker image. 
[docker-images/production-images] - 10https://gerrit.wikimedia.org/r/654662 (https://phabricator.wikimedia.org/T270170) [13:18:28] 10SRE, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (10LSobanski) @Jclark-ctr @Cmjohnson, what's a realistic ETA for completing the work on these servers? It would help us plan the next steps for this quarter. [13:22:24] 10SRE, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (10Jclark-ctr) @lsobanski I will be working on these this week. As long as nothing comes up urgent it should not be that long [13:27:10] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13712 and previous config saved to /var/cache/conftool/dbconfig/20210111-132709-root.json [13:27:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:29:18] 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398 (10Gilles) Awesome, glad to see that the bisecting paid off! Still 361 commits between those 2 versions, though 😕 [13:31:36] 10SRE, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (10LSobanski) Thanks! 
[13:38:56] (03PS1) 10Alexandros Kosiaris: Remove evenstreams role [puppet] - 10https://gerrit.wikimedia.org/r/655425 [13:42:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13713 and previous config saved to /var/cache/conftool/dbconfig/20210111-134213-root.json [13:42:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:44:47] (03CR) 10Elukey: "Yes it passes, but I'll also add the config for CI. It is worth to do it also for pywmflib, more reviews coming :)" [cookbooks] - 10https://gerrit.wikimedia.org/r/655413 (owner: 10Elukey) [13:45:36] (03PS1) 10Alexandros Kosiaris: ores: Switch from oresrdb.svc to host names [puppet] - 10https://gerrit.wikimedia.org/r/655426 (https://phabricator.wikimedia.org/T270071) [13:47:50] akosiaris: new year, but ores is always in your <3 [13:47:52] :) [13:52:07] !log installing curl security updates on stretch [13:52:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:15] !loog repool wdqs2007 [13:58:54] 10SRE, 10SRE-tools, 10serviceops-radar, 10Patch-For-Review: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (10akosiaris) >>! In T270071#6689398, @akosiaris wrote: >> * DNS Records with non-standard TTL. We have just one for oresrdb that has a 5M TTL instead of the default... 
[14:00:32] elukey: and if you think I am already in 2022, that means ORES will be around for all of 2021 for you :P [14:06:05] (03CR) 10Alexandros Kosiaris: [C: 04-1] "> Patch Set 3: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/610050 (https://phabricator.wikimedia.org/T256877) (owner: 10Muehlenhoff) [14:13:12] !log Deploy schema change on s3 codfw master - T270187 [14:13:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:13:16] T270187: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 [14:21:53] (03PS1) 10Ladsgroup: Add sources to specialSiteLinkGroups Wikibase setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655428 (https://phabricator.wikimedia.org/T138332) [14:22:46] !log restarting FPM/Apache on app server canaries for curl update [14:22:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:23:14] 10SRE, 10LDAP: Create auto-populated LDAP group of those who have production shell access - https://phabricator.wikimedia.org/T271587 (10jbond) >>! In T271587#6735366, @MoritzMuehlenhoff wrote: >>>! In T271587#6735019, @Joe wrote: >> This would practically be a subset of `cn=wmf` + `cn=wmde` + `cn=nda`, which... [14:30:17] 10SRE, 10SRE-Access-Requests: Hue access for Peter Pelberg - https://phabricator.wikimedia.org/T271602 (10Ottomata) Approved for LDAP and ssh if needed too. @ppelberg +1 to what Luca said, Superset's SQL Lab will probably be what you are looking for. [14:32:51] !log add Routinator 0.8.2 to APT repo - T269738 [14:32:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:54] T269738: Upgrade Routinator 3000 to 0.8.2 - https://phabricator.wikimedia.org/T269738 [14:39:44] 10SRE, 10vm-requests: esams/ulsfo/eqsin: 1 VM requested for bastions - https://phabricator.wikimedia.org/T271404 (10akosiaris) LGTM. 
[14:49:22] 10SRE, 10Dumps-Generation, 10SRE-Access-Requests, 10Platform Team Workboards (Clinic Duty Team): Add all of CPT to snapshot/dumpsdata admins - https://phabricator.wikimedia.org/T271718 (10ArielGlenn) p:05Triage→03Medium [14:50:00] (03PS6) 10ArielGlenn: add platform engineering folks to snapshot and dumpsdata server access [puppet] - 10https://gerrit.wikimedia.org/r/649077 (https://phabricator.wikimedia.org/T271718) [14:50:46] (03PS1) 10Marostegui: Revert "db1121: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/655388 [14:50:53] 10SRE, 10Dumps-Generation, 10SRE-Access-Requests, 10Patch-For-Review, 10Platform Team Workboards (Clinic Duty Team): Add all of CPT to snapshot/dumpsdata admins - https://phabricator.wikimedia.org/T271718 (10ArielGlenn) Adding @MoritzMuehlenhoff since he's involved in the process (thanks!) [14:51:20] (03PS1) 10DCausse: Revert "Disable sanity check cirrus jobs for Wikidata" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655389 [14:51:26] (03CR) 10Marostegui: [C: 03+2] Revert "db1121: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/655388 (owner: 10Marostegui) [14:52:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P13716 and previous config saved to /var/cache/conftool/dbconfig/20210111-145239-marostegui.json [14:52:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:45] (03PS2) 10DCausse: Revert "Disable sanity check cirrus jobs for Wikidata" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655389 (https://phabricator.wikimedia.org/T239931) [14:52:51] (03CR) 10Alexandros Kosiaris: [C: 04-1] Switch the base image to buster from stretch. 
(035 comments) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/615683 (owner: 10Giuseppe Lavagetto) [14:56:13] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1078 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13717 and previous config saved to /var/cache/conftool/dbconfig/20210111-145612-root.json [14:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:18] 10SRE, 10netops: Upgrade Routinator 3000 to 0.8.2 - https://phabricator.wikimedia.org/T269738 (10ayounsi) All done. [14:56:20] (03CR) 10Giuseppe Lavagetto: Switch the base image to buster from stretch. (032 comments) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/615683 (owner: 10Giuseppe Lavagetto) [14:56:51] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [14:58:31] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [14:59:11] (03CR) 10Elukey: [C: 03+2] tox: add support for Python 3.9 [cookbooks] - 10https://gerrit.wikimedia.org/r/655413 (owner: 10Elukey) [14:59:23] (03CR) 10Elukey: [C: 03+2] sre.hadoop.change-distro-from-cdh: add success threshold for workers [cookbooks] - 10https://gerrit.wikimedia.org/r/655414 (owner: 10Elukey) [15:00:04] Urbanecm and Amir1: How many deployers does it take to do Create new wikis deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210111T1500). 
[15:00:32] I need five minutes [15:04:04] done [15:05:45] !log jmm@cumin2001 START - Cookbook sre.ganeti.makevm [15:05:45] !log jmm@cumin2001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [15:05:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:50] (03CR) 10Jbond: [C: 03+2] cfssl_ocsprefresh: blank CR soliciting general python post-review [puppet] - 10https://gerrit.wikimedia.org/r/650120 (owner: 10Jbond) [15:06:02] !log jmm@cumin2001 START - Cookbook sre.ganeti.makevm [15:06:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:50] Amir1: I'm here [15:11:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1078 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13718 and previous config saved to /var/cache/conftool/dbconfig/20210111-151116-root.json [15:11:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:00] and now I'm fully here, I'm afraid I forgot to create the configs early through :/ [15:12:47] Amir1: so, creating the configs now :/ [15:13:27] 10SRE, 10MW-on-K8s, 10serviceops, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Pipeline): Deployment infrastructure for PHP microservices - https://phabricator.wikimedia.org/T261369 (10akosiaris) >>! In T261369#6707040, @Legoktm wrote: >>>! In T261369#6695002, @akosiaris wrote: >>>>! In T261... 
[15:13:29] oh it's fine, it's not too complex, we have three to create and two hours [15:15:14] (03PS2) 10Alexandros Kosiaris: Bump all helm_scaffold_versions to 0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/643057 [15:15:22] yeah, it's doable, and i did it dozens of times already :) [15:18:50] (03PS1) 10Urbanecm: Initial configuration for niawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655433 (https://phabricator.wikimedia.org/T270408) [15:19:43] (03PS2) 10Urbanecm: Initial configuration for niawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655433 (https://phabricator.wikimedia.org/T270408) [15:21:16] (03PS3) 10Urbanecm: Initial configuration for niawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655433 (https://phabricator.wikimedia.org/T270408) [15:21:19] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for niawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655433 (https://phabricator.wikimedia.org/T270408) (owner: 10Urbanecm) [15:22:09] (03PS4) 10Urbanecm: Initial configuration for niawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655433 (https://phabricator.wikimedia.org/T270408) [15:23:51] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Revert "Disable sanity check cirrus jobs for Wikidata" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655389 (https://phabricator.wikimedia.org/T239931) (owner: 10DCausse) [15:26:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1078 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13719 and previous config saved to /var/cache/conftool/dbconfig/20210111-152619-root.json [15:26:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:52] (03PS1) 10Urbanecm: Initial configuration for niawiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655436 (https://phabricator.wikimedia.org/T270409) [15:28:20] (03PS1) 10Ottomata: Migrate SpecialMuteSubmit to EventGate 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/655437 (https://phabricator.wikimedia.org/T268517) [15:29:28] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for niawiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655436 (https://phabricator.wikimedia.org/T270409) (owner: 10Urbanecm) [15:29:41] (03PS2) 10Urbanecm: Initial configuration for niawiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655436 (https://phabricator.wikimedia.org/T270409) [15:30:09] (03PS3) 10Urbanecm: Initial configuration for niawiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655436 (https://phabricator.wikimedia.org/T270409) [15:31:43] !log jmm@cumin2001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [15:31:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:04] (03CR) 10Andrew Bogott: [C: 03+1] wmcs.backup_glance_images: disable the backups on 1003 and 1004 [puppet] - 10https://gerrit.wikimedia.org/r/655095 (https://phabricator.wikimedia.org/T270478) (owner: 10David Caro) [15:32:05] !log upgrading python-thumbor-wikimedia to 2.9 on thumbor1001 [15:32:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:34:00] Urbanecm: ok if I deploy a small config change? [15:34:17] ottomata: go ahead if it's fast [15:34:19] ya [15:34:20] k [15:34:42] (03CR) 10Ottomata: [C: 03+2] Migrate SpecialMuteSubmit to EventGate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655437 (https://phabricator.wikimedia.org/T268517) (owner: 10Ottomata) [15:35:20] (03PS1) 10Urbanecm: Initial configuration for diqwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655439 (https://phabricator.wikimedia.org/T270275) [15:35:36] 10SRE, 10ops-codfw, 10DBA: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (10Papaul) The DIMM is on site, I will replace it tomorrow once onsite. 
[15:36:22] !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Migrate SpecialMuteSubmit to EventGate - T268517 (duration: 00m 58s) [15:36:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:25] T268517: Migrate Anti-Harassment EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T268517 [15:36:28] (03CR) 10David Caro: [C: 03+2] wmcs.backup_glance_images: disable the backups on 1003 and 1004 [puppet] - 10https://gerrit.wikimedia.org/r/655095 (https://phabricator.wikimedia.org/T270478) (owner: 10David Caro) [15:36:38] (03PS2) 10Effie Mouzeli: hiera: upgrade mc1028, mc2037 to buster [puppet] - 10https://gerrit.wikimedia.org/r/655374 (https://phabricator.wikimedia.org/T213089) [15:37:24] (03PS2) 10Urbanecm: Initial configuration for diqwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655439 (https://phabricator.wikimedia.org/T270275) [15:38:52] (03CR) 10Effie Mouzeli: [C: 03+2] hiera: upgrade mc1028, mc2037 to buster [puppet] - 10https://gerrit.wikimedia.org/r/655374 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli) [15:39:00] Urbanecm: done thank you [15:39:06] thanks [15:39:51] Amir1: we really have to figure out a better way to do the db-* files [15:39:56] 10SRE, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc1028.eqiad.wmnet ` The log can be found i... [15:40:09] 10SRE, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc2037.codfw.wmnet ` The log can be found i... 
[15:40:43] (03PS1) 10Ottomata: Finalize migration of SpecialMuteSubmit to event platform [puppet] - 10https://gerrit.wikimedia.org/r/655442 (https://phabricator.wikimedia.org/T268517) [15:40:53] Urbanecm: yeah, the yaml configs would fix it [15:41:06] !log andrew@deploy1001 Started deploy [striker/deploy@fb85bfd]: Striker deploy for T271621 [15:41:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:10] T271621: 500 Internal Server Error when creating new tool due to missing static assets - https://phabricator.wikimedia.org/T271621 [15:41:15] hopefully [15:41:19] we can also make the bot create the dblists patch too [15:41:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1078 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13720 and previous config saved to /var/cache/conftool/dbconfig/20210111-154123-root.json [15:41:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:07] if it makes the majority of the patch (let's say, all but logos but IS.php), could be helpful i guess [15:42:10] !log andrew@deploy1001 Finished deploy [striker/deploy@fb85bfd]: Striker deploy for T271621 (duration: 01m 04s) [15:42:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:30] (03PS1) 10Alexandros Kosiaris: Remove oresrdb.svc RRs [dns] - 10https://gerrit.wikimedia.org/r/655443 (https://phabricator.wikimedia.org/T270071) [15:42:42] (03CR) 10Ottomata: [C: 03+2] Finalize migration of SpecialMuteSubmit to event platform [puppet] - 10https://gerrit.wikimedia.org/r/655442 (https://phabricator.wikimedia.org/T268517) (owner: 10Ottomata) [15:43:31] (03PS1) 10Urbanecm: Initial configuration for bclwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655444 (https://phabricator.wikimedia.org/T270274) [15:43:55] anyway, configs are done, proceeding with the actual wiki creation [15:44:35] (03PS5) 10Urbanecm: Initial configuration for 
niawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655433 (https://phabricator.wikimedia.org/T270408) [15:44:40] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for niawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655433 (https://phabricator.wikimedia.org/T270408) (owner: 10Urbanecm) [15:44:45] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for bclwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655444 (https://phabricator.wikimedia.org/T270274) (owner: 10Urbanecm) [15:44:49] (03PS1) 10Ayounsi: Only configure relevant vlans on a device [homer/public] - 10https://gerrit.wikimedia.org/r/655445 [15:44:51] (03PS1) 10Ayounsi: Add new cloudsw switches [homer/public] - 10https://gerrit.wikimedia.org/r/655446 (https://phabricator.wikimedia.org/T251632) [15:45:44] !log andrew@deploy1001 Started deploy [striker/deploy@fb85bfd]: Striker deploy for T271621 [15:45:46] (03Merged) 10jenkins-bot: Initial configuration for niawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655433 (https://phabricator.wikimedia.org/T270408) (owner: 10Urbanecm) [15:45:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:20] (03PS2) 10Urbanecm: Initial configuration for bclwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655444 (https://phabricator.wikimedia.org/T270274) [15:46:57] (03CR) 10Alexandros Kosiaris: [C: 03+2] Remove evenstreams role [puppet] - 10https://gerrit.wikimedia.org/r/655425 (owner: 10Alexandros Kosiaris) [15:47:29] !log andrew@deploy1001 Finished deploy [striker/deploy@fb85bfd]: Striker deploy for T271621 (duration: 01m 45s) [15:47:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:47:32] T271621: 500 Internal Server Error when creating new tool due to missing static assets - https://phabricator.wikimedia.org/T271621 [15:47:46] !log andrew@deploy1001 Started deploy [striker/deploy@fb85bfd]: Striker deploy for T271621 [15:47:48] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:02] niawiki's db created [15:48:12] (03CR) 10Giuseppe Lavagetto: [C: 03+1] tox: Drop py27, py34, add py37 (031 comment) [software/service-checker] - 10https://gerrit.wikimedia.org/r/641969 (owner: 10Alexandros Kosiaris) [15:48:27] another wiki to patrol Urbanecm :P [15:48:28] !log andrew@deploy1001 Finished deploy [striker/deploy@fb85bfd]: Striker deploy for T271621 (duration: 00m 43s) [15:48:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:40] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Remove old trusty comment [software/service-checker] - 10https://gerrit.wikimedia.org/r/641789 (owner: 10Alexandros Kosiaris) [15:49:22] (03CR) 10Bstorm: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/655384 (https://phabricator.wikimedia.org/T271686) (owner: 10David Caro) [15:49:25] wiki works, going ot sync all the things [15:49:57] tabbycat: yeah yeah. [15:50:09] and another wiki for re.vi to maintain his userpage on :) [15:50:28] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating niawiki (T270408) (duration: 00m 56s) [15:50:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:30] T270408: Create Wikipedia Nias - https://phabricator.wikimedia.org/T270408 [15:51:26] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating niawiki (T270408) (duration: 00m 57s) [15:51:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:26] !log urbanecm@deploy1001 Synchronized dblists: Creating niawiki (T270408) (duration: 00m 57s) [15:52:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:43] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1028.eqiad.wmnet with reason: REIMAGE [15:53:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:59] !log urbanecm@deploy1001 rebuilt and synchronized 
wikiversions files: Creating niawiki (T270408) [15:54:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:15] RECOVERY - Ensure local MW versions match expected deployment on deploy2002 is OK: OKAY: Not alerting due to fresh production wikiversions: Missing 6 sites from wikiversions. 957 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [15:54:56] 10SRE, 10Goal, 10Patch-For-Review: FY2020-2021 Q1 DC switchover and switchback - https://phabricator.wikimedia.org/T243314 (10RLazarus) 05Open→03Resolved Yep! [15:54:58] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: Creating niawiki (T270408) (duration: 00m 56s) [15:55:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:00] !log andrew@deploy1001 Started deploy [striker/deploy@b2804f2]: Striker deploy for T271621 [15:55:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:02] T271621: 500 Internal Server Error when creating new tool due to missing static assets - https://phabricator.wikimedia.org/T271621 [15:55:15] 10SRE: FY2020-2021 Q1 eqiad -> codfw switchover - https://phabricator.wikimedia.org/T243316 (10RLazarus) 05Open→03Resolved [15:55:17] 10SRE, 10Goal, 10Patch-For-Review: FY2020-2021 Q1 DC switchover and switchback - https://phabricator.wikimedia.org/T243314 (10RLazarus) [15:55:25] 10SRE: FY2020-2021 Q1 codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (10RLazarus) 05Open→03Resolved [15:55:28] 10SRE, 10Goal, 10Patch-For-Review: FY2020-2021 Q1 DC switchover and switchback - https://phabricator.wikimedia.org/T243314 (10RLazarus) [15:55:36] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE [15:55:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:47] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1028.eqiad.wmnet with reason: REIMAGE [15:55:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:54] effie: two nodes left?? [15:55:54] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating niawiki (T270408) (duration: 00m 55s) [15:55:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:58] T270408: Create Wikipedia Nias - https://phabricator.wikimedia.org/T270408 [15:56:22] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Allow skipping cert verification [software/service-checker] - 10https://gerrit.wikimedia.org/r/641790 (https://phabricator.wikimedia.org/T259686) (owner: 10Alexandros Kosiaris) [15:56:30] (03PS4) 10Urbanecm: Initial configuration for niawiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655436 (https://phabricator.wikimedia.org/T270409) [15:56:36] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for niawiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655436 (https://phabricator.wikimedia.org/T270409) (owner: 10Urbanecm) [15:56:50] !log urbanecm@deploy1001 Synchronized langlist: Creating niawiki (T270408) (duration: 00m 53s) [15:56:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:05] !log andrew@deploy1001 Finished deploy [striker/deploy@b2804f2]: Striker deploy for T271621 (duration: 02m 05s) [15:57:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:21] RECOVERY - Ensure local MW versions match expected deployment on deploy1002 is OK: OKAY: Not alerting due to fresh production wikiversions: Missing 6 sites from wikiversions. 
957 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [15:57:31] (03Merged) 10jenkins-bot: Initial configuration for niawiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655436 (https://phabricator.wikimedia.org/T270409) (owner: 10Urbanecm) [15:57:43] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE [15:57:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:59] (03CR) 10Alexandros Kosiaris: "We probably want to reapply Ib32827c9433f362600643666f0156fed567fb3bc on top of this. Note btw that the lvm module causes licensing issues" [puppet] - 10https://gerrit.wikimedia.org/r/654216 (https://phabricator.wikimedia.org/T271099) (owner: 10Jbond) [15:58:02] (03CR) 10Hnowlan: [C: 03+1] Bump all helm_scaffold_versions to 0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/643057 (owner: 10Alexandros Kosiaris) [15:58:17] elukey: 2 nides left [15:58:19] nodes* [15:58:21] damn [15:58:37] * elukey dances [15:58:40] greatttttttttt [15:58:51] so tomorrow morning, we are concluding this odyssey [15:59:00] \o/ [15:59:14] elukey: moritz will buy us beers I hear [15:59:51] or I'm collecting a crate for every Q you've been slacking on these, not sure yet :-) [16:00:17] niawiktionary's DB is created, syncing it too [16:00:25] moritzm: lol lol [16:01:23] effie: next week Moritz will ask us to prep for Bullseye [16:01:29] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating niawiktionary (T270409) (duration: 00m 56s) [16:01:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:33] T270409: Create Wiktionary Nias - https://phabricator.wikimedia.org/T270409 [16:02:10] (03PS5) 10Jbond: cfssl: helper script [puppet] - 10https://gerrit.wikimedia.org/r/654418 [16:02:20] elukey: we will ask him to come back in 2022 [16:02:22] (03CR) 10Jbond: "thanks updated" (033 comments) [puppet] 
- 10https://gerrit.wikimedia.org/r/654418 (owner: 10Jbond) [16:02:27] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating niawiktionary (T270409) (duration: 00m 56s) [16:02:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:02:42] (03CR) 10jerkins-bot: [V: 04-1] cfssl: helper script [puppet] - 10https://gerrit.wikimedia.org/r/654418 (owner: 10Jbond) [16:03:27] !log urbanecm@deploy1001 Synchronized dblists: Creating niawiktionary (T270409) (duration: 00m 55s) [16:03:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:04:57] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: Creating niawiktionary (T270409) [16:05:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:05:58] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: Creating niawiktionary (T270409) (duration: 00m 56s) [16:06:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:33] 10SRE, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc1028.eqiad.wmnet'] ` and were **ALL** successful. 
[16:06:36] (03PS3) 10Urbanecm: Initial configuration for diqwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655439 (https://phabricator.wikimedia.org/T270275) [16:06:42] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for diqwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655439 (https://phabricator.wikimedia.org/T270275) (owner: 10Urbanecm) [16:06:59] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating niawiktionary (T270409) (duration: 00m 55s) [16:07:00] (03PS1) 10Muehlenhoff: Add bast3005 [puppet] - 10https://gerrit.wikimedia.org/r/655450 (https://phabricator.wikimedia.org/T257324) [16:07:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:02] T270409: Create Wiktionary Nias - https://phabricator.wikimedia.org/T270409 [16:07:45] (03Merged) 10jenkins-bot: Initial configuration for diqwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655439 (https://phabricator.wikimedia.org/T270275) (owner: 10Urbanecm) [16:10:57] diqwiktionary's DB is created, going to sync it [16:12:02] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating diqwiktionary (T270275) (duration: 00m 55s) [16:12:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:05] T270275: Create Wiktionary Zazaki - https://phabricator.wikimedia.org/T270275 [16:13:13] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating diqwiktionary (T270275) (duration: 00m 57s) [16:13:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:14:12] !log urbanecm@deploy1001 Synchronized dblists: Creating diqwiktionary (T270275) (duration: 00m 57s) [16:14:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:14:43] (03CR) 10SBassett: [C: 03+1] "Security Team rates this low risk for now." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/632598 (https://phabricator.wikimedia.org/T264834) (owner: 10Jforrester) [16:15:45] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: Creating diqwiktionary (T270275) [16:15:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:04] (03PS3) 10Urbanecm: Initial configuration for bclwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655444 (https://phabricator.wikimedia.org/T270274) [16:17:08] !log installing remaining p11-kit security updates on stretch [16:17:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:09] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for bclwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655444 (https://phabricator.wikimedia.org/T270274) (owner: 10Urbanecm) [16:17:22] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating diqwiktionary (T270275) (duration: 01m 34s) [16:17:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:25] T270275: Create Wiktionary Zazaki - https://phabricator.wikimedia.org/T270275 [16:18:31] (03Merged) 10jenkins-bot: Initial configuration for bclwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655444 (https://phabricator.wikimedia.org/T270274) (owner: 10Urbanecm) [16:18:49] !log andrew@deploy1001 Started deploy [striker/deploy@ba6c0ae]: Striker deploy for T271621 [16:18:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:18:54] T271621: 500 Internal Server Error when creating new tool due to missing static assets - https://phabricator.wikimedia.org/T271621 [16:19:34] 10SRE, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2037.codfw.wmnet'] ` and were **ALL** successful. 
[16:19:54] is logstash all right? Just got `16:17:06 Check 'Logstash Error rate for mw1264.eqiad.wmnet' failed: WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='logstash1009.eqiad.wmnet', port=9200): Read timed out. (read timeout=10)",)': /logstash-*/_search` [16:20:03] Amir1: any ideas? [16:20:36] 10SRE, 10WMF-NDA-Requests: Request from WMDE employee Amrutha - https://phabricator.wikimedia.org/T271725 (10amy_rc) [16:20:51] !log andrew@deploy1001 Finished deploy [striker/deploy@ba6c0ae]: Striker deploy for T271621 (duration: 02m 02s) [16:20:53] I want to see the logs but I realized the irony [16:20:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:21:41] Urbanecm: there might be a random network partition, can you try again [16:21:55] sure [16:21:55] if it persists, we'll ping observability team [16:22:01] syncing that one again [16:22:11] it apparently didn't hit the canary threshold (3/9) at the very least [16:22:53] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating diqwiktionary (T270275) (duration: 00m 56s) [16:22:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:22:55] no error this time, so seems just a random fluke [16:22:56] T270275: Create Wiktionary Zazaki - https://phabricator.wikimedia.org/T270275 [16:23:04] going to the last one! 
[16:24:00] (03PS3) 10David Caro: cloud.encapi: enable ssl nginx vhost [puppet] - 10https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) [16:24:15] db is created, syncing [16:24:27] (03CR) 10jerkins-bot: [V: 04-1] cloud.encapi: enable ssl nginx vhost [puppet] - 10https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) (owner: 10David Caro) [16:25:51] !log installing openldap security updates on stretch (client tools/libs only, all slapd installation on Buster and fixed already) [16:25:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:26:23] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating bclwiktionary (T270274) (duration: 00m 54s) [16:26:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:26:27] T270274: Create Wiktionary Bikol - https://phabricator.wikimedia.org/T270274 [16:29:18] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating bclwiktionary (T270274) (duration: 00m 55s) [16:29:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:24] !log urbanecm@deploy1001 Synchronized dblists: Creating bclwiktionary (T270274) (duration: 00m 55s) [16:30:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:12] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: Creating bclwiktionary (T270274) [16:32:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:15] T270274: Create Wiktionary Bikol - https://phabricator.wikimedia.org/T270274 [16:33:20] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: Creating bclwiktionary (T270274) (duration: 00m 56s) [16:33:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:34:23] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating bclwiktionary (T270274) (duration: 00m 56s) [16:34:25] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:34:36] so, that would be done Amir1 [16:34:42] just the cache update [16:34:46] Thank you Urbanecm ! [16:35:02] (03PS1) 10Urbanecm: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655454 [16:35:04] (03CR) 10Urbanecm: [C: 03+2] Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655454 (owner: 10Urbanecm) [16:35:51] (03Merged) 10jenkins-bot: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655454 (owner: 10Urbanecm) [16:37:01] 10SRE, 10ops-codfw, 10DC-Ops, 10RESTBase: restbase2009 reimaging issues - https://phabricator.wikimedia.org/T269853 (10hnowlan) [16:37:12] !log urbanecm@deploy1001 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 18s) [16:37:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:34] and also beta, I don't think someone did it in some time [16:38:08] (03PS4) 10David Caro: cloud.encapi: enable ssl nginx vhost [puppet] - 10https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) [16:38:29] 10SRE, 10SRE-tools, 10serviceops-radar, 10Patch-For-Review: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (10Volans) >>! In T270071#6736069, @akosiaris wrote: > If we could someway treat it like a node when generating the zones it would also solve the problem (I think).... [16:38:37] (03CR) 10jerkins-bot: [V: 04-1] cloud.encapi: enable ssl nginx vhost [puppet] - 10https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) (owner: 10David Caro) [16:39:39] (03CR) 10Volans: "Nice! Thanks for looking into this Alex. 
I have zero context on the Ores side to have a say in the change, but appreciate the removal of a" [dns] - 10https://gerrit.wikimedia.org/r/655443 (https://phabricator.wikimedia.org/T270071) (owner: 10Alexandros Kosiaris) [16:42:01] !log andrew@deploy1001 Started deploy [striker/deploy@3180f72]: Striker deploy for T271621 [16:42:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:42:05] T271621: 500 Internal Server Error when creating new tool due to missing static assets - https://phabricator.wikimedia.org/T271621 [16:42:19] (03CR) 10Volans: [C: 03+1] "Looks sane to me" [homer/public] - 10https://gerrit.wikimedia.org/r/655445 (owner: 10Ayounsi) [16:42:45] (03PS5) 10David Caro: cloud.encapi: enable ssl nginx vhost [puppet] - 10https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) [16:43:02] !log andrew@deploy1001 Finished deploy [striker/deploy@3180f72]: Striker deploy for T271621 (duration: 01m 01s) [16:43:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:16] (03CR) 10jerkins-bot: [V: 04-1] cloud.encapi: enable ssl nginx vhost [puppet] - 10https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) (owner: 10David Caro) [16:43:39] (03PS1) 10Urbanecm: Update interwiki cache for Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655462 [16:43:42] (03CR) 10Urbanecm: [C: 03+2] Update interwiki cache for Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655462 (owner: 10Urbanecm) [16:44:47] (03Merged) 10jenkins-bot: Update interwiki cache for Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/655462 (owner: 10Urbanecm) [16:48:05] !log Create new wiki window is completed [16:48:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:48:22] (03CR) 10Volans: [C: 03+1] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/654418 (owner: 10Jbond) [16:48:37] jouncebot: 
refresh [16:48:37] I refreshed my knowledge about deployments. [16:49:42] 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Kanban): (Need By: TBD) rack/setup/install cloudgw2001-dev.codfw.wmnet - https://phabricator.wikimedia.org/T271590 (10RobH) [16:50:36] (03PS1) 10Jbond: lvm: Always force vgremoval [puppet] - 10https://gerrit.wikimedia.org/r/655465 [16:51:24] 10SRE, 10Inuka-Team, 10SRE-Access-Requests, 10Security-Team, 10Product-Analytics (Kanban): Provide raw KaiOSAppFeedback data to Chelsea Riley for analysis - https://phabricator.wikimedia.org/T271202 (10nshahquinn-wmf) Thank you for giving detailed feedback so quickly, everyone! I was expecting that I wou... [16:52:21] (03CR) 10Jbond: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/654216 (https://phabricator.wikimedia.org/T271099) (owner: 10Jbond) [16:53:28] (03PS6) 10Jbond: cfssl: helper script [puppet] - 10https://gerrit.wikimedia.org/r/654418 [16:53:35] (03CR) 10Jbond: cfssl: helper script (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/654418 (owner: 10Jbond) [16:55:47] (03CR) 10Volans: [C: 03+1] "ship it!" [puppet] - 10https://gerrit.wikimedia.org/r/654418 (owner: 10Jbond) [16:56:22] (03CR) 10Jbond: [C: 03+2] cfssl: helper script [puppet] - 10https://gerrit.wikimedia.org/r/654418 (owner: 10Jbond) [17:00:13] 10SRE, 10Inuka-Team, 10SRE-Access-Requests, 10Security-Team, 10Product-Analytics (Kanban): Provide raw KaiOSAppFeedback data to Chelsea Riley for analysis - https://phabricator.wikimedia.org/T271202 (10Dzahn) I would suggest to use GPG to send an encrypted file as an email attachment. But then she still... 
[17:10:15] !log andrew@deploy1001 Started deploy [striker/deploy@b6441b8]: Striker deploy for T271621 [17:10:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:10:22] T271621: 500 Internal Server Error when creating new tool due to missing static assets - https://phabricator.wikimedia.org/T271621 [17:12:21] !log andrew@deploy1001 Finished deploy [striker/deploy@b6441b8]: Striker deploy for T271621 (duration: 02m 05s) [17:12:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:13:18] !log andrew@deploy1001 Started deploy [striker/deploy@b6441b8]: Striker deploy for T271621 [17:13:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:15:17] !log andrew@deploy1001 Finished deploy [striker/deploy@b6441b8]: Striker deploy for T271621 (duration: 01m 59s) [17:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:41] (03CR) 10Dzahn: [C: 03+1] visualdiff: Switch require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/655183 (https://phabricator.wikimedia.org/T266479) (owner: 10Legoktm) [17:19:38] (03PS1) 10David Caro: wmcs.backups: Fix leftover param when removing image backup [puppet] - 10https://gerrit.wikimedia.org/r/655474 [17:20:38] (03CR) 10Andrew Bogott: [C: 03+1] wmcs.backups: Fix leftover param when removing image backup [puppet] - 10https://gerrit.wikimedia.org/r/655474 (owner: 10David Caro) [17:20:58] 10SRE, 10Inuka-Team, 10SRE-Access-Requests, 10Security-Team, 10Product-Analytics (Kanban): Provide raw KaiOSAppFeedback data to Chelsea Riley for analysis - https://phabricator.wikimedia.org/T271202 (10Dzahn) also T269519 might be related [17:21:35] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] wmcs.backups: Fix leftover param when removing image backup [puppet] - 10https://gerrit.wikimedia.org/r/655474 (owner: 10David Caro) [17:28:20] (03PS2) 10David Caro: wmcs.backups: Fix missing param when removing image backup 
[puppet] - 10https://gerrit.wikimedia.org/r/655474 [17:28:33] 10SRE, 10Inuka-Team, 10SRE-Access-Requests, 10Security-Team, 10Product-Analytics (Kanban): Provide raw KaiOSAppFeedback data to Chelsea Riley for analysis - https://phabricator.wikimedia.org/T271202 (10Ottomata) Probably not helpful, but users can now access Superset to query Presto and Druid via SQL wit... [17:28:38] (03PS3) 10SBassett: Enable StopForumSpam on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521233 (https://phabricator.wikimedia.org/T181217) (owner: 10Reedy) [17:30:03] (03CR) 10David Caro: [C: 03+2] wmcs.backups: Fix missing param when removing image backup [puppet] - 10https://gerrit.wikimedia.org/r/655474 (owner: 10David Caro) [17:34:52] PROBLEM - Ensure local MW versions match expected deployment on deploy1002 is CRITICAL: CRITICAL: Missing 9 sites from wikiversions. 957 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [17:36:04] ^ related to the "create new wikis" deploy window that just ended, I assume [17:36:37] seems like it, yea [17:36:45] is that "it's expected to be out of alignment briefly" or did something get missed? [17:37:16] (03PS1) 10Alexandros Kosiaris: Reapply Ib32827c9433f362600643666f0156fed567fb3bc [puppet] - 10https://gerrit.wikimedia.org/r/655475 (https://phabricator.wikimedia.org/T271099) [17:37:24] rzl: deploy1002 is the new server [17:37:29] ahh of course yeah [17:37:30] i should handle it [17:37:34] by doing a scap pull [17:37:35] I missed that, thank you [17:37:38] hold on [17:38:23] I need to extend my downtimes, sorry about even letting it alert [17:38:40] or.. 
it's because somebody started initializing scap [17:38:48] which is what I was hoping for :) [17:39:00] !log deploy1002 - scap pull [17:39:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:26] PROBLEM - Ensure local MW versions match expected deployment on deploy2002 is CRITICAL: CRITICAL: Missing 9 sites from wikiversions. 957 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [17:39:50] same in codfw, not the active one [17:40:08] !log deploy2002 - scap pull [17:40:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:45:32] (03PS6) 10David Caro: cloud.encapi: enable ssl nginx vhost [puppet] - 10https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) [17:45:41] (03PS6) 10Razzi: Add cookbook for rebooting druid nodes [cookbooks] - 10https://gerrit.wikimedia.org/r/651636 (https://phabricator.wikimedia.org/T269596) [17:45:51] (03PS4) 10SBassett: Enable StopForumSpam on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521233 (https://phabricator.wikimedia.org/T181217) (owner: 10Reedy) [17:46:06] RECOVERY - Ensure local MW versions match expected deployment on deploy1002 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [17:46:07] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on deploy1002.eqiad.wmnet with reason: new install on buster [17:46:07] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on deploy1002.eqiad.wmnet with reason: new install on buster [17:46:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:10] (03CR) 10jerkins-bot: [V: 04-1] cloud.encapi: enable ssl nginx vhost [puppet] - 10https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) (owner: 10David Caro) [17:46:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[17:46:21] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on deploy2002.codfw.wmnet with reason: new install on buster [17:46:22] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on deploy2002.codfw.wmnet with reason: new install on buster [17:46:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:47:44] (03PS7) 10David Caro: cloud.encapi: enable ssl nginx vhost [puppet] - 10https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) [17:48:12] (03CR) 10David Caro: cloud.encapi: enable ssl nginx vhost (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/654419 (https://phabricator.wikimedia.org/T268877) (owner: 10David Caro) [17:48:22] !log manually removing watchlist rows for Dexbot in Wikidata [17:48:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:48:23] rzl: downtimes renewed for "*-2" deployment servers for another 2 weeks. don't have a timeline yet when they will be switched but they should not alert [17:48:35] 👍 thanks! 
[17:50:42] RECOVERY - Ensure local MW versions match expected deployment on deploy2002 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [17:52:47] rzl: and in general, that kind of alert does get fixed by a local 'scap pull' to get code in sync, pulling from the actual deployment server (but there are other alerts) [17:52:56] nod [17:53:16] (03CR) 10Razzi: [C: 03+1] hadoop: Migrate hiera() to lookup() and setting datatype [puppet] - 10https://gerrit.wikimedia.org/r/655073 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [17:53:28] 400k rows removed from watchlist in wikidatawiki [17:54:26] (03CR) 10Reedy: [C: 03+2] Enable StopForumSpam on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521233 (https://phabricator.wikimedia.org/T181217) (owner: 10Reedy) [17:55:27] (03Merged) 10jenkins-bot: Enable StopForumSpam on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521233 (https://phabricator.wikimedia.org/T181217) (owner: 10Reedy) [17:57:41] 10SRE, 10Release-Engineering-Team-TODO, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10Dzahn) @Reedy "stretch -> buster" and "PHP 7.2 -> PHP 7.3" are separated deliberately. So yes, they should also be a... 
[17:57:52] !log reedy@deploy1001 Synchronized wmf-config/extension-list: T181217 (duration: 00m 56s) [17:57:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:57:56] T181217: Deploy StopForumSpam to the Beta Cluster - https://phabricator.wikimedia.org/T181217 [17:58:01] (03CR) 10Bstorm: [C: 03+2] wikireplicas: enable standby behavior on multiinstance proxies [puppet] - 10https://gerrit.wikimedia.org/r/655174 (https://phabricator.wikimedia.org/T271476) (owner: 10Bstorm) [17:58:15] (03PS1) 10Andrew Bogott: acme-chief designate-sync.py: set ttl to 0 for txt records [puppet] - 10https://gerrit.wikimedia.org/r/655476 [17:59:42] (03CR) 10Razzi: [C: 03+1] hive: Migrate hiera() to lookup() and setting datatype in serve [puppet] - 10https://gerrit.wikimedia.org/r/655065 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [18:00:03] !log reedy@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: T181217 (duration: 00m 57s) [18:00:05] ryankemper: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210111T1800). 
[18:00:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:26] (03CR) 10Razzi: [C: 03+1] cache: Migrate hiera() to lookup() and setting datatype in statsv [puppet] - 10https://gerrit.wikimedia.org/r/655193 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [18:01:10] !log reedy@deploy1001 Synchronized wmf-config/CommonSettings-labs.php: T181217 (duration: 00m 56s) [18:01:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:01:24] (03CR) 10Elukey: [C: 03+2] cache: Migrate hiera() to lookup() and setting datatype in statsv [puppet] - 10https://gerrit.wikimedia.org/r/655193 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [18:01:35] (03CR) 10Bstorm: "Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Functi" [puppet] - 10https://gerrit.wikimedia.org/r/655174 (https://phabricator.wikimedia.org/T271476) (owner: 10Bstorm) [18:01:59] (03PS1) 10Bstorm: Revert "wikireplicas: enable standby behavior on multiinstance proxies" [puppet] - 10https://gerrit.wikimedia.org/r/655390 [18:02:12] 10SRE, 10Release-Engineering-Team-TODO, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10Reedy) [18:02:22] (03CR) 10Elukey: [C: 03+2] hadoop: Migrate hiera() to lookup() and setting datatype [puppet] - 10https://gerrit.wikimedia.org/r/655073 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [18:02:50] (03CR) 10Elukey: [C: 03+2] hive: Migrate hiera() to lookup() and setting datatype in serve [puppet] - 10https://gerrit.wikimedia.org/r/655065 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [18:02:53] 10SRE, 10Release-Engineering-Team-TODO, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 
(10Reedy) [18:14:35] 10SRE, 10Inuka-Team, 10SRE-Access-Requests, 10Security-Team, 10Product-Analytics (Kanban): Provide raw KaiOSAppFeedback data to Chelsea Riley for analysis - https://phabricator.wikimedia.org/T271202 (10Platonides) On the topic of ssh accesses, there shouldn't be a "big headache of using the command line"... [18:16:19] (03CR) 10Elukey: "Left minor comments, please don't hate me, we are close!" (034 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/651636 (https://phabricator.wikimedia.org/T269596) (owner: 10Razzi) [18:17:41] (03PS1) 10CDanis: Environment variables are strings. Convert types where it matters. [software/klaxon] - 10https://gerrit.wikimedia.org/r/655478 [18:17:45] (03PS1) 10CDanis: Correctly handle team_ids being an empty string (no filter). [software/klaxon] - 10https://gerrit.wikimedia.org/r/655479 [18:17:48] (03PS1) 10CDanis: Parse the team_ids filter from an environment variable. [software/klaxon] - 10https://gerrit.wikimedia.org/r/655480 [18:18:42] (03CR) 10jerkins-bot: [V: 04-1] Environment variables are strings. Convert types where it matters. [software/klaxon] - 10https://gerrit.wikimedia.org/r/655478 (owner: 10CDanis) [18:18:51] (03CR) 10jerkins-bot: [V: 04-1] Parse the team_ids filter from an environment variable. [software/klaxon] - 10https://gerrit.wikimedia.org/r/655480 (owner: 10CDanis) [18:19:05] (03CR) 10jerkins-bot: [V: 04-1] Correctly handle team_ids being an empty string (no filter). [software/klaxon] - 10https://gerrit.wikimedia.org/r/655479 (owner: 10CDanis) [18:19:55] (03PS3) 10Jforrester: loginwiki: Allow users to mark Notifications as read [mediawiki-config] - 10https://gerrit.wikimedia.org/r/632598 (https://phabricator.wikimedia.org/T264834) [18:19:58] (03PS4) 10Jforrester: loginwiki: Allow users to mark Notifications as read [mediawiki-config] - 10https://gerrit.wikimedia.org/r/632598 (https://phabricator.wikimedia.org/T264834) [18:20:46] (03PS2) 10CDanis: Environment variables are strings. 
Convert types where it matters. [software/klaxon] - 10https://gerrit.wikimedia.org/r/655478 [18:20:50] (03PS2) 10CDanis: Correctly handle team_ids being an empty string (no filter). [software/klaxon] - 10https://gerrit.wikimedia.org/r/655479 [18:20:53] (03PS2) 10CDanis: Parse the team_ids filter from an environment variable. [software/klaxon] - 10https://gerrit.wikimedia.org/r/655480 [18:26:36] PROBLEM - Check systemd state on stat1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:26:56] (03PS1) 10CDanis: klaxon: allow setting of team IDs filter for pages to show [puppet] - 10https://gerrit.wikimedia.org/r/655485 [18:27:16] 10SRE, 10fundraising-tech-ops: hw troubleshooting: Illegal opcode error on boot for frdb1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T271284 (10Dwisehaupt) 05Open→03Resolved Going to close this. Due to the age of the host and replacement host in place, we are just going to decommission this h... [18:29:34] 10SRE, 10fundraising-tech-ops: hw troubleshooting: Illegal opcode error on boot for frdb1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T271284 (10RhinosF1) > also put the system as failed in netbox. Should that be updated on the checklist at the top? [18:29:49] (03PS1) 10Dwisehaupt: Decommission frdb1001 [puppet] - 10https://gerrit.wikimedia.org/r/655487 (https://phabricator.wikimedia.org/T271739) [18:32:35] 10SRE, 10fundraising-tech-ops: hw troubleshooting: Illegal opcode error on boot for frdb1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T271284 (10Dwisehaupt) [18:33:11] dwisehaupt: thanks :) apologies but things like that I can't unsee [18:33:12] (03CR) 10RLazarus: Correctly handle team_ids being an empty string (no filter). 
(031 comment) [software/klaxon] - 10https://gerrit.wikimedia.org/r/655479 (owner: 10CDanis) [18:33:38] 10SRE, 10fundraising-tech-ops: hw troubleshooting: Illegal opcode error on boot for frdb1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T271284 (10Dwisehaupt) Thanks. I fixed that for clarity in the future. Less of an issue since it will be decommissioned, but good for history. [18:33:39] (03CR) 10Jgreen: [C: 03+2] Decommission frdb1001 [puppet] - 10https://gerrit.wikimedia.org/r/655487 (https://phabricator.wikimedia.org/T271739) (owner: 10Dwisehaupt) [18:34:15] ha, not a problem. :) [18:34:23] (03CR) 10RLazarus: [C: 03+1] "Sorry for not catching this the first time." [software/klaxon] - 10https://gerrit.wikimedia.org/r/655478 (owner: 10CDanis) [18:34:39] (03CR) 10CDanis: Correctly handle team_ids being an empty string (no filter). (031 comment) [software/klaxon] - 10https://gerrit.wikimedia.org/r/655479 (owner: 10CDanis) [18:34:46] (03PS3) 10CDanis: Correctly handle team_ids being an empty string (no filter). [software/klaxon] - 10https://gerrit.wikimedia.org/r/655479 [18:34:50] (03PS3) 10CDanis: Parse the team_ids filter from an environment variable. [software/klaxon] - 10https://gerrit.wikimedia.org/r/655480 [18:40:55] (03CR) 10RLazarus: [C: 03+1] Parse the team_ids filter from an environment variable. (031 comment) [software/klaxon] - 10https://gerrit.wikimedia.org/r/655480 (owner: 10CDanis) [18:42:20] (03PS1) 10Dwisehaupt: Decommission frdb1001 [dns] - 10https://gerrit.wikimedia.org/r/655489 (https://phabricator.wikimedia.org/T271739) [18:46:54] (03CR) 10RLazarus: [C: 03+1] Correctly handle team_ids being an empty string (no filter). 
[software/klaxon] - https://gerrit.wikimedia.org/r/655479 (owner: CDanis) [18:48:41] !log dpifke@deploy1001 Started deploy [performance/arc-lamp@6bbac6d]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/650277 [18:48:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:48:46] !log dpifke@deploy1001 Finished deploy [performance/arc-lamp@6bbac6d]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/650277 (duration: 00m 04s) [18:48:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:42] SRE, WVUI: Import npm 6.14.8 to buster dist. on apt.wikimedia.org - https://phabricator.wikimedia.org/T270321 (ovasileva) @jbond - do you have a rough estimate on when this task will be completed? We're planning for the deployment of WVUI and wanted to sketch out the timeline. [18:50:48] (CR) Jgreen: [C: +2] Decommission frdb1001 [dns] - https://gerrit.wikimedia.org/r/655489 (https://phabricator.wikimedia.org/T271739) (owner: Dwisehaupt) [18:53:54] (CR) Razzi: Add cookbook for rebooting druid nodes (4 comments) [cookbooks] - https://gerrit.wikimedia.org/r/651636 (https://phabricator.wikimedia.org/T269596) (owner: Razzi) [18:54:01] (PS7) Razzi: Add cookbook for rebooting druid nodes [cookbooks] - https://gerrit.wikimedia.org/r/651636 (https://phabricator.wikimedia.org/T269596) [18:54:31] (CR) Bstorm: [C: +2] "Going to revert this for now while I find the quirk." [puppet] - https://gerrit.wikimedia.org/r/655390 (owner: Bstorm) [18:55:09] (PS4) CDanis: Parse the team_ids filter from an environment variable. [software/klaxon] - https://gerrit.wikimedia.org/r/655480 [18:55:11] (CR) CDanis: Parse the team_ids filter from an environment variable. (1 comment) [software/klaxon] - https://gerrit.wikimedia.org/r/655480 (owner: CDanis) [18:56:23] (CR) RLazarus: [C: +1] "As discussed: Thanks for doing what I meant, not what I said." 
[software/klaxon] - https://gerrit.wikimedia.org/r/655480 (owner: CDanis) [18:56:44] SRE, ops-eqiad, DC-Ops, decommission-hardware, and 2 others: decommission frdb1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T271739 (Dwisehaupt) a:Cmjohnson [18:57:34] SRE, ops-eqiad, DC-Ops, decommission-hardware, and 2 others: decommission frdb1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T271739 (Dwisehaupt) All steps complete from the frack side. Passing on to @Cmjohnson for onsite portion. [18:58:06] (CR) CDanis: [C: +2] Parse the team_ids filter from an environment variable. [software/klaxon] - https://gerrit.wikimedia.org/r/655480 (owner: CDanis) [18:58:10] (CR) CDanis: [C: +2] Correctly handle team_ids being an empty string (no filter). [software/klaxon] - https://gerrit.wikimedia.org/r/655479 (owner: CDanis) [18:58:12] (CR) CDanis: [C: +2] Environment variables are strings. Convert types where it matters. [software/klaxon] - https://gerrit.wikimedia.org/r/655478 (owner: CDanis) [18:59:14] (Merged) jenkins-bot: Environment variables are strings. Convert types where it matters. [software/klaxon] - https://gerrit.wikimedia.org/r/655478 (owner: CDanis) [18:59:24] (Merged) jenkins-bot: Correctly handle team_ids being an empty string (no filter). [software/klaxon] - https://gerrit.wikimedia.org/r/655479 (owner: CDanis) [18:59:26] (Merged) jenkins-bot: Parse the team_ids filter from an environment variable. [software/klaxon] - https://gerrit.wikimedia.org/r/655480 (owner: CDanis) [19:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor I � Unicode. All rise for Morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210111T1900). [19:00:04] No GERRIT patches in the queue for this window AFAICS. 
[19:14:16] !log start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T270417 T270413 T270279) [19:14:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:22] T270417: Add Wikidata support for niawiki - https://phabricator.wikimedia.org/T270417 [19:14:22] T270279: Add Wikidata support for diqwiktionary - https://phabricator.wikimedia.org/T270279 [19:14:22] T270413: Add Wikidata support for niawiktionary - https://phabricator.wikimedia.org/T270413 [19:25:59] (PS1) Jforrester: Don't pass protocol-relative URLs to the Ace worker [extensions/AbuseFilter] (wmf/1.36.0-wmf.25) - https://gerrit.wikimedia.org/r/655394 (https://phabricator.wikimedia.org/T271487) [19:29:44] (PS1) Razzi: kafka-test: Mirror eventlogging_SearchSatisfaction topic [puppet] - https://gerrit.wikimedia.org/r/655494 (https://phabricator.wikimedia.org/T268074) [19:34:39] (CR) Ottomata: [C: +1] kafka-test: Mirror eventlogging_SearchSatisfaction topic [puppet] - https://gerrit.wikimedia.org/r/655494 (https://phabricator.wikimedia.org/T268074) (owner: Razzi) [19:43:51] !log end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T270417 T270413 T270279) [19:43:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:43:56] T270417: Add Wikidata support for niawiki - https://phabricator.wikimedia.org/T270417 [19:43:56] T270279: Add Wikidata support for diqwiktionary - https://phabricator.wikimedia.org/T270279 [19:43:57] T270413: Add Wikidata support for niawiktionary - https://phabricator.wikimedia.org/T270413 [19:45:24] (CR) Razzi: [C: +2] kafka-test: Mirror eventlogging_SearchSatisfaction topic [puppet] - https://gerrit.wikimedia.org/r/655494 (https://phabricator.wikimedia.org/T268074) (owner: Razzi) [19:45:43] (CR) 
CRusnov: "I have tested this change against failing dumps on netbox1001 and it fixes the issue." [software/netbox-extras] - https://gerrit.wikimedia.org/r/655496 (https://phabricator.wikimedia.org/T271696) (owner: CRusnov) [19:46:22] (PS2) CRusnov: tools/rotatedump: Constrain dump cleanup to generated directories [software/netbox-extras] - https://gerrit.wikimedia.org/r/655496 (https://phabricator.wikimedia.org/T271696) [19:46:46] SRE, WVUI: Import npm 6.14.8 to buster dist. on apt.wikimedia.org - https://phabricator.wikimedia.org/T270321 (jbond) @ovasileva sorry for the delay i had meant to catch up with @MoritzMuehlenhoff but got missed during the holidays. will follow up and provide a more concrete update tomorrow. [19:49:44] (PS1) Ottomata: Migrate UniversalLanguageSelector to Event Platform [mediawiki-config] - https://gerrit.wikimedia.org/r/655497 (https://phabricator.wikimedia.org/T267352) [19:51:34] (CR) Ottomata: [C: +2] Migrate UniversalLanguageSelector to Event Platform [mediawiki-config] - https://gerrit.wikimedia.org/r/655497 (https://phabricator.wikimedia.org/T267352) (owner: Ottomata) [19:52:30] (PS3) CRusnov: tools/rotatedump: Constrain dump maintenance to automatically generated files [software/netbox-extras] - https://gerrit.wikimedia.org/r/655496 (https://phabricator.wikimedia.org/T271696) [19:53:03] !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Migrate UniversalLanguageSelector to Event Platform - T268517 (duration: 00m 57s) [19:53:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:53:06] T268517: Migrate Anti-Harassment EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T268517 [20:01:05] (PS1) Bstorm: wikireplicas: enable standby behavior on multiinstance proxies [puppet] - https://gerrit.wikimedia.org/r/655498 (https://phabricator.wikimedia.org/T271476) [20:01:12] (Abandoned) Thcipriani: Install docker on 
releases-jenkins [puppet] - https://gerrit.wikimedia.org/r/474825 (https://phabricator.wikimedia.org/T208529) (owner: Thcipriani) [20:05:42] (PS1) Dduvall: releases: Provide docker to PipelineLib based jobs [puppet] - https://gerrit.wikimedia.org/r/655500 (https://phabricator.wikimedia.org/T271477) [20:13:06] SRE, SRE-Access-Requests: Requesting access to contint-roots for dduvall - https://phabricator.wikimedia.org/T271746 (dduvall) [20:14:33] (CR) Dduvall: releases: Provide docker to PipelineLib based jobs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/655500 (https://phabricator.wikimedia.org/T271477) (owner: Dduvall) [20:20:28] SRE, Dumps-Generation, SRE-Access-Requests, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Add all of CPT to snapshot/dumpsdata admins - https://phabricator.wikimedia.org/T271718 (AMooney) As a manager on the platform engineering team, I approve this change. [20:22:28] (CR) Jforrester: releases: Provide docker to PipelineLib based jobs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/655500 (https://phabricator.wikimedia.org/T271477) (owner: Dduvall) [20:36:50] SRE, SRE-Access-Requests: Requesting access to contint-roots for dduvall - https://phabricator.wikimedia.org/T271746 (thcipriani) > Name of approving party (hiring manager for WMF staff): @thcipriani Approved [20:47:29] !log authdns2001 - upgrade gdnsd to 3.5.0 package [20:47:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:53:53] (PS1) SBassett: Temporarily disable StopForumSpam on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/655503 (https://phabricator.wikimedia.org/T271740) [20:55:48] (CR) SBassett: [C: +2] Temporarily disable StopForumSpam on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/655503 (https://phabricator.wikimedia.org/T271740) (owner: SBassett) [20:56:47] (Merged) jenkins-bot: 
Temporarily disable StopForumSpam on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/655503 (https://phabricator.wikimedia.org/T271740) (owner: SBassett) [21:00:04] chrisalbon and accraze: #bothumor I � Unicode. All rise for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210111T2100). [21:02:02] !log dns4002 - upgrade gdnsd to 3.5.0 package [21:02:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:04:55] (PS1) Dzahn: update changelog for dropping ServerAdmin config [docker-images/production-images] - https://gerrit.wikimedia.org/r/655506 (https://phabricator.wikimedia.org/T251005) [21:10:10] (PS1) Andrew Bogott: Remove toolserver.org zone [dns] - https://gerrit.wikimedia.org/r/655507 [21:11:32] (CR) Dzahn: [C: +2] update changelog for dropping ServerAdmin config [docker-images/production-images] - https://gerrit.wikimedia.org/r/655506 (https://phabricator.wikimedia.org/T251005) (owner: Dzahn) [21:13:25] (CR) Dzahn: [V: +2 C: +2] update changelog for dropping ServerAdmin config [docker-images/production-images] - https://gerrit.wikimedia.org/r/655506 (https://phabricator.wikimedia.org/T251005) (owner: Dzahn) [21:13:32] (CR) Andrew Bogott: "We have another 6-8 hours of TTL before it's safe to merge this." 
[dns] - https://gerrit.wikimedia.org/r/655507 (owner: Andrew Bogott) [21:20:53] !log docker images - [deneb:/srv/images/production-images] $ sudo -i build-production-images [21:20:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:23:07] (CR) Dzahn: "[deneb:/srv/images/production-images] $ sudo -i build-production-images" [docker-images/production-images] - https://gerrit.wikimedia.org/r/655506 (https://phabricator.wikimedia.org/T251005) (owner: Dzahn) [21:26:55] SRE, Traffic, Wikimedia-Apache-configuration, Patch-For-Review: Stop advertising webmaster@wikimedia.org in apache configs - https://phabricator.wikimedia.org/T251005 (Dzahn) built docker prod images on build host: ` [deneb:/srv/images/production-images] $ sudo -i build-production-images == Ste... [21:30:41] (CR) Bstorm: [C: +2] "Reviewed at Iddd46a82a7505f95e9caee893 and corrected here. The PCC looks pretty good so going to try merge." [puppet] - https://gerrit.wikimedia.org/r/655498 (https://phabricator.wikimedia.org/T271476) (owner: Bstorm) [21:39:45] (PS1) Dzahn: profile::base: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/655508 (https://phabricator.wikimedia.org/T209953) [21:41:16] (CR) jerkins-bot: [V: -1] profile::base: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/655508 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn) [21:44:42] (PS1) Bstorm: wikireplicas: one last tweak to merge the hashes right [puppet] - https://gerrit.wikimedia.org/r/655509 (https://phabricator.wikimedia.org/T271476) [21:47:08] (PS2) Dzahn: profile::base: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/655508 (https://phabricator.wikimedia.org/T209953) [21:54:14] (PS1) Dzahn: pybal::testing: convert to role/profile, hiera->lookup [puppet] - https://gerrit.wikimedia.org/r/655511 (https://phabricator.wikimedia.org/T209953) [21:55:03] (PS2) 
Bstorm: wikireplicas: one last tweak to merge the hashes right [puppet] - https://gerrit.wikimedia.org/r/655509 (https://phabricator.wikimedia.org/T271476) [21:58:49] !log deleting watchlist entries of Fawikibot in fawiki (1.1M rows) [21:58:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:59:01] in batches of 10K [21:59:09] in batches of 10K [22:00:04] Reedy and sbassett: I, the Bot under the Fountain, allow thee, The Deployer, to do Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210111T2200). [22:00:09] (PS1) SBassett: Add a monolog channel for StopForumSpam [mediawiki-config] - https://gerrit.wikimedia.org/r/655512 (https://phabricator.wikimedia.org/T271755) [22:04:15] (PS1) Dzahn: kubernetes::master: add data types, hiera->lookup [puppet] - https://gerrit.wikimedia.org/r/655513 (https://phabricator.wikimedia.org/T209953) [22:04:17] (PS3) Bstorm: wikireplicas: one last tweak to merge the hashes right [puppet] - https://gerrit.wikimedia.org/r/655509 (https://phabricator.wikimedia.org/T271476) [22:05:01] (PS2) SBassett: Add a monolog channel for StopForumSpam [mediawiki-config] - https://gerrit.wikimedia.org/r/655512 (https://phabricator.wikimedia.org/T271755) [22:06:10] (CR) jerkins-bot: [V: -1] kubernetes::master: add data types, hiera->lookup [puppet] - https://gerrit.wikimedia.org/r/655513 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn) [22:06:25] (PS1) Dzahn: discovery:client: hiera->lookup, add data type [puppet] - https://gerrit.wikimedia.org/r/655515 (https://phabricator.wikimedia.org/T209953) [22:07:27] (PS4) Bstorm: wikireplicas: one last tweak to merge the hashes right [puppet] - https://gerrit.wikimedia.org/r/655509 (https://phabricator.wikimedia.org/T271476) [22:09:21] (CR) Bstorm: [C: +2] wikireplicas: one last tweak to merge the hashes right [puppet] - 
https://gerrit.wikimedia.org/r/655509 (https://phabricator.wikimedia.org/T271476) (owner: Bstorm) [22:09:43] (CR) Ladsgroup: "base is scary. We can run "check experimental" to run PCC on the fleet and see the result (usually takes 40 minutes.) Does that make sense" [puppet] - https://gerrit.wikimedia.org/r/655508 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn) [22:11:28] (CR) Dzahn: "yes, it does. same result is using the compiler form and just leaving the list of nodes blank or *" [puppet] - https://gerrit.wikimedia.org/r/655508 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn) [22:12:19] (PS1) Dzahn: monitoring::host: move hostgroup_default to params, hiera->lookup [puppet] - https://gerrit.wikimedia.org/r/655516 (https://phabricator.wikimedia.org/T209953) [22:12:31] (CR) Dzahn: "easy one" [puppet] - https://gerrit.wikimedia.org/r/655515 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn) [22:14:18] (CR) jerkins-bot: [V: -1] monitoring::host: move hostgroup_default to params, hiera->lookup [puppet] - https://gerrit.wikimedia.org/r/655516 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn) [22:18:09] (CR) Bstorm: [C: +2] wikireplicas: fix up wmf-pt-kill service on multiinstance replicas [puppet] - https://gerrit.wikimedia.org/r/654890 (https://phabricator.wikimedia.org/T260511) (owner: Bstorm) [22:18:51] (CR) Ladsgroup: [C: +1] discovery:client: hiera->lookup, add data type [puppet] - https://gerrit.wikimedia.org/r/655515 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn) [22:20:12] (PS2) Dzahn: kubernetes::master: add data types, hiera->lookup [puppet] - https://gerrit.wikimedia.org/r/655513 (https://phabricator.wikimedia.org/T209953) [22:21:54] (CR) Ladsgroup: "Started: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27421/console" [puppet] - https://gerrit.wikimedia.org/r/655508 
(https://phabricator.wikimedia.org/T209953) (owner: Dzahn) [22:22:14] (PS2) Dzahn: monitoring::host: move hostgroup_default to params, hiera->lookup [puppet] - https://gerrit.wikimedia.org/r/655516 (https://phabricator.wikimedia.org/T209953) [22:26:40] (PS1) Dzahn: redis::slave: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/655518 (https://phabricator.wikimedia.org/T209953) [22:31:19] (PS1) Dzahn: icinga::elastic: require_package->ensure_packages [puppet] - https://gerrit.wikimedia.org/r/655519 (https://phabricator.wikimedia.org/T266479) [22:33:37] (PS1) Dzahn: docker: require_package->ensure_packages [puppet] - https://gerrit.wikimedia.org/r/655520 (https://phabricator.wikimedia.org/T266479) [22:35:09] (PS1) Dzahn: bird: require_package->ensure_packages [puppet] - https://gerrit.wikimedia.org/r/655522 (https://phabricator.wikimedia.org/T266479) [22:41:39] RECOVERY - Widespread puppet agent failures on alert1001 is OK: (C)0.01 ge (W)0.006 ge 0.005583 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [23:36:26] (PS1) Dzahn: bump version to 2.4.38-2 [docker-images/production-images] - https://gerrit.wikimedia.org/r/655528 [23:40:51] (CR) Legoktm: [C: +1] bump version to 2.4.38-2 [docker-images/production-images] - https://gerrit.wikimedia.org/r/655528 (owner: Dzahn) [23:42:24] (PS2) Dzahn: bump version to 2.4.38-2 [docker-images/production-images] - https://gerrit.wikimedia.org/r/655528 (https://phabricator.wikimedia.org/T251005) [23:46:22] (CR) Dzahn: [V: +2 C: +2] bump version to 2.4.38-2 [docker-images/production-images] - https://gerrit.wikimedia.org/r/655528 (https://phabricator.wikimedia.org/T251005) (owner: Dzahn) [23:47:51] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [23:59:23] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5009 is OK: HTTP OK: HTTP/1.0 200 OK - 23556 bytes in 0.728 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server