[04:11:29] Operations, Analytics, Analytics-Kanban, LDAP-Access-Requests: LDAP access to the wmf group for Segun Oworu (superset, turnilo, hue) - https://phabricator.wikimedia.org/T252703 (Nuria) links: https://superset.wikimedia.org and https://turnilo.wikimedia.org
[05:00:39] !log Stop MySQL on labsdb1011 to copy its content to backup1001 T249188
[05:00:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:00:43] T249188: Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188
[05:03:23] ACKNOWLEDGEMENT - haproxy failover on dbproxy1018 is CRITICAL: CRITICAL check_failover servers up 1 down 1 Marostegui expected https://wikitech.wikimedia.org/wiki/HAProxy
[05:19:51] Operations, ops-codfw, DBA, DC-Ops, Patch-For-Review: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 - https://phabricator.wikimedia.org/T251639 (Marostegui) This looks good on all the hosts: ` ----- OUTPUT of 'df -hT /srv' ----- Filesystem Type Size Used Avail Use%...
[05:26:06] (CR) Giuseppe Lavagetto: [C: +2] Make kafka metrics also report the topic [software/purged] - https://gerrit.wikimedia.org/r/596614 (owner: Giuseppe Lavagetto)
[05:54:03] <_joe_> !log uploaded purged 0.12 to apt.w.o
[05:54:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:00] (PS1) KartikMistry: Make the threshold for Chinese WP to prevent publishing 5% more strict [mediawiki-config] - https://gerrit.wikimedia.org/r/596916 (https://phabricator.wikimedia.org/T252786)
[05:55:07] <_joe_> !log installing purged 0.12 on cp2027
[05:55:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:56:33] (PS1) Marostegui: parsercache.my.cnf: Disable rowfilter as we do on core hosts [puppet] - https://gerrit.wikimedia.org/r/596917 (https://phabricator.wikimedia.org/T250666)
[05:58:04] (CR) Marostegui: [C: +2] parsercache.my.cnf: Disable rowfilter as we do on core hosts [puppet] - https://gerrit.wikimedia.org/r/596917 (https://phabricator.wikimedia.org/T250666) (owner: Marostegui)
[06:14:38] (PS1) Marostegui: mariadb: Set innodb_purge_threads to 1 [puppet] - https://gerrit.wikimedia.org/r/596918 (https://phabricator.wikimedia.org/T250666)
[06:16:25] (CR) Marostegui: [C: +2] mariadb: Set innodb_purge_threads to 1 [puppet] - https://gerrit.wikimedia.org/r/596918 (https://phabricator.wikimedia.org/T250666) (owner: Marostegui)
[06:21:49] Operations, Core Platform Team, MediaWiki-General, serviceops, and 2 others: Revisit timeouts, concurrency limits in remote HTTP calls from MediaWiki - https://phabricator.wikimedia.org/T245170 (tstarling) Caller survey: * ServiceWiring.php VirtualRESTServiceClient * Although it makes a MultiHt...
[06:24:52] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2088 for upgrade', diff saved to https://phabricator.wikimedia.org/P11213 and previous config saved to /var/cache/conftool/dbconfig/20200518-062452-marostegui.json
[06:24:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:26:08] (PS3) Marostegui: install: Disable reimage of db114[1-9], db213[6-9] and db2140 [puppet] - https://gerrit.wikimedia.org/r/596650 (https://phabricator.wikimedia.org/T252512) (owner: Jcrespo)
[06:28:01] (CR) Marostegui: [C: +2] install: Disable reimage of db114[1-9], db213[6-9] and db2140 [puppet] - https://gerrit.wikimedia.org/r/596650 (https://phabricator.wikimedia.org/T252512) (owner: Jcrespo)
[06:29:59] PROBLEM - Check systemd state on ores1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:30:54] (PS1) Marostegui: install_server: Reimage db2088 [puppet] - https://gerrit.wikimedia.org/r/596964 (https://phabricator.wikimedia.org/T250666)
[06:32:22] (CR) Marostegui: [C: +2] install_server: Reimage db2088 [puppet] - https://gerrit.wikimedia.org/r/596964 (https://phabricator.wikimedia.org/T250666) (owner: Marostegui)
[06:35:21] (PS1) Marostegui: install_server: Reimage db2088 as Buster [puppet] - https://gerrit.wikimedia.org/r/596996 (https://phabricator.wikimedia.org/T250666)
[06:36:23] (CR) Marostegui: [C: +2] install_server: Reimage db2088 as Buster [puppet] - https://gerrit.wikimedia.org/r/596996 (https://phabricator.wikimedia.org/T250666) (owner: Marostegui)
[06:41:27] !log Stop MySQL on db2088
[06:41:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:21] PROBLEM - ores_workers_running on ores1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[06:52:23] RECOVERY - Check systemd state on ores1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:58:17] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime
[06:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:42] Operations, Mail: Forwarding or alias for fundraising@ - https://phabricator.wikimedia.org/T252932 (Dzahn) Hello, fundraising@ is in Google ` fundraising@wikimedia.org router = ldap_account, transport = remote_smtp host aspmx.l.google.com [173.194.204.27] ` The following more or less related ali...
[07:00:46] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[07:00:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:05:09] Operations, Mail: Forwarding or alias for fundraising@ - https://phabricator.wikimedia.org/T252932 (Dzahn) jimmy@ and katherine@ are in Google: jimmy@ has been removed from our side as an urgent request from JKrauska back in 2016 in T123315. ` jimmy@wikimedia.org router = ldap_group, transport = rem...
[07:06:07] Operations, Mail, Epic: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144 (Dzahn)
[07:06:10] Operations, Mail: Forwarding or alias for fundraising@ - https://phabricator.wikimedia.org/T252932 (Dzahn)
[07:06:19] Operations, Mail: Forwarding or alias for fundraising@ - https://phabricator.wikimedia.org/T252932 (Dzahn) p:Triage→Medium
[07:11:52] Operations, Research: Add Git LFS support for research/wikiworkshop - https://phabricator.wikimedia.org/T252956 (Dzahn) I'm afraid we can't currently offer multiple Gigabytes of space on the virtual machines hosting this. They are made for small static sites which are usually a few hundred MB in total....
[07:15:31] (PS1) Ema: varnish: move reload-vcl to /usr/local/sbin [puppet] - https://gerrit.wikimedia.org/r/597002
[07:16:05] RECOVERY - ores_workers_running on ores1003 is OK: PROCS OK: 91 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[07:16:47] (CR) Dzahn: [C: +2] "nothing uses it anymore per openstack-browser :)" [puppet] - https://gerrit.wikimedia.org/r/596682 (https://phabricator.wikimedia.org/T252190) (owner: Dzahn)
[07:19:28] (CR) Dzahn: [C: +2] "not used" [puppet] - https://gerrit.wikimedia.org/r/596704 (owner: Dzahn)
[07:20:47] !log Upload MariaDB 10.4.13 to the buster repo - T250666
[07:20:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:20:51] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666
[07:22:07] (CR) Vgutierrez: [C: +1] varnish: move reload-vcl to /usr/local/sbin [puppet] - https://gerrit.wikimedia.org/r/597002 (owner: Ema)
[07:22:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2088 after upgrade', diff saved to https://phabricator.wikimedia.org/P11214 and previous config saved to /var/cache/conftool/dbconfig/20200518-072234-marostegui.json
[07:22:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:50] Operations, SRE-Access-Requests: Requesting access to wikimedia namespace in packagist - https://phabricator.wikimedia.org/T252987 (Nikerabbit)
[07:28:59] (CR) Ema: [C: +2] varnish: move reload-vcl to /usr/local/sbin [puppet] - https://gerrit.wikimedia.org/r/597002 (owner: Ema)
[07:29:48] (CR) VulpesVulpes825: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/596916 (https://phabricator.wikimedia.org/T252786) (owner: KartikMistry)
[07:29:54] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/596640 (owner: Jbond)
[07:36:54] !log Remove and add pc2007 from tendril as the Act is frozen after reimage - T250666
[07:36:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:57] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666
[07:37:41] (PS2) Dzahn: deployment::server: replace apache module with httpd module [puppet] - https://gerrit.wikimedia.org/r/596692 (https://phabricator.wikimedia.org/T252190)
[07:37:43] (CR) Filippo Giunchedi: [C: +2] swift: migrate off swift::params [puppet] - https://gerrit.wikimedia.org/r/596617 (https://phabricator.wikimedia.org/T252537) (owner: Filippo Giunchedi)
[07:39:50] Operations, netops: Faulty port cr2-eqord:xe-0/1/1 - https://phabricator.wikimedia.org/T252988 (ayounsi) p:Triage→Low
[07:40:18] Operations, ops-eqord, netops: eqord - ulsfo Telia link down - IC-313592 - https://phabricator.wikimedia.org/T221259 (ayounsi) Open→Resolved a:ayounsi Physically moving the optic to a different port solved the issue. Opened T252988 to troubleshot that specific issue.
[07:40:25] (CR) Dzahn: [C: -1] "Duplicate declaration: File[/etc/apache2/mods-available/status.conf] is already declared at (file: /srv/jenkins-workspace/puppet-compiler/" [puppet] - https://gerrit.wikimedia.org/r/596692 (https://phabricator.wikimedia.org/T252190) (owner: Dzahn)
[07:44:12] (CR) Filippo Giunchedi: [C: +2] "noop as expected: https://puppet-compiler.wmflabs.org/compiler1003/22559/" [puppet] - https://gerrit.wikimedia.org/r/596656 (https://phabricator.wikimedia.org/T252537) (owner: Filippo Giunchedi)
[07:46:24] (PS3) Dzahn: deployment::server: replace apache module with httpd module [puppet] - https://gerrit.wikimedia.org/r/596692 (https://phabricator.wikimedia.org/T252190)
[07:47:57] !log installing apt security updates on jessie systems
[07:47:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:29] Operations, DBA: In-place conversion from LVM to normal partition - https://phabricator.wikimedia.org/T252195 (Marostegui) p:Triage→Medium
[07:51:19] (PS2) Filippo Giunchedi: swift: read hash_path_suffix with lookup() [puppet] - https://gerrit.wikimedia.org/r/596657 (https://phabricator.wikimedia.org/T252537)
[07:51:21] (PS2) Filippo Giunchedi: swift: enable s3api [puppet] - https://gerrit.wikimedia.org/r/596658 (https://phabricator.wikimedia.org/T252186)
[07:54:02] (CR) Filippo Giunchedi: [C: +2] "PCC effectively a noop https://puppet-compiler.wmflabs.org/compiler1002/22562/" [puppet] - https://gerrit.wikimedia.org/r/596657 (https://phabricator.wikimedia.org/T252537) (owner: Filippo Giunchedi)
[07:55:11] (Abandoned) Ayounsi: Prometheus, collect Netbox metrics [puppet] - https://gerrit.wikimedia.org/r/526819 (https://phabricator.wikimedia.org/T226331) (owner: Ayounsi)
[07:56:43] jouncebot: next
[07:56:43] In 2 hour(s) and 33 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200518T1030)
[07:56:51] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/22561/deploy1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/596692 (https://phabricator.wikimedia.org/T252190) (owner: Dzahn)
[07:57:44] !log replacing apache module with httpd module on deployment servers
[07:57:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:26] Operations, DBA, Patch-For-Review: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (jcrespo)
[08:01:20] Operations, SRE-Access-Requests: Give access to the Analytics Cluster to Research Inter (Rodolfo) - https://phabricator.wikimedia.org/T252476 (Marostegui) @Rvvalentim you'd need to provide an email. Thanks
[08:01:43] ACKNOWLEDGEMENT - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=netbox_device_statistics site=codfw Ayounsi https://phabricator.wikimedia.org/T243927 - The acknowledgement expires at: 2020-05-20 08:00:47. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:03:26] Operations, SRE-Access-Requests: Give access to the Analytics Cluster to Research Inter (Rodolfo) - https://phabricator.wikimedia.org/T252476 (Marostegui) a:Marostegui
[08:07:24] (PS1) Filippo Giunchedi: hieradata: bump object replicator concurrency for decom'ing hosts [puppet] - https://gerrit.wikimedia.org/r/597003 (https://phabricator.wikimedia.org/T252008)
[08:10:01] (CR) Filippo Giunchedi: "PCC as expected https://puppet-compiler.wmflabs.org/compiler1001/22564/" [puppet] - https://gerrit.wikimedia.org/r/597003 (https://phabricator.wikimedia.org/T252008) (owner: Filippo Giunchedi)
[08:10:10] (PS1) Marostegui: data.yaml: Add Rodolfo Valentim to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476)
[08:10:31] (CR) Marostegui: [C: -1] "Pending email confirmation." [puppet] - https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476) (owner: Marostegui)
[08:12:33] Operations, observability, good first task: nagios-nrpe-server.service: systemd unit references path below legacy directory /var/run/ - https://phabricator.wikimedia.org/T252990 (ema)
[08:13:14] !log set weight to 0 for all but objects in ms-be10[678] - T252008
[08:13:16] Operations, observability, good first task: nagios-nrpe-server.service: systemd unit references path below legacy directory /var/run/ - https://phabricator.wikimedia.org/T252990 (ema) p:Triage→Low
[08:13:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:18] T252008: Decom ms-be101[678] - https://phabricator.wikimedia.org/T252008
[08:13:23] (CR) Dzahn: [C: -1] "The UID is rodolfovalentim for that uidNumber." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476) (owner: Marostegui)
[08:15:05] (PS2) Marostegui: data.yaml: Add Rodolfo Valentim to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476)
[08:15:14] (PS1) Jcrespo: mariadb-backups: Disable monitoring screens on database backup hosts [puppet] - https://gerrit.wikimedia.org/r/597005 (https://phabricator.wikimedia.org/T138562)
[08:15:41] (CR) Jcrespo: [C: +2] mariadb-backups: Disable monitoring screens on database backup hosts [puppet] - https://gerrit.wikimedia.org/r/597005 (https://phabricator.wikimedia.org/T138562) (owner: Jcrespo)
[08:15:53] (CR) jerkins-bot: [V: -1] data.yaml: Add Rodolfo Valentim to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476) (owner: Marostegui)
[08:16:39] (CR) Dzahn: [C: +1] "lgtm. the UID, UID number match LDAP, the key on office wiki matches what is on ticket. I just don't know about the kerberos part that is " [puppet] - https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476) (owner: Marostegui)
[08:18:20] (PS3) Marostegui: data.yaml: Add Rodolfo Valentim to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476)
[08:19:39] (PS1) Giuseppe Lavagetto: Fixed bug with lag reporting. [software/purged] - https://gerrit.wikimedia.org/r/597006
[08:20:08] (CR) JMeybohm: [C: +2] termbox: deploy up to date chart [deployment-charts] - https://gerrit.wikimedia.org/r/596227 (https://phabricator.wikimedia.org/T235411) (owner: JMeybohm)
[08:20:14] (PS3) JMeybohm: termbox: deploy up to date chart [deployment-charts] - https://gerrit.wikimedia.org/r/596227 (https://phabricator.wikimedia.org/T235411)
[08:20:20] (CR) Dzahn: [C: +1] data.yaml: Add Rodolfo Valentim to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476) (owner: Marostegui)
[08:24:08] Operations, observability, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348 (jcrespo) Resolved→Open I said that this is was going to lead to people annoying other people for things that are non impacting, and I agreed to the change because...
[08:25:47] !log jayme@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
[08:25:47] !log jayme@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
[08:25:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:13] Operations, observability, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348 (ayounsi) >>! In T165348#6143870, @jcrespo wrote: > Today I got pinged by @ayounsi for a WARNING running for a few hours For the record: > WARNING - (for 2d 15h 51m 27s) -...
[08:33:02] Operations, Traffic, observability: prometheus-trafficserver-exporter: InsecureRequestWarning - https://phabricator.wikimedia.org/T252993 (ema)
[08:33:10] Operations, Traffic, observability: prometheus-trafficserver-exporter: InsecureRequestWarning - https://phabricator.wikimedia.org/T252993 (ema) p:Triage→Low
[08:39:34] (CR) Dzahn: "Could i merge it anyways? I am pretty sure it would be effectively noop since i did this on puppetmaster, deployment_servers etc just rece" [puppet] - https://gerrit.wikimedia.org/r/596687 (https://phabricator.wikimedia.org/T252190) (owner: Dzahn)
[08:40:08] (PS1) Ema: cp3051: end large_objects_cutoff experiment [puppet] - https://gerrit.wikimedia.org/r/597008 (https://phabricator.wikimedia.org/T249809)
[08:40:44] Operations, observability, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348 (Dzahn) a:Dzahn→None
[08:41:46] Operations, observability, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348 (Dzahn) I don't see the issue here given that there is an easy way to exclude some hosts and that has already happened.
[08:43:32] !log jayme@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
[08:43:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:27] !log jayme@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
[08:44:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:47] (CR) Filippo Giunchedi: [C: +2] "Merging only, Gilles will be effectively deploying the patch (restarting, verifying, etc)" [puppet] - https://gerrit.wikimedia.org/r/596149 (https://phabricator.wikimedia.org/T252426) (owner: Gilles)
[08:44:55] Puppet, Cloud-VPS, serviceops, Patch-For-Review, and 2 others: upgrade simplelamp class (apache -> httpd and mysql -> mariadb) or deprecate it - https://phabricator.wikimedia.org/T215662 (Dzahn) Open→Resolved The role has been deleted and replaced by simplelamp2.
[08:44:58] Operations, Cloud-VPS, cloud-services-team (Kanban): convert cloud VPS projects from apache to httpd module - https://phabricator.wikimedia.org/T202574 (Dzahn)
[08:45:01] Puppet, Cloud-VPS: role::simplelamp takes ownership of all content in /etc/apache2/sites-enabled - https://phabricator.wikimedia.org/T169368 (Dzahn)
[08:45:04] Operations, Puppet, Cloud-VPS: role::simplelamp fails to start mysql due to apparmor - https://phabricator.wikimedia.org/T128642 (Dzahn)
[08:45:07] Operations, DBA, Patch-For-Review: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (Dzahn)
[08:47:37] Operations, observability, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348 (jcrespo) @Dhzan I think documenting how one is supposed to use the WARNINGS (to adopt some of my feedback) and document the general idea of what not to worry about (e.g. s...
[08:47:58] (PS1) Gergő Tisza: Update GrowthExperiments mentor list page for viwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/597010
[08:50:51] Operations, observability, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348 (Dzahn) I think "how to handle Icinga warnings" is not something specific to this task about monitoring screens. The part that even after 12 days it is merely a warning an...
[08:51:21] (PS6) RhinosF1: Site name & meta namespace localisations for ti[wikipedia|wiktionary] [mediawiki-config] - https://gerrit.wikimedia.org/r/595883 (https://phabricator.wikimedia.org/T251287)
[08:57:23] Operations, observability, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348 (jcrespo) I made an amend to the policy: https://wikitech.wikimedia.org/w/index.php?title=Monitoring%2FLong_running_screens&type=revision&diff=1866533&oldid=1823979 @Dzah...
[08:59:46] (PS1) Hashar: contint: cleanup no more used material [puppet] - https://gerrit.wikimedia.org/r/597011 (https://phabricator.wikimedia.org/T225735)
[09:00:37] Operations, Traffic, observability: prometheus-trafficserver-exporter: InsecureRequestWarning - https://phabricator.wikimedia.org/T252993 (Vgutierrez) in this case is pretty obvious, ats-tls is the only instance listening on TLS only, but yeah, +1 to provide unique SylogIdentifiers
[09:04:59] Operations, observability, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348 (Dzahn) Open→Resolved a:Dzahn Ok, resolving. Note: The thresholds are currently set to 240 hours (10 days) for WARN and 480 hours (20 days) for CRIT.
[09:06:53] (PS2) Giuseppe Lavagetto: Fixed bug with lag reporting. [software/purged] - https://gerrit.wikimedia.org/r/597006
[09:07:40] (PS1) Hashar: contint: remove local apache virtualhost [puppet] - https://gerrit.wikimedia.org/r/597013 (https://phabricator.wikimedia.org/T225735)
[09:07:53] Operations, Performance-Team, Thumbor, Patch-For-Review: Lower per-IP PoolCounter throttling Thumbor settings - https://phabricator.wikimedia.org/T252426 (Gilles) Open→Resolved a:Gilles
[09:10:25] (PS1) Jbond: check_puppet_run_changes: remove -dev and -test hosts from icing check [puppet] - https://gerrit.wikimedia.org/r/597014
[09:11:01] (PS1) Marostegui: db2088: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/597015
[09:12:36] (PS1) Hashar: contint: remove profile::ci::browsers [puppet] - https://gerrit.wikimedia.org/r/597016 (https://phabricator.wikimedia.org/T225735)
[09:12:59] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/597013 (https://phabricator.wikimedia.org/T225735) (owner: Hashar)
[09:13:28] (CR) Hashar: "That also get rid of the apache::site ;]" [puppet] - https://gerrit.wikimedia.org/r/597013 (https://phabricator.wikimedia.org/T225735) (owner: Hashar)
[09:13:42] (PS1) Filippo Giunchedi: profile: add thanos::swift::frontend [puppet] - https://gerrit.wikimedia.org/r/597017 (https://phabricator.wikimedia.org/T252186)
[09:13:44] (PS1) Filippo Giunchedi: thanos: add Envoy TLS terminator [puppet] - https://gerrit.wikimedia.org/r/597018 (https://phabricator.wikimedia.org/T252186)
[09:13:46] (PS1) Filippo Giunchedi: thanos: add Store Gateway [puppet] - https://gerrit.wikimedia.org/r/597019 (https://phabricator.wikimedia.org/T252186)
[09:14:10] Operations, observability, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348 (jcrespo) For context, I was opposed to this being on icinga (NOT the concept itself) because I was worried about icinga spam and pings from other users stressing SREs. I c...
[09:15:11] Operations, Mail: URGENT - remove jimmy@ alias from exim mail aliases - https://phabricator.wikimedia.org/T123315 (Peachey88)
[09:15:56] (CR) Ayounsi: [C: +1] "Not tested but LGTM." [puppet] - https://gerrit.wikimedia.org/r/597014 (owner: Jbond)
[09:17:05] (CR) Ema: [C: +1] Fixed bug with lag reporting. [software/purged] - https://gerrit.wikimedia.org/r/597006 (owner: Giuseppe Lavagetto)
[09:17:13] (PS2) Hashar: contint: remove local apache virtualhost [puppet] - https://gerrit.wikimedia.org/r/597013 (https://phabricator.wikimedia.org/T225735)
[09:17:36] (CR) Hashar: "PS2: I have forgot to delete modules/contint/templates/apache/localvhost.erb" [puppet] - https://gerrit.wikimedia.org/r/597013 (https://phabricator.wikimedia.org/T225735) (owner: Hashar)
[09:17:39] (CR) Marostegui: [C: +2] db2088: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/597015 (owner: Marostegui)
[09:17:46] (CR) Hashar: "recheck" [puppet] - https://gerrit.wikimedia.org/r/597013 (https://phabricator.wikimedia.org/T225735) (owner: Hashar)
[09:18:01] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/597013 (https://phabricator.wikimedia.org/T225735) (owner: Hashar)
[09:18:35] (CR) Jcrespo: "> remove -dev and -test hosts from icing check" [puppet] - https://gerrit.wikimedia.org/r/597014 (owner: Jbond)
[09:20:51] (PS1) Muehlenhoff: Remove access for thargrove [puppet] - https://gerrit.wikimedia.org/r/597020
[09:21:24] Operations, ops-eqiad, DC-Ops: (Need By: ASAP) rack/setup/install thanos-be100[123] - https://phabricator.wikimedia.org/T251618 (fgiunchedi)
[09:21:51] Operations, ops-eqiad, DC-Ops: (NEED BY: ASAP) rack/setup/install thanos-fe100[123].eqiad.wmnet - https://phabricator.wikimedia.org/T251620 (fgiunchedi)
[09:27:16] (CR) Volans: [C: -1] "I'm on the fence on this. Aside from the fact that dev/test hosts should not be in prod IMHO (but that's out of scope), if the typical use" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/597014 (owner: Jbond)
[09:27:36] (CR) Dzahn: [C: +2] "noop on contint* in prod: https://puppet-compiler.wmflabs.org/compiler1003/22568/" [puppet] - https://gerrit.wikimedia.org/r/597011 (https://phabricator.wikimedia.org/T225735) (owner: Hashar)
[09:28:23] mutante: so I went with a lot more cleanup beside just removing the apache::site call :]
[09:28:39] we got the last few jobs migrated just a few weeks ago
[09:30:00] (CR) Dzahn: check_puppet_run_changes: remove -dev and -test hosts from icing check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/597014 (owner: Jbond)
[09:30:13] (CR) Jbond: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/596787 (owner: CRusnov)
[09:31:04] mutante: vcan i merge yours
[09:31:05] (CR) Jbond: [C: +2] puppetmaster: add type checking [puppet] - https://gerrit.wikimedia.org/r/596640 (owner: Jbond)
[09:31:37] (CR) Dzahn: check_puppet_run_changes: remove -dev and -test hosts from icing check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/597014 (owner: Jbond)
[09:31:39] (CR) Giuseppe Lavagetto: [C: +2] Fixed bug with lag reporting. [software/purged] - https://gerrit.wikimedia.org/r/597006 (owner: Giuseppe Lavagetto)
[09:33:03] (CR) Dzahn: "> I could setup the test host and be happy with it, all green on Icinga (maybe with notification disabled, who knows) and then once applie" [puppet] - https://gerrit.wikimedia.org/r/597014 (owner: Jbond)
[09:33:55] jbond42: yes, thanks!
[09:34:08] merging
[09:34:18] hashar: cool, thanks
[09:36:37] (Abandoned) Dzahn: jenkins: simplify java setup, delete common class [puppet] - https://gerrit.wikimedia.org/r/595866 (owner: Dzahn)
[09:36:46] Operations, SRE-Access-Requests: Access to analytics-privatedata-users for Research intern Daniram - https://phabricator.wikimedia.org/T252129 (Miriam) Open→Resolved Thanks @colewhite ! Closing this task. Thanks a lot all for your help :)
[09:37:22] (CR) Vgutierrez: [C: +2] Release 8.0.7-1wm8 [debs/trafficserver] - https://gerrit.wikimedia.org/r/596647 (owner: Vgutierrez)
[09:37:28] (PS2) Hashar: contint: remove profile::ci::browsers [puppet] - https://gerrit.wikimedia.org/r/597016 (https://phabricator.wikimedia.org/T225735)
[09:37:32] mutante: I will look at the java8 vs java11 later today hopefully
[09:37:39] for the contint* machines
[09:37:52] hashar: it's already done
[09:39:44] mutante: I have to fix it, my remarks on the change have not been taken in account ;)
[09:39:56] openjdk-8 is installed on contint2001
[09:39:58] not a big deal, I will just fix it up
[09:40:53] hashar: everything uses version 8
[09:40:57] nothing else pulls in java
[09:41:16] we'll just have to remove the 11 packages from 2001
[09:41:23] which i can do right now
[09:42:53] the change i abandoned above was also to remove the java_path stuff that isn't used
[09:43:05] <_joe_> !log upload purged 0.13 to buster-wikimedia
[09:43:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:12] (PS1) Marostegui: Revert "install_server: Reimage db2088" [puppet] - https://gerrit.wikimedia.org/r/597024
[09:44:20] (PS2) Jbond: check_puppet_run_changes: remove -dev and -test hosts from icinga check [puppet] - https://gerrit.wikimedia.org/r/597014
[09:45:12] (PS1) Dzahn: jenkins: remove unused java_path variable [puppet] - https://gerrit.wikimedia.org/r/597026
[09:46:37] !log contint2001 - apt-get remove --purge openjdk-11-* - T224591
[09:46:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:41] T224591: Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591
[09:46:52] (CR) Volans: "Reply inline" (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/570161 (https://phabricator.wikimedia.org/T243935) (owner: Volans)
[09:48:10] PROBLEM - Check systemd state on maps2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:48:34] PROBLEM - Check systemd state on maps2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:48:41] (CR) Jbond: "> Patch Set 1: Code-Review-1" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/597014 (owner: Jbond)
[09:49:24] PROBLEM - Check systemd state on maps2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:49:58] (CR) Jbond: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/597014 (owner: Jbond)
[09:51:32] Operations, Traffic, Patch-For-Review: Discarded VCL files stuck in auto/busy state cause high number of backend probe requests - https://phabricator.wikimedia.org/T236754 (ema) What is happening is quite clear: due to some sort of race when reloading and/or discarding VCL, `vcl->busy` isn't decreme...
[09:53:07] <_joe_> !log upgrading purged in codfw, ulsfo [09:53:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:54:00] (03PS1) 10Hashar: contint: remove obsolete roles [puppet] - 10https://gerrit.wikimedia.org/r/597027 (https://phabricator.wikimedia.org/T225735) [09:55:05] (03PS1) 10Dzahn: ci: rename role::ci::slave::labs::common to a profile [puppet] - 10https://gerrit.wikimedia.org/r/597028 [09:55:21] (03CR) 10Dzahn: "follow-ups for parts of it:" [puppet] - 10https://gerrit.wikimedia.org/r/595866 (owner: 10Dzahn) [09:56:27] (03CR) 10Volans: "Thanks for the patch, surely useful! Few nits inline, nothing major." (036 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/594988 (owner: 10Jbond) [09:57:19] (03PS2) 10Dzahn: ci: rename role::ci::slave::labs::common to a profile [puppet] - 10https://gerrit.wikimedia.org/r/597028 (https://phabricator.wikimedia.org/T225735) [09:57:37] (03PS2) 10Dzahn: jenkins: remove unused java_path variable [puppet] - 10https://gerrit.wikimedia.org/r/597026 (https://phabricator.wikimedia.org/T225735) [09:59:53] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/22569/" [puppet] - 10https://gerrit.wikimedia.org/r/597026 (https://phabricator.wikimedia.org/T225735) (owner: 10Dzahn) [10:01:03] (03CR) 10Muehlenhoff: [C: 03+2] Remove access for thargrove [puppet] - 10https://gerrit.wikimedia.org/r/597020 (owner: 10Muehlenhoff) [10:02:27] !log upload trafficserver 8.0.7-1wm8 to apt.wm.o (buster) [10:02:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:04:01] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/22570/" [puppet] - 10https://gerrit.wikimedia.org/r/597013 (https://phabricator.wikimedia.org/T225735) (owner: 10Hashar) [10:04:09] (03PS3) 10Dzahn: contint: remove local apache virtualhost [puppet] - 10https://gerrit.wikimedia.org/r/597013 (https://phabricator.wikimedia.org/T225735) (owner: 10Hashar) [10:04:14] 
(03PS1) 10RhinosF1: Create Draft (Talk) namespace on thwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597029 [10:07:16] (03PS1) 10Filippo Giunchedi: hieradata: cleanup ms-fe1005 host variables [puppet] - 10https://gerrit.wikimedia.org/r/597030 [10:07:18] (03CR) 10Dzahn: [C: 03+2] contint: remove profile::ci::browsers [puppet] - 10https://gerrit.wikimedia.org/r/597016 (https://phabricator.wikimedia.org/T225735) (owner: 10Hashar) [10:07:44] !log upload druid 0.12.3-1.1 to stretch|buster-wikimedia [10:07:44] (03PS3) 10Dzahn: contint: remove profile::ci::browsers [puppet] - 10https://gerrit.wikimedia.org/r/597016 (https://phabricator.wikimedia.org/T225735) (owner: 10Hashar) [10:07:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:07:46] (03CR) 10Filippo Giunchedi: [C: 03+2] hieradata: cleanup ms-fe1005 host variables [puppet] - 10https://gerrit.wikimedia.org/r/597030 (owner: 10Filippo Giunchedi) [10:08:03] (03CR) 10Ema: [C: 03+2] cp3051: end large_objects_cutoff experiment [puppet] - 10https://gerrit.wikimedia.org/r/597008 (https://phabricator.wikimedia.org/T249809) (owner: 10Ema) [10:08:14] (03PS4) 10Elukey: Add role::druid::analytics::worker to an-druid100[1,2] [puppet] - 10https://gerrit.wikimedia.org/r/596678 (https://phabricator.wikimedia.org/T252771) [10:09:39] mutante: hi! ok to puppet-merge your changes along with mine? [10:09:45] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1001/22567/" [puppet] - 10https://gerrit.wikimedia.org/r/597017 (https://phabricator.wikimedia.org/T252186) (owner: 10Filippo Giunchedi) [10:10:00] ema: yes please [10:10:30] mutante: ack, done! 
[10:10:42] thanks [10:11:10] (03CR) 10Elukey: [C: 03+2] Add role::druid::analytics::worker to an-druid100[1,2] [puppet] - 10https://gerrit.wikimedia.org/r/596678 (https://phabricator.wikimedia.org/T252771) (owner: 10Elukey) [10:12:17] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/596478 (https://phabricator.wikimedia.org/T188912) (owner: 10Bstorm) [10:13:16] (03PS4) 10Jbond: interactive: add get_secret function [software/spicerack] - 10https://gerrit.wikimedia.org/r/594988 [10:13:36] (03CR) 10Jbond: interactive: add get_secret function (036 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/594988 (owner: 10Jbond) [10:16:10] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476) (owner: 10Marostegui) [10:19:04] (03PS2) 10RhinosF1: Add thwiki's draft namespace to wmgExemptFromUserRobotsControlExtra and enable VE. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597029 (https://phabricator.wikimedia.org/T252959) [10:19:39] (03CR) 10jerkins-bot: [V: 04-1] Add thwiki's draft namespace to wmgExemptFromUserRobotsControlExtra and enable VE. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597029 (https://phabricator.wikimedia.org/T252959) (owner: 10RhinosF1) [10:20:17] <_joe_> !log upgrading purged in the remaining datacenters [10:20:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:20:44] (03CR) 10Dzahn: [C: 03+2] contint: remove obsolete roles [puppet] - 10https://gerrit.wikimedia.org/r/597027 (https://phabricator.wikimedia.org/T225735) (owner: 10Hashar) [10:20:52] (03PS2) 10Dzahn: contint: remove obsolete roles [puppet] - 10https://gerrit.wikimedia.org/r/597027 (https://phabricator.wikimedia.org/T225735) (owner: 10Hashar) [10:21:41] Can someone check that Jenkins error? 
[10:21:48] (03PS1) 10JMeybohm: mathoid: enable TLS with chart defaults [deployment-charts] - 10https://gerrit.wikimedia.org/r/597032 (https://phabricator.wikimedia.org/T235411) [10:21:54] * RhinosF1 hasn't touched line 26249 [10:23:18] seen it [10:23:47] RhinosF1: 06:19:30 Unexpected ';', expecting ']' in ./wmf-config/InitialiseSettings.php on line 26249 [10:24:07] mutante: yeah, it's ~9,000 lines higher than that my error [10:24:41] <_joe_> sigh [10:24:53] <_joe_> 35k lines of initialisesettings [10:25:27] * RhinosF1 is used to errors being one or two lines up not 9,000 - fixing. It's a big file. [10:25:36] (03PS3) 10RhinosF1: Add thwiki's draft namespace to wmgExemptFromUserRobotsControlExtra and enable VE. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597029 (https://phabricator.wikimedia.org/T252959) [10:25:49] <_joe_> yeah it's *decidedly* **way** too large [10:26:12] let's replace it with an sqlite3 file :-P [10:26:14] * volans hides [10:27:52] sam's next project can be to make it a managed via a db to finish chads work >.> [10:30:04] jan_drewniak: #bothumor My software never has bugs. It just develops random features. Rise for Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200518T1030). [10:31:28] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597033 (https://phabricator.wikimedia.org/T128546) [10:33:07] 10Operations, 10Traffic, 10Patch-For-Review: Discarded VCL files stuck in auto/busy state cause high number of backend probe requests - https://phabricator.wikimedia.org/T236754 (10ema) I have tried getting rid of `vcl-e7ac6c17-ad61-4947-afdb-835f4eee6caa` on cp3050 at ~09:12 by setting `busy` to 0. After a... 
[10:33:30] (03Abandoned) 10Dzahn: ci::worker_localhost: replace apache with httpd module [puppet] - 10https://gerrit.wikimedia.org/r/596687 (https://phabricator.wikimedia.org/T252190) (owner: 10Dzahn) [10:33:32] (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597033 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:34:17] (03PS5) 10Jbond: interactive: add get_secret function [software/spicerack] - 10https://gerrit.wikimedia.org/r/594988 [10:34:20] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597033 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:35:31] (03CR) 10Volans: [C: 03+1] "Ship it!" [software/spicerack] - 10https://gerrit.wikimedia.org/r/594988 (owner: 10Jbond) [10:37:00] !log copy prometheus-druid-exporter 0.8-1 from stretch to buster wikimedia [10:37:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:22] (03PS1) 10JMeybohm: termbox: fix wrong TLS port [deployment-charts] - 10https://gerrit.wikimedia.org/r/597034 (https://phabricator.wikimedia.org/T235411) [10:37:24] (03PS1) 10JMeybohm: termbox: enable TLS with chart defaults [deployment-charts] - 10https://gerrit.wikimedia.org/r/597035 (https://phabricator.wikimedia.org/T235411) [10:37:29] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:597033| Bumping portals to master (597033)]] (duration: 01m 32s) [10:37:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:36] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:597033| Bumping portals to master (597033)]] (duration: 01m 06s) [10:38:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:42] (03CR) 10Jbond: [V: 03+2 C: 03+2] interactive: add get_secret function [software/spicerack] - 
10https://gerrit.wikimedia.org/r/594988 (owner: 10Jbond) [10:40:17] (03PS6) 10Jbond: interactive: add get_secret function [software/spicerack] - 10https://gerrit.wikimedia.org/r/594988 [10:43:46] (03PS3) 10Dzahn: ci: rename role::ci::slave::labs::common to a profile [puppet] - 10https://gerrit.wikimedia.org/r/597028 (https://phabricator.wikimedia.org/T225735) [10:46:16] (03PS1) 10JMeybohm: zotero: enable TLS with chart defaults [deployment-charts] - 10https://gerrit.wikimedia.org/r/597036 (https://phabricator.wikimedia.org/T235411) [10:47:46] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/22573/compiler1001.puppet-diffs.eqiad.wmflabs/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/597028 (https://phabricator.wikimedia.org/T225735) (owner: 10Dzahn) [10:55:59] (03CR) 10Dzahn: [C: 04-1] "was meanwhile replaced by https://gerrit.wikimedia.org/r/c/operations/puppet/+/594477 and https://gerrit.wikimedia.org/r/c/operations/dns/" [puppet] - 10https://gerrit.wikimedia.org/r/587521 (owner: 10Hashar) [10:56:28] (03PS2) 10Dzahn: delete the apache module, replaced by httpd [puppet] - 10https://gerrit.wikimedia.org/r/596694 (https://phabricator.wikimedia.org/T252190) [11:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor I ♥ Unicode. All rise for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200518T1100). [11:00:05] Lucas_WMDE, kart_, and tgr: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:09] o/ [11:00:20] * kart_ is here. [11:00:30] o/ [11:00:58] +2ed my backport but that’ll take a while [11:01:03] * Lucas_WMDE looks at kart_’s change [11:02:06] kart_: is it ok if I update the change to add the task number?
[11:02:11] (in a comment in the config) [11:03:55] Lucas_WMDE: sure [11:04:43] (03PS2) 10Lucas Werkmeister (WMDE): Make the threshold for Chinese WP to prevent publishing 5% more strict [mediawiki-config] - 10https://gerrit.wikimedia.org/r/596916 (https://phabricator.wikimedia.org/T252786) (owner: 10KartikMistry) [11:04:50] (03CR) 10Lucas Werkmeister (WMDE): "Rebased and added the task number in a comment." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/596916 (https://phabricator.wikimedia.org/T252786) (owner: 10KartikMistry) [11:04:56] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/596916 (https://phabricator.wikimedia.org/T252786) (owner: 10KartikMistry) [11:05:05] Lucas_WMDE: seems I've broken network. Sorry about it. [11:05:23] you mean, network connection to IRC? [11:05:38] (03Merged) 10jenkins-bot: Make the threshold for Chinese WP to prevent publishing 5% more strict [mediawiki-config] - 10https://gerrit.wikimedia.org/r/596916 (https://phabricator.wikimedia.org/T252786) (owner: 10KartikMistry) [11:05:59] Lucas_WMDE: yes. [11:06:20] oh no :/ [11:06:34] Lucas_WMDE: Back on mobile. I can test :) [11:06:42] ok, it’s on mwdebug1001 :) [11:06:50] Cool. checking. [11:08:29] Lucas_WMDE: threshold is set. Go ahead. [11:08:43] ok! [11:08:45] thanks [11:10:23] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:596916|Make the threshold for Chinese WP to prevent publishing 5% more strict (T252786)]] (duration: 01m 06s) [11:10:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:27] T252786: Make the threshold for Chinese Wikipedia to prevent publishing 5% more strict - https://phabricator.wikimedia.org/T252786 [11:10:52] backport will still take a while [11:10:52] Thanks Lucas_WMDE! [11:10:57] let’s continue with tgr! [11:11:28] do you want to deploy the change yourself? 
[11:11:44] (03PS1) 10Arturo Borrero Gonzalez: toolforge: maintain-dbusers: depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/597040 (https://phabricator.wikimedia.org/T249188) [11:12:24] (03PS3) 10Dzahn: delete the apache module, replaced by httpd [puppet] - 10https://gerrit.wikimedia.org/r/596694 (https://phabricator.wikimedia.org/T252190) [11:12:39] Lucas_WMDE: no, if you don't mind doing it [11:12:57] ok sure [11:13:03] thx! [11:13:05] (03PS2) 10Lucas Werkmeister (WMDE): Update GrowthExperiments mentor list page for viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597010 (owner: 10Gergő Tisza) [11:13:42] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] "Page exists on viwiki, that’s good enough for me d)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597010 (owner: 10Gergő Tisza) [11:13:47] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597010 (owner: 10Gergő Tisza) [11:14:27] (03Merged) 10jenkins-bot: Update GrowthExperiments mentor list page for viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597010 (owner: 10Gergő Tisza) [11:15:20] change is on mwdebug1001 [11:17:04] tgr: can you test it? 
[11:18:12] I’ve enabled Special:Homepage but so far haven’t found the mentor link on it [11:18:43] Lucas_WMDE: thanks, it works [11:18:47] ok thanks [11:18:52] it's a bit complicated to test [11:19:39] (03Abandoned) 10Arturo Borrero Gonzalez: toolforge: maintain-dbusers: depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/597040 (https://phabricator.wikimedia.org/T249188) (owner: 10Arturo Borrero Gonzalez) [11:20:22] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:597010|Update GrowthExperiments mentor list page for viwiki]] (duration: 01m 06s) [11:20:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:44] (03PS3) 10Jbond: check_puppet_run_changes: remove -dev and -test hosts from icinga check [puppet] - 10https://gerrit.wikimedia.org/r/597014 [11:21:13] and meanwhile, the backport was merged [11:22:40] I don’t think it can be tested very well, I’ll just check that it doesn’t blow up completely [11:23:34] yeah, everything looks ok [11:25:22] !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.35.0-wmf.32/extensions/Wikibase/: SWAT: [[gerrit:596616|Fix core's TitleFactory not being used correctly (T252803)]] (duration: 01m 12s) [11:25:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:26] T252803: AffectedPagesFinder: Call to a member function exists() on null - https://phabricator.wikimedia.org/T252803 [11:26:49] logstash looks fine so far [11:26:57] no other changes in the calendar [11:27:39] !log EU SWAT done [11:27:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:26] (03PS4) 10Jbond: check_puppet_run_changes: remove staging hosts from this test [puppet] - 10https://gerrit.wikimedia.org/r/597014 [11:32:15] (03PS5) 10Jbond: check_puppet_run_changes: remove staging hosts from this test [puppet] - 10https://gerrit.wikimedia.org/r/597014 [11:32:44] (03PS6) 10Jbond: check_puppet_run_changes: 
remove staging hosts from this test [puppet] - 10https://gerrit.wikimedia.org/r/597014 [11:33:10] (03CR) 10Jbond: "I have updated the commit message to try and make the motives a bit more clear" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/597014 (owner: 10Jbond) [11:45:39] (03CR) 10Jbond: check_puppet_run_changes: remove staging hosts from this test (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/597014 (owner: 10Jbond) [11:52:42] !log Install 10.1.43-2 on db1122 and db1109 - T251981 [11:52:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:46] T251981: Upgrade and restart s2 and s8 (wikidatawiki) primary database masters: Tue 19th May - https://phabricator.wikimedia.org/T251981 [12:05:27] (03PS1) 10Volans: typing: add typing module for custom type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/597044 [12:05:29] (03PS1) 10Volans: icinga: refactor input parsing [software/spicerack] - 10https://gerrit.wikimedia.org/r/597045 [12:05:31] (03PS1) 10Volans: icinga: allow to check the status of a host [software/spicerack] - 10https://gerrit.wikimedia.org/r/597046 [12:11:22] (03CR) 10Dzahn: "This should theoretically be ready to go now, so i ran it in compiler but got so far:" [puppet] - 10https://gerrit.wikimedia.org/r/391849 (https://phabricator.wikimedia.org/T162070) (owner: 10Jcrespo) [12:14:18] (03CR) 10Jbond: [C: 03+1] typing: add typing module for custom type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/597044 (owner: 10Volans) [12:15:19] (03CR) 10Jbond: [C: 03+1] icinga: refactor input parsing [software/spicerack] - 10https://gerrit.wikimedia.org/r/597045 (owner: 10Volans) [12:16:51] (03CR) 10Marostegui: [C: 03+2] Revert "install_server: Reimage db2088" [puppet] - 10https://gerrit.wikimedia.org/r/597024 (owner: 10Marostegui) [12:18:45] (03CR) 10Jbond: [C: 03+1] "LGTM thanks :)" [software/spicerack] - 10https://gerrit.wikimedia.org/r/597046 (owner: 10Volans) [12:24:31] (03PS8) 
10Privacybatm: transfer.py: Add the ability to auto-detect free port for netcat to listen [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/595516 (https://phabricator.wikimedia.org/T252171) [12:24:54] (03CR) 10Volans: [C: 03+2] typing: add typing module for custom type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/597044 (owner: 10Volans) [12:25:00] (03CR) 10Jcrespo: "Needs rebasing - may need it manual rebasing. Given the nature of the patch it may be easier to just amend it from 0." [puppet] - 10https://gerrit.wikimedia.org/r/391849 (https://phabricator.wikimedia.org/T162070) (owner: 10Jcrespo) [12:25:27] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Give access to the Analytics Cluster to Research Inter (Rodolfo) - https://phabricator.wikimedia.org/T252476 (10Rvvalentim) >>! In T252476#6143792, @Marostegui wrote: > @Rvvalentim you'd need to provide an email. > Thanks Hi, my email is rodolfovieira... [12:25:50] (03CR) 10Volans: [C: 03+2] icinga: refactor input parsing [software/spicerack] - 10https://gerrit.wikimedia.org/r/597045 (owner: 10Volans) [12:27:03] (03CR) 10Volans: [C: 03+2] icinga: allow to check the status of a host [software/spicerack] - 10https://gerrit.wikimedia.org/r/597046 (owner: 10Volans) [12:28:06] (03PS4) 10Marostegui: data.yaml: Add Rodolfo Valentim to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476) [12:28:19] (03CR) 10Marostegui: "User confirmed the email: https://phabricator.wikimedia.org/T252476#6144466" [puppet] - 10https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476) (owner: 10Marostegui) [12:29:09] (03CR) 10Marostegui: [C: 03+2] data.yaml: Add Rodolfo Valentim to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/597004 (https://phabricator.wikimedia.org/T252476) (owner: 10Marostegui) [12:31:54] (03Merged) 10jenkins-bot: typing: add typing module for custom type hints 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/597044 (owner: 10Volans) [12:31:56] (03Merged) 10jenkins-bot: icinga: refactor input parsing [software/spicerack] - 10https://gerrit.wikimedia.org/r/597045 (owner: 10Volans) [12:32:53] (03CR) 10Gehel: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/596655 (owner: 10DCausse) [12:33:03] (03Merged) 10jenkins-bot: icinga: allow to check the status of a host [software/spicerack] - 10https://gerrit.wikimedia.org/r/597046 (owner: 10Volans) [12:33:20] (03PS1) 10Arturo Borrero Gonzalez: nagios: add victorops-wmcs contact to the wmcs team [puppet] - 10https://gerrit.wikimedia.org/r/597047 (https://phabricator.wikimedia.org/T250717) [12:33:53] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Give access to the Analytics Cluster to Research Inter (Rodolfo) - https://phabricator.wikimedia.org/T252476 (10Marostegui) 05Open→03Resolved This is done. Patch merged, ran puppet on bast1002: ` Notice: /Stage[main]/Admin/Admin::Hashuser[rodolfova... [12:34:25] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] nagios: add victorops-wmcs contact to the wmcs team [puppet] - 10https://gerrit.wikimedia.org/r/597047 (https://phabricator.wikimedia.org/T250717) (owner: 10Arturo Borrero Gonzalez) [12:36:46] (03CR) 10Marostegui: "So as spoken during the meeting, the only concern for the initial run would be how many threads we run mydumper with." 
[puppet] - 10https://gerrit.wikimedia.org/r/596255 (https://phabricator.wikimedia.org/T79922) (owner: 10Jcrespo) [12:38:21] 10Operations, 10ops-codfw: BBU faulty on ms-be2016 - https://phabricator.wikimedia.org/T252851 (10Marostegui) p:05Triage→03Medium a:03Papaul [12:38:36] (03PS2) 10Gehel: [wdqs] fix DCAT-AP reload and load it to the categories endpoint [puppet] - 10https://gerrit.wikimedia.org/r/596655 (owner: 10DCausse) [12:39:54] 10Operations, 10Analytics, 10serviceops, 10vm-requests: Create a VM for matomo1002 (eqiad) - https://phabricator.wikimedia.org/T252742 (10Marostegui) p:05Triage→03Medium [12:41:21] 10Operations, 10SRE-tools: Create cookbook to reboot hosts - https://phabricator.wikimedia.org/T252807 (10Marostegui) p:05Triage→03Medium [12:41:54] 10Operations, 10serviceops: Sandbox/limit child processes within a container runtime - https://phabricator.wikimedia.org/T252745 (10Marostegui) p:05Triage→03Medium [12:42:16] (03PS3) 10Gehel: [wdqs] fix DCAT-AP reload and load it to the categories endpoint [puppet] - 10https://gerrit.wikimedia.org/r/596655 (owner: 10DCausse) [12:42:51] jouncebot: now [12:42:51] No deployments scheduled for the next 5 hour(s) and 17 minute(s) [12:44:20] (03PS5) 10Dzahn: Remove mysql module from WMF [puppet] - 10https://gerrit.wikimedia.org/r/391849 (https://phabricator.wikimedia.org/T162070) (owner: 10Jcrespo) [12:51:21] (03CR) 10Dzahn: "rebased :)" [puppet] - 10https://gerrit.wikimedia.org/r/391849 (https://phabricator.wikimedia.org/T162070) (owner: 10Jcrespo) [12:52:40] (03PS2) 10Ema: 5.1.3-1wm15: don't set temperature to cold [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/596626 (https://phabricator.wikimedia.org/T236754) [12:53:18] (03CR) 10Dzahn: [C: 03+1] "I compiled it and entered "C:mysql" as hosts and it comes out empty:" [puppet] - 10https://gerrit.wikimedia.org/r/391849 (https://phabricator.wikimedia.org/T162070) (owner: 10Jcrespo) [12:55:20] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "\o/" [puppet] - 
10https://gerrit.wikimedia.org/r/596694 (https://phabricator.wikimedia.org/T252190) (owner: 10Dzahn) [12:55:29] 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): convert cloud VPS projects from apache to httpd module - https://phabricator.wikimedia.org/T202574 (10Dzahn) 05Open→03Resolved This is done. [13:00:06] 10Puppet, 10Cloud-VPS, 10serviceops, 10Patch-For-Review, and 2 others: upgrade simplelamp class (apache -> httpd and mysql -> mariadb) or deprecate it - https://phabricator.wikimedia.org/T215662 (10Dzahn) [13:00:41] (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.35 [software/spicerack] - 10https://gerrit.wikimedia.org/r/597049 [13:00:48] !log hashar@deploy1001 Synchronized php-1.35.0-wmf.32/skins/Vector/includes/VectorTemplate.php: VectorTemplate: SkinTemplateToolboxEnd hook isn't deprecated - T252906 (duration: 01m 07s) [13:00:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:52] T252906: Warning flood: "Use of SkinTemplateToolboxEnd hook was deprecated " - https://phabricator.wikimedia.org/T252906 [13:04:18] (03PS1) 10Kormat: Add db2136 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/597050 (https://phabricator.wikimedia.org/T252985) [13:05:51] marostegui: ^ [13:06:35] (03CR) 10Marostegui: "Normally for this kind of thing we use "mariadb: Add blablabla" on the commit message." 
[puppet] - 10https://gerrit.wikimedia.org/r/597050 (https://phabricator.wikimedia.org/T252985) (owner: 10Kormat) [13:07:39] (03PS2) 10Giuseppe Lavagetto: cache::text: enable consuming from kafka everywhere [puppet] - 10https://gerrit.wikimedia.org/r/596651 (https://phabricator.wikimedia.org/T133821) [13:07:41] (03PS1) 10Giuseppe Lavagetto: purged: enable consuming from kafka on cp2029 too [puppet] - 10https://gerrit.wikimedia.org/r/597051 (https://phabricator.wikimedia.org/T133821) [13:08:19] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.35 [software/spicerack] - 10https://gerrit.wikimedia.org/r/597049 (owner: 10Volans) [13:08:34] (03PS2) 10Kormat: mariadb: Add db2136 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/597050 (https://phabricator.wikimedia.org/T252985) [13:08:49] (03CR) 10jerkins-bot: [V: 04-1] 5.1.3-1wm15: don't set temperature to cold [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/596626 (https://phabricator.wikimedia.org/T236754) (owner: 10Ema) [13:09:50] marostegui: i updated the commit message [13:11:33] (03CR) 10Marostegui: [C: 03+1] mariadb: Add db2136 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/597050 (https://phabricator.wikimedia.org/T252985) (owner: 10Kormat) [13:11:47] (03CR) 10Giuseppe Lavagetto: [C: 03+2] purged: enable consuming from kafka on cp2029 too [puppet] - 10https://gerrit.wikimedia.org/r/597051 (https://phabricator.wikimedia.org/T133821) (owner: 10Giuseppe Lavagetto) [13:11:49] (03CR) 10Kormat: [C: 03+2] mariadb: Add db2136 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/597050 (https://phabricator.wikimedia.org/T252985) (owner: 10Kormat) [13:12:03] (03PS1) 10Dzahn: simplelamp2: do not purge unmanaged config files [puppet] - 10https://gerrit.wikimedia.org/r/597052 [13:12:55] _joe_: is it safe to puppet-merge your purged change? [13:12:59] <_joe_> kormat: yes [13:13:03] <_joe_> I was about to say [13:13:12] <_joe_> please do [13:13:41] done :) [13:13:46] <_joe_> kormat: thanks! 
[13:14:13] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.35 [software/spicerack] - 10https://gerrit.wikimedia.org/r/597049 (owner: 10Volans) [13:18:24] (03PS1) 10Volans: Upstream release v0.0.35 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/597053 [13:18:51] <_joe_> "upstream" :{ [13:19:32] (03PS2) 10Filippo Giunchedi: profile: add thanos::swift::frontend [puppet] - 10https://gerrit.wikimedia.org/r/597017 (https://phabricator.wikimedia.org/T252186) [13:19:34] (03PS2) 10Filippo Giunchedi: thanos: add Envoy TLS terminator [puppet] - 10https://gerrit.wikimedia.org/r/597018 (https://phabricator.wikimedia.org/T252186) [13:19:36] (03PS2) 10Filippo Giunchedi: thanos: add Store Gateway [puppet] - 10https://gerrit.wikimedia.org/r/597019 (https://phabricator.wikimedia.org/T252186) [13:27:37] (03PS1) 10Vgutierrez: Release 8.0.7-1wm9 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/597055 [13:27:51] (03CR) 10Volans: [C: 03+2] Upstream release v0.0.35 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/597053 (owner: 10Volans) [13:28:25] 10Operations, 10SRE-tools: Create cookbook to reboot hosts - https://phabricator.wikimedia.org/T252807 (10jbond) As the CR specificity mentions operating on a single host i don't think that LBRemoteCluster comes into play here. I do think it would be great to have cookbooks witch can safley reboot every host... 
[13:28:59] 10Operations, 10SRE-tools: Create cookbook to reboot hosts - https://phabricator.wikimedia.org/T252807 (10jbond) > For the repool/depool aspect of things I think all hosts should have /usr/local/bin/(de)pool for the record i would definetly work on this peice if people agree its a good idea [13:29:41] (03PS3) 10Filippo Giunchedi: swift: enable s3api [puppet] - 10https://gerrit.wikimedia.org/r/596658 (https://phabricator.wikimedia.org/T252186) [13:29:43] (03PS1) 10Filippo Giunchedi: swift: set defaults for replicator concurrency [puppet] - 10https://gerrit.wikimedia.org/r/597056 (https://phabricator.wikimedia.org/T252537) [13:29:53] !log elukey@cumin1001 START - Cookbook sre.hadoop.roll-restart-workers [13:29:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:30:02] (test cluster) [13:30:15] (03CR) 10Ema: [C: 03+1] Release 8.0.7-1wm9 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/597055 (owner: 10Vgutierrez) [13:33:50] (03CR) 10Filippo Giunchedi: "PCC noop as expected https://puppet-compiler.wmflabs.org/compiler1002/22579/" [puppet] - 10https://gerrit.wikimedia.org/r/597056 (https://phabricator.wikimedia.org/T252537) (owner: 10Filippo Giunchedi) [13:34:06] (03Merged) 10jenkins-bot: Upstream release v0.0.35 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/597053 (owner: 10Volans) [13:35:53] 10Operations, 10SRE-tools: Create cookbook to reboot hosts - https://phabricator.wikimedia.org/T252807 (10Volans) >>! In T252807#6144703, @jbond wrote: > As the CR specificity mentions operating on a single host i don't think that LBRemoteCluster comes into play here. For smaller clusters we might still cros... 
[13:39:35] (03CR) 10Jcrespo: [C: 03+1] Remove mysql module from WMF [puppet] - 10https://gerrit.wikimedia.org/r/391849 (https://phabricator.wikimedia.org/T162070) (owner: 10Jcrespo) [13:41:28] (03PS1) 10Volans: tests: add @require_caplog to some actions tests [software/spicerack] - 10https://gerrit.wikimedia.org/r/597058 [13:42:12] (03CR) 10Giuseppe Lavagetto: [C: 03+2] cache::text: enable consuming from kafka everywhere [puppet] - 10https://gerrit.wikimedia.org/r/596651 (https://phabricator.wikimedia.org/T133821) (owner: 10Giuseppe Lavagetto) [13:46:28] 10Operations, 10SRE-tools: Create cookbook to reboot hosts - https://phabricator.wikimedia.org/T252807 (10jcrespo) > Of course im also expecting some historical context to potentially raise its head here. Cannot speak for other clusters, but while databases are not impossible, we are very far way to be able t... [13:47:53] (03CR) 10Vgutierrez: [C: 03+2] Release 8.0.7-1wm9 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/597055 (owner: 10Vgutierrez) [13:48:16] (03CR) 10Volans: [C: 03+2] tests: add @require_caplog to some actions tests [software/spicerack] - 10https://gerrit.wikimedia.org/r/597058 (owner: 10Volans) [13:50:33] (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.36 [software/spicerack] - 10https://gerrit.wikimedia.org/r/597060 [13:50:36] (03CR) 10Filippo Giunchedi: [C: 03+2] swift: set defaults for replicator concurrency [puppet] - 10https://gerrit.wikimedia.org/r/597056 (https://phabricator.wikimedia.org/T252537) (owner: 10Filippo Giunchedi) [13:50:44] (03PS2) 10Filippo Giunchedi: swift: set defaults for replicator concurrency [puppet] - 10https://gerrit.wikimedia.org/r/597056 (https://phabricator.wikimedia.org/T252537) [13:50:46] (03CR) 10Jcrespo: "Did you also review the rest of the patch? If I just add threads: 2 or something like that, the rest is as to do a +1?" 
[puppet] - 10https://gerrit.wikimedia.org/r/596255 (https://phabricator.wikimedia.org/T79922) (owner: 10Jcrespo) [13:52:29] (03PS1) 10Ottomata: Add kafka-jumbo1007 to jumbo-eqiad brokers [puppet] - 10https://gerrit.wikimedia.org/r/597061 (https://phabricator.wikimedia.org/T252675) [13:52:49] (03CR) 10Marostegui: [C: 03+1] "> Did you also review the rest of the patch? If I just add threads: 2" [puppet] - 10https://gerrit.wikimedia.org/r/596255 (https://phabricator.wikimedia.org/T79922) (owner: 10Jcrespo) [13:53:39] (03CR) 10Ottomata: [C: 03+2] Add kafka-jumbo1007 to jumbo-eqiad brokers [puppet] - 10https://gerrit.wikimedia.org/r/597061 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [13:53:41] (03PS2) 10Ottomata: Add kafka-jumbo1007 to jumbo-eqiad brokers [puppet] - 10https://gerrit.wikimedia.org/r/597061 (https://phabricator.wikimedia.org/T252675) [13:54:11] (03PS5) 10Jcrespo: backups: Add backup1002 as the eqiad host for ES db backups [puppet] - 10https://gerrit.wikimedia.org/r/596255 (https://phabricator.wikimedia.org/T79922) [13:57:11] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.36 [software/spicerack] - 10https://gerrit.wikimedia.org/r/597060 (owner: 10Volans) [13:57:56] !log authdns - ns[01] static routes on cr[12]-eqiad switching from authdns1001 to dns1002 for T241770 [13:57:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:59:26] (03PS1) 10Ssingh: cescout: harden the Postgres installation (improves f3a35978) [puppet] - 10https://gerrit.wikimedia.org/r/597062 (https://phabricator.wikimedia.org/T247273) [13:59:49] (03PS1) 10Volans: Upstream release v0.0.36 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/597063 [14:00:17] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) [14:00:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:20] !log elukey@cumin1001 START - Cookbook 
sre.hadoop.roll-restart-workers [14:02:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:04] !log upload trafficserver 8.0.7-1wm9 to apt.wm.o (buster) [14:06:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:17] (03CR) 10Volans: [C: 03+2] Upstream release v0.0.36 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/597063 (owner: 10Volans) [14:06:57] kormat: you might want to ACK db2136 alerts on Icinga so we keep it clean :) [14:07:04] !log authdns - ns[01] static routes on cr[12]-eqiad switching back to authdns1001 (oops, that's not the server we're taking offline today!) [14:07:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:02] (03PS5) 10Muehlenhoff: Add debian/ directory to the build overlay [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/594718 (https://phabricator.wikimedia.org/T233947) [14:08:19] (03CR) 10Muehlenhoff: Add debian/ directory to the build overlay (031 comment) [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/594718 (https://phabricator.wikimedia.org/T233947) (owner: 10Muehlenhoff) [14:08:34] (03PS1) 10JMeybohm: tiller: Upgrade to v2.16.7 on buster [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/597067 (https://phabricator.wikimedia.org/T252428) [14:08:39] marostegui: you know, i have zero idea how to do that [14:08:45] haha [14:08:56] kormat: icinga.wikimedia.org/alerts [14:09:16] (03CR) 10Ssingh: "https://puppet-compiler.wmflabs.org/compiler1002/22580/cescout1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/597062 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [14:09:29] there, if you check the boxes for db2136, and then: Select Command: Acknowledge checked service(s) [14:09:31] that should be it [14:09:41] !log uploaded spicerack_0.0.36-1_amd64.deb to apt.wikimedia.org stretch-wikimedia [14:09:43] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:10:32] 10Operations: Migrate ldap/corp replicas to Stretch/Buster - https://phabricator.wikimedia.org/T224557 (10ayounsi) 05Resolved→03Open Not sure if I'm re-opening the proper task, but looks relevant. dubnium/pollux are still present in DNS while I don't think they should (and they don't reply to pings). [14:10:34] 10Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10ayounsi) [14:12:01] (03CR) 10Volans: "recheck" [cookbooks] - 10https://gerrit.wikimedia.org/r/596639 (owner: 10Kormat) [14:12:15] (03PS1) 10Privacybatm: CuminExecution.py: Improve output message readabiliy of transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/597069 (https://phabricator.wikimedia.org/T252802) [14:12:38] !log dns1001 - shutting down for T241770 [14:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:40] (03CR) 10jerkins-bot: [V: 04-1] CuminExecution.py: Improve output message readabiliy of transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/597069 (https://phabricator.wikimedia.org/T252802) (owner: 10Privacybatm) [14:12:48] (03PS1) 10Filippo Giunchedi: thanos: add objstore support to sidecar [puppet] - 10https://gerrit.wikimedia.org/r/597071 (https://phabricator.wikimedia.org/T252186) [14:12:50] (03PS1) 10Filippo Giunchedi: thanos: add thanos::compact [puppet] - 10https://gerrit.wikimedia.org/r/597072 (https://phabricator.wikimedia.org/T252186) [14:12:52] (03PS1) 10Filippo Giunchedi: profile: add thanos::swift::backend [puppet] - 10https://gerrit.wikimedia.org/r/597073 (https://phabricator.wikimedia.org/T252186) [14:14:13] (03PS1) 10Ottomata: apt: add thirdparty/confluent to buster-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) [14:14:23] (03CR) 10jerkins-bot: [V: 04-1] thanos: add thanos::compact [puppet] - 
10https://gerrit.wikimedia.org/r/597072 (https://phabricator.wikimedia.org/T252186) (owner: 10Filippo Giunchedi) [14:14:49] (03CR) 10jerkins-bot: [V: 04-1] profile: add thanos::swift::backend [puppet] - 10https://gerrit.wikimedia.org/r/597073 (https://phabricator.wikimedia.org/T252186) (owner: 10Filippo Giunchedi) [14:15:05] !log kormat@cumin1001 dbctl commit (dc=all): 'Depool db2073 while replacing it T252985', diff saved to https://phabricator.wikimedia.org/P11216 and previous config saved to /var/cache/conftool/dbconfig/20200518-141505-kormat.json [14:15:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:15:09] T252985: Productionize db213[6-9] and db2140 - https://phabricator.wikimedia.org/T252985 [14:15:18] PROBLEM - BFD status on cr2-eqiad is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [14:16:20] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv4: Connect - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [14:16:45] ^ those are probably due to downtime/shutdown of dns1001 above [14:16:56] (03PS2) 10Privacybatm: CuminExecution.py: Improve output message readabiliy of transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/597069 (https://phabricator.wikimedia.org/T252802) [14:17:49] (03CR) 10Jbond: [C: 03+1] Add debian/ directory to the build overlay (031 comment) [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/594718 (https://phabricator.wikimedia.org/T233947) (owner: 10Muehlenhoff) [14:17:56] <_joe_> hnowlan: so I'm going to restart purged one dc at a time [14:18:57] ACKNOWLEDGEMENT - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 1 Brandon Black T241770 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [14:18:57] ACKNOWLEDGEMENT - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv4: Connect - Anycast Brandon Black T241770 
https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [14:18:57] ACKNOWLEDGEMENT - BFD status on cr2-eqiad is CRITICAL: CRIT: Down: 1 Brandon Black T241770 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [14:18:57] ACKNOWLEDGEMENT - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv4: Connect - Anycast Brandon Black T241770 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [14:19:13] (03CR) 10Privacybatm: "Proper comments will be ready once I rebase it with the auto port detection PR." [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/597069 (https://phabricator.wikimedia.org/T252802) (owner: 10Privacybatm) [14:19:27] <_joe_> !log start consuming $dc.resource-purge kafka topic from purged in all of codfw T133821 [14:19:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:30] T133821: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 [14:19:34] _joe_: ack [14:19:57] _joe_: do you want me to deploy the changeprop change per DC or wait until both are done? [14:19:57] PROBLEM - Check systemd state on ms-be1045 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:20:13] <_joe_> hnowlan: wait until I'm done [14:20:20] ok [14:20:21] <_joe_> but +2 the change in the meanwhile if you want [14:23:25] <_joe_> !log start consuming $dc.resource-purge kafka topic from purged in all of eqsin, ulsfo T133821 [14:23:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:16] 10Operations, 10Traffic, 10vm-requests: Create a Ganeti VM for Wikidough - https://phabricator.wikimedia.org/T253024 (10ssingh) [14:26:30] (03CR) 10Muehlenhoff: "We don't need to duplicate the update definition, the old naming was admittedly confusing, see inline comments (this is caused because in " (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [14:27:00] there might be gerrit spam incoming, apologies in advance [14:27:01] (03PS2) 10Filippo Giunchedi: thanos: add thanos::compact [puppet] - 10https://gerrit.wikimedia.org/r/597072 (https://phabricator.wikimedia.org/T252186) [14:27:03] (03PS2) 10Filippo Giunchedi: profile: add thanos::swift::backend [puppet] - 10https://gerrit.wikimedia.org/r/597073 (https://phabricator.wikimedia.org/T252186) [14:27:09] ok less than I thought [14:27:58] (03PS6) 10Muehlenhoff: Add debian/ directory to the build overlay [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/594718 (https://phabricator.wikimedia.org/T233947) [14:28:03] (03CR) 10Ottomata: apt: add thirdparty/confluent to buster-wikimedia (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [14:28:41] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be1045 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:29:03] <_joe_> !log start consuming $dc.resource-purge 
kafka topic from purged in all of eqiad T133821 [14:29:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:07] T133821: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 [14:29:15] PROBLEM - mysqld processes on db2073 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting [14:29:35] (03PS2) 10Ottomata: apt: add thirdparty/confluent to buster-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) [14:30:11] curious re: ms-be1045, I'll bounce ferm [14:30:27] (03CR) 10Alexandros Kosiaris: [C: 03+1] tiller: Upgrade to v2.16.7 on buster [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/597067 (https://phabricator.wikimedia.org/T252428) (owner: 10JMeybohm) [14:30:45] RECOVERY - Check systemd state on ms-be1045 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:31:01] (03CR) 10Alexandros Kosiaris: [C: 03+1] zotero: enable TLS with chart defaults [deployment-charts] - 10https://gerrit.wikimedia.org/r/597036 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [14:31:10] kormat: db2073 alerted ^ [14:32:17] (03PS2) 10Krinkle: SpecialVersionVersionUrl: Don't use confusing local variable name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/596726 (owner: 10Jforrester) [14:32:20] (03CR) 10Krinkle: [C: 03+1] SpecialVersionVersionUrl: Don't use confusing local variable name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/596726 (owner: 10Jforrester) [14:33:03] <_joe_> !log start consuming $dc.resource-purge kafka topic from purged in all of esams T133821 [14:33:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:18] <_joe_> hnowlan: deploy at your pleasure [14:33:25] _joe_: will do [14:33:42] marostegui: that's what i get for trying to do a minimal downtime of 
services [14:33:59] (03CR) 10Muehlenhoff: apt: add thirdparty/confluent to buster-wikimedia (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [14:34:02] kormat: I usually downtime the host entirely for that reason :( [14:34:04] !log hnowlan@deploy1001 Started deploy [changeprop/deploy@16bf19f]: Stop consuming purges topic, purged is now doing this [14:34:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:26] !log hnowlan@deploy1001 Finished deploy [changeprop/deploy@16bf19f]: Stop consuming purges topic, purged is now doing this (duration: 01m 22s) [14:35:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:02] (03CR) 10Ottomata: apt: add thirdparty/confluent to buster-wikimedia (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [14:36:05] _joe_: done [14:36:21] (03PS3) 10Ottomata: apt: add thirdparty/confluent to buster-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) [14:36:50] (03PS1) 10JMeybohm: admin: jayme dotfiles: Add hfenv to PS1 [puppet] - 10https://gerrit.wikimedia.org/r/597089 [14:37:21] <_joe_> hnowlan: ok, this also means you should be unblocked from moving changeprop to k8s completely, correct? [14:37:47] _joe_: yep, as it stands zero changes are needed in k8s beyond this. All that needs to be done is removing the scb instances. [14:37:48] (03PS4) 10Ottomata: apt: add thirdparty/confluent to buster-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) [14:37:51] Thanks! :D [14:37:55] (03CR) 10JMeybohm: [C: 03+2] admin: jayme dotfiles: Add hfenv to PS1 [puppet] - 10https://gerrit.wikimedia.org/r/597089 (owner: 10JMeybohm) [14:38:11] <_joe_> nice! 
[14:38:36] _joe_: I'll hold off on a complete removal of the scb instances for a few days, just in case. [14:39:46] <_joe_> hnowlan: should we turn them off progressively? [14:39:54] <_joe_> but we can wait for petr to be back too [14:40:18] As it stands there's no real benefit to it. They're not subscribed to any topics or processing any messages at the moment [14:40:38] the only reason I propose keeping them around is that we can reenable the old rules in a hurry if needs be [14:40:52] (03PS1) 10Hashar: jenkins: master should stick to java 8 [puppet] - 10https://gerrit.wikimedia.org/r/597090 (https://phabricator.wikimedia.org/T224591) [14:42:14] (03CR) 10Muehlenhoff: "Looks good, one last comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [14:42:25] 10Operations: dbctl gives user-hostile diffs - https://phabricator.wikimedia.org/T253025 (10Kormat) [14:42:39] 10Operations: dbctl gives user-hostile diffs - https://phabricator.wikimedia.org/T253025 (10Kormat) [14:42:54] 10Operations, 10conftool: dbctl gives user-hostile diffs - https://phabricator.wikimedia.org/T253025 (10Marostegui) [14:43:03] (03CR) 10Ottomata: apt: add thirdparty/confluent to buster-wikimedia (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [14:43:20] (03PS5) 10Ottomata: apt: add thirdparty/confluent to buster-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) [14:43:44] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [14:43:46] (03PS1) 10Ema: 5.1.3-1wm15: add 0038-vcl_active-lock.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/597091 (https://phabricator.wikimedia.org/T236754) [14:44:58] (03CR) 10Hashar: "currently, contint2001 (Buster) has 
/usr/bin/java pointing to java11. Since the systemd service uses /usr/bin/java to start the service th" [puppet] - 10https://gerrit.wikimedia.org/r/597090 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [14:45:21] (03CR) 10Ottomata: [C: 03+2] apt: add thirdparty/confluent to buster-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/597076 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [14:45:23] (03CR) 10Hashar: "(and I have dropped jenkins::common in the process since it was slightly confusing)" [puppet] - 10https://gerrit.wikimedia.org/r/597090 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [14:45:41] (03CR) 10Alexandros Kosiaris: [C: 03+1] termbox: fix wrong TLS port [deployment-charts] - 10https://gerrit.wikimedia.org/r/597034 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [14:45:56] (03Abandoned) 10Jbond: pcc: add default paramaters [puppet] - 10https://gerrit.wikimedia.org/r/594724 (owner: 10Jbond) [14:46:47] (03CR) 10JMeybohm: [C: 03+2] termbox: fix wrong TLS port [deployment-charts] - 10https://gerrit.wikimedia.org/r/597034 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [14:48:25] 10Operations, 10Traffic, 10serviceops, 10Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (10Joe) Status update: purged is now consuming purges from restbase directly via kafka and not via multicast anymore. This should unblock the complete migration of c... [14:49:33] PROBLEM - Host dns1001.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:49:59] !log jayme@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' . [14:50:00] !log jayme@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' . 
[14:50:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:12] !log jayme@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' . [14:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:23] !log hnowlan@cumin1001 START - Cookbook sre.cassandra.roll-restart [14:55:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:28] RECOVERY - Host dns1001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.04 ms [14:56:30] !log roll-restart of sessionstore cassandra hosts for java security update [14:56:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:54] 10Operations: Integrate Buster 10.4 point update - https://phabricator.wikimedia.org/T252394 (10MoritzMuehlenhoff) [14:57:25] !log jayme@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' . 
[14:57:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:45] (03CR) 10Alexandros Kosiaris: [C: 03+1] termbox: enable TLS with chart defaults [deployment-charts] - 10https://gerrit.wikimedia.org/r/597035 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [14:59:33] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be1045 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [15:00:43] (03PS1) 10Cmjohnson: Adding new mac address for dns1001 to reflect nic card addition [puppet] - 10https://gerrit.wikimedia.org/r/597095 (https://phabricator.wikimedia.org/T241770) [15:01:10] (03PS1) 10Ottomata: Use apt::package_from_component in confluent::kafka::common [puppet] - 10https://gerrit.wikimedia.org/r/597097 (https://phabricator.wikimedia.org/T252675) [15:02:01] (03CR) 10Cmjohnson: [C: 03+2] Adding new mac address for dns1001 to reflect nic card addition [puppet] - 10https://gerrit.wikimedia.org/r/597095 (https://phabricator.wikimedia.org/T241770) (owner: 10Cmjohnson) [15:04:02] (03PS1) 10Marostegui: wikireplica_analytics: Query killer decreased to 7200 [puppet] - 10https://gerrit.wikimedia.org/r/597098 (https://phabricator.wikimedia.org/T251719) [15:04:17] (03CR) 10jerkins-bot: [V: 04-1] 5.1.3-1wm15: add 0038-vcl_active-lock.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/597091 (https://phabricator.wikimedia.org/T236754) (owner: 10Ema) [15:04:27] (03CR) 10jerkins-bot: [V: 04-1] wikireplica_analytics: Query killer decreased to 7200 [puppet] - 10https://gerrit.wikimedia.org/r/597098 (https://phabricator.wikimedia.org/T251719) (owner: 10Marostegui) [15:05:46] 10Operations, 10netops: RRDP status alert - https://phabricator.wikimedia.org/T245121 (10ayounsi) 05Open→03Resolved a:03ayounsi * Routinator upgraded in T252010. Which helped to remove the "dubious" targets. 
* Since this task has been opened, proxies have been moved to new hosts and performance has incre... [15:06:36] (03PS2) 10Marostegui: wikireplica_analytics: Query killer decreased to 7200 [puppet] - 10https://gerrit.wikimedia.org/r/597098 (https://phabricator.wikimedia.org/T251719) [15:07:10] (03CR) 10Ottomata: "Looks good I think" [puppet] - 10https://gerrit.wikimedia.org/r/597097 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [15:07:37] (03CR) 10Marostegui: [C: 03+2] wikireplica_analytics: Query killer decreased to 7200 [puppet] - 10https://gerrit.wikimedia.org/r/597098 (https://phabricator.wikimedia.org/T251719) (owner: 10Marostegui) [15:10:47] (03Abandoned) 10Hashar: Point to current working directory by default [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/589710 (owner: 10Hashar) [15:11:16] !log krinkle@mc1021 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet [15:11:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:33] (03Abandoned) 10Hashar: WIP WIP WIP Switch CI to contint2001 WIP WIP WIP [puppet] - 10https://gerrit.wikimedia.org/r/587521 (owner: 10Hashar) [15:14:55] (03PS1) 10Muehlenhoff: Remove DNS entries for dubnium/pollux [dns] - 10https://gerrit.wikimedia.org/r/597099 (https://phabricator.wikimedia.org/T224557) [15:15:42] (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/596639 (owner: 10Kormat) [15:16:16] (03PS2) 10Muehlenhoff: Enable CAS staging host for Icinga [puppet] - 10https://gerrit.wikimedia.org/r/596174 [15:19:42] (03CR) 10Muehlenhoff: [C: 03+1] "Looks fine" [puppet] - 10https://gerrit.wikimedia.org/r/597090 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [15:20:18] (03PS2) 10Kormat: Update cookbooks for 'mysql' -> 'mysql_legacy' rename. 
[cookbooks] - 10https://gerrit.wikimedia.org/r/596639 [15:22:29] (03PS1) 10Volans: icinga: fix get_status() [software/spicerack] - 10https://gerrit.wikimedia.org/r/597102 [15:22:41] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) [15:22:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:22:56] \o/ [15:23:40] (03CR) 10Kormat: [C: 03+2] Update cookbooks for 'mysql' -> 'mysql_legacy' rename. [cookbooks] - 10https://gerrit.wikimedia.org/r/596639 (owner: 10Kormat) [15:24:58] (03PS1) 10Cmjohnson: Removing all dns for decom host mw1280 [dns] - 10https://gerrit.wikimedia.org/r/597103 (https://phabricator.wikimedia.org/T251077) [15:25:09] (03PS2) 10Volans: icinga: fix get_status() [software/spicerack] - 10https://gerrit.wikimedia.org/r/597102 [15:25:25] (03Merged) 10jenkins-bot: Update cookbooks for 'mysql' -> 'mysql_legacy' rename. [cookbooks] - 10https://gerrit.wikimedia.org/r/596639 (owner: 10Kormat) [15:25:29] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" 
[puppet] - 10https://gerrit.wikimedia.org/r/597097 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [15:25:49] (03CR) 10Cmjohnson: [C: 03+2] Removing all dns for decom host mw1280 [dns] - 10https://gerrit.wikimedia.org/r/597103 (https://phabricator.wikimedia.org/T251077) (owner: 10Cmjohnson) [15:30:35] (03CR) 10Muehlenhoff: cescout: harden the Postgres installation (improves f3a35978) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/597062 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [15:30:49] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) [15:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:51] 10Operations, 10ops-eqiad, 10serviceops, 10Patch-For-Review: mw1280 correctable memory errors logged in getsel - https://phabricator.wikimedia.org/T251077 (10Cmjohnson) 05Open→03Resolved Removed from rack, updated netbox, removed dns, verified no entries in site.pp or puppet. [15:32:48] (03PS1) 10Cmjohnson: Removing mgmt dns entries for decom host dbproxy1010 [dns] - 10https://gerrit.wikimedia.org/r/597104 (https://phabricator.wikimedia.org/T248944) [15:34:11] (03PS2) 10Ssingh: cescout: harden the Postgres installation (improves f3a35978) [puppet] - 10https://gerrit.wikimedia.org/r/597062 (https://phabricator.wikimedia.org/T247273) [15:34:57] (03CR) 10Ssingh: ">" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/597062 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [15:35:01] (03CR) 10Cmjohnson: [C: 03+2] Removing mgmt dns entries for decom host dbproxy1010 [dns] - 10https://gerrit.wikimedia.org/r/597104 (https://phabricator.wikimedia.org/T248944) (owner: 10Cmjohnson) [15:37:43] (03CR) 10Jbond: [C: 03+1] "ahh thanks :)" [software/spicerack] - 10https://gerrit.wikimedia.org/r/597102 (owner: 10Volans) [15:39:51] (03CR) 10Volans: [C: 03+2] icinga: fix get_status() [software/spicerack] - 10https://gerrit.wikimedia.org/r/597102 (owner: 
10Volans) [15:40:27] (03CR) 10Jbond: [C: 03+1] "lgtm" [dns] - 10https://gerrit.wikimedia.org/r/597099 (https://phabricator.wikimedia.org/T224557) (owner: 10Muehlenhoff) [15:41:42] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission dbproxy1010.eqiad.wmnet - https://phabricator.wikimedia.org/T248944 (10Cmjohnson) [15:41:55] 10Operations, 10DBA, 10Data-Services: Replace labsdb (wikireplicas) dbproxies: dbproxy1010 and dbproxy1011 - https://phabricator.wikimedia.org/T231520 (10Cmjohnson) [15:42:02] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission dbproxy1010.eqiad.wmnet - https://phabricator.wikimedia.org/T248944 (10Cmjohnson) 05Open→03Resolved updated and removed [15:44:10] 10Operations, 10ORES, 10Scoring-platform-team (Current): ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (10Halfak) Oh yes. We need help from SRE. I'm at my limit here. @akosiaris was working with us on this in the... [15:46:25] (03CR) 10Muehlenhoff: "Looks good, one final nit." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/597062 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [15:47:05] (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.37 [software/spicerack] - 10https://gerrit.wikimedia.org/r/597105 [15:48:18] (03PS3) 10Ssingh: cescout: harden the Postgres installation (improves f3a35978) [puppet] - 10https://gerrit.wikimedia.org/r/597062 (https://phabricator.wikimedia.org/T247273) [15:48:31] (03PS2) 10Muehlenhoff: Remove DNS entries for dubnium/pollux [dns] - 10https://gerrit.wikimedia.org/r/597099 (https://phabricator.wikimedia.org/T224557) [15:48:33] (03CR) 10Ssingh: ">" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/597062 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [15:48:56] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" 
[puppet] - 10https://gerrit.wikimedia.org/r/597062 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [15:50:29] (03CR) 10Ssingh: [C: 03+2] cescout: harden the Postgres installation (improves f3a35978) [puppet] - 10https://gerrit.wikimedia.org/r/597062 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [15:51:31] !log hnowlan@cumin1001 START - Cookbook sre.cassandra.roll-restart [15:51:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:12] (03CR) 10Muehlenhoff: [C: 03+2] Remove DNS entries for dubnium/pollux [dns] - 10https://gerrit.wikimedia.org/r/597099 (https://phabricator.wikimedia.org/T224557) (owner: 10Muehlenhoff) [15:52:21] !log rolling codfw cassandra for java security updates [15:52:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:33] 10Operations, 10Patch-For-Review: Migrate ldap/corp replicas to Stretch/Buster - https://phabricator.wikimedia.org/T224557 (10MoritzMuehlenhoff) 05Open→03Resolved >>! In T224557#6144902, @ayounsi wrote: > Not sure if I'm re-opening the proper task, but looks relevant. > > dubnium/pollux are still present... [15:53:35] 10Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10MoritzMuehlenhoff) [15:55:15] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: ASAP) install additional SSDs into prometheus100[34] - https://phabricator.wikimedia.org/T251621 (10Cmjohnson) I see on the procurement task that these were supposed to be here on the 6 May but I have not seen them. @Jclark-ctr have you received them? 
[15:56:03] (03CR) 10Dzahn: [C: 03+2] delete the apache module, replaced by httpd [puppet] - 10https://gerrit.wikimedia.org/r/596694 (https://phabricator.wikimedia.org/T252190) (owner: 10Dzahn) [15:56:06] 10Operations, 10ops-eqiad, 10decommission, 10cloud-services-team (Kanban): labsdb1002-array1: status clarification - https://phabricator.wikimedia.org/T214903 (10Cmjohnson) 05Open→03Resolved I removed from rack and updated netbox [15:57:00] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.37 [software/spicerack] - 10https://gerrit.wikimedia.org/r/597105 (owner: 10Volans) [15:58:34] (03PS1) 10Volans: Upstream release v0.0.37 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/597108 [16:04:08] (03CR) 10Halfak: [C: 03+1] "Looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/595167 (owner: 10Alexandros Kosiaris) [16:06:53] (03CR) 10Volans: [C: 03+2] Upstream release v0.0.37 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/597108 (owner: 10Volans) [16:10:25] !log uploaded spicerack_0.0.37-1_amd64.deb to apt.wikimedia.org stretch-wikimedia [16:10:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:30] !log dns1001 - reimaging for new NIC - T241770 [16:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:22:57] PROBLEM - Recursive DNS on 208.80.154.10 is CRITICAL: Return code of 255 is out of bounds https://wikitech.wikimedia.org/wiki/DNS [16:24:17] ^ related to dns1001 reimage (the IP came back online during install, so now it can alert on the service :P) [16:25:12] ACKNOWLEDGEMENT - Recursive DNS on 208.80.154.10 is CRITICAL: Return code of 255 is out of bounds Brandon Black T241770 https://wikitech.wikimedia.org/wiki/DNS [16:30:13] !log bblack@cumin1001 START - Cookbook sre.hosts.downtime [16:30:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:46] !log bblack@cumin1001 END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) [16:32:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:34:33] PROBLEM - Host 2620:0:861:1:b226:28ff:fed9:dcc0 is DOWN: PING CRITICAL - Packet loss = 100% [16:34:39] PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:37:29] PROBLEM - Host 2620:0:861:1:b226:28ff:fed9:dcc0 is DOWN: PING CRITICAL - Packet loss = 100% [16:38:13] RECOVERY - BFD status on cr2-eqiad is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [16:41:39] RECOVERY - BGP status on cr1-eqiad is OK: BGP OK - up: 38, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [16:42:37] 10Operations, 10Analytics, 10Analytics-Kanban, 10EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (10Ottomata) Hio, I see some upgrades happened, should they have fixed this? [16:45:55] !log updated views on labsdb1011 for the wb_terms changes T251598 [16:45:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:46:00] T251598: Clean up wb_terms related views - https://phabricator.wikimedia.org/T251598 [16:49:26] (03PS13) 10Alexandros Kosiaris: Add recommendation-api chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov) [16:54:45] (03CR) 10Alexandros Kosiaris: [C: 03+1] "I 've uploaded 1 last change fixing some issues with the configuration file being mounted in the container, this is ready for merge." 
(031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov) [17:06:09] !log dns1001 - removing downtimes, back in service - T241770 [17:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:58] (03PS1) 10Jdlrobson: English wordmark dimensions are incorrect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597117 (https://phabricator.wikimedia.org/T252143) [17:11:31] (03CR) 10CRusnov: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/596787 (owner: 10CRusnov) [17:14:17] !log update domain object for 56.15.185.in-addr.arpa - T247972 [17:14:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:20] T247972: Cloud DNS: fix inconsistent ownership of reverse domains for openstack floating ip networks - https://phabricator.wikimedia.org/T247972 [17:15:13] cdanis: can you s/maro.*$/rzl/ in the topic please? [17:15:46] thanks Reedy [17:15:53] rzl: and reminder I'm happy to be your backup [17:15:55] thanks indeed! [17:16:00] cdanis: 👍 [17:21:40] (03CR) 10VolkerE: [C: 03+1] English wordmark dimensions are incorrect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597117 (https://phabricator.wikimedia.org/T252143) (owner: 10Jdlrobson) [17:23:31] 10Operations, 10Analytics, 10Analytics-Kanban, 10EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (10Ottomata) Heh, just tried myself, I guess not; still got disconnected after 15 minutes. 
[17:25:44] (03CR) 10Ottomata: [C: 03+2] Use apt::package_from_component in confluent::kafka::common [puppet] - 10https://gerrit.wikimedia.org/r/597097 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [17:25:50] (03PS2) 10Ottomata: Use apt::package_from_component in confluent::kafka::common [puppet] - 10https://gerrit.wikimedia.org/r/597097 (https://phabricator.wikimedia.org/T252675) [17:29:33] apt1001 [17:29:37] oops wrong room [17:33:16] (03CR) 10Jcrespo: "Looking good, but could you put the hiding of output under a --verbose flag. You don't have to do it now, you can wait on rebase if it is " [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/597069 (https://phabricator.wikimedia.org/T252802) (owner: 10Privacybatm) [17:35:01] (03PS1) 10Papaul: Partman: Change entry for thanos-be200[1-4] [puppet] - 10https://gerrit.wikimedia.org/r/597121 [17:39:17] (03CR) 10Jcrespo: "I will now look at the final code proposal, but please also add more details to the commit message. Previous ones were very simple patches" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/595516 (https://phabricator.wikimedia.org/T252171) (owner: 10Privacybatm) [17:39:36] 10Operations, 10ops-codfw, 10DBA: db2097 memory errors leading to crash - https://phabricator.wikimedia.org/T252492 (10Papaul) Case Reference ID: 5347351645 Status: Case is generated and in Progress Product: HPE ProLiant DL360 Gen10 8SFF Configure-to-order Server Product number: 867959-B21 Serial number: Su... [17:39:56] (03CR) 10Jcrespo: "On my previous message, when I said netcat, I meant netstat, sorry." [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/595516 (https://phabricator.wikimedia.org/T252171) (owner: 10Privacybatm) [17:45:41] 10Operations, 10ops-codfw: BBU faulty on ms-be2016 - https://phabricator.wikimedia.org/T252851 (10Papaul) @fgiunchedi unfortunately this system is out of warranty since 2018. Please let me know how do you want to proceed. 
Buy a new BBU or decommission the server Thanks [17:46:44] (03CR) 10Papaul: [C: 03+2] Partman: Change entry for thanos-be200[1-4] [puppet] - 10https://gerrit.wikimedia.org/r/597121 (owner: 10Papaul) [17:56:14] (03CR) 10Jcrespo: "Please check my comments with a couple of improvements (IP hardcoding, patch summary). I will test it next." (032 comments) [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/595516 (https://phabricator.wikimedia.org/T252171) (owner: 10Privacybatm) [18:00:04] RoanKattouw, Niharika, and Urbanecm: How many deployers does it take to do Morning SWAT(Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200518T1800). [18:00:04] Jdlrobson: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:33] 10Operations, 10ops-codfw, 10DBA: db2097 memory errors leading to crash - https://phabricator.wikimedia.org/T252492 (10Papaul) Will be receiving the DIMM tomorrow. The HP engineer recommended to update the firmware after the DIMM has been replaced. [18:01:59] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` thanos-be2001.codfw.wmnet ` The log can be found in `/var/log/wmf-au... [18:04:02] PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 78, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [18:04:06] (03CR) 10Privacybatm: "> Patch Set 8:" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/595516 (https://phabricator.wikimedia.org/T252171) (owner: 10Privacybatm) [18:05:49] Jdlrobson: You here for your SWAT? 
[18:07:08] (03CR) 10Privacybatm: "And the hardcoded IP was a mistake, that was the ip of my machine, I forgot to change it :-/" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/595516 (https://phabricator.wikimedia.org/T252171) (owner: 10Privacybatm) [18:07:40] RoanKattouw: yep [18:07:47] sorry was having connection issues this morning [18:07:53] are you still around? [18:08:27] (03CR) 10Catrope: [C: 03+2] English wordmark dimensions are incorrect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597117 (https://phabricator.wikimedia.org/T252143) (owner: 10Jdlrobson) [18:09:12] (03Merged) 10jenkins-bot: English wordmark dimensions are incorrect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597117 (https://phabricator.wikimedia.org/T252143) (owner: 10Jdlrobson) [18:10:29] Jdlrobson: Ready for testing on mwdebug1002 [18:10:37] RoanKattouw: sweet this should be quick [18:10:50] RoanKattouw: perfect :) [18:11:39] (03CR) 10Jcrespo: "> I will make 2 patches then, can I refer to this same ticket for" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/595516 (https://phabricator.wikimedia.org/T252171) (owner: 10Privacybatm) [18:12:24] RoanKattouw: thanks for the swat! 
[18:13:15] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Fix English Wikipedia wordmark dimensions (T252143) (duration: 01m 06s) [18:13:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:19] T252143: Update existing outdated wordmarks - https://phabricator.wikimedia.org/T252143 [18:19:58] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thanos-be2001.codfw.wmnet'] ` Of which those **FAILED**: ` ['thanos-be2001.codfw.wmnet'] ` [18:20:23] 10Operations, 10Security-Team, 10serviceops, 10vm-requests, 10PM: Eqiad: 1VM request for Peek (PM service in use by Security Team) - https://phabricator.wikimedia.org/T252210 (10chasemp) 10 day bump :) [18:21:48] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10Papaul) ` [edit interfaces interface-range vlan-private1-a-codfw] member ge-1/0/0 { ... } + member xe-7/0/3; [edit interfaces interface-range disabled] - member x... [18:24:00] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) [18:24:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:42] 10Operations, 10ops-codfw, 10DBA: db2097 memory errors leading to crash - https://phabricator.wikimedia.org/T252492 (10Papaul) Hello Papaul, Greetings from Hewlett Packard Enterprise! As discussed , as per the AHS logs : Memory Failure is seen on Proc 2 DIMM 4. Uncorrectable Machine Check exception is s... [18:29:04] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` thanos-be2001.codfw.wmnet ` The log can be found in `/var/log/wmf-au... 
[18:34:28] (03PS1) 10Krinkle: Turn off $wgResourceLoaderUseObjectCacheForDeps in Beta and testwikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597129 [18:36:40] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Discovery-Search (Current work): SRE Onboarding - Ryan Kemper, Search Platform team - https://phabricator.wikimedia.org/T251572 (10herron) [18:38:00] !log upgraded spicerack to 0.0.37-1 on cumin[12]001 [18:38:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:38:24] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Discovery-Search (Current work): SRE Onboarding - Ryan Kemper, Search Platform team - https://phabricator.wikimedia.org/T251572 (10herron) 05Open→03Resolved Per IRC convo with @RKemper we'll defer the U2F setup for a later date, unless that... [18:46:08] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [18:46:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:48:49] (03CR) 10Krinkle: [C: 03+2] Turn off $wgResourceLoaderUseObjectCacheForDeps in Beta and testwikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597129 (owner: 10Krinkle) [18:49:26] RoanKattouw: (I cancelled ^) Is deploy clear? 
[18:49:27] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [18:49:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:49:56] Krinkle: yeah go for it [18:50:14] (03CR) 10Krinkle: [C: 03+2] Turn off $wgResourceLoaderUseObjectCacheForDeps in Beta and testwikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597129 (owner: 10Krinkle) [18:51:05] (03Merged) 10jenkins-bot: Turn off $wgResourceLoaderUseObjectCacheForDeps in Beta and testwikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597129 (owner: 10Krinkle) [18:51:50] (03PS1) 10Jon Harald Søby: Initial config for shnwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597132 (https://phabricator.wikimedia.org/T253029) [18:53:08] (03CR) 10jerkins-bot: [V: 04-1] Initial config for shnwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597132 (https://phabricator.wikimedia.org/T253029) (owner: 10Jon Harald Søby) [18:54:14] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thanos-be2001.codfw.wmnet'] ` and were **ALL** successful. 
[18:54:34] * Krinkle staging on mwdebug1002 [18:58:20] !log krinkle@deploy1001 Synchronized wmf-config/CommonSettings-labs.php: Ic005093778d (duration: 01m 06s) [18:58:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:58:32] (03PS2) 10Jon Harald Søby: Initial config for shnwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597132 (https://phabricator.wikimedia.org/T253029) [18:59:23] (03CR) 10jerkins-bot: [V: 04-1] Initial config for shnwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597132 (https://phabricator.wikimedia.org/T253029) (owner: 10Jon Harald Søby) [19:00:39] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Ic005093778d (duration: 01m 08s) [19:00:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:57] !log performing rolling maintenance on kafka-main to pick up java security updates [19:07:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:48] (03PS1) 10Ottomata: Add kafka-jumbo100[89] into the jumbo-eqiad kafka cluster [puppet] - 10https://gerrit.wikimedia.org/r/597134 (https://phabricator.wikimedia.org/T252675) [19:17:28] (03PS2) 10Ottomata: Add kafka-jumbo100[89] into the jumbo-eqiad kafka cluster [puppet] - 10https://gerrit.wikimedia.org/r/597134 (https://phabricator.wikimedia.org/T252675) [19:18:27] (03CR) 10Ottomata: [C: 03+2] Add kafka-jumbo100[89] into the jumbo-eqiad kafka cluster [puppet] - 10https://gerrit.wikimedia.org/r/597134 (https://phabricator.wikimedia.org/T252675) (owner: 10Ottomata) [19:20:03] (03PS3) 10Jon Harald Søby: Initial config for shnwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597132 (https://phabricator.wikimedia.org/T253029) [19:20:52] (03CR) 10jerkins-bot: [V: 04-1] Initial config for shnwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/597132 (https://phabricator.wikimedia.org/T253029) (owner: 10Jon Harald Søby) [19:24:53] PROBLEM - 
Check systemd state on ms-be2053 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:30:26] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2053 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [19:38:00] 10Operations, 10ops-eqiad, 10DC-Ops: (NEED BY: ASAP) rack/setup/install thanos-fe100[123].eqiad.wmnet - https://phabricator.wikimedia.org/T251620 (10Jclark-ctr) [19:39:24] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: ASAP) install additional SSDs into prometheus100[34] - https://phabricator.wikimedia.org/T251621 (10Jclark-ctr) have not seen these arrive [19:45:19] 10Operations, 10ops-eqiad, 10netops: asw2-d1-eqiad:VCP failure - https://phabricator.wikimedia.org/T252797 (10wiki_willy) a:03Jclark-ctr [19:52:20] RECOVERY - Check systemd state on ms-be2053 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:57:42] 10Operations, 10SRE-Access-Requests: Requesting access to sites from Google Search Console for soworu - https://phabricator.wikimedia.org/T252705 (10RLazarus) a:03RLazarus Hi Segun, thanks for the clear and complete request! I've granted you [[ https://support.google.com/webmasters/answer/7687615 | "restrict... [19:58:09] 10Operations, 10SRE-Access-Requests: Requesting access to sites from Google Search Console for soworu - https://phabricator.wikimedia.org/T252705 (10RLazarus) p:05Triage→03Medium [20:00:04] halfak and accraze: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Graphoid / Citoid / ORES . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200518T2000). 
[20:00:42] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2053 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [20:18:53] (03PS9) 10Privacybatm: transfer.py: Add the ability to auto-detect free port for netcat to listen [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/595516 (https://phabricator.wikimedia.org/T252171) [20:23:59] !log bsitzmann@deploy1001 Started deploy [mobileapps/deploy@12efc14]: Update mobileapps to c960b349 [20:24:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:29] !log bsitzmann@deploy1001 Finished deploy [mobileapps/deploy@12efc14]: Update mobileapps to c960b349 (duration: 03m 31s) [20:27:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:28:17] (03PS1) 10Privacybatm: Add comments to Firewall, MariaDB and transfer modules [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/597158 (https://phabricator.wikimedia.org/T252171) [20:34:20] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 9516 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [20:36:12] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 52 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [20:37:48] (03PS1) 10Andrew Bogott: prepare cloudnet2002-dev and 2003-dev for rebuild with Buster [puppet] - 10https://gerrit.wikimedia.org/r/597159 (https://phabricator.wikimedia.org/T251294) [20:46:39] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet 
for hosts: ` thanos-be2002.codfw.wmnet ` The log can be found in `/var/log/wmf-au... [21:00:04] Reedy and sbassett: That opportune time is upon us again. Time for a Weekly Security deployment window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200518T2100). [21:01:02] 10Operations, 10SRE-Access-Requests: Requesting access to wikimedia namespace in packagist - https://phabricator.wikimedia.org/T252987 (10bd808) Packagist does not have an "org account" concept to make multi-user access easy (https://github.com/composer/packagist/issues/461). Instead we have a shared username... [21:11:34] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [21:11:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:14:02] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [21:14:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:14:40] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thanos-be2002.codfw.wmnet'] ` Of which those **FAILED**: ` ['thanos-be2002.codfw.wmnet'] ` [21:16:46] * Krinkle staging on mwdebug1002 [21:17:46] (03CR) 10Privacybatm: "> The idea would be that, "if you run transfer.py --verbose" you get the same output as before, but by default you get the more resonable " [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/597069 (https://phabricator.wikimedia.org/T252802) (owner: 10Privacybatm) [21:20:48] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` thanos-be2002.codfw.wmnet ` The log can be found in `/var/log/wmf-au... 
[21:21:23] (03PS1) 10Herron: wip [puppet] - 10https://gerrit.wikimedia.org/r/597165 [21:23:58] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.32/includes/resourceloader/dependencystore/: I015fa5885, I972a93806006 (duration: 01m 07s) [21:23:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:25:31] 10Operations: Requesting access to wikimedia namespace in packagist - https://phabricator.wikimedia.org/T252987 (10RLazarus) Thanks @bd808! It doesn't sound like there's presently anything for the SRE clinic-duty person to do here, so I'm optimistically removing #sre-access-requests, but feel free to add it back... [21:27:13] 10Operations, 10ops-eqiad, 10DC-Ops: (NEED BY: ASAP) rack/setup/install thanos-fe100[123].eqiad.wmnet - https://phabricator.wikimedia.org/T251620 (10wiki_willy) a:03Jclark-ctr [21:27:29] 10Operations: Enforce reference to Phabricator task for all commits to modules/admin/data/data.yaml - https://phabricator.wikimedia.org/T142827 (10hashar) Nowaday this can be done directly in the puppet repository. There is an admin testenv (`tox -e admin`) which runs: `pytest modules/admin/data` I guess the log... [21:27:38] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install cloudcephosd10[04-15].wikimedia.org - https://phabricator.wikimedia.org/T251619 (10wiki_willy) a:03Jclark-ctr [21:27:58] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TDB) rack/setup/install cloudvirt103[1-4].eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (10wiki_willy) a:03Jclark-ctr [21:28:21] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: ASAP) rack/setup/install thanos-be100[123] - https://phabricator.wikimedia.org/T251618 (10wiki_willy) a:03Jclark-ctr [21:32:12] (03PS2) 10Herron: wip [puppet] - 10https://gerrit.wikimedia.org/r/597165 [21:34:50] (03PS1) 10RLazarus: admin: Add cbogen to ldap_only_users, preparatory to adding to wmf group. 
[puppet] - 10https://gerrit.wikimedia.org/r/597166 (https://phabricator.wikimedia.org/T252887) [21:36:10] (03CR) 10CDanis: [C: 03+1] admin: Add cbogen to ldap_only_users, preparatory to adding to wmf group. [puppet] - 10https://gerrit.wikimedia.org/r/597166 (https://phabricator.wikimedia.org/T252887) (owner: 10RLazarus) [21:36:30] (03CR) 10RLazarus: [C: 03+2] admin: Add cbogen to ldap_only_users, preparatory to adding to wmf group. [puppet] - 10https://gerrit.wikimedia.org/r/597166 (https://phabricator.wikimedia.org/T252887) (owner: 10RLazarus) [21:37:51] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [21:37:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:40:17] !log pt1979@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [21:40:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:40:49] (03PS3) 10Herron: netbox::frontend: log to localhost udp rsyslog listener (json compat) [puppet] - 10https://gerrit.wikimedia.org/r/597165 [21:41:16] (03CR) 10Herron: "PCC looks to DTRT https://puppet-compiler.wmflabs.org/compiler1001/22583/netbox1001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/597165 (owner: 10Herron) [21:44:19] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: LDAP access to the wmf group for Carly Bogen - https://phabricator.wikimedia.org/T252887 (10RLazarus) 05Open→03Resolved a:03RLazarus Hi Carly, I've added you to the wmf LDAP group -- feel free to reopen if you need anything else. ` rzl@mwmaint1... 
[21:45:37] (03CR) 10Andrew Bogott: [C: 03+2] prepare cloudnet2002-dev and 2003-dev for rebuild with Buster [puppet] - 10https://gerrit.wikimedia.org/r/597159 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [21:46:06] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thanos-be2002.codfw.wmnet'] ` and were **ALL** successful. [21:53:02] (03PS1) 10RLazarus: admin: Add dcipoletti to ldap_only_users, preparatory to adding to wmf group. [puppet] - 10https://gerrit.wikimedia.org/r/597169 (https://phabricator.wikimedia.org/T252674) [21:55:41] (03CR) 10CDanis: [C: 03+1] admin: Add dcipoletti to ldap_only_users, preparatory to adding to wmf group. [puppet] - 10https://gerrit.wikimedia.org/r/597169 (https://phabricator.wikimedia.org/T252674) (owner: 10RLazarus) [21:55:51] (03CR) 10RLazarus: [C: 03+2] admin: Add dcipoletti to ldap_only_users, preparatory to adding to wmf group. [puppet] - 10https://gerrit.wikimedia.org/r/597169 (https://phabricator.wikimedia.org/T252674) (owner: 10RLazarus) [21:56:35] andrewbogott: okay to merge yours? [21:56:44] yes please! [21:57:22] done 👍 [21:57:40] thx [21:59:33] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: Add `dcipoletti` to `wmf` Access Group - https://phabricator.wikimedia.org/T252674 (10RLazarus) 05Open→03Resolved a:03RLazarus Hi Daniel, welcome to the Foundation! I've added you to the wmf LDAP group, feel free to reopen if you need anything e... [21:59:35] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` thanos-be2003.codfw.wmnet ` The log can be found in `/var/log/wmf-au... 
[22:00:05] gehel and maryum: How many deployers does it take to do Wikidata Query Service weekly deploy deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200518T2200). [22:01:43] 10Operations, 10ORES, 10Release Pipeline (Blubber), 10Scoring-platform-team (Current): Build blubber file for ORES - https://phabricator.wikimedia.org/T210268 (10ACraze) [22:01:59] 10Operations, 10ORES, 10Release Pipeline (Blubber), 10Scoring-platform-team (Current): Build blubber file for ORES - https://phabricator.wikimedia.org/T210268 (10ACraze) a:03ACraze [22:02:38] !log Clear module_deps on hewiki (group1, s7) to monitor regeneration, ref T247028 [22:02:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:02:41] T247028: Database 'INSERT' query rate doubled (module_deps regression?) - https://phabricator.wikimedia.org/T247028 [22:08:19] 10Operations, 10ORES, 10Release Pipeline (Blubber), 10Scoring-platform-team (Current): Build blubber file for ORES - https://phabricator.wikimedia.org/T210268 (10ACraze) Got a WIP PR here: https://github.com/wikimedia/ores/pull/345 Still need to slim down the production image and handle the redis dep for... 
[22:15:55] !log ryankemper@deploy1001 Started deploy [wdqs/wdqs@4886dc3]: 0.3.32 [22:15:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:16:36] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [22:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:18:30] !log Clear module_deps on s2 wikis to monitor regeneration [22:18:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:19:18] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [22:19:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:19:52] (03CR) 10Volans: [C: 03+2] tests: relax bandit dependency [software/cumin] - 10https://gerrit.wikimedia.org/r/596448 (owner: 10Volans) [22:20:17] (03CR) 10Volans: [C: 03+2] tests: fix new pylint reported issues [software/cumin] - 10https://gerrit.wikimedia.org/r/596449 (owner: 10Volans) [22:21:28] (03CR) 10Volans: [C: 03+2] setup.py: make it Debian Buster compatible [software/cumin] - 10https://gerrit.wikimedia.org/r/596450 (owner: 10Volans) [22:22:17] (03PS2) 10Volans: Drop support for Python 3.5 and 3.6 [software/cumin] - 10https://gerrit.wikimedia.org/r/596451 [22:25:06] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thanos-be2003.codfw.wmnet'] ` and were **ALL** successful. 
[22:25:23] (03Merged) 10jenkins-bot: tests: relax bandit dependency [software/cumin] - 10https://gerrit.wikimedia.org/r/596448 (owner: 10Volans) [22:25:34] (03CR) 10Volans: [C: 03+2] Drop support for Python 3.5 and 3.6 [software/cumin] - 10https://gerrit.wikimedia.org/r/596451 (owner: 10Volans) [22:26:08] (03Merged) 10jenkins-bot: tests: fix new pylint reported issues [software/cumin] - 10https://gerrit.wikimedia.org/r/596449 (owner: 10Volans) [22:27:29] (03PS2) 10Volans: test: improve integration tests [software/cumin] - 10https://gerrit.wikimedia.org/r/596452 [22:27:47] (03Merged) 10jenkins-bot: setup.py: make it Debian Buster compatible [software/cumin] - 10https://gerrit.wikimedia.org/r/596450 (owner: 10Volans) [22:27:49] (03PS3) 10Volans: doc: fix and improved documentation [software/cumin] - 10https://gerrit.wikimedia.org/r/596453 [22:27:51] (03Merged) 10jenkins-bot: Drop support for Python 3.5 and 3.6 [software/cumin] - 10https://gerrit.wikimedia.org/r/596451 (owner: 10Volans) [22:27:57] (03PS3) 10Volans: doc: split HTML and manpage generation [software/cumin] - 10https://gerrit.wikimedia.org/r/596454 [22:28:23] (03CR) 10Volans: [C: 03+2] test: improve integration tests [software/cumin] - 10https://gerrit.wikimedia.org/r/596452 (owner: 10Volans) [22:29:50] (03CR) 10Volans: [C: 03+2] doc: fix and improved documentation [software/cumin] - 10https://gerrit.wikimedia.org/r/596453 (owner: 10Volans) [22:30:23] (03Merged) 10jenkins-bot: test: improve integration tests [software/cumin] - 10https://gerrit.wikimedia.org/r/596452 (owner: 10Volans) [22:30:31] (03PS4) 10Volans: doc: split HTML and manpage generation [software/cumin] - 10https://gerrit.wikimedia.org/r/596454 [22:31:01] (03CR) 10Volans: [C: 03+2] doc: split HTML and manpage generation [software/cumin] - 10https://gerrit.wikimedia.org/r/596454 (owner: 10Volans) [22:31:50] (03Merged) 10jenkins-bot: doc: fix and improved documentation [software/cumin] - 10https://gerrit.wikimedia.org/r/596453 (owner: 
10Volans) [22:33:08] !log ryankemper@deploy1001 Finished deploy [wdqs/wdqs@4886dc3]: 0.3.32 (duration: 17m 12s) [22:33:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:33:34] (03Merged) 10jenkins-bot: doc: split HTML and manpage generation [software/cumin] - 10https://gerrit.wikimedia.org/r/596454 (owner: 10Volans) [22:35:01] !log Clear module_deps on commonswiki (group1, s4) to monitor regeneration [22:35:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:48:19] !log Clear module_deps on commonswiki (group0, mostly s3) to monitor regeneration [22:48:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:48:26] !log Clear module_deps on group0 (mostly s3) to monitor regeneration [22:48:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:55:58] !log Clear module_deps on dewiki (group2, old mw version, s5) to monitor regeneration [22:56:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:00:05] RoanKattouw, Niharika, and Urbanecm: #bothumor I � Unicode. All rise for Evening SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200518T2300). [23:00:05] No GERRIT patches in the queue for this window AFAICS. [23:05:58] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` thanos-be2004.codfw.wmnet ` The log can be found in `/var/log/wmf-au... 
[23:12:29] !log Restarted `wdqs-updater` across all wdqs nodes and restarted `wdqs-categories` across all nodes except 1010 (test wdqs server) and 1009 (automated deployment server)
[23:12:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:14:19] Operations, Research: Add Git LFS support for research/wikiworkshop - https://phabricator.wikimedia.org/T252956 (bmansurov) @Dzahn we have about 1.7 GB of video recordings of the recent workshop. We're trying to upload those files. Currently, there are only 4 files. In the future, we may have more video...
[23:19:19] Operations, Research: Add Git LFS support for research/wikiworkshop - https://phabricator.wikimedia.org/T252956 (Reedy) Why would you store videos in git? It seems the wrong way of going about it for files that aren't going to be changed... Are they freely licensed? Can you just upload them to commons?
[23:23:01] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime
[23:23:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:25:25] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[23:25:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:26:03] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thanos-be2004.codfw.wmnet'] ` Of which those **FAILED**: ` ['thanos-be2004.codfw.wmnet'] `
[23:29:33] @Urbanecm @Niharika @RoanKattouw can I still add a patch to swat?
[23:29:41] Sure, I can SWAT it
[23:30:07] its a simple typo fix that is bugging me: https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/e6deb7ae270494249ebeda8c514e46da969e120b/w/static.php#20 says "requesrs" instead of "requests"
[23:30:36] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` thanos-be2004.codfw.wmnet ` The log can be found in `/var/log/wmf-au...
[23:30:44] (PS1) DannyS712: static.php: Fix a typo (requesrs -> requests) [mediawiki-config] - https://gerrit.wikimedia.org/r/597173 (https://phabricator.wikimedia.org/T201491)
[23:31:03] (PS2) DannyS712: static.php: Fix a typo (requesrs -> requests) [mediawiki-config] - https://gerrit.wikimedia.org/r/597173 (https://phabricator.wikimedia.org/T201491)
[23:33:03] @RoanKattouw its https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/597173/
[23:36:51] Operations, Research: Add Git LFS support for research/wikiworkshop - https://phabricator.wikimedia.org/T252956 (leila) Thanks all for your comments. @bmansurov how about we go with uploading them on [[ https://www.youtube.com/channel/UCgIIsBhcseFH1Kghmo0ULbA | Wikimedia Foundation's YouTube channel ]]...
[23:37:34] (CR) CRusnov: "see https://gerrit.wikimedia.org/r/c/operations/puppet/+/596787 for an alternate fix. jbond will be looking at it tomorrow." [puppet] - https://gerrit.wikimedia.org/r/597165 (owner: Herron)
[23:38:32] (CR) Catrope: [C: +2] static.php: Fix a typo (requesrs -> requests) [mediawiki-config] - https://gerrit.wikimedia.org/r/597173 (https://phabricator.wikimedia.org/T201491) (owner: DannyS712)
[23:39:19] (Merged) jenkins-bot: static.php: Fix a typo (requesrs -> requests) [mediawiki-config] - https://gerrit.wikimedia.org/r/597173 (https://phabricator.wikimedia.org/T201491) (owner: DannyS712)
[23:47:38] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime
[23:47:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:10] !log pt1979@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[23:50:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:55:58] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['thanos-be2004.codfw.wmnet'] ` and were **ALL** successful.
[23:56:35] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (Papaul)
[23:57:05] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (Papaul) Open→Resolved @fgiunchedi All done