[00:00:04] RoanKattouw, Niharika, and Urbanecm: Dear deployers, time to do the Evening SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191220T0000).
[00:00:05] No GERRIT patches in the queue for this window AFAICS.
[00:05:48] !log mholloway-shell@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/WikimediaEditorTasks: Fix: Pass a RevisionRecord to Counter::onRevert from onArticleRollbackComplete (T241013) (duration: 00m 54s)
[00:05:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:05:55] T241013: Argument 3 passed to MediaWiki\Extension\WikimediaEditorTasks\WikipediaAppDescriptionEditCounter::onRevert() must be an instance of MediaWiki\Revision\RevisionRecord, instance of Revision given, called in /srv/mediawiki/php-1.35.0-wmf.10/extensions/WikimediaEditorTasks/src/Hooks.php on line 116 - https://phabricator.wikimedia.org/T241013
[00:15:45] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users, researchers & wmf for Shay Nowick - https://phabricator.wikimedia.org/T240917 (colewhite) Just to confirm the SSH key, could you paste it into a comment here and sign the comment with MFA? To sign a...
[00:46:23] !log mholloway-shell@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/MachineVision: Fix: Ignore duplicate key errors when inserting data from annotation jobs (duration: 00m 53s)
[00:46:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:48:21] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is CRITICAL: 2.659e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[00:52:42] !log mholloway-shell@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/MachineVision: Fix: Allow a single period in $basePath in maintenance scripts (duration: 00m 54s)
[00:52:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:02:29] RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[01:17:06] (PS1) Volans: dns: add batch mode to generate snippets script [software/netbox-extras] - https://gerrit.wikimedia.org/r/559638 (https://phabricator.wikimedia.org/T233183)
[01:17:08] (PS1) Volans: dns: handle push failures [software/netbox-extras] - https://gerrit.wikimedia.org/r/559639 (https://phabricator.wikimedia.org/T233183)
[01:52:20] (PS2) Mstyles: Add new MLR models [mediawiki-config] - https://gerrit.wikimedia.org/r/559614 (https://phabricator.wikimedia.org/T219534)
[02:33:49] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[02:35:35] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[02:51:41] (CR) Dzahn: "@Ebernhardson This isn't needed anymore i assume?" [puppet] - https://gerrit.wikimedia.org/r/554215 (https://phabricator.wikimedia.org/T236180) (owner: Dzahn)
[03:13:39] PROBLEM - Backup freshness on backup1001 is CRITICAL: All failures: 1 (cloudmetrics1001), Fresh: 90 jobs https://wikitech.wikimedia.org/wiki/Backups%23Monitoring
[03:20:49] (CR) Dzahn: [C: +2] install_server: remove phab1003 [puppet] - https://gerrit.wikimedia.org/r/552609 (https://phabricator.wikimedia.org/T238957) (owner: Dzahn)
[03:42:21] (PS2) Dzahn: install_server: remove phab1003 [puppet] - https://gerrit.wikimedia.org/r/552609 (https://phabricator.wikimedia.org/T238957)
[03:42:43] (PS1) Reedy: Another message [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/559644
[03:43:16] (CR) Dzahn: "This change is ready for review." [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/559644 (owner: Reedy)
[03:43:27] (PS2) Reedy: Another message [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/559644
[03:44:23] (CR) Dzahn: [C: +1] "Thanks! Please be prepared to receive relocating occupants on your floor!" [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/559644 (owner: Reedy)
[03:44:38] (PS3) Reedy: Another message [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/559644
[03:46:07] (CR) Dzahn: [C: +2] install_server: remove phab1003 [puppet] - https://gerrit.wikimedia.org/r/552609 (https://phabricator.wikimedia.org/T238957) (owner: Dzahn)
[03:46:18] (PS3) Dzahn: install_server: remove phab1003 [puppet] - https://gerrit.wikimedia.org/r/552609 (https://phabricator.wikimedia.org/T238957)
[03:48:31] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:50:21] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:54:00] (CR) Dzahn: [C: +1] Another message [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/559644 (owner: Reedy)
[03:58:09] (CR) Dzahn: [C: +2] phabricator: remove phab1003 from list of phab servers [puppet] - https://gerrit.wikimedia.org/r/552592 (https://phabricator.wikimedia.org/T238957) (owner: Dzahn)
[03:58:18] (PS2) Dzahn: phabricator: remove phab1003 from list of phab servers [puppet] - https://gerrit.wikimedia.org/r/552592 (https://phabricator.wikimedia.org/T238957)
[04:03:40] (CR) Dzahn: [C: +1] "Adding DBA because it needs deployment and not just merge. This can be done now but isn't urgent." [puppet] - https://gerrit.wikimedia.org/r/552607 (https://phabricator.wikimedia.org/T238957) (owner: Dzahn)
[04:06:04] (CR) Dzahn: [C: +1] "maybe after the holiday just in case.. all other things to bring back phab1003 wouldn't need help from another team" [puppet] - https://gerrit.wikimedia.org/r/552607 (https://phabricator.wikimedia.org/T238957) (owner: Dzahn)
[04:11:27] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/20083/" [puppet] - https://gerrit.wikimedia.org/r/552603 (https://phabricator.wikimedia.org/T238957) (owner: Dzahn)
[04:11:36] (PS2) Dzahn: site: turn phab1003 into a spare::system [puppet] - https://gerrit.wikimedia.org/r/552603 (https://phabricator.wikimedia.org/T238957)
[04:14:25] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 91 jobs https://wikitech.wikimedia.org/wiki/Backups%23Monitoring
[04:18:51] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:20:39] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:30:40] Operations, ops-esams: rack/setup/install ganeti300[123] - https://phabricator.wikimedia.org/T236216 (Dzahn) @herron netflow3001 and netflow4001 are popping up in Icinga because of the microcode mitigations. I think a reboot would fix that though.
[04:39:17] (PS1) Dzahn: phabricator: delete phab1003.yaml from Hiera data, not used anymore [puppet] - https://gerrit.wikimedia.org/r/559648 (https://phabricator.wikimedia.org/T238957)
[04:40:58] (CR) Dzahn: [C: +2] phabricator: delete phab1003.yaml from Hiera data, not used anymore [puppet] - https://gerrit.wikimedia.org/r/559648 (https://phabricator.wikimedia.org/T238957) (owner: Dzahn)
[04:53:17] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:54:29] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:55:17] !log phab1003 - rm /etc/ssh/sshd_config.phabricator ; kill 26085 (secondary sshd for phab; systemctl start sshd (fixes regular sshd) (T238957)
[04:55:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:55:24] T238957: decommission phab1003.eqiad.wmnet - https://phabricator.wikimedia.org/T238957
[05:04:34] Operations, Wikimedia-Site-requests: Cleanup cirrus keys in $wmfSwiftEqiadConfig - https://phabricator.wikimedia.org/T199220 (Legoktm)
[05:46:24] (CR) Dzahn: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/551919 (owner: Dzahn)
[05:48:58] (Abandoned) Dzahn: interface: use data types for Ipv4 and Ipv6 addresses [puppet] - https://gerrit.wikimedia.org/r/478114 (owner: Dzahn)
[05:49:51] (Abandoned) Dzahn: CNAMEs for bastions in each DC for user convenience [dns] - https://gerrit.wikimedia.org/r/489103 (owner: Dzahn)
[05:52:13] (Abandoned) Dzahn: (WIP) create eggdrop module [puppet] - https://gerrit.wikimedia.org/r/320698 (owner: Dzahn)
[05:58:51] (PS1) Dzahn: Revert "Remove access for bmansurov" [puppet] - https://gerrit.wikimedia.org/r/559651
[05:59:01] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:59:11] (PS2) Dzahn: Revert "Remove access for bmansurov" [puppet] - https://gerrit.wikimedia.org/r/559651 (https://phabricator.wikimedia.org/T241089)
[05:59:45] (CR) jerkins-bot: [V: -1] Revert "Remove access for bmansurov" [puppet] - https://gerrit.wikimedia.org/r/559651 (https://phabricator.wikimedia.org/T241089) (owner: Dzahn)
[06:00:49] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:01:49] (PS1) Marostegui: db-eqiad.php: Depool pc1009 [mediawiki-config] - https://gerrit.wikimedia.org/r/559652
[06:03:18] (CR) Marostegui: [C: +2] db-eqiad.php: Depool pc1009 [mediawiki-config] - https://gerrit.wikimedia.org/r/559652 (owner: Marostegui)
[06:04:12] (Merged) jenkins-bot: db-eqiad.php: Depool pc1009 [mediawiki-config] - https://gerrit.wikimedia.org/r/559652 (owner: Marostegui)
[06:05:22] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool pc1009 for upgrade (duration: 00m 55s)
[06:05:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:22] (PS1) Marostegui: Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - https://gerrit.wikimedia.org/r/559653
[06:09:15] (PS1) Dzahn: planet: add script to check whether http URLs work with https [puppet] - https://gerrit.wikimedia.org/r/559656
[06:09:34] (CR) Andrew Bogott: [C: +1] add forward and reverse for cloudceph.svc.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/558707 (https://phabricator.wikimedia.org/T240715) (owner: Jhedden)
[06:10:22] (CR) Andrew Bogott: [C: +1] ceph: add secondary interface for cloudceph servers [dns] - https://gerrit.wikimedia.org/r/558636 (https://phabricator.wikimedia.org/T240965) (owner: Jhedden)
[06:11:01] Operations, Mail: Add security-team@wikimedia.org as recipient of abuse@wikimedia.org emails - https://phabricator.wikimedia.org/T241078 (Dzahn) a:Dzahn
[06:11:56] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - https://gerrit.wikimedia.org/r/559653 (owner: Marostegui)
[06:12:49] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - https://gerrit.wikimedia.org/r/559653 (owner: Marostegui)
[06:12:57] Operations, Mail: Add security-team@wikimedia.org as recipient of abuse@wikimedia.org emails - https://phabricator.wikimedia.org/T241078 (Dzahn) Open→Resolved Done! committed change in private repo and ran puppet on mx1001 ` -abuse: postmaster +abuse: postmaster, security-team `
[06:13:57] PROBLEM - MariaDB Slave IO: s7 on db2087 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[06:14:14] me ^
[06:14:18] I thought I had downtimed it
[06:14:19] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool pc1009 after upgrade (duration: 00m 54s)
[06:14:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:15:02] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[06:15:43] RECOVERY - MariaDB Slave IO: s7 on db2087 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[06:16:56] Operations, LDAP-Access-Requests: Request for Superset & Turnilo Access - https://phabricator.wikimedia.org/T240988 (Dzahn) Ilana is Product Manager for the Community Tech team. https://www.mediawiki.org/wiki/User:IFried_(WMF) A change in the puppet repo is needed to add her to "ldap_only" admins.
[06:23:51] Operations, DNS, Traffic: redirect non-existing wikimania2020.wikimedia.org to wikimania.wikimedia.org - https://phabricator.wikimedia.org/T240341 (Dzahn) >>! In T240341#5728007, @Bugreporter wrote: > If someone find an old Wikimania website they may think the current website is wikimania2020.wikimed...
[06:25:24] (CR) Dzahn: "https://gerrit.wikimedia.org/r/c/operations/puppet/+/559656" [puppet] - https://gerrit.wikimedia.org/r/551919 (owner: Dzahn)
[06:25:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1089 for upgrade', diff saved to https://phabricator.wikimedia.org/P9983 and previous config saved to /var/cache/conftool/dbconfig/20191220-062530-marostegui.json
[06:25:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:40] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[06:32:49] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P9984 and previous config saved to /var/cache/conftool/dbconfig/20191220-063248-marostegui.json
[06:32:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:38:47] (CR) Marostegui: [C: +1] "Let's do this after holidays indeed. I can merge + remove the users from the DB" [puppet] - https://gerrit.wikimedia.org/r/552607 (https://phabricator.wikimedia.org/T238957) (owner: Dzahn)
[06:41:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P9985 and previous config saved to /var/cache/conftool/dbconfig/20191220-064129-marostegui.json
[06:41:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:54] !log netflow3001 - rebooting
[06:41:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:17] Operations, cloud-services-team (Kanban): rack/setup/install cloudcontrol2001-dev & cloudvirt200[123]-dev - https://phabricator.wikimedia.org/T214448 (Dzahn) fyi: new alerts at https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cloudcontrol2001-dev&service=DPKG https://icinga.wikimedia....
[06:50:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P9986 and previous config saved to /var/cache/conftool/dbconfig/20191220-065022-marostegui.json
[06:50:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:56:10] Operations, SRE-tools, docker-pkg, serviceops: Report image metadata to debmonitor - https://phabricator.wikimedia.org/T241206 (Joe)
[06:56:25] Operations, SRE-tools, docker-pkg, serviceops: Report image metadata to debmonitor - https://phabricator.wikimedia.org/T241206 (Joe) p:Triage→High a:Joe
[06:56:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P9987 and previous config saved to /var/cache/conftool/dbconfig/20191220-065628-marostegui.json
[06:56:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:54] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1103:3312, db1103:3314 for upgrade', diff saved to https://phabricator.wikimedia.org/P9988 and previous config saved to /var/cache/conftool/dbconfig/20191220-065753-marostegui.json
[06:57:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:58] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[07:24:14] (PS3) Giuseppe Lavagetto: First version of the debmonitor client [docker-images/docker-report] - https://gerrit.wikimedia.org/r/559165 (https://phabricator.wikimedia.org/T241206)
[07:24:22] (CR) Giuseppe Lavagetto: First version of the debmonitor client (14 comments) [docker-images/docker-report] - https://gerrit.wikimedia.org/r/559165 (https://phabricator.wikimedia.org/T241206) (owner: Giuseppe Lavagetto)
[07:30:05] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:30:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P9989 and previous config saved to /var/cache/conftool/dbconfig/20191220-073017-marostegui.json
[07:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:27] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/559547 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[07:33:10] (CR) Muehlenhoff: "This changes kafka-main, not install servers?" [puppet] - https://gerrit.wikimedia.org/r/559548 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[07:35:29] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:38:21] (CR) Muehlenhoff: [C: +1] "Never mind my earlier comment, I assumed this referred to the install* servers, but this is about the role ofc. Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/559548 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[07:40:43] (CR) Muehlenhoff: [C: +1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/559549 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[07:43:17] (CR) Muehlenhoff: [C: -1] install_server: deprecate raid10-gpt.cfg (1 comment) [puppet] - https://gerrit.wikimedia.org/r/559550 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[07:44:35] (CR) Muehlenhoff: [C: -1] "These servers no longer exist (they have been renamed to kafka-main[12]00[1-3], it can simply be removed without replacement." [puppet] - https://gerrit.wikimedia.org/r/559551 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[07:45:00] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/559552 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[07:45:06] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P9990 and previous config saved to /var/cache/conftool/dbconfig/20191220-074504-marostegui.json
[07:45:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:35] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P9991 and previous config saved to /var/cache/conftool/dbconfig/20191220-075434-marostegui.json
[07:54:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:11] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:57:24] Operations, Wikimedia-Mailing-lists: Create wikivoyage-zh mailing list - https://phabricator.wikimedia.org/T62255 (Dzahn)
[07:57:26] Operations, Wikimedia-Mailing-lists: Close mailing list "Wikivoyage-zh" - https://phabricator.wikimedia.org/T240850 (Dzahn)
[07:58:45] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:59:09] Operations, Wikimedia-Mailing-lists: Close mailing list "Wikivoyage-zh" - https://phabricator.wikimedia.org/T240850 (Dzahn) Open→Resolved a:Dzahn Done. ` [fermium:~] $ sudo rmlist wikivoyage-zh `
[07:59:11] Operations, Wikimedia-Mailing-lists: Create wikivoyage-zh mailing list - https://phabricator.wikimedia.org/T62255 (Dzahn)
[08:02:36] (CR) Dzahn: [C: +2] "just a little helper to run locally.. not being installed on the server" [puppet] - https://gerrit.wikimedia.org/r/559656 (owner: Dzahn)
[08:04:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P9992 and previous config saved to /var/cache/conftool/dbconfig/20191220-080400-marostegui.json
[08:04:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:41] (PS1) Muehlenhoff: Remove obsolete partman recipe [puppet] - https://gerrit.wikimedia.org/r/559666 (https://phabricator.wikimedia.org/T156955)
[08:18:29] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:20:17] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:32:45] (CR) Filippo Giunchedi: [C: +1] Remove obsolete partman recipe [puppet] - https://gerrit.wikimedia.org/r/559666 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[08:33:34] (CR) Filippo Giunchedi: [C: +2] install_server: remove unused raid1-30G.cfg [puppet] - https://gerrit.wikimedia.org/r/559547 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[08:33:43] (PS2) Filippo Giunchedi: install_server: remove unused raid1-30G.cfg [puppet] - https://gerrit.wikimedia.org/r/559547 (https://phabricator.wikimedia.org/T156955)
[08:34:25] (CR) Andrew Bogott: [C: +1] "yep!" [puppet] - https://gerrit.wikimedia.org/r/559666 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[08:35:11] (CR) Andrew Bogott: [C: +1] install_server: deprecate raid10-gpt-srv-lvm-xfs.cfg [puppet] - https://gerrit.wikimedia.org/r/559552 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[08:45:35] !log failover ganeti master on the new esams cluster
[08:45:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:29] (CR) Filippo Giunchedi: [C: +2] install_server: deprecate raid10-gpt-srv-lvm-xfs.cfg [puppet] - https://gerrit.wikimedia.org/r/559552 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[08:46:37] (PS2) Filippo Giunchedi: install_server: deprecate raid10-gpt-srv-lvm-xfs.cfg [puppet] - https://gerrit.wikimedia.org/r/559552 (https://phabricator.wikimedia.org/T156955)
[08:49:34] (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/559509 (https://phabricator.wikimedia.org/T239832) (owner: Muehlenhoff)
[08:51:18] !log addshore@graphite1004&2003:~$ sudo -u _graphite find /var/lib/carbon/whisper/daily/wikidata/api/actions -delete # T227594
[08:51:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:23] T227594: Remove daily.wikidata.api.actions.* metrics from graphite - https://phabricator.wikimedia.org/T227594
[08:52:06] Operations, WMDE-Analytics-Engineering, User-Addshore: Regularly & Automatically backup WMDE metrics stored in graphite - https://phabricator.wikimedia.org/T125408 (Addshore)
[08:53:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1096:3315, db1096:3316 for upgrade', diff saved to https://phabricator.wikimedia.org/P9993 and previous config saved to /var/cache/conftool/dbconfig/20191220-085300-marostegui.json
[08:53:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:22] Operations, ops-eqsin: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099 (MoritzMuehlenhoff) JFTR to avoid confusion: These should use Buster (we have the main Ganeti clusters on Stretch, but the new edge Ganeti setups are on Buster).
[09:00:22] (CR) Jbond: [C: +1] "np, lgtm" [puppet] - https://gerrit.wikimedia.org/r/554215 (https://phabricator.wikimedia.org/T236180) (owner: Dzahn)
[09:02:06] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1096:3315 db1096:3316', diff saved to https://phabricator.wikimedia.org/P9994 and previous config saved to /var/cache/conftool/dbconfig/20191220-090204-marostegui.json
[09:02:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:20] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[09:04:11] (PS1) Dzahn: admins: add ifried to ldap_only_admins, wmf group [puppet] - https://gerrit.wikimedia.org/r/559704 (https://phabricator.wikimedia.org/T240988)
[09:06:25] (PS1) Dzahn: conftool: remove parsoid, keep parsoid-php [puppet] - https://gerrit.wikimedia.org/r/559705 (https://phabricator.wikimedia.org/T241207)
[09:10:52] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1096:3315 db1096:3316', diff saved to https://phabricator.wikimedia.org/P9995 and previous config saved to /var/cache/conftool/dbconfig/20191220-091050-marostegui.json
[09:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1096:3315 db1096:3316', diff saved to https://phabricator.wikimedia.org/P9996 and previous config saved to /var/cache/conftool/dbconfig/20191220-091934-marostegui.json
[09:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:52] (CR) Ayounsi: "Thanks!" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/559125 (https://phabricator.wikimedia.org/T240789) (owner: Ayounsi)
[09:21:49] (PS4) Ayounsi: Fastnetmon: add thresholds overrides [puppet] - https://gerrit.wikimedia.org/r/559125 (https://phabricator.wikimedia.org/T240789)
[09:26:17] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[09:26:18] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:26:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:06] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1096:3315 db1096:3316', diff saved to https://phabricator.wikimedia.org/P9997 and previous config saved to /var/cache/conftool/dbconfig/20191220-092805-marostegui.json
[09:28:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:58] !log applied Ganeti cluster setting to pass through CPU flags for MDS/SSBD to esams/ulsfo clusters T226444 T236216
[09:32:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:05] T236216: rack/setup/install ganeti300[123] - https://phabricator.wikimedia.org/T236216
[09:32:06] T226444: rack/setup/install ganeti400[123] - https://phabricator.wikimedia.org/T226444
[09:32:28] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[09:32:29] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:32:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:55] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:35:43] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:37:38] (PS1) Muehlenhoff: Re-enable notifications for ganeti3*, setup complete [puppet] - https://gerrit.wikimedia.org/r/559710 (https://phabricator.wikimedia.org/T236216)
[09:37:49] (PS1) Ema: ATS: remove X-Analytics from responses sent to users [puppet] - https://gerrit.wikimedia.org/r/559711 (https://phabricator.wikimedia.org/T196558)
[09:40:17] !log marostegui@cumin1001 dbctl commit (dc=all): 'Adjust main traffic weight for db1096:3316 and db1098:3316', diff saved to https://phabricator.wikimedia.org/P9998 and previous config saved to /var/cache/conftool/dbconfig/20191220-094016-marostegui.json
[09:40:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:05] (CR) Muehlenhoff: [C: +2] Re-enable notifications for ganeti3*, setup complete [puppet] - https://gerrit.wikimedia.org/r/559710 (https://phabricator.wikimedia.org/T236216) (owner: Muehlenhoff)
[09:48:21] Operations, netbox: Sync new ganeti clusters with netbox - https://phabricator.wikimedia.org/T241166 (MoritzMuehlenhoff) This might be a duplicate of T239123?
[09:50:39] 10Operations, 10ops-esams, 10Patch-For-Review: rack/setup/install ganeti300[123] - https://phabricator.wikimedia.org/T236216 (10MoritzMuehlenhoff) [09:51:02] 10Operations, 10Analytics, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Send X-Analytics information from Varnish to Hadoop with VCL_Log - https://phabricator.wikimedia.org/T196558 (10ema) [09:52:31] 10Operations, 10ops-esams, 10Patch-For-Review: rack/setup/install ganeti300[123] - https://phabricator.wikimedia.org/T236216 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff >>! In T236216#5754892, @herron wrote: > The esams ganeti cluster is now up and running, and netflow3001 has been cr... [09:55:29] 10Operations, 10Traffic: rack/setup/install ganeti400[123] - https://phabricator.wikimedia.org/T226444 (10MoritzMuehlenhoff) [09:56:03] 10Operations, 10Traffic: rack/setup/install ganeti400[123] - https://phabricator.wikimedia.org/T226444 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff I tested a failover and an instance migration successfully. I also changed the cluster setting so that CPU vulnerability flags are passed th... [10:18:50] (03CR) 10Muehlenhoff: [C: 03+1] "Looks great!" 
(032 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559165 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [10:22:09] (03CR) 10Vgutierrez: [C: 03+1] ATS: remove X-Analytics from responses sent to users [puppet] - 10https://gerrit.wikimedia.org/r/559711 (https://phabricator.wikimedia.org/T196558) (owner: 10Ema) [10:27:12] (03CR) 10Jforrester: [C: 03+2] Another message [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/559644 (owner: 10Reedy) [10:27:34] (03PS2) 10Muehlenhoff: Remove obsolete partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/559666 (https://phabricator.wikimedia.org/T156955) [10:27:39] (03Merged) 10jenkins-bot: Another message [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/559644 (owner: 10Reedy) [10:29:40] (03CR) 10Muehlenhoff: [C: 03+2] Remove obsolete partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/559666 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [10:32:01] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/559704 (https://phabricator.wikimedia.org/T240988) (owner: 10Dzahn) [10:35:54] (03PS3) 10Muehlenhoff: Track Kerberos principals in data.yaml [puppet] - 10https://gerrit.wikimedia.org/r/559104 (https://phabricator.wikimedia.org/T235418) [10:38:46] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [10:40:20] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [10:42:59] (03CR) 10Muehlenhoff: [C: 03+2] Track Kerberos principals in data.yaml [puppet] - 10https://gerrit.wikimedia.org/r/559104 (https://phabricator.wikimedia.org/T235418) (owner: 10Muehlenhoff) [10:44:53] (03CR) 10Jbond: "LGTM added a few comments and i did go on a bit so feel free to ping me on irc if anything makes senses." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/559125 (https://phabricator.wikimedia.org/T240789) (owner: 10Ayounsi) [10:48:42] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [10:50:07] (03PS1) 10Muehlenhoff: Add a note to manage_principals for added/removed Kerberos principals [puppet] - 10https://gerrit.wikimedia.org/r/559731 [10:52:16] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [10:55:17] (03PS1) 10Muehlenhoff: Remove obsolete partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/559737 (https://phabricator.wikimedia.org/T156955) [10:55:48] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [10:57:36] (03CR) 10Volans: "LGTM, just two small things (a test failing and black requiring 3.6+)" (0316 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559165 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [10:58:04] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "overall ok, a small correction." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/558158 (https://phabricator.wikimedia.org/T240824) (owner: 10Effie Mouzeli) [11:03:01] (03PS6) 10Effie Mouzeli: mediawiki::php::admin memory optimisation for lib.php [puppet] - 10https://gerrit.wikimedia.org/r/558158 (https://phabricator.wikimedia.org/T240824) [11:08:22] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:13:42] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:15:30] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:15:40] (03CR) 10Volans: "2 small improvements inline, nits" (032 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559165 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [11:17:20] (03CR) 10Volans: [C: 03+2] "Self merging to unblock testing, please comment anyway, I'll send a follow up patch if anything should be changed." [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/559638 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [11:17:27] (03PS7) 10Effie Mouzeli: mediawiki::php::admin memory optimisation for lib.php [puppet] - 10https://gerrit.wikimedia.org/r/558158 (https://phabricator.wikimedia.org/T240824) [11:17:43] (03CR) 10Volans: [C: 03+2] "Self merging to unblock testing, please comment anyway, I'll send a follow up patch if anything should be changed." [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/559639 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [11:24:52] (03PS4) 10Arturo Borrero Gonzalez: toolforge: new k8s: add kube-state-metrics.yaml [puppet] - 10https://gerrit.wikimedia.org/r/559506 (https://phabricator.wikimedia.org/T237643) [11:27:58] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:29:21] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "thanks @bstorm for your review! I think I got paranoid by the auth thing, it wasn't clear to me." 
[puppet] - 10https://gerrit.wikimedia.org/r/559506 (https://phabricator.wikimedia.org/T237643) (owner: 10Arturo Borrero Gonzalez) [11:29:42] (03PS1) 10Muehlenhoff: Support password changes in manage_principals (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/559765 (https://phabricator.wikimedia.org/T237605) [11:29:44] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:30:36] (03CR) 10Giuseppe Lavagetto: First version of the debmonitor client (033 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559165 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [11:32:06] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] dynamicproxy: add backend information to access log entries [puppet] - 10https://gerrit.wikimedia.org/r/554041 (https://phabricator.wikimedia.org/T238641) (owner: 10Arturo Borrero Gonzalez) [11:33:20] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:35:06] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:38:15] (03PS1) 10Arturo Borrero Gonzalez: toolforge: new k8s: kube-state-metrics: drop toleration to run on control nodes [puppet] - 10https://gerrit.wikimedia.org/r/559771 (https://phabricator.wikimedia.org/T237643) [11:38:38] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:39:07] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: new k8s: kube-state-metrics: drop toleration to run on control nodes [puppet] - 10https://gerrit.wikimedia.org/r/559771 (https://phabricator.wikimedia.org/T237643) (owner: 10Arturo Borrero Gonzalez) [11:40:24] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:43:58] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:45:44] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:49:02] (03PS22) 10Volans: netbox: Add automation git machinery [puppet] - 10https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183) (owner: 10CRusnov) [11:57:48] (03PS23) 10Volans: netbox: Add automation git machinery [puppet] - 10https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183) (owner: 10CRusnov) [12:00:09] !log installing glib2.0 security updates on stretch [12:00:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:34] (03CR) 10Volans: "I've uploaded a new PS with the fixes, replies inline. Compiler looks happy:" (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183) (owner: 10CRusnov) [12:03:28] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [12:05:16] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [12:05:25] (03CR) 10Alexandros Kosiaris: [C: 03+1] "+1, but I get bump the chart version?" 
(031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/559607 (owner: 10Ottomata) [12:16:04] (03CR) 10Filippo Giunchedi: [C: 03+1] Remove obsolete partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/559737 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [12:27:52] 10Operations, 10Puppet, 10User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (10jbond) I queried the puppetdb [[ https://phabricator.wikimedia.org/P10000 | using pypuppetdb ]] and there is only one file in [[ https://github.com/wikimedia/puppet/blob/production/modules/package_... [12:30:35] (03CR) 10Giuseppe Lavagetto: First version of the debmonitor client (032 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559165 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [12:31:05] (03PS4) 10Giuseppe Lavagetto: First version of the debmonitor client [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559165 (https://phabricator.wikimedia.org/T241206) [12:31:57] I get "Error sending hot-shots message: Error: getaddrinfo ENOTFOUND labmon1001.eqiad.wmnet" in cloud VPS, did labmon1001 get decommissioned?
Doesn't have any notice: https://wikitech.wikimedia.org/wiki/Labmon1001 but icinga can't find it: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=labmon1001 [12:33:14] 10Operations, 10Performance-Team, 10Traffic, 10Patch-For-Review: Improve ATS backend connection reuse against origin servers - https://phabricator.wikimedia.org/T241145 (10ema) p:05Triage→03Normal [12:33:36] 10Operations, 10DNS, 10Mail, 10Traffic: wikimedia.community domain name is not resolving an mx record - https://phabricator.wikimedia.org/T241132 (10ema) p:05Triage→03Normal [12:33:42] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [12:33:56] I found cloudmetrics1002 in puppet git log grep [12:34:23] 10Operations, 10Traffic: Create a system for distributed shared secret material to server tmps - https://phabricator.wikimedia.org/T240866 (10ema) p:05Triage→03Normal [12:34:48] 10Operations, 10Traffic: Secure shared ticket key rotation for anycast authdns - https://phabricator.wikimedia.org/T240863 (10ema) p:05Triage→03Normal [12:35:28] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [12:36:26] 10Operations, 10Traffic: HTTPS/Browser Recommendations page on Wikitech is outdated - https://phabricator.wikimedia.org/T240813 (10ema) p:05Triage→03Normal [12:37:02] 10Operations, 10DNS, 10Domains, 10Traffic: Donate wikiźródła.pl and wikisłownik.pl to the Foundation - https://phabricator.wikimedia.org/T240446 (10ema) p:05Triage→03Normal [12:42:57] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime [12:42:57] !log aborrero@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [12:43:00] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime [12:43:01] !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [12:43:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:44] (03PS1) 10Giuseppe Lavagetto: Add class to scan a registry for images [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559804 (https://phabricator.wikimedia.org/T241206) [12:45:27] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] First version of the debmonitor client [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559165 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [12:50:22] (03PS1) 10Jforrester: [officewiki] Grant ipblock-exempt to all users on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559811 (https://phabricator.wikimedia.org/T231943) [12:51:32] (03CR) 10Jforrester: "This is a far-simpler solution than endless re-workings of the Block page in certain circumstances." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/559811 (https://phabricator.wikimedia.org/T231943) (owner: 10Jforrester) [12:51:41] 10Operations, 10Puppet, 10User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (10jbond) [12:52:27] 10Operations, 10Cloud-VPS (Debian Jessie Deprecation), 10cloud-services-team (Kanban): Migrate labmon* to Buster - https://phabricator.wikimedia.org/T224585 (10Ladsgroup) Hey, thanks for doing this but I really think this needs some sort of announcement, lots of tools in labs depend on this and one of our pr... [12:53:36] 10Operations, 10Traffic: High CPU usage for ats-be ET_NET thread handling PURGE requests on cache_text - https://phabricator.wikimedia.org/T241232 (10ema) [12:53:49] (03PS1) 10Jbond: package_builder: manage source repositories using apt::repository [puppet] - 10https://gerrit.wikimedia.org/r/559812 (https://phabricator.wikimedia.org/T214605) [12:54:59] 10Operations, 10Traffic: High CPU usage for ats-be ET_NET thread handling PURGE requests on cache_text - https://phabricator.wikimedia.org/T241232 (10ema) p:05Triage→03Normal [12:57:57] (03CR) 10Ayounsi: "Thanks for the thorough comments!" 
[puppet] - 10https://gerrit.wikimedia.org/r/559125 (https://phabricator.wikimedia.org/T240789) (owner: 10Ayounsi) [12:59:58] (03PS1) 10BBlack: Switch to digicert-2019a for eqsin and esams [puppet] - 10https://gerrit.wikimedia.org/r/559816 (https://phabricator.wikimedia.org/T238494) [13:00:31] (03CR) 10Jbond: [C: 03+1] "ack lgtm then :)" [puppet] - 10https://gerrit.wikimedia.org/r/559125 (https://phabricator.wikimedia.org/T240789) (owner: 10Ayounsi) [13:04:04] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [13:05:52] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [13:07:29] 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Audit /etc/apt directories - https://phabricator.wikimedia.org/T214605 (10jbond) I have generated a list of managed sources files with the following snippet ` lang=python from pypuppetdb import connect from pypuppetdb.QueryBuilder import RegexOpera... 
[13:12:15] (03CR) 10Muehlenhoff: package_builder: manage source repositories using apt::repository (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559812 (https://phabricator.wikimedia.org/T214605) (owner: 10Jbond) [13:12:30] (03CR) 10Ayounsi: [C: 03+2] Fastnetmon: add thresholds overrides [puppet] - 10https://gerrit.wikimedia.org/r/559125 (https://phabricator.wikimedia.org/T240789) (owner: 10Ayounsi) [13:13:14] (03CR) 10BBlack: [C: 03+2] Switch to digicert-2019a for eqsin and esams [puppet] - 10https://gerrit.wikimedia.org/r/559816 (https://phabricator.wikimedia.org/T238494) (owner: 10BBlack) [13:13:54] XioNoX: merging yours [13:14:07] bblack: thx [13:14:11] was about to do it [13:15:51] !log esams+eqiad edges switching back to digicert-2019a unified TLS cert [13:15:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:11] grrr [13:16:25] !log [correction] esams+eqsin edges switching back to digicert-2019a unified TLS cert [13:16:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:02] maybe we need a unique and less-mechanical naming system for DCs :) [13:17:50] (03CR) 10Muehlenhoff: [C: 03+2] Remove obsolete partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/559737 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [13:22:03] (03PS2) 10Jbond: package_builder: manage source repositories using apt::repository [puppet] - 10https://gerrit.wikimedia.org/r/559812 (https://phabricator.wikimedia.org/T214605) [13:22:24] (03PS1) 10Arturo Borrero Gonzalez: toolforge: new k8s: kube-state-metrics: updates to the service endpoint [puppet] - 10https://gerrit.wikimedia.org/r/559820 (https://phabricator.wikimedia.org/T237643) [13:22:55] 10Operations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Return traffic to eqiad WMCS triggering FNM - https://phabricator.wikimedia.org/T240789 (10ayounsi) 05Open→03Resolved a:03ayounsi All good! 
[13:23:56] 10Operations, 10Traffic: ats-be: consider moving accept from dedicated thread to workers - https://phabricator.wikimedia.org/T241233 (10ema) [13:24:04] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: new k8s: kube-state-metrics: updates to the service endpoint [puppet] - 10https://gerrit.wikimedia.org/r/559820 (https://phabricator.wikimedia.org/T237643) (owner: 10Arturo Borrero Gonzalez) [13:24:06] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/559812 (https://phabricator.wikimedia.org/T214605) (owner: 10Jbond) [13:24:18] (03CR) 10Jbond: "thanks" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559812 (https://phabricator.wikimedia.org/T214605) (owner: 10Jbond) [13:24:20] (03CR) 10Jbond: [C: 03+2] package_builder: manage source repositories using apt::repository [puppet] - 10https://gerrit.wikimedia.org/r/559812 (https://phabricator.wikimedia.org/T214605) (owner: 10Jbond) [13:24:36] 10Operations, 10Traffic: ats-be: consider increasing accept threads or moving accept from dedicated thread to workers - https://phabricator.wikimedia.org/T241233 (10ema) [13:24:47] 10Operations, 10Traffic: ats-be: consider increasing accept threads or moving accept from dedicated thread to workers - https://phabricator.wikimedia.org/T241233 (10ema) p:05Triage→03Normal [13:29:07] (03PS1) 10Jbond: package_builder: ensure title is dynamic [puppet] - 10https://gerrit.wikimedia.org/r/559825 [13:30:15] (03CR) 10Jbond: [C: 03+2] package_builder: ensure title is dynamic [puppet] - 10https://gerrit.wikimedia.org/r/559825 (owner: 10Jbond) [13:30:52] (03PS1) 10Muehlenhoff: thorium/eventlog: Switch to standard recipes [puppet] - 10https://gerrit.wikimedia.org/r/559827 (https://phabricator.wikimedia.org/T156955) [13:34:16] (03CR) 10Elukey: [C: 03+1] Support password changes in manage_principals (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/559765 (https://phabricator.wikimedia.org/T237605) (owner: 10Muehlenhoff) [13:36:46] PROBLEM - 
rpki grafana alert on icinga1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: eqiad rsync status alert, rsync status alert. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/ [13:36:51] (03PS1) 10Arturo Borrero Gonzalez: toolforge: prometheus: add job for kube-state-metrics [puppet] - 10https://gerrit.wikimedia.org/r/559830 (https://phabricator.wikimedia.org/T237643) [13:38:14] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: prometheus: add job for kube-state-metrics [puppet] - 10https://gerrit.wikimedia.org/r/559830 (https://phabricator.wikimedia.org/T237643) (owner: 10Arturo Borrero Gonzalez) [13:42:39] 10Puppet: Add QUIC for ATS - https://phabricator.wikimedia.org/T241237 (10Obiwan2208-2) [13:48:36] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [13:50:22] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [13:51:17] 10Operations, 10Traffic: Cleanup after varnish-be -> ats-be migration - https://phabricator.wikimedia.org/T241239 (10ema) [13:51:28] that's getting increasingly spiky [13:51:36] 10Operations, 10Traffic: Cleanup after varnish-be -> ats-be migration - https://phabricator.wikimedia.org/T241239 (10ema) p:05Triage→03Normal [13:59:16] 10Operations: Integrate Stretch 9.10/9.11 point updates - https://phabricator.wikimedia.org/T232308 (10MoritzMuehlenhoff) [14:03:40] 10Puppet: Add QUIC for ATS - https://phabricator.wikimedia.org/T241237 (10Aklapper) Hi @Obiwan2208-2, thanks for taking the time to report this and welcome to Wikimedia Phabricator! Unfortunately this Phabricator task lacks some information. Please [[ https://www.mediawiki.org/wiki/How_to_report_a_bug | add a m... [14:04:35] (03PS1) 10CDanis: esams collector netflow3001 [homer/public] - 10https://gerrit.wikimedia.org/r/559842 [14:05:44] (03CR) 10Ayounsi: [C: 03+1] esams collector netflow3001 [homer/public] - 10https://gerrit.wikimedia.org/r/559842 (owner: 10CDanis) [14:08:21] (03CR) 10CDanis: [C: 03+2] esams collector netflow3001 [homer/public] - 10https://gerrit.wikimedia.org/r/559842 (owner: 10CDanis) [14:08:25] (03CR) 10CDanis: [V: 03+2 C: 03+2] esams collector netflow3001 [homer/public] - 10https://gerrit.wikimedia.org/r/559842 (owner: 10CDanis) [14:09:57] (03PS1) 10Jbond: puppet-merge: Ensure we update the labs repo on every puppet run [puppet] - 10https://gerrit.wikimedia.org/r/559843 [14:09:59] (03PS1) 10Jbond: puppet-merge: ensure puppet is only run from one server [puppet] - 10https://gerrit.wikimedia.org/r/559844 [14:10:39] akosiaris: hey! is there a chance that the health check for the termbox service currently works without a valid `response` field in the `x-amples` json?
[14:10:43] (03PS2) 10Filippo Giunchedi: install_server: use raid10-8dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559548 (https://phabricator.wikimedia.org/T156955) [14:10:45] (03PS2) 10Filippo Giunchedi: install_server: use raid10-6dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559549 (https://phabricator.wikimedia.org/T156955) [14:10:47] (03PS2) 10Filippo Giunchedi: install_server: deprecate raid10-gpt.cfg [puppet] - 10https://gerrit.wikimedia.org/r/559550 (https://phabricator.wikimedia.org/T156955) [14:10:49] (03PS2) 10Filippo Giunchedi: install_server: deprecate raid10-gpt-srv-ext4.cfg [puppet] - 10https://gerrit.wikimedia.org/r/559551 (https://phabricator.wikimedia.org/T156955) [14:10:51] (03PS2) 10Filippo Giunchedi: install_server: deprecate raid10-gpt-srv-lvm-ext4.cfg [puppet] - 10https://gerrit.wikimedia.org/r/559553 (https://phabricator.wikimedia.org/T156955) [14:11:09] https://gerrit.wikimedia.org/r/c/wikibase/termbox/+/556431 this patch surfaced that we currently nest the `response` field inside the `request` field which clearly isn't right [14:11:15] (03CR) 10Filippo Giunchedi: [C: 03+1] "> Patch Set 1: Code-Review-1" [puppet] - 10https://gerrit.wikimedia.org/r/559551 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [14:11:23] but the health checks still seem to work *somehow* [14:11:32] jakob_WMDE: lemme have a look [14:11:36] thanks! 
[14:11:43] (03CR) 10Filippo Giunchedi: [C: 03+1] install_server: deprecate raid10-gpt.cfg (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559550 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [14:11:51] (03PS1) 10DCausse: [wdqs] enable async imports on wdqs1005 and wdqs2001 [puppet] - 10https://gerrit.wikimedia.org/r/559847 (https://phabricator.wikimedia.org/T238045) [14:12:15] (03CR) 10jerkins-bot: [V: 04-1] puppet-merge: ensure puppet is only run from one server [puppet] - 10https://gerrit.wikimedia.org/r/559844 (owner: 10Jbond) [14:13:41] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/559550 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [14:14:04] !log homer 'cr*esams*' commit 'I022c62120 enable netflow collection in esams' [14:14:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:32] jakob_WMDE: https://gerrit.wikimedia.org/r/plugins/gitiles/wikibase/termbox/+/refs/heads/master/openapi.json#111 [14:14:42] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/559551 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [14:14:46] it looks like the /termbox endpoint isn't monitored? [14:16:01] akosiaris: it dynamically sets the x-monitor value. I'm fairly certain it's monitored [14:16:13] ah, wait, that was the dynamic setting.
Was done back in https://gerrit.wikimedia.org/r/plugins/gitiles/wikibase/termbox/+/7f9aaea5c743cf53a0a823e44356108b57e1c14d%5E%21/#F4 [14:16:16] I remembered that now [14:16:33] yup [14:17:23] (03CR) 10Filippo Giunchedi: [C: 03+2] install_server: deprecate raid10-gpt-srv-ext4.cfg [puppet] - 10https://gerrit.wikimedia.org/r/559551 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [14:17:31] (03PS3) 10Filippo Giunchedi: install_server: deprecate raid10-gpt-srv-ext4.cfg [puppet] - 10https://gerrit.wikimedia.org/r/559551 (https://phabricator.wikimedia.org/T156955) [14:18:34] (03PS2) 10Jbond: puppet-merge: ensure puppet is only run from one server [puppet] - 10https://gerrit.wikimedia.org/r/559844 [14:23:20] jakob_WMDE: found it I think. https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/service-checker/+/refs/heads/master/servicechecker/swagger.py#132 [14:23:48] and default_request has a response: {'status': 200}, in line 40 [14:24:03] so it assumes that if you forgot to add it you want a 200 [14:24:46] I wonder how many services would break if I removed that [14:25:04] I somehow thought that it would break at https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/service-checker/+/304884e7a2bad975d7b72b8185ee38adf79a86cb/servicechecker/swagger.py#106 but maybe not [14:25:27] nah, what you send back is valid json [14:25:27] good, but that means we should fix that soon, in case we want to check for more than just status: 200 :) [14:25:48] it's even valid swagger as anything x- isn't validated as it's assumed to be business specific [14:26:10] right, makes sense [14:27:14] akosiaris: thanks for checking! [14:27:20] yw [14:28:03] (03PS1) 10Hashar: labs: send integration alerts to the team [puppet] - 10https://gerrit.wikimedia.org/r/559858 [14:29:10] (03CR) 10Hashar: "It is really mostly puppet/apt failures on one of the integration instances.
Not too spammy ;)" [puppet] - 10https://gerrit.wikimedia.org/r/559858 (owner: 10Hashar) [14:29:37] (03PS1) 10Volans: Add netbox-exports.w.o record as CNAME of netbox [dns] - 10https://gerrit.wikimedia.org/r/559860 (https://phabricator.wikimedia.org/T233183) [14:34:13] (03CR) 10Volans: [C: 03+2] Add netbox-exports.w.o record as CNAME of netbox [dns] - 10https://gerrit.wikimedia.org/r/559860 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [14:35:33] (03PS1) 10Volans: Fix typo for netbox-exports [dns] - 10https://gerrit.wikimedia.org/r/559862 (https://phabricator.wikimedia.org/T233183) [14:35:40] sorry [14:36:04] (03CR) 10Volans: [C: 03+2] Fix typo for netbox-exports [dns] - 10https://gerrit.wikimedia.org/r/559862 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [14:38:12] (03PS24) 10Volans: netbox: Add automation git machinery [puppet] - 10https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183) (owner: 10CRusnov) [14:38:43] !log temporarily disable puppet on netbox[12]001 to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/555715 - T233183 [14:38:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:38:49] T233183: Automate generation of Management DNS records from Netbox - https://phabricator.wikimedia.org/T233183 [14:44:31] (03CR) 10Volans: [C: 03+2] netbox: Add automation git machinery [puppet] - 10https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183) (owner: 10CRusnov) [14:45:07] (03CR) 10ArielGlenn: "The raid10-4dev.cfg looks like it has a different raid recipe than the current one, as far as francium goes. what am I missing?" [puppet] - 10https://gerrit.wikimedia.org/r/559550 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [14:45:37] (03PS3) 10Ottomata: Fix kafka-dev chart to work with docker-desktop [deployment-charts] - 10https://gerrit.wikimedia.org/r/559607 [14:45:47] (03CR) 10Ottomata: "Done." 
(031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/559607 (owner: 10Ottomata) [14:46:35] (03PS2) 10Jbond: puppet-merge: Ensure we update the labs repo on every puppet run [puppet] - 10https://gerrit.wikimedia.org/r/559843 [14:46:40] (03CR) 10Ottomata: [C: 03+2] Fix kafka-dev chart to work with docker-desktop [deployment-charts] - 10https://gerrit.wikimedia.org/r/559607 (owner: 10Ottomata) [14:47:56] (03CR) 10jerkins-bot: [V: 04-1] puppet-merge: Ensure we update the labs repo on every puppet run [puppet] - 10https://gerrit.wikimedia.org/r/559843 (owner: 10Jbond) [14:48:41] (03PS3) 10Jbond: puppet-merge: Ensure we update the labs repo on every puppet run [puppet] - 10https://gerrit.wikimedia.org/r/559843 [14:49:13] (03CR) 10Krinkle: "I presume the header map key existing but set to nil results in it not being sent (not even as empty string). That's different from what I" [puppet] - 10https://gerrit.wikimedia.org/r/559711 (https://phabricator.wikimedia.org/T196558) (owner: 10Ema) [14:52:01] (03PS1) 10Volans: netbox: create also parent directory [puppet] - 10https://gerrit.wikimedia.org/r/559867 (https://phabricator.wikimedia.org/T233183) [14:52:09] (03PS4) 10Jbond: puppet-merge: Ensure we update the labs repo on every puppet run [puppet] - 10https://gerrit.wikimedia.org/r/559843 [14:53:46] (03CR) 10Volans: "Compiler happy: https://puppet-compiler.wmflabs.org/compiler1002/20090/netbox1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/559867 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [14:53:50] (03CR) 10Volans: [C: 03+2] netbox: create also parent directory [puppet] - 10https://gerrit.wikimedia.org/r/559867 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [14:56:25] (03PS1) 10CDanis: templatize bgp sessions with pmacct [homer/public] - 10https://gerrit.wikimedia.org/r/559870 [14:57:07] (03PS5) 10Jbond: puppet-merge: Ensure we update the labs repo on every puppet run [puppet] - 
10https://gerrit.wikimedia.org/r/559843 [14:57:22] (03PS3) 10Jbond: puppet-merge: ensure puppet is only run from one server [puppet] - 10https://gerrit.wikimedia.org/r/559844 [15:00:59] (03PS1) 10Volans: netbox: fix path for file [puppet] - 10https://gerrit.wikimedia.org/r/559875 (https://phabricator.wikimedia.org/T233183) [15:02:26] (03CR) 10Volans: [C: 03+2] netbox: fix path for file [puppet] - 10https://gerrit.wikimedia.org/r/559875 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [15:10:07] 10Operations, 10DNS, 10Mail, 10Traffic: wikimedia.community domain name is not resolving an mx record - https://phabricator.wikimedia.org/T241132 (10Basak) Hi All! Just wanted to drop a line to share that this is the address shared as the main communication email of the group in social media and other cha... [15:12:33] (03PS2) 10CDanis: templatize bgp sessions with pmacct [homer/public] - 10https://gerrit.wikimedia.org/r/559870 [15:15:55] (03PS2) 10Herron: ganeti: apply ferm regardless of ganeti_cluster fact [puppet] - 10https://gerrit.wikimedia.org/r/559879 [15:17:47] (03CR) 10jerkins-bot: [V: 04-1] ganeti: apply ferm regardless of ganeti_cluster fact [puppet] - 10https://gerrit.wikimedia.org/r/559879 (owner: 10Herron) [15:18:11] (03PS1) 10Volans: netbox: fix variable name in template [puppet] - 10https://gerrit.wikimedia.org/r/559882 (https://phabricator.wikimedia.org/T233183) [15:19:02] (03PS3) 10Herron: ganeti: apply ferm regardless of ganeti_cluster fact [puppet] - 10https://gerrit.wikimedia.org/r/559879 [15:20:25] (03CR) 10Volans: [C: 03+2] netbox: fix variable name in template [puppet] - 10https://gerrit.wikimedia.org/r/559882 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [15:21:05] (03CR) 10Ottomata: "!!! Actually this doesn't work for the Kafka consumer. 
I don't know why....I still think using 'localhost' as advertised host is the prob" [deployment-charts] - 10https://gerrit.wikimedia.org/r/559607 (owner: 10Ottomata) [15:24:43] apergos: re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/559550 the recipe being deprecated is also raid10 on four devices if I'm not mistaken so it should be equivalent, standard.cfg + raid10-4dev.cfg that is [15:25:05] (03PS1) 10Volans: netbox: fix typo in pre-existing vhost [puppet] - 10https://gerrit.wikimedia.org/r/559883 [15:25:48] we have two filesystems set up though, and that one only has one (also that one sets up ext4 instead of the current xfs on the one, although I do not object to switching)... unless i am misreading something [15:25:54] which is quite possible [15:27:05] (03CR) 10Volans: [C: 03+2] netbox: fix typo in pre-existing vhost [puppet] - 10https://gerrit.wikimedia.org/r/559883 (owner: 10Volans) [15:27:33] apergos: there will be / and /srv as separate filesystems, I'm updating the commit message so it is more clear what's happening [15:27:54] hey thanks [15:28:25] both raid10? [15:31:03] (03PS3) 10CDanis: templatize bgp sessions with pmacct [homer/public] - 10https://gerrit.wikimedia.org/r/559870 [15:33:53] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1001/20093/" [puppet] - 10https://gerrit.wikimedia.org/r/559879 (owner: 10Herron) [15:33:58] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [15:35:44] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [15:35:58] (03PS1) 10Alexandros Kosiaris: Edit Project Config [deployment-charts] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/559886 [15:36:01] (03Abandoned) 10Alexandros Kosiaris: Edit Project Config [deployment-charts] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/559886 (owner: 10Alexandros Kosiaris) [15:36:03] (03PS4) 10CDanis: templatize bgp sessions with pmacct [homer/public] - 10https://gerrit.wikimedia.org/r/559870 [15:36:05] (03PS3) 10Filippo Giunchedi: install_server: use raid10-8dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559548 (https://phabricator.wikimedia.org/T156955) [15:36:07] (03PS3) 10Filippo Giunchedi: install_server: use raid10-6dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559549 (https://phabricator.wikimedia.org/T156955) [15:36:09] (03PS3) 10Filippo Giunchedi: install_server: deprecate raid10-gpt.cfg [puppet] - 10https://gerrit.wikimedia.org/r/559550 (https://phabricator.wikimedia.org/T156955) [15:36:11] (03PS3) 10Filippo Giunchedi: install_server: deprecate raid10-gpt-srv-lvm-ext4.cfg [puppet] - 10https://gerrit.wikimedia.org/r/559553 (https://phabricator.wikimedia.org/T156955) [15:36:20] apergos: a single raid10 array, with lvm on top, updated the review now [15:37:05] ah that would be ok [15:39:55] (03CR) 10Ayounsi: [C: 03+1] templatize bgp sessions with pmacct [homer/public] - 10https://gerrit.wikimedia.org/r/559870 (owner: 10CDanis) [15:40:41] apergos: aye, the gory details are in standard.cfg [15:40:52] really my mistake was not looking at that first [15:41:27] (03CR) 10CDanis: [V: 03+2 C: 03+2] templatize bgp sessions with pmacct [homer/public] - 10https://gerrit.wikimedia.org/r/559870 (owner: 10CDanis) [15:42:47] apergos: is not looking at partman really ever a mistake? 
[15:43:53] (03PS2) 10Minhducsun2002: Upload HD logos for hi, la and no wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559344 (https://phabricator.wikimedia.org/T150618) [15:45:12] (03PS1) 10CDanis: bgp: pmacct: use local AS, not global [homer/public] - 10https://gerrit.wikimedia.org/r/559889 [15:46:02] (03CR) 10Herron: "Coincidentally I ran a PCC on this with stale facts present (before ganeti_cluster had been populated on ganeti3001) and can see that ferm" [puppet] - 10https://gerrit.wikimedia.org/r/559879 (owner: 10Herron) [15:46:14] 10Operations, 10Cloud-VPS (Debian Jessie Deprecation), 10cloud-services-team (Kanban): Migrate labmon* to Buster - https://phabricator.wikimedia.org/T224585 (10Phamhi) [15:46:33] cdanis: zing! [15:47:20] (03CR) 10Minhducsun2002: "Forgot to compare logos, sorry." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559344 (https://phabricator.wikimedia.org/T150618) (owner: 10Minhducsun2002) [15:48:01] (03CR) 10Ayounsi: [C: 03+1] bgp: pmacct: use local AS, not global [homer/public] - 10https://gerrit.wikimedia.org/r/559889 (owner: 10CDanis) [15:48:14] (03CR) 10CDanis: [V: 03+2 C: 03+2] bgp: pmacct: use local AS, not global [homer/public] - 10https://gerrit.wikimedia.org/r/559889 (owner: 10CDanis) [15:49:15] (03CR) 10Filippo Giunchedi: "LGTM, although changing only eventlog would work too due to overlap" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559827 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [15:52:00] 10Operations, 10Cloud-VPS (Debian Jessie Deprecation), 10cloud-services-team (Kanban): Migrate labmon* to Buster - https://phabricator.wikimedia.org/T224585 (10Phamhi) Hi @Ladsgroup, noted; I apologize for the inconvenience. I have updated the docs. Please let me know if you need me on anything. 
[15:53:34] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [15:54:40] (03PS1) 10CDanis: set confed ASN for esams/knams [homer/public] - 10https://gerrit.wikimedia.org/r/559891 [15:55:20] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [15:59:06] (03PS1) 10CDanis: bgp: pmacct: add iBGP cluster id [homer/public] - 10https://gerrit.wikimedia.org/r/559892 [16:00:44] (03CR) 10Vgutierrez: "is the endpoint already working on the servers? low-traffic lvs seems unable to reach the port, and that's required to get the healthcheck" [puppet] - 10https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [16:02:26] (03CR) 10Jhedden: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [16:04:22] (03CR) 10Cwhite: [C: 03+1] install_server: use raid10-8dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559548 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [16:05:38] RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on netflow4001 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode [16:08:38] (03PS4) 10Jhedden: ceph: add support for dedicated cluster network [puppet] - 10https://gerrit.wikimedia.org/r/559620 (https://phabricator.wikimedia.org/T240965) [16:10:31] (03CR) 10Ayounsi: [C: 03+1] set confed ASN for esams/knams [homer/public] - 
10https://gerrit.wikimedia.org/r/559891 (owner: 10CDanis) [16:10:53] (03CR) 10Ayounsi: [C: 03+1] bgp: pmacct: add iBGP cluster id [homer/public] - 10https://gerrit.wikimedia.org/r/559892 (owner: 10CDanis) [16:11:08] (03CR) 10CDanis: [C: 03+2] set confed ASN for esams/knams [homer/public] - 10https://gerrit.wikimedia.org/r/559891 (owner: 10CDanis) [16:11:10] (03CR) 10CDanis: [V: 03+2 C: 03+2] set confed ASN for esams/knams [homer/public] - 10https://gerrit.wikimedia.org/r/559891 (owner: 10CDanis) [16:11:14] (03CR) 10CDanis: [V: 03+2 C: 03+2] bgp: pmacct: add iBGP cluster id [homer/public] - 10https://gerrit.wikimedia.org/r/559892 (owner: 10CDanis) [16:14:44] (03CR) 10Jhedden: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [16:15:50] (03PS2) 10Krinkle: Disable wgExtractsExtendOpenSearchXml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559224 (https://phabricator.wikimedia.org/T240691) [16:15:54] (03PS1) 10CDanis: devices: sort by fqdn [software/homer] - 10https://gerrit.wikimedia.org/r/559896 [16:16:50] (03PS1) 10Volans: netbox: add dns.cfg configuration file [puppet] - 10https://gerrit.wikimedia.org/r/559897 (https://phabricator.wikimedia.org/T233183) [16:18:32] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:18:33] (03CR) 10Krinkle: [C: 03+2] Disable wgExtractsExtendOpenSearchXml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559224 (https://phabricator.wikimedia.org/T240691) (owner: 10Krinkle) [16:18:46] (03PS1) 10Volans: dns: update default config file path [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/559899 (https://phabricator.wikimedia.org/T233183) [16:19:20] 10Operations, 
10Performance-Team, 10Traffic, 10Patch-For-Review: Enable gzip compression for interface icon SVGs served by MediaWiki - https://phabricator.wikimedia.org/T232615 (10Krinkle) 05Open→03Resolved [16:19:23] (03CR) 10Ebe123: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559344 (https://phabricator.wikimedia.org/T150618) (owner: 10Minhducsun2002) [16:19:28] (03CR) 10Volans: [C: 03+2] dns: update default config file path [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/559899 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [16:19:32] (03Merged) 10jenkins-bot: Disable wgExtractsExtendOpenSearchXml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559224 (https://phabricator.wikimedia.org/T240691) (owner: 10Krinkle) [16:19:59] (03CR) 10CDanis: "I think we'd ideally want to sort by site, for niceness of output, but this still seems better than random/arbitrary order in diff results" [software/homer] - 10https://gerrit.wikimedia.org/r/559896 (owner: 10CDanis) [16:20:17] 10Operations, 10Performance-Team, 10Traffic, 10Performance-Team-publish: Enable gzip compression for interface icon SVGs served by MediaWiki - https://phabricator.wikimedia.org/T232615 (10Krinkle) [16:20:18] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:20:42] cdanis: was in my todo, just ENOTIME [16:20:44] thanks [16:20:48] :D [16:21:03] * Krinkle staging on mwdebug1001 [16:21:35] (03CR) 10Volans: [C: 03+1] "LGTM, thanks for the patch, was in my TODO" [software/homer] - 10https://gerrit.wikimedia.org/r/559896 (owner: 10CDanis) [16:22:34] (03CR) 10Vgutierrez: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [16:24:13] 10Operations, 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (10Nuria) 05Open→03Resolved [16:24:52] (03PS3) 10Urbanecm: Upload HD logos for hi, la and no wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559344 (https://phabricator.wikimedia.org/T150618) (owner: 10Minhducsun2002) [16:24:54] (03CR) 10Ayounsi: [C: 03+1] devices: sort by fqdn [software/homer] - 10https://gerrit.wikimedia.org/r/559896 (owner: 10CDanis) [16:25:06] (03PS6) 10Urbanecm: Add wgLogoHD entry for hi, la and no wikibooks in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559347 (https://phabricator.wikimedia.org/T150618) (owner: 10Minhducsun2002) [16:25:48] (03CR) 10CDanis: [C: 03+2] devices: sort by fqdn [software/homer] - 10https://gerrit.wikimedia.org/r/559896 (owner: 10CDanis) [16:26:04] no need to submit cdanis ;) [16:26:13] (03PS2) 10Volans: netbox: add dns.cfg configuration file [puppet] - 10https://gerrit.wikimedia.org/r/559897 (https://phabricator.wikimedia.org/T233183) [16:26:14] CI will do for you [16:26:26] I know :) [16:27:19] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Ia9190a4e5, T240691: Disable wgExtractsExtendOpenSearchXml (duration: 00m 55s) [16:27:25] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:27:26] T240691: TextExtracts extension frequent slows down opensearch API by several seconds - https://phabricator.wikimedia.org/T240691 [16:28:31] (03CR) 10Volans: [C: 03+2] "Compiler is sane: https://puppet-compiler.wmflabs.org/compiler1001/20095/netbox1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/559897 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [16:28:39] (03Merged) 10jenkins-bot: devices: sort by fqdn [software/homer] - 10https://gerrit.wikimedia.org/r/559896 (owner: 10CDanis) [16:28:52] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*' commit 'templatize BGP sessions with pmacct netflow collector cb096f509 0f56b2233 2e050ad33' [16:28:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:04] PROBLEM - PHP opcache health on scandium is CRITICAL: CRITICAL: opcache free space is below 50 MB https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [16:29:31] (03CR) 10Urbanecm: [C: 03+1] "LGTM, logos seems to match, but the size is interesting. Not opposing through." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559344 (https://phabricator.wikimedia.org/T150618) (owner: 10Minhducsun2002) [16:30:28] (03CR) 10Urbanecm: [C: 04-1] Add wgLogoHD entry for hi, la and no wikibooks in IS.php (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559347 (https://phabricator.wikimedia.org/T150618) (owner: 10Minhducsun2002) [16:31:18] (03CR) 10Minhducsun2002: "Seems like a typo when I run the commands." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/559344 (https://phabricator.wikimedia.org/T150618) (owner: 10Minhducsun2002) [16:33:54] (03PS1) 10Ottomata: Kafka needs to be reachable on different ports for internal and external clients [deployment-charts] - 10https://gerrit.wikimedia.org/r/559902 [16:34:26] (03PS1) 10Volans: netbox: create home directory for the user [puppet] - 10https://gerrit.wikimedia.org/r/559903 [16:34:48] (03CR) 10Ottomata: "what a pain!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/559902 (owner: 10Ottomata) [16:34:50] (03CR) 10Ottomata: [C: 03+2] Kafka needs to be reachable on different ports for internal and external clients [deployment-charts] - 10https://gerrit.wikimedia.org/r/559902 (owner: 10Ottomata) [16:35:06] (03CR) 10jerkins-bot: [V: 04-1] netbox: create home directory for the user [puppet] - 10https://gerrit.wikimedia.org/r/559903 (owner: 10Volans) [16:36:54] (03PS2) 10Volans: netbox: create home directory for the user [puppet] - 10https://gerrit.wikimedia.org/r/559903 [16:37:32] (03CR) 10jerkins-bot: [V: 04-1] netbox: create home directory for the user [puppet] - 10https://gerrit.wikimedia.org/r/559903 (owner: 10Volans) [16:37:44] * volans getting tired I guess :D [16:38:54] (03PS3) 10Volans: netbox: create home directory for the user [puppet] - 10https://gerrit.wikimedia.org/r/559903 [16:39:07] (03CR) 10Volans: "Compiler looks ok: https://puppet-compiler.wmflabs.org/compiler1003/20097/netbox1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/559903 (owner: 10Volans) [16:39:48] (03CR) 10Volans: [C: 03+2] netbox: create home directory for the user [puppet] - 10https://gerrit.wikimedia.org/r/559903 (owner: 10Volans) [16:41:04] (03Abandoned) 10CRusnov: Add script to import management DNS entries [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/529977 (https://phabricator.wikimedia.org/T228670) (owner: 10CRusnov) [16:43:28] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is 
CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:45:14] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:46:25] (03CR) 10Minhducsun2002: "Looks like ImageMagick did something wrong here :" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559344 (https://phabricator.wikimedia.org/T150618) (owner: 10Minhducsun2002) [16:56:40] (03PS7) 10Minhducsun2002: Add wgLogoHD entry for hi, la and no wikibooks in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559347 (https://phabricator.wikimedia.org/T150618) [16:58:39] (03CR) 10Ebe123: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559347 (https://phabricator.wikimedia.org/T150618) (owner: 10Minhducsun2002) [17:04:32] 10Operations, 10Analytics, 10Analytics-Kanban: Terminate Wikimetrics - https://phabricator.wikimedia.org/T219446 (10Nuria) [17:04:40] 10Operations, 10Analytics, 10Analytics-Kanban: Sunset Wikimetrics - https://phabricator.wikimedia.org/T211835 (10Nuria) [17:04:42] 10Operations, 10Analytics, 10Analytics-Kanban: Terminate Wikimetrics - https://phabricator.wikimedia.org/T219446 (10Nuria) 05Open→03Resolved [17:06:05] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users, researchers & wmf for Shay Nowick - https://phabricator.wikimedia.org/T240917 (10SNowick_WMF) {F31484193} Confirming SSH key is secured with passphrase [17:07:43] 10Puppet: Add QUIC for ATS - https://phabricator.wikimedia.org/T241237 (10Bugreporter) [17:07:46] 10Operations, 10Traffic, 10HTTPS: Enable QUIC support on Wikimedia 
servers - https://phabricator.wikimedia.org/T238034 (10Bugreporter) [17:08:30] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:10:16] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:11:23] mutante: no rush, but if you're around could do xhgui sync + switchover [17:13:59] (03PS1) 10Volans: netbox: fix path for dns snippets repo [puppet] - 10https://gerrit.wikimedia.org/r/559912 (https://phabricator.wikimedia.org/T233183) [17:15:08] (03PS1) 10Volans: dns: make netbox API query backward compatible [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/559913 (https://phabricator.wikimedia.org/T233183) [17:15:54] (03CR) 10CRusnov: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/559912 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [17:16:14] (03CR) 10Volans: [C: 03+2] netbox: fix path for dns snippets repo [puppet] - 10https://gerrit.wikimedia.org/r/559912 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [17:19:43] 10Operations, 10Traffic, 10HTTPS, 10Performance-Team (Radar): Enable QUIC support on Wikimedia servers - https://phabricator.wikimedia.org/T238034 (10Krinkle) [17:19:55] !log re-generating sre-bot ro-token for Netbox, accidentally leaked to gerrit, already revoked within a minute from leak [17:20:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:27] (03CR) 10CDanis: [C: 03+1] puppet-merge: Ensure we update the labs repo on every puppet run (031 comment) [puppet] - 
10https://gerrit.wikimedia.org/r/559843 (owner: 10Jbond) [17:24:26] (03PS2) 10Volans: dns: make netbox API query backward compatible [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/559913 (https://phabricator.wikimedia.org/T233183) [17:27:42] (03PS4) 10Ottomata: analytics::refinery::job::data_purge: Add growth deletion timers [puppet] - 10https://gerrit.wikimedia.org/r/556232 (https://phabricator.wikimedia.org/T237124) (owner: 10Mforns) [17:28:01] (03CR) 10Ottomata: [V: 03+2 C: 03+2] analytics::refinery::job::data_purge: Add growth deletion timers [puppet] - 10https://gerrit.wikimedia.org/r/556232 (https://phabricator.wikimedia.org/T237124) (owner: 10Mforns) [17:28:43] (03CR) 10CDanis: puppet-merge: ensure puppet is only run from one server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559844 (owner: 10Jbond) [17:29:16] (03CR) 10CDanis: [C: 03+1] "forgot to say: lgtm aside from one nit :)" [puppet] - 10https://gerrit.wikimedia.org/r/559843 (owner: 10Jbond) [17:29:18] !log Getting cloudweb2001-dev caught up with the MW train via a manual `scap pull` (T241251) [17:29:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:29:34] T241251: Test Wikitech is still running wmf.8 (should be on wmf.11) - https://phabricator.wikimedia.org/T241251 [17:32:25] (03CR) 10CRusnov: [C: 03+1] "LGTM" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/559913 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [17:32:30] (03CR) 10Volans: [C: 03+2] dns: make netbox API query backward compatible [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/559913 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [17:33:34] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers 
https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:34:37] (03PS6) 10Jbond: puppet-merge: Ensure we update the labs repo on every puppet run [puppet] - 10https://gerrit.wikimedia.org/r/559843 [17:34:59] 10Operations, 10ops-eqsin: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099 (10wiki_willy) @RobH - I'm pretty sure we took care of all the onsite work via T229243 with DreamICC, but can you confirm and check off the boxes in this task, up to the current step we're on?... [17:35:18] (03CR) 10Jbond: "thanks" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559843 (owner: 10Jbond) [17:35:22] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:36:59] (03PS5) 10Jbond: puppet-merge: ensure puppet is only run from one server [puppet] - 10https://gerrit.wikimedia.org/r/559844 [17:39:03] (03PS6) 10Jbond: puppet-merge: ensure puppet is only run from one server [puppet] - 10https://gerrit.wikimedia.org/r/559844 [17:39:24] (03CR) 10Jbond: "thanks updated" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559844 (owner: 10Jbond) [17:39:52] (03CR) 10Jbond: [C: 03+2] puppet-merge: Ensure we update the labs repo on every puppet run [puppet] - 10https://gerrit.wikimedia.org/r/559843 (owner: 10Jbond) [17:43:26] (03PS1) 10Jbond: test puppet-merge [puppet] - 10https://gerrit.wikimedia.org/r/559917 [17:43:28] (03PS5) 10Ottomata: analytics::refinery::job::data_purge: Add timer to delete old MWH dumps [puppet] - 10https://gerrit.wikimedia.org/r/539151 (https://phabricator.wikimedia.org/T208612) (owner: 10Mforns) [17:43:37] (03CR) 10Ottomata: [V: 03+2 C: 03+2] analytics::refinery::job::data_purge: Add timer to delete old MWH 
dumps [puppet] - 10https://gerrit.wikimedia.org/r/539151 (https://phabricator.wikimedia.org/T208612) (owner: 10Mforns) [17:44:19] (03CR) 10Jbond: [C: 03+2] test puppet-merge [puppet] - 10https://gerrit.wikimedia.org/r/559917 (owner: 10Jbond) [17:44:43] (03PS1) 10CRusnov: netbox: Adjust permissions of Puppet-managed git files [puppet] - 10https://gerrit.wikimedia.org/r/559918 [17:45:16] (03CR) 10CRusnov: "This change is ready for review." [puppet] - 10https://gerrit.wikimedia.org/r/559918 (owner: 10CRusnov) [17:45:18] (03CR) 10CDanis: [C: 03+1] "lgtm!" [puppet] - 10https://gerrit.wikimedia.org/r/559844 (owner: 10Jbond) [17:45:50] (03CR) 10CDanis: [C: 03+1] "pcc lg as well https://puppet-compiler.wmflabs.org/compiler1002/20099/" [puppet] - 10https://gerrit.wikimedia.org/r/559844 (owner: 10Jbond) [17:48:24] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:49:26] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:49:33] (03PS7) 10Jbond: puppet-merge: ensure puppet is only run from one server [puppet] - 10https://gerrit.wikimedia.org/r/559844 [17:50:05] cdanis: can you give that ^^ a quick check [17:50:40] hah somehow i interpolated that extra stuff as being there when i read it in the first place [17:50:42] jbond42: lgtm [17:50:47] (03CR) 10CDanis: [C: 03+1] puppet-merge: ensure puppet is only run from one server [puppet] - 10https://gerrit.wikimedia.org/r/559844 (owner: 10Jbond) [17:50:51] thanks [17:51:10] (03PS7) 10CDanis: systemd::timer::job: fix bug re: On(In)?ActiveUnitSec [puppet] - 10https://gerrit.wikimedia.org/r/551281 [17:51:12] (03PS14) 10CDanis: prometheus: export NIC firmware versions [puppet] - 10https://gerrit.wikimedia.org/r/549683 (https://phabricator.wikimedia.org/T236744) [17:51:32] (03CR) 10Jbond: [C: 03+2] puppet-merge: ensure puppet is only run from one server [puppet] - 10https://gerrit.wikimedia.org/r/559844 (owner: 10Jbond) [17:53:01] (03CR) 10jerkins-bot: [V: 04-1] systemd::timer::job: fix bug re: On(In)?ActiveUnitSec [puppet] - 10https://gerrit.wikimedia.org/r/551281 (owner: 10CDanis) [17:54:51] (03PS1) 10Jbond: Revert "test puppet-merge" [puppet] - 10https://gerrit.wikimedia.org/r/559920 [17:54:58] (03CR) 10Jbond: [V: 03+2 C: 03+2] Revert "test puppet-merge" [puppet] - 10https://gerrit.wikimedia.org/r/559920 (owner: 10Jbond) [17:55:37] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/559918 (owner: 10CRusnov) [17:56:16] (03PS2) 10CRusnov: netbox: Adjust permissions of Puppet-managed git files [puppet] - 10https://gerrit.wikimedia.org/r/559918 [17:56:44] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/559918 (owner: 10CRusnov) [17:58:28] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is 
CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:59:34] (03PS3) 10Mforns: analytics::search::jobs.pp: Move last deletion timers to drop-older-than [puppet] - 10https://gerrit.wikimedia.org/r/539094 (https://phabricator.wikimedia.org/T204735) [17:59:44] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [18:02:31] (03PS4) 10Ottomata: analytics::search::jobs.pp: Move last deletion timers to drop-older-than [puppet] - 10https://gerrit.wikimedia.org/r/539094 (https://phabricator.wikimedia.org/T204735) (owner: 10Mforns) [18:02:55] (03CR) 10Ottomata: [V: 03+2 C: 03+2] analytics::search::jobs.pp: Move last deletion timers to drop-older-than [puppet] - 10https://gerrit.wikimedia.org/r/539094 (https://phabricator.wikimedia.org/T204735) (owner: 10Mforns) [18:05:48] (03PS1) 10Jbond: puppet-merge: correct error handeling [puppet] - 10https://gerrit.wikimedia.org/r/559924 [18:12:10] (03PS1) 10Mforns: analytics::refinery::job::data_purge: Correct timer syntax [puppet] - 10https://gerrit.wikimedia.org/r/559926 (https://phabricator.wikimedia.org/T208612) [18:12:58] (03PS2) 10Jbond: puppet-merge: correct error handeling [puppet] - 10https://gerrit.wikimedia.org/r/559924 [18:14:22] (03CR) 10Volans: [C: 03+1] "As discussed offline, LGTM for this iteration. 
cescout will unify and remove a bunch of duplicated code and some fixes were delayed until " [software/censorship-monitoring] - 10https://gerrit.wikimedia.org/r/556732 (owner: 10Ssingh) [18:14:47] (03CR) 10Ottomata: [C: 03+2] analytics::refinery::job::data_purge: Correct timer syntax [puppet] - 10https://gerrit.wikimedia.org/r/559926 (https://phabricator.wikimedia.org/T208612) (owner: 10Mforns) [18:15:08] (03CR) 10Jbond: [C: 03+2] puppet-merge: correct error handeling [puppet] - 10https://gerrit.wikimedia.org/r/559924 (owner: 10Jbond) [18:18:39] (03CR) 10CDanis: [C: 03+1] puppet-merge: correct error handeling [puppet] - 10https://gerrit.wikimedia.org/r/559924 (owner: 10Jbond) [18:18:46] (03PS1) 10Jbond: Revert "Revert "test puppet-merge"" [puppet] - 10https://gerrit.wikimedia.org/r/559928 [18:18:54] (03CR) 10Jbond: [V: 03+2 C: 03+2] Revert "Revert "test puppet-merge"" [puppet] - 10https://gerrit.wikimedia.org/r/559928 (owner: 10Jbond) [18:28:02] 10Operations, 10Puppet, 10User-jbond: puppet-merge: race condition - https://phabricator.wikimedia.org/T241262 (10jbond) [18:31:26] 10Operations, 10Cloud-VPS (Debian Jessie Deprecation), 10cloud-services-team (Kanban): Migrate labmon* to Buster - https://phabricator.wikimedia.org/T224585 (10bd808) >>! In T224585#5756913, @Ladsgroup wrote: > Hey, thanks for doing this but I really think this needs some sort of announcement, lots of tools... [18:34:00] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [18:35:46] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [18:38:05] (03PS1) 10Giuseppe Lavagetto: Add a registry reporter [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559933 [18:44:51] !log mholloway-shell@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/WikimediaEditorTasks: Fix: Get RevisionRecord directly from the Revision in onPageContentSaveComplete (T241014) (duration: 00m 54s) [18:44:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:59] T241014: Argument 2 passed to MediaWiki\Extension\WikimediaEditorTasks\Hooks::countersOnEditSuccess() must be an instance of MediaWiki\Revision\RevisionRecord, null given, called in /srv/mediawiki/php-1.35.0-wmf.10/extensions/WikimediaEditorTasks/src/Hooks.php on line 78 - https://phabricator.wikimedia.org/T241014 [18:49:21] (03PS3) 10CRusnov: netbox: Adjust permissions of Puppet-managed git files [puppet] - 10https://gerrit.wikimedia.org/r/559918 [18:52:47] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [18:53:53] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [18:54:05] (03CR) 10CRusnov: [C: 03+2] netbox: Adjust permissions of Puppet-managed git files [puppet] - 10https://gerrit.wikimedia.org/r/559918 (owner: 10CRusnov) [19:00:13] (03PS1) 10CDanis: puppet-merge: propagate the merged SHAs [puppet] - 10https://gerrit.wikimedia.org/r/559936 [19:03:27] (03CR) 10CDanis: "PCC looks good https://puppet-compiler.wmflabs.org/compiler1003/20101/ did some local testing on an extracted fragment of script as well" [puppet] - 10https://gerrit.wikimedia.org/r/559936 (owner: 10CDanis) [19:04:45] (03PS2) 10CDanis: puppet-merge: propagate the merged SHAs [puppet] - 10https://gerrit.wikimedia.org/r/559936 [19:05:21] 10Operations, 10Wikimedia-Mailing-lists: Spamhaus check suddenly started bouncing me; whitelist request? - https://phabricator.wikimedia.org/T241267 (10TheSandDoctor) [19:06:13] (03CR) 10Urbanecm: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559347 (https://phabricator.wikimedia.org/T150618) (owner: 10Minhducsun2002) [19:08:21] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [19:09:53] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [19:11:25] (03PS3) 10CDanis: puppet-merge: propagate the merged SHAs [puppet] - 10https://gerrit.wikimedia.org/r/559936 [19:20:28] 10Operations, 10Wikimedia-Mailing-lists: Spamhaus check suddenly started bouncing me; whitelist request? - https://phabricator.wikimedia.org/T241267 (10Aklapper) For a little bit more context, with a redacted IP and URL for privacy: `SMTP error from remote server for RCPT TO command, host: lists.wikimedia.org... [19:21:45] (03CR) 10Jbond: "lgtm but missing a cat :)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559936 (owner: 10CDanis) [19:22:11] (03PS4) 10CDanis: puppet-merge: propagate the merged SHAs [puppet] - 10https://gerrit.wikimedia.org/r/559936 [19:23:21] (03CR) 10CDanis: [C: 03+2] puppet-merge: propagate the merged SHAs [puppet] - 10https://gerrit.wikimedia.org/r/559936 (owner: 10CDanis) [19:23:41] (03CR) 10Jbond: "LGTM, thanks" [puppet] - 10https://gerrit.wikimedia.org/r/559936 (owner: 10CDanis) [19:25:47] 10Operations, 10ops-codfw: ms-fe2007 NIC failure - https://phabricator.wikimedia.org/T239805 (10Papaul) [19:27:29] (03PS1) 10CDanis: test puppet-merge [puppet] - 10https://gerrit.wikimedia.org/r/559940 [19:28:05] (03CR) 10CDanis: [V: 03+2 C: 03+2] test puppet-merge [puppet] - 10https://gerrit.wikimedia.org/r/559940 (owner: 10CDanis) [19:28:10] (03PS5) 10Ottomata: New eventstreams chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658) [19:28:14] 10Operations, 10Wikimedia-Mailing-lists: Spamhaus check suddenly started bouncing me; whitelist request? - https://phabricator.wikimedia.org/T241267 (10Peachey88) @TheSandDoctor Have you applied to have your entry removed from SpamHaus? 
https://www.spamhaus.org/lookup/ [19:32:42] 10Operations, 10Puppet, 10User-jbond: puppet-merge: race condition - https://phabricator.wikimedia.org/T241262 (10jbond) 05Open→03Resolved a:03jbond this has been resolved https://gerrit.wikimedia.org/r/559936 [19:43:49] (03CR) 10MarcoAurelio: [C: 03+1] Add sandboxlink for eswikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559576 (https://phabricator.wikimedia.org/T241163) (owner: 10Ammarpad) [20:13:42] 10Operations, 10Puppet: puppet-merge can't accept an explicit SHA1 for an --ops merge - https://phabricator.wikimedia.org/T241277 (10CDanis) [20:13:55] (03PS1) 10CDanis: puppet-merge.py: SHA1 or explicit FETCH_HEAD is mandatory [puppet] - 10https://gerrit.wikimedia.org/r/559944 [20:20:12] (03CR) 10Ssingh: [C: 03+2] Add script for fetching routing information from RIPEstat [software/censorship-monitoring] - 10https://gerrit.wikimedia.org/r/556732 (owner: 10Ssingh) [20:20:57] (03CR) 10Ssingh: [V: 03+2 C: 03+2] Add script for fetching routing information from RIPEstat [software/censorship-monitoring] - 10https://gerrit.wikimedia.org/r/556732 (owner: 10Ssingh) [20:20:59] (03Merged) 10jenkins-bot: Add script for fetching routing information from RIPEstat [software/censorship-monitoring] - 10https://gerrit.wikimedia.org/r/556732 (owner: 10Ssingh) [20:38:25] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [20:40:13] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [20:49:21] 10Operations, 10observability, 10Patch-For-Review, 10Performance-Team (Radar): Fully migrate producers off statsd - https://phabricator.wikimedia.org/T205870 (10colewhite) [20:55:50] (03CR) 10Paladox: "Hi, this broke puppetmasters in labs:" [puppet] - 10https://gerrit.wikimedia.org/r/559844 (owner: 10Jbond) [21:02:39] (03PS1) 10CDanis: puppet-merge: attempt to fix labs case [puppet] - 10https://gerrit.wikimedia.org/r/559949 [21:04:31] (03CR) 10CDanis: [C: 03+2] "PCC says no-op in prod, and the erroneous line now matches what seems to work in labs on other files, so let's try it https://puppet-compi" [puppet] - 10https://gerrit.wikimedia.org/r/559949 (owner: 10CDanis) [21:05:04] (03PS1) 10Jbond: puppet-merge: update to use unstractured facts [puppet] - 10https://gerrit.wikimedia.org/r/559950 [21:05:22] (03CR) 10CDanis: "> Patch Set 7:" [puppet] - 10https://gerrit.wikimedia.org/r/559844 (owner: 10Jbond) [21:05:56] ah jbond42 I already guessed at and merged the same change :) [21:08:39] (03CR) 10Jbond: [C: 03+2] "PCC ops: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/20103/console" [puppet] - 10https://gerrit.wikimedia.org/r/559950 (owner: 10Jbond) [21:10:35] (03PS1) 10RLazarus: Test multiple hosts in parallel. [software/httpbb] - 10https://gerrit.wikimedia.org/r/559952 [21:10:37] cdanis: ahh i was just wondering wht my change was empty [21:10:45] 😂 [21:10:45] thanks [21:10:57] paladox: are things better now? [21:11:28] cdanis yup! Thank you! [21:11:32] 10Operations, 10User-Addshore: https://noc.wikimedia.org/conf/highlight.php returns 404, so all links from noc. are broken. 
- https://phabricator.wikimedia.org/T240928 (10Addshore) 05Open→03Resolved a:03Addshore Also works for me now [21:11:59] (03CR) 10RLazarus: "There's a little more work to do here (called out with TODOs) but this is a good first multi-host implementation." [software/httpbb] - 10https://gerrit.wikimedia.org/r/559952 (owner: 10RLazarus) [21:12:28] (03CR) 10jerkins-bot: [V: 04-1] Test multiple hosts in parallel. [software/httpbb] - 10https://gerrit.wikimedia.org/r/559952 (owner: 10RLazarus) [21:12:44] ok going back off ping if needed [21:13:07] 10Operations, 10netops: fastnetmon fired for routine text-lb.esams traffic - https://phabricator.wikimedia.org/T241281 (10CDanis) [21:14:20] (03PS5) 10Jhedden: ceph: add support for dedicated cluster network [puppet] - 10https://gerrit.wikimedia.org/r/559620 (https://phabricator.wikimedia.org/T240965) [21:19:15] 10Operations, 10MediaWiki-extensions-PdfHandler, 10Multimedia, 10Patch-For-Review: Error creating PDF on Commons: "convert: no decode delegate for this image format" (fixed in GS 9.07) - https://phabricator.wikimedia.org/T50007 (10PhotographerTom) We are also having this issue on a private wiki. [21:21:50] (03PS6) 10Jhedden: ceph: add support for dedicated cluster network [puppet] - 10https://gerrit.wikimedia.org/r/559620 (https://phabricator.wikimedia.org/T240965) [21:26:07] PROBLEM - Unmerged changes on repository puppet on labtestpuppetmaster2001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [21:26:27] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). 
https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [21:29:11] 10Operations, 10MediaWiki-extensions-PdfHandler, 10Multimedia, 10Patch-For-Review: Error creating PDF on Commons: "convert: no decode delegate for this image format" (fixed in GS 9.07) - https://phabricator.wikimedia.org/T50007 (10Aklapper) Hi @PhotographerTom. The task summary says "(fixed in GS 9.07)". W... [21:32:58] (03PS7) 10Jhedden: ceph: add support for dedicated cluster network [puppet] - 10https://gerrit.wikimedia.org/r/559620 (https://phabricator.wikimedia.org/T240965) [21:33:43] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [21:35:29] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [21:36:22] (03PS6) 10Ottomata: [WIP] New eventstreams chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658) [21:37:09] (03PS8) 10Jhedden: ceph: add support for dedicated cluster network [puppet] - 10https://gerrit.wikimedia.org/r/559620 (https://phabricator.wikimedia.org/T240965) [21:38:07] (03PS2) 10RLazarus: Test multiple hosts in parallel. [software/httpbb] - 10https://gerrit.wikimedia.org/r/559952 [21:38:37] 10Operations, 10Cloud-VPS (Debian Jessie Deprecation), 10cloud-services-team (Kanban): Migrate labmon* to Buster - https://phabricator.wikimedia.org/T224585 (10Ladsgroup) >>! In T224585#5757762, @bd808 wrote: >>>! In T224585#5756913, @Ladsgroup wrote: >> Hey, thanks for doing this but I really think this nee... 
[21:45:04] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1123 - https://phabricator.wikimedia.org/T240534 (10Jclark-ctr) @marostegui disk arrived today message me on irc if available to change [21:45:45] (03PS1) 10CDanis: puppet-merge: it isn't just lack of diffs that matters [puppet] - 10https://gerrit.wikimedia.org/r/559957 [21:48:32] (03PS2) 10CDanis: puppet-merge: it isn't lack of diffs that matters [puppet] - 10https://gerrit.wikimedia.org/r/559957 [21:49:49] (03PS2) 10Jhedden: ceph: add secondary interface for cloudceph servers [dns] - 10https://gerrit.wikimedia.org/r/558636 (https://phabricator.wikimedia.org/T240965) [21:49:54] (03CR) 10CDanis: [C: 03+2] puppet-merge: it isn't lack of diffs that matters [puppet] - 10https://gerrit.wikimedia.org/r/559957 (owner: 10CDanis) [21:51:10] (03PS1) 10CDanis: test puppet-merge yet again today [puppet] - 10https://gerrit.wikimedia.org/r/559959 [21:51:53] (03CR) 10CDanis: [V: 03+2 C: 03+2] test puppet-merge yet again today [puppet] - 10https://gerrit.wikimedia.org/r/559959 (owner: 10CDanis) [21:52:08] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@0ad64e7]: properly decode unicode on ltr model upload [21:52:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:52:15] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [21:52:20] 10Operations, 10Wikimedia-Mailing-lists: Spamhaus check suddenly started bouncing me; whitelist request? - https://phabricator.wikimedia.org/T241267 (10Platonides) Hello @TheSandDoctor The entry [[https://www.spamhaus.org/sbl/query/SBL205747|SBL205747]] is special in that it is an entry requested by mail.com... 
[21:52:43] (03CR) 10Jhedden: [C: 03+2] ceph: add secondary interface for cloudceph servers [dns] - 10https://gerrit.wikimedia.org/r/558636 (https://phabricator.wikimedia.org/T240965) (owner: 10Jhedden) [21:53:19] 10Operations, 10Wikimedia-Mailing-lists: Spamhaus check suddenly started bouncing me; whitelist request? - https://phabricator.wikimedia.org/T241267 (10Platonides) 05Open→03Invalid [21:53:37] jeh: please let me know if you encounter any issues with your puppet-merge -- I just tweaked the script [21:53:49] cdanis: will do thanks, just noticed you had one in queue [21:54:28] cdanis: do you want me to merge `test puppet-merge yet again today (d037e17a10)`? [21:54:36] ah I'm doing it now [21:54:37] * jeh is not in a hurry if you're still testing [21:54:43] did your run finish? [21:54:53] no, I haven't merged anything yet [21:55:40] oh okay [21:55:45] well you can proceed! [21:55:51] RECOVERY - Unmerged changes on repository puppet on labtestpuppetmaster2001 is OK: No changes to merge. 
https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [21:58:12] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@0ad64e7]: properly decode unicode on ltr model upload (duration: 06m 04s) [21:58:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:58:59] (03PS1) 10CDanis: This commit intentionally left blank (vacat; from Latin 'vacare') [puppet] - 10https://gerrit.wikimedia.org/r/559962 [22:00:56] (03PS3) 10Jhedden: add forward and reverse for cloudceph.svc.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/558707 (https://phabricator.wikimedia.org/T240715) [22:02:10] (03CR) 10Jhedden: [C: 03+2] add forward and reverse for cloudceph.svc.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/558707 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [22:02:37] (03PS2) 10CDanis: This commit intentionally left blank to test puppet-merge bugfix [puppet] - 10https://gerrit.wikimedia.org/r/559962 [22:03:24] (03CR) 10CDanis: [C: 03+2] This commit intentionally left blank to test puppet-merge bugfix [puppet] - 10https://gerrit.wikimedia.org/r/559962 (owner: 10CDanis) [22:10:23] (03PS1) 10CDanis: another intentionally-empty commit [puppet] - 10https://gerrit.wikimedia.org/r/559964 [22:11:01] (03CR) 10CDanis: [C: 03+2] another intentionally-empty commit [puppet] - 10https://gerrit.wikimedia.org/r/559964 (owner: 10CDanis) [22:13:52] okay I believe things are much better in puppet-merge land :) [22:16:19] PROBLEM - HTTPS-wmflabs on tools.wmflabs.org is CRITICAL: SSL CRITICAL - Certificate toolforge.org valid until 2020-01-19 22:15:27 +0000 (expires in 29 days) https://phabricator.wikimedia.org/tag/toolforge/ [22:23:37] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers 
https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [22:25:25] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [22:31:27] (03PS1) 10Vogone: Add transwiki sources in order to enable basic import from English language Wikimedia projects. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559965 [22:45:34] (03PS2) 10Vogone: Add basic transwiki sources for ltwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559965 (https://phabricator.wikimedia.org/T241288) [22:47:03] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@d99bebf]: force utf-8 encoding when not detected [22:47:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:47:09] (03PS3) 10Vogone: Add basic transwiki sources for ltwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559965 (https://phabricator.wikimedia.org/T241288) [22:48:24] (03PS1) 10Jhedden: Refactor ceph keyring data [labs/private] - 10https://gerrit.wikimedia.org/r/559969 [22:49:17] (03CR) 10Jhedden: [V: 03+2 C: 03+2] Refactor ceph keyring data [labs/private] - 10https://gerrit.wikimedia.org/r/559969 (owner: 10Jhedden) [22:52:25] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@d99bebf]: force utf-8 encoding when not detected (duration: 05m 22s) [22:52:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:59:05] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@8373b0d]: Correct upload metadata usage [22:59:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:05:46] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@8373b0d]: Correct upload metadata usage (duration: 
06m 41s) [23:05:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:07:44] (03PS1) 10Volans: netbox: skip virtual chassis without domain [software/homer] - 10https://gerrit.wikimedia.org/r/559973 [23:10:03] !log ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=testwikidatawiki --sleep 2 --batch-size=10 (T241209) [23:10:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:10:09] T241209: Full rebuild of new wikidata terms tables on test after T237984 - https://phabricator.wikimedia.org/T241209 [23:32:31] 10Operations, 10Wikibugs: wikibugs needs restart almost everyday - https://phabricator.wikimedia.org/T241109 (10valhallasw) First of all thank you for restarting wikibugs when this happens! Judging by the behavior, it sounds like the issue is in wb2-phab: the irc bot is still there, but phabricator messages a... [23:38:51] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [23:39:13] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team: Relabel labmon1001.eqiad.wmnet to cloudmetrics1001eqiad.wmnet and labmon1002.eqiad.wmnet to cloudmetrics1002eqiad.wmnet - https://phabricator.wikimedia.org/T241155 (10wiki_willy) a:03Cmjohnson [23:40:37] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [23:47:17] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users, researchers & wmf for Shay Nowick - https://phabricator.wikimedia.org/T240917 (10colewhite) [23:47:30] (03CR) 10Cwhite: [C: 03+2] admin: add shay to analytics-privatedata-users and researchers [puppet] - 10https://gerrit.wikimedia.org/r/559630 (https://phabricator.wikimedia.org/T240917) (owner: 10Cwhite) [23:47:37] (03PS2) 10Cwhite: admin: add shay to analytics-privatedata-users and researchers [puppet] - 10https://gerrit.wikimedia.org/r/559630 (https://phabricator.wikimedia.org/T240917) [23:53:52] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users, researchers & wmf for Shay Nowick - https://phabricator.wikimedia.org/T240917 (10colewhite) Thank you! Group membership changes have been deployed. Please feel free to reopen if you encounter any rela... [23:54:01] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users, researchers & wmf for Shay Nowick - https://phabricator.wikimedia.org/T240917 (10colewhite) 05Open→03Resolved [23:59:23] 10Operations, 10ops-eqiad, 10serviceops: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (10Jclark-ctr) @jijiki small delay had a few tickets become urgent. Being a contractor i will be in week after christmas and will wrap them up at beginning of new...