[00:00:04] RoanKattouw, Niharika, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do Evening backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201110T0000).
[00:00:04] No GERRIT patches in the queue for this window AFAICS.
[01:00:07] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 36819824 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:08:10] (PS1) Ebernhardson: elastic: Turn on adaptive replica selection in elastic 6 [puppet] - https://gerrit.wikimedia.org/r/640274 (https://phabricator.wikimedia.org/T259539)
[01:08:49] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 2040176 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:27:21] (CR) Ppchelko: [C: -1] Turn on formatnum logging [mediawiki-config] - https://gerrit.wikimedia.org/r/640254 (https://phabricator.wikimedia.org/T267587) (owner: C. Scott Ananian)
[02:07:13] (PS1) TrainBranchBot: Branch commit for wmf/1.36.0-wmf.17 [core] (wmf/1.36.0-wmf.17) - https://gerrit.wikimedia.org/r/640283
[02:08:33] (PS2) DannyS712: Branch commit for wmf/1.36.0-wmf.17 [core] (wmf/1.36.0-wmf.17) - https://gerrit.wikimedia.org/r/640283 (https://phabricator.wikimedia.org/T263183) (owner: TrainBranchBot)
[02:09:09] (CR) DannyS712: "No deployment this week, though should this be merged anyway? Or just abandoned?" [core] (wmf/1.36.0-wmf.17) - https://gerrit.wikimedia.org/r/640283 (https://phabricator.wikimedia.org/T263183) (owner: TrainBranchBot)
[05:45:22] (CR) Volans: "I've suggested some minor improvement inline" (7 comments) [software/netbox-extras] - https://gerrit.wikimedia.org/r/635849 (owner: Ayounsi)
[05:54:13] PROBLEM - Check systemd state on dbprov1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:54:56] (CR) RLazarus: "I'm coming in late; apologies if any of these questions was already covered earlier in the review. I really like where this is going!" (11 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[06:16:43] Operations, Data-Persistence-Backup, SRE-tools: Add toil::systemd_scope_cleanup to dbprov hosts - https://phabricator.wikimedia.org/T265323 (Marostegui) Another one :( ` root@dbprov1003:~# systemctl list-units --state=failed UNIT LOAD ACTIVE SUB DESCRIPTION ● session-139110.sco...
[06:22:57] (PS1) Marostegui: control-mariadb-*: Upgrade package version [software] - https://gerrit.wikimedia.org/r/640286
[06:24:03] (CR) Marostegui: [C: +2] control-mariadb-*: Upgrade package version [software] - https://gerrit.wikimedia.org/r/640286 (owner: Marostegui)
[06:29:04] (CR) Marostegui: [C: +1] "Thanks Brooke!. Don't worry too much about the buffer sizes for now, we can adjust as we go later." [puppet] - https://gerrit.wikimedia.org/r/639815 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[06:29:18] (PS2) Marostegui: wikireplicas: set up site.pp and hosts hiera for new servers [puppet] - https://gerrit.wikimedia.org/r/639815 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[06:31:31] RECOVERY - Check systemd state on dbprov1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:31:35] (PS1) Marostegui: install_server: Do not reimage, es1029-es1031 [puppet] - https://gerrit.wikimedia.org/r/640287 (https://phabricator.wikimedia.org/T261717)
[06:32:38] (CR) Marostegui: [C: +2] install_server: Do not reimage, es1029-es1031 [puppet] - https://gerrit.wikimedia.org/r/640287 (https://phabricator.wikimedia.org/T261717) (owner: Marostegui)
[06:39:56] Operations, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Marostegui) @jijiki reading the task it is not clear to me what's needed from us (#dba). Is it a heads up that you'll be running `updateCollation.php` against the wikis listed on T264991#...
[06:44:44] !log Restart pc1010 to pick up report_host - T266483
[06:44:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:52] T266483: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483
[06:53:58] !log Restart dbstore* to pick up report_host - T266483
[06:54:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:05] T266483: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483
[07:14:20] Operations, DBA, Orchestrator, Patch-For-Review: orchestrator: Use ssl for talking to db servers - https://phabricator.wikimedia.org/T267401 (Marostegui) p:Triage→Medium
[07:16:29] (CR) Elukey: "This change also affects an-coord1001 afaics, where we don't run bigtop yet and Oozie doesn't support the admin list via posix group :(" [puppet] - https://gerrit.wikimedia.org/r/640260 (https://phabricator.wikimedia.org/T262660) (owner: Razzi)
[07:18:44] (PS2) Hamish: Add wgImportSources for zhwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/637869 (https://phabricator.wikimedia.org/T266388)
[07:18:46] (PS1) Hamish: Add wgImportSources for zhwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/640290 (https://phabricator.wikimedia.org/T266388)
[07:19:24] (CR) Elukey: [C: +1] Refactoring: rename internal modules [software/spicerack] - https://gerrit.wikimedia.org/r/634056 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[07:21:14] Operations, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (jijiki) @Marostegui yes, it is a headsup for your radar, thank you!
[07:22:22] Operations, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Marostegui) Thank you! What's the expected impact of `updateCollation.php`?
[07:23:33] (CR) Hamish: Add wgImportSources for zhwikinews (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/637869 (https://phabricator.wikimedia.org/T266388) (owner: Hamish)
[07:24:56] (Abandoned) Elukey: Assing role::analytics_cluster::coordinator::query to an-coord1002 [puppet] - https://gerrit.wikimedia.org/r/635000 (https://phabricator.wikimedia.org/T257412) (owner: Elukey)
[07:25:56] (PS1) Marostegui: core-mysql.my.cnf.erb: Change expire_log_days [puppet] - https://gerrit.wikimedia.org/r/640293
[07:29:05] <_joe_> uhm
[07:35:37] (PS2) Marostegui: core-mysql.my.cnf.erb: Change expire_log_days [puppet] - https://gerrit.wikimedia.org/r/640293
[07:40:01] !log import hue_4.8.0-2 to buster-wikimedia
[07:40:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:45] (CR) Elukey: [C: +1] "Checked manually all the versions in Buster, LGTM!" [software/spicerack] - https://gerrit.wikimedia.org/r/639010 (owner: Volans)
[07:51:55] (CR) Hashar: [V: +2 C: +2] Add metrics-reporter-jmx plugin [software/gerrit] (deploy/wmf/stable-3.2) - https://gerrit.wikimedia.org/r/640177 (https://phabricator.wikimedia.org/T184086) (owner: Hashar)
[07:52:15] (CR) Elukey: [C: +1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/639011 (owner: Volans)
[07:52:29] (PS5) KartikMistry: Remove wgContentTranslationRESTBase config [mediawiki-config] - https://gerrit.wikimedia.org/r/634956 (https://phabricator.wikimedia.org/T266213)
[07:53:52] (PS2) Hashar: Add metrics-reporter-prometheus plugin [software/gerrit] (deploy/wmf/stable-3.2) - https://gerrit.wikimedia.org/r/640174 (https://phabricator.wikimedia.org/T184086)
[07:54:02] (CR) Hashar: [V: +2 C: +2] Add metrics-reporter-prometheus plugin [software/gerrit] (deploy/wmf/stable-3.2) - https://gerrit.wikimedia.org/r/640174 (https://phabricator.wikimedia.org/T184086) (owner: Hashar)
[07:54:16] Operations, Data-Persistence-Backup, SRE-tools: Add toil::systemd_scope_cleanup to dbprov hosts - https://phabricator.wikimedia.org/T265323 (jcrespo) Just to be clear- this is closed- the profile was added- that won't fix the issue, only run reset-failed on a cron, as the parent did. If you wanted "f...
[08:02:37] (PS2) Volans: dependencies: update min version to match Buster [software/spicerack] - https://gerrit.wikimedia.org/r/639010
[08:04:16] I am going to restart Gerrit in a few to deploy a couple plugins. ~ 1 minute interruption.
[08:04:44] !log hashar@deploy1001 Started deploy [gerrit/gerrit@5a41181]: jmx and prometheus metrics reporters - T184086
[08:04:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:51] T184086: Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086
[08:04:54] !log hashar@deploy1001 Finished deploy [gerrit/gerrit@5a41181]: jmx and prometheus metrics reporters - T184086 (duration: 00m 10s)
[08:05:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:00] (CR) Muehlenhoff: [C: +1] "LGTM, one comment inline" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/640221 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[08:05:14] * volans holding off merging patches
[08:06:15] !log Restarting Gerrit on gerrit2001 / gerrit-replica
[08:06:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:46] Operations, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (MoritzMuehlenhoff) One other sanity check for the rollout (in particular when the whole server batch gets upgraded on the 16th); ` cumin foo* 'php -r "var_dump(IntlChar::getUnicodeVersi...
[08:14:24] Operations, Data-Persistence-Backup, SRE-tools: Add toil::systemd_scope_cleanup to dbprov hosts - https://phabricator.wikimedia.org/T265323 (Marostegui) >>! In T265323#6615449, @jcrespo wrote: > Just to be clear- this is closed- Yes, I saw that. > If you wanted "fixed", that should be brought to the...
[08:36:34] Operations, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (MoritzMuehlenhoff)
[08:51:21] Operations, MediaWiki-General, serviceops, MW-1.34-notes (1.34.0-wmf.16; 2019-07-30), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (Joe) Gentle nudge, this really needs to be completed. @WDoranWMF...
[08:54:19] PROBLEM - Check systemd state on otrs1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:54:49] PROBLEM - clamd running on otrs1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 112 (clamav), command name clamd https://wikitech.wikimedia.org/wiki/OTRS%23ClamAV
[08:55:21] (CR) Volans: [C: +2] dependencies: update min version to match Buster [software/spicerack] - https://gerrit.wikimedia.org/r/639010 (owner: Volans)
[08:57:00] (CR) Giuseppe Lavagetto: "> Patch Set 7: Code-Review-1" [docker-images/production-images] - https://gerrit.wikimedia.org/r/634924 (https://phabricator.wikimedia.org/T265324) (owner: Giuseppe Lavagetto)
[08:57:56] (CR) jerkins-bot: [V: -1] dependencies: update min version to match Buster [software/spicerack] - https://gerrit.wikimedia.org/r/639010 (owner: Volans)
[08:59:23] (CR) Volans: [C: +2] dependencies: update min version to match Buster [software/spicerack] - https://gerrit.wikimedia.org/r/639010 (owner: Volans)
[08:59:51] !log Restarted Gerrit for plugins deployment
[08:59:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:28] (CR) Hashar: [C: +1] "I have deployed the plugins to Gerrit a minute ago :]" [puppet] - https://gerrit.wikimedia.org/r/640215 (https://phabricator.wikimedia.org/T184086) (owner: Hashar)
[09:04:13] (CR) Volans: [C: +2] dependencies: update min version to match Buster [software/spicerack] - https://gerrit.wikimedia.org/r/639010 (owner: Volans)
[09:07:06] (Merged) jenkins-bot: dependencies: update min version to match Buster [software/spicerack] - https://gerrit.wikimedia.org/r/639010 (owner: Volans)
[09:07:29] (CR) JMeybohm: [C: -1] "> Patch Set 7:" (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/634924 (https://phabricator.wikimedia.org/T265324) (owner: Giuseppe Lavagetto)
[09:07:40] (PS2) Volans: tests: remove require_* decorators [software/spicerack] - https://gerrit.wikimedia.org/r/639011
[09:09:11] (PS1) Jcrespo: admin: fix bug on jynus' .bashrc - sockets are not regular files [puppet] - https://gerrit.wikimedia.org/r/640347
[09:09:41] RECOVERY - Check systemd state on otrs1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:10:11] RECOVERY - clamd running on otrs1001 is OK: PROCS OK: 1 process with UID = 112 (clamav), command name clamd https://wikitech.wikimedia.org/wiki/OTRS%23ClamAV
[09:10:11] (CR) Jcrespo: [C: +2] admin: fix bug on jynus' .bashrc - sockets are not regular files [puppet] - https://gerrit.wikimedia.org/r/640347 (owner: Jcrespo)
[09:17:08] (CR) Volans: [C: +2] tests: remove require_* decorators [software/spicerack] - https://gerrit.wikimedia.org/r/639011 (owner: Volans)
[09:19:46] (Merged) jenkins-bot: tests: remove require_* decorators [software/spicerack] - https://gerrit.wikimedia.org/r/639011 (owner: Volans)
[09:25:46] (CR) Muehlenhoff: [C: +1] "Looks good, some nits inline" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/640217 (owner: Jbond)
[09:37:30] (PS1) David Caro: [admin] Add dcaro home dir with base vimrc [puppet] - https://gerrit.wikimedia.org/r/640351 (https://phabricator.wikimedia.org/T266068)
[09:38:04] (CR) jerkins-bot: [V: -1] [admin] Add dcaro home dir with base vimrc [puppet] - https://gerrit.wikimedia.org/r/640351 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[09:39:03] (PS2) David Caro: [admin] Add dcaro home dir with base vimrc [puppet] - https://gerrit.wikimedia.org/r/640351 (https://phabricator.wikimedia.org/T266068)
[09:39:54] (PS3) David Caro: admin: Add dcaro home dir with base vimrc [puppet] - https://gerrit.wikimedia.org/r/640351 (https://phabricator.wikimedia.org/T266068)
[09:41:49] (CR) Arturo Borrero Gonzalez: [C: +1] admin: Add dcaro home dir with base vimrc [puppet] - https://gerrit.wikimedia.org/r/640351 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[09:41:54] (CR) David Caro: [C: +2] admin: Add dcaro home dir with base vimrc [puppet] - https://gerrit.wikimedia.org/r/640351 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[09:42:32] (CR) Muehlenhoff: [C: +1] "Nice!" [puppet] - https://gerrit.wikimedia.org/r/640158 (https://phabricator.wikimedia.org/T262512) (owner: Filippo Giunchedi)
[10:02:44] (PS1) David Caro: nagios: Add dcaro to the contacts [puppet] - https://gerrit.wikimedia.org/r/640354 (https://phabricator.wikimedia.org/T266068)
[10:04:17] (PS1) Arturo Borrero Gonzalez: kubeadm: wmcs-k8s-node-upgrade: bump version numbers [puppet] - https://gerrit.wikimedia.org/r/640356 (https://phabricator.wikimedia.org/T263284)
[10:04:26] (CR) Arturo Borrero Gonzalez: [C: +1] nagios: Add dcaro to the contacts [puppet] - https://gerrit.wikimedia.org/r/640354 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[10:05:11] (CR) Arturo Borrero Gonzalez: [C: +2] kubeadm: wmcs-k8s-node-upgrade: bump version numbers [puppet] - https://gerrit.wikimedia.org/r/640356 (https://phabricator.wikimedia.org/T263284) (owner: Arturo Borrero Gonzalez)
[10:05:25] (CR) David Caro: [C: +2] nagios: Add dcaro to the contacts [puppet] - https://gerrit.wikimedia.org/r/640354 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[10:10:07] (PS1) Elukey: role::analytics_cluster::coordinator: reword docs and remove unused bits [puppet] - https://gerrit.wikimedia.org/r/640357
[10:11:01] (CR) Jbond: [C: +2] debian::codename::requre: allow passing a custom message to require [puppet] - https://gerrit.wikimedia.org/r/640217 (owner: Jbond)
[10:12:41] (PS3) Jbond: debian::codename::require: update code to use new debian:: functions [puppet] - https://gerrit.wikimedia.org/r/640221 (https://phabricator.wikimedia.org/T266479)
[10:13:08] (CR) Jbond: debian::codename::require: update code to use new debian:: functions (2 comments) [puppet] - https://gerrit.wikimedia.org/r/640221 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[10:14:05] (CR) Elukey: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26397" [puppet] - https://gerrit.wikimedia.org/r/640357 (owner: Elukey)
[10:14:50] (CR) Elukey: [C: +2] role::analytics_cluster::coordinator: reword docs and remove unused bits [puppet] - https://gerrit.wikimedia.org/r/640357 (owner: Elukey)
[10:14:52] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26398" [puppet] - https://gerrit.wikimedia.org/r/640221 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[10:15:03] (Abandoned) Muehlenhoff: profile::analytics::database::meta: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/599696 (owner: Muehlenhoff)
[10:15:36] (Abandoned) Muehlenhoff: Switch Thumbor hardening from Firejail to native systemd features (WIP) [puppet] - https://gerrit.wikimedia.org/r/482309 (https://phabricator.wikimedia.org/T212941) (owner: Muehlenhoff)
[10:16:03] (Abandoned) Muehlenhoff: Add debug repository for stretch onwards [puppet] - https://gerrit.wikimedia.org/r/356822 (https://phabricator.wikimedia.org/T164819) (owner: Muehlenhoff)
[10:19:33] (CR) Vgutierrez: [C: +1] configure digicert-2020 certificates [puppet] - https://gerrit.wikimedia.org/r/640213 (https://phabricator.wikimedia.org/T261419) (owner: BBlack)
[10:22:28] (PS1) David Caro: nagios.cgi: added dcaro to the allowed commands [puppet] - https://gerrit.wikimedia.org/r/640359 (https://phabricator.wikimedia.org/T266068)
[10:27:52] (CR) Muehlenhoff: [C: -1] "There's two more you need: authorized_for_system_information and authorized_for_configuration_information" [puppet] - https://gerrit.wikimedia.org/r/640359 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[10:30:11] (PS1) Elukey: role::analytics_cluster::coordinator: remove unused hiera settings [puppet] - https://gerrit.wikimedia.org/r/640362
[10:33:39] (CR) Elukey: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26400" [puppet] - https://gerrit.wikimedia.org/r/640362 (owner: Elukey)
[10:34:01] (CR) Elukey: [V: +1 C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/26400/" [puppet] - https://gerrit.wikimedia.org/r/640362 (owner: Elukey)
[10:38:14] (PS2) David Caro: nagios.cgi: added dcaro to the allowed commands [puppet] - https://gerrit.wikimedia.org/r/640359 (https://phabricator.wikimedia.org/T266068)
[10:38:16] (CR) David Caro: "> Patch Set 1: Code-Review-1" [puppet] - https://gerrit.wikimedia.org/r/640359 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[10:38:35] (PS1) Elukey: Introduce role::analytics_cluster::coordinator::replica [puppet] - https://gerrit.wikimedia.org/r/640364 (https://phabricator.wikimedia.org/T257412)
[10:39:07] (CR) David Caro: [C: -1] nagios.cgi: added dcaro to the allowed commands (1 comment) [puppet] - https://gerrit.wikimedia.org/r/640359 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[10:39:44] (PS3) David Caro: nagios.cgi: added dcaro to the allowed commands [puppet] - https://gerrit.wikimedia.org/r/640359 (https://phabricator.wikimedia.org/T266068)
[10:41:03] (CR) Effie Mouzeli: [C: +2] mcrouter_wancache: tune onhost memcached [puppet] - https://gerrit.wikimedia.org/r/639775 (https://phabricator.wikimedia.org/T244340) (owner: Effie Mouzeli)
[10:41:30] (CR) Muehlenhoff: [C: +1] "Looks good to me! (And let's also fix up Nicholas's config when he's back for consistency, I think this was just an oversight)" [puppet] - https://gerrit.wikimedia.org/r/640359 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[10:41:43] (PS1) Elukey: Add analytics-hive.eqiad.wmnet CNAME to an-coord1002 [dns] - https://gerrit.wikimedia.org/r/640365 (https://phabricator.wikimedia.org/T257412)
[10:42:10] (CR) Filippo Giunchedi: [C: +2] profile: redirect to grafana-rw with referer [puppet] - https://gerrit.wikimedia.org/r/640158 (https://phabricator.wikimedia.org/T262512) (owner: Filippo Giunchedi)
[10:44:01] (CR) Jbond: [V: +1 C: +2] debian::codename::require: update code to use new debian:: functions [puppet] - https://gerrit.wikimedia.org/r/640221 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[10:44:32] (CR) Elukey: [C: +2] Add analytics-hive.eqiad.wmnet CNAME to an-coord1002 [dns] - https://gerrit.wikimedia.org/r/640365 (https://phabricator.wikimedia.org/T257412) (owner: Elukey)
[10:45:48] (PS8) Giuseppe Lavagetto: Add apache httpd base image [docker-images/production-images] - https://gerrit.wikimedia.org/r/634924 (https://phabricator.wikimedia.org/T265324)
[10:45:50] (PS6) Giuseppe Lavagetto: Add an httpd-fcgi image [docker-images/production-images] - https://gerrit.wikimedia.org/r/636634 (https://phabricator.wikimedia.org/T265324)
[10:45:52] (PS2) Giuseppe Lavagetto: Add base php cli image [docker-images/production-images] - https://gerrit.wikimedia.org/r/638095 (https://phabricator.wikimedia.org/T265324)
[10:45:54] (PS1) Giuseppe Lavagetto: Add a php-fpm image for php 7.2 [docker-images/production-images] - https://gerrit.wikimedia.org/r/640386 (https://phabricator.wikimedia.org/T265324)
[10:47:41] (PS1) Jbond: P:tendril::webserver: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640387 (https://phabricator.wikimedia.org/T266479)
[10:48:37] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26401" [puppet] - https://gerrit.wikimedia.org/r/640387 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[10:52:34] (CR) David Caro: [C: +2] nagios.cgi: added dcaro to the allowed commands (1 comment) [puppet] - https://gerrit.wikimedia.org/r/640359 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[10:54:04] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (ArielGlenn) I will updating to icu63 in deployment-prep, with Moritz looking on. This will likely happen later today, and I'll post updates about the prog...
[10:57:00] (PS2) Elukey: Introduce role::analytics_cluster::coordinator::replica [puppet] - https://gerrit.wikimedia.org/r/640364 (https://phabricator.wikimedia.org/T257412)
[10:57:05] (PS1) David Caro: nagios.cgi: add nskaggs to info auth groups [puppet] - https://gerrit.wikimedia.org/r/640388 (https://phabricator.wikimedia.org/T266068)
[10:57:27] (CR) David Caro: "Added Nicholas account here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/640388" [puppet] - https://gerrit.wikimedia.org/r/640359 (https://phabricator.wikimedia.org/T266068) (owner: David Caro)
[10:58:30] (CR) Jbond: [V: +1 C: +2] P:tendril::webserver: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640387 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[10:59:06] dcaro: you ok for me to merge your cgi change
[10:59:30] jbond42: sure, thanks :)
[10:59:49] merged
[11:01:58] Operations, observability: smart-data-dump should fail loudly when it can't gather metrics - https://phabricator.wikimedia.org/T267135 (fgiunchedi) I think it is fair to say that if no disks are detected then that's always an error condition (?) In that case I think a simple(r) solution would be to exit...
[11:03:27] (PS2) Jbond: P:tendril::webserver: migrate to ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639824 (https://phabricator.wikimedia.org/T266479)
[11:09:23] (PS1) Jbond: P:base: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640389 (https://phabricator.wikimedia.org/T266479)
[11:13:02] (PS1) Jbond: P:toolforge: migrate to debian::codename [puppet] - https://gerrit.wikimedia.org/r/640390 (https://phabricator.wikimedia.org/T267396)
[11:14:29] (CR) Jbond: [C: +2] P:toolforge: migrate to debian::codename [puppet] - https://gerrit.wikimedia.org/r/640390 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[11:17:44] (PS3) Jbond: P:toolforge: migrate to ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639826 (https://phabricator.wikimedia.org/T266479)
[11:17:47] (CR) Elukey: [C: +2] Introduce role::analytics_cluster::coordinator::replica [puppet] - https://gerrit.wikimedia.org/r/640364 (https://phabricator.wikimedia.org/T257412) (owner: Elukey)
[11:18:51] (CR) Jbond: "ready for review" [puppet] - https://gerrit.wikimedia.org/r/639826 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[11:19:12] Operations, MediaWiki-Documentation, Patch-For-Review, User-Dereckson, patch-welcome: Repair "svn.wikimedia.org/doc/" redirect for doc.wikimedia.org - https://phabricator.wikimedia.org/T109950 (Aklapper) Ping - how to get a review from SRE on this patch? (@Vgutierrez was mentioned in https://...
[11:19:54] (CR) Jbond: [C: +2] P:base: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640389 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[11:24:24] Operations, observability: smart-data-dump should fail loudly when it can't gather metrics - https://phabricator.wikimedia.org/T267135 (fgiunchedi) >>! In T267135#6615766, @fgiunchedi wrote: > The related problem to this one is also that the hp raid controller name changed when we changed the facter invo...
[11:32:30] (PS1) Jbond: mariadb: migrate to debian::codename [puppet] - https://gerrit.wikimedia.org/r/640394 (https://phabricator.wikimedia.org/T267396)
[11:32:38] jouncebot: refresh please
[11:32:39] I refreshed my knowledge about deployments.
[11:32:42] thx
[11:32:59] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26404" [puppet] - https://gerrit.wikimedia.org/r/640394 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[11:33:04] (CR) Jbond: [C: +2] nginx: drop condition for ge jessei as it matches all nodes [puppet] - https://gerrit.wikimedia.org/r/639790 (owner: Jbond)
[11:39:09] (CR) Muehlenhoff: "We still have jessie nginx systems on the conf2 hosts." [puppet] - https://gerrit.wikimedia.org/r/639790 (owner: Jbond)
[11:39:26] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26405" [puppet] - https://gerrit.wikimedia.org/r/640394 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[11:41:39] (CR) Jbond: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/639790 (owner: Jbond)
[11:42:30] (CR) Muehlenhoff: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/639790 (owner: Jbond)
[11:44:26] (CR) Jbond: [V: +1 C: +2] mariadb: migrate to debian::codename [puppet] - https://gerrit.wikimedia.org/r/640394 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[11:49:07] (PS3) Volans: Refactoring: rename internal modules [software/spicerack] - https://gerrit.wikimedia.org/r/634056 (https://phabricator.wikimedia.org/T221212)
[11:49:09] (PS12) Volans: cookbook API: add class API [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212)
[11:49:32] (CR) Volans: "Thanks a lot for the reviews, replies inline, code updated." (11 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[11:52:18] (CR) jerkins-bot: [V: -1] cookbook API: add class API [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[11:52:49] what's up jenkins... it works on my computer :D
[11:53:19] (CR) Volans: "recheck" [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[12:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for European mid-day backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201110T1200).
[12:00:04] Lucas_WMDE and effie: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:11] o/
[12:00:38] I’ll start with my backport, which was merged at the wrong time and ended up not being deployed
[12:00:44] should only need a git rebase on the deployment server (+ sync)
[12:02:16] testing on mwdebug1001
[12:02:58] (CR) Alexandros Kosiaris: [C: +1] Add an httpd-fcgi image [docker-images/production-images] - https://gerrit.wikimedia.org/r/636634 (https://phabricator.wikimedia.org/T265324) (owner: Giuseppe Lavagetto)
[12:03:11] seems to be working, syncing
[12:04:04] (PS3) Jbond: mariadb: migrate to ensure_packages and minor refactor [puppet] - https://gerrit.wikimedia.org/r/639785 (https://phabricator.wikimedia.org/T266479)
[12:04:39] (CR) Alexandros Kosiaris: [C: +1] Add apache httpd base image [docker-images/production-images] - https://gerrit.wikimedia.org/r/634924 (https://phabricator.wikimedia.org/T265324) (owner: Giuseppe Lavagetto)
[12:04:48] (CR) Jbond: "ready for review" [puppet] - https://gerrit.wikimedia.org/r/639785 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[12:04:49] !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.36.0-wmf.16/extensions/Wikibase: Backport: [[gerrit:639035|Revert JS parser commits (T266671)]] (duration: 01m 04s)
[12:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:04:56] T266671: Revert commit 7f430f142d from `Malformed input error on text which is not malformed` - https://phabricator.wikimedia.org/T266671
[12:05:22] effie: still testing those config changes?
[12:05:43] Lucas_WMDE: yes, can I have some more time to do a final check please?
[12:05:47] ok sure!
[12:05:50] thank you
[12:06:03] (CR) Jbond: [C: +1] "LGTM" [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/640067 (owner: Giuseppe Lavagetto)
[12:19:40] Lucas_WMDE: ok please go ahead
[12:19:46] ok!
[12:19:52] thank you for waiting
[12:20:31] (PS2) Lucas Werkmeister (WMDE): Add "mcrouter-with-onhost-tier" entry to $wgObjectCaches [mediawiki-config] - https://gerrit.wikimedia.org/r/636094 (https://phabricator.wikimedia.org/T264604) (owner: Aaron Schulz)
[12:20:37] (CR) Lucas Werkmeister (WMDE): [C: +2] Add "mcrouter-with-onhost-tier" entry to $wgObjectCaches [mediawiki-config] - https://gerrit.wikimedia.org/r/636094 (https://phabricator.wikimedia.org/T264604) (owner: Aaron Schulz)
[12:20:49] can the changes be tested, either individually or together?
[12:21:02] the first one is basically a noop
[12:21:12] the second one is the one that will impact mwdebug1001
[12:21:30] (Merged) jenkins-bot: Add "mcrouter-with-onhost-tier" entry to $wgObjectCaches [mediawiki-config] - https://gerrit.wikimedia.org/r/636094 (https://phabricator.wikimedia.org/T264604) (owner: Aaron Schulz)
[12:21:33] only mwdebug1001?
[12:21:35] yes
[12:21:42] we will manage the rollout via puppet
[12:22:05] we are experimenting on using a memcached instance running on each app server
[12:22:12] ok
[12:23:03] syncing the first one now
[12:23:12] (PS2) Lucas Werkmeister (WMDE): Switch parser cache to using "mcrouter-with-onhost-tier" [mediawiki-config] - https://gerrit.wikimedia.org/r/636095 (https://phabricator.wikimedia.org/T264604) (owner: Aaron Schulz)
[12:23:52] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/mc.php: Config: [[gerrit:636094|Add "mcrouter-with-onhost-tier" entry to $wgObjectCaches (T264604)]] (duration: 00m 57s)
[12:23:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:23:59] T264604: MediaWiki to route specific keys to /*/mw-with-onhost-tier/ - https://phabricator.wikimedia.org/T264604
[12:23:59] (CR) Lucas Werkmeister (WMDE): [C: +2] Switch parser cache to using "mcrouter-with-onhost-tier" [mediawiki-config] - https://gerrit.wikimedia.org/r/636095 (https://phabricator.wikimedia.org/T264604) (owner: Aaron Schulz)
[12:24:21] oh, right, I remember hearing about that task
[12:24:25] sounds good :)
[12:24:44] (Merged) jenkins-bot: Switch parser cache to using "mcrouter-with-onhost-tier" [mediawiki-config] - https://gerrit.wikimedia.org/r/636095 (https://phabricator.wikimedia.org/T264604) (owner: Aaron Schulz)
[12:25:33] effie: change is on mwdebug1001, do you want to test it before I sync it?
[12:25:57] I can see it working:)
[12:26:05] can we have a go at mwdebug1002 ?
[12:26:05] ok :)
[12:26:11] uh, sure
[12:26:15] awesome
[12:26:37] pulled there as well
[12:27:49] 1'
[12:28:05] 0K
[12:29:33] (CR) JMeybohm: [C: -1] Add apache httpd base image (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/634924 (https://phabricator.wikimedia.org/T265324) (owner: Giuseppe Lavagetto)
[12:29:35] looks good to me
[12:29:40] alright, syncing
[12:29:43] go for it, thank you
[12:31:10] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:636095|Switch parser cache to using "mcrouter-with-onhost-tier" (T264604)]] (duration: 00m 57s)
[12:31:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:17] T264604: MediaWiki to route specific keys to /*/mw-with-onhost-tier/ - https://phabricator.wikimedia.org/T264604
[12:33:07] logspam-watch looks fine so far
[12:33:14] !log installing wireshark security updates
[12:33:17] !log EU backport&config window done
[12:33:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:47] (CR) Jbond: [C: +1] "LGTM just some minor nits not tested locally as have slightly different set up" (5 comments) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/639913 (owner: Giuseppe Lavagetto)
[12:40:58] (CR) Jbond: [C: +1] "LGTM" [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/639912 (owner: Giuseppe Lavagetto)
[12:41:45] (PS1) Effie Mouzeli: hieradata: enable onhost memcached on mw1276 [puppet] - https://gerrit.wikimedia.org/r/640401 (https://phabricator.wikimedia.org/T244340)
[12:41:47] (CR) Jbond: "bump" [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[12:44:48] (CR) Effie Mouzeli: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26406" [puppet] - https://gerrit.wikimedia.org/r/640401 (https://phabricator.wikimedia.org/T244340) (owner: Effie Mouzeli)
[12:45:36] (CR) Muehlenhoff: use dnsmasq: add configuration to use dnsmasq with WMF config (1 comment) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[12:46:36] !log depool mw1276 to install onhost memcached - T244340
[12:46:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:46:42] T244340: Reduce read pressure on mc* servers by adding a machine-local Memcached instance (on-host memcached) - https://phabricator.wikimedia.org/T244340
[12:49:06] (PS10) Jbond: use dnsmasq: add configuration to use dnsmasq with WMF config [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787
[12:49:37] (CR) Effie Mouzeli: [V: +1 C: +2] hieradata: enable onhost memcached on mw1276 [puppet] - https://gerrit.wikimedia.org/r/640401 (https://phabricator.wikimedia.org/T244340) (owner: Effie Mouzeli)
[12:49:49] (CR) Jbond: "thanks updated" (1 comment) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[12:49:53] (PS2) Effie Mouzeli: hieradata: enable onhost memcached on mw1276 [puppet] - https://gerrit.wikimedia.org/r/640401 (https://phabricator.wikimedia.org/T244340)
[12:51:49] (CR) Muehlenhoff: use dnsmasq: add configuration to use dnsmasq with WMF config (1 comment) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[12:52:41] (PS1) Jbond: P:base migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640406 (https://phabricator.wikimedia.org/T266479)
[12:53:03] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 106024288 and 7 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:57:05] (CR) Jbond: "questions for cloud do we still need to handle wmflabs?" [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[12:57:31] (CR) Jbond: use dnsmasq: add configuration to use dnsmasq with WMF config (1 comment) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[12:58:03] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 99656 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[13:01:03] (PS1) Jgreen: switch frbast.wm.o to frbast-codfw while we upgrade the eqiad bastion [dns] - https://gerrit.wikimedia.org/r/640407
[13:02:32] (CR) Jgreen: [C: +2] switch frbast.wm.o to frbast-codfw while we upgrade the eqiad bastion [dns] - https://gerrit.wikimedia.org/r/640407 (owner: Jgreen)
[13:02:56] (CR) Jbond: [C: +2] P:base migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/640406 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[13:06:24] (CR) Muehlenhoff: "Could you also add an entry to README, which points to the separate binary package along with a short description? The way wmf-sre-laptop " (1 comment) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[13:15:04] (PS1) Jgreen: flip frbast.wm.o back to eqiad bastion [dns] - https://gerrit.wikimedia.org/r/640410
[13:15:13] (PS1) Jbond: P:mediawiki::common: migrate to debian::codename [puppet] - https://gerrit.wikimedia.org/r/640411 (https://phabricator.wikimedia.org/T267396)
[13:16:49] (CR) Jgreen: [C: +2] flip frbast.wm.o back to eqiad bastion [dns] - https://gerrit.wikimedia.org/r/640410 (owner: Jgreen)
[13:17:23] !log Restart db1117* to pick up report_host - T266483
[13:17:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:30] T266483: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483
[13:17:51] (CR) Hashar: [C: +1] "Logged in as an administrator, I can reach https://gerrit.wikimedia.org/r/plugins/metrics-reporter-prometheus/metrics" [puppet] - https://gerrit.wikimedia.org/r/640215 (https://phabricator.wikimedia.org/T184086) (owner: Hashar)
[13:22:03] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26407" [puppet] - https://gerrit.wikimedia.org/r/640411 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[13:22:30] (CR) Jbond: [V: +1 C: +2] P:mediawiki::common: migrate to debian::codename [puppet] - https://gerrit.wikimedia.org/r/640411 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[13:23:27] (CR) Muehlenhoff: [C: +1] "We could also just swap the Depends: to elpa-magit, but I'm merging this as-is :-)" [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/640067 (owner: Giuseppe Lavagetto)
[13:23:34] (PS2) Muehlenhoff: Move git-review to Recommends [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/640067 (owner: Giuseppe Lavagetto)
[13:23:51] (CR) Muehlenhoff: [V: +2 C: +2] Move git-review to Recommends [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/640067 (owner: Giuseppe Lavagetto)
[13:28:44] (PS11) Jbond: use dnsmasq: add configuration to use dnsmasq with WMF config [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787
[13:28:59] !log Restart db2093 to pick up report_host - T266483
[13:29:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:06] T266483: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483
[13:32:18] (CR) Jbond: "> Patch Set 10:" (1 comment) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[13:40:00] (CR) Muehlenhoff: "Looks good, two more comments inline" (2 comments) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[13:51:03] !log imported php-memcached 3.0.1+2.2.0-1~wmf3+buster1 to component/php72 for buster-wikimedia
[13:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:34] (PS1) Jbond: ssl_ciphersuite: update to use stdlibs os_version_gte [puppet] - https://gerrit.wikimedia.org/r/640417
[13:53:36] (PS1) Jbond: wmflib::os_version: drop the os_version and requires_os functions [puppet] - https://gerrit.wikimedia.org/r/640418 (https://phabricator.wikimedia.org/T267396)
[13:54:49] (PS1) ArielGlenn: option for text pass fixup script to write files to specified directory [dumps] - https://gerrit.wikimedia.org/r/640420
[13:55:58] (CR) jerkins-bot: [V: -1] ssl_ciphersuite: update to use stdlibs os_version_gte [puppet] - https://gerrit.wikimedia.org/r/640417 (owner: Jbond)
[13:56:16] (CR) jerkins-bot: [V: -1] wmflib::os_version: drop the os_version and requires_os functions [puppet] - https://gerrit.wikimedia.org/r/640418 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[14:03:02] (CR) Jbond: use dnsmasq: add configuration to use dnsmasq with WMF config (2 comments) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[14:03:04] (PS12) Jbond: use dnsmasq: add configuration to use dnsmasq with WMF config [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787
[14:06:10] (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/640424
[14:09:17] (CR) CDanis: [C: +2] Add httpbb tests for apple-app-site-association magic URL [puppet] - https://gerrit.wikimedia.org/r/640257 (https://phabricator.wikimedia.org/T259312) (owner: CDanis)
[14:11:56] (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/640427
[14:11:58] (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/640428
[14:12:23] (PS1) David Caro: apt.unattendedupgrades: use apt::conf [puppet] - https://gerrit.wikimedia.org/r/640429
[14:14:15] (CR) Muehlenhoff: use dnsmasq: add configuration to use dnsmasq with WMF config (1 comment) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[14:17:34] (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/640430
[14:18:55] (PS2) Ladsgroup: ores: Stop memory reporting [puppet] - https://gerrit.wikimedia.org/r/637557
[14:19:01] (PS3) Ladsgroup: ores: Stop memory reporting [puppet] - https://gerrit.wikimedia.org/r/637557
[14:19:32] (CR) Ladsgroup: ores: Stop memory reporting (1 comment) [puppet] - https://gerrit.wikimedia.org/r/637557 (owner: Ladsgroup)
[14:20:43] (CR) Alexandros Kosiaris: [C: +2] "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/637557 (owner: Ladsgroup)
[14:21:21] !log pooling mw1276 - T244340
[14:21:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:29] T244340: Reduce read pressure on mc* servers by adding a machine-local Memcached instance (on-host memcached) - https://phabricator.wikimedia.org/T244340
[14:23:06] (PS2) Jbond: ssl_ciphersuite: update to use the legacy fact operatingsystemmajrelease [puppet] - https://gerrit.wikimedia.org/r/640417
[14:24:21] (PS4) Reedy: Stop installing timidity and freepats on appservers [puppet] - https://gerrit.wikimedia.org/r/445604
[14:25:14] (CR) Reedy: "As we're getting closer to doing the buster upgrades (T245757; post ICU upgrade in T264991 obviously)... Would be good to get this merged " [puppet] - https://gerrit.wikimedia.org/r/445604 (owner: Reedy)
[14:29:36] (CR) Andrew Bogott: "> Patch Set 10:" [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[14:30:27] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Reedy) >>! In T264991#6615421, @Marostegui wrote: > Thank you! What's the expected impact of `updateCollation.php`? Many many categorylinks rows being up...
[14:31:53] Operations, Release-Engineering-Team-TODO, serviceops, Patch-For-Review, Release-Engineering-Team (Deployment services): Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (MoritzMuehlenhoff)
[14:31:54] (CR) Muehlenhoff: "Added a note to https://phabricator.wikimedia.org/T245757" [puppet] - https://gerrit.wikimedia.org/r/445604 (owner: Reedy)
[14:32:27] (CR) Reedy: "(Y)" [puppet] - https://gerrit.wikimedia.org/r/445604 (owner: Reedy)
[14:34:36] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (jijiki) >>! In T264991#6615421, @Marostegui wrote: > Thank you! What's the expected impact of `updateCollation.php`? We will run on one wiki and see how...
[14:36:03] (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/640434
[14:36:40] (PS13) Jbond: use dnsmasq: add configuration to use dnsmasq with WMF config [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787
[14:36:59] (CR) Jbond: use dnsmasq: add configuration to use dnsmasq with WMF config (1 comment) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[14:40:49] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Marostegui) Thank you @Reedy!
[14:45:44] (CR) Volans: "Upstream is at 2.9.9 now, what do we want to do? Anyway we should wait to complete the migration of the DNS to Netbox." [software/netbox-deploy] - https://gerrit.wikimedia.org/r/636464 (https://phabricator.wikimedia.org/T266488) (owner: CRusnov)
[14:51:22] (CR) Jbond: "i would have the new flag just block the success message as the default behaviour is to not post anything elses and it allows the most fle" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/640265 (owner: Effie Mouzeli)
[14:51:47] (CR) Muehlenhoff: "Two final nits inline :-)" (2 comments) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[14:55:51] (PS14) Jbond: use dnsmasq: add configuration to use dnsmasq with WMF config [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787
[14:56:29] (CR) Jbond: "updated thanks" (2 comments) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[14:57:17] Operations, observability, User-fgiunchedi: Wrong redirect when logging into grafana-rw from a grafana.w.o dashboard - https://phabricator.wikimedia.org/T267645 (fgiunchedi)
[14:58:48] (PS1) Filippo Giunchedi: Revert "profile: redirect to grafana-rw with referer" [puppet] - https://gerrit.wikimedia.org/r/640436 (https://phabricator.wikimedia.org/T267645)
[14:59:01] (CR) Muehlenhoff: [C: +1] "LGTM" [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[14:59:26] (PS2) Filippo Giunchedi: Revert "profile: redirect to grafana-rw with referer" [puppet] - https://gerrit.wikimedia.org/r/640436 (https://phabricator.wikimedia.org/T267645)
[15:01:40] (PS3) Hashar: prometheus: collect Gerrit internal metrics [puppet] - https://gerrit.wikimedia.org/r/640215 (https://phabricator.wikimedia.org/T184086)
[15:02:03] (CR) Jbond: [V: +2 C: +2] use dnsmasq: add configuration to use dnsmasq with WMF config [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/614787 (owner: Jbond)
[15:02:47] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/640436 (https://phabricator.wikimedia.org/T267645) (owner: Filippo Giunchedi)
[15:02:54] (CR) Filippo Giunchedi: [C: +2] Revert "profile: redirect to grafana-rw with referer" [puppet] - https://gerrit.wikimedia.org/r/640436 (https://phabricator.wikimedia.org/T267645) (owner: Filippo Giunchedi)
[15:03:06] (CR) Hashar: "Should be good now. We will have to restart Gerrit to take in account the confguration changes." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/640215 (https://phabricator.wikimedia.org/T184086) (owner: Hashar)
[15:06:39] (CR) Jbond: "> Patch Set 6:" [puppet] - https://gerrit.wikimedia.org/r/640210 (owner: Jbond)
[15:07:56] (CR) Bstorm: "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/639815 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[15:10:26] (CR) Alexandros Kosiaris: [C: +1] "That's pretty awesome. nice!" [debs/calico] - https://gerrit.wikimedia.org/r/640094 (https://phabricator.wikimedia.org/T266893) (owner: JMeybohm)
[15:13:24] (CR) RLazarus: "Nice! One last round of feedback and then I'm off on vacation today -- can't look at this again before Monday, but obviously feel free to " (4 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[15:18:24] (CR) Alexandros Kosiaris: "An inline question to help understand before commenting on the rest of the patch" (1 comment) [debs/calico] - https://gerrit.wikimedia.org/r/640095 (https://phabricator.wikimedia.org/T266893) (owner: JMeybohm)
[15:23:22] (CR) JMeybohm: Build a calico-images package (1 comment) [debs/calico] - https://gerrit.wikimedia.org/r/640095 (https://phabricator.wikimedia.org/T266893) (owner: JMeybohm)
[15:24:07] RECOVERY - Long running screen/tmux on mw2221 is OK: OK: No SCREEN or tmux processes detected. https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens
[15:27:34] (PS1) Effie Mouzeli: hieradata: enable onhost memcached on mw1263 [puppet] - https://gerrit.wikimedia.org/r/640440 (https://phabricator.wikimedia.org/T244340)
[15:28:03] !log depool mw1263 - T244340
[15:28:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:10] T244340: Reduce read pressure on mc* servers by adding a machine-local Memcached instance (on-host memcached) - https://phabricator.wikimedia.org/T244340
[15:28:35] Operations, ops-codfw, DBA, DC-Ops: (Need By: 2020-11-29) rack/setup/install db214[234] - https://phabricator.wikimedia.org/T267041 (Papaul)
[15:28:48] (CR) Effie Mouzeli: [C: +2] hieradata: enable onhost memcached on mw1263 [puppet] - https://gerrit.wikimedia.org/r/640440 (https://phabricator.wikimedia.org/T244340) (owner: Effie Mouzeli)
[15:29:32] (CR) Jbond: cookbook API: add class API (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[15:29:49] Operations, ops-eqiad, DC-Ops: eqiad: Server moves to free up space on 10g racks - https://phabricator.wikimedia.org/T267065 (Jclark-ctr) @elukey Please Review the racks i have recommended let me know if anything needs to change @Cmjohnson Will wait for after Luca gives the ok to configure ports...
[15:36:31] (PS3) Effie Mouzeli: pcc: add '-N' flag to avoid posting PCC result to jenkins [puppet] - https://gerrit.wikimedia.org/r/640265
[15:36:36] (CR) Effie Mouzeli: pcc: add '-N' flag to avoid posting PCC result to jenkins (3 comments) [puppet] - https://gerrit.wikimedia.org/r/640265 (owner: Effie Mouzeli)
[15:38:21] !log installing 4.19.152 kernel packages on buster hosts (only installing the package, reboots will happen separately)
[15:38:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:39:55] robh: I am pretty sure I am on clinc=uc
[15:39:58] clinic*
[15:40:33] PROBLEM - Long running screen/tmux on mw2302 is CRITICAL: CRIT: Long running tmux process. (user: cdanis PID: 208953, 1735282s 1728000s). https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens
[15:40:36] (CR) Jbond: [C: +1] "LGTM thanks" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/640265 (owner: Effie Mouzeli)
[15:40:54] sigh, time for another episode of 'what was I debugging two weeks ago?'
[15:41:19] ah, opcache
[15:41:34] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (ArielGlenn) Upgrade plan on deployment-prep: [] add profile::mediawiki::php::icu63: true to hiera for deployment-prep project prefix; this will only have...
[15:42:33] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (MoritzMuehlenhoff) Sounds good!
[15:42:35] RECOVERY - Long running screen/tmux on mw2302 is OK: OK: No SCREEN or tmux processes detected. https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens
[15:52:38] !log zpapierski@deploy1001 Started deploy [wikimedia/discovery/analytics@1ab89ed]: Deploying venv workaround for Debian 9
[15:52:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:23] (CR) Jbond: [C: +2] ssl_ciphersuite: update to use the legacy fact operatingsystemmajrelease [puppet] - https://gerrit.wikimedia.org/r/640417 (owner: Jbond)
[15:53:45] !log zpapierski@deploy1001 Finished deploy [wikimedia/discovery/analytics@1ab89ed]: Deploying venv workaround for Debian 9 (duration: 01m 06s)
[15:53:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:44] (PS2) Jbond: wmflib::os_version: drop the os_version and requires_os functions [puppet] - https://gerrit.wikimedia.org/r/640418 (https://phabricator.wikimedia.org/T267396)
[15:58:54] Operations, observability, User-fgiunchedi: Wrong redirect when logging into grafana-rw from a grafana.w.o dashboard - https://phabricator.wikimedia.org/T267645 (jijiki) p:Triage→Medium
[15:59:18] Operations, serviceops: upgrade mwmaint1002 to buster - https://phabricator.wikimedia.org/T267607 (jijiki) p:Triage→Medium
[15:59:34] Operations, SRE-Access-Requests: Requesting access to deployment for jgiannelos - https://phabricator.wikimedia.org/T267585 (jijiki) p:Triage→Medium
[15:59:54] Operations, Platform Engineering, serviceops, Performance-Team (Radar): Phasing out "redis_sessions" MediaWiki cluster - https://phabricator.wikimedia.org/T267581 (jijiki) p:Triage→Medium
[16:01:32] (CR) Volans: "Replies inline, thanks a lot for all the feedbacks!" (3 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[16:01:39] Operations, ops-eqiad, DC-Ops: eqiad: Server moves to free up space on 10g racks - https://phabricator.wikimedia.org/T267065 (elukey) @Jclark-ctr I checked and I have only a couple of comments: 1) B7 druid1003 B6 U26 Port25 - this is druid1005 right? 2) could we place druid1001 in a rack that it is...
[16:03:44] (PS1) Ppchelko: JobQueue: Move LocalGlobalUserPageCacheUpdateJob to it's own queue. [deployment-charts] - https://gerrit.wikimedia.org/r/640446 (https://phabricator.wikimedia.org/T267520)
[16:04:16] (CR) Ppchelko: "Can't deploy until Monday, weekends." [deployment-charts] - https://gerrit.wikimedia.org/r/640446 (https://phabricator.wikimedia.org/T267520) (owner: Ppchelko)
[16:05:14] Operations, ops-eqiad, DC-Ops: eqiad: Server moves to free up space on 10g racks - https://phabricator.wikimedia.org/T267065 (elukey)
[16:07:01] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO, Traffic: Puppet disabled in beta cluster varnish deployment-cache-text06 - https://phabricator.wikimedia.org/T267578 (jijiki) p:Triage→Medium
[16:07:27] Operations, Beta-Cluster-Infrastructure, Traffic: Beta needs to be upgraded to Varnish 6 - https://phabricator.wikimedia.org/T267561 (jijiki) p:Triage→Medium
[16:08:32] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO, Traffic: Beta cluster seems to be extremely slow for logged in user during page navigation - https://phabricator.wikimedia.org/T267435 (jijiki) p:Triage→Medium
[16:10:29] Operations, observability, Patch-For-Review: grafana email alerting broken? - https://phabricator.wikimedia.org/T267409 (jijiki) p:Triage→Medium
[16:10:30] (CR) Filippo Giunchedi: [C: +2] prometheus: collect Gerrit internal metrics [puppet] - https://gerrit.wikimedia.org/r/640215 (https://phabricator.wikimedia.org/T184086) (owner: Hashar)
[16:10:49] Operations, observability: smart-data-dump should fail loudly when it can't gather metrics - https://phabricator.wikimedia.org/T267135 (jijiki) p:Triage→Medium
[16:12:08] hashar: puppet is running on prometheus, you'll start seeing the requests soon
[16:12:43] !log Restarted Gerrit on gerrit2001 for config change
[16:12:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:02] Operations, Wikimedia-Mailing-lists: Request for creation: Wiki Loves Africa Organizers Mailing List - https://phabricator.wikimedia.org/T267083 (jijiki) @Robh is there a reason this list was not created last week? Are we waiting for someone's input?
[16:16:01] Operations, Analytics: Augment NEL reports with a computed timestamp-of-generation - https://phabricator.wikimedia.org/T266886 (jijiki) p:Triage→Medium
[16:16:39] godog: I see the requests
[16:16:48] though with a 403 :\
[16:17:12] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO, Traffic: Puppet disabled in beta cluster varnish deployment-cache-text06 - https://phabricator.wikimedia.org/T267578 (thcipriani) Even with puppet disabled, packages were upgraded which broke this again. I set the packages to...
[16:17:13] oh because I haven't restarted gerrit yet
[16:17:24] (PS4) C. Scott Ananian: Turn on formatnum logging [mediawiki-config] - https://gerrit.wikimedia.org/r/640254 (https://phabricator.wikimedia.org/T267587)
[16:17:39] (PS1) Elukey: Update network topology for Hadoop worker nodes [puppet] - https://gerrit.wikimedia.org/r/640448 (https://phabricator.wikimedia.org/T267065)
[16:17:43] (CR) C. Scott Ananian: Turn on formatnum logging (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/640254 (https://phabricator.wikimedia.org/T267587) (owner: C. Scott Ananian)
[16:17:56] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=gerrit-metrics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:17:58] !log Restarting Gerrit on gerrit1001
[16:18:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:23] that explains the error ;P
[16:19:20] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (ArielGlenn) Note that I'm running cumin 'O{project:deployment-prep name:^deployment-mediawiki-[0-9]+$ } or O{project:deployment-prep name:^deployment...
[16:19:43] godog: that WORKS!!!!!!! I get four requests with a 200 status. I guess we have 4 prometheus collector
[16:19:51] (PS2) Elukey: Update network topology for Hadoop worker nodes [puppet] - https://gerrit.wikimedia.org/r/640448 (https://phabricator.wikimedia.org/T267065)
[16:20:05] Operations, Prod-Kubernetes, serviceops, Kubernetes: Refactor calico deploy strategy - https://phabricator.wikimedia.org/T267653 (JMeybohm)
[16:20:09] !log pool mw1263
[16:20:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:15] and the Icinga alarm will clear ou
[16:20:20] Operations, Prod-Kubernetes, serviceops, Kubernetes: Refactor calico deploy strategy - https://phabricator.wikimedia.org/T267653 (JMeybohm) p:Triage→High
[16:20:24] hashar: sweet! we have two prometheus hosts in eqiad
[16:21:18] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:21:22] Operations, Traffic, Wikidata, wikiba.se website, HTTPS: Set HSTS on wikiba.se (force HTTPS) - https://phabricator.wikimedia.org/T232246 (jijiki) p:Triage→Medium
[16:22:12] Operations, observability: Two close pages for idle workers api + appserver didn't auto-resolve on recovery - https://phabricator.wikimedia.org/T266570 (jijiki) p:Triage→Medium
[16:22:39] Operations, Wikidata, Wikidata Query UI, User-Addshore: Move WDQS UI to microsites - https://phabricator.wikimedia.org/T266702 (jijiki) p:Triage→Medium
[16:22:57] Operations, Prod-Kubernetes, serviceops, Kubernetes: Refactor calico deploy strategy - https://phabricator.wikimedia.org/T267653 (JMeybohm) To solve the catch-22 we could deploy the to-be calico helm chart via helm3. Which would require us to invest into helm3 integration earlier than we hoped fo...
[16:23:04] Operations, Wikidata, Wikidata Query Builder, User-Addshore: Deploy WDQS query builder to microsites - https://phabricator.wikimedia.org/T266703 (jijiki) p:Triage→Medium
[16:26:53] Operations, Patch-For-Review: Updated java security policy in OpenJDK 11.9 - https://phabricator.wikimedia.org/T266782 (jijiki) p:Triage→Medium @MoritzMuehlenhoff is there an action needed from us when building our java images?
[16:27:09] Operations, Maps (Kartotherian), Patch-For-Review, Sustainability (Incident Followup): Kartotherian/Maps outage followups, 2020-10-29 - https://phabricator.wikimedia.org/T266807 (jijiki) p:Triage→Medium
[16:27:14] (CR) Giuseppe Lavagetto: Add apache httpd base image (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/634924 (https://phabricator.wikimedia.org/T265324) (owner: Giuseppe Lavagetto)
[16:27:43] (PS9) Giuseppe Lavagetto: Add apache httpd base image [docker-images/production-images] - https://gerrit.wikimedia.org/r/634924 (https://phabricator.wikimedia.org/T265324)
[16:27:45] (PS7) Giuseppe Lavagetto: Add an httpd-fcgi image [docker-images/production-images] - https://gerrit.wikimedia.org/r/636634 (https://phabricator.wikimedia.org/T265324)
[16:27:47] (PS3) Giuseppe Lavagetto: Add base php cli image [docker-images/production-images] - https://gerrit.wikimedia.org/r/638095 (https://phabricator.wikimedia.org/T265324)
[16:27:49] (PS2) Giuseppe Lavagetto: Add a php-fpm image for php 7.2 [docker-images/production-images] - https://gerrit.wikimedia.org/r/640386 (https://phabricator.wikimedia.org/T265324)
[16:28:21] Operations, observability: VictorOps ~5min delay from email received to incident paging - https://phabricator.wikimedia.org/T266800 (jijiki) p:Triage→High #observability please lower priority if you feel that "High" is too much.
[16:29:24] Operations, Wikimedia-Mailing-lists: Request for creation: Wiki Loves Africa Organizers Mailing List - https://phabricator.wikimedia.org/T267083 (RobH) No reason, can be created whenever, I just didn't get to every single open task last week.
[16:30:50] Operations, Gerrit, Release-Engineering-Team-TODO, observability, and 2 others: Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086 (hashar)
[16:31:47] Operations, Gerrit, Release-Engineering-Team-TODO, observability, and 2 others: Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086 (fgiunchedi) The patch is live, unfortunately due to how our Prometheus puppetization works it means we're scraping metrics from gerrit fro...
[16:32:21] !log add cloud-storage1-b-codfw to, well, codfw switches - T267378
[16:32:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:28] T267378: (Need By: TBD) rack/setup/install cloudcephmon200[12] - https://phabricator.wikimedia.org/T267378
[16:38:59] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install cloudnet2004-dev - https://phabricator.wikimedia.org/T267654 (RobH)
[16:39:16] Operations, observability: VictorOps ~5min delay from email received to incident paging - https://phabricator.wikimedia.org/T266800 (herron) Pinged VO again yesterday, but no meaningful update so far.
[16:39:35] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install cloudnet2004-dev - https://phabricator.wikimedia.org/T267654 (RobH) @andrew: The procurement request didn't have the racking details filled out, can you please detail the networking requirements for htis host and then assign to @papaul...
[16:40:08] (PS1) Jbond: wmflib::ssl_ciphersuites: drop suppport for anything less then jessie [puppet] - https://gerrit.wikimedia.org/r/640467
[16:40:30] Operations, Desktop Improvements, Product-Infrastructure-Team-Backlog, Proton, and 3 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (sdkim) a:Jgiannelos→None Given we, Product Infra, are not finding issues at our service level inves...
[16:40:37] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install cloudnet2004-dev - https://phabricator.wikimedia.org/T267654 (RobH)
[16:40:58] Operations, Cloud-VPS, netops, cloud-services-team (Kanban): Evaluate the possibility to add Juniper images to Openstack - https://phabricator.wikimedia.org/T180179 (aborrero) Stalled→Declined >>! In T180179#6611341, @Aklapper wrote: > Could #netops please answer T180179#4965646? Asking a...
[16:41:27] (PS2) Jbond: wmflib::ssl_ciphersuites: drop suppport for anything less then jessie [puppet] - https://gerrit.wikimedia.org/r/640467
[16:44:31] (CR) CRusnov: "> Patch Set 1:" [software/netbox-deploy] - https://gerrit.wikimedia.org/r/636464 (https://phabricator.wikimedia.org/T266488) (owner: CRusnov)
[16:44:53] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install cloudcephmon200[12] - https://phabricator.wikimedia.org/T267378 (ayounsi) WMCS (and thus cloud-hosts1-b-codfw) is only in row B. So the servers will have to move to row B. cloud-storage1-b-codfw for the 2nd NIC has been created as well...
[16:47:31] (PS1) Effie Mouzeli: admin: add jgiannelos to deployment group [puppet] - https://gerrit.wikimedia.org/r/640469 (https://phabricator.wikimedia.org/T267585)
[16:48:27] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install cloudnet2004-dev - https://phabricator.wikimedia.org/T267654 (aborrero) Please rack this in `codfw row B`. Networking is as follows: * one NIC (eth0/eno1) for control plane management (ssh, puppet, install, monitoring, etc). This is c...
[16:50:55] Operations, Analytics, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (jijiki) Open→Stalled
[16:54:01] Operations, SRE-Access-Requests: Requesting access to production shell groups for DNdubane - https://phabricator.wikimedia.org/T266791 (jijiki) @DNdubane_WMF going through the list, we will need your manager's approval, thank you!
[16:56:09] Operations, Analytics, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (jijiki) Stalled→Open
[16:57:23] Operations, observability: smart-data-dump should fail loudly when it can't gather metrics - https://phabricator.wikimedia.org/T267135 (colewhite) Per discussion on IRC, we know two things: # The proposed change to exit non-zero when no disks are detected is very likely to be extremely noisy given we do...
[16:59:57] Operations, ops-codfw, cloud-services-team (Kanban): Rack new cloud-dev servers in same rack - https://phabricator.wikimedia.org/T267662 (ayounsi) p:Triage→Medium
[17:00:05] jbond42 and cdanis: It is that lovely time of the day again! You are hereby commanded to deploy Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201110T1700).
[17:00:19] Operations, ops-codfw, cloud-services-team (Kanban): Rack new cloud-dev servers in same rack - https://phabricator.wikimedia.org/T267662 (ayounsi)
[17:01:19] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install cloudcephmon200[12] - https://phabricator.wikimedia.org/T267378 (ayounsi) See also T267662 to maybe rack them all together.
[17:01:22] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install cloudnet2004-dev - https://phabricator.wikimedia.org/T267654 (Papaul) other NIC (etho1/eno2) for data plane (openstack cloud virtual network). interface-range cloud-net-trunk
[17:06:53] Operations, Desktop Improvements, Product-Infrastructure-Team-Backlog, Proton, and 3 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (akosiaris) >>! In T266373#6613038, @Jgiannelos wrote: > @akosiaris More from debugging on this issue: > >...
[17:08:31] (CR) Bstorm: [C: +2] "Removing the extra colon might help here, too, thanks." [puppet] - https://gerrit.wikimedia.org/r/640429 (owner: David Caro)
[17:10:56] Operations, observability: smart-data-dump should fail loudly when it can't gather metrics - https://phabricator.wikimedia.org/T267135 (colewhite)
[17:11:22] Operations, observability: smart-data-dump should fail loudly when it can't gather metrics - https://phabricator.wikimedia.org/T267135 (colewhite)
[17:12:06] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (ArielGlenn) Can't proceed at the moment, puppet sync to deployment-prep has been broken since Nov 6. Log excerpts from the earliest error: ` 2020-11-06T20...
[17:14:17] Operations, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install logstash-be103[345] - https://phabricator.wikimedia.org/T267666 (RobH)
[17:14:23] Operations, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install logstash-be103[345] - https://phabricator.wikimedia.org/T267666 (RobH)
[17:15:13] (CR) Effie Mouzeli: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26412" [puppet] - https://gerrit.wikimedia.org/r/640469 (https://phabricator.wikimedia.org/T267585) (owner: Effie Mouzeli)
[17:16:11] Operations, ops-eqiad, Analytics-Radar: analytics1046/analytics1057 stuck in booting - https://phabricator.wikimedia.org/T267392 (Cmjohnson) Both servers are stuck at the same spot during post. I tried rebooting an-1046 but it still sticks, One of the power supplies is bad and I replaced it with one...
[17:18:06] (CR) Effie Mouzeli: [V: +1 C: +2] admin: add jgiannelos to deployment group [puppet] - https://gerrit.wikimedia.org/r/640469 (https://phabricator.wikimedia.org/T267585) (owner: Effie Mouzeli)
[17:18:33] Operations, ops-eqiad: Degraded RAID on an-presto1004 - https://phabricator.wikimedia.org/T267160 (Cmjohnson) After more investigating and trying to swap it with a known good 4TB disk, I see an amber light blinking on the backplane. I reached back out to Dell to let them know that they should also send m...
[17:22:49] Operations, ops-eqiad, Analytics-Radar: analytics1046/analytics1057 stuck in booting - https://phabricator.wikimedia.org/T267392 (Cmjohnson) @elukey @razzi @wiki_willy The servers are stuck and I cannot update bios or firmware. Please decommission.
[17:24:51] Operations, ops-eqiad, Analytics-Radar: analytics1046/analytics1057 stuck in booting - https://phabricator.wikimedia.org/T267392 (elukey) Open→Resolved Thanks for checking @Cmjohnson, will do :)
[17:26:38] (CR) Klausman: [C: +2] Update network topology for Hadoop worker nodes [puppet] - https://gerrit.wikimedia.org/r/640448 (https://phabricator.wikimedia.org/T267065) (owner: Elukey)
[17:27:39] (PS1) Cwhite: smart: add metric to track number of devices detected [puppet] - https://gerrit.wikimedia.org/r/640473 (https://phabricator.wikimedia.org/T267135)
[17:29:19] Operations, ops-eqiad, DBA: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (RobH) @Jclark-ctr: Has the defective HP mainboard been sent back to HP yet? They are spamming my inbox about it =]
[17:30:18] !log about to shutdown db1139 for hw maintenance T261405
[17:30:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:26] T261405: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405
[17:31:06] !log briefly depool mw1263 and mw1264
[17:31:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:33:58] (Abandoned) Cwhite: hiera: enable grafana smtp notifications [puppet] - https://gerrit.wikimedia.org/r/639812 (https://phabricator.wikimedia.org/T267409) (owner: Cwhite)
[17:34:20] Operations, ops-eqiad, DBA: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo) @Cmjohnson host should be shut down right now after stopping mysql cleanly- you are free to disconnect/open/check ram now. Thank you!
[17:35:20] Operations, observability, Patch-For-Review: grafana email alerting broken? - https://phabricator.wikimedia.org/T267409 (colewhite) Open→Declined Per discussion on IRC, we intentionally moved away from Grafana email output in favor of centralizing on Icinga.
[17:45:10] PROBLEM - Device not healthy -SMART- on an-presto1004 is CRITICAL: cluster=analytics device=sat+megaraid,8 instance=an-presto1004 job=node site=eqiad https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=an-presto1004&var-datasource=eqiad+prometheus/ops
[17:45:39] (CR) Cwhite: [C: +2] "PCC checks out: https://puppet-compiler.wmflabs.org/compiler1002/26413/" [puppet] - https://gerrit.wikimedia.org/r/639811 (https://phabricator.wikimedia.org/T267017) (owner: Cwhite)
[17:53:42] Operations, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (RobH)
[17:53:46] Operations, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (RobH)
[17:54:36] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (ArielGlenn) This is apparently from T267439 After some discussion with jbond and dancy in irc, I am going to revert that and hope I'm not making the varn...
[17:57:50] !log pool mw1263 mw1264
[17:57:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:16] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install ml-serve200[1-4] - https://phabricator.wikimedia.org/T267670 (RobH)
[17:58:23] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install ml-serve200[1-4] - https://phabricator.wikimedia.org/T267670 (RobH)
[17:58:38] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install ml-serve200[1-4] - https://phabricator.wikimedia.org/T267670 (RobH) a:Papaul
[18:00:04] chrisalbon and accraze: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201110T1800).
[18:02:53] brennen: are there swat windows today?
[18:03:29] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (ArielGlenn) Puppet sync back to working. Back on track to continue with the update in deployment-prep.
[18:03:30] it looks like i regressed arwiki (T267614) and I'd like to swat a patch for it, if that's not crazy.
[18:03:32] T267614: Appearence of the eastern arabic numerals on Arabic wikimedia projects - https://phabricator.wikimedia.org/T267614
[18:03:33] cscott: https://wikitech.wikimedia.org/wiki/Deployments#Tuesday,_November_10
[18:04:35] cscott: however given that this would be a train blocker under normal circumstances i can probably sling that out soonish before we call the train ticket resolved
[18:05:16] chrisalbon: anything going on for the graphoid/ores window?
[18:05:26] nope
[18:06:31] Operations, ops-eqiad, DBA: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (Cmjohnson) reseated all of the DIMM, the erorr remained the same Uncorrectable Machine Check Exception (Processor 2, APIC ID 0x00000026, Bank 0x000000...
[18:07:49] cscott: let me know when that's reviewed; if it's for sure good to go before the upcoming backport window i'll do it, otherwise we'll coordinate with that window.
[18:09:59] Operations, ops-eqiad, Discovery-Search: Memory issue on elastic1063 caused elasticsearch to be killed - https://phabricator.wikimedia.org/T265113 (Cmjohnson) Thanks, @dcausse Still no h/w error in idrac, A ticket with Dell will need to be created, the server is under warranty.
[18:10:07] Operations, ops-eqiad, DBA: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo) :-( I will put db1139 back into production so it is somewhat useful until next week.
[18:11:17] Operations, ops-eqiad, Analytics-Clusters, DC-Ops: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (Cmjohnson) @elukey Let's schedule this for next Tuesday please 1500UTC (10EST)
[18:11:55] Operations, ops-eqiad, DBA: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo)
[18:19:25] (PS1) Jbond: ssl: new ssl module intialy planned to replace ssl_ciphersuite() [puppet] - https://gerrit.wikimedia.org/r/640480
[18:19:49] Operations, ops-eqiad, DBA: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo) The memory view tells us that it is now 2, not 1 memory slot that is affected (of course, given the above test it is more likely it is CPU / b...
[18:19:52] (CR) jerkins-bot: [V: -1] ssl: new ssl module intialy planned to replace ssl_ciphersuite() [puppet] - https://gerrit.wikimedia.org/r/640480 (owner: Jbond)
[18:23:06] Operations, ops-eqiad, decommission-hardware: Reclaim torrelay1001 to spares - https://phabricator.wikimedia.org/T243390 (Cmjohnson) network switch updated with asset tag, removed from public vlan and added to disabled
[18:25:22] (PS1) Jcrespo: mariadb: Reduce memory consumption of mariadb@s6 while hw degraded [puppet] - https://gerrit.wikimedia.org/r/640482 (https://phabricator.wikimedia.org/T261405)
[18:26:11] (CR) Jcrespo: [C: +2] mariadb: Reduce memory consumption of mariadb@s6 while hw degraded [puppet] - https://gerrit.wikimedia.org/r/640482 (https://phabricator.wikimedia.org/T261405) (owner: Jcrespo)
[18:27:24] Operations, ops-eqiad, Analytics-Clusters, DC-Ops: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (elukey) >>! In T260448#6617045, @Cmjohnson wrote: > @elukey Let's schedule this for next Tuesday please 1500UTC (10EST) Looks good for me,...
[18:28:53] (PS2) Jbond: (WIP) ssl: new ssl module intialy planned to replace ssl_ciphersuite() [puppet] - https://gerrit.wikimedia.org/r/640480
[18:31:43] !log holger mwmaint1002 Start T219279
[18:31:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:31:50] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279
[18:33:55] Operations, MediaWiki-General, serviceops, MW-1.34-notes (1.34.0-wmf.16; 2019-07-30), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (holger.knust) Script execution failed with holger@mwmaint1002:~...
[18:36:42] (PS3) Jbond: (WIP) ssl: new ssl module intialy planned to replace ssl_ciphersuite() [puppet] - https://gerrit.wikimedia.org/r/640480
[18:38:38] Operations, Beta-Cluster-Infrastructure, DBA, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (ArielGlenn) Update in deployment-prep is now complete, assuming I did not miss any hosts. ` root@deployment-cumin:~# cumin 'O{project:deployment-prep...
[18:39:28] (PS4) Jbond: (WIP) ssl: new ssl module intialy planned to replace ssl_ciphersuite() [puppet] - https://gerrit.wikimedia.org/r/640480
[18:42:51] Operations, Beta-Cluster-Infrastructure, Traffic: Beta needs to be upgraded to Varnish 6 - https://phabricator.wikimedia.org/T267561 (Ottomata) I just tried to fix by installing varnish 6, but clearly (and obviously) it isn't that simple. ` $ sudo apt-get remove varnish varnish-modules libvmod-netma...
[18:43:00] (PS1) Bstorm: toolsdb: set up quickcopy to move the dump [puppet] - https://gerrit.wikimedia.org/r/640483 (https://phabricator.wikimedia.org/T266587)
[18:46:40] Operations, ops-eqiad: Interface errors on cr1-eqiad:xe-3/2/1 - https://phabricator.wikimedia.org/T267672 (ayounsi) p:Triage→High
[18:54:09] (CR) Andrew Bogott: [C: +1] "I haven't used this class before but the logic seems reasonable" [puppet] - https://gerrit.wikimedia.org/r/640483 (https://phabricator.wikimedia.org/T266587) (owner: Bstorm)
[18:58:55] (PS2) Bstorm: toolsdb: set up quickcopy to move the dump [puppet] - https://gerrit.wikimedia.org/r/640483 (https://phabricator.wikimedia.org/T266587)
[19:00:04] RoanKattouw, Niharika, and Urbanecm: Dear deployers, time to do the Morning backport window deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201110T1900).
[19:00:04] No GERRIT patches in the queue for this window AFAICS.
[19:02:28] anybody around that could help tracking down a pcache memcached key?
[19:06:26] !log holger mwmaint1002 Stop T219279
[19:06:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:33] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279
[19:07:10] Urbanecm / RoanKattouw: noting for deployers this window that cscott might have a patch shortly.
[19:08:14] correction: 2 patches: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=0&oldid=1887724
[19:08:51] actually i guess that might need to wait a minute; i can handle.
[19:10:33] dancy and i have the next couple hours after this window blocked out for some scap debugging so if it runs into that time shouldn't be an issue.
[19:13:53] (PS1) Brennen Bearnes: language: Honor $wgTranslateNumerals, even if PHP does digit translation [core] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/640487 (https://phabricator.wikimedia.org/T267614)
[19:15:54] (PS1) Brennen Bearnes: Downgrade the severity of the non-numeric argument to formatNum warnings [core] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/640488 (https://phabricator.wikimedia.org/T267370)
[19:16:08] Operations, MediaWiki-General, serviceops, MW-1.34-notes (1.34.0-wmf.16; 2019-07-30), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (jcrespo) pc2010 seems to be lagging behind. This is a non-issue f...
[19:19:55] Operations, MediaWiki-General, serviceops, MW-1.34-notes (1.34.0-wmf.16; 2019-07-30), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (ArielGlenn) From looking at the code, it seems like the user list...
[19:22:24] (CR) Brennen Bearnes: [C: +2] Downgrade the severity of the non-numeric argument to formatNum warnings [core] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/640488 (https://phabricator.wikimedia.org/T267370) (owner: Brennen Bearnes)
[19:22:48] (CR) Brennen Bearnes: [C: +2] language: Honor $wgTranslateNumerals, even if PHP does digit translation [core] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/640487 (https://phabricator.wikimedia.org/T267614) (owner: Brennen Bearnes)
[19:27:12] Operations, Commons, MediaWiki-File-management: File from commons is not loaded properly - https://phabricator.wikimedia.org/T267668 (AntiCompositeNumber) Sounds like {T253405} again. Looks like @jijiki's been doing some memcached work today relating to T244340, so that seems like a likely culprit. T...
[19:29:15] (Abandoned) Andrew Bogott: exim: add toolforge.org domain [puppet] - https://gerrit.wikimedia.org/r/619851 (owner: Andrew Bogott)
[19:29:22] RECOVERY - ElasticSearch shard size check - 9243 on search.svc.eqiad.wmnet is OK: OK - All good! https://wikitech.wikimedia.org/wiki/Search%23If_it_has_been_indexed
[19:31:03] (PS1) Mholloway: Update mobileapps to 2020-11-10-190707-production [deployment-charts] - https://gerrit.wikimedia.org/r/640506
[19:31:48] Operations, Beta-Cluster-Infrastructure, Traffic: Beta needs to be upgraded to Varnish 6 - https://phabricator.wikimedia.org/T267561 (ArielGlenn) I guess that T267439 might be related.
[19:46:50] (Merged) jenkins-bot: Downgrade the severity of the non-numeric argument to formatNum warnings [core] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/640488 (https://phabricator.wikimedia.org/T267370) (owner: Brennen Bearnes)
[19:46:56] (Merged) jenkins-bot: language: Honor $wgTranslateNumerals, even if PHP does digit translation [core] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/640487 (https://phabricator.wikimedia.org/T267614) (owner: Brennen Bearnes)
[19:48:40] (PS1) Mholloway: Update wikifeeds to 2020-11-10-193040-production [deployment-charts] - https://gerrit.wikimedia.org/r/640509
[19:51:00] (CR) Brennen Bearnes: [C: +2] Turn on formatnum logging [mediawiki-config] - https://gerrit.wikimedia.org/r/640254 (https://phabricator.wikimedia.org/T267587) (owner: C. Scott Ananian)
[19:51:55] (Merged) jenkins-bot: Turn on formatnum logging [mediawiki-config] - https://gerrit.wikimedia.org/r/640254 (https://phabricator.wikimedia.org/T267587) (owner: C. Scott Ananian)
[19:58:12] (PS5) Ryan Kemper: Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345)
[19:58:26] (CR) jerkins-bot: [V: -1] Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345) (owner: Ryan Kemper)
[20:00:41] (PS8) Ppchelko: Enable parsoid on api_appserver [mediawiki-config] - https://gerrit.wikimedia.org/r/635086 (https://phabricator.wikimedia.org/T265954)
[20:01:13] brennen: i'm back
[20:01:33] cool. i've staged both of those changes (and the config change) to mwdebu1002
[20:01:52] brennen: i think if the mediawiki-config part of the logging stuff gets deployed first we can test the other half of that on beta
[20:01:56] looking here - https://ar.wikipedia.org/w/index.php?title=%D9%88%D9%8A%D9%83%D9%8A%D8%A8%D9%8A%D8%AF%D9%8A%D8%A7&action=history - i think the arabic numeral issue is fixed
[20:01:57] (PS6) Ryan Kemper: Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345)
[20:02:07] (CR) Ppchelko: "removed -2 since the dependency seem to have landed in wmf.16 and been deployed. Will deploy this after holidays." [mediawiki-config] - https://gerrit.wikimedia.org/r/635086 (https://phabricator.wikimedia.org/T265954) (owner: Ppchelko)
[20:02:13] and then i was about to check to see if we had a beta arwiki i could test the other patch on
[20:02:30] oh, did you already deploy that?
[20:02:44] it's just on mwdebug1002; haven't yet synced out otherwise.
[20:02:52] (PS1) Andrew Bogott: Replace 'codname' with 'codename' in several places, add codname to typos file [puppet] - https://gerrit.wikimedia.org/r/640512 (https://phabricator.wikimedia.org/T267396)
[20:02:57] (CR) jerkins-bot: [V: -1] Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345) (owner: Ryan Kemper)
[20:04:03] brennen: ah yes, i can confirm that setting the x-debug header to go to mwdebug1002 does seem to fix the issue on arwiki
[20:04:19] (CR) jerkins-bot: [V: -1] Replace 'codname' with 'codename' in several places, add codname to typos file [puppet] - https://gerrit.wikimedia.org/r/640512 (https://phabricator.wikimedia.org/T267396) (owner: Andrew Bogott)
[20:04:39] (CR) Andrew Bogott: "This was breaking puppet on a striker box; also a bunch of credit agencies still think I used to work at a place called 'CodWeavers' so I'" [puppet] - https://gerrit.wikimedia.org/r/640512 (https://phabricator.wikimedia.org/T267396) (owner: Andrew Bogott)
[20:05:22] brennen: yeah, the arwiki patch looks good to me on mw1002
[20:06:01] yes, i can also confirm fixed via WikimediaDebug gadget
[20:06:10] cscott: cool. just realized i likely shouldn't have merged both of these since they both touch Language.php, but should i go ahead and deploy the logging config change?
[20:06:39] yes, i think so
[20:06:50] (CR) Ryan Kemper: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26418" [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345) (owner: Ryan Kemper)
[20:06:52] (PS2) Andrew Bogott: Replace 'codname' with 'codename' in several places [puppet] - https://gerrit.wikimedia.org/r/640512 (https://phabricator.wikimedia.org/T267396)
[20:06:54] (PS1) Andrew Bogott: Add 'codname' to typos file [puppet] - https://gerrit.wikimedia.org/r/640513 (https://phabricator.wikimedia.org/T267396)
[20:08:16] is beta down? https://en.wikipedia.beta.wmflabs.org/ is giving me an error
[20:09:53] (PS7) Ryan Kemper: Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345)
[20:10:22] cscott: yep. T267551
[20:10:22] T267551: Beta Cluster is down - https://phabricator.wikimedia.org/T267551
[20:10:41] !log brennen@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:640254|Turn on formatnum logging (T267587, T267370)]] (duration: 01m 02s)
[20:10:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:50] T267587: Language: Use of Language::formatNum with a non-numeric string was deprecated in MediaWiki 1.36. - https://phabricator.wikimedia.org/T267587
[20:10:50] T267370: Use of FormatMetadata::formatNum with non-numeric value was deprecated in MediaWiki 1.36. [Called from FormatMetadata::makeFormattedData] - https://phabricator.wikimedia.org/T267370
[20:11:14] mholloway: ah. we're leaping w/o a net i guess.
[20:11:27] (CR) jerkins-bot: [V: -1] Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345) (owner: Ryan Kemper)
[20:12:09] (CR) Andrew Bogott: [C: +2] Replace 'codname' with 'codename' in several places [puppet] - https://gerrit.wikimedia.org/r/640512 (https://phabricator.wikimedia.org/T267396) (owner: Andrew Bogott)
[20:12:19] (CR) Andrew Bogott: [C: +2] Add 'codname' to typos file [puppet] - https://gerrit.wikimedia.org/r/640513 (https://phabricator.wikimedia.org/T267396) (owner: Andrew Bogott)
[20:12:53] (CR) Ryan Kemper: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26419" [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345) (owner: Ryan Kemper)
[20:13:39] brennen: i see messages on the 'formatnum' channel in logstash, so it looks like that worked
[20:15:06] cscott: well, fwiw, page loads don't seem to explode on the debug box (to the extent i can tell, which isn't much) and we'll know pretty quickly if formatnum logs fall off.
[20:15:36] ok, cool. going ahead with the sync then.
[20:16:48] (PS8) Ryan Kemper: Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345)
[20:18:29] (CR) jerkins-bot: [V: -1] Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345) (owner: Ryan Kemper)
[20:23:18] hrm, i guess this is going to need a `scap sync-world`, so may take a bit.
[20:24:09] (Abandoned) Thcipriani: Branch commit for wmf/1.36.0-wmf.17 [core] (wmf/1.36.0-wmf.17) - https://gerrit.wikimedia.org/r/640283 (https://phabricator.wikimedia.org/T263183) (owner: TrainBranchBot)
[20:25:32] (PS1) PipelineBot: mathoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/640516
[20:27:25] !log brennen@deploy1001 Started scap: Backport: [[gerrit:640487|language: Honor $wgTranslateNumerals, even if PHP does digit translation(T267614)]] and [[gerrit:640488|Downgrade the severity of the non-numeric argument to formatNum warnings (T267370, T267587)]]
[20:27:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:33] T267614: Appearence of the eastern arabic numerals on Arabic wikimedia projects - https://phabricator.wikimedia.org/T267614
[20:27:34] T267587: Language: Use of Language::formatNum with a non-numeric string was deprecated in MediaWiki 1.36. - https://phabricator.wikimedia.org/T267587
[20:27:34] T267370: Use of FormatMetadata::formatNum with non-numeric value was deprecated in MediaWiki 1.36. [Called from FormatMetadata::makeFormattedData] - https://phabricator.wikimedia.org/T267370
[20:29:23] brennen: the mediawiki-config patch is deployed right? so we're just waiting for sync-world to spread the new logging code past mwdebug1002 ?
[20:29:45] brennen: because i haven't seen any more logs on the formatnum channel yet.
[20:29:56] cscott: correct
[20:30:12] ok
[20:39:30] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[20:44:42] Operations, Research, SRE-Access-Requests: Access to analytics-privatedata-users for Research volunteer Swagoel - https://phabricator.wikimedia.org/T267314 (Swagoel) ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOe8yGmiYe4TX6NLuqjO16X6vXJZnyoh6pdFS//eywhI swati@Swatis-Air.volcano.net preferred username = swagoel
[20:44:47] Operations, Desktop Improvements, Product-Infrastructure-Team-Backlog, Proton, and 3 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (akosiaris) I 've also ran the same tests against `restbase.svc.eqiad.wmnet` in P13257 and I have the follow...
[20:48:33] Operations, Desktop Improvements, Product-Infrastructure-Team-Backlog, Proton, and 4 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (akosiaris) >>! In T266373#6616625, @sdkim wrote: > Given we, Product Infra, are not finding issues at our s...
[20:49:31] (PS9) Ryan Kemper: Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345)
[20:50:59] (CR) jerkins-bot: [V: -1] Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345) (owner: Ryan Kemper)
[20:51:48] (PS1) PipelineBot: mathoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/640523
[20:54:25] Operations, Desktop Improvements, Product-Infrastructure-Team-Backlog, Proton, and 4 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (akosiaris) > Interestingly, proton returns transfer-encoding: chunked responses, that don't have a Content-...
[20:55:27] Operations, Product-Infrastructure-Team-Backlog, Proton, Traffic, and 2 others: PDF download generates invalid PDF files - https://phabricator.wikimedia.org/T266559 (Urbanecm)
[20:55:38] Operations, Desktop Improvements, Product-Infrastructure-Team-Backlog, Proton, and 4 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (Urbanecm)
[20:59:16] brennen: ok, now seeing the channel:formatnum bugs from non-canary machines. so i think this worked.
[20:59:40] PROBLEM - Host ripe-atlas-codfw IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[21:00:04] brennen and dancy: Time to snap out of that daydream and deploy scap sync-world debugging. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201110T2100).
[21:00:04] brennen: A patch you scheduled for scap sync-world debugging is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:43] Operations, Android-app-Bugs, Fundraising-Backlog, Thank-You-Page, and 5 others: Deal with donatewiki Thank You page launching in apps - https://phabricator.wikimedia.org/T259312 (DStrine)
[21:02:06] !log brennen@deploy1001 Finished scap: Backport: [[gerrit:640487|language: Honor $wgTranslateNumerals, even if PHP does digit translation(T267614)]] and [[gerrit:640488|Downgrade the severity of the non-numeric argument to formatNum warnings (T267370, T267587)]] (duration: 34m 46s)
[21:02:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:16] T267614: Appearence of the eastern arabic numerals on Arabic wikimedia projects - https://phabricator.wikimedia.org/T267614
[21:02:16] T267587: Language: Use of Language::formatNum with a non-numeric string was deprecated in MediaWiki 1.36. - https://phabricator.wikimedia.org/T267587
[21:02:17] T267370: Use of FormatMetadata::formatNum with non-numeric value was deprecated in MediaWiki 1.36. [Called from FormatMetadata::makeFormattedData] - https://phabricator.wikimedia.org/T267370
[21:02:24] PROBLEM - Host ripe-atlas-codfw is DOWN: PING CRITICAL - Packet loss = 100%
[21:02:42] Operations, Commons, MediaWiki-File-management: File from commons is not loaded properly - https://phabricator.wikimedia.org/T267668 (jijiki) p:Triage→Medium @AntiCompositeNumber the feature we have been working on has been enabled in mw1276 (api) and mw1263 (app), where items are fetched fro...
[21:02:49] cscott: cool. and seems like errors are falling off in mediawiki-new-errors.
[21:06:18] (PS10) Ryan Kemper: Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345)
[21:08:17] ACKNOWLEDGEMENT - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [1000.0] Ryan Kemper https://phabricator.wikimedia.org/T267684 https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[21:10:45] (CR) Ryan Kemper: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26420" [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T246345) (owner: Ryan Kemper)
[21:11:15] !log ban elastic1050 from eqiad psi cluster due to excessive load
[21:11:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:14:06] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[21:23:31] !log testing some scap operations, modified to use ssh -n for debugging T223287
[21:23:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:23:37] T223287: Investigate scap cluster_ssh idling until pressing ENTER repeatedly - https://phabricator.wikimedia.org/T223287
[21:24:48] !log brennen@deploy1001 sync-file aborted: Testing: README.md sync-file with ssh -n for T223287 (duration: 00m 37s)
[21:24:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:31:44] PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[21:32:22] (PS2) Mholloway: Update wikifeeds to 2020-11-10-212744-production [deployment-charts] - https://gerrit.wikimedia.org/r/640509
[21:32:24] PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[21:37:32] (PS3) Mholloway: Update wikifeeds to 2020-11-10-212744-production [deployment-charts] - https://gerrit.wikimedia.org/r/640509
[21:39:19] (PS1) Razzi: eventschemas: cache schemas for 60 seconds [puppet] - https://gerrit.wikimedia.org/r/640556 (https://phabricator.wikimedia.org/T267557)
[21:43:10] (CR) Ottomata: [C: +1] eventschemas: cache schemas for 60 seconds [puppet] - https://gerrit.wikimedia.org/r/640556 (https://phabricator.wikimedia.org/T267557) (owner: Razzi)
[21:44:32] (CR) Razzi: [C: +2] eventschemas: cache schemas for 60 seconds [puppet] - https://gerrit.wikimedia.org/r/640556 (https://phabricator.wikimedia.org/T267557) (owner: Razzi)
[21:47:19] !log unban elastic1050 from eqiad search psi cluster
[21:47:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:46] (CR) Mholloway: [C: +2] Update wikifeeds to 2020-11-10-212744-production [deployment-charts] - https://gerrit.wikimedia.org/r/640509 (owner: Mholloway)
[21:53:39] (CR) Bstorm: [C: +2] "I'm going to need this whenever that dump finishes, so I'll merge now and fix later if there ends up being anything that is wrong about it" [puppet] - https://gerrit.wikimedia.org/r/640483 (https://phabricator.wikimedia.org/T266587) (owner: Bstorm)
[21:54:11] (Merged) jenkins-bot: Update wikifeeds to 2020-11-10-212744-production [deployment-charts] - https://gerrit.wikimedia.org/r/640509 (owner: Mholloway)
[21:55:59] !log mholloway-shell@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
[21:56:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:57:33] !log mholloway-shell@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
[21:57:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:59:24] !log mholloway-shell@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
[21:59:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:01:22] (CR) Mholloway: [C: +2] Update mobileapps to 2020-11-10-190707-production [deployment-charts] - https://gerrit.wikimedia.org/r/640506 (owner: Mholloway)
[22:04:02] (Merged) jenkins-bot: Update mobileapps to 2020-11-10-190707-production [deployment-charts] - https://gerrit.wikimedia.org/r/640506 (owner: Mholloway)
[22:04:31] Operations, Research, SRE-Access-Requests: Access to analytics-privatedata-users for Research volunteer Swagoel - https://phabricator.wikimedia.org/T267314 (Swagoel) a:Swagoel→RobH
[22:05:28] Operations, Research, SRE-Access-Requests: Access to analytics-privatedata-users for Research volunteer Swagoel - https://phabricator.wikimedia.org/T267314 (RobH) a:RobH→jijiki My clinic duty finished last week, so this shouldn't be assigned to me. Reassigning to the current SRE clinic duty...
[22:05:55] !log mholloway-shell@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
[22:06:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:13] !log mholloway-shell@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
[22:08:13] !log mholloway-shell@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
[22:08:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:14:24] !log mholloway-shell@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
[22:14:24] !log mholloway-shell@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
[22:14:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:14:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:21:02] RECOVERY - mediawiki originals uploads -hourly- for codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[22:22:06] RECOVERY - mediawiki originals uploads -hourly- for eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[22:27:51] Operations, Beta-Cluster-Infrastructure, Traffic: Beta needs to be upgraded to Varnish 6 - https://phabricator.wikimedia.org/T267561 (thcipriani)
[22:27:53] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO, Traffic: Puppet disabled in beta cluster varnish deployment-cache-text06 - https://phabricator.wikimedia.org/T267578 (thcipriani)
[22:27:59] Operations, Beta-Cluster-Infrastructure, Traffic: Beta needs to be upgraded to Varnish 6 - https://phabricator.wikimedia.org/T267561 (thcipriani)
[23:04:16] (PS2) Bstorm: cloud-vps: Change NFS mounts to default to false [puppet] - https://gerrit.wikimedia.org/r/639297 (https://phabricator.wikimedia.org/T262350)
[23:12:52] (PS2) Dave Pifke: [WIP] Start puppetizing WebPageTest [puppet] - https://gerrit.wikimedia.org/r/633202 (https://phabricator.wikimedia.org/T262962)
[23:14:01] (CR) Bstorm: [C: +2] "Since Andrew set all non-WMCS projects to have a mount_nfs: true key in their project puppet, this will be a functional noop for our users" [puppet] - https://gerrit.wikimedia.org/r/639297 (https://phabricator.wikimedia.org/T262350) (owner: Bstorm)
[23:31:00] (PS2) Dave Pifke: coal: use Python 3 [puppet] - https://gerrit.wikimedia.org/r/640226 (https://phabricator.wikimedia.org/T267269)
[23:32:10] (PS5) Dave Pifke: webperf: change navtiming to use Python 3 [puppet] - https://gerrit.wikimedia.org/r/639197 (https://phabricator.wikimedia.org/T267269)
[23:36:12] (PS4) Dave Pifke: webperf: convert statsv to use Python 3 [puppet] - https://gerrit.wikimedia.org/r/639216 (https://phabricator.wikimedia.org/T267269)
[23:57:14] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_wikifeeds_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:58:56] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets