[00:04:31] PROBLEM - Check systemd state on restbase-dev1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:04:52] PROBLEM - Check systemd state on restbase-dev1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:58:30] (PS1) Huji: Remove lines that are now part of AbuseFilter defaults [mediawiki-config] - https://gerrit.wikimedia.org/r/424974 (https://phabricator.wikimedia.org/T178349)
[02:36:52] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.28) (duration: 05m 57s)
[02:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:22] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 790.18 seconds
[04:02:31] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 262.19 seconds
[05:21:38] Operations, Puppet, Patch-For-Review: uwsgi::app sorts config keys, but the .ini file behavior depends on order - https://phabricator.wikimedia.org/T191648#4115614 (Joe) Once upon a time, we used ruby 1.8 for puppet, and that had non-deterministic ordering of `Hash` when iterating. So we needed some...
[05:29:36] (PS1) Marostegui: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/424988 (https://phabricator.wikimedia.org/T187089)
[05:31:32] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/424988 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[05:32:47] (Merged) jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/424988 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[05:33:02] (CR) jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/424988 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[05:33:28] Operations, Beta-Cluster-Infrastructure, HHVM: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2627786 (Joe) I think this task is resolved as it's about the MediaWiki appservers and AFAICS they're all converted to jessie at least.
[05:34:14] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1106 for alter table (duration: 01m 00s)
[05:34:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:34:32] !log Deploy schema change on db1106 with replication enabled (this will generate lag on labs replicas) - T187089 T185128 T153182
[05:34:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:34:40] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089
[05:34:40] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
[05:34:40] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
[05:34:48] Operations, Operations-Software-Development, HHVM, Patch-For-Review: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#4115634 (Joe)
[05:34:51] Operations, Beta-Cluster-Infrastructure, HHVM: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#4115633 (Joe) Open>Resolved
[05:40:52] (PS1) Giuseppe Lavagetto: deployment-prep: add tls proxy listen port to etcd [puppet] - https://gerrit.wikimedia.org/r/424989 (https://phabricator.wikimedia.org/T191107)
[05:52:37] (PS1) Marostegui: db2079.yaml: Change binlog format [puppet] - https://gerrit.wikimedia.org/r/424990 (https://phabricator.wikimedia.org/T191275)
[05:54:27] !log Stop MySQL on db2079 to change its binlog format
[05:54:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:19] (PS2) Giuseppe Lavagetto: deployment-prep: add tls proxy listen port to etcd [puppet] - https://gerrit.wikimedia.org/r/424989 (https://phabricator.wikimedia.org/T191107)
[05:55:26] (CR) Marostegui: [C: 2] db2079.yaml: Change binlog format [puppet] - 
https://gerrit.wikimedia.org/r/424990 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[05:58:18] (PS3) Giuseppe Lavagetto: deployment-prep: add tls proxy listen port to etcd [puppet] - https://gerrit.wikimedia.org/r/424989 (https://phabricator.wikimedia.org/T191107)
[06:00:01] (PS1) Marostegui: db-codfw.php: db2079 is now candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424991 (https://phabricator.wikimedia.org/T191275)
[06:02:19] (CR) Marostegui: [C: 2] db-codfw.php: db2079 is now candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424991 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:02:21] (PS4) Giuseppe Lavagetto: deployment-prep: add tls proxy listen port to etcd [puppet] - https://gerrit.wikimedia.org/r/424989 (https://phabricator.wikimedia.org/T191107)
[06:02:53] (Merged) jenkins-bot: db-codfw.php: db2079 is now candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424991 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:03:08] (CR) jenkins-bot: db-codfw.php: db2079 is now candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424991 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:03:35] (CR) Giuseppe Lavagetto: [C: 2] deployment-prep: add tls proxy listen port to etcd [puppet] - https://gerrit.wikimedia.org/r/424989 (https://phabricator.wikimedia.org/T191107) (owner: Giuseppe Lavagetto)
[06:04:21] !log marostegui@tin Synchronized wmf-config/db-codfw.php: db2079 is now s8 candidate master (duration: 00m 59s)
[06:04:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:05:53] Puppet, Beta-Cluster-Infrastructure, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#4115654 (Joe)
[06:05:58] Puppet, Beta-Cluster-Infrastructure, Patch-For-Review: deployment-etcd-01 puppet 
errors - https://phabricator.wikimedia.org/T191107#4115653 (Joe) Open>Resolved
[06:08:32] (CR) Elukey: [C: 2] Release Burrow 1.0 [debs/burrow] (debian) - https://gerrit.wikimedia.org/r/424615 (https://phabricator.wikimedia.org/T188719) (owner: Elukey)
[06:21:26] !log Reboot db2092 for mariadb and kernel upgrade
[06:21:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:24:22] !log upgrade burrow 1.0.0 to stretch/jessie wikimedia
[06:24:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:24:43] (PS20) Elukey: burrow: configuration upgrade to support 1.0 [puppet] - https://gerrit.wikimedia.org/r/424557 (https://phabricator.wikimedia.org/T188719)
[06:27:01] (CR) Elukey: [C: 2] burrow: configuration upgrade to support 1.0 [puppet] - https://gerrit.wikimedia.org/r/424557 (https://phabricator.wikimedia.org/T188719) (owner: Elukey)
[06:28:11] PROBLEM - puppet last run on labcontrol1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/modprobe.d/nf_conntrack.conf]
[06:29:00] (PS1) Marostegui: mariadb: Move db2092 to s1 [puppet] - https://gerrit.wikimedia.org/r/424994 (https://phabricator.wikimedia.org/T191275)
[06:29:30] (CR) jerkins-bot: [V: -1] mariadb: Move db2092 to s1 [puppet] - https://gerrit.wikimedia.org/r/424994 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:30:47] (PS2) Marostegui: mariadb: Move db2092 to s1 [puppet] - https://gerrit.wikimedia.org/r/424994 (https://phabricator.wikimedia.org/T191275)
[06:31:22] PROBLEM - Check systemd state on kafkamon2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:32:26] (PS3) Marostegui: mariadb: Move db2092 to s1 [puppet] - https://gerrit.wikimedia.org/r/424994 (https://phabricator.wikimedia.org/T191275)
[06:33:27] kafkamon2001 is me :)
[06:33:29] (PS1) Marostegui: db-codfw.php: Depool db2072 [mediawiki-config] - https://gerrit.wikimedia.org/r/424997 (https://phabricator.wikimedia.org/T170662)
[06:34:16] (CR) Marostegui: [C: 2] mariadb: Move db2092 to s1 [puppet] - https://gerrit.wikimedia.org/r/424994 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:35:26] (CR) Marostegui: [C: 2] db-codfw.php: Depool db2072 [mediawiki-config] - https://gerrit.wikimedia.org/r/424997 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[06:36:37] (Merged) jenkins-bot: db-codfw.php: Depool db2072 [mediawiki-config] - https://gerrit.wikimedia.org/r/424997 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[06:37:22] RECOVERY - Check systemd state on kafkamon2001 is OK: OK - running: The system is fully operational
[06:38:01] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2072 - T170662 (duration: 00m 59s)
[06:38:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:38:07] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662
[06:38:17] (CR) jenkins-bot: db-codfw.php: Depool db2072 [mediawiki-config] - https://gerrit.wikimedia.org/r/424997 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[06:40:32] PROBLEM - Check systemd state on kafkamon2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:41:01] !log Stop MySQL on db2072 to clone db2092 from it - T170662
[06:41:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:13] !log Reboot db2072 for kernel upgrade
[06:43:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:19] (PS1) Elukey: burrow: fix creation of pid file under /var/run [puppet] - https://gerrit.wikimedia.org/r/424998 (https://phabricator.wikimedia.org/T188719)
[06:45:58] Operations, HHVM, User-Elukey, User-notice: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4115676 (Joe) Total number of rows to sort through per shard: s1 - 114M rows s2 - 73M rows s3 - 41M rows s6 - 54M rows s7 = 51M rows
[06:46:35] (PS4) Marostegui: mediawiki: Start deleteAutoPatrolLogs from Wikidata logging table [puppet] - https://gerrit.wikimedia.org/r/424300 (https://phabricator.wikimedia.org/T189596) (owner: Ladsgroup)
[06:48:00] (PS2) Elukey: burrow: fix creation of pid file under /var/run [puppet] - https://gerrit.wikimedia.org/r/424998 (https://phabricator.wikimedia.org/T188719)
[06:49:51] (CR) Marostegui: [C: 2] mediawiki: Start deleteAutoPatrolLogs from Wikidata logging table [puppet] - https://gerrit.wikimedia.org/r/424300 (https://phabricator.wikimedia.org/T189596) (owner: Ladsgroup)
[06:50:29] (CR) Elukey: [C: 2] burrow: fix creation of pid file under /var/run [puppet] - https://gerrit.wikimedia.org/r/424998 (https://phabricator.wikimedia.org/T188719) (owner: Elukey)
[06:50:35] (PS3) Elukey: burrow: fix creation of pid file under /var/run [puppet] - https://gerrit.wikimedia.org/r/424998 (https://phabricator.wikimedia.org/T188719)
[06:54:20] (PS1) Elukey: burrow: fix erb template generation [puppet] - https://gerrit.wikimedia.org/r/424999 (https://phabricator.wikimedia.org/T188719)
[06:56:39] (CR) Elukey: [C: 2] burrow: fix erb template generation [puppet] - 
https://gerrit.wikimedia.org/r/424999 (https://phabricator.wikimedia.org/T188719) (owner: Elukey)
[06:57:59] !log start of ladsgroup@terbium:~$ mwscript deleteAutoPatrolLogs.php --wiki=zhwiktionary --check-old --before 20180223210426 --sleep 2 (T184485)
[06:58:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:05] T184485: Stop logging autopatrol actions - https://phabricator.wikimedia.org/T184485
[06:58:14] RECOVERY - puppet last run on labcontrol1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:58:43] RECOVERY - Check systemd state on kafkamon2001 is OK: OK - running: The system is fully operational
[07:04:41] (PS3) Muehlenhoff: Enable icu57 component for jessie-based app servers [puppet] - https://gerrit.wikimedia.org/r/423687
[07:05:56] (CR) Muehlenhoff: [C: 2] Enable icu57 component for jessie-based app servers [puppet] - https://gerrit.wikimedia.org/r/423687 (owner: Muehlenhoff)
[07:09:32] !log upgrade burrow to 1.0 on kafkamon[12]* - T188719
[07:09:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:39] T188719: Upgrade Kafka Burrow to 1.0 - https://phabricator.wikimedia.org/T188719
[07:15:53] (PS1) Marostegui: install_server: Allow reinstall db2092 [puppet] - https://gerrit.wikimedia.org/r/425001 (https://phabricator.wikimedia.org/T170662)
[07:17:45] !log upgrading mw1261 to ICU 57-enabled HHVM build (T189295)
[07:17:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:52] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295
[07:17:52] (CR) Marostegui: [C: 2] install_server: Allow reinstall db2092 [puppet] - https://gerrit.wikimedia.org/r/425001 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[07:24:28] !log repooling mw1261 after upgrade to ICU 57-enabled HHVM build (T189295)
[07:24:34] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:35] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295
[07:32:32] !log upgrading mw1262-1265 to ICU 57-enabled HHVM build (T189295)
[07:32:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:32:38] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295
[07:38:10] <_joe_> !log upgrading mw1300 to ICU 57-enabled HHVM build (T189295)
[07:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:16] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295
[07:38:25] PROBLEM - HHVM jobrunner on mw1300 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[07:39:24] RECOVERY - HHVM jobrunner on mw1300 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.022 second response time
[07:42:30] <_joe_> !log repooling mw1300 now with ICU 57-enabled HHVM build (T189295)
[07:42:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:46:09] Operations, Collaboration-Team-Triage, DBA, StructuredDiscussions, WorkType-Maintenance: Setup separate logical External Store for Flow in production - https://phabricator.wikimedia.org/T107610#4115753 (jcrespo) That last suggestion looks like a blocker to me, at least to check it before doin...
[07:48:04] !log upgrading mw1276-1279 (API canaries) to ICU 57-enabled HHVM build (T189295)
[07:48:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:10] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295
[07:49:50] (PS2) Jcrespo: mariadb: Repool es2019 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/424745 (https://phabricator.wikimedia.org/T153440)
[07:56:34] !log Remove /var/log/wikidata/rebuildTermSqlIndex.log* as per Amir1's request
[07:56:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:52] Thanks!
[07:57:13] (PS1) Giuseppe Lavagetto: role::mw::jobrunner: rationalize runners [puppet] - https://gerrit.wikimedia.org/r/425008
[08:02:00] (CR) Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler02/10859/mw1300.eqiad.wmnet/ seems the patch DTRT" [puppet] - https://gerrit.wikimedia.org/r/425008 (owner: Giuseppe Lavagetto)
[08:03:31] (CR) Volans: [C: -2] "Reply inline, not yet ready." 
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112) (owner: Volans)
[08:04:23] (CR) Jcrespo: [C: 2] mariadb: Repool es2019 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/424745 (https://phabricator.wikimedia.org/T153440) (owner: Jcrespo)
[08:05:42] (Merged) jenkins-bot: mariadb: Repool es2019 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/424745 (https://phabricator.wikimedia.org/T153440) (owner: Jcrespo)
[08:08:22] (CR) jenkins-bot: mariadb: Repool es2019 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/424745 (https://phabricator.wikimedia.org/T153440) (owner: Jcrespo)
[08:12:21] (PS2) Elukey: Set zookeeper_cluster label for Zookeeper server prometheus monitoring [puppet] - https://gerrit.wikimedia.org/r/424664 (owner: Ottomata)
[08:12:24] (PS2) Giuseppe Lavagetto: role::mw::jobrunner: rationalize runners [puppet] - https://gerrit.wikimedia.org/r/425008
[08:13:21] (PS1) 0x010C: Switch SET on frwiktionary to use wikitexteditor by default [mediawiki-config] - https://gerrit.wikimedia.org/r/425009 (https://phabricator.wikimedia.org/T169741)
[08:14:03] (CR) Mobrovac: [C: 1] role::mw::jobrunner: rationalize runners [puppet] - https://gerrit.wikimedia.org/r/425008 (owner: Giuseppe Lavagetto)
[08:15:25] (CR) Elukey: "Pcc looks good! 
https://puppet-compiler.wmflabs.org/compiler02/10860/" [puppet] - https://gerrit.wikimedia.org/r/424664 (owner: Ottomata)
[08:15:28] (CR) Elukey: [C: 2] Set zookeeper_cluster label for Zookeeper server prometheus monitoring [puppet] - https://gerrit.wikimedia.org/r/424664 (owner: Ottomata)
[08:15:36] (PS3) Giuseppe Lavagetto: role::mw::jobrunner: rationalize runners [puppet] - https://gerrit.wikimedia.org/r/425008
[08:17:07] (PS4) Giuseppe Lavagetto: role::mw::jobrunner: rationalize runners [puppet] - https://gerrit.wikimedia.org/r/425008
[08:19:23] (CR) Giuseppe Lavagetto: [C: 2] role::mw::jobrunner: rationalize runners [puppet] - https://gerrit.wikimedia.org/r/425008 (owner: Giuseppe Lavagetto)
[08:22:54] (CR) Mobrovac: [C: -1] "Superseded by I11f0d2b8cf859f01b8bc60253767f4c0c51350d1" [puppet] - https://gerrit.wikimedia.org/r/416481 (https://phabricator.wikimedia.org/T185052) (owner: Ppchelko)
[08:24:22] (CR) Muehlenhoff: Cumin masters in WMCS: upgrade to python3 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112) (owner: Volans)
[08:25:11] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/425011
[08:25:13] (CR) Volans: "That's exactly what I was doing, CR coming shortly ;)" [puppet] - https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112) (owner: Volans)
[08:25:15] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/425011
[08:27:00] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/425011 (owner: Marostegui)
[08:28:07] (PS4) Volans: Cumin masters in WMCS: upgrade to python3 [puppet] - https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112)
[08:28:09] (PS5) Volans: Cumin masters in prod: 
upgrade to python3 [puppet] - https://gerrit.wikimedia.org/r/412894 (https://phabricator.wikimedia.org/T187773)
[08:28:11] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/425011 (owner: Marostegui)
[08:28:45] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/425011 (owner: Marostegui)
[08:29:30] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1106 after alter table (duration: 00m 58s)
[08:29:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:16] (PS1) Marostegui: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/425012 (https://phabricator.wikimedia.org/T187089)
[08:32:51] <_joe_> !log upgrading eqiad jobrunners to ICU 57-enabled HHVM build (T189295)
[08:32:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:57] !log upgrading remaining app servers in eqiad to ICU 57-enabled HHVM build (T189295)
[08:32:57] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295
[08:33:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:19] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/425012 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[08:34:42] (Merged) jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/425012 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[08:35:55] !log jynus@tin Synchronized wmf-config/db-codfw.php: Repool es2019 (duration: 00m 59s)
[08:36:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:02] PROBLEM - HHVM jobrunner on mw1299 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[08:37:02] RECOVERY - HHVM jobrunner on mw1299 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[08:37:40] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1080 for alter table (duration: 00m 59s)
[08:37:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:51] !log Deploy schema change on db1080 - T187089 T185128 T153182
[08:37:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:58] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089
[08:37:58] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
[08:37:58] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
[08:38:40] (CR) jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/425012 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[08:43:11] PROBLEM - HHVM jobrunner on mw1303 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[08:44:12] RECOVERY - HHVM jobrunner on mw1303 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[08:44:32] <_joe_> these ^^ are the ongoing ICU upgrades
[08:45:05] !log upgrading eqiad api appservers to ICU 57-enabled HHVM build (T189295)
[08:45:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:12] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295
[08:46:01] PROBLEM - HHVM jobrunner on mw1301 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.004 second response time
[08:47:02] RECOVERY - HHVM jobrunner on mw1301 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[08:49:41] PROBLEM - HHVM jobrunner on 
mw1302 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[08:49:50] Operations, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 4 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4115944 (ema) >>! In T187014#4113030, @Nuria wrote: > @ema: on our end we just look at the ip passed along via varnishkafka to geolocate,...
[08:50:41] RECOVERY - HHVM jobrunner on mw1302 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
[08:53:21] PROBLEM - HHVM jobrunner on mw1310 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[08:54:21] RECOVERY - HHVM jobrunner on mw1310 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
[08:55:14] (PS1) Marostegui: s1.hosts: Add db2092 to s1 [software] - https://gerrit.wikimedia.org/r/425013 (https://phabricator.wikimedia.org/T191275)
[08:56:36] (CR) Marostegui: [C: 2] s1.hosts: Add db2092 to s1 [software] - https://gerrit.wikimedia.org/r/425013 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[08:57:24] (Merged) jenkins-bot: s1.hosts: Add db2092 to s1 [software] - https://gerrit.wikimedia.org/r/425013 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[08:58:07] (PS1) Marostegui: db-eqiad,db-codfw.php: Add db2092 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/425014 (https://phabricator.wikimedia.org/T170662)
[08:59:35] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Add db2092 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/425014 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[09:00:59] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Add db2092 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/425014 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[09:01:15] (CR) jenkins-bot: 
db-eqiad,db-codfw.php: Add db2092 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/425014 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[09:02:12] (CR) JackPotte: [C: 1] Switch SET on frwiktionary to use wikitexteditor by default [mediawiki-config] - https://gerrit.wikimedia.org/r/425009 (https://phabricator.wikimedia.org/T169741) (owner: 0x010C)
[09:02:21] PROBLEM - DPKG on mw1280 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:02:22] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Add db2092 to the config - T170662 (duration: 00m 58s)
[09:02:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:28] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662
[09:03:21] PROBLEM - DPKG on mw1335 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:03:21] RECOVERY - DPKG on mw1280 is OK: All packages OK
[09:03:38] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Add db2092 to the config - T170662 (duration: 00m 59s)
[09:03:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:21] RECOVERY - DPKG on mw1335 is OK: All packages OK
[09:06:41] PROBLEM - DPKG on mw1308 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:07:41] RECOVERY - DPKG on mw1308 is OK: All packages OK
[09:09:35] (CR) Giuseppe Lavagetto: [C: 2] Upgrade the repack script. 
[debs/mcrouter] - https://gerrit.wikimedia.org/r/423745 (owner: Giuseppe Lavagetto)
[09:09:51] PROBLEM - HHVM jobrunner on mw1336 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[09:09:57] (CR) Giuseppe Lavagetto: [C: 2] New upstream version 0.36.0 [debs/mcrouter] - https://gerrit.wikimedia.org/r/423746 (owner: Giuseppe Lavagetto)
[09:10:51] RECOVERY - HHVM jobrunner on mw1336 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[09:13:10] (CR) Giuseppe Lavagetto: [C: 2] New upstream version 0.37.0 [debs/mcrouter] - https://gerrit.wikimedia.org/r/423747 (owner: Giuseppe Lavagetto)
[09:17:21] PROBLEM - DPKG on mw1281 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:18:21] RECOVERY - DPKG on mw1281 is OK: All packages OK
[09:19:51] PROBLEM - puppet last run on mw1281 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm-dbg],Package[hhvm]
[09:22:01] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 11 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[hhvm-dbg],Package[hhvm]
[09:23:01] PROBLEM - HHVM jobrunner on mw1309 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[09:24:01] RECOVERY - HHVM jobrunner on mw1309 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
[09:26:31] PROBLEM - DPKG on mw1317 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:27:31] RECOVERY - DPKG on mw1317 is OK: All packages OK
[09:28:51] PROBLEM - DPKG on mw1340 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:29:51] RECOVERY - DPKG on mw1340 is OK: All packages OK
[09:33:09] <_joe_> !log all eqiad jobrunners migrated to ICU 57 (T189295)
[09:33:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:15] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295
[09:42:01] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:42:30] (PS1) Matthias Mullie: New SSH key for mlitn [puppet] - https://gerrit.wikimedia.org/r/425018
[09:49:36] (PS1) Muehlenhoff: Remove mw1259/mw1260 from site.pp [puppet] - https://gerrit.wikimedia.org/r/425019 (https://phabricator.wikimedia.org/T187466)
[09:49:42] RECOVERY - puppet last run on mw1281 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:54:36] !log upgrading mwdebug servers in eqiad to ICU 57-enabled HHVM build (T189295)
[09:54:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:42] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295
[09:55:31] (CR) Muehlenhoff: [C: 2] New SSH key for mlitn [puppet] - https://gerrit.wikimedia.org/r/425018 (owner: Matthias Mullie)
[09:57:28] (CR) Aaron Schulz: [C: 1] Create a LockManager for WikidataDispatch with short TTL 
[mediawiki-config] - https://gerrit.wikimedia.org/r/395967 (https://phabricator.wikimedia.org/T178652) (owner: Addshore)
[09:58:12] (CR) Aaron Schulz: [C: 1] Use new wikibase dispatch lock manager on wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/395969 (https://phabricator.wikimedia.org/T178652) (owner: Addshore)
[09:59:31] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/425020 (https://phabricator.wikimedia.org/T128546)
[10:00:38] (PS1) Marostegui: db2092.yaml: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/425022 (https://phabricator.wikimedia.org/T170662)
[10:01:27] (CR) Marostegui: [C: 2] db2092.yaml: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/425022 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[10:01:39] (CR) Jdrewniak: [C: 2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/425020 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:02:41] (PS1) Elukey: profile::zookeeper::server: move monitoring/alerting to Prometheus [puppet] - https://gerrit.wikimedia.org/r/425023 (https://phabricator.wikimedia.org/T177460)
[10:02:53] (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/425020 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:03:08] (CR) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/425020 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:04:41] PROBLEM - DPKG on mw1315 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:05:41] RECOVERY - DPKG on mw1315 is OK: All packages OK
[10:08:12] PROBLEM - DPKG on mw1285 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:09:12] RECOVERY - DPKG on mw1285 is OK: All packages OK
[10:09:39] !log jdrewniak@tin Synchronized portals/wikipedia.org/assets: 
Wikimedia Portals Update: [[gerrit:425020|Bumping portals to master (T128546)]] (duration: 00m 59s) [10:09:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:09:45] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [10:10:40] !log jdrewniak@tin Synchronized portals: Wikimedia Portals Update: [[gerrit:425020|Bumping portals to master (T128546)]] (duration: 00m 59s) [10:10:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:13:17] !log completed upgrade of mw eqiad api appservers to ICU 57-enabled HHVM [10:13:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:06] !log upgrading tin/deploy1001 to a ICU 57-enabled HHVM build (T189295) [10:15:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:13] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [10:19:31] (03CR) 10DCausse: Logstash: Add initial network syslog parsing (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/424643 (owner: 10Ayounsi) [10:31:00] !log upgrading Boost libraries on app server canaries with a ICU 57-enabled HHVM build and restart HHVM (T189295) [10:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:31:06] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [10:39:02] (03PS1) 10Giuseppe Lavagetto: Revert "Revert "Stop forcing php5 in `mwscript`"" [puppet] - 10https://gerrit.wikimedia.org/r/425026 [10:39:13] <_joe_> elukey, moritzm ^^ [10:39:42] (03CR) 10jerkins-bot: [V: 04-1] Revert "Revert "Stop forcing php5 in `mwscript`"" [puppet] - 10https://gerrit.wikimedia.org/r/425026 (owner: 10Giuseppe Lavagetto) [10:41:39] !log upgrading Boost libraries on mw1300 with a ICU 57-enabled HHVM build and restart HHVM (T189295) [10:41:44] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:41:45] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [10:43:04] (03PS2) 10Giuseppe Lavagetto: Revert "Revert "Stop forcing php5 in `mwscript`"" [puppet] - 10https://gerrit.wikimedia.org/r/425026 [10:44:55] (03CR) 10Muehlenhoff: [C: 031] Revert "Revert "Stop forcing php5 in `mwscript`"" [puppet] - 10https://gerrit.wikimedia.org/r/425026 (owner: 10Giuseppe Lavagetto) [10:49:55] 10Operations, 10wikidiff2, 10Patch-For-Review, 10WMDE-QWERTY-Team-Board: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4116304 (10WMDE-Fisch) >>! In T190717#4111781, @MoritzMuehlenhoff wrote: >>>! In T190717#4111420, @Lea_WMDE wrote: >> >> Could you put... [10:52:58] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::hhvm: default php to php7 on stretch [puppet] - 10https://gerrit.wikimedia.org/r/425027 [10:53:06] <_joe_> moritzm: ^^ the followup [10:53:34] (03CR) 10Giuseppe Lavagetto: [C: 032] Revert "Revert "Stop forcing php5 in `mwscript`"" [puppet] - 10https://gerrit.wikimedia.org/r/425026 (owner: 10Giuseppe Lavagetto) [10:53:39] (03CR) 10jerkins-bot: [V: 04-1] profile::mediawiki::hhvm: default php to php7 on stretch [puppet] - 10https://gerrit.wikimedia.org/r/425027 (owner: 10Giuseppe Lavagetto) [10:55:30] (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::hhvm: default php to php7 on stretch [puppet] - 10https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T189295) [11:00:53] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/10861/" [puppet] - 10https://gerrit.wikimedia.org/r/425023 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) [11:01:18] (03PS2) 10Elukey: profile::zookeeper::server: move monitoring/alerting to Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/425023 (https://phabricator.wikimedia.org/T177460) [11:01:20] (03CR) 10Muehlenhoff: [C: 
031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T189295) (owner: 10Giuseppe Lavagetto) [11:04:47] !log upgrading Boost libraries on API server canaries with a ICU 57-enabled HHVM build and restart HHVM (T189295) [11:04:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:53] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [11:07:01] <_joe_> moritzm: uhm apparently we haven't installed php7 at all on deploy1001 [11:07:04] <_joe_> wtf [11:07:40] ah, no wait [11:07:42] _joe_: deploy1001 is Jessie [11:07:50] it was reimaged back to jessie.. [11:07:55] (03PS1) 10Hoo man: writeuptopageid.1: Fix typo [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/425028 [11:08:28] <_joe_> oh man, why on earth [11:08:38] <_joe_> paladox: do you have a link to an explanation? [11:08:48] commit 8121532 in puppet.git [11:09:29] but no further comments on https://phabricator.wikimedia.org/T175288 [11:09:41] <_joe_> nice, so we've downgraded to jessie, but not migrated [11:09:53] <_joe_> so now we have to reinstall it with stretch ASAP [11:09:58] <_joe_> rotfl [11:10:22] _joe_: I think there was an irc discussion [11:10:34] In -releng a few weeks ago [11:10:37] <_joe_> paladox: yeah I happened not to be around [11:10:44] <_joe_> paladox: a few weeks == 1 week ago [11:10:51] Oh [11:11:16] <_joe_> because this was merged exactly on the day I left for vacations, after I warned against the ICU incompatibility [11:11:19] I think it was reverted due to a script failing to run [11:11:30] (Translations or i18n) [11:11:37] <_joe_> and since the ICU upgrade is happening today, well.. 
[11:13:29] _joe_: moritzm https://phabricator.wikimedia.org/T185275#4087555 [11:14:08] <_joe_> paladox: yeah that was my suggestion, not to switch to that server because of the ICU upgrade [11:14:13] <_joe_> which is happening now [11:14:15] Yep [11:14:17] https://phabricator.wikimedia.org/T190909 [11:14:19] And ^^ [11:14:21] <_joe_> so this is all very unfortunate [11:14:43] <_joe_> that ticket is just wrong, or better [11:14:54] <_joe_> is the consequence of a bad decision, which I just reverted [11:15:06] <_joe_> anyways, sorry, need to focus a bit on the problem at hand [11:15:16] Ok [11:15:30] PROBLEM - Zookeeper Alive Client Connections too high on druid1001 is CRITICAL: CRITICAL - scalar(org_apache_ZooKeeperService_NumAliveConnections{instance=druid1001:12181, zookeeper_cluster=druid-analytics-eqiad}): bad_data: parse error at char 102: missing comma before next identifier druid https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=6&fullscreen [11:15:40] PROBLEM - Zookeeper Alive Client Connections too high on druid1004 is CRITICAL: CRITICAL - scalar(org_apache_ZooKeeperService_NumAliveConnections{instance=druid1004:12181, zookeeper_cluster=druid-public-eqiad}): bad_data: parse error at char 102: missing comma before next identifier druid https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=6&fullscreen [11:15:49] * paladox goes to lunch :) [11:16:30] the above are mine, migrating to prometheus [11:16:31] (03CR) 10Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler02/10862/" [puppet] - 10https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T189295) (owner: 10Giuseppe Lavagetto) [11:17:02] and I have tested those queries first, sigh [11:18:39] PROBLEM - Zookeeper Alive Client Connections too high on druid1006 is CRITICAL: CRITICAL - scalar(org_apache_ZooKeeperService_NumAliveConnections{instance=druid1006:12181, zookeeper_cluster=druid-public-eqiad}): bad_data: parse error 
at char 102: missing comma before next identifier druid https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=6&fullscreen [11:18:47] Anyone available to merge a 2-line change to a utility that's stored in Puppet? https://gerrit.wikimedia.org/r/#/c/424761/ -- already reviewed, just needs someone with +2 to hit the button. [11:19:31] 10Puppet, 10Beta-Cluster-Infrastructure: deployment-mira: puppet broken 2018-04-09 - https://phabricator.wikimedia.org/T191786#4116350 (10MarcoAurelio) [11:20:26] marlier: sure, will do it later on (after fixing my monitoring mess :) [11:20:27] (03PS3) 10Giuseppe Lavagetto: profile::mediawiki::hhvm: default php to php7 on stretch [puppet] - 10https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T189295) [11:20:40] elukey: thanks! [11:22:35] (03PS1) 10Elukey: profile::zookeeper::server: fix prometheus query string [puppet] - 10https://gerrit.wikimedia.org/r/425029 (https://phabricator.wikimedia.org/T177460) [11:23:09] (03CR) 10Elukey: [C: 032] profile::zookeeper::server: fix prometheus query string [puppet] - 10https://gerrit.wikimedia.org/r/425029 (https://phabricator.wikimedia.org/T177460) (owner: 10Elukey) [11:27:24] (03CR) 10Elukey: [C: 032] coal: Fix property name that indicates an oversample [puppet] - 10https://gerrit.wikimedia.org/r/424761 (https://phabricator.wikimedia.org/T191239) (owner: 10Imarlier) [11:27:28] (03PS3) 10Elukey: coal: Fix property name that indicates an oversample [puppet] - 10https://gerrit.wikimedia.org/r/424761 (https://phabricator.wikimedia.org/T191239) (owner: 10Imarlier) [11:28:37] RECOVERY - Zookeeper Alive Client Connections too high on druid1001 is OK: OK - scalar(org_apache_ZooKeeperService_NumAliveConnections{instance=druid1001:12181, zookeeper_cluster=druid-analytics-eqiad}) within thresholds https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=6&fullscreen [11:28:37] RECOVERY - Zookeeper Alive Client Connections too high on 
druid1006 is OK: OK - scalar(org_apache_ZooKeeperService_NumAliveConnections{instance=druid1006:12181, zookeeper_cluster=druid-public-eqiad}) within thresholds https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=6&fullscreen [11:28:56] RECOVERY - Zookeeper Alive Client Connections too high on druid1004 is OK: OK - scalar(org_apache_ZooKeeperService_NumAliveConnections{instance=druid1004:12181, zookeeper_cluster=druid-public-eqiad}) within thresholds https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=6&fullscreen [11:29:01] \o/ [11:29:56] marlier: done! [11:30:22] Thank you! [11:35:55] moritzm: Might puppet broken beta be you? ("Apt::Repository[hhvm-icu57] is already declared at ...icu57, cannot redeclare at ...hhvm"). Task is T191786. Your commit 0b1a32743f seems related [11:35:56] T191786: deployment-mira: puppet broken 2018-04-09 - https://phabricator.wikimedia.org/T191786 [11:37:54] 10Puppet, 10Beta-Cluster-Infrastructure: deployment-mira: puppet broken 2018-04-09 - https://phabricator.wikimedia.org/T191786#4116392 (10EddieGP) p:05Triage>03Unbreak! Puppet broken on all the appservers, jobrunners and deployment servers in beta. 
[11:38:35] eddiegp: I think that /etc/puppet/modules/profile/manifests/beta/icu57.pp needs to be removed, pretty sure releng is the best team to ask for what's best [11:39:18] eddiegp: we need to remove profile::beta::icu57 from the deployment-prep Hiera config now that it's part of the standard app server manifests [11:39:53] Now let's find it in the various places cloud manages its hiera in ;) [11:40:22] I'm checking horizon [11:40:44] Not in puppet.git [11:41:19] I've removed it from Horizon, next puppet runs should recover [11:41:50] I'll give it a try on one of the appservers [11:41:53] running on -tin now [11:42:08] !log removed profile::beta::icu57 from deployment-prep Hiera config now that the component is part of the standard app server manifests [11:42:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:45] puppet run succeeds. [11:43:04] Thanks moritzm elukey Hauskatze! [11:43:04] success on -tin [11:43:12] moving to mira [11:43:53] 10Puppet, 10Beta-Cluster-Infrastructure: deployment-mira: puppet broken 2018-04-09 - https://phabricator.wikimedia.org/T191786#4116399 (10EddieGP) 05Open>03Resolved a:03EddieGP Fixed by changing hiera, should recover with the next puppet run. [11:44:04] (03CR) 10Hoo man: [C: 032] Support prefixed dump types [dumps/dcat] - 10https://gerrit.wikimedia.org/r/390312 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [11:44:52] (03Merged) 10jenkins-bot: Support prefixed dump types [dumps/dcat] - 10https://gerrit.wikimedia.org/r/390312 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [11:45:29] * eddiegp gets back to work [11:45:33] see you later! 
[11:49:38] (03PS1) 10EddieGP: Remove profile::beta::icu57 [puppet] - 10https://gerrit.wikimedia.org/r/425032 [11:50:45] !log upgrading Boost libraries on remaining app servers with a ICU 57-enabled HHVM build and restart HHVM (T189295) [11:50:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:51] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [11:52:00] eddiegp: thanks, I'll merge that when I have some time available [12:01:30] !log upgrading Boost libraries on all mediawiki eqiad API server with a ICU 57-enabled HHVM build and restart HHVM (T189295) [12:01:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:01:37] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [12:05:02] <_joe_> !log upgrading boost, hhvm on terbium for ICU 57 upgrade (T189295) [12:05:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:44] <_joe_> moritzm: tin is the last one to upgrade, would you do the honours once you're done with the rest? 
[12:08:09] ack, completing the boost upgrade on app servers shortly, then I'll move on to tin [12:09:50] <_joe_> and the snapshots, of course [12:12:42] <_joe_> yeah, hhvm is untenably slower [12:14:06] <_joe_> for mediawiki.org, it took 4 minutes 35 seconds, vs 3m 13 seconds for php 7.0 [12:15:03] (03PS1) 10Hoo man: Make DCAT backwards compatible to old config [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425038 [12:19:40] (03PS2) 10Hoo man: Make DCAT backwards compatible to old config [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425038 [12:23:30] <_joe_> !log preparing to run updateCollation from mw1338: stop videoscaler, disable puppet (T189295) [12:23:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:37] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [12:25:59] (03CR) 10Hoo man: [C: 032] "While testing this for deployment (with the old configuration), I noticed that this breaks compatibility with the old configuration format" [dumps/dcat] - 10https://gerrit.wikimedia.org/r/390312 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [12:27:22] PROBLEM - HHVM rendering on mw1283 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time [12:27:32] PROBLEM - Apache HTTP on mw1283 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.003 second response time [12:27:33] (03CR) 10Hoo man: [C: 032] "Sorry, I pasted the wrong link https://gerrit.wikimedia.org/r/425038 is the follow up that fixes compatibility." 
[dumps/dcat] - 10https://gerrit.wikimedia.org/r/390312 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [12:27:54] mw1283 is me, should solve soon [12:28:23] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 79390 bytes in 0.352 second response time [12:28:32] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.142 second response time [12:33:07] (03PS1) 10Vgutierrez: lvs: Get rid of interface names on site.pp [puppet] - 10https://gerrit.wikimedia.org/r/425040 (https://phabricator.wikimedia.org/T177961) [12:34:22] PROBLEM - puppet last run on mw1331 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 26 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[libboost1.55-dbg] [12:35:35] (03PS3) 10Hoo man: Make DCAT backwards compatible to old config [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425038 (https://phabricator.wikimedia.org/T163328) [12:39:22] RECOVERY - puppet last run on mw1331 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [12:39:36] !log upgrading Boost libraries on job runners with a ICU 57-enabled HHVM build and restart HHVM (T189295) [12:39:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:42] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [12:41:32] 10Operations, 10Puppet, 10Beta-Cluster-Infrastructure, 10media-storage, 10Patch-For-Review: Puppet broken on deployment-ms-be0[34] with evaluation error in swift module - https://phabricator.wikimedia.org/T184236#4116525 (10MarcoAurelio) ``` Linux deployment-ms-be04 4.9.0-0.bpo.5-amd64 #1 SMP Debian 4.9.... 
[12:42:38] unable to init lv-a for swift at /etc/puppet/modules/swift/manifests/init_device.pp:3:9 at /etc/puppet/modules/role/manifests/swift/storage.pp:23 on node deployment-ms-be04.deployment-prep.eqiad.wmflabs [12:46:38] (03PS4) 10MarcoAurelio: admin: change ssh key for Sharvaniharan [puppet] - 10https://gerrit.wikimedia.org/r/424778 (https://phabricator.wikimedia.org/T191673) [12:48:52] PROBLEM - HHVM jobrunner on mw1300 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.003 second response time [12:49:37] moritzm: is https://gerrit.wikimedia.org/r/#/c/424778/ okay? - refs. T191673 [12:49:38] T191673: Update SSH key in production hosts for @Sharvaniharan - https://phabricator.wikimedia.org/T191673 [12:49:52] RECOVERY - HHVM jobrunner on mw1300 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [12:51:17] (03CR) 10Muehlenhoff: "Thanks. I'll have a look at this and merge soon when the ICU migration has calmed down." [puppet] - 10https://gerrit.wikimedia.org/r/424778 (https://phabricator.wikimedia.org/T191673) (owner: 10MarcoAurelio) [12:54:53] !log upgrading Boost libraries on mwdebug with a ICU 57-enabled HHVM build and restart HHVM (T189295) [12:54:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:55:00] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [12:57:31] (03CR) 10Lokal Profil: "Would it make sense to rather add a version number to the config and check for that in the script? 
(rather than maintaining backwards comp" [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425038 (https://phabricator.wikimedia.org/T163328) (owner: 10Hoo man) [12:58:04] (03PS1) 10Ema: varnish: restart backends every 7 days [puppet] - 10https://gerrit.wikimedia.org/r/425045 (https://phabricator.wikimedia.org/T181315) [12:59:42] (03CR) 10Lokal Profil: "The related config update is https://gerrit.wikimedia.org/r/#/c/424291/" [dumps/dcat] - 10https://gerrit.wikimedia.org/r/390312 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [13:00:04] (03CR) 10Hoo man: "> Would it make sense to rather add a version number to the config and check for that in the script?" [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425038 (https://phabricator.wikimedia.org/T163328) (owner: 10Hoo man) [13:00:32] (03CR) 10Lokal Profil: "The related config update is https://gerrit.wikimedia.org/r/#/c/424288/" [dumps/dcat] - 10https://gerrit.wikimedia.org/r/386366 (https://phabricator.wikimedia.org/T178993) (owner: 10JakobVoss) [13:00:47] !log sbisson@tin Started deploy [tilerator/deploy@aef010b]: Deploying tilerator i18n to maps-test* (with updated source and style) [13:00:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:19] !log sbisson@tin Finished deploy [tilerator/deploy@aef010b]: Deploying tilerator i18n to maps-test* (with updated source and style) (duration: 00m 33s) [13:01:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:00] what's with joucebot? [13:02:31] (03CR) 10Jcrespo: "Just one small nitpick." 
(031 comment) [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/420746 (owner: 10Rduran) [13:02:45] anyway, seems like there is nothing for swat [13:03:10] <_joe_> !log upgrading HHVM / libboost for ICU 57 upgrade (T189295) [13:03:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:16] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [13:03:28] <_joe_> zeljkof: good, is there anything for later on? [13:03:44] !log Stop MySQL on db1080 for mariadb and kernel upgrade [13:03:47] (03PS1) 10Ema: Revert "varnish: restart backends every 3.5 days" [puppet] - 10https://gerrit.wikimedia.org/r/425046 [13:03:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:04:14] (03Abandoned) 10Ema: varnish: restart backends every 7 days [puppet] - 10https://gerrit.wikimedia.org/r/425045 (https://phabricator.wikimedia.org/T181315) (owner: 10Ema) [13:05:31] _joe_: I don't see anything for today https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180409T1300 [13:06:08] <_joe_> zeljkof: ok thanks [13:07:25] (03PS1) 10Marostegui: db-codfw.php: Pool db2092 in s1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425047 (https://phabricator.wikimedia.org/T170662) [13:08:28] (03PS2) 10Ema: Revert "varnish: restart backends every 3.5 days" [puppet] - 10https://gerrit.wikimedia.org/r/425046 [13:10:16] (03PS2) 10Vgutierrez: lvs: Get rid of interface names on site.pp [puppet] - 10https://gerrit.wikimedia.org/r/425040 (https://phabricator.wikimedia.org/T177961) [13:11:50] (03PS2) 10Marostegui: db-codfw.php: Pool db2092 in s1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425047 (https://phabricator.wikimedia.org/T170662) [13:13:11] _joe_: anything going on at the moment? there is a patch for swat, just added https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180409T1300 [13:13:20] ok to deploy it? 
[13:13:31] zeljkof: I was about to deploy: https://gerrit.wikimedia.org/r/#/c/425047/ but I can wait :) [13:13:41] (03CR) 10Vgutierrez: "pcc (lvs1001 & lvs5003) looks good: https://puppet-compiler.wmflabs.org/compiler02/10866/" [puppet] - 10https://gerrit.wikimedia.org/r/425040 (https://phabricator.wikimedia.org/T177961) (owner: 10Vgutierrez) [13:14:09] <_joe_> !log started updateCollation.php maintenance script for the ICU 57 migration (T189295) [13:14:11] marostegui: ok, I'll let you know when I am done [13:14:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:14:16] T189295: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295 [13:14:18] zeljkof: great - thanks [13:14:31] <_joe_> zeljkof: go on, if scap is slow, it's expected at this point in time [13:14:32] !log upgrading Boost libraries on mwdebug with a ICU 57-enabled HHVM build and restart HHVM (T189295) [13:14:37] <_joe_> sync-file should be ok [13:14:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:14:46] Zoranzoki21: around for swat? [13:14:57] Zoranzoki21: you can start [13:16:24] Zoranzoki21: reviewing it [13:17:03] zeljkof: Ok. Before adding +2, do rebase. Thanks [13:18:33] (03PS2) 10Gehel: wdqs: LVS and conftool configuration for new wdqs-internal service [puppet] - 10https://gerrit.wikimedia.org/r/424599 (https://phabricator.wikimedia.org/T187766) [13:18:36] 10Operations, 10Analytics, 10Analytics-Data-Quality, 10Analytics-Kanban, and 4 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4116636 (10Ottomata) > No, X-Client-IP is either: ...ehhh wha? We used to collect XFF on the webrequest side, and then parse it to get `ip`... 
[13:18:53] Zoranzoki21: looks like rebase is not needed [13:19:17] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423188 (https://phabricator.wikimedia.org/T190944) (owner: 10Zoranzoki21) [13:19:26] (03CR) 10Ema: [C: 032] Revert "varnish: restart backends every 3.5 days" [puppet] - 10https://gerrit.wikimedia.org/r/425046 (owner: 10Ema) [13:19:31] zeljkof: I think same, but jerkins.. [13:20:27] (03Abandoned) 10Ema: Revert "varnish: restart backends every 3.5 days" [puppet] - 10https://gerrit.wikimedia.org/r/421943 (owner: 10BBlack) [13:20:31] (03Merged) 10jenkins-bot: Enable on ku.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423188 (https://phabricator.wikimedia.org/T190944) (owner: 10Zoranzoki21) [13:20:47] (03CR) 10jenkins-bot: Enable on ku.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423188 (https://phabricator.wikimedia.org/T190944) (owner: 10Zoranzoki21) [13:21:02] 10Operations, 10ops-eqiad, 10DBA: Rack and setup 8 new eqiad DBs - https://phabricator.wikimedia.org/T191792#4116638 (10Marostegui) p:05Triage>03Normal [13:21:25] zeljkof: on mwdebug1002 is? [13:22:00] Zoranzoki21: yes, it's there [13:22:42] zeljkof: Wait to test [13:27:02] zeljkof: All is ok, you can deploy [13:28:16] Zoranzoki21: ok, deploying [13:29:28] zeljkof, hello, do you have time in the SWAT window? [13:29:29] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:423188|Enable on ku.wikipedia (T190944)]] (duration: 00m 57s) [13:29:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:29:36] T190944: Enable on ku.wikipedia - https://phabricator.wikimedia.org/T190944 [13:29:45] Urbanecm: 30 more minutes [13:30:06] Zoranzoki21: deployed, please check [13:30:14] Ok, I'll add my patches into the calendar. Thanks! [13:30:57] zeljkof: ok is all. Thank you! [13:31:30] Zoranzoki21: thanks for deploying with #releng! 
;) [13:31:47] zeljkof, added, can you please process my patches? Thanks! [13:31:56] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey, 10User-notice: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4116681 (10Joe) [13:31:57] Urbanecm: starting :) [13:32:14] (03PS2) 10Zfilipin: Fix broken line that includes a group into a group by mistake [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424851 (https://phabricator.wikimedia.org/T191719) (owner: 10Urbanecm) [13:32:22] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 614.98 seconds [13:33:31] (03PS2) 10Urbanecm: Add adm.dp.gov.ua to wgCopyUploadDomains, change if.gov.ua to www.if.gov.ua [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424756 (https://phabricator.wikimedia.org/T191692) [13:33:38] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424851 (https://phabricator.wikimedia.org/T191719) (owner: 10Urbanecm) [13:34:30] (03PS2) 10Urbanecm: Enable RelatedArticles for vector at hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424757 (https://phabricator.wikimedia.org/T191573) [13:34:50] (03Merged) 10jenkins-bot: Fix broken line that includes a group into a group by mistake [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424851 (https://phabricator.wikimedia.org/T191719) (owner: 10Urbanecm) [13:35:01] (03PS1) 10Gehel: wdqs-internal: new entry for service discovery [dns] - 10https://gerrit.wikimedia.org/r/425051 (https://phabricator.wikimedia.org/T187766) [13:35:38] Urbanecm: 424851 is at mwdebug [13:35:59] Testing [13:36:14] (03PS3) 10Zfilipin: Add adm.dp.gov.ua to wgCopyUploadDomains, change if.gov.ua to www.if.gov.ua [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424756 (https://phabricator.wikimedia.org/T191692) (owner: 10Urbanecm) [13:36:18] zeljkof, please deploy, working [13:36:28] Urbanecm: deploying [13:36:32] ack [13:37:36] !log 
zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:424851|Fix broken line that includes a group into a group by mistake (T191719)]] (duration: 00m 59s) [13:37:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:42] T191719: Fix broken IS.php line about "epcoordinator" for arbcom@cswiki - https://phabricator.wikimedia.org/T191719 [13:37:59] Urbanecm: deployed [13:38:25] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424756 (https://phabricator.wikimedia.org/T191692) (owner: 10Urbanecm) [13:39:08] ack [13:39:34] (03CR) 10jenkins-bot: Fix broken line that includes a group into a group by mistake [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424851 (https://phabricator.wikimedia.org/T191719) (owner: 10Urbanecm) [13:39:50] (03Merged) 10jenkins-bot: Add adm.dp.gov.ua to wgCopyUploadDomains, change if.gov.ua to www.if.gov.ua [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424756 (https://phabricator.wikimedia.org/T191692) (owner: 10Urbanecm) [13:40:33] Urbanecm: 424756 is at mwdebug [13:40:45] ack, testing [13:41:11] (03PS3) 10Zfilipin: Enable RelatedArticles for vector at hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424757 (https://phabricator.wikimedia.org/T191573) (owner: 10Urbanecm) [13:41:52] working, please deploy [13:42:24] Urbanecm: deploying [13:42:57] ack [13:43:18] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:424756|Add adm.dp.gov.ua to wgCopyUploadDomains, change if.gov.ua to www.if.gov.ua (T191692)]] (duration: 00m 59s) [13:43:23] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424757 (https://phabricator.wikimedia.org/T191573) (owner: 10Urbanecm) [13:43:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:24] T191692: Add adm.dp.gov.ua to copy whitelist domains - https://phabricator.wikimedia.org/T191692 [13:43:38] 
Urbanecm: 424756 is deployed [13:44:04] ack [13:44:45] (03Merged) 10jenkins-bot: Enable RelatedArticles for vector at hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424757 (https://phabricator.wikimedia.org/T191573) (owner: 10Urbanecm) [13:44:47] (03CR) 10jenkins-bot: Add adm.dp.gov.ua to wgCopyUploadDomains, change if.gov.ua to www.if.gov.ua [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424756 (https://phabricator.wikimedia.org/T191692) (owner: 10Urbanecm) [13:44:59] (03CR) 10jenkins-bot: Enable RelatedArticles for vector at hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424757 (https://phabricator.wikimedia.org/T191573) (owner: 10Urbanecm) [13:45:33] Urbanecm: 424757 is at mwdebug [13:46:12] testing [13:46:25] working, please deploy [13:47:17] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey, 10User-notice: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4116734 (10Joe) [13:47:21] Urbanecm: deploying [13:48:08] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:424757|Enable RelatedArticles for vector at hewiki (T191573)]] (duration: 00m 59s) [13:48:09] Alaa: as you see zeljkof is currently doing SWAT, but it's mostly done [13:48:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:15] T191573: Add RelatedArticles to Vector skin on he.wikipedia - https://phabricator.wikimedia.org/T191573 [13:48:43] Urbanecm: deployed, please check and thanks for deploying with #releng ;) [13:48:55] Snoke: Alaa done with swat [13:49:06] Thank you! 
[13:49:09] !log EU SWAT finished [13:49:12] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1971 bytes in 0.084 second response time [13:49:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:16] zeljkof: Alaa needs a sysadmin for deleting a page with > 5000 versions [13:49:22] (to check) [13:49:26] Yup exactly [13:49:28] 5451 revisions [13:49:35] (03CR) 10Marostegui: [C: 032] db-codfw.php: Pool db2092 in s1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425047 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [13:49:36] _joe_: marostegui : done with eu swat [13:49:37] this page https://ko.wikipedia.org/w/index.php?title=%EC%82%AC%EC%9A%A9%EC%9E%90:Dynamicwork/%ED%86%B5%EA%B3%84&action=history [13:49:43] zeljkof: :) [13:50:19] Snoke: Alaa sorry, can not help with that [13:50:24] :( [13:50:40] I think that is DBA matter [13:50:41] hm, who from ops is currently online? [13:50:53] (03Merged) 10jenkins-bot: db-codfw.php: Pool db2092 in s1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425047 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [13:50:59] I'm talking about it with anomie over somewhere [13:51:05] marostegui: is that (deleting a page with >5000 revisions a DBA matter, or is a sysadmin "enough"? 
[13:51:10] I think I should bring anomie to here ;o [13:51:10] (03CR) 10jenkins-bot: db-codfw.php: Pool db2092 in s1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425047 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [13:51:22] RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 296.95 seconds [13:51:56] you don't need a sysadmin to delete a page with more than 5000 revids, just a steward [13:52:06] Snoke: I don't think you need a DBA for that, we do not touch those [13:52:08] Snoke: the channel description says "Ops Clinic Duty: herron" [13:52:20] Hauskatze: I think we already have two ones here, who asked already for a sysadmin :D [13:52:28] marostegui: ok, thx :) [13:52:33] revi: Alaa ^ [13:52:35] Hauskatze: "Everything between those numbers *may* be feasible, but proceed with caution and definitely seek advice from the ops team." [13:52:39] Yup yup [13:52:46] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Pool db2092 in s1 T170662 (duration: 00m 59s) [13:52:48] what our guide says it to pint -operations in case of large page deletions just in case [13:52:48] (03PS4) 10Elukey: Modify eventlogging purging script to read from YAML whitelist [puppet] - 10https://gerrit.wikimedia.org/r/420685 (https://phabricator.wikimedia.org/T189692) (owner: 10Mforns) [13:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:52] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662 [13:53:06] I'm confusing with all of this messages :D [13:53:30] (03CR) 10Gilles: Fix $wgLocalFileRepo definition (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424618 (https://phabricator.wikimedia.org/T191643) (owner: 10Gilles) [13:53:36] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425052 [13:53:39] Alaa: use "본인의 사용자 문서 삭제 신청" as reason [13:53:59] so I'll delete 
it now :D [13:54:05] that will relieve locals not having to understand English :P [13:54:06] Alaa: btw, looks a -tech [13:54:12] Dereckson replied as well :D [13:54:17] revi: where did you see the "proceed with caution and definitely seek advice from the ops team."? [13:54:28] 10Operations, 10Goal, 10Patch-For-Review: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208#4116751 (10Marostegui) [13:54:29] it was docs on stewardwiki [13:54:33] 10Operations, 10DBA, 10Gerrit, 10Patch-For-Review, 10Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532#4116752 (10Marostegui) [13:54:35] I will update with best practices [13:54:42] ah, then I cannot update it [13:54:48] give me the best practice [13:54:55] but you can email us jynus and we can update it for you [13:54:57] and I will be the proxy [13:54:58] ok [13:55:20] I think stewards-l@lists.wikimedia.org allows @wikimedia.org mails? [13:56:06] Done, deleted :D [13:56:08] https://ko.wikipedia.org/wiki/%EC%82%AC%EC%9A%A9%EC%9E%90:Dynamicwork/%ED%86%B5%EA%B3%84 [13:56:10] if needed, I think moderator can be approve from external mails (steward-l ) [13:56:14] What I can do is to check server logs if all is fine. [13:56:15] confirmed [13:56:16] revi: yes [13:56:27] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey, 10User-notice: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4037934 (10matmarex) [13:56:58] thanks for everyone :) [13:57:00] The only thing I see raising is 77 occurences of this: Notice: JobQueueGroup::__destruct: 1 buffered job(s) of type(s) JobSpecification never inserted. in /srv/mediawiki/php-1.31.0-wmf.28/includes/jobqueue/JobQueueGroup.php on line 472 [13:57:35] and it's possible it's unrelated [13:57:37] jynus: if you need old best practice I assume I can give it to you so you can compare it? 
(not sure, Hauskatze?) [13:57:37] Yes, because it's only 5,451 revisions (not that high) [13:57:49] revi: "deletes with more than dozens of thousands of edits are known to have failed in the past. It is almost impossible to break anything, because mechanisms that prevent slow database changes, but if you see it consistently failing, report the issue on phabricator so website operators can have a look or delete in another way" [13:58:07] (03PS2) 10Gehel: wdqs: new wdqs-internal service [dns] - 10https://gerrit.wikimedia.org/r/424587 (https://phabricator.wikimedia.org/T187766) [13:58:14] (03CR) 10Muehlenhoff: [C: 032] admin: change ssh key for Sharvaniharan [puppet] - 10https://gerrit.wikimedia.org/r/424778 (https://phabricator.wikimedia.org/T191673) (owner: 10MarcoAurelio) [13:58:20] (03PS5) 10Muehlenhoff: admin: change ssh key for Sharvaniharan [puppet] - 10https://gerrit.wikimedia.org/r/424778 (https://phabricator.wikimedia.org/T191673) (owner: 10MarcoAurelio) [13:58:34] (03Abandoned) 10Gehel: wdqs-internal: new entry for service discovery [dns] - 10https://gerrit.wikimedia.org/r/425051 (https://phabricator.wikimedia.org/T187766) (owner: 10Gehel) [13:58:50] revi: everything on that wiki is private, you must not quote or link to nothing hosted there [13:58:54] hmmmm [13:58:56] I would put the limit now a days on dozens of thousands, but in most cases, the action will fail rather than create issues [13:59:00] Hauskatze: yeah, gotit [13:59:03] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Slowly repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425052 (owner: 10Marostegui) [13:59:44] Hauskatze: note that all ops have signed an NDA, as we technically have access [14:00:07] Yes, but the channel is public and logged [14:00:13] correct [14:00:14] ofc we have DMs [14:00:28] my advice, however, is public [14:00:32] revi: then make sure that both users are using SSL :P [14:00:36] applies to everybody [14:00:55] and/or emails [14:01:34] (03Merged) 
10jenkins-bot: db-eqiad.php: Slowly repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425052 (owner: 10Marostegui) [14:02:14] if the delete, for example, was due to private data, you do not want it announced publicly [14:02:25] you would create a "security ticket" [14:03:06] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425052 (owner: 10Marostegui) [14:03:08] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 after alter table, kernel and mariadb upgrade (duration: 00m 59s) [14:03:11] 10Operations, 10Patch-For-Review: Update SSH key in production hosts for @Sharvaniharan - https://phabricator.wikimedia.org/T191673#4116769 (10MoritzMuehlenhoff) 05Open>03Resolved @MarcoAurelio Thanks for preparing a patch, now merged. @Sharvaniharan : I've ran puppet on releases1001.eqiad.wmnet, you shou... [14:03:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:22] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 4 others: Thumbor incorrectly normalizes .jpe and .jpeg into .jpg for Swift thumbnail storage - https://phabricator.wikimedia.org/T191028#4116773 (10Gilles) 05Open>03Resolved The issue should fix itself (and it has, for the file me... [14:05:41] Fun task of the day. Send to storage a 16 Gb and a 18 Gb video for Commons. 
[14:06:35] (03PS1) 10Marostegui: db-eqiad.php: Depool db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425054 (https://phabricator.wikimedia.org/T187089) [14:06:53] (03CR) 10Muehlenhoff: [C: 031] Cumin masters in WMCS: upgrade to python3 [puppet] - 10https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112) (owner: 10Volans) [14:08:31] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425054 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [14:09:12] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1967 bytes in 0.105 second response time [14:09:46] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425054 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [14:10:00] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425054 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [14:10:57] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1067 for alter table (duration: 00m 59s) [14:11:02] !log Deploy schema change on db1067 - T187089 T185128 T153182 [14:11:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:11] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 [14:11:12] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182 [14:11:14] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128 [14:12:06] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425056 
[14:12:13] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 624.23 seconds [14:12:34] jynus: just to make one thing clear: we (stewards) no longer need to ping ops every time we do bigdelete, right? [14:13:18] in general, things will fail rather than break [14:13:35] improvements yay xD [14:13:36] but don't do those like very quickly [14:13:45] many of those at the same time, etc. [14:13:47] well... bigdelete requests are rare anyway [14:13:51] jynus: I've found that doing those kind of deletions via API/ApiSandbox are faster and the DBs seem to 'suffer' less, is that okay? [14:14:04] shouldn't be different [14:14:04] I think it happens one per 3-4 months or less [14:14:10] that is ok [14:14:19] aren't API requests processed by different servers? [14:14:26] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425056 (owner: 10Marostegui) [14:14:27] so that might be why [14:14:29] bigdelete requests are rare, we may get one or two each two months or so [14:14:36] one think I can propose you [14:14:44] is to setup some time for Q&A [14:14:52] I see there are many myths going around [14:15:06] probably... 
some sort of history not cleaned up [14:15:07] and that people are avoiding doing things [14:15:07] in any case I never do more than one bigdelete at the same time [14:15:16] because at some point they had problems [14:15:20] yep [14:15:22] 10Operations, 10Ops-Access-Requests: Access to the deployment hosts for Imarlier - https://phabricator.wikimedia.org/T191704#4116808 (10herron) p:05Triage>03Normal [14:15:24] but then they were fixed [14:15:35] (03PS2) 10Dereckson: Always show latest revision even if not reviewed on hu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354549 (https://phabricator.wikimedia.org/T121995) [14:15:35] but people was still with the old mindset [14:15:39] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425056 (owner: 10Marostegui) [14:15:50] but probably not the channel or the time [14:15:51] well we operate with the existing docs, if they're outdated or incomplete we can't be held responsible imho [14:15:53] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425056 (owner: 10Marostegui) [14:15:57] 10Operations, 10Ops-Access-Requests: Access to the deployment hosts for Imarlier - https://phabricator.wikimedia.org/T191704#4114326 (10herron) Hi Ian, I've updated the task description with a general access request checklist (which is mostly complete already). Implementation wise it looks like adding user `i... 
[14:16:00] probably because we are not notified of the new practices [14:16:07] indeed [14:16:07] or improvements [14:16:12] nor docs are updated [14:16:13] feel free to propose some questions, and send them this way- I may not be able to answer all [14:16:28] but I can ask the developers on your behalf, too [14:17:00] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1080 after alter table, kernel and mariadb upgrade (duration: 00m 59s) [14:17:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:12] for example, there are right now some changes coming up [14:17:18] to stop doing long renames [14:17:20] jynus: saw your question on cloud-l, I'm going to compile the data [14:17:24] (03CR) 10Dereckson: "PS2: rebased, date of the test period updated" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354549 (https://phabricator.wikimedia.org/T121995) (owner: 10Dereckson) [14:17:34] so they perform instantly [14:17:40] you will probably love that [14:17:45] oh, that'd be great [14:17:45] indeed [14:17:47] marostegui: you're done with Tin? I was distracted with the delete issue, but I had something to add to swat. [14:17:57] Dereckson: yep! [14:18:07] legoktm: usually notifies us of upcoming globalrename stuff, but that don't happen for general MW [14:18:12] thanks [14:18:28] 10Operations, 10Ops-Access-Requests: Access to the deployment hosts for Imarlier - https://phabricator.wikimedia.org/T191704#4114326 (10herron) [14:18:29] so while we are well informed about globalrename-related changes, we're not always up to date on mw improvements that makes our life easier [14:18:29] Dereckson: hi, I added you to a server-side upload ticket, was that okay? 
I don't know anyone that did that stuff recently [14:18:31] Dereckson: I will need to keep pushing things once you are done :) [14:18:43] yeah - so some occasional QnA time would be great [14:18:51] agree with revi [14:18:52] Hauskatze: yes, I'll check that in a few minutes [14:18:53] revi: I offer some of my time [14:18:57] organize something [14:19:02] marostegui: ack'ed [14:19:03] marostegui: have a glass of wine in the meantime :P [14:19:05] ok! [14:19:08] or maybe have people write questions [14:19:22] Hauskatze: hahaha :) [14:19:24] I think we can compile multiple questions and send them over mail [14:19:24] and we can set up both live and non-live answers [14:19:32] yep, that is ok [14:19:33] then we can start from there [14:20:06] I will also ask someone from platform to help [14:20:17] (Mediawiki Platform Team) [14:20:58] (03PS1) 10Marostegui: db-eqiad.php: Increase db1080 weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425059 [14:21:13] Dereckson: ^ That one I will wait till you are done :) [14:22:25] (03CR) 10Dereckson: [C: 032] Always show latest revision even if not reviewed on hu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354549 (https://phabricator.wikimedia.org/T121995) (owner: 10Dereckson) [14:23:41] (03Merged) 10jenkins-bot: Always show latest revision even if not reviewed on hu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354549 (https://phabricator.wikimedia.org/T121995) (owner: 10Dereckson) [14:24:51] 354549 pulled on mwdebug1002 [14:27:05] (03CR) 10Elukey: "First pass + some minor comments, overall it looks good!" 
(036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/420685 (https://phabricator.wikimedia.org/T189692) (owner: 10Mforns) [14:28:11] !log dereckson@tin Synchronized wmf-config/flaggedrevs.php: Always show latest revision even if not reviewed on hu.wikipedia (T121995) (duration: 00m 59s) [14:28:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:18] T121995: Switch FlaggedRevs on Hungarian Wikipedia to a "flagged protection" mode - https://phabricator.wikimedia.org/T121995 [14:28:28] marostegui: I'm done [14:28:30] Thanks [14:28:39] Dereckson: Thanks! [14:28:58] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase db1080 weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425059 (owner: 10Marostegui) [14:29:31] (03CR) 10jenkins-bot: Always show latest revision even if not reviewed on hu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354549 (https://phabricator.wikimedia.org/T121995) (owner: 10Dereckson) [14:29:41] jynus: who manages swift nowadays? [14:30:14] (03Merged) 10jenkins-bot: db-eqiad.php: Increase db1080 weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425059 (owner: 10Marostegui) [14:31:40] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1080 after alter table, kernel and mariadb upgrade (duration: 00m 59s) [14:31:44] Dereckson: what do you need? 
[14:31:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:04] jynus: to write a file > 4 Gb [14:32:49] I would ask any deployer that knows how to do that [14:33:42] because I am going to guess there is a maintenance script for that [14:34:11] (03PS1) 10Marostegui: db2069.yaml: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/425064 (https://phabricator.wikimedia.org/T191275) [14:34:52] (03CR) 10Marostegui: [C: 032] db2069.yaml: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/425064 (https://phabricator.wikimedia.org/T191275) (owner: 10Marostegui) [14:34:54] (03PS1) 10Lokal Profil: [WIP]Add versioning for config and validate it [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425065 (https://phabricator.wikimedia.org/T163328) [14:34:55] Yes, but this maintenance script gives here an error: Could not write file "mwstore://local-swift-eqiad/local-public/..." because it is larger than {{PLURAL:4294967296|one byte|4294967296 bytes}}. 
[14:35:09] (03PS2) 10Giuseppe Lavagetto: Upgrade to 0.37.0 [debs/mcrouter] - 10https://gerrit.wikimedia.org/r/423748 [14:35:11] (03CR) 10jenkins-bot: db-eqiad.php: Increase db1080 weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425059 (owner: 10Marostegui) [14:35:14] (03CR) 10jerkins-bot: [V: 04-1] [WIP]Add versioning for config and validate it [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425065 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [14:36:13] uh oh, that's a 32 bits unsigned [14:36:36] (03PS1) 10Muehlenhoff: Update SSH key for santhosh [puppet] - 10https://gerrit.wikimedia.org/r/425067 [14:36:40] could be we store the size of the file as a 32 bits field [14:37:14] 10Operations, 10OCG-General, 10Readers-Community-Engagement, 10Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#4116923 (10ovasileva) [14:37:32] Dereckson: https://commons.wikimedia.org/wiki/Help:Server-side_upload [14:38:03] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425068 [14:38:06] https://phabricator.wikimedia.org/rOMWCa6e5f230be5c0d8f9da181f8509e4cf3062419ea [14:38:23] swift limit seems to be 5GB [14:38:26] this limit is for files you give to https://commons.wikimedia.org/wiki/Special:Upload [14:38:42] 10Operations, 10Collection, 10OfflineContentGenerator, 10Readers-Web-Backlog (Tracking), 10Services (watching): Replace OCG in collection extension with Electron - https://phabricator.wikimedia.org/T150872#4116935 (10ovasileva) [14:38:44] but yes it will be triggered too here [14:39:00] s/is for/was intended for [14:39:25] We would have to modify MW to allow files in the 4Gb -> 5GB range [14:40:06] last i checked (which was a long time ago) there was a hardcoded limit in MW of 2^32 bytes in MW, but everything could in theory support 5GB if the limit was removed [14:40:23] [this is going on memory of 
something that I looked at years ago. May be wrong or just outdated] [14:40:32] Dereckson: I would create a ticket with the exact step you are at, add multimedia and media-storage [14:41:10] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425068 (owner: 10Marostegui) [14:42:29] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425068 (owner: 10Marostegui) [14:42:33] (03CR) 10Muehlenhoff: [C: 032] Update SSH key for santhosh [puppet] - 10https://gerrit.wikimedia.org/r/425067 (owner: 10Muehlenhoff) [14:42:44] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425068 (owner: 10Marostegui) [14:43:41] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1080 after alter table, kernel and mariadb upgrade (duration: 00m 59s) [14:43:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:56] I think FileBackend::maxFileSizeInternal() would need to be overridden [14:44:05] there's a ticket about this from years ago [14:45:21] https://phabricator.wikimedia.org/T116514 states it's hardcoded [14:45:25] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425071 [14:45:57] But we'll need a green light from storage ops and performance to be sure this limit can be raised. 
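The maintenance script error quoted earlier gives the limit as 4294967296 bytes, which is exactly 2^32 and matches the "32 bits unsigned" remark. The arithmetic can be sketched in Python as follows. This is only an illustration: the constant and function names are hypothetical and are not MediaWiki or Swift identifiers, and the "5GB" Swift figure from the channel is read here as 5 GiB, which is an assumption.

```python
# Hypothetical names for illustration; not MediaWiki or Swift identifiers.
MW_HARDCODED_LIMIT = 2**32            # 4294967296 bytes, the figure in the error message
SWIFT_OBJECT_LIMIT = 5 * 1024**3      # assumption: the "5GB" mentioned in channel, read as 5 GiB


def gib(n: int) -> int:
    """Convert a whole number of GiB to bytes."""
    return n * 1024**3


def fits(size_bytes: int, limit: int) -> bool:
    """Return True if a file of size_bytes does not exceed limit."""
    return size_bytes <= limit


if __name__ == "__main__":
    for label, size in [("4 GiB + 1 byte", gib(4) + 1), ("16 GiB", gib(16)), ("18 GiB", gib(18))]:
        print(label,
              "| within 2^32:", fits(size, MW_HARDCODED_LIMIT),
              "| within 5 GiB:", fits(size, SWIFT_OBJECT_LIMIT))
```

Under these assumptions, a file in the 4 to 5 GiB range is blocked only by the 2^32 limit, while the 16 and 18 GiB videos exceed both.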
[14:46:00] 10Operations, 10OCG-General, 10Readers-Community-Engagement, 10Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#4117048 (10ovasileva) [14:46:34] AaronSchulz is the one who knows about this stuff from a swift end I believe [14:48:45] wgMaxUploadSize has already been restored to 2^32 by the way in the configuration [14:49:12] (03PS1) 10Muehlenhoff: Add new SSH key for esanders [puppet] - 10https://gerrit.wikimedia.org/r/425072 [14:49:14] How big a file are we talking about anyways? [14:49:27] 16 Gb and 18 Gb videos [14:50:09] (03CR) 10Muehlenhoff: [C: 032] Add new SSH key for esanders [puppet] - 10https://gerrit.wikimedia.org/r/425072 (owner: 10Muehlenhoff) [14:50:23] Sizes we're going to encounter more and more. [14:50:30] <_joe_> bawolff: I'd wait for godog to be back too, I'm not sure we're equipped at all for handling such large files [14:50:55] Well we definitely can't store a 16 gb file in swift with the current architecture [14:50:57] <_joe_> Dereckson: well then we need to plan and build our infrastructure accordingly [14:51:21] We don't have support for using swift "large object" thingies on the MW side at all (afaik) [14:51:34] <_joe_> and it will take time, so I guess you want to talk with the product manager of the multimedia team [14:52:09] 10Operations, 10ops-eqiad, 10netops: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4117109 (10Marostegui) I have double checked with @ayounsi the hosts that need to have a longer downtime as they have to be moved physically to another rack, and those are the ones... 
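For a rough sense of scale on the "large object" remark: if a file had to be stored as pieces that each stay under a per-object cap, the number of pieces is a ceiling division. A minimal Python sketch, assuming a 5 GiB cap (based on the "swift limit seems to be 5GB" comment) and a hypothetical function name; it does not call any Swift or MediaWiki API.

```python
SEGMENT_CAP = 5 * 1024**3  # assumption: per-object cap of 5 GiB


def segments_needed(size_bytes: int, cap: int = SEGMENT_CAP) -> int:
    """Return how many pieces of at most `cap` bytes are needed for size_bytes."""
    if size_bytes < 0 or cap <= 0:
        raise ValueError("size must be >= 0 and cap must be > 0")
    # Integer ceiling division avoids float rounding on large byte counts.
    return max(1, -(-size_bytes // cap))


if __name__ == "__main__":
    for size_gib in (16, 18):
        print(size_gib, "GiB ->", segments_needed(size_gib * 1024**3), "pieces")
```

With these numbers, both the 16 GiB and the 18 GiB file would need four pieces.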
[14:52:30] Dereckson: btw, For background, see https://docs.openstack.org/swift/latest/overview_large_objects.html [14:52:49] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425071 (owner: 10Marostegui) [14:54:03] I think the likely answer here will be to split the file into multiple smaller files [14:54:04] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425071 (owner: 10Marostegui) [14:54:10] Okay, so, let's prepare an epic with this large storage goal for the Multimedia team. Meanwhile, we'll ask video producers/uploaders to reencode videos at lower sizes. [14:54:19] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425071 (owner: 10Marostegui) [14:54:23] 10Operations, 10Ops-Access-Requests: Access to the deployment hosts for Imarlier - https://phabricator.wikimedia.org/T191704#4117111 (10RobH) All deployment access has to have the sign off of not only the SRE team, but the release engineering team. We'll need either @greg or @demon to sign off on this task ap... [14:54:40] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4117113 (10Papaul) p:05Triage>03Normal [14:55:12] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1080 after alter table, kernel and mariadb upgrade (duration: 00m 59s) [14:55:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:31] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4117116 (10Papaul) @Marostegui which disk you want to replace first? 1 or 4 ? [14:58:58] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4117117 (10Marostegui) Oh, I didn't know there were two of them broken. 
Let me check [14:59:19] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4117118 (10Marostegui) @Papaul disk #4 [15:00:43] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey, 10User-notice: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4117126 (10Joe) [15:01:20] (03Draft2) 10محمد شعیب: Fixing namespace name in Urdu wiktionary. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425062 [15:05:23] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4117150 (10Papaul) a:05Papaul>03Marostegui complete [15:08:32] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4117168 (10Marostegui) Thanks ``` physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 600 GB, Rebuilding) ``` Once this is completed we can replace #1 [15:11:46] (03PS5) 10Mforns: Modify eventlogging purging script to read from YAML whitelist [puppet] - 10https://gerrit.wikimedia.org/r/420685 (https://phabricator.wikimedia.org/T189692) [15:12:27] (03CR) 10Mforns: "Thanks for the comments! See changes and responses :]" (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/420685 (https://phabricator.wikimedia.org/T189692) (owner: 10Mforns) [15:17:15] 10Operations, 10Analytics, 10Analytics-Data-Quality, 10Analytics-Kanban, and 4 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4117194 (10mbaluta) >>! In T187014#4111935, @ema wrote: >>>! In T187014#4111691, @mbaluta wrote: >> If you provided IP address of our server,... 
[15:18:34] (03PS2) 10Ayounsi: Logstash: Add initial network syslog parsing [puppet] - 10https://gerrit.wikimedia.org/r/424643 [15:21:06] (03PS1) 10Elukey: profile::kafka::mirror::alerts: skip some topics while checking lag [puppet] - 10https://gerrit.wikimedia.org/r/425073 [15:24:58] (03CR) 10Ayounsi: [C: 032] Logstash: Add initial network syslog parsing [puppet] - 10https://gerrit.wikimedia.org/r/424643 (owner: 10Ayounsi) [15:25:05] (03CR) 10Rduran: ">" (031 comment) [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/420746 (owner: 10Rduran) [15:27:59] (03PS1) 10Herron: admin: add common approval guidelines to group descriptions [puppet] - 10https://gerrit.wikimedia.org/r/425074 [15:30:44] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1 [15:31:05] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - CRITICAL - logstash-json-udp_11514_udp: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-log4j_4560: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-json-tcp_11514: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-syslog-tcp_10514: Servers logstash1008.eqiad.wmnet are marked down but poo [15:31:05] g-udp_10514_udp: Servers logstash1008.eqiad.wmnet are marked down but pooled [15:31:34] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - CRITICAL - logstash-json-udp_11514_udp: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-log4j_4560: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-json-tcp_11514: Servers logstash1008.eqiad.wmnet are marked down but pooled: logstash-syslog-tcp_10514: Servers logstash1008.eqiad.wmnet are marked down but poo [15:31:34] g-udp_10514_udp: Servers 
logstash1008.eqiad.wmnet are marked down but pooled [15:32:05] (03PS1) 10Marostegui: db-eqiad.php: Restore db1080 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425075 [15:32:18] gehel: --^ [15:32:34] elukey: thanks, looking [15:32:43] lemme know if you need help [15:33:04] 10Operations, 10Traffic, 10Patch-For-Review: Renew unified certificates 2017 - https://phabricator.wikimedia.org/T178173#4117269 (10RobH) [15:33:52] strange, logstash1008 looks ok... [15:34:11] gehel, elukey: probably due to XioNoX merging https://gerrit.wikimedia.org/r/424643 ? [15:34:39] hum [15:34:50] most likely yeah [15:35:02] yep, logstash just restarted [15:35:47] do I need to revert? [15:35:49] so transient error... [15:35:54] ah [15:36:27] my guess is that the check just failed during the restart [15:37:07] gehel: my patch is not doing the expected result neither, for example there are no more logs with "program:RT_FLOW" showing up... [15:37:14] (03PS11) 10Rduran: Create tests skeleton [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/420746 [15:37:16] (03PS11) 10Rduran: Refactor and test the main OSC run method [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/421340 [15:37:36] correction, logstash is failing to restart... XioNoX could you please revert? [15:37:41] yup [15:37:45] thanks! [15:37:51] (03PS1) 10Ayounsi: Revert "Logstash: Add initial network syslog parsing" [puppet] - 10https://gerrit.wikimedia.org/r/425077 [15:38:08] (03CR) 10Ayounsi: [V: 032 C: 032] Revert "Logstash: Add initial network syslog parsing" [puppet] - 10https://gerrit.wikimedia.org/r/425077 (owner: 10Ayounsi) [15:38:29] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler02/10869/einsteinium.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/425073 (owner: 10Elukey) [15:38:48] gehel: done [15:38:59] icinga is not detecting the failure (yet), we should add a few checks... 
[15:40:25] (03PS4) 10Rduran: Make WMFMariaDB.py flake8 compliant [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/424558 [15:40:31] gehel: and after running puppet I did the following and thought everything was fine [15:40:36] https://www.irccloud.com/pastebin/Oc5wsZzn/ [15:40:44] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1 [15:41:04] this one is weird ^ [15:41:57] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Restore db1080 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425075 (owner: 10Marostegui) [15:42:35] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [15:43:20] (03Merged) 10jenkins-bot: db-eqiad.php: Restore db1080 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425075 (owner: 10Marostegui) [15:43:34] (03CR) 10jenkins-bot: db-eqiad.php: Restore db1080 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425075 (owner: 10Marostegui) [15:43:36] ok, I see the error in the logs [15:44:06] dcausse: we do have an unwanted dependency from logstash to the cirrus elasticsearch cluster (api feature usage), but we should not have the opposite... [15:44:06] dcausse, gehel, is there a way I can do a syntax check or more of my config file before commiting it? [15:44:14] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [15:44:26] XioNoX: yep, logstash1008 looks good again [15:44:46] 10Operations, 10Goal, 10HHVM: Complete the use of HHVM over Zend PHP on the Wikimedia cluster - https://phabricator.wikimedia.org/T86081#4117294 (10Krinkle) [15:44:49] 10Operations, 10Deployments, 10Beta-Cluster-reproducible, 10HHVM, and 2 others: Switch mwscript from Zend PHP5 to default php alternative (e.g. 
HHVM or PHP7) - https://phabricator.wikimedia.org/T146285#4117288 (10Krinkle) 05Open>03Resolved a:03fgiunchedi Per {ff4db0c87156035d79c0378ab8ba0aa2045ecf27}. [15:44:50] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore original weight for db1080 after alter table, kernel and mariadb upgrade (duration: 00m 59s) [15:44:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:22] (03PS3) 10Rduran: Add tests for the argument parsing [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/424560 [15:45:58] XioNoX: I don't think so... applying the change to a single node, restarting logstash and watching the logs is probably the best you can do [15:46:09] dcausse might know more [15:46:14] ok [15:46:18] (03CR) 10Krinkle: "More about ICU upgrade at T189295." [puppet] - 10https://gerrit.wikimedia.org/r/425026 (owner: 10Giuseppe Lavagetto) [15:47:01] XioNoX: not sure you can do that unless we have logstash in deployment-prep [15:48:08] hey, I forgot the last closing bracket :) [15:48:20] (03PS2) 10Elukey: profile::kafka::mirror::alerts: skip some topics while checking lag [puppet] - 10https://gerrit.wikimedia.org/r/425073 [15:49:09] (03CR) 10Elukey: [C: 032] profile::kafka::mirror::alerts: skip some topics while checking lag [puppet] - 10https://gerrit.wikimedia.org/r/425073 (owner: 10Elukey) [15:49:20] actually, we do have logstash in deployment-prep! [15:49:24] XioNoX: ^ [15:49:25] there is deployment-logstash2 [15:49:27] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey, 10User-notice: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4117317 (10Krinkle) [15:50:11] we don't want to send the same juniper traffic to deployment-prep, but as a smoke test, it is a good idea! 
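
The revert above was traced to a missing closing bracket in the Logstash config. A minimal sketch of a pre-commit sanity check for that specific mistake follows; it is not a Logstash parser, and it does not replace a smoke test on a real node as suggested above. The function name `check_balanced` and the sample config are hypothetical, and the check assumes `#` starts a comment and that brackets inside quoted strings should be ignored. Regex literals containing brackets would confuse it.

```python
def check_balanced(text):
    """Rough bracket-balance check for a Logstash-style config.

    Returns (ok, message). Brackets inside quoted strings and '#' comments
    are ignored. This only catches unbalanced brackets, nothing else.
    """
    pairs = {"}": "{", "]": "[", ")": "("}
    openers = set(pairs.values())
    stack = []      # (bracket, line number) for each still-open bracket
    quote = None    # current quote character while inside a string
    line = 1
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            line += 1
        if quote:
            if ch == "\\" and i + 1 < len(text):
                # Skip the escaped character, but keep the line count right.
                if text[i + 1] == "\n":
                    line += 1
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "#":
            # Skip the rest of the comment; the newline is handled next pass.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        elif ch in openers:
            stack.append((ch, line))
        elif ch in pairs:
            if not stack or stack[-1][0] != pairs[ch]:
                return False, f"line {line}: unexpected {ch!r}"
            stack.pop()
        i += 1
    if stack:
        ch, opened = stack[-1]
        return False, f"line {opened}: {ch!r} is never closed"
    return True, "balanced"


if __name__ == "__main__":
    # Hypothetical config fragment, for illustration only.
    sample = 'filter {\n  if [type] == "syslog" {\n    grok { match => { "message" => "%{SYSLOGLINE}" } }\n  }\n'
    print(check_balanced(sample))
```

A check like this could run before the change is pushed; it would have flagged the forgotten bracket but says nothing about whether the filters behave as intended.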
[15:50:38] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey, 10User-notice: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4117321 (10Joe) [15:51:55] this is what happened to logstash from pybal POV: https://grafana.wikimedia.org/dashboard/db/pybal?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-server=lvs1003&var-service=logstash-json-udp_11514_udp&from=1523283996737&to=1523288938779 [15:53:08] yeah, at least to know if everything is going to meltdown or not because of a typo [15:53:09] only one server has been depooled (because of the depooling threshold) although both logstash1007 and 1008 went down (1009 was still fine, perhaps puppet hadn't run there yet?) [15:53:16] (03PS1) 10Herron: puppetmaster: repool rhodium [puppet] - 10https://gerrit.wikimedia.org/r/425078 [15:53:43] most likely, yeah [15:57:20] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey, 10User-notice: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4037934 (10Krinkle) >>! In T189295#4116309, @gerritbot wrote: > Change 425027 had a related patch set uploaded (by Giuseppe Lavagetto;... [15:59:16] 10Operations, 10MediaWiki-Platform-Team, 10HHVM, 10PHP 7.0 support, and 2 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3623028 (10Krinkle) [16:00:04] 10Operations, 10MediaWiki-Platform-Team, 10HHVM, 10PHP 7.0 support, and 2 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4117351 (10Krinkle) [16:00:11] dcausse: to use deployment-logstash2, should I cherry-pick my change on the local puppetmaster or manually do my tests on deployment-logstash2 ? [16:01:21] XioNoX: I think so yes, but I've never done that myself so better to double check with people doing this kind of puppet tests :) [16:01:32] okay! 
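
The depooling-threshold remark above (two of three logstash hosts down, only one depooled) can be illustrated with a small sketch. This is an assumption about the general idea, namely that a minimum fraction of servers stays pooled regardless of health; it is not PyBal's actual code. The threshold value of 0.5 and the function name `apply_health` are hypothetical.

```python
import math


def apply_health(servers, healthy, depool_threshold):
    """Return the set of servers left pooled after health checks.

    An unhealthy server is only depooled if doing so keeps at least
    ceil(len(servers) * depool_threshold) servers pooled.
    """
    pooled = set(servers)
    min_pooled = math.ceil(len(servers) * depool_threshold)
    for name in servers:
        if name not in healthy and len(pooled) - 1 >= min_pooled:
            pooled.discard(name)
    return pooled


if __name__ == "__main__":
    servers = ["logstash1007", "logstash1008", "logstash1009"]
    # 1007 and 1008 failed their checks; 1009 was still fine.
    print(sorted(apply_health(servers, {"logstash1009"}, 0.5)))
```

Under these assumptions the first failing host is depooled and the second stays pooled even though it is down, which matches the behaviour described in the channel.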
[16:09:29] (03CR) 10Hashar: Cumin masters in WMCS: upgrade to python3 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112) (owner: 10Volans) [16:12:36] XioNoX: yep, cherry-picking on deployment-puppetmaster02.deployment-prep.eqiad.wmflabs is the way to go [16:13:58] will do! [16:24:33] (03PS5) 10Volans: Cumin masters in WMCS: upgrade to python3 [puppet] - 10https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112) [16:24:35] (03PS6) 10Volans: Cumin masters in prod: upgrade to python3 [puppet] - 10https://gerrit.wikimedia.org/r/412894 (https://phabricator.wikimedia.org/T187773) [16:24:43] (03CR) 10jerkins-bot: [V: 04-1] Cumin masters in WMCS: upgrade to python3 [puppet] - 10https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112) (owner: 10Volans) [16:24:47] (03CR) 10Volans: "@hashar: now it should get them from backport, I'm checking the compiler right now" [puppet] - 10https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112) (owner: 10Volans) [16:24:53] (03PS1) 10Jcrespo: dbstore: Reenable alerts for dbstore1001 after reset [puppet] - 10https://gerrit.wikimedia.org/r/425086 (https://phabricator.wikimedia.org/T186596) [16:24:54] (03PS1) 10Jcrespo: mariadb: migrate sanitarium to role/profile and abstract instances [puppet] - 10https://gerrit.wikimedia.org/r/425087 (https://phabricator.wikimedia.org/T190704) [16:27:01] (03PS6) 10Volans: Cumin masters in WMCS: upgrade to python3 [puppet] - 10https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112) [16:27:03] (03PS7) 10Volans: Cumin masters in prod: upgrade to python3 [puppet] - 10https://gerrit.wikimedia.org/r/412894 (https://phabricator.wikimedia.org/T187773) [16:31:35] (03CR) 10Volans: "Compiler results: https://puppet-compiler.wmflabs.org/compiler02/10870/labpuppetmaster1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/419131 
(https://phabricator.wikimedia.org/T188112) (owner: 10Volans) [16:45:30] (03PS1) 10Ayounsi: Logstash: Add initial network syslog parsing [puppet] - 10https://gerrit.wikimedia.org/r/425090 [16:46:19] 10Operations, 10Ops-Access-Requests, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team: grant thcipriani RelEng root on contint1001 - https://phabricator.wikimedia.org/T191453#4105919 (10herron) This was approved at the Monday SRE meeting so I'll work on creating a patch now [16:46:55] 10Operations, 10Ops-Access-Requests, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team: grant thcipriani RelEng root on contint1001 - https://phabricator.wikimedia.org/T191453#4117565 (10herron) [16:50:57] 10Operations, 10Operations-Software-Development, 10Continuous-Integration-Infrastructure (shipyard), 10Patch-For-Review: New tool to track package updates/status for hosts and images (debmonitor) - https://phabricator.wikimedia.org/T167504#4117582 (10Volans) a:03Volans [16:52:49] !log bblack@neodymium conftool action : set/pooled=no; selector: name=cp2006.codfw.wmnet [16:52:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:00] !log bblack@neodymium conftool action : set/pooled=no; selector: name=cp2010.codfw.wmnet [16:53:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:11] !log bblack@neodymium conftool action : set/pooled=no; selector: name=cp2017.codfw.wmnet [16:53:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:58] 10Operations, 10Analytics, 10Analytics-Data-Quality, 10Analytics-Kanban, and 4 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4117606 (10ema) >>! In T187014#4116636, @Ottomata wrote: > ...ehhh wha? We used to collect XFF on the webrequest side, and then parse it to... 
[16:55:55] (03PS1) 10Sbisson: Allow for different storage_id in kartotherian and tilerator [puppet] - 10https://gerrit.wikimedia.org/r/425092 (https://phabricator.wikimedia.org/T191655) [17:00:08] 10Operations, 10Ops-Access-Requests: Access to the deployment hosts for Imarlier - https://phabricator.wikimedia.org/T191704#4117650 (10herron) This was approved by the SRE team during todays meeting (to avoid waiting a week until next meeting) on the condition that release engineering also approves. [17:02:44] dcausse, gehel, maybe add a Icinga alert that goes off if it sees `[ERROR]` in logstash's logs? [17:03:20] XioNoX: just checking that logstash is listening to the appropriate ports... [17:03:55] indeed [17:05:28] (03PS2) 10Ayounsi: Logstash: Add initial network syslog parsing [puppet] - 10https://gerrit.wikimedia.org/r/425090 [17:05:41] (03PS1) 10Herron: admin: add thcipriani to contint-roots group [puppet] - 10https://gerrit.wikimedia.org/r/425094 (https://phabricator.wikimedia.org/T191453) [17:06:07] gehel: also some of the logstash LVS services have essentially-noop monitors (runcommand /bin/true) [17:06:18] gehel: my code works on deployment-logstash, going to push it [17:06:25] (03CR) 10Herron: [C: 032] admin: add thcipriani to contint-roots group [puppet] - 10https://gerrit.wikimedia.org/r/425094 (https://phabricator.wikimedia.org/T191453) (owner: 10Herron) [17:06:34] ema: probably for the UDP endpoints? [17:06:37] ema: that's because pybal can't monitor udp [17:06:53] XioNoX: it couldn't, before vgutierrez added the feature :) [17:07:03] how does it work? [17:07:03] Oh, interesting! [17:07:23] 10Operations, 10Ops-Access-Requests: Requesting access to shell (snapshot, dumpsdata) for springle - https://phabricator.wikimedia.org/T191478#4107106 (10RStallman-legalteam) Sean's NDA is signed and filed with legal. Thanks! 
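
The suggestion above of an Icinga alert that fires on `[ERROR]` in the Logstash logs could look roughly like the sketch below. It follows the usual Nagios/Icinga plugin convention of exit code 0 for OK and 2 for CRITICAL. The function name `check_log_errors`, the `critical_at` parameter, and the sample log lines are hypothetical; the real log path and line format would need checking on the hosts.

```python
import sys

# Conventional Nagios/Icinga plugin exit codes.
OK, CRITICAL = 0, 2


def check_log_errors(lines, marker="[ERROR]", critical_at=1):
    """Return (exit_code, status_line) based on how many lines contain marker."""
    hits = [line.rstrip("\n") for line in lines if marker in line]
    if len(hits) >= critical_at:
        return CRITICAL, (
            f"CRITICAL - {len(hits)} line(s) containing {marker}; last: {hits[-1]}"
        )
    return OK, f"OK - no {marker} lines found"


if __name__ == "__main__":
    # Reads the log from stdin, e.g. piped from tail on the log file.
    code, status = check_log_errors(sys.stdin)
    print(status)
    sys.exit(code)
```

As noted in the reply above, checking that Logstash is listening on the expected ports may be the simpler signal; a log-based check like this one would complement it rather than replace it.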
[17:08:17] (03PS3) 10Ayounsi: Logstash: Add initial network syslog parsing [puppet] - 10https://gerrit.wikimedia.org/r/425090 [17:08:21] wait what [17:08:45] wow, springle is back? that's cool :) [17:08:48] XioNoX: https://github.com/wikimedia/PyBal/commit/d3358e9d7e36cd9b7872bd94b83864ac271bd804 [17:09:47] that's smart [17:10:56] (03CR) 10Ayounsi: [C: 032] Logstash: Add initial network syslog parsing [puppet] - 10https://gerrit.wikimedia.org/r/425090 (owner: 10Ayounsi) [17:10:58] (03PS2) 10Jcrespo: mariadb: migrate sanitarium to role/profile and abstract instances [puppet] - 10https://gerrit.wikimedia.org/r/425087 (https://phabricator.wikimedia.org/T190704) [17:11:21] (03PS1) 10Bstorm: wiki replicas: depool labsdb1010 [puppet] - 10https://gerrit.wikimedia.org/r/425095 (https://phabricator.wikimedia.org/T181650) [17:11:48] 10Operations, 10Ops-Access-Requests, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team: grant thcipriani RelEng root on contint1001 - https://phabricator.wikimedia.org/T191453#4117687 (10herron) 05Open>03Resolved a:03herron @thcipriani is now a member of `contint-roots` on `contint1... [17:11:58] (03PS2) 10Bstorm: wiki replicas: depool labsdb1010 [puppet] - 10https://gerrit.wikimedia.org/r/425095 (https://phabricator.wikimedia.org/T181650) [17:12:07] 10Operations, 10Ops-Access-Requests, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team: grant thcipriani RelEng root on contint1001 - https://phabricator.wikimedia.org/T191453#4117690 (10herron) [17:12:13] There seem to be an issue with the new WDQS gui version, I'm holding the deployment to prod until this is cleared (cc SMalyshev) [17:12:14] 10Operations, 10Ops-Access-Requests, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team: grant thcipriani RelEng root on contint1001 - https://phabricator.wikimedia.org/T191453#4105919 (10herron) [17:13:57] (03CR) 10Marostegui: "> Main test build succeeded." 
[puppet] - 10https://gerrit.wikimedia.org/r/425095 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [17:14:30] 10Operations, 10Ops-Access-Requests, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team: grant thcipriani RelEng root on contint1001 - https://phabricator.wikimedia.org/T191453#4117708 (10thcipriani) Looks to be working, thanks @herron! [17:18:26] gehel, dcausse, what does "no cached mapping for this field" mean in kibana? [17:19:06] "refresh fields list from the management > index patterns page" [17:19:57] I want to hit that refresh button but probably better to check with you if it's fine first :) [17:21:13] (03PS1) 10Ladsgroup: mediawiki: increase speed of deleteAutoPatrolLogs in wikidatawiki [puppet] - 10https://gerrit.wikimedia.org/r/425098 [17:21:21] XioNoX: no clue, I bet kibana tries to keep a cache of the mapping, since you added new fields the cache is then invalid [17:21:31] I'd hit refresh [17:21:53] dcausse: "This will reset the field popularity counters. Are you sure you want to refresh your fields?" [17:24:19] XioNoX: well... I don't know :), I hope it's just used to better rank some dropdowns menus in kibana [17:24:33] that would be my guess [17:25:15] "Reloading the index fields list also resets Kibana’s popularity counters for the fields. The popularity counters keep track of the fields you’ve used most often within Kibana and are used to sort fields within lists. [17:25:18] dcausse: ^ [17:25:31] ok :) [17:25:35] from https://www.elastic.co/guide/en/kibana/4.0/settings.html [17:25:43] 10Operations, 10Analytics, 10Analytics-Data-Quality, 10Analytics-Kanban, and 4 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4117734 (10BBlack) Right. There was a time in the past when Zerowiki definitely provided some useful data on OperaMini (and also Nokia?) pro... 
[17:25:44] sounds safe I guess [17:26:48] 10Operations, 10Analytics, 10Analytics-Data-Quality, 10Analytics-Kanban, and 4 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4117741 (10BBlack) Ping @DFoy - might know better about when OperaMini proxy data dropped from the Zero data, I don't have any good insight i... [17:27:37] (03PS2) 10Ladsgroup: mediawiki: increase speed of deleteAutoPatrolLogs in wikidatawiki [puppet] - 10https://gerrit.wikimedia.org/r/425098 [17:28:03] done [17:28:54] (03CR) 10Bstorm: "Sounds good!" [puppet] - 10https://gerrit.wikimedia.org/r/425095 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [17:36:30] dcausse, gehel, also fyi: https://koendc.github.io/2013/10/11/logstash-test-configuration.html [17:36:59] XioNoX: nice! [17:37:40] that seems to be the first and only blogpost he wrote :) https://koendc.github.io/ [17:40:06] (03PS16) 10Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) [17:58:03] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4117864 (10Marostegui) a:05Marostegui>03Papaul @Papaul please change disk #1 as disk #4 got rebuilt finely ``` logicaldrive 1 (3.3 TB, RAID 1+0, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay... 
[17:58:08] JOIN #wikimedia-dev [17:58:10] oops [17:58:58] !log shutting down cp2022 for memory replacement [17:59:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:53] wdqs issues resolved, resuming deployment [18:01:52] !log gehel@tin Started deploy [wdqs/wdqs@7116a56]: new GUI version [18:01:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:02:03] (03PS3) 10Dzahn: installserver: convert tftp role to profile [puppet] - 10https://gerrit.wikimedia.org/r/423787 [18:04:03] !log gehel@tin Finished deploy [wdqs/wdqs@7116a56]: new GUI version (duration: 02m 11s) [18:04:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:04:41] SMalyshev: deployment completed, tests are green, feel free to check! [18:05:07] (03PS1) 10Dduvall: ci: Host helm charts at integration.wikimedia.org/charts [puppet] - 10https://gerrit.wikimedia.org/r/425105 (https://phabricator.wikimedia.org/T191821) [18:06:12] PROBLEM - Host cp2022.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [18:06:20] gehel, dcausse, have you experimented with https://www.elastic.co/guide/en/logstash/current/plugins-filters-geoip.html in the past? [18:11:29] (03PS4) 10Dzahn: installserver: convert tftp role to profile [puppet] - 10https://gerrit.wikimedia.org/r/423787 [18:11:49] Jouncebot seems down, but it's SWAT time. [18:12:04] 04̶C̶r̶i̶t̶i̶c̶a̶l Device cr2-codfw.wikimedia.org recovered from Primary outbound port utilisation over 80% [18:12:17] Nothing has been scheduled in the deployments calendar. [18:12:59] (03PS5) 10Dzahn: installserver: convert tftp role to profile [puppet] - 10https://gerrit.wikimedia.org/r/423787 [18:13:58] (03CR) 10Dzahn: [C: 032] installserver: convert tftp role to profile [puppet] - 10https://gerrit.wikimedia.org/r/423787 (owner: 10Dzahn) [18:14:00] (03CR) 10Dereckson: "Thanks for this change." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/425062 (owner: 10محمد شعیب) [18:14:13] RECOVERY - IPsec on cp5001 is OK: Strongswan OK - 66 ESP OK [18:14:22] RECOVERY - IPsec on cp1050 is OK: Strongswan OK - 66 ESP OK [18:14:22] RECOVERY - IPsec on cp1048 is OK: Strongswan OK - 66 ESP OK [18:14:22] RECOVERY - IPsec on cp3039 is OK: Strongswan OK - 66 ESP OK [18:14:22] RECOVERY - Host cp2022.mgmt is UP: PING OK - Packet loss = 0%, RTA = 37.29 ms [18:14:22] RECOVERY - Host cp2022 is UP: PING OK - Packet loss = 0%, RTA = 36.09 ms [18:14:22] RECOVERY - IPsec on kafka-jumbo1005 is OK: Strongswan OK - 136 ESP OK [18:14:23] RECOVERY - IPsec on kafka1020 is OK: Strongswan OK - 136 ESP OK [18:14:23] RECOVERY - IPsec on kafka1014 is OK: Strongswan OK - 136 ESP OK [18:14:32] RECOVERY - IPsec on cp3034 is OK: Strongswan OK - 66 ESP OK [18:14:32] RECOVERY - IPsec on cp3035 is OK: Strongswan OK - 66 ESP OK [18:14:32] RECOVERY - IPsec on cp3046 is OK: Strongswan OK - 66 ESP OK [18:14:32] RECOVERY - IPsec on cp3038 is OK: Strongswan OK - 66 ESP OK [18:14:32] RECOVERY - IPsec on cp3037 is OK: Strongswan OK - 66 ESP OK [18:14:32] RECOVERY - IPsec on cp3047 is OK: Strongswan OK - 66 ESP OK [18:14:32] RECOVERY - IPsec on kafka1012 is OK: Strongswan OK - 136 ESP OK [18:14:33] RECOVERY - IPsec on cp1073 is OK: Strongswan OK - 66 ESP OK [18:14:33] RECOVERY - IPsec on kafka1023 is OK: Strongswan OK - 136 ESP OK [18:14:34] RECOVERY - IPsec on cp4026 is OK: Strongswan OK - 66 ESP OK [18:14:34] RECOVERY - IPsec on cp3049 is OK: Strongswan OK - 66 ESP OK [18:14:42] RECOVERY - IPsec on kafka-jumbo1003 is OK: Strongswan OK - 136 ESP OK [18:14:42] RECOVERY - IPsec on cp1074 is OK: Strongswan OK - 66 ESP OK [18:14:42] RECOVERY - IPsec on cp1062 is OK: Strongswan OK - 66 ESP OK [18:14:43] RECOVERY - IPsec on cp3048 is OK: Strongswan OK - 66 ESP OK [18:14:43] RECOVERY - IPsec on cp3045 is OK: Strongswan OK - 66 ESP OK [18:14:43] RECOVERY - IPsec on cp3044 is OK: Strongswan OK - 66 ESP OK 
[18:14:52] RECOVERY - IPsec on kafka-jumbo1001 is OK: Strongswan OK - 136 ESP OK [18:14:52] RECOVERY - IPsec on cp5003 is OK: Strongswan OK - 66 ESP OK [18:14:52] RECOVERY - IPsec on cp1064 is OK: Strongswan OK - 66 ESP OK [18:15:02] RECOVERY - IPsec on cp1063 is OK: Strongswan OK - 66 ESP OK [18:15:02] RECOVERY - IPsec on cp1049 is OK: Strongswan OK - 66 ESP OK [18:15:02] RECOVERY - IPsec on cp1072 is OK: Strongswan OK - 66 ESP OK [18:15:02] RECOVERY - IPsec on kafka-jumbo1006 is OK: Strongswan OK - 136 ESP OK [18:15:02] RECOVERY - IPsec on cp5005 is OK: Strongswan OK - 66 ESP OK [18:15:02] RECOVERY - IPsec on cp1099 is OK: Strongswan OK - 66 ESP OK [18:15:02] RECOVERY - IPsec on kafka-jumbo1004 is OK: Strongswan OK - 136 ESP OK [18:15:03] RECOVERY - IPsec on kafka1022 is OK: Strongswan OK - 136 ESP OK [18:15:03] RECOVERY - IPsec on cp4024 is OK: Strongswan OK - 66 ESP OK [18:15:04] RECOVERY - IPsec on cp4025 is OK: Strongswan OK - 66 ESP OK [18:15:04] RECOVERY - IPsec on cp4023 is OK: Strongswan OK - 66 ESP OK [18:15:05] RECOVERY - IPsec on cp4021 is OK: Strongswan OK - 66 ESP OK [18:15:05] RECOVERY - IPsec on cp4022 is OK: Strongswan OK - 66 ESP OK [18:15:06] RECOVERY - IPsec on kafka-jumbo1002 is OK: Strongswan OK - 136 ESP OK [18:16:09] kiwi_0x010C: ping? [18:16:42] PROBLEM - Host cp2022 is DOWN: PING CRITICAL - Packet loss = 100% [18:16:52] PROBLEM - Host cp2022.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [18:17:42] PROBLEM - puppet last run on bast5001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:17:59] (03PS1) 10Dzahn: bastionhost::pop: use profile instead of role for TFTP [puppet] - 10https://gerrit.wikimedia.org/r/425109 [18:18:22] (03CR) 10Dzahn: [C: 032] bastionhost::pop: use profile instead of role for TFTP [puppet] - 10https://gerrit.wikimedia.org/r/425109 (owner: 10Dzahn) [18:18:54] (03CR) 10Dereckson: [C: 031] "This change is ready to land and can be included in a SWAT deployment window." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/425009 (https://phabricator.wikimedia.org/T169741) (owner: 100x010C) [18:20:27] (03CR) 10Dereckson: [C: 04-1] "Description should be amended too, or perhaps the commit restricted: you've here both your namespace changes and some names changes for Ur" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425062 (owner: 10محمد شعیب) [18:21:07] tgr: ping? [18:21:22] PROBLEM - IPsec on cp1050 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:22] PROBLEM - IPsec on cp5001 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:22] PROBLEM - IPsec on cp1048 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:23] PROBLEM - IPsec on cp3039 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:23] PROBLEM - IPsec on kafka-jumbo1005 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp2022_v4, cp2022_v6 [18:21:32] PROBLEM - IPsec on kafka1020 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp2022_v4, cp2022_v6 [18:21:32] PROBLEM - IPsec on kafka1014 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp2022_v4, cp2022_v6 [18:21:32] PROBLEM - IPsec on cp3034 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:33] PROBLEM - IPsec on cp3035 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:33] PROBLEM - IPsec on kafka1012 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp2022_v4, cp2022_v6 [18:21:33] PROBLEM - IPsec on cp3046 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:33] PROBLEM - IPsec on cp3038 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:34] PROBLEM - IPsec on cp3037 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:34] PROBLEM - IPsec on cp3047 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:35] PROBLEM - IPsec 
on cp1073 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:35] PROBLEM - IPsec on kafka1023 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp2022_v4, cp2022_v6 [18:21:42] PROBLEM - IPsec on cp4026 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:42] PROBLEM - IPsec on cp3049 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:43] PROBLEM - IPsec on kafka-jumbo1003 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp2022_v4, cp2022_v6 [18:21:43] PROBLEM - IPsec on cp1074 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:52] PROBLEM - IPsec on cp1062 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:52] PROBLEM - IPsec on cp3048 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:52] PROBLEM - IPsec on cp3045 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:52] PROBLEM - IPsec on cp3044 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:53] PROBLEM - IPsec on kafka-jumbo1001 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp2022_v4, cp2022_v6 [18:21:53] PROBLEM - IPsec on cp1064 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:21:55] !log shutting down cp2006 for memory replacement [18:21:57] yoohoo [18:22:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:02] PROBLEM - IPsec on cp5003 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:02] PROBLEM - IPsec on cp1063 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:02] PROBLEM - IPsec on cp1049 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:02] PROBLEM - IPsec on cp1072 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:02] PROBLEM - IPsec on kafka-jumbo1006 is CRITICAL: Strongswan CRITICAL 
- ok: 134 not-conn: cp2022_v4, cp2022_v6 [18:22:03] PROBLEM - IPsec on cp1099 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:03] PROBLEM - IPsec on kafka-jumbo1004 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp2022_v4, cp2022_v6 [18:22:04] RECOVERY - Host cp2022.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.76 ms [18:22:04] PROBLEM - IPsec on cp5005 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:12] PROBLEM - IPsec on kafka1022 is CRITICAL: Strongswan CRITICAL - ok: 132 not-conn: cp2006_v4, cp2006_v6, cp2022_v4, cp2022_v6 [18:22:12] PROBLEM - IPsec on kafka-jumbo1002 is CRITICAL: Strongswan CRITICAL - ok: 132 not-conn: cp2006_v4, cp2006_v6, cp2022_v4, cp2022_v6 [18:22:12] PROBLEM - IPsec on cp4024 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:12] PROBLEM - IPsec on cp4025 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:12] PROBLEM - IPsec on cp4021 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:12] PROBLEM - IPsec on cp4022 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:13] PROBLEM - IPsec on cp4023 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp2022_v4, cp2022_v6 [18:22:43] RECOVERY - puppet last run on bast5001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [18:23:12] PROBLEM - Host cp2006 is DOWN: PING CRITICAL - Packet loss = 100% [18:24:00] (03CR) 10Dereckson: [C: 031] "Dependant change has been merged, and will be train-deployed as part of the 1.31.0-wmf.29 weekly branch." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424974 (https://phabricator.wikimedia.org/T178349) (owner: 10Huji) [18:24:21] omg how much messages by icinga.. I currently clone https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions so I can work on much extensions in same time. 
Are problems which icinga say related to slow speed of checkout? With my internet is all ok [18:24:54] Zoranzoki21: I don't think they are related [18:26:05] Gerrit is installed on Cobalt, a standalone Gerrit host [18:26:18] Dereckson: Ok, thank you for reply [18:26:30] It's expected to clone all mediawiki/extensions will be very very very slow [18:27:24] You've to 980 repos [18:27:30] Dereckson: He clone one extension one by one [18:27:31] (not including mediawiki/extensions itself) [18:27:42] PROBLEM - Host cp2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [18:28:12] PROBLEM - IPsec on cp1045 is CRITICAL: Strongswan CRITICAL - ok: 12 not-conn: cp2006_v4, cp2006_v6 [18:28:22] RECOVERY - Host cp2006.mgmt is UP: PING WARNING - Packet loss = 64%, RTA = 36.99 ms [18:28:32] PROBLEM - IPsec on cp3010 is CRITICAL: Strongswan CRITICAL - ok: 38 not-conn: cp2006_v4, cp2006_v6 [18:28:32] PROBLEM - IPsec on cp1058 is CRITICAL: Strongswan CRITICAL - ok: 12 not-conn: cp2006_v4, cp2006_v6 [18:28:41] Dereckson: He clone one extension one by one. Repository which have about 500 kilobytes take up about 15 seconds. Usually it takes me 5 seconds or less. [18:28:43] (03PS2) 10Awight: Install git-lfs on scap source and target [puppet] - 10https://gerrit.wikimedia.org/r/420409 (https://phabricator.wikimedia.org/T180628) [18:28:52] PROBLEM - IPsec on cp1061 is CRITICAL: Strongswan CRITICAL - ok: 12 not-conn: cp2006_v4, cp2006_v6 [18:28:53] PROBLEM - IPsec on cp3008 is CRITICAL: Strongswan CRITICAL - ok: 38 not-conn: cp2006_v4, cp2006_v6 [18:28:53] PROBLEM - IPsec on cp3007 is CRITICAL: Strongswan CRITICAL - ok: 38 not-conn: cp2006_v4, cp2006_v6 [18:28:53] but, maybe is normal behaviour [18:29:03] PROBLEM - IPsec on cp1051 is CRITICAL: Strongswan CRITICAL - ok: 12 not-conn: cp2006_v4, cp2006_v6 [18:29:48] musikanimal: ping for 424709? [18:30:39] hey! I hadn't scheduled that for a deploy, did I? 
But we can do it now if you want, easy stuff [18:30:56] (03PS2) 10Dereckson: Enable PageAssessments on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424709 (https://phabricator.wikimedia.org/T185023) (owner: 10MusikAnimal) [18:31:08] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424709 (https://phabricator.wikimedia.org/T185023) (owner: 10MusikAnimal) [18:31:33] Okay [18:31:44] jouncebot2: restart jouncebot [18:32:29] musikanimal: you can add it to https://wikitech.wikimedia.org/wiki/Deployments#Week_of_April_9th [18:32:34] (03Merged) 10jenkins-bot: Enable PageAssessments on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424709 (https://phabricator.wikimedia.org/T185023) (owner: 10MusikAnimal) [18:32:49] (03CR) 10jenkins-bot: Enable PageAssessments on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424709 (https://phabricator.wikimedia.org/T185023) (owner: 10MusikAnimal) [18:32:51] musikanimal: live on mwdebug1002.eqiad.wmnet [18:35:33] (03CR) 10Dereckson: "Should we merge this now or wait https://gerrit.wikimedia.org/r/#/c/201104/ first?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/423660 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [18:35:36] I see https://ar.wikipedia.org/wiki/%D8%AE%D8%A7%D8%B5:PageAssessments now exists but the tables haven't been created [18:35:51] let's create the tables [18:35:54] and for some f-ing reason I can't get into terbium right now [18:36:33] RECOVERY - IPsec on cp3010 is OK: Strongswan OK - 40 ESP OK [18:36:42] RECOVERY - Host cp2006 is UP: PING OK - Packet loss = 0%, RTA = 36.01 ms [18:36:49] There are well in createExtensionTables.php [18:36:52] RECOVERY - IPsec on cp1061 is OK: Strongswan OK - 14 ESP OK [18:37:02] RECOVERY - IPsec on cp3008 is OK: Strongswan OK - 40 ESP OK [18:37:02] RECOVERY - IPsec on cp3007 is OK: Strongswan OK - 40 ESP OK [18:37:02] RECOVERY - IPsec on cp1051 is OK: Strongswan OK - 14 ESP OK [18:37:12] terbium is up and running [18:37:12] RECOVERY - IPsec on cp1045 is OK: Strongswan OK - 14 ESP OK [18:37:17] jouncebot: reload [18:37:22] jouncebot: refresh [18:37:23] musikanimal: tables created [18:37:23] I refreshed my knowledge about deployments. [18:37:24] !log shutting down cp2010 for memory replacement [18:37:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:33] RECOVERY - IPsec on cp1058 is OK: Strongswan OK - 14 ESP OK [18:37:37] sweet, I see there are no errors now on-wiki [18:37:51] that should do it, the data isn't there yet because they haven't added the parser function [18:37:58] so the page is empty as it should be [18:38:01] !Log Create PageAssessments tables on ar.wikipedia (T185023) [18:38:01] T185023: Deploy PageAssessments to Arabic Wikipedia - https://phabricator.wikimedia.org/T185023 [18:38:24] thank you! 
[18:38:52] PROBLEM - Host cp2010 is DOWN: PING CRITICAL - Packet loss = 100% [18:39:19] 10Operations, 10ops-codfw, 10Traffic: cp2022 memory replacement - https://phabricator.wikimedia.org/T191229#4117930 (10Papaul) DIMM 6 replaced DIMM 3 = bad DIMM sent from DELL need replacement again Fan #5 replaced [18:39:50] Dereckson: o/ [18:40:22] 10Operations, 10ops-codfw, 10Traffic: cp2006 memory replacement - https://phabricator.wikimedia.org/T191223#4117932 (10Papaul) DIMM B2 replaced DIMM B6 replaced Server is back up [18:40:43] PROBLEM - puppet last run on cp2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:41:23] tgr: hi :) We've some free slots for SWAT and I was considering 424974. But it seems to require 1.31.0-wmf.29 first isn't it? [18:42:01] Dereckson: cloning work now without problems and with standard speed [18:42:22] PROBLEM - Host cp2010.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [18:43:18] `Could not resolve hostname terbium.eqiad.wmnet: nodename nor servname provided, or not known` [18:43:42] PROBLEM - IPsec on cp1067 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:43:43] PROBLEM - IPsec on cp3042 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:43:52] PROBLEM - IPsec on cp5007 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:43:52] PROBLEM - IPsec on cp1068 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:43:52] PROBLEM - IPsec on cp5009 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:43:52] PROBLEM - IPsec on cp5011 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:43:53] PROBLEM - IPsec on cp4028 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:43:53] PROBLEM - IPsec on cp4030 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:43:53] PROBLEM - IPsec on cp3030 is CRITICAL: 
Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:43:54] PROBLEM - IPsec on cp3033 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:43:54] PROBLEM - IPsec on cp1053 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:03] PROBLEM - IPsec on cp3031 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:03] PROBLEM - IPsec on cp3032 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:12] PROBLEM - IPsec on cp3041 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:12] PROBLEM - IPsec on cp1055 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:13] PROBLEM - IPsec on cp1066 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:13] PROBLEM - IPsec on cp4031 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:13] Dereckson: yeah, that needs to wait for the train [18:44:13] PROBLEM - IPsec on cp1054 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:22] PROBLEM - IPsec on cp4027 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:22] PROBLEM - IPsec on cp4032 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:22] PROBLEM - IPsec on cp4029 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:32] PROBLEM - IPsec on cp1065 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:32] PROBLEM - IPsec on cp1052 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:32] PROBLEM - IPsec on cp3040 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:32] PROBLEM - IPsec on cp3043 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:42] PROBLEM - IPsec on cp5012 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 
[18:44:42] PROBLEM - IPsec on cp5008 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:42] PROBLEM - IPsec on cp5010 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2010_v4, cp2010_v6 [18:44:42] RECOVERY - Host cp2010 is UP: PING WARNING - Packet loss = 28%, RTA = 45.80 ms [18:44:52] RECOVERY - IPsec on cp3042 is OK: Strongswan OK - 56 ESP OK [18:44:52] RECOVERY - IPsec on cp1068 is OK: Strongswan OK - 56 ESP OK [18:44:52] RECOVERY - IPsec on cp5007 is OK: Strongswan OK - 56 ESP OK [18:44:52] RECOVERY - IPsec on cp5009 is OK: Strongswan OK - 56 ESP OK [18:44:52] RECOVERY - IPsec on cp5011 is OK: Strongswan OK - 56 ESP OK [18:44:53] RECOVERY - IPsec on cp4028 is OK: Strongswan OK - 56 ESP OK [18:44:53] RECOVERY - IPsec on cp4030 is OK: Strongswan OK - 56 ESP OK [18:44:53] RECOVERY - IPsec on cp3030 is OK: Strongswan OK - 56 ESP OK [18:44:54] RECOVERY - IPsec on cp3033 is OK: Strongswan OK - 56 ESP OK [18:44:54] RECOVERY - IPsec on cp1053 is OK: Strongswan OK - 56 ESP OK [18:44:58] (03PS3) 10Gergő Tisza: Set $wgPropagateErrors to false in production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423338 (https://phabricator.wikimedia.org/T45086) [18:45:03] RECOVERY - IPsec on cp3031 is OK: Strongswan OK - 56 ESP OK [18:45:03] RECOVERY - IPsec on cp3032 is OK: Strongswan OK - 56 ESP OK [18:45:12] RECOVERY - IPsec on cp3041 is OK: Strongswan OK - 56 ESP OK [18:45:12] RECOVERY - IPsec on cp1055 is OK: Strongswan OK - 56 ESP OK [18:45:12] RECOVERY - IPsec on cp1066 is OK: Strongswan OK - 56 ESP OK [18:45:13] RECOVERY - IPsec on cp4031 is OK: Strongswan OK - 56 ESP OK [18:45:13] RECOVERY - IPsec on cp1054 is OK: Strongswan OK - 56 ESP OK [18:45:22] RECOVERY - IPsec on cp4027 is OK: Strongswan OK - 56 ESP OK [18:45:22] RECOVERY - IPsec on cp4032 is OK: Strongswan OK - 56 ESP OK [18:45:23] RECOVERY - IPsec on cp4029 is OK: Strongswan OK - 56 ESP OK [18:45:31] musikanimal: and you've in your SSH configuration something like Host 
*.eqiad.wmnet -> ProxyCommand ssh -a -W %h:%p bast1002.wikimedia.org ? [18:45:32] RECOVERY - IPsec on cp1065 is OK: Strongswan OK - 56 ESP OK [18:45:32] RECOVERY - IPsec on cp1052 is OK: Strongswan OK - 56 ESP OK [18:45:32] RECOVERY - IPsec on cp3040 is OK: Strongswan OK - 56 ESP OK [18:45:32] RECOVERY - IPsec on cp3043 is OK: Strongswan OK - 56 ESP OK [18:45:42] RECOVERY - IPsec on cp1067 is OK: Strongswan OK - 56 ESP OK [18:45:42] RECOVERY - IPsec on cp5012 is OK: Strongswan OK - 56 ESP OK [18:45:42] RECOVERY - IPsec on cp5008 is OK: Strongswan OK - 56 ESP OK [18:45:42] RECOVERY - IPsec on cp5010 is OK: Strongswan OK - 56 ESP OK [18:45:43] RECOVERY - puppet last run on cp2006 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [18:46:17] Dereckson: it's a bit of a complicated set up, but it's worked fine for 2+ years. Did something change recently? [18:46:22] yes [18:46:25] bast1001 has been decom [18:46:30] bah [18:46:36] 10Operations, 10ops-codfw, 10Traffic: cp2010 memory replacement - https://phabricator.wikimedia.org/T191225#4117960 (10Papaul) DIMM B2 replaced DIMM B6 replaced Server is back up [18:47:32] RECOVERY - Host cp2010.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.97 ms [18:47:36] Current suggested SSH config is at https://wikitech.wikimedia.org/wiki/Production_shell_access#SSH_configuration [18:47:48] thanks [18:48:18] vim ~/.ssh/config [18:48:22] dammit, sorry [18:48:38] (03PS1) 10Gergő Tisza: Enable TemplateStyles on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425112 (https://phabricator.wikimedia.org/T188198) [18:49:07] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4117967 (10awight) As mentioned in my [[ https://phabricator.wikimedia.org... 
[18:50:57] !log shutting down cp2017 for memory replacement [18:51:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:51:14] Dereckson: I see you created the DB tables for PageAssessments. Is there documentation on how to do that? I was trying to find it but couldn't. [18:51:26] okay I'm back in! thank you [18:52:09] yes, I'd like to know how to create the tables, too (`createExtensionTables.php` I think?) [18:52:35] we're going to be doing a few more identical deploys in the near future, might be a good opportunity for me to get my feet wet with deploying [18:52:51] 10Operations: build new version of mcrouter package - https://phabricator.wikimedia.org/T190979#4117975 (10aaron) [18:53:01] ACKNOWLEDGEMENT - Host cp2022 is DOWN: PING CRITICAL - Packet loss = 100% Brandon Black T191229 [18:53:03] PROBLEM - Host cp2017 is DOWN: PING CRITICAL - Packet loss = 100% [18:54:07] ACKNOWLEDGEMENT - IPsec on cp1048 is CRITICAL: Strongswan CRITICAL - ok: 62 not-conn: cp2017_v4, cp2017_v6, cp2022_v4, cp2022_v6 Brandon Black T191229 [18:54:07] ACKNOWLEDGEMENT - IPsec on cp1049 is CRITICAL: Strongswan CRITICAL - ok: 62 not-conn: cp2017_v4, cp2017_v6, cp2022_v4, cp2022_v6 Brandon Black T191229 [18:54:07] ACKNOWLEDGEMENT - IPsec on cp1050 is CRITICAL: Strongswan CRITICAL - ok: 62 not-conn: cp2017_v4, cp2017_v6, cp2022_v4, cp2022_v6 Brandon Black T191229 [18:54:07] ACKNOWLEDGEMENT - IPsec on cp1062 is CRITICAL: Strongswan CRITICAL - ok: 62 not-conn: cp2017_v4, cp2017_v6, cp2022_v4, cp2022_v6 Brandon Black T191229 [18:54:07] ACKNOWLEDGEMENT - IPsec on cp1063 is CRITICAL: Strongswan CRITICAL - ok: 62 not-conn: cp2017_v4, cp2017_v6, cp2022_v4, cp2022_v6 Brandon Black T191229 [18:54:20] stupid bot :P [18:54:24] anyway, tables are there on s7 (which you already knew) [18:54:42] so we are good to finish the deploy [18:55:10] Niharika: mwscript extensions/WikimediaMaintenance/createExtensionTables.php arwiki pageassessments [18:55:42] Niharika: it's this 
maintenance script: https://github.com/wikimedia/mediawiki-extensions-WikimediaMaintenance/blob/master/createExtensionTables.php [18:55:53] it contains the list of SQL files for popular extensions [18:56:01] (I hope all we deplo) [18:57:02] musikanimal: ok [18:58:20] 10Operations, 10ops-codfw: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191826#4117977 (10ops-monitoring-bot) [18:58:39] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable PageAssessments on arwiki (T185023) (duration: 01m 00s) [18:58:43] RECOVERY - Host cp2017 is UP: PING WARNING - Packet loss = 86%, RTA = 38.12 ms [18:58:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:58:45] T185023: Deploy PageAssessments to Arabic Wikipedia - https://phabricator.wikimedia.org/T185023 [18:58:50] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp2010.codfw.wmnet [18:58:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:58:58] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp2006.codfw.wmnet [18:59:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:59:14] 10Operations, 10ops-codfw, 10Traffic: cp[2006,2008,2010-2011,2017-2018,2022].codfw.wmnet: Uncorrectable Memory Error - https://phabricator.wikimedia.org/T190540#4117984 (10BBlack) [18:59:17] 10Operations, 10ops-codfw, 10Traffic: cp2006 memory replacement - https://phabricator.wikimedia.org/T191223#4117982 (10BBlack) 05Open>03Resolved Re-pooled into service. 
[18:59:22] RECOVERY - Host cp2017.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.78 ms [18:59:23] 10Operations, 10ops-codfw, 10Traffic: cp[2006,2008,2010-2011,2017-2018,2022].codfw.wmnet: Uncorrectable Memory Error - https://phabricator.wikimedia.org/T190540#4076372 (10BBlack) [18:59:25] 10Operations, 10ops-codfw, 10Traffic: cp2010 memory replacement - https://phabricator.wikimedia.org/T191225#4117985 (10BBlack) 05Open>03Resolved Re-pooled into service. [18:59:37] yay, all is well https://ar.wikipedia.org/wiki/%D8%AE%D8%A7%D8%B5:PageAssessments [18:59:39] thanks! [19:00:45] Niharika: https://wikitech.wikimedia.org/w/index.php?title=SWAT_deploys%2FDeployers&type=revision&diff=1787952&oldid=1786987 [19:01:02] 10Operations, 10ops-codfw, 10Traffic: cp2017 memory replacement - https://phabricator.wikimedia.org/T191227#4118003 (10Papaul) DIMM A2 replaced DIMM A6 replaced DIMM A8 replaced Server is back up [19:02:14] musikanimal: could you add the change to https://wikitech.wikimedia.org/wiki/Deployments#Monday,_April_09 ? [19:02:29] sure [19:02:33] Thanks [19:03:22] 10Operations, 10ops-codfw: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191826#4118006 (10Marostegui) [19:03:25] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4118008 (10Marostegui) [19:04:55] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4114784 (10Marostegui) I see the disk already being rebuilt Thanks @Papaul ``` physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Rebuilding) ``` [19:07:42] RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational [19:12:09] (03PS1) 10BryanDavis: wiki replicas: Display database name in maintain-views logs [puppet] - 10https://gerrit.wikimedia.org/r/425113 [19:12:42] Dereckson: is the slot still available for T169741? 
[19:12:43] T169741: Show both "edit" and "edit source" tabs/section edit links on the French Wiktionary - https://phabricator.wikimedia.org/T169741 [19:13:40] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4118043 (10Papaul) @Marostegui no problem [19:14:03] kiwi_0x010C: sure [19:15:00] (03PS2) 10Dereckson: Switch SET on frwiktionary to use wikitexteditor by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425009 (https://phabricator.wikimedia.org/T169741) (owner: 100x010C) [19:15:30] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425009 (https://phabricator.wikimedia.org/T169741) (owner: 100x010C) [19:15:37] !log sbisson@tin Started deploy [kartotherian/deploy@a26712b]: Deploying kartotherian i18n to maps-test* (with updated source and style) [19:15:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:16:47] (03Merged) 10jenkins-bot: Switch SET on frwiktionary to use wikitexteditor by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425009 (https://phabricator.wikimedia.org/T169741) (owner: 100x010C) [19:17:04] kiwi_0x010C: live on mwdebug1002 [19:17:23] !log sbisson@tin Finished deploy [kartotherian/deploy@a26712b]: Deploying kartotherian i18n to maps-test* (with updated source and style) (duration: 01m 46s) [19:17:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:17:45] kiwi_0x010C: https://wikitech.wikimedia.org/wiki/Debugging_in_production you can test your change asking your browser to send request to a specific server [19:18:52] (03CR) 10jenkins-bot: Switch SET on frwiktionary to use wikitexteditor by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/425009 (https://phabricator.wikimedia.org/T169741) (owner: 100x010C) [19:19:02] kiwi_0x010C: the easiest way is to install the Firefox or Chrome extension at https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug 
[19:20:36] Dereckson: Yep, I'll do it [19:22:50] Dereckson: Nice, it works as expected, both as an unregistered user and with my test account [19:24:32] Thanks for testing, syncing [19:24:51] Could you add the change to https://wikitech.wikimedia.org/wiki/Deployments? [19:25:07] sure [19:26:13] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Switch SET on frwiktionary to use wikitexteditor by default (T169741) (duration: 01m 00s) [19:26:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:19] T169741: Show both "edit" and "edit source" tabs/section edit links on the French Wiktionary - https://phabricator.wikimedia.org/T169741 [19:30:50] 10Operations, 10Traffic, 10Performance-Team (Radar): Support brotli compression - https://phabricator.wikimedia.org/T137979#4118113 (10Krinkle) >>! In T137979#2386228, @Krinkle wrote: >>>! At **** in June 2016: >> Global 45%: Firefox 45+, Chrome 50+, Opera 38+, Chrome for And... [19:33:08] (03PS1) 10BBlack: upload: experimental reduction of fb traffic [puppet] - 10https://gerrit.wikimedia.org/r/425119 [19:34:04] Dereckson: change added to the schedule [19:34:12] thanks for the deploy! 
[19:34:14] (03PS3) 10Awight: [DO NOT MERGE] Update ORES venv path to use versioned cache [puppet] - 10https://gerrit.wikimedia.org/r/392683 (https://phabricator.wikimedia.org/T181071) [19:35:32] (03PS4) 10Paladox: Gerrit: Add url for avatars [puppet] - 10https://gerrit.wikimedia.org/r/424708 (https://phabricator.wikimedia.org/T191183) [19:35:49] (03PS5) 10Paladox: Gerrit: Switch gc back on [puppet] - 10https://gerrit.wikimedia.org/r/421593 (https://phabricator.wikimedia.org/T190045) [19:38:07] You're welcome, thanks for testing [19:38:36] that was quick :P [19:39:38] 10Operations, 10Traffic, 10Performance-Team (Radar): Support brotli compression - https://phabricator.wikimedia.org/T137979#4118200 (10BBlack) The tricky part is this: Varnish does our compressing (which is in this case the right place to be doing it), and it compresses hittable things on their way into cac... [19:42:06] 10Operations, 10Traffic, 10Performance-Team (Radar): Support brotli compression - https://phabricator.wikimedia.org/T137979#4118215 (10BBlack) Re-reading above: probably the better blend of options would be to swap gzip for brotli in Varnish one-for-one (without the whole storing-dual-forms mess) and then h... 
[19:47:12] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 23 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [19:49:17] (03PS2) 10BBlack: upload: experimental reduction of fb traffic [puppet] - 10https://gerrit.wikimedia.org/r/425119 [19:52:12] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 7 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [19:52:35] (03PS2) 10Herron: puppetmaster: repool rhodium [puppet] - 10https://gerrit.wikimedia.org/r/425078 [19:52:50] (03PS1) 10BryanDavis: Remove 'die' command [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/425120 (https://phabricator.wikimedia.org/T191828) [19:53:19] (03CR) 10Herron: [C: 032] puppetmaster: repool rhodium [puppet] - 10https://gerrit.wikimedia.org/r/425078 (owner: 10Herron) [19:54:06] (03CR) 10BryanDavis: [C: 032] Remove 'die' command [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/425120 (https://phabricator.wikimedia.org/T191828) (owner: 10BryanDavis) [19:54:36] (03Merged) 10jenkins-bot: Remove 'die' command [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/425120 (https://phabricator.wikimedia.org/T191828) (owner: 10BryanDavis) [19:54:55] (03PS3) 10BBlack: upload: experimental reduction of fb traffic [puppet] - 10https://gerrit.wikimedia.org/r/425119 [19:56:45] (03CR) 10BBlack: [C: 032] upload: experimental reduction of fb traffic [puppet] - 10https://gerrit.wikimedia.org/r/425119 (owner: 10BBlack) [19:57:37] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp2017.codfw.wmnet [19:57:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:08] 10Operations, 10ops-codfw, 10Traffic: cp[2006,2008,2010-2011,2017-2018,2022].codfw.wmnet: Uncorrectable Memory Error - https://phabricator.wikimedia.org/T190540#4118259 (10BBlack) [19:58:11] 10Operations, 10ops-codfw, 10Traffic: cp2017 memory replacement - 
https://phabricator.wikimedia.org/T191227#4118257 (10BBlack) 05Open>03Resolved cp2017 repooled into service [19:58:49] (03PS2) 10Lokal Profil: [WIP]Add versioning for config and validate it [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425065 (https://phabricator.wikimedia.org/T163328) [19:59:50] jouncebot: now [19:59:50] No deployments scheduled for the next 0 hour(s) and 0 minute(s) [19:59:57] jouncebot: next [19:59:58] In 0 hour(s) and 0 minute(s): Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180409T2000) [20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: I, the Bot under the Fountain, allow thee, The Deployer, to do Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180409T2000). [20:00:04] No GERRIT patches in the queue for this window AFAICS. [20:00:41] 10Operations, 10Traffic, 10Performance-Team (Radar): Support brotli compression - https://phabricator.wikimedia.org/T137979#4118261 (10Gilles) For WebP [[ https://phabricator.wikimedia.org/T27611#4090235 | my proposed strategy ]] is to only offer the variant to popular thumbnails (eg. more than X hits on the... [20:01:59] !log repooled rhodium (puppet master backend) https://gerrit.wikimedia.org/r/425078 [20:02:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:02:54] nothing to deploy for mobileapps [20:04:35] ORES is doing a bit of deployment, we’ll notify here if anything looks unusual. 
[20:08:20] 10Operations, 10ops-codfw: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191836#4118299 (10ops-monitoring-bot) [20:12:21] !log awight@tin Started deploy [ores/deploy@b61c338]: Transitional virtualenv for ORES, T181071 [20:12:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:28] T181071: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071 [20:12:39] !log awight@tin Finished deploy [ores/deploy@b61c338]: Transitional virtualenv for ORES, T181071 (duration: 00m 19s) [20:12:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:38] !log awight@tin Started deploy [ores/deploy@be69c1d]: Transitional virtualenv for ORES, T181071 [20:14:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:16:42] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4118365 (10Marostegui) @Papaul this disk failed, was it a used one? Can you try another one? ``` physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Failed) ``` Thanks [20:17:18] 10Operations, 10ops-codfw: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191836#4118379 (10Marostegui) [20:17:22] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4118381 (10Marostegui) [20:19:01] 10Operations, 10Patch-For-Review, 10Scoring-platform-team (Current): Remove deprecated hosts from ORES scap config - https://phabricator.wikimedia.org/T191321#4118391 (10awight) This worked. 
[20:21:21] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4118408 (10Papaul) Ok will do once back at the DC in the AM [20:21:36] !log arlolra@tin Started deploy [parsoid/deploy@447fab2]: Updating Parsoid to edeeb60 [20:21:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:22:10] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 11 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#4118412 (10Fjalapeno) [20:25:11] (03PS3) 10Lokal Profil: Add versioning for config and validate it [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425065 (https://phabricator.wikimedia.org/T163328) [20:25:32] RECOVERY - Check systemd state on restbase-dev1005 is OK: OK - running: The system is fully operational [20:26:13] (03CR) 10Lokal Profil: [C: 032] "Makes sense." [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425038 (https://phabricator.wikimedia.org/T163328) (owner: 10Hoo man) [20:26:41] (03Merged) 10jenkins-bot: Make DCAT backwards compatible to old config [dumps/dcat] - 10https://gerrit.wikimedia.org/r/425038 (https://phabricator.wikimedia.org/T163328) (owner: 10Hoo man) [20:27:42] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2069 - https://phabricator.wikimedia.org/T191720#4118426 (10Marostegui) Thanks! [20:27:46] (03CR) 10Lokal Profil: "I currently set the "you should update" level to `E_USER_WARNING`, I have no idea what a suitable level would be though." 
[dumps/dcat] - 10https://gerrit.wikimedia.org/r/425065 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [20:28:32] 10Operations, 10Release-Engineering-Team, 10Scap, 10Scoring-platform-team: Deployment git server can't supply ORES hosts in parallel - https://phabricator.wikimedia.org/T191842#4118429 (10awight) [20:31:28] XioNoX: no, never if it's not bundled out of the box we may try to add to it if you think it's useful [20:32:38] !log arlolra@tin Finished deploy [parsoid/deploy@447fab2]: Updating Parsoid to edeeb60 (duration: 11m 03s) [20:32:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:33:03] dcausse: it's already there [20:33:11] (03CR) 10Pnorman: [C: 04-1] "There's an escaped quote after JAVACMD_OPTIONS but no escaped quote at the end of the string" [puppet] - 10https://gerrit.wikimedia.org/r/424247 (https://phabricator.wikimedia.org/T190193) (owner: 10Gehel) [20:33:17] ok :) [20:34:07] ayounsi@logstash1007:/usr/share/logstash$ bin/logstash-plugin list --verbose logstash-filter-geoip [20:34:07] logstash-filter-geoip (4.2.1) [20:34:39] nice [20:38:47] (03CR) 10Lokal Profil: "Might be worth holding of for https://gerrit.wikimedia.org/r/#/c/425065/." [puppet] - 10https://gerrit.wikimedia.org/r/424291 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [20:38:52] !log awight@tin Finished deploy [ores/deploy@be69c1d]: Transitional virtualenv for ORES, T181071 (duration: 24m 14s) [20:38:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:58] T181071: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071 [20:47:28] 10Operations, 10Cloud-Services, 10cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4118476 (10bd808) Discussed briefly in the 2018-04-09 SRE team meeting. @RobH mentioned that he would look into getting a quote from HP on a RAID card that can sup... 
[20:48:08] !log Updated Parsoid to edeeb60 (T191281, T187386, T185266) [20:48:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:48:17] T185266: INVALID_CHARACTER_ERR (5): the string contains invalid characters - https://phabricator.wikimedia.org/T185266 [20:48:17] T187386: Cannot read property 'unshift' of undefined - https://phabricator.wikimedia.org/T187386 [20:48:17] T191281: VisualEditor breaks galleries with
in image captions - https://phabricator.wikimedia.org/T191281 [20:49:31] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4118493 (10awight) Update: I deployed as a safe migration, and the new vir... [20:55:56] (03PS1) 10Dzahn: installserver: add monitoring for TFTP [puppet] - 10https://gerrit.wikimedia.org/r/425131 (https://phabricator.wikimedia.org/T190439) [20:57:02] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1 [21:00:04] bawolff and Reedy: How many deployers does it take to do Weekly Security deployment window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180409T2100). [21:00:04] No GERRIT patches in the queue for this window AFAICS. 
[21:00:24] (03CR) 10Dzahn: [C: 032] installserver: add monitoring for TFTP [puppet] - 10https://gerrit.wikimedia.org/r/425131 (https://phabricator.wikimedia.org/T190439) (owner: 10Dzahn) [21:00:39] (03PS2) 10Dzahn: installserver: add monitoring for TFTP [puppet] - 10https://gerrit.wikimedia.org/r/425131 (https://phabricator.wikimedia.org/T190439) [21:04:04] (03CR) 10Lokal Profil: "and `"nt": "application/n-triples",` should probably be added to `ld-info['mediatype']` as that now exists" [puppet] - 10https://gerrit.wikimedia.org/r/424291 (https://phabricator.wikimedia.org/T163328) (owner: 10Lokal Profil) [21:07:52] (03PS6) 10Paladox: Gerrit: Switch gc back on [puppet] - 10https://gerrit.wikimedia.org/r/421593 (https://phabricator.wikimedia.org/T190045) [21:10:06] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1 [21:11:09] !log cr1-eqsin 24h experiment on applying same local-pref to peers and transits - T186835 [21:11:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:20:28] (03PS1) 10Rush: openstack: codify default for failover for l3-agent [puppet] - 10https://gerrit.wikimedia.org/r/425198 [21:22:02] (03PS2) 10Rush: openstack: codify default for failover for l3-agent [puppet] - 10https://gerrit.wikimedia.org/r/425198 [21:22:48] (03CR) 10Rush: [C: 032] openstack: codify default for failover for l3-agent [puppet] - 10https://gerrit.wikimedia.org/r/425198 (owner: 10Rush) [21:26:06] 10Operations, 10cloud-services-team (Kanban): rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4118593 (10bd808) [21:36:17] (03CR) 10Paladox: "tested locally, and puppet ran successfully and also gerrit restarted correctly" [puppet] - 10https://gerrit.wikimedia.org/r/421593 (https://phabricator.wikimedia.org/T190045) (owner: 
10Paladox) [21:46:44] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4118685 (10awight) I can't tell whether the fetch check script is failing,... [21:49:44] (03PS1) 10Catrope: Update SSH key for etonkovidova [puppet] - 10https://gerrit.wikimedia.org/r/425200 [21:50:22] (03PS2) 10Catrope: Update SSH key for etonkovidova [puppet] - 10https://gerrit.wikimedia.org/r/425200 [21:50:55] XioNoX: Is the channel topic right that you're on clinic duty this week? If so, could I grab a +2 on https://gerrit.wikimedia.org/r/425200 ? [21:51:20] 10Operations, 10monitoring, 10Patch-For-Review: add tftpd monitoring - https://phabricator.wikimedia.org/T190439#4118701 (10Dzahn) 05stalled>03Resolved - converted puppet role to profile - re-added monitoring section to profile (now the style check is happy about that) - appears here now again: https://... [21:51:28] RoanKattouw: sounds good [21:51:33] (03CR) 10Ayounsi: [C: 032] Update SSH key for etonkovidova [puppet] - 10https://gerrit.wikimedia.org/r/425200 (owner: 10Catrope) [21:51:36] Yay thanks [21:51:47] Once it's deployed, could you force a puppet run on bast4001 and stat1006? [21:52:01] sure [21:52:15] Thanks [21:56:13] RoanKattouw: done [22:02:22] XioNoX: Yay thanks, that worked [22:14:26] (03PS1) 10BryanDavis: toolforge: add mr (Marathi) language pack and locale [puppet] - 10https://gerrit.wikimedia.org/r/425202 (https://phabricator.wikimedia.org/T191727) [22:15:08] 10Operations, 10ops-codfw, 10DC-Ops, 10hardware-requests: Decommission restbase-test200[123] - https://phabricator.wikimedia.org/T187447#4118753 (10RobH) a:05RobH>03Papaul I've removed the descriptions from those switch ports. @papaul: moving forward, you should be aware that the order of operation of... 
[22:15:33] 10Operations, 10ops-codfw, 10DC-Ops, 10hardware-requests: Decommission restbase-test200[123] - https://phabricator.wikimedia.org/T187447#4118756 (10RobH) [22:16:09] (03CR) 10BryanDavis: "Added mutante as a reviewer since it looks like he is one of the few people who has handled a request like T191727 before." [puppet] - 10https://gerrit.wikimedia.org/r/425202 (https://phabricator.wikimedia.org/T191727) (owner: 10BryanDavis) [22:33:57] (03PS2) 10Bstorm: wiki replicas: Display database name in maintain-views logs [puppet] - 10https://gerrit.wikimedia.org/r/425113 (owner: 10BryanDavis) [22:38:03] (03CR) 10Bstorm: [C: 032] wiki replicas: Display database name in maintain-views logs [puppet] - 10https://gerrit.wikimedia.org/r/425113 (owner: 10BryanDavis) [23:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Evening SWAT (Max 8 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180409T2300). [23:00:04] No GERRIT patches in the queue for this window AFAICS. [23:03:59] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4118918 (10awight) Do we have to install the `python3-setuptools` package?... [23:17:01] jdlrobson: ping for https://gerrit.wikimedia.org/r/#/c/421184/ ? [23:31:21] hmm i just got this error with gerrit.wikimedia.org when using polygerrit "Failed to load resource: The network connection was lost. _handleNetworkError — gr-app.js:1794:453" [23:31:27] Unhandled Promise Rejection: TypeError: The network connection was lost. [23:31:58] Failed to load resource: The network connection was lost. 
https://gerrit.wikimedia.org/r/changes/425111/revisions/1/files/includes%2FPF_AutoeditAPI.php/reviewed [23:36:21] 10Puppet, 10Beta-Cluster-Infrastructure, 10Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#4118975 (10EddieGP) [23:36:24] 10Puppet, 10Beta-Cluster-Infrastructure: Puppet broken on deployment-cache-text04 due to varnishkafka issues - https://phabricator.wikimedia.org/T184234#4118973 (10EddieGP) 05Open>03Resolved Seems fixed. ``` eddie@deployment-cache-text04:~$ sudo puppet agent -tv Info: Using configured environment 'product... [23:37:17] (03CR) 10Smalyshev: "So this defines only internal per-cluster services, not generic endpoint?" [dns] - 10https://gerrit.wikimedia.org/r/424587 (https://phabricator.wikimedia.org/T187766) (owner: 10Gehel) [23:39:16] 10Puppet, 10Beta-Cluster-Infrastructure, 10Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#4118983 (10EddieGP) [23:50:42] 10Puppet, 10Beta-Cluster-Infrastructure: Error: Could not find class role::kafka::jumbo::mirror for deployment-kafka0[45] - https://phabricator.wikimedia.org/T191154#4119011 (10EddieGP) 05Open>03Resolved Puppet is fine on both hosts now. so this seems resolved. Thanks @Ottomata! [23:57:24] (03PS1) 10Catrope: tilerator: Add sudo rule for tileratorui [puppet] - 10https://gerrit.wikimedia.org/r/425208 [23:57:53] (03CR) 10jerkins-bot: [V: 04-1] tilerator: Add sudo rule for tileratorui [puppet] - 10https://gerrit.wikimedia.org/r/425208 (owner: 10Catrope)