[00:00:04] twentyafterfour: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180405T0000).
[00:00:04] No GERRIT patches in the queue for this window AFAICS.
[00:02:00] Puppet, Beta-Cluster-Infrastructure: Error: Could not find class role::kafka::jumbo::mirror for deployment-kafka0[45] - https://phabricator.wikimedia.org/T191154#4095542 (Ottomata) Yar, just tried to fix this, but ran into T191232 (again) on the way, so puppet is broken again.
[00:16:22] (CR) Ottomata: [C: 1] rm mtizzoni,panisson,paolotti,ciro from analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/423711 (https://phabricator.wikimedia.org/T189341) (owner: Herron)
[00:25:03] (PS22) Ottomata: [WIP] Set jmx exporter instance common labels at host level [puppet] - https://gerrit.wikimedia.org/r/423931
[00:25:40] (CR) jerkins-bot: [V: -1] [WIP] Set jmx exporter instance common labels at host level [puppet] - https://gerrit.wikimedia.org/r/423931 (owner: Ottomata)
[00:28:08] (PS23) Ottomata: [WIP] Set jmx exporter instance common labels at host level [puppet] - https://gerrit.wikimedia.org/r/423931
[00:28:44] (CR) jerkins-bot: [V: -1] [WIP] Set jmx exporter instance common labels at host level [puppet] - https://gerrit.wikimedia.org/r/423931 (owner: Ottomata)
[00:39:13] (PS24) Ottomata: [WIP] Set jmx exporter instance common labels at host level [puppet] - https://gerrit.wikimedia.org/r/423931
[00:41:39] Operations, ops-codfw, Traffic: cp2018 memory replacement - https://phabricator.wikimedia.org/T191228#4107330 (Papaul) Hello Papaul, Thank you for replying. We have created the following cases: 1. 7M99F42 - 963059814 – 351594756 (Dispatch) ----- Motherboard and 2 DIMMs 2. 7M6CF42 - 963061588 – 35...
[00:47:33] Operations, ops-codfw, Traffic: cp2008 memory replacement - https://phabricator.wikimedia.org/T191224#4107331 (Papaul) Hello Papaul, Thank you for replying. We have created the following cases: 1. 7M99F42 - 963059814 – 351594756 (Dispatch) ----- Motherboard and 2 DIMMs 2. 7M6CF42 - 963061588 – 35...
[00:59:00] (PS25) Ottomata: Set jmx exporter instance common labels at host level [puppet] - https://gerrit.wikimedia.org/r/423931
[00:59:36] (CR) jerkins-bot: [V: -1] Set jmx exporter instance common labels at host level [puppet] - https://gerrit.wikimedia.org/r/423931 (owner: Ottomata)
[01:01:59] (PS26) Ottomata: Set jmx exporter instance common labels at host level [puppet] - https://gerrit.wikimedia.org/r/423931
[01:09:31] (CR) Ottomata: "Ok, I'm pretty sure this works!" [puppet] - https://gerrit.wikimedia.org/r/423931 (owner: Ottomata)
[01:14:41] Operations, ops-codfw, Traffic: cp2010 memory replacement - https://phabricator.wikimedia.org/T191225#4107347 (Papaul) Your Service Request SR#: 963061588 Contact Us | Support Library | Download Center | SupportAssist | Community Forums Dear PAPAUL TSHIBAMBA, Current Status: This e-mail serves as c...
[01:16:23] Operations, ops-codfw, Traffic: cp2011 memory replacement - https://phabricator.wikimedia.org/T191226#4107348 (Papaul) Hello Papaul, Thank you for replying. We have created the following cases: 7M99F42 - 963059814 – 351594756 (Dispatch) ----- Motherboard and 2 DIMMs 7M6CF42 - 963061588 – 351604843...
[02:16:22] Operations, Ops-Access-Requests: Requesting access to shell (snapshot, dumpsdata) for springle - https://phabricator.wikimedia.org/T191478#4107362 (Springle) Updated to a non-wmf email in phabricator profile settings.
[02:28:16] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.27) (duration: 07m 06s)
[02:28:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:58:48] PROBLEM - Request latencies on argon is CRITICAL: CRITICAL - scalar( sum(rate(apiserver_request_latencies_summary_sum{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.32.133:6443}[5m]))/ sum(rate(apiserver_request_latencies_summary_count{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.32.133:6443}[5m]))): 83729477.10319765 = 100000.0 https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[03:00:48] RECOVERY - Request latencies on argon is OK: OK - scalar( sum(rate(apiserver_request_latencies_summary_sum{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.32.133:6443}[5m]))/ sum(rate(apiserver_request_latencies_summary_count{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.32.133:6443}[5m]))) within thresholds https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[03:08:48] PROBLEM - Request latencies on argon is CRITICAL: CRITICAL - scalar( sum(rate(apiserver_request_latencies_summary_sum{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.32.133:6443}[5m]))/ sum(rate(apiserver_request_latencies_summary_count{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.32.133:6443}[5m]))): 104812514.30858809 = 100000.0 https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[03:12:28] PROBLEM - Request latencies on chlorine is CRITICAL: CRITICAL - scalar( sum(rate(apiserver_request_latencies_summary_sum{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.0.45:6443}[5m]))/ sum(rate(apiserver_request_latencies_summary_count{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.0.45:6443}[5m]))): 39787553.07044199 = 100000.0 https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[03:14:28] RECOVERY - Request latencies on chlorine is OK: OK - scalar( sum(rate(apiserver_request_latencies_summary_sum{
job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.0.45:6443}[5m]))/ sum(rate(apiserver_request_latencies_summary_count{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.0.45:6443}[5m]))) within thresholds https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[03:14:49] RECOVERY - Request latencies on argon is OK: OK - scalar( sum(rate(apiserver_request_latencies_summary_sum{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.32.133:6443}[5m]))/ sum(rate(apiserver_request_latencies_summary_count{ job=k8s-api,verb!=WATCH,verb!=WATCHLIST,instance=10.64.32.133:6443}[5m]))) within thresholds https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[03:47:58] PROBLEM - MariaDB Slave Lag: s7 on db2087 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 345.47 seconds
[03:48:08] PROBLEM - MariaDB Slave Lag: s7 on db2061 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 347.05 seconds
[03:48:08] PROBLEM - MariaDB Slave Lag: s7 on db2047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 349.45 seconds
[03:48:09] PROBLEM - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 351.32 seconds
[03:48:18] PROBLEM - MariaDB Slave Lag: s7 on db2040 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 354.89 seconds
[03:48:18] PROBLEM - MariaDB Slave Lag: s7 on db2077 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 355.18 seconds
[03:48:28] PROBLEM - MariaDB Slave Lag: s7 on db2086 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 357.93 seconds
[03:48:48] PROBLEM - MariaDB Slave Lag: s7 on db2054 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 366.95 seconds
[04:16:28] RECOVERY - MariaDB Slave Lag: s7 on db2068 is OK: OK slave_sql_lag Replication lag: 55.06 seconds
[04:16:29] RECOVERY - MariaDB Slave Lag: s7 on db2040 is OK: OK slave_sql_lag Replication lag: 47.26 seconds
[04:16:38] RECOVERY - MariaDB Slave Lag: s7 on db2077 is OK: OK slave_sql_lag Replication lag: 40.75 seconds
[04:16:39] RECOVERY - MariaDB
Slave Lag: s7 on db2086 is OK: OK slave_sql_lag Replication lag: 3.25 seconds
[04:16:59] RECOVERY - MariaDB Slave Lag: s7 on db2054 is OK: OK slave_sql_lag Replication lag: 0.44 seconds
[04:17:18] RECOVERY - MariaDB Slave Lag: s7 on db2087 is OK: OK slave_sql_lag Replication lag: 0.27 seconds
[04:17:18] RECOVERY - MariaDB Slave Lag: s7 on db2061 is OK: OK slave_sql_lag Replication lag: 0.30 seconds
[04:17:19] RECOVERY - MariaDB Slave Lag: s7 on db2047 is OK: OK slave_sql_lag Replication lag: 0.21 seconds
[05:26:48] PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - CRITICAL - api-https_443: Servers mw2201.codfw.wmnet, mw2202.codfw.wmnet, mw2142.codfw.wmnet, mw2145.codfw.wmnet, mw2206.codfw.wmnet, mw2135.codfw.wmnet, mw2289.codfw.wmnet, mw2138.codfw.wmnet, mw2215.codfw.wmnet, mw2136.codfw.wmnet, mw2141.codfw.wmnet are marked down but pooled
[05:27:48] RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy
[05:36:38] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/424192
[05:36:40] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/424192
[05:39:18] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/424192 (owner: Marostegui)
[05:40:32] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/424192 (owner: Marostegui)
[05:40:51] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1099:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/424192 (owner: Marostegui)
[05:42:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1099:3311 after alter table (duration: 01m 18s)
[05:42:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:43:30] (PS1) Marostegui: db-eqiad.php:
Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424194 (https://phabricator.wikimedia.org/T187089)
[05:45:50] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424194 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[05:47:12] (Merged) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424194 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[05:48:41] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1089 for alter table (duration: 01m 16s)
[05:48:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:33] (CR) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424194 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[05:52:18] !log Deploy schema change on db1089 - T187089 T185128 T153182
[05:52:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:52:26] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089
[05:52:26] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
[05:52:26] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
[05:57:09] (PS1) Marostegui: db2058.yaml: Change binlog format [puppet] - https://gerrit.wikimedia.org/r/424201 (https://phabricator.wikimedia.org/T191275)
[05:58:06] (CR) Marostegui: [C: 2] db2058.yaml: Change binlog format [puppet] - https://gerrit.wikimedia.org/r/424201 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[05:58:44] !log Restart MySQL on db2058 to change its binlog to STATEMENT - T191275
[05:58:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:58:50]
T191275: Prepare and indicate proper master db failover candidates for all codfw database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T191275
[06:03:32] (PS1) Marostegui: db-codfw.php: db2058 is now a candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424203 (https://phabricator.wikimedia.org/T191275)
[06:05:00] (CR) Marostegui: [C: 2] db-codfw.php: db2058 is now a candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424203 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:06:13] (Merged) jenkins-bot: db-codfw.php: db2058 is now a candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424203 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:08:06] !log marostegui@tin Synchronized wmf-config/db-codfw.php: db2058 is now a candidate master for s4 - T191275 (duration: 01m 16s)
[06:08:11] (CR) jenkins-bot: db-codfw.php: db2058 is now a candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424203 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:12] T191275: Prepare and indicate proper master db failover candidates for all codfw database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T191275
[06:26:28] PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - CRITICAL - api-https_443: Servers mw2147.codfw.wmnet, mw2286.codfw.wmnet, mw2205.codfw.wmnet, mw2252.codfw.wmnet, mw2143.codfw.wmnet, mw2207.codfw.wmnet, mw2283.codfw.wmnet, mw2135.codfw.wmnet, mw2289.codfw.wmnet, mw2221.codfw.wmnet, mw2287.codfw.wmnet, mw2214.codfw.wmnet, mw2285.codfw.wmnet, mw2220.codfw.wmnet, mw2290.codfw.wmnet, mw2208.codfw.wmnet
[06:26:28] t, mw2210.codfw.wmnet, mw2262.codfw.wmnet are marked down but pooled
[06:27:29] RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy
[06:29:49]
PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/smartmontools/run.d/20logger]
[06:32:48] PROBLEM - puppet last run on labvirt1014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/nova/policy.json]
[06:35:51] (PS1) Marostegui: db2038.yaml: Change binlog format and shard [puppet] - https://gerrit.wikimedia.org/r/424206 (https://phabricator.wikimedia.org/T191275)
[06:38:10] (PS1) Marostegui: db-codfw.php: Depool db2038 [mediawiki-config] - https://gerrit.wikimedia.org/r/424207 (https://phabricator.wikimedia.org/T191275)
[06:39:47] (CR) Marostegui: [C: 2] db-codfw.php: Depool db2038 [mediawiki-config] - https://gerrit.wikimedia.org/r/424207 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:40:24] * elukey checks lvs2003
[06:41:02] (Merged) jenkins-bot: db-codfw.php: Depool db2038 [mediawiki-config] - https://gerrit.wikimedia.org/r/424207 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:41:16] (CR) jenkins-bot: db-codfw.php: Depool db2038 [mediawiki-config] - https://gerrit.wikimedia.org/r/424207 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:43:02] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2038 (duration: 01m 17s)
[06:43:06] !log Stop MySQL on db2038 to change binlog format, upgrade mariadb and kernel
[06:43:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:21] Operations, Traffic, netops, Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090#4062522 (ema) >>!
In T190090#4106972, @ayounsi wrote: > About kernel tuning, here are the variables we can adjust as necessary, with their default. > ``` > 50 -- /proc/sys/...
[06:44:28] (CR) Marostegui: [C: 2] db2038.yaml: Change binlog format and shard [puppet] - https://gerrit.wikimedia.org/r/424206 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:46:13] Operations, ops-eqiad, DBA, Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#4107519 (jcrespo) a:jcrespo Don't worry, I can boot into RAID manager and do it myself. Thanks!
[06:48:30] Operations, ops-eqiad, DBA, Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#4107530 (jcrespo) I moved you into "Blocked" because I don't see a better option (you do not have, like us an "All is done in ou...
[06:53:25] (PS1) Marostegui: db-codfw.php: Repool db2038 [mediawiki-config] - https://gerrit.wikimedia.org/r/424210 (https://phabricator.wikimedia.org/T191275)
[06:55:27] (CR) Marostegui: [C: 2] db-codfw.php: Repool db2038 [mediawiki-config] - https://gerrit.wikimedia.org/r/424210 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:56:40] (Merged) jenkins-bot: db-codfw.php: Repool db2038 [mediawiki-config] - https://gerrit.wikimedia.org/r/424210 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:57:48] RECOVERY - puppet last run on labvirt1014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:06] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2038 (duration: 01m 13s)
[06:58:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:31] (CR) jenkins-bot: db-codfw.php: Repool db2038 [mediawiki-config] - https://gerrit.wikimedia.org/r/424210
(https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[06:59:49] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:03:19] (PS1) Marostegui: db2053.yaml: Change binlog format [puppet] - https://gerrit.wikimedia.org/r/424211 (https://phabricator.wikimedia.org/T191275)
[07:03:41] Operations, Ops-Access-Requests: Access to stat100x and notebook1003.eqiad.wmnet for Jonas Kress - https://phabricator.wikimedia.org/T191308#4107552 (Jonas) In my role as PM for Wikidata mobile I would like to access the relevant data (revisions, page views,... ) I already have access to the cluster and...
[07:04:11] (PS1) Marostegui: db-codfw.php: db2053 is candidate master for s6 [mediawiki-config] - https://gerrit.wikimedia.org/r/424212 (https://phabricator.wikimedia.org/T191275)
[07:04:39] (CR) Marostegui: [C: 2] db2053.yaml: Change binlog format [puppet] - https://gerrit.wikimedia.org/r/424211 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[07:05:09] !log Restart MySQL on db2053 for binlog format change
[07:05:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:33] Puppet, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-prep down hosts - fix/remove? - https://phabricator.wikimedia.org/T191293#4107582 (MoritzMuehlenhoff) @EddieGP: I can confirm that deployment-videoscaler01 is unused, it was setup to fix/test compatibility problems with Debian...
[07:17:55] (CR) Marostegui: [C: 2] db-codfw.php: db2053 is candidate master for s6 [mediawiki-config] - https://gerrit.wikimedia.org/r/424212 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[07:19:09] (Merged) jenkins-bot: db-codfw.php: db2053 is candidate master for s6 [mediawiki-config] - https://gerrit.wikimedia.org/r/424212 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[07:20:09] (PS2) Muehlenhoff: Update SSH key of Stas Malyshev [puppet] - https://gerrit.wikimedia.org/r/423841
[07:20:21] (CR) jenkins-bot: db-codfw.php: db2053 is candidate master for s6 [mediawiki-config] - https://gerrit.wikimedia.org/r/424212 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui)
[07:20:34] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2053 as candidate master (duration: 01m 09s)
[07:20:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:18] (PS1) Marostegui: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/424215 (https://phabricator.wikimedia.org/T187089)
[07:23:33] (PS1) ArielGlenn: update list of active dumps/datasets mirrors [puppet] - https://gerrit.wikimedia.org/r/424216
[07:24:27] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/424215 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[07:25:59] (Merged) jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/424215 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[07:27:12] (CR) Muehlenhoff: [C: 2] Update SSH key of Stas Malyshev [puppet] - https://gerrit.wikimedia.org/r/423841 (owner: Muehlenhoff)
[07:27:37] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 for alter table (duration: 01m 16s)
[07:27:42] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:07] (CR) jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/424215 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[07:28:40] (CR) ArielGlenn: "Left only the sites that are reachable and with current files as 'active=yes'." [puppet] - https://gerrit.wikimedia.org/r/424216 (owner: ArielGlenn)
[07:33:12] Puppet, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-prep down hosts - fix/remove? - https://phabricator.wikimedia.org/T191293#4107626 (EddieGP) Thanks. I've seen access problems on a few other tasks related to beta breakage I was poking; afaik the removal was not intentional (an...
[07:33:28] !log Deploy schema change on db1105:3311 - T187089 T185128 T153182
[07:33:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:33:35] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089
[07:33:35] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
[07:33:35] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
[07:36:41] !log Stop MySQL on db1089 for kernel and mariadb upgrade
[07:36:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:28] greetings
[07:43:00] Operations, Cloud-Services, Datasets-General-or-Unknown: Adjust bandwidth/connection limits, memory settings on labstore1006,7 as appropriate - https://phabricator.wikimedia.org/T191491#4107651 (ArielGlenn) p:Triage>Normal
[07:43:16] Operations, Cloud-Services, Datasets-General-or-Unknown, User-ArielGlenn: Adjust bandwidth/connection limits, memory settings on labstore1006,7 as appropriate - https://phabricator.wikimedia.org/T191491#4107640 (ArielGlenn)
[07:44:26] !log
upgrading openjdk-7 on contint*
[07:44:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:50] (PS1) Marostegui: db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424226
[07:47:17] (PS1) Jcrespo: mariadb: Repool es2015, depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/424227 (https://phabricator.wikimedia.org/T153440)
[07:48:25] (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424226 (owner: Marostegui)
[07:49:23] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424226 (owner: Marostegui)
[07:51:20] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1089 after alter table, mariadb and kernel upgrade (duration: 01m 16s)
[07:51:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:47] !log removed unused/defunct deployment-tmh01 from deployment-prep (T191293)
[07:51:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:54] T191293: deployment-prep down hosts - fix/remove? - https://phabricator.wikimedia.org/T191293
[07:52:13] !log removed unused/defunct deployment-videoscaler01 from deployment-prep (T191293)
[07:52:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:52:43] (PS2) Jcrespo: mariadb: Repool es2015, depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/424227 (https://phabricator.wikimedia.org/T153440)
[07:55:15] Puppet, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-prep down hosts - fix/remove?
- https://phabricator.wikimedia.org/T191293#4107677 (MoritzMuehlenhoff)
[07:57:53] Operations, Beta-Cluster-Infrastructure, HHVM: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#4107687 (MoritzMuehlenhoff)
[08:00:53] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424226 (owner: Marostegui)
[08:02:54] Operations, Graphite, Services (watching), User-fgiunchedi: Cassandra Graphite metrics space usage audit and cleanup - https://phabricator.wikimedia.org/T191315#4107704 (fgiunchedi) Ack, thanks @mobrovac @Eevans ! We'll pick this up again when decom time comes for graphite-cassandra machines.
[08:04:58] (CR) Filippo Giunchedi: [C: 1] Prometheus: aggregates netstat_Icmp_In and InEcho by cluster [puppet] - https://gerrit.wikimedia.org/r/424139 (owner: Ayounsi)
[08:07:02] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424229
[08:08:38] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424229 (owner: Marostegui)
[08:09:53] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424229 (owner: Marostegui)
[08:09:58] (PS3) Jcrespo: mariadb: Repool es2015, depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/424227 (https://phabricator.wikimedia.org/T153440)
[08:10:09] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424229 (owner: Marostegui)
[08:10:13] (CR) Filippo Giunchedi: "LGTM from a quick look, though having PCC results would help" [puppet] - https://gerrit.wikimedia.org/r/419339 (https://phabricator.wikimedia.org/T189050) (owner: Dzahn)
[08:11:08] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed
87 probes of 302 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[08:11:27] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1089 (duration: 01m 16s)
[08:11:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:00] (CR) Jcrespo: "I will create a second backup of the 3rd cluster, with that we will have a recent offline copy of all of es. This could be done regularly " [mediawiki-config] - https://gerrit.wikimedia.org/r/424227 (https://phabricator.wikimedia.org/T153440) (owner: Jcrespo)
[08:12:42] (CR) Jcrespo: [C: 2] mariadb: Repool es2015, depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/424227 (https://phabricator.wikimedia.org/T153440) (owner: Jcrespo)
[08:13:55] (Merged) jenkins-bot: mariadb: Repool es2015, depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/424227 (https://phabricator.wikimedia.org/T153440) (owner: Jcrespo)
[08:14:29] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 37 probes of 303 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[08:14:43] (CR) jenkins-bot: mariadb: Repool es2015, depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/424227 (https://phabricator.wikimedia.org/T153440) (owner: Jcrespo)
[08:15:58] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 168 probes of 303 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[08:16:09] !log jynus@tin Synchronized wmf-config/db-codfw.php: Repool es2015, depool es2019 (duration: 01m 16s)
[08:16:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:22] mhh there is zayo maint ongoing too
[08:16:29] (re: ripe atlas probes)
[08:17:07] paravoid mark akosiaris XioNoX ^
[08:17:15] (PS2) Gehel: Fix assorted bugs in process-osm-data script for new schema [puppet] - https://gerrit.wikimedia.org/r/424170
(https://phabricator.wikimedia.org/T191345) (owner: Pnorman)
[08:17:27] what's up?
[08:18:01] (CR) Gehel: [C: 2] Fix assorted bugs in process-osm-data script for new schema [puppet] - https://gerrit.wikimedia.org/r/424170 (https://phabricator.wikimedia.org/T191345) (owner: Pnorman)
[08:18:31] paravoid: ripe atlas alert for eqiad/codfw/ulsfo
[08:18:51] in particular eqiad has been half the probes
[08:19:34] (CR) Elukey: [C: 2] Add the -skipTrash option to hdfs -rm [puppet/cdh] - https://gerrit.wikimedia.org/r/423845 (https://phabricator.wikimedia.org/T189051) (owner: Elukey)
[08:20:12] looks like it is recovering tho
[08:21:52] (PS1) Elukey: Update cdh module [puppet] - https://gerrit.wikimedia.org/r/424232
[08:22:50] (CR) Elukey: [C: 2] Update cdh module [puppet] - https://gerrit.wikimedia.org/r/424232 (owner: Elukey)
[08:22:55] indeed, recovered
[08:23:49] !log installing net-snmp security updates on jessie (stretch not affected)
[08:23:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:25] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424234
[08:27:10] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424234 (owner: Marostegui)
[08:28:24] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424234 (owner: Marostegui)
[08:28:37] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424234 (owner: Marostegui)
[08:29:52] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1089 (duration: 01m 16s)
[08:29:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:12] Puppet, Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-prep down
hosts - fix/remove? - https://phabricator.wikimedia.org/T191293#4107740 (EddieGP) Open>Resolved a:EddieGP puppetdb is T187736, let's call this resolved
[08:30:19] !log starting backup of es2019, it may create lag T153440
[08:30:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:26] T153440: Create a full backup of all external storage records that would be easy to restore/setup a temporary delayed slave - https://phabricator.wikimedia.org/T153440
[08:35:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 303 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[08:36:08] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 15 probes of 302 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[08:38:45] (PS1) Elukey: role::analytics_cluster::hadoop:master|standby: enable HDFS trash [puppet] - https://gerrit.wikimedia.org/r/424237 (https://phabricator.wikimedia.org/T189051)
[08:39:19] (PS1) Muehlenhoff: Enable base::service_auto_restart for prometheus-postgres-exporter [puppet] - https://gerrit.wikimedia.org/r/424238 (https://phabricator.wikimedia.org/T135991)
[08:39:29] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 12 probes of 303 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[08:39:43] (CR) jerkins-bot: [V: -1] Enable base::service_auto_restart for prometheus-postgres-exporter [puppet] - https://gerrit.wikimedia.org/r/424238 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[08:40:37] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424239
[08:41:04] (PS2) Muehlenhoff: Enable base::service_auto_restart for prometheus-postgres-exporter [puppet] - https://gerrit.wikimedia.org/r/424238 (https://phabricator.wikimedia.org/T135991)
[08:42:19] (CR) Marostegui: [C: 2]
db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424239 (owner: Marostegui) [08:43:32] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424239 (owner: Marostegui) [08:43:47] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/424239 (owner: Marostegui) [08:44:27] (PS2) Elukey: role::analytics_cluster::hadoop:master|standby: enable HDFS trash [puppet] - https://gerrit.wikimedia.org/r/424237 (https://phabricator.wikimedia.org/T189051) [08:45:07] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1089 (duration: 01m 17s) [08:45:11] (CR) EddieGP: [C: -1] "Please either build the package for stretch or don't activate this for stretch hosts." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/392221 (owner: Aaron Schulz) [08:45:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:50:05] (PS1) Marostegui: db-eqiad.php: Restore db1089 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/424242 [08:51:11] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/10828/" [puppet] - https://gerrit.wikimedia.org/r/424237 (https://phabricator.wikimedia.org/T189051) (owner: Elukey) [08:53:17] (PS1) Muehlenhoff: Enable base::service_auto_restart for prometheus-snmp-exporter [puppet] - https://gerrit.wikimedia.org/r/424243 (https://phabricator.wikimedia.org/T135991) [08:53:43] (CR) jerkins-bot: [V: -1] Enable base::service_auto_restart for prometheus-snmp-exporter [puppet] - https://gerrit.wikimedia.org/r/424243 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff) [08:54:24] (PS2) Muehlenhoff: Enable base::service_auto_restart for prometheus-snmp-exporter [puppet] - https://gerrit.wikimedia.org/r/424243 (https://phabricator.wikimedia.org/T135991) 
[08:59:41] (CR) Marostegui: [C: 2] db-eqiad.php: Restore db1089 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/424242 (owner: Marostegui) [09:01:00] (Merged) jenkins-bot: db-eqiad.php: Restore db1089 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/424242 (owner: Marostegui) [09:01:15] (CR) jenkins-bot: db-eqiad.php: Restore db1089 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/424242 (owner: Marostegui) [09:02:53] (CR) Filippo Giunchedi: [C: 1] Enable base::service_auto_restart for prometheus-postgres-exporter [puppet] - https://gerrit.wikimedia.org/r/424238 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff) [09:03:07] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore db1089 original weight (duration: 01m 17s) [09:03:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:07:22] (PS1) Gehel: maps: add Java proxy to cleartables_sync cron [puppet] - https://gerrit.wikimedia.org/r/424247 (https://phabricator.wikimedia.org/T190193) [09:08:20] Operations, Cloud-Services, cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4107815 (faidon) p:Normal>High @Cmjohnson @RobH This has been going on for weeks now, and this is too much of a delay for setting up these systems. I'm e... [09:12:39] Operations, Analytics, New-Readers, Traffic, and 2 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#3961955 (aberjak) Created a ticket in Opera's bug tracking system, internal reference: MT-3735. Thanks for all the info - we will investigate. 
[09:17:48] (CR) Liuxinyu970226: [C: 1] Initial configuration for gorwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/416930 (https://phabricator.wikimedia.org/T189109) (owner: Urbanecm) [09:18:01] Operations, Cloud-Services, cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4107852 (faidon) OK, I just saw above that this is a HPE Smart Array P440ar controller. According to [[ https://h20195.www2.hpe.com/v2/getpdf.aspx/c04346299.pdf... [09:28:58] (CR) Gehel: "puppet compiler looks happy: https://puppet-compiler.wmflabs.org/compiler02/10829/" [puppet] - https://gerrit.wikimedia.org/r/424247 (https://phabricator.wikimedia.org/T190193) (owner: Gehel) [09:44:27] PHP fatal error: [] operator not supported for strings @ https://commons.wikimedia.org/w/index.php?title=Special:AbuseFilter&offset=129 [09:44:39] Operations, DBA, MediaWiki-Page-deletion, Performance: Cannot delete two pages with large histories even having the appropriate permissions to do so - https://phabricator.wikimedia.org/T145630#2636396 (Graham87) I know this bug is old and resolved, but I stumbled on it by accident while looking f... 
[09:52:51] Operations, TemplateStyles, Traffic, Wikimedia-Extension-setup, and 4 others: Deploy TemplateStyles to WMF production - https://phabricator.wikimedia.org/T133410#4108088 (Iniquity) [10:03:11] (CR) Muehlenhoff: First working version (3 comments) [software/debmonitor] - https://gerrit.wikimedia.org/r/394620 (https://phabricator.wikimedia.org/T167504) (owner: Volans) [10:16:41] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/424257 [10:17:25] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/424257 [10:18:29] (CR) Muehlenhoff: Add CLI script to be installed in the target hosts (5 comments) [software/debmonitor] - https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) (owner: Volans) [10:20:19] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/424257 (owner: Marostegui) [10:21:32] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/424257 (owner: Marostegui) [10:21:37] Operations, TemplateStyles, Traffic, Wikimedia-Extension-setup, and 4 others: Deploy TemplateStyles to WMF production - https://phabricator.wikimedia.org/T133410#4108164 (Deskana) [10:21:46] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/424257 (owner: Marostegui) [10:23:27] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 after alter table (duration: 01m 16s) [10:23:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:24:29] (PS1) Gehel: wdqs: configure new servers wdqs100[6-8] [puppet] - https://gerrit.wikimedia.org/r/424260 (https://phabricator.wikimedia.org/T187766) [10:24:41] (PS1) Marostegui: 
db-eqiad.php: Depool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/424261 (https://phabricator.wikimedia.org/T187089) [10:24:49] Operations, monitoring, Patch-For-Review, User-fgiunchedi: Monitor and alarm on SMART attributes - https://phabricator.wikimedia.org/T86552#4108171 (fgiunchedi) The data collection is now deployed on bare metal across the fleet! Alerting wise there's several metrics: * `device_smart_healthy` th... [10:26:48] (PS2) Gehel: wdqs: configure new servers wdqs100[6-8] [puppet] - https://gerrit.wikimedia.org/r/424260 (https://phabricator.wikimedia.org/T187766) [10:27:01] Operations, TemplateStyles, Traffic, Wikimedia-Extension-setup, and 4 others: Deploy TemplateStyles to WMF production - https://phabricator.wikimedia.org/T133410#4108187 (Deskana) [10:27:30] jouncebot: next [10:27:30] In 2 hour(s) and 32 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180405T1300) [10:27:36] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/424261 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui) [10:28:51] (Merged) jenkins-bot: db-eqiad.php: Depool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/424261 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui) [10:30:24] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1083 for alter table (duration: 01m 17s) [10:30:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:31:45] (CR) jenkins-bot: db-eqiad.php: Depool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/424261 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui) [10:33:04] !log Deploy schema change on db1083 - T187089 T185128 T153182 [10:33:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:33:10] T187089: Fix WMF schemas to not break when 
comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 [10:33:10] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182 [10:33:11] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128 [10:37:37] Operations, TemplateStyles, Traffic, Wikimedia-Extension-setup, and 4 others: Deploy TemplateStyles to WMF production - https://phabricator.wikimedia.org/T133410#4108232 (Iniquity) [10:44:39] Operations, TemplateStyles, Traffic, Wikimedia-Extension-setup, and 4 others: Deploy TemplateStyles to WMF production - https://phabricator.wikimedia.org/T133410#4108239 (Tgr) [10:50:03] (PS1) ArielGlenn: more aggressive cleanup of temp dump files on generating servers [puppet] - https://gerrit.wikimedia.org/r/424267 [10:51:00] (CR) ArielGlenn: [C: 2] more aggressive cleanup of temp dump files on generating servers [puppet] - https://gerrit.wikimedia.org/r/424267 (owner: ArielGlenn) [10:51:10] (PS2) ArielGlenn: more aggressive cleanup of temp dump files on generating servers [puppet] - https://gerrit.wikimedia.org/r/424267 [10:51:21] (PS1) Marostegui: db-eqiad.php: db1106 is sanitarium [mediawiki-config] - https://gerrit.wikimedia.org/r/424268 (https://phabricator.wikimedia.org/T183469) [10:52:47] (PS2) Marostegui: db-eqiad.php: db1106 is sanitarium's master [mediawiki-config] - https://gerrit.wikimedia.org/r/424268 (https://phabricator.wikimedia.org/T183469) [10:54:35] (CR) Marostegui: [C: 2] db-eqiad.php: db1106 is sanitarium's master [mediawiki-config] - https://gerrit.wikimedia.org/r/424268 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [10:55:52] (Merged) jenkins-bot: db-eqiad.php: db1106 is sanitarium's master [mediawiki-config] - https://gerrit.wikimedia.org/r/424268 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) 
[10:57:29] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Specify that db1106 is sanitarium's master (duration: 01m 16s) [10:57:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:57:49] !log restart dbstore1001 for RAID re-setup and reimage [10:57:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:21] (CR) jenkins-bot: db-eqiad.php: db1106 is sanitarium's master [mediawiki-config] - https://gerrit.wikimedia.org/r/424268 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [11:16:04] (PS1) Arturo Borrero Gonzalez: wmcs: monitoring: rsync recursive [puppet] - https://gerrit.wikimedia.org/r/424270 (https://phabricator.wikimedia.org/T190512) [11:16:51] (CR) Arturo Borrero Gonzalez: [C: 2] wmcs: monitoring: rsync recursive [puppet] - https://gerrit.wikimedia.org/r/424270 (https://phabricator.wikimedia.org/T190512) (owner: Arturo Borrero Gonzalez) [11:22:58] !log updating libssl-1-1 to 1.1.0h on cache misc cluster (and nginx restart) [11:23:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:28:04] (PS1) Arturo Borrero Gonzalez: graphite: archive-instances: import missing yaml python module [puppet] - https://gerrit.wikimedia.org/r/424273 (https://phabricator.wikimedia.org/T189871) [11:28:48] (CR) Arturo Borrero Gonzalez: [C: 2] graphite: archive-instances: import missing yaml python module [puppet] - https://gerrit.wikimedia.org/r/424273 (https://phabricator.wikimedia.org/T189871) (owner: Arturo Borrero Gonzalez) [11:36:44] !log updating libssl1.1 to 1.1.0h on cache upload cluster (and nginx restart) [11:36:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:11] (PS1) Muehlenhoff: Update SSH key for Dario Tarborelli [puppet] - https://gerrit.wikimedia.org/r/424276 [11:42:43] (PS2) Muehlenhoff: Update SSH key for Dario Taraborelli [puppet] - 
https://gerrit.wikimedia.org/r/424276 [11:43:11] (CR) jerkins-bot: [V: -1] Update SSH key for Dario Taraborelli [puppet] - https://gerrit.wikimedia.org/r/424276 (owner: Muehlenhoff) [11:48:03] Operations, Ops-Access-Requests: Requesting access to shell (snapshot, dumpsdata) for springle - https://phabricator.wikimedia.org/T191478#4108358 (ArielGlenn) [11:48:20] (PS3) Muehlenhoff: Update SSH key for Dario Taraborelli [puppet] - https://gerrit.wikimedia.org/r/424276 [11:48:48] (CR) jerkins-bot: [V: -1] Update SSH key for Dario Taraborelli [puppet] - https://gerrit.wikimedia.org/r/424276 (owner: Muehlenhoff) [11:49:51] (PS4) Muehlenhoff: Update SSH key for Dario Taraborelli [puppet] - https://gerrit.wikimedia.org/r/424276 [11:50:04] Operations, Ops-Access-Requests: Requesting access to shell (snapshot, dumpsdata) for springle - https://phabricator.wikimedia.org/T191478#4107106 (ArielGlenn) I suppose I'm the sponsor? If so, yes, I approve, for snapshot100x and dumpsdata100x access. [11:50:19] (CR) jerkins-bot: [V: -1] Update SSH key for Dario Taraborelli [puppet] - https://gerrit.wikimedia.org/r/424276 (owner: Muehlenhoff) [11:51:38] (PS5) Muehlenhoff: Update SSH key for Dario Taraborelli [puppet] - https://gerrit.wikimedia.org/r/424276 [11:52:32] can we remove that CI check for commit messages or at least make it non-voting? 
it's completely pointless [11:53:42] (CR) Muehlenhoff: [C: 2] Update SSH key for Dario Taraborelli [puppet] - https://gerrit.wikimedia.org/r/424276 (owner: Muehlenhoff) [11:58:36] !log updating libssl1.1 to 1.1.0h on cache text cluster (and nginx restart) [11:58:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:04:22] !log Manually back-filled hashes for the Wikidata JSON dumps in https://dumps.wikimedia.org/wikidatawiki/entities/20180402/wikidata-20180402-*sums.txt (T190457) [12:04:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:04:29] T190457: Include checksums in https://dumps.wikimedia.org/wikidatawiki/entities/ - https://phabricator.wikimedia.org/T190457 [12:10:44] (PS4) Lokal Profil: Identify publisher with URI [dumps/dcat] - https://gerrit.wikimedia.org/r/386366 (https://phabricator.wikimedia.org/T178993) (owner: JakobVoss) [12:10:50] (CR) jerkins-bot: [V: -1] Identify publisher with URI [dumps/dcat] - https://gerrit.wikimedia.org/r/386366 (https://phabricator.wikimedia.org/T178993) (owner: JakobVoss) [12:10:54] (PS1) ArielGlenn: fix monitor, couldn't find StatusHtml module [dumps] - https://gerrit.wikimedia.org/r/424285 [12:11:52] (CR) Lokal Profil: Identify publisher with URI (1 comment) [dumps/dcat] - https://gerrit.wikimedia.org/r/386366 (https://phabricator.wikimedia.org/T178993) (owner: JakobVoss) [12:11:59] (CR) ArielGlenn: [C: 2] fix monitor, couldn't find StatusHtml module [dumps] - https://gerrit.wikimedia.org/r/424285 (owner: ArielGlenn) [12:12:46] !log ariel@tin Started deploy [dumps/dumps@88ca17c]: fix monitor to import status module after refactor [12:12:50] !log ariel@tin Finished deploy [dumps/dumps@88ca17c]: fix monitor to import status module after refactor (duration: 00m 04s) [12:12:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:56] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:42] (PS5) Lokal Profil: Identify publisher with URI [dumps/dcat] - https://gerrit.wikimedia.org/r/386366 (https://phabricator.wikimedia.org/T178993) (owner: JakobVoss) [12:25:40] (PS1) Arturo Borrero Gonzalez: wmcs: monitoring: rsync directories as well [puppet] - https://gerrit.wikimedia.org/r/424287 (https://phabricator.wikimedia.org/T190512) [12:28:07] (PS1) Lokal Profil: Identify publisher with URI A publisher should be identified by an URI. For Wikimedia Foundation we can use the Wikidata entity URI . [puppet] - https://gerrit.wikimedia.org/r/424288 (https://phabricator.wikimedia.org/T178993) [12:28:53] (CR) jerkins-bot: [V: -1] Identify publisher with URI A publisher should be identified by an URI. For Wikimedia Foundation we can use the Wikidata entity URI . [puppet] - https://gerrit.wikimedia.org/r/424288 (https://phabricator.wikimedia.org/T178993) (owner: Lokal Profil) [12:29:48] Operations, Cloud-Services, cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4108444 (Cmjohnson) @faidon. That would explain the issue, the disks are in the front in slots 4 and 6. Do we buy a new controller or go with 8 ssds? 
[12:30:23] !log installing net-snmp security updates on jessie (stretch not affected) [12:30:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:08] (CR) Arturo Borrero Gonzalez: [C: 2] wmcs: monitoring: rsync directories as well [puppet] - https://gerrit.wikimedia.org/r/424287 (https://phabricator.wikimedia.org/T190512) (owner: Arturo Borrero Gonzalez) [12:32:09] (PS1) Filippo Giunchedi: smart: expand list of reported attributes [puppet] - https://gerrit.wikimedia.org/r/424289 (https://phabricator.wikimedia.org/T86552) [12:41:05] (PS2) Herron: rm mtizzoni,panisson,paolotti,ciro from analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/423711 (https://phabricator.wikimedia.org/T189341) [12:41:17] !log start of ladsgroup@terbium:~$ mwscript deleteAutoPatrolLogs.php --wiki=wikidatawiki --before 20180223210426 (T189596) [12:41:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:24] T189596: Run deleteAutopatrolLogs script for Wikidata (WMF) - https://phabricator.wikimedia.org/T189596 [12:41:42] (CR) Herron: [C: 2] rm mtizzoni,panisson,paolotti,ciro from analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/423711 (https://phabricator.wikimedia.org/T189341) (owner: Herron) [12:42:14] (PS1) Lokal Profil: Support prefixed dump types [puppet] - https://gerrit.wikimedia.org/r/424291 (https://phabricator.wikimedia.org/T163328) [12:43:42] (CR) Gilles: [C: 1] NavigationTiming: Move logic to the server side (1 comment) [puppet] - https://gerrit.wikimedia.org/r/423959 (https://phabricator.wikimedia.org/T181425) (owner: Imarlier) [12:44:18] Operations, LDAP-Access-Requests, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T191523#4108485 (Matthias_Geisler_WMDE) [12:44:42] (PS2) Filippo Giunchedi: smart: expand list of reported attributes [puppet] - 
https://gerrit.wikimedia.org/r/424289 (https://phabricator.wikimedia.org/T86552) [12:46:16] (PS2) Lokal Profil: Identify publisher with URI [puppet] - https://gerrit.wikimedia.org/r/424288 (https://phabricator.wikimedia.org/T178993) [12:46:24] (PS1) Ladsgroup: Stop logging autopatrol actions in wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/424293 (https://phabricator.wikimedia.org/T184485) [12:48:26] (PS3) Filippo Giunchedi: smart: expand list of reported attributes [puppet] - https://gerrit.wikimedia.org/r/424289 (https://phabricator.wikimedia.org/T86552) [12:48:59] (CR) Filippo Giunchedi: [C: 2] smart: expand list of reported attributes [puppet] - https://gerrit.wikimedia.org/r/424289 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi) [12:49:07] (PS1) BBlack: Revert "AU: experiment with splitting WA" [dns] - https://gerrit.wikimedia.org/r/424294 [12:49:11] (PS2) BBlack: Revert "AU: experiment with splitting WA" [dns] - https://gerrit.wikimedia.org/r/424294 [12:50:52] jouncebot: next [12:50:52] In 0 hour(s) and 9 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180405T1300) [12:51:10] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - https://gerrit.wikimedia.org/r/424295 [12:51:14] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - https://gerrit.wikimedia.org/r/424295 [12:51:33] Operations, Ops-Access-Requests, Analytics, Research, and 3 others: Restricting access for a collaboration nearing completion - https://phabricator.wikimedia.org/T189341#4108511 (herron) Open>Resolved These users have been removed from `analytics-privatedata-users`. If there's any other... 
[12:51:54] Operations, Ops-Access-Requests, Analytics, Research, and 2 others: Restricting access for a collaboration nearing completion - https://phabricator.wikimedia.org/T189341#4108513 (herron) [12:52:35] !log finished the script [12:52:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:07] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - https://gerrit.wikimedia.org/r/424295 (owner: Marostegui) [12:54:26] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - https://gerrit.wikimedia.org/r/424295 (owner: Marostegui) [12:56:00] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1083 after alter table (duration: 01m 17s) [12:56:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:15] (PS1) Muehlenhoff: Update restbase cumin alias [puppet] - https://gerrit.wikimedia.org/r/424296 [12:56:17] (PS1) Jcrespo: dbstore1001: Set puppet role as mariadb-backups [puppet] - https://gerrit.wikimedia.org/r/424297 (https://phabricator.wikimedia.org/T186596) [12:58:20] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1083" [mediawiki-config] - https://gerrit.wikimedia.org/r/424295 (owner: Marostegui) [12:59:14] (CR) Volans: "Replies inline, I'll send an updated CR shortly" (5 comments) [software/debmonitor] - https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) (owner: Volans) [12:59:33] Operations, LDAP-Access-Requests, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T191523#4108529 (Aklapper) You should be able to access L2 now [12:59:50] (CR) Muehlenhoff: [C: 2] Update restbase cumin alias [puppet] - https://gerrit.wikimedia.org/r/424296 (owner: Muehlenhoff) [12:59:53] (PS2) Jcrespo: dbstore1001: Set puppet role as mariadb-backups [puppet] - https://gerrit.wikimedia.org/r/424297 
(https://phabricator.wikimedia.org/T186596) [13:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) European Mid-day SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180405T1300). [13:00:05] Daimona: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:23] Hi everyone [13:00:50] Operations, Traffic, Patch-For-Review, Performance-Team (Radar): Define turn-up process and scope for eqsin service to regional countries - https://phabricator.wikimedia.org/T189252#4108532 (BBlack) Australia experimental results with current peering arrangements: Over 3x serial 24h periods duri... [13:01:46] I can SWAT today [13:01:52] Operations, Cloud-Services, cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4108547 (faidon) I don't understand :) Could you clarify which disks are in which slots, and how/where are they connected? I wouldn't go with 8 SSDs; we bought... [13:02:51] hi Daimona [13:02:54] Hi [13:03:35] Amir1: around for SWAT? 
[13:03:46] Operations, Traffic, Patch-For-Review, Performance-Team (Radar): Define turn-up process and scope for eqsin service to regional countries - https://phabricator.wikimedia.org/T189252#4108561 (BBlack) [13:03:53] Daimona: I am reviewing your patch, I will let you know when it's ready for testing at mwdebug1002 [13:04:02] Ok, thanks [13:04:32] (CR) BBlack: [C: 2] Revert "AU: experiment with splitting WA" [dns] - https://gerrit.wikimedia.org/r/424294 (owner: BBlack) [13:04:36] (PS3) Jcrespo: dbstore1001: Set puppet role as mariadb-backups [puppet] - https://gerrit.wikimedia.org/r/424297 (https://phabricator.wikimedia.org/T186596) [13:04:37] zeljkof: I can SWAT if you want me [13:05:01] Amir1: sure, I'm reviewing an extension backport, go ahead [13:05:08] thanks [13:05:32] zeljkof: my patch is not testable but I need to keep monitoring things afterwards [13:05:46] hope that's fine for you. For probably around five minutes-ish [13:05:55] (CR) Ladsgroup: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/424293 (https://phabricator.wikimedia.org/T184485) (owner: Ladsgroup) [13:06:40] Amir1: sure, I'll need 5-10 minutes for this anyway :) [13:07:09] (Merged) jenkins-bot: Stop logging autopatrol actions in wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/424293 (https://phabricator.wikimedia.org/T184485) (owner: Ladsgroup) [13:07:17] (CR) Marostegui: [C: 1] dbstore1001: Set puppet role as mariadb-backups [puppet] - https://gerrit.wikimedia.org/r/424297 (https://phabricator.wikimedia.org/T186596) (owner: Jcrespo) [13:08:15] (CR) jenkins-bot: Stop logging autopatrol actions in wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/424293 (https://phabricator.wikimedia.org/T184485) (owner: Ladsgroup) [13:08:35] Operations, Cloud-Services, cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4108573 
(Cmjohnson) I received this last night from HP...I will try this first (i am hoping I have the cable) Thank you for providing the screenshots.... [13:09:04] the abuse filter in fatalmonitor doesn't let me see anything [13:09:06] :/ [13:09:35] I'm going to fix that [13:09:53] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:424293|Stop logging autopatrol actions in wikidatawiki (T184485)]] (duration: 01m 16s) [13:10:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:00] T184485: Stop logging autopatrol actions - https://phabricator.wikimedia.org/T184485 [13:11:59] (PS1) Marostegui: db2054.yaml: Change binlog format [puppet] - https://gerrit.wikimedia.org/r/424299 (https://phabricator.wikimedia.org/T191275) [13:12:35] everything seems fine \o/ [13:12:40] cool [13:12:49] I'm still waiting for CI... :) [13:12:55] (CR) Marostegui: [C: 2] db2054.yaml: Change binlog format [puppet] - https://gerrit.wikimedia.org/r/424299 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui) [13:14:34] (PS1) Ladsgroup: mediawiki: Start deleteAutoPatrolLogs from Wikidata logging table [puppet] - https://gerrit.wikimedia.org/r/424300 (https://phabricator.wikimedia.org/T189596) [13:14:51] (CR) jerkins-bot: [V: -1] mediawiki: Start deleteAutoPatrolLogs from Wikidata logging table [puppet] - https://gerrit.wikimedia.org/r/424300 (https://phabricator.wikimedia.org/T189596) (owner: Ladsgroup) [13:16:21] Daimona: it might take a while for your patch to get merged... 
:( https://integration.wikimedia.org/zuul/ [13:16:40] Yeah I see [13:16:53] Not a problem :-) [13:17:55] (PS4) Jcrespo: dbstore1001: Set puppet role as mariadb-backups [puppet] - https://gerrit.wikimedia.org/r/424297 (https://phabricator.wikimedia.org/T186596) [13:18:28] (PS1) Marostegui: Revert "db2053.yaml: Change binlog format" [puppet] - https://gerrit.wikimedia.org/r/424301 [13:18:48] (CR) Jcrespo: [C: 2] dbstore1001: Set puppet role as mariadb-backups [puppet] - https://gerrit.wikimedia.org/r/424297 (https://phabricator.wikimedia.org/T186596) (owner: Jcrespo) [13:19:46] (PS2) Marostegui: Revert "db2053.yaml: Change binlog format" [puppet] - https://gerrit.wikimedia.org/r/424301 [13:20:21] (PS2) Ladsgroup: mediawiki: Start deleteAutoPatrolLogs from Wikidata logging table [puppet] - https://gerrit.wikimedia.org/r/424300 (https://phabricator.wikimedia.org/T189596) [13:20:31] (CR) Marostegui: [C: 2] Revert "db2053.yaml: Change binlog format" [puppet] - https://gerrit.wikimedia.org/r/424301 (owner: Marostegui) [13:20:53] jynus: ok to merge your changes? [13:21:18] yes [13:21:22] doing it [13:21:24] was about to do it [13:21:28] done! :) [13:23:00] (PS1) Marostegui: db-codfw.php: db2053 - no longer candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424302 (https://phabricator.wikimedia.org/T191275) [13:23:29] !log Stop MySQL on db2053 for binlog format change [13:23:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:18] Daimona: it's merged! 
it will be at mwdebug1002 in a few minutes, will let you know [13:25:40] Oh nice [13:25:43] Thanks [13:29:01] Daimona: it's at mwdebug1002, please test and let me know if I can deploy [13:29:16] Sure [13:30:43] Seems fine to me [13:30:47] No new problems [13:30:50] Everything works [13:30:55] ok, deploying [13:31:27] I have to wait for an official confirm that the bug is solved by the one who filed it, but logically thinking there's no reason this shouldn't work and would be needed anyway [13:31:30] So yeah, thanks [13:32:21] !log zfilipin@tin Synchronized php-1.31.0-wmf.28/extensions/AbuseFilter: SWAT: [[gerrit:424204|Make $mode optional for checkAllFilters (T191468)]] (duration: 01m 20s) [13:32:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:27] T191468: Undefined variable: mode in extensions/AbuseFilter/includes/AbuseFilter.php on line 491 - https://phabricator.wikimedia.org/T191468 [13:32:43] Daimona: I think there is a problem [13:33:11] What? [13:33:17] https://phabricator.wikimedia.org/P6951 [13:33:23] (CR) Jcrespo: "I will have a look today." [puppet] - https://gerrit.wikimedia.org/r/424300 (https://phabricator.wikimedia.org/T189596) (owner: Ladsgroup) [13:33:54] Daimona: not sure if that was there before, I have just noticed it at the top of the log report [13:34:19] Yeah, this is the one that should be solved [13:34:34] Was it caused by my testing? 
[13:34:53] I don't see any increase in log errors, so I guess it was around before swat [13:35:30] Yeah [13:35:31] Also [13:35:39] "Undefined mode at line 542" [13:35:46] But the master doesn't have "mode" at line 542 [13:36:00] Probably refers to an old version of the code [13:36:47] Daimona: it's for php-1.31.0-wmf.28 [13:37:02] seems the number of errors is dropping since the deployment [13:37:07] Operations, wikidiff2, Patch-For-Review, WMDE-QWERTY-Team-Board: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4082038 (WMDE-Fisch) >>! In T190717#4104961, @MoritzMuehlenhoff wrote: > I've built a 1.6.0 package against the HHVM version currentl... [13:37:12] so your patch seems to fix it [13:37:29] In fact [13:37:47] wmf.28 version doesn't have it at l.542 :-) [13:37:54] Nice [13:38:05] Operations, Ops-Access-Requests: Requesting access to shell (snapshot, dumpsdata) for springle - https://phabricator.wikimedia.org/T191478#4107106 (herron) Hey @ArielGlenn, in terms of the specific group access being requested, would this be membership to... * `snapshot-admins` for snapshot shell acces... 
[13:38:10] hm, I guess the report still sees errors from a while back [13:38:21] so it should just go away in a few minutes/hours [13:38:30] Nice [13:38:31] I'll keep the report open [13:38:44] that's all I guess, if there is any trouble, I'll revert the patch [13:38:54] Yeah, let's also wait for the final confirmation [13:39:00] Sure [13:39:03] Many thanks [13:39:54] !log EU SWAT finished [13:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:41:52] !log Running populateArchiveRevId.php on group 1 for T191307 [13:41:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:41:58] T191307: Run maintenance/populateArchiveRevId.php on all wikis - https://phabricator.wikimedia.org/T191307 [13:44:42] Operations, Ops-Access-Requests: Requesting access to shell (snapshot, dumpsdata) for springle - https://phabricator.wikimedia.org/T191478#4108701 (ArielGlenn) I was thinking of not even having sudo access to the dumpsgen user initially, but setting up a scratch dir writeable by the user for testing purp... [13:52:07] (CR) Ottomata: [C: 1] role::analytics_cluster::hadoop:master|standby: enable HDFS trash [puppet] - https://gerrit.wikimedia.org/r/424237 (https://phabricator.wikimedia.org/T189051) (owner: Elukey) [13:52:37] yoo godog, not sure if you saw: https://gerrit.wikimedia.org/r/#/c/423931/ but would appreciate a review when you find a min [13:53:01] (also see the latest comment, more context there) [13:53:07] ottomata: hey, I did but didn't have time yet, next week I'm out so I'll try to find some time between today and tomorrow [13:53:17] ok thanks [13:53:32] np!
thanks for the context, very useful [13:54:09] Operations, wikidiff2, Patch-For-Review, WMDE-QWERTY-Team-Board: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4108707 (MoritzMuehlenhoff) What about gradual upgrades, usually we roll out updates in stages, when this moves to production, we can... [13:58:11] (PS3) Gehel: wdqs: configure new servers wdqs100[6-8] [puppet] - https://gerrit.wikimedia.org/r/424260 (https://phabricator.wikimedia.org/T187766) [13:58:48] (CR) Gehel: [C: 2] wdqs: configure new servers wdqs100[6-8] [puppet] - https://gerrit.wikimedia.org/r/424260 (https://phabricator.wikimedia.org/T187766) (owner: Gehel) [14:00:14] !log andrew@tin Started deploy [horizon/deploy@cd1cda6]: Deploying potential fix for T191232 [14:00:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:21] T191232: Cannot remove 'other class' in Horizon Puppet project prefix configuration. - https://phabricator.wikimedia.org/T191232 [14:01:08] !log initial puppet run for wdqs100[678] - T187766 [14:01:09] T187766: Install / configure new WDQS servers - https://phabricator.wikimedia.org/T187766 [14:03:23] (PS3) Muehlenhoff: Enable base::service_auto_restart for prometheus-postgres-exporter [puppet] - https://gerrit.wikimedia.org/r/424238 (https://phabricator.wikimedia.org/T135991) [14:03:30] !log andrew@tin Finished deploy [horizon/deploy@cd1cda6]: Deploying potential fix for T191232 (duration: 03m 17s) [14:03:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:05:08] (CR) Muehlenhoff: [C: 2] Enable base::service_auto_restart for prometheus-postgres-exporter [puppet] - https://gerrit.wikimedia.org/r/424238 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff) [14:06:36] PROBLEM - Check systemd state on wdqs2006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:06:36] PROBLEM - Blazegraph Port on wdqs2006 is CRITICAL: connect to address 127.0.0.1 and port 9999: Connection refused [14:06:45] PROBLEM - WDQS HTTP Port on wdqs2006 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 0.001 second response time [14:06:55] PROBLEM - Blazegraph process on wdqs2006 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war [14:07:31] ^ wdqs failures above are from the initial install, silencing them right now [14:32:48] Operations, Analytics, New-Readers, Traffic, and 2 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#3961955 (mbaluta) Hi! I'm from Opera Mini server team. We did not move any traffic between DCs on specified dates and I don't see any changes on Mini server si... [14:35:16] (CR) Marostegui: [C: 2] db-codfw.php: db2053 - no longer candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424302 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui) [14:36:46] (Merged) jenkins-bot: db-codfw.php: db2053 - no longer candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424302 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui) [14:38:12] (CR) jenkins-bot: db-codfw.php: db2053 - no longer candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/424302 (https://phabricator.wikimedia.org/T191275) (owner: Marostegui) [14:38:55] !log marostegui@tin Synchronized wmf-config/db-codfw.php: db2053 is no longer a candidate master (duration: 01m 17s) [14:39:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:36] Operations, wikidiff2, Patch-For-Review, WMDE-QWERTY-Team-Board: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4108869 (WMDE-Fisch) When you do gradual upgrades, what's period we are talking about here?
[14:40:58] (PS1) Marostegui: db-eqiad.php: Depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/424322 (https://phabricator.wikimedia.org/T187089) [14:41:33] Operations, ops-codfw, Traffic: cp[2006,2008,2010-2011,2017-2018,2022].codfw.wmnet: Uncorrectable Memory Error - https://phabricator.wikimedia.org/T190540#4108873 (Papaul) @ema @BBlack I need to update BIOS and IDRAC on cp2018 as requested by Dell. Can you please depool the server since it is not sho... [14:41:59] Operations, ops-codfw, Traffic: cp2018 memory replacement - https://phabricator.wikimedia.org/T191228#4108878 (Papaul) @ema @BBlack I need to update BIOS and IDRAC on cp2018 as requested by Dell. Can you please depool the server since it is not showing depooled on the sheet. Thanks. [14:43:02] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/424322 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui) [14:43:04] !log uploaded apache2 2.4.10-10+deb8u12+wmf1 to apt.wikimedia.org/jessie-wikimedia (rebase of our local patches against the latest DSA) [14:43:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:18] (Merged) jenkins-bot: db-eqiad.php: Depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/424322 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui) [14:44:28] Would anyone be able to help me out quickly, by merging a change to navtiming.py that's located in the puppet repo: https://gerrit.wikimedia.org/r/#/c/423959/ [14:44:56] Has no effect on our systems or anything, we just haven't had a chance to pull this out of puppet yet [14:46:05] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1066 for alter table (duration: 01m 17s) [14:46:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:12] (CR) jenkins-bot: db-eqiad.php: Depool db1066 [mediawiki-config] - 
https://gerrit.wikimedia.org/r/424322 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui) [14:50:51] marlier: checking [14:52:44] elukey: thanks! [14:53:29] marlier: I think we can merge, ready now? [14:53:38] Yep, go ahead! [14:53:40] (PS5) Elukey: NavigationTiming: Move logic to the server side [puppet] - https://gerrit.wikimedia.org/r/423959 (https://phabricator.wikimedia.org/T181425) (owner: Imarlier) [14:54:03] !log Deploy schema change on db1066 - T187089 T185128 T153182 [14:54:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:11] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 [14:54:11] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182 [14:54:11] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128 [14:54:17] (CR) Elukey: [C: 2] NavigationTiming: Move logic to the server side [puppet] - https://gerrit.wikimedia.org/r/423959 (https://phabricator.wikimedia.org/T181425) (owner: Imarlier) [14:54:34] Operations, Cloud-Services, cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4108926 (faidon) Ah! That's a regular mainboard/SATA controller, so these two drives wouldn't be able to participate in RAID groups. We've done that before I thi... [14:54:58] marlier: done! [14:55:06] Thanks much [14:56:45] Operations, ops-codfw, DBA, Patch-For-Review: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193#4108939 (Marostegui) @RobH can you let us know when the switch is ready so we can move db2039? Thanks!
[14:58:45] (CR) Vgutierrez: [C: 2] Release 1.15.3: Avoid having a hard requirement on prometheus-client [debs/pybal] - https://gerrit.wikimedia.org/r/423696 (https://phabricator.wikimedia.org/T190527) (owner: Vgutierrez) [14:59:04] (CR) Vgutierrez: [C: 2] Release 1.15.3: Avoid having a hard requirement on prometheus-client [debs/pybal] (1.15) - https://gerrit.wikimedia.org/r/423698 (https://phabricator.wikimedia.org/T190527) (owner: Vgutierrez) [14:59:14] !log installing apache security updates [14:59:18] (CR) Vgutierrez: [C: 2] Release 1.15.3: Avoid having a hard requirement on prometheus-client [debs/pybal] (1.15-stretch) - https://gerrit.wikimedia.org/r/423699 (https://phabricator.wikimedia.org/T190527) (owner: Vgutierrez) [14:59:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:31] Operations, ops-codfw, DBA, Patch-For-Review: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193#4108959 (RobH) I've gone ahead and enabled asw-d1-codfw ge-1/0/14, and left asw-c6-codfw ge-6/0/6 online for now. Once the system is fully moved, we'll remove the port info f...
[15:07:28] (PS1) Papaul: DNS: Remove db2039 from private1-c-codfw and place it in private1-d-codfw [dns] - https://gerrit.wikimedia.org/r/424329 (https://phabricator.wikimedia.org/T191193) [15:15:24] (PS5) DCausse: Add cirrussearch settings for wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) [15:16:11] (CR) jerkins-bot: [V: -1] Add cirrussearch settings for wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: DCausse) [15:17:48] !log stopping mariadb on db2039 T191193 [15:17:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:54] T191193: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193 [15:18:08] (PS6) DCausse: Add cirrussearch settings for wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) [15:19:24] Operations, Analytics, New-Readers, Traffic, and 2 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4109027 (Nuria) Twitter is the best! Thanks for taking time to look into this: @mbaluta: how about we setup a short meeting and we can go over data changes we...
[15:21:25] (PS1) Herron: puppetmaster: add rhodium worker offline [puppet] - https://gerrit.wikimedia.org/r/424334 [15:21:48] (CR) jerkins-bot: [V: -1] puppetmaster: add rhodium worker offline [puppet] - https://gerrit.wikimedia.org/r/424334 (owner: Herron) [15:26:40] !log uploaded pybal 1.15.3 for stretch on apt.w.o [15:26:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:55] (Abandoned) Herron: puppetmaster: add rhodium worker offline [puppet] - https://gerrit.wikimedia.org/r/424334 (owner: Herron) [15:27:20] (PS1) Marostegui: db-eqiad,db-codfw.php: Change db2039 IP [mediawiki-config] - https://gerrit.wikimedia.org/r/424335 (https://phabricator.wikimedia.org/T191193) [15:27:28] (PS1) Herron: puppetmaster: add rhodium worker offline [puppet] - https://gerrit.wikimedia.org/r/424336 [15:27:49] (PS2) Marostegui: DNS: Remove db2039 from private1-c-codfw and place it in private1-d-codfw [dns] - https://gerrit.wikimedia.org/r/424329 (https://phabricator.wikimedia.org/T191193) (owner: Papaul) [15:30:28] (CR) Marostegui: [C: 2] DNS: Remove db2039 from private1-c-codfw and place it in private1-d-codfw [dns] - https://gerrit.wikimedia.org/r/424329 (https://phabricator.wikimedia.org/T191193) (owner: Papaul) [15:30:57] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Change db2039 IP [mediawiki-config] - https://gerrit.wikimedia.org/r/424335 (https://phabricator.wikimedia.org/T191193) (owner: Marostegui) [15:32:20] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Change db2039 IP [mediawiki-config] - https://gerrit.wikimedia.org/r/424335 (https://phabricator.wikimedia.org/T191193) (owner: Marostegui) [15:32:34] (CR) jenkins-bot: db-eqiad,db-codfw.php: Change db2039 IP [mediawiki-config] - https://gerrit.wikimedia.org/r/424335 (https://phabricator.wikimedia.org/T191193) (owner: Marostegui) [15:34:12] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Change 
db2039 IP as it is being moved to a different rack - T191193 (duration: 01m 17s) [15:34:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:34:18] T191193: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193 [15:34:51] (PS7) DCausse: Add cirrussearch settings for wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) [15:35:38] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Change db2039 IP as it is being moved to a different rack - T191193 (duration: 01m 17s) [15:35:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:09] (CR) jerkins-bot: [V: -1] Add cirrussearch settings for wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: DCausse) [15:37:39] (PS8) DCausse: Add cirrussearch settings for wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) [15:39:24] Operations, ops-codfw, DBA, Patch-For-Review: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193#4109104 (Papaul) Racktables update. moved db2039 from C6 to D1 [15:39:55] (PS3) Ottomata: Update kafka java.security file with Java 8 u162 changes [puppet] - https://gerrit.wikimedia.org/r/421891 (https://phabricator.wikimedia.org/T190400) [15:40:03] (CR) Ottomata: Update kafka java.security file with Java 8 u162 changes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/421891 (https://phabricator.wikimedia.org/T190400) (owner: Ottomata) [15:40:19] Operations, ops-codfw, DBA, Patch-For-Review: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193#4109106 (Papaul) Please update the task with the next server to move so I can can the rack ready.
Thanks [15:41:11] Operations, ops-codfw, DBA, Patch-For-Review: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193#4109110 (Marostegui) >>! In T191193#4109106, @Papaul wrote: > Please update the task with the next server to move so I can can the rack ready. Thanks let's go for db2040 as n... [15:42:10] Operations, Proton, Readers-Web-Backlog, Services (watching): Choose a server for the chromium-render service - https://phabricator.wikimedia.org/T187821#4109114 (mobrovac) @Niedzielski next week @Joe @akosiaris and I will meet to make a decision on where exactly to put the service, and then we'l... [15:44:17] !log updating librdkafka1 to 0.11.3 on cache misc [15:44:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:46] Operations, ops-codfw, DBA, Patch-For-Review: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193#4109121 (Papaul) [15:52:22] (CR) Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/421891 (https://phabricator.wikimedia.org/T190400) (owner: Ottomata) [15:53:53] (CR) Krinkle: "Little follow-up." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/423959 (https://phabricator.wikimedia.org/T181425) (owner: Imarlier) [15:54:39] !log updating librdkafka1 to 0.11.3 on cache upload [15:54:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:56] Operations, ops-codfw, DBA, Patch-For-Review: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193#4109139 (Papaul) switch port information when ready to move db2040. db2040 was on asw-c6-codfw ge-6/0/7 and now will be on asw-a3-codfw ge-3/0/ 27 new ip address will be :...
[15:56:23] RECOVERY - Blazegraph Port on wdqs2006 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 [15:56:32] RECOVERY - Check systemd state on wdqs2006 is OK: OK - running: The system is fully operational [15:56:50] Operations, monitoring, Patch-For-Review, Services (watching): Add Reading Infrastructure engineers to contacts for mobileapps - https://phabricator.wikimedia.org/T189524#4109148 (Mholloway) Sorry, sounds like it might be worth keeping open, then. I don't know my way around the Icinga web UI wel... [15:56:52] RECOVERY - Blazegraph process on wdqs2006 is OK: PROCS OK: 1 process with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war [15:56:53] RECOVERY - WDQS HTTP Port on wdqs2006 is OK: HTTP OK: HTTP/1.1 200 OK - 434 bytes in 0.071 second response time [15:56:54] Operations, ops-codfw, DBA, Patch-For-Review: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193#4109149 (Marostegui) >>! In T191193#4109139, @Papaul wrote: > switch port information when ready to move db2040. > > db2040 was on asw-c6-codfw ge-6/0/7 and now will be on a... [15:56:59] Operations, monitoring, Patch-For-Review, Services (watching): Add Reading Infrastructure engineers to contacts for mobileapps - https://phabricator.wikimedia.org/T189524#4109150 (Mholloway) Resolved>Open [15:57:32] PROBLEM - High lag on wdqs2006 is CRITICAL: CRITICAL - scalar(time() - blazegraph_lastupdated{instance=wdqs2006:9193}): 6486.0 = 3600.0 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [15:58:49] gehel: FYI ^^^ [15:58:57] (PS1) Ottomata: Use profile::kafka::mirror with --new.consumer for main-codfw -> main-eqiad mirror [puppet] - https://gerrit.wikimedia.org/r/424344 (https://phabricator.wikimedia.org/T190940) [15:59:06] yep, that's me again, already recovering...
[15:59:23] (CR) jerkins-bot: [V: -1] Use profile::kafka::mirror with --new.consumer for main-codfw -> main-eqiad mirror [puppet] - https://gerrit.wikimedia.org/r/424344 (https://phabricator.wikimedia.org/T190940) (owner: Ottomata) [15:59:36] ACKNOWLEDGEMENT - High lag on wdqs2006 is CRITICAL: CRITICAL - scalar(time() - blazegraph_lastupdated{instance=wdqs2006:9193}): 6228.0 = 3600.0 Gehel data transfer in progress, recovering already https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [16:00:04] godog, moritzm, and _joe_: How many deployers does it take to do Puppet SWAT(Max 8 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180405T1600). [16:00:04] No GERRIT patches in the queue for this window AFAICS. [16:06:07] Operations, ops-eqiad, DBA, Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#4109181 (jcrespo) Open>Resolved dbstore1001 is back in use. Thanks for everyone that helped upgrading it and recover fro...
[16:09:14] !log updating librdkafka1 to 0.11.3 on cache text [16:09:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:07] (PS2) Ayounsi: Prometheus: aggregates netstat_Icmp_In and InEcho by cluster [puppet] - https://gerrit.wikimedia.org/r/424139 [16:10:52] (CR) Ayounsi: [C: 2] Prometheus: aggregates netstat_Icmp_In and InEcho by cluster [puppet] - https://gerrit.wikimedia.org/r/424139 (owner: Ayounsi) [16:14:13] (PS2) Ottomata: Use profile::kafka::mirror with --new.consumer for main-codfw -> main-eqiad [puppet] - https://gerrit.wikimedia.org/r/424344 (https://phabricator.wikimedia.org/T190940) [16:21:33] RECOVERY - High lag on wdqs2006 is OK: OK - scalar(time() - blazegraph_lastupdated{instance=wdqs2006:9193}) within thresholds https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [16:28:49] Operations, ops-codfw, Traffic: cp2011 memory replacement - https://phabricator.wikimedia.org/T191226#4109287 (Papaul) @ema @BBlack I need to update BIOS and IDRAC on cp2011 as requested by Dell. Can you please depool the server since it is not showing depooled on the sheet. Thanks. [16:29:15] Operations, ops-codfw, Traffic: cp2008 memory replacement - https://phabricator.wikimedia.org/T191224#4109289 (Papaul) @ema @BBlack I need to update BIOS and IDRAC on cp2008 as requested by Dell. Can you please depool the server since it is not showing depooled on the sheet. Thanks. [16:33:58] (CR) Ottomata: "Looks as expected: https://puppet-compiler.wmflabs.org/compiler02/10830/" [puppet] - https://gerrit.wikimedia.org/r/424344 (https://phabricator.wikimedia.org/T190940) (owner: Ottomata) [16:37:59] Operations, ops-codfw, Traffic: cp2008 memory replacement - https://phabricator.wikimedia.org/T191224#4109385 (RobH) I'm not sure what purpose flashing the bios is going to accomplish.
We cannot recreate this error regularly with the current version of the bios, and on other systems we've had this i... [16:41:06] !log cp2008 shutting down for firmware updates [16:41:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:35] Operations, ops-codfw, Traffic: cp2008 memory replacement - https://phabricator.wikimedia.org/T191224#4109417 (RobH) Old comment removed due to IRC discussion, Papaul points out they aren't refusing the replacement, but just want updated TSR. cp2008 has been shutdown. it can be flashed and updated... [16:44:18] PROBLEM - Host cp2008 is DOWN: PING CRITICAL - Packet loss = 100% [16:45:50] Operations, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 4 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4109425 (Nuria) [16:46:43] Operations, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 4 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#3961955 (Nuria) p:Triage>High [16:56:38] RECOVERY - Host cp2008 is UP: PING OK - Packet loss = 0%, RTA = 36.07 ms [17:00:04] cscott, arlolra, subbu, halfak, and Amir1: Time to snap out of that daydream and deploy Services – Graphoid / Parsoid / Citoid / ORES. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180405T1700). [17:00:04] No GERRIT patches in the queue for this window AFAICS. [17:00:07] Operations, Proton, Readers-Web-Backlog, Services (watching): Choose a server for the chromium-render service - https://phabricator.wikimedia.org/T187821#4109515 (Niedzielski) Thanks @mobrovac!
^/cc @ovasileva [17:00:56] no parsoid deploy today [17:01:29] (CR) Smalyshev: Add cirrussearch settings for wikibase (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: DCausse) [17:02:10] Operations, Analytics, netops, User-Elukey: Review ACLs for the Analytics VLAN - https://phabricator.wikimedia.org/T157435#4109537 (Nuria) stalled>Resolved [17:05:12] (CR) Smalyshev: Add cirrussearch settings for wikibase (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: DCausse) [17:06:06] Operations, Analytics, netops, User-Elukey: Review ACLs for the Analytics VLAN - https://phabricator.wikimedia.org/T157435#4109561 (elukey) Since this task has been open for a long time, I'll open a new one when we'll be ready to create the analytics-in6 filter. [17:08:59] !log bsitzmann@tin Started deploy [mobileapps/deploy@eed7961]: Update mobileapps to dbc0687 (T187430) [17:09:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:05] T187430: Duplicate usage examples in Wiktionary page definition endpoint - https://phabricator.wikimedia.org/T187430 [17:10:19] (CR) DCausse: Add cirrussearch settings for wikibase (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: DCausse) [17:11:33] Operations, ops-codfw, Traffic: cp2008 memory replacement - https://phabricator.wikimedia.org/T191224#4109598 (Papaul) BIOS and IDRAC update complete [17:17:30] Operations, Analytics, Analytics-Cluster: Clean up permissions for privatedata files on stat1005 - they should be group readable by statistics-privatedata-users - https://phabricator.wikimedia.org/T89887#1048013 (Nuria) There are not many private files anymore, declining.
[17:17:40] Operations, Analytics, Analytics-Cluster: Clean up permissions for privatedata files on stat1005 - they should be group readable by statistics-privatedata-users - https://phabricator.wikimedia.org/T89887#4109632 (Nuria) Open>declined [17:18:44] !log bsitzmann@tin Finished deploy [mobileapps/deploy@eed7961]: Update mobileapps to dbc0687 (T187430) (duration: 09m 45s) [17:18:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:51] T187430: Duplicate usage examples in Wiktionary page definition endpoint - https://phabricator.wikimedia.org/T187430 [17:18:55] Operations, ops-codfw, Traffic: cp2008 memory replacement - https://phabricator.wikimedia.org/T191224#4109636 (Papaul) TSR upload to Dell [17:21:00] (PS1) EddieGP: Beta: Unbreak wwwportals [puppet] - https://gerrit.wikimedia.org/r/424361 [17:21:08] (PS1) Ladsgroup: Disable logging autopatrol actins in commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/424362 (https://phabricator.wikimedia.org/T184485) [17:21:48] (CR) Smalyshev: Add cirrussearch settings for wikibase (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: DCausse) [17:22:30] (CR) jerkins-bot: [V: -1] Disable logging autopatrol actins in commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/424362 (https://phabricator.wikimedia.org/T184485) (owner: Ladsgroup) [17:27:37] (CR) Ladsgroup: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/424362 (https://phabricator.wikimedia.org/T184485) (owner: Ladsgroup) [17:29:59] (PS9) DCausse: Add cirrussearch settings for wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) [17:30:50] (CR) jerkins-bot: [V: -1] Add cirrussearch settings for wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 
(https://phabricator.wikimedia.org/T182717) (owner: DCausse) [17:32:39] Operations, Analytics, User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4109729 (Nuria) p:Normal>Low [17:32:53] Operations, Dumps-Generation, Patch-For-Review: data retrieval/write issues via NFS on dumpsdata1001, impacting some dump jobs - https://phabricator.wikimedia.org/T191177#4109731 (ArielGlenn) After the relevant reading, I'm leaning hard towards this being some sort of NFS cache thing. Need to to some... [17:33:25] !log start of ladsgroup@terbium:~$ mwscript deleteAutoPatrolLogs.php --wiki=commonswiki --before 20180223210426 (T184485) [17:33:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:33:31] T184485: Stop logging autopatrol actions - https://phabricator.wikimedia.org/T184485 [17:34:03] (PS10) DCausse: Add cirrussearch settings for wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) [17:35:12] (PS3) Ottomata: Use profile::kafka::mirror with --new.consumer for main-codfw -> main-eqiad [puppet] - https://gerrit.wikimedia.org/r/424344 (https://phabricator.wikimedia.org/T190940) [17:38:13] (PS2) Mark Bergsma: Don't use deprecated TestCase methods [debs/pybal] - https://gerrit.wikimedia.org/r/423994 [17:39:15] (CR) Mark Bergsma: [C: 2] Don't use deprecated TestCase methods [debs/pybal] - https://gerrit.wikimedia.org/r/423994 (owner: Mark Bergsma) [17:39:43] (Merged) jenkins-bot: Don't use deprecated TestCase methods [debs/pybal] - https://gerrit.wikimedia.org/r/423994 (owner: Mark Bergsma) [17:42:44] !log finished the script [17:42:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:49] (PS11) DCausse: Add cirrussearch settings for wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/419367 
(https://phabricator.wikimedia.org/T182717) [17:53:05] (PS1) Ottomata: Mount dumps on swap nodes [puppet] - https://gerrit.wikimedia.org/r/424366 (https://phabricator.wikimedia.org/T176091) [17:53:40] (CR) jerkins-bot: [V: -1] Mount dumps on swap nodes [puppet] - https://gerrit.wikimedia.org/r/424366 (https://phabricator.wikimedia.org/T176091) (owner: Ottomata) [17:55:29] (PS2) Ottomata: Mount dumps on swap nodes [puppet] - https://gerrit.wikimedia.org/r/424366 (https://phabricator.wikimedia.org/T176091) [17:56:09] (CR) Ottomata: [C: 2] Mount dumps on swap nodes [puppet] - https://gerrit.wikimedia.org/r/424366 (https://phabricator.wikimedia.org/T176091) (owner: Ottomata) [17:56:45] !log start of ladsgroup@terbium:~$ mwscript deleteAutoPatrolLogs.php --wiki=commonswiki --before 20180223210426 --from-id 156008475 (T184485) [17:56:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:56:51] T184485: Stop logging autopatrol actions - https://phabricator.wikimedia.org/T184485 [18:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Morning SWAT (Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180405T1800). [18:00:04] Daimona and Amir1: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:11] o/ [18:00:30] Hi again [18:01:25] (CR) EddieGP: "Unbreaks www.wikiPedia.beta.wmflabs.org, confirmed via cherry-pick." [puppet] - https://gerrit.wikimedia.org/r/424361 (owner: EddieGP) [18:03:19] Hello [18:03:21] I can SWAT [18:03:43] Daimona: Do you have the WikimediaDebug browser extension installed?
[18:04:12] RoanKattouw: Mine is not testable [18:04:30] Yes [18:04:46] Amir1: Are you sure? Can't you make an edit with an autopatrolled account on Commons using WMDebug and see if it gets logged? [18:04:53] Or is there job queue stuff involved? [18:04:58] (CR) Catrope: [C: 2] Disable logging autopatrol actins in commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/424362 (https://phabricator.wikimedia.org/T184485) (owner: Ladsgroup) [18:05:34] RoanKattouw: TBH, I can't say if it's a deferred update, job triggered or it happens at the same time [18:05:40] OK [18:05:51] It's a pretty simple config change anyway [18:05:57] one other thing is it has been already working on wikidata [18:06:07] This is not the first wiki [18:06:19] (Merged) jenkins-bot: Disable logging autopatrol actins in commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/424362 (https://phabricator.wikimedia.org/T184485) (owner: Ladsgroup) [18:06:38] Right [18:08:19] (CR) jenkins-bot: Disable logging autopatrol actins in commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/424362 (https://phabricator.wikimedia.org/T184485) (owner: Ladsgroup) [18:08:30] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Disable logging autopatrol actions on commonswiki (T184485) (duration: 01m 17s) [18:08:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:08:37] T184485: Stop logging autopatrol actions - https://phabricator.wikimedia.org/T184485 [18:08:49] (PS2) EddieGP: Beta: Unbreak wwwportals [puppet] - https://gerrit.wikimedia.org/r/424361 (https://phabricator.wikimedia.org/T173887) [18:09:19] Amir1: There you go [18:09:47] * RoanKattouw glares at Zuul for not prioritizing the supposedly prioritized queue that Daimona's patch is in [18:09:49] Thank you! [18:11:55] PROBLEM - puppet last run on notebook1004 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [18:12:33] (03CR) 10Smalyshev: [C: 031] Add cirrussearch settings for wikibase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/419367 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse) [18:12:59] Poor Zuul [18:13:14] 10Puppet, 10Beta-Cluster-Infrastructure: labs-puppetmaster/Labs Puppetmaster HTTPS is UNKNOWN since [...] - https://phabricator.wikimedia.org/T191553#4109867 (10MarcoAurelio) [18:16:20] (03PS1) 10Ottomata: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/424369 (https://phabricator.wikimedia.org/T176091) [18:16:31] (03CR) 10Ottomata: [V: 032 C: 032] Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/424369 (https://phabricator.wikimedia.org/T176091) (owner: 10Ottomata) [18:18:19] Finally! [18:18:35] Daimona: OK, your change is on mwdebug1002, please test [18:19:02] Yeah [18:19:19] (03PS1) 10Ottomata: Fix parameters for statistics::dataset_mount on swap [puppet] - 10https://gerrit.wikimedia.org/r/424370 (https://phabricator.wikimedia.org/T176091) [18:19:53] (03CR) 10jerkins-bot: [V: 04-1] Fix parameters for statistics::dataset_mount on swap [puppet] - 10https://gerrit.wikimedia.org/r/424370 (https://phabricator.wikimedia.org/T176091) (owner: 10Ottomata) [18:20:14] PROBLEM - puppet last run on notebook1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [18:21:30] Seems to work, please give us another minute to finish testing [18:21:38] (03PS2) 10Ottomata: Fix parameters for statistics::dataset_mount on swap [puppet] - 10https://gerrit.wikimedia.org/r/424370 (https://phabricator.wikimedia.org/T176091) [18:22:00] RoanKattouw: It works for me [18:22:11] (03CR) 10Ottomata: [C: 032] Fix parameters for statistics::dataset_mount on swap [puppet] - 10https://gerrit.wikimedia.org/r/424370 (https://phabricator.wikimedia.org/T176091) (owner: 10Ottomata) [18:22:16] Ok [18:22:20] We can deploy [18:23:19] Alright, deploying [18:24:50] RoanKattouw: Can I add something to the SWAT? [18:25:35] !log catrope@tin Synchronized php-1.31.0-wmf.28/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: Unbreak Special:AbuseFilter (T191512) (duration: 01m 17s) [18:25:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:41] T191512: PHP fatal error: [] operator not supported for strings @ group0 and group1 - https://phabricator.wikimedia.org/T191512 [18:26:12] Amir1: Sure [18:26:39] Thanks RoanKattouw [18:27:53] Amir1: What's your addition? 
[18:28:11] RoanKattouw: Lucas_WMDE is making the patch, it'll go up in one sec [18:30:14] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:30:15] (03PS1) 10EddieGP: mediawiki: Remove now unused parameter portal_dir [puppet] - 10https://gerrit.wikimedia.org/r/424371 [18:31:14] (03PS1) 10Lucas Werkmeister (WMDE): Disable writing wb_terms search fields on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424372 (https://phabricator.wikimedia.org/T189777) [18:31:35] RoanKattouw, Amir1: the patch in question :) ^ [18:31:46] (I’ll add it to the calendar too, for the record) [18:31:55] RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [18:34:48] (03CR) 10Catrope: [C: 032] Disable writing wb_terms search fields on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424372 (https://phabricator.wikimedia.org/T189777) (owner: 10Lucas Werkmeister (WMDE)) [18:36:04] (03Merged) 10jenkins-bot: Disable writing wb_terms search fields on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424372 (https://phabricator.wikimedia.org/T189777) (owner: 10Lucas Werkmeister (WMDE)) [18:37:46] Lucas_WMDE: Live on mwdebug1002, please test [18:38:19] (03CR) 10jenkins-bot: Disable writing wb_terms search fields on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424372 (https://phabricator.wikimedia.org/T189777) (owner: 10Lucas Werkmeister (WMDE)) [18:41:32] RoanKattouw: seems to work as expected [18:41:50] 10Operations, 10Cloud-Services, 10cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4109965 (10Andrew) We'll end up wasting a fair bit of space if we have to break these up into separate volumes. Pinging @chasemp in case he thinks we can make thi... 
[18:43:41] Lucas_WMDE: OK, deploying [18:43:49] thanks [18:45:41] !log catrope@tin Synchronized wmf-config/Wikibase-production.php: Disable writing wb_terms search fields on Wikidata (T189777) (duration: 01m 16s) [18:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:48] T189777: Disable reading from term_search_key from wb_terms table in wikidata - https://phabricator.wikimedia.org/T189777 [18:49:38] (03PS3) 10Ladsgroup: mediawiki: Start deleteAutoPatrolLogs from Wikidata logging table [puppet] - 10https://gerrit.wikimedia.org/r/424300 (https://phabricator.wikimedia.org/T189596) [19:00:04] twentyafterfour: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for MediaWiki train deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180405T1900). [19:00:04] No GERRIT patches in the queue for this window AFAICS. [19:03:03] 10Operations, 10Analytics, 10Analytics-Data-Quality, 10Analytics-Kanban, and 4 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4110036 (10Nuria) >Normally users from Nigeria connect to a data center in Europe, however between February 17th - March 10th I see a small n... [19:19:55] (03PS1) 10Ottomata: 2.3.0 Hadoop 2.6 release [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424380 (https://phabricator.wikimedia.org/T159962) [19:21:25] (03CR) 10Ottomata: [C: 032] 2.3.0 Hadoop 2.6 release [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424380 (https://phabricator.wikimedia.org/T159962) (owner: 10Ottomata) [19:34:10] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[spark2] [19:34:37] 10Operations, 10Cloud-Services, 10cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4110076 (10chasemp) >>! 
In T187373#4109965, @Andrew wrote: > I think we probably need a new controller. Yep [19:36:35] (03PS1) 10Ottomata: Add README instructions for updating assembly .zip after upgrading [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424381 [19:37:04] (03CR) 10Ottomata: [C: 032] Add README instructions for updating assembly .zip after upgrading [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424381 (owner: 10Ottomata) [19:37:43] (03PS2) 10Ottomata: Add README instructions for updating assembly .zip after upgrading [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424381 [19:38:02] (03CR) 10Ottomata: [V: 032 C: 032] Add README instructions for updating assembly .zip after upgrading [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424381 (owner: 10Ottomata) [19:46:19] (03PS1) 10Andrew Bogott: labs puppetmaster: hold back OpenStack version for openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/424383 (https://phabricator.wikimedia.org/T145919) [19:47:17] (03CR) 10Andrew Bogott: [C: 032] labs puppetmaster: hold back OpenStack version for openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/424383 (https://phabricator.wikimedia.org/T145919) (owner: 10Andrew Bogott) [19:47:37] (03PS1) 1020after4: all wikis to 1.31.0-wmf.28 refs T183967 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424384 [19:47:39] (03CR) 1020after4: [C: 032] all wikis to 1.31.0-wmf.28 refs T183967 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424384 (owner: 1020after4) [19:48:54] (03Merged) 10jenkins-bot: all wikis to 1.31.0-wmf.28 refs T183967 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424384 (owner: 1020after4) [19:49:09] (03CR) 10jenkins-bot: all wikis to 1.31.0-wmf.28 refs T183967 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424384 (owner: 1020after4) [19:51:38] !log robh@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2008.codfw.wmnet [19:51:43] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:07] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: all wikis to 1.31.0-wmf.28 refs T183967 [19:52:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:13] T183967: 1.31.0-wmf.28 deployment blockers - https://phabricator.wikimedia.org/T183967 [19:57:54] (03CR) 10Herron: [C: 032] puppetmaster: add rhodium worker offline [puppet] - 10https://gerrit.wikimedia.org/r/424336 (owner: 10Herron) [19:58:03] (03PS2) 10Herron: puppetmaster: add rhodium worker offline [puppet] - 10https://gerrit.wikimedia.org/r/424336 [19:59:54] !log added rhodium puppet master backend in offline mode [19:59:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:55] 10Operations, 10ops-codfw, 10Traffic: cp2017 memory replacement - https://phabricator.wikimedia.org/T191227#4110136 (10Papaul) The Dell tech called to say he couldn't make it today. This is now scheduled for first thing Monday morning. [20:01:09] 10Operations, 10ops-codfw, 10Traffic: cp2010 memory replacement - https://phabricator.wikimedia.org/T191225#4110137 (10Papaul) The Dell tech called to say he couldn't make it today. This is now scheduled for first thing Monday morning. [20:01:19] 10Operations, 10ops-codfw, 10Traffic: cp2006 memory replacement - https://phabricator.wikimedia.org/T191223#4110138 (10Papaul) The Dell tech called to say he couldn't make it today. This is now scheduled for first thing Monday morning. [20:01:30] 10Operations, 10ops-codfw, 10Traffic: cp2022 memory replacement - https://phabricator.wikimedia.org/T191229#4110139 (10Papaul) The Dell tech called to say he couldn't make it today. This is now scheduled for first thing Monday morning. 
[20:04:04] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [20:05:24] (03CR) 10Hoo man: "> not sure, shouldn't canary hosts be exactly like actual prod hosts on the other hand" [puppet] - 10https://gerrit.wikimedia.org/r/391045 (owner: 10Hoo man) [20:05:59] (03PS5) 10Zoranzoki21: Enable on ku.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/423188 (https://phabricator.wikimedia.org/T190944) [20:06:15] (03PS2) 10Herron: remove nitrogen and nihal from site.pp and install_server [puppet] - 10https://gerrit.wikimedia.org/r/424064 (https://phabricator.wikimedia.org/T191467) [20:07:09] !log deploying https://gerrit.wikimedia.org/r/#/c/424379/ refs T191335 [20:07:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:16] T191335: InvalidArgumentException from line 13 of EchoIcon.php: The trash icon is not registered - https://phabricator.wikimedia.org/T191335 [20:10:36] !log twentyafterfour@tin Synchronized php-1.31.0-wmf.28/extensions/Echo/: Sync https://gerrit.wikimedia.org/r/#/c/424379/ refs T183967 (duration: 01m 05s) [20:10:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:43] T183967: 1.31.0-wmf.28 deployment blockers - https://phabricator.wikimedia.org/T183967 [20:14:38] (03PS3) 10Herron: remove nitrogen and nihal from site.pp and install_server [puppet] - 10https://gerrit.wikimedia.org/r/424064 (https://phabricator.wikimedia.org/T191467) [20:15:15] (03PS1) 10Andrew Bogott: labs puppetmaster: hold back OpenStack version for openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/424394 (https://phabricator.wikimedia.org/T145919) [20:16:46] (03CR) 10Andrew Bogott: [C: 032] labs puppetmaster: hold back OpenStack version for openstack client libs [puppet] - 10https://gerrit.wikimedia.org/r/424394 (https://phabricator.wikimedia.org/T145919) (owner: 10Andrew Bogott) [20:22:15] (03CR) 10Herron: [C: 032] 
remove nitrogen and nihal from site.pp and install_server [puppet] - 10https://gerrit.wikimedia.org/r/424064 (https://phabricator.wikimedia.org/T191467) (owner: 10Herron) [20:22:32] (03PS4) 10Herron: remove nitrogen and nihal from site.pp and install_server [puppet] - 10https://gerrit.wikimedia.org/r/424064 (https://phabricator.wikimedia.org/T191467) [20:24:08] (03PS1) 10Ottomata: Install spark2-thriftserver executable [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424444 (https://phabricator.wikimedia.org/T159962) [20:25:07] (03CR) 10Ottomata: [C: 032] Install spark2-thriftserver executable [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424444 (https://phabricator.wikimedia.org/T159962) (owner: 10Ottomata) [20:26:59] (03PS1) 10Ottomata: Fix permissions on spark-thriftserver [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424448 [20:28:16] (03PS2) 10Ottomata: Fix permissions on spark-thriftserver [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424448 [20:33:44] (03PS2) 10Herron: remove nitrogen and nihal from forward/reverse dns [dns] - 10https://gerrit.wikimedia.org/r/424067 (https://phabricator.wikimedia.org/T191467) [20:34:23] (03CR) 10Ottomata: [C: 032] Fix permissions on spark-thriftserver [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/424448 (owner: 10Ottomata) [20:35:43] (03CR) 10Herron: [C: 032] remove nitrogen and nihal from forward/reverse dns [dns] - 10https://gerrit.wikimedia.org/r/424067 (https://phabricator.wikimedia.org/T191467) (owner: 10Herron) [20:47:27] (03CR) 10Bstorm: wiki replicas: drop views with missing tables (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/424166 (https://phabricator.wikimedia.org/T191387) (owner: 10BryanDavis) [20:50:58] 10Operations, 10Puppet, 10Patch-For-Review: Retire nitrogen and nihal ganeti VMs - https://phabricator.wikimedia.org/T191467#4110279 (10herron) 05Open>03Resolved Hosts have been removed from ganeti, dns, puppet and puppetdb and as expected are 
no longer present in icinga. [20:57:21] (03PS1) 10Andrew Bogott: Make labtestservices2003 a temporary nodepool test box [puppet] - 10https://gerrit.wikimedia.org/r/424454 [20:57:57] (03CR) 10jerkins-bot: [V: 04-1] Make labtestservices2003 a temporary nodepool test box [puppet] - 10https://gerrit.wikimedia.org/r/424454 (owner: 10Andrew Bogott) [20:58:38] (03CR) 10Rush: [C: 031] "makes sense, we should reimage this post so I'm not worried about it being dirty" [puppet] - 10https://gerrit.wikimedia.org/r/424454 (owner: 10Andrew Bogott) [20:59:35] (03CR) 10Andrew Bogott: [V: 032 C: 032] "Overriding the -1 as it's about duplicating the ipv6 definition when I split this out into two different node defs." [puppet] - 10https://gerrit.wikimedia.org/r/424454 (owner: 10Andrew Bogott) [21:00:40] (03PS1) 10Ayounsi: Puppet: Remove my old ssh key [puppet] - 10https://gerrit.wikimedia.org/r/424455 [21:01:13] (03CR) 10Ayounsi: [C: 032] Puppet: Remove my old ssh key [puppet] - 10https://gerrit.wikimedia.org/r/424455 (owner: 10Ayounsi) [21:01:19] (03PS2) 10Ayounsi: Puppet: Remove my old ssh key [puppet] - 10https://gerrit.wikimedia.org/r/424455 [21:09:16] PROBLEM - puppet last run on analytics1041 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[spark2] [21:09:36] PROBLEM - Host labtestservices2003 is DOWN: PING CRITICAL - Packet loss = 100% [21:09:49] !log robh@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2008.codfw.wmnet,service=varnish-be [21:09:53] !lot re-imaging labtestservices2003 [21:09:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:10:01] really? [21:12:26] RECOVERY - Host labtestservices2003 is UP: PING OK - Packet loss = 0%, RTA = 36.05 ms [21:12:27] 10Puppet, 10Beta-Cluster-Infrastructure, 10cloud-services-team: labs-puppetmaster/Labs Puppetmaster HTTPS is UNKNOWN since [...] 
- https://phabricator.wikimedia.org/T191553#4110316 (10EddieGP) The "Labs Puppetmaster HTTPS" check checks whether `https://labs-puppetmaster.wikimedia.org:8140/` returns the expec... [21:20:21] (03PS1) 10Andrew Bogott: labtestservices2003: give CI people sudo access [puppet] - 10https://gerrit.wikimedia.org/r/424458 [21:20:46] (03PS4) 10Volans: First working version [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394620 (https://phabricator.wikimedia.org/T167504) [21:20:48] (03PS2) 10Volans: Add CLI script to be installed in the target hosts [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) [21:20:50] (03PS4) 10Volans: Add basic test coverage [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394621 (https://phabricator.wikimedia.org/T167504) [21:21:01] (03CR) 10jerkins-bot: [V: 04-1] First working version [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394620 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [21:21:03] (03CR) 10jerkins-bot: [V: 04-1] Add CLI script to be installed in the target hosts [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [21:21:05] (03CR) 10jerkins-bot: [V: 04-1] Add basic test coverage [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394621 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [21:23:33] (03PS2) 10Volans: Created Django apps [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394619 (https://phabricator.wikimedia.org/T167504) [21:23:35] (03PS5) 10Volans: First working version [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394620 (https://phabricator.wikimedia.org/T167504) [21:23:37] (03PS3) 10Volans: Add CLI script to be installed in the target hosts [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) [21:23:39] (03PS5) 10Volans: Add basic test coverage 
[software/debmonitor] - 10https://gerrit.wikimedia.org/r/394621 (https://phabricator.wikimedia.org/T167504) [21:23:43] (03CR) 10jerkins-bot: [V: 04-1] Created Django apps [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394619 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [21:23:47] (03CR) 10jerkins-bot: [V: 04-1] First working version [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394620 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [21:23:49] (03CR) 10jerkins-bot: [V: 04-1] Add CLI script to be installed in the target hosts [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394990 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [21:23:51] (03CR) 10jerkins-bot: [V: 04-1] Add basic test coverage [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394621 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [21:24:26] (03CR) 10Volans: [V: 032] "Tox.ini is added in the next CR, manually verifying +2" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/394619 (https://phabricator.wikimedia.org/T167504) (owner: 10Volans) [21:28:10] (03CR) 10Andrew Bogott: [C: 032] labtestservices2003: give CI people sudo access [puppet] - 10https://gerrit.wikimedia.org/r/424458 (owner: 10Andrew Bogott) [21:31:02] (03PS38) 10EddieGP: Add mcrouter module and mcrouter_wancache profile and enable on beta [puppet] - 10https://gerrit.wikimedia.org/r/392221 (owner: 10Aaron Schulz) [21:34:48] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp2008.codfw.wmnet [21:34:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:39:16] RECOVERY - puppet last run on analytics1041 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [21:41:45] (03CR) 10EddieGP: "I've updated the cherry-pick on deployment-prep to stop breaking deployment-mediawiki07 now." 
[puppet] - 10https://gerrit.wikimedia.org/r/392221 (owner: 10Aaron Schulz) [21:43:29] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp2010.codfw.wmnet [21:43:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:43:44] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp2006.codfw.wmnet [21:43:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:44:07] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet [21:44:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:44:52] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=cp2017.codfw.wmnet [21:44:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:35:10] (03PS1) 10Nuria: Moving sqooping of mediawiki to the 5th of month [puppet] - 10https://gerrit.wikimedia.org/r/424473 [22:36:11] (03PS2) 10Nuria: Moving sqooping of mediawiki to the 5th of month [puppet] - 10https://gerrit.wikimedia.org/r/424473 [22:47:15] I'm unable to ssh into terbium from office wifi. I get - ssh: connect to host bast1001.wikimedia.org port 22: No route to host. ssh_exchange_identification: Connection closed by remote host [22:48:31] I think bast1001 is...going away? being replaced? something... [22:48:48] bast1002 [22:49:02] 10Puppet, 10Beta-Cluster-Infrastructure, 10Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#4110551 (10EddieGP) [22:49:03] bast1001 was replaced with bast1002 [22:49:05] 10Puppet, 10Beta-Cluster-Infrastructure: Puppet errors on deployment-mediawiki07 - https://phabricator.wikimedia.org/T190632#4110549 (10EddieGP) 05Open>03Resolved a:03EddieGP [22:49:11] Niharika ^^ [22:49:42] paladox: I'm sshing in as usual (ssh terbium.eqiad.wmnet). How can I specify which bastion? [22:50:01] is it defined in the .ssh/config file? [22:50:13] Lemme check. 
[22:51:25] paladox: Ah, yes it is. I switched it to bast1002 but now I get - fork failed: Resource temporarily unavailable [22:51:26] ssh_exchange_identification: Connection closed by remote host [22:51:35] oh [22:52:15] paladox: Ah, it works on a retry. [22:52:18] :) [22:52:18] Thanks for your help! [22:52:24] you're welcome :) [22:52:54] bblack: yt? [23:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Evening SWAT (Max 8 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180405T2300). [23:00:04] Zoranzoki21: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:05] 10Operations, 10Analytics, 10Analytics-Data-Quality, 10Analytics-Kanban, and 4 others: Opera mini IP addresses reassigned - https://phabricator.wikimedia.org/T187014#4110582 (10Nuria) Varnish5 rollout might have something to do with this? https://gerrit.wikimedia.org/r/#/c/409047/ cc @ema [23:00:44] hi [23:03:04] (03CR) 10BryanDavis: wiki replicas: drop views with missing tables (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/424166 (https://phabricator.wikimedia.org/T191387) (owner: 10BryanDavis) [23:04:40] who will be swater [23:07:09] PROBLEM - High CPU load on API appserver on mw1288 is CRITICAL: CRITICAL - load average: 68.85, 22.14, 13.03 [23:08:09] RECOVERY - High CPU load on API appserver on mw1288 is OK: OK - load average: 26.93, 18.58, 12.37 [23:58:29] (03PS1) 10BryanDavis: mwvagrant: Add sudoer rules for `vagrant up` [puppet] - 10https://gerrit.wikimedia.org/r/424481