[00:00:04] twentyafterfour: That opportune time is upon us again. Time for a Phabricator update deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180517T0000). [00:04:01] (03CR) 10Dzahn: "i'll move it to modules/ganeti/files/ and try to install it only on actual master nodes" [puppet] - 10https://gerrit.wikimedia.org/r/433296 (owner: 10Dzahn) [00:18:20] (03PS4) 10Dzahn: ganeti: add interactive script to create VMs [puppet] - 10https://gerrit.wikimedia.org/r/433296 [00:25:28] (03CR) 10Dzahn: "i moved it to the ganeti module itself so that it gets installed on servers in /usr/local/bin/. ok? Also, i would have done "only if on " [puppet] - 10https://gerrit.wikimedia.org/r/433296 (owner: 10Dzahn) [00:48:34] (03PS1) 10Dzahn: scap/deployment_server: replace trebuchet with mwdeploy user (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/433516 [00:49:04] (03CR) 10Dzahn: [C: 04-2] scap/deployment_server: replace trebuchet with mwdeploy user (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/433516 (owner: 10Dzahn) [00:57:08] !log installing OS on webperf1002, webperf2002 [00:57:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:01:00] PROBLEM - etcd request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 operation=compareAndSwap https://grafana.wikimedia.org/dashboard/db/kubernetes-api [01:02:20] RECOVERY - etcd request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [01:27:20] (03PS2) 10Dzahn: add webperf1002/2002 as spare systems with IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/433298 (https://phabricator.wikimedia.org/T194390) [01:28:25] (03PS3) 10Dzahn: add webperf1002/2002 as spare systems with IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/433298 (https://phabricator.wikimedia.org/T194390) [01:28:33] (03CR) 10Dzahn: [C: 032] add webperf1002/2002 as spare systems with IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/433298 (https://phabricator.wikimedia.org/T194390) (owner: 10Dzahn) [03:04:43] !log l10nupdate@tin scap sync-l10n completed (1.32.0-wmf.3) (duration: 14m 27s) [03:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:28:58] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 964.45 seconds [04:02:31] !log l10nupdate@tin scap sync-l10n completed (1.32.0-wmf.4) (duration: 14m 08s) [04:02:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:10:13] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu May 17 04:10:13 UTC 2018 (duration 7m 42s) [04:10:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:19:49] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 235.67 seconds [04:48:03] ACKNOWLEDGEMENT - Juniper alarms on asw-c-eqiad is CRITICAL: JNX_ALARMS CRITICAL - 1 red alarms, 0 yellow alarms Ayounsi https://phabricator.wikimedia.org/T194858 [05:09:39] !log Deploy schema change on s4 primary master (db1068) - T191519 T188299 T190148 [05:09:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:09:45] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519 [05:09:45] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148 [05:09:45] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299 [05:10:55] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433517 [05:10:56] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433517 [05:13:13] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433517 (owner: 10Marostegui) [05:14:42] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433517 (owner: 10Marostegui) [05:18:23] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1098:3317 after alter table (duration: 01m 48s) [05:18:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:19:36] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433517 (owner: 10Marostegui) [05:20:07] !log Force BBU learn cycle on db1054 - T194867 [05:20:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:20:11] T194867: BBU issues on db1054 (s2 primary master) - https://phabricator.wikimedia.org/T194867 [05:22:26] (03PS1) 10Marostegui: db-eqiad.php: Depool db1079 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433518 (https://phabricator.wikimedia.org/T190148) [05:28:02] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1079 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433518 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui) [05:29:41] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433518 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui) [05:31:28] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1079 for alter table (duration: 01m 22s) [05:31:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:31:34] !log Deploy schema change on db1079 with replication (this will generate lag on labs s7) - T191519 T188299 T190148 [05:31:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:31:40] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519 [05:31:40] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148 [05:31:40] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299 [05:32:12] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433518 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui) [05:32:28] (03PS2) 10Marostegui: db-codfw.php: Specifiy sanitarium masters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433343 (https://phabricator.wikimedia.org/T190704) [05:36:20] (03CR) 10Marostegui: [C: 032] db-codfw.php: Specifiy sanitarium masters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433343 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [05:36:22] (03PS1) 10Marostegui: mariadb: Make explicit they use ROW based replication [puppet] - 10https://gerrit.wikimedia.org/r/433519 (https://phabricator.wikimedia.org/T190704) [05:37:47] (03Merged) 10jenkins-bot: db-codfw.php: Specifiy sanitarium masters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433343 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [05:38:01] (03CR) 10jenkins-bot: db-codfw.php: Specifiy sanitarium masters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433343 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [05:38:44] (03CR) 10Marostegui: [C: 032] mariadb: Make explicit they use ROW based replication [puppet] - 10https://gerrit.wikimedia.org/r/433519 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [05:44:38] PROBLEM - MegaRAID on db1054 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough [05:45:05] ^ I am with that: T194867 [05:45:05] T194867: BBU issues on db1054 (s2 primary master) - https://phabricator.wikimedia.org/T194867 [05:52:54] !log Disable BBU auto-learn on new hosts - T192979 [05:52:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:52:59] T192979: Productionize 8 eqiad hosts - https://phabricator.wikimedia.org/T192979 [05:54:58] RECOVERY - MegaRAID on db1054 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy [06:46:09] PROBLEM - MegaRAID on db1054 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough [06:51:04] ^ that is probably because I did a few BBU relearn cycles [06:53:58] (03PS1) 10Jcrespo: mariadb: Depool db1093 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433529 [06:54:22] (03CR) 10Marostegui: [C: 031] mariadb: Depool db1093 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433529 (owner: 10Jcrespo) [06:54:33] (03PS2) 10Gehel: Fix some issues found with process-osm-data [puppet] - 10https://gerrit.wikimedia.org/r/433488 (https://phabricator.wikimedia.org/T190237) (owner: 10Pnorman) [06:55:25] (03CR) 10Gehel: [C: 032] Fix some issues found with process-osm-data [puppet] - 10https://gerrit.wikimedia.org/r/433488 (https://phabricator.wikimedia.org/T190237) (owner: 10Pnorman) [06:56:47] (03PS1) 10Jcrespo: mariadb: Allow reimage of db109* hosts [puppet] - 10https://gerrit.wikimedia.org/r/433530 [06:56:53] (03PS2) 10Gehel: wdqs: partman config for wdqs10(09|10) [puppet] - 10https://gerrit.wikimedia.org/r/433348 (https://phabricator.wikimedia.org/T194184) [06:58:14] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1093 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433529 (owner: 10Jcrespo) [06:58:28] (03PS2) 10Jcrespo: mariadb: Allow reimage of db109* hosts [puppet] - 10https://gerrit.wikimedia.org/r/433530 [06:58:49] (03CR) 10Jcrespo: [C: 032] mariadb: Allow reimage of db109* hosts [puppet] - 10https://gerrit.wikimedia.org/r/433530 (owner: 10Jcrespo) [06:59:05] (03CR) 10Gehel: [C: 032] wdqs: partman config for wdqs10(09|10) [puppet] - 10https://gerrit.wikimedia.org/r/433348 (https://phabricator.wikimedia.org/T194184) (owner: 10Gehel) [06:59:45] (03PS3) 10Jcrespo: mariadb: Allow reimage of db109* hosts [puppet] - 10https://gerrit.wikimedia.org/r/433530 [06:59:48] (03Merged) 10jenkins-bot: mariadb: Depool db1093 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433529 (owner: 10Jcrespo) [07:00:00] (03CR) 10jenkins-bot: mariadb: Depool db1093 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433529 (owner: 10Jcrespo) [07:00:08] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433531 [07:00:14] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433531 [07:02:00] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1093 (duration: 01m 22s) [07:02:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:05:00] !log stop and reimage db1093 [07:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:05:38] (03PS2) 10Gehel: wdqs: configure new wdqs test cluster [puppet] - 10https://gerrit.wikimedia.org/r/433351 (https://phabricator.wikimedia.org/T194184) [07:06:48] RECOVERY - MegaRAID on db1054 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy [07:07:01] (03CR) 10Gehel: wdqs: configure new wdqs test cluster (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/433351 (https://phabricator.wikimedia.org/T194184) (owner: 10Gehel) [07:07:19] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433531 (owner: 10Marostegui) [07:08:55] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433531 (owner: 10Marostegui) [07:12:10] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1079 after alter table (duration: 01m 20s) [07:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:12:26] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433531 (owner: 10Marostegui) [07:16:39] (03PS1) 10Marostegui: db-eqiad.php: Depool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433534 (https://phabricator.wikimedia.org/T190148) [07:18:32] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433534 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui) [07:18:58] PROBLEM - Device not healthy -SMART- on rdb1004 is CRITICAL: cluster=redis device=sat+megaraid,3 instance=rdb1004:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=rdb1004&var-datasource=eqiad%2520prometheus%252Fops [07:19:45] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Specify the sanitarium masters in codfw (duration: 01m 21s) [07:19:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:20:10] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433534 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui) [07:22:08] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1090:3317 for alter table (duration: 01m 21s) [07:22:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:22:28] !log Deploy schema change on db1090:3317 - T191519 T188299 T190148 [07:22:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:22:33] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519 [07:22:34] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148 [07:22:34] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299 [07:24:50] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433534 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui) [07:25:17] !log bounced all the prometheus burrow exporters on kafkamon* hosts to refresh their metrics and drop old/expired cgroups [07:25:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:33:06] (03PS1) 10Jcrespo: mariadb: Repool db1093 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433535 [07:37:22] (03PS1) 10Jcrespo: mariadb: Reimage db1085 to stretch [puppet] - 10https://gerrit.wikimedia.org/r/433536 [07:40:14] (03PS3) 10Gehel: wdqs: configure new wdqs test cluster [puppet] - 10https://gerrit.wikimedia.org/r/433351 (https://phabricator.wikimedia.org/T194184) [07:41:46] (03PS2) 10Jcrespo: mariadb: Reimage db1085 & s2-codfw hosts to stretch [puppet] - 10https://gerrit.wikimedia.org/r/433536 [07:43:01] (03CR) 10Jcrespo: [C: 032] mariadb: Repool db1093 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433535 (owner: 10Jcrespo) [07:44:24] (03Merged) 10jenkins-bot: mariadb: Repool db1093 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433535 (owner: 10Jcrespo) [07:49:11] (03CR) 10jenkins-bot: mariadb: Repool db1093 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433535 (owner: 10Jcrespo) [07:53:38] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1093 with low load (duration: 01m 20s) [07:53:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:57:46] (03CR) 10Jcrespo: [C: 032] mariadb: Reimage db1085 & s2-codfw hosts to stretch [puppet] - 10https://gerrit.wikimedia.org/r/433536 (owner: 10Jcrespo) [08:04:24] !log stop and reimage db2056 [08:04:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:10:06] (03PS1) 10Marostegui: install_server: Allow install the new sanitarium hosts [puppet] - 10https://gerrit.wikimedia.org/r/433537 (https://phabricator.wikimedia.org/T194780) [08:14:34] (03CR) 10Marostegui: [C: 032] install_server: Allow install the new sanitarium hosts [puppet] - 10https://gerrit.wikimedia.org/r/433537 (https://phabricator.wikimedia.org/T194780) (owner: 10Marostegui) [08:41:29] !log stop and reimage db2049 [08:41:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:02:41] (03PS4) 10Gehel: wdqs: configure new wdqs test cluster [puppet] - 10https://gerrit.wikimedia.org/r/433351 (https://phabricator.wikimedia.org/T194184) [09:03:08] (03CR) 10Gehel: [C: 032] wdqs: configure new wdqs test cluster [puppet] - 10https://gerrit.wikimedia.org/r/433351 (https://phabricator.wikimedia.org/T194184) (owner: 10Gehel) [09:12:07] (03CR) 10Smalyshev: wdqs: configure new wdqs test cluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/433351 (https://phabricator.wikimedia.org/T194184) (owner: 10Gehel) [09:15:30] (03CR) 10Gehel: wdqs: configure new wdqs test cluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/433351 (https://phabricator.wikimedia.org/T194184) (owner: 10Gehel) [09:18:25] (03PS1) 10Gehel: wdqs: disable LDF endpoint for wdqs-test [puppet] - 10https://gerrit.wikimedia.org/r/433539 (https://phabricator.wikimedia.org/T194184) [09:19:18] (03CR) 10Gehel: [C: 032] wdqs: disable LDF endpoint for wdqs-test [puppet] - 10https://gerrit.wikimedia.org/r/433539 (https://phabricator.wikimedia.org/T194184) (owner: 10Gehel) [09:23:00] (03PS2) 10Reedy: New throttle rule for WMF Hackhathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433207 (https://phabricator.wikimedia.org/T194392) (owner: 10Urbanecm) [09:23:12] (03PS3) 10Reedy: New throttle rule for WMF Hackhathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433207 (https://phabricator.wikimedia.org/T194392) (owner: 10Urbanecm) [09:23:29] (03CR) 10jerkins-bot: [V: 04-1] New throttle rule for WMF Hackhathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433207 (https://phabricator.wikimedia.org/T194392) (owner: 10Urbanecm) [09:24:19] (03CR) 10Reedy: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433207 (https://phabricator.wikimedia.org/T194392) (owner: 10Urbanecm) [09:25:57] (03CR) 10Reedy: [C: 032] New throttle rule for WMF Hackhathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433207 (https://phabricator.wikimedia.org/T194392) (owner: 10Urbanecm) [09:27:26] (03Merged) 10jenkins-bot: New throttle rule for WMF Hackhathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433207 (https://phabricator.wikimedia.org/T194392) (owner: 10Urbanecm) [09:29:26] (03CR) 10jenkins-bot: New throttle rule for WMF Hackhathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433207 (https://phabricator.wikimedia.org/T194392) (owner: 10Urbanecm) [09:30:23] !log reedy@tin Synchronized wmf-config/throttle.php: Throttle for Barcelona Hackathon (duration: 01m 22s) [09:30:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:19] (03CR) 10MarcoAurelio: [C: 031] lists: disable list subscription via email [puppet] - 10https://gerrit.wikimedia.org/r/432998 (https://phabricator.wikimedia.org/T194032) (owner: 10Herron) [10:09:52] !log mobrovac@tin Started deploy [citoid/deploy@8a26508]: Update citoid to 2f35126 - T179123 T185217 [10:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:09:58] T179123: Use crossref to search for human-readable citations copy-pasted from a bibliography in a PDF - https://phabricator.wikimedia.org/T179123 [10:09:58] T185217: If unable to resolve DOI in requestFromDOI, try to retrieve metadata using crossRef - https://phabricator.wikimedia.org/T185217 [10:12:44] !log mobrovac@tin Finished deploy [citoid/deploy@8a26508]: Update citoid to 2f35126 - T179123 T185217 (duration: 02m 52s) [10:12:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:58] (03Draft1) 10MarcoAurelio: zhwiki: let 'accountcreators' to self-remove their permissions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433546 (https://phabricator.wikimedia.org/T194871) [10:31:03] (03PS2) 10MarcoAurelio: zhwiki: let 'accountcreators' to self-remove their permissions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433546 (https://phabricator.wikimedia.org/T194871) [11:08:22] !log Stop MySQL and poweroff db1067 - T194852 [11:08:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:08:27] T194852: Possibly BBU issues on db1067 - https://phabricator.wikimedia.org/T194852 [11:08:57] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433550 [11:09:00] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433550 [11:10:36] (03CR) 10jerkins-bot: [V: 04-1] Revert "db-eqiad.php: Depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433550 (owner: 10Marostegui) [11:12:00] (03Abandoned) 10Marostegui: Revert "db-eqiad.php: Depool db1090:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433550 (owner: 10Marostegui) [11:13:09] (03PS1) 10Marostegui: db-eqiad.php: Repool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433551 [11:18:18] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433551 (owner: 10Marostegui) [11:19:42] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433551 (owner: 10Marostegui) [11:19:57] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1090:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433551 (owner: 10Marostegui) [11:21:18] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1090:3317 after alter table (duration: 01m 21s) [11:21:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:03] (03PS1) 10Mforns: Change deploy-mode to client for EventLogging sanitization cron job [puppet] - 10https://gerrit.wikimedia.org/r/433555 (https://phabricator.wikimedia.org/T193176) [11:54:27] (03CR) 10Elukey: [C: 032] Change deploy-mode to client for EventLogging sanitization cron job [puppet] - 10https://gerrit.wikimedia.org/r/433555 (https://phabricator.wikimedia.org/T193176) (owner: 10Mforns) [12:12:19] (03PS1) 10Rduran: [WIP] Create framework to transfer files over the LAN [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/433556 [12:12:21] (03PS1) 10Rduran: [WIP] Use Cumin to implement the comunication for the transfer [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/433557 (https://phabricator.wikimedia.org/T156462) [12:12:23] (03PS1) 10Rduran: [WIP] Refactor code in transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/433558 (https://phabricator.wikimedia.org/T156462) [12:41:26] (03PS2) 10Elukey: profile::hadoop::worker: drop Debian Jessie support [puppet] - 10https://gerrit.wikimedia.org/r/432564 (https://phabricator.wikimedia.org/T192557) [12:43:17] (03CR) 10Elukey: [C: 032] profile::hadoop::worker: drop Debian Jessie support [puppet] - 10https://gerrit.wikimedia.org/r/432564 (https://phabricator.wikimedia.org/T192557) (owner: 10Elukey) [12:44:56] (03PS1) 10Jcrespo: mariadb: Depool db1080 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433560 [12:49:27] (03PS2) 10Jcrespo: mariadb: Depool db1106 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433560 [12:50:30] !log Deploy schema change on s3 codfw primary master (db2043) this will generate lag on codfw - T191519 T188299 T190148 [12:50:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:36] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519 [12:50:36] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148 [12:50:37] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299 [12:50:51] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1106 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433560 (owner: 10Jcrespo) [12:51:53] (03CR) 10jenkins-bot: mariadb: Depool db1106 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433560 (owner: 10Jcrespo) [12:53:47] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1106 (duration: 01m 20s) [12:53:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:27] (03PS1) 10Jcrespo: mariadb: Reimage db1106 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/433561 [13:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy European Mid-day SWAT(Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180517T1300). [13:00:04] No GERRIT patches in the queue for this window AFAICS. [13:03:52] (03PS2) 10KartikMistry: WIP: apertium-apy: New upstream release [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/433318 (https://phabricator.wikimedia.org/T194342) [13:04:26] (03CR) 10jerkins-bot: [V: 04-1] WIP: apertium-apy: New upstream release [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/433318 (https://phabricator.wikimedia.org/T194342) (owner: 10KartikMistry) [13:07:53] (03PS2) 10Elukey: Swap zookeeper on conf1002 with conf1005 [puppet] - 10https://gerrit.wikimedia.org/r/433322 (https://phabricator.wikimedia.org/T182924) [13:13:24] (03CR) 10Jcrespo: [C: 032] mariadb: Reimage db1106 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/433561 (owner: 10Jcrespo) [13:14:42] (03CR) 10Elukey: "PCC looks good https://puppet-compiler.wmflabs.org/compiler02/11235/" [puppet] - 10https://gerrit.wikimedia.org/r/433322 (https://phabricator.wikimedia.org/T182924) (owner: 10Elukey) [13:15:35] !log stop and reimage db1106 [13:15:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:56] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1093 for reimage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433564 [13:17:06] (03CR) 10jerkins-bot: [V: 04-1] Revert "mariadb: Depool db1093 for reimage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433564 (owner: 10Jcrespo) [13:24:37] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1093 for reimage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433564 [13:35:15] (03PS1) 10Marostegui: db-eqiad.php: Depool db1105 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433567 [13:36:05] !log restarted db1105 by mistake, turning it back on [13:36:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:42] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1105 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433567 (owner: 10Marostegui) [13:37:18] marostegui: I have a BBU for db1067. Do you want to shut the server down now? [13:38:02] cmjohnson: Let's give it some time, I powered it off for 1 hour and it now looks fine, but I want to leave it like that for a few more hours [13:38:16] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1105 (duration: 01m 21s) [13:38:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:28] cmjohnson: Can we move db1066 instead today? T193847 [13:38:29] T193847: Move db1066 to row A - https://phabricator.wikimedia.org/T193847 [13:38:57] marostegui: sure, I’m here so whenever you’re ready [13:39:07] cmjohnson: Cool, I will need an IP :) [13:39:22] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1105 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433567 (owner: 10Marostegui) [13:39:52] (03PS1) 10Marostegui: db-eqiad.php: Depool db1066 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433568 (https://phabricator.wikimedia.org/T193847) [13:41:39] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1066 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433568 (https://phabricator.wikimedia.org/T193847) (owner: 10Marostegui) [13:43:17] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1066 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433568 (https://phabricator.wikimedia.org/T193847) (owner: 10Marostegui) [13:45:00] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1066 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433568 (https://phabricator.wikimedia.org/T193847) (owner: 10Marostegui) [13:45:30] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1066 for a rack change - T193847 (duration: 01m 21s) [13:45:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:35] T193847: Move db1066 to row A - https://phabricator.wikimedia.org/T193847 [13:46:31] !log Stop MySQL on db1066 for a rack change - T193847 [13:46:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:46:46] marostegui +db1066 1H IN A 10.64.0.110 [13:46:49] great! [13:46:54] I am stopping mysql now [13:48:11] (03PS1) 10Cmjohnson: Updating dns db1066 [dns] - 10https://gerrit.wikimedia.org/r/433571 (https://phabricator.wikimedia.org/T193847) [13:48:50] (03CR) 10Cmjohnson: [C: 032] Updating dns db1066 [dns] - 10https://gerrit.wikimedia.org/r/433571 (https://phabricator.wikimedia.org/T193847) (owner: 10Cmjohnson) [13:49:43] marostegui let me know when you shutdown [13:49:53] yeah, about to do it now, changing the ip on the servern ow [13:49:55] now [13:50:54] !log Power off db1066 for a rack change - T193847 [13:50:57] cmjohnson: server down [13:50:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:58] T193847: Move db1066 to row A - https://phabricator.wikimedia.org/T193847 [13:53:15] (03PS1) 10Marostegui: db-eqiad,db.codfw.php: Change db1066 IP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433572 (https://phabricator.wikimedia.org/T193847) [13:55:49] (03CR) 10Marostegui: [C: 032] db-eqiad,db.codfw.php: Change db1066 IP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433572 (https://phabricator.wikimedia.org/T193847) (owner: 10Marostegui) [13:57:18] (03Merged) 10jenkins-bot: db-eqiad,db.codfw.php: Change db1066 IP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433572 (https://phabricator.wikimedia.org/T193847) (owner: 10Marostegui) [13:58:57] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Change db1066 IP - T193847 (duration: 01m 17s) [13:59:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:59:01] T193847: Move db1066 to row A - https://phabricator.wikimedia.org/T193847 [13:59:28] (03CR) 10jenkins-bot: db-eqiad,db.codfw.php: Change db1066 IP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433572 (https://phabricator.wikimedia.org/T193847) (owner: 10Marostegui) [14:00:47] marostegui powering on now [14:01:47] nice! [14:01:48] thanks [14:03:17] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Change db1066 IP - T193847 (duration: 01m 21s) [14:03:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:18] cmjohnson: server back up, I will take it from here, thank you! [14:04:43] do you want to do the BBU today? [14:05:00] cmjohnson: No, let's give it some hours, so far it is working fine. Just keep in handy anyways :) [14:05:13] okay [14:06:17] PROBLEM - puppet last run on mw2173 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP] [14:06:37] RECOVERY - puppet last run on analytics1032 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:07:35] cmjohnson: you will update racktables, right? [14:08:06] already done [14:10:27] PROBLEM - puppet last run on dbproxy1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:10:36] RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:10:56] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:10:57] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:11:56] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 11 failures. Last run 3 minutes ago with 11 failures. Failed resources (up to 3 shown): File[/usr/local/bin/pooler-loop],File[/usr/local/bin/pool],File[/usr/local/bin/depool],File[/usr/local/bin/drain] [14:11:56] PROBLEM - puppet last run on mw1267 is CRITICAL: CRITICAL: Puppet has 33 failures. Last run 3 minutes ago with 33 failures. Failed resources (up to 3 shown): File[/etc/firejail/mediawiki-imagemagick.profile],File[/usr/local/bin/mediawiki-firejail-convert],File[/etc/firejail/mediawiki-converters.profile],File[/usr/local/bin/mediawiki-firejail-ghostscript] [14:12:07] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: Puppet has 25 failures. Last run 3 minutes ago with 25 failures. Failed resources (up to 3 shown): File[/etc/apache2/sites-available/07-wikimania.conf],File[/etc/apache2/sites-available/08-wikimedia.conf],File[/etc/apache2/sites-available/09-foundation.conf],File[/etc/logrotate.d/nginx] [14:12:07] PROBLEM - puppet last run on kubernetes1001 is CRITICAL: CRITICAL: Puppet has 7 failures. Last run 3 minutes ago with 7 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/reboot-host],File[/usr/local/sbin/wmf-auto-restart],File[/usr/local/sbin/smart-data-dump],File[/usr/local/sbin/enforce-users-groups] [14:12:41] mmmm [14:13:12] what's happening [14:13:16] PROBLEM - puppet last run on db1088 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 4 minutes ago with 27 failures. Failed resources (up to 3 shown): File[/home/bblack],File[/home/andrew],File[/home/faidon],File[/home/rush] [14:13:27] looks at kubernetes1001 [14:13:50] did a user get removed/added ? [14:13:56] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:13:56] PROBLEM - puppet last run on mc1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:14:00] Donhttps://www.youtube.com/watch?v=nR0lOtdvqyg [14:14:08] https://www.youtube.com/watch?v=nR0lOtdvqyg *# [14:14:24] elukey: on kubernetes1001 - no issue when i run puppet despite what icinga says [14:14:36] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:15:37] PROBLEM - puppet last run on wtp1039 is CRITICAL: CRITICAL: Puppet has 21 failures. Last run 7 minutes ago with 21 failures. Failed resources (up to 3 shown): File[/etc/smartmontools/run.d/20logger],File[/etc/ferm/conf.d/00_main],File[/usr/local/bin/pooler-loop],File[/usr/local/bin/pool] [14:15:51] mutante: yup, multiple puppetmaster restarts due to human error [14:15:53] no worries [14:16:06] I should have waited a bit more before reenabling icinga-wm [14:16:26] just saw, thank you akosiaris [14:16:48] (03PS1) 10Ema: vcl: strip away unnecessary response headers set by Thumbor [puppet] - 10https://gerrit.wikimedia.org/r/433573 (https://phabricator.wikimedia.org/T194814) [14:16:50] so whos in barcelona? [14:17:16] RECOVERY - puppet last run on kubernetes1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:17:25] !log Manually fail disk #2 on db1064 to get it replaced [14:17:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:16] RECOVERY - puppet last run on analytics1060 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [14:18:16] RECOVERY - puppet last run on analytics1061 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [14:18:16] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [14:19:16] RECOVERY - puppet last run on analytics1070 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:56] RECOVERY - puppet last run on db1089 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [14:19:57] RECOVERY - puppet last run on analytics1071 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:20:57] RECOVERY - puppet last run on elastic1022 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [14:21:06] RECOVERY - puppet last run on cp3046 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [14:21:06] RECOVERY - puppet last run on puppetboard1001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [14:21:06] RECOVERY - puppet last run on bast4002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:21:16] RECOVERY - puppet last run on install1002 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [14:21:16] RECOVERY - puppet last run on thumbor1004 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [14:21:26] RECOVERY - puppet last run on mw2239 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:21:36] RECOVERY - puppet last run on mw1266 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:21:46] RECOVERY - puppet last run on ms-be2035 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [14:21:56] RECOVERY - puppet last run on conf1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:21:56] RECOVERY - puppet last run on restbase-dev1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:21:56] RECOVERY - puppet last run on analytics1028 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:22:06] RECOVERY - puppet last run on db1080 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:22:26] RECOVERY - puppet last run on mw1341 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [14:22:27] RECOVERY - puppet last run on ores1003 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [14:22:46] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [14:22:46] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [14:22:46] RECOVERY - puppet last run on db1100 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [14:22:47] RECOVERY - puppet last run on labvirt1022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:22:47] RECOVERY - puppet last run on logstash1006 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [14:22:47] RECOVERY - puppet last run on wtp1027 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:22:55] (03PS1) 10Alexandros Kosiaris: puppet: Add /var/lib/puppet/server/ssl to backups [puppet] - 10https://gerrit.wikimedia.org/r/433574 [14:23:06] RECOVERY - puppet last run on analytics1063 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:23:06] RECOVERY - puppet last run on analytics1073 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [14:23:06] RECOVERY - puppet last run on db1063 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:23:07] RECOVERY - puppet last run on db1107 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:23:07] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [14:23:07] RECOVERY - puppet last run on db1077 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [14:23:08] (03PS1) 10Dzahn: add IPv6 records for webperf1002/webperf2002 [dns] - 10https://gerrit.wikimedia.org/r/433575 (https://phabricator.wikimedia.org/T194390) [14:23:17] RECOVERY - puppet last run on mw2204 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:23:17] RECOVERY - puppet last run on ms-be2026 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:23:25] (03PS2) 10Dzahn: add IPv6 records for webperf1002/webperf2002 [dns] - 10https://gerrit.wikimedia.org/r/433575 (https://phabricator.wikimedia.org/T194390) [14:23:36] RECOVERY - puppet last run on ms-be2031 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:23:37] RECOVERY - puppet last run on db1088 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [14:23:46] RECOVERY - puppet last run on kafka1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:23:46] RECOVERY - puppet last run on db1092 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [14:23:47] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:23:47] RECOVERY - puppet last run on labservices1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:23:56] RECOVERY - puppet last run on elastic1041 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:23:56] RECOVERY - puppet last run on labvirt1008 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [14:23:56] RECOVERY - puppet last run on chromium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:23:56] RECOVERY - puppet last run on mw1317 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:23:56] RECOVERY - puppet last run on bast1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:23:57] RECOVERY - puppet last run on cp2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:24:06] RECOVERY - puppet last run on francium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:06] RECOVERY - puppet last run on mw2170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:24:06] RECOVERY - puppet last run on labcontrol1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:07] RECOVERY - puppet last run on dumpsdata1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:24:07] RECOVERY - puppet last run on mc1027 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [14:24:07] RECOVERY - puppet last run on dbproxy1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:24:07] RECOVERY - puppet last run on labvirt1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:24:16] RECOVERY - puppet last run on mw2241 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [14:24:16] RECOVERY - puppet last run on mw2242 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:24:16] RECOVERY - puppet last run on mw1340 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [14:24:16] RECOVERY - puppet last run on ms-be1026 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [14:24:16] RECOVERY - puppet last run on mw2184 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:24:16] RECOVERY - puppet last run on mw2179 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [14:24:17] RECOVERY - puppet last run on elastic1051 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [14:24:21] (03CR) 10Dzahn: [C: 032] add IPv6 records for webperf1002/webperf2002 [dns] - 10https://gerrit.wikimedia.org/r/433575 (https://phabricator.wikimedia.org/T194390) (owner: 10Dzahn) [14:24:26] RECOVERY - puppet last run on mw2188 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:26] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:24:26] RECOVERY - puppet last run on db1068 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:24:26] RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:26] RECOVERY - puppet last run on dbstore1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:26] RECOVERY - puppet last run on db1087 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:26] RECOVERY - puppet last run on mw2257 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:24:27] RECOVERY - puppet last run on maerlant is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:24:27] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:28] RECOVERY - puppet last run on wtp1037 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:24:28] RECOVERY - puppet last run on scb1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:29] RECOVERY - puppet last run on kafka1012 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [14:24:36] RECOVERY - puppet last run on db1095 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:24:36] RECOVERY - puppet last run on mw1325 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:36] RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:37] RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [14:24:56] RECOVERY - puppet last run on thumbor1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:24:56] RECOVERY - puppet last run on mwlog1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:24:56] RECOVERY - puppet last run on mw1229 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [14:24:56] RECOVERY - puppet last run on cp2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:24:56] RECOVERY - puppet last run on mw2217 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [14:24:56] RECOVERY - puppet last run on ms-be1033 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:25:06] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:25:16] RECOVERY - puppet last run on mw2181 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [14:25:16] RECOVERY - puppet last run on mw1300 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [14:25:16] RECOVERY - puppet last run on ms-be1027 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [14:25:16] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [14:25:17] RECOVERY - puppet last run on labvirt1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:25:17] RECOVERY - puppet last run on cp3030 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:25:17] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:25:17] RECOVERY - puppet last run on dbproxy1003 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [14:25:26] RECOVERY - puppet last run on kubestagetcd1003 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [14:25:26] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [14:25:26] RECOVERY - puppet last run on ms-be1031 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:25:26] RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:25:27] RECOVERY - puppet last run on ms-be1040 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:25:27] RECOVERY - puppet last run on ms-be1023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:25:27] RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [14:25:36] RECOVERY - puppet last run on mw2222 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [14:25:36] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:25:36] RECOVERY - puppet last run on relforge1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:25:36] RECOVERY - puppet last run on ping1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:25:36] RECOVERY - puppet last run on mw2140 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:25:36] RECOVERY - puppet last run on ms-be2014 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [14:25:37] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:25:37] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:25:46] RECOVERY - puppet last run on mc1024 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:25:46] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:25:46] RECOVERY - puppet last run on labvirt1021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:25:46] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [14:25:46] RECOVERY - puppet last run on mw2196 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:25:47] RECOVERY - puppet last run on wdqs1008 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:25:47] RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:25:47] RECOVERY - puppet last run on mw2289 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:25:48] RECOVERY - puppet last run on mw2247 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:26:06] RECOVERY - puppet last run on dbproxy1009 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:26:06] RECOVERY - puppet last run on aqs1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:26:16] RECOVERY - puppet last run on labmon1002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [14:26:17] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:26:17] RECOVERY - puppet last run on db1052 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:26:17] RECOVERY - puppet last run on ms-be1030 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [14:26:26] RECOVERY - puppet last run on labvirt1014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:26:26] RECOVERY - puppet last run on labcontrol1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:26:26] RECOVERY - puppet last run on ganeti1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:26:26] RECOVERY - puppet last run on mw1288 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:26:26] RECOVERY - puppet last run on mw1280 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:26:26] RECOVERY - puppet last run on mw1318 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:26:26] RECOVERY - puppet last run on mw2213 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:26:27] RECOVERY - puppet last run on serpens is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [14:26:36] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [14:26:36] RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:26:46] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:26:46] RECOVERY - puppet last run on db1091 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:26:46] RECOVERY - puppet last run on logstash1009 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [14:26:56] RECOVERY - puppet last run on mw2173 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:26:56] RECOVERY - puppet last run on ms-be2028 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:26:56] RECOVERY - puppet last run on mw1307 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:26:56] RECOVERY - puppet last run on mw2165 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:26:57] RECOVERY - puppet last run on labnodepool1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:26:57] RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:27:06] RECOVERY - puppet last run on es1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:27:06] RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:27:16] RECOVERY - puppet last run on cp5010 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:27:17] RECOVERY - puppet last run on kafka1023 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:27:27] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:27:27] RECOVERY - puppet last run on mw1267 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:27:36] RECOVERY - puppet last run on db1104 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:27:36] RECOVERY - puppet last run on labnodepool1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:27:37] RECOVERY - puppet last run on ms-be1035 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:27:37] RECOVERY - puppet last run on mwdebug2002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:27:46] RECOVERY - puppet last run on mw1348 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:27:46] RECOVERY - puppet last run on mw2267 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:27:46] RECOVERY - puppet last run on mw2229 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:27:46] RECOVERY - puppet last run on mw1234 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:27:46] RECOVERY - puppet last run on mw1253 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:27:56] RECOVERY - puppet last run on mw2168 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:27:56] RECOVERY - puppet last run on wdqs1006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:28:06] RECOVERY - puppet last run on scb1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:28:06] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:28:06] RECOVERY - puppet last run on mw1308 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:28:26] RECOVERY - puppet last run on oresrdb1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:28:26] RECOVERY - puppet last run on mw1319 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:28:26] RECOVERY - puppet last run on mw1327 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:28:37] RECOVERY - puppet last run on mw2197 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:28:37] RECOVERY - puppet last run on seaborgium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:29:06] RECOVERY - puppet last run on labvirt1017 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:29:06] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:29:26] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:29:26] RECOVERY - puppet last run on mw2154 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:29:27] RECOVERY - puppet last run on ms-be2040 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:29:27] RECOVERY - puppet last run on mw2161 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:29:36] RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:29:46] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:29:46] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:29:46] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [14:29:47] RECOVERY - puppet last run on mw2166 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [14:30:06] RECOVERY - puppet last run on mw1247 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:30:16] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:32:04] (03PS2) 10Marostegui: wiki replicas: depool labsdb1010 [puppet] - 10https://gerrit.wikimedia.org/r/433206 (https://phabricator.wikimedia.org/T174047) (owner: 10Bstorm) [14:32:16] (03PS2) 10Dzahn: webperf: add IPv6 mapped address to role [puppet] - 10https://gerrit.wikimedia.org/r/433299 [14:32:44] (03CR) 10jerkins-bot: [V: 04-1] webperf: add IPv6 mapped address to role [puppet] - 10https://gerrit.wikimedia.org/r/433299 (owner: 10Dzahn) [14:33:35] (03CR) 10Marostegui: [C: 032] wiki replicas: depool labsdb1010 [puppet] - 10https://gerrit.wikimedia.org/r/433206 (https://phabricator.wikimedia.org/T174047) (owner: 10Bstorm) [14:34:56] RECOVERY - puppet last run on mc1020 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:35:18] !log Reload haproxy on dbproxy1010 to depool labsdb1010 https://phabricator.wikimedia.org/T174047 https://phabricator.wikimedia.org/T194341 [14:35:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:37:53] (03PS1) 10Jcrespo: mariadb: Allow reimage of db111* hosts [puppet] - 10https://gerrit.wikimedia.org/r/433576 [14:39:15] (03CR) 10Jcrespo: [C: 032] mariadb: Allow reimage of db111* hosts [puppet] - 10https://gerrit.wikimedia.org/r/433576 (owner: 10Jcrespo) [14:39:20] !log shutting down furud for shelves swap [14:39:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:26] PROBLEM - Host furud is DOWN: PING CRITICAL - Packet loss = 100% [14:41:46] PROBLEM - HP RAID on db2067 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:11, 1I:1:12 - Failed: 1I:1:10 - Controller: OK - Battery/Capacitor: OK [14:41:48] ACKNOWLEDGEMENT - HP RAID on db2067 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:11, 1I:1:12 - Failed: 1I:1:10 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T194886 [14:41:56] RECOVERY - puppet last run on wtp1039 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:42:16] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:43:06] PROBLEM - Device not healthy -SMART- on db1064 is CRITICAL: cluster=mysql device={megaraid,2,megaraid,6} instance=db1064:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1064&var-datasource=eqiad%2520prometheus%252Fops [14:43:49] ^ that is being handled on T194885 [14:43:49] T194885: Degraded RAID on db1064 - https://phabricator.wikimedia.org/T194885 [14:45:08] Krinkle: for "webperf profiling tools" as in "xhgui,xhprof, ?": you are going to need a webserver, right. want Apache? [14:49:26] RECOVERY - Host furud is UP: PING OK - Packet loss = 0%, RTA = 36.95 ms [14:56:35] (03Abandoned) 10Rduran: [WIP] Refactor code in transfer.py [puppet] - 10https://gerrit.wikimedia.org/r/432569 (https://phabricator.wikimedia.org/T156462) (owner: 10Rduran) [14:56:46] (03Abandoned) 10Rduran: [WIP] Use Cumin to implement the comunication for the transfer [puppet] - 10https://gerrit.wikimedia.org/r/430868 (https://phabricator.wikimedia.org/T156462) (owner: 10Rduran) [14:56:58] (03PS2) 10Alexandros Kosiaris: puppet: Add /var/lib/puppet/server/ssl to backups [puppet] - 10https://gerrit.wikimedia.org/r/433574 [14:57:57] (03PS1) 10Bstorm: Revert "wiki replicas: depool labsdb1010" [puppet] - 10https://gerrit.wikimedia.org/r/433583 [14:58:14] (03PS2) 10Marostegui: Revert "wiki replicas: depool labsdb1010" [puppet] - 10https://gerrit.wikimedia.org/r/433583 (owner: 10Bstorm) [14:58:50] (03CR) 10Marostegui: [C: 032] Revert "wiki replicas: depool labsdb1010" [puppet] - 10https://gerrit.wikimedia.org/r/433583 (owner: 10Bstorm) [15:00:12] (03CR) 10Alexandros Kosiaris: [C: 032] puppet: Add /var/lib/puppet/server/ssl to backups [puppet] - 10https://gerrit.wikimedia.org/r/433574 (owner: 10Alexandros Kosiaris) [15:00:18] !log Reload haproxy on dbproxy1010 to repool labsdb1010 [15:00:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:01:12] (03PS3) 10Alexandros Kosiaris: puppet: Add /var/lib/puppet/server/ssl to backups [puppet] - 10https://gerrit.wikimedia.org/r/433574 [15:05:52] RECOVERY - Device not healthy -SMART- on db2067 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2067&var-datasource=codfw%2520prometheus%252Fops [15:10:07] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team: rack/setup/install labstore1008 & labstore1009 - https://phabricator.wikimedia.org/T193655#4212031 (10chasemp) >>! In T193655#4211959, @Cmjohnson wrote: > @chasemp The only row I have available that I can put in adjacent racks is A5 and A6. wil... [15:11:32] (03PS2) 10Rduran: [WIP] Refactor code in transfer.py [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/433558 (https://phabricator.wikimedia.org/T156462) [15:15:06] 10Operations, 10ops-codfw: furud: disconnect furud-array[3-7]; connect furud-array[1-2] - https://phabricator.wikimedia.org/T194798#4212038 (10faidon) 05Open>03Resolved Confirmed, thanks @Papaul! [15:15:12] RECOVERY - Check systemd state on kafkamon2001 is OK: OK - running: The system is fully operational [15:16:21] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1106 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433589 [15:16:32] (03CR) 10jerkins-bot: [V: 04-1] Revert "mariadb: Depool db1106 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433589 (owner: 10Jcrespo) [15:19:42] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1093 for reimage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433564 (owner: 10Jcrespo) [15:19:51] (03PS4) 10Jcrespo: Revert "mariadb: Depool db1093 for reimage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433564 [15:27:23] (03PS1) 10Dzahn: webperf: basic role for profiling tools, add ferm rules [puppet] - 10https://gerrit.wikimedia.org/r/433590 (https://phabricator.wikimedia.org/T194390) [15:27:52] (03CR) 10jerkins-bot: [V: 04-1] webperf: basic role for profiling tools, add ferm rules [puppet] - 10https://gerrit.wikimedia.org/r/433590 (https://phabricator.wikimedia.org/T194390) (owner: 10Dzahn) [15:28:03] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1064 - https://phabricator.wikimedia.org/T194885#4212056 (10Marostegui) 05Open>03Resolved a:03Marostegui This is now fixed, I am going to fail the other disk and a new task will be created ``` Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id:... [15:28:30] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1093 with full weight (duration: 01m 21s) [15:28:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:09] (03PS1) 10Bstorm: wiki replicas: depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/433591 (https://phabricator.wikimedia.org/T174047) [15:29:15] !log Manually fail disk #6 on db1064 to get it replaced [15:29:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:27] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1093 for reimage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433564 (owner: 10Jcrespo) [15:35:45] (03PS2) 10Dzahn: webperf: add ferm rules and basic role/profile for xhgui [puppet] - 10https://gerrit.wikimedia.org/r/433590 (https://phabricator.wikimedia.org/T194390) [15:36:18] (03CR) 10jerkins-bot: [V: 04-1] webperf: add ferm rules and basic role/profile for xhgui [puppet] - 10https://gerrit.wikimedia.org/r/433590 (https://phabricator.wikimedia.org/T194390) (owner: 10Dzahn) [15:40:51] 10Operations, 10Cloud-Services, 10netops: Allocate public v4 IPs for Neutron setup in eqiad - https://phabricator.wikimedia.org/T193496#4212077 (10chasemp) [15:45:54] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1106 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433589 [15:47:46] (03PS3) 10Dzahn: webperf: add ferm rules and basic role/profile for xhgui [puppet] - 10https://gerrit.wikimedia.org/r/433590 (https://phabricator.wikimedia.org/T194390) [15:48:26] (03CR) 10jerkins-bot: [V: 04-1] webperf: add ferm rules and basic role/profile for xhgui [puppet] - 10https://gerrit.wikimedia.org/r/433590 (https://phabricator.wikimedia.org/T194390) (owner: 10Dzahn) [15:54:50] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1106 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433589 (owner: 10Jcrespo) [15:55:36] (03PS4) 10Dzahn: webperf: add ferm rules and basic role/profile for xhgui [puppet] - 10https://gerrit.wikimedia.org/r/433590 (https://phabricator.wikimedia.org/T194390) [15:56:06] 10Operations, 10ops-eqiad, 10Cloud-VPS: labnet1003 and labnet1004 moving and enabling 10G NICs - https://phabricator.wikimedia.org/T193196#4212103 (10Andrew) [15:56:12] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review: Update and move labnet1001/1002 - https://phabricator.wikimedia.org/T193579#4212101 (10Andrew) 05Open>03Resolved Looks good! Thanks @Cmjohnson [15:56:13] (03CR) 10jerkins-bot: [V: 04-1] webperf: add ferm rules and basic role/profile for xhgui [puppet] - 10https://gerrit.wikimedia.org/r/433590 (https://phabricator.wikimedia.org/T194390) (owner: 10Dzahn) [15:57:14] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1106 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433589 (owner: 10Jcrespo) [15:59:39] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1106 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433589 (owner: 10Jcrespo) [16:00:04] godog, moritzm, and _joe_: That opportune time is upon us again. Time for a Puppet SWAT(Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180517T1600). [16:00:04] No GERRIT patches in the queue for this window AFAICS. [16:00:21] (03PS5) 10Dzahn: webperf: add ferm rules and basic role/profile for xhgui [puppet] - 10https://gerrit.wikimedia.org/r/433590 (https://phabricator.wikimedia.org/T194390) [16:04:07] (03CR) 10Dzahn: [C: 032] webperf: add ferm rules and basic role/profile for xhgui [puppet] - 10https://gerrit.wikimedia.org/r/433590 (https://phabricator.wikimedia.org/T194390) (owner: 10Dzahn) [16:04:09] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1106 (duration: 01m 21s) [16:04:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:05:30] 10Operations, 10Cloud-Services, 10netops: Allocate public v4 IPs for Neutron setup in eqiad - https://phabricator.wikimedia.org/T193496#4212144 (10chasemp) >>! In T193496#4210037, @faidon wrote: > The /25 -> /24 renumbering seems fairly straightforward, but given a) IPv4's depletion (we effectively cannot ge... [16:09:17] (03PS2) 10Marostegui: wiki replicas: depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/433591 (https://phabricator.wikimedia.org/T174047) (owner: 10Bstorm) [16:09:46] RECOVERY - Device not healthy -SMART- on db1064 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1064&var-datasource=eqiad%2520prometheus%252Fops [16:09:58] \o/ [16:10:01] (03CR) 10Marostegui: [C: 032] wiki replicas: depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/433591 (https://phabricator.wikimedia.org/T174047) (owner: 10Bstorm) [16:11:28] !log Reload haproxy on dbproxy1010 to depool labsdb1011 T174047 T194341 [16:11:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:11:34] T174047: Provide backwards compatibility views for toolforge replica [MCR] - https://phabricator.wikimedia.org/T174047 [16:11:34] T194341: SELECT query on page table appears to also reference revision table - https://phabricator.wikimedia.org/T194341 [16:15:45] (03PS1) 10Dzahn: webperf::profiling: require libapache2-mod-php7.0 [puppet] - 10https://gerrit.wikimedia.org/r/433600 [16:16:06] (03PS2) 10Dzahn: webperf::profiling: require libapache2-mod-php7.0 [puppet] - 10https://gerrit.wikimedia.org/r/433600 [16:17:29] (03CR) 10Dzahn: [C: 032] webperf::profiling: require libapache2-mod-php7.0 [puppet] - 10https://gerrit.wikimedia.org/r/433600 (owner: 10Dzahn) [16:33:13] 10Operations, 10vm-requests, 10Patch-For-Review: EQIAD & CODFW: 1 VM in each data center for xhprof/xhgui/other profiling tools - https://phabricator.wikimedia.org/T194390#4212203 (10Dzahn) Hi @Imarlier, so.. see all the above: - VMs created with specs as requested (4vCPUS,8G RAM, 50G disk), one in each DC... [16:33:46] 10Operations, 10vm-requests, 10Patch-For-Review: EQIAD & CODFW: 1 VM in each data center for xhprof/xhgui/other profiling tools - https://phabricator.wikimedia.org/T194390#4212206 (10Dzahn) 05Open>03Resolved [16:34:55] 10Operations, 10Performance-Team, 10monitoring, 10Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837#3367896 (10Dzahn) [16:35:01] 10Operations, 10Performance-Team, 10monitoring, 10Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837#3367896 (10Dzahn) [16:36:10] 10Operations, 10Performance-Team, 10monitoring, 10Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837#3368030 (10Dzahn) You should now be able to move xhgui over to webperf1002/2002. New VMs have been created and are ready to use. See details i... [16:43:34] 10Operations: Upgrade naos to stretch (and rename to deploy2001) - https://phabricator.wikimedia.org/T190524#4212225 (10Dzahn) [16:43:37] 10Operations, 10Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4212223 (10Dzahn) [16:45:57] 10Operations, 10Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4212226 (10Dzahn) merged in duplicate task. checked naos in racktables. it's still under warranty until 2019-03 [16:47:52] (03PS2) 10Ottomata: Enable webrequest deletion [puppet] - 10https://gerrit.wikimedia.org/r/433425 [16:47:57] (03CR) 10Ottomata: [V: 032 C: 032] Enable webrequest deletion [puppet] - 10https://gerrit.wikimedia.org/r/433425 (owner: 10Ottomata) [16:52:19] 10Operations, 10Wikimedia-Mailing-lists: wikitech-l is mangling my PGP/MIME emails, causing signature validation to fail - https://phabricator.wikimedia.org/T186311#4212242 (10Aklapper) For the records, https://lists.wikimedia.org/pipermail/wikitech-l/2018-May/090003.html says "Valid signature, but cannot veri... [16:55:40] 10Operations, 10Release-Engineering-Team (Watching / External): rename naos to deploy2001 and reinstall with stretch - https://phabricator.wikimedia.org/T193916#4212246 (10RobH) Please ensure when the rename is done, a sub-task for the on-site (@papaul) is created in #ops-codfw for him to update the hostname p... [17:00:05] cscott, arlolra, subbu, halfak, and Amir1: #bothumor My software never has bugs. It just develops random features. Rise for Services – Graphoid / Parsoid / Citoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180517T1700). [17:06:18] !log arlolra@tin Started deploy [parsoid/deploy@091b891]: Updating Parsoid to fd49ab4 [17:06:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:06:55] (03CR) 10Krinkle: webperf::profiling: require libapache2-mod-php7.0 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/433600 (owner: 10Dzahn) [17:10:57] (03CR) 10Dzahn: [C: 032] webperf::profiling: require libapache2-mod-php7.0 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/433600 (owner: 10Dzahn) [17:12:21] (03CR) 10Dzahn: [C: 032] webperf::profiling: require libapache2-mod-php7.0 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/433600 (owner: 10Dzahn) [17:15:53] !log arlolra@tin Finished deploy [parsoid/deploy@091b891]: Updating Parsoid to fd49ab4 (duration: 09m 35s) [17:15:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:19:04] (03PS1) 10Herron: lists: rate limit subscriptions by email address [puppet] - 10https://gerrit.wikimedia.org/r/433607 (https://phabricator.wikimedia.org/T194032) [17:20:10] (03CR) 10Herron: [C: 032] lists: rate limit subscriptions by email address [puppet] - 10https://gerrit.wikimedia.org/r/433607 (https://phabricator.wikimedia.org/T194032) (owner: 10Herron) [17:21:56] (03PS1) 10Bstorm: Revert "wiki replicas: depool labsdb1011" [puppet] - 10https://gerrit.wikimedia.org/r/433609 [17:23:52] !log Updated Parsoid to fd49ab4 (T194821, T194687) [17:23:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:57] T194821: parsoid warning with WikibaseLexeme on test.wikidata & beta "Unknown contentmodel wikibase-lexeme" - https://phabricator.wikimedia.org/T194821 [17:23:57] T194687: TemplateData Request The parameter "doNotIgnoreMissingTitles" has been deprecated. - https://phabricator.wikimedia.org/T194687 [17:40:49] (03PS1) 10Dzahn: install_server: rename naos to deploy2001 [puppet] - 10https://gerrit.wikimedia.org/r/433615 (https://phabricator.wikimedia.org/T193916) [17:42:15] (03PS3) 10Umherirrender: Replace wfGetLBFactory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414310 [17:45:37] RECOVERY - HP RAID on db2067 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK [17:45:39] (03PS1) 10Dzahn: scap: swap naos with deploy2001 as scap master [puppet] - 10https://gerrit.wikimedia.org/r/433616 (https://phabricator.wikimedia.org/T193916) [17:52:38] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Google Search Console access for Search Platform team - https://phabricator.wikimedia.org/T188453#4008190 (10RobH) I recently had to add another user to google search console, so I'll share my findings. Please note directions on how to... [17:55:38] (03PS2) 10Herron: lists: disable list subscription via email [puppet] - 10https://gerrit.wikimedia.org/r/432998 (https://phabricator.wikimedia.org/T194032) [17:57:21] (03CR) 10Herron: [C: 032] lists: disable list subscription via email [puppet] - 10https://gerrit.wikimedia.org/r/432998 (https://phabricator.wikimedia.org/T194032) (owner: 10Herron) [18:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Morning SWAT (Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180517T1800). [18:00:04] No GERRIT patches in the queue for this window AFAICS. [18:02:36] (03PS1) 10Dzahn: add deploy2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/433618 (https://phabricator.wikimedia.org/T193916) [18:05:12] (03PS1) 10Dzahn: rename wmf6406's mgmt interface from naos to deploy2001 [dns] - 10https://gerrit.wikimedia.org/r/433619 (https://phabricator.wikimedia.org/T193916) [18:08:18] (03PS2) 10Dzahn: rename wmf6406's mgmt interface from naos to deploy2001 [dns] - 10https://gerrit.wikimedia.org/r/433619 (https://phabricator.wikimedia.org/T193916) [18:10:00] (03PS2) 10Dzahn: add deploy2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/433618 (https://phabricator.wikimedia.org/T193916) [18:11:32] (03CR) 10Dzahn: [C: 032] rename wmf6406's mgmt interface from naos to deploy2001 [dns] - 10https://gerrit.wikimedia.org/r/433619 (https://phabricator.wikimedia.org/T193916) (owner: 10Dzahn) [18:13:55] (03CR) 10Dzahn: [C: 032] "i linked this and the ticket on https://racktables.wikimedia.org/index.php?page=object&tab=default&object_id=2930 but haven't changed the " [dns] - 10https://gerrit.wikimedia.org/r/433619 (https://phabricator.wikimedia.org/T193916) (owner: 10Dzahn) [18:14:34] (03PS3) 10Dzahn: add deploy2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/433618 (https://phabricator.wikimedia.org/T193916) [18:16:18] (03CR) 10Dzahn: [C: 032] "adding a new name/IP to be used by a reinstalled naos (without touching existing naos)" [dns] - 10https://gerrit.wikimedia.org/r/433618 (https://phabricator.wikimedia.org/T193916) (owner: 10Dzahn) [18:20:13] (03PS2) 10Dzahn: install_server: rename naos to deploy2001 [puppet] - 10https://gerrit.wikimedia.org/r/433615 (https://phabricator.wikimedia.org/T193916) [18:22:03] (03CR) 10Dzahn: [C: 032] "only affected once it gets rebooted into PXE" [puppet] - 10https://gerrit.wikimedia.org/r/433615 (https://phabricator.wikimedia.org/T193916) (owner: 10Dzahn) [18:32:26] (03PS1) 10Andrew Bogott: Make labvirt1019, 1020 labvirts [puppet] - 10https://gerrit.wikimedia.org/r/433623 [18:56:06] (03PS1) 10Gehel: maps: add crons for different update types [puppet] - 10https://gerrit.wikimedia.org/r/433625 (https://phabricator.wikimedia.org/T194857) [18:56:08] (03PS1) 10Gehel: maps: disable OSM synchronisation while testing [puppet] - 10https://gerrit.wikimedia.org/r/433626 (https://phabricator.wikimedia.org/T194857) [18:56:40] (03CR) 10jerkins-bot: [V: 04-1] maps: add crons for different update types [puppet] - 10https://gerrit.wikimedia.org/r/433625 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [18:59:51] (03PS2) 10Gehel: maps: add crons for different update types [puppet] - 10https://gerrit.wikimedia.org/r/433625 (https://phabricator.wikimedia.org/T194857) [18:59:53] (03PS2) 10Gehel: maps: disable OSM synchronisation while testing [puppet] - 10https://gerrit.wikimedia.org/r/433626 (https://phabricator.wikimedia.org/T194857) [19:00:04] twentyafterfour: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180517T1900). [19:04:19] (03CR) 10Gehel: "puppet compiler looks happy" [puppet] - 10https://gerrit.wikimedia.org/r/433625 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [19:04:22] (03CR) 10Gehel: "https://puppet-compiler.wmflabs.org/compiler02/11236/" [puppet] - 10https://gerrit.wikimedia.org/r/433625 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [19:07:23] (03CR) 10Gehel: "puppet compiler looks happy https://puppet-compiler.wmflabs.org/compiler02/11237/" [puppet] - 10https://gerrit.wikimedia.org/r/433626 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [19:11:28] !log train is still blocked by T194848 [19:11:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:11:33] T194848: Fatal error: $this is null in Echo/includes/model/Event.php on line 345 - https://phabricator.wikimedia.org/T194848 [19:12:25] (03CR) 10Pnorman: [C: 031] maps: add crons for different update types [puppet] - 10https://gerrit.wikimedia.org/r/433625 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [19:12:57] (03CR) 10Pnorman: [C: 031] maps: disable OSM synchronisation while testing [puppet] - 10https://gerrit.wikimedia.org/r/433626 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [19:13:15] (03CR) 10Gehel: [C: 032] maps: add crons for different update types [puppet] - 10https://gerrit.wikimedia.org/r/433625 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [19:13:20] (03CR) 10Gehel: [C: 032] maps: disable OSM synchronisation while testing [puppet] - 10https://gerrit.wikimedia.org/r/433626 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [19:15:51] (03PS1) 10Gehel: maps: fix typo in cron hours [puppet] - 10https://gerrit.wikimedia.org/r/433628 (https://phabricator.wikimedia.org/T194857) [19:17:07] (03CR) 10Gehel: [C: 032] maps: fix typo in cron hours [puppet] - 10https://gerrit.wikimedia.org/r/433628 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [19:17:49] PROBLEM - puppet last run on maps-test2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:19:24] (03PS1) 10Gehel: maps: fix typo in cron hours [puppet] - 10https://gerrit.wikimedia.org/r/433629 (https://phabricator.wikimedia.org/T194857) [19:20:25] (03CR) 10Gehel: [C: 032] maps: fix typo in cron hours [puppet] - 10https://gerrit.wikimedia.org/r/433629 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [19:20:41] 10Operations, 10Wikimedia-Mailing-lists: Mass subscription attempt to a mailing list from same domain (aol and yahoo) - https://phabricator.wikimedia.org/T194597#4212387 (10herron) 05Open>03Resolved a:03herron Closing as duplicate of T194032 [19:23:00] RECOVERY - puppet last run on maps-test2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:24:29] (03PS1) 10Gehel: maps: disable crons for maps-test2004 [puppet] - 10https://gerrit.wikimedia.org/r/433631 (https://phabricator.wikimedia.org/T194857) [19:25:08] (03CR) 10Gehel: [C: 032] maps: disable crons for maps-test2004 [puppet] - 10https://gerrit.wikimedia.org/r/433631 (https://phabricator.wikimedia.org/T194857) (owner: 10Gehel) [19:28:58] !log twentyafterfour@tin Synchronized php-1.32.0-wmf.4/extensions/Echo/: unbreak T194848 (duration: 01m 24s) [19:29:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:29:03] T194848: Fatal error: $this is null in Echo/includes/model/Event.php on line 345 - https://phabricator.wikimedia.org/T194848 [19:33:37] !log getting the train back on track. Starting with group1 to 1.32.0-wmf.4 right now, will do all wikis to wmf.4 after verifying that group1 looks stable. [19:33:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:29] (03PS1) 1020after4: group1 wikis to 1.32.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433634 [19:35:31] (03CR) 1020after4: [C: 032] group1 wikis to 1.32.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433634 (owner: 1020after4) [19:36:46] (03Merged) 10jenkins-bot: group1 wikis to 1.32.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433634 (owner: 1020after4) [19:38:21] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.4 [19:38:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:25] (03CR) 10jenkins-bot: group1 wikis to 1.32.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433634 (owner: 1020after4) [19:39:42] !log twentyafterfour@tin Synchronized php: group1 wikis to 1.32.0-wmf.4 (duration: 01m 21s) [19:39:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:11] !log rolling back due to spike of undefined variable notices in resourceloader and ApiCSPReport.php [19:41:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:27] (03PS1) 1020after4: group1 wikis to 1.32.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433635 [19:41:29] (03CR) 1020after4: [C: 032] group1 wikis to 1.32.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433635 (owner: 1020after4) [19:42:58] (03Merged) 10jenkins-bot: group1 wikis to 1.32.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433635 (owner: 1020after4) [19:44:47] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.3 [19:44:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:44:59] (03CR) 10jenkins-bot: group1 wikis to 1.32.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433635 (owner: 1020after4) [19:46:08] !log twentyafterfour@tin Synchronized php: group1 wikis to 1.32.0-wmf.3 (duration: 01m 20s) [19:46:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:41] (03PS1) 10Dzahn: network::constants/tcpircbot: add deploy2001 to allowed hosts [puppet] - 10https://gerrit.wikimedia.org/r/433637 (https://phabricator.wikimedia.org/T193916) [20:01:40] (03PS2) 10Dzahn: network/tcpircbot/kubernetes: add deploy2001 to allowed hosts [puppet] - 10https://gerrit.wikimedia.org/r/433637 (https://phabricator.wikimedia.org/T193916) [20:03:00] (03PS1) 10Dzahn: add deploy2001 to site.pp with deployment_server role [puppet] - 10https://gerrit.wikimedia.org/r/433638 (https://phabricator.wikimedia.org/T193916) [20:23:26] PROBLEM - ensure kvm processes are running on labvirt1021 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 [20:37:47] (03PS1) 10Herron: lists: deny subscriptions from blocklisted IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/433671 (https://phabricator.wikimedia.org/T194032) [20:39:04] PROBLEM - DNS naos.mgmt on naos.mgmt is CRITICAL: Domain naos.mgmt.codfw.wmnet was not found by the server [20:50:50] (03CR) 10Herron: [C: 032] lists: deny subscriptions from blocklisted IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/433671 (https://phabricator.wikimedia.org/T194032) (owner: 10Herron) [20:59:20] twentyafterfour: Merging a revert [21:01:17] Reedy: cool [21:02:16] PROBLEM - ensure kvm processes are running on labvirt1022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 [21:06:36] Reedy: want me to deploy? [21:06:52] Not fussed either way, I'm logged on [21:06:59] but feel free ;) [21:11:20] twentyafterfour: It merged [21:11:30] Needs a full sync-dir of that version of mw core I guess :/ [21:11:40] Reedy: ok I'll sync and babysit it [21:11:45] Thanks :) [21:29:16] RECOVERY - ensure kvm processes are running on labvirt1021 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 [21:30:06] RECOVERY - ensure kvm processes are running on labvirt1022 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 [21:41:44] (03PS2) 10Bstorm: Make labvirt1019, 1020 labvirts [puppet] - 10https://gerrit.wikimedia.org/r/433623 (owner: 10Andrew Bogott) [21:43:02] (03CR) 10Bstorm: [C: 032] Make labvirt1019, 1020 labvirts [puppet] - 10https://gerrit.wikimedia.org/r/433623 (owner: 10Andrew Bogott) [21:51:13] ugh. 21:21:06 sync-dir failed: /srv/mediawiki-staging/php-1.32.0-wmf.4/extensions/Popups/.eslintrc.json is an invalid JSON file [21:51:14] (03PS1) 10Bstorm: Add hieradata for labvirt1019 and labvirt1020 [puppet] - 10https://gerrit.wikimedia.org/r/433677 [21:52:24] (03CR) 10Bstorm: [C: 032] Add hieradata for labvirt1019 and labvirt1020 [puppet] - 10https://gerrit.wikimedia.org/r/433677 (owner: 10Bstorm) [22:45:33] ACKNOWLEDGEMENT - HP RAID on labvirt1019 is CRITICAL: CRITICAL: Slot 0: no logical drives --- Slot 0: no drives --- Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:1:1, 2I:1:2, 2I:1:3, 2I:1:4, 2I:2:1, 2I:2:2 - Controller: OK - Battery count: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T194907 [22:45:38] 10Operations, 10ops-eqiad: Degraded RAID on labvirt1019 - https://phabricator.wikimedia.org/T194907#4212678 (10ops-monitoring-bot) [22:53:15] !log deploying https://gerrit.wikimedia.org/r/#/c/433673/ refs T194900 T191050 [22:53:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:53:21] T194900: Undefined variable: nonce in ResourceLoaderClientHtml.php - https://phabricator.wikimedia.org/T194900 [22:53:21] T191050: 1.32.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T191050 [23:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180517T2300). [23:00:04] No GERRIT patches in the queue for this window AFAICS. [23:15:21] PROBLEM - configured eth on labvirt1019 is CRITICAL: eth1 reporting no carrier. [23:16:01] PROBLEM - configured eth on labvirt1020 is CRITICAL: eth1 reporting no carrier. [23:22:01] !log twentyafterfour@tin Synchronized php-1.32.0-wmf.4/: sync https://gerrit.wikimedia.org/r/#/c/433673/ refs T194900 (duration: 09m 54s) [23:22:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:22:06] T194900: Undefined variable: nonce in ResourceLoaderClientHtml.php - https://phabricator.wikimedia.org/T194900 [23:23:45] (03PS1) 1020after4: group1 wikis to 1.32.0-wmf.4 refs T191050 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433681 [23:23:47] (03CR) 1020after4: [C: 032] group1 wikis to 1.32.0-wmf.4 refs T191050 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433681 (owner: 1020after4) [23:25:07] (03Merged) 10jenkins-bot: group1 wikis to 1.32.0-wmf.4 refs T191050 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433681 (owner: 1020after4) [23:26:51] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.4 refs T191050 [23:26:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:56] T191050: 1.32.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T191050 [23:28:09] !log twentyafterfour@tin Synchronized php: group1 wikis to 1.32.0-wmf.4 refs T191050 (duration: 01m 17s) [23:28:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:29:28] !log still seeing Notice: Undefined variable: nonce in /srv/mediawiki/php-1.32.0-wmf.4/includes/resourceloader/ResourceLoaderClientHtml.php on line 272 [23:29:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:29:33] !log rolling back [23:29:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:29:44] (03PS1) 1020after4: group1 wikis to 1.32.0-wmf.3 refs T191050 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433682 [23:29:46] (03CR) 1020after4: [C: 032] group1 wikis to 1.32.0-wmf.3 refs T191050 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433682 (owner: 1020after4) [23:29:48] (03CR) 10jenkins-bot: group1 wikis to 1.32.0-wmf.4 refs T191050 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433681 (owner: 1020after4) [23:31:29] (03Merged) 10jenkins-bot: group1 wikis to 1.32.0-wmf.3 refs T191050 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433682 (owner: 1020after4) [23:32:48] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.3 refs T191050 [23:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:32:52] T191050: 1.32.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T191050 [23:34:08] !log twentyafterfour@tin Synchronized php: group1 wikis to 1.32.0-wmf.3 refs T191050 (duration: 01m 20s) [23:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:34:22] 10Operations, 10DNS, 10Mail, 10Patch-For-Review: Outbound mail from Greenhouse is broken - https://phabricator.wikimedia.org/T189065#4212720 (10bbogaert) Hi @herron, I have a meeting with Lisa in recruiting for 1 pm Pacific on Monday, May 21. I'll be doing the green house changes with her then. Can we coo... [23:34:58] it looks like wmf.4 isn't getting out this week. [23:35:36] (03CR) 10jenkins-bot: group1 wikis to 1.32.0-wmf.3 refs T191050 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433682 (owner: 1020after4) [23:35:52] !log MediaWiki Train for 1.32.0-wmf.4 remains blocked by critical bugs, see T191050 for a list of blockers. [23:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log