[00:03:47] https://github.com/wikimedia/mediawiki-extensions-Parsoid/blob/master/Parsoid.hooks.php#L47 - wait, what :|
[00:10:11] gwicke, when you update a file page, we don't update the parsoid cache for pages which transclude that page?
[00:10:28] ...
[00:44:40] My shell on tin is frozen right now, and I have no idea why
[01:17:46] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[01:21:26] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[01:39:36] The site just 502ed on me. Are things unhappy?
[02:03:53] !log l10nupdate Synchronized php-1.25wmf20/cache/l10n: (no message) (duration: 00m 04s)
[02:04:02] Logged the message, Master
[02:05:00] !log LocalisationUpdate completed (1.25wmf20) at 2015-03-16 02:03:57+00:00
[02:05:04] Logged the message, Master
[02:05:27] !log l10nupdate Synchronized php-1.25wmf21/cache/l10n: (no message) (duration: 00m 04s)
[02:05:30] Logged the message, Master
[02:06:36] !log LocalisationUpdate completed (1.25wmf21) at 2015-03-16 02:05:32+00:00
[02:06:40] Logged the message, Master
[02:18:41] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Mar 16 02:17:38 UTC 2015 (duration 17m 37s)
[02:18:47] Logged the message, Master
[03:16:27] (PS1) Jforrester: Enable VisualEditor by default on "phase 5" Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/196984
[03:20:06] (CR) MZMcBride: "What does "phase 5" refer to? Is there an associated Phabricator task or wiki page or similar?" [mediawiki-config] - https://gerrit.wikimedia.org/r/196984 (owner: Jforrester)
[03:45:10] (CR) Alex Monk: "It refers to phases in https://www.mediawiki.org/wiki/VisualEditor/Rollouts - rather than the MediaWiki phases :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/196984 (owner: Jforrester)
[04:26:39] (PS2) Glaisher: Add 'autopatrol' protection level to lvwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/196779 (https://phabricator.wikimedia.org/T92645)
[04:28:22] (PS1) Revi: Create Draft (118) namespace on Korean Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798)
[04:28:46] (CR) jenkins-bot: [V: -1] Create Draft (118) namespace on Korean Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798) (owner: Revi)
[04:29:07] (PS1) Glaisher: Enable WikiLove extension at Ukrainian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196988 (https://phabricator.wikimedia.org/T91530)
[04:30:34] (CR) Glaisher: Create Draft (118) namespace on Korean Wikipedia (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798) (owner: Revi)
[04:33:44] (PS2) Revi: Create Draft (118) namespace on Korean Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798)
[04:43:26] (PS3) Revi: Create Draft (118) namespace on Korean Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798)
[05:27:50] operations: introduce our fleece blankets and bathrobes factory - https://phabricator.wikimedia.org/T92803#1120645 (emailbot)
[05:28:39] operations, Spam: introduce our fleece blankets and bathrobes factory - https://phabricator.wikimedia.org/T92803#1120651 (MZMcBride) Open>Invalid
[05:37:17] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0]
[05:37:17] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0]
[05:52:27] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:52:27] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:12:42] (CR) Jforrester: [C: -1] Create Draft (118) namespace on Korean Wikipedia (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798) (owner: Revi)
[06:18:05] (PS4) Revi: Create Draft (118) namespace on Korean Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798)
[06:24:18] James_F|Away: fine now?
[06:28:17] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:28:17] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:37] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:57] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:17] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:17] PROBLEM - puppet last run on mw2017 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:36] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:57] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:57] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:07] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[06:45:27] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:45:28] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:45:57] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:45:57] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:46:17] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:46:17] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:47:06] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:58] RECOVERY - puppet last run on mw2017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:00:24] operations: introduce our fleece blankets and bathrobes factory - https://phabricator.wikimedia.org/T92804#1120695 (emailbot)
[07:00:26] operations: introduce our fleece blankets and bathrobes factory - https://phabricator.wikimedia.org/T92805#1120699 (emailbot)
[07:38:45] (CR) Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: Giuseppe Lavagetto)
[08:02:33] PROBLEM - RAID on mw2024 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:02:33] PROBLEM - RAID on mw2021 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:02:33] PROBLEM - RAID on mw2022 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:02:33] PROBLEM - RAID on mw2025 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:02:33] PROBLEM - RAID on mw2019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:02:33] PROBLEM - RAID on mw2023 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:03] PROBLEM - configured eth on mw2022 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:03] PROBLEM - configured eth on mw2023 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:03] PROBLEM - configured eth on mw2024 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:03] PROBLEM - configured eth on mw2021 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:03] PROBLEM - configured eth on mw2019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:03] PROBLEM - configured eth on mw2025 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:12] PROBLEM - dhclient process on mw2022 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:12] PROBLEM - dhclient process on mw2021 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:12] PROBLEM - dhclient process on mw2023 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:12] PROBLEM - dhclient process on mw2024 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:12] PROBLEM - dhclient process on mw2025 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:12] PROBLEM - dhclient process on mw2019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:21] PROBLEM - nutcracker port on mw2022 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:21] PROBLEM - nutcracker port on mw2021 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:22] PROBLEM - nutcracker port on mw2019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:22] PROBLEM - nutcracker port on mw2023 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:22] PROBLEM - nutcracker port on mw2024 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:22] PROBLEM - nutcracker port on mw2025 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:31] PROBLEM - nutcracker process on mw2023 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:31] PROBLEM - nutcracker process on mw2022 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:31] PROBLEM - nutcracker process on mw2024 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:31] PROBLEM - nutcracker process on mw2019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:32] PROBLEM - nutcracker process on mw2025 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:32] PROBLEM - nutcracker process on mw2021 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:42] PROBLEM - puppet last run on mw2022 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:42] PROBLEM - puppet last run on mw2023 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:42] PROBLEM - puppet last run on mw2019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:42] PROBLEM - puppet last run on mw2021 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:42] PROBLEM - puppet last run on mw2025 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:03:43] PROBLEM - puppet last run on mw2024 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:02] PROBLEM - salt-minion processes on mw2023 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:02] PROBLEM - salt-minion processes on mw2019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:02] PROBLEM - salt-minion processes on mw2021 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:02] PROBLEM - salt-minion processes on mw2025 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:02] PROBLEM - salt-minion processes on mw2024 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:02] PROBLEM - salt-minion processes on mw2022 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:12] PROBLEM - DPKG on mw2022 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:13] PROBLEM - DPKG on mw2019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:13] PROBLEM - DPKG on mw2021 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:13] PROBLEM - DPKG on mw2023 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:13] PROBLEM - DPKG on mw2025 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:13] RECOVERY - configured eth on mw2024 is OK: NRPE: Unable to read output
[08:04:22] RECOVERY - dhclient process on mw2024 is OK: PROCS OK: 0 processes with command name dhclient
[08:04:22] PROBLEM - Disk space on mw2019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:22] PROBLEM - Disk space on mw2023 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:22] PROBLEM - Disk space on mw2021 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:22] PROBLEM - Disk space on mw2025 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:22] PROBLEM - Disk space on mw2022 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:04:32] RECOVERY - nutcracker port on mw2024 is OK: TCP OK - 0.000 second response time on port 11212
[08:04:41] RECOVERY - nutcracker process on mw2024 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[08:04:41] RECOVERY - nutcracker process on mw2019 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[08:04:41] RECOVERY - nutcracker process on mw2023 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[08:04:52] RECOVERY - RAID on mw2024 is OK: OK: no RAID installed
[08:04:52] RECOVERY - RAID on mw2025 is OK: OK: no RAID installed
[08:04:52] RECOVERY - RAID on mw2019 is OK: OK: no RAID installed
[08:04:52] RECOVERY - RAID on mw2023 is OK: OK: no RAID installed
[08:05:02] RECOVERY - salt-minion processes on mw2023 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:05:02] RECOVERY - salt-minion processes on mw2021 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:05:02] RECOVERY - salt-minion processes on mw2019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:05:02] RECOVERY - salt-minion processes on mw2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:05:02] RECOVERY - salt-minion processes on mw2025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:05:03] RECOVERY - salt-minion processes on mw2024 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:05:22] RECOVERY - DPKG on mw2023 is OK: All packages OK
[08:05:22] RECOVERY - DPKG on mw2019 is OK: All packages OK
[08:05:22] RECOVERY - DPKG on mw2025 is OK: All packages OK
[08:05:23] RECOVERY - DPKG on mw2022 is OK: All packages OK
[08:05:23] RECOVERY - DPKG on mw2021 is OK: All packages OK
[08:05:23] RECOVERY - configured eth on mw2022 is OK: NRPE: Unable to read output
[08:05:23] RECOVERY - configured eth on mw2023 is OK: NRPE: Unable to read output
[08:05:24] RECOVERY - configured eth on mw2019 is OK: NRPE: Unable to read output
[08:05:24] RECOVERY - configured eth on mw2021 is OK: NRPE: Unable to read output
[08:05:25] RECOVERY - configured eth on mw2025 is OK: NRPE: Unable to read output
[08:05:31] <_joe_> oh that is me installing them of course
[08:05:32] RECOVERY - dhclient process on mw2023 is OK: PROCS OK: 0 processes with command name dhclient
[08:05:32] RECOVERY - dhclient process on mw2022 is OK: PROCS OK: 0 processes with command name dhclient
[08:05:32] RECOVERY - dhclient process on mw2021 is OK: PROCS OK: 0 processes with command name dhclient
[08:05:32] RECOVERY - dhclient process on mw2025 is OK: PROCS OK: 0 processes with command name dhclient
[08:05:32] RECOVERY - dhclient process on mw2019 is OK: PROCS OK: 0 processes with command name dhclient
[08:05:32] RECOVERY - Disk space on mw2019 is OK: DISK OK
[08:05:33] RECOVERY - Disk space on mw2023 is OK: DISK OK
[08:05:33] RECOVERY - Disk space on mw2025 is OK: DISK OK
[08:05:34] RECOVERY - Disk space on mw2021 is OK: DISK OK
[08:05:34] RECOVERY - Disk space on mw2022 is OK: DISK OK
[08:05:36] <_joe_> sorry for the spam
[08:05:41] RECOVERY - nutcracker port on mw2021 is OK: TCP OK - 0.000 second response time on port 11212
[08:05:41] RECOVERY - nutcracker port on mw2023 is OK: TCP OK - 0.000 second response time on port 11212
[08:05:41] RECOVERY - nutcracker port on mw2022 is OK: TCP OK - 0.000 second response time on port 11212
[08:05:41] RECOVERY - nutcracker port on mw2025 is OK: TCP OK - 0.000 second response time on port 11212
[08:05:41] RECOVERY - nutcracker port on mw2019 is OK: TCP OK - 0.000 second response time on port 11212
[08:05:42] RECOVERY - nutcracker process on mw2022 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[08:05:42] RECOVERY - nutcracker process on mw2025 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[08:05:43] RECOVERY - nutcracker process on mw2021 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[08:06:02] RECOVERY - RAID on mw2022 is OK: OK: no RAID installed
[08:06:02] RECOVERY - RAID on mw2021 is OK: OK: no RAID installed
[08:19:01] PROBLEM - puppet last run on mw2021 is CRITICAL: CRITICAL: Puppet has 6 failures
[08:19:02] PROBLEM - puppet last run on mw2024 is CRITICAL: CRITICAL: Puppet has 6 failures
[08:19:02] PROBLEM - puppet last run on mw2022 is CRITICAL: CRITICAL: Puppet has 6 failures
[08:19:02] PROBLEM - puppet last run on mw2019 is CRITICAL: CRITICAL: Puppet has 6 failures
[08:19:02] PROBLEM - puppet last run on mw2025 is CRITICAL: CRITICAL: Puppet has 6 failures
[08:19:02] PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 6 failures
[08:20:12] RECOVERY - puppet last run on mw2019 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[08:21:32] RECOVERY - puppet last run on mw2021 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[08:21:32] RECOVERY - puppet last run on mw2024 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[08:21:32] RECOVERY - puppet last run on mw2022 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[08:21:32] RECOVERY - puppet last run on mw2025 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:21:32] RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[08:35:14] (CR) Giuseppe Lavagetto: [C: 1] enable authenticated access to Cassandra JMX (1 comment) [puppet/cassandra] - https://gerrit.wikimedia.org/r/196133 (https://phabricator.wikimedia.org/T92471) (owner: Eevans)
[08:39:32] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0]
[08:39:32] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0]
[08:42:45] PROBLEM - RAID on mw2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:42:45] PROBLEM - RAID on mw2026 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:42:45] PROBLEM - RAID on mw2028 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:42:45] PROBLEM - RAID on mw2035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:42:45] PROBLEM - RAID on mw2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:42:45] PROBLEM - RAID on mw2034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:42:45] PROBLEM - RAID on mw2033 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:42:46] PROBLEM - RAID on mw2027 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:42:46] PROBLEM - RAID on mw2032 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:05] PROBLEM - configured eth on mw2026 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:05] PROBLEM - configured eth on mw2034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:05] PROBLEM - configured eth on mw2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:05] PROBLEM - configured eth on mw2032 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:05] PROBLEM - configured eth on mw2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:06] PROBLEM - configured eth on mw2035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:06] PROBLEM - configured eth on mw2028 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:07] PROBLEM - configured eth on mw2027 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:07] PROBLEM - configured eth on mw2033 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:15] PROBLEM - dhclient process on mw2033 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:15] PROBLEM - dhclient process on mw2032 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:15] PROBLEM - dhclient process on mw2027 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:15] PROBLEM - dhclient process on mw2028 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:15] PROBLEM - dhclient process on mw2026 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:16] PROBLEM - dhclient process on mw2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:16] PROBLEM - dhclient process on mw2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:17] PROBLEM - dhclient process on mw2034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:17] PROBLEM - dhclient process on mw2035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:25] PROBLEM - nutcracker port on mw2027 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:25] PROBLEM - nutcracker port on mw2026 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:25] PROBLEM - nutcracker port on mw2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:25] PROBLEM - nutcracker port on mw2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:25] PROBLEM - nutcracker port on mw2028 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:26] PROBLEM - nutcracker port on mw2035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:26] PROBLEM - nutcracker port on mw2033 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:27] PROBLEM - nutcracker port on mw2034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:27] PROBLEM - nutcracker port on mw2032 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:34] PROBLEM - nutcracker process on mw2027 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:35] PROBLEM - nutcracker process on mw2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:35] PROBLEM - nutcracker process on mw2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:35] PROBLEM - nutcracker process on mw2032 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:35] PROBLEM - nutcracker process on mw2028 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:35] PROBLEM - nutcracker process on mw2026 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:35] PROBLEM - nutcracker process on mw2033 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:36] PROBLEM - nutcracker process on mw2034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:36] PROBLEM - nutcracker process on mw2035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:45] PROBLEM - puppet last run on mw2026 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:45] PROBLEM - puppet last run on mw2027 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:45] PROBLEM - puppet last run on mw2028 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:45] PROBLEM - puppet last run on mw2033 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:46] PROBLEM - puppet last run on mw2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:46] PROBLEM - puppet last run on mw2032 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:46] PROBLEM - puppet last run on mw2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:47] PROBLEM - puppet last run on mw2034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:43:47] PROBLEM - puppet last run on mw2035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:06] PROBLEM - salt-minion processes on mw2027 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:06] PROBLEM - salt-minion processes on mw2028 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:06] PROBLEM - salt-minion processes on mw2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:06] PROBLEM - salt-minion processes on mw2032 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:06] PROBLEM - salt-minion processes on mw2034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:06] PROBLEM - salt-minion processes on mw2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:07] PROBLEM - salt-minion processes on mw2033 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:07] PROBLEM - salt-minion processes on mw2026 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:08] PROBLEM - salt-minion processes on mw2035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:14] PROBLEM - DPKG on mw2027 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:14] PROBLEM - DPKG on mw2028 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:14] PROBLEM - DPKG on mw2026 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:14] PROBLEM - DPKG on mw2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:14] PROBLEM - DPKG on mw2035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:15] PROBLEM - DPKG on mw2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:15] PROBLEM - DPKG on mw2033 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:16] PROBLEM - DPKG on mw2034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:16] PROBLEM - DPKG on mw2032 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:25] PROBLEM - Disk space on mw2026 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:25] PROBLEM - Disk space on mw2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:25] PROBLEM - Disk space on mw2035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:25] PROBLEM - Disk space on mw2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:25] PROBLEM - Disk space on mw2027 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:26] PROBLEM - Disk space on mw2028 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:26] PROBLEM - Disk space on mw2034 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:27] PROBLEM - Disk space on mw2033 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:44:27] PROBLEM - Disk space on mw2032 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:45:25] RECOVERY - configured eth on mw2031 is OK: NRPE: Unable to read output [08:45:35] RECOVERY - dhclient process on mw2031 is OK: PROCS OK: 0 processes with command name dhclient [08:45:35] RECOVERY - Disk space on mw2031 is OK: DISK OK [08:45:45] RECOVERY - nutcracker port on mw2031 is OK: TCP OK - 0.000 second response time on port 11212 [08:45:45] RECOVERY - nutcracker port on mw2034 is OK: TCP OK - 0.000 second response time on port 11212 [08:45:55] RECOVERY - nutcracker process on mw2029 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:45:55] RECOVERY - nutcracker process on mw2031 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:45:55] RECOVERY - nutcracker process on mw2035 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:45:55] RECOVERY - nutcracker process on mw2034 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:46:15] RECOVERY - RAID on mw2028 is OK: OK: no RAID installed [08:46:15] RECOVERY - RAID on mw2035 is OK: OK: no RAID installed [08:46:15] RECOVERY - RAID on mw2029 is OK: OK: no RAID installed [08:46:15] RECOVERY - RAID on mw2034 is OK: OK: no RAID installed [08:46:15] RECOVERY - RAID on mw2027 is OK: OK: no RAID installed [08:46:15] RECOVERY - RAID on mw2031 is OK: OK: no RAID installed [08:46:15] RECOVERY - RAID on mw2032 is OK: OK: no RAID installed [08:46:16] RECOVERY - RAID on mw2033 is OK: OK: no RAID installed [08:46:16] RECOVERY - RAID on mw2026 is OK: OK: no RAID installed [08:46:26] RECOVERY - salt-minion processes on mw2028 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:46:26] RECOVERY - salt-minion processes on mw2027 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:46:27] RECOVERY - salt-minion processes on mw2033 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:46:27] RECOVERY - 
salt-minion processes on mw2029 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:46:27] RECOVERY - salt-minion processes on mw2031 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:46:27] RECOVERY - salt-minion processes on mw2032 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:46:27] RECOVERY - salt-minion processes on mw2035 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:46:28] RECOVERY - salt-minion processes on mw2034 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:46:28] RECOVERY - salt-minion processes on mw2026 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:46:34] RECOVERY - DPKG on mw2027 is OK: All packages OK [08:46:34] RECOVERY - DPKG on mw2026 is OK: All packages OK [08:46:34] RECOVERY - DPKG on mw2029 is OK: All packages OK [08:46:34] RECOVERY - DPKG on mw2033 is OK: All packages OK [08:46:34] RECOVERY - DPKG on mw2028 is OK: All packages OK [08:46:35] RECOVERY - DPKG on mw2035 is OK: All packages OK [08:46:35] RECOVERY - DPKG on mw2031 is OK: All packages OK [08:46:36] RECOVERY - DPKG on mw2032 is OK: All packages OK [08:46:36] RECOVERY - DPKG on mw2034 is OK: All packages OK [08:46:37] RECOVERY - configured eth on mw2026 is OK: NRPE: Unable to read output [08:46:37] RECOVERY - configured eth on mw2032 is OK: NRPE: Unable to read output [08:46:38] RECOVERY - configured eth on mw2028 is OK: NRPE: Unable to read output [08:46:38] RECOVERY - configured eth on mw2034 is OK: NRPE: Unable to read output [08:46:51] RECOVERY - Disk space on mw2033 is OK: DISK OK [08:46:55] RECOVERY - nutcracker port on mw2027 is OK: TCP OK - 0.000 second response time on port 11212 [08:46:55] RECOVERY - nutcracker port on mw2028 is OK: TCP OK - 0.000 second response time on port 11212 [08:46:55] RECOVERY - nutcracker port on mw2026 is OK: TCP OK - 0.000 
second response time on port 11212 [08:46:55] RECOVERY - nutcracker port on mw2033 is OK: TCP OK - 0.000 second response time on port 11212 [08:46:55] RECOVERY - nutcracker port on mw2029 is OK: TCP OK - 0.000 second response time on port 11212 [08:46:55] RECOVERY - nutcracker port on mw2035 is OK: TCP OK - 0.000 second response time on port 11212 [08:46:56] RECOVERY - nutcracker port on mw2032 is OK: TCP OK - 0.000 second response time on port 11212 [08:47:04] RECOVERY - nutcracker process on mw2032 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:47:05] RECOVERY - nutcracker process on mw2027 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:47:05] RECOVERY - nutcracker process on mw2028 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:47:05] RECOVERY - nutcracker process on mw2026 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:47:05] RECOVERY - nutcracker process on mw2033 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:57:55] PROBLEM - puppet last run on mw2029 is CRITICAL: CRITICAL: Puppet has 6 failures [08:57:55] PROBLEM - puppet last run on mw2035 is CRITICAL: CRITICAL: Puppet has 6 failures [08:57:55] PROBLEM - puppet last run on mw2031 is CRITICAL: CRITICAL: Puppet has 6 failures [08:57:55] PROBLEM - puppet last run on mw2034 is CRITICAL: CRITICAL: Puppet has 6 failures [08:59:14] PROBLEM - puppet last run on mw2026 is CRITICAL: CRITICAL: Puppet has 6 failures [08:59:14] PROBLEM - puppet last run on mw2027 is CRITICAL: CRITICAL: Puppet has 6 failures [08:59:14] PROBLEM - puppet last run on mw2032 is CRITICAL: CRITICAL: Puppet has 6 failures [08:59:14] PROBLEM - puppet last run on mw2028 is CRITICAL: CRITICAL: Puppet has 6 failures [08:59:14] PROBLEM - puppet last run on mw2033 is CRITICAL: CRITICAL: Puppet has 6 failures [09:00:25] RECOVERY - puppet last run on mw2027 is OK: OK: 
Puppet is currently enabled, last run 35 seconds ago with 0 failures [09:00:25] RECOVERY - puppet last run on mw2028 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [09:00:25] RECOVERY - puppet last run on mw2032 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [09:00:25] RECOVERY - puppet last run on mw2033 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [09:00:25] RECOVERY - puppet last run on mw2029 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [09:00:26] RECOVERY - puppet last run on mw2035 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [09:00:26] RECOVERY - puppet last run on mw2031 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [09:00:27] RECOVERY - puppet last run on mw2034 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [09:01:35] RECOVERY - puppet last run on mw2026 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [09:01:49] (CR) Giuseppe Lavagetto: [C: 1] add rbf2001/2002 hosts yaml files in hiera [puppet] - https://gerrit.wikimedia.org/r/196704 (https://phabricator.wikimedia.org/T86898) (owner: Dzahn) [09:01:57] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [09:01:57] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0] [09:03:21] (PS2) Giuseppe Lavagetto: nutcracker: add mc2001 and mc2004 to the config [puppet] - https://gerrit.wikimedia.org/r/196555 [09:04:02] (CR) Giuseppe Lavagetto: [C: 2] nutcracker: add mc2001 and mc2004 to the config [puppet] - https://gerrit.wikimedia.org/r/196555 (owner: Giuseppe Lavagetto) [09:05:01] (CR) Giuseppe Lavagetto: "beware: we're seeing problems with rbf1* that should be troubleshooted before we allow this in prod" [puppet] - 
https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498) (owner: Yuvipanda) [09:33:13] PROBLEM - puppet last run on mw2038 is CRITICAL: CRITICAL: Puppet has 6 failures [09:34:24] PROBLEM - puppet last run on mw2043 is CRITICAL: CRITICAL: Puppet has 6 failures [09:34:24] PROBLEM - puppet last run on mw2041 is CRITICAL: CRITICAL: Puppet has 6 failures [09:35:34] PROBLEM - puppet last run on mw2037 is CRITICAL: CRITICAL: Puppet has 6 failures [09:35:34] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 6 failures [09:35:34] PROBLEM - puppet last run on mw2036 is CRITICAL: CRITICAL: Puppet has 6 failures [09:35:34] PROBLEM - puppet last run on mw2044 is CRITICAL: CRITICAL: Puppet has 6 failures [09:35:34] PROBLEM - puppet last run on mw2040 is CRITICAL: CRITICAL: Puppet has 6 failures [09:35:34] PROBLEM - puppet last run on mw2042 is CRITICAL: CRITICAL: Puppet has 6 failures [09:35:35] RECOVERY - puppet last run on mw2038 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [09:35:35] PROBLEM - puppet last run on mw2039 is CRITICAL: CRITICAL: Puppet has 6 failures [09:38:03] RECOVERY - puppet last run on mw2037 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [09:38:03] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [09:38:03] RECOVERY - puppet last run on mw2036 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [09:38:03] RECOVERY - puppet last run on mw2044 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [09:38:03] RECOVERY - puppet last run on mw2042 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [09:38:04] RECOVERY - puppet last run on mw2043 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [09:38:04] RECOVERY - puppet last run on mw2041 is OK: OK: Puppet is currently 
enabled, last run 35 seconds ago with 0 failures [09:38:05] RECOVERY - puppet last run on mw2039 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [09:39:14] RECOVERY - puppet last run on mw2040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:02:09] !log restarting jenkins [10:02:13] Logged the message, Master [10:02:26] PROBLEM - puppet last run on mw2048 is CRITICAL: Connection refused by host [10:02:26] PROBLEM - puppet last run on mw2046 is CRITICAL: Connection refused by host [10:02:26] PROBLEM - puppet last run on mw2051 is CRITICAL: Connection refused by host [10:02:26] PROBLEM - puppet last run on mw2047 is CRITICAL: Connection refused by host [10:02:26] PROBLEM - puppet last run on mw2053 is CRITICAL: Connection refused by host [10:02:26] PROBLEM - puppet last run on mw2049 is CRITICAL: Connection refused by host [10:02:27] PROBLEM - puppet last run on mw2052 is CRITICAL: Connection refused by host [10:02:43] PROBLEM - salt-minion processes on mw2048 is CRITICAL: Connection refused by host [10:02:43] PROBLEM - salt-minion processes on mw2046 is CRITICAL: Connection refused by host [10:02:44] PROBLEM - salt-minion processes on mw2047 is CRITICAL: Connection refused by host [10:02:44] PROBLEM - salt-minion processes on mw2049 is CRITICAL: Connection refused by host [10:02:44] PROBLEM - salt-minion processes on mw2053 is CRITICAL: Connection refused by host [10:02:44] PROBLEM - salt-minion processes on mw2051 is CRITICAL: Connection refused by host [10:02:44] PROBLEM - salt-minion processes on mw2052 is CRITICAL: Connection refused by host [10:02:54] PROBLEM - DPKG on mw2047 is CRITICAL: Connection refused by host [10:02:54] PROBLEM - DPKG on mw2046 is CRITICAL: Connection refused by host [10:02:54] PROBLEM - DPKG on mw2053 is CRITICAL: Connection refused by host [10:02:54] PROBLEM - DPKG on mw2052 is CRITICAL: Connection refused by host [10:02:54] PROBLEM - DPKG on mw2049 is CRITICAL: Connection 
refused by host [10:02:54] PROBLEM - DPKG on mw2051 is CRITICAL: Connection refused by host [10:02:54] PROBLEM - DPKG on mw2048 is CRITICAL: Connection refused by host [10:03:13] PROBLEM - Disk space on mw2048 is CRITICAL: Connection refused by host [10:03:13] PROBLEM - Disk space on mw2046 is CRITICAL: Connection refused by host [10:03:13] PROBLEM - Disk space on mw2051 is CRITICAL: Connection refused by host [10:03:13] PROBLEM - Disk space on mw2047 is CRITICAL: Connection refused by host [10:03:13] PROBLEM - Disk space on mw2052 is CRITICAL: Connection refused by host [10:03:14] PROBLEM - Disk space on mw2053 is CRITICAL: Connection refused by host [10:03:14] PROBLEM - Disk space on mw2049 is CRITICAL: Connection refused by host [10:03:54] PROBLEM - RAID on mw2048 is CRITICAL: Connection refused by host [10:03:54] PROBLEM - RAID on mw2051 is CRITICAL: Connection refused by host [10:03:54] PROBLEM - RAID on mw2049 is CRITICAL: Connection refused by host [10:03:54] PROBLEM - RAID on mw2052 is CRITICAL: Connection refused by host [10:03:54] PROBLEM - RAID on mw2053 is CRITICAL: Connection refused by host [10:03:55] PROBLEM - RAID on mw2047 is CRITICAL: Connection refused by host [10:03:55] PROBLEM - RAID on mw2046 is CRITICAL: Connection refused by host [10:04:14] PROBLEM - configured eth on mw2048 is CRITICAL: Connection refused by host [10:04:15] PROBLEM - configured eth on mw2047 is CRITICAL: Connection refused by host [10:04:15] PROBLEM - configured eth on mw2049 is CRITICAL: Connection refused by host [10:04:15] PROBLEM - configured eth on mw2052 is CRITICAL: Connection refused by host [10:04:15] PROBLEM - configured eth on mw2053 is CRITICAL: Connection refused by host [10:04:15] PROBLEM - configured eth on mw2046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[10:04:15] PROBLEM - configured eth on mw2051 is CRITICAL: Connection refused by host [10:04:33] PROBLEM - dhclient process on mw2047 is CRITICAL: Connection refused by host [10:04:34] PROBLEM - dhclient process on mw2052 is CRITICAL: Connection refused by host [10:04:34] PROBLEM - dhclient process on mw2053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:34] PROBLEM - dhclient process on mw2051 is CRITICAL: Connection refused by host [10:04:34] PROBLEM - dhclient process on mw2049 is CRITICAL: Connection refused by host [10:04:34] PROBLEM - dhclient process on mw2046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:34] PROBLEM - dhclient process on mw2048 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:44] PROBLEM - nutcracker port on mw2047 is CRITICAL: Connection refused by host [10:04:44] PROBLEM - nutcracker port on mw2048 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:44] PROBLEM - nutcracker port on mw2051 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:44] PROBLEM - nutcracker port on mw2049 is CRITICAL: Connection refused by host [10:04:44] PROBLEM - nutcracker port on mw2053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:44] PROBLEM - nutcracker port on mw2052 is CRITICAL: Connection refused by host [10:04:44] PROBLEM - nutcracker port on mw2046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:54] PROBLEM - nutcracker process on mw2048 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:54] PROBLEM - nutcracker process on mw2046 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:54] PROBLEM - nutcracker process on mw2049 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:54] PROBLEM - nutcracker process on mw2047 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[10:04:54] PROBLEM - nutcracker process on mw2052 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:55] PROBLEM - nutcracker process on mw2053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:04:55] PROBLEM - nutcracker process on mw2051 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:11:54] RECOVERY - nutcracker port on mw2046 is OK: TCP OK - 0.000 second response time on port 11212 [10:12:14] RECOVERY - nutcracker process on mw2046 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [10:12:24] RECOVERY - RAID on mw2048 is OK: OK: no RAID installed [10:12:24] RECOVERY - RAID on mw2053 is OK: OK: no RAID installed [10:12:24] RECOVERY - RAID on mw2046 is OK: OK: no RAID installed [10:12:24] RECOVERY - salt-minion processes on mw2046 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:12:24] RECOVERY - salt-minion processes on mw2048 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:12:24] RECOVERY - salt-minion processes on mw2053 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:12:34] RECOVERY - DPKG on mw2046 is OK: All packages OK [10:12:34] RECOVERY - DPKG on mw2053 is OK: All packages OK [10:12:34] RECOVERY - DPKG on mw2051 is OK: All packages OK [10:12:34] RECOVERY - DPKG on mw2048 is OK: All packages OK [10:12:45] RECOVERY - configured eth on mw2052 is OK: NRPE: Unable to read output [10:12:45] RECOVERY - configured eth on mw2047 is OK: NRPE: Unable to read output [10:12:45] RECOVERY - configured eth on mw2046 is OK: NRPE: Unable to read output [10:12:45] RECOVERY - configured eth on mw2053 is OK: NRPE: Unable to read output [10:12:45] RECOVERY - configured eth on mw2048 is OK: NRPE: Unable to read output [10:12:45] RECOVERY - configured eth on mw2051 is OK: NRPE: Unable to read output [10:12:53] RECOVERY - Disk space on mw2051 is OK: DISK OK [10:12:54] RECOVERY 
- Disk space on mw2052 is OK: DISK OK [10:12:54] RECOVERY - Disk space on mw2048 is OK: DISK OK [10:12:54] RECOVERY - Disk space on mw2053 is OK: DISK OK [10:12:54] RECOVERY - Disk space on mw2046 is OK: DISK OK [10:12:54] RECOVERY - Disk space on mw2047 is OK: DISK OK [10:12:54] RECOVERY - Disk space on mw2049 is OK: DISK OK [10:13:03] RECOVERY - dhclient process on mw2051 is OK: PROCS OK: 0 processes with command name dhclient [10:13:04] RECOVERY - dhclient process on mw2052 is OK: PROCS OK: 0 processes with command name dhclient [10:13:04] RECOVERY - dhclient process on mw2053 is OK: PROCS OK: 0 processes with command name dhclient [10:13:04] RECOVERY - dhclient process on mw2049 is OK: PROCS OK: 0 processes with command name dhclient [10:13:04] RECOVERY - dhclient process on mw2047 is OK: PROCS OK: 0 processes with command name dhclient [10:13:04] RECOVERY - dhclient process on mw2048 is OK: PROCS OK: 0 processes with command name dhclient [10:13:04] RECOVERY - dhclient process on mw2046 is OK: PROCS OK: 0 processes with command name dhclient [10:13:13] RECOVERY - nutcracker port on mw2047 is OK: TCP OK - 0.000 second response time on port 11212 [10:13:13] RECOVERY - nutcracker port on mw2048 is OK: TCP OK - 0.000 second response time on port 11212 [10:13:13] RECOVERY - nutcracker port on mw2052 is OK: TCP OK - 0.000 second response time on port 11212 [10:13:13] RECOVERY - nutcracker port on mw2051 is OK: TCP OK - 0.000 second response time on port 11212 [10:13:14] RECOVERY - nutcracker port on mw2053 is OK: TCP OK - 0.000 second response time on port 11212 [10:13:14] RECOVERY - nutcracker port on mw2049 is OK: TCP OK - 0.000 second response time on port 11212 [10:13:23] RECOVERY - nutcracker process on mw2049 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [10:13:23] RECOVERY - nutcracker process on mw2052 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [10:13:24] RECOVERY - nutcracker process on 
mw2053 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [10:13:24] RECOVERY - nutcracker process on mw2051 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [10:13:24] RECOVERY - nutcracker process on mw2047 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [10:13:24] RECOVERY - nutcracker process on mw2048 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [10:13:34] RECOVERY - RAID on mw2051 is OK: OK: no RAID installed [10:13:34] RECOVERY - RAID on mw2052 is OK: OK: no RAID installed [10:13:34] RECOVERY - RAID on mw2049 is OK: OK: no RAID installed [10:13:34] RECOVERY - RAID on mw2047 is OK: OK: no RAID installed [10:13:43] RECOVERY - salt-minion processes on mw2051 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:13:43] RECOVERY - salt-minion processes on mw2049 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:13:43] RECOVERY - salt-minion processes on mw2052 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:13:43] RECOVERY - salt-minion processes on mw2047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:13:54] RECOVERY - DPKG on mw2047 is OK: All packages OK [10:13:54] RECOVERY - DPKG on mw2049 is OK: All packages OK [10:13:54] RECOVERY - DPKG on mw2052 is OK: All packages OK [10:14:03] RECOVERY - configured eth on mw2049 is OK: NRPE: Unable to read output [10:16:19] operations: Get python-gear 0.5.5 to trusty-wikimedia and jessie-wikimedia - https://phabricator.wikimedia.org/T92684#1120953 (hashar) [10:16:31] operations: Upload python-gear 0.5.5-1 to Debian project - https://phabricator.wikimedia.org/T89952#1049604 (hashar) [10:16:32] operations: Get python-gear 0.5.5 to trusty-wikimedia and jessie-wikimedia - https://phabricator.wikimedia.org/T92684#1118062 (hashar) [10:23:15] PROBLEM - 
puppet last run on mw2053 is CRITICAL: CRITICAL: Puppet has 6 failures [10:23:15] PROBLEM - puppet last run on mw2051 is CRITICAL: CRITICAL: Puppet has 6 failures [10:23:15] PROBLEM - puppet last run on mw2048 is CRITICAL: CRITICAL: Puppet has 6 failures [10:23:15] PROBLEM - puppet last run on mw2047 is CRITICAL: CRITICAL: Puppet has 6 failures [10:23:15] PROBLEM - puppet last run on mw2049 is CRITICAL: CRITICAL: Puppet has 6 failures [10:23:15] PROBLEM - puppet last run on mw2046 is CRITICAL: CRITICAL: Puppet has 6 failures [10:23:16] PROBLEM - puppet last run on mw2052 is CRITICAL: CRITICAL: Puppet has 6 failures [10:23:23] operations, network, wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1120962 (Joe) NEW [10:24:35] RECOVERY - puppet last run on mw2046 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:25:44] RECOVERY - puppet last run on mw2048 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:25:44] RECOVERY - puppet last run on mw2053 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:25:44] RECOVERY - puppet last run on mw2051 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [10:25:44] RECOVERY - puppet last run on mw2047 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [10:25:44] RECOVERY - puppet last run on mw2049 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:25:44] RECOVERY - puppet last run on mw2052 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:37:07] operations, ops-esams: Rack, cable, prepare cp3030-3049 - https://phabricator.wikimedia.org/T92514#1120992 (mark) >>! 
In T92514#1119761, @Multichill wrote: > I'm happy I only put asset tags on the servers and no labels yet ;-) > > Oh btw, these servers have a display. I'm pretty sure you can output the h... [10:38:04] operations, ops-esams: Rack, cable, prepare cp3030-3049 - https://phabricator.wikimedia.org/T92514#1120993 (mark) [11:16:58] operations, ops-esams: Remove knsq16-30 and prepare OE13 for new servers - https://phabricator.wikimedia.org/T92519#1121091 (mark) Open>Resolved We decided to use OE10 instead. Nonetheless, knsq16-30 have been removed from this rack. [11:18:23] operations, ops-esams: Remove all Toolserver equipment - https://phabricator.wikimedia.org/T92518#1121093 (mark) [11:18:58] operations, ops-esams: Remove all Toolserver equipment - https://phabricator.wikimedia.org/T92518#1113608 (mark) All Toolserver equipment has been removed from esams; I'll resolve this ticket once I have a confirmation from the recycling company. [12:07:31] operations, network, wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1121183 (faidon) The servers are not configured on the switches, i.e. there is no VLAN set. I'd configure the VLAN, but there are also... 
[12:08:32] operations, network, wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1121191 (Joe) Troubleshooted this quickly with Faidon, and we found out the switches are not configured for those servers: 12:47 lol [12:09:32] operations, ops-codfw, network, wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1121192 (Joe) [12:25:50] (PS10) Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) [12:32:18] (CR) Giuseppe Lavagetto: "This is a finished version of this config and I target merging it this week" [mediawiki-config] - https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: Giuseppe Lavagetto) [13:09:27] (CR) Ottomata: "Hm, I still don't understand. Why does the puppet compiler need to work with submodules? Aren't you just trying to add testing to this m" [puppet] - https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: Eevans) [13:11:46] hey [13:11:50] what's with all these 500s? [13:12:32] (CR) Ottomata: "@gwicke, I read your argument as argument as "This is hard for me, so let's do the thing that makes it easier for me, even if it makes it " [puppet] - https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: Eevans) [13:15:10] <_joe_> paravoid: are you looking into it? I was heading to lunch [13:22:01] (CR) Giuseppe Lavagetto: [C: -1] "Apart from my general opinion on our need for another ENC, this would cripple down performance - also a small security concern." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/196628 (owner: Yuvipanda) [13:32:45] (CR) Ottomata: "@ori, what is so painful? 
git submodule update is not hard. Is there something else, other than these testing issues, you are referring " [puppet] - https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: Eevans) [13:33:40] gwicke, tonythomas: Submodule update patches for SWAT? See https://gerrit.wikimedia.org/r/#/c/196733/ for an example. [13:37:51] (CR) Yuvipanda: "Agreed that this is terrible for performance - which is why I'm restricting this to self-hosted puppetmasters. Even assuming a project wit" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/196628 (owner: Yuvipanda) [13:37:52] anomie: it affected the logs. So we had discussions to backport [13:38:16] (PS1) Mark Bergsma: Revert "icinga: remove mark from SMS" [puppet] - https://gerrit.wikimedia.org/r/197037 [13:38:34] (PS2) Mark Bergsma: Revert "icinga: remove mark from SMS" [puppet] - https://gerrit.wikimedia.org/r/197037 [13:38:43] (CR) coren: [C: 2] "Generates correctly; harmless even if not currently used." [software] - https://gerrit.wikimedia.org/r/196271 (owner: coren) [13:39:04] (CR) Mark Bergsma: [C: 2] Revert "icinga: remove mark from SMS" [puppet] - https://gerrit.wikimedia.org/r/197037 (owner: Mark Bergsma) [13:39:06] tonythomas: I mean that you should do what it says at https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Updating_the_submodule and list that patch for SWAT, instead of expecting the SWAT team to do it for you. [13:40:05] anomie: oh. I never noticed that. In my phone. Well change once I get to my pc [13:40:12] ok [13:41:29] operations, Continuous-Integration, Labs, OOjs, and 2 others: Jenkins failing with "Error: GET https://saucelabs.com: Couldn't resolve host name." - https://phabricator.wikimedia.org/T92351#1121309 (yuvipanda) a:coren [13:42:01] (CR) Andrew Bogott: [C: 2] "Let's try!" 
[puppet] - https://gerrit.wikimedia.org/r/196961 (owner: Hoo man) [13:42:41] operations, Continuous-Integration, Labs, OOjs, and 2 others: Jenkins failing with "Error: GET https://saucelabs.com: Couldn't resolve host name." - https://phabricator.wikimedia.org/T92351#1121312 (coren) p:Unbreak!>Normal The issue has been worked around on the CI side, but the underlying i... [13:42:49] Puppet, operations, Patch-For-Review: Resource attributes are quoted inconsistently - https://phabricator.wikimedia.org/T91908#1121314 (yuvipanda) p:Triage>Low [13:43:33] operations, Continuous-Integration, Labs, OOjs, and 2 others: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1121316 (coren) [13:44:17] operations, Continuous-Integration, Labs, OOjs, and 2 others: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1121318 (yuvipanda) Interestingly, dig saucelabs.com on tools-trusty works fine. [13:44:35] operations, Continuous-Integration, Labs, OOjs, and 2 others: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1121319 (coren) @scfc The SOA records are there; though it's not immediately clear that they work properly either... [13:44:59] Blocked-on-Operations, operations, RESTBase, hardware-requests, RESTBase-architecture: RESTBase production hardware - 5 of 6 ready - https://phabricator.wikimedia.org/T76986#1121320 (yuvipanda) a:fgiunchedi [13:45:50] operations, Patch-For-Review: Make puppet the sole manager of user keys - https://phabricator.wikimedia.org/T92475#1121322 (yuvipanda) p:High>Normal @robh is there a reason you set this to high priority? [13:46:04] operations, Continuous-Integration, Labs, OOjs, and 2 others: dnsmasq returns SERVFAIL for (some?) 
names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1121324 (coren) @yuvipanda: That's not so much "interesting" as "expected". Dig ignores the search order so it n... [13:46:30] operations, MediaWiki-Core-Team, hardware-requests, Patch-For-Review: Fluorine needs bigger disks - https://phabricator.wikimedia.org/T92417#1121326 (yuvipanda) @andrew any updates? [13:47:40] operations, Labs, Patch-For-Review: Puppetize labstore1003 - https://phabricator.wikimedia.org/T91573#1121327 (yuvipanda) Open>Resolved a:yuvipanda Is done. [13:48:27] operations, Labs: Make labs salt use instance names than ids - https://phabricator.wikimedia.org/T1154#1121330 (yuvipanda) [13:50:43] operations, HTTPS, HTTPS-by-default: Expand HTTP frontend clusters with new hardware - https://phabricator.wikimedia.org/T86663#1121331 (yuvipanda) a:BBlack [13:54:05] (PS1) Werdna: T88164: Make Hovercards default for Chinese, Catalan and Greek WP. [mediawiki-config] - https://gerrit.wikimedia.org/r/197038 [14:12:08] (PS4) Ottomata: Allow kafka configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665 [14:13:09] (CR) jenkins-bot: [V: -1] Allow kafka configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665 (owner: Ottomata) [14:22:13] anomie: i'll be doing the updates instead of gwicke [14:22:23] anomie: but i'd need a bit of guidance [14:22:23] :P [14:22:24] (PS5) Ottomata: Allow kafka configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665 [14:22:39] mobrovac: Ok. Update the Deployments page? Guidance on what, making the submodule patch? 
[14:22:49] yep will do [14:22:55] euh, following https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Case_1b:_extension_changes [14:23:01] i'm on tin [14:23:15] and apparently no git review there :( [14:23:29] mobrovac: You shouldn't need to do anything on tin, just on your local machine [14:23:38] ah ok, cool [14:23:40] thnx [14:23:58] anomie: so, i get the stuff in the branches of the extension, and then update the core submodule, right? [14:24:04] If you did it on tin you'd be halfway to deploying it already... [14:24:06] all from my local machine [14:24:14] mobrovac: Yes [14:24:36] but i am on tin, chrery-picked and everything, just can't git review [14:24:47] nor git push for that matter [14:24:54] (CR) Jforrester: [C: 1] Create Draft (118) namespace on Korean Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798) (owner: Revi) [14:24:54] ah ok [14:24:57] nevermind [14:24:58] ... [14:25:29] operations, MediaWiki-Core-Team, Wikimedia-log-errors: rbf1001 and rbf1002 are timing out / dropping clients for Redis - https://phabricator.wikimedia.org/T92591#1121388 (Joe) @chasemp what exactly doesn't work in the hiera change? I see the config on rbf1002 server seem to be correct at the moment. [14:30:06] (PS5) Jforrester: Create Draft (118) namespace on Korean Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798) (owner: Revi) [14:30:58] (PS21) JanZerebecki: Wikidata builder [puppet] - https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567) [14:31:16] (CR) Jforrester: [C: 1] "PS5 tweaks the formatting of the bug references for consistency, and adjusts the commit message. Good to go." 
[mediawiki-config] - https://gerrit.wikimedia.org/r/196987 (https://phabricator.wikimedia.org/T92798) (owner: Revi) [14:31:20] (PS6) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665 [14:33:27] (PS7) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665 [14:33:55] (CR) Jforrester: "Krenair is right; there's also T51999 but that's pretty general." [mediawiki-config] - https://gerrit.wikimedia.org/r/196984 (owner: Jforrester) [14:34:07] (CR) Jforrester: [C: -1] "Awaiting community announcement and OK." [mediawiki-config] - https://gerrit.wikimedia.org/r/196984 (owner: Jforrester) [14:35:24] operations, ops-codfw, Phabricator: failed to create new task via email - https://phabricator.wikimedia.org/T92832#1121401 (fgiunchedi) NEW [14:36:12] James_F: ty :) [14:36:50] though it's on tomorrow's SWAT [14:36:51] operations, ops-codfw: ms-be2009.codfw.wmnet: slot=10 dev=sdk failed - https://phabricator.wikimedia.org/T92833#1121413 (fgiunchedi) NEW [14:38:22] anomie: hm, apparently i can't +2 the cherry-pick for some reason (in the ext repo) [14:38:35] so should i leave it like that and proceed with submodule update? [14:38:35] mobrovac: Link? [14:38:40] ah [14:38:42] Revi: In 24 hours' time? OK. [14:39:14] anomie: https://gerrit.wikimedia.org/r/#/c/197041/ and https://gerrit.wikimedia.org/r/#/c/197042/ [14:39:23] same thing for the 2 active branches [14:40:00] James_F: yeah, tomorrow morning SWAT [14:40:06] (tbh this is my first swat :pl [14:40:08] ) [14:40:11] Revi: Good. :-) [14:40:18] (PS8) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665 [14:41:05] ACKNOWLEDGEMENT - RAID on ms-be2009 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) Filippo Giunchedi T92833 [14:42:30] mobrovac: +2ed those two. 
[14:42:34] operations, ops-codfw: ms-be2007.codfw.wmnet: slot=8 dev=sdi failed - https://phabricator.wikimedia.org/T92834#1121427 (fgiunchedi) NEW
[14:43:10] operations, ops-codfw: ms-be2007.codfw.wmnet: slot=3 dev=sdd failed - https://phabricator.wikimedia.org/T92835#1121435 (fgiunchedi) NEW
[14:43:45] anomie: thnx, still not merged
[14:43:55] oh, it seems gwicke has woken up
[14:44:00] v+2 now
[14:44:22] ACKNOWLEDGEMENT - RAID on ms-be2007 is CRITICAL: CRITICAL: 2 failed LD(s) (Offline, Offline) Filippo Giunchedi T92834 T92835
[14:44:38] Oh, do you not have Jenkins set up on that repo for the usual merging?
[14:45:16] no, for some reason
[14:45:21] (no idea why is that)
[14:45:42] !log reboot ms-be2009, xfs hosed
[14:45:48] Logged the message, Master
[14:47:40] * anomie will do SWAT today, since it's huge and he already looked at everything
[14:47:53] (CR) Dzahn: [C: 2] add rbf2001/2002 hosts yaml files in hiera [puppet] - https://gerrit.wikimedia.org/r/196704 (https://phabricator.wikimedia.org/T86898) (owner: Dzahn)
[14:49:32] James_F, FlorianSW, superm401, tonythomas, mobrovac: Ping for SWAT in about 11 minutes. It's a big one today, I plan to go in the order I named you in this ping.
[14:49:40] * James_F nods.
[14:49:42] tonythomas: Do you have your submodule updates ready?
[14:50:05] anomie: pong :D
[14:50:21] cool, i'm last, just in time for all of the extensions to check-out in this new branch
[14:50:26] operations, Continuous-Integration, Labs, OOjs, and 2 others: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1121449 (scfc) So what does `dig notexist.eqiad.wmflabs` return on the server where `dnsmasq` is running? If I q...
[14:52:06] (CR) Dzahn: [C: 2] "yes, per https://gerrit.wikimedia.org/r/#/c/58922/ . thanks Tim Landscheidt" [puppet] - https://gerrit.wikimedia.org/r/196787 (owner: Dzahn)
[14:53:46] operations: Cannot use dsh-based restart of parsoid from tin anymore - https://phabricator.wikimedia.org/T87803#1121456 (yuvipanda) Open>declined a:yuvipanda (Closing, since @ssastry seems to be using bastion for dsh atm, and we'll have a salt upgrade soon)
[14:54:51] operations, ops-codfw, Phabricator: failed to create new task via email - https://phabricator.wikimedia.org/T92832#1121459 (Qgil)
[14:54:57] (PS2) Yuvipanda: Grant mobrovac access to citoid hosts [puppet] - https://gerrit.wikimedia.org/r/195932 (https://phabricator.wikimedia.org/T92389) (owner: Alexandros Kosiaris)
[14:55:16] yey :)
[14:55:52] mobrovac: :)
[14:56:13] (CR) Yuvipanda: [C: 2] "Monday hath come." [puppet] - https://gerrit.wikimedia.org/r/195932 (https://phabricator.wikimedia.org/T92389) (owner: Alexandros Kosiaris)
[14:56:54] anomie, ready
[14:57:10] YuviPanda (or anyone in ops) can i ask you a question about tin?
[14:57:35] nuria: hey! sure.
[14:57:53] YuviPanda: I am trying to deploy EL and when trying to update repo i get:
[14:57:55] anomie: if it would help you: if you merge my backports, i can update the submodule :)
[14:57:57] https://www.irccloud.com/pastebin/yQS3rN69
[14:58:17] YuviPanda: meaning that i cannot do "git pull"
[14:58:22] nuria: ah, interesting. looks like a permission muckup, but also I’m not really sure how git deploy works….
[14:58:22] FlorianSW: Which submodule? Your patches are to core
[14:58:28] nuria: can you file a bug? I’ll look into it in about 15 mins
[14:58:35] YuviPanda: sure
[14:58:41] mobrovac: you should try sshing and sudoing now
[14:58:56] anomie: argh, damn, you're right, sorry for confusion, i forget, that the fix is in core, not in inputbox :)
[14:59:24] operations: Upload python-gear 0.5.5-1 to Debian project - https://phabricator.wikimedia.org/T89952#1121467 (hashar)
[14:59:41] Thanks for the offer though, and I'd have been after you earlier about it had they been to InputBox ;)
[15:00:05] manybubbles, anomie, ^d, thcipriani, tonythomas: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150316T1500).
[15:00:06] * anomie begins SWAT with James_F's config change
[15:00:14] (PS2) Anomie: Beta Features: Remove VisualEditor language tool (deployed everywhere) [mediawiki-config] - https://gerrit.wikimedia.org/r/193762 (owner: Jforrester)
[15:00:18] anomie: That should be a no-op.
[15:00:20] (CR) Anomie: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/193762 (owner: Jforrester)
[15:01:06] mobrovac: hmm, and according to https://gerrit.wikimedia.org/r/#/admin/groups/630,members you should have +2 and submit rights...
[15:01:28] YuviPanda: ssh && sudo working flawlessly :)
[15:01:41] mobrovac: sweet.
[15:01:50] YuviPanda: yes, that has been corrected it seems :)
[15:01:54] YuviPanda: chhers!
[15:02:15] (PS9) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665
[15:02:23] Oh, sigh. Jenkins is being slow.
[15:02:25] mobrovac: yw
[15:02:25] Ops-Access-Requests, operations, Citoid, Services, Patch-For-Review: Give mobrovac production access for citoid - https://phabricator.wikimedia.org/T92389#1121473 (yuvipanda) Open>Resolved a:yuvipanda All sorted now :) Someone has added @mobrovac to the mediawiaki-services gerrit group...
[15:02:35] anomie: preparing the updates now ( just got infront of my pc )
[15:03:46] operations: unable to subscribe to operations tag after migration and merge from ops-core and ops-request - https://phabricator.wikimedia.org/T89053#1121477 (yuvipanda)
[15:04:24] (Merged) jenkins-bot: Beta Features: Remove VisualEditor language tool (deployed everywhere) [mediawiki-config] - https://gerrit.wikimedia.org/r/193762 (owner: Jforrester)
[15:04:51] PROBLEM - citoid on sca1001 is CRITICAL: Connection refused
[15:04:59] uhm
[15:05:00] mobrovac: ^
[15:05:06] mobrovac: did you do anything? :)
[15:05:35] ah
[15:05:36] yeah
[15:05:38] restart
[15:05:41] should be up though
[15:05:42] !log anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Beta Features: Remove VisualEditor language tool (deployed everywhere) [[gerrit:193762]] (duration: 01m 04s)
[15:05:44] James_F: ^ Is any test needed, or really a no-op?
[15:05:46] Logged the message, Master
[15:06:05] mobrovac: nope,
[15:06:09] anomie: LGTM on Beta Cluster. Consider it done.
[15:06:09] root@sca1001:/home/yuvipanda# service citoid status
[15:06:10] YuviPanda: should have tried a reload instead, sorry
[15:06:12] damn
[15:06:12] citoid stop/waiting
[15:06:18] _joe_, YuviPanda: scap timed out trying to sync to silver. "on silver returned [255]: ssh: connect to host silver port 22: Connection timed out"
[15:06:39] mobrovac: also, use !log in this channel whenever you do any significant action like this, though
[15:06:51] oki
[15:06:51] anomie: ah, probably the base::firewall on silver change that andrewbogott_afk and hoo merged earlier...
[15:07:13] !log citoid down on sca1001, not coming back after restart. mobrovac investigating
[15:07:16] Logged the message, Master
[15:07:20] uh
[15:07:25] :S
[15:07:26] Thanks YuviPanda.
[15:07:28] base::firewall should leave ssh open though
[15:07:35] mutante: only from bastion hosts
[15:07:37] and tin isn't
[15:07:37] mutante: It leaves it open for bastions only
[15:07:45] that makes sense, indeed
[15:07:47] How does that work for the other app servers?
[15:07:48] ok, YuviPanda i know what's going on, need to do a small correction to the puppet config to fix it
[15:07:58] do they have that rule applied via some class that's not on silver?
[15:08:04] Probably this is flawed somewhere
[15:08:06] mobrovac: alright, standing by to help :)
[15:08:14] silver should probably also have that class... mh
[15:09:35] hoo: the other appservers don't have base:firewall yet
[15:09:45] I see
[15:09:48] !log anomie Synchronized php-1.25wmf21/extensions/WikiEditor/: SWAT: WikiEditor: fix Edit schema validation issues [[gerrit:196715]] [[gerrit:196716]] [[gerrit:196727]] (duration: 01m 04s)
[15:09:50] James_F: ^ Test please
[15:09:51] Logged the message, Master
[15:10:07] silver should have a new rule that allows from tin specifically
[15:10:25] anomie: Looks good from my end. Thanks!
[15:10:31] I think role::mediawiki should have that rule?
[15:10:39] YuviPanda: I guess
[15:10:51] FlorianSW: You're up
[15:10:59] (PS1) Mobrovac: Quote the proxy setting [puppet] - https://gerrit.wikimedia.org/r/197052
[15:11:08] YuviPanda: ^^
[15:11:25] anomie: ready for testing :)
[15:11:35] (CR) Yuvipanda: [C: 2 V: 2] Quote the proxy setting [puppet] - https://gerrit.wikimedia.org/r/197052 (owner: Mobrovac)
[15:11:52] YuviPanda: Do we need to expect tin or all scap proxies?
[15:11:55] YuviPanda: git-deploy issue looks solved, was permits
[15:11:57] * except
[15:12:00] YuviPanda: thanks!
[15:12:10] nuria: ah, cool :)
[15:12:23] mobrovac: done, and I’m running puppet on sca1001
[15:12:40] cool thnx
[15:13:31] mobrovac, tonythomas: Don't forget to update the Deployments page with your submodule updates
[15:13:33] mobrovac: still dead ;)
[15:13:46] (PS10) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665
[15:14:31] > TypeError: Cannot read property 'citoidPort' of undefined
[15:14:32] mobrovac: ^
[15:14:35] YuviPanda: ok, another change coming up, the example settings file was not correctly updated it seems
[15:14:42] alright
[15:15:17] anomie: yeah. I hope I would be able to complete all the submodule updates within the deployment window
[15:15:35] (PS11) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665
[15:16:50] (PS12) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665
[15:17:10] * anomie waits impatiently for Jenkins...
[15:17:22] (PS1) Dzahn: mediawiki: allow ssh from tin for deployment [puppet] - https://gerrit.wikimedia.org/r/197053
[15:17:26] YuviPanda: yes, agreed, like it already has a rule for http. ^
[15:18:12] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121529 (ArielGlenn) NEW a:ArielGlenn
[15:19:15] (CR) Dzahn: "needs something like this to let tin connect via ssh for deployment https://gerrit.wikimedia.org/r/#/c/197053/1" [puppet] - https://gerrit.wikimedia.org/r/196961 (owner: Hoo man)
[15:19:45] (CR) Yuvipanda: [C: -1] mediawiki: allow ssh from tin for deployment (1 comment) [puppet] - https://gerrit.wikimedia.org/r/197053 (owner: Dzahn)
[15:19:46] operations, MediaWiki-Core-Team, Wikimedia-log-errors: rbf1001 and rbf1002 are timing out / dropping clients for Redis - https://phabricator.wikimedia.org/T92591#1121537 (chasemp) >>! In T92591#1121388, @Joe wrote: > @chasemp what exactly doesn't work in the hiera change? I see the config on rbf1002 ser...
[15:19:54] mutante: ^ nit
[15:20:25] mutante: Thanks
[15:21:33] i knew you would say this, that's why i added the comment we already do the same thing in releases
[15:21:42] broken windoes, etc :P
[15:21:47] *windows
[15:21:54] go ahead and add it?
[15:21:59] i'm not sure where is right
[15:22:11] network.pp has $special_hosts
[15:22:22] so we can probably add a deployment_hosts there?
[15:22:24] i tried this before and afair was told it's the wrong place
[15:22:31] _joe_: looking for feedback on image scalers ?
[15:22:35] I’m also not sure what the answer to hoo’s question is - wether we just need tin or the all the proxies.
[15:22:47] mutante: oh? why?
[15:22:47] bd808: ^
[15:23:04] !log anomie Synchronized php-1.25wmf21/includes/Html.php: SWAT: Fix for mediawiki.ui style for wpTextbox1 and wpSummary in preview if text includes inbutbox element [[gerrit:196896]] (duration: 01m 03s)
[15:23:05] FlorianSW: ^ Test please
[15:23:06] * bd808 reads backscroll
[15:23:07] Logged the message, Master
[15:23:11] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[15:23:19] * FlorianSW is testing
[15:23:24] booo strontium
[15:23:31] (PS1) coren: Update maintain-replicas [software] - https://gerrit.wikimedia.org/r/197055 (https://phabricator.wikimedia.org/T60196)
[15:23:36] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121554 (ArielGlenn) have a changeset for fluorine somewhere around here: https://gerrit.wikimedia.org/r/#/c/195917/
[15:23:47] FlorianSW: Just a minute, I missed a step
[15:23:53] hoo: what am I looking for here?
[15:24:02] anomie: i was typing my question, if i wait for caches :D
[15:24:06] YuviPanda: re: "hardcoded" , it's in the role class, the place we put variables
[15:24:08] https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/John_Stuart_Mill%2C_Considerations_on_Representative_Government_%281st_ed%2C_1861%29.pdf/page1-382px-John_Stuart_Mill%2C_Considerations_on_Representative_Government_%281st_ed%2C_1861%29.pdf.jpg
[15:24:11] *should wait for caches
[15:24:12] * anomie waits for silver again
[15:24:16] https://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/WorldAviation.198409.BackCover.pdf/page1-342px-WorldAviation.198409.BackCover.pdf.jpg
[15:24:21] mutante: it will cause issues for labs.
[15:24:21] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[15:24:23] gives Error generating thumbnail
[15:24:24] YuviPanda: trying to find the old change
[15:24:28] mutante: ok.
[15:24:30] YuviPanda: i dont know then
[15:24:39] bd808: Do we need to ssh into appservers from tin only or all scap proxies?
[15:24:50] Tin only.
[15:24:55] Ok, good to know
[15:25:03] we could also just fix things and make thenm perfect later?
[15:25:03] !log anomie Synchronized php-1.25wmf21/includes/Html.php: SWAT: Fix for mediawiki.ui style for wpTextbox1 and wpSummary in preview if text includes inbutbox element [[gerrit:196896]] (duration: 01m 03s)
[15:25:04] FlorianSW: ^ For real this time
[15:25:06] mutante: finding the old change would be useful, yes.
[15:25:14] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121559 (ArielGlenn)
[15:25:16] hoo: The app servers then contact the proxies over rsync protocol
[15:25:21] * FlorianSW is testing again :)
[15:25:50] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121529 (ArielGlenn) Erik, I added you so you can comment on the /a/wikistats_git/ files.
[15:26:02] mutante: I don’t think silver not updating for a few more minutes is that big a deal :) and you know how ‘temporary things turn into permanent things’ around here, but if you want I can just do the update to network.pp and deal with any fallout.
[15:26:26] bd808: cool :) that’s useful!
[15:26:30] But like YuviPanda is pointing out the implementation in Puppet should allow the deployment server IP(s) to be given in hiera so that labs and eventually codfw can use them
[15:26:41] oh yeah, codfw too
[15:27:08] I spent a lot of time int he last few months cleaning up things like this, and we should stop putting things like this in instead of cleaning up every now and then :)
[15:27:21] anomie: this change shouldn't be cache related, right?
[15:27:24] see.. he says hiera
[15:27:28] he doesnt say network.pp
[15:27:37] FlorianSW: No. Are you testing on a wiki running 1.25wmf21?
[15:27:44] mutante: you’re welcome to convert all of network.pp to hiera atm if you’d like. there’s a ticket for tha.t
[15:27:54] anomie: no, wikipedias, wmf20 *facepalm*
[15:28:03] YuviPanda: now you're confusing me, do want to add it to network.pp or not
[15:28:12] mutante: I want you to add it to network.pp now
[15:28:34] FlorianSW: I like to do wmf21 first, since it's less likely to cause widespread panic if it breaks stuff and we have to revert ;)
[15:28:35] and then we’ll convert network.pp to hiera ‘later’. hiera -> network.pp > putting it in role class,
[15:28:41] anomie: ok, it's working :)
[15:28:56] hiera is also a lot of work, and needs to be done very carefully, since you mess up and bam you can’t ssh to things anymore :)
[15:29:01] there’s a ticket for that.
[15:29:02] anomie: reasonable :P Haven't thought about it :)
[15:29:04] after you just told me "we should stop putting things like this in instead of cleaning up""
[15:29:36] ...
[15:29:56] !log anomie Synchronized php-1.25wmf20/includes/Html.php: SWAT: Fix for mediawiki.ui style for wpTextbox1 and wpSummary in preview if text includes inbutbox element [[gerrit:196897]] (duration: 01m 03s)
[15:29:57] FlorianSW: ^ There's wmf20. Double-check that works too?
[15:30:00] Logged the message, Master
[15:30:06] mutante: I’m saying there is a ‘currently good solution that works for prod and labs’ which is network.pp, and then there is the ‘correct’ solution which is hiera
[15:30:15] the former takes a minute, and the latter should take a few days of careful prodding.
[15:30:25] anomie: checked :) Thanks!
[15:30:28] superm401: You're next
[15:30:39] anomie, great.
[15:30:54] (PS13) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665
[15:31:26] (CR) coren: [C: 2] "Yeah backlog!" [software] - https://gerrit.wikimedia.org/r/197055 (https://phabricator.wikimedia.org/T60196) (owner: coren)
[15:31:28] mutante: and if you put it in network.pp, it will get cleaned up whenever network.pp gets moved to hiera, rather than having to be hunted down specifically.
[15:32:22] (CR) jenkins-bot: [V: -1] Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665 (owner: Ottomata)
[15:33:00] !log anomie Synchronized php-1.25wmf21/extensions/Flow/: SWAT: Flow: base href fix and dependency [[gerrit:196996]] (duration: 01m 10s)
[15:33:02] superm401: ^ Test please (wmf21)
[15:33:03] Logged the message, Master
[15:33:16] (PS14) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665
[15:33:55] anomie, works on MediaWiki.org.
[15:34:23] mobrovac, tonythomas: Neither of you has your submodule updates posted yet... Whichever one of you does first gets to go next.
[15:34:59] YuviPanda: re citoid, it seems the problem is that the repo on sca1001 is not up to date with origin
[15:35:04] oh
[15:35:05] strange
[15:35:13] by repo you mean the citoid deploy?
[15:35:21] anomie: hoo have merged the patchsets. I am updating submodules over here. now following https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Case_1c:_extension_update
[15:35:27] !log anomie Synchronized php-1.25wmf20/extensions/Flow/: SWAT: Flow: base href fix [[gerrit:196995]] [[gerrit:196997]] (duration: 01m 05s)
[15:35:28] superm401: ^ Double-check for wmf20, please
[15:35:30] Logged the message, Master
[15:35:36] anomie: still waiting on all submodules to c-o ... in order to create a patch
[15:35:41] YuviPanda: yes, citoid/deploy
[15:35:57] mobrovac: hmm, ok. I’ll leave you to it unless you need any more puppet help? :)
[15:36:36] anomie, yep, works on English WIkipedia now too.
[15:36:40] Thanks.
[15:37:28] !log Eventlogging deploy & restart: 4399dfc3240c0d27fdf6c517c7bf3239fc2da924
[15:37:31] Logged the message, Master
[15:38:01] (PS15) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665
[15:38:42] mutante: do you mind if I update the patch to refer to network.pp? i’d like to have a scap run before SWAT finishes, so we can verify...
[15:39:03] YuviPanda: i can't really do much, as i can't update the repo on the server directly, and all of the things have been merged in gerrit
[15:39:19] mutante: https://phabricator.wikimedia.org/T87519 is the bug for killing network.pp
[15:39:28] mobrovac: have you tried doing another deploy?
[15:39:37] ah right
[15:39:38] :P
[15:39:47] sorry, got confused there for a bit
[15:39:51] :)
[15:40:13] mobrovac: :)
[15:42:09] (PS16) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665
[15:42:14] (PS2) Dzahn: mediawiki: allow ssh from deployment servers [puppet] - https://gerrit.wikimedia.org/r/197053
[15:42:21] YuviPanda: here,^ working from bus, just got back
[15:42:34] feel free to take it or amend it more
[15:42:39] need to switch to train
[15:43:13] (CR) Andrew Bogott: [C: 1] mediawiki: allow ssh from deployment servers [puppet] - https://gerrit.wikimedia.org/r/197053 (owner: Dzahn)
[15:43:13] mutante: cool. I’ll amend
[15:43:14] anomie: I am removing my patchsets from swat as of now. Will need more stable internet to update all those submodules.
[15:43:32] YuviPanda: hoo: no IPv6 because tin doesn';t have it yet
[15:43:48] mutante: need to include network::constants and refer to it explicitly, I think.
[15:43:51] tonythomas: No, I'll just do the submodule for you then
[15:44:20] anomie: that would be great :)
[15:44:34] But next time, you should do it ;)
[15:45:21] RECOVERY - RAID on ms-be2009 is OK: OK: optimal, 14 logical, 14 physical
[15:45:22] anomie: of course. I am not on a stable connection as of now :(
[15:45:40] RECOVERY - very high load average likely xfs on ms-be2009 is OK: OK - load average: 13.21, 3.55, 1.21
[15:46:31] (PS3) Dzahn: mediawiki: allow ssh from deployment servers [puppet] - https://gerrit.wikimedia.org/r/197053
[15:46:33] (PS1) Dzahn: switch ferm rule in releases to use network.pp [puppet] - https://gerrit.wikimedia.org/r/197062
[15:46:37] (PS17) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665
[15:46:58] (PS1) Rush: phab make mail processing to be executable [puppet] - https://gerrit.wikimedia.org/r/197063
[15:47:52] (PS2) Rush: phab make mail processing bin executable [puppet] - https://gerrit.wikimedia.org/r/197063
[15:47:54] mutante: actually, I was wrong. that wont’ work either.
[15:47:59] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121648 (ArielGlenn) Ottomata, what do you think about making these have 90 days instead of 180? puppet/templates/udp2log/logrotate_udp2log_analytics.erb puppet/templates/udp2log/logrotate_udp2log.erb
[15:48:02] (PS4) Yuvipanda: mediawiki: allow ssh from deployment servers [puppet] - https://gerrit.wikimedia.org/r/197053 (owner: Dzahn)
[15:48:03] mutante: ^ updated, this should work.
[15:48:49] mutante: because network.pp has arrays, and srange doesn’t accept arrays.
[15:49:19] (CR) Andrew Bogott: mediawiki: allow ssh from deployment servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/197053 (owner: Dzahn)
[15:49:23] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121653 (Ottomata) +1, but I'm pretty sure the *_analytics.erb one is not used at all.
[15:49:34] (PS1) GWicke: Use RESTBase for visual editing on ruwiki and ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/197067
[15:50:43] (CR) Yuvipanda: mediawiki: allow ssh from deployment servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/197053 (owner: Dzahn)
[15:51:19] YuviPanda: do you know if the deployment-prep project's puppet master is safe to pull from latest production?
[15:51:37] ottomata: yup. it auto-pulls every 20 mins or so anyway.
[15:51:41] ok cool
[15:51:43] thanks
[15:51:56] cmoooon jenkiiiiins
[15:52:28] (CR) Andrew Bogott: [C: 1] mediawiki: allow ssh from deployment servers [puppet] - https://gerrit.wikimedia.org/r/197053 (owner: Dzahn)
[15:52:45] operations, ops-eqiad: cp1047 down - https://phabricator.wikimedia.org/T88045#1121655 (Cmjohnson) I would agree the stick is bad. contacting Dell and will update the phab task with shipping information
[15:52:55] (CR) Jforrester: "Looks good." [mediawiki-config] - https://gerrit.wikimedia.org/r/197067 (owner: GWicke)
[15:53:20] operations, Deployment-Systems: scap can't reach silver to deploy for wikitech - https://phabricator.wikimedia.org/T92843#1121660 (yuvipanda) NEW
[15:53:34] (CR) Rush: [C: 2] phab make mail processing bin executable [puppet] - https://gerrit.wikimedia.org/r/197063 (owner: Rush)
[15:53:40] anomie: https://gerrit.wikimedia.org/r/#/c/197066/ and https://gerrit.wikimedia.org/r/#/c/197059/
[15:53:48] (updated deploymnets page too)
[15:53:52] !log anomie Synchronized php-1.25wmf21/extensions/BounceHandler/: SWAT: BounceHandler: Removed repititive un-subscribe action on a global user [[gerrit:196877]] (duration: 01m 04s)
[15:53:54] tonythomas: ^ Test please (wmf21)
[15:53:55] Logged the message, Master
[15:54:03] (PS5) Yuvipanda: mediawiki: allow ssh from deployment servers [puppet] - https://gerrit.wikimedia.org/r/197053 (https://phabricator.wikimedia.org/T92843) (owner: Dzahn)
[15:54:11] (PS6) Yuvipanda: mediawiki: allow ssh from deployment servers [puppet] - https://gerrit.wikimedia.org/r/197053 (https://phabricator.wikimedia.org/T92843) (owner: Dzahn)
[15:54:13] (PS18) Ottomata: Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665
[15:54:22] (CR) Ottomata: [C: 2 V: 2] Allow kafka and zookeeper configuration in labs via hiera [puppet] - https://gerrit.wikimedia.org/r/196665 (owner: Ottomata)
[15:54:27] YuviPanda: hm, git deploy doesn't want to deploy the minions for some reason
[15:54:56] mobrovac: how long did you wait?
[15:55:19] hit c maybe 10 times
[15:55:24] but pretty quickly
[15:55:31] (PS7) Yuvipanda: mediawiki: allow ssh from deployment servers [puppet] - https://gerrit.wikimedia.org/r/197053 (https://phabricator.wikimedia.org/T92843) (owner: Dzahn)
[15:55:32] gwicke: should i relax a bit ? :)
[15:55:40] (CR) Yuvipanda: [C: 2 V: 2] mediawiki: allow ssh from deployment servers [puppet] - https://gerrit.wikimedia.org/r/197053 (https://phabricator.wikimedia.org/T92843) (owner: Dzahn)
[15:55:42] mobrovac: try 'c' once more ;)
[15:56:19] otherwise, you could also try 'r'
[15:56:29] or 'd' to see which hosts are missing
[15:57:28] Is anyone planning a deployment at 16:00 UTC? SWAT is running long.
[15:57:29] anomie: hey! can you force scap (or something?) to run just for silver now?
[15:57:39] YuviPanda: another q. what is the proper way to include classes on nodes in labs?
[15:57:40] anomie: ah, nvm, if more scaps are planned...
[15:57:43] i can do it via the checkboxes
[15:57:47] but is it possible with hiera?
[15:58:04] ottomata: aaaah. wikitech. for now. in a few days you can use https://gerrit.wikimedia.org/r/#/c/196628/
[15:58:37] hmm, interseting, ok, thanks
[15:58:42] (CR) Eevans: [C: 1] Use RESTBase for visual editing on ruwiki and ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/197067 (owner: GWicke)
[15:59:02] anomie: let me know how silver feels on next scap? it should be all good now
[15:59:21] gwicke: seems the deploy dir on sca1001 is not clean, so i guess the c-o cannot happen
[15:59:27] operations, ops-eqiad: cp1047 down - https://phabricator.wikimedia.org/T88045#1121683 (Cmjohnson) Congratulations: Work Order WO6747226 was successfully submitted.
[15:59:47] mobrovac: normally that shouldn't matter
[15:59:59] !log anomie Synchronized php-1.25wmf21/extensions/RestBaseUpdateJobs/: SWAT: RestBaseUpdateJobs: Set HTTP headers as an associative array [[gerrit:197042]] (duration: 01m 03s)
[16:00:00] mobrovac: ^ Test please (wmf21)
[16:00:04] YuviPanda: Still timed out
[16:00:07] Logged the message, Master
[16:00:12] anomie: uh, strange.
[16:00:41] wait, what. silver doesn’t actually have role::mediawiki?
[16:01:02] YuviPanda: I've got two more scaps to do. And then we could always just do a no-op sync-file to keep testing, of course.
[16:01:10] anomie: yup! so all good
[16:01:10] operations: Upload python-gear 0.5.5-2 to Debian project - https://phabricator.wikimedia.org/T89952#1121688 (hashar)
[16:01:39] operations, Deployment-Systems, Patch-For-Review: scap can't reach silver to deploy for wikitech - https://phabricator.wikimedia.org/T92843#1121691 (yuvipanda) Uh oh, looks like silver doesn't actually include the mediawiki role.
[16:01:54] anomie: runJobs.log looks okay
[16:02:07] re RestBaseUpdateJobs update
[16:02:18] operations: Get python-gear 0.5.5 to trusty-wikimedia and jessie-wikimedia - https://phabricator.wikimedia.org/T92684#1121692 (hashar) I have migrated the Debian source package to git and released a new minor Debian version 0.5.5-2. See {T89952} for details.
[16:02:21] gwicke: ok
[16:02:53] operations: Get python-gear 0.5.5 to trusty-wikimedia and jessie-wikimedia - https://phabricator.wikimedia.org/T92684#1121707 (hashar)
[16:04:08] !log anomie Synchronized php-1.25wmf20/extensions/RestBaseUpdateJobs/: SWAT: RestBaseUpdateJobs: Set HTTP headers as an associative array [[gerrit:197041]] (duration: 01m 03s)
[16:04:11] gwicke: ^ Double check for wmf20, please?
[16:04:12] Logged the message, Master
[16:04:23] anomie: we will have to wait for a bounce to turn up to year that since Jeff is on a one week holiday.
[16:05:08] anomie: log still looking good, and verified that template updates on wmf20 are working now
[16:05:23] anomie: thank you!
[16:05:27] (PS1) ArielGlenn: make udp2log logrot keep for 90 days be default instead of 180 [puppet] - https://gerrit.wikimedia.org/r/197071
[16:05:27] year -> here
[16:05:48] tonythomas: :/
[16:05:49] operations, Patch-For-Review, domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1121711 (Slaporte) @RobH: I was told that the registration credentials were sent to dns-admin@wikimedia.org. Do you have everything you need to configure the domain?
[16:05:53] (PS2) ArielGlenn: make udp2log logrot keep for 90 days by default instead of 180 [puppet] - https://gerrit.wikimedia.org/r/197071
[16:06:43] !log anomie Synchronized php-1.25wmf20/extensions/BounceHandler/: SWAT: BounceHandler: Removed repititive un-subscribe action on a global user [[gerrit:196878]] (duration: 01m 06s)
[16:06:46] Logged the message, Master
[16:06:53] operations, Patch-For-Review, domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1121717 (RobH) a:RobH Yes, I just need to add this to the end of one of the deployment days this week, since it requires an apache graceful across the cluster.
[16:06:54] * anomie is done with SWAT, finally
[16:07:16] anomie: no worries. ;)
[16:07:26] ottomata: you should probably email releng or ops@ with your kafka on betalabs plans, so people don’t freak out when there are alerts :D (Just saw a downtime one for a kafka host there)
[16:08:06] ? oh i just deleted a host
[16:08:10] there was an alert?!
[16:08:27] YuviPanda: i just want to be able to test some eventlogging changes in beta before we try to deploy those changes to prod
[16:08:40] i'll need to set up a varnishkafka instance on the beta bits instnace.
[16:08:45] hm, ok
[16:08:54] i'm not on releng
[16:08:54] ottomata: yeah, just a short email will do.
[16:09:01] ottomata: ops!
[16:09:01] can I email it without being a member?
[16:09:03] err
[16:09:03] oi
[16:09:04] ok
[16:09:07] gah, ops@
[16:09:52] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121723 (ArielGlenn) https://gerrit.wikimedia.org/r/#/c/197071/
[16:11:20] RECOVERY - NTP on cp4015 is OK: NTP OK: Offset 0.03807103634 secs
[16:12:30] (PS4) Jforrester: Provide the Citoid extension for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/187132
[16:12:40] mobrovac: https://phabricator.wikimedia.org/T92845. let me know if there’s anything you want me to do?
[16:13:34] YuviPanda: trebuchet seems to be not cooperating
[16:13:38] can't deploy from tin
[16:13:57] hmm. I’m not sure whom to poke, having not done anything trebuchet myself...
[16:14:29] operations, Patch-For-Review, domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1121743 (Slaporte) Can you transfer the domain to our regular registrar?
[16:15:11] YuviPanda: have you got root on sca1001/2?
[16:15:32] mobrovac: I do :)
[16:15:39] perfect
[16:16:01] (PS5) Jforrester: Provide the Citoid extension for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/187132
[16:16:08] YuviPanda: You know everything. Does https://gerrit.wikimedia.org/r/187132 look sane?
[16:17:16] James_F: looks vaguely ok, but I might be missing things (haven’t dealt with new extension deployment in forever) [16:17:20] James_F: however, citoid is also down atm :D [16:17:32] YuviPanda: um, how do instances get puppet signed in deployment prep? there's no puppet-ca installed on deployment-salt [16:17:37] YuviPanda: Yeah, noticed. :-) I trust mobrovac to get it fixed. :-) [16:18:10] YuviPanda: could you please reset the git head and everything and co master in /srv/deployment/citoid/deploy ? [16:18:13] (sca100x) [16:18:23] ottomata: it’s all fairly automated, ottomata :D takes a few mins, though. if you want it to get it over with very quickly, just run ‘sudo puppetsigner.py’ or something like that [16:18:43] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121748 (ArielGlenn) on oxygen, all logs in /a/log/webrequest/archive/ of the form zero-*.tsv.log-*gz e.g. zero-digi-malaysia.tsv.log-20130711.gz will have to be removed manually, logrot doesn't see t... [16:18:51] yeah, YuviPanda i saw that [16:18:52] and got [16:18:52] TypeError: must be string, not None [16:19:24] ottomata: hmm, strange. I’ll take a look in a few mins? helping mobrovac now [16:19:27] sure [16:20:32] mobrovac: done [16:20:38] thnx [16:21:40] YuviPanda: nm, i was able to override by running puppet cert sign [16:21:47] ottomata: ok.
[16:22:00] ottomata: shouldn’t have been required, so I’ll take a look later [16:23:28] (PS1) Jforrester: Enable Citoid extension on all VisualEditor wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/197074 (https://phabricator.wikimedia.org/T62768) [16:23:36] (CR) Ottomata: [C: 2] make udp2log logrot keep for 90 days by default instead of 180 [puppet] - https://gerrit.wikimedia.org/r/197071 (owner: ArielGlenn) [16:23:47] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121759 (ArielGlenn) on gadolinium, in /a/log/webrequest/archive/, all logs of the form edits.tsv.log-* mobile-sampled-100.tsv.log-* sampled-1000.tsv.log-* 5xx.tsv.log-* will ne... [16:24:32] operations, ops-eqiad: Setup the 4 new varnish caching systems - https://phabricator.wikimedia.org/T91769#1121762 (Cmjohnson) Open>Resolved Completed Task Racked and cabled in D8 Labeled Racktables updated Switch Configuration Completed. private1-d-eqiad DNS: Completed Mgmt, IPv4 and IPv6 https:/... [16:25:10] operations, ops-eqiad: Rack and set up ms-be1016-1018 - https://phabricator.wikimedia.org/T90922#1121765 (fgiunchedi) >>! In T90922#1075806, @fgiunchedi wrote: > I've investigated a bit more and it seems the raid controller will reoder disks as presented to the operating system, with sda / sdb being the p... [16:25:59] operations, Deployment-Systems, Patch-For-Review: scap can't reach silver to deploy for wikitech - https://phabricator.wikimedia.org/T92843#1121766 (yuvipanda) I'm not sure where the mediawiki setup for silver comes from. @andrew? [16:26:24] !log Eventlogging deploy and restart, reduced batch size. Changeset: 3c987f67a0355c613aa042704a1c3422d0fcd55b [16:28:54] !log Eventlogging deploy and restart, reduced batch size. Changeset: 3c987f67a0355c613aa042704a1c3422d0fcd55b [16:28:57] Logged the message, Master [16:29:33] mobrovac: any luck?
[16:29:35] !log cp3030-3049 downtimed in icinga through 2015-04-01 for now, not in production traffic flow [16:29:38] Logged the message, Master [16:29:49] YuviPanda: nope, but i've got some more commands for you :) [16:29:54] heh sure [16:30:03] mobrovac: wait don’t you also have root on sca1*? [16:30:21] (CR) Mvolz: "What's the plan for roll-out of namespace messages?" [mediawiki-config] - https://gerrit.wikimedia.org/r/197074 (https://phabricator.wikimedia.org/T62768) (owner: Jforrester) [16:30:31] YuviPanda: on sca1001, try "sudo salt-call deploy.fetch 'citoid/deploy'" [16:30:33] aaah [16:30:34] you don't [16:30:55] YuviPanda: good question, have to ask for it i guess [16:31:09] mobrovac: yeah. anyway, I ran it. [16:31:15] seems to have run fine [16:31:22] operations, ops-esams: Rack and configure asw-esams (new 2xQFX5100 stack) - https://phabricator.wikimedia.org/T91643#1121781 (faidon) >>! In T91643#1100413, @faidon wrote: > What's left: > - The two VLANs have been defined but the matching interface-ranges have not, as ranges cannot be defined as empty. >... [16:31:29] mobrovac: and citoid starts now :) [16:31:36] (PS1) GWicke: Lower concurrency on parsoid & restbase jobs slightly [puppet] - https://gerrit.wikimedia.org/r/197078 [16:31:53] (CR) Jforrester: "> What's the plan for roll-out of namespace messages?" [mediawiki-config] - https://gerrit.wikimedia.org/r/197074 (https://phabricator.wikimedia.org/T62768) (owner: Jforrester) [16:32:08] YuviPanda: really? the submodule is still checkout at the wrong commit [16:32:19] mobrovac: ah, nevermind.
[16:32:25] mobrovac: I started and checked status immediately [16:32:30] looks like it takes a while to crap out :D [16:32:34] :) [16:32:58] ok, so the command did nothing [16:33:00] hm hm [16:33:50] YuviPanda: please, do git checkout master on sca1001 [16:34:01] (CR) Manybubbles: "I'm always weary of large puppet changes like these because I'm not sure how it'll effect production but it seems pretty sane." [puppet] - https://gerrit.wikimedia.org/r/196640 (owner: Chad) [16:34:06] mobrovac: done [16:34:38] greg-g: you around? [16:34:57] YuviPanda: and now git pull && git submodule update --init [16:35:00] operations, Wikimedia-General-or-Unknown, Patch-For-Review: [gdash] "(cdn) HTTP Error Rate" would use log scale for 5xx errors - https://phabricator.wikimedia.org/T43754#1121795 (fgiunchedi) Open>Resolved a:fgiunchedi spurious 500s have been fixed by https://phabricator.wikimedia.org/T88412 so... [16:35:19] mobrovac: hmm [16:35:20] > fatal: reference is not a tree: 2cd246b99fd2f407b92c33a079a1120f0abe7117 [16:35:30] (PS1) Dzahn: nova::manager: allow ssh from deployment [puppet] - https://gerrit.wikimedia.org/r/197079 (https://phabricator.wikimedia.org/T92843) [16:35:57] (CR) Giuseppe Lavagetto: [C: 2] Lower concurrency on parsoid & restbase jobs slightly [puppet] - https://gerrit.wikimedia.org/r/197078 (owner: GWicke) [16:36:00] (CR) Yuvipanda: [C: 1] nova::manager: allow ssh from deployment [puppet] - https://gerrit.wikimedia.org/r/197079 (https://phabricator.wikimedia.org/T92843) (owner: Dzahn) [16:36:02] gwicke: not really, in a 3 hour budget meeting of doom [16:36:30] (CR) Dzahn: [C: 2] nova::manager: allow ssh from deployment [puppet] - https://gerrit.wikimedia.org/r/197079 (https://phabricator.wikimedia.org/T92843) (owner: Dzahn) [16:36:34] greg-g: sounds like fun [16:36:55] YuviPanda: can we resort to removing /srv/deployment/citoid/deploy and running puppet again ?
[16:37:03] mobrovac: heh, ok [16:37:07] then i can try to deploy (again) [16:37:10] <_joe_> mutante: I merged your change as well [16:37:19] <_joe_> gwicke: merged [16:37:43] _joe_: heh, i already had the script open and when i typed "yes" it was merged :) [16:37:50] _joe_: thx! [16:38:08] I patched the script so you can type ‘y’ instead :D [16:38:16] * YuviPanda wins puppet merge racccessss [16:38:25] haha, ok [16:38:46] <_joe_> YuviPanda: I double-dare you! [16:38:55] mutante: uh, puppet failures on silver [16:39:22] lol? exact copy of the one from earlier [16:39:28] also, no [16:39:29] yeah [16:39:33] Notice: Finished catalog run in 31.47 seconds [16:39:45] > Error: /Stage[main]/Ferm/Service[ferm]: Failed to call refresh: Could not stop Service[ferm]: Execution of '/etc/init.d/ferm stop' returned 25: [16:39:45] Error: /Stage[main]/Ferm/Service[ferm]: Could not stop Service[ferm]: Execution of '/etc/init.d/ferm stop' returned 25: [16:39:45] Notice: Finished catalog run in 31.59 seconds [16:39:48] mobrovac: nope ^ [16:39:49] err [16:39:50] mutante: ^ [16:40:15] mutante: gah, my fault. the original is missing a semicolon [16:40:16] * YuviPanda facepalms [16:40:42] how come it didnt fail on all mw servers? [16:40:43] and previously puppet didn’t fail because it wasn’t applied at all [16:40:44] operations, ops-eqiad, RESTBase, Services: restbase1006 faulty disk controller - https://phabricator.wikimedia.org/T89639#1121860 (Cmjohnson) a:Cmjohnson>fgiunchedi sending over to filippo to get working [16:40:50] mutante: they don’t have base::firewall [16:41:23] (PS1) ArielGlenn: purge webrequest logs after 90 days [puppet] - https://gerrit.wikimedia.org/r/197081 [16:41:30] yes.
good that we have silver being special to catch it before :p [16:41:52] (PS1) coren: Add is_sensitive to replica meta_p.wiki table [software] - https://gerrit.wikimedia.org/r/197082 (https://phabricator.wikimedia.org/T69476) [16:41:55] (PS1) Yuvipanda: Fix syntax error in ferm rules for deployment ssh access [puppet] - https://gerrit.wikimedia.org/r/197083 [16:41:57] mutante: ^ [16:42:22] (CR) Dzahn: [C: 1] Fix syntax error in ferm rules for deployment ssh access [puppet] - https://gerrit.wikimedia.org/r/197083 (owner: Yuvipanda) [16:42:28] gotcha, yep [16:42:45] (CR) Dzahn: [C: 2] Fix syntax error in ferm rules for deployment ssh access [puppet] - https://gerrit.wikimedia.org/r/197083 (owner: Yuvipanda) [16:43:38] (CR) ArielGlenn: "I am not sure if this is needed, maybe the stanza right above it covers this?" [puppet] - https://gerrit.wikimedia.org/r/197081 (owner: ArielGlenn) [16:44:02] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121869 (ArielGlenn) https://gerrit.wikimedia.org/r/#/c/197081/ for logs on stat1002 but maybe it's not needed? [16:44:29] operations, Deployment-Systems, Patch-For-Review: scap can't reach silver to deploy for wikitech - https://phabricator.wikimedia.org/T92843#1121870 (Dzahn) ``` chain INPUT { - proto tcp dport ssh saddr $DEPLOYMENT_HOSTS ACCEPT + proto tcp dport ssh saddr $DEPLOYMENT_HOSTS ACCEPT; ``` ``` ACCEP... [16:44:39] YuviPanda: worked. ACCEPT tcp -- tin.eqiad.wmnet anywhere tcp dpt:ssh [16:44:46] mutante: sweet [16:44:57] mutante: let me see if it works [16:45:20] !log yuvipanda Synchronized README: testing silver firewall hole (duration: 00m 05s) [16:45:25] Logged the message, Master [16:45:28] mutante: ^ all good [16:45:32] :) great [16:45:41] RECOVERY - RAID on ms-be2007 is OK: OK: optimal, 12 logical, 12 physical [16:45:43] anomie: should we scap again for wikitech to pick up the SWAT stuff?
or is it ok if we just let it be and have it ride the train? [16:45:46] * YuviPanda has no preferences [16:45:51] mutante: thanks for the patches! :) [16:47:09] YuviPanda: thank you. i wanted to use ferm::service instead of ferm:rule but probably didn't work like that [16:47:21] mutante: yup, the saddr vs array thing. [16:47:36] YuviPanda: Probably good to do a sync-common-all (I think that's right) on the host in question. [16:47:56] and that's the thing i meant in the beginning, about trying to use variables from network.pp in an "srange" [16:47:57] No need to scap all the other hosts. [16:48:40] anomie: ah, alright. should I run it as a particular user on silver? or will it sudo to mwdeploy / www-data as needed? [16:48:49] mutante: ah, I see. I just assumed you meant that it shouldn’t be in network.pp at all. [16:49:12] YuviPanda: i wasn't sure, i mixed it up with people saying i should use hiera [16:49:21] yup, two separate issues. [16:49:27] *nod* [16:49:44] operations, Multimedia: Add monitoring of upload rate on commons to icingia alerts - https://phabricator.wikimedia.org/T92322#1121882 (fgiunchedi) is the number of uploads already in graphite somewhere? that'd make creating the alarm very easy. [16:49:44] (CR) Mobrovac: [C: 1] Use RESTBase for visual editing on ruwiki and ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/197067 (owner: GWicke) [16:49:46] operations, Deployment-Systems, Patch-For-Review: scap can't reach silver to deploy for wikitech - https://phabricator.wikimedia.org/T92843#1121883 (Dzahn) Open>Resolved a:Dzahn 09:45 < logmsgbot> !log yuvipanda Synchronized README: testing silver firewall hole (duration: 00m 05s) [16:49:54] (CR) coren: [C: 2] "Tested to work." [software] - https://gerrit.wikimedia.org/r/197082 (https://phabricator.wikimedia.org/T69476) (owner: coren) [16:51:01] YuviPanda: I don't know, whichever user gets used when a scap runs. bd808 might be able to tell you.
bd808: Context: silver was not accessible during SWAT. What script do they run on silver (sync-common-all?) and as which user to bring it up to date? [16:51:16] yeah, I think it’s mwdeploy. [16:51:37] and I don’t see a sync-common-all only a sync-common [16:51:40] Any user can run `sync-common --verbose` on silver to catch it up [16:51:54] cool [16:51:54] thanks [16:51:59] it does all the right sudo magic inside [16:52:06] (CR) Mvolz: "Okay- my only request is that we wait for https://gerrit.wikimedia.org/r/#/c/196581/ to be merged first; otherwise the citations are more " [mediawiki-config] - https://gerrit.wikimedia.org/r/197074 (https://phabricator.wikimedia.org/T62768) (owner: Jforrester) [16:52:11] !log running sync-common on silver to have it catch up [16:52:14] Logged the message, Master [16:52:25] operations, Interdatacenter-IPsec: Implement a big off switch - https://phabricator.wikimedia.org/T88536#1121893 (ArielGlenn) If you run the script via salt and you make the script blocking as I saw in a comment on the changeset, you need to check for job returns from all the clients and give up after some... [16:52:33] bd808: anomie \o/ all good. thanks :) [16:53:07] YuviPanda: now you can rm citoid/deploy && puppet apply on sca100x in peace :P [16:53:46] mobrovac: ah, so I did that on 1001 and that didn’t actually seem to work [16:54:04] because of the ferm thingy? [16:54:19] mobrovac: nope, unrelated. [16:54:28] https://www.irccloud.com/pastebin/jIUVS0gq [16:54:31] mobrovac: ^ [16:54:42] after rm -rfing the repo, running puppet again, this is what shows up [16:54:55] mobrovac: I’ve a feeling this is just git-deploy unable to deal with submodules properly [16:55:22] it's always the submodules [16:55:30] damn [16:57:38] i don't understand this ...
it worked neatly for restbase, with the same submodules outline [16:58:44] and it deleted the submodule dir [16:58:46] hm [17:08:00] (CR) Yuvipanda: [C: -1] "Won't work, will need to use ferm::rule (see role::mediawiki)" [puppet] - https://gerrit.wikimedia.org/r/197062 (owner: Dzahn) [17:10:04] YuviPanda: could you please paste me somewhere the output of "sudo salt-call deploy.fetch 'citoid/deploy'; sudo salt-call deploy.checkout 'citoid/deploy'" on sca1001 ? [17:10:35] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1121944 (ArielGlenn) asked Yurik (adding him) about /a/zerosms/logs, he will be able to clean that up next week, it needs careful review by him. [17:11:04] mobrovac: https://phabricator.wikimedia.org/P402 [17:13:51] YuviPanda: thnx [17:13:55] gotta love submodules ... [17:14:02] mobrovac: :) [17:17:38] (PS1) Thcipriani: Get base domainname from ldap.conf [puppet] - https://gerrit.wikimedia.org/r/197087 [17:17:49] ^ YuviPanda [17:17:55] thcipriani: test on deployment-salt? [17:17:58] operations, ops-codfw, network, wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1121990 (RobH) a:RobH Indeed, I only got through row A in the mw port setups. Going forward (since this task was op... [17:18:10] operations, ops-codfw, network, wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1121995 (RobH) p:Triage>High [17:18:24] YuviPanda: tested on staging-palladium. I can patch deployment-salt though. [17:18:32] thcipriani: ah, sweet. let me merge.
[17:18:44] (PS1) Giuseppe Lavagetto: redis: correct replication data for rbf* [puppet] - https://gerrit.wikimedia.org/r/197088 [17:19:00] <_joe_> chasemp: ^^ [17:19:38] (CR) Rush: [C: 1] redis: correct replication data for rbf* [puppet] - https://gerrit.wikimedia.org/r/197088 (owner: Giuseppe Lavagetto) [17:19:41] thcipriani: hmm, actually, I don’t know if that’ll work [17:19:55] thcipriani: you need it to be .getLdapInfo(‘base’) [17:19:57] not ldapbase [17:20:00] since it’s base in ldap.conf [17:20:00] i misread it too :) [17:20:14] thcipriani: ah [17:20:17] thcipriani: nevermind, I can’t read. [17:20:45] (CR) Yuvipanda: [C: 2] Get base domainname from ldap.conf [puppet] - https://gerrit.wikimedia.org/r/197087 (owner: Thcipriani) [17:20:51] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: Puppet has 1 failures [17:20:59] thcipriani: ^ thanks :) [17:21:19] mobrovac: I’m off for food now :( Can you make notes in the ticket if you’re running into issues still? [17:21:43] YuviPanda: sure, bon appetit! [17:23:40] (PS2) Giuseppe Lavagetto: redis: correct replication data for rbf* [puppet] - https://gerrit.wikimedia.org/r/197088 [17:23:50] (CR) Giuseppe Lavagetto: [C: 2 V: 2] redis: correct replication data for rbf* [puppet] - https://gerrit.wikimedia.org/r/197088 (owner: Giuseppe Lavagetto) [17:29:16] operations, ops-codfw, network, wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1122070 (RobH) So, as I was about to start to do this again, I realize I'm not being efficient. How does one edit the...
[17:34:47] !log Updated entity suggester data on wikidata (with data from today's dump) [17:34:54] Logged the message, Master [17:39:02] RECOVERY - SSH on restbase1006 is OK: SSH OK - OpenSSH_6.7p1 Debian-3 (protocol 2.0) [17:39:10] RECOVERY - Host restbase1006 is UP: PING OK - Packet loss = 0%, RTA = 1.40 ms [17:42:10] (CR) Ori.livneh: "\o/" [mediawiki-config] - https://gerrit.wikimedia.org/r/197067 (owner: GWicke) [17:44:30] (PS2) GWicke: Use RESTBase for visual editing on ruwiki and ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/197067 (https://phabricator.wikimedia.org/T89066) [17:45:59] (CR) Aaron Schulz: mediawiki: add configs to support the Dallas DC (5 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: Giuseppe Lavagetto) [17:47:11] greg-g: Can I ask for a deploy window later today? Want to get the Citoid extension into group 0 now that the production service is finally ready. [17:47:40] PROBLEM - Host restbase1006 is DOWN: PING CRITICAL - Packet loss = 100% [17:47:51] James_F: citoid is still down afaik [17:48:05] YuviPanda|foood: Working here. [17:48:19] James_F: ah, so 1 of the hosts is down, and the other one is only accidentally working [17:48:20] !log running CentralAuth's migrateAccount.php --auto on all unattached accounts [17:48:24] and deploys are completely broken [17:48:25] Logged the message, Master [17:48:45] James_F: mobrovac is still looking into it, afaik. [17:48:54] * James_F nods. [17:49:15] Nemo_bis: ^ [17:49:45] (CR) 20after4: [C: 2] Check for any content before opening (Merged) jenkins-bot: Check for any content before opening James_F: yep, take one :) [17:50:08] greg-g: Thanks! [17:50:15] Krenair: Do you have a preferred hour? [17:50:25] * greg-g is still in meeting 'o dooooooom [17:50:29] greg-g: Enjoy.
[17:51:26] greg-g: Also, we should probably re-label the "Parsoid/OCG" slot into "Services", given citoid/mathoid/service-runner/etc. now exist. [17:51:40] doit [17:51:47] Kk. [17:52:01] greg-g: I was actually wondering if we could go before then [17:52:10] https://gerrit.wikimedia.org/r/#/c/197067/ [17:52:17] James_F, an hour starting sometime after 19:00 and ending before 23:00, I think [17:52:21] utc, of course [17:52:32] gwicke: sure [17:52:53] so 12:00 and 16:00 pdt? [17:52:54] greg-g: cool, thx! will give us more time to monitor it before the next round [17:53:13] (PS1) Ottomata: Set up eventlogging varnishkafka instance in betalabs on bits host [puppet] - https://gerrit.wikimedia.org/r/197095 [17:54:07] (CR) jenkins-bot: [V: -1] Set up eventlogging varnishkafka instance in betalabs on bits host [puppet] - https://gerrit.wikimedia.org/r/197095 (owner: Ottomata) [17:54:14] yay legoktm :) [17:55:11] (PS2) Ottomata: Set up eventlogging varnishkafka instance in betalabs on bits host [puppet] - https://gerrit.wikimedia.org/r/197095 [17:56:39] (PS3) Ottomata: Set up eventlogging varnishkafka instance in betalabs on bits host [puppet] - https://gerrit.wikimedia.org/r/197095 [17:56:45] legoktm: are the unifications relayed to RC/IRC channels? [17:57:23] Krenair: Services have 20:00 UTC – take the hour beforehand? [17:57:45] right [17:58:02] gwicke: You going to go now?
[17:58:54] James_F, yep, 19:00 - 20:00 should be fine [17:59:33] Krenair: Done – https://wikitech.wikimedia.org/wiki/Deployments#Week_of_March_16th [18:02:36] James_F: yes, going now [18:02:54] akosiaris: btw, there’s a partial citoid outage going on atm https://phabricator.wikimedia.org/T92845 [18:03:14] (CR) GWicke: [C: 2] Use RESTBase for visual editing on ruwiki and ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/197067 (https://phabricator.wikimedia.org/T89066) (owner: GWicke) [18:03:20] (Merged) jenkins-bot: Use RESTBase for visual editing on ruwiki and ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/197067 (https://phabricator.wikimedia.org/T89066) (owner: GWicke) [18:03:39] mobrovac: [18:03:50] mobrovac: err, I just pasted https://phabricator.wikimedia.org/T92845 in our ops etherpad [18:04:29] !log gwicke Synchronized wmf-config/InitialiseSettings.php: Use RESTBase with VE on ptwiki and ruwiki (duration: 00m 05s) [18:04:33] Logged the message, Master [18:04:45] YuviPanda: cheers :) [18:04:59] mobrovac: do fill it the ticket in with your investigations as you go on. [18:04:59] can somone merge https://gerrit.wikimedia.org/r/#/c/196885/ during merge session pls [18:05:35] (PS2) Steinsplitter: cleanup: upload has been disabled on outrechwiki, no longer needed [mediawiki-config] - https://gerrit.wikimedia.org/r/196885 [18:05:39] YuviPanda: yeah Error: Cannot find module '/srv/deployment/citoid/deploy/src/server.js' [18:05:53] akosiaris: yup, current theory is git-deploy doesn’t like submodules muhc [18:05:54] much [18:05:54] operations, ops-codfw, network, wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1122209 (RobH) I'll toss in the vlan assignments post ops meeting so installs can continue. the descriptions can be ud...
[18:06:03] that is possible [18:06:12] I 'll look into it right after the meeting [18:06:17] akosiaris: sweet [18:06:23] mobrovac: [18:06:24] James_F, does that citoid issue block our extension deployment later? [18:06:40] Krenair: No, but it does block further release. [18:06:51] akosiaris: cheers! [18:07:12] right, we need all relevant sca servers running to have it widely used by all wikis (i.e. not just test wikis)? [18:07:29] akosiaris: it can't find the latest ref basically: https://phabricator.wikimedia.org/P402 [18:08:00] operations, MediaWiki-Core-Team, Wikimedia-log-errors: rbf1001 and rbf1002 are timing out / dropping clients for Redis - https://phabricator.wikimedia.org/T92591#1122221 (chasemp) relevant https://phabricator.wikimedia.org/T90923 [18:08:12] it just does not have complete capacity right now, rather than erroring? [18:08:58] James_F [18:09:00] Krenair: Exactly. [18:09:13] ok, that's fine for test wikis I thnk [18:09:15] think* [18:09:18] * James_F nods. [18:16:13] ha [18:16:16] "The guilty developer can be identified [18:16:16] and harassed without human intervention." [18:16:28] From buildbot's description [18:16:29] :P [18:20:16] (PS1) GWicke: Enable RESTBase for visual editing on itwiki and plwiki as well [mediawiki-config] - https://gerrit.wikimedia.org/r/197104 (https://phabricator.wikimedia.org/T89066) [18:20:36] operations, Wikimedia-Mailing-lists: Large Delays or problems with all mailing lists? - https://phabricator.wikimedia.org/T92872#1122306 (Krenair) [18:20:47] operations, Wikimedia-Mailing-lists: Large Delays or problems with all mailing lists? - https://phabricator.wikimedia.org/T92872#1122279 (Krenair) http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Miscellaneous+eqiad&h=polonium.wikimedia.org&jr=&js=&v=76&m=exim+messages+out&vl=messages looks... [18:21:28] operations, Mail, Wikimedia-Mailing-lists: Large Delays or problems with all mailing lists?
- https://phabricator.wikimedia.org/T92872#1122312 (Krenair) [18:23:21] (CR) Jforrester: [C: 1] Enable RESTBase for visual editing on itwiki and plwiki as well [mediawiki-config] - https://gerrit.wikimedia.org/r/197104 (https://phabricator.wikimedia.org/T89066) (owner: GWicke) [18:23:23] operations, ops-codfw: ms-be2009.codfw.wmnet: slot=10 dev=sdk failed - https://phabricator.wikimedia.org/T92833#1122327 (Papaul) I contact Dell, I will have the drive on site tomorrow. [18:26:43] (CR) GWicke: [C: 2] Enable RESTBase for visual editing on itwiki and plwiki as well [mediawiki-config] - https://gerrit.wikimedia.org/r/197104 (https://phabricator.wikimedia.org/T89066) (owner: GWicke) [18:26:48] (Merged) jenkins-bot: Enable RESTBase for visual editing on itwiki and plwiki as well [mediawiki-config] - https://gerrit.wikimedia.org/r/197104 (https://phabricator.wikimedia.org/T89066) (owner: GWicke) [18:27:37] !log gwicke Synchronized wmf-config/InitialiseSettings.php: Use RESTBase with VE on itwiki and plwiki (duration: 00m 07s) [18:27:42] Logged the message, Master [18:28:06] gwicke, enwiki and then dewiki? [18:28:16] that's... untraditional. [18:28:52] Krenair: ;) [18:29:02] :) [18:29:11] frwiki will be next [18:29:15] and that's the biggest VE user [18:31:07] Ops-Access-Requests, operations, Phabricator, Release-Engineering, Patch-For-Review: Chad H. needs access to iridium (Phabricator host) to manage repos - https://phabricator.wikimedia.org/T92564#1122360 (RobH) The sudo review for this passed ops meeting review. We should continue to narrow the s... [18:31:18] (PS1) coren: WIP: new security module for security::pam [puppet] - https://gerrit.wikimedia.org/r/197105 (https://phabricator.wikimedia.org/T85910) [18:32:24] (CR) coren: [C: 1] "WIP do not merge."
[puppet] - https://gerrit.wikimedia.org/r/197105 (https://phabricator.wikimedia.org/T85910) (owner: coren) [18:32:36] (CR) coren: [C: -1] WIP: new security module for security::pam [puppet] - https://gerrit.wikimedia.org/r/197105 (https://phabricator.wikimedia.org/T85910) (owner: coren) [18:32:58] <_joe_> gwicke: I'll try again to use VE now [18:33:02] <_joe_> :)) [18:33:31] _joe_: great ;) [18:34:03] operations, CA-team, Mail, Wikimedia-Mailing-lists: Large Delays or problems with all mailing lists? - https://phabricator.wikimedia.org/T92872#1122380 (Jalexander) [18:34:12] (CR) Dzahn: "approved in ops meeting" [puppet] - https://gerrit.wikimedia.org/r/196613 (owner: Rush) [18:34:16] operations, Multimedia: Add monitoring of upload rate on commons to icingia alerts - https://phabricator.wikimedia.org/T92322#1122382 (Tgr) In theory, all hooks and API requests are logged to graphite; [[ https://www.mediawiki.org/wiki/Manual:Hooks/FileUpload | FileUpload ]] and [[ https://www.mediawiki.or... [18:40:31] (PS1) GWicke: Use RESTBase with VisualEditor on frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/197108 (https://phabricator.wikimedia.org/T89066) [18:41:38] (CR) Mobrovac: [C: 1] "Yeeeey :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/197108 (https://phabricator.wikimedia.org/T89066) (owner: GWicke) [18:42:04] (CR) Jforrester: [C: 1] Use RESTBase with VisualEditor on frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/197108 (https://phabricator.wikimedia.org/T89066) (owner: GWicke) [18:43:08] operations, ops-codfw: ms-be2007.codfw.wmnet: slot=8 dev=sdi failed - https://phabricator.wikimedia.org/T92834#1122445 (Papaul) I contact Dell, I will have the drive on site tomorrow.
[18:43:32] (CR) GWicke: [C: 2] Use RESTBase with VisualEditor on frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/197108 (https://phabricator.wikimedia.org/T89066) (owner: GWicke) [18:43:36] operations, ops-codfw: ms-be2007.codfw.wmnet: slot=3 dev=sdd failed - https://phabricator.wikimedia.org/T92835#1122448 (Papaul) I contact Dell, I will have the drive on site tomorrow. [18:43:42] (Merged) jenkins-bot: Use RESTBase with VisualEditor on frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/197108 (https://phabricator.wikimedia.org/T89066) (owner: GWicke) [18:44:10] !log gwicke Synchronized wmf-config/InitialiseSettings.php: Use RESTBase with VE on frwiki (duration: 00m 08s) [18:44:13] Logged the message, Master [18:44:21] and.. we are live [18:49:54] kaldari: do you somehow not have https://gerrit.wikimedia.org/r/#/c/193400/ in your vagrant? [18:50:05] did it not fix the issue? [18:50:15] regarding https://phabricator.wikimedia.org/T92878 [18:50:57] (PS2) Rush: phab allow chad h to admin sane things [puppet] - https://gerrit.wikimedia.org/r/196613 [18:51:33] gah, wrong channel [18:51:34] aude: oh, maybe it’s already fixed now. I just remembered that I forgot to file a bug about it when I ran into the problem. I’ll update the bug.
[18:52:08] please check [18:53:01] (CR) Rush: [C: 2] phab allow chad h to admin sane things [puppet] - https://gerrit.wikimedia.org/r/196613 (owner: Rush) [18:53:26] jouncebot, next [18:53:26] In 0 hour(s) and 6 minute(s): Citoid extension deployment to group0 wikis (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150316T1900) [18:56:49] (PS1) Rush: phab add group phabricator-admins to iridium [puppet] - https://gerrit.wikimedia.org/r/197111 [18:58:20] (PS2) Rush: phab add group phabricator-admins to iridium [puppet] - https://gerrit.wikimedia.org/r/197111 [19:00:04] Krenair, James_F: Respected human, time to deploy Citoid extension deployment to group0 wikis (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150316T1900). Please do the needful. [19:00:10] Woo-hoo. [19:00:13] ok [19:00:37] akosiaris: https://phabricator.wikimedia.org/T92845 if you can just take a quick look [19:00:45] i've been stuck at it the whole afternoon [19:01:30] mobrovac: yeah that is what I am looking [19:01:36] so [19:01:38] yey :) [19:01:39] (CR) Rush: [C: 2] phab add group phabricator-admins to iridium [puppet] - https://gerrit.wikimedia.org/r/197111 (owner: Rush) [19:01:45] fatal: reference is not a tree: 2cd246b99fd2f407b92c33a079a1120f0abe7117 [19:01:49] right [19:01:59] that's the commit sha1 which exists [19:01:59] which gets me here http://stackoverflow.com/questions/2155887/git-submodule-head-reference-is-not-a-tree-error [19:02:15] James_F, don't we want labs wgCitoidServiceUrl to point to citoid.wmflabs.org still? [19:02:21] hmmm [19:02:51] Krenair: I'd rather we were testing a Beta Cluster version of the service, but until that happens we should probably test production.
[19:02:59] ok [19:03:47] akosiaris: that's a new one, we had "unable to find" this afternoon :) [19:04:31] (PS6) Alex Monk: Provide the Citoid extension for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/187132 (owner: Jforrester) [19:05:01] Krenair: I think we need to sync the new extension before the config to enable it. [19:05:01] (CR) Alex Monk: [C: 2] Provide the Citoid extension for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/187132 (owner: Jforrester) [19:05:03] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: puppet fail [19:05:06] yeah [19:05:09] we do [19:05:55] akosiaris: i have no idea how this happens, since on tin this commit exists [19:07:24] (CR) Tim Landscheidt: "I think this is overly complicated. For similar cases, e. g. sudoers.d, we just require that resources aren't only removed, but "ensure =" [puppet] - https://gerrit.wikimedia.org/r/197105 (https://phabricator.wikimedia.org/T85910) (owner: coren) [19:07:38] mobrovac: I think I have an idea [19:07:49] oh yeah [19:07:51] :) [19:07:59] (Merged) jenkins-bot: Provide the Citoid extension for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/187132 (owner: Jforrester) [19:09:29] mobrovac: OK fixed. So I 've encountered this before. What I did do to solve it was akosiaris@tin:/srv/deployment/citoid/deploy/src$ git update-server-info [19:09:32] and redeploy [19:09:49] I 'll log it into the ticket, I consider it a trebuchet bug [19:10:10] interesting, in my desperation i could swear i've tried it too [19:10:17] akosiaris: thnx a looot! [19:10:29] akosiaris: you've already re-deployed? [19:10:51] mobrovac: yup [19:10:56] :) [19:11:02] but I now got another error on sca1001 [19:11:11] i'll add this info to https://wikitech.wikimedia.org/wiki/Trebuchet#Troubleshooting [19:11:15] Error: Cannot find module 'bunyan' [19:11:19] akosiaris: that's good :) [19:11:27] ok, that's my bad probably [19:11:27] how come ?
[19:11:31] i'll fix it
[19:11:35] ok, thanks
[19:11:40] akosiaris: because that we can fix :D
[19:12:14] I like that trebuchet though did not issue a restart on sca1002
[19:12:21] so the service is not down despite the problem :-)
[19:14:13] operations: iridium "standard" conflict with exim in role - https://phabricator.wikimedia.org/T92879#1122552 (chasemp) NEW a:yuvipanda
[19:14:13] (CR) Andrew Bogott: [C: 2] Block in firstboot until NFS mounts are available. [puppet] - https://gerrit.wikimedia.org/r/196233 (owner: Andrew Bogott)
[19:15:39] (PS1) Rush: phab work around dupe exim waiting on T9287 [puppet] - https://gerrit.wikimedia.org/r/197112
[19:15:54] (PS2) Rush: phab work around dupe exim waiting on T9287 [puppet] - https://gerrit.wikimedia.org/r/197112
[19:16:32] (PS4) Andrew Bogott: Moved the dns::recursor class into a module [puppet] - https://gerrit.wikimedia.org/r/196621
[19:16:36] !log krenair Started scap: Citoid extension deployment
[19:16:40] Logged the message, Master
[19:17:04] !log krenair scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.zUFETE2jD1" ' returned non-zero exit status 1 (duration: 00m 27s)
[19:17:06] Logged the message, Master
[19:17:10] oh dear /
[19:17:12] :\
[19:18:21] anyone know why that fails?
[19:18:32] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[19:18:40] ohh
[19:19:08] (PS5) Andrew Bogott: Moved the dns::recursor class into a module [puppet] - https://gerrit.wikimedia.org/r/196621
[19:19:12] James_F, I think we need to add the extension to wmf20 as well?
[19:19:18] even though it'd only be enabled on testwikis
[19:19:27] "Extension /srv/mediawiki-staging/php-1.25wmf20/extensions/Citoid/Citoid.php doesn't exist" :|
[19:19:36] Krenair: Ah.
[19:19:42] * James_F sighs.
[19:19:43] that's not supposed to exist, it's wmf21 only...
[19:19:45] Yay hetdeploy.
[19:20:32] Krenair: Can you do that or should I?
[19:20:40] please do
[19:20:44] OK.
[19:21:18] (CR) Andrew Bogott: [C: 2] Moved the dns::recursor class into a module [puppet] - https://gerrit.wikimedia.org/r/196621 (owner: Andrew Bogott)
[19:21:26] akosiaris: yuhuu, citoid's up on sca1001
[19:21:29] cheers a lot
[19:21:40] mobrovac: nice!
[19:21:43] RECOVERY - citoid on sca1001 is OK: HTTP OK: HTTP/1.1 200 OK - 745 bytes in 0.007 second response time
[19:21:52] hehe ^^
[19:22:07] Woo.
[19:22:25] !log restart citoid on sca1002
[19:22:26] (PS3) Andrew Bogott: Moved the labs ldap dns manifest into a module [puppet] - https://gerrit.wikimedia.org/r/196638
[19:22:31] Logged the message, Master
[19:22:38] but yeah I guess it makes sense looking at extension-list :(
[19:22:45] (CR) coren: "Except that this would not suffice according to spec. In addition to having the file removed, you need to run pam-auth-update (with --rem" [puppet] - https://gerrit.wikimedia.org/r/197105 (https://phabricator.wikimedia.org/T85910) (owner: coren)
[19:25:42] mobrovac: should I close https://phabricator.wikimedia.org/T92845?
[19:25:49] (CR) Andrew Bogott: [C: 2] Moved the labs ldap dns manifest into a module [puppet] - https://gerrit.wikimedia.org/r/196638 (owner: Andrew Bogott)
[19:26:02] akosiaris: i'm adding a comment now and will close it
[19:26:09] cool, thanks!
[19:26:41] Krenair: https://gerrit.wikimedia.org/r/197116 OK?
[19:27:06] operations, Citoid, VisualEditor, VisualEditor 2014/15 Q3 blockers: Improve citoid production service - https://phabricator.wikimedia.org/T90281#1122634 (Jdforrester-WMF)
[19:27:12] waiting for jenkins
[19:28:25] Hm, at current speeds we’ll have all out puppet code in modules by 2019 :(
[19:29:42] operations, Citoid, Services: Provide service alerting/statistics for the citoid and zotero services - https://phabricator.wikimedia.org/T87496#1122647 (akosiaris)
[19:30:10] operations, Citoid, Services: Provide service alerting/statistics for the citoid and zotero services - https://phabricator.wikimedia.org/T87496#992754 (akosiaris) As mentioned above, the alerting part is done
[19:30:17] akosiaris: The only think marked as blocking https://phabricator.wikimedia.org/T90281 (citoid being better) is https://phabricator.wikimedia.org/T87496 (monitoring for citoid/zotero/proxy).
[19:30:25] Ha. Already ahead of me. :-D
[19:30:28] (PS6) Andrew Bogott: Roughed in designate class [puppet] - https://gerrit.wikimedia.org/r/191471
[19:30:30] (PS1) Andrew Bogott: pdns class and setup for labs designate [puppet] - https://gerrit.wikimedia.org/r/197117
[19:30:44] James_F: :-)
[19:31:09] akosiaris: On the stats side, is that something mvolz/Services can do for themselves?
[19:31:55] James_F: I can take that on
[19:31:57] both stats and monitoring
[19:32:01] James_F: As far as citoid goes, yes. Send stuff to graphite I suppose. As far as zotero goes, I am not very optimistics
[19:32:03] (PS1) GWicke: Use RESTBase for VisualEditor on all wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/197118
[19:32:05] oh, monitoring is done
[19:32:06] optimistic*
[19:32:10] akosiaris: :-)
[19:32:16] James_F: want me to do that?
[19:32:16] akosiaris: Do we need to monitor the proxy service?
[19:32:22] ori: That'd be awesome if possible.
[19:32:53] (CR) Jforrester: [C: 1] "Let's do this in the Services window (13:00 PST) after the Parsoid deploy is done?" [mediawiki-config] - https://gerrit.wikimedia.org/r/197118 (owner: GWicke)
[19:32:58] James_F: sure, on it
[19:32:59] James_F: url-downloader ? We already do.
[19:33:15] akosiaris: Aha. Even better.
[19:33:25] operations, Citoid, Services: Provide service alerting/statistics for the citoid and zotero services - https://phabricator.wikimedia.org/T87496#1122669 (ori) a:ori
[19:33:42] akosiaris: Is there some magic incantation I have to say to get alerting to the services group for SMS?
[19:33:52] (Assuming that's even possible?)
[19:33:56] (CR) Andrew Bogott: [C: 2] "(This will be isolated to Holmium which plays no role in production at the moment.)" [puppet] - https://gerrit.wikimedia.org/r/191471 (owner: Andrew Bogott)
[19:34:40] (CR) Andrew Bogott: [C: 2] pdns class and setup for labs designate [puppet] - https://gerrit.wikimedia.org/r/197117 (owner: Andrew Bogott)
[19:34:52] James_F: like shiboleet ? no not really, but a task would be fine
[19:35:52] (PS4) Ottomata: Set up eventlogging varnishkafka instance in betalabs on bits host [puppet] - https://gerrit.wikimedia.org/r/197095
[19:36:25] !log krenair Started scap: Citoid extension deployment
[19:36:35] akosiaris: Kk.
[19:37:08] (CR) Ottomata: [C: 2] Set up eventlogging varnishkafka instance in betalabs on bits host [puppet] - https://gerrit.wikimedia.org/r/197095 (owner: Ottomata)
[19:38:30] operations, Citoid, Services: Add citoid service alerts to the "services" group for SMS alerts - https://phabricator.wikimedia.org/T92887#1122704 (Jdforrester-WMF) NEW
[19:39:19] !log krenair Finished scap: Citoid extension deployment (duration: 02m 54s)
[19:39:22] Logged the message, Master
[19:39:27] that was ... quicker than expected...
[19:39:49] James_F?
[19:40:13] uh oh
[19:40:57] Krenair: That sounds bad.
[19:41:03] !log krenair Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
[19:41:07] Logged the message, Master
[19:41:11] indeed: 14995 Undefined variable: wmgUseCitoid in /srv/mediawiki/wmf-config/CommonSettings.php on line 2055 :|
[19:41:17] Bleh.
[19:41:23] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: Puppet has 1 failures
[19:41:33] For wmf20?
[19:41:38] 'Cos it seems to be working here.
[19:41:44] Yeah it was working for me as well
[19:41:49] then I swapped to fatalmonitor and saw that :|
[19:41:51] Hmm.
[19:42:02] touched the file which sets that variable and synched it again
[19:42:09] seems fine now?
[19:42:10] Krenair: Is it dying down now?
[19:42:12] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[19:42:23] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[19:42:26] that's paravoid, working on it
[19:42:46] sorry, yes
[19:42:50] !log working on mailman issues
[19:42:53] Logged the message, Master
[19:43:21] James_F, seems OK now...
[19:43:49] didn't see any complaints in the obvious channels either. very strange
[19:44:01] Krenair: Still needs a scap though.
[19:44:17] It did have a scap
[19:45:03] although the i18n messages have not appeared for some reason
[19:46:13] Still needs i18n, then. :-)
[19:46:18] yeah
[19:46:28] scap should have done that though, shouldn't it?
[19:48:22] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[19:48:33] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[19:49:16] did you add the extension to both branches? I remember scap not liking if you only add it to one
[19:49:29] Ran into that problem, fixed
[19:49:46] scap enabled the extension on the right wikis
[19:49:52] but they are missing i18n
[19:49:52] (PS1) Andrew Bogott: Added a node definition for holmium [puppet] - https://gerrit.wikimedia.org/r/197122
[19:50:25] operations, Labs: setup / deploy holmium as designate server - https://phabricator.wikimedia.org/T92507#1122767 (Andrew) Open>Resolved Working fine -- thanks!
[19:50:26] Krenair: is it a new extension?
[19:50:29] yes
[19:50:43] There's a file it has t be added to...
[19:50:44] operations, Citoid, Services: Add citoid service alerts to the "services" group for SMS alerts - https://phabricator.wikimedia.org/T92887#1122772 (mobrovac) Is it possible to become a recipient of such text messages?
[19:50:45] * bd808 looks
[19:51:26] extension-list? yeah done that
[19:51:36] Does it need to run twice or something silly?
[19:51:56] (CR) Andrew Bogott: [C: 2] Added a node definition for holmium [puppet] - https://gerrit.wikimedia.org/r/197122 (owner: Andrew Bogott)
[19:52:04] Krenair: ExtensionMessages-1.25wmfNN.php
[19:52:10] Ha.
[19:52:11] * James_F sighs.
[19:52:21] Citoid extension is listed there already bd808
[19:52:32] 'Citoid' => "$IP/extensions/Citoid/i18n",
[19:52:40] under wgMessagesDirs
[19:52:54] (nod*
[19:52:59] so not that then... hmm
[19:54:15] Krenair: what would an example message key be?
[19:54:21] citoid-desc
[19:54:25] any of https://github.com/wikimedia/mediawiki-extensions-Citoid/blob/master/i18n/en.json
[19:54:38] krenair@tin:/srv/mediawiki-staging/php-1.25wmf21$ mwscript eval.php test2wiki
[19:54:39] > var_dump( wfMessage( 'citoid-desc' )->text() );
[19:54:39] string(13) "<citoid-desc>"
[19:54:41] it takes a long time to return that
[19:55:23] (CR) GWicke: "@James: Yup, that should work timing-wise." [mediawiki-config] - https://gerrit.wikimedia.org/r/197118 (owner: GWicke)
[19:55:32] Yeah... It's not in the cdb json dumps
[19:55:39] at least for wmf20
[19:55:56] it's not enabled on any wmf20 wikis if that matters
[19:55:57] just wmf21
[19:56:04] (And it won't be.)
[19:56:10] not in the wmf21 cache either
[19:56:15] right, that's the issue
[19:56:21] scap should put it there, right?
[19:56:29] yes
[19:57:23] Times like this are when I miss Reedy most
[19:57:41] akosiaris: Argh. How hard will it be for citoid to provide access via SSL? Totally forgot…
[19:58:37] operations, Citoid: citoid.wikimedia.org needs to provide HTTPS access as well as HTTP - https://phabricator.wikimedia.org/T92891#1122833 (Jdforrester-WMF) NEW
[19:58:54] operations, Citoid: citoid.wikimedia.org needs to provide HTTPS access as well as HTTP - https://phabricator.wikimedia.org/T92891#1122840 (Jdforrester-WMF)
[19:59:53] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:00:04] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150316T2000).
[20:00:49] operations, Citoid, VisualEditor 2014/15 Q3 blockers: citoid.wikimedia.org needs to provide HTTPS access as well as HTTP - https://phabricator.wikimedia.org/T92891#1122833 (Jdforrester-WMF)
[20:01:12] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[20:02:03] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[20:03:00] (still me)
[20:03:41] subbu: if you want to go a bit later I could do a quick config change first
[20:04:09] i was about to start in a few mins, but can wait till you are done as well.
[20:04:12] bd808, still looking at the i18n thing?
[20:04:21] subbu: should take ~1min
[20:04:26] ok
[20:04:32] (CR) GWicke: [C: 2] Use RESTBase for VisualEditor on all wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/197118 (owner: GWicke)
[20:04:34] ok, going for it
[20:04:36] (Merged) jenkins-bot: Use RESTBase for VisualEditor on all wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/197118 (owner: GWicke)
[20:04:54] Krenair: The config files /look/ right but something apparently didn't take
[20:05:02] !log gwicke Synchronized wmf-config/InitialiseSettings.php: Use RESTBase with VE on all wikipedia (duration: 00m 08s)
[20:05:04] *s
[20:05:06] Logged the message, Master
[20:05:17] subbu: done
[20:05:27] alright.
[20:05:51] Krenair: There may indeed be a "run scap twice" bug here when adding a new extension outside a train deploy
[20:06:09] So scapping again wouldn't hurt at all
[20:06:18] gwicke, subbu: you guys done deploying?
[20:06:31] Krenair, i am just about starting now.
[20:06:40] Krenair: subbu is next, then I have a restbase deploy ready
[20:06:41] if all goes well, should be done in 10-15 mins .. or less.
[20:06:47] ok
[20:07:05] would you mind letting me know when you're done please?
[20:08:12] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[20:08:13] Krenair: yes, will do
[20:08:20] thanks
[20:08:32] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[20:12:26] !log deployed parsoid sha ccf4c140
[20:12:29] Logged the message, Master
[20:12:32] gwicke, deployed .. now to verify and test ..
[20:12:41] Krenair: I'm an idiot.
[20:12:48] James_F?
[20:12:59] Krenair: " $wgCitoidServiceUrl = 'http://citoid.wikimedia.org/api'; " should be protocol-relative.
[20:13:19] ah
[20:13:29] let's fix that? :)
[20:13:32] subbu: okay, I'll get ready in the meantime
[20:13:49] James_F, I saw https://phabricator.wikimedia.org/T92891#1122833 :)
[20:13:57] Krenair: :-)
[20:15:27] Ops-Access-Requests, operations, Phabricator, Release-Engineering, Patch-For-Review: Chad H. needs access to iridium (Phabricator host) to manage repos - https://phabricator.wikimedia.org/T92564#1122905 (Dzahn) https://gerrit.wikimedia.org/r/#/c/196613/ has been merged and narrows the rights down...
[20:15:30] James_F, so it does actually support https?
[20:15:44] we're just not using it?
[20:16:02] (PS1) Jforrester: Follow-up 7b22a4e: Protocol-relative URLs FTW [mediawiki-config] - https://gerrit.wikimedia.org/r/197127
[20:16:13] Krenair: Apparently: https://citoid.wikimedia.org/
[20:16:23] gwicke, looks good to me.
[20:16:31] Krenair: Non-problem masked by Beta Cluster's lack of HTTPS.
[20:16:33] subbu: ok
[20:16:55] operations, Citoid, VisualEditor 2014/15 Q3 blockers: citoid.wikimedia.org needs to provide HTTPS access as well as HTTP - https://phabricator.wikimedia.org/T92891#1122910 (Jdforrester-WMF) Open>Resolved a:Jdforrester-WMF I am an idiot. It does.
[20:17:14] I think that counts as invalid
[20:17:17] but ok
[20:17:26] Krenair: It was actively fixed in the past, I now remember.
[20:17:32] Krenair: But whatever. :-)
[20:18:27] Krenair: Push that fix out with your scap?
[20:18:31] yeah, will do
[20:18:35] * James_F nods.
[20:18:39] everything ok gwicke?
[20:19:23] Krenair: just getting ready
[20:19:30] ok
[20:20:45] operations, ops-codfw, network, wikis-in-codfw: Codfw mediawiki appservers from any rows but row A can't communicate with the dhcp server - https://phabricator.wikimedia.org/T92815#1122932 (RobH) a:RobH>Joe Ok, the descriptions of these ports need to be updated, but the vlans are set. @joe: y...
[20:20:52] operations, Citoid, Services: Add citoid service alerts to the "services" group for SMS alerts - https://phabricator.wikimedia.org/T92887#1122937 (Dzahn) re: SMS alerts: this depends on the Icinga contactgroup like "sms" or "parsoid" defined in: puppet://modules/icinga/files/contactsgroups.cfg to bec...
[20:21:37] operations, Citoid, Services: Add citoid service alerts to the "services" group for SMS alerts - https://phabricator.wikimedia.org/T92887#1122939 (Jdforrester-WMF) Ah, sorry, thought we'd renamed the "parsoid" group to cover all services.
[20:23:53] gwicke: Deployment zone is clear for you. :-)
[20:23:59] operations, Citoid, Services: Add citoid service alerts to the "services" group for SMS alerts - https://phabricator.wikimedia.org/T92887#1122954 (Dzahn) Since we are using mail 2 SMS gateways, your phone provider would have to offer a gateway for that. Major providers usually have that while resellers...
[20:25:29] (CR) Tim Landscheidt: "You're right. But a) we could then just replace the "ensure => absent" with an Exec that calls pam-auth-update --remove $file && rm -f, o" [puppet] - https://gerrit.wikimedia.org/r/197105 (https://phabricator.wikimedia.org/T85910) (owner: coren)
[20:29:53] James_F: I did the RB roll-out at the start of the window
[20:30:05] Ah. :-)
[20:30:08] Never mind, then.
[20:30:14] Krenair: Clear for you, it seems.
[20:30:35] planned to update RB too, but have trouble validating the deploy on the staging cluster
[20:30:43] Krenair: go ahead
[20:30:55] (CR) Alex Monk: [C: 2] Follow-up 7b22a4e: Protocol-relative URLs FTW [mediawiki-config] - https://gerrit.wikimedia.org/r/197127 (owner: Jforrester)
[20:30:59] (Merged) jenkins-bot: Follow-up 7b22a4e: Protocol-relative URLs FTW [mediawiki-config] - https://gerrit.wikimedia.org/r/197127 (owner: Jforrester)
[20:31:42] James_F: I was at the National Institutes of Health. Someone wanted to add missing metadata to his citations, so I got him to turn on Visual Editor and I showed him the nice interface for editing citation data. He appreciated it.
[20:31:55] harej: You think that's nice, you should wait for the new one. :-)
[20:32:07] I am waiting for VisualEditor on-by-default for new accounts.
[20:32:08] !log krenair Started scap: https://gerrit.wikimedia.org/r/#/c/197127/ - and also try to fix citoid i18n on test wikis
[20:32:11] Logged the message, Master
[20:32:34] harej: Hopefully this will be a positive conversation next time.
[20:32:43] For the hell of it, I will speak as a corporation. Wikimedia District of Columbia is waiting for you to turn VisualEditor on by default for new accounts.
[20:32:55] * James_F grins.
[20:33:00] (PS1) Ori.livneh: Add configurable metric reporting for citoid [puppet] - https://gerrit.wikimedia.org/r/197131
[20:33:53] !log krenair Finished scap: https://gerrit.wikimedia.org/r/#/c/197127/ - and also try to fix citoid i18n on test wikis (duration: 01m 45s)
[20:33:56] Logged the message, Master
[20:34:11] I'm also planning on making a flyer for circulation at events teaching people how to turn it on, but that seems unnecessary, no?
[20:34:36] harej: Yeah, hopefully.
[20:34:49] James_F, bd808: I think the i18n is still broken :/
[20:34:56] Krenair: Yeah. :-(
[20:35:12] Krenair: :( I don't see new keys in /srv/mediawiki-staging/php-1.25wmf21/cache/l10n/upstream/*.json
[20:35:41] I don't get it... the extension is clearly installed on the wikis and scap has been run twice
[20:35:57] tim changed file permission schemes somewhat
[20:36:00] possibly related?
[20:36:00] James_F: hey, isn't citoid already having HTTPS ?https://citoid.wikimedia.org/ ?
[20:36:26] ori: possibly I guess. If so all l10n will be stuck I would think
[20:36:37] akosiaris: Yes, sorry, I'm an idiot.
[20:36:41] akosiaris, it is, we just weren't using it. fixed now.
[20:36:43] akosiaris: You've already fixed it. :-)
[20:36:57] :-)
[20:37:07] (CR) coren: "I don't particularly like leaving security configuration to "let's not forget to do X when Y happens" - especially since a broken PAM conf" [puppet] - https://gerrit.wikimedia.org/r/197105 (https://phabricator.wikimedia.org/T85910) (owner: coren)
[20:37:10] Krenair: I'll look at the scap code and see if my memory is jarred for other weirdness that needs to happen
[20:37:14] akosiaris: https://gerrit.wikimedia.org/r/#/c/197131/ btw
[20:38:24] mw-update-l10n?
[20:38:27] operations, RESTBase, RESTBase-Cassandra, Patch-For-Review: enable authenticated access to Cassandra JMX - https://phabricator.wikimedia.org/T92471#1122988 (Eevans) > I think managing this at the network level might result in better security & usability. I see two main options: > [ ... ] > See also...
[20:38:30] ori: nice!
[20:39:16] Krenair: scap runs that, but it's the file to look at yes
[20:40:04] operations, Wikimedia-Shop, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1122991 (vshchepakina) @Dzahn yes, we are "enterprise" customers.
[20:44:26] (PS2) Bmansurov: [WikiGrok] Create 'film director' campaign [mediawiki-config] - https://gerrit.wikimedia.org/r/194373
[20:44:28] (PS2) Bmansurov: [WikiGrok] Add new suggestions to the actor campaign [mediawiki-config] - https://gerrit.wikimedia.org/r/194354 (owner: Phuedx)
[20:44:30] (PS2) Bmansurov: [WikiGrok] Add the "filmProducer" campaign [mediawiki-config] - https://gerrit.wikimedia.org/r/194503 (owner: Phuedx)
[20:44:32] (PS2) Bmansurov: [WikiGrok] Create 'screenwriter' campaign [mediawiki-config] - https://gerrit.wikimedia.org/r/194378
[20:46:33] bd808, I tried to rebuildLocalisationCache
[20:46:42] [4f0b3aef] [no req] MWException from line 1304 of /srv/mediawiki-staging/php-1.25wmf21/includes/cache/LocalisationCache.php: Unable to open CDB file for write "/srv/mediawiki-staging/php-1.25wmf21/cache/l10n/l10n_cache-en.cdb"
[20:46:46] both as me and as www-data
[20:46:57] needs to run as l10nupdate
[20:47:03] failed to open stream: Permission denied in
[20:47:08] and as l10nupdate
[20:47:59] rw-rw-r-- 1 l10nupdate l10nupdate 3036238 Mar 14 02:31 l10n_cache-en.cdb
[20:48:22] (CR) QChris: "ping" [puppet] - https://gerrit.wikimedia.org/r/195262 (owner: QChris)
[20:48:56] operations, ops-codfw, wikis-in-codfw: Configure mw2001-2134 correctly - https://phabricator.wikimedia.org/T91238#1123029 (Papaul) Update BIOS settings for logical processor and redirection after boot to enabled for mw2080 -mw2100. But not mw2098. it looks like mw2098 have hardware problem. i will look...
[20:49:20] bd808, hang on, doesn't mwscript try to sudo as www-data?
[20:49:47] oh ffs
[20:49:54] operations, CA-team, Mail, Wikimedia-Mailing-lists: Large Delays or problems with all mailing lists? - https://phabricator.wikimedia.org/T92872#1123043 (akosiaris) Hello, Some extra information as to which mails have been delayed so we can pinpoint the problem would be helpful. Especially helpful w...
[20:49:55] (PS2) Dzahn: switch ferm rule in releases to use network.pp [puppet] - https://gerrit.wikimedia.org/r/197062
[20:50:02] how could that work?
[20:50:04] it didn't if you were l10nupdate until last week :(
[20:50:20] That's what is broken
[20:50:23] and it's new
[20:50:33] bd808: need me to chown something?
[20:50:47] well....
[20:51:19] l10nupdate owns the cdb files but Tim changed mwscript so it always sudos to www-data now
[20:51:36] which I think wa on purpose but messes up l10nupdate
[20:52:36] so if I make mwscript allow l10nupdate to run the scripts directly... it should work?
[20:52:42] (PS3) Dzahn: switch ferm rule in releases to use network.pp [puppet] - https://gerrit.wikimedia.org/r/197062
[20:52:59] yeah, but I think Tim was changing that on purpose
[20:53:04] :/
[20:53:07] (PS4) Dzahn: switch ferm rule in releases to use network.pp [puppet] - https://gerrit.wikimedia.org/r/197062
[20:53:16] just revert his change
[20:53:16] so maybe the files need to be owned by www-data now?
[20:53:31] but that seems yucky too
[20:53:35] would that resolve things?
[20:53:41] bd808, that would probably break l10nupdate running on its own
[20:53:48] having the cache owned by the web user is not awesome
[20:54:15] https://gerrit.wikimedia.org/r/#/c/196132/
[20:54:20] (PS1) Bmansurov: WikiGrok: Add a new 'politician' campaign [mediawiki-config] - https://gerrit.wikimedia.org/r/197142
[20:54:37] l10nupdate should not sync
[20:54:50] we lived without it, and we can continue to live without it
[20:54:54] it's a security and permission nightmare
[20:55:04] The sync isn't the problem
[20:55:09] it's writing to the prep area
[20:55:26] if it's not syncing, can't the work being performed by l10nupdate happen when we scap?
[20:55:29] the sync has been there for literally years
[20:55:30] or would that be too slow?
[20:55:47] it takes like 7-9 minutes
[20:56:01] Hi. I've been having troubles accessing stat1003.eqiad.wmnet, and I'd appreciate a few minutes of trouble shooting.
[20:56:09] which is why we do it once and sync the output
[20:56:11] HaithamS: what's the problem?
[20:56:35] (CR) Dzahn: [C: 2] "exactly like in https://gerrit.wikimedia.org/r/#/c/197079/ and checking on caesium, should be noop" [puppet] - https://gerrit.wikimedia.org/r/197062 (owner: Dzahn)
[20:56:36] whenever I try to ssh I get Permission denied (publickey).
[20:57:02] HaithamS: I'll PM
[20:59:55] (PS1) Bmansurov: WikiGrok: Add a new 'writer' campaign [mediawiki-config] - https://gerrit.wikimedia.org/r/197184
[21:00:52] (PS1) Dzahn: releases: fix syntax error in ferm rule [puppet] - https://gerrit.wikimedia.org/r/197189
[21:00:58] Krenair: mind making an Unbreak Now! ticket about this?
[21:01:05] (CR) Dzahn: "yea, "exactly like", including the syntax error :p" [puppet] - https://gerrit.wikimedia.org/r/197062 (owner: Dzahn)
[21:01:55] operations, CA-team, Mail, Wikimedia-Mailing-lists: Large Delays or problems with all mailing lists? - https://phabricator.wikimedia.org/T92872#1123097 (faidon) Open>Resolved a:faidon Mailman was hammered by hundreds of thousands of subscriptions (someone mailbombing some particular email a...
[21:02:08] (CR) Dzahn: [C: 2] releases: fix syntax error in ferm rule [puppet] - https://gerrit.wikimedia.org/r/197189 (owner: Dzahn)
[21:02:09] operations: Allocate a few servers to logstash - https://phabricator.wikimedia.org/T87031#1123102 (RobH)
[21:02:10] operations, Patch-For-Review: reclaim lsearchd hosts - https://phabricator.wikimedia.org/T86149#1123100 (RobH) Open>Resolved nothing, chris added back to spares, resolving
[21:02:11] operations, ops-eqiad, Patch-For-Review: Decommission lsearchd - https://phabricator.wikimedia.org/T85009#1123103 (RobH)
[21:02:33] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[21:02:56] operations, ops-eqiad, Patch-For-Review: Decommission lsearchd - https://phabricator.wikimedia.org/T85009#1123112 (RobH) Open>Resolved with adding the old systems back to spares, I think that ends the decom steps for lsearch. Resolving.
[21:03:36] bd808, done
[21:03:39] (CR) Dzahn: "besides that it's ok though" [puppet] - https://gerrit.wikimedia.org/r/197062 (owner: Dzahn)
[21:06:12] operations, CA-team, Mail, Wikimedia-Mailing-lists: Large Delays or problems with all mailing lists? - https://phabricator.wikimedia.org/T92872#1123126 (Jalexander) >>! In T92872#1123097, @faidon wrote: > Mailman was hammered by hundreds of thousands of subscriptions (someone mailbombing some partic...
[21:07:24] Krenair: Could it be fixed by ori running it as root or something ghastly-but-at-least-works?
[21:07:40] hey ori, feel like running scap as root? :P
[21:07:50] * James_F grins.
[21:07:52] James_F: nope
[21:07:59] * James_F sighs.
[21:08:01] but Tim-away will be looking at it
[21:08:06] OK.
[21:08:08] maybe the mwscript file can just be reverted by a root
[21:08:10] and we will fix it ASAP
[21:08:24] then we can try scap again
[21:08:25] bd808: Thanks. :-)
[21:08:32] (and thank you Tim) :)
[21:08:33] (PS1) Andrew Bogott: Replace a too-futuristing api-paste.ini [puppet] - https://gerrit.wikimedia.org/r/197206
[21:08:33] I don't get why this wasn't showing an error in my terminal though
[21:08:52] to show an error one must know an error exists
[21:08:53] greg-g: Of course, but thanking Tim-away goes without saying. :-)
[21:08:55] * greg-g gets zen
[21:08:55] (PS2) Andrew Bogott: Replace a too-futuristic api-paste.ini [puppet] - https://gerrit.wikimedia.org/r/197206
[21:10:11] (PS3) Andrew Bogott: Replace a too-futuristic api-paste.ini [puppet] - https://gerrit.wikimedia.org/r/197206
[21:10:14] do you want it to be fixed in less than an hour? I am in a meeting until then
[21:11:17] (CR) Andrew Bogott: [C: 2] Replace a too-futuristic api-paste.ini [puppet] - https://gerrit.wikimedia.org/r/197206 (owner: Andrew Bogott)
[21:11:31] (Abandoned) Dzahn: don't use 'ndots: 2' in labs resolv.conf [puppet] - https://gerrit.wikimedia.org/r/196731 (https://phabricator.wikimedia.org/T92351) (owner: Dzahn)
[21:13:22] (CR) Dzahn: [C: 1] Clean up bastionhost domain_search [puppet] - https://gerrit.wikimedia.org/r/196964 (owner: Hoo man)
[21:14:32] TimStarling, I don't think it's *that* urgent
[21:14:51] it should probably be fixed before the next train deploy
[21:14:58] James_F, Krenair: sorry, I didn't notice the ping. Do you still need me to do anything?
[21:15:04] tomorrow morning
[21:15:06] ori: no
[21:15:25] * James_F nods.
[21:17:11] (PS1) Andrew Bogott: Set up keystone config for designate [puppet] - https://gerrit.wikimedia.org/r/197208
[21:18:53] (CR) Andrew Bogott: [C: 2] Set up keystone config for designate [puppet] - https://gerrit.wikimedia.org/r/197208 (owner: Andrew Bogott)
[21:19:23] !log installing a non-puppetized version of the puppet cronjob on nescio, sodium. The new well thought out puppet-run can not run on lucid hosts since https://gerrit.wikimedia.org/r/#/c/196162/ . Given they go away soon, it is better to not do weird puppet tricks to accomodate for just 2 old, soon to be deprecated, boxes.
[21:19:29] Logged the message, Master
[21:20:03] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:22:24] operations, Patch-For-Review, domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1123263 (RobH) I've tried to get someone to review https://gerrit.wikimedia.org/r/#/c/196321/ and added various folks, however no one seems to want to review.
[21:22:35] operations, Patch-For-Review, domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1123265 (RobH) p:Triage>High
[21:23:37] operations, Patch-For-Review, domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1114259 (RobH) @slaporte: I have no idea, I thought you guys handled the domain transfers. Did you want me to ask Doneva? (Since you initially asked us to add support, I assumed you had hand...
[21:23:39] (CR) Ori.livneh: [C: 1] adding support to redirect wikimedia.xyz to wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/196321 (owner: RobH)
[21:24:00] heh, thx ori
[21:24:06] i mostly need someone in ops to say im sane
[21:24:24] (CR) BBlack: [C: 1] adding support to redirect wikimedia.xyz to wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/196321 (owner: RobH)
[21:24:32] yay, thx
[21:24:39] now to add to calendar.
[21:27:10] (PS1) Andrew Bogott: Give designate access to RabbitMQ on virt1000 [puppet] - https://gerrit.wikimedia.org/r/197210
[21:28:22] operations, Patch-For-Review, domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1123304 (Slaporte) Ah ok. I'm happy to do the domain transfer. Could you forward to me the registrar credentials that were emailed to dns-admin@wikimedia.org? Thanks!
[21:28:55] (PS2) Andrew Bogott: Give designate access to RabbitMQ on virt1000 [puppet] - https://gerrit.wikimedia.org/r/197210
[21:30:07] (CR) Andrew Bogott: [C: 2] Give designate access to RabbitMQ on virt1000 [puppet] - https://gerrit.wikimedia.org/r/197210 (owner: Andrew Bogott)
[21:37:56] @seen Gloria
[21:37:56] mutante: Last time I saw Gloria they were quitting the network with reason: Ping timeout: 246 seconds N/A at 2/10/2015 10:55:28 AM (34d10h42m27s ago)
[21:39:47] mutante: Fiona
[21:40:51] ori: thanks
[21:42:03] operations, Patch-For-Review, domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1123452 (RobH) I got reviews from @ori and @bblack =] (thx guys!) As such, I've added this to https://wikitech.wikimedia.org/wiki/Deployments#Tuesday.2C.C2.A0March.C2.A017 for the 2PM Pacific...
[21:42:54] operations, Patch-For-Review, domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1123454 (RobH) p:High>Normal
[21:44:19] (PS1) Andrew Bogott: Holmium dns should use holmium's ip. [puppet] - https://gerrit.wikimedia.org/r/197218
[21:44:52] (CR) Andrew Bogott: [C: 2] Holmium dns should use holmium's ip. [puppet] - https://gerrit.wikimedia.org/r/197218 (owner: Andrew Bogott)
[21:47:31] (PS1) Dzahn: disable contacts website [puppet] - https://gerrit.wikimedia.org/r/197219 (https://phabricator.wikimedia.org/T90679)
[21:48:20] (PS1) Andrew Bogott: Fix a mismatch between 'password' and 'passwd' [puppet] - https://gerrit.wikimedia.org/r/197220
[21:49:35] (CR) Andrew Bogott: [C: 2] Fix a mismatch between 'password' and 'passwd' [puppet] - https://gerrit.wikimedia.org/r/197220 (owner: Andrew Bogott)
[21:50:24] (CR) Dzahn: [C: 2] disable contacts website [puppet] - https://gerrit.wikimedia.org/r/197219 (https://phabricator.wikimedia.org/T90679) (owner: Dzahn)
[21:51:06] (PS1) Andrew Bogott: s/passwd/password -- I did this the wrong direction before. [puppet] - https://gerrit.wikimedia.org/r/197223
[21:54:50] (CR) Andrew Bogott: [C: 2] s/passwd/password -- I did this the wrong direction before. [puppet] - https://gerrit.wikimedia.org/r/197223 (owner: Andrew Bogott)
[21:57:05] operations, ops-eqiad: cp1047 down - https://phabricator.wikimedia.org/T88045#1123542 (Cmjohnson) If you need to make any changes to the dispatch contact information, please visit our Support Center or Click Here to chat with a live support representative. For expedited service to our premium te...
[22:00:55] operations, Patch-For-Review: contacts.wikimedia.org drupal unpuppetized - https://phabricator.wikimedia.org/T90679#1123578 (Dzahn) I disabled (but did not delete) the service for now and left a message at the address instead to contact me if anyone still expects it. It can be re-enabled easily (for a reas...
[22:05:28] Krenair / ori: LU has the same problem, i.e.
generating LC CDB files as one user and then syncing them out [22:05:55] it uses a temporary directory with --outdir and then copies into $IP/cache [22:06:25] so we could use the same solution for scap [22:14:31] (03CR) 10Dzahn: [C: 032] contint: Remove integration/kss.git [puppet] - 10https://gerrit.wikimedia.org/r/196174 (https://phabricator.wikimedia.org/T92482) (owner: 10Hashar) [22:15:14] does anyone have the full output from the scap run that failed? [22:15:24] e.g. Krenair? [22:15:36] Scap 'succeeded' [22:15:45] but it did not update the i18n cache [22:17:52] TimStarling: You can get data on fluorine -- tail -1000 /a/mw-log/scap.log | python ~bd808/scaplog.py [22:18:47] (03PS1) 10Andrew Bogott: Messing with the designate-sink config [puppet] - 10https://gerrit.wikimedia.org/r/197233 [22:20:19] (03CR) 10Andrew Bogott: [C: 032] Messing with the designate-sink config [puppet] - 10https://gerrit.wikimedia.org/r/197233 (owner: 10Andrew Bogott) [22:22:13] (03CR) 10Dzahn: "is it a feature of "role::dataset::secondary" to have bonded interfaces like that? (rather than putting it on a node in site.pp)" [puppet] - 10https://gerrit.wikimedia.org/r/193837 (owner: 10ArielGlenn) [22:24:06] thanks bd808 [22:26:34] ah, it uses threads [22:27:31] using --threads causes the exit status of the child process to be lost [22:27:53] ah [22:30:14] I can temporarily fix it [22:30:29] temporarily because pcntl_fork() will fail horribly under HHVM so some other solution will be needed there [22:31:11] fix - it will error now? [22:31:33] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). 
[22:31:55] // Abuse the exit value for the count of rebuild languages
[22:31:55] exit( $numRebuilt );
[22:32:28] it will give an error as soon as an i18n file is changed
[22:32:41] (PS1) Gergő Tisza: [WIP] Make vbench more generic [puppet] - https://gerrit.wikimedia.org/r/197240 (https://phabricator.wikimedia.org/T92701)
[22:33:04] there are already changes waiting to be made to the cache
[22:34:39] ok
[22:35:02] operations, ops-esams: Rack, cable, prepare cp3030-3049 - https://phabricator.wikimedia.org/T92514#1123659 (BBlack)
[22:35:32] RoanKattouw_away, ^demon|lunch, Krenair: I might not be able to get online for SWAT but my patch only affects a maintenance script that needs to be run by hand and takes an hour to test anyway, so it should be safe to merge
[22:35:46] okay
[22:36:12] you'll test it later, just need it merged and synced?
[22:39:52] operations, ops-esams: Rack, cable, prepare cp3030-3049 - https://phabricator.wikimedia.org/T92514#1123673 (BBlack) re: BIOS, Faidon already set up the DRAC networking and passwords there. Today I audited the rest of the settings and changed any that needed changing (per our standard stuff, etc). Hypert...
[22:41:56] Krenair: yes, testing it and running it is pretty much the same thing at this point, only fails on huge databases
[22:43:32] andrewbogott: databases and users in T92694 comment 3 ok?
[22:43:45] springle: yep, it’s all working great. Thank you!
[22:43:54] awesome tnx
[22:46:12] hi aude
[22:46:24] andrewbogott: did we actually need the designate db?
[22:46:29] it's still empty
[22:46:58] springle: it’s empty, really? i ran an ‘init’ command and it succeeded...
[22:47:16] aude, I notice https://gerrit.wikimedia.org/r/#/c/197049/ and https://gerrit.wikimedia.org/r/#/c/197050/ both appear to update from and to versions of wmf19
[22:47:30] is that expected?
[22:49:40] andrewbogott: designate did connect once to the db, but did nothing, no tables
[22:49:59] springle: ok, let me try again and we’ll see what happens
[22:50:01] andrewbogott: pdns has connected many times, so i guess it's ok
[22:50:12] but if we can drop designate, that would be neater
[22:50:26] or make it use sqlite if it's doing nothing
[22:51:27] I’m pretty sure it will be doing things… something must be wrong
[22:51:46] does designate use it's own db or the pdns db?
[22:51:58] it should use its own
[22:52:04] purge.command('sudo -u mwdeploy -n -- /bin/rm '
[22:52:04] '--recursive --force %s/*' % deployed_l10n)
[22:52:20] it'd be pretty funny if some bug made deployed_l10n be an empty string wouldn't it?
[22:52:20] andrewbogott: does it expect to have DDL permissions?
[22:52:40] springle: I don’t know what that is
[22:53:30] andrewbogott: DDL == CREATE TABLE, DROP, etc. DML == INSERT,UPDATE,SELECT,DELETE
[22:53:34] springle: so far I can’t make designate actually do anything, so it doesn’t shock me that the db is empty
[22:54:03] springle: well… I’m running a script to set up the db, and another to update it. Woudl that /have/ to be able to create tables?
[22:54:15] When I run the init script it says that things are already initialized.
[22:54:24] So I wonder if it’s being weird and using sqlite even though I have it configured otherwise
[22:54:40] andrewbogott: i know nothing about designate except the docs page you posted :)
[22:54:56] yes, but --
[22:54:56] but my guess is that it would need elevanted permissions for the setup phase
[22:55:02] elevated*
[22:55:22] I don’t understand. You’re expressing surprise that designate has not created tables, but also saying that it doesn’t have permission to create tables :)
[22:55:32] andrewbogott: you have root. we all have root
[22:55:46] Ah, yeah, but I’m not...
[22:56:04] andrewbogott: as i mentioned in T92694, if you need root to setup, that's ok, but please test first ;)
[22:56:22] (and this suggests we really do need to test ;)
[22:57:33] (CR) ArielGlenn: "I don't think so. This host could win up with a 10ge card instead, depending on needs." [puppet] - https://gerrit.wikimedia.org/r/193837 (owner: ArielGlenn)
[22:57:59] springle: in this case, I don’t know any more than the docs either.
[22:58:14] And that the init/update scripts say that all is well. Which means they must not actually be doing anything real
[22:58:19] But then, why the connection to the db at all?
[22:58:24] yeah odd
[22:58:43] there, did that hit it again?
[22:59:26] no, still only the one designate connection. and in fact, i tested the designate connection myself, so that may have been me 12h ago
[22:59:51] ok, I think it’s just ignoring my config file then
[23:00:01] and using sqlite
[23:00:05] RoanKattouw, ^d, Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150316T2300).
[23:00:06] although I can’t figure out where the sqlite files are
[23:00:07] ok
[23:00:18] operations, Wikimedia-Shop, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1123717 (MZMcBride) Wait, why is the store being renamed and moved?
[23:00:33] * James_F waves.
[23:00:48] AaronSchulz: do you know if there is a way to configure something in InitializeSettings.php so that it applies to *.wikipedia.org?
[23:00:58] merging a patch for tgr|away, then aude, James_F and myself
[23:01:26] andrewbogott: i'll grant DDL to designate temporarily so you can experiment without root. but we should eventually be confident of what it does, or reduce perms to DML again
[23:01:59] springle: I’m confident I know what it’s /supposed/ to do
[23:02:09] which, it should only do ddl when I run the udpate script by hand during upgrades
[23:02:47] springle: where should I look for sqlite files or logs?
[23:03:08] * andrewbogott predicts that lazy developers just ignored the db config and it always uses sqlite on localhost :(
[23:03:38] sqlite can be configured to place files anywhere. probably in /var/lib somewhere
[23:03:39] gwicke: see wmgUseBounceHandler for example
[23:03:52] just use the 'wikipedia' tag as the key, is that not enough?
[23:03:55] springle: ah, but I can’t find the config file either
[23:04:24] AaronSchulz: that's what I did, but it seems to have disabled things across wikipedias
[23:04:40] !log krenair Synchronized php-1.25wmf20/extensions/GlobalUsage/refreshGlobalimagelinks.php: https://gerrit.wikimedia.org/r/#/c/196993/ (duration: 00m 05s)
[23:04:41] I suspect 'wiki' might be the right *.wikipedia.org key
[23:04:44] Logged the message, Master
[23:04:47] springle: if you look at /var/log/designate/designate-manage.log on holmium, you will see the logfile of a tool that thinks it has a happy db connection
[23:05:23] gwicke: what did you do?
[23:05:36] AaronSchulz: I set the flag for VE to use restbase
[23:05:50] after ramping up wiki by wiki I went for 'wikipedia' to enable it for all of them
[23:06:03] but now am seeing the headers that signal that the request actually went to the parsoid caches
[23:06:10] and it's slower than expected
[23:06:15] !log krenair Synchronized php-1.25wmf21/extensions/GlobalUsage/refreshGlobalimagelinks.php: https://gerrit.wikimedia.org/r/#/c/196994/ (duration: 00m 05s)
[23:06:15] aude, ping
[23:06:18] Logged the message, Master
[23:06:44] hoo?
[23:07:25] let's get back to them later >_>
[23:08:20] huh?
[23:08:39] hoo: aude had a wikidata patch for swat but has not appeared
[23:09:03] Oh, I guess she forgot the tz weirdness
[23:09:08] probably
[23:09:41] gwicke: which variable?
[23:09:51] operations, CA-team, Mail, Wikimedia-Mailing-lists: Large Delays or problems with all mailing lists? - https://phabricator.wikimedia.org/T92872#1123740 (csteipp) >>! In T92872#1123126, @Jalexander wrote: >>>! In T92872#1123097, @faidon wrote: >> Mailman was hammered by hundreds of thousands of subsc...
[23:10:04] oh, VRS
[23:10:08] * AaronSchulz just used git log
[23:10:39] AaronSchulz: *nod*
[23:11:03] https://gerrit.wikimedia.org/r/#/c/197118/
[23:11:42] hoo: anyway do you want to step in for her?
[23:11:48] Krenair: Although I didn't plan to be around, I'm ready for it
[23:12:03] AaronSchulz: 'wikipedia' is definitely a tag which pulls in the *.wikipedia dblist in CommonSettings.php
[23:12:41] I'm dealing with a couple of James' first, will do wikidata ones next
[23:12:55] ok
[23:13:20] hoo: will we need to do anything specific for these? I notice it changes composer files
[23:13:47] gwicke: where does wmgUseRestbaseVRS get used to set wgUseRestbaseVRS?
[23:14:14] ah, CS.php
[23:14:18] Krenair: No, that's all pulled together in the build, just use the submodule updates by Katie
[23:14:19] AaronSchulz: in CommonSettings.php
[23:14:24] ok
[23:14:25] line 1937
[23:15:09] are you testing on tin or just random apaches?
[23:15:22] AaronSchulz: I'm testing on random apaches
[23:15:49] the other explicitly defined wikis including mediawiki.org are definitely using RB
[23:16:30] and I'm 99% sure that was also the case for the explicitly listed wikipedias before switching to the tag
[23:16:56] James_F, ^
[23:16:58] !log krenair Synchronized php-1.25wmf21/extensions/Citoid/Citoid.php: https://gerrit.wikimedia.org/r/#/c/197236/ (duration: 00m 07s)
[23:17:02] Logged the message, Master
[23:17:51] (PS1) Faidon Liambotis: mailman: mitigate against stupid subscription bots [puppet] - https://gerrit.wikimedia.org/r/197246
[23:18:09] James_F?
[23:18:13] (CR) Faidon Liambotis: [C: 2 V: 2] mailman: mitigate against stupid subscription bots [puppet] - https://gerrit.wikimedia.org/r/197246 (owner: Faidon Liambotis)
[23:18:15] I see $wgVirtualRestConfig has restbase on enwiki
[23:18:38] Krenair: WFM.
[23:19:23] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[23:19:39] !log krenair Synchronized php-1.25wmf20/includes/content/JsonContent.php: https://gerrit.wikimedia.org/r/#/c/197216/ (duration: 00m 06s)
[23:19:41] James_F, ^
[23:19:43] Logged the message, Master
[23:20:35] same on mw1063
[23:20:37] Krenair: LGTM.
[23:20:53] AaronSchulz: for some reason it's still consistently emitting the x-parsoid-performance header that indicates that it used Parsoid
[23:21:08] while not doing so on mediawiki.org
[23:21:52] https://en.wikipedia.org/w/api.php?action=visualeditor&format=json&paction=parse&page=Jaeger_(Kamil_Drozd_EP)&uselang=en&oldid=651700386
[23:22:25] gwicke: does it matter that mw.org is on a different MW version?
[23:22:55] it shouldn't, as the same extension code is deployed in both
[23:23:28] although.. let me double-check the VE extension
[23:23:30] you confirmed that?
[23:24:14] operations, ops-eqiad: Increase asw-d-eqiad uplink capacity - https://phabricator.wikimedia.org/T92914#1123769 (faidon) NEW a:Cmjohnson
[23:25:16] it's definitely not the mw config
[23:25:29] (CR) Dzahn: [C: -1] "dzahn@mw1033:~$ aptitude why libmemcached11" [puppet] - https://gerrit.wikimedia.org/r/158023 (owner: Reedy)
[23:26:21] springle: ah, it’s ignoring the db config and writing everything to var/lib/designate/designate.sqlite
[23:26:28] I hope I can convince it to not do that :(
[23:26:36] AaronSchulz: ok, thanks for the help!
[23:28:05] AaronSchulz: confirmed that it's the VE extension not using restbase in wmf20; d'oh!
[23:29:01] (CR) Dzahn: "bump after 1 year" [puppet] - https://gerrit.wikimedia.org/r/52043 (owner: Silke Meyer)
[23:29:17] James_F
[23:29:20] !log krenair Synchronized php-1.25wmf21/includes/content/JsonContent.php: https://gerrit.wikimedia.org/r/#/c/197215/ (duration: 00m 11s)
[23:29:24] Logged the message, Master
[23:29:34] operations, Patch-For-Review, domains: add support for wikimedia.xyz - https://phabricator.wikimedia.org/T92547#1123794 (MZMcBride) I looked at and I read through this task. I still don't really understand the virtue of having . I under...
[23:29:51] Krenair: ?
[23:30:04] is that one okay as well?
[23:30:07] I think it is.
[23:30:13] Oh, yes.
[23:30:21] Test failure is unrelated
[23:31:30] hoo, shall I retry that then?
[23:31:52] Retry or overwrite, yes
[23:32:29] ah it's a flow failure
[23:33:43] operations, Parsoid, Services: Move Parsoid config into ops/puppet - https://phabricator.wikimedia.org/T92636#1123798 (faidon) Well, ops/root dependencies aside, puppet is a very poor tool for this for multiple reasons (e.g. it would be impossible to deploy a MediaWiki config across the fleet in a few...
[23:34:43] !log krenair Synchronized php-1.25wmf20/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/197049/ (duration: 00m 13s)
[23:34:48] hoo, ^
[23:34:48] Logged the message, Master
[23:35:08] (CR) Dzahn: "@ottomata do you still think this is needed? maybe some servers meanwhile got base::firewall anyways and we identified more of the needed " [puppet] - https://gerrit.wikimedia.org/r/160480 (owner: Ottomata)
[23:37:24] (CR) Dzahn: [C: -1] "this edits files in module "mediawiki_new". meanwhile it's just "mediawiki" and _new doesn't exist anymore. i expect we still want apparmo" [puppet] - https://gerrit.wikimedia.org/r/38307 (owner: J)
[23:38:01] all ok hoo?
[23:38:19] Trying to verify... probably still in RL cache
[23:38:36] ok :)
[23:38:39] (CR) Dzahn: "wondering how this is compatible or conflicts with the existing yuvipanda shinken/module" [puppet] - https://gerrit.wikimedia.org/r/124861 (owner: Alexandros Kosiaris)
[23:38:53] ah, here we go :)
[23:38:59] Looks good, thanks
[23:39:57] ok, moving on to wmf21
[23:40:14] (CR) Dzahn: "looks like it would need heavy rebasing but still like it" [puppet] - https://gerrit.wikimedia.org/r/124861 (owner: Alexandros Kosiaris)
[23:41:26] (PS1) GWicke: Disable restbase on wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/197250
[23:41:46] hoo
[23:41:52] !log krenair Synchronized php-1.25wmf21/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/197050/ (duration: 00m 13s)
[23:41:52] (PS2) GWicke: Disable restbase on wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/197250 (https://phabricator.wikimedia.org/T89066)
[23:41:55] Logged the message, Master
[23:42:20] thanks :)
[23:43:32] (CR) Dzahn: "re: TODO - move it to the role role::ci::master per comment from hashar" [puppet] - https://gerrit.wikimedia.org/r/165991 (owner: Alexandros Kosiaris)
[23:46:28] springle: how about now?
[23:47:08] (CR) CSteipp: "I would like to see it implemented still" [puppet] - https://gerrit.wikimedia.org/r/38307 (owner: J)
[23:48:02] (PS1) Andrew Bogott: s/connection/database_connection in designate conf. [puppet] - https://gerrit.wikimedia.org/r/197252
[23:48:33] springle: designate logs are at least telling me know that they’re hitting mysql://designate:***@m1-master.eqiad.wmnet/designate
[23:48:55] gwicke, https://phabricator.wikimedia.org/T89066#1123796 - you mean wmf20?
[23:49:03] wmf21 is the newer version on group0/1
[23:49:10] wmf20 is on group2 (wikipedias)
[23:49:37] Ops-Access-Requests, operations, Phabricator, Release-Engineering, Patch-For-Review: Chad H. needs access to iridium (Phabricator host) to manage repos - https://phabricator.wikimedia.org/T92564#1123856 (Dzahn) Open>Resolved 16:23 < mutante> ^d: phab access should work 16:23 < ^d> It does
[23:49:44] springle: and now it’s saying ‘(OperationalError) (1005, "Can't create table 'designate.recordsets' (errno: 150)") "\nCREATE TABLE recordsets (\n\tid CHAR(32) NOT NULL, \n\tcreated_at DATETIME, \n\tupdated_at DATETIME, \n\tversion INTEGER NOT NULL…’ which seems somehow promising
[23:50:13] Krenair: thx, fixed
[23:52:29] !log krenair Synchronized php-1.25wmf20/includes/logging: https://gerrit.wikimedia.org/r/#/c/196846/ (duration: 00m 05s)
[23:52:35] Logged the message, Master
[23:53:59] !log krenair Synchronized php-1.25wmf21/includes/logging: https://gerrit.wikimedia.org/r/#/c/196845/ (duration: 00m 07s)
[23:54:02] Logged the message, Master
[23:54:20] access requests is empty, kind of rare
[23:55:04] (PS2) Dzahn: Backup /var/lib/jenking/config.xml on gallium [puppet] - https://gerrit.wikimedia.org/r/165991 (owner: Alexandros Kosiaris)
[23:56:13] (CR) Dzahn: [C: 1] Backup /var/lib/jenking/config.xml on gallium [puppet] - https://gerrit.wikimedia.org/r/165991 (owner: Alexandros Kosiaris)
[23:57:19] (PS2) Alex Monk: Enable NewUserMessage extension for fawikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/195014 (https://phabricator.wikimedia.org/T91861) (owner: Mjbmr)
[23:57:23] andrewbogott: hey, sorry. looking...
[23:57:24] (CR) Alex Monk: [C: 2] Enable NewUserMessage extension for fawikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/195014 (https://phabricator.wikimedia.org/T91861) (owner: Mjbmr)
[23:57:48] (Merged) jenkins-bot: Enable NewUserMessage extension for fawikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/195014 (https://phabricator.wikimedia.org/T91861) (owner: Mjbmr)
[23:58:38] andrewbogott: errno 150 is a foreign key issue
[23:58:53] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/195014/ (duration: 00m 07s)
[23:58:56] springle: it’s at least hitting the right db now, right?
[23:58:57] Logged the message, Master
[23:59:13] (CR) Dzahn: "@apergos - still good, just make it exit with code 3 and "unknown" instead of warning per inline comments" [puppet] - https://gerrit.wikimedia.org/r/145018 (owner: ArielGlenn)
[23:59:43] andrewbogott: it is, yes. half a dozen tables in designate db
[23:59:56] So, good news and bad news :)