[00:00:48] ah, so if you wanted to you could install one of the icinga/nagios apps on the phone [00:01:12] (03PS13) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [00:01:29] mutante: so I get SMS on my phone and on my watch :D so... [00:02:16] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [00:02:23] dapatrick: I have to run, feel free to email if you find any problems with the deployment or the i18n. [00:02:27] yea, so i think i can conclude i dont need an android watch :) [00:02:42] awight: Okay, thanks for your help! [00:03:08] agreed, the 2fa on it sounds like it could be handy [00:04:17] mutante: quite literally handy yeah :D [00:04:38] :p in Germany a mobile phone is called "das Handy" [00:05:06] these days, you can connect your handy to the beamer [00:05:20] hehee [00:09:57] (03PS14) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [00:10:30] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [00:12:51] gwicke: they will thinnk you mean http://www.bmw.com/com/de/owners/navigation/bluetooth/introduction.html :) [00:13:39] (03CR) 10Dzahn: [C: 032] install_server: re-use amslvs1 for bast3001 [puppet] - 10https://gerrit.wikimedia.org/r/280791 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [00:14:16] mutante: ;) [00:14:38] there was a story about them publishing the open source stuff used in that system recently: https://github.com/edent/BMW-OpenSource [00:15:27] hah, cool @ opensource@bmw.com [00:15:34] in the actual car ui [00:18:19] cups, for on-board printing apparently [00:33:08] (03CR) 10CSteipp: [C: 032] Enable Ex:OATHAuth in beta, disabled for all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280676 
(owner: 10CSteipp) [00:33:31] (03PS1) 10Dzahn: Revert "rename hooft.esams to bast3001" [dns] - 10https://gerrit.wikimedia.org/r/280795 [00:33:34] (03Merged) 10jenkins-bot: Enable Ex:OATHAuth in beta, disabled for all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280676 (owner: 10CSteipp) [00:34:53] (03PS2) 10Dzahn: Revert "rename hooft.esams to bast3001" [dns] - 10https://gerrit.wikimedia.org/r/280795 [00:36:26] !log csteipp@tin Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 34s) [00:36:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:36:33] (03CR) 10Dzahn: [C: 032] "cross-dc install didn't work, we are doing it differently and user another host" [dns] - 10https://gerrit.wikimedia.org/r/280795 (owner: 10Dzahn) [00:37:37] !log csteipp@tin Synchronized wmf-config/CommonSettings-labs.php: Sync labs change to keep repo clean (duration: 00m 31s) [00:37:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:38:01] PROBLEM - puppet last run on mw2213 is CRITICAL: CRITICAL: puppet fail [00:38:30] mutante: does system installing usually take this long? almost 7-8 hours and still going [00:38:50] YuviPanda: only when it's broken like it happened for me last night :/ [00:38:53] fun [00:39:04] so no, that sounds exactly like the broken disk [00:39:08] mutante: one of them is moving, other one not so much [00:39:24] mutante: I guess I'll wait overnight too and see what happen [00:39:27] yea, _and_ you even heard of disk problems before [00:39:29] *happens [00:39:40] mutante: yup, on only one of them tho [00:39:43] that's what i thought yesterday too [00:39:48] when i woke up it was not done [00:39:51] and rebooted [00:40:05] right. 
I'm in no hurry though, so I'll wait [00:40:11] after so many hours you can give up now [00:40:26] ok [00:52:14] (03PS1) 10CSteipp: Revert "Enable Ex:OATHAuth in beta, disabled for all users" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280796 [00:52:29] (03CR) 10CSteipp: [C: 032] Revert "Enable Ex:OATHAuth in beta, disabled for all users" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280796 (owner: 10CSteipp) [00:52:54] (03Merged) 10jenkins-bot: Revert "Enable Ex:OATHAuth in beta, disabled for all users" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280796 (owner: 10CSteipp) [00:54:38] !log csteipp@tin Synchronized wmf-config/InitialiseSettings-labs.php: Synching labs revert (duration: 00m 27s) [00:54:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:54:49] (03PS1) 10Dzahn: assign IP to bast3001, v4 and v6 [dns] - 10https://gerrit.wikimedia.org/r/280797 (https://phabricator.wikimedia.org/T123712) [00:55:29] !log csteipp@tin Synchronized wmf-config/CommonSettings-labs.php: Synching labs revert (duration: 00m 31s) [00:55:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:56:33] (03CR) 10Dzahn: [C: 032] assign IP to bast3001, v4 and v6 [dns] - 10https://gerrit.wikimedia.org/r/280797 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [01:06:10] RECOVERY - puppet last run on mw2213 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:13:34] (03PS1) 10Dzahn: install/ganglia/network: adjust bast3001 IP address [puppet] - 10https://gerrit.wikimedia.org/r/280798 (https://phabricator.wikimedia.org/T123712) [01:15:13] (03PS15) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [01:15:22] (03PS2) 10Dzahn: install/ganglia/network: adjust bast3001 IP address [puppet] - 10https://gerrit.wikimedia.org/r/280798 (https://phabricator.wikimedia.org/T123712) [01:15:35] (03CR) 
10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [01:29:13] (03PS16) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [01:47:24] (03PS17) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [01:48:52] (03PS1) 10Dzahn: site.pp: temp add hooft back as install-server [puppet] - 10https://gerrit.wikimedia.org/r/280799 (https://phabricator.wikimedia.org/T123712) [01:49:39] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [01:49:41] (03CR) 10Dzahn: [C: 032] site.pp: temp add hooft back as install-server [puppet] - 10https://gerrit.wikimedia.org/r/280799 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [01:55:57] (03PS18) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [02:06:48] (03PS19) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [02:12:23] (03PS20) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [02:12:45] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [02:19:23] PROBLEM - puppet last run on mw2030 is CRITICAL: CRITICAL: puppet fail [02:19:34] (03PS1) 10Dzahn: ganglia: leave aggregator on hooft until bast3001 works [puppet] - 10https://gerrit.wikimedia.org/r/280803 (https://phabricator.wikimedia.org/T123712) [02:21:02] (03CR) 10jenkins-bot: [V: 04-1] ganglia: leave aggregator on hooft until bast3001 works [puppet] - 10https://gerrit.wikimedia.org/r/280803 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [02:21:21] (03PS21) 10Ladsgroup: [WIP] 
Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [02:21:42] (03PS2) 10Dzahn: ganglia: leave aggregator on hooft until bast3001 works [puppet] - 10https://gerrit.wikimedia.org/r/280803 (https://phabricator.wikimedia.org/T123712) [02:21:45] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [02:26:27] (03PS22) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [02:26:36] (03CR) 10Dzahn: [C: 032] ganglia: leave aggregator on hooft until bast3001 works [puppet] - 10https://gerrit.wikimedia.org/r/280803 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [02:26:52] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [02:29:01] (03CR) 10VolkerE: [C: 04-1] Remove $wgCopyrightIcon (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: 10Florianschmidtwelzow) [02:29:41] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 10m 29s) [02:29:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:36:20] (03CR) 10Dzahn: "firs the plan was to do this right after the install, but since it dit not work straight from carbon i now tried a different way where bas" [dns] - 10https://gerrit.wikimedia.org/r/280641 (owner: 10Faidon Liambotis) [02:38:39] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Apr 1 02:38:39 UTC 2016 (duration 8m 58s) [02:38:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:43:45] (03PS23) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [02:44:07] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for 
ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [02:45:53] PROBLEM - Check size of conntrack table on chromium is CRITICAL: CRITICAL: nf_conntrack is 100 % full [02:47:33] RECOVERY - Check size of conntrack table on chromium is OK: OK: nf_conntrack is 5 % full [02:49:22] RECOVERY - puppet last run on mw2030 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [02:50:03] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [02:50:13] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [02:55:29] (03PS1) 10Papaul: DNS: Add mgmt DNS for graphite2002 Bug:T130938 [dns] - 10https://gerrit.wikimedia.org/r/280807 (https://phabricator.wikimedia.org/T130938) [02:59:51] 6Operations, 10ops-codfw: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2166869 (10Papaul) [03:00:55] 6Operations, 10ops-codfw: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2151480 (10Papaul) mgmt 10.193.2.251 port info row 5 rack C5 ge-5/0/6 [03:01:10] (03PS24) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [03:01:15] 6Operations, 10ops-codfw: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2166871 (10Papaul) [03:01:32] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [03:09:32] (03PS1) 10Papaul: DNS: Add production DNS for graphite2002 Bug:T130938 [dns] - 10https://gerrit.wikimedia.org/r/280808 (https://phabricator.wikimedia.org/T130938) [03:11:12] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:11:22] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold 
[250.0] [03:13:47] (03PS25) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [03:14:10] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [03:17:22] (03PS1) 10Papaul: install-server: Add graphite2002 MAC address Bug:T130938 [puppet] - 10https://gerrit.wikimedia.org/r/280809 (https://phabricator.wikimedia.org/T130938) [03:19:58] (03PS26) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [03:20:21] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [03:23:55] 6Operations, 10ops-codfw: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2166919 (10Papaul) [03:25:07] 6Operations, 10ops-codfw: rack five new spare pool systems - https://phabricator.wikimedia.org/T130941#2166920 (10Papaul) [03:50:03] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [03:51:52] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [03:57:03] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:58:53] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [04:00:34] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: puppet fail [04:11:59] (03PS2) 10Dzahn: DNS: Add mgmt DNS for graphite2002 Bug:T130938 [dns] - 10https://gerrit.wikimedia.org/r/280807 (https://phabricator.wikimedia.org/T130938) (owner: 10Papaul) [04:12:12] (03CR) 10Dzahn: [C: 032] DNS: Add mgmt DNS for graphite2002 Bug:T130938 [dns] - 10https://gerrit.wikimedia.org/r/280807 (https://phabricator.wikimedia.org/T130938) (owner: 
10Papaul) [04:12:50] (03PS2) 10Dzahn: DNS: Add production DNS for graphite2002 Bug:T130938 [dns] - 10https://gerrit.wikimedia.org/r/280808 (https://phabricator.wikimedia.org/T130938) (owner: 10Papaul) [04:13:57] (03CR) 10Dzahn: [C: 032] DNS: Add production DNS for graphite2002 Bug:T130938 [dns] - 10https://gerrit.wikimedia.org/r/280808 (https://phabricator.wikimedia.org/T130938) (owner: 10Papaul) [04:14:48] (03CR) 10Dzahn: "[radon:~] $ host graphite2002.codfw.wmnet" [dns] - 10https://gerrit.wikimedia.org/r/280808 (https://phabricator.wikimedia.org/T130938) (owner: 10Papaul) [04:16:05] 6Operations, 10ops-codfw: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2151480 (10Dzahn) merged DNS changes https://gerrit.wikimedia.org/r/280807 https://gerrit.wikimedia.org/r/280808/ [radon:~] $ host graphite2002.codfw.wmnet graphite2002.codfw.wmnet has address 10.192.32.140... [04:16:43] (03PS2) 10Dzahn: install-server: Add graphite2002 MAC address [puppet] - 10https://gerrit.wikimedia.org/r/280809 (https://phabricator.wikimedia.org/T130938) (owner: 10Papaul) [04:16:51] (03PS3) 10Dzahn: install-server: Add graphite2002 MAC address [puppet] - 10https://gerrit.wikimedia.org/r/280809 (https://phabricator.wikimedia.org/T130938) (owner: 10Papaul) [04:17:45] (03CR) 10Dzahn: [C: 032] install-server: Add graphite2002 MAC address [puppet] - 10https://gerrit.wikimedia.org/r/280809 (https://phabricator.wikimedia.org/T130938) (owner: 10Papaul) [04:26:53] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [04:43:23] PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 68.97% of data above the critical threshold [5000000.0] [04:44:33] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 72.41% of data above the critical threshold [5000000.0] [05:04:09] !log starting mobileapps deploy [05:04:14] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [05:05:03] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0; xe-0/0/0: down - Core: cr2-codfw:xe-5/2/1 (Telia, IC-314534, 29ms) {#11375} [10Gbps wave] [05:05:32] PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 1 mailman queue(s) above limits (thresholds: bounces: 25 in: 25 virgin: 25) [05:06:14] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/2/1: down - Core: cr1-eqord:xe-0/0/0 (Telia, IC-314534, 24ms) {#10694} [10Gbps wave] [05:08:54] RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below the limits. [05:14:02] PROBLEM - puppet last run on mw2103 is CRITICAL: CRITICAL: puppet fail [05:20:44] PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:22:23] RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING [05:25:33] RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 50.00% above the threshold [1000000.0] [05:28:09] !log mobileapps deployed 66f8dac [05:28:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [05:33:53] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0] [05:42:12] RECOVERY - puppet last run on mw2103 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:03:59] (03CR) 10Nemo bis: "Thanks Krinkle for the additional analysis. Awight, will you take care of getting this through the mediawiki-config bureaucracy?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/250384 (https://phabricator.wikimedia.org/T130442) (owner: 10Nemo bis) [06:29:24] PROBLEM - puppet last run on mw2036 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:52] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: puppet fail [06:30:23] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:52] PROBLEM - puppet last run on mw2043 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:53] PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:32] PROBLEM - puppet last run on wtp2017 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:02] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:03] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:24] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:33] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:54] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:03] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:12] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures [06:38:36] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "We don't really need non-localhost requests to reach the http server here" [puppet] - 10https://gerrit.wikimedia.org/r/280625 (owner: 10Muehlenhoff) [06:40:22] (03Abandoned) 10Muehlenhoff: Add ferm rules for mediawiki::maintenance [puppet] - 10https://gerrit.wikimedia.org/r/280625 (owner: 10Muehlenhoff) [06:41:44] 6Operations, 10Wikimedia-General-or-Unknown, 13Patch-For-Review, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Switchover of the application servers to codfw - https://phabricator.wikimedia.org/T124671#2167064 (10Joe) [06:41:46] 6Operations, 13Patch-For-Review, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Reduce the 
number of appservers we're using in eqiad - https://phabricator.wikimedia.org/T126242#2167063 (10Joe) 5stalled>3Open [06:53:50] <_joe_> !log setting all newer appservers weight to 20 in eqiad [06:53:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [06:55:53] RECOVERY - puppet last run on mw2036 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:56:03] RECOVERY - puppet last run on wtp2017 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:56:33] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [06:56:53] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [06:57:12] RECOVERY - puppet last run on mw2043 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:57:13] RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:33] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:57:34] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:44] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:54] PROBLEM - puppet last run on elastic2020 is CRITICAL: CRITICAL: Puppet has 1 failures [06:58:02] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:23] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:33] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:53] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, 
last run 1 minute ago with 0 failures [07:00:27] <_joe_> !log depooling mw1070-89 from the appserver cluster. T126242 [07:00:28] T126242: Reduce the number of appservers we're using in eqiad - https://phabricator.wikimedia.org/T126242 [07:00:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:02:45] (03PS1) 10Muehlenhoff: Enable base::firewall on wasat [puppet] - 10https://gerrit.wikimedia.org/r/280815 [07:03:30] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable base::firewall on wasat [puppet] - 10https://gerrit.wikimedia.org/r/280815 (owner: 10Muehlenhoff) [07:04:38] (03PS1) 10Reedy: Throttle for Jerusalem Hackathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280816 (https://phabricator.wikimedia.org/T130460) [07:11:42] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [07:12:03] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [07:13:58] (03PS2) 10Reedy: Throttle for Jerusalem Hackathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280816 (https://phabricator.wikimedia.org/T130460) [07:14:22] (03CR) 10Reedy: [C: 032] Throttle for Jerusalem Hackathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280816 (https://phabricator.wikimedia.org/T130460) (owner: 10Reedy) [07:14:46] (03Merged) 10jenkins-bot: Throttle for Jerusalem Hackathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280816 (https://phabricator.wikimedia.org/T130460) (owner: 10Reedy) [07:14:46] 6Operations, 7Graphite: Add labs graphite as a data source to grafana.wikimedia.org - https://phabricator.wikimedia.org/T131431#2167123 (10yuvipanda) [07:15:53] !log reedy@tin Synchronized wmf-config/throttle.php: T130460 Jerusalem Hackathon (duration: 00m 40s) [07:15:54] T130460: Add throttle exceptions for Jerusalem Hackathon - https://phabricator.wikimedia.org/T130460 
[07:15:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:22:43] RECOVERY - puppet last run on elastic2020 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [07:28:21] (03PS5) 10Volans: DB: Expose Puppet SSL certs and generate CA cert [puppet] - 10https://gerrit.wikimedia.org/r/279596 (https://phabricator.wikimedia.org/T111654) [07:51:59] (03PS1) 10Muehlenhoff: Add ferm rules for ganglia::deprecated::collector [puppet] - 10https://gerrit.wikimedia.org/r/280817 [07:56:09] <_joe_> !log depooling mw1121-1130 from the api cluster [07:56:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:00:59] (03PS1) 10Muehlenhoff: Add ferm rules for torrus [puppet] - 10https://gerrit.wikimedia.org/r/280818 [08:01:24] (03PS1) 10Dereckson: ip → IP in throttle rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280819 [08:02:20] (03CR) 10Gehel: [C: 031] "Looks clean to me..." [puppet] - 10https://gerrit.wikimedia.org/r/280757 (owner: 10Dzahn) [08:06:00] (03CR) 10Dereckson: "I wonder if we shouldn't add:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280819 (owner: 10Dereckson) [08:06:42] (03CR) 10Elukey: "diff:" [puppet] - 10https://gerrit.wikimedia.org/r/280678 (https://phabricator.wikimedia.org/T129344) (owner: 10Elukey) [08:07:00] Reedy: we've opened the throttling for all IPs ^ [08:09:27] Dereckson: Or we could push for the E:ThrottleOverride deployment and have a special page for it [08:11:10] <_joe_> !log progressively reducing weight of the older servers in the api cluster [08:11:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:13:57] (03Abandoned) 10Muehlenhoff: Enable base::firewall on wasat [puppet] - 10https://gerrit.wikimedia.org/r/280626 (owner: 10Muehlenhoff) [08:22:21] Dereckson: Ta, let me fix [08:22:24] (03CR) 10Dereckson: [C: 032] ip → IP in throttle rules [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/280819 (owner: 10Dereckson) [08:22:40] oh okay [08:22:46] I were going to do it. [08:23:15] I've only CR+2 the change, not yet fetched or anything. [08:25:15] Dereckson: I didn't realise you had shell access now [08:25:17] Feel free [08:25:24] Dereckson: I guess, we should just normalise it all [08:25:28] Or use strtolower or similar [08:26:40] Ok. [08:27:50] (03PS1) 10ArielGlenn: dumps: fix up dir var references in incr dumps config [puppet] - 10https://gerrit.wikimedia.org/r/280820 [08:29:13] !log dereckson@tin Synchronized wmf-config/throttle.php: Fix throttle rules (Gerrit change 280819). (duration: 00m 29s) [08:29:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:29:31] Done. [08:29:44] Now the normalisation code. [08:32:45] (03PS2) 10Mschon: update the DNS record for benefactors.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/280637 (https://phabricator.wikimedia.org/T130937) [08:34:23] (03CR) 10ArielGlenn: [C: 032] dumps: fix up dir var references in incr dumps config [puppet] - 10https://gerrit.wikimedia.org/r/280820 (owner: 10ArielGlenn) [08:34:51] (03PS1) 10Filippo Giunchedi: cassandra: add restbase2004-b [puppet] - 10https://gerrit.wikimedia.org/r/280821 [08:35:21] (03PS2) 10Filippo Giunchedi: cassandra: add restbase2004-b [puppet] - 10https://gerrit.wikimedia.org/r/280821 [08:35:29] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] cassandra: add restbase2004-b [puppet] - 10https://gerrit.wikimedia.org/r/280821 (owner: 10Filippo Giunchedi) [08:36:17] (03CR) 10Volans: "@jynus: this is the final version, I've checked that all the results from puppet-compiler have the same diff." 
[puppet] - 10https://gerrit.wikimedia.org/r/279596 (https://phabricator.wikimedia.org/T111654) (owner: 10Volans) [08:38:01] !log bootstrap restbase2004-b [08:38:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:42:01] !log reedy@tin rebuilt wikiversions.php and synchronized wikiversions files: Revert labswiki to wmf.18 as 2FA seems to be broken [08:42:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:42:40] valhallasw`cloud: bd808 ^^ [08:43:20] godog: you around ? [08:43:32] matanya: ciao! long time no see [08:43:40] hi :) [08:43:57] godog: I am starting to work on some video uploading stuff [08:44:24] I need to know if there is a real reason not to raise the upload limit from swift point of view [08:44:28] (03PS1) 10Reedy: Revert labswiki to wmf.18 as 2FA seems to be broken [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280822 [08:45:05] also godog I'd like to verify the limit is 4.7 GB, or any other number nowadays [08:45:22] (03CR) 10Reedy: [C: 032] Revert labswiki to wmf.18 as 2FA seems to be broken [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280822 (owner: 10Reedy) [08:46:01] (03Merged) 10jenkins-bot: Revert labswiki to wmf.18 as 2FA seems to be broken [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280822 (owner: 10Reedy) [08:48:23] matanya: yup, should be 2GB these days, https://phabricator.wikimedia.org/T76614 [08:50:31] (03CR) 10Ema: [C: 031] "LGTM. The - ensure => present bit in pcc is not something to worry about. 
I've tried removing /etc/rsyslog.d/70-varnishkafka.conf with thi" [puppet] - 10https://gerrit.wikimedia.org/r/280678 (https://phabricator.wikimedia.org/T129344) (owner: 10Elukey) [08:51:11] (03CR) 10Jcrespo: [C: 031] DB: Expose Puppet SSL certs and generate CA cert [puppet] - 10https://gerrit.wikimedia.org/r/279596 (https://phabricator.wikimedia.org/T111654) (owner: 10Volans) [08:54:21] (03PS5) 10Elukey: Update the varnishkafka module with latest changes. [puppet] - 10https://gerrit.wikimedia.org/r/280678 (https://phabricator.wikimedia.org/T129344) [08:55:17] funny enough godog, i opened that one. [08:55:33] 6Operations, 10DBA, 6Labs: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2167442 (10jcrespo) The following tables on enwiki have been already imported: ``` 73350868 logging 51366344 archive 68959980 article_feedback 49353723 updates 39641126 page 33369615 article_feedback_revisions... [08:56:28] matanya: hehe true, that's live afaik [08:56:42] yes, just checked, it is [08:56:54] godog: I want to raise it to swift max [08:57:01] what number is that ? [08:57:09] (03CR) 10Elukey: [C: 032] Update the varnishkafka module with latest changes. [puppet] - 10https://gerrit.wikimedia.org/r/280678 (https://phabricator.wikimedia.org/T129344) (owner: 10Elukey) [08:57:25] 5GB by default IIRC [08:59:42] the latest throttle deploy for jerusalem may be missing some quotes https://logstash.wikimedia.org/#dashboard/temp/AVPN8G8sO3D718AOD3-_ [09:00:04] so i would like to raise that to 5GB, any concern ? [09:00:38] https://phabricator.wikimedia.org/rOMWCb299c6c14336b06a8acdb20dfa10be44cdb09c20 wat? 
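On the size question in the exchange above (a 2GB limit today, Swift at "5GB by default IIRC"): whether "GB" is read as 10^9 or 2^30 bytes shifts the figure by roughly 7%. A minimal Python sketch of the arithmetic and a guard against exceeding the storage backend's maximum. The names `GB`, `GIB` and `check_upload_limit` are hypothetical, and the real maximum has to be read from the actual Swift configuration rather than assumed.

```python
GB = 10**9    # decimal gigabyte
GIB = 2**30   # binary gibibyte


def check_upload_limit(proposed_bytes, store_max_bytes):
    """Return the proposed limit, or raise if the object store cannot hold it."""
    if proposed_bytes > store_max_bytes:
        raise ValueError(
            f"proposed limit {proposed_bytes} exceeds store maximum {store_max_bytes}"
        )
    return proposed_bytes


if __name__ == "__main__":
    # Assumption for illustration only: the store maximum is 5 GiB.
    print(check_upload_limit(5 * GB, 5 * GIB))
```

The check is deliberately strict: a limit that is even one byte over the store maximum is rejected rather than rounded down.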
[09:01:05] (03CR) 10JanZerebecki: [C: 031] update the DNS record for benefactors.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/280637 (https://phabricator.wikimedia.org/T130937) (owner: 10Mschon) [09:01:49] matanya: there were some on the exact limit on https://phabricator.wikimedia.org/T116514#1988121 but tl;dr is that it should be tested in beta first [09:02:20] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Puppet has 1 failures [09:03:20] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [09:04:36] (03PS1) 10Jcrespo: Fix quotes on throttle.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280826 [09:05:08] jynus: Ugh, how the feck I ended up using those quotes [09:05:20] (03CR) 10Reedy: [C: 032] Fix quotes on throttle.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280826 (owner: 10Jcrespo) [09:05:27] (03PS2) 10Jcrespo: Fix quotes on throttle.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280826 [09:05:33] 6Operations, 10Traffic, 13Patch-For-Review: varnishkafka logrotate cronspam - https://phabricator.wikimedia.org/T129344#2167546 (10elukey) Update: - rsyslog command has been updated to avoid cronspam - files/varnish/varnishkafka_rsyslog.conf has been moved from the puppet repo to the varnishkafka submodule... [09:06:32] do we need ok from releng? 
[09:07:01] Reedy: jynus: this throttle rule has some malediction on it :D [09:07:14] jynus: no [09:07:16] hashar, Dereckson I know you are not involved with that, but give me your opinion [09:07:52] (03CR) 10Reedy: [C: 032] Fix quotes on throttle.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280826 (owner: 10Jcrespo) [09:07:53] jynus: as far as I know, trivial config changes could be applied directly [09:08:01] ok, doing [09:08:18] (03Merged) 10jenkins-bot: Fix quotes on throttle.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280826 (owner: 10Jcrespo) [09:08:30] oh, Reedy merged [09:08:40] PROBLEM - cassandra-b CQL 10.192.32.138:9042 on restbase2004 is CRITICAL: Connection refused [09:08:48] are you going to deploy, or do I? [09:08:53] (03PS1) 10Jforrester: Enable VisualEditor Beta Feature on Wikisources, Wiktionaries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280828 [09:08:55] yeah, I will [09:08:58] ok [09:09:11] thanks for that :) [09:09:23] it created no issue, but I worry about logging when it has lots of stress [09:09:35] !log reedy@tin Synchronized wmf-config/throttle.php: Fix my quote fail (duration: 00m 29s) [09:09:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:09:49] Yeah, makes sense to fix it [09:09:55] (03PS1) 10Amire80: Add the Hatnote blog to the English Planet [puppet] - 10https://gerrit.wikimedia.org/r/280829 [09:10:05] (03PS1) 10ArielGlenn: dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 [09:10:11] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [09:10:29] we already make mistakes, but come on, 70K/minute on fatalmonitor! 
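The log does not show what the bad quotes in throttle.php actually looked like. One common way such a mistake gets in is typographic ("smart") quotes pasted into source. A minimal Python sketch of a pre-deploy check, assuming that was the failure mode here: `find_typographic_quotes` and the sample rule text are hypothetical and not taken from the real file.

```python
# Hypothetical pre-deploy check: flag typographic quotes in config text.
TYPOGRAPHIC_QUOTES = "\u2018\u2019\u201c\u201d"  # left/right single, left/right double


def find_typographic_quotes(text):
    """Return a list of (line_number, column, character), both 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in TYPOGRAPHIC_QUOTES:
                hits.append((line_no, col, ch))
    return hits


if __name__ == "__main__":
    # Made-up rule text; the IP is from a documentation range.
    sample = "'from' => '2016-03-31',\n'IP' => \u2018198.51.100.1\u2019,\n"
    for line_no, col, ch in find_typographic_quotes(sample):
        print(f"line {line_no}, col {col}: U+{ord(ch):04X}")
```

A check like this would only catch one class of quoting error; a syntax lint of the file itself would still be needed for missing or unbalanced plain quotes.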
[09:10:36] s/already/all/ [09:11:18] (03CR) 10jenkins-bot: [V: 04-1] dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 (owner: 10ArielGlenn) [09:11:24] I've done worse ;) [09:12:43] (03PS2) 10Amire80: Add the Hatnote blog to the English Planet [puppet] - 10https://gerrit.wikimedia.org/r/280829 [09:14:35] !log testing new Cache-Control headers on mw1017 [09:14:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:15:08] there is a lot of activity on wikidata [09:15:09] (03PS1) 10Matanya: upload limit: raise to 5 GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280831 [09:15:25] it is creating spikes of errors here and there [09:17:46] some reads, but a lot of updates [09:18:22] (03CR) 10Catrope: [C: 04-1] upload limit: raise to 5 GB (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280831 (owner: 10Matanya) [09:18:41] (03PS2) 10ArielGlenn: dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 [09:19:23] 6Operations, 10Continuous-Integration-Infrastructure, 10Phabricator, 10netops, and 4 others: Make sure phab can talk to gearman and nodepool instances can talk to phabricator - https://phabricator.wikimedia.org/T131375#2167638 (10hashar) >>! In T131375#2165682, @mmodell wrote: > @hashar: I don't know how t... [09:21:18] (03CR) 10Aaron Schulz: upload limit: raise to 5 GB (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280831 (owner: 10Matanya) [09:26:27] Reedy: is https://phabricator.wikimedia.org/T41985#459523 still the procedure to use? 
[09:27:20] Dereckson: Not sure [09:28:19] (03PS1) 10Muehlenhoff: Add ferm service for rsyslog instance on netmon1001 [puppet] - 10https://gerrit.wikimedia.org/r/280834 [09:29:07] Dereckson: Krenair has done it more recently than me [09:29:21] Dereckson: https://wikitech.wikimedia.org/wiki/Uploading_large_files [09:29:41] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:29:48] (03CR) 10Gehel: "I've tested this change by manually applying it on mw1017 (now rolled back)." [puppet] - 10https://gerrit.wikimedia.org/r/280204 (https://phabricator.wikimedia.org/T126280) (owner: 10Gehel) [09:30:07] Reedy: and krenair documented the command used in https://phabricator.wikimedia.org/T128306#2070706 [09:30:34] I have tested https://gerrit.wikimedia.org/r/#/c/280204/ on mw1017, it seems to work just fine. [09:31:04] I now need to deploy it for real. It requires an Apache reload. How do I go about it without breaking everything? [09:31:43] It used to be apache-graceful-all [09:32:14] gehel: https://wikitech.wikimedia.org/wiki/Application_servers#Deploying_config [09:32:18] (03PS3) 10ArielGlenn: dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 [09:32:47] Reedy: thanks! [09:33:35] (03CR) 10jenkins-bot: [V: 04-1] dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 (owner: 10ArielGlenn) [09:36:11] gehel: you can always bribe Reedy and get him to do it (if you want to hide, that is) [09:36:23] Not sure if I have sufficient access [09:36:44] p858snake: Nah, I have broken enough thinks this week, one more will not matter much. [09:36:59] though the other broken things were much less visible ... 
[09:37:02] (03PS4) 10ArielGlenn: dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 [09:38:05] (03CR) 10jenkins-bot: [V: 04-1] dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 (owner: 10ArielGlenn) [09:38:40] PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:40:20] RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING [09:42:30] (03PS5) 10ArielGlenn: dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 [09:51:42] (03PS1) 10Giuseppe Lavagetto: Log all write activity to an irc bot [software/conftool] - 10https://gerrit.wikimedia.org/r/280843 [09:56:42] (03PS2) 10Matanya: upload limit: raise to 4 GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280831 [09:57:27] (03PS6) 10Faidon Liambotis: scap::target: Allow scap's user to restart all services on a node [puppet] - 10https://gerrit.wikimedia.org/r/279717 (https://phabricator.wikimedia.org/T130948) (owner: 10Mobrovac) [09:57:43] (03CR) 10Faidon Liambotis: [C: 032] scap::target: Allow scap's user to restart all services on a node [puppet] - 10https://gerrit.wikimedia.org/r/279717 (https://phabricator.wikimedia.org/T130948) (owner: 10Mobrovac) [09:58:50] (03CR) 10jenkins-bot: [V: 04-1] scap::target: Allow scap's user to restart all services on a node [puppet] - 10https://gerrit.wikimedia.org/r/279717 (https://phabricator.wikimedia.org/T130948) (owner: 10Mobrovac) [09:59:44] hashar: ^^^ puppet-lint looks broken? 
[10:00:11] (03CR) 10Faidon Liambotis: [V: 032] scap::target: Allow scap's user to restart all services on a node [puppet] - 10https://gerrit.wikimedia.org/r/279717 (https://phabricator.wikimedia.org/T130948) (owner: 10Mobrovac) [10:00:34] (03PS1) 10Matanya: raise upload limit to 4 GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280850 [10:02:09] (03PS2) 10Catrope: Raise upload limit to 4 GB in beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280850 (owner: 10Matanya) [10:03:06] (03CR) 10Catrope: [C: 04-1] Raise upload limit to 4 GB in beta labs (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280850 (owner: 10Matanya) [10:04:57] (03PS6) 10ArielGlenn: dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 [10:06:54] 6Operations, 10EventBus, 6Services, 15User-mobrovac, 7service-deployment-requests: New Service Request - Change Propagation - https://phabricator.wikimedia.org/T128463#2167803 (10mobrovac) [10:07:37] 6Operations, 10EventBus, 6Services, 15User-mobrovac, 7service-deployment-requests: New Service Request - Change Propagation - https://phabricator.wikimedia.org/T128463#2075614 (10mobrovac) 5Open>3Resolved [10:07:40] 6Operations, 10Analytics, 10ArchCom-RfC, 6Discovery, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#2167807 (10mobrovac) [10:09:13] (03PS3) 10Matanya: raise upload limit to 4 GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280850 [10:10:02] * matanya raises fist at vi [10:13:15] (03PS1) 10Muehlenhoff: Allow access to yubikey validation server from bastion hosts [puppet] - 10https://gerrit.wikimedia.org/r/280855 [10:20:09] ACKNOWLEDGEMENT - cassandra-b CQL 10.192.32.138:9042 on restbase2004 is CRITICAL: Connection refused Filippo Giunchedi bootstrapping [10:20:21] (03PS7) 10ArielGlenn: dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 
[10:21:16] (03CR) 10Muehlenhoff: [C: 032 V: 032] Allow access to yubikey validation server from bastion hosts [puppet] - 10https://gerrit.wikimedia.org/r/280855 (owner: 10Muehlenhoff) [10:22:50] 6Operations, 6Project-Admins, 3DevRel-March-2016: Operations-related subprojects/tags reorganization - https://phabricator.wikimedia.org/T119944#2167913 (10Aklapper) p:5Normal>3Low >>! In T119944#2140271, @Aklapper wrote: > It's also possible to (mass-)remove #dc-ops from those 47 tasks that have both #d... [10:27:36] (03PS8) 10ArielGlenn: dumps: make dir var references easier on the eyes in the manifests [puppet] - 10https://gerrit.wikimedia.org/r/280830 [10:29:10] (03CR) 10ArielGlenn: [C: 032] "confirmed noop by puppet compiler" [puppet] - 10https://gerrit.wikimedia.org/r/280830 (owner: 10ArielGlenn) [10:29:17] 6Operations, 6Analytics-Kanban, 10Traffic, 13Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2167924 (10elukey) Update: we added all the patches (mine + BBlack's) to the latest version of vk's deb package and installed it on cp1043/cp1044... 
[10:34:59] (03PS1) 10Muehlenhoff: Install libpam-yubico [puppet] - 10https://gerrit.wikimedia.org/r/280858 [10:37:04] (03CR) 10Muehlenhoff: [C: 032 V: 032] Install libpam-yubico [puppet] - 10https://gerrit.wikimedia.org/r/280858 (owner: 10Muehlenhoff) [10:48:36] (03PS1) 10Filippo Giunchedi: install_server: add graphite2002 partman [puppet] - 10https://gerrit.wikimedia.org/r/280862 (https://phabricator.wikimedia.org/T130938) [10:50:11] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] install_server: add graphite2002 partman [puppet] - 10https://gerrit.wikimedia.org/r/280862 (https://phabricator.wikimedia.org/T130938) (owner: 10Filippo Giunchedi) [10:56:19] (03PS1) 10Jcrespo: [WIP]Add mysql to labs dns servers [puppet] - 10https://gerrit.wikimedia.org/r/280863 (https://phabricator.wikimedia.org/T128737) [10:57:40] (03CR) 10jenkins-bot: [V: 04-1] [WIP]Add mysql to labs dns servers [puppet] - 10https://gerrit.wikimedia.org/r/280863 (https://phabricator.wikimedia.org/T128737) (owner: 10Jcrespo) [11:01:18] 6Operations, 10Analytics, 10Datasets-General-or-Unknown, 10Traffic, 13Patch-For-Review: http://dumps.wikimedia.org should redirect to https:// - https://phabricator.wikimedia.org/T128587#2167989 (10ArielGlenn) I'm sending mail to wikitech-l announcing that the switch will go live Monday (Apr 4). [11:06:05] (03PS2) 10Jcrespo: [WIP]Add mysql to labs dns servers [puppet] - 10https://gerrit.wikimedia.org/r/280863 (https://phabricator.wikimedia.org/T128737) [11:06:36] !log Imported Jerusalem_Old_City_Walking_to_the_Western_Wall_4K.webm to Commons (T131441) [11:06:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:06:44] T131441: Please upload large file to Wikimedia Commons - https://phabricator.wikimedia.org/T131441 [11:07:14] (we log that?) [11:08:20] (or only anything which modifies servers states?) 
[11:09:14] There's no clear policy for that AFAIK [11:09:21] I wouldn't bother doing that [11:09:26] We often do log when running maintenance scripts, but not always [11:09:37] And yeah as Reedy said a file import is routine enough to not log it [11:09:49] Okay. [11:10:23] 6Operations, 10ops-codfw, 13Patch-For-Review: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2168000 (10fgiunchedi) a:5Papaul>3RobH I've updated the switch port, the host is missing 4x SSDs afaics ``` ~ # cat /proc/partitions major minor #blocks name 8 0 9767... [11:10:42] Dereckson: SAL is kinda anything that you think that others might find intresting/need to know about [11:15:35] (03PS4) 10Jforrester: Enable VisualEditor Single Edit Tab on the English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274131 (https://phabricator.wikimedia.org/T128478) [11:18:16] (03PS4) 10Matanya: raise upload limit to 4 GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280850 [11:18:55] 6Operations, 10DNS, 6Labs, 10Traffic: librarybase project cannot create a proxy for librarybase.wmflabs.org - https://phabricator.wikimedia.org/T131448#2168029 (10Harej) Stating for the record that `http://librarybase.wmflabs.org` worked just fine until today. [11:19:07] (03PS5) 10Matanya: raise upload limit to 4 GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280850 [11:19:26] 6Operations, 10DNS, 6Labs, 10Traffic: librarybase project cannot create a proxy for librarybase.wmflabs.org - https://phabricator.wikimedia.org/T131448#2168030 (10Harej) p:5Triage>3Unbreak! 
[11:21:20] (03PS3) 10Jforrester: VisualEditor: Enabled for logged-out users on the English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/242042 (https://phabricator.wikimedia.org/T90662) [11:21:50] (03PS1) 10Dereckson: Allow wmf-config/throttle.php to be lenient on ip/IP typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280865 (https://phabricator.wikimedia.org/T131469) [11:23:39] (03CR) 10Catrope: [C: 032] raise upload limit to 4 GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280850 (owner: 10Matanya) [11:24:05] (03Merged) 10jenkins-bot: raise upload limit to 4 GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280850 (owner: 10Matanya) [11:24:31] 6Operations, 10DNS, 6Labs, 10Traffic: librarybase project cannot create a proxy for librarybase.wmflabs.org - https://phabricator.wikimedia.org/T131448#2168047 (10Dereckson) @yuvipanda Is the priority of this task correctly asserted? [11:25:16] (03PS3) 10Matanya: upload limit: raise to 4 GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280831 [11:26:09] 6Operations, 10DNS, 6Labs, 10Traffic: librarybase project cannot create a proxy for librarybase.wmflabs.org - https://phabricator.wikimedia.org/T131448#2168049 (10Harej) I consider not being able to access Librarybase to be a production outage, the fact that is currently accessible from an alternate URL no... 
[11:30:14] 6Operations, 10DNS, 6Labs, 10Traffic: librarybase project cannot create a proxy for librarybase.wmflabs.org - https://phabricator.wikimedia.org/T131448#2168050 (10Harej) [11:31:52] (03PS1) 10Jforrester: [Cleanup] Remove VisualEditor experimental config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280869 [11:31:54] (03PS1) 10Jforrester: [Cleanup] Remove VisualEditor AutoAccountEnable config now unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280870 [11:41:50] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [11:41:50] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [11:45:42] (03PS1) 10ArielGlenn: dumps: clean up some dir refs in templates to be more readable [puppet] - 10https://gerrit.wikimedia.org/r/280873 [11:47:43] 6Operations, 10DNS, 6Labs, 10Traffic: librarybase project cannot create a proxy for librarybase.wmflabs.org - https://phabricator.wikimedia.org/T131448#2168147 (10valhallasw) p:5Unbreak!>3Normal I'm sorry @harej, but given that this is a) labs (and therefore by definition not production) and b) there i... [11:57:21] (03PS2) 10ArielGlenn: dumps: clean up some dir refs in templates to be more readable [puppet] - 10https://gerrit.wikimedia.org/r/280873 [12:00:16] 6Operations, 10DNS, 6Labs, 10Traffic: librarybase project cannot create a proxy for librarybase.wmflabs.org - https://phabricator.wikimedia.org/T131448#2168165 (10valhallasw) [12:05:14] (03CR) 10ArielGlenn: [C: 032] "verified noop for all but a simple variable substitution in a shell script." 
[puppet] - 10https://gerrit.wikimedia.org/r/280873 (owner: 10ArielGlenn) [12:07:19] 6Operations, 10DNS, 6Labs, 10Traffic: librarybase project cannot create a proxy for librarybase.wmflabs.org - https://phabricator.wikimedia.org/T131448#2168196 (10Krenair) 5Open>3Resolved a:3Krenair I've handled this specific case, the generic issue is T131367. Also, not a production outage, this is... [12:09:11] (03PS1) 10Ema: Fix typo in /usr/local/bin/drain [puppet] - 10https://gerrit.wikimedia.org/r/280876 [12:20:18] (03PS1) 10ArielGlenn: dumps: move (most) cron job manifests into cron directory [puppet] - 10https://gerrit.wikimedia.org/r/280885 [12:27:07] (03PS2) 10ArielGlenn: dumps: move (most) cron job manifests into cron directory [puppet] - 10https://gerrit.wikimedia.org/r/280885 [12:30:02] 6Operations, 10DNS, 6Labs, 10Traffic: librarybase project cannot create a proxy for librarybase.wmflabs.org - https://phabricator.wikimedia.org/T131448#2168396 (10BBlack) [12:30:22] 6Operations, 6Labs: librarybase project cannot create a proxy for librarybase.wmflabs.org - https://phabricator.wikimedia.org/T131448#2168398 (10BBlack) [12:37:31] (03PS3) 10ArielGlenn: dumps: move (most) cron job manifests into cron directory [puppet] - 10https://gerrit.wikimedia.org/r/280885 [12:40:01] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [5000000.0] [12:44:21] (03CR) 10ArielGlenn: [C: 032] dumps: move (most) cron job manifests into cron directory [puppet] - 10https://gerrit.wikimedia.org/r/280885 (owner: 10ArielGlenn) [12:47:05] 6Operations, 6Analytics-Kanban, 10Traffic: varnishkafka logrotate cronspam - https://phabricator.wikimedia.org/T129344#2168509 (10elukey) [12:48:17] (03PS1) 10ArielGlenn: dumps: fix up a couple of requires for new location of cron manifests [puppet] - 10https://gerrit.wikimedia.org/r/280894 [12:50:07] ignore puppet whine from snapshot1003 please, ficing [12:50:10] fixing [12:50:40] PROBLEM - puppet 
last run on snapshot1003 is CRITICAL: CRITICAL: puppet fail [12:52:43] 6Operations, 10Traffic, 13Patch-For-Review: Upgrade to Varnish 4: things to remember - https://phabricator.wikimedia.org/T126206#2168548 (10ema) [12:53:23] (03CR) 10ArielGlenn: [C: 032] dumps: fix up a couple of requires for new location of cron manifests [puppet] - 10https://gerrit.wikimedia.org/r/280894 (owner: 10ArielGlenn) [12:56:00] RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:56:25] (03CR) 10Elukey: [C: 04-1] "Took a look to https://www.nginx.com/blog/creating-nginx-rewrite-rules/ --> Example: Forcing all Requests to Use SSL/TLS" [puppet] - 10https://gerrit.wikimedia.org/r/278861 (https://phabricator.wikimedia.org/T128587) (owner: 10ArielGlenn) [13:05:51] (03PS1) 10ArielGlenn: dumps: move cron jobs out of dumps dir into cron dir [puppet] - 10https://gerrit.wikimedia.org/r/280901 [13:14:58] (03CR) 10ArielGlenn: [C: 032] dumps: move cron jobs out of dumps dir into cron dir [puppet] - 10https://gerrit.wikimedia.org/r/280901 (owner: 10ArielGlenn) [13:15:42] 6Operations, 10Traffic, 7Documentation: Update TLS/HTTP documentation on wikitech - https://phabricator.wikimedia.org/T96844#2168643 (10ema) [13:16:49] PROBLEM - puppet last run on mw1115 is CRITICAL: CRITICAL: Puppet has 1 failures [13:18:39] 6Operations, 10Phabricator, 6Project-Admins, 6Triagers: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#1342459 (10JeanFred) Hello, I would like to request project creation on Phabricator. I am not expecting to create a lot of projects, but I w... [13:23:33] 6Operations, 10Phabricator, 6Project-Admins, 6Triagers: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#2168682 (10Luke081515) You can't convert project to subprojects via the WebUI. 
Otherwise you need to create a new project, and archive the ol... [13:26:17] (03CR) 10Dereckson: "Is that useful before we raise the timeout?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280831 (owner: 10Matanya) [13:26:40] RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 50.00% above the threshold [1000000.0] [13:28:21] (03PS2) 10Dereckson: Update project logo for ast.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280445 (https://phabricator.wikimedia.org/T131247) (owner: 10MarcoAurelio) [13:28:38] (03CR) 10Dereckson: [C: 031] "PS2: optipng -o7" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280445 (https://phabricator.wikimedia.org/T131247) (owner: 10MarcoAurelio) [13:28:40] (03PS5) 10Tim Landscheidt: Tools: Unpuppetize host_aliases [puppet] - 10https://gerrit.wikimedia.org/r/241582 (https://phabricator.wikimedia.org/T109485) [13:32:59] 6Operations, 10Phabricator, 6Project-Admins, 6Triagers: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#2168712 (10JeanFred) >>! In T706#2168682, @Luke081515 wrote: > You can't convert project to subprojects via the WebUI. Otherwise you need to... 
[13:33:36] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish: Evaluate and Test Limited Deployment of Varnish 4 - https://phabricator.wikimedia.org/T122880#2168713 (10ema) [13:33:57] 6Operations, 10Traffic, 7Varnish: Install XKey vmod - https://phabricator.wikimedia.org/T122881#2168714 (10ema) [13:34:06] (03PS1) 10ArielGlenn: dumps: move cron related files to files/cron [puppet] - 10https://gerrit.wikimedia.org/r/280908 [13:34:29] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish: Create separate packages for required vmods - https://phabricator.wikimedia.org/T124281#2168715 (10ema) [13:34:52] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish, and 2 others: Varnish support for shutting users out of a DC - https://phabricator.wikimedia.org/T129424#2168716 (10ema) [13:36:05] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish: cache_misc's misc_fetch_large_objects has issues - https://phabricator.wikimedia.org/T128813#2168732 (10ema) [13:36:43] 6Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 13Patch-For-Review, 7Varnish: Make CI run Varnish VCL tests - https://phabricator.wikimedia.org/T128188#2168733 (10ema) [13:37:09] 6Operations, 10Analytics, 10Traffic, 7Varnish: Sort out analytics service dependency issues for cp* cache hosts - https://phabricator.wikimedia.org/T128374#2168736 (10ema) [13:37:20] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish: Upgrade to Varnish 4: things to remember - https://phabricator.wikimedia.org/T126206#2168737 (10ema) [13:37:32] 6Operations, 10Traffic, 7HTTPS, 7Varnish: Outbound HTTPS for varnish backend instances - https://phabricator.wikimedia.org/T109325#2168738 (10ema) [13:38:02] (03PS2) 10Ema: Fix typo in /usr/local/bin/drain [puppet] - 10https://gerrit.wikimedia.org/r/280876 [13:38:12] (03CR) 10Ema: [C: 032 V: 032] Fix typo in /usr/local/bin/drain [puppet] - 10https://gerrit.wikimedia.org/r/280876 (owner: 10Ema) [13:39:10] !log deploying apache config for MW appservers: new cache-control headers for 
portals (https://gerrit.wikimedia.org/r/#/c/280204/ T126280) [13:39:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:39:34] (03PS3) 10Gehel: Allow browser to cache /portal static files. [puppet] - 10https://gerrit.wikimedia.org/r/280204 (https://phabricator.wikimedia.org/T126280) [13:39:46] Dereckson: can you please elaborate about the timeout ? [13:42:21] (03PS4) 10Gehel: Allow browser to cache /portal static files. [puppet] - 10https://gerrit.wikimedia.org/r/280204 (https://phabricator.wikimedia.org/T126280) [13:44:22] (03CR) 10Gehel: [C: 032] Allow browser to cache /portal static files. [puppet] - 10https://gerrit.wikimedia.org/r/280204 (https://phabricator.wikimedia.org/T126280) (owner: 10Gehel) [13:48:43] (03CR) 10Andrew Bogott: [C: 031] [WIP]Add mysql to labs dns servers [puppet] - 10https://gerrit.wikimedia.org/r/280863 (https://phabricator.wikimedia.org/T128737) (owner: 10Jcrespo) [13:51:00] PROBLEM - puppet last run on mw1115 is CRITICAL: CRITICAL: Puppet has 1 failures [13:52:49] ^ puppet on mw1115 is probably me, checking... [13:55:36] matanya: if we try, outside GWT, to upload a large file from a remote URL, for example through Special:Upload, there is a maximal time to achieve the download operation. 
[13:56:03] so if your patch is intended for GWT, that's perfect, as the timeout there is 15 minutes [13:56:37] but it's a more generic patch, the timeout is 90s [13:56:47] 6Operations, 10Traffic, 7Varnish: Uprade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2168822 (10ema) [13:57:00] 6Operations, 10Traffic, 7Varnish: Uprade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2168834 (10ema) p:5Triage>3Normal [13:57:14] matanya: I'm adding you to https://phabricator.wikimedia.org/T118887 [13:57:22] thanks Dereckson [13:57:23] oh you already are subscribed [13:57:29] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish: Upgrade to Varnish 4: things to remember - https://phabricator.wikimedia.org/T126206#2168836 (10ema) [13:57:31] 6Operations, 10Traffic, 7Varnish: Uprade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2168822 (10ema) [13:57:44] 6Operations, 10Traffic, 7Varnish: Uprade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2168822 (10ema) [13:57:46] 6Operations, 10Traffic, 7Varnish: Port remaining scripts depending on varnishlog.py to new VSL API - https://phabricator.wikimedia.org/T131353#2168838 (10ema) [13:58:36] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish: Upgrade to Varnish 4: things to remember - https://phabricator.wikimedia.org/T126206#2007434 (10ema) [13:58:38] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish: Evaluate and Test Limited Deployment of Varnish 4 - https://phabricator.wikimedia.org/T122880#2168839 (10ema) [13:58:59] 6Operations, 10Traffic, 7Varnish: Install XKey vmod - https://phabricator.wikimedia.org/T122881#2168842 (10ema) [13:59:01] 6Operations, 10Traffic, 7Varnish: Uprade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2168822 (10ema) [13:59:44] Dereckson: so how can i upload 1-2 GB files without an issue ? 
[14:00:00] RECOVERY - puppet last run on mw1115 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [14:00:40] https://commons.wikimedia.org/wiki/Category:Uploaded_with_video2commons/Server-side_uploads [14:01:00] 6Operations, 10Traffic, 7Varnish: Convert misc cluster to Varnish 4 - https://phabricator.wikimedia.org/T131501#2168861 (10ema) [14:01:37] 6Operations, 10Traffic, 7Varnish: Convert upload cluster to Varnish 4 - https://phabricator.wikimedia.org/T131502#2168876 (10ema) [14:01:39] 7Blocked-on-Operations, 6Operations, 10Phabricator, 10Traffic: Phabricator needs to expose notification daemon (websocket) - https://phabricator.wikimedia.org/T112765#2168889 (10chasemp) >>! In T112765#2166588, @greg wrote: >>>! In T112765#1822062, @chasemp wrote: >> We need to make a plan to get connectiv... [14:01:48] matanya: you can't [14:02:07] 6Operations, 10Traffic, 7Varnish: Convert text cluster to Varnish 4 - https://phabricator.wikimedia.org/T131503#2168895 (10ema) [14:02:22] matanya: we first need to increase the timeout, so there is TIME to download at 100 or 300 Mbps your 2 GB [14:02:40] Dereckson: that is true for servers side or any upload ? 
[14:02:42] (bawolff indicated we also need extra time for a copy operation once the file is uploaded) [14:02:54] that's for server side [14:03:03] for the upload wizard, we have another issue [14:03:09] cause https://commons.wikimedia.org/wiki/Category:Uploaded_with_video2commons/Server-side_uploads is full of files above 1 GB [14:03:18] 6Operations, 10Traffic, 7Varnish: Install XKey vmod - https://phabricator.wikimedia.org/T122881#2168925 (10ema) [14:03:20] 6Operations, 10Traffic, 7Varnish: Uprade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2168924 (10ema) [14:03:24] or am i missing something here [14:03:27] yes [14:03:33] there are two kinds of "server side uploads" [14:03:39] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 51.72% of data above the critical threshold [5000000.0] [14:03:56] people with the sysop right on Wikimedia Commons (or members of the GWT group) can use the old [[Special:Upload]] form [14:04:00] and give a URL [14:04:23] If the URL is whitelisted, the file is downloaded. It's in this mode that we hit the timeout. [14:04:55] 6Operations, 10Traffic, 7Varnish: Uprade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2168822 (10ema) a:3ema [14:05:05] what is the downside of increasing the timeout ? [14:05:09] I don't know. [14:05:25] It was my last question on the task.
[14:05:32] 6Operations, 10Traffic, 7Varnish: Uprade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2168929 (10ema) [14:05:34] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish: Evaluate and Test Limited Deployment of Varnish 4 - https://phabricator.wikimedia.org/T122880#2168930 (10ema) [14:05:54] 6Operations, 10Traffic, 7Varnish: Uprade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2168822 (10ema) [14:05:56] 6Operations, 10Traffic, 7Varnish: Install XKey vmod - https://phabricator.wikimedia.org/T122881#2168932 (10ema) [14:05:58] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish: Evaluate and Test Limited Deployment of Varnish 4 - https://phabricator.wikimedia.org/T122880#1916092 (10ema) [14:06:17] 6Operations, 10Traffic, 7Varnish: Uprade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2168822 (10ema) [14:06:20] 6Operations, 10Traffic, 13Patch-For-Review, 7Varnish: Evaluate and Test Limited Deployment of Varnish 4 - https://phabricator.wikimedia.org/T122880#1916092 (10ema) [14:06:47] matanya: you see, the files in this cat were uploaded by krenair: https://phabricator.wikimedia.org/T126133 [14:07:49] Anyone know how we can add cxserver in Graphana? like, https://grafana.wikimedia.org/dashboard/db/service-citoid [14:07:54] mobrovac: ^^ [14:07:54] Dereckson: cahtted with brion about it [14:08:15] \o/ [14:08:28] hi brion [14:08:33] it adds the danger of beinging the servers to an infinte loop when stucking in processing [14:08:42] i.e. 
a ddos vector [14:08:57] kart_: graphana reads metrics' info from graphite [14:09:08] ideally whitelisted sites don't have infinite-length streams to download :D [14:09:14] kart_: in order to have something to display, you need to send it some metrics :P [14:09:36] if you're upping the limit *only* while downloading via ini_set or whatever, then that's the main danger [14:09:44] and since they're whitelisted i wouldn't super worry about that case [14:09:49] but it's a consideration [14:09:53] brion: we already restrict this feature to whitelist domains, and csteipp make very verbose warnings against allowing too broad hosts, like Google Cloud Engine [14:09:57] mobrovac: we can. like mt requests etc [14:09:59] (or labs) [14:09:59] yep [14:10:13] wouldn't hurt to make sure the download process has a byte size limit too [14:10:13] kart_: see https://github.com/wikimedia/service-template-node/blob/master/doc/coding.md#metrics-collection on how to send them [14:10:17] even if its in the multi gigabytes [14:10:23] can we add one labs tool ? [14:10:29] mobrovac: cool. Thanks. [14:10:35] or the entire domain only ? [14:10:47] it currently works by domain [14:11:14] yeah i think you'd have to give it a subdomain [14:11:16] but if the tool has its own VM instead of be on labs, that's possible [14:11:18] kart_: also https://github.com/wikimedia/service-runner#metric-reporting [14:12:02] Dereckson: it has, several of them [14:12:26] (03CR) 10Rush: [C: 04-1] "What do you think about leaving this applied via puppet but moving the list itself to hiera on wiki? 
I'm not a huge fan of the wikified h" [puppet] - 10https://gerrit.wikimedia.org/r/241582 (https://phabricator.wikimedia.org/T109485) (owner: 10Tim Landscheidt) [14:14:21] RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 50.00% above the threshold [1000000.0] [14:15:00] matanya: yes, we can whitelist that [14:15:11] but there will still be the timeout issue [14:17:02] yes, i'll think about that now [14:21:30] 6Operations, 10hardware-requests, 13Patch-For-Review: Allocate 2 analytics machines to experiment with a jupyterhub notebook service - https://phabricator.wikimedia.org/T130760#2169031 (10Ottomata) @yuvipanda can you remove the relevant analytics* node entries from site.pp too? [14:23:33] the current value is 90 s while a 2 GB file at 100 Mbps takes 164 s [14:24:05] (03PS3) 10Ottomata: Create eventlogging::deployment::target define that abstracts scap::target for eventlogging targets [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [14:24:39] what about a 200s limit (2 GB at 100 Mbps + 30s margin to start upload, reach this speed and then leave time for the server to handle the final operation once the upload is done)?
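(Editorial aside, not part of the channel log: the timeout arithmetic from the 14:23-14:24 messages can be sketched as below. This is only a rough check under stated assumptions: 1 GB is taken as 1024 MB and 1 MB as 8 Mb, which reproduces the roughly 164 s figure quoted in the chat; the function names `transfer_seconds` and `fits_timeout` are hypothetical and do not come from MediaWiki or any Wikimedia tool.)

```python
def transfer_seconds(size_gb: float, rate_mbps: float) -> float:
    """Estimate the time to download a file at a sustained link rate.

    Assumption: 1 GB = 1024 MB and 1 MB = 8 Mb, matching the ~164 s
    figure mentioned in the chat for 2 GB at 100 Mbps.
    """
    megabits = size_gb * 1024 * 8
    return megabits / rate_mbps


def fits_timeout(size_gb: float, rate_mbps: float,
                 timeout_s: float, margin_s: float = 30.0) -> bool:
    """Return True if the transfer plus a fixed margin fits in the timeout.

    The 30 s default margin mirrors the margin proposed in the chat for
    ramp-up and the server-side copy; it is an illustrative value only.
    """
    return transfer_seconds(size_gb, rate_mbps) + margin_s <= timeout_s


if __name__ == "__main__":
    for timeout in (90, 200, 300):
        ok = fits_timeout(2, 100, timeout)
        print(f"2 GB at 100 Mbps within {timeout} s: {ok}")
```

(Under these assumptions a 2 GB download at 100 Mbps does not fit the 90 s value but fits both the 200 s and 300 s proposals, and the same file at 300 Mbps would fit within 90 s.)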
[14:24:58] sounds sane to me [14:25:54] though i guess 300s would be safe too [14:26:38] ok, see you all tomorrow [14:26:38] (03PS3) 10Ottomata: [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280771 (https://phabricator.wikimedia.org/T118772) [14:26:44] ++ matanya [14:27:37] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280771 (https://phabricator.wikimedia.org/T118772) (owner: 10Ottomata) [14:31:27] (03PS4) 10Ottomata: [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280771 (https://phabricator.wikimedia.org/T118772) [14:32:25] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280771 (https://phabricator.wikimedia.org/T118772) (owner: 10Ottomata) [14:35:19] (03PS5) 10Ottomata: [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280771 (https://phabricator.wikimedia.org/T118772) [14:45:41] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: Puppet has 1 failures [14:51:50] 6Operations, 10ops-codfw, 10RESTBase-Cassandra: restbase2004.codfw.wmnet: Failed disk/RAID - https://phabricator.wikimedia.org/T130990#2169207 (10Eevans) 5Open>3Resolved 2004 is back online; Closing [14:53:59] 6Operations, 10RESTBase-Cassandra, 6Services: Investigate high read requests on restbase1012-a - https://phabricator.wikimedia.org/T131370#2169240 (10Eevans) 5Open>3Resolved Mystery solved; Closing issue. 
[14:55:01] anomie: ostriches thcipriani MarkTraceur How would you feel about swating https://gerrit.wikimedia.org/r/#/c/280874/ and https://gerrit.wikimedia.org/r/#/c/280875/ toward the end of this swat session? It would be great to get this data soon :) [14:55:13] Just on my way back to the hostel at the hackathon now though! [14:56:19] addshore: there is no swat on fridays [14:56:21] addshore: it's' friday :) [14:57:10] addshore: Even if there were SWAT on Friday, why SWAT https://gerrit.wikimedia.org/r/#/c/280874/? It only touches unit tests. [14:57:54] (03CR) 10Giuseppe Lavagetto: Add select mode (035 comments) [software/conftool] - 10https://gerrit.wikimedia.org/r/278552 (https://phabricator.wikimedia.org/T128199) (owner: 10Giuseppe Lavagetto) [15:01:13] !log rebooting copper for upgrade to 4.4 [15:01:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:04:43] !log grafana made me and godog admins [15:04:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:06:32] thcipriani: anomie greg-g it is indeed Friday! This week has been so manic I didn't even notice! [15:06:40] :D [15:07:03] Anomie, you wouldn't be able to merge one without the other :) well... 
Jenkins would say no [15:07:26] 6Operations, 10Continuous-Integration-Infrastructure, 6Services: Package npm 2.14 - https://phabricator.wikimedia.org/T124474#2169270 (10hashar) [15:07:28] 6Operations, 10Continuous-Integration-Config, 13Patch-For-Review: Switch CI from jsduck deb package to a gemfile/bundler system - https://phabricator.wikimedia.org/T109005#2169269 (10hashar) [15:07:49] I guess I will have to wait until after the hackathon ;) [15:07:50] (03PS3) 10Dzahn: Add the Hatnote blog to the English Planet [puppet] - 10https://gerrit.wikimedia.org/r/280829 (owner: 10Amire80) [15:07:58] (03CR) 10Dzahn: [C: 032] Add the Hatnote blog to the English Planet [puppet] - 10https://gerrit.wikimedia.org/r/280829 (owner: 10Amire80) [15:08:05] 6Operations, 10Continuous-Integration-Config, 13Patch-For-Review: Switch CI from jsduck deb package to a gemfile/bundler system - https://phabricator.wikimedia.org/T109005#1537429 (10hashar) And the job fails with `cb() never called` which is due to npm1 :( Hence blocked by T124474. [15:08:51] 6Operations, 10Continuous-Integration-Infrastructure, 6Services: Package npm 2.14 - https://phabricator.wikimedia.org/T124474#2169276 (10Paladox) We will need to temporarily do npm install npm@2 for now until we either package npm 2 or using nvm. [15:11:20] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [15:13:34] (03CR) 10Dzahn: "the author said:" [puppet] - 10https://gerrit.wikimedia.org/r/280757 (owner: 10Dzahn) [15:13:35] 6Operations, 10Continuous-Integration-Config, 13Patch-For-Review: Switch CI from jsduck deb package to a gemfile/bundler system - https://phabricator.wikimedia.org/T109005#2169281 (10hashar) So repositories should get a Gemfile with jsduck and trigger the job `rake-jessie` Alternatively, the npm entry point... 
[15:13:47] 6Operations, 10Continuous-Integration-Infrastructure, 6Services: Package npm 2.14 - https://phabricator.wikimedia.org/T124474#2169283 (10hashar) [15:13:50] 6Operations, 10Continuous-Integration-Config, 13Patch-For-Review: Switch CI from jsduck deb package to a gemfile/bundler system - https://phabricator.wikimedia.org/T109005#2169282 (10hashar) [15:14:21] 6Operations, 10Continuous-Integration-Infrastructure, 6Services: Package npm 2.14 - https://phabricator.wikimedia.org/T124474#1957157 (10hashar) [15:19:44] csteipp, Krenair, bd808, can someone catch me up about what's happening to 2fa on wikitech? It's working properly in production but I just updated labtestwikitech to .19 and it definitely does not work there [15:19:59] Did someone explicitly revert to .18 on silver for this very reason? [15:20:01] https://phabricator.wikimedia.org/T131445 [15:20:05] yes [15:20:16] wikitech is held back to .18 until 2FA is unbroken [15:20:20] ok, thanks [15:20:25] * andrewbogott reads the bug [15:21:14] the bit about the removed hook seems the likely cause [15:23:50] (03PS1) 10Dereckson: Add The Ash Tree to fr.planet [puppet] - 10https://gerrit.wikimedia.org/r/280933 [15:26:20] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: puppet fail [15:26:27] 6Operations, 10Parsoid, 6Services: Switch Parsoid to Jessie and Node 4.2 - https://phabricator.wikimedia.org/T125017#2169338 (10hashar) [15:30:44] (03CR) 10Jgreen: [C: 031] update the DNS record for benefactors.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/280637 (https://phabricator.wikimedia.org/T130937) (owner: 10Mschon) [15:31:41] (03PS4) 10Ottomata: Create eventlogging::deployment::target define that abstracts scap::target for eventlogging targets [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [15:32:32] (03PS2) 10Giuseppe Lavagetto: Log all write activity to an irc bot [software/conftool] - 10https://gerrit.wikimedia.org/r/280843 [15:32:34] (03PS9) 
10Giuseppe Lavagetto: Add select mode [software/conftool] - 10https://gerrit.wikimedia.org/r/278552 (https://phabricator.wikimedia.org/T128199) [15:33:00] (03CR) 10Dereckson: [C: 031] Update project logo for ast.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280445 (https://phabricator.wikimedia.org/T131247) (owner: 10MarcoAurelio) [15:34:52] (03CR) 10Giuseppe Lavagetto: [C: 032] Add select mode [software/conftool] - 10https://gerrit.wikimedia.org/r/278552 (https://phabricator.wikimedia.org/T128199) (owner: 10Giuseppe Lavagetto) [15:37:09] !log installing graphite2002 [15:37:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:39:30] 6Operations, 10ops-codfw, 13Patch-For-Review: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2169370 (10Papaul) [15:51:41] (03PS1) 10Jcrespo: [WIP]New user for prometheus monitoring [puppet] - 10https://gerrit.wikimedia.org/r/280939 (https://phabricator.wikimedia.org/T128185) [15:53:50] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:03:21] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 68.97% of data above the critical threshold [5000000.0] [16:10:50] PROBLEM - puppet last run on db2040 is CRITICAL: CRITICAL: Puppet has 1 failures [16:20:45] (03PS5) 10Ottomata: [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers Create eventlogging::deployment::target define that abstracts scap::target for eventlogging targets [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [16:23:32] (03PS2) 10ArielGlenn: dumps: move cron related files to files/cron [puppet] - 10https://gerrit.wikimedia.org/r/280908 [16:23:46] (03Abandoned) 10Ottomata: [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280771 
(https://phabricator.wikimedia.org/T118772) (owner: 10Ottomata) [16:24:32] (03PS6) 10Ottomata: [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers Create eventlogging::deployment::target define that abstracts scap::target for eventlogging targets [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [16:25:20] (03CR) 10ArielGlenn: [C: 032] dumps: move cron related files to files/cron [puppet] - 10https://gerrit.wikimedia.org/r/280908 (owner: 10ArielGlenn) [16:25:54] 6Operations, 10Continuous-Integration-Infrastructure, 6Services: Package npm 2.14 - https://phabricator.wikimedia.org/T124474#2169479 (10Paladox) We can always use http://packages.ubuntu.com/xenial/npm which has a deb package. And since Ubuntu is based on debian it should work. @Hashar and @Krinkle what do... [16:28:11] 6Operations, 10Continuous-Integration-Infrastructure, 6Services: Package npm 2.14 - https://phabricator.wikimedia.org/T124474#2169481 (10Paladox) Or we can use https://github.com/nodejs/node/tree/master/deps/npm and add it in the deps/npm folder. 
[16:30:26] (03PS2) 10Dzahn: Add The Ash Tree to fr.planet [puppet] - 10https://gerrit.wikimedia.org/r/280933 (owner: 10Dereckson) [16:31:21] (03CR) 10Dzahn: [C: 032] Add The Ash Tree to fr.planet [puppet] - 10https://gerrit.wikimedia.org/r/280933 (owner: 10Dereckson) [16:33:14] (03PS2) 10Dzahn: Add ferm service for rsyslog instance on netmon1001 [puppet] - 10https://gerrit.wikimedia.org/r/280834 (https://phabricator.wikimedia.org/T105410) (owner: 10Muehlenhoff) [16:34:15] (03PS3) 10Dzahn: Add ferm service for rsyslog instance on netmon1001 [puppet] - 10https://gerrit.wikimedia.org/r/280834 (https://phabricator.wikimedia.org/T105410) (owner: 10Muehlenhoff) [16:34:44] (03CR) 10Dzahn: [C: 032] "confirmed - udp/514 rsyslogd" [puppet] - 10https://gerrit.wikimedia.org/r/280834 (https://phabricator.wikimedia.org/T105410) (owner: 10Muehlenhoff) [16:36:14] 6Operations, 10ops-codfw, 13Patch-For-Review: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2169493 (10RobH) We didn't get a task in for this, so I'll create one shortly. Updating from IRC chat with Filippo: from @godog's POV anything that gives >= 2TB usable is fine we'll... 
[16:37:52] (03PS1) 10ArielGlenn: dumsp: move cron related templates into templates/cron subdir [puppet] - 10https://gerrit.wikimedia.org/r/280944 [16:38:00] RECOVERY - puppet last run on db2040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:39:30] PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: puppet fail [16:44:41] (03PS2) 10ArielGlenn: dumsp: move cron related templates into templates/cron subdir [puppet] - 10https://gerrit.wikimedia.org/r/280944 [16:45:52] (03CR) 10ArielGlenn: [C: 032] dumsp: move cron related templates into templates/cron subdir [puppet] - 10https://gerrit.wikimedia.org/r/280944 (owner: 10ArielGlenn) [16:54:00] RECOVERY - Kafka Broker Replica Max Lag on kafka1020 is OK: OK: Less than 50.00% above the threshold [1000000.0] [16:54:34] (03CR) 10Dzahn: [C: 032] "i'm not sure why this deprecated class is on netmon1001. but the change looks good nevertheless. confirmed those ports are used by gmetad," [puppet] - 10https://gerrit.wikimedia.org/r/280817 (owner: 10Muehlenhoff) [16:54:59] (03PS2) 10Dzahn: Add ferm rules for ganglia::deprecated::collector [puppet] - 10https://gerrit.wikimedia.org/r/280817 (owner: 10Muehlenhoff) [16:55:39] 6Operations, 10ops-codfw, 13Patch-For-Review: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2169612 (10fgiunchedi) small correction on the ~850GB figure I gave there: here's the current usage ```lines=10 graphite1001:/var/lib/carbon/whisper$ du -hcs * | grep G | sort -rn 894G... 
[17:01:56] (03PS1) 10ArielGlenn: dumps: move dump-related templates to templates/dumps [puppet] - 10https://gerrit.wikimedia.org/r/280946 [17:04:08] (03PS7) 10Ottomata: [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [17:05:51] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) (owner: 10Ottomata) [17:06:41] RECOVERY - puppet last run on cp3037 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [17:07:08] (03PS8) 10Ottomata: [WIP] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [17:07:15] (03PS2) 10Dzahn: Add ferm rules for torrus [puppet] - 10https://gerrit.wikimedia.org/r/280818 (owner: 10Muehlenhoff) [17:07:30] (03CR) 10Dzahn: [C: 032] Add ferm rules for torrus [puppet] - 10https://gerrit.wikimedia.org/r/280818 (owner: 10Muehlenhoff) [17:08:47] 6Operations, 13Patch-For-Review: Ferm rules for netmon1001 - https://phabricator.wikimedia.org/T105410#2169641 (10Dzahn) merged additional changes by @Muehlenhoff https://gerrit.wikimedia.org/r/#/c/280817/ https://gerrit.wikimedia.org/r/#/c/280818/1 [17:09:58] (03PS1) 10Jcrespo: [WIP]Sharing draft of automatization framework to share mediawiki parsing [software] - 10https://gerrit.wikimedia.org/r/280947 [17:11:03] (03PS2) 10ArielGlenn: dumps: move dump-related templates to templates/dumps [puppet] - 10https://gerrit.wikimedia.org/r/280946 [17:11:29] 6Operations, 10ops-codfw, 13Patch-For-Review: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2151480 (10RobH) [17:12:01] !log starting mobileapps deploy (retry of yesterdays which did not 
complete) [17:12:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:12:41] (03CR) 10ArielGlenn: [C: 032] dumps: move dump-related templates to templates/dumps [puppet] - 10https://gerrit.wikimedia.org/r/280946 (owner: 10ArielGlenn) [17:16:14] 6Operations, 10Ops-Access-Requests: Grant reedy access to librenms - https://phabricator.wikimedia.org/T131252#2161064 (10RobH) I see an option to add users via the webgui. It is my understanding that anyone who can login can also add other users. I haven't tried, as I am not sure of the approval's process f... [17:21:37] (03CR) 10Ottomata: "Minor nit:" [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [17:26:33] hey, ops: before TimStarling went to sleep he told me: "(02:25:18 AM) TimStarling: get someone to give you root access to the labs instance if you don't have it already" wrt to the telnet.wmflabs.org machine. [17:26:59] how would i go about doing that? what's the interface for access control to labs instances (which I presume this is?) [17:27:26] cscott: it means being an admin in that project [17:27:31] cscott: i think i can do that.. hold on [17:28:11] I can probably do it too [17:28:12] the interface is wikitech.wikimedia.org [17:29:14] Shows no instances for telnet... [17:29:57] yes.. eh.. where is it then [17:30:03] i know he said he created a new project [17:30:13] and the instance "telnet2" [17:30:37] https://wikitech.wikimedia.org/wiki/Nova_Resource:Telnet2.telnet.eqiad.wmflabs [17:30:49] (03PS1) 10ArielGlenn: dumps: update all comments in files/templates with puppet location [puppet] - 10https://gerrit.wikimedia.org/r/280948 [17:30:54] lets check the wiki RC [17:30:59] ok [17:30:59] 6Operations, 10Ops-Access-Requests: Grant reedy access to librenms - https://phabricator.wikimedia.org/T131252#2161064 (10yuvipanda) > Adding Reedy gets my +1 though. 
If no one objects by Monday (April 4th), it will have been 3 business days and I wouldn't feel badly adding him then. I think this is good enou... [17:31:46] I've tried logging out and in again, twice [17:31:52] Reedy: eh, and i still dont see the instance.. labs bug? [17:31:56] sigh [17:31:59] 6Operations, 10Ops-Access-Requests: global root access for gilles - https://phabricator.wikimedia.org/T130910#2169709 (10RobH) 5Open>3declined It seems that the outcome of the operations meeting didn't make it onto this task. I'll relay the outcome of the meeting, please don't shoot the messenger. If any... [17:32:06] Yeah, looks like it [17:32:23] YuviPanda is project admin already [17:32:57] i see. on https://wikitech.wikimedia.org/wiki/Nova_Resource:Telnet the 'admins' get root and the 'members' get regular login privs? [17:33:06] YuviPanda: could i be an admin, please? [17:33:14] Added cscott to telnet. [17:33:27] Only members? [17:33:27] cscott: I think mutante just added you :) [17:33:29] groovy, let's check w/ the old ssh. [17:33:33] now you are a user.. [17:33:34] everyone gets root by default [17:33:46] admins get to add other people / create instances / etc [17:33:54] oic. see, i'm learning stuff! [17:33:55] Added Cscott to projectadmin. [17:33:59] now you are an admin [17:34:16] now i just have to find where tim put the actual code [17:34:38] Reedy: we dont see the instance becaues we are not members ourselves, i think [17:34:38] ah, /srv/wikipedia-telnet. that makes sense. [17:34:48] Reedy: we'd have to first add ourselves [17:36:27] tim wants me to use upstart, according to the USE-UPSTART file he left... [17:36:42] so if i start it with `start wikipedia-telnet`, how do I *restart* it? [17:36:53] restart wikipedia-telnet, i guess? [17:37:35] hey, it works! yay, squashing april fool's bugs. 
[17:39:57] (03CR) 10ArielGlenn: [C: 032] dumps: update all comments in files/templates with puppet location [puppet] - 10https://gerrit.wikimedia.org/r/280948 (owner: 10ArielGlenn) [17:40:43] :) [17:45:54] !log mobileapps deployed 66f8dac [17:46:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:46:03] gwicke: ^ [17:46:26] cscott: of course it works, we already tried it during a GSoC proposal some months ago [17:48:04] cscott: there are some bugs in your module when we use strange characters by the way [17:48:21] Dereckson: can you give an example? [17:50:26] Sure, let me find again the #wikipedia-fr log relevant part. [17:50:26] hackernews complains that there's no AAAA RR for telnet.wmflabs.org [17:50:39] mutante: do we have IPv6 routing to labs instances? [17:53:13] Doesn't look like we d [17:53:14] o [17:54:00] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [17:54:11] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [17:55:05] bearND: thanks, looking good! [17:59:50] cscott: 13:46 <+sam> it looks like it's encoding the character « ┏ » badly; hmm, %EF%BF%BD == replacement character [18:01:20] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:01:22] cscott: a request with one of these two UTF 8 characters led to the inability to do further requests in the same session [18:01:31] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:02:10] Dereckson: the angle double quotes? or the character between them (which my IRC client is rendering as an ascii art top-left corner glyph) [18:02:38] the ┏ [18:02:56] is it supposed to look like a top-left-corner ? 
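(Editor's note: the %EF%BF%BD observation above can be reproduced with a short sketch. It only illustrates the encodings involved; where the telnet module actually mangles the character is not stated in the log, so the ASCII-decoding step below is an assumed failure mode, not a diagnosis.)

```python
from urllib.parse import quote, unquote

BOX_CORNER = "\u250f"    # the box-drawing corner character from the report
REPLACEMENT = "\ufffd"   # U+FFFD REPLACEMENT CHARACTER

# Correct percent-encoding of each character's UTF-8 bytes.
corner_encoded = quote(BOX_CORNER)
replacement_encoded = quote(REPLACEMENT)

# Assumed failure mode: UTF-8 bytes decoded with a codec that cannot
# represent them, with errors replaced instead of raised.
mangled = BOX_CORNER.encode("utf-8").decode("ascii", errors="replace")

if __name__ == "__main__":
    print(corner_encoded)               # what a correct request would carry
    print(replacement_encoded)          # what was seen in the report
    print(quote(mangled))               # what the assumed failure mode produces
    print(unquote("%EF%BF%BD") == REPLACEMENT)
```

If a request carries %EF%BF%BD rather than %E2%94%8F, the original character was already lost before the URL was built, which would fit a decoding step that substitutes U+FFFD for bytes it cannot interpret.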
[18:03:03] it does on my screen too [18:03:19] is it breaking when it's in a title, or when it's in the content of the article? [18:03:37] in a title if I remember well [18:06:20] (03PS1) 10GWicke: RESTBase: Increase log sampling rates [puppet] - 10https://gerrit.wikimedia.org/r/280951 [18:20:41] PROBLEM - puppet last run on mw2168 is CRITICAL: CRITICAL: puppet fail [18:24:36] (03PS9) 10Ottomata: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [18:25:52] (03CR) 10jenkins-bot: [V: 04-1] Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [18:26:10] (03PS2) 10Reedy: Allow wmf-config/throttle.php to be lenient on ip/IP typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280865 (https://phabricator.wikimedia.org/T131469) (owner: 10Dereckson) [18:29:28] (03PS1) 10Dzahn: install: update MAC address bast3001 from amslvs2 [puppet] - 10https://gerrit.wikimedia.org/r/280955 (https://phabricator.wikimedia.org/T123712) [18:30:55] (03CR) 10Dzahn: [C: 032] "racadm getsysinfo" [puppet] - 10https://gerrit.wikimedia.org/r/280955 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [18:31:07] (03PS27) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [18:33:13] (03PS28) 10Ladsgroup: [WIP] Scap3 deployment configurations for ores [puppet] - 10https://gerrit.wikimedia.org/r/280403 [18:35:27] (03CR) 10Ladsgroup: "PS27 is rebase only" [puppet] - 10https://gerrit.wikimedia.org/r/280403 (owner: 10Ladsgroup) [18:44:48] (03PS1) 10Krinkle: contint: Upgrade npm to v2.15.2 [puppet] - 10https://gerrit.wikimedia.org/r/280956 [18:46:28] (03CR) 10Krinkle: [C: 031] "Cherry-picked to integration-puppetmaster.integration.eqiad.wmflabs." 
[puppet] - 10https://gerrit.wikimedia.org/r/280956 (owner: 10Krinkle) [18:50:39] RECOVERY - puppet last run on mw2168 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:54:17] (03PS10) 10Ottomata: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [18:55:39] (03CR) 10jenkins-bot: [V: 04-1] Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [18:56:56] (03CR) 10ArielGlenn: "WHile this patch doesn't have this issue, have folks thought about how to move scap.cfg out of their software repos and into puppet? That " [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [18:58:47] hehehe [18:58:49] hey apergos [18:58:55] hey [18:59:05] I don't mean to derail you, that's why I just left a drive-by comment [18:59:14] https://gerrit.wikimedia.org/r/#/c/279198/ [18:59:19] I'm trying to figure out how to do this "right" for dumps right at this minute [18:59:19] so [18:59:20] oops [18:59:21] wrong one [18:59:22] heh [18:59:28] https://gerrit.wikimedia.org/r/#/c/280730/8 [18:59:29] that one! 
[18:59:47] check out https://gerrit.wikimedia.org/r/#/c/280730/8/modules/scap/manifests/source.pp [19:00:14] hm yep I will [19:01:21] got more patches coming in on that [19:02:40] uh huh [19:02:43] (03PS9) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [19:02:45] (03PS11) 10Ottomata: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [19:02:45] basically [19:03:10] you'd add stuff like this [19:03:10] https://gerrit.wikimedia.org/r/#/c/280730/9/hieradata/role/common/deployment/server.yaml [19:03:17] and if I need a given user to own the repo at the target end (and and the local end since that user will also be the deploy user) how does that work? [19:03:35] yeah I've been looking at the hiera changes especially [19:03:50] the key holder stuff seems like the obvious thing to move into hiera regardless [19:03:51] apergos: that works as usual with scap::target [19:03:54] deploy_user [19:04:01] (03CR) 10jenkins-bot: [V: 04-1] Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [19:04:04] (03PS1) 10Dzahn: install: update MAC for bast3001 from slauerhoff [puppet] - 10https://gerrit.wikimedia.org/r/280959 (https://phabricator.wikimedia.org/T123712) [19:04:06] (03PS10) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [19:04:08] (03PS9) 10Reedy: Remove $wgCopyrightIcon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: 10Florianschmidtwelzow) [19:04:09] yes but on the target end I can include a decl for th euser [19:04:13] on the source end? 
[19:04:15] (03CR) 10jenkins-bot: [V: 04-1] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) (owner: 10Ottomata) [19:04:24] apergos: not sure i understand [19:04:27] what you are asking [19:04:28] well on tin [19:04:37] /srv/deployment/dumps/dumps [19:04:41] owned by [19:04:44] ?? [19:04:45] apergos: actually, can we chat in releng? i'm talking with those guys in there [19:04:49] ah sorry [19:04:50] sure [19:04:56] (03CR) 10Dzahn: [C: 032] install: update MAC for bast3001 from slauerhoff [puppet] - 10https://gerrit.wikimedia.org/r/280959 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [19:05:38] (03CR) 10jenkins-bot: [V: 04-1] Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) (owner: 10Ottomata) [19:06:11] 6Operations, 10Continuous-Integration-Infrastructure, 6Labs, 10Packaging: Update phantomjs to 2.1.1 on trusty - https://phabricator.wikimedia.org/T130940#2170134 (10Krinkle) 5Open>3declined There is no need for this. CI packages should be installed locally on a per-needed basis. This is just the need o... [19:06:51] (03PS10) 10Reedy: Remove $wgCopyrightIcon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: 10Florianschmidtwelzow) [19:08:05] (03CR) 10Reedy: [C: 032] Remove $wgCopyrightIcon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: 10Florianschmidtwelzow) [19:08:42] (03Merged) 10jenkins-bot: Remove $wgCopyrightIcon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: 10Florianschmidtwelzow) [19:09:21] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. 
[19:09:30] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [19:09:33] !log reedy@tin Synchronized wmf-config/InitialiseSettings-labs.php: consistency (duration: 00m 28s) [19:09:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:10:31] !log reedy@tin Synchronized wmf-config/CommonSettings.php: Replace old copyright image config (duration: 00m 32s) [19:10:33] Reedy: there is also https://gerrit.wikimedia.org/r/#/c/279934/ as a labs change waiting, but there is an unanswered question in Gerrit comments [19:10:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:10:54] !log booting slauerhoff into PXE, installing as new bast3001 [19:11:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:12:38] Dereckson: I'll ask FlorianSW__ as he's sat next to me :P [19:13:37] jdlrobson said that we talk about that, but he didn't so far :P Let me find him and try to give him a kick (@jdlrobson: ;)) [19:13:51] Dereckson, Reedy ^ [19:13:53] hhheeyy [19:14:39] oh FlorianSW__ Dereckson Reedy it's beta labs only so no problem enabling [19:14:50] i just don't want us enabling on production [19:14:55] jdlrobson: Famous last words :D [19:15:12] (03CR) 10Jdlrobson: [C: 031] "provided we do not roll this out to production just yet. i think this is fine." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/279934 (https://phabricator.wikimedia.org/T113243) (owner: 10Florianschmidtwelzow) [19:15:24] pushing toproduction is up to JonKatz [19:15:47] jdlrobson: sure, that's only beta labs :) [19:15:59] (03PS2) 10Reedy: Enable mobile link preview (popups) on beta labs wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279934 (https://phabricator.wikimedia.org/T113243) (owner: 10Florianschmidtwelzow) [19:16:04] (03CR) 10Reedy: [C: 032] Enable mobile link preview (popups) on beta labs wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279934 (https://phabricator.wikimedia.org/T113243) (owner: 10Florianschmidtwelzow) [19:16:31] btw. I haven't found jdlrobson :/ [19:16:35] (03Merged) 10jenkins-bot: Enable mobile link preview (popups) on beta labs wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279934 (https://phabricator.wikimedia.org/T113243) (owner: 10Florianschmidtwelzow) [19:16:53] thanks Reedy :) [19:17:18] whaats up with jenkins and puppet [19:17:19] ? [19:17:41] 19:05:31 F/usr/bin/env: php: No such file or directory [19:17:56] gj [19:17:59] OH [19:18:01] it might be me [19:18:02] whoops [19:18:26] hmm noooo [19:18:38] oh yes [19:20:03] (03PS12) 10Ottomata: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [19:20:14] !log reedy@tin Synchronized wmf-config/InitialiseSettings-labs.php: noop change for labs (duration: 00m 29s) [19:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:20:22] 7Puppet, 10Scap3 (Scap3-Adoption-Phase1): move scap3 keyholder configuration to hiera to avoid proliferation of more*::deployment::source classes - https://phabricator.wikimedia.org/T130419#2135532 (10ArielGlenn) https://gerrit.wikimedia.org/r/#/c/279198/ this is in progress. 
[19:20:43] 6Operations, 13Patch-For-Review: Reimage hooft with jessie and rename to bast3001 - https://phabricator.wikimedia.org/T123712#2170189 (10Dzahn) a:3Dzahn [19:21:12] 6Operations, 10ops-esams: update hostname label on system bast3001, was slauerhoff - https://phabricator.wikimedia.org/T131543#2170195 (10RobH) [19:21:23] (03PS11) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [19:21:25] (03PS13) 10Ottomata: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [19:28:24] (03PS14) 1020after4: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (https://phabricator.wikimedia.org/T130419) [19:30:01] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 63.33% of data above the critical threshold [5000000.0] [19:31:01] (03CR) 10Ottomata: [C: 031] Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (https://phabricator.wikimedia.org/T130419) (owner: 1020after4) [19:31:15] (03PS12) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [19:31:17] (03CR) 10Lydia Pintscher: [C: 031] "Discussed it with James at the hackathon. I think this is fine. If we run into issues we can adjust it." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280003 (owner: 10Jforrester) [19:32:24] (03CR) 10Legoktm: "Are people editing too fast?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/280002 (owner: 10Jforrester) [19:45:21] (03CR) 1020after4: [C: 031] "puppetcompiler results http://puppet-compiler.wmflabs.org/2282/" [puppet] - 10https://gerrit.wikimedia.org/r/279198 (https://phabricator.wikimedia.org/T130419) (owner: 1020after4) [19:46:21] (03CR) 10Krinkle: [C: 04-1] "50 per 10 minutes is lower than the rate limit for anons (8 per minute)." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280002 (owner: 10Jforrester) [19:46:59] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [5000000.0] [19:47:25] (03CR) 10Faidon Liambotis: [C: 04-1] update the DNS record for benefactors.wikimedia.org (032 comments) [dns] - 10https://gerrit.wikimedia.org/r/280637 (https://phabricator.wikimedia.org/T130937) (owner: 10Mschon) [19:55:59] PROBLEM - Host ps1-b8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:56:31] (03PS15) 10Ottomata: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [19:57:21] PROBLEM - Host ps1-c5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:21] PROBLEM - Host ps1-c7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:21] PROBLEM - Host ps1-c3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:21] PROBLEM - Host ps1-b1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:22] PROBLEM - Host ps1-a5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:22] PROBLEM - Host ps1-a3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:22] PROBLEM - Host ps1-a8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:22] PROBLEM - Host ps1-b2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:23] PROBLEM - Host ps1-c2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:23] PROBLEM - Host ps1-d7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:23] PROBLEM - Host ps1-b6-eqiad is DOWN: PING CRITICAL 
- Packet loss = 100% [19:57:24] PROBLEM - Host ps1-d2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:34] (03PS16) 10Ottomata: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [19:57:39] PROBLEM - Host ps1-d3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:39] PROBLEM - Host ps1-a4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:39] PROBLEM - Host ps1-b3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:39] PROBLEM - Host ps1-b4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:39] PROBLEM - Host ps1-c8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:39] PROBLEM - Host ps1-c4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:39] PROBLEM - Host ps1-a1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:40] PROBLEM - Host ps1-a2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:40] PROBLEM - Host ps1-b7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:41] PROBLEM - Host ps1-d8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:41] PROBLEM - Host ps1-d5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:59] PROBLEM - Host ps1-d6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:59] PROBLEM - Host ps1-a7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:57:59] PROBLEM - Host ps1-b5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:58:09] PROBLEM - Host ps1-c1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:58:09] PROBLEM - Host ps1-d1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [19:58:19] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100% [19:58:20] PROBLEM - Host mr1-eqiad is DOWN: CRITICAL - Network Unreachable (208.80.154.199) [20:00:27] (03PS17) 10Ottomata: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [20:01:20] RECOVERY - Host ps1-d1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.98 ms [20:01:20] 
RECOVERY - Host ps1-d3-eqiad is UP: PING OK - Packet loss = 0%, RTA = 4.36 ms [20:01:20] RECOVERY - Host ps1-c8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.24 ms [20:01:20] RECOVERY - Host ps1-b7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.76 ms [20:01:20] RECOVERY - Host ps1-c1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.49 ms [20:01:20] RECOVERY - Host ps1-d8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.49 ms [20:01:20] RECOVERY - Host ps1-b4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.60 ms [20:01:21] RECOVERY - Host ps1-d5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 5.41 ms [20:01:21] RECOVERY - Host ps1-c5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.24 ms [20:01:22] RECOVERY - Host ps1-a7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.22 ms [20:01:22] RECOVERY - Host ps1-c7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.70 ms [20:01:23] RECOVERY - Host ps1-c4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.26 ms [20:01:23] RECOVERY - Host ps1-a3-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.38 ms [20:01:24] RECOVERY - Host ps1-c6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.56 ms [20:01:39] RECOVERY - Host ps1-d6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.59 ms [20:01:40] RECOVERY - Host ps1-b1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.57 ms [20:01:40] RECOVERY - Host ps1-d4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 4.28 ms [20:02:19] RECOVERY - Host mr1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.92 ms [20:03:06] what was that [20:03:21] maintenance at eqiad? 
[20:04:15] (03PS18) 10Ottomata: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [20:04:20] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 26.40 ms [20:08:17] (03PS13) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [20:14:59] (03PS14) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [20:18:51] (03PS1) 10Hashar: contint: stop installing grunt-cli [puppet] - 10https://gerrit.wikimedia.org/r/280974 (https://phabricator.wikimedia.org/T124474) [20:19:29] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0] [20:20:31] (03PS2) 10Dzahn: network.pp: new IP bast3001 & drop former hooft SLAAC addr [puppet] - 10https://gerrit.wikimedia.org/r/280506 (https://phabricator.wikimedia.org/T123712) [20:22:08] (03PS3) 10Dzahn: network.pp: new IP bast3001 & drop former hooft SLAAC addr [puppet] - 10https://gerrit.wikimedia.org/r/280506 (https://phabricator.wikimedia.org/T123712) [20:22:13] (03CR) 10Paladox: [C: 031] contint: stop installing grunt-cli [puppet] - 10https://gerrit.wikimedia.org/r/280974 (https://phabricator.wikimedia.org/T124474) (owner: 10Hashar) [20:22:56] (03CR) 10Paladox: [C: 031] contint: Upgrade npm to v2.15.2 [puppet] - 10https://gerrit.wikimedia.org/r/280956 (owner: 10Krinkle) [20:23:49] (03PS4) 10Dzahn: network.pp: new IP bast3001 & drop former hooft SLAAC addr [puppet] - 10https://gerrit.wikimedia.org/r/280506 (https://phabricator.wikimedia.org/T123712) [20:24:55] (03PS5) 10Dzahn: network.pp: new IP bast3001 & drop former hooft SLAAC addr [puppet] - 10https://gerrit.wikimedia.org/r/280506 (https://phabricator.wikimedia.org/T123712) 
[20:25:46] (03CR) 10Krinkle: [C: 031] contint: stop installing grunt-cli [puppet] - 10https://gerrit.wikimedia.org/r/280974 (https://phabricator.wikimedia.org/T124474) (owner: 10Hashar) [20:26:18] (03CR) 10Dzahn: [C: 032] "bast3001.wikimedia.org has address 91.198.174.112" [puppet] - 10https://gerrit.wikimedia.org/r/280506 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [20:27:37] (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/280506 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [20:28:18] (03PS19) 10Ottomata: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [20:31:54] (03PS15) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [20:33:56] (03CR) 10Hashar: "Announced on wikitech-l and QA mailing lists." [puppet] - 10https://gerrit.wikimedia.org/r/280974 (https://phabricator.wikimedia.org/T124474) (owner: 10Hashar) [20:34:01] (03PS3) 10Dzahn: dhcp: update install-server IP for esams subnets [puppet] - 10https://gerrit.wikimedia.org/r/280798 (https://phabricator.wikimedia.org/T123712) [20:35:10] (03CR) 10Dzahn: [C: 032] dhcp: update install-server IP for esams subnets [puppet] - 10https://gerrit.wikimedia.org/r/280798 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [20:38:08] (03PS1) 10Dzahn: site.pp: remove hooft from puppet [puppet] - 10https://gerrit.wikimedia.org/r/281029 (https://phabricator.wikimedia.org/T123712) [20:38:50] (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/280798 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [20:39:42] (03PS1) 10Bmansurov: Do not redirect Samsung Smart TV 2015 and newer to mobile [puppet] - 10https://gerrit.wikimedia.org/r/281031 (https://phabricator.wikimedia.org/T127021) [20:39:50] (03CR) 10BryanDavis: [C: 031] RESTBase: Increase 
log sampling rates [puppet] - 10https://gerrit.wikimedia.org/r/280951 (owner: 10GWicke) [20:41:28] YuviPanda: do you know if hiera lookup in role/common works in labs? [20:41:58] ottomata: nope, 'role/' doesn't work at all in labs unfortunately [20:42:04] ahhhh [20:42:06] ok [20:43:20] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0] [20:43:37] (03PS1) 10Dr0ptp4kt: Do not redirect Samsung Smart TV 2015 and later to mobile [puppet] - 10https://gerrit.wikimedia.org/r/281032 (https://phabricator.wikimedia.org/T127021) [20:45:04] (03Abandoned) 10Dr0ptp4kt: Do not redirect Samsung Smart TV 2015 and later to mobile [puppet] - 10https://gerrit.wikimedia.org/r/281032 (https://phabricator.wikimedia.org/T127021) (owner: 10Dr0ptp4kt) [20:45:20] (03PS1) 10Andrew Bogott: Keystone totp: Allow for 60 seconds of clock drift [puppet] - 10https://gerrit.wikimedia.org/r/281033 [20:45:28] csteipp, chasemp, can I get some +1s for ^ ? [20:45:45] (03PS16) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [20:46:07] (03CR) 10Rush: [C: 031] "Thanks, this was especially confusing as it was acceptable on wikitech but not horizon when tested" [puppet] - 10https://gerrit.wikimedia.org/r/281033 (owner: 10Andrew Bogott) [20:50:20] (03PS17) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [20:52:16] (03CR) 10CSteipp: [C: 031] "I think that's right. That will give 2*30=60 seconds forward / backwards." [puppet] - 10https://gerrit.wikimedia.org/r/281033 (owner: 10Andrew Bogott) [20:52:58] may I get a couple contint patches merged in please? They got cherry picked on the CI puppet master already. 
That is to bump the npm version we install and get rid of an old package (grunt-cli) https://gerrit.wikimedia.org/r/#/c/280956/ and then https://gerrit.wikimedia.org/r/#/c/280974/ :-) [20:53:21] none of the class are applied on prod hosts, it is solely fo the CI labs instances [20:53:24] (03PS1) 10Dzahn: ganglia: switch esams aggregator to bast3001 [puppet] - 10https://gerrit.wikimedia.org/r/281035 (https://phabricator.wikimedia.org/T123712) [20:53:44] (03CR) 10Andrew Bogott: [C: 032] Keystone totp: Allow for 60 seconds of clock drift [puppet] - 10https://gerrit.wikimedia.org/r/281033 (owner: 10Andrew Bogott) [20:54:23] hashar: sure man I got ya [20:54:28] (03PS2) 10Rush: contint: Upgrade npm to v2.15.2 [puppet] - 10https://gerrit.wikimedia.org/r/280956 (owner: 10Krinkle) [20:54:29] \O/ [20:54:34] andrewbogott: can you give dpatrick2 some permissions on labtestwiki or whatever it's called plase [20:54:46] chasemp: awesome :} [20:55:03] (03PS2) 10Rush: contint: stop installing grunt-cli [puppet] - 10https://gerrit.wikimedia.org/r/280974 (https://phabricator.wikimedia.org/T124474) (owner: 10Hashar) [20:55:09] (03PS2) 10Yuvipanda: RESTBase: Increase log sampling rates [puppet] - 10https://gerrit.wikimedia.org/r/280951 (owner: 10GWicke) [20:55:13] (03CR) 10Rush: [C: 032 V: 032] contint: Upgrade npm to v2.15.2 [puppet] - 10https://gerrit.wikimedia.org/r/280956 (owner: 10Krinkle) [20:55:17] (03PS2) 10Dzahn: ganglia: switch esams aggregator to bast3001 [puppet] - 10https://gerrit.wikimedia.org/r/281035 (https://phabricator.wikimedia.org/T123712) [20:55:22] (03CR) 10Rush: [C: 032 V: 032] contint: stop installing grunt-cli [puppet] - 10https://gerrit.wikimedia.org/r/280974 (https://phabricator.wikimedia.org/T124474) (owner: 10Hashar) [20:55:37] (03CR) 10Yuvipanda: [C: 032 V: 032] RESTBase: Increase log sampling rates [puppet] - 10https://gerrit.wikimedia.org/r/280951 (owner: 10GWicke) [20:55:43] chasemp: thank you very much! 
[20:55:47] np [20:55:49] (03PS3) 10Yuvipanda: RESTBase: Increase log sampling rates [puppet] - 10https://gerrit.wikimedia.org/r/280951 (owner: 10GWicke) [20:56:03] (03CR) 10Yuvipanda: [V: 032] RESTBase: Increase log sampling rates [puppet] - 10https://gerrit.wikimedia.org/r/280951 (owner: 10GWicke) [20:56:20] (03PS18) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) [20:57:08] !log bast3001 - sign puppet certs, salt keys [20:57:11] Reedy: maybe… does Darian have root on the cluster in general? [20:57:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:57:31] Or deployment? [20:59:02] yeah he's in deployment andrewbogott and analytics groups [20:59:13] but no root [20:59:17] ok, I think deployment means he has login on labtestweb2001 already [20:59:26] (03CR) 10Ottomata: "Tested in deployment-prep...IT WORKS!" [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) (owner: 10Ottomata) [20:59:42] To give root on labtest* will require a review during the Ops meeting on monday. Maybe open a ticket for that if that's what y'all need. [20:59:50] (03CR) 10GWicke: "> what concerns me, though, is the impact of moving these request-mangles of the hostname above the mobile_redirect code in general." 
[puppet] - 10https://gerrit.wikimedia.org/r/279564 (https://phabricator.wikimedia.org/T130904) (owner: 10GWicke) [20:59:52] (And, tell me about it so I can raise it in the meeting) [21:00:13] (03CR) 10Ottomata: Add new scap::source define to ease bootstrapping of repositories on deploy servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/280730 (https://phabricator.wikimedia.org/T118772) (owner: 10Ottomata) [21:00:20] andrewbogott: He doesn't need that [21:00:28] andrewbogott: He just needs cloudadmin or whatever on the wiki [21:00:35] So he can test his 2FA fix in .19 [21:00:45] ... [21:01:11] There are lots of things here, none of them are 'cloudadmin' I don't think :) [21:01:20] He needs to edit mediawiki code and the database, right? [21:01:38] db no [21:01:42] mw code, yeah [21:01:52] but he needs the elevated rights on the wiki to test it [21:01:55] ok, so, I think that as a deployer he should have that already, on labtestweb2001.wikimedia.org [21:01:58] let me know if not [21:02:12] ah, right, because cloudadmin requires 2fa... [21:02:18] on-wiki username? [21:03:06] Dpatrick2 [21:03:39] ah, you said that already, sorry [21:03:40] done [21:06:18] heh, thanks [21:12:39] PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors [21:14:01] Krenair: deployment-upload is back up but also horrifying [21:14:23] yep [21:14:41] uhm.. broken icinga config? [21:14:44] recent changes? [21:14:50] Krenair: also when I hit the url mentioned in that bug, it just redirects to commons [21:15:05] you hit / ? [21:15:12] it's supposed to do that. 
[21:15:19] yep, ok [21:16:01] uhm, looks like it's related to bast3001, on it [21:16:49] (03PS1) 10GWicke: Normalize REST API Accept headers [puppet] - 10https://gerrit.wikimedia.org/r/281042 (https://phabricator.wikimedia.org/T128040) [21:19:44] just needed a puppet run [21:22:58] RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct [21:25:59] !log hooft.esams - stop puppet, stop salt [21:26:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:26:12] (03CR) 10Dzahn: [C: 032] site.pp: remove hooft from puppet [puppet] - 10https://gerrit.wikimedia.org/r/281029 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [21:26:19] (03PS2) 10Dzahn: site.pp: remove hooft from puppet [puppet] - 10https://gerrit.wikimedia.org/r/281029 (https://phabricator.wikimedia.org/T123712) [21:30:04] (03PS3) 10Dzahn: ganglia: switch esams aggregator to bast3001 [puppet] - 10https://gerrit.wikimedia.org/r/281035 (https://phabricator.wikimedia.org/T123712) [21:30:38] PROBLEM - salt-minion processes on hooft is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [21:31:30] ACKNOWLEDGEMENT - salt-minion processes on hooft is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion daniel_zahn is killing me [21:34:11] > daniel_zahn is killing me [21:36:21] !log hooft: stopping ganglia aggregators, remove from icinga/storedconfigclean [21:36:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:37:06] (03CR) 10Dzahn: [C: 032] ganglia: switch esams aggregator to bast3001 [puppet] - 10https://gerrit.wikimedia.org/r/281035 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [21:39:41] (03CR) 10Subramanya Sastry: "How will this work for for 1.2.0 content-type that will be the default version that many clients can handle which has the old profile urls" [puppet] - 
10https://gerrit.wikimedia.org/r/281042 (https://phabricator.wikimedia.org/T128040) (owner: 10GWicke) [21:40:47] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 63.33% of data above the critical threshold [5000000.0] [21:44:02] 6Operations, 10Continuous-Integration-Infrastructure, 6Services, 13Patch-For-Review: Package npm 2.14 - https://phabricator.wikimedia.org/T124474#2170614 (10hashar) 5stalled>3Resolved a:3hashar Nodepool images now comes with npm 2.5.12. provisioned by puppet and installed from npmjs.org. ``` $ /usr/... [21:44:18] 6Operations, 10Parsoid, 6Services: Switch Parsoid to Jessie and Node 4.2 - https://phabricator.wikimedia.org/T125017#2170620 (10hashar) [21:44:32] (03CR) 10Krinkle: Remove upload7 references (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280170 (https://phabricator.wikimedia.org/T129586) (owner: 10Reedy) [21:47:34] (03CR) 10Reedy: Remove upload7 references (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280170 (https://phabricator.wikimedia.org/T129586) (owner: 10Reedy) [21:49:33] (03PS4) 10Dzahn: base: add script to generate fingerprints [puppet] - 10https://gerrit.wikimedia.org/r/280757 [21:50:01] (03PS5) 10Dzahn: base: add script to generate fingerprints [puppet] - 10https://gerrit.wikimedia.org/r/280757 [21:50:41] (03CR) 10Dzahn: [C: 032] base: add script to generate fingerprints [puppet] - 10https://gerrit.wikimedia.org/r/280757 (owner: 10Dzahn) [21:57:55] (03CR) 10GWicke: "@Subbu, it won't. Do you think this still matters by the time we roll this out, considering the small differences between 1.2.0 and 1.2.1?" 
[puppet] - 10https://gerrit.wikimedia.org/r/281042 (https://phabricator.wikimedia.org/T128040) (owner: 10GWicke) [21:59:28] Krinkle: you can now use bast3001 (https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/bast3001.wikimedia.org) [21:59:56] heh, looks like hashar has 'auto-quit at midnight' [22:00:01] not bad [22:08:25] 6Operations, 13Patch-For-Review: Reimage hooft with jessie and rename to bast3001 - https://phabricator.wikimedia.org/T123712#2170685 (10Dzahn) bast3001 is up with jessie and can be used now. ``` +---------+---------+-------------------------------------------------+ | Cipher | Algo | Fingerprint... [22:11:14] mutante: thx [22:11:34] sorry for the delay, it was a fight [22:13:37] !log hooft - shutdown [22:13:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:14:32] 6Operations, 13Patch-For-Review, 7Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2170761 (10Dzahn) [22:18:37] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 62.07% of data above the critical threshold [5000000.0] [22:30:36] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0] [22:33:10] 6Operations, 10Traffic, 10Wiki-Loves-Monuments-General, 7HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#1798950 (10JeanFred) Ok, so what needs to be done here? 
cc @Multichill [22:34:02] (03CR) 10Siebrand: [C: 031] Clarifying i18n parameters [dumps/dcat] - 10https://gerrit.wikimedia.org/r/277955 (owner: 10Lokal Profil) [22:39:12] !log rolling restart restbase to pick up https://gerrit.wikimedia.org/r/#/c/280951/ config change [22:39:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:41:22] 6Operations, 10Traffic, 10Wikimedia-Fundraising, 7HTTPS, 13Patch-For-Review: delete links.email.donate.wikimedia.org (and all other email.donate.*?) from DNS - https://phabricator.wikimedia.org/T130414#2170841 (10Dzahn) @CCogdill_WMF Sorry, i didn't get to this yet and now it's Friday again and i will b... [22:42:57] 6Operations, 10Traffic, 10Wikimedia-Fundraising, 7HTTPS, 13Patch-For-Review: delete links.email.donate.wikimedia.org (and all other email.donate.*?) from DNS - https://phabricator.wikimedia.org/T130414#2170843 (10CCogdill_WMF) No problem, thanks for keeping me updated @Dzahn. Have a nice vacation! [22:56:19] 6Operations, 10Traffic, 10Wiki-Loves-Monuments-General, 7HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2170873 (10Dzahn) @JeanFred a WMNL and a WMDE server admin need to work together to fix it [23:01:10] 6Operations: reinstall bast4001 with jessie - https://phabricator.wikimedia.org/T123674#2170880 (10Dzahn) a:5Dzahn>3None [23:02:46] 6Operations: reinstall bast4001 with jessie - https://phabricator.wikimedia.org/T123674#1935454 (10Dzahn) Just giving it to pool while i'm on vacation. If anyone wants to take it please go for it. Otherwise i'll take it back when i get back. ganglia-aggregator works on jessie now (see bast3001) so that should n... 
[23:04:38] RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 50.00% above the threshold [1000000.0] [23:05:07] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 72.41% of data above the critical threshold [5000000.0] [23:05:08] 6Operations, 13Patch-For-Review: Ferm rules for netmon1001 - https://phabricator.wikimedia.org/T105410#2170883 (10Dzahn) a:5Dzahn>3Muehlenhoff @Muehlenhoff fyi, this was the existing ticket for the netmon rules you also worked on. You said you wanted to check about the snmpwalk part which i was also wonder... [23:08:20] 6Operations: reclaim hooft to spares - https://phabricator.wikimedia.org/T131560#2170895 (10Dzahn) [23:08:54] 6Operations: reclaim hooft to spares - https://phabricator.wikimedia.org/T131560#2170910 (10Dzahn) [23:09:15] 6Operations: reclaim hooft to spares - https://phabricator.wikimedia.org/T131560#2170895 (10Dzahn) another task for wiping ... or is this included? [23:09:17] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 51.72% of data above the critical threshold [5000000.0] [23:11:31] 6Operations: Reimage hooft with jessie and rename to bast3001 - https://phabricator.wikimedia.org/T123712#2170922 (10Dzahn) [23:15:55] 6Operations: Rename 'restricted' group? - https://phabricator.wikimedia.org/T104671#1423684 (10Dzahn) restricted is historic. there used to be just 3 levels. restricted was the lower level, then "mortals" with deployer rights and finally root. There is not much reason to keep it as it. We should go through th... [23:16:57] 6Operations: Rename 'restricted' group? - https://phabricator.wikimedia.org/T104671#2170928 (10Dzahn) and re; bastion access it should definitely either give ALL bastions or NONE (and then we use bastiononly in addition). one of the 2 option, but not some random mix. 
[23:18:06] 6Operations: create a test for multicast relay - https://phabricator.wikimedia.org/T82038#2170943 (10Dzahn) @Faidon thoughts on this ticket today? [23:18:31] 6Operations, 10netops: create a test for multicast relay - https://phabricator.wikimedia.org/T82038#2170945 (10Dzahn) [23:19:56] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0] [23:24:30] (03PS1) 10Dzahn: remove hooft's production IPs [dns] - 10https://gerrit.wikimedia.org/r/281056 (https://phabricator.wikimedia.org/T131560) [23:25:28] (03PS2) 10Dzahn: remove hooft's production IPs [dns] - 10https://gerrit.wikimedia.org/r/281056 (https://phabricator.wikimedia.org/T131560) [23:26:03] (03CR) 10Dzahn: [C: 032] remove hooft's production IPs [dns] - 10https://gerrit.wikimedia.org/r/281056 (https://phabricator.wikimedia.org/T131560) (owner: 10Dzahn) [23:30:43] (03PS1) 10Dzahn: rename mgmt interface slauerhoff->bast3001 [dns] - 10https://gerrit.wikimedia.org/r/281057 (https://phabricator.wikimedia.org/T123712) [23:31:35] (03CR) 10Dzahn: [C: 04-2] "slauerhoff became bast3001 instead -> https://gerrit.wikimedia.org/r/#/c/281057/" [dns] - 10https://gerrit.wikimedia.org/r/280641 (owner: 10Faidon Liambotis) [23:32:40] (03CR) 10Dzahn: [C: 032] rename mgmt interface slauerhoff->bast3001 [dns] - 10https://gerrit.wikimedia.org/r/281057 (https://phabricator.wikimedia.org/T123712) (owner: 10Dzahn) [23:35:52] (03PS1) 10Dzahn: remove slauerhoff remnants [dns] - 10https://gerrit.wikimedia.org/r/281058 [23:37:01] (03CR) 10Dzahn: [C: 032] remove slauerhoff remnants [dns] - 10https://gerrit.wikimedia.org/r/281058 (owner: 10Dzahn) [23:39:18] 6Operations: reclaim hooft to spares - https://phabricator.wikimedia.org/T131560#2170978 (10Dzahn) [23:41:40] (03PS1) 10Yuvipanda: mattermost: Kill it! [puppet] - 10https://gerrit.wikimedia.org/r/281059 [23:41:55] (03PS2) 10Yuvipanda: mattermost: Kill it! 
[puppet] - 10https://gerrit.wikimedia.org/r/281059 [23:42:06] (03CR) 10Yuvipanda: [C: 032 V: 032] mattermost: Kill it! [puppet] - 10https://gerrit.wikimedia.org/r/281059 (owner: 10Yuvipanda) [23:43:04] 6Operations: replace bast3001 with newer hardware - https://phabricator.wikimedia.org/T131562#2170979 (10Dzahn) [23:43:50] !log krinkle@tin Synchronized php-1.27.0-wmf.19/extensions/MobileFrontend/includes/MobileFrontend.hooks.php: T131337 (duration: 00m 38s) [23:43:51] T131337: MobileFrontendSkinHooks::gradeCImageSupport() should not apply to Vector skin output - https://phabricator.wikimedia.org/T131337 [23:43:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:50:56] RECOVERY - Kafka Broker Replica Max Lag on kafka1020 is OK: OK: Less than 50.00% above the threshold [1000000.0] [23:55:01] (03PS20) 10Greg Grossmeier: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 (https://phabricator.wikimedia.org/T130419) (owner: 1020after4) [23:55:31] 6Operations, 7Graphite: Add labs graphite as a data source to grafana.wikimedia.org - https://phabricator.wikimedia.org/T131431#2171027 (10yuvipanda) 5Open>3Invalid This is already present - you can switch the datasource for each panel by clicking the 'data sources' icon (hard to find!) at the bottom right...