[01:12:29] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/12353/einsteinium.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/455248 (owner: 10Dzahn) [01:13:39] (03PS5) 10Dzahn: icinga: move Hiera calls to profile parameters [puppet] - 10https://gerrit.wikimedia.org/r/455248 [01:17:13] (03PS4) 10Dzahn: icinga: add Hieradata for icinga1001, set to passive/disabled [puppet] - 10https://gerrit.wikimedia.org/r/455264 [01:24:14] (03CR) 10Dzahn: [C: 032] icinga: add Hieradata for icinga1001, set to passive/disabled [puppet] - 10https://gerrit.wikimedia.org/r/455264 (owner: 10Dzahn) [01:31:35] (03PS2) 10Dzahn: icinga: move Hiera data from hosts to role [puppet] - 10https://gerrit.wikimedia.org/r/455262 [01:33:08] (03PS32) 10Alex Monk: Packaging stuff and readme [software/certcentral] - 10https://gerrit.wikimedia.org/r/456646 [01:34:12] (03PS3) 10Dzahn: icinga: move Hiera data from hosts to role [puppet] - 10https://gerrit.wikimedia.org/r/455262 [01:38:00] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/12355/" [puppet] - 10https://gerrit.wikimedia.org/r/455262 (owner: 10Dzahn) [01:38:30] (03PS33) 10Alex Monk: Packaging stuff and readme [software/certcentral] - 10https://gerrit.wikimedia.org/r/456646 [01:39:20] (03PS2) 10Dzahn: nagios_common: use libmonitoring-plugin-perl on stretch as libnagios-plugin-perl is deprecated [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [01:40:02] (03CR) 10jerkins-bot: [V: 04-1] nagios_common: use libmonitoring-plugin-perl on stretch as libnagios-plugin-perl is deprecated [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [01:42:52] !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.20/includes/resourceloader/: I8e8d3a2cd2cc - T201686 (duration: 01m 31s) [01:42:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:42:58] T201686: Changes to module 
dependencies are not propagating to load.php startup manifest - https://phabricator.wikimedia.org/T201686 [01:43:44] (03CR) 10Dzahn: "ok, finally found it. the actual error was hidden behind the unrelated noise. had to click on "full logs" to get everything and then searc" [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [01:44:48] (03PS3) 10Dzahn: nagios_common: use libmonitoring-plugin-perl on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [01:45:29] (03CR) 10jerkins-bot: [V: 04-1] nagios_common: use libmonitoring-plugin-perl on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [01:52:45] (03PS4) 10Dzahn: nagios_common: use libmonitoring-plugin-perl on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [01:53:29] (03CR) 10jerkins-bot: [V: 04-1] nagios_common: use libmonitoring-plugin-perl on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [01:57:33] (03PS34) 10Alex Monk: Packaging stuff and readme [software/certcentral] - 10https://gerrit.wikimedia.org/r/456646 [01:57:35] (03PS4) 10Alex Monk: Add make_account CLI script [software/certcentral] - 10https://gerrit.wikimedia.org/r/457933 [01:58:44] (03CR) 10Dzahn: "well.. now there is another error behind that and that is probably just that the tests for the monitoring module are broken?" 
[puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [02:01:30] (03PS1) 10Legoktm: Set $wgSitename for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458090 (https://phabricator.wikimedia.org/T203296) [02:17:18] (03CR) 10Krinkle: [C: 031] Set $wgSitename for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458090 (https://phabricator.wikimedia.org/T203296) (owner: 10Legoktm) [02:24:29] (03PS35) 10Alex Monk: Packaging stuff and readme [software/certcentral] - 10https://gerrit.wikimedia.org/r/456646 [02:24:31] (03PS5) 10Alex Monk: Add make_account CLI script [software/certcentral] - 10https://gerrit.wikimedia.org/r/457933 [02:33:54] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.19) (duration: 13m 15s) [02:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:49:59] (03PS5) 10Dzahn: nagios_common: use libmonitoring-plugin-perl on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [02:51:12] 10Operations, 10Traffic, 10Wikimedia-Incident: upload.wikimedia.org returns HTTP status code 503 for truncated urls, not 404 - https://phabricator.wikimedia.org/T106517 (10Krinkle) [02:51:52] (03CR) 10Dzahn: "PS5 was a test to check whether this error always happens when this file is touched.. it does not.." [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [02:52:25] (03CR) 10Dzahn: "PS4 was another test to see if it still happens if i slightly refactor the code.. 
it still does" [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [02:54:57] (03PS6) 10Dzahn: nagios_common: use libmonitoring-plugin-perl on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [02:55:41] (03CR) 10jerkins-bot: [V: 04-1] nagios_common: use libmonitoring-plugin-perl on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [02:59:39] (03CR) 10Dzahn: [V: 032 C: 032] "ok, it's definitely about "os_version" and looks like a new bug unrelated to the content of this change. compiler shows noop.. going ahead" [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [03:01:01] (03CR) 10Dzahn: [V: 032 C: 032] "i did it slightly different than your original approach but this meant it's completely noop" [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [03:04:26] (03PS1) 10Dzahn: site: apply alerting_host role on icinga1001 [puppet] - 10https://gerrit.wikimedia.org/r/458093 (https://phabricator.wikimedia.org/T201344) [03:04:43] (03CR) 10Dzahn: [V: 032 C: 032] "also all noop on tegmen and einsteinium as the changes before" [puppet] - 10https://gerrit.wikimedia.org/r/458062 (https://phabricator.wikimedia.org/T201344) (owner: 10Cwhite) [03:07:56] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.20) (duration: 16m 17s) [03:07:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:18:07] !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Wed Sep 5 03:18:07 UTC 2018 (duration 10m 12s) [03:18:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:18:47] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/12361/icinga1001.wikimedia.org/" [puppet] - 
10https://gerrit.wikimedia.org/r/458093 (https://phabricator.wikimedia.org/T201344) (owner: 10Dzahn) [03:21:20] (03CR) 10Dzahn: "see compiler output. gotta double-check the " + notifications_enabled => 1" part. will do this tomorrow.. not tonight" [puppet] - 10https://gerrit.wikimedia.org/r/458093 (https://phabricator.wikimedia.org/T201344) (owner: 10Dzahn) [03:29:03] (03CR) 10Dzahn: "puppet code looks ok to me but doesn't mean it can simply be merged without making some migration plan? not sure. the best would be to te" [puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: 10Zhuyifei1999) [03:30:07] (03CR) 10Dzahn: "let's get back to this" [puppet] - 10https://gerrit.wikimedia.org/r/363548 (https://phabricator.wikimedia.org/T169680) (owner: 10ArielGlenn) [03:30:29] (03PS8) 10Dzahn: datasets: monitor hosts for nfsd cpu usage [puppet] - 10https://gerrit.wikimedia.org/r/363548 (https://phabricator.wikimedia.org/T169680) (owner: 10ArielGlenn) [03:32:40] (03CR) 10Dzahn: "i think i'd prefer not using the arrow syntax to declare dependencies but instead just include the new class somewhere else. and possibly " [puppet] - 10https://gerrit.wikimedia.org/r/363548 (https://phabricator.wikimedia.org/T169680) (owner: 10ArielGlenn) [03:37:45] 10Operations, 10monitoring: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn) [03:39:44] 10Operations, 10monitoring: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn) [03:39:51] 10Operations, 10monitoring, 10Patch-For-Review: rack/setup/install icinga1001.wikimedia.org - https://phabricator.wikimedia.org/T201344 (10Dzahn) 05Open>03Resolved this ticket should continue on T202782 since the last few merges. i'll call this resolved (the host itself has been provided and is running)... 
[03:40:18] 10Operations, 10monitoring, 10Patch-For-Review: rack/setup/install icinga1001.wikimedia.org - https://phabricator.wikimedia.org/T201344 (10Dzahn) [03:42:19] 10Operations, 10monitoring: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn) Yep, working on it. Some things have been done in T201344#4530884 ff and now it will continue here. Cole is joining me in this effort. [05:06:38] !log Deploy schema change on s7 primary master (db1062) [05:06:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:08:02] (03PS1) 10Marostegui: db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458107 [05:10:04] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458107 (owner: 10Marostegui) [05:11:27] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458107 (owner: 10Marostegui) [05:12:33] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1077 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458107 (owner: 10Marostegui) [05:12:55] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 01m 09s) [05:12:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:13:09] !log Deploy schema change on db1077 with replication, this will generate lag on labsdb:s3 [05:13:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:27:52] (03CR) 10Giuseppe Lavagetto: dnsdisc: add methods for checking if a datacenter can be depooled (038 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/457951 (owner: 10Giuseppe Lavagetto) [06:28:28] (03PS2) 10Giuseppe Lavagetto: dnsdisc: add methods for checking if a datacenter can be depooled [software/spicerack] - 10https://gerrit.wikimedia.org/r/457951 [06:29:30] (03CR) 10jerkins-bot: [V: 04-1] dnsdisc: add methods for 
checking if a datacenter can be depooled [software/spicerack] - 10https://gerrit.wikimedia.org/r/457951 (owner: 10Giuseppe Lavagetto) [06:31:55] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458110 [06:37:31] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458110 (owner: 10Marostegui) [06:38:50] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458110 (owner: 10Marostegui) [06:40:00] !log Deploy schema change on s3 master (db1075) [06:40:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1077 (duration: 00m 57s) [06:40:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:40:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:48:26] (03CR) 10Muehlenhoff: "I think we should gather some stats first. I suspect that SSL capabilities of mail servers are not as advanced as with httpds (e.g. 
due to" [puppet] - 10https://gerrit.wikimedia.org/r/458061 (https://phabricator.wikimedia.org/T203260) (owner: 10Herron) [06:50:29] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458110 (owner: 10Marostegui) [06:53:58] (03PS2) 10Elukey: oozie: change smtp_host to localhost [puppet] - 10https://gerrit.wikimedia.org/r/441132 (https://phabricator.wikimedia.org/T196920) (owner: 10Herron) [06:57:53] !log installing java security updates on aqs* [06:57:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:58:01] (03PS3) 10Giuseppe Lavagetto: dnsdisc: add methods for checking if a datacenter can be depooled [software/spicerack] - 10https://gerrit.wikimedia.org/r/457951 [06:59:36] (03CR) 10Giuseppe Lavagetto: sre.switchdc.services: Add __init__ for the recipe (034 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/457873 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:01:14] (03PS3) 10Giuseppe Lavagetto: sre.switchdc.services: Add __init__ for the recipe [cookbooks] - 10https://gerrit.wikimedia.org/r/457873 (https://phabricator.wikimedia.org/T199079) [07:01:16] (03PS3) 10Giuseppe Lavagetto: sre.switchdc.services: Add phase 0 [cookbooks] - 10https://gerrit.wikimedia.org/r/457874 (https://phabricator.wikimedia.org/T199079) [07:01:27] (03PS3) 10Giuseppe Lavagetto: sre.switchdc.services: add phase 1 [cookbooks] - 10https://gerrit.wikimedia.org/r/457875 (https://phabricator.wikimedia.org/T199079) [07:01:27] (03PS3) 10Giuseppe Lavagetto: sre.switchdc.services: add phase 2 [cookbooks] - 10https://gerrit.wikimedia.org/r/457876 (https://phabricator.wikimedia.org/T199079) [07:02:13] !log restart oozie on analytics1003 to pick up new smtp settings [07:02:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:02:17] herron: --^ :) [07:03:51] (03CR) 10Giuseppe Lavagetto: [C: 031] spicerack: add redis sessions configuration 
[puppet] - 10https://gerrit.wikimedia.org/r/457836 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:04:46] (03PS1) 10Volans: cookbook: split main() into setup() and run() [software/spicerack] - 10https://gerrit.wikimedia.org/r/458115 (https://phabricator.wikimedia.org/T199079) [07:05:20] (03PS5) 10Volans: spicerack: add redis sessions configuration [puppet] - 10https://gerrit.wikimedia.org/r/457836 (https://phabricator.wikimedia.org/T199079) [07:06:13] (03CR) 10Volans: [C: 032] spicerack: add redis sessions configuration [puppet] - 10https://gerrit.wikimedia.org/r/457836 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:13:44] (03CR) 10Volans: [C: 032] dnsdisc: add methods for checking if a datacenter can be depooled (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/457951 (owner: 10Giuseppe Lavagetto) [07:14:20] (03CR) 10Giuseppe Lavagetto: [C: 031] sre.switchdc.mediawiki: add Phase 3 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456510 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:14:50] (03Merged) 10jenkins-bot: dnsdisc: add methods for checking if a datacenter can be depooled [software/spicerack] - 10https://gerrit.wikimedia.org/r/457951 (owner: 10Giuseppe Lavagetto) [07:14:57] (03CR) 10Volans: "We can perfectly decide to merge this after the switchover" [software/spicerack] - 10https://gerrit.wikimedia.org/r/458115 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:16:53] (03PS6) 10Volans: sre.switchdc.mediawiki: add Phase 3 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456510 (https://phabricator.wikimedia.org/T199079) [07:17:39] (03CR) 10Volans: [C: 032] sre.switchdc.mediawiki: add Phase 3 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456510 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:18:18] (03Merged) 10jenkins-bot: sre.switchdc.mediawiki: add Phase 3 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456510 
(https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:19:14] (03CR) 10Volans: sre.switchdc.services: Add __init__ for the recipe (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/457873 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:19:30] (03PS4) 10Volans: sre.switchdc.services: Add __init__ for the recipe [cookbooks] - 10https://gerrit.wikimedia.org/r/457873 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:20:25] (03CR) 10Volans: [C: 032] sre.switchdc.services: Add __init__ for the recipe [cookbooks] - 10https://gerrit.wikimedia.org/r/457873 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:21:03] (03Merged) 10jenkins-bot: sre.switchdc.services: Add __init__ for the recipe [cookbooks] - 10https://gerrit.wikimedia.org/r/457873 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:21:53] (03CR) 10Volans: sre.switchdc.services: Add phase 0 (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/457874 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:21:57] (03CR) 10Giuseppe Lavagetto: "I didn't check every phase for other potentially disruptive changes that would need the live test, but I already have one comment," (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/457944 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:21:59] (03PS4) 10Volans: sre.switchdc.services: Add phase 0 [cookbooks] - 10https://gerrit.wikimedia.org/r/457874 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:22:36] (03CR) 10Giuseppe Lavagetto: [C: 031] "From my checks, running live_test should /not/ disrupt anything. Please don't prove me wrong." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/457944 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:23:36] !log kartik@deploy1001 Started deploy [cxserver/deploy@f341eec]: Update cxserver to 81d1a97 (T202933, T202283, T189438) [07:23:39] (03CR) 10Volans: [C: 032] sre.switchdc.services: Add phase 0 [cookbooks] - 10https://gerrit.wikimedia.org/r/457874 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:23:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:23:44] T202933: cxserver: Update packages with security vulnerabilities identified by npm audit - https://phabricator.wikimedia.org/T202933 [07:23:44] T202283: CX2: Big sections are untranslatable - https://phabricator.wikimedia.org/T202283 [07:23:45] T189438: TitleError: title-invalid-characters - https://phabricator.wikimedia.org/T189438 [07:24:09] !log rebooting video scalers/job runners in codfw for kernel security update [07:24:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:24:19] (03Merged) 10jenkins-bot: sre.switchdc.services: Add phase 0 [cookbooks] - 10https://gerrit.wikimedia.org/r/457874 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:24:29] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Since in phase zero we have the warmup script running, which takes between 8 and 15 minutes to run, I think it's safe to assume the ttl wi" [cookbooks] - 10https://gerrit.wikimedia.org/r/457936 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:26:07] (03CR) 10Hashar: "Thanks :]" (033 comments) [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/398462 (owner: 10Hashar) [07:26:15] (03PS4) 10Hashar: Generate documentation with Sphinx [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/398462 [07:27:38] !log kartik@deploy1001 Finished deploy [cxserver/deploy@f341eec]: Update cxserver to 81d1a97 (T202933, T202283, T189438) (duration: 04m 03s) [07:27:45] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:29:11] (03PS4) 10Volans: sre.switchdc.services: add phase 1 [cookbooks] - 10https://gerrit.wikimedia.org/r/457875 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:29:31] <_joe_> g [07:31:10] (03CR) 10Volans: [C: 032] sre.switchdc.services: add phase 1 (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/457875 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:31:51] (03Merged) 10jenkins-bot: sre.switchdc.services: add phase 1 [cookbooks] - 10https://gerrit.wikimedia.org/r/457875 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:32:24] (03PS4) 10Volans: sre.switchdc.services: add phase 2 [cookbooks] - 10https://gerrit.wikimedia.org/r/457876 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:34:30] (03CR) 10Volans: [C: 032] sre.switchdc.services: add phase 2 [cookbooks] - 10https://gerrit.wikimedia.org/r/457876 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:35:12] (03Merged) 10jenkins-bot: sre.switchdc.services: add phase 2 [cookbooks] - 10https://gerrit.wikimedia.org/r/457876 (https://phabricator.wikimedia.org/T199079) (owner: 10Giuseppe Lavagetto) [07:39:55] (03CR) 10Volans: "> Patch Set 1:" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/457944 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:40:52] (03CR) 10Elukey: [C: 04-2] "I just tested this in deployment-prep, and the code as it is causes a refresh of the memcached unit, basically wiping all our caches every" [puppet] - 10https://gerrit.wikimedia.org/r/456096 (https://phabricator.wikimedia.org/T203429) (owner: 10Elukey) [07:41:01] (03Abandoned) 10Elukey: memcached: enable basic logging with the -v parameter [puppet] - 10https://gerrit.wikimedia.org/r/456096 (https://phabricator.wikimedia.org/T203429) (owner: 10Elukey) [07:42:00] (03CR) 10Volans: "> Patch 
Set 1: Code-Review-1" [cookbooks] - 10https://gerrit.wikimedia.org/r/457936 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:42:49] (03CR) 10Ema: [C: 032] vhtcpd (0.1.2-1) stretch-wikimedia; urgency=medium [software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/457934 (https://phabricator.wikimedia.org/T199720) (owner: 10Ema) [07:48:49] !log re-enable puppet on mc1035 - memcache unit refreshed, mw cache shard wiped - T203429 [07:48:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:48:54] T203429: Improve memcache logs on mc* hosts - https://phabricator.wikimedia.org/T203429 [07:49:08] !log upload vhtcpd 0.1.2-1 to stretch-wikimedia T199720 [07:49:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:49:12] T199720: Deploy initial ATS test clusters in core DCs - https://phabricator.wikimedia.org/T199720 [07:50:04] (03CR) 10Gehel: [C: 04-1] cookbook: split main() into setup() and run() (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/458115 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [07:51:07] 10Operations, 10Wikimedia-General-or-Unknown, 10Patch-For-Review, 10User-Elukey: Improve memcache logs on mc* hosts - https://phabricator.wikimedia.org/T203429 (10elukey) 05Open>03declined The amount of logs found during the night were really low: ``` Sep 04 13:06:21 mc1035 systemd[1]: Started memcach... 
[08:17:05] (03CR) 10Ema: [V: 032 C: 032] "recheck" [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/457927 (https://phabricator.wikimedia.org/T199720) (owner: 10Ema) [08:17:20] (03CR) 10Ema: [C: 032] "recheck" [software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/457934 (https://phabricator.wikimedia.org/T199720) (owner: 10Ema) [08:17:51] (03PS3) 10Banyek: Labs: Make redact_sanitarium.sh file easier to read [puppet] - 10https://gerrit.wikimedia.org/r/457899 [08:18:11] 10Operations, 10Discovery-Search, 10Elasticsearch: Alert when elasticsearch has shards larger than a maximum size - https://phabricator.wikimedia.org/T203546 (10Gehel) [08:18:18] 10Operations, 10Discovery-Search, 10Elasticsearch: Alert when elasticsearch has shards larger than a maximum size - https://phabricator.wikimedia.org/T203546 (10Gehel) p:05Triage>03Normal [08:22:41] !log reboot aqs100[5-9] for kernel + openjdk-8 upgrades [08:22:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:29:50] (03PS1) 10Volans: Upstream release v0.0.4 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/458123 (https://phabricator.wikimedia.org/T199079) [08:34:09] (03PS2) 10Giuseppe Lavagetto: conftool: add class for writing to state to file [puppet] - 10https://gerrit.wikimedia.org/r/457490 [08:34:11] (03PS2) 10Giuseppe Lavagetto: realm.pp: drop mw_primary [puppet] - 10https://gerrit.wikimedia.org/r/457491 [08:34:13] (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::maintenance: depend on mediawiki config, not hiera [puppet] - 10https://gerrit.wikimedia.org/r/457492 [08:34:15] (03PS1) 10Giuseppe Lavagetto: profile::openstack::base::frontend: remove config-master [puppet] - 10https://gerrit.wikimedia.org/r/458124 [08:35:04] (03PS1) 10Jcrespo: mariadb: Set s5 section in read-write, codfw should be still in ro [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458125 (https://phabricator.wikimedia.org/T189107) [08:35:06] (03CR) 10jerkins-bot: [V: 
04-1] conftool: add class for writing to state to file [puppet] - 10https://gerrit.wikimedia.org/r/457490 (owner: 10Giuseppe Lavagetto) [08:37:35] ^ _joe_ ok to proceed with the test? [08:37:51] !log Drop partitions from db2040 (s7 master) for metawiki.pagelinks - T203548 [08:37:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:37:57] T203548: Remove partitions from s7 masters (db1062 and db2040) for metawiki.pagelinks - https://phabricator.wikimedia.org/T203548 [08:38:26] (03CR) 10Jcrespo: [C: 032] mariadb: Set s5 section in read-write, codfw should be still in ro [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458125 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [08:38:49] (03CR) 10Volans: [C: 032] Upstream release v0.0.4 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/458123 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:39:55] (03Merged) 10jenkins-bot: mariadb: Set s5 section in read-write, codfw should be still in ro [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458125 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [08:39:57] (03Merged) 10jenkins-bot: Upstream release v0.0.4 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/458123 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:42:08] (03CR) 10jenkins-bot: mariadb: Set s5 section in read-write, codfw should be still in ro [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458125 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [08:44:10] !log jynus@deploy1001 Synchronized wmf-config/db-codfw.php: Set s5 section in read-write, codfw should be still in ro (duration: 00m 58s) [08:44:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:51:05] !log oblivian@puppetmaster1001 conftool action : edit; selector: name=ReadOnly,scope=codfw [08:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[08:52:41] !log oblivian@puppetmaster1001 conftool action : edit; selector: name=ReadOnly,scope=codfw [08:52:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:57:39] (03PS1) 10Jcrespo: mariadb: Set all codfw sections as read-only, codfw is still in ro [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458128 (https://phabricator.wikimedia.org/T189107) [09:02:25] (03CR) 10Jcrespo: "Looks ok, did you test nothing broke with the latest changes?" [puppet] - 10https://gerrit.wikimedia.org/r/457899 (owner: 10Banyek) [09:07:27] (03CR) 10Giuseppe Lavagetto: [C: 031] "https://puppet-compiler.wmflabs.org/compiler02/12365/ correctly removes the useless confd-based files from the puppetmasters." [puppet] - 10https://gerrit.wikimedia.org/r/458124 (owner: 10Giuseppe Lavagetto) [09:07:38] !log rebooting druid1001 for kernel + openjdk-8 upgrades [09:07:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:07:52] (03CR) 10Gehel: "Puppet compiler looks reasonable: https://puppet-compiler.wmflabs.org/compiler03/12364/" [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [09:08:51] (03CR) 10Jcrespo: [C: 032] mariadb: Set all codfw sections as read-only, codfw is still in ro [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458128 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [09:10:09] (03Merged) 10jenkins-bot: mariadb: Set all codfw sections as read-only, codfw is still in ro [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458128 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [09:11:58] (03CR) 10Banyek: "I did a dry-run test" [puppet] - 10https://gerrit.wikimedia.org/r/457899 (owner: 10Banyek) [09:12:13] !log jynus@deploy1001 Synchronized wmf-config/db-codfw.php: Set all individual codfw sections in read-write, codfw globally still in ro (duration: 00m 56s) [09:12:17] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:13] !log uploaded spicerack_0.0.4-1{,+deb9u1} to apt.wikimedia.org {jessie,stretch}-wikimedia - T199079 [09:16:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:19] T199079: Refactor the switchdc script - https://phabricator.wikimedia.org/T199079 [09:26:28] (03PS1) 10Volans: sre.switchdc.services: rename cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/458131 (https://phabricator.wikimedia.org/T199079) [09:26:59] (03PS2) 10Volans: sre.switchdc.mediawiki: add --live-test option [cookbooks] - 10https://gerrit.wikimedia.org/r/457944 (https://phabricator.wikimedia.org/T199079) [09:27:33] (03CR) 10Volans: [C: 032] sre.switchdc.services: rename cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/458131 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:28:14] (03Merged) 10jenkins-bot: sre.switchdc.services: rename cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/458131 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:28:34] (03CR) 10jenkins-bot: mariadb: Set all codfw sections as read-only, codfw is still in ro [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458128 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [09:33:49] (03PS1) 10Jcrespo: mariadb: Repool db1114 but with low api load due to ongoing errors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458133 (https://phabricator.wikimedia.org/T121333) [09:38:48] (03Abandoned) 10Jcrespo: Revert "mariadb: Depool db1114 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/457868 (owner: 10Jcrespo) [09:39:40] (03CR) 10Jcrespo: [C: 032] mariadb: Repool db1114 but with low api load due to ongoing errors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458133 (https://phabricator.wikimedia.org/T121333) (owner: 10Jcrespo) [09:40:04] (03PS2) 10Jcrespo: mariadb: Fix DB configuration in preparation for dc switchover 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/457847 (https://phabricator.wikimedia.org/T189107) [09:41:12] (03Merged) 10jenkins-bot: mariadb: Repool db1114 but with low api load due to ongoing errors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458133 (https://phabricator.wikimedia.org/T121333) (owner: 10Jcrespo) [09:43:14] (03PS3) 10Volans: sre.switchdc.mediawiki: add --live-test option [cookbooks] - 10https://gerrit.wikimedia.org/r/457944 (https://phabricator.wikimedia.org/T199079) [09:43:38] (03CR) 10Volans: "done" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/457944 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:45:19] (03CR) 10Volans: [C: 032] sre.switchdc.mediawiki: add --live-test option [cookbooks] - 10https://gerrit.wikimedia.org/r/457944 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:46:01] (03Merged) 10jenkins-bot: sre.switchdc.mediawiki: add --live-test option [cookbooks] - 10https://gerrit.wikimedia.org/r/457944 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:47:43] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1114 but with small api load due to ongoing issues (duration: 00m 56s) [09:47:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:48:31] we just had some lag issues from mwdebug [09:48:45] that is weird, those hosts are barely used [09:49:05] in codfw? 
[09:49:05] but we now have 100 errors from those [09:49:15] also on mwdebug1001 [09:49:23] but also on codfw [09:49:37] s7 codfw is lagging due to T203548 [09:49:38] T203548: Remove partitions from s7 masters (db1062 and db2040) for metawiki.pagelinks - https://phabricator.wikimedia.org/T203548 [09:50:06] that is ok, I can see sometimes mw2 but they are small errors due to monitoring [09:50:09] this is a spike [09:50:30] https://logstash.wikimedia.org/goto/e2f2460faf70b215ba94884e10db0a6c [09:51:10] I think mostly on s6 [09:52:06] (03PS1) 10Muehlenhoff: Add library hint for bind9 [puppet] - 10https://gerrit.wikimedia.org/r/458137 [09:52:57] !log installing java security updates on druid* [09:53:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:32] (03CR) 10Alexandros Kosiaris: [C: 032] Test switchover of the deployment server [puppet] - 10https://gerrit.wikimedia.org/r/457867 (owner: 10Alexandros Kosiaris) [09:56:09] heads up, I am doing a switchover of the deployment server [09:56:18] (03PS2) 10Alexandros Kosiaris: Test switchover of the deployment server [puppet] - 10https://gerrit.wikimedia.org/r/457867 [09:56:20] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Test switchover of the deployment server [puppet] - 10https://gerrit.wikimedia.org/r/457867 (owner: 10Alexandros Kosiaris) [09:57:32] (03PS16) 10Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) [09:57:34] (03PS2) 10Muehlenhoff: Add library hint for bind9 [puppet] - 10https://gerrit.wikimedia.org/r/458137 [09:57:38] (03CR) 10jerkins-bot: [V: 04-1] Elasticsearch module is coming up. [software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: 10Mathew.onipe) [09:58:15] (03CR) 10Mathew.onipe: Elasticsearch module is coming up. 
(034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: 10Mathew.onipe) [09:59:42] !log jmm@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=mw2213.codfw.wmnet [09:59:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:03] !log upgraded spicerack to version 0.0.4 on sarin/neodymium - T199079 [10:00:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:08] T199079: Refactor the switchdc script - https://phabricator.wikimedia.org/T199079 [10:00:15] (03CR) 10Muehlenhoff: [C: 032] Add library hint for bind9 [puppet] - 10https://gerrit.wikimedia.org/r/458137 (owner: 10Muehlenhoff) [10:02:08] (03PS3) 10Jcrespo: mariadb: Fix DB configuration in preparation for dc switchover [mediawiki-config] - 10https://gerrit.wikimedia.org/r/457847 (https://phabricator.wikimedia.org/T189107) [10:04:00] !log switchover the deployment server as a test for the switchover next week [10:04:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:04:33] !log repair sdh on ms-be1043 - T199198 [10:04:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:04:39] T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198 [10:06:28] (03PS1) 10Muehlenhoff: Decommission mw2213 [puppet] - 10https://gerrit.wikimedia.org/r/458139 (https://phabricator.wikimedia.org/T203434) [10:06:50] 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review: Decom mw2213 - https://phabricator.wikimedia.org/T203434 (10MoritzMuehlenhoff) [10:08:08] !log akosiaris@deploy2001 Started deploy [servermon/servermon@c474a6b]: (no justification provided) [10:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:09:52] !log akosiaris@deploy2001 Finished deploy [servermon/servermon@c474a6b]: (no justification provided) (duration: 01m 
44s) [10:09:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:29] !log akosiaris@deploy2001 Started deploy [servermon/servermon@c474a6b]: (no justification provided) [10:10:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:57] (03CR) 10Jcrespo: [C: 031] realm.pp: drop mw_primary [puppet] - 10https://gerrit.wikimedia.org/r/457491 (owner: 10Giuseppe Lavagetto) [10:11:30] (03CR) 10Jcrespo: [C: 031] "CC @Marostegui as he had a WIP patch depending on the solution to this." [puppet] - 10https://gerrit.wikimedia.org/r/457491 (owner: 10Giuseppe Lavagetto) [10:12:09] !log akosiaris@deploy2001 Finished deploy [servermon/servermon@c474a6b]: (no justification provided) (duration: 01m 39s) [10:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:12:37] (03CR) 10Marostegui: "> CC @Marostegui as he had a WIP patch depending on the solution to" [puppet] - 10https://gerrit.wikimedia.org/r/457491 (owner: 10Giuseppe Lavagetto) [10:13:15] (03CR) 10Jcrespo: [C: 031] "450228 will also have to changed slightly after this is deployed." [puppet] - 10https://gerrit.wikimedia.org/r/457491 (owner: 10Giuseppe Lavagetto) [10:15:37] !log akosiaris@deploy2001 Started deploy [servermon/servermon@c474a6b]: (no justification provided) [10:15:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:16:08] (03CR) 10jenkins-bot: mariadb: Repool db1114 but with low api load due to ongoing errors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458133 (https://phabricator.wikimedia.org/T121333) (owner: 10Jcrespo) [10:16:27] (03CR) 10Marostegui: "> 450228 will also have to changed slightly after this is deployed." 
[puppet] - 10https://gerrit.wikimedia.org/r/457491 (owner: 10Giuseppe Lavagetto) [10:16:30] !log akosiaris@deploy2001 Finished deploy [servermon/servermon@c474a6b]: (no justification provided) (duration: 00m 52s) [10:16:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:20:19] !log installing bind9 security updates (client-side tools and libraries) [10:20:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:21:24] !log akosiaris@deploy2001 deploy aborted: (no justification provided) (duration: 00m 00s) [10:21:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:21:36] !log akosiaris@deploy2001 Started deploy [servermon/servermon@c474a6b]: (no justification provided) [10:21:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:21:40] !log akosiaris@deploy2001 Finished deploy [servermon/servermon@c474a6b]: (no justification provided) (duration: 00m 04s) [10:21:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:22:01] !log akosiaris@deploy2001 Started deploy [servermon/servermon@c474a6b]: (no justification provided) [10:22:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:22:51] !log akosiaris@deploy2001 Finished deploy [servermon/servermon@c474a6b]: (no justification provided) (duration: 00m 49s) [10:22:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:23:18] 10Operations, 10Analytics, 10hardware-requests: eqiad: (2) hardware refresh for analytics1003 - https://phabricator.wikimedia.org/T198685 (10elukey) @RobH quick ping to check if we can get this hardware before the end of quarter, to schedule Hadoop maintenance ops in one go (since we have to shutdown the who... 
[10:24:23] (03CR) 10Jcrespo: "Nothing against this- but I would like to test it works as expected on a trivial production class rather than on the mariadb::core class f" [puppet] - 10https://gerrit.wikimedia.org/r/457490 (owner: 10Giuseppe Lavagetto) [10:24:47] (03CR) 10Jcrespo: [C: 031] profile::mediawiki::maintenance: depend on mediawiki config, not hiera [puppet] - 10https://gerrit.wikimedia.org/r/457492 (owner: 10Giuseppe Lavagetto) [10:24:47] !log akosiaris@deploy2001 Started deploy [servermon/servermon@c474a6b]: (no justification provided) [10:24:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:24:51] !log akosiaris@deploy2001 Finished deploy [servermon/servermon@c474a6b]: (no justification provided) (duration: 00m 04s) [10:24:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:34] !log akosiaris@deploy2001 Started deploy [servermon/servermon@c474a6b]: (no justification provided) [10:28:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:48] !log akosiaris@deploy2001 Finished deploy [servermon/servermon@c474a6b]: (no justification provided) (duration: 00m 14s) [10:28:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:29:18] (03PS17) 10Mathew.onipe: Elasticsearch module is coming up. 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T202885) [10:33:11] !log akosiaris@deploy2001 Started deploy [servermon/servermon@c474a6b]: (no justification provided) [10:33:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:33:17] !log akosiaris@deploy2001 Finished deploy [servermon/servermon@c474a6b]: (no justification provided) (duration: 00m 05s) [10:33:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:59] (03CR) 10Marostegui: [C: 031] "I am fine with this initial configuration (and with going for db1073 for now until we heard back from Clouds Team)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/457847 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [10:37:28] (03PS1) 10Alexandros Kosiaris: Add clarifying comment for urldownloader [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458149 [10:37:50] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Add clarifying comment for urldownloader [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458149 (owner: 10Alexandros Kosiaris) [10:38:17] (03CR) 10Marostegui: [C: 031] "Also, whatever is trying to write to db2037 now and getting the read-only error, will now succeed as db1073 is RW." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/457847 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [10:40:38] (03PS1) 10Volans: sre.switchdc.mediawiki: fix typo in parse args [cookbooks] - 10https://gerrit.wikimedia.org/r/458150 (https://phabricator.wikimedia.org/T199079) [10:43:24] !log Stop MySQL and reboot db2093 for kernel upgrade [10:43:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:31] !log akosiaris@deploy2001 Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 04m 57s) [10:43:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:48] ok this worked too [10:45:02] (03CR) 10Volans: [C: 032] sre.switchdc.mediawiki: fix typo in parse args [cookbooks] - 10https://gerrit.wikimedia.org/r/458150 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [10:45:43] (03Merged) 10jenkins-bot: sre.switchdc.mediawiki: fix typo in parse args [cookbooks] - 10https://gerrit.wikimedia.org/r/458150 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [10:45:56] (03CR) 10Vgutierrez: [C: 04-1] "nice work, only 1 minor thing to be fixed" (033 comments) [software/certcentral] - 10https://gerrit.wikimedia.org/r/456646 (owner: 10Alex Monk) [10:48:24] (03PS1) 10Alexandros Kosiaris: Switch the deployment server to deploy2001 [dns] - 10https://gerrit.wikimedia.org/r/458151 [10:51:20] (03CR) 10Alexandros Kosiaris: [C: 032] Switch the deployment server to deploy2001 [dns] - 10https://gerrit.wikimedia.org/r/458151 (owner: 10Alexandros Kosiaris) [10:53:25] (03PS1) 10Giuseppe Lavagetto: sre.switchdc.services: add --dc-to option [cookbooks] - 10https://gerrit.wikimedia.org/r/458153 [10:55:28] volans: should we test the cookbooks ? 
[10:56:11] akosiaris: sure, we can start dry run things as the swat is coming up [10:56:36] (03PS2) 10Giuseppe Lavagetto: sre.switchdc.services: add --dc-to option [cookbooks] - 10https://gerrit.wikimedia.org/r/458153 [10:57:57] (03CR) 10Volans: [C: 032] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/458153 (owner: 10Giuseppe Lavagetto) [10:58:39] (03Merged) 10jenkins-bot: sre.switchdc.services: add --dc-to option [cookbooks] - 10https://gerrit.wikimedia.org/r/458153 (owner: 10Giuseppe Lavagetto) [11:00:06] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180905T1100). [11:00:06] No GERRIT patches in the queue for this window AFAICS. [11:00:27] o/ [11:00:40] I like swats with no patches :D [11:03:54] :D [11:04:30] (03CR) 10jenkins-bot: Add clarifying comment for urldownloader [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458149 (owner: 10Alexandros Kosiaris) [11:08:46] !log rebooting deploy1001 for kernel security update [11:08:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:28] (03PS1) 10Volans: sre.switchdc.mediawiki: fix parse args validation [cookbooks] - 10https://gerrit.wikimedia.org/r/458155 (https://phabricator.wikimedia.org/T199079) [11:12:52] !log rearmed keyholder on deploy1001 [11:12:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:14:10] (03CR) 10Alexandros Kosiaris: [C: 031] sre.switchdc.mediawiki: fix parse args validation [cookbooks] - 10https://gerrit.wikimedia.org/r/458155 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [11:14:34] (03CR) 10Volans: [C: 032] sre.switchdc.mediawiki: fix parse args validation [cookbooks] - 10https://gerrit.wikimedia.org/r/458155 (https://phabricator.wikimedia.org/T199079) 
(owner: 10Volans) [11:15:17] (03Merged) 10jenkins-bot: sre.switchdc.mediawiki: fix parse args validation [cookbooks] - 10https://gerrit.wikimedia.org/r/458155 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [11:19:53] zeljkof: we can change that no patches thing ;) [11:20:11] !log rebooting mwmaint2001 for kernel security update [11:20:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:35] Hauskatze: no need ;P [11:21:05] if there are patches that need to be deployed, I'm around, but if it can be delayed until another swat window, please do ;) [11:21:32] I have none a.t.m; waiting for some review first [11:25:25] (03PS1) 10Effie Mouzeli: icinga: Replaced user jijiki with effie mouzeli [puppet] - 10https://gerrit.wikimedia.org/r/458157 (https://phabricator.wikimedia.org/T201816) [11:28:31] 10Operations: syncing Ubuntu mirror fail - https://phabricator.wikimedia.org/T203290 (10faidon) Thanks for tracking that down @Dzahn! [11:29:19] (03CR) 10Jcrespo: [C: 04-1] "I have missing to add the s3 loads here." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/457847 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [11:29:31] (03CR) 10Gehel: Elasticsearch module is coming up. 
(034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T202885) (owner: 10Mathew.onipe) [11:38:34] (03PS1) 10Volans: sre.switchdc.mediawiki: phase 0 fix typo [cookbooks] - 10https://gerrit.wikimedia.org/r/458161 (https://phabricator.wikimedia.org/T199079) [11:40:07] (03CR) 10Alexandros Kosiaris: [C: 031] sre.switchdc.mediawiki: phase 0 fix typo [cookbooks] - 10https://gerrit.wikimedia.org/r/458161 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [11:40:18] (03CR) 10Volans: [C: 032] sre.switchdc.mediawiki: phase 0 fix typo [cookbooks] - 10https://gerrit.wikimedia.org/r/458161 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [11:40:25] (03CR) 10Effie Mouzeli: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/12366/" [puppet] - 10https://gerrit.wikimedia.org/r/458157 (https://phabricator.wikimedia.org/T201816) (owner: 10Effie Mouzeli) [11:41:00] (03Merged) 10jenkins-bot: sre.switchdc.mediawiki: phase 0 fix typo [cookbooks] - 10https://gerrit.wikimedia.org/r/458161 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180905T1200) [12:09:23] (03PS36) 10Alex Monk: Packaging stuff and readme [software/certcentral] - 10https://gerrit.wikimedia.org/r/456646 [12:17:59] (03CR) 10Alexandros Kosiaris: [C: 031] "minor inline comment, rest LGTM" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/457944 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:21:51] (03PS32) 10Gehel: Convert elasticsearch to systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [12:22:34] (03CR) 10jerkins-bot: [V: 04-1] Convert elasticsearch to systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) 
[12:25:16] (03CR) 10Gehel: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [12:25:55] (03CR) 10jerkins-bot: [V: 04-1] Convert elasticsearch to systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [12:29:15] !log Starting extensions/PageTriage/maintenance/FixNominatedForDeletion.php --wiki testwiki [12:29:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:29:45] !log Finished extensions/PageTriage/maintenance/FixNominatedForDeletion.php --wiki testwiki [12:29:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:51] (03PS1) 10Ema: ATS: status code 15 means successful exit [puppet] - 10https://gerrit.wikimedia.org/r/458168 (https://phabricator.wikimedia.org/T199720) [12:48:09] (03CR) 10Ema: [C: 032] ATS: status code 15 means successful exit [puppet] - 10https://gerrit.wikimedia.org/r/458168 (https://phabricator.wikimedia.org/T199720) (owner: 10Ema) [12:48:34] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10Ladsgroup) We should note that hive is behind NDA and production access which only most staff and handful... [12:58:06] 10Operations, 10LDAP-Access-Requests: Remove user "albe" from the wmde LDAP group - https://phabricator.wikimedia.org/T203561 (10WMDE-leszek) [12:58:13] (03CR) 10Elukey: [C: 031] Decommission mw2213 [puppet] - 10https://gerrit.wikimedia.org/r/458139 (https://phabricator.wikimedia.org/T203434) (owner: 10Muehlenhoff) [13:00:04] hashar: I, the Bot under the Fountain, allow thee, The Deployer, to do MediaWiki train - European version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180905T1300). 
[13:02:29] o/ [13:02:38] (03PS1) 10Hashar: group1 wikis to 1.32.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458169 [13:02:40] (03CR) 10Hashar: [C: 032] group1 wikis to 1.32.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458169 (owner: 10Hashar) [13:04:17] (03Merged) 10jenkins-bot: group1 wikis to 1.32.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458169 (owner: 10Hashar) [13:07:13] arghh [13:07:46] !log hashar@deploy2001 rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.20 [13:08:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:53] !log hashar@deploy2001 Synchronized php: group1 wikis to 1.32.0-wmf.20 (duration: 01m 07s) [13:08:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:30] <_joe_> !log depooling /repooling restbase, mathoid in codfw for switchover pre-flight testing [13:09:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:43] akosiaris: train worked from codfw deploy host :] [13:10:14] !log installing php5 security updates on jessie [13:10:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:26] <_joe_> ok, the dnsdisc part of spicerack works like a charm [13:10:49] <_joe_> spicerack is really an awesome thing, it's even better than switchdc was in terms of abstractions [13:11:02] <_joe_> it's a tool that can really shift how we do things in production [13:13:44] (03CR) 10Giuseppe Lavagetto: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/457490 (owner: 10Giuseppe Lavagetto) [13:14:40] (03CR) 10jerkins-bot: [V: 04-1] conftool: add class for writing to state to file [puppet] - 10https://gerrit.wikimedia.org/r/457490 (owner: 10Giuseppe Lavagetto) [13:15:42] (03CR) 10jenkins-bot: group1 wikis to 1.32.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458169 (owner: 10Hashar) [13:19:46] looks like group1 is behaving properly 
[13:20:27] !log rebooting tegmen for kernel security update [13:20:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:54] <_joe_> hashar: uh jenkins keeps giving me -1, which is strange given I get the test to pass locally [13:26:43] <_joe_> hashar: doesn't every puppet CI run spin up its own docker container? [13:27:15] _joe_: hi. Yeah more or less [13:27:23] <_joe_> more or less? [13:27:24] some jobs use several containers [13:27:37] <_joe_> puppet just has one job [13:27:41] but the state should be cleaned at the start of the job (by deleting files in the build workspace) [13:27:55] <_joe_> no, that's not how this is supposed to work [13:28:05] <_joe_> not the puppet container at the very least [13:28:13] <_joe_> and that's why things are failing for me, probably [13:29:07] <_joe_> anyways, I'll find an alternate solution in the meantime [13:29:35] _joe_: can you link to the job / change failing? [13:30:25] <_joe_> https://gerrit.wikimedia.org/r/457490 https://integration.wikimedia.org/ci/job/operations-puppet-tests-docker/27575/console [13:30:32] <_joe_> there is one error that's my syntax error [13:30:42] <_joe_> but there is another, about the role function, that doesn't make sense [13:30:50] <_joe_> I didn't even touch the thing [13:32:39] at least I have the same errors locally :] [13:33:27] (03PS2) 10Giuseppe Lavagetto: profile::openstack::base::frontend: remove config-master [puppet] - 10https://gerrit.wikimedia.org/r/458124 [13:33:29] (03PS3) 10Giuseppe Lavagetto: conftool: add class for writing to state to file [puppet] - 10https://gerrit.wikimedia.org/r/457490 [13:34:57] (03PS1) 10Ottomata: Temporarily removing thorium from netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/458174 (https://phabricator.wikimedia.org/T192641) [13:35:22] (03CR) 10Elukey: [C: 031] Temporarily removing thorium from netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/458174 (https://phabricator.wikimedia.org/T192641) (owner:
10Ottomata) [13:35:47] (03CR) 10jerkins-bot: [V: 04-1] conftool: add class for writing to state to file [puppet] - 10https://gerrit.wikimedia.org/r/457490 (owner: 10Giuseppe Lavagetto) [13:35:54] (03CR) 10Ottomata: [C: 032] Temporarily removing thorium from netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/458174 (https://phabricator.wikimedia.org/T192641) (owner: 10Ottomata) [13:37:06] <_joe_> hashar: how do you run the tests locally? [13:37:20] bundle install && bundle exec rake test [13:37:31] <_joe_> within rbenv? [13:38:04] (03PS2) 10Herron: phabricator: set smtp-host to localhost [puppet] - 10https://gerrit.wikimedia.org/r/440910 (https://phabricator.wikimedia.org/T196916) [13:38:07] and for any individual module you can use spec:module_name_here [13:38:08] eg [13:38:08] bundle exec rake spec:wmflib [13:38:22] I don't bother with rbenv, I just use the local ruby [13:38:27] <_joe_> ok I'm trying to understand why I don't get the same errors [13:38:55] bundle will install the gem somewhere under $HOME, then bundle exec mangles RUBY_PATH to point to each (gem, version) requested [13:38:59] !log updating phabricator mail smtp-host to localhost T196916 [13:39:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:05] T196916: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916 [13:39:30] <_joe_> hashar: I know, I use bundle within rbenv usually [13:39:40] (03CR) 10Herron: [C: 032] phabricator: set smtp-host to localhost [puppet] - 10https://gerrit.wikimedia.org/r/440910 (https://phabricator.wikimedia.org/T196916) (owner: 10Herron) [13:39:48] maybe they don't mix properly :\ [13:40:35] !log reimaging thorium to debian stretch (this will cause an announced {stats,analytics}.wm.org downtime!)
- T192641 [13:40:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:40] T192641: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 [13:41:03] <_joe_> hashar: if you run `bundle exec rake spec:wmflib` you get the same error? [13:41:35] yup [13:42:55] <_joe_> heh nice, if I try to run with the ruby on my machine (2.5.0) puppet fails :D [13:42:57] _joe_: sometimes using a ruby debugger can help pinpoint the issue. I wrote about it ages ago on https://wikitech.wikimedia.org/wiki/Puppet_coding/testing#ruby_debugger [13:43:09] require 'pry'; binding.pry [13:43:25] that drops you in an interactive ruby session in the context of wherever you have put the "binding.pry" [13:43:48] (03PS1) 10Ottomata: Use stretch for thorium [puppet] - 10https://gerrit.wikimedia.org/r/458176 (https://phabricator.wikimedia.org/T192641) [13:44:02] (03CR) 10Ottomata: [V: 032 C: 032] Use stretch for thorium [puppet] - 10https://gerrit.wikimedia.org/r/458176 (https://phabricator.wikimedia.org/T192641) (owner: 10Ottomata) [13:44:09] (03PS2) 10Ottomata: Use stretch for thorium [puppet] - 10https://gerrit.wikimedia.org/r/458176 (https://phabricator.wikimedia.org/T192641) [13:44:11] (03CR) 10Ottomata: [V: 032 C: 032] Use stretch for thorium [puppet] - 10https://gerrit.wikimedia.org/r/458176 (https://phabricator.wikimedia.org/T192641) (owner: 10Ottomata) [13:44:16] !log repair sdn1 on ms-be2041 - T199198 [13:44:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:44:28] T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198 [13:47:24] (03PS1) 10Hashar: wmflib + stdlib fail rspec :\ [puppet] - 10https://gerrit.wikimedia.org/r/458177 [13:47:42] _joe_: apparently that is the modules/wmflib/.fixtures.yaml file (which adds stdlib) https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/458177 [13:47:51] that dummy change should fail equally [13:47:54] <_joe_>
hashar: I guessed so [13:48:08] <_joe_> hashar: but it doesn't make sense, and I wasn't able to reproduce locally [13:48:18] yeah that is annoying :\\\ [13:48:41] some tests "might" fail when using ruby2.4.0 or later [13:48:55] since we run 2.3.x on production, we might not have caught all issues [13:49:02] I did fix a few when I had a mac with ruby 2.4 [13:49:12] (03CR) 10jerkins-bot: [V: 04-1] wmflib + stdlib fail rspec :\ [puppet] - 10https://gerrit.wikimedia.org/r/458177 (owner: 10Hashar) [13:49:28] oh and the Gemfile lets you override the puppet version to use simply by setting PUPPET_GEM_VERSION [13:50:28] (03Abandoned) 10Hashar: wmflib + stdlib fail rspec :\ [puppet] - 10https://gerrit.wikimedia.org/r/458177 (owner: 10Hashar) [13:50:52] 10Operations, 10Mail, 10Phabricator, 10Patch-For-Review, and 3 others: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916 (10herron) 05Open>03Resolved a:03herron This is looking good. Here are the received headers from a recently delivered message t... [13:51:22] <_joe_> hashar: uhm I do get the error now [13:51:29] hurrah! [13:51:38] <_joe_> I also have other specs failing for a problem I supposedly fixed eons ago [13:51:57] <_joe_> Could not find the daemon directory (tested [/etc/sv,/var/lib/service]) [13:52:09] ah yeah I came across that one more than a couple of times [13:52:25] <_joe_> but I fixed all of those using the facts collection [13:52:40] iirc that is the puppet debian package which does not recognize that Debian Stretch uses systemd [13:52:41] <_joe_> so maybe there is a bug in the puppet gem?
[13:52:53] and the puppet code ends up with a default fallback which is /etc/sv [13:52:54] <_joe_> hashar: nope, it's the original puppet [13:53:02] <_joe_> not the debian package [13:53:11] <_joe_> the debian package is unsurprisingly correct :P [13:53:29] <_joe_> but in CI we run the tests on ruby 2.3.3 with the same gems I'm using [13:53:38] <_joe_> so same ruby, same gems, different result [13:53:40] <_joe_> it's absurd [13:54:03] AH https://github.com/rodjek/rspec-puppet/issues/629 [13:54:10] I knew I filed it somewhere [13:54:16] <_joe_> also the bug is well known and in theory already solved with adding the proper facts [13:54:23] <_joe_> which is what we did, remember? [13:54:44] yeah seems like you have hit that and hinted at rspec-puppet-facts [13:54:52] and I probably just blindly copy pasted to craft my fix ( https://github.com/wikimedia/integration-config/commit/27bed685c7e3358a7dd3a28d2b16beafef9d34dd ) [13:55:22] and my issue was on Jessie [13:55:55] <_joe_> yeah I'll have to dig deeper [13:56:00] <_joe_> this is disappointing though [13:56:08] the operations-puppet container is apparently based on jessie as well [13:56:13] <_joe_> can you please run [13:56:23] but adding rspec-puppet-facts should fix the "Could not find the daemon directory (tested [/etc/sv,/var/lib/service])" issue [13:56:30] <_joe_> rake spec:httpd? [13:57:42] ah with my dummy change ( https://gerrit.wikimedia.org/r/458177 ) which adds the .fixtures.yml in wmflib [13:57:45] that fails \o/ [13:57:48] <_joe_> I get a few failures [13:58:01] <_joe_> sorry, gtg, I have an interview [13:58:53] _joe_: pasted to https://phabricator.wikimedia.org/P7513 for later :] [13:58:55] I have to go as well [13:59:15] <_joe_> so you get the same error as me [13:59:20] <_joe_> still, it passes CI [13:59:27] <_joe_> /o\ [14:02:32] _joe_: rake tasks are run in parallel, so the output is multiplexed.
Try: bundle exec rake --jobs 1 test [14:03:02] 10Operations, 10Patch-For-Review, 10User-herron, 10Wikimedia-Incident: Add email queueing/failover to services currently using mail_smarthost[0] - https://phabricator.wikimedia.org/T196920 (10herron) 05Open>03Resolved a:03herron The last service using `mail_smarthost[0]` was migrated to the localhost... [14:04:17] (03CR) 10Andrew Bogott: [C: 031] "surely this is just from a copy/paste -- feel free to merge this." [puppet] - 10https://gerrit.wikimedia.org/r/458124 (owner: 10Giuseppe Lavagetto) [14:07:05] !log reboot druid* hosts for kernel + openjdk-8 upgrades [14:07:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:33] 10Operations, 10Math, 10Patch-For-Review: Clean up artifacts from LaTeX based math rendering - https://phabricator.wikimedia.org/T195847 (10herron) [14:10:42] !log shutting down wdqs2002 for new SSD and reimage - T202777 [14:10:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:10:47] T202777: add SSDs to wdqs200[12] - https://phabricator.wikimedia.org/T202777 [14:11:17] 10Operations, 10cloud-services-team, 10monitoring: Prometheus vs. CPU usage vs. hyperthreading - https://phabricator.wikimedia.org/T193272 (10herron) [14:11:20] !log gehel@puppetmaster1001 conftool action : set/pooled=inactive; selector: dc=codfw,cluster=wdqs,name=wdqs2002.codfw.wmnet [14:11:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:47] (03PS1) 10Elukey: profile::analytics::refinery::job::data_check: remove cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/458182 (https://phabricator.wikimedia.org/T172532) [14:13:16] 10Operations, 10Mail, 10Wikimedia-Mailing-lists: Reach out to Google about @yahoo.com emails not reaching gmail inboxes (when sent to mailing lists) - https://phabricator.wikimedia.org/T146841 (10herron) 05Open>03Resolved a:03herron >>! In T146841#4378385, @Aklapper wrote: > @herron: Thanks! I boldly c... 
[14:14:57] 10Operations, 10Patch-For-Review, 10User-herron, 10Wikimedia-Incident: Add email queueing/failover to services currently using mail_smarthost[0] - https://phabricator.wikimedia.org/T196920 (10herron) [14:15:03] 10Operations, 10Mail, 10Phabricator, 10Patch-For-Review, and 3 others: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916 (10herron) [14:15:05] 10Operations, 10Mail, 10Patch-For-Review, 10User-herron: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361 (10herron) [14:15:16] (03PS2) 10Elukey: profile::analytics::refinery::job::data_check: remove cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/458182 (https://phabricator.wikimedia.org/T172532) [14:16:05] (03PS3) 10Elukey: profile::analytics::refinery::job::data_check: remove cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/458182 (https://phabricator.wikimedia.org/T172532) [14:18:35] 10Operations, 10Mail, 10Patch-For-Review, 10User-herron: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361 (10herron) The issues that were blocking this have been resolved (and added here as subtasks for reference) Moving forward once again with the mx1001 stretch upgrade [14:18:41] (03PS2) 10Herron: install_server: reinstall mx1001 with stretch [puppet] - 10https://gerrit.wikimedia.org/r/429241 (https://phabricator.wikimedia.org/T175361) [14:21:02] I 'll revert the deployment server to deploy1001 now. 
Everything seems to have gone well enough for it to happen [14:21:13] !log switchover the deployment server back to deploy1001 [14:21:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:26] (03PS5) 10Bstorm: wiki replicas: moving compatibility views to $table_compat [puppet] - 10https://gerrit.wikimedia.org/r/447654 (https://phabricator.wikimedia.org/T174047) [14:22:08] (03PS1) 10Alexandros Kosiaris: Revert "Switch the deployment server to deploy2001" [dns] - 10https://gerrit.wikimedia.org/r/458183 [14:22:10] (03PS1) 10Alexandros Kosiaris: Revert "Test switchover of the deployment server" [puppet] - 10https://gerrit.wikimedia.org/r/458184 [14:23:05] (03PS2) 10Alexandros Kosiaris: Revert "Test switchover of the deployment server" [puppet] - 10https://gerrit.wikimedia.org/r/458184 [14:23:15] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Revert "Test switchover of the deployment server" [puppet] - 10https://gerrit.wikimedia.org/r/458184 (owner: 10Alexandros Kosiaris) [14:23:44] (03PS3) 10Herron: install_server: reinstall mx1001 with stretch [puppet] - 10https://gerrit.wikimedia.org/r/429241 (https://phabricator.wikimedia.org/T175361) [14:25:06] (03CR) 10Bstorm: "On the issue of the userindex tables, those put a premium on performance by their nature (after talking to some users of them). That sugg" [puppet] - 10https://gerrit.wikimedia.org/r/447654 (https://phabricator.wikimedia.org/T174047) (owner: 10Bstorm) [14:27:21] 10Operations, 10Mail, 10Patch-For-Review, 10User-herron: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361 (10herron) To be more specific -- First depooling mx1001 by stopping Exim, then manually relaying any queued/deferred messages on mx1001 to mx2001, then reinstalling with St...
[14:28:47] (03CR) 10Alexandros Kosiaris: [C: 032] Revert "Switch the deployment server to deploy2001" [dns] - 10https://gerrit.wikimedia.org/r/458183 (owner: 10Alexandros Kosiaris) [14:33:55] (03PS1) 10Ottomata: Revert "Temporarily removing thorium from netboot.cfg" [puppet] - 10https://gerrit.wikimedia.org/r/458188 [14:34:02] (03PS2) 10Ottomata: Revert "Temporarily removing thorium from netboot.cfg" [puppet] - 10https://gerrit.wikimedia.org/r/458188 [14:34:06] (03CR) 10Ottomata: [V: 032 C: 032] Revert "Temporarily removing thorium from netboot.cfg" [puppet] - 10https://gerrit.wikimedia.org/r/458188 (owner: 10Ottomata) [14:35:38] (03CR) 10Ottomata: [C: 031] profile::analytics::refinery::job::data_check: remove cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/458182 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [14:42:13] !log elukey@deploy1001 Started deploy [analytics/refinery@77a5a83]: small changes to pageview whitelist and scripts [14:42:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:42:28] (03CR) 10Alexandros Kosiaris: [C: 032] Introduce orespoolcounter{1,2}00{1,2} [dns] - 10https://gerrit.wikimedia.org/r/457925 (https://phabricator.wikimedia.org/T203465) (owner: 10Alexandros Kosiaris) [14:42:32] (03PS2) 10Alexandros Kosiaris: Introduce orespoolcounter{1,2}00{1,2} [dns] - 10https://gerrit.wikimedia.org/r/457925 (https://phabricator.wikimedia.org/T203465) [14:42:36] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Introduce orespoolcounter{1,2}00{1,2} [dns] - 10https://gerrit.wikimedia.org/r/457925 (https://phabricator.wikimedia.org/T203465) (owner: 10Alexandros Kosiaris) [14:43:30] (03PS4) 10Elukey: profile::analytics::refinery::job::data_check: remove cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/458182 (https://phabricator.wikimedia.org/T172532) [14:46:12] (03CR) 10Elukey: [C: 032] profile::analytics::refinery::job::data_check: remove cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/458182 
(https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [14:47:06] 08Warning Alert for device asw2-d-eqiad.mgmt.eqiad.wmnet - Inbound interface errors [14:48:24] 10Operations, 10ORES, 10Scoring-platform-team, 10vm-requests, 10Patch-For-Review: Site: 4 VM request for ORES poolcounter - https://phabricator.wikimedia.org/T203465 (10akosiaris) [14:50:53] !log elukey@deploy1001 Finished deploy [analytics/refinery@77a5a83]: small changes to pageview whitelist and scripts (duration: 08m 40s) [14:50:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:54] so the warning for asw2-d-eqiad.mgmt.eqiad.wmnet seems related to stat1006's port [14:52:29] https://librenms.wikimedia.org/device/device=149/tab=port/port=12168/ [14:53:02] Out errors: Carrier transitions: 32325, Errors: 0, Drops: 12284, Collisions: 0, [14:53:02] In errors: Errors: 7965, Drops: 0, Framing errors: 7965, Runts: 0, Bucket drops: 0, [14:53:24] we just reimaged thorium to stretch, and there are some rsyncs that are running between them.. I am wondering if they are now for some reason copying data [14:53:46] ottomata: are you running rsyncs or similar on stat1006 by any chance? 
[14:53:57] https://wikitech.wikimedia.org/wiki/Network_monitoring#Inbound/outbound_interface_errors [14:54:31] (03PS1) 10Alexandros Kosiaris: Fix typo in orespoolcounter2002 IP assignment [dns] - 10https://gerrit.wikimedia.org/r/458192 [14:54:37] the errors that alerted are the framing errors, usually means faulty interface or cable [14:55:01] the carrier transition is super sketchy too, possibly related [14:55:49] XioNoX: I saw the huge inbound traffic increase and I thought it was saturating the link [14:56:15] (03CR) 10Alexandros Kosiaris: [C: 032] Fix typo in orespoolcounter2002 IP assignment [dns] - 10https://gerrit.wikimedia.org/r/458192 (owner: 10Alexandros Kosiaris) [14:56:59] 10Operations, 10LDAP-Access-Requests: Remove user "albe" from the wmde LDAP group - https://phabricator.wikimedia.org/T203561 (10Krenair) `nda` is a highly privileged group according to https://wikitech.wikimedia.org/wiki/LDAP/Groups - not as much as the other ones on there, but still. [14:57:19] elukey: it is too [14:57:41] and it's probably that increase in traffic that revealed the link issue [14:57:47] elukey: i was but not anymore [14:57:49] 10Operations, 10cloud-services-team: Onboard gtirloni to WMF - https://phabricator.wikimedia.org/T203489 (10Dzahn) when adding users to LDAP groups they also need to be added to the admin module in puppet. we are getting this from the "cross-validate-accounts" script: ``` Membership of ops group in LDAP and... [15:02:38] XioNoX: so I just stopped the rsync that it was running [15:06:55] XioNoX: https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?orgId=1&panelId=8&fullscreen&var-server=stat1006&var-datasource=eqiad%20prometheus%2Fops&from=now-1h&to=now-1m [15:08:23] 10Operations, 10ops-eqiad: Interface errors for stat1006 - https://phabricator.wikimedia.org/T203576 (10ayounsi) [15:08:28] elukey, cmjohnson1, https://phabricator.wikimedia.org/T203576 [15:08:54] thanks a lot! 
[15:09:56] 10Operations, 10ops-eqiad: Interface errors for stat1006 - https://phabricator.wikimedia.org/T203576 (10ayounsi) [15:09:57] elukey, give me about 1 hour and I can swap it [15:10:10] if that works for you [15:10:10] (03PS1) 10Ema: trafficserver (7.1.3+ds-4wm3) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/458195 (https://phabricator.wikimedia.org/T199720) [15:10:59] updated https://wikitech.wikimedia.org/wiki/Network_monitoring#Inbound/outbound_interface_errors [15:11:59] cmjohnson1: ack, thanks a lot! No rush! [15:12:22] XioNoX: nice thanks! [15:12:27] elukey do you want me to just do it ..or coordinate w/you? [15:12:56] maybe 20secs [15:13:04] cmjohnson1: a heads up would be great if you can (I'll be online for the next couple of hours or a bit more, I shouldn't lag a lot in answering) [15:13:11] okay [15:13:41] if you could also check analytics1068 at the same time I'll be suuuuper happy :D [15:13:48] but of course I don't want to jump the queue [15:17:06] 08̶W̶a̶r̶n̶i̶n̶g Device asw2-d-eqiad.mgmt.eqiad.wmnet recovered from Inbound interface errors [15:17:57] (03CR) 10Marostegui: mariadb: Fix DB configuration in preparation for dc switchover (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/457847 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [15:18:43] (03PS1) 10Bstorm: sysctl: Allow override of tcp settings in the kernel [puppet] - 10https://gerrit.wikimedia.org/r/458198 (https://phabricator.wikimedia.org/T203254) [15:19:09] (03PS2) 10Bstorm: sysctl: Allow override of tcp settings in the kernel [puppet] - 10https://gerrit.wikimedia.org/r/458198 (https://phabricator.wikimedia.org/T203254) [15:20:17] (03CR) 10jerkins-bot: [V: 04-1] sysctl: Allow override of tcp settings in the kernel [puppet] - 10https://gerrit.wikimedia.org/r/458198 (https://phabricator.wikimedia.org/T203254) (owner: 10Bstorm) [15:24:11] (03CR) 10Muehlenhoff: "I don't think we should do that, overriding 
select values for some hosts/clusters is precisely what those priorities are designed for. Als" [puppet] - 10https://gerrit.wikimedia.org/r/458198 (https://phabricator.wikimedia.org/T203254) (owner: 10Bstorm) [15:24:35] (03CR) 10jerkins-bot: [V: 04-1] trafficserver (7.1.3+ds-4wm3) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/458195 (https://phabricator.wikimedia.org/T199720) (owner: 10Ema) [15:28:38] (03CR) 10Bstorm: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/458198 (https://phabricator.wikimedia.org/T203254) (owner: 10Bstorm) [15:31:52] (03Abandoned) 10Bstorm: sysctl: Allow override of tcp settings in the kernel [puppet] - 10https://gerrit.wikimedia.org/r/458198 (https://phabricator.wikimedia.org/T203254) (owner: 10Bstorm) [15:41:49] (03PS1) 10Ema: ATS: ship service file as a systemd override [puppet] - 10https://gerrit.wikimedia.org/r/458201 (https://phabricator.wikimedia.org/T200178) [15:50:57] (03CR) 10Muehlenhoff: [C: 031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/458201 (https://phabricator.wikimedia.org/T200178) (owner: 10Ema) [15:51:04] (03PS33) 10Gehel: Convert elasticsearch to systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [15:51:06] (03PS3) 10Giuseppe Lavagetto: profile::openstack::base::frontend: remove config-master [puppet] - 10https://gerrit.wikimedia.org/r/458124 [15:51:28] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::openstack::base::frontend: remove config-master [puppet] - 10https://gerrit.wikimedia.org/r/458124 (owner: 10Giuseppe Lavagetto) [15:51:55] (03CR) 10jerkins-bot: [V: 04-1] Convert elasticsearch to systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [15:53:14] (03PS34) 10Gehel: Convert elasticsearch to systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/440498 
(https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [15:53:25] (03PS35) 10Gehel: Convert elasticsearch to systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [15:54:06] (03CR) 10jerkins-bot: [V: 04-1] Convert elasticsearch to systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [15:56:46] (03CR) 10Ema: "recheck" [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/458195 (https://phabricator.wikimedia.org/T199720) (owner: 10Ema) [15:58:41] (03PS37) 10Alex Monk: Prepare for packaging stuff and readme [software/certcentral] - 10https://gerrit.wikimedia.org/r/456646 [16:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180905T1600). Please do the needful. [16:00:05] bmansurov and stephanebisson: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:00:16] hey [16:00:44] here [16:02:26] 10Operations, 10ops-codfw, 10Discovery, 10Wikidata, and 2 others: add SSDs to wdqs200[12] - https://phabricator.wikimedia.org/T202777 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['wdqs2002.codfw.wmnet'] ``` The log can be found in `/var/log/w... [16:05:06] (03PS4) 10Giuseppe Lavagetto: conftool: add class for writing to state to file [puppet] - 10https://gerrit.wikimedia.org/r/457490 [16:05:42] I can SWAT, just got to get setup... 
[16:06:28] \o/ [16:06:57] (03PS3) 10Thcipriani: Enable logging for Schema:CitationUsage at 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454854 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov) [16:07:10] (03CR) 10jerkins-bot: [V: 04-1] conftool: add class for writing to state to file [puppet] - 10https://gerrit.wikimedia.org/r/457490 (owner: 10Giuseppe Lavagetto) [16:07:20] (03PS4) 10Jcrespo: mariadb: Fix DB configuration in preparation for dc switchover [mediawiki-config] - 10https://gerrit.wikimedia.org/r/457847 (https://phabricator.wikimedia.org/T189107) [16:08:16] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454854 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov) [16:08:56] thcipriani: o/ [16:09:07] by any chance did you see my email about Archiva? [16:09:19] (03CR) 10Giuseppe Lavagetto: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/457490 (owner: 10Giuseppe Lavagetto) [16:09:44] (03Merged) 10jenkins-bot: Enable logging for Schema:CitationUsage at 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454854 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov) [16:10:40] (03CR) 10jerkins-bot: [V: 04-1] conftool: add class for writing to state to file [puppet] - 10https://gerrit.wikimedia.org/r/457490 (owner: 10Giuseppe Lavagetto) [16:10:54] elukey: I saw the task about upgrading to stretch and using ldap auth, but I haven't seen the email [16:10:57] <_joe_> this makes zero sense now [16:12:00] elukey: although I haven't followed that task very closely, do you need feedback from me or need me to test something? 
[16:13:30] (03CR) 10jerkins-bot: [V: 04-1] trafficserver (7.1.3+ds-4wm3) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/458195 (https://phabricator.wikimedia.org/T199720) (owner: 10Ema) [16:13:55] bmansurov: you change is on mwdebug1002, check please if possible [16:13:55] thcipriani: nono I sent an email to engineering warning people that archiva-deploy is not valid anymore, now you need to be in the archiva-deployers LDAP group to be able to upload jars [16:13:59] you are in [16:14:02] (03PS38) 10Alex Monk: Prepare for packaging stuff and readme [software/certcentral] - 10https://gerrit.wikimedia.org/r/456646 [16:14:03] thcipriani: on it [16:14:47] (03PS1) 10Dzahn: admins: remove albe from ldap_admins, add to absent group [puppet] - 10https://gerrit.wikimedia.org/r/458207 (https://phabricator.wikimedia.org/T203561) [16:15:34] thcipriani: it's working! [16:15:35] (03CR) 10Vgutierrez: [C: 031] Prepare for packaging stuff and readme [software/certcentral] - 10https://gerrit.wikimedia.org/r/456646 (owner: 10Alex Monk) [16:17:31] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: Remove user "albe" from the wmde LDAP group - https://phabricator.wikimedia.org/T203561 (10Dzahn) > can be verified at https://www.wikimedia.de/wiki/Mitarbeitende Thank you. I did that. Verified. Note to other people handling access requests: This r... [16:18:46] elukey: ah! yes, just now read it. FWIW, the archiva-deploy user didn't work for me last time (password stores were out of date and...key management issues for pwstore). We setup a new user different from my ldap user to upload the latest gerrit jars. [16:19:08] bmansurov: great, going live [16:19:23] ok [16:20:39] thcipriani: yep yep, I recall, just wanted to make sure that you knew about the ldap thing, good! 
[16:21:04] elukey: awesome, thank you for reaching out :) [16:23:41] !log thcipriani@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:454854|Enable logging for Schema:CitationUsage at 100%]] T191086 (duration: 01m 23s) [16:23:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:23:48] T191086: Instrument and collect data via CitationUsage schema - https://phabricator.wikimedia.org/T191086 [16:23:50] ^ bmansurov live now [16:23:50] (03CR) 10Jcrespo: [C: 031] Labs: Make redact_sanitarium.sh file easier to read [puppet] - 10https://gerrit.wikimedia.org/r/457899 (owner: 10Banyek) [16:24:01] thcipriani: thank you! [16:24:05] yw :) [16:24:47] (03CR) 10Jcrespo: [C: 031] "Until we have a better options, let's deploy this carefully (avoiding false positives)" [puppet] - 10https://gerrit.wikimedia.org/r/456099 (https://phabricator.wikimedia.org/T149340) (owner: 10Banyek) [16:25:42] (03CR) 10Marostegui: "Will you add the s3 rc, vslow and api sections in the end?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/457847 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [16:26:22] (03CR) 10Jcrespo: [C: 04-1] "Yes" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/457847 (https://phabricator.wikimedia.org/T189107) (owner: 10Jcrespo) [16:27:02] stephanebisson: I pulled https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/PageTriage/+/458166/ to mwdebug1002, although there may be nothing to check there since it's adding a maintenance script: anything you want to look at there before I sync everywhere? 
[16:27:32] thcipriani: If the script is there, that's good enough for me [16:27:57] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: request to add phedenskog to perf-roots - https://phabricator.wikimedia.org/T202658 (10Dzahn) [16:28:16] stephanebisson: ok, going live :) [16:29:34] (03CR) 10Dzahn: [C: 032] "confirmed it was requested by his manager at WMDE" [puppet] - 10https://gerrit.wikimedia.org/r/458207 (https://phabricator.wikimedia.org/T203561) (owner: 10Dzahn) [16:30:13] !log thcipriani@deploy1001 Synchronized php-1.32.0-wmf.19/extensions/PageTriage/maintenance/FixNominatedForDeletion.php: SWAT: [[gerrit:458166|Maintenance: fix ptrp_deleted]] T202582 (duration: 00m 57s) [16:30:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:18] T202582: "Nominated for deletion" filter doesn't work - https://phabricator.wikimedia.org/T202582 [16:30:37] ^ stephanebisson maintenance script should be everywhere now [16:31:24] !log LDAP: removed user 'albe' from groups 'wmde' and 'nda' (T203561) [16:31:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:30] T203561: Remove user "albe" from the wmde LDAP group - https://phabricator.wikimedia.org/T203561 [16:31:55] 10Operations, 10ops-codfw, 10Discovery, 10Wikidata, and 2 others: add SSDs to wdqs200[12] - https://phabricator.wikimedia.org/T202777 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['wdqs2002.codfw.wmnet'] ``` and were **ALL** successful. [16:32:03] stephanebisson: Fix CopyPatrol links for draft is live on mwdebug1002, check please [16:32:12] thcipriani: testing... [16:32:23] 10Operations, 10LDAP-Access-Requests: Remove user "albe" from the wmde LDAP group - https://phabricator.wikimedia.org/T203561 (10Dzahn) [16:32:52] 10Operations, 10LDAP-Access-Requests: Remove user "albe" from the wmde LDAP group - https://phabricator.wikimedia.org/T203561 (10Dzahn) Done. 
Keeping the ticket open to check whether the NDA group removal needs a process with legal or not. [16:32:56] thcipriani: works as expected [16:33:09] thanks for checking, going live [16:33:45] (03PS1) 10Bmansurov: Disable logging for Schema:CitationUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458209 (https://phabricator.wikimedia.org/T191086) [16:35:39] !log thcipriani@deploy1001 Synchronized php-1.32.0-wmf.20/extensions/PageTriage/modules/ext.pageTriage.models/ext.pageTriage.article.js: SWAT: [[gerrit:458085|Fix CopyPatrol links for drafts]] T203284 (duration: 00m 57s) [16:35:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:35:44] T203284: "Copyvio" link to CopyPatrol doesn't work for drafts, should drop Draft: namespace prefix - https://phabricator.wikimedia.org/T203284 [16:35:55] stephanebisson: ^ should be live [16:35:56] thcipriani: thank you! [16:36:00] yw :) [16:38:26] !log Starting extensions/PageTriage/maintenance/FixNominatedForDeletion.php --wiki enwiki [16:38:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:48] !log Finished extensions/PageTriage/maintenance/FixNominatedForDeletion.php --wiki enwiki [16:38:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:42:35] elukey: I am going to be swapping cable for stat1006 in next few mins...still okay? [16:42:43] cmjohnson1: ack! thanks! [16:42:54] !log shutting down wdqs1003 for new SSD and reimage - T202780 [16:42:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:42:58] !log gehel@puppetmaster1001 conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=wdqs,name=wdqs1003.eqiad.wmnet [16:43:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:12] anybody deployed changes related to eventlogging? 
[16:43:35] yes https://gerrit.wikimedia.org/r/#/c/454854 [16:44:12] the events processed by EL are really high https://grafana.wikimedia.org/dashboard/db/eventlogging?orgId=1&from=now-3h&to=now-5m [16:44:17] it tripped our alerts [16:45:08] !log swapping ethernet cable for stat1006 [16:45:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:45:38] (03CR) 10jenkins-bot: Enable logging for Schema:CitationUsage at 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454854 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov) [16:45:53] elukey it's done...i will update task but test it when you get the opportunity [16:46:01] ack! [16:46:12] the rsync should be related to a cron, will test it tomorrow [16:46:14] 10Operations, 10ops-eqiad: Interface errors for stat1006 - https://phabricator.wikimedia.org/T203576 (10Cmjohnson) @elukey ethernet cable has been swapped. [16:46:30] cmjohnson1: if you could also check analytics1068 later on it would be awesome [16:47:44] (03PS1) 10Dzahn: admins: clean up duplicate membership for mbsantos [puppet] - 10https://gerrit.wikimedia.org/r/458210 (https://phabricator.wikimedia.org/T197237) [16:50:35] elukey: aye, 0 to 100% in one go is unusual indeed. [16:50:54] I think it was 1% to 100% but same thing :D [16:52:14] (03PS1) 10Ottomata: Blacklist eventlogging CitationUsage from MySQL import [puppet] - 10https://gerrit.wikimedia.org/r/458211 (https://phabricator.wikimedia.org/T191086) [16:52:59] (03CR) 10Ottomata: [C: 032] Blacklist eventlogging CitationUsage from MySQL import [puppet] - 10https://gerrit.wikimedia.org/r/458211 (https://phabricator.wikimedia.org/T191086) (owner: 10Ottomata) [16:55:22] elukey: The commit adds factor=1 where it was undefined before. [16:55:30] Grafana shows 0/s in the days before [16:55:53] ah, it's rounding down. [16:55:56] Krinkle: ah yes you are right, I was checking the tasks' correspondence [16:55:58] OK. I see a few now. 
[16:56:32] about 35 spurious event spread over the last few weeks. [16:56:36] most minutes 0. [16:56:39] probably unrelated? [16:57:34] yeah, from the last compaign in July [16:57:47] probably some cache issue somewhere, e.g. people's browsing being shutdown for a month and then restoring the session [16:57:49] it happens :) [16:58:12] our max page load time is 17 hours a few times a week. I suspect it's people flying. [16:59:38] good rtt from in-flight internet *g* [16:59:45] we are checking with the research folks now [17:00:27] k [17:00:49] jouncebot, next [17:00:50] In 0 hour(s) and 59 minute(s): fixcopyright.wikimedia.org deployment (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180905T1800) [17:01:13] anyway, given global volume of EL increased by 200% in one minute, and mysql insert to be disable, I think it's safe to say, this was not prepared for. I also don't see a reply from Nuria on that task or expected number/volume indeed. [17:01:48] Would normally recommend revert, discuss and then try again later. I don't know about the research bias, but "off" should be safe/unbiased. [17:02:22] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10MoritzMuehlenhoff) I tried an installation from cloudvirt1023, but the PXELINUX version on the NIC is affected by a bug in syslinu... [17:04:00] 10Operations, 10cloud-services-team, 10netops: modify labs-hosts1-vlans for http load of installer kernel - https://phabricator.wikimedia.org/T190424 (10MoritzMuehlenhoff) Arzhel and I had a look and this doesn't seem to be ACL-related, the tftp packets are flowing in both directions. This is possibly a bug... [17:04:23] Krinkle: yep I agree, would it be possible to revert now if the research team decides so? (outside the window) [17:04:53] Sure. I can help guide it through if needed. [17:05:27] Would prefer the deployer to respond first. 
But if no reply in 15min, will revert regardless. we have a policy of being online after ones deployment for a reasonable amount of time. [17:05:53] I can ask him to join here [17:06:16] o/ [17:06:27] Krinkle: --^ :) [17:06:30] Good :) [17:06:38] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/458209/ is ready [17:07:41] Ah, I see it was swat deployed. Right. [17:08:40] bmansurov: want me to deploy it? [17:08:49] Krinkle: yes please [17:08:54] (03CR) 10Krinkle: [C: 032] Disable logging for Schema:CitationUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458209 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov) [17:09:20] bmansurov: I assumed (incorrectly) that you were a deployer, so there was some confusion/uncertainty. Makes sense now. [17:09:39] Krinkle: oh i see [17:10:13] (03Merged) 10jenkins-bot: Disable logging for Schema:CitationUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458209 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov) [17:11:03] (03CR) 10jenkins-bot: Disable logging for Schema:CitationUsage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458209 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov) [17:13:43] Krinkle: thanks for deploying the change! 
[17:14:07] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: I73d2ce30 - T191086 (duration: 00m 57s) [17:14:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:12] T191086: Instrument and collect data via CitationUsage schema - https://phabricator.wikimedia.org/T191086 [17:14:13] you're welcome [17:14:19] thanks indeed Krinkle [17:14:37] and bmansurov of course :) [17:15:02] o/ also the analytics folks [17:22:11] !log ppchelko@deploy1001 Started deploy [restbase/deploy@b768926]: Bump expected Parsoid version and release new metrics endpoints [17:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:08] 10Operations, 10LDAP-Access-Requests: Remove user "albe" from the wmde LDAP group - https://phabricator.wikimedia.org/T203561 (10WMDE-leszek) Thanks for removing from 'nda' LDAP group. Thanks in advanced to @RStallman-legalteam for taking any further actions that are required (if there any). [17:30:31] 10Operations, 10ops-eqiad, 10Analytics: analytics1068 doesn't boot - https://phabricator.wikimedia.org/T203244 (10Cmjohnson) @elukey analytics1068 is broke...it will not get past loading bios drivers during the post. I tired a hard reset (removing power, drain flea power) I reseated memory (sometimes memory... [17:31:23] 10Operations, 10ops-eqiad, 10Analytics: analytics1068 doesn't boot - https://phabricator.wikimedia.org/T203244 (10elukey) Thanks a lot! No rush I only wanted to know if draining flea power would have helped.. 
[17:31:30] thanks a lot cmjohnson1 -^ [17:34:36] (03PS1) 10Volans: mediawiki: improve validation checks [software/spicerack] - 10https://gerrit.wikimedia.org/r/458218 (https://phabricator.wikimedia.org/T199079) [17:35:34] !log ppchelko@deploy1001 Finished deploy [restbase/deploy@b768926]: Bump expected Parsoid version and release new metrics endpoints (duration: 13m 24s) [17:35:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:35:42] !log ppchelko@deploy1001 Started deploy [restbase/deploy@b768926]: Bump expected Parsoid version and release new metrics endpoints, take 2 [17:35:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:13] (03PS1) 10Volans: sre.switchdc.mediawiki: phase 0 improve logging [cookbooks] - 10https://gerrit.wikimedia.org/r/458219 (https://phabricator.wikimedia.org/T199079) [17:39:44] !log ppchelko@deploy1001 Finished deploy [restbase/deploy@b768926]: Bump expected Parsoid version and release new metrics endpoints, take 2 (duration: 04m 02s) [17:39:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:49] 10Operations, 10Services (watching): Create nodejs 10 packages - https://phabricator.wikimedia.org/T203239 (10Pchelolo) > All node binary packages will need to be built for our environment and not fetched from npm (which is the clean approach anyway) For most of node services we actually build binary dependen... 
[17:40:00] (03CR) 10Volans: "done" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/457944 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [17:40:38] !log ppchelko@deploy1001 Started deploy [restbase/deploy@b768926]: Bump expected Parsoid version and release new metrics endpoints, take 3 [17:40:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:42:55] !log swapping DAC cable cp1080 [17:42:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:12] (03PS1) 10Volans: sre.switchdc.mediawiki: phase 2 add sleep [cookbooks] - 10https://gerrit.wikimedia.org/r/458221 (https://phabricator.wikimedia.org/T199079) [17:44:07] 10Operations, 10ops-eqiad, 10Traffic: cp1080 - kernel / bnxt_en failures - https://phabricator.wikimedia.org/T203194 (10Cmjohnson) @bblack @Dzahn I didn't get any direction with this but I assumed swapping the DAC is the appropriate attempted fix. Please let me know if this resolves the issue [17:46:20] (03CR) 10EBernhardson: Convert elasticsearch to systemd unit (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [17:46:50] (03PS5) 10Jdlrobson: Remove obsolete $wgPopupsBetaFeature, Part I: CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/450906 (https://phabricator.wikimedia.org/T203589) (owner: 10Prtksxna) [17:47:15] (03PS8) 10Jdlrobson: Remove obsolete $wgPopupsBetaFeature, Part III: InitialiseSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/444574 (https://phabricator.wikimedia.org/T203589) (owner: 10Prtksxna) [17:47:18] !log ppchelko@deploy1001 Finished deploy [restbase/deploy@b768926]: Bump expected Parsoid version and release new metrics endpoints, take 3 (duration: 06m 40s) [17:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:47:47] (03PS1) 10Faidon Liambotis: Pass argv from to main() -> parse_args -> argparse 
[software/keyholder] - 10https://gerrit.wikimedia.org/r/458223 [17:47:49] (03PS1) 10Faidon Liambotis: Add setuptools, LICENSE, README.rst etc. [software/keyholder] - 10https://gerrit.wikimedia.org/r/458224 [17:47:51] (03PS1) 10Faidon Liambotis: Add pytest support for unit/integration testing [software/keyholder] - 10https://gerrit.wikimedia.org/r/458225 [17:47:53] (03PS1) 10Faidon Liambotis: Don't barf on an empty or invalid YAML config [software/keyholder] - 10https://gerrit.wikimedia.org/r/458226 [17:47:55] (03PS1) 10Faidon Liambotis: Drop legacy SSHv1 support [software/keyholder] - 10https://gerrit.wikimedia.org/r/458227 [17:47:57] (03PS1) 10Faidon Liambotis: Drop MD5 (pre-6.8) digest support [software/keyholder] - 10https://gerrit.wikimedia.org/r/458228 [17:47:59] (03PS1) 10Faidon Liambotis: Don't drop the colon between hash type/digest [software/keyholder] - 10https://gerrit.wikimedia.org/r/458229 [17:48:01] (03PS1) 10Faidon Liambotis: Only show tracebacks on DEBUG logging levels [software/keyholder] - 10https://gerrit.wikimedia.org/r/458230 [17:48:03] (03PS3) 10Jdlrobson: Remove obsolete $wgPopupsBetaFeature, Part II: InitialiseSettings-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/452863 (https://phabricator.wikimedia.org/T203589) (owner: 10Jforrester) [17:48:21] (03CR) 10Zhuyifei1999: "We are planning to do this as part of T202588. The old server using mysql will simply be abandoned and deleted after the migration." [puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: 10Zhuyifei1999) [17:48:25] (03CR) 10Jdlrobson: [C: 031] "Added to our upcoming column so we get round to this soon. Thanks for the prod." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/450906 (https://phabricator.wikimedia.org/T203589) (owner: 10Prtksxna) [17:48:36] (03PS1) 10Faidon Liambotis: Respond with SSH_AGENT_FAILURE on protocol errors [software/keyholder] - 10https://gerrit.wikimedia.org/r/458231 [17:48:38] (03PS1) 10Faidon Liambotis: Switch to using Enum for SSH protocol codes [software/keyholder] - 10https://gerrit.wikimedia.org/r/458232 [17:48:40] (03PS1) 10Faidon Liambotis: Switch to Construct for the SSH agent protocol [software/keyholder] - 10https://gerrit.wikimedia.org/r/458233 [17:48:42] (03PS1) 10Faidon Liambotis: Split handle_client_request() into multiple methods [software/keyholder] - 10https://gerrit.wikimedia.org/r/458234 [17:48:44] (03PS1) 10Faidon Liambotis: Stop referring to the daemon as a "proxy" [software/keyholder] - 10https://gerrit.wikimedia.org/r/458235 [17:48:46] (03PS1) 10Faidon Liambotis: Implement all the SSH agent bits and stop proxying [software/keyholder] - 10https://gerrit.wikimedia.org/r/458236 [17:48:48] (03PS1) 10Faidon Liambotis: Split SshAgentCommand type to Request/Response [software/keyholder] - 10https://gerrit.wikimedia.org/r/458237 [17:48:50] (03PS1) 10Faidon Liambotis: Make pylint a little happier [software/keyholder] - 10https://gerrit.wikimedia.org/r/458238 [17:48:52] (03PS1) 10Faidon Liambotis: Use mlockall() to avoid any potential swapping [software/keyholder] - 10https://gerrit.wikimedia.org/r/458239 [17:48:54] (03PS1) 10Faidon Liambotis: Add permission checks for various commands [software/keyholder] - 10https://gerrit.wikimedia.org/r/458240 [17:48:58] (03PS1) 10Faidon Liambotis: Verify the validity of signature requests [software/keyholder] - 10https://gerrit.wikimedia.org/r/458241 [17:49:00] (03PS1) 10Faidon Liambotis: Implement SSH_AGENTC_LOCK/SSH_AGENTC_UNLOCK [software/keyholder] - 10https://gerrit.wikimedia.org/r/458242 [17:49:02] (03PS1) 10Faidon Liambotis: Parse/build agent request/responses once [software/keyholder] - 
10https://gerrit.wikimedia.org/r/458243 [17:49:04] (03PS1) 10Faidon Liambotis: Refactor handle() [software/keyholder] - 10https://gerrit.wikimedia.org/r/458244 [17:49:08] (03PS1) 10Faidon Liambotis: Add compatibility with Construct 2.8.22 and 2.9.45 [software/keyholder] - 10https://gerrit.wikimedia.org/r/458245 [17:49:12] (03PS1) 10Faidon Liambotis: Switch path handling to pathlib.Path [software/keyholder] - 10https://gerrit.wikimedia.org/r/458246 [17:54:15] 10Operations, 10Release-Engineering-Team (Kanban): Create keyholder gerrit repo - https://phabricator.wikimedia.org/T203108 (10faidon) [18:00:05] legoktm and CindyCicaleseWMF: My dear minions, it's time we take the moon! Just kidding. Time for fixcopyright.wikimedia.org deployment deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180905T1800). [18:00:26] rats, I would rather take the moon [18:00:36] lol [18:04:36] !log upgrade asw2-a-eqiad (not in production) [18:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:06:46] (03PS2) 10Dzahn: admins: clean up duplicate membership for mbsantos [puppet] - 10https://gerrit.wikimedia.org/r/458210 (https://phabricator.wikimedia.org/T197237) [18:07:42] legoktm: https://moon.wikimedia.org/ [18:08:14] :D [18:08:57] AndyRussG: yt? [18:08:59] https://phabricator.wikimedia.org/T203592 [18:09:54] ottomata: ah hmm ok [18:09:56] interesting [18:10:04] (03CR) 10Dzahn: [C: 032] admins: clean up duplicate membership for mbsantos [puppet] - 10https://gerrit.wikimedia.org/r/458210 (https://phabricator.wikimedia.org/T197237) (owner: 10Dzahn) [18:10:09] I'm here-ish... [18:10:23] ottomata: it's actually fine for that not to go into MySQL [18:10:49] We're just collecting via the Kafka topic and can dig into Hive if anything more is needed [18:12:18] great [18:12:19] that is easy then [18:12:21] will blacklist [18:12:26] * legoktm twiddles thumbs waiting for jenkins [18:15:03] ottomata: ok great thx! 
[18:15:07] (03PS1) 10Ottomata: Blacklist CentralNoticeImpression from EventLogging MySQL [puppet] - 10https://gerrit.wikimedia.org/r/458252 (https://phabricator.wikimedia.org/T203592) [18:15:09] (03CR) 10Ottomata: [V: 032 C: 032] Blacklist CentralNoticeImpression from EventLogging MySQL [puppet] - 10https://gerrit.wikimedia.org/r/458252 (https://phabricator.wikimedia.org/T203592) (owner: 10Ottomata) [18:16:36] (03PS2) 10Legoktm: Enable EUCopyrightCampaign extensions and SkinPerPage for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458070 (https://phabricator.wikimedia.org/T203296) [18:16:38] (03PS2) 10Legoktm: Enable $wgULSLanguageDetection for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458071 (https://phabricator.wikimedia.org/T203179) [18:16:40] (03PS2) 10Legoktm: Set $wgSitename for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458090 (https://phabricator.wikimedia.org/T203296) [18:30:52] !log cp1080 - pooling again after T203194 appears fixed [18:31:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:01] T203194: cp1080 - kernel / bnxt_en failures - https://phabricator.wikimedia.org/T203194 [18:32:31] 10Operations, 10ops-eqiad, 10Traffic: cp1080 - kernel / bnxt_en failures - https://phabricator.wikimedia.org/T203194 (10Dzahn) @Cmjohnson Thank you. I don't really know if that was the cause but looking at monitoring now everything is green. So i repooled the server just now. [18:36:15] 10Operations, 10netops, 10Wikimedia-Incident: asw2-a-eqiad FPC5 gets disconnected every 10 minutes - https://phabricator.wikimedia.org/T201145 (10ayounsi) asw2-a-eqiad members upgraded to 14.1X53-D47.3 (QFX host included). For the record, downtime was 10min. 
[18:38:17] !log legoktm@deploy1001 Started scap: build l10n for EUCopyrightCampaign and SkinPerPage [18:38:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:39:06] !log fixed package state on auth2001 [18:39:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:42:04] 10Operations, 10ops-eqiad, 10Traffic: cp1080 - kernel / bnxt_en failures - https://phabricator.wikimedia.org/T203194 (10Dzahn) 05Open>03Resolved a:03Dzahn I guess we can close it and simply reopen it if it should happen again. Let me know if you think otherwise @ema @BBlack [18:44:37] (03CR) 10Gehel: Convert elasticsearch to systemd unit (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: 10EBernhardson) [18:46:23] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 2 others: add SSDs to wdqs1003 - https://phabricator.wikimedia.org/T202780 (10RobH) [18:46:30] (03CR) 10Gehel: [C: 031] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/458218 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [18:47:22] (03CR) 10Gehel: [C: 031] "LGTM, trivial enough" [cookbooks] - 10https://gerrit.wikimedia.org/r/458219 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [18:51:34] (03PS2) 10Dzahn: site: apply alerting_host role on icinga1001 [puppet] - 10https://gerrit.wikimedia.org/r/458093 (https://phabricator.wikimedia.org/T201344) [18:53:21] greg-g: I'm definitely going to go over, I always underestimate how long scap will take :/ [18:54:09] jouncebot: next [18:54:10] In 0 hour(s) and 5 minute(s): MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180905T1900) [18:54:15] legoktm: you're good :) [18:55:50] (03PS1) 10Dzahn: icinga: keep notifications for icinga1001 itself disabled initially [puppet] - 10https://gerrit.wikimedia.org/r/458272 (https://phabricator.wikimedia.org/T201344) [18:56:55] (03CR) 
10Dzahn: "the modified resources part in the compiler output is explained by:" [puppet] - 10https://gerrit.wikimedia.org/r/458093 (https://phabricator.wikimedia.org/T201344) (owner: 10Dzahn) [18:57:19] (03PS2) 10Dzahn: icinga: keep notifications for icinga1001 itself disabled initially [puppet] - 10https://gerrit.wikimedia.org/r/458272 (https://phabricator.wikimedia.org/T201344) [18:58:15] (03PS3) 10Dzahn: icinga: keep notifications for icinga1001 itself disabled initially [puppet] - 10https://gerrit.wikimedia.org/r/458272 (https://phabricator.wikimedia.org/T201344) [18:58:36] (03CR) 10Dzahn: [C: 032] icinga: keep notifications for icinga1001 itself disabled initially [puppet] - 10https://gerrit.wikimedia.org/r/458272 (https://phabricator.wikimedia.org/T201344) (owner: 10Dzahn) [18:59:12] (03CR) 10Dzahn: [C: 032] site: apply alerting_host role on icinga1001 [puppet] - 10https://gerrit.wikimedia.org/r/458093 (https://phabricator.wikimedia.org/T201344) (owner: 10Dzahn) [18:59:19] (03PS3) 10Dzahn: site: apply alerting_host role on icinga1001 [puppet] - 10https://gerrit.wikimedia.org/r/458093 (https://phabricator.wikimedia.org/T201344) [19:00:04] Deploy window MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180905T1900) [19:18:55] greg-g: would like to roll out a regression fix for Kartographer with mateusbs17 sometime between now and the next swat (I'm unavailable then and can shephard the deploy myself). Was going to take the train slot, but can wait. is that okay? 
[19:19:30] Krinkle: I'm still scapping for the fixcopyright stuff :/ [19:21:47] Krinkle: after lego's done you're good to go [19:21:55] US train slot is unused this week [19:23:07] (03CR) 10Mathew.onipe: "> Patch Set 14:" (034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T202885) (owner: 10Mathew.onipe) [19:23:24] greg-g: thanks [19:23:51] Krinkle: Three for Kartographer plus l.egoktm's own fix for EducationProgram and one for Score. https://gerrit.wikimedia.org/r/q/is:open+(branch:wmf%252F1.32.0-wmf.20+OR+branch:wmf%252F1.32.0-wmf.19) [19:24:19] (03PS18) 10Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T202885) [19:24:21] eh, I'm not doing actual swat though. I'm just doing this one fix to two branches and then have a meeting. [19:25:12] Sure. [19:25:29] (03CR) 10jerkins-bot: [V: 04-1] Elasticsearch module is coming up. [software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T202885) (owner: 10Mathew.onipe) [19:28:07] (03PS19) 10Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T202885) [19:29:17] (03CR) 10jerkins-bot: [V: 04-1] Elasticsearch module is coming up. 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T202885) (owner: 10Mathew.onipe) [19:29:45] (03PS2) 10Andrew Bogott: puppet-merge: add some conftool extras [puppet] - 10https://gerrit.wikimedia.org/r/413745 (https://phabricator.wikimedia.org/T157133) [19:33:19] (03PS1) 10Dzahn: icinga::web: install libapache2-mod-php if on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458277 (https://phabricator.wikimedia.org/T201344) [19:37:10] (03CR) 10Cwhite: [C: 031] icinga::web: install libapache2-mod-php if on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458277 (https://phabricator.wikimedia.org/T201344) (owner: 10Dzahn) [19:38:09] (03PS2) 10Dzahn: icinga::web: install libapache2-mod-php if on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458277 (https://phabricator.wikimedia.org/T201344) [19:42:44] !log legoktm@deploy1001 Finished scap: build l10n for EUCopyrightCampaign and SkinPerPage (duration: 64m 26s) [19:42:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:42:50] FINALLY [19:43:13] Yay. [19:43:23] ok, now the fun part [19:43:31] Yup, https://fixcopyright.wikimedia.org/wiki/Special:Version shows the new hash. [19:43:50] What, me, stalking the deploy? Nuh-uh. 
[19:43:50] (03CR) 10Legoktm: [C: 032] Enable EUCopyrightCampaign extensions and SkinPerPage for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458070 (https://phabricator.wikimedia.org/T203296) (owner: 10Legoktm) [19:44:37] (03CR) 10Legoktm: [C: 032] Enable $wgULSLanguageDetection for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458071 (https://phabricator.wikimedia.org/T203179) (owner: 10Legoktm) [19:44:40] (03CR) 10Legoktm: [C: 032] Set $wgSitename for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458090 (https://phabricator.wikimedia.org/T203296) (owner: 10Legoktm) [19:45:09] (03Merged) 10jenkins-bot: Enable EUCopyrightCampaign extensions and SkinPerPage for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458070 (https://phabricator.wikimedia.org/T203296) (owner: 10Legoktm) [19:45:29] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1002/12369/" [puppet] - 10https://gerrit.wikimedia.org/r/458277 (https://phabricator.wikimedia.org/T201344) (owner: 10Dzahn) [19:45:49] (03CR) 10Andrew Bogott: [C: 031] "This seems fine to me. 
As far as I know this module isn't applied on cloud instances anyway, and I can't think of why it would matter on " [puppet] - 10https://gerrit.wikimedia.org/r/454291 (https://phabricator.wikimedia.org/T134476) (owner: 10Jcrespo) [19:45:51] (03Merged) 10jenkins-bot: Enable $wgULSLanguageDetection for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458071 (https://phabricator.wikimedia.org/T203179) (owner: 10Legoktm) [19:46:19] (03Merged) 10jenkins-bot: Set $wgSitename for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458090 (https://phabricator.wikimedia.org/T203296) (owner: 10Legoktm) [19:46:26] (03CR) 10Dzahn: [C: 032] "the only difference for einsteinium is that we now require Libapache2_mod_php5 which is already installed" [puppet] - 10https://gerrit.wikimedia.org/r/458277 (https://phabricator.wikimedia.org/T201344) (owner: 10Dzahn) [19:46:47] (03PS3) 10Dzahn: icinga::web: install libapache2-mod-php if on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458277 (https://phabricator.wikimedia.org/T201344) [19:46:49] !log legoktm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable EUCopyrightCampaign extensions and SkinPerPage for fixcopyrightwiki (duration: 00m 56s) [19:46:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:40] (03CR) 10Krinkle: [C: 031] mediawiki: improve validation checks [software/spicerack] - 10https://gerrit.wikimedia.org/r/458218 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [19:49:05] !log legoktm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Set $wgSitename and $wgULSLanguageDetection on fixcopyrightwiki (duration: 00m 57s) [19:49:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:29] (03CR) 10Dzahn: "ooh! very cool, a ticket with a detailed plan already exists :) thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: 10Zhuyifei1999) [19:50:11] (03PS7) 10Dzahn: quarry::database: Use mariadb instead of mysql module [puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: 10Zhuyifei1999) [19:50:13] mateusbs17: I've staged the Kartographer change for wmf.20 on mwdebug1002 [19:51:13] Krinkle: Ok, I am on it. [19:51:32] 10Operations, 10DBA, 10Patch-For-Review: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Dzahn) T202588 exists for the quarry migration. That will unblock a lot of this. [19:52:52] Krinkle: wmf20 looks good to me. [19:53:01] mateusbs17: staged wmf.19 as well now [19:53:05] 10Operations, 10Cloud-Services, 10Community-Wikimetrics, 10DBA, and 2 others: Evaluate future of wmf puppet module "mysql" - https://phabricator.wikimedia.org/T165625 (10Dzahn) T202588 exists for the quarry migration. that will unblock a lot of this. Also T162070 is a duplicate of this ticket in a way. [19:53:08] legoktm: ok to sync? [19:53:09] (03CR) 10jenkins-bot: Enable EUCopyrightCampaign extensions and SkinPerPage for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458070 (https://phabricator.wikimedia.org/T203296) (owner: 10Legoktm) [19:53:11] (03CR) 10jenkins-bot: Enable $wgULSLanguageDetection for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458071 (https://phabricator.wikimedia.org/T203179) (owner: 10Legoktm) [19:53:13] (03CR) 10jenkins-bot: Set $wgSitename for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458090 (https://phabricator.wikimedia.org/T203296) (owner: 10Legoktm) [19:53:36] Krinkle: er, one sec [19:53:40] k :) [19:53:59] Both works fine. 
If you want to see it too I am using the following links [19:54:15] wmf.19 https://pt.wikipedia.org/wiki/Usu%C3%A1rio(a):MSantos_(WMF)/Testing_maplink [19:54:23] wmf.20 https://en.wikivoyage.org/wiki/User:MSantos_(WMF)/Testing_maplink [19:54:50] 10Operations, 10monitoring: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn) wrong ticket by accident. these should have been here as well: icinga: keep notifications for icinga1001 itself disabled initially https://gerrit.wikimedia.org/r/458272 s... [19:55:15] 10Operations, 10monitoring: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn) p:05Normal>03High [19:57:25] mateusbs17: checked, looks good indeed. [19:58:51] (03PS4) 10Dzahn: icinga::web: install libapache2-mod-php if on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458277 (https://phabricator.wikimedia.org/T202782) [19:59:08] (03CR) 10Dzahn: [C: 032] icinga::web: install libapache2-mod-php if on stretch [puppet] - 10https://gerrit.wikimedia.org/r/458277 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: I, the Bot under the Fountain, allow thee, The Deployer, to do Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180905T2000). [20:05:10] 10Operations, 10Traffic, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review: Sort out HTTP caching issues for fixcopyright wiki - https://phabricator.wikimedia.org/T203179 (10Legoktm) I didn't read the ULS code properly, it looks like we need `$wgULSAnonCanChangeLanguage` t... 
[20:05:48] 10Operations, 10Data-Services, 10Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083 (10Bstorm) [20:08:31] (03PS1) 10Legoktm: Tweak $wgSitename for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458283 [20:08:46] (03CR) 10Legoktm: [C: 032] Tweak $wgSitename for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458283 (owner: 10Legoktm) [20:09:41] !log arlolra@deploy1001 Started deploy [parsoid/deploy@f3ef0c8]: Updating Parsoid to 740b3a4 [20:09:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:25] (03Merged) 10jenkins-bot: Tweak $wgSitename for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458283 (owner: 10Legoktm) [20:11:27] (03CR) 10jenkins-bot: Tweak $wgSitename for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/458283 (owner: 10Legoktm) [20:11:29] (03PS3) 10Andrew Bogott: m5: add grants for designate on cloudservices1003 and 1004 [puppet] - 10https://gerrit.wikimedia.org/r/452997 [20:11:31] (03PS1) 10Andrew Bogott: m5 grants: replace designate password hash with a private lookup [puppet] - 10https://gerrit.wikimedia.org/r/458284 [20:11:48] !log legoktm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Tweak $wgSitename for fixcopyrightwiki (duration: 00m 58s) [20:11:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:29] !log legoktm@deploy1001:~$ cat /home/legoktm/fixcopyright | mwscript purgeList.php --wiki=aawiki [20:12:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:47] Krinkle: done. 
[20:12:48] I hope [20:13:17] OK [20:13:32] mateusbs17: rolling out to both branches now [20:14:12] !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.20/extensions/Kartographer/: Ie5744d45b - T203427 (duration: 00m 58s) [20:14:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:18] T203427: behavior completely destroyed - https://phabricator.wikimedia.org/T203427 [20:17:58] !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.19/extensions/Kartographer/: Ie5744d45b - T203427 (duration: 00m 57s) [20:18:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:08] mateusbs17: please verify once more :) [20:19:04] (03CR) 10Volans: [C: 04-1] "It seems to me there are some issues, but please have Giuseppe have a look too. (I've just added him as a reviewer)" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/413745 (https://phabricator.wikimedia.org/T157133) (owner: 10Andrew Bogott) [20:19:38] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@f3ef0c8]: Updating Parsoid to 740b3a4 (duration: 09m 57s) [20:19:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:19:48] `wmf.20` is ok, `wmf.19` not yet [20:20:08] (03CR) 10Volans: [C: 032] sre.switchdc.mediawiki: phase 0 improve logging [cookbooks] - 10https://gerrit.wikimedia.org/r/458219 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [20:20:40] Krinkle: could it be cache? [20:20:48] (03Merged) 10jenkins-bot: sre.switchdc.mediawiki: phase 0 improve logging [cookbooks] - 10https://gerrit.wikimedia.org/r/458219 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [20:20:48] mateusbs17: could be. [20:21:02] may take up to 5min for the startup module which triggers the JS request to get the new version. [20:21:04] that's quite normal [20:21:27] Ok, now I confirm. Both deploys work fine.
[20:21:33] mateusbs17: Maybe check with debug to mwdebug1001 (which is currently clean), or try again in a few minutes. [20:21:35] (03CR) 10Volans: [C: 032] mediawiki: improve validation checks [software/spicerack] - 10https://gerrit.wikimedia.org/r/458218 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [20:21:37] OK. cool [20:21:38] (03PS1) 10Herron: mtail: add exim tls ciphersuite metrics [puppet] - 10https://gerrit.wikimedia.org/r/458289 (https://phabricator.wikimedia.org/T203260) [20:21:38] Thanks ! [20:22:17] (03CR) 10jerkins-bot: [V: 04-1] mtail: add exim tls ciphersuite metrics [puppet] - 10https://gerrit.wikimedia.org/r/458289 (https://phabricator.wikimedia.org/T203260) (owner: 10Herron) [20:22:33] (03PS2) 10Volans: sre.switchdc.mediawiki: phase 2 add sleep [cookbooks] - 10https://gerrit.wikimedia.org/r/458221 (https://phabricator.wikimedia.org/T199079) [20:22:37] (03Merged) 10jenkins-bot: mediawiki: improve validation checks [software/spicerack] - 10https://gerrit.wikimedia.org/r/458218 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [20:22:44] thanks mateusbs17 & Krinkle !!! [20:23:14] No problem! 
[20:25:20] (03PS2) 10Herron: mtail: add exim tls ciphersuite metrics [puppet] - 10https://gerrit.wikimedia.org/r/458289 (https://phabricator.wikimedia.org/T203260) [20:26:02] (03CR) 10jerkins-bot: [V: 04-1] mtail: add exim tls ciphersuite metrics [puppet] - 10https://gerrit.wikimedia.org/r/458289 (https://phabricator.wikimedia.org/T203260) (owner: 10Herron) [20:27:12] (03PS3) 10Herron: mtail: add exim tls ciphersuite metrics [puppet] - 10https://gerrit.wikimedia.org/r/458289 (https://phabricator.wikimedia.org/T203260) [20:27:45] (03CR) 10jerkins-bot: [V: 04-1] mtail: add exim tls ciphersuite metrics [puppet] - 10https://gerrit.wikimedia.org/r/458289 (https://phabricator.wikimedia.org/T203260) (owner: 10Herron) [20:28:04] since the log contents being tested have changed with this patch it would be nice if all errors were printed at once :( [20:28:18] !log Updated Parsoid to 740b3a4 (T198400, T202819) [20:28:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:28:25] (03PS3) 10Andrew Bogott: puppet-merge: add some conftool extras [puppet] - 10https://gerrit.wikimedia.org/r/413745 (https://phabricator.wikimedia.org/T157133) [20:28:25] T202819: Create production wiki: fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T202819 [20:28:26] T198400: Create Wikipedia Santali - https://phabricator.wikimedia.org/T198400 [20:29:15] (03CR) 10Volans: [C: 04-1] "Small thing to fix inline" (031 comment) [software/keyholder] - 10https://gerrit.wikimedia.org/r/458223 (owner: 10Faidon Liambotis) [20:30:49] (03PS4) 10Herron: mtail: add exim tls ciphersuite metrics [puppet] - 10https://gerrit.wikimedia.org/r/458289 (https://phabricator.wikimedia.org/T203260) [20:32:21] (03PS20) 10Mathew.onipe: Elasticsearch module is coming up. 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T202885) [20:34:38] (03CR) 10Volans: [C: 031] "LGTM, optional nitpick inline" (031 comment) [software/keyholder] - 10https://gerrit.wikimedia.org/r/458224 (owner: 10Faidon Liambotis) [20:37:07] (03PS1) 10Bstorm: labstore: Change tcp buffer settings [puppet] - 10https://gerrit.wikimedia.org/r/458291 (https://phabricator.wikimedia.org/T203254) [20:37:17] 10Operations, 10netops, 10Wikimedia-Incident: asw2-a-eqiad FPC5 gets disconnected every 10 minutes - https://phabricator.wikimedia.org/T201145 (10ayounsi) Migration step toward a compatible topology. {F25670034} **Step 1** [] Disconnect already disabled links - fpc1-fpc3 (5m DAC) - fpc3-fpc5 (5m DAC) - fp... [20:37:42] (03CR) 10jerkins-bot: [V: 04-1] labstore: Change tcp buffer settings [puppet] - 10https://gerrit.wikimedia.org/r/458291 (https://phabricator.wikimedia.org/T203254) (owner: 10Bstorm) [20:40:27] (03PS2) 10Bstorm: labstore: Change tcp buffer settings [puppet] - 10https://gerrit.wikimedia.org/r/458291 (https://phabricator.wikimedia.org/T203254) [20:41:01] (03CR) 10Volans: [C: 031] "LGTM" [software/keyholder] - 10https://gerrit.wikimedia.org/r/458225 (owner: 10Faidon Liambotis) [20:45:41] (03PS1) 10Cwhite: profile: use mariadb-client in stretch. maintain backwards compatibility with current infra [puppet] - 10https://gerrit.wikimedia.org/r/458294 (https://phabricator.wikimedia.org/T202782) [20:46:22] (03CR) 10jerkins-bot: [V: 04-1] profile: use mariadb-client in stretch. 
maintain backwards compatibility with current infra [puppet] - 10https://gerrit.wikimedia.org/r/458294 (https://phabricator.wikimedia.org/T202782) (owner: 10Cwhite) [20:46:55] (03CR) 10Volans: "Looks good, just a couple of minor possible improvements inline" (032 comments) [software/keyholder] - 10https://gerrit.wikimedia.org/r/458226 (owner: 10Faidon Liambotis) [20:47:29] (03CR) 10Volans: [C: 031] "LGTM" [software/keyholder] - 10https://gerrit.wikimedia.org/r/458227 (owner: 10Faidon Liambotis) [20:49:31] (03CR) 10Volans: [C: 031] "LGTM" [software/keyholder] - 10https://gerrit.wikimedia.org/r/458228 (owner: 10Faidon Liambotis) [20:50:13] (03CR) 10Volans: [C: 031] "LGTM" [software/keyholder] - 10https://gerrit.wikimedia.org/r/458229 (owner: 10Faidon Liambotis) [20:50:55] (03PS2) 10Cwhite: profile: use mariadb-client in stretch. maintain backwards compatibility with current infra [puppet] - 10https://gerrit.wikimedia.org/r/458294 (https://phabricator.wikimedia.org/T202782) [20:51:31] 10Operations, 10monitoring, 10Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Paladox) Note that when i upgraded one of our wiki farms icinga that the path to nagios.cmd was changed to icinga.cmd (in stretch) [20:51:55] (03CR) 10jerkins-bot: [V: 04-1] profile: use mariadb-client in stretch. maintain backwards compatibility with current infra [puppet] - 10https://gerrit.wikimedia.org/r/458294 (https://phabricator.wikimedia.org/T202782) (owner: 10Cwhite) [20:53:22] 10Operations, 10Mail, 10Release-Engineering-Team: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10herron) [20:53:32] 10Operations, 10Mail, 10Release-Engineering-Team, 10User-herron: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10herron) [20:55:42] (03PS3) 10Cwhite: profile: use mariadb-client in stretch. 
[puppet] - 10https://gerrit.wikimedia.org/r/458294 (https://phabricator.wikimedia.org/T202782) [20:59:14] (03CR) 10Dzahn: [C: 031] "as discussed on IRC, we want to replace all uses of the mysql module and we also don't want to touch einsteinium but after migrating away " [puppet] - 10https://gerrit.wikimedia.org/r/458294 (https://phabricator.wikimedia.org/T202782) (owner: 10Cwhite) [20:59:21] !log clear bgp neighbor 80.249.209.209 on cr2-esams (max prefix limit) [20:59:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:04:09] (03CR) 10Dzahn: [C: 031] "https://puppet-compiler.wmflabs.org/compiler1002/12371/ lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/458294 (https://phabricator.wikimedia.org/T202782) (owner: 10Cwhite) [21:06:37] 10Operations, 10DBA, 10Patch-For-Review: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10colewhite) Mysql module is also used in puppet/modules/profile/manifests/icinga.pp. This should be removed once the transition to st... [21:08:22] (03CR) 10Dzahn: [C: 032] profile: use mariadb-client in stretch. [puppet] - 10https://gerrit.wikimedia.org/r/458294 (https://phabricator.wikimedia.org/T202782) (owner: 10Cwhite) [21:09:36] 10Operations, 10Traffic, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review: Sort out HTTP caching issues for fixcopyright wiki - https://phabricator.wikimedia.org/T203179 (10Bawolff) >>! In T203179#4558234, @Legoktm wrote: > Great :) And the special page TTL was fixed in h... 
[21:10:17] (CR) Dzahn: [C: 2] "this was "profile::icinga", only affects icinga hosts on stretch" [puppet] - https://gerrit.wikimedia.org/r/458294 (https://phabricator.wikimedia.org/T202782) (owner: Cwhite)
[21:12:44] (CR) Dzahn: [C: 2] "puppet run on icinga1001 now all green, first time on stretch:)" [puppet] - https://gerrit.wikimedia.org/r/458294 (https://phabricator.wikimedia.org/T202782) (owner: Cwhite)
[21:13:20] Operations, monitoring, Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (colewhite) >>! In T202782#4560975, @Paladox wrote: > Note that when i upgraded one of our wiki farms icinga that the path to nagios.cmd was changed to icinga...
[21:14:40] Operations, Traffic, MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review: Sort out HTTP caching issues for fixcopyright wiki - https://phabricator.wikimedia.org/T203179 (Bawolff) > > Varnish seems to be caching it fine (Assuming you don't have cookies, which I imagin...
[21:18:27] Operations, monitoring, Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (Dzahn) After the changes above we now have for the first time a working role(alerting_host) on stretch that is applied on icinga1001 and runs without errors....
[21:28:09] Operations, monitoring, Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (Dzahn) >>! In T202782#4561035, @colewhite wrote: > Good catch. There are 7 files currently referencing nagios.cmd. > > # modules/icinga/files/raid_handler....
[21:30:40] Operations, Traffic, MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review: Sort out HTTP caching issues for fixcopyright wiki - https://phabricator.wikimedia.org/T203179 (BBlack) >>! In T203179#4561033, @Bawolff wrote: > Varnish seems to be caching it fine (Assuming you...
[21:31:57] bblack: i meant the cache varying one's. I was under the impression that there is magic making WMF-last-access/geoip not trigger cache varying even though its a cookie
[21:35:18] Operations, Traffic, MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review: Sort out HTTP caching issues for fixcopyright wiki - https://phabricator.wikimedia.org/T203179 (Bawolff) However, if we use ULS to set the language, the assumption might be that people will touch...
[21:36:01] (PS1) Dzahn: icinga-downtime: add stretch support [puppet] - https://gerrit.wikimedia.org/r/458301 (https://phabricator.wikimedia.org/T202782)
[21:36:22] (PS1) Volans: Upstream release v0.0.5 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/458302 (https://phabricator.wikimedia.org/T199079)
[21:43:35] (CR) Volans: [C: 2] Upstream release v0.0.5 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/458302 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[21:43:37] (CR) Cwhite: [C: 1] icinga-downtime: add stretch support [puppet] - https://gerrit.wikimedia.org/r/458301 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[21:44:42] (Merged) jenkins-bot: Upstream release v0.0.5 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/458302 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[21:45:36] (CR) Gehel: Elasticsearch module is coming up. (3 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T202885) (owner: Mathew.onipe)
[21:46:31] (CR) Gehel: "A few minor comments inline, but this is looking really good! Congratulation for the work! And thanks for all the reworks (almost) without" [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T202885) (owner: Mathew.onipe)
[21:47:58] (PS1) Dzahn: icinga: add stretch support to submit_check_result.sh [puppet] - https://gerrit.wikimedia.org/r/458306 (https://phabricator.wikimedia.org/T202782)
[21:54:04] !log uploaded spicerack_0.0.5-1{,+deb9u1} to apt.wikimedia.org {jessie,stretch}-wikimedia - T199079
[21:54:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:10] T199079: Refactor the switchdc script - https://phabricator.wikimedia.org/T199079
[21:56:30] (CR) Paladox: [C: 1] icinga-downtime: add stretch support [puppet] - https://gerrit.wikimedia.org/r/458301 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[22:01:12] (PS2) Dzahn: icinga: add stretch support to submit_check_result.sh [puppet] - https://gerrit.wikimedia.org/r/458306 (https://phabricator.wikimedia.org/T202782)
[22:03:28] (PS2) Dzahn: icinga-downtime: add stretch support [puppet] - https://gerrit.wikimedia.org/r/458301 (https://phabricator.wikimedia.org/T202782)
[22:18:55] (CR) Cwhite: [C: 1] "This is a bit more future-proof. I like this better." [puppet] - https://gerrit.wikimedia.org/r/458301 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[22:29:33] Operations, MediaWiki-extensions-Score: crackling at start of OGG renditions of MIDI files (fixed in TiMidity++ 2.14.0) - https://phabricator.wikimedia.org/T50029 (Ebe123)
[22:44:59] Operations, Traffic, MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review: Sort out HTTP caching issues for fixcopyright wiki - https://phabricator.wikimedia.org/T203179 (Legoktm) >>! In T203179#4561076, @BBlack wrote: >>>! In T203179#4561033, @Bawolff wrote: >> Varnish...
[22:49:29] (PS1) Cwhite: profile: update python scripts to detect command file [puppet] - https://gerrit.wikimedia.org/r/458325 (https://phabricator.wikimedia.org/T202782)
[22:50:14] (CR) jerkins-bot: [V: -1] profile: update python scripts to detect command file [puppet] - https://gerrit.wikimedia.org/r/458325 (https://phabricator.wikimedia.org/T202782) (owner: Cwhite)
[22:50:22] (CR) Paladox: profile: update python scripts to detect command file (2 comments) [puppet] - https://gerrit.wikimedia.org/r/458325 (https://phabricator.wikimedia.org/T202782) (owner: Cwhite)
[22:58:10] (PS2) Cwhite: profile: update python scripts to detect command file [puppet] - https://gerrit.wikimedia.org/r/458325 (https://phabricator.wikimedia.org/T202782)
[23:00:02] * James_F waves.
[23:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180905T2300).
[23:00:04] Ebe123 and James_F: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:21] * Ebe123 waves.
[23:00:35] I'll do it.
[23:01:22] Thank you!
[23:08:50] (CR) Cwhite: ">" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/458325 (https://phabricator.wikimedia.org/T202782) (owner: Cwhite)
[23:09:24] Ebe123: It's live on mwdebug1002, can you test or do you need me to?
[23:09:32] I'll test
[23:14:54] !log jforrester@deploy1001 Synchronized php-1.32.0-wmf.20/extensions/EducationProgram/includes/pagers/StudentPager.php: SWAT Fix spammy log errors T203577 (duration: 00m 58s)
[23:14:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:15:02] T203577: ErrorException from line 116 of /srv/mediawiki/php-1.32.0-wmf.20/extensions/EducationProgram/includes/pagers/StudentPager.php: PHP Notice: Undefined variable: retValue - https://phabricator.wikimedia.org/T203577
[23:15:17] (CR) Faidon Liambotis: Pass argv from to main() -> parse_args -> argparse (1 comment) [software/keyholder] - https://gerrit.wikimedia.org/r/458223 (owner: Faidon Liambotis)
[23:18:13] Ebe123: How's it going?
[23:19:50] It's good; not causing any new problems
[23:20:05] Cool.
[23:20:50] (CR) Faidon Liambotis: Don't barf on an empty or invalid YAML config (2 comments) [software/keyholder] - https://gerrit.wikimedia.org/r/458226 (owner: Faidon Liambotis)
[23:21:24] Think there is a problem. Let's rollback
[23:22:32] !log jforrester@deploy1001 Synchronized php-1.32.0-wmf.20/extensions/Score/includes/Score.php: SWAT Fix error on malformed MIDI files T203560 (duration: 01m 04s)
[23:22:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:39] T203560: Notice: Undefined index: qb4tlxyr.ogg in /srv/mediawiki/php-1.32.0-wmf.19/extensions/Score/includes/Score.php on line 507 - https://phabricator.wikimedia.org/T203560
[23:22:44] Oh, you sure?
[23:23:33] (CR) Dzahn: [C: 1] "+1. we need this to support Icinga on both jessie and stretch where the location of the commandfile changes. I will let volans check it be" [puppet] - https://gerrit.wikimedia.org/r/458325 (https://phabricator.wikimedia.org/T202782) (owner: Cwhite)
[23:24:36] (CR) Dzahn: [C: 1] "call it "icinga" instead of "profile" in the commit message. just profile makes it not obvious what it touches" [puppet] - https://gerrit.wikimedia.org/r/458325 (https://phabricator.wikimedia.org/T202782) (owner: Cwhite)
[23:25:04] (CR) Dzahn: [C: 2] icinga-downtime: add stretch support [puppet] - https://gerrit.wikimedia.org/r/458301 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[23:25:11] (PS3) Dzahn: icinga-downtime: add stretch support [puppet] - https://gerrit.wikimedia.org/r/458301 (https://phabricator.wikimedia.org/T202782)
[23:25:41] !log jforrester@deploy1001 Synchronized php-1.32.0-wmf.20/extensions/EducationProgram/includes/pagers/StudentPager.php: SWAT Revert fix forT203577 (duration: 00m 56s)
[23:25:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:26:00] Meh.
[23:26:27] Ebe123: Reverted in production.
[23:27:13] !log jforrester@deploy1001 Synchronized php-1.32.0-wmf.20/extensions/Score/includes/Score.php: SWAT Fix error for T203560 (duration: 00m 54s)
[23:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:28:57] (PS3) Dzahn: icinga: add stretch support to submit_check_result.sh [puppet] - https://gerrit.wikimedia.org/r/458306 (https://phabricator.wikimedia.org/T202782)
[23:30:25] Operations, monitoring, Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (Dzahn)
[23:31:59] Operations, monitoring, Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (Dzahn)
[23:32:29] (CR) Dzahn: [C: 2] icinga: add stretch support to submit_check_result.sh [puppet] - https://gerrit.wikimedia.org/r/458306 (https://phabricator.wikimedia.org/T202782) (owner: Dzahn)
[23:34:06] (CR) Dzahn: "20after4 or thcipriani, a review on this would be appreciated" [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[23:37:31] !log jforrester@deploy1001 Synchronized php-1.32.0-wmf.20/extensions/Score: SWAT clean-up following messy deployment and undeployment of patch (duration: 00m 57s)
[23:37:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:37:47] OK, SWAT over.
[23:38:25] Operations, Performance-Team, Wikidata, Wikidata-Query-Service, User-Addshore: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (Smalyshev) Getting lots of these errors from Wikidata again today. Any ideas what could be causing this?...
[23:40:18] (PS16) Paladox: Gerrit: Hook up gerrit.wmfusercontent.org to apache [puppet] - https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183)
[23:46:55] (CR) Dzahn: "this is config for a plugin that isn't used yet, so we should be sure that gerrit doesn't dislike that." [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[23:47:31] (CR) Paladox: [C: 1] "> Patch Set 5:" [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[23:50:22] (CR) Paladox: [C: 1] "Tested locally by applying the updated puppet change on tuesday and seeing:" [puppet] - https://gerrit.wikimedia.org/r/423794 (owner: Chad)
[23:51:24] (PS8) Paladox: Gerrit: Move all logging to /var/log/gerrit [puppet] - https://gerrit.wikimedia.org/r/423794 (owner: Chad)
[23:51:44] James_F: Just that for a file, I didn't see the ogg file listed in the metadata
[23:52:59] My bad anyways, I deactivated audio for that one... (Sorry for the delay; had to eat)
[23:53:09] (CR) Dzahn: Gerrit: Setup avatars url in gerrit config (1 comment) [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[23:53:55] (PS6) Paladox: Gerrit: Setup avatars url in gerrit config [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183)
[23:54:10] (PS7) Paladox: Gerrit: Setup avatars url in gerrit config [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183)
[23:54:29] (CR) Paladox: [C: 1] Gerrit: Setup avatars url in gerrit config (1 comment) [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[23:56:26] (CR) Dzahn: [C: 1] "sounds good. thanks for working on this and testing it. i'll do that soon. i always disliked it a little that logs were not in the standar" [puppet] - https://gerrit.wikimedia.org/r/423794 (owner: Chad)
[23:59:55] errr