[00:00:09] error rate is fine.
[00:00:29] yurik: what is islocal?
[00:01:10] can we move on to evening SWAT? Is the state of the train stable?
[00:01:12] Reedy, isLocal is when you have both storage and consumer in one wiki, and you don't allow remote access to it
[00:01:18] ah
[00:01:38] as oppose to storage wiki and remotely accessing it wikis
[00:02:37] thcipriani: have they dropped off completely?
[00:02:41] Reedy or thcipriani, can you move zerowiki to group1 plz
[00:02:47] Reedy, there are no more log msgs
[00:02:52] ^
[00:03:05] That can't be right
[00:04:03] i'm sure its because nothing works at all :D
[00:04:30] there are error messages but nothing pertaining to zerowiki showing up at any significant rate
[00:05:08] er, JsonZeroConfig rather
[00:05:22] Is there just kaldari's patch?
[00:05:25] looks like it....
[00:06:07] (PS1) Madhuvishy: drbd monitoring: Improve status and error messages [puppet] - https://gerrit.wikimedia.org/r/315874
[00:06:14] Ah
[00:06:17] there are so
[00:06:30] It's just such a small relative amount to the hundreds of thousands
[00:06:47] kaldari: Still about?
[00:06:53] yep
[00:07:27] Is it standalone testable? If so, where do you want it staging?
[00:09:03] (PS1) Dzahn: repeat hostname for IPv6 for iron,kraz,lead,labtestweb [dns] - https://gerrit.wikimedia.org/r/315876
[00:09:11] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:09:14] It's not completely testable via the header plug-in since the change really only affects database interactions, but I can test on mw1099 to make sure it doesn't break anything.
[00:10:10] (CR) Dzahn: [C: 2] repeat hostname for IPv6 for iron,kraz,lead,labtestweb [dns] - https://gerrit.wikimedia.org/r/315876 (owner: Dzahn)
[00:11:02] and the only tables it interacts with are pageassessments tables (which aren't even used for anything yet) so it's very low risk
[00:11:17] yeah, it didn't seem obviously user facing
[00:11:26] no, nothing user facing
[00:11:49] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[00:11:55] ugh, that's daft
[00:12:18] no, ignore me
[00:12:40] good thing I don't know what the word daft means anyway :)
[00:12:53] kaldari: live on mw1099
[00:14:54] Reedy: looks good to me, feel free to sync.
[00:16:47] (Abandoned) Dzahn: installserver: split 'mirror'-server into a separate role (WIP) [puppet] - https://gerrit.wikimedia.org/r/305165 (owner: Dzahn)
[00:17:30] !log reedy@mira Synchronized php-1.28.0-wmf.22/extensions/PageAssessments: [extensions/PageAssessments] Only update assessment data when talk pages are saved (duration: 00m 51s)
[00:17:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:19:12] Reedy: I'm going AFK. thanks for taking evening SWAT/poking at Zero errors, appreciated.
[00:19:29] thcipriani: Well, I did cause them to some extent ;)
[00:19:34] thcipriani|afk: np though
[00:20:28] Reedy: :D toodles
[00:20:37] Reedy: everything looks good. Thanks for the help!
[00:21:17] So mw .22 is live?
[00:21:51] (PS1) Dzahn: RT: rm duplicate firewall, mv standard incl to role [puppet] - https://gerrit.wikimedia.org/r/315877
[00:22:35] (CR) Dzahn: [C: 2] RT: rm duplicate firewall, mv standard incl to role [puppet] - https://gerrit.wikimedia.org/r/315877 (owner: Dzahn)
[00:24:11] Zppix|mobile: it's live :)
[00:24:13] ahem, what happened to graphoid?
[00:24:15] https://grafana.wikimedia.org/dashboard/db/service-graphoid?from=now%2Fw&to=now%2Fw
[00:25:09] ah, never mind, silly me looked at "this week" rather than last 7 days
[00:27:47] (PS1) Dzahn: peopleweb: move base::firewall from node to role [puppet] - https://gerrit.wikimedia.org/r/315878
[00:30:40] (CR) Dzahn: [C: 2] peopleweb: move base::firewall from node to role [puppet] - https://gerrit.wikimedia.org/r/315878 (owner: Dzahn)
[00:30:54] (PS1) Dzahn: pmacct: move firewall, standard include to role [puppet] - https://gerrit.wikimedia.org/r/315879
[00:33:56] (PS1) Dzahn: restbase: move standard include to role [puppet] - https://gerrit.wikimedia.org/r/315880
[00:37:44] (PS1) Dzahn: installserver: move standard include to role [puppet] - https://gerrit.wikimedia.org/r/315882
[00:38:22] (CR) Reedy: "Want to revert out the Zero(Banner|Portal) wfLoad changes, and any config variable swap overs, and we can get this merged for the moment? " [mediawiki-config] - https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) (owner: Dereckson)
[00:40:10] (PS1) Dzahn: url_downloader: move standard/firewall to role [puppet] - https://gerrit.wikimedia.org/r/315883
[00:42:55] (PS1) Bearloga: Update R and C++-related stats puppet configs [puppet] - https://gerrit.wikimedia.org/r/315885 (https://phabricator.wikimedia.org/T147682)
[00:45:24] (PS1) Dzahn: lists::server: simplify includes on node level [puppet] - https://gerrit.wikimedia.org/r/315886
[00:50:11] (PS1) Dzahn: site.pp: remove "include admin"s [puppet] - https://gerrit.wikimedia.org/r/315887
[00:51:00] Operations, Discovery, Discovery-Analysis (Current work), Patch-For-Review, Tracking: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2715129 (mpopov) @gehel: I added you as a reviewer to that puppet patch (). How would we a...
[00:52:13] Operations, Discovery, Discovery-Analysis (Current work), Patch-For-Review, Tracking: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2715130 (mpopov)
[01:01:53] (PS1) Dzahn: logstash: move base::firewall from node to role [puppet] - https://gerrit.wikimedia.org/r/315888
[01:05:19] (PS1) Dzahn: maps::server: move base::firewall to role [puppet] - https://gerrit.wikimedia.org/r/315889
[01:10:31] (PS1) Dzahn: ganglia/netmon1001: rm ganglia::deprecated::collector [puppet] - https://gerrit.wikimedia.org/r/315890
[01:13:10] (PS2) Dzahn: ganglia/netmon1001: rm ganglia::deprecated::collector [puppet] - https://gerrit.wikimedia.org/r/315890
[01:14:08] SPANISH WIKINEWS, the place lesbians are Alvaro's stupid editor
[01:19:03] (PS1) Dzahn: decom palladium from puppet, install_server, network constants [puppet] - https://gerrit.wikimedia.org/r/315891 (https://phabricator.wikimedia.org/T147320)
[01:25:25] (PS1) Dzahn: ori: update personal dot files [puppet] - https://gerrit.wikimedia.org/r/315893 (https://phabricator.wikimedia.org/T147320)
[01:28:49] (PS1) Dzahn: puppetmaster: replace palladium with puppetmaster1001 as default [puppet] - https://gerrit.wikimedia.org/r/315894
[01:29:43] (PS2) Dzahn: puppetmaster: replace palladium with puppetmaster1001 as default [puppet] - https://gerrit.wikimedia.org/r/315894 (https://phabricator.wikimedia.org/T147320)
[01:35:00] (CR) Dzahn: "this looks pretty good, except one thing:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/315885 (https://phabricator.wikimedia.org/T147682) (owner: Bearloga)
[01:37:16] (CR) Dzahn: Update R and C++-related stats puppet configs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/315885 (https://phabricator.wikimedia.org/T147682) (owner: Bearloga)
[01:37:33] (CR) Dzahn: [C: -1] "E: Unable to locate package g++-4.9" [puppet] -
https://gerrit.wikimedia.org/r/315885 (https://phabricator.wikimedia.org/T147682) (owner: Bearloga)
[01:37:59] Operations, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Stirring The Pot, and 2 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2715208 (AndyRussG) Earlier today @Ejegg and @XenoRyet deployed the MessageCache patch to the beta clus...
[01:43:28] ALVARO VALE MIERDA
[01:45:27] AlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ES
[01:45:35] MIERDA
[02:10:23] AlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ES
[02:21:22] PROBLEM - puppet last run on labvirt1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:31:15] Operations, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Stirring The Pot, and 2 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2715263 (RobLa-WMF) In order to understand if rMWaa5be0c1e03d is a safe change, I'd need to better unde...
[02:33:07] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.22) (duration: 12m 03s)
[02:33:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:33:21] I am running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 2.8.0.0 [libirc v. 1.0.3] my source code is licensed under GPL and located at https://github.com/benapetr/wikimedia-bot I will be very happy if you fix my bugs or implement new features
[02:33:21] @help
[02:33:30] @log off
[02:33:36] @logoff
[02:33:36] Permission denied
[02:33:41] @logon
[02:33:41] Permission denied
[02:33:45] @reboot
[02:33:57] Permission denied
[02:33:57] @part
[02:34:04] @quit
[02:34:13] @admin
[02:34:22] @log
[02:34:35] @ask
[02:35:35] @op
[02:35:44] I am running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 2.8.0.0 [libirc v.
1.0.3] my source code is licensed under GPL and located at https://github.com/benapetr/wikimedia-bot I will be very happy if you fix my bugs or implement new features
[02:35:44] @help
[02:35:47] I am running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 2.8.0.0 [libirc v. 1.0.3] my source code is licensed under GPL and located at https://github.com/benapetr/wikimedia-bot I will be very happy if you fix my bugs or implement new features
[02:35:47] @help
[02:35:54] @bugs
[02:35:57] @debug
[02:36:02] !log
[02:36:22] !log mdesploy@eswikinews
[02:36:27] MIERDA
[02:36:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:36:32] LALALALALA
[02:39:28] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Oct 14 02:39:27 UTC 2016 (duration 6m 20s)
[02:39:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:39:56] !log AlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ES
[02:40:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:40:12] !log ALVARO MOLINA ES LA PERSONA MAS ESTUPIDA DEL MUNDO
[02:40:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:40:20] !log LALALA
[02:40:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:45:01] PROBLEM - puppet last run on elastic1032 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[02:45:22] !log ALVARO ES UNA BASURA DESCOMPUESTA AlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ES
[02:45:26] !log AlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ES
[02:45:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:46:41] RECOVERY - puppet last run on labvirt1009 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[03:07:50] RECOVERY - puppet last run on elastic1032 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[03:11:49] Platonides can those log messages be deleted?
[03:15:52] PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:41:02] RECOVERY - puppet last run on cp3035 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[04:26:39] Operations, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Stirring The Pot, and 2 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2715343 (AndyRussG) Still not able to reproduce on the beta cluster any more. FWIW here are links to p...
[05:54:07] (CR) Giuseppe Lavagetto: [C: 1] "I assumed cross-dc was not a problem for data with no privacy implications and no political value as the public list of our pooled servers" [puppet] - https://gerrit.wikimedia.org/r/315531 (https://phabricator.wikimedia.org/T147847) (owner: BBlack)
[05:55:19] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 xe-5/2/1: down - Core: cr1-eqord:xe-0/0/0 (Telia, IC-314534, 24ms) {#10694} [10Gbps wave]
[05:56:11] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0 xe-0/0/0: down - Core: cr2-codfw:xe-5/2/1 (Telia, IC-314534, 29ms) {#11375} [10Gbps wave]
[05:58:50] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[06:00:31] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[06:06:30] Operations: Clean up wikitech twitter account - https://phabricator.wikimedia.org/T148119#2715427 (Matthewrbowker)
[06:07:49] Zppix: Re your question earlier ^
[06:21:17] !log AlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ESAlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ESAlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ES
[06:21:23] !log AlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ESAlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ESAlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ES MIERDA
[06:21:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:23:36] How can be someone that upset..it is friday!
[06:24:08] Can someone clean the log up, especially the Twitter bridge?
https://phabricator.wikimedia.org/T148119
[06:24:20] Oh, Matthew_ is already in here.
[06:24:25] * marktraceur is seriously not conscious
[06:24:52] marktraceur: Indeed. Also related: https://phabricator.wikimedia.org/T148120 (It's a security bug, I can add you if needed)
[06:25:33] marktraceur: I have removed those lines from SAL
[06:26:00] marostegui: Yes, but it's showing up on the public twitter. Someone needs to delete the tweets themselves.
[06:26:20] Ah, that I don't have access to :(
[06:26:54] That's what the task is about :) Also why I filed it
[06:28:50] !log AlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ESAlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ESAlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ES
[06:29:23] !log AlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ESAlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ESAlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ES MIERDA
[06:29:54] Operations, Security: Clean up wikitech twitter account - https://phabricator.wikimedia.org/T148119#2715460 (Matthewrbowker) This is indicative of an exploit, hiding (Hope I did that right)
[06:54:34] (CR) Muehlenhoff: [C: -1] "This would break any labs instance which doesn't have logstash::cluster_nodes set in Hiera. deployment-prep does that, but we need to chec" [puppet] - https://gerrit.wikimedia.org/r/315888 (owner: Dzahn)
[06:59:57] (CR) Hashar: "So I guess it was previously broken?
:(" [puppet] - https://gerrit.wikimedia.org/r/315849 (owner: Chad)
[07:06:03] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0 xe-0/0/0: down - Core: cr2-codfw:xe-5/2/1 (Telia, IC-314534, 29ms) {#11375} [10Gbps wave]
[07:08:23] (CR) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling (2 comments) [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[07:08:30] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[07:08:43] (PS10) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[07:10:34] (PS11) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[07:11:13] (PS12) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[07:12:14] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[07:13:08] !log upgrading hhvm in codfw to latest 3.12.x bugfix release
[07:13:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:15:23] (PS13) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[07:15:54] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the
critical threshold [1000.0]
[07:16:52] (PS10) Hashar: zuul: refactor to use hiera [puppet] - https://gerrit.wikimedia.org/r/308778 (https://phabricator.wikimedia.org/T139527)
[07:17:10] Dropping hitcounter, _counter memory tables in S7 on db1041 (master) - T132837
[07:17:11] T132837: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837
[07:17:20] !log Dropping hitcounter, _counter memory tables in S7 on db1041 (master) - T132837
[07:17:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:18:12] (PS14) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[07:18:34] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:18:48] oh my god puppet
[07:18:51] it is never ending :]
[07:22:34] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:26:04] (CR) Hashar: "The zuul::merger class had a require ::zuul and somehow that caused the Class[zuul] to disappear from scandium."
[puppet] - https://gerrit.wikimedia.org/r/308778 (https://phabricator.wikimedia.org/T139527) (owner: Hashar)
[07:27:28] (PS5) Hashar: zuul: migrate server only settings out of merger [puppet] - https://gerrit.wikimedia.org/r/309299
[07:37:41] (PS1) Hashar: zuul: stop managing unix user/group [puppet] - https://gerrit.wikimedia.org/r/315902
[07:39:33] (PS1) Muehlenhoff: Remove decomissioned mw1217 from site.pp [puppet] - https://gerrit.wikimedia.org/r/315903
[07:41:34] (CR) Elukey: [C: 1] Remove decomissioned mw1217 from site.pp [puppet] - https://gerrit.wikimedia.org/r/315903 (owner: Muehlenhoff)
[07:43:16] !log AlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ESAlvaroMolina IT'S TRUSH AND LOVES ALEX Z AND UAWIKI IN #WIKIPEDIA-ESAlvaroMolina IT'S TRUSH AND LOVES ALEXZ AND UAWIKI IN #WIKIPEDIA-ES
[07:43:42] !log ALVARO ES EL PEOR HIJO DE PUTA QUE HA HABIDO EN WIKIPEDIA
[07:43:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:46:31] (CR) Muehlenhoff: [C: 2] Remove decomissioned mw1217 from site.pp [puppet] - https://gerrit.wikimedia.org/r/315903 (owner: Muehlenhoff)
[07:47:18] !log reimaging mw1161 to Debian Jessie (MW Jobrunner, scap proxy)
[07:47:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:47:56] PROBLEM - HHVM rendering on mw2088 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:50:00] robh: Platonides ^ for the record I asked AlexZ to do that as an stop gap to kicking/banning morebots for now.
Not ideal and obviously feel free to change if necessary ( we need a longer term strategy though see https://phabricator.wikimedia.org/T148120 )
[07:50:22] RECOVERY - HHVM rendering on mw2088 is OK: HTTP OK: HTTP/1.1 200 OK - 77068 bytes in 0.348 second response time
[07:51:31] Operations, ops-eqiad, Patch-For-Review: Broken memory on mw1217 - https://phabricator.wikimedia.org/T138925#2715509 (MoritzMuehlenhoff) I merged https://gerrit.wikimedia.org/r/#/c/315903/ to remove it from site.pp and also removed the salt key.
[08:13:36] Operations, Analytics-Kanban, Traffic, HTTPS: OCG failing with new GlobalSign intermediate workaround - https://phabricator.wikimedia.org/T148076#2715534 (MoritzMuehlenhoff) Beside ocg we have a other precise/trusty systems not using nodejs 4: - sca1* still has it installed, but the only remainin...
[08:20:47] PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[generate_varnishkafka_webrequest_gmond_pyconf]
[08:20:57] (PS15) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[08:25:17] (PS16) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[08:26:57] Operations, Patch-For-Review: Migrate pool counters to trusty/jessie - https://phabricator.wikimedia.org/T123734#2715559 (MoritzMuehlenhoff) As these are not not one-off servers, we should rather use the opportunity by starting with poolcounter1001.eqiad,wmnet and adapting the other servers as they get r...
[08:30:17] (CR) Gehel: "g++-4.9 and gfortran-4.9 are not available. The rest looks good to me."
(3 comments) [puppet] - https://gerrit.wikimedia.org/r/315885 (https://phabricator.wikimedia.org/T147682) (owner: Bearloga)
[08:38:32] Operations, Discovery, Discovery-Analysis (Current work), Patch-For-Review, Tracking: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2715565 (Gehel) @mpopov: as I understand it, we are not very keen on adding PPA to our apt...
[08:48:24] RECOVERY - puppet last run on cp3042 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:50:28] (PS17) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[08:57:28] (CR) Jcrespo: "> I just uploaded that 2 days ago, there can't be that many changes since then and of course i'd rebase that manually, i was just waiting " [puppet] - https://gerrit.wikimedia.org/r/315343 (https://phabricator.wikimedia.org/T93645) (owner: Dzahn)
[08:59:36] !log mw1161 back in service after reimage (MW Jobrunner, scap proxdy)
[08:59:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:59:55] 4 jobrunners to go
[09:00:30] Operations, Patch-For-Review: Migrate pool counters to trusty/jessie - https://phabricator.wikimedia.org/T123734#2715597 (fgiunchedi) @MoritzMuehlenhoff poolcounter1001 would work for me, though usually PC lives on a shared baremetal machine since its requirements are very small. Given how critical the s...
[09:01:25] jynus: yeah, I was referring more to who ever would be working on the implementation to work on the new task, not specifically you as I implied
[09:01:47] I was actually now answering you :-)
[09:01:51] *not
[09:02:22] but I wanted to be subtle about it
[09:04:46] (CR) Mobrovac: Conftool: Create script that checks the state after (de)pooling (2 comments) [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[09:06:44] (CR) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling (2 comments) [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[09:08:40] Operations, Analytics-Kanban, Traffic, HTTPS: OCG failing with new GlobalSign intermediate workaround - https://phabricator.wikimedia.org/T148076#2715606 (akosiaris) >>! In T148076#2714544, @Volans wrote: > FYI it's worth noticing that the upgrade of NodeJS for this service looks a bit broken by...
[09:10:42] (PS1) Jon Harald Søby: Adding language name configuration for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/315912 (https://phabricator.wikimedia.org/T113408)
[09:12:14] (CR) Jcrespo: "> Well, adding you here was my way to contact you to get your approval and make sure you are informed about the new location of the classe" [puppet] - https://gerrit.wikimedia.org/r/315343 (https://phabricator.wikimedia.org/T93645) (owner: Dzahn)
[09:19:32] (PS1) Jcrespo: Fix DEFINER and binary log switching; delete old script [software/redactatron] - https://gerrit.wikimedia.org/r/315915
[09:21:58] (PS18) Giuseppe Lavagetto: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[09:23:23] !log reimaging mw1166 to Debian Jessie (MW Jobrunner)
[09:23:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:23:45] (CR) Giuseppe Lavagetto: [C: 2] Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[09:27:32] (PS2) BBlack: netboot/LVS: fix partman recipes used by LVS hosts [puppet] - https://gerrit.wikimedia.org/r/315869 (https://phabricator.wikimedia.org/T136737) (owner: Dzahn)
[09:29:22] (PS3) BBlack: netboot/LVS: fix partman recipes used by LVS hosts [puppet] - https://gerrit.wikimedia.org/r/315869 (https://phabricator.wikimedia.org/T136737) (owner: Dzahn)
[09:30:30] (CR) BBlack: [C: 2] netboot/LVS: fix partman recipes used by LVS hosts [puppet] - https://gerrit.wikimedia.org/r/315869 (https://phabricator.wikimedia.org/T136737) (owner: Dzahn)
[09:33:31] (PS1) Giuseppe Lavagetto: conftool::scripts::service: fixup for If1be5238493 [puppet] - https://gerrit.wikimedia.org/r/315918
[09:36:08] (PS2) Giuseppe Lavagetto:
conftool::scripts::service: fixup for If1be5238493 [puppet] - https://gerrit.wikimedia.org/r/315918
[09:36:51] (CR) Giuseppe Lavagetto: [C: 2 V: 2] conftool::scripts::service: fixup for If1be5238493 [puppet] - https://gerrit.wikimedia.org/r/315918 (owner: Giuseppe Lavagetto)
[09:39:12] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 729 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3156778 keys - replication_delay is 729
[09:39:49] (CR) Marostegui: [C: 1] Fix DEFINER and binary log switching; delete old script [software/redactatron] - https://gerrit.wikimedia.org/r/315915 (owner: Jcrespo)
[09:41:14] Operations, Pybal, Services, Patch-For-Review, and 2 others: Depool / repool scripts execute successfully even when the host has not been (r|d)epooled - https://phabricator.wikimedia.org/T145518#2715709 (Joe) Open>Resolved
[09:47:08] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3147170 keys - replication_delay is 0
[09:50:26] (PS1) Ema: Point pinkunicorn's varnish-fe to its own varnish-be [puppet] - https://gerrit.wikimedia.org/r/315920 (https://phabricator.wikimedia.org/T131503)
[09:52:54] Operations, Traffic: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2715740 (BBlack)
[09:53:14] Operations, Traffic: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2715756 (BBlack)
[09:53:16] Operations, Traffic, HTTPS: implement Public Key Pinning (HPKP) for Wikimedia domains - https://phabricator.wikimedia.org/T92002#1101271 (BBlack)
[09:53:31] Operations, Traffic: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2715740 (BBlack) p:Triage>High
[09:53:50] (PS1) Filippo Giunchedi: rsyslog: set retention days to 90 per policy [puppet] - https://gerrit.wikimedia.org/r/315921
[09:54:19] Operations, Traffic:
Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2715740 (BBlack)
[09:55:07] Operations, Discovery, Discovery-Analysis (Current work), Patch-For-Review, Tracking: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2700737 (MoritzMuehlenhoff) @mpopov: Can you provide the command you used to install boom w...
[09:55:52] PROBLEM - Host sca2003 is DOWN: PING CRITICAL - Packet loss = 100%
[10:03:26] Operations, Traffic: OCSP Stapling: support truly-independent ECC/RSA Certs+Staples - https://phabricator.wikimedia.org/T148132#2715770 (BBlack)
[10:04:03] Operations, Traffic: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2715784 (BBlack)
[10:04:05] Operations, Traffic, HTTPS, Patch-For-Review, Wikimedia-Incident: Make OCSP Stapling support more generic and robust - https://phabricator.wikimedia.org/T93927#2715785 (BBlack)
[10:04:08] Operations, Traffic: OCSP Stapling: support truly-independent ECC/RSA Certs+Staples - https://phabricator.wikimedia.org/T148132#2715783 (BBlack)
[10:05:09] (CR) Filippo Giunchedi: "PCC fails on this change with a puppetdb-related error https://puppet-compiler.wmflabs.org/4365/prometheus1001.eqiad.wmnet/change.promethe" [puppet] - https://gerrit.wikimedia.org/r/315098 (https://phabricator.wikimedia.org/T147424) (owner: Filippo Giunchedi)
[10:08:19] (PS4) Elukey: Add extra compiler warnings to the Makefile [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/314662 (https://phabricator.wikimedia.org/T147436)
[10:08:54] Operations, Traffic: OCSP Stapling for Intermediates - https://phabricator.wikimedia.org/T148134#2715807 (BBlack)
[10:09:15] <_joe_> godog: sigh, it's the usual pluginsync issue
[10:09:57] RECOVERY - Host sca2003 is UP: PING OK - Packet loss = 0%, RTA = 37.69 ms
[10:10:22] ?
[10:10:29] _joe_: what usual issue ?
[10:10:34] ignore sca2003 btw
[10:11:02] <_joe_> akosiaris: on the compiler we do not execute pluginsync from the change
[10:11:22] <_joe_> that would mean running the puppet agent on the server itself, more or less
[10:11:25] <_joe_> I shit you not
[10:11:26] Operations, Traffic: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2715822 (BBlack)
[10:11:29] Operations, Traffic, HTTPS: implement Public Key Pinning (HPKP) for Wikimedia domains - https://phabricator.wikimedia.org/T92002#2715821 (BBlack)
[10:11:43] Operations, Traffic: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2715740 (BBlack)
[10:11:46] Operations, Traffic, HTTPS: implement Public Key Pinning (HPKP) for Wikimedia domains - https://phabricator.wikimedia.org/T92002#1101271 (BBlack)
[10:11:49] * akosiaris sigh
[10:13:05] Operations, Traffic: OCSP Stapling: support truly-independent ECC/RSA Certs+Staples - https://phabricator.wikimedia.org/T148132#2715828 (BBlack)
[10:13:07] Operations, Traffic, HTTPS, Patch-For-Review, Wikimedia-Incident: Make OCSP Stapling support more generic and robust - https://phabricator.wikimedia.org/T93927#2715831 (BBlack)
[10:13:31] Operations, Traffic: OCSP Stapling for Intermediates - https://phabricator.wikimedia.org/T148134#2715807 (BBlack)
[10:13:33] Operations, Traffic: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2715835 (BBlack)
[10:13:36] Operations, Traffic, HTTPS, Patch-For-Review, Wikimedia-Incident: Make OCSP Stapling support more generic and robust - https://phabricator.wikimedia.org/T93927#1149975 (BBlack)
[10:13:39] Operations, Traffic: OCSP Stapling: support truly-independent ECC/RSA Certs+Staples - https://phabricator.wikimedia.org/T148132#2715770 (BBlack)
[10:14:10] Operations, Traffic: OCSP Stapling: support truly-independent ECC/RSA Certs+Staples -
https://phabricator.wikimedia.org/T148132#2715770 (10BBlack) p:05Triage>03Normal [10:14:26] 06Operations, 10Traffic: OCSP Stapling for Intermediates - https://phabricator.wikimedia.org/T148134#2715807 (10BBlack) p:05Triage>03Normal [10:15:58] 06Operations, 06Analytics-Kanban, 10Traffic, 07HTTPS, 13Patch-For-Review: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2715846 (10BBlack) [10:16:00] 06Operations, 06Analytics-Kanban, 10Traffic, 07HTTPS: GlobalSign intermediate updates for one-offs - https://phabricator.wikimedia.org/T148069#2715843 (10BBlack) 05Open>03Resolved a:03BBlack Resolving for now, as we've covered what we can cover here in Ops. We'll need this ticket as a reference if w... [10:22:01] (03PS3) 10Giuseppe Lavagetto: lvm: add module from puppetlabs. [puppet] - 10https://gerrit.wikimedia.org/r/315293 [10:22:03] (03PS2) 10Giuseppe Lavagetto: kubernetes: introduce 1st-stage worker role [puppet] - 10https://gerrit.wikimedia.org/r/315717 (https://phabricator.wikimedia.org/T147181) [10:22:05] (03PS1) 10Giuseppe Lavagetto: kubernetes: install kubernetes1001-4 as worker nodes [puppet] - 10https://gerrit.wikimedia.org/r/315923 (https://phabricator.wikimedia.org/T147933) [10:23:08] (03PS2) 10Ema: Point pinkunicorn's varnish-fe to its own varnish-be [puppet] - 10https://gerrit.wikimedia.org/r/315920 (https://phabricator.wikimedia.org/T131503) [10:24:26] (03CR) 10Giuseppe Lavagetto: [C: 032] lvm: add module from puppetlabs. 
[puppet] - 10https://gerrit.wikimedia.org/r/315293 (owner: 10Giuseppe Lavagetto) [10:28:07] RECOVERY - configured eth on sca2003 is OK: OK - interfaces up [10:28:12] !log stopping and restarting mysql at dbstore2001 for misc tests T146261 [10:28:13] T146261: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261 [10:28:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:28:25] RECOVERY - dhclient process on sca2003 is OK: PROCS OK: 0 processes with command name dhclient [10:28:48] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [10:28:49] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [10:29:05] RECOVERY - Check size of conntrack table on sca2003 is OK: OK: nf_conntrack is 0 % full [10:29:05] RECOVERY - salt-minion processes on sca2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:29:35] RECOVERY - DPKG on sca2003 is OK: All packages OK [10:30:11] RECOVERY - Disk space on sca2003 is OK: DISK OK [10:31:25] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [10:32:45] (03PS3) 10Ema: Point pinkunicorn's varnish-fe to its own varnish-be [puppet] - 10https://gerrit.wikimedia.org/r/315920 (https://phabricator.wikimedia.org/T131503) [10:32:58] (03CR) 10Ema: [C: 032 V: 032] Point pinkunicorn's varnish-fe to its own varnish-be [puppet] - 10https://gerrit.wikimedia.org/r/315920 (https://phabricator.wikimedia.org/T131503) (owner: 10Ema) [10:39:12] RECOVERY - zotero on sca2003 is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.085 second response time [10:45:34] 06Operations, 10vm-requests, 13Patch-For-Review: EQIAD|CODFW: (2) VM request for zotero - https://phabricator.wikimedia.org/T147409#2715869 (10akosiaris) 05Open>03Resolved a:03akosiaris 
[10:48:11] (03PS1) 10Ema: role::cache::instances add pinkunicorn to production conditional [puppet] - 10https://gerrit.wikimedia.org/r/315924 [10:52:01] (03PS1) 10Alexandros Kosiaris: conftool: Add new SCA VMs as zotero backends [puppet] - 10https://gerrit.wikimedia.org/r/315925 [10:52:03] (03PS1) 10Alexandros Kosiaris: conftool: Remove the old sca physical boxes from zotero backends [puppet] - 10https://gerrit.wikimedia.org/r/315926 [10:53:43] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 608 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3150750 keys - replication_delay is 608 [10:57:05] 06Operations, 06Analytics-Kanban, 10Traffic, 07HTTPS, 13Patch-For-Review: Windows 10 & MacOS Sierra Certificate errors due to GlobalSign - https://phabricator.wikimedia.org/T148045#2715888 (10Aklapper) [10:57:07] (03CR) 10Ema: [C: 032] role::cache::instances add pinkunicorn to production conditional [puppet] - 10https://gerrit.wikimedia.org/r/315924 (owner: 10Ema) [11:00:18] (03PS1) 10BBlack: eqiad recdns IP fix: add new to DNS [dns] - 10https://gerrit.wikimedia.org/r/315927 (https://phabricator.wikimedia.org/T143915) [11:00:20] (03PS1) 10BBlack: eqiad recdns IP fix: remove old from DNS [dns] - 10https://gerrit.wikimedia.org/r/315928 (https://phabricator.wikimedia.org/T143915) [11:00:37] !log mw1166 back in service after reimage (MW Jobrunner) [11:00:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:01:15] (03PS1) 10BBlack: eqiad recdns IP fix: add new address (.254) [puppet] - 10https://gerrit.wikimedia.org/r/315929 (https://phabricator.wikimedia.org/T143915) [11:01:17] (03PS1) 10BBlack: eqiad recdns IP fix: switch in puppet [puppet] - 10https://gerrit.wikimedia.org/r/315930 (https://phabricator.wikimedia.org/T143915) [11:01:19] (03PS1) 10BBlack: eqiad recdns IP fix: remove old from LVS [puppet] - 10https://gerrit.wikimedia.org/r/315931 (https://phabricator.wikimedia.org/T143915) 
[11:02:45] !log Stopping MySQL db2055 (S1-codfw) to import S1 to dbstore2001 - T146261 [11:02:46] T146261: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261 [11:02:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:03:27] great work what you are doing there with the imports, marostegui [11:03:37] Thanks! :) [11:04:38] if you have the time, start gathering one-liners, so they can later digi-evolve to a script, and later to a service [11:05:12] PROBLEM - check_mysql on frdb1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1289 [11:07:23] (03PS5) 10Elukey: Add extra compiler warnings to the Makefile [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/314662 (https://phabricator.wikimedia.org/T147436) [11:10:12] RECOVERY - check_mysql on frdb1001 is OK: Uptime: 855057 Threads: 1 Questions: 84970271 Slow queries: 6652 Opens: 8111 Flush tables: 1 Open tables: 599 Queries per second avg: 99.373 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [11:15:08] (03PS2) 10Alexandros Kosiaris: conftool: Add new SCA VMs as zotero backends [puppet] - 10https://gerrit.wikimedia.org/r/315925 [11:15:14] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] conftool: Add new SCA VMs as zotero backends [puppet] - 10https://gerrit.wikimedia.org/r/315925 (owner: 10Alexandros Kosiaris) [11:17:26] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: sca1003.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=sca', 'service=zotero']) [11:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:17:40] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: sca1004.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=sca', 'service=zotero']) [11:17:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:17:56] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: 
sca2004.codfw.wmnet (tags: ['dc=codfw', 'cluster=sca', 'service=zotero']) [11:18:00] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: sca2003.codfw.wmnet (tags: ['dc=codfw', 'cluster=sca', 'service=zotero']) [11:18:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:18:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:20:16] !log change-prop deploying 6dbdaa1 [11:20:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:25:33] dcausse: so what SSL issue is happening on deployment-tin? :] [11:25:44] T145609 [11:25:45] T145609: Puppet sslcert::ca does not refresh the certificate symlinks when a .crt is updated - https://phabricator.wikimedia.org/T145609 [11:25:47] hashar: the same one we had last time [11:25:53] ohhhhh [11:26:14] so probably not related to the GlobalSign issue from yesterday [11:26:18] just wanted to check that you have no objections if I run the update cert command [11:26:31] update-ca-certificates --fresh [11:26:33] should do it [11:26:36] ok [11:26:48] I can't remember offhand what the exact issue in the puppet define is, but the task describes it [11:27:16] hashar: perfect it works now :) [11:27:20] thanks! [11:27:21] not sure why the .crt is updated though [11:27:37] me neither :/ [11:27:55] I award you a point for having found the task :D [11:28:18] dcausse: maybe !log it in #wikimedia-releng and associate the task [11:28:23] sure [11:28:27] might give a pointer to others later on [11:28:51] hashar: I searched for tasks opened by you and subscribed by me, searching with words is.. well.. you know :) [11:29:09] magic!
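Editorial sketch, not part of the log: the exchange above fixes stale certificate symlinks by rerunning update-ca-certificates --fresh. The precise failure mode is described in T145609 and is not reproduced here. As a rough illustration only, the Python below lists symlinks in a directory whose targets no longer exist, which is one generic way such a directory can go stale. The function name find_dangling_symlinks is hypothetical, and the check deliberately ignores links that still resolve but carry an outdated hash name.

```python
import os


def find_dangling_symlinks(cert_dir):
    """Return the sorted names of symlinks in cert_dir whose target is missing.

    Assumption: "stale" is approximated as "dangling". A link that still
    resolves but is named after an old certificate hash is NOT detected.
    """
    dangling = []
    for name in os.listdir(cert_dir):
        path = os.path.join(cert_dir, name)
        # islink() inspects the link itself; exists() follows it to the target.
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append(name)
    return sorted(dangling)
```

On a Debian-style host this could be pointed at /etc/ssl/certs before and after the refresh to see whether any links were left behind.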
[11:32:23] (03PS1) 10Alexandros Kosiaris: apertium: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/315933 [11:39:45] (03PS3) 10Alexandros Kosiaris: puppetmaster: replace palladium with puppetmaster1001 as default [puppet] - 10https://gerrit.wikimedia.org/r/315894 (https://phabricator.wikimedia.org/T147320) (owner: 10Dzahn) [11:39:50] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] puppetmaster: replace palladium with puppetmaster1001 as default [puppet] - 10https://gerrit.wikimedia.org/r/315894 (https://phabricator.wikimedia.org/T147320) (owner: 10Dzahn) [11:41:17] (03PS1) 10Gehel: ssl - make sure certificates are updated "fresh" [puppet] - 10https://gerrit.wikimedia.org/r/315934 (https://phabricator.wikimedia.org/T145609) [11:42:35] (03CR) 10Mark Bergsma: [C: 031] kubernetes: introduce 1st-stage worker role [puppet] - 10https://gerrit.wikimedia.org/r/315717 (https://phabricator.wikimedia.org/T147181) (owner: 10Giuseppe Lavagetto) [11:44:39] (03PS1) 10Mobrovac: Parsoid: Install the conftool service scripts [puppet] - 10https://gerrit.wikimedia.org/r/315936 (https://phabricator.wikimedia.org/T145518) [11:45:27] _joe_: ^^^ [11:46:57] <_joe_> mobrovac: heh, after lunch I'll be on it [11:47:03] (03PS1) 10Giuseppe Lavagetto: role::mediawiki::webserver: add conftool scripts [puppet] - 10https://gerrit.wikimedia.org/r/315937 [11:47:05] (03PS1) 10Giuseppe Lavagetto: role::mediawiki::webserver: restart hhvm routinely [puppet] - 10https://gerrit.wikimedia.org/r/315938 (https://phabricator.wikimedia.org/T147773) [11:47:10] heh [11:48:20] (03CR) 10Alexandros Kosiaris: [C: 031] "Kind of. So it is still used for feeding ganglia data to torrus (which we also want to kill) which we currently only use for PDUs. 
I am no" [puppet] - 10https://gerrit.wikimedia.org/r/315890 (owner: 10Dzahn) [11:50:15] (03CR) 10Mobrovac: "PCC OK - https://puppet-compiler.wmflabs.org/4368/" [puppet] - 10https://gerrit.wikimedia.org/r/315936 (https://phabricator.wikimedia.org/T145518) (owner: 10Mobrovac) [11:51:18] 06Operations, 07LDAP: update ldap-[codfw|eqiad].wikimedia.org certificates (expire on 2016-09-20) - https://phabricator.wikimedia.org/T145201#2715947 (10akosiaris) Confirmed [11:59:46] (03PS1) 10Gehel: decommission deployment-elastic08 [puppet] - 10https://gerrit.wikimedia.org/r/315940 (https://phabricator.wikimedia.org/T147777) [12:00:12] 06Operations, 10ops-esams, 10DNS, 10Traffic, 10netops: eeden ethernet outage - https://phabricator.wikimedia.org/T146391#2659577 (10akosiaris) Noting that there are no errors in TX or RX on the interface of neither the host nor the switch. [12:00:42] (03CR) 10Gehel: [C: 04-1] "Do not merge before the related changes in wmf-config are merged:" [puppet] - 10https://gerrit.wikimedia.org/r/315940 (https://phabricator.wikimedia.org/T147777) (owner: 10Gehel) [12:04:29] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "https://puppet-compiler.wmflabs.org/4369/" [puppet] - 10https://gerrit.wikimedia.org/r/315933 (owner: 10Alexandros Kosiaris) [12:04:37] (03PS2) 10Alexandros Kosiaris: apertium: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/315933 [12:04:39] (03CR) 10Alexandros Kosiaris: [V: 032] apertium: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/315933 (owner: 10Alexandros Kosiaris) [12:12:12] PROBLEM - puppet last run on aqs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:16:47] (03CR) 10Faidon Liambotis: [C: 04-1] "That sounds a little bit dangerous to me. 
This means that for a moment there the symlinks will disappear and who knows how will software r" [puppet] - 10https://gerrit.wikimedia.org/r/315934 (https://phabricator.wikimedia.org/T145609) (owner: 10Gehel) [12:22:23] RECOVERY - puppet last run on aqs1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:28:45] 06Operations, 10Traffic: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2715740 (10faidon) I was actually thinking the same for keeping both certs live. One way to get around the subtle differences/coalesce issue etc. is to deploy them in different regions. esams/ulsfo could get ven... [12:36:04] !log akosiaris@puppetmaster1001 conftool action : set/pooled=no; selector: sca2001.codfw.wmnet (tags: ['dc=codfw', 'cluster=sca', 'service=zotero']) [12:36:08] !log akosiaris@puppetmaster1001 conftool action : set/pooled=no; selector: sca2002.codfw.wmnet (tags: ['dc=codfw', 'cluster=sca', 'service=zotero']) [12:36:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:36:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:36:23] !log akosiaris@puppetmaster1001 conftool action : set/pooled=no; selector: sca1002.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=sca', 'service=zotero']) [12:36:27] !log akosiaris@puppetmaster1001 conftool action : set/pooled=no; selector: sca1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=sca', 'service=zotero']) [12:36:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:36:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:42:58] 06Operations, 10Traffic: OCSP Stapling for Intermediates - https://phabricator.wikimedia.org/T148134#2715807 (10faidon) I was researching this a little bit last night. 
The tradeoff you mention (inflating response size) is definitely real and it looks like [[ https://www.ietf.org/mail-archive/web/tls/current/ms... [12:47:54] 06Operations, 10Traffic: OCSP Stapling for Intermediates - https://phabricator.wikimedia.org/T148134#2716295 (10BBlack) [12:48:17] !log reindexing top 10 wikipedias with BM25 on elastic@codfw from terbium (logs in ~dcausse/bm25_reindex/cirrus_log/) (T147508) [12:48:18] T147508: BM25: initial limited release into production - https://phabricator.wikimedia.org/T147508 [12:48:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:48:30] (03Abandoned) 10Gehel: ssl - make sure certificates are updated "fresh" [puppet] - 10https://gerrit.wikimedia.org/r/315934 (https://phabricator.wikimedia.org/T145609) (owner: 10Gehel) [12:49:39] (03Abandoned) 10Muehlenhoff: Decomission mw1217 [puppet] - 10https://gerrit.wikimedia.org/r/312522 (https://phabricator.wikimedia.org/T138925) (owner: 10Muehlenhoff) [12:49:53] (03CR) 10Jcrespo: [C: 032] Fix DEFINER and binary log switching; delete old script [software/redactatron] - 10https://gerrit.wikimedia.org/r/315915 (owner: 10Jcrespo) [12:49:56] (03CR) 10Jcrespo: [V: 032] Fix DEFINER and binary log switching; delete old script [software/redactatron] - 10https://gerrit.wikimedia.org/r/315915 (owner: 10Jcrespo) [12:50:20] 06Operations, 10ops-esams, 10DNS, 10Traffic, 10netops: eeden ethernet outage - https://phabricator.wikimedia.org/T146391#2716372 (10faidon) This happened twice yesterday, unfortunately during the GlobalSign event. I investigated it both times, but in both times the downtime was brief which limited my tro... 
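Editorial sketch, not part of the log: the review comment on change 315934 objects that a "fresh" refresh makes the certificate symlinks disappear for a moment. A generic way to avoid such a window, shown here purely as an illustration and not as what update-ca-certificates or the puppet define actually does, is to create the new link under a temporary name and rename it over the old one. The helper name replace_symlink_atomically and the ".tmp" suffix are assumptions made for this example.

```python
import os


def replace_symlink_atomically(target, link_path):
    """Point link_path at target without a moment where link_path is absent."""
    tmp_path = link_path + ".tmp"  # hypothetical temporary name beside the link
    # Clear any leftover temporary link from an earlier interrupted run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(target, tmp_path)
    # On POSIX, os.replace renames the link itself and swaps the name in one step.
    os.replace(tmp_path, link_path)
```

Readers that open link_path see either the old target or the new one, never a missing file, provided both names live on the same filesystem.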
[12:52:38] (03CR) 10DCausse: decommission deployment-elastic08 (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/315940 (https://phabricator.wikimedia.org/T147777) (owner: 10Gehel) [12:52:44] 06Operations, 10Beta-Cluster-Infrastructure, 10CirrusSearch, 06Discovery, 13Patch-For-Review: Puppet sslcert::ca does not refresh the certificate symlinks when a .crt is updated - https://phabricator.wikimedia.org/T145609#2716391 (10Gehel) As pointed by @faidon, when running `update-ca-certificates` with... [12:58:52] (03PS2) 10Gehel: decommission deployment-elastic08 [puppet] - 10https://gerrit.wikimedia.org/r/315940 (https://phabricator.wikimedia.org/T147777) [13:01:18] (03PS1) 10Ema: cp1008: set varnish::dynamic_directors to false [puppet] - 10https://gerrit.wikimedia.org/r/315944 (https://phabricator.wikimedia.org/T131503) [13:02:02] (03PS1) 10Jcrespo: mariadb: Move db1053 from s1 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/315945 (https://phabricator.wikimedia.org/T147305) [13:04:36] (03CR) 10Marostegui: [C: 031] mariadb: Move db1053 from s1 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/315945 (https://phabricator.wikimedia.org/T147305) (owner: 10Jcrespo) [13:05:22] (03CR) 10DCausse: [C: 031] decommission deployment-elastic08 [puppet] - 10https://gerrit.wikimedia.org/r/315940 (https://phabricator.wikimedia.org/T147777) (owner: 10Gehel) [13:05:24] (03CR) 10Ema: [C: 032] cp1008: set varnish::dynamic_directors to false [puppet] - 10https://gerrit.wikimedia.org/r/315944 (https://phabricator.wikimedia.org/T131503) (owner: 10Ema) [13:10:50] 06Operations, 07HHVM, 13Patch-For-Review, 15User-Joe, 07discovery-system: Restart HHVM on API appservers every about 48 hours - https://phabricator.wikimedia.org/T147773#2716539 (10Joe) a:03Joe [13:11:18] (03CR) 10Jcrespo: [C: 032] mariadb: Move db1053 from s1 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/315945 (https://phabricator.wikimedia.org/T147305) (owner: 10Jcrespo) [13:11:24] (03PS2) 10Jcrespo: mariadb: Move 
db1053 from s1 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/315945 (https://phabricator.wikimedia.org/T147305) [13:12:19] (03PS2) 10Giuseppe Lavagetto: role::mediawiki::webserver: add conftool scripts [puppet] - 10https://gerrit.wikimedia.org/r/315937 [13:13:24] PROBLEM - puppet last run on rdb1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:14:24] (03CR) 10Gehel: [C: 031] Elastic@deployment-prep: force the number of replicas to 1 max [mediawiki-config] - 10https://gerrit.wikimedia.org/r/315104 (https://phabricator.wikimedia.org/T147777) (owner: 10DCausse) [13:14:38] (03CR) 10Gehel: [C: 031] Elastic@deployment-prep: Remove deployment-elastic08 from the clsuter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/315249 (https://phabricator.wikimedia.org/T147777) (owner: 10DCausse) [13:18:34] 06Operations, 10ops-esams, 10DNS, 10Traffic, 10netops: eeden ethernet outage - https://phabricator.wikimedia.org/T146391#2716561 (10grin) The time the link went away has there been any VRRP change? (Either .1 didn't get/accept the arp req or havent answered it, or answered it on a different interface, I... [13:20:03] (03CR) 10Giuseppe Lavagetto: [C: 032] role::mediawiki::webserver: add conftool scripts [puppet] - 10https://gerrit.wikimedia.org/r/315937 (owner: 10Giuseppe Lavagetto) [13:24:50] 06Operations, 06Analytics-Kanban, 10Traffic, 07HTTPS, 13Patch-For-Review: Windows 10 & MacOS Sierra Certificate errors due to GlobalSign - https://phabricator.wikimedia.org/T148045#2716594 (10BBlack) Resolving this. The mitigation deployed yesterday (alternate intermediate->root chain) seems to have wor... 
[13:28:21] 06Operations, 06Analytics-Kanban, 10Traffic, 07HTTPS, 13Patch-For-Review: Windows 10 & MacOS Sierra Certificate errors due to GlobalSign - https://phabricator.wikimedia.org/T148045#2716627 (10faidon) 05Open>03Resolved a:03BBlack [13:33:08] (03PS1) 10Giuseppe Lavagetto: conftool::scripts::service: port 80 is the default [puppet] - 10https://gerrit.wikimedia.org/r/315951 [13:33:58] (03CR) 10Giuseppe Lavagetto: [C: 032] conftool::scripts::service: port 80 is the default [puppet] - 10https://gerrit.wikimedia.org/r/315951 (owner: 10Giuseppe Lavagetto) [13:34:02] (03CR) 10Giuseppe Lavagetto: [V: 032] conftool::scripts::service: port 80 is the default [puppet] - 10https://gerrit.wikimedia.org/r/315951 (owner: 10Giuseppe Lavagetto) [13:37:46] RECOVERY - puppet last run on rdb1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:53:42] (03PS4) 10Hashar: (WIP) Add puppet-lint to Rakefile (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/288620 [13:53:48] In about 30 minutes I'm going to be on the road, probably not very reachable until after midnight, then available most of Saturday, travelling again Sunday til late afternoon [13:54:04] (03PS4) 10Hashar: (wip) Speed up linting by only processing HEAD (wip) [puppet] - 10https://gerrit.wikimedia.org/r/288629 [13:57:00] (03CR) 10jenkins-bot: [V: 04-1] (wip) Speed up linting by only processing HEAD (wip) [puppet] - 10https://gerrit.wikimedia.org/r/288629 (owner: 10Hashar) [14:04:25] PROBLEM - MariaDB Slave Lag: s1 on db2055 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1438.85 seconds [14:12:15] RECOVERY - MariaDB Slave Lag: s1 on db2055 is OK: OK slave_sql_lag Replication lag: 0.21 seconds [14:14:34] PROBLEM - puppet last run on mw2178 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[hhvm-dbg] [14:16:51] 06Operations, 10Gerrit, 06Release-Engineering-Team: Build warm slave for Gerrit in Dallas - https://phabricator.wikimedia.org/T148186#2716848 (10demon) [14:19:47] 06Operations, 10Gerrit, 06Release-Engineering-Team, 10hardware-requests: Requesting 1 spare misc box for Gerrit in codfw - https://phabricator.wikimedia.org/T148187#2716861 (10demon) [14:20:06] (03PS11) 10Hashar: zuul: refactor to use hiera [puppet] - 10https://gerrit.wikimedia.org/r/308778 (https://phabricator.wikimedia.org/T139527) [14:20:24] (03CR) 10Hashar: "* hiera_hash() > hiera(), the former merge all found values. The later only the first match." [puppet] - 10https://gerrit.wikimedia.org/r/308778 (https://phabricator.wikimedia.org/T139527) (owner: 10Hashar) [14:20:27] (03PS1) 10Gehel: maps - adding dummy passwords for postgresql monitoring and replication [labs/private] - 10https://gerrit.wikimedia.org/r/315959 (https://phabricator.wikimedia.org/T147194) [14:21:27] (03PS4) 10Gehel: Maps - cleanup postgres user creation [puppet] - 10https://gerrit.wikimedia.org/r/315271 (https://phabricator.wikimedia.org/T147194) [14:25:11] (03PS12) 10Giuseppe Lavagetto: zuul: refactor to use hiera [puppet] - 10https://gerrit.wikimedia.org/r/308778 (https://phabricator.wikimedia.org/T139527) (owner: 10Hashar) [14:25:18] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] zuul: refactor to use hiera [puppet] - 10https://gerrit.wikimedia.org/r/308778 (https://phabricator.wikimedia.org/T139527) (owner: 10Hashar) [14:26:05] \O/ [14:26:28] <_joe_> hashar: let's see if there is any circular dependency now :P [14:26:32] yeah [14:26:42] I will eventually write a rspec test suite for CI [14:27:06] <_joe_> hashar: nicely run and works as expected [14:27:11] actually I have a rspec one locally [14:27:16] which I have used to handle the refactoring [14:28:28] (03CR) 10Volans: [C: 031] "LGTM" [labs/private] - 10https://gerrit.wikimedia.org/r/315959 
(https://phabricator.wikimedia.org/T147194) (owner: 10Gehel) [14:28:48] _joe_: if you dont mind, there is the follow up https://gerrit.wikimedia.org/r/#/c/309299/ which is tightly coupled with the change just merged [14:29:07] it move some settings around. I made it an independent change because it shows up change in the puppet compiler [14:29:09] (03PS1) 10Chad: Gerrit: Clean up cron job definitions [puppet] - 10https://gerrit.wikimedia.org/r/315960 [14:29:17] and I wanted the change that got merged as close to a noop as possible [14:29:30] (03CR) 10Gehel: [C: 032] maps - adding dummy passwords for postgresql monitoring and replication [labs/private] - 10https://gerrit.wikimedia.org/r/315959 (https://phabricator.wikimedia.org/T147194) (owner: 10Gehel) [14:29:38] (03CR) 10Gehel: [V: 032] maps - adding dummy passwords for postgresql monitoring and replication [labs/private] - 10https://gerrit.wikimedia.org/r/315959 (https://phabricator.wikimedia.org/T147194) (owner: 10Gehel) [14:30:27] (03PS6) 10Hashar: zuul: migrate server only settings out of merger [puppet] - 10https://gerrit.wikimedia.org/r/309299 [14:30:44] (puppet compiling it) [14:31:57] (03CR) 10Hashar: [C: 031] "Puppet compiled for all three hosts. https://puppet-compiler.wmflabs.org/4375/" [puppet] - 10https://gerrit.wikimedia.org/r/309299 (owner: 10Hashar) [14:33:30] (03PS6) 10Elukey: Add extra compiler warnings to the Makefile [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/314662 (https://phabricator.wikimedia.org/T147436) [14:33:34] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:38:14] RECOVERY - puppet last run on mw2178 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [14:42:54] (03PS3) 10Chad: Remove MWVersion, fold its two functions into MWMultiVersion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309363 [14:49:10] <_joe_> hashar: I'll take a quick look before the meeting [14:49:39] _joe_: I have recompiled it looks straight forward :] [14:49:49] will spend rest of the day fighting with some Duplicate declaration: Class[Zuul] is already declared that happens on labs bah [14:50:48] (03CR) 10EBernhardson: [C: 031] contint: Install php7.0-ast for phan [puppet] - 10https://gerrit.wikimedia.org/r/315711 (https://phabricator.wikimedia.org/T132636) (owner: 10Legoktm) [14:53:14] (03PS5) 10Gehel: Maps - cleanup postgres user creation [puppet] - 10https://gerrit.wikimedia.org/r/315271 (https://phabricator.wikimedia.org/T147194) [14:54:48] <_joe_> hashar: sorry, can't make it reasonably [14:54:53] <_joe_> I have a meeting now [14:54:56] <_joe_> I'll try after [14:55:23] (03PS6) 10Gehel: Maps - cleanup postgres user creation [puppet] - 10https://gerrit.wikimedia.org/r/315271 (https://phabricator.wikimedia.org/T147194) [14:55:53] _joe_: yeah it is meeting time. Regardless the one that merged was the most important one :]  Thank you for that! [14:56:22] <_joe_> yw [14:58:11] (03PS7) 10Gehel: Maps - cleanup postgres user creation [puppet] - 10https://gerrit.wikimedia.org/r/315271 (https://phabricator.wikimedia.org/T147194) [14:59:07] (03CR) 10Dzahn: "if we upgrade those machines we'd want to switch to Debian (jessie). 
jessie has g++ 4.9 https://packages.debian.org/jessie/g++" [puppet] - 10https://gerrit.wikimedia.org/r/315885 (https://phabricator.wikimedia.org/T147682) (owner: 10Bearloga) [14:59:37] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:02:39] (03PS8) 10Gehel: Maps - cleanup postgres user creation [puppet] - 10https://gerrit.wikimedia.org/r/315271 (https://phabricator.wikimedia.org/T147194) [15:07:47] (03PS1) 10Hashar: contint: drop role::ci::slave::localbrowser [puppet] - 10https://gerrit.wikimedia.org/r/315962 [15:10:40] (03CR) 10Hashar: [C: 031] "The labs CI instances failed with:" [puppet] - 10https://gerrit.wikimedia.org/r/315962 (owner: 10Hashar) [15:14:04] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:14:56] (03PS5) 10Jcrespo: mariadb:Create a systemd unit & init.d script for new package [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/315228 (https://phabricator.wikimedia.org/T147305) [15:18:40] (03PS6) 10Jcrespo: mariadb:Create a systemd unit & init.d script for new package [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/315228 (https://phabricator.wikimedia.org/T147305) [15:18:50] (03CR) 10Marostegui: mariadb:Create a systemd unit & init.d script for new package (031 comment) [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/315228 (https://phabricator.wikimedia.org/T147305) (owner: 10Jcrespo) [15:25:18] (03PS1) 10Hashar: contint: remove zuul from role::ci::slave::browsertests [puppet] - 10https://gerrit.wikimedia.org/r/315963 [15:26:16] (03CR) 10Hashar: [C: 031] "Duplicate declaration: Class[Zuul] is" [puppet] - 10https://gerrit.wikimedia.org/r/315963 (owner: 10Hashar) [15:27:26] PROBLEM - HP RAID on labvirt1005 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:1 - OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12, 1I:1:13, 2I:1:14, 2I:1:15, 2I:1:16, 
2I:1:17, 2I:1:18, Controller, Battery/Capacitor [15:30:04] 06Operations, 06Discovery, 06Maps, 03Interactive-Sprint: Maps - error when doing initial tiles generation: "Error: could not create converter for SQL_ASCII"" - https://phabricator.wikimedia.org/T148031#2716943 (10Yurik) [15:33:30] (03PS2) 10Madhuvishy: drbd monitoring: Improve status and error messages [puppet] - 10https://gerrit.wikimedia.org/r/315874 [15:36:20] (03CR) 10Madhuvishy: [C: 032 V: 032] drbd monitoring: Improve status and error messages [puppet] - 10https://gerrit.wikimedia.org/r/315874 (owner: 10Madhuvishy) [15:37:24] PROBLEM - puppet last run on elastic1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:40:43] RECOVERY - puppet last run on mw1182 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:43:50] (03PS3) 10Alexandros Kosiaris: facilities: Unexported nagios host and services [puppet] - 10https://gerrit.wikimedia.org/r/315510 [15:46:44] 06Operations, 06Analytics-Kanban, 10Traffic, 07HTTPS, and 2 others: Windows 10 & MacOS Sierra Certificate errors due to GlobalSign - https://phabricator.wikimedia.org/T148045#2716979 (10Aklapper) [15:50:28] (03PS1) 10Jgreen: add A record for payments-listener-codfw.wikimedia.org, remove deprecated listener-frdev hostname [dns] - 10https://gerrit.wikimedia.org/r/315966 [15:51:06] (03CR) 10Jgreen: [C: 032] add A record for payments-listener-codfw.wikimedia.org, remove deprecated listener-frdev hostname [dns] - 10https://gerrit.wikimedia.org/r/315966 (owner: 10Jgreen) [15:52:29] !log authdns-update to add payments-listener-codfw A record [15:52:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:01:37] RECOVERY - puppet last run on elastic1028 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:16:51] (03PS4) 10Alexandros Kosiaris: icinga: Remove the hack around facilities, lvs::monitor [puppet] - 
10https://gerrit.wikimedia.org/r/315510 [16:16:53] (03PS1) 10Alexandros Kosiaris: monitoring: Export based on class icinga inclusion [puppet] - 10https://gerrit.wikimedia.org/r/315970 [16:17:12] (03CR) 10jenkins-bot: [V: 04-1] icinga: Remove the hack around facilities, lvs::monitor [puppet] - 10https://gerrit.wikimedia.org/r/315510 (owner: 10Alexandros Kosiaris) [16:17:20] (03CR) 10jenkins-bot: [V: 04-1] monitoring: Export based on class icinga inclusion [puppet] - 10https://gerrit.wikimedia.org/r/315970 (owner: 10Alexandros Kosiaris) [16:19:32] (03PS5) 10Alexandros Kosiaris: icinga: Remove the hack around facilities, lvs::monitor [puppet] - 10https://gerrit.wikimedia.org/r/315510 [16:19:34] (03PS2) 10Alexandros Kosiaris: monitoring: Export based on class icinga inclusion [puppet] - 10https://gerrit.wikimedia.org/r/315970 [16:24:04] (03CR) 10Alexandros Kosiaris: [C: 04-2] "Done differently in https://gerrit.wikimedia.org/r/#/c/315970/" [puppet] - 10https://gerrit.wikimedia.org/r/314537 (owner: 10Giuseppe Lavagetto) [16:24:08] (03Abandoned) 10Alexandros Kosiaris: role::icinga: declare common resources only on the primary server [puppet] - 10https://gerrit.wikimedia.org/r/314537 (owner: 10Giuseppe Lavagetto) [16:24:12] 06Operations, 10Traffic: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2717174 (10BBlack) Yeah, regional split might make sense. We probably don't want to mix within the US, where we might see "bouncy" GeoIP resolution. Perhaps one for all US sites and one for all non-US sites (i... 
[16:25:08] 07Puppet, 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Technical-Debt, 07Zuul: role::zuul::configuration should be replaced by hiera - https://phabricator.wikimedia.org/T139527#2717180 (10hashar) 05Open>03Resolved Solved with nice help from @joe and @Dzahn [16:26:40] ori: this is a change in your personal dot files https://gerrit.wikimedia.org/r/#/c/315893/ [16:27:15] (03PS2) 10Ori.livneh: ori: update personal dot files [puppet] - 10https://gerrit.wikimedia.org/r/315893 (https://phabricator.wikimedia.org/T147320) (owner: 10Dzahn) [16:27:30] (03CR) 10Ori.livneh: [C: 032] "Thanks, Daniel!" [puppet] - 10https://gerrit.wikimedia.org/r/315893 (https://phabricator.wikimedia.org/T147320) (owner: 10Dzahn) [16:27:40] (03CR) 10Ori.livneh: [V: 032] ori: update personal dot files [puppet] - 10https://gerrit.wikimedia.org/r/315893 (https://phabricator.wikimedia.org/T147320) (owner: 10Dzahn) [16:28:02] :) just because i searched for palladium [16:28:52] yay [16:33:13] (03PS7) 10Jcrespo: mariadb:Create a systemd unit & init.d script for new package [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/315228 (https://phabricator.wikimedia.org/T147305) [16:34:56] PROBLEM - puppet last run on cobalt is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. 
Failed resources (up to 3 shown): File[/home/ori/.hosts/bast1001],File[/home/ori/.hosts/puppetmaster1001] [16:35:16] (03PS3) 10Jcrespo: mariadb: Move db1053 from s1 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/315945 (https://phabricator.wikimedia.org/T147305) [16:35:20] * ori boggles [16:36:56] 06Operations, 10Traffic, 07Wikimedia-Incident: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2717232 (10greg) [16:37:19] mutante: it was fine on a second puppet run; seems like a fluke race condition [16:37:36] RECOVERY - puppet last run on cobalt is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:37:54] ori: oh, maybe because it changed the puppet-merge alias itself [16:43:16] 06Operations, 10Traffic, 07Wikimedia-Incident: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2717255 (10BBlack) [16:47:19] (03CR) 1020after4: [C: 031] "I haven't tested whatsoever but this looks good" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309363 (owner: 10Chad) [16:47:46] (03PS4) 10Rush: WIP: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 [16:48:46] (03CR) 10jenkins-bot: [V: 04-1] WIP: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 (owner: 10Rush) [16:50:24] (03PS2) 10Dzahn: contint: drop role::ci::slave::localbrowser [puppet] - 10https://gerrit.wikimedia.org/r/315962 (owner: 10Hashar) [16:51:37] (03CR) 10Dzahn: [C: 032] contint: drop role::ci::slave::localbrowser [puppet] - 10https://gerrit.wikimedia.org/r/315962 (owner: 10Hashar) [16:52:29] (03PS2) 10Dzahn: contint: remove zuul from role::ci::slave::browsertests [puppet] - 10https://gerrit.wikimedia.org/r/315963 (owner: 10Hashar) [16:52:54] (03PS5) 10Rush: WIP: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 [16:53:55] (03CR) 10jenkins-bot: [V: 04-1] WIP: bdsync backup setup for labstore [puppet] - 
10https://gerrit.wikimedia.org/r/315595 (owner: 10Rush) [16:54:16] (03CR) 10Dzahn: [C: 032] contint: remove zuul from role::ci::slave::browsertests [puppet] - 10https://gerrit.wikimedia.org/r/315963 (owner: 10Hashar) [16:54:32] (03CR) 10Thcipriani: "Left a bunch of inline comments that are all directory path related." (036 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309363 (owner: 10Chad) [16:54:45] (03PS1) 10Jcrespo: dbtools: Move db1053 from s1 to s4 [software] - 10https://gerrit.wikimedia.org/r/315973 [16:55:51] (03PS2) 10Jcrespo: dbtools: Move db1053 from s1 to s4 [software] - 10https://gerrit.wikimedia.org/r/315973 [16:57:08] (03PS6) 10Rush: WIP: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 [16:58:30] (03CR) 10Jcrespo: [C: 032] dbtools: Move db1053 from s1 to s4 [software] - 10https://gerrit.wikimedia.org/r/315973 (owner: 10Jcrespo) [16:59:27] (03CR) 10Dzahn: "the "watroles" tool claims there is just a single instance in all of labs using role::logstash, deployment-logstash2" [puppet] - 10https://gerrit.wikimedia.org/r/315888 (owner: 10Dzahn) [17:02:13] (03CR) 10BryanDavis: "> the "watroles" tool claims there is just a single instance in all" [puppet] - 10https://gerrit.wikimedia.org/r/315888 (owner: 10Dzahn) [17:05:29] Good afternoon ejegg [17:09:09] Hi Zppix [17:11:48] (03CR) 10Muehlenhoff: [C: 031] "It's fine, then. The existing rules are all fine wrt labs/deployment-prep." [puppet] - 10https://gerrit.wikimedia.org/r/315888 (owner: 10Dzahn) [17:12:07] (03PS2) 10Dzahn: logstash: move base::firewall from node to role [puppet] - 10https://gerrit.wikimedia.org/r/315888 [17:12:16] (03CR) 10Dzahn: [C: 032] "ok, thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/315888 (owner: 10Dzahn) [17:13:44] (03CR) 10Reedy: Switch to extension registration for Mobile extensions (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) (owner: 10Dereckson) [17:14:15] (03CR) 10Dereckson: "A PS3 is coming to rebase against current master, and focus on MobileFrontend, safe to switch." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) (owner: 10Dereckson) [17:16:25] (03PS1) 10Jcrespo: mariadb: move db1053 from s1 to s4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/315975 [17:17:57] rejoice, Bugzilla works again, lol [17:18:01] LOL [17:18:08] but only on http://bugs.wmflabs.org/ [17:18:40] yea [17:19:48] why the hell do we need bugzilla lmao [17:20:37] 06Operations, 13Patch-For-Review: Migrate pool counters to trusty/jessie - https://phabricator.wikimedia.org/T123734#2717356 (10Dzahn) +1 , poolcounter1001 sounds good [17:20:51] we'll use it to track bugs in our phabricator instance :) [17:21:52] Zppix: https://phabricator.wikimedia.org/T95267 [17:22:54] Zppix: because it has a search unlike https://static-bugzilla.wikimedia.org/ :) [17:23:42] https://static-bugzilla.wikimedia.org/all_bugs.html makes my eyes bleed [17:25:03] that's the browser defaults :p it literally has no style applied [17:25:56] so much blue [17:25:58] help [17:26:00] xD [17:26:01] (03PS3) 10Dereckson: Switch MobileFrontend to extension registration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) [17:26:28] clearly you need to open a ticket about making a read-write version of bugs.wmflabs and then open a bug there about the style on static-bugzilla :) [17:26:35] (03CR) 10Dereckson: Switch MobileFrontend to extension registration (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) 
(owner: 10Dereckson) [17:26:56] mutante i would if i could see [17:27:29] Zppix: you can use Greasemonkey [17:27:42] ik im lazy [17:27:56] fair :) [17:32:09] Zppix: https://kb.iu.edu/d/algz [17:32:14] scnr [17:32:36] I don't really care about it mutante [17:34:17] (03PS3) 10Dzahn: ganglia/netmon1001: rm ganglia::deprecated::collector [puppet] - 10https://gerrit.wikimedia.org/r/315890 [17:40:26] (03CR) 10Dzahn: [C: 032] "i'll check the graphs for power strips on torrus" [puppet] - 10https://gerrit.wikimedia.org/r/315890 (owner: 10Dzahn) [17:46:18] (03CR) 10Dzahn: "> You understand that there are years of changes pending to be merged with the old structure." [puppet] - 10https://gerrit.wikimedia.org/r/315343 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn) [17:50:50] (03CR) 10Madhuvishy: [C: 031] "Small naming nits, looks good to me otherwise" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/315534 (owner: 10Rush) [17:53:10] (03CR) 10Dzahn: "> Thanks for the information. I will do it on a separate patch, just doing a simple file rename." [puppet] - 10https://gerrit.wikimedia.org/r/315343 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn) [17:53:23] (03Abandoned) 10Dzahn: mariadb: split role classes into separate files [puppet] - 10https://gerrit.wikimedia.org/r/315343 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn) [17:55:51] 06Operations, 10Traffic, 07HTTPS, 13Patch-For-Review, 07Wikimedia-Incident: Make OCSP Stapling support more generic and robust - https://phabricator.wikimedia.org/T93927#2717425 (10BBlack) I've re-evaluated nginx's stapling support today. We last evaluated it deeply many versions ago and found that for... 
[18:00:37] (03CR) 10Dzahn: ""simple file rename" is not going to work though, if you look at modules/role/manifests/ there isn't a single file directly in there and n" [puppet] - 10https://gerrit.wikimedia.org/r/315343 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn) [18:05:08] 06Operations, 06Discovery, 06Discovery-Analysis (Current work), 13Patch-For-Review, 07Tracking: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2717468 (10mpopov) >>! In T147682#2715761, @MoritzMuehlenhoff wrote: > @mpopov: Can you provi... [18:05:51] (03PS1) 10Jgreen: flip payments-listener from eqiad to codfw [dns] - 10https://gerrit.wikimedia.org/r/315979 [18:06:31] greg-g, do you think https://phabricator.wikimedia.org/T148117 is user visible enough to deploy today? the fix is very trivial and safe [18:07:32] * greg-g looks [18:07:38] (03CR) 10Jgreen: [C: 032] flip payments-listener from eqiad to codfw [dns] - 10https://gerrit.wikimedia.org/r/315979 (owner: 10Jgreen) [18:07:58] (03PS1) 10Chad: Just make wiki.phtml symlink index.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/315980 [18:08:11] (03CR) 10Dzahn: "not doing this is actually going to be a blocker now for doing what Faidon suggested, this, eventlogging and logstash, all others can be m" [puppet] - 10https://gerrit.wikimedia.org/r/315343 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn) [18:08:12] Reedy: Heh ^ [18:08:16] !log flip payments-listener service from eqiad to codfw [18:08:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:09:01] MaxSem: doesn't look like it? based on what Nikerabbit said in the task, at least [18:09:12] ok [18:09:41] it's visible for default pageviews though =) [18:10:18] ostriches: I've a patch to just delete it :P [18:10:26] or not.. [18:10:29] (03CR) 10Dzahn: "@Bearlog maybe amend to just use g++-4.8 for now?" 
[puppet] - 10https://gerrit.wikimedia.org/r/315885 (https://phabricator.wikimedia.org/T147682) (owner: 10Bearloga) [18:10:39] Or that... [18:10:49] * MaxSem scratches head [18:11:56] MaxSem: let me know what you figure out :P [18:12:25] (03CR) 10Bearloga: Update R and C++-related stats puppet configs (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/315885 (https://phabricator.wikimedia.org/T147682) (owner: 10Bearloga) [18:12:57] (03CR) 10Jforrester: "Provisionally scheduled for 18 October." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/315278 (https://phabricator.wikimedia.org/T142589) (owner: 10Jforrester) [18:14:29] greg-g, I see it when using a non-default language. whatever, if Niklas feels it works to fix it on monday... [18:16:45] (03CR) 10Dereckson: [C: 04-1] Switch MobileFrontend to extension registration (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) (owner: 10Dereckson) [18:17:41] (03PS4) 10Dereckson: Switch MobileFrontend to extension registration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) [18:20:38] MaxSem: /me nods [18:24:31] (03CR) 10Dzahn: "no-op per compiler http://puppet-compiler.wmflabs.org/4383/" [puppet] - 10https://gerrit.wikimedia.org/r/315887 (owner: 10Dzahn) [18:27:55] (03CR) 10Reedy: Switch MobileFrontend to extension registration (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314748 (https://phabricator.wikimedia.org/T147092) (owner: 10Dereckson) [18:28:46] (03PS1) 10BBlack: [WIP] preliminary ssl_stapling_proxy work [software/nginx] (wmf-1.11.4) - 10https://gerrit.wikimedia.org/r/315982 (https://phabricator.wikimedia.org/T93927) [18:28:52] (03CR) 10Dzahn: "i still see for example https://torrus.wikimedia.org/torrus/Facilities?nodeid=device//ps1-b2-eqiad like before, i also did not stop anyth" [puppet] - 10https://gerrit.wikimedia.org/r/315890 (owner: 10Dzahn) [18:30:11] 
(03PS1) 10Dereckson: Explicit dblist name for compact language links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/315983 [18:31:09] (03CR) 10Chad: Remove MWVersion, fold its two functions into MWMultiVersion (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309363 (owner: 10Chad) [18:31:30] (03PS2) 10Dzahn: site.pp: remove "include admin"s [puppet] - 10https://gerrit.wikimedia.org/r/315887 [18:33:50] (03PS2) 10Dzahn: lists::server: simplify includes on node level [puppet] - 10https://gerrit.wikimedia.org/r/315886 [18:35:09] (03PS4) 10Chad: Remove MWVersion, fold its two functions into MWMultiVersion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309363 [18:35:44] (03CR) 10Dzahn: [C: 032] "no-op http://puppet-compiler.wmflabs.org/4385/" [puppet] - 10https://gerrit.wikimedia.org/r/315886 (owner: 10Dzahn) [18:36:01] (03PS3) 10Dzahn: lists::server: simplify includes on node level [puppet] - 10https://gerrit.wikimedia.org/r/315886 [18:39:34] (03CR) 10Dzahn: [C: 04-1] "http://puppet-compiler.wmflabs.org/4386/" [puppet] - 10https://gerrit.wikimedia.org/r/315882 (owner: 10Dzahn) [19:01:59] (03CR) 10Andrew Bogott: [C: 032] site.pp: remove "include admin"s [puppet] - 10https://gerrit.wikimedia.org/r/315887 (owner: 10Dzahn) [19:05:27] andrewbogott: :) thank you [19:05:53] hope I didn't step on any toes, I got kicked from irc just as I started to merge :) [19:06:07] not at all, it was a positive surprise [19:06:13] to just see it merged [19:10:07] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Stirring The Pot, and 2 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2717891 (10AndyRussG) Is this also a bug? In `MessageCache::replace()`, [[ https://github.com/wikimedia/m... [19:17:58] greg-g: did you ping me? 
:o [19:19:52] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Stirring The Pot, and 2 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2717936 (10AndyRussG) Here is a general theory: - There is a bug in `MessageCache` that causes wrong dat... [19:22:06] greg-g, MaxSem: I would be okay deploying the fix (which I was not able to test because I have only seen the issue in production, but I am fairly confident on it). Given it doesn't really break anything, we decided in our daily stand-up that we don't try to deploy it out of schedule [19:38:59] greg-g, thcipriani, i just discovered that yesterday's amp deployment had a big nasty bug in it. I need to redeploy kartotherian service via scap3 [19:39:29] oh perfect..... yurik what bug was it? [19:39:35] (03PS2) 10Dzahn: installserver: move standard include to role [puppet] - 10https://gerrit.wikimedia.org/r/315882 [19:40:00] zppix i haven't filed it yet - basically any maps that uses external data (like sparql query) does not show the map [19:40:42] Good job there mate, hopefully it gets fixed by your patch [19:40:51] i hope so :) [19:41:01] building the new deployment package as we speak [19:41:10] it takes about 10-15 min just to build :( [19:43:27] https://phabricator.wikimedia.org/T148237 [19:43:31] zppix^ [19:43:41] !log Manual DB update for https://www.wikidata.org/wiki/User_talk:Doror and https://fr.wikipedia.org/wiki/Discussion_utilisateur:Robur15 . T148057 [19:43:42] T148057: Fix user talk pages already in inconsistent state due to to T138310 - https://phabricator.wikimedia.org/T148057 [19:43:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:44:19] gehel, ^^^ [19:45:04] yurik I subscribed to the phab ticket ill get my eyes out [19:45:07] keep* [19:46:48] (03PS1) 10Jgreen: flip payments-listener back to eqiad [dns] - 10https://gerrit.wikimedia.org/r/315987 [19:48:30] yurik: amp? 
[19:48:39] greg-g, ?? [19:48:47] " yesterday's amp deployment " [19:48:58] oh, maps? [19:49:06] got confused with google's AMP [19:49:08] this was an early day depl during services [19:49:10] hehe [19:49:26] i have the fix, doing some final packaging and testing now [19:49:36] slow network :( [19:50:23] (03CR) 10Jgreen: [C: 032] flip payments-listener back to eqiad [dns] - 10https://gerrit.wikimedia.org/r/315987 (owner: 10Jgreen) [19:50:26] Nikerabbit: great choice as long as you think it's not terribly wide-spread/huge number of complaining users (seems not). So yeah, have a good weekend :) [19:51:02] !log flip payments-listener back to eqiad [19:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:43:28] (03PS9) 10Rush: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 [20:44:28] (03CR) 10jenkins-bot: [V: 04-1] bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 (owner: 10Rush) [20:44:33] (03PS10) 10Rush: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 [20:45:26] (03CR) 10jenkins-bot: [V: 04-1] bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 (owner: 10Rush) [20:46:01] (03PS11) 10Rush: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 [20:51:23] (03CR) 10Thcipriani: [C: 031] "LGTM, but would be good to try in beta." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309363 (owner: 10Chad) [20:59:39] marostegui, jynus, if one of you is around, could you come to #wikimedia-releng. We accidentally got the slave and master out of sync on Beta Cluster for centralauth DB. [21:09:22] (03CR) 10Madhuvishy: bdsync backup setup for labstore (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/315595 (owner: 10Rush) [21:20:27] ^ Never mind, we solved that. [21:26:20] marostegui: jynus: 21:20 < matt_flas> ^ Never mind, we solved that. 
[21:26:43] (03PS1) 10Alex Monk: Enable Quiz extension on beta cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/316004 (https://phabricator.wikimedia.org/T142692) [21:31:43] greg-g, we can do that whenever ^ [21:31:45] shall I do it now? [21:32:31] er [21:32:34] or we will when I upload PS2 [21:32:46] (03PS2) 10Alex Monk: Enable Quiz extension on beta cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/316004 (https://phabricator.wikimedia.org/T142692) [21:35:25] :) [21:36:18] Krenair: sure thing [21:37:15] 06Operations, 10Beta-Cluster-Infrastructure, 10DBA: Possible to run writes (e.g. UPDATE) on slave - https://phabricator.wikimedia.org/T110115#2718307 (10Mattflaschen-WMF) This caused complications when trying to fix {T148111} too. @bd808 accidentally ran the ALTER on the slave, I then ran it on master, but... [21:37:28] 06Operations, 10Beta-Cluster-Infrastructure, 10DBA: Possible to run writes (e.g. UPDATE) on Beta Cluster replica - https://phabricator.wikimedia.org/T110115#2718311 (10Mattflaschen-WMF) [21:39:39] (03CR) 10Alex Monk: [C: 032] Enable Quiz extension on beta cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/316004 (https://phabricator.wikimedia.org/T142692) (owner: 10Alex Monk) [21:40:18] (03Merged) 10jenkins-bot: Enable Quiz extension on beta cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/316004 (https://phabricator.wikimedia.org/T142692) (owner: 10Alex Monk) [21:41:43] greg-g, thcipriani, seems like i cannot even build the package properly with the slow network (npm be damned), so i will have to deploy kartotherian early next week [21:42:22] yurik: what is the user facing impact of this issue? 
[21:42:58] greg-g, basically our coolest maps are dead - the ones that show a map + the data generated by a SPARQL query [21:43:12] like the one with governors [21:43:43] !log krenair@mira Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/316004/ - no-op here, only labs reads this file. just keeping it in sync (duration: 02m 07s) [21:43:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:44:26] oh, actually i think it is working ... let me check again [21:45:33] aha! it is working, i was just testing it incorrectly [21:45:42] greg-g, i will go ahead and sync it [21:47:28] !log about to sync kartotherian fix [21:47:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:49:08] !log deployed & restarted kartotherian, all's good now [21:49:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:49:21] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Stirring The Pot, and 2 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2718353 (10AndyRussG) Just a note about the underlying `ObjectCache` implementations we're dealing with:... 
[21:58:06] (03PS12) 10Rush: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 [21:59:09] (03CR) 10jenkins-bot: [V: 04-1] bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 (owner: 10Rush) [22:00:55] (03PS13) 10Rush: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 [22:01:48] (03CR) 10jenkins-bot: [V: 04-1] bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 (owner: 10Rush) [22:03:36] (03PS14) 10Rush: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 [22:04:29] (03CR) 10jenkins-bot: [V: 04-1] bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 (owner: 10Rush) [22:04:48] I give up for today :) [22:06:44] (03PS15) 10Rush: bdsync backup setup for labstore [puppet] - 10https://gerrit.wikimedia.org/r/315595 [22:16:55] (03PS1) 10Chad: Gerrit: Automate slave mode detection [puppet] - 10https://gerrit.wikimedia.org/r/316015 [22:18:01] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Stirring The Pot, and 2 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2615311 (10GWicke) @aaron knows that code quite well. It might be worth asking him for advice. [22:19:02] (03PS1) 10Dzahn: repeat hostname for IPv6 for more misc hosts [dns] - 10https://gerrit.wikimedia.org/r/316017 [22:20:30] . 
[22:20:55] (03PS2) 10Dzahn: repeat hostname for IPv6 alsafi,fermium,neon,netmon1001,radium [dns] - 10https://gerrit.wikimedia.org/r/316017 [22:33:37] (03CR) 10Dzahn: [C: 032] repeat hostname for IPv6 alsafi,fermium,neon,netmon1001,radium [dns] - 10https://gerrit.wikimedia.org/r/316017 (owner: 10Dzahn) [22:33:57] (03PS3) 10Dzahn: repeat hostname for IPv6 alsafi,fermium,neon,netmon1001,radium [dns] - 10https://gerrit.wikimedia.org/r/316017 [22:42:12] 06Operations, 06Labs, 10Striker, 07LDAP: Store Wikimedia unified account name (SUL) in LDAP directory - https://phabricator.wikimedia.org/T148048#2718457 (10bd808) p:05Triage>03Normal [22:44:32] what's with the etcd servers that are CRIT in Icinga since a couple days [22:44:56] 1004, 1005 and 1006, i mean the puppet run on them [22:51:06] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Stirring The Pot, and 2 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2615311 (10Ejegg) a:03AndyRussG [22:54:01] 06Operations, 10ops-eqiad, 06Labs, 10Labs-Infrastructure: labvirt1005 - HP RAID controller issue (battery?) - https://phabricator.wikimedia.org/T148255#2718489 (10Dzahn) [22:59:53] 06Operations, 10ops-eqiad, 06Labs, 10Labs-Infrastructure: labvirt1005 - HP RAID controller issue (battery?) - https://phabricator.wikimedia.org/T148255#2718504 (10Dzahn) btw, when searching phab i saw a couple older resolved tickets, like "doesnt boot up" T100030 and "memory errors" T97521 all on this sam... [23:13:13] (03CR) 10Dzahn: [C: 032] "nice! also how it gets rid of the hosts yaml file. cant compile on cobalt yet, but does compile on lead as no-op (lead got removed but sa" [puppet] - 10https://gerrit.wikimedia.org/r/316015 (owner: 10Chad) [23:14:17] ostriches do you know how to update the puppet compiler for cobalt? 
[23:14:23] like we did for lead [23:14:47] runs puppet on cobalt [23:15:20] nothing happens, good [23:18:20] oh, no icinga bot [23:18:53] " * ircecho is running [23:19:13] @seen icinga-wm [23:19:13] mutante: icinga-wm is in here, right now [23:19:29] icinga-wm quit in the netsplit at 18:51:47 [23:19:39] ok, that explains [23:19:50] might rejoin the channel only if it has to say something? [23:19:58] thanks, i was really confused about "it's here" but i cant autocomplete the name [23:20:07] been a number of hours since the last warning though [23:20:13] restarts it [23:20:38] usually that bot is... not the quietest bot around [23:21:34] yea, so it is running, i can tail the log file it pipes into channel [23:21:42] and see changes. it's just not here [23:21:45] try writing to the log file? [23:22:29] did [23:22:33] not joining [23:22:54] what if you write to the file it pipes to -labs? [23:23:33] 16:24 < icinga-wm> RECOVERY - MAGIC SELF HEAL [23:24:16] ¯\_(ツ)_/¯ [23:45:25] (03PS1) 10Dzahn: add IPv6 AAAA and PTR for terbium and wasat [dns] - 10https://gerrit.wikimedia.org/r/316028 [23:46:00] (03CR) 10Dzahn: "follow-up in DNS https://gerrit.wikimedia.org/r/#/c/316028/" [puppet] - 10https://gerrit.wikimedia.org/r/302649 (owner: 10Dzahn) [23:54:49] (03CR) 10Dzahn: "this is also relevant for manifests/role/tcpircbot which first used the "unmapped" IPv6 addresses (then i updated that) and also has comme" [dns] - 10https://gerrit.wikimedia.org/r/316028 (owner: 10Dzahn) [23:59:14] (03CR) 10Alex Monk: [C: 031] add IPv6 AAAA and PTR for terbium and wasat [dns] - 10https://gerrit.wikimedia.org/r/316028 (owner: 10Dzahn)