[00:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Evening SWAT (Max 8 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180309T0000). [00:00:05] Jhs, MaxSem, and Niharika: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:02:52] Niharika: Hey, sorry about missing the previous window for those two logos [00:03:00] !log set compression chunk length to 32, parsoid tables (group "enwiki") - T189057 [00:03:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:03:16] T189057: Understand (and if possible, improve) performance of new storage strategy - https://phabricator.wikimedia.org/T189057 [00:08:48] * twkozlowski wondering who's going to be so kind & SWAT https://gerrit.wikimedia.org/r/#/c/417020/ for him [00:12:13] (03PS1) 10Dzahn: netboot: fix partman selection for bast, unify bast hosts [puppet] - 10https://gerrit.wikimedia.org/r/417472 [00:18:37] thcipriani: Hey, are you available for SWAT tonight too? [00:19:43] twkozlowski: ah sure, give me a minute and I can get all setup [00:22:08] (03PS4) 10Thcipriani: Update logos for Banyumasan and Urdu Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417020 (https://phabricator.wikimedia.org/T189155) (owner: 10Odder) [00:23:08] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417020 (https://phabricator.wikimedia.org/T189155) (owner: 10Odder) [00:24:39] (03Merged) 10jenkins-bot: Update logos for Banyumasan and Urdu Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417020 (https://phabricator.wikimedia.org/T189155) (owner: 10Odder) [00:25:38] i'm here now. 
if anyone else is [00:27:35] twkozlowski: change is live on mwdebug1002, check please [00:28:49] thcipriani: Looks great :-) [00:29:37] (03CR) 10jenkins-bot: Update logos for Banyumasan and Urdu Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417020 (https://phabricator.wikimedia.org/T189155) (owner: 10Odder) [00:29:53] twkozlowski: ok, going live, will take me a few syncs :) [00:33:21] thcipriani, have you run the namespaceDupes script? :) [00:33:26] !log thcipriani@tin Synchronized static/images/project-logos/urwiki-1.5x.png: SWAT: [[gerrit:417020|Update logos for Banyumasan and Urdu Wikipedias]] T189155 PART I (duration: 00m 59s) [00:33:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:33:44] T189155: Update logo for Banyumasan and Urdu Wikipedias - https://phabricator.wikimedia.org/T189155 [00:34:15] Jhs: I ran a dry run, shows 36 pages to fix, 0 were resolvable and then another section with 14 links to fix, 14 were resolvable [00:35:05] thcipriani, ok cool. what's the procedure for the unresolvable ones again? [00:35:24] cry [00:35:41] * bd808 was channeling his inner Reedy [00:35:43] hehe [00:36:04] lolol [00:36:04] thcipriani: I apologise if this sounds like a silly question, but did you do https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers#Image_Cache_Purges? 
[00:36:18] !log thcipriani@tin Synchronized static/images/project-logos/urwiki-2x.png: SWAT: [[gerrit:417020|Update logos for Banyumasan and Urdu Wikipedias]] T189155 PART II (duration: 00m 58s) [00:36:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:36:35] twkozlowski: not yet, but I can once I get everything synced out [00:36:56] great, thanks ;) [00:38:39] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:417020|Update logos for Banyumasan and Urdu Wikipedias]] T189155 PART III (duration: 00m 58s) [00:38:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:38:53] T189155: Update logo for Banyumasan and Urdu Wikipedias - https://phabricator.wikimedia.org/T189155 [00:40:51] !log thcipriani@tin Synchronized static/images/project-logos: SWAT: [[gerrit:417020|Update logos for Banyumasan and Urdu Wikipedias]] T189155 PART IV (duration: 00m 58s) [00:41:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:43:09] (03PS2) 10Dzahn: netboot: fix partman selection for bast, unify bast hosts [puppet] - 10https://gerrit.wikimedia.org/r/417472 [00:43:36] twkozlowski: ok, synced and purged [00:44:00] Jhs: I don't know the procedure for the unresolvable ones [00:44:37] Add a prefix/suffix [00:45:09] thcipriani: You, sir, have made history! yay [00:45:26] twkozlowski: kudos on the changes :) [00:45:33] These were the two last Wikipedia logos to be changed from version 1 to version 2 :-) [00:46:15] twkozlowski, yay, only took ~8 years ;D [00:46:43] 7 years 10 months ;) [00:46:52] a new world record! [00:47:06] (03CR) 10Dzahn: [C: 032] netboot: fix partman selection for bast, unify bast hosts [puppet] - 10https://gerrit.wikimedia.org/r/417472 (owner: 10Dzahn) [00:47:24] we'll have to wait for a v3 change to see it broken [00:47:28] thcipriani, we can use the suffix T186943 from the related task i think. 
that's bound to be unique, and the pages can be moved afterwards. the results of the script run can also be pasted there ( https://phabricator.wikimedia.org/T186943 ) so the community members can go through them [00:47:28] T186943: Localize & change namespaces on Sindhi Wikipedia (sdwiki) - https://phabricator.wikimedia.org/T186943 [00:47:58] bd808: I don't think this is ever going to be beaten, i.e. no one ever is going to be replacing Wikipedia v1 logo to v2 :) [00:48:01] so: mwscript namespaceDupes.php sdwiki --fix --add-prefix=T186943 [00:48:32] twkozlowski: fair point :) [00:49:18] sorry: [00:49:24] mwscript namespaceDupes.php sdwiki --fix --add-suffix=T186943 [00:49:37] And also, word of notice if anyone is thinking of doing a v2 - it's /literally/ going to take ages to replace that logo :-) [00:49:56] * thcipriani runs nsdupes [00:50:05] bd808: I'm going to be pissed if the logo is being replaced earlier than 8 years from now, haha :) [00:50:21] also, s/earlier/sooner/ [00:50:47] twkozlowski, maybe not replace, but within 8 years i think we'll be adding truly 3d versions for VR readers :D [00:51:03] Oh no :-( [00:51:40] Jhs: I guess we're lucky to "have them all" in vectors then ;) [00:53:13] yup :) [00:54:33] (03CR) 10Krinkle: [C: 031] wmf-config: enable Singapore oversample as default on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417331 (https://phabricator.wikimedia.org/T188652) (owner: 10Imarlier) [00:57:01] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 23 probes of 294 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [00:57:06] Jhs: list added to task [00:57:21] thcipriani, thank you very much :) [00:57:37] sure thing, yw :) [01:02:00] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 8 probes of 294 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [01:03:33] 10Operations, 10Traffic: Turn up network links for Asia Cache DC - 
https://phabricator.wikimedia.org/T156031#4037248 (10ayounsi) [01:03:36] 10Operations, 10Traffic: Network hardware configuration for Asia Cache DC - https://phabricator.wikimedia.org/T162684#4037246 (10ayounsi) 05Open>03Resolved Devices added to Rancid & monitoring We're all done here. [01:04:14] 10Operations, 10Traffic: Enable Service in Asia Cache DC - https://phabricator.wikimedia.org/T156026#4037252 (10ayounsi) [01:04:17] 10Operations, 10Traffic: Turn up network links for Asia Cache DC - https://phabricator.wikimedia.org/T156031#2962044 (10ayounsi) 05Open>03Resolved a:03ayounsi Transit, Transport, and Peering are up. [01:16:18] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10hardware-requests, 10procurement: eqiad: (4) systems for CirrusSearch Elasticssearch replica service - https://phabricator.wikimedia.org/T187627#4037274 (10bd808) [01:25:06] (03CR) 10Krinkle: Front Thumbor instances with Haproxy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/417233 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [01:28:12] 10Operations, 10Patch-For-Review: setup/install bast1002(WMF4749) - https://phabricator.wikimedia.org/T186623#4037280 (10Dzahn) after reverting the changes to let it auto-partition again: - fails as before tried another partman recipe with "gpt" - fails as before switched SATA controller to ATA mode - fa... 
[01:52:47] 10Operations, 10Traffic, 10Performance-Team (Radar): Define turn-up process and scope for eqsin service to regional countries - https://phabricator.wikimedia.org/T189252#4037302 (10Krinkle) [01:55:37] (03CR) 10Krinkle: "https://grafana.wikimedia.org/dashboard/db/webpagereplay-desktop-alerts" [puppet] - 10https://gerrit.wikimedia.org/r/417221 (https://phabricator.wikimedia.org/T188988) (owner: 10Phedenskog) [01:55:51] (03CR) 10Krinkle: [C: 031] Icinga: Add WebPageReplay Grafana performance alerts [puppet] - 10https://gerrit.wikimedia.org/r/417221 (https://phabricator.wikimedia.org/T188988) (owner: 10Phedenskog) [02:44:20] PROBLEM - Host cp3048 is DOWN: PING CRITICAL - Packet loss = 100% [02:45:50] RECOVERY - Host cp3048 is UP: PING WARNING - Packet loss = 80%, RTA = 87.18 ms [03:22:26] (03PS1) 10Andrew Bogott: labweb striker: move from memcached to nutcracker [puppet] - 10https://gerrit.wikimedia.org/r/417503 [03:23:24] (03CR) 10Andrew Bogott: [C: 032] labweb striker: move from memcached to nutcracker [puppet] - 10https://gerrit.wikimedia.org/r/417503 (owner: 10Andrew Bogott) [03:27:10] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 73, down: 0, dormant: 0, excluded: 0, unused: 0 [03:27:30] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 711.70 seconds [03:27:31] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0 [03:36:14] 10Operations, 10Traffic, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): How to purge misc-web varnishes for wikitech changes? 
- https://phabricator.wikimedia.org/T189168#4037359 (10Andrew) 05Open>03Resolved [04:23:51] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 198.08 seconds [04:45:58] 10Operations, 10Domains, 10Traffic, 10WMF-Design, and 3 others: Create subdomain for Design and Wikimedia User Interface Style Guide - https://phabricator.wikimedia.org/T185282#3911827 (10Prtksxna) Requested {T189279} too. [04:53:19] 10Operations, 10Domains, 10Traffic, 10WMF-Design, and 3 others: Create subdomain for Design and Wikimedia User Interface Style Guide - https://phabricator.wikimedia.org/T185282#3911827 (10Gryllida) Jumping out of context here, but it could be nice to have the new site multi-lingual unlike what https://wiki... [04:56:54] 10Operations, 10Domains, 10Traffic, 10WMF-Design, and 3 others: Create subdomain for Design and Wikimedia User Interface Style Guide - https://phabricator.wikimedia.org/T185282#4037419 (10Volker_E) @Gryllida That is one of our own quests and is discussed in T164449. Please don't side-rail tasks, but rather... 
[04:57:44] !log andrew@tin Started deploy [horizon/deploy@930009e]: rebuilding venvs to avoid rogue configs, as was causing T189278 [04:58:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:58:02] T189278: Session issues on labweb horizon ('newhorizon.wikimedia.org') - https://phabricator.wikimedia.org/T189278 [05:00:42] !log andrew@tin Finished deploy [horizon/deploy@930009e]: rebuilding venvs to avoid rogue configs, as was causing T189278 (duration: 02m 59s) [05:00:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:36:20] (03PS1) 10Marostegui: db-eqiad.php: Fully repool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417639 [06:38:56] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Fully repool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417639 (owner: 10Marostegui) [06:39:05] (03PS1) 10Marostegui: site.pp: Clarify db1113 status [puppet] - 10https://gerrit.wikimedia.org/r/417643 [06:39:58] (03CR) 10Marostegui: [C: 032] site.pp: Clarify db1113 status [puppet] - 10https://gerrit.wikimedia.org/r/417643 (owner: 10Marostegui) [06:40:13] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417639 (owner: 10Marostegui) [06:40:29] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool es1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417639 (owner: 10Marostegui) [06:41:37] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore es1019 normal weight (duration: 00m 59s) [06:41:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:44:27] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037480 (10Marostegui) [06:44:33] 10Operations, 10DBA, 10cloud-services-team, 10Patch-For-Review: Failover m5 master from db1009 to db1073 - https://phabricator.wikimedia.org/T189005#4037478 
(10Marostegui) 05Open>03Resolved I am going to close this as resolved as nothing has come up. We will follow up the decommission of db1009 on T189... [06:46:16] (03PS1) 10Marostegui: db-codfw.php: Depool es2012 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417647 [06:47:54] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool es2012 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417647 (owner: 10Marostegui) [06:49:22] (03Merged) 10jenkins-bot: db-codfw.php: Depool es2012 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417647 (owner: 10Marostegui) [06:50:19] (03CR) 10jenkins-bot: db-codfw.php: Depool es2012 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417647 (owner: 10Marostegui) [06:50:57] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool es2012 for kernel and mariadb upgrade (duration: 00m 58s) [06:51:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:51:29] (03CR) 10Marostegui: [C: 04-1] labswiki: Replace 'm5-master' CNAME with backing db name (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417324 (owner: 10BryanDavis) [06:52:09] !log Stop MariaDB on es2012 to upgrade mariadb and kernel [06:52:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:02:43] (03PS1) 10Marostegui: db-codfw.php: Repool es2012, depool es2013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417658 [07:03:25] (03PS3) 10Madhuvishy: dumps: Refactor fetcher profile [puppet] - 10https://gerrit.wikimedia.org/r/416983 (https://phabricator.wikimedia.org/T188727) [07:04:27] (03CR) 10Madhuvishy: [C: 032] dumps: Refactor fetcher profile [puppet] - 10https://gerrit.wikimedia.org/r/416983 (https://phabricator.wikimedia.org/T188727) (owner: 10Madhuvishy) [07:04:40] (03CR) 10Marostegui: [C: 032] db-codfw.php: Repool es2012, depool es2013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417658 (owner: 10Marostegui) [07:05:57] (03Merged) 10jenkins-bot: db-codfw.php: Repool 
es2012, depool es2013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417658 (owner: 10Marostegui) [07:07:14] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool es2012, depool es2013 (duration: 00m 58s) [07:07:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:08:09] (03CR) 10jenkins-bot: db-codfw.php: Repool es2012, depool es2013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417658 (owner: 10Marostegui) [07:16:33] (03PS1) 10Marostegui: db-codfw.php. Repool es2013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417667 [07:18:15] (03CR) 10Marostegui: [C: 032] db-codfw.php. Repool es2013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417667 (owner: 10Marostegui) [07:21:28] (03Merged) 10jenkins-bot: db-codfw.php. Repool es2013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417667 (owner: 10Marostegui) [07:21:30] (03CR) 10jenkins-bot: db-codfw.php. Repool es2013 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417667 (owner: 10Marostegui) [07:22:29] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool es2013 (duration: 00m 58s) [07:22:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:28:14] (03PS1) 10Marostegui: db-codfw.php: Depool db2058, db2084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417675 [07:29:27] (03CR) 10jerkins-bot: [V: 04-1] db-codfw.php: Depool db2058, db2084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417675 (owner: 10Marostegui) [07:30:10] (03PS2) 10Marostegui: db-codfw.php: Depool db2058, db2084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417675 [07:33:36] !log Logging for the record: es2013 was stopped and rebooted for mariadb and kernel upgrade [07:33:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:34:54] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2058 and db2084 (duration: 00m 58s) [07:34:56] !log Stop mariadb on db2058 and db2084 
for mariadb+kernel upgrade [07:35:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:35:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:35:37] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2058, db2084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417675 (owner: 10Marostegui) [07:35:39] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2058, db2084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417675 (owner: 10Marostegui) [07:35:41] (03CR) 10jenkins-bot: db-codfw.php: Depool db2058, db2084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417675 (owner: 10Marostegui) [07:42:52] (03PS1) 10Urbanecm: Remove obsolete throttle rules, add one new [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417687 (https://phabricator.wikimedia.org/T189241) [07:47:55] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2058, db2084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417695 [07:57:19] (03CR) 10Jcrespo: "information_schema is probably not the right way to query objects on labs, specially if you plan to do it periodically- with hundreds of t" [puppet] - 10https://gerrit.wikimedia.org/r/417357 (owner: 10Bstorm) [08:04:32] (03CR) 10Jcrespo: "Performance issues could be minimized if there was 1 single query to I_S, not 800, but it should be tested to be checked." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/417357 (owner: 10Bstorm) [08:05:07] 10Operations, 10Pybal, 10Traffic, 10Patch-For-Review: Pybal stuck at BGP state OPENSENT while the other peer reached ESTABLISHED - https://phabricator.wikimedia.org/T188085#4037541 (10Vgutierrez) Checking cr2-eqiad BGP neighbor information, I realized that for lvs1006 it's showing an Open Message Error tha... 
[08:08:28] 10Operations, 10Ops-Access-Requests: Requesting access to terbium.eqiad.wmnet for bmansurov - https://phabricator.wikimedia.org/T189285#4037542 (10bmansurov) [08:13:59] (03CR) 10Jcrespo: "Bryan- I abandoned by change for a reason- I think we need to talk first to sync (I almost think I understand what you want to do, but kno" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417324 (owner: 10BryanDavis) [08:14:37] (03CR) 10Jcrespo: [C: 04-1] labswiki: Replace 'm5-master' CNAME with backing db name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417324 (owner: 10BryanDavis) [08:16:20] (03Draft1) 10Paladox: Update bazlets to upstream (includes fix for python 3) [software/gerrit/plugins/wikimedia] - 10https://gerrit.wikimedia.org/r/417713 [08:16:22] (03PS2) 10Paladox: Update bazlets to upstream (includes fix for python 3) [software/gerrit/plugins/wikimedia] - 10https://gerrit.wikimedia.org/r/417713 [08:19:04] (03PS1) 10Jcrespo: mariadb: Repool db1114 with low weight after reboot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417714 [08:23:33] (03CR) 10Jcrespo: [C: 032] mariadb: Repool db1114 with low weight after reboot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417714 (owner: 10Jcrespo) [08:24:47] (03Merged) 10jenkins-bot: mariadb: Repool db1114 with low weight after reboot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417714 (owner: 10Jcrespo) [08:26:46] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037570 (10Marostegui) >>! In T183469#4031760, @Marostegui wrote: > In order to replace db1020 (m2 master) and following: https://gerrit.wikimedia.org/... 
[08:27:46] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1114 with low load (duration: 00m 58s) [08:28:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:28:16] (03CR) 10jenkins-bot: mariadb: Repool db1114 with low weight after reboot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417714 (owner: 10Jcrespo) [08:32:25] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2058, db2084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417695 (owner: 10Marostegui) [08:32:58] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037577 (10jcrespo) I had another idea (not much dissimilar) having into account the new hosts that are coming- dumps will greatly benefit from larger... [08:33:30] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037578 (10jcrespo) [08:33:36] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2058, db2084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417695 (owner: 10Marostegui) [08:34:08] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037581 (10Marostegui) >>! In T183469#4037577, @jcrespo wrote: > I had another idea (not much dissimilar) having into account the new hosts that are co... 
[08:34:34] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037582 (10jcrespo) yes, sorry, multiinstance :-P [08:35:21] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2058 and db2084 (duration: 00m 58s) [08:35:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:37:54] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037587 (10Marostegui) I am going to move the backups of db1009 from db1113 to db1114 [08:38:18] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2058, db2084" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417695 (owner: 10Marostegui) [08:38:45] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037588 (10jcrespo) maybe es2001? It has an older directory. [08:39:24] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037590 (10Marostegui) >>! In T183469#4037588, @jcrespo wrote: > maybe es2001? It has an older directory. I wanted to avoid cross-dc transfers....but... [09:01:54] 10Operations, 10Analytics, 10DBA, 10EventBus, and 5 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4037597 (10jcrespo) [09:18:44] 10Operations, 10Analytics, 10DBA, 10EventBus, and 5 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4037613 (10jcrespo) The lower concurrency is better, but the problem it is still ongoing- it is too "bursty"- moments where many connections happ... 
[09:22:38] !log cp-misc_esams: reboot for retpoline kernel updates T188092 [09:22:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:32:35] (03PS4) 10Phedenskog: Icinga: Add WebPageReplay Grafana performance alerts [puppet] - 10https://gerrit.wikimedia.org/r/417221 (https://phabricator.wikimedia.org/T188988) [09:35:01] 10Operations, 10Pybal, 10Traffic, 10Patch-For-Review: Pybal stuck at BGP state OPENSENT while the other peer reached ESTABLISHED - https://phabricator.wikimedia.org/T188085#4037641 (10Vgutierrez) rechecking logs on lvs1006.wikimedia.org shows the following output regarding bgp for Feb 22nd: ``` vgutierrez@... [09:36:47] (03PS2) 10Gehel: wdqs: configure new servers wdqs200[4-6] [puppet] - 10https://gerrit.wikimedia.org/r/416961 (https://phabricator.wikimedia.org/T187766) [09:42:11] (03PS1) 10Jcrespo: mariadb: Increase db1114 weight on both main and api traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417759 [09:42:35] (03CR) 10Gehel: [C: 032] wdqs: configure new servers wdqs200[4-6] [puppet] - 10https://gerrit.wikimedia.org/r/416961 (https://phabricator.wikimedia.org/T187766) (owner: 10Gehel) [09:44:07] (03PS1) 10Jayprakash12345: Enable ShortUrl Extension at knwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417762 (https://phabricator.wikimedia.org/T189287) [09:46:19] (03CR) 10Jcrespo: [C: 032] mariadb: Increase db1114 weight on both main and api traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417759 (owner: 10Jcrespo) [09:46:54] (03Merged) 10jenkins-bot: mariadb: Increase db1114 weight on both main and api traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417759 (owner: 10Jcrespo) [09:48:16] (03CR) 10jenkins-bot: mariadb: Increase db1114 weight on both main and api traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417759 (owner: 10Jcrespo) [09:51:07] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1114 with normal load (duration: 00m 58s) 
[09:51:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:37] 10Operations, 10Analytics, 10DBA, 10EventBus, and 5 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4037678 (10jcrespo) In other order of things, is it normal to still get errors from 127.0.0.1, which I think it points to the older queue? I thou... [10:07:48] (03CR) 10Jayprakash12345: "To deployer: please run before "mwscript extensions/WikimediaMaintenance/createExtensionTables.php shorturl --wiki=knwikisource"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417762 (https://phabricator.wikimedia.org/T189287) (owner: 10Jayprakash12345) [10:07:50] (03PS1) 10Marostegui: site.pp: Remove comments about db1113 [puppet] - 10https://gerrit.wikimedia.org/r/417774 [10:09:16] (03CR) 10Marostegui: [C: 032] site.pp: Remove comments about db1113 [puppet] - 10https://gerrit.wikimedia.org/r/417774 (owner: 10Marostegui) [10:13:27] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037688 (10Marostegui) >>! In T183469#4037577, @jcrespo wrote: > I had another idea (not much dissimilar) having into account the new hosts that are co... [10:16:19] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037692 (10jcrespo) ok to me- with "candidate hosts" and "statement hosts", the puzzle gets more and more difficult. [10:23:02] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4037711 (10Marostegui) >>! In T183469#4037692, @jcrespo wrote: > ok to me- with "candidate hosts" and "statement hosts", the puzzle gets more and more... 
[10:28:10] (03PS1) 10Arturo Borrero Gonzalez: apt: apt-upgrade: add -x/--exclude option [puppet] - 10https://gerrit.wikimedia.org/r/417779 (https://phabricator.wikimedia.org/T181647) [10:29:16] (03CR) 10jerkins-bot: [V: 04-1] apt: apt-upgrade: add -x/--exclude option [puppet] - 10https://gerrit.wikimedia.org/r/417779 (https://phabricator.wikimedia.org/T181647) (owner: 10Arturo Borrero Gonzalez) [10:35:37] (03PS1) 10Marostegui: mariadb: Provision db1113 [puppet] - 10https://gerrit.wikimedia.org/r/417781 (https://phabricator.wikimedia.org/T184161) [10:35:39] (03PS1) 10Gehel: wdqs: comment out wdqs_internal nodes from eqiad [puppet] - 10https://gerrit.wikimedia.org/r/417782 (https://phabricator.wikimedia.org/T187766) [10:36:09] (03CR) 10jerkins-bot: [V: 04-1] wdqs: comment out wdqs_internal nodes from eqiad [puppet] - 10https://gerrit.wikimedia.org/r/417782 (https://phabricator.wikimedia.org/T187766) (owner: 10Gehel) [10:36:20] (03PS2) 10Marostegui: mariadb: Initial setup for db1113 [puppet] - 10https://gerrit.wikimedia.org/r/417781 (https://phabricator.wikimedia.org/T184161) [10:37:06] (03PS2) 10Gehel: wdqs: comment out wdqs_internal nodes from eqiad [puppet] - 10https://gerrit.wikimedia.org/r/417782 (https://phabricator.wikimedia.org/T187766) [10:37:44] (03CR) 10Gehel: [C: 032] wdqs: comment out wdqs_internal nodes from eqiad [puppet] - 10https://gerrit.wikimedia.org/r/417782 (https://phabricator.wikimedia.org/T187766) (owner: 10Gehel) [10:42:03] (03PS3) 10Marostegui: mariadb: Initial setup for db1113 [puppet] - 10https://gerrit.wikimedia.org/r/417781 (https://phabricator.wikimedia.org/T184161) [10:47:38] (03CR) 10Marostegui: [C: 032] mariadb: Initial setup for db1113 [puppet] - 10https://gerrit.wikimedia.org/r/417781 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [10:48:05] 10Operations, 10Pybal, 10Traffic, 10Patch-For-Review: Pybal stuck at BGP state OPENSENT while the other peer reached ESTABLISHED - 
https://phabricator.wikimedia.org/T188085#4037730 (10Vgutierrez) 05Open>03Invalid a:03Vgutierrez [10:58:49] (03PS5) 10ArielGlenn: cheap image dump script that might be ok for wikitech [dumps] - 10https://gerrit.wikimedia.org/r/417009 (https://phabricator.wikimedia.org/T188915) [11:02:46] 10Operations, 10Pybal, 10Traffic: Tune systemd journal rate limiting for PyBal - https://phabricator.wikimedia.org/T189290#4037747 (10Vgutierrez) [11:04:00] 10Operations, 10DBA, 10MediaWiki-General-or-Unknown, 10MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), and 2 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#4037759 (10jcrespo) As a followup, and out of scope of this ticket- are... [11:05:45] (03PS1) 10Marostegui: s5,s6.hosts: Add db1113 [software] - 10https://gerrit.wikimedia.org/r/417796 (https://phabricator.wikimedia.org/T184161) [11:07:40] (03CR) 10Marostegui: [C: 032] s5,s6.hosts: Add db1113 [software] - 10https://gerrit.wikimedia.org/r/417796 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [11:08:19] (03Merged) 10jenkins-bot: s5,s6.hosts: Add db1113 [software] - 10https://gerrit.wikimedia.org/r/417796 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [11:09:06] 10Operations, 10Pybal, 10Traffic: Tune systemd journal rate limiting for PyBal - https://phabricator.wikimedia.org/T189290#4037765 (10Vgutierrez) p:05Triage>03Normal [11:10:30] (03PS1) 10Marostegui: db-eqiad.php: Depool db1051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417799 (https://phabricator.wikimedia.org/T184161) [11:12:14] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417799 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [11:13:23] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417799 (https://phabricator.wikimedia.org/T184161) (owner: 
10Marostegui) [11:14:19] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417799 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [11:14:47] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1051 to clone db1113 - T184161 (duration: 00m 58s) [11:14:55] !log Stop MySQL on db1051 to clone db1113 - https://phabricator.wikimedia.org/T184161 [11:15:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:02] T184161: Productionize 2 new eqiad database servers - https://phabricator.wikimedia.org/T184161 [11:15:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:43] (03PS2) 10Arturo Borrero Gonzalez: apt: apt-upgrade: add -x/--exclude option [puppet] - 10https://gerrit.wikimedia.org/r/417779 (https://phabricator.wikimedia.org/T181647) [11:28:34] (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Add initial config db1113 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417812 (https://phabricator.wikimedia.org/T184161) [11:30:36] (03CR) 10Jcrespo: [C: 031] db-eqiad,db-codfw.php: Add initial config db1113 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417812 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [11:31:50] (03CR) 10Marostegui: [C: 032] db-eqiad,db-codfw.php: Add initial config db1113 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417812 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [11:33:27] (03Merged) 10jenkins-bot: db-eqiad,db-codfw.php: Add initial config db1113 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417812 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [11:33:42] (03CR) 10jenkins-bot: db-eqiad,db-codfw.php: Add initial config db1113 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417812 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [11:34:48] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: 
Add initial config for db1113 multiinstance - T184161 (duration: 00m 58s) [11:35:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:35:05] T184161: Productionize 2 new eqiad database servers - https://phabricator.wikimedia.org/T184161 [11:35:51] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Add initial config for db1113 multiinstance - T184161 (duration: 00m 58s) [11:35:58] 10Operations, 10DBA, 10MediaWiki-General-or-Unknown, 10MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), and 2 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#4037808 (10EddieGP) >>! In T176754#4037759, @jcrespo wrote: > As a foll... [11:36:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:36:47] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417814 [11:36:51] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417814 [11:37:02] (03CR) 10Marostegui: [C: 04-2] "Wait for db1051 to be back up" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417814 (owner: 10Marostegui) [11:37:05] 10Operations, 10DBA, 10MediaWiki-General-or-Unknown, 10MW-1.31-release-notes (WMF-deploy-2018-03-13 (1.31.0-wmf.25)), and 2 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#4037814 (10jcrespo) That makes sense, thank you. [11:39:20] (03CR) 10Arturo Borrero Gonzalez: [C: 032] apt: apt-upgrade: add -x/--exclude option [puppet] - 10https://gerrit.wikimedia.org/r/417779 (https://phabricator.wikimedia.org/T181647) (owner: 10Arturo Borrero Gonzalez) [11:43:17] 10Operations, 10Electron-PDFs, 10Proton, 10Readers-Web-Backlog, and 3 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4037819 (10phuedx) 05Open>03stalled >>! 
In T186748#4028238, @phuedx wrote: > I've reached out to SRE and Services to clarify where we are... [11:58:24] 10Operations, 10Electron-PDFs, 10Proton, 10Readers-Web-Backlog, and 3 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#3954012 (10Peter) FYI: Watch out for bug this introduced in 64/65 (depending on version, I got it in 65 but according to the report it is in... [12:04:09] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417814 (owner: 10Marostegui) [12:05:39] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417814 (owner: 10Marostegui) [12:07:00] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db1051 after cloning db1113:3315- T184161 (duration: 00m 58s) [12:07:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:13] T184161: Productionize 2 new eqiad database servers - https://phabricator.wikimedia.org/T184161 [12:07:43] 10Operations, 10Electron-PDFs, 10Proton, 10Readers-Web-Backlog, and 3 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4037873 (10phuedx) Thanks for the heads up, @Peter! 
[12:08:02] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1051 after cloning db1113:3315- T184161 (duration: 00m 58s) [12:08:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:25] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417814 (owner: 10Marostegui) [12:11:12] !log dropping test databases on dbstore2* instances [12:11:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:11:36] (03PS1) 10Marostegui: db-eqiad.php: Depool db1063 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417832 (https://phabricator.wikimedia.org/T184161) [12:12:29] I would like to do that everywhere, but I will start with that only on non-production [12:14:05] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1063 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417832 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [12:15:19] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1063 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417832 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [12:16:39] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1063 to clone db1113:3316 - T184161 (duration: 00m 58s) [12:16:43] !log Stop mysql on db1063 to clone db1113:3316 - T184161 [12:16:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:16:56] T184161: Productionize 2 new eqiad database servers - https://phabricator.wikimedia.org/T184161 [12:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:18:11] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1063 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417832 (https://phabricator.wikimedia.org/T184161) (owner: 10Marostegui) [12:26:14] !log Compress s5 on db1113:3315 - T184161 [12:26:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:26:30] 
T184161: Productionize 2 new eqiad database servers - https://phabricator.wikimedia.org/T184161 [12:34:11] 10Operations, 10ops-eqsin, 10Traffic, 10netops: Setup eqsin RIPE Atlas anchor - https://phabricator.wikimedia.org/T179042#4037912 (10BBlack) Oh makes sense, maybe the initial image install just has the v4 and RIPE has to configure the v6 during their bringup process? [12:35:47] 10Operations, 10ops-eqsin, 10Traffic, 10netops: Setup eqsin RIPE Atlas anchor - https://phabricator.wikimedia.org/T179042#4037931 (10faidon) That is correct to my knowledge -- that was the case with our other anchors. [12:39:02] RECOVERY - Check systemd state on kafka1012 is OK: OK - running: The system is fully operational [12:39:51] RECOVERY - Check systemd state on kafka1014 is OK: OK - running: The system is fully operational [12:40:21] RECOVERY - Check systemd state on kafka1020 is OK: OK - running: The system is fully operational [12:40:21] RECOVERY - Check systemd state on kafka1013 is OK: OK - running: The system is fully operational [12:40:31] RECOVERY - Check systemd state on kafka1023 is OK: OK - running: The system is fully operational [12:41:03] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey: Upgrade mw* servers to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T174431#4037933 (10MoritzMuehlenhoff) [12:41:05] !log manually executed systemctl reset-failed to some old (not present anymore) units on kafka analytics hosts [12:41:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:21] RECOVERY - Check systemd state on kafka1022 is OK: OK - running: The system is fully operational [12:41:30] 10Operations, 10HHVM, 10User-Elukey: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4037934 (10MoritzMuehlenhoff) p:05Triage>03High [12:42:19] (03CR) 10Paladox: Add repository configuration for thirdparty/php72 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/415857 (owner: 
10Muehlenhoff) [13:06:58] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey: Upgrade mw* servers to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T174431#4038047 (10MoritzMuehlenhoff) [13:07:03] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey: Provide a forward port of ICU 52 for stretch / Investigate best ICU update strategy - https://phabricator.wikimedia.org/T177498#4038041 (10MoritzMuehlenhoff) 05Open>03Resolved The packages have been built and tests have been made, closing. The ta... [13:11:34] (03PS2) 10Mark Bergsma: [WiP] Split off attributes and exceptions from bgp.py into their own modules [debs/pybal] - 10https://gerrit.wikimedia.org/r/416985 [13:12:19] !log Compress s6 on db1113:3316 - T184161 [13:12:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:36] T184161: Productionize 2 new eqiad database servers - https://phabricator.wikimedia.org/T184161 [13:20:05] 10Operations, 10HHVM, 10User-Elukey: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4038068 (10jhsoby) [13:20:10] 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey: Provide a forward port of ICU 52 for stretch / Investigate best ICU update strategy - https://phabricator.wikimedia.org/T177498#4038069 (10jhsoby) [13:20:44] (03Restored) 10Niedzielski: New: add chromium_render service [puppet] - 10https://gerrit.wikimedia.org/r/409996 (https://phabricator.wikimedia.org/T178166) (owner: 10Niedzielski) [13:21:10] (03PS1) 10Jcrespo: mariadb: Allow recovering arbitrary backups by providing a path [puppet] - 10https://gerrit.wikimedia.org/r/417876 [13:21:53] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Allow recovering arbitrary backups by providing a path [puppet] - 10https://gerrit.wikimedia.org/r/417876 (owner: 10Jcrespo) [13:40:54] (03CR) 10Gilles: Front Thumbor instances with Haproxy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/417233 
(https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [13:52:28] (03PS8) 10Gilles: Front Thumbor instances with Haproxy [puppet] - 10https://gerrit.wikimedia.org/r/417233 (https://phabricator.wikimedia.org/T187765) [13:52:56] (03CR) 10jerkins-bot: [V: 04-1] Front Thumbor instances with Haproxy [puppet] - 10https://gerrit.wikimedia.org/r/417233 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [13:54:06] (03PS9) 10Gilles: Front Thumbor instances with Haproxy [puppet] - 10https://gerrit.wikimedia.org/r/417233 (https://phabricator.wikimedia.org/T187765) [13:57:15] (03PS2) 10Herron: fix puppetdb terminus package conflict in ::puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/417459 (https://phabricator.wikimedia.org/T177253) [13:57:58] (03CR) 10jerkins-bot: [V: 04-1] fix puppetdb terminus package conflict in ::puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/417459 (https://phabricator.wikimedia.org/T177253) (owner: 10Herron) [13:58:14] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 3 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#3994382 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['wdqs1004.eqiad.wmnet'] ``` The log can be found in `/var/log/w... [14:00:31] (03CR) 10Gilles: "A bit better, avoids the empty profile and is in line with how nginx is invoked." 
[puppet] - 10https://gerrit.wikimedia.org/r/417233 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [14:01:42] (03CR) 10Gilles: "https://puppet-compiler.wmflabs.org/compiler02/10385/" [puppet] - 10https://gerrit.wikimedia.org/r/417233 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [14:06:38] (03PS1) 10Gehel: wdqs: migrate to stretch [puppet] - 10https://gerrit.wikimedia.org/r/417899 (https://phabricator.wikimedia.org/T188045) [14:07:55] (03PS3) 10Herron: fix puppetdb terminus package conflict in ::puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/417459 (https://phabricator.wikimedia.org/T177253) [14:08:36] (03CR) 10jerkins-bot: [V: 04-1] fix puppetdb terminus package conflict in ::puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/417459 (https://phabricator.wikimedia.org/T177253) (owner: 10Herron) [14:09:04] (03CR) 10Herron: [V: 032 C: 032] "> modules/role/manifests/puppetmaster/standalone.pp:50 wmf-style: Found hiera call in class 'role::puppetmaster::standalone' for 'puppetdb" [puppet] - 10https://gerrit.wikimedia.org/r/417459 (https://phabricator.wikimedia.org/T177253) (owner: 10Herron) [14:17:35] (03CR) 10Muehlenhoff: [C: 031] wdqs: migrate to stretch [puppet] - 10https://gerrit.wikimedia.org/r/417899 (https://phabricator.wikimedia.org/T188045) (owner: 10Gehel) [14:17:48] moritzm: thanks! [14:18:12] * gehel guesses that moritzm is happy to see more stretch... [14:20:49] it's as if you can read my mind! 
[14:24:34] (03PS2) 10Jcrespo: mariadb: Allow recovering arbitrary backups by providing a path [puppet] - 10https://gerrit.wikimedia.org/r/417876 [14:24:51] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Allow recovering arbitrary backups by providing a path [puppet] - 10https://gerrit.wikimedia.org/r/417876 (owner: 10Jcrespo) [14:27:26] (03PS3) 10Jcrespo: mariadb: Allow recovering arbitrary backups by providing a path [puppet] - 10https://gerrit.wikimedia.org/r/417876 [14:42:10] (03PS2) 10Gehel: wdqs: migrate to stretch [puppet] - 10https://gerrit.wikimedia.org/r/417899 (https://phabricator.wikimedia.org/T188045) [14:43:13] (03CR) 10Gehel: [C: 032] wdqs: migrate to stretch [puppet] - 10https://gerrit.wikimedia.org/r/417899 (https://phabricator.wikimedia.org/T188045) (owner: 10Gehel) [14:45:08] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 3 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4038243 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['wdqs1004.eqiad.wmnet'] ``` The log can be found in `/var/log/w... [14:53:52] 10Operations, 10Analytics, 10DBA, 10EventBus, and 5 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4034945 (10Pchelolo) > In other order of things, is it normal to still get errors from 127.0.0.1, which I think it points to the older queue? I t... [14:54:49] 10Operations, 10Traffic, 10Performance-Team (Radar): Define turn-up process and scope for eqsin service to regional countries - https://phabricator.wikimedia.org/T189252#4038250 (10BBlack) [14:56:26] 10Operations, 10Traffic, 10Performance-Team (Radar): Define turn-up process and scope for eqsin service to regional countries - https://phabricator.wikimedia.org/T189252#4036653 (10BBlack) Updated with actual target country lists above. 
Process and batching of this for actual turn-up work still TODO :) [14:57:37] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#4038252 (10Pchelolo) [14:58:00] (03CR) 1020after4: [C: 031] "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/417226 (https://phabricator.wikimedia.org/T180628) (owner: 10Alexandros Kosiaris) [14:58:33] (03CR) 10jerkins-bot: [V: 04-1] scap::target: Install git-lfs [puppet] - 10https://gerrit.wikimedia.org/r/417226 (https://phabricator.wikimedia.org/T180628) (owner: 10Alexandros Kosiaris) [14:58:42] 10Operations, 10Traffic: WP Zero workarounds for eqsin - https://phabricator.wikimedia.org/T189250#4038254 (10BBlack) [15:02:01] (03PS1) 10Andrew Bogott: labweb wikitech: change hostname from 'newwikitech' to 'wikitech' [puppet] - 10https://gerrit.wikimedia.org/r/417925 (https://phabricator.wikimedia.org/T168470) [15:02:27] PROBLEM - Disk space on wdqs1004 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [15:02:28] PROBLEM - dhclient process on wdqs1004 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [15:03:02] 10Operations, 10Analytics, 10DBA, 10EventBus, and 5 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4038274 (10jcrespo) Thanks, that last comment was indeed *very very* useful. > one-by-one and that means each job establishes a new connection... [15:04:00] wdqs1004 (alert above) is being reimaged, not sure why icinga warns us anyway... 
[15:04:04] (03PS1) 10Andrew Bogott: wikitech: move from silver to misc-web backed by labweb [dns] - 10https://gerrit.wikimedia.org/r/417926 (https://phabricator.wikimedia.org/T168470) [15:04:56] (03CR) 10Andrew Bogott: [C: 032] labweb wikitech: change hostname from 'newwikitech' to 'wikitech' [puppet] - 10https://gerrit.wikimedia.org/r/417925 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [15:05:27] RECOVERY - Disk space on wdqs1004 is OK: DISK OK [15:05:28] RECOVERY - dhclient process on wdqs1004 is OK: PROCS OK: 0 processes with command name dhclient [15:09:09] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 3 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4038279 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['wdqs1004.eqiad.wmnet'] ``` and were **ALL** successful. [15:11:54] !log cp-upload_esams: reboot for retpoline kernel updates T188092 [15:12:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:44] 10Operations, 10DBA, 10Goal: Generate consistent logical database backups in CODFW - https://phabricator.wikimedia.org/T184699#4038285 (10jcrespo) [15:14:25] 10Operations, 10Analytics, 10DBA, 10EventBus, and 5 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4038288 (10Pchelolo) > Proxy/connection pool is something that we are going to use for crossdc connections, so it was already in the backlog. Gr... 
[15:15:15] !log installing sensible-utils security update on trusty (Debian already fixed) [15:15:26] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1063" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417935 [15:15:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:06] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1063" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417935 (owner: 10Marostegui) [15:18:41] (03PS3) 10Bstorm: wiki replicas: script index creation for easier maintenance [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) [15:18:51] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1063" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417935 (owner: 10Marostegui) [15:19:07] 10Operations, 10DBA, 10Patch-For-Review: Create less overhead on bacula jobs when dumping production databases - https://phabricator.wikimedia.org/T162789#4038292 (10jcrespo) 05Open>03Resolved The previous ticket was closed, other things, like coordionaton and monitoring previously mentioned will be hand... 
[15:19:10] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1063" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417935 (owner: 10Marostegui) [15:20:23] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1063 after cloning db1113:3316 - T184161 (duration: 00m 58s) [15:20:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:38] PROBLEM - IPsec on cp2008 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6 [15:20:38] PROBLEM - IPsec on cp2014 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6 [15:20:38] PROBLEM - IPsec on cp1073 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:20:38] PROBLEM - IPsec on cp1048 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:20:38] PROBLEM - IPsec on cp2024 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6 [15:20:38] T184161: Productionize 2 new eqiad database servers - https://phabricator.wikimedia.org/T184161 [15:20:38] PROBLEM - IPsec on cp2005 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6 [15:20:47] PROBLEM - IPsec on kafka-jumbo1001 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:20:47] PROBLEM - IPsec on cp1071 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:20:47] PROBLEM - IPsec on kafka1012 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:20:48] PROBLEM - IPsec on kafka-jumbo1002 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:20:48] PROBLEM - IPsec on kafka-jumbo1006 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:20:48] that's me, sorry about the spam ^ [15:20:54] 10Operations, 10DBA, 10monitoring, 10Patch-For-Review: Create script to monitor db dumps for backups are successful (and if not, old backups are not deleted) - https://phabricator.wikimedia.org/T151999#4038297 
(10jcrespo) This is partially implemented on T184696, if backups fail, they are not rotated. The... [15:20:57] PROBLEM - IPsec on cp1050 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:20:57] PROBLEM - IPsec on kafka-jumbo1003 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:20:57] PROBLEM - IPsec on cp2026 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6 [15:20:57] PROBLEM - IPsec on cp2020 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6 [15:20:57] PROBLEM - IPsec on cp1063 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:20:57] PROBLEM - IPsec on kafka-jumbo1005 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:20:58] PROBLEM - IPsec on kafka1013 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:20:58] PROBLEM - IPsec on kafka1020 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:21:07] PROBLEM - IPsec on cp1099 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:21:07] PROBLEM - IPsec on cp2022 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6 [15:21:08] PROBLEM - IPsec on cp1049 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:21:08] PROBLEM - IPsec on cp2017 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6 [15:21:08] PROBLEM - IPsec on cp1074 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:21:17] PROBLEM - IPsec on cp2002 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6 [15:21:17] PROBLEM - IPsec on cp1072 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:21:17] PROBLEM - IPsec on cp1062 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:21:18] PROBLEM - IPsec on cp2011 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6 
[15:21:18] PROBLEM - IPsec on kafka-jumbo1004 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:21:18] PROBLEM - IPsec on kafka1014 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:21:18] PROBLEM - IPsec on kafka1023 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:21:27] PROBLEM - IPsec on kafka1022 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6 [15:21:28] PROBLEM - IPsec on cp1064 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3034_v4, cp3034_v6 [15:21:42] 10Operations, 10DBA, 10monitoring, 10Patch-For-Review: Create script to monitor db dumps for backups are successful - https://phabricator.wikimedia.org/T151999#4038299 (10jcrespo) [15:21:48] RECOVERY - IPsec on kafka-jumbo1006 is OK: Strongswan OK - 136 ESP OK [15:21:59] RECOVERY - IPsec on cp1050 is OK: Strongswan OK - 66 ESP OK [15:21:59] RECOVERY - IPsec on kafka-jumbo1003 is OK: Strongswan OK - 136 ESP OK [15:21:59] RECOVERY - IPsec on cp2026 is OK: Strongswan OK - 92 ESP OK [15:21:59] RECOVERY - IPsec on cp2020 is OK: Strongswan OK - 92 ESP OK [15:21:59] RECOVERY - IPsec on cp1063 is OK: Strongswan OK - 66 ESP OK [15:21:59] RECOVERY - IPsec on kafka-jumbo1005 is OK: Strongswan OK - 136 ESP OK [15:21:59] RECOVERY - IPsec on kafka1013 is OK: Strongswan OK - 136 ESP OK [15:21:59] RECOVERY - IPsec on kafka1020 is OK: Strongswan OK - 136 ESP OK [15:22:07] RECOVERY - IPsec on cp1099 is OK: Strongswan OK - 66 ESP OK [15:22:08] RECOVERY - IPsec on cp1049 is OK: Strongswan OK - 66 ESP OK [15:22:08] RECOVERY - IPsec on cp2022 is OK: Strongswan OK - 92 ESP OK [15:22:08] RECOVERY - IPsec on cp2017 is OK: Strongswan OK - 92 ESP OK [15:22:17] RECOVERY - IPsec on cp1074 is OK: Strongswan OK - 66 ESP OK [15:22:17] RECOVERY - IPsec on cp2002 is OK: Strongswan OK - 92 ESP OK [15:22:18] RECOVERY - IPsec on cp1072 is OK: Strongswan OK - 66 ESP OK [15:22:18] RECOVERY - IPsec on cp1062 is OK: Strongswan OK - 
66 ESP OK [15:22:18] RECOVERY - IPsec on cp2011 is OK: Strongswan OK - 92 ESP OK [15:22:27] RECOVERY - IPsec on kafka-jumbo1004 is OK: Strongswan OK - 136 ESP OK [15:22:27] RECOVERY - IPsec on kafka1014 is OK: Strongswan OK - 136 ESP OK [15:22:27] RECOVERY - IPsec on kafka1023 is OK: Strongswan OK - 136 ESP OK [15:22:28] RECOVERY - IPsec on kafka1022 is OK: Strongswan OK - 136 ESP OK [15:22:28] RECOVERY - IPsec on cp1064 is OK: Strongswan OK - 66 ESP OK [15:22:37] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 3 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4038301 (10Gehel) Bad news... I just finished reimaging wdqs1004, and I still have trouble. SSH session suddenly / randomly freezes. I can't see the same link flapping that @Dzahn s... [15:22:38] RECOVERY - IPsec on cp1073 is OK: Strongswan OK - 66 ESP OK [15:22:38] RECOVERY - IPsec on cp2008 is OK: Strongswan OK - 92 ESP OK [15:22:38] RECOVERY - IPsec on cp2014 is OK: Strongswan OK - 92 ESP OK [15:22:38] RECOVERY - IPsec on cp1048 is OK: Strongswan OK - 66 ESP OK [15:22:47] RECOVERY - IPsec on cp2024 is OK: Strongswan OK - 92 ESP OK [15:22:47] RECOVERY - IPsec on cp2005 is OK: Strongswan OK - 92 ESP OK [15:22:47] RECOVERY - IPsec on kafka-jumbo1001 is OK: Strongswan OK - 136 ESP OK [15:22:48] RECOVERY - IPsec on cp1071 is OK: Strongswan OK - 66 ESP OK [15:22:48] RECOVERY - IPsec on kafka1012 is OK: Strongswan OK - 136 ESP OK [15:22:48] RECOVERY - IPsec on kafka-jumbo1002 is OK: Strongswan OK - 136 ESP OK [15:26:14] 10Operations, 10DBA: Puppetize grants for mysql hosts that are the source of recovery (dbstore, passive misc) - https://phabricator.wikimedia.org/T111929#4038314 (10jcrespo) [15:27:15] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4038321 (10Marostegui) db1113:3315 and db1113:3316 are now compressing tables. 
I will pool this host on Monday and if it all goes fine for 24h, I will... [15:30:02] !log installing zsh security update on trusty [15:30:08] (03CR) 10Bstorm: wiki replicas: script index creation for easier maintenance (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [15:30:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:21] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 codfw machines - https://phabricator.wikimedia.org/T183470#4038328 (10jcrespo) [15:30:25] 10Operations, 10DBA, 10Patch-For-Review: Followup for TLS MariaDB server roll-out - https://phabricator.wikimedia.org/T157702#4038327 (10jcrespo) [15:32:35] 10Operations, 10DBA, 10Goal: Generate consistent logical database backups in CODFW - https://phabricator.wikimedia.org/T184699#4038342 (10jcrespo) [15:33:13] 10Operations, 10ops-esams, 10Traffic: cp3034: Uncorrectable Memory Error - https://phabricator.wikimedia.org/T189305#4038347 (10ema) [15:33:48] 10Operations, 10ops-esams, 10Traffic: cp3034: Uncorrectable Memory Error - https://phabricator.wikimedia.org/T189305#4038382 (10ema) p:05Triage>03Normal [15:34:25] (03PS1) 1020after4: Bump scap package version to 3.7.7-1 [puppet] - 10https://gerrit.wikimedia.org/r/417943 (https://phabricator.wikimedia.org/T189306) [15:35:37] 10Operations, 10Analytics, 10DBA, 10EventBus, and 5 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4038391 (10jcrespo) > I think it would be much better do it on your side of the "fence" I can own this no problem, but if I do, I will ask to re... 
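[note] The IPsec alert burst between 15:20:38 and 15:21:28 above names the same two unreachable tunnels (cp3034_v4, cp3034_v6) on every host. A minimal sketch of how such a burst could be collapsed into one line per missing peer follows. The regular expression is inferred only from the alert lines shown in this log, and the function name `group_by_missing_peer` is hypothetical; it is not part of Icinga or of any Wikimedia tooling.

```python
import re
from collections import defaultdict

# Pattern inferred from the PROBLEM lines in this log; other checks may differ.
ALERT_RE = re.compile(
    r"PROBLEM - IPsec on (?P<host>\S+) is CRITICAL: "
    r"Strongswan CRITICAL - ok: (?P<ok>\d+) not-conn: (?P<peers>.+)$"
)


def group_by_missing_peer(lines):
    """Map each unreachable peer to the sorted list of hosts reporting it."""
    by_peer = defaultdict(set)
    for line in lines:
        match = ALERT_RE.search(line)
        if not match:
            continue  # RECOVERY lines and unrelated messages are ignored
        for tunnel in match.group("peers").split(","):
            # "cp3034_v4" -> "cp3034": drop the address-family suffix.
            peer = tunnel.strip().rsplit("_", 1)[0]
            by_peer[peer].add(match.group("host"))
    return {peer: sorted(hosts) for peer, hosts in by_peer.items()}


# Three lines copied from the log above (timestamps removed).
sample = [
    "PROBLEM - IPsec on cp2008 is CRITICAL: Strongswan CRITICAL - ok: 90 not-conn: cp3034_v4, cp3034_v6",
    "PROBLEM - IPsec on kafka1012 is CRITICAL: Strongswan CRITICAL - ok: 134 not-conn: cp3034_v4, cp3034_v6",
    "RECOVERY - IPsec on cp2008 is OK: Strongswan OK - 92 ESP OK",
]
print(group_by_missing_peer(sample))
```

Grouped this way, the whole burst reduces to a single entry for cp3034, which matches the reboot and the later memory-error ticket for that host.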
[15:37:52] (03PS3) 10Mark Bergsma: [WiP] Split off attributes and exceptions from bgp.py into their own modules [debs/pybal] - 10https://gerrit.wikimedia.org/r/416985 [15:38:08] 10Operations, 10Packaging, 10Scap, 10Patch-For-Review: Install git-lfs client (at least on scap targets & masters) - https://phabricator.wikimedia.org/T180628#4038395 (10mmodell) [15:40:41] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pdu phase inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339#4038396 (10Marostegui) is this still an issue? [15:42:29] 10Operations, 10Analytics, 10DBA, 10EventBus, and 6 others: High (2-3x) write and connection load on enwiki databases - https://phabricator.wikimedia.org/T189204#4038398 (10jcrespo) [15:45:08] (03PS1) 10Giuseppe Lavagetto: systemd: various fixes [puppet] - 10https://gerrit.wikimedia.org/r/417947 [15:45:10] (03PS1) 10Giuseppe Lavagetto: systemd: add define specific to timers [puppet] - 10https://gerrit.wikimedia.org/r/417948 [15:45:59] (03CR) 10jerkins-bot: [V: 04-1] systemd: various fixes [puppet] - 10https://gerrit.wikimedia.org/r/417947 (owner: 10Giuseppe Lavagetto) [15:46:08] <_joe_> I know, gerrit [15:46:09] (03CR) 10jerkins-bot: [V: 04-1] systemd: add define specific to timers [puppet] - 10https://gerrit.wikimedia.org/r/417948 (owner: 10Giuseppe Lavagetto) [15:47:36] 10Operations, 10Pybal, 10Traffic: Tune systemd journal rate limiting for PyBal - https://phabricator.wikimedia.org/T189290#4038412 (10Vgutierrez) pybal emits 854 messages during a restart in lvs1006. Also during a restart is when appears to log at its fastest rate, achieving almost 400 lines per second: ```v... 
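[note] The figures quoted in T189290 above (854 messages during a restart, at close to 400 lines per second) can be checked against a journald rate limit with simple arithmetic. The sketch below assumes the limit is expressed as a maximum number of messages per interval, as with the `RateLimitIntervalSec=` and `RateLimitBurst=` settings in journald.conf. The values 30 s and 1000 are placeholders for illustration, not the values configured on the lvs hosts, and the evenly spaced timestamps are synthetic. journald counts per fixed interval, so the sliding window used here is a conservative upper bound.

```python
from collections import deque


def max_in_window(timestamps, window_s):
    """Largest number of messages falling inside any window of window_s seconds."""
    window = deque()
    peak = 0
    for t in sorted(timestamps):
        window.append(t)
        # Keep only messages inside the half-open window (t - window_s, t].
        while window[0] <= t - window_s:
            window.popleft()
        peak = max(peak, len(window))
    return peak


def would_be_rate_limited(timestamps, interval_s=30.0, burst=1000):
    """True if more than `burst` messages arrive within `interval_s` seconds.

    interval_s and burst are placeholder values (assumption), to be replaced
    with whatever the host's journald.conf actually sets.
    """
    return max_in_window(timestamps, interval_s) > burst


# Synthetic restart: 854 messages evenly spaced at 400 lines per second.
restart = [i / 400.0 for i in range(854)]
print(max_in_window(restart, 30.0))
print(would_be_rate_limited(restart))
print(would_be_rate_limited(restart, burst=500))
```

With these placeholder numbers the whole restart fits in one interval, so whether messages are dropped depends only on whether the configured burst is above or below 854.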
[15:51:40] !log andrew@tin Started deploy [horizon/deploy@f59f568]: rolling out a fix for T188458 [15:51:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:51:57] T188458: Newhorizon errors on sudo policy changes - https://phabricator.wikimedia.org/T188458 [15:54:51] !log andrew@tin Finished deploy [horizon/deploy@f59f568]: rolling out a fix for T188458 (duration: 03m 11s) [15:55:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:30] (03CR) 10Muehlenhoff: Add repository configuration for thirdparty/php72 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/415857 (owner: 10Muehlenhoff) [15:56:02] (03PS4) 10Muehlenhoff: Add repository configuration for thirdparty/php72 [puppet] - 10https://gerrit.wikimedia.org/r/415857 [15:57:45] 10Operations, 10Puppet, 10Patch-For-Review: compile/diff catalogs between puppetdb v2 (production) and puppetdb v4 - https://phabricator.wikimedia.org/T188544#4038449 (10herron) Rhodium (depooled) is now using the puppetdb v4 backend puppetdb1001. Since the puppetdbquery version in production is incompati... [15:57:51] (03PS2) 10Andrew Bogott: wikitech: move from silver to misc-web backed by labweb [dns] - 10https://gerrit.wikimedia.org/r/417926 (https://phabricator.wikimedia.org/T168470) [15:58:42] 10Operations, 10Pybal, 10Traffic: Tune systemd journal rate limiting for PyBal - https://phabricator.wikimedia.org/T189290#4038450 (10Vgutierrez) number of lines logged on a restart it's directly proportional to the number of services configured, lvs1010 appears to be the pybal instance with more services co... 
[15:59:07] !log ppchelko@tin Started deploy [cpjobqueue/deploy@5795526]: Enable root_claim_ttl for refreshLinks T189303 [15:59:23] !log moving wikitech dns record to point to misc-web and the new labweb cluster, https://gerrit.wikimedia.org/r/#/c/417926/ [15:59:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:59:24] T189303: Support claimTTL and rootClaimTTL in change-prop - https://phabricator.wikimedia.org/T189303 [15:59:33] (03CR) 10Andrew Bogott: [C: 032] wikitech: move from silver to misc-web backed by labweb [dns] - 10https://gerrit.wikimedia.org/r/417926 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [15:59:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:59:46] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@5795526]: Enable root_claim_ttl for refreshLinks T189303 (duration: 00m 38s) [16:00:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:04] andrewbogott: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Wikitech maintenance deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180309T1600). [16:00:04] No GERRIT patches in the queue for this window AFAICS. [16:03:44] !log test log to new labweb/wikitech cluster [16:03:45] andrewbogott: Failed to log message to wiki. Somebody should check the error logs. [16:03:49] dang [16:04:23] 10Operations, 10ops-esams, 10Traffic: cp3034: Uncorrectable Memory Error - https://phabricator.wikimedia.org/T189305#4038347 (10BBlack) See also T183177 (why aren't we getting runtime icinga alerts when these happen, via EDAC?) [16:07:10] !log what about now, can you log now? [16:07:11] andrewbogott: Failed to log message to wiki. Somebody should check the error logs. 
[16:07:58] !log bblack@neodymium conftool action : set/pooled=no; selector: name=cp3034.esams.wmnet [16:08:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:24] !log testing logging on new wikitech setup [16:08:54] 10Operations, 10ops-esams, 10Traffic: cp3034: Uncorrectable Memory Error - https://phabricator.wikimedia.org/T189305#4038496 (10BBlack) Also, depooled for now, since we can't trust the uncorrected memory errors not causing production issues: `16:07 <+logmsgbot> !log bblack@neodymium conftool action : set/poo... [16:12:42] (03PS2) 10Elukey: statistics::rsync::eventlogging: change rsync target to eventlog1002 [puppet] - 10https://gerrit.wikimedia.org/r/417322 (https://phabricator.wikimedia.org/T114199) [16:14:54] !log test log [16:15:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:18] bd808: worked! ^ [16:16:46] (03CR) 10Elukey: [C: 032] statistics::rsync::eventlogging: change rsync target to eventlog1002 [puppet] - 10https://gerrit.wikimedia.org/r/417322 (https://phabricator.wikimedia.org/T114199) (owner: 10Elukey) [16:18:43] (03CR) 10Bstorm: "If this can just be run on the labsdb hosts with the right effect, that might be better on a lot of levels. 
Having a script like this als" [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [16:22:52] (03PS1) 10Andrew Bogott: Revert "multiversion: add a transitional mapping for newwikitech.wikimedia.org" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417974 [16:23:05] (03CR) 10Muehlenhoff: [C: 032] Add repository configuration for thirdparty/php72 [puppet] - 10https://gerrit.wikimedia.org/r/415857 (owner: 10Muehlenhoff) [16:23:10] (03CR) 10Andrew Bogott: [C: 031] "Let's give this a few days before cleaning everything up" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417974 (owner: 10Andrew Bogott) [16:23:15] (03PS5) 10Muehlenhoff: Add repository configuration for thirdparty/php72 [puppet] - 10https://gerrit.wikimedia.org/r/415857 [16:23:44] moritzm: nice! [16:24:31] <_joe_> moritzm: <3 [16:30:40] (03CR) 10Zoranzoki21: "rebase" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416708 (https://phabricator.wikimedia.org/T189022) (owner: 10Zoranzoki21) [16:34:32] (03PS1) 10Elukey: Reduce references and roles for eventlog1001 [puppet] - 10https://gerrit.wikimedia.org/r/417978 (https://phabricator.wikimedia.org/T114199) [16:41:45] (03CR) 10Elukey: "Pcc for eventlog1001 seems indicating that admin perms are removed, maybe this change is too aggressive? :D" [puppet] - 10https://gerrit.wikimedia.org/r/417978 (https://phabricator.wikimedia.org/T114199) (owner: 10Elukey) [16:45:14] 10Operations, 10DBA: Icinga MariaDB disk space check on silver checks the wrong partition - https://phabricator.wikimedia.org/T151491#4038593 (10jcrespo) a:03Andrew Close or invalid, probably? 
[16:47:55] (03PS10) 10Gilles: Front Thumbor instances with Haproxy [puppet] - 10https://gerrit.wikimedia.org/r/417233 (https://phabricator.wikimedia.org/T187765) [16:47:58] nice @ php72 [16:50:41] (03PS4) 10Bstorm: wiki replicas: script index creation for easier maintenance [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) [16:51:27] (03CR) 10Gilles: "https://puppet-compiler.wmflabs.org/compiler02/10387/thumbor1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/417233 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [16:51:58] (03CR) 10Ottomata: [C: 031] Reduce references and roles for eventlog1001 [puppet] - 10https://gerrit.wikimedia.org/r/417978 (https://phabricator.wikimedia.org/T114199) (owner: 10Elukey) [16:56:03] (03PS2) 10Elukey: Reduce references and roles for eventlog1001 [puppet] - 10https://gerrit.wikimedia.org/r/417978 (https://phabricator.wikimedia.org/T114199) [16:57:46] (03CR) 10Jcrespo: "Let's talk on IRC sometime next week, I have comments regarding on problems you are likely to find that Cloud should be aware (real ones t" [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [16:58:14] (03PS5) 10Bstorm: wiki replicas: script index creation for easier maintenance [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) [16:59:29] (03CR) 10Bstorm: "> Let's talk on IRC sometime next week, I have comments regarding on" [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [17:00:18] (03CR) 10Jcrespo: "BTW, I don't say it enough, but I <3 any automation effort, it is easy to not acknoledge that and focus only on the problems. 
Sorry about " [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [17:01:16] (03CR) 10Bstorm: "Added labsdb hosts as potential location for the script and another way to check for index." [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [17:01:31] (03PS6) 10Bstorm: wiki replicas: script index creation for easier maintenance [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) [17:04:23] (03PS2) 10Giuseppe Lavagetto: systemd: various fixes [puppet] - 10https://gerrit.wikimedia.org/r/417947 [17:04:25] (03PS2) 10Giuseppe Lavagetto: systemd: add define specific to timers [puppet] - 10https://gerrit.wikimedia.org/r/417948 [17:08:15] (03PS3) 10Giuseppe Lavagetto: systemd: various fixes [puppet] - 10https://gerrit.wikimedia.org/r/417947 [17:10:50] (03CR) 10Elukey: [C: 04-2] "This thing would remove also the following:" [puppet] - 10https://gerrit.wikimedia.org/r/417978 (https://phabricator.wikimedia.org/T114199) (owner: 10Elukey) [17:10:56] (03Abandoned) 10Elukey: Reduce references and roles for eventlog1001 [puppet] - 10https://gerrit.wikimedia.org/r/417978 (https://phabricator.wikimedia.org/T114199) (owner: 10Elukey) [17:10:58] (03CR) 10Giuseppe Lavagetto: [C: 032] systemd: various fixes [puppet] - 10https://gerrit.wikimedia.org/r/417947 (owner: 10Giuseppe Lavagetto) [17:22:22] (03PS5) 10Giuseppe Lavagetto: hhvm::admin: remove inclusion of apache::mod::proxy_fcgi [puppet] - 10https://gerrit.wikimedia.org/r/415573 [17:29:25] 10Operations, 10Cloud-Services, 10Cloud-VPS: Silver anomalies - https://phabricator.wikimedia.org/T151486#4038703 (10Andrew) [17:29:28] 10Operations, 10DBA: Icinga MariaDB disk space check on silver checks the wrong partition - https://phabricator.wikimedia.org/T151491#4038701 (10Andrew) 05Open>03Invalid Yep, this is moot as silver is about to be switched off. 
[17:29:49] (03CR) 10Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler02/10391/" [puppet] - 10https://gerrit.wikimedia.org/r/415573 (owner: 10Giuseppe Lavagetto) [17:31:05] 10Operations, 10DBA: Icinga MariaDB disk space check on silver checks the wrong partition - https://phabricator.wikimedia.org/T151491#4038719 (10jcrespo) Andrew- I will not do that, but you may want to search open tickets with the keyworkd "silver" or "wikitech"- probably you will be able to get rid of a lot o... [17:32:41] !log andrew@tin Started deploy [horizon/deploy@9c234d6]: Another try at fixing T188458 [17:32:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:32:47] T188458: Newhorizon errors on sudo policy changes - https://phabricator.wikimedia.org/T188458 [17:35:40] !log andrew@tin Finished deploy [horizon/deploy@9c234d6]: Another try at fixing T188458 (duration: 03m 00s) [17:35:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:38:22] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4038733 (10RobH) So this sees the disk errors still. In reviewing the installer log (P6827) with @MoritzMuehlenh... [17:41:39] PROBLEM - Host wdqs2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:43:45] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 3 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4038737 (10Gehel) Note that Icinga also has random failed checks (size of conntrack table, ferm, dpkg, ...) all with "Return code of 255 is out of bounds". The service (blazegraph)... [17:44:09] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 3 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4038738 (10RobH) So I can see @gehel asked: >>! 
In T188045#4009517, @Gehel wrote: > @Cmjohnson this looks like an issue with the physical connection. Could you try moving the cable... [17:47:48] 10Operations, 10Patch-For-Review: setup/install bast1002(WMF4749) - https://phabricator.wikimedia.org/T186623#4038757 (10Dzahn) from /var/log/syslog from installer shell ``` Mar 9 01:25:26 kernel: [ 72.856447] ata2.00: configured for UDMA/133 Mar 9 01:25:26 kernel: [ 72.856460] ata2: EH complete Mar... [17:53:00] 10Operations, 10HHVM, 10User-Elukey: ICU 57 migration for wikis using non-default collation - https://phabricator.wikimedia.org/T189295#4038764 (10Reedy) [17:58:16] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4038785 (10Dzahn) I bet it's the controller rather than all the disks then, yep. But sounds all good, thanks! [18:06:02] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 3 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4038807 (10Gehel) @Cmjohnson did message me about changing the port as well, so that's probably not it. Side note if it wasnt clear from the gerrit change above, OS also has been u... [18:07:19] jouncebot: next [18:07:19] In 64 hour(s) and 52 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180312T1100) [18:08:59] ACKNOWLEDGEMENT - Host wdqs2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Gehel https://phabricator.wikimedia.org/T189318 [18:10:29] Is gerrit being shit? [18:11:00] Reedy what do you mean? [18:11:04] won't load [18:11:08] loads for me [18:11:26] ok now [18:11:28] stupid thing [18:11:30] wfm [18:11:32] ok [18:11:38] * Nemo_bis slow [18:11:54] Nemo_bis it dosen't load for you either? 
[18:12:18] it works; I tested too late; I'm slow [18:12:30] ah ok [18:14:00] 10Operations, 10Puppet, 10Patch-For-Review: Upgrade PuppetDB to version 4.4 - https://phabricator.wikimedia.org/T177253#4038834 (10herron) [18:14:04] 10Operations, 10Puppet, 10Patch-For-Review: Build a pair of debian stretch PuppetDB servers - https://phabricator.wikimedia.org/T185499#4038831 (10herron) 05Open>03Resolved a:03herron puppetdb[1]2001 are online and running puppetdb 4.4 with postgres db and nginx https frontend [18:14:27] (03PS1) 10Rxy: Change autoconfirmed settings and Enable flood group at zhwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/418008 (https://phabricator.wikimedia.org/T189289) [18:26:38] !log reedy@tin Synchronized php-1.31.0-wmf.24/extensions/AbuseFilter/includes/AbuseFilter.class.php: Unbreak AbuseFilter tagging T189299 (duration: 00m 59s) [18:26:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:47] T189299: Abusefilter did not successfully tag the edits - https://phabricator.wikimedia.org/T189299 [19:09:41] PROBLEM - MariaDB Slave Lag: s4 on db2090 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 349.35 seconds [19:19:32] !log restarted my script on tin, now with more aggressive writes [19:19:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:51] RECOVERY - Host wdqs2006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.78 ms [19:31:50] PROBLEM - MD RAID on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds [19:37:00] PROBLEM - configured eth on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds [19:37:10] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 603.71 seconds [19:38:40] PROBLEM - dhclient process on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds [19:40:20] PROBLEM - puppet last run on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds [19:43:04] * foks 
!log changed global email for User:Mathmensch [19:43:12] !log changed global email for User:Mathmensch [19:43:15] that was odd. [19:43:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:43:50] PROBLEM - DPKG on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds [19:45:31] PROBLEM - Disk space on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds [19:49:31] (03PS1) 10Dzahn: add wmf4727 as bast1003 [dns] - 10https://gerrit.wikimedia.org/r/418034 (https://phabricator.wikimedia.org/T186623) [19:57:55] (03CR) 10Dzahn: [C: 032] add wmf4727 as bast1003 [dns] - 10https://gerrit.wikimedia.org/r/418034 (https://phabricator.wikimedia.org/T186623) (owner: 10Dzahn) [20:01:40] PROBLEM - NTP on labtestvirt2003 is CRITICAL: NTP CRITICAL: No response from NTP server [20:02:20] RECOVERY - MariaDB Slave Lag: s4 on db2090 is OK: OK slave_sql_lag Replication lag: 27.55 seconds [20:05:14] PROBLEM - IPMI Sensor Status on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds [20:06:17] (03PS1) 10Dzahn: add wmf4727 as bast1003.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/418036 (https://phabricator.wikimedia.org/T186623) [20:11:47] (03CR) 1020after4: [C: 031] "Thanks @Muehlenhoff! This is awesome!" 
[puppet] - 10https://gerrit.wikimedia.org/r/415856 (owner: 10Muehlenhoff) [20:15:21] RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 273.13 seconds [21:03:10] (03PS1) 10Andrew Bogott: wikitech: move wikitech-static backup job from silver to labweb1001 [puppet] - 10https://gerrit.wikimedia.org/r/418057 (https://phabricator.wikimedia.org/T168470) [21:05:04] (03CR) 10Andrew Bogott: [C: 032] wikitech: move wikitech-static backup job from silver to labweb1001 [puppet] - 10https://gerrit.wikimedia.org/r/418057 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [21:25:53] (03PS1) 10Framawiki: Change NS aliases on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/418070 (https://phabricator.wikimedia.org/T189277) [21:29:53] (03PS1) 10Andrew Bogott: wikitech-static sync: tidy up the dump process [puppet] - 10https://gerrit.wikimedia.org/r/418071 (https://phabricator.wikimedia.org/T168470) [21:30:44] (03CR) 10Andrew Bogott: [C: 032] wikitech-static sync: tidy up the dump process [puppet] - 10https://gerrit.wikimedia.org/r/418071 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [21:32:02] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 3 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4039309 (10Gehel) @Cmjohnson I'll be vacation starting March 18, and I would be more relax if I knew our wdqs eqiad clsuter has its usual 3 nodes. Coud you already rack one of the n... 
[21:32:52] (03PS1) 10Andrew Bogott: wikitech_static_sync: fix broken file dependency [puppet] - 10https://gerrit.wikimedia.org/r/418073 [21:33:31] (03CR) 10Andrew Bogott: [C: 032] wikitech_static_sync: fix broken file dependency [puppet] - 10https://gerrit.wikimedia.org/r/418073 (owner: 10Andrew Bogott) [21:36:40] (03CR) 10Framawiki: [C: 031] Remove obsolete throttle rules, add one new [mediawiki-config] - 10https://gerrit.wikimedia.org/r/417687 (https://phabricator.wikimedia.org/T189241) (owner: 10Urbanecm) [21:42:50] (03PS1) 10Umherirrender: Fix typo in word compatibility [mediawiki-config] - 10https://gerrit.wikimedia.org/r/418128 [21:47:07] (03PS1) 10Andrew Bogott: wikitech-static-sync: Fix vhost for modern Apache [puppet] - 10https://gerrit.wikimedia.org/r/418130 (https://phabricator.wikimedia.org/T168470) [21:48:00] (03CR) 10Andrew Bogott: [C: 032] wikitech-static-sync: Fix vhost for modern Apache [puppet] - 10https://gerrit.wikimedia.org/r/418130 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [21:49:20] PROBLEM - Host wdqs2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [22:12:07] (03CR) 10Zoranzoki21: [C: 031] "Looks good" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/418008 (https://phabricator.wikimedia.org/T189289) (owner: 10Rxy) [22:58:34] I'll deploy a fix for a trivial ReadingLists bug [23:04:30] (03CR) 10Dzahn: [C: 032] add wmf4727 as bast1003.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/418036 (https://phabricator.wikimedia.org/T186623) (owner: 10Dzahn) [23:09:42] 10Operations, 10Ops-Access-Requests, 10Analytics, 10Research, and 2 others: Restricting access for a collaboration nearing completion - https://phabricator.wikimedia.org/T189341#4039520 (10DarTar) [23:13:04] (03PS1) 10Dzahn: DHCP: add MAC of wmf4727 as bast1003 [puppet] - 10https://gerrit.wikimedia.org/r/418157 (https://phabricator.wikimedia.org/T186623) [23:13:53] (03PS2) 10Dzahn: DHCP: add MAC of wmf4727 as bast1003 [puppet] - 
10https://gerrit.wikimedia.org/r/418157 (https://phabricator.wikimedia.org/T186623) [23:14:09] (03CR) 10Dzahn: [C: 032] "https://racktables.wikimedia.org/index.php?page=object&tab=default&object_id=2892" [puppet] - 10https://gerrit.wikimedia.org/r/418157 (https://phabricator.wikimedia.org/T186623) (owner: 10Dzahn) [23:26:58] (03PS7) 10Bstorm: wiki replicas: script index creation for easier maintenance [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) [23:27:27] (03CR) 10jerkins-bot: [V: 04-1] wiki replicas: script index creation for easier maintenance [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [23:29:04] !log tgr@tin Synchronized php-1.31.0-wmf.24/extensions/ReadingLists/src/Api/ApiQueryReadingListEntries.php: T189272 fix stupid ReadingLists typo breaking production (duration: 00m 54s) [23:29:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:29:11] T189272: Error: Deleted row returned in non-changes mode - https://phabricator.wikimedia.org/T189272 [23:29:59] (03CR) 10Bstorm: "This version also includes a change that allows the script to actually check current indexes that are named in the config file. If the co" [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [23:31:16] (03PS8) 10Bstorm: wiki replicas: script index creation for easier maintenance [puppet] - 10https://gerrit.wikimedia.org/r/417357 (https://phabricator.wikimedia.org/T181650) [23:47:00] PROBLEM - Long running screen/tmux on labtestvirt2003 is CRITICAL: Return code of 255 is out of bounds