[00:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate Evening SWAT (Max 8 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171121T0000).
[00:00:05] RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:45] jouncebot: you are advanced AI. you nailed it.
[00:00:59] but for 99/1 splits, or easy revertibility, the versioning has to overlap in compatibility. e.g. the ores cache has versions 1, 2, and 3 of its data definition. And then you have a series of releases that kinda code code:0.2, data:2, code:0.3, data:3 to overlap their compatibility
[00:01:11] harrr
[00:01:11] I'm here and I can do my own SWAT
[00:01:17] (code:0.2 is compatible with data:1 and data:2, code:0.3 is compatible with data:2 and data:3, etc)
[00:01:49] I really appreciate the help here, I’m proud to have spent my last evening with deployment rights with you all. I now must trod through the rainy night streets.
[00:02:22] given the long timescales of our mysql schema changes, I assume we've already solved that problem for basic MW<->mysql stuff
[00:02:32] but there's other datastores
[00:02:35] RoanKattouw: No swat plz.
[00:03:59] no_justification: I will point out that the change I listed is a two-character JS change https://gerrit.wikimedia.org/r/#/c/392479/1/resources/src/mediawiki.rcfilters/ui/mw.rcfilters.ui.ItemMenuOptionWidget.js . If the answer is still no, I'll reschedule for tomorrow
[00:04:17] I don't trust any changes from any developers anymore
[00:04:27] * no_justification sets `scap lock` with a timeout of infinity
[00:04:29] lol no_justification has the thousand-yard stare
[00:04:41] Well if releng is being unreasonable I guess I gotta live with that
[00:04:49] RoanKattouw: I can't stop you ;-)
[00:04:54] o/
[00:04:57] * no_justification is super grumpy today
[00:04:58] lol yes you can. I'm not a root anymore, remember
[00:05:09] RoanKattouw: I was gonna ask about your root status :)
[00:05:18] I gave that up years ago!
[00:05:20] RoanKattouw: `scap lock --all` is group writable, so yes you can :p
[00:05:25] haha
[00:05:30] Oh interesting I didn't realize that
[00:05:36] So it's an advisory lock rather than an enforced one
[00:05:37] ?
[00:05:52] Yep. Locking deployments is meant to stop accidents, not a determined actor
[00:05:53] well, in an emergency we don't want a lock with some random person's ownership blocking a fix
[00:06:07] Oh yes that's a good point, you need to be able to unlock in emergencies
[00:06:07] Yeah, that ^
[00:06:21] eg: I lock, go afk for the day, you come behind me....
[00:06:21] (03Draft1) 10Paladox: planet: Replace div with a [puppet] - 10https://gerrit.wikimedia.org/r/392543
[00:06:23] (03Draft2) 10Paladox: planet: Replace div with a [puppet] - 10https://gerrit.wikimedia.org/r/392543
[00:06:33] Heh, replace div with a puppet.
[00:07:04] Or someone locks during an outage, then gets hit by a bus
[00:07:05] Who needs divs when you got a puppet
[00:07:18] RoanKattouw: or a train
[00:07:29] Right, of the non-deployment variety
[00:07:51] no_justification its a fix for planets new ui :)
[00:07:58] http://planet-hotdog.wmflabs.org
[00:08:07] * Platonides sets a cron running scap lock --all every minute
[00:08:20] Platonides: You'd spam IRC ;-)
[00:08:24] H,,
[00:08:45] greg-g: Could we have a morning SWAT (11am PDT) window tomorrow please? There isn't one because it's a Tuesday, but there's also no train to dodge
[00:08:50] RoanKattouw: Eh, go ahead
[00:08:59] well, let's add a conditional to only do that if the lock is missing ;)
[00:09:06] OK in that case belay that, I can deploy my patch now
[00:09:07] I just instinctively said no because deployments bit me today and I'm feeling rather spiteful.
[00:09:09] <3
[00:09:09] vcoleman____: this is a ping to see if victoria can see it
[00:09:32] (she's trouble shooting something)
[00:09:53] no_justification: That's alright. What else happened today aside from the ORES thing just now?
[00:10:18] people getting in the way of his train deploy
[00:10:36] A train that's already late, I'll add!
[00:11:19] Oh that would be why I couldn't add this follow-up to the morning SWAT I guess
[00:11:27] Oh well, I'm not celebrating thanksgiving anyway, I'll just deploy it when everyone's gone and stuffing themselves with turkey and shit.
[00:12:01] (it was a follow-up for a half-functional patch I had put in the morning SWAT, so it naturally was at the back of the line and I was told it was dropped because the morning SWAT went over time or something)
[00:13:32] * no_justification adds "the train" to the blame wheel
[00:14:39] bblack, https://mariadb.org/wp-content/uploads/2017/04/As-Of-Tech-Feature-Overview.pdf
[00:16:39] honestly, with as huge a role as the deployment train plays in our lives, we're not using it for puns enough
[00:17:03] e.g. "no_justification is not in the office today, he got hit by the deployment train"
[00:17:04] Because puns are the lowest form of comedy
[00:17:38] "this feature fell off the back of the deployment train"
[00:18:01] mostly I want to use "get off my train" for reverts
[00:18:25] * no_justification stabs everyone with a railroad spike
[00:18:28] See, train puns!
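(Editor's note on the 00:05 to 00:09 exchange about `scap lock`: the sketch below illustrates the general idea of an advisory lock that anyone in the group can clear, plus the "only lock if the lock is missing" conditional joked about at 00:08:59. It is not scap's implementation. The function names and lock file layout are hypothetical and chosen only for illustration.)

```python
import os


def acquire_advisory_lock(path, owner):
    """Create the lock file only if it is missing.

    Hypothetical helper, not part of scap. Returns True if this call
    created the lock and False if a lock was already present.
    """
    try:
        # O_EXCL makes creation atomic: the call fails if the file exists.
        # 0o664 is the requested mode; the process umask may narrow it.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o664)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(owner)
    return True


def lock_holder(path):
    """Return the recorded owner, or None when no lock file exists."""
    try:
        with open(path) as handle:
            return handle.read()
    except FileNotFoundError:
        return None


def release_advisory_lock(path):
    """Remove the lock regardless of who created it.

    Nothing here checks ownership, which is what makes the lock advisory:
    it guards against accidents, not against a determined actor.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False
```

(In this sketch a second `acquire_advisory_lock` call on the same path returns False instead of overwriting the first owner, and `release_advisory_lock` succeeds for any caller with write access to the directory, which matches the emergency-unlock concern raised at 00:05:53.)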
[00:19:23] https://upload.wikimedia.org/wikipedia/commons/1/19/Train_wreck_at_Montparnasse_1895.jpg
[00:19:31] !log catrope@tin Synchronized php-1.31.0-wmf.8/resources/src/mediawiki.rcfilters/ui/mw.rcfilters.ui.ItemMenuOptionWidget.js: T180863 (duration: 00m 49s)
[00:19:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:19:38] T180863: [wmf.7] "Excluded" label is displayed with filter selection - https://phabricator.wikimedia.org/T180863
[00:19:45] jynus: how the heck did someone manage that one
[00:20:13] https://en.wikipedia.org/wiki/List_of_rail_accidents_(2010%E2%80%93present)#2017
[00:20:27] ^ we still have a better track record than physical trains :)
[00:20:58] Zppix: my guess is booze
[00:21:13] no_justification: "ill park my train.... here"
[00:21:13] Zppix, are you telling me you do not recognize that photograph?
[00:21:20] jynus: should i?
[00:21:27] bblack: For the whole world?
[00:22:03] jynus: https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys
[00:22:32] Best train photo ever tbh
[00:22:43] "The Lévy and Sons photograph (at the top of the article) has become one of the most famous in transportation history."
[00:22:52] Antoine tried to change it but I reverted his vandalism
[00:22:52] https://en.wikipedia.org/wiki/Montparnasse_derailment
[00:24:27] What if the mw deployment train was a real train...
[00:25:03] I have deployed code from a train many times, actually, at 350Km/h
[00:25:31] I have also deployed from a train, but a significantly slower one.
[00:25:41] The only ban is deploying from ferries.
[00:25:57] no_justification: cause they get sick?
[00:26:03] lol
[00:26:58] No, because the ferry went out of range and said deployer lost their wifi connection ;-)
[00:27:23] I suppose deployment from ISS is probably inadvisable as well but it hasn't come up yet.
[00:29:08] I mean your surrounded by satellite so the internet should be fast no [00:34:13] if one of us has the opportunity to deploy from the ISS how could we turn that down? [00:36:11] legoktm: https://imgflip.com/i/1zqbma [00:44:35] legoktm: Probably too busy taking photos, tbh [00:49:57] 10Operations, 10Gerrit, 10Traffic, 10Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3776546 (10Dzahn) Maybe Planet can be the guinea pig. [00:59:20] (03PS3) 10Ayounsi: Reserve internal anycast range [dns] - 10https://gerrit.wikimedia.org/r/391131 [01:01:03] (03PS3) 10Paladox: Gerrit: Move apache resources to the profile instead of gerrit::proxy [puppet] - 10https://gerrit.wikimedia.org/r/392494 [01:03:13] (03CR) 10Dzahn: [C: 032] planet: Replace div with a [puppet] - 10https://gerrit.wikimedia.org/r/392543 (owner: 10Paladox) [01:03:29] thanks :) [01:04:21] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6479 [01:05:21] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3474904 keys, up 5 minutes 11 seconds - replication_delay is 0 [01:07:14] deploys from bus [01:14:17] 10Operations, 10Traffic: Define 3-host infra cluster for traffic pops - https://phabricator.wikimedia.org/T96852#3776588 (10Krinkle) [01:16:36] (03CR) 10Krinkle: keys: Note that Chris, Mark and Markus are no longer releasing new versions (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392462 (https://phabricator.wikimedia.org/T180615) (owner: 10Legoktm) [01:17:22] (03Draft1) 10Paladox: planet: Decrease size for h2 font to 1.5em [puppet] - 10https://gerrit.wikimedia.org/r/392551 [01:17:26] (03PS2) 10Paladox: planet: Decrease size for h2 font to 1.5em [puppet] - 10https://gerrit.wikimedia.org/r/392551 [01:18:40] 10Operations, 10Traffic: Lower geodns TTLs from 600 (10min) to 300 
(5min) - https://phabricator.wikimedia.org/T140365#3776597 (10Krinkle) [01:19:28] (03CR) 10Dzahn: [C: 032] planet: Decrease size for h2 font to 1.5em [puppet] - 10https://gerrit.wikimedia.org/r/392551 (owner: 10Paladox) [01:19:33] thanks :) [01:19:55] PROBLEM - MariaDB disk space on db2085 is CRITICAL: DISK CRITICAL - free space: /srv 225644 MB (5% inode=99%) [01:20:48] (03CR) 10Krinkle: [C: 031] keys: Document usage of gpg --fetch-keys to import all keys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392464 (owner: 10Legoktm) [01:22:06] (03Draft1) 10Paladox: planet: Fix syntax error with font size [puppet] - 10https://gerrit.wikimedia.org/r/392552 [01:22:10] (03PS2) 10Paladox: planet: Fix syntax error with font size [puppet] - 10https://gerrit.wikimedia.org/r/392552 [01:24:38] (03CR) 10Dzahn: [C: 032] planet: Fix syntax error with font size [puppet] - 10https://gerrit.wikimedia.org/r/392552 (owner: 10Paladox) [01:38:11] (03PS2) 10Legoktm: keys: Note that Chris, Mark and Markus are no longer releasing new versions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392462 (https://phabricator.wikimedia.org/T180615) [01:38:13] (03PS2) 10Legoktm: keys: Remove keys of former release managers from keys.txt [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392463 [01:38:15] (03PS2) 10Legoktm: keys: Document usage of gpg --fetch-keys to import all keys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392464 [01:38:44] (03CR) 10Legoktm: keys: Note that Chris, Mark and Markus are no longer releasing new versions (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392462 (https://phabricator.wikimedia.org/T180615) (owner: 10Legoktm) [01:43:38] (03PS1) 10Smalyshev: Enable configuration for aliasing namespaces [puppet] - 10https://gerrit.wikimedia.org/r/392554 (https://phabricator.wikimedia.org/T181016) [01:44:01] (03CR) 10jerkins-bot: [V: 04-1] Enable configuration for aliasing namespaces [puppet] - 10https://gerrit.wikimedia.org/r/392554 
(https://phabricator.wikimedia.org/T181016) (owner: 10Smalyshev) [01:47:25] (03PS2) 10Smalyshev: Enable configuration for aliasing namespaces [puppet] - 10https://gerrit.wikimedia.org/r/392554 (https://phabricator.wikimedia.org/T181016) [01:47:47] (03CR) 10jerkins-bot: [V: 04-1] Enable configuration for aliasing namespaces [puppet] - 10https://gerrit.wikimedia.org/r/392554 (https://phabricator.wikimedia.org/T181016) (owner: 10Smalyshev) [01:50:26] (03PS3) 10Smalyshev: Enable configuration for aliasing namespaces [puppet] - 10https://gerrit.wikimedia.org/r/392554 (https://phabricator.wikimedia.org/T181016) [01:50:46] (03PS4) 10Smalyshev: Enable configuration for aliasing namespaces [puppet] - 10https://gerrit.wikimedia.org/r/392554 (https://phabricator.wikimedia.org/T181016) [01:50:48] (03CR) 10jerkins-bot: [V: 04-1] Enable configuration for aliasing namespaces [puppet] - 10https://gerrit.wikimedia.org/r/392554 (https://phabricator.wikimedia.org/T181016) (owner: 10Smalyshev) [01:52:34] (03PS9) 10TerraCodes: $wmf* -> $wmg* [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956) [01:52:36] (03PS5) 10Smalyshev: Enable configuration for aliasing namespaces [puppet] - 10https://gerrit.wikimedia.org/r/392554 (https://phabricator.wikimedia.org/T181016) [01:53:02] PROBLEM - Disk space on ms-be1039 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdj1 is not accessible: Input/output error [01:54:01] PROBLEM - HP RAID on ms-be1039 is CRITICAL: CRITICAL: Slot 3: Failed: 1I:1:4 - OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK [01:54:06] ACKNOWLEDGEMENT - HP RAID on ms-be1039 is CRITICAL: CRITICAL: Slot 3: Failed: 1I:1:4 - OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: 
https://phabricator.wikimedia.org/T181020 [01:54:10] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T181020#3776680 (10ops-monitoring-bot) [02:00:41] PROBLEM - puppet last run on ms-be1039 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[mountpoint-/srv/swift-storage/sdj1] [02:01:11] RECOVERY - Disk space on ms-be1039 is OK: DISK OK [02:02:44] (03PS1) 10Krinkle: noc: Remove more unused styles/images and update Vector [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392557 [02:03:23] !log smalyshev@tin Started deploy [wdqs/wdqs@7d951d2]: Restore categories vocabulary to V003 [02:03:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:11:24] RECOVERY - MariaDB disk space on db2085 is OK: DISK OK [02:23:06] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.7) (duration: 05m 42s) [02:23:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:24:21] PROBLEM - Nginx local proxy to apache on mw2108 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:12] RECOVERY - Nginx local proxy to apache on mw2108 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.199 second response time [02:44:12] (03CR) 10Krinkle: [C: 032] noc: Remove more unused styles/images and update Vector [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392557 (owner: 10Krinkle) [02:45:20] (03Merged) 10jenkins-bot: noc: Remove more unused styles/images and update Vector [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392557 (owner: 10Krinkle) [02:46:54] (03CR) 10jenkins-bot: noc: Remove more unused styles/images and update Vector [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392557 (owner: 10Krinkle) [02:48:17] !log krinkle@tin Synchronized docroot/noc: clean up (duration: 00m 49s) [02:48:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:54:40] will 
reboot the phabricator server in a few seconds [02:55:48] !log phab1001 (phabricator prod) reboot for kernel upgrade [02:55:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:58:24] (03PS1) 10Krinkle: multiversion: Assume --wiki=aawiki for purgeUrls.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392561 [02:58:51] !log phabricator back up [02:58:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:07:02] (03CR) 10Dzahn: "T174782 is resolved meanwhile. should this be merged now?" [puppet] - 10https://gerrit.wikimedia.org/r/375347 (https://phabricator.wikimedia.org/T113842) (owner: 10Tpt) [03:08:01] I knew i was getting off too easy with language converter [03:09:05] (03CR) 10Dzahn: "per comment from Moritz, remove the entire rule since we just need localhost?" [puppet] - 10https://gerrit.wikimedia.org/r/376024 (owner: 10Giuseppe Lavagetto) [03:19:15] (03PS1) 10Dzahn: mw_rc_irc: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/392562 [03:21:41] (03PS2) 10Dzahn: mw_rc_irc: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/392562 [03:22:29] (03CR) 10Dzahn: [C: 032] mw_rc_irc: use ::profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/392562 (owner: 10Dzahn) [03:24:26] (03CR) 10Dzahn: "no-op on kraz.wikimedia.org" [puppet] - 10https://gerrit.wikimedia.org/r/392562 (owner: 10Dzahn) [03:25:01] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 799.11 seconds [03:26:19] (03PS1) 10Dzahn: test: use profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/392563 [03:27:21] (03PS2) 10Dzahn: test: use profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/392563 [03:34:52] (03CR) 10Dzahn: [C: 032] test: use profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/392563 (owner: 10Dzahn) [03:51:21] (03PS1) 10Dzahn: ganeti: create profiles, split 
monitoring/firewall classes [puppet] - 10https://gerrit.wikimedia.org/r/392564 [03:51:42] (03CR) 10jerkins-bot: [V: 04-1] ganeti: create profiles, split monitoring/firewall classes [puppet] - 10https://gerrit.wikimedia.org/r/392564 (owner: 10Dzahn) [03:53:52] (03PS2) 10Dzahn: ganeti: create profiles, split monitoring/firewall classes [puppet] - 10https://gerrit.wikimedia.org/r/392564 [03:56:12] (03CR) 10Dzahn: [C: 031] apache: Add http2 to mod [puppet] - 10https://gerrit.wikimedia.org/r/392495 (owner: 10Paladox) [03:59:22] PROBLEM - HHVM rendering on mw2132 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:00:12] RECOVERY - HHVM rendering on mw2132 is OK: HTTP OK: HTTP/1.1 200 OK - 76422 bytes in 0.438 second response time [04:06:11] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 149.10 seconds [04:07:22] (03CR) 10Krinkle: [C: 031] keys: Note that Chris, Mark and Markus are no longer releasing new versions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392462 (https://phabricator.wikimedia.org/T180615) (owner: 10Legoktm) [04:07:43] (03CR) 10Krinkle: [C: 031] keys: Remove keys of former release managers from keys.txt [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392463 (owner: 10Legoktm) [04:08:21] (03CR) 10Krinkle: "What's the easiest way to verify/confirm these short hashes?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392461 (owner: 10Legoktm) [04:09:26] (03CR) 10Legoktm: "I separated each key into its own file, and then ran `gpg --import key1.asc` etc. and copied it out of the gpg output." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/392461 (owner: 10Legoktm) [05:13:03] (03PS1) 10Brian Wolff: Enable Timeless everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392576 (https://phabricator.wikimedia.org/T154371) [06:14:40] (03PS1) 10Marostegui: db-eqiad.php: Restore original weight for db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392579 (https://phabricator.wikimedia.org/T180700) [06:17:07] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Restore original weight for db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392579 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [06:18:23] (03Merged) 10jenkins-bot: db-eqiad.php: Restore original weight for db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392579 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [06:18:32] (03CR) 10jenkins-bot: db-eqiad.php: Restore original weight for db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392579 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [06:19:34] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore db1082 original weight - T177208 (duration: 00m 49s) [06:19:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:19:48] T177208: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208 [06:22:53] (03PS1) 10Marostegui: db-eqiad.php: Depool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392580 (https://phabricator.wikimedia.org/T178359) [06:24:32] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392580 (https://phabricator.wikimedia.org/T178359) (owner: 10Marostegui) [06:24:37] /win 23 [06:25:43] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392580 (https://phabricator.wikimedia.org/T178359) (owner: 10Marostegui) [06:26:59] (03CR) 
10jenkins-bot: db-eqiad.php: Depool db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392580 (https://phabricator.wikimedia.org/T178359) (owner: 10Marostegui) [06:27:05] !log Stop MySQL on db1096 to clone db1101.s5 - T178359 [06:27:05] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1096 - T178359 (duration: 00m 48s) [06:27:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:27:12] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359 [06:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:31:20] 10Operations, 10Commons, 10Multimedia, 10Traffic, and 3 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3776876 (10Tgr) [06:31:22] PROBLEM - puppet last run on mw2222 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/mwrepl] [06:32:32] PROBLEM - puppet last run on mw2154 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/etc/modprobe.d/nf_conntrack.conf] [06:42:55] (03PS1) 10Marostegui: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392582 (https://phabricator.wikimedia.org/T179106) [06:45:00] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392582 (https://phabricator.wikimedia.org/T179106) (owner: 10Marostegui) [06:46:02] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392582 (https://phabricator.wikimedia.org/T179106) (owner: 10Marostegui) [06:46:38] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392583 [06:47:02] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392582 (https://phabricator.wikimedia.org/T179106) (owner: 10Marostegui) [06:47:15] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1087 - T179106 (duration: 00m 48s) [06:47:16] !log Remove index wb_terms_language from db1087 - https://phabricator.wikimedia.org/T179106 [06:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:47:22] T179106: Drop the "wb_terms.wb_terms_language" index - https://phabricator.wikimedia.org/T179106 [06:47:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:47:57] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392583 (owner: 10Marostegui) [06:49:08] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392583 (owner: 10Marostegui) [06:50:07] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1087 - T179106 (duration: 00m 48s) [06:50:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:50:13] (03CR) 10jenkins-bot: Revert 
"db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392583 (owner: 10Marostegui) [06:51:52] (03PS1) 10Marostegui: db-eqiad.php: Pool db1109 and db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392584 (https://phabricator.wikimedia.org/T180700) [06:53:31] (03PS2) 10Marostegui: db-eqiad.php: Pool db1109 and db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392584 (https://phabricator.wikimedia.org/T180700) [06:53:33] (03PS1) 10Marostegui: mariadb: Enable notifications db1109,db1110 [puppet] - 10https://gerrit.wikimedia.org/r/392585 (https://phabricator.wikimedia.org/T180700) [06:56:22] RECOVERY - puppet last run on mw2222 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:56:41] (03CR) 10Marostegui: [C: 032] mariadb: Enable notifications db1109,db1110 [puppet] - 10https://gerrit.wikimedia.org/r/392585 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [06:57:31] RECOVERY - puppet last run on mw2154 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:09] (03PS3) 10Marostegui: db-eqiad.php: Pool db1109 and db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392584 (https://phabricator.wikimedia.org/T180700) [06:58:56] (03PS4) 10Marostegui: db-eqiad.php: Pool db1109 and db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392584 (https://phabricator.wikimedia.org/T180700) [07:01:04] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Pool db1109 and db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392584 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [07:01:54] (03Merged) 10jenkins-bot: db-eqiad.php: Pool db1109 and db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392584 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [07:02:10] (03CR) 10jenkins-bot: db-eqiad.php: Pool db1109 and db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392584 
(https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [07:03:06] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Pool db1109 and db1110 in s5 with small weight - T180700 (duration: 00m 48s) [07:03:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:03:13] T180700: Rack and setup db1109 and db1110 - https://phabricator.wikimedia.org/T180700 [07:26:50] 10Operations, 10DBA, 10Wikimedia-Site-requests: Global rename supervision request: Angr → Mahagaja - https://phabricator.wikimedia.org/T180946#3776923 (10Marostegui) In which wiki are those 243,627 global edits? [07:27:43] 10Operations, 10DBA, 10Patch-For-Review, 10Wikimedia-Incident: s5 primary master db1063 crashed - https://phabricator.wikimedia.org/T180714#3776924 (10Marostegui) 05Open>03Resolved [07:27:45] 10Operations, 10TechCom: Create email alias for the TechCom - https://phabricator.wikimedia.org/T181027#3776926 (10Joe) [07:27:56] 10Operations, 10TechCom: Create email alias for the TechCom - https://phabricator.wikimedia.org/T181027#3776938 (10Joe) p:05Triage>03Normal a:03Joe [07:30:53] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1109, db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392587 (https://phabricator.wikimedia.org/T180700) [07:34:04] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1109, db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392587 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [07:35:16] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1109, db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392587 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [07:36:28] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1109 and db1110 - T180700 (duration: 00m 49s) [07:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:36:37] T180700: Rack and setup db1109 and db1110 - 
https://phabricator.wikimedia.org/T180700 [07:36:53] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1109, db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392587 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [07:51:30] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received [07:52:20] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy [07:55:25] <_joe_> uh what's this rb alert on esams? [07:58:21] looking [08:02:44] all seems good with both rb and mobileapps [08:06:50] (03PS1) 10Marostegui: db-eqiad.php: Fully poll db1109,db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392589 (https://phabricator.wikimedia.org/T180700) [08:09:26] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Fully poll db1109,db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392589 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [08:10:38] (03Merged) 10jenkins-bot: db-eqiad.php: Fully poll db1109,db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392589 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [08:11:02] (03CR) 10jenkins-bot: db-eqiad.php: Fully poll db1109,db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392589 (https://phabricator.wikimedia.org/T180700) (owner: 10Marostegui) [08:11:40] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully pool db1109 and db1110 - T180700 (duration: 00m 48s) [08:11:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:11:49] T180700: Rack and setup db1109 and db1110 - https://phabricator.wikimedia.org/T180700 [08:11:53] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Rack and setup db1109 and db1110 - https://phabricator.wikimedia.org/T180700#3776960 (10Marostegui) 05Open>03Resolved These two servers have 
been fully pooled in s5. [08:15:08] (03PS1) 10DCausse: [logstash] add debug_blob field [puppet] - 10https://gerrit.wikimedia.org/r/392590 (https://phabricator.wikimedia.org/T180051) [08:15:10] (03PS1) 10DCausse: [WIP] [logstash] Add a way to move some data to debug_blob [puppet] - 10https://gerrit.wikimedia.org/r/392591 (https://phabricator.wikimedia.org/T180051) [08:16:57] (03CR) 10DCausse: [C: 04-1] "requires logstash 5.6+" [puppet] - 10https://gerrit.wikimedia.org/r/392591 (https://phabricator.wikimedia.org/T180051) (owner: 10DCausse) [08:19:00] !log Compress s5 on db1101 - T178359 [08:19:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:19:07] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359 [08:25:58] (03CR) 10Thiemo Mättig (WMDE): [C: 031] "Again, I would like to applaud Lucas for a concise, precise commit message. :-) I had a look and can confirm everything." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392449 (owner: 10Lucas Werkmeister (WMDE)) [08:27:20] (03PS1) 10Marostegui: db-eqiad.php: Repool db1096 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392592 (https://phabricator.wikimedia.org/T178359) [08:27:51] !log Reboot db1096 for kernel and MariaDB upgrade [08:27:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:32:34] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1096 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392592 (https://phabricator.wikimedia.org/T178359) (owner: 10Marostegui) [08:33:45] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1096 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392592 (https://phabricator.wikimedia.org/T178359) (owner: 10Marostegui) [08:34:15] ACKNOWLEDGEMENT - HP RAID on ms-be1039 is CRITICAL: CRITICAL: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Failed: 
1I:1:4 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T181028 [08:34:19] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T181028#3776984 (10ops-monitoring-bot) [08:35:11] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1096 with low weight - T178359 (duration: 00m 48s) [08:35:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:35:18] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359 [08:36:57] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1096 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392592 (https://phabricator.wikimedia.org/T178359) (owner: 10Marostegui) [08:37:00] (03PS1) 10Hashar: Remove a trail semicolon, breaking deployment-tin [puppet] - 10https://gerrit.wikimedia.org/r/392594 [08:37:42] (03PS2) 10Hashar: Remove a trail semicolon, breaking deployment-tin [puppet] - 10https://gerrit.wikimedia.org/r/392594 [08:38:13] (03PS3) 10Hashar: Remove a trail semicolon, breaking deployment-tin [puppet] - 10https://gerrit.wikimedia.org/r/392594 [08:39:28] !log Drop index on db1089 enwiki.ores_classification - T180045 [08:39:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:39:35] T180045: Review and deploy schema change on dropping oresc_rev_predicted_model index - https://phabricator.wikimedia.org/T180045 [08:45:08] hashar: shouldnt that semi colon also be breaking terbium then? [08:47:17] (03CR) 10Addshore: "> All of the mentioned changes are also in wmf.6 of the Wikidata build, which is the currently deployed version." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/392449 (owner: 10Lucas Werkmeister (WMDE)) [08:49:14] 10Operations, 10Gerrit, 10Readers-Web-Backlog, 10Patch-For-Review, and 3 others: [spike] Temporarily allow pushing large objects - https://phabricator.wikimedia.org/T178189#3777010 (10phuedx) 05Open>03Resolved Being **bold**. I'll be creating a higher-level "Deploy the service" task that summarises th... [08:49:39] 10Operations, 10Electron-PDFs, 10Proton, 10Readers-Web-Backlog, and 4 others: How should we get Chromium for use in puppeteer? - https://phabricator.wikimedia.org/T178570#3777013 (10phuedx) 05Open>03Resolved >>! In T178189#3777010, @phuedx wrote: > Being **bold**. > > I'll be creating a higher-level "... [08:53:34] 10Operations, 10CirrusSearch, 10Discovery, 10MediaWiki-JobQueue, and 5 others: Job queue is increasing non-stop - https://phabricator.wikimedia.org/T173710#3777026 (10Lydia_Pintscher) [08:56:51] (03CR) 10Elukey: [C: 04-1] "Precautionary -1 to verify the following:" [puppet] - 10https://gerrit.wikimedia.org/r/392489 (https://phabricator.wikimedia.org/T180978) (owner: 10Paladox) [08:57:26] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392601 [08:58:43] 10Operations, 10Gerrit, 10Traffic, 10Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3775204 (10elukey) Added a couple of notes in the code review: 1) mod_http2 does not work with mpm-prefork but only with worker/event (latter is preferred). The mod_http... 
[08:59:25] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392601 (owner: 10Marostegui) [09:00:41] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392601 (owner: 10Marostegui) [09:00:50] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392601 (owner: 10Marostegui) [09:02:30] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1096 (duration: 00m 49s) [09:02:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:04:32] (03PS4) 10Elukey: mediawiki::maintenance::wikidata: remove a trailing semicolon [puppet] - 10https://gerrit.wikimedia.org/r/392594 (owner: 10Hashar) [09:04:58] (03CR) 10Elukey: [C: 032] mediawiki::maintenance::wikidata: remove a trailing semicolon [puppet] - 10https://gerrit.wikimedia.org/r/392594 (owner: 10Hashar) [09:05:37] 10Operations, 10Electron-PDFs, 10Proton, 10Readers-Web-Backlog, and 4 others: How should we get Chromium for use in puppeteer? - https://phabricator.wikimedia.org/T178570#3777061 (10phuedx) ^ For context: The conversation around this problem forked between this task and {T178189}. In the latter, we (Reade... [09:06:11] hashar: --^ [09:08:05] 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org - https://phabricator.wikimedia.org/T180854#3777062 (10Qgil) This task is about setting up the pilot. Let's discuss blockers for production at {T180853}. A basic blocker for the pilot: where should di... [09:09:25] elukey: hello. I found that commit on the beta cluster puppetmaster. 
I guess it is fixing something somehow :) [09:09:42] elukey: I have no clue what the impact is going to be on production though :( [09:10:13] hashar: just ran it on terbium and it was a no-op [09:10:34] \o/ [09:10:45] thanks for spotting it :) [09:11:26] hashar: if you have a min, would you mind to give me some info about how to mirror a (new) gerrit repo to github? [09:11:46] (03PS1) 10DCausse: [logstash] log all elastic queries [puppet] - 10https://gerrit.wikimedia.org/r/392603 (https://phabricator.wikimedia.org/T180051) [09:13:49] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392605 [09:14:43] 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org - https://phabricator.wikimedia.org/T180854#3777072 (10Qgil) [09:16:14] (03CR) 10Alexandros Kosiaris: [C: 04-1] apache: Add http2 to mod (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/392495 (owner: 10Paladox) [09:16:56] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392605 (owner: 10Marostegui) [09:17:50] (03PS9) 10Volans: Icinga notification: use notes_url in messages [puppet] - 10https://gerrit.wikimedia.org/r/391237 (https://phabricator.wikimedia.org/T170353) [09:17:52] (03PS10) 10Volans: Metric alarms: make link to Grafana mandatory [puppet] - 10https://gerrit.wikimedia.org/r/391238 (https://phabricator.wikimedia.org/T170353) [09:17:54] (03PS1) 10Volans: Icinga web: add icons for multiple notes_url items [puppet] - 10https://gerrit.wikimedia.org/r/392606 (https://phabricator.wikimedia.org/T170353) [09:17:57] (03PS1) 10Volans: Metric alarms: convert dashboad_link to array [puppet] - 10https://gerrit.wikimedia.org/r/392607 (https://phabricator.wikimedia.org/T170353) [09:18:05] * volans waiting for jenkins -1 on the last one [09:18:11] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1096 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/392605 (owner: 10Marostegui) [09:18:26] * elukey trusts volans and knows that he'll get a +2 [09:18:42] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1096 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392605 (owner: 10Marostegui) [09:18:48] (03CR) 10jerkins-bot: [V: 04-1] Metric alarms: convert dashboad_link to array [puppet] - 10https://gerrit.wikimedia.org/r/392607 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [09:18:51] noooo [09:18:57] told ya ;) [09:19:11] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1096 (duration: 00m 49s) [09:19:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:22:43] 10Operations, 10Wikimedia-Logstash, 10Discovery-Search (Current work), 10Patch-For-Review, 10Services (watching): Reduce the number of fields declared in elasticsearch by logstash - https://phabricator.wikimedia.org/T180051#3777081 (10dcausse) I'm investigating two approaches here: # provide a way inside... [09:25:33] (03PS1) 10Marostegui: db2068.yaml: Update socket location [puppet] - 10https://gerrit.wikimedia.org/r/392609 (https://phabricator.wikimedia.org/T180927) [09:26:07] (03CR) 10Marostegui: [C: 032] db2068.yaml: Update socket location [puppet] - 10https://gerrit.wikimedia.org/r/392609 (https://phabricator.wikimedia.org/T180927) (owner: 10Marostegui) [09:28:03] (03PS1) 10Elukey: Release version 0.4 [software/druid_exporter] (debian) - 10https://gerrit.wikimedia.org/r/392611 [09:28:06] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2068 storage crash - https://phabricator.wikimedia.org/T180927#3777092 (10Marostegui) >>! In T180927#3774972, @Papaul wrote: > The ILO is up to date. I need to update the Storage and BIOS on the system but the Service pack disk that i have is old, the... 
[09:28:15] !log Shutdown db2068 for maintenance - T180927 [09:28:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:28:22] T180927: db2068 storage crash - https://phabricator.wikimedia.org/T180927 [09:35:26] (03CR) 10Volans: [C: 04-2] "Blocked on green light to use future parser syntax" [puppet] - 10https://gerrit.wikimedia.org/r/392606 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [09:35:34] (03CR) 10Volans: [C: 04-2] "Blocked on green light to use future parser syntax" [puppet] - 10https://gerrit.wikimedia.org/r/392607 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [09:37:21] (03CR) 10Elukey: [V: 032 C: 032] Release version 0.4 [software/druid_exporter] (debian) - 10https://gerrit.wikimedia.org/r/392611 (owner: 10Elukey) [09:39:50] !log upload prometheus-druid-exporter 0.4 to jessie/stretch-wikimedia [09:39:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:12] 10Operations, 10ops-esams, 10DC-Ops, 10Traffic: Multiple systems in esams OE10 showing PSU failures - https://phabricator.wikimedia.org/T177228#3651185 (10ArielGlenn) Checked in with Mark, he's hoping to have the budget discussion on this next quarter. 
[09:45:12] PROBLEM - cassandra-b CQL 10.192.32.138:9042 on restbase2004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:46:38] that's me ^ [09:59:24] (03PS4) 10Elukey: profile::mariadb::misc::eventlogging:replication: add EL sanitization cron [puppet] - 10https://gerrit.wikimedia.org/r/391828 (https://phabricator.wikimedia.org/T156933) [09:59:57] (03CR) 10Elukey: [C: 032] profile::mariadb::misc::eventlogging:replication: add EL sanitization cron [puppet] - 10https://gerrit.wikimedia.org/r/391828 (https://phabricator.wikimedia.org/T156933) (owner: 10Elukey) [10:10:50] !log ppchelko@tin Started deploy [cpjobqueue/deploy@e35aa05]: Set consumer_batch_size to 10 T181007 [10:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:56] T181007: Investigate backlog in RecorLintJob - https://phabricator.wikimedia.org/T181007 [10:11:21] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@e35aa05]: Set consumer_batch_size to 10 T181007 (duration: 00m 31s) [10:11:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:14:48] (03CR) 10Alexandros Kosiaris: [C: 031] "+1 from me on idea + implementation. 
This is our first future parser only code so let's be indeed careful in deploying it and learn for it" [puppet] - 10https://gerrit.wikimedia.org/r/392607 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [10:35:06] !log bootstrap cassandra restbase2004-a - T179422 [10:35:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:14] T179422: Reshape RESTBase Cassandra clusters - https://phabricator.wikimedia.org/T179422 [10:40:19] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 21 probes of 284 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [10:44:14] (03PS1) 10Alexandros Kosiaris: install_server: Assign VMs the correct tty [puppet] - 10https://gerrit.wikimedia.org/r/392617 (https://phabricator.wikimedia.org/T179036) [10:45:19] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 10 probes of 284 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [10:45:50] (03CR) 10Alexandros Kosiaris: [C: 032] install_server: Assign VMs the correct tty [puppet] - 10https://gerrit.wikimedia.org/r/392617 (https://phabricator.wikimedia.org/T179036) (owner: 10Alexandros Kosiaris) [10:48:15] (03PS1) 10Giuseppe Lavagetto: First version of the helm chart scaffolding for production services [deployment-charts] - 10https://gerrit.wikimedia.org/r/392619 (https://phabricator.wikimedia.org/T177397) [10:48:52] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T181028#3777325 (10fgiunchedi) [10:48:54] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T181020#3777327 (10fgiunchedi) [10:50:56] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T181028#3777328 (10fgiunchedi) @Volans FYI I merged the duplicate T181020 into this, one difference seems to be the command output ``` CRITICAL: Slot 3: Failed: 1I:1:4 - OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 
1I:1:7, 1I:1:8... [10:52:41] (03Draft1) 10MarcoAurelio: Close transitionteamwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392620 (https://phabricator.wikimedia.org/T181000) [10:52:45] (03PS2) 10MarcoAurelio: Close transitionteamwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392620 (https://phabricator.wikimedia.org/T181000) [10:53:35] godog: wut? I don't see it retriggered here on IRC, wondering why the handler was retriggered [10:54:20] (03PS1) 10DCausse: Upgrade logstash plugins to 5.5.2 [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/392621 (https://phabricator.wikimedia.org/T178412) [10:54:23] volans: I thought because the output changed? [10:54:51] could be, checking the logs [10:55:28] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T181028#3777357 (10fgiunchedi) a:03Cmjohnson @Cmjohnson disk `sdj` failed here, please replace, thanks! [10:56:08] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392622 (https://phabricator.wikimedia.org/T128546) [10:56:16] (03CR) 10jerkins-bot: [V: 04-1] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392622 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:56:24] (03PS2) 10DCausse: Upgrade logstash plugins to 5.5.2 [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/392621 (https://phabricator.wikimedia.org/T178412) [10:56:50] it was called only once, directly with service_attempts=3, service_state='CRITICAL', service_state_type='HARD' [10:57:35] !log reboot install1002 for serial tty change [10:57:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:29] PROBLEM - Check systemd state on install1002 is CRITICAL: CRITICAL - starting: Late bootup, before the job queue becomes idle for the first time, or one of the rescue targets are reached. 
[10:59:00] heh, first time I've ever seen that live [10:59:37] fascinating [11:00:05] Interesting... [11:00:30] RECOVERY - Check systemd state on install1002 is OK: OK - running: The system is fully operational [11:01:22] !log reboot install2002 for serial tty change [11:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:07] !log reboot netmon1003 for serial tty change [11:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:03:10] (03PS6) 10Elukey: profile::pmacct: move configuration to Kafka Jumbo [puppet] - 10https://gerrit.wikimedia.org/r/392007 (https://phabricator.wikimedia.org/T173489) [11:03:34] !log reboot oresrdb2001 for serial tty change [11:03:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:03:41] (03CR) 10Elukey: [C: 032] profile::pmacct: move configuration to Kafka Jumbo [puppet] - 10https://gerrit.wikimedia.org/r/392007 (https://phabricator.wikimedia.org/T173489) (owner: 10Elukey) [11:04:42] !log reboot ununpentium for serial tty change [11:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:37] (03CR) 10Lucas Werkmeister (WMDE): "Hm, I checked that they’re in .6 of the extension, but actually not whether that’s also in .6 of the build :) will do." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392449 (owner: 10Lucas Werkmeister (WMDE)) [11:15:56] (03CR) 10Lucas Werkmeister (WMDE): "Okay, they’re all also in .6 of the build." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/392449 (owner: 10Lucas Werkmeister (WMDE)) [11:19:24] 10Operations, 10monitoring, 10netops, 10Patch-For-Review, 10User-Elukey: pmacct should be upgraded to 1.6.2 on Stretch - https://phabricator.wikimedia.org/T173489#3777427 (10elukey) 05Open>03Resolved [11:19:36] (03PS1) 10ArielGlenn: rsync misc dumps (everything but xml/sql) to fallback hosts, labstore1006 [puppet] - 10https://gerrit.wikimedia.org/r/392625 (https://phabricator.wikimedia.org/T179942) [11:19:59] (03CR) 10jerkins-bot: [V: 04-1] rsync misc dumps (everything but xml/sql) to fallback hosts, labstore1006 [puppet] - 10https://gerrit.wikimedia.org/r/392625 (https://phabricator.wikimedia.org/T179942) (owner: 10ArielGlenn) [11:22:05] (03PS10) 10Volans: Icinga notification: use notes_url in messages [puppet] - 10https://gerrit.wikimedia.org/r/391237 (https://phabricator.wikimedia.org/T170353) [11:22:16] (03PS11) 10Volans: Icinga notification: use notes_url in messages [puppet] - 10https://gerrit.wikimedia.org/r/391237 (https://phabricator.wikimedia.org/T170353) [11:23:28] (03CR) 10Volans: [C: 032] Icinga notification: use notes_url in messages [puppet] - 10https://gerrit.wikimedia.org/r/391237 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [11:23:41] 10Operations, 10Analytics, 10monitoring, 10netops, 10User-Elukey: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3777449 (10elukey) [11:28:06] (03Abandoned) 10Elukey: Remove any reference of mc1001->mc1018 for decom [puppet] - 10https://gerrit.wikimedia.org/r/354453 (https://phabricator.wikimedia.org/T164341) (owner: 10Elukey) [11:28:43] 10Operations, 10vm-requests, 10Patch-For-Review, 10Performance-Team (Radar): Request VM for webperf (metrics processing) - https://phabricator.wikimedia.org/T179036#3777495 (10akosiaris) Fixed the console issue in above patch, reimaged the VMs and just run puppet for the first time. 
I am guessing this is s... [11:38:43] (03CR) 10Giuseppe Lavagetto: [C: 031] [WIP] profile::redis::jobqueue: stagger redis slave restarts [puppet] - 10https://gerrit.wikimedia.org/r/391798 (owner: 10Elukey) [11:54:29] CUSTOM - Check systemd state on install1002 is OK: OK - running: The system is fully operational [11:54:52] that's me... I might have broken icinga notifications... still digging [11:57:14] (03PS7) 10Elukey: profile::redis::jobqueue: stagger redis slave restarts [puppet] - 10https://gerrit.wikimedia.org/r/391798 (https://phabricator.wikimedia.org/T179684) [11:57:39] (03CR) 10jerkins-bot: [V: 04-1] profile::redis::jobqueue: stagger redis slave restarts [puppet] - 10https://gerrit.wikimedia.org/r/391798 (https://phabricator.wikimedia.org/T179684) (owner: 10Elukey) [11:57:57] PROBLEM - Check systemd state on install1002 is CRITICAL: TEST notifications - volans [11:58:17] RECOVERY - Check systemd state on install1002 is OK: OK - running: The system is fully operational [11:58:32] ok this seems to work as before [11:58:39] (03PS8) 10Elukey: profile::redis::jobqueue: stagger redis slave restarts [puppet] - 10https://gerrit.wikimedia.org/r/391798 (https://phabricator.wikimedia.org/T179684) [11:58:42] !log kartik@tin Started deploy [cxserver/deploy@b87a27a]: Update cxserver to 4301987 [11:58:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:02:06] !log kartik@tin Finished deploy [cxserver/deploy@b87a27a]: Update cxserver to 4301987 (duration: 03m 24s) [12:02:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:02:17] PROBLEM - Varnish child restarted on cp2004 is CRITICAL: TEST dashboard link notifications - volans https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?orgId=1var-server=cp2004var-datasource=codfw%2520prometheus%252Fops [12:04:19] godog, akosiaris: test of the dashboard link in the IRC notification FYI ;) ^^^ [12:04:30] and damn double URL escape [12:06:34] wah wah, looks 
good though [12:09:42] it's the notification, the URL is correct in icinga config, get's the double URL encoding in irc.log [12:09:45] I'll look at it [12:10:17] RECOVERY - Varnish child restarted on cp2004 is OK: OK - varnish-frontend-check-child-start is 1 https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?orgId=1var-server=cp2004var-datasource=codfw%2520prometheus%252Fops [12:21:38] (03PS11) 10Volans: Metric alarms: make link to Grafana mandatory [puppet] - 10https://gerrit.wikimedia.org/r/391238 (https://phabricator.wikimedia.org/T170353) [12:22:45] (03CR) 10Volans: [C: 032] Metric alarms: make link to Grafana mandatory [puppet] - 10https://gerrit.wikimedia.org/r/391238 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [12:38:01] (03PS1) 10Volans: Icinga notes_url: do not pre-encode the URLs [puppet] - 10https://gerrit.wikimedia.org/r/392630 (https://phabricator.wikimedia.org/T170353) [12:38:52] * volans blames godog for the prometheus datasource names :-P [12:39:45] (03CR) 10Volans: [C: 032] Icinga notes_url: do not pre-encode the URLs [puppet] - 10https://gerrit.wikimedia.org/r/392630 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [12:40:19] 10Operations, 10DBA, 10Wikimedia-Site-requests: Global rename supervision request: Angr → Mahagaja - https://phabricator.wikimedia.org/T180946#3777679 (10Steinsplitter) >>! In T180946#3776923, @Marostegui wrote: > In which wiki are those 243,627 global edits? https://commons.wikimedia.org/wiki/Special:Centr... [12:40:39] 10Operations, 10DBA, 10Wikimedia-Site-requests: Global rename supervision request: Angr → Mahagaja - https://phabricator.wikimedia.org/T180946#3777680 (10Steinsplitter) [12:41:47] volans: yeah it is all part of a long con to troll you in this exact case ;) [12:43:17] rotfl [12:47:38] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:48:38] aaaaand this is me... 
[12:51:40] (03PS1) 10Volans: Metric alarms: fix dashboard link validation [puppet] - 10https://gerrit.wikimedia.org/r/392631 (https://phabricator.wikimedia.org/T170353) [12:55:31] (03CR) 10Volans: [C: 032] Metric alarms: fix dashboard link validation [puppet] - 10https://gerrit.wikimedia.org/r/392631 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [12:57:38] RECOVERY - puppet last run on graphite1001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [13:10:08] (03CR) 10BBlack: [C: 032] Reserve internal anycast range [dns] - 10https://gerrit.wikimedia.org/r/391131 (owner: 10Ayounsi) [13:12:07] (03PS7) 10BBlack: Have every rdns advertise a private anycast VIP [puppet] - 10https://gerrit.wikimedia.org/r/391149 (https://phabricator.wikimedia.org/T98006) (owner: 10Ayounsi) [13:12:19] (03PS8) 10BBlack: Have every rdns advertise a private anycast VIP [puppet] - 10https://gerrit.wikimedia.org/r/391149 (https://phabricator.wikimedia.org/T98006) (owner: 10Ayounsi) [13:14:03] (03PS1) 10BBlack: dnsrecursor: send hostname in version responses [puppet] - 10https://gerrit.wikimedia.org/r/392635 (https://phabricator.wikimedia.org/T98006) [13:16:28] (03Abandoned) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392622 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [13:18:47] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392636 (https://phabricator.wikimedia.org/T128546) [13:28:46] (03PS2) 10ArielGlenn: rsync misc dumps (everything but xml/sql) to fallback hosts, labstore1006 [puppet] - 10https://gerrit.wikimedia.org/r/392625 (https://phabricator.wikimedia.org/T179942) [13:31:24] (03PS5) 10BBlack: eqsin: basics [puppet] - 10https://gerrit.wikimedia.org/r/389741 (https://phabricator.wikimedia.org/T156027) [13:31:26] (03PS1) 10BBlack: [WIP] eqsin: cache/lvs/dns/bast site.pp [puppet] - 10https://gerrit.wikimedia.org/r/392639 
[13:31:58] (03CR) 10jerkins-bot: [V: 04-1] [WIP] eqsin: cache/lvs/dns/bast site.pp [puppet] - 10https://gerrit.wikimedia.org/r/392639 (owner: 10BBlack) [13:39:07] !log quick reboot on lvs4005 [13:39:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:24] <_joe_> !log starting 2 manual runners for htmlcacheupdate on commons, 1 for htmlcacheupdate and 1 for refreshlinks on ruwiki, on terbium [13:39:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:02] * volans wants to know the secret command that allows bblack to do a "quick reboot" compared to a normal reboot :-P [13:40:57] I guess what I mean is, I'm not worried the reboot will turn into a train wreck of debugging before it's back in service :) [13:41:11] lol [13:42:18] (03PS1) 10BBlack: lvs400[567]: turn on BGP (still MED 100, all secondary) [puppet] - 10https://gerrit.wikimedia.org/r/392641 [13:44:59] !log ppchelko@tin Started deploy [cpjobqueue/deploy@5341d94]: Enable GC metrics reporting [13:45:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:35] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@5341d94]: Enable GC metrics reporting (duration: 00m 36s) [13:45:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:59:54] *yawn* [14:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171121T1400). Please do the needful. [14:00:04] brion and jan_drewniak: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[14:00:13] \o/ [14:00:27] o/ [14:00:36] o/ [14:00:45] I can SWAT today [14:00:57] brion, jan_drewniak: want to deploy your own changes, or should I? [14:01:26] zeljkof: I would very much like to deploy my change today! (this will be my first deploy!) [14:01:58] zeljkof: you can do mine, i'm out of practice :D https://gerrit.wikimedia.org/r/#/c/392643/ is the cherry-pick [14:02:20] brion: ok :) [14:02:24] thanks :D [14:02:30] jan_drewniak: ok, go ahead then, and let me know when you are done [14:02:39] 10Operations, 10DBA, 10Wikimedia-Site-requests: Global rename supervision request: Angr → Mahagaja - https://phabricator.wikimedia.org/T180946#3777924 (10Marostegui) Thanks! When do you want to do this? [14:02:52] yipee [14:03:05] (03CR) 10Jdrewniak: [C: 032] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392636 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [14:04:19] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392636 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [14:06:58] (03CR) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392636 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [14:07:13] zeljkof: alright, one thing I don't know how to do though, is get the portals up on mwdebug test server... [14:07:35] jan_drewniak: log in to mwdebug1002 [14:07:45] run [14:07:46] scap pull [14:08:04] !log mobrovac@tin Started restart [electron-render/deploy@8dd5f13]: electron stuck - T174916 [14:08:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:10] T174916: electron/pdfrender hangs - https://phabricator.wikimedia.org/T174916 [14:08:35] !log not-abnormally-quick reboot on lvs4005 [14:08:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:52] zeljkof: scap pull from the home directory? 
[14:08:59] jan_drewniak: yes [14:09:57] zeljkof: ok that worked :) looks good, so now I'm ready to sync [14:11:09] !log restart cpjobqueue to try increasing maxSockets [14:11:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:19] !log ppchelko@tin Started restart [cpjobqueue/deploy@5341d94]: Restart to try increased maxSockets [14:11:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:35] !log jdrewniak@tin Synchronized portals/prod/wikipedia.org/assets: SWAT: [[gerrit:392636|Bumping portals to master (T128546)]] (duration: 00m 50s) [14:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:41] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [14:11:57] sorry crashed [14:12:16] hashar: you have crashed?! ;) [14:12:24] !log jdrewniak@tin Synchronized portals: SWAT: [[gerrit:392636|Bumping portals to master (T128546)]] (duration: 00m 49s) [14:12:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:32] hmm no :D [14:12:45] my computer / process / X11 whatever got locked somehow [14:13:10] zeljkof hashar: boom! did it! [14:13:23] jan_drewniak: congratulations! :) [14:13:27] nice :D [14:13:27] and that, ladies and gentlemen, was my first deploy [14:13:36] * brion clap clap clap [14:13:41] !!!!!! [14:13:51] jan_drewniak: can I take over the SWAT? [14:13:54] now, are the assets purged properly from the varnish cache? :) [14:14:38] zeljkof: yeah, it's yours. 
[14:14:47] jan_drewniak: great, thanks :) [14:15:01] brion: please stand by, will ping you in a few minutes when your commit is at mwdebug1002 [14:15:09] thanks :D [14:16:04] (03CR) 10BBlack: [C: 032] lvs400[567]: turn on BGP (still MED 100, all secondary) [puppet] - 10https://gerrit.wikimedia.org/r/392641 (owner: 10BBlack) [14:16:16] 10Operations, 10vm-requests, 10Patch-For-Review, 10Performance-Team (Radar): Request VM for webperf (metrics processing) - https://phabricator.wikimedia.org/T179036#3777939 (10Dzahn) @Akosiaris thank you! Wow so many others were in the wrong file as well. .. re: role Krinkle pointed out that it's NOT y... [14:16:23] hashar: I lit the incense and did all the prayers that purge the varnish cache :P so I think it worked [14:17:00] varnish = voodoo [14:17:00] brion: 392643 is at mwdebug1002, please test and let me know if I can deploy [14:17:08] testing... [14:17:42] zeljkof: good, looks working [14:17:51] brion: deploying... [14:18:22] (03CR) 10Alexandros Kosiaris: [C: 031] add webperf1001/2001 to site, using webperf role [puppet] - 10https://gerrit.wikimedia.org/r/392030 (https://phabricator.wikimedia.org/T179036) (owner: 10Dzahn) [14:18:33] 10Operations, 10vm-requests, 10Patch-For-Review, 10Performance-Team (Radar): Request VM for webperf (metrics processing) - https://phabricator.wikimedia.org/T179036#3777953 (10akosiaris) >>! In T179036#3777939, @Dzahn wrote: > @Akosiaris thank you! Wow so many others were in the wrong file as well. .. Y... 
[14:18:42] !log zfilipin@tin Synchronized php-1.31.0-wmf.8/extensions/TimedMediaHandler/MwEmbedModules/EmbedPlayer/resources/mw.EmbedPlayerOgvJs.js: SWAT: [[gerrit:392643|Disable wasm, use asm.js codec modules for Safari/Edge (T181022)]] (duration: 00m 49s) [14:18:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:51] T181022: WebAssembly .wasm file cannot be loaded, breaks video on Safari & Edge - https://phabricator.wikimedia.org/T181022 [14:18:58] brion: deployed, please check and thanks for deploying with #releng ;) [14:19:32] "we know you have a choice in release engineering, and thank you for choosing us" :DDD [14:19:41] "deploy or not deploy" [14:20:29] ok, the way resourceloader interacts with timedmediahandler it'll be a few minutes before i can verify in safari, but it looks good so far :D [14:21:02] &debug=1 would bypass the RL cache isn't it ? [14:21:47] hashar: it does, and it works :D great [14:25:30] !log cr[12]-ulsfo: allow lvs400[567] as PyBal neighbors for BGP [14:25:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:11] (03CR) 10Jcrespo: [C: 032] ""qui tacet consentire videtur"" [puppet] - 10https://gerrit.wikimedia.org/r/384695 (https://phabricator.wikimedia.org/T175672) (owner: 10Jcrespo) [14:28:17] (03PS9) 10Jcrespo: proxysql: Setup proxysql on terbium/wasat as a test [puppet] - 10https://gerrit.wikimedia.org/r/384695 (https://phabricator.wikimedia.org/T175672) [14:28:39] (03PS1) 10BBlack: kmod::blacklist: prevent manual install, update initramfs [puppet] - 10https://gerrit.wikimedia.org/r/392644 [14:28:41] (03PS1) 10BBlack: cp/lvs: prevent accidental iptables kmods [puppet] - 10https://gerrit.wikimedia.org/r/392645 [14:29:30] oh, forgot to close swat [14:29:36] !log EU SWAT finished [14:29:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:36] (03PS10) 10Jcrespo: proxysql: Setup proxysql on terbium/wasat as a test [puppet] - 
10https://gerrit.wikimedia.org/r/384695 (https://phabricator.wikimedia.org/T175672) [14:34:01] (03CR) 10Jcrespo: [C: 032] proxysql: Setup proxysql on terbium/wasat as a test [puppet] - 10https://gerrit.wikimedia.org/r/384695 (https://phabricator.wikimedia.org/T175672) (owner: 10Jcrespo) [14:37:06] (03PS1) 10Marostegui: db-eqiad.php: Restore db1096 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392647 [14:38:30] 10Operations, 10wikidiff2, 10Patch-For-Review, 10User-Addshore, and 2 others: Update and use php-wikidiff2 1.5.1 & MovedParagraphDetectionCutoff in production - https://phabricator.wikimedia.org/T177891#3778052 (10Tobi_WMDE_SW) [14:38:44] 10Operations, 10wikidiff2, 10Patch-For-Review, 10User-Addshore, and 2 others: Update and use php-wikidiff2 1.5.1 & MovedParagraphDetectionCutoff in production - https://phabricator.wikimedia.org/T177891#3674094 (10Tobi_WMDE_SW) [14:39:05] !log ppchelko@tin Started deploy [cpjobqueue/deploy@aac3201]: Temporarily disable deduplication [14:39:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:34] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@aac3201]: Temporarily disable deduplication (duration: 00m 29s) [14:39:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:22] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Restore db1096 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392647 (owner: 10Marostegui) [14:41:42] (03Merged) 10jenkins-bot: db-eqiad.php: Restore db1096 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392647 (owner: 10Marostegui) [14:41:56] (03CR) 10jenkins-bot: db-eqiad.php: Restore db1096 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392647 (owner: 10Marostegui) [14:42:59] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore original weight for db1096 (duration: 00m 48s) [14:43:04] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:30] PROBLEM - proxysql processes on wasat is CRITICAL: PROCS CRITICAL: 0 processes with command name proxysql [14:51:55] I didn't know we had monitoring for that yet [14:52:37] (03PS1) 10Cmjohnson: Adding dns entries for new mw hosts mw13[61-69] T165519 [dns] - 10https://gerrit.wikimedia.org/r/392650 [14:53:21] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: Decomission mw1161-69 - https://phabricator.wikimedia.org/T177387#3778089 (10Cmjohnson) [14:53:36] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: Decomission mw1161-69 - https://phabricator.wikimedia.org/T177387#3656944 (10Cmjohnson) [14:53:42] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3778093 (10Cmjohnson) [14:53:45] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: Decomission mw1161-69 - https://phabricator.wikimedia.org/T177387#3656944 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson [14:55:08] (03PS9) 10Elukey: profile::redis::jobqueue: stagger redis slave restarts [puppet] - 10https://gerrit.wikimedia.org/r/391798 (https://phabricator.wikimedia.org/T179684) [14:55:59] (03PS1) 10Jcrespo: proxysql: Make proxy configuration non-readable for all users [puppet] - 10https://gerrit.wikimedia.org/r/392651 (https://phabricator.wikimedia.org/T175672) [14:56:01] (03CR) 10Cmjohnson: [C: 032] "This is the update for mw1329-37 not 1361-69" [dns] - 10https://gerrit.wikimedia.org/r/392650 (owner: 10Cmjohnson) [14:56:18] cmjohnson1: new mw hosts? nice!! 
[14:56:35] (03PS10) 10Elukey: profile::redis::jobqueue: stagger redis slave restarts [puppet] - 10https://gerrit.wikimedia.org/r/391798 (https://phabricator.wikimedia.org/T179684) [14:56:47] elukey: you will have them later today [14:57:29] \o/ [14:57:50] (03CR) 10Jcrespo: [C: 032] proxysql: Make proxy configuration non-readable for all users [puppet] - 10https://gerrit.wikimedia.org/r/392651 (https://phabricator.wikimedia.org/T175672) (owner: 10Jcrespo) [14:59:16] (03PS1) 10Dzahn: add webperf nodes with test role, add shell for perf-roots [puppet] - 10https://gerrit.wikimedia.org/r/392653 (https://phabricator.wikimedia.org/T179036) [15:00:05] !log ppchelko@tin Started deploy [cpjobqueue/deploy@3e948de]: Revert: Temporarily disable deduplication [15:00:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:13] (03PS2) 10Dzahn: add webperf nodes with test role, add shell for perf-roots [puppet] - 10https://gerrit.wikimedia.org/r/392653 (https://phabricator.wikimedia.org/T179036) [15:00:32] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@3e948de]: Revert: Temporarily disable deduplication (duration: 00m 26s) [15:00:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:51] (03CR) 10Dzahn: [C: 032] add webperf nodes with test role, add shell for perf-roots [puppet] - 10https://gerrit.wikimedia.org/r/392653 (https://phabricator.wikimedia.org/T179036) (owner: 10Dzahn) [15:01:26] (03CR) 10Dzahn: "first step: https://gerrit.wikimedia.org/r/#/c/392653/" [puppet] - 10https://gerrit.wikimedia.org/r/392030 (https://phabricator.wikimedia.org/T179036) (owner: 10Dzahn) [15:02:28] (03PS2) 10Giuseppe Lavagetto: lvs::configuration: standardize depool thresholds for mw servers [puppet] - 10https://gerrit.wikimedia.org/r/391797 (https://phabricator.wikimedia.org/T178799) [15:04:02] (03PS2) 10Dzahn: webperf1001/2001 start using webperf role [puppet] - 10https://gerrit.wikimedia.org/r/392030 
(https://phabricator.wikimedia.org/T179036) [15:11:10] (03PS1) 10Dzahn: webperf: convert to profile/role [puppet] - 10https://gerrit.wikimedia.org/r/392655 [15:13:54] (03CR) 10Dzahn: "15:11:32 wmf-style: total violations delta -2" [puppet] - 10https://gerrit.wikimedia.org/r/392655 (owner: 10Dzahn) [15:15:07] (03CR) 10Giuseppe Lavagetto: [C: 032] lvs::configuration: standardize depool thresholds for mw servers [puppet] - 10https://gerrit.wikimedia.org/r/391797 (https://phabricator.wikimedia.org/T178799) (owner: 10Giuseppe Lavagetto) [15:22:09] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:22:28] 10Operations, 10DBA, 10Availability (Multiple-active-datacenters), 10Patch-For-Review, 10Performance-Team (Radar): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672#3778177 (10jcrespo) @aaron the proxy is installed but unconfigured, - we still have... [15:23:11] (03CR) 10Paladox: [C: 031] Move role::gerrit::server to just role::gerrit [puppet] - 10https://gerrit.wikimedia.org/r/392095 (owner: 10Chad) [15:25:03] (03CR) 10Dzahn: [C: 04-1] "Could not find data item gerrit::service::ipv4 in any Hiera data file and no default supplied = http://puppet-compiler.wmflabs.org/8877/co" [puppet] - 10https://gerrit.wikimedia.org/r/392095 (owner: 10Chad) [15:25:36] (03CR) 10Mobrovac: [C: 031] "LGTM, in spite of the fact that there is still a slight chance of overlap between redis instances restarts across nodes in the same DC (wh" [puppet] - 10https://gerrit.wikimedia.org/r/391798 (https://phabricator.wikimedia.org/T179684) (owner: 10Elukey) [15:26:24] (03PS11) 10Elukey: profile::redis::jobqueue: stagger redis slave restarts [puppet] - 10https://gerrit.wikimedia.org/r/391798 (https://phabricator.wikimedia.org/T179684) [15:28:26] (03PS2) 10Dzahn: Move role::gerrit::server to just role::gerrit [puppet] - 10https://gerrit.wikimedia.org/r/392095 
(owner: 10Chad) [15:31:17] (03CR) 10Dzahn: [C: 031] "works now: http://puppet-compiler.wmflabs.org/8878/" [puppet] - 10https://gerrit.wikimedia.org/r/392095 (owner: 10Chad) [15:31:21] <_joe_> !log rolling restart of pybal on low-traffic in codfw, eqiad for the new depool thresholds for MW [15:31:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:31:30] (03CR) 10Dzahn: [C: 032] Move role::gerrit::server to just role::gerrit [puppet] - 10https://gerrit.wikimedia.org/r/392095 (owner: 10Chad) [15:31:35] (03CR) 10Paladox: [C: 031] Move role::gerrit::server to just role::gerrit [puppet] - 10https://gerrit.wikimedia.org/r/392095 (owner: 10Chad) [15:31:37] (03PS3) 10Dzahn: Move role::gerrit::server to just role::gerrit [puppet] - 10https://gerrit.wikimedia.org/r/392095 (owner: 10Chad) [15:33:40] 10Operations, 10Analytics, 10DBA, 10Patch-For-Review, 10User-Elukey: Decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3778201 (10Ottomata) [15:34:03] paladox: please feel free to go ahead and switch the labs roles/hiera :) [15:34:10] heh thanks :) [15:34:11] no-op in prod [15:34:13] thanks too [15:34:50] remember hieradata/role/common/ AND hieradata/role/eqiad|codfw [15:34:52] i was using profile::gerrit::server [15:34:54] anyways [15:35:05] don't :) [15:35:18] oh [15:36:16] (03PS1) 10Elukey: role::analytics_cluster::hadoop::worker: move to role/profiles [puppet] - 10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) [15:36:39] (03CR) 10jerkins-bot: [V: 04-1] role::analytics_cluster::hadoop::worker: move to role/profiles [puppet] - 10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey) [15:37:09] RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [15:38:58] (03PS2) 10Elukey: role::analytics_cluster::hadoop::worker: move to role/profiles [puppet] - 
10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) [15:39:02] (03PS3) 10Paladox: apache: Add http2 to mod [puppet] - 10https://gerrit.wikimedia.org/r/392495 [15:39:19] (03CR) 10Paladox: apache: Add http2 to mod (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/392495 (owner: 10Paladox) [15:39:21] (03CR) 10jerkins-bot: [V: 04-1] role::analytics_cluster::hadoop::worker: move to role/profiles [puppet] - 10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey) [15:39:28] (03CR) 10jerkins-bot: [V: 04-1] apache: Add http2 to mod [puppet] - 10https://gerrit.wikimedia.org/r/392495 (owner: 10Paladox) [15:40:09] 10Operations, 10Analytics, 10DBA, 10Patch-For-Review, 10User-Elukey: Decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3108353 (10Nettrom) The data behind [[ https://page-creation.wmflabs.org/#projects=nlwiki,eswiki,plwiki,itwiki,enwiki,jawiki,dewiki,svwiki,ruwik... [15:40:55] (03PS4) 10Paladox: apache: Add http2 to mod [puppet] - 10https://gerrit.wikimedia.org/r/392495 [15:41:57] 10Operations, 10Analytics, 10DBA, 10Patch-For-Review, 10User-Elukey: Decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3778229 (10elukey) >>! In T156844#3778225, @Nettrom wrote: > The data behind [[ https://page-creation.wmflabs.org/#projects=nlwiki,eswiki,plwiki... 
[15:41:59] (03PS4) 10Paladox: Gerrit: Move apache resources to the profile instead of gerrit::proxy [puppet] - 10https://gerrit.wikimedia.org/r/392494 [15:44:09] (03PS5) 10Paladox: Gerrit: Move apache resources to the profile instead of gerrit::proxy [puppet] - 10https://gerrit.wikimedia.org/r/392494 [15:47:50] 10Operations, 10Analytics, 10DBA, 10Patch-For-Review, 10User-Elukey: Decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3778241 (10Nettrom) Nevermind, turns out @mforns has already updated that configuration, should've checked that first. Thanks again for taking c... [15:48:39] (03PS3) 10Elukey: role::analytics_cluster::hadoop::worker: move to role/profiles [puppet] - 10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) [15:49:08] (03CR) 10jerkins-bot: [V: 04-1] role::analytics_cluster::hadoop::worker: move to role/profiles [puppet] - 10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey) [15:49:19] * elukey fires himself [15:49:48] we have high 5xx? 
[15:50:11] jynus: i dont remember it ever fully recovering [15:50:33] I think it was just a spike [15:50:45] at 16:39 [15:50:49] *15:39 [15:51:18] jynus: ill say something if i hear any complaints [15:52:27] (03PS4) 10Elukey: role::analytics_cluster::hadoop::worker: move to role/profiles [puppet] - 10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) [15:52:58] jynus: it seems fine to me now, there is a background of 500s but nothing out of the ordinary checking the past 24h [15:53:59] RECOVERY - proxysql processes on wasat is OK: PROCS OK: 2 processes with command name proxysql [16:00:41] (03PS1) 10Herron: puppet: point db2* hosts at puppet 4 master puppetmaster2001 [puppet] - 10https://gerrit.wikimedia.org/r/392664 (https://phabricator.wikimedia.org/T177254) [16:00:45] 10Operations, 10ops-codfw, 10Services (watching): Degraded RAID on restbase2004 - https://phabricator.wikimedia.org/T180562#3778274 (10fgiunchedi) 05Open>03declined We're bootstrapping 2004 as part of T179422, resolving this for now and will reopen in case of further issues. [16:00:57] herron: I'm back to thinking about puppetmaster upgrades… is the state of the 4.x puppetmaster now fully puppetized or does it still involve local hacks? [16:01:14] T177254 suggests the latter but maybe phab just needs updating [16:01:15] T177254: Upgrade to puppet 4 (4.8 or newer) - https://phabricator.wikimedia.org/T177254 [16:01:50] 10Operations, 10Patch-For-Review: Revisit Pybal depool thresholds for app servers - https://phabricator.wikimedia.org/T178799#3778279 (10Joe) 05Open>03Resolved [16:02:09] andrewbogott it's puppetized with the exception of package upgrade steps [16:02:54] herron: meaning that new puppetmasters will still be built from puppet as 3.x? 
[16:03:42] andrewbogott no a new build should install 4, but since it's an ensure => present no action would be taken if puppet was already installed [16:03:54] ah, ok [16:04:24] there is a setting in hiera called puppet_major_version [16:04:31] so, T179722 and T179721 and the plugin symlinks… all fixed? I haven't looked at the code yet :) [16:04:31] T179721: puppet4: The following unknown setting(s) are being ignored: parser - https://phabricator.wikimedia.org/T179721 [16:04:32] T179722: puppet4: puppet master auth.conf changes - https://phabricator.wikimedia.org/T179722 [16:05:14] 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org - https://phabricator.wikimedia.org/T180854#3778284 (10bd808) >>! In T180854#3777062, @Qgil wrote: > A basic blocker for the pilot: where should discourse-mediawiki.wmflabs.org be hosted? Options (if I... [16:05:17] 10Operations, 10Puppet, 10Patch-For-Review: Granular puppet version selection - https://phabricator.wikimedia.org/T178825#3778287 (10herron) [16:05:19] 10Operations, 10Puppet, 10Patch-For-Review, 10User-Joe, 10cloud-services-team (FY2017-18): Upgrade to puppet 4 (4.8 or newer) - https://phabricator.wikimedia.org/T177254#3778288 (10herron) [16:05:21] 10Operations, 10Puppet, 10Patch-For-Review, 10User-Joe: puppet4: puppet master auth.conf changes - https://phabricator.wikimedia.org/T179722#3778285 (10herron) 05Open>03Resolved a:03herron [16:06:59] herron: ok, I'll give this another go on a VM :) thanks! [16:07:12] the ignored settings "parser" error is still open, it adds some noise but is harmless. planning to remove that line once all the masters are upgraded but we could do it conditionally based on puppet_major_version [16:07:19] sounds good! 
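[editor's note] The exchange above turns on two points: a package declared with ensure => present is left alone when any version is already installed, and the harmless "parser" warning could be avoided by emitting that setting conditionally on puppet_major_version. The following is a minimal Python sketch of that logic, not the actual Puppet code. The function names, the "change-version" label, and the value "future" for the parser setting are assumptions made for illustration; only `ensure => present`, the `parser` setting, and `puppet_major_version` come from the conversation.

```python
from typing import Dict, Optional


def package_action(installed: Optional[str], ensure: str) -> str:
    """Model what a declarative package resource would do (hypothetical).

    installed: version currently on the host, or None if not installed.
    ensure: "present", or an explicit version string.
    """
    if installed is None:
        # Nothing on the host yet, so a fresh build installs the package.
        return "install"
    if ensure == "present":
        # Any installed version satisfies "present": no action is taken.
        return "none"
    # An explicit version is only satisfied by that exact version.
    return "none" if installed == ensure else "change-version"


def master_settings(puppet_major_version: int) -> Dict[str, str]:
    """Emit the 'parser' setting only for pre-4 masters (assumed layout)."""
    settings: Dict[str, str] = {}
    if puppet_major_version < 4:
        # Assumed value; the log only says that Puppet 4 ignores 'parser'.
        settings["parser"] = "future"
    return settings
```

Under this model a host that already runs a 3.x package stays on 3.x with "present", which matches the remark that package upgrade steps are still manual, while a new build gets whatever the repository offers.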
[16:08:43] (03Draft1) 10Paladox: Gerrit: Fix disabling hmac-md5 [puppet] - 10https://gerrit.wikimedia.org/r/392666 [16:08:45] (03PS2) 10Paladox: Gerrit: Fix disabling hmac-md5 [puppet] - 10https://gerrit.wikimedia.org/r/392666 [16:08:58] 10Operations, 10Puppet, 10Patch-For-Review, 10User-Joe, 10cloud-services-team (FY2017-18): Upgrade to puppet 4 (4.8 or newer) - https://phabricator.wikimedia.org/T177254#3778301 (10herron) [16:09:00] 10Operations, 10Puppet, 10Patch-For-Review: Granular puppet version selection - https://phabricator.wikimedia.org/T178825#3778299 (10herron) 05Open>03Resolved a:03herron [16:09:10] 10Operations, 10Puppet, 10Patch-For-Review: Granular puppet version selection - https://phabricator.wikimedia.org/T178825#3704234 (10herron) [16:09:12] 10Operations, 10Puppet, 10Patch-For-Review, 10User-Joe: puppet4: conditionally pin puppet* packages to the appropriate repo for OS release - https://phabricator.wikimedia.org/T179724#3778302 (10herron) 05Open>03Resolved a:03herron [16:09:15] 10Operations, 10Puppet, 10Patch-For-Review, 10User-Joe, 10cloud-services-team (FY2017-18): Upgrade to puppet 4 (4.8 or newer) - https://phabricator.wikimedia.org/T177254#3715646 (10herron) [16:15:18] (03Draft1) 10Paladox: planet: Add a xhtml archive plugin to rawdog [puppet] - 10https://gerrit.wikimedia.org/r/392657 [16:15:21] (03PS2) 10Paladox: planet: Add a xhtml archive plugin to rawdog [puppet] - 10https://gerrit.wikimedia.org/r/392657 [16:15:41] (03CR) 10jerkins-bot: [V: 04-1] planet: Add a xhtml archive plugin to rawdog [puppet] - 10https://gerrit.wikimedia.org/r/392657 (owner: 10Paladox) [16:19:20] !log updating firmware on db2068 [16:19:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:21:11] (03PS1) 10Hoo man: Set Wikidata entity dump batch size to 1500 [puppet] - 10https://gerrit.wikimedia.org/r/392670 (https://phabricator.wikimedia.org/T177486) [16:28:10] (03CR) 10Alexandros Kosiaris: [C: 04-1] "nice first draft. 
Comments inline" (0313 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/392619 (https://phabricator.wikimedia.org/T177397) (owner: 10Giuseppe Lavagetto) [16:31:13] 10Operations, 10DBA, 10Wikimedia-Site-requests: Global rename supervision request: Angr → Mahagaja - https://phabricator.wikimedia.org/T180946#3778434 (10Steinsplitter) 05Open>03Resolved a:03Steinsplitter Done. Thanks :) [16:31:54] (03CR) 10ArielGlenn: [C: 032] Set Wikidata entity dump batch size to 1500 [puppet] - 10https://gerrit.wikimedia.org/r/392670 (https://phabricator.wikimedia.org/T177486) (owner: 10Hoo man) [16:35:39] !log powering down wtp2017 for disk replacement [16:35:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:32] (03PS1) 10Jcrespo: proxysql: Enable systemd support so it starts as non-root [puppet] - 10https://gerrit.wikimedia.org/r/392674 (https://phabricator.wikimedia.org/T175672) [16:38:05] (03PS2) 10Jcrespo: proxysql: Enable systemd support so it starts as non-root [puppet] - 10https://gerrit.wikimedia.org/r/392674 (https://phabricator.wikimedia.org/T175672) [16:38:29] PROBLEM - Host wtp2017 is DOWN: PING CRITICAL - Packet loss = 100% [16:41:30] PROBLEM - Host wtp2017.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:45:02] !log Compress s3 on db2085 - T178359 [16:45:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:45:10] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359 [16:46:39] RECOVERY - Host wtp2017.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.72 ms [16:47:35] 10Operations, 10ops-codfw: Degraded RAID on wtp2017 - https://phabricator.wikimedia.org/T180211#3778476 (10Papaul) a:05Papaul>03akosiaris Call Dell, they said that the server part warranty has expired so i have some spare 500G SATA disks on site that i used for the system. systems is back up. 
[16:47:39] RECOVERY - Host wtp2017 is UP: PING OK - Packet loss = 0%, RTA = 36.07 ms [16:48:19] (03PS3) 10Jcrespo: proxysql: Enable systemd support so it starts as non-root [puppet] - 10https://gerrit.wikimedia.org/r/392674 (https://phabricator.wikimedia.org/T175672) [16:49:50] ACKNOWLEDGEMENT - MD RAID on wtp2017 is CRITICAL: CRITICAL: State: degraded, Active: 2, Working: 2, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T181069 [16:49:55] 10Operations, 10ops-codfw: Degraded RAID on wtp2017 - https://phabricator.wikimedia.org/T181069#3778481 (10ops-monitoring-bot) [16:51:21] (03PS1) 10Herron: puppet: point codfw cp servers at codfw puppet 4 masters [puppet] - 10https://gerrit.wikimedia.org/r/392676 (https://phabricator.wikimedia.org/T177254) [16:53:43] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2068 storage crash - https://phabricator.wikimedia.org/T180927#3778496 (10Papaul) a:05Papaul>03Marostegui Firmware update complete [16:54:27] (03CR) 10Jcrespo: "Aside from "puppet can fail", is there any potential problem (mysql is quite resiliant to puppet- we do not let it manage it directly unle" [puppet] - 10https://gerrit.wikimedia.org/r/392664 (https://phabricator.wikimedia.org/T177254) (owner: 10Herron) [16:55:14] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2068 storage crash - https://phabricator.wikimedia.org/T180927#3778499 (10Marostegui) Thanks @Papaul - I will start mysql, let it run for the night and if all goes fine close this. If this breaks again, we can contact the vendor and see how to procee... 
[16:56:17] (03PS3) 10Dzahn: Gerrit: Fix disabling hmac-md5 [puppet] - 10https://gerrit.wikimedia.org/r/392666 (owner: 10Paladox) [16:58:53] (03CR) 10Jcrespo: [C: 031] "https://puppet-compiler.wmflabs.org/compiler02/8881/terbium.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/392674 (https://phabricator.wikimedia.org/T175672) (owner: 10Jcrespo) [16:59:00] (03PS4) 10Jcrespo: proxysql: Enable systemd support so it starts as non-root [puppet] - 10https://gerrit.wikimedia.org/r/392674 (https://phabricator.wikimedia.org/T175672) [16:59:35] 10Operations, 10Electron-PDFs, 10Security-Reviews, 10Services-next, and 2 others: Restrict outgoing network connections from Electron render service - https://phabricator.wikimedia.org/T148567#3778518 (10dpatrick) Just following up on some lingering security reviews. I know that this service has been deplo... [17:00:05] godog, moritzm, and _joe_: Dear deployers, time to do the Puppet SWAT(Max 8 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171121T1700). [17:00:05] No GERRIT patches in the queue for this window AFAICS. [17:03:04] (03CR) 10Marostegui: "Wouldn't be a good idea to deploy this to a subset of servers only and let it run for some hours or so, instead of the whole DC?" 
[puppet] - 10https://gerrit.wikimedia.org/r/392664 (https://phabricator.wikimedia.org/T177254) (owner: 10Herron) [17:07:18] (03CR) 10Dzahn: [C: 032] Gerrit: Fix disabling hmac-md5 [puppet] - 10https://gerrit.wikimedia.org/r/392666 (owner: 10Paladox) [17:07:56] (03PS6) 10Paladox: Gerrit: Move apache resources to the profile instead of gerrit::proxy [puppet] - 10https://gerrit.wikimedia.org/r/392494 [17:09:53] 10Operations, 10ops-codfw: Degraded RAID on wtp2017 - https://phabricator.wikimedia.org/T181069#3778534 (10faidon) [17:09:55] 10Operations, 10ops-codfw: Degraded RAID on wtp2017 - https://phabricator.wikimedia.org/T180211#3778536 (10faidon) [17:16:37] (03CR) 10Herron: "Yes we don't want to deploy any unexpected changes as those could cause problems. I suggest disabling the puppet agent on all codfw db sy" [puppet] - 10https://gerrit.wikimedia.org/r/392664 (https://phabricator.wikimedia.org/T177254) (owner: 10Herron) [17:21:27] (03CR) 10Marostegui: "I don't have any particular subset in mind, I would leave masters and dbstore servers for the last bit." 
[puppet] - 10https://gerrit.wikimedia.org/r/392664 (https://phabricator.wikimedia.org/T177254) (owner: 10Herron) [17:38:31] 10Operations, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current): Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#3778593 (10awight) [17:38:48] (03CR) 10Dzahn: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/8882/" [puppet] - 10https://gerrit.wikimedia.org/r/392494 (owner: 10Paladox) [17:38:57] (03PS1) 10Awight: [DO NOT MERGE] Update ORES venv path to use versioned cache [puppet] - 10https://gerrit.wikimedia.org/r/392683 (https://phabricator.wikimedia.org/T181071) [17:39:09] (03PS5) 10Paladox: apache: Add http2 to mod [puppet] - 10https://gerrit.wikimedia.org/r/392495 [17:40:25] (03PS18) 10Paladox: Gerrit: Fix up logstash configuation [puppet] - 10https://gerrit.wikimedia.org/r/392079 (https://phabricator.wikimedia.org/T141324) [17:41:48] (03CR) 10Elukey: [C: 032] profile::redis::jobqueue: stagger redis slave restarts [puppet] - 10https://gerrit.wikimedia.org/r/391798 (https://phabricator.wikimedia.org/T179684) (owner: 10Elukey) [17:41:55] (03PS12) 10Elukey: profile::redis::jobqueue: stagger redis slave restarts [puppet] - 10https://gerrit.wikimedia.org/r/391798 (https://phabricator.wikimedia.org/T179684) [17:41:57] (03PS19) 10Paladox: Gerrit: Fix up logstash configuation [puppet] - 10https://gerrit.wikimedia.org/r/392079 (https://phabricator.wikimedia.org/T141324) [17:41:59] (03PS6) 10Paladox: Gerrit: Enable logstash for gerrit [puppet] - 10https://gerrit.wikimedia.org/r/392083 (https://phabricator.wikimedia.org/T141324) [17:43:07] mutante: shall I merge your code too? [17:44:08] elukey: yes, please [17:45:26] done! [17:46:25] thanks [17:49:40] PROBLEM - puppet last run on rdb1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:50:23] (03PS5) 10Jcrespo: proxysql: Enable systemd support so it starts as non-root [puppet] - 10https://gerrit.wikimedia.org/r/392674 (https://phabricator.wikimedia.org/T175672) [17:51:11] rdb1002 is me, fixing it [17:51:29] (03PS1) 10Elukey: profile::redis::jobqueue: fix default selector [puppet] - 10https://gerrit.wikimedia.org/r/392684 (https://phabricator.wikimedia.org/T179684) [17:51:55] (03CR) 10Jcrespo: [C: 032] proxysql: Enable systemd support so it starts as non-root [puppet] - 10https://gerrit.wikimedia.org/r/392674 (https://phabricator.wikimedia.org/T175672) (owner: 10Jcrespo) [17:54:05] (03CR) 10Mobrovac: [C: 031] profile::redis::jobqueue: fix default selector [puppet] - 10https://gerrit.wikimedia.org/r/392684 (https://phabricator.wikimedia.org/T179684) (owner: 10Elukey) [17:54:32] (03PS2) 10Elukey: profile::redis::jobqueue: fix default selector [puppet] - 10https://gerrit.wikimedia.org/r/392684 (https://phabricator.wikimedia.org/T179684) [17:55:22] (03CR) 10Elukey: [C: 032] profile::redis::jobqueue: fix default selector [puppet] - 10https://gerrit.wikimedia.org/r/392684 (https://phabricator.wikimedia.org/T179684) (owner: 10Elukey) [17:56:31] (03CR) 10Paladox: [C: 031] "Thanks :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392576 (https://phabricator.wikimedia.org/T154371) (owner: 10Brian Wolff) [17:57:20] PROBLEM - Check systemd state on wasat is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[17:58:20] that is me [17:58:22] patch coming [17:58:30] (03PS1) 10Jcrespo: proxysql: fix /etc/proxysql.cfg permissions [puppet] - 10https://gerrit.wikimedia.org/r/392685 (https://phabricator.wikimedia.org/T175672) [17:59:07] (03PS2) 10Jcrespo: proxysql: fix /etc/proxysql.cfg permissions [puppet] - 10https://gerrit.wikimedia.org/r/392685 (https://phabricator.wikimedia.org/T175672) [17:59:34] (03CR) 10Jcrespo: [C: 032] proxysql: fix /etc/proxysql.cfg permissions [puppet] - 10https://gerrit.wikimedia.org/r/392685 (https://phabricator.wikimedia.org/T175672) (owner: 10Jcrespo) [17:59:39] RECOVERY - puppet last run on rdb1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:00:05] gwicke, cscott, arlolra, subbu, halfak, and Amir1: Time to snap out of that daydream and deploy Services – Graphoid / Parsoid / Citoid / ORES. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171121T1800). [18:00:05] No GERRIT patches in the queue for this window AFAICS. [18:00:18] no parsoid deploy today [18:00:46] ORES has gone underground [18:01:43] 10Operations, 10Discovery, 10Wikimedia-Mailing-lists, 10Wikimedia-Portals, 10Discovery-Portal-Sprint: Email list needed for automating the Wikipedia.org portal - https://phabricator.wikimedia.org/T180976#3778675 (10debt) Hi @RobH - it turns out that we needed this list to be private (per T179694#3778354)... 
[18:01:49] 10Operations, 10Discovery, 10Wikimedia-Mailing-lists, 10Wikimedia-Portals, 10Discovery-Portal-Sprint: Email list needed for automating the Wikipedia.org portal - https://phabricator.wikimedia.org/T180976#3778677 (10debt) 05Resolved>03Open [18:02:29] PROBLEM - IPMI Sensor Status on maps1002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Power Supply 2 = Critical, Power Supplies = Critical] [18:08:29] RECOVERY - Check systemd state on wasat is OK: OK - running: The system is fully operational [18:12:47] 10Operations, 10Discovery, 10Wikimedia-Mailing-lists, 10Wikimedia-Portals, 10Discovery-Portal-Sprint: Email list needed for automating the Wikipedia.org portal - https://phabricator.wikimedia.org/T180976#3778688 (10RobH) 05Open>03Resolved You did it all correctly, except I changed it from 'Approved'... [18:14:39] 10Operations, 10Discovery, 10Wikimedia-Mailing-lists, 10Wikimedia-Portals, 10Discovery-Portal-Sprint: Email list needed for automating the Wikipedia.org portal - https://phabricator.wikimedia.org/T180976#3778696 (10debt) Perfect, thanks @RobH ! 
:) [18:17:38] (03CR) 10Zoranzoki21: [C: 031] Enable Timeless everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392576 (https://phabricator.wikimedia.org/T154371) (owner: 10Brian Wolff) [18:17:57] (03CR) 10Zoranzoki21: [C: 031] $wmf* -> $wmg* [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956) (owner: 10TerraCodes) [18:19:01] (03CR) 10Zoranzoki21: [C: 031] keys: Document which key is which in keys.txt [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392461 (owner: 10Legoktm) [18:23:19] PROBLEM - HHVM rendering on mw2141 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:23:39] (03CR) 10Zoranzoki21: [C: 031] Close transitionteamwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392620 (https://phabricator.wikimedia.org/T181000) (owner: 10MarcoAurelio) [18:24:16] (03PS14) 10Zoranzoki21: Enable the ArticlePlaceholder for sewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387077 (https://phabricator.wikimedia.org/T179241) [18:24:19] RECOVERY - HHVM rendering on mw2141 is OK: HTTP OK: HTTP/1.1 200 OK - 76286 bytes in 0.318 second response time [18:25:25] !log smalyshev@tin Started deploy [wdqs/wdqs@7d951d2]: Restore categories vocabulary to V003 [18:25:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:36] !log smalyshev@tin Finished deploy [wdqs/wdqs@7d951d2]: Restore categories vocabulary to V003 (duration: 00m 11s) [18:25:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:07] !log smalyshev@tin Started deploy [wdqs/wdqs@c69c739]: Restore categories vocabulary to V003 [18:26:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:33] (03PS1) 10Jcrespo: proxysql: Fix /var/lib/proxysql permissions and move .my.cnf to profile [puppet] - 10https://gerrit.wikimedia.org/r/392689 (https://phabricator.wikimedia.org/T175672) [18:28:03] !log smalyshev@tin Finished deploy 
[wdqs/wdqs@c69c739]: Restore categories vocabulary to V003 (duration: 01m 55s) [18:28:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:11] (03PS5) 10Elukey: role::analytics_cluster::hadoop::worker: move to role/profiles [puppet] - 10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) [18:29:04] (03CR) 10Zoranzoki21: [C: 031] multiversion: Assume --wiki=aawiki for purgeUrls.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392561 (owner: 10Krinkle) [18:30:52] (03CR) 10Zoranzoki21: [C: 031] Add *.dimu.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392189 (https://phabricator.wikimedia.org/T180791) (owner: 10Framawiki) [18:31:33] (03CR) 10Zoranzoki21: "> Per task description, "@jhsoby-WMNO will ask the (very small)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387077 (https://phabricator.wikimedia.org/T179241) (owner: 10Zoranzoki21) [18:31:48] 10Operations, 10Performance-Team, 10vm-requests: Request VM for webperf (metrics processing) - https://phabricator.wikimedia.org/T179036#3778745 (10Krinkle) [18:32:22] 10Operations, 10Performance-Team, 10monitoring: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837#3778752 (10Krinkle) [18:32:24] 10Operations, 10Performance-Team, 10vm-requests: Request VM for webperf (metrics processing) - https://phabricator.wikimedia.org/T179036#3711107 (10Krinkle) 05Open>03Resolved [18:33:02] 10Operations, 10Performance-Team, 10vm-requests: Request VM for webperf (metrics processing) - https://phabricator.wikimedia.org/T179036#3711107 (10Krinkle) Thanks! Next step is to actually migrate the role, which will be done by Performance Team and tracked via parent task (T179036). 
[18:33:09] (03PS6) 10Elukey: role::analytics_cluster::hadoop::worker: move to role/profiles [puppet] - 10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) [18:34:30] (03CR) 10Dzahn: [C: 031] apache: Add http2 to mod [puppet] - 10https://gerrit.wikimedia.org/r/392495 (owner: 10Paladox) [18:35:06] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler02/8887/analytics1040.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey) [18:35:30] !log mholloway-shell@tin Started deploy [mobileapps/deploy@dd41387]: Update mobileapps to 9d1602d [18:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:41] (03CR) 10Dzahn: "Style guide says profiles should be included in role classes and we've been doing that across the board. A "require" would be the first." [puppet] - 10https://gerrit.wikimedia.org/r/391742 (owner: 10Dzahn) [18:40:39] !log mholloway-shell@tin Finished deploy [mobileapps/deploy@dd41387]: Update mobileapps to 9d1602d (duration: 05m 09s) [18:40:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:03] (03PS3) 10Paladox: planet: Add a xhtml archive plugin to rawdog [puppet] - 10https://gerrit.wikimedia.org/r/392657 [18:47:00] 10Operations, 10Performance-Team, 10monitoring: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837#3778878 (10Krinkle) [18:48:16] 10Operations, 10Traffic, 10netops, 10Cloud-VPS (Quota-requests): Request increased quota for traffic Cloud VPS project - https://phabricator.wikimedia.org/T180178#3778885 (10Andrew) 05Open>03Resolved a:03Andrew All set. 
[18:49:10] (03PS2) 10Jcrespo: proxysql: Fix /var/lib/proxysql permissions and move .my.cnf to profile [puppet] - 10https://gerrit.wikimedia.org/r/392689 (https://phabricator.wikimedia.org/T175672) [18:49:45] (03CR) 10Legoktm: "Zoranzoki21: did you verify all the GPG fingerprints are correct?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392461 (owner: 10Legoktm) [18:49:58] 10Operations, 10Performance-Team, 10vm-requests: Request VM for webperf (metrics processing) - https://phabricator.wikimedia.org/T179036#3778889 (10Dzahn) >>! In T179036#3778745, @Krinkle wrote: > Next step is to actually migrate the role, which will be done by Performance Team and tracked via parent task (T... [18:50:56] mutante: thx :) [18:51:42] Krinkle: welcome :) [18:53:26] jouncebot: next [18:53:26] In 5 hour(s) and 6 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171122T0000) [18:53:26] In 5 hour(s) and 6 minute(s): No deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171122T0000) [18:53:35] I'm rather confused [18:53:42] (03PS1) 10Andrew Bogott: diskspace.py and flavorreport.py: update manual flavor disk sizes [puppet] - 10https://gerrit.wikimedia.org/r/392690 [18:54:07] greg-g: do you have control of the Google calendar? and is it all messed up? [18:55:04] (03CR) 10Andrew Bogott: [C: 032] diskspace.py and flavorreport.py: update manual flavor disk sizes [puppet] - 10https://gerrit.wikimedia.org/r/392690 (owner: 10Andrew Bogott) [18:55:18] addshore: it's not the google calendar afaict, it's just the wiki page [18:55:41] so there is a swat window in 5 mins? O_o [18:56:20] addshore: yes, a window. but nothing is in it [18:56:54] I want to put something in it, but don't know where on the wiki page :D [18:58:20] for puppet swat? 
[18:58:22] (03PS1) 10Herron: puppet: point codfw elasticsearch servers at codfw puppet 4 masters [puppet] - 10https://gerrit.wikimedia.org/r/392691 (https://phabricator.wikimedia.org/T177254) [18:58:27] jouncebot: refresh [18:58:32] I refreshed my knowledge about deployments. [18:58:34] addshore: go to https://wikitech.wikimedia.org/w/index.php?title=Deployments&action=edit&section=3 and then... [18:58:47] not for puppet swat [18:58:54] then search for "2017-11-27 11:00 SF" [18:59:06] and replace "* ''Gerrit link to backport or config change'' [18:59:11] somehow the mid-day swat is gone? [18:59:22] no... [18:59:36] you mean the Morning SWAT? [18:59:37] (ignore me) [18:59:58] For today I only see European Mid-day SWAT and Evening SWAT on the wiki page [19:00:05] actually, no, it is gone [19:00:09] I can add it :) [19:00:10] I think the bot just tells us in 2 messages "there is the window" and "the window is empty", that's all [19:00:22] greg-g: thanks! :D [19:00:33] maybe it is smart about not using it in the output of "next" because it's empty. heh [19:00:39] I got super confused for a second, as I haven't used this swat window in months.... and thought something else might have happened [19:01:02] ok, add to it [19:01:41] thanks! [19:02:19] (03PS3) 10Addshore: Enable AdvancedSearch on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390386 (https://phabricator.wikimedia.org/T180147) [19:03:07] jouncebot: refresh [19:03:11] I refreshed my knowledge about deployments. 
[19:03:15] jouncebot: now [19:03:15] For the next 0 hour(s) and 56 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171121T1900) [19:03:19] tada [19:03:20] =] [19:03:38] (03PS4) 10Addshore: Enable AdvancedSearch on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390386 (https://phabricator.wikimedia.org/T180147) [19:04:56] (03CR) 10Addshore: [C: 032] Enable AdvancedSearch on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390386 (https://phabricator.wikimedia.org/T180147) (owner: 10Addshore) [19:06:02] greg-g: I've added two small config patches to the list, is that OK with you? [19:06:05] (03Merged) 10jenkins-bot: Enable AdvancedSearch on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390386 (https://phabricator.wikimedia.org/T180147) (owner: 10Addshore) [19:06:56] (03PS2) 10Framawiki: Add images.collection.cooperhewitt.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390881 (https://phabricator.wikimedia.org/T180241) [19:07:02] (03PS3) 10Framawiki: Add *.dimu.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392189 (https://phabricator.wikimedia.org/T180791) [19:07:04] (03CR) 10jenkins-bot: Enable AdvancedSearch on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390386 (https://phabricator.wikimedia.org/T180147) (owner: 10Addshore) [19:08:39] * Niharika gives jouncebot a cookie [19:10:05] (03PS3) 10Jcrespo: proxysql: Fix /var/lib/proxysql permissions and move .my.cnf to profile [puppet] - 10https://gerrit.wikimedia.org/r/392689 (https://phabricator.wikimedia.org/T175672) [19:10:41] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:390386|Enable AdvancedSearch on group0]] PT1/2 (duration: 00m 50s) [19:10:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:11:12] framawiki: it's up to the SWAT deployer :) [19:11:42] !log addshore@tin 
Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:390386|Enable AdvancedSearch on group0]] PT2/2 (duration: 00m 49s) [19:11:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:12:48] Lovely, that's that one out of the door! [19:12:55] *looks at the deploy page* [19:14:47] framawiki: I can do those for you [19:15:42] I added two more at the last minute, and I apologize :) thank you addshore! [19:15:58] (03CR) 10Zoranzoki21: [C: 031] "> Zoranzoki21: did you verify all the GPG fingerprints are correct?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392461 (owner: 10Legoktm) [19:16:09] (03PS3) 10Addshore: Add images.collection.cooperhewitt.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390881 (https://phabricator.wikimedia.org/T180241) (owner: 10Framawiki) [19:16:22] (03CR) 10Addshore: [C: 032] Add images.collection.cooperhewitt.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390881 (https://phabricator.wikimedia.org/T180241) (owner: 10Framawiki) [19:16:33] (03PS4) 10Addshore: Add *.dimu.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392189 (https://phabricator.wikimedia.org/T180791) (owner: 10Framawiki) [19:16:37] (03CR) 10Addshore: [C: 032] Add *.dimu.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392189 (https://phabricator.wikimedia.org/T180791) (owner: 10Framawiki) [19:17:05] (03CR) 10Zoranzoki21: [C: 031] Add images.collection.cooperhewitt.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390881 (https://phabricator.wikimedia.org/T180241) (owner: 10Framawiki) [19:17:10] framawiki: I will do them both at the same time :) [19:17:47] (03CR) 10Krinkle: [C: 031] webperf: convert to profile/role [puppet] - 10https://gerrit.wikimedia.org/r/392655 (owner: 10Dzahn) [19:17:52] (03Merged) 10jenkins-bot: Add images.collection.cooperhewitt.org to 
$wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390881 (https://phabricator.wikimedia.org/T180241) (owner: 10Framawiki) [19:17:54] (03Merged) 10jenkins-bot: Add *.dimu.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392189 (https://phabricator.wikimedia.org/T180791) (owner: 10Framawiki) [19:18:01] (03CR) 10jenkins-bot: Add images.collection.cooperhewitt.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390881 (https://phabricator.wikimedia.org/T180241) (owner: 10Framawiki) [19:18:05] (03PS4) 10Jcrespo: proxysql: Fix /var/lib/proxysql permissions and move .my.cnf to profile [puppet] - 10https://gerrit.wikimedia.org/r/392689 (https://phabricator.wikimedia.org/T175672) [19:19:08] (03PS15) 10Zoranzoki21: Enable the ArticlePlaceholder for sewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387077 (https://phabricator.wikimedia.org/T179241) [19:19:31] (03PS2) 10Dzahn: webperf: convert to profile/role [puppet] - 10https://gerrit.wikimedia.org/r/392655 [19:19:46] (03PS5) 10Jcrespo: proxysql: Fix /var/lib/proxysql permissions and move .my.cnf to profile [puppet] - 10https://gerrit.wikimedia.org/r/392689 (https://phabricator.wikimedia.org/T175672) [19:19:54] (03CR) 10Dzahn: [C: 032] webperf: convert to profile/role [puppet] - 10https://gerrit.wikimedia.org/r/392655 (owner: 10Dzahn) [19:20:48] !log addshore@tin Synchronized wmf-config/CommonSettings.php: SWAT: Add [[gerrit:390881|images.collection.cooperhewitt.org]] & [[gerrit:392189|*.dimu.org]] to wgCopyUploadsDomains. 
T180791 T180241 (duration: 00m 48s) [19:20:50] framawiki: ^^ [19:20:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:20:55] T180791: Please add *.dimu.org to the wgCopyUploadsDomains whitelist of Wikimedia Commons - https://phabricator.wikimedia.org/T180791 [19:20:55] T180241: Please add images.collection.cooperhewitt.org to $wgCopyUploadsDomains - https://phabricator.wikimedia.org/T180241 [19:21:08] !log SWAT done! [19:21:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:21:14] (03CR) 10jenkins-bot: Add *.dimu.org to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392189 (https://phabricator.wikimedia.org/T180791) (owner: 10Framawiki) [19:21:40] (03PS6) 10Jcrespo: proxysql: Fix /var/lib/proxysql permissions and move .my.cnf to profile [puppet] - 10https://gerrit.wikimedia.org/r/392689 (https://phabricator.wikimedia.org/T175672) [19:22:12] (03CR) 10Jcrespo: [C: 032] proxysql: Fix /var/lib/proxysql permissions and move .my.cnf to profile [puppet] - 10https://gerrit.wikimedia.org/r/392689 (https://phabricator.wikimedia.org/T175672) (owner: 10Jcrespo) [19:22:35] thanks addshore ! [19:23:58] np! 
[19:25:36] (03PS1) 10Jcrespo: proxysql: Fix invalid puppet dependency cycle [puppet] - 10https://gerrit.wikimedia.org/r/392694 (https://phabricator.wikimedia.org/T175672) [19:25:54] (03PS2) 10Jcrespo: proxysql: Fix invalid puppet dependency cycle [puppet] - 10https://gerrit.wikimedia.org/r/392694 (https://phabricator.wikimedia.org/T175672) [19:26:13] (03CR) 10jerkins-bot: [V: 04-1] proxysql: Fix invalid puppet dependency cycle [puppet] - 10https://gerrit.wikimedia.org/r/392694 (https://phabricator.wikimedia.org/T175672) (owner: 10Jcrespo) [19:27:32] (03PS3) 10Jcrespo: proxysql: Fix invalid puppet dependency cycle [puppet] - 10https://gerrit.wikimedia.org/r/392694 (https://phabricator.wikimedia.org/T175672) [19:27:42] (03PS4) 10Jcrespo: proxysql: Fix invalid puppet dependency cycle [puppet] - 10https://gerrit.wikimedia.org/r/392694 (https://phabricator.wikimedia.org/T175672) [19:28:20] (03CR) 10Jcrespo: [C: 032] proxysql: Fix invalid puppet dependency cycle [puppet] - 10https://gerrit.wikimedia.org/r/392694 (https://phabricator.wikimedia.org/T175672) (owner: 10Jcrespo) [19:29:39] PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:34:19] 10Operations, 10DBA, 10Availability (Multiple-active-datacenters), 10Patch-For-Review, 10Performance-Team (Radar): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672#3779106 (10jcrespo) Blocked on getting answers written at T175672#3778177. 
[19:34:39] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [19:38:04] http://gerrit-logstash.wmflabs.org/app/kibana#/dashboards?_g=() [19:38:05] woops [19:38:08] wrong palce [19:38:12] place meant for someone else [19:45:10] (03PS1) 10Herron: base: auto logout idle bash shells after 2 days [puppet] - 10https://gerrit.wikimedia.org/r/392698 (https://phabricator.wikimedia.org/T122922) [20:00:43] !log finishing wmf.8 rollout, starting group2 to wmf.8 [20:00:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:03:47] (03PS1) 10Thcipriani: Revert "Revert "group2 to wmf.8"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392701 [20:05:30] (03PS1) 10Cmjohnson: adding mgmt dns entries for db111[12] T180788 [dns] - 10https://gerrit.wikimedia.org/r/392702 [20:06:51] addshore: around still? [20:07:02] addshore: should wikidata go to wmf.8 or stay on wmf.7 right now? [20:07:32] .8 please! :) [20:08:21] wikidata wiki? I ask because yesterday's deploy held wikidatawiki to wmf.7: https://gerrit.wikimedia.org/r/#/c/392467/ [20:08:48] (line 864 there) [20:08:51] addshore: ^ [20:09:27] thcipriani: It should be fine, the thing that broke on the first rollout was https://phabricator.wikimedia.org/T180665 [20:09:28] I need to step away for a few. [20:10:12] addshore: ok, cool, will roll them forward all together. Thanks! [20:10:19] Great! 
[20:10:23] * addshore will watch stuff too :)_ [20:10:55] (03Abandoned) 10Thcipriani: Revert "Revert "group2 to wmf.8"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392701 (owner: 10Thcipriani) [20:16:58] (03PS1) 10Thcipriani: group2 to 1.31.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392704 [20:17:36] heh, hadn't used scap update-wikiversions before :) [20:19:12] (03CR) 10Thcipriani: [C: 032] group2 to 1.31.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392704 (owner: 10Thcipriani) [20:20:27] (03Merged) 10jenkins-bot: group2 to 1.31.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392704 (owner: 10Thcipriani) [20:20:29] (03PS1) 10ArielGlenn: set cron dates back to normal for xml/sql dumps [puppet] - 10https://gerrit.wikimedia.org/r/392705 [20:20:37] (03CR) 10jenkins-bot: group2 to 1.31.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392704 (owner: 10Thcipriani) [20:21:55] (03CR) 10ArielGlenn: [C: 032] set cron dates back to normal for xml/sql dumps [puppet] - 10https://gerrit.wikimedia.org/r/392705 (owner: 10ArielGlenn) [20:22:09] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group2 to 1.31.0-wmf.8 [20:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:22:20] ^ addshore fyi [20:22:25] great! 
[20:22:34] nothing popping up in the logs as far as i can see [20:24:37] cool :) [20:26:08] !log ariel@tin Started deploy [dumps/dumps@16f92d6]: gzip namespace and abstract dumps; remove last configfile existence checks [20:26:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:24] !log ariel@tin Finished deploy [dumps/dumps@16f92d6]: gzip namespace and abstract dumps; remove last configfile existence checks (duration: 00m 16s) [20:26:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:02] (03CR) 10RobH: [C: 031] base: auto logout idle bash shells after 2 days [puppet] - 10https://gerrit.wikimedia.org/r/392698 (https://phabricator.wikimedia.org/T122922) (owner: 10Herron) [20:36:35] (03PS9) 10BBlack: Have every rdns advertise a private anycast VIP [puppet] - 10https://gerrit.wikimedia.org/r/391149 (https://phabricator.wikimedia.org/T98006) (owner: 10Ayounsi) [20:37:27] (03PS2) 10BBlack: dnsrecursor: send hostname in version responses [puppet] - 10https://gerrit.wikimedia.org/r/392635 (https://phabricator.wikimedia.org/T98006) [20:42:56] !log ayounsi@neodymium conftool action : set/pooled=no; selector: name=acamar.wikimedia.org [20:43:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:45:01] !log recdns: puppet disabled on all, acamar depooled, careful deploys going on for anycast+recdns stuff [20:45:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:46:08] (03CR) 10BBlack: [C: 032] Have every rdns advertise a private anycast VIP [puppet] - 10https://gerrit.wikimedia.org/r/391149 (https://phabricator.wikimedia.org/T98006) (owner: 10Ayounsi) [20:46:16] (03CR) 10BBlack: [C: 032] dnsrecursor: send hostname in version responses [puppet] - 10https://gerrit.wikimedia.org/r/392635 (https://phabricator.wikimedia.org/T98006) (owner: 10BBlack) [20:47:49] herron: If you have a minute, can you have a look at an issue I'm having with puppet4 on a VM? 
It's probably a change we need in auth.conf. [20:47:58] https://www.irccloud.com/pastebin/IDu17mWK/ [20:47:59] andrewbogott hey, sure [20:48:36] Oh, actually... [20:48:40] maybe I remember this issue [20:48:45] yeah auth.conf sounds about right, did you swap in the package provided version? [20:48:53] it's because I'm trying to run a v4 client with a v3 master, right? [20:50:32] "did you swap in the package provided version" <- meaning that that auth.conf isn't puppetized according to major_puppet_version? [20:51:42] auth.conf is puppetized, so you set puppet_major_version: 4 on this master? [20:52:46] yes. But — the process I'm using doesn't make sense, so probably you should ignore that particular paste. [20:52:59] I'm guessing you just have puppet disabled on your v4 masters? [20:54:00] so right now puppetmaster2001 and puppetmaster2002 are on puppet v4 and both have the agent pointed at server=puppetmaster2001.codfw.wmnet [20:54:17] I think I'm going to tear this down and try again. If I 1) change major_puppet_version to 4 2) run puppet 3) disable puppet 4) upgrade packages by hand would you expect that to just work? [20:54:56] ok, and be sure to set it as puppet_major_version rather than major_puppet_version [20:55:04] ah, yes :) [20:55:34] in your step-by-step you have this bit that's needed to install the packages... [20:55:37] https://www.irccloud.com/pastebin/oyVT4PXa/ [20:55:54] that's just a temporary change during the package change, right? And then puppet switches those paths right back to how they were? [20:55:58] yes and what a can of worms that opened! [20:56:13] yeah exactly [20:56:21] ok [20:56:26] I can live with that :) [20:56:26] well I switched it manually but yes it should be restored [20:56:38] I have a meeting now but I will try again later on and follow up if I hit new issues [20:56:39] thanks [20:56:49] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [20:57:09] alright good luck! [21:04:47] (03PS1) 10BBlack: recdns anycast: fix create_resources data structure [puppet] - 10https://gerrit.wikimedia.org/r/392710 [21:06:58] (03CR) 10BBlack: [C: 032] recdns anycast: fix create_resources data structure [puppet] - 10https://gerrit.wikimedia.org/r/392710 (owner: 10BBlack) [21:13:25] (03PS1) 10Rush: openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) [21:14:27] (03PS2) 10Rush: openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) [21:14:52] (03PS1) 10ArielGlenn: snapshots: add scap deploy key info for dumpsgen user [puppet] - 10https://gerrit.wikimedia.org/r/392712 [21:16:19] PROBLEM - Check systemd state on acamar is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:16:29] PROBLEM - Recursive DNS on 2620:0:860:1:208:80:153:12 is CRITICAL: CRITICAL - Plugin timed out while executing system call [21:16:39] PROBLEM - Recursive DNS on 208.80.153.12 is CRITICAL: CRITICAL - Plugin timed out while executing system call [21:19:49] ^ it's ok, depooled [21:20:36] (03CR) 10ArielGlenn: [C: 032] snapshots: add scap deploy key info for dumpsgen user [puppet] - 10https://gerrit.wikimedia.org/r/392712 (owner: 10ArielGlenn) [21:21:16] tx bblack [21:26:41] !log ariel@tin Started deploy [dumps/dumps@16f92d6]: take 2: gzip namespace and abstract dumps; remove last configfile existence checks [21:26:43] !log ariel@tin Finished deploy [dumps/dumps@16f92d6]: take 2: gzip namespace and abstract dumps; remove last configfile existence checks (duration: 00m 02s) [21:26:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:26:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:28:50] (03PS1) 10BBlack: recnds+anycast: a 
couple more puppetization fixups [puppet] - 10https://gerrit.wikimedia.org/r/392714 [21:29:14] (03CR) 10jerkins-bot: [V: 04-1] recnds+anycast: a couple more puppetization fixups [puppet] - 10https://gerrit.wikimedia.org/r/392714 (owner: 10BBlack) [21:30:14] (03PS2) 10BBlack: recnds+anycast: a couple more puppetization fixups [puppet] - 10https://gerrit.wikimedia.org/r/392714 [21:32:43] (03CR) 10BBlack: [C: 032] recnds+anycast: a couple more puppetization fixups [puppet] - 10https://gerrit.wikimedia.org/r/392714 (owner: 10BBlack) [21:32:45] (03CR) 10Ayounsi: [C: 031] recnds+anycast: a couple more puppetization fixups [puppet] - 10https://gerrit.wikimedia.org/r/392714 (owner: 10BBlack) [21:34:39] RECOVERY - Recursive DNS on 2620:0:860:1:208:80:153:12 is OK: DNS OK: 0.046 seconds response time. www.wikipedia.org returns 208.80.154.224 [21:34:49] RECOVERY - Recursive DNS on 208.80.153.12 is OK: DNS OK: 0.044 seconds response time. www.wikipedia.org returns 208.80.154.224 [21:36:49] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:47] (03PS20) 10Paladox: Gerrit: Fix up logstash configuation [puppet] - 10https://gerrit.wikimedia.org/r/392079 (https://phabricator.wikimedia.org/T141324) [21:40:56] (03PS7) 10Paladox: Gerrit: Enable logstash for gerrit [puppet] - 10https://gerrit.wikimedia.org/r/392083 (https://phabricator.wikimedia.org/T141324) [21:50:41] (03PS1) 10BBlack: bird config 1.4 compat [puppet] - 10https://gerrit.wikimedia.org/r/392716 [21:51:30] RECOVERY - Check systemd state on acamar is OK: OK - running: The system is fully operational [21:53:02] no stretch yet? [21:53:03] damn :) [21:54:54] one thing at a time [21:55:09] yeah sure, not complaining :) [21:55:17] it bit us in other ways too [21:55:27] oh? 
[21:55:29] bird syntax was checked on stretch, jessie has older syntax :P [21:55:52] oh yeah, that's what I saw and said this [21:56:04] oh, yeah, that makes sense [21:56:17] I think temporal ordering ceased to exist for me, briefly [21:56:19] it's back now [21:56:32] bblack@bast4001:~$ dig +short @10.3.0.1 version.bind CH TXT [21:56:32] "acamar" [21:56:44] :) [21:56:53] ^ that's a random ulsfo host querying via anycast recdns and getting the self-id of a server in codfw :) [21:58:03] (03CR) 10BBlack: [C: 032] bird config 1.4 compat [puppet] - 10https://gerrit.wikimedia.org/r/392716 (owner: 10BBlack) [21:59:22] (03PS3) 10Rush: openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) [21:59:41] (03PS4) 10Rush: openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) [21:59:43] (03CR) 10jerkins-bot: [V: 04-1] openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [22:00:10] (03CR) 10jerkins-bot: [V: 04-1] openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [22:02:36] !log repooling acamar for recdns [22:02:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:02:59] 10Operations, 10Developer-Relations, 10cloud-services-team (Kanban): Create discourse-mediawiki.wmflabs.org (pilot instance) - https://phabricator.wikimedia.org/T180854#3779557 (10Qgil) [22:03:02] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=acamar.wikimedia.org [22:03:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:05:50] !log bblack@neodymium conftool action : set/pooled=no; selector: name=hydrogen.wikimedia.org [22:05:56] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:06:22] (03PS1) 10Chad: updatewikiversions: Only attempt symlink change if needed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392723 [22:08:53] (03CR) 10Thcipriani: [C: 031] updatewikiversions: Only attempt symlink change if needed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392723 (owner: 10Chad) [22:09:50] (03CR) 10Rush: "http://puppet-compiler.wmflabs.org/8896/" [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [22:12:38] !log gerrit - temp disable puppet on cobalt (prod gerrit), test switching gerrit logging to logstash on gerrit2001 - gerrit:392079 gerrit:392083 T141324 [22:12:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:12:46] T141324: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324 [22:13:06] (03PS5) 10Rush: openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) [22:13:14] :) [22:13:26] (03CR) 10jerkins-bot: [V: 04-1] openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [22:14:16] (03CR) 10Dzahn: [C: 032] "looks like paladox has addressed all the comments from gehel, added the async appender, doesn't have to pre-define all those new logfiles " [puppet] - 10https://gerrit.wikimedia.org/r/392079 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [22:14:53] (03PS21) 10Dzahn: Gerrit: Fix up logstash configuation [puppet] - 10https://gerrit.wikimedia.org/r/392079 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [22:15:41] (03PS6) 10Rush: openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) [22:16:01] (03CR) 10jerkins-bot: [V: 04-1] openstack: dualing regex 
matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [22:19:29] PROBLEM - HHVM rendering on mw2214 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:20:20] RECOVERY - HHVM rendering on mw2214 is OK: HTTP OK: HTTP/1.1 200 OK - 76094 bytes in 0.307 second response time [22:20:21] (03PS1) 10BBlack: anycast recdns: ferm before bird [puppet] - 10https://gerrit.wikimedia.org/r/392729 [22:21:29] (03PS7) 10Rush: openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) [22:21:53] (03CR) 10jerkins-bot: [V: 04-1] openstack: dualing regex matches for labtest [puppet] - 10https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [22:24:13] (03PS2) 10BBlack: anycast recdns: ferm and ip before bird [puppet] - 10https://gerrit.wikimedia.org/r/392729 [22:26:26] (03PS8) 10Dzahn: Gerrit: Enable logstash for gerrit [puppet] - 10https://gerrit.wikimedia.org/r/392083 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [22:27:30] (03PS9) 10Paladox: Gerrit: Enable logstash for gerrit [puppet] - 10https://gerrit.wikimedia.org/r/392083 (https://phabricator.wikimedia.org/T141324) [22:28:18] (03CR) 10Dzahn: [C: 032] Gerrit: Enable logstash for gerrit [puppet] - 10https://gerrit.wikimedia.org/r/392083 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [22:28:31] (03PS1) 10BBlack: bird: enable after systemctl reload, too [puppet] - 10https://gerrit.wikimedia.org/r/392731 [22:29:07] !log Bootstrapping Cassandra, restbase2004-b.codfw.wmnet (T179422) [22:29:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:29:13] T179422: Reshape RESTBase Cassandra clusters - https://phabricator.wikimedia.org/T179422 [22:32:56] (03CR) 10BBlack: [C: 032] anycast recdns: ferm and ip before bird [puppet] - 10https://gerrit.wikimedia.org/r/392729 (owner: 
10BBlack) [22:33:02] (03PS3) 10BBlack: anycast recdns: ferm and ip before bird [puppet] - 10https://gerrit.wikimedia.org/r/392729 [22:33:05] (03CR) 10BBlack: [V: 032 C: 032] anycast recdns: ferm and ip before bird [puppet] - 10https://gerrit.wikimedia.org/r/392729 (owner: 10BBlack) [22:33:10] (03CR) 10BBlack: [C: 032] bird: enable after systemctl reload, too [puppet] - 10https://gerrit.wikimedia.org/r/392731 (owner: 10BBlack) [22:33:18] (03PS2) 10BBlack: bird: enable after systemctl reload, too [puppet] - 10https://gerrit.wikimedia.org/r/392731 [22:33:21] (03CR) 10BBlack: [V: 032 C: 032] bird: enable after systemctl reload, too [puppet] - 10https://gerrit.wikimedia.org/r/392731 (owner: 10BBlack) [22:36:44] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=hydrogen.wikimedia.org [22:36:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:36:51] 10Operations, 10Discovery, 10Wikimedia-Mailing-lists, 10Wikimedia-Portals, 10Discovery-Portal-Sprint: Email list needed for automating the Wikipedia.org portal - https://phabricator.wikimedia.org/T180976#3775108 (10hashar) Thank you @RobH . As usual 5/5 :] [22:38:00] !log bblack@neodymium conftool action : set/pooled=no; selector: name=achernar.wikimedia.org [22:38:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:39:52] lol, so close! 
but still: [22:39:53] Notice: /Stage[main]/Dnsrecursor/Service[pdns-recursor]: Triggered 'refresh' from 2 events [22:39:56] Notice: /Stage[main]/Ferm/Service[ferm]: Triggered 'refresh' from 3 events [22:40:35] ferm::service Before does not imply Service['ferm']'s before :) [22:45:22] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=achernar.wikimedia.org [22:45:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:45:52] (03PS1) 10BBlack: recdns anycast: bird class requires ferm service [puppet] - 10https://gerrit.wikimedia.org/r/392734 [22:46:46] (03CR) 10BBlack: [C: 032] recdns anycast: bird class requires ferm service [puppet] - 10https://gerrit.wikimedia.org/r/392734 (owner: 10BBlack) [22:47:20] PROBLEM - puppet last run on es2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:48:48] 10Operations, 10monitoring, 10Graphite, 10User-fgiunchedi: Programmatic generation of grafana dashboards - https://phabricator.wikimedia.org/T171482#3779727 (10Krinkle) [22:51:05] 10Operations, 10Graphite, 10Upstream: Grafana: Job Queue Health: Panel is displayed incorrectly - https://phabricator.wikimedia.org/T130512#3779730 (10Krinkle) [22:52:43] 10Operations, 10Graphite, 10Upstream: Grafana: Job Queue Health: Panel is displayed incorrectly - https://phabricator.wikimedia.org/T130512#2138003 (10Krinkle) 05Open>03Resolved a:03Krinkle [22:53:02] (03PS6) 10Smalyshev: Enable configuration for aliasing namespaces [puppet] - 10https://gerrit.wikimedia.org/r/392554 (https://phabricator.wikimedia.org/T181016) [22:56:54] (03PS7) 10Smalyshev: Enable configuration for aliasing namespaces [puppet] - 10https://gerrit.wikimedia.org/r/392554 (https://phabricator.wikimedia.org/T181016) [22:56:56] (03PS1) 10Smalyshev: Create script for automatic reload of categories [puppet] - 10https://gerrit.wikimedia.org/r/392736 (https://phabricator.wikimedia.org/T173772) [23:07:19] PROBLEM - Check systemd 
state on achernar is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:12:20] RECOVERY - Check systemd state on achernar is OK: OK - running: The system is fully operational
[23:12:21] RECOVERY - puppet last run on es2004 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[23:18:18] (PS1) BBlack: anycast dns: use router loopbacks for neighbors [puppet] - https://gerrit.wikimedia.org/r/392740
[23:20:32] (CR) Ayounsi: [C: 1] anycast dns: use router loopbacks for neighbors [puppet] - https://gerrit.wikimedia.org/r/392740 (owner: BBlack)
[23:26:11] !log mholloway-shell@tin Started deploy [mobileapps/deploy@fc01242]: Update mobileapps to 52d6a83
[23:26:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:26:48] (CR) BBlack: [C: 2] anycast dns: use router loopbacks for neighbors [puppet] - https://gerrit.wikimedia.org/r/392740 (owner: BBlack)
[23:29:11] (PS1) Ayounsi: Bird: use multihop eBGP (peer with router's loopback) [puppet] - https://gerrit.wikimedia.org/r/392742
[23:30:02] (CR) BBlack: [C: 2] Bird: use multihop eBGP (peer with router's loopback) [puppet] - https://gerrit.wikimedia.org/r/392742 (owner: Ayounsi)
[23:30:11] (PS2) BBlack: Bird: use multihop eBGP (peer with router's loopback) [puppet] - https://gerrit.wikimedia.org/r/392742 (owner: Ayounsi)
[23:30:33] (PS3) Ayounsi: Bird: use multihop eBGP (peer with router's loopback) [puppet] - https://gerrit.wikimedia.org/r/392742
[23:30:35] (CR) BBlack: [V: 2 C: 2] Bird: use multihop eBGP (peer with router's loopback) [puppet] - https://gerrit.wikimedia.org/r/392742 (owner: Ayounsi)
[23:30:48] !log mholloway-shell@tin Finished deploy [mobileapps/deploy@fc01242]: Update mobileapps to 52d6a83 (duration: 04m 36s)
[23:30:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:38:58] (PS10) TerraCodes: $wmf* -> $wmg*
[mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[23:39:30] PROBLEM - Check systemd state on achernar is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:40:54] (PS1) BBlack: bird: add local source IP for BGP+BFD [puppet] - https://gerrit.wikimedia.org/r/392746
[23:41:30] RECOVERY - Check systemd state on achernar is OK: OK - running: The system is fully operational
[23:42:00] (CR) BBlack: [C: 2] bird: add local source IP for BGP+BFD [puppet] - https://gerrit.wikimedia.org/r/392746 (owner: BBlack)
[23:55:26] !log bblack@neodymium conftool action : set/pooled=no; selector: name=chromium.wikimedia.org
[23:55:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log