[00:00:02] although, abandoned
[00:00:04] RoanKattouw ostriches Krenair awight: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160304T0000).
[00:00:04] RoanKattouw Jdlrobson ebernhardson Dereckson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[00:00:11] oh, no. turns out it's abandoned. okay
[00:00:42] " Jdlrobson Mar 3 6:47 PM
[00:00:42] Abandoned
[00:00:42] Will roll out on train today.
[00:00:42] We may need to SWAT a follow up patch though"
[00:00:57] Hi.
[00:01:58] (PS2) Awight: Disable useless Echo eventlogging schema [mediawiki-config] - https://gerrit.wikimedia.org/r/274345 (owner: Catrope)
[00:02:05] (CR) Awight: [C: 2] Disable useless Echo eventlogging schema [mediawiki-config] - https://gerrit.wikimedia.org/r/274345 (owner: Catrope)
[00:02:31] (CR) Awight: [C: 2] Set default completion suggester scoring for beta and prod [mediawiki-config] - https://gerrit.wikimedia.org/r/274777 (owner: EBernhardson)
[00:02:36] Operations, Research-and-Data, Wikimedia-Mailing-lists: Close / Archive rcom-l - https://phabricator.wikimedia.org/T128141#2086927 (DarTar) Resolved>Open HI @Dzahn, reopening this because I keep getting notification from the moderation queue for rcom-l.
[00:02:42] (Merged) jenkins-bot: Disable useless Echo eventlogging schema [mediawiki-config] - https://gerrit.wikimedia.org/r/274345 (owner: Catrope)
[00:02:55] (PS2) Awight: Set default completion suggester scoring for beta and prod [mediawiki-config] - https://gerrit.wikimedia.org/r/274777 (owner: EBernhardson)
[00:03:28] (CR) Awight: [C: 2] Site name configuration on wuu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/274196 (https://phabricator.wikimedia.org/T128354) (owner: Dereckson)
[00:03:43] * RoanKattouw waves
[00:03:59] RoanKattouw: hey! I'm stealing yr SWAT shift today, unless you'd like to do the honors.
[00:04:28] Go for it
[00:04:34] Operations, Analytics, hardware-requests, Patch-For-Review: AQS replacement nodes - https://phabricator.wikimedia.org/T124947#2086933 (RobH) a:RobH>JAllemandou Please note we don't want to use non-supported SSDs in production. As such, we don't want to purchase more of the Samsung SSDs. We...
[00:04:36] (Merged) jenkins-bot: Site name configuration on wuu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/274196 (https://phabricator.wikimedia.org/T128354) (owner: Dereckson)
[00:04:58] (PS3) Awight: Set default completion suggester scoring for beta and prod [mediawiki-config] - https://gerrit.wikimedia.org/r/274777 (owner: EBernhardson)
[00:05:03] Operations, Analytics, hardware-requests, Patch-For-Review: eqiad: (3)AQS replacement nodes - https://phabricator.wikimedia.org/T124947#1970805 (RobH)
[00:05:11] Operations, Analytics, hardware-requests, Patch-For-Review: eqiad: (3) AQS replacement nodes - https://phabricator.wikimedia.org/T124947#1970805 (RobH)
[00:05:25] awight: \o :)
[00:05:33] ottomata: damn dude
[00:05:43] you went on a server request spree ;]
[00:05:45] huh. mwversionsinuse doesn't seem to be right
[00:06:05] ottomata: ive triaged them all and some came right back to you for further info =]
[00:07:10] Operations, hardware-requests, codfw-rollout, codfw-rollout-Jan-Mar-2016: codfw: (2) servers for redis jobrunners - https://phabricator.wikimedia.org/T126453#2086950 (RobH)
[00:07:26] I'll push the config changes first...
[00:07:41] k
[00:09:16] gah. no more scap-dir
[00:10:37] Operations, MobileFrontend, Traffic, MW-1.27-release, and 6 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2086965 (Jdlrobson) >>! In T124356#2086258, @Jd...
[00:10:44] awight: sync-dir and mwversions is correct (I think) it's just 1.27.0-wmf.15 right now
[00:10:59] yeah, should just be 1
[00:11:22] thcipriani: ah, thanks! I was reading https://www.mediawiki.org/wiki/MediaWiki_1.27/Roadmap and the date is correct, but the green checkmark perhaps is not.
[00:12:03] yeah, seems like we're missing one there.
[00:12:09] !log awight@tin Synchronized wmf-config: SWAT: Disable useless Echo eventlogging schema; Site name configuration on wuu.wikipedia; Set default completion suggester scoring for beta and prod (duration: 00m 36s)
[00:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:12:18] Testing for wuu.
[00:12:23] thanks!
[00:12:51] Operations, Wikimedia-Mailing-lists: Change configuration of research-newsletter list - https://phabricator.wikimedia.org/T128817#2086968 (DarTar)
[00:13:05] wiki gnoming is hard
[00:13:17] Operations, ops-ulsfo: ulsfo UL Nagios "host DOWN" for pdua-122/pdua-123 - https://phabricator.wikimedia.org/T128383#2086983 (RobH) Open>Resolved They fixed this for me on Monday and I neglected to resolve this task until now. All nagios alerts from unitedlayer now go to the noc email not the main...
[00:13:25] jdlrobson: sorry for the déjà vu, but did you have a second patch you wanted deployed?
[00:14:16] Operations: track down and power off spare systems hitting dhcp - https://phabricator.wikimedia.org/T122990#2086988 (RobH) Open>Resolved
[00:15:06] awight: doesn't work
[00:15:39] awight: thanks for deploying. My patch has 0 effect on production (it already had that setting, and it's only used by a maintenance script), so nothing in particular to test.
[00:15:50] so all good from my end :)
[00:16:07] awight: could you mwscript eval and check $wgSiteName?
[00:16:09] abian: thanks for the note!
[00:16:16] Dereckson: sure-- on wuuwiki?
[00:16:16] s/mwscript eval/mwrepl/
[00:16:19] yup
[00:16:34] mwrepl is 10x better, imo ;)
[00:16:49] ebernhardson: it's based on Psysh ?
[00:16:51] eh
[00:16:55] Dereckson: no, it's the hhvm interactive debugger
[00:17:01] just with mediawiki pre-loaded
[00:17:04] ok
[00:17:18] so you not only get a repl, you get conditional breakpoints, global state introspection, etc.
[00:17:42] ebernhardson: U have a link to docs? it doesn't respond to --help which causes my brow to furrow
[00:18:37] awight: `mwrepl` by itself gives you testwiki, `mwrepl <wiki>` gets you the named wiki. Type help from inside the repl for more details (it's quite complete)
[00:19:06] awight: apologies
[00:19:13] got completely derailed. No patches to merge
[00:19:25] there was an issue with the 2nd patch and first not needed
[00:19:26] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There are 4 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[00:19:28] sorry for not updating sooner
[00:19:44] Dereckson: "Wikipedia"
[00:20:00] jdlrobson: see you in a week! ;)
[00:20:07] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 4 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[00:20:32] awight: okay, so the changes you merged don't seem deployed
[00:21:26] Dereckson: whoa--stupid mistake.
[00:21:40] I carefully read the log, and sync'd. never did the rebase
[00:21:51] (PS1) Yuvipanda: labs: Kill NFS from the wikistats project [puppet] - https://gerrit.wikimedia.org/r/274854 (https://phabricator.wikimedia.org/T128816)
[00:21:56] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[00:22:46] (PS2) Yuvipanda: labs: Kill NFS from the wikistats project [puppet] - https://gerrit.wikimedia.org/r/274854 (https://phabricator.wikimedia.org/T128816)
[00:23:20] !log awight@tin Synchronized wmf-config: SWAT: Disable useless Echo eventlogging schema; Site name configuration on wuu.wikipedia; Set default completion suggester scoring for beta and prod (take 2) (duration: 00m 32s)
[00:23:53] Dereckson: looks like your change is deployed now! "维基百科"
[00:24:11] Indeed, thanks for the deploy.
[00:24:29] (CR) Yuvipanda: [C: 2] labs: Kill NFS from the wikistats project [puppet] - https://gerrit.wikimedia.org/r/274854 (https://phabricator.wikimedia.org/T128816) (owner: Yuvipanda)
[00:24:40] * awight punches timecard
[00:24:48] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[00:25:10] awight: can I submit a follow-up patch? Speaking of stupid mistakes, I put the namespace in the wrong setting, wgMetaNamespaceTalk instead of wgMetaNamespace.
[00:25:18] Dereckson: sure!
[00:27:58] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087040 (Dzahn) edit: let me make one exception from that. let us keep the fr-tech-ops alias on our side, but all others can be moved. thank you
[00:28:11] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087041 (Dzahn)
[00:30:33] (PS1) Dereckson: Namespace configuration on wuu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/274857
[00:30:44] (CR) Dereckson: "Follow-up: Iccc4b0cdda94cd0b0be1145dc96a213338be3354" [mediawiki-config] - https://gerrit.wikimedia.org/r/274196 (https://phabricator.wikimedia.org/T128354) (owner: Dereckson)
[00:30:55] awight: ^
[00:33:31] (CR) Awight: [C: 2] Namespace configuration on wuu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/274857 (owner: Dereckson)
[00:34:35] !log Running extensions/Echo/maintenance/backfillUnreadWikis.php on all wikis. This will probably take a few days
[00:34:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Mr. Obvious
[00:35:00] !log awight@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: followup Namespace configuration on wuu.wikipedia (duration: 00m 26s)
[00:35:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:35:05] Dereckson: lmk if that looks good, now
[00:35:24] * Dereckson concurs.
[00:36:10] Thanks.
[00:36:40] enjoy! ;)
[00:37:12] Operations, Mail: move travel related aliases to OIT - https://phabricator.wikimedia.org/T127549#2087061 (Dzahn) We tried this out and talked on IRC. I commented them and Byron added them but it wouldn't work, there were still bounces. Reverted, while there seemed no such issue with the fundraising aliase...
[00:40:03] (PS10) Dzahn: RT: add role to krypton [puppet] - https://gerrit.wikimedia.org/r/250047 (https://phabricator.wikimedia.org/T119112)
[00:51:32] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087111 (bbogaert) @atgo Do we need fundraiser-2012 or fr-2012? Thanks, Byron
[00:51:51] Operations, hardware-requests: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#2087114 (RobH)
[00:52:02] Operations, hardware-requests: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#2087128 (RobH)
[00:52:04] Operations, ops-codfw, Patch-For-Review: power off Codfw-Cisco Servers - https://phabricator.wikimedia.org/T115372#2087129 (RobH)
[00:52:07] Operations, hardware-requests: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#2087114 (RobH)
[00:52:09] Operations, ops-codfw, Patch-For-Review: power off Codfw-Cisco Servers - https://phabricator.wikimedia.org/T115372#1722630 (RobH) Open>Resolved
[00:53:02] Operations, hardware-requests: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#2087114 (RobH)
[00:56:19] Operations, hardware-requests: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#2087139 (RobH)
[00:59:30] Operations, ops-eqiad, hardware-requests: Decommission calcium - https://phabricator.wikimedia.org/T116790#1758512 (mmodell) >>! In T116790#1934848, @Cmjohnson wrote: > Wiped - awaiting approval to decommission //Decalcified?// sorry, I couldn't resist.
[01:00:15] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087158 (bbogaert) @Dzahn Who was in fr-tech-ops? I can not nest/inherit other groups in Google Groups.
[01:01:14] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087162 (Dzahn) @bbogaert fr-tech-ops: cmjohnson, jgreen it's back on our side
[01:04:47] PROBLEM - puppet last run on mw2181 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:04:49] Operations, Wikimedia-Mailing-lists: Change configuration of research-newsletter list - https://phabricator.wikimedia.org/T128817#2087166 (Tbayer) p:Triage>Lowest (As noted in the parallel email thread, I made two changes already - without seeing this ticket - that may have taken care of the situati...
[01:06:46] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087168 (bbogaert) @Dzahn Thanks, I just need to make sure they were added to fr-tech
[01:07:21] Operations, ops-eqiad: What to do with decommissioned ciscos? - https://phabricator.wikimedia.org/T103374#2087170 (Peachey88)
[01:07:23] Operations, hardware-requests: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#2087169 (Peachey88)
[01:16:33] haha, robh thank you!
[01:16:42] yeah, i realized that i should have created those things right after we were done budgeting
[01:19:27] Operations, RESTBase, Services, Traffic, and 2 others: Split slash decoding from general percent normalization in Varnish VCL - https://phabricator.wikimedia.org/T127387#2087179 (GWicke) p:High>Normal
[01:22:37] (PS1) Smalyshev: Add caching headers for nginx [puppet] - https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730)
[01:30:52] (PS2) Smalyshev: Add caching headers for nginx [puppet] - https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730)
[01:31:56] RECOVERY - puppet last run on mw2181 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:32:00] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087211 (bbogaert) Hi Daniel, I have created the Google Groups: fr-development, fr-software-engineers, fr-tech, fr-online, fr-all, strategicpartnerships, and advancement. When you have time let's remove t...
[01:37:45] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087216 (Dzahn) Hi Byron, i'm here. we are starting with fr-development .. Switching to IRC session.. doing these one by one. btw, it's not mailman, it's just exim. mailman is lists.wikimedia.org and a...
[01:38:45] Operations, CirrusSearch, Discovery, Discovery-Search-Sprint, and 4 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2087218 (EBernhardson) For the pool counter part i filed a task, T128761, and wrote a patch which is up now. I used (p75+cross dc latency...
[01:38:55] Operations, media-storage: Unable to delete, restore/undelete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2087220 (Osiris) I was able to delete the one on simplewiki just now. Thank you!
[01:40:23] (Abandoned) MaxSem: Fix GWToolset-related fatal [mediawiki-config] - https://gerrit.wikimedia.org/r/270459 (https://phabricator.wikimedia.org/T126830) (owner: MaxSem)
[01:42:08] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087232 (Dzahn) >>! In T128647#2087211, @bbogaert wrote: > I have created fundraising_vmail but need to know who is in //donate// before it is completed. so donate@ is just fundraising@ and then there ar...
[01:50:12] (PS2) Yuvipanda: labs: Revert all work around CNAMEs for toollabs [puppet] - https://gerrit.wikimedia.org/r/274179 (https://phabricator.wikimedia.org/T118758)
[01:50:23] (Abandoned) Yuvipanda: labs: Revert all work around CNAMEs for toollabs [puppet] - https://gerrit.wikimedia.org/r/274179 (https://phabricator.wikimedia.org/T118758) (owner: Yuvipanda)
[01:51:18] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087247 (Dzahn) permanently removed: ``` -fundraiser-2012: fr-online -fr-2012: fr-online ``` deactivated after Byron added them: ``` -fr-software-engineers: agreen, awight, eeggleston, khorn, dkozlows...
[01:52:39] Operations, hardware-requests: +1 'stat' type box for hadoop client usage - https://phabricator.wikimedia.org/T128808#2087249 (Ottomata) WMF4541 sounds great!
[01:52:53] Operations, hardware-requests: +1 'stat' type box for hadoop client usage - https://phabricator.wikimedia.org/T128808#2087251 (Ottomata) a:Ottomata>RobH
[01:56:56] (PS3) Smalyshev: Add caching headers for nginx [puppet] - https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730)
[02:02:07] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087265 (Dzahn) Ok, Byron confirmed it's working. I removed all of that, also "fundraising_vmail@" and the only one that remains from that block is fr-tech-ops@ because that just makes sense to stay in ops...
[02:04:45] Operations, Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#2087273 (Dzahn)
[02:04:47] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087271 (Dzahn) Open>Resolved a:Dzahn
[02:06:52] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087274 (Dzahn) There is more fundraising related stuff in other subtasks but the original block in this ticket is gone.
[02:09:14] (CR) Dzahn: [C: 2] RT: add role to krypton [puppet] - https://gerrit.wikimedia.org/r/250047 (https://phabricator.wikimedia.org/T119112) (owner: Dzahn)
[02:13:36] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087287 (atgo) @dzahn donate@ is an alias we use to send email. It's important not to mess with that one. I believe those emails go to zendesk (our donor services console). @mbeat33 @ccogdill_wmf please loo...
[02:13:41] (PS3) Dzahn: role/dns: move to module role, rename ::dnsrecursor [puppet] - https://gerrit.wikimedia.org/r/271735
[02:14:03] (CR) jenkins-bot: [V: -1] role/dns: move to module role, rename ::dnsrecursor [puppet] - https://gerrit.wikimedia.org/r/271735 (owner: Dzahn)
[02:18:37] PROBLEM - puppet last run on krypton is CRITICAL: CRITICAL: Puppet has 1 failures
[02:20:04] Operations, Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2087289 (Dzahn) @atgo Thank you, ok! We haven't touched donate@ . Even though i mentioned that on T128647#2087232 only the things on T128647#2087247 have been touched (so far). The following lines are defi...
[02:21:37] krypton is me
[02:22:34] i wanted to see what error i get and yea, it's something fun
[02:22:44] Error: Cannot create /var/cache/request-tracker4/data/RT-Shredder; ... history
[02:23:00] Shredder was for secure deleting things from tickets
[02:26:05] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.15) (duration: 11m 22s)
[02:26:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:27:10] Operations, CirrusSearch, Discovery, Discovery-Search-Sprint, and 4 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2087323 (EBernhardson) For configuring CirrusSearch to use https connections and utilize a specific pem file i have put together a patch...
[02:27:58] (PS1) EBernhardson: Use https to talk to elasticsearch in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/274877 (https://phabricator.wikimedia.org/T124444)
[02:33:37] (PS1) Dzahn: RT: do not load shredder plugin [puppet] - https://gerrit.wikimedia.org/r/274879
[02:33:50] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Mar 4 02:33:49 UTC 2016 (duration 7m 44s)
[02:33:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:34:44] (CR) Dzahn: [C: 2] RT: do not load shredder plugin [puppet] - https://gerrit.wikimedia.org/r/274879 (owner: Dzahn)
[02:36:14] RECOVERY - puppet last run on krypton is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[02:44:11] Operations, Mail, fundraising-tech-ops: donation aliases for moneybookers? - https://phabricator.wikimedia.org/T127489#2045121 (Dzahn) Hi FR people, do we still need the mail aliases above (donationGPB@, donationAUD@ etc for moneybookers? )
[02:45:17] Operations, Mail: delete communicationsintern@ mail alias ? - https://phabricator.wikimedia.org/T127546#2087338 (Dzahn) @Varnent do you have an intern nowadays?
[02:46:49] Operations, Labs, Mail, Tool-Labs: remove toolserver mail aliases - https://phabricator.wikimedia.org/T127543#2087339 (Dzahn) p:Normal>Low
[02:47:00] yuvipanda: ^ :p
[02:47:16] ts-admins@
[02:49:44] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 65.22% of data above the critical threshold [5000000.0]
[02:56:33] afk's
[03:00:35] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[03:25:35] RECOVERY - cassandra-b CQL 10.64.0.115:9042 on restbase1010 is OK: TCP OK - 0.005 second response time on port 9042
[03:28:28] !log starting decommission of restbase1009.eqiad.wmnet : T95253
[03:28:29] T95253: Finish conversion to multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253
[03:28:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:34:07] !log Starting `nodetool cleanup' on restbase100{1,2,7-a,7-b}.eqiad.wmnet and restbase1010-a : T95253
[03:34:08] T95253: Finish conversion to multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253
[03:34:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:47:33] PROBLEM - MariaDB Slave SQL: m3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1007, Errmsg: Error Cant create database heartbeat: database exists on query. Default database: heartbeat. Query: create database heartbeat
[04:21:24] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [100000000.0]
[04:30:54] PROBLEM - MariaDB Slave SQL: m3 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1007, Errmsg: Error Cant create database heartbeat: database exists on query. Default database: heartbeat. Query: create database heartbeat
[05:15:14] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[05:24:56] (CR) 20after4: [C: 1] gerrit: Whitelist PNG as safe to render in change sets [puppet] - https://gerrit.wikimedia.org/r/274741 (owner: Krinkle)
[05:51:06] Operations, Mail: delete communicationsintern@ mail alias ? - https://phabricator.wikimedia.org/T127546#2087507 (Varnent) @Dzahn We have team interns, but I do not have one working with me in particular. :) We have talked about this list a bit on our staff list. It doesn't sound like we are in need of thi...
[05:59:38] Operations, Mail: delete communicationsintern@ mail alias ? - https://phabricator.wikimedia.org/T127546#2087510 (Varnent) Just got confirmation, sounds like we are ready to live without this alias. :) You are welcome to delete this. I can verify that Comms is good with that. Thank you! -greg
[05:59:44] PROBLEM - puppet last run on mw2069 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:08:10] Operations, Labs: Can't create account "Bishoy Camel" (user with a former SVN account not migrated) - https://phabricator.wikimedia.org/T128833#2087511 (BishoyCamel)
[06:18:58] Operations, Labs: Can't create account "Bishoy Camel" (user with a former SVN account not migrated) - https://phabricator.wikimedia.org/T128833#2087511 (Peachey88) Hi @BishoyCamel, Can you please provide the following information so a labs admin can work on the issue: Preferred wikitech username: Prefer...
[06:21:22] (PS3) Dzahn: gerrit: Whitelist PNG as safe to render in change sets [puppet] - https://gerrit.wikimedia.org/r/274741 (owner: Krinkle)
[06:21:55] (CR) Dzahn: [C: 2] "with the +1 from csteipp now, i'm removing my own -1 and merge it instead" [puppet] - https://gerrit.wikimedia.org/r/274741 (owner: Krinkle)
[06:23:39] (CR) Dzahn: "will cause gerrit restart, doing now when fewer use it. then good night" [puppet] - https://gerrit.wikimedia.org/r/274741 (owner: Krinkle)
[06:24:40] !log gerrit being restarted for config change 274741
[06:24:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:26:34] RECOVERY - puppet last run on mw2069 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:29:43] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: puppet fail
[06:30:24] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:35] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:45] PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:45] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:54] PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:03] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:13] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:25] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:25] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:25] PROBLEM - puppet last run on mw2081 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:53] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:54] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:04] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:13] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:23] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:35:23] Operations, Mail: delete communicationsintern@ mail alias ? - https://phabricator.wikimedia.org/T127546#2087547 (Dzahn) @Varnent Thank you very much for checking this. I just removed it now. ``` -communicationintern: communicationsintern - ``` In the " ## Communications ##" section we now just have...
[06:35:46] Operations, Mail: delete communicationsintern@ mail alias ? - https://phabricator.wikimedia.org/T127546#2087548 (Dzahn) Open>Resolved a:Dzahn
[06:35:47] Operations, Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#2087550 (Dzahn)
[06:40:24] Operations, Research-and-Data, Wikimedia-Mailing-lists: Close / Archive rcom-l - https://phabricator.wikimedia.org/T128141#2087573 (Dzahn) a:Dzahn
[06:43:42] Operations, Research-and-Data, Wikimedia-Mailing-lists: Close / Archive rcom-l - https://phabricator.wikimedia.org/T128141#2087577 (Dzahn) Hey @DarTar i just removed your email address from admin and moderator fields (and also Erik). That seemed to be the easiest solution to make sure you don't get spa...
[06:56:24] RECOVERY - puppet last run on mw2081 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:56:53] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:56:53] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:57:04] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:57:05] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:57:05] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:57:24] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:34] RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:34] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:44] RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:44] RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:55] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:13] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:14] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:08:52] Operations: Sudden increase in NOTICE events from hhvm while trying to de-pool rdb1003 for maintenance - https://phabricator.wikimedia.org/T128730#2087621 (elukey)
[07:08:54] Operations, Patch-For-Review: Reinstall redis servers (Job queues) with Jessie (NOTE: rdb1002 is special and is excluded!) - https://phabricator.wikimedia.org/T123675#2087620 (elukey)
[07:23:41] I warn you in advance: today delayed slaves' replication will fail several times. It won't page, and the errors are expected, due to their delayed nature
[07:24:15] I will not downtime it because it will only show errors on IRC, and need to see them
[07:29:14] RECOVERY - MariaDB Slave SQL: m3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[07:32:48] hey, I'm sooo new to puppet so my question might sound really silly. If we do "require_package" instead of "ensure_package" does it make the package to initiate? in general what's the difference between these two
[07:33:25] RECOVERY - MariaDB Slave SQL: m3 on dbstore2001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[07:35:10] I tried google, but no good luck
[07:43:01] Amir1, https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/stdlib/lib/puppet/parser/functions/ensure_packages.rb vs https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/wmflib/lib/puppet/parser/functions/require_package.rb
[07:44:08] thanks jynus
[07:45:04] I know it is not too helpful, but it is how I would check the differences
[07:46:05] Actually it's very helpful in my case :) I've got what I've needed
[07:57:11] Operations, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, Services, and 3 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2087700 (Arrbee)
[07:59:10] Another question, how do you make patches in operations/puppet.git? even in shallow mode, my pc and labs both choked
[08:01:17] trying --depth 1
[08:03:31] Amir1: what error message are you getting?
[08:03:52] you have to make sure you're pushing to the 'production' branch, there is no "master" branch [08:03:54] https://www.irccloud.com/pastebin/A6ZvzX08/ [08:04:11] I'm trying to clone [08:04:20] legoktm: ^ [08:04:22] umm, you need to give it your passphrase? [08:04:25] or just clone over https? [08:04:58] git clone https://gerrit.wikimedia.org/r/operations/puppet.git [08:05:04] my ppk in labs is closed with a passphrase so whenever I use labs to make patches I enter the password [08:05:18] but in my pc, it's open [08:05:27] ok, let me try [08:06:21] ok, it's working! [08:06:50] let me see if I can make patches :) [08:06:50] thanks :) [08:15:07] internal server error [08:15:14] I guess gerrit is down :( [08:15:31] ladsgroup@tools-bastion-02:~$ git clone --depth 10 https://gerrit.wikimedia.org/r/operations/puppet.git [08:15:31] Cloning into 'puppet'... [08:15:39] and hangs [08:23:32] wtf, it works in my pc not labs [08:35:15] Amir1: ops/puppet is a really big repo (I think it's the largest on our Gerrit), sometimes it just acts weird [08:35:44] Yeah, as I guessed [08:41:37] !log downtiming all mysql replicas lag for 2 hours to test new alert check [08:41:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:44:24] PROBLEM - MariaDB Slave SQL: s1 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1060, Errmsg: Error Duplicate column name shard on query. Default database: heartbeat. 
Query: ALTER TABLE heartbeat ADD COLUMN shard varbinary(10) DEFAULT NULL, ENGINE=MyISAM [08:44:42] that is ok, mentioned it earlier^ [08:46:13] RECOVERY - MariaDB Slave SQL: s1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes [08:47:17] for those joining us now, I expect this to happen once per shard on non-production machines, and will fix immediately (it is difficult to fix for a delayed slave yesterday) [09:10:58] !log re-imaging iron with jessie [09:11:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:11:29] so long iron [09:13:54] PROBLEM - MariaDB Slave SQL: s2 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1060, Errmsg: Error Duplicate column name shard on query. Default database: heartbeat. Query: ALTER TABLE heartbeat ADD COLUMN shard varbinary(10) DEFAULT NULL, ENGINE=MyISAM [09:14:54] PROBLEM - MariaDB Slave SQL: s2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1060, Errmsg: Error Duplicate column name shard on query. Default database: heartbeat. Query: ALTER TABLE heartbeat ADD COLUMN shard varbinary(10) DEFAULT NULL, ENGINE=MyISAM [09:15:08] ^doing [09:16:25] PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1060, Errmsg: Error Duplicate column name shard on query. Default database: heartbeat. Query: ALTER TABLE heartbeat ADD COLUMN shard varbinary(10) DEFAULT NULL, ENGINE=MyISAM [09:16:44] RECOVERY - MariaDB Slave SQL: s2 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes [09:18:14] RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes [09:19:52] waiting now for the rest...
[09:21:04] RECOVERY - MariaDB Slave SQL: s2 on dbstore2001 is OK: OK slave_sql_state Slave_SQL_Running: Yes [09:21:40] 6Operations, 6Labs: Can't create account "Bishoy Camel" (user with a former SVN account not migrated) - https://phabricator.wikimedia.org/T128833#2087889 (10Aklapper) a:5RyanLane>3None [09:21:46] I caught a couple before icinga [09:22:41] 6Operations, 6Labs: Can't create account "Bishoy Camel" (user with a former SVN account not migrated) - https://phabricator.wikimedia.org/T128833#2087511 (10Aklapper) (Removing task assignee as per [[ https://mediawiki.org/wiki/How_to_report_a_bug | guidelines ]].) [10:00:44] (03CR) 10Mobrovac: Services: introduce service::packages (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/274675 (https://phabricator.wikimedia.org/T128280) (owner: 10Mobrovac) [10:01:21] grrrit-wm: welcome back [10:01:27] volans: ^ [10:01:36] thanks! [10:03:37] (03CR) 10Volans: [C: 032] Update codfw external storage server topology [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274914 (https://phabricator.wikimedia.org/T127330) (owner: 10Volans) [10:04:02] (03Merged) 10jenkins-bot: Update codfw external storage server topology [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274914 (https://phabricator.wikimedia.org/T127330) (owner: 10Volans) [10:07:02] (03PS1) 10Hashar: hiera_lookup: enhance help message [puppet] - 10https://gerrit.wikimedia.org/r/274917 [10:07:33] !log volans@tin Synchronized wmf-config/db-codfw.php: Update codfw external storage servers topology T127330 (duration: 00m 39s) [10:07:34] T127330: Migration from es2001-es2010 to es2011-es2019 - https://phabricator.wikimedia.org/T127330 [10:07:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:08:24] 6Operations, 10ops-eqiad: No serial console on iron's mgmt interface - https://phabricator.wikimedia.org/T128845#2087952 (10MoritzMuehlenhoff) [10:08:32] (03CR) 10Hashar: "The main reason for this change is that verbose
option (-v) was not being displayed in the help message and I though it would be idea to e" [puppet] - 10https://gerrit.wikimedia.org/r/274917 (owner: 10Hashar) [10:09:34] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 90185 MB (3% inode=99%) [10:16:38] 6Operations, 7Availability, 5MW-1.27-release-notes, 13Patch-For-Review, and 3 others: Implement a replication strategy for Swift - https://phabricator.wikimedia.org/T91869#2087974 (10fgiunchedi) [10:16:40] 6Operations, 10media-storage: Unable to delete, restore/undelete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2087971 (10fgiunchedi) 5Open>3Resolved a:3fgiunchedi indeed I can't see further e... [10:21:15] (03PS1) 10Jcrespo: Simplify lag checking thanks to new pt-heartbeat-wikimedia [software] - 10https://gerrit.wikimedia.org/r/274919 [10:23:30] (03PS12) 10ArielGlenn: send web server logs from dataset hosts to stat1002 [puppet] - 10https://gerrit.wikimedia.org/r/268129 (https://phabricator.wikimedia.org/T118739) [10:24:31] (03CR) 10ArielGlenn: [C: 032] send web server logs from dataset hosts to stat1002 [puppet] - 10https://gerrit.wikimedia.org/r/268129 (https://phabricator.wikimedia.org/T118739) (owner: 10ArielGlenn) [10:25:26] ^I wanted for sooooo long fix that horrible check [10:27:42] nice! 
[10:27:42] (03PS1) 10Muehlenhoff: Drop references to the source package for the rt flavour [debs/linux44] - 10https://gerrit.wikimedia.org/r/274920 [10:30:22] !log Start copying data from es200[124] to es201[123] (ETA ~16-17h) T127330 [10:30:23] T127330: Migration from es2001-es2010 to es2011-es2019 - https://phabricator.wikimedia.org/T127330 [10:30:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:35:41] 6Operations, 10Datasets-General-or-Unknown, 6WMDE-Analytics-Engineering, 10Wikidata: Push dumps.wm.o logs files to stat1002 - https://phabricator.wikimedia.org/T118739#2088009 (10ArielGlenn) 5Open>3Resolved After a ridiculous amount of help from @Ottomata (thank you!) this is now live, and a manual run... [10:48:08] 6Operations, 10Analytics, 10hardware-requests, 13Patch-For-Review: eqiad: (3) AQS replacement nodes - https://phabricator.wikimedia.org/T124947#2088046 (10JAllemandou) @Eevans : is 64GB memory good (for 2x 6 cores CPU0, or is it better to ask for 128 ? @RobH : 8T useful (after RAID 10) per machine gives u... 
[10:48:31] 6Operations, 10Analytics, 10hardware-requests, 13Patch-For-Review: eqiad: (3) AQS replacement nodes - https://phabricator.wikimedia.org/T124947#2088047 (10JAllemandou) a:5JAllemandou>3RobH [10:50:45] 6Operations: upgrade 15+4 swift servers from precise to trusty - https://phabricator.wikimedia.org/T125024#2088054 (10fgiunchedi) [10:50:47] 6Operations, 13Patch-For-Review: UnicodeDecodeError invalid continuation byte on ms-fe1004 - https://phabricator.wikimedia.org/T128081#2088052 (10fgiunchedi) 5Open>3Resolved logging is now enabled for these exceptions and we're returning 400 to clients [10:58:31] (03PS1) 10Filippo Giunchedi: swiftrepl: bump to 0.0.3 [software] - 10https://gerrit.wikimedia.org/r/274921 [10:58:43] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swiftrepl: bump to 0.0.3 [software] - 10https://gerrit.wikimedia.org/r/274921 (owner: 10Filippo Giunchedi) [10:59:34] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 91471 MB (3% inode=99%) [11:00:19] (03CR) 10Gehel: Add caching headers for nginx (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730) (owner: 10Smalyshev) [11:02:14] (03PS2) 10Jcrespo: Simplify lag checking thanks to new pt-heartbeat-wikimedia [software] - 10https://gerrit.wikimedia.org/r/274919 [11:03:55] (03PS3) 10Jcrespo: Simplify lag checking thanks to new pt-heartbeat-wikimedia [software] - 10https://gerrit.wikimedia.org/r/274919 [11:04:27] (03CR) 10Volans: Simplify lag checking thanks to new pt-heartbeat-wikimedia (031 comment) [software] - 10https://gerrit.wikimedia.org/r/274919 (owner: 10Jcrespo) [11:04:47] (03CR) 10Jcrespo: [C: 032] Simplify lag checking thanks to new pt-heartbeat-wikimedia [software] - 10https://gerrit.wikimedia.org/r/274919 (owner: 10Jcrespo) [11:04:55] (03CR) 10Jcrespo: [V: 032] Simplify lag checking thanks to new pt-heartbeat-wikimedia [software] - 10https://gerrit.wikimedia.org/r/274919 (owner: 
10Jcrespo) [11:08:33] (03PS5) 10Jcrespo: New check for pt-heartbeat-wikimedia including the shards [puppet] - 10https://gerrit.wikimedia.org/r/274680 [11:10:51] (03CR) 10Jcrespo: "@Volans It should be already part of this change (mariadb repo update." [puppet] - 10https://gerrit.wikimedia.org/r/274680 (owner: 10Jcrespo) [11:13:26] (03CR) 10Jcrespo: [C: 032] New check for pt-heartbeat-wikimedia including the shards [puppet] - 10https://gerrit.wikimedia.org/r/274680 (owner: 10Jcrespo) [11:13:56] ) [11:18:17] (03CR) 10Filippo Giunchedi: [C: 031] Services: introduce service::packages [puppet] - 10https://gerrit.wikimedia.org/r/274675 (https://phabricator.wikimedia.org/T128280) (owner: 10Mobrovac) [11:19:07] !log deploying new replication check algorithm cross-fleet [11:19:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:21:44] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 91253 MB (3% inode=99%) [11:27:15] now we can see the lag of all servers, including those stopped [11:27:47] with a [0, 0.5] second error [11:29:14] neat! 
[11:30:04] also, on those that "replication says it is running, but it is blocked", acting as a watchdog [11:30:15] plus, needed for multi-datacenter check [11:32:01] and a hard requirement for active-active datacenters [11:34:21] (03PS5) 10Muehlenhoff: Add nschaaf to researchers, bastiononly [puppet] - 10https://gerrit.wikimedia.org/r/274118 (https://phabricator.wikimedia.org/T128381) [11:34:41] (03CR) 10Muehlenhoff: [C: 032 V: 032] Add nschaaf to researchers, bastiononly [puppet] - 10https://gerrit.wikimedia.org/r/274118 (https://phabricator.wikimedia.org/T128381) (owner: 10Muehlenhoff) [11:36:46] there is high load on s2 [11:38:12] nothing worrying, but unusual (it is higher than enwiki) [11:41:25] 6Operations, 10Ops-Access-Requests, 13Patch-For-Review: Requesting access to researchers for nschaaf - https://phabricator.wikimedia.org/T128381#2088162 (10MoritzMuehlenhoff) 5Open>3Resolved Nathaniel, I've just merged the patch which enables your access. Please give it a try and ping me on IRC (username...
[11:45:11] (03PS1) 10ArielGlenn: Revert "use 10gb nic mac addy for dataset1001 in dhcp" [puppet] - 10https://gerrit.wikimedia.org/r/274927 [11:45:33] (03PS2) 10ArielGlenn: Revert "use 10gb nic mac addy for dataset1001 in dhcp" [puppet] - 10https://gerrit.wikimedia.org/r/274927 [11:46:56] (03CR) 10ArielGlenn: [C: 032] Revert "use 10gb nic mac addy for dataset1001 in dhcp" [puppet] - 10https://gerrit.wikimedia.org/r/274927 (owner: 10ArielGlenn) [11:49:28] (03PS1) 10Muehlenhoff: Disable rt flavour on the source package level [debs/linux44] - 10https://gerrit.wikimedia.org/r/274929 [11:49:30] (03PS1) 10Muehlenhoff: Regenerate rules/control files after disabling rt flavour [debs/linux44] - 10https://gerrit.wikimedia.org/r/274930 [11:51:07] 6Operations, 10Dumps-Generation, 13Patch-For-Review: Migrate dataset1001 and ms1001 to jessie - https://phabricator.wikimedia.org/T123724#2088211 (10ArielGlenn) Here we go again, in about one hour: Disable all rsyncs to/from dataset1001 except ms1001, disable any cron jobs that run there Disable cron jobs o... [11:51:54] didn't we have an s2 spike in the recent past with a lot of writes? [11:52:11] jynus: and do you remember what the cause was? 
(I don't) [11:52:20] (03PS2) 10BBlack: remove cache ipsec-specific nodelists [puppet] - 10https://gerrit.wikimedia.org/r/274824 (https://phabricator.wikimedia.org/T127481) [11:52:22] (03PS3) 10BBlack: re-arrange cache ipsec for codfw as a backend [puppet] - 10https://gerrit.wikimedia.org/r/274825 (https://phabricator.wikimedia.org/T127481) [11:52:24] (03PS2) 10BBlack: strongswan: do not rely on $site_tier for dpdaction [puppet] - 10https://gerrit.wikimedia.org/r/274822 (https://phabricator.wikimedia.org/T127481) [11:52:26] (03PS2) 10BBlack: remove $site_tier, no longer used [puppet] - 10https://gerrit.wikimedia.org/r/274823 (https://phabricator.wikimedia.org/T127481) [11:52:29] you have better memory than I do [11:52:37] uh oh [11:52:55] 6Operations, 6Analytics-Kanban, 10Traffic, 13Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2088214 (10elukey) @Ottomata and @faidon: thanks a lot for all the info! >For this, my own recommendation would be to either add JSON & librdkaf... [11:53:02] I just remember thinking it wasn't wikidata or any of he usual suspects and then drawing a blank [11:53:49] yes, I want to remember a spike on updates, which is the current trend [11:53:54] 6Operations, 10Wikimedia-Logstash: Auto generated Logstash unit file has "Restart=no" - https://phabricator.wikimedia.org/T127677#2088217 (10MoritzMuehlenhoff) [11:53:56] 6Operations: Make services manageable by systemd (tracking) - https://phabricator.wikimedia.org/T97402#2088216 (10MoritzMuehlenhoff) [11:55:27] it seems to have calmed down, so I haven't followed up [11:56:35] nice jynus volans es rebuild in codfw is successfully saturating their 1gbit port [11:57:29] is that ironic or ok with that? 
[11:58:32] not ironic no, I got the port utilization emails from librenms, and yes I think that's ok [11:58:44] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 89653 MB (3% inode=99%) [11:59:23] I sent him my recipe, but he improved it :-) [11:59:30] 6Operations, 6Analytics-Kanban, 10Traffic, 13Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2088220 (10BBlack) >>! In T124278#2088214, @elukey wrote: > but we had to import a lot of .c files into varnishkafka's source. An alternative p... [11:59:43] I want to put that as a command on saltmaster [12:00:00] m*ritz also asked for it [12:00:11] godog: yes it is expected, it did also for the other es servers in the past days, I checked with para.void our capacity [12:00:36] puppet compiler console output just failed with: [12:00:36] [ 2016-03-04T11:54:32 ] CRITICAL: Unexpected error running the payload: [Errno 28] No space left on device [12:00:39] [ 2016-03-04T11:54:32 ] WARNING: post-exec callback failed. [12:00:42] [ 2016-03-04T11:54:32 ] CRITICAL: Build run failed: [Errno 28] No space left on device [12:00:49] Building remotely on compiler02.puppet3-diffs.eqiad.wmflabs [12:00:54] you miss the emails in the past days? :) unfortunately there was not an easy way to disable the email alert for port utilization [12:00:55] <_joe_> bblack: heh it finally ran out of space [12:01:26] <_joe_> bblack: I'm going to look at it, we might need to add more machines I guess [12:01:56] volans: heheh no, but I've acknowledged the alarm in librenms now [12:02:27] <_joe_> bblack: yup, fixing [12:02:32] thanks! 
I don't have access [12:06:15] <_joe_> bblack: fixed, we had the results of 2000 compile jobs still there [12:06:43] :) [12:07:56] <_joe_> the fact that there was no rotation was deliberate; I wanted to see how long we could go without deleting stuff [12:09:01] <_joe_> apparently the answer is "somewhere around 6 months" [12:18:29] (03PS4) 10Muehlenhoff: Add ferm rules for maps/cassandra [puppet] - 10https://gerrit.wikimedia.org/r/270280 [12:19:20] (03CR) 10Muehlenhoff: [C: 032 V: 032] Add ferm rules for maps/cassandra [puppet] - 10https://gerrit.wikimedia.org/r/270280 (owner: 10Muehlenhoff) [12:24:29] (03PS1) 10Muehlenhoff: Install php5-readline on mediawiki maintenance hosts [puppet] - 10https://gerrit.wikimedia.org/r/274931 (https://phabricator.wikimedia.org/T126262) [12:40:12] 6Operations, 6Services, 10hardware-requests: Hardware request for SCA and SCB in codfw - https://phabricator.wikimedia.org/T128475#2088273 (10Ricordisamoa) Out of curiosity, will prices be announced once they're final? 
[12:40:53] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 90310 MB (3% inode=99%) [12:53:33] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 87862 MB (3% inode=99%) [13:03:32] !log nfs filesystem from dataset1001 now unavailable as we prep for upgrade [13:03:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:04:36] rats forgot to ack [13:04:43] * apergos goes to set downtime Right Now [13:14:02] (03PS3) 10BBlack: strongswan: do not rely on $site_tier for dpdaction [puppet] - 10https://gerrit.wikimedia.org/r/274822 (https://phabricator.wikimedia.org/T127481) [13:15:30] (03CR) 10BBlack: [C: 032 V: 032] "templates changes look good in compiler" [puppet] - 10https://gerrit.wikimedia.org/r/274822 (https://phabricator.wikimedia.org/T127481) (owner: 10BBlack) [13:20:34] (03PS1) 10Muehlenhoff: Add ferm rules for kartotherian, tilerator and tileratorui [puppet] - 10https://gerrit.wikimedia.org/r/274936 [13:23:07] (03PS3) 10BBlack: remove cache ipsec-specific nodelists [puppet] - 10https://gerrit.wikimedia.org/r/274824 (https://phabricator.wikimedia.org/T127481) [13:23:09] (03PS4) 10BBlack: re-arrange cache ipsec for codfw as a backend [puppet] - 10https://gerrit.wikimedia.org/r/274825 (https://phabricator.wikimedia.org/T127481) [13:23:11] (03PS3) 10BBlack: remove $site_tier, no longer used [puppet] - 10https://gerrit.wikimedia.org/r/274823 (https://phabricator.wikimedia.org/T127481) [13:24:44] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 88269 MB (3% inode=99%) [13:27:16] (03CR) 10BBlack: [C: 032] "compiler no-op (there really are no refs to this var anyways)" [puppet] - 10https://gerrit.wikimedia.org/r/274823 (https://phabricator.wikimedia.org/T127481) (owner: 10BBlack) [13:27:43] (03CR) 10BBlack: [C: 032] "compiler no-op on all relevant classes of hosts" [puppet] - 
10https://gerrit.wikimedia.org/r/274824 (https://phabricator.wikimedia.org/T127481) (owner: 10BBlack) [13:31:45] (03PS5) 10BBlack: re-arrange cache ipsec for codfw as a backend [puppet] - 10https://gerrit.wikimedia.org/r/274825 (https://phabricator.wikimedia.org/T127481) [13:31:51] !log dumps/download wikimedia.org service interrupted now while server is being upgraded [13:31:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:37:10] (03PS2) 10Tim Landscheidt: ores: Move role classes to module role [puppet] - 10https://gerrit.wikimedia.org/r/270102 [13:39:03] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures [13:39:30] !log canceling doomed bootstrap on restbase1009-a.eqiad.wmnet [13:39:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:40:48] (03PS5) 10Tim Landscheidt: Tools: Fix argument quoting in jlocal [puppet] - 10https://gerrit.wikimedia.org/r/266935 [13:41:36] (03PS3) 10Tim Landscheidt: Tools: Outfactor the configuration for outgoing HBA connections [puppet] - 10https://gerrit.wikimedia.org/r/267832 [13:41:45] !log disabling puppet on esams,ulsfo,codfw caches for ipsec changes, to minimize alertspam... 
[13:41:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:42:33] (03CR) 10BBlack: [C: 032] "Compiler output looks good on relevant hosts" [puppet] - 10https://gerrit.wikimedia.org/r/274825 (https://phabricator.wikimedia.org/T127481) (owner: 10BBlack) [13:43:14] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 90264 MB (3% inode=99%) [13:47:33] PROBLEM - puppet last run on cp2012 is CRITICAL: CRITICAL: Puppet last ran 14 hours ago [13:48:26] !log installing pillow security updates [13:48:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:49:24] RECOVERY - puppet last run on cp2012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:52:43] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 91317 MB (3% inode=99%) [14:02:43] !log puppet back online for all caches (ipsec changes complete) [14:02:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:04:06] !log installing postgres security updates on labsdb1004 [14:04:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:05:23] 6Operations, 10Traffic, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Switch ulsfo to backend to codfw rather than eqiad - https://phabricator.wikimedia.org/T127492#2088421 (10BBlack) [14:05:25] 6Operations, 10Traffic, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Enable VCL source-DC switching via confd - https://phabricator.wikimedia.org/T127482#2088422 (10BBlack) [14:05:27] 6Operations, 10Traffic, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Traffic Infrastructure support for Mar 2016 codfw rollout - https://phabricator.wikimedia.org/T125510#2088423 (10BBlack) [14:06:54] <_joe_> bblack: did you check the mc* hosts by any chance? 
[14:07:10] 6Operations, 10Traffic: Port varnishlog.py to new VSL API - https://phabricator.wikimedia.org/T128788#2088424 (10ema) a:3ema [14:07:18] _joe_: yes [14:08:03] (well by "check" I mean I tested them in compiler, and I watched their alerts in icinga for ipsec. I didn't log into them and look around really) [14:08:58] most of the changes are cache/kafka -only anyways, not mc. the only one that touched mc was the first one for dpdaction ( https://gerrit.wikimedia.org/r/#/c/274822/ ) [14:12:53] (03PS1) 10Ema: Port varnishlog to new VSL API [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) [14:12:54] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [14:13:29] mmmmm --^ [14:14:56] (03CR) 10jenkins-bot: [V: 04-1] Port varnishlog to new VSL API [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) (owner: 10Ema) [14:15:44] (03PS2) 10ArielGlenn: fix up dataset nginx confs for jessie, ipv6only defaults to on now (!) [puppet] - 10https://gerrit.wikimedia.org/r/274168 [14:16:17] !log installing perl security updates [14:16:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:17:01] (03CR) 10ArielGlenn: [C: 032] fix up dataset nginx confs for jessie, ipv6only defaults to on now (!) [puppet] - 10https://gerrit.wikimedia.org/r/274168 (owner: 10ArielGlenn) [14:19:20] 6Operations, 10DBA, 13Patch-For-Review: Puppetize pt-heartbeat on MariaDB10 masters and its corresponding checks on icinga - https://phabricator.wikimedia.org/T114752#2088435 (10jcrespo) pt-heartbeat is puppetized and in production on all main core, misc and labs servers. There are some minor pending tasks:... 
[14:20:07] !log web service restored for dumps/download.wikimedia.org [14:20:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:20:55] (03PS2) 10Ema: Port varnishlog to new VSL API [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) [14:21:07] 6Operations, 6Discovery, 10Wikimedia-Logstash, 3Discovery-Search-Sprint, and 2 others: Upgrade ElasticSearch to 1.7.5 - https://phabricator.wikimedia.org/T122697#2088439 (10Gehel) estest100{1..4}.eqiad.wmflabs have also been upgraded. Nobelium has not (as suggested by @EBernhardson) [14:22:28] (03CR) 10jenkins-bot: [V: 04-1] Port varnishlog to new VSL API [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) (owner: 10Ema) [14:22:31] 6Operations, 10DBA, 13Patch-For-Review: Puppetize pt-heartbeat on MariaDB10 masters and its corresponding checks on icinga - https://phabricator.wikimedia.org/T114752#2088441 (10jcrespo) [14:23:46] 6Operations, 10Traffic, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: Traffic Infrastructure support for Mar 2016 codfw rollout - https://phabricator.wikimedia.org/T125510#2088448 (10BBlack) Status Update: The first chunk of work is done: we can supposedly do all of the switching in steps 1-4 in the descrip... [14:26:10] (03CR) 10Gehel: "Looks good to me, and most of all not dangerous. 
We will have to review the base value for pool counter once we enable SSL (no idea what t" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274834 (https://phabricator.wikimedia.org/T128761) (owner: 10EBernhardson) [14:26:17] (03CR) 10Gehel: [C: 032] Update CirrusSearch PoolCounter for cross-dc search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274834 (https://phabricator.wikimedia.org/T128761) (owner: 10EBernhardson) [14:26:38] (03CR) 10jenkins-bot: [V: 04-1] Update CirrusSearch PoolCounter for cross-dc search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274834 (https://phabricator.wikimedia.org/T128761) (owner: 10EBernhardson) [14:28:00] !log all services back in operation from dataset1001 [14:28:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:29:41] (03CR) 10Gehel: [C: 031] Update CirrusSearch PoolCounter for cross-dc search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274834 (https://phabricator.wikimedia.org/T128761) (owner: 10EBernhardson) [14:34:09] !log upgrade and restart dbstore2002 to apply new replication filters [14:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:39:17] (03PS3) 10Ema: Port varnishlog to new VSL API [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) [14:39:32] (pep8 is killing me) [14:40:30] (03CR) 10DCausse: [C: 031] Update CirrusSearch PoolCounter for cross-dc search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274834 (https://phabricator.wikimedia.org/T128761) (owner: 10EBernhardson) [14:41:10] expected 2 blank lines, found one: -1, will break production :-) [14:43:09] (03PS1) 10Muehlenhoff: Tweak server groups for debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/274951 [14:43:16] jynus: right :) [14:43:26] :-} [14:43:33] (03CR) 10Muehlenhoff: [C: 032 V: 032] Tweak server groups for debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/274951 (owner: 
10Muehlenhoff) [14:43:37] I should do a pres about it [14:43:51] most editors should be able to run it for you [14:44:30] (except on emacs OS which lacks a proper editor) [14:44:43] I have linters for all languages I have to use on my editor [14:45:08] (03PS1) 10ArielGlenn: Revert "turn off dumps cron job in prep for second try dataset1001 upgrade" [puppet] - 10https://gerrit.wikimedia.org/r/274953 [14:45:19] (03PS2) 10ArielGlenn: Revert "turn off dumps cron job in prep for second try dataset1001 upgrade" [puppet] - 10https://gerrit.wikimedia.org/r/274953 [14:45:46] yeah I don't like to care about blank lines #YOLO [14:46:27] 6Operations, 6Analytics-Kanban, 10Traffic, 13Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2088466 (10Ottomata) I think we have control over the varnishkafka license, so if we go that route and need to change it, we can. [14:46:39] (03CR) 10ArielGlenn: [C: 032] Revert "turn off dumps cron job in prep for second try dataset1001 upgrade" [puppet] - 10https://gerrit.wikimedia.org/r/274953 (owner: 10ArielGlenn) [14:52:21] 6Operations, 13Patch-For-Review, 7Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2088494 (10ArielGlenn) [14:52:23] 6Operations, 10Dumps-Generation, 13Patch-For-Review: Migrate dataset1001 and ms1001 to jessie - https://phabricator.wikimedia.org/T123724#2088492 (10ArielGlenn) 5Open>3Resolved And that's that. For future reference, anyone else needing to upgrade to an external nic may need to use the embedded nic for t... 
[14:56:52] !log rebooting iron to fix virtual console problem [14:56:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:59:32] (03CR) 10Ldaptestaccount123: [C: 04-1] "yes this would be duplicate afaik:" [puppet] - 10https://gerrit.wikimedia.org/r/274905 (https://phabricator.wikimedia.org/T114363) (owner: 1020after4) [14:59:57] 6Operations, 10Incident-Labs-NFS-20151216: Investigate need and candidate for labstore100(1|2) kernel upgrade - https://phabricator.wikimedia.org/T121903#1891298 (10MoritzMuehlenhoff) 4.4 should be available in jessie-wikimedia early next week [15:00:15] PROBLEM - Host iron is DOWN: PING CRITICAL - Packet loss = 100% [15:00:39] (03CR) 10Rush: "oops, made with my test account accidentally^" [puppet] - 10https://gerrit.wikimedia.org/r/274905 (https://phabricator.wikimedia.org/T114363) (owner: 1020after4) [15:05:26] RECOVERY - Host iron is UP: PING OK - Packet loss = 0%, RTA = 2.30 ms [15:06:20] 6Operations, 10ops-eqiad: No serial console on iron's mgmt interface - https://phabricator.wikimedia.org/T128845#2088515 (10Cmjohnson) 5Open>3Resolved This issue has been fixed, resolving the task [15:09:54] 6Operations, 10ops-eqiad: Rack and Initial setup db1074-79 - https://phabricator.wikimedia.org/T128753#2088519 (10Cmjohnson) Okay, having one set for replacing labsdb1002 is fine. labsdb1002 is in row C and I had 2 scheduled to go in the same rack so it will work out fine. We will name it labsdb1008. I am go... [15:13:47] 6Operations, 10ops-eqiad: Rack and Initial setup db1074-79 - https://phabricator.wikimedia.org/T128753#2088522 (10Volans) [15:13:53] 6Operations, 10Traffic, 13Patch-For-Review: Port varnishlog.py to new VSL API - https://phabricator.wikimedia.org/T128788#2088523 (10ema) p:5Triage>3Normal [15:15:07] (03CR) 10Hashar: "I have exchanged a bit on IRC. The Nodepool instances are broken with puppet related files disappearing (0 bytes). 
Filled it as T128846 " [puppet] - 10https://gerrit.wikimedia.org/r/274675 (https://phabricator.wikimedia.org/T128280) (owner: 10Mobrovac) [15:15:12] 6Operations, 10ops-eqiad: Rack and Initial setup db1074-79 - https://phabricator.wikimedia.org/T128753#2084820 (10Volans) FYI I just did 2 minor edit in the description: - add as a reminder the stripe size in the RAID step - replace the reference to es20* to db1074-79 [15:25:08] (03PS1) 10Muehlenhoff: Move dynamicproxy ferm rules into the novaproxy role [puppet] - 10https://gerrit.wikimedia.org/r/274962 [15:26:18] (03CR) 10jenkins-bot: [V: 04-1] Move dynamicproxy ferm rules into the novaproxy role [puppet] - 10https://gerrit.wikimedia.org/r/274962 (owner: 10Muehlenhoff) [15:28:17] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 90297 MB (3% inode=99%) [15:28:26] (03PS14) 10Ottomata: Replace limn::data::generate by reportupdater [puppet] - 10https://gerrit.wikimedia.org/r/273487 (https://phabricator.wikimedia.org/T127327) (owner: 10Mforns) [15:28:45] (03CR) 10Ottomata: [C: 032 V: 032] Replace limn::data::generate by reportupdater [puppet] - 10https://gerrit.wikimedia.org/r/273487 (https://phabricator.wikimedia.org/T127327) (owner: 10Mforns) [15:30:29] 6Operations, 10Traffic, 13Patch-For-Review: Upgrade to Varnish 4: things to remember - https://phabricator.wikimedia.org/T126206#2088570 (10Southparkfan) I know that you guys have way more knowledge of (upgrading) Varnish than I do (and Wikimedia's needs are totally different compared to mine), but I alread... 
[15:30:56] (03PS2) 10Ottomata: Add topic config for wmf.resource_change [puppet] - 10https://gerrit.wikimedia.org/r/274785 (https://phabricator.wikimedia.org/T126687) [15:33:39] 6Operations, 10DBA: External Storage on codfw (es2005-2010) is consuming 100-90GB of disk space per server and per month and it has 370GB available - https://phabricator.wikimedia.org/T119056#2088575 (10Volans) [15:34:29] (03CR) 10Tim Landscheidt: [C: 04-1] "The proxymanager port is only used for the Tools proxy, not the Labs one, so that would go into role::labs::tools::proxy if I understand y" [puppet] - 10https://gerrit.wikimedia.org/r/274962 (owner: 10Muehlenhoff) [15:34:53] 6Operations, 10DBA: External Storage on codfw (es2005-2010) is consuming 100-90GB of disk space per server and per month and it has 370GB available - https://phabricator.wikimedia.org/T119056#1816735 (10Volans) a:3Volans Data already migrated to new servers in related task T127330 Those will be put out of pr... [15:35:41] (03PS1) 10Ottomata: User proper user and require repo before cron in reportupdater::job [puppet] - 10https://gerrit.wikimedia.org/r/274964 [15:36:01] (03CR) 10Tim Landscheidt: "(And the rules for http and https would probably need to be duplicated to role::labs::tools::proxy and role::labs::novaproxy?)" [puppet] - 10https://gerrit.wikimedia.org/r/274962 (owner: 10Muehlenhoff) [15:36:58] (03CR) 10Ottomata: [C: 032] User proper user and require repo before cron in reportupdater::job [puppet] - 10https://gerrit.wikimedia.org/r/274964 (owner: 10Ottomata) [15:37:46] PROBLEM - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 90295 MB (3% inode=99%) [15:49:16] 6Operations, 10hardware-requests: eqiad: (3) nodes for Druid / analytics - https://phabricator.wikimedia.org/T128807#2088591 (10Ottomata) Hyperthreading: yes. [15:49:35] (03CR) 10Dzahn: "like others have said. 
already existing rules:" [puppet] - 10https://gerrit.wikimedia.org/r/274905 (https://phabricator.wikimedia.org/T114363) (owner: 1020after4) [15:54:34] what is "wikilabels"? [15:54:40] ACKNOWLEDGEMENT - Disk space on labvirt1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 90162 MB (3% inode=99%): cpettet andrew was in teh middle of balancing things. see email to ops yesterday. Im acking this and will let him know for today. [15:55:40] 6Operations, 10hardware-requests: eqiad: (3) nodes for Druid / analytics - https://phabricator.wikimedia.org/T128807#2088603 (10Ottomata) More cores and RAM for these in general is good. They will be mostly in memory processors and query analyzers. These are also a distributed cluster, so I think 3 is better... [15:57:29] 6Operations, 10Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2088608 (10CCogdill_WMF) Seems like this this is fine by me as long as we're leaving donate@ untouched. If we revisit making changes to that alias, I just want a better sense of if/how this impacts our reply... [15:58:19] !log puppet disabled on stat1003 for reportupdater deployment, paused until dan is out of meetings [15:58:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:58:44] 6Operations, 10Mail, 10fundraising-tech-ops: donation aliases for moneybookers? - https://phabricator.wikimedia.org/T127489#2088609 (10CCogdill_WMF) I believe our moneybookers account is deactivated, but @MBeat33 please confirm. [16:05:12] (03CR) 10Ottomata: [C: 031] "one nit!" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/274946 (https://phabricator.wikimedia.org/T128788) (owner: 10Ema) [16:05:39] (03PS3) 10Ottomata: Add topic config for wmf.resource_change [puppet] - 10https://gerrit.wikimedia.org/r/274785 (https://phabricator.wikimedia.org/T126687) [16:05:49] (03CR) 10Ottomata: [C: 032 V: 032] Add topic config for wmf.resource_change [puppet] - 10https://gerrit.wikimedia.org/r/274785 (https://phabricator.wikimedia.org/T126687) (owner: 10Ottomata) [16:07:27] 6Operations, 10hardware-requests: +1 'stat' type box for hadoop client usage - https://phabricator.wikimedia.org/T128808#2088639 (10Ottomata) If someone else has their eyes on WMF4541, I think we can live with one of the smaller systems. [16:19:39] (03PS2) 10EBernhardson: Use https to talk to elasticsearch in beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274877 (https://phabricator.wikimedia.org/T124444) [16:25:28] !log changing in a hot way db1047 replication filters [16:25:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:27:34] (03PS2) 10EBernhardson: Update CirrusSearch PoolCounter for cross-dc search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274834 (https://phabricator.wikimedia.org/T128761) [16:27:53] 6Operations, 10DBA, 13Patch-For-Review: Puppetize pt-heartbeat on MariaDB10 masters and its corresponding checks on icinga - https://phabricator.wikimedia.org/T114752#2088685 (10jcrespo) dbstore2002 and db1047 fixed [16:31:11] (03CR) 10Paladox: [C: 031] "Yep this fixed phabricator." [puppet] - 10https://gerrit.wikimedia.org/r/274906 (https://phabricator.wikimedia.org/T128797) (owner: 1020after4) [16:34:10] 6Operations, 10Traffic, 13Patch-For-Review: Port varnishlog.py to new VSL API - https://phabricator.wikimedia.org/T128788#2088709 (10ema) Note that lots of tags have changed. Taking a quick look at varnishreqstats, for example, we need to fix: TxRequest, RxStatus, RxRequest, TxStatus, ReqEnd. 
I couldn't fi... [16:43:02] 6Operations, 6Services, 10hardware-requests: Hardware request for SCA and SCB in codfw - https://phabricator.wikimedia.org/T128475#2088723 (10GWicke) I'm wondering the same as @faidon. The requirements are those of generic CPU-bound / stateless services, of which we have quite a few. [16:44:22] (03PS1) 10Ema: Drop full stop from 403 error message [puppet] - 10https://gerrit.wikimedia.org/r/274978 [16:56:51] (03CR) 10Thcipriani: [C: 031] Parameterize the git_server variable in global scap.cfg [puppet] - 10https://gerrit.wikimedia.org/r/272947 (https://phabricator.wikimedia.org/T126259) (owner: 1020after4) [17:00:49] (03Abandoned) 1020after4: Ferm rule: allow deployment hosts to connect to iridium ssh (for scap) [puppet] - 10https://gerrit.wikimedia.org/r/274905 (https://phabricator.wikimedia.org/T114363) (owner: 1020after4) [17:01:35] 6Operations, 10CirrusSearch, 6Discovery, 3Discovery-Search-Sprint, and 4 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2088779 (10EBernhardson) In terms of persistent connections, I'm not sure if we have anything in HHVM for doing persistent SSL connections.... [17:02:40] 6Operations, 6Labs: labs precise instance not accessible after provisioning - https://phabricator.wikimedia.org/T117673#2088781 (10fgiunchedi) I've ran into this again just now with `monitoring-prometheus` instance which I've deleted a while ago but tried to reprovision just now. note the instance in this case... 
[17:02:53] 6Operations, 6Labs: labs precise and jessie instance not accessible after provisioning - https://phabricator.wikimedia.org/T117673#2088782 (10fgiunchedi) [17:05:28] (03PS2) 1020after4: Move phabricator/extensions to libext fixes T128797 [puppet] - 10https://gerrit.wikimedia.org/r/274906 (https://phabricator.wikimedia.org/T128797) [17:07:34] (03PS3) 1020after4: Move phabricator/extensions to libext fixes T128797 [puppet] - 10https://gerrit.wikimedia.org/r/274906 (https://phabricator.wikimedia.org/T128797) [17:08:05] (03CR) 1020after4: [C: 031] "rebased to get rid of dependency" [puppet] - 10https://gerrit.wikimedia.org/r/274906 (https://phabricator.wikimedia.org/T128797) (owner: 1020after4) [17:09:47] (03CR) 10Paladox: [C: 031] Move phabricator/extensions to libext fixes T128797 [puppet] - 10https://gerrit.wikimedia.org/r/274906 (https://phabricator.wikimedia.org/T128797) (owner: 1020after4) [17:10:47] (03CR) 10Dzahn: "@Tim Landscheidt thank you for the list of instances, i wanted to check on them to answer the question if puppet runs fine on all of them " [puppet] - 10https://gerrit.wikimedia.org/r/270102 (owner: 10Tim Landscheidt) [17:12:23] 6Operations, 10DBA, 13Patch-For-Review: Puppetize pt-heartbeat on MariaDB10 masters and its corresponding checks on icinga - https://phabricator.wikimedia.org/T114752#2088834 (10jcrespo) The others too, now. It was a combination of replication filters not having been updated (pending restart) and lacking per... [17:14:07] 6Operations, 10MobileFrontend, 10Traffic, 5MW-1.27-release, and 6 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#1997604 (10bd808) https://www.mediawiki.org/wiki/... 
[17:15:42] (03PS1) 10ArielGlenn: pull down wikitech dumps and serve them in 'other' datasets [puppet] - 10https://gerrit.wikimedia.org/r/274989 (https://phabricator.wikimedia.org/T128680) [17:15:46] (03CR) 10Dzahn: [C: 032] "merging since it should just puppetize the status quo and the fix is confirmed working" [puppet] - 10https://gerrit.wikimedia.org/r/274906 (https://phabricator.wikimedia.org/T128797) (owner: 1020after4) [17:18:19] (03CR) 10EBernhardson: [C: 032] "labs only change" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274877 (https://phabricator.wikimedia.org/T124444) (owner: 10EBernhardson) [17:18:53] twentyafterfour: ping .. [17:18:53] Phab busted? "Include of '/srv/phab/libext/misc/__phutil_library_init__.php' failed!" [17:18:59] yes [17:18:59] yeah [17:18:59] twentyafterfour: Im getting this error Include of '/srv/phab/libext/misc/__phutil_library_init__.php' failed! [17:19:02] gut this too now [17:19:13] i merged the thing above that puppetized the hotfix [17:19:30] (03Merged) 10jenkins-bot: Use https to talk to elasticsearch in beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/274877 (https://phabricator.wikimedia.org/T124444) (owner: 10EBernhardson) [17:19:37] https://gerrit.wikimedia.org/r/#/c/274906/3 this broke it [17:20:07] i will wait a few moments ..or revert [17:20:52] phabricator is broken! :o [17:20:56] someone did a `git pull` in /srv/mediawiki-stanging as root on tin :( Anyone mind chown'ing everything back to wikidev group? [17:21:25] (03PS1) 10Dzahn: Revert "Move phabricator/extensions to libext fixes T128797" [puppet] - 10https://gerrit.wikimedia.org/r/274993 [17:21:39] ebernhardson, I wasn't, but let me fix it for you [17:21:46] (03PS1) 10Dzahn: Revert "Move phabricator/extensions to libext fixes T128797" [puppet] - 10https://gerrit.wikimedia.org/r/274994 [17:21:53] jynus: thanks! [17:22:02] crap, i get "Invalid application/json in request" from gerrit ? 
[17:22:35] (03CR) 10Dzahn: [C: 032] Revert "Move phabricator/extensions to libext fixes T128797" [puppet] - 10https://gerrit.wikimedia.org/r/274993 (owner: 10Dzahn) [17:22:36] ebernhardson, which part is broken? [17:22:59] or I can chgrp recursivelly all of it? [17:23:26] PROBLEM - https://phabricator.wikimedia.org on iridium is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string Wikimedia and MediaWiki not found on https://phabricator.wikimedia.org:443https://phabricator.wikimedia.org/ - 2052 bytes in 0.761 second response time [17:23:53] jynus: checking find, looks to be limited to .git/objects directory [17:24:06] who broke Phab? [17:24:12] Wtf [17:24:15] I see some owned by nwdeploy [17:24:16] Elitre: not a useful question [17:24:22] Didn't do it [17:24:30] so I did not want to do it blindly [17:24:46] (was intended as a joke. nm.) [17:24:54] twentyafterfour: mutante merged https://gerrit.wikimedia.org/r/#/c/274906/3 and things went boom! [17:25:02] Elitre: jokes for after it's fixed, not during [17:25:07] twentyafterfour: ^ that, and i just reverted right now [17:25:22] twentyafterfour: i hope reverting is just fine.. a few seconds... [17:25:27] RECOVERY - https://phabricator.wikimedia.org on iridium is OK: HTTP OK: HTTP/1.1 200 OK - 22153 bytes in 0.210 second response time [17:25:28] jynus: maybe `find .git/objects -gid 0 -0 | xargs -0 chgrp mwdeploy` [17:25:29] !log chgrp recursive on tin to wikidev on .git/objects [17:25:31] there, back [17:25:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:26:06] thanks mutante [17:26:12] err, yea not mwdeploy the group is wikidev [17:26:33] jynus: thanks! [17:26:51] I wonder why it broke. 
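The ownership fix suggested above (`find .git/objects -gid 0 -0 | xargs -0 chgrp mwdeploy`) can be sketched as a small shell function. This is a minimal sketch, not the command that was actually run on tin: `find` has no `-0` option, so the sketch assumes `-print0` was meant (it is the flag that pairs with `xargs -0`), and the target group is passed as a parameter, since the channel corrected it from mwdeploy to wikidev. The function name `fix_group` and its argument order are hypothetical, and `-gid` and `xargs -r` are assumed to be available (GNU findutils).

```shell
# Hypothetical helper: re-group everything under DIR that currently
# belongs to numeric group FROM_GID.
# Usage: fix_group DIR FROM_GID TO_GROUP
fix_group() {
    _dir=$1
    _from_gid=$2
    _to_group=$3
    # -print0 and -0 delimit names with NUL bytes, so paths containing
    # spaces or newlines survive the pipe intact.
    # -r (GNU xargs) skips running chgrp when find matches nothing.
    find "$_dir" -gid "$_from_gid" -print0 | xargs -0 -r chgrp "$_to_group"
}
```

Under these assumptions, the case discussed in the channel would read `fix_group .git/objects 0 wikidev`, run from the repository root by a user allowed to change those files' group.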
[17:26:54] (03PS1) 10ArielGlenn: lock wikis for dump runs by date, permitting runs across multiple dates [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/274997 (https://phabricator.wikimedia.org/T126341) [17:27:02] so yea, it was supposed to just puppetize the existing fix [17:27:14] and the fix was confirmed working [17:27:38] Yeah let me check it out [17:27:45] Sorry about that [17:27:57] the only thing that would have been undone by reverting was the symlink, so it must still be needed [17:27:57] ebernhardson, try now [17:28:02] !log ebernhardson@tin Synchronized wmf-config/CirrusSearch-labs.php: prod nop, enables https in beta cluster for elasticsearch connections (duration: 00m 33s) [17:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:28:15] jynus: git fetch worked great now [17:28:48] I should have checked the owner first and bite him :-) [17:29:03] thcipriani: Thanks you for getting the cdb build thing fixed in scap. 33s sync times are much nicer than 2m30s sync times. :) [17:29:20] bd808: :D [17:29:35] (glad it's now _actually_ working :P) [17:29:36] thcipriani: put on your annual review that you made scap 400% faster [17:29:53] hey, I broke mediawiki-staging once and fix it once, it cancels, right? [17:29:59] right? [17:30:03] lol [17:30:06] jynus: totally [17:30:42] after all, I break mysql more than I fix it, and everybody is ok with that! [17:32:50] as long as the end state is fixed :-P [17:33:54] apergos: but [[WP:DEADLINE]] ;) [17:34:40] by which I read: "don't rush to break the site" [17:34:43] true! 
[17:38:24] (03Abandoned) 10Muehlenhoff: Drop references to the source package for the rt flavour [debs/linux44] - 10https://gerrit.wikimedia.org/r/274920 (owner: 10Muehlenhoff) [17:40:33] phabricator is back, I see [17:40:59] yes, it's back, the change has been reverted [17:41:10] should have updated that topic earlier [17:42:21] that's the disadvantage when phab wents down: You can't create a UBN task :D [17:42:33] :-D [17:42:39] some might see that as an advantage [17:43:07] * Luke081515 created a lot of UBN tasks, so he gots routine [17:43:08] you have to create it on Etherpad in that case :) [17:43:19] or at phab-01? :D [17:43:40] or just jump up and down in the channel a lot [17:43:55] Luke081515: heh, yea, but 2, 3, many phabs [17:44:17] that's why I updated the topic fast, because in this case people see, that this is a known issue [17:44:41] mutante: Yeah, I thought about that, a phab for a phab for a phab.... t obe continued :D [17:44:41] hm [17:44:45] I might be done for the day [17:44:50] * apergos contemplates [17:46:26] 6Operations, 10DBA, 13Patch-For-Review: Puppetize pt-heartbeat on MariaDB10 masters and its corresponding checks on icinga - https://phabricator.wikimedia.org/T114752#2088918 (10jcrespo) [17:46:46] (03PS1) 10Andrew Bogott: Fix usage statement for live-migrate virtscript [puppet] - 10https://gerrit.wikimedia.org/r/275008 [17:46:53] Luke081515: yes, thank you for updating the topic, that's useful [17:47:37] np ;) [17:48:36] 6Operations, 10Mail, 10fundraising-tech-ops: donation aliases for moneybookers? - https://phabricator.wikimedia.org/T127489#2088936 (10MBeat33) @Dzahn Yes, Moneybookers is no longer an active payment method, so we don't need the aliases [17:48:59] Luke081515: and about labs, i mean the "cattle not pets"-approach, ideally it's so easy to create a new one that you don't need to become attached to a particular instance. and also you probably want different kinds. 
one that is serious staging and always like prod, another to test the new version [17:52:39] 6Operations, 6Labs, 10Labs-Infrastructure: Estimate hardware requirements for relevance lab elasticsearch servers - https://phabricator.wikimedia.org/T128433#2088977 (10TJones) I don't have strong opinions on the specific specs (these machines are all studly beyond my ken), but it sounds like something a lit... [17:54:31] 6Operations, 6Research-and-Data, 10Wikimedia-Mailing-lists: Close / Archive rcom-l - https://phabricator.wikimedia.org/T128141#2088995 (10DarTar) 5Open>3Resolved super, thank you. [18:05:40] (03CR) 10Alexandros Kosiaris: [C: 04-1] Add ferm rules for kartotherian, tilerator and tileratorui (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/274936 (owner: 10Muehlenhoff) [18:06:29] 6Operations, 10Mail, 10fundraising-tech-ops: donation aliases for moneybookers? - https://phabricator.wikimedia.org/T127489#2089101 (10Dzahn) @CCogdill_WMF @MBeat33 great! thank you very much. I just removed them and will close this ticket. it's all separate from the other alias related tickets ``` -# Need... [18:06:45] 6Operations, 10Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#2089105 (10Dzahn) [18:06:47] 6Operations, 10Mail, 10fundraising-tech-ops: donation aliases for moneybookers? - https://phabricator.wikimedia.org/T127489#2089103 (10Dzahn) 5Open>3Resolved a:3Dzahn [18:08:24] 6Operations, 10Mail: status of fdcsupport@ ? - https://phabricator.wikimedia.org/T127548#2089110 (10Dzahn) @bbogaert this could be one of the next ones, just 2 people on this fdcsupport@ [18:10:53] 6Operations, 10Mail: status of wikigroup@ alias - https://phabricator.wikimedia.org/T127551#2089119 (10Dzahn) @bbogaert this has a bit of an unfortuname name because it's so generic. it's actually used as a contact address for our internal travel@ people. maybe we can ask Doreen how much trouble it would to re... 
[18:11:24] (03CR) 10Muehlenhoff: [C: 032 V: 032] Disable rt flavour on the source package level [debs/linux44] - 10https://gerrit.wikimedia.org/r/274929 (owner: 10Muehlenhoff) [18:11:49] (03CR) 10Smalyshev: Add caching headers for nginx (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730) (owner: 10Smalyshev) [18:12:05] (03CR) 10Muehlenhoff: [C: 032 V: 032] Regenerate rules/control files after disabling rt flavour [debs/linux44] - 10https://gerrit.wikimedia.org/r/274930 (owner: 10Muehlenhoff) [18:12:56] (03PS4) 10Smalyshev: Add caching headers for nginx [puppet] - 10https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730) [18:13:58] 6Operations, 10MobileFrontend, 10Traffic, 5MW-1.27-release, and 6 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2089122 (10Samat) And [[ https://hu.wikipedia.org... [18:15:49] 6Operations, 10Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2089126 (10Dzahn) Thank you for confirming, yes. This change was just meant to be for our internal group aliases fr-tech, fr-online, fr-development etc. so that when people join and leave teams you don't have... [18:18:49] greg-g: nuria: your approvals are requested on https://phabricator.wikimedia.org/T128666 [18:20:26] PROBLEM - check_payments_wiki on payments2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:25:16] RECOVERY - check_payments_wiki on payments2002 is OK: HTTP OK: HTTP/1.1 200 OK - 226 bytes in 0.030 second response time [18:29:21] 6Operations, 10Mail: move fundraising group aliases to OIT - https://phabricator.wikimedia.org/T128647#2089197 (10CCogdill_WMF) Sounds good, thanks for clarifying! [18:30:29] Decline https://phabricator.wikimedia.org/T105422 ? 
[18:31:51] 6Operations, 10Ops-Access-Requests: Access to deployment group for user madhuvishy - https://phabricator.wikimedia.org/T128666#2089218 (10greg) Approve, we'll go through training before @madhuvishy starts SWATing [18:32:56] Krinkle: not sure, i just personally didn't want to do it anymore. it's like nobody really asks for it but it would mean having to schedule downtime [18:33:14] Krinkle: ..and we want it to be replaced by stream hopefullh [18:33:29] 6Operations, 10DBA, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2089222 (10Volans) MySQL allows also to specify different SSL parameters just for the replica connection from a slave to a master through the `CHANGE MASTER TO` syntax, but it doesn't help t... [18:34:18] it did not get created because somebody really needed it, only because we went through lists of all services [18:34:54] 6Operations, 6Labs, 10wikitech.wikimedia.org, 13Patch-For-Review: Wikitechwiki has 4xx responses to requests for some static assets inc. poweredby_mediawiki_88x31.png and WikiEditor's button-sprite.svg - https://phabricator.wikimedia.org/T128747#2089235 (10Krinkle) [18:35:26] PROBLEM - check_payments_wiki on payments2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:35:46] Jeff_Green: ? [18:35:48] 6Operations, 6Labs, 10wikitech.wikimedia.org, 13Patch-For-Review: Wikitechwiki has 4xx responses to requests for some static assets inc. poweredby_mediawiki_88x31.png and WikiEditor's button-sprite.svg - https://phabricator.wikimedia.org/T128747#2084612 (10Krinkle) Check list: * `public-wiki-rewrites.incl`... 
[18:35:53] (03PS2) 10Dzahn: pull down wikitech dumps and serve them in 'other' datasets [puppet] - 10https://gerrit.wikimedia.org/r/274989 (https://phabricator.wikimedia.org/T128680) (owner: 10ArielGlenn) [18:35:56] apergos: looking [18:35:59] k [18:37:40] (03CR) 10Dzahn: [C: 032] pull down wikitech dumps and serve them in 'other' datasets [puppet] - 10https://gerrit.wikimedia.org/r/274989 (https://phabricator.wikimedia.org/T128680) (owner: 10ArielGlenn) [18:38:26] 6Operations, 10Wikimedia-IRC-RC-Server, 7IPv6, 13Patch-For-Review: enable IPv6 on irc.wikimedia.org - https://phabricator.wikimedia.org/T105422#2089257 (10Krinkle) 5Open>3declined [18:39:14] 6Operations, 10Wikimedia-IRC-RC-Server, 7IPv6, 13Patch-For-Review: enable IPv6 on irc.wikimedia.org - https://phabricator.wikimedia.org/T105422#2089261 (10Dzahn) [18:39:18] 6Operations: schedule maintenance for IRC server - https://phabricator.wikimedia.org/T105804#2089260 (10Dzahn) 5Open>3declined [18:39:35] 6Operations, 10Wikimedia-IRC-RC-Server, 7IPv6: enable IPv6 on irc.wikimedia.org - https://phabricator.wikimedia.org/T105422#1443503 (10Dzahn) [18:40:26] PROBLEM - check_payments_wiki on payments2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:41:26] i'm not sure why it's erroring yet, but will ack that [18:42:42] ACKNOWLEDGEMENT - check_payments_wiki on payments2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds Jeff_Green noncritical, investigating [18:43:28] k [18:43:46] RECOVERY - check_payments_wiki on payments2002 is OK: HTTP OK: HTTP/1.1 200 OK - 226 bytes in 0.038 second response time [18:44:04] weird [18:44:16] PROBLEM - check_mysql on payments2002 is CRITICAL: Access denied for user root@localhost (using password: NO) [18:44:37] 6Operations, 10Ops-Access-Requests: Can't access piwik.wikimedia.org via ldap - https://phabricator.wikimedia.org/T128885#2089274 (10JMinor) [18:45:16] RECOVERY - check_mysql on payments2002 is OK: Uptime: 694070 Threads: 1 Questions: 118304 Slow 
queries: 0 Opens: 36 Flush tables: 1 Open tables: 29 Queries per second avg: 0.170 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [18:46:37] RECOVERY - Disk space on labvirt1008 is OK: DISK OK [18:47:59] 6Operations, 6Services, 10hardware-requests: Hardware request for SCA and SCB in codfw - https://phabricator.wikimedia.org/T128475#2089324 (10mobrovac) I guess that if/when we venture into the abyss of container, we could easily repurpose them for that, right? [18:50:24] payments2002 got rebooted somehow [18:50:45] 6Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 6Labs, and 2 others: copy wikitech dumps to dumps server ? - https://phabricator.wikimedia.org/T128680#2089333 (10Dzahn) i merged that but did not see the cron entry yet, needs confirmation on dataset1001 [18:53:11] 6Operations, 10Ops-Access-Requests: Can't access piwik.wikimedia.org via ldap - https://phabricator.wikimedia.org/T128885#2089274 (10Krenair) It's piwik.wikimedia.org LDAP login requires certain groups. It doesn't just let *any* LDAP user in - I think this service contains private data [18:53:28] 6Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 6Labs, and 2 others: copy wikitech dumps to dumps server ? - https://phabricator.wikimedia.org/T128680#2089371 (10ArielGlenn) I think you checked it before running puppet. I did a run and it's there now. [18:54:13] nm. i'm just losing my mind. [18:54:38] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). 
[18:54:54] 6Operations, 6Release-Engineering-Team, 10scap, 10Scap3 (Scap3-MediaWiki-MVP): Depool proxies temporarily while scap is ongoing to avoid taxing those nodes - https://phabricator.wikimedia.org/T125629#2089377 (10mmodell) [18:55:18] 6Operations, 10scap, 10Scap3 (Scap3-MediaWiki-MVP): Move scap target configuration to etcd - https://phabricator.wikimedia.org/T115899#2089394 (10mmodell) [18:55:27] 6Operations, 6Performance-Team, 10scap, 7HHVM, 10Scap3 (Scap3-MediaWiki-MVP): Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#2089398 (10mmodell) [18:55:43] 6Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 6Labs, and 2 others: copy wikitech dumps to dumps server ? - https://phabricator.wikimedia.org/T128680#2089420 (10Dzahn) odd, i ran puppet even more than once. but there it is now indeed :) [18:56:02] mutante: is that you puppet unmerged? [18:56:04] (03Abandoned) 10Dzahn: Revert "Move phabricator/extensions to libext fixes T128797" [puppet] - 10https://gerrit.wikimedia.org/r/274994 (owner: 10Dzahn) [18:57:10] volans: no, i want everything to be merged and it is on palladium [18:57:34] yeah it was the dataset cron job [18:57:40] somehoe didn't make it to strontium [18:57:44] fixed [18:57:45] that also explains [18:57:53] why i couldt see the change on dataset1001 but you could now .. [18:57:53] hrmm [18:57:57] wish we knew how to fix that bug [18:58:01] yes [18:58:17] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[18:58:22] * volans missing context here :) [18:58:25] volans: yea, it's a bug with the sync between the 2 masters .:( [18:58:33] we dont want that to happen [18:58:37] once in a while you do puppet-merge on palladioum [18:58:44] and for whatever reason stronium won't take the changes [18:58:45] we were just talking in PM about that change i merged [18:58:55] and that i could not see puppet making the expected change [18:59:01] when it was clearly merged [18:59:14] so when I went to strontium I saw the patchset (running puppet-merge) and it was indeed the wikitech dumps above [18:59:20] and the reason was this, change was merged on palladium, but not on strotium [18:59:25] and that server talks to strontium [18:59:32] make sense now, thanks [18:59:44] just wait til we have three puppet masters to choose from :-D [18:59:48] :p [19:00:03] it's something custom or puppet "magic"? [19:00:37] custom [19:00:42] it's a little shell script [19:00:49] with some git pull action going on etc [19:00:59] post commit hook [19:01:12] ./modules/puppetmaster/files/puppet-merge [19:01:21] https://wikitech.wikimedia.org/wiki/Puppet#Updating_operations.2Fpuppet_for_production_nodes [19:01:30] that part about the puppet-merge wrapper script ... [19:03:44] * volans looking [19:09:03] mutante: by any chance you committed on a git submodule? [19:09:48] 6Operations, 6Services, 10hardware-requests: Hardware request for SCA and SCB in codfw - https://phabricator.wikimedia.org/T128475#2089514 (10RobH) >>! In T128475#2084761, @faidon wrote: > Same as the terbium replacement concern here: these boxes aren't very special — can't we just expand our spares pool ins... 
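The intermittent sync failure described above (a merge lands on palladium but strontium does not pick it up) is the kind of step where logging each attempt and retrying can help with diagnosis. The following is a minimal sketch of that idea only, not the contents of the real puppet-merge script or its git hook: the function name `retry_logged`, its arguments, and the log format are all hypothetical.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, appending the
# command's output and a one-line result per attempt to LOGFILE.
# Usage: retry_logged LOGFILE ATTEMPTS CMD [ARGS...]
retry_logged() {
    _log=$1
    _attempts=$2
    shift 2
    _i=1
    while [ "$_i" -le "$_attempts" ]; do
        # Capture stdout and stderr so a failure leaves evidence behind.
        if "$@" >>"$_log" 2>&1; then
            echo "attempt $_i: ok: $*" >>"$_log"
            return 0
        fi
        echo "attempt $_i: failed: $*" >>"$_log"
        _i=$((_i + 1))
    done
    # Every attempt failed; the caller decides whether to alert.
    return 1
}
```

A sync step could then be wrapped as `retry_logged /path/to/sync.log 3 some-sync-command`, where both the log path and the command are placeholders.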
[19:10:00] volans: no, that was just in regular ops/puppet [19:10:38] ok, in the git hook there is no retry, but any output should come your way if we are not using --quiet in the merge AFAIK [19:17:25] the login is there at 18:37 UTC, so I guess something might have give some error during the execution of the remote command through ssh [19:22:21] 6Operations, 10Traffic, 7HTTPS: SSL cert needed for benefactorevents.wikimedia.org - https://phabricator.wikimedia.org/T115028#2089525 (10CCogdill_WMF) a:3CCogdill_WMF Claiming this task since I think the ball is in my court. Eric at Trilogy confirmed he received the cert and will have SNI set up for us by... [19:31:30] 6Operations: Randomly failing puppetmaster sync to strontium - https://phabricator.wikimedia.org/T128895#2089539 (10Volans) [19:34:10] (03PS1) 10Volans: Git: add logging to post-merge hook on puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/275031 (https://phabricator.wikimedia.org/T128895) [19:35:45] (03PS1) 10Dzahn: (WIP) move some microsites roles into a common module [puppet] - 10https://gerrit.wikimedia.org/r/275034 [19:35:50] 6Operations: Sudden increase in NOTICE events from hhvm while trying to de-pool rdb1003 for maintenance - https://phabricator.wikimedia.org/T128730#2089572 (10aaron) How would the queues on 1003 drain if it's depooled? Wouldn't they just be left as they were before? [19:37:03] (03CR) 10Dzahn: "looks good, but missing the trailing ' ?" [puppet] - 10https://gerrit.wikimedia.org/r/275031 (https://phabricator.wikimedia.org/T128895) (owner: 10Volans) [19:37:43] (03CR) 10Dzahn: [C: 031] "nevermind" [puppet] - 10https://gerrit.wikimedia.org/r/275031 (https://phabricator.wikimedia.org/T128895) (owner: 10Volans) [19:38:59] I like.. 
do it (volans)
[19:39:13] yep
[19:39:36] (CR) ArielGlenn: [C: 1] Git: add logging to post-merge hook on puppetmaster [puppet] - https://gerrit.wikimedia.org/r/275031 (https://phabricator.wikimedia.org/T128895) (owner: Volans)
[19:48:55] Operations: Sudden increase in NOTICE events from hhvm while trying to de-pool rdb1003 for maintenance - https://phabricator.wikimedia.org/T128730#2089627 (aaron) The SearchExactMatchRescorer seems unrelated, especially given all the random spikes at https://logstash.wikimedia.org/#dashboard/temp/AVNDLJInO3D7...
[19:49:00] (CR) Volans: [C: 2] Git: add logging to post-merge hook on puppetmaster [puppet] - https://gerrit.wikimedia.org/r/275031 (https://phabricator.wikimedia.org/T128895) (owner: Volans)
[19:51:12] Operations: reinstall bast2001 with jessie - https://phabricator.wikimedia.org/T128899#2089630 (Dzahn)
[19:57:59] Operations: reinstall bast2001 with jessie - https://phabricator.wikimedia.org/T128899#2089665 (Dzahn) a:Dzahn
[19:58:48] (PS2) ArielGlenn: puppetize the install_console script [puppet] - https://gerrit.wikimedia.org/r/274101
[20:00:26] (CR) ArielGlenn: [C: 2] puppetize the install_console script [puppet] - https://gerrit.wikimedia.org/r/274101 (owner: ArielGlenn)
[20:00:44] !log Added logging to post-merge hook on palladium T128895
[20:00:45] T128895: Randomly failing puppetmaster sync to strontium - https://phabricator.wikimedia.org/T128895
[20:00:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:04:43] (PS1) Ottomata: Add logrotate for reportupdater jobs [puppet] - https://gerrit.wikimedia.org/r/275038 (https://phabricator.wikimedia.org/T127327)
[20:05:52] (CR) jenkins-bot: [V: -1] Add logrotate for reportupdater jobs [puppet] - https://gerrit.wikimedia.org/r/275038 (https://phabricator.wikimedia.org/T127327) (owner: Ottomata)
[20:06:57] (PS2) Ottomata: Add logrotate for reportupdater jobs [puppet] - https://gerrit.wikimedia.org/r/275038 (https://phabricator.wikimedia.org/T127327)
[20:09:17] (PS1) Ori.livneh: Hard-code the dynamic extension path for HHVM [puppet] - https://gerrit.wikimedia.org/r/275042
[20:13:39] (PS2) Ori.livneh: Hard-code the dynamic extension path for HHVM [puppet] - https://gerrit.wikimedia.org/r/275042
[20:14:03] (PS3) Ottomata: Add logrotate for reportupdater jobs [puppet] - https://gerrit.wikimedia.org/r/275038 (https://phabricator.wikimedia.org/T127327)
[20:14:10] (CR) Ottomata: [C: 2] Add logrotate for reportupdater jobs [puppet] - https://gerrit.wikimedia.org/r/275038 (https://phabricator.wikimedia.org/T127327) (owner: Ottomata)
[20:14:20] (CR) Ottomata: [V: 2] Add logrotate for reportupdater jobs [puppet] - https://gerrit.wikimedia.org/r/275038 (https://phabricator.wikimedia.org/T127327) (owner: Ottomata)
[20:19:08] (PS1) Volans: Git: create the log file for T128895 [puppet] - https://gerrit.wikimedia.org/r/275044 (https://phabricator.wikimedia.org/T128895)
[20:25:50] (PS3) Ori.livneh: Hard-code the dynamic extension path for HHVM [puppet] - https://gerrit.wikimedia.org/r/275042
[20:26:12] (CR) Ori.livneh: [C: 2 V: 2] Hard-code the dynamic extension path for HHVM [puppet] - https://gerrit.wikimedia.org/r/275042 (owner: Ori.livneh)
[20:26:28] _joe_: check out the commit message on that when you have a chance ^
[20:32:02] (PS2) Andrew Bogott: Fix usage statement for live-migrate virtscript [puppet] - https://gerrit.wikimedia.org/r/275008
[20:32:28] (CR) ArielGlenn: [C: 1] Git: create the log file for T128895 [puppet] - https://gerrit.wikimedia.org/r/275044 (https://phabricator.wikimedia.org/T128895) (owner: Volans)
[20:32:33] (PS2) Volans: Git: create the log file for T128895 [puppet] - https://gerrit.wikimedia.org/r/275044 (https://phabricator.wikimedia.org/T128895)
[20:33:02] (PS1) Dzahn: install-server: switch bast2001 to jessie [puppet] - https://gerrit.wikimedia.org/r/275045 (https://phabricator.wikimedia.org/T128899)
[20:33:46] (CR) Andrew Bogott: [C: 2] Fix usage statement for live-migrate virtscript [puppet] - https://gerrit.wikimedia.org/r/275008 (owner: Andrew Bogott)
[20:33:51] (CR) Volans: [C: 2] Git: create the log file for T128895 [puppet] - https://gerrit.wikimedia.org/r/275044 (https://phabricator.wikimedia.org/T128895) (owner: Volans)
[20:34:16] (PS3) Volans: Git: create the log file for T128895 [puppet] - https://gerrit.wikimedia.org/r/275044 (https://phabricator.wikimedia.org/T128895)
[20:34:52] andrewbogott: the permission denied is known, I was just rebasing to merge the fix
[20:36:02] volans: ah, good to know
[20:36:14] sorry about that
[20:36:23] you beat me on time :)
[20:37:00] andrewbogott, I have another 18GB instance called labs-dynamicproxy.openstack.eqiad.wmflabs
[20:37:12] I don't remember why
[20:37:49] (I know why I have visualeditor-switch.visualeditor.eqiad.wmflabs and otrs-test.otrs.eqiad.wmflabs)
[20:38:04] (CR) Tjones: [C: -1] "We need to specify which languages we want TextCat to use on enwiki. The most recent list is English, Spanish, Chinese, Portuguese, Arabic" [mediawiki-config] - https://gerrit.wikimedia.org/r/268048 (https://phabricator.wikimedia.org/T121542) (owner: EBernhardson)
[20:38:10] Krenair: want to just delete it, or do you think it might be good for something?
[20:38:53] It might be good for testing dynamicproxy changes if I ever got around to making that feature I requested
[20:39:32] on the other hand, maybe I have nothing special here
[20:39:41] and could just create a new one when needed
[20:40:13] looks like I used this back in September/October
[20:40:25] Krenair: up to you. In my case I usually just wind up building new instances anyway since I can never remember what I was doing with the old ones :)
[20:42:23] oh, I remember this
[20:45:13] it was https://phabricator.wikimedia.org/T69927
[20:46:20] andrewbogott, only thing left for that bug is someone with access to the real labs dynamicproxy to delete the old broken entries
[20:46:30] will delete my test instance
[20:47:08] Krenair: I’ll keep that bug open in my browser, maybe I will get to it today. Maybe.
[20:48:57] So why are some instances created by novaadmin?
[20:49:06] test-labvirt1011-103.testlabs ?
[20:49:23] probably testing vm's for new boxes or post maint
[20:54:27] PROBLEM - puppet last run on cp2019 is CRITICAL: CRITICAL: puppet fail
[20:55:16] (PS2) Dzahn: install-server: switch bast2001 to jessie [puppet] - https://gerrit.wikimedia.org/r/275045 (https://phabricator.wikimedia.org/T128899)
[20:55:22] (CR) Dzahn: [C: 2] install-server: switch bast2001 to jessie [puppet] - https://gerrit.wikimedia.org/r/275045 (https://phabricator.wikimedia.org/T128899) (owner: Dzahn)
[21:01:47] !log bast2001 - rebooting into PXE for T128899
[21:01:48] T128899: reinstall bast2001 with jessie - https://phabricator.wikimedia.org/T128899
[21:01:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:02:41] Operations, Patch-For-Review: Randomly failing puppetmaster sync to strontium - https://phabricator.wikimedia.org/T128895#2089867 (Volans) Added some additional logging to the post-merge hook so that next time it fails we should have some additional info
[21:14:47] (PS2) Ottomata: Add oozie-ste.xml configuration to handle SLAs [puppet/cdh] - https://gerrit.wikimedia.org/r/274720 (owner: Joal)
[21:15:17] (PS2) EBernhardson: Create pool counter for CirrusSearch completion suggester [mediawiki-config] - https://gerrit.wikimedia.org/r/268029 (https://phabricator.wikimedia.org/T125547)
[21:15:19] (CR) Ottomata: [C: 2 V: 2] "Cool! I have cdh oozie related changes to do too. Will update the puppet submodule with this and try it out on Monday." [puppet/cdh] - https://gerrit.wikimedia.org/r/274720 (owner: Joal)
[21:17:57] !log bast2001 - revoke and sign new puppet cert / salt keys
[21:18:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:22:18] RECOVERY - puppet last run on cp2019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:28:37] (PS2) Jdlrobson: Enable reference storage on Japanese Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/274470 (https://phabricator.wikimedia.org/T126802)
[21:28:39] (PS1) Jdlrobson: Enable reference storage on beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/275058
[21:29:32] !log bast2001 - reinstalled with jessie, fingerprints on https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/bast2001.wikimedia.org
[21:29:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:36:53] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 58.33% of data above the critical threshold [5000000.0]
[21:38:22] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 61.90% of data above the critical threshold [5000000.0]
[21:40:33] RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[21:42:22] Operations, Labs: labs precise and jessie instance not accessible after provisioning - https://phabricator.wikimedia.org/T117673#2090002 (Andrew) The initial point of failure in all these cases seems to be that 'hostname -d' fails. That causes a cascade of disaster since things like cert names &c are base...
[21:48:01] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 59.09% of data above the critical threshold [5000000.0]
[21:49:31] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[21:51:41] RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[21:58:29] !log bast2001 if your ssh client shows the fingerprint as base64 SHA256, the new default, you can ssh -o FingerprintHash=md5 bast2001.wikimedia.org to compare
[21:58:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:01:06] Operations, Analytics, hardware-requests, Patch-For-Review: eqiad: (3) AQS replacement nodes - https://phabricator.wikimedia.org/T124947#2090071 (RobH) Agreed, we expect systems to typically last for three years. I'll move ahead on getting a quote to give you at least 8TB usable space after raid1...
[22:05:12] Operations: reinstall bast2001 with jessie - https://phabricator.wikimedia.org/T128899#2090078 (Dzahn)
[22:06:00] (PS3) EBernhardson: Update CirrusSearch PoolCounter for cross-dc search [mediawiki-config] - https://gerrit.wikimedia.org/r/274834 (https://phabricator.wikimedia.org/T128761)
[22:17:08] Operations, Patch-For-Review: reinstall bast4001 with jessie - https://phabricator.wikimedia.org/T123674#2090117 (Dzahn) {F3522807}
[22:17:43] Operations, MobileFrontend, Traffic, MW-1.27-release, and 6 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2090118 (Legoktm) Hrm...so once we remove the F...
[22:24:23] Operations, MobileFrontend, Traffic, MW-1.27-release, and 6 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2090145 (Legoktm) I think fully reverting {30f3...
[22:29:15] !Ran P2709 against DB manually to work around T127693
[22:29:15] P2709 (An Untitled Masterwork) - https://phabricator.wikimedia.org/P2709
[22:29:15] T127693: Flow board move requiring allowCreation fails on zh.wp - https://phabricator.wikimedia.org/T127693
[22:29:22] !log Ran P2709 against DB manually to work around T127693
[22:29:23] P2709 (An Untitled Masterwork) - https://phabricator.wikimedia.org/P2709
[22:29:23] T127693: Flow board move requiring allowCreation fails on zh.wp - https://phabricator.wikimedia.org/T127693
[22:29:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:35:21] PROBLEM - Disk space on restbase1005 is CRITICAL: DISK CRITICAL - free space: /var 110967 MB (3% inode=99%)
[22:35:38] urandom: ^^
[22:37:53] Operations, MobileFrontend, Traffic, MW-1.27-release, and 6 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2090177 (Jdlrobson) @legoktm a full revert isn'...
[22:41:51] !log restbase1005: `nodetool stop -- CLEANUP; nodetool stop -- COMPACTION`
[22:41:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:42:33] RECOVERY - Disk space on restbase1005 is OK: DISK OK
[22:43:22] Operations, MobileFrontend, Traffic, MW-1.27-release, and 6 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2090218 (Legoktm) Uh. We need LESS hacks, not m...
[22:44:34] (CR) Gehel: [C: 1] "Looks good to me..." [puppet] - https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730) (owner: Smalyshev)
[23:02:20] (PS1) Dzahn: ganglia: set $sites in labs hiera for testing [puppet] - https://gerrit.wikimedia.org/r/275114
[23:03:11] (CR) Dzahn: [C: 2] ganglia: set $sites in labs hiera for testing [puppet] - https://gerrit.wikimedia.org/r/275114 (owner: Dzahn)
[23:03:19] (CR) Dzahn: [V: 2] ganglia: set $sites in labs hiera for testing [puppet] - https://gerrit.wikimedia.org/r/275114 (owner: Dzahn)
[23:05:49] (PS1) BBlack: caches: remove backend_scaled_weights [puppet] - https://gerrit.wikimedia.org/r/275115 (https://phabricator.wikimedia.org/T125485)
[23:05:51] (PS1) BBlack: wikimedia-common VCL: remove static backend weighting [puppet] - https://gerrit.wikimedia.org/r/275116 (https://phabricator.wikimedia.org/T127484)
[23:05:53] (PS1) BBlack: varnish: get rid of backend_options [puppet] - https://gerrit.wikimedia.org/r/275117 (https://phabricator.wikimedia.org/T127484)
[23:05:55] (PS1) BBlack: varnish: allow director backends to be single-value again [puppet] - https://gerrit.wikimedia.org/r/275118 (https://phabricator.wikimedia.org/T127484)
[23:05:57] (PS1) BBlack: r::c::config: remove has_ganglia [puppet] - https://gerrit.wikimedia.org/r/275119 (https://phabricator.wikimedia.org/T127484)
[23:05:59] (PS1) BBlack: r::c::config: remove lvs::configuration include [puppet] - https://gerrit.wikimedia.org/r/275120 (https://phabricator.wikimedia.org/T127484)
[23:06:01] (PS1) BBlack: r::c::config: remove parsoid (unused) [puppet] - https://gerrit.wikimedia.org/r/275121 (https://phabricator.wikimedia.org/T127484)
[23:06:03] (PS1) BBlack: r::c::config: add restbase @ codfw [puppet] - https://gerrit.wikimedia.org/r/275122 (https://phabricator.wikimedia.org/T127484)
[23:06:05] (PS1) BBlack: r::c::config: move to hieradata [puppet] - https://gerrit.wikimedia.org/r/275123 (https://phabricator.wikimedia.org/T127484)
[23:06:07] (PS1) BBlack: varnishes: control applayer DC routing from hieradata [puppet] - https://gerrit.wikimedia.org/r/275124 (https://phabricator.wikimedia.org/T127484)
[23:06:32] (PS1) Dzahn: ganglia: fix format for $sites in labs hiera [puppet] - https://gerrit.wikimedia.org/r/275125
[23:07:45] (CR) Dzahn: [C: 2 V: 2] ganglia: fix format for $sites in labs hiera [puppet] - https://gerrit.wikimedia.org/r/275125 (owner: Dzahn)
[23:23:11] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: puppet fail
[23:26:07] (PS1) Dzahn: ganglia: set site in labs to eqiad to avoid error with ulsfo [puppet] - https://gerrit.wikimedia.org/r/275130
[23:26:32] (PS2) BBlack: r::c::config: move to hieradata [puppet] - https://gerrit.wikimedia.org/r/275123 (https://phabricator.wikimedia.org/T127484)
[23:26:34] (PS2) BBlack: varnishes: control applayer DC routing from hieradata [puppet] - https://gerrit.wikimedia.org/r/275124 (https://phabricator.wikimedia.org/T127484)
[23:26:43] PROBLEM - puppet last run on ms-fe3001 is CRITICAL: CRITICAL: puppet fail
[23:26:47] (CR) Dzahn: [C: 2 V: 2] ganglia: set site in labs to eqiad to avoid error with ulsfo [puppet] - https://gerrit.wikimedia.org/r/275130 (owner: Dzahn)
[23:29:50] (CR) Dzahn: "nope.. still Error 400 on SERVER: Must pass sites to Class[Ganglia::Monitor::Aggregator] on node jessie-bastion-01.ganglia.eqiad.wmflabs" [puppet] - https://gerrit.wikimedia.org/r/275130 (owner: Dzahn)
[23:34:17] (PS2) BBlack: varnish: get rid of backend_options [puppet] - https://gerrit.wikimedia.org/r/275117 (https://phabricator.wikimedia.org/T127484)
[23:34:19] (PS2) BBlack: varnish: allow director backends to be single-value again [puppet] - https://gerrit.wikimedia.org/r/275118 (https://phabricator.wikimedia.org/T127484)
[23:34:21] (PS2) BBlack: r::c::config: remove has_ganglia [puppet] - https://gerrit.wikimedia.org/r/275119 (https://phabricator.wikimedia.org/T127484)
[23:34:23] (PS2) BBlack: r::c::config: remove parsoid (unused) [puppet] - https://gerrit.wikimedia.org/r/275121 (https://phabricator.wikimedia.org/T127484)
[23:34:25] (PS2) BBlack: r::c::config: remove lvs::configuration include [puppet] - https://gerrit.wikimedia.org/r/275120 (https://phabricator.wikimedia.org/T127484)
[23:34:27] (PS3) BBlack: r::c::config: move to hieradata [puppet] - https://gerrit.wikimedia.org/r/275123 (https://phabricator.wikimedia.org/T127484)
[23:34:29] (PS2) BBlack: r::c::config: add restbase @ codfw [puppet] - https://gerrit.wikimedia.org/r/275122 (https://phabricator.wikimedia.org/T127484)
[23:34:31] (PS3) BBlack: varnishes: control applayer DC routing from hieradata [puppet] - https://gerrit.wikimedia.org/r/275124 (https://phabricator.wikimedia.org/T127484)
[23:39:32] (PS3) BBlack: varnish: get rid of backend_options [puppet] - https://gerrit.wikimedia.org/r/275117 (https://phabricator.wikimedia.org/T127484)
[23:39:34] (PS3) BBlack: varnish: allow director backends to be single-value again [puppet] - https://gerrit.wikimedia.org/r/275118 (https://phabricator.wikimedia.org/T127484)
[23:39:36] (PS3) BBlack: r::c::config: remove has_ganglia [puppet] - https://gerrit.wikimedia.org/r/275119 (https://phabricator.wikimedia.org/T127484)
[23:39:38] (PS3) BBlack: r::c::config: remove parsoid (unused) [puppet] - https://gerrit.wikimedia.org/r/275121 (https://phabricator.wikimedia.org/T127484)
[23:39:40] (PS3) BBlack: r::c::config: remove lvs::configuration include [puppet] - https://gerrit.wikimedia.org/r/275120 (https://phabricator.wikimedia.org/T127484)
[23:39:42] (PS4) BBlack: r::c::config: move to hieradata [puppet] - https://gerrit.wikimedia.org/r/275123 (https://phabricator.wikimedia.org/T127484)
[23:39:44] (PS3) BBlack: r::c::config: add restbase @ codfw [puppet] - https://gerrit.wikimedia.org/r/275122 (https://phabricator.wikimedia.org/T127484)
[23:39:46] (PS4) BBlack: varnishes: control applayer DC routing from hieradata [puppet] - https://gerrit.wikimedia.org/r/275124 (https://phabricator.wikimedia.org/T127484)
[23:47:25] (PS1) Dzahn: ganglia: remove labs hiera data again [puppet] - https://gerrit.wikimedia.org/r/275133
[23:47:56] (CR) Dzahn: [C: 2 V: 2] ganglia: remove labs hiera data again [puppet] - https://gerrit.wikimedia.org/r/275133 (owner: Dzahn)
[23:50:43] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:54:06] (PS4) BBlack: varnish: get rid of backend_options [puppet] - https://gerrit.wikimedia.org/r/275117 (https://phabricator.wikimedia.org/T127484)
[23:54:08] (PS4) BBlack: varnish: allow director backends to be single-value again [puppet] - https://gerrit.wikimedia.org/r/275118 (https://phabricator.wikimedia.org/T127484)
[23:54:11] (PS4) BBlack: r::c::config: remove has_ganglia [puppet] - https://gerrit.wikimedia.org/r/275119 (https://phabricator.wikimedia.org/T127484)
[23:54:13] (PS4) BBlack: r::c::config: remove parsoid (unused) [puppet] - https://gerrit.wikimedia.org/r/275121 (https://phabricator.wikimedia.org/T127484)
[23:54:14] (PS4) BBlack: r::c::config: remove lvs::configuration include [puppet] - https://gerrit.wikimedia.org/r/275120 (https://phabricator.wikimedia.org/T127484)
[23:54:17] (PS5) BBlack: r::c::config: move to hieradata [puppet] - https://gerrit.wikimedia.org/r/275123 (https://phabricator.wikimedia.org/T127484)
[23:54:19] (PS4) BBlack: r::c::config: add restbase @ codfw [puppet] - https://gerrit.wikimedia.org/r/275122 (https://phabricator.wikimedia.org/T127484)
[23:54:21] (PS5) BBlack: varnishes: control applayer DC routing from hieradata [puppet] - https://gerrit.wikimedia.org/r/275124 (https://phabricator.wikimedia.org/T127484)
[23:54:22] RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:56:37] (PS5) BBlack: varnish: get rid of backend_options [puppet] - https://gerrit.wikimedia.org/r/275117 (https://phabricator.wikimedia.org/T127484)
[23:56:39] (PS5) BBlack: varnish: allow director backends to be single-value again [puppet] - https://gerrit.wikimedia.org/r/275118 (https://phabricator.wikimedia.org/T127484)
[23:56:41] (PS5) BBlack: r::c::config: remove has_ganglia [puppet] - https://gerrit.wikimedia.org/r/275119 (https://phabricator.wikimedia.org/T127484)
[23:56:43] (PS5) BBlack: r::c::config: remove parsoid (unused) [puppet] - https://gerrit.wikimedia.org/r/275121 (https://phabricator.wikimedia.org/T127484)
[23:56:45] (PS5) BBlack: r::c::config: remove lvs::configuration include [puppet] - https://gerrit.wikimedia.org/r/275120 (https://phabricator.wikimedia.org/T127484)
[23:56:47] (PS6) BBlack: r::c::config: move to hieradata [puppet] - https://gerrit.wikimedia.org/r/275123 (https://phabricator.wikimedia.org/T127484)
[23:56:49] (PS5) BBlack: r::c::config: add restbase @ codfw [puppet] - https://gerrit.wikimedia.org/r/275122 (https://phabricator.wikimedia.org/T127484)
[23:56:51] (PS6) BBlack: varnishes: control applayer DC routing from hieradata [puppet] - https://gerrit.wikimedia.org/r/275124 (https://phabricator.wikimedia.org/T127484)
[23:58:44] (CR) jenkins-bot: [V: -1] varnish: get rid of backend_options [puppet] - https://gerrit.wikimedia.org/r/275117 (https://phabricator.wikimedia.org/T127484) (owner: BBlack)
[23:58:47] (CR) jenkins-bot: [V: -1] varnish: allow director backends to be single-value again [puppet] - https://gerrit.wikimedia.org/r/275118 (https://phabricator.wikimedia.org/T127484) (owner: BBlack)
[23:58:59] aww jenkins is so mean :P
[23:59:01] (CR) jenkins-bot: [V: -1] varnishes: control applayer DC routing from hieradata [puppet] - https://gerrit.wikimedia.org/r/275124 (https://phabricator.wikimedia.org/T127484) (owner: BBlack)
[23:59:32] oh, that's the previous round of PSes heh