[00:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: My dear minions, it's time we take the moon! Just kidding. Time for Evening SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180227T0000).
[00:00:04] RoanKattouw, subbu, and MaxSem: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:36] o/
[00:01:00] * subbu is ready to see Tidy removed from 60 more wikis
[00:01:13] I'll do the SWAT today
[00:01:21] awight: I mean I can also test on mwdebug1002
[00:01:35] subbu: I am too, so let's do it
[00:01:47] RoanKattouw: It’s all you, I have no worries that you’ll leave it in good condition.
[00:02:00] (PS2) Catrope: Enable RemexHtml on all private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414701 (https://phabricator.wikimedia.org/T188009) (owner: Subramanya Sastry)
[00:02:08] (PS2) Catrope: Enable RemexHtml on all wikinews wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414700 (https://phabricator.wikimedia.org/T188000) (owner: Subramanya Sastry)
[00:02:13] (PS3) Catrope: Enable RemexHtml on all private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414701 (https://phabricator.wikimedia.org/T188009) (owner: Subramanya Sastry)
[00:02:19] (PS2) Catrope: Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414702 (owner: Subramanya Sastry)
[00:02:40] (CR) Catrope: [C: 2] Enable RemexHtml on all wikinews wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414700 (https://phabricator.wikimedia.org/T188000) (owner: Subramanya Sastry)
[00:02:45] RoanKattouw: The only thing I’d like to request is at the end of the process, we can test things on beta if needed. I made this patch but fear it’s inadequate since I can’t find docs on what I’m doing: https://gerrit.wikimedia.org/r/414860
[00:02:52] subbu: Do you want to do all of those at the same time?
[00:03:04] awight: I don't fully know how to add a new wiki either
[00:03:13] But you're right, we should have a beta wiki for Swedish and Spanish
[00:03:25] Thankfully I'm not rolling out Spanish today so we may be able to create that one in time
[00:03:33] kk it can wait for another day, I officially bless the production-first approach ;-)
[00:03:37] RoanKattouw, yes .. i have them as separate patches in case we want to revert any of those subsets.
[00:03:38] haha
[00:03:46] OK cool, I'll just push them out together them
[00:03:47] *then
[00:03:50] ok
[00:03:58] (CR) Catrope: [C: 2] Enable RemexHtml on all private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414701 (https://phabricator.wikimedia.org/T188009) (owner: Subramanya Sastry)
[00:04:02] (CR) Catrope: [C: 2] Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414702 (owner: Subramanya Sastry)
[00:04:04] (Merged) jenkins-bot: Enable RemexHtml on all wikinews wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414700 (https://phabricator.wikimedia.org/T188000) (owner: Subramanya Sastry)
[00:04:17] (CR) Catrope: [C: -1] Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414702 (owner: Subramanya Sastry)
[00:04:31] PROBLEM - HHVM rendering on mw2219 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:04:37] (PS3) Catrope: Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414702 (https://phabricator.wikimedia.org/T188008) (owner: Subramanya Sastry)
[00:04:43] (CR) Catrope: [C: 2] Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414702 (https://phabricator.wikimedia.org/T188008) (owner: Subramanya Sastry)
[00:04:53] (PS1) Krinkle: profiler-labs: Factor out 'Enable profiler' code, add 'forceprofile' to XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414863
[00:05:10] (PS2) Krinkle: profiler-labs: Factor out 'Enable profiler' code, add 'forceprofile' to XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414863 (https://phabricator.wikimedia.org/T180183)
[00:05:21] RECOVERY - HHVM rendering on mw2219 is OK: HTTP OK: HTTP/1.1 200 OK - 74054 bytes in 0.482 second response time
[00:05:33] (Merged) jenkins-bot: Enable RemexHtml on all private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414701 (https://phabricator.wikimedia.org/T188009) (owner: Subramanya Sastry)
[00:05:42] Krinkle: Where do we swap the config for the l10n cache backend to use? I don't see it....
[00:05:47] (CR) Catrope: [C: 2] beta: enable mentions in edit summaries [mediawiki-config] - https://gerrit.wikimedia.org/r/413879 (https://phabricator.wikimedia.org/T187835) (owner: MaxSem)
[00:06:02] no_justification: somewhere in wmf-config I assume.
[00:06:22] All I see is
[00:06:22] $wgLocalisationCacheConf['storeDirectory'] = "$IP/cache/l10n";
[00:06:22] $wgLocalisationCacheConf['manualRecache'] = true;
[00:06:51] RoanKattouw: I found this, https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Add_a_wiki
[00:07:01] (CR) jenkins-bot: Enable RemexHtml on all wikinews wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414700 (https://phabricator.wikimedia.org/T188000) (owner: Subramanya Sastry)
[00:07:03] (CR) jerkins-bot: [V: -1] profiler-labs: Factor out 'Enable profiler' code, add 'forceprofile' to XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414863 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[00:07:10] (Merged) jenkins-bot: beta: enable mentions in edit summaries [mediawiki-config] - https://gerrit.wikimedia.org/r/413879 (https://phabricator.wikimedia.org/T187835) (owner: MaxSem)
[00:07:19] awight: Nice!
[00:07:31] no_justification:
[00:07:31] https://github.com/wikimedia/mediawiki/blob/ad776c7d5f8deee581bf3338c76c6312c3e2933e/maintenance/doMaintenance.php#L64
[00:07:34] haha wait until you start clicking on the links.
[00:07:34] and
[00:07:35] https://github.com/wikimedia/mediawiki/blob/89843b44ce94bcbb75b69f25c00c30f0ecc12752/includes/cache/localisation/LocalisationCache.php#L209-L219
[00:07:47] the default if one of those things is set (which we do) is 'detect'
[00:07:53] which seems to find its way to 'LcStoreCdb'
[00:08:33] LCStoreCDB, but it should be named LcStoreCdb, of course.
[00:09:00] Hmm, should it? L and C refer to separate words
[00:09:04] RoanKattouw, the X-Wikimedia-Debug extension is now directing reqs to mwdebug1001 ... so, i assume you will test deploy to that server?
[00:09:18] (PS4) Catrope: Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414702 (https://phabricator.wikimedia.org/T188008) (owner: Subramanya Sastry)
[00:09:23] (CR) Catrope: [C: 2] Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414702 (https://phabricator.wikimedia.org/T188008) (owner: Subramanya Sastry)
[00:09:24] bawolff: In method names and class names we always prefer making abbreviations ucfirst only, per camel case conventions
[00:09:24] i thought it used to be mwdebug1002 before.
[00:09:30] Krinkle: Gotcha, I see the old patch ... https://gerrit.wikimedia.org/r/c/217702/1/wmf-config/CommonSettings.php#231
[00:09:36] subbu: You can configure it
[00:09:38] no_justification: Yep, that's it
[00:09:43] i see.
[00:09:56] There's a dropdown, set it to 1002
[00:10:02] no_justification: I'm not 100% sure how changing it for one will will affect the localisation rebuild script, I don't know if it currently expects to vary by wiki.
[00:10:08] Thanks for pointing that out though, I hadn't noticed that it had forgotten the last-used server and defaulted to 1001 again
[00:10:18] But yeah, that config change will need to be part of it.
[00:10:23] Krinkle: The rebuild script should Just Work, from last time I checked it
[00:10:30] Oh, by-wiki
[00:10:32] Yeah idk
[00:10:37] We *should* enable it in beta though
[00:10:40] (Merged) jenkins-bot: Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414702 (https://phabricator.wikimedia.org/T188008) (owner: Subramanya Sastry)
[00:10:43] Since we have TC enabled there
[00:10:44] RoanKattouw, i see now.
[00:10:53] (CR) Dzahn: Add Apache 2.0 license. (2 comments) [puppet/zookeeper] - https://gerrit.wikimedia.org/r/414851 (owner: Nfontes)
[00:11:02] no_justification: We run it once for all wikis, presumably with something like --wiki=aawiki
[00:11:09] hence extension list etc.
[00:11:09] Yeah
[00:12:01] (CR) Dzahn: "i'm sure my contributions are more due to reorganizing puppet code in the repo and style/lint fixes. i have no problem with licensing them" [puppet/zookeeper] - https://gerrit.wikimedia.org/r/414851 (owner: Nfontes)
[00:12:13] (PS3) Awight: Enable Swedish and Spanish Wikibooks on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/414860 (https://phabricator.wikimedia.org/T188349)
[00:12:19] no_justification: https://github.com/wikimedia/puppet/blob/2a4af3a094c16f376c5972c5777b81375d7018b4/modules/scap/files/l10nupdate-1#L85-L102
[00:12:25] Looks like it currently invokes it once per wiki version
[00:12:29] but that's for nightly updates
[00:12:31] not sure about scap
[00:12:43] RoanKattouw: Mind if I hack on creating the beta wikis, while you SWAT?
[00:12:50] Krinkle: It does per-version as well in scap
[00:12:54] awight: Please do
[00:13:00] kk
[00:13:18] Sooooo, per-wiki is basically impossible, sans having a special version we only use on like test2wiki
[00:13:26] no_justification: https://github.com/wikimedia/scap/blob/892c4f5c8fd950a2722192100d76a38a463c56f1/scap/main.py#L500-L502
[00:13:34] Yep
[00:13:53] That might be the safest/easiest way to test in prod
[00:14:36] addshore: Any idea what perms I would need to edit this page? https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep/host/deployment-cache-text04
[00:15:12] (PS1) Chad: Beta: Attempt using LCStoreStaticArray [mediawiki-config] - https://gerrit.wikimedia.org/r/414865 (https://phabricator.wikimedia.org/T99740)
[00:15:30] awight: Be in the admin list for the project iirc
[00:15:34] Welp, I almost synced that straight to prod
[00:15:43] subbu: Your config changes are live on mwdebug1002
[00:15:51] k
[00:15:52] I synced it direct to prod at first and Ctrl+C-ed that
[00:15:59] no_justification: Could temporarily hack into scap a second call for each wiki version invoking with store=array, except that parameter doesn't exist
[00:16:08] no_justification: ok ty
[00:16:20] no_justification: would it be possible to invoke the maintenance script with wiki=test2wiki even when that wiki isn't on that version?
[00:16:31] Oh I guess just adding wiki=test2wiki would suffice
[00:16:45] as a fourth group temporarily, just in that python loop.
[00:17:11] Possible
[00:17:24] Anyway, we should probably get TC enabled in prod *first*
[00:17:31] Also test LCStoreStaticArray in beta
[00:17:44] imma merge that now actually
[00:17:52] RoanKattouw: Want to make me an admin for deployment-prep?
[00:18:09] (CR) Krinkle: Beta: Attempt using LCStoreStaticArray (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/414865 (https://phabricator.wikimedia.org/T99740) (owner: Chad)
[00:18:16] RoanKattouw, verified remex behaves as expected on officewiki sandbox and that an enwikinews page looks broken as expected :)
[00:18:26] Sounds good, deploying
[00:18:35] RoanKattouw, no errors in logs i presume?
[00:19:05] (PS3) Krinkle: profiler-labs: Factor out 'Enable profiler' code, add 'forceprofile' to XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414863
[00:19:22] (PS4) Krinkle: profiler-labs: Factor out 'Enable profiler' code, add 'forceprofile' to XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414863 (https://phabricator.wikimedia.org/T180183)
[00:19:27] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable RemexHtml on all wikinews wikis (T188000), all private wikis (T188009), test2wiki, loginwiki, votewiki and wikimania2017wiki (T188008) (duration: 00m 56s)
[00:19:35] Krinkle: Also, that MW_SETUP_CALLBACK check is....ew. If an extension ever registered it, it would make that entire callback break.
[00:19:44] No errors in the mwdebug1002 logs, no
[00:19:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:19:46] T188000: Enable RemexHTML on all wikinews wikis - https://phabricator.wikimedia.org/T188000
[00:19:46] T188009: Enable RemexHTML on all private wikis - https://phabricator.wikimedia.org/T188009
[00:19:46] T188008: Enable RemexHTML on a bunch of miscellaneous wikis - https://phabricator.wikimedia.org/T188008
[00:19:55] Let's see what the farm-wide logs say no
[00:19:56] w
[00:20:30] no_justification: Aye, that's kind of by design. There can only be one handler for that hook, and whichever code handles it is then responsible for doing it.
[00:20:42] There is a default, but something like WikiFarm could override it
[00:21:02] Warning: Unable to record MySQL stats with: EXPLAIN /* MediaWiki\Linter\Database::getTotalsEstimate */ SELECT * FROM `linter` WHERE linter_cat = '12' in /srv/mediawiki/php-1.31.0-wmf.22/includes/libs/rdbms/database/DatabaseMysqli.php on line 47
[00:21:06] Is "trending"
[00:21:10] But I think that's a known HHVM issue
[00:21:11] RoanKattouw, known.
[00:21:16] yes.
[00:21:22] that is from the linter though. not remex.
[00:21:27] Krinkle: Well it just means the assumptions inside that maintenance block can never be for sure :( :)
[00:21:33] (PS2) Nfontes: Add Apache 2.0 license. [puppet/zookeeper] - https://gerrit.wikimedia.org/r/414851
[00:21:38] RoanKattouw: I haz patch upstream
[00:21:39] no_justification: nvm, I'm talking about MW_CONFIG_CALLBACK. MW_SETUP_CALLBACK is even more restricted, it's basically core-only, for customisation of individual entry points.
[00:21:46] (CR) Dzahn: [C: -2] "all misc apps need to be changed at once, not just racktables. and that is already https://gerrit.wikimedia.org/r/#/c/406970/" [puppet] - https://gerrit.wikimedia.org/r/409478 (owner: Paladox)
[00:22:03] (Abandoned) Dzahn: racktables: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409478 (owner: Paladox)
[00:22:06] I don't see anything else weird related to linter/remex
[00:22:10] So yay
[00:22:12] RoanKattouw, alright. \o/ thanks.
[00:22:21] RoanKattouw: https://github.com/facebook/hhvm/pull/8139 (it was fixed years ago but not *completely* fixed)
[00:22:23] another 440 wikis to go .. i'll target about 200 of them sometime in march.
[00:22:26] Woohoo.
[00:22:32] You'll note....all the errors are with EXPLAIN aka estimateRowCount()
[00:22:32] :)
[00:23:05] (CR) Catrope: [C: 2] Enable ORES filters on simplewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/395818 (https://phabricator.wikimedia.org/T182012) (owner: Catrope)
[00:23:17] subbu: Have you done all the closed.dblist wikis yet? :)
[00:23:30] no_justification: I introduced the callback as a way of eliminating lots of duplicate code between WebStart/doMaintenance and moving it into Setup.php. With this callback as the way to retain the minor differences between them. Which had to happen in-between two chunks of code, instead of strictly before or after Setup.php
[00:23:33] Yeah I saw the bug report about EXPLAIN a few days ago
[00:23:42] no_justification, i don't know .. maybe give me a list of those?
[00:23:44] Perhaps we can somehow deprecate reliance on that and just do it only before or after.
[00:23:54] Makes sense. I was mostly going down a rabbit hole :)
[00:23:59] subbu: dblists/closed.dblist :)
[00:24:03] There's 126 of them
[00:24:40] They're also closed, so you can't edit them unless you have superpowers
[00:24:51] Well the point is more that people don't feel upset about it :)
[00:24:52] no_justification, alright .. will file a phab task to switch there.
[00:25:00] Plus gives you a look at likely-to-unchange historical content
[00:25:06] aawiki,aawikibooks,aawiktionary,abwiktionary,advisorywiki,akwikibooks,akwiktionary,amwikiquote,angwikibooks,angwikiquote,angwikisource,astwikibooks,astwikiquote,aswikibooks,aswiktionary,avwiktionary,aywikibooks,bhwiktionary,biwikibooks,biwiktionary,bmwikibooks,bmwikiquote,bmwiktionary,bowikibooks,bowiktionary,chowiki,chwikibooks,chwiktionary,cowikibooks,cowikiquote,crwikiquote,crwiktionary,dzwiktionary,gawikibooks,gawikiqu
[00:25:06] ote,gnwikibooks,gotwikibooks,guwikibooks,howiki,htwikisource,huwikinews,hzwiki,iewikibooks,iiwiki,ikwiktionary,kjwiki,kkwikiquote,knwikibooks,krwiki,krwikiquote,kswikibooks,kswikiquote,kwwikiquote,lbwikibooks,lbwikiquote,lnwikibooks,lvwikibooks,mhwiki,mhwiktionary,miwikibooks,mnwikibooks,muswiki,mywikibooks,nahwikibooks,nawikibooks,nawikiquote,ndswikibooks,ndswikiquote,ngwiki,nzwikimedia,pa_uswikimedia,piwiktionary,pswikib
[00:25:06] ooks,qualitywiki,quwikibooks,quwikiquote,rmwikibooks,rmwiktionary,rnwiktionary,scwiktionary,sdwikinews,sewikibooks,simplewikibooks,simplewikiquote,snwiktionary,strategywiki,suwikibooks,swwikibooks,tenwiki,thwikinews,tkwikibooks,tkwikiquote,towiktionary,ttwikiquote,transitionteamwiki,twwiktionary,ugwikibooks,ugwikiquote,usabilitywiki,uzwikibooks,vowikibooks,vowikiquote,wawikibooks,wikimania2005wiki,wikimania2006wiki,wikiman
[00:25:06] ia2007wiki,wikimania2008wiki,wikimania2009wiki,wikimania2010wiki,wikimania2011wiki,wikimania2012wiki,wikimania2013wiki,wikimania2014wiki,wikimania2015wiki,wikimania2016wiki,wikimania2017wiki,xhwikibooks,xhwiktionary,yowikibooks,yowiktionary,zawikibooks,zawikiquote,zawiktionary,zh_min_nanwikibooks,zh_min_nanwikiquote,zuwikibooks
[00:25:35] Heh, no spaces or newlines and still had to split over 4 messages.
[00:27:07] (PS3) Dzahn: grafana/racktables/iegreview/misc: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/406970
[00:27:12] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/10144/krypton.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/406970 (owner: Dzahn)
[00:28:09] RoanKattouw: FYI, i was blocked at step 1 in creating the new wikis. We can chat tomorrow, no rush.
[00:29:33] (PS2) Chad: Beta: Attempt using LCStoreStaticArray [mediawiki-config] - https://gerrit.wikimedia.org/r/414865 (https://phabricator.wikimedia.org/T99740)
[00:30:19] Added some scary warning comments Krinkle
[00:31:20] (PS3) Catrope: Enable ORES filters on simplewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/395818 (https://phabricator.wikimedia.org/T182012)
[00:31:24] (CR) Catrope: Enable ORES filters on simplewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/395818 (https://phabricator.wikimedia.org/T182012) (owner: Catrope)
[00:31:28] (CR) Catrope: [C: 2] Enable ORES filters on simplewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/395818 (https://phabricator.wikimedia.org/T182012) (owner: Catrope)
[00:31:41] (PS2) Catrope: Enable ORES filters on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414859 (https://phabricator.wikimedia.org/T174560)
[00:31:53] (PS1) Krinkle: profiler: Swap order of Xenon and XHGui setup (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414868 (https://phabricator.wikimedia.org/T180183)
[00:31:54] no_justification: want to add me to deployment-prep admins?
[00:31:56] (PS1) Krinkle: profiler: Merge XHGui-setup into the overall hot_profiler block [mediawiki-config] - https://gerrit.wikimedia.org/r/414869 (https://phabricator.wikimedia.org/T180183)
[00:31:58] What could possible go wrong.
[00:32:24] no_justification: thx :)
[00:32:44] (Merged) jenkins-bot: Enable ORES filters on simplewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/395818 (https://phabricator.wikimedia.org/T182012) (owner: Catrope)
[00:32:46] awight: {{done}}
[00:32:51] O_O
[00:35:21] (PS1) Jforrester: Remove no-op RemexHtml disable on Wikivoyage section [mediawiki-config] - https://gerrit.wikimedia.org/r/414870
[00:35:31] (CR) Dzahn: [C: 2] "no-op on krypton.eqiad.wmnet" [puppet] - https://gerrit.wikimedia.org/r/406970 (owner: Dzahn)
[00:36:23] (CR) Chad: [C: 2] "Here goes nothing" [mediawiki-config] - https://gerrit.wikimedia.org/r/414865 (https://phabricator.wikimedia.org/T99740) (owner: Chad)
[00:37:44] (Merged) jenkins-bot: Beta: Attempt using LCStoreStaticArray [mediawiki-config] - https://gerrit.wikimedia.org/r/414865 (https://phabricator.wikimedia.org/T99740) (owner: Chad)
[00:38:04] (CR) Subramanya Sastry: [C: 1] "whoops :-)" [mediawiki-config] - https://gerrit.wikimedia.org/r/414870 (owner: Jforrester)
[00:39:09] (CR) Krinkle: [C: 1] "For the record, these are actually setting null to *wikibooks, which are already set a few lines up." [mediawiki-config] - https://gerrit.wikimedia.org/r/414870 (owner: Jforrester)
[00:39:11] Ugh who took the scap lock
[00:39:23] * Krinkle looks at no_justification
[00:39:28] owner is "demon"; reason is "beta-only change: lsctorestaticarray"
[00:39:36] Maybe don't do that during a scheduled deployment window?
[00:39:43] !log demon@tin Synchronized wmf-config/CommonSettings.php: beta-only change: lsctorestaticarray (duration: 00m 56s)
[00:39:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:40:05] Oh, I guess it is swat-o-clock
[00:40:26] I could've skipped the sync I s'pose
[00:40:45] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable ORES filters on simplewiki (T182012) (duration: 00m 56s)
[00:40:46] (CR) Catrope: [C: 2] Enable ORES filters on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414859 (https://phabricator.wikimedia.org/T174560) (owner: Catrope)
[00:40:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:41:00] T182012: Deploy ORES filters to Simple Wikipedia - https://phabricator.wikimedia.org/T182012
[00:41:15] RoanKattouw: I think you’re good to add this to the SWAT now, https://gerrit.wikimedia.org/r/#/c/414860/
[00:41:40] (CR) Catrope: [C: 2] Enable Swedish and Spanish Wikibooks on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/414860 (https://phabricator.wikimedia.org/T188349) (owner: Awight)
[00:42:01] awight: Thanks. It's beta-only so it'll auto-deploy
[00:42:11] +1 (we hope)
[00:42:26] (Merged) jenkins-bot: Enable ORES filters on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414859 (https://phabricator.wikimedia.org/T174560) (owner: Catrope)
[00:42:52] (Merged) jenkins-bot: Enable Swedish and Spanish Wikibooks on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/414860 (https://phabricator.wikimedia.org/T188349) (owner: Awight)
[00:43:25] I’m on steps 4 and 5 now, which are parsoid and restbase patches.
[00:43:47] However, https://sv.wikipedia.beta.wmflabs.org still returns a 404
[00:44:57] Krinkle: I s'pose the best/most responsible step in production would be to to pick a couple of servers (maybe canaries, to keep it simple?) and swap them to using reusable tc
[00:45:14] Rather than roll it everywhere and it explode in our faces
[00:45:23] no_justification: Yeah, prolly starting with mwdebug
[00:45:39] and a day later canaries with traffic yeah
[00:46:08] Keeping a close eye on grafana boards about hhvm and server-level metrics
[00:49:13] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable ORES filters on svwiki (T174560) (duration: 00m 56s)
[00:49:18] (PS1) Awight: Add eswikibooks and svwiki to the beta cluster [puppet] - https://gerrit.wikimedia.org/r/414873 (https://phabricator.wikimedia.org/T174560)
[00:49:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:49:28] T174560: Enable ORES filters for svwiki - https://phabricator.wikimedia.org/T174560
[00:49:49] (CR) jerkins-bot: [V: -1] Add eswikibooks and svwiki to the beta cluster [puppet] - https://gerrit.wikimedia.org/r/414873 (https://phabricator.wikimedia.org/T174560) (owner: Awight)
[00:51:11] (PS1) Catrope: Follow-up 178936e19d9: add very likely bad faith filter for svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414874 (https://phabricator.wikimedia.org/T174560)
[00:51:22] (PS2) Awight: Add eswikibooks and svwiki to the beta cluster [puppet] - https://gerrit.wikimedia.org/r/414873 (https://phabricator.wikimedia.org/T174560)
[00:51:24] (PS2) Catrope: Follow-up 178936e19d9: add very likely bad faith filter for svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414874 (https://phabricator.wikimedia.org/T174560)
[00:51:33] (CR) Catrope: [C: 2] Follow-up 178936e19d9: add very likely bad faith filter for svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414874 (https://phabricator.wikimedia.org/T174560) (owner: Catrope)
[00:53:17] (Merged) jenkins-bot: Follow-up 178936e19d9: add very likely bad faith filter for svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414874 (https://phabricator.wikimedia.org/T174560) (owner: Catrope)
[00:53:41] RoanKattouw: svwiki looks good!
[00:53:48] (PS2) Catrope: Remove no-op RemexHtml disable on Wikivoyage section [mediawiki-config] - https://gerrit.wikimedia.org/r/414870 (owner: Jforrester)
[00:53:54] (CR) Catrope: [C: 2] Remove no-op RemexHtml disable on Wikivoyage section [mediawiki-config] - https://gerrit.wikimedia.org/r/414870 (owner: Jforrester)
[00:55:01] Yeah both simple and Swedish look to be working in prod
[00:55:05] (Merged) jenkins-bot: Remove no-op RemexHtml disable on Wikivoyage section [mediawiki-config] - https://gerrit.wikimedia.org/r/414870 (owner: Jforrester)
[00:55:26] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: add very likely bad faith filter on svwiki (T174560) (duration: 00m 57s)
[00:55:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:55:40] T174560: Enable ORES filters for svwiki - https://phabricator.wikimedia.org/T174560
[00:57:10] Looks like you got it. I need to catch a bus, thanks for doing this deployment!
[00:57:34] (PS1) Chad: mwdebug1001/1002: enable reusable TC on HHVM [puppet] - https://gerrit.wikimedia.org/r/414876 (https://phabricator.wikimedia.org/T103886)
[00:57:56] Krinkle: I'll add ^^ to puppetswat
[01:00:22] (CR) Dzahn: "really just mwdebug by host name and not all mediawiki::canary_appserver? i guess that would be the next step?" [puppet] - https://gerrit.wikimedia.org/r/414876 (https://phabricator.wikimedia.org/T103886) (owner: Chad)
[01:02:21] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:02:56] (CR) Krinkle: "Do mwdebug servers (also the two in codfw) not have a common role/profile?" [puppet] - https://gerrit.wikimedia.org/r/414876 (https://phabricator.wikimedia.org/T103886) (owner: Chad)
[01:03:11] Swedish is missing "very likely bad faith", I added it but it's cached for a day, so I'm gonna try to clear that cache manually using eval.php
[01:03:14] no_justification: LGTM
[01:03:44] There's no bad faith on Swedish Wikipedia.
[01:03:45] they have a common role, mediawiki::canary_appserver, just that that group is larger than mwdebug
[01:03:57] mutante: right.
[01:04:07] if there are 2 levels, 1) mwdebug 2) all of canary ...
[01:04:11] Oh nm it turned over already
[01:04:24] mutante: Would it make sense to given them a common role? They are separate after all.
[01:04:28] then maybe we should have canary and precanary :)
[01:04:28] mwdebug aren't even pooled
[01:04:34] yes, i think so
[01:04:56] pre-canary stage-bird
[01:05:26] I don't want to be boring, but mediawiki::debug_appserver might fit better with the hostnames.
[01:05:29] :)
[01:05:40] hah, fair enough, yea
[01:06:03] basically a copy of canary_appserver then
[01:06:23] not using hostnames is good
[01:06:41] I'm glad to hear they are provisioned the same way as production app servers, though.
[01:07:10] So... we no longer want multi-role servers, but roles can still extend right?
[01:07:32] I mean... it'd be nice to keep the behaviour that setting hiera for app server still affects canaries, and that setting canaries still affects debug.
[01:07:54] Or do we use class-based hiera for that instead?
[01:07:59] A role for the debug servers sounds nice
[01:08:05] I was a bit confused by the drive to eliminate multi-role puppet nodes
[01:08:08] Operations, ops-eqiad, Patch-For-Review, Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4004558 (Dzahn) This host fails at the "Partition Disk" step in installer. It is similar but different from bas...
[01:08:30] Krinkle: You'd set that stuff at the profile level then
[01:08:38] So you'd have role(debug_server), role(canary_server)
[01:08:46] Operations, ops-eqiad, Patch-For-Review, Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4004563 (Dzahn) The mgmt password works again since Chris re-enabled the root user.
[01:08:49] Which would basically be light shims around the profiles we want
[01:09:01] AIUI
[01:13:59] (PS1) Krinkle: profiler: Simplify XHGui handling (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414880
[01:14:32] no_justification: I forgot RE: profile vs role.
[01:14:48] One of them we newly introduced I think, and the other we said we don't want multiple of on one node.
[01:14:54] Or is that both about profile?
[01:17:03] (PS1) Krinkle: profiler: Move $XWD assignment to inside hot_profiler block (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414881 (https://phabricator.wikimedia.org/T180183)
[01:18:17] yes, there should be only a single role per node
[01:18:29] but each role can include more than one profile
[01:18:50] Krinkle: only 1 role per node, but many profiles per role
[01:19:04] and "what we used to call role all the time is basically profile"
[01:19:14] (PS1) Dzahn: partman: fix recipes for bastion servers [puppet] - https://gerrit.wikimedia.org/r/414882 (https://phabricator.wikimedia.org/T186623)
[01:20:13] there can be 1 role for actual_prod, 1 role for canary_prod and 1 role for debug_prod and they can all include the same profiles and be identical
[01:20:56] but at the same time you can set different Hiera things based on role/common/mediawiki/foo
[01:21:13] maybe that's not what you wanted though .. hmm
[01:21:50] (PS2) Krinkle: profiler: Move $XWD assignment to inside hot_profiler block (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414881 (https://phabricator.wikimedia.org/T180183)
[01:21:52] (PS1) Krinkle: profiler: Simplify XHGui handling (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414883
[01:21:54] (PS1) Krinkle: profiler: Rename $XWD to $xwd, and unset [mediawiki-config] - https://gerrit.wikimedia.org/r/414884 (https://phabricator.wikimedia.org/T180183)
[01:22:25] mutante: Thx, that's exactly what I wanted to know :)
[01:22:35] yea, what you want to be the same for all should be in one of the profiles they all include
[01:22:58] mutante: so our old 'role' is being turned into 'profile', except that we'll only have one profile per node, and where needed this means that a node changed from having 4 roles, to now having 1 profile that includes 3 roles.
[01:23:20] Oh wait, I got it the wrong way around
[01:23:42] old roles are renamed to profiles
[01:23:53] and new roles are created from scratch and include one or more profiles
[01:23:58] Got it
[01:24:02] then on the node level, you make sure it includes only 1 role
[01:24:10] and hiera is per-profile or per-role, or can be either?
[01:24:16] either
[01:24:23] right it does per-node, profile, role and class.
[01:24:37] yea, there is some discussion about making it less complex
[01:24:44] by removing some of the Hiera options
[01:24:51] afair
[01:24:55] OK
[01:25:49] Krinkle: Added to tomorrow's puppetswat
[01:25:53] cool
[01:25:57] Put both our names so at least one of us will be around for the ping :)
[01:26:51] (CR) Dzahn: [C: 2] partman: fix recipes for bastion servers [puppet] - https://gerrit.wikimedia.org/r/414882 (https://phabricator.wikimedia.org/T186623) (owner: Dzahn)
[01:28:03] (PS2) Dzahn: partman: fix recipes for bastion servers [puppet] - https://gerrit.wikimedia.org/r/414882 (https://phabricator.wikimedia.org/T186623)
[01:38:03] (PS1) Catrope: Enable ORES filters on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414886 (https://phabricator.wikimedia.org/T130279)
[01:39:59] !log install1002 - re-enabling disabled puppet
[01:40:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:47:26] Operations, Patch-For-Review: setup/install bast1002(WMF4749) - https://phabricator.wikimedia.org/T186623#4004654 (Dzahn) I found an issue with the selection of partman recipes but after that the next thing happened. While i was on the console my connection froze and i can't get back on it : ``` ro...
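The role/profile layout mutante describes between 01:18 and 01:24 (one role per node, each role including one or more profiles, with Hiera data attachable at several levels) can be sketched as plain data. This is a minimal Python model, not Puppet code: every role, profile, host and Hiera key name below is hypothetical, and the node > role > common lookup order is an assumed simplification of the real hierarchy, which per the log also has per-profile and per-class levels.

```python
from typing import Any, Dict, List

# Hypothetical role -> profiles mapping. As in the discussion, the prod,
# canary and debug roles include the same profiles and differ only in data.
SHARED_PROFILES = ["profile::example::base", "profile::example::appserver"]
ROLE_PROFILES: Dict[str, List[str]] = {
    "example_prod": SHARED_PROFILES,
    "example_canary": SHARED_PROFILES,
    "example_debug": SHARED_PROFILES,
}

# Hypothetical Hiera-like data layers (assumed precedence: node > role > common).
COMMON_DATA: Dict[str, Any] = {"example::log_level": "warning"}
ROLE_DATA: Dict[str, Dict[str, Any]] = {"example_debug": {"example::log_level": "debug"}}
NODE_DATA: Dict[str, Dict[str, Any]] = {"debug-host-1": {"example::extra_logging": True}}


def profiles_for(node: str, roles: List[str]) -> List[str]:
    """Return a node's profiles, enforcing the one-role-per-node rule."""
    if len(roles) != 1:
        raise ValueError(f"{node}: expected exactly one role, got {roles}")
    return ROLE_PROFILES[roles[0]]


def lookup(node: str, role: str, key: str) -> Any:
    """Return the first value found for key: node data, then role data, then common."""
    for layer in (NODE_DATA.get(node, {}), ROLE_DATA.get(role, {}), COMMON_DATA):
        if key in layer:
            return layer[key]
    raise KeyError(key)


if __name__ == "__main__":
    print(profiles_for("debug-host-1", ["example_debug"]))
    print(lookup("debug-host-1", "example_debug", "example::log_level"))  # debug
    print(lookup("prod-host-1", "example_prod", "example::log_level"))    # warning
```

Because the three roles share one profile list, they stay identical in code and differ only in role-level data, which is the point mutante makes at 01:20.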
[01:53:55] (PS10) Smalyshev: wdqs: allow configuration of kafka based updates [puppet] - https://gerrit.wikimedia.org/r/412873 (https://phabricator.wikimedia.org/T185951) (owner: Gehel)
[01:54:33] (CR) jerkins-bot: [V: -1] wdqs: allow configuration of kafka based updates [puppet] - https://gerrit.wikimedia.org/r/412873 (https://phabricator.wikimedia.org/T185951) (owner: Gehel)
[01:57:00] (PS11) Smalyshev: wdqs: allow configuration of kafka based updates [puppet] - https://gerrit.wikimedia.org/r/412873 (https://phabricator.wikimedia.org/T185951) (owner: Gehel)
[01:57:38] (CR) jerkins-bot: [V: -1] wdqs: allow configuration of kafka based updates [puppet] - https://gerrit.wikimedia.org/r/412873 (https://phabricator.wikimedia.org/T185951) (owner: Gehel)
[01:59:48] (PS12) Smalyshev: wdqs: allow configuration of kafka based updates [puppet] - https://gerrit.wikimedia.org/r/412873 (https://phabricator.wikimedia.org/T185951) (owner: Gehel)
[02:00:26] (CR) jerkins-bot: [V: -1] wdqs: allow configuration of kafka based updates [puppet] - https://gerrit.wikimedia.org/r/412873 (https://phabricator.wikimedia.org/T185951) (owner: Gehel)
[02:07:30] (PS13) Smalyshev: wdqs: allow configuration of kafka based updates [puppet] - https://gerrit.wikimedia.org/r/412873 (https://phabricator.wikimedia.org/T185951) (owner: Gehel)
[02:08:12] (PS2) Catrope: Enable ORES filters on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414886 (https://phabricator.wikimedia.org/T130279)
[02:08:14] (PS1) Catrope: Enable ORES filters on eswikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/414889 (https://phabricator.wikimedia.org/T145394)
[02:08:25] (CR) Jforrester: [C: 1] robots.txt: Combine various NS_SPECIAL disallows [mediawiki-config] - https://gerrit.wikimedia.org/r/411255 (owner: Chad)
[02:09:09] Operations, Wikimedia-General-or-Unknown: Move "transparency.wikimedia.org/private" to "transparency-private.wikimedia.org" - https://phabricator.wikimedia.org/T188362#4004707 (Prtksxna)
[02:10:10] Operations: TransparencyReport-private is not auto deploying - https://phabricator.wikimedia.org/T188224#4004718 (Prtksxna) >>>! In T188224#4000455, @Peachey88 wrote: >>>>! In T188224#4000339, @Prtksxna wrote: >>> Also, would it be possible to move https://transparency.wikimedia.org/private to https://privat...
[02:21:21] PROBLEM - Host cp3048 is DOWN: PING CRITICAL - Packet loss = 100%
[02:23:02] RECOVERY - Host cp3048 is UP: PING WARNING - Packet loss = 73%, RTA = 83.83 ms
[02:25:12] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.22) (duration: 06m 11s)
[02:25:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:39:18] (PS3) Andrew Bogott: wikitech: grants for the new labswiki db on m5 [puppet] - https://gerrit.wikimedia.org/r/413884 (https://phabricator.wikimedia.org/T188029)
[02:41:33] (CR) Andrew Bogott: [C: 2] wikitech: grants for the new labswiki db on m5 [puppet] - https://gerrit.wikimedia.org/r/413884 (https://phabricator.wikimedia.org/T188029) (owner: Andrew Bogott)
[02:44:47] (CR) Krinkle: [C: -1] "Minor nit regarding %3A, which we should probably keep." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/411255 (owner: Chad)
[02:46:07] mutante: I suspect that https://gerrit.wikimedia.org/r/#/c/406970/ broke puppet on labmon1001.eqiad.wmnet. Is that an easy fix?
[02:48:47] RoanKattouw: just checking, done with deploys?
[02:49:18] (PS5) Krinkle: profiler-labs: Factor out 'Enable profiler' code, add 'forceprofile' to XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414863 (https://phabricator.wikimedia.org/T180183)
[02:49:25] (CR) Krinkle: [C: 2] profiler-labs: Factor out 'Enable profiler' code, add 'forceprofile' to XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414863 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[02:49:54] (CR) Chad: robots.txt: Combine various NS_SPECIAL disallows (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/411255 (owner: Chad)
[02:50:08] Krinkle: yes
[02:50:39] (Merged) jenkins-bot: profiler-labs: Factor out 'Enable profiler' code, add 'forceprofile' to XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414863 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[02:50:45] Operations, MediaWiki-Platform-Team, HHVM, NewPHP, and 2 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4004786 (Legoktm)
[02:51:05] (CR) Krinkle: [C: -1] robots.txt: Combine various NS_SPECIAL disallows (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/411255 (owner: Chad)
[02:51:08] RoanKattouw: thx
[02:51:58] (PS2) Chad: robots.txt: Combine various NS_SPECIAL disallows [mediawiki-config] - https://gerrit.wikimedia.org/r/411255
[02:52:57] Hm.. beta deploys are stuck in jenkins it seems
[02:53:03] Ruh roh
[02:53:05] 2 hour stall
[02:53:08] Did we/I break it?
[02:53:31] no_justification: I don't think so
[02:53:42] I think it’s that gearman thing
[02:53:42] no_justification: It's waiting for executor slot
[02:53:43] With the LCStore change
[02:53:47] but nothing is happening
[02:53:48] Ah
[02:53:50] So it's a Jenkins thing
[02:53:56] This used to happen quite often... in 2015.
[02:53:56] Needs restarting
[02:54:01] (Gearman)
[02:54:04] The beta jenkins slave is faulty
[02:54:09] Krinkle: no_justification ^^
[02:54:26] Yeah, I know, I've done enough Gearman restarts to last a lifetime.
[02:54:29] But I thought that problem was fixed, in 2015.
[02:55:59] I hate gearman
[02:56:02] * no_justification grumbles
[02:56:09] I don't think Gearman is the problem.
[02:56:12] other jobs are working fine
[02:56:19] And Gearman doesn't do anything with individual hosts
[02:56:28] Restarting that link from jenkins fixes it
[02:56:46] but the problem seems to be that Jenkins is somehow breaking its link with the beta jenkins slave agent
[02:56:52] And no amount of slave restarting fixes it
[02:57:04] I've investigated this dozens of times
[02:57:14] Does...it have to be a Jenkins slave?
[02:57:21] Couldn't we just fire off some commands over SSH?
[02:57:33] (running Jenkins on beta's tin feels like overkill)
[02:57:41] Oh it's not running there
[02:57:44] Jenkins ssh's into it yes
[02:57:47] Or heck: just automate it as a cron?
[02:57:55] A cron seems less error prone ;-)
[02:58:05] !log demon@tin Pruned MediaWiki: 1.31.0-wmf.21 [keeping static files] (duration: 01m 24s)
[02:58:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:58:41] https://integration.wikimedia.org/ci/computer/deployment-tin.eqiad/
[02:59:04] And the weird thing is, executing a manual script via https://integration.wikimedia.org/ci/computer/deployment-tin.eqiad/script
[02:59:05] works fine
[02:59:36] * Krinkle clicks Disconnect/Relaunch for that slave
[03:00:06] * no_justification looks at just doing it as a cron on beta every 5 minutes or something
[03:00:16] Oh I see what you mean
[03:00:27] not about running Jenkins (the service) on tin, but not using jenkins
[03:00:29] right right
[03:01:06] oh yes please just use a cronjob
[03:01:11] there's no value added by jenkins
[03:02:12] (CR) jenkins-bot: Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414702 (https://phabricator.wikimedia.org/T188008) (owner: Subramanya Sastry)
[03:02:28] Also: beta-update-databases-eqiad, beta-code-update-eqiad are *basically* the same job
[03:02:42] no_justification: indeed
[03:02:48] no_justification: I've killed the pending builds that were stuck
[03:02:53] and relaunched the slave agent
[03:02:58] seems they are starting now
[03:02:59] Fuck this shit
[03:03:01] Interesting.
[03:03:03] * no_justification just found his dinnertime project
[03:03:20] Seems the first time we made it non-stuck without restarting Jenkins master nor without restarting Gearman
[03:04:08] (PS2) Krinkle: profiler: Swap order of Xenon and XHGui setup (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414868 (https://phabricator.wikimedia.org/T180183)
[03:04:09] (CR) Krinkle: [C: 2] profiler: Swap order of Xenon and XHGui setup (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414868 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[03:04:11] (PS2) Krinkle: profiler: Merge XHGui-setup into the overall hot_profiler block [mediawiki-config] - https://gerrit.wikimedia.org/r/414869 (https://phabricator.wikimedia.org/T180183)
[03:04:13] (CR) Krinkle: [C: 2] profiler: Merge XHGui-setup into the overall hot_profiler block [mediawiki-config] - https://gerrit.wikimedia.org/r/414869 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[03:04:20] (PS2) Krinkle: profiler: Simplify XHGui handling (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414880
[03:04:23] (CR) Krinkle: [C: 2] profiler: Simplify XHGui handling (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414880 (owner: Krinkle)
[03:05:41] (PS2) Krinkle: profiler: Simplify XHGui handling (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414883
[03:05:43] (Merged) jenkins-bot: profiler: Swap order of Xenon and XHGui setup (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414868 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[03:05:45] (Merged) jenkins-bot: profiler: Merge XHGui-setup into the overall hot_profiler block [mediawiki-config] - https://gerrit.wikimedia.org/r/414869 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[03:05:49] (PS3) Krinkle: profiler: Remove now-redundant 'use $XWD' clause [mediawiki-config] - https://gerrit.wikimedia.org/r/414883
[03:05:51] (CR) Krinkle: [C: 2] profiler: Remove now-redundant 'use $XWD' clause [mediawiki-config] - https://gerrit.wikimedia.org/r/414883 (owner: Krinkle)
[03:05:58] (PS3) Krinkle: profiler: Move $XWD assignment to inside hot_profiler block (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414881 (https://phabricator.wikimedia.org/T180183)
[03:06:07] (CR) Krinkle: [C: 2] profiler: Move $XWD assignment to inside hot_profiler block (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414881 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[03:06:29] (PS2) Krinkle: profiler: Rename $XWD to $xwd, and unset [mediawiki-config] - https://gerrit.wikimedia.org/r/414884 (https://phabricator.wikimedia.org/T180183)
[03:07:08] (Merged) jenkins-bot: profiler: Simplify XHGui handling (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414880 (owner: Krinkle)
[03:07:42] (Merged) jenkins-bot: profiler: Remove now-redundant 'use $XWD' clause [mediawiki-config] - https://gerrit.wikimedia.org/r/414883 (owner: Krinkle)
[03:07:44] (Merged) jenkins-bot: profiler: Move $XWD assignment to inside hot_profiler block (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414881 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[03:07:53] (CR) Krinkle: [C: 2] profiler: Rename $XWD to $xwd, and unset [mediawiki-config] - https://gerrit.wikimedia.org/r/414884 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[03:08:02] Staging on tin and mwdebug1002 now
[03:08:19] Should all be no-op still (haven't solved the task yet)
[03:09:13] (Merged) jenkins-bot: profiler: Rename $XWD to $xwd, and unset [mediawiki-config] - https://gerrit.wikimedia.org/r/414884 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[03:09:40] no_justification: RE: cron, it'd be nice to have mw-config things happen based on push, though, instead of having to wait 5min
[03:09:51] Although I suppose one could also trigger deploys from deployment-tin manually if needed
[03:10:21] last I checked it's a bit tricky to do manually because it's all caught up in jenkins chmods.
[03:10:23] I think 5mins is typically quick enough for most folks :)
[03:10:27] Also, 5mins is upper bound
[03:10:28] :)
[03:10:29] Sure
[03:10:31] Yeah
[03:10:42] For wmf-config, could shorten it to every minute, even
[03:10:47] It's a quick pull
[03:10:48] Right
[03:12:09] Also possible: gerrit plugin to trigger the local command :)
[03:12:18] Eh, probably overkill. cron is nice
[03:12:37] !log krinkle@tin Synchronized wmf-config/profiler-labs.php: beta only (no-op) (duration: 00m 56s)
[03:12:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:14:56] (PS1) Chad: Beta: Cron to update wmf-config every 3 minutes [puppet] - https://gerrit.wikimedia.org/r/414893
[03:18:21] PROBLEM - HHVM rendering on mw2170 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:19:11] RECOVERY - HHVM rendering on mw2170 is OK: HTTP OK: HTTP/1.1 200 OK - 73745 bytes in 0.296 second response time
[03:19:51] (PS1) Krinkle: profiler: Fix typo in $qs assignment [mediawiki-config] - https://gerrit.wikimedia.org/r/414895
[03:20:00] (CR) Krinkle: [C: 2] profiler: Simplify XHGui handling (no-op) (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/414880 (owner: Krinkle)
[03:20:08] (CR) Krinkle: [C: 2] profiler: Fix typo in $qs assignment [mediawiki-config] - https://gerrit.wikimedia.org/r/414895 (owner: Krinkle)
[03:21:08] no_justification: uh, that script is already existing in puppet?
[03:21:15] See the line above it?
[03:21:23] Don't tell me... The Jenkins job calls a shell script provisioned by puppet?
[03:21:27] I have some WIP to port it to a scap plugin
[03:21:35] Yes, of course!
[03:21:47] :facepalm:
[03:21:56] Yes, by all means, make it a cron then.
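The cron that replaces the Jenkins-driven beta update (settled on between 02:57 and 03:21) only has to pull the staging checkout on a timer; Chad's patch at 03:14 runs it every 3 minutes. The Python sketch below is illustrative and is not the puppet-managed script: the checkout path and wrapper command are hypothetical, it resets to origin/master instead of the Jenkins-only $ZUUL_COMMIT and drops the jenkins_build_ tags (both discussed at 03:22 to 03:23), and it keeps the portals submodule update because a plain reset does not move submodules. The scap sync that has to follow (per Chad's review comment at 03:42) is left out.

```python
import shlex
import subprocess
from typing import List

# Hypothetical location of the beta staging checkout; adjust to the real path.
STAGING_DIR = "/srv/mediawiki-staging"


def update_commands(repo: str = STAGING_DIR) -> List[List[str]]:
    """Git steps for one unattended update run (no Jenkins/Zuul variables)."""
    return [
        ["git", "-C", repo, "fetch", "origin"],
        # Without Zuul there is no $ZUUL_COMMIT, so follow the branch tip.
        ["git", "-C", repo, "reset", "--hard", "origin/master"],
        # reset does not touch submodules; --remote checks out the tip of the
        # submodule's remote-tracking branch instead of the recorded commit.
        ["git", "-C", repo, "submodule", "update", "--remote", "portals"],
    ]


def crontab_entry(every_minutes: int, command: List[str]) -> str:
    """Build a crontab line that runs command every N minutes within each hour."""
    if not 1 <= every_minutes <= 59:
        raise ValueError("every_minutes must be between 1 and 59")
    return f"*/{every_minutes} * * * * {shlex.join(command)}"


def run_update(repo: str = STAGING_DIR) -> None:
    """Run each step in order, stopping at the first failing command."""
    for cmd in update_commands(repo):
        subprocess.run(cmd, check=True)
```

For example, crontab_entry(3, ["/usr/local/bin/beta-update-example"]) yields `*/3 * * * * /usr/local/bin/beta-update-example`, where the script path is a placeholder.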
[03:22:03] It's not even like there's anything to port.
[03:22:14] I guess the downside is lack of IRC reporting in the job
[03:22:16] Hmmm
[03:22:35] git reset --hard "$ZUUL_COMMIT"
[03:22:35] git tag "jenkins_build_$BUILD_NUMBER" "$ZUUL_COMMIT"
[03:22:36] ^
[03:22:45] That does depend on Jenkins unfortunately
[03:22:58] Who cares about that part?
[03:23:00] reset --hard origin/hard, would fix it, though.
[03:23:01] It's only master. Ever.
[03:23:05] Yeah
[03:23:08] But does need a fix
[03:23:21] Oh yeah, we'll just remove that bit then
[03:23:30] And the tags..
[03:23:36] would save some space by not having them
[03:23:46] Why do we need git submodule update --remote portals?
[03:24:50] well, it could be without 'portals' to update all submodules
[03:24:55] but it does need to be there
[03:25:00] Ah
[03:25:03] I see
[03:25:05] given git pull/checkout/reset won't apply submodule changes
[03:25:11] it's like doing 'checkout to current ref'
[03:25:17] I didn't know about --remote before
[03:25:19] Makes sense now
[03:25:30] Oh, I missed that
[03:25:32] indeed
[03:25:45] * Krinkle waits for Jenkins to merge his commit
[03:26:32] Amended patch incoming
[03:26:41] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 810.72 seconds
[03:26:47] (Merged) jenkins-bot: profiler: Fix typo in $qs assignment [mediawiki-config] - https://gerrit.wikimedia.org/r/414895 (owner: Krinkle)
[03:28:11] OK. Confirmed all still works, and this time keeping the 'action' parameter in XHGui urls
[03:28:39] !log krinkle@tin Synchronized wmf-config/profiler.php: various refactor and clean up for T180183 (no-op) (duration: 00m 54s)
[03:28:47] (PS2) Chad: Beta: Cron to update wmf-config every 3 minutes [puppet] - https://gerrit.wikimedia.org/r/414893
[03:28:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:28:53] T180183: Profiling for X-Wikimedia-Debug seems to start fairly late - https://phabricator.wikimedia.org/T180183
[03:29:19] (PS3) Chad: Beta: Cron to update wmf-config every 3 minutes [puppet] - https://gerrit.wikimedia.org/r/414893
[03:30:49] (CR) Krinkle: [C: 1] "(Reviewed the beta/files bit, not familiar enough with Puppet to review that.)" [puppet] - https://gerrit.wikimedia.org/r/414893 (owner: Chad)
[03:33:29] (PS1) Krinkle: tests: Add unit test confirming logic of profiler.php XMD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414897
[03:33:53] (PS2) Krinkle: tests: Add unit test confirming logic of profiler.php XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414897
[03:36:12] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1948 bytes in 0.091 second response time
[03:36:23] (PS4) Chad: Beta: Cron to update wmf-config every 3 minutes [puppet] - https://gerrit.wikimedia.org/r/414893
[03:37:27] (CR) Chad: "PS3 sets RelEng as notification email instead of root cronspam :)" [puppet] - https://gerrit.wikimedia.org/r/414893 (owner: Chad)
[03:37:40] (CR) Chad: "Er, PS4" [puppet] - https://gerrit.wikimedia.org/r/414893 (owner: Chad)
[03:40:06] (CR) Krinkle: [C: 2] tests: Add unit test confirming logic of profiler.php XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414897 (owner: Krinkle)
[03:41:12] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1940 bytes in 0.095 second response time
[03:41:27] (Merged) jenkins-bot: tests: Add unit test confirming logic of profiler.php XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414897 (owner: Krinkle)
[03:42:45] (CR) Chad: "Needs to trigger a scap afterwards--should just add to the end of the script" [puppet] - https://gerrit.wikimedia.org/r/414893 (owner: Chad)
[03:44:56] (PS5) Chad: Beta: Cron to update wmf-config every 3 minutes [puppet] - https://gerrit.wikimedia.org/r/414893
[03:49:42] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 195.77 seconds
[03:56:19] (PS1) Krinkle: labs: Remove dead profiler-related code from CommonSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/414906 (https://phabricator.wikimedia.org/T180766)
[03:56:50] (CR) Krinkle: [C: 1] robots.txt: Combine various NS_SPECIAL disallows [mediawiki-config] - https://gerrit.wikimedia.org/r/411255 (owner: Chad)
[03:57:26] (CR) jerkins-bot: [V: -1] labs: Remove dead profiler-related code from CommonSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/414906 (https://phabricator.wikimedia.org/T180766) (owner: Krinkle)
[03:57:38] (PS1) Chad: Beta autoupdate: Clean up, support wmf-config itself [mediawiki-config] - https://gerrit.wikimedia.org/r/414909
[03:57:48] (CR) jerkins-bot: [V: -1] Beta autoupdate: Clean up, support wmf-config itself [mediawiki-config] - https://gerrit.wikimedia.org/r/414909 (owner: Chad)
[03:57:59] Krinkle: If ^ works, we can yank out those puppet-managed scripts & have the cron just call `scap whatever`
[03:58:54] (CR) Krinkle: Beta autoupdate: Clean up, support wmf-config itself (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/414909 (owner: Chad)
[03:59:03] no_justification: Ah, yeah, that seems like a better place to put it
[03:59:21] (PS2) Krinkle: labs: Remove dead profiler-related code from CommonSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/414906 (https://phabricator.wikimedia.org/T180766)
[03:59:22] PROBLEM - HHVM rendering on mw2132 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:00:12] RECOVERY - HHVM rendering on mw2132 is OK: HTTP OK: HTTP/1.1 200 OK - 73745 bytes in 0.519 second response time
[04:03:22] Krinkle: Also https://gerrit.wikimedia.org/r/#/c/414923/
[04:04:50] Ideally we just kill all these jobs from Jenkins, but that's a quick improvement
[04:04:51] no_justification: Nice
[04:05:07] no_justification: btw some of those options aren't the same between those jobs, not sure if that matters though
[04:05:16] failure: true, content-type: html
[04:05:58] (CR) Krinkle: [C: 2] labs: Remove dead profiler-related code from CommonSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/414906 (https://phabricator.wikimedia.org/T180766) (owner: Krinkle)
[04:06:12] (PS2) Chad: Beta autoupdate: Clean up, support wmf-config itself [mediawiki-config] - https://gerrit.wikimedia.org/r/414909
[04:06:24] (CR) jerkins-bot: [V: -1] Beta autoupdate: Clean up, support wmf-config itself [mediawiki-config] - https://gerrit.wikimedia.org/r/414909 (owner: Chad)
[04:07:43] (Merged) jenkins-bot: labs: Remove dead profiler-related code from CommonSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/414906 (https://phabricator.wikimedia.org/T180766) (owner: Krinkle)
[04:07:45] (PS3) Chad: Beta autoupdate: Clean up, support wmf-config itself [mediawiki-config] - https://gerrit.wikimedia.org/r/414909
[04:10:50] (PS1) Krinkle: profiler: Use preg_match_all instead of parse_str for XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414932 (https://phabricator.wikimedia.org/T180183)
[04:11:07] (PS3) Krinkle: tests: Add test to enforce dblists using expressions are pre-computed [mediawiki-config] - https://gerrit.wikimedia.org/r/413969
[04:12:11] (CR) jerkins-bot: [V: -1] profiler: Use preg_match_all instead of parse_str for XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414932 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[04:22:48] (CR) Krinkle: [C: 2] tests: Add test to enforce dblists using expressions are pre-computed [mediawiki-config] - https://gerrit.wikimedia.org/r/413969 (owner: Krinkle)
[04:24:03] (Merged) jenkins-bot: tests: Add test to enforce dblists using expressions are pre-computed [mediawiki-config] - https://gerrit.wikimedia.org/r/413969 (owner: Krinkle)
[04:24:09] (PS2) Krinkle: profiler: Use preg_match_all instead of parse_str for XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414932 (https://phabricator.wikimedia.org/T180183)
[04:25:32] (CR) jerkins-bot: [V: -1] profiler: Use preg_match_all instead of parse_str for XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414932 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[04:26:14] (PS3) Krinkle: profiler: Use preg_match_all instead of parse_str for XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414932 (https://phabricator.wikimedia.org/T180183)
[04:52:31] PROBLEM - Nginx local proxy to apache on mw2122 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:53:21] RECOVERY - Nginx local proxy to apache on mw2122 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.183 second response time
[04:56:21] (CR) Krinkle: [C: 2] profiler: Use preg_match_all instead of parse_str for XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414932 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[04:57:40] (Merged) jenkins-bot: profiler: Use preg_match_all instead of parse_str for XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414932 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[05:00:48] !log krinkle@tin Synchronized wmf-config/profiler.php: I34687c0569af (duration: 00m 57s)
[05:01:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:01:09] (Abandoned) BryanDavis: bd808 home: Add mw helper script and fix ~/bin perms [puppet] - https://gerrit.wikimedia.org/r/389408 (owner: BryanDavis)
[05:21:33] (PS1) Krinkle: profiler: Remove code for dumping xhprof to /tmp (no longer works) [mediawiki-config] - https://gerrit.wikimedia.org/r/414933 (https://phabricator.wikimedia.org/T180183)
[05:21:36] (PS1) Krinkle: profiler: Make entire xhprof-related block conditional on XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414934
[05:22:09] (CR) Krinkle: [C: 2] profiler: Remove code for dumping xhprof to /tmp (no longer works) [mediawiki-config] - https://gerrit.wikimedia.org/r/414933 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[05:24:06] (Merged) jenkins-bot: profiler: Remove code for dumping xhprof to /tmp (no longer works) [mediawiki-config] - https://gerrit.wikimedia.org/r/414933 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[05:24:49] (PS2) Krinkle: [WIP] profiler: Make entire xhprof-related block conditional on XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414934
[05:24:53] (CR) Krinkle: [C: -1] [WIP] profiler: Make entire xhprof-related block conditional on XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414934 (owner: Krinkle)
[05:26:05] !log krinkle@tin Synchronized wmf-config/profiler.php: I1e7dc263b43 (duration: 00m 56s)
[05:26:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:56:50] (CR) jenkins-bot: Enable ORES filters on simplewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/395818 (https://phabricator.wikimedia.org/T182012) (owner: Catrope)
[05:56:52] (CR) jenkins-bot: Beta: Attempt using LCStoreStaticArray [mediawiki-config] - https://gerrit.wikimedia.org/r/414865 (https://phabricator.wikimedia.org/T99740) (owner: Chad)
[05:56:56] (CR) jenkins-bot: Enable ORES filters on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414859 (https://phabricator.wikimedia.org/T174560) (owner: Catrope)
[05:56:58] (CR) jenkins-bot: Enable Swedish and Spanish Wikibooks on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/414860 (https://phabricator.wikimedia.org/T188349) (owner: Awight)
[05:57:00] (CR) jenkins-bot: Follow-up 178936e19d9: add very likely bad faith filter for svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/414874 (https://phabricator.wikimedia.org/T174560) (owner: Catrope)
[05:57:02] (CR) jenkins-bot: Remove no-op RemexHtml disable on Wikivoyage section [mediawiki-config] - https://gerrit.wikimedia.org/r/414870 (owner: Jforrester)
[05:57:04] (CR) jenkins-bot: profiler-labs: Factor out 'Enable profiler' code, add 'forceprofile' to XWD [mediawiki-config] - https://gerrit.wikimedia.org/r/414863 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[05:57:06] (CR) jenkins-bot: Enable RemexHtml on all private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/414701 (https://phabricator.wikimedia.org/T188009) (owner: Subramanya Sastry)
[05:57:08] (CR) jenkins-bot: beta: enable mentions in edit summaries [mediawiki-config] - https://gerrit.wikimedia.org/r/413879 (https://phabricator.wikimedia.org/T187835) (owner: MaxSem)
[05:57:10] (CR) jenkins-bot: profiler: Swap order of Xenon and XHGui setup (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414868 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[05:57:12] (CR) jenkins-bot: profiler: Merge XHGui-setup into the overall hot_profiler block [mediawiki-config] - https://gerrit.wikimedia.org/r/414869 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[05:57:14] (CR) jenkins-bot: profiler: Simplify XHGui handling (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414880 (owner: Krinkle)
[05:57:16] (CR) jenkins-bot: profiler: Remove now-redundant 'use $XWD' clause [mediawiki-config] - https://gerrit.wikimedia.org/r/414883 (owner: Krinkle)
[05:57:18] (CR) jenkins-bot: profiler: Move $XWD assignment to inside hot_profiler block (no-op) [mediawiki-config] - https://gerrit.wikimedia.org/r/414881 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[05:57:21] (CR) jenkins-bot: profiler: Rename $XWD to $xwd, and unset [mediawiki-config] - https://gerrit.wikimedia.org/r/414884 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[05:57:22] (CR) jenkins-bot: profiler: Fix typo in $qs assignment [mediawiki-config] - https://gerrit.wikimedia.org/r/414895 (owner: Krinkle)
[05:57:24] (CR) jenkins-bot: tests: Add unit test confirming logic of profiler.php XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414897 (owner: Krinkle)
[05:57:26] (CR) jenkins-bot: labs: Remove dead profiler-related code from CommonSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/414906 (https://phabricator.wikimedia.org/T180766) (owner: Krinkle)
[05:57:29] (CR) jenkins-bot: tests: Add test to enforce dblists using expressions are pre-computed [mediawiki-config] - https://gerrit.wikimedia.org/r/413969 (owner: Krinkle)
[05:57:31] (CR) jenkins-bot: profiler: Use preg_match_all instead of parse_str for XWD parsing [mediawiki-config] - https://gerrit.wikimedia.org/r/414932 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[05:57:32] (CR) jenkins-bot: profiler: Remove code for dumping xhprof to /tmp (no longer works) [mediawiki-config] - https://gerrit.wikimedia.org/r/414933 (https://phabricator.wikimedia.org/T180183) (owner: Krinkle)
[06:20:00] !log Reload haproxy on dbproxy1005
[06:20:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:36] RECOVERY - haproxy failover on dbproxy1005 is OK: OK check_failover servers up 2 down 0
[06:21:55] !log Stop MySQL on db1115 to copy it to db2093 - tendril (dbtree) service will be down for this maintenance - T184704
[06:22:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:08] T184704: Setup tendril database monitoring on 2 new hosts, one on eqiad and one on codfw - https://phabricator.wikimedia.org/T184704
[06:30:32] (PS1) Marostegui: db-eqiad.php: Slowly repool db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414946
[06:32:20] (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414946 (owner: Marostegui)
[06:33:10] !log Deploy schema change on dbstore1002 - T187089 T185128 T153182
[06:33:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:33:26] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089
[06:33:27] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
[06:33:27] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
[06:33:46] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414946 (owner: Marostegui)
[06:35:06] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Slowly repool db1103:3312 (duration: 00m 56s)
[06:35:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:48] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414946 (owner: Marostegui)
[06:44:18] (PS1) Chad: Revert "Beta: Attempt using LCStoreStaticArray" [mediawiki-config] - https://gerrit.wikimedia.org/r/414947
[06:44:22] (CR) Chad: [V: 2 C: 2] Revert "Beta: Attempt using LCStoreStaticArray" [mediawiki-config] - https://gerrit.wikimedia.org/r/414947 (owner: Chad)
[06:46:23] (PS1) Chad: Revert "Revert "Beta: Attempt using LCStoreStaticArray"" [mediawiki-config] - https://gerrit.wikimedia.org/r/414948
[06:46:29] (CR) Chad: [V: 2 C: 2] Revert "Revert "Beta: Attempt using LCStoreStaticArray"" [mediawiki-config] - https://gerrit.wikimedia.org/r/414948 (owner: Chad)
[06:47:17] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414949
[06:48:46] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414949 (owner: Marostegui)
[06:50:06] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414949 (owner: Marostegui)
[06:51:19] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Increase traffic for db1103:3312 (duration: 00m 56s)
[06:51:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:40] (CR) jenkins-bot: Revert "Beta: Attempt using LCStoreStaticArray" [mediawiki-config] - https://gerrit.wikimedia.org/r/414947 (owner: Chad)
[06:53:49] Operations, ops-codfw, DBA: db2048: RAID with predictive failure - https://phabricator.wikimedia.org/T187983#4005049 (Marostegui) Open>Resolved Thanks! ``` root@db2048:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380337E3350) Port Name: 1I...
[06:54:27] (CR) jenkins-bot: Revert "Revert "Beta: Attempt using LCStoreStaticArray"" [mediawiki-config] - https://gerrit.wikimedia.org/r/414948 (owner: Chad)
[06:54:31] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414949 (owner: Marostegui)
[06:58:06] (PS1) Marostegui: db-eqiad.php: Depool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414951
[06:59:42] !log demon@tin Synchronized README: no-op (duration: 00m 56s)
[06:59:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:55] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414951 (owner: Marostegui)
[07:01:19] (Merged) jenkins-bot: db-eqiad.php: Depool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414951 (owner: Marostegui)
[07:02:30] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db1084 (duration: 00m 56s)
[07:02:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:03:35] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 56s)
[07:03:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:03] !log Stop MySQL on db1084 for kernel and mariadb upgrade
[07:04:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:16] (CR) jenkins-bot: db-eqiad.php: Depool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414951 (owner: Marostegui)
[07:11:53] (PS1) Ayounsi: Eqsin tunnel changed from over Telia to over NTT [dns] - https://gerrit.wikimedia.org/r/414953
[07:13:50] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414954
[07:13:53] (CR) Ayounsi: [C: 2] Eqsin tunnel changed from over Telia to over NTT [dns] - https://gerrit.wikimedia.org/r/414953 (owner: Ayounsi)
[07:15:47] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414954 (owner: Marostegui)
[07:17:16] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414954 (owner: Marostegui)
[07:18:57] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1084 (duration: 00m 56s)
[07:19:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:01] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414954 (owner: Marostegui)
[07:24:28] (CR) Giuseppe Lavagetto: [C: 2] Typo in changelog: jesse -> jessie [software/conftool] - https://gerrit.wikimedia.org/r/414695 (owner: Hashar)
[07:25:35] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1084,db1103 [mediawiki-config] - https://gerrit.wikimedia.org/r/414955
[07:26:06] (CR) jerkins-bot: [V: -1] Typo in changelog: jesse -> jessie [software/conftool] - https://gerrit.wikimedia.org/r/414695 (owner: Hashar)
[07:26:51] (CR) Giuseppe Lavagetto: [V: 2 C: 2] Typo in changelog: jesse -> jessie [software/conftool] - https://gerrit.wikimedia.org/r/414695 (owner: Hashar)
[07:27:05] (CR) Giuseppe Lavagetto: [V: 2 C: 2] Tweak gbp to use 'master' has the upstream branch [software/conftool] - https://gerrit.wikimedia.org/r/414694 (owner: Hashar)
[07:27:24] (PS1) Chad: Revert "Revert "Revert "Beta: Attempt using LCStoreStaticArray""" [mediawiki-config] - https://gerrit.wikimedia.org/r/414956
[07:27:33] (CR) Chad: [V: 2 C: 2] Revert "Revert "Revert "Beta: Attempt using LCStoreStaticArray""" [mediawiki-config] - https://gerrit.wikimedia.org/r/414956 (owner: Chad)
[07:27:45] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1084,db1103 [mediawiki-config] - https://gerrit.wikimedia.org/r/414955 (owner: Marostegui)
[07:27:49] (CR) jenkins-bot: Revert "Revert "Revert "Beta: Attempt using LCStoreStaticArray""" [mediawiki-config] - https://gerrit.wikimedia.org/r/414956 (owner: Chad)
[07:28:33] (PS3) Giuseppe Lavagetto: Add the --hostname switch to simple node actions. [software/conftool] - https://gerrit.wikimedia.org/r/414669
[07:29:56] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1084,db1103 [mediawiki-config] - https://gerrit.wikimedia.org/r/414955 (owner: Marostegui)
[07:29:58] (CR) jerkins-bot: [V: -1] Add the --hostname switch to simple node actions. [software/conftool] - https://gerrit.wikimedia.org/r/414669 (owner: Giuseppe Lavagetto)
[07:30:09] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1084,db1103 [mediawiki-config] - https://gerrit.wikimedia.org/r/414955 (owner: Marostegui)
[07:31:09] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1084 and db1103:3312 (duration: 00m 56s)
[07:31:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:46] (PS1) Marostegui: install_server: Allow reimage db1111 [puppet] - https://gerrit.wikimedia.org/r/414958 (https://phabricator.wikimedia.org/T187526)
[07:39:49] (CR) Marostegui: [C: 2] install_server: Allow reimage db1111 [puppet] - https://gerrit.wikimedia.org/r/414958 (https://phabricator.wikimedia.org/T187526) (owner: Marostegui)
[07:45:24] Operations, ops-eqiad, DBA, Patch-For-Review: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#3977928 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1111.eqiad.wmnet'] ```...
[07:48:50] (PS1) Marostegui: db-eqiad.php: Increase traffic db1103,db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414960
[07:51:09] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic db1103,db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414960 (owner: Marostegui)
[07:52:37] (Merged) jenkins-bot: db-eqiad.php: Increase traffic db1103,db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414960 (owner: Marostegui)
[07:54:25] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1084 and db1103:3312 (duration: 00m 56s)
[07:54:33] (PS2) Vgutierrez: Provide an UDP monitor. [debs/pybal] - https://gerrit.wikimedia.org/r/413211 (https://phabricator.wikimedia.org/T178151)
[07:54:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:16] (CR) jenkins-bot: db-eqiad.php: Increase traffic db1103,db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414960 (owner: Marostegui)
[07:56:11] Operations, ops-eqiad, DBA, Patch-For-Review: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#4005143 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1111.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['db1111.eqiad.wmnet'...
[07:56:49] (CR) Vgutierrez: "Done" [debs/pybal] - https://gerrit.wikimedia.org/r/413211 (https://phabricator.wikimedia.org/T178151) (owner: Vgutierrez)
[07:57:08] (PS3) Muehlenhoff: admins: Add imarlier to udp2log-users [puppet] - https://gerrit.wikimedia.org/r/414668 (https://phabricator.wikimedia.org/T188042)
[07:57:48] (CR) Muehlenhoff: [C: 2] admins: Add imarlier to udp2log-users [puppet] - https://gerrit.wikimedia.org/r/414668 (https://phabricator.wikimedia.org/T188042) (owner: Muehlenhoff)
[08:03:08] Operations, ops-eqiad, DBA, Patch-For-Review: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#4005162 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1111.eqiad.wmnet'] ```...
[08:05:04] Operations, Ops-Access-Requests, Patch-For-Review: Add Ian Marlier to udp2log-users group - https://phabricator.wikimedia.org/T188042#4005166 (MoritzMuehlenhoff) Open>Resolved @Imarlier You can now log in, closing the ticket.
[08:05:45] Operations, LuaSandbox: Build and deploy hhvm-luasandbox 3.0.0 to Wikimedia wikis - https://phabricator.wikimedia.org/T187673#4005168 (MoritzMuehlenhoff) p:Triage>Normal a:MoritzMuehlenhoff
[08:06:01] Operations, Analytics, User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4005170 (elukey)
[08:06:07] Operations, LuaSandbox: Build and deploy hhvm-luasandbox 3.0.0 to Wikimedia wikis - https://phabricator.wikimedia.org/T187673#3982203 (MoritzMuehlenhoff) To be done after the ICU migration.
[08:09:54] (PS1) Marostegui: db-eqiad.php: Increase traffic db1084,db1103 [mediawiki-config] - https://gerrit.wikimedia.org/r/414963
[08:11:49] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic db1084,db1103 [mediawiki-config] - https://gerrit.wikimedia.org/r/414963 (owner: Marostegui)
[08:13:15] (Merged) jenkins-bot: db-eqiad.php: Increase traffic db1084,db1103 [mediawiki-config] - https://gerrit.wikimedia.org/r/414963 (owner: Marostegui)
[08:13:30] (CR) jenkins-bot: db-eqiad.php: Increase traffic db1084,db1103 [mediawiki-config] - https://gerrit.wikimedia.org/r/414963 (owner: Marostegui)
[08:14:23] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1084 and db1103:3312 (duration: 00m 56s)
[08:14:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:34] (CR) Gehel: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/412873 (https://phabricator.wikimedia.org/T185951) (owner: Gehel)
[08:24:57] !log gilles@tin Synchronized private/PrivateSettings.php: Separate Thumbor Swift user for private containers (duration: 00m 56s)
[08:25:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:21] (PS1) Marostegui: db-eqiad.php: Fully repool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414964
[08:26:05] (PS1) Gilles: Set up separate Thumbor Swift user for private containers [mediawiki-config] - https://gerrit.wikimedia.org/r/414965 (https://phabricator.wikimedia.org/T187822)
[08:27:01] (CR) Marostegui: [C: 2] db-eqiad.php: Fully repool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414964 (owner: Marostegui)
[08:28:13] (PS1) Marostegui: Revert "install_server: Allow reimage db1111" [puppet] - https://gerrit.wikimedia.org/r/414966
[08:28:27] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414964 (owner: Marostegui)
[08:28:48] (CR) jenkins-bot: db-eqiad.php: Fully repool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/414964 (owner: Marostegui)
[08:29:27] PROBLEM - puppet last run on elastic1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:29:34] Operations, DNS, Traffic: Move "transparency.wikimedia.org/private" to "transparency-private.wikimedia.org" - https://phabricator.wikimedia.org/T188362#4005199 (Peachey88)
[08:29:51] (PS2) Marostegui: Revert "install_server: Allow reimage db1111" [puppet] - https://gerrit.wikimedia.org/r/414966
[08:30:03] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1084 (duration: 00m 56s)
[08:30:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:16] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:31:06] PROBLEM - puppet last run on labvirt1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:31:06] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:31:07] PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:31:07] PROBLEM - puppet last run on logstash1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:31:36] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:31:36] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:31:57] PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:32:36] PROBLEM - puppet last run on boron is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:32:47] PROBLEM - puppet last run on wdqs1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:33:16] PROBLEM - puppet last run on labvirt1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:33:16] PROBLEM - puppet last run on conf1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:33:16] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:33:17] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:33:36] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:33:36] PROBLEM - puppet last run on analytics1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:34:07] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:34:17] PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:34:26] ^ oomkiller on nitrogen
[08:34:27] let me guess.. nitrogen? :D
[08:35:14] we should be renaming nitrogen in arsenic or plutonium
[08:35:18] * moritzm sends a prize to Bologna
[08:35:42] ahahhaha
[08:36:01] godog: +1
[08:36:31] LOL
[08:38:17] RECOVERY - puppet last run on conf1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:38:17] Killed process 730 (java) total-vm:12285132kB, anon-rss:5982768kB, file-rss:0kB, shmem-rss:0kB
[08:38:43] so ~6G of RSS
[08:41:41] * elukey would love to test swappiness=1
[08:42:34] !log powercycling wdqs1004 - T188045
[08:42:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:49] T188045: wdqs1004 broken - https://phabricator.wikimedia.org/T188045
[08:50:07] (CR) Giuseppe Lavagetto: "recheck" [software/conftool] - https://gerrit.wikimedia.org/r/414669 (owner: Giuseppe Lavagetto)
[08:50:12] Operations, ops-eqiad, Discovery, Wikidata, and 2 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4005217 (Gehel) Hardware diagnostics have completed with no error. It is now up again, catching on updates, but still depooled. I'll keep an eye on it, and if it looks stable and...
[08:52:42] Operations, netops: cr1-eqsin faulty interfaces - https://phabricator.wikimedia.org/T187807#4005229 (ayounsi) Unit shipped with https://www.expeditors.com/ Supposed to arrive in Singapore on the 1st, and clear custom 2 days later, for a final ETA of 03-Mar-2018 20:25:00 SGT. As this is a Saturday 8pm, t...
[08:53:17] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[08:53:24] (CR) Giuseppe Lavagetto: [C: 2] Add the --hostname switch to simple node actions. [software/conftool] - https://gerrit.wikimedia.org/r/414669 (owner: Giuseppe Lavagetto)
[08:53:42] (CR) Giuseppe Lavagetto: "recheck" [software/conftool] - https://gerrit.wikimedia.org/r/414670 (owner: Giuseppe Lavagetto)
[08:57:31] (PS2) Filippo Giunchedi: hieradata: depool rhodium [puppet] - https://gerrit.wikimedia.org/r/414706 (https://phabricator.wikimedia.org/T184562)
[08:57:47] RECOVERY - puppet last run on wdqs1003 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[08:57:55] I'm merging ^ as soon as jenkins is done, there might be some puppet failures
[08:58:47] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[08:58:51] (CR) Alexandros Kosiaris: "I am not sure I understand this example tbh. For example, why is this preferable to a a profile class with a parameter $modules (an array " [puppet] - https://gerrit.wikimedia.org/r/414748 (owner: Dzahn)
[08:59:02] (CR) Filippo Giunchedi: [C: 2] hieradata: depool rhodium [puppet] - https://gerrit.wikimedia.org/r/414706 (https://phabricator.wikimedia.org/T184562) (owner: Filippo Giunchedi)
[08:59:08] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[08:59:18] RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:59:28] RECOVERY - puppet last run on elastic1050 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:00:18] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:01:07] RECOVERY - puppet last run on labvirt1013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:01:17] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:01:17] RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:01:17] RECOVERY - puppet last run on logstash1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:01:38] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:01:40] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:01:58] RECOVERY - puppet last run on mw1282 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:02:37] RECOVERY - puppet last run on boron is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[09:03:17] RECOVERY - puppet last run on labvirt1012 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[09:03:17] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[09:07:17] RECOVERY - puppet last run on analytics1044 is OK: OK: Puppet is currently enabled, last run 9 minutes ago with 0 failures
[09:07:49] (PS2) Filippo Giunchedi: install_server: reinstall rhodium with Stretch [puppet] - https://gerrit.wikimedia.org/r/414707 (https://phabricator.wikimedia.org/T184562)
[09:09:16] (CR) Filippo Giunchedi: [C: 2] install_server: reinstall rhodium with Stretch [puppet] - https://gerrit.wikimedia.org/r/414707 (https://phabricator.wikimedia.org/T184562) (owner: Filippo Giunchedi)
[09:13:53] (CR) Giuseppe Lavagetto: [C: 2] Make full path of the object seen in the output for any change in SetAction and EditAction [software/conftool] - https://gerrit.wikimedia.org/r/414670 (owner: Giuseppe Lavagetto)
[09:14:13] (PS3) Giuseppe Lavagetto: Make full path of the object seen in the output for any change in SetAction and EditAction [software/conftool] - https://gerrit.wikimedia.org/r/414670
[09:14:29] <_joe_> win 25
[09:16:20] (PS1) Marostegui: wmnet: Replaced db2011 with db2044 [dns] - https://gerrit.wikimedia.org/r/414967 (https://phabricator.wikimedia.org/T187886)
[09:16:43] !log reimage rhodium - T184562
[09:16:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:57] T184562: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562
[09:18:55] (PS3) Marostegui: Revert "install_server: Allow reimage db1111" [puppet] - https://gerrit.wikimedia.org/r/414966
[09:19:47] (CR) Marostegui: [C: 2] Revert "install_server: Allow reimage db1111" [puppet] - https://gerrit.wikimedia.org/r/414966 (owner: Marostegui)
[09:23:16] (CR) Lokal Profil: "> If you want to have this merged, do not forget to add it to" [mediawiki-config] - https://gerrit.wikimedia.org/r/404942 (https://phabricator.wikimedia.org/T184981) (owner: Lokal Profil)
[09:25:54] (Abandoned) Filippo Giunchedi: puppetmaster: use puppetdb-termini on stretch [puppet] - https://gerrit.wikimedia.org/r/413690 (https://phabricator.wikimedia.org/T184562) (owner: Filippo Giunchedi)
[09:30:35] (PS8) Lokal Profil: Drop the medlem user group and editallpages user right [mediawiki-config] - https://gerrit.wikimedia.org/r/404942 (https://phabricator.wikimedia.org/T184981)
[09:37:14] (CR) Jcrespo: [C: 1] wmnet: Replaced db2011 with db2044 [dns] - https://gerrit.wikimedia.org/r/414967 (https://phabricator.wikimedia.org/T187886) (owner: Marostegui)
[09:38:00] (CR) Marostegui: [C: 2] wmnet: Replaced db2011 with db2044 [dns] - https://gerrit.wikimedia.org/r/414967 (https://phabricator.wikimedia.org/T187886) (owner: Marostegui)
[09:50:57] Operations, ops-eqiad, DBA, Patch-For-Review: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#4005279 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1111.eqiad.wmnet'] ``` and were **ALL** successful.
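The "~6G of RSS" estimate at 08:38:43 can be checked against the OOM-killer line quoted at 08:38:17. Below is a minimal Python sketch, assuming the kernel's `kB` fields mean KiB (1024 bytes). The helpers `parse_oom_line` and `kib_to_gib` are hypothetical illustrations, not part of any Wikimedia tooling:

```python
import re

# Hypothetical helper: matches the "<field>:<N>kB" counters printed in a
# kernel OOM-killer "Killed process" line, e.g. "anon-rss:5982768kB".
FIELD_RE = re.compile(r"([a-z-]+):(\d+)kB")

def parse_oom_line(line: str) -> dict:
    """Map each counter name (e.g. 'anon-rss') to its value in KiB."""
    return {name: int(value) for name, value in FIELD_RE.findall(line)}

def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB (assumes the kernel's 'kB' means 1024 bytes)."""
    return kib / (1024 * 1024)

line = ("Killed process 730 (java) total-vm:12285132kB, anon-rss:5982768kB, "
        "file-rss:0kB, shmem-rss:0kB")
fields = parse_oom_line(line)
print(f"anon-rss: {kib_to_gib(fields['anon-rss']):.2f} GiB")  # anon-rss: 5.71 GiB
print(f"total-vm: {kib_to_gib(fields['total-vm']):.2f} GiB")  # total-vm: 11.72 GiB
```

In decimal units the same anon-rss is 5982768 × 1024 = 6,126,354,432 bytes, or about 6.1 GB. Either reading supports the "~6G" figure.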
[09:52:37] Operations, ops-eqiad, DBA, Patch-For-Review: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#4005283 (Marostegui) Open>Resolved a:Marostegui The server was reimaged and all the data transferred back from db1112 and it is now fu...
[10:03:43] <_joe_> !log uploading conftool-1.0.0-1 to jessie-wikimedia
[10:03:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:55] !log reboot scb in eqiad for kernel security updates
[10:06:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:37] (CR) Filippo Giunchedi: "AFAICT activerecord is used for storedconfigs, for which we use puppetdb now" [puppet] - https://gerrit.wikimedia.org/r/414674 (https://phabricator.wikimedia.org/T184562) (owner: Filippo Giunchedi)
[10:07:58] !log poweroff sca1004 for T181121 tests
[10:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:11] T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O - https://phabricator.wikimedia.org/T181121
[10:08:30] PROBLEM - Host sca1004 is DOWN: PING CRITICAL - Packet loss = 100%
[10:10:26] (PS15) Elukey: [WIP] eventlogging: add systemd support [puppet] - https://gerrit.wikimedia.org/r/413362
[10:11:03] (PS16) Elukey: [WIP] eventlogging: add systemd support [puppet] - https://gerrit.wikimedia.org/r/413362
[10:12:20] PROBLEM - Check systemd state on rhodium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:12:40] (PS2) Giuseppe Lavagetto: Release 1.0.0 [software/conftool] - https://gerrit.wikimedia.org/r/414715
[10:13:10] RECOVERY - Host sca1004 is UP: PING OK - Packet loss = 0%, RTA = 1.23 ms
[10:14:01] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 11 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ruby-mysql],Package[ruby-activerecord-deprecated-finders]
[10:15:36] (CR) Giuseppe Lavagetto: [C: 2] Release 1.0.0 [software/conftool] - https://gerrit.wikimedia.org/r/414715 (owner: Giuseppe Lavagetto)
[10:16:34] <_joe_> godog: yeah ruby-activerecord-deprecated-finders was a transitional thing for people not using puppetdb but the activerecord stored configs
[10:16:42] <_joe_> I think we can drop it from stretch
[10:18:33] (PS17) Elukey: [WIP] eventlogging: add systemd support [puppet] - https://gerrit.wikimedia.org/r/413362
[10:20:14] Operations, Wikimedia-Incident: Detect high server load earlier – prometheus alert? - https://phabricator.wikimedia.org/T188317#4003428 (fgiunchedi) Would be nice indeed, my preference would be for something around latency and/or (number of errors) / (number of successes + number of errors)
[10:20:42] _joe_: yeah it looked like it, I'll merge the patch shortly
[10:21:00] (CR) Alexandros Kosiaris: [C: 2] "This increases by 28% (35->45) the celery worker count per https://puppet-compiler.wmflabs.org/compiler02/10145/ores2001.codfw.wmnet/ . Wh" [puppet] - https://gerrit.wikimedia.org/r/414666 (owner: Awight)
[10:21:05] (PS3) Alexandros Kosiaris: Restore ORES celery worker count; kill defaults [puppet] - https://gerrit.wikimedia.org/r/414666 (owner: Awight)
[10:24:32] (PS18) Elukey: [WIP] eventlogging: add systemd support [puppet] - https://gerrit.wikimedia.org/r/413362
[10:29:10] !log Starting big global rename: Darkweasel94 → Tokfo - with DBA/OPS green light - T187629
[10:29:23] (PS1) Vgutierrez: Provide BGP session state visibility for every ASN/peer This can be leveraged over the instrumentation web server to trigger alarms when no BGP sessions are established [debs/pybal] - https://gerrit.wikimedia.org/r/414973 (https://phabricator.wikimedia.org/T188085)
[10:29:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:24] T187629: Global rename of Darkweasel94 → Tokfo: supervision needed - https://phabricator.wikimedia.org/T187629
[10:33:54] Operations, Pybal, Traffic, Patch-For-Review: Pybal stuck at BGP state OPENSENT while the other peer reached ESTABLISHED - https://phabricator.wikimedia.org/T188085#4005400 (Vgutierrez) [[ https://gerrit.wikimedia.org/r/414973 |Change 414973 ]] exposes the BGP session state over prometheus and ov...
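fgiunchedi's comment on T188317 at 10:20:14 proposes alerting on (number of errors) / (number of successes + number of errors). Below is a minimal Python sketch of that ratio and an alert check. It is a hypothetical illustration, not an existing Prometheus rule. The function names and the 5% default threshold are assumptions:

```python
def error_ratio(errors: int, successes: int) -> float:
    """Fraction of requests that failed: errors / (successes + errors).

    Returns 0.0 when there were no requests at all, so an idle service
    does not trigger an alert (a design assumption, not from T188317).
    """
    total = successes + errors
    if total == 0:
        return 0.0
    return errors / total

def should_alert(errors: int, successes: int, threshold: float = 0.05) -> bool:
    """Hypothetical alert condition: fire when the error ratio exceeds threshold."""
    return error_ratio(errors, successes) > threshold

print(error_ratio(errors=30, successes=970))   # 0.03
print(should_alert(errors=80, successes=920))  # True
```

A ratio keeps the threshold meaningful as traffic volume changes, which a raw error count would not. In Prometheus the two counts would typically come from `rate()` over request counters. That wiring is not shown here.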
[10:37:06] (PS19) Elukey: [WIP] eventlogging: add systemd support [puppet] - https://gerrit.wikimedia.org/r/413362
[10:37:50] (PS2) Filippo Giunchedi: puppetmaster: ruby-activerecord-deprecated-finders not in stretch [puppet] - https://gerrit.wikimedia.org/r/414674 (https://phabricator.wikimedia.org/T184562)
[10:39:06] (CR) Filippo Giunchedi: [C: 2] puppetmaster: ruby-activerecord-deprecated-finders not in stretch [puppet] - https://gerrit.wikimedia.org/r/414674 (https://phabricator.wikimedia.org/T184562) (owner: Filippo Giunchedi)
[10:49:04] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:49:32] !log powercycling scb1003, stuck during reboot
[10:49:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:01] <_joe_> I was about to ask :P
[10:50:45] (PS20) Elukey: [WIP] eventlogging: add systemd support [puppet] - https://gerrit.wikimedia.org/r/413362 (https://phabricator.wikimedia.org/T114199)
[10:51:02] <_joe_> !log uploaded python-conftool 1.0.0 to stretch-wikimedia
[10:51:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:21] <_joe_> !log updating python-conftool everywhere to 1.0.0
[10:51:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:50] !log keeping scb1003 depooled for T188385
[10:58:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:05] T188385: Memory initialization error on scb1003 - https://phabricator.wikimedia.org/T188385
[10:58:53] <_joe_> moritzm: ouch
[10:59:04] uh oh
[10:59:06] <_joe_> akosiaris: ^^ we should be ok now that ores is off those machines
[10:59:14] <_joe_> right?
[10:59:27] <_joe_> moritzm: lemme set it to pooled=inactive
[10:59:29] yes
[10:59:53] strictly speaking we could also repool it, but seems non-ideal
[11:00:02] !log oblivian@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=scb1003.eqiad.wmnet
[11:00:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:19] <_joe_> agreed
[11:00:32] host booted up just fine after pressing F1, but still who knows what might happen
[11:00:35] <_joe_> btw, the command I gave is the now-handy
[11:00:37] <_joe_> sudo -i confctl decommission --hostname scb1003.eqiad.wmnet
[11:00:52] <_joe_> moritzm: ^^ I added the --hostname parameter
[11:00:58] <_joe_> it's being deployed right now
[11:01:13] thanks! I saw the commit, will adapt my scripts later on
[11:04:34] Operations: TransparencyReport-private is not auto deploying - https://phabricator.wikimedia.org/T188224#4005476 (akosiaris) Open>Invalid >>! In T188224#4002915, @APalmer_WMF wrote: > Thanks, everyone! We spoke with @Catrope last week, and he was able to get it working again. Is there any way to dete...
[11:08:28] (CR) Ema: "I'm not sure the check is actually working. I've deployed the monitor on pybal-test2002 and tried overriding (/etc/hosts) the IP address o" (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/413211 (https://phabricator.wikimedia.org/T178151) (owner: Vgutierrez)
[11:08:35] !log rebooting mw1240-mw1258 (app servers) for kernel security update
[11:08:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:15:32] (PS1) Filippo Giunchedi: puppetmaster: document 'offline' worker option [puppet] - https://gerrit.wikimedia.org/r/414978 (https://phabricator.wikimedia.org/T184562)
[11:15:34] (PS1) Filippo Giunchedi: puppetmaster: add rhodium, depooled [puppet] - https://gerrit.wikimedia.org/r/414979 (https://phabricator.wikimedia.org/T184562)
[11:19:05] PROBLEM - Host mw1241 is DOWN: PING CRITICAL - Packet loss = 100%
[11:19:05] PROBLEM - Host mw1245 is DOWN: PING CRITICAL - Packet loss = 100%
[11:19:44] RECOVERY - Host mw1241 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[11:19:44] RECOVERY - Host mw1245 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[11:27:37] (CR) Alexandros Kosiaris: [C: 1] puppetmaster: add rhodium, depooled [puppet] - https://gerrit.wikimedia.org/r/414979 (https://phabricator.wikimedia.org/T184562) (owner: Filippo Giunchedi)
[11:28:42] (CR) Alexandros Kosiaris: [C: 1] puppetmaster: document 'offline' worker option [puppet] - https://gerrit.wikimedia.org/r/414978 (https://phabricator.wikimedia.org/T184562) (owner: Filippo Giunchedi)
[11:39:36] Operations, Puppet, Patch-For-Review, User-fgiunchedi: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4005542 (fgiunchedi) >>! In T184562#4001762, @fgiunchedi wrote: > I mocked some configuration values and installed mariadb on `puppetmaster-fil...
[11:40:56] (PS2) Filippo Giunchedi: puppetmaster: document 'offline' worker option [puppet] - https://gerrit.wikimedia.org/r/414978 (https://phabricator.wikimedia.org/T184562)
[11:41:49] (CR) Filippo Giunchedi: [C: 2] puppetmaster: document 'offline' worker option [puppet] - https://gerrit.wikimedia.org/r/414978 (https://phabricator.wikimedia.org/T184562) (owner: Filippo Giunchedi)
[11:41:58] (PS2) Filippo Giunchedi: puppetmaster: add rhodium, depooled [puppet] - https://gerrit.wikimedia.org/r/414979 (https://phabricator.wikimedia.org/T184562)
[11:42:51] (CR) Filippo Giunchedi: [C: 2] puppetmaster: add rhodium, depooled [puppet] - https://gerrit.wikimedia.org/r/414979 (https://phabricator.wikimedia.org/T184562) (owner: Filippo Giunchedi)
[11:47:52] RECOVERY - Check systemd state on rhodium is OK: OK - running: The system is fully operational
[11:52:44] (PS1) Arturo Borrero Gonzalez: toollabs: include clush hosts group for canary servers [puppet] - https://gerrit.wikimedia.org/r/414983 (https://phabricator.wikimedia.org/T181647)
[11:54:11] (PS2) Arturo Borrero Gonzalez: toollabs: include clush hosts group for canary servers [puppet] - https://gerrit.wikimedia.org/r/414983 (https://phabricator.wikimedia.org/T181647)
[11:54:43] (Abandoned) Arturo Borrero Gonzalez: toollabs: tools-clush-generator: introduce clush group 'one_of_each' [puppet] - https://gerrit.wikimedia.org/r/414657 (https://phabricator.wikimedia.org/T181647) (owner: Arturo Borrero Gonzalez)
[11:54:55] (CR) Arturo Borrero Gonzalez: [C: 2] toollabs: include clush hosts group for canary servers [puppet] - https://gerrit.wikimedia.org/r/414983 (https://phabricator.wikimedia.org/T181647) (owner: Arturo Borrero Gonzalez)
[11:55:59] !log rebooting mw1221-mw1235 (API servers) for kernel security update
[11:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:58:36] (PS3) Vgutierrez: Provide an UDP monitor. [debs/pybal] - https://gerrit.wikimedia.org/r/413211 (https://phabricator.wikimedia.org/T178151)
[11:59:50] (PS1) Arturo Borrero Gonzalez: toollabs: fix path of new canary server list [puppet] - https://gerrit.wikimedia.org/r/414985 (https://phabricator.wikimedia.org/T181647)
[12:00:49] (CR) Arturo Borrero Gonzalez: [C: 2] toollabs: fix path of new canary server list [puppet] - https://gerrit.wikimedia.org/r/414985 (https://phabricator.wikimedia.org/T181647) (owner: Arturo Borrero Gonzalez)
[12:01:56] (CR) Vgutierrez: Provide an UDP monitor. (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/413211 (https://phabricator.wikimedia.org/T178151) (owner: Vgutierrez)
[12:03:02] (CR) Vgutierrez: "@ema you're describing the limitation of this monitor. pybal-test2001 has a default input policy of DROP, so instead of rejecting the inco" [debs/pybal] - https://gerrit.wikimedia.org/r/413211 (https://phabricator.wikimedia.org/T178151) (owner: Vgutierrez)
[12:05:09] Operations, Datasets-General-or-Unknown, User-ArielGlenn: Reboots of dumps/snapshot hosts - https://phabricator.wikimedia.org/T188242#4005597 (ArielGlenn) p:Triage>Normal
[12:11:36] Operations, Ops-Access-Requests: reinstate ezachte's access - https://phabricator.wikimedia.org/T188335#4005620 (MoritzMuehlenhoff)
[12:16:06] !log The global rename: Darkweasel94 → Tokfo has FINISHED - T187629
[12:16:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:19] T187629: Global rename of Darkweasel94 → Tokfo: supervision needed - https://phabricator.wikimedia.org/T187629
[12:26:22] (PS1) Muehlenhoff: Reinstate Erik's key after OS change [puppet] - https://gerrit.wikimedia.org/r/414989
[12:30:29] (CR) Muehlenhoff: [C: 2] Reinstate Erik's key after OS change [puppet] - https://gerrit.wikimedia.org/r/414989 (owner: Muehlenhoff)
[12:35:39] !log Remove /srv/tmp/dbstore1001 files from es1017 to free up space - T186596
[12:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:35:56] T186596: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596
[12:40:08] (PS1) Urbanecm: Fix: Add missed line in wgLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977)
[12:41:03] jouncebot, next
[12:41:03] In 1 hour(s) and 18 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180227T1400)
[12:46:34] (CR) MarcoAurelio: [C: 1] Fix: Add missed line in wgLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[12:58:13] (CR) Elukey: [C: 1] "Are we planning to deploy this soonish? If so I'll wait to merge my change to migrate AQS to the jmx exporter :)" [puppet] - https://gerrit.wikimedia.org/r/402069 (https://phabricator.wikimedia.org/T181728) (owner: Filippo Giunchedi)
[13:00:36] !log inserting wikidata-related interwikis to site_identifiers table using eval.php in enwiki (T183019)
[13:00:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:51] T183019: Wikibase must not insert local recentchanges entries for nonexistent local users (days: 5) - https://phabricator.wikimedia.org/T183019
[13:01:56] Operations, monitoring, netops, User-Elukey: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#4005760 (elukey) Open>stalled >>! In T181036#3979339, @Nuria wrote: > Are we planing to use tranquility to move the he data into druid...
[13:11:00] (Abandoned) Filippo Giunchedi: WIP ruby-mysql2 [puppet] - https://gerrit.wikimedia.org/r/414675 (https://phabricator.wikimedia.org/T184562) (owner: Filippo Giunchedi)
[13:22:37] !log upload ruby-mysql 2.9.1-1~bpo9+1 to stretch-wikimedia - T184562
[13:22:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:53] T184562: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562
[13:25:17] !log rebooting thumbor in codfw for kernel security update
[13:25:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:33:44] !log starting rolling restart of elasticsearch / cirrus codfw (config changes + kernel upgrade)
[13:33:54] PROBLEM - DPKG on restbase-dev1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:33:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:15] ^that's me
[13:37:13] Operations, Traffic, netops, Patch-For-Review: Re-setup lvs1007-lvs1012, replace lvs1001-lvs1006 - https://phabricator.wikimedia.org/T150256#4005869 (BBlack)
[13:37:18] Operations, ops-eqiad, Traffic, netops, Patch-For-Review: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#4005867 (BBlack) Open>declined Gave up on these machines!
[13:37:21] Operations, Traffic, netops, Patch-For-Review: Re-setup lvs1007-lvs1012, replace lvs1001-lvs1006 - https://phabricator.wikimedia.org/T150256#2779434 (BBlack) Open>declined Gave up on these machines!
[13:39:22] Operations, ops-eqiad, Traffic, Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4005877 (BBlack)
[13:41:27] (CR) Rush: [C: 2] openstack: keystone running on mitaka setup [puppet] - https://gerrit.wikimedia.org/r/414847 (https://phabricator.wikimedia.org/T188266) (owner: Rush)
[13:41:33] (PS3) Rush: openstack: keystone running on mitaka setup [puppet] - https://gerrit.wikimedia.org/r/414847 (https://phabricator.wikimedia.org/T188266)
[13:42:22] (PS4) Rush: openstack: keystone running on mitaka setup [puppet] - https://gerrit.wikimedia.org/r/414847 (https://phabricator.wikimedia.org/T188266)
[13:42:51] Operations, Traffic: Fix lvs1001-6 storage - https://phabricator.wikimedia.org/T136737#4005883 (BBlack) Open>Resolved a:BBlack
[13:42:53] Operations, Patch-For-Review: Audit/fix hosts with no RAID configured - https://phabricator.wikimedia.org/T136562#4005885 (BBlack)
[13:45:50] (Abandoned) Paladox: puppetmaster: Use ruby-mysql2 over ruby-mysql and migrate servermon to it [puppet] - https://gerrit.wikimedia.org/r/391336 (https://phabricator.wikimedia.org/T184562) (owner: Paladox)
[13:52:47] (PS1) Giuseppe Lavagetto: puppetmaster::frontend: re-add testing vhost [puppet] - https://gerrit.wikimedia.org/r/415005
[13:53:13] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/414755 (https://phabricator.wikimedia.org/T188292) (owner: Urbanecm)
[13:53:26] (CR) jerkins-bot: [V: -1] puppetmaster::frontend: re-add testing vhost [puppet] - https://gerrit.wikimedia.org/r/415005 (owner: Giuseppe Lavagetto)
[13:54:33] (PS2) Zfilipin: New throttle rule for cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/413345 (https://phabricator.wikimedia.org/T187990) (owner: Urbanecm)
[13:54:43] (Merged) jenkins-bot: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/414755 (https://phabricator.wikimedia.org/T188292) (owner: Urbanecm)
[13:55:05] jouncebot: now
[13:55:05] No deployments scheduled for the next 0 hour(s) and 4 minute(s)
[13:55:18] are you doing swat?
[13:55:22] jouncebot: next
[13:55:22] In 0 hour(s) and 4 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180227T1400)
[13:56:28] (PS1) Gilles: Upgrade to 1.15 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/415006 (https://phabricator.wikimedia.org/T187822)
[13:56:32] Hauskatze: I am doing SWAT, if nobody else insists :)
[13:56:44] (PS3) Zfilipin: New throttle rule for cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/413345 (https://phabricator.wikimedia.org/T187990) (owner: Urbanecm)
[13:57:55] Operations, Cloud-VPS: package prometheus-rabbitmq-exporter for Debian jessie - https://phabricator.wikimedia.org/T188392#4005905 (chasemp)
[13:58:02] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/413345 (https://phabricator.wikimedia.org/T187990) (owner: Urbanecm)
[13:58:05] Operations, Cloud-VPS: package prometheus-rabbitmq-exporter for Debian jessie - https://phabricator.wikimedia.org/T188392#4005917 (chasemp) p:Triage>Normal
[13:58:11] (CR) jenkins-bot: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/414755 (https://phabricator.wikimedia.org/T188292) (owner: Urbanecm)
[13:58:14] Operations, Cloud-VPS: package prometheus-rabbitmq-exporter for Debian jessie - https://phabricator.wikimedia.org/T188392#4005905 (chasemp)
[13:58:38] (CR) Zfilipin: "@Urbanecm: "The change could not be rebased due to a conflict during merge."" [mediawiki-config] - https://gerrit.wikimedia.org/r/413461 (https://phabricator.wikimedia.org/T188034) (owner: Urbanecm)
[13:59:20] (Merged) jenkins-bot: New throttle rule for cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/413345 (https://phabricator.wikimedia.org/T187990) (owner: Urbanecm)
[13:59:26] Operations, ops-eqiad, Traffic, Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4005921 (BBlack) The hard part here is mapping out the necessary network ports correctly: * Each of the 4 servers is physically located in a different row (I'm assuming fo...
[13:59:37] (PS1) Rush: openstack: admin_scripts for mitaka and prometheus package logic [puppet] - https://gerrit.wikimedia.org/r/415007 (https://phabricator.wikimedia.org/T188266)
[14:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: My dear minions, it's time we take the moon! Just kidding. Time for European Mid-day SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180227T1400).
[14:00:04] Urbanecm, raynor, gilles, and Hauskatze: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:12] o/
[14:00:15] o/
[14:00:16] I can SWAT today
[14:00:22] o/
[14:00:28] (PS1) Ema: cache_text: upgrade esams to varnish 5 [puppet] - https://gerrit.wikimedia.org/r/415008 (https://phabricator.wikimedia.org/T184448)
[14:00:40] hello zeljkof :)
[14:00:43] present
[14:00:48] and if you break the wikis and do not fix them what do they award to you?
[14:01:09] Hauskatze, of course. If you accept loosing of deploy rights as an award... :D
[14:01:10] Urbanecm, raynor, gilles, and Hauskatze: I have probably asked you many times already, but I will ask one more time, do you want to deploy your own commits, if you can? :)
[14:01:21] I'll deploy mine
[14:01:29] I can go last
[14:01:29] zeljkof, I have no deploy privs :(
[14:01:30] zeljkof: I do not have deploy rights
[14:01:31] (PS2) Rush: openstack: admin_scripts for mitaka and prometheus package logic [puppet] - https://gerrit.wikimedia.org/r/415007 (https://phabricator.wikimedia.org/T188266)
[14:01:53] gilles: great, I will let you know when I finish
[14:02:40] Urbanecm: you are first, I have already merged the first two throttle commits, but the third one has a conflict (413461)
[14:02:41] (CR) jenkins-bot: New throttle rule for cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/413345 (https://phabricator.wikimedia.org/T187990) (owner: Urbanecm)
[14:02:49] zeljkof, will fix it
[14:02:59] zeljkof, my patch will take a while to test ~15min
[14:03:07] Urbanecm: thanks, I'll deploy them together
[14:03:49] raynor_: ok, then you are third (after Urbanecm and Hauskatze)
[14:03:49] (CR) Rush: [C: 1] "Talked a bit with filippo about what is already covered and labstore100[1345].eqiad.wmnet are so this is more-or-less *.wikimedia.org host" [puppet] - https://gerrit.wikimedia.org/r/412860 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[14:04:14] (PS3) Urbanecm: New throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/413461 (https://phabricator.wikimedia.org/T188034)
[14:04:20] zeljkof, fixed
[14:04:21] kk
[14:04:41] (CR) Rush: [C: 2] openstack: admin_scripts for mitaka and prometheus package logic [puppet] - https://gerrit.wikimedia.org/r/415007 (https://phabricator.wikimedia.org/T188266) (owner: Rush)
[14:04:48] Urbanecm: reviewing
[14:05:02] ack
[14:05:12] brb
[14:05:15] (CR) jerkins-bot: [V: -1] New throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/413461 (https://phabricator.wikimedia.org/T188034) (owner: Urbanecm)
[14:05:22] Well...
[14:05:23] Urbanecm: you made a mistake, see https://gerrit.wikimedia.org/r/#/c/413461/3/wmf-config/throttle.php
[14:05:28] line 77
[14:05:53] !log Update tendril shard table for the "tendril" replication topology - T184704
[14:05:56] (PS4) Urbanecm: New throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/413461 (https://phabricator.wikimedia.org/T188034)
[14:05:57] Attempt #2
[14:06:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:07] T184704: Setup tendril database monitoring on 2 new hosts, one on eqiad and one on codfw - https://phabricator.wikimedia.org/T184704
[14:08:14] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/413461 (https://phabricator.wikimedia.org/T188034) (owner: Urbanecm)
[14:08:23] (CR) Filippo Giunchedi: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/402069 (https://phabricator.wikimedia.org/T181728) (owner: Filippo Giunchedi)
[14:08:46] I'm back
[14:09:44] (Merged) jenkins-bot: New throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/413461 (https://phabricator.wikimedia.org/T188034) (owner: Urbanecm)
[14:10:13] (PS2) Elukey: cassandra: use prometheus-jmx-exporter Debian package [puppet] - https://gerrit.wikimedia.org/r/402069 (https://phabricator.wikimedia.org/T181728) (owner: Filippo Giunchedi)
[14:10:13] Hauskatze: does your commit take a long time to test?
[14:10:32] mw core patches take some minutes zeljkof as you can see
[14:10:45] Hauskatze: to test, not to merge
[14:10:48] (PS2) Giuseppe Lavagetto: puppetmaster::frontend: re-add testing vhost [puppet] - https://gerrit.wikimedia.org/r/415005
[14:10:55] + mighty jenkinks can decide not to work
[14:10:56] ah
[14:11:04] zeljkof: my patch cannot be tested
[14:11:12] not without performing a rename
[14:11:22] (CR) BBlack: [C: 1] "Seems reasonable to me, but @ema should stare at this first too I think" [puppet] - https://gerrit.wikimedia.org/r/404158 (https://phabricator.wikimedia.org/T69015) (owner: Mholloway)
[14:11:38] Hauskatze: can you perform it?
[14:12:09] zeljkof: sure, but if there are no queued requests I won't be able
[14:12:24] feel free not to deploy if you're not confortable
[14:12:26] (CR) jenkins-bot: New throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/413461 (https://phabricator.wikimedia.org/T188034) (owner: Urbanecm)
[14:12:32] it'll arrive on today's train nonetheless
[14:12:40] (CR) Elukey: [C: 2] cassandra: use prometheus-jmx-exporter Debian package [puppet] - https://gerrit.wikimedia.org/r/402069 (https://phabricator.wikimedia.org/T181728) (owner: Filippo Giunchedi)
[14:12:46] though I'd like to fix that failed global rename sooner
[14:12:59] instead of waiting for the train which can or cannot go
[14:13:05] Hauskatze: if it will be deployed during train today, and if it's not urgent, I would prefer not to deploy during swat
[14:13:15] as you wish
[14:13:23] if it't urgent, I can deploy
[14:15:07] !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:414755|Add new throttle rule (T188292)]] [[gerrit:413345|New throttle rule for cswiki (T187990)]] [[gerrit:413461|New throttle rule (T188034)]] (duration: 00m 57s)
[14:15:19] Urbanecm: throttle rules deployed
[14:15:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:25] T188034: Please remove account creation limit for UNESCO editathon on 7th, 8th and 9th of March 2018 - https://phabricator.wikimedia.org/T188034
[14:15:25] T187990: Lift IP account limit on 2018-02-28 - https://phabricator.wikimedia.org/T187990
[14:15:26] T188292: Lift IP cap on en.wiki for account creation for MoMA NYC - Saturday March 3 - https://phabricator.wikimedia.org/T188292
[14:16:02] (PS2) BBlack: Add net_driver fact [puppet] - https://gerrit.wikimedia.org/r/414739
[14:16:04] (PS6) BBlack: rps: change IRQs without reboot on bnx2x [puppet] - https://gerrit.wikimedia.org/r/414676
[14:16:06] (PS2) BBlack: lvs - use new fact to determine bnx2x [puppet] - https://gerrit.wikimedia.org/r/414740
[14:16:08] (PS1) Marostegui: Revert "db-codfw.php: Depool db2049" [mediawiki-config] - https://gerrit.wikimedia.org/r/415009
[14:16:10] (PS2) Marostegui: Revert "db-codfw.php: Depool db2049" [mediawiki-config] - https://gerrit.wikimedia.org/r/415009
[14:16:50] zeljkof, ack
[14:17:27] (CR) BBlack: [C: 2] Add net_driver fact [puppet] - https://gerrit.wikimedia.org/r/414739 (owner: BBlack)
[14:17:47] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:18:00] Urbanecm: merging 414993 ^
[14:18:09] ack
[14:19:14] (PS3) Filippo Giunchedi: hieradata: enable SMART for lab(test) [puppet] - https://gerrit.wikimedia.org/r/412860 (https://phabricator.wikimedia.org/T86552)
[14:19:30] Urbanecm: 414993 can be tested at mwdebug1002, right? (it will be there in a few minutes)
[14:19:35] (PS1) ArielGlenn: make sure prefetch stubs include metadata for the last page wanted [dumps] - https://gerrit.wikimedia.org/r/415011 (https://phabricator.wikimedia.org/T188388)
[14:19:52] zeljkof, yes, of course
[14:20:58] (CR) Filippo Giunchedi: [C: 2] hieradata: enable SMART for lab(test) [puppet] - https://gerrit.wikimedia.org/r/412860 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[14:21:31] (PS1) Cmjohnson: Adding mgmt dns entries analytics1070-77 [dns] - https://gerrit.wikimedia.org/r/415012 (https://phabricator.wikimedia.org/T188294)
[14:22:09] (CR) Cmjohnson: [C: 2] Adding mgmt dns entries analytics1070-77 [dns] - https://gerrit.wikimedia.org/r/415012 (https://phabricator.wikimedia.org/T188294) (owner: Cmjohnson)
[14:22:13] (CR) ArielGlenn: [C: 2] make sure prefetch stubs include metadata for the last page wanted [dumps] - https://gerrit.wikimedia.org/r/415011 (https://phabricator.wikimedia.org/T188388) (owner: ArielGlenn)
[14:22:16] I abandoned the patch zeljkof
[14:23:29] Hauskatze: ok, thanks for letting me know, we can deploy it some other time if there are problems with train
[14:23:59] !log ariel@tin Started deploy [dumps/dumps@9b7841f]: fix off-by-one error in prefetch stubs generation
[14:24:02] !log ariel@tin Finished deploy [dumps/dumps@9b7841f]: fix off-by-one error in prefetch stubs generation (duration: 00m 04s)
[14:24:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:29] zeljkof, is the patch at mwdebug now?
[14:24:41] Urbanecm: still waiting for CI :(
[14:24:49] zeljkof, oh, ok
[14:24:52] https://integration.wikimedia.org/ci/job/operations-mw-config-composer-test-docker/596/console
[14:24:58] Please ping me as soon as there will be anything to do for me ;)
[14:25:05] looks like there is some progress, if was stuck for 6 minutes
[14:25:20] ok
[14:26:32] (CR) jerkins-bot: [V: -1] Fix: Add missed line in wgLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:27:12] (PS1) Elukey: cassandra: flip jmx exporter's jar reference [puppet] - https://gerrit.wikimedia.org/r/415013 (https://phabricator.wikimedia.org/T181728)
[14:27:19] !log silence labvirt1019/1020 in icinga
[14:27:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:44] hashar: do you know what went wrong here? https://integration.wikimedia.org/ci/job/operations-mw-config-composer-test-docker/596/console
[14:29:52] (CR) Zfilipin: Fix: Add missed line in wgLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:29:55] (CR) Elukey: "pcc: https://puppet-compiler.wmflabs.org/compiler02/10148/" [puppet] - https://gerrit.wikimedia.org/r/415013 (https://phabricator.wikimedia.org/T181728) (owner: Elukey)
[14:30:04] (CR) MarcoAurelio: [C: 1] "It's a timeout error. I suggest to try to re+2" [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:30:06] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:30:36] Urbanecm: merge failed, trying again https://integration.wikimedia.org/ci/job/operations-mw-config-composer-test-docker/596/console
[14:31:10] !log puppet disable on RPS-using hosts to be careful with RPS hosts https://gerrit.wikimedia.org/r/#/c/414676/ - cp*, lvs*, labstore
[14:31:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:46] (PS7) BBlack: rps: change IRQs without reboot on bnx2x [puppet] - https://gerrit.wikimedia.org/r/414676
[14:34:11] (CR) Filippo Giunchedi: [C: 1] cassandra: flip jmx exporter's jar reference [puppet] - https://gerrit.wikimedia.org/r/415013 (https://phabricator.wikimedia.org/T181728) (owner: Elukey)
[14:35:14] Urbanecm: I'm not sure what is going on, but to me it looks like CI is having problems :( https://integration.wikimedia.org/ci/job/operations-mw-config-composer-test-docker/597/console
[14:35:19] (CR) Ema: [C: -1] "So the patch does work as advertised VCL-wise (and the VTC tests are green). -1 though because of Krinkle's comment." [puppet] - https://gerrit.wikimedia.org/r/404158 (https://phabricator.wikimedia.org/T69015) (owner: Mholloway)
[14:35:32] (CR) jerkins-bot: [V: -1] Fix: Add missed line in wgLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:35:35] until hashar is around, there isn't much I can do
[14:35:55] (CR) Zfilipin: Fix: Add missed line in wgLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:36:03] (CR) Zfilipin: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:36:53] Urbanecm, raynor_, gilles: something is wrong with CI :( https://gerrit.wikimedia.org/r/#/c/414993/
[14:36:56] (CR) Hashar: [C: 1] Add Apache 2.0 license. [puppet/zookeeper] - https://gerrit.wikimedia.org/r/414851 (owner: Nfontes)
[14:37:08] I am not sure we will be able to continue with SWAT
[14:37:25] I'm trying something
[14:38:30] zeljkof: what is happening?
[14:38:39] hashar: take a look a this https://gerrit.wikimedia.org/r/#/c/414993/
[14:38:50] CI works fine until I +2
[14:38:56] then a job fails
[14:39:04] bah
[14:39:05] :/
[14:39:25] hashar: operations-mw-config-composer-test-docker https://integration.wikimedia.org/ci/job/operations-mw-config-composer-test-docker/597/console
[14:39:46] 00:05:14.535 File ./wmf-config/PoolCounterSettings.php has empty skip status. Please contact PHP Parallel Lint author.
[14:39:46] 00:05:14.655 Script parallel-lint --exclude multiversion/vendor --ignore-fails . handling the lint event returned with an error
[14:39:49] (CR) BBlack: [C: 2] rps: change IRQs without reboot on bnx2x [puppet] - https://gerrit.wikimedia.org/r/414676 (owner: BBlack)
[14:40:01] zeljkof, our patches are not critical, we can do those tomorrow
[14:40:44] zeljkof, can you fix it?
[14:41:04] Urbanecm: I can not but hopefully hashar can :)
[14:41:29] (PS2) Gehel: wdqs: icinga check for categories updates [puppet] - https://gerrit.wikimedia.org/r/415010 (https://phabricator.wikimedia.org/T188293)
[14:41:34] hashar: should we stop SWAT or is it something that can be fixed quickly?
[14:42:13] Operations, ops-eqiad, Cloud-VPS, cloud-services-team: Rack/cable/configure asw2-b-eqiad switch stack - https://phabricator.wikimedia.org/T183585#4006110 (Cmjohnson) @ayounsi I added 2 new servers to the switch stack in row B (A and C as well) analytics1072 B2 2/0/0 (populated the new swithc wi...
[14:42:21] (CR) Ottomata: [V: 2 C: 2] Add Apache 2.0 license. [puppet/zookeeper] - https://gerrit.wikimedia.org/r/414851 (owner: Nfontes)
[14:43:13] zeljkof, what about the throttles? Are they done?
[14:43:21] The last patch can be IMHO sent to tomorrow
[14:43:28] Urbanecm: throttles are all done
[14:43:33] Great
[14:43:38] I have absolute no clue what the error is :(
[14:44:12] hashar: ok, stopping with swat then?
[14:44:36] !log rebooting thumbor in eqiad for kernel security update
[14:44:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:57] (PS2) Hashar: Fix: Add missed line in wgLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:45:05] rsync: failed to set times on "/cache/.": Operation not permitted (1)
[14:45:10] This shouldn't happen IMHO...
[14:45:13] one sure thing the failure is unrelated to the patch
[14:45:19] zeljkof - I moved my patches for tomorrow
[14:45:19] Urbanecm: that is not an issue
[14:45:36] we can wait
[14:45:47] hashar, as throttles are done, I have no problem with waiting :)
[14:46:23] I rebased https://gerrit.wikimedia.org/r/#/c/414993/ , I guess zeljkof you can +2 it again
[14:46:34] as for why parallel-lint failed, really I don't know
[14:47:15] Urbanecm: hashar: +2ing the patch again
[14:47:31] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:47:36] (I've already marked remaining patches as "skipped" and MarcoAurelio's patch as "declined", maybe too soon :D)
[14:47:43] (PS3) Gehel: wdqs: icinga check for categories updates [puppet] - https://gerrit.wikimedia.org/r/415010 (https://phabricator.wikimedia.org/T188293)
[14:47:55] But as gehel's patch requires 15 mins+ for testing, it definitely won't be processed
[14:48:14] (gilles's patch, sorry gehel)
[14:48:56] * gehel was suddenly wondering if he forgot about sending a patch to swat...
[14:48:59] (CR) Filippo Giunchedi: [C: 1] puppetmaster::frontend: re-add testing vhost [puppet] - https://gerrit.wikimedia.org/r/415005 (owner: Giuseppe Lavagetto)
[14:49:01] (Merged) jenkins-bot: Fix: Add missed line in wgLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:49:21] Urbanecm, hashar: rebase did the trick! :) https://gerrit.wikimedia.org/r/#/c/414993/
[14:49:21] (CR) jenkins-bot: Fix: Add missed line in wgLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/414993 (https://phabricator.wikimedia.org/T185977) (owner: Urbanecm)
[14:49:29] zeljkof: maybe it was a one off IO error of some sort
[14:49:42] hashar: happened twice in a row :/
[14:50:18] (PS3) Filippo Giunchedi: puppetmaster::frontend: re-add testing vhost [puppet] - https://gerrit.wikimedia.org/r/415005 (owner: Giuseppe Lavagetto)
[14:50:28] Great!
[14:50:47] Urbanecm: the patch is at mwdebug1002
[14:50:53] zeljkof, will have a look
[14:51:37] zeljkof, please deploy to the whole gallaxy
[14:51:51] Urbanecm: sending it to the space
[14:52:00] Thanks
[14:52:51] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:414993|Fix: Add missed line in wgLogo (T185977)]] (duration: 00m 56s)
[14:53:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:05] T185977: Update logo for Urdu Wikibooks - https://phabricator.wikimedia.org/T185977
[14:53:10] Urbanecm: deployed, please check and thanks for deploying with #releng ;)
[14:53:24] zeljkof, thank you!
[14:53:28] (PS4) Gehel: wdqs: icinga check for categories updates [puppet] - https://gerrit.wikimedia.org/r/415010 (https://phabricator.wikimedia.org/T188293)
[14:53:45] gilles: looks like CI is back, there are 7 minutes left if you want to deploy your patch
[14:53:54] yay
[14:54:15] gilles: swat is yours, please remember to close the window with !log EU SWAT finished :)
[14:54:41] (CR) Gilles: [C: 2] Set up separate Thumbor Swift user for private containers [mediawiki-config] - https://gerrit.wikimedia.org/r/414965 (https://phabricator.wikimedia.org/T187822) (owner: Gilles)
[14:56:10] (Merged) jenkins-bot: Set up separate Thumbor Swift user for private containers [mediawiki-config] - https://gerrit.wikimedia.org/r/414965 (https://phabricator.wikimedia.org/T187822) (owner: Gilles)
[14:56:19] Operations, Analytics, User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4006202 (Ottomata) All for jmxtrans, varnishkafka, and kafkatee. Might want to keep zookeeper as a submodule. Just because there aren't many contributions...
[14:57:32] PROBLEM - puppet last run on lvs1007 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[ethtool_rss_combined_channels_eth0],Exec[ethtool_rss_combined_channels_eth1]
[14:58:09] Operations, ops-eqiad, Analytics-Cluster, Analytics-Kanban, and 2 others: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4006219 (Ottomata) > We still haven't tested Hadoop packages on stretch We kinda have, just not services. stat1005 is a Stretch Hadoop clien...
[14:59:07] (CR) jenkins-bot: Set up separate Thumbor Swift user for private containers [mediawiki-config] - https://gerrit.wikimedia.org/r/414965 (https://phabricator.wikimedia.org/T187822) (owner: Gilles)
[15:00:02] !log gilles@tin Synchronized wmf-config/filebackend.php: Thumbor private wiki support deployment: [[gerrit:414965| (T187822)]] (duration: 00m 56s)
[15:00:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:17] T187822: Have Thumbor use a different Swift user when dealing with private containers - https://phabricator.wikimedia.org/T187822
[15:01:46] Operations, Analytics, User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4006234 (elukey) >>! In T188377#4006202, @Ottomata wrote: > All for jmxtrans, varnishkafka, and kafkatee. Might want to keep zookeeper as a submodule. Jus...
[15:02:18] !log gilles@tin Synchronized private/PrivateSettings.php.example: Thumbor private wiki support deployment: [[gerrit:414965| Set up separate Thumbor Swit user for private containers (T187822)]] (duration: 00m 55s)
[15:02:30] !log EU SWAT finished
[15:02:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:03] (PS1) Ottomata: Install libhdfs0 on all hadoop nodes [puppet/cdh] - https://gerrit.wikimedia.org/r/415015
[15:03:17] https://media.giphy.com/media/5tLyokkZ02wWA/giphy.gif
[15:04:08] Operations, Analytics, User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4006258 (Ottomata) stars: https://github.com/wikimedia/puppet-zookeeper/stargazers watchers: https://github.com/wikimedia/puppet-zookeeper/watchers forks: h...
[15:09:22] (PS1) Ottomata: Migrate webrequest upload varnishkafka to Kafka jumbo [puppet] - https://gerrit.wikimedia.org/r/415016 (https://phabricator.wikimedia.org/T185136)
[15:11:05] !log upgrade cache_text@esams to varnish 5 T184448
[15:11:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:21] T184448: Upgrade cache_text to Varnish 5 - https://phabricator.wikimedia.org/T184448
[15:12:18] (PS2) Ema: cache_text: upgrade esams to varnish 5 [puppet] - https://gerrit.wikimedia.org/r/415008 (https://phabricator.wikimedia.org/T184448)
[15:13:00] (CR) Ema: [C: 2] cache_text: upgrade esams to varnish 5 [puppet] - https://gerrit.wikimedia.org/r/415008 (https://phabricator.wikimedia.org/T184448) (owner: Ema)
[15:13:30] Operations, Datasets-General-or-Unknown, User-ArielGlenn: Reboots of dumps/snapshot hosts - https://phabricator.wikimedia.org/T188242#4006314 (MoritzMuehlenhoff)
[15:13:57] (PS5) Gehel: wdqs: icinga check for categories updates [puppet] - https://gerrit.wikimedia.org/r/415010 (https://phabricator.wikimedia.org/T188293)
[15:14:57] (PS1) Cmjohnson: Adding production DNS analytics1070-77 [dns] - https://gerrit.wikimedia.org/r/415017 (https://phabricator.wikimedia.org/T188294)
[15:17:33] (CR) Mholloway: "> This would break the m.wikipedia.org and zero.wikipedia.org entry" [puppet] - https://gerrit.wikimedia.org/r/404158 (https://phabricator.wikimedia.org/T69015) (owner: Mholloway)
[15:18:01] (CR) Cmjohnson: [C: 2] Adding production DNS analytics1070-77 [dns] - https://gerrit.wikimedia.org/r/415017 (https://phabricator.wikimedia.org/T188294) (owner: Cmjohnson)
[15:18:09] (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2049" [mediawiki-config] - https://gerrit.wikimedia.org/r/415009 (owner: Marostegui)
[15:19:22] !log beginning migration of varnishkafka webrequest upload from Kafka analytics to kafka jumbo
[15:19:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:19:39] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2049" [mediawiki-config] - https://gerrit.wikimedia.org/r/415009 (owner: Marostegui)
[15:19:54] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2049" [mediawiki-config] - https://gerrit.wikimedia.org/r/415009 (owner: Marostegui)
[15:20:29] !log powercycling thumbor1004, stuck during reboot
[15:20:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:56] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2049 - T187534 (duration: 00m 57s)
[15:21:05] (PS2) Ottomata: Migrate webrequest upload varnishkafka to Kafka jumbo [puppet] - https://gerrit.wikimedia.org/r/415016 (https://phabricator.wikimedia.org/T185136)
[15:21:07] (CR) Filippo Giunchedi: [C: 2] puppetmaster::frontend: re-add testing vhost [puppet] - https://gerrit.wikimedia.org/r/415005 (owner: Giuseppe Lavagetto)
[15:21:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:10] T187534: db2049 management unable to login via ssh - https://phabricator.wikimedia.org/T187534
[15:21:13] (PS4) Filippo Giunchedi: puppetmaster::frontend: re-add testing vhost [puppet] - https://gerrit.wikimedia.org/r/415005 (owner: Giuseppe Lavagetto)
[15:22:49] (CR) Ottomata: [C: 2] Migrate webrequest upload varnishkafka to Kafka jumbo [puppet] - https://gerrit.wikimedia.org/r/415016 (https://phabricator.wikimedia.org/T185136) (owner: Ottomata)
[15:23:52] (PS3) Ppchelko: [JobQueue] Switch refreshLinks for all but wikipedia and wiktionary. [mediawiki-config] - https://gerrit.wikimedia.org/r/414760 (https://phabricator.wikimedia.org/T185052)
[15:24:17] (PS5) Filippo Giunchedi: puppetmaster::frontend: re-add testing vhost [puppet] - https://gerrit.wikimedia.org/r/415005 (owner: Giuseppe Lavagetto)
[15:28:31] (PS1) Marostegui: db1081.yaml: Change binlog to statement based [puppet] - https://gerrit.wikimedia.org/r/415018 (https://phabricator.wikimedia.org/T186321)
[15:28:44] (PS1) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[15:28:48] (CR) Mholloway: "I just found I0c2d0e2d3af261c6606bdf3ce64a6c1103b9bace, though, which looks like it would make mobilelanding.php only reachable by m.wikip" [puppet] - https://gerrit.wikimedia.org/r/404158 (https://phabricator.wikimedia.org/T69015) (owner: Mholloway)
[15:29:10] (CR) jerkins-bot: [V: -1] labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[15:29:46] (CR) Marostegui: [C: 2] db1081.yaml: Change binlog to statement based [puppet] - https://gerrit.wikimedia.org/r/415018 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[15:31:43] (PS1) Marostegui: db-eqiad.php: Depool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415020 (https://phabricator.wikimedia.org/T186321)
[15:33:00] (PS2) Marostegui: db-eqiad.php: Depool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415020 (https://phabricator.wikimedia.org/T186321)
[15:33:31] PROBLEM - puppet last run on boron is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:33:33] (CR) Ottomata: [C: 2] Install libhdfs0 on all hadoop nodes [puppet/cdh] - https://gerrit.wikimedia.org/r/415015 (owner: Ottomata)
[15:33:52] !log installing squid security updates
[15:34:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:18] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415020 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[15:35:31] (PS2) Ottomata: Update cdh mordule to install libhdfs0 on hadoop nodes [puppet] - https://gerrit.wikimedia.org/r/411464 (owner: EBernhardson)
[15:35:59] (Merged) jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415020 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[15:36:04] (CR) Ottomata: [V: 2 C: 2] "Did this in cdh module:" [puppet] - https://gerrit.wikimedia.org/r/411464 (owner: EBernhardson)
[15:37:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1081 - T186321 (duration: 00m 55s)
[15:37:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:27] T186321: Prepare and indicate proper master db failover candidates for all eqiad database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321
[15:37:37] !log Stop MySQL and reboot db1081 for kernel ugprade, mariadb upgrade and binlog format change - T186321
[15:37:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:18] (CR) jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415020 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[15:38:27] (PS1) Marostegui: db-eqiad.php: Slowly repool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415023 (https://phabricator.wikimedia.org/T186321)
[15:39:37] (PS1) Jcrespo: mariadb: Set up es2001 as the temporary backup target [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696)
[15:40:22] (CR) jerkins-bot: [V: -1] mariadb: Set up es2001 as the temporary backup target [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696) (owner: Jcrespo)
[15:42:44] (PS2) Jcrespo: mariadb: Set up es2001 as the temporary backup target [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696)
[15:42:58] (PS2) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[15:43:29] (CR) jerkins-bot: [V: -1] labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[15:43:37] (Abandoned) Jcrespo: [WIP]Orchestrate the source of the database backups per datacenter [puppet] - https://gerrit.wikimedia.org/r/410180 (https://phabricator.wikimedia.org/T184696) (owner: Jcrespo)
[15:45:06] (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415023 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[15:45:29] (CR) Jcrespo: [C: -1] "The most important part is missing- the script!" [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696) (owner: Jcrespo)
[15:46:34] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415023 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[15:47:54] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 - T186321 (duration: 00m 56s)
[15:48:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:08] T186321: Prepare and indicate proper master db failover candidates for all eqiad database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321
[15:48:18] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415023 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[15:48:34] (CR) Muehlenhoff: labweb: include mediawiki profiles (1 comment) [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[15:49:10] !log Restarting ORES celery workers, changing from 35 -> 45 workers per node.
[15:49:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:49:52] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415026
[15:50:04] (CR) Eevans: [C: 1] cassandra: flip jmx exporter's jar reference [puppet] - https://gerrit.wikimedia.org/r/415013 (https://phabricator.wikimedia.org/T181728) (owner: Elukey)
[15:53:31] RECOVERY - puppet last run on boron is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[15:55:08] boron puppet failures is me btw
[15:55:47] (PS3) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[15:56:15] (CR) jerkins-bot: [V: -1] labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[15:58:37] (PS2) Elukey: cassandra: flip jmx exporter's jar reference [puppet] - https://gerrit.wikimedia.org/r/415013 (https://phabricator.wikimedia.org/T181728)
[15:59:57] (CR) Elukey: [C: 2] cassandra: flip jmx exporter's jar reference [puppet] - https://gerrit.wikimedia.org/r/415013 (https://phabricator.wikimedia.org/T181728) (owner: Elukey)
[16:03:14] (CR) jerkins-bot: [V: -1] db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415026 (owner: Marostegui)
[16:03:16] (CR) Dzahn: [C: 2] "it's not possible to apply this on krypton without breaking labmon because labmon mixes prometheus with grafana classes on one node" [puppet] - https://gerrit.wikimedia.org/r/406970 (owner: Dzahn)
[16:03:48] (CR) Marostegui: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/415026 (owner: Marostegui)
[16:04:22] PROBLEM - Check whether ferm is active by checking the default input chain on db2049 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[16:04:31] PROBLEM - Check systemd state on db2049 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[16:05:31] PROBLEM - puppet last run on boron is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:05:47] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415026 (owner: Marostegui)
[16:06:55] !log rebooting restbase-dev for kernel security update
[16:07:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:20] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415026 (owner: Marostegui)
[16:08:18] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415026 (owner: Marostegui)
[16:08:30] (PS2) Elukey: role::aqs: enable Cassandra JMX exporter [puppet] - https://gerrit.wikimedia.org/r/413405 (https://phabricator.wikimedia.org/T184795)
[16:08:30] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1081 (duration: 00m 56s)
[16:08:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:01] !log restarting jenkins for plugin update
[16:10:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:21] (PS1) Dzahn: prometheus: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/415029
[16:11:03] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415030
[16:12:19] (CR) Elukey: [C: 2] role::aqs: enable Cassandra JMX exporter [puppet] - https://gerrit.wikimedia.org/r/413405 (https://phabricator.wikimedia.org/T184795) (owner: Elukey)
[16:15:07] (CR) jerkins-bot: [V: -1] prometheus: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/415029 (owner: Dzahn)
[16:17:01] Operations, ops-eqiad, DC-Ops, hardware-requests, and 2 others: Decommission restbase-test environment - https://phabricator.wikimedia.org/T186755#4006462 (MoritzMuehlenhoff)
[16:17:47] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415030 (owner: Marostegui)
[16:17:54] (PS1) Awight: Double ORES worker count [puppet] - https://gerrit.wikimedia.org/r/415032
[16:18:59] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415030 (owner: Marostegui)
[16:19:46] (CR) Hashar: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/163815 (owner: Hashar)
[16:20:06] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1081 (duration: 00m 56s)
[16:20:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:20] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415030 (owner: Marostegui)
[16:21:05] (PS1) Papaul: DHCP: Add dhcp entries for wdqs200[4-6] [puppet] - https://gerrit.wikimedia.org/r/415033
[16:21:39] (CR) jerkins-bot: [V: -1] DHCP: Add dhcp entries for wdqs200[4-6] [puppet] - https://gerrit.wikimedia.org/r/415033 (owner: Papaul)
[16:21:49] (PS1) Cmjohnson: Adding dhcpd entries analytics1070-77 [puppet] - https://gerrit.wikimedia.org/r/415035 (https://phabricator.wikimedia.org/T188294)
[16:24:30] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415036
[16:25:07] (CR) Muehlenhoff: labweb: include mediawiki profiles (1 comment) [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[16:26:41] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:26:43] !log rebooting mw1293-mw1298 for kernel security update
[16:26:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:27:04] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415036 (owner: Marostegui)
[16:27:53] (PS1) Papaul: Partman: Add wdqs200[4-6] to partman [puppet] - https://gerrit.wikimedia.org/r/415037
[16:28:22] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415036 (owner: Marostegui)
[16:28:25] (CR) jerkins-bot: [V: -1] Partman: Add wdqs200[4-6] to partman [puppet] - https://gerrit.wikimedia.org/r/415037 (owner: Papaul)
[16:29:22] Operations, ops-codfw, Patch-For-Review: rack/setup/install wdqs200[4-6] - https://phabricator.wikimedia.org/T187800#4006520 (Papaul)
[16:29:28] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1081 (duration: 00m 55s)
[16:29:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:03] (PS1) Marostegui: db-eqiad.php: Fully repool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415038
[16:30:34] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415036 (owner: Marostegui)
[16:30:35] RECOVERY - puppet last run on boron is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:35:51] Operations, Goal, User-Elukey, User-fgiunchedi: Export Prometheus-compatible JVM metrics from JVMs in production - https://phabricator.wikimedia.org/T177197#4006554 (elukey)
[16:35:53] Operations, Goal, Patch-For-Review, User-Elukey, User-fgiunchedi: Stop using jmx_exporter deployed via scap in favour of Debian package - https://phabricator.wikimedia.org/T181728#4006551 (elukey) Open>Resolved a:elukey Restbase and AQS' instances will pick up the new jar as part...
[16:36:12] Operations, Goal, User-Elukey, User-fgiunchedi: Export Prometheus-compatible JVM metrics from JVMs in production - https://phabricator.wikimedia.org/T177197#3650154 (elukey)
[16:36:14] Operations, Goal, Patch-For-Review, User-Elukey, User-fgiunchedi: Stop using jmx_exporter deployed via scap in favour of Debian package - https://phabricator.wikimedia.org/T181728#4006555 (elukey) Resolved>Open
[16:36:31] Operations, Goal, Patch-For-Review, User-Elukey, User-fgiunchedi: Stop using jmx_exporter deployed via scap in favour of Debian package - https://phabricator.wikimedia.org/T181728#3800073 (elukey) ETOOSOON, we'll need to cleanup the scap dirs on the hosts probably?
[16:38:27] (CR) Marostegui: [C: 2] db-eqiad.php: Fully repool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415038 (owner: Marostegui)
[16:39:48] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415038 (owner: Marostegui)
[16:39:55] there are so many prometheus roles, prometheus::ops ::global ::beta ::analytics ::services ::tools but they are all just included in the global prometheus role. i gotta figure out what actually uses the prometheus::web define.
[16:40:21] seems just the hosts called prometheus.. but then it's als labmon
[16:40:28] (CR) jenkins-bot: db-eqiad.php: Fully repool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/415038 (owner: Marostegui)
[16:42:25] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1081 (duration: 02m 04s)
[16:42:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:26] !log cp1008: retpoline kernel/libs upgrade T188092
[16:46:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:47:11] (PS1) ArielGlenn: rsync txt files from dumps 'latest' directories to webserver [puppet] - https://gerrit.wikimedia.org/r/415042 (https://phabricator.wikimedia.org/T187426)
[16:47:59] (PS2) Dzahn: prometheus: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/415029
[16:50:12] (PS1) Anomie: Set wgCommentTableSchemaMigrationStage = MIGRATION_WRITE_BOTH everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/415043 (https://phabricator.wikimedia.org/T166733)
[16:50:28] (CR) Anomie: [C: 2] "Config change, previously discussed" [mediawiki-config] - https://gerrit.wikimedia.org/r/415043 (https://phabricator.wikimedia.org/T166733) (owner: Anomie)
[16:50:41] !log lvs1010: retpoline kernel/libs upgrade T188092
[16:50:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:35] (PS2) Vgutierrez: Provide BGP session state visibility for every ASN/peer [debs/pybal] - https://gerrit.wikimedia.org/r/414973 (https://phabricator.wikimedia.org/T188085)
[16:51:42] (Merged) jenkins-bot: Set wgCommentTableSchemaMigrationStage = MIGRATION_WRITE_BOTH everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/415043 (https://phabricator.wikimedia.org/T166733) (owner: Anomie)
[16:51:54] (CR) jenkins-bot: Set wgCommentTableSchemaMigrationStage = MIGRATION_WRITE_BOTH everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/415043 (https://phabricator.wikimedia.org/T166733) (owner: Anomie)
[16:52:15] PROBLEM - DPKG on rhodium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:52:39] Operations, Pybal, Traffic, Patch-For-Review: Pybal stuck at BGP state OPENSENT while the other peer reached ESTABLISHED - https://phabricator.wikimedia.org/T188085#4006626 (Vgutierrez) After some very meaningful CR by @ema now 414973 looks like this: ``` pybal_bgp_enabled 1.0 pybal_bgp_session_e...
[16:52:58] !log anomie@tin Synchronized wmf-config/InitialiseSettings.php: Setting wgCommentTableSchemaMigrationStage = MIGRATION_WRITE_BOTH everywhere (duration: 00m 56s)
[16:53:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:14] !log restart cassandra-a on aqs1004 to test the prometheus jmx agent before complete rollout - T184795
[16:53:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:27] T184795: Add the prometheus jmx agent to AQS Cassandra - https://phabricator.wikimedia.org/T184795
[16:54:10] (PS3) Vgutierrez: Provide BGP session state visibility for every ASN/peer [debs/pybal] - https://gerrit.wikimedia.org/r/414973 (https://phabricator.wikimedia.org/T188085)
[16:55:59] (CR) Gehel: [C: 1] "Except the commit message validator failing, this LGTM. I haven't actually checked the MAC." [puppet] - https://gerrit.wikimedia.org/r/415033 (owner: Papaul)
[16:56:25] PROBLEM - cassandra-a CQL 10.64.0.126:9042 on aqs1004 is CRITICAL: connect to address 10.64.0.126 and port 9042: Connection refused
[16:56:45] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:56:51] lovely
[16:57:35] ah it is still booting, should resolve in a bit
[16:57:43] (CR) Gehel: [C: 1] "I'm always somewhat lost in partman, but as far as I understand, LGTM" [puppet] - https://gerrit.wikimedia.org/r/415037 (owner: Papaul)
[16:58:12] (PS3) BBlack: lvs - use new fact to determine bnx2x [puppet] - https://gerrit.wikimedia.org/r/414740
[16:58:14] (PS1) BBlack: lvs1007-12: remove LVS config bits everywhere [puppet] - https://gerrit.wikimedia.org/r/415044
[16:59:53] (PS3) Jcrespo: mariadb: Set up es2001 as the temporary backup target [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696)
[17:00:02] (CR) Ottomata: "+1 in general, LGTM :)" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/413362 (https://phabricator.wikimedia.org/T114199) (owner: Elukey)
[17:00:04] godog, moritzm, and _joe_: That opportune time is upon us again. Time for a Puppet SWAT(Max 8 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180227T1700).
[17:00:05] no_justification and Krinkle: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[17:00:26] (CR) jerkins-bot: [V: -1] mariadb: Set up es2001 as the temporary backup target [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696) (owner: Jcrespo)
[17:01:31] (CR) Dzahn: [C: 2] "this is not compilable due to unrelated issue with compiling prometheus changes .. just have to do it and see" [puppet] - https://gerrit.wikimedia.org/r/415029 (owner: Dzahn)
[17:01:33] (PS3) Dzahn: prometheus: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/415029
[17:03:30] (PS4) Jcrespo: mariadb: Set up es2001 as the temporary backup target [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696)
[17:04:01] (CR) jerkins-bot: [V: -1] mariadb: Set up es2001 as the temporary backup target [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696) (owner: Jcrespo)
[17:04:52] (CR) Filippo Giunchedi: [C: -1] "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/414876 (https://phabricator.wikimedia.org/T103886) (owner: Chad)
[17:05:00] no_justification: ^
[17:06:56] PROBLEM - keystone admin endpoint port 35357 on labtestcontrol2003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:08:36] PROBLEM - keystone public endoint port 5000 on labtestcontrol2003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:08:43] (CR) Giuseppe Lavagetto: "yes, this should cover all the canaries *at the very least*, or we won't be able to test it under user traffic." [puppet] - https://gerrit.wikimedia.org/r/414876 (https://phabricator.wikimedia.org/T103886) (owner: Chad)
[17:08:51] (CR) Ema: [C: 1] "Minor comment inline, LGTM otherwise." (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/414973 (https://phabricator.wikimedia.org/T188085) (owner: Vgutierrez)
[17:10:00] (PS5) Jcrespo: mariadb: Set up es2001 as the temporary backup target [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696)
[17:10:02] !log restarting Cassandra, restbase1007-a to test jmx_exporter
[17:10:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:10:35] (CR) jerkins-bot: [V: -1] mariadb: Set up es2001 as the temporary backup target [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696) (owner: Jcrespo)
[17:12:46] PROBLEM - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is CRITICAL: connect to address 10.64.0.230 and port 9042: Connection refused
[17:13:08] (PS1) Giuseppe Lavagetto: conftool: add json-schemas for MediaWiki variables validation [puppet] - https://gerrit.wikimedia.org/r/415046 (https://phabricator.wikimedia.org/T185080)
[17:13:16] PROBLEM - cassandra-a SSL 10.64.0.230:7001 on restbase1007 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[17:14:13] (PS6) Jcrespo: mariadb: Set up es2001 as the temporary backup target [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696)
[17:14:35] !log upload puppetdb 2.3.8-1~wmf1+stretch to stretch-wikimedia - T184562
[17:14:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:14:51] T184562: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562
[17:16:22] andrewbogott: the puppet run is now fixed on labmon1001 (for the price of a new issue on prometheus .. but .. that's next)
[17:16:44] godog: i broke the puppet run on prometheus to fix it on labmon :p on it
[17:16:54] mutante: haha ok no worries
[17:16:56] mutante: thank you! And, I'm sorry :)
[17:17:06] hehe:) ok
[17:17:16] RECOVERY - cassandra-a SSL 10.64.0.230:7001 on restbase1007 is OK: SSL OK - Certificate restbase1007-a valid until 2018-08-17 16:10:53 +0000 (expires in 170 days)
[17:17:37] RECOVERY - keystone public endoint port 5000 on labtestcontrol2003 is OK: HTTP OK: HTTP/1.1 300 Multiple Choices - 757 bytes in 7.162 second response time
[17:17:46] RECOVERY - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is OK: TCP OK - 0.000 second response time on 10.64.0.230 port 9042
[17:17:56] RECOVERY - keystone admin endpoint port 35357 on labtestcontrol2003 is OK: HTTP OK: HTTP/1.1 300 Multiple Choices - 759 bytes in 0.075 second response time
[17:18:14] (PS1) Ema: wmf-upgrade-and-reboot: upgrade the given host and reboot it [puppet] - https://gerrit.wikimedia.org/r/415047
[17:18:36] PROBLEM - puppet last run on labtestcontrol2003 is CRITICAL: CRITICAL: Puppet has 5 failures. Last run 11 minutes ago with 5 failures. Failed resources (up to 3 shown): Package[python-glanceclient],Package[python-keystoneclient],Package[python-openstackclient],Package[python-designateclient]
[17:18:56] PROBLEM - puppet last run on prometheus1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:19:16] PROBLEM - puppet last run on prometheus2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:19:26] PROBLEM - puppet last run on prometheus1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:20:06] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[17:20:37] ACKNOWLEDGEMENT - puppet last run on prometheus1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues daniel_zahn apache-httpd
[17:20:37] ACKNOWLEDGEMENT - puppet last run on prometheus1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues daniel_zahn apache-httpd
[17:20:37] ACKNOWLEDGEMENT - puppet last run on prometheus2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues daniel_zahn apache-httpd
[17:20:54] ^ not related to those other recoveries up there
[17:23:01] godog oh, you downgraded puppetdb.
[17:23:08] it was puppetdb 4
[17:23:12] but now it is 2
[17:24:06] Operations, Puppet, Patch-For-Review, User-fgiunchedi: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4006804 (Paladox) puppetdb 4 was in stretch-wikipedia. But seems it is now puppetdb 2.
[17:25:39] paladox: yes, we want puppetdb 2 for now
[17:25:44] ok
[17:27:52] Operations, Proton, Readers-Web-Backlog, Services (watching): Choose a server for the chromium-render service - https://phabricator.wikimedia.org/T187821#4006822 (ovasileva) p:Triage>High
[17:31:06] (PS1) Dzahn: prometheus: move httpd declaration to role class [puppet] - https://gerrit.wikimedia.org/r/415051
[17:31:54] PROBLEM - cassandra-a service on aqs1004 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is inactive
[17:32:48] !log starting branch cut for 1.31.0-wmf.23 T183962
[17:32:52] (PS2) Dzahn: prometheus: move httpd declaration to role class [puppet] - https://gerrit.wikimedia.org/r/415051
[17:33:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:33:03] T183962: 1.31.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T183962
[17:33:22] (PS3) Dzahn: prometheus: move httpd declaration to role class [puppet] - https://gerrit.wikimedia.org/r/415051
[17:33:38] (CR) Krinkle: [C: -1] "@MHolloway: I understand that from a product perspective, perhaps the m-dot and zero-dot domains seem "broken", but as far as I'm concerne" [puppet] - https://gerrit.wikimedia.org/r/404158 (https://phabricator.wikimedia.org/T69015) (owner: Mholloway)
[17:33:54] RECOVERY - cassandra-a service on aqs1004 is OK: OK - cassandra-a is active
[17:34:03] (CR) Dzahn: [C: 2] prometheus: move httpd declaration to role class [puppet] - https://gerrit.wikimedia.org/r/415051 (owner: Dzahn)
[17:34:57] (CR) Jcrespo: "This needs more work, but I would like to deploy quickly to es2001 (and disable dbstore2001 cron) when backups finish this week." [puppet] - https://gerrit.wikimedia.org/r/415024 (https://phabricator.wikimedia.org/T184696) (owner: Jcrespo)
[17:35:05] (CR) Volans: [C: -1] "Nice! I think there are some small things to fix though, see inline." (7 comments) [puppet] - https://gerrit.wikimedia.org/r/415047 (owner: Ema)
[17:37:28] godog: fixed. no-op. and the prometheus machines now don't use the apache module anymore. i just had to move one piece from web.pp to the role
[17:37:37] (CR) ArielGlenn: [C: 2] rsync txt files from dumps 'latest' directories to webserver [puppet] - https://gerrit.wikimedia.org/r/415042 (https://phabricator.wikimedia.org/T187426) (owner: ArielGlenn)
[17:37:44] (PS2) ArielGlenn: rsync txt files from dumps 'latest' directories to webserver [puppet] - https://gerrit.wikimedia.org/r/415042 (https://phabricator.wikimedia.org/T187426)
[17:38:09] mutante: awesome! good job, thanks for taking care of it
[17:38:11] because that's a defined type used multiple times.. created duplicate definitions
[17:38:17] welcome :) and thx
[17:38:33] RECOVERY - puppet last run on labtestcontrol2003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[17:38:35] !log restarting wdqs-updater on wdqs1004 - T188045
[17:38:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:38:50] T188045: wdqs1004 broken - https://phabricator.wikimedia.org/T188045
[17:39:03] RECOVERY - puppet last run on prometheus1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[17:39:14] RECOVERY - puppet last run on prometheus2003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:39:23] RECOVERY - puppet last run on prometheus1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:40:33] omg lol.. but that brought me an issue on labmon1001 again .. fml
[17:41:16] !log restarting ferm on db2049, seems failed one day ago
[17:41:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:42:03] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [17:42:14] (03PS1) 10Elukey: Revert "role::aqs: enable Cassandra JMX exporter" [puppet] - 10https://gerrit.wikimedia.org/r/415055 [17:42:23] RECOVERY - Check whether ferm is active by checking the default input chain on db2049 is OK: OK ferm input default policy is set [17:42:43] RECOVERY - Check systemd state on db2049 is OK: OK - running: The system is fully operational [17:43:01] (03CR) 10Elukey: [C: 032] Revert "role::aqs: enable Cassandra JMX exporter" [puppet] - 10https://gerrit.wikimedia.org/r/415055 (owner: 10Elukey) [17:43:07] (03PS2) 10Elukey: Revert "role::aqs: enable Cassandra JMX exporter" [puppet] - 10https://gerrit.wikimedia.org/r/415055 [17:44:54] (03PS2) 10Giuseppe Lavagetto: conftool: add json-schemas for MediaWiki variables validation [puppet] - 10https://gerrit.wikimedia.org/r/415046 (https://phabricator.wikimedia.org/T185080) [17:45:53] (03CR) 10Ppchelko: "@Ottomata shall we merge this one?" [puppet] - 10https://gerrit.wikimedia.org/r/410251 (https://phabricator.wikimedia.org/T187241) (owner: 10Ppchelko) [17:46:07] !log rebooting kubernetes workers in codfw for kernel security update [17:46:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:48:43] RECOVERY - cassandra-a CQL 10.64.0.126:9042 on aqs1004 is OK: TCP OK - 0.000 second response time on 10.64.0.126 port 9042 [17:50:54] (03CR) 10Ottomata: conftool: add json-schemas for MediaWiki variables validation (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/415046 (https://phabricator.wikimedia.org/T185080) (owner: 10Giuseppe Lavagetto) [17:53:53] (03PS1) 10Arturo Borrero Gonzalez: toollabs: introduce base class for all toolforge roles [puppet] - 10https://gerrit.wikimedia.org/r/415057 (https://phabricator.wikimedia.org/T187193) [17:54:18] (03CR) 10jerkins-bot: [V: 04-1] toollabs: introduce base class for all toolforge roles [puppet] - 10https://gerrit.wikimedia.org/r/415057 
(https://phabricator.wikimedia.org/T187193) (owner: 10Arturo Borrero Gonzalez) [17:54:52] (03PS1) 10Dzahn: labs::graphite: add httpd declaration to role [puppet] - 10https://gerrit.wikimedia.org/r/415058 [17:56:47] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:57:24] (03CR) 10Dzahn: [C: 032] labs::graphite: add httpd declaration to role [puppet] - 10https://gerrit.wikimedia.org/r/415058 (owner: 10Dzahn) [17:57:36] PROBLEM - puppet last run on bast4002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:00:04] cscott, arlolra, subbu, halfak, and Amir1: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Services – Graphoid / Parsoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180227T1800). [18:00:04] No GERRIT patches in the queue for this window AFAICS. [18:00:39] now the bastions!! arggglgr [18:00:47] "turtles all the way down" [18:01:26] but labmon and prometheus are happy:) it's just that there is always another one breaking in exchange for it :p [18:02:06] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [18:02:17] !log rebooting kubernetes workers in eqiad for kernel security update [18:02:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:02:31] 10Operations, 10Datasets-General-or-Unknown, 10User-ArielGlenn: Reboots of dumps/snapshot hosts - https://phabricator.wikimedia.org/T188242#4006980 (10ArielGlenn) [18:04:06] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [18:04:20] (03CR) 10Ottomata: [C: 032] 2.2.1 binary release for Hadoop 2.6 [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/405894 (https://phabricator.wikimedia.org/T185581) (owner: 10Ottomata) [18:07:26] PROBLEM - puppet last run on bast5001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:08:06] PROBLEM - Restbase root url on restbase-dev1005 is CRITICAL: connect to address 10.64.16.96 and port 7231: Connection refused [18:08:17] PROBLEM - Restbase root url on restbase-dev1004 is CRITICAL: connect to address 10.64.0.89 and port 7231: Connection refused [18:09:50] (03PS4) 10Ottomata: Added page-related events to EventStreams. [puppet] - 10https://gerrit.wikimedia.org/r/410251 (https://phabricator.wikimedia.org/T187241) (owner: 10Ppchelko) [18:11:44] (03PS2) 10Cmjohnson: Adding dhcpd entries analytics1070-77 [puppet] - 10https://gerrit.wikimedia.org/r/415035 (https://phabricator.wikimedia.org/T188294) [18:16:59] (03PS1) 10Dzahn: bastionhost::pop: add httpd declaration [puppet] - 10https://gerrit.wikimedia.org/r/415062 [18:21:10] (03CR) 10Muehlenhoff: [C: 031] [WIP] eventlogging: add systemd support [puppet] - 10https://gerrit.wikimedia.org/r/413362 (https://phabricator.wikimedia.org/T114199) (owner: 10Elukey) [18:21:43] (03CR) 10Dzahn: [C: 032] bastionhost::pop: add httpd declaration [puppet] - 10https://gerrit.wikimedia.org/r/415062 (owner: 10Dzahn) [18:25:51] (03CR) 10Ottomata: "Tested in beta, looks good to me. Needs a scap config deploy and service restart. Also, scap.yaml is separate (in source) from these con" [puppet] - 10https://gerrit.wikimedia.org/r/410251 (https://phabricator.wikimedia.org/T187241) (owner: 10Ppchelko) [18:25:55] (03CR) 10Ottomata: [C: 032] Added page-related events to EventStreams. 
[puppet] - 10https://gerrit.wikimedia.org/r/410251 (https://phabricator.wikimedia.org/T187241) (owner: 10Ppchelko) [18:25:57] (03PS5) 10Ottomata: Added page-related events to EventStreams. [puppet] - 10https://gerrit.wikimedia.org/r/410251 (https://phabricator.wikimedia.org/T187241) (owner: 10Ppchelko) [18:26:22] (03CR) 10Cmjohnson: [C: 032] Adding dhcpd entries analytics1070-77 [puppet] - 10https://gerrit.wikimedia.org/r/415035 (https://phabricator.wikimedia.org/T188294) (owner: 10Cmjohnson) [18:26:29] (03PS3) 10Cmjohnson: Adding dhcpd entries analytics1070-77 [puppet] - 10https://gerrit.wikimedia.org/r/415035 (https://phabricator.wikimedia.org/T188294) [18:26:39] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 2 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4007098 (10Smalyshev) I wonder if it's possible to use one of the new servers we're getting in T187766 to restore full capacity if debugging what is going on with 1004 takes time. W... [18:26:46] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:26:48] (03CR) 10Cmjohnson: [V: 032 C: 032] Adding dhcpd entries analytics1070-77 [puppet] - 10https://gerrit.wikimedia.org/r/415035 (https://phabricator.wikimedia.org/T188294) (owner: 10Cmjohnson) [18:26:56] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 2 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4007100 (10Smalyshev) p:05Triage>03High [18:26:59] bast5001: let us know [18:27:26] RECOVERY - puppet last run on bast5001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:29:06] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [18:30:17] (03PS4) 10Herron: puppet_compiler: add support for puppetdb4 and local postgresql [puppet] - 10https://gerrit.wikimedia.org/r/413881 (https://phabricator.wikimedia.org/T187258) [18:31:07] (03CR) 
10jerkins-bot: [V: 04-1] puppet_compiler: add support for puppetdb4 and local postgresql [puppet] - 10https://gerrit.wikimedia.org/r/413881 (https://phabricator.wikimedia.org/T187258) (owner: 10Herron) [18:32:24] !log otto@tin Started deploy [eventstreams/deploy@7629e16]: Config deploy to publish page change related streams: T187241 (scb2001 only) [18:32:26] !log otto@tin Finished deploy [eventstreams/deploy@7629e16]: Config deploy to publish page change related streams: T187241 (scb2001 only) (duration: 00m 03s) [18:32:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:32:39] T187241: Add page-related topics to EventStreams - https://phabricator.wikimedia.org/T187241 [18:32:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:32:55] !log otto@tin Started restart [eventstreams/deploy@7629e16]: service restart to publish page change related streams: T187241 (scb2001 only) [18:33:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:34:36] (03PS1) 10Ottomata: Fix codfw typo for eventstreams config [puppet] - 10https://gerrit.wikimedia.org/r/415065 (https://phabricator.wikimedia.org/T187241) [18:35:08] (03CR) 10Ottomata: [C: 032] Fix codfw typo for eventstreams config [puppet] - 10https://gerrit.wikimedia.org/r/415065 (https://phabricator.wikimedia.org/T187241) (owner: 10Ottomata) [18:36:33] (03PS1) 10Chad: Run initSiteStats twice a month [puppet] - 10https://gerrit.wikimedia.org/r/415066 (https://phabricator.wikimedia.org/T59788) [18:37:36] RECOVERY - puppet last run on bast4002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [18:37:37] ● kafkatee-webrequest-analytics.service loaded failed failed kafkatee-webrequest-analytics [18:37:41] ● kafkatee-webrequest.service loaded failed failed kafkatee-webrequest [18:37:46] on rhenium.wikimedia.org [18:47:13] (03CR) 10Alexandros Kosiaris: [C: 032] Double ORES worker count [puppet] - 
10https://gerrit.wikimedia.org/r/415032 (owner: 10Awight) [18:47:19] (03PS2) 10Alexandros Kosiaris: Double ORES worker count [puppet] - 10https://gerrit.wikimedia.org/r/415032 (owner: 10Awight) [18:47:21] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Double ORES worker count [puppet] - 10https://gerrit.wikimedia.org/r/415032 (owner: 10Awight) [18:57:02] (03CR) 10Krinkle: Beta autoupdate: Clean up, support wmf-config itself (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414909 (owner: 10Chad) [18:57:28] ACKNOWLEDGEMENT - DPKG on rhodium is CRITICAL: DPKG CRITICAL dpkg reports broken packages daniel_zahn filippo puppetmaster stretch [19:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180227T1900) [19:00:04] No GERRIT patches in the queue for this window AFAICS. [19:03:16] !log otto@tin Started deploy [eventstreams/deploy@8f2eec4]: Publish page change related streams: T187241 (scb2002 only) [19:03:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:32] T187241: Add page-related topics to EventStreams - https://phabricator.wikimedia.org/T187241 [19:03:39] !log otto@tin Finished deploy [eventstreams/deploy@8f2eec4]: Publish page change related streams: T187241 (scb2002 only) (duration: 00m 22s) [19:03:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:54] !log otto@tin Started deploy [eventstreams/deploy@8f2eec4]: Publish page change related streams: T187241 [19:04:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:22] (03PS4) 10Cmjohnson: Adding dhcpd entries analytics1070-77 [puppet] - 10https://gerrit.wikimedia.org/r/415035 (https://phabricator.wikimedia.org/T188294) [19:07:25] (03CR) 10Cmjohnson: [V: 032 C: 032] Adding dhcpd entries analytics1070-77 [puppet] - 10https://gerrit.wikimedia.org/r/415035 (https://phabricator.wikimedia.org/T188294) (owner: 10Cmjohnson) 
[19:07:59] (PS1) Dzahn: prometheus::beta: add httpd declaration to role class [puppet] - https://gerrit.wikimedia.org/r/415077
[19:08:10] !log otto@tin Finished deploy [eventstreams/deploy@8f2eec4]: Publish page change related streams: T187241 (duration: 04m 16s)
[19:08:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:09:27] (CR) Dzahn: [C: 2] prometheus::beta: add httpd declaration to role class [puppet] - https://gerrit.wikimedia.org/r/415077 (owner: Dzahn)
[19:09:34] (PS2) Dzahn: prometheus::beta: add httpd declaration to role class [puppet] - https://gerrit.wikimedia.org/r/415077
[19:23:47] Operations, ops-eqiad, Analytics-Cluster, Analytics-Kanban, and 2 others: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4007407 (Cmjohnson)
[19:25:12] Operations, ops-eqiad, Analytics-Cluster, Analytics-Kanban, and 2 others: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4002756 (Cmjohnson) a:RobH @robh handing this off to you to finish installs and turnover to Team Analytics.
[19:28:48] Operations, ops-eqiad, Discovery, Wikidata, and 2 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4007445 (Gehel) There isn't much impact on response time or even load on the cluster at this point. So I would not worry too much yet. If we loose another node, this is going to b...
[19:28:59] (PS2) Ema: wmf-upgrade-and-reboot: upgrade the given host and reboot it [puppet] - https://gerrit.wikimedia.org/r/415047
[19:30:19] !log thcipriani@tin Started scap: testwiki to php-1.31.0-wmf.23 and rebuild l10n cache
[19:30:28] (PS1) Ladsgroup: Enable Wikibase RC injection for ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/415078
[19:30:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:35] (CR) Ema: wmf-upgrade-and-reboot: upgrade the given host and reboot it (7 comments) [puppet] - https://gerrit.wikimedia.org/r/415047 (owner: Ema)
[19:35:00] (PS1) Ladsgroup: Enable reading from full term entity id everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/415080 (https://phabricator.wikimedia.org/T114903)
[19:35:37] “enable reading from full term entity ID everywhere” \o/
[19:45:23] thcipriani: is the train happening today?
[19:45:25] (PS1) Rush: icinga: creaet irc-cloud-feed channel for ircbot [puppet] - https://gerrit.wikimedia.org/r/415083 (https://phabricator.wikimedia.org/T178405)
[19:45:51] Hauskatze: yeah, started the l10nupdate at 19:30:28
[19:46:25] thcipriani: okay, it's because I'd like to backport a fix for global rename so we can unblock an stuck global rename
[19:46:26] l10n cache rebuild/initial scap to testwiki still in progress currently
[19:46:47] (PS2) Rush: icinga: creaet irc-cloud-feed channel for ircbot [puppet] - https://gerrit.wikimedia.org/r/415083 (https://phabricator.wikimedia.org/T178405)
[19:46:48] hopefully by morning swat this will be over, crossing fingers :)
[19:47:04] Hauskatze: sure, do you have a link to the change?
[19:47:37] (PS3) Rush: icinga: create irc-cloud-feed channel for ircbot [puppet] - https://gerrit.wikimedia.org/r/415083 (https://phabricator.wikimedia.org/T178405)
[19:47:44] thcipriani: this if in 'master' https://gerrit.wikimedia.org/r/#/c/414117/
[19:48:06] I understand it comes on wmf.23?
[19:48:41] yep, we cut new branches from master tuesday western-US AM, so this looks like it should be there.
[19:48:45] Hauskatze: Yes.
[19:48:46] (CR) Rush: [C: 2] icinga: create irc-cloud-feed channel for ircbot [puppet] - https://gerrit.wikimedia.org/r/415083 (https://phabricator.wikimedia.org/T178405) (owner: Rush)
[19:49:11] "included in" include wmf.23, so you're good
[19:49:39] okay so I'll restore a request to backport that to wmf.22 so the fix is on wmf.23 and wmf.22 wiki and so the script can work
[19:49:52] Hauskatze: You already made that backport, but you abandoned it?
[19:49:58] Hauskatze: https://gerrit.wikimedia.org/r/#/c/414972/
[19:50:17] James_F: yep, 'cause zeljkof wasn't sure about doing that on a SWAT window.
[19:50:25] I'll restore that
[19:50:39] Yeah, that's fine for a SWAT.
[19:50:47] greg-g: never clicked that link, turns it's good advice :)
[19:50:51] (Says the person who doesn't have to pick up the pieces if they're wrong.)
[19:51:21] the concern was that it couldn't be tested
[19:51:21] In general, b.d808 knows what they're doing. :-)
[19:51:31] (CR) Andrew Bogott: labweb: include mediawiki profiles (1 comment) [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[19:51:50] but once backported we can try unblocking that stuck global rename with the script and see if it works
[19:52:00] It's getting tested by use, either following the train or a SWAT. At least with a SWAT we can test it in more isolation.
[19:52:44] Restored.
[19:56:30] Operations, ops-eqiad: rack/setup/install wdqs100[7-9] - https://phabricator.wikimedia.org/T188432#4007597 (Cmjohnson)
[19:58:38] PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 49.46, 27.70, 20.15
[19:59:55] (PS4) Vgutierrez: Provide BGP session state visibility for every ASN/peer [debs/pybal] - https://gerrit.wikimedia.org/r/414973 (https://phabricator.wikimedia.org/T188085)
[20:00:04] thcipriani: Time to snap out of that daydream and deploy MediaWiki train. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180227T2000).
[20:00:04] No GERRIT patches in the queue for this window AFAICS.
[20:00:37] ah, but I daydream *about* deploying MediaWiki so...
[20:00:38] RECOVERY - High CPU load on API appserver on mw1227 is OK: OK - load average: 20.70, 24.45, 19.92
[20:01:17] (CR) Vgutierrez: Provide BGP session state visibility for every ASN/peer (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/414973 (https://phabricator.wikimedia.org/T188085) (owner: Vgutierrez)
[20:02:30] !log thcipriani@tin Finished scap: testwiki to php-1.31.0-wmf.23 and rebuild l10n cache (duration: 32m 10s)
[20:02:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:02:46] !log temporarily disabling puppet agents and rebooting eqiad puppet masters for kernel update
[20:03:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:17] (PS4) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:05:55] (CR) jerkins-bot: [V: -1] labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[20:08:16] (PS5) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:08:38] PROBLEM - puppet last run on mw1293 is CRITICAL: CRITICAL: Puppet has 33 failures. Last run 1 minute ago with 33 failures. Failed resources (up to 3 shown): File[/home/mlitn],File[/home/andyrussg],File[/home/nikerabbit],File[/home/reedy]
[20:08:45] !log eqiad puppet master reboots finished -- re-enabling puppet agents
[20:08:46] wat
[20:08:50] (CR) jerkins-bot: [V: -1] labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[20:09:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:04] herron: wanna kick mw1293?
[20:09:18] I'm guessing it tried to poll when the master was unavailable
[20:09:35] sure
[20:09:38] PROBLEM - puppet last run on lvs1007 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[ethtool_rss_combined_channels_eth0],Exec[ethtool_rss_combined_channels_eth1]
[20:09:39] PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Puppet has 61 failures. Last run 1 minute ago with 61 failures. Failed resources (up to 3 shown): File[/etc/apache2/sites-available/03-main.conf],File[/etc/apache2/sites-available/04-remnant.conf],File[/etc/apache2/sites-available/05-search-wikimedia.conf],File[/etc/apache2/sites-available/06-secure-wikimedia.conf]
[20:09:42] taking a look
[20:16:26] (CR) Krinkle: "Thanks. For now I'll keep the other ones, but feel free to propose removal of those in a separate commit/task." [mediawiki-config] - https://gerrit.wikimedia.org/r/413978 (owner: Krinkle)
[20:17:23] (PS6) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:18:01] (CR) jerkins-bot: [V: -1] labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[20:18:38] RECOVERY - puppet last run on mw1293 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[20:19:57] (PS7) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:20:04] (PS1) Thcipriani: Group0 to 1.31.0-wmf.23 [mediawiki-config] - https://gerrit.wikimedia.org/r/415089
[20:20:33] (CR) jerkins-bot: [V: -1] labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[20:21:34] (CR) Smalyshev: wdqs: icinga check for categories updates (2 comments) [puppet] - https://gerrit.wikimedia.org/r/415010 (https://phabricator.wikimedia.org/T188293) (owner: Gehel)
[20:21:48] (PS8) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:22:11] (CR) Thcipriani: [C: 2] Group0 to 1.31.0-wmf.23 [mediawiki-config] - https://gerrit.wikimedia.org/r/415089 (owner: Thcipriani)
[20:23:11] (PS9) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:23:24] (Merged) jenkins-bot: Group0 to 1.31.0-wmf.23 [mediawiki-config] - https://gerrit.wikimedia.org/r/415089 (owner: Thcipriani)
[20:27:08] (PS10) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:29:38] RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[20:31:22] !log thcipriani@tin rebuilt and synchronized wikiversions files: Group0 to 1.31.0-wmf.23
[20:31:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:32:01] (CR) jenkins-bot: Group0 to 1.31.0-wmf.23 [mediawiki-config] - https://gerrit.wikimedia.org/r/415089 (owner: Thcipriani)
[20:33:40] (PS11) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:36:35] !log ppchelko@tin Started deploy [eventstreams/deploy@8f2eec4]: Set correct CSP headers
[20:36:43] (PS12) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:36:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:37:01] !log ppchelko@tin Finished deploy [eventstreams/deploy@8f2eec4]: Set correct CSP headers (duration: 00m 25s)
[20:37:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:31] !log ppchelko@tin Started deploy [eventstreams/deploy@14e0b03]: Set correct CSP headers, forgot to git pull
[20:39:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:42:00] !log ppchelko@tin Finished deploy [eventstreams/deploy@14e0b03]: Set correct CSP headers, forgot to git pull (duration: 02m 29s)
[20:42:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:42:41] (PS13) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:44:06] (PS1) Dzahn: phragile: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/415092
[20:47:16] (PS14) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:48:34] (CR) Dzahn: [C: 2] phragile: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/415092 (owner: Dzahn)
[20:49:27] (PS15) Andrew Bogott: labweb: include mediawiki profiles [puppet] - https://gerrit.wikimedia.org/r/415019 (https://phabricator.wikimedia.org/T168470)
[20:53:39] (CR) Dzahn: [C: 2] "this is labs-only and the instance is "phragile-pro" in the project "phragile" https://tools.wmflabs.org/openstack-browser/puppetclass/rol" [puppet] - https://gerrit.wikimedia.org/r/415092 (owner: Dzahn)
[21:01:00] (CR) Vgutierrez: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/415047 (owner: Ema)
[21:17:09] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack
[21:18:09] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack
[21:23:52] !log thcipriani@tin Synchronized php-1.31.0-wmf.23/includes/user/User.php: [[gerrit:415101|Add a missing check of $wgActorTableSchemaMigrationStage]] T188437 (duration: 01m 14s)
[21:24:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:24:09] T188437: mediawikiwiki.actor does not exist - https://phabricator.wikimedia.org/T188437
[21:29:26] !log ALL global renames failing
[21:29:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:33] all global renames failing on mediawiki.org
[21:30:37] Hauskatze: It's probably fixed
[21:30:43] I guess it's because of the train-in-progress
[21:30:45] Based on the patch thcipriani just dpeloyed
[21:31:06] the system tries to restart the failed jobs once every three hours so they should be fine
[21:31:37] s/once every/only once after 3 hours
[21:49:18] (PS1) Dzahn: prometheus::tools: add httpd declaration [puppet] - https://gerrit.wikimedia.org/r/415164
[21:50:13] (CR) Paladox: [C: 1] prometheus::tools: add httpd declaration [puppet] - https://gerrit.wikimedia.org/r/415164 (owner: Dzahn)
[21:50:31] (CR) Dzahn: [C: 2] prometheus::tools: add httpd declaration [puppet] - https://gerrit.wikimedia.org/r/415164 (owner: Dzahn)
[21:53:24] (PS6) Gehel: wdqs: icinga check for categories updates [puppet] - https://gerrit.wikimedia.org/r/415010 (https://phabricator.wikimedia.org/T188293)
[21:54:37] (CR) Gehel: wdqs: icinga check for categories updates (2 comments) [puppet] - https://gerrit.wikimedia.org/r/415010 (https://phabricator.wikimedia.org/T188293) (owner: Gehel)
[22:01:20] Operations, Puppet, Patch-For-Review: puppetdb4: use postgres db backend in puppet-compiler - https://phabricator.wikimedia.org/T187258#4008186 (herron) While https://gerrit.wikimedia.org/r/413881 is still a work in progress, I was able to put together a postgres/puppetdb4 backed compiler instance us...
[22:01:22] (PS1) Rush: icinga: change alerting for openstack things [puppet] - https://gerrit.wikimedia.org/r/415167 (https://phabricator.wikimedia.org/T178405)
[22:01:52] (CR) jerkins-bot: [V: -1] icinga: change alerting for openstack things [puppet] - https://gerrit.wikimedia.org/r/415167 (https://phabricator.wikimedia.org/T178405) (owner: Rush)
[22:02:32] Operations, Ops-Access-Requests, Discovery-Search: Google Search Console access for Search Platform team - https://phabricator.wikimedia.org/T188453#4008190 (mpopov)
[22:08:29] (PS2) Rush: icinga: change alerting for openstack things [puppet] - https://gerrit.wikimedia.org/r/415167 (https://phabricator.wikimedia.org/T178405)
[22:09:02] (CR) jerkins-bot: [V: -1] icinga: change alerting for openstack things [puppet] - https://gerrit.wikimedia.org/r/415167 (https://phabricator.wikimedia.org/T178405) (owner: Rush)
[22:10:36] Operations, Puppet: Upgrade PuppetDB to version 4.4 - https://phabricator.wikimedia.org/T177253#4008267 (herron)
[22:12:06] (PS2) Smalyshev: Add configuration for CirrusSearch to instantly index new Wikidata items [mediawiki-config] - https://gerrit.wikimedia.org/r/413899 (https://phabricator.wikimedia.org/T183053)
[22:12:43] (PS3) Rush: icinga: change alerting for openstack things [puppet] - https://gerrit.wikimedia.org/r/415167 (https://phabricator.wikimedia.org/T178405)
[22:13:16] (CR) jerkins-bot: [V: -1] icinga: change alerting for openstack things [puppet] - https://gerrit.wikimedia.org/r/415167 (https://phabricator.wikimedia.org/T178405) (owner: Rush)
[22:14:35] (PS4) Rush: icinga: change alerting for openstack things [puppet] - https://gerrit.wikimedia.org/r/415167 (https://phabricator.wikimedia.org/T178405)
[22:15:09] (CR) jerkins-bot: [V: -1] icinga: change alerting for openstack things [puppet] - https://gerrit.wikimedia.org/r/415167 (https://phabricator.wikimedia.org/T178405) (owner: Rush)
[22:17:29] (PS5) Rush: icinga: change alerting for openstack things [puppet] - https://gerrit.wikimedia.org/r/415167 (https://phabricator.wikimedia.org/T178405)
[22:21:36] Operations: TransparencyReport-private is not auto deploying - https://phabricator.wikimedia.org/T188224#4000314 (Dzahn) The reason was that puppet was disabled on the instance, so puppet didn't run and therefore didn't clone. The reason it was disabled was that we had to once do a content revert for anothe...
[22:22:16] Operations: TransparencyReport-private is not auto deploying - https://phabricator.wikimedia.org/T188224#4008337 (Dzahn) Invalid>Resolved
[22:27:43] Operations, DNS, Traffic: Move "transparency.wikimedia.org/private" to "transparency-private.wikimedia.org" - https://phabricator.wikimedia.org/T188362#4008362 (Dzahn) a:Dzahn
[22:30:55] (PS2) Dzahn: Partman: Add wdqs200[4-6] to partman [puppet] - https://gerrit.wikimedia.org/r/415037 (owner: Papaul)
[22:31:29] (CR) jerkins-bot: [V: -1] Partman: Add wdqs200[4-6] to partman [puppet] - https://gerrit.wikimedia.org/r/415037 (owner: Papaul)
[22:32:26] (PS3) Dzahn: Partman: Add wdqs200[4-6] to partman [puppet] - https://gerrit.wikimedia.org/r/415037 (https://phabricator.wikimedia.org/T187800) (owner: Papaul)
[22:32:55] (CR) Dzahn: "fixed commit message, jenkins -1 is always due to that one missing space after "Bug: "" [puppet] - https://gerrit.wikimedia.org/r/415037 (https://phabricator.wikimedia.org/T187800) (owner: Papaul)
[22:37:06] (CR) Rush: [C: 2] icinga: change alerting for openstack things [puppet] - https://gerrit.wikimedia.org/r/415167 (https://phabricator.wikimedia.org/T178405) (owner: Rush)
[22:38:48] Operations: TransparencyReport-private is not auto deploying - https://phabricator.wikimedia.org/T188224#4008423 (APalmer_WMF) Thanks @Dzahn! We were pretty confused about what happened, so we're glad to have it explained.
[22:41:29] (CR) Dzahn: [C: 2] Partman: Add wdqs200[4-6] to partman [puppet] - https://gerrit.wikimedia.org/r/415037 (https://phabricator.wikimedia.org/T187800) (owner: Papaul)
[22:41:42] (PS4) Dzahn: Partman: Add wdqs200[4-6] to partman [puppet] - https://gerrit.wikimedia.org/r/415037 (https://phabricator.wikimedia.org/T187800) (owner: Papaul)
[22:43:50] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1943 bytes in 0.123 second response time
[22:45:26] (PS2) Dzahn: DHCP: Add dhcp entries for wdqs200[4-6] [puppet] - https://gerrit.wikimedia.org/r/415033 (https://phabricator.wikimedia.org/T187800) (owner: Papaul)
[22:45:33] (PS3) Dzahn: DHCP: Add dhcp entries for wdqs200[4-6] [puppet] - https://gerrit.wikimedia.org/r/415033 (https://phabricator.wikimedia.org/T187800) (owner: Papaul)
[22:49:53] (CR) Dzahn: [C: 2] DHCP: Add dhcp entries for wdqs200[4-6] [puppet] - https://gerrit.wikimedia.org/r/415033 (https://phabricator.wikimedia.org/T187800) (owner: Papaul)
[22:53:18] Operations, MediaWiki-Platform-Team, PHP 7.0 support, HHVM, and 2 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#4008491 (Jdforrester-WMF)
[22:53:50] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1932 bytes in 0.106 second response time
[22:54:29] (PS1) Legoktm: wmcs: Notify legoktm for codesearch alerts [puppet] - https://gerrit.wikimedia.org/r/415178
[23:00:10] going to trigger keystone alerts I expect to see here from labtestcontrol2001
[23:01:02] err glance
[23:02:30] PROBLEM - glance-api http on labtestcontrol2001 is CRITICAL: connect to address 208.80.153.47 and port 9292: Connection refused
[23:03:02] q.e.d.
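The dispatch-lag alert at 22:43:50 looks contradictory (`HTTP/1.1 200 OK`, yet CRITICAL) until you note that the probe also matches the response body against a pattern: the server answered, but the expected content was missing. The sketch below classifies a single probe result the way the alerts in this log read. `evaluate_http_check` and its thresholds are assumptions modeled on the messages shown here (a refused connection is CRITICAL; the `300 Multiple Choices` from glance-api at 23:03:31 counts as OK); the 4xx/5xx handling is assumed, and this is not the real monitoring plugin, which has many more options.

```python
import re
from typing import Optional

# Hypothetical states and thresholds; not the real plugin's configuration.
OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


def evaluate_http_check(status: Optional[int], body: str = "",
                        pattern: Optional[str] = None) -> str:
    """Return an Icinga-style state for a single HTTP probe.

    status is None when the TCP connection failed (e.g. "Connection refused").
    pattern, if given, is a regex that must appear somewhere in the body.
    """
    if status is None:
        return CRITICAL  # nothing is listening on the port
    if status >= 500:
        return CRITICAL  # assumed: server errors are critical
    if status >= 400:
        return WARNING   # assumed: client errors only warn
    # 2xx and 3xx count as a successful response at this point.
    if pattern is not None and re.search(pattern, body) is None:
        return CRITICAL  # server answered, but the expected content is absent
    return OK


print(evaluate_http_check(None))                                   # CRITICAL
print(evaluate_http_check(300))                                    # OK
print(evaluate_http_check(200, "status: lagging", r"status: ok"))  # CRITICAL
```

The body pattern here (`status: ok`) is purely illustrative; the log does not show what the real dispatch-lag check looks for.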
[23:03:31] RECOVERY - glance-api http on labtestcontrol2001 is OK: HTTP OK: HTTP/1.1 300 Multiple Choices - 817 bytes in 0.076 second response time
[23:25:06] Operations, Traffic: varnish: discard cold vcl - https://phabricator.wikimedia.org/T187778#4008615 (BBlack)
[23:25:09] Operations, Traffic, Patch-For-Review: VCL discards crash varnish frontend child process - https://phabricator.wikimedia.org/T188089#4008613 (BBlack) Open>Resolved a:BBlack
[23:27:31] (CR) Krinkle: [C: 2] Remove unused pp_stage1_raw dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/413978 (owner: Krinkle)
[23:27:40] (CR) jerkins-bot: [V: -1] Remove unused pp_stage1_raw dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/413978 (owner: Krinkle)
[23:27:49] (PS3) Krinkle: Remove unused pp_stage1_raw dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/413978
[23:28:21] (PS1) Cmjohnson: updating netboot.cfg for analytics1070-77 [puppet] - https://gerrit.wikimedia.org/r/415184 (https://phabricator.wikimedia.org/T188294)
[23:32:24] (CR) jenkins-bot: Remove unused pp_stage1_raw dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/413978 (owner: Krinkle)
[23:32:39] Staging ^ on mwdebug1002
[23:33:57] Hm... noc.wikimdia.org doesn't obey x-wikimedia-debug
[23:34:07] It technically would work fine on any app server afaik.
[23:34:14] Oh well
[23:35:11] Ah, it also doesn't run on tin, it runs on terbium.
[23:35:14] Hm.. okay
[23:37:26] no_justification: you think it's okay to do now a wmf.22 backport?
[23:37:51] Ask thcipriani, he's train conductor this week
[23:38:04] is he finished with group0 already?
[23:38:19] Krinkle: I don't see why we couldn't load balance it across all apaches, but maybe skip caching
[23:38:40] One node creates a spof, and as you said no reason it couldn't be on any of 'em
[23:38:49] no_justification: Well, I kind of like it being isolated in terms of traffic. Makes it more consistent, and also less risky given it's less quality code.
[23:38:51] !log krinkle@tin Synchronized dblists/: remove pp_stage1_raw.dblist (duration: 01m 14s)
[23:38:59] It is already balanced between codfw/eqiad it seems
[23:39:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:08] In which case, the deploy master makes more sense than terbium/wasat
[23:39:09] :)
[23:39:17] Anyway, bikesheds.
[23:39:18] Right, I do agree with that^
[23:39:25] tin/mira instead of terbium/wasat
[23:39:37] But maybe we dont' want user traffic on the deployment host
[23:39:41] Another form of isolation
[23:40:10] Anyway, a scap pull on terbium/wasat is as easy as a pull from mwdebug1002
[23:40:11] so not to worried
[23:40:21] Just forgot where it ran
[23:47:37] (PS4) Chad: Beta autoupdate: Clean up, support wmf-config itself [mediawiki-config] - https://gerrit.wikimedia.org/r/414909
[23:49:09] Hauskatze: I am done with group0
[23:49:42] Krinkle: So, we just re-implemented two Jenkins jobs containing 143 lines of Python and 19 lines of bash....using 36 lines of a scap plugin?
[23:50:17] thcipriani: think we can have https://gerrit.wikimedia.org/r/#/c/414972/ merged now and the unblock script run so https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Drytime%25$1600 gets done?
[23:51:23] no_justification: Krinkle are you in the middle of deploying anything right now? Or can I sneak in ^
[23:52:10] (CR) Krinkle: Beta autoupdate: Clean up, support wmf-config itself (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/414909 (owner: Chad)
[23:52:19] thcipriani: Go ahead :)
[23:52:27] thanks
[23:56:55] Hauskatze: it's live on mwdebug1002 if there is anything you want to test there before broader deployment
[23:57:02] (PS5) Chad: Beta autoupdate: Clean up, support wmf-config itself [mediawiki-config] - https://gerrit.wikimedia.org/r/414909
[23:57:13] thcipriani: I'll see if the wikis ain't broken
[23:57:21] thanks :)
[23:58:02] thcipriani: wikis showing normal, cannot test further from my side
[23:58:45] ok, going live
[23:59:51] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:59:51] PROBLEM - puppet last run on labvirt1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues