[00:00:05] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170209T0000). [00:00:05] jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [00:00:39] jouncebot: how about me [00:01:13] mutante: you programmed it 24 hours earlier [00:01:28] I can SWAT. [00:01:28] (I was expecting that, too) [00:02:13] mutante: yeah, that's easy to shift one day in the deployments table [00:02:19] Platonides: you were expecting it because it happens every time ?:p [00:02:29] i am fixing [00:02:35] Platonides: if you add a last minute patch, you can ask "jouncebot: refresh" [00:02:40] (03CR) 10Bmansurov: [C: 031] Beta cluster should show related pages 100% of time [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336732 (https://phabricator.wikimedia.org/T157372) (owner: 10Jdlrobson) [00:02:52] mutante: I was expecting your entry :P [00:03:29] RECOVERY - puppet last run on labvirt1014 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [00:03:38] Platonides: by the way, you've some config changes waiting [00:04:06] Platonides: https://gerrit.wikimedia.org/r/#/q/project:operations/mediawiki-config+owner:%22Platonides+%253Cplatonides%2540gmail.com%253E%22+status:open [00:04:31] Platonides: I imagine we could deploy Quiz on es.wikibooks and if you quickly fix it the throttle rule? 
[00:04:58] Dereckson: i fixed the calendar entry [00:04:59] (03CR) 10Dereckson: [C: 031] Remove outdated comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336704 (owner: 10Platonides) [00:05:18] that should be trivial to rebase [00:05:32] (03PS2) 10Dereckson: Use https:// urls when communicating with PediaPress [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336340 (https://phabricator.wikimedia.org/T157398) (owner: 10Platonides) [00:05:44] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336340 (https://phabricator.wikimedia.org/T157398) (owner: 10Platonides) [00:06:05] thanks [00:07:07] (03Merged) 10jenkins-bot: Use https:// urls when communicating with PediaPress [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336340 (https://phabricator.wikimedia.org/T157398) (owner: 10Platonides) [00:07:15] (03CR) 10jenkins-bot: Use https:// urls when communicating with PediaPress [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336340 (https://phabricator.wikimedia.org/T157398) (owner: 10Platonides) [00:07:16] note: the merge failed on the eswikibooks patch, not the throttle one [00:07:50] i'll handle the style [00:08:00] Platonides: fwiw, we can't test it because "The service is currently down." [00:08:11] say the docs [00:08:23] is it? [00:08:34] Platonides: mutante: live on mwdebug1002 [00:08:34] i just say it is claimed by https://en.wikipedia.org/wiki/Help:Books/PediaPress_PDF_rendering [00:08:47] that's probably the public server [00:09:33] works for me [00:09:42] Is swat happening Dereckson ? 
[00:09:53] some how i missed my ping sorry [00:09:57] jdlrobson: yes [00:10:02] jdlrobson: you're next :) [00:10:04] (03PS3) 10Platonides: Enable Quiz on Spanish Wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336553 (https://phabricator.wikimedia.org/T157513) [00:10:06] sweet :) [00:10:09] rebased [00:10:20] Dereckson: i'll have one more config patch soon for swat if it fits [00:10:21] Dereckson: thanks, it can go ahead [00:10:29] okay, syncing [00:10:31] ebernhardson: ack'ed [00:10:41] ebernhardson: yes, we've still free slots for changes [00:11:01] !log dereckson@tin Synchronized wmf-config/CommonSettings.php: Use https:// urls when communicating with PediaPress (T157398) (duration: 00m 41s) [00:11:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:11:08] T157398: Book collections communicate with pediapress using http: - https://phabricator.wikimedia.org/T157398 [00:11:38] (03PS2) 10Dereckson: wgMinervaUseFooterV2 config flag no longer necessary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336664 (https://phabricator.wikimedia.org/T157075) (owner: 10Jdlrobson) [00:12:19] jdlrobson: ok, seen the note about ready to land today instead to wait the 9 [00:13:03] (03PS1) 10EBernhardson: Switc to SiteMatrixInterwikiResolver for AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336738 [00:13:12] (03CR) 10jerkins-bot: [V: 04-1] Switc to SiteMatrixInterwikiResolver for AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336738 (owner: 10EBernhardson) [00:13:39] (03PS2) 10EBernhardson: Switc to SiteMatrixInterwikiResolver for AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336738 [00:14:47] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336664 (https://phabricator.wikimedia.org/T157075) (owner: 10Jdlrobson) [00:15:59] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 
SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.006 second response time [00:16:00] (03PS2) 10Platonides: Raise account creation limit on it.wikiversity for a couple of schools [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336552 (https://phabricator.wikimedia.org/T157504) [00:16:28] Dereckson: did you see the correction to the note that we dont need to wait till the 9th anymore? [00:16:46] I confirmed that Special:Book - Preview with Pediapress still works fine after synx [00:16:46] (answer to my own question: yes you did) [00:16:52] (03Merged) 10jenkins-bot: wgMinervaUseFooterV2 config flag no longer necessary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336664 (https://phabricator.wikimedia.org/T157075) (owner: 10Jdlrobson) [00:16:55] (03CR) 10Platonides: "It was indeed ugly. Sorry about that." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336552 (https://phabricator.wikimedia.org/T157504) (owner: 10Platonides) [00:16:59] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.006 second response time [00:17:04] (03CR) 10jenkins-bot: wgMinervaUseFooterV2 config flag no longer necessary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336664 (https://phabricator.wikimedia.org/T157075) (owner: 10Jdlrobson) [00:17:06] jdlrobson: yes, I've seen it [00:17:11] mutante: ok [00:17:32] jdlrobson: live on mwdebug1002 [00:17:37] Dereckson: testing :) [00:17:49] (03CR) 10Dzahn: "confirmed that Special:Book, creating a book, preview with Pediapress, still works fine after sync" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336340 (https://phabricator.wikimedia.org/T157398) (owner: 10Platonides) [00:18:10] (03CR) 10Dereckson: [C: 032] "SWAT (no-op in prod, labs only)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336732 (https://phabricator.wikimedia.org/T157372) (owner: 10Jdlrobson) [00:18:16] 
06Operations, 10Collection, 10Traffic, 07HTTPS, 13Patch-For-Review: Book collections communicate with pediapress using http: - https://phabricator.wikimedia.org/T157398#3011735 (10Dzahn) confirmed that Special:Book, creating a book, preview with Pediapress, still works fine after sync of the change above... [00:18:35] 06Operations, 10Collection, 10Traffic, 07HTTPS, 13Patch-For-Review: Book collections communicate with pediapress using http: - https://phabricator.wikimedia.org/T157398#3011736 (10Dzahn) 05Open>03Resolved a:05Dzahn>03Platonides [00:19:01] Dereckson: LGTM [00:19:08] * Dereckson nods [00:19:25] (03CR) 10Platonides: "I too confirm that there doesn't seem to be any loss of functionality." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336340 (https://phabricator.wikimedia.org/T157398) (owner: 10Platonides) [00:19:39] (03Merged) 10jenkins-bot: Beta cluster should show related pages 100% of time [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336732 (https://phabricator.wikimedia.org/T157372) (owner: 10Jdlrobson) [00:19:48] (03CR) 10jenkins-bot: Beta cluster should show related pages 100% of time [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336732 (https://phabricator.wikimedia.org/T157372) (owner: 10Jdlrobson) [00:20:54] (03PS2) 10Dereckson: Labs instances should reflect production value for RelatedArticlesShowInFooter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336733 (owner: 10Jdlrobson) [00:20:54] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Prune wgMinervaUseFooterV2 (T157075) (duration: 00m 41s) [00:20:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:21:00] T157075: Cleanup feature flags for footer work - https://phabricator.wikimedia.org/T157075 [00:21:09] (03CR) 10Dereckson: [C: 032] "SWAT (no-op in prod, labs only)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336733 (owner: 10Jdlrobson) [00:22:28] (03Merged) 10jenkins-bot: Labs instances should reflect 
production value for RelatedArticlesShowInFooter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336733 (owner: 10Jdlrobson) [00:22:40] (03CR) 10jenkins-bot: Labs instances should reflect production value for RelatedArticlesShowInFooter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336733 (owner: 10Jdlrobson) [00:23:08] I add the eswikibooks config? [00:23:40] Platonides: k [00:24:10] !log dereckson@tin Synchronized wmf-config/InitialiseSettings-labs.php: Configuration changes for RelatedArticles (labs only, [[Gerrit:336732]] and [[Gerrit:336733]]) (duration: 00m 40s) [00:24:13] (03PS4) 10Platonides: Enable Quiz on Spanish Wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336553 (https://phabricator.wikimedia.org/T157513) [00:24:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:24:15] (03PS5) 10Dereckson: Enable Quiz on Spanish Wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336553 (https://phabricator.wikimedia.org/T157513) (owner: 10Platonides) [00:24:37] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336553 (https://phabricator.wikimedia.org/T157513) (owner: 10Platonides) [00:25:22] jdlrobson: you're happy with the labs change? They should have reached beta now [00:25:39] yup [00:25:40] looking good [00:25:48] Thanks! Is the footer one live now? 
[00:25:51] (03Merged) 10jenkins-bot: Enable Quiz on Spanish Wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336553 (https://phabricator.wikimedia.org/T157513) (owner: 10Platonides) [00:26:00] (03CR) 10jenkins-bot: Enable Quiz on Spanish Wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336553 (https://phabricator.wikimedia.org/T157513) (owner: 10Platonides) [00:26:55] Platonides: Quiz on Spanish Wikibooks live on mwdebug1002 [00:27:04] jdlrobson: I think so, yes [00:27:12] 06Operations, 10Ops-Access-Requests: Request for access to stat1003 for Sam Tarling - https://phabricator.wikimedia.org/T157483#3011760 (10RobH) @Samtar: I forgot one more thing. New requirements mean we need to list an email account with the new shell account. What email address would you like used? Additi... [00:27:27] jdlrobson: at least, it's synced to prod [00:27:31] Brilliant thank you Dereckson ! :D [00:27:35] You're welcome [00:28:29] (03PS3) 10Dereckson: Switc to SiteMatrixInterwikiResolver for AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336738 (owner: 10EBernhardson) [00:28:59] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.008 second response time [00:29:13] Dereckson: quick question [00:29:21] ebernhardson: User:... subpages aren't the more stable URLs from experience [00:29:36] jdlrobson: yes? [00:29:36] If default in RelatedArticlesEnabledSamplingRate is set to 1 will that override 'enwiki'=> 0.9 in production [00:29:39] or do i need to be explicit? [00:29:43] i'm not seeing that working [00:29:45] Dereckson: doesn't seem to be active? [00:30:04] Dereckson: ? 
[00:30:11] jdlrobson: enwiki will override the 'wikipedia' value, the 'wikipedia' value will override 'default' [00:30:19] jdlrobson: database > dblist > default [00:30:24] doh [00:30:41] (03PS1) 10Jdlrobson: Also override the production value of enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336740 [00:30:44] ^ Dereckson i have this follow up then [00:30:46] ok [00:31:16] jdlrobson: the goal of these priorities is to submit sensible defaults for all the cluster, or a part of it, then customize the setting per wiki [00:31:58] ebernhardson: you've added https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Spaceless_Writing_Systems_and_Wiki-Projects as a comment [00:32:05] Dereckson: using mwdebug1002, quiz is not listed on https://es.wikibooks.org/wiki/Especial:Versión [00:32:34] Platonides: okay, fixed [00:32:48] sorry about that Dereckson [00:32:59] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.006 second response time [00:33:18] it loads now [00:33:18] Platonides: it seems to be there now :) [00:33:24] and reacts to tag [00:33:28] ok [00:33:32] logs are happy too [00:33:35] Dereckson: hmm, that shuldn't be part of my patch, that should already be configured. Sec i gotta check history [00:33:36] nice [00:34:47] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable Quiz on Spanish Wikibooks (duration: 00m 41s) [00:34:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:35:44] Dereckson: ahh, now i remember that is a change i was going to deploy, but went a different direction. Sec i'll push a new patch. 
[00:35:51] ebernhardson: ok [00:36:37] (03PS4) 10EBernhardson: Switch to SiteMatrixInterwikiResolver for AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336738 [00:36:42] Dereckson: ^ should be right now [00:37:34] ok [00:38:28] lol, flow thanked me for my tenth edit to eswikibooks [00:39:09] Platonides: are you sure it's an automated thank? afaik thank's are only done by users (but i havn't been in that codebase in awhile) [00:39:47] it doesn't have an user associated [00:39:53] interesting [00:40:54] Dereckson: is it possible to get https://gerrit.wikimedia.org/r/336740 in the SWAT window or will I have to wait till another time? [00:41:34] (03PS2) 10Dereckson: Always enable wgRelatedArticlesEnabledSamplingRate on en.wikipedia.beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336740 (owner: 10Jdlrobson) [00:41:42] jdlrobson: would it really be doing anything? [00:41:43] (03PS3) 10Dereckson: Always enable wgRelatedArticlesEnabledSamplingRate on en.wikipedia.beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336740 (owner: 10Jdlrobson) [00:42:16] jdlrobson: yes, yes, I was going to merge it, labs change are generally okay [00:42:17] it looks like a no-op [00:42:48] jdlrobson: for follow-up, if you use commits hashes instead of Gerrit ones, you allow to browse them in Phabricator by the way [00:42:57] (03CR) 10Dereckson: [C: 032] "SWAT (no-op, labs only)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336740 (owner: 10Jdlrobson) [00:43:18] jdlrobson: Phabricator will cross-link them [00:43:33] Dereckson: good advice! [00:43:38] (and Gerrit also allows to click on hashes, and GitHub too) [00:43:58] Platonides .. so in InitialiseSettings.php 'enwiki' is 0.9 [00:44:03] and I'm seeing 0.9 on the beta cluster. [00:44:26] I assume the 0.9 is applying here too so I need to override it in the beta cluster? 
[00:44:26] (03Merged) 10jenkins-bot: Always enable wgRelatedArticlesEnabledSamplingRate on en.wikipedia.beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336740 (owner: 10Jdlrobson) [00:44:34] jdlrobson: track the Jenkins job to push config, will be 1 soon [00:44:43] beta cluster is picking it from production? :s [00:44:45] (03CR) 10jenkins-bot: Always enable wgRelatedArticlesEnabledSamplingRate on en.wikipedia.beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336740 (owner: 10Jdlrobson) [00:44:48] Platonides: yes [00:44:50] well, maybe [00:45:02] Platonides: usually the production values apply on the beta cluster unless the beta cluster overrides them [00:45:05] at least that's my understanding [00:45:06] I'm not familiar with how prod is loaded there [00:45:22] I thought just setting a default value in labs would override anything more specific in production but that doesnt seem to be the case [00:45:30] Platonides: IS is loaded first, then it's overrided by IS-labs [00:46:25] ok [00:47:12] Platonides: by the way, could you fix too the throttle rule change please? [00:47:36] !log dereckson@tin Synchronized wmf-config/InitialiseSettings-labs.php: Configuration change for RelatedArticles (labs only, [[Gerrit:336740]]) (duration: 00m 40s) [00:47:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:47:47] I thought I edited it [00:48:12] I did [00:48:15] (03PS5) 10Dereckson: Switch to SiteMatrixInterwikiResolver for AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336738 (owner: 10EBernhardson) [00:48:32] ebernhardson: so https://gerrit.wikimedia.org/r/#/c/336738/ is fine now? 
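Note on the override order discussed above (database > dblist > default, with InitialiseSettings.php loaded first and InitialiseSettings-labs.php applied on top): a minimal Python sketch of why a labs-only 'default' of 1 did not beat the production 'enwiki' value of 0.9. This is an illustrative model only, not the actual MediaWiki SiteConfiguration code. The function names `overlay` and `resolve`, the per-selector merge, and the production 'default' of 0.0 are assumptions made for the example; only the 0.9 and 1 values come from the chat.

```python
SETTING = "wgRelatedArticlesEnabledSamplingRate"

# Production layer. 'enwiki': 0.9 is quoted in the chat; the 'default'
# value here is a hypothetical placeholder.
PRODUCTION = {SETTING: {"default": 0.0, "enwiki": 0.9}}

# Labs layer as first deployed: only a 'default' entry.
LABS_FIRST_TRY = {SETTING: {"default": 1}}

# Labs layer after the follow-up patch: 'enwiki' is overridden explicitly.
LABS_FOLLOW_UP = {SETTING: {"default": 1, "enwiki": 1}}


def overlay(base, override):
    """Apply the override layer on top of the base layer, selector by selector.

    Assumption: selectors present only in the base layer survive the merge.
    """
    merged = {name: dict(values) for name, values in base.items()}
    for name, values in override.items():
        merged.setdefault(name, {}).update(values)
    return merged


def resolve(values, wiki, dblists):
    """Pick the most specific value: database, then dblist, then default."""
    if wiki in values:
        return values[wiki]
    for dblist in dblists:
        if dblist in values:
            return values[dblist]
    return values["default"]


first_try = overlay(PRODUCTION, LABS_FIRST_TRY)[SETTING]
follow_up = overlay(PRODUCTION, LABS_FOLLOW_UP)[SETTING]

# The production 'enwiki' entry is still present and more specific than
# the labs 'default', so it wins until labs sets 'enwiki' itself.
rate_before = resolve(first_try, "enwiki", ["wikipedia"])
rate_after = resolve(follow_up, "enwiki", ["wikipedia"])
```

Under this model, a wiki with no entry of its own (and no matching dblist entry) picks up the labs 'default', which is why the first patch looked correct everywhere except on the wiki that had a production-specific value.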
[00:48:51] the commit msg is still a bit long, but it's good enough if it was truncated at 52 [00:49:01] Platonides: 52 is awesome [00:49:04] Dereckson: yes [00:49:21] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336738 (owner: 10EBernhardson) [00:49:22] heh, no [00:49:37] it's near the upper limit ;) [00:49:58] Platonides: yes, http://chris.beams.io/posts/git-commit/ suggests 50 max [00:50:20] But what are really annoying are the > 72 [00:50:22] (03PS3) 10Platonides: Raise account creation limit on it.wikiversity for a couple of schools [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336552 (https://phabricator.wikimedia.org/T157504) [00:50:27] GitHub for example truncates the first line at 72 [00:50:31] I had left it at 71 :P [00:50:47] Platonides: btw for your thank, it looks like a particular user thanked you, based on the thank logs: https://es.wikibooks.org/w/index.php?title=Especial%3ARegistro&type=thanks&user=&page=&year=&month=-1&tagfilter=&hide_thanks_log=1&hide_tag_log=1&uselang=en sounds like there might be a problem rendering them if it doesn't say that user in your notification [00:50:47] feel free to amend it [00:51:02] Zuul is slow today [00:51:17] (03Merged) 10jenkins-bot: Switch to SiteMatrixInterwikiResolver for AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336738 (owner: 10EBernhardson) [00:51:17] not worth thinking how to rephrase it to be shorter imho, [00:51:25] (03CR) 10jenkins-bot: Switch to SiteMatrixInterwikiResolver for AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336738 (owner: 10EBernhardson) [00:51:40] ebernhardson: yes [00:51:51] ebernhardson: live on mwdebug1002 [00:52:00] he thanked me for requesting the install (ie. opening the phabricator ticket) [00:52:49] Dereckson: looks to work as expected [00:53:21] Dereckson: it should be live right? 
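Note on the subject-line lengths mentioned above (50 suggested by the linked post, and anything over 72 described as the annoying case): a small Python sketch that classifies a commit subject against those two limits. The function name `classify_subject` and the labels it returns are hypothetical; the thresholds are the ones quoted in the chat.

```python
SUGGESTED_MAX = 50  # limit suggested by the post linked in the chat
HARD_MAX = 72       # beyond this, the chat calls subjects "really annoying"


def classify_subject(message: str) -> str:
    """Classify the first line of a commit message by its length."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    length = len(subject)
    if length <= SUGGESTED_MAX:
        return "ok"
    if length <= HARD_MAX:
        return "long"
    return "too long"


short_label = classify_subject("Switch to SiteMatrixInterwikiResolver for AB test")
long_label = classify_subject(
    "Raise account creation limit on it.wikiversity for a couple of schools"
)
```

Only the first line is measured, so a long body after a short subject is still classified as "ok".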
[00:53:24] because i'm not seeing it :/ [00:53:50] ohh now i am [00:54:06] Dereckson: <3 thanks a bunch gotta run but thankyou for all your help today! [00:54:08] jdlrobson: there is a Jenkins job to track for the push to beta [00:54:09] You're welcome [00:55:29] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:55:56] good night everyone [00:56:27] Good night Platonides, thanks for your help for config changes. [00:56:35] ebernhardson: okay, syncing [00:57:20] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Switch to SiteMatrixInterwikiResolver for AB test ([[Gerrit:336738]]) (duration: 00m 41s) [00:57:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:57:28] (03PS4) 10Dereckson: Raise account creation limit on it.wikiversity for a couple of schools [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336552 (https://phabricator.wikimedia.org/T157504) (owner: 10Platonides) [00:57:42] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336552 (https://phabricator.wikimedia.org/T157504) (owner: 10Platonides) [00:57:44] And the last. [00:59:21] (03Merged) 10jenkins-bot: Raise account creation limit on it.wikiversity for a couple of schools [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336552 (https://phabricator.wikimedia.org/T157504) (owner: 10Platonides) [00:59:38] (03CR) 10jenkins-bot: Raise account creation limit on it.wikiversity for a couple of schools [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336552 (https://phabricator.wikimedia.org/T157504) (owner: 10Platonides) [01:00:04] twentyafterfour: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170209T0100). 
[01:02:40] !log starting phabricator deployment #phab-2017-02-08 [01:02:40] (03PS1) 10Dereckson: Clean throttle rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336741 [01:02:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:02:59] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336741 (owner: 10Dereckson) [01:02:59] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.223 second response time [01:04:39] (03Merged) 10jenkins-bot: Clean throttle rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336741 (owner: 10Dereckson) [01:06:11] Works, syncing. [01:06:49] (03CR) 10jenkins-bot: Clean throttle rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336741 (owner: 10Dereckson) [01:07:21] !log dereckson@tin Synchronized wmf-config/throttle.php: Update throttle rules ([[Gerrit:336552]] for it.wikiversity + [[Gerrit:336741]] for cleaning) (duration: 00m 40s) [01:07:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:07:38] SWAT done. [01:07:59] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [01:10:26] !log phabricator upgrade finished. 
[01:10:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:12:59] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.006 second response time [01:15:23] (03CR) 10Tim Landscheidt: "That gives me an opportunity to canvas for https://www.mediawiki.org/wiki/Developer_Wishlist/2017/Code_Contribution_(Process,_Guidelines,_" [puppet] - 10https://gerrit.wikimedia.org/r/336351 (https://phabricator.wikimedia.org/T157400) (owner: 10Tim Landscheidt) [01:15:59] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.005 second response time [01:23:29] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [01:27:59] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.007 second response time [01:29:59] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.119 second response time [01:30:59] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.008 second response time [02:17:29] PROBLEM - Redis replication status tcp_6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.32.133 on port 6479 [02:18:29] RECOVERY - Redis replication status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 3412663 keys, up 100 days 17 hours - replication_delay is 0 [02:18:39] PROBLEM - Redis replication status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 
10.192.48.44 on port 6479 [02:19:39] RECOVERY - Redis replication status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3412630 keys, up 100 days 17 hours - replication_delay is 0 [02:36:11] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.10) (duration: 15m 05s) [02:36:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:49:42] Every time I try to log in on the Beta Cluster, I get a database error (and can't log in). I'll file a Phab ticket... [02:59:59] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [03:02:59] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [03:05:39] PROBLEM - puppet last run on mw1250 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:09:11] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.11) (duration: 15m 19s) [03:09:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:14:39] PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [03:14:55] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Feb 9 03:14:55 UTC 2017 (duration 5m 44s) [03:15:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:16:40] (03PS1) 10Kaldari: Setting $wgPageAssessmentsSubprojects to true on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336743 (https://phabricator.wikimedia.org/T157654) [03:22:19] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 785.89 seconds [03:33:39] RECOVERY - puppet last run on mw1250 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [03:34:19] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 225.91 seconds [03:42:39] RECOVERY - puppet last run on elastic1017 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [04:01:59] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [04:04:29] PROBLEM - puppet last run on maps1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [04:04:59] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [04:23:59] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=851.20 Read Requests/Sec=404.10 Write Requests/Sec=1.90 KBytes Read/Sec=41366.00 KBytes_Written/Sec=146.40 [04:33:29] RECOVERY - puppet last run on maps1003 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [04:34:59] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=0.70 Read Requests/Sec=1.50 Write Requests/Sec=0.40 KBytes Read/Sec=28.40 KBytes_Written/Sec=5.60 [04:39:19] PROBLEM - Host mr1-esams.oob is DOWN: PING CRITICAL - Packet loss = 100% [04:56:59] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [04:59:59] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [05:03:59] PROBLEM - puppet last run on notebook1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [05:05:49] RECOVERY - Host mr1-esams.oob is UP: PING OK - Packet loss = 0%, RTA = 119.37 ms [05:32:59] RECOVERY - puppet last run on notebook1001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [05:44:59] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [05:48:00] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [06:14:29] PROBLEM - Host mr1-esams.oob is DOWN: PING CRITICAL - Packet loss = 100% [06:17:42] (03PS1) 10Gergő Tisza: Fix SiteConfiguration array merge syntax [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336747 (https://phabricator.wikimedia.org/T157656) [06:21:38] (03PS2) 10Gergő Tisza: Fix SiteConfiguration array merge syntax [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336747 (https://phabricator.wikimedia.org/T157656) [06:33:59] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [06:36:59] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [06:58:33] 06Operations, 10Ops-Access-Requests: Request for access to stat1003 for Sam Tarling - https://phabricator.wikimedia.org/T157483#3012203 (10Samtar) @RobH Awesome, thank you - samtar.on.en.wp@gmail.com would be my preferred email to be listed [06:59:29] PROBLEM - Disk space on elastic2001 is CRITICAL: DISK CRITICAL - free space: / 4355 MB (9% inode=98%) [07:00:19] PROBLEM - Disk space on elastic2016 is CRITICAL: DISK CRITICAL - free space: / 2884 MB (6% inode=98%) [07:06:45] (03PS1) 10Marostegui: db-eqiad.php: Repool db1073 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336750 (https://phabricator.wikimedia.org/T156126) [07:09:55] (03CR) 
10Marostegui: [C: 032] db-eqiad.php: Repool db1073 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336750 (https://phabricator.wikimedia.org/T156126) (owner: 10Marostegui) [07:11:26] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1073 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336750 (https://phabricator.wikimedia.org/T156126) (owner: 10Marostegui) [07:11:34] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1073 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336750 (https://phabricator.wikimedia.org/T156126) (owner: 10Marostegui) [07:13:01] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1073 - T156126 (duration: 00m 42s) [07:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:13:09] T156126: Move db1073 to B3 - https://phabricator.wikimedia.org/T156126 [07:19:06] 06Operations, 10DBA, 06Labs, 10netops, 13Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#3012217 (10Marostegui) [07:19:10] 06Operations, 10DBA, 13Patch-For-Review: Move db1073 to B3 - https://phabricator.wikimedia.org/T156126#3012215 (10Marostegui) 05Open>03Resolved Server is repooled this can be closed. 
[07:19:59] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.006 second response time [07:21:59] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [07:22:29] RECOVERY - MariaDB Slave Lag: s1 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 89984.10 seconds [07:22:53] 06Operations, 06Labs, 10netops: asw-c2-eqiad reboots & fdb_mac_entry_mc_set() issues - https://phabricator.wikimedia.org/T155875#3012222 (10Marostegui) [07:22:55] 06Operations, 10DBA, 06Labs, 10netops, 13Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#3012218 (10Marostegui) 05Open>03Resolved All the initial actions listed on the original ticket to mitigate this issue have been completed, the only pending... [07:23:31] 06Operations, 06Performance-Team, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Prepare a reasonably performant warmup tool for MediaWiki caches (memcached/apc) - https://phabricator.wikimedia.org/T156922#3012223 (10Joe) Another interesting possibility we might want to... 
[07:23:59] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.007 second response time [07:24:40] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1064" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336759 [07:24:59] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [07:28:49] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1064" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336759 (owner: 10Marostegui) [07:28:59] RECOVERY - Host mr1-esams.oob is UP: PING OK - Packet loss = 0%, RTA = 85.39 ms [07:30:45] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1064" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336759 [07:32:49] PROBLEM - puppet last run on elastic2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:35:29] PROBLEM - Check HHVM threads for leakage on mw1259 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [07:36:29] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 13Patch-For-Review, and 2 others: Create an etcd cluster in codfw - https://phabricator.wikimedia.org/T156009#3012230 (10Joe) The codfw cluster is getting replicated data from eqiad under `/eqiad.wmnet/conftool`. What remains to be done: [] Monitorin... [07:36:49] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:40:03] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1064" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336759 (owner: 10Marostegui) [07:41:38] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1064 - T153743 (duration: 00m 40s) [07:41:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:41:42] T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743 [07:42:42] (03PS1) 10Marostegui: db-eqiad.php: Add comment for db1064 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336760 (https://phabricator.wikimedia.org/T153743) [07:44:27] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Add comment for db1064 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336760 (https://phabricator.wikimedia.org/T153743) (owner: 10Marostegui) [07:45:57] (03Merged) 10jenkins-bot: db-eqiad.php: Add comment for db1064 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336760 (https://phabricator.wikimedia.org/T153743) (owner: 10Marostegui) [07:46:08] !log Renamed some logs in /var/log (adding _renamed) on aluminum, elastic102[46]/1040 to avoid cronspam and logrotate failures [07:46:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:46:15] gehel: --^ [07:46:42] (03CR) 10jenkins-bot: db-eqiad.php: Add comment for db1064 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336760 (https://phabricator.wikimedia.org/T153743) (owner: 10Marostegui) [07:47:16] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Added comment for db1064 being master of db1095 - T153743 (duration: 00m 40s) [07:47:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:47:20] T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743 [07:50:19] PROBLEM - Check HHVM threads for leakage on 
mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [07:52:29] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [07:52:52] logrotate --^ [07:53:17] theoretically it should have waited 1.30h before alarming [07:54:00] ah no only 1h [07:57:09] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:04:49] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [08:13:59] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.011 second response time [08:17:18] !log Compressing commonswiki tables on db1095 - T153743 [08:17:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:17:24] T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743 [08:21:59] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.215 second response time [08:26:09] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [08:30:59] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [08:33:59] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.008 second response time [08:33:59] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on 
labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [08:34:09] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:37:59] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.009 second response time [08:44:18] !installing Java security updates on stat* and contint1001 [08:45:31] !log installing Java security updates on stat* and contint1001 [08:45:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:52] elukey: thanks for the log rotation... having a check right now... [08:47:49] gehel: not sure what is the root cause, it happens sometimes.. I just wanted to let you know that if you see "_renamed" filenames it was me :D [08:49:54] elukey: `find /var/log -name "*_renamed"` on those elastic servers does not return anything... Did I misunderstand your message? [08:51:08] !log installing Java security updates on Hadoop cluster [08:51:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:51:20] let me see what myself with low caffeine level did [08:51:35] elukey: it sounds like a leftover from last Sunday log explosion (known issue), but I'd like to check [08:51:42] * gehel is getting coffee as well :P [08:59:14] !log cleaning empty logs on elastic10(22|24|40) - thanks elukey ! 
[08:59:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:01:28] !log restarting java daemons on all the Hadoop nodes for security upgrades [09:01:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:03:09] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [09:07:42] (03PS13) 10Paladox: Gerrit: Add a systemd init script fro gerrit [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 [09:07:53] !log Executing Cassandra nodetool cleanup on aqs1006-{a,b} (one at the time) and aqs1009-a [09:07:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:08:09] !log restarting archiva on meitnerium for java security update [09:08:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:10:21] !log Deploy alter table on codfw hosts for s7 metawiki and wiki on the echo_notification tables - T136428 [09:10:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:10:25] T136428: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428 [09:10:59] RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [09:11:59] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [09:13:24] (03PS1) 10Tim Landscheidt: toolserver_legacy: Add redirect for ~wiegels/wikipedia-termine.php [puppet] - 10https://gerrit.wikimedia.org/r/336764 (https://phabricator.wikimedia.org/T62888) [09:14:59] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [09:16:33] (03PS2) 10Jcrespo: Revert "mariadb: Depool db2057 for mariadb upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336657 [09:17:05] !log restarting blazegraph on 
wdqs1003 to ensure proper war is loaded [09:17:05] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db2057 for mariadb upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336657 (owner: 10Jcrespo) [09:17:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:18:24] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db2057 for mariadb upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336657 (owner: 10Jcrespo) [09:18:33] (03CR) 10jenkins-bot: Revert "mariadb: Depool db2057 for mariadb upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336657 (owner: 10Jcrespo) [09:19:59] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.007 second response time [09:20:18] 06Operations, 10Dumps-Generation: determine hardware needs for dumps in eqiad and codfw - https://phabricator.wikimedia.org/T118154#3012380 (10fgiunchedi) >>! In T118154#3009098, @ArielGlenn wrote: > - @fgiunchedi mentioned that the esams swift cluster could be used to hold dumps as a viability test if we wan... [09:20:29] !log jynus@tin Synchronized wmf-config/db-codfw.php: Repool db2057 (duration: 00m 41s) [09:20:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:21:36] (03PS2) 10Jcrespo: mariadb: Depool db1034 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336661 (https://phabricator.wikimedia.org/T111654) [09:22:17] 06Operations, 10ops-eqiad, 10media-storage: Degraded RAID on ms-be1013 - https://phabricator.wikimedia.org/T155907#3012395 (10fgiunchedi) 05Open>03Resolved thanks @Cmjohnson ! 
disk is rebuilding [09:22:57] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1034 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336661 (https://phabricator.wikimedia.org/T111654) (owner: 10Jcrespo) [09:24:42] (03Merged) 10jenkins-bot: mariadb: Depool db1034 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336661 (https://phabricator.wikimedia.org/T111654) (owner: 10Jcrespo) [09:24:59] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.007 second response time [09:26:46] 06Operations, 07Puppet, 07Documentation, 07Need-volunteer, 13Patch-For-Review: document all puppet classes / defined types!? - https://phabricator.wikimedia.org/T127797#3012412 (10hashar) [09:27:09] (03CR) 10jenkins-bot: mariadb: Depool db1034 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336661 (https://phabricator.wikimedia.org/T111654) (owner: 10Jcrespo) [09:27:24] 06Operations, 07Puppet, 07Documentation, 07Need-volunteer, 13Patch-For-Review: document all puppet classes / defined types!? - https://phabricator.wikimedia.org/T127797#2054254 (10hashar) Via another task, I made the puppet documentation to be generated using `yard` (a ruby documentation generator). The... [09:29:39] (03PS1) 10Gehel: elasticsearch - reimage elastic20(17|18|19|20) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/336765 (https://phabricator.wikimedia.org/T151326) [09:31:17] 06Operations, 07Puppet, 07Epic, 07Need-volunteer, 13Patch-For-Review: align puppet-lint config with coding style - https://phabricator.wikimedia.org/T93645#3012422 (10hashar) `--no-autoloader_layout-check` is gone which is quite an achievement! The last ones are: * lines to have 80 characters, which I... 
[09:32:09] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=elastic20(17|18|19|20).codfw.wmnet [09:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:18] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1034 (duration: 00m 44s) [09:34:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:35:21] 06Operations, 06Performance-Team, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Prepare a reasonably performant warmup tool for MediaWiki caches (memcached/apc) - https://phabricator.wikimedia.org/T156922#3012431 (10fgiunchedi) >>! In T156922#3003542, @Krinkle wrote: >... [09:36:05] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012432 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2017.codfw.wmnet'] ```... [09:36:29] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012433 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2018.codfw.wmnet'] ```... [09:36:39] PROBLEM - puppet last run on mw2156 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[hhvm-dbg] [09:36:39] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012435 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2019.codfw.wmnet'] ```... 
[09:36:59] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012436 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2020.codfw.wmnet'] ```... [09:36:59] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.005 second response time [09:37:39] RECOVERY - puppet last run on mw2156 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [09:38:56] !log upgrading and restarting db1034 T111654 [09:39:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:39:00] T111654: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654 [09:39:27] !log Deploy alter table on eqiad hosts for s7 metawiki and wiki on the echo_notification tables - T136428 [09:39:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:39:32] T136428: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428 [09:39:49] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:39:59] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.005 second response time [09:44:19] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1028" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336767 [09:44:27] (03CR) 10jerkins-bot: [V: 04-1] Revert "db-eqiad.php: Depool db1028" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336767 (owner: 10Marostegui) [09:46:02] (03CR) 10Gehel: [C: 032] elasticsearch - reimage elastic20(17|18|19|20) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/336765 (https://phabricator.wikimedia.org/T151326) (owner: 10Gehel) [09:50:41] !log restarting oozie and hive on analytics1003 for java security upgrades [09:50:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:46] !log cleaning up logs on elastic20(01|16) - T139043 [09:52:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:50] T139043: nested RemoteTransportExceptions filled the disk on elastic1036 and elastic1045 during a rolling restart - https://phabricator.wikimedia.org/T139043 [09:53:29] RECOVERY - Disk space on elastic2001 is OK: DISK OK [09:54:06] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012453 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2017.codfw.wmnet'] ``` Of which those **FAILED**: ``` set(['elastic201... 
[09:54:19] RECOVERY - Disk space on elastic2016 is OK: DISK OK [09:57:17] !log failover Hadoop masters from an1001 to an1002 to allow Java upgrades [09:57:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:49] RECOVERY - puppet last run on elastic2001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:03:45] !log restore Hadoop master to an1001 [10:03:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:04:36] !log Running package upgrades on contint2001 [10:04:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:05:33] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012492 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2020.codfw.wmnet'] ``` and were **ALL** successful. [10:06:42] PROBLEM - DPKG on contint2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:07:22] ^^^icinga check triggered while the upgrade is going on bah [10:07:52] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [10:08:42] RECOVERY - DPKG on contint2001 is OK: All packages OK [10:09:50] !log Restarted Jenkins on contint1001 [10:09:53] moritzm: done [10:09:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:04] zeljkof: yeah I restarted it [10:10:08] java upgrade :) [10:10:17] ok [10:10:24] ok, thanks [10:10:29] I see it now here [10:11:10] (03CR) 10Filippo Giunchedi: [C: 04-1] "It looks like upstart already restarts zotero more frequently than three days" [puppet] - 10https://gerrit.wikimedia.org/r/336647 (owner: 10Mobrovac) [10:17:52] RECOVERY - Check HHVM threads for leakage on mw1259 is OK: OK [10:18:12] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[hadoop-yarn-nodemanager] [10:19:07] this one should be fixed soon [10:21:22] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1034 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336772 [10:24:04] (03CR) 10Marostegui: [C: 032] Revert "mariadb: Depool db1034 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336772 (owner: 10Jcrespo) [10:24:12] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [10:24:23] PROBLEM - puppet last run on relforge1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:25:40] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1034 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336772 (owner: 10Jcrespo) [10:26:40] (03PS1) 10Ema: prometheus: run -vhtcpd-stats iff vhtcpd is running [puppet] - 10https://gerrit.wikimedia.org/r/336773 (https://phabricator.wikimedia.org/T157353) [10:26:46] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1034 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336772 (owner: 10Jcrespo) [10:26:58] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1034 - T111654 (duration: 00m 41s) [10:27:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:27:02] T111654: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654 [10:29:13] (03PS1) 10Jcrespo: mariadb: Depool db2040 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336774 (https://phabricator.wikimedia.org/T111654) [10:33:38] (03PS1) 10Marostegui: db-eqiad.php: Repool db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336775 (https://phabricator.wikimedia.org/T153300) [10:34:44] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db2040 for maintenance [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/336774 (https://phabricator.wikimedia.org/T111654) (owner: 10Jcrespo) [10:36:09] (03Merged) 10jenkins-bot: mariadb: Depool db2040 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336774 (https://phabricator.wikimedia.org/T111654) (owner: 10Jcrespo) [10:36:14] (03Abandoned) 10Hashar: build: update rubocop to 0.39 and tweak config [puppet] - 10https://gerrit.wikimedia.org/r/330470 (owner: 10Hashar) [10:36:47] (03CR) 10jenkins-bot: mariadb: Depool db2040 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336774 (https://phabricator.wikimedia.org/T111654) (owner: 10Jcrespo) [10:37:52] !log jynus@tin Synchronized wmf-config/db-codfw.php: Depool db2040 (duration: 00m 40s) [10:37:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:42:39] !log preparing to reimage db2040 T111654 [10:42:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:42:44] T111654: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654 [10:49:02] !log restarting Java daemons on druid100[123] for security upgrades [10:49:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:51:03] !log upgrading java on kafka clusters and druid [10:51:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:51:32] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:52:22] RECOVERY - puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [10:53:21] (03CR) 10Hashar: Modification of Rakefile spec entry point (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [10:53:29] (03PS16) 10Hashar: Rake helper to run rspec in all modules having specs [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [10:54:25] (03PS1) 10Ema: varnish: remove ganglia python module [puppet] - 10https://gerrit.wikimedia.org/r/336778 [10:54:43] (03CR) 10Hashar: [C: 031] "Few minor tweaks to get this patch to finally get merged:" [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [10:54:52] (03PS2) 10Marostegui: db-eqiad.php: Repool db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336775 (https://phabricator.wikimedia.org/T153300) [10:54:58] !log deploy exim and openssh bugfix updates from jessie point release [10:55:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:57:55] (03PS2) 10Ema: varnish: remove ganglia python module [puppet] - 10https://gerrit.wikimedia.org/r/336778 [11:03:02] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.254 second response time [11:08:42] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:21:34] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [11:27:22] PROBLEM - puppet last run on maps1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [11:27:32] PROBLEM - very high load average likely xfs on ms-be1013 is CRITICAL: CRITICAL - load average: 151.85, 104.13, 73.78 [11:27:50] (03PS11) 10Hashar: Use rake tasks to run modules spec [puppet] - 10https://gerrit.wikimedia.org/r/307223 [11:29:17] (03CR) 10Hashar: [C: 031] "Rebased. specs are no more a prerequisite of "rake test", I removed that in PS16 of parent change https://gerrit.wikimedia.org/r/#/c/2824" [puppet] - 10https://gerrit.wikimedia.org/r/307223 (owner: 10Hashar) [11:30:02] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.346 second response time [11:37:32] RECOVERY - very high load average likely xfs on ms-be1013 is OK: OK - load average: 40.75, 73.43, 76.56 [11:37:42] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [11:39:09] !log failed reimage on elastic201[89], restarting [11:39:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:39:45] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012639 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2018.codfw.wmnet'] ```... 
[11:45:02] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [11:46:32] PROBLEM - very high load average likely xfs on ms-be1013 is CRITICAL: CRITICAL - load average: 166.08, 108.98, 86.26 [11:48:02] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [11:52:32] RECOVERY - very high load average likely xfs on ms-be1013 is OK: OK - load average: 38.95, 72.51, 79.69 [11:55:22] RECOVERY - puppet last run on maps1002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [11:56:46] !log upgrading hhvm on mw1170-mw1188 (also effecting updates of openssl, libgd, lcms, gnutls, sqlite, libxpm and glibc) [11:56:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:59:32] PROBLEM - very high load average likely xfs on ms-be1013 is CRITICAL: CRITICAL - load average: 148.87, 108.53, 90.97 [12:07:32] PROBLEM - very high load average likely xfs on ms-be1013 is CRITICAL: CRITICAL - load average: 139.18, 108.47, 96.43 [12:10:11] I'll take a look at that [12:10:15] that == ms-be1013 [12:14:52] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.008 second response time [12:15:54] (03CR) 10Filippo Giunchedi: [C: 031] prometheus: run -vhtcpd-stats iff vhtcpd is running [puppet] - 10https://gerrit.wikimedia.org/r/336773 (https://phabricator.wikimedia.org/T157353) (owner: 10Ema) [12:18:34] (03CR) 10Elukey: [C: 031] "Less cronspam, thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/336773 (https://phabricator.wikimedia.org/T157353) (owner: 10Ema) [12:18:52] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.006 second response time [12:19:07] (03CR) 10Ema: [V: 032 C: 032] prometheus: run -vhtcpd-stats iff vhtcpd is running [puppet] - 10https://gerrit.wikimedia.org/r/336773 (https://phabricator.wikimedia.org/T157353) (owner: 10Ema) [12:19:57] (03PS2) 10Muehlenhoff: Add three users to absent group [puppet] - 10https://gerrit.wikimedia.org/r/336619 (https://phabricator.wikimedia.org/T142836) [12:20:32] RECOVERY - very high load average likely xfs on ms-be1013 is OK: OK - load average: 29.64, 49.28, 77.93 [12:22:30] (03CR) 10Muehlenhoff: [C: 032] Add three users to absent group [puppet] - 10https://gerrit.wikimedia.org/r/336619 (https://phabricator.wikimedia.org/T142836) (owner: 10Muehlenhoff) [12:23:53] 06Operations, 10DBA, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3012696 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts: ``` ['db2040.codfw.wmnet'] ``` The log can be found in `/var/log/wmf-auto-re... 
[12:25:48] (03PS2) 10Filippo Giunchedi: graphite: add Restart / RestartSec for graphite daemons [puppet] - 10https://gerrit.wikimedia.org/r/334364 (https://phabricator.wikimedia.org/T155876) [12:27:21] (03PS1) 10Muehlenhoff: Add tbolliger to LDAP users [puppet] - 10https://gerrit.wikimedia.org/r/336783 (https://phabricator.wikimedia.org/T150790) [12:29:12] (03CR) 10Muehlenhoff: [C: 032] Add tbolliger to LDAP users [puppet] - 10https://gerrit.wikimedia.org/r/336783 (https://phabricator.wikimedia.org/T150790) (owner: 10Muehlenhoff) [12:29:39] (03CR) 10Filippo Giunchedi: "> Two things: there's a cron job on fluorine that shoves data over," [puppet] - 10https://gerrit.wikimedia.org/r/335623 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [12:29:49] (03PS2) 10Filippo Giunchedi: Allow mwlog[12]001 on datasets/dumps/eventlog/logstash [puppet] - 10https://gerrit.wikimedia.org/r/335623 (https://phabricator.wikimedia.org/T123728) [12:30:12] (03CR) 10Filippo Giunchedi: [C: 032] graphite: add Restart / RestartSec for graphite daemons [puppet] - 10https://gerrit.wikimedia.org/r/334364 (https://phabricator.wikimedia.org/T155876) (owner: 10Filippo Giunchedi) [12:30:23] (03PS3) 10Filippo Giunchedi: graphite: add Restart / RestartSec for graphite daemons [puppet] - 10https://gerrit.wikimedia.org/r/334364 (https://phabricator.wikimedia.org/T155876) [12:30:52] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.008 second response time [12:32:52] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.006 second response time [12:34:43] !log mforns@tin Started deploy [analytics/refinery@9e689f3]: (no justification provided) [12:34:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:58] 
06Operations, 10ops-codfw, 10hardware-requests: Reclaim/Decommission old codfw mc2001->mc2016 hosts - https://phabricator.wikimedia.org/T157675#3012720 (10elukey) [12:35:59] (03PS1) 10Elukey: Assign role spare to mc2001->2016 [puppet] - 10https://gerrit.wikimedia.org/r/336784 (https://phabricator.wikimedia.org/T157675) [12:37:15] (03PS2) 10Elukey: Assign role spare to mc2001->2016 [puppet] - 10https://gerrit.wikimedia.org/r/336784 (https://phabricator.wikimedia.org/T157675) [12:37:48] !log mforns@tin Finished deploy [analytics/refinery@9e689f3]: (no justification provided) (duration: 03m 05s) [12:37:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:32] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK [12:44:52] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.011 second response time [12:45:07] 06Operations: elastic2018 fails to PXE boot and seems to have lost network connectivity - https://phabricator.wikimedia.org/T157677#3012757 (10Gehel) [12:45:52] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.007 second response time [12:47:45] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012771 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2019.codfw.wmnet'] ```... 
[12:50:02] PROBLEM - salt-minion processes on puppetmaster1001 is CRITICAL: PROCS CRITICAL: 5 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [12:52:01] !log killing salt runs stuck on failing reimage of elastic2018 [12:52:02] RECOVERY - salt-minion processes on puppetmaster1001 is OK: PROCS OK: 4 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [12:52:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:42] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:57:53] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.005 second response time [13:01:22] RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK [13:01:52] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.012 second response time [13:05:52] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012819 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2019.codfw.wmnet'] ``` Of which those **FAILED**: ``` set(['elastic201... [13:10:55] 06Operations, 10ops-codfw: elastic2018 fails to PXE boot and seems to have lost network connectivity - https://phabricator.wikimedia.org/T157677#3012835 (10Gehel) @Papaul : could you have a look into this please? [13:11:15] !log upgrading firejail on sca cluster [13:11:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:19] 06Operations, 10hardware-requests: Replace bast3001 - https://phabricator.wikimedia.org/T156506#3012840 (10faidon) @Dzahn, any chance you could take this? 
[13:13:52] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.005 second response time [13:20:52] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.005 second response time [13:21:42] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [13:24:52] PROBLEM - puppet last run on wdqs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:30:14] (03PS1) 10Ema: VCL: Add support for WMF-Last-Access-Global analytics cookie [puppet] - 10https://gerrit.wikimedia.org/r/336790 (https://phabricator.wikimedia.org/T138027) [13:30:29] (03PS1) 10Muehlenhoff: Add two more account expiry dates [puppet] - 10https://gerrit.wikimedia.org/r/336791 [13:30:52] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336775 (https://phabricator.wikimedia.org/T153300) (owner: 10Marostegui) [13:32:15] 06Operations, 10Traffic, 10Wikimedia-Mailing-lists: convert lists.wikimedia.org certificate to LetsEncrypt (deadline:2017-03-02) - https://phabricator.wikimedia.org/T154917#2928370 (10faidon) @RobH, what's the status of this? The certificate expires in 20 days. Relatedly, there is an alert for HTTPS too; th... 
[13:32:22] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336775 (https://phabricator.wikimedia.org/T153300) (owner: 10Marostegui) [13:32:39] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336775 (https://phabricator.wikimedia.org/T153300) (owner: 10Marostegui) [13:32:53] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.004 second response time [13:34:11] (03CR) 10Muehlenhoff: [C: 032] Add two more account expiry dates [puppet] - 10https://gerrit.wikimedia.org/r/336791 (owner: 10Muehlenhoff) [13:34:19] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1028 - T153300 (duration: 00m 41s) [13:34:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:23] T153300: Remove partitions from metawiki.pagelinks in s7 - https://phabricator.wikimedia.org/T153300 [13:34:53] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.009 second response time [13:39:17] (03PS1) 10Hashar: zuul: move zuul-merger to service_unit and systemd [puppet] - 10https://gerrit.wikimedia.org/r/336794 [13:40:53] PROBLEM - puppet last run on graphite1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:40:56] (03CR) 10Hashar: "The aim is to ultimately allow several zuul-merger instances :) I will migrate the zuul-server in another chain of changes." 
[puppet] - 10https://gerrit.wikimedia.org/r/336794 (owner: 10Hashar) [13:45:35] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=elastic2020.codfw.wmnet [13:45:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:53] PROBLEM - puppet last run on graphite1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:45:53] RECOVERY - puppet last run on wdqs1002 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [13:50:50] hashar: nothing for eu swat today \o/ https://wikitech.wikimedia.org/wiki/Deployments#Thursday.2C.C2.A0February.C2.A009 [13:51:14] (03CR) 10Hashar: "Puppet compile https://puppet-compiler.wmflabs.org/5390/" [puppet] - 10https://gerrit.wikimedia.org/r/336794 (owner: 10Hashar) [13:53:33] (03PS5) 10Faidon Liambotis: Setup & configure certspotter [puppet] - 10https://gerrit.wikimedia.org/r/333231 (https://phabricator.wikimedia.org/T155807) [13:55:13] (03CR) 10Faidon Liambotis: [C: 032] Setup & configure certspotter [puppet] - 10https://gerrit.wikimedia.org/r/333231 (https://phabricator.wikimedia.org/T155807) (owner: 10Faidon Liambotis) [13:55:53] PROBLEM - puppet last run on graphite1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:56:27] (03PS1) 10Gehel: elasticsearch - reimage elastic20(21|22|23|24) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/336795 (https://phabricator.wikimedia.org/T151326) [13:57:11] (03PS2) 10Gehel: elasticsearch - reimage elastic20(21|22|23|24) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/336795 (https://phabricator.wikimedia.org/T151326) [13:57:17] 06Operations, 10DBA, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3012956 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db2040.codfw.wmnet'] ``` and were **ALL** successful. 
[13:58:20] (03CR) 10Gehel: [C: 032] elasticsearch - reimage elastic20(21|22|23|24) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/336795 (https://phabricator.wikimedia.org/T151326) (owner: 10Gehel) [14:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170209T1400). [14:00:56] PROBLEM - puppet last run on graphite1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:01:04] (03PS1) 10Faidon Liambotis: Fix SSL checks for tendril/lists that are now LE [puppet] - 10https://gerrit.wikimedia.org/r/336796 [14:01:47] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=elastic20(21|22|23|24).codfw.wmnet [14:01:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:03] Nothing to deploy. [14:02:39] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012988 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2021.codfw.wmnet'] ```... [14:03:25] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3012991 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2022.codfw.wmnet'] ```... 
[14:03:30] (03PS2) 10Faidon Liambotis: Fix SSL checks for tendril/lists that are now LE [puppet] - 10https://gerrit.wikimedia.org/r/336796 [14:03:32] (03PS1) 10Faidon Liambotis: certspotter: add system => true to user/group [puppet] - 10https://gerrit.wikimedia.org/r/336798 [14:05:44] (03CR) 10Faidon Liambotis: [C: 032] certspotter: add system => true to user/group [puppet] - 10https://gerrit.wikimedia.org/r/336798 (owner: 10Faidon Liambotis) [14:05:49] (03CR) 10Faidon Liambotis: [C: 032] Fix SSL checks for tendril/lists that are now LE [puppet] - 10https://gerrit.wikimedia.org/r/336796 (owner: 10Faidon Liambotis) [14:05:56] PROBLEM - puppet last run on graphite1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:06:48] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/336794 (owner: 10Hashar) [14:08:08] (03PS1) 10Marostegui: db-eqiad.php: Depool db1062 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336799 (https://phabricator.wikimedia.org/T136428) [14:10:57] PROBLEM - puppet last run on graphite1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[14:14:48] (03PS2) 10Faidon Liambotis: aptrepo: add suite stretch-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/336386 [14:14:57] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.008 second response time [14:15:40] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1062 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336799 (https://phabricator.wikimedia.org/T136428) (owner: 10Marostegui) [14:18:22] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1062 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336799 (https://phabricator.wikimedia.org/T136428) (owner: 10Marostegui) [14:18:30] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1062 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336799 (https://phabricator.wikimedia.org/T136428) (owner: 10Marostegui) [14:19:57] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.009 second response time [14:20:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1062 - T136428 (duration: 00m 45s) [14:20:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:20:19] T136428: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428 [14:23:52] (03CR) 10Reedy: "OOI, does - in the same place actually work?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/336747 (https://phabricator.wikimedia.org/T157656) (owner: 10Gergő Tisza) [14:24:26] (03CR) 10Filippo Giunchedi: [C: 032] "ATM if carbon-cache is killed carbon-c-relay will queue the metrics for a bit and then start dropping metrics once its queue is full, I'll" [puppet] - 10https://gerrit.wikimedia.org/r/334364 (https://phabricator.wikimedia.org/T155876) (owner: 10Filippo Giunchedi) [14:24:35] (03PS4) 10Filippo Giunchedi: graphite: add Restart / RestartSec for graphite daemons [puppet] - 10https://gerrit.wikimedia.org/r/334364 (https://phabricator.wikimedia.org/T155876) [14:25:20] (03PS1) 10Jcrespo: Resolve hanging mysql group with uid 1000 for new reimages [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/336800 (https://phabricator.wikimedia.org/T100501) [14:26:00] (03PS2) 10Muehlenhoff: zuul: move zuul-merger to service_unit and systemd [puppet] - 10https://gerrit.wikimedia.org/r/336794 (owner: 10Hashar) [14:28:25] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] graphite: add Restart / RestartSec for graphite daemons [puppet] - 10https://gerrit.wikimedia.org/r/334364 (https://phabricator.wikimedia.org/T155876) (owner: 10Filippo Giunchedi) [14:29:22] godog: +1 [14:29:52] volans: \o/ [14:30:16] 06Operations, 10Graphite, 10Monitoring: Increased load on graphite1003, carbon-cache not autorestarting when killed by OOM - https://phabricator.wikimedia.org/T155876#3013092 (10Pppery) [14:30:57] PROBLEM - puppet last run on graphite1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[14:31:57] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.015 second response time [14:32:32] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3013098 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2021.codfw.wmnet'] ``` and were **ALL** successful. [14:32:57] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.010 second response time [14:33:02] (03PS3) 10Filippo Giunchedi: Allow mwlog[12]001 on datasets/dumps/eventlog/logstash [puppet] - 10https://gerrit.wikimedia.org/r/335623 (https://phabricator.wikimedia.org/T123728) [14:33:06] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3013099 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2022.codfw.wmnet'] ``` and were **ALL** successful. 
[14:35:10] (03PS3) 10Muehlenhoff: zuul: move zuul-merger to service_unit and systemd [puppet] - 10https://gerrit.wikimedia.org/r/336794 (owner: 10Hashar) [14:35:47] (03CR) 10Filippo Giunchedi: [C: 032] Allow mwlog[12]001 on datasets/dumps/eventlog/logstash [puppet] - 10https://gerrit.wikimedia.org/r/335623 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [14:36:09] (03CR) 10BBlack: [C: 031] VCL: Add support for WMF-Last-Access-Global analytics cookie [puppet] - 10https://gerrit.wikimedia.org/r/336790 (https://phabricator.wikimedia.org/T138027) (owner: 10Ema) [14:37:35] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1062" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336801 [14:39:10] (03PS4) 10Muehlenhoff: zuul: move zuul-merger to service_unit and systemd [puppet] - 10https://gerrit.wikimedia.org/r/336794 (owner: 10Hashar) [14:39:24] 06Operations, 10ops-eqiad, 13Patch-For-Review: Suspected faulty SSD on graphite1001 - https://phabricator.wikimedia.org/T157022#3013111 (10fgiunchedi) >>! In T157022#3002379, @MoritzMuehlenhoff wrote: >>>! In T157022#2997068, @fgiunchedi wrote: >> Read traffic has been switched over to graphite2001 now and s... 
[14:39:56] (03CR) 10Filippo Giunchedi: [C: 032] diamond: switch to graphite2001 [puppet] - 10https://gerrit.wikimedia.org/r/335764 (https://phabricator.wikimedia.org/T157022) (owner: 10Filippo Giunchedi) [14:40:15] (03CR) 10Muehlenhoff: [C: 032] zuul: move zuul-merger to service_unit and systemd [puppet] - 10https://gerrit.wikimedia.org/r/336794 (owner: 10Hashar) [14:42:32] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1062" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336801 (owner: 10Marostegui) [14:43:03] (03PS3) 10Elukey: Assign role spare to mc2001->2016 [puppet] - 10https://gerrit.wikimedia.org/r/336784 (https://phabricator.wikimedia.org/T157675) [14:43:57] (03PS3) 10Filippo Giunchedi: diamond: switch to graphite2001 [puppet] - 10https://gerrit.wikimedia.org/r/335764 (https://phabricator.wikimedia.org/T157022) [14:43:59] (03PS3) 10Filippo Giunchedi: graphite: move alerts to graphite2001 [puppet] - 10https://gerrit.wikimedia.org/r/335763 (https://phabricator.wikimedia.org/T157022) [14:44:01] (03PS3) 10Filippo Giunchedi: graphite: switch graphite alerts to graphite2001 [puppet] - 10https://gerrit.wikimedia.org/r/335765 (https://phabricator.wikimedia.org/T157022) [14:45:34] (03PS1) 10Filippo Giunchedi: hieradata: remove stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/336802 (https://phabricator.wikimedia.org/T154164) [14:45:54] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1062" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336801 (owner: 10Marostegui) [14:45:58] PROBLEM - puppet last run on graphite1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[14:46:53] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1062 - T136428 (duration: 00m 41s) [14:46:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:58] T136428: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428 [14:47:09] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1062" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336801 (owner: 10Marostegui) [14:47:56] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] diamond: switch to graphite2001 [puppet] - 10https://gerrit.wikimedia.org/r/335764 (https://phabricator.wikimedia.org/T157022) (owner: 10Filippo Giunchedi) [14:48:58] !log move diamond traffic to graphite2001 - T157022 [14:49:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:03] T157022: Suspected faulty SSD on graphite1001 - https://phabricator.wikimedia.org/T157022 [14:49:57] PROBLEM - Check systemd state on dataset1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:53:07] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [14:53:16] !log upgrading hhvm on mw1189-mw1199 and mw1293/mw1294 [14:53:19] (03PS1) 10Hashar: (WIP) zuul-merger instances (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/336803 [14:53:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:02] (03PS1) 10Filippo Giunchedi: diamond: reload on handler config file changes [puppet] - 10https://gerrit.wikimedia.org/r/336804 [14:54:47] PROBLEM - Host elastic2018 is DOWN: PING CRITICAL - Packet loss = 100% [14:55:05] ^gehel downtime expired? [14:55:19] (03PS1) 10Marostegui: db-eqiad.php: Depool db1079 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336805 (https://phabricator.wikimedia.org/T136428) [14:55:31] moritzm: so it seems... I thought I gave it a week... 
[14:55:33] (03PS2) 10Filippo Giunchedi: diamond: reload on handler config file changes [puppet] - 10https://gerrit.wikimedia.org/r/336804 (https://phabricator.wikimedia.org/T157022) [14:55:57] PROBLEM - puppet last run on graphite1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:56:07] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating [14:56:37] any volunteer for a quick review of https://gerrit.wikimedia.org/r/#/c/336804/1/modules/diamond/manifests/init.pp ? [14:56:45] moritzm: thanks! downtime rescheduled... [14:57:56] (03CR) 10Muehlenhoff: [C: 031] Assign role spare to mc2001->2016 [puppet] - 10https://gerrit.wikimedia.org/r/336784 (https://phabricator.wikimedia.org/T157675) (owner: 10Elukey) [14:58:57] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1079 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336805 (https://phabricator.wikimedia.org/T136428) (owner: 10Marostegui) [15:00:34] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336805 (https://phabricator.wikimedia.org/T136428) (owner: 10Marostegui) [15:00:47] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336805 (https://phabricator.wikimedia.org/T136428) (owner: 10Marostegui) [15:01:37] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1079 - T136428 (duration: 00m 43s) [15:01:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:01:42] T136428: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428 [15:03:00] (03PS1) 10Hashar: (WIP) Add zuul-merger on contint1001 and contint2001 [puppet] - 10https://gerrit.wikimedia.org/r/336807 [15:03:34] (03CR) 10Volans: [C: 04-1] "See inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/336804 
(https://phabricator.wikimedia.org/T157022) (owner: 10Filippo Giunchedi) [15:03:39] godog: ^^^ [15:03:47] PROBLEM - Check systemd state on ms1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:04:32] volans: thanks! suggestions on how to fix that? [15:04:41] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336808 [15:05:08] I'm checking why $handler is checked for empty anyways [15:05:23] * volans thinking [15:05:57] PROBLEM - puppet last run on graphite1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:07:52] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336808 (owner: 10Marostegui) [15:09:35] godog: the only "easy" way I can see is to define the array of additional subscribers in the if (adding an else in which you define it empty) and then use concat([File['/etc/diamond/diamond.conf']], $handler_array) [15:09:46] still pretty ugly, but you know... 
it's puppet [15:09:49] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336808 (owner: 10Marostegui) [15:09:57] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336808 (owner: 10Marostegui) [15:10:48] volans: hehe I was thinking to fail instead if $handler is empty, afaics we always want a handler anyways [15:10:56] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1079 - T136428 (duration: 00m 40s) [15:11:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:00] T136428: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428 [15:11:12] that's an option too, but I guess the if is there for a reason :D [15:11:56] the reason could be scope creep though :) [15:12:30] lol [15:13:14] # An empty handler string will purge all handlers [15:13:17] (03PS4) 10Elukey: Assign role spare to mc2001->2016 [puppet] - 10https://gerrit.wikimedia.org/r/336784 (https://phabricator.wikimedia.org/T157675) [15:14:19] I don't think the module will actually do that [15:14:29] (03PS1) 10Marostegui: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336809 (https://phabricator.wikimedia.org/T136428) [15:15:38] (03CR) 10Elukey: [C: 032] Assign role spare to mc2001->2016 [puppet] - 10https://gerrit.wikimedia.org/r/336784 (https://phabricator.wikimedia.org/T157675) (owner: 10Elukey) [15:18:58] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336809 (https://phabricator.wikimedia.org/T136428) (owner: 10Marostegui) [15:19:26] (03PS1) 10ArielGlenn: report failed dump runs correctly [dumps] - 10https://gerrit.wikimedia.org/r/336811 (https://phabricator.wikimedia.org/T157669) [15:19:30] !log restarting all Analytics Kafka brokers for Java security upgrades [15:19:34] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:01] ah yes it will [15:20:16] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336809 (https://phabricator.wikimedia.org/T136428) (owner: 10Marostegui) [15:20:24] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336809 (https://phabricator.wikimedia.org/T136428) (owner: 10Marostegui) [15:21:05] (03PS2) 10Hashar: (WIP) Add zuul-merger on contint1001 and contint2001 [puppet] - 10https://gerrit.wikimedia.org/r/336807 [15:21:18] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1086 - T136428 (duration: 00m 40s) [15:21:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:21:22] T136428: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428 [15:21:43] 06Operations, 10DBA, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3013195 (10jcrespo) All core servers/server with core data now support TLS connections and use it for replication (except labs- the new server suport it, but are not accesible remotely for... [15:22:32] ^ jynus congrats!!! [15:22:36] 06Operations, 10ops-esams, 10hardware-requests, 13Patch-For-Review: decom cp3011-22 (12 machines) - https://phabricator.wikimedia.org/T130883#2149483 (10MoritzMuehlenhoff) cp3014, cp3020 and cp3022 are still shown in servermon: https://servermon.wikimedia.org/hosts/ Probably "puppet node deactivate" was m... 
[15:23:57] !log shutdown cp3020 T130883 [15:24:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:01] T130883: decom cp3011-22 (12 machines) - https://phabricator.wikimedia.org/T130883 [15:24:26] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336813 [15:26:38] (03PS3) 10Hashar: (WIP) Add zuul-merger on contint1001 and contint2001 [puppet] - 10https://gerrit.wikimedia.org/r/336807 [15:26:58] PROBLEM - SSH on cp3020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:27:07] PROBLEM - MD RAID on cp3020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:27:08] PROBLEM - puppet last run on cp3020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:27:27] PROBLEM - Check systemd state on cp3020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:27:30] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336813 (owner: 10Marostegui) [15:27:47] PROBLEM - DPKG on cp3020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:27:57] PROBLEM - Disk space on cp3020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:28:05] it doesn't want to go away peacefully [15:28:49] oh was it still on icinga? 
[15:29:29] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336813 (owner: 10Marostegui) [15:29:37] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336813 (owner: 10Marostegui) [15:29:37] PROBLEM - Host cp3020 is DOWN: PING CRITICAL - Packet loss = 100% [15:30:44] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1086 - T136428 (duration: 00m 44s) [15:30:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:51] T136428: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428 [15:33:35] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3013248 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2024.codfw.wmnet'] ```... [15:34:07] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [15:36:14] (03PS3) 10Filippo Giunchedi: diamond: require $handler to be defined [puppet] - 10https://gerrit.wikimedia.org/r/336804 (https://phabricator.wikimedia.org/T157022) [15:36:19] talking about scope creep ^ [15:42:27] !log roll-restart diamond to pick up graphite2001 changes [15:42:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:49] 06Operations, 07discovery-system: confctl SubjectAltNameWarning after python-urllib3 upgrade - https://phabricator.wikimedia.org/T156232#3013298 (10faidon) [15:44:16] 06Operations, 07discovery-system: confctl SubjectAltNameWarning after python-urllib3 upgrade - https://phabricator.wikimedia.org/T156232#2968185 (10faidon) Adding this under T132324, as crons with this message are spamming us regularly now. [15:44:20] (03CR) 10Mobrovac: "Those might be zotero crashes. 
But the fact is that when it does not crash, it starts swallowing memory and we need to restart it. This a " [puppet] - 10https://gerrit.wikimedia.org/r/336647 (owner: 10Mobrovac) [15:45:08] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3013304 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2023.codfw.wmnet'] ```... [15:45:47] godog: LGTM, but safer if you could run a decently large puppet compiler on it [15:46:12] (03PS4) 10Hashar: Add zuul-merger on contint1001 and contint2001 [puppet] - 10https://gerrit.wikimedia.org/r/336807 [15:48:54] volans: ok! do you remember offhand the syntax to ask the compiler for a regex in the hostname? r: ? [15:49:13] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=aqs1009.eqiad.wmnet [15:49:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:47] godog: sorry, not offhand [15:51:54] np! [15:53:01] godog: elif host_list.startswith("re:"): [15:53:05] I would say re: :D [15:53:37] 06Operations, 10Analytics, 03Scap3: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#2956925 (10Ottomata) 05Open>03Resolved Done! https://apt.wikimedia.org/wikimedia/pool/main/g/git-fat/ Reopen if it doesn't work! :) [15:54:34] volans: haha thanks! almost :)) [15:54:40] 06Operations, 10Analytics, 03Scap3: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3013355 (10Ottomata) 05Resolved>03Open Ok, I'm just clicking buttons way too fast over here. It's been packaged. To deploy, we need to run a `apt-get install git-fat` everywhere. Can... 
[15:54:49] 06Operations, 06Analytics-Kanban, 03Scap3: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3013358 (10Ottomata) [16:00:50] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3013377 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2024.codfw.wmnet'] ``` Of which those **FAILED**: ``` set(['elastic202... [16:02:49] (03PS5) 10Hashar: Add zuul-merger on contint1001 and contint2001 [puppet] - 10https://gerrit.wikimedia.org/r/336807 [16:03:48] (03PS1) 10Jcrespo: Revert "mariadb: Depool db2040 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336819 [16:06:23] !log rolling restart of replication threads for dbstore1002/2001/2002 T111654 [16:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:29] T111654: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654 [16:08:44] !log lvs1012: upgrade to jessie 8.7, pybal 1.13.4, reboot into kernel 4.4.2-3+wmf8 T155401 [16:08:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:49] T155401: Integrate jessie 8.7 point release - https://phabricator.wikimedia.org/T155401 [16:09:13] PROBLEM - Check systemd state on kafka1012 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:09:33] PROBLEM - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties [16:11:34] elukey: issues with kafka1012 ^ ? [16:11:57] its mirror maker, it seems to die on broker restarts sometimes..., other processes should pick up, and it will restart in a sec... 
[16:12:01] dunno what's totally up with that [16:12:33] RECOVERY - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is OK: PROCS OK: 1 process with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties [16:12:35] restarted it [16:13:13] RECOVERY - Check systemd state on kafka1012 is OK: OK - running: The system is fully operational [16:13:48] ema: sorry just seen it, thanks :) [16:14:49] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3013450 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2023.codfw.wmnet'] ``` and were **ALL** successful. [16:15:35] !log Compressing commonswiki on labsdb1009 - T153743 [16:15:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:39] T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743 [16:17:17] !log Shutdown db2060 for maintenance - T156161 [16:17:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:21] T156161: db2060 not accessible - https://phabricator.wikimedia.org/T156161 [16:21:34] 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: db2060 not accessible - https://phabricator.wikimedia.org/T156161#3013464 (10Marostegui) The server is off now. Feel free to turn it on once it is all done (or if the HP technician doesn't show up again) Thank you! 
[16:22:24] 06Operations, 10ops-codfw, 10hardware-requests, 13Patch-For-Review: Reclaim/Decommission old codfw mc2001->mc2016 hosts - https://phabricator.wikimedia.org/T157675#3013466 (10elukey) p:05Triage>03Normal a:03Papaul [16:22:25] !log lvs1011: upgrade to jessie 8.7, pybal 1.13.4, reboot into kernel 4.4.2-3+wmf8 T155401 [16:22:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:22:29] T155401: Integrate jessie 8.7 point release - https://phabricator.wikimedia.org/T155401 [16:24:03] PROBLEM - Host mw2256 is DOWN: PING CRITICAL - Packet loss = 100% [16:24:33] RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 36.11 ms [16:25:49] (03PS6) 10Hashar: Add zuul-merger on contint1001 and contint2001 [puppet] - 10https://gerrit.wikimedia.org/r/336807 (https://phabricator.wikimedia.org/T150936) [16:27:17] PROBLEM - Check systemd state on kafka1012 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:27:37] PROBLEM - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties [16:28:17] RECOVERY - Check systemd state on kafka1012 is OK: OK - running: The system is fully operational [16:28:37] RECOVERY - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is OK: PROCS OK: 1 process with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties [16:28:49] (03CR) 10Hashar: "So in theory this patch should:" [puppet] - 10https://gerrit.wikimedia.org/r/336807 (https://phabricator.wikimedia.org/T150936) (owner: 10Hashar) [16:30:57] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db2040 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336819 (owner: 10Jcrespo) [16:31:43] mobrovac: re: https://gerrit.wikimedia.org/r/#/c/336647/ can you amend the commit 
message with the behaviour you described at the meeting? it sounds much more sane as a fix for that type of symptom [16:33:17] PROBLEM - Check systemd state on kafka1012 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:33:21] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db2040 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336819 (owner: 10Jcrespo) [16:33:27] fixing 1012 [16:33:37] PROBLEM - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties [16:33:38] (03CR) 10jenkins-bot: Revert "mariadb: Depool db2040 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336819 (owner: 10Jcrespo) [16:34:17] RECOVERY - Check systemd state on kafka1012 is OK: OK - running: The system is fully operational [16:34:22] ottomata: we might need to add Restart always or something similar [16:34:34] restart always? so it just restarts if it dies? [16:34:37] RECOVERY - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is OK: PROCS OK: 1 process with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties [16:34:42] oh its dying each time you restart another broker now? 
[16:34:47] yeah [16:34:50] weird that its just 1012 [16:35:16] let's open a phab task to investigate [16:35:41] (03PS2) 10Filippo Giunchedi: graphite: switch to graphite2001 [dns] - 10https://gerrit.wikimedia.org/r/335766 (https://phabricator.wikimedia.org/T157022) [16:37:22] (03CR) 10Filippo Giunchedi: [C: 032] graphite: switch to graphite2001 [dns] - 10https://gerrit.wikimedia.org/r/335766 (https://phabricator.wikimedia.org/T157022) (owner: 10Filippo Giunchedi) [16:37:36] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3013512 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2018.codfw.wmnet'] ```... [16:37:46] !log jynus@tin Synchronized wmf-config/db-codfw.php: Repool db2040 (duration: 00m 52s) [16:37:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:11] !log flip dns records for statsd/carbon to graphite2001 - T157022 [16:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:15] T157022: Suspected faulty SSD on graphite1001 - https://phabricator.wikimedia.org/T157022 [16:38:42] 06Operations, 10ops-codfw: elastic2018 fails to PXE boot and seems to have lost network connectivity - https://phabricator.wikimedia.org/T157677#3013514 (10Papaul) 05Open>03Resolved a:03Papaul For some reason port 1 on the NIC was not showing. After resetting the power by removing all PSU's for 5 minutes... [16:40:30] 06Operations, 06Analytics-Kanban, 03Scap3: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3013520 (10thcipriani) >>! In T155856#3013355, @Ottomata wrote: > Ok, I'm just clicking buttons way too fast over here. It's been packaged. To deploy, we need to run a `apt-get ins... 
[16:40:55] (03PS1) 10Eevans: Update path of exporter jar to currently deployed version [puppet] - 10https://gerrit.wikimedia.org/r/336831 (https://phabricator.wikimedia.org/T155120) [16:43:29] (03CR) 10Filippo Giunchedi: [C: 032] Update path of exporter jar to currently deployed version [puppet] - 10https://gerrit.wikimedia.org/r/336831 (https://phabricator.wikimedia.org/T155120) (owner: 10Eevans) [16:44:27] RECOVERY - Host elastic2018 is UP: PING OK - Packet loss = 0%, RTA = 36.06 ms [17:00:05] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170209T1700). [17:02:07] no patches? https://i.imgur.com/Phu8eXt.gifv [17:03:42] how usually puppet swat goes between me j.oe and morit.zm https://i.imgur.com/awtt8NH.gifv [17:04:23] godog: quick draw with the gifs :) [17:04:57] thcipriani: I keep a list :)) [17:05:31] dammit I should have replied with a gif [17:05:39] embarrassing. [17:05:44] :D [17:05:56] https://out.reddit.com/t3_5bmbb5?url=http%3A%2F%2Fwww.reactiongifs.com%2Fr%2Fgtfafm1.gif&token=AQAAf66cWHtIXc6MOuLKPPXuE4nw6lKjPfpEs1re1VmCVQIDvBXb&app_name=reddit.com [17:06:00] noooo [17:06:04] even more embarrasing [17:06:35] oh hey -- is it too late to apply that queue change to the timedmediahandler bits? [17:06:53] i think the queue's filling up again :( [17:07:05] brion: it depends, does it come with a relevant animated gif? [17:07:10] haha [17:07:31] what's the patch btw? 
[17:08:22] brion: ^ [17:08:29] looking for link now [17:08:55] 06Operations, 10DBA, 06Performance-Team, 07Availability, 07Wikimedia-Multiple-active-datacenters: Apache <=> mariadb SSL/TLS for cross-datacenter writes - https://phabricator.wikimedia.org/T134809#3013633 (10jcrespo) [17:08:59] 06Operations, 06Security-Team: Use user-specific passwords for accessing EventLogging database - https://phabricator.wikimedia.org/T120532#3013634 (10jcrespo) [17:09:01] 06Operations: Encrypt all the things - https://phabricator.wikimedia.org/T111653#3013635 (10jcrespo) [17:09:05] https://gerrit.wikimedia.org/r/#/c/331668/ <- [17:09:06] 06Operations, 10DBA, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3013629 (10jcrespo) 05Open>03Resolved a:03jcrespo I have restarted all replication channels of dbstore1002/2001/2002 and db1047. I consider this task resolved, with some follow-ups,... [17:09:17] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3013636 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2018.codfw.wmnet'] ``` and were **ALL** successful. 
[17:09:18] (03PS2) 10Hashar: (WIP) Puppet compile an host via rspec (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/308889 [17:09:41] bah merge conflict [17:10:16] (03PS3) 10Brion VIBBER: Split TMH transcode queue into two for prioritization [puppet] - 10https://gerrit.wikimedia.org/r/331668 (https://phabricator.wikimedia.org/T155098) [17:10:21] mm, rebased cleanly \o/ [17:11:24] (03PS3) 10Hashar: (WIP) Puppet compile an host via rspec (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/308889 [17:11:26] (03PS1) 10Hashar: interface: IPAddr.new() requires an address family [puppet] - 10https://gerrit.wikimedia.org/r/336840 [17:14:07] brion: looking [17:14:48] 06Operations, 10Traffic, 10Wikimedia-Mailing-lists: convert lists.wikimedia.org certificate to LetsEncrypt (deadline:2017-03-02) - https://phabricator.wikimedia.org/T154917#3013654 (10RobH) I'm working through a Python book trying to skill up to modify the script that handles the LE certificate generation an... [17:15:41] thx [17:19:18] brion: yup, lgtm, merging shortly [17:19:23] woot [17:19:24] thanks :D [17:19:30] (03CR) 10Filippo Giunchedi: [V: 032] Split TMH transcode queue into two for prioritization [puppet] - 10https://gerrit.wikimedia.org/r/331668 (https://phabricator.wikimedia.org/T155098) (owner: 10Brion VIBBER) [17:19:44] ci is backed up, +2'ing [17:19:49] heh [17:20:08] great, i'll make sure the php side is ready to start routing to the new queue at the 11am swat :D [17:23:20] brion: ok! looks good on jobrunner side, it is deploying now [17:23:26] :D [17:23:35] 06Operations, 10DBA, 06Performance-Team, 07Availability, 07Wikimedia-Multiple-active-datacenters: Apache <=> mariadb SSL/TLS for cross-datacenter writes - https://phabricator.wikimedia.org/T134809#3013704 (10jcrespo) TLS is deployed on all core MySQLs (s*, x2, es*, pc* shards)- although for obvious reaso... [17:24:37] successful puppetswat https://i.imgur.com/0Sbuow9.gif [17:29:07] nice [17:29:11] thanks godog ! 
[17:29:49] brion: np, thank you! [17:31:28] 06Operations, 10DBA: Followup for TLS MariaDB server roll-out - https://phabricator.wikimedia.org/T157702#3013769 (10jcrespo) [17:32:36] PROBLEM - MD RAID on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:33:06] 06Operations, 06Analytics-Kanban, 10Traffic, 13Patch-For-Review: Add global last-access cookie for top domain (*.wikipedia.org) - https://phabricator.wikimedia.org/T138027#3013792 (10Milimetric) [17:34:21] ottomata: all the analytics kafka brokers restarted, everything looks good [17:34:36] RECOVERY - MD RAID on thumbor1001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [17:35:16] PROBLEM - Check systemd state on kafka1012 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [17:35:36] PROBLEM - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties [17:36:56] (03CR) 10jerkins-bot: [V: 04-1] (WIP) Puppet compile an host via rspec (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/308889 (owner: 10Hashar) [17:37:44] 06Operations, 10DBA: Followup for TLS MariaDB server roll-out - https://phabricator.wikimedia.org/T157702#3013816 (10jcrespo) [17:37:51] 06Operations, 10DBA, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3013817 (10jcrespo) [17:38:17] 06Operations, 06Analytics-Kanban, 03Scap3: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3013818 (10thcipriani) I just tried the above ^ on a labs project machine with the new git fat version and it worked! Here's the tl;dr: ``` $ /bin/bash fattest.sh Cloning into '/ho... 
[17:39:19] fixing 1012 (grr) [17:40:16] RECOVERY - Check systemd state on kafka1012 is OK: OK - running: The system is fully operational [17:40:36] RECOVERY - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is OK: PROCS OK: 1 process with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties [17:42:04] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=elastic20(18|21|22|23|24).codfw.wmnet [17:42:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:52] ok opened a task to investigate the mirror maker failures [17:45:28] 06Operations, 06Analytics-Kanban, 03Scap3: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3013854 (10Ottomata) Ok, great. Can I install on a remote target and we try in prod for something? [17:46:48] brion: hi! I am checking https://grafana.wikimedia.org/dashboard/db/prometheus-apache-hhvm-dc-stats?var-datasource=eqiad%20prometheus%2Fops&var-cluster=videoscaler&var-instance=All , was the load decrease expected? 
(Just curious) [17:48:38] (not that it is bad, but I just want to make sure that it was expected :) [17:48:40] elukey: yeah it should reduce the number of processes running [17:48:45] nice :) [17:48:50] then they should go back up after the 11am update [17:48:58] but running a different subset of files :) [17:49:16] and more modestly i think [17:49:47] oh ok now I got it, this is waiting the mw changes for the new queue [17:49:50] super [17:49:50] thanks :) [17:49:56] ya [17:50:13] these paranoid ops [17:50:19] always asking questions :D [17:51:02] hehehe [17:52:47] (03PS1) 10ArielGlenn: Clean up temp files from page content dumps before retry [dumps] - 10https://gerrit.wikimedia.org/r/336849 [17:55:53] !log proactively restarted statsv on hafnium after the kafka broker restarts [17:55:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:56:01] Krinkle: --^ [17:56:11] hope that this time it will not cause big holes in metrics [17:56:24] let me know otherwise [17:56:29] * elukey afk! [17:56:49] (03PS1) 10Giuseppe Lavagetto: profile::etcd::replication: refactor to make failover easier [puppet] - 10https://gerrit.wikimedia.org/r/336850 (https://phabricator.wikimedia.org/T156009) [17:56:51] (03PS1) 10Giuseppe Lavagetto: prometheus::class_config: allow new selections for prometheus [puppet] - 10https://gerrit.wikimedia.org/r/336851 [17:56:53] (03PS1) 10Giuseppe Lavagetto: prometheus: add etcd metrics [puppet] - 10https://gerrit.wikimedia.org/r/336852 [17:57:09] <_joe_> godog: the latter two are of your interest [17:57:14] <_joe_> godog: completely untested [17:58:12] _joe_: nice, thanks I've added myself in there [18:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170209T1800). 
[18:00:20] I have deploy for ores [18:00:37] * halfak looks on from another meeting [18:03:39] (03CR) 10jerkins-bot: [V: 04-1] Clean up temp files from page content dumps before retry [dumps] - 10https://gerrit.wikimedia.org/r/336849 (owner: 10ArielGlenn) [18:04:11] (03CR) 10jerkins-bot: [V: 04-1] profile::etcd::replication: refactor to make failover easier [puppet] - 10https://gerrit.wikimedia.org/r/336850 (https://phabricator.wikimedia.org/T156009) (owner: 10Giuseppe Lavagetto) [18:04:40] (03CR) 10BBlack: [C: 031] varnish: remove ganglia python module [puppet] - 10https://gerrit.wikimedia.org/r/336778 (owner: 10Ema) [18:05:13] (03PS2) 10ArielGlenn: Clean up temp files from page content dumps before retry [dumps] - 10https://gerrit.wikimedia.org/r/336849 [18:05:52] PROBLEM - Check systemd state on ms-fe1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [18:06:17] Have tin fingerprint changed? ssh says it's changed or man in the middle is happening [18:06:22] PROBLEM - puppet last run on mc1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:06:45] Amir1: not in the last 3 months, afaik [18:07:12] PROBLEM - Check systemd state on ms-fe1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [18:07:14] (03CR) 10jerkins-bot: [V: 04-1] prometheus: add etcd metrics [puppet] - 10https://gerrit.wikimedia.org/r/336852 (owner: 10Giuseppe Lavagetto) [18:07:17] ebernhardson: Is there a way to check fingerprints online? [18:07:57] Amir1: logging into tin, i get this from ssh-keygen: 2048 d6:18:81:2f:2e:68:7b:e6:69:fc:b5:f0:82:66:7d:03 /etc/ssh/ssh_host_rsa_key.pub (RSA) [18:08:45] The fingerprint for the ECDSA key sent by the remote host is [18:08:45] SHA256:jZhHHpPiAspcYnKiJIo+h380CoMBpaBSS5Bw03mMCTs. 
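(Note on the two fingerprints above: they look different partly because they use different formats. The ssh-keygen output is a colon-separated MD5 hex digest of the RSA host key, while the client prompt shows an unpadded base64 SHA256 digest of the ECDSA host key. A minimal sketch of how both formats are derived from one public key line, assuming the usual OpenSSH "type base64-blob comment" layout; the function name is illustrative and not part of any tool mentioned here.)

```python
import base64
import hashlib


def ssh_fingerprints(public_key_line):
    """Return the MD5 and SHA256 fingerprints for one OpenSSH public key line."""
    # A public key line looks like: "<key-type> <base64 blob> [comment]".
    fields = public_key_line.split()
    blob = base64.b64decode(fields[1])

    # Legacy format: MD5 digest as colon-separated hex byte pairs.
    md5_hex = hashlib.md5(blob).hexdigest()
    md5 = ":".join(md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2))

    # Newer format: SHA256 digest, base64-encoded with the padding stripped.
    digest = hashlib.sha256(blob).digest()
    sha256 = base64.b64encode(digest).decode("ascii").rstrip("=")

    return {"md5": md5, "sha256": "SHA256:" + sha256}
```

(Comparing like with like means hashing the same key type with the same algorithm; an MD5 fingerprint of the RSA key can never match a SHA256 fingerprint of the ECDSA key.)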
[18:08:53] Amir1: https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/tin.eqiad.wmnet [18:09:13] changed last october [18:09:23] valhallasw`cloud: Thanks [18:09:27] it's the same [18:09:45] Amir1: also each host in production has each other host's key in its /etc/ssh/authorized_keys file [18:09:46] but it may be that it's confused because of an ordering change, and it's now using ECDSA instead of RSA [18:10:02] sorry, known_hosts [18:10:14] nice [18:10:17] Thanks [18:16:25] (03PS5) 10Paladox: Testing: do not merge [puppet] - 10https://gerrit.wikimedia.org/r/336304 [18:18:16] Okay, It seems that I have another problem. one of ores deploy submodules has changed to ssh from ssl and in tin I can not do git submodule update, it gives me permission denied (publickey). Do you need to add my ssh key somewhere? [18:20:08] (03PS6) 10Dzahn: phabricator: Testing: do not merge [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [18:20:50] Amir1: ugh. that sounds like a problem with the submodule defines [18:21:09] you really should not be using ssh in the deploy dir [18:21:23] I know halfak was to deploy, did some hacking [18:21:45] bd808: the reason is that it's too big that now, it's not possible to clone/fetch using ssl [18:22:03] what? [18:22:13] let me find the bug [18:22:17] you mean that gerrit barfs when you clone if via https? 
[18:22:44] that's a lame gerrit bug we've hit in mw-vagrant for core [18:22:48] yes [18:22:49] (03CR) 10Dzahn: phabricator: Testing: do not merge (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [18:23:19] (03PS7) 10Dzahn: phabricator: elasticsearch version settings (WIP, DNM) [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [18:23:26] Amir1: so the problem with ssh clones is that requires a private key on tin and we don't like private keys on tin generally [18:23:51] thcipriani, RainbowSprinkles: Amir1 needs some git ninja help ^ [18:24:01] (03CR) 10Paladox: phabricator: elasticsearch version settings (WIP, DNM) (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [18:24:10] * thcipriani reads scrollback [18:24:14] (03CR) 10Dzahn: "what is currently not working?" [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [18:24:22] (03PS8) 10Paladox: phabricator: elasticsearch version settings (WIP, DNM) [puppet] - 10https://gerrit.wikimedia.org/r/336304 [18:24:41] bd808: https://phabricator.wikimedia.org/T157141 [18:24:43] found it [18:25:13] (03CR) 10Paladox: "> what is currently not working?" [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [18:25:28] yesterday, IIRC, I believe halfak made a key directly on tin. [18:25:36] to clone down the submodule [18:25:59] thcipriani: so basically I need to gen a pair of keys? [18:26:14] ugh. I hope tied to a gerrit account with no perms other than read [18:26:40] * bd808 leaves this in thcipriani's capable hands [18:26:49] Amir1: that is the workaround right now, correct. [18:26:56] "capable" [18:27:07] okay, thanks! 
[18:28:56] thcipriani: any idea how to check which version of debian is being run on labs [18:29:00] tried uname -or [18:29:17] halfak: thcipriani worked [18:29:24] JustBerry: cat /etc/*release [18:29:25] let's go in the next step [18:29:42] (03CR) 10Dzahn: "Ok, so you are getting a "5" from Hiera in Labs as the Elastic search version, right? And does that come from the wiki page or from repo? " [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [18:29:59] !log starting deploy of ores:e27e845 to canary node [18:29:59] \o/ [18:30:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:30:05] !log ladsgroup@tin Started deploy [ores/deploy@e27e845]: (no justification provided) [18:30:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:30:19] thcipriani: didn't work [18:30:20] in bastion? [18:30:22] RECOVERY - puppet last run on mc1010 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [18:30:22] or shell [18:30:45] just in shell on whatever machine you're trying to find out what version of debian it's running [18:30:46] thcipriani: wait a sec [18:31:38] JustBerry: also cat /etc/apt/sources.list [18:31:42] like this: https://gist.github.com/thcipriani/79ac806d05a80d5a355a6ff353b447ad [18:32:45] JustBerry: If it's Debian, just do "uname -a", it will tell you somewhere there [18:33:06] JustBerry: Output should be "Linux debian ..." and the version follows [18:33:25] (If you mean your kernel) [18:34:02] PROBLEM - ores on scb1002 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 136 bytes in 0.001 second response time [18:34:20] Woops! 
[18:34:21] okay, canary failed [18:34:38] !log ladsgroup@tin Finished deploy [ores/deploy@e27e845]: (no justification provided) (duration: 04m 33s) [18:34:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:34:42] :( [18:35:02] RECOVERY - ores on scb1002 is OK: HTTP OK: HTTP/1.0 200 OK - 2822 bytes in 0.009 second response time [18:35:12] halfak: thank you for the scap patch by the way. [18:35:35] !log bounce zuul to pick up statsd DNS change - T157022 [18:35:35] thcipriani, no problem. Will get back to it when I have a minute :) [18:35:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:39] T157022: Suspected faulty SSD on graphite1001 - https://phabricator.wikimedia.org/T157022 [18:35:45] :) [18:36:02] This is weird, it works just fine in beta [18:36:26] Looks like we have the revscoring version issue again. [18:36:37] Somehow the submodule update failed. [18:37:59] Is there a way that we can review the output of a check script? [18:38:08] Calling all scap masters ^ [18:38:10] scap-log [18:38:19] Oh cool. [18:39:02] ...where's that? [18:39:12] scap deploy-log [18:39:19] instead of scap deploy :D [18:39:36] Permission denied: '/srv/log/ores/scap' [18:39:54] are you in tin? [18:40:09] Oh was on scb1002 [18:40:14] Thought that's where we'd find it. [18:40:34] tin says: No such file or directory: '/srv/deployment/ores/deploy/scap/.git/config-files' [18:41:15] ladsgroup@tin:/srv/deployment/ores/deploy$ scap deploy-log works for me [18:41:26] I think the path is important otherwise can't read config [18:41:46] Gotcha [18:42:23] 06Operations, 06Analytics-Kanban, 03Scap3: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3014122 (10thcipriani) >>! In T155856#3013854, @Ottomata wrote: > Ok, great. Can I install on a remote target and we try in prod for > something? hrm. I usually test in beta and do... 
[18:43:02] (03PS10) 10Dzahn: icinga/base: add hiera override to skip base monitoring [puppet] - 10https://gerrit.wikimedia.org/r/327388 (https://phabricator.wikimedia.org/T151632) [18:43:35] ah, yes, looked away for a second: PWD does matter when running scap deploy-log [18:43:49] (as you already figured out, just confirmed) [18:44:16] thcipriani: Do you know why it can not update submodules? [18:44:16] I don't see any logs for today -- just the one dated for yesterday [18:44:27] (otherwise it wouldn't know which thing you want deploy logs about :) ) [18:44:28] hrm, lemme take a look at deploy log [18:44:30] 2017-02-08 [18:44:49] halfak: this is probably fun, probably you can't read my deployment logs :D [18:45:02] https://www.irccloud.com/pastebin/W4po0Ze9/ [18:45:09] Nope. Can read them just fine. [18:45:11] Date is wrong [18:45:19] but yeah, I see that you are the owner. [18:45:20] halfak: this is all of it [18:45:46] There should be logs for 2017-02-08 and 2017-02-09 [18:45:57] oh, you are right [18:46:04] I've confirmed that running $(date) on tin gets 2017-02-09 for today [18:46:50] (03CR) 10Dzahn: [C: 032] icinga/base: add hiera override to skip base monitoring [puppet] - 10https://gerrit.wikimedia.org/r/327388 (https://phabricator.wikimedia.org/T151632) (owner: 10Dzahn) [18:47:02] (03PS11) 10Dzahn: icinga/base: add hiera override to skip base monitoring [puppet] - 10https://gerrit.wikimedia.org/r/327388 (https://phabricator.wikimedia.org/T151632) [18:48:01] (03PS12) 10Dzahn: icinga/base: add hiera override to skip base monitoring [puppet] - 10https://gerrit.wikimedia.org/r/327388 (https://phabricator.wikimedia.org/T151632) [18:48:05] hrm. for whatever reason the log from today is at scap/log/scap-sync-2017-02-08-0002-2-ge27e845.log [18:48:30] the stat on that file is today's date from a few minutes ago [18:49:06] thcipriani, what do you think about the log from yesterday being missing? [18:49:15] Maybe it was overwritten? [18:49:41] is this log file it? 
scap/log/scap-sync-2017-02-08-0001.log [18:50:12] PROBLEM - puppet last run on ms-be1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:50:13] oh... maybe? name format is weirdly different. [18:50:22] !log test bouncing jmxtrans on kafka1012 to pick up statsd changes [18:50:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:41] yeah, names are weird :\ [18:51:23] anyway, in looking at these files scap seems to think it's working correctly. so it's not running into git errors when cloning the submodules afaict. [18:51:43] even seems to restart the service and connect to 8081 successfully [18:51:52] So weird that this all works on Beta :/ [18:51:56] but that doesn't jive with what you're seeing on that machine? [18:52:10] scb1002? [18:52:17] sca03 [18:52:36] sca03 is our beta [18:52:37] eployment-sca03.deployment-prep.eqiad.wmflabs [18:52:40] right [18:52:41] *deployment-sca03.deployment-prep.eqiad.wmflabs [18:52:47] the canary is sca1002 [18:52:53] sorry, scb1002 [18:52:53] *scb1002 [18:52:53] I meant that what the deploy logs on tin show doesn't jive with sca1002 in prod? [18:53:07] the submodule is not getting cloned? [18:53:27] the problem is we needed to rollback so I can't look at the deployed state. [18:53:42] Maybe we could deploy to codfw and look at the resulting files manually [18:53:50] I don't have access to that machine, but I believe the new dir should still be there: /srv/deployment/ores/deploy-cache/revs/e27e845993ba1c68c52de134bea8fcf99697dfaf [18:53:51] See what's not happening that ought to [18:54:08] Oh cool [18:54:12] I'll look at that [18:54:13] (note: we do blue/green system, so our canary gets traffic) [18:54:47] wheels submodule looks right [18:55:08] has "pywikibase-0.0.4a" which is our last change. [18:55:12] yeah [18:55:35] thcipriani, any way I could check on the status of the venv it created? 
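(One way to answer the venv question without redeploying is to list what is actually installed in the environment's site-packages directory. A minimal sketch, assuming Python 3.8+ for importlib.metadata; the function name is illustrative, and the real location of the ORES venv's site-packages is not shown in this log, so any path passed in is an assumption.)

```python
from importlib import metadata


def installed_versions(site_packages):
    """Map distribution name to version for one site-packages directory."""
    versions = {}
    # distributions(path=[...]) scans only the given directories for
    # *.dist-info / *.egg-info metadata, not the current interpreter's path.
    for dist in metadata.distributions(path=[site_packages]):
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata has no Name field
            versions[name] = dist.version
    return versions
```

(Calling this on the venv's site-packages and looking up a package such as revscoring or pywikibase would show whether the wheels from the submodule were really installed, or whether an older version is still in place.)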
[18:57:11] ACKNOWLEDGEMENT - puppet last run on cp1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues daniel_zahn on it - https://gerrit.wikimedia.org/r/#/c/327388/ [18:57:12] Amir1, what would it take to do a deploy to codfw? [18:57:21] where we can leave it broken for a little bit. [18:57:33] This will be fun [18:57:53] first, we need to make a patch in the deploy repo [18:57:54] on it [18:58:14] kk. [18:58:22] hrm. I'm not sure about the virtualenv. This is what's setup by the cmd_worker.sh script. [18:58:32] Right [18:59:02] Given that I now know that the wheels submodule is right, I'm worried about the venv that they should have been installed in. [18:59:13] My hypothesis is that no wheels were installed at all. [18:59:52] in looking at this .sh script it seems like all deploys look at /srv/deployment/ores/venv which doesn't change with the symlink switch [19:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170209T1900). [19:00:04] kaldari and ebernhardson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [19:00:21] That might be the real problem. Maybe we need to destroy that directory as part of our deploy. [19:00:40] whoops ne of those is mine, i mistyped a template [19:00:59] here [19:01:22] I can SWAT [19:01:40] get it together, Puppet. 
Error 400 on SERVER: Invalid relationship: [19:02:57] (03PS2) 10Thcipriani: Setting $wgPageAssessmentsSubprojects to true on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336743 (https://phabricator.wikimedia.org/T157654) (owner: 10Kaldari) [19:03:05] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336743 (https://phabricator.wikimedia.org/T157654) (owner: 10Kaldari) [19:04:26] halfak: https://gerrit.wikimedia.org/r/#/c/336863/1 [19:04:38] (03Merged) 10jenkins-bot: Setting $wgPageAssessmentsSubprojects to true on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336743 (https://phabricator.wikimedia.org/T157654) (owner: 10Kaldari) [19:04:50] (03CR) 10jenkins-bot: Setting $wgPageAssessmentsSubprojects to true on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336743 (https://phabricator.wikimedia.org/T157654) (owner: 10Kaldari) [19:05:16] Amir1, so the plan is to try a full deploy to codfw and then submit a new patch later to point back to eqiad? [19:05:26] yup [19:05:33] kaldari: your patch is live on mwdebug1002, check please [19:05:38] (simply just revert it) [19:05:43] checkinfg... [19:05:43] Merged [19:05:50] Thanks [19:05:59] !log ladsgroup@tin Started deploy [ores/deploy@4fdaf7d]: (no justification provided) [19:06:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:11] * halfak works on scap patch -- convinces his editor to stop cleaning up imports. 
[19:06:25] thcipriani: Looks good, feel free to sync [19:06:36] kaldari: ok, going live [19:07:37] halfak: it's live broken in scb2002.codfw.wmnet now [19:07:42] kk [19:07:44] checking [19:07:46] going to full deploy :D [19:08:36] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:336743|Setting $wgPageAssessmentsSubprojects to true on testwiki]] T157654 (duration: 00m 54s) [19:08:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:08:40] T157654: Turn on subproject support for PageAssessments in production - https://phabricator.wikimedia.org/T157654 [19:08:41] ^ kaldari live now [19:08:41] Amir1, nothing is installed! [19:08:47] revscoring-1.2.8! [19:08:51] So WTF. [19:09:00] Why does our check script work on beta but not prod? [19:09:09] how that get's a version of revscoring that belongs to months ago [19:09:21] I think the check script doesn't happen [19:09:26] I think the check files are different [19:09:32] PROBLEM - ores on scb2002 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 136 bytes in 0.072 second response time [19:09:32] not sure [19:09:32] ohhhh [19:09:39] thanks icinga ;) [19:09:50] ops, how we can ack this? [19:10:06] (I don't want to send sms to people :D) [19:11:07] wait...now that I'm thinking about it: maybe your checks don't run. Usually the checks output to the log. [19:11:08] halfak: they are not differnt [19:11:22] like: Executing check 'Logstash Error rate for mw1265.eqiad.wmnet' [19:11:43] thcipriani, ^ aha! 
That seems likely [19:11:56] I *think* this may have to do with the group [19:12:07] thcipriani: crap i forgot to paste one more -- https://gerrit.wikimedia.org/r/#/c/336846/ [19:12:14] like each check specifies either "worker" or "web" but none specify "canary" [19:12:19] https://github.com/wikimedia/mediawiki-services-ores-deploy/tree/master/scap [19:12:24] Here's the config [19:12:36] brion: np :) [19:12:37] ah i pasted it in the wrong section [19:12:37] sigh [19:12:43] * brion brainfarts today [19:12:44] maybe how scap handles checks has changed in recent releases [19:12:47] thanks :D [19:13:30] thcipriani: Mind if I add a couple of patches to swat? They're just backports of maintenance script stuff for ConfirmEdit [19:13:30] halfak: probably canary needs this too [19:13:31] https://github.com/wikimedia/mediawiki-services-ores-deploy/blob/master/scap/checks.yaml [19:13:40] !log smalyshev@tin Started deploy [wdqs/wdqs@1a7cd32]: (no justification provided) [19:13:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:13:59] Reedy: should be fine. Add away. [19:14:11] ebernhardson: ping for SWAT [19:14:15] !log Restarted logstash on logstash1001. Dead since 2017-02-09T06:39:46 with "java.lang.UnsupportedOperationException" crash in worker thread. [19:14:15] \o/ [19:14:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:06] !log smalyshev@tin Started deploy [wdqs/wdqs@1a7cd32]: (no justification provided) [19:15:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:29] bleugh, dependant patches [19:15:46] Amir1: yeah my suspicion is that the script that handles the venv is not getting triggered in the canary deploy. 
[19:16:00] since the group doesn't match [19:16:24] Okay, I make a patch [19:17:22] !log smalyshev@tin Started deploy [wdqs/wdqs@1a7cd32]: Deploy new WAR build [19:17:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:17:32] \o/ [19:17:38] 06Operations, 06Analytics-Kanban, 03Scap3: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3014261 (10Ottomata) I think we’ll have a refinery deploy to do soon. Will check… [19:18:08] halfak: https://gerrit.wikimedia.org/r/336865 [19:18:36] brion: the first of your patches is on mwdebug1002, check please [19:18:36] thcipriani: any idea why I am getting ValueError from scap? [19:18:48] checking [19:18:59] Amir1, {{done}} [19:19:12] RECOVERY - puppet last run on ms-be1021 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [19:19:20] cooool [19:19:29] !log ladsgroup@tin Started deploy [ores/deploy@a3a410b]: (no justification provided) [19:19:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:42] PROBLEM - puppet last run on multatuli is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:19:45] SMalyshev: hrm, just checked you log for scap. Could you file a task with that info? [19:19:52] I haven't seen that error before [19:20:01] sure [19:20:22] SMalyshev: it doesn't look like the deploy actually started? [19:20:37] nope, it just errors out [19:20:39] SMalyshev, https://phabricator.wikimedia.org/T157136 [19:20:42] Like that? [19:21:28] halfak: IT WORKS [19:21:31] YESSSSSS [19:21:32] RECOVERY - ores on scb2002 is OK: HTTP OK: HTTP/1.0 200 OK - 3147 bytes in 0.082 second response time [19:21:38] Can we deploy now? [19:21:45] Woah! [19:21:46] thcipriani: seems ok, though uploading to the debug is hard to confirm, it has a short post limit :D [19:21:48] :D! 
[19:21:58] halfak: yeah same error [19:22:18] SMalyshev: I think I see the problem [19:22:21] SMalyshev, seems like hashar's comment in there is spot-on. [19:22:30] I secretly prayed to his noodely goodness :D [19:22:34] ha [19:22:50] (03PS3) 10Krinkle: Don't use computed dblist in production (nowikidatadescriptiontaglines) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334462 [19:22:53] https://en.wikipedia.org/wiki/Flying_Spaghetti_Monster [19:23:19] halfak: https://gerrit.wikimedia.org/r/#/c/336868/ [19:23:27] halfak: I prefer https://en.wikipedia.org/wiki/Invisible_Pink_Unicorn [19:23:30] (03PS4) 10Krinkle: tests: Use sample data that doesn't match production names [software/conftool] - 10https://gerrit.wikimedia.org/r/327686 [19:23:45] ok confirmed with a small file, looks ok [19:23:54] SMalyshev: scratch that I don't see the problem... [19:24:00] This is blasphemy :P [19:24:03] brion: ok, going live with the first change [19:24:13] Amir1, {{done}} [19:24:20] Awesome [19:24:22] thcipriani: the only difference I used -l [19:24:29] I want to deploy just on one host.... [19:24:32] bd808, Suggestions 1:1 "I am the Flying Spaghetti Monster. Thou shalt have no other monsters before Me." [19:24:46] From "Loose Canon" [19:25:00] :D [19:25:03] SMalyshev: ah, damn, that's definitely it. I get what's happening. Which host? Not the canary I'd guess. 
[19:25:15] thcipriani: nope, just a host [19:25:46] I'm waiting for the codfw deploy to finish, then I go to eqiad [19:26:49] !log thcipriani@tin Synchronized php-1.29.0-wmf.11/extensions/TimedMediaHandler/SpecialTimedMediaHandler.php: SWAT: [[gerrit:336734|Only load necessary fields on Special:TimedMediaHandler lists]] (T157621) (duration: 00m 41s) [19:26:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:52] T157621: Special:TimedMediaHandler does not exist and won't even load a webpage - https://phabricator.wikimedia.org/T157621 [19:27:00] ^ brion first one live now [19:27:16] thcipriani: woot, looks good [19:27:30] Amir1, sounds good. [19:27:30] SMalyshev: we must have borked the limit behavior in the newest version :( [19:28:32] !log ladsgroup@tin Started deploy [ores/deploy@a3a410b]: (no justification provided) [19:28:33] SMalyshev: to deploy to 1 host the work around would be to edit your scap.cfg to only deploy to one server_group that points to a dsh_target with one group. [19:28:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:28:59] ignore this ^I forgot to do git pull [19:29:07] !log ladsgroup@tin Started deploy [ores/deploy@10fa16b]: (no justification provided) [19:29:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:30:24] brion: 2nd change is live on mwdebug1002 if there's anything to check there [19:30:30] lemme check [19:31:14] thcipriani: so far so good, nothing explodes on upload :D [19:31:26] Reedy: one of your changes -- Add script for counting captchas -- is also on mwdebug1002 if there's anything to check there. 
[19:31:36] no explosions is good :) [19:31:48] hehe [19:33:04] (03PS1) 10Kaldari: Set $wgPageAssessmentsSubprojects to true on English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336871 (https://phabricator.wikimedia.org/T157654) [19:33:33] Canary is live [19:34:19] !log smalyshev@tin Started deploy [wdqs/wdqs@1a7cd32]: Deploy new WAR build on 2003 [19:34:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:34:46] !log smalyshev@tin Finished deploy [wdqs/wdqs@1a7cd32]: Deploy new WAR build on 2003 (duration: 00m 26s) [19:34:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:09] (03PS1) 10Cmjohnson: Adding dns entries for new elastic search servers elastic1048-1052 T155790 [dns] - 10https://gerrit.wikimedia.org/r/336872 [19:35:11] confirmed manually that it's in canary [19:35:13] moving on [19:36:02] thcipriani: Nothing really to test :) [19:36:13] Other patch just merged though, so can be staged however [19:37:03] !log smalyshev@tin Started deploy [wdqs/wdqs@1a7cd32]: Deploy new WAR build on 1003 [19:37:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:37:19] !log smalyshev@tin Finished deploy [wdqs/wdqs@1a7cd32]: Deploy new WAR build on 1003 (duration: 00m 16s) [19:37:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:37:35] Reedy: 2nd patch staged on mwdebug1002. 
[19:37:42] PROBLEM - Blazegraph process on wdqs2003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (blazegraph), regex args ^java .* blazegraph-service-.*-dist.war [19:37:48] (03PS2) 10Cmjohnson: Adding dns entries for new elastic search servers elastic1048-1052 T155790 [dns] - 10https://gerrit.wikimedia.org/r/336872 [19:38:21] (03CR) 10Eevans: [C: 031] "> I've run PCC at https://puppet-compiler.wmflabs.org/5378/xenon.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/335826 (https://phabricator.wikimedia.org/T155120) (owner: 10Eevans) [19:39:22] thcipriani: sorry i didn't respond to ping, if you're already done i can ship my patch [19:39:32] PROBLEM - Blazegraph process on wdqs1003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (blazegraph), regex args ^java .* blazegraph-service-.*-dist.war [19:39:32] ebernhardson: still working on stuff :) [19:39:36] ok [19:39:49] I'll get you in the queue [19:40:02] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:40:04] i'm fighting with the elasticsearch migration plugin, which wants to give me ~3k individual problem reports (in html of course, no easy to munge javascript :P) [19:40:13] s/javascript/json/ [19:40:53] (03CR) 10Cmjohnson: [C: 032] Adding dns entries for new elastic search servers elastic1048-1052 T155790 [dns] - 10https://gerrit.wikimedia.org/r/336872 (owner: 10Cmjohnson) [19:40:55] sounds fun :) [19:41:08] brion: have you been able to confirm that your 2nd patch is working? 
[19:41:38] thcipriani: until it hits the job runners i won't know for sure [19:42:21] but i can confirm that with patch in place, they go to the new queue instead of the old one [19:43:02] PROBLEM - Check correctness of the icinga configuration on einsteinium is CRITICAL: Icinga configuration contains errors [19:43:07] ACKNOWLEDGEMENT - Blazegraph process on wdqs1003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (blazegraph), regex args ^java .* blazegraph-service-.*-dist.war Gehel check too restrictive on process arguments, fix coming up [19:43:07] ACKNOWLEDGEMENT - Blazegraph process on wdqs2003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (blazegraph), regex args ^java .* blazegraph-service-.*-dist.war Gehel check too restrictive on process arguments, fix coming up [19:43:16] they just won't get run until the video scaler queue runners are updated [19:43:48] ok. Trying to figure out the best order in which to deploy this patch. TimedMediaHandler.php -> maybe just a syncdir of everything? [19:43:57] Amir1, deploy still in progress? [19:44:13] or does the order still matter after TimedMediaHandler.php is sync'd? [19:44:34] (03PS1) 10RobH: Sam Tarling shell access + statistics-users [puppet] - 10https://gerrit.wikimedia.org/r/336875 [19:44:46] thcipriani: shouldn't matter i think [19:44:57] (03PS2) 10RobH: Sam Tarling shell access + statistics-users [puppet] - 10https://gerrit.wikimedia.org/r/336875 [19:44:58] ok, I'll try that then. [19:45:08] thx [19:45:58] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Request for access to stat1003 for Sam Tarling - https://phabricator.wikimedia.org/T157483#3014328 (10RobH) 05Open>03stalled p:05Triage>03Normal I've prepared the patchset and it is ready for merge as long as no objections are noted on this tas... [19:46:34] halfak: yup [19:47:08] Oooh! 
I just got a glimpse of new model info being output :DDD [19:47:42] RECOVERY - puppet last run on multatuli is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [19:47:46] You are super lucky, it's in only one node [19:47:56] !log thcipriani@tin Synchronized php-1.29.0-wmf.11/extensions/TimedMediaHandler/TimedMediaHandler.php: SWAT: [[gerrit:336846|TMH job queue split into low and high priority]] PART I T155098 (duration: 00m 41s) [19:47:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:48:00] T155098: Rework job queue usage for TimedMediaHandler (video scalers) - https://phabricator.wikimedia.org/T155098 [19:48:37] * halfak refreshes a lot :) [19:48:48] (03PS1) 10Gehel: wdqs - icinga process check more relaxed on arguments [puppet] - 10https://gerrit.wikimedia.org/r/336878 [19:48:52] wheeee [19:49:02] !log thcipriani@tin Synchronized php-1.29.0-wmf.11/extensions/TimedMediaHandler/TimedMediaHandler.hooks.php: SWAT: [[gerrit:336846|TMH job queue split into low and high priority]] PART II T155098 (duration: 00m 40s) [19:49:02] Basically, I F5 hammer during deploys a lot [19:49:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:15] I see a completed job! "Completed 19:49, 9 February 2017 " [19:49:18] thanks thcipriani [19:49:52] nice :) [19:50:04] !log thcipriani@tin Synchronized php-1.29.0-wmf.11/extensions/TimedMediaHandler: SWAT: [[gerrit:336846|TMH job queue split into low and high priority]] PART III T155098 (duration: 00m 44s) [19:50:08] ^ all sync'd! [19:50:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:39] (03CR) 10Smalyshev: [C: 031] wdqs - icinga process check more relaxed on arguments [puppet] - 10https://gerrit.wikimedia.org/r/336878 (owner: 10Gehel) [19:51:09] Reedy: how does everything look with ConfirmEdit changes? [19:52:02] :)))) [19:52:15] halfak: It's done now [19:52:48] thcipriani: Did you update the submodule? 
after a git pull? :P [19:52:57] Amir1, I might still be seeing some cached output [19:52:58] ... [19:53:00] * thcipriani checks [19:53:16] Check out https://ores.wikimedia.org/v2/scores/enwiki/ [19:53:25] hrm, says it's up-to-date [19:53:31] lemme re-pull on mwdebug [19:53:49] Amir1, and then https://ores.wikimedia.org/v2/scores/enwiki/?derpderp will show the right models [19:53:58] weirdest part is it's minifed by default https://ores.wikimedia.org/v2/scores/wikidatawiki/?models=damaging&revids=421063984 [19:54:08] Reedy: ok, now I'm sure it's up-to-date. [19:54:23] Amir1, that is super weird. [19:54:24] Hmm [19:54:28] halfak: probably you got varnish'ed (because you refresh a lot) [19:54:36] reedy@mwdebug1002:~$ ls -al /srv/mediawiki/php-1.29.0-wmf.10/extensions/ConfirmEdit/maintenance/ [19:54:39] There's only 1 file there... [19:54:49] Amir1, I can't explain that minifying [19:55:09] ok definitely confirmed the updated TMH queues are working as expected on a new upload [19:55:15] They're there on tin [19:55:31] Reedy: hrm. patches are on wmf.11 [19:55:38] Failure rate is okay: https://grafana.wikimedia.org/dashboard/db/ores-extension?from=now-1h&to=now [19:55:48] brion: \o/ nice :) [19:56:07] Amir1, I don't think the JSON formatting is worth a rollback [19:56:12] But it is super weird. [19:56:35] I'll file a task [19:57:13] halfak: One other thing, I was trying swagger. Why it calls ores.wmflabs.org when you hit try it out [19:57:14] https://ores.wikimedia.org/v1#!/default/get_v1_scores_context_models_models [19:57:19] !log restarting main kafka brokers in codfw and then eqiad to pick up jvm updates [19:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:57:43] Amir1, yikes! It shouldn't. That's maybe in the config? 
[19:58:04] Yeah, we should have a task for that too [19:58:06] https://phabricator.wikimedia.org/T157721 [19:58:13] I guess finding and fixing it would be easy [19:58:28] (03PS1) 10Dzahn: cp1008: do not attempt to skip Icinga base monitoring [puppet] - 10https://gerrit.wikimedia.org/r/336879 (https://phabricator.wikimedia.org/T151632) [19:58:34] ebernhardson: your change should be on mwdebug1002 [19:58:46] PHP Fatal error: Class 'Memcached' not found in /srv/mediawiki/php-1.29.0-wmf.11/includes/libs/objectcache/MemcachedPeclBagOStuff.php on line 63 [19:58:50] stupid thing [19:58:56] Amir1, https://github.com/wiki-ai/ores/blob/master/ores/wsgi/routes/v2/swagger.yaml#L14 [19:59:03] Looks like it is hard-coded. [19:59:04] thcipriani: For me, feel free to just sync-dir the whole extension to teh cluster :) [19:59:12] Maybe we should turn this into a template and parameterize it. [19:59:22] 06Operations, 06Release-Engineering-Team, 05DC-Switchover-Prep-Q3-2016-17: Understand the preparedness of misc services for datacenter switchover - https://phabricator.wikimedia.org/T156937#3014405 (10RobH) [19:59:23] thcipriani: testing [19:59:25] 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Build warm slave for Gerrit in Dallas - https://phabricator.wikimedia.org/T148186#3014406 (10RobH) [19:59:30] 06Operations, 10Gerrit, 06Release-Engineering-Team, 10hardware-requests, 13Patch-For-Review: Requesting 1 spare misc box for Gerrit in codfw - https://phabricator.wikimedia.org/T148187#3014404 (10RobH) 05Open>03Resolved [19:59:34] Reedy: ok, doing. [19:59:52] Amir1, https://phabricator.wikimedia.org/T157723 [19:59:59] For today, I think we can declare victory [20:00:02] (03PS2) 10Dzahn: cp1008: do not attempt to skip Icinga base monitoring [puppet] - 10https://gerrit.wikimedia.org/r/336879 (https://phabricator.wikimedia.org/T151632) [20:00:04] thcipriani: Dear anthropoid, the time has come. 
Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170209T2000). [20:00:16] 06Operations, 10Ops-Access-Requests: Cluster Access for Nithum Thain - https://phabricator.wikimedia.org/T157724#3014434 (10ellery) [20:00:18] (03CR) 10Dzahn: [C: 032] cp1008: do not attempt to skip Icinga base monitoring [puppet] - 10https://gerrit.wikimedia.org/r/336879 (https://phabricator.wikimedia.org/T151632) (owner: 10Dzahn) [20:00:19] I acknowledge you jouncebot [20:00:22] (03CR) 10Dzahn: [V: 032 C: 032] cp1008: do not attempt to skip Icinga base monitoring [puppet] - 10https://gerrit.wikimedia.org/r/336879 (https://phabricator.wikimedia.org/T151632) (owner: 10Dzahn) [20:00:25] \o/ [20:00:43] We introduced new features, but lots of new bugs [20:00:43] 06Operations, 10Ops-Access-Requests, 06Research-and-Data: Cluster Access for Nithum Thain - https://phabricator.wikimedia.org/T157724#3014451 (10ellery) [20:00:47] :P [20:00:50] thcipriani: looks good [20:01:05] ebernhardson: cool, I'll sync shortly [20:01:06] i.e. alternative features [20:01:16] ok, load's a bit higher on the video scaler now. i'll keep an eye on it and tweak the queue sizes later if need be [20:01:22] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw on kafka2001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_main-codfw/producer\.properties [20:01:32] PROBLEM - Check systemd state on kafka2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[20:02:22] RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw on kafka2001 is OK: PROCS OK: 1 process with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_main-codfw/producer\.properties [20:02:23] !log thcipriani@tin Synchronized php-1.29.0-wmf.11/extensions/ConfirmEdit: SWAT: [[gerrit:336864|Add script for counting captchas]] [[gerrit:336866|Use an accurate number of captchas]] (duration: 00m 43s) [20:02:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:02:29] ^ Reedy sync'd [20:02:32] RECOVERY - Check systemd state on kafka2001 is OK: OK - running: The system is fully operational [20:02:34] ^ i'm on those kafka things [20:03:00] (03CR) 10Dzahn: "What conflicted here was that" [puppet] - 10https://gerrit.wikimedia.org/r/336879 (https://phabricator.wikimedia.org/T151632) (owner: 10Dzahn) [20:03:16] thcipriani: thanks! [20:03:56] (03CR) 10Jdlrobson: [C: 031] "Seems fine but don't know enough about how this stuff works and how it would be updated in future. Would be great to capture best practice" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334462 (owner: 10Krinkle) [20:05:13] !log thcipriani@tin Synchronized php-1.29.0-wmf.11/resources/src/mediawiki.special/mediawiki.special.search.interwikiwidget.styles.less: SWAT: [[gerrit:336855|Temporary hax to hide cawiki hacked in search sidebar]] T149806 (duration: 00m 40s) [20:05:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:05:18] T149806: [A/B/C Test] Add cross-wiki search results in a right sidebar - https://phabricator.wikimedia.org/T149806 [20:05:21] ^ ebernhardson sync'd! [20:05:30] thcipriani: thanks! [20:06:25] yw :) [20:06:31] and now it's train time... [20:07:23] 06Operations, 13Patch-For-Review: Fix Icinga checks for test/decom servers - https://phabricator.wikimedia.org/T151632#3014483 (10Dzahn) After the merges above, now if you use "role::spare" on a node then base monitoring gets skipped. 
It's just that currently no servers are using role::spare in site.pp anymor... [20:07:50] 06Operations: Fix Icinga checks for test/decom servers - https://phabricator.wikimedia.org/T151632#3014484 (10Dzahn) [20:09:02] RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [20:15:28] (03PS20) 10Madhuvishy: labstore: Diamond collector to track directory sizes [puppet] - 10https://gerrit.wikimedia.org/r/335855 (https://phabricator.wikimedia.org/T126623) [20:15:32] PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:19:57] !log otto@tin Started deploy [analytics/refinery@9e689f3]: (no justification provided) [20:20:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:15] !log otto@tin Started deploy [analytics/refinery@9e689f3]: (no justification provided) [20:20:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:35] !log otto@tin Started deploy [analytics/refinery@9e689f3]: (no justification provided) [20:20:38] !log otto@tin Started deploy [analytics/refinery@9e689f3]: (no justification provided) [20:20:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:46] !log otto@tin Finished deploy [analytics/refinery@9e689f3]: (no justification provided) (duration: 00m 07s) [20:20:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:58] hrm. 
[20:21:11] !log otto@tin Started deploy [analytics/refinery@9e689f3]: (no justification provided) [20:21:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:21:41] !log otto@tin Started deploy [analytics/refinery@9e689f3]: (no justification provided) [20:21:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:21:46] !log otto@tin Finished deploy [analytics/refinery@9e689f3]: (no justification provided) (duration: 00m 04s) [20:21:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:54] !log otto@tin Started deploy [analytics/refinery@9e689f3]: (no justification provided) [20:23:56] !log otto@tin Started deploy [analytics/refinery@9e689f3]: (no justification provided) [20:23:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:24:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:24:16] !log otto@tin Finished deploy [analytics/refinery@9e689f3]: (no justification provided) (duration: 00m 20s) [20:24:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:32:45] (03CR) 10Nuria: [C: 031] VCL: Add support for WMF-Last-Access-Global analytics cookie (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/336790 (https://phabricator.wikimedia.org/T138027) (owner: 10Ema) [20:33:27] (03PS1) 10Ottomata: Ensuring explicit version of git-fat to upgrade to 0.1.2 [puppet] - 10https://gerrit.wikimedia.org/r/336887 (https://phabricator.wikimedia.org/T155856) [20:34:30] (03CR) 10jerkins-bot: [V: 04-1] Ensuring explicit version of git-fat to upgrade to 0.1.2 [puppet] - 10https://gerrit.wikimedia.org/r/336887 (https://phabricator.wikimedia.org/T155856) (owner: 10Ottomata) [20:35:26] (03PS2) 10Ottomata: Ensuring explicit version of git-fat to upgrade to 0.1.2 [puppet] - 10https://gerrit.wikimedia.org/r/336887 (https://phabricator.wikimedia.org/T155856) [20:37:54] (03CR) 10Krinkle: [C: 031] "(bump)" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/323330 (https://phabricator.wikimedia.org/T136849) (owner: 10Gergő Tisza) [20:41:05] !log thcipriani@tin Synchronized php-1.29.0-wmf.11/skins/CologneBlue/CologneBlue.php: [[gerrit:336714|Fix a bunch of undefined indexes]] T157619 (duration: 00m 41s) [20:41:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:41:09] T157619: Notice: Undefined index: X in /srv/mediawiki/php-1.29.0-wmf.11/skins/CologneBlue/SkinCologneBlue.php on line Y - https://phabricator.wikimedia.org/T157619 [20:41:37] (03CR) 10Paladox: "> Ok, so you are getting a "5" from Hiera in Labs as the Elastic" [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [20:42:33] 06Operations, 10Ops-Access-Requests, 06Research-and-Data: Cluster Access for Nithum Thain - https://phabricator.wikimedia.org/T157724#3014586 (10RobH) @ellery: So there are some new access request policies being put into place. @MoritzMuehlenhoff has been point on them, so he may have to elaborate. Howeve... [20:43:32] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [20:45:20] !log thcipriani@tin Synchronized php-1.29.0-wmf.11/skins/CologneBlue/SkinCologneBlue.php: [[gerrit:336714|Fix a bunch of undefined indexes]] T157619 (sync actual skin file) (duration: 00m 40s) [20:45:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:46:36] 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: db2060 not accessible - https://phabricator.wikimedia.org/T156161#3014626 (10Papaul) unfortunately once again the tech didn't show up as scheduled between 9 am and 11am. I had to call HP and find out why but they couldn't tell me the reason the tech did... 
[20:46:42] (03CR) 10Ottomata: [C: 032] Ensuring explicit version of git-fat to upgrade to 0.1.2 [puppet] - 10https://gerrit.wikimedia.org/r/336887 (https://phabricator.wikimedia.org/T155856) (owner: 10Ottomata) [20:48:36] 06Operations, 10ops-eqiad, 06Analytics-Kanban, 13Patch-For-Review, 15User-Elukey: rack and set up aqs100[7-9] - https://phabricator.wikimedia.org/T155654#3014643 (10Nuria) 05Open>03Resolved [20:49:32] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:49:52] PROBLEM - puppet last run on cp1073 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:49:52] (03PS1) 10Ottomata: Fix git-fat version: 0.1.2-2 [puppet] - 10https://gerrit.wikimedia.org/r/336889 [20:49:52] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:12] PROBLEM - puppet last run on scb1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:12] PROBLEM - puppet last run on etcd1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:12] PROBLEM - puppet last run on mw2172 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:22] PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:22] PROBLEM - puppet last run on mc1036 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:50:29] that's me! [20:50:31] sorry! [20:50:32] fixing. [20:50:32] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:32] PROBLEM - puppet last run on restbase-dev1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:32] PROBLEM - puppet last run on cp1099 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:32] PROBLEM - puppet last run on serpens is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:32] PROBLEM - puppet last run on mw2098 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:32] PROBLEM - puppet last run on mw1304 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:34] (03PS2) 10Ottomata: Fix git-fat version: 0.1.2-1 [puppet] - 10https://gerrit.wikimedia.org/r/336889 [20:50:42] PROBLEM - puppet last run on mw1180 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:42] PROBLEM - puppet last run on bromine is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:42] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:42] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:42] PROBLEM - puppet last run on mw2223 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:48] (03CR) 10Ottomata: [V: 032 C: 032] Fix git-fat version: 0.1.2-1 [puppet] - 10https://gerrit.wikimedia.org/r/336889 (owner: 10Ottomata) [20:50:52] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:52] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:52] PROBLEM - puppet last run on dbmonitor2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:50:52] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:02] PROBLEM - puppet last run on elastic2013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:02] PROBLEM - puppet last run on db2057 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:02] PROBLEM - puppet last run on es1016 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:12] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:51:12] PROBLEM - puppet last run on mw2242 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:12] PROBLEM - puppet last run on mw2152 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:12] PROBLEM - puppet last run on mw2107 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:12] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:12] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:13] PROBLEM - puppet last run on elastic2032 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:13] PROBLEM - puppet last run on ms-be1021 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:14] PROBLEM - puppet last run on conf2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:14] PROBLEM - puppet last run on elastic1035 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:15] haha [20:51:22] PROBLEM - puppet last run on cp2022 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:51:22] PROBLEM - puppet last run on druid1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:22] PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:22] PROBLEM - puppet last run on mc1008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:22] PROBLEM - puppet last run on analytics1029 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:22] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:22] PROBLEM - puppet last run on ms-be2014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:23] PROBLEM - puppet last run on labvirt1012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:23] PROBLEM - puppet last run on kafka1014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:24] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:32] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:51:32] PROBLEM - puppet last run on mc1035 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:32] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:32] PROBLEM - puppet last run on db2041 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:32] PROBLEM - puppet last run on analytics1056 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:32] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:33] PROBLEM - puppet last run on pc2005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:33] PROBLEM - puppet last run on mw2133 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:42] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:42] PROBLEM - puppet last run on mw2200 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:42] PROBLEM - puppet last run on mw2094 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:42] PROBLEM - puppet last run on mira is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:42] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:42] PROBLEM - puppet last run on db2037 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:52] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:52] PROBLEM - puppet last run on analytics1051 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:51:52] PROBLEM - puppet last run on seaborgium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:02] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:02] PROBLEM - puppet last run on mc1025 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:02] PROBLEM - puppet last run on rdb1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:12] PROBLEM - puppet last run on bohrium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:12] PROBLEM - puppet last run on mw2150 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:52:12] PROBLEM - puppet last run on db1055 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:12] PROBLEM - puppet last run on mw1274 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:13] PROBLEM - puppet last run on db2061 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:22] PROBLEM - puppet last run on labsdb1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:22] PROBLEM - puppet last run on db1062 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:22] PROBLEM - puppet last run on db2023 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:22] PROBLEM - puppet last run on conf2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:22] PROBLEM - puppet last run on wtp1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:32] PROBLEM - puppet last run on lvs1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:33] PROBLEM - puppet last run on etcd1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat],Etcd_user[root] [20:52:33] PROBLEM - puppet last run on db1076 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:33] PROBLEM - puppet last run on kafka1013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:33] PROBLEM - puppet last run on ms-be1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:33] PROBLEM - puppet last run on ms-be1009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:33] PROBLEM - puppet last run on mw1295 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:33] PROBLEM - puppet last run on mw1245 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:33] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:34] PROBLEM - puppet last run on wtp1021 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:34] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:42] PROBLEM - puppet last run on cp2019 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:52:42] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:42] PROBLEM - puppet last run on hassium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:42] PROBLEM - puppet last run on mw2255 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:42] PROBLEM - puppet last run on wtp2009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:42] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:43] PROBLEM - puppet last run on mw2237 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:43] PROBLEM - puppet last run on mw2202 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:44] PROBLEM - puppet last run on labvirt1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:44] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:45] PROBLEM - puppet last run on cp2015 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:52:52] PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:52:59] Operations, Security-Team, Patch-For-Review: Create cronjob for regular captcha regeneration - https://phabricator.wikimedia.org/T150029#3014683 (Reedy) Open>Resolved a:Reedy [20:53:02] PROBLEM - puppet last run on labtestmetal2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:02] PROBLEM - puppet last run on maps-test2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:02] PROBLEM - puppet last run on aqs1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:02] PROBLEM - puppet last run on snapshot1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:02] PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:12] PROBLEM - puppet last run on es2012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:12] PROBLEM - puppet last run on mc1026 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:13] PROBLEM - puppet last run on dbproxy1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:53:13] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:13] PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:13] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:19] (PS1) Ottomata: Need trusty/jessie in version number [puppet] - https://gerrit.wikimedia.org/r/336890 [20:53:22] PROBLEM - puppet last run on labcontrol1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:22] PROBLEM - puppet last run on db1081 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:22] PROBLEM - puppet last run on mc2020 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:22] PROBLEM - puppet last run on labvirt1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:32] PROBLEM - puppet last run on mw1296 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:32] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:32] PROBLEM - puppet last run on mx1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:53:32] PROBLEM - puppet last run on mw2231 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:32] PROBLEM - puppet last run on ms-be2017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:32] PROBLEM - puppet last run on krypton is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:32] PROBLEM - puppet last run on elastic2028 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:33] PROBLEM - puppet last run on mw2186 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:33] PROBLEM - puppet last run on mw2193 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:33] (CR) Ottomata: [V: 2 C: 2] Need trusty/jessie in version number [puppet] - https://gerrit.wikimedia.org/r/336890 (owner: Ottomata) [20:53:34] PROBLEM - puppet last run on mw2106 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:34] PROBLEM - puppet last run on cp3046 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:52] PROBLEM - puppet last run on pc1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:53:52] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:02] PROBLEM - puppet last run on ms-be1024 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:02] PROBLEM - puppet last run on es2016 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:02] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:03] PROBLEM - puppet last run on maps-test2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:03] PROBLEM - puppet last run on db1074 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:03] PROBLEM - puppet last run on ms-be1017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:03] PROBLEM - puppet last run on logstash1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:03] PROBLEM - puppet last run on aluminium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:04] PROBLEM - puppet last run on elastic1023 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:12] PROBLEM - puppet last run on db1093 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:54:12] PROBLEM - puppet last run on db1070 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:12] PROBLEM - puppet last run on chlorine is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:12] PROBLEM - puppet last run on analytics1044 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:12] PROBLEM - puppet last run on analytics1032 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:12] PROBLEM - puppet last run on elastic2021 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:12] PROBLEM - puppet last run on mw2219 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:13] PROBLEM - puppet last run on db1063 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:13] PROBLEM - puppet last run on db2011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:14] PROBLEM - puppet last run on dbproxy1009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:22] PROBLEM - puppet last run on ms-fe1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:54:22] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:22] PROBLEM - puppet last run on elastic2005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:22] PROBLEM - puppet last run on elastic2020 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:22] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:22] PROBLEM - puppet last run on mw1256 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:32] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:32] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:32] PROBLEM - puppet last run on relforge1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:32] PROBLEM - puppet last run on elastic2026 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:32] PROBLEM - puppet last run on mc1016 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:54:33] PROBLEM - puppet last run on db1053 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:33] PROBLEM - puppet last run on dbproxy1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:33] PROBLEM - puppet last run on lvs4002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:34] PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:34] PROBLEM - puppet last run on mw2167 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:35] PROBLEM - puppet last run on mw2144 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:35] PROBLEM - puppet last run on mw2209 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:36] PROBLEM - puppet last run on ms-fe3001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:54:36] PROBLEM - puppet last run on maerlant is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:02] PROBLEM - puppet last run on ms-fe2007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:55:02] PROBLEM - puppet last run on labstore2004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:02] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:03] PROBLEM - puppet last run on ms-be1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:03] PROBLEM - puppet last run on wdqs1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:03] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:12] PROBLEM - puppet last run on pc2006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:12] PROBLEM - puppet last run on thorium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:12] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:12] PROBLEM - puppet last run on poolcounter1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:13] PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:55:13] PROBLEM - puppet last run on mw1270 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:22] PROBLEM - puppet last run on ms-be1019 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:22] PROBLEM - puppet last run on dbmonitor1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:23] PROBLEM - puppet last run on mw1219 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:23] PROBLEM - puppet last run on mc1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:23] PROBLEM - puppet last run on ganeti2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:23] PROBLEM - puppet last run on db2050 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:23] PROBLEM - puppet last run on mw2157 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:23] PROBLEM - puppet last run on mw2091 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:24] PROBLEM - puppet last run on dbstore1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:55:24] PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:25] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:32] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:32] PROBLEM - puppet last run on mwlog1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:32] PROBLEM - puppet last run on mc1030 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:32] PROBLEM - puppet last run on mw1200 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:33] PROBLEM - puppet last run on mc1018 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:33] PROBLEM - puppet last run on ms-be1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:33] PROBLEM - puppet last run on mw2248 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:33] PROBLEM - puppet last run on puppetmaster1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:55:34] PROBLEM - puppet last run on aqs1009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:42] PROBLEM - puppet last run on graphite1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:42] PROBLEM - puppet last run on ganeti2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:42] PROBLEM - puppet last run on heze is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:43] PROBLEM - puppet last run on mw2141 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:43] PROBLEM - puppet last run on db2067 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:43] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:55:55] (03PS4) 10Krinkle: Include DB shard in production SPI log entries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331808 (owner: 10Aaron Schulz) [20:56:02] PROBLEM - puppet last run on es2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:02] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:03] PROBLEM - puppet last run on mw2118 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:03] PROBLEM - puppet last run on cp3039 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:03] PROBLEM - puppet last run on db1084 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:12] PROBLEM - puppet last run on wtp1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:12] PROBLEM - puppet last run on sarin is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:12] PROBLEM - puppet last run on analytics1041 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:12] PROBLEM - puppet last run on mw1231 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:12] PROBLEM - puppet last run on analytics1028 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:12] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:13] PROBLEM - puppet last run on restbase1012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:13] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:56:14] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:22] PROBLEM - puppet last run on wtp1016 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:22] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:22] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:22] PROBLEM - puppet last run on mc1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:22] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:23] PROBLEM - puppet last run on fermium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:23] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:23] PROBLEM - puppet last run on ms-fe1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:32] PROBLEM - puppet last run on ms-be1025 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:32] PROBLEM - puppet last run on mw1262 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:32] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:32] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:32] PROBLEM - puppet last run on labvirt1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:32] PROBLEM - puppet last run on dubnium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:32] PROBLEM - puppet last run on elastic2014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:33] PROBLEM - puppet last run on pybal-test2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:33] PROBLEM - puppet last run on prometheus2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:34] PROBLEM - puppet last run on rdb2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:34] PROBLEM - puppet last run on mw2145 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:42] PROBLEM - puppet last run on graphite1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:56:42] PROBLEM - puppet last run on elastic2006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:42] PROBLEM - puppet last run on es2014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:42] PROBLEM - puppet last run on ms-be1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:52] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:52] PROBLEM - puppet last run on mw2217 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:52] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:52] PROBLEM - puppet last run on mw2234 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:52] PROBLEM - puppet last run on mw1260 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:52] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:56:52] PROBLEM - puppet last run on ms-be2020 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:57:02] PROBLEM - puppet last run on prometheus1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:02] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:03] PROBLEM - puppet last run on wtp1008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:03] PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:03] PROBLEM - puppet last run on db1077 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:03] PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:03] PROBLEM - puppet last run on elastic1020 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:04] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:09] (03PS1) 10Dzahn: icinga/base: test skipping icinga monitoring for mc2016 by host [puppet] - 10https://gerrit.wikimedia.org/r/336891 [20:57:12] PROBLEM - puppet last run on analytics1053 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:12] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:12] PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:12] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:12] PROBLEM - puppet last run on ms-be1022 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:12] PROBLEM - puppet last run on ms-be2025 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:12] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:22] PROBLEM - puppet last run on dbproxy1008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:22] PROBLEM - puppet last run on etcd1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat],Etcd_user[root] [20:57:22] PROBLEM - puppet last run on mc1017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:22] PROBLEM - puppet last run on maps2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:32] PROBLEM - puppet last run on mc1021 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:57:32] PROBLEM - puppet last run on maps1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:32] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:32] PROBLEM - puppet last run on cp2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:32] PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:33] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:33] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:33] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:34] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:42] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:42] PROBLEM - puppet last run on db2058 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:42] PROBLEM - puppet last run on wtp2017 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:42] PROBLEM - puppet last run on subra is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:42] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:53] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:57:53] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:02] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:02] PROBLEM - puppet last run on db1086 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:02] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:02] PROBLEM - puppet last run on db1045 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:02] PROBLEM - puppet last run on mw1276 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:12] PROBLEM - puppet last run on elastic2007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:58:12] PROBLEM - puppet last run on mc2030 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:12] PROBLEM - puppet last run on db2064 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:12] PROBLEM - puppet last run on mw2254 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:12] PROBLEM - puppet last run on mw2228 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:12] PROBLEM - puppet last run on mw2095 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:12] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:13] PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat],Etcd_role[conftool],Etcd_user[conftool] [20:58:13] PROBLEM - puppet last run on druid1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:13] (03CR) 10Dzahn: "where does the 5 come from? can you link ?" [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [20:58:14] PROBLEM - puppet last run on mc2026 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:14] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:20] (03CR) 10Krinkle: [C: 031] "Okay. I believe it's one more day (Jan 11) before we'll see this confirmed on logstash-beta, but I believe you :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331808 (owner: 10Aaron Schulz) [20:58:22] PROBLEM - puppet last run on ms-be2022 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:22] PROBLEM - puppet last run on wtp2015 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:22] PROBLEM - puppet last run on mwdebug1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:22] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:22] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:22] PROBLEM - puppet last run on mw1284 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:22] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:34] PROBLEM - puppet last run on mw2136 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:34] PROBLEM - puppet last run on mw2146 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:58:35] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:35] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:36] PROBLEM - puppet last run on ms-be2026 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:36] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:37] PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:37] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:42] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:42] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:42] PROBLEM - puppet last run on elastic2034 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:58:42] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:58:49] (03CR) 10Ottomata: [V: 032 C: 032] Need to change git-fat version in trebuchet::packages too [puppet] - 10https://gerrit.wikimedia.org/r/336892 (owner: 10Ottomata) [20:58:52] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:02] PROBLEM - puppet last run on mc2029 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:02] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:03] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:12] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:12] PROBLEM - puppet last run on elastic1042 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:12] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:12] PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:12] PROBLEM - puppet last run on eventlog2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:12] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:13] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:13] PROBLEM - puppet last run on ms-be2021 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:14] PROBLEM - puppet last run on wtp2019 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:14] PROBLEM - puppet last run on labtestvirt2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:15] PROBLEM - puppet last run on snapshot1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:15] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:22] whee [20:59:22] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:22] PROBLEM - puppet last run on wtp2004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:22] PROBLEM - puppet last run on rdb1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:22] PROBLEM - puppet last run on wtp1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:59:26] wheeeeee [20:59:32] PROBLEM - puppet last run on mc1022 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:32] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:32] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:32] PROBLEM - puppet last run on elastic2024 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:32] PROBLEM - puppet last run on es2018 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:32] PROBLEM - puppet last run on restbase2008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:33] PROBLEM - puppet last run on labtestneutron2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:33] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:34] PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:34] PROBLEM - puppet last run on db1089 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [20:59:35] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat],Etcd_user[root],Etcd_role[guest] [20:59:42] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:42] PROBLEM - puppet last run on mc2035 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:42] PROBLEM - puppet last run on mw2166 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:43] PROBLEM - puppet last run on mw2198 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:43] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:52] PROBLEM - puppet last run on snapshot1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [20:59:52] PROBLEM - puppet last run on ununpentium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:02] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:02] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [21:00:02] PROBLEM - puppet last run on alsafi is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:02] PROBLEM - puppet last run on scb2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:02] PROBLEM - puppet last run on mw2138 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:03] PROBLEM - puppet last run on stat1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:03] PROBLEM - puppet last run on wtp1012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:03] PROBLEM - puppet last run on elastic1038 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:12] PROBLEM - puppet last run on db1049 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:12] PROBLEM - puppet last run on db2016 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:12] PROBLEM - puppet last run on mw2132 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:12] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:12] PROBLEM - puppet last run on elastic1039 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:13] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:13] PROBLEM - puppet last run on ms-fe2005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:13] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:14] PROBLEM - puppet last run on suhail is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:22] PROBLEM - puppet last run on planet2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:22] PROBLEM - puppet last run on mc1014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:22] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:22] PROBLEM - puppet last run on praseodymium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:22] PROBLEM - puppet last run on elastic2010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:23] PROBLEM - puppet last run on mw1302 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [21:00:23] PROBLEM - puppet last run on ms-be2015 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:23] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:32] PROBLEM - puppet last run on ms-be1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:32] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:33] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:33] PROBLEM - puppet last run on cobalt is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:33] PROBLEM - puppet last run on aqs1008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:33] PROBLEM - puppet last run on kafka1012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:33] PROBLEM - puppet last run on dbproxy1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:33] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [21:00:33] PROBLEM - puppet last run on nihal is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:34] PROBLEM - puppet last run on db2068 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:34] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:35] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:52] PROBLEM - puppet last run on mw2177 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:00:57] (03PS2) 10Krinkle: Swap from protocol-relative urls to https everywhere [puppet] - 10https://gerrit.wikimedia.org/r/332707 (owner: 10Chad) [21:01:02] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:02] PROBLEM - puppet last run on mc2025 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:02] PROBLEM - puppet last run on mw1258 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:12] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:12] PROBLEM - puppet last run on db2062 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [21:01:12] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:12] PROBLEM - puppet last run on restbase-dev1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:12] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:13] PROBLEM - puppet last run on mw1261 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:13] PROBLEM - puppet last run on etcd1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat],Etcd_user[root] [21:01:22] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:22] PROBLEM - puppet last run on elastic2030 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:23] PROBLEM - puppet last run on wtp1014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:23] PROBLEM - puppet last run on db1041 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:23] PROBLEM - puppet last run on mw2235 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [21:01:23] PROBLEM - puppet last run on mw2191 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:23] PROBLEM - puppet last run on mw2125 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:23] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:24] PROBLEM - puppet last run on analytics1045 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:32] PROBLEM - puppet last run on mw1288 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:32] PROBLEM - puppet last run on mc1024 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:32] PROBLEM - puppet last run on db2049 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:32] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:32] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:32] PROBLEM - puppet last run on wtp2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:32] PROBLEM - puppet last run on db2033 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:33] PROBLEM - puppet last run on mw2232 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:33] PROBLEM - puppet last run on mw2162 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:34] PROBLEM - puppet last run on mw2181 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:34] PROBLEM - puppet last run on mw1303 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:35] PROBLEM - puppet last run on cp3032 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:47] phew ok [21:01:52] PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:01:53] puppet should be happier now... [21:02:02] PROBLEM - puppet last run on es2019 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:02] PROBLEM - puppet last run on maps-test2004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:02] PROBLEM - puppet last run on ms-be1008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:03] PROBLEM - puppet last run on mw1163 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [21:02:12] PROBLEM - puppet last run on mw1259 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:13] PROBLEM - puppet last run on rdb1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:13] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:13] PROBLEM - puppet last run on db1091 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:13] PROBLEM - puppet last run on ms-be1020 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:13] PROBLEM - puppet last run on wtp1023 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:13] PROBLEM - puppet last run on elastic2025 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:14] PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:22] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:22] PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:23] PROBLEM - puppet last run on mw1257 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:23] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:29] ottomata: those git-fat changes are breaking puppet on a bunch of Labs instances, are you still fixing things? [21:02:32] PROBLEM - puppet last run on mw1252 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:32] PROBLEM - puppet last run on es2015 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:32] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:02:32] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:02:37] well, ok, clearly not just in labs [21:02:59] andrewbogott: yes [21:03:06] last commit that just got merged should fix [21:03:13] ok, checking... 
[21:03:23] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [21:03:31] ^ I checked cp1065 and it's working (a jessie node) [21:03:38] great, thanks, sorry guys [21:03:42] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:03:52] puppet compiler doesn't help with checking what package versions it will like [21:03:58] didn't realize it had to be so specific [21:03:58] labs still has all 3 though i think (precise, trusty, jessie) [21:04:01] 0.1.2 not good enough [21:04:02] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:04:10] had to be 0.1.2-1~trusty1 (or jessie1) [21:04:15] ottomata: tools instances still seem upset [21:04:16] it has precise!? [21:04:24] building precise package... [21:04:26] anyways, why is my cache node installing git-fat anyways? [21:04:39] i guess trebuchet or scap is included everywhere? [21:04:40] dunno [21:04:46] bleh [21:04:55] was expecting a lot (which is why i did puppet instead of just salt), but not THAT many [21:05:02] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git-fat] [21:05:39] (03PS1) 10Cmjohnson: Removing analytics1015 mgmt ip address....decommissioned and removed from rack T147313 [dns] - 10https://gerrit.wikimedia.org/r/336894 [21:05:58] oo carbon is precise :) [21:06:35] you can ignore carbon, puppet is disabled [21:06:38] yeah apparently trebuchet is in base [21:06:55] seems not-quite-right, as we'll likely always have lots of machine roles that have no use for trebuchet [21:07:14] ok, precise package built and added to apt [21:07:20] yeah [21:07:22] bblack agree [21:07:37] i guess the trebuchet classes are intended to disappear, but not sure how long that will be [21:07:50] the scap ones should only happen if you have either scap, or scap::target [21:08:11] andrewbogott: i just built a precise package [21:08:15] can you try puppet again? [21:08:23] yep, trying [21:09:02] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Package[git-fat] [21:09:18] we still have a handful of services on trebuchet that we need to make time to dig deeper on ownership/removal/whatever. Then trebuchet classes can disappear. There is a long tail here :( [21:09:25] ^^ mutante you sure puppet is disabled on carbon? [21:09:26] :) [21:09:40] ottomata: still failing. [21:09:43] i re-enabled it after you said there is a package now [21:09:48] to see if it works [21:09:50] andrewbogott: same error? can't install package? [21:09:50] um… but I didn't apt-get update, sorry [21:09:52] oh ok mutante :) [21:09:55] but permanently i want puppet to be disabled again [21:09:55] yeah, but puppet should do that, no? [21:10:04] i did apt-get update [21:10:04] wait ok, so puppet failed on carbon? [21:10:06] and then it works fine [21:10:07] mutante: can I try? [21:10:08] oh [21:10:09] ok... 
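The version problem discussed above (a bare `0.1.2` was "not good enough"; the pin had to be the full distro-suffixed string) can be sketched roughly as follows. This is a minimal Python illustration, not the Puppet code involved: `PINNED_VERSIONS`, `pinned_version`, and `matches_pin` are hypothetical names, the exact-match comparison is an assumption about how the pin behaved, and only version strings quoted verbatim in this log are included.

```python
# Hypothetical illustration: a package pin has to name the full Debian
# version string built for each distro, not just the upstream version.
PINNED_VERSIONS = {
    "trusty": "0.1.2-1~trusty1",    # quoted in the log
    "precise": "0.1.2-1~precise1",  # quoted in the log (Puppet notice)
}


def pinned_version(codename: str) -> str:
    """Return the exact version string to pin for a distro codename."""
    try:
        return PINNED_VERSIONS[codename]
    except KeyError:
        # No build listed for this distro: fail loudly instead of guessing.
        raise LookupError(f"no git-fat build listed for {codename!r}") from None


def matches_pin(installed: str, codename: str) -> bool:
    """Exact string comparison (assumed): '0.1.2' alone does not match."""
    return installed == pinned_version(codename)
```

A distro with no entry raises `LookupError`, which mirrors the situation in the log where precise hosts kept failing until a precise package was built and added to apt.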
[21:10:17] Notice: /Stage[main]/Trebuchet::Packages/Package[git-fat]/ensure: ensure changed '0.1.1-2' to '0.1.2-1~precise1' [21:10:23] great ok [21:10:30] yep [21:11:04] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:12:59] andrewbogott: better? [21:13:08] ottomata: I think so, one more test... [21:15:42] (03PS2) 10Chad: Scap clean: Rework --l10n-only into --keep-static [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336730 (https://phabricator.wikimedia.org/T73313) [21:16:37] ottomata: seems better, thanks [21:17:42] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:17:52] RECOVERY - puppet last run on cp1073 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [21:18:02] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:18:06] great [21:18:12] RECOVERY - puppet last run on scb1003 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:18:12] RECOVERY - puppet last run on etcd1004 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:18:22] RECOVERY - puppet last run on druid1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:18:22] RECOVERY - puppet last run on mc1036 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:18:23] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [21:18:23] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [21:18:32] RECOVERY - puppet last run on cp1099 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [21:18:32] RECOVERY - puppet last run on restbase-dev1002 is OK: OK: Puppet is currently enabled, last 
run 15 seconds ago with 0 failures [21:18:32] RECOVERY - puppet last run on serpens is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:18:32] RECOVERY - puppet last run on mw2098 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [21:18:32] RECOVERY - puppet last run on mw1304 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:18:33] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:18:42] RECOVERY - puppet last run on mw2200 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [21:18:42] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [21:18:42] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:18:42] RECOVERY - puppet last run on bromine is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:18:42] RECOVERY - puppet last run on mw2223 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:18:42] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:18:42] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [21:18:43] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:18:52] RECOVERY - puppet last run on dbmonitor2001 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:18:52] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:18:52] RECOVERY - puppet last run on analytics1051 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:18:52] RECOVERY - 
puppet last run on lvs4001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:18:52] RECOVERY - puppet last run on seaborgium is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:19:02] RECOVERY - puppet last run on elastic2013 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:19:02] RECOVERY - puppet last run on db2057 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:19:02] RECOVERY - puppet last run on rdb1006 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:19:02] RECOVERY - puppet last run on es1016 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:19:12] RECOVERY - puppet last run on mw2152 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [21:19:12] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:19:12] RECOVERY - puppet last run on elastic1035 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:19:12] RECOVERY - puppet last run on ms-be1021 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:19:12] RECOVERY - puppet last run on mw1274 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:19:13] RECOVERY - puppet last run on conf2002 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:19:13] RECOVERY - puppet last run on mw2172 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:19:22] RECOVERY - puppet last run on cp2022 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:19:22] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [21:19:22] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently 
enabled, last run 28 seconds ago with 0 failures [21:19:22] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:19:22] RECOVERY - puppet last run on labvirt1012 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [21:19:22] RECOVERY - puppet last run on kafka1014 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:19:32] RECOVERY - puppet last run on mc1035 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:19:32] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:19:32] RECOVERY - puppet last run on analytics1056 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:19:32] RECOVERY - puppet last run on db2041 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [21:19:32] RECOVERY - puppet last run on pc2005 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:19:32] RECOVERY - puppet last run on mw2133 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:19:42] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:19:42] RECOVERY - puppet last run on mira is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:19:42] RECOVERY - puppet last run on mw2094 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [21:19:42] RECOVERY - puppet last run on db2037 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:19:52] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [21:20:02] RECOVERY - puppet last run on mc1025 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:20:02] RECOVERY 
- puppet last run on maps-test2001 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [21:20:12] RECOVERY - puppet last run on mw2242 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:20:12] RECOVERY - puppet last run on mw2107 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:20:12] RECOVERY - puppet last run on mw2150 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:20:12] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:20:12] RECOVERY - puppet last run on elastic2032 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:20:13] RECOVERY - puppet last run on db2061 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:20:22] RECOVERY - puppet last run on labsdb1001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:20:22] RECOVERY - puppet last run on db1062 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:20:22] RECOVERY - puppet last run on mc1008 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:20:22] RECOVERY - puppet last run on analytics1029 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [21:20:22] RECOVERY - puppet last run on ms-be2014 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:20:22] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:20:22] RECOVERY - puppet last run on conf2003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:20:23] RECOVERY - puppet last run on wtp1002 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:20:32] RECOVERY - puppet last run on db1076 is OK: OK: Puppet is 
currently enabled, last run 27 seconds ago with 0 failures [21:20:32] RECOVERY - puppet last run on ms-be1007 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:20:32] RECOVERY - puppet last run on etcd1001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:20:32] RECOVERY - puppet last run on kafka1013 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:20:32] RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:20:32] RECOVERY - puppet last run on elastic2028 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:20:32] RECOVERY - puppet last run on mw2106 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:20:33] RECOVERY - puppet last run on analytics1052 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:20:33] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:20:34] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [21:20:34] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:20:35] RECOVERY - puppet last run on wtp1021 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [21:20:52] RECOVERY - puppet last run on ms-be2013 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:21:02] RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [21:21:02] RECOVERY - puppet last run on labtestmetal2001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:21:02] RECOVERY - puppet last run on snapshot1005 is OK: OK: Puppet is currently enabled, last run 18 seconds ago 
with 0 failures [21:21:02] RECOVERY - puppet last run on ms-be1017 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:21:02] RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [21:21:03] RECOVERY - puppet last run on db1093 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:21:03] RECOVERY - puppet last run on bohrium is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [21:21:12] RECOVERY - puppet last run on mc1026 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [21:21:12] RECOVERY - puppet last run on db1055 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:21:12] RECOVERY - puppet last run on dbproxy1006 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:21:12] RECOVERY - puppet last run on es2012 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:21:12] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [21:21:12] RECOVERY - puppet last run on labcontrol1002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:21:22] RECOVERY - puppet last run on ms-fe1003 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [21:21:22] RECOVERY - puppet last run on db1081 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:21:22] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:21:22] RECOVERY - puppet last run on mc2020 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:21:22] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:21:22] RECOVERY - puppet last run on 
labvirt1007 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:21:22] RECOVERY - puppet last run on mw1296 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [21:21:23] RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:21:32] RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [21:21:32] RECOVERY - puppet last run on mx1001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:21:32] RECOVERY - puppet last run on krypton is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [21:21:32] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:21:32] RECOVERY - puppet last run on mw2186 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [21:21:32] RECOVERY - puppet last run on mw2193 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:21:33] RECOVERY - puppet last run on mw2144 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [21:21:33] RECOVERY - puppet last run on mw2167 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:21:34] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:21:34] RECOVERY - puppet last run on mw1295 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [21:21:35] RECOVERY - puppet last run on mc1011 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [21:21:42] RECOVERY - puppet last run on cp2017 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:21:42] RECOVERY - puppet last run on ms-be2016 is OK: OK: Puppet is currently enabled, last run 15 
seconds ago with 0 failures [21:21:42] RECOVERY - puppet last run on conf2001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:21:42] RECOVERY - puppet last run on mw2255 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:21:42] RECOVERY - puppet last run on zosma is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:21:52] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:21:52] RECOVERY - puppet last run on pc1004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:21:52] RECOVERY - puppet last run on ms-be1024 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:22:02] RECOVERY - puppet last run on maps-test2003 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:22:02] RECOVERY - puppet last run on aqs1005 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:22:02] RECOVERY - puppet last run on aluminium is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:22:02] RECOVERY - puppet last run on wdqs1002 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:22:03] RECOVERY - puppet last run on chlorine is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:22:03] RECOVERY - puppet last run on analytics1044 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [21:22:04] RECOVERY - puppet last run on analytics1032 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:22:12] RECOVERY - puppet last run on pc2006 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:22:12] RECOVERY - puppet last run on mw2219 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:22:12] RECOVERY - 
puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:22:12] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [21:22:12] RECOVERY - puppet last run on poolcounter1001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:22:13] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:22:13] RECOVERY - puppet last run on mw1270 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:22:14] RECOVERY - puppet last run on dbproxy1009 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:22:14] RECOVERY - puppet last run on db2011 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [21:22:22] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:22:23] RECOVERY - puppet last run on elastic2020 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:22:23] RECOVERY - puppet last run on elastic2005 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:22:23] RECOVERY - puppet last run on db2050 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:22:23] RECOVERY - puppet last run on mw2091 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:22:23] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [21:22:32] RECOVERY - puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:22:32] RECOVERY - puppet last run on mc1016 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:22:32] RECOVERY - puppet last run on db1053 is OK: OK: Puppet is 
currently enabled, last run 3 seconds ago with 0 failures [21:22:32] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:22:32] RECOVERY - puppet last run on dbproxy1005 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:22:32] RECOVERY - puppet last run on ms-be2017 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:22:33] RECOVERY - puppet last run on elastic2026 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:22:33] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:22:34] RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [21:22:34] RECOVERY - puppet last run on mw2209 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:22:35] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:22:36] RECOVERY - puppet last run on cp3046 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:22:36] RECOVERY - puppet last run on aqs1004 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:22:36] RECOVERY - puppet last run on aqs1007 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:23:02] RECOVERY - puppet last run on es2002 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:23:03] RECOVERY - puppet last run on ms-fe2007 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:23:03] RECOVERY - puppet last run on es2016 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:23:03] RECOVERY - puppet last run on db1074 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures 
[21:23:03] RECOVERY - puppet last run on ms-be1004 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:23:03] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:23:03] RECOVERY - puppet last run on logstash1005 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:23:04] (CR) Cmjohnson: [C: 2] Removing analytics1015 mgmt ip address....decommissioned and removed from rack T147313 [dns] - https://gerrit.wikimedia.org/r/336894 (owner: Cmjohnson) [21:23:04] RECOVERY - puppet last run on elastic1023 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:23:04] RECOVERY - puppet last run on db1070 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:23:12] RECOVERY - puppet last run on thorium is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [21:23:12] RECOVERY - puppet last run on elastic2021 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:23:12] RECOVERY - puppet last run on db1063 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:23:12] RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:23:22] RECOVERY - puppet last run on mw1219 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:23:22] RECOVERY - puppet last run on dbmonitor1001 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:23:22] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:23:22] RECOVERY - puppet last run on mc1006 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:23:22] RECOVERY - puppet last run on ganeti2002 is OK: OK: Puppet is currently enabled, last run 30 seconds 
ago with 0 failures [21:23:22] RECOVERY - puppet last run on mw2157 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:23:22] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:23:23] RECOVERY - puppet last run on ms-be1025 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:23:23] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:23:24] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:23:32] RECOVERY - puppet last run on mc1030 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:23:32] RECOVERY - puppet last run on mwlog1001 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [21:23:32] RECOVERY - puppet last run on mc1018 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:23:32] RECOVERY - puppet last run on mw2248 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:23:32] RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:23:33] RECOVERY - puppet last run on maerlant is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:23:33] RECOVERY - puppet last run on puppetmaster1002 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:23:35] RECOVERY - puppet last run on aqs1009 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:23:35] RECOVERY - puppet last run on graphite1002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:23:42] RECOVERY - puppet last run on ganeti2003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:23:43] RECOVERY - 
puppet last run on heze is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:23:43] RECOVERY - puppet last run on mw2141 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:23:43] RECOVERY - puppet last run on db2067 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:23:43] RECOVERY - puppet last run on wtp1010 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:23:43] RECOVERY - puppet last run on es2014 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [21:23:52] RECOVERY - puppet last run on mw2217 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:24:02] RECOVERY - puppet last run on labstore2004 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:24:02] RECOVERY - puppet last run on mw2118 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:24:02] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [21:24:02] RECOVERY - puppet last run on cp3039 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:24:02] RECOVERY - puppet last run on elastic1021 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:24:02] RECOVERY - puppet last run on elastic1020 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [21:24:03] RECOVERY - puppet last run on db1077 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:24:03] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:24:04] RECOVERY - puppet last run on db1084 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:24:05] RECOVERY - puppet last run on wtp1006 is OK: OK: Puppet is currently enabled, 
last run 52 seconds ago with 0 failures [21:24:12] RECOVERY - puppet last run on mw1231 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:24:12] RECOVERY - puppet last run on analytics1041 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:24:12] RECOVERY - puppet last run on sarin is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:24:12] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:24:12] RECOVERY - puppet last run on restbase1012 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:24:13] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:24:13] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [21:24:19] Operations, ops-eqiad, Discovery, Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3014795 (Cmjohnson) [21:24:22] RECOVERY - puppet last run on wtp1016 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:24:22] RECOVERY - puppet last run on dbproxy1008 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:24:22] RECOVERY - puppet last run on ms-be1019 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:24:22] RECOVERY - puppet last run on mc1002 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [21:24:22] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:24:22] RECOVERY - puppet last run on fermium is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [21:24:22] RECOVERY - puppet last run on dbstore1002 is OK: OK: Puppet is 
currently enabled, last run 56 seconds ago with 0 failures [21:24:23] RECOVERY - puppet last run on db1073 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [21:24:23] RECOVERY - puppet last run on ms-fe1007 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:24:24] RECOVERY - puppet last run on mw1262 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [21:24:24] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:24:32] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:24:32] RECOVERY - puppet last run on ms-be1003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:24:32] RECOVERY - puppet last run on dubnium is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:24:32] RECOVERY - puppet last run on elastic2014 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:24:32] RECOVERY - puppet last run on pybal-test2002 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:24:33] RECOVERY - puppet last run on prometheus2001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [21:24:33] RECOVERY - puppet last run on rdb2003 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [21:24:34] RECOVERY - puppet last run on mw2145 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:24:42] RECOVERY - puppet last run on elastic2006 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:24:42] RECOVERY - puppet last run on wtp2017 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:24:42] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 
0 failures [21:24:52] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:24:52] RECOVERY - puppet last run on mw1215 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:24:52] RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [21:24:52] RECOVERY - puppet last run on ms-be2020 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [21:25:02] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:25:02] RECOVERY - puppet last run on db1086 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:25:02] RECOVERY - puppet last run on mw1276 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:25:02] RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [21:25:02] RECOVERY - puppet last run on db1045 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:25:03] RECOVERY - puppet last run on labservices1001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:25:03] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:25:04] RECOVERY - puppet last run on analytics1053 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:25:12] RECOVERY - puppet last run on ms-be1022 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:25:12] RECOVERY - puppet last run on mc2030 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:25:12] RECOVERY - puppet last run on db2064 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:25:12] RECOVERY - puppet last run on 
analytics1028 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:25:12] RECOVERY - puppet last run on ms-be2025 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:25:12] RECOVERY - puppet last run on db1056 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:25:13] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [21:25:22] RECOVERY - puppet last run on etcd1003 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [21:25:22] RECOVERY - puppet last run on mc1017 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [21:25:22] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:25:22] RECOVERY - puppet last run on mw1284 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:25:22] RECOVERY - puppet last run on maps2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:25:22] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:25:23] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:25:32] RECOVERY - puppet last run on mc1021 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:25:32] RECOVERY - puppet last run on maps1002 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [21:25:32] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:25:32] RECOVERY - puppet last run on labvirt1004 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:25:32] RECOVERY - puppet last run on cp2002 is OK: OK: Puppet is currently enabled, last run 23 
seconds ago with 0 failures [21:25:33] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:25:33] RECOVERY - puppet last run on chromium is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:25:34] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:25:34] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [21:25:35] RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:25:35] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:25:36] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:25:36] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:25:42] RECOVERY - puppet last run on graphite1003 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:25:42] RECOVERY - puppet last run on db2058 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:25:42] RECOVERY - puppet last run on subra is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [21:25:42] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:25:42] RECOVERY - puppet last run on ms-be1002 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:25:52] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:25:52] RECOVERY - puppet last run on mw1260 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:26:02] RECOVERY - puppet last run 
on mw2126 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:26:02] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [21:26:02] RECOVERY - puppet last run on prometheus1004 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:26:02] RECOVERY - puppet last run on wtp1008 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [21:26:02] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:26:12] RECOVERY - puppet last run on druid1002 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:26:12] RECOVERY - puppet last run on elastic1042 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:26:12] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [21:26:12] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:26:12] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:26:12] RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:26:13] RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:26:13] RECOVERY - puppet last run on elastic2007 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:26:14] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:26:14] RECOVERY - puppet last run on eventlog2001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:26:15] RECOVERY - puppet last run on mw2228 is OK: OK: Puppet is 
currently enabled, last run 18 seconds ago with 0 failures [21:26:15] RECOVERY - puppet last run on mw2095 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:26:16] RECOVERY - puppet last run on mc2026 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:26:16] RECOVERY - puppet last run on ms-be2022 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [21:26:32] RECOVERY - puppet last run on mc1027 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:26:32] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:26:32] RECOVERY - puppet last run on db2044 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [21:26:32] RECOVERY - puppet last run on db2056 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:26:32] RECOVERY - puppet last run on es2013 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:26:33] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:26:33] RECOVERY - puppet last run on mw2250 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:26:34] RECOVERY - puppet last run on mw2146 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:26:34] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [21:26:35] RECOVERY - puppet last run on mw2120 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [21:26:35] RECOVERY - puppet last run on ms-be2026 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [21:26:36] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:26:36] 
RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:26:37] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:26:42] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [21:26:42] RECOVERY - puppet last run on elastic2034 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:26:42] RECOVERY - puppet last run on cp3048 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [21:26:52] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:26:52] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [21:27:02] RECOVERY - puppet last run on mc2029 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:27:02] RECOVERY - puppet last run on wtp1012 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [21:27:03] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [21:27:12] RECOVERY - puppet last run on cp1064 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [21:27:12] RECOVERY - puppet last run on mw2254 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [21:27:12] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:27:12] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:27:13] RECOVERY - puppet last run on ms-be2021 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:27:13] RECOVERY - puppet last run on elastic1039 is OK: OK: Puppet is 
currently enabled, last run 24 seconds ago with 0 failures [21:27:13] RECOVERY - puppet last run on snapshot1007 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:27:14] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [21:27:14] RECOVERY - puppet last run on wtp2019 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:27:15] RECOVERY - puppet last run on labtestvirt2001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:27:22] RECOVERY - puppet last run on mwdebug1002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:27:23] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:27:23] RECOVERY - puppet last run on rdb1005 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [21:27:23] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:27:23] RECOVERY - puppet last run on wtp2004 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [21:27:26] what is going? 
[21:27:28] on* [21:27:32] RECOVERY - puppet last run on mc1022 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:27:32] RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:27:32] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:27:32] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:27:32] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:27:32] RECOVERY - puppet last run on elastic2024 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:27:33] RECOVERY - puppet last run on labtestneutron2001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:27:33] RECOVERY - puppet last run on es2018 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:27:34] RECOVERY - puppet last run on ms-be1014 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:27:34] RECOVERY - puppet last run on db1089 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:27:35] RECOVERY - puppet last run on elastic2035 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:27:35] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [21:27:42] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:27:42] RECOVERY - puppet last run on mc2035 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:27:42] RECOVERY - puppet last run on mw2166 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [21:27:42] RECOVERY - 
puppet last run on mw2198 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:27:42] RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:27:52] RECOVERY - puppet last run on ununpentium is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:28:02] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:28:02] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:28:02] RECOVERY - puppet last run on scb2001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:28:02] RECOVERY - puppet last run on mw2138 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:28:02] RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:28:02] RECOVERY - puppet last run on stat1003 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [21:28:03] RECOVERY - puppet last run on db1049 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:28:12] RECOVERY - puppet last run on restbase-dev1003 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:28:12] RECOVERY - puppet last run on db2062 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:28:12] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:28:12] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:28:22] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [21:28:22] RECOVERY - puppet last run on mw1302 is OK: OK: Puppet is currently 
enabled, last run 6 seconds ago with 0 failures [21:28:22] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:28:22] RECOVERY - puppet last run on wtp1014 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:28:23] RECOVERY - puppet last run on elastic2010 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:28:23] RECOVERY - puppet last run on ms-be2015 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:28:23] RECOVERY - puppet last run on wtp1005 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:28:24] RECOVERY - puppet last run on ms-be1005 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:28:24] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:28:32] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:28:32] RECOVERY - puppet last run on cobalt is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [21:28:32] RECOVERY - puppet last run on aqs1008 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:28:32] RECOVERY - puppet last run on dbproxy1003 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [21:28:32] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [21:28:32] RECOVERY - puppet last run on db2068 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:28:33] RECOVERY - puppet last run on nihal is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:28:33] RECOVERY - puppet last run on restbase2008 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures 
[21:28:34] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:28:34] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:28:35] RECOVERY - puppet last run on restbase1008 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:28:42] RECOVERY - puppet last run on graphite2001 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:28:42] RECOVERY - puppet last run on maps2004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:28:52] RECOVERY - puppet last run on snapshot1006 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [21:28:52] RECOVERY - puppet last run on mw2177 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:29:02] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:29:02] RECOVERY - puppet last run on mc2025 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:29:02] RECOVERY - puppet last run on elastic1038 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:29:02] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:29:12] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:29:12] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:29:12] RECOVERY - puppet last run on mw1261 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:29:12] RECOVERY - puppet last run on rdb1001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:29:12] RECOVERY - puppet last run on db2016 is 
OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [21:29:12] RECOVERY - puppet last run on mw2132 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:29:13] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:29:14] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:29:14] RECOVERY - puppet last run on elastic2025 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:29:14] RECOVERY - puppet last run on ms-fe2005 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [21:29:15] RECOVERY - puppet last run on suhail is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:29:15] RECOVERY - puppet last run on planet2001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:29:22] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:29:22] RECOVERY - puppet last run on mw1234 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:29:22] RECOVERY - puppet last run on elastic2030 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:29:22] RECOVERY - puppet last run on mw2235 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:29:22] RECOVERY - puppet last run on mw2191 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:29:22] RECOVERY - puppet last run on mw2125 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [21:29:22] RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:29:23] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago 
with 0 failures [21:29:34] RECOVERY - puppet last run on mw2162 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [21:29:34] RECOVERY - puppet last run on cp3032 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:29:35] RECOVERY - puppet last run on francium is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:29:42] RECOVERY - puppet last run on kafka2002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:29:42] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:29:52] RECOVERY - puppet last run on mw1282 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [21:30:02] RECOVERY - puppet last run on maps-test2004 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:30:02] RECOVERY - puppet last run on es2019 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:30:02] RECOVERY - puppet last run on ms-be1008 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:30:02] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:30:02] RECOVERY - puppet last run on mw1259 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:30:12] RECOVERY - puppet last run on db1091 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [21:30:12] RECOVERY - puppet last run on wtp1023 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:30:12] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [21:30:12] RECOVERY - puppet last run on etcd1002 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:30:22] RECOVERY - puppet last run on 
db1041 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:30:22] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:30:22] RECOVERY - puppet last run on mw1257 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:30:23] RECOVERY - puppet last run on analytics1045 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [21:30:23] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:30:32] RECOVERY - puppet last run on mw1303 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:30:32] RECOVERY - puppet last run on es2015 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:30:32] RECOVERY - puppet last run on mw2181 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:30:32] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:30:42] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:30:49] 06Operations, 10ops-eqiad, 13Patch-For-Review: Degraded RAID on relforge1001 - https://phabricator.wikimedia.org/T156663#3014820 (10Cmjohnson) @gehel not sure why it doesn't show on smartctl but I did hdparm and it shows. I need it for the ticket to HP. cmjohnson@relforge1001:~$ sudo hdparm -I /dev/sda /... [21:31:02] RECOVERY - puppet last run on mw1163 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:31:06] 06Operations, 10ops-eqiad, 13Patch-For-Review: Degraded RAID on relforge1001 - https://phabricator.wikimedia.org/T156663#3014823 (10Cmjohnson) Disks should hopefully ship today or first thing tomorrow. 
[21:31:12] RECOVERY - puppet last run on ms-be1020 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:31:22] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:31:22] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [21:33:40] robh: ^^ [21:34:23] huh [21:34:37] all failed and then all clear, something odd. [21:34:54] someone merged a bad patch that affected the base module (all nodes) [21:35:06] and then fixed it after a few tries :) [21:35:20] I guess the details got lost in the message spam [21:35:30] (03PS1) 10Chad: Scap clean: Automate purging of old deployment branches from gerrit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336901 [21:35:44] my first reaction was SAL [21:35:47] and didnt see anything, heh [21:41:38] hmm, is mwscript supposed to work on mwdebug1002? i get an error about missing Memcached class [21:42:54] I guess it should [21:43:04] I dunno why you'd need it though [21:43:15] https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment#Run_a_maintenance_script_on_a_wiki says i should use terbium [21:44:00] Yep, that's what terbium is for :) [21:44:04] perfect :D [21:44:07] seems working there [21:44:23] terbium is the "cron and other mw maintenance script running" host [21:44:58] (03PS1) 10Ottomata: Set hue allowed_hosts=* to work around bug http://community.cloudera.com/t5/Web-UI-Hue-Beeswax/New-Cloudera-installation-Hue-Bad-Request-400/td-p/50344/page/5 [puppet/cdh] - 10https://gerrit.wikimedia.org/r/336906 (https://phabricator.wikimedia.org/T152714) [21:45:28] (03PS2) 10Ottomata: Set hue allowed_hosts=* to work around bug http://community.cloudera.com/t5/Web-UI-Hue-Beeswax/New-Cloudera-installation-Hue-Bad-Request-400/td-p/50344/page/5 [puppet/cdh] - 10https://gerrit.wikimedia.org/r/336906 (https://phabricator.wikimedia.org/T152714) [21:46:31] (03CR) 
10Dzahn: "remove the newly added config option, that is "unknown". instead change the type of the existing config option from string to int." [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [21:49:12] (03PS1) 10Cmjohnson: Adding elastic1048-52 to netboot cfg [puppet] - 10https://gerrit.wikimedia.org/r/336907 [21:49:47] (03Abandoned) 10Paladox: phabricator: elasticsearch version settings (WIP, DNM) [puppet] - 10https://gerrit.wikimedia.org/r/336304 (owner: 10Paladox) [21:49:49] (03CR) 10Dzahn: [C: 032] hieradata: remove stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/336802 (https://phabricator.wikimedia.org/T154164) (owner: 10Filippo Giunchedi) [21:49:57] (03PS2) 10Dzahn: hieradata: remove stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/336802 (https://phabricator.wikimedia.org/T154164) (owner: 10Filippo Giunchedi) [21:50:11] (03CR) 10Dzahn: [C: 032] "ticket says disk is already wiped :)" [puppet] - 10https://gerrit.wikimedia.org/r/336802 (https://phabricator.wikimedia.org/T154164) (owner: 10Filippo Giunchedi) [21:50:20] (03CR) 10Dzahn: [V: 032 C: 032] hieradata: remove stat1001 [puppet] - 10https://gerrit.wikimedia.org/r/336802 (https://phabricator.wikimedia.org/T154164) (owner: 10Filippo Giunchedi) [21:50:26] jouncebot: next [21:50:26] In 2 hour(s) and 9 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170210T0000) [21:54:01] !log cp3014,cp3020,cp3022 - puppet node deactivate - cp3020 delete salt key (T130883) [21:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:54:08] T130883: decom cp3011-22 (12 machines) - https://phabricator.wikimedia.org/T130883 [21:57:43] 06Operations, 10hardware-requests: Replace bast3001 - https://phabricator.wikimedia.org/T156506#3014948 (10Dzahn) a:03Dzahn [21:57:47] 06Operations, 06Analytics-Kanban, 13Patch-For-Review, 03Scap3: Package + deploy new version of git-fat - 
https://phabricator.wikimedia.org/T155856#3014951 (10Ottomata) Updating here: git-fat 0.1.2 now installed everywhere by puppet. [22:02:58] !log brion running tests of requeueTranscodes.php on terbium to restart subsets of video scaler work [22:03:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:03:11] (03PS3) 10Zppix: Swap from protocol-relative urls to https everywhere [puppet] - 10https://gerrit.wikimedia.org/r/332707 (owner: 10Chad) [22:10:56] (03CR) 10Krinkle: [C: 031] "Nice :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336901 (owner: 10Chad) [22:13:08] (03CR) 10Dzahn: [C: 032] "debugging. temp." [puppet] - 10https://gerrit.wikimedia.org/r/336891 (owner: 10Dzahn) [22:13:20] (03PS2) 10Dzahn: icinga/base: test skipping icinga monitoring for mc2016 by host [puppet] - 10https://gerrit.wikimedia.org/r/336891 [22:13:38] (03CR) 10Dzahn: [V: 032 C: 032] icinga/base: test skipping icinga monitoring for mc2016 by host [puppet] - 10https://gerrit.wikimedia.org/r/336891 (owner: 10Dzahn) [22:25:22] PROBLEM - puppet last run on ms-be1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:29:56] (03CR) 10Rush: "Dang, Tim. Sorry I sandbagged you here. I honestly lost track of the self -1 after the +1 and rebase. That's on me. IIUC we don't need " [puppet] - 10https://gerrit.wikimedia.org/r/336351 (https://phabricator.wikimedia.org/T157400) (owner: 10Tim Landscheidt) [22:33:52] PROBLEM - Host cp2006 is DOWN: PING CRITICAL - Packet loss = 100% [22:34:17] (03PS2) 10Cmjohnson: Adding elastic1048-52 to netboot cfg [puppet] - 10https://gerrit.wikimedia.org/r/336907 [22:34:22] PROBLEM - puppet last run on mc1023 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [22:39:02] PROBLEM - IPsec on cp3008 is CRITICAL: Strongswan CRITICAL - ok: 26 connecting: cp2006_v4, cp2006_v6 [22:39:12] PROBLEM - IPsec on cp4001 is CRITICAL: Strongswan CRITICAL - ok: 26 connecting: cp2006_v4, cp2006_v6 [22:39:32] PROBLEM - IPsec on cp3007 is CRITICAL: Strongswan CRITICAL - ok: 26 connecting: cp2006_v4, cp2006_v6 [22:39:32] PROBLEM - IPsec on cp4003 is CRITICAL: Strongswan CRITICAL - ok: 26 connecting: cp2006_v4, cp2006_v6 [22:39:32] PROBLEM - IPsec on cp4002 is CRITICAL: Strongswan CRITICAL - ok: 26 connecting: cp2006_v4, cp2006_v6 [22:39:32] PROBLEM - IPsec on cp3010 is CRITICAL: Strongswan CRITICAL - ok: 26 connecting: cp2006_v4, cp2006_v6 [22:39:42] PROBLEM - IPsec on cp4004 is CRITICAL: Strongswan CRITICAL - ok: 26 connecting: cp2006_v4, cp2006_v6 [22:41:36] (03PS6) 10EBernhardson: Update elasticsearch module for es5 compatability [puppet] - 10https://gerrit.wikimedia.org/r/333969 (https://phabricator.wikimedia.org/T155578) [22:42:23] !log bblack@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp2006.codfw.wmnet [22:42:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:36] !log cp2006 depooled due to icinga report of host-down [22:42:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:44:14] !log thcipriani@tin Synchronized php-1.29.0-wmf.11/skins/CologneBlue/SkinCologneBlue.php: [[gerrit:336931|Revert "Remove warning suppression"]] (duration: 00m 59s) [22:44:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:44:52] (03PS32) 10Paladox: Phabricator: Fix phd init and systemd script also update ssh-phab to use base class [puppet] - 10https://gerrit.wikimedia.org/r/333358 [22:45:58] (03PS1) 10Thcipriani: all wikis to 1.29.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336934 [22:46:02] (03CR) 10Thcipriani: [C: 032] all wikis to 1.29.0-wmf.11 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/336934 (owner: 10Thcipriani) [22:46:22] PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:48:09] (03Merged) 10jenkins-bot: all wikis to 1.29.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336934 (owner: 10Thcipriani) [22:48:15] (03CR) 10jenkins-bot: all wikis to 1.29.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336934 (owner: 10Thcipriani) [22:48:33] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Puppet changes required for elasticsearch 5.x upgrade - https://phabricator.wikimedia.org/T155578#3015199 (10EBernhardson) elasticsearch 5 additionally does not allow us to set global defaults for index settings anymore, they need to... [22:48:42] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.29.0-wmf.11 [22:48:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:49:04] 06Operations, 10ops-eqiad, 06DC-Ops: Racktables equipment that should probably be renamed ? - https://phabricator.wikimedia.org/T150744#3015200 (10RobH) a:03Cmjohnson It seems like this has been cleaned up. I know that since November, Chris has cleaned up quite a few systems. I'm going to assign this to... 
[22:49:24] 06Operations, 10Ops-Access-Requests, 06Research-and-Data: Cluster Access for Nithum Thain - https://phabricator.wikimedia.org/T157724#3015203 (10ellery) [22:51:14] (03PS1) 10EBernhardson: Configure cirrus per-index settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336936 (https://phabricator.wikimedia.org/T155578) [22:52:44] (03CR) 10jerkins-bot: [V: 04-1] Configure cirrus per-index settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336936 (https://phabricator.wikimedia.org/T155578) (owner: 10EBernhardson) [22:53:22] RECOVERY - puppet last run on ms-be1025 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [22:54:22] RECOVERY - Host cp2006 is UP: PING OK - Packet loss = 0%, RTA = 36.03 ms [22:54:32] RECOVERY - IPsec on cp4003 is OK: Strongswan OK - 28 ESP OK [22:54:32] RECOVERY - IPsec on cp4002 is OK: Strongswan OK - 28 ESP OK [22:54:32] RECOVERY - IPsec on cp3007 is OK: Strongswan OK - 28 ESP OK [22:54:32] RECOVERY - IPsec on cp3010 is OK: Strongswan OK - 28 ESP OK [22:54:42] RECOVERY - IPsec on cp4004 is OK: Strongswan OK - 28 ESP OK [22:54:57] 06Operations, 10Ops-Access-Requests, 06Research-and-Data: Cluster Access for Nithum Thain - https://phabricator.wikimedia.org/T157724#3015230 (10ellery) Thanks @RobH. Nithum signed an NDA that was approved by Manprit, Dario and Wes. I pointed Nithum to this ticket and asked him to complete the tasks you list... [22:55:02] RECOVERY - IPsec on cp3008 is OK: Strongswan OK - 28 ESP OK [22:55:12] RECOVERY - IPsec on cp4001 is OK: Strongswan OK - 28 ESP OK [22:56:26] 06Operations, 10Ops-Access-Requests, 06Research-and-Data: Cluster Access for Nithum Thain - https://phabricator.wikimedia.org/T157724#3015233 (10RobH) @ellery: Thanks for the info! I still think our new guidelines mean we have to have a legal person confirm, but I'll find out! 
[22:57:29] !log cp2006: unresponsive control, powercycled from racadm, normal boot, no evidence in logs - repooling for now [22:57:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:58:04] !log bblack@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2006.codfw.wmnet [22:58:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:02:22] RECOVERY - puppet last run on mc1023 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [23:10:17] 06Operations, 10Ops-Access-Requests, 06Research-and-Data: Cluster Access for Nithum Thain - https://phabricator.wikimedia.org/T157724#3015283 (10Nithum) @RobH: Thanks for all the help! I've signed the L3 document, @ellery attached the public ssh key and I'd like to use the e-mail address nthain@google.com.... [23:14:23] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Puppet changes required for elasticsearch 5.x upgrade - https://phabricator.wikimedia.org/T155578#3015302 (10EBernhardson) With the slowlog and merge_threads settings moving out of the configuration file they have to all be explicitl... [23:15:09] (03PS2) 10EBernhardson: Configure cirrus per-index settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336936 (https://phabricator.wikimedia.org/T155578) [23:16:22] RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [23:45:10] (03CR) 10Cmjohnson: [C: 032] Adding elastic1048-52 to netboot cfg [puppet] - 10https://gerrit.wikimedia.org/r/336907 (owner: 10Cmjohnson) [23:49:32] (03CR) 10Chad: "I'm wondering if I should hide this behind --keep-static as well so we don't prune the branch until we're ready to totally drop the branch" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336901 (owner: 10Chad)