[00:01:51] (03CR) 10Paladox: [C: 031] gerrit-ssh: don't listen on all interfaces, disable on slaves [puppet] - 10https://gerrit.wikimedia.org/r/354074 (owner: 10Dzahn) [00:02:25] 10Operations, 10Traffic, 10netops: Japanese hotel resolving to esams and going the long way round - https://phabricator.wikimedia.org/T178726#3700933 (10Reedy) p:05Triage>03Low Mobile WiFi on the bus I'm currently on is resolving to ulsfo... So not broken for the whole country ;) Hopefully just one ISP [00:02:56] (03CR) 10Dzahn: [C: 031] "http://puppet-compiler.wmflabs.org/8408/" [puppet] - 10https://gerrit.wikimedia.org/r/354074 (owner: 10Dzahn) [00:04:00] (03PS11) 10Dzahn: gerrit-ssh: don't listen on all interfaces, disable on slaves [puppet] - 10https://gerrit.wikimedia.org/r/354074 [00:04:02] (03CR) 10Paladox: [C: 04-1] "Fails the diff, it is removing the port." [puppet] - 10https://gerrit.wikimedia.org/r/354074 (owner: 10Dzahn) [00:04:13] mutante you removed the port [00:04:14] :) [00:04:35] (03CR) 10Paladox: [C: 031] gerrit-ssh: don't listen on all interfaces, disable on slaves (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/354074 (owner: 10Dzahn) [00:04:45] (03PS12) 10Dzahn: gerrit-ssh: don't listen on all interfaces, disable on slaves [puppet] - 10https://gerrit.wikimedia.org/r/354074 [00:05:03] paladox: yes, i did [00:05:09] 19:57 < mutante> :'port' may be omitted to use the default of 29418. [00:05:16] oh ok [00:05:35] I prefer being explicit :) [00:06:35] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds [00:06:38] (03PS13) 10Dzahn: gerrit-ssh: don't listen on all interfaces, disable on slaves [puppet] - 10https://gerrit.wikimedia.org/r/354074 [00:06:44] ok.. ok. it's back [00:06:57] (03CR) 10Paladox: [C: 031] gerrit-ssh: don't listen on all interfaces, disable on slaves [puppet] - 10https://gerrit.wikimedia.org/r/354074 (owner: 10Dzahn) [00:07:01] paladox: wow, i used inline editor :p [00:07:04] :) [00:08:31] no_justification wondering, could you add me to this https://gerrit-review.googlesource.com/#/admin/groups/uuid-819ed1064786ed5c11fc9a1fe617b0103fd18d03 group please. [00:08:35] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 579 bytes in 16.392 second response time [00:09:13] paladox: Nope cuz it won't load for me :p [00:09:17] oh [00:09:17] Stuck "Loading...." [00:09:20] Hehe [00:09:25] i thought you had +2 on there [00:09:27] Also it's 5pm on a friday, it's beer o'clock :) [00:09:28] ah [00:09:36] it's probably slow. [00:09:43] and ok [00:09:47] admin/groups won't even load for me rn [00:10:14] Ah at the bottom: "Server Error: uuid-819ed1064786ed5c11fc9a1fe617b0103fd18d03" [00:10:22] Soooo, probably don't have permission to view? [00:10:24] oh i see [00:11:41] Funny, this works: https://gerrit-review.googlesource.com/admin/groups/904 [00:11:45] Just not by UUID [00:11:50] Prolly a bug?
:) [00:12:30] oh lol [00:12:32] it's a poly bug [00:12:38] i designed the page [00:12:44] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds [00:12:53] Anyway, remind me monday or something [00:12:55] * no_justification dips out [00:12:58] ok [00:13:34] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 579 bytes in 6.990 second response time [00:14:24] * paladox looks for a fix for the poly bug [00:14:35] ah it's the router [00:14:55] not actually a network router, the routing component in poly is just called that. [00:19:29] 10Operations, 10Cloud-Services, 10Community-Wikimetrics, 10DBA, and 2 others: Evaluate future of wmf puppet module "mysql" - https://phabricator.wikimedia.org/T165625#3272009 (10Dzahn) I recently looked at this and found that quarry is one of the few (or the only) modules currently using this. I attempted... [00:22:34] 10Operations, 10Cloud-Services, 10Community-Wikimetrics, 10DBA, and 2 others: Evaluate future of wmf puppet module "mysql" - https://phabricator.wikimedia.org/T165625#3700943 (10Dzahn) ``` modules/quarry/manifests/database.pp: class { '::mysql::server': modules/role/manifests/wikimetrics/staging.pp:... [00:23:10] Can anyone purge the mobile app's cache? [00:23:17] currently showing a big ol' porn image [00:23:59] maybe mutante ? [00:24:24] https://wikitech.wikimedia.org/wiki/MobileFrontend#Flushing_the_cache perhaps [00:24:33] sorry, i don't know how [00:24:47] but you said app? [00:25:23] yea [00:26:01] the docs reference "fenari". that's not a good sign [00:26:13] because that hasn't existed for years [00:26:33] that's...definitely not the right cache [00:27:00] it's the mobileapps service I believe [00:27:18] https://wikitech.wikimedia.org/wiki/Mobileapps_(service) isn't very helpful :( [00:27:59] yeah, the cache on the article itself seems fine, it's the main page that's the real problem [00:28:04] so we should contact one of https://www.mediawiki.org/wiki/Wikimedia_Reading_Infrastructure_team ? [00:28:43] Yeah, probably, tzatziki is pinging Josh too who may have a sense [00:29:16] tzatziki: Which page? If it's a single page then usually an action=purge does help. [00:29:32] bearND: the main page [00:29:33] If it's part of the feed that's another story. [00:29:34] 10Operations, 10Cloud-Services, 10Community-Wikimetrics, 10DBA, and 2 others: Evaluate future of wmf puppet module "mysql" - https://phabricator.wikimedia.org/T165625#3272009 (10zhuyifei1999) >>! In T165625#3700943, @Dzahn wrote: > quarry (it's on trusty and has the precise repos, how does that even work r... [00:29:35] on the iOS app [00:29:50] tzatziki: what language? [00:30:16] bearND: English main feed I think. It's actually showing the old featured article (on the enWP page that changed) [00:30:27] (changed at midnight UTC) [00:31:03] the actual article is fine/not showing the image, it's the front page/feed cache that's the issue [00:32:01] Jamesofur: oh, ok. I see it now. It's the previous day's TFA. [00:32:14] yeah [00:32:17] no_justification: https://gerrit-review.googlesource.com/#/c/gerrit/+/135550/ :) [00:33:25] probably similar to https://phabricator.wikimedia.org/T174993 [00:34:51] since it's pageimages related.
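(For reference: the action=purge step bearND mentions above goes through the standard MediaWiki API. A minimal sketch, using the TFA article from this incident as an example title; anonymous purges are accepted, but the API requires a POST request:

    # purge the parser cache / rendered page for a single title
    curl -X POST 'https://en.wikipedia.org/w/api.php?action=purge&titles=Boogeyman_2&format=json'

This only clears MediaWiki's own cache for that page, which is why it doesn't help here when the stale content lives in the feed.)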
[00:35:57] (03PS1) 10Dzahn: rm requesttracker::labs class [puppet] - 10https://gerrit.wikimedia.org/r/385495 [00:40:35] bearND: Yeah, could be, https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&pilicense=any&titles=Boogeyman_2 seems to show the right image atm though not sure if it always has (just checked for the first time) [00:41:00] Not sure why it's still showing up in the app. https://en.wikipedia.org/api/rest_v1/feed/featured/2017/10/20 seems to be updated to show the correct thumbnail. [00:42:11] ok, on my desktop browser it's updated, but my Android device still shows the old version. [00:42:23] bearND: hmmm, weirdly when I click that link it is not updated [00:42:31] (on my laptop not my phone) [00:42:39] has the porn pic [00:42:49] no_justification bug is fixed now heh, just need to get upstream to merge. [00:43:51] * Josve05a is getting a few emails about the P0rn image at OTRS..guessing iOS cache/fetch issue from a few weeks ago? [00:45:00] Josve05a: Already reported, people are looking into it [00:45:04] Ty though [00:45:28] Yeah, just wanted to check if it is the same issue, so I know how to respond [00:45:31] I'm wondering if it's something in Varnish. I've purged that thing in RESTBase storage. [00:45:54] curl -H 'Cache-Control: no-cache' http://restbase1007.eqiad.wmnet:7231/en.wikipedia.org/v1/feed/featured/2017/10/20 [00:46:22] when I refresh the page it seems to have worked [00:47:03] I think tzatziki just saw it work on the ios app too \o/ [00:47:32] Yeah it seems to be good now!! [00:48:06] Ah, good. Back to dinner then. :) [00:48:31] thanks bearND ! [00:48:46] I just did a clean install of the beta version of the iOS app, it is fixed there too :) [00:49:00] Basically got to run this from the cluster to purge the feed content.
That's also documented a bit here: https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps/Deployment_process#Troubleshooting_.26_Restarting_services [00:50:33] Pchelolo: mobrovac ^^ see backscroll about more vandalism (possibly another episode of https://phabricator.wikimedia.org/T174993) [00:51:57] thanks so much bearND|afk :D [01:04:14] PROBLEM - Check health of redis instance on 6481 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1508547849 600 - REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 4202408 keys, up 4 minutes 6 seconds - replication_delay is 1508547849 [01:04:15] PROBLEM - Check health of redis instance on 6480 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1508547849 600 - REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 4205057 keys, up 4 minutes 6 seconds - replication_delay is 1508547849 [01:04:44] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1508547880 600 - REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 4200888 keys, up 4 minutes 37 seconds - replication_delay is 1508547880 [01:05:45] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 4201107 keys, up 5 minutes 37 seconds - replication_delay is 0 [01:06:15] RECOVERY - Check health of redis instance on 6481 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 4195972 keys, up 6 minutes 7 seconds - replication_delay is 0 [01:06:15] RECOVERY - Check health of redis instance on 6480 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 4198865 keys, up 6 minutes 7 seconds - replication_delay is 0 [01:13:24] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [02:33:24] PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [02:41:39] That's a lot of WriteThrough [02:43:24] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [03:13:24] PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [03:26:14] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 785.13 seconds [04:00:24] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 256.84 seconds [04:25:41] (03CR) 10Dzahn: extdist: use profile::labs::lvm::srv instead of role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/385477 (owner: 10Hashar) [04:27:04] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [140.0] [04:36:38] Does anyone here have any contact with someone at Google (involved with Google Knowledge Graph perhaps)? I know someone said in a Wikipedia group on Facebook to contact him if there were any issues with it, but I can't find him...
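(The replication_delay values in the rdb2005 alerts above are read from Redis itself; a minimal manual check on the affected host, assuming redis-cli is available and the instance accepts unauthenticated local connections — production instances may need -a <password>:

    # show the replication role and lag for the instance on port 6481
    redis-cli -h 127.0.0.1 -p 6481 info replication

The alerts cleared on their own here because the instances had just restarted, "up 4 minutes 6 seconds", and caught up within a couple of minutes.)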
[05:23:24] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [05:30:40] (03CR) 10Dzahn: [C: 04-2] "too old now to still rebase and be useful? as you can see i tried this back in May but yea..it's messy now" [puppet] - 10https://gerrit.wikimedia.org/r/355156 (owner: 10Dzahn) [05:32:14] (03Abandoned) 10Dzahn: contint: role/profile conversion [puppet] - 10https://gerrit.wikimedia.org/r/355156 (owner: 10Dzahn) [05:44:00] (03CR) 10Dzahn: "also one reason to delete this is that it's one of the few things left using the mysql class (T165625)" [puppet] - 10https://gerrit.wikimedia.org/r/385495 (owner: 10Dzahn) [05:51:12] (03PS10) 10Dzahn: gerrit: let Apache proxy only listen on service IP [puppet] - 10https://gerrit.wikimedia.org/r/354078 [05:52:18] (03CR) 10Dzahn: [C: 04-1] gerrit: let Apache proxy only listen on service IP (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/354078 (owner: 10Dzahn) [05:53:24] PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [06:04:47] (03PS11) 10Dzahn: gerrit: let Apache proxy only listen on service IP [puppet] - 10https://gerrit.wikimedia.org/r/354078 [06:05:17] (03CR) 10jerkins-bot: [V: 04-1] gerrit: let Apache proxy only listen on service IP [puppet] - 10https://gerrit.wikimedia.org/r/354078 (owner: 10Dzahn) [06:09:22] (03PS12) 10Dzahn: gerrit: let Apache proxy only listen on service IP [puppet] - 10https://gerrit.wikimedia.org/r/354078 [06:11:27] (03CR) 10Dzahn: "re: comments on Freddy's change: he did nothing wrong, we were at Wikimania and i was showing him Gerrit and it was part of a demo in a wo" [puppet] - 10https://gerrit.wikimedia.org/r/354078 (owner: 10Dzahn) [06:15:10] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/8409/" [puppet] - 10https://gerrit.wikimedia.org/r/354078 (owner: 10Dzahn) [06:53:24] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [07:23:24] PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [08:08:18] !log Stopping Zuul to flush its queue [08:08:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:16:29] !log Mass code-review+2 changes made by LibraryUpdater that already had Code-Review:+2 and NOT verified=-1 ( ping legoktm ) [08:16:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:20:49] (it is harmless mostly. 
We now have a bot that massively sends patches to mediawiki extensions) [08:20:54] should be fine now [08:22:34] RECOVERY - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] [08:23:24] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [08:26:08] Work requests waiting in Zuul Gearman server -> it will come back eventually but I will monitor it over the next few hours [08:26:13] good weekend [09:43:25] PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [10:23:25] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [10:53:25] PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [11:17:28] 10Operations, 10Traffic, 10netops: Japanese hotel resolving to esams and going the long way round - https://phabricator.wikimedia.org/T178726#3701226 (10Reedy) [11:18:54] 10Operations, 10Traffic, 10netops: Japanese hotel resolving to esams and going the long way round - https://phabricator.wikimedia.org/T178726#3700842 (10Reedy) 05Open>03Invalid Seems this may have "resolved itself" Now: ``` $ dig en.wikipedia.org ; <<>> DiG 9.9.7-P3 <<>> en.wikipedia.org ;; global opti... [11:21:47] Hi [11:21:50] I have one question [11:22:06] Should I add this patch: https://gerrit.wikimedia.org/r/#/c/385771/ to the deployments table or not? [11:22:22] Optional, you can deploy this right now. :D [11:27:28] no, that is for core so it can go out in the next deployment cycle [11:27:48] s/cycle/train/ [11:30:32] when will the patch be deployed? [11:31:57] when the train next happens, it's just removing an unused language so it's not urgent [11:33:43] although i'm not sure removing it is correct, as we still have wikis using it according to that task [11:35:45] PROBLEM - MariaDB Slave Lag: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 180056.95 seconds [11:49:50] 10Operations, 10media-storage, 10User-fgiunchedi: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." - https://phabricator.wikimedia.org/T173374#3701236 (10Jcb) I have tried to delete the files several times in the past few days, but I cannot.
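(The dbstore1001 slave lag alert above reports slave_sql_lag — here roughly two days behind. A minimal sketch of checking it by hand on the replica, assuming shell and local database access:

    # Seconds_Behind_Master approximates the SQL-thread lag that icinga reports
    sudo mysql -e 'SHOW SLAVE STATUS\G' | grep -i seconds_behind

dbstore hosts run with intentionally delayed replication for some sections, so a large value is not by itself an emergency.)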
[12:33:25] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [13:13:25] PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [13:43:25] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [14:13:24] PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [14:37:15] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 38 probes of 281 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [14:42:15] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 8 probes of 281 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [15:29:36] 10Operations, 10Goal, 10Technical-Debt, 10User-fgiunchedi: Reduce technical debt in metrics monitoring - https://phabricator.wikimedia.org/T177195#3701369 (10Dzahn) [15:29:39] 10Operations, 10monitoring, 10Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3701368 (10Dzahn) [15:33:25] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [16:03:25] PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [16:05:57] 10Operations, 10ops-eqiad, 10Analytics: Possibly faulty BBU on analytics1029 - https://phabricator.wikimedia.org/T178742#3701391 (10elukey) [16:08:18] 10Operations, 10ops-eqiad, 10Analytics: Possibly faulty BBU on analytics1029 - https://phabricator.wikimedia.org/T178742#3701403 (10elukey) Tried with `sudo megacli -AdpBbuCmd -BbuLearn -aALL` but the battery state seems still to be unknown and not charging :( @Cmjohnson hi! I think that we might need a new... 
[16:08:28] going to ack the alarms for an1029 [16:09:05] ACKNOWLEDGEMENT - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough Elukey Probably faulty BBU battery - T178742 [16:54:41] (03PS1) 10Ladsgroup: labs: Disable reverted and wp10 in enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385794 [16:57:31] (03CR) 10Ladsgroup: [C: 032] "The ores in labs is broken and this is a labs thing only :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385794 (owner: 10Ladsgroup) [17:06:10] (03Merged) 10jenkins-bot: labs: Disable reverted and wp10 in enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385794 (owner: 10Ladsgroup) [17:21:46] (03CR) 10jenkins-bot: labs: Disable reverted and wp10 in enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385794 (owner: 10Ladsgroup) [18:03:25] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [18:43:25] PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough [18:53:39] acked again and set downtime to avoid spam --^ [20:33:25] RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy [21:39:55] PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:09:55] RECOVERY - puppet last run on cp3035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
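(On the MegaRAID WriteThrough/WriteBack flapping that runs through this whole log: controllers typically fall back to WriteThrough when the BBU is failed or relearning, which fits the faulty-battery theory in T178742. A minimal sketch of inspecting this by hand, assuming the megacli binary is installed; flag casing varies between MegaCli versions:

    # show the current write cache policy for every logical drive on every adapter
    sudo megacli -LDGetProp -Cache -LAll -aAll
    # show battery state; "No Write Cache if Bad BBU" forces WriteThrough until the BBU is healthy
    sudo megacli -AdpBbuCmd -GetBbuStatus -aAll

The repeated OK/CRITICAL cycle here matches a battery that intermittently reports healthy, so acking and replacing the BBU, as elukey did, is the right call.)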