[00:01:53] PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:01:57] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:02:41] PROBLEM - Request latencies on acrab is CRITICAL: instance=10.192.16.26:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:03:59] RECOVERY - Request latencies on acrab is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:05:51] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:12:17] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:13:29] RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:14:51] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:21:13] PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:21:17] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:22:29] RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:23:51] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[00:58:29] PROBLEM - puppet last run on analytics1061 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:08:31] PROBLEM - puppet last run on db1074 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:14:02] (PS7) Alex Monk: Re-combine labs and production exim minimal config [puppet] - https://gerrit.wikimedia.org/r/439774
[01:30:07] RECOVERY - puppet last run on analytics1061 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[01:34:03] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:40:11] RECOVERY - puppet last run on db1074 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[02:00:27] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[02:13:05] PROBLEM - puppet last run on db1122 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:39:31] RECOVERY - puppet last run on db1122 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:07:57] PROBLEM - puppet last run on mw1339 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:34:19] RECOVERY - puppet last run on mw1339 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:48:03] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:14:23] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:27:21] PROBLEM - puppet last run on analytics1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:28:27] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 45 probes of 400 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[04:33:45] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 17 probes of 400 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[04:53:43] RECOVERY - puppet last run on analytics1063 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:02:43] PROBLEM - WDQS HTTP Port on wdqs1009 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 0.001 second response time
[05:03:15] PROBLEM - Blazegraph Port for wdqs-categories on wdqs1009 is CRITICAL: connect to address 127.0.0.1 and port 9990: Connection refused
[05:03:19] PROBLEM - Blazegraph Port for wdqs-blazegraph on wdqs1009 is CRITICAL: connect to address 127.0.0.1 and port 9999: Connection refused
[05:03:35] PROBLEM - Blazegraph process -wdqs-categories- on wdqs1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 499 (blazegraph), regex args ^java .* --port 9990 .* blazegraph-service-.*war
[05:03:55] PROBLEM - Check systemd state on wdqs1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:04:25] PROBLEM - Blazegraph process -wdqs-blazegraph- on wdqs1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 499 (blazegraph), regex args ^java .* --port 9999 .* blazegraph-service-.*war
[05:04:32] Crap
[05:04:54] I'm checking this
[05:09:35] RECOVERY - Blazegraph process -wdqs-blazegraph- on wdqs1009 is OK: PROCS OK: 1 process with UID = 499 (blazegraph), regex args ^java .* --port 9999 .* blazegraph-service-.*war
[05:13:27] PROBLEM - Blazegraph process -wdqs-blazegraph- on wdqs1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 499 (blazegraph), regex args ^java .* --port 9999 .* blazegraph-service-.*war
[05:19:59] RECOVERY - Blazegraph Port for wdqs-categories on wdqs1009 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9990
[05:20:21] RECOVERY - Blazegraph process -wdqs-categories- on wdqs1009 is OK: PROCS OK: 1 process with UID = 499 (blazegraph), regex args ^java .* --port 9990 .* blazegraph-service-.*war
[05:20:41] RECOVERY - Check systemd state on wdqs1009 is OK: OK - running: The system is fully operational
[05:20:45] RECOVERY - WDQS HTTP Port on wdqs1009 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.058 second response time
[05:21:11] RECOVERY - Blazegraph process -wdqs-blazegraph- on wdqs1009 is OK: PROCS OK: 1 process with UID = 499 (blazegraph), regex args ^java .* --port 9999 .* blazegraph-service-.*war
[05:21:21] RECOVERY - Blazegraph Port for wdqs-blazegraph on wdqs1009 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999
[05:32:00] Operations, Discovery-Search, Wikidata, Wikidata-Query-Service: Blazegraph and updater failed on wdqs1009 - https://phabricator.wikimedia.org/T219052 (Mathew.onipe)
[06:31:49] PROBLEM - puppet last run on db1079 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/x509-bundle]
[06:58:09] RECOVERY - puppet last run on db1079 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:04:38] (CR) D3r1ck01: "nit" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/490633 (https://phabricator.wikimedia.org/T214557) (owner: WMDE-leszek)
[09:05:09] (CR) D3r1ck01: "nit" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/490104 (https://phabricator.wikimedia.org/T214557) (owner: WMDE-leszek)
[09:48:26] (PS10) Alex Monk: Allow acme-chief to provide unified cert [puppet] - https://gerrit.wikimedia.org/r/497929 (https://phabricator.wikimedia.org/T182927)
[12:30:20] (PS1) Mholloway: Fix: Point WikimediaEditorTasks at the correct DB [mediawiki-config] - https://gerrit.wikimedia.org/r/498637
[12:32:56] (CR) Mholloway: [C: +2] Fix: Point WikimediaEditorTasks at the correct DB [mediawiki-config] - https://gerrit.wikimedia.org/r/498637 (owner: Mholloway)
[12:34:01] (Merged) jenkins-bot: Fix: Point WikimediaEditorTasks at the correct DB [mediawiki-config] - https://gerrit.wikimedia.org/r/498637 (owner: Mholloway)
[12:36:42] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: Fix WikimediaEditorTasks Beta Cluster DB config (duration: 00m 52s)
[12:36:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:13] (CR) jenkins-bot: Fix: Point WikimediaEditorTasks at the correct DB [mediawiki-config] - https://gerrit.wikimedia.org/r/498637 (owner: Mholloway)
[12:43:18] jouncebot: next
[12:43:18] In 45 hour(s) and 46 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190325T1030)
[12:43:24] jouncebot: now
[12:43:24] No deployments scheduled for the next 45 hour(s) and 46 minute(s)
[12:57:40] (PS1) Mholloway: Fix: Point ReadingLists & WikimediaEditorTasks at the correct DB [mediawiki-config] - https://gerrit.wikimedia.org/r/498638
[12:58:52] (CR) Mholloway: [C: +2] Fix: Point ReadingLists & WikimediaEditorTasks at the correct DB [mediawiki-config] - https://gerrit.wikimedia.org/r/498638 (owner: Mholloway)
[12:59:56] (Merged) jenkins-bot: Fix: Point ReadingLists & WikimediaEditorTasks at the correct DB [mediawiki-config] - https://gerrit.wikimedia.org/r/498638 (owner: Mholloway)
[13:02:22] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: Fix WikimediaEditorTasks Beta Cluster DB config, take 2 (duration: 00m 50s)
[13:02:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:06] (CR) jenkins-bot: Fix: Point ReadingLists & WikimediaEditorTasks at the correct DB [mediawiki-config] - https://gerrit.wikimedia.org/r/498638 (owner: Mholloway)
[13:07:05] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:23:00] (PS1) Alex Monk: uwsgi::app: Handle ensure absent [puppet] - https://gerrit.wikimedia.org/r/498641
[13:23:43] (CR) jerkins-bot: [V: -1] uwsgi::app: Handle ensure absent [puppet] - https://gerrit.wikimedia.org/r/498641 (owner: Alex Monk)
[13:24:27] (CR) Alex Monk: Expose some PuppetDB values to netmon via microservice (1 comment) [puppet] - https://gerrit.wikimedia.org/r/496836 (https://phabricator.wikimedia.org/T212526) (owner: CRusnov)
[13:28:11] (PS2) Alex Monk: uwsgi::app: Handle ensure absent [puppet] - https://gerrit.wikimedia.org/r/498641
[13:28:39] (CR) jerkins-bot: [V: -1] uwsgi::app: Handle ensure absent [puppet] - https://gerrit.wikimedia.org/r/498641 (owner: Alex Monk)
[13:29:35] (CR) Alex Monk: "uh:" [puppet] - https://gerrit.wikimedia.org/r/498641 (owner: Alex Monk)
[13:30:37] PROBLEM - puppet last run on cloudvirt1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:33:28] (PS1) Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/498642
[13:33:31] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[13:35:18] (CR) Paladox: [V: +2 C: +2] Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/498642 (owner: Paladox)
[13:35:34] (Abandoned) Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/498642 (owner: Paladox)
[13:36:00] (PS1) Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/498643
[13:36:11] (CR) Paladox: [V: +2 C: +2] Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/498643 (owner: Paladox)
[13:36:29] (PS2) Paladox: Update image-diff plugin [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/498427
[13:48:28] (PS1) Alex Monk: ferm::service: Allow ensure absent without proto/port [puppet] - https://gerrit.wikimedia.org/r/498645
[13:49:23] (CR) jerkins-bot: [V: -1] ferm::service: Allow ensure absent without proto/port [puppet] - https://gerrit.wikimedia.org/r/498645 (owner: Alex Monk)
[13:57:03] RECOVERY - puppet last run on cloudvirt1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:22:34] (PS3) Paladox: Add readonly plugin [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/498448
[14:22:59] (PS4) Paladox: Add readonly plugin [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/498448
[15:36:59] (CR) Paladox: [V: +2 C: +2] "Verified locally that it builds." [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/498448 (owner: Paladox)
[15:40:16] (PS1) Paladox: Add readonly plugin [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/498651
[15:41:19] (PS2) Paladox: Add readonly plugin [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/498651
[15:42:04] (PS3) Paladox: Add readonly plugin [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/498651
[15:42:19] (CR) Paladox: [V: +2 C: +2] Add readonly plugin [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/498651 (owner: Paladox)
[16:26:35] Operations, Discovery-Search, Wikidata, Wikidata-Query-Service: Blazegraph and updater failed on wdqs1009 - https://phabricator.wikimedia.org/T219052 (Smalyshev) Open→Resolved a:Smalyshev Sorry, my fault - I checked broken commit into deploy repo, and wdq9 deploys automatically from i...
[16:46:55] PROBLEM - puppet last run on kraz is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:18:35] RECOVERY - puppet last run on kraz is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[17:39:23] (PS3) Alex Monk: uwsgi::app: Handle ensure absent [puppet] - https://gerrit.wikimedia.org/r/498641
[17:40:04] (CR) jerkins-bot: [V: -1] uwsgi::app: Handle ensure absent [puppet] - https://gerrit.wikimedia.org/r/498641 (owner: Alex Monk)
[17:41:05] Operations, Cloud-VPS, Toolforge, LDAP, and 2 others: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280 (GTirloni) After Toolforge Stretch was moved away from seaborgium/serpens, these two servers stopped exhibiting the memor...
[18:06:31] PROBLEM - eventstreams on scb1001 is CRITICAL: connect to address 10.64.0.16 and port 8092: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/eventstreams
[18:08:35] PROBLEM - graphoid endpoints health on scb1001 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[18:09:05] RECOVERY - eventstreams on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1043 bytes in 0.004 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/eventstreams
[18:09:45] RECOVERY - graphoid endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[18:10:12] (PS1) Alex Monk: nrpe::monitor_service: Allow ensure absent without description/nrpe_command [puppet] - https://gerrit.wikimedia.org/r/498655
[18:11:05] (CR) jerkins-bot: [V: -1] nrpe::monitor_service: Allow ensure absent without description/nrpe_command [puppet] - https://gerrit.wikimedia.org/r/498655 (owner: Alex Monk)
[18:19:47] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Puppet has 18 failures. Last run 3 minutes ago with 18 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[cpjobqueue/deploy],Exec[chown /srv/deployment/cpjobqueue for deploy-service],Package[recommendation-api/deploy]
[18:35:35] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[18:40:13] (PS2) Ammarpad: Wikimaniawiki: Enable visual editor in 2019 namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/497682 (https://phabricator.wikimedia.org/T218645)
[19:27:01] (PS1) Alaa Sarhan: Add wgScoreLineWidthInches to labs config [mediawiki-config] - https://gerrit.wikimedia.org/r/498660 (https://phabricator.wikimedia.org/T218191)
[19:28:31] (CR) jerkins-bot: [V: -1] Add wgScoreLineWidthInches to labs config [mediawiki-config] - https://gerrit.wikimedia.org/r/498660 (https://phabricator.wikimedia.org/T218191) (owner: Alaa Sarhan)
[19:29:07] (PS1) Alaa Sarhan: Add wgScoreLineWidthInches to config [mediawiki-config] - https://gerrit.wikimedia.org/r/498661 (https://phabricator.wikimedia.org/T218191)
[19:58:20] (PS2) Alaa Sarhan: Add wgScoreLineWidthInches to labs config [mediawiki-config] - https://gerrit.wikimedia.org/r/498660 (https://phabricator.wikimedia.org/T218191)
[19:59:22] (CR) jerkins-bot: [V: -1] Add wgScoreLineWidthInches to labs config [mediawiki-config] - https://gerrit.wikimedia.org/r/498660 (https://phabricator.wikimedia.org/T218191) (owner: Alaa Sarhan)
[20:00:50] (PS3) Alaa Sarhan: Add wgScoreLineWidthInches to labs config [mediawiki-config] - https://gerrit.wikimedia.org/r/498660 (https://phabricator.wikimedia.org/T218191)
[20:44:05] (PS2) Alex Monk: nrpe::monitor_service: Allow ensure absent without description/nrpe_command [puppet] - https://gerrit.wikimedia.org/r/498655
[20:44:59] (CR) jerkins-bot: [V: -1] nrpe::monitor_service: Allow ensure absent without description/nrpe_command [puppet] - https://gerrit.wikimedia.org/r/498655 (owner: Alex Monk)
[21:24:54] Operations, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): Ferm rules for labstore NFS hosts - https://phabricator.wikimedia.org/T165136 (GTirloni)
[21:25:39] Operations, Data-Services, Tracking, cloud-services-team (Kanban): overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083 (GTirloni)
[21:38:46] Operations, Data-Services, cloud-services-team (Kanban): Convert labstore cluster configuration to hiera and profiles - https://phabricator.wikimedia.org/T161835 (GTirloni)
[21:42:28] Operations, Data-Services, video2commons, cloud-services-team (Kanban): Consider mounting labs NFS labstore1003.eqiad.wmnet:/scratch for server-side uploads - https://phabricator.wikimedia.org/T153068 (GTirloni)
[21:42:48] Puppet, Toolforge, Goal, Patch-For-Review, cloud-services-team (Kanban): Fully puppetize Grid Engine - https://phabricator.wikimedia.org/T88711 (GTirloni)
[21:47:52] Operations, Wikidata, Wikidata-Query-Service, User-Smalyshev, cloud-services-team (Kanban): Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service - https://phabricator.wikimedia.org/T206636 (GTirloni)
[21:52:57] Operations, Operations-Software-Development, Continuous-Integration-Config, Patch-For-Review, cloud-services-team (Kanban): Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169 (GTirloni)
[21:53:41] Operations, Data-Services, cloud-services-team (Kanban): evaluate possibility for nscd use with useldap - https://phabricator.wikimedia.org/T124991 (GTirloni)
[22:41:41] PROBLEM - puppet last run on dbproxy1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:08:03] RECOVERY - puppet last run on dbproxy1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:10:51] Operations, Cloud-VPS, Toolforge, LDAP, and 2 others: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280 (Krenair) > ...; or > Production traffic also causes memory leaks Or non-production non-toolforge traffic :)
[23:14:03] (PS3) Alex Monk: nrpe::monitor_service and monitoring::service: Allow ensure absent without description/nrpe_command [puppet] - https://gerrit.wikimedia.org/r/498655
[23:14:44] (CR) jerkins-bot: [V: -1] nrpe::monitor_service and monitoring::service: Allow ensure absent without description/nrpe_command [puppet] - https://gerrit.wikimedia.org/r/498655 (owner: Alex Monk)
[23:18:53] PROBLEM - puppet last run on an-worker1090 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:27:39] PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:45:17] RECOVERY - puppet last run on an-worker1090 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:59:15] RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures