[00:00:06] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:02:27] PROBLEM - pdfrender on scb2003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:12:07] PROBLEM - puppet last run on planet1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:13:27] PROBLEM - puppet last run on analytics1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:15:47] RECOVERY - puppet last run on ores1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [00:16:37] RECOVERY - puppet last run on mc1029 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:17:28] RECOVERY - puppet last run on labvirt1006 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [00:17:36] PROBLEM - puppet last run on mc1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:18:36] RECOVERY - puppet last run on etcd1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:21:26] RECOVERY - puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:42:36] RECOVERY - puppet last run on planet1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:43:56] RECOVERY - puppet last run on analytics1043 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:45:26] PROBLEM - puppet last run on cp1073 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:48:06] RECOVERY - puppet last run on mc1019 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [00:57:57] PROBLEM - puppet last run on analytics1032 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [01:01:26] PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:02:17] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb={GET,LIST,PATCH,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [01:02:56] PROBLEM - etcd request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [01:03:36] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [01:03:46] PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb={LIST,PATCH,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [01:03:56] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb={LIST,PATCH,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [01:05:06] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [01:06:26] PROBLEM - puppet last run on kafka1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:10:46] RECOVERY - puppet last run on cp1073 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [01:23:16] PROBLEM - puppet last run on wtp1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:23:17] RECOVERY - puppet last run on analytics1032 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:25:26] PROBLEM - puppet last run on db1090 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [01:31:56] RECOVERY - puppet last run on kafka1013 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [01:31:57] RECOVERY - puppet last run on analytics1065 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [01:35:17] RECOVERY - pdfrender on scb2003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.074 second response time [01:35:26] PROBLEM - graphoid endpoints health on scb2003 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) timed out before a response was received [01:36:26] RECOVERY - graphoid endpoints health on scb2003 is OK: All endpoints are healthy [01:50:47] RECOVERY - puppet last run on db1090 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:53:46] RECOVERY - puppet last run on wtp1037 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [01:57:07] PROBLEM - High lag on wdqs2003 is CRITICAL: 3609 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [02:02:47] PROBLEM - puppet last run on ms-be1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:03:56] PROBLEM - puppet last run on prometheus1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:07:06] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [02:07:37] RECOVERY - etcd request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [02:08:06] PROBLEM - High lag on wdqs2003 is CRITICAL: 4040 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [02:08:17] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [02:08:27] RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [02:08:37] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [02:08:47] RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [02:12:47] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:26:56] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 22 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[zuul] [02:33:16] RECOVERY - puppet last run on ms-be1028 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [02:34:17] RECOVERY - puppet last run on prometheus1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [02:37:56] PROBLEM - puppet last run on restbase-dev1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:56:37] PROBLEM - puppet last run on mc1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:08:16] RECOVERY - puppet last run on restbase-dev1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [03:12:26] PROBLEM - puppet last run on wtp1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:13:37] RECOVERY - puppet last run on db1068 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [03:20:27] PROBLEM - puppet last run on multatuli is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [03:24:06] PROBLEM - kubelet operational latencies on kubernetes2003 is CRITICAL: instance=kubernetes2003.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [03:24:16] PROBLEM - kubelet operational latencies on kubernetes2004 is CRITICAL: instance=kubernetes2004.codfw.wmnet https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [03:26:27] PROBLEM - puppet last run on elastic1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:26:57] RECOVERY - puppet last run on mc1027 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [03:27:27] PROBLEM - High lag on wdqs2003 is CRITICAL: 7065 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [03:28:17] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [03:28:27] RECOVERY - kubelet operational latencies on kubernetes2004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [03:29:16] PROBLEM - puppet last run on labpuppetmaster1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:30:27] RECOVERY - kubelet operational latencies on kubernetes2003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [03:31:07] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 838.45 seconds [03:33:32] Hi, can anyone tell me if there are some general alarms or monitoring if a box has disk issues? 
[03:34:46] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [03:35:07] _joe_: ^ ? [03:36:57] mark: ^ ? [03:37:29] Seems like a box on the fundraising cluster is having disk trouble. Not sure if there's much that can be done, though [03:37:35] I think it's in eqiad [03:37:56] PROBLEM - puppet last run on elastic1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:39:06] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [03:42:56] RECOVERY - puppet last run on wtp1032 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [03:47:16] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 278.94 seconds [03:49:16] PROBLEM - puppet last run on mw1329 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [03:50:47] RECOVERY - puppet last run on multatuli is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [03:51:16] RECOVERY - High lag on wdqs2003 is OK: (C)3600 ge (W)1200 ge 1163 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [03:56:27] PROBLEM - High lag on wdqs2003 is CRITICAL: 8157 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [03:56:56] RECOVERY - puppet last run on elastic1051 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [03:59:37] RECOVERY - puppet last run on labpuppetmaster1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [04:08:17] RECOVERY - puppet last run on elastic1020 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [04:11:16] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [04:12:37] RECOVERY - High lag on wdqs2003 is OK: (C)3600 ge (W)1200 ge 854 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [04:17:27] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [04:19:46] RECOVERY - puppet last run on mw1329 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [04:21:56] PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:24:46] PROBLEM - puppet last run on ms-be1024 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [04:29:14] !log temp depooled wdqs2003 [04:29:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:32:27] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [04:43:17] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [04:49:57] RECOVERY - puppet last run on ms-be1024 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [04:52:26] RECOVERY - puppet last run on analytics1057 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:19:47] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [05:23:07] PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [05:23:57] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [05:48:36] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:17:16] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [06:21:36] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [07:42:47] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [07:44:56] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [08:34:33] AndyRussG|a-whey: which host? 
we do have checks for the raid status and disk space, but I don't see any of those in alarm on FR hosts right now [08:36:23] there is only one warning for disk space on bellatrix fwiw [08:44:29] (PS1) Volans: cumin: fix alias query [puppet] - https://gerrit.wikimedia.org/r/465012 [09:04:36] (PS2) Volans: cumin: fix alias query [puppet] - https://gerrit.wikimedia.org/r/465012 [09:04:38] (PS1) Volans: cumin: alias checker, catch exception [puppet] - https://gerrit.wikimedia.org/r/465014 [09:57:48] (CR) Framawiki: [C: 1] Additional namespaces for Governance Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/464752 (https://phabricator.wikimedia.org/T206173) (owner: Varnent) [10:03:46] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [10:05:56] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [11:06:06] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [11:10:16] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [11:27:26] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [11:29:36] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the 
threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [13:21:26] PROBLEM - puppet last run on mw1258 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:33:36] PROBLEM - puppet last run on boron is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:48:06] PROBLEM - puppet last run on fermium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:50:37] PROBLEM - puppet last run on wtp1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:50:56] PROBLEM - High lag on wdqs2001 is CRITICAL: 3619 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [13:51:47] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:58:36] PROBLEM - puppet last run on mw1314 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:13:25] Operations, Gerrit, Traffic, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Paladox) note that the avatar would have to match the users username in gerrit. So using phabricator would not work seeing as the name of the file does not match the users username. 
[14:13:27] RECOVERY - puppet last run on fermium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:54] (PS11) Paladox: Gerrit: Setup avatars url in gerrit config [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) [14:21:06] RECOVERY - puppet last run on wtp1047 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:28:57] RECOVERY - puppet last run on mw1314 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:29:27] RECOVERY - puppet last run on boron is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:32:36] PROBLEM - High lag on wdqs2002 is CRITICAL: 3662 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [14:36:17] RECOVERY - High lag on wdqs2001 is OK: (C)3600 ge (W)1200 ge 377 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [14:41:46] PROBLEM - High lag on wdqs2001 is CRITICAL: 5765 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [14:42:06] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [14:49:10] volans: civi1001 [14:51:16] PROBLEM - puppet last run on mw1264 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:51:40] AndyRussG: all checks on icinga for that host are green, fwiw :) [14:52:50] !log repooling wdqs2003. 
Catched up on Lag and also Lag issues seems to be creeping on wdqs200[1|2] [14:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:07] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [14:58:28] volans: hmmm ok... I'm not the one who checked it. Quoting from Elliott (ejegg): "md3_raid1 and md3_resync have been in the top 3 in 'top' for a while." [14:59:16] Is there a dashboard or something that non-roots can see? [14:59:27] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [15:00:01] AndyRussG, are you looking for icinga.wm.o? [15:01:45] Krenair: maybe? [15:02:13] well what exactly are you looking for a dashboard of? [15:02:20] just the icinga checks? [15:03:45] Krenair: really I'm not sure. I just have ejegg's report (summarized above) ^ that it looks like there's some disk issues or something with civi1001 [15:04:27] Seems he's not around just now, but he did turn off a bunch of FR jobs [15:04:58] oh if it's frack I have no idea [15:05:04] sorry [15:06:08] Krenair: Mmm hey no worries, and thanks... 
It's part of the FR pipeline that I'm not as familiar with as others on the team [15:07:28] I got into https://icinga.wikimedia.org/icinga/, I see a lot of OK statuses for that box [15:08:06] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [15:11:47] Operations, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): The usual Lag pattern for wdqs2003 seems to be taking another turn - https://phabricator.wikimedia.org/T206423 (Mathew.onipe) [15:12:27] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [15:12:43] AndyRussG: my bet is on mdadm monthly cron that checks the raid [15:12:54] as I don't have access to FR hosts cannot say much more [15:14:22] /usr/share/mdadm/checkarray [15:16:23] AndyRussG: https://grafana.wikimedia.org/dashboard/db/fundraising-host-overview?panelId=6&fullscreen&orgId=1&from=1538851574213&to=1538925343094&var-server=civi1001.frack.eqiad.wmnet&var-datasource=frack.eqiad%20prometheus [15:17:18] volans: ok cool, that's really useful [15:17:27] AndyRussG: if you've access to the host check /etc/cron.d/mdadm to see if the time matches [15:17:29] I'm adding this to the e-mail thread that ejegg started [15:17:38] K I might, gonna check [15:20:13] volans: yep that's it! 
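(Editor's sketch, not part of the channel log.) The check suggested above, comparing the schedule in /etc/cron.d/mdadm with the time the load started, could be scripted roughly as follows. The sample cron content and the function name `checkarray_schedules` are assumptions made for illustration; the real file on civi1001 was not shown in the channel and may differ.

```python
# Hypothetical sample of /etc/cron.d/mdadm content (assumed, not copied from civi1001).
SAMPLE_CRON = """\
# cron.d/mdadm -- schedules periodic redundancy checks of MD devices
57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
"""


def checkarray_schedules(cron_text):
    """Return the five cron time fields of every job that mentions checkarray.

    Each result is a tuple (minute, hour, day_of_month, month, day_of_week).
    """
    schedules = []
    for line in cron_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and jobs unrelated to the RAID check.
        if not line or line.startswith("#") or "checkarray" not in line:
            continue
        # cron.d format: five time fields, then the user, then the command.
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue
        schedules.append(tuple(fields[:5]))
    return schedules


if __name__ == "__main__":
    for minute, hour, dom, month, dow in checkarray_schedules(SAMPLE_CRON):
        print(f"checkarray scheduled at {hour}:{minute.zfill(2)} "
              f"(day-of-month={dom}, month={month}, day-of-week={dow})")
```

On a real host one would read the file itself instead of `SAMPLE_CRON` and compare the reported time with the start of the load on the host dashboard.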
[15:20:19] woohooo :) [15:21:32] for more info you can search in syslog to see at which time it did each md array [15:21:46] look for something like "data-check of RAID array" [15:21:46] RECOVERY - puppet last run on mw1264 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:24:46] volans: okok cool beans [15:24:48] AndyRussG: if it affects normal operations the check could be tuned to use less resources btw [15:24:53] :) [15:25:04] Ah cool also relevant [15:25:07] PROBLEM - puppet last run on analytics1070 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:25:14] I don't know if that was the case, but I'll mention :) [15:28:36] RECOVERY - High lag on wdqs2001 is OK: (C)3600 ge (W)1200 ge 36 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [15:31:13] (PS2) Mathew.onipe: prometheus-blazegraph-exporter: added Query and Concurrency related counters [debs/prometheus-blazegraph-exporter] - https://gerrit.wikimedia.org/r/464854 (https://phabricator.wikimedia.org/T206123) [15:33:26] PROBLEM - puppet last run on analytics1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:33:57] PROBLEM - High lag on wdqs2001 is CRITICAL: 4631 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [15:37:56] PROBLEM - puppet last run on db1105 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:56] RECOVERY - High lag on wdqs2002 is OK: (C)3600 ge (W)1200 ge 56 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [15:41:47] PROBLEM - puppet last run on dns4002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:42:46] PROBLEM - puppet last run on snapshot1005 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:44:56] PROBLEM - puppet last run on db1061 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:45:25] volans: thx so much! [15:46:45] AndyRussG: no prob at all, anytime :) [15:46:57] :) [15:55:36] RECOVERY - puppet last run on analytics1070 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [16:01:17] PROBLEM - puppet last run on deploy1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:02:26] PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:03:56] RECOVERY - puppet last run on analytics1067 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [16:05:17] PROBLEM - puppet last run on dbproxy1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:08:16] RECOVERY - puppet last run on db1105 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [16:12:16] RECOVERY - puppet last run on dns4002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [16:12:26] PROBLEM - puppet last run on rdb1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:13:06] RECOVERY - puppet last run on snapshot1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:13:25] (PS4) Mathew.onipe: wdqs: auto deployment of wdqs on wdqs1009 [puppet] - https://gerrit.wikimedia.org/r/464659 (https://phabricator.wikimedia.org/T197187) [16:14:24] (CR) Mathew.onipe: wdqs: auto deployment of wdqs on wdqs1009 (3 comments) [puppet] - https://gerrit.wikimedia.org/r/464659 (https://phabricator.wikimedia.org/T197187) (owner: Mathew.onipe) [16:15:26] RECOVERY - puppet last run on db1061 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [16:19:26] PROBLEM - puppet last run on roentgenium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:21:17] PROBLEM - puppet last run on mw1258 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:21:27] PROBLEM - puppet last run on analytics1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:24:27] PROBLEM - puppet last run on db1107 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:26:37] RECOVERY - puppet last run on deploy1001 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [16:27:47] RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [16:29:16] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:35:19] !log run a script in tmux (my username) on mw2201 to poll the status of a mcrouter key/route every 10s using its admin api (very lightweight but kill if needed) [16:35:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:35:47] RECOVERY - puppet last run on dbproxy1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [16:40:08] !log Reset user email for account "Dominic Mayers" (T206421) [16:40:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:12] T206421: Recover password for Dominic Mayers account - https://phabricator.wikimedia.org/T206421 [16:40:41] (PS1) Urbanecm: Add throttle exception for Netherlands Hackathon October 2018 - Wiki Techstorm [mediawiki-config] - https://gerrit.wikimedia.org/r/465047 (https://phabricator.wikimedia.org/T206241) [16:41:46] PROBLEM - puppet last run on etcd1005 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:42:56] RECOVERY - puppet last run on rdb1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [16:44:46] RECOVERY - puppet last run on roentgenium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:51:46] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:51:56] RECOVERY - puppet last run on analytics1062 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:54:36] RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:54:56] RECOVERY - puppet last run on db1107 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:07:06] RECOVERY - puppet last run on etcd1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:08:26] PROBLEM - puppet last run on bast5001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:14:36] PROBLEM - puppet last run on mw1331 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:27:26] PROBLEM - puppet last run on db1107 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:33:08] Operations, Thumbor: in Commons, some PDFs are failing to render thumbnails. - https://phabricator.wikimedia.org/T203402 (Hrishikes) This is a duplicate of T196961. [17:35:36] PROBLEM - puppet last run on restbase1011 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:38:47] RECOVERY - puppet last run on bast5001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:39:47] RECOVERY - puppet last run on mw1331 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [17:41:23] (PS20) Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) [17:41:25] (PS21) Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) [17:52:37] PROBLEM - puppet last run on analytics1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:52:47] RECOVERY - puppet last run on db1107 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:54:47] PROBLEM - puppet last run on analytics1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:00:56] RECOVERY - puppet last run on restbase1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:06:36] PROBLEM - puppet last run on elastic1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:07:47] PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:14:38] PROBLEM - puppet last run on labnet1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:17:47] PROBLEM - puppet last run on cp3032 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [18:22:27] RECOVERY - High lag on wdqs2001 is OK: (C)3600 ge (W)1200 ge 60 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [18:23:07] RECOVERY - puppet last run on analytics1046 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [18:25:16] RECOVERY - puppet last run on analytics1060 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [18:27:17] PROBLEM - puppet last run on ping1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:36:56] RECOVERY - puppet last run on elastic1020 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [18:38:16] RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:43:16] RECOVERY - puppet last run on cp3032 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [18:44:58] RECOVERY - puppet last run on labnet1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [18:47:27] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:57:47] RECOVERY - puppet last run on ping1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [18:58:46] PROBLEM - puppet last run on logstash1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:59:17] PROBLEM - puppet last run on mw1319 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:08:07] PROBLEM - puppet last run on phab1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:08:56] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [19:12:46] RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:15:01] (PS5) ArielGlenn: Add lexemes dump as separate dump [puppet] - https://gerrit.wikimedia.org/r/461862 (https://phabricator.wikimedia.org/T202830) (owner: Smalyshev) [19:16:38] (CR) ArielGlenn: [C: 2] Add lexemes dump as separate dump [puppet] - https://gerrit.wikimedia.org/r/461862 (https://phabricator.wikimedia.org/T202830) (owner: Smalyshev) [19:28:16] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:29:16] RECOVERY - puppet last run on logstash1007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [19:29:37] RECOVERY - puppet last run on mw1319 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [19:33:17] PROBLEM - puppet last run on kafka1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:33:37] RECOVERY - puppet last run on phab1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [19:39:26] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [19:48:17] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:53:37] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [20:01:36] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [20:03:47] PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [20:03:47] RECOVERY - puppet last run on kafka1013 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [20:11:17] PROBLEM - puppet last run on mw1267 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:13:46] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:14:26] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [20:14:27] PROBLEM - puppet last run on ununpentium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:24:57] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:31:46] PROBLEM - puppet last run on cobalt is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:34:16] RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [20:34:58] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:41:37] RECOVERY - puppet last run on mw1267 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [20:44:47] RECOVERY - puppet last run on ununpentium is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [20:55:27] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [20:56:46] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [20:57:06] RECOVERY - puppet last run on cobalt is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:03:57] PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:05:17] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:17:26] PROBLEM - puppet last run on etcd1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:27:07] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [21:28:52] (CR) Gehel: "Looks good, minor comment inline." (1 comment) [debs/prometheus-blazegraph-exporter] - https://gerrit.wikimedia.org/r/464854 (https://phabricator.wikimedia.org/T206123) (owner: Mathew.onipe) [21:34:08] (CR) Gehel: [C: -1] "Almost good! See comments inline." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/464659 (https://phabricator.wikimedia.org/T197187) (owner: Mathew.onipe) [21:34:19] RECOVERY - puppet last run on elastic1017 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [21:42:30] PROBLEM - puppet last run on mw1313 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:47:30] RECOVERY - puppet last run on etcd1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:51:49] PROBLEM - puppet last run on dns5001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:52:59] PROBLEM - puppet last run on elastic1041 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [22:12:43] RECOVERY - puppet last run on mw1313 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [22:13:03] PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:19:44] PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:21:54] RECOVERY - puppet last run on dns5001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [22:23:04] RECOVERY - puppet last run on elastic1041 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [22:23:53] PROBLEM - puppet last run on thumbor1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:38:14] RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [22:50:04] RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [22:54:14] RECOVERY - puppet last run on thumbor1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:05:43] PROBLEM - puppet last run on db1105 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:12:14] PROBLEM - puppet last run on mw1255 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:13:34] PROBLEM - puppet last run on ununpentium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [23:24:34] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [23:30:54] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [23:35:54] RECOVERY - puppet last run on db1105 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:37:34] RECOVERY - puppet last run on mw1255 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:42:33] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:43:54] RECOVERY - puppet last run on ununpentium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:44:14] PROBLEM - puppet last run on kafka-jumbo1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:57:54] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [23:58:54] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen