[00:01:56] RECOVERY - puppet last run on cp3032 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[00:10:45] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:14:42] Operations, Domains, Traffic, WMF-Legal: Use .wiki domains instead of .org on wiki sites owned by wikimedia foundation - https://phabricator.wikimedia.org/T145907#2644754 (Peachey88) Also, our TLD is on millions(?)/billions(?) of printed materials, it's not a decision we can take lightly. Even wi...
[00:34:15] (PS2) Dzahn: salt: add Icinga plugin to check for unaccepted keys [puppet] - https://gerrit.wikimedia.org/r/311079 (https://phabricator.wikimedia.org/T144801)
[00:37:38] RECOVERY - puppet last run on db2017 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[00:43:21] Operations, Icinga, Patch-For-Review: Icinga check for unaccepted Salt keys - https://phabricator.wikimedia.org/T144801#2645101 (Dzahn) a:Dzahn
[00:43:24] (PS3) Dzahn: salt: add Icinga plugin to check for unaccepted keys [puppet] - https://gerrit.wikimedia.org/r/311079 (https://phabricator.wikimedia.org/T144801)
[00:54:03] (PS4) Dzahn: salt: add Icinga plugin to check for unaccepted keys [puppet] - https://gerrit.wikimedia.org/r/311079 (https://phabricator.wikimedia.org/T144801)
[00:56:51] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/4112/" [puppet] - https://gerrit.wikimedia.org/r/311079 (https://phabricator.wikimedia.org/T144801) (owner: Dzahn)
[00:57:07] (PS5) Dzahn: salt: add Icinga plugin to check for unaccepted keys [puppet] - https://gerrit.wikimedia.org/r/311079 (https://phabricator.wikimedia.org/T144801)
[01:21:13] PROBLEM - puppet last run on mw2084 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:28:16] (PS1) Dzahn: salt/icinga: fix permissions to run plugin script [puppet] - https://gerrit.wikimedia.org/r/311205
[01:28:55] (CR) Dzahn: [C: 2] salt/icinga: fix permissions to run plugin script [puppet] - https://gerrit.wikimedia.org/r/311205 (owner: Dzahn)
[01:29:53] (PS2) Dzahn: salt/icinga: fix permissions to run plugin script [puppet] - https://gerrit.wikimedia.org/r/311205
[01:47:36] RECOVERY - puppet last run on mw2084 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:49:38] PROBLEM - Postgres Replication Lag on maps-test2004 is CRITICAL: CRITICAL - Rep Delay is: 1801.675768 Seconds
[01:52:06] RECOVERY - Postgres Replication Lag on maps-test2004 is OK: OK - Rep Delay is: 122.622784 Seconds
[01:59:23] Operations, Icinga, Patch-For-Review: Icinga check for unaccepted Salt keys - https://phabricator.wikimedia.org/T144801#2645118 (Dzahn) it would work, but there is an issue with sudo config. for some reason nagios user still gets a password prompt, even though there is NOPASSWD and the format is exa...
[02:02:32] mutante, so here's the thing
[02:02:46] I'm pretty sure the nagios script is run as the nagios user
[02:02:48] no sudo required
[02:03:09] if you look into the script, it uses sudo */usr/bin/salt-key -l u*
[02:03:14] that's what needs a sudo rule
[02:03:41] current rule allows nagios to do sudo /path/to/your/script
[02:04:34] er, */usr/bin/salt-key -l un*
[02:04:35] with the n
[02:04:38] but you get the idea
[02:27:30] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.18) (duration: 11m 40s)
[02:27:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:34:33] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Sep 17 02:34:33 UTC 2016 (duration 7m 3s)
[02:34:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:36:35] (PS1) Smalyshev: Add config for units on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/311206
[02:41:37] PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:06:17] Operations, Phabricator: networking: allow ssh between iridium and phab2001 - https://phabricator.wikimedia.org/T143363#2645176 (mmodell)
[03:08:29] RECOVERY - puppet last run on ms-be2023 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[03:11:24] Operations, Phabricator: networking: allow ssh between iridium and phab2001 - https://phabricator.wikimedia.org/T143363#2645178 (mmodell) ssh from iridium to phab2001 still isn't working.. :(
[03:18:26] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: CRITICAL - Rep Delay is: 1806.932054 Seconds
[03:19:54] Puppet, Labs: Expose public hostname as Fact in puppet - https://phabricator.wikimedia.org/T101903#1350963 (AlexMonk-WMF) Most projects should now be able to use rdns on ec2_public_ipv4 and use some magic to only pick the instance-{name}.{project}.wmflabs.org result Note there are some projects that don...
[03:20:47] RECOVERY - Postgres Replication Lag on maps2003 is OK: OK - Rep Delay is: 127.364036 Seconds
[03:23:12] Operations, DBA: Investigate db1082 crash - https://phabricator.wikimedia.org/T145533#2645185 (jcrespo) > The server has behaved fine and I didn't see any significant issue. > Check size of conntrack table > > Notifications for this service have been disabled > WARNING 2016-09-17 03:19:29 1d 13h...
[03:30:09] Operations, Phabricator: networking: allow ssh between iridium and phab2001 - https://phabricator.wikimedia.org/T143363#2645192 (mmodell) a:mmodell>None ok from phab2001 I can ssh to `10.64.32.186` but not to `10.64.32.150` and `host iridium.eqiad.wmnet` resolves to: ``` iridium.eqiad.wmnet has...
[03:30:15] Operations, Labs, Patch-For-Review: update star.wmflabs.org cert from sha1 to sha256 - https://phabricator.wikimedia.org/T104017#2645198 (AlexMonk-WMF)
[04:00:02] (PS1) Smalyshev: Bump batch size for WDQS updater to 500 [puppet] - https://gerrit.wikimedia.org/r/311209
[04:03:50] (CR) Alex Monk: "What is this 10.68.42 IP exactly?" [puppet] - https://gerrit.wikimedia.org/r/297315 (https://phabricator.wikimedia.org/T78342) (owner: Hashar)
[04:13:11] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:16:03] (PS1) Alex Monk: openstack nova network: update private_ips of instances [puppet] - https://gerrit.wikimedia.org/r/311210
[04:23:51] (PS1) Alex Monk: Attempt to fix salt key monitoring sudo rule [puppet] - https://gerrit.wikimedia.org/r/311211 (https://phabricator.wikimedia.org/T144801)
[04:24:52] (CR) jenkins-bot: [V: -1] Attempt to fix salt key monitoring sudo rule [puppet] - https://gerrit.wikimedia.org/r/311211 (https://phabricator.wikimedia.org/T144801) (owner: Alex Monk)
[04:24:59] (CR) Alex Monk: "untested" [puppet] - https://gerrit.wikimedia.org/r/311211 (https://phabricator.wikimedia.org/T144801) (owner: Alex Monk)
[04:25:41] (PS2) Alex Monk: Attempt to fix salt key monitoring sudo rule [puppet] - https://gerrit.wikimedia.org/r/311211 (https://phabricator.wikimedia.org/T144801)
[04:37:46] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[04:40:10] PROBLEM - puppet last run on ms-be2018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:58:19] (PS1) Alex Monk: labs firstboot.sh: Add instance hostname to /etc/hosts [puppet] - https://gerrit.wikimedia.org/r/311212 (https://phabricator.wikimedia.org/T120830)
[05:07:12] RECOVERY - puppet last run on ms-be2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:32:37] PROBLEM - Disk space on puppetmaster1001 is CRITICAL: DISK CRITICAL - free space: /var/lib/puppet 1115 MB (3% inode=97%)
[06:09:57] (PS5) Yuvipanda: labs: Add a per-project puppetmaster role [puppet] - https://gerrit.wikimedia.org/r/311163
[06:15:19] (PS6) Yuvipanda: labs: Add a per-project puppetmaster role [puppet] - https://gerrit.wikimedia.org/r/311163
[06:20:12] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[06:20:16] (PS7) Yuvipanda: labs: Add a per-project puppetmaster role [puppet] - https://gerrit.wikimedia.org/r/311163
[06:21:43] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:22:04] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:23:57] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:24:34] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:25:59] PROBLEM - puppet last run on labvirt1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:27:53] PROBLEM - puppet last run on ganeti2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:05] PROBLEM - puppet last run on ms-be1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:15] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:25] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:33] PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:35] PROBLEM - puppet last run on labvirt1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:43] PROBLEM - puppet last run on cp3039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:54] PROBLEM - puppet last run on dbproxy1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:54] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:02] PROBLEM - puppet last run on db1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:03] PROBLEM - puppet last run on heze is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:03] PROBLEM - puppet last run on mc2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:13] PROBLEM - puppet last run on pybal-test2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:13] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:14] PROBLEM - puppet last run on restbase1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:14] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:14] PROBLEM - puppet last run on ms-be1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:26] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:33] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:34] PROBLEM - puppet last run on db1077 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:34] PROBLEM - puppet last run on mw2118 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:43] PROBLEM - puppet last run on mw1276 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:43] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:44] PROBLEM - puppet last run on mw2217 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:45] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:29:54] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:30:24] PROBLEM - puppet last run on db1084 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:30:24] PROBLEM - puppet last run on rdb2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:30:35] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:30:45] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:30:56] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:30:56] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:30:57] PROBLEM - puppet last run on elastic2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:31:23] PROBLEM - puppet last run on ms-be2025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:31:36] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:31:42] PROBLEM - puppet last run on mc2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:31:42] PROBLEM - puppet last run on labsdb1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:31:42] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:31:55] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:32:04] (PS8) Yuvipanda: labs: Add a per-project puppetmaster role [puppet] - https://gerrit.wikimedia.org/r/311163
[06:32:15] PROBLEM - puppet last run on wtp2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:32:24] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:32:34] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:32:35] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:32:48] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:32:53] PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:33:04] PROBLEM - puppet last run on mw2228 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [06:33:04] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:33:14] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:34:34] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:35:06] PROBLEM - puppet last run on restbase1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:35:15] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:35:26] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:35:35] PROBLEM - puppet last run on ms-be2015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:35:37] PROBLEM - puppet last run on planet2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:35:47] PROBLEM - puppet last run on analytics1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:35:56] PROBLEM - puppet last run on mw1288 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:35:57] PROBLEM - puppet last run on mw1257 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:35:58] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:36:06] PROBLEM - puppet last run on wtp1014 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:36:18] PROBLEM - puppet last run on mw1163 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:36:18] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:36:26] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:36:41] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:36:45] RECOVERY - puppet last run on restbase1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:36:45] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:37:25] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:37:57] PROBLEM - puppet last run on labvirt1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:38:16] PROBLEM - puppet last run on mw1224 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:38:16] PROBLEM - puppet last run on ununpentium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:38:17] PROBLEM - puppet last run on dbproxy1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:38:37] PROBLEM - puppet last run on lvs1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:38:37] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:38:42] PROBLEM - puppet last run on mw2232 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:38:42] PROBLEM - puppet last run on nihal is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:38:44] PROBLEM - puppet last run on mw1250 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:38:55] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:38:57] PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:39:10] PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:39:11] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:39:15] PROBLEM - puppet last run on mw2180 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:39:36] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:39:36] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:39:36] PROBLEM - puppet last run on mw2132 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:39:36] PROBLEM - puppet last run on analytics1055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:39:45] PROBLEM - puppet last run on lvs1008 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:39:48] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:39:56] PROBLEM - puppet last run on kafka1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:13] PROBLEM - puppet last run on restbase2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:14] PROBLEM - puppet last run on restbase1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:15] PROBLEM - puppet last run on wtp1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:25] PROBLEM - puppet last run on db1090 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:25] PROBLEM - puppet last run on wtp2018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:26] PROBLEM - puppet last run on mc2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:26] PROBLEM - puppet last run on mc2013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:36] PROBLEM - puppet last run on mw2245 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:37] PROBLEM - puppet last run on mw2128 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:46] PROBLEM - puppet last run on wtp1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:40:52] Apparently no-one is worried about this flood of alerts? [06:40:57] PROBLEM - puppet last run on meitnerium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:41:14] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:41:25] PROBLEM - puppet last run on elastic2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:41:26] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:41:26] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:41:27] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:41:41] PROBLEM - puppet last run on db2063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:41:45] PROBLEM - puppet last run on hassaleh is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:41:45] PROBLEM - puppet last run on mw2239 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:41:45] PROBLEM - puppet last run on mw2154 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:42:06] PROBLEM - puppet last run on db1092 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:42:17] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:42:43] PROBLEM - puppet last run on mw2213 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:42:43] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:42:55] RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:42:55] PROBLEM - puppet last run on cp2008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:42:57] PROBLEM - puppet last run on restbase-test2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:42:57] PROBLEM - puppet last run on ganeti1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:42:57] PROBLEM - puppet last run on dbproxy1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:05] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:05] PROBLEM - puppet last run on aqs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:06] PROBLEM - puppet last run on mw2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:06] PROBLEM - puppet last run on mw2178 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:06] PROBLEM - puppet last run on mw2190 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:06] PROBLEM - puppet last run on mw2137 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:16] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:16] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:43:16] PROBLEM - puppet last run on restbase1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:44] PROBLEM - puppet last run on analytics1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:44] PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:44] PROBLEM - puppet last run on nitrogen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:56] PROBLEM - puppet last run on mw1272 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:43:57] PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:44:11] PROBLEM - puppet last run on es2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:44:27] PROBLEM - puppet last run on wtp1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:44:45] PROBLEM - puppet last run on thumbor1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:44:47] PROBLEM - puppet last run on ms-be2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:44:47] PROBLEM - puppet last run on restbase2009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:44:52] o_O [06:44:57] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:45:15] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:45:15] PROBLEM - puppet last run on db1075 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:45:26] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:45:27] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:45:36] PROBLEM - puppet last run on mw2151 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:45:36] PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:45:46] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:45:57] PROBLEM - puppet last run on cp1074 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:46:11] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:46:19] [22:32:37] PROBLEM - Disk space on puppetmaster1001 is CRITICAL: DISK CRITICAL - free space: /var/lib/puppet 1115 MB (3% inode=97%) [06:46:26] PROBLEM - puppet last run on mc1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:46:42] PROBLEM - puppet last run on cp2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:46:42] PROBLEM - puppet last run on mw2246 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:46:43] PROBLEM - puppet last run on wtp2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:46:45] PROBLEM - puppet last run on mw2160 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:47:07] PROBLEM - puppet last run on dbproxy1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:47:17] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:47:17] PROBLEM - puppet last run on mw1283 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:47:17] PROBLEM - puppet last run on db2069 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:47:17] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:25] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:47:26] PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:47:43] PROBLEM - puppet last run on mw2201 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:47:57] PROBLEM - puppet last run on mw2105 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:48:06] PROBLEM - puppet last run on mw2225 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:48:06] PROBLEM - puppet last run on mw2164 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:48:25] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:48:26] PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:48:27] PROBLEM - puppet last run on ganeti1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:48:27] PROBLEM - puppet last run on aqs1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:48:41] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:48:42] PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:48:55] PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:48:56] PROBLEM - puppet last run on mw2188 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:48:57] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:49:10] PROBLEM - puppet last run on elastic2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:49:16] PROBLEM - puppet last run on mw2117 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:49:36] PROBLEM - puppet last run on db1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:49:36] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:49:37] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:49:46] PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:50:12] PROBLEM - puppet last run on mw2113 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:50:12] PROBLEM - puppet last run on mw1298 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:50:12] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:50:35] PROBLEM - puppet last run on cp2011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:50:35] PROBLEM - puppet last run on rdb2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:50:46] PROBLEM - puppet last run on mw1267 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:50:57] RECOVERY - puppet last run on labvirt1005 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:51:25] PROBLEM - puppet last run on mw2083 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:51:38] PROBLEM - puppet last run on ms-fe1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:51:40] PROBLEM - puppet last run on db2054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:51:57] PROBLEM - puppet last run on restbase1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:52:07] PROBLEM - puppet last run on db1051 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:52:25] PROBLEM - puppet last run on oresrdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:52:38] PROBLEM - puppet last run on db2065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:52:42] PROBLEM - puppet last run on mw1230 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:52:45] PROBLEM - puppet last run on cp1099 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:52:45] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:52:52] Operations: PROBLEM - Disk space on puppetmaster1001 is CRITICAL: DISK CRITICAL - free space: /var/lib/puppet 1115 MB (3% inode=97%) - https://phabricator.wikimedia.org/T145924#2645305 (Legoktm) [06:52:55] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:52:56] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:52:57] RECOVERY - puppet last run on ganeti2002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [06:53:06] PROBLEM - puppet last run on mw1304 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [06:53:15] RECOVERY - puppet last run on ms-be1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:53:16] RECOVERY - puppet last run on restbase1013 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [06:53:40] RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [06:53:42] PROBLEM - puppet last run on rdb1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:53:56] RECOVERY - puppet last run on cp3039 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:53:56] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:53:56] RECOVERY - puppet last run on dbproxy1008 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [06:53:56] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:53:56] RECOVERY - puppet last run on db1063 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:54:09] RECOVERY - puppet last run on heze is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:54:09] RECOVERY - puppet last run on mc2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:54:10] RECOVERY - puppet last run on pybal-test2002 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:54:25] PROBLEM - puppet last run on db2037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:54:26] RECOVERY - puppet last run on wtp1010 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:54:26] PROBLEM - puppet last run on es1016 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:54:35] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:54:36] RECOVERY - puppet last run on mw2118 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:54:37] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:54:54] !log silenced (+q) icinga-wm in operations channel, due to channel spam from low disk space on puppetm1001 [06:54:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [06:55:49] !log see T145924 or email to ops list for more info [06:55:50] T145924: PROBLEM - Disk space on puppetmaster1001 is CRITICAL: DISK CRITICAL - free space: /var/lib/puppet 1115 MB (3% inode=97%) - https://phabricator.wikimedia.org/T145924 [06:55:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:06:20] !log set +z on -operations, allows messages sent by +b or +q users (normally blocked) to be seen by users that currently op'ed [07:06:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:15:26] <_joe_> !log enlarged puppet partition on puppetmaster1001, rendered full by reports [08:15:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:23:01] <_joe_> p858snake: you can de-silence icinga-wm whenever you want :) [08:41:17] RECOVERY - puppet last run on es1013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:41:18] RECOVERY - puppet last run on mw2099 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [08:41:27] RECOVERY - puppet last run on restbase2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:41:27] RECOVERY - puppet last run on mw1265 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures 
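The flood above traces back to a single disk-space alert on puppetmaster1001, which was easy to lose among the per-host puppet failures. As a minimal sketch (not part of any tooling mentioned in this log), the alert line could be picked apart as follows. The regular expression is written against the exact wording of the alert quoted in this log, and the names `DiskAlert` and `parse_disk_alert` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class DiskAlert(NamedTuple):
    """Fields of a disk-space alert, kept exactly as printed in the log."""
    host: str
    mount: str
    free_mb: int
    free_pct: int
    inode_pct: int  # the "inode=NN%" figure, stored as printed


# Pattern assumed from the alert text quoted in this log; other plugin
# versions or states (WARNING, OK) may word the message differently.
_ALERT_RE = re.compile(
    r"Disk space on (?P<host>\S+) is CRITICAL: DISK CRITICAL - free space: "
    r"(?P<mount>\S+) (?P<mb>\d+) MB \((?P<pct>\d+)% inode=(?P<inode>\d+)%\)"
)


def parse_disk_alert(line: str) -> Optional[DiskAlert]:
    """Return the parsed alert, or None if the line is not a disk alert."""
    match = _ALERT_RE.search(line)
    if match is None:
        return None
    return DiskAlert(
        host=match.group("host"),
        mount=match.group("mount"),
        free_mb=int(match.group("mb")),
        free_pct=int(match.group("pct")),
        inode_pct=int(match.group("inode")),
    )
```

Filtering a log for lines where `parse_disk_alert` returns a value would surface the root-cause alert without the surrounding "Catalog fetch fail" noise.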
[08:41:36] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [08:41:36] RECOVERY - puppet last run on cp1072 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [08:41:37] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:41:47] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:42:19] RECOVERY - puppet last run on cp2025 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [08:42:20] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [08:42:29] RECOVERY - puppet last run on cp2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:42:29] RECOVERY - puppet last run on mw2243 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [08:42:29] RECOVERY - puppet last run on mw2112 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:42:42] RECOVERY - puppet last run on mw1266 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [08:42:44] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [08:43:10] RECOVERY - puppet last run on db2069 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:43:10] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:43:20] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:43:49] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:44:53] RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is 
currently enabled, last run 1 minute ago with 0 failures [08:44:53] RECOVERY - puppet last run on maps2002 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [08:44:53] RECOVERY - puppet last run on mw2114 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [08:44:54] RECOVERY - puppet last run on mw1300 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [08:44:54] RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [08:45:03] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:45:03] RECOVERY - puppet last run on mw2109 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:45:33] RECOVERY - puppet last run on db1033 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:45:34] RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:45:54] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:46:25] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:46:34] RECOVERY - puppet last run on mw2247 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:47:24] RECOVERY - puppet last run on mw1253 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:58:37] PROBLEM - puppet last run on maps-test2003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:24:12] RECOVERY - puppet last run on maps-test2003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:38:33] Operations: PROBLEM - Disk space on puppetmaster1001 is CRITICAL: DISK CRITICAL - free space: /var/lib/puppet 1115 MB (3% inode=97%) - https://phabricator.wikimedia.org/T145924#2645407 (Joe) Open>Resolved p:Triage>Unbreak! a:Joe [09:42:21] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled] [09:44:31] (PS1) Giuseppe Lavagetto: realm: convert main_ipaddress and site into facts [puppet] - https://gerrit.wikimedia.org/r/311223 (https://phabricator.wikimedia.org/T85459) [10:00:09] (CR) MarcoAurelio: "> Should use a generic group name of technical administrator (but not" [mediawiki-config] - https://gerrit.wikimedia.org/r/308448 (https://phabricator.wikimedia.org/T144599) (owner: MarcoAurelio) [10:01:52] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apparmor.d/abstractions/ssl_certs] [10:05:15] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [10:12:17] Operations, Mail: Delivery failed to eng-admin - https://phabricator.wikimedia.org/T145800#2645452 (faidon) [10:14:18] Operations, Mail: Delivery failed to eng-admin - https://phabricator.wikimedia.org/T145800#2641471 (faidon) LDAP replication between corp and prod seems to be broken — I've verified that there is e.g. an eng-admin group in the corp LDAP and is configured correctly, but the change has not been replicated...
[10:17:51] (CR) MarcoAurelio: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/285209 (https://phabricator.wikimedia.org/T133564) (owner: Dereckson) [10:18:04] (CR) jenkins-bot: [V: -1] Reconfigure interface editor group on ur.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/285209 (https://phabricator.wikimedia.org/T133564) (owner: Dereckson) [10:19:55] (CR) MarcoAurelio: [C: 1] "Yes, this should be restricted to 'crats as it happens on all wikis. Notwithstanding, this patch needs to be rebased." [mediawiki-config] - https://gerrit.wikimedia.org/r/285209 (https://phabricator.wikimedia.org/T133564) (owner: Dereckson) [10:24:05] Operations, Mail: Delivery failed to eng-admin - https://phabricator.wikimedia.org/T145800#2645457 (MoritzMuehlenhoff) @bbogaert: Where's the yubikey attribute coming from, is that a custom schema extensions? There're various unofficial schema extensions for Yubico, but I couldn't find one where the attr...
[10:26:52] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:29:20] (CR) Faidon Liambotis: [C: -1] "Well, the fact that it's helpful, doesn't mean it's conceptually right :) The site of a host isn't something that it should be host-contro" [puppet] - https://gerrit.wikimedia.org/r/311223 (https://phabricator.wikimedia.org/T85459) (owner: Giuseppe Lavagetto) [10:37:35] (CR) Giuseppe Lavagetto: "The issue is that, right now, we do ask the agent to tell us the site it's in: this is absolutely equivalent to compute the value server-s" [puppet] - https://gerrit.wikimedia.org/r/311223 (https://phabricator.wikimedia.org/T85459) (owner: Giuseppe Lavagetto) [10:39:12] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:39:20] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:39:23] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:39:23] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:39:23] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd}
(Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:39:23] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:39:23] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:39:23] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:39:28] <_joe_> ouch [10:39:42] <_joe_> I have no idea how to fix this :) [10:40:09] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:40:09] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:40:23] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:40:24] PROBLEM - 
restbase endpoints health on restbase1011 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:40:28] <_joe_> let's see [10:40:36] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:40:37] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:40:43] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:40:44] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:40:53] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:41:34] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: 
list index out of range [10:41:35] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:41:35] PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 responds with malformed body: list index out of range [10:44:00] <_joe_> I am debugging now [10:44:09] <_joe_> seems like a content change [10:46:55] (CR) Aude: "suggest that unitConfig.json be renamed to include "wikibase" or "Wikidata" in the filename, so it's purpose is more clear." [mediawiki-config] - https://gerrit.wikimedia.org/r/311206 (owner: Smalyshev) [10:47:31] <_joe_> yes, it's a news Item with an empty links list, and it happens to be the first one [10:48:21] <_joe_> I have no idea what caused this though [10:48:23] <_joe_> curl -L https://en.wikipedia.org/api/rest_v1/feed/featured/2016/04/29 | jq .news[0] [10:49:38] <_joe_> I won't page anyone as it doesn't seem critical [10:50:35] <_joe_> that's common to the feed of any day, it appears [10:52:40] @seen MatmaRex [10:52:40] Steinsplitter: Last time I saw MatmaRex they were quitting the network with reason: Quit: Miranda N/A at 9/17/2016 12:11:48 AM (10h40m51s ago) [10:52:53] I will let you know when I see MatmaRex around here [10:52:53] @notify MatmaRex [10:58:36] <_joe_> I just acknowledged all of the rb alerts [10:58:45] <_joe_> I'll follow up on monday if needed [12:07:22] (PS1) Ladsgroup: ORES default threshold to high for wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/311229 (https://phabricator.wikimedia.org/T144784) [12:09:31] (PS3) Urbanecm: Throttling rule for RCL [mediawiki-config] - https://gerrit.wikimedia.org/r/311086
(https://phabricator.wikimedia.org/T145838) [12:11:07] (PS3) Urbanecm: Throttle for RCL [mediawiki-config] - https://gerrit.wikimedia.org/r/311088 (https://phabricator.wikimedia.org/T145838) [12:30:50] (CR) Aude: [C: -1] "would like unitConfig.json renamed to be more clear it's for Wikidata / Wikibase" [mediawiki-config] - https://gerrit.wikimedia.org/r/311206 (owner: Smalyshev) [12:32:57] Operations, Domains, Traffic, WMF-Legal: Use .wiki domains instead of .org on wiki sites owned by wikimedia foundation - https://phabricator.wikimedia.org/T145907#2644754 (tom29739) Why not just redirect the .wiki domains to the .org domains? [12:33:32] (CR) Aude: Add config for units on Wikidata (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/311206 (owner: Smalyshev) [12:56:18] Operations, Domains, Traffic, WMF-Legal: Use .wiki domains instead of .org on wiki sites owned by wikimedia foundation - https://phabricator.wikimedia.org/T145907#2645554 (Aklapper) Open>declined Declining this task as the summary says //instead of .org//. >>! In T145907#2645540, @tom297...
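On the restbase alerts earlier in this log: _joe_'s diagnosis at 10:47:31 was that the first news item carried an empty links list, and the check reported "list index out of range", which is the message Python gives when code indexes into an empty list. The check script itself is not shown in this log, so the following is only a hedged sketch of that failure mode and one defensive alternative. The `news` and `links` keys are taken from the `jq .news[0]` command and the chat text; the function names and sample payloads are hypothetical.

```python
from typing import Any, Dict, Optional


def first_link_naive(feed: Dict[str, Any]) -> Any:
    """Index straight into the lists; raises IndexError on an empty one."""
    return feed["news"][0]["links"][0]


def first_link_or_none(feed: Dict[str, Any]) -> Optional[Any]:
    """Return the first link of the first news item, or None if absent."""
    news = feed.get("news") or []
    if not news:
        return None
    links = news[0].get("links") or []
    return links[0] if links else None


# Hypothetical payload shaped like the case described in the chat:
# the first news item exists but its links list is empty.
sample = {"news": [{"links": []}, {"links": ["Example_article"]}]}

try:
    first_link_naive(sample)
except IndexError as err:
    naive_error = str(err)  # the same wording the alert reported

safe_result = first_link_or_none(sample)  # no exception, just None
```

Whether a monitoring check should tolerate an empty list or flag it is a separate decision; the sketch only shows why one empty list at index 0 was enough to fail the check on every host.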
[13:01:45] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0 xe-0/0/2: down - Transit: Zayo (IPYX/125449/002/ZYO) {#11402} [10Gbps] [13:14:00] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [14:21:18] (PS4) BBlack: varnish: restart upload backends once a day [puppet] - https://gerrit.wikimedia.org/r/311142 (owner: Ema) [14:23:47] (CR) BBlack: [C: 2] varnish: restart upload backends once a day [puppet] - https://gerrit.wikimedia.org/r/311142 (owner: Ema) [14:48:01] (PS1) BBlack: upload: splay restart cron a little better [puppet] - https://gerrit.wikimedia.org/r/311232 [14:49:14] (CR) BBlack: [C: 2 V: 2] upload: splay restart cron a little better [puppet] - https://gerrit.wikimedia.org/r/311232 (owner: BBlack) [14:54:38] (PS1) BBlack: upload: drop FE size limit 1MB->512KB [puppet] - https://gerrit.wikimedia.org/r/311233 [15:03:27] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:15:48] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/lib/nagios/plugins/check-fresh-files-in-dir.py] [15:26:23] (PS1) BBlack: upload: splay restart cron a lot better [puppet] - https://gerrit.wikimedia.org/r/311234 [15:27:57] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [15:28:09] (CR) BBlack: [C: 2] upload: drop FE size limit 1MB->512KB [puppet] - https://gerrit.wikimedia.org/r/311233 (owner: BBlack) [15:31:20] PROBLEM - puppet last run on mw2077 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [15:32:00] (CR) BBlack: [C: 2] upload: splay restart cron a lot better [puppet] - https://gerrit.wikimedia.org/r/311234 (owner: BBlack) [15:40:40] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:58:17] RECOVERY - puppet last run on mw2077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:59:45] <_joe_> we're having way more puppet failures than before [16:00:16] (PS1) BBlack: upload: splay restart cron ever betterer [puppet] - https://gerrit.wikimedia.org/r/311235 [16:01:23] (CR) BBlack: [C: 2 V: 2] upload: splay restart cron ever betterer [puppet] - https://gerrit.wikimedia.org/r/311235 (owner: BBlack) [16:12:56] (PS1) BBlack: Revert "upload: splay restart cron ever betterer" [puppet] - https://gerrit.wikimedia.org/r/311236 [16:13:06] (CR) BBlack: [C: 2 V: 2] Revert "upload: splay restart cron ever betterer" [puppet] - https://gerrit.wikimedia.org/r/311236 (owner: BBlack) [16:32:34] Operations, MediaWiki-API, Availability, HHVM: HHVM is leaking memory on the API appservers - https://phabricator.wikimedia.org/T133674#2645688 (Southparkfan) Is this the same as the recent behavior seen on various API appservers? https://ganglia.wikimedia.org/latest/?c=API%20application%20server...
[16:39:05] <_joe_> SPF|Cloud: thanks for noticing, I'm not really focused on the appservers lately [16:39:24] <_joe_> and btw what I see here is that hhvm hasn't crashed for weeks on those machines [16:40:01] Yeah, since all R410s have been replaced by newer 64GB machines (and the R420s have 64GB RAM as well), there is more headroom for memory leaks [16:40:09] <_joe_> yes [16:40:48] <_joe_> !log rolling restart of HHVM on part of the API cluster in eqiad, T133674 [16:40:49] T133674: HHVM is leaking memory on the API appservers - https://phabricator.wikimedia.org/T133674 [16:40:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:41:26] but I do hope HHVM won't crash at the same time on those servers, that would be very painful.. I'm also not sure what the performance impact (loading time/max concurrent users before slowdown) of the higher CPU usage is [16:41:37] restarting HHVM is wise yeah. [16:42:09] <_joe_> yes, usually, HHVM (like php itself) crashes often enough that we can just disregard such issues [16:42:19] <_joe_> it became way too stable in recent releases :P [16:43:05] or you can use the extra time to debug the cause of the memory leaks ;-) [16:43:22] <_joe_> well, given the memory profiler isn't working in 3.12 [16:43:31] <_joe_> that is, it crashes hhvm pretty soon [16:43:40] <_joe_> it's quite hard to do at the moment [16:43:57] <_joe_> I also suspect the ever growing sqlite file that keeps the bytecode cache [16:44:01] <_joe_> is another pain there [16:44:42] <_joe_> SPF|Cloud: trust me, I have no such thing as "extra time" :P [16:44:50] haha :p [16:47:03] I've played a bit with HHVM in the past, but I have zero experience with tools such as memory profilers, gdb, and whatever you can use.. I got tired of all crashes I experienced, so I decided to use php-fpm again...
:( [16:48:21] <_joe_> SPF|Cloud: HHVM certainly required an adventurous soul and a lot of resources to get deployed, a couple of years back [16:48:27] <_joe_> now I think it's way more stable [16:48:42] <_joe_> but then again, I think the benefits are mostly there for pretty large websites [16:49:30] <_joe_> for us specifically, given how dependent we are on applayer-level expensive parsing, it proved a definitive gain [16:50:30] of course. I just wished these issues can be fixed quicker, but I can imagine that is a very hard task [17:12:05] (PS3) Dzahn: Attempt to fix salt key monitoring sudo rule [puppet] - https://gerrit.wikimedia.org/r/311211 (https://phabricator.wikimedia.org/T144801) (owner: Alex Monk) [17:14:49] (CR) Dzahn: [C: 2] "thank you, yes, that's it :)" [puppet] - https://gerrit.wikimedia.org/r/311211 (https://phabricator.wikimedia.org/T144801) (owner: Alex Monk) [17:28:24] Operations, Icinga, Patch-For-Review: Icinga check for unaccepted Salt keys - https://phabricator.wikimedia.org/T144801#2645734 (Dzahn) Thanks @Krenair for that fix works now https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=neodymium&service=unaccepted+salt+keys https://icinga.... [17:28:36] Operations, Icinga, Patch-For-Review: Icinga check for unaccepted Salt keys - https://phabricator.wikimedia.org/T144801#2645736 (Dzahn) Open>Resolved [17:29:05] Operations, Icinga: Icinga check for unaccepted Salt keys - https://phabricator.wikimedia.org/T144801#2610651 (Dzahn) [17:33:46] mutante, why does it go UNKNOWN status on labcontrol1001?
[17:37:17] seems it does this if `/usr/bin/salt-key -l un` doesn't contain 'Unaccepted' [18:18:42] Krenair: it was just taking a bit longer to recover, it's all working now [18:25:41] (PS1) BBlack: cron_splay() with first use in cache_upload [puppet] - https://gerrit.wikimedia.org/r/311239 [18:26:22] (CR) BBlack: [C: -1] "This is very much an untested WIP at this point" [puppet] - https://gerrit.wikimedia.org/r/311239 (owner: BBlack) [18:26:57] (CR) jenkins-bot: [V: -1] cron_splay() with first use in cache_upload [puppet] - https://gerrit.wikimedia.org/r/311239 (owner: BBlack) [18:33:26] (PS2) BBlack: cron_splay() with first use in cache_upload [puppet] - https://gerrit.wikimedia.org/r/311239 [18:41:09] Operations, Operations-Software-Development: Evaluation of automation/orchestration tools - https://phabricator.wikimedia.org/T143306#2645777 (Volans) [18:41:43] ah ok [18:42:18] (PS3) BBlack: cron_splay() with first use in cache_upload [puppet] - https://gerrit.wikimedia.org/r/311239 [18:49:46] PROBLEM - puppet last run on cp2026 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [18:54:59] (PS4) BBlack: cron_splay() with first use in cache_upload [puppet] - https://gerrit.wikimedia.org/r/311239 [19:04:47] (PS5) BBlack: cron_splay() with first use in cache_upload [puppet] - https://gerrit.wikimedia.org/r/311239 [19:09:18] (CR) BBlack: [C: 2] cron_splay() with first use in cache_upload [puppet] - https://gerrit.wikimedia.org/r/311239 (owner: BBlack) [19:12:06] RECOVERY - puppet last run on cp2026 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [19:16:27] <_joe_> bblack: nice :) [19:53:33] (CR) Smalyshev: Add config for units on Wikidata (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/311206 (owner: Smalyshev) [20:34:46] PROBLEM - puppet last run on db2049 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apparmor.d/abstractions/ssl_certs] [21:01:51] RECOVERY - puppet last run on db2049 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:34:42] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [50.0] [21:49:18] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [22:46:16] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 618 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4525413 keys - replication_delay is 618 [22:51:06] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4504853 keys - replication_delay is 1 [22:58:11] PROBLEM - puppet last run on maerlant is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [23:24:53] RECOVERY - puppet last run on maerlant is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures