[00:00:04] RoanKattouw, ^d, marktraceur, MaxSem, ^d: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141203T0000). Please do the needful. [00:02:36] <^d> ebernhardson: Did you grab my change with that scap? [00:02:40] <^d> I don't think so :) [00:02:41] <^d> It's k. [00:02:55] (CR) Chad: [C: 2] Create new pool counter for prefix searches [mediawiki-config] - https://gerrit.wikimedia.org/r/176931 (owner: Manybubbles) [00:03:05] (Merged) jenkins-bot: Create new pool counter for prefix searches [mediawiki-config] - https://gerrit.wikimedia.org/r/176931 (owner: Manybubbles) [00:03:31] ^d: doh, yea [00:03:44] <^d> Depending on where you are in scap I might be tagging along. [00:03:48] <^d> Just merged + pulled to tin [00:05:12] its mid way through sync-proxies. can sync-file again once scap is done to make sure its consistent [00:06:43] <^d> okie dokie [00:12:44] (PS1) Kaldari: Turning on wgWikiGrokDebug on en BetaLabs and removing obsolete comment [mediawiki-config] - https://gerrit.wikimedia.org/r/177134 [00:17:03] RoanKattouw, ^d, marktraceur, MaxSem: Who's doing the SWAT deployment today?
I have a late addition - just a Beta Labs config change [00:18:12] <^d> ebernhardson is :) [00:21:35] (PS9) BryanDavis: logstash: Forward syslog events for apache2 + hhvm [puppet] - https://gerrit.wikimedia.org/r/176693 [00:22:13] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: puppet fail [00:22:13] PROBLEM - puppet last run on es1004 is CRITICAL: CRITICAL: puppet fail [00:22:13] PROBLEM - puppet last run on tmh1001 is CRITICAL: CRITICAL: puppet fail [00:22:13] PROBLEM - puppet last run on dysprosium is CRITICAL: CRITICAL: Puppet has 21 failures [00:22:14] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: puppet fail [00:22:20] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: Puppet has 21 failures [00:22:20] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: puppet fail [00:22:29] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: Puppet has 43 failures [00:22:52] PROBLEM - puppet last run on radon is CRITICAL: CRITICAL: puppet fail [00:22:52] PROBLEM - puppet last run on mw1047 is CRITICAL: CRITICAL: puppet fail [00:23:03] PROBLEM - puppet last run on analytics1021 is CRITICAL: CRITICAL: Puppet has 21 failures [00:23:09] PROBLEM - puppet last run on virt1005 is CRITICAL: CRITICAL: puppet fail [00:23:09] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: Puppet has 24 failures [00:23:10] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: puppet fail [00:23:10] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: puppet fail [00:23:16] kaldari|2: sure i can ship that, scap is running atm 93% on sync-common [00:23:20] PROBLEM - puppet last run on analytics1012 is CRITICAL: CRITICAL: Puppet has 29 failures [00:23:21] PROBLEM - puppet last run on mw1021 is CRITICAL: CRITICAL: Puppet has 1 failures [00:23:21] PROBLEM - puppet last run on elastic1009 is CRITICAL: CRITICAL: puppet fail [00:23:22] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: Puppet has 67 failures [00:23:24] PROBLEM
- puppet last run on mw1070 is CRITICAL: CRITICAL: puppet fail [00:23:29] PROBLEM - puppet last run on db1049 is CRITICAL: CRITICAL: puppet fail [00:23:29] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Puppet has 22 failures [00:23:29] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: Puppet has 11 failures [00:23:33] ebernhardson: Much obliged! [00:23:40] PROBLEM - puppet last run on search1008 is CRITICAL: CRITICAL: puppet fail [00:23:40] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: puppet fail [00:23:40] PROBLEM - puppet last run on mw1073 is CRITICAL: CRITICAL: Puppet has 72 failures [00:23:40] PROBLEM - puppet last run on analytics1029 is CRITICAL: CRITICAL: puppet fail [00:23:40] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: Puppet has 65 failures [00:23:41] PROBLEM - puppet last run on mw1131 is CRITICAL: CRITICAL: Puppet has 56 failures [00:23:41] PROBLEM - puppet last run on mw1104 is CRITICAL: CRITICAL: Puppet has 45 failures [00:23:42] PROBLEM - puppet last run on es2006 is CRITICAL: CRITICAL: Puppet has 16 failures [00:23:42] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: Puppet has 23 failures [00:23:43] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: Puppet has 42 failures [00:23:43] PROBLEM - puppet last run on lanthanum is CRITICAL: CRITICAL: puppet fail [00:23:44] PROBLEM - puppet last run on labsdb1002 is CRITICAL: CRITICAL: Puppet has 24 failures [00:23:58] PROBLEM - puppet last run on mw1113 is CRITICAL: CRITICAL: Puppet has 57 failures [00:24:00] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: Puppet has 62 failures [00:24:03] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: Puppet has 68 failures [00:24:03] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: puppet fail [00:24:03] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: Puppet has 23 failures [00:24:03] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: puppet fail
[00:24:03] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Puppet has 35 failures [00:24:04] PROBLEM - puppet last run on mw1020 is CRITICAL: CRITICAL: puppet fail [00:24:04] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Puppet has 26 failures [00:24:05] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: Puppet has 23 failures [00:24:05] PROBLEM - puppet last run on elastic1025 is CRITICAL: CRITICAL: Puppet has 24 failures [00:24:06] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: puppet fail [00:24:06] PROBLEM - puppet last run on db1027 is CRITICAL: CRITICAL: Puppet has 24 failures [00:24:15] PROBLEM - puppet last run on search1019 is CRITICAL: CRITICAL: puppet fail [00:24:15] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: puppet fail [00:24:15] PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: Puppet has 37 failures [00:24:16] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: Puppet has 32 failures [00:24:16] PROBLEM - puppet last run on elastic1029 is CRITICAL: CRITICAL: Puppet has 21 failures [00:24:16] PROBLEM - puppet last run on db2010 is CRITICAL: CRITICAL: Puppet has 24 failures [00:24:16] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: Puppet has 26 failures [00:24:16] PROBLEM - puppet last run on search1003 is CRITICAL: CRITICAL: Puppet has 50 failures [00:24:19] spagewmf, ebernhardson - are you done depl? 
[00:24:24] yurikR: still going [00:24:26] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: Puppet has 72 failures [00:24:27] PROBLEM - puppet last run on mw1230 is CRITICAL: CRITICAL: puppet fail [00:24:27] PROBLEM - puppet last run on search1021 is CRITICAL: CRITICAL: Puppet has 50 failures [00:24:28] PROBLEM - puppet last run on mw1075 is CRITICAL: CRITICAL: puppet fail [00:24:33] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: Puppet has 19 failures [00:24:33] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: Puppet has 36 failures [00:24:33] PROBLEM - puppet last run on lvs2005 is CRITICAL: CRITICAL: Puppet has 17 failures [00:24:33] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Puppet has 29 failures [00:24:33] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: Puppet has 32 failures [00:24:44] PROBLEM - puppet last run on analytics1024 is CRITICAL: CRITICAL: Puppet has 21 failures [00:24:50] yurikR: swat was empty at the time, so let it run over. 
then we had a few late stragglers :) [00:24:53] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: puppet fail [00:24:54] PROBLEM - puppet last run on elastic1010 is CRITICAL: CRITICAL: puppet fail [00:24:54] PROBLEM - puppet last run on mw1085 is CRITICAL: CRITICAL: puppet fail [00:24:54] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: Puppet has 78 failures [00:24:54] PROBLEM - puppetmaster backend https on palladium is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8141: HTTP/1.1 500 Internal Server Error [00:25:00] PROBLEM - puppet last run on elastic1016 is CRITICAL: CRITICAL: Puppet has 28 failures [00:25:14] PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: puppet fail [00:25:14] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Puppet has 30 failures [00:25:14] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 26 failures [00:25:14] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 33 failures [00:25:14] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Puppet has 73 failures [00:25:24] hehe. ebernhardson, could you ping me when done? 
i have a minor sec patch to push out [00:25:24] PROBLEM - puppet last run on elastic1013 is CRITICAL: CRITICAL: Puppet has 22 failures [00:25:24] PROBLEM - puppet last run on search1009 is CRITICAL: CRITICAL: puppet fail [00:25:24] PROBLEM - puppet last run on zinc is CRITICAL: CRITICAL: Puppet has 18 failures [00:25:24] PROBLEM - puppet last run on mw1083 is CRITICAL: CRITICAL: puppet fail [00:25:24] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: puppet fail [00:25:25] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail [00:25:25] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: puppet fail [00:25:34] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: puppet fail [00:25:41] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: Puppet has 15 failures [00:25:41] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: Puppet has 29 failures [00:25:41] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: puppet fail [00:25:41] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: puppet fail [00:25:44] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: puppet fail [00:25:44] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: Puppet has 24 failures [00:25:44] PROBLEM - puppet last run on es1005 is CRITICAL: CRITICAL: puppet fail [00:25:44] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 26 failures [00:25:44] PROBLEM - puppet last run on db1005 is CRITICAL: CRITICAL: Puppet has 25 failures [00:25:45] PROBLEM - puppet last run on mw1094 is CRITICAL: CRITICAL: puppet fail [00:25:45] PROBLEM - puppet last run on mw1013 is CRITICAL: CRITICAL: Puppet has 57 failures [00:25:54] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: Puppet has 28 failures [00:25:54] PROBLEM - puppet last run on ms-be1005 is CRITICAL: CRITICAL: Puppet has 23 failures [00:25:55] PROBLEM - puppet last run on mw1157 is CRITICAL: CRITICAL: Puppet has 71 failures [00:25:55] 
PROBLEM - puppet last run on ms-be2015 is CRITICAL: CRITICAL: puppet fail [00:25:55] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: Puppet has 73 failures [00:25:55] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: puppet fail [00:25:55] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: Puppet has 23 failures [00:26:06] PROBLEM - puppet last run on cp1057 is CRITICAL: CRITICAL: puppet fail [00:26:15] PROBLEM - puppet last run on erbium is CRITICAL: CRITICAL: puppet fail [00:26:20] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: puppet fail [00:26:25] PROBLEM - puppet last run on mw1096 is CRITICAL: CRITICAL: Puppet has 77 failures [00:26:25] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Puppet has 21 failures [00:26:25] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: puppet fail [00:26:26] PROBLEM - puppet last run on mw1245 is CRITICAL: CRITICAL: puppet fail [00:26:26] PROBLEM - puppet last run on mw1147 is CRITICAL: CRITICAL: puppet fail [00:26:35] PROBLEM - puppet last run on db1007 is CRITICAL: CRITICAL: puppet fail [00:26:35] PROBLEM - puppet last run on mw1015 is CRITICAL: CRITICAL: Puppet has 52 failures [00:26:35] PROBLEM - puppet last run on mw1058 is CRITICAL: CRITICAL: Puppet has 70 failures [00:26:35] PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: puppet fail [00:26:36] PROBLEM - puppet last run on mw1095 is CRITICAL: CRITICAL: Puppet has 76 failures [00:26:44] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: puppet fail [00:26:47] PROBLEM - puppet last run on cp1069 is CRITICAL: CRITICAL: puppet fail [00:26:47] PROBLEM - puppet last run on mw1035 is CRITICAL: CRITICAL: Puppet has 81 failures [00:26:48] PROBLEM - puppet last run on db1019 is CRITICAL: CRITICAL: Puppet has 19 failures [00:26:48] PROBLEM - puppet last run on db1041 is CRITICAL: CRITICAL: Puppet has 30 failures [00:26:48] PROBLEM - puppet last run on db1058 is CRITICAL: CRITICAL: puppet fail 
[00:26:48] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: puppet fail [00:26:48] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: Puppet has 28 failures [00:26:48] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: puppet fail [00:26:54] PROBLEM - puppet last run on db1009 is CRITICAL: CRITICAL: Puppet has 23 failures [00:26:55] PROBLEM - puppet last run on elastic1026 is CRITICAL: CRITICAL: puppet fail [00:26:55] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: puppet fail [00:26:55] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: puppet fail [00:26:55] PROBLEM - puppet last run on mw1102 is CRITICAL: CRITICAL: Puppet has 68 failures [00:26:57] PROBLEM - puppet last run on ytterbium is CRITICAL: CRITICAL: Puppet has 30 failures [00:26:57] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: Puppet has 28 failures [00:26:57] PROBLEM - puppet last run on wtp1019 is CRITICAL: CRITICAL: puppet fail [00:26:57] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Puppet has 15 failures [00:26:57] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: puppet fail [00:26:57] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: puppet fail [00:27:12] PROBLEM - puppet last run on es1003 is CRITICAL: CRITICAL: puppet fail [00:27:12] PROBLEM - puppet last run on search1014 is CRITICAL: CRITICAL: puppet fail [00:27:12] PROBLEM - puppet last run on wtp1014 is CRITICAL: CRITICAL: Puppet has 26 failures [00:27:12] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Puppet has 34 failures [00:27:12] PROBLEM - puppet last run on mc1008 is CRITICAL: CRITICAL: puppet fail [00:27:13] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: Puppet has 29 failures [00:27:13] PROBLEM - puppet last run on mw1031 is CRITICAL: CRITICAL: puppet fail [00:27:14] PROBLEM - puppet last run on cp1043 is CRITICAL: CRITICAL: Puppet has 24 failures [00:27:14] PROBLEM - puppet last run on wtp1021 is CRITICAL: 
CRITICAL: puppet fail [00:27:15] PROBLEM - puppet last run on zirconium is CRITICAL: CRITICAL: Puppet has 45 failures [00:27:15] PROBLEM - puppet last run on ms-fe1003 is CRITICAL: CRITICAL: puppet fail [00:27:16] PROBLEM - puppet last run on analytics1019 is CRITICAL: CRITICAL: Puppet has 20 failures [00:27:16] PROBLEM - puppet last run on mw1028 is CRITICAL: CRITICAL: puppet fail [00:27:17] PROBLEM - puppet last run on analytics1036 is CRITICAL: CRITICAL: Puppet has 23 failures [00:27:25] heh [00:27:25] PROBLEM - puppet last run on virt1002 is CRITICAL: CRITICAL: Puppet has 24 failures [00:27:28] PROBLEM - puppet last run on mw1062 is CRITICAL: CRITICAL: puppet fail [00:27:29] PROBLEM - puppet last run on mw1124 is CRITICAL: CRITICAL: puppet fail [00:27:30] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Puppet has 81 failures [00:27:30] PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Puppet has 75 failures [00:27:31] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: puppet fail [00:27:38] PROBLEM - puppet last run on elastic1028 is CRITICAL: CRITICAL: Puppet has 22 failures [00:27:39] PROBLEM - puppet last run on amssq43 is CRITICAL: CRITICAL: puppet fail [00:27:40] PROBLEM - puppet last run on es1009 is CRITICAL: CRITICAL: puppet fail [00:27:40] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: Puppet has 27 failures [00:27:40] PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: puppet fail [00:27:45] PROBLEM - puppet last run on mw1109 is CRITICAL: CRITICAL: puppet fail [00:27:45] PROBLEM - puppet last run on mw1089 is CRITICAL: CRITICAL: puppet fail [00:27:45] PROBLEM - puppet last run on mw1106 is CRITICAL: CRITICAL: puppet fail [00:27:45] PROBLEM - puppet last run on mw1132 is CRITICAL: CRITICAL: puppet fail [00:27:45] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: Puppet has 69 failures [00:27:46] PROBLEM - puppet last run on tmh1002 is CRITICAL: CRITICAL: Puppet has 63 failures [00:27:46] 
PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: puppet fail [00:27:47] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Puppet has 72 failures [00:27:47] PROBLEM - puppet last run on mw1036 is CRITICAL: CRITICAL: puppet fail [00:27:48] PROBLEM - puppet last run on mw1080 is CRITICAL: CRITICAL: Puppet has 75 failures [00:27:48] PROBLEM - puppet last run on virt1009 is CRITICAL: CRITICAL: Puppet has 21 failures [00:27:55] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Puppet has 28 failures [00:27:55] PROBLEM - puppet last run on mw1257 is CRITICAL: CRITICAL: puppet fail [00:27:55] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: puppet fail [00:27:55] PROBLEM - puppet last run on db1010 is CRITICAL: CRITICAL: puppet fail [00:27:55] (CR) BryanDavis: "This is a production no-op and fixes a bug in beta. It would be nice to see it merged." [puppet] - https://gerrit.wikimedia.org/r/176191 (owner: BryanDavis) [00:28:17] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Puppet has 69 failures [00:28:26] PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: Puppet has 20 failures [00:28:26] PROBLEM - puppet last run on mw1067 is CRITICAL: CRITICAL: puppet fail [00:28:26] PROBLEM - puppet last run on mc1016 is CRITICAL: CRITICAL: puppet fail [00:28:26] PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: puppet fail [00:28:26] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: Puppet has 94 failures [00:28:35] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Puppet has 86 failures [00:28:35] PROBLEM - puppet last run on labsdb1001 is CRITICAL: CRITICAL: Puppet has 18 failures [00:28:36] PROBLEM - puppet last run on db1053 is CRITICAL: CRITICAL: puppet fail [00:28:36] PROBLEM - puppet last run on logstash1003 is CRITICAL: CRITICAL: Puppet has 24 failures [00:28:36] PROBLEM - puppet last run on mw1252 is CRITICAL: CRITICAL: puppet fail [00:28:36] PROBLEM - puppet last run on
rdb1004 is CRITICAL: CRITICAL: Puppet has 21 failures [00:28:36] PROBLEM - puppet last run on search1020 is CRITICAL: CRITICAL: puppet fail [00:28:37] PROBLEM - puppet last run on mw1040 is CRITICAL: CRITICAL: Puppet has 75 failures [00:28:37] PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: puppet fail [00:28:38] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: puppet fail [00:28:38] PROBLEM - puppet last run on mc1010 is CRITICAL: CRITICAL: puppet fail [00:28:39] PROBLEM - puppet last run on mw1059 is CRITICAL: CRITICAL: puppet fail [00:28:39] PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: puppet fail [00:28:40] PROBLEM - puppet last run on amslvs4 is CRITICAL: CRITICAL: Puppet has 17 failures [00:28:40] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: puppet fail [00:28:41] PROBLEM - puppet last run on mw1048 is CRITICAL: CRITICAL: puppet fail [00:28:45] PROBLEM - puppet last run on pc1001 is CRITICAL: CRITICAL: puppet fail [00:28:47] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: puppet fail [00:28:47] PROBLEM - puppet last run on ms-be1001 is CRITICAL: CRITICAL: Puppet has 27 failures [00:28:47] PROBLEM - puppet last run on mw1063 is CRITICAL: CRITICAL: puppet fail [00:28:47] PROBLEM - puppet last run on mw1005 is CRITICAL: CRITICAL: Puppet has 59 failures [00:28:56] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Puppet has 19 failures [00:28:56] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Puppet has 19 failures [00:28:56] PROBLEM - puppet last run on mw1187 is CRITICAL: CRITICAL: puppet fail [00:28:57] PROBLEM - puppet last run on mw1256 is CRITICAL: CRITICAL: Puppet has 72 failures [00:28:57] PROBLEM - puppet last run on mw1134 is CRITICAL: CRITICAL: puppet fail [00:29:00] um? 
[00:29:08] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: puppet fail [00:29:09] PROBLEM - puppet last run on amslvs2 is CRITICAL: CRITICAL: puppet fail [00:29:09] PROBLEM - puppet last run on es1006 is CRITICAL: CRITICAL: Puppet has 23 failures [00:29:09] PROBLEM - puppet last run on dbproxy1002 is CRITICAL: CRITICAL: puppet fail [00:29:09] PROBLEM - puppet last run on analytics1017 is CRITICAL: CRITICAL: Puppet has 19 failures [00:29:09] PROBLEM - puppet last run on es1001 is CRITICAL: CRITICAL: puppet fail [00:29:10] PROBLEM - puppet last run on gold is CRITICAL: CRITICAL: puppet fail [00:29:10] PROBLEM - puppet last run on mw1006 is CRITICAL: CRITICAL: Puppet has 61 failures [00:29:12] puppetmaster croak again? [00:29:18] PROBLEM - puppet last run on elastic1001 is CRITICAL: CRITICAL: puppet fail [00:29:18] PROBLEM - puppet last run on mw1072 is CRITICAL: CRITICAL: puppet fail [00:29:18] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Puppet has 28 failures [00:29:27] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: Puppet has 21 failures [00:29:27] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: puppet fail [00:29:28] PROBLEM - puppet last run on wtp1006 is CRITICAL: CRITICAL: puppet fail [00:29:28] PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: puppet fail [00:29:36] PROBLEM - puppet last run on mw1221 is CRITICAL: CRITICAL: Puppet has 76 failures [00:29:36] PROBLEM - puppet last run on mw1082 is CRITICAL: CRITICAL: puppet fail [00:29:37] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: puppet fail [00:29:37] PROBLEM - puppet last run on elastic1007 is CRITICAL: CRITICAL: puppet fail [00:29:37] PROBLEM - puppet last run on mw1140 is CRITICAL: CRITICAL: puppet fail [00:29:37] PROBLEM - puppet last run on search1010 is CRITICAL: CRITICAL: puppet fail [00:29:37] PROBLEM - puppet last run on search1016 is CRITICAL: CRITICAL: puppet fail [00:29:45] PROBLEM - puppet last run on cp1049 is CRITICAL: 
CRITICAL: puppet fail [00:29:48] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: puppet fail [00:29:48] PROBLEM - puppet last run on wtp1017 is CRITICAL: CRITICAL: Puppet has 27 failures [00:29:49] PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: puppet fail [00:29:49] PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: puppet fail [00:29:49] PROBLEM - puppet last run on caesium is CRITICAL: CRITICAL: puppet fail [00:29:49] PROBLEM - puppet last run on analytics1020 is CRITICAL: CRITICAL: puppet fail [00:29:49] PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: Puppet has 28 failures [00:29:50] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: Puppet has 18 failures [00:29:56] PROBLEM - puppet last run on wtp1009 is CRITICAL: CRITICAL: puppet fail [00:30:01] PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: puppet fail [00:30:01] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: Puppet has 22 failures [00:30:02] PROBLEM - puppet last run on magnesium is CRITICAL: CRITICAL: Puppet has 34 failures [00:30:02] PROBLEM - puppet last run on elastic1031 is CRITICAL: CRITICAL: Puppet has 25 failures [00:30:02] PROBLEM - puppet last run on mw1038 is CRITICAL: CRITICAL: Puppet has 75 failures [00:30:02] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: puppet fail [00:30:02] PROBLEM - puppet last run on wtp1024 is CRITICAL: CRITICAL: Puppet has 21 failures [00:30:03] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Puppet has 74 failures [00:30:03] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: Puppet has 25 failures [00:30:07] Unexpected error in mod_passenger: Could not connect to the ApplicationPool server: Broken pipe (32) [00:30:15] (CR) BryanDavis: "Seems to be working in beta. We can always refine later."
[puppet] - https://gerrit.wikimedia.org/r/175896 (owner: BryanDavis) [00:30:15] !log ebernhardson Finished scap: Bumping flow submodule in 1.25wmf10 (duration: 38m 55s) [00:30:15] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: Puppet has 28 failures [00:30:15] PROBLEM - puppet last run on neptunium is CRITICAL: CRITICAL: Puppet has 39 failures [00:30:15] PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: puppet fail [00:30:15] PROBLEM - puppet last run on mw1007 is CRITICAL: CRITICAL: puppet fail [00:30:15] PROBLEM - puppet last run on mc1006 is CRITICAL: CRITICAL: puppet fail [00:30:16] PROBLEM - puppet last run on mw1115 is CRITICAL: CRITICAL: Puppet has 68 failures [00:30:16] PROBLEM - puppet last run on mw1178 is CRITICAL: CRITICAL: Puppet has 71 failures [00:30:20] Logged the message, Master [00:30:25] PROBLEM - puppet last run on cp1039 is CRITICAL: CRITICAL: puppet fail [00:30:25] PROBLEM - puppet last run on analytics1041 is CRITICAL: CRITICAL: puppet fail [00:30:25] PROBLEM - puppet last run on mw1012 is CRITICAL: CRITICAL: puppet fail [00:30:25] PROBLEM - puppet last run on mw1160 is CRITICAL: CRITICAL: Puppet has 66 failures [00:30:25] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: puppet fail [00:30:26] PROBLEM - puppet last run on snapshot1003 is CRITICAL: CRITICAL: Puppet has 57 failures [00:30:26] PROBLEM - puppet last run on lvs4002 is CRITICAL: CRITICAL: puppet fail [00:30:27] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: puppet fail [00:30:36] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: Puppet has 23 failures [00:30:36] PROBLEM - puppet last run on mw1141 is CRITICAL: CRITICAL: puppet fail [00:30:36] PROBLEM - puppet last run on elastic1004 is CRITICAL: CRITICAL: Puppet has 26 failures [00:30:36] PROBLEM - puppet last run on curium is CRITICAL: CRITICAL: Puppet has 20 failures [00:30:36] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: Puppet has 23 failures
[00:30:36] PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: Puppet has 25 failures [00:30:36] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CRITICAL: puppet fail [00:30:37] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Puppet has 29 failures [00:30:37] PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: puppet fail [00:30:45] PROBLEM - puppet last run on es1008 is CRITICAL: CRITICAL: puppet fail [00:30:45] PROBLEM - puppet last run on db1029 is CRITICAL: CRITICAL: Puppet has 23 failures [00:30:45] PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: Puppet has 27 failures [00:30:46] PROBLEM - puppet last run on ms-be1003 is CRITICAL: CRITICAL: Puppet has 24 failures [00:30:46] PROBLEM - puppet last run on db1050 is CRITICAL: CRITICAL: Puppet has 19 failures [00:30:46] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: puppet fail [00:30:46] PROBLEM - puppet last run on mw1200 is CRITICAL: CRITICAL: puppet fail [00:30:47] PROBLEM - puppet last run on potassium is CRITICAL: CRITICAL: puppet fail [00:30:47] PROBLEM - puppet last run on mw1242 is CRITICAL: CRITICAL: puppet fail [00:30:48] PROBLEM - puppet last run on mw1041 is CRITICAL: CRITICAL: puppet fail [00:30:48] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 24 failures [00:30:49] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Puppet has 26 failures [00:30:49] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: puppet fail [00:30:50] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: puppet fail [00:30:50] PROBLEM - puppet last run on db2009 is CRITICAL: CRITICAL: Puppet has 16 failures [00:30:51] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [00:30:58] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: Puppet has 25 failures [00:31:03] PROBLEM - puppet last run on db1022 is CRITICAL: CRITICAL: puppet fail [00:31:05] 
PROBLEM - puppet last run on analytics1025 is CRITICAL: CRITICAL: puppet fail [00:31:05] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: Puppet has 33 failures [00:31:06] PROBLEM - puppet last run on amssq44 is CRITICAL: CRITICAL: Puppet has 22 failures [00:31:06] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: puppet fail [00:31:06] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail [00:31:06] PROBLEM - puppet last run on db2019 is CRITICAL: CRITICAL: puppet fail [00:31:06] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: Puppet has 57 failures [00:31:07] PROBLEM - puppet last run on ms-be1004 is CRITICAL: CRITICAL: Puppet has 21 failures [00:31:16] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: puppet fail [00:31:17] PROBLEM - puppet last run on mw1145 is CRITICAL: CRITICAL: Puppet has 82 failures [00:31:17] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: puppet fail [00:31:17] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: puppet fail [00:31:17] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: puppet fail [00:31:17] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: puppet fail [00:31:18] not sure if related, but scap just finished and failed syncing wikiversions to mw1205 and mw1085. 
I'm not familiar enough with what wikiversions.{json,cdb} contains to say if anything changed [00:31:20] (PS2) Krinkle: gerrit: Don't match Phabricator identifiers within urls [puppet] - https://gerrit.wikimedia.org/r/177128 [00:31:27] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: puppet fail [00:31:27] PROBLEM - puppet last run on mw1150 is CRITICAL: CRITICAL: puppet fail [00:31:27] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: puppet fail [00:31:27] PROBLEM - puppet last run on mw1026 is CRITICAL: CRITICAL: Puppet has 67 failures [00:31:27] PROBLEM - puppet last run on mw1254 is CRITICAL: CRITICAL: Puppet has 77 failures [00:31:28] PROBLEM - puppet last run on mw1250 is CRITICAL: CRITICAL: Puppet has 72 failures [00:31:28] PROBLEM - puppet last run on sca1002 is CRITICAL: CRITICAL: puppet fail [00:31:29] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: puppet fail [00:31:29] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: puppet fail [00:31:30] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: puppet fail [00:31:30] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: Puppet has 29 failures [00:31:31] PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: Puppet has 24 failures [00:31:31] PROBLEM - puppet last run on mw1045 is CRITICAL: CRITICAL: Puppet has 69 failures [00:31:32] PROBLEM - puppet last run on mc1003 is CRITICAL: CRITICAL: Puppet has 22 failures [00:31:32] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: Puppet has 22 failures [00:31:33] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 48 failures [00:31:33] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: puppet fail [00:31:33] !log restarted apache2 on palladium [00:31:36] Logged the message, Master [00:31:40] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: puppet fail [00:31:40] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: puppet fail [00:31:40] PROBLEM
- puppet last run on mw1120 is CRITICAL: CRITICAL: puppet fail [00:31:42] PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: puppet fail [00:31:42] PROBLEM - puppet last run on es2008 is CRITICAL: CRITICAL: puppet fail [00:31:42] PROBLEM - puppet last run on es2001 is CRITICAL: CRITICAL: puppet fail [00:31:42] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Puppet has 24 failures [00:31:42] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Puppet has 19 failures [00:31:43] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: puppet fail [00:31:46] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail [00:31:46] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 62 failures [00:31:50] ebernhardson: wikiversions only changes when a new branch is deployed. It should be fine. [00:31:58] PROBLEM - puppet last run on platinum is CRITICAL: CRITICAL: Puppet has 20 failures [00:31:59] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: Puppet has 78 failures [00:32:00] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: puppet fail [00:32:00] PROBLEM - puppet last run on elastic1008 is CRITICAL: CRITICAL: puppet fail [00:32:00] PROBLEM - puppet last run on elastic1022 is CRITICAL: CRITICAL: Puppet has 19 failures [00:32:00] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: Puppet has 58 failures [00:32:00] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 83 failures [00:32:01] PROBLEM - puppet last run on db2002 is CRITICAL: CRITICAL: Puppet has 25 failures [00:32:01] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 24 failures [00:32:06] PROBLEM - puppet last run on mw1224 is CRITICAL: CRITICAL: Puppet has 82 failures [00:32:06] PROBLEM - puppet last run on rbf1002 is CRITICAL: CRITICAL: Puppet has 22 failures [00:32:07] PROBLEM - puppet last run on db1042 is CRITICAL: CRITICAL: puppet fail [00:32:07] PROBLEM - puppet last run on 
db1066 is CRITICAL: CRITICAL: Puppet has 24 failures [00:32:07] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: Puppet has 27 failures [00:32:07] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Puppet has 18 failures [00:32:09] bd808: yup should be fine then. thanks [00:32:16] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: puppet fail [00:32:22] PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: puppet fail [00:32:27] RECOVERY - puppet last run on elastic1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:32:28] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: puppet fail [00:32:30] PROBLEM - puppet last run on mw1217 is CRITICAL: CRITICAL: Puppet has 62 failures [00:32:31] PROBLEM - puppet last run on mc1002 is CRITICAL: CRITICAL: puppet fail [00:32:31] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: Puppet has 22 failures [00:32:36] !log ebernhardson Synchronized wmf-config/PoolCounterSettings-eqiad.php: Create new pool counter for prefix searches (duration: 00m 05s) [00:32:36] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: puppet fail [00:32:36] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 19 failures [00:32:37] PROBLEM - puppet last run on lead is CRITICAL: CRITICAL: puppet fail [00:32:37] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: puppet fail [00:32:37] ^d: sync'd your change out [00:32:37] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: puppet fail [00:32:37] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: Puppet has 22 failures [00:32:37] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: puppet fail [00:32:38] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: puppet fail [00:32:38] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: puppet fail [00:32:38] Logged the message, Master [00:32:46] PROBLEM - puppet last run on cp1061 is 
CRITICAL: CRITICAL: puppet fail [00:32:46] PROBLEM - puppet last run on virt1001 is CRITICAL: CRITICAL: puppet fail [00:32:46] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: puppet fail [00:32:46] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: puppet fail [00:32:46] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: puppet fail [00:32:47] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: puppet fail [00:32:47] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: Puppet has 32 failures [00:32:48] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Puppet has 66 failures [00:32:48] PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: Puppet has 63 failures [00:32:50] <^d> ebernhardson: thank you sir. [00:32:58] PROBLEM - puppet last run on mw1088 is CRITICAL: CRITICAL: Puppet has 65 failures [00:32:58] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 25 failures [00:32:58] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: puppet fail [00:32:58] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 46 failures [00:32:58] PROBLEM - puppet last run on labnet1001 is CRITICAL: CRITICAL: puppet fail [00:32:59] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: puppet fail [00:32:59] PROBLEM - puppet last run on db1003 is CRITICAL: CRITICAL: puppet fail [00:33:06] PROBLEM - puppet last run on labstore1001 is CRITICAL: CRITICAL: Puppet has 21 failures [00:33:13] PROBLEM - puppet last run on wtp1016 is CRITICAL: CRITICAL: Puppet has 28 failures [00:33:14] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: puppet fail [00:33:14] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: puppet fail [00:33:15] PROBLEM - puppet last run on mw1039 is CRITICAL: CRITICAL: Puppet has 68 failures [00:33:20] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Puppet has 24 failures [00:33:20] PROBLEM - puppet last run on db1021 
is CRITICAL: CRITICAL: puppet fail [00:33:20] PROBLEM - puppet last run on cp1058 is CRITICAL: CRITICAL: puppet fail [00:33:20] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: puppet fail [00:33:20] PROBLEM - puppet last run on mw1076 is CRITICAL: CRITICAL: puppet fail [00:33:20] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 81 failures [00:33:21] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: puppet fail [00:33:22] (CR) EBernhardson: [C: 2] Turning on wgWikiGrokDebug on en BetaLabs and removing obsolete comment [mediawiki-config] - https://gerrit.wikimedia.org/r/177134 (owner: Kaldari) [00:33:26] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 65 failures [00:33:27] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Puppet has 64 failures [00:33:27] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: puppet fail [00:33:27] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Puppet has 27 failures [00:33:34] (Merged) jenkins-bot: Turning on wgWikiGrokDebug on en BetaLabs and removing obsolete comment [mediawiki-config] - https://gerrit.wikimedia.org/r/177134 (owner: Kaldari) [00:33:36] (PS3) Krinkle: gerrit: Don't match Phabricator identifiers within urls [puppet] - https://gerrit.wikimedia.org/r/177128 [00:33:39] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Puppet has 77 failures [00:33:39] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 45 failures [00:33:39] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: puppet fail [00:33:39] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 65 failures [00:33:39] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: puppet fail [00:33:40] PROBLEM - puppet last run on elastic1012 is CRITICAL: CRITICAL: Puppet has 26 failures [00:33:40] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 66 failures [00:33:41] PROBLEM - 
puppet last run on cp4003 is CRITICAL: CRITICAL: puppet fail [00:33:41] PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: Puppet has 27 failures [00:33:42] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: puppet fail [00:33:42] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: puppet fail [00:33:43] PROBLEM - puppet last run on db1051 is CRITICAL: CRITICAL: puppet fail [00:33:46] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 78 failures [00:33:46] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 77 failures [00:33:46] PROBLEM - puppet last run on analytics1022 is CRITICAL: CRITICAL: puppet fail [00:33:46] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 68 failures [00:33:46] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: puppet fail [00:33:47] RECOVERY - puppetmaster backend https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.064 second response time [00:33:47] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 21 failures [00:33:48] PROBLEM - puppet last run on analytics1010 is CRITICAL: CRITICAL: puppet fail [00:33:48] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Puppet has 66 failures [00:33:49] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: puppet fail [00:33:49] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: puppet fail [00:33:50] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: puppet fail [00:33:56] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 24 failures [00:33:57] PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: puppet fail [00:33:58] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Puppet has 21 failures [00:33:59] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 25 failures [00:34:12] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 21 failures [00:34:12] 
PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: puppet fail [00:34:12] PROBLEM - puppet last run on elastic1019 is CRITICAL: CRITICAL: puppet fail [00:34:12] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: puppet fail [00:34:13] PROBLEM - puppet last run on elastic1030 is CRITICAL: CRITICAL: Puppet has 26 failures [00:34:13] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: Puppet has 19 failures [00:34:13] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: puppet fail [00:34:14] PROBLEM - puppet last run on amssq60 is CRITICAL: CRITICAL: Puppet has 48 failures [00:34:14] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: puppet fail [00:34:15] PROBLEM - puppet last run on amssq48 is CRITICAL: CRITICAL: puppet fail [00:34:15] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: puppet fail [00:34:16] PROBLEM - puppet last run on polonium is CRITICAL: CRITICAL: Puppet has 24 failures [00:34:16] PROBLEM - puppet last run on virt1003 is CRITICAL: CRITICAL: Puppet has 25 failures [00:34:16] !log ebernhardson Synchronized wmf-config/: Turning on wgWikiGrokDebug on en BetaLabs (duration: 00m 06s) [00:34:17] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 60 failures [00:34:17] PROBLEM - puppet last run on virt1007 is CRITICAL: CRITICAL: puppet fail [00:34:18] Logged the message, Master [00:34:28] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: puppet fail [00:34:29] PROBLEM - puppet last run on mw1051 is CRITICAL: CRITICAL: puppet fail [00:34:29] kaldari|2: your wmf-config update is pushed out [00:34:30] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 68 failures [00:34:33] (CR) BryanDavis: "Production no-op that is useful in beta (and has been for 5 months). Merge plz?" 
[puppet] - https://gerrit.wikimedia.org/r/143788 (https://bugzilla.wikimedia.org/60690) (owner: BryanDavis) [00:34:38] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: Puppet has 21 failures [00:34:38] PROBLEM - puppet last run on lithium is CRITICAL: CRITICAL: Puppet has 18 failures [00:34:38] PROBLEM - puppet last run on mw1129 is CRITICAL: CRITICAL: puppet fail [00:34:38] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: puppet fail [00:34:38] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: puppet fail [00:34:39] yurikR: i'm done, you can proceed [00:34:39] PROBLEM - puppet last run on db1016 is CRITICAL: CRITICAL: Puppet has 23 failures [00:34:39] PROBLEM - puppet last run on mw1247 is CRITICAL: CRITICAL: puppet fail [00:34:40] PROBLEM - puppet last run on labcontrol2001 is CRITICAL: CRITICAL: Puppet has 39 failures [00:34:40] PROBLEM - puppet last run on mc1012 is CRITICAL: CRITICAL: puppet fail [00:34:41] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Puppet has 27 failures [00:34:41] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: puppet fail [00:34:42] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: Puppet has 28 failures [00:34:42] PROBLEM - puppet last run on db1004 is CRITICAL: CRITICAL: puppet fail [00:34:43] PROBLEM - puppet last run on wtp1005 is CRITICAL: CRITICAL: puppet fail [00:34:43] PROBLEM - puppet last run on mw1180 is CRITICAL: CRITICAL: puppet fail [00:34:44] PROBLEM - puppet last run on wtp1012 is CRITICAL: CRITICAL: puppet fail [00:34:44] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: puppet fail [00:34:49] PROBLEM - puppet last run on mw1237 is CRITICAL: CRITICAL: Puppet has 82 failures [00:34:49] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: puppet fail [00:34:49] PROBLEM - puppet last run on analytics1016 is CRITICAL: CRITICAL: Puppet has 18 failures [00:34:50] PROBLEM - puppet last run on pc1002 is CRITICAL: CRITICAL: Puppet has 29 
failures [00:34:59] PROBLEM - puppet last run on db2007 is CRITICAL: CRITICAL: Puppet has 24 failures [00:35:02] PROBLEM - puppet last run on mw1030 is CRITICAL: CRITICAL: puppet fail [00:35:02] PROBLEM - puppet last run on mw1133 is CRITICAL: CRITICAL: puppet fail [00:35:02] PROBLEM - puppet last run on mw1044 is CRITICAL: CRITICAL: puppet fail [00:35:03] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Puppet has 65 failures [00:35:03] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: Puppet has 86 failures [00:35:09] PROBLEM - puppet last run on mc1014 is CRITICAL: CRITICAL: Puppet has 21 failures [00:35:09] PROBLEM - puppet last run on search1007 is CRITICAL: CRITICAL: Puppet has 54 failures [00:35:09] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: Puppet has 74 failures [00:35:09] PROBLEM - puppet last run on db2016 is CRITICAL: CRITICAL: puppet fail [00:35:09] PROBLEM - puppet last run on ms-fe2003 is CRITICAL: CRITICAL: Puppet has 33 failures [00:35:10] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: puppet fail [00:35:10] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: puppet fail [00:35:11] PROBLEM - puppet last run on amslvs1 is CRITICAL: CRITICAL: Puppet has 19 failures [00:35:11] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Puppet has 26 failures [00:35:12] PROBLEM - puppet last run on antimony is CRITICAL: CRITICAL: Puppet has 37 failures [00:35:21] PROBLEM - puppet last run on elastic1006 is CRITICAL: CRITICAL: Puppet has 19 failures [00:35:21] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: puppet fail [00:35:21] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: Puppet has 25 failures [00:35:21] PROBLEM - puppet last run on plutonium is CRITICAL: CRITICAL: Puppet has 33 failures [00:35:21] PROBLEM - puppet last run on es1007 is CRITICAL: CRITICAL: Puppet has 25 failures [00:35:29] RECOVERY - puppet last run on es1001 is OK: OK: Puppet is currently enabled, 
last run 3 minutes ago with 0 failures [00:35:29] PROBLEM - puppet last run on mc1005 is CRITICAL: CRITICAL: puppet fail [00:35:29] PROBLEM - puppet last run on dataset1001 is CRITICAL: CRITICAL: puppet fail [00:35:29] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: Puppet has 25 failures [00:35:30] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: puppet fail [00:35:30] PROBLEM - puppet last run on amssq55 is CRITICAL: CRITICAL: Puppet has 26 failures [00:35:30] PROBLEM - puppet last run on mw1156 is CRITICAL: CRITICAL: puppet fail [00:35:31] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: puppet fail [00:35:31] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 21 failures [00:35:35] ebernhardson: thanks. testing now [00:35:39] PROBLEM - puppet last run on thallium is CRITICAL: CRITICAL: puppet fail [00:35:42] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: puppet fail [00:35:50] PROBLEM - puppet last run on mw1050 is CRITICAL: CRITICAL: puppet fail [00:35:50] PROBLEM - puppet last run on elastic1024 is CRITICAL: CRITICAL: puppet fail [00:35:51] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: puppet fail [00:35:51] PROBLEM - puppet last run on search1002 is CRITICAL: CRITICAL: Puppet has 50 failures [00:35:51] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 29 failures [00:35:51] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: Puppet has 21 failures [00:35:51] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [00:35:52] PROBLEM - puppet last run on rdb1001 is CRITICAL: CRITICAL: puppet fail [00:35:52] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: puppet fail [00:35:53] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Puppet has 27 failures [00:35:53] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: puppet fail [00:35:54] PROBLEM - puppet last run on 
ms-be1007 is CRITICAL: CRITICAL: puppet fail [00:35:54] PROBLEM - puppet last run on berkelium is CRITICAL: CRITICAL: puppet fail [00:36:01] PROBLEM - puppet last run on mw1079 is CRITICAL: CRITICAL: puppet fail [00:36:02] PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: puppet fail [00:36:02] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Puppet has 24 failures [00:36:02] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Puppet has 20 failures [00:36:03] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: puppet fail [00:36:03] PROBLEM - puppet last run on mw1084 is CRITICAL: CRITICAL: puppet fail [00:36:03] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: Puppet has 20 failures [00:36:04] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [00:36:04] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Puppet has 24 failures [00:36:04] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: Puppet has 72 failures [00:36:05] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: puppet fail [00:36:05] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: Puppet has 61 failures [00:36:06] PROBLEM - puppet last run on mw1151 is CRITICAL: CRITICAL: Puppet has 69 failures [00:36:13] PROBLEM - puppet last run on mw1227 is CRITICAL: CRITICAL: puppet fail [00:36:13] PROBLEM - puppet last run on mw1098 is CRITICAL: CRITICAL: puppet fail [00:36:23] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: puppet fail [00:36:23] PROBLEM - puppet last run on virt1004 is CRITICAL: CRITICAL: Puppet has 28 failures [00:36:33] PROBLEM - puppet last run on mw1057 is CRITICAL: CRITICAL: puppet fail [00:36:33] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: puppet fail [00:36:33] PROBLEM - puppet last run on search1017 is CRITICAL: CRITICAL: puppet fail [00:36:33] PROBLEM - puppet last run on ms-be1008 is CRITICAL: CRITICAL: puppet fail 
[00:36:33] PROBLEM - puppet last run on db2029 is CRITICAL: CRITICAL: puppet fail [00:36:34] puppet is really upset huh? [00:36:34] PROBLEM - puppet last run on snapshot1004 is CRITICAL: CRITICAL: puppet fail [00:36:34] PROBLEM - puppet last run on mw1081 is CRITICAL: CRITICAL: puppet fail [00:36:35] PROBLEM - puppet last run on mw1034 is CRITICAL: CRITICAL: Puppet has 73 failures [00:36:41] PROBLEM - puppet last run on snapshot1002 is CRITICAL: CRITICAL: puppet fail [00:36:41] PROBLEM - puppet last run on mw1049 is CRITICAL: CRITICAL: Puppet has 74 failures [00:36:41] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Puppet has 74 failures [00:36:41] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: Puppet has 23 failures [00:36:41] PROBLEM - puppet last run on mc1001 is CRITICAL: CRITICAL: puppet fail [00:36:42] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: puppet fail [00:36:42] RECOVERY - puppet last run on analytics1031 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [00:36:43] PROBLEM - puppet last run on mw1258 is CRITICAL: CRITICAL: puppet fail [00:36:43] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: Puppet has 52 failures [00:36:44] RECOVERY - puppet last run on db2010 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [00:36:44] PROBLEM - puppet last run on mw1054 is CRITICAL: CRITICAL: Puppet has 64 failures [00:36:45] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: puppet fail [00:36:52] PROBLEM - puppet last run on analytics1013 is CRITICAL: CRITICAL: puppet fail [00:36:52] PROBLEM - puppet last run on wtp1023 is CRITICAL: CRITICAL: puppet fail [00:36:53] PROBLEM - puppet last run on db2001 is CRITICAL: CRITICAL: puppet fail [00:36:53] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:36:53] PROBLEM - puppet last run on wtp1018 is CRITICAL: CRITICAL: puppet fail 
[00:36:53] PROBLEM - puppet last run on elastic1014 is CRITICAL: CRITICAL: Puppet has 31 failures [00:37:01] PROBLEM - puppet last run on analytics1026 is CRITICAL: CRITICAL: Puppet has 27 failures [00:37:05] PROBLEM - puppet last run on search1023 is CRITICAL: CRITICAL: puppet fail [00:37:05] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: puppet fail [00:37:06] PROBLEM - puppet last run on search1005 is CRITICAL: CRITICAL: Puppet has 54 failures [00:37:07] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Puppet has 70 failures [00:37:11] PROBLEM - puppet last run on rubidium is CRITICAL: CRITICAL: Puppet has 25 failures [00:37:12] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL: puppet fail [00:37:12] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: puppet fail [00:37:12] PROBLEM - puppet last run on gadolinium is CRITICAL: CRITICAL: Puppet has 26 failures [00:37:25] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail [00:37:25] RECOVERY - puppet last run on ms-be1014 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [00:37:25] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Puppet has 26 failures [00:37:25] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: puppet fail [00:37:26] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [00:37:26] PROBLEM - puppet last run on mw1149 is CRITICAL: CRITICAL: Puppet has 62 failures [00:37:26] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [00:37:27] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: puppet fail [00:37:27] PROBLEM - puppet last run on mw1111 is CRITICAL: CRITICAL: Puppet has 80 failures [00:37:32] RECOVERY - puppet last run on es1004 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [00:37:32] PROBLEM - puppet last run on 
labsdb1006 is CRITICAL: CRITICAL: Puppet has 25 failures [00:37:33] PROBLEM - puppet last run on mw1121 is CRITICAL: CRITICAL: puppet fail [00:37:33] RECOVERY - puppet last run on elastic1013 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [00:37:33] PROBLEM - puppet last run on mw1074 is CRITICAL: CRITICAL: puppet fail [00:37:33] RECOVERY - puppet last run on dysprosium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:37:33] RECOVERY - puppet last run on tmh1001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:37:34] RECOVERY - puppet last run on zinc is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [00:37:34] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet has 14 failures [00:37:35] PROBLEM - puppet last run on mw1055 is CRITICAL: CRITICAL: Puppet has 67 failures [00:37:35] RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [00:37:36] PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: Puppet has 8 failures [00:37:36] PROBLEM - puppet last run on mw1116 is CRITICAL: CRITICAL: puppet fail [00:37:42] PROBLEM - puppet last run on mw1014 is CRITICAL: CRITICAL: Puppet has 51 failures [00:37:45] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: Puppet has 10 failures [00:37:45] RECOVERY - puppet last run on db2017 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [00:37:45] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [00:37:45] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [00:37:45] PROBLEM - puppet last run on analytics1023 is CRITICAL: CRITICAL: Puppet has 30 failures [00:37:45] PROBLEM - puppet last run on mw1056 is CRITICAL: CRITICAL: Puppet has 68 failures 
[00:37:46] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: Puppet has 28 failures [00:37:52] PROBLEM - puppet last run on argon is CRITICAL: CRITICAL: puppet fail [00:37:52] PROBLEM - puppet last run on analytics1014 is CRITICAL: CRITICAL: puppet fail [00:37:52] RECOVERY - puppet last run on cp1064 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:38:01] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Puppet has 29 failures [00:38:01] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:01] RECOVERY - puppet last run on db1005 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [00:38:02] PROBLEM - puppet last run on search1024 is CRITICAL: CRITICAL: Puppet has 58 failures [00:38:02] PROBLEM - puppet last run on elastic1015 is CRITICAL: CRITICAL: puppet fail [00:38:12] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Puppet has 44 failures [00:38:12] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Puppet has 15 failures [00:38:13] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: Puppet has 5 failures [00:38:13] RECOVERY - puppet last run on mw1157 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [00:38:13] PROBLEM - puppet last run on wtp1022 is CRITICAL: CRITICAL: puppet fail [00:38:13] RECOVERY - puppet last run on mw1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:13] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [00:38:14] PROBLEM - puppet last run on db2004 is CRITICAL: CRITICAL: puppet fail [00:38:14] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Puppet has 20 failures [00:38:15] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Puppet has 29 failures [00:38:15] RECOVERY - puppet last run on radon is OK: 
OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:24] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: puppet fail [00:38:24] PROBLEM - puppet last run on amssq42 is CRITICAL: CRITICAL: Puppet has 25 failures [00:38:25] RECOVERY - puppet last run on cp1057 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [00:38:32] RECOVERY - puppet last run on analytics1021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:32] PROBLEM - puppet last run on db1055 is CRITICAL: CRITICAL: Puppet has 5 failures [00:38:32] RECOVERY - puppet last run on virt1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:32] RECOVERY - puppet last run on erbium is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:38:32] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:33] PROBLEM - puppet last run on mw1023 is CRITICAL: CRITICAL: Puppet has 14 failures [00:38:33] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: puppet fail [00:38:42] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:42] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:42] PROBLEM - puppet last run on db1062 is CRITICAL: CRITICAL: Puppet has 26 failures [00:38:43] PROBLEM - puppet last run on wtp1002 is CRITICAL: CRITICAL: Puppet has 8 failures [00:38:43] PROBLEM - puppet last run on db2037 is CRITICAL: CRITICAL: Puppet has 18 failures [00:38:43] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [00:38:43] PROBLEM - puppet last run on mw1239 is CRITICAL: CRITICAL: Puppet has 68 failures [00:38:44] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: Puppet has 2 
failures [00:38:50] RECOVERY - puppet last run on mw1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:50] PROBLEM - puppet last run on mw1001 is CRITICAL: CRITICAL: puppet fail [00:38:50] RECOVERY - puppet last run on elastic1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:51] RECOVERY - puppet last run on mw1070 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:51] RECOVERY - puppet last run on analytics1012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:38:51] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:38:51] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:51] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: puppet fail [00:38:52] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: Puppet has 25 failures [00:38:52] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [00:38:53] PROBLEM - puppet last run on search1015 is CRITICAL: CRITICAL: puppet fail [00:38:53] RECOVERY - puppet last run on db1049 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:38:54] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: Puppet has 22 failures [00:39:01] PROBLEM - puppet last run on mw1159 is CRITICAL: CRITICAL: Puppet has 9 failures [00:39:07] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:07] RECOVERY - puppet last run on mw1095 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [00:39:07] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:08] RECOVERY - puppet last run on search1008 is 
OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:08] RECOVERY - puppet last run on mw1073 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:11] RECOVERY - puppet last run on analytics1029 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [00:39:11] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:11] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:11] RECOVERY - puppet last run on mw1104 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:11] RECOVERY - puppet last run on db1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:12] PROBLEM - puppet last run on mw1087 is CRITICAL: CRITICAL: Puppet has 2 failures [00:39:12] RECOVERY - puppet last run on db1041 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [00:39:13] RECOVERY - puppet last run on db1058 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:13] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [00:39:24] PROBLEM - puppet last run on db2023 is CRITICAL: CRITICAL: Puppet has 17 failures [00:39:24] RECOVERY - puppet last run on es2006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:25] RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:25] RECOVERY - puppet last run on db1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:25] RECOVERY - puppet last run on lanthanum is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:25] RECOVERY - puppet last run on labsdb1002 is OK: OK: Puppet is 
currently enabled, last run 2 minutes ago with 0 failures [00:39:25] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:39:26] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: Puppet has 77 failures [00:39:26] RECOVERY - puppet last run on mw1102 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:27] RECOVERY - puppet last run on mw1113 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:27] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:28] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:28] PROBLEM - puppet last run on elastic1011 is CRITICAL: CRITICAL: puppet fail [00:39:29] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:29] PROBLEM - puppet last run on mw1148 is CRITICAL: CRITICAL: puppet fail [00:39:30] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:33] RECOVERY - puppet last run on es1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:33] RECOVERY - puppet last run on wtp1014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:34] RECOVERY - puppet last run on mc1008 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:39:34] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [00:39:34] RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:41] RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:41] RECOVERY - 
puppet last run on db1068 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:41] RECOVERY - puppet last run on elastic1025 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:39:41] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:41] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Puppet has 14 failures [00:39:51] RECOVERY - puppet last run on db1027 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:57] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:39:57] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:39:57] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:01] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:40:01] RECOVERY - puppet last run on search1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:01] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [00:40:02] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [00:40:03] RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:06] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:40:06] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:40:06] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: Puppet has 2 failures [00:40:06] PROBLEM - puppet last run on 
mw1243 is CRITICAL: CRITICAL: puppet fail [00:40:06] RECOVERY - puppet last run on search1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:40:07] PROBLEM - puppet last run on db2003 is CRITICAL: CRITICAL: puppet fail [00:40:07] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:40:08] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:40:12] RECOVERY - puppet last run on search1021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:40:12] RECOVERY - puppet last run on mw1075 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:12] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:12] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:40:12] RECOVERY - puppet last run on lvs2005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:12] RECOVERY - puppet last run on analytics1024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:40:13] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:13] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: puppet fail [00:40:14] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:14] PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: Puppet has 12 failures [00:40:23] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:23] RECOVERY - puppet last run on elastic1010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:40:23] RECOVERY - 
puppet last run on mw1085 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:40:23] RECOVERY - puppet last run on tmh1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:40:23] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [00:40:24] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:24] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: Puppet has 3 failures [00:40:24] RECOVERY - puppet last run on virt1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:40:25] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Puppet has 40 failures [00:40:25] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: Puppet has 39 failures [00:40:26] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [00:40:32] RECOVERY - puppet last run on db1010 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [00:40:32] PROBLEM - puppet last run on analytics1037 is CRITICAL: CRITICAL: Puppet has 2 failures [00:40:32] RECOVERY - puppet last run on elastic1016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:35] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: Puppet has 13 failures [00:40:35] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:40:45] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:47] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:40:51] RECOVERY - puppet last run on search1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:40:51] RECOVERY - puppet last 
run on mw1182 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:41:01] RECOVERY - puppet last run on mw1234 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:41:06] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:41:06] RECOVERY - puppet last run on mw1083 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:41:07] RECOVERY - puppet last run on labsdb1001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [00:41:07] PROBLEM - puppet last run on mw1004 is CRITICAL: CRITICAL: Puppet has 43 failures [00:41:07] PROBLEM - puppet last run on mw1146 is CRITICAL: CRITICAL: Puppet has 10 failures [00:41:07] RECOVERY - puppet last run on mc1010 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [00:41:08] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:41:08] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:41:08] RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [00:41:11] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:41:11] RECOVERY - puppet last run on lvs4001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:41:11] RECOVERY - puppet last run on amslvs4 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [00:41:11] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [00:41:25] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:41:29] RECOVERY - puppet last run on ms-be1001 is 
OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [00:41:29] PROBLEM - puppet last run on amslvs3 is CRITICAL: CRITICAL: Puppet has 5 failures [00:41:30] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 2 failures [00:41:30] RECOVERY - puppet last run on ms-be2015 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:41:30] RECOVERY - puppet last run on es1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:41:31] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:41:31] RECOVERY - puppet last run on cp4015 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:41:31] RECOVERY - puppet last run on mw1094 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:41:33] RECOVERY - puppet last run on mw1013 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [00:41:33] RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [00:41:33] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:41:33] RECOVERY - puppet last run on ms-be1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:41:54] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:41:54] RECOVERY - puppet last run on es1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:41:55] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [00:41:55] PROBLEM - puppet last run on hooft is CRITICAL: CRITICAL: Puppet has 10 failures [00:42:01] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 54 
seconds ago with 0 failures [00:42:03] RECOVERY - puppet last run on mw1096 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:42:04] RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:06] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:06] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:06] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [00:42:06] RECOVERY - puppet last run on db1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:06] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [00:42:17] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:17] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:18] RECOVERY - puppet last run on cp1069 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:18] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:18] RECOVERY - puppet last run on wtp1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:18] RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [00:42:21] RECOVERY - puppet last run on mw1035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:31] RECOVERY - puppet last run on ms-be2013 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [00:42:32] RECOVERY - puppet last run on ms-be2002 is 
OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [00:42:32] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:42:32] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:42:33] RECOVERY - puppet last run on elastic1026 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:42:34] RECOVERY - puppet last run on magnesium is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [00:42:34] RECOVERY - puppet last run on ytterbium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:34] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [00:42:34] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [00:42:34] RECOVERY - puppet last run on wtp1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:35] RECOVERY - puppet last run on elastic1031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:41] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:41] RECOVERY - puppet last run on mw1038 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [00:42:41] RECOVERY - puppet last run on wtp1009 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [00:42:42] RECOVERY - puppet last run on search1014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:42:42] RECOVERY - puppet last run on wtp1024 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:42:42] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 
failures [00:42:42] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:42:43] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:43] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [00:42:44] RECOVERY - puppet last run on cp1070 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [00:42:44] RECOVERY - puppet last run on mw1031 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [00:42:45] RECOVERY - puppet last run on cp1043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:45] RECOVERY - puppet last run on zirconium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:42:46] RECOVERY - puppet last run on wtp1021 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:42:52] RECOVERY - puppet last run on ms-fe1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:52] RECOVERY - puppet last run on analytics1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:42:52] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [00:42:58] RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:42:59] RECOVERY - puppet last run on neptunium is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [00:43:01] RECOVERY - puppet last run on mw1115 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [00:43:02] RECOVERY - puppet last run on mw1028 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:02] RECOVERY - puppet last run on mw1178 is OK: OK: 
Puppet is currently enabled, last run 57 seconds ago with 0 failures [00:43:02] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:02] RECOVERY - puppet last run on virt1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:02] RECOVERY - puppet last run on mw1062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:02] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:03] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [00:43:06] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:06] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [00:43:07] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [00:43:12] RECOVERY - puppet last run on curium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:14] RECOVERY - puppet last run on searchidx1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [00:43:14] RECOVERY - puppet last run on elastic1028 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:23] RECOVERY - puppet last run on mc1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:26] RECOVERY - puppet last run on db1029 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:26] RECOVERY - puppet last run on es1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:26] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures 
[00:43:26] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:26] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:32] RECOVERY - puppet last run on mw1089 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:33] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:33] RECOVERY - puppet last run on mw1106 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [00:43:33] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:33] RECOVERY - puppet last run on potassium is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [00:43:33] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:33] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:34] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:34] RECOVERY - puppet last run on mw1041 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [00:43:35] RECOVERY - puppet last run on mw1036 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:35] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:36] RECOVERY - puppet last run on db2009 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [00:43:42] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:43] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently 
enabled, last run 23 seconds ago with 0 failures [00:43:47] RECOVERY - puppet last run on mw1257 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:43:49] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:43:56] RECOVERY - puppet last run on ms-be1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:56] RECOVERY - puppet last run on db2019 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [00:43:57] RECOVERY - puppet last run on amssq44 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:57] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:43:57] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:07] RECOVERY - puppet last run on mc1011 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:07] RECOVERY - puppet last run on mw1026 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [00:44:07] RECOVERY - puppet last run on mw1250 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:44:07] RECOVERY - puppet last run on mw1067 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:07] RECOVERY - puppet last run on mw1040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:08] RECOVERY - puppet last run on mw1254 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:44:08] RECOVERY - puppet last run on mc1016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:09] RECOVERY - puppet last run on mc1009 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:44:09] RECOVERY - puppet last 
run on sca1002 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [00:44:10] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:10] RECOVERY - puppet last run on db1031 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [00:44:11] RECOVERY - puppet last run on logstash1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:11] RECOVERY - puppet last run on mw1045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:12] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:12] RECOVERY - puppet last run on rdb1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:13] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:13] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:14] RECOVERY - puppet last run on db1053 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:14] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [00:44:15] RECOVERY - puppet last run on search1020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:15] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [00:44:16] RECOVERY - puppet last run on es2008 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [00:44:17] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:18] RECOVERY - puppet last run on mw1059 is OK: OK: Puppet is currently enabled, last run 57 seconds 
ago with 0 failures [00:44:18] RECOVERY - puppet last run on mw1048 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:18] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:18] RECOVERY - puppet last run on es2001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [00:44:19] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [00:44:21] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [00:44:22] RECOVERY - puppet last run on pc1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:22] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:40] RECOVERY - puppet last run on mw1063 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:40] RECOVERY - puppet last run on mw1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:44:40] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:40] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [00:44:40] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [00:44:41] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:41] RECOVERY - puppet last run on platinum is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:42] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:42] RECOVERY - puppet last run on elastic1008 is OK: OK: 
Puppet is currently enabled, last run 44 seconds ago with 0 failures [00:44:43] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [00:44:46] RECOVERY - puppet last run on mw1224 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:50] RECOVERY - puppet last run on rbf1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:51] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:51] RECOVERY - puppet last run on db1066 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [00:44:57] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:57] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [00:44:58] RECOVERY - puppet last run on amslvs2 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:58] RECOVERY - puppet last run on dbproxy1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:58] RECOVERY - puppet last run on gold is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:58] RECOVERY - puppet last run on analytics1017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:58] RECOVERY - puppet last run on mw1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:44:59] RECOVERY - puppet last run on elastic1021 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [00:45:13] RECOVERY - puppet last run on mc1002 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [00:45:13] RECOVERY - puppet last run on mw1217 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 
failures [00:45:13] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:45:13] RECOVERY - puppet last run on wtp1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:13] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [00:45:14] RECOVERY - puppet last run on db1073 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:14] RECOVERY - puppet last run on lead is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:45:15] RECOVERY - puppet last run on mw1082 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:45:15] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [00:45:16] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [00:45:17] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:17] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [00:45:26] RECOVERY - puppet last run on elastic1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:26] RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:26] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:45:26] RECOVERY - puppet last run on search1016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:45:27] RECOVERY - puppet last run on caesium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:27] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is 
currently enabled, last run 2 minutes ago with 0 failures [00:45:27] RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:28] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [00:45:36] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [00:45:40] RECOVERY - puppet last run on analytics1020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:43] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:45:43] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:45:43] RECOVERY - puppet last run on ms-be1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:43] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:45:44] RECOVERY - puppet last run on mw1088 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [00:45:44] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:47] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [00:45:47] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [00:45:47] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:45:47] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:56] RECOVERY - puppet last run on wtp1016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:45:58] 
RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [00:45:59] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:45:59] RECOVERY - puppet last run on mc1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:59] RECOVERY - puppet last run on cp1039 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:45:59] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:45:59] RECOVERY - puppet last run on analytics1041 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:07] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:07] RECOVERY - puppet last run on mw1160 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:07] RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:07] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:07] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:16] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [00:46:22] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:22] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [00:46:23] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [00:46:23] RECOVERY - puppet last run on elastic1012 is OK: OK: Puppet is 
currently enabled, last run 2 minutes ago with 0 failures [00:46:24] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:24] RECOVERY - puppet last run on elastic1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:24] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [00:46:24] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [00:46:25] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [00:46:28] RECOVERY - puppet last run on es1008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:28] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:28] RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:37] RECOVERY - puppet last run on ms-be1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:37] RECOVERY - puppet last run on db1050 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:37] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:46] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:46] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:46] RECOVERY - puppet last run on mw1242 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:46] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:46:46] 
RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:47] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:47] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:48] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:48] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:46:49] RECOVERY - puppet last run on db1022 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:49] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:50] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:50] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:51] RECOVERY - puppet last run on analytics1025 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:46:51] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:52] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:52] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:53] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:57] RECOVERY - puppet last run on elastic1030 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:46:57] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, 
last run 1 minute ago with 0 failures [00:46:57] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:57] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:58] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:46:58] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:46:58] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:47:06] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:47:09] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [00:47:09] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:47:09] RECOVERY - puppet last run on mw1150 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:47:09] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:47:18] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:47:18] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:47:18] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:47:19] RECOVERY - puppet last run on mc1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:47:19] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:47:19] RECOVERY - puppet last run on mw1129 is OK: OK: 
Puppet is currently enabled, last run 37 seconds ago with 0 failures [00:47:19] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:47:20] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:47:20] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:47:21] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:47:21] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:47:22] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:47:27] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:47:38] RECOVERY - puppet last run on pc1002 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [00:47:46] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:47:51] RECOVERY - puppet last run on db2007 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [00:47:57] RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:47:57] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [00:47:57] RECOVERY - puppet last run on elastic1022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:47:57] RECOVERY - puppet last run on search1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:48:07] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:48:08] RECOVERY 
- puppet last run on ms-fe2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:48:08] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:48:08] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:48:08] RECOVERY - puppet last run on amslvs1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:48:08] RECOVERY - puppet last run on db1042 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:48:08] RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [00:48:09] RECOVERY - puppet last run on es1007 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [00:48:09] RECOVERY - puppet last run on mc1005 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [00:48:18] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:48:18] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:48:18] RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:48:28] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:48:33] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:48:42] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:48:42] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:48:42] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 2 
minutes ago with 0 failures [00:48:52] RECOVERY - puppet last run on virt1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [00:48:52] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [00:48:53] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:48:53] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [00:48:53] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:48:53] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [00:49:02] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:49:03] RECOVERY - puppet last run on mw1151 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [00:49:03] RECOVERY - puppet last run on labnet1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:03] RECOVERY - puppet last run on mw1227 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [00:49:03] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:49:03] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:14] RECOVERY - puppet last run on virt1004 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [00:49:17] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:49:19] RECOVERY - puppet last run on labstore1001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [00:49:20] RECOVERY - puppet last run 
on db1039 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [00:49:21] RECOVERY - puppet last run on mw1039 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [00:49:21] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:21] RECOVERY - puppet last run on db2029 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [00:49:21] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:49:21] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:22] RECOVERY - puppet last run on cp1058 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:22] RECOVERY - puppet last run on mw1076 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:23] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:49:23] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:49:24] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:24] RECOVERY - puppet last run on snapshot1002 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [00:49:25] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:25] RECOVERY - puppet last run on mw1049 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [00:49:26] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [00:49:26] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 
failures [00:49:27] RECOVERY - puppet last run on mw1054 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:33] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:34] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:34] RECOVERY - puppet last run on analytics1013 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [00:49:34] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:49:42] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:49:43] RECOVERY - puppet last run on db2001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:49:52] RECOVERY - puppet last run on wtp1018 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [00:49:54] RECOVERY - puppet last run on analytics1026 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:54] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:54] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:54] RECOVERY - puppet last run on search1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:54] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:55] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [00:49:55] RECOVERY - puppet last run on analytics1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:56] RECOVERY - puppet last run on analytics1022 is OK: 
OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:56] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:49:57] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [00:50:06] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:11] RECOVERY - puppet last run on db1052 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:13] RECOVERY - puppet last run on rubidium is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [00:50:17] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [00:50:17] RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [00:50:17] RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:18] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [00:50:18] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:18] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [00:50:18] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [00:50:19] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:50:19] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:20] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures 
[00:50:20] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [00:50:21] RECOVERY - puppet last run on amssq48 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:21] RECOVERY - puppet last run on mw1149 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:22] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [00:50:22] RECOVERY - puppet last run on polonium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:23] RECOVERY - puppet last run on mw1111 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [00:50:23] RECOVERY - puppet last run on virt1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:24] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [00:50:27] RECOVERY - puppet last run on mw1055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:30] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [00:50:30] RECOVERY - puppet last run on mw1051 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:30] RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:30] RECOVERY - puppet last run on lithium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:30] RECOVERY - puppet last run on mw1004 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [00:50:31] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:31] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is 
currently enabled, last run 2 minutes ago with 0 failures [00:50:32] RECOVERY - puppet last run on db1016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:32] RECOVERY - puppet last run on mw1014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:33] RECOVERY - puppet last run on mw1247 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:33] RECOVERY - puppet last run on mw1146 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:50:34] RECOVERY - puppet last run on labcontrol2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:34] RECOVERY - puppet last run on mc1012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:35] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [00:50:38] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:50:39] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:39] RECOVERY - puppet last run on db1004 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [00:50:39] RECOVERY - puppet last run on wtp1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:39] RECOVERY - puppet last run on mw1056 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [00:50:39] RECOVERY - puppet last run on analytics1023 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [00:50:40] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:40] RECOVERY - puppet last run on wtp1012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:41] 
RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:41] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:50:42] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [00:50:47] RECOVERY - puppet last run on argon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:53] RECOVERY - puppet last run on mw1237 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:50:53] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:53] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:50:53] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:55] RECOVERY - puppet last run on analytics1016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:50:57] RECOVERY - puppet last run on search1024 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [00:50:57] RECOVERY - puppet last run on elastic1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:57] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:50:57] RECOVERY - puppet last run on mw1030 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [00:50:57] RECOVERY - puppet last run on mw1044 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:09] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [00:51:09] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently 
enabled, last run 25 seconds ago with 0 failures [00:51:09] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:09] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:51:09] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:10] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:10] RECOVERY - puppet last run on mw1162 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:11] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:11] RECOVERY - puppet last run on wtp1022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:12] RECOVERY - puppet last run on db2016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:12] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:13] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:13] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:14] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:14] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:15] RECOVERY - puppet last run on antimony is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:15] RECOVERY - puppet last run on elastic1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:17] RECOVERY - puppet 
last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:18] RECOVERY - puppet last run on amssq42 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [00:51:18] RECOVERY - puppet last run on plutonium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:19] RECOVERY - puppet last run on db1055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:27] RECOVERY - puppet last run on mw1023 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [00:51:27] RECOVERY - puppet last run on mw1156 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:27] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [00:51:27] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:27] RECOVERY - puppet last run on amssq55 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:28] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:28] RECOVERY - puppet last run on db1062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:29] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:29] RECOVERY - puppet last run on wtp1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:30] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [00:51:30] RECOVERY - puppet last run on db2037 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:31] RECOVERY - puppet last run on thallium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 
0 failures [00:51:31] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:32] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:32] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:33] RECOVERY - puppet last run on elastic1024 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:33] RECOVERY - puppet last run on mw1050 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [00:51:34] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [00:51:34] RECOVERY - puppet last run on search1015 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [00:51:48] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [00:51:48] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:48] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:48] RECOVERY - puppet last run on search1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:51:48] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:49] RECOVERY - puppet last run on rdb1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:49] RECOVERY - puppet last run on mw1159 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:50] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:50] RECOVERY - puppet last run on ms-be3001 is OK: OK: 
Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:51] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:57] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:57] RECOVERY - puppet last run on ms-be1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:57] RECOVERY - puppet last run on berkelium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:57] RECOVERY - puppet last run on db1036 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:57] RECOVERY - puppet last run on mw1079 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:58] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:51:58] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:51:59] RECOVERY - puppet last run on mw1087 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:51:59] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:52:00] RECOVERY - puppet last run on mw1084 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:00] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:01] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:52:07] RECOVERY - puppet last run on elastic1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:52:07] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:07] 
RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:52:07] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:52:18] RECOVERY - puppet last run on mw1057 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:21] RECOVERY - puppet last run on search1017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:21] RECOVERY - puppet last run on ms-be1008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:25] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:25] RECOVERY - puppet last run on snapshot1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:25] RECOVERY - puppet last run on mw1081 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:52:25] RECOVERY - puppet last run on mw1034 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:26] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:26] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:26] RECOVERY - puppet last run on mc1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:52:27] RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [00:52:27] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:28] RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:52:28] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, 
last run 52 seconds ago with 0 failures [00:52:38] RECOVERY - puppet last run on db2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:52:40] RECOVERY - puppet last run on wtp1023 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:46] !log yurik Synchronized php-1.25wmf10/extensions/ZeroPortal: updatidng ZeroPortal to master (duration: 00m 07s) [00:52:49] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [00:52:50] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:52:52] Logged the message, Master [00:53:03] RECOVERY - puppet last run on elastic1014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:07] RECOVERY - puppet last run on search1023 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:12] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:12] RECOVERY - puppet last run on mw1212 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:53:12] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:17] RECOVERY - puppet last run on cp1038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:18] RECOVERY - puppet last run on analytics1037 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:18] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:19] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:28] RECOVERY - puppet last run on virt1007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 
failures [00:53:28] RECOVERY - puppet last run on mw1074 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:39] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:39] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:58] RECOVERY - puppet last run on amslvs3 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:53:58] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:53:58] RECOVERY - puppet last run on analytics1014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:54:19] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:54:35] RECOVERY - puppet last run on hooft is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:54:38] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:55:08] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:55:48] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:56:32] RECOVERY - puppet last run on mw1121 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [01:44:42] (PS1) BryanDavis: wgHooks['SpecialVersionVersionUrl']: support alpha version [mediawiki-config] - https://gerrit.wikimedia.org/r/177151 [01:50:22] (PS1) Kaldari: Setting $wgMFEnableWikiGrokOnAllDevices to true on en Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/177153 [02:04:08] (CR) Dzahn: "14:28 <+icinga-wm> RECOVERY - LVS HTTP IPv4 on mathoid.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 301 bytes in 
0.010 second respons" [puppet] - https://gerrit.wikimedia.org/r/176942 (owner: Alexandros Kosiaris) [02:19:17] !log l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 02s) [02:19:21] !log LocalisationUpdate completed (1.25wmf9) at 2014-12-03 02:19:21+00:00 [02:19:25] Logged the message, Master [02:19:28] Logged the message, Master [02:20:44] (PS1) Springle: depool db1072 for upgrade [mediawiki-config] - https://gerrit.wikimedia.org/r/177156 [02:21:16] (CR) Springle: [C: 2] depool db1072 for upgrade [mediawiki-config] - https://gerrit.wikimedia.org/r/177156 (owner: Springle) [02:21:21] (PS1) BryanDavis: beta: Fix unset $lang from MWMultiVersion::setSiteInfoForWiki() [mediawiki-config] - https://gerrit.wikimedia.org/r/177157 [02:22:17] !log springle Synchronized wmf-config/db-eqiad.php: depool db1072 (duration: 00m 08s) [02:22:21] Logged the message, Master [02:32:26] !log l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s) [02:32:29] !log LocalisationUpdate completed (1.25wmf10) at 2014-12-03 02:32:29+00:00 [02:32:33] Logged the message, Master [02:32:35] Logged the message, Master [02:41:00] (CR) Reedy: beta: Fix unset $lang from MWMultiVersion::setSiteInfoForWiki() (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/177157 (owner: BryanDavis) [03:13:02] !log upgrade db1072 trusty [03:13:05] Logged the message, Master [03:21:53] PROBLEM - SSH on lvs1004 is CRITICAL: Server answer: [03:23:35] huh? 
[03:24:30] lvs1004 is fine, must be neon with issues [03:32:08] (PS1) EBernhardson: Enable flow on officewiki NS_PROJECT_TALK [mediawiki-config] - https://gerrit.wikimedia.org/r/177166 [03:33:13] RECOVERY - SSH on lvs1004 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [03:36:39] (PS1) Springle: repool db1072 [mediawiki-config] - https://gerrit.wikimedia.org/r/177169 [03:37:17] (CR) Springle: [C: 2] repool db1072 [mediawiki-config] - https://gerrit.wikimedia.org/r/177169 (owner: Springle) [03:37:25] (Merged) jenkins-bot: repool db1072 [mediawiki-config] - https://gerrit.wikimedia.org/r/177169 (owner: Springle) [03:40:03] !log springle Synchronized wmf-config/db-eqiad.php: repool db1072, warm up (duration: 00m 10s) [03:40:07] Logged the message, Master [03:48:38] (CR) MZMcBride: "Thank you! <3" [puppet] - https://gerrit.wikimedia.org/r/177128 (owner: Krinkle) [03:58:58] (CR) Gage: [C: 2] logstash: Rules for processing MW input via Redis [puppet] - https://gerrit.wikimedia.org/r/175896 (owner: BryanDavis) [04:22:41] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Dec 3 04:22:41 UTC 2014 (duration 22m 40s) [04:22:47] Logged the message, Master [04:35:59] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: Puppet has 1 failures [04:51:15] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: Puppet has 1 failures [04:55:56] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures [04:56:10] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [04:56:11] PROBLEM - puppet last run on db1051 is CRITICAL: CRITICAL: Puppet has 1 failures [04:56:32] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Puppet has 1 failures [04:57:31] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: Puppet has 1 failures [05:05:30] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 1 minute ago 
with 0 failures [05:07:24] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:07:40] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:07:46] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [05:08:02] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:09:02] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [05:09:43] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:26:19] PROBLEM - puppet last run on radon is CRITICAL: CRITICAL: Puppet has 1 failures [05:26:26] PROBLEM - puppet last run on virt1005 is CRITICAL: CRITICAL: Puppet has 1 failures [05:26:52] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 1 failures [05:28:33] PROBLEM - puppet last run on es2006 is CRITICAL: CRITICAL: Puppet has 1 failures [05:28:33] PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: Puppet has 1 failures [05:28:33] PROBLEM - puppet last run on mc1015 is CRITICAL: CRITICAL: Puppet has 1 failures [05:28:33] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: Puppet has 1 failures [05:28:33] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: Puppet has 1 failures [05:28:33] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: Puppet has 1 failures [05:28:33] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: Puppet has 1 failures [05:28:33] PROBLEM - puppet last run on tmh1001 is CRITICAL: CRITICAL: Puppet has 1 failures [05:36:01] RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [05:36:23] RECOVERY - puppet last run on analytics1031 is OK: OK: 
Puppet is currently enabled, last run 11 seconds ago with 0 failures [05:36:42] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [05:36:53] RECOVERY - puppet last run on tmh1001 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [05:37:42] RECOVERY - puppet last run on radon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:38:13] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:38:57] RECOVERY - puppet last run on es2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:38:57] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 5 below the confidence bounds [05:39:23] RECOVERY - puppet last run on mc1015 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:39:54] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:39:59] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:40:46] RECOVERY - puppet last run on virt1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:30:55] !log Reloading Zuul to deploy If499fe06e0392f4046f97f5633c08ba442649ec5 [06:31:03] Logged the message, Master [06:32:59] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: puppet fail [06:35:15] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:36] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 3 failures [06:36:37] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures [06:41:02] !log springle Synchronized wmf-config/db-eqiad.php: db1072 full load (duration: 00m 06s) [06:41:05] Logged the message, Master [06:45:48] 
RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [06:46:39] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [06:47:30] PROBLEM - puppet last run on praseodymium is CRITICAL: CRITICAL: Puppet last ran 14 hours ago [06:48:39] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:48:52] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: Puppet last ran 23 hours ago [06:49:41] RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [06:50:30] PROBLEM - puppet last run on lanthanum is CRITICAL: CRITICAL: Puppet has 1 failures [06:50:50] PROBLEM - Apache HTTP on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:52:11] PROBLEM - HHVM rendering on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:53:56] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:55] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 1 failures [06:59:34] RECOVERY - puppet last run on lanthanum is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:04:03] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:13:12] (CR) Giuseppe Lavagetto: [C: 1] logstash: Forward syslog events for apache2 + hhvm [puppet] - https://gerrit.wikimedia.org/r/176693 (owner: BryanDavis) [07:21:33] (PS1) Hoo man: Enable displayStatementsOnProperties for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/177182 [07:21:35] (PS1) Hoo man: Simplify Wikibase configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/177183 [07:45:44] (CR) Hoo man: "Please note that the setting is true per default, so 
just removing it will work as intended." [mediawiki-config] - https://gerrit.wikimedia.org/r/177182 (owner: Hoo man) [07:59:14] !log depooling mw1183 for re-imaging [07:59:19] Logged the message, Master [08:02:09] !log depooling mw1182 for re-imaging [08:02:10] Logged the message, Master [08:03:39] PROBLEM - Host mw1182 is DOWN: PING CRITICAL - Packet loss = 100% [08:03:50] I scheduled downtime, I thought. [08:10:32] RECOVERY - Host mw1182 is UP: PING OK - Packet loss = 0%, RTA = 12.60 ms [08:13:59] !log depooling mw1181 for re-imaging [08:14:03] Logged the message, Master [08:19:44] !log depooling mw1180 for re-imaging [08:19:48] Logged the message, Master [08:24:11] !log depooling mw1179 for re-imaging [08:24:14] Logged the message, Master [08:27:00] <_joe_> !log depooling mw1081-1087 [08:27:02] Logged the message, Master [08:27:36] !log depooling mw1178 for re-imaging [08:27:38] Logged the message, Master [08:30:15] !log depooling mw1174-mw1176 [08:30:19] Logged the message, Master [08:35:27] PROBLEM - Host mw1174 is DOWN: PING CRITICAL - Packet loss = 100% [08:57:37] (CR) Giuseppe Lavagetto: [C: -1] "Use class parameters specific to the class where they're used." (5 comments) [puppet] - https://gerrit.wikimedia.org/r/176191 (owner: BryanDavis) [09:01:28] !log repooling mw1182 [09:01:33] Logged the message, Master [09:08:03] PROBLEM - puppet last run on mw1178 is CRITICAL: CRITICAL: Puppet has 102 failures [09:08:03] PROBLEM - puppet last run on mw1174 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:08:12] PROBLEM - check configured eth on mw1086 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:08:15] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: Puppet has 102 failures [09:08:22] PROBLEM - DPKG on mw1174 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[09:08:43] PROBLEM - check if dhclient is running on mw1086 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:08:52] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 102 failures [09:08:52] PROBLEM - Disk space on mw1174 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:09:25] PROBLEM - check if salt-minion is running on mw1086 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:10:07] PROBLEM - nutcracker port on mw1086 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:10:07] PROBLEM - DPKG on mw1083 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:10:17] PROBLEM - nutcracker process on mw1086 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:10:29] PROBLEM - puppet last run on mw1086 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:10:35] sigh, 2h not enough, apparently [09:10:36] PROBLEM - DPKG on mw1086 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:10:37] oh [09:10:37] wait [09:10:40] these aren't mine [09:10:56] PROBLEM - Disk space on mw1086 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:11:06] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: Puppet has 102 failures [09:11:09] _joe_: are you re-imaging 108x? [09:11:26] PROBLEM - HHVM processes on mw1086 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [09:11:49] RECOVERY - Disk space on mw1174 is OK: DISK OK [09:12:17] PROBLEM - puppet last run on mw1082 is CRITICAL: CRITICAL: Puppet has 102 failures [09:12:47] PROBLEM - RAID on mw1086 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[09:13:08] PROBLEM - puppet last run on mw1084 is CRITICAL: CRITICAL: Puppet has 102 failures [09:15:56] RECOVERY - DPKG on mw1083 is OK: All packages OK [09:15:58] PROBLEM - puppet last run on mw1085 is CRITICAL: CRITICAL: Puppet has 102 failures [09:16:12] <_joe_> YuviPanda: yes [09:16:16] PROBLEM - puppet last run on mw1087 is CRITICAL: CRITICAL: Puppet has 102 failures [09:16:17] ok [09:16:35] <_joe_> and it's not "2h not enough" [09:16:54] <_joe_> it's that these servers get yanked from icinga when you clean their puppet facts [09:16:56] RECOVERY - Disk space on mw1086 is OK: DISK OK [09:16:57] RECOVERY - check configured eth on mw1086 is OK: NRPE: Unable to read output [09:17:33] RECOVERY - HHVM processes on mw1086 is OK: PROCS OK: 1 process with command name hhvm [09:17:34] RECOVERY - check if dhclient is running on mw1086 is OK: PROCS OK: 0 processes with command name dhclient [09:17:39] _joe_: aah, right. so there's going to be alerts anyway. [09:17:43] when they come back [09:17:56] RECOVERY - check if salt-minion is running on mw1086 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [09:18:27] RECOVERY - RAID on mw1086 is OK: OK: no RAID installed [09:18:49] RECOVERY - nutcracker port on mw1086 is OK: TCP OK - 0.000 second response time on port 11212 [09:18:56] RECOVERY - nutcracker process on mw1086 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [09:20:07] RECOVERY - DPKG on mw1174 is OK: All packages OK [09:20:08] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:22:56] RECOVERY - puppet last run on mw1178 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:24:24] !log repooled mw1174-6,8,9, 80,81 [09:24:30] Logged the message, Master [09:24:35] !log depooled mw1045 for _joe_ [09:24:38] Logged the message, Master [09:25:17] RECOVERY - DPKG on mw1086 is OK: All packages OK [09:25:33] RECOVERY 
- puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:26:21] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:28:17] PROBLEM - Host mw1045 is DOWN: PING CRITICAL - Packet loss = 100% [09:28:37] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:30:38] RECOVERY - puppet last run on mw1084 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [09:31:05] RECOVERY - puppet last run on mw1087 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [09:31:14] <_joe_> !log repooled mw1081-1087 [09:31:18] Logged the message, Master [09:31:57] RECOVERY - Host mw1045 is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms [09:36:47] RECOVERY - puppet last run on mw1086 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:39:17] <_joe_> !log repooling mw1045, depooling 1088-1094 [09:39:22] Logged the message, Master [09:39:37] RECOVERY - puppet last run on mw1085 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:45:00] RECOVERY - puppet last run on mw1082 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [10:01:21] (CR) Filippo Giunchedi: "minor comment, the rest LGTM." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/176334 (owner: Giuseppe Lavagetto) [10:06:52] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [10:07:47] (CR) Giuseppe Lavagetto: hiera: role-based backend, role keyword (1 comment) [puppet] - https://gerrit.wikimedia.org/r/176334 (owner: Giuseppe Lavagetto) [10:08:10] (CR) Giuseppe Lavagetto: "I will add tests as well, it makes sense." 
[puppet] - https://gerrit.wikimedia.org/r/176334 (owner: Giuseppe Lavagetto) [10:12:47] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [10:19:20] bleh yahoo is still soft-rejecting mail from lists.wikimedia.org. queue on sodium is now 40k. [10:21:25] <_joe_> yahoo dmarc policy is a PITA [10:24:06] Yahoo bounces started before the new DMARC policy, IIRC [10:25:33] See https://old-bugzilla.wikimedia.org/showdependencytree.cgi?id=56414&hide_resolved=1 , which might be correct or not [10:27:19] PROBLEM - check if dhclient is running on mw1088 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:27:59] PROBLEM - check if salt-minion is running on mw1088 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:28:00] PROBLEM - check configured eth on mw1090 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:28:00] PROBLEM - RAID on mw1092 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:28:25] PROBLEM - check if dhclient is running on mw1090 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:28:26] PROBLEM - RAID on mw1093 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:28:30] PROBLEM - nutcracker port on mw1088 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:28:30] PROBLEM - check if salt-minion is running on mw1090 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:28:40] PROBLEM - check configured eth on mw1092 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:28:46] <_joe_> grrr [10:29:20] PROBLEM - check configured eth on mw1093 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:29:20] PROBLEM - check if dhclient is running on mw1092 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[10:29:30] PROBLEM - check if dhclient is running on mw1093 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:30:51] RECOVERY - RAID on mw1092 is OK: OK: no RAID installed [10:31:33] RECOVERY - nutcracker port on mw1088 is OK: TCP OK - 0.000 second response time on port 11212 [10:31:33] RECOVERY - check if salt-minion is running on mw1090 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:31:33] RECOVERY - check configured eth on mw1092 is OK: NRPE: Unable to read output [10:32:04] RECOVERY - check if dhclient is running on mw1092 is OK: PROCS OK: 0 processes with command name dhclient [10:33:03] RECOVERY - check if dhclient is running on mw1088 is OK: PROCS OK: 0 processes with command name dhclient [10:33:42] RECOVERY - check if salt-minion is running on mw1088 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:33:42] RECOVERY - check configured eth on mw1090 is OK: NRPE: Unable to read output [10:34:14] RECOVERY - check if dhclient is running on mw1090 is OK: PROCS OK: 0 processes with command name dhclient [10:34:14] RECOVERY - RAID on mw1093 is OK: OK: no RAID installed [10:35:07] RECOVERY - check configured eth on mw1093 is OK: NRPE: Unable to read output [10:35:22] RECOVERY - check if dhclient is running on mw1093 is OK: PROCS OK: 0 processes with command name dhclient [10:37:40] godog: what is the file limit for swift ? [10:38:22] matanya: 5GB by default IIRC [10:40:07] godog: trying to upload 4.7 GB file, and fails with "file too large" is that a wiki software limit or swift limit ? [10:42:14] matanya: good question, I don't know offhand. Reedy might I believe since he uploaded wikimania material in the past (?) [10:42:35] 4.7 is disappointing... if it was 4G I could go around and ramble about 32-bit... what I am supposed to do with 4.7 ??? 
:-( [10:42:36] i have been try to exactly that :) [10:42:53] +1 akosiaris :D [10:43:20] maybe the multimedia guyes would know [10:43:20] 32.something bits! [10:43:37] btw godog you mean 5GB and not 5GiB right ? [10:43:40] PROBLEM - RAID on virt1005 is CRITICAL: CRITICAL: Active: 14, Working: 14, Failed: 2, Spare: 0 [10:43:53] 2 failed disks ??? [10:43:56] that can't be good [10:44:19] UU_UUUUU [10:44:20] weird [10:44:43] oh two arrays sharing the same set of disks [10:44:48] scary message [10:44:54] akosiaris: haha I have no idea, chances are that nor the people that wrote it had [10:45:12] jgage: ok that makes sense [10:45:14] thnaks [10:45:16] thanks* [10:46:05] godog: well if the limit is 5GiB and matanya is trying to upload 4.7GB, we just found out why [10:46:24] if not ... wild goose chase :-( [10:46:58] I guess i'll just split the file and see what happens [10:47:21] true that, I'm sure though that commons has files bigger than 4.7G, how they got there I have no idea [10:48:33] server side upload, i guess [10:52:57] <_joe_> !log repooling mw1088-1094, depooling mw1095-1100 [10:53:01] Logged the message, Master [10:53:36] <_joe_> akosiaris: I've seen you marked a few servers for reimaging, are you going to do those today? 
else I'd reimage them today [10:53:48] <_joe_> akosiaris: also, mw1054 and mw1055 are the ganglia aggregators [10:53:59] <_joe_> so don't do them at the same time [10:54:43] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: Puppet has 1 failures [10:54:53] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 1 failures [10:55:04] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [10:55:48] (CR) Alexandros Kosiaris: [C: 1] facilities: move to module [puppet] - https://gerrit.wikimedia.org/r/176863 (owner: Dzahn) [10:55:57] <_joe_> \o/ [10:56:11] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Puppet has 1 failures [10:56:12] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: Puppet has 1 failures [10:56:20] _joe_: yeah, I am gonna do them today, but thanks for the heads up [10:56:27] PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: Puppet has 1 failures [10:56:28] I might have missed it [10:56:35] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Puppet has 1 failures [10:56:36] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Puppet has 1 failures [10:56:36] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures [10:56:40] apt ? 
[10:56:47] <_joe_> lemme look [10:56:54] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [10:56:54] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 1 failures [10:57:36] yeah [10:57:39] message: "Command exceeded timeout" [10:57:40] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures [10:57:43] for apt-get update [10:58:04] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [10:58:05] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures [10:58:19] <_joe_> yep [11:00:03] finding right second to cut over ssh, ugh, ouch, pain [11:05:56] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [11:05:56] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [11:06:44] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:48] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [11:06:54] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:55] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:48] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:07:54] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:12] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:25] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:08:28] RECOVERY - puppet last run 
on ruthenium is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [11:08:28] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:09:14] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [11:09:22] <_joe_> godog: testing of hiera backends looks painful, but I'll try [11:09:24] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:09:24] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:10:00] _joe_: yeah no idea how to do that, even just regression testing would be useful I think [11:10:29] <_joe_> godog: I know, studying it, I know exactly zero about ruby testing [11:10:49] sweet [11:16:32] (CR) JanZerebecki: [C: 1] "It was confirmed: https://ru.wikinews.org/w/index.php?title=%D0%92%D0%B8%D0%BA%D0%B8%D0%BD%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8%3A%D0%A4%D0" [puppet] - https://gerrit.wikimedia.org/r/173078 (owner: JanZerebecki) [11:24:47] !log reimaging mw1054-mw1059 [11:24:49] Logged the message, Master [11:26:29] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: puppet fail [11:31:58] PROBLEM - Disk space on mw1058 is CRITICAL: Connection refused by host [11:32:00] PROBLEM - Apache HTTP on mw1058 is CRITICAL: Connection refused [11:32:19] PROBLEM - nutcracker process on mw1058 is CRITICAL: Connection refused by host [11:32:38] PROBLEM - puppet last run on mw1055 is CRITICAL: Connection refused by host [11:33:09] PROBLEM - DPKG on mw1055 is CRITICAL: Connection refused by host [11:33:09] PROBLEM - Disk space on mw1055 is CRITICAL: Connection refused by host [11:33:10] PROBLEM - SSH on mw1055 is CRITICAL: Connection refused [11:33:18] godog: no joy even after splitting [11:33:58] PROBLEM - RAID on mw1055 is CRITICAL: Connection refused by host [11:34:08] PROBLEM - check if dhclient is 
running on mw1055 is CRITICAL: Connection refused by host [11:34:28] PROBLEM - Disk space on mw1059 is CRITICAL: Connection refused by host [11:34:28] PROBLEM - check if salt-minion is running on mw1055 is CRITICAL: Connection refused by host [11:34:38] PROBLEM - nutcracker process on mw1059 is CRITICAL: Connection refused by host [11:34:50] PROBLEM - nutcracker port on mw1055 is CRITICAL: Connection refused by host [11:34:52] PROBLEM - Apache HTTP on mw1055 is CRITICAL: Connection refused [11:34:52] PROBLEM - puppet last run on mw1059 is CRITICAL: Connection refused by host [11:35:21] PROBLEM - check configured eth on mw1055 is CRITICAL: Connection refused by host [11:35:27] PROBLEM - nutcracker process on mw1055 is CRITICAL: Connection refused by host [11:35:27] PROBLEM - RAID on mw1059 is CRITICAL: Connection refused by host [11:35:27] PROBLEM - SSH on mw1059 is CRITICAL: Connection refused [11:35:28] PROBLEM - check configured eth on mw1059 is CRITICAL: Connection refused by host [11:35:37] matanya: what's a filename I can look for? 
[11:35:48] PROBLEM - SSH on mw1058 is CRITICAL: Connection refused [11:35:59] PROBLEM - check configured eth on mw1058 is CRITICAL: Connection refused by host [11:36:07] Evaluation_I_Metrics_p1-001.webm [11:36:12] PROBLEM - check if dhclient is running on mw1058 is CRITICAL: Connection refused by host [11:36:12] PROBLEM - nutcracker port on mw1058 is CRITICAL: Connection refused by host [11:36:12] PROBLEM - check if salt-minion is running on mw1058 is CRITICAL: Connection refused by host [11:36:48] PROBLEM - puppet last run on mw1058 is CRITICAL: Connection refused by host [11:36:51] PROBLEM - check if dhclient is running on mw1059 is CRITICAL: Connection refused by host [11:36:51] PROBLEM - nutcracker port on mw1059 is CRITICAL: Connection refused by host [11:36:59] PROBLEM - Apache HTTP on mw1059 is CRITICAL: Connection refused [11:37:00] PROBLEM - DPKG on mw1059 is CRITICAL: Connection refused by host [11:37:09] PROBLEM - RAID on mw1058 is CRITICAL: Connection refused by host [11:37:10] PROBLEM - DPKG on mw1058 is CRITICAL: Connection refused by host [11:37:19] PROBLEM - check if salt-minion is running on mw1059 is CRITICAL: Connection refused by host [11:37:49] matanya: mhh doesn't look like it is making it to swift, perhaps it is limited in MW [11:38:01] thanks [11:39:29] PROBLEM - Apache HTTP on mw1057 is CRITICAL: Connection refused [11:39:29] PROBLEM - nutcracker process on mw1057 is CRITICAL: Connection refused by host [11:40:16] yes godog 1GB [11:40:21] found it [11:43:34] PROBLEM - RAID on mw1057 is CRITICAL: Connection refused by host [11:43:44] PROBLEM - SSH on mw1057 is CRITICAL: Connection refused [11:44:04] PROBLEM - check configured eth on mw1057 is CRITICAL: Connection refused by host [11:44:22] PROBLEM - check if dhclient is running on mw1057 is CRITICAL: Connection refused by host [11:44:34] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:44:37] PROBLEM - check if salt-minion 
is running on mw1057 is CRITICAL: Connection refused by host [11:44:47] PROBLEM - nutcracker port on mw1057 is CRITICAL: Connection refused by host [11:45:03] PROBLEM - puppet last run on mw1057 is CRITICAL: Connection refused by host [11:45:53] 1GB ? [11:45:55] sigh [11:46:23] RECOVERY - SSH on mw1059 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [11:46:42] matanya: btw what browser? this always comes to mind when I hear about browser upload problems: http://www.motobit.com/help/scptutl/pa98.htm [11:46:53] PROBLEM - DPKG on mw1057 is CRITICAL: Connection refused by host [11:46:57] outdated these days but it was very helpful some years ago ... [11:47:03] akosiaris: not browser side, mw restriction [11:47:11] https://commons.wikimedia.org/wiki/Commons:Maximum_file_size [11:47:12] PROBLEM - Disk space on mw1057 is CRITICAL: Connection refused by host [11:47:13] (03PS2) 10Nemo bis: Graph User::pingLimiter() actions in gdash [puppet] - 10https://gerrit.wikimedia.org/r/166511 (https://bugzilla.wikimedia.org/65478) [11:47:28] matanya: yeah, I got that... just curious and reminiscent of the old (bad) days [11:47:57] i have uploaded a 1.1 TB file using ff a few weeks ago [11:48:55] anyway, it is ff [11:49:52] but it did try to upload the 4.7 file. I think old ff would not even try [11:50:36] there... proof things are getting better. Take that old guys at Muppet show!!!! [11:50:55] and let's use wp to remind myself of their names... [11:51:34] Statler and Waldorf :) [11:52:08] RECOVERY - SSH on mw1057 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [12:00:44] godog: akosiaris once all done i'll file a server side upload bug [12:01:05] likely to take about a month from now [12:03:05] for the bug? 
[12:04:09] yes, encoding those sizes with the hardware i have takes days [12:06:45] RECOVERY - Apache HTTP on mw1057 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.002 second response time [12:09:16] I'm sure we can find some cpu too if needed [12:09:44] RECOVERY - nutcracker process on mw1057 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [12:09:58] RECOVERY - check configured eth on mw1057 is OK: NRPE: Unable to read output [12:10:10] RECOVERY - Disk space on mw1057 is OK: DISK OK [12:10:18] RECOVERY - check if dhclient is running on mw1057 is OK: PROCS OK: 0 processes with command name dhclient [12:10:22] RECOVERY - check if salt-minion is running on mw1057 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:10:28] RECOVERY - nutcracker port on mw1057 is OK: TCP OK - 0.000 second response time on port 11212 [12:11:40] RECOVERY - RAID on mw1057 is OK: OK: no RAID installed [12:11:53] PROBLEM - NTP on mw1059 is CRITICAL: NTP CRITICAL: No response from NTP server [12:16:24] godog: did some math: it takes ~6h per GB, i have 1.5 TB, so 1500GB *6h = 9000h = 375 days = over a year :) [12:17:22] matanya: yeah that's not going to work [12:18:27] RECOVERY - DPKG on mw1057 is OK: All packages OK [12:19:20] PROBLEM - puppet last run on mw1057 is CRITICAL: CRITICAL: Puppet has 102 failures [12:21:48] RECOVERY - Apache HTTP on mw1059 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.008 second response time [12:24:04] RECOVERY - NTP on mw1059 is OK: NTP OK: Offset -0.06999111176 secs [12:24:23] RECOVERY - RAID on mw1059 is OK: OK: no RAID installed [12:24:34] RECOVERY - nutcracker process on mw1059 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [12:24:54] RECOVERY - DPKG on mw1059 is OK: All packages OK [12:25:03] RECOVERY - Disk space on mw1059 is OK: DISK OK [12:26:13] RECOVERY - check configured eth on mw1059 is OK: NRPE: Unable to read output [12:26:33] PROBLEM - puppet 
last run on mw1055 is CRITICAL: CRITICAL: Puppet has 102 failures [12:26:35] PROBLEM - puppet last run on mw1058 is CRITICAL: Connection refused by host [12:26:36] RECOVERY - check if dhclient is running on mw1059 is OK: PROCS OK: 0 processes with command name dhclient [12:26:44] RECOVERY - check if salt-minion is running on mw1059 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:26:53] PROBLEM - DPKG on mw1058 is CRITICAL: Connection refused by host [12:26:57] PROBLEM - puppet last run on mw1056 is CRITICAL: CRITICAL: Puppet has 102 failures [12:27:05] RECOVERY - nutcracker port on mw1059 is OK: TCP OK - 0.000 second response time on port 11212 [12:27:34] PROBLEM - Disk space on mw1058 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [12:30:23] RECOVERY - Disk space on mw1058 is OK: DISK OK [12:33:13] PROBLEM - puppet last run on mw1059 is CRITICAL: CRITICAL: Puppet has 102 failures [12:37:07] PROBLEM - HHVM rendering on mw1055 is CRITICAL: Connection refused [12:37:58] PROBLEM - puppet last run on mw1058 is CRITICAL: CRITICAL: Puppet has 102 failures [12:38:34] RECOVERY - DPKG on mw1058 is OK: All packages OK [12:40:24] PROBLEM - HHVM rendering on mw1056 is CRITICAL: Connection refused [12:40:34] PROBLEM - HHVM rendering on mw1057 is CRITICAL: Connection refused [12:43:53] PROBLEM - Apache HTTP on mw1055 is CRITICAL: Connection refused [12:47:17] PROBLEM - Apache HTTP on mw1056 is CRITICAL: Connection refused [12:47:54] PROBLEM - Apache HTTP on mw1057 is CRITICAL: Connection refused [12:49:00] RECOVERY - HHVM rendering on mw1055 is OK: HTTP OK: HTTP/1.1 200 OK - 69282 bytes in 4.871 second response time [12:49:36] RECOVERY - puppet last run on mw1055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:49:36] RECOVERY - Apache HTTP on mw1055 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.156 second response time [12:51:16] RECOVERY - puppet last run on mw1057 is 
OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [12:51:57] RECOVERY - HHVM rendering on mw1056 is OK: HTTP OK: HTTP/1.1 200 OK - 69281 bytes in 0.311 second response time [12:52:18] RECOVERY - HHVM rendering on mw1057 is OK: HTTP OK: HTTP/1.1 200 OK - 69282 bytes in 3.762 second response time [12:52:57] RECOVERY - Apache HTTP on mw1056 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.314 second response time [12:53:06] RECOVERY - puppet last run on mw1056 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:53:37] RECOVERY - Apache HTTP on mw1057 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.129 second response time [12:54:47] PROBLEM - HHVM rendering on mw1059 is CRITICAL: Connection refused [12:56:26] PROBLEM - puppet last run on analytics1013 is CRITICAL: CRITICAL: Puppet has 1 failures [12:58:56] PROBLEM - Apache HTTP on mw1059 is CRITICAL: Connection refused [13:03:40] RECOVERY - HHVM rendering on mw1059 is OK: HTTP OK: HTTP/1.1 200 OK - 69281 bytes in 0.252 second response time [13:04:44] RECOVERY - Apache HTTP on mw1059 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 6.396 second response time [13:05:30] RECOVERY - puppet last run on mw1059 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [13:07:13] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:31] !log depool mw1177 for hhvm re-imaging [13:10:35] Logged the message, Master [13:11:04] RECOVERY - puppet last run on analytics1013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:09] !log depool mw1170-3 for re-imaging [13:13:11] Logged the message, Master [13:16:55] PROBLEM - Host mw1172 is DOWN: PING CRITICAL - Packet loss = 100% [13:16:55] PROBLEM - Host mw1171 is DOWN: PING CRITICAL - Packet loss = 100% [13:40:11] PROBLEM - Host mw1169 is DOWN: PING CRITICAL - 
Packet loss = 100% [13:43:09] RECOVERY - Host mw1169 is UP: PING OK - Packet loss = 0%, RTA = 1.79 ms [13:46:21] PROBLEM - RAID on mw1171 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [13:47:22] PROBLEM - check configured eth on mw1171 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [13:48:04] PROBLEM - check if dhclient is running on mw1171 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [13:48:12] PROBLEM - check if salt-minion is running on mw1171 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [13:48:53] PROBLEM - nutcracker port on mw1171 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [13:48:54] PROBLEM - nutcracker process on mw1171 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [13:49:01] PROBLEM - puppet last run on mw1171 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [13:49:03] !log depooling mw1168 for re-imaging [13:49:07] Logged the message, Master [13:49:21] PROBLEM - DPKG on mw1171 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [13:49:42] PROBLEM - Disk space on mw1171 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [13:49:42] PROBLEM - DPKG on mw1173 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:50:07] PROBLEM - HHVM processes on mw1171 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[13:51:02] RECOVERY - check if dhclient is running on mw1171 is OK: PROCS OK: 0 processes with command name dhclient [13:51:10] !log depooling mw1167 for re-imaging [13:51:12] Logged the message, Master [13:51:13] RECOVERY - check if salt-minion is running on mw1171 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:51:48] RECOVERY - nutcracker port on mw1171 is OK: TCP OK - 0.000 second response time on port 11212 [13:51:50] RECOVERY - nutcracker process on mw1171 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [13:52:22] RECOVERY - RAID on mw1171 is OK: OK: no RAID installed [13:52:45] RECOVERY - Disk space on mw1171 is OK: DISK OK [13:53:24] RECOVERY - HHVM processes on mw1171 is OK: PROCS OK: 1 process with command name hhvm [13:53:34] RECOVERY - check configured eth on mw1171 is OK: NRPE: Unable to read output [13:55:23] PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: Puppet has 102 failures [13:55:43] RECOVERY - DPKG on mw1173 is OK: All packages OK [13:56:35] PROBLEM - puppet last run on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:56:43] PROBLEM - DPKG on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:56:48] hmm, rhenium? [13:57:03] PROBLEM - Disk space on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:57:48] rhenium's down, and I've no idea what rhenium was [13:57:54] * YuviPanda checks icinga [13:58:25] RECOVERY - DPKG on mw1171 is OK: All packages OK [13:58:37] PROBLEM - SSH on rhenium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:37] PROBLEM - RAID on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:59:03] PROBLEM - check if dhclient is running on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:59:15] PROBLEM - check if salt-minion is running on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:59:23] PROBLEM - check configured eth on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:03:49] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: Puppet has 1 failures [14:07:04] <_joe_> !log depooling mw1101-mw1107 [14:07:08] Logged the message, Master [14:07:44] _joe_: at this rate we'll finish everything in the next few hours :) [14:08:06] <_joe_> YuviPanda: we still have API left [14:08:07] <_joe_> :) [14:08:15] the app servers, at least [14:12:29] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:23:18] (03CR) 10QChris: [C: 04-1] "This would break the link between phabricator and bugzilla." [puppet] - 10https://gerrit.wikimedia.org/r/177128 (owner: 10Krinkle) [14:26:27] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [14:27:17] PROBLEM - DPKG on mw1168 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:23] !log reimaging mw1054 [14:28:28] Logged the message, Master [14:29:48] PROBLEM - check if salt-minion is running on mw1054 is CRITICAL: Connection refused by host [14:30:07] PROBLEM - nutcracker port on mw1054 is CRITICAL: Connection refused by host [14:30:27] PROBLEM - puppet last run on mw1054 is CRITICAL: Connection refused by host [14:30:37] PROBLEM - Disk space on mw1054 is CRITICAL: Connection refused by host [14:31:07] PROBLEM - NTP on rhenium is CRITICAL: NTP CRITICAL: No response from NTP server [14:31:36] PROBLEM - RAID on mw1054 is CRITICAL: Connection refused by host [14:31:37] PROBLEM - Apache HTTP on mw1054 is CRITICAL: Connection refused [14:31:46] PROBLEM - SSH on mw1054 is CRITICAL: Connection refused [14:31:47] PROBLEM - nutcracker process on mw1054 is CRITICAL: Connection refused by host [14:32:01] PROBLEM - check configured eth on mw1054 is CRITICAL: Connection refused by host [14:32:18] PROBLEM - check if dhclient is running on mw1054 is CRITICAL: 
Connection refused by host [14:32:28] PROBLEM - DPKG on mw1054 is CRITICAL: Connection refused by host [14:34:07] akosiaris: graphite is tcp? [14:34:17] ottomata: both tcp and udp [14:34:26] hm, i think i need tc [14:34:27] tcp [14:34:39] kind of different services depending on transport and port [14:34:47] oh [14:34:52] hm [14:34:54] http://graphite.readthedocs.org/en/latest/carbon-daemons.html [14:35:03] The [cache] section tells carbon-cache.py what ports (2003/2004/7002), protocols (newline delimited, pickle) and transports (TCP/UDP) to listen on. [14:35:04] i was testing according to this doc: https://wikitech.wikimedia.org/wiki/Graphite [14:35:06] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Puppet has 1 failures [14:35:17] and also i see timeouts from my graphite sender trying to connect [14:35:24] ok so you need tcp too [14:35:34] lemme fix that [14:35:47] RECOVERY - DPKG on mw1168 is OK: All packages OK [14:36:00] akosiaris: ottomata I don't think our graphite instance lets you actually send directly to graphite. Think you've to go through statsd [14:36:02] k danke [14:36:07] oh, hm [14:36:11] i can do statsd [14:36:15] it is just as easy [14:36:19] (that is udp and already open?) [14:36:25] oh [14:36:28] no, it listens too [14:36:34] i mean, graphite does too. [14:36:34] i really don't know this system [14:36:38] which is better? [14:36:48] ottomata: just send it to statsd, since that's what most of our other services do [14:36:51] ok [14:37:12] akosiaris: still, go ahead and open that, if you don't mind [14:37:15] should be open anyway [14:37:53] YuviPanda: that is udp 8125? [14:38:05] yeah, think so [14:38:21] tungsten? [14:38:24] or graphite1001? 
[14:38:28] tungsten [14:38:34] k [14:39:23] we also have a statsd.eqiad.wmnet cname if you need that [14:40:19] oh, ok, that sounds nicer [14:40:21] i'll use that [14:44:35] (03PS1) 10Ottomata: Add statsd parameters to kafka::server::jmxtrans [puppet/kafka] - 10https://gerrit.wikimedia.org/r/177218 [14:44:59] (03CR) 10Ottomata: [C: 032] Add statsd parameters to kafka::server::jmxtrans [puppet/kafka] - 10https://gerrit.wikimedia.org/r/177218 (owner: 10Ottomata) [14:45:42] PROBLEM - check if dhclient is running on mw1103 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:45:43] PROBLEM - check configured eth on mw1104 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:45:43] PROBLEM - check if dhclient is running on mw1107 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:46:13] PROBLEM - check if salt-minion is running on mw1103 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:46:13] PROBLEM - check if dhclient is running on mw1104 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:46:13] PROBLEM - check if salt-minion is running on mw1107 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:46:44] PROBLEM - check if salt-minion is running on mw1104 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:46:47] (03PS1) 10Ottomata: Send kafka jmx stats via jmxtrans to statsd (not to graphite directly) [puppet] - 10https://gerrit.wikimedia.org/r/177221 [14:47:02] (03PS2) 10Ottomata: Send kafka jmx stats via jmxtrans to statsd (not to graphite directly) [puppet] - 10https://gerrit.wikimedia.org/r/177221 [14:47:12] PROBLEM - nutcracker port on mw1107 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:47:13] PROBLEM - nutcracker port on mw1103 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
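[note] The exchange above settles on sending metrics to statsd over UDP instead of writing to graphite directly. A minimal sketch of what such a sender could look like follows. It assumes the widely used statsd line format `name:value|type`, the UDP port 8125 that YuviPanda only tentatively confirms ("yeah, think so"), and the statsd.eqiad.wmnet name mentioned in the channel. The function names and the metric name are hypothetical and are not taken from the jmxtrans or kafka puppet modules.

```python
import socket


def format_statsd_metric(name: str, value: float, metric_type: str = "g") -> bytes:
    """Build one statsd datagram: '<name>:<value>|<type>'.

    Common types are 'g' (gauge), 'c' (counter) and 'ms' (timer).
    """
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_statsd_metric(payload: bytes, host: str, port: int = 8125) -> None:
    """Send a single datagram over UDP.

    UDP is fire-and-forget: no connection is made and no reply is
    expected, so a sender does not block or time out when the
    collector is unreachable. Port 8125 is an assumption here.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Hypothetical usage: `send_statsd_metric(format_statsd_metric("kafka.example.messages_in", 42), "statsd.eqiad.wmnet")`. Because nothing is acknowledged, a dropped datagram goes unnoticed, which is one reason the TCP timeouts ottomata saw against graphite would not show up on this path.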
[14:47:23] !log repooled mw1173 mw1171 mw1169 mw1168 mw1167 [14:47:28] Logged the message, Master [14:47:43] PROBLEM - nutcracker port on mw1104 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:47:43] PROBLEM - nutcracker process on mw1107 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:47:43] PROBLEM - puppet last run on mw1102 is CRITICAL: CRITICAL: Puppet has 8 failures [14:47:52] PROBLEM - nutcracker process on mw1104 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:47:52] PROBLEM - puppet last run on mw1107 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:48:06] PROBLEM - puppet last run on mw1104 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:48:06] PROBLEM - DPKG on mw1107 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:48:13] PROBLEM - DPKG on mw1104 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:48:13] PROBLEM - Disk space on mw1107 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:48:23] PROBLEM - Disk space on mw1104 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:48:32] RECOVERY - check if dhclient is running on mw1103 is OK: PROCS OK: 0 processes with command name dhclient [14:48:43] PROBLEM - HHVM processes on mw1107 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[14:49:02] RECOVERY - check if dhclient is running on mw1104 is OK: PROCS OK: 0 processes with command name dhclient [14:49:12] RECOVERY - check if salt-minion is running on mw1103 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [14:49:33] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [14:49:42] RECOVERY - check if salt-minion is running on mw1104 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [14:49:54] PROBLEM - RAID on mw1107 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:50:12] RECOVERY - nutcracker port on mw1103 is OK: TCP OK - 0.000 second response time on port 11212 [14:50:26] PROBLEM - check configured eth on mw1107 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [14:50:33] RECOVERY - nutcracker port on mw1104 is OK: TCP OK - 0.000 second response time on port 11212 [14:50:33] RECOVERY - nutcracker process on mw1107 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [14:50:33] PROBLEM - puppet last run on mw1106 is CRITICAL: CRITICAL: Puppet has 102 failures [14:50:44] RECOVERY - nutcracker process on mw1104 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [14:50:53] RECOVERY - DPKG on mw1107 is OK: All packages OK [14:51:02] (03CR) 10Ottomata: [C: 032] Send kafka jmx stats via jmxtrans to statsd (not to graphite directly) [puppet] - 10https://gerrit.wikimedia.org/r/177221 (owner: 10Ottomata) [14:51:03] RECOVERY - Disk space on mw1107 is OK: DISK OK [14:51:24] RECOVERY - Disk space on mw1104 is OK: DISK OK [14:51:25] RECOVERY - check configured eth on mw1104 is OK: NRPE: Unable to read output [14:51:25] RECOVERY - check if dhclient is running on mw1107 is OK: PROCS OK: 0 processes with command name dhclient [14:51:34] RECOVERY - HHVM processes on mw1107 is OK: PROCS OK: 1 process with command name hhvm [14:52:04] 
RECOVERY - check if salt-minion is running on mw1107 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [14:52:43] RECOVERY - RAID on mw1107 is OK: OK: no RAID installed [14:53:03] RECOVERY - nutcracker port on mw1107 is OK: TCP OK - 0.000 second response time on port 11212 [14:53:56] RECOVERY - check configured eth on mw1107 is OK: NRPE: Unable to read output [14:54:20] (03PS1) 10Ottomata: Send jvm stats to statsd as well [puppet/kafka] - 10https://gerrit.wikimedia.org/r/177225 [14:54:34] (03CR) 10Ottomata: [C: 032] Send jvm stats to statsd as well [puppet/kafka] - 10https://gerrit.wikimedia.org/r/177225 (owner: 10Ottomata) [14:55:06] ottomata: graphite's 2003 tcp port is now open as well [14:55:10] (03PS1) 10Ottomata: Update kafka module to send JVM JMX stats to statsd as well [puppet] - 10https://gerrit.wikimedia.org/r/177227 [14:55:10] danke [14:55:19] (03CR) 10jenkins-bot: [V: 04-1] Update kafka module to send JVM JMX stats to statsd as well [puppet] - 10https://gerrit.wikimedia.org/r/177227 (owner: 10Ottomata) [14:55:23] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [14:55:28] (03PS2) 10Ottomata: Update kafka module to send JVM JMX stats to statsd as well [puppet] - 10https://gerrit.wikimedia.org/r/177227 [14:55:38] (03CR) 10Ottomata: [C: 032 V: 032] Update kafka module to send JVM JMX stats to statsd as well [puppet] - 10https://gerrit.wikimedia.org/r/177227 (owner: 10Ottomata) [14:56:13] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: Puppet has 101 failures [14:56:45] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 101 failures [14:57:13] <_joe_> meh [14:57:25] <_joe_> damn downtime getting deleted during reimaging [14:57:27] RECOVERY - DPKG on mw1104 is OK: All packages OK [14:58:04] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Puppet has 1 failures [14:58:05] PROBLEM - HHVM rendering on mw1101 is CRITICAL: 
CRITICAL - Socket timeout after 10 seconds [14:58:26] (03PS1) 10Ottomata: Add statsd parameters to jmxtrans::metrics::jvm [puppet/jmxtrans] - 10https://gerrit.wikimedia.org/r/177228 [14:58:37] PROBLEM - HHVM rendering on mw1103 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:58:39] (03PS2) 10Ottomata: Add statsd parameters to jmxtrans::metrics::jvm [puppet/jmxtrans] - 10https://gerrit.wikimedia.org/r/177228 [14:58:45] PROBLEM - HHVM rendering on mw1104 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:58:58] _joe_: yeah, been there... not much we can do I am afraid [14:59:05] (03CR) 10Ottomata: [C: 032] Add statsd parameters to jmxtrans::metrics::jvm [puppet/jmxtrans] - 10https://gerrit.wikimedia.org/r/177228 (owner: 10Ottomata) [14:59:22] <_joe_> akosiaris: I have a script that sets the downtime automatically, but it didn't work this time [14:59:33] <_joe_> (yes I am getting very lazy) [14:59:58] PROBLEM - puppet last run on mw1107 is CRITICAL: CRITICAL: Puppet has 101 failures [14:59:58] (03PS1) 10Ottomata: Update jmxtrans module with statsd support in jmxtrans::metrics::jvm [puppet] - 10https://gerrit.wikimedia.org/r/177229 [15:00:15] PROBLEM - puppet last run on mw1104 is CRITICAL: CRITICAL: Puppet has 1 failures [15:00:28] (03CR) 10Ottomata: [C: 032 V: 032] Update jmxtrans module with statsd support in jmxtrans::metrics::jvm [puppet] - 10https://gerrit.wikimedia.org/r/177229 (owner: 10Ottomata) [15:01:32] PROBLEM - puppet last run on analytics1022 is CRITICAL: CRITICAL: puppet fail [15:01:44] RECOVERY - HHVM rendering on mw1101 is OK: HTTP OK: HTTP/1.1 200 OK - 69276 bytes in 0.339 second response time [15:03:12] RECOVERY - HHVM rendering on mw1104 is OK: HTTP OK: HTTP/1.1 200 OK - 69276 bytes in 0.312 second response time [15:04:05] RECOVERY - HHVM rendering on mw1103 is OK: HTTP OK: HTTP/1.1 200 OK - 69276 bytes in 0.316 second response time [15:06:13] RECOVERY - puppet last run on analytics1022 is OK: OK: Puppet is currently enabled, 
last run 1 minute ago with 0 failures [15:07:02] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [15:07:04] RECOVERY - puppet last run on mw1106 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:08:28] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [15:08:32] PROBLEM - puppet last run on mw1054 is CRITICAL: CRITICAL: puppet fail [15:12:50] !log reimaging mw1149-1152 [15:12:56] Logged the message, Master [15:14:46] !log depool mw1164-66 [15:14:48] Logged the message, Master [15:14:48] RECOVERY - puppet last run on mw1107 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [15:15:44] PROBLEM - HHVM rendering on mw1054 is CRITICAL: Connection refused [15:16:32] yuvipanda-I am going to power off mw1170 and mw1183 [15:16:51] cmjohnson: cool. [15:16:52] i tried swapping the cable but that wasn't it [15:16:57] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:17:59] RECOVERY - puppet last run on mw1104 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:18:06] PROBLEM - check if salt-minion is running on mw1183 is CRITICAL: Timeout while attempting connection [15:18:28] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [15:18:53] <_joe_> !log repooling mw1101-1107, depooling mw1108-1113 [15:18:59] Logged the message, Master [15:20:04] !log rebooting mw1054 for kernel upgrade [15:20:06] Logged the message, Master [15:20:30] <_joe_> akosiaris: I cowardly left that for a later date most of the times [15:21:10] RECOVERY - puppet last run on mw1102 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:21:11] PROBLEM - Host mw1054 is DOWN: CRITICAL - Plugin timed out after 15 seconds [15:22:38] RECOVERY - 
Host mw1054 is UP: PING OK - Packet loss = 0%, RTA = 4.35 ms [15:22:51] RECOVERY - check if salt-minion is running on mw1183 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [15:24:42] _joe_: oh, I usually do a salt -E 'blablah' cmd.run 'aptitude dist-upgrade -y && reboot' after the reimaging [15:24:45] RECOVERY - HHVM rendering on mw1054 is OK: HTTP OK: HTTP/1.1 200 OK - 69400 bytes in 0.242 second response time [15:24:59] it is just that mw1054 escaped me due to the ganglia aggregator thing [15:25:16] RECOVERY - puppet last run on mw1054 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [15:28:48] !log depooling mw1161-62 for re-imaging [15:28:52] Logged the message, Master [15:30:53] PROBLEM - Host mw1162 is DOWN: PING CRITICAL - Packet loss = 100% [15:30:53] PROBLEM - Host mw1161 is DOWN: PING CRITICAL - Packet loss = 100% [15:34:52] yuvipanda, mw1170,1172,1177,1183 are all fixed for you. Happy installs! [15:35:01] cmjohnson: w00t. thank you! [15:35:14] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: puppet fail [15:38:04] PROBLEM - Host mw1170 is DOWN: PING CRITICAL - Packet loss = 100% [15:38:05] RECOVERY - Host mw1162 is UP: PING OK - Packet loss = 0%, RTA = 4.06 ms [15:38:05] RECOVERY - Host mw1161 is UP: PING OK - Packet loss = 0%, RTA = 2.99 ms [15:38:36] PROBLEM - Host mw1183 is DOWN: PING CRITICAL - Packet loss = 100% [15:41:02] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [15:43:51] RECOVERY - Host mw1183 is UP: PING OK - Packet loss = 0%, RTA = 2.34 ms [15:44:33] <_joe_> we have a peak of 503s [15:44:51] <_joe_> due to googlebot asking robots.txt and it's timing out [15:44:57] <_joe_> both on zend and on hhvm [15:45:05] RECOVERY - Host mw1170 is UP: PING OK - Packet loss = 0%, RTA = 7.55 ms [15:46:37] <_joe_> can we stop the craziness for one second? 
[15:47:02] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 7 below the confidence bounds [15:49:35] PROBLEM - DPKG on mw1183 is CRITICAL: Connection refused by host [15:49:47] <_joe_> https://gdash.wikimedia.org/dashboards/reqerror/ [15:49:57] PROBLEM - nutcracker process on mw1183 is CRITICAL: Connection refused by host [15:49:58] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [15:50:26] PROBLEM - SSH on mw1183 is CRITICAL: Connection refused [15:50:26] PROBLEM - Disk space on mw1183 is CRITICAL: Connection refused by host [15:50:28] * anomie sees nothing for SWAT this morning [15:50:41] PROBLEM - puppet last run on mw1183 is CRITICAL: Connection refused by host [15:50:56] PROBLEM - check configured eth on mw1183 is CRITICAL: Connection refused by host [15:50:56] PROBLEM - RAID on mw1183 is CRITICAL: Connection refused by host [15:50:57] PROBLEM - puppet last run on mw1082 is CRITICAL: CRITICAL: Puppet has 1 failures [15:51:06] PROBLEM - check if dhclient is running on mw1183 is CRITICAL: Connection refused by host [15:51:15] PROBLEM - RAID on mw1170 is CRITICAL: Timeout while attempting connection [15:51:17] PROBLEM - SSH on mw1170 is CRITICAL: Connection timed out [15:51:26] PROBLEM - check configured eth on mw1170 is CRITICAL: Timeout while attempting connection [15:51:35] PROBLEM - nutcracker port on mw1170 is CRITICAL: Timeout while attempting connection [15:51:36] <_joe_> it's apparently fading [15:51:49] PROBLEM - check if dhclient is running on mw1170 is CRITICAL: Timeout while attempting connection [15:51:55] PROBLEM - check if salt-minion is running on mw1170 is CRITICAL: Connection refused by host [15:51:58] <_joe_> and backends can render it perfectly [15:52:07] PROBLEM - Apache HTTP on mw1170 is CRITICAL: Connection refused [15:52:08] PROBLEM - check if salt-minion is running on mw1183 is CRITICAL: Timeout while 
attempting connection [15:52:16] PROBLEM - puppet last run on mw1170 is CRITICAL: Connection refused by host [15:52:26] PROBLEM - DPKG on mw1170 is CRITICAL: Connection refused by host [15:52:35] PROBLEM - nutcracker process on mw1170 is CRITICAL: Connection refused by host [15:52:35] PROBLEM - Apache HTTP on mw1183 is CRITICAL: Connection timed out [15:52:35] PROBLEM - nutcracker port on mw1183 is CRITICAL: Timeout while attempting connection [15:52:57] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: Puppet has 1 failures [15:53:08] PROBLEM - Disk space on mw1170 is CRITICAL: Connection refused by host [15:53:25] hmm, I did schedule downtime for these [15:53:26] RECOVERY - SSH on mw1183 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [15:54:07] RECOVERY - SSH on mw1170 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [15:55:07] <_joe_> anomie: before SWATting, please sync with me YuviPanda and akosiaris [15:55:26] <_joe_> we're reimaging quite a few servers, it might be advisable to wait for those to be done [15:55:38] <_joe_> they're all running scap for a full sync right now [15:58:51] (03PS5) 10BryanDavis: Use hiera to configure udp2log endpoint for ::mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/176191 [15:58:58] (03CR) 10BryanDavis: Use hiera to configure udp2log endpoint for ::mediawiki (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/176191 (owner: 10BryanDavis) [15:59:10] (03PS6) 10BryanDavis: Use hiera to configure udp2log endpoint for ::mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/176191 [15:59:17] PROBLEM - puppet last run on virt1007 is CRITICAL: CRITICAL: Puppet has 1 failures [15:59:28] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures [15:59:28] PROBLEM - puppet last run on thallium is CRITICAL: CRITICAL: Puppet has 1 failures [15:59:29] PROBLEM - HHVM busy threads on mw1234 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [115.2] 
[15:59:34] (03PS10) 10BryanDavis: logstash: Forward syslog events for apache2 + hhvm [puppet] - 10https://gerrit.wikimedia.org/r/176693 [15:59:42] _joe_: No SWAT this morning, unless someone comes up with something at the last minute. [16:00:04] manybubbles, anomie, ^d, marktraceur: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141203T1600). Please do the needful. [16:00:15] jouncebot: I'm in a meeting. [16:00:19] anomie: cool [16:00:25] PROBLEM - puppet last run on mw1116 is CRITICAL: CRITICAL: Puppet has 1 failures [16:00:25] PROBLEM - puppet last run on search1023 is CRITICAL: CRITICAL: Puppet has 1 failures [16:00:25] PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: Puppet has 1 failures [16:00:36] PROBLEM - puppet last run on mc1007 is CRITICAL: CRITICAL: Puppet has 1 failures [16:00:45] PROBLEM - puppet last run on mc1001 is CRITICAL: CRITICAL: Puppet has 1 failures [16:01:21] PROBLEM - puppet last run on db1055 is CRITICAL: CRITICAL: Puppet has 1 failures [16:01:24] PROBLEM - puppet last run on mw1156 is CRITICAL: CRITICAL: Puppet has 1 failures [16:01:37] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: Puppet has 1 failures [16:01:37] PROBLEM - puppet last run on db1054 is CRITICAL: CRITICAL: Puppet has 1 failures [16:01:37] PROBLEM - puppet last run on es1002 is CRITICAL: CRITICAL: Puppet has 1 failures [16:01:54] PROBLEM - puppet last run on amssq42 is CRITICAL: CRITICAL: Puppet has 1 failures [16:02:05] RECOVERY - Apache HTTP on mw1170 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.029 second response time [16:02:05] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: Puppet has 1 failures [16:02:15] PROBLEM - puppet last run on rubidium is CRITICAL: CRITICAL: Puppet has 1 failures [16:02:16] PROBLEM - puppet last run on search1015 is CRITICAL: CRITICAL: Puppet has 1 failures [16:02:24] PROBLEM - puppet last run on cp1062 is CRITICAL: 
CRITICAL: Puppet has 1 failures [16:02:34] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: Puppet has 1 failures [16:02:34] RECOVERY - HHVM busy threads on mw1234 is OK: OK: Less than 30.00% above the threshold [76.8] [16:02:45] PROBLEM - puppet last run on ms-be1007 is CRITICAL: CRITICAL: Puppet has 1 failures [16:02:46] PROBLEM - puppet last run on wtp1015 is CRITICAL: CRITICAL: Puppet has 1 failures [16:03:14] PROBLEM - puppet last run on elastic1011 is CRITICAL: CRITICAL: Puppet has 1 failures [16:03:15] PROBLEM - puppet last run on lvs1001 is CRITICAL: CRITICAL: Puppet has 1 failures [16:03:34] Uhhh [16:03:45] PROBLEM - puppet last run on db2003 is CRITICAL: CRITICAL: Puppet has 1 failures [16:03:46] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: Puppet has 1 failures [16:03:46] YuviPanda: You killin' stuff? [16:03:55] PROBLEM - puppet last run on db1057 is CRITICAL: CRITICAL: Puppet has 1 failures [16:04:05] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: Puppet has 1 failures [16:04:08] RECOVERY - check configured eth on mw1170 is OK: NRPE: Unable to read output [16:04:08] PROBLEM - puppet last run on berkelium is CRITICAL: CRITICAL: Puppet has 1 failures [16:04:17] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Puppet has 1 failures [16:04:17] PROBLEM - puppet last run on mw1050 is CRITICAL: CRITICAL: Puppet has 1 failures [16:04:17] RECOVERY - check if dhclient is running on mw1170 is OK: PROCS OK: 0 processes with command name dhclient [16:04:17] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: Puppet has 1 failures [16:04:17] RECOVERY - Apache HTTP on mw1183 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.013 second response time [16:04:27] RECOVERY - puppet last run on mw1082 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:04:35] RECOVERY - check if dhclient is running on mw1183 is OK: PROCS OK: 0 processes with command name dhclient [16:04:37] RECOVERY - 
RAID on mw1170 is OK: OK: no RAID installed [16:05:05] marktraceur: this is probably one of those transient storms. [16:05:08] RECOVERY - DPKG on mw1170 is OK: All packages OK [16:05:27] RECOVERY - Disk space on mw1170 is OK: DISK OK [16:05:39] RECOVERY - nutcracker port on mw1183 is OK: TCP OK - 0.000 second response time on port 11212 [16:05:39] RECOVERY - DPKG on mw1183 is OK: All packages OK [16:05:48] marktraceur: along with some of my stuff. [16:05:58] RECOVERY - nutcracker process on mw1183 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [16:05:58] should pass, too late to schedule downtime to be anay effect, I think [16:06:19] RECOVERY - Disk space on mw1183 is OK: DISK OK [16:06:19] RECOVERY - nutcracker port on mw1170 is OK: TCP OK - 0.000 second response time on port 11212 [16:06:19] PROBLEM - DPKG on mw1172 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:06:19] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Puppet has 1 failures [16:06:19] RECOVERY - check configured eth on mw1183 is OK: NRPE: Unable to read output [16:06:28] PROBLEM - puppet last run on ms-be1009 is CRITICAL: CRITICAL: Puppet has 1 failures [16:06:49] RECOVERY - nutcracker process on mw1170 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [16:06:50] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:18] RECOVERY - RAID on mw1183 is OK: OK: no RAID installed [16:07:19] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:29] PROBLEM - puppet last run on rcs1001 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:19] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [16:09:32] PROBLEM - check if salt-minion is running on mw1172 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [16:09:38] RECOVERY - puppet last run on 
mw1165 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:10:08] RECOVERY - puppet last run on mw1156 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:10:09] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:10:19] RECOVERY - puppet last run on mw1050 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:10:19] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:10:58] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:10:58] RECOVERY - puppet last run on search1023 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:10:58] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:10:58] RECOVERY - puppet last run on thallium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:10:58] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [16:11:08] RECOVERY - puppet last run on rubidium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:11:08] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:11:19] RECOVERY - puppet last run on mc1007 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:11:19] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:11:30] RECOVERY - puppet last run on ms-be1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:11:32] RECOVERY - puppet last run on mc1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago 
with 0 failures [16:11:32] RECOVERY - puppet last run on wtp1015 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:11:59] RECOVERY - puppet last run on elastic1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:12:10] RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:12:18] RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:12:18] RECOVERY - puppet last run on virt1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:12:39] RECOVERY - puppet last run on db2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:12:39] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:12:50] RECOVERY - puppet last run on db1057 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:12:50] RECOVERY - puppet last run on db1055 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:12:58] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:13:09] RECOVERY - puppet last run on berkelium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:13:09] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 102 failures [16:13:19] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:13:20] RECOVERY - puppet last run on rcs1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:13:21] RECOVERY - puppet last run on db1054 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:13:22] RECOVERY - puppet last run on es1002 is OK: OK: Puppet is currently enabled, last 
run 1 minute ago with 0 failures [16:13:41] RECOVERY - puppet last run on amssq42 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:13:43] (03PS1) 10Anomie: Correct Content-Length from robots.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177232 [16:13:48] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:13:49] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: Puppet has 102 failures [16:13:58] RECOVERY - puppet last run on search1015 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:14:03] (03CR) 10Andrew Bogott: "Last I checked, hiera was broken for local puppetmasters on trusty. The deployment-prep puppetmaster is on precise, right?" [puppet] - 10https://gerrit.wikimedia.org/r/176191 (owner: 10BryanDavis) [16:14:29] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 2 failures [16:14:46] (03PS1) 10RobH: setting ip info for server californium [dns] - 10https://gerrit.wikimedia.org/r/177233 [16:15:09] RECOVERY - DPKG on mw1172 is OK: All packages OK [16:15:21] PROBLEM - HHVM rendering on mw1170 is CRITICAL: Connection refused [16:15:54] (03CR) 10RobH: [C: 032] setting ip info for server californium [dns] - 10https://gerrit.wikimedia.org/r/177233 (owner: 10RobH) [16:16:26] PROBLEM - HHVM rendering on mw1183 is CRITICAL: Connection refused [16:16:39] PROBLEM - HHVM rendering on mw1172 is CRITICAL: Connection refused [16:17:58] PROBLEM - HHVM rendering on mw1162 is CRITICAL: Connection refused [16:18:03] (03CR) 10Anomie: [C: 032] "Reviewed by _joe_ on IRC." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177232 (owner: 10Anomie) [16:18:18] (03CR) 10BryanDavis: "> Last I checked, hiera was broken for local puppetmasters on trusty. The deployment-prep puppetmaster is on precise, right?" 
[puppet] - 10https://gerrit.wikimedia.org/r/176191 (owner: 10BryanDavis) [16:18:28] (03Merged) 10jenkins-bot: Correct Content-Length from robots.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177232 (owner: 10Anomie) [16:19:11] !log anomie Synchronized w/robots.php: Fix Content-Length from robots.txt (duration: 00m 06s) [16:19:15] Logged the message, Master [16:19:31] _joe_: A bunch of scap errors, probably from your reimaging. [16:19:38] PROBLEM - Apache HTTP on mw1170 is CRITICAL: Connection refused [16:19:45] (03CR) 10Andrew Bogott: [C: 031] Use hiera to configure udp2log endpoint for ::mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/176191 (owner: 10BryanDavis) [16:19:48] (03PS1) 10RobH: setting base install params for server californium [puppet] - 10https://gerrit.wikimedia.org/r/177236 [16:20:43] PROBLEM - Apache HTTP on mw1172 is CRITICAL: Connection refused [16:21:10] RECOVERY - HHVM rendering on mw1162 is OK: HTTP OK: HTTP/1.1 200 OK - 69396 bytes in 1.875 second response time [16:22:09] PROBLEM - Apache HTTP on mw1183 is CRITICAL: Connection refused [16:22:12] anomie: host key mismatches? [16:22:33] bd808: And a few "/srv/deployment/scap/scap/bin/sync-common: No such file or directory" [16:23:11] *nod* missing scap trebuchet deploys. Possibly due to salt setup issues. [16:23:47] Setting the salt grain on the first puppet run almost never works [16:24:31] mw1150, mw1149, mw1151, and mw1152 have host key. mw1177, mw1170, mw1183, and mw1172 have the missing file. mw1058 gave an erryr trying to touch /srv/mediawiki/wmf-config/InitialiseSettings.php. [16:24:57] (03CR) 10RobH: [C: 032] setting base install params for server californium [puppet] - 10https://gerrit.wikimedia.org/r/177236 (owner: 10RobH) [16:25:01] anomie: latter is servers still being re-imaged, I think. 
[16:25:03] missing files, that is [16:25:12] 77 also has a bad disk [16:25:27] those with missing files are also all out of pybal, but not scap, I guess [16:25:28] bd808: ^ [16:25:51] pybal and dsh are different config files [16:26:11] and we don't really use dsh for this anymore but we still get the host list from there [16:26:35] because hysterical rasins [16:26:59] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:27:12] !log repool mw1166 mw1165 mw1164 mw1162 mw1161 [16:27:14] Logged the message, Master [16:27:58] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:35:09] RECOVERY - check if salt-minion is running on mw1170 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:35:34] !log anomie Synchronized w/robots.php: Remove Content-Length from robots.txt (live hack for test, will commit or revert momentarily) (duration: 00m 07s) [16:35:39] Logged the message, Master [16:35:49] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: puppet fail [16:36:22] <_joe_> anomie: it worked :) [16:36:23] (03PS1) 10Anomie: Just remove Content-Length in robots.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177240 [16:36:46] (03CR) 10Anomie: [C: 032] "Yeah, that worked in a live hack" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177240 (owner: 10Anomie) [16:36:53] (03Merged) 10jenkins-bot: Just remove Content-Length in robots.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177240 (owner: 10Anomie) [16:37:10] !log anomie Synchronized w/robots.php: Committed live hack (duration: 00m 05s) [16:37:12] Logged the message, Master [16:40:50] RECOVERY - check if salt-minion is running on mw1183 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:41:43] !log anomie Synchronized w/robots.php: Committed live hack (for real this time) (duration: 00m 
05s) [16:41:49] Logged the message, Master [16:41:53] !log starting trusty upgrade of analytics1027 [16:41:55] Logged the message, Master [16:42:50] (03Abandoned) 10Cscott: Correctly remove the 'Download as PDF' link from sidebar. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176140 (owner: 10Cscott) [16:42:54] RECOVERY - check if salt-minion is running on mw1172 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:46:57] PROBLEM - check if dhclient is running on mw1149 is CRITICAL: Connection refused by host [16:46:57] PROBLEM - check configured eth on mw1150 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:47:02] (03PS1) 10RobH: changing californium from public1-c to private1-b subnets [dns] - 10https://gerrit.wikimedia.org/r/177245 [16:47:25] (03PS1) 10RobH: californium shifted from public to private ip [puppet] - 10https://gerrit.wikimedia.org/r/177246 [16:47:44] PROBLEM - check if salt-minion is running on mw1149 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:47:44] PROBLEM - check if dhclient is running on mw1150 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:48:01] (03CR) 10RobH: [C: 032] changing californium from public1-c to private1-b subnets [dns] - 10https://gerrit.wikimedia.org/r/177245 (owner: 10RobH) [16:48:03] PROBLEM - check if salt-minion is running on mw1150 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:48:09] scapp takkkeesss forrever [16:48:42] (03CR) 10RobH: [C: 032] californium shifted from public to private ip [puppet] - 10https://gerrit.wikimedia.org/r/177246 (owner: 10RobH) [16:48:53] californium! [16:48:57] !log starting upgrade of analytics1027 to trusty, hive and oozie are offline for a bit [16:49:00] Logged the message, Master [16:49:05] PROBLEM - nutcracker port on mw1150 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[16:49:07] PROBLEM - nutcracker process on mw1150 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:49:22] PROBLEM - puppet last run on mw1150 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:49:23] PROBLEM - DPKG on mw1149 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:49:42] PROBLEM - DPKG on mw1150 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:49:42] PROBLEM - puppet last run on mw1152 is CRITICAL: CRITICAL: Puppet has 102 failures [16:49:49] RECOVERY - HHVM rendering on mw1170 is OK: HTTP OK: HTTP/1.1 200 OK - 69396 bytes in 4.803 second response time [16:50:02] PROBLEM - Disk space on mw1150 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:50:12] RECOVERY - check if dhclient is running on mw1149 is OK: PROCS OK: 0 processes with command name dhclient [16:50:12] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:50:32] PROBLEM - HHVM processes on mw1150 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:50:33] RECOVERY - Apache HTTP on mw1183 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 2.938 second response time [16:50:43] RECOVERY - check if salt-minion is running on mw1149 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:50:52] RECOVERY - HHVM rendering on mw1172 is OK: HTTP OK: HTTP/1.1 200 OK - 69396 bytes in 4.683 second response time [16:51:29] PROBLEM - RAID on mw1150 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[16:51:44] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:51:54] RECOVERY - HHVM rendering on mw1183 is OK: HTTP OK: HTTP/1.1 200 OK - 69395 bytes in 0.345 second response time [16:51:54] RECOVERY - Apache HTTP on mw1170 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.066 second response time [16:51:55] RECOVERY - Apache HTTP on mw1172 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.073 second response time [16:51:55] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:52:15] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [16:52:25] RECOVERY - DPKG on mw1149 is OK: All packages OK [16:53:01] !log repooling mw1183 mw1172 mw1170 as hhvm [16:53:05] Logged the message, Master [16:54:04] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [16:55:55] RECOVERY - Disk space on mw1150 is OK: DISK OK [16:56:06] RECOVERY - check configured eth on mw1150 is OK: NRPE: Unable to read output [16:56:25] RECOVERY - HHVM processes on mw1150 is OK: PROCS OK: 1 process with command name hhvm [16:56:47] RECOVERY - check if dhclient is running on mw1150 is OK: PROCS OK: 0 processes with command name dhclient [16:57:04] RECOVERY - check if salt-minion is running on mw1150 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:57:30] RECOVERY - RAID on mw1150 is OK: OK: no RAID installed [16:57:59] RECOVERY - nutcracker port on mw1150 is OK: TCP OK - 0.000 second response time on port 11212 [16:58:19] PROBLEM - puppet last run on mw1149 is CRITICAL: CRITICAL: Puppet has 102 failures [16:58:19] RECOVERY - nutcracker process on mw1150 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [17:03:49] PROBLEM - puppet last run on mw1116 is CRITICAL: CRITICAL: 
Puppet has 1 failures [17:04:27] PROBLEM - HHVM rendering on mw1152 is CRITICAL: Connection refused [17:04:27] RECOVERY - DPKG on mw1150 is OK: All packages OK [17:05:19] (03PS5) 10Filippo Giunchedi: import debian/ directory [debs/python-diamond] - 10https://gerrit.wikimedia.org/r/168599 [17:06:12] <_joe_> !log repooling mw1108-1113 [17:06:18] Logged the message, Master [17:07:08] PROBLEM - puppet last run on mw1150 is CRITICAL: CRITICAL: Puppet has 101 failures [17:10:27] PROBLEM - puppet last run on mw1151 is CRITICAL: CRITICAL: Puppet has 102 failures [17:10:39] PROBLEM - Apache HTTP on mw1152 is CRITICAL: Connection refused [17:12:48] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:15:30] RECOVERY - puppet last run on mw1152 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:16:30] RECOVERY - HHVM rendering on mw1152 is OK: HTTP OK: HTTP/1.1 200 OK - 69400 bytes in 0.171 second response time [17:16:40] RECOVERY - Apache HTTP on mw1152 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.158 second response time [17:18:00] PROBLEM - HHVM rendering on mw1149 is CRITICAL: Connection refused [17:20:10] PROBLEM - HHVM rendering on mw1151 is CRITICAL: Connection refused [17:20:48] (03PS1) 10Cmjohnson: Adding new entries for virt1010-12 [dns] - 10https://gerrit.wikimedia.org/r/177252 [17:24:39] PROBLEM - Host mw1149 is DOWN: PING CRITICAL - Packet loss = 100% [17:24:39] PROBLEM - Host mw1150 is DOWN: PING CRITICAL - Packet loss = 100% [17:24:39] PROBLEM - Host mw1151 is DOWN: PING CRITICAL - Packet loss = 100% [17:26:07] RECOVERY - Host mw1149 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms [17:26:10] (03CR) 10Cmjohnson: [C: 032] Adding new entries for virt1010-12 [dns] - 10https://gerrit.wikimedia.org/r/177252 (owner: 10Cmjohnson) [17:26:19] RECOVERY - Host mw1151 is UP: PING OK - Packet loss = 0%, RTA = 1.26 ms [17:26:30] RECOVERY - HHVM rendering on 
mw1151 is OK: HTTP OK: HTTP/1.1 200 OK - 69401 bytes in 7.975 second response time [17:26:41] RECOVERY - Host mw1150 is UP: PING OK - Packet loss = 0%, RTA = 1.03 ms [17:27:33] RECOVERY - HHVM rendering on mw1149 is OK: HTTP OK: HTTP/1.1 200 OK - 69401 bytes in 5.239 second response time [17:27:52] RECOVERY - puppet last run on mw1149 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [17:28:37] RECOVERY - puppet last run on mw1150 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:28:51] RECOVERY - puppet last run on mw1151 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:40:27] (03CR) 10Aaron Schulz: [C: 031] Graph User::pingLimiter() actions in gdash [puppet] - 10https://gerrit.wikimedia.org/r/166511 (https://bugzilla.wikimedia.org/65478) (owner: 10Nemo bis) [17:59:54] (03PS1) 10coren: Tool labs: webservice2 syntax for bigbrother [puppet] - 10https://gerrit.wikimedia.org/r/177262 [18:00:56] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [18:13:26] (03PS1) 10RobH: put californium in private subnet, but not labs private [dns] - 10https://gerrit.wikimedia.org/r/177264 [18:13:54] (03PS4) 10Giuseppe Lavagetto: hiera: role-based backend, role keyword [puppet] - 10https://gerrit.wikimedia.org/r/176334 [18:14:03] (03CR) 10RobH: [C: 032] put californium in private subnet, but not labs private [dns] - 10https://gerrit.wikimedia.org/r/177264 (owner: 10RobH) [18:15:09] <_joe_> godog: I added some tests for the role class, I'll add more for the hiera backend tomorrow [18:17:47] _joe_: nice, I'll take a look tomorrow too! [18:29:06] !log restarted parsoid to clear any cached v2 api state to prevent leakage into v1 api requests [18:29:11] Logged the message, Master [18:29:36] Reedy: Possible to scap a Wikidata update with some new messages during the train? 
[18:35:12] PROBLEM - Parsoid on wtp1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:35:53] * YuviPanda waves at subbu [18:36:03] PROBLEM - Parsoid on wtp1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:36:12] PROBLEM - Parsoid on wtp1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:36:16] _joe_: thanks so much for the hhvm work (YuviPanda and akosiaris too, and probably many more) [18:36:22] :D [18:36:42] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: puppet fail [18:37:42] looking [18:39:49] YuviPanda, they will be stuck for a max of 5 mins before they restart cleanly. known issue that we haven't resolved yet. [18:40:01] subbu: :) [18:40:21] gwicke thinks it is upstart related. [18:40:40] https://phabricator.wikimedia.org/T75395 [18:42:24] RECOVERY - Parsoid on wtp1024 is OK: HTTP OK: HTTP/1.1 200 OK - 1108 bytes in 0.022 second response time [18:42:55] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: Puppet has 1 failures [18:44:15] <_joe_> matanya: well thank ori and TimStarling [18:44:20] (03PS1) 10Gergő Tisza: Force CommonsMetadata on beta to recalculate data from prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177271 [18:44:24] <_joe_> before you thank us [18:44:36] I thank them as well :) [18:45:34] subbu: hmm, one easy way to test that would be to just test out the init script and see if it has the same issue [18:48:54] hmm .. 1009 and 1017 haven't recovered yet. i see a stuck process on 1009 [18:49:30] YuviPanda, can you update the bug report? [18:51:55] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [18:53:03] YuviPanda, i am going to kill that stuck process on wtp1009 [18:54:07] subbu: ok [18:54:49] YuviPanda, looks like i cannot. [18:55:01] since i am being asked for a password. 
[18:55:04] subbu: let me try [18:56:46] subbu: ok, killed them all and restarted [18:56:59] !log manually killed parsoid on wtp1009, restarted with service parsoid restart [18:57:03] Logged the message, Master [18:57:11] subbu: check if that's running ok? [18:57:29] also wtp1017 process id 16913 [18:58:23] thanks. [18:58:25] !log manually killed parsoid on wtp1017, restarted with service parsoid restart [18:58:27] Logged the message, Master [18:58:31] subbu: :) yw! [18:58:37] subbu: am about to go eat food, anything else you need? [18:58:55] that is all for now. :) [18:59:03] alright! [18:59:16] we'll look at logs to see what titles got them stuck. [18:59:48] RECOVERY - Parsoid on wtp1009 is OK: HTTP OK: HTTP/1.1 200 OK - 1108 bytes in 0.016 second response time [19:00:04] Reedy, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141203T1900). [19:00:30] RECOVERY - Parsoid on wtp1017 is OK: HTTP OK: HTTP/1.1 200 OK - 1108 bytes in 0.092 second response time [19:05:40] (03PS1) 10Aaron Schulz: Set the redis job queue daemonized flag in labs too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177275 [19:09:32] greg-g: What's up with the train? [19:09:57] ? [19:10:11] Reedy: Can you scap a Wikibase update? [19:10:23] (03PS1) 10Ejegg: Change banner hide cookie duration to one week [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177278 [19:10:28] I can when I scap/branch [19:10:30] It's not urgent, so can be after the train (I can also do it myself after your done, if you prefer) [19:10:32] ok [19:10:49] Reedy: one file is ready for, i'll hold until all are done, and poke for a server side upload [19:11:10] I also have a config change up, which you can merge anytime (doesn't depend on any updates) [19:14:48] hey YuviPAnda|foood [19:14:59] or, maybe godog knows [19:14:59] yt? 
[19:17:15] Reedy: in case you haven't figured this out yourself yet, logstash doesn't have hhvm errors yet, so the fatallog report there won't see a lot of things that it would have with php5. [19:17:20] ottomata: sure, what's up? [19:18:30] why are the monitoring::graphite_* classes included directly on tungsten? [19:18:36] and not the hosts where it is relevant for them? [19:18:52] monitoring::service eventually uses an exported resource [19:19:04] see class role::graphite::production [19:19:08] it has things like [19:19:12] include ::eventlogging::monitoring::graphite [19:19:55] i'm looking at converting some kafka checks to graphite instead of ganglia [19:20:07] and right now, I have monitoring::ganglia inside of my kafka roles [19:20:10] not separate [19:20:20] ottomata: no idea sorry, I've barely looked at graphite in puppet [19:20:24] hm ok [19:20:31] histerical raisins I'm tempted to say [19:20:36] haha [19:20:37] hm [19:21:43] ottomata: git log -Gmonitoring::graphite perhaps ? [19:21:57] or sth like that [19:23:46] hmm, looks like maybe _joe_ knows something? [19:23:49] _joe_: yt? [19:24:06] (PS4) Aaron Schulz: Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/173472 [19:24:28] (CR) Aaron Schulz: Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/173472 (owner: Aaron Schulz) [19:26:37] (CR) Aaron Schulz: Moved sampling to the profiler config itself [mediawiki-config] - https://gerrit.wikimedia.org/r/175891 (owner: Aaron Schulz) [19:27:46] PROBLEM - puppet last run on virt1005 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:22] ottomata: hi!
I can explain in about 5mins [19:31:51] k [19:36:16] (03PS1) 10MaxSem: Enable $wgMFUseWikibaseDescription [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177288 [19:39:50] RECOVERY - puppet last run on virt1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [19:41:33] ottomata: so, monitoring::graphite does use exported resources. [19:41:52] ottomata: so anything you want monitored related to a particular host, you just declare same as the ganglia monitors [19:42:15] ottomata: the things you see on the graphite role itself (eventlogging, mw, etc) are general stats that're independent of any particular host, so they just need a place to be declared [19:42:20] ottomata: and so they're declared there. [19:42:45] ottomata: this is why when there's a 5xx spike, you see icinga-wm complain about spike *on tungsten*, which just means that in icinga the check is associated with tungsten. It's still checking prod's 5xx stats only [19:42:56] hm, aye ok [19:42:58] ottomata: so for your purpose (replacing ganglia with graphite), you don't have to worry about the includes. [19:43:15] ok, are the metric names the same, even though the metric in graphite is now prefixed with the root prefix (kafka) and the hostname? [19:43:24] e.g. javascript:%20Composer.toggleTarget('kafka.analytics1022_eqiad_wmnet_9999.kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate'); [19:43:28] uhh [19:43:32] kafka.analytics1022_eqiad_wmnet_9999.kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate [19:43:41] so, in monitoring::graphite_threshold [19:43:42] i would use: [19:43:45] kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate [19:43:46] ? [19:45:02] ottomata: you can check the metric name [19:45:08] ottomata: graphite.wikimedia.org [19:45:11] ottomata: and pick whatever? [19:45:22] those are from grpahite.,wikimedia.org [19:45:26] oh, right [19:45:45] ottomata: yeah, that would work. 
[19:45:49] the latter? [19:45:52] without the prefixed hostname? [19:45:55] right? [19:46:06] > kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate [19:46:07] you mean? [19:46:10] aye, yes [19:46:24] hmm, no, the names have to match exactly what you see in the graphite interface. [19:46:30] oh [19:46:31] there's a way to make it say 'current host', let me look [19:47:02] ah [19:47:04] metric => "servers.${::hostname}.hhvmHealthCollector.queued.value", [19:47:11] in monitoring::graphite_threshold { 'hhvm_queue_size': [19:47:18] actually uses the hostname in the metric [19:47:20] i guess i can do that [19:47:40] ok, another q. [19:47:46] i'm pushing these through statsd now [19:47:49] these are from jmx [19:47:57] jmx maintains rolling averages already [19:48:00] but also a raw count [19:48:07] both of these stats are being sent to statsd [19:48:09] ottomata: heh, was just going to point you to that example [19:48:19] but, the data in graphite seems a little weird [19:48:30] does statsd do any transformation of the data it is sent before giving it to graphite? [19:48:51] ah, it does. kind of. depends. [19:48:55] let me find link [19:50:01] ottomata: https://github.com/etsy/statsd/blob/master/docs/metric_types.md [19:50:15] ottomata: so depends on what type jmx is sending them to statsd as [19:50:54] hm [19:51:03] yuvipanda mw1177 is just about finished installing. there was pre-existing s/w partition that had to be removed first [19:51:18] cmjohnson: ah, ok! 
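(Editor's sketch, not part of the log.) The naming point above, that the check's metric must match the full name shown in the graphite interface including the per-host prefix, can be illustrated roughly as follows. The helper name `kafka_metric` and the "dots to underscores plus port" scheme are assumptions inferred from the pasted metric name, not a documented convention.

```python
def kafka_metric(fqdn: str, port: int, suffix: str) -> str:
    """Build a full graphite metric name for one broker.

    Assumption: the host part is the FQDN with dots replaced by
    underscores, followed by the JMX port, as in the name pasted
    in the log. The function itself is hypothetical.
    """
    host_part = f"{fqdn.replace('.', '_')}_{port}"
    return f"kafka.{host_part}.{suffix}"


name = kafka_metric(
    "analytics1022.eqiad.wmnet",
    9999,
    "kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate",
)
print(name)
```

In Puppet the same effect comes from interpolating the host into the metric string, as the `servers.${::hostname}...` example quoted above does.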
[19:51:36] yep once it completes...i will ping you again [19:51:44] hmmm [19:52:07] (03PS1) 10Reedy: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177292 [19:52:09] (03PS1) 10Reedy: testwiki to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177293 [19:52:11] (03PS1) 10Reedy: wikipedias to 1.25wmf10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177294 [19:52:13] (03PS1) 10Reedy: group0 to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177295 [19:52:29] (03CR) 10Reedy: [C: 032] Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177292 (owner: 10Reedy) [19:52:35] (03CR) 10Reedy: [C: 032] testwiki to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177293 (owner: 10Reedy) [19:52:37] (03Merged) 10jenkins-bot: Add symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177292 (owner: 10Reedy) [19:52:45] cmjohnson: thank you :) [19:52:46] (03Merged) 10jenkins-bot: testwiki to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177293 (owner: 10Reedy) [19:54:14] ooof [19:54:20] this will be a lot of configuration :( [19:54:53] yuvipanda: all yours! [19:55:06] cmjohnson: thanks! [19:56:11] ottomata: if it has raw numbers per-interval, you can usually get away with just counters. [19:56:34] ottomata: or gauges. 
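(Editor's sketch, not part of the log.) For readers following the counter vs. gauge question, this is roughly what the statsd plain-text line format looks like for the types in the etsy/statsd metric types doc linked above. The function name `statsd_line` is made up for illustration; only the `name:value|type` shape comes from statsd itself.

```python
def statsd_line(name: str, value: float, metric_type: str) -> str:
    """Format one sample in the statsd plain-text protocol.

    'counter' -> |c  (value is an increment, summed per flush interval)
    'gauge'   -> |g  (value replaces the previous value)
    'timer'   -> |ms (value is a duration in milliseconds)
    """
    suffixes = {"counter": "c", "gauge": "g", "timer": "ms"}
    return f"{name}:{value}|{suffixes[metric_type]}"


print(statsd_line("kafka.messages_in", 5, "counter"))
print(statsd_line("kafka.fifteen_minute_rate", 1812.5, "gauge"))
```

A pre-averaged rate such as a JMX rolling average fits the gauge type, since statsd should store it as-is rather than sum it.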
[19:57:42] YuviPanda: it has gauges (the averages) [19:57:46] and it has a total count ever [19:57:53] it does not have numbers per interval [19:57:56] it will have an ever increasing count [19:58:03] total number of messages ever sent [19:58:04] !log reedy Started scap: testwiki to 1.25wmf11 and rebuild l10n caches [19:58:09] Logged the message, Master [19:58:17] right now, I wildcard though, hmmm [19:58:26] so I don't have to specify every metric I want to collect [19:58:35] i get a lot of the jmx stats by wildcarding them [19:58:35] hm [19:59:02] ottomata: graphite also lets you use functions to aggregate different metrics [19:59:11] https://graphite.readthedocs.org/en/0.9.10/functions.html [19:59:23] well, right now i'm just interested in creating alerts from graphite dat [19:59:24] data [20:00:58] ottomata: the functions work for alerts too [20:01:01] oh [20:01:02] m [20:01:03] hm [20:01:08] ottomata: in 'metric' you can put any graphite expression [20:01:15] ah [20:07:01] !log reedy Started scap: testwiki to 1.25wmf11 (take 2) [20:07:03] Logged the message, Master [20:07:03] ffs [20:07:38] greg-g: hey, when can ebernhardson run a conversion script on officewiki to convert an officewiki namespace (only three talk pages) to Flow? It went over our normal Tuesday deploy window.
[20:07:48] PROBLEM - check if salt-minion is running on mw1177 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [20:08:00] (03PS1) 10Ottomata: Add support for bucketType in statsd attrs [puppet/jmxtrans] - 10https://gerrit.wikimedia.org/r/177300 [20:08:10] greg-g hmm, maybe we can sublet yurikR Wikipedia Zero window today [20:08:44] spagewmf: I can possibly run it if I get scap done in time [20:08:45] Or you can during the window [20:09:04] spagewmf, go for it [20:09:11] i deployed yesterday [20:09:31] greg-g, come to think of it, kill our window altogether - we can always find a window whenever we need, no point in keeping it on wed [20:09:52] yurikR: noted [20:10:29] yurikR: thanks! I'll update Deployments [20:14:58] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Puppet has 102 failures [20:16:47] PROBLEM - HHVM rendering on mw1177 is CRITICAL: Connection refused [20:16:58] PROBLEM - puppet last run on mw1044 is CRITICAL: CRITICAL: Puppet has 1 failures [20:17:47] (03PS2) 10Ottomata: Add support for bucketType in statsd attrs [puppet/jmxtrans] - 10https://gerrit.wikimedia.org/r/177300 [20:24:09] PROBLEM - Apache HTTP on mw1177 is CRITICAL: Connection refused [20:24:11] YuviPanda: any clue what happens in statsd if I change the type of a metric after it already has some? [20:24:16] these have been sent as counters already [20:24:23] but they should have been sent as guages [20:24:24] gauges [20:24:26] * [20:24:39] ottomata: ah, hmm. not sure. [20:25:00] i can tell it is sending the right stuff to statsd now, i think [20:25:01] e.g.
[20:25:02] ottomata: you could rm the metrics from storage, restart txstatsd to clear the old ones out [20:25:04] kafka.analytics1022_eqiad_wmnet_9999.kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate [20:25:05] oops [20:25:11] kafka.analytics1022_eqiad_wmnet_9999.kafka.server.BrokerTopicMetrics.webrequest_mobile-MessagesInPerSec.FifteenMinuteRate:1812.508985714912|g [20:25:12] that [20:25:20] hm, ok [20:25:23] from storage... [20:26:00] yeah, on tungsten [20:26:09] /srv/carbon/whisper i think [20:26:19] looks like var lib [20:26:33] oh? [20:26:38] maybe? [20:26:42] /var/lib/graphite/whisper/carbon [20:26:42] ? [20:26:56] nothing in /srv/ but deployment [20:27:18] don't see anything relevant here though either... [20:27:22] ottomata: ah, right. discrepancy between labs and prod graphite. you're right. [20:27:31] let me find where it is [20:27:58] 20:20:40 ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', 'mw1010.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet'] on mw1177 returned [127]: bash: /srv/deployment/scap/scap/bin/sync-common: No such file or directory [20:28:11] ottomata: /var/lib/carbon [20:28:17] ah /var/lib/carbon [20:28:19] :) [20:28:22] Reedy: yeah, 1177 is still dead. [20:28:27] Reedy: shall resurrect shortly [20:28:36] ahh cool, all top level prefixed. [20:28:36] nice [20:28:37] hm, ok [20:28:39] YuviPanda: so [20:28:40] ottomata: you also need to restart txstatsd right after removing 'em [20:28:46] and graphite? [20:28:48] /carbon? [20:28:55] YuviPanda: righto, I knew one was borked, just not which one :) [20:29:01] or just txstatsd?
[20:29:09] Reedy: :) [20:29:12] ottomata: just txstatsd [20:29:14] k [20:29:17] RECOVERY - puppet last run on mw1044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:31:37] (03CR) 10Ottomata: [C: 032] Add support for bucketType in statsd attrs [puppet/jmxtrans] - 10https://gerrit.wikimedia.org/r/177300 (owner: 10Ottomata) [20:32:28] (03PS1) 10Ottomata: Set bucketType for gauge metrics for statsd usage [puppet/kafka] - 10https://gerrit.wikimedia.org/r/177307 [20:32:51] (03CR) 10Ottomata: [C: 032] Set bucketType for gauge metrics for statsd usage [puppet/kafka] - 10https://gerrit.wikimedia.org/r/177307 (owner: 10Ottomata) [20:33:29] (03PS1) 10Ottomata: Update kafka and jmxtrans modules with gauge metrics set for statsd usage [puppet] - 10https://gerrit.wikimedia.org/r/177310 [20:34:26] Reedy: Don't we all love backports? https://gerrit.wikimedia.org/r/177308 and https://gerrit.wikimedia.org/r/177309 [20:34:44] (03PS2) 10Ottomata: Update kafka and jmxtrans modules with gauge metrics set for statsd usage [puppet] - 10https://gerrit.wikimedia.org/r/177310 [20:34:48] RECOVERY - check if salt-minion is running on mw1177 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [20:34:50] (03CR) 10Ottomata: [C: 032 V: 032] Update kafka and jmxtrans modules with gauge metrics set for statsd usage [puppet] - 10https://gerrit.wikimedia.org/r/177310 (owner: 10Ottomata) [20:34:58] hoo: As long as they're not needing a scap again it's not so bad ;) [20:35:08] No, sync-dir will do [20:36:18] PROBLEM - HHVM rendering on mw1058 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki exception - 687 bytes in 0.049 second response time [20:39:18] RECOVERY - HHVM rendering on mw1058 is OK: HTTP OK: HTTP/1.1 200 OK - 69219 bytes in 0.245 second response time [20:43:37] !log reedy Finished scap: testwiki to 1.25wmf11 (take 2) (duration: 36m 35s) [20:43:42] Logged the message, Master [20:46:07] ori: ping [20:46:22] !log reedy 
Synchronized php-1.25wmf10/extensions/Wikidata: (no message) (duration: 00m 13s) [20:46:25] Logged the message, Master [20:46:55] !log reedy Synchronized php-1.25wmf11/extensions/Wikidata: (no message) (duration: 00m 12s) [20:46:58] Logged the message, Master [20:47:03] hoo: ^^ [20:47:24] Thanks :) [20:47:57] (03CR) 10Reedy: [C: 032] wikipedias to 1.25wmf10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177294 (owner: 10Reedy) [20:48:07] (03Merged) 10jenkins-bot: wikipedias to 1.25wmf10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177294 (owner: 10Reedy) [20:49:01] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf10 [20:49:04] Logged the message, Master [20:49:24] (03CR) 10Reedy: [C: 032] group0 to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177295 (owner: 10Reedy) [20:49:31] (03Merged) 10jenkins-bot: group0 to 1.25wmf11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177295 (owner: 10Reedy) [20:49:59] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf11 [20:50:01] Logged the message, Master [20:50:40] MaxSem: can https://gerrit.wikimedia.org/r/#/c/177288/ go out? [20:51:23] hoo: which config did you want? both of your outstanding ones? 
[20:51:33] RECOVERY - Apache HTTP on mw1177 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 2.981 second response time [20:51:44] Reedy: Second one is a clean up noop [20:51:49] feel free to sync it or not [20:51:54] still, might aswell get it merged though :) [20:52:05] (03PS2) 10Reedy: Enable displayStatementsOnProperties for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177182 (owner: 10Hoo man) [20:52:13] (03CR) 10Reedy: [C: 032] Enable displayStatementsOnProperties for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177182 (owner: 10Hoo man) [20:52:23] (03Merged) 10jenkins-bot: Enable displayStatementsOnProperties for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177182 (owner: 10Hoo man) [20:52:43] (03PS2) 10Reedy: Simplify Wikibase configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177183 (owner: 10Hoo man) [20:52:50] (03CR) 10Reedy: [C: 032] Simplify Wikibase configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177183 (owner: 10Hoo man) [20:52:59] (03Merged) 10jenkins-bot: Simplify Wikibase configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177183 (owner: 10Hoo man) [20:54:01] (03PS2) 10Reedy: Set the redis job queue daemonized flag in labs too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177275 (owner: 10Aaron Schulz) [20:54:17] (03CR) 10Reedy: [C: 032] Set the redis job queue daemonized flag in labs too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177275 (owner: 10Aaron Schulz) [20:54:24] (03Merged) 10jenkins-bot: Set the redis job queue daemonized flag in labs too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177275 (owner: 10Aaron Schulz) [20:55:09] Reedy, yes it can [20:55:36] (03CR) 10Reedy: wgHooks['SpecialVersionVersionUrl']: support alpha version (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177151 (owner: 10BryanDavis) [20:55:59] (03PS2) 10Reedy: Enable $wgMFUseWikibaseDescription 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/177288 (owner: 10MaxSem) [20:56:04] (03CR) 10Reedy: [C: 032] Enable $wgMFUseWikibaseDescription [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177288 (owner: 10MaxSem) [20:56:15] (03Merged) 10jenkins-bot: Enable $wgMFUseWikibaseDescription [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177288 (owner: 10MaxSem) [20:56:53] (03PS3) 10Reedy: multiversion: Set prepend-autoload: false, optimize-autoload: true for composer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176994 (owner: 10Legoktm) [20:57:02] (03CR) 10Reedy: [C: 032] multiversion: Set prepend-autoload: false, optimize-autoload: true for composer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176994 (owner: 10Legoktm) [20:57:11] (03Merged) 10jenkins-bot: multiversion: Set prepend-autoload: false, optimize-autoload: true for composer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/176994 (owner: 10Legoktm) [20:59:33] (03PS2) 10Reedy: beta: Fix unset $lang from MWMultiVersion::setSiteInfoForWiki() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177157 (owner: 10BryanDavis) [20:59:58] (03PS3) 10Reedy: beta: Fix unset $lang from MWMultiVersion::setSiteInfoForWiki() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177157 (owner: 10BryanDavis) [21:00:05] gwicke, cscott, arlolra, subbu: Respected human, time to deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141203T2100). Please do the needful. [21:00:11] (03CR) 10Reedy: [C: 032] beta: Fix unset $lang from MWMultiVersion::setSiteInfoForWiki() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177157 (owner: 10BryanDavis) [21:01:13] (03Merged) 10jenkins-bot: beta: Fix unset $lang from MWMultiVersion::setSiteInfoForWiki() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177157 (owner: 10BryanDavis) [21:01:13] jouncebot, greg-g: yup, ocg/parsoid deploy time! yay! [21:01:44] what? 
[21:02:11] greg-g: i'm just confirming that i'm doing the parsoid and ocg deploys in this window [21:02:19] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 07s) [21:02:21] greg-g: nevermind my exuberance [21:02:22] Logged the message, Master [21:02:47] is wmf10 everywhere now? [21:02:48] Reedy: are you still deploying stuff? [21:02:56] cscott: Nope, just done [21:02:57] paravoid: yup [21:02:57] cscott: no need to ping me :) [21:03:05] cscott: enjoy your elation! [21:03:20] paravoid: any particular reason? :) [21:03:33] yeah, I was waiting on a fix that did nothing [21:03:43] https://gerrit.wikimedia.org/r/#/c/174113/ specifically [21:03:44] greg-g: the room title says cmjohnson is on ops duty, but there's no user by that nick here. so you were just my default pingee. [21:03:51] aude: ^^^ [21:03:51] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:03:58] FlaggedRevs seems to go crazy on ruwiki [21:04:12] MWException from line 1931 of /srv/mediawiki/php-1.25wmf10/includes/api/ApiBase.php: Internal error in ApiResult::setElement: Bad parameter [21:06:58] ori: ^ too [21:12:24] <^d> hoo: s/on ruwiki// [21:16:22] (03PS2) 10BryanDavis: wgHooks['SpecialVersionVersionUrl']: support alpha version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177151 [21:17:57] (03PS3) 10BryanDavis: wgHooks['SpecialVersionVersionUrl']: support alpha version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177151 [21:18:25] (03CR) 10BryanDavis: wgHooks['SpecialVersionVersionUrl']: support alpha version (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177151 (owner: 10BryanDavis) [21:20:12] (03CR) 10Reedy: [C: 031] wgHooks['SpecialVersionVersionUrl']: support alpha version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177151 (owner: 10BryanDavis) [21:21:34] Reedy: Want to merge that or should I? [21:22:01] bd808: I don't mind, just wondering about deploying it... 
Not sure what cscott is exactly doing (if anything) that might conflict atm [21:22:14] *nod* fire when ready [21:22:42] Reedy: i'm still working on preparing the parsoid and ocg commits, so go ahead. [21:22:44] cscott: Are we ok to sync CommonSettings.php? [21:22:44] Reedy: just let me know when you're done. [21:22:52] Should only be a minute or 2 :) [21:23:07] (03CR) 10Reedy: [C: 032] wgHooks['SpecialVersionVersionUrl']: support alpha version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177151 (owner: 10BryanDavis) [21:23:14] (03Merged) 10jenkins-bot: wgHooks['SpecialVersionVersionUrl']: support alpha version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177151 (owner: 10BryanDavis) [21:25:07] !log reedy Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s) [21:25:13] Logged the message, Master [21:25:14] bd808: cscott done [21:25:40] Reedy: thx. less log spam in beta \o/ [21:25:45] yay [21:27:47] thanks. starting ocg deploy. [21:27:49] Jeff_Green: hey err who u think I should ask for a dump of the metawiki database? I could make one myself, but would like to avoid the users table, etc... [21:28:04] Jeff_Green: actually, I only want the cn_* tables... [21:28:39] how soon do you need it? [21:29:02] awight: I added a mysqldump wrapper script a while ago [21:29:06] Jeff_Green: todayish would be nice, this is low-normal prio though. [21:29:13] (03PS7) 10Giuseppe Lavagetto: Use hiera to configure udp2log endpoint for ::mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/176191 (owner: 10BryanDavis) [21:29:15] Reedy: ah, lemme take a look at that... [21:29:18] so it works like sql [21:29:21] roughly [21:29:26] at least, it takes care of auth and such [21:29:40] (03CR) 10Giuseppe Lavagetto: [C: 032] Use hiera to configure udp2log endpoint for ::mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/176191 (owner: 10BryanDavis) [21:29:44] sqldump metawiki tablename > table.sql [21:29:45] I think [21:29:52] Reedy: awesome, thanks! 
[21:30:22] ie, it should just take the usual mysqldump parameters after the dbname [21:30:48] Reedy: looks great [21:30:53] hmm, is it missing a shift call? [21:30:55] * Reedy ponders [21:31:34] Reedy: yeah it is :) [21:32:07] Reedy: strangely enuf, it works fine! [21:32:19] <_joe_> bd808: merged [21:32:28] _joe_: thanks [21:32:40] sweet [21:32:49] Reedy: oh I see, the dbname param is reused from "$@" [21:32:57] (03PS11) 10BryanDavis: logstash: Forward syslog events for apache2 + hhvm [puppet] - 10https://gerrit.wikimedia.org/r/176693 [21:33:08] yeah [21:33:31] no, it's right isn't it? just looks a little funny [21:33:34] So I retract my libel wrt the missing "shift" [21:33:37] yeah [21:33:49] maybe a comment saying "you are not the crazy one" [21:34:06] (03CR) 10BryanDavis: "This is working in beta and would give us much needed visibility for hhvm errors in the prod logstash." [puppet] - 10https://gerrit.wikimedia.org/r/176693 (owner: 10BryanDavis) [21:34:12] (btw, thank YuviPanda, I think my kafka graphite stuff is working now) [21:34:22] http://grafana.wikimedia.org/#/dashboard/db/kafkatest [21:34:24] ottomata: \o/ yay [21:34:50] i haven't created the alert yet, i'm going to wait til tomorrow to get more data first [21:35:19] jgage, paravoid, _joe_: If any of you have time to re-review and merge that logstash via rsyslog patch I'd appreciate it -- https://gerrit.wikimedia.org/r/#/c/176693/ [21:35:39] <_joe_> bd808: I can review, but not merge sorry [21:35:50] <_joe_> it's been a looong day :) [21:36:11] _joe_: no worries. I was surprised to find you here still [21:36:20] <_joe_> oh you already have my +1 [21:36:41] hey bblack, yt? [21:36:51] (03PS1) 10Reedy: Add comment about lack of shift call in sqldump script [puppet] - 10https://gerrit.wikimedia.org/r/177376 [21:37:04] i want to do a little poking at varnishkafka code, just noticed that you didn't commit any changes to the debian branch [21:37:04] awight: ^^ want to +1 that as a sanity check? 
:) [21:37:10] you did make a new version, right? [21:37:22] for trusty, witih your couple of recent changes? [21:38:20] PROBLEM - puppet last run on mw1148 is CRITICAL: CRITICAL: Puppet has 1 failures [21:38:57] PROBLEM - puppet last run on mw1022 is CRITICAL: CRITICAL: Puppet has 1 failures [21:39:12] (03CR) 10Awight: [C: 031] "Nice sanity hack!" [puppet] - 10https://gerrit.wikimedia.org/r/177376 (owner: 10Reedy) [21:39:52] PROBLEM - puppet last run on mw1239 is CRITICAL: CRITICAL: Puppet has 1 failures [21:40:34] PROBLEM - puppet last run on mw1032 is CRITICAL: CRITICAL: Puppet has 1 failures [21:40:35] PROBLEM - puppet last run on mw1201 is CRITICAL: CRITICAL: Puppet has 1 failures [21:40:35] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet has 1 failures [21:41:13] <_joe_> mmmh [21:42:01] <_joe_> puppet-merge failure maybe? [21:42:34] <_joe_> nah, whatever [21:42:46] <_joe_> it's still searching for the file where it has the template [21:47:16] ori: do you know if slashes are valid in graphite stat names? [21:48:46] YuviPanda: if I wanted to send an always increasing counter to statsd [21:48:50] what is the proper metric type [21:48:55] i can't tell if |c is correct [21:49:01] from the description [21:49:03] YuviPanda: ^^ [21:49:13] gwicke: don't think they are. [21:49:20] haha, YuviPanda is now the guru on graphite and statsd [21:49:40] YuviPanda: okay [21:49:56] !log updated OCG to version 08e94b19c3f17e699d7e53d9605f65c58e17ea0e [21:49:58] gwicke: graphite stores metric names on the file system, and slashes don't sound good there. [21:49:59] Logged the message, Master [21:50:18] counter looks like statsd expects a value of new counts since last sent to statsd [21:50:19] like [21:50:26] ottomata: I *think* gauges are what you should use? [21:50:27] YuviPanda: somebody could have handed them some percent encoding method back when,.. [21:50:29] they're arbitrary values. [21:50:36] gwicke: heh, true, but it's still ugly. 
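(Editor's sketch, not part of the log.) The remark that graphite stores metric names on the file system can be made concrete: carbon's whisper backend keeps one `.wsp` file per metric, with each dot-separated component becoming a directory. The sketch below shows that mapping and why a slash inside a component would be a problem. The storage root `/var/lib/carbon/whisper` is an assumption (the log only establishes `/var/lib/carbon`), and `whisper_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def whisper_path(metric: str, root: str = "/var/lib/carbon/whisper") -> PurePosixPath:
    """Map a dotted metric name to the whisper file that would hold it.

    The default root is an assumed location; adjust to the real
    carbon storage directory.
    """
    parts = metric.split(".")
    for part in parts:
        # An empty component or one containing '/' would silently
        # change the directory layout, so reject it.
        if not part or "/" in part:
            raise ValueError(f"unsafe metric component: {part!r}")
    return PurePosixPath(root, *parts[:-1], parts[-1] + ".wsp")


print(whisper_path("kafka.analytics1022_eqiad_wmnet_9999.kafka.server.rate"))
```

The same mapping is what makes "rm the metrics from storage" possible: deleting the file or directory removes the stored series.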
[21:50:46] hmm, i suppose so, hm [21:50:52] hm [21:50:52] hm [21:51:06] i thought gauges would be for things like flucuating values [21:51:07] up and down [21:51:15] but i suppose it would work for just up too [21:51:23] hm [21:51:39] counters count [21:51:45] ottomata: it looks like the closest to what you want? [21:51:50] gauges set a value [21:52:03] ottomata: I might also be totally wrong :) godog and ori might be able to answer *this* one better. [21:52:11] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [21:52:14] ottomata: https://github.com/etsy/statsd/blob/master/docs/metric_types.md [21:52:33] ja i am reading that gwicke [21:52:35] RECOVERY - puppet last run on mw1032 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:52:37] so, counters sum, is what you mean [21:52:46] each counter value received is added to the current value [21:52:47] ? [21:53:02] yup [21:53:09] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:53:57] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:54:21] RECOVERY - puppet last run on mw1022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:55:45] RECOVERY - puppet last run on mw1201 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:58:31] (03CR) 10Chad: [C: 032] Moved sampling to the profiler config itself [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175891 (owner: 10Aaron Schulz) [21:58:46] (03Merged) 10jenkins-bot: Moved sampling to the profiler config itself [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175891 (owner: 10Aaron Schulz) [21:58:48] (03CR) 10Chad: [C: 032] Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173472 (owner: 10Aaron Schulz) 
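(Editor's sketch, not part of the log.) Since statsd counters sum whatever they receive, an ever-increasing total (like the JMX "messages ever sent" count) would have to be turned into per-interval increments before being sent as `|c`; otherwise a gauge is the simpler fit, as suggested above. A minimal sketch of that conversion, with a hypothetical class name and an assumed policy for the first sample and for resets:

```python
from typing import Optional


class TotalToDelta:
    """Turn a monotonically increasing total into per-sample deltas."""

    def __init__(self) -> None:
        self._last: Optional[int] = None

    def update(self, total: int) -> int:
        if self._last is None:
            # Assumption: no baseline yet, so report nothing for the first sample.
            delta = 0
        elif total < self._last:
            # Assumption: the source restarted and its total began again from zero.
            delta = total
        else:
            delta = total - self._last
        self._last = total
        return delta


tracker = TotalToDelta()
for sample in (100, 130, 130, 10):
    print(sample, "->", tracker.update(sample))
```

Each returned delta is what would be sent as the counter value for that interval.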
[21:58:51] (03CR) 10jenkins-bot: [V: 04-1] Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173472 (owner: 10Aaron Schulz) [22:00:04] ebernhardson: Dear anthropoid, the time has come. Please deploy Wikipedia Zero sublet to Flow team (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141203T2200). [22:02:47] (03PS5) 10Chad: Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173472 (owner: 10Aaron Schulz) [22:03:06] (03CR) 10Aaron Schulz: [C: 031] Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173472 (owner: 10Aaron Schulz) [22:03:35] (03CR) 10Chad: [C: 032] Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173472 (owner: 10Aaron Schulz) [22:03:42] (03Merged) 10jenkins-bot: Enable xhprof in labs, testwiki, and with ?forceprofile anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173472 (owner: 10Aaron Schulz) [22:04:54] !log demon Synchronized wmf-config/StartProfiler.php: xhprof & such (duration: 00m 05s) [22:04:56] Logged the message, Master [22:05:54] Jeff_Green: sorry, to close the loop, I managed to dump the tables I needed using Reedy's script. [22:06:19] <^d> AaronS: new profiling conf looking good in prod afaict. 
[22:06:30] awight: great [22:06:58] hehe, whenever ^d says something, I look up^^ [22:08:18] <^d> awight: I feel like there's a song for this :) [22:08:48] :p [22:10:44] (03CR) 10EBernhardson: [C: 032] Enable flow on officewiki NS_PROJECT_TALK [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177166 (owner: 10EBernhardson) [22:10:56] (03Merged) 10jenkins-bot: Enable flow on officewiki NS_PROJECT_TALK [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177166 (owner: 10EBernhardson) [22:11:51] !log ebernhardson Synchronized wmf-config/: Flow enable NS_PROJECT_TALK on officewiki (duration: 00m 07s) [22:11:53] Logged the message, Master [22:15:08] (03PS1) 10Cscott: Allow OCG to read the contents of its output/temp/postmortem directories. [puppet] - 10https://gerrit.wikimedia.org/r/177384 [22:15:45] hi Reedy [22:15:54] is there any way I can run scap manually just on mw1177? [22:15:57] bd808: ^ [22:16:06] would running scap on that host have the same effect? [22:16:07] sync-common [22:16:10] sync... [22:16:11] ^ [22:16:13] I think it didn't quite deploy mw properly [22:16:26] yeah, just log onto it and run sync-common [22:16:29] right [22:16:31] doing [22:16:43] this gonna take a while isn't it? :) [22:17:02] shouldn't really [22:17:04] 5-10 at the most [22:17:23] (03PS2) 10Yuvipanda: Tool labs: webservice2 syntax for bigbrother [puppet] - 10https://gerrit.wikimedia.org/r/177262 (owner: 10coren) [22:17:45] (03CR) 10Yuvipanda: [C: 032] "Looks ok to my very limited knowledge of perl." [puppet] - 10https://gerrit.wikimedia.org/r/177262 (owner: 10coren) [22:18:55] RECOVERY - HHVM rendering on mw1177 is OK: HTTP OK: HTTP/1.1 200 OK - 69233 bytes in 4.521 second response time [22:19:21] Reedy: wah, only 30s [22:19:47] bd808: I guess l10n cache needs building on that machine too? [22:19:53] it's brand new. [22:20:02] scap-rebuild-cdbs [22:20:03] I think [22:20:04] there's a flag for that [22:20:18] --no-update-l10n Do not update l10n cache files. 
[22:20:19] hmm [22:20:24] YuviPanda: it might've been up to date then [22:20:42] well, it was throwing MW errors before [22:20:43] and isn't now [22:21:01] (03PS1) 10Ottomata: Remove trailing spaces in logster.py [debs/logster] - 10https://gerrit.wikimedia.org/r/177388 [22:21:02] profit! [22:21:24] (03CR) 10Ottomata: [C: 032 V: 032] Remove trailing spaces in logster.py [debs/logster] - 10https://gerrit.wikimedia.org/r/177388 (owner: 10Ottomata) [22:21:39] (03PS1) 10Ottomata: Fix statsd_submit method parameter [debs/logster] - 10https://gerrit.wikimedia.org/r/177389 [22:21:55] YuviPanda, another graphite question: is it possible to archive / delete old graphite entries? [22:21:57] (03CR) 10Ottomata: [C: 032] Fix statsd_submit method parameter [debs/logster] - 10https://gerrit.wikimedia.org/r/177389 (owner: 10Ottomata) [22:21:58] !log updated Parsoid to version 733986a6 [22:22:03] (03CR) 10Ottomata: [V: 032] Fix statsd_submit method parameter [debs/logster] - 10https://gerrit.wikimedia.org/r/177389 (owner: 10Ottomata) [22:22:03] Logged the message, Master [22:22:20] gwicke: manually, yes. I've a script that almost does it as well. [22:22:36] gwicke: you need to get on the machine and just move the files around and restart txstatsd and it's all good. [22:22:55] YuviPanda: okay, thanks! [22:23:02] gwicke: yw! [22:23:12] I'll ping you once we are happy with the format [22:23:18] :) [22:23:19] ok [22:23:28] gwicke: if I'm not around, just file on phab and cc me [22:23:44] YuviPanda: yup, will do. Thanks! [22:23:56] gwicke: :) [22:24:27] Reedy: bd808 hmm, so I rebuilt cdbs as well, and it didn't actually seem to touch anything [22:24:41] YuviPanda: It might've just been nearly up to date [22:24:42] Reedy: bd808 is there more testing I should do before re-pooling it? [22:24:54] You've run apache-fast test against a couple of urls? [22:24:56] * YuviPanda is embarassingly nooby about our actual mw deployment setup. 
[22:24:58] yeah [22:25:00] and they seem ok [22:25:15] I think it'll be fine then [22:25:21] alright [22:25:39] PROBLEM - Parsoid on wtp1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:25:40] We need the hhvm logstash bits merged and deployed so we can see things when they blow up [22:25:51] !log repooled mw1177 [22:25:53] Logged the message, Master [22:27:01] (03PS5) 10Yuvipanda: facilities: move to module [puppet] - 10https://gerrit.wikimedia.org/r/176863 (owner: 10Dzahn) [22:28:27] RECOVERY - Parsoid on wtp1004 is OK: HTTP OK: HTTP/1.1 200 OK - 1108 bytes in 0.007 second response time [22:29:42] (03CR) 10Yuvipanda: [C: 032] facilities: move to module [puppet] - 10https://gerrit.wikimedia.org/r/176863 (owner: 10Dzahn) [22:33:10] oh! [22:33:22] i thought all varnishes had been upgraded to trusty [22:33:23] guess not.. [22:33:23] hm [22:34:30] (03PS1) 10Ottomata: Bump version to 0.8 [debs/logster] - 10https://gerrit.wikimedia.org/r/177394 [22:34:45] (03CR) 10Ottomata: [C: 032 V: 032] Bump version to 0.8 [debs/logster] - 10https://gerrit.wikimedia.org/r/177394 (owner: 10Ottomata) [22:36:48] csteipp: can i poke you about https://gerrit.wikimedia.org/r/177384 ? apparmor is preventing OCG from garbage collecting its cache directories. [22:38:38] (03CR) 10CSteipp: [C: 031] "I thought they fixed where /** didn't refer to the directory itself. But yeah, if this works, no harm." 
[puppet] - 10https://gerrit.wikimedia.org/r/177384 (owner: 10Cscott) [22:38:50] cscott: You have to get someone in ops to merge it [22:39:16] csteipp: the log entry was: [22:39:17] Dec 3 22:06:10 ocg1001 kernel: [11677020.109237] type=1400 audit(1417644370.869:5207): apparmor="DENIED" operation="open" profile="/usr/bin/nodejs-ocg" name="/srv/deployment/ocg/output/" pid=28285 comm="nodejs-ocg" requested_mask="r" denied_mask="r" fsuid=997 ouid=997 [22:39:28] hmm, I *could* but I actually have absolutely no experience with apparmor [22:39:29] so it appears that /** doesn't actually refer to the directory itself. alas. [22:39:56] i usually poke _joe_ around this time and he grumbles about being tired and sleepy and past his bedtime [22:40:16] Yeah, I think I hit that exact bug a while back, and there was an apparmor fix for it... but must not have made it. [22:40:16] heh :) [22:43:03] (03CR) 10Cscott: "IRC:" [puppet] - 10https://gerrit.wikimedia.org/r/177384 (owner: 10Cscott) [22:49:10] People can't login: https://nl.wikipedia.org/w/index.php?title=Help:Helpdesk&curid=1728882&diff=42634739&oldid=42629431 [22:49:59] ori, HHVM-related? ^^^ [22:51:02] sjoerddebruin: whats their username? [22:51:20] I think Piero. [22:51:27] Hence his IP-contributions. [22:51:48] (non-sul user, btw.) [22:52:21] "signing up" or "logging in"? [22:52:51] https://en.wikipedia.org/wiki/Special:CentralAuth/Piero :| [22:52:58] Google translate says "signing up"... but I'm not sure if that's accurate [22:53:17] csteipp: logging in [22:54:04] I can see them triggering the auto-migration stuff [22:54:32] but they only make it to idwiki [22:54:42] and then I'm guessing it times out [22:54:53] Auto-migration is on now? [22:55:17] if there are no conflicts [22:55:34] last time this came up the user had like 200 accounts and we just forced them onto a g lobal [22:55:39] onto a global account* [22:57:09] So it's clearly SUL-related? [22:57:12] yes [22:57:20] Okay, going to say that. 
[22:57:29] but why does it involve a 503? [22:58:15] hmm, it should be a 504... [22:59:42] It seems odd that it would time out on so few (relatively) wikis.. 63? [23:00:06] yeah... [23:00:26] to me, sounds more like a hhvm crash [23:00:49] are hhvm logs different from exception/fatal.log? [23:01:14] bd808: ^ [23:01:18] he was saying something about it... [23:01:36] legoktm: hhvm.log [23:02:20] for hhvm stuff that would be equivalent to the apache2.log errors for php5 [23:03:32] no mentions of "CentralAuth" or "Userlogin" in hhvm.log... [23:04:58] YuviPanda: https://gerrit.wikimedia.org/r/#/c/176693/ would help with making hhvm errors more visible (hint hint) [23:05:53] sjoerddebruin: so...ignore what I said earlier. Not sure what it is. [23:06:03] sjoerddebruin: Could you (or the user) file a bug for this? [23:06:11] (03PS12) 10Yuvipanda: logstash: Forward syslog events for apache2 + hhvm [puppet] - 10https://gerrit.wikimedia.org/r/176693 (owner: 10BryanDavis) [23:06:13] bd808: alright... [23:06:44] bd808: I can merge it and babysit for a while if you can help check the logstash side of things. I've not so much as used it ever. [23:07:08] bd808: or if it's too late for you we can do it tomorrow too :) [23:07:11] * YuviPanda has no idea of bd808's TZ [23:07:11] YuviPanda: I can totally handle the logstash bits. I have sudo there and everything [23:07:17] bd808: ah, cool then [23:07:26] 16:00 local time here [23:07:27] (03CR) 10Yuvipanda: [C: 032] logstash: Forward syslog events for apache2 + hhvm [puppet] - 10https://gerrit.wikimedia.org/r/176693 (owner: 10BryanDavis) [23:07:43] bd808: ah, same as PST [23:07:59] forcing puppet run on m1170 [23:08:23] that's one hour later than PST :) [23:08:31] PST is 15:00 local time now. Same as PDT :) [23:08:41] wait what. [23:08:42] oh, right [23:08:44] that. [23:08:45] * greg-g nods [23:08:59] 'tis 3pm, in ugly civilian notation [23:09:11] DST is annoying to those who live in tropics ;) [23:09:19] heh, yeah. 
[23:09:24] <^d> DST is annoying. [23:09:27] <^d> Period. [23:09:28] <^d> Full stop. [23:09:30] I didn't really understand *why* you would want to do that until I was in the UK :) [23:09:33] and even then, bleh. [23:09:54] <^d> I don't understand it and I've never lived near the equator. [23:10:00] YuviPanda: but aren't you one of the weirdo GMT offsets? Like x:45 or x:30 or something [23:10:09] bd808: +0530, yes [23:10:21] bd808: one of these days I want to work for a while from Nepal, just for their +0545 offset [23:10:24] UTC % π [23:10:32] heh [23:10:36] beats! [23:10:47] swatch internet time! [23:11:00] * YuviPanda is young enough to have known of swatch time only from the wiki article [23:11:08] bd808: check logstash to see if events are coming in? [23:11:37] <^d> YuviPanda: China's also fun. Country spans 4 1/2 "standard" timezones. [23:11:42] <^d> Except they only use 1. [23:12:02] heh [23:12:08] that must be fun for people on the extremes [23:12:17] I think TimeZones in general are terrible. [23:12:28] conforming to rules about when you should wake up / work [23:12:43] ^d: I loved that about china when I traveled around there for a month. Guess how hard it was to plan things with train schedules and flight schedules? Not hard at all! [23:12:48] and 'light good, dark bad!' [23:12:54] 'a grue might eat you!' [23:12:54] <^d> YuviPanda: Doesn't bother people in Beijing, Shanghai or other costal places. Considering the whole country uses UTC+8. [23:13:01] <^d> It gets worse the further west you get. [23:13:13] ^d: not surprising, considering china hates the west [23:13:14] * YuviPanda hides [23:13:19] we traveled the west side (Kunming) [23:13:20] <^d> I was about to say :p [23:13:34] * YuviPanda has never been to china. [23:14:11] * greg-g re-looks at the map [23:14:12] YuviPanda: I am seeing lots of input with tcpdump [23:14:15] I guess "west" [23:14:21] <^d> Also, UTC+7 seems very unloved. 
[23:14:33] <^d> http://www.fgienr.net/time-zone/fuseaux.gif [23:14:54] bd808: heh :) [23:15:11] YuviPanda: yup. we haz syslog input -- https://logstash.wikimedia.org/#dashboard/temp/UwFvOkJMSian2EJpfTvTIw [23:15:21] yay [23:15:50] "Notice: Undefined index: name in /srv/mediawiki/php-1.25wmf10/includes/specials/SpecialVersion.php on line 609" [23:16:00] bd808: \o/ cool. [23:16:00] bd808 have you seen this? http://untergeek.com/2012/10/11/using-rsyslog-to-send-pre-formatted-json-to-logstash/ [23:16:08] would like to try it someday [23:16:34] there's an rsyslog elasticsearch output plugin too [23:16:48] so it bypasses logstash? [23:17:05] yeah. just dump right to elastic [23:17:14] bd808: I'm going to be around for another 5-10 mins to make sure nothing immediate blows up :) [23:17:23] hm. as long as tags are consistent i guess that sounds ok [23:17:31] http://www.rsyslog.com/doc/master/configuration/modules/omelasticsearch.html [23:18:11] <^d> Is there a benefit to logstash? [23:18:14] * greg-g takes too long doing this... [23:18:22] our route, only with trains/buses instead of driving: https://goo.gl/maps/7fvJE [23:18:23] <^d> Why not just format your data to elastic and go straight to it? [23:18:44] ^d: It's an input filter (unix pipe style) so you could twist things around if you need to [23:19:16] but if you can make your log endpoint do just what you want logstash doesn't add anything [23:19:54] !log ebernhardson Synchronized php-1.25wmf10/extensions/Flow/includes/Parsoid/: (no message) (duration: 00m 05s) [23:19:58] Logged the message, Master [23:20:19] ori: Is the apache error "AH01070: Error parsing script headers" the fcgi output when the client disconnects? 
[23:21:49] "Notice: Undefined property: stdClass::$cuc_id in /srv/mediawiki/php-1.25wmf10/extensions/Flow/Hooks.php on line 350" [23:22:17] "Notice: Undefined index: 0 in /srv/mediawiki/php-1.25wmf10/languages/Language.php on line 3348" [23:25:13] Hmm, YuviPanda [23:25:16] any idea why this wouldnt' work? [23:25:16] http://graphite.wikimedia.org/render?target=perSecond(varnishkafka.cp1056.bits.varnishkafka.seq.value)&from=-5min&until=now&format=json&maxDataPoints=1370 [23:25:21] perSecond [23:25:22] ? [23:27:07] ottomata: I suspect it's not available in the graphite version we have/ [23:27:08] ? [23:27:26] ottomata: we have 0.9.12 [23:27:44] ottomata: and I don't find 'perSecond' in https://graphite.readthedocs.org/en/0.9.12/functions.html [23:28:06] ottomata: I see it's introduced in 0.10 [23:29:31] awww man [23:29:43] that's the one that would make these ever increasing gauge stats useful [23:30:27] well well, maybe we should upgrade! [23:31:50] ottomata: heh, talk to godog, he's been re-working our graphite/txstatsd setup (at least the hardware) [23:32:15] ottomata: you can combine functions too, maybe ask in the graphite IRC channels? [23:32:34] YuviPanda: # labs-vagrant up [23:32:42] /bin/labs-vagrant:3: undefined method `require_relative' for main:Object (NoMethodError) [23:32:47] Does that look familiar? [23:32:51] (This is on a brand now VM) [23:32:56] andrewbogott: is it precise? [23:33:00] yes [23:33:02] andrewbogott: if so, you need to checkout the precise-compat branch [23:33:12] …ok [23:33:16] *nod* ruby version incompatibility [23:33:18] Could the puppet labs-vagrant class do that? [23:33:27] (03PS1) 10Hoo man: Don't lookup Sites from mc for the 'languageLinkSiteGroup' setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177419 [23:33:39] aude: ^ +1? [23:33:54] bd808: andrewbogott hmm, it probably could, at least for new machines [23:34:19] Looks like that's working, anyway -- thanks! [23:34:27] andrewbogott: cool. 
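[editor's note] On the perSecond question above (23:25 to 23:32): the render call fails because, per the discussion, the installed Graphite 0.9.12 does not ship `perSecond`. A minimal client-side sketch follows, assuming the series is fetched with `format=json` and that its `datapoints` are `[value, timestamp]` pairs. The helper name `per_second` and the sample numbers are hypothetical, and the handling of gaps and counter resets is one reasonable choice, not necessarily what Graphite's own function does.

```python
from typing import List, Optional, Tuple

# (value, unix_timestamp), mirroring the pairs in Graphite's JSON "datapoints".
Point = Tuple[Optional[float], int]


def per_second(datapoints: List[Point]) -> List[Point]:
    """Turn an ever-increasing counter series into a per-second rate.

    Emits None for the first sample, for missing samples, and for
    samples where the counter went backwards (treated as a reset).
    """
    rates: List[Point] = []
    prev_value: Optional[float] = None
    prev_ts: Optional[int] = None
    for value, ts in datapoints:
        if value is None:
            # Missing sample: emit a gap and keep the old baseline.
            rates.append((None, ts))
            continue
        if prev_value is None or value < prev_value or ts <= prev_ts:
            # No usable baseline (first sample, reset, or bad timestamp).
            rates.append((None, ts))
        else:
            rates.append(((value - prev_value) / (ts - prev_ts), ts))
        prev_value, prev_ts = value, ts
    return rates


if __name__ == "__main__":
    # Made-up samples: steady growth, a gap, then a counter reset.
    sample = [(100.0, 0), (160.0, 60), (None, 120),
              (400.0, 180), (10.0, 240), (70.0, 300)]
    for rate, ts in per_second(sample):
        print(ts, rate)
```

A server-side alternative, as hinted at 23:32:15 ("you can combine functions too"), may be to wrap the target as `scaleToSeconds(nonNegativeDerivative(...), 1)`. Whether both functions exist in the deployed version should be checked against the 0.9.12 function list before relying on it.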
[23:34:32] (03CR) 10Aude: [C: 031] "looks fine, it's before we special case it for wikidata and commons" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177419 (owner: 10Hoo man) [23:34:32] there [23:35:00] (03CR) 10Hoo man: [C: 032] Don't lookup Sites from mc for the 'languageLinkSiteGroup' setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177419 (owner: 10Hoo man) [23:35:08] (03Merged) 10jenkins-bot: Don't lookup Sites from mc for the 'languageLinkSiteGroup' setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/177419 (owner: 10Hoo man) [23:35:28] thanks YuviPanda. i'm out for the eve. [23:35:29] laters! [23:35:32] bd808: andrewbogott https://phabricator.wikimedia.org/T76675?workflow=create [23:35:34] ottomata: cya! [23:35:55] !log hoo Synchronized wmf-config/Wikibase.php: Don't lookup Sites from mc for the 'languageLinkSiteGroup' setting (duration: 00m 06s) [23:36:01] Logged the message, Master [23:36:04] :) [23:37:08] PROBLEM - puppet last run on virt1003 is CRITICAL: CRITICAL: Puppet has 1 failures [23:37:35] (03PS1) 10Yuvipanda: labs_vagrant: Check out precise-compat branch for precise hosts [puppet] - 10https://gerrit.wikimedia.org/r/177422 [23:37:38] PROBLEM - puppet last run on db2016 is CRITICAL: CRITICAL: Puppet has 1 failures [23:37:40] andrewbogott: bd808 ^, need to check what it does to hosts that are already on a different branch, though [23:37:49] PROBLEM - puppet last run on mc1012 is CRITICAL: CRITICAL: Puppet has 1 failures [23:38:03] PROBLEM - puppet last run on mc1014 is CRITICAL: CRITICAL: Puppet has 1 failures [23:38:17] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: Puppet has 1 failures [23:38:30] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Puppet has 1 failures [23:38:31] PROBLEM - puppet last run on virt1004 is CRITICAL: CRITICAL: Puppet has 1 failures [23:38:37] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Puppet has 1 failures [23:38:38] PROBLEM - puppet last run on 
elastic1019 is CRITICAL: CRITICAL: Puppet has 1 failures [23:38:52] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Puppet has 1 failures [23:39:09] PROBLEM - puppet last run on dataset1001 is CRITICAL: CRITICAL: Puppet has 1 failures [23:39:09] PROBLEM - puppet last run on db2007 is CRITICAL: CRITICAL: Puppet has 1 failures [23:39:21] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Puppet has 1 failures [23:39:22] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: Puppet has 1 failures [23:39:23] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: Puppet has 1 failures [23:39:25] PROBLEM - puppet last run on mw1247 is CRITICAL: CRITICAL: Puppet has 1 failures [23:39:31] PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: Puppet has 1 failures [23:39:40] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: Puppet has 1 failures [23:39:49] YuviPanda: It shouldn't mess with an existing clone [23:40:00] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:00] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:00] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:05] bd808: yeah, that's what I got from reading git::clone code too. 
[23:40:09] PROBLEM - puppet last run on analytics1022 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:09] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:09] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:09] PROBLEM - puppet last run on mw1098 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:11] bd808: since we don't have ensure => latest set [23:40:22] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:22] YuviPanda: *nod* [23:40:31] PROBLEM - puppet last run on mw1014 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:40] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:43] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:47] bd808: wanna +1? [23:40:49] PROBLEM - puppet last run on search1002 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:49] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Puppet has 1 failures [23:41:09] YuviPanda: you're ops now! just merge it :) [23:41:17] bd808: tch tch :) [23:41:29] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Puppet has 1 failures [23:41:31] bd808: haven't directly broken any prod service yet, so not real ops yet. [23:41:32] (03CR) 10BryanDavis: [C: 031] "Seems like it should work." [puppet] - 10https://gerrit.wikimedia.org/r/177422 (owner: 10Yuvipanda) [23:41:39] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures [23:41:50] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Puppet has 1 failures [23:41:54] I gave you a chance with this hhvm/rsyslog stuff [23:43:21] Reedy: https://logstash.wikimedia.org/#/dashboard/elasticsearch/fatalmonitor has hhvm errors now. 
[23:43:36] sweeet [23:44:00] And there's a ton of bad index access errors to squash [23:44:22] we must not have had php5's error reporting turned up for that [23:44:51] ughwtf Warning: could not unserialize value, no igbinary support [23:44:58] (03PS1) 10Yuvipanda: base: Make number of days acct logs are kept customizable [puppet] - 10https://gerrit.wikimedia.org/r/177427 [23:45:02] bd808: Sanitizer? [23:45:16] Krinkle|detached and I found that (and many others) with the "error" log [23:45:25] I think I filed a bug for it [23:45:31] <^d> MaxSem: we still had shuff igbinaried after all this time? [23:45:36] <^d> We swapped that out *months* ago [23:45:59] * YuviPanda restarts our entire memcached cluster, at once [23:46:03] that should fix it [23:46:13] lol [23:46:15] ^d, I suspect a server might be outta sync again and still poop serialized data into caches [23:46:27] YuviPanda: That's one way to take the site down :) [23:47:01] Reedy: heh :) that's what we have salt for [23:47:02] <^d> MaxSem: So flush all the caches and scap. Ok let's do it. 
[23:47:31] bd808: https://phabricator.wikimedia.org/T75487 [23:47:32] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [23:47:36] well, if it out of sync it doesn't mean that a scap will help [23:47:36] Reedy: They are from all over (Echo, Language, ttmserver, doublewiki, wikipage, tmh) [23:47:50] The echo one has a patch pending I think [23:47:57] RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [23:47:58] RECOVERY - puppet last run on db2007 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [23:48:00] I found it in beta the other day and reported it [23:48:14] I did do a pass on the hhvm logs a month or 2 ago [23:48:17] logging bugs/fixing [23:48:34] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [23:48:38] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:49:10] (03PS2) 10Yuvipanda: base: Make number of days acct logs are kept customizable [puppet] - 10https://gerrit.wikimedia.org/r/177427 [23:49:20] lol ^d, try php5 -i|grep igbinary anywhere [23:49:32] RECOVERY - puppet last run on db2016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:49:32] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:49:32] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:49:32] RECOVERY - puppet last run on search1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:49:43] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [23:49:45] RECOVERY - puppet last run on mc1012 is OK: OK: Puppet is currently 
enabled, last run 2 minutes ago with 0 failures [23:49:49] MaxSem: Yeah, I remember seeing loads of those in the all of the error logs [23:50:08] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [23:50:09] <^d> MaxSem: Works on tin :p [23:50:20] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [23:50:20] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [23:50:29] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:50:35] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [23:50:36] RECOVERY - puppet last run on virt1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:50:48] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:50:55] RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:51:09] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:51:21] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:51:25] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:51:27] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:51:33] RECOVERY - puppet last run on mw1247 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:51:45] RECOVERY - puppet last run on db1036 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:51:45] RECOVERY - puppet last run 
on amssq56 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:52:15] RECOVERY - puppet last run on analytics1022 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:52:15] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:52:15] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:52:16] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:52:16] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:52:17] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:52:25] RECOVERY - puppet last run on virt1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:52:34] RECOVERY - puppet last run on mw1014 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:52:39] (03PS2) 10Yuvipanda: labs_vagrant: Check out precise-compat branch for precise hosts [puppet] - 10https://gerrit.wikimedia.org/r/177422 [23:53:28] bd808: Yet another sign I really need to poke at this logstash and snetry stuff [23:53:48] It would be cool to get that working [23:54:32] half of the hhvm error stream is undefined index warnings [23:54:56] (03CR) 10Yuvipanda: [C: 032] labs_vagrant: Check out precise-compat branch for precise hosts [puppet] - 10https://gerrit.wikimedia.org/r/177422 (owner: 10Yuvipanda) [23:55:18] * Reedy dreams of the days where these issues are reported to phabriactor automatically, and only once [23:56:04] bd808: alright, looks like I didn't become real ops today with that logstash change :) I'm off now [23:56:51] YuviPanda: o/ thanks for the help [23:56:58] \o [23:56:59] night [23:57:02] (03PS1) 10MaxSem: Remove igbinary 
everywhere [puppet] - https://gerrit.wikimedia.org/r/177431 [23:57:09] ^d, ^^^ [23:57:17] <^d> I saw :) [23:57:59] (CR) Chad: [C: 1] Remove igbinary everywhere [puppet] - https://gerrit.wikimedia.org/r/177431 (owner: MaxSem) [23:59:28] (PS1) BryanDavis: Convert rsyslog config to RainerScript-based filters [puppet] - https://gerrit.wikimedia.org/r/177432