[00:02:21] PROBLEM - puppet last run on virt1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:01] PROBLEM - puppet last run on elastic1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:02] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:11] PROBLEM - puppet last run on elastic1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:12] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:12] PROBLEM - puppet last run on mw1157 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:12] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:21] PROBLEM - puppet last run on mw1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:21] PROBLEM - puppet last run on search1019 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:21] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:21] PROBLEM - puppet last run on mw1075 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:22] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:22] PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:31] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:41] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:42] PROBLEM - puppet last run on elastic1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:43] PROBLEM - puppet last run on db1049 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:43] PROBLEM - puppet last run on labsdb1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:43] PROBLEM - puppet last run on mw1095 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:43] PROBLEM - puppet last run on mw1102 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:43] PROBLEM - puppet last run on mw1020 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:43] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:43] PROBLEM - puppet last run on search1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:44] PROBLEM - puppet last run on mw1058 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:44] PROBLEM - puppet last run on virt1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:45] PROBLEM - puppet last run on mw1085 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:45] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:46] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:47] PROBLEM - puppet last run on db1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:47] PROBLEM - puppet last run on db1058 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:47] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:48] PROBLEM - puppet last run on lanthanum is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:49] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:52] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:52] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:03:52] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:01] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:01] PROBLEM - puppet last run on mw1070 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:02] PROBLEM - puppet last run on db1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:02] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:11] PROBLEM - puppet last run on es1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:11] PROBLEM - puppet last run on search1021 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:12] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:13] PROBLEM - puppet last run on ms-be1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:13] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:13] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:31] PROBLEM - puppet last run on wtp1014 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:35] PROBLEM - puppet last run on erbium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:35] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:35] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:35] PROBLEM - puppet last run on db1019 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:35] PROBLEM - puppet last run on analytics1024 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:36] PROBLEM - puppet last run on ssl1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:36] PROBLEM - puppet last run on nickel is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:37] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:41] PROBLEM - puppet last run on analytics1029 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:41] PROBLEM - puppet last run on es1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:42] PROBLEM - puppet last run on db1041 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:42] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:42] PROBLEM - puppet last run on praseodymium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:42] PROBLEM - puppet last run on mw1094 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:42] PROBLEM - puppet last run on es7 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:42] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:43] PROBLEM - puppet last run on amslvs4 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:51] PROBLEM - puppet last run on cp1057 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:51] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:51] PROBLEM - puppet last run on elastic1016 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:52] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:52] PROBLEM - puppet last run on mw1083 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:04:52] PROBLEM - puppet last run on tmh1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:05] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:05] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:05] PROBLEM - puppet last run on mw1035 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:05] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:16] PROBLEM - puppet last run on search1014 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:16] PROBLEM - puppet last run on search1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:17] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:31] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:31] apt timeout
[00:05:31] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:31] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:31] PROBLEM - puppet last run on mw1036 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:31] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:32] PROBLEM - puppet last run on es10 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:32] PROBLEM - puppet last run on db2033 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:33] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:33] PROBLEM - puppet last run on analytics1019 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:41] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:41] PROBLEM - puppet last run on mw1096 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:41] PROBLEM - puppet last run on cp1069 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:42] PROBLEM - puppet last run on es4 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:42] PROBLEM - puppet last run on mw1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:05:57] RECOVERY - puppet last run on db1009 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[00:05:57] PROBLEM - puppet last run on mc1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:03] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:03] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:11] PROBLEM - puppet last run on zirconium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:11] PROBLEM - puppet last run on mc1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:12] PROBLEM - puppet last run on db1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:21] PROBLEM - puppet last run on mw1062 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:21] PROBLEM - puppet last run on wtp1024 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:21] PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:22] PROBLEM - puppet last run on db1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:22] PROBLEM - puppet last run on ytterbium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:31] PROBLEM - puppet last run on cp1043 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:37] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:39] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:39] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:41] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:51] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:52] PROBLEM - puppet last run on labsdb1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:52] PROBLEM - puppet last run on wtp1021 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:06:52] PROBLEM - puppet last run on es1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:01] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:11] PROBLEM - puppet last run on mw1109 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:11] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:12] PROBLEM - puppet last run on ssl3003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:15] * RD calls Houston
[00:07:21] PROBLEM - puppet last run on mw1132 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:27] PROBLEM - puppet last run on search1020 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:41] PROBLEM - puppet last run on mw1147 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:41] PROBLEM - puppet last run on tarin is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:41] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:42] PROBLEM - puppet last run on mw1124 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:42] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:07:52] PROBLEM - puppet last run on mw1040 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:08:11] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:01] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:02] PROBLEM - puppet last run on virt1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:02] PROBLEM - puppet last run on wtp1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:11] PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:11] PROBLEM - puppet last run on mc1012 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:11] PROBLEM - puppet last run on db2007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:11] PROBLEM - puppet last run on virt1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:12] PROBLEM - puppet last run on mw1055 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:12] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:21] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:24] Caught this on db1009 during another burst of failures:
[00:13:25] Err http://security.ubuntu.com precise-security Release.gpg Connection failed [IP: 208.80.154.10 8080]
[00:13:29] PROBLEM - puppet last run on pc1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:29] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:29] PROBLEM - puppet last run on es1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:35] PROBLEM - puppet last run on dataset1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:35] PROBLEM - puppet last run on mw1044 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:42] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:42] PROBLEM - puppet last run on mw1076 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:43] PROBLEM - puppet last run on wtp1012 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:43] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:43] PROBLEM - puppet last run on antimony is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:43] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:43] PROBLEM - puppet last run on polonium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:44] PROBLEM - puppet last run on mw1149 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:51] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:51] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Puppet has 1 failures
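The Release.gpg error above is the visible proximate cause: apt on these hosts fetches through an HTTP proxy at 208.80.154.10:8080, and that connection is failing, which in turn fails the apt resource in each host's Puppet run. A minimal repro sketch of the failing fetch follows; the proxy address and mirror host are taken from the error line itself, while the exact Release.gpg path and the 10-second timeout are illustrative assumptions.

# Hypothetical repro sketch (Python 3): fetch the same Release.gpg through the
# same HTTP proxy apt uses, so a timeout here matches "Connection failed" above.
import urllib.request

URL = "http://security.ubuntu.com/ubuntu/dists/precise-security/Release.gpg"  # path assumed from the apt error
PROXY = {"http": "http://208.80.154.10:8080"}  # proxy IP:port taken from the log line

opener = urllib.request.build_opener(urllib.request.ProxyHandler(PROXY))
try:
    with opener.open(URL, timeout=10) as resp:  # 10 s timeout is illustrative
        print("OK: HTTP", resp.status, "-", len(resp.read()), "bytes")
except OSError as exc:  # URLError and socket timeouts are both OSError subclasses
    print("FAILED, consistent with the apt 'Connection failed' symptom:", exc)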
[00:13:51] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:52] PROBLEM - puppet last run on virt1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:52] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:52] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:52] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:53] PROBLEM - puppet last run on search1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:13:53] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:01] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:01] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:01] PROBLEM - puppet last run on labstore1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:01] PROBLEM - puppet last run on mw1111 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:01] PROBLEM - puppet last run on mc1014 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:02] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:14] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:14] PROBLEM - puppet last run on nfs1 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:14] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:14] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:14] PROBLEM - puppet last run on db1016 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:15] PROBLEM - puppet last run on db2016 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:15] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:16] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:16] PROBLEM - puppet last run on analytics1022 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:17] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:17] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:18] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:23] PROBLEM - puppet last run on mw1049 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:23] PROBLEM - puppet last run on mw1133 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:23] PROBLEM - puppet last run on mw1098 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:23] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:23] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:24] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:31] PROBLEM - puppet last run on gadolinium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:31] PROBLEM - puppet last run on mw1051 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:31] PROBLEM - puppet last run on ssl1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:31] PROBLEM - puppet last run on search1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:31] PROBLEM - puppet last run on mw1079 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:32] PROBLEM - puppet last run on snapshot1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:41] PROBLEM - puppet last run on rubidium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:41] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:41] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:42] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:42] PROBLEM - puppet last run on db2037 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:51] PROBLEM - puppet last run on analytics1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:52] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:52] PROBLEM - puppet last run on virt1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:52] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:52] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:52] PROBLEM - puppet last run on thallium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:53] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:53] PROBLEM - puppet last run on analytics1026 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:54] PROBLEM - puppet last run on elastic1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:14:54] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:01] PROBLEM - puppet last run on mw1014 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:14] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:14] PROBLEM - puppet last run on mw1084 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:14] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:14] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:14] PROBLEM - puppet last run on mw1151 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:15] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:15] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:16] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:16] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:17] PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:17] PROBLEM - puppet last run on rdb1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:18] PROBLEM - puppet last run on mw1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:18] PROBLEM - puppet last run on mc1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:19] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:19] PROBLEM - puppet last run on db2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:20] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:20] PROBLEM - puppet last run on db72 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:21] PROBLEM - puppet last run on db2023 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:21] PROBLEM - puppet last run on db1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:22] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:22] PROBLEM - puppet last run on mw1180 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:23] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:23] PROBLEM - puppet last run on mw1146 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:24] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:24] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:25] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:25] PROBLEM - puppet last run on mw1081 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:26] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:26] PROBLEM - puppet last run on mw1156 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:27] PROBLEM - puppet last run on analytics1023 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:27] PROBLEM - puppet last run on wtp1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:31] PROBLEM - puppet last run on search1017 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:31] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:31] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:31] PROBLEM - puppet last run on wtp1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:31] PROBLEM - puppet last run on mw1056 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:41] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:45] PROBLEM - puppet last run on ssl1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:46] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:51] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:53] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:56] PROBLEM - puppet last run on db2004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:56] PROBLEM - puppet last run on ssl3002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:57] PROBLEM - puppet last run on mw1034 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:57] PROBLEM - puppet last run on analytics1037 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:57] PROBLEM - puppet last run on argon is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:57] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:15:57] PROBLEM - puppet last run on snapshot1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:01] PROBLEM - puppet last run on db69 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:01] PROBLEM - puppet last run on ms-be1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:01] PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:01] PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:01] PROBLEM - puppet last run on mw1074 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:02] PROBLEM - puppet last run on search1024 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:02] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:03] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:03] PROBLEM - puppet last run on ms-be1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:11] PROBLEM - puppet last run on mw1057 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:11] PROBLEM - puppet last run on mw1159 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:11] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:11] PROBLEM - puppet last run on mw1030 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:11] PROBLEM - puppet last run on amssq42 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:12] PROBLEM - puppet last run on wtp1022 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:12] PROBLEM - puppet last run on elastic1011 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:21] PROBLEM - puppet last run on mw1050 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:21] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:21] PROBLEM - puppet last run on elastic1014 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:21] PROBLEM - puppet last run on db1062 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:22] PROBLEM - puppet last run on mc1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:23] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:41] PROBLEM - puppet last run on hooft is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:41] PROBLEM - puppet last run on analytics1032 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:42] PROBLEM - puppet last run on ssl1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:42] PROBLEM - puppet last run on wtp1023 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:42] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:42] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:42] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:43] PROBLEM - puppet last run on search1023 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:43] PROBLEM - puppet last run on ms-be1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:51] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:51] RECOVERY - puppet last run on db1068 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[00:16:51] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:51] PROBLEM - puppet last run on analytics1011 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:52] PROBLEM - puppet last run on db1057 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:52] PROBLEM - puppet last run on protactinium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:52] PROBLEM - puppet last run on lvs1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:53] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:16:53] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:01] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:01] PROBLEM - puppet last run on mc1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:01] PROBLEM - puppet last run on mw1087 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:01] PROBLEM - puppet last run on db1055 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:01] PROBLEM - puppet last run on mw1116 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:11] PROBLEM - puppet last run on mw1032 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:15] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:15] PROBLEM - puppet last run on db1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:16] PROBLEM - puppet last run on elastic1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:16] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:16] PROBLEM - puppet last run on db2003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:21] PROBLEM - puppet last run on wtp1011 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:21] PROBLEM - puppet last run on rcs1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:21] PROBLEM - puppet last run on es1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:28] PROBLEM - puppet last run on mw1097 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:28] PROBLEM - puppet last run on mw1023 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:28] PROBLEM - puppet last run on db1054 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:31] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:41] RECOVERY - puppet last run on mw1157 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[00:17:41] PROBLEM - puppet last run on wtp1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:41] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:41] RECOVERY - puppet last run on virt1005 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[00:17:42] RECOVERY - puppet last run on search1019 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[00:17:42] PROBLEM - puppet last run on elastic1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:42] PROBLEM - puppet last run on titanium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:43] RECOVERY - puppet last run on mw1075 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[00:17:43] PROBLEM - puppet last run on rcs1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:44] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:44] RECOVERY - puppet last run on ms-be1014 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[00:17:45] PROBLEM - puppet last run on analytics1014 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:52] PROBLEM - puppet last run on db1063 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:52] PROBLEM - puppet last run on mw1033 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:53] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:53] PROBLEM - puppet last run on amslvs3 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:53] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:53] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:53] RECOVERY - puppet last run on labsdb1002 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[00:17:54] RECOVERY - puppet last run on db1049 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[00:17:54] PROBLEM - puppet last run on mw1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:55] PROBLEM - puppet last run on rdb1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:17:55] PROBLEM - puppet last run on ocg1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:01] RECOVERY - puppet last run on search1008 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[00:18:11] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[00:18:11] RECOVERY - puppet last run on mw1102 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[00:18:11] RECOVERY - puppet last run on mw1085 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[00:18:11] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[00:18:11] PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:12] RECOVERY - puppet last run on lanthanum is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[00:18:12] RECOVERY - puppet last run on cp1064 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[00:18:13] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[00:18:13] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[00:18:14] PROBLEM - puppet last run on search1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:14] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[00:18:15] RECOVERY - puppet last run on mw1070 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[00:18:15] PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:16] PROBLEM - puppet last run on mw1122 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:16] PROBLEM - puppet last run on pc1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:21] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[00:18:21] RECOVERY - puppet last run on elastic1010 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[00:18:21] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[00:18:22] PROBLEM - puppet last run on mc1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:22] PROBLEM - puppet last run on db1070 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:22] RECOVERY - puppet last run on db1005 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[00:18:22] RECOVERY - puppet last run on cp4015 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[00:18:23] PROBLEM - puppet last run on db1030 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:31] RECOVERY - puppet last run on elastic1013 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[00:18:32] PROBLEM - puppet last run on db2011 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:32] PROBLEM - puppet last run on mw1167 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:32] PROBLEM - puppet last run on search1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:32] PROBLEM - puppet last run on mw1105 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:32] PROBLEM - puppet last run on mw1148 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:33] PROBLEM - puppet last run on ssl1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:33] PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:34] RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[00:18:34] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[00:18:35] PROBLEM - puppet last run on mw1186 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:35] PROBLEM - puppet last run on db1064 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:49] PROBLEM - puppet last run on wtp1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:49] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[00:18:49] RECOVERY - puppet last run on mw1015 is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures
[00:18:51] PROBLEM - puppet last run on db1044 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:51] PROBLEM - puppet last run on mw1024 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:51] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[00:18:51] PROBLEM - puppet last run on mw1077 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:51] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[00:18:52] PROBLEM - puppet last run on mw1121 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:52] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:53] PROBLEM - puppet last run on sanger is CRITICAL: CRITICAL: Puppet has 1 failures
[00:18:53] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures
[00:19:02] PROBLEM - puppet last run on nitrogen is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:02] PROBLEM - puppet last run on mw1022 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:02] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[00:19:02] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[00:19:02] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[00:19:11] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[00:19:11] PROBLEM - puppet last run on mw1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:11] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:11] RECOVERY - puppet last run on cp1057 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[00:19:11] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[00:19:12] PROBLEM - puppet last run on mw1043 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:12] PROBLEM - puppet last run on iodine is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:13] PROBLEM - puppet last run on ssl1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:13] PROBLEM - puppet last run on mercury is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:23] PROBLEM - puppet last run on mw1139 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:23] PROBLEM - puppet last run on db1047 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:23] PROBLEM - puppet last run on ssl1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:23] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:23] PROBLEM - puppet last run on db2012 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:24] PROBLEM - puppet last run on mw1201 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:24] PROBLEM - puppet last run on db1033 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:39] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:39] PROBLEM - puppet last run on mw1108 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:39] PROBLEM - puppet last run on mw1093 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:39] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:39] PROBLEM - puppet last run on calcium is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:40] PROBLEM - puppet last run on ms-fe3001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:41] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:51] PROBLEM - puppet last run on search1022 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:51] PROBLEM - puppet last run on analytics1028 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:51] RECOVERY - puppet last run on wtp1014 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[00:19:51] PROBLEM - puppet last run on mw1091 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:51] PROBLEM - puppet last run on ms-be1011 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:52] PROBLEM - puppet last run on mw1219 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:52] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:53] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[00:19:53] PROBLEM - puppet last run on cp1037 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:54] RECOVERY - puppet last run on analytics1029 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[00:19:54] PROBLEM - puppet last run on search1011 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:19:55] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:07] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:07] PROBLEM - puppet last run on db1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:07] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:08] PROBLEM - puppet last run on analytics1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:11] PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:11] RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[00:20:11] PROBLEM - puppet last run on mw1016 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:11] PROBLEM - puppet last run on stat1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:21] RECOVERY - puppet last run on elastic1016 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[00:20:21] PROBLEM - puppet last run on mw1209 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:21] PROBLEM - puppet last run on db2028 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:21] PROBLEM - puppet last run on virt1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:22] PROBLEM - puppet last run on cp1040 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:22] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:25] PROBLEM - puppet last run on dbstore1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:25] PROBLEM - puppet last run on db2030 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:25] PROBLEM - puppet last run on db1038 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:31] PROBLEM - puppet last run on es1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:31] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[00:20:31] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:31] RECOVERY - puppet last run on es1003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[00:20:31] PROBLEM - puppet last run on mw1152 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:51] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[00:20:51] RECOVERY - puppet last run on wtp1024 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[00:20:52] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:53] RECOVERY - puppet last run on db1010 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[00:20:53] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[00:20:53] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:20:53] RECOVERY - puppet last run on db1019 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[00:20:53] RECOVERY - puppet last run on ssl1004 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[00:20:54] RECOVERY - puppet last run on cp1043 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[00:20:54] RECOVERY - puppet last run on analytics1024 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[00:21:02] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[00:21:02] RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[00:21:02] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:21:02] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[00:21:03] RECOVERY - puppet last run on tarin is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[00:21:03] RECOVERY - puppet last run on es10 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[00:21:03] RECOVERY - puppet last run on analytics1019 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[00:21:04] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[00:21:04] RECOVERY - puppet last run on db2033 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[00:21:12] RECOVERY - puppet last run on db1041 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[00:21:12] RECOVERY - puppet last run on es1005 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[00:21:12] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[00:21:12] RECOVERY - puppet last run on mw1095 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[00:21:12] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[00:21:13] RECOVERY - puppet last run on cp1069 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[00:21:13] RECOVERY - puppet last run on wtp1021 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[00:21:14] RECOVERY - puppet last run on mw1096 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[00:21:14] RECOVERY - puppet last run on mw1094 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[00:21:15] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[00:21:15] RECOVERY - puppet last run on labsdb1001 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[00:21:16] RECOVERY - puppet last run on virt1009 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[00:21:16] RECOVERY - puppet last run on es7 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[00:21:17] RECOVERY - puppet last run on mw1013 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[00:21:17] RECOVERY - puppet last run on mw1040 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[00:21:18] RECOVERY - puppet last run on es4 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[00:21:18] RECOVERY - puppet last run on db1058 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[00:21:21] RECOVERY - puppet last run on es1009 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[00:21:21] RECOVERY - puppet last run on mc1008 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[00:21:21] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[00:21:21] RECOVERY - puppet last run on mw1083 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[00:21:21] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[00:21:37] RECOVERY - puppet last run on tmh1002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[00:21:39] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[00:21:39] RECOVERY - puppet last run on search1009 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[00:21:40] RECOVERY - puppet last run on search1014 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[00:21:40] RECOVERY - puppet last run on search1021 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[00:21:40] RECOVERY - puppet last run on ssl3003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[00:21:47] RECOVERY - puppet last run on ms-be1005 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[00:21:50] RECOVERY - puppet last run on mw1062 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[00:21:50] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[00:21:50] RECOVERY - puppet last run on mc1010 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[00:21:50] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[00:21:50] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[00:21:51] RECOVERY - puppet last run on erbium is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[00:21:51] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[00:21:52] RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[00:21:52] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures
[00:21:53] RECOVERY - puppet last run on search1020 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[00:21:53] RECOVERY - puppet last run on lvs4001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[00:21:54] RECOVERY - puppet last run on ytterbium is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[00:21:59] RECOVERY - puppet last run on mw1036 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[00:21:59] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[00:21:59] RECOVERY - puppet last run on nickel is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[00:22:07] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[00:22:08] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[00:22:08] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[00:22:08] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[00:22:17] RECOVERY - puppet last run on db1007 is OK: OK: Puppet is currently enabled, last run 93 seconds ago with 0 failures
[00:22:17] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[00:22:17] PROBLEM - puppet last run on tmh1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:22:18] RECOVERY - puppet last run on mw1182 is OK: OK: Puppet is currently enabled, last run 83 seconds ago with 0 failures
[00:22:27] RECOVERY - puppet last run on amslvs4 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[00:22:27] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[00:22:37] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 71 seconds ago with 0 failures
[00:22:40] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 96 seconds ago with 0 failures
[00:22:57] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[00:23:07] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 99 seconds ago with 0 failures
[00:23:10] RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 138 seconds ago with 0 failures
[00:23:17] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:23:37] RECOVERY - puppet last run on zirconium is OK: OK: Puppet is currently enabled, last run 156 seconds ago with 0 failures
[00:23:47] RECOVERY - puppet last run on mw1035 is OK: OK: Puppet is currently enabled, last run 170 seconds ago with 0 failures
[00:23:48] PROBLEM - puppet last run on search1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:24:07] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 167 seconds ago with 0 failures
[00:27:07] RECOVERY - puppet last run on wtp1012 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[00:27:17] RECOVERY - puppet last run on polonium is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[00:27:18] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[00:27:27] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[00:27:27] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[00:27:28] RECOVERY - puppet last run on mc1012 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[00:27:49] RECOVERY - puppet last run on pc1002 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[00:27:57] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures
[00:27:57] RECOVERY - puppet last run on es1007 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[00:27:58] RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[00:27:58] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[00:27:58] RECOVERY - puppet last run on db1052 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[00:27:58] RECOVERY - puppet last run on db2007 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[00:27:58] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[00:27:59] RECOVERY - puppet last run on mw1076 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[00:27:59] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[00:28:07] RECOVERY - puppet last run on db1016 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[00:28:07] RECOVERY - puppet last run on db1036 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[00:28:07] RECOVERY - puppet last run on antimony is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[00:28:07] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[00:28:07] RECOVERY - puppet last run on mw1149 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[00:28:08] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[00:28:08] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[00:28:20] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[00:28:21] RECOVERY - puppet last run on analytics1026 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[00:28:21] RECOVERY - puppet last run on elastic1006 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[00:28:22] RECOVERY - puppet last run on virt1001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[00:28:28] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[00:28:28] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[00:28:38] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[00:28:38] RECOVERY - puppet last run on wtp1005 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[00:28:38] RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 65 seconds ago with 0 failures
[00:28:38] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[00:28:38] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures
[00:28:39] RECOVERY - puppet last run on labstore1001 is OK: OK: Puppet is currently enabled, last run 66 seconds ago with 0 failures
[00:28:39] RECOVERY - puppet last run on virt1003 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[00:28:40] RECOVERY - puppet last run on mw1055 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[00:28:40] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[00:28:41] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[00:28:48] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[00:28:48] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[00:28:48] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[00:28:48] RECOVERY - puppet last run on mw1049 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[00:28:48] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[00:28:49] RECOVERY - puppet last run on mw1162 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[00:28:49] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[00:28:50] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[00:28:50] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[00:28:51] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[00:28:51] RECOVERY - puppet last run on db2016 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[00:28:52] RECOVERY - puppet last run on nfs1 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[00:28:52] RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[00:28:58] RECOVERY - puppet last run on ssl1005 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
seconds ago with 0 failures [00:28:58] RECOVERY - puppet last run on search1002 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [00:28:58] RECOVERY - puppet last run on mw1051 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [00:28:58] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [00:28:59] RECOVERY - puppet last run on mw1044 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:28:59] RECOVERY - puppet last run on snapshot1002 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [00:28:59] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [00:29:00] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [00:29:08] RECOVERY - puppet last run on mc1005 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [00:29:10] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [00:29:23] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [00:29:24] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [00:29:24] RECOVERY - puppet last run on analytics1013 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [00:29:24] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [00:29:24] RECOVERY - puppet last run on db2037 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [00:29:24] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [00:29:25] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [00:29:26] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [00:29:26] RECOVERY - puppet last run on thallium is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [00:29:26] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [00:29:27] RECOVERY - puppet last run on mw1014 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:29:28] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:29:28] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 63 seconds ago with 0 failures [00:29:28] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [00:29:29] RECOVERY - puppet last run on search1005 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [00:29:38] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [00:29:38] RECOVERY - puppet last run on mw1084 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [00:29:39] RECOVERY - puppet last run on mw1111 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:29:39] RECOVERY - puppet last 
run on analytics1022 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [00:29:39] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [00:29:39] RECOVERY - puppet last run on mw1151 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:29:39] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [00:29:40] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [00:29:40] RECOVERY - puppet last run on db72 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [00:29:41] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [00:29:41] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [00:29:48] RECOVERY - puppet last run on virt1004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [00:29:48] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:29:48] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [00:29:49] RECOVERY - puppet last run on mw1050 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [00:29:49] RECOVERY - puppet last run on db1062 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [00:29:49] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [00:29:49] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:29:54] RECOVERY - puppet last run on mw1081 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [00:29:54] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [00:29:54] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [00:29:59] RECOVERY - puppet last run on mw1156 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [00:30:09] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [00:30:09] RECOVERY - puppet last run on analytics1023 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [00:30:09] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [00:30:09] RECOVERY - puppet last run on wtp1018 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:30:09] RECOVERY - puppet last run on mw1056 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [00:30:10] RECOVERY - puppet last run on mw1079 is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures [00:30:10] RECOVERY - puppet last run on wtp1002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [00:30:11] RECOVERY - puppet last run on search1017 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [00:30:11] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 
28 seconds ago with 0 failures [00:30:12] RECOVERY - puppet last run on rdb1001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [00:30:12] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [00:30:13] RECOVERY - puppet last run on rubidium is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [00:30:13] RECOVERY - puppet last run on ssl1009 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [00:30:18] RECOVERY - puppet last run on elastic1014 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [00:30:18] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [00:30:18] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 64 seconds ago with 0 failures [00:30:18] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 64 seconds ago with 0 failures [00:30:18] RECOVERY - puppet last run on amssq42 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [00:30:19] RECOVERY - puppet last run on db2001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:30:19] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [00:30:20] RECOVERY - puppet last run on mw1034 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [00:30:20] RECOVERY - puppet last run on argon is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures [00:30:21] RECOVERY - puppet last run on analytics1037 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [00:30:21] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:30:22] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [00:30:22] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [00:30:28] RECOVERY - puppet last run on virt1007 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:30:28] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [00:30:28] RECOVERY - puppet last run on ssl3002 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [00:30:28] RECOVERY - puppet last run on snapshot1004 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [00:30:28] RECOVERY - puppet last run on db69 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:30:29] RECOVERY - puppet last run on ms-be1008 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [00:30:29] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [00:30:30] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [00:30:38] RECOVERY - puppet last run on search1024 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [00:30:40] RECOVERY - puppet last run on db1055 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [00:30:40] RECOVERY - puppet last 
run on mw1183 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:30:40] RECOVERY - puppet last run on mw1057 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [00:30:40] RECOVERY - puppet last run on mw1159 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:30:40] RECOVERY - puppet last run on wtp1022 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [00:30:41] RECOVERY - puppet last run on mw1030 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [00:30:41] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [00:30:42] RECOVERY - puppet last run on elastic1011 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:30:42] RECOVERY - puppet last run on db1004 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:30:43] RECOVERY - puppet last run on cp1038 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [00:30:43] RECOVERY - puppet last run on mw1004 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:30:48] RECOVERY - puppet last run on mc1001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [00:30:48] RECOVERY - puppet last run on mw1097 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [00:30:48] RECOVERY - puppet last run on db1054 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [00:30:49] RECOVERY - puppet last run on mw1146 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [00:30:49] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [00:30:58] RECOVERY - puppet last run on rcs1001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [00:30:58] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [00:30:58] RECOVERY - puppet last run on analytics1032 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:30:58] RECOVERY - puppet last run on wtp1015 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [00:30:58] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [00:30:59] RECOVERY - puppet last run on elastic1005 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [00:31:08] RECOVERY - puppet last run on ssl1008 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [00:31:08] RECOVERY - puppet last run on hooft is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [00:31:08] RECOVERY - puppet last run on wtp1023 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [00:31:08] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:31:08] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [00:31:18] RECOVERY - puppet last run on es1002 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [00:31:18] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 54 seconds 
ago with 0 failures [00:31:18] RECOVERY - puppet last run on search1023 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [00:31:18] RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [00:31:18] RECOVERY - puppet last run on analytics1014 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [00:31:19] RECOVERY - puppet last run on mw1212 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:31:19] RECOVERY - puppet last run on db1057 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [00:31:23] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:31:23] RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [00:31:39] RECOVERY - puppet last run on rdb1002 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [00:31:39] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [00:31:39] RECOVERY - puppet last run on protactinium is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [00:31:40] RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [00:31:40] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [00:31:40] RECOVERY - puppet last run on mw1074 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [00:31:49] RECOVERY - puppet last run on mw1087 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:31:49] RECOVERY - puppet last run on ms-be1007 is OK: OK: Puppet is currently enabled, last run 72 seconds ago with 0 failures [00:31:49] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [00:31:50] RECOVERY - puppet last run on mw1032 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [00:31:50] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [00:31:50] RECOVERY - puppet last run on search1015 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [00:31:50] RECOVERY - puppet last run on db1001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [00:31:51] RECOVERY - puppet last run on db1070 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [00:32:00] RECOVERY - puppet last run on mw1023 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [00:32:01] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [00:32:01] RECOVERY - puppet last run on elastic1002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [00:32:01] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [00:32:01] RECOVERY - puppet last run on db1064 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [00:32:01] RECOVERY - puppet last run on wtp1011 is OK: OK: Puppet is currently enabled, last run 64 seconds ago with 0 failures [00:32:11] RECOVERY - puppet last run on 
wtp1003 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [00:32:11] RECOVERY - puppet last run on titanium is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [00:32:11] RECOVERY - puppet last run on db1044 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [00:32:19] RECOVERY - puppet last run on mc1013 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:32:20] RECOVERY - puppet last run on rcs1002 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:32:21] RECOVERY - puppet last run on wtp1007 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [00:32:29] RECOVERY - puppet last run on db1063 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [00:32:30] RECOVERY - puppet last run on analytics1011 is OK: OK: Puppet is currently enabled, last run 64 seconds ago with 0 failures [00:32:30] RECOVERY - puppet last run on db2003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [00:32:30] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [00:32:30] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:32:30] RECOVERY - puppet last run on amslvs3 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [00:32:31] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [00:32:31] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [00:32:32] RECOVERY - puppet last run on ocg1003 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [00:32:39] RECOVERY - puppet last run on mc1007 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [00:32:49] RECOVERY - puppet last run on elastic1017 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [00:32:50] RECOVERY - puppet last run on mw1043 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [00:32:54] RECOVERY - puppet last run on mercury is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [00:32:56] RECOVERY - puppet last run on pc1003 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [00:33:07] RECOVERY - puppet last run on db1030 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [00:33:07] RECOVERY - puppet last run on search1013 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [00:33:07] RECOVERY - puppet last run on mw1105 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [00:33:07] RECOVERY - puppet last run on mw1167 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [00:33:19] RECOVERY - puppet last run on db2011 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [00:33:19] RECOVERY - puppet last run on dbstore1002 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [00:33:19] RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [00:33:19] RECOVERY - puppet last run on mw1186 is OK: OK: Puppet is currently enabled, last run 50 seconds ago 
with 0 failures [00:33:19] RECOVERY - puppet last run on ssl1007 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [00:33:20] RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 70 seconds ago with 0 failures [00:33:20] RECOVERY - puppet last run on db1047 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [00:33:21] RECOVERY - puppet last run on mw1108 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [00:33:21] RECOVERY - puppet last run on search1022 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [00:33:22] RECOVERY - puppet last run on mw1024 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [00:33:22] RECOVERY - puppet last run on mw1077 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:33:23] RECOVERY - puppet last run on mw1201 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [00:33:23] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:33:24] RECOVERY - puppet last run on mw1121 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [00:33:24] RECOVERY - puppet last run on db1038 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [00:33:29] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [00:33:29] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [00:33:29] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:33:29] RECOVERY - puppet last run on sanger is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [00:33:29] RECOVERY - puppet last run on mw1033 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [00:33:30] RECOVERY - puppet last run on db1061 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [00:33:30] RECOVERY - puppet last run on nitrogen is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:33:31] RECOVERY - puppet last run on mw1022 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [00:33:31] RECOVERY - puppet last run on analytics1004 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [00:33:39] RECOVERY - puppet last run on fluorine is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [00:33:53] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [00:33:53] RECOVERY - puppet last run on stat1003 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [00:33:53] RECOVERY - puppet last run on mw1093 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [00:33:53] RECOVERY - puppet last run on mw1122 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [00:33:53] RECOVERY - puppet last run on ssl1001 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [00:33:59] RECOVERY - puppet last run on es1010 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [00:33:59] RECOVERY - puppet last run on calcium is OK: OK: 
Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:34:00] RECOVERY - puppet last run on db2028 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [00:34:00] RECOVERY - puppet last run on db2012 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [00:34:09] RECOVERY - puppet last run on db1033 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:34:09] RECOVERY - puppet last run on ssl1006 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [00:34:09] RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [00:34:19] RECOVERY - puppet last run on analytics1028 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [00:34:20] RECOVERY - puppet last run on mw1091 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [00:34:20] RECOVERY - puppet last run on ms-be1011 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [00:34:28] RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [00:34:39] RECOVERY - puppet last run on cp1044 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [00:34:39] RECOVERY - puppet last run on cp1037 is OK: OK: Puppet is currently enabled, last run 64 seconds ago with 0 failures [00:34:39] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [00:34:39] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [00:34:39] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [00:34:40] RECOVERY - puppet last run on db2030 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [00:34:40] RECOVERY - puppet last run on search1011 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [00:34:42] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [00:34:49] RECOVERY - puppet last run on mw1010 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [00:34:49] RECOVERY - puppet last run on mw1016 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [00:34:49] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [00:34:50] RECOVERY - puppet last run on mw1139 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [00:34:50] RECOVERY - puppet last run on iodine is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures [00:34:59] RECOVERY - puppet last run on mw1209 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [00:34:59] RECOVERY - puppet last run on cp1040 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [00:35:00] RECOVERY - puppet last run on mw1152 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [00:35:00] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [00:35:20] RECOVERY - puppet last run on search1004 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures 
[00:35:29] RECOVERY - puppet last run on mw1219 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:35:49] RECOVERY - puppet last run on achernar is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [00:36:19] RECOVERY - puppet last run on virt1008 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [00:36:49] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:36:50] RECOVERY - puppet last run on tmh1001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [00:37:40] RECOVERY - puppet last run on elastic1009 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [00:48:01] (03PS1) 10Springle: prepare to retire remaining pmtpa DBs [puppet] - 10https://gerrit.wikimedia.org/r/163536 [00:49:14] (03CR) 10Springle: [C: 032] prepare to retire remaining pmtpa DBs [puppet] - 10https://gerrit.wikimedia.org/r/163536 (owner: 10Springle) [01:15:14] (03PS1) 10Springle: Fix invalid range for trusty nagios check_procs. [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/163537 [01:18:39] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [01:19:56] that's not good [01:20:46] (03CR) 10Springle: [C: 032] Fix invalid range for trusty nagios check_procs. [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/163537 (owner: 10Springle) [01:21:57] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds [01:24:01] 5xxs spiked at 1:15: https://graphite.wikimedia.org/render/?title=HTTP%205xx%20Responses%20-1hour&from=-1hours&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=2&lineMode=connected&target=color(cactiStyle(alias(reqstats.5xx,%225xx%20resp/min%22)),%22blue%22) [01:24:27] springle: ^ related to anything you're doing? [01:24:43] ori: not that i know of [01:25:28] i don't see a corresponding spike in fluorine:/a/mw-log/fatal.log, so i don't think it's mediawiki; maybe varnish? [01:25:41] ori: fluorine apache2 log seems busy. i've never looked at it before, but file errors seem high [01:30:36] it hasn't spiked [01:30:49] [fluorine:/a/mw-log] $ grep -c 'Sep 29 01' apache2.log [01:30:49] 157148 [01:30:57] [fluorine:/a/mw-log] $ grep -c 'Sep 29 00' apache2.log [01:30:57] 301054 [01:31:00] and we're at half past the hour [01:38:03] ori: did you do anything, or did it fix itself? [01:38:21] going by the gdash varnish err graphs [01:39:45] springle: i didn't do anything. there are brief gaps followed by spikes in several unrelated metrics on the esams text varnishes that makes me suspect an eqiad-esams network issue [01:40:18] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Text+caches+esams&h=amssq31.esams.wmnet&jr=&js=&v=323882124&m=varnish.s_sess&vl=N%2Fs&ti=Total+Sessions [01:40:30] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: puppet fail [01:48:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [01:56:26] ori: that often happens just from some kind of ganglia thing, though. I mean, it could be a clue, but it could also be purely an effect on stats gathering.
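The grep counts above compare a complete hour against a partial one, which is why the "half past the hour" remark matters: ~157k lines in half an hour is roughly the same rate as ~301k in a full hour. A minimal Python sketch of the same per-hour bucketing; the log path and the syslog-style "Sep 29 01:30:49 ..." prefix are assumptions inferred from the grep commands shown, not a tool that existed on fluorine:

```python
#!/usr/bin/env python
# Count log lines per hour, mirroring `grep -c 'Sep 29 01' apache2.log`.
# Assumes syslog-style "Mon DD HH:MM:SS" prefixes; the first 9 characters
# ("Sep 29 01") identify the hour bucket. Remember the last bucket is
# usually a partial hour, so compare rates, not raw counts.
import collections
import sys

def hourly_counts(path):
    counts = collections.Counter()
    with open(path) as f:
        for line in f:
            counts[line[:9]] += 1
    return counts

if __name__ == '__main__':
    # usage: hourly.py apache2.log
    # (lexical sort is fine within a single day's worth of log)
    for hour, n in sorted(hourly_counts(sys.argv[1]).items()):
        print('%s  %d' % (hour, n))
```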
[01:59:38] RECOVERY - puppet last run on lvs4001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [01:59:51] (03PS1) 10Springle: update mariadb submodule [puppet] - 10https://gerrit.wikimedia.org/r/163538 [02:01:11] (03CR) 10Springle: [C: 032] update mariadb submodule [puppet] - 10https://gerrit.wikimedia.org/r/163538 (owner: 10Springle) [02:07:33] bblack: yeah, that was my suspicion, since a similar jump was registered in metrics that shouldn't be related. anyways, it stopped, so i stopped investigating. [02:08:38] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3613 MB (3% inode=99%): [02:09:08] removed now [02:14:42] !log restarting squid on carbon (webproxy) [02:14:50] Logged the message, Master [02:15:34] !log LocalisationUpdate completed (1.24wmf22) at 2014-09-29 02:15:34+00:00 [02:15:39] Logged the message, Master [02:26:50] !log LocalisationUpdate completed (1.25wmf1) at 2014-09-29 02:26:50+00:00 [02:26:57] Logged the message, Master [03:00:37] RECOVERY - Disk space on virt0 is OK: DISK OK [03:04:26] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [03:25:12] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Sep 29 03:25:11 UTC 2014 (duration 25m 10s) [03:25:17] Logged the message, Master [03:55:59] (03PS1) 10Ori.livneh: Set appropriate 500 and 404 documents for HHVM [puppet] - 10https://gerrit.wikimedia.org/r/163542 [03:58:57] (03PS2) 10Ori.livneh: Set appropriate 500 and 404 documents for HHVM [puppet] - 10https://gerrit.wikimedia.org/r/163542 [04:27:56] (03PS1) 10KartikMistry: Add .gitreview file [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/163545 [04:42:39] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: puppet fail [04:46:06] (03PS1) 10Mattflaschen: Change how GettingStarted Redis server IP is determined [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) [04:50:29] (03PS2) 10Mattflaschen: Change how GettingStarted Redis server IP is determined [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) [04:51:27] (03PS3) 10Mattflaschen: Change how GettingStarted Redis server IP is determined [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) [04:52:19] (03CR) 10Mattflaschen: [C: 04-1] Change how GettingStarted Redis server IP is determined [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) (owner: 10Mattflaschen) [04:57:25] (03PS4) 10Mattflaschen: Change how GettingStarted Redis server IP is determined [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) [04:57:41] (03CR) 10Mattflaschen: [C: 04-1] Change how GettingStarted Redis server IP is determined [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) (owner: 10Mattflaschen) [04:59:56] (03PS5) 10Mattflaschen: Change how GettingStarted Redis server IP is determined [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) [05:00:48] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [05:01:24] (03CR) 10Mattflaschen: [C: 04-1] "Please review but don't merge yet. After initial code review, I would like to test on Beta Labs before merging if possible." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) (owner: 10Mattflaschen) [05:04:10] (03PS2) 10Springle: WIP: Cleanup the Sanitarium [software] - 10https://gerrit.wikimedia.org/r/163147 [05:06:05] (03CR) 10Ori.livneh: "Let's just configure labs to use redis, like production. I'm not sure why it wasn't done like that from the get-go. Digging through the gi" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) (owner: 10Mattflaschen) [05:55:19] morebots [05:55:19] I am a logbot running on tools-exec-07. [05:55:19] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [05:55:19] To log a message, type !log . [06:25:40] (03PS1) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/163548 [06:28:18] PROBLEM - puppet last run on mw1054 is CRITICAL: CRITICAL: puppet fail [06:28:58] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: puppet fail [06:29:07] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: puppet fail [06:30:18] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:29] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:37] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:38] PROBLEM - puppet last run on amslvs1 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:47] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:57] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:59] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:59] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:08] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:09] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:17] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:19] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:19] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:37] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:37] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:57] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:45:29] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [06:45:48] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [06:45:58] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:45:59] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [06:46:08] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [06:46:08] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [06:46:09] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [06:46:09] RECOVERY - puppet last run on search1018 is OK: 
OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [06:46:18] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [06:46:18] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [06:46:19] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [06:46:19] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [06:46:28] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [06:46:33] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [06:46:33] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:47:08] RECOVERY - puppet last run on amslvs1 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [06:47:18] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:47:39] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:47:48] RECOVERY - puppet last run on mw1054 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:55:24] (03PS1) 10Ori.livneh: Beta: use redis for session storage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163549 [06:56:10] (03CR) 10Ori.livneh: [C: 032] Beta: use redis for session storage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163549 (owner: 10Ori.livneh) [06:56:16] (03Merged) 10jenkins-bot: Beta: use redis for session storage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163549 (owner: 10Ori.livneh) [07:07:03] (03CR) 10Giuseppe Lavagetto: [C: 032] Set appropriate 500 and 404 documents for HHVM [puppet] - 10https://gerrit.wikimedia.org/r/163542 (owner: 10Ori.livneh) [07:21:44] (03PS1) 10Ori.livneh: Labs: specify password for session redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163550 [07:22:07] (03CR) 10Ori.livneh: [C: 032] Labs: specify password for session redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163550 (owner: 10Ori.livneh) [07:22:13] (03Merged) 10jenkins-bot: Labs: specify password for session redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163550 (owner: 10Ori.livneh) [07:38:44] (03PS1) 10Ori.livneh: apache: add vhost_combined log format to defaults.conf [puppet] - 10https://gerrit.wikimedia.org/r/163551 [07:49:34] (03CR) 10Ori.livneh: "Done; Beta uses Redis for session storage now." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) (owner: 10Mattflaschen) [08:01:10] (03PS4) 10Giuseppe Lavagetto: HHVM: warm up the JIT by making web requests in Upstart post-start [puppet] - 10https://gerrit.wikimedia.org/r/150992 (owner: 10Ori.livneh) [08:21:49] !log Restarting Jenkins to have a plugin installed/loaded properly [08:21:54] Logged the message, Master [08:33:47] (03CR) 10Hashar: [C: 031] "Seems fine to me." 
[puppet] - 10https://gerrit.wikimedia.org/r/159226 (owner: 10Krinkle) [08:34:34] (03PS3) 10Hashar: contint: Package 'php5-parsekit' is absent on Trusty, don't require it [puppet] - 10https://gerrit.wikimedia.org/r/161748 (https://bugzilla.wikimedia.org/68255) (owner: 10Krinkle) [08:35:04] (03CR) 10Hashar: [C: 031] contint: Package 'php5-parsekit' is absent on Trusty, don't require it [puppet] - 10https://gerrit.wikimedia.org/r/161748 (https://bugzilla.wikimedia.org/68255) (owner: 10Krinkle) [08:42:26] (03CR) 10Hashar: "What Ori said/implemented, definitely migrate to the same configuration as production. Ie reuse the Redis used for session since we now ha" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163547 (https://bugzilla.wikimedia.org/59838) (owner: 10Mattflaschen) [08:43:50] (03Abandoned) 10Hashar: mediawiki: extract nutcracker to its own manifest [puppet] - 10https://gerrit.wikimedia.org/r/148041 (owner: 10Hashar) [08:43:57] (03Abandoned) 10Hashar: beta: configure nutcracker on bastion [puppet] - 10https://gerrit.wikimedia.org/r/148042 (owner: 10Hashar) [08:44:42] (03PS5) 10Giuseppe Lavagetto: HHVM: warm up the JIT by making web requests in Upstart post-start [puppet] - 10https://gerrit.wikimedia.org/r/150992 (owner: 10Ori.livneh) [08:46:09] (03PS6) 10Hashar: hhvm: create module + list all dev dependencies [puppet] - 10https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) [08:48:02] (03Abandoned) 10Hashar: Revert "mediawiki: create common-local directory" [puppet] - 10https://gerrit.wikimedia.org/r/154329 (https://bugzilla.wikimedia.org/69590) (owner: 10Hashar) [08:50:39] (03PS6) 10Giuseppe Lavagetto: HHVM: warm up the JIT by making web requests in Upstart post-start [puppet] - 10https://gerrit.wikimedia.org/r/150992 (owner: 10Ori.livneh) [08:53:20] (03CR) 10Giuseppe Lavagetto: [C: 032] "We may do better in the future, using a python warmup script that can run in parallel, or whatever we want to do." [puppet] - 10https://gerrit.wikimedia.org/r/150992 (owner: 10Ori.livneh) [08:58:05] (03CR) 10Filippo Giunchedi: [C: 031] contint: Package 'php5-parsekit' is absent on Trusty, don't require it [puppet] - 10https://gerrit.wikimedia.org/r/161748 (https://bugzilla.wikimedia.org/68255) (owner: 10Krinkle) [09:02:39] (03PS1) 10Nemo bis: Disable local uploads where unused, per local request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163553 (https://bugzilla.wikimedia.org/71403) [09:02:53] (03CR) 10Hashar: "I am testing this change in labs. Will amend a few times till I have something working properly." [puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [09:06:23] (03CR) 10Filippo Giunchedi: [C: 031] contint: Ensure nodejs-legacy is installed [puppet] - 10https://gerrit.wikimedia.org/r/159226 (owner: 10Krinkle) [09:08:07] (03CR) 10Filippo Giunchedi: [C: 031] Elasticsearch Drop number of concurrent merges [puppet] - 10https://gerrit.wikimedia.org/r/163188 (owner: 10Manybubbles) [09:11:37] godog: you can merge nodejs-legacy https://gerrit.wikimedia.org/r/159226 Timo already pushed it to the contint puppet master :) [09:17:05] (03CR) 10Filippo Giunchedi: [C: 031] zuul: client to easily query Gearman server [puppet] - 10https://gerrit.wikimedia.org/r/162856 (owner: 10Hashar) [09:18:00] hashar: ok! 
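For context on the warmup change merged above (150992): HHVM's JIT only compiles code paths after they have been exercised, so issuing requests from the Upstart post-start hook avoids pooling a server that would serve its first real traffic cold. The merge comment floats "a python warmup script that can run in parallel"; the following is only a rough sketch of that idea. The URLs, round count, and plain HTTP against localhost are assumptions, not the deployed implementation:

```python
#!/usr/bin/env python3
# Hypothetical parallel JIT warmup: hit each URL several times against the
# local HHVM instance so hot code paths get JIT-compiled before the server
# is put back in rotation.
from concurrent.futures import ThreadPoolExecutor
import urllib.request

URLS = [
    'http://127.0.0.1/wiki/Main_Page',                          # assumed
    'http://127.0.0.1/w/load.php?modules=startup&only=scripts',  # assumed
]
ROUNDS = 10   # requests per URL; enough repetitions to trigger the JIT
WORKERS = 4   # degree of parallelism

def hit(url):
    try:
        # Discard the body; the point is that HHVM executes the code path.
        urllib.request.urlopen(url, timeout=10).read()
    except Exception as exc:
        print('warmup request failed: %s (%s)' % (url, exc))

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    pool.map(hit, URLS * ROUNDS)
```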
[09:19:29] (03CR) 10Filippo Giunchedi: [C: 031] Add robots.txt rewrite rule where wiki is public [puppet] - 10https://gerrit.wikimedia.org/r/147487 (owner: 10Reedy) [09:20:43] (03PS2) 10Hashar: contint: configuration files renaming [puppet] - 10https://gerrit.wikimedia.org/r/162584 [09:21:37] (03PS10) 10Filippo Giunchedi: contint: Ensure nodejs-legacy is installed [puppet] - 10https://gerrit.wikimedia.org/r/159226 (owner: 10Krinkle) [09:21:52] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] contint: Ensure nodejs-legacy is installed [puppet] - 10https://gerrit.wikimedia.org/r/159226 (owner: 10Krinkle) [09:23:17] (03PS3) 10Hashar: contint: configuration files renaming [puppet] - 10https://gerrit.wikimedia.org/r/162584 [09:25:21] (03CR) 10Filippo Giunchedi: "to clarify, the related RT is about adding the vhost_combined logformat so that logs are filled with something useful not changing the exi" [puppet] - 10https://gerrit.wikimedia.org/r/162541 (owner: 10Jeremyb) [09:28:52] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] eqiad-prod: bump ms-be1013/1014/1015 weight to 3000 [software/swift-ring] - 10https://gerrit.wikimedia.org/r/163137 (owner: 10Filippo Giunchedi) [09:28:57] (03PS1) 10Aude: Add wikidatawiki and test wikis (e.g. test2) to wikidataclient.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163554 [09:30:19] (03PS1) 10Giuseppe Lavagetto: varnish: remove fallback to Zend for HHVM. [puppet] - 10https://gerrit.wikimedia.org/r/163555 [09:31:21] (03CR) 10Hashar: [C: 031 V: 032] "I forgot to adjust zuul.conf file mode to make it publicly available. It was previously 0400 and the exec stanza does not change the fil" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [09:31:50] !log deployed new swift ring to eqiad-prod [09:31:53] (03PS2) 10Giuseppe Lavagetto: varnish: remove fallback to Zend for HHVM. [puppet] - 10https://gerrit.wikimedia.org/r/163555 [09:31:56] Logged the message, Master [09:32:44] hmm: Error: https://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=20140923T181438Z at line 3: Error: cannot call methods on slider prior to initialization; attempted to call method 'value' [09:32:59] godog: sure, fine but not doing at 5+ am :) [09:33:05] about a million like this [09:33:32] anything known ? [09:37:36] (03CR) 10Filippo Giunchedi: contint: configuration files renaming (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [09:38:30] jeremyb: haha yep no worries, this has been going on for a while [09:38:42] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "LG, 1 minor comment." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) (owner: 10Hashar) [09:39:01] <_joe_> hashar: one small correction for you [09:40:46] _joe_: can you please update: https://etherpad.wikimedia.org/p/puppet3 [09:46:56] <_joe_> matanya: I don't think I have time for that today [09:47:13] (03CR) 10Hashar: contint: configuration files renaming (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [09:48:18] (03PS4) 10Hashar: contint: configuration files renaming [puppet] - 10https://gerrit.wikimedia.org/r/162584 [09:49:39] (03CR) 10Hashar: "The zuul.conf exec {} now has creates => '/etc/zuul/zuul.conf'."
[puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [09:52:52] no rush _joe_ thanks [10:00:36] (03PS5) 10Hashar: contint: configuration files renaming [puppet] - 10https://gerrit.wikimedia.org/r/162584 [10:01:01] (03CR) 10Hashar: "Removed 'creates => ..' it prevents the exec from running whenever the file exists." [puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [10:08:36] (03CR) 10Hashar: "That seems to work though if zuul.conf is deleted, it is not regenerated until zuul-server.conf is changed :-/" [puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [10:25:48] (03CR) 10Hashar: [C: 031 V: 032] "Tested on labs. /etc/zuul/zuul.conf will not be regenerated if it ever disappears but I am willing to live with that issue which has no i" [puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [10:27:19] unsubscribe? https://rt.wikimedia.org/Ticket/Display.html?id=8453 [10:29:11] mark: does https://gerrit.wikimedia.org/r/#/c/162887/ look ok to you? wanted to merge it and steal pmtpa svc range into codfw [10:30:22] yup [10:40:42] hasharLunch: [10:40:44] 2014-09-29 07:27:42 1XYVN0-0001RV-Dm <= jenkins-bot@wikimedia.org H=gallium.wikimedia.org [208.80.154.135]:9061 I=[208.80.152.133]:25 P=esmtp S=3942 id=1262378705.553.1411975661741.JavaMail.jenkins@gallium [10:41:02] so jenkins is still using mchenry, which will be shut down any moment now [10:46:50] (03CR) 10Mark Bergsma: "I actually feel this seems a bit premature, given that we still hit some bugs regularly as far as I'm aware. What exactly is the big upsid" [puppet] - 10https://gerrit.wikimedia.org/r/163555 (owner: 10Giuseppe Lavagetto) [10:52:54] (03PS5) 10Filippo Giunchedi: codfw: steal pmtpa svc range [dns] - 10https://gerrit.wikimedia.org/r/162887 [10:53:02] mark: ack, thanks [10:53:46] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] codfw: steal pmtpa svc range [dns] - 10https://gerrit.wikimedia.org/r/162887 (owner: 10Filippo Giunchedi) [11:12:17] (03CR) 10Filippo Giunchedi: [C: 031] contint: configuration files renaming [puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [11:17:15] (03CR) 10Alexandros Kosiaris: [C: 032] decom linne [puppet] - 10https://gerrit.wikimedia.org/r/162171 (owner: 10Dzahn) [11:35:00] (03CR) 10Giuseppe Lavagetto: "Last week's hhvm patches have drastically reduced the number of reported bugs - so our fear is falling back to zend is actually masking so" [puppet] - 10https://gerrit.wikimedia.org/r/163555 (owner: 10Giuseppe Lavagetto) [11:42:57] (03PS3) 10Giuseppe Lavagetto: mediawiki: move memcached servers list to a hiera variable [puppet] - 10https://gerrit.wikimedia.org/r/162622 [11:44:58] (03CR) 10Mark Bergsma: "Sorry, but I don't understand how this is "masking" anything. The user is not aware of problems, but that seems to be a feature?" [puppet] - 10https://gerrit.wikimedia.org/r/163555 (owner: 10Giuseppe Lavagetto) [11:45:09] (03CR) 10BBlack: [C: 031] "I'm an advocate of this; I think HHVM needs to stand on its own from fairly early, or else some will get false perceptions of its stabilit" [puppet] - 10https://gerrit.wikimedia.org/r/163555 (owner: 10Giuseppe Lavagetto) [11:55:06] (03CR) 10Giuseppe Lavagetto: [C: 032] "Verified with the puppet compiler." [puppet] - 10https://gerrit.wikimedia.org/r/162622 (owner: 10Giuseppe Lavagetto) [12:05:38] (03CR) 10Ori.livneh: "Note that the last hard crash of the application servers occurred on August 31; we haven't had any in September.
(See fluorine:/a/mw-log/a" [puppet] - 10https://gerrit.wikimedia.org/r/163555 (owner: 10Giuseppe Lavagetto) [12:15:47] PROBLEM - puppet last run on fenari is CRITICAL: CRITICAL: puppet fail [12:20:40] (03Abandoned) 10Ori.livneh: apache: add vhost_combined log format to defaults.conf [puppet] - 10https://gerrit.wikimedia.org/r/163551 (owner: 10Ori.livneh) [12:21:35] mark: gotta figure it out :] [12:21:45] thanks [12:21:52] let me know when you think you've fixed it [12:21:57] mark: I remember mail got broken a few months ago when some mail alias was changed. What is the one to use nowadays? [12:22:08] what do you mean? [12:22:18] let me find the host being used right now [12:22:19] we have a puppet variable for mail smarthost to use [12:22:20] (03CR) 10Ori.livneh: "* This should go in /modules/apache/files/defaults.conf" [puppet] - 10https://gerrit.wikimedia.org/r/162541 (owner: 10Jeremyb) [12:22:22] which you should use [12:22:33] it's probably pointed at polonium, but don't hardcode that anywhere [12:22:34] alas Jenkins config is not puppetized :-( [12:22:51] currently points to wiki-mail.wikimedia.org [12:24:01] hm [12:24:03] thanks [12:24:06] we'll change that [12:24:37] mark: I used to have it point to smtp.{pmtpa,eqiad}.wmnet but it got removed in favor of that wiki-mail.wikimedia.org alias [12:26:41] (03PS4) 10BBlack: authdns: switch to all IPs on all hosts [puppet] - 10https://gerrit.wikimedia.org/r/163227 [12:27:02] (03PS1) 10Mark Bergsma: Move wiki-mail away from mchenry [dns] - 10https://gerrit.wikimedia.org/r/163567 [12:27:40] (03CR) 10Mark Bergsma: [C: 032] Move wiki-mail away from mchenry [dns] - 10https://gerrit.wikimedia.org/r/163567 (owner: 10Mark Bergsma) [12:28:20] dns changed [12:33:20] hopefully gallium / Jenkins / Java will catch it up [12:33:29] thanks mark! [12:48:45] (03PS5) 10BBlack: authdns: switch to all IPs on all hosts [puppet] - 10https://gerrit.wikimedia.org/r/163227 [13:00:52] (03CR) 10BBlack: [C: 032] "puppet compiler output looks ok on authservers (doesn't work for monitoring server changes). 
Affected machines have agent disabled so I c" [puppet] - 10https://gerrit.wikimedia.org/r/163227 (owner: 10BBlack) [13:05:38] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Puppet last ran 255998 seconds ago, expected 14400 [13:06:24] (03CR) 10Hashar: hhvm: create module + list all dev dependencies (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) (owner: 10Hashar) [13:06:34] (03PS7) 10Hashar: hhvm: create module + list all dev dependencies [puppet] - 10https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) [13:06:44] (03PS8) 10Hashar: hhvm: create module + list all dev dependencies [puppet] - 10https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) [13:07:35] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [13:23:55] (03PS1) 10BBlack: Fix bits monitoring URLs [puppet] - 10https://gerrit.wikimedia.org/r/163568 [13:25:07] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: Puppet last ran 323659 seconds ago, expected 14400 [13:26:00] (03CR) 10BBlack: [C: 032] Fix bits monitoring URLs [puppet] - 10https://gerrit.wikimedia.org/r/163568 (owner: 10BBlack) [13:26:56] (03PS6) 10Filippo Giunchedi: contint: configuration files renaming [puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [13:27:02] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] contint: configuration files renaming [puppet] - 10https://gerrit.wikimedia.org/r/162584 (owner: 10Hashar) [13:27:07] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [13:27:55] !log Zuul: tweaking configuration files {{gerrit|162584}} [13:28:00] Logged the message, Master [13:32:35] !log Restarted Zuul [13:32:40] Logged the message, Master [13:36:06] (03PS3) 10Filippo Giunchedi: zuul: client to easily query Gearman server [puppet] - 10https://gerrit.wikimedia.org/r/162856 (owner: 10Hashar) [13:36:11] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] zuul: client to easily query Gearman server [puppet] - 10https://gerrit.wikimedia.org/r/162856 (owner: 10Hashar) [13:37:38] PROBLEM - puppet last run on cp1008 is CRITICAL: CRITICAL: Puppet last ran 913711 seconds ago, expected 14400 [13:39:37] RECOVERY - puppet last run on cp1008 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [13:56:54] (03PS5) 10Filippo Giunchedi: swift: refactor into module, add codfw [puppet] - 10https://gerrit.wikimedia.org/r/162291 [13:57:15] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift: refactor into module, add codfw [puppet] - 10https://gerrit.wikimedia.org/r/162291 (owner: 10Filippo Giunchedi) [13:57:52] (03PS9) 10Giuseppe Lavagetto: hhvm: create module + list all dev dependencies [puppet] - 10https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) (owner: 10Hashar) [13:58:01] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] hhvm: create module + list all dev dependencies [puppet] - 10https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) (owner: 10Hashar) [13:58:55] <_joe_> godog: wow, very happy it was mergeable [14:00:44] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail [14:01:08] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: puppet fail [14:01:09] PROBLEM - puppet last run on db2033 is CRITICAL: CRITICAL: puppet fail [14:01:36] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Shouldn't we group all those rewrites 
(favicon, touch-icon.png, robots.txt, etc.) into a single rewrite block we include everywhere?" [puppet] - 10https://gerrit.wikimedia.org/r/147488 (owner: 10Reedy) [14:02:06] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: puppet fail [14:02:25] (03CR) 10Reedy: "Like we do for the rest of the common code? ;)" [puppet] - 10https://gerrit.wikimedia.org/r/147488 (owner: 10Reedy) [14:02:55] <_joe_> Reedy: the fact we never did that is not a good reason not to start now :P [14:03:00] True [14:03:06] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: puppet fail [14:03:10] Have we an example of it anywhere? [14:03:18] <_joe_> btw, I'm about to tackle your 'docroot-deduplication' patch [14:03:30] wheee [14:03:39] <_joe_> Reedy: there were some abandoned attempts in my early apache revamps [14:03:44] (03PS4) 10Hashar: contint: Package 'php5-parsekit' is absent on Trusty, don't require it [puppet] - 10https://gerrit.wikimedia.org/r/161748 (https://bugzilla.wikimedia.org/68255) (owner: 10Krinkle) [14:03:46] <_joe_> I'll take a shot at this [14:03:51] (03CR) 10Hashar: "rebased" [puppet] - 10https://gerrit.wikimedia.org/r/161748 (https://bugzilla.wikimedia.org/68255) (owner: 10Krinkle) [14:04:02] <_joe_> tackle != merge now [14:04:30] PROBLEM - puppet last run on db2009 is CRITICAL: CRITICAL: puppet fail [14:04:41] (03CR) 10Hashar: "Cherry picked on contint puppet master." [puppet] - 10https://gerrit.wikimedia.org/r/161748 (https://bugzilla.wikimedia.org/68255) (owner: 10Krinkle) [14:04:47] PROBLEM - puppet last run on db2019 is CRITICAL: CRITICAL: puppet fail [14:04:47] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: puppet fail [14:05:06] PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: puppet fail [14:05:09] _joe_: heh, in fact it doesn't work [14:05:16] PROBLEM - puppet last run on db2005 is CRITICAL: CRITICAL: puppet fail [14:05:21] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Error from DataBinding 'hiera' while looking up 'admin::always_groups': syntax error on line 88, col 1: `]' on node ms-be2001.codfw.wmnet [14:05:37] <_joe_> wat? 
[14:06:00] not sure which file the syntax error is in though [14:06:20] <_joe_> lemme take a look at what happens on the puppetmasters [14:06:36] PROBLEM - puppet last run on db2002 is CRITICAL: CRITICAL: puppet fail [14:07:06] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail [14:07:14] this is probably puppet failing on hiera [14:07:16] <_joe_> it's in the nuyaml code clearly somewhere [14:07:56] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: puppet fail [14:07:56] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: puppet fail [14:08:06] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: puppet fail [14:08:10] <_joe_> maybe there are things that are defined in eqiad and not in codfw [14:08:16] <_joe_> that make puppet fail there [14:08:16] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: puppet fail [14:08:24] <_joe_> as hiera tries to check those [14:08:31] <_joe_> so all codfw hosts are failing [14:08:46] PROBLEM - puppet last run on labcontrol2001 is CRITICAL: CRITICAL: puppet fail [14:08:47] PROBLEM - puppet last run on db2007 is CRITICAL: CRITICAL: puppet fail [14:08:49] <_joe_> godog: disable puppet runs there for a few minutes if possible [14:08:56] PROBLEM - puppet last run on db2029 is CRITICAL: CRITICAL: puppet fail [14:09:06] yep same error btw [14:09:08] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Error from DataBinding 'hiera' while looking up 'admin::always_groups': syntax error on line 88, col 1: `]' on node db2009.codfw.wmnet [14:09:16] _joe_: where? [14:09:17] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: puppet fail [14:09:21] <_joe_> godog: codfw [14:09:33] <_joe_> or revert :) [14:09:37] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: puppet fail [14:09:38] revert [14:09:46] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: puppet fail [14:10:16] <_joe_> godog: no wait [14:10:16] PROBLEM - puppet last run on db2016 is CRITICAL: CRITICAL: puppet fail [14:10:16] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: puppet fail [14:10:20] <_joe_> I may have got this [14:10:46] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail [14:10:47] PROBLEM - puppet last run on db2004 is CRITICAL: CRITICAL: puppet fail [14:10:47] PROBLEM - puppet last run on db2001 is CRITICAL: CRITICAL: puppet fail [14:10:47] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: puppet fail [14:11:19] PROBLEM - puppet last run on db2037 is CRITICAL: CRITICAL: puppet fail [14:11:20] PROBLEM - puppet last run on db2023 is CRITICAL: CRITICAL: puppet fail [14:11:20] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: puppet fail [14:11:43] (03PS1) 10Filippo Giunchedi: Revert "swift: refactor into module, add codfw" [puppet] - 10https://gerrit.wikimedia.org/r/163570 [14:11:46] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: puppet fail [14:11:47] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: puppet fail [14:11:53] <_joe_> AHAH [14:11:56] _joe_: ok, revert is above [14:12:05] <_joe_> godog: $::site is not defined in codfw [14:12:07] <_joe_> pretty sure [14:13:16] PROBLEM - puppet last run on db2003 is CRITICAL: CRITICAL: puppet fail [14:13:18] heh, what's the fix? [14:13:26] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: puppet fail [14:13:28] <_joe_> godog: looking into realm.pp [14:13:36] just for hiera?
[14:13:43] I'm pretty sure I've used $::site in codfw already [14:13:57] PROBLEM - puppet last run on db2011 is CRITICAL: CRITICAL: puppet fail [14:14:27] <_joe_> ok, no. [14:14:29] (e.g. for lvs service IP defs) [14:15:16] PROBLEM - puppet last run on db2030 is CRITICAL: CRITICAL: puppet fail [14:15:17] PROBLEM - puppet last run on db2012 is CRITICAL: CRITICAL: puppet fail [14:15:24] <_joe_> bblack: it's probably defined there [14:15:26] PROBLEM - puppet last run on db2028 is CRITICAL: CRITICAL: puppet fail [14:15:35] <_joe_> godog: revert for now [14:15:36] <_joe_> :/ [14:15:41] ack [14:15:47] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: puppet fail [14:15:57] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: puppet fail [14:15:59] <_joe_> it is failing only on codfw btw [14:16:03] <_joe_> which is pretty strange [14:16:06] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Revert "swift: refactor into module, add codfw" [puppet] - 10https://gerrit.wikimedia.org/r/163570 (owner: 10Filippo Giunchedi) [14:16:21] we've added codfw.yaml [14:16:27] <_joe_> we are using hiera everywhere [14:17:21] "well, that didn't work" http://images.wikia.com/en.futurama/images/d/da/Fry_Looking_Squint.jpg [14:17:28] _joe_: join me in #wikimedia-labs when you have a moment? [14:17:45] <_joe_> andrewbogott: about what? [14:17:46] RECOVERY - puppet last run on db2009 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [14:17:57] <_joe_> I'm pretty tangled up in hiera/puppet issues right now [14:18:12] _joe_: I take it you don't have paging turned on there :) Just wondering about puppet-compiler02 -- it has puppet disabled currently. [14:18:16] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: puppet fail [14:18:16] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: puppet fail [14:18:23] And I need it to update to get the new ldap servers. 
[14:18:26] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: puppet fail [14:18:47] whatever's supposed to keep the beta cluster up-to-date hasn't worked since friday [14:18:56] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [14:18:59] <_joe_> andrewbogott: not now sorry [14:19:17] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: puppet fail [14:20:26] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [14:20:27] RECOVERY - puppet last run on db2033 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [14:20:31] <_joe_> godog: the codfw.yaml file has a bad syntax [14:20:57] <_joe_> there is still that array declared puppet-style vs yaml-style [14:22:34] RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [14:23:14] _joe_: amusingly enough, I can load it with ruby -ryaml -e 'print YAML.load_file("hieradata/codfw.yaml")' [14:23:26] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [14:23:42] <_joe_> godog: lemme try just to be sure [14:23:44] RECOVERY - puppet last run on db2005 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [14:23:46] but indeed it is the only file with a ] in that position [14:23:59] <_joe_> I'm pretty sure that is the problem [14:24:15] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [14:24:15] RECOVERY - puppet last run on db2019 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [14:24:52] ah there we go, fails with palladium's ruby [14:25:05] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [14:25:25] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [14:25:53] <_joe_> godog: yay [14:26:05] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [14:26:06] <_joe_> so we can BLAME RUBY [14:26:11] <_joe_> sounds like a plan [14:26:15] we would have done that anyways [14:26:24] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [14:26:24] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [14:26:28] <_joe_> yeah but now, rightfully so [14:26:37] always rightfully so [14:27:04] RECOVERY - puppet last run on labcontrol2001 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [14:27:16] where's < godog> we've added codfw.yaml ? 
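The root cause found above is a flow-sequence comma that newer YAML parsers accept but palladium's ruby 1.8 parser rejects, so any useful pre-merge check has to run under the same interpreter the puppetmaster uses. A minimal sketch of that idea, shelling out to ruby the way godog did; the ruby1.8 binary name and the default file path are assumptions:

    #!/usr/bin/env python
    """Hypothetical: validate a YAML file with the puppetmaster's own parser."""
    import subprocess
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else 'hieradata/codfw.yaml'
    # ruby1.8 is an assumption; the point is to match palladium's parser,
    # since a file that newer parsers accept can still fail under syck
    rc = subprocess.call(['ruby1.8', '-ryaml', '-e', 'YAML.load_file(ARGV[0])', path])
    sys.exit(rc)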
[14:27:27] bblack: heh in the PS I have reverted [14:27:31] ah ok [14:27:44] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [14:27:51] stray comma at the end of a yaml array btw [14:27:53] <_joe_> bblack: and that file had a syntax error, at least according to ruby 1.8 [14:27:56] RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [14:27:56] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [14:28:04] RECOVERY - puppet last run on db2007 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [14:28:29] RECOVERY - puppet last run on db2029 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [14:28:34] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [14:28:42] yeah that's not the first time that's bitten us. [14:28:45] hashar: how hard is it to add a syntax check to be run on puppet.git via gerrit? [14:29:04] godog: depends how much you like YAML :] [14:29:06] <_joe_> godog: a yaml linting for hiera? yeah definitely needed [14:29:24] _joe_: yeah possibly for other yaml data too, we have some [14:29:25] RECOVERY - puppet last run on db2016 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [14:29:34] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [14:29:34] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [14:29:38] hashar: haha why? I don't mind it [14:29:44] <_joe_> godog: which should move to hiera sooner than later [14:29:59] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [14:30:04] RECOVERY - puppet last run on db2001 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [14:30:04] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [14:30:04] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [14:30:05] I have no clue how to lint yaml files properly though :( [14:30:14] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [14:30:25] RECOVERY - puppet last run on db2037 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [14:30:25] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [14:30:36] <_joe_> hashar: not "properly", but "how the braindead yaml module in ruby 1.8 wants them" [14:31:02] with the python one, you can embed objects and I have no idea whether it is going to execute the code while loading it :/ [14:31:04] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [14:31:14] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [14:31:23] and also, each language has slightly different yaml libraries with different expectations :-/ [14:31:25] RECOVERY - puppet last run on db2003 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [14:31:34]
RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [14:31:59] yaml has a spec, should file bugs if they don't agree [14:32:00] http://yaml.org/ [14:33:05] RECOVERY - puppet last run on db2011 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [14:33:20] hashar: it isn't very different from loading each file through ruby's YAML.load_file and seeing if it works, but suppose I have already a linting script, what do I need besides that? [14:33:36] RECOVERY - puppet last run on db2028 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [14:33:44] RECOVERY - puppet last run on db2012 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [14:33:58] TIL: yaml requires a whitespace after comma in sequences [14:34:24] RECOVERY - puppet last run on db2030 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [14:35:13] RECOVERY - puppet last run on achernar is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [14:35:41] _joe_: whereas in private.git I should add hieradata/codfw.yaml in the root? [14:36:05] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [14:37:34] RECOVERY - puppet last run on db2017 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [14:37:35] RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [14:37:36] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [14:37:54] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [14:39:47] (03CR) 10Hashar: "The hhvm build deps have some conflict/cycle with libjpeg62 and libjpeg8 which do not play well together. Filed as https://bugzilla.wikim" [puppet] - 10https://gerrit.wikimedia.org/r/150813 (https://bugzilla.wikimedia.org/63120) (owner: 10Hashar) [14:40:01] (03PS1) 10Filippo Giunchedi: swift: refactor into module, add codfw [puppet] - 10https://gerrit.wikimedia.org/r/163572 [14:41:09] godog: basically have a job created in Jenkins by using the Jenkins Job Builder utility and its yaml files. There is a basic tutorial at https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Adding_basic_checks which should cover your needs [14:41:32] godog: then (described on that tutorial) add some triggers in Zuul configuration which is the gateway between Gerrit and Jenkins [14:41:32] (03CR) 10Filippo Giunchedi: "same as https://gerrit.wikimedia.org/r/#/c/162291/ which got reverted, now with valid yaml" [puppet] - 10https://gerrit.wikimedia.org/r/163572 (owner: 10Filippo Giunchedi) [14:42:04] hashar: ack, I'll take a look, thanks! [14:42:28] godog: and poke #wikimedia-qa :] there are a few more jenkins job builder gurus there :D [14:42:31] manybubbles, marktraceur: Which of us wants to SWAT today? I'm assuming not ^demon|sick, if he's sick. [14:42:43] anomie: I can do it! [14:42:47] manybubbles: ok!
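A sketch of the lint job being discussed, assuming a flat hieradata/ directory of YAML files; it uses PyYAML's safe_load, which refuses to instantiate arbitrary embedded objects (hashar's worry about the python loader applies to plain yaml.load, not safe_load). Per the caveat above, passing this check does not guarantee ruby 1.8's stricter parser will also accept the file.

    #!/usr/bin/env python
    """Hypothetical gate job: fail if any hieradata/*.yaml is unparseable."""
    import glob
    import sys

    import yaml  # PyYAML

    failures = 0
    for path in sorted(glob.glob('hieradata/*.yaml')):
        try:
            with open(path) as f:
                yaml.safe_load(f)  # safe_load refuses arbitrary object tags
        except yaml.YAMLError as exc:
            print('%s: %s' % (path, exc))
            failures += 1

    sys.exit(1 if failures else 0)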
[14:42:49] I'm not deep into anything [14:42:56] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift: refactor into module, add codfw [puppet] - 10https://gerrit.wikimedia.org/r/163572 (owner: 10Filippo Giunchedi) [14:43:07] * anomie is just code-reviewing [14:43:35] <^d> anomie, manybubbles: Not sick anymore, just waking up :) [14:43:51] (03CR) 10Manybubbles: [C: 031] "Will deploy in today's SWAT." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162965 (https://bugzilla.wikimedia.org/71204) (owner: 10Prtksxna) [14:44:04] ^d: That's good! But if you want the SWAT you'll have to fight manybubbles for it ;) [14:44:12] (03CR) 10Manybubbles: [C: 031] Move wikitech to the new ldap server, ldap-eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163189 (owner: 10Andrew Bogott) [14:44:14] (03PS1) 10Andrew Bogott: Set up labcontrol2001 as a dns server. [puppet] - 10https://gerrit.wikimedia.org/r/163573 [14:44:19] <^d> anomie: I said I'm not sick, not that I'm in fighting mode :p [14:44:25] _joe_: ok that worked! of course it'd be nice to have the path in the error message [14:44:28] godog: and poke me as needed :-] [14:44:34] hashar: cool, ta! [14:44:47] (03PS1) 10Andrew Bogott: Add a temporary service address for labs, labs-ns2. [dns] - 10https://gerrit.wikimedia.org/r/163574 [14:46:20] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail [14:46:24] gi11es: https://gerrit.wikimedia.org/r/#/c/163115 has already been +2ed. has it been deployed? [14:46:26] greg-g: early-morning OCG deploy request! [14:46:41] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail [14:47:05] greg-g: i'm trying to do a deploy of the latest bits for the OCG service ASAP this morning, so it has a little bit of time to bake before we turn on OCG-PDF-by-default on all the wikis later today. [14:47:07] mlitn: I'll do your deploy in 15 minutes if you are around [14:47:09] failures on ms-be2* are fine [14:47:21] manybubbles: great! [14:47:45] manybubbles: you're working on SWAT this morning? [14:48:00] cscott: I SWAT today - if you have something stick it on the list [14:48:04] manybubbles: I've added it to the SWAT to get it deployed to machines running 1.24wmf22 [14:48:08] It can go on the end [14:48:23] manybubbles: last time I did a backport like this I was told to +2 ahead of time [14:48:28] manybubbles: no, I have an OCG deploy to do myself. just let me know when you're done with SWAT. [14:48:33] (03CR) 10Andrew Bogott: [C: 032] Add a temporary service address for labs, labs-ns2. [dns] - 10https://gerrit.wikimedia.org/r/163574 (owner: 10Andrew Bogott) [14:49:02] gi11es: ah! now I see. this is the merge to the release branch. can you make the submodule update for core? it makes my life easier if you can. [14:49:07] if you can't I'll do it [14:49:14] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: puppet fail [14:49:19] cscott: can do. can you link me to what OCG means? [14:49:20] ah yes I forgot about that part, I'll get on it [14:49:33] (03CR) 10Andrew Bogott: [C: 032] Set up labcontrol2001 as a dns server.
[puppet] - 10https://gerrit.wikimedia.org/r/163573 (owner: 10Andrew Bogott) [14:49:41] manybubbles: sorry: Offline Collection Generator ( https://www.mediawiki.org/wiki/OCG ) [14:50:06] (although the real documentation is really at https://wikitech.wikimedia.org/wiki/OCG , I'm still working on porting it over to :mw:) [14:51:33] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: puppet fail [14:51:50] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: puppet fail [14:52:10] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail [14:52:40] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: puppet fail [14:52:50] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [14:55:59] <_joe_> godog: xfsprogs [14:56:20] manybubbles: should I +2 the submodule update as well? [14:56:39] gi11es: nope! only the deployer +2s that [14:56:51] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [14:56:52] same for deployment branch updates in core and mediawiki-config [14:56:53] manybubbles: in that case https://gerrit.wikimedia.org/r/163575 [14:57:01] it's done right before the update. [14:57:19] perfect. can you replace the link on the deployments page? [14:57:42] added [14:59:04] (03PS1) 10Filippo Giunchedi: swift: add missing xfsprogs dependency [puppet] - 10https://gerrit.wikimedia.org/r/163576 [14:59:21] thanks! [14:59:32] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: puppet fail [14:59:33] _joe_: indeed, for private.git it'll be hieradata/codfw.yaml correct? [14:59:44] <_joe_> godog: yep [15:00:02] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: puppet fail [15:00:04] manybubbles, anomie, ^d, marktraceur, mlitn: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140929T1500). Please do the needful. [15:01:11] (03CR) 10Manybubbles: [C: 032] Flow enable mw:Talk:MediaWiki UI (fix typo) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162965 (https://bugzilla.wikimedia.org/71204) (owner: 10Prtksxna) [15:01:14] _joe_: nope, hiera.yaml says /etc/puppet/private/hiera [15:01:24] (03Merged) 10jenkins-bot: Flow enable mw:Talk:MediaWiki UI (fix typo) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162965 (https://bugzilla.wikimedia.org/71204) (owner: 10Prtksxna) [15:01:26] (03PS1) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/163577 [15:01:30] I think we should change that to be the same everywhere [15:01:31] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.001 second response time [15:02:22] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT fix config type of flow. (duration: 00m 06s) [15:02:24] mlitn: ^^^^ [15:02:27] Logged the message, Master [15:02:58] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: puppet fail [15:03:17] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.057 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.155.135 [15:03:27] hhvm seems more stable today [15:03:47] manybubbles: looks good, thanks [15:03:57] mlitn: consider yourself SWATed. thanks. [15:04:18] (03PS1) 10KartikMistry: Add initial Debian packaging [debs/contenttranslation/apertium-es-ca] - 10https://gerrit.wikimedia.org/r/163578 [15:04:19] andrewbogott: around for your SWAT update?
[15:04:26] manybubbles: yep [15:04:27] aude: I +2ed your submodule update [15:04:27] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.008 second response time [15:04:33] (03CR) 10Manybubbles: [C: 032] Move wikitech to the new ldap server, ldap-eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163189 (owner: 10Andrew Bogott) [15:04:36] manybubbles: ok [15:04:39] (03Merged) 10jenkins-bot: Move wikitech to the new ldap server, ldap-eqiad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163189 (owner: 10Andrew Bogott) [15:04:54] manybubbles: I'll need to sync by hand on virt1000, please let me know when it's ready [15:05:02] andrewbogott: merged [15:05:03] ok, I guess it's already ready :) [15:05:31] !log manybubbles Synchronized wmf-config/wikitech.php: SWAT sync wikitech file - is a noop I believe (duration: 00m 05s) [15:05:34] manybubbles: want to merge https://gerrit.wikimedia.org/r/#/c/161262/ while you're at it? [15:05:36] Logged the message, Master [15:06:54] (03PS1) 10KartikMistry: Add initial Debian packaging [debs/contenttranslation/cg3] - 10https://gerrit.wikimedia.org/r/163579 [15:09:14] (03CR) 10Ottomata: "This should be parameterized via the puppet module, shouldn't it?" [puppet] - 10https://gerrit.wikimedia.org/r/163188 (owner: 10Manybubbles) [15:11:21] eek [15:11:27] https://wikitech.wikimedia.org/wiki/Special:Recentchanges [15:11:32] (03CR) 10Manybubbles: "Meh. Setting it to the same value on all of our Elasticsearch's won't hurt anything. Neither will parameterizing it. I'm lazy by nature" [puppet] - 10https://gerrit.wikimedia.org/r/163188 (owner: 10Manybubbles) [15:11:33] [aacd3e40] 2014-09-29 15:11:12: Fatal exception of type MWException [15:11:59] no localisation cache found [15:12:05] is the reason [15:12:08] for en [15:12:24] probably mid pull by andrewbogott [15:13:14] bd808: it's normal for there to be an outage during a sync-common? [15:13:15] l10n cdbs are built at the end of sync-common [15:13:23] that seems bad :) [15:13:24] _joe_: thoughts on fixing hieradata vs hiera in puppet.git vs private.git ? [15:13:38] andrewbogott: If you are changing branches and not getting scap pushes yeah it's going to happen [15:13:58] ok… seems back [15:14:00] When Sam rolls out a new branch he scaps once before changing wikiversions.json to avoid this [15:14:40] So the problem wikitech has right now is that it doesn't get warmed up by the initial scap on Thursday. [15:15:06] It would be optimal to figure out how to have scap update wikitech [15:15:59] (03CR) 10Reedy: [C: 031] "long live fenari" [puppet] - 10https://gerrit.wikimedia.org/r/163315 (owner: 10Dzahn) [15:16:48] (03CR) 10BryanDavis: [C: 031] "scap will seem so much faster (until codfw comes online...)" [puppet] - 10https://gerrit.wikimedia.org/r/163315 (owner: 10Dzahn) [15:17:19] !log manybubbles Synchronized php-1.25wmf1/extensions/Wikidata/: SWAT update wikidata to fix hhvm issues. (duration: 00m 14s) [15:17:21] aude: ^^ [15:17:25] thanks [15:17:26] Logged the message, Master [15:18:50] bah, memcached error [15:19:13] i don't think it's related to the patch though [15:20:13] (03PS1) 10Andrew Bogott: Add ferm rules to allow our labs dns server to actually serve. [puppet] - 10https://gerrit.wikimedia.org/r/163581 [15:22:10] manybubbles: thanks for the swat [15:22:29] andrewbogott: it's cool!
[15:22:34] gi11es: about to do yours [15:24:13] !log manybubbles Synchronized php-1.24wmf22/extensions/UploadWizard/: SWAT update UploadWizard (duration: 00m 05s) [15:24:18] Logged the message, Master [15:24:24] gi11es: and you should be live no [15:24:28] *now* [15:25:20] manybubbles: checking... [15:26:06] (03CR) 10Andrew Bogott: [C: 032] Add ferm rules to allow our labs dns server to actually serve. [puppet] - 10https://gerrit.wikimedia.org/r/163581 (owner: 10Andrew Bogott) [15:33:16] (03PS1) 10Andrew Bogott: Include base::firewall in labcontrol2001 [puppet] - 10https://gerrit.wikimedia.org/r/163587 [15:33:30] <_joe_> godog: uh? [15:36:09] manybubbles: still looking, so far I don't see the expected fix [15:37:12] _joe_: hiera.yaml states that private data is in /etc/puppet/private/hiera so hiera/ in private.git whereas it is hieradata/ in puppet.git [15:37:42] gi11es: hmmm - I just rechecked and verified that I did deploy the fix. let me keep poking [15:37:55] <_joe_> godog: there is a symlink AFAIR [15:38:00] manybubbles: the version page shows the old commit for UW [15:38:14] <_joe_> godog: nevermind, I got it backwards probably [15:38:14] e37096a [15:38:21] manybubbles: on commons [15:38:23] gi11es: the versions page doesn't always tell the truth, unfortunately [15:38:34] it's lame, but it doesn't [15:39:17] _joe_: so it should be hieradata/ in private.git too I think [15:39:49] <_joe_> godog: so change hiera.yaml [15:40:03] <_joe_> or, we make a wee change to nuyaml.rb [15:40:13] Is Jenkins sick? [15:40:48] I'll change hiera.yaml [15:40:51] <_joe_> godog: change hiera.yaml [15:41:31] hashar: how's Jenkins? [15:42:20] gi11es: I verified the code is deployed on mw1001 - let me check another node [15:43:42] (03CR) 10BryanDavis: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/163587 (owner: 10Andrew Bogott) [15:44:32] manybubbles: I've tested further and it might be that I need to resave the UW campaign, which I don't have rights for on either beta or commons [15:45:05] gi11es: is it something you can force with eval.php on a misc box? [15:45:19] andrewbogott: looks like zuul and jenkins forgot to talk to each other on your patch? https://integration.wikimedia.org/zuul/ shows it waiting for the puppetlint jobs but they are both showing as finished in jenkins [15:45:24] andrewbogott: what do you mean? [15:45:48] manybubbles: let's just assume it worked and I'll get an interested party with rights to verify the fix before the next SWAT window [15:45:59] gi11es: cool. [15:46:04] cscott: I'm done [15:46:06] hashar: um… whatever bd808 just said? [15:46:12] !log Zuul lost all Jenkins jobs :( [15:46:18] Logged the message, Master [15:46:31] probably because I am documenting how to debug / fix it :] [15:46:39] (03PS1) 10Filippo Giunchedi: hiera: use hieradata/ in private.git [puppet] - 10https://gerrit.wikimedia.org/r/163588 [15:47:19] "restart 7 times while walking widdershins around a mulberry bush under a new moon" [15:47:36] that is the Gearman server being locked :/ [15:48:01] manybubbles: thanks. [15:48:20] my deploy commit isn't quite ready yet. but i'm working on it. [15:48:35] (03CR) 10Andrew Bogott: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/163587 (owner: 10Andrew Bogott) [15:48:36] manybubbles: https://wikitech.wikimedia.org/wiki/Deployments#Monday.2C.C2.A0September.C2.A029 can you do my one as well? [15:49:00] Glaisher: I just ceded the floor to cscott. looks like he's not ready yet.
let me read [15:49:24] Glaisher: sorry I missed you earlier [15:49:28] manybubbles: yeah, sure, go ahead. i'll poke you before i'm ready. [15:50:13] _joe_: no nevermind, I think 163588 is wrong [15:50:41] (03PS2) 10Andrew Bogott: Include base::firewall in labcontrol2001 [puppet] - 10https://gerrit.wikimedia.org/r/163587 [15:51:45] Glaisher: let me read the collection extension to make sure it's just something I can turn on like that. I haven't done it before and don't want to break anything [15:52:17] no worries [15:52:47] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.126666666667 [15:53:20] !log Zuul jobs reregistered [15:53:25] Logged the message, Master [15:53:54] critical! bah [15:54:21] (03CR) 10Andrew Bogott: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/163587 (owner: 10Andrew Bogott) [15:54:35] andrewbogott: Zuul happy again [15:54:53] thx [15:56:15] (03CR) 10Manybubbles: [C: 032] Enable Collection extension on svwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163331 (https://bugzilla.wikimedia.org/64994) (owner: 10Glaisher) [15:56:23] (03Merged) 10jenkins-bot: Enable Collection extension on svwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163331 (https://bugzilla.wikimedia.org/64994) (owner: 10Glaisher) [15:56:25] (03CR) 10Andrew Bogott: [C: 032] Include base::firewall in labcontrol2001 [puppet] - 10https://gerrit.wikimedia.org/r/163587 (owner: 10Andrew Bogott) [15:57:24] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT enable collection extension svwikiversity (duration: 00m 06s) [15:57:26] Glaisher: ^^^ [15:57:32] Logged the message, Master [15:59:00] https://sv.wikiversity.org/wiki/Special:Bok seems to be working [15:59:04] thanks [15:59:37] PROBLEM - puppet last run on labcontrol2001 is CRITICAL: Timeout while attempting connection [15:59:48] PROBLEM - LDAP on labcontrol2001 is CRITICAL: Connection timed out [15:59:48] PROBLEM - check configured eth on labcontrol2001 is CRITICAL: Timeout while attempting connection [16:00:07] PROBLEM - DPKG on labcontrol2001 is CRITICAL: Timeout while attempting connection [16:00:07] PROBLEM - LDAPS on labcontrol2001 is CRITICAL: Connection timed out [16:00:08] PROBLEM - RAID on labcontrol2001 is CRITICAL: Timeout while attempting connection [16:00:08] PROBLEM - check if dhclient is running on labcontrol2001 is CRITICAL: Timeout while attempting connection [16:00:27] PROBLEM - Disk space on labcontrol2001 is CRITICAL: Timeout while attempting connection [16:01:06] I wonder when we can just enable Collection everywhere... [16:01:09] Glaisher: wonderful! [16:01:23] Reedy: why don't we? I just read up on it and didn't see anything funny about it [16:01:32] I'm not sure [16:01:35] cscott: I'm done [16:01:37] collection for wikidata?
[16:01:37] RECOVERY - check configured eth on labcontrol2001 is OK: NRPE: Unable to read output [16:01:54] Collection for Wikidata was pretty buggy [16:01:56] I remember testing it :) [16:01:58] would be nice, maybe, but doubt it works correctly [16:02:00] RECOVERY - DPKG on labcontrol2001 is OK: All packages OK [16:02:01] RECOVERY - check if dhclient is running on labcontrol2001 is OK: PROCS OK: 0 processes with command name dhclient [16:02:01] RECOVERY - RAID on labcontrol2001 is OK: OK: no disks configured for RAID [16:02:11] It's buggy in some languages I think but for the majority, it should be useable [16:02:18] RECOVERY - Disk space on labcontrol2001 is OK: DISK OK [16:02:21] buggy in chinese [16:02:27] RECOVERY - puppet last run on labcontrol2001 is OK: OK: Puppet is currently enabled, last run 202 seconds ago with 0 failures [16:02:39] * Reedy starts a bug [16:03:21] Is the new backend any better for zh? [16:03:42] no idea [16:04:03] but collection is the magic reason that $wgEnableSidebarCache was enabled there and disabled elsewhere [16:04:09] RECOVERY - LDAPS on labcontrol2001 is OK: TCP OK - 0.046 second response time on port 636 [16:04:10] lack of collection there [16:04:37] RECOVERY - LDAP on labcontrol2001 is OK: TCP OK - 0.044 second response time on port 389 [16:05:39] https://bugzilla.wikimedia.org/show_bug.cgi?id=71416 [16:07:37] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [16:09:00] (03PS2) 10Filippo Giunchedi: hiera: use hieradata everywhere [puppet] - 10https://gerrit.wikimedia.org/r/163588 [16:09:47] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: puppet fail [16:09:55] _joe_: https://gerrit.wikimedia.org/r/#/c/163588/ [16:11:36] cscott: When you're not so busy https://bugzilla.wikimedia.org/show_bug.cgi?id=71416 [16:17:08] <_joe_> godog: seems legit, remember to remove the old symlink on the masters, though [16:17:17] (03CR) 10Giuseppe Lavagetto: [C: 031] hiera: use hieradata everywhere [puppet] - 10https://gerrit.wikimedia.org/r/163588 (owner: 10Filippo Giunchedi) [16:17:34] <_joe_> I'm off for now, ttyl [16:17:44] bye _joe_ [16:17:55] <_joe_> ori: did you sleep at all? [16:18:17] <_joe_> ori: see you @mwcore anyways :) [16:18:20] maybe! [16:18:23] nod [16:27:39] (03PS1) 10Ori.livneh: Labs: Update IP of jobqueue's redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163595 (https://bugzilla.wikimedia.org/71415) [16:27:57] (03PS2) 10Ori.livneh: Labs: Update IP of jobqueue's redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163595 (https://bugzilla.wikimedia.org/71415) [16:28:03] (03CR) 10Ori.livneh: [C: 032] Labs: Update IP of jobqueue's redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163595 (https://bugzilla.wikimedia.org/71415) (owner: 10Ori.livneh) [16:28:08] (03Merged) 10jenkins-bot: Labs: Update IP of jobqueue's redis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163595 (https://bugzilla.wikimedia.org/71415) (owner: 10Ori.livneh) [16:28:57] Dibs 16:00 SWAT [16:28:57] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:34:24] (03CR) 10Alexandros Kosiaris: [C: 032] openldap module [puppet] - 10https://gerrit.wikimedia.org/r/156322 (owner: 10Alexandros Kosiaris) [16:37:17] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). 
[16:37:45] (03CR) 10Dzahn: [C: 04-1] "missing newline, looks like it'd break it" (031 comment) [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/163545 (owner: 10KartikMistry) [16:38:25] (03PS1) 10Andrew Bogott: Include libnet-dns-perl with ferm. It's needed for certain ferm functions. [puppet] - 10https://gerrit.wikimedia.org/r/163597 [16:41:01] (03CR) 10Dzahn: [C: 031] "looks reasonable to me, in general i would trust kartik knows what he's doing with Debian packaging, he's a Debian dev since years (https:" [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/163548 (owner: 10KartikMistry) [16:42:38] (03PS1) 10BBlack: Fix icinga network checks [puppet] - 10https://gerrit.wikimedia.org/r/163598 [16:43:17] PROBLEM - LDAPS on labcontrol2001 is CRITICAL: Connection refused [16:43:48] PROBLEM - LDAP on labcontrol2001 is CRITICAL: Connection refused [16:44:27] Who knows about ferm and/or base::firewall? [16:47:21] RECOVERY - LDAPS on labcontrol2001 is OK: TCP OK - 0.043 second response time on port 636 [16:47:58] RECOVERY - LDAP on labcontrol2001 is OK: TCP OK - 0.043 second response time on port 389 [16:48:39] (03PS3) 10Filippo Giunchedi: hiera: use hieradata everywhere [puppet] - 10https://gerrit.wikimedia.org/r/163588 [16:48:51] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] hiera: use hieradata everywhere [puppet] - 10https://gerrit.wikimedia.org/r/163588 (owner: 10Filippo Giunchedi) [16:48:55] (03CR) 10Mark Bergsma: "Can someone do log analysis on how often this fallback is used? Would be interesting to compare that to Zend actually..." [puppet] - 10https://gerrit.wikimedia.org/r/163555 (owner: 10Giuseppe Lavagetto) [16:50:17] (03CR) 10BBlack: [C: 032] "This may cause some false alerts, seeing how it's probably been years since they were last on and they could be outdated..." [puppet] - 10https://gerrit.wikimedia.org/r/163598 (owner: 10BBlack) [16:50:23] (03PS2) 10BBlack: Fix icinga network checks [puppet] - 10https://gerrit.wikimedia.org/r/163598 [16:50:35] (03CR) 10BBlack: [V: 032] Fix icinga network checks [puppet] - 10https://gerrit.wikimedia.org/r/163598 (owner: 10BBlack) [16:51:02] (03PS1) 10BBlack: remove pmtpa network monitor defs [puppet] - 10https://gerrit.wikimedia.org/r/163600 [16:51:16] (03CR) 10BBlack: [C: 032 V: 032] remove pmtpa network monitor defs [puppet] - 10https://gerrit.wikimedia.org/r/163600 (owner: 10BBlack) [16:51:18] (03PS2) 10KartikMistry: Add .gitreview file [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/163545 [16:51:45] (03CR) 10KartikMistry: Add .gitreview file (031 comment) [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/163545 (owner: 10KartikMistry) [16:53:23] (03CR) 10Dzahn: "oops, heh, the project= line is duplicate" [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/163545 (owner: 10KartikMistry) [16:54:01] (03PS3) 10KartikMistry: Add .gitreview file [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/163545 [16:54:30] (03CR) 10Dzahn: [C: 032] Add .gitreview file [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/163545 (owner: 10KartikMistry) [16:54:56] (03CR) 10Dzahn: [V: 032] Add .gitreview file [debs/contenttranslation/lttoolbox] - 10https://gerrit.wikimedia.org/r/163545 (owner: 10KartikMistry) [16:55:05] Look, I need a break from packaging. Making silly mistakes :) [16:55:36] I just "fixed" some network checks in puppet that seemed accidentally disabled for a long time, and their definitions could have issues.
They'll pop up in icinga in an hour, and I wouldn't be shocked if some of them alert at that point. [16:55:49] So, don't freak out an hour from now if some network equipment alerts show up here [16:59:44] mutante: Wait, did you actually take RT this week or was it me? [16:59:47] * Coren is confused. [17:01:06] Coren: mutante did last week at least [17:01:17] akosiaris: can I get a hand with ferm? [17:01:31] Then that should really be me then; though I expected that I had the Phab first week but that's been shuffled around [17:02:32] (03PS1) 10Andrew Bogott: Remove base::firewall from labcontrol2001 [puppet] - 10https://gerrit.wikimedia.org/r/163603 [17:02:47] PROBLEM - puppet last run on pc1001 is CRITICAL: CRITICAL: Puppet has 1 failures [17:03:39] (03PS1) 10Andrew Bogott: Revert "Include base::firewall in labcontrol2001" [puppet] - 10https://gerrit.wikimedia.org/r/163604 [17:03:45] (03Abandoned) 10Andrew Bogott: Remove base::firewall from labcontrol2001 [puppet] - 10https://gerrit.wikimedia.org/r/163603 (owner: 10Andrew Bogott) [17:04:39] (03CR) 10Andrew Bogott: [C: 032] Revert "Include base::firewall in labcontrol2001" [puppet] - 10https://gerrit.wikimedia.org/r/163604 (owner: 10Andrew Bogott) [17:04:53] ori: I think I found an HHVM bug: I cannot use Special:CreateAccount when it is enabled. [17:05:08] ori: I get this... [17:05:09] Request: POST http://en.wikipedia.org/w/index.php?title=Special:UserLogin&action=submitlogin&type=signup, from 10.128.0.116 via cp1052 cp1052 ([10.64.32.104]:3128), Varnish XID 539434733 [17:05:27] Forwarded for: 67.161.126.166, 10.128.0.110, 10.128.0.110, 10.128.0.116 [17:05:27] Error: 503, Service Unavailable at Mon, 29 Sep 2014 17:01:59 GMT [17:05:28] (03PS1) 10Rush: phab update for T171 [puppet] - 10https://gerrit.wikimedia.org/r/163605 [17:05:50] ragesoss: 1) are you logged in? (i.e., are you creating an account for someone else?), 2) can you try again? [17:05:56] ragesoss: yes. [17:06:03] yes, logged in. [17:06:05] (03PS2) 10Rush: phab update for T171 [puppet] - 10https://gerrit.wikimedia.org/r/163605 [17:06:19] ragesoss: i think it is already reported [17:06:20] * ori tries to reproduce. [17:06:21] ori: I tried twice and failed, then disabled HHVM and it worked. [17:06:30] then enabled again, and it did not. [17:07:56] ragesoss: i think i see it in the log; investigating. [17:08:24] (03CR) 10BryanDavis: "Poke. Waiting period should be long over." [puppet] - 10https://gerrit.wikimedia.org/r/162128 (owner: 10Reedy) [17:08:53] (03CR) 10Rush: [C: 04-1] These changes add the "extension" Sprint. The implementation is actually as a libphutil library. It can be enabled with the setting "load- (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/162873 (owner: 10Christopher Johnson (WMDE)) [17:09:27] (03CR) 10Rush: "Also, please submit a diff to add this to the 'testing' labs instance or just otherwise please link to where this is being tested now so I" [puppet] - 10https://gerrit.wikimedia.org/r/162873 (owner: 10Christopher Johnson (WMDE)) [17:09:28] PROBLEM - puppet last run on pc1002 is CRITICAL: CRITICAL: Puppet has 1 failures [17:09:47] Error: Could not find any host matching 'ms-be2012' (config file '/etc/icinga/puppet_services.cfg', starting on line 95855) [17:09:56] ^ someone broke icinga again!
[17:10:16] might be a missing codfw icinga group or something [17:10:27] yeah no idea [17:10:37] filippo has been working on installing swift boxes in codfw of course [17:10:44] but I just did an icinga puppet run + adding services within the past few hours successfully, so it's something new [17:10:47] so perhaps merely installing those has triggered it [17:10:51] probably [17:10:52] hm [17:11:41] (03CR) 10Rush: [C: 032] phab update for T171 [puppet] - 10https://gerrit.wikimedia.org/r/163605 (owner: 10Rush) [17:12:08] heh we have 178K lines in puppet_services.cfg [17:12:36] (03PS1) 10Cscott: Switch default PDF renderer to OCG. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163608 [17:12:38] (03PS1) 10Cscott: Disable the old mwlib PDF render service. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163609 [17:12:52] congratulations, you beat core's InitialiseSettings.php, which is only 14k [17:13:08] <_joe_> ours is autogenerated [17:13:16] <_joe_> and, it's not php [17:13:21] <_joe_> so you win [17:13:32] even 14 lines of php is too many [17:13:45] (03CR) 10Rush: [C: 04-1] "I don't think this works since we don't terminate SSL at the phabricator box?" [puppet] - 10https://gerrit.wikimedia.org/r/162805 (https://bugzilla.wikimedia.org/38516) (owner: 10Chmarkine) [17:13:47] (03CR) 10Cscott: "To be deployed in Parsoid/Services window later today." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163608 (owner: 10Cscott) [17:14:20] (03CR) 10Cscott: "Don't deploy this yet; we'll do this on Wednesday assuming OCG is looking good." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163609 (owner: 10Cscott) [17:14:48] PROBLEM - puppet last run on pc1003 is CRITICAL: CRITICAL: Puppet has 1 failures [17:15:00] actually I think the monitoring thing might be a timing race, I'm checking now [17:15:42] (timing re: host executing puppet, master executing puppet, neon executing puppet, somehow ending up with service but not host def) [17:17:45] <_joe_> bblack: may have something to do with our "do not show new hosts in naggen" patch [17:18:17] yeah, that's kinda what I'm thinking. Since it does a 1H window on both, but perhaps their creation stamps are spaced out by the length of a puppet client run [17:18:52] (03PS1) 10Rush: phab remove obsolete option [puppet] - 10https://gerrit.wikimedia.org/r/163611 [17:19:09] (03PS2) 10Rush: phab remove obsolete option [puppet] - 10https://gerrit.wikimedia.org/r/163611 [17:19:18] but if so, me re-running puppet on palladium then neon should clear it up [17:19:24] and it did not [17:19:25] bblack: yeah, likely my codfw patch from earlier today, is that the only error or does it just stop there? [17:19:37] it just stops there unfortunately. probably all the others are similar [17:19:40] (03CR) 10Rush: "on second thought maybe it will, are we doing this anywhere else w/ a service behind misc-web?" [puppet] - 10https://gerrit.wikimedia.org/r/162805 (https://bugzilla.wikimedia.org/38516) (owner: 10Chmarkine) [17:20:14] <_joe_> bblack: seems like the service defs are in place, but not the host def [17:20:39] (03CR) 10Rush: [C: 032] phab remove obsolete option [puppet] - 10https://gerrit.wikimedia.org/r/163611 (owner: 10Rush) [17:20:59] right [17:21:25] oh!
perhaps the service defs stayed from the initial deploy -> revert -> repatch, but the hosts did not [17:21:36] thus making a much larger diff in their 1H delay points [17:22:17] I dunno, for now I'm gonna wait it out and see if it fixes itself after delay [17:28:15] bblack: ack, let me know if I can help [17:28:37] (03PS1) 10KartikMistry: Add .gitreview [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/163614 [17:29:12] that whole 1 hour delay thing is kinda hacky anyways. it would still be better to find a way to add everything instantly, but mark them in downtime for their first hour. [17:29:43] (03PS2) 10Rush: Update Phabricator footer (License, Terms of Use) [puppet] - 10https://gerrit.wikimedia.org/r/162219 (owner: 10Aklapper) [17:29:49] (03CR) 10Rush: [C: 032 V: 032] Update Phabricator footer (License, Terms of Use) [puppet] - 10https://gerrit.wikimedia.org/r/162219 (owner: 10Aklapper) [17:31:53] mh I guess no easy way to detect whether we're adding a host for the first time? or perhaps a host can be distinctly added but not "armed" [17:32:44] someone (I think alex) sorted out a script to mark a host in downtime for an hour from the commandline, on the icinga host (neon) [17:33:11] but the generation of the data happens over on the puppetmaster, and it's only there that we can (easily) tell what's a new monitored resource and what already existed, in scriptable terms. [17:33:23] if it has a puppet cert but the hostname is not in site.pp ? [17:33:31] (because it gets the default stuff) [17:34:14] bblack: this is what we need I think https://github.com/zorkian/nagios-api [17:34:18] yeah but if it's not in site.pp, it probably doesn't have service checks either [17:35:24] the other part of that equation (even if we used e.g. nagios-api to mark downtime from palladium when the monitor defs were regenerated by naggen2), is that at the time they're regenerated on palladium they haven't yet been added to neon [17:35:40] which happens on its next puppet run. so the downtime command would be for a resource that doesn't exist yet, and I don't think that would work. [17:37:10] we could simply undo the 1H thing and go back to how things were before: they worked sanely, but we get alerts on new hosts commonly because icinga's checking stuff before you've sorted out the mess from the first puppet run and such. [17:37:28] (ideally, the first puppet run wouldn't be a mess) [17:37:52] yea, you can only schedule downtimes on hosts that icinga already knows [17:38:08] heh, from what I've seen it is much more work to make puppet work the first time than let it converge [17:38:40] at the moment the icinga config for the host is generated, it should also exec that command to set a downtime.. [17:38:51] (03PS2) 10Alexandros Kosiaris: Introduce role::openldap::corp [puppet] - 10https://gerrit.wikimedia.org/r/163184 [17:39:11] ragesoss: https://bugzilla.wikimedia.org/show_bug.cgi?id=71421 [17:39:14] i'll follow up there [17:39:33] yeah perhaps, if we had something running via neon's puppet defs, which could look at the to-be-applied (or just-recently-applied) diff and parse it [17:40:00] manybubbles, greg-g: ok, starting the OCG deploy [17:40:12] took me a while to get the deploy commit in order [17:41:25] (03CR) 10Alexandros Kosiaris: [C: 032] Introduce role::openldap::corp [puppet] - 10https://gerrit.wikimedia.org/r/163184 (owner: 10Alexandros Kosiaris) [17:41:33] hmm -- "Host bastion.wmflabs.org not found: 3(NXDOMAIN)" what's up with that?
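A sketch of the "new hosts start in downtime" idea bblack and mutante circle around above. It would have to run on neon after icinga has loaded the new host definition (per bblack's point, the command targets a resource icinga must already know about), and it writes a SCHEDULE_HOST_DOWNTIME external command to icinga's command pipe; the pipe path is an assumption, not the cluster's actual configuration:

    #!/usr/bin/env python
    """Hypothetical: put a freshly added host into downtime for its first hour."""
    import sys
    import time

    CMD_FILE = '/var/lib/icinga/rw/icinga.cmd'  # assumed command pipe location

    def schedule_host_downtime(host, duration=3600):
        now = int(time.time())
        # SCHEDULE_HOST_DOWNTIME;<host>;<start>;<end>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
        cmd = '[%d] SCHEDULE_HOST_DOWNTIME;%s;%d;%d;1;0;%d;naggen;new host burn-in\n' % (
            now, host, now, now + duration, duration)
        with open(CMD_FILE, 'w') as pipe:
            pipe.write(cmd)

    if __name__ == '__main__':
        schedule_host_downtime(sys.argv[1])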
[17:42:15] that's from 8.8.8.8, although 8.8.4.4 still seems to know about it [17:42:28] hmmm [17:42:39] try bastion-eqiad.wmflabs.org [17:42:45] dunno why bastion would be gone though [17:43:06] could be one wonky google dns server, they are anycasted AFAIK. but very strange. [17:43:22] no, it's us [17:43:35] ns0-labs and ns1-labs are giving different answers [17:44:06] if bastion's address is supposed to exist, you should shut down name service on labs-n1 until you fix it. [17:44:15] * labs-ns1 [17:46:35] (03PS1) 10Alexandros Kosiaris: openldap_corp_mirror in ganglia clusters [puppet] - 10https://gerrit.wikimedia.org/r/163621 [17:47:13] (03PS1) 10Rush: phab stop trying to call outbound via http [puppet] - 10https://gerrit.wikimedia.org/r/163622 [17:47:28] !log stopped powerdns and disabled puppet on virt1000 to prevent further cache pollution w/ bad data in public caches [17:47:32] Logged the message, Master [17:47:47] (03CR) 10Alexandros Kosiaris: [C: 032] openldap_corp_mirror in ganglia clusters [puppet] - 10https://gerrit.wikimedia.org/r/163621 (owner: 10Alexandros Kosiaris) [17:48:11] (03PS2) 10Rush: phab stop trying to call outbound via http [puppet] - 10https://gerrit.wikimedia.org/r/163622 [17:48:45] (03PS2) 10Rush: phab stop trying to call outbound via http [puppet] - 10https://gerrit.wikimedia.org/r/163622 [17:48:52] (03CR) 10Rush: [C: 032 V: 032] phab stop trying to call outbound via http [puppet] - 10https://gerrit.wikimedia.org/r/163622 (owner: 10Rush) [17:48:57] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [17:49:07] (03PS1) 10Andrew Bogott: Move ldap ferm rules to the ldap module [puppet] - 10https://gerrit.wikimedia.org/r/163623 [17:49:41] (03CR) 10MaxSem: [C: 031] "Related is https://gerrit.wikimedia.org/r/162520 which kills skins-1.5 completely." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163454 (https://bugzilla.wikimedia.org/71385) (owner: 10Ori.livneh) [17:49:56] chrismcmahon: and mobile beta labs is totally broken: http://en.m.wikipedia.beta.wmflabs.org/ [17:50:04] ori: ^ [17:50:55] (03PS1) 10Alexandros Kosiaris: fix typo (install-certificate -> install_certificate) [puppet] - 10https://gerrit.wikimedia.org/r/163625 [17:51:13] so, labs-ns1 and labs-ns0 disagree? [17:51:37] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] fix typo (install-certificate -> install_certificate) [puppet] - 10https://gerrit.wikimedia.org/r/163625 (owner: 10Alexandros Kosiaris) [17:52:49] (03PS2) 10Cscott: Switch default PDF renderer to OCG. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163608 [17:52:51] (03PS2) 10Cscott: Disable the old mwlib PDF render service. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163609 [17:52:57] yay! [17:53:50] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.086 seconds response time.
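The disagreement andrewbogott spots here (labs-ns0 and labs-ns1 returning different answers) is easy to confirm by querying each server directly and diffing the results. A sketch using the third-party dnspython package; the nameserver IPs are placeholders from the documentation address range, not the real ones:

    #!/usr/bin/env python
    """Hypothetical split-brain check: do both labs nameservers agree?"""
    import dns.resolver  # pip install dnspython (>= 2.0 for resolve())

    NAMESERVERS = {'labs-ns0': '192.0.2.10', 'labs-ns1': '192.0.2.11'}  # placeholders

    def lookup(ns_ip, name):
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [ns_ip]
        try:
            return sorted(rr.address for rr in resolver.resolve(name, 'A'))
        except dns.resolver.NXDOMAIN:
            return ['NXDOMAIN']

    answers = {ns: lookup(ip, 'bastion.wmflabs.org') for ns, ip in NAMESERVERS.items()}
    print(answers)
    if len(set(tuple(a) for a in answers.values())) > 1:
        print('nameservers disagree; one is serving stale or missing data')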
nagiostest.beta.wmflabs.org returns 208.80.155.135 [17:55:38] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [17:56:42] (03PS2) 10Andrew Bogott: Move ldap ferm rules to the ldap module [puppet] - 10https://gerrit.wikimedia.org/r/163623 [17:58:05] !log ori Synchronized php-1.24wmf22/includes/password/Pbkdf2Password.php: I3b0a1de69: Test for string in Pbkdf2Password::crypt() (duration: 00m 05s) [17:58:09] Logged the message, Master [17:58:35] (03PS2) 10Ori.livneh: Update symlinks in w/ from /u/l/a/common-local to /srv/mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163454 (https://bugzilla.wikimedia.org/71385) [17:58:48] (03CR) 10Ori.livneh: [C: 032] Update symlinks in w/ from /u/l/a/common-local to /srv/mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163454 (https://bugzilla.wikimedia.org/71385) (owner: 10Ori.livneh) [17:58:53] (03Merged) 10jenkins-bot: Update symlinks in w/ from /u/l/a/common-local to /srv/mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163454 (https://bugzilla.wikimedia.org/71385) (owner: 10Ori.livneh) [17:59:35] !log updated OCG to version 89d8f29a24295b05d0643abe976fea83b56575c9 [17:59:40] Logged the message, Master [18:00:08] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [18:00:21] (03PS1) 10Alexandros Kosiaris: specify the correct password class [puppet] - 10https://gerrit.wikimedia.org/r/163628 [18:01:14] (03CR) 10Ori.livneh: [C: 031] Remove live-1.5 and skins-1.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162520 (owner: 10MaxSem) [18:01:35] (03CR) 10Alexandros Kosiaris: [C: 032] specify the correct password class [puppet] - 10https://gerrit.wikimedia.org/r/163628 (owner: 10Alexandros Kosiaris) [18:04:10] (03PS3) 10Andrew Bogott: Move ldap ferm rules to the ldap module [puppet] - 10https://gerrit.wikimedia.org/r/163623 [18:05:04] (03CR) 10Andrew Bogott: [C: 032] Move ldap ferm rules to the ldap module [puppet] - 10https://gerrit.wikimedia.org/r/163623 (owner: 10Andrew Bogott) [18:08:48] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [18:22:30] whoever is around: I'm taking a sick day today, so not really responsive. Ask chrismcmahon or James_F|Away :) [18:23:00] hope you feel better greg-g [18:23:04] +1 [18:23:25] (03CR) 10Chmarkine: "I think this should work, because HSTS is just a HTTP header field, rather than part of the SSL handshake. Currently we have HSTS enabled " [puppet] - 10https://gerrit.wikimedia.org/r/162805 (https://bugzilla.wikimedia.org/38516) (owner: 10Chmarkine) [18:23:58] thanks [18:24:06] * greg-g goes back to his tea [18:24:19] (that's all I can handle right now :/ ) [18:29:31] YuviPanda: are you working today? [18:29:53] andrewbogott: not really, no :( I just had a bunch of meetings, and about to go out. next 2 days in flights [18:30:14] YuviPanda: ok. You have ~1,000,000 self-hosted labs instances that are in danger of falling off of ldap [18:30:19] Are you at least informed about that? [18:30:27] 1 million? [18:30:34]
Otherwise I'll feel compelled to fix them myself :)
[18:31:17] andrewbogott: hmm, I can't seem to connect to bastion
[18:31:41] YuviPanda: yeah, there was a dns outage, you probably have a bad cache
[18:31:44] andrewbogott: haha. let me mark them as 'can die' and 'should fix'
[18:31:57] Is it OK if I explicitly delete the 'can die' ones?
[18:32:04] Or is that a 'can die but I'd rather didn't'?
[18:32:35] YuviPanda: flying to US or back to India?
[18:32:41] andrewbogott: India
[18:32:51] andrewbogott: yes, you can explicitly delete the 'can die' ones
[18:32:57] cool, thanks.
[18:33:00] andrewbogott: saved edit
[18:34:32] andrewbogott_afk: still can't login, and I've to go now :( would be awesome if you can fix the 'will fix' ones, but I'll try to try later tonight or tomorrow (not that high probability, though)
[18:34:33] sorry
[18:35:51] YuviPanda: no problem -- safe travels!
[18:36:55] andrewbogott: cool. added 2 more hosts to 'can die' :)
[18:36:56] * YuviPanda waves
[18:37:11] greg-g: Get well!
[18:59:30] (PS3) Dzahn: Add reedy to logstash-roots [puppet] - https://gerrit.wikimedia.org/r/162128 (owner: Reedy)
[19:00:19] (CR) Dzahn: [C: 2] "acked in ops meeting" [puppet] - https://gerrit.wikimedia.org/r/162128 (owner: Reedy)
[19:00:29] bd808|LUNCH: ^^
[19:01:00] akosiaris: ok… my first concern is that when I applied a ferm rule to a new server, that did not result in ferm actually being installed on the box. Is there a reason why that's on purpose, or shall I add a dependency in puppet?
[19:01:04] (CR) Dzahn: [V: 2] Add reedy to logstash-roots [puppet] - https://gerrit.wikimedia.org/r/162128 (owner: Reedy)
[19:02:18] andrewbogott: so, ferm::rule creates a virtual resource that gets realized by the ferm class
[19:02:26] and ferm::service as well
[19:02:55] so you need to include the ferm class as well
[19:03:08] ok. Would that generally happen via base::firewall?
[19:03:09] now that is the technical part of the issue but !!!!
[19:03:15] you have to include base::firewall if you want to use some of the network constants
[19:03:22] exactly, base::firewall is the one to include!
[19:03:24] and base::firewall will block everything you don't tell it to
[19:03:37] you don't tell it *not* to
[19:03:37] ok… that may be ok.
[19:03:49] Let me add that and see where I get next...
[19:04:19] (PS1) Andrew Bogott: Revert "Revert "Include base::firewall in labcontrol2001"" [puppet] - https://gerrit.wikimedia.org/r/163643
[19:04:51] (PS2) Andrew Bogott: Revert "Revert "Include base::firewall in labcontrol2001"" [puppet] - https://gerrit.wikimedia.org/r/163643
[19:05:22] one step back :-(
[19:05:55] mutante: Thanks! Reedy, get to work. :)
[19:06:09] * Reedy looks at logstash1001
[19:06:10] 125 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.
[19:06:49] (CR) Andrew Bogott: [C: 2] Revert "Revert "Include base::firewall in labcontrol2001"" [puppet] - https://gerrit.wikimedia.org/r/163643 (owner: Andrew Bogott)
[19:07:46] akosiaris: ok, next question… ldap uses ports 389 and 636. But the ferm rule that's in puppet says 1389 and 1636.
[19:07:50] Any idea why?
[19:07:51] Reedy: One thing that cluster needs is an elasticsearch upgrade. manybubbles may have a handy script to do a rolling cluster upgrade/restart.
[19:08:08] bd808: we should presumably get it 14.04'd too
[19:09:03] bd808 and Reedy: yeah - I have a script I use. It just runs on my laptop
[19:09:47] Reedy: Hmmm.. I haven't tested at all on 14.04.
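The ferm exchange above compresses a real Puppet subtlety: ferm::rule and ferm::service only declare virtual resources, and nothing realizes them (or installs the ferm package) until base::firewall pulls in the ferm class. A minimal shell sketch for checking on the affected host that the include actually took effect, assuming sudo access; iptables -nxvL is the same listing akosiaris suggests a little further on:

    sudo puppet agent -tv            # apply the catalog that now includes base::firewall
    dpkg -l ferm                     # the ferm package should now be installed
    sudo iptables -nxvL | head -40   # and the generated ruleset should be loaded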
It should work but we'd need to have the logstash package for 14.04 added to our apt server. Probably better done in beta first.
[19:11:00] elasticsearch (1.1.0 => 1.3.2)
[19:12:04] yeah... I've been lazy
[19:12:34] akosiaris: nevermind, now that base::firewall is applied I have ldap access.
[19:12:40] But not ssh :)
[19:14:37] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 116.199997
[19:14:44] manybubbles: bd808 can we "just" essentially upgrade elasticsearch
[19:14:52] Reedy: I think https://wikitech.wikimedia.org/wiki/Search#Rolling_restarts is the basic process for upgrading the elasticsearch cluster for logstash. Uncomment the apt-get lines
[19:15:09] bd808: that is indeed my script
[19:15:23] Reedy: You just want to only do a single node at a time and wait for the cluster to heal between.
[19:15:26] akosiaris: so the next question is… why is 53 still closed? (ssh being closed is probably a good thing)
[19:15:48] Reedy: I'm not a logstash expert but very likely
[19:15:49] andrewbogott: host ? labcontrol2001 ?
[19:15:52] yep
[19:16:11] this might just come down to "andrew can't read puppet" but I'm pretty sure it's right...
[19:16:24] Reedy: beta is already running 1.3.2 it has just never been updated in prod so it should be safe
[19:16:24] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1268.866699
[19:16:37] I guess we don't need to wget and dpkg install the deb?
[19:16:39] shouldn't role::dns::ldap be opening 53?
[19:17:35] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[19:17:42] andrewbogott: it is
[19:17:55] iptables -nxvL
[19:18:02] Reedy: No. What is in apt should be good. Nik does the wget stuff when he's updating in advance of adding the new package to apt.
[19:18:02] last 2 rules are port 53
[19:18:28] tcp dpt:domain ?
[19:18:42] yeah domain in /etc/services is 53
[19:18:52] ah, ok then.
[19:19:02] So… I guess that just means that my dns server must be broken
[19:19:06] since clearly no one is listening at 53
[19:19:27] bd808: part of me wants to jfdi :)
[19:19:36] akosiaris: so maybe I never had firewall problems in the first place
[19:19:56] andrewbogott: yeah, no dns server runs on labcontrol2001
[19:20:11] 27965 1 0 16:57 ? 00:00:02 /usr/sbin/pdns_server --daemon --guardian=yes
[19:20:31] (PS1) Rush: phab email pipe needs to be executable [puppet] - https://gerrit.wikimedia.org/r/163645
[19:20:41] (PS2) Rush: phab email pipe needs to be executable [puppet] - https://gerrit.wikimedia.org/r/163645
[19:20:48] indeed... restart it ?
[19:20:51] Reedy: It's ok with me if you do. The last time I did it I just warned here that it was going to happen.
[19:20:57] "Fatal error: Trying to set unexisting parameter 'wildcards'" <- that could be the problem
[19:21:01] although I've no idea what that means
[19:21:02] lsof -p 27965 says indeed no socket is open
[19:21:13] hmmm
[19:21:30] http://p.defau.lt/?Qo7GHDI9mpRw6FLS3_tROg
[19:21:36] smells like configuration from precise is not working on trusty andrewbogott
[19:21:44] yep, that's probably it.
[19:22:51] (CR) Rush: [C: 2] phab email pipe needs to be executable [puppet] - https://gerrit.wikimedia.org/r/163645 (owner: Rush)
[19:23:23] Reedy: LGTM
[19:25:06] bd808: presumably the rest of the packages should be good to go too (including kernel updates), restarting them all one at a time, and waiting for them to heal?
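The "is anything listening on 53" check being done piecemeal above (iptables -nxvL, lsof -p on the pdns pid) can be condensed into a few commands; a minimal sketch, assuming sudo on the host under suspicion:

    sudo netstat -lnup | grep ':53 '     # any process bound to UDP port 53?
    sudo lsof -nP -i :53                 # same question, per-process view
    sudo iptables -nxvL | grep -w 53     # firewall side: 'domain' in /etc/services is 53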
[19:25:35] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.066667
[19:25:44] PROBLEM - puppet last run on db74 is CRITICAL: CRITICAL: puppet fail
[19:25:50] Reedy: Yeah. Change all the things! :)
[19:26:14] openjdk-6-jre (6b27-1.12.6-1ubuntu0.12.04.4 => 6b32-1.13.4-4ubuntu0.12.04.2)
[19:26:18] openjdk-7-jdk (7u25-2.3.10-1ubuntu0.12.04.2 => 7u65-2.5.1-4ubuntu1~0.12.04.2)
[19:26:25] I guess those might be useful too
[19:26:42] !log doing rolling upgrade of elasticsearch on logstash100[1-3]
[19:26:47] Logged the message, Master
[19:27:10] Other than the elasticsearch data everything else comes from puppet so keeping the cluster sane is the only semi-important bit.
[19:27:13] akosiaris: ok, now we've moved on to "binding UDP socket to '208.80.154.15' port 53: Cannot assign requested address"
[19:28:08] ./logstash.sh: line 42: warning: here-document at line 11 delimited by end-of-file (wanted `__commands__')
[19:28:52] stupid leading spaces
[19:29:01] akosiaris: syslog also mentions 'Guardian is launching an instance' is that related?
[19:29:24] doh. I didn't notice. Paste is not the best place for code review apparently. :(
[19:29:35] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1288.199951
[19:29:43] I'm guessing it's because they're space indented for formatting on wikitech
[19:30:36] akosiaris: nm, that last was due to a tyo
[19:30:38] typo
[19:32:12] (PS1) Rush: Phab new object creation verbage tweak [puppet] - https://gerrit.wikimedia.org/r/163646
[19:32:17] (CR) jenkins-bot: [V: -1] Phab new object creation verbage tweak [puppet] - https://gerrit.wikimedia.org/r/163646 (owner: Rush)
[19:32:23] (PS2) Rush: Phab new object creation verbage tweak [puppet] - https://gerrit.wikimedia.org/r/163646
[19:32:39] (PS1) Nemo bis: [Planet Wikimedia] Add Finne Boonen [puppet] - https://gerrit.wikimedia.org/r/163647
[19:33:34] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 15 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 12, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 88, uinitializing_shards: 3, unumber_of_data_nodes: 3}
[19:33:35] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 15 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 12, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 88, uinitializing_shards: 3, unumber_of_data_nodes: 3}
[19:33:38] (CR) Rush: [C: 2] Phab new object creation verbage tweak [puppet] - https://gerrit.wikimedia.org/r/163646 (owner: Rush)
[19:34:08] icinga-wm: ssshhhhh
[19:34:14] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 15 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 12, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 88, uinitializing_shards: 3, unumber_of_data_nodes: 3}
[19:34:36] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 738.400024
[19:34:47] Those cluster wide checks probably shouldn't fire on every host in the cluster
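The "stupid leading spaces" diagnosis above is a classic bash here-document pitfall: a <<- here-doc only strips leading tabs, so a script whose terminator got space-indented (for example, when pasted into a wiki page for formatting) never matches and bash reads to end-of-file, warning exactly as quoted. A minimal sketch; the __commands__ delimiter is taken from the warning, the ssh target is hypothetical:

    # broken: the closing delimiter is space-indented, so bash never finds it
    ssh some-host bash <<-__commands__
        echo "upgrading"
        __commands__

    # works: the delimiter starts in column 0 (or is tab-indented when using <<-)
    ssh some-host bash <<__commands__
    echo "upgrading"
    __commands__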
[19:34:49] (PS1) Andrew Bogott: Split out dist-specific configs for pdns [puppet] - https://gerrit.wikimedia.org/r/163648
[19:35:03] mutante: there's a relatively fresh blog post in the feed I just proposed, so a merge around today would be most effective :) https://gerrit.wikimedia.org/r/#/c/163647/
[19:37:11] (PS2) Andrew Bogott: Split out dist-specific configs for pdns [puppet] - https://gerrit.wikimedia.org/r/163648
[19:37:19] bd808: Bets on how long it'll take?
[19:37:36] I guess the important things here are "initializing_shards" : 3, "unassigned_shards" : 11
[19:38:00] Reedy: it *should* be pretty fast. 10-15 min per node I'd guess.
[19:38:10] (CR) Andrew Bogott: [C: 2] Split out dist-specific configs for pdns [puppet] - https://gerrit.wikimedia.org/r/163648 (owner: Andrew Bogott)
[19:40:36] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 226.800003
[19:40:54] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: puppet fail
[19:41:45] !log restarted varnishkafka on cp3019 to troubleshoot drerrs
[19:41:50] Logged the message, Master
[19:42:03] (PS1) Andrew Bogott: Need double-quotes if there are variables! [puppet] - https://gerrit.wikimedia.org/r/163650
[19:43:34] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[19:43:53] (PS2) Andrew Bogott: Need double-quotes if there are variables! [puppet] - https://gerrit.wikimedia.org/r/163650
[19:43:54] RECOVERY - puppet last run on db74 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[19:44:44] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[19:45:01] (CR) Andrew Bogott: [C: 2] Need double-quotes if there are variables! [puppet] - https://gerrit.wikimedia.org/r/163650 (owner: Andrew Bogott)
[19:45:46] (PS1) Rush: phab fix footer [puppet] - https://gerrit.wikimedia.org/r/163652
[19:46:04] (PS2) Rush: phab fix footer [puppet] - https://gerrit.wikimedia.org/r/163652
[19:46:54] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[19:46:58] akosiaris: while I fuss with pdns… consider https://gerrit.wikimedia.org/r/#/c/163597/ ?
[19:47:25] (CR) Rush: [C: 2] phab fix footer [puppet] - https://gerrit.wikimedia.org/r/163652 (owner: Rush)
[19:48:29] Reedy: You can watch the progress of shard recovery with `curl -s http://localhost:9200/_cat/recovery|grep -v done` on any of the logstash100X hosts.
[19:49:06] Ah
[19:49:16] I wonder if that should be added to Nik's script
[19:49:35] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: puppet fail
[19:49:53] I bet Nik and Chad have even fancier shell pipelines for watching what happens
[19:50:13] heh
[19:52:17] (CR) Alexandros Kosiaris: [C: -1] "solely on the basis of why is this change updating the mariadb module. Otherwise LGTM" [puppet] - https://gerrit.wikimedia.org/r/163597 (owner: Andrew Bogott)
[19:53:14] bd808: What do the 2 % columns mean?
[19:53:51] Reedy: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cat-recovery.html
[19:53:57] godog: ping?
[19:53:57] bblack: ping detected, please leave a message!
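A hedged sketch of the "one node at a time, wait for the cluster to heal" loop described above, combining the _cat/recovery command quoted verbatim with the standard _cluster/health endpoint (the polling interval is arbitrary):

    # watch shards that are still recovering (anything not yet 'done')
    watch -n5 "curl -s http://localhost:9200/_cat/recovery | grep -v done"

    # block until the cluster reports green before moving to the next node
    until curl -s http://localhost:9200/_cluster/health | grep -q '"status":"green"'; do
      sleep 10
    done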
[19:53:59] seems the outer one keeps increasing till it meets the inner one, then the inner goes up
[19:54:02] files % and bytes %
[19:54:38] godog: the fundamental issue with ms-be2012 (etc) monitoring is that the node defs for these hosts do not "include base", which is normally how all hosts get their host-monitoring defined.
[19:55:04] godog: however, I don't see base in the older ones either (e.g. ms-be10xx), so I'm not sure that blindly applying it there is the solution
[19:55:25] (CR) Hashar: [C: 1] "The JJB macro phplint already uses "xargs -n1 -t php -l" and on gallium:" [puppet] - https://gerrit.wikimedia.org/r/160691 (https://bugzilla.wikimedia.org/68255) (owner: Filippo Giunchedi)
[19:55:28] godog: there may be a reason they don't include base, and some other workaround is in place which is borked for codfw?
[19:56:50] (CR) Dzahn: [C: 2] [Planet Wikimedia] Add Finne Boonen [puppet] - https://gerrit.wikimedia.org/r/163647 (owner: Nemo bis)
[19:57:03] Nemo_bis: ^
[19:57:22] thanks :)
[19:58:30] (PS3) Rush: phabricator - enable HSTS with max-age 7 days [puppet] - https://gerrit.wikimedia.org/r/162805 (https://bugzilla.wikimedia.org/38516) (owner: Chmarkine)
[19:58:34] (PS2) Andrew Bogott: Include libnet-dns-perl with ferm. It's needed for certain ferm functions. [puppet] - https://gerrit.wikimedia.org/r/163597
[19:58:38] (CR) Rush: [C: 2 V: 2] "let's try" [puppet] - https://gerrit.wikimedia.org/r/162805 (https://bugzilla.wikimedia.org/38516) (owner: Chmarkine)
[19:58:49] Reedy: Prettier -- curl -s 'http://localhost:9200/_cat/recovery?h=idx,st,ty,shost,thost,fp,bp&v'|grep -v done
[19:59:17] 8 still unassigned
[19:59:46] (PS3) Andrew Bogott: Include libnet-dns-perl with ferm. It's needed for certain ferm functions. [puppet] - https://gerrit.wikimedia.org/r/163597
[20:00:04] gwicke, cscott, subbu: Dear anthropoid, the time has come. Please deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140929T2000).
[20:00:13] I think we have it set to only fix 3 at a time. Inherited from cirrus settings
[20:00:58] i wonder if the increased parallelism would help or hinder or make no difference
[20:01:13] even with 3 initialising, it seems only one is really doing much
[20:01:29] That pretty much depends on disk iops and network between the hosts
[20:02:48] I'm still always baffled that any index other than today's needs to be changed.
[20:03:17] The older ones are open for searching but get no record changes at all
[20:04:13] Something less than brilliant in the elasticsearch comparison algorithm I guess.
[20:04:24] !log deployed Parsoid version deed30b2
[20:04:29] Logged the message, Master
[20:04:40] bd808: maybe 1.3.2 will be better in future ;)
[20:04:53] I saw comments about upgrading to 1.3.3 in production
[20:05:18] Yeah. It would be good to keep up with changes that Chad and Nik make
[20:05:39] bblack: ok… I'm in danger of doing more dns breaking, so could use your advice and attention. Let me know when you have a bit of time?
[20:06:34] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 7, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 93, initializing_shards: 3, number_of_data_nodes: 3
[20:06:45] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 7, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 93, initializing_shards: 3, number_of_data_nodes: 3
[20:07:05] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 7, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 93, initializing_shards: 3, number_of_data_nodes: 3
[20:07:18] manybubbles: are you available to help me out with a mediawiki-config deploy?
[20:07:31] cscott: technically in a meeting
[20:07:34] cscott: I'm about also if manybubbles is busy...
[20:07:36] heh :)
[20:07:37] stupid health check
[20:07:44] sounds like Reedy wins!
[20:08:03] https://gerrit.wikimedia.org/r/163608 is the patch o' the day
[20:08:29] i basically know nothing about deploying changes to mediawiki-config
[20:08:51] so i'm happy to let someone SWAT this, but it would probably also be educational if someone was willing to walk me through the steps to do it myself
[20:09:12] looks like it's trivial enough
[20:09:20] Pretty easy to do
[20:09:32] hopefully!
[20:09:48] One question would be about the labs config though
[20:10:00] (PS1) Andrew Bogott: Rearrange labs-ns servers. [dns] - https://gerrit.wikimedia.org/r/163665
[20:10:13] bblack: The big picture is ^
[20:10:26] At quick glance, it would look like the labs servers don't know about the ocg servers (and can access them?)
[20:10:29] ok. and my first question is about the review process. does it get merged magically if it's +2'ed? should I have targeted the patch to a branch?
[20:10:51] $wgCollectionFormatToServeURL['rdf2latex'] = 'http://ocg.svc.eqiad.wmnet:8000'; presumably wants duplicating across
[20:11:07] uh, no, ignore that
[20:11:18] cscott: Yeah, if you +2 it, jenkins will test and merge
[20:11:24] (PS1) Jgreen: remove OCG health check filesystem collection/alerting [puppet] - https://gerrit.wikimedia.org/r/163666
[20:11:28] doesn't matter about the branch
[20:11:33] topic is "nice" but not needed
[20:11:40] just rebase before +2ing
[20:11:42] Reedy: if you've got a checked out mediawiki-config, take a look at 21b86e04b8027016a80db1c88a83f60efef0736a and f0ccf4f150ea39bb6040cbb4cdd8e8e9139c8443 for a little context
[20:12:42] Reedy: the labs have an override in CommonSettings-labs.php which points to a separate OCG instance. I think both labs and production share the same "old PDF" server in tampa. but that will soon go away...
[20:13:06] I thought they were using different array keys
[20:13:33] hence when I noticed they were the same (just different hostnames)
[20:13:51] currently the labs machines display both "old PDF" ('rl') and the new OCG pdf renderer ('rdf2latex') on the sidebar. Since the 'rl' is inherited, they do $wgCollectionPortletFormats[] = 'rdf2latex'; to add the OCG renderer in CommonSettings-labs.php
[20:14:37] (CR) Jgreen: [C: 2 V: 1] remove OCG health check filesystem collection/alerting [puppet] - https://gerrit.wikimedia.org/r/163666 (owner: Jgreen)
[20:15:03] http://ocg.svc.eqiad.wmnet:8000 is the production OCG service; http://deployment-pdf01:8000 is the beta OCG service.
[20:15:13] Reedy: am i actually answering your question?
[20:16:18] (PS3) Reedy: Switch default PDF renderer to OCG. [mediawiki-config] - https://gerrit.wikimedia.org/r/163608 (owner: Cscott)
[20:16:21] (PS4) Rush: T458: Rename ext_ref description and hide it from users [puppet] - https://gerrit.wikimedia.org/r/162161 (owner: Chad)
[20:16:23] (CR) Reedy: [C: 1] Switch default PDF renderer to OCG. [mediawiki-config] - https://gerrit.wikimedia.org/r/163608 (owner: Cscott)
[20:16:28] (CR) Rush: [C: 2 V: 2] T458: Rename ext_ref description and hide it from users [puppet] - https://gerrit.wikimedia.org/r/162161 (owner: Chad)
[20:17:36] cscott: Yeah, should be fine :)
[20:18:29] Reedy: and we're following the process in https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Step_3:_configuration_and_other_prep_work right?
[20:19:06] (PS1) RobH: restrict holmium firewall to corporate office access only [puppet] - https://gerrit.wikimedia.org/r/163668
[20:19:43] Reedy: hm, or https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Step_3:_configuration_and_other_prep_work i guess.
[20:20:08] cscott: If you +2 it, jenkins will test and merge if appropriate. SSH to tin, cd /srv/mediawiki-staging; git pull; sync-dir wmf-config MESSAGE GOES HERE
[20:21:07] ok, sounds good. i just wanted to make sure there wasn't any fancy branch stuff going on. seems like the deployed configuration should be master/origin, right?
[20:21:09] (CR) Dzahn: "btw, how many weeks of hhvm logs do we keep, if any" [puppet] - https://gerrit.wikimedia.org/r/130296 (owner: ArielGlenn)
[20:21:23] s,master/origin,origin/master,
[20:21:29] (CR) RobH: [C: 2] "My +2 is a lie, I am not convinced this will work. It seems too easy. Since it only changes a server that myself and one other person ac" [puppet] - https://gerrit.wikimedia.org/r/163668 (owner: RobH)
[20:21:59] (PS3) Cscott: Disable the old mwlib PDF render service. [mediawiki-config] - https://gerrit.wikimedia.org/r/163609
[20:22:15] (CR) Cscott: [C: -1] "self--1 to prevent premature deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/163609 (owner: Cscott)
[20:22:29] (CR) Cscott: [C: 1] Switch default PDF renderer to OCG. [mediawiki-config] - https://gerrit.wikimedia.org/r/163608 (owner: Cscott)
[20:22:48] Looks like I don't actually have +2 privs for https://gerrit.wikimedia.org/r/163608
[20:22:51] (PS2) RobH: restrict holmium firewall to corporate office access only [puppet] - https://gerrit.wikimedia.org/r/163668
[20:23:11] Seriously?
[20:23:21] You've got deploy rights though?
[20:23:48] (CR) RobH: [C: 2] restrict holmium firewall to corporate office access only [puppet] - https://gerrit.wikimedia.org/r/163668 (owner: RobH)
[20:23:56] Reedy: yeah, but i'm not in https://gerrit.wikimedia.org/r/#/admin/groups/21,members
[20:24:16] Added
[20:24:54] ok, here goes!
[20:25:00] (CR) Cscott: [C: 2] Switch default PDF renderer to OCG. [mediawiki-config] - https://gerrit.wikimedia.org/r/163608 (owner: Cscott)
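Condensed, the walkthrough Reedy gives above amounts to the following sequence on the deployment host; a sketch, with the sync message being whatever should appear in the server admin log:

    # on the deployment host (tin):
    cd /srv/mediawiki-staging
    git pull                        # pick up the change jenkins just merged
    sync-dir wmf-config 'Switch default PDF renderer to OCG'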
[20:25:05] (Merged) jenkins-bot: Switch default PDF renderer to OCG. [mediawiki-config] - https://gerrit.wikimedia.org/r/163608 (owner: Cscott)
[20:25:13] \o/
[20:25:32] puppetized infrastructure replacing non puppetized infrastructure, woooo
[20:26:03] ok, https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment#Change_wiki_configuration step 1: "on tin, cd to /a/common" ... there's no /a/common.
[20:26:13] anyone want to clean up the wiki while I continue the deploy?
[20:26:20] haha
[20:26:51] https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Step_3:_configuration_and_other_prep_work looks like it's got /srv/mediawiki-staging which matches what Reedy said above, so i'm guessing that's correct
[20:28:41] Reedy: and you think I should sync-dir instead of sync-file, right?
[20:28:46] (PS1) Ottomata: Reset queue_buffering_max_ms to default 1000 ms for varnishkafka [puppet] - https://gerrit.wikimedia.org/r/163669
[20:29:05] qchris: jgage: https://gerrit.wikimedia.org/r/#/c/163669/
[20:29:19] cscott: right
[20:29:30] !log cscott Synchronized wmf-config: Switch default PDF renderer to OCG (duration: 00m 15s)
[20:29:35] Logged the message, Master
[20:29:47] i got two failures on the sync-dir:
[20:30:00] ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/***', 'mw1010.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet'] on tmh1001 returned [255]: Received disconnect from 10.64.0.197: 2: Too many authentication failures for cscott
[20:30:00] ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/***', 'mw1010.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet'] on tmh1002 returned [255]: Received disconnect from 10.64.16.146: 2: Too many authentication failures for cscott
[20:30:02] normal?
[20:30:18] nope
[20:30:33] something hates me then
[20:30:50] looks like the tmh100* hosts are missing your key
[20:30:57] but at least 227 out of 229 servers love me
[20:31:16] (PS1) Jgreen: add $::site to openldap monitoring group declaration [puppet] - https://gerrit.wikimedia.org/r/163671
[20:31:30] Can some opsen figure out why cscott's ssh key is missing on tmh1001 and tmh1002?
[20:31:42] Please.
[20:31:56] (CR) QChris: [C: 1] Reset queue_buffering_max_ms to default 1000 ms for varnishkafka (2 comments) [puppet] - https://gerrit.wikimedia.org/r/163669 (owner: Ottomata)
[20:32:43] he doesn't have a home dir
[20:32:45] bd808: i don't see a specific admin role there
[20:32:46] (PS1) Alexandros Kosiaris: Introduce wmf_ca_2014_2017 [puppet] - https://gerrit.wikimedia.org/r/163672
[20:33:05] ocg-admins ?
[20:33:09] Should be from deployer rights I guess
[20:33:16] what is tmh100x anyway?
[20:33:26] ocg-render-admins is on ocg* hosts
[20:33:27] timedmediahandler
[20:33:30] video transcoding
[20:33:34] tmh does not have more than default
[20:33:36] ah
[20:33:52] tmh = timed media handler
[20:34:23] the request _might_ be "ocg-render-admins should also be on tmh hosts" ?
[20:34:38] or rather "make tmh-admins, and add ..."
[20:34:49] surely he should be on it as part of wikidev?
[20:34:54] except I don't generally need to deploy mediawiki-config as part of my ocg-render-admins role.
[20:35:15] (PS2) Ottomata: Reset queue_buffering_max_ms to default 1000 ms for varnishkafka [puppet] - https://gerrit.wikimedia.org/r/163669
[20:35:31] (CR) Ottomata: [C: 2 V: 2] Reset queue_buffering_max_ms to default 1000 ms for varnishkafka [puppet] - https://gerrit.wikimedia.org/r/163669 (owner: Ottomata)
[20:35:46] (PS2) Jgreen: add $::site to openldap monitoring group declaration [puppet] - https://gerrit.wikimedia.org/r/163671
[20:35:53] ottomata: do you not like jenkins?
[20:36:02] Reedy: eh?
[20:36:22] (CR) Jgreen: [C: 2 V: 1] add $::site to openldap monitoring group declaration [puppet] - https://gerrit.wikimedia.org/r/163671 (owner: Jgreen)
[20:36:27] You seem to C+2 V+2 nearly every commit, rather than just C+2 and letting jenkins verify/merge
[20:36:46] habit, does jenkins do V+2
[20:36:46] ?
[20:36:58] if it's working (which it should be nearly all of the time)
[20:37:07] i usually wait to see jenkins give me a check mark
[20:37:12] before i merge
[20:37:23] but if I shouldn't do V+2 i guess i won't!
[20:37:59] "unassigned_shards" : 4
[20:38:03] slooooooow
[20:42:11] bblack: Are you done for the day or might you still have a chance to work with me re: dns?
[20:44:03] Reedy/bd808: should someone with love from tmh100* redo the sync-dir to ensure the deployed state is consistent?
[20:44:22] since no one seems to be quick to fix the ssh key root cause
[20:46:30] Reedy: hm, maybe you were right about the labs configuration not quite being right. I still see 'Download as WMF PDF' in the sidebar of http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page -- it should be 'Download as PDF' and 'Download as mwlib PDF'.
[20:47:22] (PS1) Rush: Phab update fixed_settings updates [puppet] - https://gerrit.wikimedia.org/r/163675
[20:47:33] (PS2) Rush: Phab update fixed_settings updates [puppet] - https://gerrit.wikimedia.org/r/163675
[20:48:27] (CR) Rush: [C: 2] Phab update fixed_settings updates [puppet] - https://gerrit.wikimedia.org/r/163675 (owner: Rush)
[20:49:42] !log Ran sync-common on tmh1001.eqiad.wmnet for cscott's failed sync-dir there
[20:49:48] Logged the message, Master
[20:50:22] !log Ran sync-common on tmh1002.eqiad.wmnet for cscott's failed sync-dir there
[20:50:26] Logged the message, Master
[20:51:37] hm… Coren, if you're non-sick today, can I get a hand with merging a scary change?
[20:52:30] cscott: the tmh hosts that failed your sync should be happy now.
[20:52:50] bd808: thanks.
[20:52:57] (PS1) RobH: properly applying the firewall rules per role class on holmium [puppet] - https://gerrit.wikimedia.org/r/163677
[20:53:02] (CR) jenkins-bot: [V: -1] properly applying the firewall rules per role class on holmium [puppet] - https://gerrit.wikimedia.org/r/163677 (owner: RobH)
[20:53:11] bd808: i guess i should file an RT for "get my keys installed on tmh"?
[20:53:16] (PS2) RobH: properly applying the firewall rules per role class on holmium [puppet] - https://gerrit.wikimedia.org/r/163677
[20:53:20] i have one more mediawiki-config change to deploy at the end of the week.
[20:53:24] (PS2) Alexandros Kosiaris: Introduce wmf_ca_2014_2017 [puppet] - https://gerrit.wikimedia.org/r/163672
[20:53:32] (CR) Alexandros Kosiaris: [C: 2 V: 2] Introduce wmf_ca_2014_2017 [puppet] - https://gerrit.wikimedia.org/r/163672 (owner: Alexandros Kosiaris)
[20:53:35] cscott: yea, that sounds right
[20:53:43] It's weird that you can scap to all of the hosts except them
[20:53:49] because there is just no admin class on tmh nodes so far
[20:53:54] (CR) jenkins-bot: [V: -1] properly applying the firewall rules per role class on holmium [puppet] - https://gerrit.wikimedia.org/r/163677 (owner: RobH)
[20:54:15] (PS3) RobH: properly applying the firewall rules per role class on holmium [puppet] - https://gerrit.wikimedia.org/r/163677
[20:54:20] (CR) jenkins-bot: [V: -1] properly applying the firewall rules per role class on holmium [puppet] - https://gerrit.wikimedia.org/r/163677 (owner: RobH)
[20:54:22] mutante: How did I just ssh to them? Some other group that cscott is not a member of?
[20:54:27] (PS4) RobH: properly applying the firewall rules per role class on holmium [puppet] - https://gerrit.wikimedia.org/r/163677
[20:54:39] Or leftover from before the key management changes?
[20:55:05] bd808: i just looked at puppet so far, so either that or it's somewhere in role::mediawiki::videoscaler but shouldn't be
[20:55:20] looks on server
[20:55:27] (CR) RobH: [C: 2] properly applying the firewall rules per role class on holmium [puppet] - https://gerrit.wikimedia.org/r/163677 (owner: RobH)
[20:56:55] also -- my change doesn't seem to have deployed to http://en.wikipedia.beta.wmflabs.org/ -- anyone know anything about that?
[20:57:14] andrewbogott: Sure, I'll be back from food in ~30m; that okay with you?
[20:57:20] Coren: yep!
[20:57:45] indeed, so there are a lot of deployers on it..
[20:57:59] cscott: Jenkins should've done beta
[20:58:08] Reedy: is there a delay?
[20:58:19] https://integration.wikimedia.org/ci/job/beta-scap-eqiad/
[20:58:20] * bd808 looks at failed jobs in https://integration.wikimedia.org/ci/job/beta-scap-eqiad
[20:58:21] There's a lot of red
[20:59:02] Something looks fishy
[20:59:03] that would look like it's possibly not making it to deployment-rsync01
[20:59:09] and then not making it to apaches
[20:59:25] * bd808 agrees with Sam's assessment
[20:59:33] (PS1) RobH: holmium more corrections, service not rule for ferm definitions [puppet] - https://gerrit.wikimedia.org/r/163678
[20:59:39] You want to track down the problem Reedy?
[21:00:24] (CR) RobH: [C: 2] "once more, with feeling" [puppet] - https://gerrit.wikimedia.org/r/163678 (owner: RobH)
[21:00:37] * Reedy tries to login to deployment-rsync01
[21:01:17] bd808: looks like you are right, from before key management changes, but did not get a new group
[21:01:34] i'll make a ticket
[21:01:53] mutante: Cool. Thanks for tracking it down
[21:02:52] mutante: cc me?
[21:03:05] cscott: ok, yep
[21:05:21] "initializing_shards" : 3,
[21:05:21] "unassigned_shards" : 0
[21:06:33] rsync: failed to set times on "/srv/mediawiki/.": Operation not permitted (1)
[21:06:33] rsync: rename "/srv/mediawiki/.agent.dFP8KL" -> ".~tmp~/agent": Permission denied (13)
[21:06:33] rsync: rename "/srv/mediawiki/.wikiversions-labs.cdb.1C6Idl" -> ".~tmp~/wikiversions-labs.cdb": Permission denied (13)
[21:07:02] (CR) MaxSem: "After some poking, it is unmaintained since summer 2012 and fundamentally broken, I'm gonna deploy this today." [mediawiki-config] - https://gerrit.wikimedia.org/r/162505 (owner: Dzahn)
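When sync-dir reports per-host failures like the tmh ones above, the usual recovery once the underlying problem is fixed is to re-run sync-common on just the affected hosts, which is what the two !log entries earlier record. A minimal sketch, assuming working ssh to the hosts; the script path is the one quoted in the failure output, and the exact invocation on these hosts may differ:

    for host in tmh1001.eqiad.wmnet tmh1002.eqiad.wmnet; do
      ssh "$host" /srv/deployment/scap/scap/bin/sync-common
    done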
[21:07:18] woo
[21:07:49] some host doesn't like Reedy?
[21:08:00] https://integration.wikimedia.org/ci/job/beta-scap-eqiad/23438/console
[21:08:02] not just me :)
[21:08:18] pfft. beta
[21:09:16] (PS1) Ori.livneh: HHVM: capture & log traces for catchable fatals. [mediawiki-config] - https://gerrit.wikimedia.org/r/163680
[21:09:22] "initializing_shards" : 1,
[21:09:22] "unassigned_shards" : 0
[21:10:50] (PS1) Alexandros Kosiaris: Assign ldap-mirror.wikimedia.org WMF CA cert [puppet] - https://gerrit.wikimedia.org/r/163682
[21:10:51] deployment-rsync01 seems pretty out of date
[21:11:21] (PS1) Alexandros Kosiaris: ldap-mirror to plutonium.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/163683
[21:12:43] !log elasticsearch upgraded to 1.3.2 on logstash1001
[21:12:48] Logged the message, Master
[21:13:01] bd808: nearly 100 minutes
[21:13:08] ouch
[21:13:18] (CR) Alexandros Kosiaris: [C: 2] Assign ldap-mirror.wikimedia.org WMF CA cert [puppet] - https://gerrit.wikimedia.org/r/163682 (owner: Alexandros Kosiaris)
[21:14:01] logstash1002 should be quicker...
[21:14:17] 5 unassigned, 3 initializing
[21:14:42] bd808: dprsync: write failed on "/srv/mediawiki/php-master/extensions/WikiEditor/i18n/azb.json": No space left on device (28)
[21:14:47] /dev/mapper/vd-second--local--disk 9.1G 8.6G 33M 100% /srv
[21:15:12] (CR) Alexandros Kosiaris: [C: 2] ldap-mirror to plutonium.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/163683 (owner: Alexandros Kosiaris)
[21:19:27] (PS1) Alexandros Kosiaris: assign base::firewall to plutonium [puppet] - https://gerrit.wikimedia.org/r/163685
[21:20:05] (CR) jenkins-bot: [V: -1] assign base::firewall to plutonium [puppet] - https://gerrit.wikimedia.org/r/163685 (owner: Alexandros Kosiaris)
[21:20:48] Reedy: /srv is full there? Yikes
[21:21:16] guess it's a small instance with a 20GB hard drive
[21:23:03] (PS2) Alexandros Kosiaris: assign base::firewall to plutonium [puppet] - https://gerrit.wikimedia.org/r/163685
[21:23:52] (CR) Alexandros Kosiaris: [C: 2] assign base::firewall to plutonium [puppet] - https://gerrit.wikimedia.org/r/163685 (owner: Alexandros Kosiaris)
[21:25:24] (CR) Dzahn: "that looks fine:" [puppet] - https://gerrit.wikimedia.org/r/162805 (https://bugzilla.wikimedia.org/38516) (owner: Chmarkine)
[21:26:38] (PS1) Ori.livneh: HHVM: Set a fatal handler that logs traces [puppet] - https://gerrit.wikimedia.org/r/163686
[21:28:43] andrewbogott: I be all yours. Point me at it?
[21:32:58] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: Puppet has 1 failures
[21:37:07] (PS1) Alexandros Kosiaris: exim4 uses ldap-mirror now [puppet] - https://gerrit.wikimedia.org/r/163735
[21:38:07] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Puppet has 1 failures
[21:38:12] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: Puppet has 1 failures
[21:38:12] (PS1) Reedy: Don't use deployment-rsync01 as scap proxy [puppet] - https://gerrit.wikimedia.org/r/163736
[21:39:07] Coren: https://gerrit.wikimedia.org/r/#/c/163665/
[21:39:17] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[21:39:48] That needs review, and I'd also like you to double-check that all the servers in question are indeed working and syncing properly with ldap and such...
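The "No space left on device" failure above is the root cause of the beta scap breakage: /srv on deployment-rsync01 is at 100%. A minimal sketch of the usual triage, assuming shell access on the instance:

    df -h /srv                             # confirm the full filesystem (quoted above at 100%)
    sudo du -sh /srv/* | sort -rh | head   # find what is actually eating the space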
[21:39:50] 1 unassigned shard on logstash1002
[21:40:02] Coren: I'll create a new instance right now, then we can verify that it gets picked up everywhere.
[21:42:05] (CR) Ori.livneh: "Companion change to operations/puppet: Id93b9eaa1" [mediawiki-config] - https://gerrit.wikimedia.org/r/163680 (owner: Ori.livneh)
[21:42:17] PROBLEM - MySQL Processlist on db1064 is CRITICAL: CRIT 84 unauthenticated, 0 locked, 0 copy to table, 1 statistics
[21:42:18] Coren: OK, I just now created a new host, testlabs-dns.wmflabs.org
[21:42:41] (CR) BryanDavis: Don't use deployment-rsync01 as scap proxy (1 comment) [puppet] - https://gerrit.wikimedia.org/r/163736 (owner: Reedy)
[21:43:15] Hmmm.
[21:43:17] RECOVERY - MySQL Processlist on db1064 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics
[21:44:04] Coren: I've confirmed that nslookup works for each of the three, I think… would like you to doublecheck though
[21:44:04] andrewbogott: Works fine from outside our infrastructure, works fine from prod.
[21:44:45] I get NXDOMAIN from within labs though.
[21:45:45] Worse yet, it works from labnet1001 which is the DNS for labs. So that &#^ piece of dnsmasq crap isn't recursing right.
[21:45:53] Coren: Actually...
[21:46:10] I got a fail for labs-ns0 and restarted it...
[21:46:12] And then it worked.
[21:46:17] So maybe dnsmasq is caching that bad response?
[21:46:28] That would be... evil.
[21:46:58] hm, if I specify a name server it works fine
[21:47:14] Well sure, because then you're not asking dnsmasq at all.
[21:47:25] nslookup testlabs-dns works
[21:47:39] What? Where from?
[21:47:47] but not nslookup testlabs-dns.wmflabs.org
[21:47:52] on a labs instance
[21:48:02] testlabs-dns. 0 IN A 10.68.16.136 <-- that's a very, very evil answer.
[21:48:10] Note the trailing .
[21:49:00] What did you ask to get that?
[21:49:26] dig (you should be using it too, it's much better at telling you everything)
[21:49:41] :-)
[21:50:17] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[21:51:18] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[21:51:19] hm, so I see...
[21:51:45] Coren: So, wait an hour and see if it shapes up?
[21:51:56] Or should we forge ahead and presume this is unrelated?
[21:52:04] (Since I haven't really changed anything yet, it /should/ be unrelated)
[21:52:45] (PS1) Dzahn: decom silver [puppet] - https://gerrit.wikimedia.org/r/163743
[21:53:41] andrewbogott: I /would/ like to see if it's a cache issue first.
[21:53:49] (PS1) QChris: Force ACKs from all in-sync kafka replicas [puppet] - https://gerrit.wikimedia.org/r/163744 (https://bugzilla.wikimedia.org/69667)
[21:54:01] Coren: Does that involve anything other than waiting?
[21:54:04] andrewbogott: Or if something broke when creating instances.
[21:54:26] andrewbogott: Well, I suppose we could restart dnsmasq but I don't know if it flushes its cache on restart for sure anyways.
[21:54:41] (CR) Dzahn: [C: 2] "not used for anything anymore per RT #8459 - yuvi moved vumi and also ldap client to terbium.. thanks!" [puppet] - https://gerrit.wikimedia.org/r/163743 (owner: Dzahn)
[21:54:42] (Or even if it /does/ do negative caching)
[21:54:44] kk
[21:54:55] Want to check my patch in the meantime, make sure I didn't transpose any ip numbers?
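The debugging pattern Coren is pushing here generalizes: compare what the labs recursor (dnsmasq) hands back with what the authoritative server says, and let dig show the TTLs and trailing dots that nslookup hides. A hedged sketch from a labs instance, taking labs-ns0.wikimedia.org as the authoritative per the log:

    dig testlabs-dns.wmflabs.org                          # whatever the default resolver (dnsmasq) believes
    dig testlabs-dns.wmflabs.org @labs-ns0.wikimedia.org  # the authoritative answer, bypassing the recursor
    # a stale NXDOMAIN, or a bare 'testlabs-dns.' answer, in the first query but
    # not the second points at the recursor's cache rather than the zone data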
[21:55:01] (which I did once today already)
[21:55:13] anyone here grok statsd?
[21:56:23] YuviPanda|zzzz: "silver is a udp2log data collection server (udp2log::logger)
[21:56:28] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[21:56:28] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[21:56:55] YuviPanda|zzzz: but just outdated motd pretty sure.. so decom anyways.. thanks for getting rid of vumi
[21:57:43] YuviPanda|zzzz: oh yea,..that _is_ vumi. udp2log --config-file=/etc/udp2log/vumi --daemon -p 5678
[21:58:22] !log stopping udp2log-vumi on silver - not needed anymore per Yuvipanda
[21:58:27] Logged the message, Master
[21:59:24] Vumipanda
[21:59:37] (PS1) Ori.livneh: Graphite: allow Grafana to fetch metric data [puppet] - https://gerrit.wikimedia.org/r/163745
[22:01:13] !log silver - revoke puppet cert, salt-key, stopping services, disable monitoring
[22:01:18] Logged the message, Master
[22:02:11] !log elasticsearch upgraded to 1.3.2 on logstash1002
[22:02:17] Logged the message, Master
[22:02:19] bd808: 50 minutes that time
[22:02:38] PROBLEM - ElasticSearch health check on logstash1003 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.136
[22:03:27] Reedy: ^ but. that's 1003 ?
[22:03:38] mutante: Right, it's now upgrading 1003
[22:03:48] ah :)
[22:03:50] It's very likely icinga tried it whilst it was down for restart/package upgrade
[22:03:55] yea
[22:04:13] should be back up if icinga-wm cares
[22:06:09] (PS2) Giuseppe Lavagetto: Graphite: allow Grafana to fetch metric data [puppet] - https://gerrit.wikimedia.org/r/163745 (owner: Ori.livneh)
[22:07:18] (CR) Giuseppe Lavagetto: [C: 1] "I don't love this approach, but we use it already and I found no specific vulnerabilities for this API." [puppet] - https://gerrit.wikimedia.org/r/163745 (owner: Ori.livneh)
[22:07:21] (PS1) Alexandros Kosiaris: Try a different approach for handling wmf_ca_2014_2017 [puppet] - https://gerrit.wikimedia.org/r/163748
[22:10:30] bblack: not a particular reason except that the old hosts didn't include it, I'll take a closer look tomorrow but it feels like we should be including base and that's it, or even standard
[22:11:43] yeah I mean that gives us things like ntp as well. usually there's a really good reason not to include it
[22:12:06] (CR) Ori.livneh: [C: 2] Graphite: allow Grafana to fetch metric data [puppet] - https://gerrit.wikimedia.org/r/163745 (owner: Ori.livneh)
[22:12:16] indeed, possibly lost in commits
[22:12:37] bd808: is logstash1003 a different spec? It seems to only initialise 2 shards at once
[22:12:42] ms-be2*** have services but no hosts for them
[22:13:14] bblack: I've been thinking more about the "initial downtime", it seems we should be having that state explicit somewhere I think
[22:13:26] Reedy: they should all be the same hardware I think
[22:13:49] andrewbogott: Sorry, that didn't ping me. Point me at said patch?
[22:14:11] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: Puppet last ran 362648 seconds ago, expected 14400
[22:14:12] Coren: https://gerrit.wikimedia.org/r/#/c/163665/
[22:14:31] testlabs-dns.wmflabs.org. 3600 IN A 208.80.155.192
[22:14:31] andrewbogott: ^^ from within labs. There's *also* still the funky TLD answer, but at least that one is also there.
[22:16:08] (CR) coren: [C: 1] "I'm not entirely clear on why we use separate IPs rather than just extra names, but the patch does what it sets out to do." [dns] - https://gerrit.wikimedia.org/r/163665 (owner: Andrew Bogott)
[22:16:44] I'm going to deploy a fix for 71421 before SWAT, since it's affecting account creation, and since I won't be around during SWAT.
[22:16:47] Coren: "That's how Ryan did it"
[22:16:59] Coren: so, it was dnsmasq caching...
[22:17:10] Does that give you the confidence for me to merge? Or are there still things to doublecheck?
[22:17:11] andrewbogott: So we now know that (a) dnsmasq sucks in other, extra ways and (b) your fixes to the DNS worked.
[22:17:42] andrewbogott: Yeah, I'm still a little troubled by dnsmasq answering authoritatively for the root, but I don't think that's related.
[22:18:17] !log renaming labs-ns1 to labs-ns0 and labs-ns2 to labs-ns1
[22:18:21] Logged the message, Master
[22:18:27] (CR) Andrew Bogott: [C: 2] Rearrange labs-ns servers. [dns] - https://gerrit.wikimedia.org/r/163665 (owner: Andrew Bogott)
[22:19:34] mutante: Jeff_Green, just saw your icinga puppet email
[22:19:42] is that possibly related to the work Yuvi and I merged on Friday?
[22:19:54] ottomata: no clue. I'm emailing what I've got
[22:20:16] Coren: did you scare up any reviews for https://gerrit.wikimedia.org/r/#/c/163222/?
[22:20:28] ottomata: godog Jeff_Green bblack : i think it's all the same, what bblack said about not including base on ms-be hosts
[22:20:47] if they'd include base the hosts would probably be created in icinga
[22:20:52] ok
[22:20:59] I'm pretty sure I want to move "get a real recursor for labs" up our priority list.
[22:21:01] and then the services would match something
[22:21:37] ottomata: did you merge something related to ms-be though? maybe it was a separate error
[22:22:02] no, but we did merge stuff related to icinga and naggen
[22:22:06] should have all been a no-op
[22:22:10] just module rearranging.
[22:22:32] i see, yea, i think it's not that
[22:22:36] andrewbogott: No new ones. If I don't get more by tomorrow morning I'll presume "nolo contendere" and merge it in and start working on the followup.
[22:23:53] It's not like nobody was aware of it.
[22:24:39] anyone got an SMS number for alex?
[22:24:49] !log elasticsearch upgraded to 1.3.2 on logstash1003
[22:24:55] Logged the message, Master
[22:25:08] Coren: ok, fair enough
[22:25:10] cajoel: there's one on the office contact list
[22:25:12] RECOVERY - ElasticSearch health check on logstash1003 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 36: active_shards: 103: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0
[22:25:28] Coren: I kind of want to shut down virt0 tomorrow morning, and see if anything breaks.
[22:25:38] message sent
[22:25:44] As far as I know I've handled everything that depends on it...
[22:25:57] andrewbogott: how about the logins on icinga etc?
[22:26:11] (PS1) BBlack: Remove 1H delay on new icinga services/hosts [puppet] - https://gerrit.wikimedia.org/r/163752
[22:26:13] mutante: they use it as a secondary I think. So those services will still work, they'll just be more fragile
[22:26:17] andrewbogott: I expect that any /current/ users of LDAP would have already broken.
[22:26:31] (CR) BBlack: [C: 2 V: 2] Remove 1H delay on new icinga services/hosts [puppet] - https://gerrit.wikimedia.org/r/163752 (owner: BBlack)
[22:26:36] Coren: not necessarily, virt1000 and virt0 are still working as always, with the same certs as always.
[22:26:48] andrewbogott: Ah, true. *sigh*
[22:27:02] ah, they still have virt1000, true
[22:27:35] Well, it's not the certificates that are make-or-break I expect. No doubt it'll shake out a couple of ad-hoc configs with explicit names here and there; but I expect it'll be a day of hunting down small annoyances not large broken things.
[22:27:59] !log packages upgraded on logstash1001
[22:28:05] Logged the message, Master
[22:28:54] !log silver - shutting down, wait with wiping it for a few days, just in case
[22:28:58] Logged the message, Master
[22:30:11] !log packages upgraded on logstash1002
[22:30:16] Logged the message, Master
[22:34:39] !log restarted elasticsearch on logstash1003 post java upgrades
[22:34:45] Logged the message, Master
[22:36:29] !log dist-upgrade on logstash1003 and rebooting
[22:36:34] Logged the message, Master
[22:38:41] PROBLEM - Host logstash1003 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[22:38:45] PROBLEM - NTP on tungsten is CRITICAL: NTP CRITICAL: No response from NTP server
[22:39:02] RECOVERY - Host logstash1003 is UP: PING OK - Packet loss = 0%, RTA = 2.64 ms
[22:39:24] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 31 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 29, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 72, uinitializing_shards: 2, unumber_of_data_nodes: 3}
[22:39:24] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 31 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 29, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 72, uinitializing_shards: 2, unumber_of_data_nodes: 3}
[22:39:32] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[22:40:31] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 102, initializing_shards: 1, number_of_data_nodes: 3
[22:40:31] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 102, initializing_shards: 1, number_of_data_nodes: 3
[22:41:52] (CR) Tim Starling: "Added a few people to the reviewer list who I thought may want to comment on the log format and destination. This change is a replacement " [mediawiki-config] - https://gerrit.wikimedia.org/r/163680 (owner: Ori.livneh)
[22:45:36] (CR) Catrope: [C: 1] "Log format looks fine to me" [mediawiki-config] - https://gerrit.wikimedia.org/r/163680 (owner: Ori.livneh)
[22:45:49] (PS1) BBlack: add standard to swift_new -based hosts in site.pp [puppet] - https://gerrit.wikimedia.org/r/163755
[22:46:07] (CR) Alexandros Kosiaris: [C: 2] Try a different approach for handling wmf_ca_2014_2017 [puppet] - https://gerrit.wikimedia.org/r/163748 (owner: Alexandros Kosiaris)
[22:46:31] (PS1) Dzahn: redirect noc user homedirs to people.wm.org [puppet] - https://gerrit.wikimedia.org/r/163756
[22:47:40] !log dist-upgrade logstash1002 and reboot
[22:47:45] Logged the message, Master
[22:47:51] PROBLEM - puppet last run on lead is CRITICAL: CRITICAL: Puppet has 1 failures
[22:48:52] (CR) Dzahn: "yuvipanda: ^" [puppet] - https://gerrit.wikimedia.org/r/158355 (owner: JanZerebecki)
[22:49:11] PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:49:23] PROBLEM - puppet last run on es1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:49:24] PROBLEM - Host logstash1002 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[22:49:27] (PS2) BBlack: add standard to swift_new -based hosts in site.pp [puppet] - https://gerrit.wikimedia.org/r/163755
[22:49:32] PROBLEM - puppet last run on db1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:49:32] (CR) BBlack: [C: 2 V: 2] add standard to swift_new -based hosts in site.pp [puppet] - https://gerrit.wikimedia.org/r/163755 (owner: BBlack)
[22:49:41] PROBLEM - puppet last run on db1016 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:49:42] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:49:42] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:49:42] PROBLEM - puppet last run on cp1058 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:49:42] PROBLEM - puppet last run on labstore1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:49:43] (CR) Dzahn: [C: 2] gerrit - raise HSTS max-age to 1 year [puppet] - https://gerrit.wikimedia.org/r/159729 (https://bugzilla.wikimedia.org/38516) (owner: Chmarkine)
[22:49:46] (PS1) Alexandros Kosiaris: followup to I4b970fc09c0e5b18b84d5d81412e615e [puppet] - https://gerrit.wikimedia.org/r/163757
[22:49:51] RECOVERY - Host logstash1002 is UP: PING OK - Packet loss = 0%, RTA = 1.29 ms
[22:49:52] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:01] PROBLEM - puppet last run on antimony is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:09] PROBLEM - puppet last run on mc1014 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:15] PROBLEM - puppet last run on analytics1026 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:17] these are me ^
[22:50:21] PROBLEM - puppet last run on db2029 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:21] PROBLEM - puppet last run on dataset1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:21] PROBLEM - puppet last run on db2007 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:21] ignore
[22:50:22] PROBLEM - puppet last run on db2016 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:22] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:22] PROBLEM - puppet last run on analytics1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:22] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:23] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:33] (CR) Dzahn: [V: 2] gerrit - raise HSTS max-age to 1 year [puppet] - https://gerrit.wikimedia.org/r/159729 (https://bugzilla.wikimedia.org/38516) (owner: Chmarkine)
[22:50:35] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:36] (CR) Alexandros Kosiaris: [C: 2] followup to I4b970fc09c0e5b18b84d5d81412e615e [puppet] - https://gerrit.wikimedia.org/r/163757 (owner: Alexandros Kosiaris)
[22:50:38] PROBLEM - puppet last run on wtp1012 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:38] PROBLEM - puppet last run on polonium is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:38] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 23 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 21, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 80, uinitializing_shards: 2, unumber_of_data_nodes: 3}
[22:50:38] PROBLEM - puppet last run on virt1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:38] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:38] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:39] PROBLEM - puppet last run on elastic1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:39] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:42] PROBLEM - puppet last run on mw1044 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:43] PROBLEM - puppet last run on analytics1016 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:53] stfu puppet
[22:50:54] PROBLEM - puppet last run on mw1055 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:54] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:50:55] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 12 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 10, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 91, uinitializing_shards: 2, unumber_of_data_nodes: 3}
[22:50:55] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:03] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:03] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:03] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:03] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:03] PROBLEM - puppet last run on mw1180 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:04] PROBLEM - puppet last run on mw1084 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:07] PROBLEM - puppet last run on mw1151 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:07] PROBLEM - puppet last run on mw1098 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:07] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:07] PROBLEM - puppet last run on ssl1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:07] PROBLEM - puppet last run on virt1004 is CRITICAL: CRITICAL: Puppet has 1 failures
is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:08] PROBLEM - puppet last run on mw1149 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:08] PROBLEM - puppet last run on mw1111 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:09] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:09] PROBLEM - puppet last run on mw1049 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:10] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:13] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:13] PROBLEM - puppet last run on mw1133 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:14] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:14] PROBLEM - puppet last run on search1005 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:15] wow that was fast [22:51:23] oh, it's not me, yay [22:51:30] PROBLEM - puppet last run on mw1014 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:33] PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:33] PROBLEM - puppet last run on mw1051 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:34] PROBLEM - puppet last run on mw1079 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:34] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:34] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:34] :-) [22:51:34] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:34] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:43] PROBLEM - puppet last run on argon is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:43] PROBLEM - puppet last run on gadolinium is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:43] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:44] PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:44] PROBLEM - puppet last run on snapshot1002 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:44] PROBLEM - puppet last run on analytics1022 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:44] PROBLEM - puppet last run on search1002 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:45] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:45] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:46] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:46] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 3, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 98, initializing_shards: 2, number_of_data_nodes: 3 [22:51:47] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:47] PROBLEM - puppet last run on db2037 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:48] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:48] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:53] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:53] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Puppet 
has 1 failures [22:51:53] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:53] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:53] PROBLEM - puppet last run on amssq42 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:53] PROBLEM - puppet last run on mc1005 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:54] PROBLEM - puppet last run on mw1057 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:54] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 2, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 99, initializing_shards: 2, number_of_data_nodes: 3 [22:51:55] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:55] PROBLEM - puppet last run on rubidium is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:56] PROBLEM - puppet last run on mw1050 is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:56] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Puppet has 1 failures [22:51:57] PROBLEM - puppet last run on rdb1001 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:03] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:04] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:04] PROBLEM - puppet last run on thallium is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:04] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:04] PROBLEM - puppet last run on mw1034 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:04] PROBLEM - puppet last run on search1017 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:04] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:05] PROBLEM - puppet last run on analytics1023 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:05] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:06] PROBLEM - puppet last run on snapshot1004 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:06] PROBLEM - puppet last run on db72 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:07] RECOVERY - puppet last run on lead is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [22:52:07] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:15] PROBLEM - puppet last run on mw1156 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:15] PROBLEM - puppet last run on wtp1023 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:15] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:15] PROBLEM - puppet last run on db2004 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:26] PROBLEM - puppet last run on mw1004 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:26] PROBLEM - puppet last run on db2001 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:26] PROBLEM - puppet last run on db1062 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:26] PROBLEM - puppet last run on mw1056 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:26] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:27] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:27] PROBLEM - puppet last run on wtp1018 is CRITICAL: 
CRITICAL: Puppet has 1 failures [22:52:28] PROBLEM - puppet last run on elastic1011 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:28] PROBLEM - puppet last run on db1004 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:29] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:29] PROBLEM - puppet last run on db69 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:33] PROBLEM - puppet last run on db2023 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:34] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:43] PROBLEM - puppet last run on ms-be1008 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:43] PROBLEM - puppet last run on ssl1009 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:43] PROBLEM - puppet last run on search1024 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:43] RECOVERY - puppet last run on polonium is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [22:52:44] PROBLEM - puppet last run on analytics1037 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:44] PROBLEM - puppet last run on mw1030 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:44] PROBLEM - puppet last run on ssl3002 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:45] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:45] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:49] akosiaris: bblack: should we do this? https://gerrit.wikimedia.org/r/#/c/162171/1/modules/ntp/manifests/client.pp [22:52:53] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:57] PROBLEM - puppet last run on mw1159 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:57] PROBLEM - puppet last run on mw1116 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:57] PROBLEM - puppet last run on db1055 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:58] PROBLEM - puppet last run on ms-be1009 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:58] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:58] PROBLEM - puppet last run on elastic1014 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:58] PROBLEM - puppet last run on wtp1002 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:59] PROBLEM - puppet last run on wtp1022 is CRITICAL: CRITICAL: Puppet has 1 failures [22:52:59] PROBLEM - puppet last run on mw1081 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:00] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:00] PROBLEM - puppet last run on mw1074 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:03] PROBLEM - puppet last run on ms-be1007 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:08] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:09] PROBLEM - puppet last run on mw1146 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:13] PROBLEM - puppet last run on mc1001 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:13] RECOVERY - NTP on tungsten is OK: NTP OK: Offset -0.002251148224 secs [22:53:13] PROBLEM - puppet last run on search1023 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:21] ok. 
first thing tomorrow [22:53:23] PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:23] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:23] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:23] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:23] PROBLEM - puppet last run on analytics1032 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:23] PROBLEM - puppet last run on mw1097 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:24] PROBLEM - puppet last run on mw1023 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:24] PROBLEM - puppet last run on db1057 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:35] updating that check to only issue critical on puppet compilation failure [22:53:38] PROBLEM - puppet last run on es1002 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:42] I would be super grateful if y'all would not make icinga-wm go into panic mode while SWAT is happening. :) [22:53:44] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:53] PROBLEM - puppet last run on db1054 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:53] PROBLEM - puppet last run on virt1007 is CRITICAL: CRITICAL: Puppet has 1 failures [22:53:55] PROBLEM - puppet last run on hooft is CRITICAL: CRITICAL: Puppet has 1 failures [22:54:03] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Puppet has 1 failures [22:54:03] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: Puppet has 1 failures [22:54:13] PROBLEM - NTP on logstash1003 is CRITICAL: NTP CRITICAL: Offset unknown [22:54:48] (03CR) 10Dzahn: "adding bblack because it also changes NTP client config.. might be outdated now anyways" [puppet] - 10https://gerrit.wikimedia.org/r/162171 (owner: 10Dzahn) [22:56:13] mutante: can just skip the ntp part (it will fail on rebase anyways) [22:56:30] it's no longer necessary as it was all updated with the ntp change [22:56:48] Why don't we have symlinks for like stage0, stage1, stage2 [22:57:02] ln -s php-1.25wmf1 stage0 [22:57:23] RECOVERY - NTP on logstash1003 is OK: NTP OK: Offset -0.002212166786 secs [22:57:30] marktraceur: what's the use case? [22:57:31] (03CR) 10Dzahn: "when merging that, also make ticket to properly remove/wipe the cert" [puppet] - 10https://gerrit.wikimedia.org/r/163312 (owner: 10Dzahn) [22:57:44] bblack: ok, thanks, lemme rebase [22:57:59] Shrug, mostly I'm lazy and dislike writing directory names with combinations of letters and numbers [22:58:01] (03CR) 10BBlack: "The part that touches ntp will fail on rebase, that class doesn't exist anymore. There's no NTP changes needed as part of this decom anym" [puppet] - 10https://gerrit.wikimedia.org/r/162171 (owner: 10Dzahn) [22:58:07] Plus it's not always easy to keep track [22:58:13] oh whoops I thought maybe you were watching there and not here [22:58:17] either way :) [22:58:31] I will do SWAT today, lots of mobile stuff [22:58:39] MaxSem: Oh, I was going to do it [22:58:49] Is it...lots and *lots*? [22:59:07] reedy@tin:/srv/mediawiki-staging$ ./multiversion/activeMWVersions --withdb [22:59:07] 1.24wmf22=plwikimedia 1.25wmf1=mediawikiwiki [22:59:22] You're wonderful Reedy [22:59:34] mwversionsinuse? [22:59:58] marktraceur, I no want all this shit:P [23:00:04] ebernhardson: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140929T2300). Please do the needful. 
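(A sketch of the stage-symlink idea marktraceur floats above — the stageN names are hypothetical, nothing like this exists in the deploy tree; it assumes the /srv/mediawiki-staging layout visible in Reedy's activeMWVersions output:)
    cd /srv/mediawiki-staging
    # hypothetical: memorable aliases for the active branch directories
    ln -sfn php-1.25wmf1 stage0    # newer branch (mediawikiwiki, per --withdb above)
    ln -sfn php-1.24wmf22 stage1   # older branch (plwikimedia, per --withdb above)
    # the links would need re-pointing every time the train rolls forward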
[23:00:04] Reedy: are you aware of ongoing problems with redis on beta labs? [23:00:18] MaxSem: What? [23:00:36] to deploy;) [23:01:14] (03PS2) 10Dzahn: decom linne [puppet] - 10https://gerrit.wikimedia.org/r/162171 [23:01:14] chrismcmahon: not really. other than you pinging ori about it this morning [23:01:31] (03PS3) 10Dzahn: decom linne [puppet] - 10https://gerrit.wikimedia.org/r/162171 [23:01:47] (03PS4) 10Dzahn: decom linne [puppet] - 10https://gerrit.wikimedia.org/r/162171 [23:01:58] chrismcmahon: wassup? [23:02:28] Reedy: OK. it's still not 100%. try logging in and then http://en.wikipedia.beta.wmflabs.org/wiki/Special:Preferences, get redis error [23:02:38] (03PS5) 10Dzahn: decom linne [puppet] - 10https://gerrit.wikimedia.org/r/162171 [23:03:05] [90d73499] /wiki/Special:Preferences Exception from line 827 of /srv/mediawiki/php-master/includes/jobqueue/JobQueueRedis.php: Unable to connect to redis server. [23:03:09] RoanKattouw, does your change require a submodule update? [23:03:11] (03CR) 10Dzahn: [C: 032] decom linne [puppet] - 10https://gerrit.wikimedia.org/r/162171 (owner: 10Dzahn) [23:03:12] Reedy: or edit a page in VE and save http://en.wikipedia.beta.wmflabs.org/wiki/0.6580621172556147?veaction=edit [23:03:22] MaxSem: heh, is that why it pinged me? :) [23:03:24] leeeroy jeeeeenkins [23:03:48] Reedy: I re-opened the bug, but maybe need ori to put it over the line [23:03:51] (yea, i'm not even a WoW player) [23:04:12] mutante: isn't that now common across MMORPG's? it was used all the time when i played EVE years ago [23:04:20] MaxSem: So you're doing it? [23:04:24] * ebernhardson is probably in the wrong channel to discuss MMORPG, ignore :P [23:04:27] MaxSem: Yes, recursively [23:04:33] PROBLEM - NTP on logstash1002 is CRITICAL: NTP CRITICAL: Offset unknown [23:04:52] marktraceur, yep [23:04:54] (03PS8) 10Krinkle: Gzip SVGs on back upload varnishes [puppet] - 10https://gerrit.wikimedia.org/r/108484 (https://bugzilla.wikimedia.org/54291) (owner: 10Ori.livneh) [23:05:47] (03PS1) 10Alexandros Kosiaris: openldap: fix sambaNTpassword aci [puppet] - 10https://gerrit.wikimedia.org/r/163758 [23:06:05] ebernhardson: i just have to think of that every time i wait for jenkins :) [23:06:13] it's trying to call it [23:06:58] btw, there is going to be a flurry of icinga-wm messages about puppet recovery [23:07:11] in about 2-5 mins :-) [23:07:13] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [23:07:13] RECOVERY - puppet last run on labstore1001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [23:07:13] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [23:07:16] recoveries are cool [23:07:21] (03PS9) 10Krinkle: Gzip SVGs on back upload varnishes [puppet] - 10https://gerrit.wikimedia.org/r/108484 (https://bugzilla.wikimedia.org/54291) (owner: 10Ori.livneh) [23:07:22] I should have said secs ... 
[23:07:27] :) [23:07:43] RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [23:07:53] RECOVERY - puppet last run on db1052 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [23:07:53] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [23:07:53] RECOVERY - puppet last run on es1007 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [23:07:54] RECOVERY - puppet last run on wtp1012 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [23:07:54] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [23:07:54] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [23:08:03] RECOVERY - puppet last run on elastic1006 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [23:08:03] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [23:08:03] RECOVERY - puppet last run on mw1055 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [23:08:03] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [23:08:03] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [23:08:03] RECOVERY - puppet last run on mc1005 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [23:08:04] RECOVERY - puppet last run on db1016 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [23:08:04] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [23:08:14] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [23:08:14] RECOVERY - puppet last run on cp1058 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [23:08:23] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [23:08:28] RECOVERY - puppet last run on mw1149 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [23:08:34] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [23:08:35] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [23:08:35] RECOVERY - puppet last run on antimony is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [23:08:35] RECOVERY - NTP on logstash1002 is OK: NTP OK: Offset -0.003017544746 secs [23:08:35] RECOVERY - puppet last run on search1005 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [23:08:43] RECOVERY - puppet last run on analytics1026 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [23:08:43] RECOVERY - puppet last run on mw1051 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [23:08:43] RECOVERY - puppet last run on db2029 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [23:08:43] RECOVERY - puppet last run on db2007 is OK: OK: Puppet is currently enabled, 
last run 43 seconds ago with 0 failures [23:08:44] RECOVERY - puppet last run on db1036 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [23:08:44] RECOVERY - puppet last run on analytics1013 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [23:08:44] RECOVERY - puppet last run on db2016 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [23:08:45] RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [23:08:45] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [23:08:46] (03CR) 10Krinkle: "Escaped the plus sign in the regex. I do question whether this should be a regex at all though. Shouldn't it do like ^ and $ or just plain" [puppet] - 10https://gerrit.wikimedia.org/r/108484 (https://bugzilla.wikimedia.org/54291) (owner: 10Ori.livneh) [23:08:53] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [23:08:56] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [23:08:56] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [23:08:56] RECOVERY - puppet last run on snapshot1002 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [23:08:56] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [23:08:56] RECOVERY - puppet last run on search1002 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [23:08:57] RECOVERY - puppet last run on analytics1022 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [23:09:03] RECOVERY - puppet last run on virt1001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [23:09:03] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [23:09:03] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [23:09:03] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [23:09:04] RECOVERY - puppet last run on mw1044 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [23:09:04] (03PS2) 10Krinkle: rcstream: make lvs health check fetch /nginx_status [puppet] - 10https://gerrit.wikimedia.org/r/145997 (https://bugzilla.wikimedia.org/67957) (owner: 10Ori.livneh) [23:09:04] RECOVERY - puppet last run on analytics1016 is OK: OK: Puppet is currently enabled, last run 61 seconds ago with 0 failures [23:09:04] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [23:09:05] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [23:09:05] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [23:09:06] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [23:09:06] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [23:09:07] RECOVERY - puppet last run on mw1202 is 
OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [23:09:13] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [23:09:16] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [23:09:16] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [23:09:17] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [23:09:17] RECOVERY - puppet last run on mw1084 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [23:09:17] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [23:09:17] RECOVERY - puppet last run on mw1151 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [23:09:17] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [23:09:18] RECOVERY - puppet last run on analytics1023 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [23:09:23] RECOVERY - puppet last run on ssl1005 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [23:09:23] RECOVERY - puppet last run on virt1004 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [23:09:24] RECOVERY - puppet last run on mw1049 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [23:09:24] RECOVERY - puppet last run on mw1111 is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures [23:09:24] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [23:09:24] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [23:09:33] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 63 seconds ago with 0 failures [23:09:33] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [23:09:33] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [23:09:33] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [23:09:34] RECOVERY - puppet last run on wtp1018 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [23:09:34] RECOVERY - puppet last run on db1004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [23:09:43] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [23:09:43] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [23:09:44] RECOVERY - puppet last run on mw1014 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [23:09:44] RECOVERY - puppet last run on mw1079 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [23:09:44] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [23:09:45] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures 
[23:09:47] icinga-wm: come back soon [23:09:55] RECOVERY - puppet last run on argon is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [23:09:55] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [23:09:55] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [23:09:55] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures [23:09:55] RECOVERY - puppet last run on ms-be1008 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [23:09:55] RECOVERY - puppet last run on ssl1009 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [23:10:12] (03CR) 10Krinkle: "Assuming this change does the expected thing and we haven't merged a different patch, this is currently causing rcstream servers to have t" [puppet] - 10https://gerrit.wikimedia.org/r/145997 (https://bugzilla.wikimedia.org/67957) (owner: 10Ori.livneh) [23:10:15] It will be missed. [23:10:20] But not that much. [23:10:31] bd808: paravoid: mutante: https://gerrit.wikimedia.org/r/#/c/145997/ [23:13:32] !log dist-upgraded logstash1001 and reboot [23:13:36] Logged the message, Master [23:14:50] (03CR) 10MaxSem: [C: 032] "HATEHATEHATE" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162505 (owner: 10Dzahn) [23:14:54] Heh [23:15:03] * Reedy waits for jenkins to V: -1 [23:15:05] (03Merged) 10jenkins-bot: delete entire DolphinBrowser directory from bits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162505 (owner: 10Dzahn) [23:15:43] heh :) [23:16:03] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 31 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 2, uunassigned_shards: 31, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 72, uinitializing_shards: 0, unumber_of_data_nodes: 2} [23:16:06] he killed the dolphin [23:16:14] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 31 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 31, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 72, uinitializing_shards: 0, unumber_of_data_nodes: 3} [23:17:02] !log maxsem Synchronized docroot/: he killed the dolphin (duration: 00m 06s) [23:17:04] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 1, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 100, initializing_shards: 2, number_of_data_nodes: 3 [23:17:08] Logged the message, Master [23:17:15] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 102, initializing_shards: 1, number_of_data_nodes: 3 [23:17:51] bd808: looks like 1.3 is a bit smarter [23:17:59] That only took a few minutes to catch up after a restart [23:18:11] started with 17 
unassigned, but then all gone [23:18:21] s/a bit/a lot/ [23:19:03] Reedy: Cool. Thanks for babysitting that all day. :) [23:20:10] (03PS1) 10Dzahn: remove linne from site.pp - decom [puppet] - 10https://gerrit.wikimedia.org/r/163762 [23:20:32] (03PS2) 10Dzahn: remove linne from site.pp/dsh/dhcp [puppet] - 10https://gerrit.wikimedia.org/r/163316 [23:20:58] (03Abandoned) 10Dzahn: remove linne from site.pp - decom [puppet] - 10https://gerrit.wikimedia.org/r/163762 (owner: 10Dzahn) [23:21:28] bd808: Were the logstash debs imported? [23:21:36] (03PS3) 10Dzahn: remove linne from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/163316 [23:22:14] Reedy: Yeah. The one and only deb we have was pulled in by paravoid in like January or something [23:22:24] They're up to 1.3.3 [23:22:29] * Reedy looks for 1.2 point releases [23:22:40] uh, 1.4.2 even [23:23:10] hmm, no newer point releases [23:23:14] I think we should upgrade in beta to the latest and if it works there for a few days ask ops to pull in the latest upstream [23:23:22] right [23:23:23] !log maxsem Started scap: SWATting a bunch of stuff [23:23:26] PROBLEM - BGP status on cr2-eqiad is CRITICAL: Return code of -1 is out of bounds [23:23:28] Logged the message, Master [23:23:33] I'm just thinking around the upgrade thing [23:23:35] I've had that on my list of things to do for months now :( [23:23:36] PROBLEM - swift-object-server on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:23:36] PROBLEM - swift-account-reaper on ms-be2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:23:37] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: Puppet has 17 failures [23:23:37] PROBLEM - swift-account-replicator on ms-be2005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:23:42] bd808: is it on kanban?
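(Checking what the imported deb provides against upstream is a one-liner — a sketch, assuming the package is simply named logstash in the apt repo:)
    apt-cache policy logstash   # installed vs. candidate version, and which repo it came from
    # compare that against the newest upstream release before asking ops for an import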
[23:23:46] PROBLEM - swift-account-replicator on ms-be2004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:23:46] PROBLEM - swift-account-auditor on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:23:47] PROBLEM - swift-object-replicator on ms-be2012 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:23:57] PROBLEM - swift-account-reaper on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:23:57] PROBLEM - BGP status on csw1-esams is CRITICAL: Return code of -1 is out of bounds [23:23:57] PROBLEM - swift-object-replicator on ms-be2011 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:23:57] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: Puppet has 17 failures [23:23:57] PROBLEM - swift-object-server on ms-be2012 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:24:00] There's really just 14.04 and logstash upgrades we need to worry about I think [23:24:04] (03CR) 10Dzahn: [C: 032] "both NTP and url-downloader have been moved" [puppet] - 10https://gerrit.wikimedia.org/r/163316 (owner: 10Dzahn) [23:24:07] PROBLEM - BGP status on csw2-esams is CRITICAL: Return code of -1 is out of bounds [23:24:07] PROBLEM - swift-account-replicator on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:24:07] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: Puppet has 17 failures [23:24:07] PROBLEM - swift-object-server on ms-be2011 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:24:07] PROBLEM - swift-account-auditor on ms-be2012 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:24:19] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 10.65.0.1, interfaces up: 33, down: 3, dormant: 0, excluded: 0, unused: 0BRfe-0/0/3: down - BRfe-0/0/5: down - testBRfe-0/0/7: down - Layer42 OOB linkBR [23:24:27] PROBLEM - swift-account-auditor on ms-be2011 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:24:27] PROBLEM - swift-account-reaper on ms-be2012 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:24:27] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: Puppet has 12 failures [23:24:28] PROBLEM - Corp OIT LDAP Mirror on plutonium is CRITICAL: Could not bind to the LDAP server [23:24:37] eh.. 
this one isnt good [23:24:38] RECOVERY - swift-account-reaper on ms-be2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:24:38] PROBLEM - swift-account-replicator on ms-be2012 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:24:39] 16:26 <+icinga-wm> PROBLEM - Corp OIT LDAP Mirror on plutonium is CRITICAL: Could not bind to the LDAP server [23:24:40] PROBLEM - swift-account-reaper on ms-be2011 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:24:40] PROBLEM - swift-object-replicator on ms-be2008 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:24:40] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: Puppet has 12 failures [23:24:47] RECOVERY - swift-account-replicator on ms-be2004 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:24:47] PROBLEM - swift-account-replicator on ms-be2011 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:24:48] PROBLEM - swift-object-replicator on ms-be2007 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:24:48] PROBLEM - swift-object-server on ms-be2008 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:24:48] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: Puppet has 17 failures [23:25:09] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: Puppet has 17 failures [23:25:09] PROBLEM - swift-account-auditor on ms-be2008 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:25:09] PROBLEM - swift-object-server on ms-be2007 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:25:10] PROBLEM - swift-object-replicator on ms-be2005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:25:16] Reedy: OMG, look it was on my kanban board (since January 9th) -- https://github.com/bd808/wmf-kanban/issues/39 [23:25:16] PROBLEM - BGP status on cr1-eqiad is CRITICAL: Return code of -1 is out of bounds [23:25:19] PROBLEM - swift-object-server on ms-be2005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:25:19] PROBLEM - swift-account-auditor on ms-be2007 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:25:19] PROBLEM - swift-account-reaper on ms-be2008 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:25:19] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: Puppet has 17 failures [23:25:24] bd808: lmfao [23:25:26] PROBLEM - swift-account-auditor on ms-be2005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:25:26] PROBLEM - swift-account-reaper on ms-be2007 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:25:27] PROBLEM - swift-account-replicator on ms-be2008 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:25:27] PROBLEM - puppet last run on ms-be2004 is CRITICAL: 
CRITICAL: Puppet has 12 failures [23:25:36] PROBLEM - BGP status on cr1-esams is CRITICAL: Return code of -1 is out of bounds [23:25:37] PROBLEM - swift-object-replicator on ms-be2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:25:37] PROBLEM - swift-account-reaper on ms-be2005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:25:37] PROBLEM - swift-account-replicator on ms-be2007 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:25:37] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: Puppet has 12 failures [23:26:05] i don't know if it's related, and i think you all don't maintain beta, but if it matters jenkins just spit this out: 23:24:54 Error: 13 database or disk is full [23:26:48] ebernhardson: Which host? It would be at the top of the full report. [23:26:52] !log disabling puppet on virt0 so I can kill off services one by one... [23:27:36] RECOVERY - swift-account-reaper on ms-be2011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:27:37] !log /var/lib/jenkins-slave/tmpfs 100% full on lanthanum.eqiad.wmnet [23:27:56] RECOVERY - swift-account-replicator on ms-be2011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:27:59] bd808: not seeing which, it's here: https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/29831/console [23:28:08] RECOVERY - swift-object-replicator on ms-be2011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:28:17] RECOVERY - swift-object-server on ms-be2011 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:28:17] RECOVERY - swift-account-auditor on ms-be2011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:28:21] !log linne - disable icinga, revoking puppet cert, salt key,..stopping services [23:28:22] ebernhardson: "Building remotely on lanthanum" [23:28:26] !log stopped puppetmaster on virt0 [23:28:31] !log stopped nova-scheduler on virt0 [23:28:34] wow i totally read over that [23:28:36] !log stopped keystone on virt0 [23:28:37] oh, it's jenkins [23:28:38] bd808: well now i know!
thanks [23:29:07] RECOVERY - swift-object-server on ms-be2012 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:29:07] RECOVERY - swift-account-auditor on ms-be2008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:29:16] RECOVERY - swift-account-auditor on ms-be2012 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:29:16] RECOVERY - swift-account-reaper on ms-be2008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:29:17] RECOVERY - swift-account-reaper on ms-be2012 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:29:25] !log stopped apache on virt0 [23:29:26] RECOVERY - swift-account-replicator on ms-be2008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:29:26] RECOVERY - swift-account-auditor on ms-be2005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:29:41] RECOVERY - swift-account-replicator on ms-be2005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:29:41] RECOVERY - swift-account-replicator on ms-be2012 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:29:42] RECOVERY - swift-object-replicator on ms-be2008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:29:46] RECOVERY - swift-object-replicator on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:29:46] RECOVERY - swift-account-reaper on ms-be2005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:29:47] RECOVERY - swift-account-auditor on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:29:47] RECOVERY - swift-object-replicator on ms-be2012 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:29:50] morebots, you there? [23:29:56] RECOVERY - swift-object-server on ms-be2008 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:30:03] andrewbogott: should be logged it's just not responding [23:30:06] for some reason [23:30:09] hm [23:30:13] RECOVERY - swift-account-reaper on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:30:14] RECOVERY - swift-object-replicator on ms-be2005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:30:14] RECOVERY - swift-account-replicator on ms-be2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:30:26] RECOVERY - swift-object-server on ms-be2005 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:30:28] RECOVERY - swift-account-reaper on ms-be2007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [23:30:32] hmm, not quite [23:30:37] 23:26 andrewbogott: disabling puppet on virt0 so I can kill off services one by one... 
[23:30:38] RECOVERY - swift-object-server on ms-be2001 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:30:46] RECOVERY - swift-account-replicator on ms-be2007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:30:47] yeah, but not the ones after that [23:30:48] !log lanthanum tmpfs filled up again, purged manually (bug 71128) [23:30:57] RECOVERY - swift-object-replicator on ms-be2007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [23:31:07] RECOVERY - swift-object-server on ms-be2007 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [23:31:16] RECOVERY - swift-account-auditor on ms-be2007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [23:31:46] RECOVERY - CI tmpfs disk space on lanthanum is OK: DISK OK [23:31:46] PROBLEM - HTTP on virt0 is CRITICAL: Connection refused [23:31:56] RECOVERY - Disk space on lanthanum is OK: DISK OK [23:31:57] morebots, doing better? [23:31:58] I am a logbot running on tools-exec-14. [23:31:58] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [23:31:58] To log a message, type !log <msg>. [23:32:07] PROBLEM - puppetmaster https on virt0 is CRITICAL: Connection refused [23:32:08] Thanks Krinkle. I was just trying to figure out if that was safe to do. [23:32:16] !log stopped apache, nova-scheduler, keystone, puppetmaster on virt0 [23:32:22] Logged the message, Master [23:32:22] bd808: it'll break any running jobs, but nothing else we can do right now. [23:32:57] bd808: ssh lanthanum; cd /var/lib/jenkins/tmpfs; ll; rm -rf *@* mwext-*; ll [23:33:00] that tends to do it [23:33:13] sudo -su jenkins-slave after the cd [23:33:53] * Krinkle comments on bug [23:34:28] hashar gave me access to lanthanum quite a while ago, but I've never poked around there really [23:35:46] (03PS2) 10Alexandros Kosiaris: openldap: fix sambaNTpassword aci [puppet] - 10https://gerrit.wikimedia.org/r/163758 [23:36:18] hey guys, can you please fix perms for /srv/mediawiki/docroot on terbium? [23:38:48] Ew [23:39:20] MaxSem: Then rsync will delete it [23:39:29] it seems it *only* exists on terbium [23:39:46] it's needed [23:39:51] Right [23:39:52] serves noc [23:40:12] should be redone not to spam errors into scap, then:) [23:40:13] should it be in mediawiki-config and staged from tin then? [23:40:55] $confdir = '/usr/local/apache/common/wmf-config/'; [23:40:55] $s3file = '/usr/local/apache/common/s3.dblist'; [23:40:56] MaxSem: I hate to do this to you, but I messed up [23:40:57] proooly not;) [23:40:57] dunno, it already existed on terbium.. didn't even copy from fenari, just used it [23:40:57] * Reedy blinks [23:41:05] Can you sync me fixing my stupid error?
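(Krinkle's lanthanum cleanup above, spelled out as one hedged sequence — the glob patterns are verbatim from his message, ll is assumed to be the usual ls -l alias, and as he notes it will break any running jobs:)
    ssh lanthanum.eqiad.wmnet
    cd /var/lib/jenkins/tmpfs      # the tmpfs the CI jobs keep filling (bug 71128)
    sudo -su jenkins-slave         # workspace files are owned by the slave user
    ls -l                          # see what is eating the space
    rm -rf *@* mwext-*             # purge stale job workspaces
    ls -l                          # confirm there is headroom again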
[23:41:13] (03CR) 10BryanDavis: "We will need to make sure that these messages get into logstash and ideally are also viewable with the fatalmonitor script that people are" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163680 (owner: 10Ori.livneh) [23:41:18] sure, as soon as scap ends [23:41:39] Reedy: /usr/local/apache = old = before ori change [23:41:43] Right [23:41:50] mutante: I guess it could be a submodule, apart from it's part of the larger operations-software repo [23:42:07] !log maxsem Finished scap: SWATting a bunch of stuff (duration: 18m 44s) [23:42:12] Logged the message, Master [23:42:13] IMHO: copy to tin, fix permissions, check into mediawiki-config, rm -rf it on terbium, sync-common [23:42:22] marktraceur, what to deploy? [23:42:47] Should be similar [23:43:01] I accidentally tracked the extension submodule on 1.24wmf1 instead of 1.25wmf1 [23:43:09] Thought I fixed it, didn't, now I feel dumb [23:43:42] dafuuuq? PHP Fatal error: Cannot use object of type stdClass as array in /srv/mediawiki/php-1.25wmf1/includes/specialpage/SpecialPageFactory.php on line 281 [23:44:02] Reedy, ^ [23:44:26] (03PS1) 10Reedy: Add noc/dbtree [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163768 [23:46:04] (03PS1) 10Reedy: Swap to /srv/mediawiki [software] - 10https://gerrit.wikimedia.org/r/163769 [23:46:48] (03PS1) 10Alexandros Kosiaris: icinga checks LDAP v3 protocol compatibility [puppet] - 10https://gerrit.wikimedia.org/r/163770 [23:47:23] (03PS1) 10Reedy: Swap dbtree to /srv/mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163772 [23:47:33] MaxSem: For the record in here, https://gerrit.wikimedia.org/r/163771 is the new change [23:48:16] mutante: could you delete dbtree on terbium if I merge those? [23:48:51] (03CR) 10Alexandros Kosiaris: [C: 032] icinga checks LDAP v3 protocol compatibility [puppet] - 10https://gerrit.wikimedia.org/r/163770 (owner: 10Alexandros Kosiaris) [23:49:08] (03PS1) 10Kaldari: Turn on WikiGrok on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163773 [23:50:08] (03CR) 10Reedy: [C: 032] Add noc/dbtree [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163768 (owner: 10Reedy) [23:50:08] !log maxsem Synchronized php-1.25wmf1/extensions/MultimediaViewer/: second try... 
(duration: 00m 04s) [23:50:14] Logged the message, Master [23:50:14] (03Merged) 10jenkins-bot: Add noc/dbtree [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163768 (owner: 10Reedy) [23:50:16] (03CR) 10Reedy: [C: 032] Swap dbtree to /srv/mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163772 (owner: 10Reedy) [23:50:22] (03Merged) 10jenkins-bot: Swap dbtree to /srv/mediawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163772 (owner: 10Reedy) [23:50:23] marktraceur, ^^^ [23:50:27] KK [23:50:32] I'll try it out [23:51:31] (03PS2) 10MaxSem: Remove live-1.5 and skins-1.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162520 [23:52:40] (03CR) 10MaxSem: [C: 032] Turn on WikiGrok on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163773 (owner: 10Kaldari) [23:55:00] Looks OK, MaxSem - l10nbot will fix the message issues we're seeing [23:56:54] (03PS3) 10MaxSem: Remove live-1.5 and skins-1.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162520 [23:57:33] (03CR) 10MaxSem: [C: 032] Remove live-1.5 and skins-1.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162520 (owner: 10MaxSem) [23:57:49] (03PS2) 10Kaldari: Turn on WikiGrok on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163773 [23:58:01] (03Merged) 10jenkins-bot: Remove live-1.5 and skins-1.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162520 (owner: 10MaxSem) [23:58:11] (03CR) 10MaxSem: [C: 032] Turn on WikiGrok on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163773 (owner: 10Kaldari) [23:58:17] (03Merged) 10jenkins-bot: Turn on WikiGrok on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/163773 (owner: 10Kaldari) [23:58:54] !log maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/163773/ (duration: 00m 03s) [23:59:00] Logged the message, Master [23:59:26] !log maxsem Synchronized w: https://gerrit.wikimedia.org/r/#/c/162520/ (duration: 00m 03s) [23:59:32] Logged the message, Master
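(For the record, Reedy's "copy to tin, fix permissions, check into mediawiki-config, rm -rf it on terbium, sync-common" plan sketched out — the source path and chmod mode are assumptions, and the commit step corresponds to the Add noc/dbtree change above:)
    # on tin: bring the unmanaged docroot content under version control
    cd /srv/mediawiki-staging
    scp -r terbium.eqiad.wmnet:/srv/mediawiki/docroot/noc/dbtree docroot/noc/   # source path assumed
    chmod -R a+rX docroot/noc/dbtree      # fix the perms that were spamming scap
    git add docroot/noc/dbtree
    git commit -m "Add noc/dbtree"
    # on terbium, once the change is merged and synced:
    rm -rf /srv/mediawiki/docroot/noc/dbtree
    sync-common                           # restores it from the deploy master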