[00:12:36] RECOVERY - Puppet errors on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [00:13:04] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [00:13:26] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [00:13:34] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [00:13:37] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [00:13:56] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [00:13:56] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1420 is OK: OK: Less than 1.00% above the threshold [0.0] [00:14:01] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [00:14:15] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1415 is OK: OK: Less than 1.00% above the threshold [0.0] [00:14:21] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [00:14:37] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [00:14:43] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1425 is OK: OK: Less than 1.00% above the threshold [0.0] [00:14:45] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1422 is OK: OK: Less than 1.00% above the threshold [0.0] [00:14:57] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1426 is OK: OK: Less than 1.00% above the threshold [0.0] [00:14:59] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [00:15:01] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [00:15:02] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1416 is OK: OK: 
Less than 1.00% above the threshold [0.0] [00:15:06] RECOVERY - Puppet errors on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [00:15:06] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [00:15:18] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1421 is OK: OK: Less than 1.00% above the threshold [0.0] [00:15:36] RECOVERY - Puppet errors on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [00:15:40] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [00:15:48] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [00:16:14] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1424 is OK: OK: Less than 1.00% above the threshold [0.0] [00:16:49] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [00:16:53] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [00:17:21] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1428 is OK: OK: Less than 1.00% above the threshold [0.0] [00:17:31] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [00:18:11] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1427 is OK: OK: Less than 1.00% above the threshold [0.0] [00:22:27] RECOVERY - Free space - all mounts on tools-bastion-02 is OK: OK: tools.tools-bastion-02.diskspace._public_dumps.byte_percentfree (No valid datapoints found) [00:23:27] RECOVERY - Puppet errors on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0] [00:28:46] RECOVERY - Puppet errors on tools-elastic-02 is OK: OK: Less than 1.00% above the threshold [0.0] [00:28:46] RECOVERY - Puppet errors on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0] [00:28:46] RECOVERY - 
Puppet errors on tools-docker-registry-01 is OK: OK: Less than 1.00% above the threshold [0.0] [00:29:02] RECOVERY - Puppet errors on tools-elastic-01 is OK: OK: Less than 1.00% above the threshold [0.0] [00:30:45] RECOVERY - Puppet errors on tools-elastic-03 is OK: OK: Less than 1.00% above the threshold [0.0] [00:35:10] RECOVERY - Puppet errors on tools-worker-1015 is OK: OK: Less than 1.00% above the threshold [0.0] [00:35:20] RECOVERY - Puppet errors on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [0.0] [00:35:26] RECOVERY - Puppet errors on tools-worker-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [00:36:02] RECOVERY - Puppet errors on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [0.0] [00:36:25] RECOVERY - Puppet errors on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [00:36:38] RECOVERY - Puppet errors on tools-worker-1008 is OK: OK: Less than 1.00% above the threshold [0.0] [00:36:58] RECOVERY - Puppet errors on tools-worker-1012 is OK: OK: Less than 1.00% above the threshold [0.0] [00:37:20] RECOVERY - Puppet errors on tools-worker-1019 is OK: OK: Less than 1.00% above the threshold [0.0] [00:37:43] RECOVERY - Puppet errors on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [00:38:31] RECOVERY - Puppet errors on tools-k8s-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0] [00:38:41] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [00:38:41] RECOVERY - Puppet errors on tools-k8s-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0] [00:38:45] RECOVERY - Puppet errors on tools-worker-1006 is OK: OK: Less than 1.00% above the threshold [0.0] [00:38:49] RECOVERY - Puppet errors on tools-worker-1026 is OK: OK: Less than 1.00% above the threshold [0.0] [00:38:51] RECOVERY - Puppet errors on tools-worker-1025 is OK: OK: Less than 1.00% above the threshold [0.0] [00:38:51] RECOVERY - Puppet errors 
on tools-worker-1023 is OK: OK: Less than 1.00% above the threshold [0.0] [00:38:57] RECOVERY - Puppet errors on tools-k8s-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0] [00:38:57] RECOVERY - Puppet errors on tools-worker-1016 is OK: OK: Less than 1.00% above the threshold [0.0] [00:39:01] RECOVERY - Puppet errors on tools-worker-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [00:39:07] RECOVERY - Puppet errors on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0] [00:39:11] RECOVERY - Puppet errors on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0] [00:39:12] RECOVERY - Puppet errors on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0] [00:39:12] RECOVERY - Puppet errors on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [00:39:40] RECOVERY - Puppet errors on tools-worker-1022 is OK: OK: Less than 1.00% above the threshold [0.0] [00:39:50] RECOVERY - Puppet errors on tools-worker-1013 is OK: OK: Less than 1.00% above the threshold [0.0] [00:40:16] RECOVERY - Puppet errors on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0] [00:41:10] RECOVERY - Puppet errors on tools-flannel-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0] [00:41:16] RECOVERY - Puppet errors on tools-flannel-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0] [00:43:33] PROBLEM - Puppet errors on tools-exec-1405 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [00:45:33] RECOVERY - Puppet errors on tools-flannel-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0] [00:48:23] RECOVERY - Puppet errors on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0] [00:50:22] RECOVERY - Puppet errors on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0] [00:56:55] RECOVERY - Puppet errors on tools-docker-registry-02 is OK: OK: Less than 1.00% above the threshold [0.0] [00:58:35] RECOVERY - 
Puppet errors on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [01:00:17] PROBLEM - Puppet errors on tools-puppetmaster-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:00:19] RECOVERY - Puppet errors on tools-redis-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [01:01:52] RECOVERY - Puppet errors on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0] [01:03:38] RECOVERY - Puppet errors on tools-static-10 is OK: OK: Less than 1.00% above the threshold [0.0] [01:03:40] RECOVERY - Puppet errors on tools-grid-shadow is OK: OK: Less than 1.00% above the threshold [0.0] [01:03:58] RECOVERY - Puppet errors on tools-redis-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [01:04:02] RECOVERY - Puppet errors on tools-prometheus-02 is OK: OK: Less than 1.00% above the threshold [0.0] [01:05:13] RECOVERY - Puppet errors on tools-prometheus-01 is OK: OK: Less than 1.00% above the threshold [0.0] [01:07:49] RECOVERY - Puppet errors on tools-static-11 is OK: OK: Less than 1.00% above the threshold [0.0] [01:08:41] RECOVERY - Puppet errors on tools-paws-worker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [01:09:03] RECOVERY - Puppet errors on tools-paws-master-01 is OK: OK: Less than 1.00% above the threshold [0.0] [01:18:31] RECOVERY - Puppet errors on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [01:21:49] RECOVERY - Puppet errors on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0] [01:25:48] PROBLEM - Puppet errors on tools-worker-1013 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [01:26:06] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [01:26:14] PROBLEM - Puppet errors on tools-exec-1431 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:26:22] PROBLEM - Puppet errors on tools-proxy-02 is CRITICAL: 
CRITICAL: 33.33% of data above the critical threshold [0.0] [01:26:28] PROBLEM - Puppet errors on tools-exec-1439 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [01:27:28] PROBLEM - Puppet errors on tools-bastion-05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [01:27:49] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [01:28:22] PROBLEM - Puppet errors on tools-services-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:28:40] PROBLEM - Puppet errors on tools-exec-1422 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:28:50] PROBLEM - Puppet errors on tools-exec-1403 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [01:29:04] PROBLEM - Puppet errors on tools-exec-1436 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [01:29:22] PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [01:29:24] PROBLEM - Puppet errors on tools-exec-1408 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:29:32] PROBLEM - Puppet errors on tools-exec-1404 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:29:32] PROBLEM - Puppet errors on tools-exec-1405 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:29:37] PROBLEM - Puppet errors on tools-mail is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:29:39] PROBLEM - Puppet errors on tools-exec-1420 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [01:29:47] PROBLEM - Puppet errors on tools-elastic-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:29:57] PROBLEM - Puppet errors on tools-worker-1016 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [01:30:01] PROBLEM - 
Puppet errors on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [01:30:07] PROBLEM - Puppet errors on tools-exec-1440 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [01:30:07] PROBLEM - Puppet errors on tools-worker-1010 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [01:30:11] PROBLEM - Puppet errors on tools-worker-1017 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [01:30:21] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:30:53] PROBLEM - Puppet errors on tools-exec-1417 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:31:02] PROBLEM - Puppet errors on tools-exec-1434 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:31:02] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:31:07] PROBLEM - Puppet errors on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:31:12] PROBLEM - Puppet errors on tools-worker-1015 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:31:20] PROBLEM - Puppet errors on tools-worker-1014 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:31:21] PROBLEM - Puppet errors on tools-redis-1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [01:31:22] PROBLEM - Puppet errors on tools-exec-1406 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:31:42] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:31:48] PROBLEM - Puppet errors on tools-exec-1441 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:32:00] PROBLEM - Puppet errors on 
tools-exec-1409 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:32:16] PROBLEM - Puppet errors on tools-flannel-etcd-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:32:50] PROBLEM - Puppet errors on tools-worker-1020 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:33:04] PROBLEM - Puppet errors on tools-exec-1428 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:33:22] PROBLEM - Puppet errors on tools-worker-1019 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:33:25] PROBLEM - Puppet errors on tools-exec-1414 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:33:31] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:33:39] PROBLEM - Puppet errors on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:33:49] PROBLEM - Puppet errors on tools-static-11 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:34:29] PROBLEM - Puppet errors on tools-checker-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [01:34:43] PROBLEM - Puppet errors on tools-k8s-etcd-03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:34:57] PROBLEM - Puppet errors on tools-k8s-etcd-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [01:35:03] PROBLEM - Puppet errors on tools-prometheus-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:35:45] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1422 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:36:01] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:36:49] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1411 
is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [01:36:54] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1419 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [01:37:04] PROBLEM - Puppet errors on tools-exec-1442 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:37:08] PROBLEM - Puppet errors on tools-bastion-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:37:58] PROBLEM - Puppet errors on tools-exec-1423 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [01:38:00] PROBLEM - Puppet errors on tools-worker-1012 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [01:38:30] PROBLEM - Puppet errors on tools-exec-1438 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:38:32] PROBLEM - Puppet errors on tools-exec-1401 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [01:39:08] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1427 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [01:39:20] PROBLEM - Puppet errors on tools-exec-1410 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:39:40] PROBLEM - Puppet errors on tools-grid-shadow is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [01:39:44] PROBLEM - Puppet errors on tools-paws-worker-1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:39:49] PROBLEM - Puppet errors on tools-docker-registry-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:39:51] PROBLEM - Puppet errors on tools-worker-1025 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [01:39:55] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1401 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [01:40:01] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1420 is CRITICAL: CRITICAL: 
55.56% of data above the critical threshold [0.0] [01:40:07] PROBLEM - Puppet errors on tools-exec-1427 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:40:13] PROBLEM - Puppet errors on tools-worker-1011 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:40:13] PROBLEM - Puppet errors on tools-exec-1432 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:40:15] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1415 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [01:40:51] PROBLEM - Puppet errors on tools-exec-1418 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:40:55] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1426 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:41:17] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1421 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [01:41:30] PROBLEM - Puppet errors on tools-exec-1424 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:42:11] PROBLEM - Puppet errors on tools-exec-1435 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [01:42:52] PROBLEM - Puppet errors on tools-grid-master is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:43:24] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1428 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:43:50] PROBLEM - Puppet errors on tools-docker-builder-05 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:44:04] PROBLEM - Puppet errors on tools-exec-1412 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:44:04] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [01:44:10] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 66.67% of 
data above the critical threshold [0.0] [01:44:22] PROBLEM - Puppet errors on tools-exec-1426 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:44:26] PROBLEM - Puppet errors on tools-exec-1421 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [01:44:26] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:44:30] PROBLEM - Puppet errors on tools-k8s-etcd-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:44:32] PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:44:34] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:44:38] PROBLEM - Puppet errors on tools-static-10 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:44:43] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:44:45] PROBLEM - Puppet errors on tools-exec-1419 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:44:45] PROBLEM - Puppet errors on tools-exec-1416 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [01:44:47] PROBLEM - Puppet errors on tools-cron-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:44:51] PROBLEM - Puppet errors on tools-worker-1026 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:44:57] PROBLEM - Puppet errors on tools-redis-1002 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [01:44:59] PROBLEM - Puppet errors on tools-exec-gift-trusty-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:45:01] PROBLEM - Puppet errors on tools-paws-master-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:45:03] 
PROBLEM - Puppet errors on tools-worker-1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:45:11] PROBLEM - Puppet errors on tools-exec-1415 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:46:41] PROBLEM - Puppet errors on tools-exec-1402 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:47:27] PROBLEM - Puppet errors on tools-worker-1005 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:47:53] PROBLEM - Puppet errors on tools-docker-registry-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:48:19] RECOVERY - Puppet errors on tools-package-builder-01 is OK: OK: Less than 1.00% above the threshold [0.0] [01:48:21] PROBLEM - Puppet errors on tools-exec-1430 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [01:48:42] PROBLEM - Puppet errors on tools-worker-1027 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:48:44] PROBLEM - Puppet errors on tools-worker-1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [01:49:36] PROBLEM - Puppet errors on tools-exec-1425 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [01:50:04] PROBLEM - Puppet errors on tools-elastic-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:50:10] PROBLEM - Puppet errors on tools-k8s-master-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:50:18] RECOVERY - Puppet errors on tools-puppetmaster-01 is OK: OK: Less than 1.00% above the threshold [0.0] [01:50:36] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1417 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:50:40] PROBLEM - Puppet errors on tools-worker-1022 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [01:51:25] PROBLEM - Puppet errors on tools-worker-1002 is CRITICAL: CRITICAL: 66.67% of 
data above the critical threshold [0.0] [01:51:33] PROBLEM - Puppet errors on tools-flannel-etcd-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:51:47] PROBLEM - Puppet errors on tools-elastic-03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:51:59] PROBLEM - Puppet errors on tools-exec-1437 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:52:37] PROBLEM - Puppet errors on tools-worker-1008 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:52:41] PROBLEM - Puppet errors on tools-exec-1429 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [01:52:51] PROBLEM - Puppet errors on tools-exec-1433 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:58:24] RECOVERY - Puppet errors on tools-worker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [01:58:30] RECOVERY - Puppet errors on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [02:02:49] RECOVERY - Puppet errors on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0] [02:03:19] RECOVERY - Puppet errors on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:03:39] RECOVERY - Puppet errors on tools-exec-1422 is OK: OK: Less than 1.00% above the threshold [0.0] [02:04:21] RECOVERY - Puppet errors on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [02:04:23] RECOVERY - Puppet errors on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [02:04:36] RECOVERY - Puppet errors on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [02:04:40] RECOVERY - Puppet errors on tools-exec-1420 is OK: OK: Less than 1.00% above the threshold [0.0] [02:05:48] RECOVERY - Puppet errors on tools-worker-1013 is OK: OK: Less than 1.00% above the threshold [0.0] [02:06:04] RECOVERY - Puppet errors on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the 
threshold [0.0] [02:06:08] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [02:06:12] RECOVERY - Puppet errors on tools-exec-1431 is OK: OK: Less than 1.00% above the threshold [0.0] [02:06:20] RECOVERY - Puppet errors on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [0.0] [02:06:21] RECOVERY - Puppet errors on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:06:22] RECOVERY - Puppet errors on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [02:06:26] RECOVERY - Puppet errors on tools-exec-1439 is OK: OK: Less than 1.00% above the threshold [0.0] [02:06:43] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [02:07:01] RECOVERY - Puppet errors on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [02:07:31] RECOVERY - Puppet errors on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0] [02:07:51] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [02:08:21] RECOVERY - Puppet errors on tools-worker-1019 is OK: OK: Less than 1.00% above the threshold [0.0] [02:08:38] RECOVERY - Puppet errors on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [02:08:47] RECOVERY - Puppet errors on tools-static-11 is OK: OK: Less than 1.00% above the threshold [0.0] [02:08:49] RECOVERY - Puppet errors on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [02:09:04] RECOVERY - Puppet errors on tools-exec-1436 is OK: OK: Less than 1.00% above the threshold [0.0] [02:09:32] RECOVERY - Puppet errors on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [02:09:35] RECOVERY - Puppet errors on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [02:09:41] RECOVERY - Puppet errors on tools-k8s-etcd-03 is OK: OK: Less than 1.00% above the 
threshold [0.0] [02:09:47] RECOVERY - Puppet errors on tools-elastic-02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:09:59] RECOVERY - Puppet errors on tools-worker-1016 is OK: OK: Less than 1.00% above the threshold [0.0] [02:10:01] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [02:10:05] RECOVERY - Puppet errors on tools-exec-1440 is OK: OK: Less than 1.00% above the threshold [0.0] [02:10:05] RECOVERY - Puppet errors on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0] [02:10:11] RECOVERY - Puppet errors on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0] [02:10:23] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [02:10:44] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1422 is OK: OK: Less than 1.00% above the threshold [0.0] [02:10:53] RECOVERY - Puppet errors on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [02:11:01] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [02:11:01] RECOVERY - Puppet errors on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0] [02:11:13] RECOVERY - Puppet errors on tools-worker-1015 is OK: OK: Less than 1.00% above the threshold [0.0] [02:11:21] RECOVERY - Puppet errors on tools-redis-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [02:11:49] RECOVERY - Puppet errors on tools-exec-1441 is OK: OK: Less than 1.00% above the threshold [0.0] [02:12:15] RECOVERY - Puppet errors on tools-flannel-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0] [02:13:02] RECOVERY - Puppet errors on tools-exec-1428 is OK: OK: Less than 1.00% above the threshold [0.0] [02:13:26] RECOVERY - Puppet errors on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [02:13:32] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1409 is OK: OK: Less 
than 1.00% above the threshold [0.0] [02:14:26] RECOVERY - Puppet errors on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:14:50] RECOVERY - Puppet errors on tools-worker-1025 is OK: OK: Less than 1.00% above the threshold [0.0] [02:14:56] RECOVERY - Puppet errors on tools-k8s-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:15:04] RECOVERY - Puppet errors on tools-prometheus-02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:15:06] RECOVERY - Puppet errors on tools-exec-1427 is OK: OK: Less than 1.00% above the threshold [0.0] [02:15:12] RECOVERY - Puppet errors on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [02:15:15] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1415 is OK: OK: Less than 1.00% above the threshold [0.0] [02:15:55] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1426 is OK: OK: Less than 1.00% above the threshold [0.0] [02:16:01] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [02:16:19] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1421 is OK: OK: Less than 1.00% above the threshold [0.0] [02:16:47] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [02:16:52] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1419 is OK: OK: Less than 1.00% above the threshold [0.0] [02:17:03] RECOVERY - Puppet errors on tools-exec-1442 is OK: OK: Less than 1.00% above the threshold [0.0] [02:17:07] RECOVERY - Puppet errors on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:17:14] RECOVERY - Puppet errors on tools-exec-1435 is OK: OK: Less than 1.00% above the threshold [0.0] [02:17:51] RECOVERY - Puppet errors on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0] [02:17:55] RECOVERY - Puppet errors on tools-exec-1423 is OK: OK: Less than 1.00% above the threshold [0.0] [02:18:00] RECOVERY - Puppet errors on 
tools-worker-1012 is OK: OK: Less than 1.00% above the threshold [0.0] [02:18:31] RECOVERY - Puppet errors on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:05] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:07] RECOVERY - Puppet errors on tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:09] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1427 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:11] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:17] RECOVERY - Puppet errors on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:23] RECOVERY - Puppet errors on tools-exec-1421 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:25] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:32] RECOVERY - Puppet errors on tools-k8s-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:40] RECOVERY - Puppet errors on tools-grid-shadow is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:40] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:42] RECOVERY - Puppet errors on tools-paws-worker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:48] RECOVERY - Puppet errors on tools-docker-registry-01 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:50] RECOVERY - Puppet errors on tools-worker-1026 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:54] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [02:19:58] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1420 is OK: OK: Less than 1.00% above the threshold [0.0] [02:20:01] RECOVERY - Puppet errors on tools-exec-gift-trusty-01 is OK: OK: Less than 1.00% above the 
threshold [0.0] [02:20:03] RECOVERY - Puppet errors on tools-paws-master-01 is OK: OK: Less than 1.00% above the threshold [0.0] [02:20:05] RECOVERY - Puppet errors on tools-worker-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [02:20:13] RECOVERY - Puppet errors on tools-exec-1432 is OK: OK: Less than 1.00% above the threshold [0.0] [02:20:54] RECOVERY - Puppet errors on tools-exec-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [02:21:30] RECOVERY - Puppet errors on tools-exec-1424 is OK: OK: Less than 1.00% above the threshold [0.0] [02:23:24] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1428 is OK: OK: Less than 1.00% above the threshold [0.0] [02:23:42] RECOVERY - Puppet errors on tools-worker-1027 is OK: OK: Less than 1.00% above the threshold [0.0] [02:23:46] RECOVERY - Puppet errors on tools-docker-builder-05 is OK: OK: Less than 1.00% above the threshold [0.0] [02:24:20] RECOVERY - Puppet errors on tools-exec-1426 is OK: OK: Less than 1.00% above the threshold [0.0] [02:24:32] RECOVERY - Puppet errors on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0] [02:24:34] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [02:24:36] RECOVERY - Puppet errors on tools-static-10 is OK: OK: Less than 1.00% above the threshold [0.0] [02:24:42] RECOVERY - Puppet errors on tools-exec-1419 is OK: OK: Less than 1.00% above the threshold [0.0] [02:24:44] RECOVERY - Puppet errors on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [02:24:46] RECOVERY - Puppet errors on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0] [02:24:59] RECOVERY - Puppet errors on tools-redis-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [02:25:01] RECOVERY - Puppet errors on tools-elastic-01 is OK: OK: Less than 1.00% above the threshold [0.0] [02:25:11] RECOVERY - Puppet errors on tools-exec-1415 is OK: OK: Less than 1.00% above the threshold 
[0.0] [02:25:11] RECOVERY - Puppet errors on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0] [02:25:35] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [02:25:39] RECOVERY - Puppet errors on tools-worker-1022 is OK: OK: Less than 1.00% above the threshold [0.0] [02:26:27] RECOVERY - Puppet errors on tools-worker-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [02:26:33] RECOVERY - Puppet errors on tools-flannel-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:26:41] RECOVERY - Puppet errors on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [02:26:47] RECOVERY - Puppet errors on tools-elastic-03 is OK: OK: Less than 1.00% above the threshold [0.0] [02:27:23] RECOVERY - Puppet errors on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [02:27:38] RECOVERY - Puppet errors on tools-worker-1008 is OK: OK: Less than 1.00% above the threshold [0.0] [02:27:40] RECOVERY - Puppet errors on tools-exec-1429 is OK: OK: Less than 1.00% above the threshold [0.0] [02:27:50] RECOVERY - Puppet errors on tools-exec-1433 is OK: OK: Less than 1.00% above the threshold [0.0] [02:27:56] RECOVERY - Puppet errors on tools-docker-registry-02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:28:20] RECOVERY - Puppet errors on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0] [02:28:46] RECOVERY - Puppet errors on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [02:29:36] RECOVERY - Puppet errors on tools-exec-1425 is OK: OK: Less than 1.00% above the threshold [0.0] [02:31:57] RECOVERY - Puppet errors on tools-exec-1437 is OK: OK: Less than 1.00% above the threshold [0.0] [02:37:04] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Create an XTools logo - https://phabricator.wikimedia.org/T167345#3384955 (10MusikAnimal) Here's what I've got now, using the xtools5.svg logo (I also favour the 
Overpass font), dropping the icons, and using a horizontal layout: {F8540296} Currently live at h... [02:45:30] PROBLEM - Puppet errors on tools-exec-1405 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [03:15:31] RECOVERY - Puppet errors on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [03:28:43] RECOVERY - Puppet errors on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:29:59] RECOVERY - Puppet errors on tools-logs-02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:30:35] RECOVERY - Puppet errors on tools-checker-01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:32:22] RECOVERY - Puppet errors on tools-worker-1009 is OK: OK: Less than 1.00% above the threshold [0.0] [05:56:42] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Create an XTools logo - https://phabricator.wikimedia.org/T167345#3385065 (10kaldari) 05Open>03Resolved Now that looks beautiful! [06:32:39] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [06:32:49] PROBLEM - Puppet errors on tools-exec-1441 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [06:34:25] PROBLEM - Puppet errors on tools-exec-1414 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [06:36:53] PROBLEM - Puppet errors on tools-exec-1417 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [06:51:01] PROBLEM - Puppet errors on tools-elastic-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:07:39] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [07:11:56] RECOVERY - Puppet errors on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [07:12:48] RECOVERY - Puppet errors on tools-exec-1441 is OK: OK: Less than 1.00% above the threshold [0.0] [07:14:25] RECOVERY - Puppet errors on 
tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [07:26:01] RECOVERY - Puppet errors on tools-elastic-01 is OK: OK: Less than 1.00% above the threshold [0.0] [07:45:37] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [08:30:34] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [08:30:37] Heads up. Someone posted on mediawiki-api complaining (roughly) that something is wrong with the cl_from index on the dewiki.labsdb replica - https://lists.wikimedia.org/pipermail/mediawiki-api/2017-June/004014.html [08:50:10] 10Labs, 10Labs-Infrastructure, 10cloud-services-team: SQL dewiki_p categorylinks cl_from index missing - https://phabricator.wikimedia.org/T169038#3385391 (10Bawolff) For reference, the thread is https://lists.wikimedia.org/pipermail/mediawiki-api/2017-June/004014.html tl;dr: When I do `SELECT * from catego... [08:51:20] 10Labs, 10Labs-Infrastructure, 10cloud-services-team: SQL dewiki_p categorylinks cl_from index missing - https://phabricator.wikimedia.org/T169038#3385396 (10Bawolff) [09:05:38] 10Labs, 10Tool-Labs: MediaWiki instance on Tool Labs used as a spam relay - https://phabricator.wikimedia.org/T169040#3385462 (10ashley) [09:08:50] 10Labs, 10Tool-Labs: MediaWiki instance on Tool Labs used as a spam relay - https://phabricator.wikimedia.org/T169040#3385488 (10Legoktm) p:05Triage>03High https://tools.wmflabs.org/?tool=comprende says @Magnus owns the tool. 
[09:11:29] Labs, Tool-Labs: MediaWiki instance on Tool Labs used as a spam relay - https://phabricator.wikimedia.org/T169040#3385507 (Legoktm) Also @ashley said that spambots are spamming links to this wiki on other independent wikis, so it would be good to get this cleaned up or disabled quickly :/ [09:29:07] Labs, Tool-Labs: MediaWiki instance on Tool Labs used as a spam relay - https://phabricator.wikimedia.org/T169040#3385551 (Magnus) Running cleanup now, will lock afterwards. [10:29:12] Labs, Tool-Labs: MediaWiki instance on Tool Labs used as a spam relay - https://phabricator.wikimedia.org/T169040#3385791 (Peachey88) a:Magnus [10:45:17] Tool-Labs-tools-Other: Lighttpd for tools.para webservice crashes on startup - https://phabricator.wikimedia.org/T169022#3385805 (Para) Open>Resolved a:Para Commented it out. The logs show the error since March 2017. [12:36:15] Labs, Labs-Infrastructure, Scoring-platform-team-Backlog: Keep wmflabs scoring boxes up-to-date - https://phabricator.wikimedia.org/T168478#3386243 (faidon) [12:46:36] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [13:26:36] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [13:34:18] Labs, Labs-Infrastructure, cloud-services-team: SQL dewiki_p categorylinks cl_from index missing - https://phabricator.wikimedia.org/T169038#3386451 (chasemp) p:High>Normal I spoke with the #DBA crew a bit today and this seems to be a real issue but not high priority at the moment [13:39:33] Labs, Labs-Infrastructure, cloud-services-team, DBA: SQL dewiki_p categorylinks cl_from index missing - https://phabricator.wikimedia.org/T169038#3386490 (jcrespo) Without looking too much in depth, there seems to be some unexpected query plan- there is not much we can do about it- except maybe r... 
[13:44:12] PROBLEM - Puppet staleness on tools-puppetmaster-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0] [13:52:18] Labs, Tool-Labs: MediaWiki instance on Tool Labs used as a spam relay - https://phabricator.wikimedia.org/T169040#3385462 (chasemp) @magnus, thank you as always, let us know if we can help -- #cloud-services-team [14:04:58] Tool-Labs-tools-Other: MediaWiki instance on Tool Labs used as a spam relay - https://phabricator.wikimedia.org/T169040#3386752 (zhuyifei1999) [14:08:01] Labs, Labs-Infrastructure, cloud-services-team, DBA: SQL dewiki_p categorylinks cl_from index missing - https://phabricator.wikimedia.org/T169038#3386781 (Bawolff) >>! In T169038#3386490, @jcrespo wrote: > Without looking too much in depth, there seems to be some unexpected query plan- there is n... [14:36:21] PROBLEM - Puppet errors on tools-puppetmaster-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [14:39:39] Labs, Labs-Infrastructure, Operations, ops-eqiad, and 2 others: rack/setup/install labvirt101[5-8] - https://phabricator.wikimedia.org/T165531#3386956 (Cmjohnson) [14:40:42] Labs, Labs-Infrastructure, Operations, ops-eqiad, and 2 others: rack/setup/install labvirt101[5-8] - https://phabricator.wikimedia.org/T165531#3269143 (Cmjohnson) The second ethernet ports are cabled, cleared of the current vlan as far as I can tell. They need to be added to lab-instances. @faid... 
[14:58:00] Labs, Labs-Infrastructure, Tool-Labs: Locate an alternative source for modern Openstack Debian packages - https://phabricator.wikimedia.org/T169099#3387023 (Andrew) [14:58:08] Labs, Labs-Infrastructure, Tool-Labs: Locate an alternative source for modern Openstack Debian packages - https://phabricator.wikimedia.org/T169099#3387036 (Andrew) p:Triage>High [14:58:20] Labs, Labs-Infrastructure, Tool-Labs, cloud-services-team (Kanban): Locate an alternative source for modern Openstack Debian packages - https://phabricator.wikimedia.org/T169099#3387023 (Andrew) [15:00:27] Labs, Labs-Infrastructure, Operations, procurement: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3387058 (Papaul) [15:06:10] Labs, Labs-Infrastructure, cloud-services-team, DBA: SQL dewiki_p categorylinks cl_from index missing - https://phabricator.wikimedia.org/T169038#3387086 (jcrespo) p:Normal>High Ok, sorry, I understand now (the initial email with the complex join confused me)- I can see now that the host... 
[15:06:16] RECOVERY - Puppet errors on tools-puppetmaster-01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:08:23] 10Labs, 10Labs-Infrastructure, 10Operations, 10procurement: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3387089 (10Papaul) Rack location Row C rack C1 [15:11:50] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-codfw: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3387106 (10Papaul) Rack location Row B rack B8 [15:16:50] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-eqiad: rack/setup/install labpuppetmaster100[12].wikimedia.org - https://phabricator.wikimedia.org/T167905#3387165 (10Cmjohnson) [15:17:00] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-eqiad: rack/setup/install labpuppetmaster100[12].wikimedia.org - https://phabricator.wikimedia.org/T167905#3349071 (10Cmjohnson) network ports are setup [15:17:38] 10Labs, 10Labs-Infrastructure, 10cloud-services-team, 10DBA: SQL dewiki_p categorylinks cl_from index missing - https://phabricator.wikimedia.org/T169038#3385371 (10jcrespo) Probably related to T166207 (alters sometimes timeout due to excesive load on labsdb* hosts). [15:18:23] PROBLEM - Puppet errors on tools-worker-1005 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:19:09] RECOVERY - Puppet staleness on tools-puppetmaster-02 is OK: OK: Less than 1.00% above the threshold [3600.0] [15:26:34] 10Labs, 10Labs-Infrastructure, 10Operations, 10procurement: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3387217 (10chasemp) a:05chasemp>03RobH >>! In T168894#3379828, @RobH wrote: > @Chasemp: Please review my racking and vlan/IP proposal above and co... [15:28:14] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-codfw: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3387233 (10chasemp) a:05chasemp>03Papaul >>! 
In T168893#3379863, @RobH wrote: > @Chasemp: Please review my racking and vlan/IP proposal above and c... [15:28:51] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-codfw: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3387235 (10chasemp) a:05chasemp>03Papaul >>! In T168892#3379861, @RobH wrote: > @Chasemp: Please review my racking and vlan/IP proposal above and c... [15:30:29] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-codfw: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3387241 (10chasemp) >>! In T168891#3379859, @RobH wrote: > @Chasemp: Please review my racking and vlan/IP proposal above and confirm or correct. Once that... [15:30:41] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-codfw: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3387242 (10chasemp) a:05chasemp>03Papaul [15:31:50] 10Labs, 10Labs-Infrastructure, 10Operations: rack/setup/install labtestneutron2002 - https://phabricator.wikimedia.org/T167160#3387248 (10chasemp) [15:32:12] 10Labs, 10Labs-Infrastructure, 10Operations: rack/setup/install labtestneutron2002 - https://phabricator.wikimedia.org/T167160#3319508 (10chasemp) 05Open>03Resolved I am going to close this, consider it done :) I am handling configuration here in other tasks as it's ongoing and initial thank you @robh... [15:32:50] 10Labs, 10Labs-Infrastructure, 10Operations: rack/setup/install labtestnet2002 - https://phabricator.wikimedia.org/T167159#3387252 (10chasemp) 05Open>03Resolved I am going to close this, consider it done :) I am handling configuration here in other tasks as it's ongoing and initial thank you @RobH and @... 
[15:50:00] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-codfw: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3387313 (10Papaul) Rack location Row C rack C1 [15:50:56] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-codfw: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3387315 (10Papaul) Rack location Row D rack D1 [15:53:25] RECOVERY - Puppet errors on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [15:54:14] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-codfw: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3387322 (10Papaul) [15:55:03] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-codfw: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3387333 (10Papaul) [15:57:35] 10Labs, 10Labs-Infrastructure, 10Operations, 10procurement: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3387344 (10RobH) a:05RobH>03Papaul [16:23:02] 10Labs, 10Labs-Infrastructure, 10cloud-services-team, 10DBA: SQL dewiki_p categorylinks cl_from index missing - https://phabricator.wikimedia.org/T169038#3387468 (10jcrespo) 05Open>03Resolved a:03jcrespo This took less time than expected, but should now be fixed: ``` MariaDB [dewiki]> EXPLAIN SELECT... 
[16:23:22] Labs, Labs-Infrastructure, Operations, ops-eqiad: rack/setup/install labcontrol100[34] - https://phabricator.wikimedia.org/T165781#3387471 (Cmjohnson) [16:23:42] Labs, Labs-Infrastructure, Operations, ops-eqiad: rack/setup/install labcontrol100[34] - https://phabricator.wikimedia.org/T165781#3276700 (Cmjohnson) @robh network ports updated and vlans set [16:27:36] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [16:29:58] Labs, Labs-Infrastructure, Operations, ops-eqiad: rack/setup/install labcontrol100[34] - https://phabricator.wikimedia.org/T165781#3387522 (RobH) a:Cmjohnson>RobH [16:45:38] Labs, Labs-Infrastructure, Operations, procurement: rack/setup/install labtestcontrol2003.wikimedia.org - https://phabricator.wikimedia.org/T168894#3387588 (Papaul) [16:46:04] Labs, Labs-Infrastructure, Operations, ops-codfw: rack/setup/install labtestservices2003.wikimedia.org - https://phabricator.wikimedia.org/T168893#3387589 (Papaul) [16:46:28] Labs, Labs-Infrastructure, Operations, ops-codfw: rack/setup/install labtestservices2002.wikimedia.org - https://phabricator.wikimedia.org/T168892#3387590 (Papaul) [16:48:02] Labs, Labs-Infrastructure, Operations, ops-codfw: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3387596 (Papaul) [16:48:40] Labs, Labs-Infrastructure, Operations, ops-codfw: rack/setup/install labtestmetal2001.codfw.wmnet - https://phabricator.wikimedia.org/T168891#3379777 (Papaul) @chasemp ok will do. 
[16:49:00] 10Labs: Metrics from WikiFactMine labs project have disappeared from graphite - https://phabricator.wikimedia.org/T169118#3387598 (10Tarrow) [16:49:17] 10Labs: Metrics from WikiFactMine labs project have disappeared from graphite - https://phabricator.wikimedia.org/T169118#3387612 (10Tarrow) [16:57:36] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [17:25:42] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Create an XTools logo - https://phabricator.wikimedia.org/T167345#3387741 (10MusikAnimal) >>! In T167345#3385065, @kaldari wrote: > Now that looks beautiful! Awesome :) Now we just need to come up with a slogan (or brief description) for XTools. Any ideas? Or... [17:54:50] !log ores-staging deleting mediawiki-ores and hashing-vector instances (unused) [17:54:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores-staging/SAL [17:55:05] !log ores deleting mediawiki-ores and hashing-vector instances from ores-staging (unused) [17:55:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [18:06:46] question: I am running wdqs test server on a virtual server and it has /srv of 160G. It is rapidly becoming too small, due to wikidata growing. Is there any setup with bigger space? Should I ask for real hardware instead? do we even have non-production real hardware setups? [18:23:40] (03Draft1) 10Paladox: set interval to 40m for notifications [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361920 [18:23:42] (03PS2) 10Paladox: set interval to 40m for notifications [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361920 [18:27:05] (03CR) 10Awight: [C: 032] "Great, thank you for adjusting this!" 
[labs/icinga2] - 10https://gerrit.wikimedia.org/r/361920 (owner: 10Paladox) [18:28:01] (03CR) 10Paladox: [V: 032] set interval to 40m for notifications [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361920 (owner: 10Paladox) [18:44:05] (03Draft1) 10Paladox: Add some more ores instances to monitor [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361925 [18:44:07] (03PS2) 10Paladox: Add some more ores instances to monitor [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361925 [18:46:32] (03PS3) 10Paladox: Add some more ores instances to monitor [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361925 [18:49:29] (03CR) 10Paladox: [V: 032 C: 032] Add some more ores instances to monitor [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361925 (owner: 10Paladox) [18:50:37] (03CR) 10Awight: Add some more ores instances to monitor (031 comment) [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361925 (owner: 10Paladox) [18:51:03] (03CR) 10Paladox: Add some more ores instances to monitor (031 comment) [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361925 (owner: 10Paladox) [18:58:06] SMalyshev: I only have 2 minutes between meetings but to not leave you hanging I'll give the quick and dirty answer, it's not common for cloud tenants to exceed the storage and we are really having the discussion about how to handle a few larger storage use cases ongoing. storage is the most contentious resource primarily because everyone wants it. Depending on what you want, 200G? 300G? maybe we could work somethin [18:58:06] g out but we have a fixed budget to buy hardware that is done upfront for the fiscal and so that translates into finite allocation of resources. Not to put you off, but a task with details and especially a specific ask is the only way to go anywhere. We don't do real hardware in labs and anyting like that is a ways off. 
[19:02:57] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labcontrol100[34] - https://phabricator.wikimedia.org/T165781#3388067 (10RobH) So I have these setup and loading into the installer, but grub fails: > Jun 28 18:53:20 grub-installer: info: Installing grub... [19:03:10] 10Labs, 10Labs-Infrastructure, 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labcontrol100[34] - https://phabricator.wikimedia.org/T165781#3388075 (10RobH) [19:03:42] chasemp: Maybe we could come up with a cap-and-trade solution? I'll give up the storage I don't need for someone else's benefit? :) [19:03:53] (or you can give me the CPU you don't need in exchange for my disk :)) [19:03:57] heh [19:04:09] chasemp: probably something around 200, but maybe 300 further... wikidata gows... [19:04:15] *grows [19:05:03] a projection of growth with ask would be good [19:05:09] we have some people ask for 5T [19:05:16] and then it turns out that is 5 years from now [19:05:23] whereas we are looking for immediate ask and then roughly next year [19:05:37] as that has to be reflected in fiscal realities ...as well as underlying storage itself [19:05:44] and it has to be specific for obvious reasons [19:05:59] chasemp: yeah I'd do the task but I would like to figure out what are the options here... 
it looks to me this thing is growing out of labs so maybe it's better to figure out how to do something else than to force it into the labs [19:06:40] 200G or 300G isn't a good reason to dump a test host in production I think, but I don't have much insight into other components [19:07:10] the main thing is knowing what asks are coming and projecting growth [19:07:22] (03Draft1) 10Paladox: Fix check_disk for ores* [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361927 [19:07:24] (03PS2) 10Paladox: Fix check_disk for ores* [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361927 [19:07:27] (03CR) 10Paladox: [V: 032 C: 032] Fix check_disk for ores* [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361927 (owner: 10Paladox) [19:08:13] chasemp: well, the performance is also an issue but I'm kind of dealing with it... as for projections, I'm not sure, I didn't really collect the stats for DB size but I can try to compile something adhoc [19:11:04] chasemp: from this: https://grafana.wikimedia.org/dashboard/db/wikidata-datamodel?refresh=30m&panelId=3&fullscreen&orgId=1&from=now-1y&to=now growth rate is about 20% a year [19:12:06] chasemp: and this: https://grafana.wikimedia.org/dashboard/db/wikidata-datamodel?refresh=30m&panelId=1&fullscreen&orgId=1&from=now-1y&to=now shows growth of 48% a year in item size [19:12:35] SMalyshev: (away in a meeting but seems like good fodder for making a special large storage instance for you) [19:12:39] combined, we've got projection of about 75% grows in DB size per year. 
Very rough of course [19:13:35] (03Draft1) 10Paladox: Fix undefined variable [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361930 [19:13:37] (03Draft2) 10Paladox: Fix undefined variable [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361930 [19:13:40] (03CR) 10Paladox: [V: 032 C: 032] Fix undefined variable [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361930 (owner: 10Paladox) [19:18:49] (03Draft1) 10Paladox: Fix undefined variable part 2 [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361936 [19:18:51] (03PS2) 10Paladox: Fix undefined variable part 2 [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361936 [19:18:55] (03CR) 10Paladox: [V: 032 C: 032] Fix undefined variable part 2 [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361936 (owner: 10Paladox) [19:25:23] PROBLEM - Puppet errors on tools-exec-1411 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [19:48:03] (03Draft1) 10Paladox: Fix intervals for notifications [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361939 [19:48:05] (03PS2) 10Paladox: Fix intervals for notifications [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361939 [19:50:11] (03CR) 10Awight: [V: 032 C: 032] "Second try will be a charm, I'm sure!" 
(031 comment) [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361939 (owner: 10Paladox) [20:05:23] RECOVERY - Puppet errors on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [20:07:35] chasemp: (when you're out of the meetings) it looks like something like we have for relforge - weaker real hw in labs-support - would be great for me [20:08:06] chasemp: created T169133, please feel free to comment [20:08:06] T169133: WDQS testing setup platform sizing - https://phabricator.wikimedia.org/T169133 [20:12:38] (03Draft1) 10Paladox: Disable volatile for services [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361948 [20:12:40] (03PS2) 10Paladox: Disable volatile for services [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361948 [20:12:43] (03CR) 10Paladox: [V: 032 C: 032] Disable volatile for services [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361948 (owner: 10Paladox) [20:15:14] (03CR) 10Awight: "/me posthumously blesses the cadaver" [labs/icinga2] - 10https://gerrit.wikimedia.org/r/361948 (owner: 10Paladox) [20:59:54] (03PS1) 10Gilles: Move perf gerrit updates to #wikimedia-perf-bots [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/362003 [21:04:31] I'm seeing some puppet errors on ORES boxes, caused by the custom WMF ldap packages lagging behind backports. Is there a good workaround? 
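A rough sketch of the storage projection SMalyshev works out earlier in this log (about 20% yearly growth in item count and 48% in item size, combined into "about 75%"). The function names are illustrative, and the 160G starting size is an assumption taken from the size of the /srv volume mentioned in the chat, not a measured database size.

```python
# Growth figures quoted in the chat; treated here as independent yearly rates.
ITEM_COUNT_GROWTH = 0.20  # about 20% more items per year
ITEM_SIZE_GROWTH = 0.48   # about 48% larger items per year


def combined_growth(*rates):
    """Combine independent yearly growth rates multiplicatively."""
    factor = 1.0
    for rate in rates:
        factor *= 1.0 + rate
    return factor - 1.0


def projected_size(current_gb, yearly_growth, years):
    """Project a size forward assuming the growth rate stays constant."""
    return current_gb * (1.0 + yearly_growth) ** years


if __name__ == "__main__":
    yearly = combined_growth(ITEM_COUNT_GROWTH, ITEM_SIZE_GROWTH)
    print(f"combined yearly growth: {yearly:.1%}")
    # Assumption: the data currently fills roughly the 160G volume.
    for years in (1, 2):
        print(f"after {years} year(s): {projected_size(160, yearly, years):.0f}G")
```

Multiplying the two factors (1.20 x 1.48) gives a combined rate slightly above the rounded 75% quoted in the chat, which is consistent with the "very rough" caveat.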
[21:18:16] (03CR) 10Krinkle: [C: 032] Move perf gerrit updates to #wikimedia-perf-bots [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/362003 (owner: 10Gilles) [21:18:39] (03Merged) 10jenkins-bot: Move perf gerrit updates to #wikimedia-perf-bots [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/362003 (owner: 10Gilles) [21:18:47] (03CR) 10jenkins-bot: Move perf gerrit updates to #wikimedia-perf-bots [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/362003 (owner: 10Gilles) [21:29:57] 10Labs, 10Labs-Infrastructure, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): labnet-users group can no more access labnet1001 / labnet1002 - https://phabricator.wikimedia.org/T169018#3388688 (10hashar) @Andrew I have regained access on both labnet1001 and labnet1002. Thank you! [21:30:03] 10Labs, 10Labs-Infrastructure, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): labnet-users group can no more access labnet1001 / labnet1002 - https://phabricator.wikimedia.org/T169018#3388690 (10hashar) 05Open>03Resolved [21:32:00] 10Labs, 10Tool-Labs, 10wikimedia-irc-freenode: Freenode sometimes throttles bot connections from tools - https://phabricator.wikimedia.org/T151704#3388695 (10charitwo) [21:36:07] awight: andrewbogott did something there let's ping him^ [21:36:38] :) [21:36:52] awight: can you give me an example hostname? [21:37:15] This sounds sort of familiar... [21:37:49] sure-- andrewbogott ores-web-04.ores.eqiad.wmflabs [21:38:00] Can I grab the puppet output for you, or do you have that covered? [21:38:10] I'll just log in and poke about [21:38:22] I may have destroyed some of the evidence, fixing by hand [21:38:40] sure, what were you seeing? [21:38:44] The problem was that ldap-utils and its dependency were coming from both jessie-backports and from the wmf custom repo [21:39:01] the wmf version was older, at .41. 
I imagine it matched regular jessie [21:39:26] but somehow, we had installed the backports one so I was being asked to manually downgrade in order to get the higher-priority, pinned package. [21:40:07] "Error: Could not find user deploy-service" <- unrelated I take it? [21:41:03] unrelated [21:41:38] andrewbogott: Here's a clear case. ores-worker-07.ores.eqiad.wmflabs:/var/log/puppet.log [21:41:48] ESC[1;31mError: /Stage[main]/Ldap::Client::Openldap/Package[ldap-utils]/ensure: change from 2.4.44+dfsg-4~bpo8+1 to 2.4.41+dfsg-1+wmf1 failed: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install ldap-utils' returned 100: Reading package lists... [21:46:40] (argh ANSI art) [21:47:00] awight funny fact 4th july is independance day in the us, and i live in the town his family came from [21:48:03] puppet runs without a hitch on -07 now. I can imagine various maneuvers with pinning and/or changing repos in a strategic order and/or doing apt-get upgrade that would result in this but it's hard to know exactly what happened. [21:48:22] Does it happen on new builds as well? If not you should probably just tidy up what you have by hand. [21:49:30] paladox: I'm missing the cultural reference. "The Boss"? [21:49:45] awight? [21:50:02] andrewbogott: okay cool, I'm happy to hand-adjust for now [21:50:30] paladox: I missed who "his" family is? [21:50:48] ah wrong channel that's in #wikimedia-ai [22:04:04] legoktm: Could you deploy https://gerrit.wikimedia.org/r/362003 perhaps? (wikibugs) [22:05:31] 10Tool-Labs-tools-Xtools, 10Community-Tech-Sprint: Create an XTools logo - https://phabricator.wikimedia.org/T167345#3388972 (10Ricordisamoa) The Overpass Font Software is dual licensed under the SIL Open Font License and the GNU Lesser General Public License, LGPL 2.1 : http://www.gnu.org/licenses/old-license... 
[22:07:15] twentyafterfour: Reedy: ^ [22:10:52] Labs, Tool-Labs, wikimedia-irc-freenode: Freenode sometimes throttles bot connections from tools - https://phabricator.wikimedia.org/T151704#3389006 (charitwo) after speaking with a staffer today there is no issue adding an iline but the box needs to ensure an ident daemon is running for so each indi... [22:14:50] Krinkle: can do [22:15:43] PROBLEM - Puppet errors on tools-cron-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [22:16:45] or I guess it's already done? [22:19:50] twentyafterfour: Indeed. Must've happened a few minutes ago [22:19:58] Thanks anyway :) [22:29:19] PROBLEM - Puppet errors on tools-services-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [22:30:45] RECOVERY - Puppet errors on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:37:51] Labs, Labs-Infrastructure, Tool-Labs, cloud-services-team (Kanban): Locate an alternative source for modern Openstack Debian packages - https://phabricator.wikimedia.org/T169099#3387023 (bd808) One bandaid we could try would be to gather up packages from `/var/cache/apt/archives/` on our hosts an... [22:40:18] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Create an XTools logo - https://phabricator.wikimedia.org/T167345#3389078 (kaldari) Since we're only using the font, not distributing it or creating a derivative font, we don't technically have to worry about the licenses (as they only cover distribution and... [22:40:20] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Create an XTools logo - https://phabricator.wikimedia.org/T167345#3389079 (DannyH) > Awesome :) Now we just need to come up with a slogan (or brief description) for XTools. Any ideas? Or is "Feeding your data hunger" catchy enough? It works for me, I think... 
[22:44:20] RECOVERY - Puppet errors on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [23:01:36] Labs, Labs-Infrastructure, LDAP-Access-Requests, Operations, and 2 others: Make all ldap users have a sane shell (/bin/bash) - https://phabricator.wikimedia.org/T86668#3389101 (bd808) Open>Resolved a:bd808