[00:00:49] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Mon Oct 21 00:00:41 UTC 2013
[00:00:49] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours
[00:00:49] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Mon Oct 21 00:00:46 UTC 2013
[00:00:59] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.062 second response time
[00:01:19] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours
[00:04:09] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 316 seconds
[00:05:49] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Mon Oct 21 00:05:47 UTC 2013
[00:06:29] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours
[00:18:49] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 00:18:41 UTC 2013
[00:19:29] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours
[00:19:49] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 00:19:42 UTC 2013
[00:20:39] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours
[00:21:49] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 00:21:43 UTC 2013
[00:21:49] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 00:21:43 UTC 2013
[00:21:59] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours
[00:22:09] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours
[00:28:50] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 00:28:45 UTC 2013
[00:28:59] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[00:32:49] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 00:32:41 UTC 2013
[00:33:09] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[00:33:49] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 00:33:47 UTC 2013
[00:34:49] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[00:35:49] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 00:35:43 UTC 2013
[00:36:29] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[00:38:43] <^d> !log bringing gerrit down to troubleshoot replication
[00:39:01] Logged the message, Master
[00:39:26] i was just about to mention it was down :P
[00:41:49] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 00:41:45 UTC 2013
[00:41:59] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[00:42:49] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 00:42:41 UTC 2013
[00:43:23] should nagios be reporting that gerrit's down?
[00:43:47] <^d> There's some work-in-progress on icinga alerts.
[00:43:49] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[00:43:49] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 00:43:46 UTC 2013
[00:43:50] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[00:44:59] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 00:44:56 UTC 2013
[00:45:11] errr, yeah, sorry trademark gods
[00:45:59] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[00:46:20] <^d> Gee gerrit, why you have to be so stupid today?
[00:51:47] i wonder if watchmouse allows scheduling downtime
[00:51:56] (it did notice gerrit)
[00:54:15] <^d> Even if it did, this isn't scheduled.
[00:54:22] <^d> Nor do I have access to watchmouse.
[00:54:49] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[00:55:14] well i interpreted the !log as "intentional" even if not scheduled
[00:55:18] but whatever
[00:55:38] hey ^d
[00:55:49] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[00:55:50] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 00:55:48 UTC 2013
[00:55:55] so weird seeing aude in this TZ
[00:56:03] got to meet shawn pearce (gerrit developer) today :)
[00:56:14] at teh google?
[00:56:15] hi jeremyb :)
[00:56:17] yep
[00:56:19] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours
[00:56:22] what's doing there on a weekend?
[00:56:28] gsoc summit
[00:56:30] mentor summit
[00:56:42] aha
[00:56:59] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 00:56:53 UTC 2013
[00:57:49] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[01:00:59] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Mon Oct 21 01:00:55 UTC 2013
[01:00:59] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Mon Oct 21 01:00:55 UTC 2013
[01:01:19] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours
[01:01:49] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours
[01:05:50] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Mon Oct 21 01:05:47 UTC 2013
[01:06:29] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours
[01:18:49] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 01:18:41 UTC 2013
[01:19:29] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours
[01:19:49] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 01:19:47 UTC 2013
[01:20:39] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours
[01:21:40] <^d> *sigh* This is not how I planned to spend my sunday evening.
[01:21:49] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 01:21:47 UTC 2013
[01:21:49] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 01:21:47 UTC 2013
[01:21:59] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours
[01:22:09] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours
[01:28:49] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 01:28:44 UTC 2013
[01:28:59] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[01:32:49] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 01:32:46 UTC 2013
[01:33:09] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[01:33:49] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 01:33:46 UTC 2013
[01:34:09] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[01:34:49] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[01:35:49] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 01:35:41 UTC 2013
[01:36:29] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[01:41:49] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 01:41:42 UTC 2013
[01:41:59] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[01:42:39] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 01:42:38 UTC 2013
[01:42:49] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[01:43:49] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 01:43:43 UTC 2013
[01:43:50] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[01:44:49] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 01:44:43 UTC 2013
[01:44:59] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[01:55:59] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 01:55:58 UTC 2013
[01:56:19] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours
[01:56:49] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 01:56:44 UTC 2013
[01:57:49] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[02:00:49] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Mon Oct 21 02:00:40 UTC 2013
[02:00:49] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Mon Oct 21 02:00:45 UTC 2013
[02:01:19] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours
[02:01:49] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours
[02:05:49] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Mon Oct 21 02:05:44 UTC 2013
[02:06:29] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours
[02:10:47] !log LocalisationUpdate completed (1.22wmf21) at Mon Oct 21 02:10:47 UTC 2013
[02:11:10] Logged the message, Master
[02:15:19] PROBLEM - Puppet freshness on copper is CRITICAL: No successful Puppet run in the last 10 hours
[02:18:49] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 02:18:44 UTC 2013
[02:19:25] !log LocalisationUpdate completed (1.22wmf22) at Mon Oct 21 02:19:25 UTC 2013
[02:19:29] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours
[02:19:38] Logged the message, Master
[02:19:59] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 02:19:49 UTC 2013
[02:20:39] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours
[02:22:09] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 02:22:04 UTC 2013
[02:22:19] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 02:22:10 UTC 2013
[02:22:59] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours
[02:23:09] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours
[02:28:49] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 02:28:46 UTC 2013
[02:28:59] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[02:32:49] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 02:32:44 UTC 2013
[02:33:09] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[02:33:59] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 02:33:49 UTC 2013
[02:34:49] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[02:35:49] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 02:35:40 UTC 2013
[02:36:29] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[02:39:00] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 21 02:39:00 UTC 2013
[02:39:13] Logged the message, Master
[02:41:49] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 02:41:46 UTC 2013
[02:41:59] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[02:42:50] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 02:42:47 UTC 2013
[02:43:49] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[02:43:59] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 02:43:57 UTC 2013
[02:44:49] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 02:44:42 UTC 2013
[02:44:49] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[02:44:59] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[02:55:49] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 02:55:45 UTC 2013
[02:56:19] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours
[02:56:49] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 02:56:45 UTC 2013
[02:57:49] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[03:00:49] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Mon Oct 21 03:00:46 UTC 2013
[03:00:59] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Mon Oct 21 03:00:56 UTC 2013
[03:01:19] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours
[03:01:49] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours
[03:05:49] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Mon Oct 21 03:05:47 UTC 2013
[03:06:29] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours
[03:14:59] PROBLEM - Apache HTTP on mw1109 is CRITICAL: Connection refused
[03:18:49] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 03:18:42 UTC 2013
[03:19:29] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours
[03:19:59] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 03:19:58 UTC 2013
[03:20:39] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours
[03:21:39] !log on mw1109: stopped apache to test cgconfig
[03:21:49] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 03:21:43 UTC 2013
[03:21:55] Logged the message, Master
[03:21:59] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours
[03:21:59] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 03:21:58 UTC 2013
[03:22:09] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours
[03:28:49] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 03:28:41 UTC 2013
[03:28:59] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[03:32:50] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 03:32:44 UTC 2013
[03:33:09] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[03:34:09] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 03:34:00 UTC 2013
[03:34:49] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[03:35:49] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 03:35:40 UTC 2013
[03:36:29] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[03:41:49] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 03:41:43 UTC 2013
[03:41:59] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[03:42:59] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 03:42:48 UTC 2013
[03:43:49] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[03:43:49] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 03:43:44 UTC 2013
[03:44:49] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[03:44:49] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 03:44:44 UTC 2013
[03:44:59] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[03:55:49] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 03:55:47 UTC 2013
[03:56:19] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours
[03:56:50] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 03:56:48 UTC 2013
[03:57:49] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[04:00:49] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Mon Oct 21 04:00:42 UTC 2013
[04:00:59] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.083 second response time
[04:00:59] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Mon Oct 21 04:00:57 UTC 2013
[04:01:19] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours
[04:01:49] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours
[04:05:59] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Mon Oct 21 04:05:49 UTC 2013
[04:06:29] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours
[04:18:49] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 04:18:40 UTC 2013
[04:19:29] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours
[04:19:49] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 04:19:46 UTC 2013
[04:20:39] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours
[04:21:59] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 04:21:57 UTC 2013
[04:21:59] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 04:21:57 UTC 2013
[04:22:09] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours
[04:22:59] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours
[04:28:49] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 04:28:46 UTC 2013
[04:28:59] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[04:32:49] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 04:32:43 UTC 2013
[04:33:09] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[04:33:49] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 04:33:44 UTC 2013
[04:34:49] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[04:35:49] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 04:35:39 UTC 2013
[04:36:29] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[04:39:09] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000
[04:41:49] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 04:41:45 UTC 2013
[04:41:59] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[04:42:19] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:42:49] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 04:42:41 UTC 2013
[04:43:49] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[04:43:49] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 04:43:47 UTC 2013
[04:44:49] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 04:44:42 UTC 2013
[04:44:49] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[04:44:59] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[04:55:49] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 04:55:45 UTC 2013
[04:56:19] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours
[04:56:49] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 04:56:45 UTC 2013
[04:57:49] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[05:00:49] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Mon Oct 21 05:00:41 UTC 2013
[05:00:49] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Mon Oct 21 05:00:41 UTC 2013
[05:00:49] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours
[05:01:19] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours
[05:06:09] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Mon Oct 21 05:05:59 UTC 2013
[05:06:29] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours
[05:10:09] PROBLEM - search indices - check lucene status page on search18 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 55856 bytes in 0.110 second response time
[05:14:43] (PS2) Legoktm: Add MassMessage jobs to the high priority queue [operations/puppet] - https://gerrit.wikimedia.org/r/90280
[05:15:09] (PS3) Legoktm: Add MassMessage jobs to the high priority queue [operations/puppet] - https://gerrit.wikimedia.org/r/90280
[05:18:49] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 05:18:43 UTC 2013
[05:19:29] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours
[05:20:09] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 05:19:59 UTC 2013
[05:20:39] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours
[05:21:49] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 05:21:45 UTC 2013
[05:21:59] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours
[05:22:09] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 05:22:00 UTC 2013
[05:22:09] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours
[05:28:49] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 05:28:44 UTC 2013
[05:28:59] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[05:32:59] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 05:32:53 UTC 2013
[05:33:09] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[05:33:26] (PS1) Springle: icinga pmp-check-mysql-innodb idle_blocker_duration [operations/puppet] - https://gerrit.wikimedia.org/r/90867
[05:33:59] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 05:33:49 UTC 2013
[05:34:49] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[05:35:34] (CR) Springle: [C: 2] icinga pmp-check-mysql-innodb idle_blocker_duration [operations/puppet] - https://gerrit.wikimedia.org/r/90867 (owner: Springle)
[05:35:49] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 05:35:40 UTC 2013
[05:36:29] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[05:41:49] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 05:41:44 UTC 2013
[05:42:00] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[05:42:49] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 05:42:40 UTC 2013
[05:43:49] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[05:43:49] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 05:43:46 UTC 2013
[05:44:49] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[05:45:09] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 05:45:01 UTC 2013
[05:45:59] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[05:55:59] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 05:55:49 UTC 2013
[05:56:19] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours
[05:56:59] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 05:56:54 UTC 2013
[05:57:49] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[06:00:49] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Mon Oct 21 06:00:46 UTC 2013
[06:00:49] RECOVERY - Puppet freshness on cp1031 is OK: puppet ran at Mon Oct 21 06:00:46 UTC 2013
[06:01:19] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours
[06:01:49] PROBLEM - Puppet freshness on cp1022 is CRITICAL: No successful Puppet run in the last 10 hours
[06:02:58] (CR) Ori.livneh: [C: 2] Add MassMessage jobs to the high priority queue [operations/puppet] - https://gerrit.wikimedia.org/r/90280 (owner: Legoktm)
[06:05:59] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Mon Oct 21 06:05:58 UTC 2013
[06:06:29] PROBLEM - Puppet freshness on cp1028 is CRITICAL: No successful Puppet run in the last 10 hours
[06:07:00] RECOVERY - Disk space on copper is OK: DISK OK
[06:14:09] RECOVERY - Puppet freshness on copper is OK: puppet ran at Mon Oct 21 06:14:06 UTC 2013
[06:15:12] !log moved older swift replication logs from copper:/root to iron:/root/swift-repl/ (now gzipped), copper was full
[06:15:26] Logged the message, Master
[06:32:27] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 06:32:20 UTC 2013
[06:32:27] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 06:32:20 UTC 2013
[06:32:27] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 06:32:20 UTC 2013
[06:32:27] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 06:32:20 UTC 2013
[06:32:27] RECOVERY - Puppet freshness on cp4001 is OK: puppet ran at Mon Oct 21 06:32:21 UTC 2013
[06:32:27] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 06:32:21 UTC 2013
[06:32:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours
[06:32:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours
[06:32:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours
[06:32:57] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 06:32:47 UTC 2013
[06:33:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours
[06:33:07] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[06:33:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[06:33:57] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 06:33:47 UTC 2013
[06:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[06:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 06:35:42 UTC 2013
[06:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[06:41:47] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 06:41:45 UTC 2013
[06:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[06:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 06:42:45 UTC 2013
[06:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[06:43:57] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 06:43:51 UTC 2013
[06:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[06:45:07] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 06:45:06 UTC 2013
[06:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[06:55:47] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 06:55:43 UTC 2013
[06:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours
[06:57:07] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 06:56:58 UTC 2013
[06:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[07:04:03] (PS1) Ori.livneh: Correct path reference to bits path hit by ULS [operations/apache-config] - https://gerrit.wikimedia.org/r/90869
[07:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 07:18:43 UTC 2013
[07:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours
[07:20:07] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 07:19:59 UTC 2013
[07:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours
[07:21:47] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 07:21:46 UTC 2013
[07:21:47] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 07:21:46 UTC 2013
[07:22:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours
[07:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours
[07:28:57] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 07:28:47 UTC 2013
[07:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours
[07:32:57] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 07:32:50 UTC 2013
[07:33:07] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours
[07:33:47] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 07:33:45 UTC 2013
[07:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours
[07:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 07:35:40 UTC 2013
[07:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours
[07:41:57] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 07:41:47 UTC 2013
[07:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 07:42:37 UTC 2013
[07:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours
[07:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours
[07:43:57] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 07:43:47 UTC 2013
[07:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours
[07:44:47] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 07:44:43 UTC 2013
[07:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours
[07:54:35] (PS2) ArielGlenn: remove srv1-234 main and mgmt entries, except for srv193 [operations/dns] - https://gerrit.wikimedia.org/r/90516
[07:55:57] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 07:55:55 UTC 2013
[07:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours
[07:56:57] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 07:56:56 UTC 2013
[07:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours
[07:57:27] PROBLEM - MySQL Slave Running on db1026 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Table wikidatawiki._wb_terms_new doesnt exist on query. De
[07:59:17] PROBLEM - MySQL Slave Running on db45 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Deadlock found when trying to get lock: try restarting transac
[07:59:27] RECOVERY - MySQL Slave Running on db1026 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error:
[07:59:33] I am going to upgrade Jenkins / restart it for a scheduled maintenance. Expected downtime: 1 hour starting now.
[07:59:40] !log upgrading Jenkins for scheduled maintenance [07:59:52] Logged the message, Master [08:00:36] !log stopping Zuul / Jenkins [08:00:48] Logged the message, Master [08:01:23] (03CR) 10ArielGlenn: [C: 032] remove srv1-234 main and mgmt entries, except for srv193 [operations/dns] - 10https://gerrit.wikimedia.org/r/90516 (owner: 10ArielGlenn) [08:02:07] PROBLEM - MySQL Replication Heartbeat on db45 is CRITICAL: CRIT replication delay 305 seconds [08:02:57] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/local/bin/zuul-server [08:13:04] that Zuul issue is me [08:13:15] I don't think I can flag a service has been under maintenance [08:13:20] and IIRC it does not send page [08:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 08:18:43 UTC 2013 [08:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:07] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 08:19:59 UTC 2013 [08:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [08:21:47] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 08:21:39 UTC 2013 [08:22:07] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 08:21:59 UTC 2013 [08:22:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [08:28:57] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 08:28:48 UTC 2013 [08:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [08:32:57] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 08:32:49 UTC 2013 [08:33:07] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours 
[08:34:07] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 08:33:59 UTC 2013 [08:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [08:34:55] (03PS1) 10ArielGlenn: get rid of temp-es* hosts, entries from 2009 long since unused [operations/dns] - 10https://gerrit.wikimedia.org/r/90871 [08:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 08:35:40 UTC 2013 [08:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [08:41:05] (03CR) 10Akosiaris: "From what I see it should be hitting all hosts(when of course they include that class)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/87332 (owner: 10Matanya) [08:41:57] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 08:41:47 UTC 2013 [08:41:57] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/local/bin/zuul-server [08:41:58] !log restarted Zuul [08:42:03] damn icinga is fast [08:42:10] Logged the message, Master [08:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [08:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 08:42:43 UTC 2013 [08:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [08:43:18] !log stopping Zuul again. 
Need to upgrade Jenkins plugins [08:43:32] Logged the message, Master [08:43:57] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 08:43:48 UTC 2013 [08:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [08:44:47] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 08:44:43 UTC 2013 [08:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [08:45:57] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/local/bin/zuul-server [08:46:59] !log jenkins: upgrading plugins [08:47:10] Logged the message, Master [08:49:03] (03PS1) 10Akosiaris: Fix drac module broken in 22d7837 [operations/puppet] - 10https://gerrit.wikimedia.org/r/90874 [08:49:20] (03CR) 10Akosiaris: [C: 032] Fix drac module broken in 22d7837 [operations/puppet] - 10https://gerrit.wikimedia.org/r/90874 (owner: 10Akosiaris) [08:49:59] (03CR) 10Akosiaris: [V: 032] Fix drac module broken in 22d7837 [operations/puppet] - 10https://gerrit.wikimedia.org/r/90874 (owner: 10Akosiaris) [08:50:29] akosiaris: hi, jenkins is being upgraded so no linting for you :-] [08:51:10] !log forced verified +2 on gerrit 90874 since jenkins is being upgraded [08:51:22] hashar: yeah i remembered. Thanks :-) [08:51:23] Logged the message, Master [08:53:49] akosiaris: can you please explain your fix? I don't fully understand it [08:54:38] aaaaaaa before that. I just noticed this https://gerrit.wikimedia.org/r/#/c/90098/8/modules/ssh/templates/sshd_config.erb [08:54:59] yes, my bad [08:55:10] fixed by andrewbogott [08:55:21] yes... 
what i am trying to understand is why [08:55:42] I cherry picked paravoid's patch, and merged it into mine by accident [08:55:57] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 08:55:47 UTC 2013 [08:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [08:56:47] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 08:56:42 UTC 2013 [08:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [08:58:23] akosiaris: i'm referring to this patch: https://gerrit.wikimedia.org/r/#/c/15874/1/modules/ssh/templates/sshd_config.erb [08:59:48] !log rerestarting Jenkins. [09:00:02] Logged the message, Master [09:02:00] hmmm... ok. What I mostly disliked was that it got merged... [09:02:09] anyway [09:02:33] As far as the other fix goes, puppet needs to be able to reference files [09:02:46] !log Jenkins restarted / upgraded [09:02:47] and modules need to have their files in a files directory [09:03:00] Logged the message, Master [09:03:10] !log restarting Zuul [09:03:22] but confusingly enough the sources need to be of the form "puppet:///modules/<module>/<file>" [09:03:23] Logged the message, Master [09:03:25] or else it won't work [09:03:57] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/local/bin/zuul-server [09:04:08] Jenkins should be back up now :-] [09:04:23] ahm, it was https://gerrit.wikimedia.org/r/#/c/87332/5/modules/drac/manifests/init.pp akosiaris, you requested the removal ... [09:04:58] Not really. Read my comment more carefully please [09:05:42] oh, you want the files in the files directory, without being called from there. I think i understand now [09:06:25] not me. Puppet does [09:06:34] yeah, well :) [09:06:48] i need to redo some work on my download module then.
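[Editor's note: the Puppet file-serving convention akosiaris describes above — module files live under a `files/` directory, but the source URI omits the literal `files/` component — can be sketched with a minimal, hypothetical module. `example` and `motd.txt` are made-up names, not anything from operations/puppet:]

```puppet
# Hypothetical module layout:
#   modules/example/files/motd.txt
#   modules/example/manifests/init.pp
class example {
  file { '/etc/motd':
    ensure => file,
    # Puppet's fileserver maps puppet:///modules/<module>/<path>
    # to modules/<module>/files/<path> on the master -- note that
    # "files/" itself never appears in the URI.
    source => 'puppet:///modules/example/motd.txt',
  }
}
```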
[09:11:00] (03PS8) 10Matanya: download: convert into a module and clean up [operations/puppet] - 10https://gerrit.wikimedia.org/r/90760 [09:12:01] (03PS2) 10Reedy: Moved all apple-touch-icon.png images to bits. Set $wgAppleTouchIcon where appropriate. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90762 [09:12:09] (03CR) 10Reedy: [C: 032] Moved all apple-touch-icon.png images to bits. Set $wgAppleTouchIcon where appropriate. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90762 (owner: 10Reedy) [09:12:50] c'mon jenkins [09:13:34] (03Abandoned) 10Hashar: misc varnish conf for doc.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/82653 (owner: 10Hashar) [09:13:47] Reedy: he just woke up, give a few seconds :) [09:14:11] He's a slacker! [09:14:30] (03CR) 10jenkins-bot: [V: 04-1] download: convert into a module and clean up [operations/puppet] - 10https://gerrit.wikimedia.org/r/90760 (owner: 10Matanya) [09:14:39] here i go [09:14:45] (03Merged) 10jenkins-bot: Moved all apple-touch-icon.png images to bits. Set $wgAppleTouchIcon where appropriate. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90762 (owner: 10Reedy) [09:15:26] Could someone from ops merge an apache config change for me please? It's a partial reversion (re-addition of 3 lines) of something Daniel merged and pushed for me on friday. 
https://gerrit.wikimedia.org/r/#/c/90764 [09:16:07] Reedy: yes I will [09:16:08] !log reedy synchronized docroot/bits/apple-touch/ [09:16:13] thanks [09:16:20] Logged the message, Master [09:16:46] !log reedy synchronized wmf-config/InitialiseSettings.php [09:16:59] Logged the message, Master [09:17:14] Another step toward removing most of our docroot folders :D [09:17:21] (03CR) 10Akosiaris: [C: 032] www.wikisource.org is not a portal, but a redirect to wikisource.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/90764 (owner: 10Reedy) [09:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 09:18:41 UTC 2013 [09:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [09:19:57] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 09:19:47 UTC 2013 [09:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [09:21:57] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 09:21:48 UTC 2013 [09:22:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [09:22:27] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 09:22:18 UTC 2013 [09:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [09:32:47] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 09:32:44 UTC 2013 [09:33:07] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [09:33:47] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 09:33:45 UTC 2013 [09:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [09:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 09:35:41 UTC 2013 [09:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 
hours [09:40:35] (03PS1) 10Reedy: Compress apple touch pngs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90880 [09:41:05] (03CR) 10Reedy: [C: 032] Compress apple touch pngs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90880 (owner: 10Reedy) [09:41:14] (03Merged) 10jenkins-bot: Compress apple touch pngs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90880 (owner: 10Reedy) [09:41:57] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 09:41:48 UTC 2013 [09:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [09:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 09:42:44 UTC 2013 [09:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [09:43:57] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 09:43:54 UTC 2013 [09:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [09:44:47] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 09:44:44 UTC 2013 [09:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [09:55:57] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 09:55:47 UTC 2013 [09:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [09:56:57] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 09:56:52 UTC 2013 [09:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [10:08:08] ping Reedy [10:08:18] * Reedy hides [10:08:30] https://gerrit.wikimedia.org/r/#/c/90762/2/wmf-config/InitialiseSettings.php [10:08:45] You know I left default to false, and apple-touch.png in docroot for a reason, right? :-P [10:09:02] No? [10:09:18] There was a bug for it. Somewhere. 
[10:09:24] I left the default as false because of a lack of an image to use as using wikipedia will bitch [10:09:55] https://bugzilla.wikimedia.org/show_bug.cgi?id=55917 [10:10:06] I did query on there if i should make a favicon/robots style rewrite to go with it [10:10:41] https://bugzilla.wikimedia.org/show_bug.cgi?id=19392#c15 [10:11:13] I'm not keeping the docroots around for silly apple images [10:11:34] Noting default is still false [10:11:34] Sure, I noticed you've been cleaning those up recently [10:12:21] The images are still actually in the docroots for now [10:12:32] I didn't sync them, and sync-docroot doesn't propogate deletions [10:12:42] I also note I didn't approve and merge https://gerrit.wikimedia.org/r/#/c/60777/ [10:13:00] So that's a yes, I need to make a favico style php redirect script [10:13:35] Maybe [10:13:58] * twkozlowski curses the mere idea of having separete Apple Touch icons [10:14:06] separate* [10:14:49] * Reedy lets twkozlowski propose we use wikimedia on all projects and he can deal with the backlash [10:15:38] Oh [10:15:44] You mean seperate touch icons from favico? [10:16:38] Yeah; the idea of having to create per-device icons is just crazy. [10:17:20] mumble mumble apple mumble mumble [10:17:38] Reedy: I odn [10:17:45] Oooo [10:18:14] You mean that older Apple devices don't understand link rel= stuff? 
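[Editor's note: for context on the two mechanisms being contrasted here — `$wgAppleTouchIcon` emits a `<link>` element, while devices that ignore it fall back to blindly requesting `/apple-touch-icon.png` from the site root. A hypothetical sketch; the URL is illustrative, not the actual configuration:]

```html
<!-- Emitted in the page head when $wgAppleTouchIcon is set
     (href is a made-up example): -->
<link rel="apple-touch-icon" href="//bits.wikimedia.org/apple-touch/wikipedia.png">

<!-- Clients that ignore the link element request /apple-touch-icon.png
     from the docroot instead, which is why a favicon.php-style
     rewrite/redirect script is being discussed. -->
```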
[10:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 10:18:39 UTC 2013 [10:19:01] https://developer.apple.com/library/ios/documentation/AppleApplications/Reference/SafariWebContent/ConfiguringWebApplications/ConfiguringWebApplications.html doesn't mention this [10:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [10:19:44] Reedy: it should be OK, I actually tested this with legoktm, and serving one size seems to work [10:19:57] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 10:19:55 UTC 2013 [10:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [10:21:57] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 10:21:55 UTC 2013 [10:22:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [10:22:17] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 10:22:16 UTC 2013 [10:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [10:25:47] (03PS1) 10Reedy: Add "touch.php" for $wgAppleTouchIcon... 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90886 [10:27:45] I'll come back to this later ;) [10:27:55] \o/ [10:28:47] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 10:28:43 UTC 2013 [10:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [10:32:47] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 10:32:42 UTC 2013 [10:33:07] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [10:33:57] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 10:33:47 UTC 2013 [10:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [10:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 10:35:43 UTC 2013 [10:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [10:41:47] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 10:41:45 UTC 2013 [10:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [10:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 10:42:41 UTC 2013 [10:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [10:43:36] !log Gerrit replication is broken, side effects: git.wm.o shows outdated trees and Jenkins might be missing some commits [10:43:53] Logged the message, Master [10:43:57] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 10:43:56 UTC 2013 [10:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [10:44:47] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 10:44:46 UTC 2013 [10:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [10:55:47] RECOVERY - Puppet freshness on cp1033 is 
OK: puppet ran at Mon Oct 21 10:55:44 UTC 2013 [10:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [10:56:47] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 10:56:45 UTC 2013 [10:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [11:10:41] (03CR) 10Akosiaris: [C: 032] "LGTM, although I have no idea what each of those fonts is. A quick check showed that all packages exist so we are good I think" [operations/puppet] - 10https://gerrit.wikimedia.org/r/88441 (owner: 10Reedy) [11:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 11:18:41 UTC 2013 [11:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [11:19:47] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 11:19:46 UTC 2013 [11:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [11:22:07] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 11:22:03 UTC 2013 [11:22:17] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 11:22:08 UTC 2013 [11:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [11:23:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [11:28:47] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 11:28:45 UTC 2013 [11:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [11:31:22] !log Gerrit replication have been broken since Oct 19th roughly 20:50 UTC. 
{{bug|55948}} [11:31:27] addshore: ^^^ [11:31:37] Logged the message, Master [11:31:38] :< [11:32:03] That blocks us rather a lot :P [11:32:22] 20:54 ^d: gerrit: installed 2.7-rc2-507-g1e7090b, service back up [11:32:29] on friday [11:32:47] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 11:32:42 UTC 2013 [11:33:08] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [11:33:47] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 11:33:42 UTC 2013 [11:33:51] hmm [11:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [11:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 11:35:43 UTC 2013 [11:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [11:38:28] https://bugzilla.wikimedia.org/show_bug.cgi?id=55948#c3 [11:38:42] so when ytterbium connects on gallium it gets: [11:38:45] Received disconnect from 208.80.154.80: 3: com.jcraft.jsch.JSchException: reject HostKey: gallium.wikimedia.org [preauth] [11:39:57] I guess Gerrit no more recognize gallium host key [11:42:07] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 11:42:01 UTC 2013 [11:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [11:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 11:42:42 UTC 2013 [11:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [11:44:07] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 11:43:57 UTC 2013 [11:44:08] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [11:44:47] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 11:44:43 UTC 2013 [11:45:16] I need to grab a snack [11:45:29] potentially a root could look at whatever user is 
doing the replication on ytterbium [11:45:43] and verify it has gallium.wikimedia.org in its known_hosts file [11:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [11:46:07] maybe the replication has been restarted as root who does not have the host key [11:46:07] brb [11:47:55] !log Shutdown csw2-esams:xe-2/1/1 (1 DF leg) [11:48:08] Logged the message, Master [11:55:47] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 11:55:44 UTC 2013 [11:55:58] (03CR) 10Reza: [C: 031] "it is ok" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90759 (owner: 10Ebrahim) [11:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [11:56:57] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 11:56:56 UTC 2013 [11:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [11:59:36] back [12:07:28] (03PS1) 10Odder: (bug 54828) Configure FlaggedRevs for ptwiki (take 3) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90893 [12:07:45] (03PS2) 10Physikerwelt: Mathoid service [operations/puppet] - 10https://gerrit.wikimedia.org/r/90733 [12:08:16] (03CR) 10Odder: "See https://gerrit.wikimedia.org/r/#/c/90893/ for a follow-up." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/89001 (owner: 10Odder) [12:09:24] (03CR) 10Odder: "This is a follow-up to https://gerrit.wikimedia.org/r/#/c/89001/ and https://gerrit.wikimedia.org/r/#/c/89482/" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90893 (owner: 10Odder) [12:09:33] apergos paravoid akosiaris mark: any of you could assist in fixing some ssh known_host issue on ytterbium ? 
A gerrit replication process can't ssh to some boxes because it is rejecting the destination host key :( [12:09:51] traces: https://bugzilla.wikimedia.org/show_bug.cgi?id=55948#c3 [12:10:33] Reedy: how's https://bugzilla.wikimedia.org/show_bug.cgi?id=54680 going on? [12:13:53] (03CR) 10Helder.wiki: [C: 031] (bug 54828) Configure FlaggedRevs for ptwiki (take 3) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90893 (owner: 10Odder) [12:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 12:18:39 UTC 2013 [12:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [12:20:07] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 12:19:59 UTC 2013 [12:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [12:21:47] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 12:21:44 UTC 2013 [12:21:47] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 12:21:44 UTC 2013 [12:22:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [12:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [12:29:07] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 12:29:03 UTC 2013 [12:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [12:32:47] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 12:32:45 UTC 2013 [12:33:07] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [12:33:47] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 12:33:46 UTC 2013 [12:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [12:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 12:35:41 UTC 2013 
[12:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [12:41:47] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 12:41:43 UTC 2013 [12:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 12:42:39 UTC 2013 [12:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [12:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [12:43:47] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 12:43:44 UTC 2013 [12:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [12:45:07] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 12:45:04 UTC 2013 [12:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [12:51:40] hashar: could it be something similar ? http://stackoverflow.com/questions/13079002/knownhosts-for-ant-scp-and-sshexec-tasks [12:51:58] akosiaris: looking [12:52:40] the thing is that the Gerrit change is pretty small and unrelated :/ [12:52:59] I am suspecting gerrit got restarted as root instead of gerrit2 or gerrit user [12:53:37] akosiaris: can you potentially look at the known_hosts files for root / gerrit (or gerrit2) users and see whether it got gallium / lanthanum / antinomy ? [12:53:58] might also want to verify which user is running the java Gerrit process [12:54:28] on which machines ? [12:54:34] gallium + the other 2 ? 
[12:55:03] ytterbium [12:55:07] that is the machine running Gerrit [12:55:32] the Java process does ssh connection to various hosts (lanthanum / gallium/ antinomy) they are the Gerrit replication receiver [12:55:47] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 12:55:44 UTC 2013 [12:55:48] seems the issue is on Gerrit master side, it is rejecting the destination hosts keys [12:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [12:56:47] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 12:56:44 UTC 2013 [12:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [12:57:28] known_hosts on ytterbium was updated today [12:57:32] well [12:58:42] but the gallium key is correct [12:58:51] and for /root/known_hosts ? [12:59:23] not touched since Aug 27 [12:59:24] if Gerrit runs at root, it would use that file [12:59:24] yes but it does not [12:59:24] which probably doesn't contains lanthanum/gallium/antinomy [12:59:24] it runs as gerrit2 [12:59:24] ohhh [12:59:24] crazy :( [12:59:26] you could try symlinking known_hosts2 :D [12:59:40] such thing does not exist (thank god) [13:02:02] ehe [13:06:34] akosiaris: so I have no clue :( [13:07:20] * hashar digs in https://gerrit.wikimedia.org/r/plugins/replication/Documentation/config.html [13:07:43] If replicating over SSH (recommended), ensure the host key of the remote system(s) is already in the Gerrit user’s ~/.ssh/known_hosts file. The easiest way to add the host key is to connect once by hand with the command line: [13:07:44] sudo su -c 'ssh mirror1.us.some.org echo' gerrit2 [13:07:44] yeah hmm [13:09:40] these keys are not wrong .... [13:11:51] so Gerrit is crazy :( [13:13:13] have you tried turning it off and back on *runs* [13:14:30] huh... 
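[Editor's note: since the debugging above turns on whether ytterbium's known_hosts entries are in a form the Java SSH client will accept, here is a minimal sketch of how OpenSSH's hashed (`HashKnownHosts`) entry format is derived. Whether any given JSch version accepts hashed entries is version-dependent, so treat this only as an aid for recognizing the format; the hostname and key material are placeholders:]

```python
import base64
import hashlib
import hmac
import os

def hashed_known_hosts_entry(hostname, key_type, pubkey_b64, salt=None):
    """Build an OpenSSH HashKnownHosts-style line:

        |1|base64(salt)|base64(HMAC-SHA1(salt, hostname))| <key_type> <key>

    A strict known_hosts parser has to handle both this hashed form and
    plain-hostname entries; a client that supports only one of them will
    reject otherwise-correct host keys.
    """
    salt = salt or os.urandom(20)  # OpenSSH uses a 20-byte random salt
    digest = hmac.new(salt, hostname.encode(), hashlib.sha1).digest()
    hashed_host = "|1|%s|%s|" % (
        base64.b64encode(salt).decode(),
        base64.b64encode(digest).decode(),
    )
    return "%s %s %s" % (hashed_host, key_type, pubkey_b64)

# Hypothetical key material, for illustration only.
entry = hashed_known_hosts_entry("gallium.wikimedia.org", "ssh-rsa", "AAAAB3...")
```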
[13:17:34] akosiaris: some threads says the java ssh implementation (Jsch) expect a spefici format for known_hosts [13:17:35] https://groups.google.com/forum/#!topic/repo-discuss/9PTfVG8vdAU [13:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 13:18:39 UTC 2013 [13:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [13:20:07] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 13:19:59 UTC 2013 [13:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [13:22:07] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 13:22:00 UTC 2013 [13:22:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [13:22:17] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 13:22:10 UTC 2013 [13:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [13:26:27] akosiaris: do you have access to etherpad's logs? [13:26:38] yes [13:27:01] but i am afraid you are in for a disappointment [13:27:12] why [13:27:14] if you are expecting any kind of help from them... [13:27:31] but let's be optimistic here [13:27:35] how can i help ? [13:27:53] akosiaris: I'm wondering what's going on when the server returns 503 error [13:28:30] I just saw https://github.com/ether/etherpad-lite/issues/1941 ; it seems that if a single pad is too weird the whole etherpad can go down, so maybe we have specific pads that make the site restart? [13:28:43] that is a proxy error returned from apache. 
[13:28:57] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 13:28:48 UTC 2013 [13:29:00] it will be returned if the backend process is not responding for some reason [13:29:09] for example it crashed and is restarting [13:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [13:29:24] yep [13:29:48] that's why that bug lit a lamp in my head (is this valid English? :) ) [13:30:01] can you check if it's being restarted? [13:30:15] valid english yes... i don't think it is an expression actually used though [13:30:17] lemme check [13:31:14] any specific timeframe ? [13:32:57] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 13:32:49 UTC 2013 [13:32:58] so [13:32:58] akosiaris: a couple minutes before I asked? :) [13:33:07] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [13:33:09] or grep for "RESTART!" [13:33:21] restart at 13:15.46 UTC [13:33:27] or "graceful shutdown" , "SyntaxError: Unexpected end of input" [13:33:46] yes, that could be the 503 I saw [13:33:47] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 13:33:44 UTC 2013 [13:33:52] and at 12:30:55 UTC and at 11:45:20 and at 11:44:25 [13:33:58] what were you doing then ? 
[13:34:07] cause i got 0 restarts the previous days [13:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [13:34:27] saw i am gonna assume that is you :-) [13:34:30] https://etherpad.wikimedia.org/p/l10n-team-2013-10 https://etherpad.wikimedia.org/p/i18n-team-04 [13:34:58] epl has a tendency to corrup pads https://github.com/ether/etherpad-lite/issues/1885#issuecomment-26715140 [13:35:38] [2013-10-21 13:15:36.165] [ERROR] console - Error: exports: mismatched apply: 28808 / 28806 [13:35:38] at Object.error (/usr/share/etherpad-lite/src/static/js/Changeset.js:39:11) [13:35:38] at Object.assert (/usr/share/etherpad-lite/src/static/js/Changeset.js:53:13) [13:35:38] at Object.exports.applyToText (/usr/share/etherpad-lite/src/static/js/Changeset.js:907:11) [13:35:38] at Object.exports.applyToAText (/usr/share/etherpad-lite/src/static/js/Changeset.js:1598:19) [13:35:38] at Pad.getInternalRevisionAText (/usr/share/etherpad-lite/src/node/db/Pad.js:204:27) [13:35:38] at async.series.results (/usr/share/etherpad-lite/src/node_modules/async/lib/async.js:486:21) [13:35:39] at _asyncMap (/usr/share/etherpad-lite/src/node_modules/async/lib/async.js:185:13) [13:35:39] at async.forEachSeries.iterate (/usr/share/etherpad-lite/src/node_modules/async/lib/async.js:108:13) [13:35:40] at async.forEachSeries.iterate (/usr/share/etherpad-lite/src/node_modules/async/lib/async.js:119:25) [13:35:40] at _asyncMap (/usr/share/etherpad-lite/src/node_modules/async/lib/async.js:187:17) [13:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 13:35:40 UTC 2013 [13:35:47] nice... [13:35:53] O_o [13:36:27] [2013-10-21 11:45:09.793] [ERROR] console - [RangeError: Maximum call stack size exceeded] [13:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [13:36:30] man... [13:36:36] at least 2 different bugs [13:36:46] (not that i am surprised...) 
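[Editor's note: the `mismatched apply: 28808 / 28806` error above is Etherpad's Changeset code refusing to apply a changeset to text whose length differs from the length the changeset was built against, rather than risking silent pad corruption. A toy Python analogue of that invariant — the op names are simplified, not Etherpad's actual changeset encoding:]

```python
def apply_changeset(old_len, ops, text):
    """Apply a toy changeset (declared input length + ops) to text.

    Mirrors, very loosely, why Changeset.applyToText() asserts
    "mismatched apply: <actual> / <expected>": the ops consume a fixed
    number of input characters, so applying them to text of any other
    length cannot succeed.
    """
    if len(text) != old_len:
        raise ValueError("mismatched apply: %d / %d" % (len(text), old_len))
    out, pos = [], 0
    for op, arg in ops:
        if op == "keep":      # copy arg chars from the input
            out.append(text[pos:pos + arg])
            pos += arg
        elif op == "skip":    # delete arg chars from the input
            pos += arg
        elif op == "insert":  # add new text
            out.append(arg)
    return "".join(out)
```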
[13:36:56] akosiaris: could you please file upstream? [13:37:18] this could explain all the 503 errors randomly appearing now and then, perhaps [13:37:41] I can. maybe they will get fixed [13:38:57] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100% [13:39:27] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 28.16 ms [13:40:58] https://github.com/ether/etherpad-lite/issues/1953 [13:41:47] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 13:41:43 UTC 2013 [13:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 13:42:39 UTC 2013 [13:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [13:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [13:43:26] akosiaris: and I filed https://github.com/ether/etherpad-lite/issues/1954 (convert.js doesn't migrate saved revisions from etherpad to etherpad lite) [13:43:47] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 13:43:45 UTC 2013 [13:43:50] test driven development is awesome :-] [13:43:55] finally got all my tests to pass \O/ [13:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [13:44:19] Nemo_bis: huh... convert.js is a big PITA [13:44:47] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 13:44:45 UTC 2013 [13:44:59] no wonder [13:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [13:52:27] Nemo_bis: closed in my face... [13:52:30] latest develop... [13:52:56] As if we are going to be running the latest develop... [13:54:19] well... there is one minor version we can upgrade to... 
But it won't help since it is already 9 days old [13:54:52] (03PS1) 10Cmjohnson: Updating cerium and praseodymium [operations/puppet] - 10https://gerrit.wikimedia.org/r/90905 [13:55:13] heh, that was fast [13:56:07] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 13:55:58 UTC 2013 [13:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [13:56:47] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 13:56:44 UTC 2013 [13:57:07] (03CR) 10Cmjohnson: [C: 032] Updating cerium and praseodymium [operations/puppet] - 10https://gerrit.wikimedia.org/r/90905 (owner: 10Cmjohnson) [13:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [14:05:18] (03CR) 10Lydia Pintscher: "Daniel, Katie: Can we please talk this through before merging? I have some reservations if this is doing what I think it is. Would like to" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/65443 (owner: 10Dzahn) [14:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 14:18:42 UTC 2013 [14:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [14:19:30] argh [14:19:57] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 14:19:47 UTC 2013 [14:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [14:21:47] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 14:21:43 UTC 2013 [14:21:57] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 14:21:48 UTC 2013 [14:22:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [14:22:43] hashar: issue with gerrit fixed [14:22:43] akosiaris: okay to merge your drac.pp changes? 
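The Puppet freshness alerts that flap throughout this log (a RECOVERY immediately followed by a CRITICAL for the same host) boil down to a simple age test: a host is CRITICAL when its last successful Puppet run is older than a threshold, 10 hours in these alerts. A minimal sketch of that rule, with illustrative names rather than Icinga's actual configuration:

```python
from datetime import datetime, timedelta

# 10 hours, matching the "No successful Puppet run in the last 10 hours"
# message in the alerts above.
FRESHNESS_WINDOW = timedelta(hours=10)

def freshness_state(last_run, now):
    """OK if the last successful run is within the freshness window."""
    return "OK" if now - last_run <= FRESHNESS_WINDOW else "CRITICAL"

now = datetime(2013, 10, 21, 14, 0, 0)
print(freshness_state(datetime(2013, 10, 21, 13, 55), now))  # OK
print(freshness_state(datetime(2013, 10, 21, 3, 0), now))    # CRITICAL
```

The rapid OK/CRITICAL flapping seen here is consistent with two data sources disagreeing about `last_run`: a fresh passive result triggers RECOVERY, while a stale stored timestamp immediately re-triggers CRITICAL.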
[14:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [14:23:07] cmjohnson1: i forgot them on sockpuppet ? [14:23:11] shit! [14:23:13] yep [14:23:14] sorry... yes please [14:23:18] cool [14:23:19] thx [14:23:27] no, thank you :-) [14:23:35] akosiaris: how did you manage to fix it ? :-] [14:23:47] you can probably dump / close https://bugzilla.wikimedia.org/show_bug.cgi?id=55948#c5 [14:24:08] well... it turns out it is a case of things going a little bit awry [14:24:14] i have no idea why it used to work [14:24:21] but the problem was that for some reason [14:24:29] gerrit2 user's homedir is /home/gerrit2 [14:24:48] which did not contain the .ssh directory that normally is contained in /var/lib/gerrit [14:24:57] addshore: Gerrit replication is back up thanks to akosiaris, so your extensions should be installed using the latest master again [14:25:06] :D [14:25:12] akosiaris: bahhh :-( [14:25:12] * addshore thanks akosiaris and hashar [14:25:19] I had thought of that earlier being really pissed about it [14:25:24] akosiaris: can you copy paste to bug 55948 and close it please ? [14:25:52] cause gerrit2 user has no job having a home there but it turns out you need to restart gerrit before it will see new files in .ssh [14:27:34] it caches them somehow and i don't want to know how.... crappy java code [14:27:34] hashar: yes i will close the bug [14:27:34] but that thing.... I will fix it: gerrit2's homedir is going to be /var/lib/gerrit... no more /home [14:28:47] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 14:28:46 UTC 2013 [14:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [14:29:22] akosiaris: that would be nice. 
Thank you veryyyy much :-] [14:30:03] (03CR) 10Aude: [C: 04-1] "per Lydia, we need to discuss and re-consider the approach" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/65443 (owner: 10Dzahn) [14:32:17] (03PS1) 10Ottomata: Missing | character in pagecounts hourly import cron job [operations/puppet] - 10https://gerrit.wikimedia.org/r/90907 [14:32:26] (03CR) 10Ottomata: [C: 032 V: 032] Missing | character in pagecounts hourly import cron job [operations/puppet] - 10https://gerrit.wikimedia.org/r/90907 (owner: 10Ottomata) [14:32:57] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 14:32:47 UTC 2013 [14:33:07] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [14:34:07] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 14:34:03 UTC 2013 [14:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [14:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 14:35:39 UTC 2013 [14:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [14:39:46] yurik: hey [14:39:54] yurik: any progress with https://bugzilla.wikimedia.org/show_bug.cgi?id=54822 ? [14:41:47] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 14:41:41 UTC 2013 [14:42:30] <^d> akosiaris: Thanks for spotting the homedir issue with gerrit2, I overlooked that. [14:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [14:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 14:42:42 UTC 2013 [14:42:58] np. I am just curious how it used to work... what changed ? [14:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [14:43:58] <^d> akosiaris: The package sucks, so I was working around it and recreated the gerrit2 user. 
I did so incorrectly and didn't specify the right homedir. [14:44:07] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 14:43:57 UTC 2013 [14:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [14:44:47] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 14:44:42 UTC 2013 [14:44:48] ^d: ok thanks for the explanation [14:45:19] (03CR) 10Akosiaris: [C: 032] toollabs: Sort package names in dev_environ [operations/puppet] - 10https://gerrit.wikimedia.org/r/90765 (owner: 10Yuvipanda) [14:45:21] <^d> I'm so glad we're getting rid of this package :) [14:45:29] <^d> Well, eventually. [14:45:40] akosiaris: there's two more where that came from, just in case you hadn't noticed :D [14:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [14:45:48] yeah i noticed [14:45:53] :D [14:46:00] but ... why did jenkins give it a +1 and not a +2 ? [14:46:09] .. no idea? [14:46:09] hashar: ^ ? [14:50:36] <^d> akosiaris: I kicked off replication jobs for all repos to force it to catch up. About halfway done now. [14:50:52] aaah cool. Thanks :-) [14:55:57] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 14:55:47 UTC 2013 [14:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [14:56:47] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 14:56:42 UTC 2013 [14:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [15:02:57] andrewbogott: I would like to pm you when possible [15:03:37] RECOVERY - Disk space on ms-be1006 is OK: DISK OK [15:04:28] matanya: Sure. I'm still catching up, haven't read your patches yet this morning. 
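The gerrit2 diagnosis above — the daemon reads `~/.ssh` of its service account, so when the account was recreated with the wrong homedir (`/home/gerrit2` instead of `/var/lib/gerrit`) the keys silently went missing — suggests a quick sanity check: resolve the account's homedir from the passwd database and verify `.ssh` exists there. A sketch of that check (the function name is illustrative):

```python
import os
import pwd

def ssh_dir_status(username):
    """Return (homedir, whether homedir/.ssh exists) for a local account."""
    home = pwd.getpwnam(username).pw_dir   # what the daemon will resolve
    ssh_dir = os.path.join(home, ".ssh")
    return home, os.path.isdir(ssh_dir)

# Using root as a stand-in for a service account like gerrit2:
home, has_ssh = ssh_dir_status("root")
print(f"homedir={home} has .ssh: {has_ssh}")
```

Note the caveat from the discussion: even with the homedir corrected, Gerrit caches the key material, so a restart is needed before it sees new files in `.ssh`.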
[15:05:12] thanks andrewbogott, let me know when you are available [15:05:48] ottomata: jq 1.3 in Debian unstable btw [15:06:39] twkozlowski: I have stopped the frwiki collation stuff on request of springle-afk as it was upsetting the master [15:07:07] I know he was doing some further testing to see if he could find a route going forward (like doing some reads on the slaves rather than master) [15:07:46] nice! [15:07:49] thanks paravoid [15:07:52] ottomata: when I asked for a rationale/alternatives for the udp2log thing, I wasn't asking for such a detailed wiki page btw :) [15:07:55] not that I mind [15:08:25] hehe, I know, but there were more and more options as we all discussed this [15:08:43] also I didn't actually know all of the answers to your questions, so it was good for me to poke diederik and actually write them down [15:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 15:18:39 UTC 2013 [15:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [15:19:57] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 15:19:54 UTC 2013 [15:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [15:21:47] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 15:21:45 UTC 2013 [15:21:47] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 15:21:45 UTC 2013 [15:22:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [15:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [15:28:47] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 15:28:44 UTC 2013 [15:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [15:32:47] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 15:32:45 UTC 2013 [15:33:07] PROBLEM - Puppet 
freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [15:33:47] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 15:33:46 UTC 2013 [15:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [15:35:04] MaxSem: heh, thanks a lot :) [15:35:24] ? [15:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 15:35:37 UTC 2013 [15:36:06] xff [15:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [15:36:47] ah:) [15:37:10] paravoid: thanks a log for your inspiration with the ssh module [15:37:14] *lot [15:37:53] that was more than a year ago [15:38:02] but you're welcome :) [15:41:47] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 15:41:41 UTC 2013 [15:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [15:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 15:42:42 UTC 2013 [15:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [15:43:47] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 15:43:42 UTC 2013 [15:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [15:44:57] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 15:44:47 UTC 2013 [15:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [15:53:13] matanya, I'm having flaky internet problems so won't be able to use gerrit for a while. Did you have a specific/immediate question? 
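The idea floated above for the frwiki collation job — do some reads on the slaves rather than the master — usually comes with a lag guard: only route a read to a replica whose replication delay is under a threshold, otherwise fall back to the master. A hedged sketch (server names and lag numbers are made up; the 316s figure mirrors the db53 delay alert seen in this channel):

```python
# Prefer the freshest replica under the lag cap; fall back to the master.
MAX_LAG_SECONDS = 30

def pick_read_server(master, replicas):
    """replicas: list of (name, lag_seconds) tuples."""
    usable = [(lag, name) for name, lag in replicas if lag <= MAX_LAG_SECONDS]
    return min(usable)[1] if usable else master

print(pick_read_server("db-master",
                       [("db53", 316), ("db54", 4)]))  # db54
print(pick_read_server("db-master",
                       [("db53", 316)]))               # db-master
```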
[15:53:40] andrewbogott: just some general stuff [15:55:57] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 15:55:53 UTC 2013 [15:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [15:56:47] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 15:56:43 UTC 2013 [15:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [15:59:11] <^d> manybubbles: Yo, you ready? :) [15:59:35] ^d: sure! you want to push this to test2 first? [15:59:55] <^d> Hmm, this config doesn't swap on per-wiki basis, lemme amend. [16:00:20] can you push to the appropriate machine? I saw something about that [16:01:17] <^d> That's test, not test2. [16:01:28] <^d> And it's an old way of testing things that's frowned upon :) [16:02:30] ah [16:03:46] !log tweaked permissions of the blog's w3 caching plugin to actually make it work again [16:04:00] do i dare try to update it... [16:04:00] Logged the message, RobH [16:07:07] !log updated akismet blog plugin [16:07:17] Logged the message, RobH [16:08:06] (03PS4) 10Chad: Use new LVS setup for search on test(2)wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86743 [16:08:07] (03PS1) 10Chad: Use new LVS setup for search for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90916 [16:08:09] <^d> manybubbles: ^ [16:09:06] (03CR) 10Manybubbles: [C: 031] Use new LVS setup for search on test(2)wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86743 (owner: 10Chad) [16:09:15] looks good to me [16:10:30] (03CR) 10Chad: [C: 032] Use new LVS setup for search on test(2)wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86743 (owner: 10Chad) [16:10:39] (03Merged) 10jenkins-bot: Use new LVS setup for search on test(2)wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86743 (owner: 10Chad) [16:11:34] !log demon synchronized 
wmf-config/CirrusSearch-production.php 'LVS for cirrus on test2wiki' [16:11:47] Logged the message, Master [16:14:06] (03CR) 10Ori.livneh: [C: 032] "Tested on mw60" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/90869 (owner: 10Ori.livneh) [16:15:13] <^d> manybubbles: I'm getting results on test2. [16:15:30] ^d: me too. I'm happy [16:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 16:18:42 UTC 2013 [16:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [16:20:07] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 16:19:57 UTC 2013 [16:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [16:21:13] (03CR) 10Manybubbles: [C: 031] "Working on test2 so should be good to deploy" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90916 (owner: 10Chad) [16:21:47] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 16:21:44 UTC 2013 [16:21:57] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 16:21:49 UTC 2013 [16:22:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [16:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [16:25:41] <^d> manybubbles: Gonna flip the switch on the rest now [16:25:49] sounds good to me! 
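The staged rollout above — LVS for Cirrus on test(2)wiki first, then a second patch for all wikis — follows the usual wmf-config pattern of a per-wiki setting with a `default` fallback. This toy model only mimics the shape of that lookup; the setting name `wmfUseCirrusLVS` is hypothetical, not the actual variable in CirrusSearch-production.php:

```python
# Per-wiki override map with a 'default' fallback, in the style of
# wmf-config's InitialiseSettings arrays (names are illustrative).
settings = {
    "wmfUseCirrusLVS": {
        "default": False,   # the all-wikis patch flips this to True
        "testwiki": True,
        "test2wiki": True,
    }
}

def get_setting(name, wiki):
    per_wiki = settings[name]
    return per_wiki.get(wiki, per_wiki["default"])

print(get_setting("wmfUseCirrusLVS", "test2wiki"))  # True
print(get_setting("wmfUseCirrusLVS", "enwiki"))     # False
```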
[16:25:58] (03CR) 10Chad: [C: 032] Use new LVS setup for search for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90916 (owner: 10Chad) [16:26:08] (03Merged) 10jenkins-bot: Use new LVS setup for search for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90916 (owner: 10Chad) [16:26:36] !log demon synchronized wmf-config/CirrusSearch-production.php 'LVS for cirrus on all wikis' [16:26:48] Logged the message, Master [16:27:31] looks good [16:27:45] <^d> Hmm, suggestions aren't working for me on mw.org [16:28:29] <^d> "ext.gadget.externalsearch" [16:28:32] <^d> What the heck is that? [16:28:47] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 16:28:40 UTC 2013 [16:28:55] <^d> Freaking a. [16:29:03] <^d> Some broken gadget enabled by default. [16:29:15] .... [16:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [16:29:19] great! [16:29:40] works for me, but I'm logged in [16:29:54] seems to work while logged out too [16:30:45] !log ran sync-apache for I1113b9594 & Ie73ce6213; verified on mw60; gracefuling. 
[16:30:58] Logged the message, Master [16:31:39] <^d> Suggestions when on the search page are what's broken [16:32:09] <^d> Also, we don't seem to be suggesting redirects :\ [16:32:57] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 16:32:48 UTC 2013 [16:33:07] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [16:33:47] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 16:33:44 UTC 2013 [16:34:17] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [16:34:41] (03PS1) 10Jforrester: Enable VisualEditor for NS_FILE, NS_HELP, NS_CATEGORY [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90923 [16:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 16:35:41 UTC 2013 [16:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [16:39:15] ^d: I see. I'm not sure we ever suggested redirects. [16:39:30] <^d> I could've sworn we did, hm. [16:40:05] lucenesearch certainly does [16:40:10] just checked that [16:41:47] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 16:41:42 UTC 2013 [16:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [16:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 16:42:43 UTC 2013 [16:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [16:43:47] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 16:43:43 UTC 2013 [16:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [16:44:47] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 16:44:43 UTC 2013 [16:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [16:51:17] matanya: still around? 
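The redirect gap discussed below ("we don't seem to be suggesting redirects" while lucenesearch does) comes down to what the suggester is built from: a suggester fed only article titles misses queries that match a redirect title. A toy illustration with made-up data:

```python
# Article titles vs. redirect titles; a suggester built from titles alone
# misses 'Mainpage', which lucene-search (indexing redirects) would find.
titles = ["Main Page", "MediaWiki"]
redirects = {"Mainpage": "Main Page"}

def suggest(prefix, include_redirects):
    pool = dict.fromkeys(titles)
    if include_redirects:
        pool.update(redirects)
    return sorted(t for t in pool if t.lower().startswith(prefix.lower()))

print(suggest("main", include_redirects=False))  # ['Main Page']
print(suggest("main", include_redirects=True))   # ['Main Page', 'Mainpage']
```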
[16:51:23] yes andrewbogott [16:51:39] I have ~15 minute now, and am in a cafe with apparently stable internet. What's up? [16:52:05] all cool. i'll grab a drink and be right back [16:52:22] anomie: heya, sorry you can't make it to the devops kickoff tomorrow, I tried to schedule it when you seemed free, what did I do wrong? [16:52:38] anomie: too late? [16:54:45] greg-g: too late. Alternating weeks after 5pm is not good for me. [16:55:02] gotcha [16:55:52] wait, confused anomie. Alternating? MW Core is at same time every week. [16:55:57] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 16:55:47 UTC 2013 [16:56:09] I just want to make sure I understand so I can respect your time better [16:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [16:56:18] greg-g: I make an exception for the MW Core meeting [16:56:31] anomie: gotcha [16:56:47] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 16:56:42 UTC 2013 [16:56:55] so you and Tim can never talk :( [16:57:10] "never" being a little extreme, of course ;) [16:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [17:01:15] greg-g: It's difficult sometimes. For OAuth we scheduled the weekly check-in at 6pm SF time. [17:01:27] Since we had no Europeans, that worked. [17:01:36] * greg-g nods [17:14:48] Who's on RT this week? [17:14:58] Or better yet, how would I find out who's on RT duty? 
[17:15:46] andrewbogott [17:15:47] https://wikitech.wikimedia.org/wiki/RT_Triage_Duty [17:16:12] cool, thanks [17:18:47] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Mon Oct 21 17:18:44 UTC 2013 [17:19:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: No successful Puppet run in the last 10 hours [17:19:47] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Mon Oct 21 17:19:44 UTC 2013 [17:20:37] PROBLEM - Puppet freshness on cp1023 is CRITICAL: No successful Puppet run in the last 10 hours [17:21:47] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Mon Oct 21 17:21:44 UTC 2013 [17:22:07] RECOVERY - Puppet freshness on cp1035 is OK: puppet ran at Mon Oct 21 17:21:59 UTC 2013 [17:22:07] PROBLEM - Puppet freshness on cp1035 is CRITICAL: No successful Puppet run in the last 10 hours [17:22:08] mark, could you check https://gerrit.wikimedia.org/r/#/c/90665/ -- it fixes incorrect host matching and allows beta cluster to properly function as well [17:22:19] or paravoid ^ [17:22:47] PROBLEM - Puppet freshness on cp1021 is CRITICAL: No successful Puppet run in the last 10 hours [17:23:36] paravoid: is it possible to rotate the logs for swiftrepl.py on copper? it's at 4.8gb now with 2.3g left free (I moved some old logs this morning, see SAL) [17:24:14] !log The @Wikimedia Operations team is seeking proposals for a new datacenter in the midwestern/western US: https://blog.wikimedia.org/2013/10/21/rfp-new-datacenter-continental-us/ [17:24:23] No idea how well that will actually work... [17:24:28] Logged the message, Master [17:24:41] Sweet. It did [17:25:33] :) [17:25:46] Reedy: seattle equinix [17:28:11] (03CR) 10Akosiaris: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/90766 (owner: 10Yuvipanda) [17:28:47] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 17:28:43 UTC 2013 [17:28:53] Offset the heat generated by the servers to make coffee? 
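The swiftrepl log problem raised above (a single 4.8 GB file with 2.3 GB of disk left) is the classic case for size-triggered rotation. In practice this would be a logrotate stanza with a `size` directive; the sketch below just shows the core decision, with a placeholder threshold:

```python
import os

# Rotate once the log crosses a size threshold; 1 GiB is an illustrative
# value, not what copper actually uses.
MAX_BYTES = 1 * 1024**3

def needs_rotation(path):
    try:
        return os.path.getsize(path) >= MAX_BYTES
    except FileNotFoundError:
        return False

def rotate(path):
    if needs_rotation(path):
        os.replace(path, path + ".1")  # keep one previous generation
        open(path, "w").close()        # start a fresh, empty log
```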
[17:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [17:29:57] Reedy: that sounds like a plan [17:30:57] Reedy: i worked with them there, and they are great, and meet the RFP, though somewhat pricy [17:31:25] Seems to be certainly an industry that you get what you pay for [17:32:57] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Oct 21 17:32:49 UTC 2013 [17:33:07] PROBLEM - Puppet freshness on cp1029 is CRITICAL: No successful Puppet run in the last 10 hours [17:33:47] RECOVERY - Puppet freshness on cp1032 is OK: puppet ran at Mon Oct 21 17:33:44 UTC 2013 [17:34:18] PROBLEM - Puppet freshness on cp1032 is CRITICAL: No successful Puppet run in the last 10 hours [17:35:38] Reedy: well, you may note what i said to Ken, if you wish [17:35:47] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Mon Oct 21 17:35:39 UTC 2013 [17:36:27] PROBLEM - Puppet freshness on cp1027 is CRITICAL: No successful Puppet run in the last 10 hours [17:41:47] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Oct 21 17:41:45 UTC 2013 [17:42:47] PROBLEM - Puppet freshness on cp1036 is CRITICAL: No successful Puppet run in the last 10 hours [17:42:47] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Mon Oct 21 17:42:41 UTC 2013 [17:43:17] PROBLEM - Puppet freshness on cp1024 is CRITICAL: No successful Puppet run in the last 10 hours [17:43:47] RECOVERY - Puppet freshness on cp1030 is OK: puppet ran at Mon Oct 21 17:43:46 UTC 2013 [17:44:07] PROBLEM - Puppet freshness on cp1030 is CRITICAL: No successful Puppet run in the last 10 hours [17:44:24] !log reclaiming cp1021-36, cp1041-42 per RT5981 [17:44:39] Logged the message, Master [17:44:57] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Mon Oct 21 17:44:56 UTC 2013 [17:45:47] PROBLEM - Puppet freshness on cp1026 is CRITICAL: No successful Puppet run in the last 10 hours [17:55:48] gwicke do you want to turn hyper threading on for 
your test host? [17:55:57] RECOVERY - Puppet freshness on cp1033 is OK: puppet ran at Mon Oct 21 17:55:47 UTC 2013 [17:56:17] PROBLEM - Puppet freshness on cp1033 is CRITICAL: No successful Puppet run in the last 10 hours [17:56:47] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Mon Oct 21 17:56:42 UTC 2013 [17:57:27] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [18:10:52] (03PS1) 10Amire80: Add lang and dir attributes to the Wikimedia address for Echo [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90934 [18:12:30] (03PS2) 10Mwalker: Add BannerRandom filter to erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/90667 [18:14:17] PROBLEM - Host cp1021 is DOWN: PING CRITICAL - Packet loss = 100% [18:16:56] Reedy: it used to be better, with !Wikimedia group tag, but identi.ca killed that (and we killed identi.ca support I think?) [18:17:13] and yes, it would be nice to microblog more via morebots :P [18:17:47] PROBLEM - Host cp1023 is DOWN: PING CRITICAL - Packet loss = 100% [18:17:57] PROBLEM - Host cp1024 is DOWN: PING CRITICAL - Packet loss = 100% [18:18:07] PROBLEM - Host cp1025 is DOWN: PING CRITICAL - Packet loss = 100% [18:18:07] PROBLEM - Host cp1027 is DOWN: PING CRITICAL - Packet loss = 100% [18:18:08] PROBLEM - Host cp1026 is DOWN: PING CRITICAL - Packet loss = 100% [18:18:17] PROBLEM - Host cp1033 is DOWN: PING CRITICAL - Packet loss = 100% [18:18:17] PROBLEM - Host cp1032 is DOWN: PING CRITICAL - Packet loss = 100% [18:18:27] PROBLEM - Host cp1035 is DOWN: PING CRITICAL - Packet loss = 100% [18:18:34] * cmjohnson1 waives good-bye to cp1021-1036 [18:26:17] (03PS1) 10Reedy: Revert site specific config change to hewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90935 [18:26:21] mark: do you know if the https terminator boxes do de/compression as well as encryption? or are they straight pass through of whatever the squids give them? 
[18:27:55] (03CR) 10Reedy: [C: 032] Revert site specific config change to hewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90935 (owner: 10Reedy) [18:28:07] (03Merged) 10jenkins-bot: Revert site specific config change to hewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90935 (owner: 10Reedy) [18:29:00] !log reedy synchronized wmf-config/InitialiseSettings.php [18:29:07] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Mon Oct 21 18:29:00 UTC 2013 [18:29:12] Logged the message, Master [18:29:17] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [18:32:17] PROBLEM - RAID on cp1029 is CRITICAL: CRITICAL: Active: 1, Working: 1, Failed: 1, Spare: 0 [18:32:37] PROBLEM - Disk space on cp1029 is CRITICAL: DISK CRITICAL - /srv/sda3 is not accessible: Input/output error [18:32:47] PROBLEM - Disk space on cp1030 is CRITICAL: DISK CRITICAL - /srv/sda3 is not accessible: Input/output error [18:33:07] I should do this deploy [18:33:07] PROBLEM - RAID on cp1030 is CRITICAL: CRITICAL: Active: 1, Working: 1, Failed: 1, Spare: 0 [18:34:17] PROBLEM - RAID on cp1036 is CRITICAL: CRITICAL: Active: 1, Working: 1, Failed: 1, Spare: 0 [18:34:27] PROBLEM - Disk space on cp1036 is CRITICAL: DISK CRITICAL - /srv/sda3 is not accessible: Input/output error [18:34:55] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.22wmf22 [18:35:08] Logged the message, Master [18:35:36] (03CR) 10Ottomata: [C: 031] "I think this should be fine." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/90667 (owner: 10Mwalker) [18:35:37] PROBLEM - LVS HTTP IPv6 on wikipedia-lb.esams.wikimedia.org_ipv6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.221 second response time [18:35:57] (03CR) 10Jgreen: [C: 031] Add BannerRandom filter to erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/90667 (owner: 10Mwalker) [18:36:18] ottomata: Jeff_Green had voiced concerns about udp2log load? [18:36:37] PROBLEM - Host cp1036 is DOWN: PING CRITICAL - Packet loss = 100% [18:36:37] RECOVERY - LVS HTTP IPv6 on wikipedia-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 67954 bytes in 0.460 second response time [18:36:47] PROBLEM - Host cp1030 is DOWN: PING CRITICAL - Packet loss = 100% [18:37:57] PROBLEM - Host cp1029 is DOWN: PING CRITICAL - Packet loss = 100% [18:39:14] yeah, i think it should be fine [18:39:41] erbium is running pretty spare at the moment. there should be plenty of headroom [18:40:06] and your filter doesn't look heavy, especially with the 100 sampling [18:40:08] mwalker: ^ [18:40:54] !log reedy synchronized wmf-config/interwiki.cdb 'Updating interwiki cache' [18:41:06] Logged the message, Master [18:42:10] (03PS1) 10Reedy: All non wikipedias to 1.22wmf22 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90938 [18:42:11] (03PS1) 10Reedy: Update interwiki cache [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90939 [18:42:16] ottomata: thanks [18:42:31] Jeff_Green: ^ [18:42:33] (03CR) 10Reedy: [C: 032] All non wikipedias to 1.22wmf22 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90938 (owner: 10Reedy) [18:42:40] (03CR) 10Reedy: [C: 032] Update interwiki cache [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90939 (owner: 10Reedy) [18:43:06] mwalker: ya [18:43:13] (03Merged) 10jenkins-bot: All non wikipedias to 1.22wmf22 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90938 (owner: 
10Reedy) [18:43:20] (03Merged) 10jenkins-bot: Update interwiki cache [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90939 (owner: 10Reedy) [18:43:55] hey ottomata, got a sec? [18:44:12] Jeff_Green, ottomata: so you both +1'd; and I don't have +2 permissions... one of you want to hit the button? :D [18:44:28] yeah ja [18:44:32] mwuhahahahha. hahahahha.mwuwh mwuh. [18:44:37] ottomata: you got it? [18:45:45] hah [18:45:47] yeah i can do! [18:45:59] (03PS3) 10Ottomata: Add BannerRandom filter to erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/90667 (owner: 10Mwalker) [18:46:05] (03CR) 10Ottomata: [C: 032 V: 032] Add BannerRandom filter to erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/90667 (owner: 10Mwalker) [18:46:10] whooo! [18:46:50] Jeff_Green: will those data files just start showing up on Al automagically; or will I have to ask you to retrieve them for me from erbium in a couple of hours? [18:47:25] mwalker: i can't remember offhand whether the rotation script needs tweaking. checking [18:49:17] PROBLEM - Host cp1042 is DOWN: PING CRITICAL - Packet loss = 100% [18:49:54] i need to tell the script to watch for them. [18:52:04] (03PS1) 10Jgreen: add log file to rotate_fundraising_logs to accompany change to erbium filter [operations/puppet] - 10https://gerrit.wikimedia.org/r/90941 [18:53:26] (03CR) 10Jgreen: [C: 032 V: 031] add log file to rotate_fundraising_logs to accompany change to erbium filter [operations/puppet] - 10https://gerrit.wikimedia.org/r/90941 (owner: 10Jgreen) [18:53:35] (03CR) 10Hashar: "(1 comment)" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/90716 (owner: 10Hashar) [18:55:47] PROBLEM - Host cp1041 is DOWN: PING CRITICAL - Packet loss = 100% [18:56:09] ottomata: librdkafka supports dns roundrobin, so you could have a single dns record for all brokers if that makes your life simpler. 
[18:56:52] hmm, naw its easy to config them in puppet [18:56:58] the config has to exist elsewhere in puppet anyway [18:57:13] e.g. [18:57:33] class { 'varnishkafka': brokers => $role::analytics::kafka::brokers [18:57:35] or something like that [18:57:38] yokidoki [19:06:05] RECOVERY - Host praseodymium is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [19:22:46] (03PS1) 10Cmjohnson: Remvoing cp1021-cp1036 /cp1041-42 from puppet files [operations/puppet] - 10https://gerrit.wikimedia.org/r/90946 [19:24:17] (03CR) 10Cmjohnson: [C: 032] Remvoing cp1021-cp1036 /cp1041-42 from puppet files [operations/puppet] - 10https://gerrit.wikimedia.org/r/90946 (owner: 10Cmjohnson) [19:35:26] (03PS1) 10Cmjohnson: Removing dns entries for cp1021-36 and 1041/42 [operations/dns] - 10https://gerrit.wikimedia.org/r/90960 [19:36:57] (03CR) 10Cmjohnson: [C: 032] Removing dns entries for cp1021-36 and 1041/42 [operations/dns] - 10https://gerrit.wikimedia.org/r/90960 (owner: 10Cmjohnson) [19:38:17] !log dns update [19:38:33] Logged the message, Master [19:48:34] yay [20:01:35] paravoid, any thoughts on the kafka -> udp2log issue? [20:02:55] (03CR) 10Hashar: "After some investigations, I think 'trunk' comes from git-import-orig which has --upstream-branch defaulting to trunk. 
I am pretty sure t" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/90716 (owner: 10Hashar) [20:03:19] (03PS1) 10Cmjohnson: Removing dns entrries for reclaimed servers arsenic and niobium [operations/dns] - 10https://gerrit.wikimedia.org/r/91036 [20:04:14] (03CR) 10Cmjohnson: [C: 032 V: 032] Removing dns entrries for reclaimed servers arsenic and niobium [operations/dns] - 10https://gerrit.wikimedia.org/r/91036 (owner: 10Cmjohnson) [20:05:04] !log dns update [20:09:00] (03PS1) 10Chad: bnwiki gets Cirrus as alternative [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91038 [20:10:38] (03CR) 10Ottomata: "(1 comment)" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/90716 (owner: 10Hashar) [20:12:43] (03CR) 10Manybubbles: [C: 031] "I support this proposal." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91038 (owner: 10Chad) [20:15:30] hashar: does that make sense? [20:15:41] ottomata: yeah more or less :-D [20:15:42] I don't think either alex or I have built kafka using that gbp.conf [20:15:55] RobH: ping ;) [20:16:18] gwicke: so two of three of your servers are done, and i think cmjohnson1 is finishing up last [20:16:21] but had question for you [20:16:23] and 'trunk' is actually a real branch name [20:16:26] do you want hyperthreading on or off? [20:16:26] ottomata: I am looking at creating the packages for us whenever someone sends a patchset in Gerrit [20:16:31] right now its all off [20:16:41] we can turn one system on if you wanted to see if it changes things? [20:16:45] yeah hashar that would be cool [20:16:46] whatever you need =] [20:16:48] actually robh/gwicke they're all done I was waiting for a response to the HT q [20:16:53] cool [20:16:55] would it work for different branches?
[20:16:59] we haven't been committing the build branches [20:17:03] there'd be too many [20:17:10] RobH: not sure re hyperthreading- I guess on would not hurt [20:17:21] <^demon|lunch> manybubbles: I'm going to do it at 4pm sf time. [20:17:49] gwicke: well, we can turn it on one, and off on other two, and you can compare and see if it matters? [20:17:49] ^demon|lunch: sounds good to me. I'll be available in case of disaster but don't expect anything [20:17:51] afaik Cassandra does use multiple cores quite well [20:17:51] sound good? [20:17:59] or can turn on two and off one [20:17:59] or that, yes [20:18:04] your call =] [20:18:13] i say one on, rest off, but thats cuz they are all off [20:18:18] heh [20:18:18] I'd just turn it on in general [20:18:20] ok [20:18:23] cmjohnson1: ^ [20:18:33] ah, it does not matter that much I think [20:18:44] that parameter is not something I planned to study so far [20:19:22] k [20:21:17] PROBLEM - Host cerium is DOWN: PING CRITICAL - Packet loss = 100% [20:25:47] RECOVERY - Host cerium is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [20:25:57] PROBLEM - Host xenon is DOWN: PING CRITICAL - Packet loss = 100% [20:28:37] PROBLEM - Host praseodymium is DOWN: PING CRITICAL - Packet loss = 100% [20:29:27] RECOVERY - Host xenon is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [20:30:24] (03PS1) 10Cmjohnson: Removing cp1021 from role/cache.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/91041 [20:30:55] gwicke: they're all yours [20:31:21] cmjohnson1: awesome, thanks! [20:31:47] can't log into cerium yet [20:32:27] give it a go now (gwicke) [20:32:51] still no luck as gwicke [20:33:12] now that's odd...robh..can you try plz [20:33:15] (03CR) 10Hashar: "Thanks for clarifying the workflow in use.
I guess I will get some more debs in Jenkins to gain more experience with git buildpackage work" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/90716 (owner: 10Hashar) [20:33:17] I'm trying from bast1001 [20:33:37] RECOVERY - Host praseodymium is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms [20:33:53] # cerium,praseodymium and xenon are cassandra test host [20:33:53] node /^(cerium|praseodymium|xenon)\.eqiad\.wmnet$/ { [20:33:53] include standard [20:33:53] i can get in from bast1001 [20:33:54] } [20:33:59] there are no notes to let gwicke in manifest [20:34:18] gwicke: You'll want to either have someone include you, or you can submit your own patchset to give yourself the rights [20:34:26] yeah...didn't add that [20:34:26] cmjohnson1: wanna handle that from ops side? [20:34:39] yea, i expected gwicke to do it, but never conveyed it to him ;] [20:34:49] gwicke: So, usually, what will happen is we will push hosts online with nothing special. [20:35:02] then dev will usually submit a site.pp patchset adding themselves to sudo for that host [20:35:08] and some ops person will merge [20:35:12] RobH: I have not done puppet access right management before, so would need some hand holding [20:35:14] or they'll ask an ops person to add and merge [20:35:24] ahh, well, we can do it, or we can help you do it (either way is cool) [20:35:30] but i rather teach you to fish =] [20:35:41] ok [20:35:55] * gwicke updates the puppet checkout [20:35:57] the easy way to find out 'how should i do this as a dev' is look for an example like ori-l's patches [20:36:17] he tends to submit patchsets for his works in progress and flag appropriate ops folks [20:36:30] so, lets see... 
[20:36:58] if you open site.pp, and look at the entry for formey [20:37:09] you can see an example where chad has local sudo rights on the box for that [20:37:15] line 990 [20:37:25] (03CR) 10Ottomata: [C: 032 V: 032] gbp: do not set export-dir [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/90716 (owner: 10Hashar) [20:37:45] So sudo rights are usually not included in the role classes, but under individual server entries in site.pp [20:38:23] gwicke: Keep in mind that just cuz i think devs should know this doesnt mean you have to know this, im not in charge of anyone =] [20:39:05] RobH: let me have a look at it and see if I can figure it out [20:39:16] need to clean my checkout first [20:39:18] cool, if you get frustrated, or when you have a patchset, lemme know [20:39:24] *nod* [20:39:25] you can add me as reviewer, im happy to merge. [20:39:39] (for the sudo and like stuff, since that stuff i get) [20:39:49] when you get crazy into cassandra tweaking in puppet, im not the dude ;] [20:40:07] i'm working on that right now [20:40:54] ah, just similar, nevermind, another access request [20:47:56] Someone around with access to the ldap logs? [20:48:07] for what purpose? [20:48:09] A user has problems to log in to gerrit [20:48:17] We've checked the gerrit db. [20:48:22] And the ldap account. [20:48:28] Both look fine. [20:49:04] At some point there was some weird error logged, [20:49:19] that hinted towards an ldap query not finding groups. [20:49:55] it shouldn't be a problem to not find groups [20:50:11] Yes. [20:50:38] Still the error message we saw seemed to be caused by an ladp query going wrong. [20:51:02] So I was curious if the query itself (that gerrit sends) was sound [20:55:45] Ryan_Lane: Since I do not understand what goes wrong in code path that causes the problem [20:56:00] Seeing the actual queries would be super-helpful in debugging the problem. 
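The pattern RobH walks through above — per-host sudo grants attached to node entries in site.pp rather than to role classes — would look roughly like this for the cassandra test hosts. The node regex is the one pasted in channel; the `sudo_user` define and its privilege string are assumptions, not the real manifest:

```puppet
# cerium, praseodymium and xenon are cassandra test hosts
node /^(cerium|praseodymium|xenon)\.eqiad\.wmnet$/ {
    include standard

    # Hypothetical grant, following the formey example discussed
    # above: the sudo rights live on the node entry itself, not
    # inside a role class.
    sudo_user { 'gwicke':
        privileges => ['ALL = (ALL) NOPASSWD: ALL'],
    }
}
```

A patchset like the "Sudo for gwicke on cassandra test cluster" change that follows then only has to touch this one node block, which is why ops encourage devs to submit it themselves.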
[20:56:18] (03PS1) 10Dzahn: sudo -u parsoid access for parsoid admins [operations/puppet] - 10https://gerrit.wikimedia.org/r/91043 [20:58:31] well, that's easier said than done [20:59:30] qchris: the user can log into wikitech? [20:59:36] Yes. [21:00:26] Hello. [21:00:38] (03PS1) 10Cmjohnson: Removing ms2 from dsh groups as it's decom'd [operations/puppet] - 10https://gerrit.wikimedia.org/r/91044 [21:00:41] hi DGarry [21:00:50] Hey Ori! [21:01:02] So I hear we're discussing my inability to access gerrit? [21:01:13] DGarry: Yes. [21:01:15] we are? Oh, I think qchris and Ryan_Lane are [21:01:29] :-) [21:01:43] DGarry: can you please try logging into it now? [21:01:44] ori-l: Yep. :) [21:02:06] (03PS1) 10Cmjohnson: Removing DNS entries for ms2 [operations/dns] - 10https://gerrit.wikimedia.org/r/91045 [21:02:11] Ryan_Lane: Just did. Same error as usual. Invalid username or password. [21:02:23] try again. I'm tailing the logs [21:02:44] what username are you trying to log in with? [21:02:48] Deskana [21:02:57] I'm not seeing it in the logs [21:03:02] * Ryan_Lane looks at virt1000 [21:03:05] That's weird. [21:03:24] try now [21:03:39] (03CR) 10Cmjohnson: [C: 032] Removing DNS entries for ms2 [operations/dns] - 10https://gerrit.wikimedia.org/r/91045 (owner: 10Cmjohnson) [21:03:50] Ryan_Lane: Same error. [21:04:03] ok. I see logs now [21:04:03] !log dns update [21:04:17] Yippie! [21:04:38] bblack: do you have a sec for https://gerrit.wikimedia.org/r/#/c/90665/ [21:05:25] (03PS1) 10GWicke: Sudo for gwicke on cassandra test cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/91046 [21:06:46] !log morebots are you working? [21:07:00] Logged the message, Master [21:07:32] hm. weird [21:07:35] maybe replication is broken? [21:07:49] We have a second affected user as well [21:08:33] yeah, I'm seeing deskana's entry on virt0 and not virt1000 [21:08:37] I'll pm you the other user name. 
Maybe it fits the pattern as well [21:08:56] Does that explain why I can log in to wikitech.wikimedia.org but not gerrit.wikimedia.org? [21:09:47] (We also tried logging in to a different gerrit instance with the same ldap config, there logging in worked) [21:10:02] I've already figured out the pattern ;) [21:10:24] So replication is broken? [21:11:20] (03PS1) 10Dzahn: add account marktraceur and add to stat1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91047 [21:11:26] Yayyyy [21:11:30] * marktraceur looks at watch [21:11:50] Thanks mutante! [21:11:51] (03CR) 10Cmcmahon: [C: 031] "oddities are bad" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90670 (owner: 10CSteipp) [21:12:14] (03PS4) 10BBlack: Fixed incorrect domain matching for ZERO [operations/puppet] - 10https://gerrit.wikimedia.org/r/90665 (owner: 10Yurik) [21:12:49] (03CR) 10jenkins-bot: [V: 04-1] add account marktraceur and add to stat1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91047 (owner: 10Dzahn) [21:12:59] :( [21:13:04] Jenkins is just being spiteful [21:13:19] yeah, replication is broken [21:13:24] and gerrit points at virt1000 [21:13:32] (03CR) 10BBlack: [C: 032] Fixed incorrect domain matching for ZERO [operations/puppet] - 10https://gerrit.wikimedia.org/r/90665 (owner: 10Yurik) [21:13:52] marktraceur: can you do me a favor and please paste your SSH key here https://office.wikimedia.org/wiki/User:MHolmquist as your wiki user? [21:14:03] Sure sure [21:14:28] Ryan_Lane: Ok. That explains things. [21:15:59] yurik_: done [21:16:25] yurik: any progress with https://bugzilla.wikimedia.org/show_bug.cgi?id=54822 ? [21:18:03] (03PS2) 10Dzahn: add account marktraceur and add to stat1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91047 [21:20:21] Ryan_Lane: I need to go soon. Is there anything else you need from me to fix this? [21:20:27] nope [21:20:46] (03CR) 10BBlack: "Really, we should fix this elsewhere. 
The basic idea is for all varnishes (not just mobile), do processing of trusted proxies (Opera, Nok" [operations/puppet] - 10https://gerrit.wikimedia.org/r/88261 (owner: 10Dr0ptp4kt) [21:20:49] Ryan_Lane: Great. Thanks a bunch. :) [21:21:24] no yurik_ anymore [21:21:33] patch got merged, he doesn't need us anymore :P [21:21:42] greg-g: ping [21:22:17] hi there [21:22:22] hello [21:22:34] AaronSchulz & me want to reenable multiwrite for swift @ eqiad [21:22:40] paravoid: I should have held it hostage! [21:23:04] paravoid: aha [21:23:18] when would you like to do that? [21:23:19] I pinged you about it last week too but we never synced up and I didn't send an email :-) [21:23:42] it can go in asap, maybe even now [21:23:50] or I can do it at european hours whenever [21:24:05] now is probably fine, honestly. [21:26:05] bleh. replication must have gone out of sync for some period of time [21:26:13] and now I need to reinitialize virt1000 [21:26:15] replication of what? [21:26:23] ldap [21:26:25] oh [21:26:28] no clue why [21:26:29] ouch [21:26:31] yeah [21:26:39] :-( [21:26:47] reinitialization only takes a few secs [21:26:50] but it's not a good sign [21:26:57] I wonder what caused that to break [21:27:56] paravoid: oh, now, hehe, ok :) [21:28:06] if you're busy, I can handle it [21:28:11] handle it myself I mean [21:28:20] no worries [21:28:26] * AaronSchulz will commit [21:29:13] (03PS1) 10Aaron Schulz: Added eqiad swift to multiwrite backends [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91050 [21:29:20] oh [21:29:24] need to add replication monitoring [21:29:25] damn, I had vi open [21:29:31] :) [21:29:34] (03Abandoned) 10Edenhill: Make scratch buffer size configurable (issue #2) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/90028 (owner: 10Edenhill) [21:29:39] Ryan_Lane: I'm sure there's a ticket for that somewhere :P [21:29:41] (03Abandoned) 10Edenhill: Log failed Kafka message deliveries (issue #1) 
[operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/90029 (owner: 10Edenhill) [21:29:43] definitely should never get to this point [21:29:50] it's dangerous for this to happen [21:29:55] (03Abandoned) 10Edenhill: Provide some more detail when Kafka ..produce() fails. [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/90030 (owner: 10Edenhill) [21:30:20] I wonder if this happened during our DNS change [21:30:24] I bet it did [21:30:33] (03Abandoned) 10Edenhill: Added rate-limiting to (most) error logs generated by varnishkafka (issue #1) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/90184 (owner: 10Edenhill) [21:30:53] (03CR) 10Faidon Liambotis: [C: 031] "LGTM" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91050 (owner: 10Aaron Schulz) [21:30:55] Ryan_Lane: We had the first reports of gerrit problems around early october. [21:31:13] why didn't anyone say anything? :) [21:31:31] (03CR) 10Aaron Schulz: [C: 032] Added eqiad swift to multiwrite backends [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91050 (owner: 10Aaron Schulz) [21:31:33] Hehe. [21:31:51] (03Merged) 10jenkins-bot: Added eqiad swift to multiwrite backends [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91050 (owner: 10Aaron Schulz) [21:31:52] We did try to make sure that we do everything possible to rule out errors on our side. 
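The change merged above ("Added eqiad swift to multiwrite backends", then live-synced as wmf-config/filebackend.php) makes MediaWiki write files to both Swift clusters while still reading from one master. A rough sketch of the shape such a FileBackendMultiWrite configuration takes — the backend names, lock manager, and the 'template' reference style are all assumptions, not the production file:

```php
<?php
// Hypothetical multiwrite sketch: FileBackendMultiWrite fans every
// write out to all listed backends and serves reads from the master.
// Names below are placeholders, not the production configuration.
$wgFileBackends[] = [
    'name'        => 'local-multiwrite',
    'class'       => 'FileBackendMultiWrite',
    'lockManager' => 'fsLockManager',
    'backends'    => [
        // One cluster stays authoritative for reads...
        [ 'template' => 'local-swift', 'isMultiMaster' => true ],
        // ...while the eqiad cluster receives a copy of each write.
        [ 'template' => 'local-swift-eqiad' ],
    ],
];
```

If the copies drift apart, a maintenance pass like the syncFileBackend run logged later in the channel reconciles them.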
[21:31:58] reinitialized [21:32:01] (03PS1) 10Edenhill: Added statistics (both from varnishkafka and librdkafka) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/91052 [21:32:02] (03PS1) 10Edenhill: Grow scratch pad by temporary buffers if necessary (issue #2) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/91053 [21:32:03] (03PS1) 10Edenhill: Limit maximum tag size content [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/91054 [21:32:04] (03PS1) 10Edenhill: Increase string renderer output buffer to 8K [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/91055 [21:32:05] Getting access to ldap logs is not easy ;-) [21:32:05] (03PS1) 10Edenhill: Avoid unnecessary clearing of scratch pad on logline alloc. [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/91056 [21:32:24] thankfully we only write to one place [21:32:32] otherwise that could have been nasty [21:32:49] :-) [21:32:58] Thanks for looking into it. [21:33:24] yw [21:33:26] As DGarry is gone now, I'll check with him by email. [21:33:29] * Ryan_Lane nods [21:33:30] AaronSchulz: paravoid does the swift multiwrite need a deploy, or just waiting for puppet? [21:33:37] syncfile [21:33:43] (03Restored) 10Edenhill: Added rate-limiting to (most) error logs generated by varnishkafka (issue #1) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/90184 (owner: 10Edenhill) [21:33:50] (03Restored) 10Edenhill: Make scratch buffer size configurable (issue #2) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/90028 (owner: 10Edenhill) [21:33:55] (03Restored) 10Edenhill: Log failed Kafka message deliveries (issue #1) [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/90029 (owner: 10Edenhill) [21:33:59] (03Restored) 10Edenhill: Provide some more detail when Kafka ..produce() fails. 
[operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/90030 (owner: 10Edenhill) [21:34:03] !log aaron synchronized wmf-config/filebackend.php 'Added eqiad swift to multiwrite backends' [21:34:15] Logged the message, Master [21:34:16] that :) [21:34:52] I see reqs on the logs [21:35:03] backend error log looks fine [21:35:04] PROBLEM - Disk space on professor is CRITICAL: DISK CRITICAL - free space: /a 21574 MB (3% inode=99%): [21:35:20] * AaronSchulz looks at ori-l [21:36:34] (03CR) 10Ori.livneh: [C: 031] "I'm inclined to merge this, because" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/90265 (owner: 10Aaron Schulz) [21:36:41] hrm? [21:36:44] (03PS1) 10Awjrichards: Ensure that m.mediawiki.org will work as an origin for CORS [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91058 [21:36:45] oh [21:36:46] professor [21:36:48] fml [21:36:50] * ori-l looks [21:37:14] (03CR) 10Chad: [C: 031] Switch to single Json object for gerrit's reviewer count query [operations/puppet] - 10https://gerrit.wikimedia.org/r/84743 (owner: 10QChris) [21:41:38] AaronSchulz: what's the rationale behind r90265? [21:41:51] (purge/thumbnail rate limits) [21:41:54] (03CR) 10Dzahn: [C: 031] "key confirmed: https://office.wikimedia.org/w/index.php?title=User:MHolmquist/Key&action=history" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91047 (owner: 10Dzahn) [21:42:03] is it the even where we had swift getting DoSed by too many DELETEs? [21:42:09] s/even/event/ [21:43:16] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/r90265 [21:43:18] moo? 
[21:43:33] paravoid: you mean the wmf-config thing [21:43:54] I meant https://gerrit.wikimedia.org/r/90265 [21:44:54] just disk space, wasted i/o and cpu, and cache eviction even if we didn't use swift [21:47:22] (03CR) 10Dzahn: [C: 031] Sudo for gwicke on cassandra test cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/91046 (owner: 10GWicke) [21:48:34] Krinkle: zuul and jenkins have generated 41G and 28G of graphite data, respectively. are there configuration options you could tweak to reduce the data points that get logged? this would be temporary, while graphite is still in tampa. [21:48:47] how about swift, ori-l? [21:49:00] I still haven't tweaked the sampling rate [21:49:03] swift is at 11G [21:49:07] and it was a bit excessive last time I checked [21:50:10] 11G in two weeks [21:50:12] doesn't sound great [21:50:16] I'll have a look [21:50:18] well, full disclosure, before i start harassing people: client-side stats (that's navigation timing and some ve) is at 106G [21:51:01] but for a much larger period, isn't it [21:51:12] also, considering that we do nothing with the swift stats than collecting them now... :) [21:51:14] yeah, several months [21:51:23] ori-l: I don't know any of the logging things hashar set up for jenkins/zuul. [21:51:50] Krinkle: OK, I'll poke him; it's not an emergency or anything. I can clear up disk space elsewhere in the interim. [21:51:57] ori-l: if you have spare cycles, I'd love your advice on what swift views to have [21:52:20] response time avg/95p/99p by method would be one I guess [21:52:41] ori-l: Any specifics on what kind of data? [21:53:09] the data as it appears in the graphs, or seemingly redundant data that isn't used but acompanies the data? [21:53:15] paravoid: I'm pretty new to this, but I found this very useful / persuasive: http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/ [21:53:20] eg. 
https://ganglia.wikimedia.org/latest/graph_all_periods.php?title=Jenkins+Queues&vl=&x=&n=&hreg%5B%5D=gallium&mreg%5B%5D=jenkins_overallload>ype=line&glegend=show&aggregate=1 [21:53:35] ori-l: I'll have a look, thanks [21:53:39] that's not graphite [21:53:40] ori-l: but this isn't about graphite per se [21:53:42] paravoid: specifically: 'Neither median nor mean can summarize the whole story of your system’s behavior. Instead I prefer to use a 5-95 span (thanks Steve Akers for showing me this metric and most of what I know about Graphite). A 5-95 span means that we cut off the extreme outliers above 95% and below 5%.' [21:53:57] I'd like us to be aligned in what data we collect [21:54:02] er, graphs we present I mean [21:54:22] (03CR) 10MarkTraceur: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91047 (owner: 10Dzahn) [21:54:54] ori-l: btw, my main motivation is not performance, but rather fixing a rather crazy piece of our setup [21:55:21] if you go to http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=swift+frontend+proxies [21:55:51] you'll see a bunch of response time graphs [21:56:06] these are generated by an apache log parsers to ganglia script [21:56:25] so we log each and every request unsampled up to 4 times [21:56:46] (then we syslog that to fenari, which writes to the netapp, which gets replicated to the netapp across DCs -- but that's another story) [21:56:50] paravoid: anyway, that patch only does render/linkpurge not action=purge [21:57:04] RECOVERY - Disk space on professor is OK: DISK OK [21:57:53] what is "linkpurge"? [21:57:55] paravoid: well, which views have been useful? [21:58:19] (03CR) 10DarTar: "I did specify the preferred username but only in a comment to the original ticket, sorry about that." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/91047 (owner: 10Dzahn) [21:58:42] they're frequently wrong and hard to decipher with all the colors anyway [21:59:01] but I do think that having some performance metrics might help in the future [21:59:07] paravoid: when someone edits or does a recursive=true API purge to do page link table updates [21:59:30] might help me in debugging e.g. disk/raid controller issues or other kind of outages, might also help you in enhancing performance across the stack I'd hope [21:59:54] paravoid: blargh, have to go to a meeting, bbiaw. (joys of being local.) [22:00:07] it's not urgent anyway. [22:00:26] well, it touches on stuff that i've been wondering about too [22:00:34] so it'd be a useful discussion to have [22:04:16] !log Running syncFilebackend on all wikis (should reduce sync errors; a few a popping up in the logs) [22:04:18] (03PS1) 10Dzahn: give milimetric sudo privileges on analytics nodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/91067 [22:04:29] Logged the message, Master [22:05:57] (03PS2) 10Dzahn: give milimetric sudo privileges on analytics nodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/91067 [22:09:35] (03PS8) 10Andrew Bogott: Move mysql_wmf into a module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/88666 [22:13:53] !log maxsem synchronized php-1.22wmf22/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/91065' [22:13:56] (03CR) 10Andrew Bogott: "Sean -- this new patch incorporates your recent patch 'icinga pmp-check-mysql-innodb idle_blocker_duration.' 
Please verify that I didn't " [operations/puppet] - 10https://gerrit.wikimedia.org/r/88666 (owner: 10Andrew Bogott) [22:14:03] Logged the message, Master [22:14:42] (03CR) 10CSteipp: "Fine as long as the WMF owns the domain (I think we do)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91058 (owner: 10Awjrichards) [22:16:07] (03CR) 10CSteipp: "And by that, I mean all the sub domains under the top-level domain." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91058 (owner: 10Awjrichards) [22:17:19] (03PS1) 10Bsitu: Enable Echo on all wikis except dewiki and itwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91072 [22:21:04] (03CR) 10Bsitu: [C: 04-2] "Do not merge till deployment window" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91072 (owner: 10Bsitu) [22:24:01] (03PS3) 10Physikerwelt: Mathoid service [operations/puppet] - 10https://gerrit.wikimedia.org/r/90733 [22:27:47] (03PS1) 10Dzahn: add account for Gerrit Padgham and add to stat1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91075 [22:30:03] PROBLEM - Host ms-be1004 is DOWN: PING CRITICAL - Packet loss = 100% [22:32:13] (03CR) 10Dzahn: [C: 04-1] "not yet, pending approval and key check" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91075 (owner: 10Dzahn) [22:32:29] paravoid: wee [22:33:24] (03CR) 10GWicke: [C: 031] Sudo for gwicke on cassandra test cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/91046 (owner: 10GWicke) [22:33:58] RobH: ^^ [22:44:10] ... [22:45:59] getting scared? [22:49:00] !log powercycling ms-be1004, locked up(?) 
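Back on the metrics thread: the "5-95 span" ori-l quotes from the Graphite guide is easy to state precisely — trim the extreme outliers below the 5th and above the 95th percentile and report the width of what remains. A minimal stdlib sketch; the interpolation method is an assumption, and Graphite's own percentile functions may round differently:

```python
import statistics

def span_5_95(samples):
    """Return (p5, p95, span) for a list of response-time samples.

    The 20-quantile cut points land exactly on the 5th and 95th
    percentiles, so the distance between the first and last cut
    point is the 5-95 span described in the guide.
    """
    q = statistics.quantiles(samples, n=20, method='inclusive')
    p5, p95 = q[0], q[-1]
    return p5, p95, p95 - p5

# With only a handful of samples the 95th percentile still
# interpolates toward a pathological outlier, so the span is most
# meaningful on larger sample sets.
p5, p95, span = span_5_95([12, 14, 15, 15, 16, 17, 18, 20, 25, 900])
```

Unlike a plain mean, the span summarizes the bulk of the distribution without letting a few extreme requests (or suspiciously fast cache hits) dominate the number.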
[22:49:13] RECOVERY - Host ms-be1004 is UP: PING OK - Packet loss = 0%, RTA = 0.49 ms [22:49:18] Logged the message, Master [22:49:34] (03CR) 10Andrew Bogott: "I think the removal of the exec { "mkdir /var/spool/exim4/scan"} section is still correct here -- Matanya, care to resubmit with just that" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86889 (owner: 10Matanya) [22:51:16] (03PS1) 10JGonera: Add mobile views to ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/91079 [22:53:46] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=ms-be1004.eqiad.wmnet&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Swift+eqiad [22:53:49] fun [22:57:06] (03PS1) 10Dzahn: add account fflorin and add to stat1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91084 [23:01:17] (03CR) 10Chad: [C: 032] bnwiki gets Cirrus as alternative [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/91038 (owner: 10Chad) [23:02:13] (03PS2) 10JGonera: Add mobile views to ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/91079 [23:03:27] (03CR) 10Dzahn: [C: 04-1] "not yet, pending manager approval" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91084 (owner: 10Dzahn) [23:05:01] !log mwalker synchronized php-1.22wmf21/extensions/CentralNotice/ 'Updating CentralNotice to master' [23:05:12] Logged the message, Master [23:05:32] !log mwalker synchronized php-1.22wmf22/extensions/CentralNotice/ 'Updating CentralNotice to master' [23:05:45] Logged the message, Master [23:06:20] (03PS1) 10Faidon Liambotis: Reenable LVS paging check for ms-fe.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/91090 [23:06:22] RobH: ping [23:07:09] !log demon synchronized wmf-config/InitialiseSettings.php 'bnwiki gets cirrus as secondary' [23:07:21] Logged the message, Master [23:07:40] (03CR) 10Faidon Liambotis: [C: 04-1] "Would it be reasonable for these to go to the Navigation Timing view instead? 
This looks like two graphs, might be a bit too excessive for" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91079 (owner: 10JGonera) [23:08:05] (03CR) 10Faidon Liambotis: [C: 032] Reenable LVS paging check for ms-fe.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/91090 (owner: 10Faidon Liambotis) [23:09:14] (03CR) 10Faidon Liambotis: [V: 032] Reenable LVS paging check for ms-fe.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/91090 (owner: 10Faidon Liambotis) [23:09:14] <^demon|lunch> !log elastic: index created for bnwiki, running force indexing in 4 processes on terbium [23:09:26] Logged the message, Master [23:12:07] (03CR) 10Ryan Lane: [C: 032] localssl: listen on both ipv6 and ipv4 sockets [operations/puppet] - 10https://gerrit.wikimedia.org/r/90738 (owner: 10Ryan Lane) [23:15:09] (03PS1) 10Ryan Lane: Pass localssl traffic to ipaddress rather than 127.0.0.1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91091 [23:15:13] paravoid: ^^ [23:16:59] !log CentralNotice deploy had an issue where we are now pushing ALL traffic to the mobile site for CN -- ganglia reports a spike and I'm now reverting the change [23:17:13] Logged the message, Master [23:18:18] nope [23:18:31] you either need to do scope.lookupvar [23:18:38] or copy $::ipaddress to the local scope [23:18:52] it's facter... [23:18:55] so it should be global [23:19:00] yes, it's global [23:19:11] but I don't think you can't just reference the global variable from within the template like that [23:19:17] since when? [23:19:18] (03PS3) 10Dzahn: add account marktraceur and add to stat1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91047 [23:19:23] is this a 3.0ism? [23:19:28] well, it might work with 2.7 and log warnings [23:20:16] Oct 21 23:01:16 stafford puppet-master[20189]: Dynamic lookup of $ipaddress at /etc/puppet/templates/nginx/sites/proxy.erb:100 is deprecated. Support will be removed in Puppet 2.8. 
Use a fully-qualified variable name (e.g., $classname::variable) or parameterized classes. [23:20:21] there you go :) [23:20:23] -_- [23:20:30] (03PS1) 10Andrew Bogott: Remove reference to ssh::bastion [operations/puppet] - 10https://gerrit.wikimedia.org/r/91094 [23:20:37] but yeah, I guess you won't be alone in that, so we can fix it when we fix all the others [23:20:45] some things should always be globally scopped [23:20:50] *scoped [23:21:00] but in their own namespaces [23:21:07] I wish puppet had a fact[''] namespace [23:21:09] (03CR) 10Dzahn: "Mark, if it was also a different key i would have just disabled the old one (ensuring the old key absent) and given you the new one, but 2" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91047 (owner: 10Dzahn) [23:21:16] !log mwalker synchronized php-1.22wmf22/extensions/CentralNotice/ 'Reverting earlier change' [23:21:25] and a global variable one too [23:21:30] Logged the message, Master [23:21:30] you can just do scope.lookupvar("::ipaddress") [23:21:32] then local variables would be in their own scope [23:21:36] bleh [23:22:08] puppet is one giant hack [23:22:18] I don't mind that [23:22:28] I was more annoyed at the implicit scoping tbh [23:22:39] facts pollute the scope [23:22:47] !log mwalker synchronized php-1.22wmf21/extensions/CentralNotice/ 'Reverting earlier change' [23:22:53] that's a separate issue, though [23:23:00] nod [23:23:03] Logged the message, Master [23:23:21] (03CR) 10Andrew Bogott: "Matanya, I'm about to merge this but you should have a look. No big deal, but could've been caught with a bit of grepping." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/91094 (owner: 10Andrew Bogott) [23:23:23] I guess scope.lookupvar('::globalvar') is an acceptable namespacing [23:23:47] (03PS2) 10Ryan Lane: Pass localssl traffic to ipaddress rather than 127.0.0.1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/91091 [23:23:49] paravoid: ^^ [23:24:59] so, I wonder if we should use ::ipaddress or the LVS service IP :) [23:25:36] (as for the scope lookup, I see @fqdn above, so it's not like you'd be alone in that) [23:26:48] so if we use the service IP, it will always go locally if the IP is bound on lo, which is what happens by lvs realserver [23:27:08] but if it's unbound, it'd still terminate SSL traffic and push it back to the rest of the servers via LVS [23:27:33] yeah, that's likely a good idea [23:27:33] the latter might be good, if there's a varnish issue, and it makes it more consistent with the non-localssl setup, but might also be counterintuitive [23:27:40] let's replace all the font package on imagescalers , yay https://gerrit.wikimedia.org/r/#/c/88441/ [23:27:52] I think going with lvs ip is good [23:28:20] paravoid: though, which IP will be used to communicate with it? [23:28:23] it's bound on lo [23:28:27] so it may be 127.0.0.1 [23:28:56] or would it be the lvs address? [23:29:36] I guess I could try and see :) [23:31:11] hm, is jenkins down? [23:31:56] looks like it comes in over the lvs IP [23:32:22] yep [23:32:34] ok, that looks like a winner [23:33:10] cp4001.ulsfo.wmnet 15 2013-10-21T23:31:22 0.000060320 198.35.26.106 hit/404 2802 GET http://bits.wikimedia.org/foo - - - - - [23:33:32] ah, there it goes [23:33:54] (03PS3) 10Ryan Lane: Pass localssl traffic to ipaddress_lo_lvs [operations/puppet] - 10https://gerrit.wikimedia.org/r/91091 [23:33:57] Ryan_Lane: that would make the config identical to non-local ssl too, wouldn't it? [23:33:59] paravoid: heh. you tested on the same host, eh? 
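The warning Ryan pastes earlier ("Dynamic lookup of $ipaddress at .../proxy.erb:100 is deprecated") is about bare variable references inside ERB templates resolving through dynamic scope. A minimal before/after sketch — the nginx directive here is an invented stand-in for whatever proxy.erb line 100 actually contains:

```erb
<%# Hypothetical stand-in for proxy.erb; only the lookup style matters. %>

<%# Deprecated: bare reference relies on dynamic scoping to find the fact. %>
proxy_pass http://<%= ipaddress %>;

<%# Safe: fully-qualified lookup of the global fact, as suggested in channel. %>
proxy_pass http://<%= scope.lookupvar('::ipaddress') %>;
```

An `@ipaddress` instance-variable reference also avoids the warning for facts, which matches the `@fqdn` style already used elsewhere in the same template.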
[23:34:02] (03CR) 10Andrew Bogott: [C: 032] Remove reference to ssh::bastion [operations/puppet] - 10https://gerrit.wikimedia.org/r/91094 (owner: 10Andrew Bogott) [23:34:06] paravoid: no [23:34:23] non-local ssl passes to a different internal -lb [23:34:29] because otherwise it would talk to itself [23:34:40] oh, right [23:35:26] you could do policy routing, but I'm so happy that we don't :) [23:35:39] heh [23:35:39] yeah [23:35:47] easier to just have an internal -lb [23:35:49] route 443 differently than 80 [23:36:06] * Ryan_Lane nods [23:36:08] well, route 80 to the outbound interface instead of lo [23:36:35] ok, I'm going to merge this and apply it [23:36:42] that only affects ulsfo, right? [23:36:47] traffic isn't going to ulsfo right now, so no worries on anything else [23:36:48] yeah [23:36:51] right [23:37:05] (03CR) 10Ryan Lane: [C: 032] Pass localssl traffic to ipaddress_lo_lvs [operations/puppet] - 10https://gerrit.wikimedia.org/r/91091 (owner: 10Ryan Lane) [23:37:29] stupid jenkins [23:37:30] :) [23:38:48] /names [23:39:22] hate it when that happens. sorry. [23:39:27] (03PS2) 10Andrew Bogott: bastion: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/87473 (owner: 10Matanya) [23:41:06] (03CR) 10Andrew Bogott: [C: 032] bastion: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/87473 (owner: 10Matanya) [23:43:46] <^demon|lunch> mutante: Went ahead and resolved 5867. I meant to this morning but forgot. [23:47:33] (03PS1) 10Andrew Bogott: Rename 'bastion' module to 'bastionhost' [operations/puppet] - 10https://gerrit.wikimedia.org/r/91097 [23:48:07] ^demon|lunch: thank you, that was the intention to confirm it's done [23:48:51] (03CR) 10Andrew Bogott: [C: 032] Rename 'bastion' module to 'bastionhost' [operations/puppet] - 10https://gerrit.wikimedia.org/r/91097 (owner: 10Andrew Bogott) [23:58:43] <^demon|lunch> manybubbles: bnwiki finished indexing.