[00:42:18] PROBLEM - RAID on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:28] PROBLEM - check configured eth on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:48] PROBLEM - check if dhclient is running on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:48] PROBLEM - DPKG on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:48] PROBLEM - puppet last run on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:58] PROBLEM - SSH on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:43:08] PROBLEM - Disk space on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:43:08] PROBLEM - MediaWiki profile collector on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:43:08] PROBLEM - Graphite Carbon on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:43:18] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:43:18] PROBLEM - uWSGI web apps on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:43:28] RECOVERY - check configured eth on tungsten is OK: NRPE: Unable to read output [00:43:38] RECOVERY - check if dhclient is running on tungsten is OK: PROCS OK: 0 processes with command name dhclient [00:43:38] RECOVERY - DPKG on tungsten is OK: All packages OK [00:43:38] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 784 seconds ago with 0 failures [00:43:48] RECOVERY - SSH on tungsten is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [00:43:58] RECOVERY - Disk space on tungsten is OK: DISK OK [00:43:58] RECOVERY - Graphite Carbon on tungsten is OK: OK: All defined Carbon jobs are runnning. [00:43:58] RECOVERY - MediaWiki profile collector on tungsten is OK: OK: All defined mwprof jobs are runnning. 
[00:44:08] RECOVERY - gdash.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 9055 bytes in 0.018 second response time [00:44:09] RECOVERY - RAID on tungsten is OK: OK: optimal, 1 logical, 2 physical [00:44:09] RECOVERY - uWSGI web apps on tungsten is OK: OK: All defined uWSGI apps are runnning. [01:22:40] !log ongoing osc_host.sh schema change jobs on terbium. fine to kill in an emergency [01:22:47] Logged the message, Master [01:28:01] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:39:44] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.009 second response time [02:16:08] !log LocalisationUpdate completed (1.24wmf11) at 2014-07-05 02:15:05+00:00 [02:16:16] Logged the message, Master [02:27:11] !log LocalisationUpdate completed (1.24wmf12) at 2014-07-05 02:26:08+00:00 [02:27:17] Logged the message, Master [02:53:10] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 5 02:52:04 UTC 2014 (duration 52m 3s) [02:53:15] Logged the message, Master [04:21:01] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 02:20:25 UTC [04:49:38] PROBLEM - SSH on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:49:48] PROBLEM - puppet last run on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:50:28] PROBLEM - MediaWiki profile collector on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:50:28] PROBLEM - Graphite Carbon on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:50:38] RECOVERY - SSH on tungsten is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [04:50:38] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 1230 seconds ago with 0 failures [04:51:18] RECOVERY - Graphite Carbon on tungsten is OK: OK: All defined Carbon jobs are runnning. 
[04:51:18] RECOVERY - MediaWiki profile collector on tungsten is OK: OK: All defined mwprof jobs are runnning. [05:00:14] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Sat Jul 5 05:00:05 UTC 2014 [05:12:54] PROBLEM - puppet last run on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:14] PROBLEM - SSH on rhenium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:13:24] PROBLEM - DPKG on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:24] PROBLEM - check configured eth on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:24] PROBLEM - Disk space on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:25] PROBLEM - RAID on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:25] PROBLEM - check if dhclient is running on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:54] PROBLEM - puppet disabled on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:25:31] PROBLEM - NTP on rhenium is CRITICAL: NTP CRITICAL: No response from NTP server [05:57:41] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: Puppet has 1 failures [06:16:48] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [06:28:04] PROBLEM - puppet last run on mc1002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:14] PROBLEM - puppet last run on search1010 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:15] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:24] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:24] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 2 failures [06:28:24] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:24] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:34] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 2 failures [06:28:44] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 2 failures [06:28:44] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 2 failures [06:28:44] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:44] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 2 failures [06:28:44] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 2 failures [06:28:45] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 2 failures [06:28:45] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:54] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 2 failures [06:28:54] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:04] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:04] PROBLEM 
- puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 3 failures [06:29:04] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 2 failures [06:29:04] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:04] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:05] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Puppet has 2 failures [06:29:05] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:14] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 2 failures [06:29:14] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 2 failures [06:29:14] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:14] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:14] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:24] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:34] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:34] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:04] PROBLEM - puppet last run on db1042 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:04] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:34] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Puppet has 1 failures [06:38:24] PROBLEM - puppet last run on db1035 is CRITICAL: CRITICAL: Puppet has 1 failures [06:41:26] PROBLEM - puppet last run on es1003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:44:16] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:44:16] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures 
[06:45:06] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:45:06] RECOVERY - puppet last run on mc1002 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [06:45:26] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [06:45:26] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [06:45:26] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [06:45:36] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [06:45:46] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:45:46] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:45:46] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [06:45:46] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [06:46:06] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently 
enabled, last run 24 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [06:46:16] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [06:46:16] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [06:46:26] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [06:46:26] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [06:46:26] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:46:36] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [06:46:46] RECOVERY - puppet last run on db1042 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [06:46:46] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [06:46:46] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [06:46:46] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [06:46:46] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:46:47] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:47:16] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [06:47:36] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:47:56] PROBLEM - 
puppet last run on db1002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:48:26] RECOVERY - puppet last run on searchidx1001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:50:26] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Puppet has 1 failures [06:53:06] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 04:52:37 UTC [06:57:26] RECOVERY - puppet last run on db1035 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:59:19] RECOVERY - puppet last run on es1003 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [07:06:09] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [07:06:10] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [07:07:29] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [07:09:39] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Puppet has 1 failures [07:09:59] PROBLEM - Puppet freshness on rhenium is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 05:09:48 UTC [07:27:42] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [07:33:12] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Jul 5 07:33:05 UTC 2014 [07:43:41] PROBLEM - SSH on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:43:41] PROBLEM - MediaWiki profile collector on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:43:51] PROBLEM - puppet last run on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:44:21] PROBLEM - DPKG on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:44:41] PROBLEM - RAID on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:44:41] PROBLEM - check configured eth on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:44:51] PROBLEM - uWSGI web apps on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:44:51] PROBLEM - Disk space on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:45:11] PROBLEM - check if dhclient is running on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:45:11] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:45:11] PROBLEM - Graphite Carbon on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:45:41] RECOVERY - SSH on tungsten is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [07:45:41] RECOVERY - check configured eth on tungsten is OK: NRPE: Unable to read output [07:45:41] RECOVERY - MediaWiki profile collector on tungsten is OK: OK: All defined mwprof jobs are runnning. [07:45:41] RECOVERY - uWSGI web apps on tungsten is OK: OK: All defined uWSGI apps are runnning. [07:45:42] RECOVERY - Disk space on tungsten is OK: DISK OK [07:45:42] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 965 seconds ago with 0 failures [07:45:51] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:46:01] RECOVERY - check if dhclient is running on tungsten is OK: PROCS OK: 0 processes with command name dhclient [07:46:01] RECOVERY - gdash.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 9055 bytes in 0.015 second response time [07:46:01] RECOVERY - Graphite Carbon on tungsten is OK: OK: All defined Carbon jobs are runnning. 
[07:46:12] RECOVERY - DPKG on tungsten is OK: All packages OK [07:46:31] RECOVERY - RAID on tungsten is OK: OK: optimal, 1 logical, 2 physical [07:46:41] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.004 second response time [08:17:35] (Abandoned) Hashar: contint: compress Jenkins console logs once per day [operations/puppet] - https://gerrit.wikimedia.org/r/125991 (https://bugzilla.wikimedia.org/63939) (owner: Hashar) [08:27:44] PROBLEM - puppet last run on ms-fe1003 is CRITICAL: CRITICAL: Puppet has 1 failures [08:28:14] PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: Puppet has 1 failures [08:28:14] PROBLEM - puppet last run on pc1001 is CRITICAL: CRITICAL: Puppet has 1 failures [08:29:14] PROBLEM - puppet last run on mw1072 is CRITICAL: CRITICAL: Puppet has 1 failures [08:30:44] PROBLEM - puppet last run on mw1080 is CRITICAL: CRITICAL: Puppet has 1 failures [08:40:05] PROBLEM - check if dhclient is running on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:40:15] PROBLEM - SSH on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:40:25] PROBLEM - check configured eth on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:40:45] RECOVERY - puppet last run on ms-fe1003 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [08:41:05] RECOVERY - SSH on tungsten is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [08:41:05] RECOVERY - check if dhclient is running on tungsten is OK: PROCS OK: 0 processes with command name dhclient [08:41:15] RECOVERY - check configured eth on tungsten is OK: NRPE: Unable to read output [08:42:15] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [08:42:15] RECOVERY - puppet last run on mc1009 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [08:42:15] RECOVERY - puppet last run on pc1001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [08:42:45] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [09:10:48] PROBLEM - Puppet freshness on rhenium is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 05:09:48 UTC [10:05:01] PROBLEM - MySQL Processlist on db1002 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 113 statistics [10:06:01] RECOVERY - MySQL Processlist on db1002 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 21 statistics [10:44:39] there are more public https hosts with an old libssl https://bugzilla.wikimedia.org/show_bug.cgi?id=53259#c19 though not as important as wikitech, it is probably systematic, so it should be checked on all hosts [11:04:13] (PS1) TTO: Add a complete list of local interwikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/144264 (https://bugzilla.wikimedia.org/954) [11:11:47] PROBLEM - Puppet freshness on rhenium is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 05:09:48 UTC [13:12:35] PROBLEM - Puppet freshness on rhenium is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 05:09:48 UTC [13:19:27] PROBLEM - puppet last 
run on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:20:17] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 647 seconds ago with 0 failures [14:19:30] PROBLEM - RAID on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:20:20] RECOVERY - RAID on tungsten is OK: OK: optimal, 1 logical, 2 physical [15:13:05] PROBLEM - Puppet freshness on rhenium is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 05:09:48 UTC [15:14:05] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 13:13:14 UTC [15:30:34] PROBLEM - RAID on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:30:34] PROBLEM - DPKG on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:30:54] PROBLEM - Disk space on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:31:24] RECOVERY - DPKG on tungsten is OK: All packages OK [15:31:25] RECOVERY - RAID on tungsten is OK: OK: optimal, 1 logical, 2 physical [15:31:44] RECOVERY - Disk space on tungsten is OK: DISK OK [16:13:23] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Jul 5 16:13:21 UTC 2014 [16:26:04] PROBLEM - SSH on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:26:34] PROBLEM - RAID on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:26:44] PROBLEM - puppet last run on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:26:44] PROBLEM - Graphite Carbon on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:26:44] PROBLEM - check configured eth on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:26:44] PROBLEM - MediaWiki profile collector on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[16:26:54] RECOVERY - SSH on tungsten is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [16:28:24] RECOVERY - RAID on tungsten is OK: OK: optimal, 1 logical, 2 physical [16:28:34] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 1130 seconds ago with 0 failures [16:28:34] RECOVERY - check configured eth on tungsten is OK: NRPE: Unable to read output [16:28:34] RECOVERY - Graphite Carbon on tungsten is OK: OK: All defined Carbon jobs are runnning. [16:28:34] RECOVERY - MediaWiki profile collector on tungsten is OK: OK: All defined mwprof jobs are runnning. [17:13:40] PROBLEM - Puppet freshness on rhenium is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 05:09:48 UTC [18:59:13] ori: ? [19:00:43] PROBLEM - RAID on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:01:15] PROBLEM - SSH on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:01:23] PROBLEM - Graphite Carbon on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:01:23] PROBLEM - check configured eth on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:01:23] PROBLEM - puppet last run on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:01:24] PROBLEM - uWSGI web apps on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:01:43] PROBLEM - DPKG on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:01:43] PROBLEM - Disk space on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:01:43] PROBLEM - MediaWiki profile collector on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:01:53] PROBLEM - check if dhclient is running on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[19:02:13] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:33] RECOVERY - Disk space on tungsten is OK: DISK OK [19:02:34] RECOVERY - DPKG on tungsten is OK: All packages OK [19:02:34] RECOVERY - MediaWiki profile collector on tungsten is OK: OK: All defined mwprof jobs are runnning. [19:02:34] RECOVERY - RAID on tungsten is OK: OK: optimal, 1 logical, 2 physical [19:02:43] RECOVERY - check if dhclient is running on tungsten is OK: PROCS OK: 0 processes with command name dhclient [19:03:03] RECOVERY - gdash.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 9055 bytes in 0.014 second response time [19:03:03] RECOVERY - SSH on tungsten is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [19:03:13] RECOVERY - Graphite Carbon on tungsten is OK: OK: All defined Carbon jobs are runnning. [19:03:13] RECOVERY - check configured eth on tungsten is OK: NRPE: Unable to read output [19:03:14] RECOVERY - uWSGI web apps on tungsten is OK: OK: All defined uWSGI apps are runnning. [19:03:14] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 775 seconds ago with 0 failures [19:13:53] PROBLEM - Puppet freshness on rhenium is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 05:09:48 UTC [20:18:25] matanya: hm? [20:18:45] Hi ori, I met on friday a developer from zend [20:19:00] hello [20:19:07] he works on php optimization, and he was interested in helping out [20:19:17] I spoke with him about hhvm and stuff [20:19:44] what sort of help? [20:20:00] he said there is probably a lot of space to improve performance [20:20:17] in hhvm or zend php or mediawiki code? [20:20:29] (it's true for all three.) [20:20:34] in terms of php optimization zend or hhvm [20:21:01] i don't know if he knows mediawiki very well. 
But i'm sure he knows the internals of zend [20:21:11] and probably some hhvm too [20:21:35] probably best to direct to mediawiki-vagrant, which makes it easy to switch between the different interpreters [20:21:44] and see if he can make zend php perf comparable to hhvm's [20:21:58] he said that if WMF is interested he would like to help out. Can i introduce you to each other ? [20:22:17] sure, feel free. i'm ori@wikimedia.org. [20:22:26] great, thanks a lot [20:22:39] thank you [20:25:44] PROBLEM - RAID on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:26:34] RECOVERY - RAID on tungsten is OK: OK: optimal, 1 logical, 2 physical [20:40:28] hey ori [20:40:34] hey yuvi [20:40:35] ori: happy adopted country independence day! [20:40:45] ori: also, see http://tools.wmflabs.org/grafana/giraffe/index.html#dashboard=ToolLabs+Basics&timeFrame=1d I got tired of grafana [20:41:07] (I'll admit my charting skills leave something to be desired) [20:41:09] what didn't you like about it? [20:41:46] ori: so, 1. no JSONP support, 2. adding new graphs is... confusing. I need to use the UI (the dashboards/.js format doesn't seem too documented) [20:41:59] 1 is probably no issue when it is deployed, but (2) felt really painful [20:42:14] my tools instance of grafana is still running at http://tools.wmflabs.org/grafana/index.html [20:43:09] giraffe doesn't pretend to have UI 'drag/drop' dashboards, has a fairly nicely documented config for charts, and uses a rather configurable charting library as well (Rickshaw) [20:46:14] i have a special hatred for this entire ecosystem, to be honest [20:46:20] I can see why [20:46:34] well, I think I can see why [20:46:52] gah, to rephrase again, *I* have hate for this ecosystem as well :) But unsure if it is the same as yours. 
[21:14:27] PROBLEM - Puppet freshness on rhenium is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 05:09:48 UTC [21:26:07] PROBLEM - check if dhclient is running on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:26:07] PROBLEM - MediaWiki profile collector on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:26:07] PROBLEM - SSH on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:26:58] PROBLEM - Graphite Carbon on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:27:07] PROBLEM - gdash.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:27:27] PROBLEM - check configured eth on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:27:37] PROBLEM - Disk space on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:28:07] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:17] PROBLEM - puppet last run on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:28:28] RECOVERY - Disk space on tungsten is OK: DISK OK [21:28:31] poor tungsten [21:28:47] RECOVERY - Graphite Carbon on tungsten is OK: OK: All defined Carbon jobs are runnning. [21:28:57] RECOVERY - check if dhclient is running on tungsten is OK: PROCS OK: 0 processes with command name dhclient [21:28:57] RECOVERY - MediaWiki profile collector on tungsten is OK: OK: All defined mwprof jobs are runnning. 
[21:28:57] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.005 second response time [21:28:57] RECOVERY - gdash.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 9055 bytes in 0.156 second response time [21:28:57] RECOVERY - SSH on tungsten is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [21:29:07] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 1181 seconds ago with 0 failures [21:29:17] RECOVERY - check configured eth on tungsten is OK: NRPE: Unable to read output [22:08:00] (PS1) Yuvipanda: diamond: Remove archive.log handler [operations/puppet] - https://gerrit.wikimedia.org/r/144330 [22:10:00] (PS1) Yuvipanda: diamond: keep only current day's logs when running in labs [operations/puppet] - https://gerrit.wikimedia.org/r/144331 [22:46:16] PROBLEM - RAID on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:46:56] PROBLEM - DPKG on tungsten is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:47:06] RECOVERY - RAID on tungsten is OK: OK: optimal, 1 logical, 2 physical [22:47:09] poor tungsten again [22:47:46] RECOVERY - DPKG on tungsten is OK: All packages OK [22:49:13] (CR) Yuvipanda: "Why wasn't the sudo rule everywhere? It is part of the collector definition..." [operations/puppet] - https://gerrit.wikimedia.org/r/144161 (owner: coren) [23:14:52] PROBLEM - Puppet freshness on rhenium is CRITICAL: Last successful Puppet run was Sat 05 Jul 2014 05:09:48 UTC