[01:18:46] (PS3) Reedy: Remove $wgCopyrightIcon [mediawiki-config] - https://gerrit.wikimedia.org/r/261999 (https://phabricator.wikimedia.org/T122754) (owner: Florianschmidtwelzow) [01:44:03] Was there a mwgrep patch to skip private wikis? [01:46:41] Reedy: I think results are grouped away, but still in the results per default [01:46:47] yeah [01:55:47] https://phabricator.wikimedia.org/T71581 [01:57:38] PROBLEM - Hadoop NodeManager on analytics1031 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [02:18:44] RECOVERY - Hadoop NodeManager on analytics1031 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [02:25:06] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 22s) [02:25:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:31:58] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Jan 3 02:31:58 UTC 2016 (duration 6m 52s) [02:32:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:35:16] (PS1) Reedy: Optionally filter private wiki results in mwgrep [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) [02:35:18] legoktm: ^^ [02:36:26] (CR) Reedy: Optionally filter private wiki results in mwgrep (1 comment) [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) (owner: Reedy) [03:35:12] PROBLEM - puppet last run on mw1034 is CRITICAL: CRITICAL: Puppet has 2 failures [03:39:32] PROBLEM - puppet last run on mw2187 is CRITICAL: CRITICAL: Puppet has 1 failures [03:50:01] PROBLEM - puppet last run on mw2196 is CRITICAL: CRITICAL: puppet fail [04:02:11] RECOVERY - puppet last run on mw1034 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [04:02:49] RECOVERY - puppet last run on mw2187 is OK: OK: Puppet
is currently enabled, last run 4 seconds ago with 0 failures [04:17:39] RECOVERY - puppet last run on mw2196 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [04:18:09] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:19:11] PROBLEM - RAID on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:20:00] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [04:20:49] PROBLEM - puppet last run on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:38:48] PROBLEM - SSH on mw1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:42:49] RECOVERY - SSH on mw1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [04:45:39] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:49:18] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:49:29] PROBLEM - configured eth on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:49:58] PROBLEM - dhclient process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:51:18] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [04:51:38] RECOVERY - configured eth on mw1003 is OK: OK - interfaces up [04:51:48] RECOVERY - DPKG on mw1003 is OK: All packages OK [04:51:58] RECOVERY - dhclient process on mw1003 is OK: PROCS OK: 0 processes with command name dhclient [05:01:51] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:50] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:04:31] PROBLEM - configured eth on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:05:50] RECOVERY - DPKG on mw1003 is OK: All packages OK [05:07:20] RECOVERY - Disk space on mw1003 is OK: DISK OK [05:10:50] PROBLEM - nutcracker process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:11:01] PROBLEM - dhclient process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:11:51] PROBLEM - salt-minion processes on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:12:00] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:12:31] PROBLEM - SSH on mw1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:13:21] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:31] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:22:42] RECOVERY - SSH on mw1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [05:27:10] RECOVERY - nutcracker process on mw1003 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [05:27:10] RECOVERY - configured eth on mw1003 is OK: OK - interfaces up [05:27:31] RECOVERY - dhclient process on mw1003 is OK: PROCS OK: 0 processes with command name dhclient [05:27:41] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [05:27:52] RECOVERY - Disk space on mw1003 is OK: DISK OK [05:28:20] RECOVERY - salt-minion processes on mw1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [05:28:31] RECOVERY - DPKG on mw1003 is OK: All packages OK [05:37:43] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:38:44] PROBLEM - configured eth on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:40:34] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:43:25] PROBLEM - salt-minion processes on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:43:33] PROBLEM - nutcracker process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:45:15] PROBLEM - dhclient process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:47:14] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:48:44] PROBLEM - SSH on mw1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:03:41] RECOVERY - salt-minion processes on mw1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [06:03:41] RECOVERY - Disk space on mw1003 is OK: DISK OK [06:03:51] RECOVERY - SSH on mw1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [06:04:12] RECOVERY - dhclient process on mw1003 is OK: PROCS OK: 0 processes with command name dhclient [06:04:13] RECOVERY - configured eth on mw1003 is OK: OK - interfaces up [06:04:41] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [06:04:51] RECOVERY - nutcracker process on mw1003 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [06:07:41] RECOVERY - DPKG on mw1003 is OK: All packages OK [06:13:52] PROBLEM - salt-minion processes on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:13:52] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:14:41] PROBLEM - configured eth on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:16:02] PROBLEM - SSH on mw1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:16:11] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:16:42] PROBLEM - dhclient process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:17:02] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:17:12] PROBLEM - nutcracker process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:18:01] RECOVERY - salt-minion processes on mw1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [06:18:01] RECOVERY - DPKG on mw1003 is OK: All packages OK [06:18:01] RECOVERY - Disk space on mw1003 is OK: DISK OK [06:18:02] RECOVERY - SSH on mw1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [06:18:31] https://grafana.wikimedia.org/dashboard/db/server-board?panelId=7&fullscreen&var-server=mw1003 [06:18:42] RECOVERY - dhclient process on mw1003 is OK: PROCS OK: 0 processes with command name dhclient [06:18:42] RECOVERY - configured eth on mw1003 is OK: OK - interfaces up [06:19:02] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [06:19:03] https://grafana.wikimedia.org/dashboard/db/server-board?var-server=mw1003 [06:19:10] Looks like memory as well [06:19:11] anyway [06:19:13] RECOVERY - nutcracker process on mw1003 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [06:19:14] * Krinkle goes back into hiding [06:20:09] (CR) Krinkle: [C: -1] "Per Reedy" [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) (owner: Reedy) [06:25:21] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:26:21] PROBLEM - salt-minion processes on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:26:21] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:28:22] RECOVERY - salt-minion processes on mw1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [06:29:11] PROBLEM - configured eth on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:11] PROBLEM - dhclient process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:29:31] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [06:30:31] RECOVERY - DPKG on mw1003 is OK: All packages OK [06:31:21] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 5 failures [06:31:43] PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:51] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:11] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Puppet has 2 failures [06:33:22] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 2 failures [06:34:01] PROBLEM - puppet last run on db1045 is CRITICAL: CRITICAL: Puppet has 2 failures [06:34:02] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:51] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:01] RECOVERY - dhclient process on mw1003 is OK: PROCS OK: 0 processes with command name dhclient [06:36:02] RECOVERY - configured eth on mw1003 is OK: OK - interfaces up [06:36:52] PROBLEM - salt-minion processes on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:38:41] PROBLEM - nutcracker process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:38:51] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:40:02] PROBLEM - SSH on mw1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:42:12] PROBLEM - dhclient process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:42:21] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:42:22] PROBLEM - configured eth on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:45:21] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:54:41] RECOVERY - SSH on mw1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [06:54:51] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [06:54:51] RECOVERY - dhclient process on mw1003 is OK: PROCS OK: 0 processes with command name dhclient [06:55:01] RECOVERY - configured eth on mw1003 is OK: OK - interfaces up [06:55:22] RECOVERY - nutcracker process on mw1003 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [06:55:31] RECOVERY - DPKG on mw1003 is OK: All packages OK [06:55:32] RECOVERY - salt-minion processes on mw1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [06:56:02] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [06:56:33] RECOVERY - puppet last run on db1045 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [06:57:12] RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:21] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:31] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:32] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:52] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:52] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:00:01] RECOVERY - Disk space on mw1003 is OK: DISK OK [07:04:21] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:05:30] PROBLEM - dhclient process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:05:41] PROBLEM - nutcracker process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:06:50] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:11:21] PROBLEM - SSH on mw1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:11:42] RECOVERY - nutcracker process on mw1003 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [07:14:02] PROBLEM - configured eth on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:14:31] PROBLEM - salt-minion processes on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:18:10] PROBLEM - nutcracker process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:18:11] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:20:20] RECOVERY - puppet last run on elastic1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:32:49] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [07:32:59] RECOVERY - nutcracker process on mw1003 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [07:32:59] RECOVERY - configured eth on mw1003 is OK: OK - interfaces up [07:33:49] RECOVERY - SSH on mw1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [07:33:59] RECOVERY - salt-minion processes on mw1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [07:34:09] RECOVERY - dhclient process on mw1003 is OK: PROCS OK: 0 processes with command name dhclient [07:34:11] RECOVERY - DPKG on mw1003 is OK: All packages OK [07:35:09] RECOVERY - Disk space on mw1003 is OK: DISK OK [07:40:00] PROBLEM - salt-minion processes on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:40:20] PROBLEM - dhclient process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:44:30] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:45:00] PROBLEM - nutcracker process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:45:00] PROBLEM - configured eth on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:45:29] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:48:10] PROBLEM - SSH on mw1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:48:59] PROBLEM - nutcracker port on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:53:39] RECOVERY - Disk space on mw1003 is OK: DISK OK [07:59:59] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:09:14] RECOVERY - Disk space on mw1003 is OK: DISK OK [08:09:14] RECOVERY - salt-minion processes on mw1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:09:14] RECOVERY - DPKG on mw1003 is OK: All packages OK [08:10:13] RECOVERY - SSH on mw1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [08:12:43] RECOVERY - configured eth on mw1003 is OK: OK - interfaces up [08:15:33] PROBLEM - DPKG on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:17:34] PROBLEM - salt-minion processes on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:17:34] PROBLEM - Disk space on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:21:03] PROBLEM - configured eth on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:26:44] PROBLEM - SSH on mw1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:29:13] RECOVERY - nutcracker process on mw1003 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:29:13] RECOVERY - dhclient process on mw1003 is OK: PROCS OK: 0 processes with command name dhclient [08:29:13] PROBLEM - salt-minion processes on lvs3001 is CRITICAL: Timeout while attempting connection [08:31:14] PROBLEM - dhclient process on lvs3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:31:24] PROBLEM - salt-minion processes on lvs3003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [08:32:12] RECOVERY - dhclient process on lvs3001 is OK: PROCS OK: 0 processes with command name dhclient [08:34:02] PROBLEM - nutcracker process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:34:11] PROBLEM - dhclient process on mw1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:38:22] RECOVERY - Disk space on mw1003 is OK: DISK OK [08:39:41] RECOVERY - salt-minion processes on mw1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:40:01] RECOVERY - DPKG on mw1003 is OK: All packages OK [08:40:02] RECOVERY - nutcracker process on mw1003 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:40:02] RECOVERY - dhclient process on mw1003 is OK: PROCS OK: 0 processes with command name dhclient [08:40:22] RECOVERY - configured eth on mw1003 is OK: OK - interfaces up [08:40:53] RECOVERY - nutcracker port on mw1003 is OK: TCP OK - 0.000 second response time on port 11212 [08:41:12] RECOVERY - SSH on mw1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [08:41:12] RECOVERY - RAID on mw1003 is OK: OK: no RAID installed [08:44:22] RECOVERY - salt-minion processes on lvs3003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:45:23] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:54:21] RECOVERY - salt-minion processes on lvs3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [09:35:45] (PS1) Faidon Liambotis: mediawiki: silence HHVM jobrunner restart cron [puppet] - https://gerrit.wikimedia.org/r/262079 [09:36:44] (CR) Faidon Liambotis: [C: 2 V: 2] mediawiki: silence HHVM jobrunner restart cron [puppet] - https://gerrit.wikimedia.org/r/262079 (owner: Faidon Liambotis) [10:35:18] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 786 [10:40:18] RECOVERY - check_mysql on db1008 is OK: Uptime: 1101802 Threads: 143 Questions: 39549421 Slow queries: 13588 Opens: 58860 Flush tables: 2 Open tables: 416 Queries per second avg: 35.895 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [10:43:20] PROBLEM - Hadoop NodeManager on analytics1028 is
CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [10:53:39] RECOVERY - Hadoop NodeManager on analytics1028 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [13:52:30] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: puppet fail [13:56:09] PROBLEM - Host ms-be2007 is DOWN: PING CRITICAL - Packet loss = 100% [14:19:33] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:25:30] (CR) Reedy: "Obviously, as the private filtering is done post ES, the max-results amount might not be adhered to, if it has private wiki results" [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) (owner: Reedy) [17:37:01] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0] [17:37:42] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [1000.0] [17:49:21] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [17:50:10] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:15:20] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: puppet fail [18:41:24] RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [19:25:04] PROBLEM - puppet last run on ms-be2014 is CRITICAL: CRITICAL: puppet fail [19:49:45] PROBLEM - RAID on db2018 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [19:50:46] RECOVERY - puppet last run on ms-be2014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:50:56] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 1 failures [20:15:34] RECOVERY - puppet last run on
mw1205 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:08:09] (CR) Krinkle: "Ideally the output of this would be easily shareable without needing to redact anything. So maybe if this flag is passed, omit everything " [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) (owner: Reedy) [21:08:58] (CR) Reedy: "I suspect the count tells very little, but I know what you're saying" [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) (owner: Reedy) [21:12:30] (CR) Reedy: Optionally filter private wiki results in mwgrep (1 comment) [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) (owner: Reedy) [21:26:44] (PS2) Reedy: Optionally filter private wiki results in mwgrep [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) [21:31:22] (PS3) Reedy: Optionally filter private wiki results in mwgrep [puppet] - https://gerrit.wikimedia.org/r/262068 (https://phabricator.wikimedia.org/T71581) [21:41:50] PROBLEM - puppet last run on mw2195 is CRITICAL: CRITICAL: Puppet has 1 failures [22:06:55] RECOVERY - puppet last run on mw2195 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [22:52:18] PROBLEM - puppet last run on cp2022 is CRITICAL: CRITICAL: puppet fail [23:19:37] RECOVERY - puppet last run on cp2022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures