[00:01:43] Of course, having them run automatically from terbium would require https://phabricator.wikimedia.org/T98682 being fixed [00:02:08] PROBLEM - puppet last run on mw2029 is CRITICAL puppet fail [00:06:10] 6operations, 6Labs, 10wikitech.wikimedia.org, 7Database: labswiki DB is inaccessible from tin, terbium, etc. - https://phabricator.wikimedia.org/T98682#1426876 (10Krenair) This also prevents us from being able to add silver to the wikis which get QueryPages like Special:Wantedpages automatically updated [00:19:10] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected [00:20:29] RECOVERY - puppet last run on mw2029 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures [00:22:19] PROBLEM - Apache HTTP on mw1108 is CRITICAL - Socket timeout after 10 seconds [00:22:29] PROBLEM - HHVM rendering on mw1108 is CRITICAL - Socket timeout after 10 seconds [00:22:49] PROBLEM - Hadoop NodeManager on analytics1028 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [00:23:09] PROBLEM - nutcracker port on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:23:28] PROBLEM - configured eth on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:23:28] PROBLEM - puppet last run on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:23:39] PROBLEM - SSH on mw1108 is CRITICAL: Server answer [00:23:49] PROBLEM - dhclient process on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:23:59] PROBLEM - salt-minion processes on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:24:09] PROBLEM - HHVM processes on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:24:09] PROBLEM - nutcracker process on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:24:18] PROBLEM - DPKG on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:24:18] PROBLEM - Disk space on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:24:19] PROBLEM - RAID on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
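(Context for the T98682 comment above: QueryPage special pages such as Special:WantedPages are refreshed by a periodic per-wiki maintenance run; the task blocks adding labswiki/silver to that run because the maintenance host cannot reach the labswiki database. A minimal sketch in Python of what such a per-wiki refresh loop might look like, assuming WMF's mwscript wrapper and a plain-text dblist; the real job is a cron defined in puppet and its exact invocation is not shown in this log.)

#!/usr/bin/env python3
# Illustrative only: refresh QueryPage caches (Special:WantedPages and
# friends) on every wiki listed in a dblist file. The 'mwscript' wrapper
# invocation and the dblist path are assumptions about the maintenance host.
import subprocess

DBLIST = "/srv/mediawiki/dblists/all.dblist"  # assumed path

def refresh_special_pages(dblist=DBLIST):
    with open(dblist) as f:
        wikis = [line.strip() for line in f if line.strip()]
    for wiki in wikis:
        # updateSpecialPages.php is the stock MediaWiki maintenance script
        # behind these periodic refreshes.
        subprocess.run(
            ["mwscript", "updateSpecialPages.php", "--wiki", wiki],
            check=False,  # keep going even if one wiki fails
        )

if __name__ == "__main__":
    refresh_special_pages()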
[00:35:38] RECOVERY - Hadoop NodeManager on analytics1028 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [00:36:59] RECOVERY - Disk space on mw1108 is OK: DISK OK [00:36:59] RECOVERY - nutcracker process on mw1108 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [00:36:59] RECOVERY - DPKG on mw1108 is OK: All packages OK [00:37:08] RECOVERY - RAID on mw1108 is OK no RAID installed [00:37:48] RECOVERY - nutcracker port on mw1108 is OK: TCP OK - 0.000 second response time on port 11212 [00:37:59] RECOVERY - puppet last run on mw1108 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:38:08] RECOVERY - configured eth on mw1108 is OK - interfaces up [00:38:19] RECOVERY - SSH on mw1108 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [00:38:20] RECOVERY - dhclient process on mw1108 is OK: PROCS OK: 0 processes with command name dhclient [00:38:30] RECOVERY - salt-minion processes on mw1108 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [00:38:40] RECOVERY - Apache HTTP on mw1108 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.038 second response time [00:38:48] RECOVERY - HHVM processes on mw1108 is OK: PROCS OK: 6 processes with command name hhvm [00:38:50] RECOVERY - HHVM rendering on mw1108 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.124 second response time [01:23:30] (03CR) 10Krinkle: [C: 04-1] "If we remove 'contentadmin' from the labswiki entry in InitialiseSettings, please leave a comment in its place that we must never add a 's" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222776 (owner: 10Alex Monk) [01:43:08] (03PS1) 10Alex Monk: Remove bastion1 and bastion2 from labs bastion hosts list [puppet] - 10https://gerrit.wikimedia.org/r/222871 [01:46:23] (03PS2) 10Alex Monk: wikitech: Clean up contentadmin rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222776 [01:48:12] (03CR) 10Alex Monk: "Those IPs point to these instances now:" [puppet] - 10https://gerrit.wikimedia.org/r/222871 (owner: 10Alex Monk) [02:17:18] PROBLEM - puppet last run on mw1091 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:17:39] PROBLEM - RAID on mw1091 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:18:09] PROBLEM - HHVM rendering on mw1091 is CRITICAL - Socket timeout after 10 seconds [02:18:49] PROBLEM - Apache HTTP on mw1091 is CRITICAL - Socket timeout after 10 seconds [02:20:09] PROBLEM - nutcracker process on mw1091 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:20:28] PROBLEM - HHVM processes on mw1091 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:20:58] PROBLEM - nutcracker port on mw1091 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:23:46] PROBLEM - SSH on mw1091 is CRITICAL - Socket timeout after 10 seconds [02:23:47] PROBLEM - DPKG on mw1091 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:23:47] PROBLEM - configured eth on mw1091 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:23:47] PROBLEM - Disk space on mw1091 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:23:47] PROBLEM - salt-minion processes on mw1091 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:23:47] PROBLEM - dhclient process on mw1091 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:24:28] RECOVERY - nutcracker process on mw1091 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [02:24:30] RECOVERY - HHVM processes on mw1091 is OK: PROCS OK: 6 processes with command name hhvm [02:24:58] RECOVERY - nutcracker port on mw1091 is OK: TCP OK - 0.000 second response time on port 11212 [02:25:08] RECOVERY - DPKG on mw1091 is OK: All packages OK [02:25:08] RECOVERY - SSH on mw1091 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [02:25:08] RECOVERY - configured eth on mw1091 is OK - interfaces up [02:25:18] RECOVERY - Disk space on mw1091 is OK: DISK OK [02:25:29] RECOVERY - salt-minion processes on mw1091 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [02:25:38] RECOVERY - puppet last run on mw1091 is OK Puppet is currently enabled, last run 29 minutes ago with 0 failures [02:25:39] RECOVERY - dhclient process on mw1091 is OK: PROCS OK: 0 processes with command name dhclient [02:25:50] RECOVERY - RAID on mw1091 is OK no RAID installed [02:26:44] !log l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 13m 05s) [02:29:00] RECOVERY - Apache HTTP on mw1058 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.047 second response time [02:29:08] RECOVERY - Apache HTTP on mw1091 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 6.929 second response time [02:29:09] RECOVERY - HHVM rendering on mw1058 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.169 second response time [02:29:09] RECOVERY - HHVM rendering on mw1075 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.233 second response time [02:29:18] RECOVERY - Apache HTTP on mw1113 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.287 second response time [02:29:30] RECOVERY - HHVM rendering on mw1113 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.298 second response time [02:29:50] PROBLEM - HHVM rendering on mw1061 is CRITICAL - Socket timeout after 10 seconds [02:32:51] RECOVERY - Apache HTTP on mw1075 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.061 second response time [02:32:51] RECOVERY - HHVM rendering on mw1091 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.182 second response time [02:32:52] PROBLEM - Apache HTTP on mw1065 is CRITICAL - Socket timeout after 10 seconds [02:32:52] PROBLEM - Apache HTTP on mw1102 is CRITICAL - Socket timeout after 10 seconds [02:32:52] PROBLEM - Apache HTTP on mw1061 is CRITICAL - Socket timeout after 10 seconds [02:32:52] RECOVERY - HHVM rendering on mw1110 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.134 second response time [02:32:52] RECOVERY - Apache HTTP on mw1110 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.040 second response time [02:32:52] PROBLEM - SSH on mw1047 is CRITICAL - Socket timeout after 10 seconds [02:32:52] PROBLEM - Apache HTTP on mw1047 is CRITICAL: Connection timed out [02:32:52] PROBLEM - Apache HTTP on mw1089 is CRITICAL: Connection timed out [02:32:52] PROBLEM - puppet last run on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:32:52] PROBLEM - HHVM rendering on mw1109 is CRITICAL: Connection timed out [02:32:52] PROBLEM - HHVM rendering on mw1149 is CRITICAL: Connection timed out [02:32:52] PROBLEM - HHVM rendering on mw1047 is CRITICAL: Connection timed out [02:32:52] PROBLEM - Apache HTTP on mw1088 is CRITICAL - Socket timeout after 10 seconds [02:32:52] PROBLEM - HHVM rendering on mw1065 is CRITICAL - Socket timeout after 10 seconds [02:32:52] PROBLEM - DPKG on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - puppet last run on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - Apache HTTP on mw1084 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.019 second response time [02:32:52] PROBLEM - HHVM rendering on mw1084 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.016 second response time [02:32:52] PROBLEM - puppet last run on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - nutcracker port on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - puppet last run on mw1072 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - nutcracker port on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - HHVM rendering on mw1072 is CRITICAL: Connection timed out [02:32:52] PROBLEM - Apache HTTP on mw1072 is CRITICAL: Connection timed out [02:32:52] PROBLEM - DPKG on mw1089 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - DPKG on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - HHVM rendering on mw1089 is CRITICAL - Socket timeout after 10 seconds [02:32:52] PROBLEM - DPKG on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - SSH on mw1072 is CRITICAL - Socket timeout after 10 seconds [02:32:52] PROBLEM - salt-minion processes on mw1089 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - nutcracker process on mw1089 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - puppet last run on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - salt-minion processes on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - RAID on mw1072 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - HHVM rendering on mw1102 is CRITICAL: Connection timed out [02:32:52] PROBLEM - HHVM rendering on mw1088 is CRITICAL: Connection timed out [02:32:52] PROBLEM - SSH on mw1065 is CRITICAL: Server answer [02:32:52] PROBLEM - Apache HTTP on mw1149 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 7.579 second response time [02:32:52] PROBLEM - HHVM processes on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - nutcracker process on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - Disk space on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - nutcracker port on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - salt-minion processes on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - RAID on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:32:52] PROBLEM - nutcracker port on mw1072 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - salt-minion processes on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - HHVM processes on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - RAID on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - RAID on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - configured eth on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - nutcracker port on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - RAID on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - configured eth on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - DPKG on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - dhclient process on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - configured eth on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:52] PROBLEM - Disk space on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:58] PROBLEM - SSH on mw1061 is CRITICAL - Socket timeout after 10 seconds [02:32:59] PROBLEM - Disk space on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:59] PROBLEM - nutcracker process on mw1072 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:32:59] PROBLEM - dhclient process on mw1072 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:09] PROBLEM - RAID on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:09] PROBLEM - nutcracker process on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:09] PROBLEM - RAID on mw1149 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:18] PROBLEM - configured eth on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:18] PROBLEM - dhclient process on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:19] PROBLEM - puppet last run on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:19] PROBLEM - SSH on mw1088 is CRITICAL - Socket timeout after 10 seconds [02:33:20] PROBLEM - RAID on mw1089 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:20] PROBLEM - SSH on mw1109 is CRITICAL - Socket timeout after 10 seconds [02:33:44] PROBLEM - Apache HTTP on mw1109 is CRITICAL: Connection timed out [02:33:44] wow looks like something wrong is going on... also I;m getting a lot of 503s from wikidata suddenly [02:33:45] PROBLEM - dhclient process on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - Disk space on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - DPKG on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - puppet last run on mw1089 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - SSH on mw1074 is CRITICAL - Socket timeout after 10 seconds [02:33:45] PROBLEM - Disk space on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - nutcracker process on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:33:45] PROBLEM - HHVM processes on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - SSH on mw1089 is CRITICAL - Socket timeout after 10 seconds [02:33:45] PROBLEM - salt-minion processes on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - nutcracker port on mw1089 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - configured eth on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - DPKG on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - RAID on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - HHVM processes on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:48] PROBLEM - dhclient process on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:49] PROBLEM - configured eth on mw1089 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:49] PROBLEM - nutcracker process on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:58] PROBLEM - Disk space on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:58] PROBLEM - puppet last run on mw1102 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:58] PROBLEM - nutcracker process on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:58] RECOVERY - HHVM rendering on mw1072 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.136 second response time [02:33:59] RECOVERY - puppet last run on mw1072 is OK Puppet is currently enabled, last run 11 minutes ago with 0 failures [02:33:59] RECOVERY - Apache HTTP on mw1072 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.041 second response time [02:33:59] RECOVERY - SSH on mw1072 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [02:34:00] PROBLEM - dhclient process on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:08] PROBLEM - salt-minion processes on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:09] PROBLEM - nutcracker port on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:18] RECOVERY - RAID on mw1072 is OK no RAID installed [02:34:19] PROBLEM - HHVM processes on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:19] RECOVERY - Apache HTTP on mw1149 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.058 second response time [02:34:19] PROBLEM - dhclient process on mw1089 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:19] PROBLEM - dhclient process on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:29] PROBLEM - nutcracker process on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:29] RECOVERY - nutcracker port on mw1072 is OK: TCP OK - 0.000 second response time on port 11212 [02:34:55] RECOVERY - Apache HTTP on mw1102 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.107 second response time [02:34:55] PROBLEM - Disk space on mw1089 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:55] PROBLEM - HHVM processes on mw1089 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:34:55] RECOVERY - nutcracker process on mw1072 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [02:34:55] RECOVERY - dhclient process on mw1072 is OK: PROCS OK: 0 processes with command name dhclient [02:34:58] PROBLEM - salt-minion processes on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:59] RECOVERY - RAID on mw1149 is OK no RAID installed [02:35:00] PROBLEM - nutcracker port on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:35:08] PROBLEM - HHVM processes on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:35:18] PROBLEM - configured eth on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:35:39] RECOVERY - HHVM rendering on mw1149 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.142 second response time [02:35:49] RECOVERY - puppet last run on mw1102 is OK Puppet is currently enabled, last run 14 minutes ago with 0 failures [02:35:49] RECOVERY - configured eth on mw1089 is OK - interfaces up [02:35:59] RECOVERY - DPKG on mw1089 is OK: All packages OK [02:35:59] RECOVERY - nutcracker port on mw1047 is OK: TCP OK - 0.000 second response time on port 11212 [02:35:59] RECOVERY - nutcracker process on mw1089 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [02:36:00] RECOVERY - salt-minion processes on mw1089 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [02:36:10] RECOVERY - salt-minion processes on mw1065 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [02:36:11] RECOVERY - dhclient process on mw1089 is OK: PROCS OK: 0 processes with command name dhclient [02:36:19] RECOVERY - HHVM rendering on mw1102 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.126 second response time [02:36:19] RECOVERY - HHVM processes on mw1065 is OK: PROCS OK: 6 processes with command name hhvm [02:36:29] RECOVERY - Disk space on mw1089 is OK: DISK OK [02:36:29] RECOVERY - SSH on mw1065 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [02:36:29] RECOVERY - HHVM processes on mw1089 is OK: PROCS OK: 6 processes with command name hhvm [02:36:29] RECOVERY - nutcracker port on mw1065 is OK: TCP OK - 0.000 second response time on port 11212 [02:36:30] RECOVERY - Apache HTTP on mw1065 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.978 second response time [02:36:39] RECOVERY - RAID on mw1065 is OK no RAID installed [02:36:48] RECOVERY - configured eth on mw1065 is OK - interfaces up [02:36:49] RECOVERY - DPKG on mw1065 is OK: All packages OK [02:36:49] RECOVERY - salt-minion processes on mw1109 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [02:36:49] RECOVERY - Disk space on mw1065 is OK: DISK OK [02:36:58] RECOVERY - nutcracker port on mw1109 is OK: TCP OK - 0.000 second response time on port 11212 [02:36:59] RECOVERY - RAID on mw1109 is OK no RAID installed [02:37:00] RECOVERY - HHVM processes on mw1047 is OK: PROCS OK: 6 processes with command name hhvm [02:37:09] RECOVERY - puppet last run on mw1109 is OK Puppet is currently enabled, last run 13 minutes ago with 0 failures [02:37:18] RECOVERY - SSH on mw1109 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [02:37:18] RECOVERY - RAID on mw1089 is OK no RAID installed [02:37:19] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.233 second response time [02:37:19] RECOVERY - puppet last run on mw1089 is OK Puppet is 
currently enabled, last run 13 minutes ago with 0 failures [02:37:29] RECOVERY - DPKG on mw1109 is OK: All packages OK [02:37:30] RECOVERY - Disk space on mw1109 is OK: DISK OK [02:37:30] RECOVERY - SSH on mw1089 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [02:37:30] RECOVERY - nutcracker port on mw1089 is OK: TCP OK - 0.000 second response time on port 11212 [02:37:31] RECOVERY - configured eth on mw1109 is OK - interfaces up [02:37:31] RECOVERY - Apache HTTP on mw1089 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.055 second response time [02:37:31] RECOVERY - HHVM processes on mw1109 is OK: PROCS OK: 6 processes with command name hhvm [02:37:38] RECOVERY - dhclient process on mw1109 is OK: PROCS OK: 0 processes with command name dhclient [02:37:40] RECOVERY - HHVM rendering on mw1109 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.286 second response time [02:37:40] RECOVERY - HHVM rendering on mw1065 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.131 second response time [02:37:49] RECOVERY - nutcracker process on mw1065 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [02:37:49] RECOVERY - puppet last run on mw1065 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [02:37:50] RECOVERY - Disk space on mw1047 is OK: DISK OK [02:37:59] RECOVERY - dhclient process on mw1065 is OK: PROCS OK: 0 processes with command name dhclient [02:38:00] RECOVERY - HHVM rendering on mw1089 is OK: HTTP OK: HTTP/1.1 200 OK - 64457 bytes in 0.140 second response time [02:38:09] RECOVERY - puppet last run on mw1110 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [02:38:19] RECOVERY - nutcracker process on mw1109 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [02:38:29] RECOVERY - puppet last run on mw1113 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [02:38:30] RECOVERY - nutcracker process on mw1047 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [02:38:49] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 50.00% of data above the critical threshold [500.0] [02:38:58] RECOVERY - Disk space on mw1061 is OK: DISK OK [02:38:58] RECOVERY - dhclient process on mw1061 is OK: PROCS OK: 0 processes with command name dhclient [02:38:59] RECOVERY - puppet last run on mw1075 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures [02:39:09] RECOVERY - nutcracker process on mw1061 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [02:39:09] RECOVERY - configured eth on mw1047 is OK - interfaces up [02:39:49] RECOVERY - DPKG on mw1061 is OK: All packages OK [02:39:50] RECOVERY - puppet last run on mw1061 is OK Puppet is currently enabled, last run 32 minutes ago with 0 failures [02:40:08] RECOVERY - nutcracker port on mw1061 is OK: TCP OK - 0.000 second response time on port 11212 [02:40:39] RECOVERY - HHVM processes on mw1061 is OK: PROCS OK: 6 processes with command name hhvm [02:40:40] RECOVERY - salt-minion processes on mw1061 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [02:40:48] RECOVERY - puppet last run on mw1058 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [02:40:49] RECOVERY - RAID on mw1061 is OK no RAID installed [02:40:50] RECOVERY - configured eth on mw1061 is OK - interfaces up [02:40:59] RECOVERY - SSH on mw1061 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) 
[02:41:38] RECOVERY - Disk space on mw1088 is OK: DISK OK [02:41:39] RECOVERY - dhclient process on mw1088 is OK: PROCS OK: 0 processes with command name dhclient [02:41:40] RECOVERY - SSH on mw1047 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [02:41:48] RECOVERY - nutcracker process on mw1074 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [02:41:48] RECOVERY - HHVM processes on mw1074 is OK: PROCS OK: 1 process with command name hhvm [02:41:48] RECOVERY - salt-minion processes on mw1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [02:42:10] RECOVERY - DPKG on mw1074 is OK: All packages OK [02:42:29] RECOVERY - dhclient process on mw1047 is OK: PROCS OK: 0 processes with command name dhclient [02:42:40] RECOVERY - Disk space on mw1074 is OK: DISK OK [02:42:40] RECOVERY - salt-minion processes on mw1074 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [02:42:59] RECOVERY - nutcracker port on mw1074 is OK: TCP OK - 0.000 second response time on port 11212 [02:43:00] RECOVERY - HHVM rendering on mw1074 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.536 second response time [02:43:01] Sigh. [02:43:06] wikitech session issues again [02:43:19] RECOVERY - configured eth on mw1074 is OK - interfaces up [02:43:28] RECOVERY - dhclient process on mw1074 is OK: PROCS OK: 0 processes with command name dhclient [02:43:38] RECOVERY - SSH on mw1074 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [02:43:40] RECOVERY - Apache HTTP on mw1074 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.045 second response time [02:44:49] RECOVERY - RAID on mw1074 is OK no RAID installed [02:46:39] RECOVERY - RAID on mw1047 is OK no RAID installed [02:47:40] PROBLEM - dhclient process on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:47:40] PROBLEM - Disk space on mw1088 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:47:41] RECOVERY - puppet last run on mw1047 is OK Puppet is currently enabled, last run 28 minutes ago with 0 failures [02:47:49] PROBLEM - HHVM rendering on mw1052 is CRITICAL - Socket timeout after 10 seconds [02:48:08] PROBLEM - Apache HTTP on mw1052 is CRITICAL - Socket timeout after 10 seconds [02:48:09] RECOVERY - DPKG on mw1047 is OK: All packages OK [02:49:09] RECOVERY - puppet last run on mw1074 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [02:49:19] PROBLEM - RAID on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:49:29] PROBLEM - DPKG on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:49:29] PROBLEM - configured eth on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:49:29] PROBLEM - puppet last run on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:50:08] PROBLEM - Disk space on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:50:09] PROBLEM - dhclient process on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:50:48] PROBLEM - SSH on mw1052 is CRITICAL - Socket timeout after 10 seconds [02:50:50] PROBLEM - nutcracker process on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:18] PROBLEM - salt-minion processes on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:28] PROBLEM - HHVM processes on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:51:29] PROBLEM - nutcracker port on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:52:50] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [02:57:28] !log LocalisationUpdate completed (1.26wmf12) at 2015-07-05 02:57:28+00:00 [02:57:29] RECOVERY - Disk space on mw1088 is OK: DISK OK [02:57:29] RECOVERY - dhclient process on mw1088 is OK: PROCS OK: 0 processes with command name dhclient [02:57:35] Logged the message, Master [02:57:38] RECOVERY - DPKG on mw1088 is OK: All packages OK [02:57:38] RECOVERY - RAID on mw1088 is OK no RAID installed [02:57:49] RECOVERY - nutcracker process on mw1088 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [02:58:00] RECOVERY - nutcracker port on mw1088 is OK: TCP OK - 0.000 second response time on port 11212 [02:58:09] RECOVERY - puppet last run on mw1088 is OK Puppet is currently enabled, last run 31 minutes ago with 0 failures [02:58:19] RECOVERY - HHVM processes on mw1088 is OK: PROCS OK: 6 processes with command name hhvm [02:58:19] RECOVERY - HHVM rendering on mw1088 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.264 second response time [02:58:38] RECOVERY - SSH on mw1052 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [02:58:50] RECOVERY - configured eth on mw1088 is OK - interfaces up [02:59:19] RECOVERY - SSH on mw1088 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [02:59:49] RECOVERY - Apache HTTP on mw1088 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.040 second response time [02:59:51] SMalyshev: Still? [03:00:19] Katie: last one was 02:58:31 [03:02:59] since then no 503s so far [03:03:00] Please file a task in Phabricator ( https://phabricator.wikimedia.org/ ) if it persists. [03:03:00] ok, will do [03:03:00] You're getting the errors at https://www.wikidata.org ? [03:03:00] yes: org.wikidata.query.rdf.tool.exception.ContainedException: Unexpected status code fetching RDF for https://www.wikidata.org/wiki/Special:EntityData/Q20614033.ttl?nocache=1436065106180&flavor=dump: 503 [03:03:00] Hmmm. 
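(The updater error quoted above is an ordinary HTTP fetch of an entity's Turtle dump that happened to hit the 503 spike under discussion. A minimal sketch of that fetch with a retry on 503, standard library only; the URL shape and flavor=dump parameter come from the logged error, while the retry policy is purely illustrative and not how the WDQS updater is actually configured.)

# Illustrative: fetch the RDF (Turtle) dump for a Wikidata entity and retry
# a few times when the appservers answer 503, as they did in this incident.
import time
import urllib.error
import urllib.request

def fetch_entity_ttl(qid, retries=3, backoff=5.0):
    url = ("https://www.wikidata.org/wiki/Special:EntityData/"
           "{}.ttl?flavor=dump".format(qid))
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read().decode("utf-8")
        except urllib.error.HTTPError as e:
            if e.code == 503 and attempt < retries:
                time.sleep(backoff * (attempt + 1))  # simple linear backoff
                continue
            raise  # other errors, or retries exhausted

# Example: ttl = fetch_entity_ttl("Q20614033")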
[03:03:00] RECOVERY - salt-minion processes on mw1088 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [03:04:49] PROBLEM - SSH on mw1052 is CRITICAL - Socket timeout after 10 seconds [03:05:30] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out [03:07:10] RECOVERY - salt-minion processes on mw1052 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [03:07:19] RECOVERY - HHVM processes on mw1052 is OK: PROCS OK: 1 process with command name hhvm [03:07:19] RECOVERY - puppet last run on mw1052 is OK Puppet is currently enabled, last run 38 minutes ago with 0 failures [03:07:20] RECOVERY - configured eth on mw1052 is OK - interfaces up [03:07:20] RECOVERY - DPKG on mw1052 is OK: All packages OK [03:07:20] RECOVERY - nutcracker port on mw1052 is OK: TCP OK - 0.000 second response time on port 11212 [03:07:49] RECOVERY - HHVM rendering on mw1052 is OK: HTTP OK: HTTP/1.1 200 OK - 64445 bytes in 5.158 second response time [03:07:58] RECOVERY - Apache HTTP on mw1052 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.176 second response time [03:07:59] RECOVERY - Disk space on mw1052 is OK: DISK OK [03:07:59] RECOVERY - dhclient process on mw1052 is OK: PROCS OK: 0 processes with command name dhclient [03:08:38] RECOVERY - SSH on mw1052 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [03:08:49] RECOVERY - nutcracker process on mw1052 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [03:09:08] RECOVERY - RAID on mw1052 is OK no RAID installed [03:09:28] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 18407 bytes in 0.045 second response time [03:19:28] PROBLEM - puppet last run on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:09] RECOVERY - puppet last run on mw1047 is OK Puppet is currently enabled, last run 1 hour ago with 0 failures [03:35:10] PROBLEM - puppet last run on wtp1013 is CRITICAL Puppet has 1 failures [03:35:10] PROBLEM - puppet last run on analytics1032 is CRITICAL Puppet has 1 failures [03:35:58] PROBLEM - puppet last run on mw1180 is CRITICAL Puppet has 1 failures [03:35:58] PROBLEM - puppet last run on mw1181 is CRITICAL Puppet has 1 failures [03:36:08] PROBLEM - puppet last run on mw1156 is CRITICAL Puppet has 1 failures [03:36:09] PROBLEM - puppet last run on mw2189 is CRITICAL Puppet has 2 failures [03:39:19] PROBLEM - HHVM processes on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:39:28] PROBLEM - configured eth on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:39:40] PROBLEM - salt-minion processes on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:39:48] PROBLEM - SSH on mw1047 is CRITICAL - Socket timeout after 10 seconds [03:39:49] PROBLEM - puppet last run on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:39:59] PROBLEM - Disk space on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:40:09] PROBLEM - nutcracker port on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:40:19] PROBLEM - DPKG on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:40:30] PROBLEM - dhclient process on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:40:39] PROBLEM - nutcracker process on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:40:50] PROBLEM - RAID on mw1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:41:09] PROBLEM - puppet last run on mw1085 is CRITICAL puppet fail [03:45:58] RECOVERY - dhclient process on mw1047 is OK: PROCS OK: 0 processes with command name dhclient [03:45:59] RECOVERY - nutcracker process on mw1047 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [03:46:09] RECOVERY - RAID on mw1047 is OK no RAID installed [03:46:39] RECOVERY - HHVM processes on mw1047 is OK: PROCS OK: 6 processes with command name hhvm [03:46:49] RECOVERY - configured eth on mw1047 is OK - interfaces up [03:46:59] RECOVERY - salt-minion processes on mw1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [03:47:00] RECOVERY - SSH on mw1047 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [03:47:00] RECOVERY - Apache HTTP on mw1047 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.045 second response time [03:47:09] RECOVERY - puppet last run on mw1047 is OK Puppet is currently enabled, last run 1 hour ago with 0 failures [03:47:09] RECOVERY - HHVM rendering on mw1047 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.243 second response time [03:47:19] RECOVERY - Disk space on mw1047 is OK: DISK OK [03:47:28] RECOVERY - nutcracker port on mw1047 is OK: TCP OK - 0.000 second response time on port 11212 [03:47:29] RECOVERY - DPKG on mw1047 is OK: All packages OK [03:50:18] PROBLEM - puppet last run on mw1129 is CRITICAL puppet fail [03:51:59] PROBLEM - puppet last run on db1048 is CRITICAL puppet fail [03:52:18] PROBLEM - puppet last run on stat1003 is CRITICAL puppet fail [03:52:28] PROBLEM - puppet last run on mc2014 is CRITICAL puppet fail [03:52:39] PROBLEM - puppet last run on cp4005 is CRITICAL puppet fail [03:52:50] PROBLEM - puppet last run on mw2062 is CRITICAL puppet fail [03:53:39] RECOVERY - puppet last run on wtp1013 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [03:53:39] RECOVERY - puppet last run on analytics1032 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [03:54:39] PROBLEM - puppet last run on cp2018 is CRITICAL puppet fail [03:54:48] PROBLEM - puppet last run on ms-be2012 is CRITICAL puppet fail [03:58:08] RECOVERY - puppet last run on mw1180 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [03:58:11] RECOVERY - puppet last run on mw1156 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [03:59:50] PROBLEM - puppet last run on mw2136 is CRITICAL Puppet has 1 failures [03:59:59] RECOVERY - puppet last run on mw1181 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:00:19] RECOVERY - puppet last run on mw2189 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:02:29] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL 11.11% of data above the critical threshold [100000000.0] [04:03:19] RECOVERY - puppet last run on mw1085 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:08:48] RECOVERY - puppet last run on mw1129 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [04:08:58] RECOVERY - puppet last run on stat1003 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures [04:09:09] RECOVERY - puppet last run on mw2136 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures [04:09:09] RECOVERY - puppet last run on mc2014 is OK Puppet is 
currently enabled, last run 58 seconds ago with 0 failures [04:09:20] RECOVERY - puppet last run on cp4005 is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures [04:10:29] RECOVERY - puppet last run on db1048 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:11:19] RECOVERY - puppet last run on cp2018 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures [04:11:20] RECOVERY - puppet last run on mw2062 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:11:28] RECOVERY - puppet last run on ms-be2012 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures [04:22:49] RECOVERY - Incoming network saturation on labstore1003 is OK Less than 10.00% above the threshold [75000000.0] [04:27:00] PROBLEM - Restbase root url on restbase1005 is CRITICAL - Socket timeout after 10 seconds [04:33:20] PROBLEM - HHVM rendering on mw1044 is CRITICAL - Socket timeout after 10 seconds [04:33:39] PROBLEM - Apache HTTP on mw1044 is CRITICAL - Socket timeout after 10 seconds [04:34:29] PROBLEM - nutcracker port on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:29] PROBLEM - puppet last run on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:38] PROBLEM - HHVM processes on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:38] PROBLEM - salt-minion processes on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:39] PROBLEM - RAID on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:34:39] PROBLEM - configured eth on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:35:09] PROBLEM - Disk space on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:35:18] PROBLEM - dhclient process on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:35:29] PROBLEM - DPKG on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:35:49] PROBLEM - nutcracker process on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
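(Much of the noise above comes from the "puppet last run" checks, which report whether the agent is enabled, how long ago it last ran, and how many resources failed. A rough stand-in for that kind of check in Python: read the agent's last_run_summary.yaml and complain when the run is stale or had failures. The file path and YAML keys are the usual puppet-agent defaults but should be treated as assumptions; the production check is an NRPE plugin, not this script.)

# Rough stand-in for a "puppet last run" freshness check (requires PyYAML).
# Path and keys are the usual puppet-agent defaults, assumed here.
import sys
import time
import yaml

SUMMARY = "/var/lib/puppet/state/last_run_summary.yaml"
MAX_AGE = 3600  # seconds before the last run counts as stale

def check_puppet(path=SUMMARY, max_age=MAX_AGE):
    with open(path) as f:
        summary = yaml.safe_load(f)
    last_run = summary.get("time", {}).get("last_run", 0)
    failed = summary.get("resources", {}).get("failed", 0)
    age = int(time.time() - last_run)
    if failed:
        return 2, "CRITICAL: Puppet has {} failures".format(failed)
    if age > max_age:
        return 2, "CRITICAL: last run {} seconds ago".format(age)
    return 0, "OK: last run {} seconds ago with 0 failures".format(age)

if __name__ == "__main__":
    code, msg = check_puppet()
    print(msg)
    sys.exit(code)  # Nagios-style exit code: 0 = OK, 2 = CRITICAL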
[04:36:09] PROBLEM - SSH on mw1044 is CRITICAL - Socket timeout after 10 seconds [04:46:58] RECOVERY - nutcracker process on mw1044 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [04:47:08] RECOVERY - SSH on mw1044 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [04:47:20] RECOVERY - salt-minion processes on mw1044 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [04:47:20] RECOVERY - HHVM processes on mw1044 is OK: PROCS OK: 6 processes with command name hhvm [04:47:20] RECOVERY - nutcracker port on mw1044 is OK: TCP OK - 0.000 second response time on port 11212 [04:47:29] RECOVERY - configured eth on mw1044 is OK - interfaces up [04:47:29] RECOVERY - RAID on mw1044 is OK no RAID installed [04:48:09] RECOVERY - HHVM rendering on mw1044 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.200 second response time [04:48:09] RECOVERY - Disk space on mw1044 is OK: DISK OK [04:48:09] RECOVERY - dhclient process on mw1044 is OK: PROCS OK: 0 processes with command name dhclient [04:48:28] RECOVERY - DPKG on mw1044 is OK: All packages OK [04:48:28] RECOVERY - Apache HTTP on mw1044 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.039 second response time [05:07:58] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 0 below the confidence bounds [05:09:40] RECOVERY - puppet last run on mw1044 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures [05:24:47] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 5 05:24:47 UTC 2015 (duration 24m 46s) [05:26:16] * Katie eyes morebots. [05:30:45] morebots [05:30:45] I am a logbot running on tools-exec-1217. [05:30:46] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [05:30:46] To log a message, type !log . 
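(morebots describes itself above: "!log <message>" lines in this channel are appended to the Server Admin Log on wikitech. A minimal sketch of that append using the standard MediaWiki edit API; login and session handling are omitted, so treat this as an outline rather than the bot's actual code.)

# Sketch of the SAL append a logbot performs for "!log ..." lines.
# A real bot must authenticate first; requests is used for brevity.
import time
import requests

API = "https://wikitech.wikimedia.org/w/api.php"

def log_to_sal(nick, message, session=None):
    s = session or requests.Session()
    token = s.get(API, params={
        "action": "query", "meta": "tokens", "format": "json",
    }).json()["query"]["tokens"]["csrftoken"]
    entry = "* {} {}: {}".format(
        time.strftime("%H:%M", time.gmtime()), nick, message)
    r = s.post(API, data={
        "action": "edit",
        "title": "Server Admin Log",
        "appendtext": "\n" + entry,
        "summary": entry,
        "token": token,
        "format": "json",
    })
    r.raise_for_status()
    return r.json()

# Example: log_to_sal("l10nupdate", "LocalisationUpdate completed (1.26wmf12)")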
[06:04:21] Krenair: mukunda/chad/dan, depending on who's around [06:12:27] PROBLEM - puppet last run on ms-be3001 is CRITICAL puppet fail [06:30:48] PROBLEM - puppet last run on cp2003 is CRITICAL puppet fail [06:32:48] PROBLEM - puppet last run on logstash1006 is CRITICAL Puppet has 1 failures [06:32:59] RECOVERY - puppet last run on ms-be3001 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [06:34:18] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 11 data above and 0 below the confidence bounds [06:34:18] PROBLEM - puppet last run on analytics1030 is CRITICAL Puppet has 1 failures [06:35:29] PROBLEM - puppet last run on elastic1027 is CRITICAL Puppet has 1 failures [06:35:39] PROBLEM - puppet last run on db1051 is CRITICAL Puppet has 1 failures [06:35:48] PROBLEM - puppet last run on ruthenium is CRITICAL Puppet has 1 failures [06:36:10] PROBLEM - puppet last run on mw1046 is CRITICAL Puppet has 1 failures [06:36:19] PROBLEM - puppet last run on mw2163 is CRITICAL Puppet has 1 failures [06:36:51] PROBLEM - puppet last run on labcontrol2001 is CRITICAL Puppet has 1 failures [06:36:59] PROBLEM - puppet last run on mw2022 is CRITICAL Puppet has 1 failures [06:37:09] PROBLEM - puppet last run on mw1123 is CRITICAL Puppet has 1 failures [06:37:29] PROBLEM - puppet last run on mw1060 is CRITICAL Puppet has 1 failures [06:37:49] PROBLEM - puppet last run on mw1052 is CRITICAL Puppet has 1 failures [06:37:59] PROBLEM - puppet last run on mw1235 is CRITICAL Puppet has 1 failures [06:37:59] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.69% of data above the critical threshold [500.0] [06:38:10] PROBLEM - puppet last run on mw1170 is CRITICAL Puppet has 1 failures [06:38:19] PROBLEM - puppet last run on mw2126 is CRITICAL Puppet has 1 failures [06:38:28] PROBLEM - puppet last run on mw2033 is CRITICAL Puppet has 1 failures [06:38:50] PROBLEM - puppet last run on mw1228 is CRITICAL Puppet has 1 failures [06:40:19] PROBLEM - puppet last run on mw2093 is CRITICAL Puppet has 2 failures [06:43:50] PROBLEM - Cassanda CQL query interface on restbase1004 is CRITICAL: Connection refused [06:44:19] RECOVERY - Apache HTTP on mw1112 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.627 second response time [06:44:38] PROBLEM - Cassandra database on restbase1004 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon [06:44:49] RECOVERY - HHVM rendering on mw1112 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.207 second response time [06:45:59] RECOVERY - puppet last run on mw2033 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:17] RECOVERY - Apache HTTP on mw1028 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 5.535 second response time [06:48:17] RECOVERY - puppet last run on elastic1027 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures [06:48:17] RECOVERY - puppet last run on db1051 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:48:17] RECOVERY - puppet last run on mw1060 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:48:17] RECOVERY - puppet last run on ruthenium is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [06:48:17] RECOVERY - puppet last run on mw1235 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:48:17] RECOVERY - puppet last run on analytics1030 is OK Puppet 
is currently enabled, last run 34 seconds ago with 0 failures [06:48:17] RECOVERY - puppet last run on cp2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:17] RECOVERY - puppet last run on logstash1006 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures [06:48:17] RECOVERY - HHVM rendering on mw1028 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.134 second response time [06:48:29] RECOVERY - puppet last run on mw1228 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [06:49:29] RECOVERY - puppet last run on mw1052 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:49:38] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [06:49:39] RECOVERY - puppet last run on mw1046 is OK Puppet is currently enabled, last run 43 seconds ago with 0 failures [06:49:48] RECOVERY - puppet last run on mw1170 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures [06:49:49] RECOVERY - puppet last run on mw2163 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:49:50] RECOVERY - puppet last run on mw2126 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:49:58] RECOVERY - puppet last run on mw2093 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:50:09] RECOVERY - Apache HTTP on mw1057 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.042 second response time [06:50:19] RECOVERY - HHVM rendering on mw1057 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.426 second response time [06:50:19] RECOVERY - puppet last run on labcontrol2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:50:29] RECOVERY - puppet last run on mw2022 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:50:38] RECOVERY - puppet last run on mw1123 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:52:29] RECOVERY - Apache HTTP on mw1061 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.252 second response time [06:53:18] RECOVERY - HHVM rendering on mw1061 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.138 second response time [06:55:39] RECOVERY - HHVM rendering on mw1070 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.146 second response time [06:55:39] RECOVERY - Apache HTTP on mw1070 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.048 second response time [06:55:48] RECOVERY - puppet last run on mw1057 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:56:49] RECOVERY - Apache HTTP on mw1069 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.247 second response time [06:58:08] RECOVERY - HHVM rendering on mw1069 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.141 second response time [06:58:59] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected [06:59:09] RECOVERY - Apache HTTP on mw1084 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 2.174 second response time [06:59:10] RECOVERY - HHVM rendering on mw1084 is OK: HTTP OK: HTTP/1.1 200 OK - 64452 bytes in 6.167 second response time [07:00:29] RECOVERY - Apache HTTP on mw1086 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 2.533 second response time [07:01:06] !log Restarted HHVM for mw1112,1028,1057,1061,1069,1070,1084,1086 [07:01:27] Logged the message, Master [07:01:32] jynus: any idea what's going 
on? [07:02:09] RECOVERY - HHVM rendering on mw1086 is OK: HTTP OK: HTTP/1.1 200 OK - 64444 bytes in 0.155 second response time [07:02:28] bblack, sorry I just wanted to log after the fact [07:02:30] PROBLEM - puppet last run on mw1105 is CRITICAL Puppet has 12 failures [07:02:43] are you just restarting them because they're dead? [07:02:50] or? [07:03:02] bblack, yes, they were dead [07:03:17] any idea how/why? locked up but running? [07:03:24] I assure every time before restarting [07:04:31] well, given the time, I assume "regular" crashing, and I only assume more frequently than usual due to the extra load [07:04:53] what's the extra load? [07:05:01] is this still S:RI load maybe? [07:05:23] I'm also seeing some restbase issues in icinga, which I think mobrovac sounded like he was expecting yesterday :/ [07:05:41] I saw you speaking, so you probably know more than me about that [07:06:03] I have no idea [07:06:57] then the puppet failures, usually proxy-induced [07:07:26] well yeah the 06:xx puppetfails are just puppet being bad I think [07:07:45] !log restarted cassandra + restbase on restbase1005 [07:07:51] Logged the message, Master [07:08:21] so to be clear: HHVM service was running, but curl did return the error page on all of those [07:08:48] RECOVERY - Restbase root url on restbase1005 is OK: HTTP OK: HTTP/1.1 200 - 15149 bytes in 0.010 second response time [07:09:13] !log restarted cassandra on restbase1004 [07:09:17] I restarted all that were dead in the last hours [07:09:20] ok [07:09:29] RECOVERY - Cassandra database on restbase1004 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon [07:09:30] Logged the message, Master [07:09:56] I do not think there is nothing wrong short-term [07:09:59] PROBLEM - puppet last run on cp3042 is CRITICAL puppet fail [07:10:35] s/nothing/anything/ [07:11:55] well the restbase thing is still wrong even short-term [07:12:11] my suspicion is it may have been the root cause of the fallout the past several hours [07:12:20] RECOVERY - Cassanda CQL query interface on restbase1004 is OK: TCP OK - 0.001 second response time on port 9042 [07:12:44] yeah, I expressed badly, I mean something that requires our attention that hasn't been announced [07:12:46] rb1004 still hasn't recovered CQL yet, we'll see [07:14:14] but I specifically connected at this time because I imagine there would be less ops eyes [07:14:51] well I just got home a bit ago [07:15:10] all of the past ~5h looks awful on channel alerts [07:16:49] that is why I connected, just a few minutes ago, too :-) [07:16:56] hopefully with two dead restbase cassandras restarted, things will stabilize for a while [07:18:26] I'm really pretty displeased with the whole RB/cassandra debacle that's been going on lately :P [07:25:19] RECOVERY - puppet last run on cp3042 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [07:25:27] donations is up [07:27:47] 6operations, 10Traffic, 10fundraising-tech-ops: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1427057 (10Chmarkine) Once these two domain names, `www.donate.wikimediafoundation.org`, `www.donate.mediawiki.org` are removed, `wikimediafoundation.org`... [07:33:50] I really think the Special:RecordImpression problems from the broken campaign going on are to blame for the appserver deaths now [07:34:09] it correlates well, anyways. restbase being unhealthy I think was unrelated. 
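(jynus's observation above, that the hhvm process was still running while curl got back the error page, is exactly the gap between the "HHVM processes" and "HHVM rendering" checks flapping in this log. A small sketch of that combined probe in Python; the SSH-based process count, test URL and Host header are illustrative assumptions, and the real checks are icinga/NRPE definitions in puppet.)

# Illustrative probe for the "process up but not rendering" state discussed
# above: count hhvm processes and attempt an HTTP render, then classify.
import subprocess
import urllib.error
import urllib.request

def hhvm_process_count(host):
    # The production check runs locally via NRPE; over SSH it might look so.
    out = subprocess.run(["ssh", host, "pgrep", "-c", "-x", "hhvm"],
                         capture_output=True, text=True)
    return int(out.stdout.strip() or 0)

def renders_ok(host):
    req = urllib.request.Request(
        "http://{}/wiki/Main_Page".format(host),
        headers={"Host": "en.wikipedia.org"})  # assumed test page/Host header
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def classify(host):
    procs = hhvm_process_count(host)
    if procs and not renders_ok(host):
        return "locked up: hhvm running but not rendering -> restart candidate"
    if not procs:
        return "hhvm not running"
    return "healthy"

# Example: print(classify("mw1105.eqiad.wmnet"))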
[07:34:30] https://phabricator.wikimedia.org/T45250#1427060 [07:35:58] 6operations, 10Traffic, 10fundraising-tech-ops: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1427062 (10Chmarkine) Oh, actually http://www.donate.wikimediafoundation.org/ redirects to https://wikimediafoundation.org/wiki/Home, and http://www.donate... [07:36:21] uh, that thing is still not killed? [07:36:39] RECOVERY - puppet last run on mw1105 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures [07:40:02] MaxSem: doesn't seem so on the appserver graphs [07:40:30] and we had a big 503 spike around 02:30, bunch of appservers dying shortly before and for a while after, all correlating with the latest request-rate peak from it [07:41:08] I mean, RecordImpression in general. we should feed it to o r i [07:45:37] :) [07:46:11] but this is different. we apparently just have a runaway poorly configured campaign spamming users and spamming our servers in this particular case [07:46:24] even if S:RI survives a bit longer, that campaign needs to die or get throttled [07:47:11] as current status seems ok (although not stable), I will disconnect now, check again things later [07:47:17] cya jynus [07:47:43] as far as I can tell using an anonymous browsing tab: for regular anon/logged-out pageviews, we're showing the banner 100% of the time [07:47:45] (03PS1) 10Chmarkine: Remove www.donate.wikimediafoundation.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222876 (https://phabricator.wikimedia.org/T102827) [07:48:11] I've followed like 20 article links, and it keeps popping up on every single one [07:48:38] it does stop showing if you click the X-mark to close it, at least [07:50:08] heh after my 20 or so mostly-blind random clicks in from Main_Page, somehow I ended up on https://en.wikipedia.org/wiki/Encyclopedia_of_the_Central_Intelligence_Agency [07:51:45] http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Application+servers+eqiad&h=mw1033.eqiad.wmnet&jr=&js=&v=20.425&m=ap_rps&vl=req%2Fsec&ti=Requests+per+second [07:51:56] ^ appserver req-rate impact of banner campaign [07:53:20] (03PS1) 10Chmarkine: Remove www.donate.mediawiki.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222877 (https://phabricator.wikimedia.org/T102827) [07:56:39] PROBLEM - Apache HTTP on mw1105 is CRITICAL - Socket timeout after 10 seconds [07:57:29] PROBLEM - HHVM rendering on mw1105 is CRITICAL - Socket timeout after 10 seconds [07:58:38] PROBLEM - RAID on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:58:59] PROBLEM - DPKG on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:59:00] PROBLEM - nutcracker port on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:59:19] PROBLEM - puppet last run on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:59:31] the wikimania campaign? isn't it sliiiightly late to register? [07:59:58] PROBLEM - SSH on mw1105 is CRITICAL - Socket timeout after 10 seconds [07:59:58] PROBLEM - HHVM processes on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:59:58] PROBLEM - dhclient process on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:59:59] PROBLEM - configured eth on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:00:18] PROBLEM - nutcracker process on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:00:19] PROBLEM - salt-minion processes on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:00:40] PROBLEM - Disk space on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:06:28] MaxSem: no idea. they're set to run this campaign from Jul 2-14: https://meta.wikimedia.org/w/index.php?title=Special:CentralNotice&subaction=noticeDetail&notice=wm2015register [08:06:37] yup [08:06:59] if it's causing problems, I guess it's reasonable to disable it for now [08:07:03] I'm considering just blocking S:RI [08:07:19] I really don't know how, at my level of things, I could or should cleanly disable the campaign itself without breaking other things [08:07:38] but I could at least kill the S:RI traffic at varnish [08:08:15] I'd really rather we have someone who knows more about this, or whoever created it, fix their campaign to not be spammy [08:08:31] that can work yes - we don't friggin need to have analytics about a banner as spammy as this [08:14:49] (03PS1) 10BBlack: filter S:RI from wm2015register T45250 [puppet] - 10https://gerrit.wikimedia.org/r/222879 [08:15:23] (03PS2) 10BBlack: filter S:RI from wm2015register T45250 [puppet] - 10https://gerrit.wikimedia.org/r/222879 [08:15:40] (03CR) 10BBlack: [C: 032 V: 032] filter S:RI from wm2015register T45250 [puppet] - 10https://gerrit.wikimedia.org/r/222879 (owner: 10BBlack) [08:18:08] PROBLEM - HHVM rendering on mw1053 is CRITICAL - Socket timeout after 10 seconds [08:18:38] PROBLEM - Apache HTTP on mw1053 is CRITICAL - Socket timeout after 10 seconds [08:18:39] RECOVERY - HHVM processes on mw1105 is OK: PROCS OK: 6 processes with command name hhvm [08:18:39] RECOVERY - dhclient process on mw1105 is OK: PROCS OK: 0 processes with command name dhclient [08:19:18] RECOVERY - Disk space on mw1105 is OK: DISK OK [08:19:29] RECOVERY - DPKG on mw1105 is OK: All packages OK [08:19:29] PROBLEM - HHVM processes on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:19:38] RECOVERY - nutcracker port on mw1105 is OK: TCP OK - 0.000 second response time on port 11212 [08:19:40] PROBLEM - nutcracker port on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:19:40] PROBLEM - nutcracker process on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:19:40] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:19:48] RECOVERY - puppet last run on mw1105 is OK Puppet is currently enabled, last run 43 minutes ago with 0 failures [08:19:58] RECOVERY - HHVM rendering on mw1105 is OK: HTTP OK: HTTP/1.1 200 OK - 64428 bytes in 0.289 second response time [08:20:08] PROBLEM - SSH on mw1053 is CRITICAL - Socket timeout after 10 seconds [08:20:09] PROBLEM - puppet last run on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:20:09] PROBLEM - Disk space on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:20:10] PROBLEM - configured eth on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:20:29] RECOVERY - SSH on mw1105 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [08:20:29] RECOVERY - configured eth on mw1105 is OK - interfaces up [08:20:29] PROBLEM - dhclient process on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:20:38] PROBLEM - salt-minion processes on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:20:38] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
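(Gerrit change 222879 above drops the Special:RecordImpression beacon traffic generated by the runaway wm2015register campaign before it reaches the appservers. The actual change is Varnish VCL in the puppet repo and is not quoted in this log; the Python below only illustrates the matching logic, and the query-parameter names are assumptions about how the CentralNotice beacon identifies its campaign.)

# Illustration of the request-matching idea behind the S:RI filter above.
# The production implementation is Varnish VCL; parameter names are assumed.
from urllib.parse import urlsplit, parse_qs

BLOCKED_CAMPAIGN = "wm2015register"

def should_drop(url):
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    is_record_impression = (
        "Special:RecordImpression" in parts.path
        or params.get("title", [""])[0] == "Special:RecordImpression")
    return (is_record_impression
            and params.get("campaign", [""])[0] == BLOCKED_CAMPAIGN)

# A front-end cache would answer matching requests directly (e.g. with an
# empty 204) instead of passing them back to the appservers.
assert should_drop(
    "https://en.wikipedia.org/w/index.php?title=Special:RecordImpression"
    "&campaign=wm2015register")
assert not should_drop("https://en.wikipedia.org/wiki/Main_Page")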
[08:20:49] RECOVERY - nutcracker process on mw1105 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [08:20:50] RECOVERY - salt-minion processes on mw1105 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:21:00] RECOVERY - Apache HTTP on mw1105 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.043 second response time [08:21:00] RECOVERY - RAID on mw1105 is OK no RAID installed [08:23:34] <_joe_> !log restarted apache on mw1105,mw1092,90,82,78 [08:23:39] Logged the message, Master [08:23:49] <_joe_> !log restarted hhvm because of ooms, not apache [08:23:53] Logged the message, Master [08:27:42] !log FYI: 08:15 < grrrit-wm> (CR) BBlack: [C: 2 V: 2] filter S:RI from wm2015register T45250 [puppet] - https://gerrit.wikimedia.org/r/222879 (owner: BBlack) [08:27:47] Logged the message, Master [08:27:58] RECOVERY - dhclient process on mw1053 is OK: PROCS OK: 0 processes with command name dhclient [08:27:58] RECOVERY - salt-minion processes on mw1053 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:28:50] RECOVERY - HHVM processes on mw1053 is OK: PROCS OK: 6 processes with command name hhvm [08:28:58] RECOVERY - nutcracker port on mw1053 is OK: TCP OK - 0.000 second response time on port 11212 [08:28:59] RECOVERY - nutcracker process on mw1053 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker [08:29:19] RECOVERY - Disk space on mw1053 is OK: DISK OK [08:29:20] RECOVERY - SSH on mw1053 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [08:29:29] RECOVERY - configured eth on mw1053 is OK - interfaces up [08:29:49] <_joe_> !log restaarted HHVM on mw1059 with heap profiling enabled, collecting data (will stop this evening). 
[08:29:49] RECOVERY - RAID on mw1053 is OK no RAID installed [08:29:53] Logged the message, Master [08:30:49] RECOVERY - DPKG on mw1053 is OK: All packages OK [08:31:08] RECOVERY - puppet last run on mw1053 is OK Puppet is currently enabled, last run 16 minutes ago with 0 failures [08:31:09] RECOVERY - HHVM rendering on mw1053 is OK: HTTP OK: HTTP/1.1 200 OK - 64428 bytes in 1.656 second response time [08:31:38] RECOVERY - Apache HTTP on mw1053 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.037 second response time [08:40:34] <_joe_> !log collecting heaps on an api appserver, mw1115, as comparison [08:40:38] Logged the message, Master [08:43:39] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100% [08:44:19] RECOVERY - Host mw2027 is UP: PING WARNING - Packet loss = 58%, RTA = 43.04 ms [08:50:28] RECOVERY - Apache HTTP on mw1051 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.049 second response time [08:50:28] RECOVERY - HHVM rendering on mw1051 is OK: HTTP OK: HTTP/1.1 200 OK - 64428 bytes in 0.137 second response time [08:58:09] RECOVERY - HHVM busy threads on mw1051 is OK Less than 30.00% above the threshold [57.6] [08:58:49] RECOVERY - HHVM queue size on mw1051 is OK Less than 30.00% above the threshold [10.0] [09:04:54] (03PS1) 10Chmarkine: Remove www.donate.wiktionary.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222880 (https://phabricator.wikimedia.org/T102827) [09:29:24] (03PS1) 10Chmarkine: Remove www.donate.wikipedia.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222883 (https://phabricator.wikimedia.org/T102827) [09:31:30] PROBLEM - Restbase root url on restbase1005 is CRITICAL - Socket timeout after 10 seconds [09:43:34] !log restarted restbase on restbase1005 [09:43:38] Logged the message, Master [09:44:49] RECOVERY - Restbase root url on restbase1005 is OK: HTTP OK: HTTP/1.1 200 - 15149 bytes in 0.058 second response time [10:17:11] 6operations, 7HHVM: HHVM memory leaks result in OOMs & 500 spikes - https://phabricator.wikimedia.org/T104769#1427156 (10Joe) I am collecting data on mw1059 (and mw1115 for comparison) to see what changes in the memory profile of both servers over time. In about 1 day of data we should have a very clear pictur... [10:36:00] <_joe_> [10:45:05] Hi [10:45:50] PROBLEM - Restbase root url on restbase1002 is CRITICAL - Socket timeout after 10 seconds [10:45:58] PROBLEM - Cassanda CQL query interface on restbase1004 is CRITICAL: Connection refused [10:46:38] PROBLEM - Restbase root url on restbase1003 is CRITICAL - Socket timeout after 10 seconds [10:46:49] PROBLEM - Cassandra database on restbase1004 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon [10:51:39] PROBLEM - puppet last run on analytics1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:51:39] PROBLEM - RAID on analytics1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:52:18] PROBLEM - Disk space on Hadoop worker on analytics1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:52:28] PROBLEM - Disk space on analytics1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:52:49] PROBLEM - SSH on analytics1020 is CRITICAL - Socket timeout after 10 seconds [10:53:10] PROBLEM - dhclient process on analytics1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:53:10] PROBLEM - DPKG on analytics1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
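For readers following the restbase1002-1005 alerts: the "Restbase root url" check boils down to an HTTP GET against each node's RESTBase listener that must return 200 within ten seconds (the recovery above shows roughly 15 kB coming back in well under a second). A minimal hand-run equivalent, assuming RESTBase's usual port 7231 since the exact Icinga command is not quoted in the log:

```python
# Minimal manual version of the "Restbase root url" probe seen in the alerts:
# GET the service root and fail if it is not a 200 within 10 seconds.
# Port 7231 and the hostname are assumptions; the real Icinga command isn't shown here.
import sys
import requests

host = sys.argv[1] if len(sys.argv) > 1 else "restbase1005.eqiad.wmnet"

try:
    resp = requests.get(f"http://{host}:7231/", timeout=10)
except requests.RequestException as exc:
    print(f"CRITICAL - {exc}")
    sys.exit(2)

if resp.status_code == 200:
    print(f"OK - HTTP {resp.status_code} - {len(resp.content)} bytes "
          f"in {resp.elapsed.total_seconds():.3f}s")
    sys.exit(0)

print(f"CRITICAL - HTTP {resp.status_code}")
sys.exit(2)
```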
[10:53:19] PROBLEM - configured eth on analytics1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:53:28] PROBLEM - Hadoop DataNode on analytics1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:53:29] PROBLEM - salt-minion processes on analytics1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:59:29] PROBLEM - Host analytics1020 is DOWN: PING CRITICAL - Packet loss = 100% [11:03:26] <_joe_> !log restarting cassandra on rb1003,4 and restbase on rb1002,3 [11:03:39] PROBLEM - Cassanda CQL query interface on restbase1003 is CRITICAL: Connection refused [11:03:39] RECOVERY - Cassandra database on restbase1004 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon [11:03:44] <_joe_> also, what the fuck. [11:03:47] Logged the message, Master [11:04:38] RECOVERY - Cassanda CQL query interface on restbase1004 is OK: TCP OK - 0.000 second response time on port 9042 [11:05:29] RECOVERY - Cassanda CQL query interface on restbase1003 is OK: TCP OK - 0.002 second response time on port 9042 [11:10:39] RECOVERY - Restbase root url on restbase1003 is OK: HTTP OK: HTTP/1.1 200 - 15149 bytes in 0.016 second response time [11:11:48] RECOVERY - Restbase root url on restbase1002 is OK: HTTP OK: HTTP/1.1 200 - 15149 bytes in 0.009 second response time [11:14:29] PROBLEM - Restbase root url on restbase1001 is CRITICAL - Socket timeout after 10 seconds [11:14:49] PROBLEM - Cassandra database on restbase1004 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon [11:14:49] PROBLEM - Restbase root url on restbase1006 is CRITICAL - Socket timeout after 10 seconds [11:15:40] PROBLEM - Cassanda CQL query interface on restbase1004 is CRITICAL: Connection refused [11:17:20] <_joe_> this is bad [11:20:08] <_joe_> !log restarting restbase on rb100{1,4,6} [11:20:28] RECOVERY - Cassandra database on restbase1004 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon [11:21:13] <_joe_> !log restarted cassandra on restbase1004 (again), seemingly crashed for a bad request [11:21:17] Logged the message, Master [11:21:20] RECOVERY - Cassanda CQL query interface on restbase1004 is OK: TCP OK - 0.012 second response time on port 9042 [11:21:59] RECOVERY - Restbase root url on restbase1001 is OK: HTTP OK: HTTP/1.1 200 - 15149 bytes in 0.010 second response time [11:22:18] RECOVERY - Restbase root url on restbase1006 is OK: HTTP OK: HTTP/1.1 200 - 15149 bytes in 0.021 second response time [11:22:46] <_joe_> mobrovac, gwicke urandom this should be fixed first thing on monday - restbase should not keep dying for timeouts [11:33:50] _joe_: agree [11:36:33] _joe_: i did deploy a small fix / improvement for that friday, but apparently we're still missing something [12:16:41] (03CR) 10Krinkle: [C: 031] Set $wgMainStash to redis instead of the DB default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221885 (https://phabricator.wikimedia.org/T88493) (owner: 10Aaron Schulz) [12:18:17] (03CR) 10Krinkle: "Code looks good, but I can't find in the commit nor the referenced task anything about why Redis. 
In case this goes side-ways and/or for w" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221885 (https://phabricator.wikimedia.org/T88493) (owner: 10Aaron Schulz) [12:22:49] PROBLEM - Cassanda CQL query interface on restbase1004 is CRITICAL: Connection refused [12:23:59] PROBLEM - Cassandra database on restbase1004 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon [12:25:49] RECOVERY - Cassandra database on restbase1004 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon [12:26:38] RECOVERY - Cassanda CQL query interface on restbase1004 is OK: TCP OK - 0.008 second response time on port 9042 [12:27:46] (03CR) 10JanZerebecki: [C: 031] Remove www.donate.wikimediafoundation.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222876 (https://phabricator.wikimedia.org/T102827) (owner: 10Chmarkine) [12:27:51] Cassanda? [12:28:08] i.e. delenda? :) [12:28:09] (03CR) 10JanZerebecki: [C: 031] Remove www.donate.mediawiki.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222877 (https://phabricator.wikimedia.org/T102827) (owner: 10Chmarkine) [12:36:29] PROBLEM - Restbase root url on restbase1003 is CRITICAL - Socket timeout after 10 seconds [12:37:25] 6operations, 10Traffic, 10fundraising-tech-ops, 5Patch-For-Review: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1427281 (10JanZerebecki) [12:42:11] 6operations, 10Traffic, 10fundraising-tech-ops, 5Patch-For-Review: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1427290 (10JanZerebecki) Regarding the mkt41.net cnames, see also T74514 and T60373. [12:43:58] PROBLEM - puppet last run on restbase1005 is CRITICAL Puppet last ran 2 days ago [12:44:10] (03CR) 10JanZerebecki: [C: 031] Remove www.donate.wikipedia.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222883 (https://phabricator.wikimedia.org/T102827) (owner: 10Chmarkine) [12:46:19] (03PS1) 10Giuseppe Lavagetto: cassandra: raise heap size to 16Gb [puppet] - 10https://gerrit.wikimedia.org/r/222899 [12:47:14] (03CR) 10JanZerebecki: [C: 031] Remove www.donate.wiktionary.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222880 (https://phabricator.wikimedia.org/T102827) (owner: 10Chmarkine) [12:49:23] (03CR) 10Mobrovac: [C: 031] "It should. We have restarted the nodes plenty of times this week due to OOMs." [puppet] - 10https://gerrit.wikimedia.org/r/222899 (owner: 10Giuseppe Lavagetto) [12:49:29] RECOVERY - puppet last run on restbase1005 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures [12:49:44] (03CR) 10Giuseppe Lavagetto: [C: 032] cassandra: raise heap size to 16Gb [puppet] - 10https://gerrit.wikimedia.org/r/222899 (owner: 10Giuseppe Lavagetto) [12:51:36] 6operations, 10RESTBase-Cassandra, 5Patch-For-Review: consider moving Cassandra to G1GC in production - https://phabricator.wikimedia.org/T103161#1427300 (10mobrovac) [12:53:10] RECOVERY - Restbase root url on restbase1003 is OK: HTTP OK: HTTP/1.1 200 - 15149 bytes in 0.019 second response time [12:55:34] !log restbase rolling restart of cassandra to apply the 16G heap change https://gerrit.wikimedia.org/r/222899 [12:55:38] Logged the message, Master [13:31:43] any idea why wikitech instantly looses knowledge of sessions? 
(as in you log in, reload you are logged out; or the login screen already says you need cookies; even though the cookie is correctly sent) [13:44:37] jzerebecki: https://phabricator.wikimedia.org/T104766 was filed recently [13:58:20] PROBLEM - puppet last run on oxygen is CRITICAL Puppet has 1 failures [13:59:29] PROBLEM - puppet last run on eventlog2001 is CRITICAL puppet fail [14:06:48] Nemo_bis: thx, that's it [14:17:30] 6operations, 10OCG-General-or-Unknown, 6Services: Issues with OCG service in production - https://phabricator.wikimedia.org/T104708#1427364 (10TheDJ) Shouldn't we have event logging for this or something ? It all breaks quite common and we seem to find out everything through user feedback... [14:18:20] RECOVERY - puppet last run on eventlog2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:34:18] RECOVERY - puppet last run on oxygen is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [15:13:25] (03PS1) 10Yuvipanda: celery: Create simple module for celery workers [puppet] - 10https://gerrit.wikimedia.org/r/222913 [15:13:27] (03PS1) 10Yuvipanda: ores: Move ores initial install setup into a base class [puppet] - 10https://gerrit.wikimedia.org/r/222914 [15:13:55] (03CR) 10Yuvipanda: [C: 032 V: 032] celery: Create simple module for celery workers [puppet] - 10https://gerrit.wikimedia.org/r/222913 (owner: 10Yuvipanda) [15:16:43] !log restarted nutcracker on silver. [15:16:46] Logged the message, Master [15:20:29] (03CR) 10Yuvipanda: [C: 032] ores: Move ores initial install setup into a base class [puppet] - 10https://gerrit.wikimedia.org/r/222914 (owner: 10Yuvipanda) [15:24:42] (03PS1) 10Yuvipanda: ores: Fix scoping issue with src/venv/config paths [puppet] - 10https://gerrit.wikimedia.org/r/222915 [15:24:54] (03CR) 10Yuvipanda: [C: 032 V: 032] ores: Fix scoping issue with src/venv/config paths [puppet] - 10https://gerrit.wikimedia.org/r/222915 (owner: 10Yuvipanda) [15:39:01] (03PS2) 10Yuvipanda: Remove bastion1 and bastion2 from labs bastion hosts list [puppet] - 10https://gerrit.wikimedia.org/r/222871 (owner: 10Alex Monk) [15:39:11] (03CR) 10Yuvipanda: [C: 032 V: 032] "Thanks! :)" [puppet] - 10https://gerrit.wikimedia.org/r/222871 (owner: 10Alex Monk) [15:50:56] (03PS1) 10Yuvipanda: [WIP] ores: worker role [puppet] - 10https://gerrit.wikimedia.org/r/222919 [16:17:39] PROBLEM - puppet last run on mw1167 is CRITICAL Puppet has 1 failures [16:38:29] RECOVERY - puppet last run on mw1167 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:56:08] PROBLEM - puppet last run on mw1163 is CRITICAL Puppet has 1 failures [16:56:09] PROBLEM - puppet last run on mw1125 is CRITICAL Puppet has 1 failures [17:06:33] (03CR) 10Steinsplitter: "can we merge this pls ASAP? https://phabricator.wikimedia.org/T104178#1427512" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221600 (owner: 10Matanya) [17:08:04] (03CR) 10Alex Monk: "Steinsplitter: Today is a Sunday." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221600 (owner: 10Matanya) [17:10:10] (03CR) 10Steinsplitter: "And tomorrow is monday." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/221600 (owner: 10Matanya) [17:11:09] RECOVERY - puppet last run on mw1125 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:12:49] RECOVERY - puppet last run on mw1163 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [17:17:33] (03CR) 10Awight: "Code should include a "TODO: Horrific workaround to an unspeakable status quo" :D" [puppet] - 10https://gerrit.wikimedia.org/r/222879 (owner: 10BBlack) [17:20:09] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 8.33% of data above the critical threshold [500.0] [17:21:22] (03CR) 10Ori.livneh: "https://youtu.be/kfVsfOSbJY0?t=125" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221600 (owner: 10Matanya) [17:23:31] Krenair: it would be the simplest way to move url whitelisting onwik. O_O [17:23:48] so it don't takes ages to whiteliste it. [17:24:53] Greg said those are ok to deploy on Weekend [17:24:54] btw [17:24:58] it schouldn't be criticism. just a thought :> [17:25:38] PROBLEM - puppet last run on cp3015 is CRITICAL puppet fail [17:34:56] hoo, where/when? [17:35:10] Krenair: Not sure when... probably this channel [17:35:36] +1 ori :p [17:35:47] Fine, I'll deploy it now [17:36:23] Krenair: I have a noc symlink update as well - if you want to do that as iirc that doesn't need any deployment-y stuff :) [17:36:51] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [17:40:08] JohnFLewis, we sync that stuff anyway [17:40:42] Krenair: want me to link you to the patch and you can do it now or wait for a pickup on Monday? [17:40:47] And actually [17:40:50] noc is served from terbium, not tin [17:40:53] So it does need syncing [17:41:11] oh its from terbium? heh thought it was tin [17:41:45] nope [17:42:19] RECOVERY - puppet last run on cp3015 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures [17:42:29] (03PS4) 10Alex Monk: add unibas.ch to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221600 (owner: 10Matanya) [17:43:04] (03CR) 10Alex Monk: [C: 032] add unibas.ch to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221600 (owner: 10Matanya) [17:43:10] (03Merged) 10jenkins-bot: add unibas.ch to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221600 (owner: 10Matanya) [17:44:14] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221600/ (duration: 00m 12s) [17:44:18] Logged the message, Master [17:46:29] JohnFLewis, where is this symlink change then? [17:46:49] Krenair: https://gerrit.wikimedia.org/r/#/c/222290/ [17:47:59] (03CR) 10Alex Monk: [C: 032] refresh symlinks to catch new dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222290 (owner: 10John F. Lewis) [17:48:30] (03Merged) 10jenkins-bot: refresh symlinks to catch new dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222290 (owner: 10John F. Lewis) [17:49:24] !log krenair Synchronized docroot/noc/conf: https://gerrit.wikimedia.org/r/#/c/222290/ (duration: 00m 13s) [17:49:28] Logged the message, Master [17:51:34] JohnFLewis, I reckon we're still missing stuff for those dblists [17:51:44] e.g. aren't we supposed to list them at https://noc.wikimedia.org/conf/ ? 
[17:52:20] and https://noc.wikimedia.org/conf/visualeditor.dblist and https://noc.wikimedia.org/conf/mediaviewer.dblist are 403 [17:52:24] (but nonglobal.dblist is fine) [17:53:04] and also https://noc.wikimedia.org/conf/highlight.php?file=mediaviewer.dblist and https://noc.wikimedia.org/conf/highlight.php?file=visualeditor.dblist don't work either [17:53:05] https://github.com/wikimedia/operations-mediawiki-config/blob/master/docroot/noc/conf/index.php#L63 [17:53:22] honestly I wonder why we still have this now that it's all in a public git repo [17:53:37] Yeah... [17:53:46] Because it's more convenient for end users, I guess [17:53:57] Would be better to build it up on the public git repo, thought [17:54:05] * though [17:54:06] they just were updated when I ran that script for another conf change so I just committed those as well [17:54:31] so hang on... those two don't actually exist? [17:54:34] I can't find them [17:54:51] someone didn't clean things up then apparently [17:55:41] We did have a visualeditor.dblist at one point [17:55:59] But apparently not anymore. [17:56:14] thats why I committed them as-is because I remember them existing though I didn't know they were removed then at some point [17:56:14] mediaviewer.dblist was also removed [17:56:28] ok [17:57:10] ahh, I see [17:57:17] they're still listed in createTxtFileSymlinks [17:57:21] I'll clean this up [18:01:47] (03PS1) 10Alex Monk: Clean up noc symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222932 [18:11:16] (03CR) 10Alex Monk: [C: 032] Clean up noc symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222932 (owner: 10Alex Monk) [18:11:22] (03Merged) 10jenkins-bot: Clean up noc symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222932 (owner: 10Alex Monk) [18:11:58] !log krenair Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/222932/ (duration: 00m 12s) [18:12:02] Logged the message, Master [19:10:26] (03PS1) 10Alex Monk: Update README to remove pmtpa references [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222941 [19:10:28] (03PS1) 10Alex Monk: Get rid of most of noc.wikimedia.org/conf [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222942 [19:10:33] JohnFLewis, hoo: ^ [19:10:35] (03CR) 10jenkins-bot: [V: 04-1] Get rid of most of noc.wikimedia.org/conf [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222942 (owner: 10Alex Monk) [19:11:27] Krenair: Don't think we can do that [19:11:32] why not? [19:11:40] Because we want to keep the urls working [19:11:42] I guess [19:11:49] we could redirect stuff [19:11:51] We could redirect them to $gitfileview [19:12:10] well, but probably not git.wm.o as that's more often down than up [19:12:47] (03PS2) 10Alex Monk: Get rid of most of noc.wikimedia.org/conf [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222942 [19:41:00] (03CR) 10John F. 
Lewis: [C: 031] Update README to remove pmtpa references [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222941 (owner: 10Alex Monk) [19:49:18] PROBLEM - puppet last run on cp3010 is CRITICAL puppet fail [20:05:58] RECOVERY - puppet last run on cp3010 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:14:51] looks like cassandra on restbase1005 could use a restart (looking at http://grafana.wikimedia.org/#/dashboard/db/restbase-cassandra-thread-pools ) [20:16:48] PROBLEM - puppet last run on oxygen is CRITICAL Puppet has 1 failures [20:16:49] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [20:16:58] subbu, what's going on with cassandra on restbase? [20:17:35] MaxSem, i don't know .. all i know is that marko gabriel and erik have been working on for a couple days before the weekend. [20:18:46] i periodically chekc parsoid ganglia graphs (a few times in the day) and that is how I discovered that something is off since there is very little load on the parsoid cluster right now. [20:20:26] ok, emailing the ops list and signing off. [20:26:10] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [20:35:09] RECOVERY - puppet last run on oxygen is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [21:07:09] (03CR) 10Alex Monk: [C: 04-1] "todo: have these urls redirect to git" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222942 (owner: 10Alex Monk) [21:33:52] (03CR) 10BBlack: [C: 04-1] "text-lb doesn't have certs for this, and I don't think it's the right answer to this problem." [dns] - 10https://gerrit.wikimedia.org/r/222860 (https://phabricator.wikimedia.org/T104735) (owner: 10John F. Lewis) [22:03:42] !log restbase rolling restart of restbase [22:03:46] Logged the message, Master [22:08:10] PROBLEM - Restbase root url on restbase1001 is CRITICAL: Connection refused [22:18:07] (03PS2) 10Ori.livneh: tlsproxy: add negotiated cipher to conn props [puppet] - 10https://gerrit.wikimedia.org/r/222842 (owner: 10BBlack) [22:18:08] (03PS1) 10Ori.livneh: varnishxcps: transform 'C' key to 'ssl_cipher' [puppet] - 10https://gerrit.wikimedia.org/r/222983 [22:19:19] RECOVERY - Restbase root url on restbase1001 is OK: HTTP OK: HTTP/1.1 200 - 15149 bytes in 0.014 second response time [22:19:21] (03CR) 10Ori.livneh: [C: 031] "LGTM. I5f3fcf87a8 should go out first (and be allowed to propagate to all Varnishes), so we don't end up with a 'C' metric in Graphite." [puppet] - 10https://gerrit.wikimedia.org/r/222842 (owner: 10BBlack) [22:23:06] (03CR) 10Ori.livneh: [C: 032] varnishxcps: transform 'C' key to 'ssl_cipher' [puppet] - 10https://gerrit.wikimedia.org/r/222983 (owner: 10Ori.livneh) [22:30:37] !log Restarted logstash on logstah1001; Hung due to OOM errors [22:30:42] Logged the message, Master [22:31:46] That's the second time in less than a week that logstash has OOM'ed on logstash1001. Something new [22:32:53] joy [23:05:09] Help! I think MediaWiki is stupid! [23:05:44] Yeah, it's run by computers... we know that. [23:05:45] With the humourous beginning out of the way, I need serious help with a bug that's preventing me from helping with a serious privacy violation. [23:06:05] See the repetition? I'm /that/ stressed out [23:08:20] (03Abandoned) 10John F. Lewis: (www.)wmfusercontent.org point to text-lb [dns] - 10https://gerrit.wikimedia.org/r/222860 (https://phabricator.wikimedia.org/T104735) (owner: 10John F. 
Lewis) [23:10:00] * odder sobs help help help [23:10:35] Did you file it? [23:14:46] Not sure it's a bug though, maybe it's the supposed behaviour [23:14:52] Need troubleshooting first [23:15:51] If it's a serious issue, you can still open a ticket [23:56:30] 6operations, 10Deployment-Systems, 5Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1428414 (10bd808) >>! In T95436#1423807, @Krenair wrote: > How are we going to handle sync of mediawiki-staging between tin and mira? Wouldn't we want any sort of gi...
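On the varnishxcps change above (r222983): per its commit message it only renames the terse 'C' key, which r222842 adds to the connection-properties data for the negotiated cipher, into a readable 'ssl_cipher' name before the metrics reach Graphite. The script itself is not quoted in the log; a sketch of that kind of key renaming, with an assumed header format and assumed names for the other keys, might look like:

```python
# Sketch of the key renaming described by r222983: expand terse
# connection-properties keys into readable metric names before emitting stats.
# The "K1=v1; K2=v2" format and the non-'C' key names are assumptions,
# not taken from the real varnishxcps script.
KEY_NAMES = {
    "C": "ssl_cipher",     # negotiated cipher, added by r222842
    "SSL": "ssl_version",  # assumed
    "H2": "http2",         # assumed
}

def parse_conn_props(header_value: str) -> dict:
    """Parse 'K1=v1; K2=v2; ...' into {metric_key: value}."""
    props = {}
    for pair in header_value.split(";"):
        pair = pair.strip()
        if not pair or "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        props[KEY_NAMES.get(key, key)] = value
    return props

print(parse_conn_props("H2=0; SSL=TLSv1.2; C=ECDHE-ECDSA-AES128-GCM-SHA256"))
# {'http2': '0', 'ssl_version': 'TLSv1.2', 'ssl_cipher': 'ECDHE-ECDSA-AES128-GCM-SHA256'}
```

Hence the deploy-ordering note in Ori's review above: the rename has to be in place and propagated before the cipher key starts flowing, so a stray 'C' metric never lands in Graphite.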