[01:36:30] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[01:38:40] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[01:39:40] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[01:39:50] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[01:45:30] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:46:50] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:47:40] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:48:40] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[02:19:47] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.4) (duration: 07m 37s)
[02:19:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:25:53] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Jun 11 02:25:53 UTC 2017 (duration 6m 6s)
[02:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:23:45] Operations, Labs, Striker, LDAP: Store Wikimedia unified account name (SUL) in LDAP directory - https://phabricator.wikimedia.org/T148048#3337599 (bd808) @faidon, @Volans, and I talked about this at the Vienna hackathon. Moving this data from #striker's local DB to LDAP would be a useful step in...
[04:15:40] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=5786.00 Read Requests/Sec=4644.80 Write Requests/Sec=13.00 KBytes Read/Sec=18748.40 KBytes_Written/Sec=4841.60
[04:22:40] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=6.40 Read Requests/Sec=1.30 Write Requests/Sec=2.20 KBytes Read/Sec=16.80 KBytes_Written/Sec=73.20
[06:47:40] PROBLEM - puppet last run on mw1260 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[scap]
[07:15:50] RECOVERY - puppet last run on mw1260 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[12:21:40] We had a performance alert yesterday and I redeployed the agent we use (it runs on AWS US East (N. Virginia)) and every run now complains about problems with the security certificate like in https://phabricator.wikimedia.org/T167572#3337802
[13:01:30] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:01:30] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:01:30] PROBLEM - Nginx local proxy to apache on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:02:20] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.339 second response time
[13:02:20] RECOVERY - Nginx local proxy to apache on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 615 bytes in 1.414 second response time
[13:02:20] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 74697 bytes in 2.220 second response time
[13:22:51] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:23:40] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:14:54] !log executed cumin 'mw22[51-60].codfw.wmnet' 'find /var/log/hhvm/* -user root -exec chown www-data:www-data {} \;' to reduce cron-spam (new hosts added in March) - T146464
[14:15:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:04] T146464: hhvm root:adm owned log files cause failures for logrotate - https://phabricator.wikimedia.org/T146464
[14:17:40] Operations, User-Elukey: hhvm root:adm owned log files cause failures for logrotate - https://phabricator.wikimedia.org/T146464#3337980 (elukey) On mw2251 today: ``` root@mw2251:/var/log/hhvm# ls -lht total 24M -rw-r----- 1 www-data www-data 0 Jun 10 06:25 error.log -rw-r----- 1 www-data www-data...
[14:32:01] Operations, User-Elukey: hhvm root:adm owned log files cause failures for logrotate - https://phabricator.wikimedia.org/T146464#3337983 (elukey) A possible fix for this issue might be to add `FileCreateMode="0640" FileOwner="www-data" FileGroup="www-data"` to the HHVM rsyslog config. I suspect that until...
[14:43:30] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:43:30] PROBLEM - Nginx local proxy to apache on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:43:30] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:44:20] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.190 second response time
[14:44:20] RECOVERY - Nginx local proxy to apache on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 615 bytes in 1.254 second response time
[14:44:20] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 74769 bytes in 2.292 second response time
[15:11:30] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:11:30] PROBLEM - Nginx local proxy to apache on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:11:30] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:12:20] RECOVERY - Nginx local proxy to apache on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 615 bytes in 2.790 second response time
[15:12:20] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.751 second response time
[15:12:20] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 74707 bytes in 3.613 second response time
[15:29:00] PROBLEM - Apache HTTP on mw1192 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.076 second response time
[15:29:01] PROBLEM - Nginx local proxy to apache on mw1192 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.161 second response time
[15:30:00] RECOVERY - Apache HTTP on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.541 second response time
[15:30:00] RECOVERY - Nginx local proxy to apache on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.183 second response time
[15:32:31] this one was killed due to hhvm.service: main process exited, code=killed, status=11/SEGV
[15:32:38] moritzm: --^
[15:34:11] there is a stacktrace in /var/log/hhvm
[15:34:27] (not sure if it is already part of another bug report, mentioning it anyway)
[15:34:30] :)
[15:43:27] Operations, DBA, Wikimedia-Site-requests: Renaming Neoalpha: supervision needed - https://phabricator.wikimedia.org/T167597#3338004 (revi)
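The 14:32 comment above proposes having rsyslog create the HHVM log files with the right ownership in the first place, rather than chowning them afterwards. A minimal sketch of those omfile settings is shown below; the snippet path and the programname filter are assumptions for illustration, not the actual puppet-managed config on the appservers:

```
# /etc/rsyslog.d/20-hhvm.conf (hypothetical path, illustration only)
# Create HHVM log files as www-data:www-data with mode 0640 so logrotate
# never runs into root-owned files.
if $programname == 'hhvm' then {
    action(type="omfile"
           file="/var/log/hhvm/error.log"
           fileOwner="www-data"
           fileGroup="www-data"
           fileCreateMode="0640")
    stop
}
```

With ownership handled at file-creation time, a one-off chown like the 14:14 cumin run would presumably only be needed for files that already exist on the affected hosts.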
[16:35:54] (PS1) Halfak: Adds require ::icinga::plugins to ::ores::base [puppet] - https://gerrit.wikimedia.org/r/358240
[17:27:25] (PS2) Zppix: Adds require ::icinga::plugins to ::ores::base [puppet] - https://gerrit.wikimedia.org/r/358240 (https://phabricator.wikimedia.org/T167602) (owner: Halfak)
[18:01:50] PROBLEM - Host scb2005 is DOWN: PING CRITICAL - Packet loss = 100%
[19:30:00] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:30:00] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:30:00] PROBLEM - nutcracker process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:31:50] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[19:31:50] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient
[19:31:50] RECOVERY - nutcracker process on thumbor1002 is OK: PROCS OK: 1 process with UID = 115 (nutcracker), command name nutcracker
[20:04:52] hey are any security folks around at the moment?
[20:37:54] Reedy: got a min?
[20:47:15] Zppix: email security@wikimedia.org or file a bug in phabricator
[20:48:28] (CR) Krinkle: [C: 1] For HHVM set LANG=C.UTF-8 [puppet] - https://gerrit.wikimedia.org/r/353228 (https://phabricator.wikimedia.org/T107128) (owner: Tim Starling)
[21:04:00] PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2006254
[22:04:00] RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 47
[22:06:20] Zppix: Hmm?
[22:11:47] (PS1) Framawiki: planet: cleanup en_config.erb [puppet] - https://gerrit.wikimedia.org/r/358301
[22:24:49] Reedy: nevermind ill just email security@wikimedia.org
[22:51:09] (PS1) Odder: Add high-density logos for the Basque Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/358303 (https://phabricator.wikimedia.org/T150618)
[22:53:38] (PS2) Odder: Add high-density logos for the Basque Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/358303 (https://phabricator.wikimedia.org/T150618)
[23:10:06] (CR) Dereckson: [C: 1] planet: cleanup en_config.erb [puppet] - https://gerrit.wikimedia.org/r/358301 (owner: Framawiki)
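The 15:32 and 15:34 remarks above report an HHVM main process killed with SIGSEGV (status=11/SEGV) and a stacktrace left in /var/log/hhvm. A rough sketch of how one might confirm the same diagnosis on an affected appserver follows; the hostname and time window are placeholders, not taken from the log:

```
# Placeholders: substitute the affected host and the relevant time range.
ssh mw1192.eqiad.wmnet

# systemd records why the unit's main process exited
# (look for "code=killed, status=11/SEGV").
systemctl status hhvm.service
journalctl -u hhvm.service --since "2017-06-11 15:00" --until "2017-06-11 16:00"

# The newest crash stacktrace should be near the top of the log directory.
ls -lt /var/log/hhvm/ | head
```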