[01:02:10] PROBLEM - Check health of redis instance on 6379 on rdb2001 is CRITICAL: CRITICAL: replication_delay is 1498957325 600 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9002661 keys, up 2 minutes 3 seconds - replication_delay is 1498957325
[01:02:20] PROBLEM - Check health of redis instance on 6480 on rdb2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:02:20] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:03:00] PROBLEM - Check health of redis instance on 6380 on rdb2003 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6380
[01:03:00] PROBLEM - Check health of redis instance on 6379 on rdb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:03:10] RECOVERY - Check health of redis instance on 6379 on rdb2001 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8999531 keys, up 3 minutes 3 seconds - replication_delay is 0
[01:03:10] RECOVERY - Check health of redis instance on 6480 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 4293885 keys, up 3 minutes 3 seconds - replication_delay is 0
[01:03:11] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 4291547 keys, up 3 minutes 4 seconds - replication_delay is 0
[01:03:50] RECOVERY - Check health of redis instance on 6380 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6380 has 1 databases (db0) with 8997237 keys, up 3 minutes 42 seconds - replication_delay is 0
[01:03:50] RECOVERY - Check health of redis instance on 6379 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8993149 keys, up 3 minutes 47 seconds - replication_delay is 0
[01:04:50] PROBLEM - puppet last run on dbproxy1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:12:20] PROBLEM - Check systemd state on conf2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:12:38] PROBLEM - Etcd replication lag on conf2002 is CRITICAL: connect to address 10.192.32.141 and port 8000: Connection refused
[01:13:20] PROBLEM - etcdmirror-conftool-eqiad-wmnet service on conf2002 is CRITICAL: CRITICAL - Expecting active but unit etcdmirror-conftool-eqiad-wmnet is failed
[01:32:00] RECOVERY - puppet last run on dbproxy1003 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[01:36:40] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes1001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[01:37:40] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes1001 is OK: OK ferm input default policy is set
[01:42:10] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes2003 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[01:42:30] PROBLEM - Host mw2256 is DOWN: PING CRITICAL - Packet loss = 100%
[01:43:10] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes2003 is OK: OK ferm input default policy is set
[01:46:44] Operations, Phabricator, Release-Engineering-Team: Viewing raw files on phab fails with ERROR_MESSAGE_MAIN - https://phabricator.wikimedia.org/T169454#3398729 (Paladox)
[01:48:46] Operations, Phabricator, Release-Engineering-Team: Viewing raw files on phab fails with ERROR_MESSAGE_MAIN - https://phabricator.wikimedia.org/T169454#3398741 (Paladox) p:Triage>High I am filling this in #operations because the error sound unrelated to phabricator internal code. I think this...
[01:52:41] Operations, Phabricator, Release-Engineering-Team: Viewing raw files on phab fails with ERROR_MESSAGE_MAIN - https://phabricator.wikimedia.org/T169454#3398743 (Paladox) But the rewrites seem to work for https://phabzilla.wmflabs.org/file/data/yydp2qcdrep6lrv2vlxn/PHID-FILE-wnr4rr7iatru4jo2ityf/index....
[01:54:23] (PS4) Paladox: Update npm to 2.x and nodejs to 4.x [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/303370
[01:54:32] (PS5) Paladox: Update npm to 2.x and nodejs to 4.x [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/303370
[02:01:14] (PS6) Paladox: Update npm to 2.x and nodejs to 4.x [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/303370
[02:03:52] (PS7) Paladox: Update npm to 4.x and nodejs to 6.x [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/303370
[02:07:32] (PS8) Paladox: Update npm to 4.x and nodejs to 6.x [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/303370
[02:17:30] RECOVERY - etcdmirror-conftool-eqiad-wmnet service on conf2002 is OK: OK - etcdmirror-conftool-eqiad-wmnet is active
[02:20:30] PROBLEM - etcdmirror-conftool-eqiad-wmnet service on conf2002 is CRITICAL: CRITICAL - Expecting active but unit etcdmirror-conftool-eqiad-wmnet is failed
[02:38:20] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes2003 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[02:39:20] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes2003 is OK: OK ferm input default policy is set
[02:47:43] RECOVERY - Etcd replication lag on conf2002 is OK: HTTP OK: HTTP/1.1 200 OK - 148 bytes in 0.073 second response time
[02:47:43] RECOVERY - etcdmirror-conftool-eqiad-wmnet service on conf2002 is OK: OK - etcdmirror-conftool-eqiad-wmnet is active
[02:47:50] RECOVERY - Check systemd state on conf2002 is OK: OK - running: The system is fully operational
[03:02:10] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes1002 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[03:03:10] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes1002 is OK: OK ferm input default policy is set
[03:30:14] (CR) BryanDavis: [C: -2] Update npm to 4.x and nodejs to 6.x [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/303370 (owner: Paladox)
[03:31:43] (CR) BryanDavis: [C: -2] "Running a `curl | sudo` anything is not safe. We will not be doing things like that." [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/303370 (owner: Paladox)
[03:32:41] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes2002 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[03:32:50] PROBLEM - puppet last run on mw2135 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[03:33:10] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[03:33:40] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes2002 is OK: OK ferm input default policy is set
[03:33:44] (CR) Zhuyifei1999: Update npm to 4.x and nodejs to 6.x (2 comments) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/303370 (owner: Paladox)
[04:00:20] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[04:01:10] RECOVERY - puppet last run on mw2135 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[04:06:00] PROBLEM - puppet last run on graphite1003 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[bump nf_conntrack hash table size],Service[carbon]
[04:14:10] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=567.40 Read Requests/Sec=4307.60 Write Requests/Sec=2.50 KBytes Read/Sec=44128.00 KBytes_Written/Sec=48.80
[04:21:20] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=3.90 Read Requests/Sec=0.20 Write Requests/Sec=13.40 KBytes Read/Sec=0.80 KBytes_Written/Sec=146.40
[04:34:10] RECOVERY - puppet last run on graphite1003 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[04:55:02] Operations, DBA, Wikimedia-Language-setup, Wikimedia-Site-requests, and 2 others: Reopen Wikinews Dutch - https://phabricator.wikimedia.org/T168764#3398774 (Urbanecm) Great. I'll schedule the reopening for Wednesday.
[05:37:27] Operations, DBA, Wikimedia-Site-requests: Global rename of Markos90 → Mαρκος: supervision needed - https://phabricator.wikimedia.org/T169396#3398775 (Marostegui) Please ping me before you do the rename so I can monitor the DBs Thanks!
[05:44:07] Operations, ops-eqiad, DBA: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T169448#3398777 (Marostegui) p:Triage>Normal This is a s1 slave - @Cmjohnson please change the disk when you are back from holidays. If you need to get some used disks, there are some hosts scheduled for d...
[06:57:10] RECOVERY - MariaDB Slave Lag: s7 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 89904.44 seconds
[07:12:30] PROBLEM - Nginx local proxy to apache on mw2111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:13:20] RECOVERY - Nginx local proxy to apache on mw2111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.199 second response time
[07:44:00] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[07:44:50] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[07:50:50] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:54:00] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:25:30] PROBLEM - graphite.wikimedia.org on graphite1003 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.001 second response time
[10:43:00] PROBLEM - Check whether ferm is active by checking the default input chain on mwlog2001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[10:44:00] RECOVERY - Check whether ferm is active by checking the default input chain on mwlog2001 is OK: OK ferm input default policy is set
[10:55:01] marostegui: ping
[11:04:15] This user is now online in #wikimedia-operations. I'll let you know when they show some activity (talk, etc.)
[11:04:15] @notify marostegui
[11:12:00] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[11:13:00] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[11:16:00] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[11:24:00] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[12:44:17] !log powercycle mw2256 https://phabricator.wikimedia.org/P5662 T163346
[12:44:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:44:29] T163346: mw2256 - hardware issue - https://phabricator.wikimedia.org/T163346
[12:45:40] RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 36.06 ms
[13:09:53] (PS1) D3r1ck01: Remove 'din' from wmgExtraLanguageNames [mediawiki-config] - https://gerrit.wikimedia.org/r/362876 (https://phabricator.wikimedia.org/T168523)
[14:02:20] PROBLEM - CPU frequency on tin is CRITICAL: CRITICAL: CPU frequency is 600 MHz (160 MHz)
[14:07:50] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on tin is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:18:21] marostegui: global rename time?
[15:37:30] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:38:20] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient
[15:50:20] RECOVERY - CPU frequency on tin is OK: OK: CPU frequency is = 600 MHz (1199 MHz)
[15:57:40] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on tin is OK: Files ownership is ok.
[16:03:01] WARNING: Current database lag 13 s violates maxlag of 6 s, waiting 6 s <-- isn't 13 unusual high?
[16:04:36] TabtbyCat: meaow, today is saturday. :P
[16:06:28] Steinsplitter: no, today in Sunday
[16:06:41] yes .
[16:06:43] so. meh.
[16:06:59] heh
[16:10:10] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0]
[16:22:18] flood of spambot registrations happening right now, sigh
[16:22:40] but how... they deal with captcha?
[16:23:10] our captcha is too easy
[16:23:17] :/
[16:23:33] or they have humans resolving it for them? idk
[16:25:10] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[16:26:30] https://anti-captcha.com/ CAPTCHA is outdated method. "ajax" confirmation is better way.
[16:27:10] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[17:07:10] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[17:08:20] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[17:15:20] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[17:16:10] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[17:47:51] Operations, Gerrit, Release-Engineering-Team: Gerrit: can lose data if it crashes - https://phabricator.wikimedia.org/T159743#3399128 (Paladox)
[17:47:54] Operations, Gerrit: Decide how to support polygerrit - https://phabricator.wikimedia.org/T158479#3399129 (Paladox)
[19:11:20] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0]
[19:19:20] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[19:21:20] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[19:22:20] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[19:45:30] PROBLEM - Apache HTTP on mw1279 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[19:45:40] PROBLEM - Nginx local proxy to apache on mw1279 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.006 second response time
[19:45:40] PROBLEM - Nginx local proxy to apache on mw1204 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.009 second response time
[19:46:10] PROBLEM - HHVM rendering on mw1279 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[19:46:40] RECOVERY - Nginx local proxy to apache on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.061 second response time
[19:47:10] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 74897 bytes in 0.429 second response time
[19:47:30] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.093 second response time
[19:47:40] RECOVERY - Nginx local proxy to apache on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.052 second response time
[19:57:48] Operations, Traffic, netops: codfw row B switch upgrade - https://phabricator.wikimedia.org/T169345#3399300 (ayounsi)
[19:58:47] Operations, Traffic, netops: codfw row B switch upgrade - https://phabricator.wikimedia.org/T169345#3395298 (ayounsi) Edit: moving the maintenance to Wednesday July 12 for availability reasons.
[20:19:58] Operations, cloud-services-team, Upstream: New anti-stackclash (4.9.25-1~bpo8+3 ) kernel super bad for NFS - https://phabricator.wikimedia.org/T169290#3399312 (Nemo_bis)
[20:23:18] Operations, cloud-services-team, Upstream: New anti-stackclash (4.9.25-1~bpo8+3 ) kernel super bad for NFS - https://phabricator.wikimedia.org/T169290#3393693 (Nemo_bis) "A total of 16,214 non-merge changesets were pulled into the mainline repository for the 4.9 development cycle, making this cycle t...
[21:09:20] PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:38:30] RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[21:53:30] PROBLEM - HHVM rendering on mw2120 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:54:20] RECOVERY - HHVM rendering on mw2120 is OK: HTTP OK: HTTP/1.1 200 OK - 74865 bytes in 0.302 second response time
[22:45:20] PROBLEM - HHVM rendering on mw2209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:46:10] RECOVERY - HHVM rendering on mw2209 is OK: HTTP OK: HTTP/1.1 200 OK - 74865 bytes in 0.348 second response time
[23:28:33] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:29:23] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 173 bytes in 0.046 second response time
[23:31:01] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[23:32:30] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[23:32:40] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[23:35:30] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[23:38:30] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:39:30] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:40:10] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:40:40] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]