[00:08:31] PROBLEM - Restbase root url on restbase1017 is CRITICAL: connect to address 10.64.32.129 and port 7231: Connection refused
[00:14:23] PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/page/mobile-sections/{title}{/revision} (Get mobile-sections for a test page on enwiki) timed out before a response was received
[00:14:35] RECOVERY - Restbase root url on restbase1017 is OK: HTTP OK: HTTP/1.1 200 - 16164 bytes in 0.005 second response time
[00:15:31] RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy
[03:33:05] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 985.36 seconds
[04:21:45] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 267.01 seconds
[05:00:48] PROBLEM - Host text-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[05:01:19] RECOVERY - Host text-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 83.32 ms
[05:01:56] bah, that paged =P
[05:02:45] silly icinga false positives (it seems)
[05:05:49] PROBLEM - Host lvs3001 is DOWN: PING CRITICAL - Packet loss = 100%
[05:15:44] <_joe_> robh: not a false positive
[05:16:03] <_joe_> (I'm still waking up, I'll take a look at lvs3001 shortly)
[05:16:04] bah, i saw it come back but did not see the lvs3001 fail shortly after
[05:16:25] <_joe_> robh: I tried to sleep through it (it's 6 AM here)
[05:16:27] mostly cuz im not at home and using mifi to handle this from my car in a parking lot (not ideal)
[05:16:39] <_joe_> robh: I'll take a look in a few
[05:16:39] but, i can totally call/text folks
[05:16:53] <_joe_> don't worry, I got this, in a few :D
[05:16:56] ok, let me know if you want me to send out any further texts (cuz we saw an alert and then an ok)
[05:17:03] so i bet most folks think its fine
[05:17:07] <_joe_> nah no problem for now
[05:17:10] cool
[05:17:32] <_joe_> yeah, one should always take a look, even if it's immediately recovered
[05:17:47] cool, i was already in car when i got the page and hadn't pulled out of parking space =]
[05:17:55] now i shall, offline for a bit.
[05:18:06] <_joe_> yeah, no worries :)
[05:18:18] i got paged for the text-lb, but not for lvs
[05:18:24] fyi
[05:38:29] <_joe_> !log powercycling lvs3001
[05:38:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:41:13] RECOVERY - Host lvs3001 is UP: PING WARNING - Packet loss = 64%, RTA = 83.44 ms
[06:28:27] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.007 second response time
[06:28:37] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:28:39] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apparmor.d/abstractions/ssl_certs]
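An aside on the MariaDB Slave Lag alert at 03:33 above: it tripped at roughly 985 seconds and cleared at 267, so the critical threshold sits somewhere in between. A minimal sketch of that style of probe, assuming direct MySQL access; the hostname, credentials and thresholds here are illustrative guesses, not the production check:

    # Hypothetical replication-lag probe in the spirit of the icinga check above.
    import sys
    import pymysql

    WARN_SECONDS = 300   # assumed thresholds; the real ones are not in the log
    CRIT_SECONDS = 600

    conn = pymysql.connect(host="dbstore1002.example", user="monitor",
                           password="secret",
                           cursorclass=pymysql.cursors.DictCursor)
    with conn.cursor() as cur:
        cur.execute("SHOW SLAVE STATUS")
        row = cur.fetchone()

    lag = row["Seconds_Behind_Master"] if row else None
    if lag is None:
        print("UNKNOWN slave_sql_lag: replication not running")
        sys.exit(3)
    if lag >= CRIT_SECONDS:
        print(f"CRITICAL slave_sql_lag Replication lag: {lag:.2f} seconds")
        sys.exit(2)
    if lag >= WARN_SECONDS:
        print(f"WARNING slave_sql_lag Replication lag: {lag:.2f} seconds")
        sys.exit(1)
    print(f"OK slave_sql_lag Replication lag: {lag:.2f} seconds")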
[06:31:23] PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/R/biocLite.R]
[06:32:47] PROBLEM - puppet last run on phab1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/field.sh]
[06:38:09] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.550 second response time
[06:38:19] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[06:57:23] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:45] RECOVERY - puppet last run on phab1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:59:53] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:04:08] (PS1) ArielGlenn: convert snapshot1005 to a regular dump runner [puppet] - https://gerrit.wikimedia.org/r/481285 (https://phabricator.wikimedia.org/T203382)
[07:24:37] (PS2) ArielGlenn: convert snapshot1005 to a regular dump runner [puppet] - https://gerrit.wikimedia.org/r/481285 (https://phabricator.wikimedia.org/T203382)
[07:28:27] (PS16) Mathew.onipe: elasticsearch: allow cross cluster communication [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434)
[07:35:36] (PS3) ArielGlenn: convert snapshot1005 to a regular dump runner [puppet] - https://gerrit.wikimedia.org/r/481285 (https://phabricator.wikimedia.org/T203382)
[07:44:45] (PS4) ArielGlenn: convert snapshot1005 to a regular dump runner [puppet] - https://gerrit.wikimedia.org/r/481285 (https://phabricator.wikimedia.org/T203382)
[08:03:37] (PS5) ArielGlenn: convert snapshot1005 to a regular dump runner [puppet] - https://gerrit.wikimedia.org/r/481285 (https://phabricator.wikimedia.org/T203382)
[08:21:43] (CR) ArielGlenn: [C: +2] convert snapshot1005 to a regular dump runner [puppet] - https://gerrit.wikimedia.org/r/481285 (https://phabricator.wikimedia.org/T203382) (owner: ArielGlenn)
[08:33:06] (PS17) Mathew.onipe: elasticsearch: allow cross cluster communication [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434)
[08:45:43] Operations, decommission, User-fgiunchedi: Return graphite200[12] to spares pool - https://phabricator.wikimedia.org/T199321 (fgiunchedi) a: fgiunchedi→RobH >>! In T199321#4840822, @RobH wrote: > @fgiunchedi: can you confirm these are ready for reclaim and disk wipe? I claimed it, but I likel...
[08:51:59] Operations, decommission, Patch-For-Review, User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (fgiunchedi)
[09:00:44] Operations, ops-codfw: Degraded RAID on ms-be2018 - https://phabricator.wikimedia.org/T212560 (fgiunchedi) Open→Invalid Host's raid controller locked up, a reboot brought things back. I'll update the controller firmware after the holidays JIC. ` ms-be2018:~$ cat /proc/mdstat Personalities : [ra...
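The ms-be2018 RAID task above was closed after a reboot, verified by eyeballing /proc/mdstat. A rough sketch of automating that eyeball check, assuming the usual mdstat layout where a status field like [UU_U] marks a missing member with an underscore (parsing is simplified):

    # Flag degraded md arrays by scanning /proc/mdstat for "_" in the [UU...]
    # status field of each array's detail line.
    import re

    def degraded_arrays(mdstat_text):
        arrays, current = [], None
        for line in mdstat_text.splitlines():
            header = re.match(r"^(md\d+)\s*:", line)
            if header:
                current = header.group(1)
            elif current and (status := re.search(r"\[([U_]+)\]", line)):
                if "_" in status.group(1):
                    arrays.append(current)
                current = None
        return arrays

    with open("/proc/mdstat") as f:
        bad = degraded_arrays(f.read())
    print("CRITICAL: degraded: " + ", ".join(bad) if bad else "OK: all md arrays clean")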
[09:02:15] (PS18) Mathew.onipe: elasticsearch: allow cross cluster communication [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434)
[09:20:46] (PS19) Mathew.onipe: elasticsearch: allow cross cluster communication [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434)
[09:21:41] Operations, RESTBase-Cassandra, Services: restbase cassandra driver excessive logging when cassandra hosts are down - https://phabricator.wikimedia.org/T212424 (fgiunchedi) Agreed less workers will lessen the problem, though even per-worker logging (assuming different workers have different `pid` in...
[10:11:15] PROBLEM - Restbase root url on restbase1018 is CRITICAL: connect to address 10.64.48.97 and port 7231: Connection refused
[10:14:55] (PS20) Giuseppe Lavagetto: elasticsearch: allow cross cluster communication [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434) (owner: Mathew.onipe)
[10:17:19] RECOVERY - Restbase root url on restbase1018 is OK: HTTP OK: HTTP/1.1 200 - 16164 bytes in 0.007 second response time
[10:32:06] (PS21) Mathew.onipe: elasticsearch: allow cross cluster communication [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434)
[10:42:26] (CR) Giuseppe Lavagetto: [C: -1] "Please be careful when merging this patch. It needs to be compiled on all clusters that don't define profile::elasticsearch::instances too" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434) (owner: Mathew.onipe)
[10:47:15] (PS22) Mathew.onipe: elasticsearch: allow cross cluster communication [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434)
[10:48:01] (CR) Mathew.onipe: elasticsearch: allow cross cluster communication (1 comment) [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434) (owner: Mathew.onipe)
[10:50:17] (CR) Mathew.onipe: elasticsearch: allow cross cluster communication (1 comment) [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434) (owner: Mathew.onipe)
[11:03:50] (CR) Mathew.onipe: "PCC is happy: https://puppet-compiler.wmflabs.org/compiler1002/14082/" [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434) (owner: Mathew.onipe)
[11:09:05] (CR) Mathew.onipe: elasticsearch: allow cross cluster communication (1 comment) [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434) (owner: Mathew.onipe)
[11:28:47] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[11:28:47] PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[11:32:23] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[11:32:25] RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
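The "Restbase root url" checks above are plain HTTP probes against port 7231 (note the byte counts and response times in the recoveries). A small stand-in with the same shape, using the address from the restbase1018 alert; the production check is icinga's check_http, not this script:

    # Probe an HTTP root URL and report in roughly the same format as the alerts.
    import sys
    import time
    import urllib.request

    url = "http://10.64.48.97:7231/"
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read()
            elapsed = time.monotonic() - start
            print(f"HTTP OK: HTTP/1.1 {resp.status} - {len(body)} bytes "
                  f"in {elapsed:.3f} second response time")
            sys.exit(0)
    except OSError as exc:  # covers connection refused as well as timeouts
        print(f"CRITICAL: {exc}")
        sys.exit(2)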
[11:51:27] Operations, Release Pipeline, Core Platform Team Backlog (Watching / External), Release-Engineering-Team (Watching / External), Services (watching): Revisit the logging work done on Q1 2017-2018 for the standard pod setup - https://phabricator.wikimedia.org/T207200 (fgiunchedi) Thanks for loo...
[11:51:50] Operations, decommission, Patch-For-Review, User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (fgiunchedi) a: RobH @RobH graphite100[13] confirmed ready for wipe/decom
[12:06:45] (PS23) Mathew.onipe: elasticsearch: allow cross cluster communication [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434)
[12:14:12] (CR) Mathew.onipe: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/14083/" [puppet] - https://gerrit.wikimedia.org/r/481125 (https://phabricator.wikimedia.org/T212434) (owner: Mathew.onipe)
[13:37:16] Operations, Parsoid, Patch-For-Review: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (MoritzMuehlenhoff) npm 5.8 is now finally available in stretch-backports: https://lists.debian.org/debian-backports-changes/2018/12/threads.html
[14:34:53] (CR) Alexandros Kosiaris: [C: -2] "For the same reasons that I -1ed https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475150/" [puppet] - https://gerrit.wikimedia.org/r/481215 (https://phabricator.wikimedia.org/T212327) (owner: Bstorm)
[14:36:36] (CR) Alexandros Kosiaris: [C: -2] "I've also found https://phabricator.wikimedia.org/T41785 that is pretty much the same topic." [puppet] - https://gerrit.wikimedia.org/r/481215 (https://phabricator.wikimedia.org/T212327) (owner: Bstorm)
[15:07:53] (PS1) ArielGlenn: permit rsync pulls from dumps primary nfs server to a peer [puppet] - https://gerrit.wikimedia.org/r/481299
[15:13:11] (PS2) ArielGlenn: permit rsync pulls from dumps primary nfs server to a peer [puppet] - https://gerrit.wikimedia.org/r/481299
[15:17:14] (CR) ArielGlenn: [C: +2] permit rsync pulls from dumps primary nfs server to a peer [puppet] - https://gerrit.wikimedia.org/r/481299 (owner: ArielGlenn)
[15:24:21] (PS1) ArielGlenn: ferm rules for rsync from dumps peers to dumps primary nfs server [puppet] - https://gerrit.wikimedia.org/r/481301
[15:29:35] (CR) ArielGlenn: [C: +2] ferm rules for rsync from dumps peers to dumps primary nfs server [puppet] - https://gerrit.wikimedia.org/r/481301 (owner: ArielGlenn)
[17:01:41] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[17:04:07] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[17:22:16] (PS1) ArielGlenn: use dumps nfs server fallback host for rsyncs to dump webserver etc [puppet] - https://gerrit.wikimedia.org/r/481303
[17:26:06] https://twitter.com/alicegoldfuss/status/1076944612432826368
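On ArielGlenn's rsync changes above (481299/481301): the ferm rules only open the port on the primary, so the peer does the pulling. A sketch of what the pull side might look like; hostname, rsync module name, paths and bandwidth cap are all hypothetical, since none of them appear in the log:

    # Illustrative pull from a dumps primary to a peer over the rsync protocol.
    import subprocess

    PRIMARY = "dumps-primary.example.wmnet"   # hypothetical hostname
    cmd = [
        "rsync", "-a", "--delete",
        "--bwlimit=40000",                    # be kind to the primary's NICs
        f"rsync://{PRIMARY}/dumps/",          # assumes an exported 'dumps' module
        "/srv/dumps/",
    ]
    subprocess.run(cmd, check=True)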
[17:33:41] I think I might be done for the day. rsync ate my brain. so... have a great holiday or vacation, hopefully everyone gets a couple days off
[18:01:31] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received
[18:02:43] RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy
[18:02:45] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[18:03:55] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[18:03:59] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[18:05:03] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[18:07:27] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[18:07:35] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[18:08:45] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[18:08:49] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received
[18:09:53] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[18:11:13] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[18:13:31] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[18:17:27] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:19:41] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[18:19:53] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.930 second response time
[18:25:53] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received
[18:27:07] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[18:27:13] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:28:19] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[18:29:29] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[18:38:17] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.725 second response time
[18:42:01] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:43:05] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[18:43:07] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[18:44:13] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[18:44:13] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[18:48:01] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 277 bytes in 0.886 second response time
[18:48:03] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:59:11] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:01:29] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[19:02:45] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[19:02:51] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.079 second response time
[19:03:53] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received
[19:03:55] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[19:05:03] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[19:05:07] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received
[19:05:07] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received
[19:06:13] RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy
[19:06:21] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received
[19:07:27] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[19:07:31] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[19:07:33] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy
[19:07:33] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[19:08:51] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[19:08:59] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:10:01] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received
[19:10:03] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[19:11:15] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received
[19:12:21] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[19:12:27] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[19:12:29] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received
[19:12:37] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 277 bytes in 7.569 second response time
[19:13:35] RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy
[19:13:47] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[19:14:53] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy
[19:15:03] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received
[19:16:09] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[19:16:09] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[19:16:09] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[19:16:21] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:17:23] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[19:18:31] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[19:18:37] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received
[19:19:51] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy
[19:19:57] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[19:19:57] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[19:21:05] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[19:21:05] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[19:21:05] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[19:24:49] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[19:24:53] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[19:26:01] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[19:29:43] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[19:47:07] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.111 second response time
[19:50:55] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:09:03] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.132 second response time
[20:12:49] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:21:27] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.430 second response time
[20:25:09] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:30:01] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.039 second response time
[20:33:47] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:34:49] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time
[20:36:03] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time
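Everything from 18:01 onward is one pattern: checks with a 10-second budget timing out and clearing a minute or two later. A toy poller that surfaces only the PROBLEM/RECOVERY transitions, with an illustrative endpoint URL (the real checks are icinga's, not this loop):

    # Probe an endpoint with the same 10 s budget the checks above use,
    # printing only state changes rather than every poll.
    import time
    import urllib.request

    URL = "http://scb1001.example:8000/"   # hypothetical service endpoint
    state = "OK"
    while True:
        try:
            urllib.request.urlopen(URL, timeout=10).read()
            new = "OK"
        except OSError:
            new = "CRITICAL"
        if new != state:
            label = "RECOVERY" if new == "OK" else "PROBLEM"
            print(f"{label} - endpoint is {new}")
            state = new
        time.sleep(60)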
[22:20:26] I confess to having restarted pdfrender on three of those hosts and it looks like they've shut up since then
[22:20:28] * apergos is off
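For the record, a sketch of the kind of restart apergos describes, assuming ssh access and sudo rights; the host list is illustrative (the log doesn't say which three hosts), and production tooling like cumin would normally do this instead:

    # Restart the pdfrender unit on a handful of hosts over ssh.
    # Hostnames are placeholders; substitute the hosts that are actually flapping.
    import subprocess

    HOSTS = ["scb1001.example", "scb1002.example", "scb1003.example"]
    for host in HOSTS:
        subprocess.run(["ssh", host, "sudo", "systemctl", "restart", "pdfrender"],
                       check=True)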