[00:00:06] urandom try sshing now? [00:00:08] RECOVERY - Host xenon is UP: PING OK - Packet loss = 0%, RTA = 1.10 ms [00:00:27] RECOVERY - MegaRAID on xenon is OK: OK: no disks configured for RAID [00:00:37] YuviPanda: yeah, but it's the same thing as before [00:00:38] RECOVERY - dhclient process on xenon is OK: PROCS OK: 0 processes with command name dhclient [00:00:38] RECOVERY - salt-minion processes on xenon is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [00:00:46] acpi_pad is using a lot of cpu [00:00:58] RECOVERY - DPKG on xenon is OK: All packages OK [00:01:05] YuviPanda: do you think it's OK if i try unloading that kernel module? [00:01:07] RECOVERY - Disk space on xenon is OK: DISK OK [00:01:15] YuviPanda: https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5098951 [00:02:07] I'm going to ask forgiveness rather than permission on this one [00:02:58] cool; it worked [00:02:59] urandom yup [00:03:19] urandom and open a ticket? [00:03:27] YuviPanda: will do. [00:03:35] thanks [00:03:40] YuviPanda: thank you [00:10:18] RECOVERY - cassandra-a service on xenon is OK: OK - cassandra-a is active [00:11:17] RECOVERY - Restbase root url on xenon is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 1.297 second response time [00:13:18] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [00:14:40] urandom: just got back. i see it recovered. that bug talks about "if HT is disabled" so i checked if it is on xenon. enabled there though [00:14:54] yeah [00:15:56] mutante: it's wierd [00:15:58] 06Operations, 10Cassandra: xenon.eqiad.wmnet: very high cpu utilization - https://phabricator.wikimedia.org/T141675#2507314 (10Eevans) [00:16:02] 06Operations, 10Cassandra: xenon.eqiad.wmnet: very high cpu utilization - https://phabricator.wikimedia.org/T141675#2507327 (10Eevans) p:05Triage>03High [00:16:09] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [00:16:11] mutante, YuviPanda: ^^^ [00:16:39] should be good enough going into the weekend though [00:16:51] thanks urandom [00:17:01] yea, thanks, ack [00:17:03] it's not production, but i don't want it creating pager fatigue [00:17:21] :) es [00:17:23] yes [00:19:38] RECOVERY - cassandra-a CQL 10.64.0.202:9042 on xenon is OK: TCP OK - 0.004 second response time on port 9042 [00:23:53] 06Operations, 10Cassandra: xenon.eqiad.wmnet: very high cpu utilization - https://phabricator.wikimedia.org/T141675#2507314 (10Dzahn) yea, HT is enabled on xenon.. it seems to start here, when RT throttling gets activated 2680 Jul 29 22:23:40 xenon kernel: [10997327.180547] sched: RT throttling activated 268... 
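For context on the recovery above: acpi_pad is a kernel module, so the fix amounted to confirming it was the CPU hog and unloading it. A minimal sketch of that kind of check, assuming a host like xenon.eqiad.wmnet and standard Linux tooling (illustrative commands, not the exact ones urandom ran):

```bash
# Sketch only -- illustrative commands, not the exact ones used on xenon.

# 1. Confirm what is burning CPU (acpi_pad shows up as [acpi_pad/N] kernel threads).
ps -eo pid,pcpu,comm --sort=-pcpu | head -15

# 2. Check whether hyper-threading is enabled, since the IBM advisory linked above
#    only applies "if HT is disabled".
lscpu | grep -i 'thread(s) per core'

# 3. Unload the module and verify it is gone. Blacklisting it so it stays out after
#    a reboot would be a separate (puppetized) change.
modprobe -r acpi_pad      # equivalent to: rmmod acpi_pad
lsmod | grep acpi_pad || echo "acpi_pad no longer loaded"
```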
[00:32:08] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Puppet has 1 failures [00:56:37] 06Operations, 10Cassandra: xenon.eqiad.wmnet: very high cpu utilization - https://phabricator.wikimedia.org/T141675#2507368 (10Eevans) [00:57:47] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [02:20:18] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.12) (duration: 08m 10s) [02:20:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:25:55] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Jul 30 02:25:55 UTC 2016 (duration 5m 37s) [02:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:37:48] PROBLEM - puppet last run on dbstore2001 is CRITICAL: CRITICAL: puppet fail [02:47:38] RECOVERY - cassandra-c CQL 10.192.48.56:9042 on restbase2009 is OK: TCP OK - 0.036 second response time on port 9042 [02:48:58] LALAALLLAALALLA (spam, repeated 25 times through 02:49:36) [02:49:38] CHAU [03:03:48] RECOVERY - puppet last run on dbstore2001 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [05:44:37] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 222, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235, 34ms) {#2648} [10Gbps wave]BR [06:29:57] PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: puppet fail [06:31:08] PROBLEM - puppet last run on mw2228 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:09] PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:18] PROBLEM - puppet last run on ms-be2022 is CRITICAL: CRITICAL: Puppet has 4 failures [06:31:18] PROBLEM - puppet last run on ms-be2026 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:28] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:08] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:58] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:58] PROBLEM - puppet last run on analytics1042 is CRITICAL: CRITICAL: Puppet has 1 failures [06:56:28] RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [06:56:29] RECOVERY - puppet last run on ms-be2022 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [06:56:29] RECOVERY - puppet last run on ms-be2026 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [06:56:37] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:56:48] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 224, down: 0, dormant: 0, excluded: 0,
unused: 0 [06:57:08] RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:18] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:08] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:19] RECOVERY - puppet last run on mw2228 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:02:27] RECOVERY - puppet last run on analytics1042 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:15:27] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 10.65.0.24 [07:17:09] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms [08:52:49] RECOVERY - cassandra-c CQL 10.64.48.137:9042 on restbase1014 is OK: TCP OK - 0.006 second response time on port 9042 [09:05:49] 06Operations: reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4 - https://phabricator.wikimedia.org/T140439#2507574 (10ArielGlenn) [09:10:47] 06Operations, 10Dumps-Generation, 07HHVM, 13Patch-For-Review: Convert snapshot hosts to use HHVM and trusty - https://phabricator.wikimedia.org/T94277#2507578 (10ArielGlenn) [09:14:53] 06Operations, 10Datasets-General-or-Unknown: reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4 - https://phabricator.wikimedia.org/T140439#2507594 (10ArielGlenn) [12:49:39] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Puppet has 1 failures [12:53:59] (03CR) 10Nemo bis: [C: 031] "Certainly ok as first step." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301893 (https://phabricator.wikimedia.org/T131340) (owner: 10Jforrester) [13:00:58] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [13:08:58] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [13:08:58] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [13:12:48] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [13:15:27] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:22:38] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [13:24:38] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [13:30:28] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [13:38:37] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [13:40:28] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [13:42:27] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. 
[13:44:27] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [13:44:27] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [13:46:18] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [13:50:17] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [13:54:09] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [13:55:30] PROBLEM - puppet last run on es2012 is CRITICAL: CRITICAL: puppet fail [13:56:07] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [14:02:09] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [14:04:17] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [14:13:59] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [14:19:48] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [14:19:57] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [14:21:08] RECOVERY - puppet last run on es2012 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [14:21:48] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [14:27:35] (03PS4) 10MarcoAurelio: Expanding throttle limits for enwiki Edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301761 (https://phabricator.wikimedia.org/T141421) [14:27:39] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [14:27:39] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [14:27:40] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [14:28:18] (03CR) 10MarcoAurelio: [C: 04-1] "Per task." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299354 (https://phabricator.wikimedia.org/T140550) (owner: 10Kharkiv07) [14:29:37] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [14:31:48] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [14:31:57] PROBLEM - HP RAID on ms-be1025 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. 
[14:33:57] RECOVERY - HP RAID on ms-be1025 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [14:35:47] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [14:37:17] PROBLEM - dhclient process on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:37:27] PROBLEM - configured eth on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:37:47] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [14:37:47] PROBLEM - Disk space on Hadoop worker on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:37:50] PROBLEM - Check size of conntrack table on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:37:50] PROBLEM - salt-minion processes on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:38:09] PROBLEM - puppet last run on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:38:09] PROBLEM - Hadoop DataNode on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:38:12] PROBLEM - Disk space on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:38:37] PROBLEM - YARN NodeManager Node-State on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:38:41] PROBLEM - DPKG on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:38:48] PROBLEM - MegaRAID on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:38:58] PROBLEM - Hadoop NodeManager on analytics1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:40:27] RECOVERY - YARN NodeManager Node-State on analytics1045 is OK: OK: YARN NodeManager analytics1045.eqiad.wmnet:8041 Node-State: RUNNING [14:40:31] RECOVERY - DPKG on analytics1045 is OK: All packages OK [14:40:48] RECOVERY - MegaRAID on analytics1045 is OK: OK: optimal, 13 logical, 14 physical [14:40:48] RECOVERY - Hadoop NodeManager on analytics1045 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [14:41:09] PROBLEM - puppet last run on mw2224 is CRITICAL: CRITICAL: puppet fail [14:41:17] RECOVERY - dhclient process on analytics1045 is OK: PROCS OK: 0 processes with command name dhclient [14:41:19] RECOVERY - configured eth on analytics1045 is OK: OK - interfaces up [14:41:49] RECOVERY - Disk space on Hadoop worker on analytics1045 is OK: DISK OK [14:41:51] RECOVERY - Check size of conntrack table on analytics1045 is OK: OK: nf_conntrack is 0 % full [14:41:51] RECOVERY - salt-minion processes on analytics1045 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [14:41:58] RECOVERY - puppet last run on analytics1045 is OK: OK: Puppet is currently enabled, last run 13 minutes ago with 0 failures [14:41:58] RECOVERY - Hadoop DataNode on analytics1045 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [14:42:08] RECOVERY - Disk space on analytics1045 is OK: DISK OK [14:45:14] (03CR) 10MarcoAurelio: [C: 031] "Looks good to me. Needs to be rebased though." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/301807 (https://phabricator.wikimedia.org/T140566) (owner: 10Dereckson) [14:55:37] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [15:01:38] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:06:17] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [15:07:29] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [15:08:09] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4933024 keys - replication_delay is 0 [15:08:47] RECOVERY - puppet last run on mw2224 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:09:28] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:09:37] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:13:27] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [15:13:33] (03PS1) 10BBlack: openssl (1.0.2h-1~wmf3) jessie-wikimedia; urgency=medium [debs/openssl] - 10https://gerrit.wikimedia.org/r/301920 (https://phabricator.wikimedia.org/T131908) [15:15:19] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [15:17:18] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [15:23:08] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:25:07] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:31:17] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [15:35:07] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:35:08] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:41:08] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [15:42:58] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:42:58] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [15:44:57] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:47:56] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me!" 
[debs/openssl] - 10https://gerrit.wikimedia.org/r/301903 (owner: 10BBlack) [15:50:48] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [15:52:47] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:56:48] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [16:02:47] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [16:02:48] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [16:04:48] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [16:16:37] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [16:20:29] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [16:26:27] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [16:30:27] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [16:32:37] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [16:34:13] (03PS2) 10BBlack: openssl (1.0.2h-1~wmf3) jessie-wikimedia; urgency=medium [debs/openssl] - 10https://gerrit.wikimedia.org/r/301920 (https://phabricator.wikimedia.org/T131908) [16:34:28] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [16:44:18] PROBLEM - HP RAID on ms-be1025 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [16:46:17] RECOVERY - HP RAID on ms-be1025 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [16:46:18] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. 
[16:47:06] 06Operations, 10MediaWiki-Cache, 10Traffic: Possible increase in logged-out users being served cached outdated revisions - https://phabricator.wikimedia.org/T141693#2508537 (10Glaisher) [16:47:19] 06Operations, 10MediaWiki-Cache, 10Traffic: Possible increase in logged-out users being served cached outdated revisions - https://phabricator.wikimedia.org/T141693#2508549 (10Glaisher) p:05Triage>03High [16:49:27] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508550 (10Aklapper) [16:49:42] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2507607 (10Aklapper) [16:49:45] 06Operations, 10MediaWiki-Cache, 10Traffic: Possible increase in logged-out users being served cached outdated revisions - https://phabricator.wikimedia.org/T141693#2508558 (10Aklapper) [16:52:50] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508560 (10Aklapper) Quoting Glaisher from T141693: > Multiple reports at enwiki and OTRS. > * https://en.wikipedia.org/wiki/Wikipedia:Help_desk#Why_No_Text_In_Article.... [16:53:22] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508563 (10Aklapper) (Wondering if belated syncing of ProofRead status on Wikisource reported in T141692 might be related to caching issues.) [16:54:49] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508566 (10Boshomi) [16:56:32] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508567 (10Glaisher) >>! In T141687#2508563, @Aklapper wrote: > (Wondering if belated syncing of ProofRead status on Wikisource reported in T141692 might be related to... [16:59:58] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [16:59:58] PROBLEM - HP RAID on ms-be1025 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [16:59:58] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [17:02:08] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [17:06:07] RECOVERY - HP RAID on ms-be1025 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [17:11:00] 06Operations, 10Phabricator, 06Project-Admins, 06Triagers: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#2508576 (10Danny_B) [17:11:59] 06Operations, 10Phabricator, 06Project-Admins, 06Triagers: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#1722432 (10Danny_B) [17:15:58] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. 
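T141687/T141693 above concern logged-out readers being served stale cached revisions. A quick way to see what the caches are actually handing out, and to force one page to be regenerated, looks roughly like the sketch below; the title Main_Page is only a placeholder, and the API purge is a generic MediaWiki mechanism rather than the specific remediation used for this task.

```bash
# Inspect the response a logged-out client would get: Age shows how old the cached
# object is, X-Cache shows which cache layers served it.
curl -sI 'https://en.wikipedia.org/wiki/Main_Page' \
  | grep -iE '^(age|x-cache|last-modified):'

# Ask MediaWiki to purge/re-render one page via the API (the purge module expects POST).
curl -s 'https://en.wikipedia.org/w/api.php' \
  --data 'action=purge&titles=Main_Page&format=json'
```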
[17:30:59] 06Operations, 06WMF-NDA-Requests: Please add me to #WMF-NDA - https://phabricator.wikimedia.org/T94238#2508587 (10Danny_B) [17:31:48] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [17:31:48] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [17:31:48] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [17:33:48] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [17:35:49] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [17:39:48] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [17:41:47] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [17:45:38] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [17:53:28] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [17:55:27] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [17:57:18] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [17:59:18] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [18:04:48] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: Puppet has 1 failures [18:07:27] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [18:09:18] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [18:13:17] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [18:15:17] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [18:20:58] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [18:24:58] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [18:26:57] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [18:26:58] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. 
[18:28:57] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [18:32:18] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:34:59] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [18:36:12] (03Abandoned) 10Tpt: Deploy the Kartographer extension to meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298042 (https://phabricator.wikimedia.org/T139787) (owner: 10Tpt) [18:36:53] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508701 (10Boshomi) T141695 is also the same [18:38:57] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [18:38:58] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [18:40:06] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508709 (10Boshomi) [18:41:45] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2507607 (10Boshomi) in T141695 @Gestrid wrote: >I work in the English Wikipedia's Teahouse (a place for new users to ask questions and get answers), where we have recen... [18:42:49] PROBLEM - HP RAID on ms-be1025 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [18:44:47] RECOVERY - HP RAID on ms-be1025 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [18:46:38] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [19:06:28] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [19:10:28] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [19:12:18] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [19:16:21] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [19:18:17] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [19:18:19] PROBLEM - HP RAID on ms-be1025 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [19:20:09] RECOVERY - HP RAID on ms-be1025 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [19:26:09] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. 
[19:27:58] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [19:28:06] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508719 (10Gestrid) Several reports have come in at the [[ https://en.wikipedia.org/wiki/Wikipedia:Teahouse/Questions | English Wikipedia's Teahouse ]] regarding this i... [19:32:17] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [19:34:25] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508720 (10Boshomi) [19:40:00] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [19:43:49] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [19:43:58] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [19:53:38] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [19:53:48] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [20:01:57] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [20:03:57] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [20:05:48] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [20:07:48] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [20:10:58] 06Operations, 10MediaWiki-Cache, 10Traffic: Logged out users are not seeing the most up-to-date versions of Wikipedia pages - https://phabricator.wikimedia.org/T141695#2508730 (10Danny_B) [20:13:39] PROBLEM - HP RAID on ms-be1025 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [20:15:28] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [20:15:37] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [20:15:38] RECOVERY - HP RAID on ms-be1025 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [20:17:29] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [20:23:27] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [20:25:19] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. 
[20:27:18] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [20:27:18] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [20:35:27] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [20:39:19] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [20:41:11] um, I'm just peeking in, does anyone know what is going on with these HP RAID warnings/ [20:41:17] ? [20:41:18] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [20:42:36] Not seeen anyone mention anything [20:44:06] it's ms-be1023,4,5,6 [20:45:17] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [20:45:38] seems to have been going on since about 4pm my time (EET) [20:45:43] so that's 1pm? UTC? [20:46:14] aye [20:48:43] all but one ms-be host (and there are 58 right now) is salt-responsive, I wonder why these 4 are seeming to have the issue [20:49:07] I think they're "new" [20:49:28] https://phabricator.wikimedia.org/T136631 [20:49:33] rack/setup/deploy ms-be102[2-7] [20:50:27] T140374 diagnose failed disks on ms-be1027 [20:50:28] T140374: diagnose failed disks on ms-be1027 - https://phabricator.wikimedia.org/T140374 [20:50:45] diagnose failed(?) sda on ms-be1022 [20:50:51] T140597 [20:50:51] T140597: diagnose failed(?) sda on ms-be1022 - https://phabricator.wikimedia.org/T140597 [20:51:07] (03PS1) 10Matanya: allow sysops on hewikt to remove autopatroller and patroller rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302054 [20:51:08] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [20:51:38] https://phabricator.wikimedia.org/T136631 [20:51:45] oh, you already pasted it in [20:51:54] yeh I had come to the same conclusion, these are "new"-ish [20:52:00] 06Operations, 10ops-eqiad, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2341913 (10Reedy) These all seem to be flapping with ``` [21:27:18] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 second... [20:52:26] The checklists look fairly out of date :) [20:53:05] 08:49 godog: swift eqiad-prod: ms-be102[3456] weight 3000 on the 28th [20:53:13] indeed they do [20:54:03] godog: any chance you're around? [20:54:12] so unlikely, saturday night. I shouldn't even be around :-P [20:56:14] ganglia looks pretty boring for those 4 [20:56:26] false positive and such [21:04:32] 06Operations, 10ops-eqiad, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2341913 (10ArielGlenn) Jul 30 18:18:38 ms-be1026 kernel: [1391836.163876] hpsa 0000:08:00.0: scsi 0:0:0:0 Aborting command ffff8812d751b4c0Tag:0x00000000:00000310 CD... 
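The triage above (are the new ms-be hosts reachable at all, and is the "HP RAID" check itself timing out?) can be reproduced with a couple of commands. A sketch assuming the standard salt and NRPE tooling; the NRPE command name check_hpssacli is an assumption, so substitute whatever the "HP RAID" service definition actually calls.

```bash
# From the salt master: which swift backends answer at all ("salt-responsive").
salt 'ms-be10*' test.ping

# From the Icinga host: run the flapping check by hand with the same 40 s timeout.
# NOTE: "check_hpssacli" is an assumed NRPE command name -- check the service definition.
/usr/lib/nagios/plugins/check_nrpe -H ms-be1026.eqiad.wmnet -c check_hpssacli -t 40

# On the host itself: the hpsa abort/reset messages pasted into T136631 come from the kernel log.
dmesg -T | grep -i hpsa | tail -20
```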
[21:04:32] sort of [21:04:38] I mean it's probably kernel [21:04:43] see ticket ^ [21:05:08] I should see if the hosts that aren't new have these issues [21:05:08] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [21:05:13] geethanks [21:05:45] I wonder if those were the hosts that paravoid among others were chasing various firmware updates [21:06:23] nope they don't [21:06:24] just these [21:06:28] maybe [21:06:46] anyways now it's on a ticket someplace [21:08:23] it appears there is no user impact [21:08:28] so yay for that [21:10:59] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [21:11:39] Is it worth ack-ing in incinga? [21:11:47] /schedule downtime [21:13:22] well it's not paging [21:14:02] I'd rather let it visibly flap, if someone knows better they can ack it [21:16:02] by 'knows better' I mean they have been working on it and know what attention the icinga whines need [21:16:03] aha, ok [21:16:15] I guess if it was paging, someone would've come in earlier to shut up or fix it [21:16:19] yep [21:16:33] seriously, I just was peeking in on irc and saw these, otherwise I would not know [21:18:59] heh [21:22:38] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [21:32:47] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [21:34:25] /49/6 [21:36:38] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [21:38:34] Reedy: feel like reviewing https://gerrit.wikimedia.org/r/#/c/302054/ ? [21:40:29] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [21:40:29] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [21:50:18] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [21:52:17] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [22:00:07] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [22:02:08] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [22:07:42] JEM [22:07:43] JEM [22:07:44] JEM [22:07:45] JEM [22:07:46] JEM [22:07:48] T [22:07:49] E [22:07:51] - [22:07:52] M [22:07:54] A [22:07:55] T [22:07:56] A [22:07:57] R [22:07:58] E [22:08:01] - [22:08:03] DHAHAHAHAHAH [22:08:08] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [22:08:09] JEM TE MATARE [22:08:11] HAHAHAHA [22:08:16] JEM HIJO DE PERRA [22:08:41] Spam [22:08:47] JEM [22:08:49] MALDITO [22:09:00] CALLATE PALADOX DE MIERDA [22:09:23] HIJO DE PERRA.. 
ME ACUSAS Y TEMATI [22:09:28] MATARE [22:09:42] @kb 185.140.114.121 [22:09:42] Permission denied [22:10:41] LARGATE DE AQUI PALADOX [22:10:43] PUDRETE [22:10:57] sigh [22:11:12] Italian [22:11:17] Largate DE AQUI PALADOX [22:11:22] What does that even mean [22:11:45] Portuguise now [22:12:38] mhh gone heh [22:12:42] it's Spanish [22:12:54] largate = go away [22:13:06] it's a known troll [22:13:59] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [22:14:18] Reedy apergos: sigh, looks like the controller isn't very happy (swift) thanks for looking ! [22:14:42] godog: is this something you have known about? [22:15:27] Oh [22:15:39] thank Platonides for explaning what it means :) [22:15:43] apergos: I've seen the check timing out before yeah [22:15:57] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [22:16:01] ok [22:19:18] Platonides, im guessing the person who did that did not know i only speak english and not spanish so i would not know what they were saying [22:19:20] lol [22:19:39] It also swore at me. [22:19:58] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [22:20:04] yes [22:20:09] also I'm silencing those, no point in spam [22:20:15] he has been attacking us on many channels for weeks [22:20:23] oh [22:20:34] It is very hard to ban them it seems [22:20:40] they keep changing ip and username [22:21:58] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [22:23:06] yes [22:23:17] oh [22:23:30] it said something to do with jem kill you [22:23:32] I didn't feel right about acking them until I knew that someone with a clue was on the case [22:23:34] that is translated from [22:23:44] JEM HIJO DE PERRA [22:23:50] thanks for patting icinga on the head [22:25:23] It is problay going to come back at sometime in the earley morning bst time. [22:25:28] like it did this morning [22:27:10] LOL it just changed its ip by one UAWIKI| (~Javiera@185.140.114.122) from this morning and wpayuda272626 (~cremosa@185.140.114.121) has joined now [22:27:57] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [22:34:31] apergos: yeah I'm taking a look, I agree it doesn't seem to have an impact tho [22:35:44] so far so good at least [22:45:58] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508947 (10BBlack) There haven't been any related changes recently on the cache side of things (e.g. changes in relevant VCL or how purging works, etc). More likely it... [22:51:57] 06Operations, 10ops-eqiad, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2508951 (10fgiunchedi) it looks like the controller isn't responding under load and makes the icinga alarms flap. Given that this is new hardware and we have the same... 
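godog's "silencing those" above corresponds to scheduling Icinga downtime for the flapping "HP RAID" services so they stop spamming the channel. One way to do that in bulk is through Icinga's external command interface; a sketch, where the command-file path is an assumption (check command_file in icinga.cfg) and the 24-hour window is arbitrary:

```bash
# Schedule 24h of downtime for the flapping "HP RAID" checks on the four new swift backends.
# /var/lib/icinga/rw/icinga.cmd is an assumed path; use the command_file from icinga.cfg.
now=$(date +%s)
end=$(( now + 24 * 3600 ))
for h in ms-be1023 ms-be1024 ms-be1025 ms-be1026; do
  printf '[%d] SCHEDULE_SVC_DOWNTIME;%s;HP RAID;%d;%d;1;0;%d;godog;controller flapping, T136631\n' \
    "$now" "$h" "$now" "$end" "$(( end - now ))" >> /var/lib/icinga/rw/icinga.cmd
done
```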
[22:52:07] 06Operations, 10MediaWiki-Cache, 10Traffic: Cached outdated revisions served to logged-out users - https://phabricator.wikimedia.org/T141687#2508952 (10Gestrid) The first question in the Teahouse related to this problem came in yesterday (July 29th) at about 2:16 pm EDT. This means that whatever happened wa... [23:26:03] (03PS3) 10BBlack: openssl (1.0.2h-1~wmf3) jessie-wikimedia; urgency=medium [debs/openssl] - 10https://gerrit.wikimedia.org/r/301920 (https://phabricator.wikimedia.org/T131908) [23:33:18] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [23:35:09] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4913580 keys - replication_delay is 0 [23:36:45] !log deleted 7 files from server for legal compliance [23:36:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:43:19] Jamesofur: vague much [23:43:38] Reedy: not really, there is literally only 1 reason I ever do that :) [23:44:02] even saying which server it is might be more useful ;) [23:45:14] Reedy: I don't know which server it is! The magic eraseArchivedFile.php goes and finds the files I needs and purges the caches etc ;) and if I say what wiki people could go looking for my logs which is #notgood in this case ;) [23:46:19] oh, so you deleted media? [23:46:22] * Jamesofur nods [23:47:30] think there's a better way to word it for the future? I can adjust the internal docs we have for the procedure. I mostly log for the unlikely possibility that something fucks up we have a time stamp (also because for this particular procedure I REALLY like having time stamps) [23:48:22] I think calling them uploads... might make it clearer [23:48:34] we have a clearer definition of what they actually are [23:48:41] vs "files" which could be anything [23:49:02] or file uploads.. [23:49:14] ahhh, that's a fair point [23:49:19] hadn't even thought about that angle [23:50:06] (because you're right if I had deleted a log or something for legal compliance, which I've done before, I totally would have said what server etc) [23:53:11] Not a big deal, just makes it clearer rather than "Jamesofur deleted a random file on a random server" ;) [23:53:17] ;) yup [23:53:25] Another way.. would say deleting file from swift [23:53:31] Again, that's specific and fairly obvious [23:53:37] * Jamesofur nods [23:54:08] but only to those who know what they're talking about which isn't a bad way to do it [23:54:17] Yeah, indeed [23:54:33] People who know more than the average user know what you're doing [23:54:37] Rather than wtf-ing at it [23:55:33] * Jamesofur nods [23:55:47] yup, wrote it down in our docs :) appreciate it [23:56:06] cool, no :) [23:56:08] *np [23:56:31] "Used eraseArchivedFile.php to remove 7 files for legal compliance" [23:56:53] having a timestamp for destructive operations is probably a good thing [23:57:31] Yeah, or that [23:57:41] e.g. when user rights change logging broke we could use the times to help the DBA find all the rights changes that weren't accounted for [23:57:47] "Why did Jamesofur just set off the chaos monkey?" [23:58:47] although in that case we had the data necessary to reconstruct almost everything (except log summaries IIRC) [23:58:49] I guess this script actually properly deletes things completely, unlike on-wiki deletion [23:58:57] * Jamesofur nods [23:58:58] yup [23:59:18] Reedy: why WOULDN'T I set off the chaos monkey? [23:59:20] ;)
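For reference, the "deleted 7 files from server for legal compliance" entry above refers to MediaWiki's eraseArchivedFile.php maintenance script, which permanently erases a deleted upload from the file backend rather than leaving it recoverable in the archive zone. A sketch of a single invocation; the wiki, file name and exact option spelling are illustrative, so verify against the script's --help before relying on them.

```bash
# Erase every archived version of one deleted upload, permanently, on one wiki.
# Wiki and file name are placeholders; --filename/--filekey/--delete are the options
# this script is generally documented to take (check --help before use).
mwscript eraseArchivedFile.php --wiki=commonswiki \
  --filename 'Example_file.jpg' --filekey '*' --delete

# And, per the discussion above, log it with enough detail to be useful later, e.g.:
#   !log Used eraseArchivedFile.php to remove 7 file uploads for legal compliance
```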