[01:17:20] what's the cluster apache version?
[01:17:38] just want to be sure that i'm looking at the right docs version
[01:31:48] also, do we ever use backports? i see no relevant hits in the puppet repo for /backport/
[01:42:21] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 306 seconds
[01:45:12] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds
[02:35:02] jeremyb: Server version: Apache/2.2.14 (Ubuntu)
[02:35:26] 2.2.14-5ubuntu8.9
[02:35:32] Currently at least, seems we're using stock packages
[02:40:41] hrmm, k thanks
[03:54:57] PROBLEM - udp2log log age for oxygen on oxygen is CRITICAL: CRITICAL: log files /a/squid/telenor-montenegro.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[03:57:12] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[04:33:39] RECOVERY - udp2log log age for oxygen on oxygen is OK: OK: all log files active
[04:39:39] PROBLEM - udp2log log age for emery on emery is CRITICAL: CRITICAL: log files /var/log/squid/orange-ivory-coast.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[05:17:00] RECOVERY - udp2log log age for emery on emery is OK: OK: all log files active
[05:36:12] PROBLEM - udp2log log age for oxygen on oxygen is CRITICAL: CRITICAL: log files /a/squid/telenor-montenegro.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[05:51:57] RECOVERY - udp2log log age for oxygen on oxygen is OK: OK: all log files active
[06:21:37] * jeremyb wants the output of https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=files/nagios/check_udp2log_procs;h=207b488134c27e847bea78725932aee7e3088b1e;hb=HEAD#l13 so i can refactor
[06:23:18] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[06:24:36] can someone pastebin `ps ax` from any of locke/emery/oxygen?
[06:25:09] thanks, back tomorrow, nacht
[08:01:55] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is 0.649583418803
[08:06:16] PROBLEM - Packetloss_Average on oxygen is CRITICAL: XML parse error
[08:14:13] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:41:58] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[12:30:07] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[13:58:10] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[15:42:16] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 182 seconds
[15:43:01] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 186 seconds
[15:45:07] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds
[15:45:52] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds
[16:05:49] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is -0.144204262295
[16:10:11] PROBLEM - Packetloss_Average on oxygen is CRITICAL: XML parse error
[16:24:07] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[16:51:07] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[17:44:08] anyone with shell access around?
[17:44:31] I thought you had
[17:44:46] not until monday
[17:45:48] Reedy: ^
[17:48:49] notpeter, woosters: ^
[17:49:12] wassup?
[17:50:04] The problem that people were reporting with PageTriage on Thursday morning has reappeared
[17:50:17] Can you turn it off?
[17:50:53] https://bugzilla.wikimedia.org/show_bug.cgi?id=36968
[17:51:28] it's only turned on for en.wiki
[18:01:48] apergos: ^
[18:10:10] andrew_wmf, mark, RobH: ^
[18:10:27] guess I'll have to go through the whole list :)
[18:38:38] arthur resolved the issue by disabling PageTriage until we can find out what the problem is
[19:48:31] 19 06:21:37 * jeremyb wants the output of
[19:48:32] https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=files/nagios/check_udp2log_procs;h=207b488134c27e847bea78725932aee7e3088b1e;hb=HEAD#l13 so i can refactor
[19:48:36] 19 06:24:36 < jeremyb> can someone pastebin `ps ax` from any of locke/emery/oxygen?
[19:53:44] Why can't you just check the pid that udp2log writes out? Surley it writes one somewhere and you use like init rather than screen and $monkey.
[19:58:00] Damianz: huh?
[21:07:29] !log shutting down storage3 to replace RAID controller card
[21:07:33] Logged the message, Master
[21:11:22] PROBLEM - Host storage3 is DOWN: PING CRITICAL - Packet loss = 100%
[21:28:19] RECOVERY - Host storage3 is UP: PING OK - Packet loss = 0%, RTA = 0.75 ms
[22:31:10] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[23:59:13] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours