[00:00:00] cool [00:00:05] no need to get rid of it, really [00:00:06] figured if no one bitches in a bit, you can pull and archive it so its not takin up space. [00:00:08] <^demon> If misc::limesurvey's not used anywhere, can it be removed? [00:00:16] <^demon> I'd rather not encourage anyone to ever use it :p [00:00:18] ^demon: i wanna make sure no one flips out [00:00:22] <^demon> *nod* [00:00:26] but if no one does in a week or two, we can pull it out entirely yep [00:00:45] im glad its dead. [00:00:51] it was a pain in the ass to administer [00:01:10] 'im so tired of dealing with limesurvey, i need something less error prone and less frustrating, like wordpress.' [00:01:13] =P [00:01:26] (saying 'like mediawiki' goes without saying ;) [00:01:41] <^demon> Heh [00:02:02] PROBLEM - Host argon is DOWN: PING CRITICAL - Packet loss = 100% [00:02:35] DIE ARGON [00:02:42] then come back without limesurvey. [00:03:03] notpeter: well, if we dont get rid of it, it will travel to the new misc dbs! (as long as you dont migrate it i dont care ;) [00:03:51] yeah [00:03:52] I dunno [00:04:00] I'm not too worried [00:04:15] drop now, or drop later [00:04:26] PROBLEM - Puppet freshness on stafford is CRITICAL: Puppet has not run in the last 10 hours [00:04:33] the important part is that we hire skrillex as our dba. [00:06:59] RobH, shouldn't misc::limesurvey die too? [00:07:45] <^demon> Already asked that :p [00:09:40] meh [00:11:06] hrmm, i guess i could kill it now [00:11:13] cuz i can always revert, yay git. [00:11:18] RobH: no [00:11:22] ? [00:11:30] just wait until we know that no one needs it [00:11:40] let the laziness flow through you ;) [00:11:48] nah, i want it dead nowwwwwwwwwww [00:11:53] it is! [00:11:59] i shouldnt have cut off argon without seeing if it had any recent access first. [00:12:11] oh well [00:12:32] cuz if it had no access for months, i would feel even more pressure to delete limesurvey.pp [00:12:34] here, this might help: http://www.youtube.com/watch?v=-4KIoz_g0V0 [00:13:31] I am at least 17% more chilled out since I started listening to that [00:13:43] Why remove it today if someone else will remove it tomorrow? :p [00:13:54] Reedy: you speak my mind [00:16:28] bwahahaha [00:16:33] die limesurvey dddddiiiieeeeee [00:16:46] New patchset: RobH; "limesurvey.pp is dead" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50461 [00:17:09] i had the joy of committing the change [00:17:12] anyone else wanna merge? [00:17:19] its kinda cathartic. [00:20:41] New review: RobH; "from hell's heart I stab at thee" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50461 [00:21:01] New review: RobH; "from hell's heart I stab at thee" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50461 [00:21:14] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50461 [00:21:18] \o/ [00:21:47] so if it has to come back now, its gonna mean reverting crap, good reason to not bring it back. [00:22:35] ahh, bleh, i missed the damned apache host file.
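The cleanup discussed above is a routine Gerrit-backed change to operations/puppet: delete the manifest, push it for review, merge, and lean on git history if it ever has to come back. A minimal sketch of that kind of workflow, in which the file paths and the exact push invocation are illustrative rather than the actual repository layout:

```sh
# Make sure nothing still includes the class before deleting it.
git grep -n 'misc::limesurvey'

# Drop the manifest and the apache vhost file (the one missed on the
# first pass above), then commit and push to Gerrit for review.
git rm manifests/misc/limesurvey.pp                   # illustrative path
git rm files/apache/sites/limesurvey.wikimedia.org    # illustrative path
git commit -m "limesurvey.pp is dead"
git push origin HEAD:refs/for/production

# "i can always revert, yay git" -- bringing it back later is one command:
git revert <sha-of-the-merged-change>
```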
[00:23:41] New patchset: RobH; "removing limesurvey apache vhost file from repo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50463 [00:24:36] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50463 [00:29:27] New patchset: RobH; "old vhost files for services no longer offered" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50464 [00:30:16] New review: RobH; "death to old cruft" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/50464 [00:30:27] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50464 [00:37:26] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours [00:39:14] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 187 seconds [00:40:44] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 187 seconds [00:43:08] RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Sat Feb 23 00:42:57 UTC 2013 [00:43:11] !log lvs1003 had a ton of hung puppet processes (no output on them when tracing, so dunno whats up). killed them and kicked manual run [00:43:12] Logged the message, RobH [00:49:26] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [00:55:12] woosters: https://gerrit.wikimedia.org/r/#/c/49069/ [01:01:40] New patchset: Reedy; "Recurseively checkout submodules..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50467 [01:32:29] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [01:33:24] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [01:42:50] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 181 seconds [01:43:21] New review: Mattflaschen; "Seems like it should resolve the issue." 
[operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/50467 [01:44:11] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 212 seconds [01:49:00] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [01:49:27] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 3 seconds [01:53:30] PROBLEM - Puppet freshness on mw64 is CRITICAL: Puppet has not run in the last 10 hours [01:53:30] PROBLEM - Puppet freshness on mw1039 is CRITICAL: Puppet has not run in the last 10 hours [02:04:00] RECOVERY - MySQL disk space on neon is OK: DISK OK [02:04:27] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [02:27:18] !log LocalisationUpdate completed (1.21wmf10) at Sat Feb 23 02:27:17 UTC 2013 [02:27:21] Logged the message, Master [02:36:24] PROBLEM - Puppet freshness on db1009 is CRITICAL: Puppet has not run in the last 10 hours [02:38:30] PROBLEM - Puppet freshness on mw1134 is CRITICAL: Puppet has not run in the last 10 hours [02:52:11] !log LocalisationUpdate completed (1.21wmf9) at Sat Feb 23 02:52:10 UTC 2013 [02:52:13] Logged the message, Master [02:59:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:01:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.132 seconds [03:16:54] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [03:18:06] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [03:22:54] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 183 seconds [03:23:21] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 188 seconds [03:36:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:49:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.059 seconds [03:50:39] RECOVERY - MySQL disk space on neon is OK: DISK OK [03:51:06] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [03:59:03] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [04:00:15] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [04:08:21] PROBLEM - Puppet freshness on srv245 is CRITICAL: Puppet has not run in the last 10 hours [04:09:15] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [04:10:45] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [04:21:42] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 26 seconds [04:22:18] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [04:22:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:29:21] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [04:33:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.077 seconds [04:53:48] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [04:54:24] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [05:08:21] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [05:25:30] RECOVERY - MySQL disk space on neon is OK: DISK OK [05:25:56] RECOVERY - 
ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [05:26:32] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [05:29:59] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 9.021 second response time on port 8123 [05:57:17] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out [05:58:38] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [06:01:38] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123 [06:07:47] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [06:11:41] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.027 second response time on port 8123 [06:11:41] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.030 second response time on port 8123 [06:13:50] !log restarted lucene search on search1016 [06:13:52] Logged the message, Master [06:15:44] PROBLEM - Host cp3003 is DOWN: PING CRITICAL - Packet loss = 100% [06:20:45] RECOVERY - Host cp3003 is UP: PING OK - Packet loss = 0%, RTA = 118.29 ms [06:22:51] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [06:40:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:41:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.066 seconds [06:43:42] PROBLEM - MySQL Replication Heartbeat on db64 is CRITICAL: CRIT replication delay 311 seconds [06:44:10] PROBLEM - MySQL Slave Delay on db64 is CRITICAL: CRIT replication delay 338 seconds [06:49:27] RECOVERY - MySQL Slave Delay on db64 is OK: OK replication delay NULL seconds [06:53:30] PROBLEM - MySQL Slave Running on db64 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Lock wait timeout exceeded: try restarting transaction on que [07:02:03] PROBLEM - SSH on dataset1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:05:39] RECOVERY - SSH on dataset1001 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [07:15:06] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:16:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.058 seconds [07:47:16] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [07:47:53] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [07:51:10] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:05:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.037 seconds [08:07:58] PROBLEM - Host cp3003 is DOWN: PING CRITICAL - Packet loss = 100% [08:17:52] RECOVERY - MySQL disk space on neon is OK: DISK OK [08:18:28] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [08:19:58] RECOVERY - Host cp3003 is UP: PING OK - Packet loss = 0%, RTA = 118.26 ms [08:24:46] PROBLEM - Puppet freshness on cp3004 is CRITICAL: Puppet has not run in the last 10 hours [08:35:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:39:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.048 seconds [08:44:43] PROBLEM - Puppet freshness on ms-be3002 is CRITICAL: Puppet has 
not run in the last 10 hours [08:44:43] PROBLEM - Puppet freshness on ms-be3003 is CRITICAL: Puppet has not run in the last 10 hours [09:07:40] PROBLEM - Puppet freshness on ms-be3001 is CRITICAL: Puppet has not run in the last 10 hours [09:30:19] RECOVERY - Puppet freshness on stafford is OK: puppet ran at Sat Feb 23 09:30:14 UTC 2013 [09:38:16] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 187 seconds [09:39:10] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 210 seconds [09:40:04] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 8 seconds [09:40:59] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [10:37:25] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 213 seconds [10:39:13] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [10:50:37] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [10:58:33] PROBLEM - Host cp3003 is DOWN: PING CRITICAL - Packet loss = 100% [11:10:24] RECOVERY - Host cp3003 is UP: PING OK - Packet loss = 0%, RTA = 118.30 ms [11:36:39] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 190 seconds [11:38:27] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 6 seconds [11:54:21] PROBLEM - Puppet freshness on mw64 is CRITICAL: Puppet has not run in the last 10 hours [11:54:21] PROBLEM - Puppet freshness on mw1039 is CRITICAL: Puppet has not run in the last 10 hours [12:23:02] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [12:24:32] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.035 second response time on port 8123 [12:37:26] PROBLEM - Puppet freshness on db1009 is CRITICAL: Puppet has not run in the last 10 hours [12:38:29] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , svwiki (34371), Total (41259) [12:38:29] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , svwiki (34371), Total (41259) [12:39:32] PROBLEM - Puppet freshness on mw1134 is CRITICAL: Puppet has not run in the last 10 hours [12:54:23] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [12:54:32] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [13:37:26] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 190 seconds [13:37:44] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 198 seconds [13:39:33] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [13:39:42] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [13:58:45] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [13:59:12] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [14:05:02] New patchset: Hydriz; "Adding .gitreview and .gitignore files" [operations/dumps/archiving] (master) - https://gerrit.wikimedia.org/r/50485 [14:05:38] Change merged: Hydriz; [operations/dumps/archiving] (master) - https://gerrit.wikimedia.org/r/50485 [14:09:51] PROBLEM - Puppet freshness on srv245 is CRITICAL: Puppet has not run in the last 10 hours [14:30:51] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [14:31:09] 
RECOVERY - MySQL disk space on neon is OK: DISK OK [14:31:36] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [15:08:30] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [15:09:24] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [15:09:51] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [15:39:06] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [15:40:28] RECOVERY - MySQL disk space on neon is OK: DISK OK [16:24:13] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [17:04:43] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , frwiki (112423), Total (116107) [17:06:13] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , frwiki (111017), Total (112893) [17:31:16] PROBLEM - Host google is DOWN: CRITICAL - Time to live exceeded (74.125.225.84) [17:31:16] PROBLEM - Host mobile-lb.eqiad.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:861:ed1a::c [17:31:21] RECOVERY - Host google is UP: PING OK - Packet loss = 0%, RTA = 54.93 ms [17:31:39] RECOVERY - Host mobile-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 35.45 ms [17:46:13] New review: Danny B.; "What's the status, please? Instead of the desired enwikt-like icon ['w] we have now the scrabble-lik..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49681 [17:47:12] New review: Alex Monk; "Nothing to do with this." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49681 [17:49:21] New review: Alex Monk; "Probably I6b727825" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49681 [17:49:41] New patchset: Alex Monk; "(bug 45113) Set cswiktionary favicon to the same as enwiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49681 [17:53:27] PROBLEM - check google safe browsing for wikipedia.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:53:42] New review: Danny B.; "Please sync." 
[operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/49681 [17:55:03] RECOVERY - check google safe browsing for wikipedia.org on google is OK: HTTP OK HTTP/1.0 200 OK - 0.153 second response time [17:57:00] PROBLEM - Puppet freshness on sq73 is CRITICAL: Puppet has not run in the last 10 hours [17:58:03] PROBLEM - Puppet freshness on amssq37 is CRITICAL: Puppet has not run in the last 10 hours [18:08:33] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [18:08:45] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [18:19:57] PROBLEM - Puppet freshness on cp3003 is CRITICAL: Puppet has not run in the last 10 hours [18:26:06] PROBLEM - Puppet freshness on cp3004 is CRITICAL: Puppet has not run in the last 10 hours [18:37:29] PROBLEM - Puppet freshness on lardner is CRITICAL: Puppet has not run in the last 10 hours [18:40:24] RECOVERY - MySQL disk space on neon is OK: DISK OK [18:40:56] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [18:46:29] PROBLEM - Puppet freshness on ms-be3002 is CRITICAL: Puppet has not run in the last 10 hours [18:46:30] PROBLEM - Puppet freshness on ms-be3003 is CRITICAL: Puppet has not run in the last 10 hours [18:56:50] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [18:58:48] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123 [19:09:26] PROBLEM - Puppet freshness on ms-be3001 is CRITICAL: Puppet has not run in the last 10 hours [19:55:34] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [19:56:05] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [20:04:12] PROBLEM - Host google is DOWN: CRITICAL - Time to live exceeded (74.125.225.84) [20:04:29] RECOVERY - Host google is UP: PING OK - Packet loss = 0%, RTA = 47.47 ms [20:15:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:17:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.039 seconds [20:28:20] RECOVERY - MySQL disk space on neon is OK: DISK OK [20:28:30] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [20:52:29] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [20:56:40] apergos: why does class mediawiki_new::jobrunner have $procs = 5? the number used is 12 [21:11:41] PROBLEM - MySQL Slave Delay on db36 is CRITICAL: CRIT replication delay 260 seconds [21:13:02] PROBLEM - MySQL Replication Heartbeat on db36 is CRITICAL: CRIT replication delay 341 seconds [21:18:53] RECOVERY - MySQL Slave Delay on db36 is OK: OK replication delay NULL seconds [21:21:35] PROBLEM - MySQL Slave Running on db36 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Lock wait timeout exceeded: try restarting transaction on que [21:28:50] Aaron|home, paravoid: looks like the last deploy of TMH caused 404s for transcoded videos (https://bugzilla.wikimedia.org/show_bug.cgi?id=45294) this can be fixed by moving them to the transcoded container with the maintenance/moveTranscoded.php script [21:35:13] j^: looks like there's loads on commons alone..
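The fix j^ points to is a standard per-wiki maintenance run; the actual run is discussed just below. A rough sketch of that kind of loop, assuming the usual WMF mwscript wrapper and a dblist of all target wikis are available (the names and paths here are illustrative):

```sh
# Start a screen session first so the loop survives a dropped SSH connection.
screen -S move-transcoded

# Run TimedMediaHandler's moveTranscoded.php against each wiki in turn.
while read -r wiki; do
    echo "== $wiki =="
    mwscript extensions/TimedMediaHandler/maintenance/moveTranscoded.php --wiki="$wiki"
done < all.dblist    # illustrative list of target wiki database names

# If a foreachwiki helper is available, the loop collapses to one line:
# foreachwiki extensions/TimedMediaHandler/maintenance/moveTranscoded.php
```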
[21:36:27] Reedy: yes this affects all videos [21:36:47] the move from thumbs to transcoded required that they are all moved [21:36:56] deploying the new code also required moving them [21:37:16] only see now that code was deployed but transcodes not moved [21:43:45] !log mw110 is asking for a password [21:43:49] Logged the message, Master [21:44:29] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [21:44:46] !log reedy synchronized php-1.21wmf10/cache/interwiki.cdb [21:44:47] Logged the message, Master [21:44:56] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [21:45:40] !log reedy synchronized php-1.21wmf9/cache/interwiki.cdb [21:45:41] Logged the message, Master [21:45:43] Nemo_bis: ^^ Both done [21:48:01] thanks! [21:53:12] j^: Looks like this will need re-running when the wmf9 wikis are moved to wmf10? [21:53:47] Due to "The MediaWiki script file "/usr/local/apache/common-local/php-1.21wmf9/extensions/TimedMediaHandler/maintenance/moveTranscoded.php" does not exist." [21:53:48] Reedy: was this ever run? [21:54:02] ah ok [21:54:03] No idea [21:54:03] hm [21:54:33] i've just restarted it in a screen session to iterate over all the wikis [21:54:41] commons is on 10 so yes needs to be run when upgrading to 10 [21:55:44] PROBLEM - Puppet freshness on mw1039 is CRITICAL: Puppet has not run in the last 10 hours [21:55:44] PROBLEM - Puppet freshness on mw64 is CRITICAL: Puppet has not run in the last 10 hours [21:56:08] but possibly something else goes wrong https://upload.wikimedia.org/wikipedia/commons/transcoded/4/41/Gwtoolset-sprint6-demo.webm/Gwtoolset-sprint6-demo.webm.360p.webm returns 401 not 404 [21:57:48] it's doing stuff [21:58:40] ok waiting [22:03:50] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 182 seconds [22:04:36] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 200 seconds [22:05:11] PROBLEM - MySQL Slave Delay on db56 is CRITICAL: CRIT replication delay 247 seconds [22:06:59] RECOVERY - MySQL Slave Delay on db56 is OK: OK replication delay 0 seconds [22:15:05] RECOVERY - MySQL disk space on neon is OK: DISK OK [22:15:32] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [22:33:23] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 2 seconds [22:34:26] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [22:38:27] PROBLEM - Puppet freshness on db1009 is CRITICAL: Puppet has not run in the last 10 hours [22:38:52] New patchset: QChris; "Move connection limiting from gerrit's Jetty to Apache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50591 [22:40:24] PROBLEM - Puppet freshness on mw1134 is CRITICAL: Puppet has not run in the last 10 hours [22:56:45] PROBLEM - MySQL Slave Running on db39 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Cant find record in page_restrictions on query. Default da [23:37:05] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 185 seconds [23:37:32] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 192 seconds [23:40:41] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [23:41:08] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
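Several of the alerts above (db36, db64, db39) are not plain replication lag: the SQL thread has stopped with "Lock wait timeout exceeded" or a missing-row error. The following is not a record of what was actually done here, just a generic sketch of how an operator might triage such a slave from the shell:

```sh
# See why the SQL thread stopped and how far behind the slave is.
mysql -e 'SHOW SLAVE STATUS\G' | \
    egrep 'Slave_(IO|SQL)_Running|Last_Error|Seconds_Behind_Master'

# For a transient "Lock wait timeout exceeded", restarting the SQL thread
# lets it retry the transaction, as the error text itself suggests.
mysql -e 'STOP SLAVE SQL_THREAD; START SLAVE SQL_THREAD;'

# A "Can't find record in page_restrictions" error means the slave has
# drifted from the master and needs investigation (or recloning), not
# just a blind restart.
```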