[00:00:59] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51068 [00:03:19] New patchset: Dzahn; "adjust NRPE commands for jenkins to use regex and check for exactly 1 process" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51070 [00:05:28] New patchset: Dzahn; "adjust NRPE commands for jenkins to use regex and check for exactly 1 process" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51070 [00:09:30] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours [00:09:30] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [00:10:21] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [00:10:24] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [00:10:31] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [00:11:11] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours [00:12:17] I'm a bit confused on my workflow. When I git review, do I need to invite specific reviewers or do they just patrol? [00:12:57] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [00:13:04] Coren, both:P [00:13:15] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [00:15:35] LeslieCarr: ^ :( [00:15:48] grrr [00:15:56] New patchset: Lcarr; "fixing up stop command on nrpe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51072 [00:15:58] why did you kill it ? ;) [00:16:02] I didn't [00:16:16] mmmhmm [00:16:33] so --retry 15 was added ina new init script version [00:16:35] trying that :) [00:17:16] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51072 [00:22:42] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: Connection refused [00:24:03] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: Connection refused [00:25:47] !log tstarling cleared profiling data [00:25:50] Logged the message, Master [00:29:17] RECOVERY - jenkins_service_running on gallium is OK: PROCS OK: 3 processes with args jenkins [00:29:18] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [00:29:27] RECOVERY - MySQL disk space on neon is OK: DISK OK [00:29:27] RECOVERY - MySQL disk space on neon is OK: DISK OK [00:29:27] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho [00:29:47] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 635 bytes in 0.002 second response time [00:32:47] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: Connection refused [00:36:35] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 2 processes with args zuul-server [00:39:33] PROBLEM - mysqld processes on es1002 is CRITICAL: Connection refused by host [00:39:34] PROBLEM - MySQL Recent Restart on es1002 is CRITICAL: Connection refused by host [00:39:43] PROBLEM - MySQL disk space on es1002 is CRITICAL: Connection refused by host [00:41:35] RECOVERY - mysqld processes on es1002 is OK: PROCS OK: 1 process with command name mysqld [00:41:36] RECOVERY - MySQL Recent Restart on es1002 is OK: OK 16765469 seconds since restart [00:41:43] RECOVERY - MySQL disk space on es1002 is OK: DISK OK [00:44:38] New patchset: Ram; "Fix for bug 45266. Needs parallel changes to OAI." 
[operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/51077 [00:48:43] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 637 bytes in 0.001 second response time [00:49:04] New patchset: Ottomata; "Adding puppet-merge for sockpuppet puppet merges." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50452 [00:49:15] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK HTTP/1.1 200 OK - 637 bytes in 0.053 seconds [00:59:44] New review: Ottomata; "Ok, I'm ready for review." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50452 [01:00:59] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:02:59] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 637 bytes in 9.798 second response time [01:05:36] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:05:59] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:06:49] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 637 bytes in 0.001 second response time [01:07:15] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK HTTP/1.1 200 OK - 637 bytes in 0.055 seconds [01:07:30] andre__: grrrrr [01:09:30] jeremyb_, sorry, didn't understand your question earlier (didn't get that c4 addressed me). But yeah, needs some more changes I guess :-/ [01:14:22] PROBLEM - Auth DNS on ns0.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [01:16:39] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [01:16:48] ^^ that was me (the auth dns ) fixed [01:17:09] RECOVERY - Auth DNS on ns0.wikimedia.org is OK: DNS OK: 0.031 seconds response time. www.wikipedia.org returns 208.80.154.225 [01:17:29] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.013 second response time on port 8123 [01:18:21] New patchset: Asher; "trial twemproxy configs for eqiad and pmtpa app servers" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51087 [01:19:14] andre__: well i can't very well run a copy in labs (no DB dump or non-public config) [01:19:23] andre__: and i don't have access to the box [01:19:40] andre__: i suppose i could try to guess some more... but that's not likely to be fruitful i think [01:21:08] jeremyb_, yeah, setting up a labs instance is one of the dreams... [01:21:24] andre__ has a dream? [01:21:25] jeremyb_, but thanks for helping on that ticket. Appreciated! [01:21:42] Reedy: many. It's just that reality often interferes. [01:22:32] Reedy: Anything against syncing 1.21wmf1-8 ? [01:23:01] I just found another whole category of pages on a wiki with broken scripts and images [01:23:54] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (58044), Total (64604) [01:24:42] andre__: (unless you want to get me a copy of the config, etc.) anyway, focusing on other stuff for at least the next 16 hours [01:25:22] * jeremyb_ sees no sign of madman so far this week (at least not in one of his usual channels). :( [01:25:22] jeremyb_, heh, sure, please go ahead :) [01:25:23] Krinkle: Plus all the extensions? 
[01:25:42] that'll take a fucking age to check out [01:25:47] Reedy: Well, I'd probably use checkoutMediaWiki and whatever it does [01:25:54] most importantly static- on bits [01:26:07] Reedy: Yeah, it'll take a while [01:26:09] checkoutMediaWiki won't make it any quicker [01:26:13] I know [01:26:23] but at least I won't have to create the symlinks manually [01:26:28] Due to a retarded git bug, I had to move it back to checking out directly on NFS [01:26:31] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51087 [01:27:51] Reedy: I dont know about that, but I just want to get this over with. Not doing anything seems too traditional [01:28:54] Reedy: git bug? [01:28:56] Reedy: But I haven't created a wmf branch checkout on fenari before so I'd like to not be alone on it. [01:29:18] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 186 seconds [01:29:19] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 187 seconds [01:29:20] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 187 seconds [01:29:21] I don't know how big the harddrive is of it and the apache for example, and or whather that is a realistic concern when it is "only" a few mediawikis [01:29:32] It doesn't do anything that can cause any problems [01:29:36] All the apaches have been fixed now [01:29:38] Or should've been [01:29:44] Some of the "misc" type servers maybe not [01:29:48] Reedy: what do you mean, what was broken? [01:29:56] they had tiny / partitions [01:30:00] right [01:30:01] and they filled up [01:30:04] and the world failed [01:30:09] wee [01:30:12] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 190 seconds [01:30:15] jeremyb_: It sets some working dir variable in each extension submodule to /tmp [01:30:20] for no reason [01:30:23] after git was upgraded [01:33:18] Reedy: I usually only connect with -A if I plan to sync, which I don't right now (first creating one on fenari). Looks like it want my public key anyway,since it is an ssh checkout [01:33:28] which makes sense, so we can push back from fenari [01:33:50] just pointing out the obvious [01:34:18] I always connect with -A [01:34:37] Easier than having to restart connections when you'll be hopping to other servers [01:35:16] !log dist-upgrade and reboot grosley [01:35:19] Logged the message, Master [01:35:22] !log Bringing back checkouts of MediaWiki 1.21wmf5 - 1.21wmf1 on fenari for bug 44570 [01:35:23] Logged the message, Master [01:37:48] PROBLEM - HTTP on grosley is CRITICAL: Connection refused [01:38:18] PROBLEM - Exim SMTP on grosley is CRITICAL: Connection refused [01:38:28] PROBLEM - SSH on grosley is CRITICAL: Connection refused [01:39:18] RECOVERY - Exim SMTP on grosley is OK: SMTP OK - 0.134 sec. response time [01:39:28] RECOVERY - SSH on grosley is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [01:39:48] RECOVERY - HTTP on grosley is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.068 second response time [01:41:09] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 24 seconds [01:41:18] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 27 seconds [01:41:19] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 28 seconds [01:46:28] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:46:39] New patchset: Ottomata; "Adding puppet-merge for sockpuppet puppet merges." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/50452 [01:47:54] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:51:03] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [01:55:24] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [02:00:51] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51070 [02:04:06] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa [02:04:30] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa [02:05:58] Reedy: almost done [02:06:01] fatal: remote error: Git repository not found [02:06:01] Clone of 'https://gerrit.wikimedia.org/r/p/mediawiki/extensions/wikidiff2.git' into submodule path 'extensions/wikidiff2' failed [02:06:04] yeah, that was to be expected [02:06:22] yeah [02:06:29] you could make a commit to remove it from the submodule list though I guess [02:07:30] removing a submodule has proven difficult in the past [02:07:32] (for me that is) [02:07:45] usually stackoverflow has a good solution though [02:09:50] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: Connection refused [02:12:21] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: Connection refused [02:13:00] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 2 processes with args zuul-server [02:21:50] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 635 bytes in 0.001 second response time [02:23:18] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK HTTP/1.1 200 OK - 635 bytes in 0.054 seconds [02:24:24] Reedy: https://gerrit.wikimedia.org/r/#/q/I133b716e69cdcdd5e47dcb73937a75adca7721d3,n,z [02:31:08] !log LocalisationUpdate completed (1.21wmf10) at Wed Feb 27 02:31:06 UTC 2013 [02:31:09] Logged the message, Master [02:35:28] RECOVERY - MySQL Slave Delay on db36 is OK: OK replication delay 0 seconds [02:35:48] RECOVERY - MySQL Replication Heartbeat on db36 is OK: OK replication delay 0 seconds [02:36:12] RECOVERY - MySQL Replication Heartbeat on db36 is OK: OK replication delay 0 seconds [02:36:48] RECOVERY - MySQL Slave Delay on db36 is OK: OK replication delay 0 seconds [02:49:15] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , Total (13637) [02:54:28] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 189 seconds [02:54:28] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 185 seconds [02:54:48] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 203 seconds [02:54:58] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 214 seconds [02:55:06] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 215 seconds [02:56:28] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 2 seconds [02:57:23] !log LocalisationUpdate completed (1.21wmf9) at Wed Feb 27 02:57:22 UTC 2013 [02:57:25] Logged the message, Master [03:06:58] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 183 seconds [03:07:18] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 188 seconds [03:07:24] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 188 seconds [03:07:51] PROBLEM - MySQL Slave 
Delay on db33 is CRITICAL: CRIT replication delay 195 seconds [03:15:30] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , commonswiki (277276), Total (283779) [03:55:00] PROBLEM - Varnish HTTP bits on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:50] PROBLEM - SSH on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:56:45] PROBLEM - Varnish HTTP bits on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:57:48] PROBLEM - SSH on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:06:33] Hm.. syntax check in sync-dir seems rather pointless [04:06:43] it treats it as a file [04:06:56] which it obviously isn't since the line above asserted it is a dir [04:07:05] it can't fail as the argument isn't a file [04:07:08] :/ [04:08:40] PROBLEM - NTP on niobium is CRITICAL: NTP CRITICAL: No response from NTP server [04:09:32] RECOVERY - NTP on niobium is OK: NTP OK: Offset 0.0002081394196 secs [04:09:40] RECOVERY - SSH on niobium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [04:11:50] RECOVERY - Varnish HTTP bits on niobium is OK: HTTP OK: HTTP/1.1 200 OK - 635 bytes in 0.088 second response time [04:12:03] RECOVERY - SSH on niobium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [04:12:48] RECOVERY - Varnish HTTP bits on niobium is OK: HTTP OK HTTP/1.1 200 OK - 635 bytes in 3.047 seconds [04:22:25] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [04:22:35] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [04:24:03] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [04:25:06] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [04:37:19] !log krinkle Started syncing Wikimedia installation... : [04:37:21] Logged the message, Master [04:37:39] wait,what? [04:37:48] sync-common-all is now a straight forward to scap.. [04:38:32] er.. outdated documentation [04:38:38] Looks like I did do the right thing [04:49:25] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 10 seconds [04:49:25] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 3 seconds [04:50:05] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [04:50:27] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [04:51:35] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [04:52:35] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 3.006 second response time on port 8123 [04:57:06] !log krinkle Finished syncing Wikimedia installation... 
: [04:57:07] Logged the message, Master [05:02:36] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [05:15:32] PROBLEM - Varnish HTTP bits on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:00] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:42] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:40] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [05:17:45] PROBLEM - Varnish HTTP bits on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:18:31] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [05:23:40] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [05:24:30] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.001 second response time on port 8123 [05:30:12] RECOVERY - Varnish HTTP bits on arsenic is OK: HTTP OK HTTP/1.1 200 OK - 635 bytes in 0.055 seconds [05:30:20] RECOVERY - Varnish HTTP bits on arsenic is OK: HTTP OK: HTTP/1.1 200 OK - 635 bytes in 0.732 second response time [05:30:40] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [05:30:57] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [05:31:04] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [05:31:06] RECOVERY - MySQL disk space on neon is OK: DISK OK [05:31:24] RECOVERY - MySQL disk space on neon is OK: DISK OK [05:31:34] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho [05:31:43] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [05:33:44] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out [05:34:06] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [05:35:53] !log restarted lucene search on search1016 [05:35:55] Logged the message, Master [05:36:34] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.002 second response time on port 8123 [05:36:35] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.003 second response time on port 8123 [05:37:24] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123 [06:04:33] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [06:08:31] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [06:08:31] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [06:09:30] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [06:10:00] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [06:11:00] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [06:15:04] !log added 1.21wmf1, 1.21wmf2 (bug 44570) [06:15:06] Logged the message, Master [06:15:09] !log adding 1.21wmf3, 1.21wmf4, 1.21wmf5 (bug 44570) [06:15:10] Logged the message, Master [06:27:18] !log krinkle Started syncing Wikimedia installation... : [06:27:20] Logged the message, Master [06:29:12] !log snapshot1002 down for reinstallation [06:29:14] Logged the message, Master [06:29:30] Krinkl, are you dding just the skins or more than that? 
[06:29:35] *adding [06:29:40] PROBLEM - Host snapshot1002 is DOWN: PING CRITICAL - Packet loss = 100% [06:30:00] RECOVERY - Host snapshot1002 is UP: PING OK - Packet loss = 0%, RTA = 1.18 ms [06:32:40] PROBLEM - SSH on snapshot1002 is CRITICAL: Connection refused [06:35:09] PROBLEM - SSH on snapshot1002 is CRITICAL: Connection refused [06:40:34] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho [06:41:36] RECOVERY - MySQL disk space on neon is OK: DISK OK [06:41:37] RECOVERY - MySQL disk space on neon is OK: DISK OK [06:41:54] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [06:44:18] RECOVERY - SSH on snapshot1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [06:44:34] RECOVERY - SSH on snapshot1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [06:44:54] PROBLEM - NTP on snapshot1002 is CRITICAL: NTP CRITICAL: No response from NTP server [06:49:30] apergos: static needs core/skins, core/resources and core/extensions [06:49:45] apergos: using multiversion/checkoutMediaWiki which creates the symlinks in live-1.5 and bits where they are needed [06:50:07] extensions? that's a shame [06:50:10] which means it is a full git clone, seems to be the most straight forward to make sure git command work as expected. [06:50:12] yep [06:50:25] lot of our UI come from plugins [06:50:27] at least it won't need the l10n cdb which is huge [06:50:32] indeed [06:50:45] I wonder how long we have to keep that crap around for [06:51:03] see bug [06:51:28] at least as long as our cache is valid [06:51:46] yeah I dunno how long that is [06:52:02] a year? [06:52:05] yeah, that's the first task on the list, but that's for operations to figure our [06:52:05] ok that's not really cool [06:52:13] I doubt a year [06:52:28] see comment 16 [06:52:34] but that's not my concern. if you guys want to keep cache for a year, that means bits for a year and all versions. [06:52:47] which 'you guys'? [06:52:52] assuming that is crazy, that means we need to shorten the cache. [06:53:05] apergos: nobody, everybody, we, us, the legion, or just operations? [06:53:23] this sounds like a decision that was made by the legion tbh [06:53:40] the collective 'you' that manages the servers and the cache protocols [06:53:48] anyways that's a ridiculous length of time and it needs to be cut down to something reasonable [06:53:54] yep [06:55:18] commented on bug [06:57:58] PROBLEM - NTP on snapshot1002 is CRITICAL: NTP CRITICAL: No response from NTP server [07:02:52] apergos: Hm.. I noticed earlier and now again that sync-common-all doesnt' return to my shell [07:03:00] The last output is snapshot1001: Done [07:03:03] I think that's the last one [07:03:09] I haven't seen any output for 2-3 minutees [07:03:10] 1002 is not going to sync righ tnow [07:03:20] as noted here it's being reinstalled [07:03:29] I'm not worried about that [07:03:33] I don't see why the script would hang however [07:03:49] I can press Enter but it just inserts empty lines [07:03:58] ah, so it isn't the last one [07:04:11] Okay, it is picking up at snapshot1004: now [07:04:17] still going :) [07:04:18] all right then [07:04:20] !log krinkle Finished syncing Wikimedia installation... 
: [07:04:22] Logged the message, Master [07:04:24] wow that took a long while for it to time out [07:04:26] and there it is [07:04:33] yes, 2 minutes is a bit long isn't it [07:04:53] all together the scap took 30 minutes [07:05:19] !log done adding 1.21wmf3, 1.21wmf4, 1.21wmf5 (bug 44570) [07:05:20] Logged the message, Master [07:06:09] that's a long time [07:06:31] the one earlier today 'only' took 20 minutes [07:08:19] hmm hungry. from not having time to eat dinner last night [07:08:25] guess I'd better do something about it [07:08:48] apergos: looks like bits and squid have already picked the new data up [07:08:48] https://wikimediafoundation.org/wiki/Wikimedia_visual_identity_guidelines [07:08:54] I guess 404 have a shorter cache tiem [07:08:56] "have already" [07:09:01] (which is good in this case) [07:09:17] that package used to look broken (icons 404) [07:09:20] ah [07:09:26] page* [07:09:36] I wouldn't be surpised if 1.20wmf\d are still referenced [07:09:46] Reedy: Hm.. I hope not [07:09:55] Reedy: No, don't worry,I won't bring those back (yet) [07:10:06] Reedy: Can't we see though? 404 hits [07:10:07] no plese don't [07:10:12] on squid and bits [07:10:25] I'm still not sure why we haven't just adjusted the caches [07:10:37] At least for foundationwiki which is the main place people see the problem [07:10:44] contains 'GET 404' and '/static-' [07:11:12] 'wgSquidMaxage' => array( [07:11:12] 'default' => 2678400, // 31 days seems about right [07:11:12] 'foundationwiki' => 3600, // template links may be funky [07:11:12] ), [07:11:20] ^ Evidence foundationwiki is special [07:11:21] Reedy: That would make it easier to see where they are referenced and how far back. And, more importantly, where/when the cut off is. [07:11:32] Well, read the config [07:11:36] 31 days for squid [07:11:37] Apparently [07:11:38] Reedy: Shorter, not longer. I doubt that is notable in this case. [07:11:49] 'wgParserCacheExpireTime' => array( [07:11:50] 'default' => 86400 * 365, [07:11:50] ), [07:11:52] But that's a lie [07:11:57] right [07:12:06] As we purge at some other point using the maintenance script [07:12:09] or should be [07:12:30] it would be nice to adjust that value [07:12:32] I suppose that $wgParserCacheExpireTime is just to make sure apaches don't go around purging parser cache [07:12:42] so that we don't rely on knowing what some cron job somewhere says, right? [07:13:16] https://gerrit.wikimedia.org/r/#/c/47202/ [07:13:23] Uploaded Feb 2, 2013 3:34 AM [07:13:28] For bug 44570 - Make the parser cache expire at 30 days [07:13:43] but no reviews [07:13:51] is that linked to in the bug report? [07:14:00] :D [07:14:04] Aaron did +1 it originally [07:14:08] and? [07:14:23] It's been rebased, and then jeremyb_ edited the commit summary after [07:14:28] ah [07:14:46] Reedy: Can we get 404 stats? That would make it possible to measure this. Especially if/when we set from year to a month, to see whether it actually helps a month from now. [07:14:53] Krinkle: Ask ops? [07:15:00] sure, will do. [07:15:10] Just thought maybe there is something for it in place already [07:15:11] I've nfi what's (if anything) logged in varnish [07:15:20] me either sadly [07:15:21] like fatalmonitor and those other /a/mwlogs somewhere [07:15:46] 1.21wmf1 is 764M including the .git folder [07:15:46] Reedy: Actually, varninsh may be better logged than squids even [07:15:46] :| [07:15:57] So at rough numbers [07:16:02] We've got 7GB of mediawiki checkouts? 
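
As an aside on the 404-stats question raised just above (hits by unique URL for bits.wikimedia.org/static-*, filtering on 'GET 404' and '/static-'): a minimal, hypothetical sketch of that kind of ad-hoc tally over a 1/1000 sampled access log follows. The file name and the assumption that the status code and URL appear as plain tokens on each line are illustrative only, not the real log format, and any real analysis would belong to whatever tooling ops/analytics actually use.

<?php
// Rough, hypothetical sketch: count 404s for bits /static-* URLs in a
// 1/1000 sampled access log. File name and line layout are assumptions.
$logFile = isset( $argv[1] ) ? $argv[1] : 'sampled-1000.log'; // hypothetical file name
$counts = array();

$fh = fopen( $logFile, 'r' );
if ( $fh === false ) {
	fwrite( STDERR, "Cannot open $logFile\n" );
	exit( 1 );
}
while ( ( $line = fgets( $fh ) ) !== false ) {
	// Cheap pre-filter, as suggested above: the line must mention a 404 and /static-
	// (a real parser would check the status field specifically).
	if ( strpos( $line, '404' ) === false || strpos( $line, '/static-' ) === false ) {
		continue;
	}
	// Grab the first URL-ish token containing /static- and tally it.
	if ( preg_match( '#\S*/static-[^\s?]*#', $line, $m ) ) {
		$url = $m[0];
		$counts[$url] = isset( $counts[$url] ) ? $counts[$url] + 1 : 1;
	}
}
fclose( $fh );

arsort( $counts );
foreach ( array_slice( $counts, 0, 20, true ) as $url => $n ) {
	// The log is sampled at 1/1000, so each matched line stands for ~1000 requests.
	printf( "%6d sampled (~%d total)  %s\n", $n, $n * 1000, $url );
}

Since the log is sampled, the scaled-up totals are only estimates, but they would be enough to see where the cut-off between still-referenced and dead static-1.21wmfN paths sits.
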
[07:16:02] Reedy: yeah, I noticed ganglia earlier, fenari spiking network [07:16:05] Ignoring l10n cache [07:16:14] l10n cache only for active checkouts [07:16:21] As I said, ignoring them [07:16:21] yeah I so don't love it [07:16:46] Reedy: Varnish is used for event logging as well, unsampled I think, but short term only. [07:16:52] squids are sampled obviously [07:16:54] no wait [07:17:03] lol, 1.21wmf2 is 826M [07:17:05] varnish not sampled would be weird if squid is [07:17:11] we have the 1000 sampled log for squid [07:17:18] since varnish probably gets at least 2X the number of requests [07:17:33] yeah [07:17:55] but every html page has at least 3 or 4 load.php calls (still reducing, but 3 or 4 certainly) [07:18:29] http://ganglia.wikimedia.org/latest/?c=Bits%20caches%20eqiad&h=arsenic.wikimedia.org&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [07:18:32] Ganglia has some stats [07:19:06] what's up with memory boogie here? http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=Bits+caches+eqiad&h=arsenic.wikimedia.org&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS [07:19:33] staircases [07:19:48] It's swapping a lot [07:20:03] it's a memleak [07:20:11] ahh [07:20:14] i've been debugging that on and off today [07:20:21] Is varnish being restarted at somepoint? [07:20:29] manually [07:20:29] every 6 hours? [07:20:49] or by the OOM killer sometimes ;) [07:20:54] Krinkle: If you ask mark nicely he might know if we've any 404 stats for varnish :p [07:21:04] ah, only in the last 24 hours [07:21:05] http://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&c=Bits+caches+eqiad&h=arsenic.wikimedia.org&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS [07:21:09] all squid and varnish logging traffic is sent to the log servers [07:21:16] where it's sampled or otherwise processed [07:21:28] 404s are logged just the same [07:21:32] mark: regardless of the request^ [07:21:34] perfect [07:22:06] mark: What kind of format are they stored / are they read into something for querying? [07:22:37] the sampled logs are just sent straight to disk, 1/1000 requests [07:22:40] e.g. how would I go about getting results on 404 hits by unique url for bits.wikimedia.org/static-* [07:22:42] for the rest, ask analytics [07:22:49] alrighty [07:23:17] The sampled logs should be fine [07:23:20] mark: how far bakc? [07:23:23] mark: how far back? [07:23:40] don't know right now [07:23:49] ok, don't bother. Thx for the info! [07:26:40] New review: ArielGlenn; "I'd like to see either Tim or Asher sign off on this (soon)." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/47202 [07:27:27] # Purge entries older than 30d * 86400s/d = 2592000s [07:27:28] command => '/usr/local/bin/mwscript purgeParserCache.php --wiki=aawiki --age=2592000 >/dev/null 2>&1', [07:28:05] runs how often? [07:29:07] 1am on Sunday, I think [07:29:18] minute => 0, hour => 1, weekday => 0, [07:29:20] once a week = max lifetime 37 days then [07:29:38] cya ltr [07:30:02] * apergos looks for an rt ticket n this very issue [07:30:03] Which in theory means the mediawiki variable makes no difference [07:30:14] yes (though it should match) [07:30:28] Yeah [07:30:32] but in that case why do we have stuff from wmf1 still around? is that form the last 30 days? [07:30:35] or 37... 
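
For reference, the change under discussion (gerrit 47202, "Make the parser cache expire at 30 days") presumably amounts to something like the fragment below in the wmf-config settings array quoted above; this is a sketch of the intent, not the verbatim patch. It lines up with the weekly purgeParserCache.php cron (--age=2592000, i.e. 30 days), which is why the worst-case lifetime of an entry works out to roughly 37 days, as noted above.

// Sketch only, not the verbatim contents of change 47202: bring the
// MediaWiki-side expiry in line with the weekly purgeParserCache.php cron.
'wgParserCacheExpireTime' => array(
	'default' => 86400 * 30, // 30 days; with a weekly purge run the worst case is ~37 days
),
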
[07:31:10] far from it [07:31:34] 9 versions later should mean it should have been out of production maybe 18 weeks ago [07:31:46] https://www.mediawiki.org/wiki/MediaWiki_1.21/Roadmap [07:32:00] wmf2 was fully deployed by Wednesday, October 24, 2012 [07:32:16] 4 months ago [07:33:06] so [07:33:19] if we can't toss those, why is that? [07:35:28] meh the rt ticket shows the cron job but not who gave the thumbs up [07:36:07] Between Tim and Asher [07:36:10] I think Tim wrote the script [07:37:54] commit b3d484f9f530d785fb10a0c48b3b84ce0fc88b16 [07:37:54] Merge: 4de369b d70e593 [07:37:54] Author: Asher [07:39:09] Though.. The parser cache wouldn't actually include the html of the footer, would it? :/ [07:39:58] is that one of the broken bits? grrrr [07:40:42] The problems people usually notice is the missing magnifying glass for the top search box, and the powered by mediawiki image [07:41:21] Powered by MediaWiki [07:43:34] since asher merged this in with 30 days for cron I am fine with +2 it actually for your fix [07:44:06] lemme do that now and then we can try to figure out the footer [07:46:07] New review: ArielGlenn; "actually ashar merged the equivalent cron job (30 day expiry, purge once a week) in r38275 so I'll +..." [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/47202 [07:46:07] Change merged: ArielGlenn; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47202 [07:46:39] how does that get deployed? [07:47:30] git pull on fenari then sync-file [07:47:48] We could actually fix the powered by mediawiki one by changing it's entry in $wgFooterIcons to something constant [07:48:03] "src" => null, // Defaults to "$wgStylePath/common/images/poweredby_mediawiki_88x31.png" [07:49:01] Bit of a hack though [07:49:14] uh... git pull into where? [07:49:37] /home/wikipedia/common [07:49:47] huh [07:50:16] so it is [07:51:05] do I need to worry about the untracked files? [07:53:56] nope [07:54:16] 1.21wmf4 has no bits docroot or live-1.5 symlinks :/ [07:56:09] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [07:56:13] *sigh* git fetch origin shows nothing happened. git diff HEAD origin/master also shows nothing [07:57:24] it's already 30 days? [07:57:48] the log says it's been merged.. [07:57:53] yeah I merged it [07:57:54] but [07:58:19] !log reedy synchronized wmf-config/InitialiseSettings.php [07:58:21] I didn't do a git pull, I juss did a fetch (which returned silently) and a diff [07:58:21] Logged the message, Master [07:58:24] snapshot1002: @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ [07:58:27] which showed nothing [07:58:38] yeah, it's being reinstalled with [07:58:46] *cough* larger partition *cough* guess why [07:59:08] :( [07:59:09] I'll do the other ones over there the same way once this is set up [07:59:18] hey all need to move to precise anyways [07:59:35] New patchset: Reedy; "Numerous symlinks..." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51113 [08:00:12] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51113 [08:01:41] I don't see why powered by mediawiki has to vary by version [08:01:46] (back to the footer) [08:02:28] It doesn't really, it's just with the paths it uses end up with it having a version in it [08:02:45] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours [08:02:45] PROBLEM - Puppet freshness on lardner is CRITICAL: Puppet has not run in the last 10 hours [08:02:54] New patchset: Reedy; "Add wmf4" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51114 [08:03:33] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51114 [08:03:34] we don't have version free urls anywhere I suppose, meh [08:03:43] which is logical but also annoying [08:03:48] PROBLEM - Puppet freshness on amssq59 is CRITICAL: Puppet has not run in the last 10 hours [08:03:48] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours [08:03:48] PROBLEM - Puppet freshness on es2 is CRITICAL: Puppet has not run in the last 10 hours [08:03:48] PROBLEM - Puppet freshness on mw108 is CRITICAL: Puppet has not run in the last 10 hours [08:03:48] PROBLEM - Puppet freshness on mw101 is CRITICAL: Puppet has not run in the last 10 hours [08:03:49] PROBLEM - Puppet freshness on gurvin is CRITICAL: Puppet has not run in the last 10 hours [08:03:49] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [08:03:50] PROBLEM - Puppet freshness on mc6 is CRITICAL: Puppet has not run in the last 10 hours [08:03:50] PROBLEM - Puppet freshness on mw58 is CRITICAL: Puppet has not run in the last 10 hours [08:03:51] PROBLEM - Puppet freshness on mw68 is CRITICAL: Puppet has not run in the last 10 hours [08:03:51] PROBLEM - Puppet freshness on sq83 is CRITICAL: Puppet has not run in the last 10 hours [08:03:52] PROBLEM - Puppet freshness on mw92 is CRITICAL: Puppet has not run in the last 10 hours [08:03:52] PROBLEM - Puppet freshness on mw7 is CRITICAL: Puppet has not run in the last 10 hours [08:03:53] PROBLEM - Puppet freshness on sq42 is CRITICAL: Puppet has not run in the last 10 hours [08:04:07] !log reedy synchronized docroot/bits/static-1.21wmf4/ [08:04:08] Logged the message, Master [08:04:42] !log reedy synchronized live-1.5/static-1.21wmf4/ [08:04:44] Logged the message, Master [08:04:51] PROBLEM - Puppet freshness on db78 is CRITICAL: Puppet has not run in the last 10 hours [08:04:51] PROBLEM - Puppet freshness on db64 is CRITICAL: Puppet has not run in the last 10 hours [08:04:51] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [08:04:51] PROBLEM - Puppet freshness on lvs3 is CRITICAL: Puppet has not run in the last 10 hours [08:04:52] PROBLEM - Puppet freshness on ms-be1 is CRITICAL: Puppet has not run in the last 10 hours [08:04:52] PROBLEM - Puppet freshness on mw33 is CRITICAL: Puppet has not run in the last 10 hours [08:04:52] PROBLEM - Puppet freshness on mw109 is CRITICAL: Puppet has not run in the last 10 hours [08:04:53] PROBLEM - Puppet freshness on mw119 is CRITICAL: Puppet has not run in the last 10 hours [08:04:53] PROBLEM - Puppet freshness on mw87 is CRITICAL: Puppet has not run in the last 10 hours [08:04:54] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours [08:04:54] 
PROBLEM - Puppet freshness on sq68 is CRITICAL: Puppet has not run in the last 10 hours [08:04:55] PROBLEM - Puppet freshness on sq65 is CRITICAL: Puppet has not run in the last 10 hours [08:04:55] PROBLEM - Puppet freshness on yvon is CRITICAL: Puppet has not run in the last 10 hours [08:04:56] PROBLEM - Puppet freshness on sq70 is CRITICAL: Puppet has not run in the last 10 hours [08:04:56] PROBLEM - Puppet freshness on virt10 is CRITICAL: Puppet has not run in the last 10 hours [08:04:57] PROBLEM - Puppet freshness on sq76 is CRITICAL: Puppet has not run in the last 10 hours [08:05:45] PROBLEM - Puppet freshness on db58 is CRITICAL: Puppet has not run in the last 10 hours [08:05:45] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [08:05:45] PROBLEM - Puppet freshness on linne is CRITICAL: Puppet has not run in the last 10 hours [08:05:45] PROBLEM - Puppet freshness on mc10 is CRITICAL: Puppet has not run in the last 10 hours [08:05:45] PROBLEM - Puppet freshness on kuo is CRITICAL: Puppet has not run in the last 10 hours [08:05:46] PROBLEM - Puppet freshness on mw104 is CRITICAL: Puppet has not run in the last 10 hours [08:05:46] PROBLEM - Puppet freshness on mc8 is CRITICAL: Puppet has not run in the last 10 hours [08:05:47] PROBLEM - Puppet freshness on mw52 is CRITICAL: Puppet has not run in the last 10 hours [08:05:47] PROBLEM - Puppet freshness on mw15 is CRITICAL: Puppet has not run in the last 10 hours [08:05:48] PROBLEM - Puppet freshness on mw84 is CRITICAL: Puppet has not run in the last 10 hours [08:05:48] PROBLEM - Puppet freshness on mw63 is CRITICAL: Puppet has not run in the last 10 hours [08:05:49] PROBLEM - Puppet freshness on search26 is CRITICAL: Puppet has not run in the last 10 hours [08:05:49] PROBLEM - Puppet freshness on search18 is CRITICAL: Puppet has not run in the last 10 hours [08:05:50] PROBLEM - Puppet freshness on sq82 is CRITICAL: Puppet has not run in the last 10 hours [08:05:50] PROBLEM - Puppet freshness on srv246 is CRITICAL: Puppet has not run in the last 10 hours [08:05:51] PROBLEM - Puppet freshness on search31 is CRITICAL: Puppet has not run in the last 10 hours [08:05:51] PROBLEM - Puppet freshness on sq74 is CRITICAL: Puppet has not run in the last 10 hours [08:05:55] gtfo nagios-wm [08:06:08] apergos: I do agree, its very overkill to have all these mediawiki trees checked out for 404s for 2 images [08:06:48] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours [08:06:48] PROBLEM - Puppet freshness on mw18 is CRITICAL: Puppet has not run in the last 10 hours [08:06:48] PROBLEM - Puppet freshness on mw34 is CRITICAL: Puppet has not run in the last 10 hours [08:06:48] PROBLEM - Puppet freshness on pdf1 is CRITICAL: Puppet has not run in the last 10 hours [08:06:48] PROBLEM - Puppet freshness on mw11 is CRITICAL: Puppet has not run in the last 10 hours [08:06:49] PROBLEM - Puppet freshness on mw74 is CRITICAL: Puppet has not run in the last 10 hours [08:06:49] PROBLEM - Puppet freshness on search30 is CRITICAL: Puppet has not run in the last 10 hours [08:06:50] PROBLEM - Puppet freshness on ssl1 is CRITICAL: Puppet has not run in the last 10 hours [08:06:50] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours [08:07:30] LeslieCarr: nagios is revolting [08:07:51] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours [08:07:52] PROBLEM - Puppet freshness on db56 is CRITICAL: Puppet has not run in 
the last 10 hours [08:07:52] PROBLEM - Puppet freshness on lvs1001 is CRITICAL: Puppet has not run in the last 10 hours [08:07:52] PROBLEM - Puppet freshness on mw24 is CRITICAL: Puppet has not run in the last 10 hours [08:07:52] PROBLEM - Puppet freshness on sq50 is CRITICAL: Puppet has not run in the last 10 hours [08:07:52] PROBLEM - Puppet freshness on mw122 is CRITICAL: Puppet has not run in the last 10 hours [08:07:52] PROBLEM - Puppet freshness on srv260 is CRITICAL: Puppet has not run in the last 10 hours [08:07:53] PROBLEM - Puppet freshness on srv269 is CRITICAL: Puppet has not run in the last 10 hours [08:07:53] PROBLEM - Puppet freshness on ms-fe4 is CRITICAL: Puppet has not run in the last 10 hours [08:07:54] PROBLEM - Puppet freshness on pc1 is CRITICAL: Puppet has not run in the last 10 hours [08:07:54] PROBLEM - Puppet freshness on virt6 is CRITICAL: Puppet has not run in the last 10 hours [08:08:14] foundationwiki:pcache:idhash:21087-0!*!0!!*!4!* and timestamp 20120919200207 [08:08:31] it lies [08:08:45] PROBLEM - Puppet freshness on db10 is CRITICAL: Puppet has not run in the last 10 hours [08:08:45] PROBLEM - Puppet freshness on db43 is CRITICAL: Puppet has not run in the last 10 hours [08:08:45] PROBLEM - Puppet freshness on mw124 is CRITICAL: Puppet has not run in the last 10 hours [08:08:46] PROBLEM - Puppet freshness on mchenry is CRITICAL: Puppet has not run in the last 10 hours [08:08:46] PROBLEM - Puppet freshness on mw49 is CRITICAL: Puppet has not run in the last 10 hours [08:08:46] PROBLEM - Puppet freshness on mexia is CRITICAL: Puppet has not run in the last 10 hours [08:08:46] PROBLEM - Puppet freshness on mw31 is CRITICAL: Puppet has not run in the last 10 hours [08:08:47] PROBLEM - Puppet freshness on mw60 is CRITICAL: Puppet has not run in the last 10 hours [08:08:47] PROBLEM - Puppet freshness on sq62 is CRITICAL: Puppet has not run in the last 10 hours [08:08:47] The last Puppet run was at Wed Feb 27 07:59:11 UTC 2013 (8 minutes ago). 
[08:08:47] from srv269 [08:08:48] PROBLEM - Puppet freshness on sq85 is CRITICAL: Puppet has not run in the last 10 hours [08:08:48] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [08:08:49] PROBLEM - Puppet freshness on mw81 is CRITICAL: Puppet has not run in the last 10 hours [08:08:49] PROBLEM - Puppet freshness on sq71 is CRITICAL: Puppet has not run in the last 10 hours [08:08:50] PROBLEM - Puppet freshness on srv247 is CRITICAL: Puppet has not run in the last 10 hours [08:08:50] PROBLEM - Puppet freshness on srv298 is CRITICAL: Puppet has not run in the last 10 hours [08:08:51] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [08:09:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [08:09:48] PROBLEM - Puppet freshness on db65 is CRITICAL: Puppet has not run in the last 10 hours [08:09:49] PROBLEM - Puppet freshness on mw16 is CRITICAL: Puppet has not run in the last 10 hours [08:09:49] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours [08:09:49] PROBLEM - Puppet freshness on mc7 is CRITICAL: Puppet has not run in the last 10 hours [08:09:49] PROBLEM - Puppet freshness on mw19 is CRITICAL: Puppet has not run in the last 10 hours [08:09:49] PROBLEM - Puppet freshness on mw40 is CRITICAL: Puppet has not run in the last 10 hours [08:09:50] PROBLEM - Puppet freshness on srv277 is CRITICAL: Puppet has not run in the last 10 hours [08:09:50] PROBLEM - Puppet freshness on srv243 is CRITICAL: Puppet has not run in the last 10 hours [08:09:51] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours [08:10:14] how is there anything that old in there? [08:10:51] PROBLEM - Puppet freshness on db33 is CRITICAL: Puppet has not run in the last 10 hours [08:10:51] PROBLEM - Puppet freshness on db39 is CRITICAL: Puppet has not run in the last 10 hours [08:10:51] PROBLEM - Puppet freshness on db60 is CRITICAL: Puppet has not run in the last 10 hours [08:10:51] PROBLEM - Puppet freshness on db54 is CRITICAL: Puppet has not run in the last 10 hours [08:10:51] PROBLEM - Puppet freshness on lvs4 is CRITICAL: Puppet has not run in the last 10 hours [08:10:52] PROBLEM - Puppet freshness on mc11 is CRITICAL: Puppet has not run in the last 10 hours [08:10:52] PROBLEM - Puppet freshness on emery is CRITICAL: Puppet has not run in the last 10 hours [08:10:53] PROBLEM - Puppet freshness on mw38 is CRITICAL: Puppet has not run in the last 10 hours [08:10:53] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [08:10:54] PROBLEM - Puppet freshness on mw48 is CRITICAL: Puppet has not run in the last 10 hours [08:10:54] PROBLEM - Puppet freshness on sq37 is CRITICAL: Puppet has not run in the last 10 hours [08:10:55] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours [08:10:55] PROBLEM - Puppet freshness on sq54 is CRITICAL: Puppet has not run in the last 10 hours [08:10:56] PROBLEM - Puppet freshness on mw6 is CRITICAL: Puppet has not run in the last 10 hours [08:10:56] PROBLEM - Puppet freshness on sq59 is CRITICAL: Puppet has not run in the last 10 hours [08:10:58] I'm not sure [08:11:00] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [08:11:07] I was just going to have a look at the parser cache mysql boxes [08:11:54] PROBLEM - Puppet freshness on db40 is CRITICAL: Puppet has not run in the last 10 
hours [08:11:55] PROBLEM - Puppet freshness on ms-be11 is CRITICAL: Puppet has not run in the last 10 hours [08:11:55] PROBLEM - Puppet freshness on dataset2 is CRITICAL: Puppet has not run in the last 10 hours [08:11:55] PROBLEM - Puppet freshness on ms-be2 is CRITICAL: Puppet has not run in the last 10 hours [08:11:55] PROBLEM - Puppet freshness on mw30 is CRITICAL: Puppet has not run in the last 10 hours [08:11:55] PROBLEM - Puppet freshness on mw67 is CRITICAL: Puppet has not run in the last 10 hours [08:11:55] PROBLEM - Puppet freshness on mw69 is CRITICAL: Puppet has not run in the last 10 hours [08:11:56] PROBLEM - Puppet freshness on mw88 is CRITICAL: Puppet has not run in the last 10 hours [08:11:56] PROBLEM - Puppet freshness on sq60 is CRITICAL: Puppet has not run in the last 10 hours [08:11:57] PROBLEM - Puppet freshness on sq66 is CRITICAL: Puppet has not run in the last 10 hours [08:11:57] PROBLEM - Puppet freshness on sq61 is CRITICAL: Puppet has not run in the last 10 hours [08:11:58] PROBLEM - Puppet freshness on sq69 is CRITICAL: Puppet has not run in the last 10 hours [08:11:58] PROBLEM - Puppet freshness on srv270 is CRITICAL: Puppet has not run in the last 10 hours [08:11:59] PROBLEM - Puppet freshness on srv280 is CRITICAL: Puppet has not run in the last 10 hours [08:11:59] PROBLEM - Puppet freshness on srv282 is CRITICAL: Puppet has not run in the last 10 hours [08:12:00] PROBLEM - Puppet freshness on srv293 is CRITICAL: Puppet has not run in the last 10 hours [08:12:00] PROBLEM - Puppet freshness on virt4 is CRITICAL: Puppet has not run in the last 10 hours [08:12:48] PROBLEM - Puppet freshness on db38 is CRITICAL: Puppet has not run in the last 10 hours [08:12:48] PROBLEM - Puppet freshness on grosley is CRITICAL: Puppet has not run in the last 10 hours [08:12:48] PROBLEM - Puppet freshness on mc9 is CRITICAL: Puppet has not run in the last 10 hours [08:12:48] PROBLEM - Puppet freshness on mw103 is CRITICAL: Puppet has not run in the last 10 hours [08:12:49] PROBLEM - Puppet freshness on mw32 is CRITICAL: Puppet has not run in the last 10 hours [08:12:49] PROBLEM - Puppet freshness on mw82 is CRITICAL: Puppet has not run in the last 10 hours [08:12:49] PROBLEM - Puppet freshness on manutius is CRITICAL: Puppet has not run in the last 10 hours [08:12:50] PROBLEM - Puppet freshness on mw91 is CRITICAL: Puppet has not run in the last 10 hours [08:12:50] PROBLEM - Puppet freshness on es8 is CRITICAL: Puppet has not run in the last 10 hours [08:12:51] PROBLEM - Puppet freshness on search21 is CRITICAL: Puppet has not run in the last 10 hours [08:12:51] PROBLEM - Puppet freshness on mw55 is CRITICAL: Puppet has not run in the last 10 hours [08:12:52] PROBLEM - Puppet freshness on pc2 is CRITICAL: Puppet has not run in the last 10 hours [08:12:52] PROBLEM - Puppet freshness on sq51 is CRITICAL: Puppet has not run in the last 10 hours [08:12:53] PROBLEM - Puppet freshness on ssl3 is CRITICAL: Puppet has not run in the last 10 hours [08:12:53] PROBLEM - Puppet freshness on sq72 is CRITICAL: Puppet has not run in the last 10 hours [08:12:54] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [08:13:51] PROBLEM - Puppet freshness on dobson is CRITICAL: Puppet has not run in the last 10 hours [08:13:51] PROBLEM - Puppet freshness on lvs5 is CRITICAL: Puppet has not run in the last 10 hours [08:13:51] PROBLEM - Puppet freshness on colby is CRITICAL: Puppet has not run in the last 10 hours [08:13:51] PROBLEM - Puppet freshness on es6 is 
CRITICAL: Puppet has not run in the last 10 hours [08:13:51] PROBLEM - Puppet freshness on wtp1 is CRITICAL: Puppet has not run in the last 10 hours [08:13:52] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [08:13:52] PROBLEM - Puppet freshness on search33 is CRITICAL: Puppet has not run in the last 10 hours [08:13:53] PROBLEM - Puppet freshness on williams is CRITICAL: Puppet has not run in the last 10 hours [08:13:53] PROBLEM - Puppet freshness on srv300 is CRITICAL: Puppet has not run in the last 10 hours [08:13:54] PROBLEM - Puppet freshness on sq56 is CRITICAL: Puppet has not run in the last 10 hours [08:14:45] PROBLEM - Puppet freshness on cp3019 is CRITICAL: Puppet has not run in the last 10 hours [08:14:45] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours [08:14:46] PROBLEM - Puppet freshness on db51 is CRITICAL: Puppet has not run in the last 10 hours [08:14:46] PROBLEM - Puppet freshness on es4 is CRITICAL: Puppet has not run in the last 10 hours [08:14:46] PROBLEM - Puppet freshness on ms10 is CRITICAL: Puppet has not run in the last 10 hours [08:14:46] PROBLEM - Puppet freshness on mc4 is CRITICAL: Puppet has not run in the last 10 hours [08:14:46] PROBLEM - Puppet freshness on mc14 is CRITICAL: Puppet has not run in the last 10 hours [08:14:47] PROBLEM - Puppet freshness on mw53 is CRITICAL: Puppet has not run in the last 10 hours [08:14:47] PROBLEM - Puppet freshness on db55 is CRITICAL: Puppet has not run in the last 10 hours [08:14:48] PROBLEM - Puppet freshness on knsq16 is CRITICAL: Puppet has not run in the last 10 hours [08:14:48] PROBLEM - Puppet freshness on mw9 is CRITICAL: Puppet has not run in the last 10 hours [08:14:49] PROBLEM - Puppet freshness on snapshot3 is CRITICAL: Puppet has not run in the last 10 hours [08:14:49] PROBLEM - Puppet freshness on sq43 is CRITICAL: Puppet has not run in the last 10 hours [08:15:48] PROBLEM - Puppet freshness on amssq31 is CRITICAL: Puppet has not run in the last 10 hours [08:15:48] PROBLEM - Puppet freshness on constable is CRITICAL: Puppet has not run in the last 10 hours [08:15:48] PROBLEM - Puppet freshness on amssq42 is CRITICAL: Puppet has not run in the last 10 hours [08:15:48] PROBLEM - Puppet freshness on mw83 is CRITICAL: Puppet has not run in the last 10 hours [08:15:48] PROBLEM - Puppet freshness on db45 is CRITICAL: Puppet has not run in the last 10 hours [08:15:49] PROBLEM - Puppet freshness on sq86 is CRITICAL: Puppet has not run in the last 10 hours [08:15:49] PROBLEM - Puppet freshness on srv265 is CRITICAL: Puppet has not run in the last 10 hours [08:15:50] PROBLEM - Puppet freshness on mw125 is CRITICAL: Puppet has not run in the last 10 hours [08:15:50] PROBLEM - Puppet freshness on pdf3 is CRITICAL: Puppet has not run in the last 10 hours [08:16:51] PROBLEM - Puppet freshness on amssq54 is CRITICAL: Puppet has not run in the last 10 hours [08:16:51] PROBLEM - Puppet freshness on ersch is CRITICAL: Puppet has not run in the last 10 hours [08:16:51] PROBLEM - Puppet freshness on cp3021 is CRITICAL: Puppet has not run in the last 10 hours [08:16:51] PROBLEM - Puppet freshness on marmontel is CRITICAL: Puppet has not run in the last 10 hours [08:16:51] PROBLEM - Puppet freshness on es3 is CRITICAL: Puppet has not run in the last 10 hours [08:16:52] PROBLEM - Puppet freshness on db52 is CRITICAL: Puppet has not run in the last 10 hours [08:16:52] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 
10 hours [08:16:53] PROBLEM - Puppet freshness on ms-fe2 is CRITICAL: Puppet has not run in the last 10 hours [08:16:53] PROBLEM - Puppet freshness on db34 is CRITICAL: Puppet has not run in the last 10 hours [08:16:54] PROBLEM - Puppet freshness on mw102 is CRITICAL: Puppet has not run in the last 10 hours [08:16:54] PROBLEM - Puppet freshness on amssq37 is CRITICAL: Puppet has not run in the last 10 hours [08:16:55] PROBLEM - Puppet freshness on mw86 is CRITICAL: Puppet has not run in the last 10 hours [08:16:55] PROBLEM - Puppet freshness on mw45 is CRITICAL: Puppet has not run in the last 10 hours [08:16:56] PROBLEM - Puppet freshness on mw96 is CRITICAL: Puppet has not run in the last 10 hours [08:16:56] PROBLEM - Puppet freshness on mw23 is CRITICAL: Puppet has not run in the last 10 hours [08:16:57] PROBLEM - Puppet freshness on sq52 is CRITICAL: Puppet has not run in the last 10 hours [08:16:57] PROBLEM - Puppet freshness on mw70 is CRITICAL: Puppet has not run in the last 10 hours [08:16:58] PROBLEM - Puppet freshness on sanger is CRITICAL: Puppet has not run in the last 10 hours [08:16:58] PROBLEM - Puppet freshness on amssq57 is CRITICAL: Puppet has not run in the last 10 hours [08:16:59] PROBLEM - Puppet freshness on srv274 is CRITICAL: Puppet has not run in the last 10 hours [08:16:59] PROBLEM - Puppet freshness on mw113 is CRITICAL: Puppet has not run in the last 10 hours [08:17:00] PROBLEM - Puppet freshness on srv284 is CRITICAL: Puppet has not run in the last 10 hours [08:17:00] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [08:17:01] PROBLEM - Puppet freshness on srv287 is CRITICAL: Puppet has not run in the last 10 hours [08:17:45] PROBLEM - Puppet freshness on amssq40 is CRITICAL: Puppet has not run in the last 10 hours [08:17:45] PROBLEM - Puppet freshness on cp3010 is CRITICAL: Puppet has not run in the last 10 hours [08:17:46] PROBLEM - Puppet freshness on es10 is CRITICAL: Puppet has not run in the last 10 hours [08:17:46] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [08:17:46] PROBLEM - Puppet freshness on mw27 is CRITICAL: Puppet has not run in the last 10 hours [08:17:46] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [08:17:46] PROBLEM - Puppet freshness on mw46 is CRITICAL: Puppet has not run in the last 10 hours [08:17:47] PROBLEM - Puppet freshness on mw51 is CRITICAL: Puppet has not run in the last 10 hours [08:17:47] PROBLEM - Puppet freshness on mw3 is CRITICAL: Puppet has not run in the last 10 hours [08:17:48] PROBLEM - Puppet freshness on mw93 is CRITICAL: Puppet has not run in the last 10 hours [08:17:48] PROBLEM - Puppet freshness on pc3 is CRITICAL: Puppet has not run in the last 10 hours [08:17:49] PROBLEM - Puppet freshness on nescio is CRITICAL: Puppet has not run in the last 10 hours [08:17:49] PROBLEM - Puppet freshness on snapshot1 is CRITICAL: Puppet has not run in the last 10 hours [08:17:50] PROBLEM - Puppet freshness on srv261 is CRITICAL: Puppet has not run in the last 10 hours [08:17:50] PROBLEM - Puppet freshness on search23 is CRITICAL: Puppet has not run in the last 10 hours [08:17:51] PROBLEM - Puppet freshness on srv283 is CRITICAL: Puppet has not run in the last 10 hours [08:18:48] PROBLEM - Puppet freshness on bellin is CRITICAL: Puppet has not run in the last 10 hours [08:18:48] PROBLEM - Puppet freshness on mw25 is CRITICAL: Puppet has not run in the last 10 hours [08:18:48] PROBLEM - Puppet 
freshness on mw26 is CRITICAL: Puppet has not run in the last 10 hours [08:18:49] PROBLEM - Puppet freshness on mw43 is CRITICAL: Puppet has not run in the last 10 hours [08:18:49] PROBLEM - Puppet freshness on srv242 is CRITICAL: Puppet has not run in the last 10 hours [08:18:49] PROBLEM - Puppet freshness on mw28 is CRITICAL: Puppet has not run in the last 10 hours [08:18:49] PROBLEM - Puppet freshness on db53 is CRITICAL: Puppet has not run in the last 10 hours [08:18:50] PROBLEM - Puppet freshness on mw4 is CRITICAL: Puppet has not run in the last 10 hours [08:18:50] PROBLEM - Puppet freshness on search28 is CRITICAL: Puppet has not run in the last 10 hours [08:18:51] PROBLEM - Puppet freshness on formey is CRITICAL: Puppet has not run in the last 10 hours [08:18:51] PROBLEM - Puppet freshness on virt7 is CRITICAL: Puppet has not run in the last 10 hours [08:19:51] PROBLEM - Puppet freshness on blondel is CRITICAL: Puppet has not run in the last 10 hours [08:19:51] PROBLEM - Puppet freshness on db26 is CRITICAL: Puppet has not run in the last 10 hours [08:19:51] PROBLEM - Puppet freshness on db67 is CRITICAL: Puppet has not run in the last 10 hours [08:19:51] PROBLEM - Puppet freshness on kaulen is CRITICAL: Puppet has not run in the last 10 hours [08:19:52] PROBLEM - Puppet freshness on db9 is CRITICAL: Puppet has not run in the last 10 hours [08:19:52] PROBLEM - Puppet freshness on mw10 is CRITICAL: Puppet has not run in the last 10 hours [08:19:52] PROBLEM - Puppet freshness on knsq19 is CRITICAL: Puppet has not run in the last 10 hours [08:19:53] PROBLEM - Puppet freshness on ms-be4 is CRITICAL: Puppet has not run in the last 10 hours [08:19:53] PROBLEM - Puppet freshness on mw64 is CRITICAL: Puppet has not run in the last 10 hours [08:19:54] PROBLEM - Puppet freshness on mw120 is CRITICAL: Puppet has not run in the last 10 hours [08:19:54] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [08:19:55] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours [08:19:55] PROBLEM - Puppet freshness on search25 is CRITICAL: Puppet has not run in the last 10 hours [08:19:56] PROBLEM - Puppet freshness on search34 is CRITICAL: Puppet has not run in the last 10 hours [08:19:56] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [08:19:57] PROBLEM - Puppet freshness on solr1 is CRITICAL: Puppet has not run in the last 10 hours [08:19:57] PROBLEM - Puppet freshness on solr3 is CRITICAL: Puppet has not run in the last 10 hours [08:19:58] PROBLEM - Puppet freshness on srv239 is CRITICAL: Puppet has not run in the last 10 hours [08:19:58] PROBLEM - Puppet freshness on srv241 is CRITICAL: Puppet has not run in the last 10 hours [08:19:59] PROBLEM - Puppet freshness on sq57 is CRITICAL: Puppet has not run in the last 10 hours [08:19:59] PROBLEM - Puppet freshness on srv301 is CRITICAL: Puppet has not run in the last 10 hours [08:20:00] PROBLEM - Puppet freshness on srv296 is CRITICAL: Puppet has not run in the last 10 hours [08:20:00] PROBLEM - Puppet freshness on virt11 is CRITICAL: Puppet has not run in the last 10 hours [08:20:01] PROBLEM - Puppet freshness on sq41 is CRITICAL: Puppet has not run in the last 10 hours [08:20:01] PROBLEM - Puppet freshness on ssl4 is CRITICAL: Puppet has not run in the last 10 hours [08:20:08] PROBLEM - MySQL Replication Heartbeat on db1020 is CRITICAL: CRIT replication delay 185 seconds [08:20:18] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT 
replication delay 190 seconds [08:20:28] PROBLEM - MySQL Replication Heartbeat on db1020 is CRITICAL: CRIT replication delay 198 seconds [08:20:28] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 203 seconds [08:20:41] hmm, why nagios-wm's head is still on its shoulders? [08:20:45] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 205 seconds [08:20:46] PROBLEM - Puppet freshness on amssq49 is CRITICAL: Puppet has not run in the last 10 hours [08:20:46] PROBLEM - Puppet freshness on amssq50 is CRITICAL: Puppet has not run in the last 10 hours [08:20:46] PROBLEM - Puppet freshness on cp3009 is CRITICAL: Puppet has not run in the last 10 hours [08:20:46] PROBLEM - Puppet freshness on db48 is CRITICAL: Puppet has not run in the last 10 hours [08:20:46] PROBLEM - Puppet freshness on es5 is CRITICAL: Puppet has not run in the last 10 hours [08:20:46] PROBLEM - Puppet freshness on db32 is CRITICAL: Puppet has not run in the last 10 hours [08:20:47] PROBLEM - Puppet freshness on mc5 is CRITICAL: Puppet has not run in the last 10 hours [08:20:47] PROBLEM - Puppet freshness on mc1 is CRITICAL: Puppet has not run in the last 10 hours [08:20:48] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [08:20:48] PROBLEM - Puppet freshness on mw97 is CRITICAL: Puppet has not run in the last 10 hours [08:20:48] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 209 seconds [08:20:49] PROBLEM - Puppet freshness on mw99 is CRITICAL: Puppet has not run in the last 10 hours [08:20:49] PROBLEM - Puppet freshness on sq78 is CRITICAL: Puppet has not run in the last 10 hours [08:20:50] PROBLEM - Puppet freshness on srv236 is CRITICAL: Puppet has not run in the last 10 hours [08:20:50] PROBLEM - Puppet freshness on srv275 is CRITICAL: Puppet has not run in the last 10 hours [08:20:51] PROBLEM - Puppet freshness on srv245 is CRITICAL: Puppet has not run in the last 10 hours [08:20:51] PROBLEM - Puppet freshness on srv253 is CRITICAL: Puppet has not run in the last 10 hours [08:20:52] PROBLEM - Puppet freshness on tarin is CRITICAL: Puppet has not run in the last 10 hours [08:20:52] PROBLEM - Puppet freshness on srv258 is CRITICAL: Puppet has not run in the last 10 hours [08:21:03] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 218 seconds [08:21:12] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 228 seconds [08:21:48] PROBLEM - Puppet freshness on cp3020 is CRITICAL: Puppet has not run in the last 10 hours [08:21:49] PROBLEM - Puppet freshness on mw36 is CRITICAL: Puppet has not run in the last 10 hours [08:21:49] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Puppet has not run in the last 10 hours [08:21:49] PROBLEM - Puppet freshness on mw42 is CRITICAL: Puppet has not run in the last 10 hours [08:21:49] PROBLEM - Puppet freshness on ocg1 is CRITICAL: Puppet has not run in the last 10 hours [08:21:49] PROBLEM - Puppet freshness on knsq28 is CRITICAL: Puppet has not run in the last 10 hours [08:21:49] PROBLEM - Puppet freshness on mw1197 is CRITICAL: Puppet has not run in the last 10 hours [08:21:50] PROBLEM - Puppet freshness on pdf2 is CRITICAL: Puppet has not run in the last 10 hours [08:21:50] PROBLEM - Puppet freshness on mw47 is CRITICAL: Puppet has not run in the last 10 hours [08:21:51] PROBLEM - Puppet freshness on sq49 is CRITICAL: Puppet has not run in the last 10 hours [08:21:51] PROBLEM - Puppet freshness on sq55 is 
CRITICAL: Puppet has not run in the last 10 hours [08:21:52] PROBLEM - Puppet freshness on sq64 is CRITICAL: Puppet has not run in the last 10 hours [08:21:52] PROBLEM - Puppet freshness on sq67 is CRITICAL: Puppet has not run in the last 10 hours [08:21:53] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours [08:21:53] PROBLEM - Puppet freshness on srv193 is CRITICAL: Puppet has not run in the last 10 hours [08:21:54] PROBLEM - Puppet freshness on srv297 is CRITICAL: Puppet has not run in the last 10 hours [08:22:51] PROBLEM - Puppet freshness on amssq41 is CRITICAL: Puppet has not run in the last 10 hours [08:22:51] PROBLEM - Puppet freshness on db63 is CRITICAL: Puppet has not run in the last 10 hours [08:22:51] PROBLEM - Puppet freshness on cp3022 is CRITICAL: Puppet has not run in the last 10 hours [08:22:51] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [08:22:51] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours [08:22:52] PROBLEM - Puppet freshness on ms5 is CRITICAL: Puppet has not run in the last 10 hours [08:22:52] PROBLEM - Puppet freshness on knsq18 is CRITICAL: Puppet has not run in the last 10 hours [08:22:53] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours [08:22:53] PROBLEM - Puppet freshness on mw1104 is CRITICAL: Puppet has not run in the last 10 hours [08:22:54] PROBLEM - Puppet freshness on mw1145 is CRITICAL: Puppet has not run in the last 10 hours [08:22:54] PROBLEM - Puppet freshness on lvs1002 is CRITICAL: Puppet has not run in the last 10 hours [08:22:55] PROBLEM - Puppet freshness on mw114 is CRITICAL: Puppet has not run in the last 10 hours [08:22:55] PROBLEM - Puppet freshness on mw1143 is CRITICAL: Puppet has not run in the last 10 hours [08:22:56] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Puppet has not run in the last 10 hours [08:22:56] PROBLEM - Puppet freshness on mw1177 is CRITICAL: Puppet has not run in the last 10 hours [08:22:57] PROBLEM - Puppet freshness on mw57 is CRITICAL: Puppet has not run in the last 10 hours [08:22:57] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Puppet has not run in the last 10 hours [08:22:58] PROBLEM - Puppet freshness on srv235 is CRITICAL: Puppet has not run in the last 10 hours [08:22:58] PROBLEM - Puppet freshness on mw61 is CRITICAL: Puppet has not run in the last 10 hours [08:22:59] PROBLEM - Puppet freshness on ssl2 is CRITICAL: Puppet has not run in the last 10 hours [08:22:59] PROBLEM - Puppet freshness on sq77 is CRITICAL: Puppet has not run in the last 10 hours [08:23:00] PROBLEM - Puppet freshness on srv264 is CRITICAL: Puppet has not run in the last 10 hours [08:23:00] PROBLEM - Puppet freshness on srv271 is CRITICAL: Puppet has not run in the last 10 hours [08:23:08] apergos, time to whack it down? 
[08:23:54] PROBLEM - Puppet freshness on db1027 is CRITICAL: Puppet has not run in the last 10 hours [08:23:55] PROBLEM - Puppet freshness on cp1032 is CRITICAL: Puppet has not run in the last 10 hours [08:23:55] PROBLEM - Puppet freshness on db57 is CRITICAL: Puppet has not run in the last 10 hours [08:23:55] PROBLEM - Puppet freshness on mc15 is CRITICAL: Puppet has not run in the last 10 hours [08:23:55] PROBLEM - Puppet freshness on ekrem is CRITICAL: Puppet has not run in the last 10 hours [08:23:55] PROBLEM - Puppet freshness on mc3 is CRITICAL: Puppet has not run in the last 10 hours [08:23:55] PROBLEM - Puppet freshness on ms-fe3 is CRITICAL: Puppet has not run in the last 10 hours [08:23:56] PROBLEM - Puppet freshness on db36 is CRITICAL: Puppet has not run in the last 10 hours [08:23:56] PROBLEM - Puppet freshness on mw1044 is CRITICAL: Puppet has not run in the last 10 hours [08:23:57] PROBLEM - Puppet freshness on mw112 is CRITICAL: Puppet has not run in the last 10 hours [08:23:57] PROBLEM - Puppet freshness on mw1081 is CRITICAL: Puppet has not run in the last 10 hours [08:23:58] PROBLEM - Puppet freshness on mw1015 is CRITICAL: Puppet has not run in the last 10 hours [08:23:58] PROBLEM - Puppet freshness on search32 is CRITICAL: Puppet has not run in the last 10 hours [08:23:59] PROBLEM - Puppet freshness on snapshot2 is CRITICAL: Puppet has not run in the last 10 hours [08:23:59] PROBLEM - Puppet freshness on mw123 is CRITICAL: Puppet has not run in the last 10 hours [08:24:00] PROBLEM - Puppet freshness on snapshot4 is CRITICAL: Puppet has not run in the last 10 hours [08:24:00] PROBLEM - Puppet freshness on mw85 is CRITICAL: Puppet has not run in the last 10 hours [08:24:01] PROBLEM - Puppet freshness on srv249 is CRITICAL: Puppet has not run in the last 10 hours [08:24:01] PROBLEM - Puppet freshness on virt9 is CRITICAL: Puppet has not run in the last 10 hours [08:24:02] PROBLEM - Puppet freshness on srv238 is CRITICAL: Puppet has not run in the last 10 hours [08:24:02] PROBLEM - Puppet freshness on mw1150 is CRITICAL: Puppet has not run in the last 10 hours [08:24:07] yeah I'm not sure what's going on there, prolly related to leslie's changes yesterday [08:24:50] PROBLEM - Puppet freshness on amssq36 is CRITICAL: Puppet has not run in the last 10 hours [08:24:50] PROBLEM - Puppet freshness on db35 is CRITICAL: Puppet has not run in the last 10 hours [08:24:50] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [08:24:50] PROBLEM - Puppet freshness on db50 is CRITICAL: Puppet has not run in the last 10 hours [08:24:50] PROBLEM - Puppet freshness on mc1001 is CRITICAL: Puppet has not run in the last 10 hours [08:24:54] heh [08:25:22] Reedy: as I look at eg entries on pc1 [08:25:39] they all seem to be within range [08:25:50] eg select keyname, exptime from pc248 order by exptime asc limit 3; [08:26:06] enwiki:pcache:idhash:29566059-0!*!0!!*!4!* | 2013-02-24 07:19:50 [08:26:37] I didn't look specifically for foundationwiki entries [08:27:57] PROBLEM - Puppet freshness on amssq51 is CRITICAL: Puppet has not run in the last 10 hours [08:27:58] PROBLEM - Puppet freshness on arsenic is CRITICAL: Puppet has not run in the last 10 hours [08:27:58] PROBLEM - Puppet freshness on cp1006 is CRITICAL: Puppet has not run in the last 10 hours [08:27:58] PROBLEM - Puppet freshness on cp1028 is CRITICAL: Puppet has not run in the last 10 hours [08:27:58] PROBLEM - Puppet freshness on cp1033 is CRITICAL: Puppet has not run in the last 10 hours
[08:27:58] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Puppet has not run in the last 10 hours [08:29:55] PROBLEM - Puppet freshness on db31 is CRITICAL: Puppet has not run in the last 10 hours [08:29:55] PROBLEM - Puppet freshness on db37 is CRITICAL: Puppet has not run in the last 10 hours [08:29:55] PROBLEM - Puppet freshness on labstore3 is CRITICAL: Puppet has not run in the last 10 hours [08:29:55] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours [08:29:55] PROBLEM - Puppet freshness on amssq47 is CRITICAL: Puppet has not run in the last 10 hours [08:29:56] PROBLEM - Puppet freshness on mw1076 is CRITICAL: Puppet has not run in the last 10 hours [08:29:56] PROBLEM - Puppet freshness on mw1123 is CRITICAL: Puppet has not run in the last 10 hours [08:29:57] PROBLEM - Puppet freshness on mw105 is CRITICAL: Puppet has not run in the last 10 hours [08:29:57] PROBLEM - Puppet freshness on mw1149 is CRITICAL: Puppet has not run in the last 10 hours [08:29:58] PROBLEM - Puppet freshness on mw111 is CRITICAL: Puppet has not run in the last 10 hours [08:29:58] PROBLEM - Puppet freshness on mw1140 is CRITICAL: Puppet has not run in the last 10 hours [08:29:59] PROBLEM - Puppet freshness on stafford is CRITICAL: Puppet has not run in the last 10 hours [08:29:59] PROBLEM - Puppet freshness on mw14 is CRITICAL: Puppet has not run in the last 10 hours [08:30:00] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [08:30:00] PROBLEM - Puppet freshness on tmh1 is CRITICAL: Puppet has not run in the last 10 hours [08:30:01] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Puppet has not run in the last 10 hours [08:30:45] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [08:30:48] PROBLEM - Puppet freshness on knsq27 is CRITICAL: Puppet has not run in the last 10 hours [08:30:49] PROBLEM - Puppet freshness on mw1074 is CRITICAL: Puppet has not run in the last 10 hours [08:30:49] PROBLEM - Puppet freshness on mw1086 is CRITICAL: Puppet has not run in the last 10 hours [08:30:49] PROBLEM - Puppet freshness on mw118 is CRITICAL: Puppet has not run in the last 10 hours [08:30:49] PROBLEM - Puppet freshness on mw76 is CRITICAL: Puppet has not run in the last 10 hours [08:30:49] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours [08:30:49] PROBLEM - Puppet freshness on sq73 is CRITICAL: Puppet has not run in the last 10 hours [08:30:50] PROBLEM - Puppet freshness on sq80 is CRITICAL: Puppet has not run in the last 10 hours [08:30:50] PROBLEM - Puppet freshness on mw35 is CRITICAL: Puppet has not run in the last 10 hours [08:30:51] PROBLEM - Puppet freshness on mw1095 is CRITICAL: Puppet has not run in the last 10 hours [08:30:51] PROBLEM - Puppet freshness on srv263 is CRITICAL: Puppet has not run in the last 10 hours [08:30:52] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [08:30:52] PROBLEM - Puppet freshness on sq79 is CRITICAL: Puppet has not run in the last 10 hours [08:30:53] PROBLEM - Puppet freshness on srv290 is CRITICAL: Puppet has not run in the last 10 hours [08:30:53] PROBLEM - Puppet freshness on srv285 is CRITICAL: Puppet has not run in the last 10 hours [08:30:54] PROBLEM - Puppet freshness on virt8 is CRITICAL: Puppet has not run in the last 10 hours [08:30:54] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours [08:31:00] apergos: What's 
the simplest way to log onto the pc boxen with sql? [08:31:08] uh [08:31:28] normally sql enwiki -h dbwhatever is enough [08:31:35] I used a 4 line script ' [08:31:35] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.004 second response time on port 8123 [08:31:37] ~/mysql_atg -h pc1 [08:31:39] :-P [08:31:42] PROBLEM - Puppet freshness on amssq48 is CRITICAL: Puppet has not run in the last 10 hours [08:31:43] PROBLEM - Puppet freshness on cp1007 is CRITICAL: Puppet has not run in the last 10 hours [08:31:43] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [08:31:43] PROBLEM - Puppet freshness on knsq22 is CRITICAL: Puppet has not run in the last 10 hours [08:31:43] PROBLEM - Puppet freshness on db66 is CRITICAL: Puppet has not run in the last 10 hours [08:31:47] feel free to examine it. it's very dumb [08:32:08] (I'm off that db so feel free [08:32:09] ) [08:32:20] works for me :) [08:32:47] Yeah, simplified version of sql [08:32:56] yup [08:33:18] The difficulty here is all the shards [08:33:36] !log restarted lucene search on search1015 [08:33:38] Logged the message, Master [08:33:47] yes I checked like 5, not 255 of them [08:33:50] PROBLEM - Puppet freshness on ms1001 is CRITICAL: Puppet has not run in the last 10 hours [08:33:50] PROBLEM - Puppet freshness on sq44 is CRITICAL: Puppet has not run in the last 10 hours [08:33:51] PROBLEM - Puppet freshness on mw78 is CRITICAL: Puppet has not run in the last 10 hours [08:33:51] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours [08:33:52] PROBLEM - Puppet freshness on sq58 is CRITICAL: Puppet has not run in the last 10 hours [08:33:52] PROBLEM - Puppet freshness on hooft is CRITICAL: Puppet has not run in the last 10 hours [08:33:53] PROBLEM - Puppet freshness on srv250 is CRITICAL: Puppet has not run in the last 10 hours [08:33:53] PROBLEM - Puppet freshness on sq84 is CRITICAL: Puppet has not run in the last 10 hours [08:33:54] PROBLEM - Puppet freshness on srv252 is CRITICAL: Puppet has not run in the last 10 hours [08:33:54] PROBLEM - Puppet freshness on srv268 is CRITICAL: Puppet has not run in the last 10 hours [08:33:55] PROBLEM - Puppet freshness on srv272 is CRITICAL: Puppet has not run in the last 10 hours [08:33:55] PROBLEM - Puppet freshness on ssl1001 is CRITICAL: Puppet has not run in the last 10 hours [08:34:02] you could script it I guess [08:34:23] for i in seq 1 255; do....
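A minimal sketch of the loop suggested above, assuming the same ~/mysql_atg wrapper and pc1 host mentioned earlier, that the wrapper passes -e through to mysql the way the sql script does, and that the shard tables are named pc1 through pc255 like the pc248 table queried before:

    # spot-check the oldest expiry times in every parser cache shard instead of just a few;
    # host, wrapper behaviour and the pcN table naming are assumptions, not confirmed config
    for i in $(seq 1 255); do
        ~/mysql_atg -h pc1 -e "SELECT keyname, exptime FROM pc$i ORDER BY exptime ASC LIMIT 3;"
    done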
[08:34:51] PROBLEM - Puppet freshness on amssq35 is CRITICAL: Puppet has not run in the last 10 hours [08:34:51] PROBLEM - Puppet freshness on amssq56 is CRITICAL: Puppet has not run in the last 10 hours [08:34:51] PROBLEM - Puppet freshness on db1017 is CRITICAL: Puppet has not run in the last 10 hours [08:34:51] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [08:34:51] PROBLEM - Puppet freshness on db1042 is CRITICAL: Puppet has not run in the last 10 hours [08:34:52] PROBLEM - Puppet freshness on mw1204 is CRITICAL: Puppet has not run in the last 10 hours [08:34:52] PROBLEM - Puppet freshness on es1007 is CRITICAL: Puppet has not run in the last 10 hours [08:34:53] PROBLEM - Puppet freshness on mw1041 is CRITICAL: Puppet has not run in the last 10 hours [08:34:53] PROBLEM - Puppet freshness on mw1058 is CRITICAL: Puppet has not run in the last 10 hours [08:34:54] PROBLEM - Puppet freshness on search1009 is CRITICAL: Puppet has not run in the last 10 hours [08:35:35] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 0 seconds [08:35:36] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 0 seconds [08:35:47] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [08:35:47] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [08:35:47] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [08:35:47] PROBLEM - Puppet freshness on es1008 is CRITICAL: Puppet has not run in the last 10 hours [08:35:47] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours [08:36:05] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay 1 seconds [08:36:15] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 28 seconds [08:36:35] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 30 seconds [08:36:48] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay 1 seconds [08:36:48] PROBLEM - Puppet freshness on cp1020 is CRITICAL: Puppet has not run in the last 10 hours [08:36:48] PROBLEM - Puppet freshness on mw1097 is CRITICAL: Puppet has not run in the last 10 hours [08:36:48] PROBLEM - Puppet freshness on stat1001 is CRITICAL: Puppet has not run in the last 10 hours [08:36:48] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [08:36:49] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours [08:36:49] PROBLEM - Puppet freshness on amssq33 is CRITICAL: Puppet has not run in the last 10 hours [08:36:50] PROBLEM - Puppet freshness on db1001 is CRITICAL: Puppet has not run in the last 10 hours [08:36:50] PROBLEM - Puppet freshness on mw1203 is CRITICAL: Puppet has not run in the last 10 hours [08:37:51] PROBLEM - Puppet freshness on mw1183 is CRITICAL: Puppet has not run in the last 10 hours [08:37:51] PROBLEM - Puppet freshness on amssq60 is CRITICAL: Puppet has not run in the last 10 hours [08:37:51] PROBLEM - Puppet freshness on wtp1004 is CRITICAL: Puppet has not run in the last 10 hours [08:37:51] PROBLEM - Puppet freshness on mw1184 is CRITICAL: Puppet has not run in the last 10 hours [08:37:52] PROBLEM - Puppet freshness on wtp1002 is CRITICAL: Puppet has not run in the last 10 hours [08:38:09] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , commonswiki 
(834952), Total (843378) [08:38:45] PROBLEM - Puppet freshness on amssq44 is CRITICAL: Puppet has not run in the last 10 hours [08:38:46] PROBLEM - Puppet freshness on lvs2 is CRITICAL: Puppet has not run in the last 10 hours [08:38:46] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [08:38:46] PROBLEM - Puppet freshness on cp1023 is CRITICAL: Puppet has not run in the last 10 hours [08:38:46] PROBLEM - Puppet freshness on mc1009 is CRITICAL: Puppet has not run in the last 10 hours [08:38:46] PROBLEM - Puppet freshness on mw1019 is CRITICAL: Puppet has not run in the last 10 hours [08:38:46] PROBLEM - Puppet freshness on cp1036 is CRITICAL: Puppet has not run in the last 10 hours [08:38:47] PROBLEM - Puppet freshness on mw1034 is CRITICAL: Puppet has not run in the last 10 hours [08:38:47] PROBLEM - Puppet freshness on mw1030 is CRITICAL: Puppet has not run in the last 10 hours [08:38:48] PROBLEM - Puppet freshness on mw1064 is CRITICAL: Puppet has not run in the last 10 hours [08:38:48] PROBLEM - Puppet freshness on mw1182 is CRITICAL: Puppet has not run in the last 10 hours [08:38:49] PROBLEM - Puppet freshness on mw1201 is CRITICAL: Puppet has not run in the last 10 hours [08:38:49] PROBLEM - Puppet freshness on silver is CRITICAL: Puppet has not run in the last 10 hours [08:38:50] PROBLEM - Puppet freshness on strontium is CRITICAL: Puppet has not run in the last 10 hours [08:39:48] PROBLEM - Puppet freshness on amssq52 is CRITICAL: Puppet has not run in the last 10 hours [08:39:48] PROBLEM - Puppet freshness on cp1014 is CRITICAL: Puppet has not run in the last 10 hours [08:39:48] PROBLEM - Puppet freshness on cp1034 is CRITICAL: Puppet has not run in the last 10 hours [08:39:48] PROBLEM - Puppet freshness on amssq55 is CRITICAL: Puppet has not run in the last 10 hours [08:39:48] PROBLEM - Puppet freshness on mw1042 is CRITICAL: Puppet has not run in the last 10 hours [08:39:49] PROBLEM - Puppet freshness on db1039 is CRITICAL: Puppet has not run in the last 10 hours [08:39:49] PROBLEM - Puppet freshness on titanium is CRITICAL: Puppet has not run in the last 10 hours [08:39:50] PROBLEM - Puppet freshness on mw1089 is CRITICAL: Puppet has not run in the last 10 hours [08:39:50] PROBLEM - Puppet freshness on search1019 is CRITICAL: Puppet has not run in the last 10 hours [08:40:24] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , commonswiki (825926), Total (832500) [08:40:51] PROBLEM - Puppet freshness on es1009 is CRITICAL: Puppet has not run in the last 10 hours [08:40:51] PROBLEM - Puppet freshness on mw1068 is CRITICAL: Puppet has not run in the last 10 hours [08:40:51] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [08:40:51] PROBLEM - Puppet freshness on mw1088 is CRITICAL: Puppet has not run in the last 10 hours [08:40:51] PROBLEM - Puppet freshness on mw1107 is CRITICAL: Puppet has not run in the last 10 hours [08:40:52] PROBLEM - Puppet freshness on mw1200 is CRITICAL: Puppet has not run in the last 10 hours [08:40:52] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours [08:40:53] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [08:40:53] PROBLEM - Puppet freshness on tin is CRITICAL: Puppet has not run in the last 10 hours [08:40:54] PROBLEM - Puppet freshness on amssq32 is CRITICAL: Puppet has not run in the last 10 hours 
[08:41:45] PROBLEM - Puppet freshness on amssq45 is CRITICAL: Puppet has not run in the last 10 hours [08:41:45] PROBLEM - Puppet freshness on db1035 is CRITICAL: Puppet has not run in the last 10 hours [08:41:45] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [08:41:45] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours [08:41:45] PROBLEM - Puppet freshness on mw1075 is CRITICAL: Puppet has not run in the last 10 hours [08:42:48] PROBLEM - Puppet freshness on knsq24 is CRITICAL: Puppet has not run in the last 10 hours [08:42:48] PROBLEM - Puppet freshness on mw1079 is CRITICAL: Puppet has not run in the last 10 hours [08:42:48] PROBLEM - Puppet freshness on mw1080 is CRITICAL: Puppet has not run in the last 10 hours [08:42:48] PROBLEM - Puppet freshness on mw1138 is CRITICAL: Puppet has not run in the last 10 hours [08:42:49] PROBLEM - Puppet freshness on mw1146 is CRITICAL: Puppet has not run in the last 10 hours [08:42:49] PROBLEM - Puppet freshness on solr1002 is CRITICAL: Puppet has not run in the last 10 hours [08:42:49] PROBLEM - Puppet freshness on mw1206 is CRITICAL: Puppet has not run in the last 10 hours [08:42:50] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [08:43:29] reedy@fenari:~$ mwscript purgeParserCache.php aawiki --age=2592000 [08:43:30] Deleting objects expiring before 08:39, 27 February 2013 [08:43:37] $date = wfTimestamp( TS_MW, time() + $wgParserCacheExpireTime - intval( $inputAge ) ); [08:43:51] PROBLEM - Puppet freshness on amssq34 is CRITICAL: Puppet has not run in the last 10 hours [08:43:51] PROBLEM - Puppet freshness on amssq46 is CRITICAL: Puppet has not run in the last 10 hours [08:43:51] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [08:43:51] PROBLEM - Puppet freshness on db1033 is CRITICAL: Puppet has not run in the last 10 hours [08:43:51] PROBLEM - Puppet freshness on es1006 is CRITICAL: Puppet has not run in the last 10 hours [08:43:52] PROBLEM - Puppet freshness on cp1008 is CRITICAL: Puppet has not run in the last 10 hours [08:43:52] PROBLEM - Puppet freshness on mw1001 is CRITICAL: Puppet has not run in the last 10 hours [08:43:53] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [08:43:53] PROBLEM - Puppet freshness on mw1069 is CRITICAL: Puppet has not run in the last 10 hours [08:43:54] PROBLEM - Puppet freshness on mw1176 is CRITICAL: Puppet has not run in the last 10 hours [08:43:54] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Puppet has not run in the last 10 hours [08:43:55] PROBLEM - Puppet freshness on mw1115 is CRITICAL: Puppet has not run in the last 10 hours [08:43:55] PROBLEM - Puppet freshness on search1007 is CRITICAL: Puppet has not run in the last 10 hours [08:44:23] Shouldn't it be expires before 27th Jan? 
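The arithmetic behind that question, as a rough check: --age=2592000 is 30 days, and the printed date only comes out as "now" if $wgParserCacheExpireTime is of the same order, which is an assumption here rather than a confirmed config value:

    # what Reedy expected: a cutoff of now - age (entries older than --age)
    # what the quoted line computes: now + $wgParserCacheExpireTime - age
    now=$(date -u -d '2013-02-27 08:39' +%s)
    age=2592000                           # --age from the command above, 30 days
    expire=2592000                        # assumed $wgParserCacheExpireTime, also 30 days
    date -u -d "@$((now - age))"          # late January 2013, the "27th Jan" expected above
    date -u -d "@$((now + expire - age))" # 27 February 2013, what the script printed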
[08:44:45] PROBLEM - Puppet freshness on cp1001 is CRITICAL: Puppet has not run in the last 10 hours [08:44:45] PROBLEM - Puppet freshness on db1049 is CRITICAL: Puppet has not run in the last 10 hours [08:44:45] PROBLEM - Puppet freshness on knsq26 is CRITICAL: Puppet has not run in the last 10 hours [08:44:45] PROBLEM - Puppet freshness on mw1144 is CRITICAL: Puppet has not run in the last 10 hours [08:44:45] PROBLEM - Puppet freshness on mw1027 is CRITICAL: Puppet has not run in the last 10 hours [08:44:46] PROBLEM - Puppet freshness on search1013 is CRITICAL: Puppet has not run in the last 10 hours [08:44:46] PROBLEM - Puppet freshness on mw1166 is CRITICAL: Puppet has not run in the last 10 hours [08:44:47] PROBLEM - Puppet freshness on search1022 is CRITICAL: Puppet has not run in the last 10 hours [08:44:47] PROBLEM - Puppet freshness on search1020 is CRITICAL: Puppet has not run in the last 10 hours [08:44:48] PROBLEM - Puppet freshness on zirconium is CRITICAL: Puppet has not run in the last 10 hours [08:44:48] PROBLEM - Puppet freshness on search1024 is CRITICAL: Puppet has not run in the last 10 hours [08:44:55] before Feb 27 2013? [08:45:01] yes it should [08:45:08] gah [08:45:47] that explains why the entries I saw were all feb 24 [08:45:48] PROBLEM - Puppet freshness on mw1118 is CRITICAL: Puppet has not run in the last 10 hours [08:45:48] PROBLEM - Puppet freshness on mw1111 is CRITICAL: Puppet has not run in the last 10 hours [08:45:48] PROBLEM - Puppet freshness on mw1129 is CRITICAL: Puppet has not run in the last 10 hours [08:45:48] PROBLEM - Puppet freshness on mw1105 is CRITICAL: Puppet has not run in the last 10 hours [08:45:48] PROBLEM - Puppet freshness on mw1161 is CRITICAL: Puppet has not run in the last 10 hours [08:45:49] PROBLEM - Puppet freshness on ssl1002 is CRITICAL: Puppet has not run in the last 10 hours [08:45:49] PROBLEM - Puppet freshness on helium is CRITICAL: Puppet has not run in the last 10 hours [08:46:30] I wonder why the code is time now + the expire time - the age entered [08:46:50] ok so now we have three issues pending: footer, purge script purges too much, and where did that foundationwiki item from sept come from [08:46:51] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [08:46:51] PROBLEM - Puppet freshness on cp1002 is CRITICAL: Puppet has not run in the last 10 hours [08:46:51] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [08:46:51] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [08:46:51] PROBLEM - Puppet freshness on mw1017 is CRITICAL: Puppet has not run in the last 10 hours [08:46:52] PROBLEM - Puppet freshness on mw1024 is CRITICAL: Puppet has not run in the last 10 hours [08:46:52] PROBLEM - Puppet freshness on mc1016 is CRITICAL: Puppet has not run in the last 10 hours [08:46:53] time() - intval( $inputAge ) [08:46:53] PROBLEM - Puppet freshness on mw1132 is CRITICAL: Puppet has not run in the last 10 hours [08:46:53] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [08:46:54] PROBLEM - Puppet freshness on wtp1001 is CRITICAL: Puppet has not run in the last 10 hours [08:47:05] if we keep looking a little longer I guess we'll be up to 5 or 6 things pending [08:47:23] well, fixing the script is easy enough [08:47:34] * Reedy checks the log first [08:47:45] PROBLEM - Puppet freshness on db1015 is CRITICAL: Puppet has not run in the last 10 hours
[08:47:45] PROBLEM - Puppet freshness on mw1038 is CRITICAL: Puppet has not run in the last 10 hours [08:47:46] PROBLEM - Puppet freshness on mw1039 is CRITICAL: Puppet has not run in the last 10 hours [08:47:46] PROBLEM - Puppet freshness on mw1087 is CRITICAL: Puppet has not run in the last 10 hours [08:47:46] PROBLEM - Puppet freshness on ssl1003 is CRITICAL: Puppet has not run in the last 10 hours [08:48:32] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 181 seconds [08:48:43] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 184 seconds [08:48:48] PROBLEM - Puppet freshness on cp1003 is CRITICAL: Puppet has not run in the last 10 hours [08:48:48] PROBLEM - Puppet freshness on mw1033 is CRITICAL: Puppet has not run in the last 10 hours [08:48:48] PROBLEM - Puppet freshness on cp1021 is CRITICAL: Puppet has not run in the last 10 hours [08:48:49] PROBLEM - Puppet freshness on mw1090 is CRITICAL: Puppet has not run in the last 10 hours [08:48:49] PROBLEM - Puppet freshness on mw1165 is CRITICAL: Puppet has not run in the last 10 hours [08:48:49] PROBLEM - Puppet freshness on mw1063 is CRITICAL: Puppet has not run in the last 10 hours [08:48:49] PROBLEM - Puppet freshness on pc1003 is CRITICAL: Puppet has not run in the last 10 hours [08:48:50] PROBLEM - Puppet freshness on mw1124 is CRITICAL: Puppet has not run in the last 10 hours [08:48:50] PROBLEM - Puppet freshness on mw1094 is CRITICAL: Puppet has not run in the last 10 hours [08:48:58] cac02778 (Tim Starling 2011-09-09 03:51:45 +0000 52) global $wgParserCacheExpireTime; [08:48:58] cac02778 (Tim Starling 2011-09-09 03:51:45 +0000 53) $date = wfTimestamp( TS_MW, time() + $wgParserCacheExpireTime - intval( $inputAge ) ); [08:49:33] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 193 seconds [08:49:51] PROBLEM - Puppet freshness on mw1078 is CRITICAL: Puppet has not run in the last 10 hours [08:49:52] PROBLEM - Puppet freshness on cp1011 is CRITICAL: Puppet has not run in the last 10 hours [08:49:52] PROBLEM - Puppet freshness on db1002 is CRITICAL: Puppet has not run in the last 10 hours [08:49:52] PROBLEM - Puppet freshness on mw1082 is CRITICAL: Puppet has not run in the last 10 hours [08:49:52] PROBLEM - Puppet freshness on db1048 is CRITICAL: Puppet has not run in the last 10 hours [08:49:52] PROBLEM - Puppet freshness on hydrogen is CRITICAL: Puppet has not run in the last 10 hours [08:49:52] PROBLEM - Puppet freshness on search1002 is CRITICAL: Puppet has not run in the last 10 hours [08:49:53] PROBLEM - Puppet freshness on mw1091 is CRITICAL: Puppet has not run in the last 10 hours [08:49:53] PROBLEM - Puppet freshness on mw1160 is CRITICAL: Puppet has not run in the last 10 hours [08:50:18] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 204 seconds [08:50:45] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [08:50:45] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [08:50:45] PROBLEM - Puppet freshness on db1020 is CRITICAL: Puppet has not run in the last 10 hours [08:50:45] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours [08:50:45] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Puppet has not run in the last 10 hours [08:50:46] PROBLEM - Puppet freshness on mc1003 is CRITICAL: Puppet has not run in the last 10 hours [08:50:46] PROBLEM - Puppet freshness on mw1040 is 
CRITICAL: Puppet has not run in the last 10 hours [08:50:47] PROBLEM - Puppet freshness on mw1113 is CRITICAL: Puppet has not run in the last 10 hours [08:50:47] PROBLEM - Puppet freshness on mw1043 is CRITICAL: Puppet has not run in the last 10 hours [08:50:48] PROBLEM - Puppet freshness on mw1136 is CRITICAL: Puppet has not run in the last 10 hours [08:50:48] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [08:50:49] PROBLEM - Puppet freshness on mw1174 is CRITICAL: Puppet has not run in the last 10 hours [08:50:49] PROBLEM - Puppet freshness on mw1190 is CRITICAL: Puppet has not run in the last 10 hours [08:50:50] PROBLEM - Puppet freshness on mw1199 is CRITICAL: Puppet has not run in the last 10 hours [08:50:50] PROBLEM - Puppet freshness on mw1170 is CRITICAL: Puppet has not run in the last 10 hours [08:50:51] PROBLEM - Puppet freshness on search1016 is CRITICAL: Puppet has not run in the last 10 hours [08:50:51] PROBLEM - Puppet freshness on snapshot1003 is CRITICAL: Puppet has not run in the last 10 hours [08:50:52] PROBLEM - Puppet freshness on oxygen is CRITICAL: Puppet has not run in the last 10 hours [08:51:18] https://gerrit.wikimedia.org/r/51117 [08:51:35] looks like stuff isn't really going to irc.log for icinga yet, I'm going to have to ask leslie to look at that [08:51:48] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [08:51:48] PROBLEM - Puppet freshness on db1043 is CRITICAL: Puppet has not run in the last 10 hours [08:51:48] PROBLEM - Puppet freshness on cp1018 is CRITICAL: Puppet has not run in the last 10 hours [08:51:48] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [08:51:48] PROBLEM - Puppet freshness on fluorine is CRITICAL: Puppet has not run in the last 10 hours [08:51:49] PROBLEM - Puppet freshness on db1008 is CRITICAL: Puppet has not run in the last 10 hours [08:51:49] PROBLEM - Puppet freshness on db1025 is CRITICAL: Puppet has not run in the last 10 hours [08:51:50] PROBLEM - Puppet freshness on db1030 is CRITICAL: Puppet has not run in the last 10 hours [08:51:50] PROBLEM - Puppet freshness on iron is CRITICAL: Puppet has not run in the last 10 hours [08:51:51] PROBLEM - Puppet freshness on mc1015 is CRITICAL: Puppet has not run in the last 10 hours [08:51:51] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [08:51:52] PROBLEM - Puppet freshness on mw1005 is CRITICAL: Puppet has not run in the last 10 hours [08:51:52] PROBLEM - Puppet freshness on mw1023 is CRITICAL: Puppet has not run in the last 10 hours [08:51:53] PROBLEM - Puppet freshness on search1015 is CRITICAL: Puppet has not run in the last 10 hours [08:51:53] PROBLEM - Puppet freshness on mw1056 is CRITICAL: Puppet has not run in the last 10 hours [08:52:51] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [08:52:51] PROBLEM - Puppet freshness on db1045 is CRITICAL: Puppet has not run in the last 10 hours [08:52:51] PROBLEM - Puppet freshness on mw1092 is CRITICAL: Puppet has not run in the last 10 hours [08:52:51] PROBLEM - Puppet freshness on mw1022 is CRITICAL: Puppet has not run in the last 10 hours [08:52:52] PROBLEM - Puppet freshness on caesium is CRITICAL: Puppet has not run in the last 10 hours [08:52:52] PROBLEM - Puppet freshness on mw1051 is CRITICAL: Puppet has not run in the last 10 hours [08:52:52] PROBLEM - Puppet freshness on nitrogen is CRITICAL: Puppet has not 
run in the last 10 hours [08:52:53] PROBLEM - Puppet freshness on mw1152 is CRITICAL: Puppet has not run in the last 10 hours [08:52:53] PROBLEM - Puppet freshness on mw1180 is CRITICAL: Puppet has not run in the last 10 hours [08:52:54] PROBLEM - Puppet freshness on chromium is CRITICAL: Puppet has not run in the last 10 hours [08:52:54] PROBLEM - Puppet freshness on mw1109 is CRITICAL: Puppet has not run in the last 10 hours [08:52:55] PROBLEM - Puppet freshness on search1011 is CRITICAL: Puppet has not run in the last 10 hours [08:52:55] PROBLEM - Puppet freshness on search1023 is CRITICAL: Puppet has not run in the last 10 hours [08:52:56] PROBLEM - Puppet freshness on mw1195 is CRITICAL: Puppet has not run in the last 10 hours [08:52:56] PROBLEM - Puppet freshness on solr1003 is CRITICAL: Puppet has not run in the last 10 hours [08:52:57] PROBLEM - Puppet freshness on search1012 is CRITICAL: Puppet has not run in the last 10 hours [08:53:32] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [08:53:34] apergos why we even use icinga? [08:53:45] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [08:53:45] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [08:53:45] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours [08:53:45] PROBLEM - Puppet freshness on cp1005 is CRITICAL: Puppet has not run in the last 10 hours [08:53:45] PROBLEM - Puppet freshness on mw1028 is CRITICAL: Puppet has not run in the last 10 hours [08:53:46] PROBLEM - Puppet freshness on mw1006 is CRITICAL: Puppet has not run in the last 10 hours [08:53:46] PROBLEM - Puppet freshness on lvs6 is CRITICAL: Puppet has not run in the last 10 hours [08:53:47] PROBLEM - Puppet freshness on mw1052 is CRITICAL: Puppet has not run in the last 10 hours [08:53:47] PROBLEM - Puppet freshness on snapshot1004 is CRITICAL: Puppet has not run in the last 10 hours [08:53:54] it's a nagios fork with a much better front end and some new features [08:53:59] aha [08:54:09] there was new version of nagios released recently :P [08:54:48] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 186 seconds [08:54:48] PROBLEM - Puppet freshness on db1036 is CRITICAL: Puppet has not run in the last 10 hours [08:54:48] PROBLEM - Puppet freshness on cp1035 is CRITICAL: Puppet has not run in the last 10 hours [08:54:49] PROBLEM - Puppet freshness on cp1010 is CRITICAL: Puppet has not run in the last 10 hours [08:54:49] PROBLEM - Puppet freshness on cp1029 is CRITICAL: Puppet has not run in the last 10 hours [08:54:49] PROBLEM - Puppet freshness on ms-be1012 is CRITICAL: Puppet has not run in the last 10 hours [08:54:49] PROBLEM - Puppet freshness on mw1059 is CRITICAL: Puppet has not run in the last 10 hours [08:54:50] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [08:54:50] PROBLEM - Puppet freshness on mw1008 is CRITICAL: Puppet has not run in the last 10 hours [08:54:51] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: Puppet has not run in the last 10 hours [08:54:51] PROBLEM - Puppet freshness on mw1193 is CRITICAL: Puppet has not run in the last 10 hours [08:55:12] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 210 seconds [08:55:34] Reedy, I'm looking at the gerrit change, give me a min [08:55:51] PROBLEM - Puppet freshness on lvs1 is CRITICAL: Puppet has not run in the last 10 
hours [08:55:51] PROBLEM - Puppet freshness on es1010 is CRITICAL: Puppet has not run in the last 10 hours [08:55:51] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [08:55:51] PROBLEM - Puppet freshness on mw1164 is CRITICAL: Puppet has not run in the last 10 hours [08:55:51] PROBLEM - Puppet freshness on mw1127 is CRITICAL: Puppet has not run in the last 10 hours [08:55:52] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [08:55:52] PROBLEM - Puppet freshness on mw1198 is CRITICAL: Puppet has not run in the last 10 hours [08:55:54] heh [08:55:57] Expire time: 2014-02-27 07:58:14 [08:56:20] ugh [08:56:45] PROBLEM - Puppet freshness on mw1096 is CRITICAL: Puppet has not run in the last 10 hours [08:56:46] PROBLEM - Puppet freshness on tmh1001 is CRITICAL: Puppet has not run in the last 10 hours [08:56:46] PROBLEM - Puppet freshness on mw1167 is CRITICAL: Puppet has not run in the last 10 hours [08:56:46] PROBLEM - Puppet freshness on mw1119 is CRITICAL: Puppet has not run in the last 10 hours [08:57:48] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours [08:57:48] PROBLEM - Puppet freshness on calcium is CRITICAL: Puppet has not run in the last 10 hours [08:57:48] PROBLEM - Puppet freshness on cp1004 is CRITICAL: Puppet has not run in the last 10 hours [08:57:48] PROBLEM - Puppet freshness on cp1013 is CRITICAL: Puppet has not run in the last 10 hours [08:57:48] PROBLEM - Puppet freshness on cp1041 is CRITICAL: Puppet has not run in the last 10 hours [08:57:49] PROBLEM - Puppet freshness on cp1043 is CRITICAL: Puppet has not run in the last 10 hours [08:57:49] PROBLEM - Puppet freshness on mw1196 is CRITICAL: Puppet has not run in the last 10 hours [08:57:50] PROBLEM - Puppet freshness on mw1016 is CRITICAL: Puppet has not run in the last 10 hours [08:57:50] PROBLEM - Puppet freshness on mw1141 is CRITICAL: Puppet has not run in the last 10 hours [08:57:51] PROBLEM - Puppet freshness on search1008 is CRITICAL: Puppet has not run in the last 10 hours [08:57:51] PROBLEM - Puppet freshness on db1009 is CRITICAL: Puppet has not run in the last 10 hours [08:58:51] PROBLEM - Puppet freshness on es1001 is CRITICAL: Puppet has not run in the last 10 hours [08:58:51] PROBLEM - Puppet freshness on mw1067 is CRITICAL: Puppet has not run in the last 10 hours [08:58:51] PROBLEM - Puppet freshness on mw1021 is CRITICAL: Puppet has not run in the last 10 hours [08:58:51] PROBLEM - Puppet freshness on mw1100 is CRITICAL: Puppet has not run in the last 10 hours [08:58:51] PROBLEM - Puppet freshness on mw1084 is CRITICAL: Puppet has not run in the last 10 hours [08:58:52] PROBLEM - Puppet freshness on mw1112 is CRITICAL: Puppet has not run in the last 10 hours [08:58:52] PROBLEM - Puppet freshness on mc1008 is CRITICAL: Puppet has not run in the last 10 hours [08:58:53] PROBLEM - Puppet freshness on mw1103 is CRITICAL: Puppet has not run in the last 10 hours [08:58:53] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [08:59:32] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [08:59:45] PROBLEM - Puppet freshness on cp1019 is CRITICAL: Puppet has not run in the last 10 hours [08:59:45] PROBLEM - Puppet freshness on cp1042 is CRITICAL: Puppet has not run in the last 10 hours [08:59:46] PROBLEM - Puppet freshness on mw1158 is CRITICAL: Puppet has not run in the last 10 hours [08:59:46] PROBLEM - 
Puppet freshness on mw1159 is CRITICAL: Puppet has not run in the last 10 hours [08:59:46] PROBLEM - Puppet freshness on db1011 is CRITICAL: Puppet has not run in the last 10 hours [08:59:46] PROBLEM - Puppet freshness on snapshot1002 is CRITICAL: Puppet has not run in the last 10 hours [09:00:09] so here's how 'stuff created before X' works: we don't actually know when they went into the db. we do know or think we know what the expiry limit is (say it's a month). if we want stuff that was created 6 weeks ago to be tossed, we would look for things with an expiry time of 2 weeks ago or longer (6 weeks minus the expiry limit)... that's what that line did [09:00:12] it's not what it does now [09:00:24] Reedy: [09:00:35] (quick before that scrolls off the screen :-P) [09:00:48] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [09:00:48] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [09:00:48] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [09:00:48] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [09:00:48] PROBLEM - Puppet freshness on gallium is CRITICAL: Puppet has not run in the last 10 hours [09:00:49] PROBLEM - Puppet freshness on mw1202 is CRITICAL: Puppet has not run in the last 10 hours [09:00:49] PROBLEM - Puppet freshness on mw1168 is CRITICAL: Puppet has not run in the last 10 hours [09:00:50] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours [09:00:50] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours [09:00:51] PROBLEM - Puppet freshness on mw1066 is CRITICAL: Puppet has not run in the last 10 hours [09:01:12] Ahh [09:01:16] I think ;) [09:01:28] well double check me but reading the option that's what it seems like [09:01:32] (for 'age') [09:01:51] PROBLEM - Puppet freshness on db1024 is CRITICAL: Puppet has not run in the last 10 hours [09:01:51] PROBLEM - Puppet freshness on labsdb1003 is CRITICAL: Puppet has not run in the last 10 hours [09:01:51] PROBLEM - Puppet freshness on mc1002 is CRITICAL: Puppet has not run in the last 10 hours [09:01:51] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Puppet has not run in the last 10 hours [09:01:51] PROBLEM - Puppet freshness on mw1093 is CRITICAL: Puppet has not run in the last 10 hours [09:01:52] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [09:01:52] PROBLEM - Puppet freshness on mw1134 is CRITICAL: Puppet has not run in the last 10 hours [09:01:53] PROBLEM - Puppet freshness on mw1187 is CRITICAL: Puppet has not run in the last 10 hours [09:02:44] what happens when you want to delete stuff created 6 weeks ago and the parser expiry limit is a year?
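Putting numbers on that explanation, as a sketch only, using the one-month and one-year windows named in the discussion rather than the live $wgParserCacheExpireTime:

    # goal: drop entries written more than 6 weeks ago, using only their exptime stamps
    now=$(date -u +%s)
    week=$((7 * 24 * 3600))
    # one-month window: something written 6 weeks ago carries an exptime of about 2 weeks ago,
    # so the cutoff is roughly now - (6 weeks - 4 weeks)
    date -u -d "@$((now - 6*week + 4*week))"     # about two weeks in the past
    # one-year window (the question above): the same formula lands far in the future
    date -u -d "@$((now - 6*week + 52*week))"    # about 46 weeks from now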
[09:02:45] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [09:02:45] PROBLEM - Puppet freshness on db1019 is CRITICAL: Puppet has not run in the last 10 hours [09:02:45] PROBLEM - Puppet freshness on mw1054 is CRITICAL: Puppet has not run in the last 10 hours [09:02:45] PROBLEM - Puppet freshness on mw1121 is CRITICAL: Puppet has not run in the last 10 hours [09:02:46] PROBLEM - Puppet freshness on mw1171 is CRITICAL: Puppet has not run in the last 10 hours [09:02:46] PROBLEM - Puppet freshness on potassium is CRITICAL: Puppet has not run in the last 10 hours [09:03:48] PROBLEM - Puppet freshness on barium is CRITICAL: Puppet has not run in the last 10 hours [09:03:49] PROBLEM - Puppet freshness on db1046 is CRITICAL: Puppet has not run in the last 10 hours [09:03:49] PROBLEM - Puppet freshness on mw1009 is CRITICAL: Puppet has not run in the last 10 hours [09:03:49] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: Puppet has not run in the last 10 hours [09:03:49] PROBLEM - Puppet freshness on mw1108 is CRITICAL: Puppet has not run in the last 10 hours [09:03:49] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Puppet has not run in the last 10 hours [09:03:49] PROBLEM - Puppet freshness on db1026 is CRITICAL: Puppet has not run in the last 10 hours [09:03:50] PROBLEM - Puppet freshness on mw1179 is CRITICAL: Puppet has not run in the last 10 hours [09:03:50] PROBLEM - Puppet freshness on mw1114 is CRITICAL: Puppet has not run in the last 10 hours [09:03:51] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours [09:03:51] PROBLEM - Puppet freshness on solr1001 is CRITICAL: Puppet has not run in the last 10 hours [09:04:02] maybe we can move this conversation, it's getting ridiculous [09:04:11] wikimedia-tech (still open, just less spam) [09:04:12] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds [09:04:51] PROBLEM - Puppet freshness on mw1025 is CRITICAL: Puppet has not run in the last 10 hours [09:04:51] PROBLEM - Puppet freshness on mw1071 is CRITICAL: Puppet has not run in the last 10 hours [09:04:51] PROBLEM - Puppet freshness on mw1072 is CRITICAL: Puppet has not run in the last 10 hours [09:04:51] PROBLEM - Puppet freshness on mw1049 is CRITICAL: Puppet has not run in the last 10 hours [09:04:51] PROBLEM - Puppet freshness on search1018 is CRITICAL: Puppet has not run in the last 10 hours [09:04:52] PROBLEM - Puppet freshness on mw1135 is CRITICAL: Puppet has not run in the last 10 hours [09:05:36] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds [09:05:45] PROBLEM - Puppet freshness on mw1047 is CRITICAL: Puppet has not run in the last 10 hours [09:05:46] PROBLEM - Puppet freshness on mw1048 is CRITICAL: Puppet has not run in the last 10 hours [09:05:46] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours [09:05:46] PROBLEM - Puppet freshness on mw1062 is CRITICAL: Puppet has not run in the last 10 hours [09:05:46] PROBLEM - Puppet freshness on pc1001 is CRITICAL: Puppet has not run in the last 10 hours [09:05:46] PROBLEM - Puppet freshness on mw1098 is CRITICAL: Puppet has not run in the last 10 hours [09:05:46] PROBLEM - Puppet freshness on search1014 is CRITICAL: Puppet has not run in the last 10 hours [09:06:48] PROBLEM - Puppet freshness on cp1016 is CRITICAL: Puppet has not run in the last 10 hours [09:06:48] PROBLEM - Puppet freshness on db1007 is CRITICAL: Puppet has not run in the last 10
hours [09:06:48] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [09:06:48] PROBLEM - Puppet freshness on cp1024 is CRITICAL: Puppet has not run in the last 10 hours [09:06:48] PROBLEM - Puppet freshness on mw1046 is CRITICAL: Puppet has not run in the last 10 hours [09:06:49] PROBLEM - Puppet freshness on mc1011 is CRITICAL: Puppet has not run in the last 10 hours [09:06:49] PROBLEM - Puppet freshness on mw1013 is CRITICAL: Puppet has not run in the last 10 hours [09:06:50] PROBLEM - Puppet freshness on mw1116 is CRITICAL: Puppet has not run in the last 10 hours [09:06:50] PROBLEM - Puppet freshness on mw1186 is CRITICAL: Puppet has not run in the last 10 hours [09:07:51] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [09:07:51] PROBLEM - Puppet freshness on cerium is CRITICAL: Puppet has not run in the last 10 hours [09:07:51] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [09:07:51] PROBLEM - Puppet freshness on mc1012 is CRITICAL: Puppet has not run in the last 10 hours [09:07:51] PROBLEM - Puppet freshness on mw1032 is CRITICAL: Puppet has not run in the last 10 hours [09:07:52] PROBLEM - Puppet freshness on mw1151 is CRITICAL: Puppet has not run in the last 10 hours [09:07:52] PROBLEM - Puppet freshness on es1005 is CRITICAL: Puppet has not run in the last 10 hours [09:08:45] PROBLEM - Puppet freshness on cp1017 is CRITICAL: Puppet has not run in the last 10 hours [09:08:45] PROBLEM - Puppet freshness on ms-fe1002 is CRITICAL: Puppet has not run in the last 10 hours [09:08:45] PROBLEM - Puppet freshness on mw1077 is CRITICAL: Puppet has not run in the last 10 hours [09:08:45] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [09:08:45] PROBLEM - Puppet freshness on nickel is CRITICAL: Puppet has not run in the last 10 hours [09:08:46] PROBLEM - Puppet freshness on mw1172 is CRITICAL: Puppet has not run in the last 10 hours [09:09:48] PROBLEM - Puppet freshness on cp1012 is CRITICAL: Puppet has not run in the last 10 hours [09:09:48] PROBLEM - Puppet freshness on db1050 is CRITICAL: Puppet has not run in the last 10 hours [09:09:48] PROBLEM - Puppet freshness on db1021 is CRITICAL: Puppet has not run in the last 10 hours [09:09:48] PROBLEM - Puppet freshness on mc1010 is CRITICAL: Puppet has not run in the last 10 hours [09:09:48] PROBLEM - Puppet freshness on mw1007 is CRITICAL: Puppet has not run in the last 10 hours [09:09:49] PROBLEM - Puppet freshness on mw1011 is CRITICAL: Puppet has not run in the last 10 hours [09:09:49] PROBLEM - Puppet freshness on mw1130 is CRITICAL: Puppet has not run in the last 10 hours [09:09:50] PROBLEM - Puppet freshness on mw1154 is CRITICAL: Puppet has not run in the last 10 hours [09:09:50] PROBLEM - Puppet freshness on mw1194 is CRITICAL: Puppet has not run in the last 10 hours [09:09:51] PROBLEM - Puppet freshness on mw1162 is CRITICAL: Puppet has not run in the last 10 hours [09:09:51] PROBLEM - Puppet freshness on xenon is CRITICAL: Puppet has not run in the last 10 hours [09:09:52] PROBLEM - Puppet freshness on mw1207 is CRITICAL: Puppet has not run in the last 10 hours [09:09:52] PROBLEM - Puppet freshness on mw1181 is CRITICAL: Puppet has not run in the last 10 hours [09:10:51] PROBLEM - Puppet freshness on cp1031 is CRITICAL: Puppet has not run in the last 10 hours [09:10:51] PROBLEM - Puppet freshness on db1034 is CRITICAL: Puppet has not run in 
the last 10 hours [09:10:51] PROBLEM - Puppet freshness on mw1045 is CRITICAL: Puppet has not run in the last 10 hours [09:10:51] PROBLEM - Puppet freshness on mw1083 is CRITICAL: Puppet has not run in the last 10 hours [09:10:51] PROBLEM - Puppet freshness on mw1061 is CRITICAL: Puppet has not run in the last 10 hours [09:10:52] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Puppet has not run in the last 10 hours [09:10:52] PROBLEM - Puppet freshness on mw1169 is CRITICAL: Puppet has not run in the last 10 hours [09:10:53] PROBLEM - Puppet freshness on search1006 is CRITICAL: Puppet has not run in the last 10 hours [09:10:53] PROBLEM - Puppet freshness on mw1139 is CRITICAL: Puppet has not run in the last 10 hours [09:10:54] PROBLEM - Puppet freshness on mw1137 is CRITICAL: Puppet has not run in the last 10 hours [09:11:45] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [09:11:45] PROBLEM - Puppet freshness on cp1027 is CRITICAL: Puppet has not run in the last 10 hours [09:11:45] PROBLEM - Puppet freshness on db1031 is CRITICAL: Puppet has not run in the last 10 hours [09:11:45] PROBLEM - Puppet freshness on mw1188 is CRITICAL: Puppet has not run in the last 10 hours [09:11:45] PROBLEM - Puppet freshness on pc1002 is CRITICAL: Puppet has not run in the last 10 hours [09:11:46] PROBLEM - Puppet freshness on mw1173 is CRITICAL: Puppet has not run in the last 10 hours [09:11:46] PROBLEM - Puppet freshness on search1005 is CRITICAL: Puppet has not run in the last 10 hours [09:12:48] PROBLEM - Puppet freshness on db1010 is CRITICAL: Puppet has not run in the last 10 hours [09:12:48] PROBLEM - Puppet freshness on mw1155 is CRITICAL: Puppet has not run in the last 10 hours [09:12:48] PROBLEM - Puppet freshness on db1005 is CRITICAL: Puppet has not run in the last 10 hours [09:13:51] PROBLEM - Puppet freshness on db1041 is CRITICAL: Puppet has not run in the last 10 hours [09:13:52] PROBLEM - Puppet freshness on mw1057 is CRITICAL: Puppet has not run in the last 10 hours [09:13:52] PROBLEM - Puppet freshness on mw1073 is CRITICAL: Puppet has not run in the last 10 hours [09:13:52] PROBLEM - Puppet freshness on mc1013 is CRITICAL: Puppet has not run in the last 10 hours [09:13:52] PROBLEM - Puppet freshness on mw1029 is CRITICAL: Puppet has not run in the last 10 hours [09:13:52] PROBLEM - Puppet freshness on mw1014 is CRITICAL: Puppet has not run in the last 10 hours [09:13:52] PROBLEM - Puppet freshness on mw1026 is CRITICAL: Puppet has not run in the last 10 hours [09:14:45] PROBLEM - Puppet freshness on cp1025 is CRITICAL: Puppet has not run in the last 10 hours [09:14:46] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours [09:14:46] PROBLEM - Puppet freshness on db1012 is CRITICAL: Puppet has not run in the last 10 hours [09:14:46] PROBLEM - Puppet freshness on db1028 is CRITICAL: Puppet has not run in the last 10 hours [09:14:46] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: Puppet has not run in the last 10 hours [09:14:46] PROBLEM - Puppet freshness on mw1055 is CRITICAL: Puppet has not run in the last 10 hours [09:14:46] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [09:14:47] PROBLEM - Puppet freshness on mw1192 is CRITICAL: Puppet has not run in the last 10 hours [09:15:48] PROBLEM - Puppet freshness on cp1009 is CRITICAL: Puppet has not run in the last 10 hours [09:15:48] PROBLEM - Puppet freshness on db1016 is CRITICAL: Puppet has not run in 
the last 10 hours [09:15:48] PROBLEM - Puppet freshness on db1023 is CRITICAL: Puppet has not run in the last 10 hours [09:15:48] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [09:15:48] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [09:15:49] PROBLEM - Puppet freshness on mw1099 is CRITICAL: Puppet has not run in the last 10 hours [09:15:49] PROBLEM - Puppet freshness on mw1185 is CRITICAL: Puppet has not run in the last 10 hours [09:15:50] PROBLEM - Puppet freshness on search1017 is CRITICAL: Puppet has not run in the last 10 hours [09:16:42] PROBLEM - Puppet freshness on mw1106 is CRITICAL: Puppet has not run in the last 10 hours [09:16:42] PROBLEM - Puppet freshness on mw1012 is CRITICAL: Puppet has not run in the last 10 hours [09:16:42] PROBLEM - Puppet freshness on mc1006 is CRITICAL: Puppet has not run in the last 10 hours [09:16:43] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours [09:17:45] PROBLEM - Puppet freshness on mw1147 is CRITICAL: Puppet has not run in the last 10 hours [09:17:45] PROBLEM - Puppet freshness on mc1004 is CRITICAL: Puppet has not run in the last 10 hours [09:17:45] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [09:18:48] PROBLEM - Puppet freshness on cp1030 is CRITICAL: Puppet has not run in the last 10 hours [09:18:48] PROBLEM - Puppet freshness on mc1014 is CRITICAL: Puppet has not run in the last 10 hours [09:18:48] PROBLEM - Puppet freshness on ms-be1004 is CRITICAL: Puppet has not run in the last 10 hours [09:18:48] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [09:18:49] PROBLEM - Puppet freshness on mw1070 is CRITICAL: Puppet has not run in the last 10 hours [09:18:49] PROBLEM - Puppet freshness on mw1189 is CRITICAL: Puppet has not run in the last 10 hours [09:18:49] PROBLEM - Puppet freshness on mw1060 is CRITICAL: Puppet has not run in the last 10 hours [09:18:50] PROBLEM - Puppet freshness on niobium is CRITICAL: Puppet has not run in the last 10 hours [09:19:51] PROBLEM - Puppet freshness on cp1026 is CRITICAL: Puppet has not run in the last 10 hours [09:19:51] PROBLEM - Puppet freshness on db1006 is CRITICAL: Puppet has not run in the last 10 hours [09:19:51] PROBLEM - Puppet freshness on mw1126 is CRITICAL: Puppet has not run in the last 10 hours [09:19:52] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [09:21:08] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [09:21:48] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [09:40:58] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [09:43:39] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer: [09:44:38] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [10:10:20] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [10:10:51] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours [10:11:10] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [10:11:30] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours [10:34:15] New review: Lydia Pintscher; "Hmm this seems to indeed be the case. 
Can someone else confirm? Did someone fix this?" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/49069 [11:24:31] Change abandoned: ArielGlenn; "not needed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50894 [11:27:34] New patchset: ArielGlenn; "update ms-be12 entries in dhcp, site, clean up other ms-be entries" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51128 [11:32:37] RECOVERY - NTP on snapshot1002 is OK: NTP OK: Offset -0.01466202736 secs [11:33:30] RECOVERY - NTP on snapshot1002 is OK: NTP OK: Offset -0.01512026787 secs [11:41:17] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [11:42:07] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [11:52:08] New patchset: Jérémie Roquet; "Explicitely redefine thumbnail sizes for frwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51152 [12:10:13] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [12:11:13] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [12:21:02] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [12:21:52] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [12:22:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:23:00] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [12:23:36] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [12:24:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.343 second response time [12:28:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.033 seconds [12:33:42] PROBLEM - MySQL Recent Restart on db1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:33:48] PROBLEM - MySQL Recent Restart on db1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:34:42] RECOVERY - MySQL Recent Restart on db1011 is OK: OK 346 seconds since restart [12:34:42] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [12:35:27] RECOVERY - MySQL Recent Restart on db1011 is OK: OK 389 seconds since restart [12:35:27] PROBLEM - MySQL Replication Heartbeat on db1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:37:06] RECOVERY - MySQL Replication Heartbeat on db1011 is OK: OK replication delay 0 seconds [12:45:02] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho [12:45:22] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [12:45:52] RECOVERY - MySQL disk space on neon is OK: DISK OK [12:46:33] RECOVERY - MySQL disk space on neon is OK: DISK OK [12:47:12] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [12:50:02] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [12:52:24] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [12:57:18] New review: Lydia Pintscher; "Ok I take that back. 
There's still one problem: When using :d in the wikitext" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/49069 [12:58:18] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [12:59:04] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.001 second response time on port 8123 [13:14:47] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51128 [13:15:30] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 21 seconds [13:15:32] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 24 seconds [13:16:33] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 14 seconds [13:16:52] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [13:42:30] RECOVERY - Host ms-be12 is UP: PING OK - Packet loss = 0%, RTA = 1.01 ms [13:46:24] PROBLEM - swift-account-server on ms-be12 is CRITICAL: Connection refused by host [13:46:43] RECOVERY - Host ms-be12 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms [13:46:51] PROBLEM - swift-account-auditor on ms-be12 is CRITICAL: Connection refused by host [13:46:51] PROBLEM - SSH on ms-be12 is CRITICAL: Connection refused [13:46:51] PROBLEM - swift-container-updater on ms-be12 is CRITICAL: Connection refused by host [13:47:00] PROBLEM - swift-account-reaper on ms-be12 is CRITICAL: Connection refused by host [13:47:01] PROBLEM - swift-container-replicator on ms-be12 is CRITICAL: Connection refused by host [13:47:09] PROBLEM - swift-account-replicator on ms-be12 is CRITICAL: Connection refused by host [13:47:27] PROBLEM - swift-object-updater on ms-be12 is CRITICAL: Connection refused by host [13:47:27] PROBLEM - swift-object-server on ms-be12 is CRITICAL: Connection refused by host [13:47:27] PROBLEM - swift-container-auditor on ms-be12 is CRITICAL: Connection refused by host [13:47:27] PROBLEM - swift-container-server on ms-be12 is CRITICAL: Connection refused by host [13:47:27] PROBLEM - swift-object-replicator on ms-be12 is CRITICAL: Connection refused by host [13:47:36] PROBLEM - swift-object-auditor on ms-be12 is CRITICAL: Connection refused by host [13:49:03] PROBLEM - swift-object-replicator on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:04] PROBLEM - swift-account-server on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:13] PROBLEM - swift-container-replicator on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:15] PROBLEM - swift-container-server on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:23] PROBLEM - swift-container-auditor on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:23] PROBLEM - SSH on ms-be12 is CRITICAL: Connection timed out [13:49:23] PROBLEM - swift-account-replicator on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:33] PROBLEM - swift-container-updater on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:34] PROBLEM - swift-object-server on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:34] PROBLEM - swift-account-reaper on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:43] PROBLEM - swift-account-auditor on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:43] PROBLEM - swift-object-updater on ms-be12 is CRITICAL: Timeout while attempting connection [13:49:44] PROBLEM - swift-object-auditor on ms-be12 is CRITICAL: Timeout while attempting connection [13:50:27] PROBLEM - Host ms-be12 is 
DOWN: PING CRITICAL - Packet loss = 100% [13:56:18] RECOVERY - Host ms-be12 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [13:56:22] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 194 seconds [13:56:31] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 196 seconds [13:56:45] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 199 seconds [13:57:03] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 205 seconds [13:58:31] PROBLEM - Host ms-be12 is DOWN: CRITICAL - Plugin timed out after 15 seconds [13:59:21] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 194 seconds [13:59:32] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 197 seconds [14:00:21] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 215 seconds [14:00:39] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 224 seconds [14:04:15] PROBLEM - Host ms-be12 is DOWN: PING CRITICAL - Packet loss = 100% [14:04:51] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , ptwiktionary (185816), Total (193988) [14:06:48] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , ptwiktionary (188018), Total (196835) [14:15:21] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [14:15:31] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [14:16:42] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [14:16:43] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [14:18:00] Hey guys, any idea whether there would be any wider issues associated with the problems with nagios? [14:18:19] would there be any or none at all? [14:19:21] I think there /shouldn't/ be. [14:19:41] If there are, it's a bug in deployment rather than a change in substance. [14:23:04] hmmm kk, im just trying to see if payments related errors that we are seeing from donors can be easily explained by anything on our end [14:23:29] wild stab in the dark [14:24:03] very wild but was the only thing that I could see that status seemed to be showing as being bad [14:25:58] Yeah, I very much doubt that. The problems should only be cosmetic, and anything genuinely broken would be bits of monitoring. [14:31:25] Hey guys, could someone +2 https://gerrit.wikimedia.org/r/#/c/50913/ and https://gerrit.wikimedia.org/r/#/c/51051/ ? They're blockers for me atm. [14:32:48] Ryan_Lane ^^ [14:33:00] Coren are you remote worker [14:50:37] petan: Aye. [14:51:07] hm, people in this channel are mostly afk, so if you need someone to do something, ping them :P if you can't poke them in person [14:52:08] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [14:52:28] petan: Thing is, at this time, I don't really yet know who to ping for what. :-) [14:53:15] is this change related to labs or production? [14:53:28] petan: Labs, but the puppet config is production-wide. [14:53:38] petan: So... "yes"? :-) [14:53:43] ok, then Ryan_Lane andrewbogott_afk or mutante are who you want :P [14:54:03] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [14:54:28] or paravoid ofc [14:54:51] Noted. Thanks. 
:-) [14:55:22] there is this thing @notify too [14:55:25] @notify Ryan_Lane [14:55:25] This user is now online in #wikimedia-dev so I will let you know when they show some activity (talk etc) [14:55:36] Ooo. Nice. [14:56:15] @notify Ryan_Lane [14:56:15] This user is now online in #wikimedia-dev so I will let you know when they show some activity (talk etc) [14:56:25] Eeexcellent. [15:03:27] it works in private messages too so that you don't need to spam channels :D [15:03:48] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [15:05:39] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [15:06:29] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [15:21:09] PROBLEM - SSH on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:21:39] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:22:24] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:22:59] RECOVERY - SSH on palladium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:23:29] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 635 bytes in 0.001 second response time [15:23:45] petan: Well yeah! First show me how to do it wrong /then/ berate me for doing it that way. :-P [15:24:03] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK HTTP/1.1 200 OK - 635 bytes in 0.053 seconds [15:24:14] no :D I just was telling you that in future you can use pm [15:24:17] 's [15:24:18] as well [15:24:33] it's hard to show example in pm anyway :P [15:36:29] RECOVERY - SSH on ms-be12 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:36:39] RECOVERY - Host ms-be12 is UP: PING OK - Packet loss = 0%, RTA = 26.92 ms [15:36:39] RECOVERY - SSH on ms-be12 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:36:48] RECOVERY - Host ms-be12 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [15:38:49] PROBLEM - Varnish HTTP bits on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:41:00] PROBLEM - Varnish HTTP bits on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:41:22] PROBLEM - SSH on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:42:39] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , Total (13041) [15:47:19] PROBLEM - SSH on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:48:33] Coren, when will they give you +2? [15:49:42] MaxSem: I dunno. They're all inordinately busy this week, maybe it's even on the TODO list. But that's kinda besides the point; one shouldn't +2 one's own changes anyways. Kinda defeats the purpose. :-) [15:50:12] Coren, most puppet changes are self-merged [15:50:42] PROBLEM - NTP on ms-be12 is CRITICAL: NTP CRITICAL: No response from NTP server [15:50:48] MaxSem: Ah, I wasn't aware of that -- though it makes sense in that limited scenario. [15:51:10] MaxSem: Which just means I need to bug someone to give /me/ +2 instead of just okaying my changes. :-) [15:51:55] are they planning to bring you to SF for initial brainwa training?
[15:56:12] RECOVERY - SSH on niobium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:56:42] RECOVERY - Varnish HTTP bits on niobium is OK: HTTP OK: HTTP/1.1 200 OK - 633 bytes in 4.338 second response time [15:57:03] RECOVERY - Varnish HTTP bits on niobium is OK: HTTP OK HTTP/1.1 200 OK - 633 bytes in 0.053 seconds [15:57:20] MaxSem: I have no idea. That hasn't been discussed, and I'm going to wager that they'll expect me to be self-reliant enough to not have to shuffle me around for this given that there is already one planned trip and another plausible one on the way. I can see a good argument for not spending more on travel than they do in remuneration. :-) [15:57:57] RECOVERY - SSH on niobium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:58:06] Especially since I think that contractors are also brought in for the yearly all-staff? [15:58:14] (Which would make a /third/ trip) [15:58:24] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [16:03:30] PROBLEM - NTP on ms-be12 is CRITICAL: NTP CRITICAL: No response from NTP server [16:05:45] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [16:10:12] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [16:28:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:29:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.883 second response time [16:33:39] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , huwiktionary (11251), Total (18456) [16:41:18] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (36879), Total (47030) [16:47:41] +2, +2, my kindom* for a +2 (* kingdom may or may not exist) [16:49:24] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 194 seconds [16:49:24] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 192 seconds [16:49:24] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 195 seconds [16:50:00] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 209 seconds [16:55:31] New patchset: Alex Monk; "(bug 44587) Fix trwiki FlaggedRevs autopromotion config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51172 [17:00:30] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [17:00:30] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [17:02:45] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [17:02:45] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [17:07:15] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [17:07:29] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [17:07:30] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [17:07:51] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [17:11:17] hashar: morning, zuul-server is running twice. let's check together later [17:11:41] i think we just want to adjust monitoring like with jenkins [17:12:23] mutante: morning :-) [17:12:54] maybe it is forking too [17:13:12] MaxSem: working on fenari? would you mind a reboot now? 
[17:13:13] or the check detect itself [17:13:23] mutante, go ahead [17:13:33] hashar: yea, we probably want to use the regex option [17:13:40] MaxSem: kthx [17:13:57] !log rebooting fenari for kernel upgrade [17:14:00] Logged the message, Master [17:14:06] i don't see other users logged in [17:15:16] or maybe that the check_proc which is matched :-] [17:15:17] New patchset: Hashar; "WIP monitoring lucene search boxes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51174 [17:16:39] PROBLEM - Host fenari is DOWN: PING CRITICAL - Packet loss = 100% [17:19:19] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100% [17:19:48] meh, had to powercycle it [17:19:48] coming back now [17:20:09] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.61 ms [17:20:54] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [17:21:19] fscking [17:21:39] PROBLEM - ircecho_service_running on fenari is CRITICAL: Connection refused by host [17:21:48] PROBLEM - HTTP on fenari is CRITICAL: Connection refused [17:21:49] RECOVERY - Host fenari is UP: PING OK - Packet loss = 0%, RTA = 26.76 ms [17:22:19] PROBLEM - Apache HTTP on mw31 is CRITICAL: Connection refused [17:22:51] PROBLEM - SSH on fenari is CRITICAL: Connection refused [17:23:36] RECOVERY - HTTP on fenari is OK: HTTP OK HTTP/1.1 200 OK - 4915 bytes in 0.019 seconds [17:24:39] RECOVERY - SSH on fenari is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [17:25:15] RECOVERY - ircecho_service_running on fenari is OK: PROCS OK: 1 process with args ircecho [17:25:15] PROBLEM - Apache HTTP on mw31 is CRITICAL: Connection refused [17:28:07] New patchset: Ram; "Fix for bug 45266. Needs parallel changes to OAI." [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/51077 [17:28:33] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [17:32:46] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho [17:33:12] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho [17:33:16] RECOVERY - MySQL disk space on neon is OK: DISK OK [17:33:30] RECOVERY - MySQL disk space on neon is OK: DISK OK [17:43:50] New review: Hashar; "Indeed :(" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51152 [17:45:15] Reedy: https://gerrit.wikimedia.org/r/#/c/51152/ :( [17:45:26] Reedy: the $wgThumbLimits have different ordering now :( [17:45:39] Ohh [17:45:43] fail [17:45:46] since we refer to the array index instead of the thumb size, that changed the thumb size for lot of wikis :-] [17:45:49] lot of side effects [17:45:58] in the first place, using the index is probably dumb [17:45:59] :-] [17:46:03] (not your fault) [17:46:41] that probably had some effects on our imagescalers [17:46:48] and swift :-] [17:46:55] and maybe the image caches [17:48:03] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , dewiki (35871), Total (43786) [17:49:42] New review: Hashar; "Caused by https://gerrit.wikimedia.org/r/#/c/47303/ for Bug 27839" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51152 [17:50:35] New review: Hashar; "That patch had the side effect of shifting the elements in $wgThumbLimits. Since they are referred t..." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47303 [17:50:43] New patchset: Jgreen; "fixed ssh key type for user sahar" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51181 [17:50:48] Reedy: I think we should revert the patch [17:50:55] sure [17:50:56] Reedy: that would cause lot of new thumbnails to be generated [17:51:03] seems sensible [17:51:04] in turns filling the image caches / swift etc [17:51:19] we had that discussion a few weeks ago when the hewiki requested to get larger thumbnails [17:51:28] ops list has some thread about it [17:51:33] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51181 [17:53:51] Reedy: writing the change to revert it [17:55:03] New patchset: Hashar; "make frwiki thumbs the default again" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51182 [17:55:17] New patchset: Matmarex; "(bug 42413) Set $wgCategoryCollation to 'uca-pl' for the Polish Wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51183 [17:55:42] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , dewiki (16594), Total (23674) [17:55:47] Change abandoned: Hashar; "We are just going to revert the thumbnail sizes since that has huge side effect on our cache infrast..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51152 [17:56:00] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.098 second response time [17:56:03] New review: Hashar; "This is being reverted with https://gerrit.wikimedia.org/r/51182" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47303 [17:56:16] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.113 second response time [17:57:55] !log reedy synchronized php-1.21wmf10/extensions/EducationProgram/ [17:57:57] Logged the message, Master [17:58:05] Reedy: and here is the revert https://gerrit.wikimedia.org/r/#/c/51182/ :-D [17:59:20] New review: Matmarex; "After this is merged, maintenance/updateCollation.php has to be ran." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51183 [18:02:04] sbernardin: ms-be12 question...you used the intel300 ssd's and swapped the controller to h710...correct? 
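The $wgThumbLimits exchange above (17:45–17:56) turns on MediaWiki referring to thumbnail sizes by array index rather than by pixel value. A minimal sketch of that failure mode, using made-up sizes and a made-up default index rather than the real wmf-config values:

```php
<?php
// Sketch only: the sizes and the default index are illustrative, not the
// production config. The point: 'thumbsize' defaults/preferences store an
// INDEX into $wgThumbLimits, so inserting or reordering entries silently
// changes the size that index resolves to on every affected wiki.

$wgThumbLimits = array( 120, 150, 180, 200, 250, 300 );
$wgDefaultUserOptions = array( 'thumbsize' => 4 );   // intended to mean 250px

echo $wgThumbLimits[ $wgDefaultUserOptions['thumbsize'] ] . "\n"; // 250

// A change inserts one extra size near the front of the list (the kind of
// shift hashar describes above for gerrit change 47303)...
array_splice( $wgThumbLimits, 1, 0, array( 140 ) );

// ...and the same stored index now resolves to a different size, forcing the
// image scalers and caches to regenerate thumbnails at the new size.
echo $wgThumbLimits[ $wgDefaultUserOptions['thumbsize'] ] . "\n"; // 200
```

Referring to sizes by value, or simply reverting the reordering as done in change 51182, avoids that cache churn.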
[18:02:26] PROBLEM - Host ms-be12 is DOWN: PING CRITICAL - Packet loss = 100% [18:02:36] PROBLEM - Host ms-be12 is DOWN: PING CRITICAL - Packet loss = 100% [18:03:48] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours [18:03:48] PROBLEM - Puppet freshness on lardner is CRITICAL: Puppet has not run in the last 10 hours [18:04:51] PROBLEM - Puppet freshness on amssq59 is CRITICAL: Puppet has not run in the last 10 hours [18:04:51] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours [18:04:51] PROBLEM - Puppet freshness on gurvin is CRITICAL: Puppet has not run in the last 10 hours [18:04:52] PROBLEM - Puppet freshness on es2 is CRITICAL: Puppet has not run in the last 10 hours [18:04:52] PROBLEM - Puppet freshness on mc6 is CRITICAL: Puppet has not run in the last 10 hours [18:04:52] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [18:04:52] PROBLEM - Puppet freshness on mw58 is CRITICAL: Puppet has not run in the last 10 hours [18:04:53] PROBLEM - Puppet freshness on mw68 is CRITICAL: Puppet has not run in the last 10 hours [18:04:53] PROBLEM - Puppet freshness on mw101 is CRITICAL: Puppet has not run in the last 10 hours [18:04:54] PROBLEM - Puppet freshness on mw108 is CRITICAL: Puppet has not run in the last 10 hours [18:04:54] PROBLEM - Puppet freshness on sq42 is CRITICAL: Puppet has not run in the last 10 hours [18:04:55] PROBLEM - Puppet freshness on sq83 is CRITICAL: Puppet has not run in the last 10 hours [18:04:55] PROBLEM - Puppet freshness on mw92 is CRITICAL: Puppet has not run in the last 10 hours [18:04:56] PROBLEM - Puppet freshness on mw7 is CRITICAL: Puppet has not run in the last 10 hours [18:05:45] PROBLEM - Puppet freshness on db64 is CRITICAL: Puppet has not run in the last 10 hours [18:05:46] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [18:05:46] PROBLEM - Puppet freshness on ms-be1 is CRITICAL: Puppet has not run in the last 10 hours [18:05:46] PROBLEM - Puppet freshness on mw109 is CRITICAL: Puppet has not run in the last 10 hours [18:05:46] PROBLEM - Puppet freshness on mw119 is CRITICAL: Puppet has not run in the last 10 hours [18:05:46] PROBLEM - Puppet freshness on mw33 is CRITICAL: Puppet has not run in the last 10 hours [18:05:46] PROBLEM - Puppet freshness on db78 is CRITICAL: Puppet has not run in the last 10 hours [18:05:47] PROBLEM - Puppet freshness on mw87 is CRITICAL: Puppet has not run in the last 10 hours [18:05:47] PROBLEM - Puppet freshness on lvs3 is CRITICAL: Puppet has not run in the last 10 hours [18:05:48] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours [18:05:48] PROBLEM - Puppet freshness on sq65 is CRITICAL: Puppet has not run in the last 10 hours [18:05:49] PROBLEM - Puppet freshness on sq68 is CRITICAL: Puppet has not run in the last 10 hours [18:05:49] PROBLEM - Puppet freshness on sq70 is CRITICAL: Puppet has not run in the last 10 hours [18:05:50] PROBLEM - Puppet freshness on sq76 is CRITICAL: Puppet has not run in the last 10 hours [18:05:50] PROBLEM - Puppet freshness on virt10 is CRITICAL: Puppet has not run in the last 10 hours [18:05:51] PROBLEM - Puppet freshness on yvon is CRITICAL: Puppet has not run in the last 10 hours [18:06:13] New review: Reedy; "Guess this needs wmf11 deploying first?" 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51183 [18:06:48] PROBLEM - Puppet freshness on kuo is CRITICAL: Puppet has not run in the last 10 hours [18:06:49] PROBLEM - Puppet freshness on db58 is CRITICAL: Puppet has not run in the last 10 hours [18:06:49] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [18:06:49] PROBLEM - Puppet freshness on linne is CRITICAL: Puppet has not run in the last 10 hours [18:06:49] PROBLEM - Puppet freshness on mc10 is CRITICAL: Puppet has not run in the last 10 hours [18:06:49] PROBLEM - Puppet freshness on mc8 is CRITICAL: Puppet has not run in the last 10 hours [18:06:49] PROBLEM - Puppet freshness on mw104 is CRITICAL: Puppet has not run in the last 10 hours [18:06:50] PROBLEM - Puppet freshness on mw52 is CRITICAL: Puppet has not run in the last 10 hours [18:06:50] PROBLEM - Puppet freshness on mw15 is CRITICAL: Puppet has not run in the last 10 hours [18:06:51] PROBLEM - Puppet freshness on mw84 is CRITICAL: Puppet has not run in the last 10 hours [18:06:51] PROBLEM - Puppet freshness on search18 is CRITICAL: Puppet has not run in the last 10 hours [18:06:52] PROBLEM - Puppet freshness on mw63 is CRITICAL: Puppet has not run in the last 10 hours [18:06:52] PROBLEM - Puppet freshness on sq74 is CRITICAL: Puppet has not run in the last 10 hours [18:06:52] seriously [18:06:53] PROBLEM - Puppet freshness on search31 is CRITICAL: Puppet has not run in the last 10 hours [18:06:53] PROBLEM - Puppet freshness on search26 is CRITICAL: Puppet has not run in the last 10 hours [18:06:54] PROBLEM - Puppet freshness on sq82 is CRITICAL: Puppet has not run in the last 10 hours [18:06:54] PROBLEM - Puppet freshness on srv246 is CRITICAL: Puppet has not run in the last 10 hours [18:07:51] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours [18:07:51] PROBLEM - Puppet freshness on mw18 is CRITICAL: Puppet has not run in the last 10 hours [18:07:51] PROBLEM - Puppet freshness on mw34 is CRITICAL: Puppet has not run in the last 10 hours [18:07:51] PROBLEM - Puppet freshness on mw11 is CRITICAL: Puppet has not run in the last 10 hours [18:07:51] PROBLEM - Puppet freshness on mw74 is CRITICAL: Puppet has not run in the last 10 hours [18:07:52] PROBLEM - Puppet freshness on search30 is CRITICAL: Puppet has not run in the last 10 hours [18:07:52] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours [18:07:53] PROBLEM - Puppet freshness on ssl1 is CRITICAL: Puppet has not run in the last 10 hours [18:07:53] PROBLEM - Puppet freshness on pdf1 is CRITICAL: Puppet has not run in the last 10 hours [18:08:09] *plonk* (sound of nagios-wm being ignored) [18:08:45] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours [18:08:46] PROBLEM - Puppet freshness on db56 is CRITICAL: Puppet has not run in the last 10 hours [18:08:46] PROBLEM - Puppet freshness on lvs1001 is CRITICAL: Puppet has not run in the last 10 hours [18:08:46] PROBLEM - Puppet freshness on ms-fe4 is CRITICAL: Puppet has not run in the last 10 hours [18:08:46] PROBLEM - Puppet freshness on mw122 is CRITICAL: Puppet has not run in the last 10 hours [18:08:46] PROBLEM - Puppet freshness on mw24 is CRITICAL: Puppet has not run in the last 10 hours [18:08:46] PROBLEM - Puppet freshness on pc1 is CRITICAL: Puppet has not run in the last 10 hours [18:08:47] PROBLEM - Puppet freshness on sq50 is CRITICAL: Puppet has not run in the last 10 hours 
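For the plwiki $wgCategoryCollation patchset noted a little earlier (gerrit 51183), a rough sketch of how a per-wiki override of this kind is typically expressed in the wmf-config style; the 'uppercase' default, the lookup helper, and the array layout are assumptions for illustration, not a copy of InitialiseSettings.php:

```php
<?php
// Illustrative per-wiki override and lookup; not the actual wmf-config code.
$settings = array(
    'wgCategoryCollation' => array(
        'default' => 'uppercase',   // assumed site-wide default
        'plwiki'  => 'uca-pl',      // Polish Wikipedia override (bug 42413)
    ),
);

// Resolve the effective value for one wiki, falling back to 'default'.
function resolveSetting( array $settings, $name, $dbname ) {
    $perWiki = $settings[$name];
    return isset( $perWiki[$dbname] ) ? $perWiki[$dbname] : $perWiki['default'];
}

echo resolveSetting( $settings, 'wgCategoryCollation', 'plwiki' ) . "\n"; // uca-pl
echo resolveSetting( $settings, 'wgCategoryCollation', 'enwiki' ) . "\n"; // uppercase

// As Matmarex notes in the review, the config change alone does not rewrite
// existing category sort keys; maintenance/updateCollation.php has to be run
// for the wiki afterwards.
```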
[18:08:47] PROBLEM - Puppet freshness on srv260 is CRITICAL: Puppet has not run in the last 10 hours [18:08:48] PROBLEM - Puppet freshness on srv269 is CRITICAL: Puppet has not run in the last 10 hours [18:08:48] PROBLEM - Puppet freshness on virt6 is CRITICAL: Puppet has not run in the last 10 hours [18:09:48] PROBLEM - Puppet freshness on db10 is CRITICAL: Puppet has not run in the last 10 hours [18:09:48] PROBLEM - Puppet freshness on mchenry is CRITICAL: Puppet has not run in the last 10 hours [18:09:48] PROBLEM - Puppet freshness on db43 is CRITICAL: Puppet has not run in the last 10 hours [18:09:48] PROBLEM - Puppet freshness on mw124 is CRITICAL: Puppet has not run in the last 10 hours [18:09:48] PROBLEM - Puppet freshness on mw60 is CRITICAL: Puppet has not run in the last 10 hours [18:09:49] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [18:09:49] PROBLEM - Puppet freshness on mw31 is CRITICAL: Puppet has not run in the last 10 hours [18:09:50] PROBLEM - Puppet freshness on mw49 is CRITICAL: Puppet has not run in the last 10 hours [18:09:50] PROBLEM - Puppet freshness on mw81 is CRITICAL: Puppet has not run in the last 10 hours [18:09:51] PROBLEM - Puppet freshness on sq85 is CRITICAL: Puppet has not run in the last 10 hours [18:09:51] PROBLEM - Puppet freshness on sq71 is CRITICAL: Puppet has not run in the last 10 hours [18:09:52] PROBLEM - Puppet freshness on srv298 is CRITICAL: Puppet has not run in the last 10 hours [18:09:52] PROBLEM - Puppet freshness on mexia is CRITICAL: Puppet has not run in the last 10 hours [18:09:53] PROBLEM - Puppet freshness on sq62 is CRITICAL: Puppet has not run in the last 10 hours [18:09:53] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [18:09:54] PROBLEM - Puppet freshness on srv247 is CRITICAL: Puppet has not run in the last 10 hours [18:10:03] PROBLEM - Puppet freshness on tin is CRITICAL: Puppet has not run in the last 10 hours [18:10:51] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [18:10:51] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours [18:10:51] PROBLEM - Puppet freshness on db65 is CRITICAL: Puppet has not run in the last 10 hours [18:10:51] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours [18:10:51] PROBLEM - Puppet freshness on mw16 is CRITICAL: Puppet has not run in the last 10 hours [18:10:52] PROBLEM - Puppet freshness on srv243 is CRITICAL: Puppet has not run in the last 10 hours [18:10:52] PROBLEM - Puppet freshness on mc7 is CRITICAL: Puppet has not run in the last 10 hours [18:10:53] PROBLEM - Puppet freshness on srv277 is CRITICAL: Puppet has not run in the last 10 hours [18:10:53] PROBLEM - Puppet freshness on mw40 is CRITICAL: Puppet has not run in the last 10 hours [18:10:54] PROBLEM - Puppet freshness on mw19 is CRITICAL: Puppet has not run in the last 10 hours [18:10:58] New patchset: Lcarr; "turn off ircecho on nagios-wm / spence" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51186 [18:11:38] LeslieCarr: ensure => absent ? 
[18:11:45] PROBLEM - Puppet freshness on db33 is CRITICAL: Puppet has not run in the last 10 hours [18:11:45] PROBLEM - Puppet freshness on db39 is CRITICAL: Puppet has not run in the last 10 hours [18:11:45] PROBLEM - Puppet freshness on db60 is CRITICAL: Puppet has not run in the last 10 hours [18:11:46] PROBLEM - Puppet freshness on mc11 is CRITICAL: Puppet has not run in the last 10 hours [18:11:46] PROBLEM - Puppet freshness on lvs4 is CRITICAL: Puppet has not run in the last 10 hours [18:11:46] PROBLEM - Puppet freshness on mw38 is CRITICAL: Puppet has not run in the last 10 hours [18:11:46] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [18:11:47] PROBLEM - Puppet freshness on mw48 is CRITICAL: Puppet has not run in the last 10 hours [18:11:47] PROBLEM - Puppet freshness on emery is CRITICAL: Puppet has not run in the last 10 hours [18:11:48] PROBLEM - Puppet freshness on db54 is CRITICAL: Puppet has not run in the last 10 hours [18:11:48] PROBLEM - Puppet freshness on mw6 is CRITICAL: Puppet has not run in the last 10 hours [18:11:49] PROBLEM - Puppet freshness on sq37 is CRITICAL: Puppet has not run in the last 10 hours [18:11:49] PROBLEM - Puppet freshness on sq54 is CRITICAL: Puppet has not run in the last 10 hours [18:11:50] PROBLEM - Puppet freshness on sq59 is CRITICAL: Puppet has not run in the last 10 hours [18:11:50] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours [18:11:54] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [18:12:17] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51186 [18:12:39] there [18:13:00] New review: Matmarex; "Yes." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51183 [18:15:18] LeslieCarr: Hey, would you like to +2 my puppet changes? while you're in Gerrit? [18:15:23] :-) [18:15:36] not in gerrit atm, and trying to pay attention to meeting - can you remind me in 1h ? [18:15:56] LeslieCarr: kk. [18:23:34] RECOVERY - Host ms-be12 is UP: PING OK - Packet loss = 0%, RTA = 26.60 ms [18:25:30] !log reedy Started syncing Wikimedia installation... : Rebuild message cache for EducationProgram [18:25:30] Logged the message, Master [18:25:32] PROBLEM - SSH on ms-be12 is CRITICAL: Connection refused [18:30:12] PROBLEM - Host ms-be12 is DOWN: PING CRITICAL - Packet loss = 100% [18:33:00] cmjohnson1: yes to both [18:33:13] ok...cool..thx [18:33:42] cmjohnson1: is there a problem with it? [18:35:09] i don't think so [18:36:51] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51051 [18:39:06] New review: Andrew Bogott; "This looks fine to me but I'd like someone with more security paranoia to give the final word." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/50913 [18:42:53] !log reedy Finished syncing Wikimedia installation... : Rebuild message cache for EducationProgram [18:42:55] Logged the message, Master [18:43:08] real 29m55.602s [18:43:55] Reedy: we should get scap time to be logged in graphite ! <-- binasher [18:45:08] Reedy: was that a scheduled deployment? 
[18:45:11] No [18:45:20] Reedy: just trying to figure out how the edu stuff works [18:45:23] don't piss the release manager :-] [18:45:26] ;) [18:45:36] greg-g: If you find out, please let me know [18:45:42] hah [18:46:18] PROBLEM - Varnish HTTP bits on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:46:48] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:47:06] Reedy: so, seriously, just wondering if that type of thing should be on the deploy schedule, I'm not totally in the know about the Edu stuff. Who should I talk with about their process? [18:48:38] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [18:48:51] Reedy: was that scap just about refreshing the l10n cache ? [18:48:58] Yeah [18:49:09] can't we simply mw-update-l10n ? [18:49:09] I deployed a bugfix earlier, which came with some message changes [18:49:15] It's essentially the same thing [18:49:24] sure thing [18:49:50] mark: looks like another bits box bites the dust …. tried logging into arsenic but it's semi-hanging on login [18:50:53] should i reboot or do you want to check it out ? [18:51:48] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:52:06] eh, i am rebooting [18:53:58] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [18:54:58] PROBLEM - Host arsenic is DOWN: CRITICAL - Host Unreachable (208.80.154.62) [18:55:30] !log rebooting arsenic [18:55:32] Logged the message, Mistress of the network gear. [18:56:38] Reedy: gotcha (through your answer to hashar ), nevermind my nagging :) [18:56:38] RECOVERY - Host ms-be12 is UP: PING OK - Packet loss = 0%, RTA = 26.56 ms [18:59:08] RECOVERY - SSH on ms-be12 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [18:59:58] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [19:00:38] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [19:00:48] RECOVERY - Host arsenic is UP: PING OK - Packet loss = 0%, RTA = 0.19 ms [19:00:59] LeslieCarr: very busy? 
[19:01:05] We're in a meeting [19:01:08] RECOVERY - Varnish HTTP bits on arsenic is OK: HTTP OK: HTTP/1.1 200 OK - 635 bytes in 0.002 second response time [19:01:10] sorry [19:04:19] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Everything else to 1.21wmf10 [19:04:21] Logged the message, Master [19:04:28] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:05:18] PROBLEM - SSH on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:06:10] RECOVERY - SSH on palladium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [19:06:18] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 635 bytes in 0.001 second response time [19:07:27] New patchset: Reedy; "Everything else to 1.21wmf10" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51195 [19:07:50] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51195 [19:10:08] RECOVERY - swift-account-server on ms-be12 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [19:10:08] RECOVERY - swift-object-updater on ms-be12 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [19:10:08] RECOVERY - swift-object-auditor on ms-be12 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [19:10:09] RECOVERY - swift-account-auditor on ms-be12 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [19:10:18] RECOVERY - swift-container-auditor on ms-be12 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [19:10:18] RECOVERY - swift-object-replicator on ms-be12 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [19:10:28] RECOVERY - swift-account-reaper on ms-be12 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [19:10:29] RECOVERY - swift-container-replicator on ms-be12 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [19:10:38] RECOVERY - swift-object-server on ms-be12 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [19:10:59] RECOVERY - swift-container-updater on ms-be12 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [19:10:59] RECOVERY - swift-container-server on ms-be12 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [19:11:03] apergos: yay \o/ [19:11:06] lookit all the oks! [19:11:58] RECOVERY - NTP on ms-be12 is OK: NTP OK: Offset -0.03717458248 secs [19:12:38] RECOVERY - swift-account-replicator on ms-be12 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [19:14:23] New patchset: Lcarr; "fixing icinga-admin site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51198 [19:15:08] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51198 [19:22:23] LeslieCarr: any reason for icinga-admin to not be a CNAME? [19:22:43] and, moreover, any reason for neon/icinga to not have IPv6? 
:-) [19:22:52] no on both [19:23:25] :-) [19:24:43] rt 4602 now [19:24:49] for after lunch i think :) [19:24:58] 4601 too [19:26:59] oh cool [19:27:06] i didn't see it so i figured it wouldn't show up [19:27:20] yay closed that one [19:32:20] Coren: got a 10 minute break [19:33:03] LeslieCarr: Only https://gerrit.wikimedia.org/r/#/c/50913/ is left; andrew wanted another pair of eyes because of the possible security implications. [19:33:33] More precisely, someone with security paranoia. :-) [19:39:27] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50913 [19:40:01] New patchset: Reedy; "lucene.php: simple loadbalancing of requests across datacenters" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43029 [19:41:06] ok , this afternoon, monitoring tastic [19:41:36] loadbalancing for search, yummy [19:41:52] anything that keeps me from getting (and responding to) all those silly pages is a huge win [19:41:56] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [19:42:01] jenkins didn't like it [19:42:09] New patchset: Reedy; "lucene.php: simple loadbalancing of requests across datacenters" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43029 [19:43:22] New review: Lcarr; "matanya: check open mange comes from here: http://folk.uio.no/trondham/software/check_openmanage.html" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47514 [19:47:14] what is '10.2.1.14' anyways? does it exist? [19:48:36] in fact I don't find a search-pool4.svc.pmtpa.wmnet either [19:48:43] Reedy: ^^ ? [19:49:18] New review: Faidon; "OMSA is a terrible beast, I'd like us to stay as far away as possible from it." [operations/puppet] (production) C: -2; - https://gerrit.wikimedia.org/r/47514 [19:49:42] :/ [19:50:00] the other ones seem ok [19:50:11] It's in the current lucene config commented out for tampa [19:50:38] must be commented out for a reason :-D [19:51:07] I think that's our datacentre changeover setup... :D [19:51:28] wellllll [19:52:36] it helps if it changes over *to* something [19:53:00] maybe not peter will know where the tampa host went off to [19:54:11] I'm goin to -1 this with a note about the ip so it gets resolved [19:55:04] paravoid: mind enlightening me why? [19:56:07] it's a terrible beast with horrible packages installing java crap [19:56:24] New review: ArielGlenn; "the ip for search pool4 in tampa (10.2.1.14) doesn't correspond to any host right now, and there's n..." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/43029 [19:57:17] paravoid, while you are here: how many total zones do we want for swift anyways? 
the docs say one zone per rack [19:57:22] PROBLEM - Puppet freshness on ms-be12 is CRITICAL: Puppet has not run in the last 10 hours [19:57:55] but that will be 8 zones [19:58:20] yeah, one per rack [19:58:21] LeslieCarr: Danke [19:58:44] ok, I'll keep it that way then, thanks [20:05:29] New review: MaxSem; "(1 comment)" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/43029 [20:12:22] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours [20:15:22] RECOVERY - Puppet freshness on ms-be12 is OK: puppet ran at Wed Feb 27 20:15:16 UTC 2013 [20:16:04] how does one answer [20:16:11] ah well I'll answer 'out of band' [20:16:46] MaxSem: I thought the point was to save searchpool4, not to loadbalance eg enwiki; if we want this to be general purpose then it's an issue, for sure but otherwise [20:17:15] splitting off per wiki for the 'eveerything else' ones will be good enough [20:17:23] yeah [20:17:38] and for pool3 too [20:18:05] (sorry, didn't know how to comment on a comment in gerrit) [20:18:24] apergos, click on comment --> 'reply' [20:18:48] apergos, the commit summary mentions load balancing though [20:18:58] uh how do I do that, your comment is inline [20:19:23] click on inline comment to expand it [20:19:26] h. I right clicked :-D [20:19:34] expected a content menu... silly me [20:22:17] er [20:22:22] sorry for the ignorant q [20:22:34] but after saing the draft, 'reply done' just opens the draft back up [20:23:24] how do I publish the bloomin thing? [20:23:37] Review [20:24:04] :-D [20:24:16] ah so simple it's complicated (or maybe vv) [20:24:25] New review: ArielGlenn; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43029 [20:24:36] thanks [20:51:45] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host [20:52:05] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [21:24:51] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho [21:25:01] RECOVERY - MySQL disk space on neon is OK: DISK OK [21:26:17] !log tstarling synchronized php-1.21wmf10/includes/User.php 'user_token fixes' [21:26:20] Logged the message, Master [21:31:01] !log running resetUserTokens.php on all wikis, in screen on hume, to fix user_token field corruption (bug 41586) [21:31:02] Logged the message, Master [21:33:39] New review: Dzahn; "ugh, i agree. OMSA is a beast" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47514 [21:36:45] !log authdns-update for sdb1001-1002 [21:36:47] Logged the message, RobH [21:36:54] !log authdns-update for rdb1001-1002 [21:36:55] Logged the message, RobH [21:38:26] AaronSchulz: this one? 
https://gerrit.wikimedia.org/r/#/c/50877/ [21:40:35] git review...so......sloooow [21:40:38] New patchset: RobH; "adding rdb1001-1002" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51279 [21:41:22] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51279 [21:43:26] notpeter: yes [21:44:42] !log rdb1001 & rdb1002 allocated, network setup, and ready for handoff, per rt4608 [21:44:43] Logged the message, RobH [21:49:27] AaronSchulz: ok, going to deploy it now [21:49:45] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50877 [21:59:31] New patchset: Demon; "Updating gerrit to 2.5.2-1482-g9328c64" [operations/debs/gerrit] (master) - https://gerrit.wikimedia.org/r/51280 [21:59:36] New review: Demon; "New war can be found: https://integration.mediawiki.org/nightly/gerrit/wmf/gerrit-2.5.2-1482-g9328c6..." [operations/debs/gerrit] (master) - https://gerrit.wikimedia.org/r/51280 [22:03:01] New patchset: Reedy; "Simplify wmgCollectionArticleNamespaces" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51281 [22:04:14] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51281 [22:09:19] New patchset: Reedy; "Break really long strings in wgOverrideSiteFeed" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51282 [22:09:46] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51282 [22:10:03] notpeter: did you force a run? [22:10:49] nope [22:12:04] do you want me to? or shall we just wait another 10 minutes? [22:13:01] meh, we can wait then [22:15:05] New patchset: Reedy; "Add wikimania dblist" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51283 [22:15:26] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51283 [22:16:53] !log reedy synchronized wmf-config/ [22:16:55] Logged the message, Master [22:36:33] !log installing package upgrades on bast1001 [22:36:35] Logged the message, Master [22:40:37] apergos: paravoid: for swift are there any pairs of racks that are more likely to be simultaneously offline than other pairs? (e.g. 
because they share a power source) [22:46:08] !log rebooting bast1001 [22:46:10] Logged the message, Master [22:47:45] New patchset: Reedy; "Simplify wikimania config items by refactoring common config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51288 [22:49:42] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51288 [22:52:05] !log reedy synchronized wmf-config/ [22:52:07] Logged the message, Master [22:54:30] Reedy: bits 502 Bad Gateway [22:54:36] !log installing package upgrades on zirconium [22:54:38] Logged the message, Master [22:55:24] screwed the request panel, can't see which server it was [22:55:28] refresh shows normal [22:56:07] PROBLEM - SSH on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:56:17] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:57:15] !log rebooting zirconium (planet) [22:57:18] Logged the message, Master [22:57:57] RECOVERY - SSH on palladium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [22:58:07] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 635 bytes in 0.001 second response time [22:58:37] PROBLEM - Host zirconium is DOWN: CRITICAL - Host Unreachable (208.80.154.41) [23:00:57] RECOVERY - Host zirconium is UP: PING OK - Packet loss = 0%, RTA = 0.17 ms [23:00:58] RECOVERY - Host zirconium is UP: PONG OK - Packet loss = 0%, RTA = 0.17 ms [23:01:22] .. [23:02:53] PROBLEM - rogue IRC bot detected [23:03:45] lol [23:10:08] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: Connection refused [23:11:51] AaronSchulz: https://ganglia.wikimedia.org/latest/?c=Jobrunners%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [23:12:10] you seem to be pushing the load pretty high... let's watch! [23:12:44] shouldn't change much [23:12:55] the second loop is just doing check...sleep 5... loops [23:13:22] cool [23:13:32] *shouldn't* [23:13:36] yeah [23:13:39] I'll keep an eye still [23:14:02] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 633 bytes in 0.608 second response time [23:14:05] also why is mw1001 always more loaded then the others? [23:14:16] notpeter: is it special? [23:14:40] AaronSchulz: it's a special and unique snowflake, like the rest of our boxes [23:15:09] New patchset: Jdlrobson; "Story 428: update schemas to latest revisions" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51291 [23:17:52] PROBLEM - SSH on erzurumi is CRITICAL: Connection refused [23:18:52] RECOVERY - SSH on erzurumi is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [23:24:15] random: do we have a local-ish Ubuntu mirror for staff use? [23:24:55] As in, in the office? [23:26:14] !log Running sync-common on mw1111. Reporting failed opening wmf-config/mc.php [23:26:16] Logged the message, Master [23:26:32] not in the office - we have a 100mbit link [23:37:11] does ubuntu have an equivalent to http.debian.net ? [23:37:21] greg-g: ? [23:39:35] deb mirror://mirrors.ubuntu.com/mirrors.txt .. [23:40:13] mutante: that doesn't quite answer my question [23:40:38] interesting that there's a bare IP (not a hostname) for one of the entries in that list though [23:41:34] it doesnt? 
you wanted automatic selection of a mirror close to you, right [23:41:44] it's supposed to do that [23:43:00] http://askubuntu.com/questions/37753/how-can-i-get-apt-to-use-a-mirror-close-to-me-or-choose-a-faster-mirror [23:43:16] http://mvogt.wordpress.com/2011/03/21/the-apt-mirror-method/ [23:45:35] mutante: is there one in that mirror list that is in our office? That's my real question. I know about the Ubuntu mirror network and how to get a good one :) [23:46:38] hmm, not that i know of, would have to ask OIT. there is apt.wikimedia , but that is not in our office [23:46:43] [23:26:31] not in the office - we have a 100mbit link [23:47:00] greg-g: no [23:48:30] they might have DVDs [23:48:33] \o/ https://wikitech-static.wikimedia.org/wiki/Main_Page [23:48:41] a valid certificate! [23:48:55] h4x [23:48:57] Anyone know why my labs instance is upgrading from 5.3.10-1ubuntu3.5 to 5.3.10-1ubuntu3.4+wmf1? Is 3.4+wmf1 > 3.5? [23:49:25] LeslieCarr: thanks :) [23:51:42] um… php5-mysql that is [23:52:09] Ryan_Lane: I'm guessing that wikitech-static is the read-only, off-cluster copy of the future labsconsole/wikitech merger? [23:52:15] guillom: yep [23:52:19] nice [23:52:48] Ryan_Lane: and also off-virt0 (not just off cluster)? [23:52:52] where is it? [23:52:59] rackspace cloud [23:53:07] andrewbogott, Wikimedia packages are preferred [23:53:22] init.pp: # prefer Wikimedia APT repository packages in all cases [23:53:27] MaxSem: Yeah, the mystery is how this instance got a newer non-wiki version in the first place [23:54:05] I was thinking that something changed in the repo, but maybe a VM user installed that package via dpkg or something [23:54:26] Ryan_Lane: ahh, it's not the star cert, cool [23:54:47] yeah. I wouldn't install that outside of the cluster [23:55:00] hence why i was wondering where it was :) [23:55:11] do we really need SSL at all if it's read-only? [23:55:12] New review: Pyoungmeister; "I did intend to load-balance for all pools. all of the shards get hickups, just none as badly as pool4." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43029 [23:55:15] no one's logging in [23:58:07] New patchset: Lcarr; "adding in icinga-admin site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51300 [23:58:09] New patchset: Hashar; "make frwiki thumbs the default again" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51182 [23:58:45] Reedy: Sam, the patch above would fix up the thumbnails on frwiki which are largely varying from the default config :-] [23:59:02] that will save a bit of processing on our thumbnails infrastructure [23:59:04] jeremyb_: Hardly a big cost ;) [23:59:33] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/51182 [23:59:38] Reedy: well it still needs renewing, etc. anyway, not complaining [23:59:52] It's probably a bit neater too
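On the lucene.php load-balancing change discussed from 19:40 onward (gerrit 43029, revisited in Pyoungmeister's 23:55 review), a sketch of the general idea of spreading a search pool's requests across more than one backend; the weights, the pmtpa hostname (the very thing ArielGlenn flagged as not existing), and the helper function are assumptions for illustration, not the contents of the actual patch:

```php
<?php
// Illustrative only -- not the actual lucene.php change. Idea: list candidate
// search backends per pool with weights and pick one per request, so a single
// flapping pool endpoint (like search-pool4) takes less of the traffic.

$searchPoolBackends = array(
    'pool4' => array(
        'search-pool4.svc.eqiad.wmnet' => 70,  // weight
        'search-pool4.svc.pmtpa.wmnet' => 30,  // hypothetical; no such host
                                               // existed when this was -1'd
    ),
);

// Weighted random pick of one backend host.
function pickSearchBackend( array $weightedHosts ) {
    $roll = mt_rand( 1, array_sum( $weightedHosts ) );
    foreach ( $weightedHosts as $host => $weight ) {
        $roll -= $weight;
        if ( $roll <= 0 ) {
            return $host;
        }
    }
    $hosts = array_keys( $weightedHosts );
    return $hosts[0]; // not reached; keeps the function total
}

echo pickSearchBackend( $searchPoolBackends['pool4'] ) . "\n";
```

ArielGlenn's earlier -1 amounts to the obvious caveat on this approach: it only helps if every listed backend actually resolves to a live pool.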