[00:13:11] New patchset: Lcarr; "updating nrpe package" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42054 [00:14:07] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42054 [00:23:31] New patchset: Ryan Lane; "Actually deploy deploy_redis" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42056 [00:24:53] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42056 [00:32:17] New patchset: RobH; "adding ersch to poolcounter pool" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42057 [00:32:32] !log olivneh synchronized php-1.21wmf6/extensions/EventLogging/includes/JsonSchemaHooks.php [00:32:40] Logged the message, Master [00:34:13] TimStarling: ^ if you dont mind =] [00:34:29] then i'll push it out and we can make sure i dont crash the site. [00:34:40] i just added ersch to the pool, rather than replacing tarin right now. [00:34:57] New patchset: Lcarr; "fixing puppet error on certs.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42058 [00:35:06] why are you leaving tarin in? [00:35:19] paranoia. [00:35:30] but can just replace if you think its ok. [00:35:35] i'll just cherry pick and fix. [00:36:00] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42058 [00:37:04] Change abandoned: RobH; "replace, not append" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42057 [00:38:03] New patchset: RobH; "ersch to replace tarin as poolcounter" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42059 [00:38:07] there's not much that can go wrong with it [00:38:16] ok, now its just a replace [00:38:42] TimStarling: ok, im happy to do the merge and the like then, just glad you are about in case i break it =] [00:38:53] I'll take care of it and ping you when done and tested (or if i bork it) [00:38:58] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42059 [00:39:14] thx, pushing it out from fenari now [00:40:38] i will give anyone $5 right now who can figure out why the fuck puppet fails on neon and gives me http://pastebin.com/g1yrHtWb [00:42:17] LeslieCarr: https://groups.google.com/forum/?fromgroups=#!msg/puppet-bugs/Jku_3Dz2MFk/t3owclKIP0EJ [00:42:23] "This is a bug in Ruby’s StringScanner C-library. It breaks on very large input strings." [00:43:00] but the only thing that changed from when it worked to when it fails is nagios-nrpe-server to icinga-nrpe-server [00:43:11] it's a 6 character diff! [00:46:03] LeslieCarr: is there a file that is marked backup => true? [00:46:31] perhaps a file backup is piggybacking on the change: according to http://projects.puppetlabs.com/issues/8229 [00:46:32] dunno [00:47:25] hrm, don't have that phrase in our puppet repo :( [00:48:10] * RobH is confused on deploying mediawiki stuff now [00:48:17] i have it merged in gerrit, and i pull and see nothing. [00:48:52] RobH: gerrit link? [00:49:03] https://gerrit.wikimedia.org/r/#/c/42059/ [00:49:10] So I made the change, and Tim reviewed and merged. 
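For anyone following the deploy thread above: the change RobH is pushing is a one-line wmf-config edit, so the whole deploy reduces to pulling the merged Gerrit change on the deploy host and syncing that single file. A minimal sketch, assuming the mediawiki-config checkout on fenari sits at /home/wikipedia/common (only sync-file and the file name appear in this log; the path and the git commands around it are assumptions):

```bash
# on fenari (deploy host); checkout path is an assumption
cd /home/wikipedia/common
git fetch origin
git log --oneline -3 origin/master     # confirm the merged change (42059) actually arrived
git pull --ff-only origin master
git diff HEAD~1 -- wmf-config/PoolCounterSettings.php   # eyeball the one-line swap
sync-file wmf-config/PoolCounterSettings.php            # push just this file, as done below
```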
[00:49:22] also on wikitech.wikimedia.org/view/How_to_deploy_code [00:49:41] bad pasted http://wikitech.wikimedia.org/view/How_to_deploy_code [00:50:01] New patchset: Ryan Lane; "redis port must be an int, not a string" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42060 [00:50:14] I am also insanely scared of crashign the site [00:50:18] as I have not done it in a couple years now [00:50:20] RobH: I just git-pulled and it grabbed it [00:50:26] wmf-config/PoolCounterSettings.php | 2 +- [00:50:29] RobH: don't be a wuss [00:50:42] probably just gerrit being slow [00:51:41] i just got a different error - "err: Could not retrieve catalog from remote server: Connection timed out - connect(2) [00:51:42] " [00:51:49] nice to see stafford is switching it up [00:52:00] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42060 [00:52:29] the how to deployh doesnt cover 'you added a single ip change in a config file' [00:52:46] sync-file wmf-config/PoolCounterSettings.php [00:53:17] ok, so git state on fenari is right [00:53:24] though i have no idea how that happened since I didnt do it [00:53:25] but meh. [00:53:32] TimStarling: thx, im overthinking i suppose [00:53:37] ori-l probably did it [00:53:59] !log robh synchronized wmf-config/PoolCounterSettings.php [00:54:04] I deserve this for not pushing changes in so long ;] [00:54:08] Logged the message, Master [00:54:40] !log poolcounter test shows numbers rising on ersch [00:54:45] seems it is successful \o/ [00:54:49] Logged the message, RobH [00:54:51] TimStarling: seem ok to you? [00:54:59] yes [00:55:15] huzzah, cool, tomorrow i'll reinstall tarin and push it out as another poolcounter server [00:55:23] so we'll have two live in tampa [00:55:29] sounds good [00:55:37] thx for help tim and ori-l [00:55:41] i appreciate it [00:56:31] no problem [00:57:01] I'm glad you didn't have to do something really scary, like switch enwiki over to a new major version of MW, it might have taken a month ;) [00:59:23] well, that wouldnt happen [00:59:30] cuz i know better than to try to help out on that ;] [01:00:03] binasher: woooooooo! thanks :) [01:01:35] !log dns update to move tarin to internal ip [01:01:44] Logged the message, RobH [01:02:08] ori-l: the restarts are actually happening right now, staggered [01:02:44] !log restarting all bits varnish instances with a 180s stagger between each (applying shm_reclen change) [01:02:53] Logged the message, Master [01:02:59] * ori-l fastens his seatbelt. [01:07:07] New patchset: Ryan Lane; "Add deploy-info script to deployment hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42062 [01:09:49] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42062 [01:10:20] New patchset: RobH; "tarin moving to internal ip for reinstall" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42063 [01:10:22] ^^ new deployment system updates status for all minions in redis [01:10:42] there's also a script to report info about current status of minions that reads from it [01:10:53] we could also write a simple web interface to display the info as well [01:11:42] New review: RobH; "this couldnt possibly break anything...." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/42063 [01:11:51] TimStarling: does captcha really need the width/height params for the tag [01:11:53] New review: RobH; "this couldnt possibly break anything...." 
[operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/42063 [01:11:54] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42063 [01:12:04] getting them requires getimagesize atm...unless those were part of the name [01:17:31] I guess not [01:18:05] TimStarling: I'd like to kill that to avoid the localReference call [01:18:33] binasher: how did it go? [01:18:41] New patchset: RobH; "tarin moving from wikimedia.org to pmtpa.wment" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42065 [01:19:25] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42065 [01:22:01] New patchset: Ryan Lane; "Block bingbot (fake crawler)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42066 [01:22:33] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42066 [01:22:47] ori-l: looks good.. no complaints [01:44:04] TimStarling: maybe the size attributes are useful when the file is missing ;) [01:45:03] though that's already a awful situation [01:56:44] binasher excellent, thanks again [02:19:05] could someone checksum ircecho for me? (from e.g. manganese or mchenry? or diff it against svn @ http://svn.wikimedia.org/viewvc/mediawiki/trunk/debs/ircecho/ircecho?view=markup ?) [02:26:46] !log LocalisationUpdate completed (1.21wmf6) at Thu Jan 3 02:26:45 UTC 2013 [02:26:57] Logged the message, Master [02:46:38] !log LocalisationUpdate completed (1.21wmf7) at Thu Jan 3 02:46:37 UTC 2013 [02:46:47] Logged the message, Master [03:05:54] all of these !logs with gerrit links make it hard to find !logs about gerrit ;-) [03:06:05] anyone around? [03:06:26] is manganese / mchenry lucid or precise? [03:07:48] wow, 746 days of uptime [03:07:52] that answers that ;-) [03:08:12] (for mchenry) [03:10:44] Err Changelog for ircecho (http://apt.wikimedia.org/wikimedia/pool/main/i/ircecho/ircecho_1.3.changelog) 404 Not Found [03:11:14] but i guess 1.3 means that it's not been updated [03:11:25] (the svn changelog says 1.0-1) [03:15:16] * jeremyb grumbles [03:15:56] the version installed on precise in labs (by aptitude) is older than the version in SVN? by like 24 hours is [03:16:04] s/ is$// [03:23:52] (i did get at the changelog from `apt-get source ircecho`... it's up to 1.3 so it is ahead of svn. but the code itself is *behind* svn??) 
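A sketch of the comparison jeremyb is working through above: fetch the raw ircecho from SVN trunk (ViewVC serves file contents with ?view=co) and diff/checksum it against what the apt package ships. The unpacked directory layout is an assumption; apt-get source and the SVN URL are the ones already used in the log:

```bash
# raw copy of trunk/debs/ircecho/ircecho from SVN (view=co returns the file itself)
curl -so /tmp/ircecho.svn \
  'http://svn.wikimedia.org/viewvc/mediawiki/trunk/debs/ircecho/ircecho?view=co'

# what the package at apt.wikimedia.org ships (1.3 per the changelog mentioned above)
apt-get source ircecho
pkg_copy=$(find ircecho-*/ -name ircecho -type f | head -n 1)

md5sum /tmp/ircecho.svn "$pkg_copy"
diff -u /tmp/ircecho.svn "$pkg_copy"
```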
[06:39:49] New patchset: Ryan Lane; "Update git server info for submodules" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42073 [06:44:26] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42073 [07:58:45] New patchset: Ryan Lane; "Clarify message for minions that timeout deploy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42075 [07:59:31] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42075 [09:04:22] New patchset: Ryan Lane; "Add timestamps to returner and deploy-info" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42078 [09:12:24] New patchset: Ryan Lane; "Add timestamps to returner and deploy-info" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42078 [09:20:48] New patchset: Ryan Lane; "Add timestamps to returner and deploy-info" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42078 [09:28:35] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42078 [10:46:25] afk for a while (early lunch again) [11:38:30] New patchset: Nemo bis; "(bug 42105) Restore normal bureaucrat permissions where changed without consensus" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33390 [15:37:25] New patchset: Ottomata; "Including Dan Andreescu on analytics nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42092 [15:39:14] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42092 [15:55:34] New patchset: MaxSem; "Use FQDN for Solr replication" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/40569 [16:32:17] I'm getting pages [16:32:20] looking [16:32:30] can someone look at nagios-wm in the meantime? [16:32:58] I just got them both (critical and ok) [16:34:06] yeah, it's not okay [16:34:16] can you take a look on why nagios-wm is silent? 
[16:35:23] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [16:35:23] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [16:35:23] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [16:35:31] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [16:35:31] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [16:35:31] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [16:35:40] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [16:35:45] probably that same netsplit that got gerrit-wm [16:35:49] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [16:35:49] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [16:35:53] I restarted it [16:35:58] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [16:35:58] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [16:36:16] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [16:36:17] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [16:36:25] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [16:36:25] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [16:36:25] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [16:36:34] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [16:36:34] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [16:36:34] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [16:36:43] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [16:36:52] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [16:36:53] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [16:36:53] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [16:36:53] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [16:37:01] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [16:37:01] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [16:37:02] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [16:37:02] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [16:37:02] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [16:37:10] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [16:37:10] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [16:37:10] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [16:37:19] 
PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [16:37:19] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [16:37:28] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [16:37:37] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [16:37:46] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [16:37:46] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [16:37:46] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [16:37:55] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [16:37:55] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [16:37:56] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [16:38:04] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [16:38:04] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [16:38:05] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [16:38:13] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [16:38:22] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [16:38:22] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [16:38:31] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [16:38:32] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [16:38:32] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [16:38:32] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [16:38:40] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [16:38:40] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [16:38:41] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [16:38:41] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [16:38:49] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [16:38:49] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [16:38:50] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [16:38:58] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [16:38:58] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [16:38:58] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [16:39:16] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [16:39:25] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [16:39:25] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [16:39:26] 
PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [16:39:34] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [16:39:42] wtf [16:39:43] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [16:39:43] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [16:39:52] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [16:39:52] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [16:39:53] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [16:40:01] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [16:40:01] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [16:40:10] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [16:40:13] ottomata: ^^^ [16:40:19] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [16:40:19] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [16:40:20] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [16:40:20] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [16:40:20] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [16:40:20] although it looks like a nagios-wm bug [16:40:28] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [16:40:28] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [16:40:28] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [16:40:28] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [16:40:29] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [16:40:29] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [16:40:37] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [16:40:37] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [16:40:46] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [16:40:55] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [16:41:04] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [16:41:04] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [16:41:13] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [16:41:22] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [16:41:23] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [16:41:31] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [16:41:31] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [16:41:31] PROBLEM 
- Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [16:41:31] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [16:41:40] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [16:41:49] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [16:41:58] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [16:41:59] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [16:41:59] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [16:41:59] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [16:42:07] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [16:42:07] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [16:42:07] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [16:42:08] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [16:42:08] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [16:42:11] whoa [16:42:16] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [16:42:16] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [16:42:17] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [16:42:25] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [16:42:25] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [16:42:25] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [16:42:43] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [16:42:52] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [16:42:52] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [16:42:53] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [16:43:01] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [16:43:01] weird [16:43:05] that is definitely not true though [16:43:09] i just ran puppet on analytics1001 [16:43:19] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [16:43:19] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [16:43:20] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [16:43:20] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [16:43:20] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [16:43:28] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [16:43:28] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [16:43:46] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 
10 hours [16:43:46] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [16:43:47] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [16:43:47] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [16:43:47] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [16:43:52] ottomata: can those hosts access the nsca server on spence? [16:43:55] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [16:43:55] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [16:43:55] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [16:43:55] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [16:43:56] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [16:43:56] I have a feeling nagios-wm will soon be excess flood'ed [16:43:56] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [16:43:56] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [16:44:04] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [16:44:04] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [16:44:13] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [16:44:22] RECOVERY - Puppet freshness on analytics1012 is OK: puppet ran at Thu Jan 3 16:44:14 UTC 2013 [16:44:22] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [16:44:31] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [16:44:31] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [16:44:40] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [16:44:49] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [16:44:58] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [16:44:58] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [16:44:59] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [16:44:59] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [16:44:59] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [16:45:07] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [16:45:16] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [16:45:25] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [16:45:25] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [16:45:34] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [16:45:34] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [16:45:34] PROBLEM - Puppet freshness on analytics1014 is 
CRITICAL: Puppet has not run in the last 10 hours [16:45:34] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [16:45:34] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [16:45:43] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [16:45:43] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [16:45:43] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [16:45:43] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [16:45:52] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [16:45:52] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [16:45:53] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [16:45:53] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [16:46:10] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [16:46:19] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [16:46:19] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [16:46:19] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [16:46:28] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [16:46:46] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [16:46:46] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [16:46:47] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [16:46:47] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [16:46:55] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [16:46:55] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [16:47:04] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [16:47:13] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [16:47:13] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [16:47:14] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [16:47:14] PROBLEM - Varnish HTCP daemon on cp1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
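On LeslieCarr's question a little way up about whether the analytics hosts can reach the nsca server on spence: a first-pass reachability check from one of those hosts could look like the line below. The hostname and port are assumptions (5667 is NSCA's stock port; the log never states either), and paravoid shortly concludes the freshness result is actually delivered by an SNMP trap rather than a passive NSCA check, so this only rules basic connectivity in or out.

```bash
# from e.g. analytics1001: can we open a TCP connection to the NSCA daemon at all?
nc -zv -w 5 spence.wikimedia.org 5667
```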
[16:47:14] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [16:47:22] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [16:47:22] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [16:47:23] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [16:47:23] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [16:47:23] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [16:47:23] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [16:47:23] PROBLEM - Varnish HTTP upload-backend on cp1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:47:31] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [16:47:31] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [16:47:31] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [16:47:40] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [16:47:40] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [16:47:49] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [16:47:53] not true! [16:47:56] liar! [16:47:58] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [16:47:58] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [16:48:07] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [16:48:12] 3~The last Puppet run was at Wed Oct 24 16:52:30 UTC 2012 (102235 minutes ago). [16:48:15] root@analytics1001:~# [16:48:16] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [16:48:30] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [16:48:30] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [16:48:30] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [16:48:30] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [16:48:34] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [16:48:34] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [16:48:35] do you know where it gets that from? 
[16:48:43] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [16:48:52] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [16:48:52] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [16:49:01] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [16:49:01] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [16:49:01] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [16:49:01] I'm surprised nagios-wm hasn't flooded out yet [16:49:11] i'm looking for a script called 'puppet-FAIL' on spence right now [16:49:12] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [16:49:12] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [16:49:12] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [16:49:15] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [16:49:15] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [16:49:19] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [16:49:19] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [16:49:19] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [16:49:19] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [16:49:28] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [16:49:37] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [16:49:46] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [16:49:46] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [16:49:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:49:46] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [16:49:47] ottomata: i think that's a nagios function [16:49:55] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [16:50:03] gah [16:50:08] yeah, but it has to be defined somewhere [16:50:11] and its not defined in puppet [16:50:13] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [16:50:13] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [16:50:13] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [16:50:13] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [16:50:13] afact [16:50:15] it is. 
sec [16:50:15] afaict [16:50:18] never knew we had so many analytics servers [16:50:22] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [16:50:23] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [16:50:25] checkcommands just has [16:50:25] command_name puppet-FAIL [16:50:25] command_line echo "Puppet has not run in the last 10 hours" && exit 2 [16:50:31] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [16:50:36] /etc/nagios/checkcommands.cfg [16:50:38] okay [16:50:40] killed it for now [16:50:46] let's debug this now. [16:50:56] so that script kicks in if nagios doesn't receive the passive check [16:51:29] I think it's an SNMP trap [16:51:33] not a passive check [16:51:47] so 97-last-pupppet-rn updates the motd [16:51:50] which? the puppet freshness test? [16:51:51] I remember debugging firewalls about that in the past [16:51:54] which checks /var/lib/puppet/state/classes.txt [16:51:57] ottomata: yes, that stats classes.txt [16:51:57] which def is old [16:54:35] oohhh ok i know why, i've got a slightly funky puppet setup on these machines, the /var/lib/puppet directory wasn't really being used [16:54:47] think i just fixed on an01 [16:55:34] any idea on why puppet snmp trap doesn't fire? [16:55:38] realm is production on those, right? [16:55:57] hmm, yes think so [16:56:01] not sure what that means though [16:56:33] it's definitely the snmp trap [16:57:44] hmm I fixed 1 & 2 and now 1 is broken again [16:59:56] how'd you fix? [17:00:35] ran the trap by hand [17:00:44] what do you mean by funky puppet setup? [17:02:51] still got our own puppetmaster, but using a branch of operations/puppet, our puppet agent only runs when we tell it too, and the default one that points at operations puppetmaster is the one that is supposed to run regularly. The problem was that /etc/puppet/puppet.conf has its vardir (or whatever) pointed at a different dir than /var/lib/puppet, which I guess I need to fix. [17:02:57] how do you run the trap by hand? [17:03:22] the puppet nagios check is via snmp? [17:03:54] yes [17:04:24] is your own branch of operations/puppet overriding base::puppet or something? [17:04:37] no [17:04:52] it only is doing different things via analytics roles [17:05:02] does it not include standard? [17:05:25] i'm pretty sure something's wrong in your manifests [17:06:07] the branch does not include standard, [17:06:22] i did that because I didn't want to have to include any of the private stuff in my branch [17:06:25] okay [17:06:27] it's broken then [17:06:37] standard includes base and half of our stuff [17:06:40] base includes base::puppet [17:06:42] right but [17:06:44] base::puppet does the SNMP trap [17:06:53] i am still running off of the stafford puppetmaster regularly [17:06:57] which does include standard [17:08:12] how sure are you about that? [17:08:17] what does regularly mean? [17:08:21] at last once every 10h? [17:08:26] i mean, puppet-agent daemon [17:08:36] shoudl be contacting stafford [17:08:56] eh? [17:09:10] /etc/puppet/puppet.conf says server analytics1001.wikimedia.org [17:09:12] why would it? [17:09:16] can you fix all that? 
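To make the client-side picture just described concrete, a hedged sketch of the two checks being run on the analytics hosts: the motd "last Puppet run" line is derived from the mtime of classes.txt under the standard vardir (so an agent writing state elsewhere looks months stale), and the effective vardir/server of the regular agent can be read back from puppet itself. These are stock puppet 2.7-era and coreutils commands; nothing below is specific to these machines:

```bash
# what 97-last-puppet-run is effectively reporting: when the agent last wrote
# state under the default vardir
stat -c 'classes.txt last written: %y' /var/lib/puppet/state/classes.txt

# which vardir and master the regular agent run is actually using
puppet agent --configprint vardir,server
grep -E '^[[:space:]]*(server|vardir)[[:space:]]*=' /etc/puppet/puppet.conf
```

The Nagios "Puppet freshness" alert is fed separately (the SNMP trap discussed above and below), which is why classes.txt being fresh on the other analytics hosts does not by itself clear the alert.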
[17:09:21] I think we agreed that it needs fixing months ago :) [17:09:40] yeah, we talked about a lot of this, and that's why i'm running off of a branch of operations/puppet instead of my own repo [17:10:04] i def want to merge them together, but we need to figure out all that module git submodule stuff, so i've put that off for now [17:10:12] but, hmm, lemme check something [17:10:23] because this was def working, hmmm, at least on non an01 nodes [17:10:32] ahhh, that's it [17:10:34] ok ok [17:10:41] so puppet agent on an01 does not contact stafford [17:10:44] but on all the other nodes it does [17:10:49] it *shoudl* contact stafford [17:10:54] but obviously I did that wrong [17:12:22] so, paravoid, if you are troubleshooting the puppet snmp trap stuff, for now, can you try on analytics1002 [17:12:22] ? [17:12:30] i'll look at analytics1001 and fix this problem [17:25:14] am I right that name resolution doesn't work without FQDN across datacenters? [17:25:19] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [17:25:28] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [17:25:28] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [17:25:28] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [17:25:28] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [17:25:28] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [17:25:38] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [17:25:38] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [17:25:38] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [17:25:38] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [17:25:46] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [17:25:47] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [17:25:55] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [17:25:56] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [17:25:56] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [17:26:13] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [17:26:13] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [17:26:31] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [17:26:32] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [17:26:32] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [17:26:40] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [17:26:41] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [17:26:41] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [17:26:41] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 
hours [17:26:41] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [17:26:49] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [17:26:58] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [17:26:58] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [17:27:08] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [17:27:08] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [17:27:16] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [17:27:17] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [17:27:17] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [17:27:17] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [17:27:17] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [17:27:17] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [17:27:25] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [17:27:34] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [17:27:35] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [17:27:35] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [17:27:44] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [17:27:44] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [17:27:53] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [17:27:53] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [17:28:19] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [17:28:19] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [17:28:19] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [17:28:19] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [17:28:20] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [17:28:20] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [17:28:20] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [17:28:28] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [17:28:37] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [17:28:38] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [17:28:46] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [17:28:56] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [17:28:56] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in 
the last 10 hours [17:28:56] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [17:28:56] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [17:29:05] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [17:29:05] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [17:29:05] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [17:29:05] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [17:29:05] PROBLEM - Varnish HTTP upload-frontend on cp1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:05] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:29:13] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [17:29:14] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [17:29:14] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [17:29:23] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [17:29:23] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [17:29:23] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [17:29:31] PROBLEM - Varnish HTTP upload-backend on cp1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:40] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [17:29:41] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [17:29:59] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [17:29:59] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [17:30:07] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [17:30:08] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [17:30:08] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [17:30:08] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [17:30:08] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [17:30:16] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [17:30:17] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [17:30:25] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [17:30:26] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [17:30:34] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [17:30:35] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [17:30:44] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [17:30:44] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [17:30:44] PROBLEM - Puppet freshness on analytics1027 is 
CRITICAL: Puppet has not run in the last 10 hours [17:30:44] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [17:30:44] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [17:30:44] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [17:30:53] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [17:30:53] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa [17:31:01] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [17:31:02] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [17:31:02] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [17:31:10] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [17:31:19] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [17:31:20] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [17:31:20] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [17:31:46] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [17:31:47] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [17:31:47] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [17:31:47] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [17:31:47] ottomata: ping? 
:) [17:31:55] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [17:31:55] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [17:31:55] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [17:31:55] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [17:32:04] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [17:32:05] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [17:32:13] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [17:32:22] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [17:32:23] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [17:32:23] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [17:32:23] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [17:32:31] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [17:32:31] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [17:32:31] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [17:32:41] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [17:32:41] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [17:32:41] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [17:32:49] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [17:32:50] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [17:32:50] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [17:32:59] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [17:33:07] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [17:33:08] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [17:33:34] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [17:33:34] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [17:33:34] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [17:33:35] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [17:33:35] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [17:33:35] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [17:33:35] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [17:33:43] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [17:33:43] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [17:33:52] pong [17:33:52] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has 
not run in the last 10 hours [17:33:53] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [17:34:00] are you working on it? [17:34:02] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [17:34:02] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [17:34:04] yeah [17:34:09] i know why analytics1001 isn't running puppet-agent properly [17:34:10] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [17:34:11] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [17:34:11] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [17:34:11] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [17:34:11] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [17:34:13] but i don't know why the others are reporting not run [17:34:19] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [17:34:24] they def are, and state/classes.txt is fresh on them [17:34:28] RECOVERY - Varnish HTTP upload-frontend on cp1026 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 9.066 seconds [17:34:29] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [17:34:29] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [17:34:29] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [17:34:29] check analytics1002 if you like [17:34:37] classes.txt has nothing to do with nagios [17:34:37] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [17:34:38] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [17:34:38] it says [17:34:40] ah ok [17:34:46] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [17:34:47] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [17:34:47] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [17:34:52] how does nagios get it? how does snmp check if puppet was run? [17:34:55] RECOVERY - Varnish HTTP upload-backend on cp1026 is OK: HTTP OK HTTP/1.1 200 OK - 634 bytes in 0.054 seconds [17:34:57] oh i see [17:35:00] whenever puppet runs it runs that exec [17:35:04] yes.
[17:35:13] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [17:35:13] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [17:35:13] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [17:35:13] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [17:35:22] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [17:35:23] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [17:35:23] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [17:35:23] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [17:35:32] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [17:35:32] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [17:35:41] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [17:35:41] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [17:35:49] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [17:35:49] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [17:35:49] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [17:35:58] RECOVERY - Puppet freshness on analytics1002 is OK: puppet ran at Thu Jan 3 17:35:44 UTC 2013 [17:35:59] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [17:35:59] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [17:36:07] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [17:36:08] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [17:36:08] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [17:36:09] ok, lemme finish an01 and i'll look at an02 more [17:36:17] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [17:36:17] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [17:36:17] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [17:36:17] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [17:36:26] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [17:36:34] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [17:36:35] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [17:36:35] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:37:01] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [17:37:02] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [17:37:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:02] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [17:37:02] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [17:37:02] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [17:37:11] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [17:37:11] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [17:37:11] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [17:37:11] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [17:37:19] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [17:37:19] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [17:37:28] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [17:37:29] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [17:37:37] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [17:37:38] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [17:37:38] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [17:37:38] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [17:37:46] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [17:37:55] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [17:37:56] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [17:37:56] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [17:37:56] RECOVERY - Puppet freshness on analytics1002 is OK: puppet ran at Thu Jan 3 17:37:50 UTC 2013 [17:38:00] interesting [17:38:04] that's me [17:38:05] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [17:38:05] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [17:38:05] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [17:38:13] ok, did you manually run the snmptrap command? [17:38:13] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [17:38:14] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [17:38:16] yes. 
[17:38:19] ok i tried that too [17:38:22] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [17:38:40] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours [17:38:41] PROBLEM - Puppet freshness on analytics1004 is CRITICAL: Puppet has not run in the last 10 hours [17:38:49] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Puppet has not run in the last 10 hours [17:38:49] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours [17:38:49] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours [17:38:49] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours [17:38:49] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [17:38:50] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours [17:38:58] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours [17:38:58] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [17:39:03] i'll stay off of an02 and look at an03 [17:39:06] while you look at 02 [17:39:07] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [17:39:08] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours [17:39:16] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours [17:39:17] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours [17:39:17] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours [17:39:19] I wonder why nagios-wm pings about it all the time [17:39:25] RECOVERY - Puppet freshness on analytics1003 is OK: puppet ran at Thu Jan 3 17:39:10 UTC 2013 [17:39:25] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [17:39:25] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours [17:39:26] ya, that's weird [17:39:34] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours [17:39:35] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours [17:39:43] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours [17:39:44] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours [17:39:44] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours [17:39:44] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours [17:39:44] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours [17:40:00] its really annoying to try to run puppet against stafford though [17:40:01] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours [17:40:01] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours [17:40:01] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [17:40:01] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa [17:40:03] takes foreeverrrrrrr 
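The freshness alerts in this exchange are driven by a passive check: each completed puppet run fires an exec that sends an SNMP trap to the monitoring host, and Nagios marks the service CRITICAL when no trap arrives within the freshness window. A minimal sketch for confirming on the agent side that puppet really did finish a run, independent of the trap path; it assumes the stock agent state directory /var/lib/puppet/state (where the classes.txt mentioned above normally lives), and the host list is only an example:

    # Show when the puppet agent last wrote its state file on a few hosts.
    # STATE_DIR and the host list are illustrative assumptions.
    STATE_DIR=/var/lib/puppet/state
    for host in analytics1001 analytics1002 analytics1003; do
        echo -n "$host: "
        ssh "$host" "stat -c '%y' $STATE_DIR/classes.txt 2>/dev/null || echo 'no recent state'"
    done

If those timestamps are fresh but the alerts persist, the problem is on the trap/Nagios side rather than on the agents, which is the conclusion the conversation above is converging on.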
[17:42:49] paravoid: do you know how the passive check is involved? [17:43:25] no [17:43:55] we could probably adjust the passive check for these hosts in particular [17:44:49] paravoid, re the stafford is bogged down problem [17:45:13] would it be difficult to spawn up a second puppetmaster machine and have half of our nodes point at stafford, and the other half at the new one? [17:45:30] it needs someone to do it [17:45:39] as is stafford, to see why it's slow [17:45:53] happy to volunteer some time for that in a week or two [17:46:08] let's bring it up on the next meeting [17:46:15] ok cool [17:46:27] Ryan_Lane was also interested in doing that but said he was too busy [17:46:42] there are probably usual ways that people scale puppetmaster that I don't know about [17:46:57] it's usually more workers [17:46:59] paravoid had some good ideas [17:47:20] but I think before we scale it we should figure out why it's getting slower [17:47:25] also I've noticed a pattern [17:47:35] it works fine, and then at times it's getting stuck [17:47:43] for a while, then unstuck after that [17:47:50] maybe we're not distributing the load evenly, dunno [17:48:13] yeah [17:48:23] yeah could be buncha machines just happen to run at once [17:48:24] someone just needs to spend time to analyze this and figure it all out [17:49:11] aye [17:49:39] as for why nagios is notifying so aggressively for these--the passive check has a notification_interval = 0 [17:49:41] interesting, i think that would be fun, will try to find some time for that sometime if no one else does, yeah lets talk about it at next meeting fo sho [17:50:27] Jeff_Green: 0 is one time only iirc [17:50:32] haven't played with nagios in a while though [17:50:37] i haven't either [17:50:54] and I'm not having much luck finding the documentation :-P [17:52:19] paravoid: looks like the units are minutes [17:52:59] ah 0 is indeed first only [17:53:59] ok, puppet on analytics1001 seems better [17:54:01] just got this: [17:54:02] notice: /Stage[main]/Base::Puppet/Exec[puppet snmp trap]/returns: executed successfully [17:54:05] sooo, that's good [17:55:17] oh dear [17:55:30] # grep -c 'Puppet freshness' /etc/nagios/puppet_checks.d/analytics1016.cfg [17:55:33] 7 [17:55:35] New patchset: Reedy; "(bug 43310) Import sources for it.wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/40563 [17:55:44] hahahaha [17:55:49] has anyone looked at why jenkins is failing all puppet changes? [17:56:03] no [17:56:14] :( hashar isn't here [17:56:21] ^demon was looking for him too [17:56:33] that last one by Reed passed jenkins though [17:56:40] just everything in operations/puppet ? [17:56:45] oh a mutante [17:56:52] hey paravoid:) [17:56:57] hey :) [17:57:04] do you have some time to have a look at nagios? [17:57:05] hmmm interesting, why so many puppet-FAILs? [17:57:34] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/40563 [17:57:34] I'm swamped with stuff [17:57:41] ottomata: puppet has a bad habit of encrufting the nagios config [17:57:42] Ευτυχισμένο το Νέο Έτος! (Happy New Year!) [17:57:56] I didn't realise mutante was also Greek ;) [17:57:59] haha [17:58:07] you too! [17:58:12] paravoid: about analytics checks? [17:58:13] don't know how to say that in German :P [17:58:17] yeah [17:58:31] I've stopped nagios-wm as it wasn't very useful [17:58:46] alright, looking..
i am just surprised anything changed..it does not seem like we touched it recently [17:59:00] New patchset: Reedy; "(bug 29692) Per-wiki namespace aliases shouldn't override (remove) global ones" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/25737 [17:59:14] i had a suspicion it's because Nagios keeps being restarted for some reason [17:59:30] oh is it? [17:59:55] New patchset: Reedy; "(bug 43411) Import sources for se.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/40561 [17:59:59] anyway I have to get back to ceph [18:00:04] just a wild guess that may explain why it keeps re-checking [18:00:05] ok [18:00:10] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/40561 [18:00:21] spend all night troubleshooting a bug with upstream [18:00:22] try this: [18:00:27] grep service_description *|sort|uniq -c|sort -n [18:00:43] it has 12 defs for checking ssh on db62 [18:00:46] oh, do we have tons of duplicate checks again? [18:00:51] yes [18:00:52] New patchset: Reedy; "(bug 40879) Enable Collection on ba.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41809 [18:00:57] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41809 [18:01:13] puppet will recreate all this if we move it aside, right? [18:01:22] paravoid, thanks for your help with that, sorry for the puppet annoyance [18:02:13] New patchset: Reedy; "Added Babel category names for Ossetian Wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41967 [18:02:19] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41967 [18:03:20] New patchset: Reedy; "(bug 42721) tr.wikisource has been renamed into Vikikaynak" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41813 [18:03:24] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41813 [18:05:32] * Nemo_bis relieved by rebase not failing [18:05:33] !log reedy synchronized wmf-config/InitialiseSettings.php [18:05:43] Logged the message, Master [18:06:11] ok. it's time to rewrite the git-deploy perl script [18:06:35] Ryan_Lane, that hopeless?:) [18:06:46] it works to a point [18:07:00] but I want to separate the fetch and checkout parts of deployment [18:07:07] and I can't do that with this script [18:07:48] sometimes fetch can take 2-3 minutes [18:08:37] Jeff_Green: so i hear db62 is decom'ed [18:08:48] Icinga does not have the duplicate check issue ,btw [18:08:59] because there we put all services in one file again [18:09:12] Nagios has that issue since it was changed to have a separate file for each host [18:09:34] mutante: yeah. i noticed the other day that puppet (?) keeps remaking the config for storage3 even though storage3 has been long removed from site.pp etc [18:10:05] i couldn't get motivated to dig in and figure out why it is so broken [18:10:27] New patchset: Reedy; "(bug 42933) Initial configuration for es.wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38054 [18:11:23] yea, the thing is just that this is not a new thing. we can just wipe configs and let puppet recreate them once [18:11:42] and/or switch to icinga ..or switch Nagios back to using just one large file [18:11:49] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38054 [18:12:31] what's the holdup for icinga? 
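Extending the grep above, a quick sketch for listing which generated host files have accumulated duplicate freshness definitions; it assumes the /etc/nagios/puppet_checks.d layout mentioned earlier:

    # Count 'Puppet freshness' service definitions per generated host file
    # and print only the files that define the check more than once.
    cd /etc/nagios/puppet_checks.d || exit 1
    for f in *.cfg; do
        n=$(grep -c 'Puppet freshness' "$f")
        [ "$n" -gt 1 ] && echo "$f: $n definitions"
    done

The same loop works for any service_description that keeps getting re-appended, such as the duplicated ssh check on db62 noted above.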
[18:12:46] LeslieCarr: around? [18:12:53] mutante: is there some other data store? i couldn't see any reason puppet would recreate a file for a host that's been removed from git/puppet [18:13:05] LeslieCarr: I've started connecting to bast1001 via ipv6, as I get better latency [18:13:33] LeslieCarr: and since then, I've noticed interruptions every now and then (hours) [18:13:49] LeslieCarr: mtr shows reaching xe-4-0-0.was10.ip6.tinet.net but then nothing after that (it lasts for a minute or two) [18:14:14] paravoid: just the NRPE package thing afaik [18:14:22] but Leslie built a package yesterday [18:14:39] New patchset: Reedy; "Set timezone for eswikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42101 [18:14:50] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42101 [18:15:13] yeah, I'm not very fond of that solution :) [18:19:41] so hm, paravoid, puppet runs just fine against stafford on everything now (except for taking a long time) [18:19:43] i even see [18:19:44] notice: /Stage[main]/Base::Puppet/Exec[puppet snmp trap]/returns: executed successfully [18:19:51] but I don't see the status changing in nagios [18:19:58] oh you stopped nagios-wm, is that why? [18:20:44] ottomata: which host? [18:20:56] hm analytics1003 [18:21:00] is the one i'm trying now [18:21:12] here's the direct link http://nagios.wikimedia.org/nagios/cgi-bin/status.cgi?host=analytics1003 [18:21:23] so not related to missing IRC bot [18:21:33] hrmmm [18:21:37] oh nagios-wm is the bot, duh ok [18:21:42] yea [18:22:09] just noticing another issue.. see how Packetloss Average is STALE [18:22:12] yeah, and I think if I run the snmptrap manually, it fixes [18:22:16] yeah, that is on my list of todos to fix [18:22:20] I just added that [18:22:32] took a while to figure out how to get it in nagios (never done that before) [18:22:35] New patchset: Dereckson; "(bug 43310) Import sources for it.wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/40565 [18:22:39] but, it won't get the correct value from ganglia [18:22:51] but, that's unrelated to this problem [18:23:47] i'm running the snmptrap command manually now [18:26:02] LeslieCarr, if you get any emails complaining about me impersonating bugzilla accounts please let me know (I believe you look after the noc@ mailbox) [18:26:20] ottomata: gotcha, alright [18:27:43] ottomata: and its fixed [18:27:50] puppet freshness on 1003 that is [18:28:10] will do [18:28:18] impersonating bugzilla ? [18:28:23] Thehelpfulone, I've just received one [18:28:26] oh, what was it? [18:28:41] hmm, i see it still as critical, but now it says for 1m [18:28:50] mutante^ [18:28:50] ottomata: i dunno, just being slow [18:28:56] hm [18:28:57] MaxSem, yep, bugzilla will send it to you because I sudoed your account and adjusted your preferences per https://bugzilla.wikimedia.org/show_bug.cgi?id=43610 [18:29:02] so, when I run the snmptrap command manually [18:29:03] ugh, it just switched back ..damn [18:29:13] 'Last Update' is updated [18:29:18] but, the status does not change [18:29:42] but that email also says " If you feel that this action was inappropriate, please contact [18:29:42] noc@...." so just in case anyone does feel it's inappropriate :) [18:29:57] okay :) [18:30:06] yeah, some of my saved searches are popular [18:30:47] mutante, do you know how nagios decides that this check is critical? is it the freshness_threshold 36000 ?
[18:30:56] ottomata: let me delete the entire analytics1003.cfg and wait for recreation [18:30:59] ok [18:31:21] ottomata: yes, if it does not receive anything at all within that time it turns critical... [18:31:35] or if it explicitly receives a CRIT [18:31:53] but it obviously is receiving something, since Last Update changes whenever I run the snmptrap command [18:32:18] ottomata: what commandline do you use [18:32:32] snmptrap -v 1 -c public nagios.wikimedia.org .1.3.6.1.4.1.33298 `hostname` 6 1004 `uptime | awk '{ split($3,a,":"); print (a[1]*60+a[2])*60 }'` [18:33:07] not sure what is up with that split, since the 3rd field in uptime is just the number of days [18:33:10] but it works [18:33:21] $ uptime | awk '{ split($3,a,":"); print (a[1]*60+a[2])*60; }' [18:33:21] 374400 [18:34:19] !log reedy synchronized php-1.21wmf6/extensions/WikimediaMaintenance/ [18:34:28] Logged the message, Master [18:35:48] !log restarting nagios on spence [18:35:57] Logged the message, Master [18:35:58] New patchset: Ryan Lane; "Fetch all submodules during the fetch stage" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42106 [18:36:17] Can someone please run something on fenari please? [18:36:17] chmod g+x /home/wikipedia/common/refresh-dblist [18:37:01] Reedy: done [18:37:20] it's tstarling:wikidev [18:37:28] Yup, but I can't run it :( [18:37:30] chmod g+w /home/wikipedia/common/all.dblist [18:37:32] ^ that too [18:37:47] -rw-r--r-- 1 dzahn wikidev 9550 Nov 9 19:45 all.dblist [18:37:49] * Reedy glares at mutante [18:37:50] ;) [18:38:23] g+w done [18:38:32] thanks [18:38:35] np [18:39:19] ottomata: now 1003 just has the Packetloss Average check.. this is just all slow.. checking again in a bit [18:39:31] ..and let's switch to Icinga [18:39:38] we've had this issue forever [18:40:22] ha ok [18:44:33] mutante, should we clear out all of the analytics files then? [18:45:04] ottomata: yes, doing it [18:45:41] spence:/etc/nagios/puppet_checks.d# rm analytics10* [18:45:47] and restarted once again [18:45:48] should we clear them all? [18:45:50] New patchset: Reedy; "Add eswikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42107 [18:46:40] Jeff_Green: we can, but no monitoring then until it is done recreating everything [18:46:44] which might be a while [18:47:08] hmm.
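On the awk question above: the third field of uptime changes meaning depending on how long the box has been up (a day count once it passes a day, an HH:MM value before that), which is why splitting it on ':' looks wrong yet still prints a number. A more robust source for the uptime-in-seconds value fed to that trap, offered only as a sketch rather than the deployed command, is /proc/uptime:

    # /proc/uptime holds "<seconds up> <seconds idle>"; take the integer part
    # of the first field instead of parsing the human-readable uptime output.
    UPTIME_SECS=$(awk '{ print int($1) }' /proc/uptime)
    # Reusing the trap invocation quoted above with that value:
    snmptrap -v 1 -c public nagios.wikimedia.org .1.3.6.1.4.1.33298 "$(hostname)" 6 1004 "$UPTIME_SECS"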
i guess we could do it in batches [18:47:16] !log reedy synchronized wmf-config/ [18:47:25] Logged the message, Master [18:49:13] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: [18:49:22] Logged the message, Master [18:50:35] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42107 [18:50:57] New patchset: Reedy; "(bug 42934) Initial configuration for pt.wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38057 [18:51:00] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: [18:51:08] Logged the message, Master [18:51:09] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38057 [18:53:29] !log reedy synchronized wmf-config/ [18:53:37] Logged the message, Master [18:54:47] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: [18:54:56] Logged the message, Master [18:55:40] New patchset: Reedy; "Add ptwikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42108 [18:56:22] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42108 [18:58:38] New patchset: Reedy; "Fix PHP notice of undefined constant DBO_DEFAULT when using addwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42109 [18:58:53] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42109 [18:59:17] !log reedy synchronized wmf-config/db.php [18:59:25] Logged the message, Master [19:00:18] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [19:00:46] New review: Reedy; "Needs rebasing" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/38801 [19:04:20] !log reedy synchronized php-1.21wmf7/extensions/WikimediaMaintenance/ [19:04:29] Logged the message, Master [19:04:35] New patchset: Ryan Lane; "Fetch all submodules during the fetch stage" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42106 [19:07:22] PROBLEM - Puppet freshness on solr2 is CRITICAL: Puppet has not run in the last 10 hours [19:08:07] anyone doing a super important puppet manual run ? 
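The fetch/checkout split described above reduces to two plain git phases on each target: pull the objects down ahead of time (slow, but safe to do early) and then flip the working tree to the new revision (fast). A rough sketch of that idea, with the repository path and target revision as hypothetical placeholders rather than anything taken from the real deployment tooling:

    # Phase 1: fetch objects well before the deploy window (this is the part
    # that can take 2-3 minutes).
    cd /srv/deployment/example-repo || exit 1   # hypothetical path
    git fetch --all --quiet
    git submodule foreach --quiet 'git fetch --all --quiet'

    # Phase 2: at deploy time, switch the working tree to the agreed revision.
    TARGET_REV=origin/master                    # hypothetical target
    git checkout --quiet --force "$TARGET_REV"
    git submodule update --init --quiet

Splitting the phases this way keeps the slow network step out of the critical path, which is the motivation given above for rewriting the git-deploy script.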
[19:08:12] LeslieCarr: notpeter: and this is why puppet breaks on neon: [19:08:13] err: Could not retrieve catalog from remote server: Could not intern from pson: regexp buffer overflow [19:08:16] if not, i'm going to restart puppetmasterd in 10 minutes [19:08:21] regexp buffer overflow ?:o dang [19:08:30] it's had several different errors [19:08:36] err: Could not retrieve catalog from remote server: Connection reset by peer - SSL_connect [19:08:40] err: Could not retrieve catalog from remote server: Connection timed out - connect(2) [19:08:42] :) [19:08:52] heh, thats what Ryan posted yesterday too [19:09:18] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [19:09:46] New patchset: Dereckson; "Initial configuration for as.wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38801 [19:09:56] ok, maybe in just 3 minutes [19:10:12] New patchset: Reedy; "Ensure list is sorted before writing (all.dblist wasn't)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42110 [19:10:35] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42110 [19:10:46] New patchset: Reedy; "Initial configuration for as.wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38801 [19:10:54] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38801 [19:12:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:12:25] !log restarting puppetmasterd on stafford [19:12:32] (that wasn't me actually, i haven't done it yet) [19:12:35] Logged the message, Mistress of the network gear. [19:13:07] !log reedy synchronized wmf-config/ [19:13:16] Logged the message, Master [19:13:20] rsyslogd-2177: imuxsock begins to drop messages from pid 19445 due to rate-limiting [19:13:40] New patchset: Silke Meyer; "corrected typo related to https://gerrit.wikimedia.org/r/#/c/36353/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42111 [19:14:07] New patchset: MaxSem; "Enable Solr replication" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42112 [19:14:13] neon icinga: error executing command '/usr/lib/nagios/plugins/check_ganglios_generic_value': No such file or directory. [19:14:37] New review: MaxSem; "Should be deployed at a scheduled time." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/42112 [19:14:51] New patchset: Reedy; "Add aswikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42113 [19:14:59] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42113 [19:15:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 The environment must be purely alphanumeric, not - 283 bytes in 0.374 seconds [19:16:07] New review: Silke Meyer; "Hi Andrew, in your last changes with the ${install_path}, you introduced a typo corrected here." 
[operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/42111 [19:16:14] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: [19:16:28] Logged the message, Master [19:18:01] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: remove aswikisource temporarily [19:18:10] Logged the message, Master [19:19:16] PROBLEM - Puppet freshness on solr3 is CRITICAL: Puppet has not run in the last 10 hours [19:19:16] PROBLEM - Puppet freshness on solr1003 is CRITICAL: Puppet has not run in the last 10 hours [19:19:27] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: and aswikisource back again [19:19:36] Logged the message, Master [19:20:20] PROBLEM - Puppet freshness on solr1001 is CRITICAL: Puppet has not run in the last 10 hours [19:20:37] PROBLEM - SSH on analytics1007 is CRITICAL: Connection refused [19:24:45] New review: Andrew Bogott; "This is obviously correct, despite Jenkins' complaint." [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/42111 [19:25:29] New patchset: Reedy; "Alphasort all.dblist in refresh-dblist" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42114 [19:25:41] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42114 [19:25:42] New patchset: Dereckson; "Clean wikivoyage namespaces configuration" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38063 [19:25:51] New patchset: Ryan Lane; "Followup on 6782d9a, use include - not import" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42115 [19:26:08] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42106 [19:26:19] mutante I just noticed that my reply from RT doesn't go to you. can you give my account "Cmcmahon(WMF)" edit privs on wikitech wiki also? [19:26:21] LeslieCarr: https://gerrit.wikimedia.org/r/#/c/42115/ [19:26:59] thank you ryan :) [19:27:03] yw [19:27:05] chrismcmahon: you should have it, i added you to "editors" right away [19:27:16] chrismcmahon: and replying to mail should also work..hrmmm [19:27:26] I've made that mistake before too. puppet's syntax is dumb [19:27:44] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42115 [19:28:17] chrismcmahon: it is just "cmcmahon" though.. did not add (WMF) [19:28:39] chrismcmahon: wikitech is not connected to central auth, so that accounts is all separate from other wikis [19:28:43] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [19:29:00] mutante: "Cmcmahon(WMF)" is my work handle, "Cmcmahon" is my old wikipedia handle [19:29:21] chrismcmahon: well, i created a new account called "cmcmahon" [19:29:32] do you want that to be renamed? [19:29:50] mutante: can I have both? and have edit privs on both? [19:30:28] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/38063 [19:30:29] both is silly [19:30:31] eh..why both? 
[19:31:09] New patchset: Andrew Bogott; "corrected typo related to https://gerrit.wikimedia.org/r/#/c/36353/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42111 [19:31:13] wikitech is not an open wiki for anyone to edit [19:31:16] PROBLEM - Puppet freshness on sockpuppet is CRITICAL: Puppet has not run in the last 10 hours [19:31:17] PROBLEM - Puppet freshness on tin is CRITICAL: Puppet has not run in the last 10 hours [19:31:19] OK, then I'd prefer "Cmcmahon(WMF)" to prevent any confusion with the other one. [19:31:27] mutante ^^ [19:31:36] so yea, just work handle is all thats needed. [19:31:38] New review: Reedy; "Needs rebasing" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/33390 [19:32:01] New patchset: Reedy; "(bug 29692) Per-wiki namespace aliases shouldn't override (remove) global ones" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/25737 [19:32:28] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.006 second response time on port 11000 [19:32:30] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42111 [19:33:29] renaming is a pain it seems. [19:33:44] chrismcmahon: it appears we dont have extension Renameuser there [19:34:28] i think the answer is going to become 'its how it is and thats all it is' [19:34:36] cuz spending time hacking at wikitech is painful ;P [19:34:42] * RobH isnt doing it so not his call [19:39:08] New patchset: MaxSem; "Use FQDN for Solr replication" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/40569 [19:39:27] chrismcmahon: i tried to rename it ..but you already had both ..so you did not need a new one and now there might be confusion because of that :p [19:39:52] Cmcmahon(WMF) (Created on 27 December 2012 at 19:23) [19:39:54] New patchset: MaxSem; "Enable Solr replication" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42112 [19:39:56] mutante: heh. all I really want is to be able to edit as Cmcmahon(WMF) [19:40:18] how did you get that other one on Dec 27? [19:40:40] sigh..ok.. tickets please to avoid this [19:41:18] changed group membership for User:Cmcmahon(WMF) from (none) to editor [19:41:21] done [19:41:33] thanks! [19:42:31] PROBLEM - NTP on analytics1007 is CRITICAL: NTP CRITICAL: No response from NTP server [19:44:17] i am going to disable the new one then. [19:47:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:49:01] !log temp stopping puppet on brewster [19:49:10] Logged the message, notpeter [19:53:47] * paravoid puts rt hat on [19:53:52] notpeter: want to take #3738? [19:54:41] paravoid: I think that's already done [19:54:50] I'm troubleshooting monitoring for them [19:55:21] that's about replication [19:55:29] New patchset: Bsitu; "Enable email batch for Echo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42121 [19:57:47] sure, i can help coordinate that [19:57:48] notpeter: see MaxSem's latest reply [19:57:49] Change merged: Bsitu; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42121 [19:58:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.373 seconds [20:01:56] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Enable email batch for Echo' [20:02:01] !log authdns-update for iodine [20:02:04] Logged the message, Master [20:02:13] Logged the message, RobH [20:09:24] notpeter, when we can schedule it? 
[20:10:47] Jenkins failed the deployment patch: https://gerrit.wikimedia.org/r/#/c/42119/, but the failure doesn't seem to be related to the patch, what can I do to get rid of the error? thx! [20:26:09] Anyone know what wg_enwiki is? [20:26:33] Newest revision timestamp is 20080811153730 [20:27:19] oh, working group [20:27:21] .. [20:31:09] New patchset: Reedy; "Remove deleted strategyappswiki from special.dblist" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42123 [20:31:46] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42123 [20:32:51] New patchset: Reedy; "Alphasort special.dblist too" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42124 [20:33:15] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42124 [20:35:10] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:38:15] !log reedy synchronized php-1.21wmf6/cache/interwiki.cdb 'Updating interwiki cache' [20:38:23] Logged the message, Master [20:38:32] !log reedy synchronized php-1.21wmf7/cache/interwiki.cdb 'Updating interwiki cache' [20:38:41] Logged the message, Master [20:49:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.043 seconds [20:50:37] RECOVERY - Varnish HTTP upload-frontend on cp1026 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 1.658 seconds [20:53:19] RECOVERY - Varnish HTCP daemon on cp1026 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker [20:54:38] LeslieCarr: Not a big deal.. But doing traceroute I'm seemingly getting a hop (presumably a WMF switch) where it's timing out/not responding to ping. http://p.defau.lt/?bxsjA_TtD7X8YWMqBDJCvQ [20:54:45] s/switch/router/ [20:56:01] PROBLEM - Varnish HTTP upload-frontend on cp1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:57:32] RECOVERY - Varnish HTTP upload-backend on cp1026 is OK: HTTP OK HTTP/1.1 200 OK - 634 bytes in 0.055 seconds [20:57:49] RECOVERY - Varnish HTTP upload-frontend on cp1026 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.054 seconds [20:58:00] anyone knows what's up with that? [20:58:23] (cp1026) [20:58:43] PROBLEM - Varnish HTCP daemon on cp1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:00:13] RECOVERY - Varnish HTCP daemon on cp1026 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker [21:00:14] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa [21:00:28] load avg 840 [21:02:55] PROBLEM - Varnish HTTP upload-backend on cp1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:03:04] PROBLEM - Varnish HTTP upload-frontend on cp1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:03:07] paravoid: re cp1026, is it still lucid? all the varnish hosts need to be upgraded to the latest version which has only been built for precise [21:03:20] oh nm, 1026 is precise [21:03:25] it's precise [21:03:26] yeah [21:03:37] it's swapping [21:05:38] PROBLEM - Varnish HTCP daemon on cp1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:05:55] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:06:23] please don't be shm_reclen, please don't be shm_reclen, please don't be shm_reclen.. [21:06:39] haha [21:06:43] * Reedy pets ori-l [21:07:23] binasher: are you investigating it? 
[21:08:00] paravoid: no [21:09:06] completely unresponsive [21:09:18] power cycle? [21:09:27] oh it's back again [21:10:43] binasher: want to investigate or should I just powercycle or restart varnish? [21:11:33] paravoid: go for it [21:12:13] PROBLEM - SSH on cp1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:12:21] i got a shell but it isn't responding at all, and we've got a meeting [21:12:34] yeah same here [21:13:52] RECOVERY - SSH on cp1026 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [21:13:56] !log powercycling cp1026, load spike, unresponsive [21:14:00] seriously? [21:14:05] Logged the message, Master [21:14:17] a second after I hit enter [21:14:19] amazing [21:16:02] AHHA!!!! i think nrpe.pp lines 160 to 165 caused the problem on neon [21:16:08] there was a 161 require => Package[nagios-nrpe-server], [21:16:15] and i had switched it to icinga-nrpe-server [21:16:25] what credentials should i be using to access ganglia ? [21:17:01] RECOVERY - Varnish HTTP upload-backend on cp1026 is OK: HTTP OK HTTP/1.1 200 OK - 632 bytes in 0.053 seconds [21:17:13] New patchset: Lcarr; "for new icinga server, icinga-nrpe-server instead of nagios-nrpe-server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42129 [21:18:04] RECOVERY - Varnish HTCP daemon on cp1026 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker [21:18:12] mutante: --^ you were on the ganglia thread last. what credentials can i use to access stats ? [21:18:15] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42129 [21:18:20] tfinc: hold on , i'll pm [21:18:24] thanks [21:19:39] AaronSchulz: Know of any scripts that use activeMWVersions currently? [21:21:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:24:04] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 186 seconds [21:24:08] aha, found in scap, I think [21:24:56] git review is so slooooow [21:24:58] New patchset: RobH; "yttrium and tarin were decom for ip change, removing as both will go back in service" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42132 [21:25:00] there we go [21:25:16] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 211 seconds [21:26:10] New review: RobH; "" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/42132 [21:26:12] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42132 [21:27:31] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds [21:28:43] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds [21:30:16] !log reinstalling tarin to internal ip for secondary poolcounter server in tampa [21:30:24] !log reedy synchronized php-1.21wmf6/cache/interwiki.cdb 'Updating 1.21wmf6 interwiki cache' [21:30:26] Logged the message, RobH [21:30:35] Logged the message, Master [21:30:42] !log reedy synchronized php-1.21wmf7/cache/interwiki.cdb 'Updating 1.21wmf7 interwiki cache' [21:30:51] Logged the message, Master [21:31:28] LeslieCarr: I think you owe yourself $5, then [21:31:59] hehehe [21:32:01] true! 
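The nrpe.pp find above is the usual rename hazard: one require => Package[...] still pointed at the old package name after the switch to icinga-nrpe-server. A repo-wide grep catches stragglers before (or after) a rename like that; a small sketch assuming a local checkout of operations/puppet at a placeholder path:

    # List every remaining reference to the old package name in the manifests
    # and templates; the checkout path is a placeholder.
    cd ~/operations-puppet || exit 1
    grep -rn --include='*.pp' --include='*.erb' 'nagios-nrpe-server' . \
        || echo "no stale references found"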
[21:32:03] :) [21:32:12] well let's see if puppet finally runs first [21:32:46] stafford is currently in its "oh my god world ending" mode [21:36:22] New patchset: Reedy; "Add script to update the interwiki cache for all active MediaWiki versions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42133 [21:37:07] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.042 seconds [21:42:52] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa [21:43:54] RECOVERY - Varnish HTTP upload-frontend on cp1026 is OK: HTTP OK HTTP/1.1 200 OK - 641 bytes in 0.056 seconds [21:45:43] RECOVERY - Puppet freshness on sockpuppet is OK: puppet ran at Thu Jan 3 21:45:14 UTC 2013 [21:45:54] hrm didn't actually fix it [21:46:04] might be due to the notify => Service nagios-nrpe-server perhaps [21:46:18] LeslieCarr: did you see my mail? [21:46:33] no need to have icinga-nrpe or whatever there [21:46:50] it's just a matter of wget && reprepro include [21:47:03] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [21:47:22] ah, new packages [21:47:27] ? [21:47:51] woot [21:48:46] New patchset: Pyoungmeister; "setting db61 as fake shard es2 master, and db62 as fake shard es2 slave" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42135 [21:51:21] we're talking about nrpe-plugin though, not nrpe-server [21:51:22] right? [21:53:11] server [21:53:25] New patchset: Lcarr; "once more, with feeling" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42136 [21:53:33] but it looks like server for quantal doesn't have that dependency [21:53:36] so trying that out :) [21:54:12] yay, the apt-get -s install appeared to be happy ... [21:55:30] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42136 [21:58:56] what ? [21:59:01] why is it unhappy now ?!?! [21:59:36] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42135 [22:00:35] i have to give my $5 back to myself [22:03:13] rfaulkner: so, we're considering using git-deploy for the next mw release. I'm at the point where I really think the perl script needs to be re-written before we proceed [22:04:41] hey. ok, are you in the office? It's still on my radar I've been involved in quite a bit of E3 work of late [22:05:11] not in the office today [22:05:12] will be tomorrow [22:05:21] when is the next mw release scheduled? [22:05:24] k [22:05:26] the 12th [22:05:45] I'll be able to work on it with you [22:06:28] k, yeah i think that would be helpful, I've got plenty of stuff on my plate, but i think we can hash it out in a couple of days [22:07:16] * Ryan_Lane nods [22:07:29] it shouldn't be amazingly hard to write [22:08:15] New patchset: Ryan Lane; "Move deploy-info to a non-root path" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42140 [22:08:47] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42140 [22:10:32] i give up, i have no idea what is wrong [22:10:45] LeslieCarr: with? 
[22:10:53] neon's puppet manifest [22:10:56] ah [22:11:04] it's still giving me that damn error [22:11:11] the catalog is compiled fine according to stafford [22:11:25] it's very likely to be an issue with a template [22:11:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:11:49] puppet's reporting on that is past abysmal and it won't report until after the catalog is compiled [22:12:07] i even changed the nrpe-server package back to one with the same name [22:12:09] it's the same as it was before [22:12:34] usually it's a variable that's defined twice, or a hash that isn't referenced properly [22:17:06] i've checked all the variables - there are like 3 [22:17:18] :( [22:17:19] they're all there, and there are equivalents in nagios and spence is happy [22:17:38] i did find we had one template put in two different directories, so at least i can remove the extra one [22:17:44] is it possible that a local variable has the same name as a global variable? [22:17:52] like one set via facter or at the node level? [22:18:33] !log reprepro includedeb (lucid|precise)-wikimedia mha4mysql-manager_0.55-0_all.deb mha4mysql-node_0.54-0_all.deb [22:18:41] Logged the message, Master [22:18:53] mha4? [22:19:21] is this some form of mysql cluster? [22:19:31] no [22:19:33] nagios_mysql_check_pass and <%= scope.lookupvar("nrpe::packagesnew::nrpe_allowed_hosts") %> are the only two [22:19:34] nope [22:19:41] LeslieCarr: :( [22:19:42] this is a single server, that was happy until yesterday [22:19:53] Ryan_Lane: I assume http://code.google.com/p/mysql-master-ha/ [22:20:11] https://github.com/yoshinorim/mha4mysql-manager ? [22:20:41] cool [22:23:57] RECOVERY - Puppet freshness on tin is OK: puppet ran at Thu Jan 3 22:23:48 UTC 2013 [22:25:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.026 seconds [22:31:20] mutante / notpeter can you check out neon and see if there's something i am not seeing that is why puppet is crashing ? [22:31:30] sure [22:31:48] what error are you getting? [22:33:13] New patchset: Aaron Schulz; "Switched captchas back to swift." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42143 [22:33:37] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42143 [22:34:00] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours [22:34:45] !log aaron synchronized wmf-config/CommonSettings.php 'switched captchas back to swift' [22:34:53] Logged the message, Master [22:41:57] going to run scap [22:43:08] binasher: what version of redis are we using? [22:44:48] paravoid: you are my last hope - can you see anything wrong with neon's manifest ?
[22:44:53] anything at all that would cause it to be insane [22:50:46] New patchset: Pyoungmeister; "removing cruft from previous test" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42148 [22:51:36] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42148 [22:52:10] New patchset: RobH; "tarin migrated to pmtpa.wmnet and reinstalled" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42149 [22:52:52] mutante: ^ no more self review ;] [22:53:58] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42149 [22:55:35] Ryan_Lane: https://github.com/wikimedia/Sartoris/pull/2 [22:56:07] this is informed from your email about the specs from last month [22:58:17] cool. looks good [22:59:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:59:20] !log bsitu Started syncing Wikimedia installation... : Update Echo to Master [22:59:28] Logged the message, Master [23:02:39] !log authdns-updateexit [23:02:42] bleh [23:02:48] Logged the message, RobH [23:10:55] !log olivneh synchronized php-1.21wmf6/extensions/EventLogging 'Updating display of rev IDs in schema subtitles' [23:11:04] Logged the message, Master [23:16:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.042 seconds [23:20:03] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [23:20:03] PROBLEM - Puppet freshness on silver is CRITICAL: Puppet has not run in the last 10 hours [23:31:45] New patchset: Ryan Lane; "Add the sync_all salt state" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42200 [23:36:15] New patchset: Ryan Lane; "Reset .gitmodules before attempting a fetch" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42201 [23:36:19] New patchset: Asher; "node def for pc100[1-3]" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42202 [23:37:32] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42202 [23:47:18] New patchset: Ryan Lane; "Clone a repo if non-existent during fetch" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42205 [23:48:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:49:19] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42200 [23:49:39] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42201 [23:50:50] Thu Jan 3 23:48:45 UTC 2013 mw37 ptwikivoyage Error selecting database ptwikivoyage on server 10.0.6.74 [23:51:14] ?! [23:56:49] Reedy: ping [23:57:10] o_0 [23:57:26] It's there on db34 [23:57:38] 18:53 logmsgbot: reedy synchronized wmf-config/ .. 18:59 logmsgbot: reedy synchronized wmf-config/db.php [23:57:46] was db.php on fenari out of sync with git? [23:58:02] I don't think so.. [23:58:08] during addWiki there was a php notice [23:58:18] So I just shut that up with adding the if !defined define() [23:58:47] maybe it's not defined :D [23:59:48] That db host doesn't have the db..
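A quick way to confirm the diagnosis at the end, i.e. whether the database really exists on the host the config points at, is to ask that server directly; a sketch with placeholder credentials:

    # Host and database name are taken from the error above; the credentials
    # are placeholders for whatever the config normally uses.
    mysql -h 10.0.6.74 -u "$DB_USER" -p"$DB_PASS" \
        -e "SHOW DATABASES LIKE 'ptwikivoyage'"

An empty result would mean the new wiki's database still needs to be created on that host, matching "That db host doesn't have the db".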