[00:03:23] (CR) CSteipp: "Should be fine" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153732 (https://bugzilla.wikimedia.org/61799) (owner: Spage)
[00:46:51] PROBLEM - puppet last run on mw1026 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:05:51] RECOVERY - puppet last run on mw1026 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[01:30:17] (CR) Ori.livneh: "IMO, this is risky and unnecessary, but it's your call to make." [operations/puppet] - https://gerrit.wikimedia.org/r/153577 (owner: Giuseppe Lavagetto)
[01:41:38] (PS4) Ori.livneh: wmflib: add ordered_yaml() [operations/puppet] - https://gerrit.wikimedia.org/r/149775
[01:41:40] (PS2) Ori.livneh: Clean up salt::minion [operations/puppet] - https://gerrit.wikimedia.org/r/153727
[01:42:06] (PS5) Ori.livneh: wmflib: add ordered_yaml() [operations/puppet] - https://gerrit.wikimedia.org/r/149775
[01:42:13] (PS3) Ori.livneh: Clean up salt::minion [operations/puppet] - https://gerrit.wikimedia.org/r/153727
[01:42:28] (PS2) Springle: tendril: add access for ldap groups ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/153116 (owner: JanZerebecki)
[01:49:46] (CR) Springle: [C: 2] tendril: add access for ldap groups ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/153116 (owner: JanZerebecki)
[02:18:31] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 00:17:37 UTC
[02:28:12] (CR) MZMcBride: "Huh, "parametrize" is the British spelling; I nearly changed it to "parameterize." Thanks, Wiktionary!" [operations/puppet] - https://gerrit.wikimedia.org/r/153727 (owner: Ori.livneh)
[02:29:09] Carmela: Wait, the British spell something with -ize as opposed to -ise? I'm shocked
[02:29:34] Oh, it's just the alternate form.
[02:29:39] https://en.wiktionary.org/wiki/parametrize
[02:29:48] I can't read.
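The "Puppet freshness" alerts in this log fire roughly two hours after the last successful run (db1009: last run 00:17:37, alert at 02:18:31; db1011: last run 01:13:27, alert at 03:14:31). The sketch below models such a staleness check in Python. The two-hour threshold is only inferred from these timestamps, not taken from the real Icinga configuration, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold, inferred from the alert timestamps in this log.
FRESHNESS_THRESHOLD = timedelta(hours=2)

def parse_last_run(text):
    """Parse a timestamp such as 'Wed 13 Aug 2014 00:17:37 UTC' as UTC."""
    naive = datetime.strptime(text, "%a %d %b %Y %H:%M:%S UTC")
    return naive.replace(tzinfo=timezone.utc)

def is_stale(last_run, now, threshold=FRESHNESS_THRESHOLD):
    """Return True when the last successful run is older than the threshold."""
    return now - last_run > threshold

last = parse_last_run("Wed 13 Aug 2014 00:17:37 UTC")
alert_time = datetime(2014, 8, 13, 2, 18, 31, tzinfo=timezone.utc)
print(is_stale(last, alert_time))  # True: about 2h01m since the last run
```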
[02:30:01] Oh it's the alternate US form
[02:30:31] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:30:54] (CR) MZMcBride: "Err, alternate spelling, not British spelling. Roan points out that the British alternate spelling wouldn't use -ize, of course." [operations/puppet] - https://gerrit.wikimedia.org/r/153727 (owner: Ori.livneh)
[02:31:53] Alternate spelling v. alternate form. Tricky.
[02:32:31] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[02:41:39] !log LocalisationUpdate completed (1.24wmf15) at 2014-08-13 02:40:36+00:00
[02:41:46] Logged the message, Master
[02:45:31] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 31 data above and 0 below the confidence bounds
[02:45:31] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 31 data above and 0 below the confidence bounds
[03:09:21] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Epic puppet fail
[03:12:42] !log LocalisationUpdate completed (1.24wmf16) at 2014-08-13 03:11:38+00:00
[03:12:48] Logged the message, Master
[03:14:31] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 01:13:27 UTC
[03:17:31] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Wed Aug 13 03:17:25 UTC 2014
[03:29:12] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[03:32:41] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Wed Aug 13 03:32:33 UTC 2014
[04:03:30] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Aug 13 04:02:23 UTC 2014 (duration 2m 22s)
[04:03:35] Logged the message, Master
[04:27:31] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 31 data above and 0 below the confidence bounds
[04:27:31] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 31 data above and 0 below the confidence bounds
[06:27:51] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: Epic puppet fail
[06:28:51] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:02] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:11] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:31] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:32] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:52] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:11] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:39:21] PROBLEM - puppet last run on db1027 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:41:02] PROBLEM - puppet last run on ssl1004 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:45:11] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:45:31] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:46:31] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:46:51] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:47:02] PROBLEM - puppet last run on db1022 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:47:02] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:47:11] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:47:51] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:47:52] RECOVERY - puppet last run on searchidx1001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[06:55:50] (CR) Alexandros Kosiaris: [C: 2] Create a CNAME for labs postgresql DBs [operations/dns] - https://gerrit.wikimedia.org/r/153614 (owner: Alexandros Kosiaris)
[06:56:21] RECOVERY - puppet last run on db1027 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:59:02] RECOVERY - puppet last run on ssl1004 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[07:00:06] (PS1) Alexandros Kosiaris: osm planet import/sync, make sure we use hstore [operations/puppet] - https://gerrit.wikimedia.org/r/153757
[07:06:01] RECOVERY - puppet last run on db1022 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[07:10:03] (CR) Alexandros Kosiaris: [C: 2] osm planet import/sync, make sure we use hstore [operations/puppet] - https://gerrit.wikimedia.org/r/153757 (owner: Alexandros Kosiaris)
[07:47:54] !log upgrading Java on contint servers gallium and lanthanum , restarting Jenkins related process
[07:47:59] Logged the message, Master
[07:48:41] (CR) Alexandros Kosiaris: [C: 1] "I also like this. Kudos." [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/153584 (owner: Filippo Giunchedi)
[07:50:27] restarting jenkins
[07:52:20] if I was good enough, I would even switch it to java 7
[08:14:37] <_joe_> hashar: java 6?
[08:14:38] <_joe_> o my
[08:14:47] <_joe_> then don't complain jenkins is slow
[08:14:49] <_joe_> :)
[08:15:28] <_joe_> I've seen a 50% cpu and a 20% response time improvement upon migrating from java 6 to java 7 in most tomcat apps
[08:23:30] andrewbogott_afk: no risks involved in having the ring files in the repo afaik, though it is ~2MB per cluster of binary files stored in git, anyways I'll comment on the code review too!
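The "Number of mediawiki jobs" alerts report how many data points fell above and below a confidence band ("31 data above and 0 below the confidence bounds"). A minimal Python sketch of that counting step follows. It assumes the bands come from something like Graphite's holtWintersConfidenceBands() and that the check goes critical once the count of outside points exceeds a limit; the function names and the default limit are hypothetical, not the real check's settings.

```python
from typing import Optional, Sequence, Tuple

def count_outside(values: Sequence[Optional[float]],
                  lower: Sequence[Optional[float]],
                  upper: Sequence[Optional[float]]) -> Tuple[int, int]:
    """Count points above the upper band and below the lower band.

    Points where any of the three series has a gap (None) are skipped.
    """
    above = below = 0
    for v, lo, hi in zip(values, lower, upper):
        if v is None or lo is None or hi is None:
            continue
        if v > hi:
            above += 1
        elif v < lo:
            below += 1
    return above, below

def anomaly_status(values, lower, upper, max_outside=10):
    """Build a Nagios-style status line; max_outside is a made-up default."""
    above, below = count_outside(values, lower, upper)
    state = "CRITICAL" if above + below > max_outside else "OK"
    return f"{state}: {above} data above and {below} below the confidence bounds"

print(anomaly_status([120.0] * 31, [10.0] * 31, [100.0] * 31))
```

Counting points rather than alerting on a single excursion keeps one noisy sample from paging anyone; the trade-off is that a sustained shift (as in this log) keeps the alert firing until the bands catch up.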
[08:24:45] _joe_: lot of Jenkins slowness is due to the history of builds being saved on XML files and gallium suffering from high IO latency :-D
[08:24:58] _joe_: will try to find a way to change the java alternative via puppet
[08:36:55] though we are currently using the sun jre
[08:38:12] na forget me
[08:38:56] !log gallium removing some sun-java6* packages coming from old lucid era
[08:39:01] Logged the message, Master
[08:39:06] <_joe_> lol
[08:40:07] that comes from the old java puppet manifest we had
[08:41:56] (PS14) Giuseppe Lavagetto: Separate HHVM app servers backend. [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[08:44:19] (PS15) Giuseppe Lavagetto: Separate HHVM app servers backend. [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[08:50:58] (PS16) Giuseppe Lavagetto: Separate HHVM app servers backend. [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[09:20:31] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 07:20:09 UTC
[09:23:08] (PS17) Giuseppe Lavagetto: Separate HHVM app servers backend. [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[09:36:46] (PS1) Hashar: jenkins: retab + some linting [operations/puppet] - https://gerrit.wikimedia.org/r/153763
[09:36:48] (PS1) Hashar: jenkins: use openjdk-7-jre-headless [operations/puppet] - https://gerrit.wikimedia.org/r/153764
[09:50:09] (CR) Hashar: "puppet compilation on gallium.wikimedia.org and lanthanum.eqiad.wmnet (contint slaves in production): http://puppet-compiler.wmflabs.org/2" [operations/puppet] - https://gerrit.wikimedia.org/r/153764 (owner: Hashar)
[10:00:21] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: /mnt/tmpfs 9.0GB (= 5.0GB critical): /srv/deployment/ocg/output 1923324516B: /srv/deployment/ocg/postmortem 3421429B: ocg_job_status 6831 msg: ocg_render_job_queue 0 msg
[10:01:21] RECOVERY - OCG health on ocg1003 is OK: OK: /mnt/tmpfs 0B: /srv/deployment/ocg/output 1924037201B: /srv/deployment/ocg/postmortem 3421429B: ocg_job_status 6832 msg: ocg_render_job_queue 0 msg
[10:01:32] <_joe_> I don't know how to handle those alerts ^^
[10:01:43] <_joe_> but the flapping I just saw screams of "bug"
[10:04:53] _joe_: OCG renders PDF files out of wikitext
[10:05:03] i guess it doesn't clear its tmp files properly
[10:05:30] <_joe_> or has no guard for absurdly huge generated files
[10:06:28] modules/ocg/files/nagios/check_ocg_health
[10:06:45] the plugin has some dir size related settings:
[10:06:45] parser.add_argument('--wod', type=size_string, help='output dir size warning threshold')
[10:06:45] parser.add_argument('--cod', type=size_string, help='output dir size critical threshold')
[10:07:10] guess ocg1003 needs some manual cleanup + a bug report
[10:07:44] if you have some bandwidth after that, could use a review for a patch to switch jenkins to java 7 :-D https://gerrit.wikimedia.org/r/#/c/153764/
[10:07:52] the puppet catalog compiler confuses me
[10:08:04] <_joe_> http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=large&h=ocg1003.eqiad.wmnet&m=disk_free&s=by+name&mc=2&g=network_report&c=PDF+servers+eqiad
[10:08:15] <_joe_> some very large upload there....
[10:08:15] \O/
[10:09:11] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Epic puppet fail
[10:10:11] <_joe_> hashar: missing_in_new is wrong, of course
[10:10:27] <_joe_> that may be one of the fixes I did locally in a hurry
[10:10:55] ah :-]
[10:10:58] <_joe_> mh, I'll have to check
[10:11:27] <_joe_> hashar: I removed all local fixes as I supposedly put all of them back in the repository
[10:11:34] <_joe_> so maybe I overlooked that one
[10:11:42] <_joe_> because I remember that being a bug
[10:11:44] also the list of nodes should probably have whitespaces stripped
[10:11:46] NODES='gallium.wikimedia.org, lanthanum.eqiad.wmnet'
[10:11:52] <_joe_> yes
[10:11:53] comparator: error: unrecognized arguments: lanthanum.eqiad.wmnet
[10:11:54] :-]
[10:12:00] full run https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/207/console
[10:12:15] <_joe_> that you should do in the jenkins script
[10:13:22] (CR) Giuseppe Lavagetto: [C: -1] "One minor comment, otherwise looks good." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153764 (owner: Hashar)
[10:13:46] ah yeah alternatives
[10:13:53] but then that will switch java for everything
[10:13:53] (PS2) Giuseppe Lavagetto: puppetmaster: make reimaging servers easier. [operations/puppet] - https://gerrit.wikimedia.org/r/153397
[10:13:55] <_joe_> :)
[10:14:00] <_joe_> I know I am annoying
[10:14:19] <_joe_> but the debian alternatives system is pretty neat
[10:14:27] and honestly, I have no idea how to set the alternative via puppet. There seems to be a module to do so. Being lazy I just hardcoded the value hehe
[10:14:44] <_joe_> hashar: yes there is a module
[10:16:30] <_joe_> it's pretty simple to use, see https://github.com/wikimedia/operations-puppet/blob/production/modules/mediawiki/manifests/hhvm.pp
[10:16:45] <_joe_> I'm pretty sure the jre package installs the alternative
[10:16:54] <_joe_> it just doesn't update the configuration
[10:16:59] yup noticed that
[10:17:20] <_joe_> so you'll need something like alternatives::config{'java': path => your-path }
[10:17:43] I have no clue what are going to be the side effects though
[10:18:01] <_joe_> that you switch all apps on those servers to java 7
[10:18:05] we have some maven Jenkins jobs that might rely on a java being 6
[10:18:22] <_joe_> mmmh only one way to find out I guess
[10:18:25] <_joe_> :)
[10:18:43] <_joe_> hashar: yes mine was just a comment on what I'd expect in general
[10:19:10] <_joe_> but if you have sound reasons to do what you did, please don't stick to my advice
[10:19:18] (PS1) QChris: Stop forwarding udp2log's EventLogging data to universities [operations/puppet] - https://gerrit.wikimedia.org/r/153769
[10:20:18] (CR) Giuseppe Lavagetto: [C: 2] puppetmaster: make reimaging servers easier. [operations/puppet] - https://gerrit.wikimedia.org/r/153397 (owner: Giuseppe Lavagetto)
[10:21:35] _joe_: merely avoid having to learn about the alternative puppet module and add having the jenkins module depend on alternative
[10:21:40] but heh it is lunch time
[10:21:54] working from home with a hungry 3 years old kid duhhh
[10:22:16] bbl
[10:24:12] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: Puppet has 1 failures
[10:26:55] <_joe_> ^^ that's me :|
[10:28:11] (PS1) Giuseppe Lavagetto: puppetmaster: fix typo in scripts.pp [operations/puppet] - https://gerrit.wikimedia.org/r/153770
[10:28:40] (CR) Giuseppe Lavagetto: [C: 2 V: 2] puppetmaster: fix typo in scripts.pp [operations/puppet] - https://gerrit.wikimedia.org/r/153770 (owner: Giuseppe Lavagetto)
[10:29:29] (CR) Ori.livneh: [C: 1] "The hash_data() approach is clever. This looks good to me." [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[10:30:11] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[10:31:11] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[10:35:17] !log bump swift weights for ms-be1013 ms-be1014 ms-be1015 to 2500
[10:35:23] Logged the message, Master
[10:35:29] (PS3) Giuseppe Lavagetto: apache: add a 'replaces' parameter to apache::conf [operations/puppet] - https://gerrit.wikimedia.org/r/153406
[10:35:57] (CR) Giuseppe Lavagetto: [C: 2] apache: add a 'replaces' parameter to apache::conf [operations/puppet] - https://gerrit.wikimedia.org/r/153406 (owner: Giuseppe Lavagetto)
[10:36:54] (PS3) Giuseppe Lavagetto: mediawiki: HAT appserver should turn off mod_php [operations/puppet] - https://gerrit.wikimedia.org/r/153577
[10:38:57] (PS3) Ori.livneh: wmflib: add validate_ensure() [operations/puppet] - https://gerrit.wikimedia.org/r/153586
[10:39:04] (PS1) Giuseppe Lavagetto: mediawiki: avoid installing php packages on HAT servers [operations/puppet] - https://gerrit.wikimedia.org/r/153772
[10:40:03] <_joe_> ori: this ^^ needs the changes to wwwportals.conf and your feedback. We should also really test it in beta
[10:40:59] (CR) Ori.livneh: [C: 1] mediawiki: avoid installing php packages on HAT servers [operations/puppet] - https://gerrit.wikimedia.org/r/153772 (owner: Giuseppe Lavagetto)
[10:41:46] _joe_: i think pushing it into a packages/php.pp and including/excluding the whole class based on the os version might be a tad cleaner (hey, this is your influence here ;)) but this is good too and we can always do that later
[10:42:01] <_joe_> eheheh you're right
[10:42:02] <_joe_> :P
[10:42:39] <_joe_> the pear packages will still need to be installed I guess
[10:42:40] sorry i'm on and off irc so sporadically, i keep getting moved around to various family gathering functions
[10:42:46] <_joe_> eheh
[10:42:54] back in half an hour or so
[10:43:09] <_joe_> I'll probably be at lunch
[10:43:23] <_joe_> but pm me if there is something you think I should prioritize
[10:50:07] (PS2) Filippo Giunchedi: swift-ring: manage swift rings via git [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/153584
[10:59:51] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Wed Aug 13 10:59:44 UTC 2014
[11:00:21] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 876.259218385
[11:01:40] (CR) Mark Bergsma: "The hash_data() approach actually doesn't work, because of our hierarchy. The HHVM backend doesn't exist on $cluster_tier==2, and we can't" [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[11:02:10] (CR) Giuseppe Lavagetto: [C: -1] "php-mail-mime depends on php5 and not on php5-common, so it will install either libapache2-mod-php5 or php5-cgi or php5-fpm." [operations/puppet] - https://gerrit.wikimedia.org/r/153772 (owner: Giuseppe Lavagetto)
[11:04:48] (CR) Filippo Giunchedi: "thanks for the feedback!" [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/153584 (owner: Filippo Giunchedi)
[11:08:23] (CR) Mark Bergsma: "Disregard my previous comment - that was based on an older patchset indeed." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[11:10:26] (CR) Mark Bergsma: "Also, none of this splits the cache on the frontend (memory only). Not nearly as a big a problem as on the large disk based caches, but st" [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[11:11:49] (CR) Mark Bergsma: [C: -2] "This approach breaks purging, as the Use-HHVM header is not hashed for those requests. Therefore, HHVM pages won't get purged at all." [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[11:14:04] godog: i updated ordered_yaml() to derive from the same file as ordered_json() (and to share the implementation), and updated validate_ensure() for usage examples.. i think that about wraps up the wmflib bonanza i've been on
[11:15:00] the ordered_yaml() stuff is useful for the salt stuff i'm doing now (doing small refactoring is a way for me to get a handle on the setup; my goal is to make trebuchet work reliably for new installs)
[11:15:27] (PS3) Filippo Giunchedi: swift-ring: manage swift rings via git [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/153584
[11:16:54] ori: cool! I'll take a look at both in the european afternoon!
[11:17:11] awesome, thanks
[11:17:17] what's left to make trebuchet bootstrap correctly on new installs btw? there where many roadblocks?
[11:18:59] (CR) Ori.livneh: swift-ring: manage swift rings via git (1 comment) [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/153584 (owner: Filippo Giunchedi)
[11:20:51] godog: it "should work", but ryan suggested manually updated modules after the addition of the deployment target grain, which prompted me to look at the source, and i'm confused by which updates are necessary where
[11:21:40] deploy.sync_all != saltutil.sync_all, each of these calls miscellaneous other self-update routines
[11:22:15] so it's not so much a need to implement missing functionality but to get to the bottom of what exactly needs to happen so it can be made simple and reliable
[11:22:50] nice!
[11:23:59] I don't have much knowledge/context on salt/trebuchet yet but should :( might pick your brain or ryan's or _joe_'s if need be
[11:26:54] i think the only serious wtf that i encountered so far is the mixed synchronous/asynchronous approach
[11:27:24] i'm not sure how anyone thinks it is a sane approach, and perusing the issues on the salt repo in github suggests it's a sticking point for many
[11:29:16] but i think as long as you're disciplined about treating everything as at least potentially asynchronous it can work
[11:31:35] i also think the key exchange between minion and master could be made less manual by piggybacking on trust in the puppetmaster but i haven't worked out fully how that would look
[11:32:20] this is all very vague, sorry -- i'm still working it out myself.
[11:33:02] (PS6) Ori.livneh: Nutcracker: move declaration to role::mediawiki; parametrize [operations/puppet] - https://gerrit.wikimedia.org/r/149800
[11:41:52] ori: ack, running out to lunch
[11:56:53] ori: what does the load balancer for rcstream run again?
[11:57:05] as in what software?
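The check_ocg_health plugin quoted above parses its `--wod`/`--cod` thresholds with a `size_string` type, but the helper itself is not shown in the log. The following is a hypothetical Python sketch of such an argparse type plus a threshold comparison. The accepted suffixes and the 1024-based units are assumptions, and the example thresholds are invented for illustration.

```python
import argparse
import re

# 1024-based multipliers are an assumption; the real plugin may use 1000.
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", re.IGNORECASE)

def size_string(value):
    """argparse type: turn '5.0GB', '512M' or '1923324516B' into a byte count."""
    match = _SIZE_RE.fullmatch(value)
    if match is None:
        raise argparse.ArgumentTypeError(f"invalid size: {value!r}")
    number, prefix = match.groups()
    return int(float(number) * _MULTIPLIERS[prefix.upper()])

def build_parser():
    parser = argparse.ArgumentParser(prog="check_ocg_health")
    parser.add_argument('--wod', type=size_string, help='output dir size warning threshold')
    parser.add_argument('--cod', type=size_string, help='output dir size critical threshold')
    return parser

def classify(used_bytes, warn=None, crit=None):
    """Map a measured size onto a Nagios state; critical takes precedence."""
    if crit is not None and used_bytes >= crit:
        return "CRITICAL"
    if warn is not None and used_bytes >= warn:
        return "WARNING"
    return "OK"

args = build_parser().parse_args(["--wod", "4GB", "--cod", "5.0GB"])
print(classify(1923324516, args.wod, args.cod))             # OK (about 1.8 GiB)
print(classify(size_string("9.0GB"), args.wod, args.cod))   # CRITICAL
```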
[11:57:09] LVS
[12:03:27] ori: ok. I have extended the rcstream docs a bit: https://wikitech.wikimedia.org/w/index.php?title=Stream.wikimedia.org&diff=123281&oldid=115405
[12:04:43] valhallasw`cloud: oh, that's really nice -- thanks! have you been experimenting with it? how reliable has it been since the LVS change?
[12:04:53] I haven't tested any of it yet :-)
[12:05:55] nod
[12:06:18] (CR) Mxn: "In that case, I'd suggest adding the required rights. I can start a new community poll if necessary, but it seems like an implementation d" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149637 (https://bugzilla.wikimedia.org/68612) (owner: Withoutaname)
[12:06:32] but 'python rctest.py' works when I run it, so I suppose it's stable ;-)
[12:06:42] cool
[12:30:28] (PS1) Ori.livneh: Clean up salt::grain [operations/puppet] - https://gerrit.wikimedia.org/r/153783
[12:36:54] (Abandoned) Ori.livneh: Beta: depool deployment-mediawiki01 to investigate HHVM lock-up [operations/puppet] - https://gerrit.wikimedia.org/r/151024 (owner: Ori.livneh)
[12:51:24] !log restarting rebuilding Cirrus indexes to pick up weighted all field
[12:51:29] Logged the message, Master
[13:00:01] (PS1) Yuvipanda: quarry: Use mariadb instead of mysql [operations/puppet] - https://gerrit.wikimedia.org/r/153784
[13:00:07] K4-713: Dear anthropoid, the time has come. Please deploy Fundraising (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140813T1300).
[13:00:31] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 10:59:44 UTC
[13:23:37] !log killed a mass of SpecialWhatLinksHere queries on enwiki
[13:23:43] Logged the message, Master
[13:26:31] PROBLEM - Puppet freshness on labsdb1003 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 11:25:24 UTC
[13:30:40] (PS2) Ottomata: Stop forwarding udp2log's EventLogging data to universities [operations/puppet] - https://gerrit.wikimedia.org/r/153769 (owner: QChris)
[13:31:07] (CR) Ottomata: [C: 2 V: 2] Stop forwarding udp2log's EventLogging data to universities [operations/puppet] - https://gerrit.wikimedia.org/r/153769 (owner: QChris)
[13:31:24] (PS1) Ori.livneh: Add 'udplog' service alias for fluorine.eqiad.wmnet [operations/dns] - https://gerrit.wikimedia.org/r/153786
[13:32:20] bblack: feel like reviewing a wmnet CNAME change for me? :) ^
[13:37:36] ori: is it intentional that your alias is udplog.pmtpa.wmnet?
[13:37:50] * ori facepalms.
[13:37:51] no
[13:38:03] * ori amends.
[13:39:00] bblack: should i bother adding it for pmtpa as well too?
[13:39:07] probably not
[13:39:09] I wouldn't
[13:39:32] (or for anywhere else, just eqiad)
[13:39:40] (PS2) Ori.livneh: Add 'udplog' service alias for fluorine.eqiad.wmnet [operations/dns] - https://gerrit.wikimedia.org/r/153786
[13:40:12] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Wed Aug 13 13:40:09 UTC 2014
[13:40:45] (CR) BBlack: [C: 2] Add 'udplog' service alias for fluorine.eqiad.wmnet [operations/dns] - https://gerrit.wikimedia.org/r/153786 (owner: Ori.livneh)
[13:41:16] bblack: thanks! will you deploy or should i?
[13:41:24] yooo drdee, yt?
[13:41:28] it's already deployed
[13:41:46] bblack: what brand of coffee do you drink, so i can order some? :)
[13:42:10] thanks btw!
[13:42:19] I drink home-made espresso, made in a Jura Ena-1, with Illy Medium Roast beans
[13:42:22] np!
[13:53:28] legoktm, odder: if you want your patches SWATted this morning, windows are now available on the Deployments page.
[13:54:27] (PS5) Ottomata: Add script and class to manage HDFS user directories [operations/puppet/cdh] - https://gerrit.wikimedia.org/r/153706
[13:57:35] (CR) Filippo Giunchedi: [C: 2 V: 2] swift-thumb-stats: dump thumb stats from swift [operations/software] - https://gerrit.wikimedia.org/r/148997 (owner: Filippo Giunchedi)
[13:58:54] (PS1) Ori.livneh: mediawiki: use 'udplog' service alias instead of hard-coding fluorine [operations/puppet] - https://gerrit.wikimedia.org/r/153792
[14:03:21] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 1711.9962053
[14:05:12] PROBLEM - RAID on labsdb1003 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[14:05:41] (PS2) Yuvipanda: quarry: Use mysql-server-5.6 instead of 5.5 [operations/puppet] - https://gerrit.wikimedia.org/r/153784
[14:08:11] (PS18) Giuseppe Lavagetto: Separate HHVM app servers backend. [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma)
[14:08:25] <_joe_> bblack: just look at the text part
[14:08:36] <_joe_> ^^
[14:12:01] RECOVERY - Puppet freshness on labsdb1003 is OK: puppet ran at Wed Aug 13 14:11:56 UTC 2014
[14:14:31] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 12:13:35 UTC
[14:14:31] ACKNOWLEDGEMENT - RAID on labsdb1003 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Sean Pringle RT 8117
[14:24:12] RECOVERY - puppet last run on analytics1012 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[14:24:31] RECOVERY - puppet last run on analytics1010 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[14:25:41] anomie: Oh, I'm sure you can SWAT it at your leisure. I can remove it from the list if need be.
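The Kafka alerts watch `AllTopicsMessagesInPerSec.FifteenMinuteRate`, a per-second rate smoothed over about fifteen minutes rather than an instantaneous count, which is why it drifts back over the threshold only gradually. The sketch below models that smoothing as an exponentially weighted moving average ticked every 5 seconds, in the style of the Yammer/Dropwizard Metrics Meter that Kafka's broker metrics used at the time. The class name is hypothetical, and whether this exactly matches the deployed Kafka version is an assumption.

```python
import math

TICK_SECONDS = 5.0  # Metrics-style meters fold in new events every 5 s

class EWMA:
    """Exponentially weighted moving average of an event rate (events/second)."""

    def __init__(self, minutes):
        # alpha = 1 - exp(-tick / window), with the window in seconds
        self.alpha = 1.0 - math.exp(-TICK_SECONDS / (60.0 * minutes))
        self.rate = None      # unset until the first tick
        self.uncounted = 0

    def mark(self, n=1):
        """Record n events since the last tick."""
        self.uncounted += n

    def tick(self):
        """Fold the last interval's events into the average and return it."""
        instant = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant
        else:
            self.rate += self.alpha * (instant - self.rate)
        return self.rate

m15 = EWMA(15)
m15.mark(8500)
print(m15.tick())            # 1700.0 messages/s after the first tick
for _ in range(180):         # 15 minutes with no traffic at all
    rate = m15.tick()
print(round(rate, 1))        # 625.4, i.e. 1700 * e**-1
```

The usage shows the design trade-off: even a complete stop in traffic takes a full fifteen-minute window to pull the reported rate down to about 37% of its previous value, so a low-threshold alert on this metric lags the actual event.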
[14:27:10] (PS4) Filippo Giunchedi: swift-ring: manage swift rings via git [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/153584
[14:27:44] (CR) Filippo Giunchedi: swift-ring: manage swift rings via git (1 comment) [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/153584 (owner: Filippo Giunchedi)
[14:28:48] (PS5) Filippo Giunchedi: swift-ring: manage swift rings via git [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/153584
[14:32:22] (CR) Ori.livneh: [C: 1] swift-ring: manage swift rings via git [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/153584 (owner: Filippo Giunchedi)
[14:35:33] (PS1) Ori.livneh: Add 'udplog' to $nova_dnsmasq_aliases [operations/puppet] - https://gerrit.wikimedia.org/r/153800
[14:38:22] (PS2) Ori.livneh: Add 'udplog' to $nova_dnsmasq_aliases [operations/puppet] - https://gerrit.wikimedia.org/r/153800
[14:40:24] (CR) Andrew Bogott: [C: 2] Add 'udplog' to $nova_dnsmasq_aliases [operations/puppet] - https://gerrit.wikimedia.org/r/153800 (owner: Ori.livneh)
[14:46:55] (PS1) Ottomata: Add ferm::service rule for zookeeper admin port [operations/puppet] - https://gerrit.wikimedia.org/r/153801
[14:48:45] (CR) Ottomata: Add ferm::service rule for zookeeper admin port (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153801 (owner: Ottomata)
[14:55:27] (PS2) Ori.livneh: mediawiki: use 'udplog' service alias instead of hard-coding fluorine [operations/puppet] - https://gerrit.wikimedia.org/r/153792
[14:58:26] bblack: a little help? Looking at https://gerrit.wikimedia.org/r/#/c/152791/
[14:58:32] …and trying to make a new, similar change.
[15:00:05] manybubbles, anomie, Reedy: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140813T1500). Please do the needful.
[15:00:33] * anomie sees no patches listed for SWAT
[15:02:53] (PS2) Ori.livneh: Get rid of symlinks to scap scripts [operations/puppet] - https://gerrit.wikimedia.org/r/147321
[15:03:28] andrewbogott: what's the issue?
[15:04:17] There are vague comments about needing to restart things, but it's not clear to me what actually needs restarting.
[15:04:26] do you recall?
[15:04:36] it's the nova network restart thing on virtXXXX, let me see...
[15:04:44] Ah, on labnet1001?
[15:04:49] I've done that, to no good effect.
[15:04:56] The change we're trying to get is https://gerrit.wikimedia.org/r/#/c/153800/
[15:04:58] this https://ask.openstack.org/en/question/1442/how-does-dnsmasq-configuration-get-reloaded-when-dhcp_domain-is-set/
[15:05:20] ah, yeah, I've done that.
[15:05:34] * andrewbogott shrugs, does it again
[15:06:07] yeah, no dice
[15:06:39] hang on a sec
[15:08:38] (CR) Ori.livneh: [C: 2 V: 2] Get rid of symlinks to scap scripts [operations/puppet] - https://gerrit.wikimedia.org/r/147321 (owner: Ori.livneh)
[15:08:50] andrewbogott: where is the betalabs udplog host/addresses/etc defined anyways?
[15:09:09] oh I see, deployment-bastion
[15:11:06] bblack@deployment-cache-text02:~$ host udplog.wmflabs.org
[15:11:06] udplog.wmflabs.org has address 10.68.16.58
[15:11:16] ^ that seems like your change is working correctly, no?
[15:11:54] no, that was done via the addresses interface on wikitech
[15:12:01] i need a bare 'udplog' to resolve to the private ip
[15:12:10] i.e., udplog.eqiad.wmflabs
[15:12:12] ori: by the way, you can throw back that public IP, right?
[15:12:20] MINE NOW
[15:12:22] j/k, sure.
[15:12:49] well the way things are currently configured, udplog.wmflabs.org has a public address, and the dnsmasq change remaps that to the private address within labs so there's no firewalling issue
[15:12:51] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.007 second response time
[15:12:59] I don't know what else you're trying to do or why
[15:13:35] $ host deployment-stream
[15:13:36] deployment-stream.eqiad.wmflabs has address 10.68.17.106
[15:13:37] andrew@util-abogott:~$ host udplog
[15:13:37] andrew@util-abogott:~$
[15:13:48] can't you just do the full name?
[15:13:51] seems to me that based on that patch, behavior should be the same for those two
[15:13:51] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 3.211 second response time
[15:14:31] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 31 data above and 0 below the confidence bounds
[15:14:31] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 31 data above and 0 below the confidence bounds
[15:14:38] the text label on the left-hand-side of that patch is just a label. All that stuff does in dnsmasq is say "if a DNS result returns the public address, rewrite it as the private address". it doesn't care about hostnames.
[15:14:53] actual dns hostname changes are... elsewhere? I have no idea
[15:15:04] oh, hm...
[15:15:09] * andrewbogott digs
[15:15:09] DNS domains are configured here: https://wikitech.wikimedia.org/wiki/Special:NovaDomain
[15:15:24] individual hostnames are assigned here: https://wikitech.wikimedia.org/wiki/Special:NovaAddress
[15:15:30] there we go!
[15:15:32] but the .eqiad.wmflabs domain is 'special' somehow
[15:15:44] you can't (or at least i can't -- it could be a priv thing) add entries for it
[15:15:49] The action you have requested is limited to users in the group: cloudadmin.
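bblack's description of the `$nova_dnsmasq_aliases` change matches dnsmasq's `alias` option, which rewrites IPv4 addresses in answers from upstream nameservers (old address replaced by new) without looking at the queried name. A toy Python model of that rewrite follows. The function is hypothetical, and 203.0.113.10 is a documentation placeholder because udplog's real public address does not appear in the log.

```python
import ipaddress
from typing import Dict, List, Tuple

Answer = Tuple[str, str]  # (queried name, IPv4 address returned upstream)

def apply_aliases(answers: List[Answer], aliases: Dict[str, str]) -> List[Answer]:
    """Replace any answer address found in `aliases` (old -> new).

    Only the address is compared; the hostname passes through untouched.
    """
    table = {ipaddress.IPv4Address(old): ipaddress.IPv4Address(new)
             for old, new in aliases.items()}
    rewritten = []
    for name, addr in answers:
        ip = ipaddress.IPv4Address(addr)
        rewritten.append((name, str(table.get(ip, ip))))
    return rewritten

# 203.0.113.10 stands in for udplog's (unlogged) public address.
aliases = {"203.0.113.10": "10.68.16.58"}
print(apply_aliases([("udplog.wmflabs.org", "203.0.113.10")], aliases))
# [('udplog.wmflabs.org', '10.68.16.58')]
```

Because the rewrite only acts on answers that already exist, a bare `udplog` query that returns nothing has nothing to rewrite, consistent with the empty `host udplog` result above.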
[15:16:14] ah, I see. 'deployment-stream' is the name of an actual instance. [15:16:16] which udplog is not [15:16:16] shouldn't ops be in some super-group on labs that doesn't get arbitrarily restricted and have to get added to every little thing? I run into this all the time there on wikitech [15:16:48] (also, why can't labs hosts just configure to use udplog.wmflabs.org?) [15:17:06] bblack: they could, but part of what i wanted is to have the same config for labs and prod ('udplog' in both) [15:17:54] (PS1) Ottomata: Turn on elasticsearch row awareness for shard allocation [operations/puppet] - https://gerrit.wikimedia.org/r/153805 [15:17:56] ori: Ok, so, why don't you recreate that public IP and then I'll put it in the right domain. [15:18:36] (CR) jenkins-bot: [V: -1] Turn on elasticsearch row awareness for shard allocation [operations/puppet] - https://gerrit.wikimedia.org/r/153805 (owner: Ottomata) [15:19:13] (PS2) Ottomata: Turn on elasticsearch row awareness for shard allocation [operations/puppet] - https://gerrit.wikimedia.org/r/153805 [15:20:02] (CR) Ottomata: "This should work, but we should talk about possible complications of turning this on during the next search checkpoint meeting." [operations/puppet] - https://gerrit.wikimedia.org/r/153805 (owner: Ottomata) [15:24:09] andrewbogott: recreate that public IP? [15:25:00] nm, it's still there. [15:25:41] what were you doing when you got that 'must be cloud admin' error? [15:26:10] trying to view https://wikitech.wikimedia.org/wiki/Special:NovaDomain [15:26:45] ah, ok.
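The difference between `host deployment-stream` and `host udplog` comes down to how a stub resolver expands short names: deployment-stream exists as a record under eqiad.wmflabs, so the search domain completes it, while nothing named udplog exists there. The sketch below approximates glibc-style search-list ordering; it assumes the labs hosts carry `eqiad.wmflabs` in their resolv.conf search list and uses the default `ndots` of 1, neither of which this log confirms.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def candidates(name: str, search: Sequence[str], ndots: int = 1) -> List[str]:
    """Order in which a resolv.conf-style stub resolver tries a name (simplified)."""
    if name.endswith("."):
        return [name[:-1]]                        # absolute name: no search expansion
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded                  # enough dots: try it as given first
    return expanded + [name]                      # short name: search domains first


def lookup(name: str, zone: Dict[str, str], search: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (fqdn, address) for the first candidate with a record, else None."""
    for fqdn in candidates(name, search):
        if fqdn in zone:
            return fqdn, zone[fqdn]
    return None


if __name__ == "__main__":
    search = ["eqiad.wmflabs"]  # assumed search list
    zone = {"deployment-stream.eqiad.wmflabs": "10.68.17.106"}
    print(lookup("deployment-stream", zone, search))
    print(lookup("udplog", zone, search))  # None until udplog.eqiad.wmflabs exists
```

In this model, adding a record for udplog in the eqiad.wmflabs domain is what makes the bare name resolve, which is the step andrewbogott offers to take once the public IP is in place.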
[15:28:40] (CR) coren: [C: 2] Tools: Include mediawiki::multimedia::fonts in exec_environ [operations/puppet] - https://gerrit.wikimedia.org/r/151440 (https://bugzilla.wikimedia.org/66354) (owner: Tim Landscheidt) [15:29:32] (PS2) coren: Tools: Include mediawiki::multimedia::fonts in exec_environ [operations/puppet] - https://gerrit.wikimedia.org/r/151440 (https://bugzilla.wikimedia.org/66354) (owner: Tim Landscheidt) [15:31:11] (CR) coren: [C: 2] "Still +2 after rebase" [operations/puppet] - https://gerrit.wikimedia.org/r/151440 (https://bugzilla.wikimedia.org/66354) (owner: Tim Landscheidt) [15:34:33] (PS19) Giuseppe Lavagetto: Separate HHVM app servers backend. [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma) [15:39:06] (CR) Hashar: [C: -1] "That is going to break wmf fatal errors for the beta cluster since the change hardcode 'udplog' as a destination host. Or I am missing so" (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/153792 (owner: Ori.livneh) [15:39:50] be back later this evening [15:40:17] (CR) Ori.livneh: "@hashar: no, you're exactly right. This was the motivation for change https://gerrit.wikimedia.org/r/#/c/153800/ , and it's what I've been" [operations/puppet] - https://gerrit.wikimedia.org/r/153792 (owner: Ori.livneh) [15:40:34] * ori waves at hasar [15:41:00] * hashar errors out [15:41:05] invalid syntax [15:41:27] invalid pointer* [15:45:36] (CR) Mark Bergsma: Separate HHVM app servers backend.
(2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma) [15:46:01] (PS4) Giuseppe Lavagetto: apache: add a 'replaces' parameter to apache::conf [operations/puppet] - https://gerrit.wikimedia.org/r/153406 [15:46:08] (CR) Giuseppe Lavagetto: [V: 2] apache: add a 'replaces' parameter to apache::conf [operations/puppet] - https://gerrit.wikimedia.org/r/153406 (owner: Giuseppe Lavagetto) [15:46:19] (CR) Hashar: "I am sure you would not push for a breaking change :-] Nit: would you mind adding a comment stating that udplog is a DNS service entry fo" [operations/puppet] - https://gerrit.wikimedia.org/r/153792 (owner: Ori.livneh) [15:46:37] ori: I had a few more nitpick in the inline diff :-D [15:46:48] ori: also 'udplog' does not resolve currently on deployment-bastion.eqiad.wmflabs. [15:47:27] hashar: i know, andrewbogott was looking into it. maybe i should just create a udplog instance in deployment-prep? [15:47:40] hashar: (but i would have to depart from the convention of having a 'deployment-' prefix in the instance name) [15:48:19] (PS1) Ori.livneh: mediawiki: create common-local directory [operations/puppet] - https://gerrit.wikimedia.org/r/153807 [15:49:15] ori: or just vary based on $::realm :-D [15:49:28] but that is not the intent of your changes :] [15:51:23] gotta run [16:11:02] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: Epic puppet fail [16:13:22] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Wed Aug 13 16:13:17 UTC 2014 [16:27:37] (Abandoned) Dzahn: add eqiad network to external networks [operations/puppet] - https://gerrit.wikimedia.org/r/153062 (owner: Dzahn) [16:30:01] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:30:31] PROBLEM - puppetmaster backend https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:31:21] RECOVERY - puppetmaster backend
https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.033 second response time [16:33:46] (CR) Dzahn: "Giuseppe, thanks, that makes the difference, keep forgetting it." [operations/puppet] - https://gerrit.wikimedia.org/r/145472 (owner: Dzahn) [16:38:55] (CR) Giuseppe Lavagetto: [C: -1] "One small doubt that may be bogus, otherwise LGTM." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149800 (owner: Ori.livneh) [16:46:33] qchris_away: still away/meetings? [16:47:01] ottomata: still away ... gimme some 10 minutes. [16:47:56] k [16:51:35] (PS9) Dzahn: turn icinga into module [operations/puppet] - https://gerrit.wikimedia.org/r/145472 [16:51:46] (PS20) Giuseppe Lavagetto: Separate HHVM app servers backend. [operations/puppet] - https://gerrit.wikimedia.org/r/152903 (owner: Mark Bergsma) [16:51:54] (PS10) Dzahn: turn icinga into module [operations/puppet] - https://gerrit.wikimedia.org/r/145472 [16:53:53] what happened to cmjohnson1? [16:58:23] ottomata: back [17:00:32] dogeydogey: cmjohnson1 is on vacation [17:01:01] (PS1) Dzahn: download.wm.org - apache site setup broken [operations/puppet] - https://gerrit.wikimedia.org/r/153817 [17:01:07] (PS4) Ottomata: Set up passive icinga for webrequest data imports in HDFS and Hive [operations/puppet] - https://gerrit.wikimedia.org/r/151963 [17:06:04] (PS1) 01tonythomas: Removed the acl_m2 check to consider VERP addresses not as spam [operations/puppet] - https://gerrit.wikimedia.org/r/153820 [17:11:54] (PS2) 01tonythomas: Remove redundant spam check bypass acl_m2 [operations/puppet] - https://gerrit.wikimedia.org/r/153820 [17:13:30] (CR) Jgreen: [C: 1] Remove redundant spam check bypass acl_m2 [operations/puppet] - https://gerrit.wikimedia.org/r/153820 (owner: 01tonythomas) [17:14:27] mark: Hi ! time to take a look into https://gerrit.wikimedia.org/r/#/c/153820/ ?
[17:15:36] (CR) Mark Bergsma: [C: 1] Remove redundant spam check bypass acl_m2 [operations/puppet] - https://gerrit.wikimedia.org/r/153820 (owner: 01tonythomas) [17:16:34] (CR) Jgreen: [C: 2 V: 1] Remove redundant spam check bypass acl_m2 [operations/puppet] - https://gerrit.wikimedia.org/r/153820 (owner: 01tonythomas) [17:16:47] mark: yay ! [17:17:22] (PS5) Ottomata: Set up passive icinga for webrequest data imports in HDFS and Hive [operations/puppet] - https://gerrit.wikimedia.org/r/151963 [17:17:29] (CR) Ottomata: [C: 2 V: 2] Set up passive icinga for webrequest data imports in HDFS and Hive [operations/puppet] - https://gerrit.wikimedia.org/r/151963 (owner: Ottomata) [17:28:16] (PS2) Dzahn: download.wm.org - use apache::site method [operations/puppet] - https://gerrit.wikimedia.org/r/153817 [17:31:53] (CR) Dzahn: "ok, now who knows about the naggen error in here in production and also after the change?" [operations/puppet] - https://gerrit.wikimedia.org/r/145472 (owner: Dzahn) [17:32:01] Every few months I'm newly horrified to discover that php function names aren't case-sensitive [17:32:49] (CR) Filippo Giunchedi: [C: 1] wmflib: add validate_ensure() [operations/puppet] - https://gerrit.wikimedia.org/r/153586 (owner: Ori.livneh) [17:33:24] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/211/change/145472/compiled/puppet_catalogs_3_production/neon.wikimedia.org.warnings" [operations/puppet] - https://gerrit.wikimedia.org/r/145472 (owner: Dzahn) [17:35:10] (CR) Filippo Giunchedi: [C: 1] "+1 by means of eyeballing, I'm not a ruby adept by any stretch of imagination" [operations/puppet] - https://gerrit.wikimedia.org/r/149775 (owner: Ori.livneh) [17:38:08] (PS1) Dzahn: wikistats - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153828 [17:38:45] (CR) Filippo Giunchedi: [C: 1] "modulo what joe pointed out" [operations/puppet] -
https://gerrit.wikimedia.org/r/149800 (owner: Ori.livneh) [17:39:51] (CR) Rush: "spent some time looking it over and it the ideas all make sense, and naming scheme wise I would go with what you got :)" [operations/puppet/cdh] - https://gerrit.wikimedia.org/r/153706 (owner: Ottomata) [17:39:57] (CR) Rush: [C: 1] "spent some time looking it over and it the ideas all make sense, and naming scheme wise I would go with what you got :)" [operations/puppet/cdh] - https://gerrit.wikimedia.org/r/153706 (owner: Ottomata) [17:44:57] (PS1) Jgreen: allow DKIM wiki-mail._domainkey for use with localpart wiki*@ instead of just wiki@ [operations/dns] - https://gerrit.wikimedia.org/r/153830 [17:51:05] (PS1) Dzahn: stats.wm.org - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153832 [17:51:36] (PS2) Dzahn: stats.wm.org - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153832 [17:54:35] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 31 data above and 0 below the confidence bounds [17:54:35] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 31 data above and 0 below the confidence bounds [17:55:10] (CR) Ottomata: [C: 1] stats.wm.org - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153832 (owner: Dzahn) [17:57:37] (PS1) Dzahn: puppetmaster - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153835 [17:58:46] (CR) Andrew Bogott: androidsdk: Make sure that JDK is present (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153600 (owner: Yuvipanda) [17:59:12] (CR) Andrew Bogott: [C: 2] quarry: Use mysql-server-5.6 instead of 5.5 [operations/puppet] - https://gerrit.wikimedia.org/r/153784 (owner: Yuvipanda) [18:00:05] yurik: Dear anthropoid, the time has come.
Please deploy Wikipedia Zero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140813T1800). [18:00:41] greg-g, who is anthropoid? ^ [18:01:19] Not registered [18:01:43] yurikR: That's probably the default name [18:01:55] marktraceur: :P [18:01:58] Look it up [18:02:01] http://en.wiktionary.org/wiki/anthropoid#Translations [18:02:09] *nod* [18:02:22] i figured that was a default me, assumed a broken script :) [18:02:34] the script has randomness and multiple messages :) [18:02:34] Parsing got broken for some reason, I'll look at it. [18:02:51] hehe [18:02:58] and it was supposed to be gender neutral [18:04:24] Oh, I thought it pinged people [18:04:46] Oh, it did ping you [18:04:50] * marktraceur stops worrying about it [18:05:04] heh [18:05:45] (PS11) Withoutaname: Reduce string URLs to defined constant [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131914 (https://bugzilla.wikimedia.org/48618) [18:05:50] add more messages :) it selects rand() [18:10:03] (CR) Tim Landscheidt: androidsdk: Make sure that JDK is present (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153600 (owner: Yuvipanda) [18:10:45] (CR) Tim Landscheidt: "@Andrew Bogott: The disadvantages of a non-native speaker with a dictionary ...
:-)" [operations/puppet] - https://gerrit.wikimedia.org/r/124001 (owner: Tim Landscheidt) [18:13:01] (PS1) Dzahn: mw-rc-irc - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153843 [18:13:46] (CR) jenkins-bot: [V: -1] mw-rc-irc - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153843 (owner: Dzahn) [18:16:23] (PS2) Dzahn: mw-rc-irc - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153843 [18:16:25] (PS16) Withoutaname: Delete ve.wikimedia.org and leave redirect [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131907 (https://bugzilla.wikimedia.org/55737) [18:21:28] (PS1) Dzahn: ishmael - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153846 [18:22:53] (PS2) Dzahn: ishmael - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153846 [18:24:16] (PS1) Calak: Change autoconfirmed settings on nowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153848 (https://bugzilla.wikimedia.org/69302) [18:29:01] (PS1) Dzahn: gerrit - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153849 [18:31:16] (PS1) Parent5446: Set $wgPasswordDefault to old MD5 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153850 (https://bugzilla.wikimedia.org/68766) [18:35:56] heya Jeff_Green, yt? [18:37:05] or, chasemp?
[18:37:17] ottomata: yeaqh [18:40:04] (PS1) Dzahn: openstack - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153858 [18:41:59] (PS2) Dzahn: wikistats - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153828 [18:42:38] ottomata: hey [18:43:40] (PS3) Dzahn: stats.wm.org - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153832 [18:49:15] (CR) Reedy: [C: -1] "I think that might actually still be needed" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153347 (owner: Withoutaname) [18:49:36] Reedy: why is it still needed? [18:49:48] (CR) Ottomata: [C: 2] stats.wm.org - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153832 (owner: Dzahn) [18:50:25] legoktm: Do the app servers need to make requests to labs? [18:50:32] no [18:50:35] ah, ok [18:50:42] cause if they did, the proxy was still needed [18:50:52] you can't connect to labs even with the proxy [18:51:03] but, ExtensionDistributor no longer even checks those globals [18:51:12] lol [18:52:02] (CR) Reedy: "Or not, variables are dead from the extnesion" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153347 (owner: Withoutaname) [19:00:21] (PS1) Ottomata: Create icinga::monitor::nsca::client class [operations/puppet] - https://gerrit.wikimedia.org/r/153865 [19:00:55] (CR) Andrew Bogott: [C: 2] openstack - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153858 (owner: Dzahn) [19:01:05] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:00:40 UTC [19:04:36] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Epic puppet fail [19:04:56] (PS1) Andrew Bogott: Revert "openstack - use apache::site" [operations/puppet] - https://gerrit.wikimedia.org/r/153866 [19:04:59] (CR) Dzahn: [C: -1] "manifests/misc/icinga.pp: require icinga::nsca" [operations/puppet] -
https://gerrit.wikimedia.org/r/153865 (owner: Ottomata) [19:05:31] (PS1) Yurik: Added graph extension to labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153867 [19:05:32] mutante: fyi, https://gerrit.wikimedia.org/r/#/c/153866/ [19:08:39] (CR) Andrew Bogott: [C: 2] Revert "openstack - use apache::site" [operations/puppet] - https://gerrit.wikimedia.org/r/153866 (owner: Andrew Bogott) [19:09:34] (PS2) Ottomata: Create icinga::monitor::nsca::client class [operations/puppet] - https://gerrit.wikimedia.org/r/153865 [19:10:36] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [19:10:57] mutante: I welcome another attempt at that patch :) [19:13:06] (CR) Dzahn: [C: 1] Create icinga::monitor::nsca::client class [operations/puppet] - https://gerrit.wikimedia.org/r/153865 (owner: Ottomata) [19:14:22] andrewbogott: aaah, yep, ok [19:14:23] (CR) Ottomata: [C: 2 V: 2] Create icinga::monitor::nsca::client class [operations/puppet] - https://gerrit.wikimedia.org/r/153865 (owner: Ottomata) [19:14:49] (PS1) Dzahn: Revert "Revert "openstack - use apache::site"" [operations/puppet] - https://gerrit.wikimedia.org/r/153870 [19:15:24] (CR) Dzahn: "usually we used to have "this change has been reverted in.."
messages" [operations/puppet] - https://gerrit.wikimedia.org/r/153858 (owner: Dzahn) [19:15:52] (CR) Dzahn: [C: -1] "wait, i'll amend" [operations/puppet] - https://gerrit.wikimedia.org/r/153870 (owner: Dzahn) [19:18:55] (PS2) Dzahn: Revert "Revert "openstack - use apache::site"" [operations/puppet] - https://gerrit.wikimedia.org/r/153870 [19:22:05] PROBLEM - Puppet freshness on mc1016 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:21:32 UTC [19:22:05] PROBLEM - Puppet freshness on ms-be1004 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:21:53 UTC [19:22:05] PROBLEM - Puppet freshness on netmon1001 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:21:48 UTC [19:23:05] PROBLEM - Puppet freshness on caesium is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:13 UTC [19:23:05] PROBLEM - Puppet freshness on amssq54 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:34 UTC [19:23:05] PROBLEM - Puppet freshness on carbon is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:34 UTC [19:23:05] PROBLEM - Puppet freshness on cp4006 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:39 UTC [19:23:05] PROBLEM - Puppet freshness on db1065 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:03 UTC [19:23:06] PROBLEM - Puppet freshness on mw1006 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:39 UTC [19:23:06] PROBLEM - Puppet freshness on mw1007 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:44 UTC [19:23:07] PROBLEM - Puppet freshness on es1001 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:03 UTC [19:23:07] PROBLEM - Puppet freshness on mw1045 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:03 UTC [19:23:08] PROBLEM - Puppet freshness on magnesium is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:24 UTC [19:23:08] PROBLEM - Puppet freshness on cp1049 is
CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:21:58 UTC [19:23:09] PROBLEM - Puppet freshness on db1053 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:29 UTC [19:23:09] PROBLEM - Puppet freshness on cp3008 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:21:58 UTC [19:23:10] PROBLEM - Puppet freshness on mw1063 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:23 UTC [19:23:10] PROBLEM - Puppet freshness on mw1140 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:23 UTC [19:23:11] PROBLEM - Puppet freshness on mw1041 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:08 UTC [19:23:11] PROBLEM - Puppet freshness on platinum is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:39 UTC [19:23:16] PROBLEM - Puppet freshness on searchidx1001 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:29 UTC [19:23:16] PROBLEM - Puppet freshness on mw1145 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:34 UTC [19:23:16] PROBLEM - Puppet freshness on wtp1006 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:22:54 UTC [19:24:05] PROBLEM - Puppet freshness on amssq32 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:23:35 UTC [19:24:05] PROBLEM - Puppet freshness on amssq53 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:23:14 UTC [19:24:05] PROBLEM - Puppet freshness on amssq61 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:23:50 UTC [19:24:05] PROBLEM - Puppet freshness on cerium is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:23:04 UTC [19:24:05] PROBLEM - Puppet freshness on cp1047 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:23:25 UTC [19:25:43] (CR) Dzahn: [C: 1] "http://puppet-compiler.wmflabs.org/212/change/153866/html/" [operations/puppet] - https://gerrit.wikimedia.org/r/153870 (owner: Dzahn) [19:26:05] PROBLEM - Puppet freshness on
cp4008 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:25:11 UTC [19:26:05] PROBLEM - Puppet freshness on cp3014 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:25:52 UTC [19:26:05] PROBLEM - Puppet freshness on cp1061 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:25:21 UTC [19:26:05] PROBLEM - Puppet freshness on iron is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:25:36 UTC [19:26:05] PROBLEM - Puppet freshness on db1067 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:25:52 UTC [19:27:02] uhhh [19:27:05] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:26:12 UTC [19:27:05] PROBLEM - Puppet freshness on amssq46 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:26:32 UTC [19:28:35] RECOVERY - Puppet freshness on bast4001 is OK: puppet ran at Wed Aug 13 19:28:29 UTC 2014 [19:28:35] RECOVERY - Puppet freshness on db1048 is OK: puppet ran at Wed Aug 13 19:28:29 UTC 2014 [19:29:05] RECOVERY - Puppet freshness on analytics1013 is OK: puppet ran at Wed Aug 13 19:28:55 UTC 2014 [19:29:05] PROBLEM - Puppet freshness on lvs4003 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:28:19 UTC [19:29:05] PROBLEM - Puppet freshness on analytics1035 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:27:59 UTC [19:29:05] PROBLEM - Puppet freshness on db1071 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:28:45 UTC [19:29:05] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:28:09 UTC [19:29:05] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:28:04 UTC [19:29:05] PROBLEM - Puppet freshness on mw1076 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:28:24 UTC [19:29:06] PROBLEM - Puppet freshness on analytics1030 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:27:59 
UTC [19:29:07] PROBLEM - Puppet freshness on wtp1005 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:28:09 UTC [19:29:07] RECOVERY - Puppet freshness on analytics1022 is OK: puppet ran at Wed Aug 13 19:29:00 UTC 2014 [19:29:35] RECOVERY - Puppet freshness on db1071 is OK: puppet ran at Wed Aug 13 19:29:25 UTC 2014 [19:29:49] my fault! [19:30:05] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:29:46 UTC [19:30:15] RECOVERY - Puppet freshness on hooft is OK: puppet ran at Wed Aug 13 19:30:06 UTC 2014 [19:31:05] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:30:37 UTC [19:31:05] PROBLEM - Puppet freshness on cp1060 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:30:52 UTC [19:31:05] PROBLEM - Puppet freshness on cp3009 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:30:52 UTC [19:31:05] PROBLEM - Puppet freshness on db1001 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:30:52 UTC [19:31:05] PROBLEM - Puppet freshness on db1057 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:30:47 UTC [19:31:05] PROBLEM - Puppet freshness on elastic1002 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:30:52 UTC [19:31:05] PROBLEM - Puppet freshness on mw1023 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:30:26 UTC [19:31:06] PROBLEM - Puppet freshness on rcs1001 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:30:47 UTC [19:31:06] PROBLEM - Puppet freshness on wtp1015 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:30:47 UTC [19:31:15] RECOVERY - Puppet freshness on mw1023 is OK: puppet ran at Wed Aug 13 19:31:06 UTC 2014 [19:31:18] (PS2) Yurik: Added graph extension to labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153867 [19:31:25] RECOVERY - Puppet freshness on elastic1002 is OK: puppet ran at Wed Aug 13 19:31:16 UTC 2014
[19:31:25] RECOVERY - Puppet freshness on db1001 is OK: puppet ran at Wed Aug 13 19:31:17 UTC 2014 [19:31:25] RECOVERY - Puppet freshness on db1057 is OK: puppet ran at Wed Aug 13 19:31:22 UTC 2014 [19:31:35] RECOVERY - Puppet freshness on amslvs3 is OK: puppet ran at Wed Aug 13 19:31:27 UTC 2014 [19:31:35] RECOVERY - Puppet freshness on wtp1015 is OK: puppet ran at Wed Aug 13 19:31:27 UTC 2014 [19:31:36] RECOVERY - Puppet freshness on cp1060 is OK: puppet ran at Wed Aug 13 19:31:32 UTC 2014 [19:31:36] RECOVERY - Puppet freshness on rcs1001 is OK: puppet ran at Wed Aug 13 19:31:32 UTC 2014 [19:31:45] RECOVERY - Puppet freshness on cp3009 is OK: puppet ran at Wed Aug 13 19:31:37 UTC 2014 [19:32:05] PROBLEM - Puppet freshness on ocg1003 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:31:47 UTC [19:32:05] PROBLEM - Puppet freshness on rdb1003 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:31:47 UTC [19:32:05] PROBLEM - Puppet freshness on db1044 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:31:47 UTC [19:32:05] PROBLEM - Puppet freshness on mw1001 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:31:52 UTC [19:32:05] RECOVERY - Puppet freshness on mw1001 is OK: puppet ran at Wed Aug 13 19:32:02 UTC 2014 [19:32:15] RECOVERY - Puppet freshness on rdb1003 is OK: puppet ran at Wed Aug 13 19:32:12 UTC 2014 [19:32:35] RECOVERY - Puppet freshness on db1044 is OK: puppet ran at Wed Aug 13 19:32:27 UTC 2014 [19:32:35] RECOVERY - Puppet freshness on ocg1003 is OK: puppet ran at Wed Aug 13 19:32:32 UTC 2014 [19:33:05] PROBLEM - Puppet freshness on calcium is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:32:48 UTC [19:33:05] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:32:48 UTC [19:33:05] PROBLEM - Puppet freshness on fluorine is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:32:53 UTC [19:33:05] PROBLEM - Puppet freshness on dbstore1002 
is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:32:43 UTC [19:33:16] RECOVERY - Puppet freshness on fluorine is OK: puppet ran at Wed Aug 13 19:33:13 UTC 2014 [19:33:16] Um, ok, I disabled the freshness check. But maybe icinga doesn't know about that? [19:33:25] RECOVERY - Puppet freshness on calcium is OK: puppet ran at Wed Aug 13 19:33:18 UTC 2014 [19:33:31] mutante, any ideas? I merged your patch to turn off the check... [19:33:32] (CR) JanZerebecki: [C: 1] mw-rc-irc - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153843 (owner: Dzahn) [19:33:35] RECOVERY - Puppet freshness on dbstore1002 is OK: puppet ran at Wed Aug 13 19:33:28 UTC 2014 [19:33:46] oh, and now it's ok…? [19:34:05] PROBLEM - Puppet freshness on amssq39 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:33:54 UTC [19:34:05] PROBLEM - Puppet freshness on db1072 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:33:44 UTC [19:34:05] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:33:54 UTC [19:34:05] PROBLEM - Puppet freshness on mw1064 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:33:49 UTC [19:34:05] PROBLEM - Puppet freshness on wtp1010 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:33:54 UTC [19:34:10] andrewbogott: that was otto looking at the fifo to debug the nsca stuff [19:34:16] RECOVERY - Puppet freshness on db1072 is OK: puppet ran at Wed Aug 13 19:34:14 UTC 2014 [19:34:19] yeah, sorry andrewbogott [19:34:22] my fault [19:34:30] mutante: ok, but why is 'puppet freshness' even a thing? There's no such test anymore… I thought.
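Every PROBLEM line in this burst reports a last successful run about two hours before the alert, and mutante explains a few lines further on that the freshness check was extended to also cover what the old puppet_disabled check did on its own. A minimal sketch of such a combined check follows. It only mirrors the alert wording seen in this log: the two-hour threshold is inferred from the timestamps rather than taken from the real configuration, and the disabled-agent message is hypothetical. As noted in the channel, this particular burst was attributed to debugging of the NSCA fifo rather than to hosts actually failing.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumed threshold: the alerts in this log fire about two hours after the
# last successful run, so two hours is used purely for illustration.
STALE_AFTER = timedelta(hours=2)
PROBLEM_FMT = "%a %d %b %Y %H:%M:%S UTC"   # e.g. "Wed 13 Aug 2014 17:22:34 UTC"
RECOVERY_FMT = "%a %b %d %H:%M:%S UTC %Y"  # e.g. "Wed Aug 13 19:28:29 UTC 2014"


def check_puppet(last_success: datetime, now: datetime, agent_disabled: bool,
                 stale_after: timedelta = STALE_AFTER) -> Tuple[str, str]:
    """Return (state, message) for one host.

    A freshness check that also covers the disabled-agent case the older
    puppet_disabled check used to handle on its own.
    """
    if agent_disabled:
        # Hypothetical wording; the real check's message is not shown in this log.
        return "CRITICAL", "Puppet agent is disabled"
    if now - last_success > stale_after:
        return "CRITICAL", "Last successful Puppet run was " + last_success.strftime(PROBLEM_FMT)
    return "OK", "puppet ran at " + last_success.strftime(RECOVERY_FMT)


if __name__ == "__main__":
    now = datetime(2014, 8, 13, 19, 23, 5, tzinfo=timezone.utc)
    stale = datetime(2014, 8, 13, 17, 22, 34, tzinfo=timezone.utc)
    print(check_puppet(stale, now, agent_disabled=False))
    print(check_puppet(now - timedelta(seconds=30), now, agent_disabled=False))
```

Checking the disabled flag first means a host with a disabled agent is reported as such even if its last run happens to be recent, which is the behaviour the merged check is described as adding.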
[19:34:34] it should all come back shortly, as soon as puppet runs [19:34:35] RECOVERY - Puppet freshness on wtp1010 is OK: puppet ran at Wed Aug 13 19:34:29 UTC 2014 [19:34:45] RECOVERY - Puppet freshness on lvs3003 is OK: puppet ran at Wed Aug 13 19:34:39 UTC 2014 [19:34:56] RECOVERY - Puppet freshness on amssq39 is OK: puppet ran at Wed Aug 13 19:34:54 UTC 2014 [19:35:05] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:34:14 UTC [19:35:05] PROBLEM - Puppet freshness on chromium is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:34:45 UTC [19:35:05] PROBLEM - Puppet freshness on mw1203 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:33:59 UTC [19:35:05] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:34:40 UTC [19:35:05] PROBLEM - Puppet freshness on mw1113 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:34:45 UTC [19:35:05] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:34:45 UTC [19:35:05] PROBLEM - Puppet freshness on db1035 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:34:34 UTC [19:35:06] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:34:34 UTC [19:35:06] PROBLEM - Puppet freshness on eeden is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:34:50 UTC [19:35:07] PROBLEM - Puppet freshness on mw1027 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:34:40 UTC [19:35:07] RECOVERY - Puppet freshness on chromium is OK: puppet ran at Wed Aug 13 19:34:59 UTC 2014 [19:35:08] RECOVERY - Puppet freshness on cp1040 is OK: puppet ran at Wed Aug 13 19:34:59 UTC 2014 [19:35:08] RECOVERY - Puppet freshness on mw1203 is OK: puppet ran at Wed Aug 13 19:34:59 UTC 2014 [19:35:09] RECOVERY - Puppet freshness on mw1027 is OK: puppet ran at Wed Aug 13 19:35:00 UTC 2014 [19:35:13] 
whao [19:35:14] andrewbogott: there was "puppet_disabled" ,but that is not the same as "puppet freshness" [19:35:15] RECOVERY - Puppet freshness on ssl3001 is OK: puppet ran at Wed Aug 13 19:35:05 UTC 2014 [19:35:15] RECOVERY - Puppet freshness on analytics1018 is OK: puppet ran at Wed Aug 13 19:35:10 UTC 2014 [19:35:36] unrelated things going on :) [19:36:10] puppet_disabled used to check for agents being disabled [19:36:13] and only that [19:36:17] oh, ok. [19:36:32] then the puppet_freshness check was enhanced [19:36:36] to _also_ include that [19:36:53] and that breake above was also unrelated [19:37:48] great, I will settle down then :) [19:40:05] PROBLEM - Puppet freshness on es1009 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:39:49 UTC [19:40:05] PROBLEM - Puppet freshness on mw1124 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:39:49 UTC [19:40:05] PROBLEM - Puppet freshness on pdf2 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:39:13 UTC [19:40:05] PROBLEM - Puppet freshness on wtp1021 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:39:44 UTC [19:40:05] RECOVERY - Puppet freshness on wtp1021 is OK: puppet ran at Wed Aug 13 19:39:58 UTC 2014 [19:40:06] RECOVERY - Puppet freshness on mw1124 is OK: puppet ran at Wed Aug 13 19:40:03 UTC 2014 [19:40:25] RECOVERY - Puppet freshness on pdf2 is OK: puppet ran at Wed Aug 13 19:40:19 UTC 2014 [19:40:35] RECOVERY - Puppet freshness on es1009 is OK: puppet ran at Wed Aug 13 19:40:34 UTC 2014 [19:41:05] PROBLEM - Puppet freshness on mw1134 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:40:49 UTC [19:41:05] PROBLEM - Puppet freshness on cp1059 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:40:44 UTC [19:41:05] PROBLEM - Puppet freshness on mw1080 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:40:54 UTC [19:41:05] PROBLEM - Puppet freshness on mc1009 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 
17:40:54 UTC [19:41:05] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:40:44 UTC [19:41:15] RECOVERY - Puppet freshness on cp1059 is OK: puppet ran at Wed Aug 13 19:41:09 UTC 2014 [19:41:25] RECOVERY - Puppet freshness on mw1134 is OK: puppet ran at Wed Aug 13 19:41:14 UTC 2014 [19:41:45] RECOVERY - Puppet freshness on mc1009 is OK: puppet ran at Wed Aug 13 19:41:35 UTC 2014 [19:41:45] RECOVERY - Puppet freshness on ms-fe1003 is OK: puppet ran at Wed Aug 13 19:41:35 UTC 2014 [19:41:45] RECOVERY - Puppet freshness on db1053 is OK: puppet ran at Wed Aug 13 19:41:40 UTC 2014 [19:41:55] RECOVERY - Puppet freshness on analytics1017 is OK: puppet ran at Wed Aug 13 19:41:45 UTC 2014 [19:42:05] RECOVERY - Puppet freshness on carbon is OK: puppet ran at Wed Aug 13 19:41:55 UTC 2014 [19:42:05] RECOVERY - Puppet freshness on mw1045 is OK: puppet ran at Wed Aug 13 19:41:55 UTC 2014 [19:42:05] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:41:40 UTC [19:42:05] PROBLEM - Puppet freshness on dbproxy1002 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:41:45 UTC [19:42:05] RECOVERY - Puppet freshness on cp4006 is OK: puppet ran at Wed Aug 13 19:42:00 UTC 2014 [19:42:15] RECOVERY - Puppet freshness on mw1080 is OK: puppet ran at Wed Aug 13 19:42:05 UTC 2014 [19:42:15] RECOVERY - Puppet freshness on es1001 is OK: puppet ran at Wed Aug 13 19:42:05 UTC 2014 [19:42:15] RECOVERY - Puppet freshness on magnesium is OK: puppet ran at Wed Aug 13 19:42:05 UTC 2014 [19:42:15] RECOVERY - Puppet freshness on searchidx1001 is OK: puppet ran at Wed Aug 13 19:42:05 UTC 2014 [19:42:15] RECOVERY - Puppet freshness on mw1007 is OK: puppet ran at Wed Aug 13 19:42:05 UTC 2014 [19:42:16] RECOVERY - Puppet freshness on db1065 is OK: puppet ran at Wed Aug 13 19:42:10 UTC 2014 [19:42:16] RECOVERY - Puppet freshness on mw1006 is OK: puppet ran at Wed Aug 13 19:42:10 UTC 2014 [19:42:17] 
RECOVERY - Puppet freshness on dbproxy1002 is OK: puppet ran at Wed Aug 13 19:42:10 UTC 2014 [19:42:25] RECOVERY - Puppet freshness on mc1016 is OK: puppet ran at Wed Aug 13 19:42:15 UTC 2014 [19:42:25] RECOVERY - Puppet freshness on caesium is OK: puppet ran at Wed Aug 13 19:42:15 UTC 2014 [19:42:25] RECOVERY - Puppet freshness on amslvs2 is OK: puppet ran at Wed Aug 13 19:42:21 UTC 2014 [19:42:25] RECOVERY - Puppet freshness on ms-be1004 is OK: puppet ran at Wed Aug 13 19:42:21 UTC 2014 [19:42:35] RECOVERY - Puppet freshness on platinum is OK: puppet ran at Wed Aug 13 19:42:31 UTC 2014 [19:42:36] RECOVERY - Puppet freshness on mw1041 is OK: puppet ran at Wed Aug 13 19:42:31 UTC 2014 [19:42:36] RECOVERY - Puppet freshness on cp1055 is OK: puppet ran at Wed Aug 13 19:42:31 UTC 2014 [19:42:36] RECOVERY - Puppet freshness on cp3008 is OK: puppet ran at Wed Aug 13 19:42:31 UTC 2014 [19:42:45] RECOVERY - Puppet freshness on cerium is OK: puppet ran at Wed Aug 13 19:42:36 UTC 2014 [19:42:45] RECOVERY - Puppet freshness on mw1063 is OK: puppet ran at Wed Aug 13 19:42:41 UTC 2014 [19:42:45] RECOVERY - Puppet freshness on gold is OK: puppet ran at Wed Aug 13 19:42:41 UTC 2014 [19:42:46] RECOVERY - Puppet freshness on amssq54 is OK: puppet ran at Wed Aug 13 19:42:41 UTC 2014 [19:42:46] RECOVERY - Puppet freshness on cp1049 is OK: puppet ran at Wed Aug 13 19:42:41 UTC 2014 [19:42:46] RECOVERY - Puppet freshness on mw1145 is OK: puppet ran at Wed Aug 13 19:42:41 UTC 2014 [19:42:55] RECOVERY - Puppet freshness on analytics1020 is OK: puppet ran at Wed Aug 13 19:42:46 UTC 2014 [19:42:55] RECOVERY - Puppet freshness on helium is OK: puppet ran at Wed Aug 13 19:42:46 UTC 2014 [19:42:55] RECOVERY - Puppet freshness on mw1140 is OK: puppet ran at Wed Aug 13 19:42:51 UTC 2014 [19:42:56] RECOVERY - Puppet freshness on netmon1001 is OK: puppet ran at Wed Aug 13 19:42:51 UTC 2014 [19:43:05] RECOVERY - Puppet freshness on lvs1002 is OK: puppet ran at Wed Aug 13 19:42:56 UTC 2014 
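
The distinction drawn at 19:35-19:36 (puppet_disabled only checked whether the agent was disabled; puppet_freshness was later extended to cover that case too, on top of checking how long ago Puppet last ran successfully) can be sketched as a small Ruby function. This is a hypothetical illustration, not the actual Icinga plugin: the two-hour threshold is an assumption inferred from the alerts in this log, which fire roughly two hours after each host's last successful run, and the message strings only imitate icinga-wm's wording.

```ruby
require 'time'

# Hypothetical threshold: the real check's value is not given in the log,
# but the alerts above fire about two hours after each host's last run.
FRESHNESS_THRESHOLD = 2 * 60 * 60 # seconds

# Combined check: CRITICAL if the agent is disabled (the old puppet_disabled
# condition) or if the last successful run is older than the threshold.
def puppet_freshness(last_run, now, agent_disabled: false, threshold: FRESHNESS_THRESHOLD)
  return 'CRITICAL: Puppet agent is disabled' if agent_disabled

  utc = last_run.getutc
  if now - last_run > threshold
    "CRITICAL: Last successful Puppet run was #{utc.strftime('%a %d %b %Y %H:%M:%S UTC')}"
  else
    "OK: puppet ran at #{utc.strftime('%a %b %d %H:%M:%S UTC %Y')}"
  end
end

now = Time.parse('2014-08-13 19:40:05 UTC')
puts puppet_freshness(Time.parse('2014-08-13 17:39:49 UTC'), now) # stale by 2h00m16s
puts puppet_freshness(Time.parse('2014-08-13 19:39:58 UTC'), now) # fresh
```

Folding the disabled-agent case into the same check means one alert per host instead of two, which matches how the log describes the enhanced puppet_freshness check.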
[19:43:05] PROBLEM - Puppet freshness on mw1197 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 17:41:55 UTC [19:43:05] RECOVERY - Puppet freshness on wtp1006 is OK: puppet ran at Wed Aug 13 19:43:01 UTC 2014 [19:43:15] RECOVERY - Puppet freshness on mw1197 is OK: puppet ran at Wed Aug 13 19:43:06 UTC 2014 [19:43:15] RECOVERY - Puppet freshness on analytics1041 is OK: puppet ran at Wed Aug 13 19:43:06 UTC 2014 [19:43:15] RECOVERY - Puppet freshness on elastic1007 is OK: puppet ran at Wed Aug 13 19:43:11 UTC 2014 [19:43:15] RECOVERY - Puppet freshness on ssl1002 is OK: puppet ran at Wed Aug 13 19:43:11 UTC 2014 [19:43:25] RECOVERY - Puppet freshness on mw1082 is OK: puppet ran at Wed Aug 13 19:43:16 UTC 2014 [19:43:26] RECOVERY - Puppet freshness on ms-be1003 is OK: puppet ran at Wed Aug 13 19:43:21 UTC 2014 [19:43:26] RECOVERY - Puppet freshness on mw1174 is OK: puppet ran at Wed Aug 13 19:43:21 UTC 2014 [19:43:26] RECOVERY - Puppet freshness on tridge is OK: puppet ran at Wed Aug 13 19:43:21 UTC 2014 [19:43:26] RECOVERY - Puppet freshness on mw1187 is OK: puppet ran at Wed Aug 13 19:43:21 UTC 2014 [19:43:35] RECOVERY - Puppet freshness on elastic1004 is OK: puppet ran at Wed Aug 13 19:43:26 UTC 2014 [19:43:35] RECOVERY - Puppet freshness on wtp1020 is OK: puppet ran at Wed Aug 13 19:43:26 UTC 2014 [19:43:35] RECOVERY - Puppet freshness on potassium is OK: puppet ran at Wed Aug 13 19:43:26 UTC 2014 [19:43:35] RECOVERY - Puppet freshness on mc1006 is OK: puppet ran at Wed Aug 13 19:43:26 UTC 2014 [19:43:35] RECOVERY - Puppet freshness on db1066 is OK: puppet ran at Wed Aug 13 19:43:26 UTC 2014 [19:43:36] RECOVERY - Puppet freshness on amssq32 is OK: puppet ran at Wed Aug 13 19:43:26 UTC 2014 [19:43:36] RECOVERY - Puppet freshness on db1073 is OK: puppet ran at Wed Aug 13 19:43:31 UTC 2014 [19:43:37] RECOVERY - Puppet freshness on ms-be1006 is OK: puppet ran at Wed Aug 13 19:43:31 UTC 2014 [19:43:37] RECOVERY - Puppet freshness on elastic1001 is OK: 
puppet ran at Wed Aug 13 19:43:31 UTC 2014 [19:43:38] RECOVERY - Puppet freshness on cp1047 is OK: puppet ran at Wed Aug 13 19:43:32 UTC 2014 [19:43:38] RECOVERY - Puppet freshness on amssq53 is OK: puppet ran at Wed Aug 13 19:43:32 UTC 2014 [19:43:45] RECOVERY - Puppet freshness on rbf1002 is OK: puppet ran at Wed Aug 13 19:43:37 UTC 2014 [19:43:45] RECOVERY - Puppet freshness on db74 is OK: puppet ran at Wed Aug 13 19:43:37 UTC 2014 [19:43:45] RECOVERY - Puppet freshness on analytics1035 is OK: puppet ran at Wed Aug 13 19:43:37 UTC 2014 [19:43:45] RECOVERY - Puppet freshness on cp1039 is OK: puppet ran at Wed Aug 13 19:43:37 UTC 2014 [19:43:45] RECOVERY - Puppet freshness on mw1160 is OK: puppet ran at Wed Aug 13 19:43:37 UTC 2014 [19:43:46] RECOVERY - Puppet freshness on analytics1025 is OK: puppet ran at Wed Aug 13 19:43:37 UTC 2014 [19:43:46] RECOVERY - Puppet freshness on search1016 is OK: puppet ran at Wed Aug 13 19:43:37 UTC 2014 [19:43:47] RECOVERY - Puppet freshness on db1031 is OK: puppet ran at Wed Aug 13 19:43:42 UTC 2014 [19:43:47] RECOVERY - Puppet freshness on lead is OK: puppet ran at Wed Aug 13 19:43:42 UTC 2014 [19:43:48] RECOVERY - Puppet freshness on analytics1040 is OK: puppet ran at Wed Aug 13 19:43:42 UTC 2014 [19:43:48] RECOVERY - Puppet freshness on es1008 is OK: puppet ran at Wed Aug 13 19:43:42 UTC 2014 [19:43:49] RECOVERY - Puppet freshness on db1050 is OK: puppet ran at Wed Aug 13 19:43:42 UTC 2014 [19:43:55] RECOVERY - Puppet freshness on cp3003 is OK: puppet ran at Wed Aug 13 19:43:47 UTC 2014 [19:43:56] RECOVERY - Puppet freshness on ms1004 is OK: puppet ran at Wed Aug 13 19:43:47 UTC 2014 [19:43:56] RECOVERY - Puppet freshness on amssq61 is OK: puppet ran at Wed Aug 13 19:43:47 UTC 2014 [19:43:56] RECOVERY - Puppet freshness on search1010 is OK: puppet ran at Wed Aug 13 19:43:52 UTC 2014 [19:43:56] RECOVERY - Puppet freshness on mw1217 is OK: puppet ran at Wed Aug 13 19:43:52 UTC 2014 [19:43:56] RECOVERY - Puppet freshness on 
mw1088 is OK: puppet ran at Wed Aug 13 19:43:52 UTC 2014 [19:43:56] RECOVERY - Puppet freshness on mw1026 is OK: puppet ran at Wed Aug 13 19:43:52 UTC 2014 [19:43:57] RECOVERY - Puppet freshness on elastic1008 is OK: puppet ran at Wed Aug 13 19:43:52 UTC 2014 [19:43:57] RECOVERY - Puppet freshness on lvs3001 is OK: puppet ran at Wed Aug 13 19:43:52 UTC 2014 [19:44:02] icinga spaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaam [19:44:05] RECOVERY - Puppet freshness on snapshot1003 is OK: puppet ran at Wed Aug 13 19:43:57 UTC 2014 [19:44:05] RECOVERY - Puppet freshness on mw1069 is OK: puppet ran at Wed Aug 13 19:43:57 UTC 2014 [19:44:05] RECOVERY - Puppet freshness on wtp1016 is OK: puppet ran at Wed Aug 13 19:43:57 UTC 2014 [19:44:05] RECOVERY - Puppet freshness on db1022 is OK: puppet ran at Wed Aug 13 19:44:02 UTC 2014 [19:44:15] RECOVERY - Puppet freshness on mw1003 is OK: puppet ran at Wed Aug 13 19:44:07 UTC 2014 [19:44:15] RECOVERY - Puppet freshness on mw1068 is OK: puppet ran at Wed Aug 13 19:44:12 UTC 2014 [19:44:16] RECOVERY - Puppet freshness on mw1164 is OK: puppet ran at Wed Aug 13 19:44:12 UTC 2014 [19:44:16] RECOVERY - Puppet freshness on lvs1005 is OK: puppet ran at Wed Aug 13 19:44:12 UTC 2014 [19:44:16] RECOVERY - Puppet freshness on mw1117 is OK: puppet ran at Wed Aug 13 19:44:12 UTC 2014 [19:44:16] RECOVERY - Puppet freshness on mw1153 is OK: puppet ran at Wed Aug 13 19:44:12 UTC 2014 [19:44:16] RECOVERY - Puppet freshness on mw1205 is OK: puppet ran at Wed Aug 13 19:44:12 UTC 2014 [19:44:17] RECOVERY - Puppet freshness on mc1002 is OK: puppet ran at Wed Aug 13 19:44:12 UTC 2014 [19:44:17] RECOVERY - Puppet freshness on ms-fe1001 is OK: puppet ran at Wed Aug 13 19:44:12 UTC 2014 [19:44:25] RECOVERY - Puppet freshness on mc1003 is OK: puppet ran at Wed Aug 13 19:44:17 UTC 2014 [19:44:25] RECOVERY - Puppet freshness on elastic1012 is OK: puppet ran at Wed Aug 13 19:44:17 UTC 2014 [19:44:25] RECOVERY - Puppet freshness on amssq35 is OK: puppet ran 
at Wed Aug 13 19:44:17 UTC 2014 [19:44:25] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Aug 13 19:44:17 UTC 2014 [19:44:25] RECOVERY - Puppet freshness on mw1150 is OK: puppet ran at Wed Aug 13 19:44:22 UTC 2014 [19:44:26] RECOVERY - Puppet freshness on cp3016 is OK: puppet ran at Wed Aug 13 19:44:22 UTC 2014 [19:44:26] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Wed Aug 13 19:44:22 UTC 2014 [19:44:27] RECOVERY - Puppet freshness on mw1060 is OK: puppet ran at Wed Aug 13 19:44:22 UTC 2014 [19:44:35] RECOVERY - Puppet freshness on mw1176 is OK: puppet ran at Wed Aug 13 19:44:27 UTC 2014 [19:44:45] RECOVERY - Puppet freshness on mw1120 is OK: puppet ran at Wed Aug 13 19:44:38 UTC 2014 [19:44:45] RECOVERY - Puppet freshness on db1002 is OK: puppet ran at Wed Aug 13 19:44:38 UTC 2014 [19:44:46] RECOVERY - Puppet freshness on mw1009 is OK: puppet ran at Wed Aug 13 19:44:43 UTC 2014 [19:44:46] RECOVERY - Puppet freshness on iron is OK: puppet ran at Wed Aug 13 19:44:43 UTC 2014 [19:44:55] RECOVERY - Puppet freshness on db1015 is OK: puppet ran at Wed Aug 13 19:44:48 UTC 2014 [19:44:55] RECOVERY - Puppet freshness on analytics1030 is OK: puppet ran at Wed Aug 13 19:44:48 UTC 2014 [19:44:55] RECOVERY - Puppet freshness on cp1056 is OK: puppet ran at Wed Aug 13 19:44:48 UTC 2014 [19:44:56] RECOVERY - Puppet freshness on mw1166 is OK: puppet ran at Wed Aug 13 19:44:53 UTC 2014 [19:45:05] RECOVERY - Puppet freshness on mw1189 is OK: puppet ran at Wed Aug 13 19:44:58 UTC 2014 [19:45:05] RECOVERY - Puppet freshness on mw1100 is OK: puppet ran at Wed Aug 13 19:44:58 UTC 2014 [19:45:05] RECOVERY - Puppet freshness on mw1173 is OK: puppet ran at Wed Aug 13 19:44:58 UTC 2014 [19:45:05] RECOVERY - Puppet freshness on mw1099 is OK: puppet ran at Wed Aug 13 19:44:58 UTC 2014 [19:45:05] RECOVERY - Puppet freshness on mw1144 is OK: puppet ran at Wed Aug 13 19:44:58 UTC 2014 [19:45:06] RECOVERY - Puppet freshness on cp3014 is OK: puppet ran at Wed Aug 13 
19:44:58 UTC 2014 [19:45:16] RECOVERY - Puppet freshness on ms-fe1004 is OK: puppet ran at Wed Aug 13 19:45:08 UTC 2014 [19:45:25] RECOVERY - Puppet freshness on cp4008 is OK: puppet ran at Wed Aug 13 19:45:18 UTC 2014 [19:45:25] RECOVERY - Puppet freshness on cp4003 is OK: puppet ran at Wed Aug 13 19:45:18 UTC 2014 [19:45:25] RECOVERY - Puppet freshness on mw1042 is OK: puppet ran at Wed Aug 13 19:45:18 UTC 2014 [19:45:35] RECOVERY - Puppet freshness on mw1025 is OK: puppet ran at Wed Aug 13 19:45:28 UTC 2014 [19:45:36] RECOVERY - Puppet freshness on mw1061 is OK: puppet ran at Wed Aug 13 19:45:33 UTC 2014 [19:45:36] RECOVERY - Puppet freshness on mw1052 is OK: puppet ran at Wed Aug 13 19:45:33 UTC 2014 [19:45:45] RECOVERY - Puppet freshness on cp1061 is OK: puppet ran at Wed Aug 13 19:45:38 UTC 2014 [19:45:45] RECOVERY - Puppet freshness on db1018 is OK: puppet ran at Wed Aug 13 19:45:38 UTC 2014 [19:45:45] RECOVERY - Puppet freshness on search1018 is OK: puppet ran at Wed Aug 13 19:45:38 UTC 2014 [19:45:46] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Wed Aug 13 19:45:43 UTC 2014 [19:45:46] RECOVERY - Puppet freshness on labsdb1003 is OK: puppet ran at Wed Aug 13 19:45:43 UTC 2014 [19:45:46] RECOVERY - Puppet freshness on mw1065 is OK: puppet ran at Wed Aug 13 19:45:43 UTC 2014 [19:45:55] RECOVERY - Puppet freshness on mw1213 is OK: puppet ran at Wed Aug 13 19:45:48 UTC 2014 [19:45:55] RECOVERY - Puppet freshness on db1059 is OK: puppet ran at Wed Aug 13 19:45:48 UTC 2014 [19:45:55] RECOVERY - Puppet freshness on mw1118 is OK: puppet ran at Wed Aug 13 19:45:48 UTC 2014 [19:45:55] RECOVERY - Puppet freshness on holmium is OK: puppet ran at Wed Aug 13 19:45:48 UTC 2014 [19:45:56] RECOVERY - Puppet freshness on amslvs1 is OK: puppet ran at Wed Aug 13 19:45:53 UTC 2014 [19:45:56] RECOVERY - Puppet freshness on search1001 is OK: puppet ran at Wed Aug 13 19:45:53 UTC 2014 [19:45:56] RECOVERY - Puppet freshness on analytics1010 is OK: puppet ran at Wed 
Aug 13 19:45:53 UTC 2014 [19:45:57] RECOVERY - Puppet freshness on db1046 is OK: puppet ran at Wed Aug 13 19:45:53 UTC 2014 [19:46:05] RECOVERY - Puppet freshness on db1028 is OK: puppet ran at Wed Aug 13 19:45:58 UTC 2014 [19:46:05] RECOVERY - Puppet freshness on db1051 is OK: puppet ran at Wed Aug 13 19:45:58 UTC 2014 [19:46:05] RECOVERY - Puppet freshness on db1023 is OK: puppet ran at Wed Aug 13 19:45:58 UTC 2014 [19:46:06] RECOVERY - Puppet freshness on db1067 is OK: puppet ran at Wed Aug 13 19:46:03 UTC 2014 [19:46:06] RECOVERY - Puppet freshness on db1021 is OK: puppet ran at Wed Aug 13 19:46:03 UTC 2014 [19:46:06] RECOVERY - Puppet freshness on db1042 is OK: puppet ran at Wed Aug 13 19:46:03 UTC 2014 [19:46:06] RECOVERY - Puppet freshness on amssq55 is OK: puppet ran at Wed Aug 13 19:46:03 UTC 2014 [19:46:07] RECOVERY - Puppet freshness on dbproxy1001 is OK: puppet ran at Wed Aug 13 19:46:03 UTC 2014 [19:46:07] RECOVERY - Puppet freshness on mw1092 is OK: puppet ran at Wed Aug 13 19:46:04 UTC 2014 [19:46:08] RECOVERY - Puppet freshness on cp4004 is OK: puppet ran at Wed Aug 13 19:46:04 UTC 2014 [19:46:08] RECOVERY - Puppet freshness on mw1054 is OK: puppet ran at Wed Aug 13 19:46:04 UTC 2014 [19:46:15] RECOVERY - Puppet freshness on amssq48 is OK: puppet ran at Wed Aug 13 19:46:09 UTC 2014 [19:46:15] RECOVERY - Puppet freshness on search1007 is OK: puppet ran at Wed Aug 13 19:46:09 UTC 2014 [19:46:16] RECOVERY - Puppet freshness on db1034 is OK: puppet ran at Wed Aug 13 19:46:09 UTC 2014 [19:46:16] RECOVERY - Puppet freshness on mw1170 is OK: puppet ran at Wed Aug 13 19:46:09 UTC 2014 [19:46:16] RECOVERY - Puppet freshness on amssq60 is OK: puppet ran at Wed Aug 13 19:46:09 UTC 2014 [19:46:16] RECOVERY - Puppet freshness on mw1123 is OK: puppet ran at Wed Aug 13 19:46:14 UTC 2014 [19:46:16] RECOVERY - Puppet freshness on logstash1002 is OK: puppet ran at Wed Aug 13 19:46:14 UTC 2014 [19:46:25] RECOVERY - Puppet freshness on mw1119 is OK: puppet ran at Wed 
Aug 13 19:46:19 UTC 2014 [19:46:25] RECOVERY - Puppet freshness on mw1039 is OK: puppet ran at Wed Aug 13 19:46:19 UTC 2014 [19:46:25] RECOVERY - Puppet freshness on mw1211 is OK: puppet ran at Wed Aug 13 19:46:19 UTC 2014 [19:46:25] RECOVERY - Puppet freshness on amssq46 is OK: puppet ran at Wed Aug 13 19:46:19 UTC 2014 [19:46:26] RECOVERY - Puppet freshness on mw1011 is OK: puppet ran at Wed Aug 13 19:46:19 UTC 2014 [19:46:26] RECOVERY - Puppet freshness on mw1002 is OK: puppet ran at Wed Aug 13 19:46:24 UTC 2014 [19:46:26] RECOVERY - Puppet freshness on mw1126 is OK: puppet ran at Wed Aug 13 19:46:24 UTC 2014 [19:46:35] RECOVERY - Puppet freshness on mw1177 is OK: puppet ran at Wed Aug 13 19:46:29 UTC 2014 [19:46:35] RECOVERY - Puppet freshness on mw1175 is OK: puppet ran at Wed Aug 13 19:46:29 UTC 2014 [19:46:35] RECOVERY - Puppet freshness on mw1114 is OK: puppet ran at Wed Aug 13 19:46:29 UTC 2014 [19:46:36] RECOVERY - Puppet freshness on ms-be3003 is OK: puppet ran at Wed Aug 13 19:46:34 UTC 2014 [19:46:36] RECOVERY - Puppet freshness on mw1129 is OK: puppet ran at Wed Aug 13 19:46:34 UTC 2014 [19:46:36] RECOVERY - Puppet freshness on cp4014 is OK: puppet ran at Wed Aug 13 19:46:34 UTC 2014 [19:46:45] RECOVERY - Puppet freshness on mc1012 is OK: puppet ran at Wed Aug 13 19:46:39 UTC 2014 [19:46:46] RECOVERY - Puppet freshness on amssq47 is OK: puppet ran at Wed Aug 13 19:46:44 UTC 2014 [19:46:46] RECOVERY - Puppet freshness on wtp1005 is OK: puppet ran at Wed Aug 13 19:46:44 UTC 2014 [19:47:05] RECOVERY - Puppet freshness on virt1006 is OK: puppet ran at Wed Aug 13 19:46:54 UTC 2014 [19:47:05] RECOVERY - Puppet freshness on gallium is OK: puppet ran at Wed Aug 13 19:46:59 UTC 2014 [19:47:05] RECOVERY - Puppet freshness on nescio is OK: puppet ran at Wed Aug 13 19:46:59 UTC 2014 [19:47:15] RECOVERY - Puppet freshness on dataset1001 is OK: puppet ran at Wed Aug 13 19:47:04 UTC 2014 [19:47:15] RECOVERY - Puppet freshness on mw1172 is OK: puppet ran at Wed Aug 
13 19:47:04 UTC 2014
[19:47:15] RECOVERY - Puppet freshness on tin is OK: puppet ran at Wed Aug 13 19:47:04 UTC 2014
[19:47:15] RECOVERY - Puppet freshness on cp4019 is OK: puppet ran at Wed Aug 13 19:47:10 UTC 2014
[19:47:15] RECOVERY - Puppet freshness on pc1002 is OK: puppet ran at Wed Aug 13 19:47:10 UTC 2014
[19:47:16] RECOVERY - Puppet freshness on db1043 is OK: puppet ran at Wed Aug 13 19:47:10 UTC 2014
[19:47:25] RECOVERY - Puppet freshness on db1052 is OK: puppet ran at Wed Aug 13 19:47:15 UTC 2014
[19:47:25] RECOVERY - Puppet freshness on analytics1016 is OK: puppet ran at Wed Aug 13 19:47:20 UTC 2014
[19:47:25] RECOVERY - Puppet freshness on mw1162 is OK: puppet ran at Wed Aug 13 19:47:20 UTC 2014
[19:47:25] RECOVERY - Puppet freshness on labnet1001 is OK: puppet ran at Wed Aug 13 19:47:20 UTC 2014
[19:47:29] (PS2) Ottomata: Lowering celery concurrency and removing MAX_PARALLEL_RUN [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/153390 (owner: Nuria)
[19:47:35] RECOVERY - Puppet freshness on mw1076 is OK: puppet ran at Wed Aug 13 19:47:25 UTC 2014
[19:47:35] RECOVERY - Puppet freshness on cp1058 is OK: puppet ran at Wed Aug 13 19:47:30 UTC 2014
[19:47:45] RECOVERY - Puppet freshness on db1016 is OK: puppet ran at Wed Aug 13 19:47:35 UTC 2014
[19:47:45] RECOVERY - Puppet freshness on mw1044 is OK: puppet ran at Wed Aug 13 19:47:35 UTC 2014
[19:47:45] RECOVERY - Puppet freshness on wtp1012 is OK: puppet ran at Wed Aug 13 19:47:40 UTC 2014
[19:47:45] RECOVERY - Puppet freshness on polonium is OK: puppet ran at Wed Aug 13 19:47:40 UTC 2014
[19:47:55] RECOVERY - Puppet freshness on es1007 is OK: puppet ran at Wed Aug 13 19:47:45 UTC 2014
[19:47:55] RECOVERY - Puppet freshness on ruthenium is OK: puppet ran at Wed Aug 13 19:47:45 UTC 2014
[19:47:55] RECOVERY - Puppet freshness on silver is OK: puppet ran at Wed Aug 13 19:47:50 UTC 2014
[19:47:55] RECOVERY - Puppet freshness on lvs4003 is OK: puppet ran at Wed Aug 13 19:47:50 UTC 2014
[19:48:05] RECOVERY - Puppet freshness on mw1055 is OK: puppet ran at Wed Aug 13 19:48:00 UTC 2014
[19:48:28] (CR) Ottomata: [C: 2 V: 2] Lowering celery concurrency and removing MAX_PARALLEL_RUN [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/153390 (owner: Nuria)
[19:52:45] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Wed Aug 13 19:52:44 UTC 2014
[19:55:13] (PS17) Withoutaname: Delete ve.wikimedia.org and leave redirect [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131907 (https://bugzilla.wikimedia.org/55737)
[20:00:05] gwicke, subbu, cscott: Respected human, time to deploy Parsoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140813T2000). Please do the needful.
[20:00:15] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Wed Aug 13 20:00:05 UTC 2014
[20:00:16] (PS2) Withoutaname: Add "eliminator" group to viwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149637 (https://bugzilla.wikimedia.org/68612)
[20:00:45] jouncebot, greg-g: don't think we're going to deploy today
[20:00:53] kk
[20:08:07] andrewbogott: http://puppet-compiler.wmflabs.org/212/change/153866/html/
[20:09:15] mutante: gerrit link?
[20:09:26] https://gerrit.wikimedia.org/r/#/c/153870/
[20:11:22] unless there is labs of labs
[20:11:37] since i am now using ${webserver_hostname}.erb for the template name
[20:11:49] and default => "www.${controller_hostname}",
[20:12:18] seemed better though to not hardcode value
[20:13:12] eh, i meant default => $controller_hostname, www is the alias
[20:13:50] (PS2) Ori.livneh: mediawiki: create common-local directory [operations/puppet] - https://gerrit.wikimedia.org/r/153807
[20:16:40] greg-g: confirmed with rest of parsoid team (those not on vacation, at least) -- we are definitely not deploying today
[20:17:10] i should probably do some OCG deploying, though. would that be a good use of the window?
[20:18:11] (CR) Giuseppe Lavagetto: [C: 1] "LOL. +2 for the best commit message of the week." [operations/puppet] - https://gerrit.wikimedia.org/r/153807 (owner: Ori.livneh)
[20:19:02] <_joe_> ori: I was sad when I saw your patch, because I knew what it was about, but your commit message made my evening.
[20:20:30] (PS1) Dzahn: gdash - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153883
[20:20:38] heh
[20:20:42] :)
[20:21:23] (CR) Ori.livneh: [C: 2] mediawiki: create common-local directory [operations/puppet] - https://gerrit.wikimedia.org/r/153807 (owner: Ori.livneh)
[20:22:34] (CR) Ori.livneh: [C: 1] gdash - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153883 (owner: Dzahn)
[20:22:43] mutante: there is labs of labs but it's too different for this patch to teach us anything
[20:23:23] (CR) Andrew Bogott: [C: 2] "Seems simple enough... let's see!" [operations/puppet] - https://gerrit.wikimedia.org/r/153870 (owner: Dzahn)
[20:24:00] (PS4) Ori.livneh: wmflib: add validate_ensure() [operations/puppet] - https://gerrit.wikimedia.org/r/153586
[20:24:05] (PS1) Dzahn: kibana - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153884
[20:25:29] andrewbogott: :) the one alternative i thought of was making the 'Template[...' the requirement vs. File
[20:25:38] (CR) Ori.livneh: [C: 2 V: 2] wmflib: add validate_ensure() [operations/puppet] - https://gerrit.wikimedia.org/r/153586 (owner: Ori.livneh)
[20:25:56] have not looked to closely at ports-wikitech.conf yet
[20:26:12] dammit! Error: Failed to apply catalog: Could not find dependency File[/etc/apache2/sites-available/wikitech.wikimedia.org] for File[/etc/apache2/conf.d/ports-wikitech.conf] at /etc/puppet/manifests/openstack.pp:417
[20:26:28] argg..
[20:26:35] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Epic puppet fail
[20:26:37] (CR) Ori.livneh: [C: 1] kibana - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153884 (owner: Dzahn)
[20:26:37] and i even used the compiler :p
[20:26:50] It's not an emergency, we can fix...
[20:27:21] do we really need that requirement? (anymore)?
[20:27:50] if file { '/etc/apache2/conf.d/ports-wikitech.conf': exists before the site.. does it matter?
[20:28:30] all it has is
[20:28:32] Listen 80
[20:28:33] Listen 443
[20:28:46] i wonder if that entire file should even be there
[20:29:03] mutante: you can use apache::conf for that
[20:29:09] it will take care of ordering and dependencies for you
[20:29:22] that sounds much better :)
[20:29:41] apache::conf { 'wikitech_ports': content => "Listen 443\n", }
[20:29:52] (80 is default, so no need to redeclare)
[20:30:01] so, mutante, you're on top of that?
[20:30:06] * andrewbogott avoids writing a competing patch
[20:30:19] yes
[20:30:47] cscott: sure
[20:33:38] (PS1) Dzahn: openstack - use apache::conf for port [operations/puppet] - https://gerrit.wikimedia.org/r/153888
[20:34:04] ori: thanks! like so?
[20:34:32] (CR) Ori.livneh: "Don't squish it on one line (I just did that for IRC). Also, since it only has 443, "ports" (plural) is inaccurate. Maybe 'wikitech_ssl_po" [operations/puppet] - https://gerrit.wikimedia.org/r/153888 (owner: Dzahn)
[20:34:46] yes, with my usual fussiness ;)
[20:35:55] (PS6) Ori.livneh: wmflib: add ordered_yaml() [operations/puppet] - https://gerrit.wikimedia.org/r/149775
[20:36:22] (PS2) Dzahn: openstack - use apache::conf for port [operations/puppet] - https://gerrit.wikimedia.org/r/153888
[20:37:12] (CR) Ori.livneh: [C: 1] openstack - use apache::conf for port [operations/puppet] - https://gerrit.wikimedia.org/r/153888 (owner: Dzahn)
[20:37:20] (PS3) Dzahn: openstack - use apache::conf for port [operations/puppet] - https://gerrit.wikimedia.org/r/153888
[20:37:37] (CR) Ori.livneh: [C: 2 V: 2] wmflib: add ordered_yaml() [operations/puppet] - https://gerrit.wikimedia.org/r/149775 (owner: Ori.livneh)
[20:37:48] (CR) Dzahn: [C: 1] "done, done, and fixed quoting" [operations/puppet] - https://gerrit.wikimedia.org/r/153888 (owner: Dzahn)
[20:39:10] (CR) Dzahn: [C: 2] openstack - use apache::conf for port [operations/puppet] - https://gerrit.wikimedia.org/r/153888 (owner: Dzahn)
[20:39:16] PROBLEM - puppet last run on virt0 is CRITICAL: CRITICAL: Epic puppet fail
[20:40:01] ori: like https://gerrit.wikimedia.org/r/#/c/153888/ ok?
[20:40:22] should be double quotes
[20:40:28] because of \n
[20:40:31] but otherwise yeah
[20:41:01] \n, really?
[20:41:42] oh, double quotes for that, just thought about variables
[20:42:03] I don't follow why \n would be double quotes. It's just another character isn't it?
[20:42:12] do single-quotes stop \escaping?
[20:42:26] because ruby
[20:42:28] irb(main):001:0> puts "hello\n"
[20:42:29] hello
[20:42:29] => nil
[20:42:30] irb(main):002:0> puts 'hello\n'
[20:42:32] hello\n
[20:42:34] => nil
[20:42:39] different rules for escaping
[20:42:41] ok then :(
[20:42:46] ugh.. ok
[20:42:48] just like in php…
[20:43:01] yes, php too, right
[20:43:35] (PS4) Dzahn: openstack - use apache::conf for port [operations/puppet] - https://gerrit.wikimedia.org/r/153888
[20:43:36] it's too prevalent to be completely senseless, so i imagine (or would like to imagine) that there exists some nominally convincing explanation
[20:43:43] but i just know that that's how it is, not why
[20:44:08] (CR) Ori.livneh: [C: 1] openstack - use apache::conf for port [operations/puppet] - https://gerrit.wikimedia.org/r/153888 (owner: Dzahn)
[20:46:32] (CR) Dzahn: [C: 2] kibana - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153884 (owner: Dzahn)
[20:46:39] (CR) Ori.livneh: [C: -1] "Use apache::conf for the ports, like you did in the openstack patch" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153835 (owner: Dzahn)
[20:47:43] (CR) Ori.livneh: [C: 1] ishmael - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153846 (owner: Dzahn)
[20:48:07] mutante: what do you think of the apache module? it's pretty painless, no?
[20:49:13] it makes the code nicer to look at, i do like apache::site vs. just some file {} and slightly different dependencies everywhere, yes
[20:49:20] we just have to go through all of them once
[20:49:30] yes, thanks for doing that
[20:49:36] and for fixing various glitches along the way
[20:49:50] <_joe_> ori: describing something as "pretty painless" is not good marketing!
[20:49:51] you're welcome, thanks for reviews
[20:50:03] haha @ marketing :)
[20:50:04] <_joe_> mutante: isn't the apache module AWESOME?
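
The quoting rule demonstrated with irb above can be checked directly. A minimal Ruby sketch: backslash escapes such as \n are only expanded inside double-quoted strings, which is why the apache::conf content needs "Listen 443\n" rather than 'Listen 443\n'. Puppet's own string rules behave the same way for this case: inside single quotes only \' and \\ are treated specially.

```ruby
# Backslash escapes such as \n are interpreted only inside double quotes.
double = "Listen 443\n"
single = 'Listen 443\n'

puts double.length            # 11: "Listen 443" followed by one newline
puts single.length            # 12: "Listen 443" followed by "\" and "n"
puts double.end_with?("\n")   # true
puts single.end_with?("\n")   # false

# In single quotes only \' and \\ are special.
puts 'it\'s a \\ backslash'   # it's a \ backslash
```

With single quotes, the generated config file would end in the literal characters \n instead of a line break.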
[20:50:15] (CR) Andrew Bogott: [C: 2] openstack - use apache::conf for port [operations/puppet] - https://gerrit.wikimedia.org/r/153888 (owner: Dzahn)
[20:50:19] <_joe_> :)
[20:50:35] :) just give me a moment to get adjusted:)
[20:50:50] in north korea I bet 'pretty painless' would get them lined up :)
[20:51:04] heheh
[20:51:09] <_joe_> eheh
[20:51:25] <_joe_> and ops is clearly the north korea of computer science
[20:51:29] <_joe_> agreed
[20:51:42] <_joe_> or at least, ops + puppet is
[20:52:58] the emojis in the repo are probably steganography
[20:53:35] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[20:53:37] woo
[20:53:49] andrewbogott: worked?
[20:53:51] (CR) Dzahn: [C: 2] gdash - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153883 (owner: Dzahn)
[20:53:53] * andrewbogott remembers to check if wikitech still loads
[20:54:02] :) wfm
[20:54:12] yep, seems good
[20:54:15] nice
[20:54:20] "Beware of bugs in the above code; I have only proved it correct, not tried it"
[20:54:55] Why did the compiler not notice that reference to an undefined file?
[20:55:06] <_joe_> andrewbogott: uh?
[20:55:21] yea, i had http://puppet-compiler.wmflabs.org/212/change/153866/html/
[20:55:27] <_joe_> did the manifest compile?
[20:55:40] <_joe_> the execution may fail as well
[20:55:44] <_joe_> say you have a link
[20:55:51] <_joe_> to a file not managed by puppet
[20:56:00] http://puppet-compiler.wmflabs.org/212/change/153866/compiled/puppet_catalogs_3_153866/
[20:56:00] <_joe_> the manifest will compile
[20:56:35] <_joe_> can I know what was broken?
[20:56:48] ah, let's do the "variable access .. is deprecated" as well
[20:57:13] _joe_: 13:28 < andrewbogott> dammit! Error: Failed to apply catalog: Could not find dependency File[/etc/apache2/sites-available/wikitech.wikimedia.org] for File[/etc/apache2/conf.d/ports-wikitech.conf] at /etc/puppet/manifests/openstack.pp:417
[20:57:16] RECOVERY - puppet last run on virt0 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[20:57:25] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:57:32] uh?
[20:57:53] _joe_: that error above was not detected by the compiler
[20:58:04] <_joe_> yes it's an apply error
[20:58:07] the "uh?" was about the new icinga-wm problem though
[20:58:46] mutante: yeah, looks like we clobbered the puppet apache config. Looking...
[20:59:08] andrewbogott: it works for me though, https://virt1000.wikimedia.org/
[20:59:26] mutante: it's the puppetmaster though -- different port I think?
[20:59:41] oh, puppetmaster, true
[21:01:07] may be a false alarm, I'm getting a good puppet run on a labs instance...
[21:01:22] we should make another change
[21:01:29] that also uses the new method for the puppetmaster confi
[21:01:56] the load order changed
[21:02:02] because wikitech is now 50-wikitech
[21:02:21] ok… load order shouldn't matter though, should it?
[21:02:22] greg-g: just finished building the wmf-deploy repo for OCG. is it still ok to go ahead and git-deploy? (i think i'm at the end of the parsoid window)
[21:02:28] but it should not be a socket timeout anyways
[21:03:19] cscott: yeah, nothing until swat anyway
[21:03:25] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[21:03:25] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[21:03:36] andrewbogott: did you graceful? any errors?
[21:04:05] greg-g: ok, thanks.
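
_joe_'s point that this was an apply error rather than a compile error can be illustrated with a toy check for dangling relationships. The sketch below models a catalog as a plain Ruby hash from a resource reference to the references it requires, and lists every requirement whose target is not declared. The broken dependency is copied from andrewbogott's error message; the Service[apache2] entry is a hypothetical neighbour added for contrast. This is not Puppet's internal data model or API.

```ruby
# Hypothetical, simplified catalog: each resource reference maps to the
# references it requires. Not Puppet's real data model.
def dangling_dependencies(catalog)
  catalog.flat_map do |ref, requires|
    # Keep only requirements whose target is not declared in the catalog.
    requires.reject { |dep| catalog.key?(dep) }.map { |dep| [ref, dep] }
  end
end

catalog = {
  # Taken from the error message in the log.
  'File[/etc/apache2/conf.d/ports-wikitech.conf]' =>
    ['File[/etc/apache2/sites-available/wikitech.wikimedia.org]'],
  # Hypothetical resource whose requirement is satisfied, for contrast.
  'Service[apache2]' => ['File[/etc/apache2/conf.d/ports-wikitech.conf]']
}

dangling_dependencies(catalog).each do |ref, dep|
  puts "Could not find dependency #{dep} for #{ref}"
end
```

Running a check like this against the compiled catalog would have flagged the renamed site file before the agent tried to apply it.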
[21:04:18] mutante: I don't know what you mean by 'did you graceful'
[21:04:23] So I'm guessing the answer is no :)
[21:04:39] And, my labs puppet run is hanging...
[21:05:51] let me try something..arg
[21:06:27] now it broke.. sigh
[21:06:38] * Starting web server apache2 (98)Address already in use: make_sock: could not bind to address [::]:443
[21:06:52] shit shit shit
[21:07:00] um… I'm going to revoke everything by hand
[21:07:20] let me move the old config file
[21:07:27] but i dont want to conflict with you
[21:07:39] please stand back
[21:07:39] i say move the wikitech.wikimedia.org file
[21:07:42] and leave the 50-
[21:07:43] ok
[21:07:45] PROBLEM - HTTP on virt1000 is CRITICAL: Connection refused
[21:08:07] andrewbogott: it probably didn't kill apache fully
[21:08:13] do a full stop, kill any lingering apaches
[21:08:13] well, dammit, I backed all this up
[21:08:18] but restoring it isn't helping
[21:08:22] <_joe_> andrewbogott: keep calm
[21:08:26] you probably have lingering apache processes
[21:08:34] that won't release the port
[21:08:37] <_joe_> don't revoke anything
[21:08:47] <_joe_> lemme take a look
[21:08:49] nope
[21:08:52] oh, hm, wikitech just went down. i didn't do it!
[21:08:55] no lingering apaches.
[21:09:02] remove the duplicate configs first
[21:09:13] just leave the newly generated one and try a restart then
[21:09:35] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Epic puppet fail
[21:10:08] i'll create another patch so that 'puppetmaster' also uses the new method
[21:10:15] <_joe_> plese stand down guys
[21:10:30] I thought that the apache module cleared out duplicates?
[21:10:34] <_joe_> no please don't mess with puppetmaster configs
[21:10:43] _joe_: we're talking about virt1000
[21:10:47] which is puppetmaster for labs.
[21:10:52] <_joe_> can I know what's the status of the server?
[21:10:54] <_joe_> I know
[21:11:09] 4.0K -r--r--r-- 1 root root 3.2K Aug 13 20:52 50-wikitech-wikimedia-org.conf
[21:11:09] <_joe_> is it at the state of its last puppet run?
[21:11:13] 4.0K -rw-r--r-- 1 root root 2.4K Aug 13 21:07 wikitech.wikimedia.org
[21:11:18] <_joe_> or not?
[21:11:36] _joe_ I just cleared out sites-enabled and sites-available. I'm rerunning puppet now to regenerate configs. Then I'll restart apache and see what it does.
[21:11:52] <_joe_> no just rerun puppet and tell me please
[21:11:59] ok
[21:12:39] ok, puppet is finished running.
[21:13:35] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[21:13:55] (CR) JanZerebecki: [C: 1] gerrit - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153849 (owner: Dzahn)
[21:14:48] <_joe_> (98)Address already in use: make_sock: could not bind to address [::]:443
[21:14:53] yep
[21:14:57] <_joe_> now, this is baffling
[21:14:59] !log updated OCG to version faee29a260a96cbc0cdc0d402658b754a7425af9
[21:15:05] andrewbogott: it clears undeclared files
[21:15:05] <_joe_> lemme check a few things here
[21:15:12] if the dupes are declares, puppet won't purge them, obviously
[21:15:26] <_joe_> ./conf.d/ports-wikitech.conf:7:Listen 443
[21:15:27] <_joe_> ./conf-available/50-wikitech-https-port.conf:1:Listen 443
[21:15:31] <_joe_> this is the problem
[21:15:33] ori: Sure. At this point it's pretty clear that duplicate files weren't the problem.
[21:15:43] ah, we need to delete the file ports-wikitech.conf
[21:15:46] <_joe_> you declared to listen wtice
[21:15:51] <_joe_> no
[21:15:51] we removed it from puppet but did not ensure absent
[21:15:56] <_joe_> no.
[21:16:11] <_joe_> mutante: you should add a listen 80 somewhere if removing that file
[21:16:21] ports.conf does, no?
[21:16:23] https://gerrit.wikimedia.org/r/#/c/153888/
[21:16:33] _joe_ can you please hotfix while we discuss the proper solution?
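At this point the grep has shown two files under /etc/apache2 that each declare `Listen 443`, and the restart then failed to bind [::]:443. The Python sketch below extends that `fgrep -nir Listen /etc/apache2` check so that a repeated Listen value is flagged before anyone restarts Apache. It is illustrative only: `find_duplicate_listens` is a hypothetical helper, it takes an in-memory mapping of path to file text (an assumption made to keep it self-contained), and it compares the literal Listen argument, so `443` and `0.0.0.0:443` count as different keys even though they may overlap on a real host.

```python
import re
from collections import defaultdict

# Matches an Apache "Listen" directive at the start of a line.
# Directive names are case-insensitive; commented-out lines ("# Listen 80") do not match.
LISTEN_RE = re.compile(r"^\s*Listen\s+(\S+)", re.IGNORECASE)


def find_duplicate_listens(files):
    """Return {listen_arg: [(path, lineno), ...]} for every Listen argument
    that appears more than once across the given files.

    `files` maps a config path to its text (hypothetical input format).
    """
    seen = defaultdict(list)
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = LISTEN_RE.match(line)
            if match:
                seen[match.group(1)].append((path, lineno))
    # Keep only arguments declared more than once.
    return {arg: locs for arg, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__":
    # Sample contents loosely modelled on the grep output above.
    configs = {
        "conf.d/ports-wikitech.conf": "\n" * 6 + "Listen 443\n",
        "conf-available/50-wikitech-https-port.conf": "Listen 443\n",
        "ports.conf": "Listen 8140\n",
    }
    for arg, locations in find_duplicate_listens(configs).items():
        print(f"Listen {arg} declared {len(locations)} times: {locations}")
```

On a real host the `configs` mapping would be built by reading the files under /etc/apache2; running such a check before a hard restart would have surfaced the double `Listen 443` without taking wikitech down.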
[21:16:35] <_joe_> fgrep -nir Listen /etc/apache2
[21:16:43] ori said 80 is there per default
[21:17:08] Reedy: You're on the wikitech fail atm?
[21:17:16] Nope
[21:17:22] I think _joe_, andrewbogott and mutante are
[21:17:23] Coren: _joe_, mutante, andrewbogott and i are; s'okay
[21:17:24] Coren: we're on it.
[21:17:26] <_joe_> done.
[21:17:30] thanks
[21:17:40] (CR) JanZerebecki: [C: -1] wikistats - use apache::site (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153828 (owner: Dzahn)
[21:17:40] <_joe_> Coren: wikitech should be back
[21:17:41] it's back
[21:17:41] So...
[21:17:45] RECOVERY - HTTP on virt1000 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 0.003 second response time
[21:17:47] <_joe_> andrewbogott: stop puppet
[21:17:51] so, missing Listen 80 ? duh :p
[21:18:04] _joe_: ok, done.
[21:18:16] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.042 second response time
[21:18:22] <_joe_> !log stopped puppet on virt1000, our fail
[21:18:26] (PS4) Ori.livneh: Clean up salt::minion [operations/puppet] - https://gerrit.wikimedia.org/r/153727
[21:18:28] Logged the message, Master
[21:18:40] <_joe_> mutante: you shoud add a Liste 80 somewhere and remove the old file
[21:19:14] ori: So in addition to whatever port problem we're having (which I still don't quite follow) it seems like there are two other things: a) removal of old config files, and b) restarting Apache when the config changes.
[21:19:15] removing with ensure ?
[21:19:17] _joe_: is there no ports.conf on that host that declares listen 80?
[21:19:25] andrewbogott: conf.d is not recursively managed
[21:19:42] <_joe_> ori: no
[21:20:09] (PS1) Dzahn: openstack Apache conf, also listen on port 80 [operations/puppet] - https://gerrit.wikimedia.org/r/153946
[21:20:35] content => "Listen 80\nListen 443\n", ? done?
[21:20:41] you meant manually delete?
[21:21:13] <_joe_> mutante: is it devclared in puppet?
[21:21:45] it was, i deleted it in https://gerrit.wikimedia.org/r/#/c/153888/4
[21:21:57] <_joe_> ok so just remove it
[21:22:10] <_joe_> apache::conf doesn't manage conf.d/
[21:22:23] <_joe_> it manages conf-enabled/ conf-available/
[21:22:48] I'd like to duplicate this change on virt0; what file are you rm'ing?
[21:23:09] /etc/apache2/conf.d/ports-wikitech.conf
[21:23:13] ok
[21:23:28] (PS2) Dzahn: openstack Apache conf, also listen on port 80 [operations/puppet] - https://gerrit.wikimedia.org/r/153946
[21:24:13] What happened to '80 is on by default'? Was that a misunderstanding? Or…
[21:24:37] root@virt1000:/etc/apache2# grep Listen ports.conf
[21:24:42] (PS5) Ori.livneh: Clean up salt::minion [operations/puppet] - https://gerrit.wikimedia.org/r/153727
[21:24:44] Listen 8140
[21:24:54] usually i would expect it to be in /etc/apache2/ports.conf
[21:24:59] yes
[21:25:01] but we have 8140 there in this case
[21:25:13] <_joe_> because it's a labs puppetmaster
[21:25:13] the Debian package would normally put it there for 80
[21:25:16] exactly
[21:25:20] <_joe_> whose config is terrible
[21:25:21] so special case
[21:25:24] <_joe_> :)
[21:25:59] so, i'm merging?
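Why did /etc/apache2/conf.d/ports-wikitech.conf survive after it was dropped from the manifests? As described here, the apache module purges undeclared files only in the directories it manages (conf-available/ and conf-enabled/), and conf.d/ is not one of them, so a file removed from puppet without `ensure => absent` simply stays on disk. The Python model below sketches that rule. It is a hypothetical simplification, not Puppet's implementation: `purge_plan` and the sample paths are made up for illustration, and it assumes purging is recursive inside each managed directory.

```python
from pathlib import PurePosixPath


def purge_plan(on_disk, declared, purged_dirs):
    """Split undeclared on-disk files into (purged, lingering).

    purged:    undeclared files somewhere under a purged directory
    lingering: undeclared files elsewhere; nothing ever removes them
    Declared files are never touched. (Hypothetical model of a purge.)
    """
    dirs = {PurePosixPath(d) for d in purged_dirs}
    purged, lingering = [], []
    for path in sorted(on_disk):
        if path in declared:
            continue
        # A file is purged if any of its ancestor directories is managed.
        if any(parent in dirs for parent in PurePosixPath(path).parents):
            purged.append(path)
        else:
            lingering.append(path)
    return purged, lingering


if __name__ == "__main__":
    on_disk = {
        "/etc/apache2/conf.d/ports-wikitech.conf",
        "/etc/apache2/conf-available/50-wikitech-https-port.conf",
        "/etc/apache2/conf-available/old.conf",  # hypothetical stale file
    }
    declared = {"/etc/apache2/conf-available/50-wikitech-https-port.conf"}
    managed = ["/etc/apache2/conf-available", "/etc/apache2/conf-enabled"]
    purged, lingering = purge_plan(on_disk, declared, managed)
    print("would be purged:", purged)
    print("left behind:", lingering)
```

The design point the model makes is that "removed from puppet" and "removed from the host" are different things unless the file lives in a purged directory or is explicitly declared absent.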
[21:26:14] <_joe_> the problem is, we have 200 modules that deal with apache, with 187 different (and subtly wrong) way of doing things out of the standard
[21:26:32] <_joe_> we should take great care to understand how the config is now
[21:26:39] <_joe_> whenever migrating to apache
[21:26:45] mutante: yes, go ahead and merge
[21:26:46] (CR) JanZerebecki: [C: 1] stats.wm.org - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153832 (owner: Dzahn)
[21:27:17] right, all modules have slightly different dependencies in the apache stuff :p
[21:27:32] (CR) Dzahn: [C: 2] openstack Apache conf, also listen on port 80 [operations/puppet] - https://gerrit.wikimedia.org/r/153946 (owner: Dzahn)
[21:28:15] but that is why we are doing this:)
[21:28:19] ok, _joe, mutante, ready for me to reenable puppet and refresh?
[21:28:28] <_joe_> yes
[21:28:53] ok, merged on palladium
[21:28:58] <_joe_> mutante: right, I was just suggesting some reconaissance before going in the wild with a change
[21:29:25] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[21:29:25] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[21:31:45] _joe_: right, I ran it in compiler, got reviews, and dont even self merge
[21:33:31] _joe_: now I see ... waiting servi(98)Address already in use: make_sock: could not bind to address [::]:80
[21:33:40] and wikitech is down again
[21:34:10] conf.d/ports-wikitech.conf:Listen 80
[21:34:15] it's still there
[21:34:22] and, again, this didn't happen until I explicitly restarted Apache. Ori, shouldn't the apache module restart apache when config changes?
[21:34:46] mutante: so, rm it again, and then figure out why puppet is creating it?
[21:34:51] <_joe_> ....
[21:34:57] <_joe_> I told you.
[21:35:16] PROBLEM - puppetmaster https on virt1000 is CRITICAL: Connection refused
[21:35:24] yes, delete the file
[21:35:29] <_joe_> andrewbogott: because the apache module reloads gracefully apache
[21:35:40] <_joe_> opposed to a hard restart
[21:35:45] PROBLEM - HTTP on virt1000 is CRITICAL: Connection refused
[21:35:49] <_joe_> gracefully doing that means config is tested
[21:36:16] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.046 second response time
[21:36:32] <_joe_> and the ports are not respawned if not necessary
[21:36:45] RECOVERY - HTTP on virt1000 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 0.004 second response time
[21:38:04] mutante: so, you removed that file before I reran puppet, right? So it was recreated somehow?
[21:38:26] Sorry, if y'all are on top of this I'm happy to butt out
[21:38:44] no, i stepped back to not cause a conflict, so it was never delted
[21:38:52] puppet should not recreate it
[21:39:11] oh :(
[21:40:30] greg-g: done with OCG deploy
[21:41:13] now watching the similar change on gdash
[21:43:48] ok, this time I got a clean puppet run and an apache restart without killing wikitech. yay?
[21:43:56] <_joe_> did you log puppet was restarted?
[21:44:37] _joe_ not yet
[21:44:44] testing a couple other things
[21:44:50] <_joe_> ok
[21:45:12] <_joe_> take your time, I'm off :)
[21:45:24] ok, later!
[21:45:28] me too, shortly
[21:45:33] gonna catch up on wikimania talks
[21:45:59] ok, labs puppetmaster seems happy too
[21:46:33] !log re-enabled puppetmaster on virt1000; apache changes seem stable now.
[21:46:39] Logged the message, Master
[21:49:22] (PS1) Dzahn: tendril - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153951
[21:51:16] (PS6) Ori.livneh: Clean up salt::minion [operations/puppet] - https://gerrit.wikimedia.org/r/153727
[21:51:57] (CR) jenkins-bot: [V: -1] Clean up salt::minion [operations/puppet] - https://gerrit.wikimedia.org/r/153727 (owner: Ori.livneh)
[21:52:25] So, ori or someone, can you explain to me about graceful apache restart vs normal 'service apache2 restart'?
[21:52:37] It seems to me that if a graceful succeeds but a hard restart fails, then...
[21:52:45] we're suppressing a latent bug by not doing a hard restart.
[21:52:47] Is that wrong somehow?
[21:52:54] only because we changed the Listen ports
[21:53:26] that.
[21:53:32] "This differs from a normal restart in that currently open connections are not aborted."
[21:53:41] Well, so what? It nonethless had the effect of concealing a latent problem.
[21:53:46] That lurked until a reboot or restart or whatnot.
[21:54:04] You're saying that usually that wouldn't happen, right?
[21:54:05] usually a graceful is less intrusive
[21:54:09] andrewbogott: well, what would you suggest? a hard restart kills any open connections, so the connection is aborted.
[21:54:47] the flip-side of this question is… how do I do a graceful restart on the commandline? :)
[21:54:47] andrewbogott: the class of problems that a reload conceals is limited to listen port
[21:54:54] apache2ctl graceful
[21:54:57] or service apache2 graceful
[21:55:09] ok, that's easy enough.
[21:55:12] the latter is probably cleaner (uses the debian wrapper as opposed to apache2ctl)
[21:55:27] Yeah, I generally use 'service' anyway, so that fits my habits.
[21:55:30] still uses /etc/init.d/apache2 graceful
[21:55:54] kill -SIGUSR1 `pidof apache2`
[21:55:56] yea, that's the part i meant with "did you graceful"
[21:56:05] And, I'm generally sympathetic to the status quo, it's just that it creeps me out to have things be broken but not know it until I do something seemingly unrelated.
[21:56:10] Might be nothing for it in this case.
[21:56:14] andrewbogott: i agree that it's disturbing
[21:56:50] In theory Apache could be smart and notice that it's config is busted, right?
[21:56:57] Rather than just trying and failing?
[21:56:59] it's rare that we make changes that really need a restart...
[21:57:01] um… its
[21:57:04] but by the same token, applying puppet patches on an already provisioned host already conceals problems
[21:57:05] but of course we managed to hit it anyways :p
[21:57:14] in that there may be resource ordering issues you don't hit until you reprovision
[21:57:39] it's usually the case that stop-the-world, nuke everything, start from scratch is better at catching issues
[21:57:48] but it requires stopping the world and nuking everything
[21:58:03] at least until we have LVS orchestrated sufficiently well to depool an apache for restarts
[21:58:05] yeah. I try to do that early and often in labs but it doesn't work so well in prod
[21:58:19] (PS1) Dzahn: graphite - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153952
[21:58:38] !log cirrus index rebuild is proceeding without trouble - I'm going to let it continue over night.
[21:58:44] Logged the message, Master
[21:58:55] bblack, how happy would you be if we began marking all traffic with x-analytics, not just zero?
[21:59:59] (PS2) Dzahn: graphite - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153952
[22:00:23] andrewbogott: on reflection, i think that this is credibly an apache bug
[22:00:38] yeah, me too.
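The `kill -SIGUSR1` variant works because httpd maps control signals to actions. Per the Apache httpd "Stopping and Restarting" documentation, TERM stops immediately, HUP restarts immediately, USR1 restarts gracefully, and WINCH stops gracefully, and these signals are meant for the parent process rather than for every `apache2` process that `pidof` returns. A minimal Python sketch, assuming the parent pid can be read from the server's PidFile; the PidFile location differs between distributions and Apache versions, so it is a required argument here, and `signal_apache` and `read_parent_pid` are hypothetical helpers.

```python
import os
import signal

# Control signals for the httpd *parent* process, as listed in the
# Apache httpd "Stopping and Restarting" documentation.
APACHE_SIGNALS = {
    "stop": signal.SIGTERM,            # stop now; open connections are aborted
    "restart": signal.SIGHUP,          # restart now; open connections are aborted
    "graceful": signal.SIGUSR1,        # restart; children finish current requests
    "graceful-stop": signal.SIGWINCH,  # stop; children finish current requests
}


def read_parent_pid(pidfile):
    """Return the parent pid recorded in httpd's PidFile."""
    with open(pidfile) as fh:
        return int(fh.read().strip())


def signal_apache(action, pidfile):
    """Send the signal for `action` to the httpd parent only (hypothetical helper).

    Unlike `kill -SIGUSR1 $(pidof apache2)`, the child processes are not signalled.
    """
    os.kill(read_parent_pid(pidfile), APACHE_SIGNALS[action])
```

Note that `apache2ctl graceful` also runs a configuration check before signalling, which this sketch does not do; and as the log shows, neither the check nor a graceful restart caught the duplicate `Listen` that a hard restart tripped over.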
[22:00:47] there's no reason why the config test couldn't detect duplicate listen directives
[22:00:52] although possibly it was throwing warnings during the 'graceful' and I just haven't looked in the logs
[22:00:53] * andrewbogott looks
[22:01:04] if it had it would have not restarted apache
[22:01:04] yurikR: I donno
[22:01:33] depends on the whole context, what's being tagged there and how
[22:02:00] andrewbogott: interesting: https://issues.apache.org/bugzilla/buglist.cgi?quicksearch=configtest
[22:02:22] andrewbogott: https://issues.apache.org/bugzilla/show_bug.cgi?id=55340 is similar
[22:03:34] (PS1) Dzahn: performance.wm.org - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153953
[22:03:39] ori
[22:03:44] yep, that's the one.
[22:03:45] andrewbogott:
[22:04:03] bblack, basically we are missing lots of data in stats like which partner a requests belongs to, etc. If we marked everything, it would be possible to do dynamic things like deciding to show smaller images for partners, etc
[22:04:36] Also that bug is a year old :(
[22:04:41] e.g. an image req comes in, goes through your netmapper, and gets a nice X-Analytics + a few more headers
[22:05:06] do you mean partners other than zero?
[22:05:14] varnish knows these things
[22:05:22] so you can emit them as part of the varnish log output
[22:05:29] without adding a header
[22:05:32] bblack, no, at this point just talking about zero
[22:05:32] * andrewbogott relocating, back later, maybe
[22:05:50] so what zero requests are we talking about adding analytics to that we don't already?
[22:05:58] (PS1) Dzahn: ganglia - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153955
[22:06:23] e.g. our varnishkafka reqlog format already has: %{@response_size!num?0}b
[22:06:41] bblack, multimedia
[22:07:11] uhm, to save me some looking, do zero clients' multimedia requests go through the mobile caches or upload caches?
[22:07:15] bblack, and theoretically the rest of desktop traffic
[22:07:23] no idea
[22:07:35] probably upload then :)
[22:07:43] some of our partners are switching to ip-based whitelisting
[22:08:02] theoretically the rest of the desktop means what? we start doing things like putting in netmapper ranges to identify who's on comcast? :p
[22:08:54] plus at some point we will probably want to do some creative desktop advertising (community-permitting) - e.g. if you are on a partner network, show some small hint that something else might be free, or geotagging based
[22:08:57] <_joe_> bblack: upload
[22:09:21] bblack, hmmm... are we planning to add comcast to our partner network? :D
[22:09:27] <_joe_> re: where multimedia request go
[22:09:39] well that's why I'm wondering why we'd bother with desktop traffic. what are we trying to achieve there?
[22:10:25] (PS7) Ori.livneh: Clean up salt::minion [operations/puppet] - https://gerrit.wikimedia.org/r/153727
[22:11:04] maybe some meta-context is in order: the way you're asking the question, it seems like you expect me to have a problem with this, but I haven't yet seen the problem. So I'm poking around with random questions trying to find it.
[22:12:07] I mean, I really don't care if there's another header on most traffic that's useful.
[22:12:37] (CR) Giuseppe Lavagetto: Clean up salt::minion (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153727 (owner: Ori.livneh)
[22:12:43] but what are we trying to map into that header and how and why?
[22:13:38] <_joe_> XY
[22:13:49] heh
[22:14:02] http://mywiki.wooledge.org/XyProblem for those not familiar with the jargon
[22:14:04] also what's this creative desktop advertising, "something else might be free", geotagging, etc? This doesn't sound very wikimedia-ish, it sounds corporate marketing department-ish
[22:14:28] bblack: "Congratulations, you are the 1,000,000th visitor today!!!"
[22:14:33] click here to claim your prize
[22:14:54] <_joe_> ori: shhhh
[22:14:57] (a free cookie at the next wikimania)
[22:15:03] <_joe_> that was intentionally cryptic!
[22:15:27] i love greycat/greg's wiki
[22:15:35] it's the crankiest piece of technical documentation
[22:15:55] (PS1) Dzahn: contint-use apache::site,move config to templates [operations/puppet] - https://gerrit.wikimedia.org/r/153959
[22:16:37] (CR) jenkins-bot: [V: -1] contint-use apache::site,move config to templates [operations/puppet] - https://gerrit.wikimedia.org/r/153959 (owner: Dzahn)
[22:17:57] (PS2) Dzahn: contint-use apache::site,move config to templates [operations/puppet] - https://gerrit.wikimedia.org/r/153959
[22:18:38] (CR) jenkins-bot: [V: -1] contint-use apache::site,move config to templates [operations/puppet] - https://gerrit.wikimedia.org/r/153959 (owner: Dzahn)
[22:19:28] (PS3) Dzahn: contint-use apache::site,move config to templates [operations/puppet] - https://gerrit.wikimedia.org/r/153959
[22:21:12] http://mywiki.wooledge.org/GreycatsGreatestMoments
[22:21:14] (PS3) Dzahn: wikistats - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153828
[22:21:39] " is it because you're all spending 30 minutes trying to look up obscure syntactic shit instead of just pasting 80 dots and KNOWING it'll work? "
[22:21:41] enlightenment.
[22:21:50] (CR) Vogone: [C: 1] "LGTM as for the addition of the 'eliminator' group."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149637 (https://bugzilla.wikimedia.org/68612) (owner: Withoutaname)
[22:22:47] labas Domai
[22:25:42] (CR) Dzahn: puppetmaster - use apache::site (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153835 (owner: Dzahn)
[22:28:13] ori: the puppetmaster change isnt like the openstack change
[22:28:20] it would need to replace the entire http://paste.debian.net/115492/
[22:28:24] with apache::conf
[22:28:27] or keep using the template
[22:32:26] (PS1) Dzahn: limn - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153961
[22:33:05] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Wed 13 Aug 2014 20:32:51 UTC
[22:33:15] (CR) jenkins-bot: [V: -1] limn - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153961 (owner: Dzahn)
[22:35:25] PROBLEM - puppet last run on ssl3001 is CRITICAL: CRITICAL: Epic puppet fail
[22:35:31] (PS2) Dzahn: limn - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153961
[22:42:17] (PS8) Ori.livneh: Clean up salt::minion [operations/puppet] - https://gerrit.wikimedia.org/r/153727
[22:55:25] RECOVERY - puppet last run on ssl3001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[23:00:04] RoanKattouw, mwalker, ori, MaxSem: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140813T2300).
[23:00:18] i volunteer!
[23:00:22] :)
[23:00:23] no patches! :)
[23:00:31] * ori collects his chips.
[23:00:41] and wtf am i doing here?:P
[23:14:57] (PS1) Dzahn: gerrit - use global ssl_ciphersuite settings [operations/puppet] - https://gerrit.wikimedia.org/r/153967
[23:19:46] (PS1) Ori.livneh: ordered_yaml(): omit document header [operations/puppet] - https://gerrit.wikimedia.org/r/153970
[23:20:27] (CR) Ori.livneh: [C: 2 V: 2] ordered_yaml(): omit document header [operations/puppet] - https://gerrit.wikimedia.org/r/153970 (owner: Ori.livneh)
[23:21:40] (PS1) Dzahn: webserver - use ssl_ciphersuite in generic_vhost [operations/puppet] - https://gerrit.wikimedia.org/r/153971
[23:23:37] (PS1) Dzahn: ganglia - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153972
[23:25:48] (PS1) Dzahn: racktables - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153973
[23:28:57] (PS1) Dzahn: wikitech - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153975
[23:30:39] (PS1) Dzahn: noc.wm.org - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153976
[23:32:22] (PS1) Dzahn: stats.wm.org - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153977
[23:35:01] (PS1) Dzahn: etherpad - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153978
[23:35:43] (CR) jenkins-bot: [V: -1] etherpad - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153978 (owner: Dzahn)
[23:36:40] (PS1) Dzahn: rt - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153981
[23:38:20] (PS2) Dzahn: etherpad - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153978
[23:39:32] (PS1) Dzahn: ishmael - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153982
[23:41:22] (PS1) Dzahn: tendril - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153984
[23:42:39] (PS1)
Dzahn: gitblit - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153985
[23:45:50] (PS1) Dzahn: puppetmaster - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153986
[23:46:51] (PS1) Dzahn: puppetmaster Apache template - retab [operations/puppet] - https://gerrit.wikimedia.org/r/153987
[23:52:55] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Wed Aug 13 23:52:46 UTC 2014
[23:53:00] (PS1) Dzahn: svn - move Apache config from file to template [operations/puppet] - https://gerrit.wikimedia.org/r/153989
[23:53:46] (PS2) Dzahn: svn - move Apache config from file to template [operations/puppet] - https://gerrit.wikimedia.org/r/153989
[23:56:07] (PS1) Dzahn: subversion - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153991
[23:56:37] (CR) Ori.livneh: [C: -1] limn - use apache::site (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153961 (owner: Dzahn)
[23:57:20] (CR) Dzahn: "in the prod puppetmaster case the ports template does stuff like:" [operations/puppet] - https://gerrit.wikimedia.org/r/153835 (owner: Dzahn)
[23:58:31] mutante: if virt1001 is ever reimaged, ports.conf will revert to the version packaged with apache2. so in hindsight, i think it's better to drop the "Listen 80" directive from the wikitech apache::conf resource, and revert ports.conf instead
[23:59:35] ... sigh
[23:59:50] not a huge deal