[00:00:04] RoanKattouw, ^d, marktraceur, MaxSem, RoanKattouw: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141114T0000). [00:00:56] * marktraceur deploys from the airport [00:01:01] No, wait. Never mind. [00:01:31] <^d> marktraceur: If you want to deploy from an airplane go for it :) [00:01:34] <^d> {{bebold}} [00:01:49] Airport, at least. [00:02:03] ^d: Though technically the rule applies to *ferries*, not airplanes. [00:02:45] Air Bud would approve of my loophole. [00:02:51] So would FDR. [00:03:07] "There's nothing in the rulebook about a President playing basketball." [00:04:35] !log demon Synchronized php-1.25wmf7/extensions/Echo: (no message) (duration: 00m 06s) [00:04:37] Logged the message, Master [00:04:44] !log demon Synchronized php-1.25wmf8/extensions/Echo: (no message) (duration: 00m 04s) [00:04:46] marktraceur: you would violate "don't leave town rule" :D [00:04:48] <^d> ebernhardson: ^ [00:04:48] Logged the message, Master [00:05:09] matanya: good point [00:05:20] ^d: OK my first pair of commits is sitting in Jenkins now, once those merge I can create the actual submodule updates [00:05:26] (03CR) 10Chad: [C: 032] Share parsoid cookie forwarding config for VE/Flow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173175 (owner: 10EBernhardson) [00:05:26] matanya: Wait, what if I weren't *in* town in the first place? [00:05:35] <^d> RoanKattouw: k thnx [00:05:43] Just wait until we're over the ocean [00:05:51] (03Merged) 10jenkins-bot: Share parsoid cookie forwarding config for VE/Flow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173175 (owner: 10EBernhardson) [00:06:24] marktraceur: invent a new rule, don't depoly when out of town [00:06:37] !log demon Synchronized wmf-config/: (no message) (duration: 00m 07s) [00:06:39] Logged the message, Master [00:06:43] <^d> ebernhardson: And the last of yours ^ [00:06:47] <^d> (you should be all done now) [00:06:54] thanks ^d [00:07:04] <^d> I haven't done yours yet! [00:07:05] <^d> :) [00:07:17] <^d> But now you're here, lemme merge [00:07:32] ^d: appears to work, thanks [00:07:34] (03CR) 10Chad: [C: 032] Enable Flow on some testwiki pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173196 (owner: 10Spage) [00:07:42] <^d> ebernhardson: Sweet, yw :) [00:07:42] (03Merged) 10jenkins-bot: Enable Flow on some testwiki pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173196 (owner: 10Spage) [00:08:00] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s) [00:08:02] Logged the message, Master [00:08:07] <^d> spagewmf: that's you ^ [00:08:22] ^d you are the greatest [00:08:28] <^d> I try :) [00:08:31] spage's law https://meta.wikimedia.org/wiki/User:SPage_%28WMF%29 [00:08:33] OK I'm tired of this crap, I'm gonna bypass Jenkins [00:08:49] <^d> RoanKattouw: Awww, I'm telling! [00:10:41] didnt jenkins just merge stuff up there a minute ago? [00:10:53] <^d> Other stuff :) [00:10:56] ah [00:11:22] cscott: around? [00:11:25] last famous words by RoanKattouw [00:12:12] ^d: https://gerrit.wikimedia.org/r/173206 for wmf7 [00:13:02] (03CR) 10Alexandros Kosiaris: "Daniel, yes that will be accomplished by changing the Listening IP address, not the port. No need really change the port. 
Point being, the" [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [00:13:17] ^d: And https://gerrit.wikimedia.org/r/173208 for wmf8 [00:13:21] ^d: I'll go add them to the wiki page [00:13:52] (03CR) 10Alexandros Kosiaris: [C: 032] ssh server: make ListenAddress configurable [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [00:14:49] !log demon Synchronized php-1.25wmf7/extensions/VisualEditor: (no message) (duration: 00m 05s) [00:14:51] Logged the message, Master [00:14:59] !log demon Synchronized php-1.25wmf8/extensions/VisualEditor: (no message) (duration: 00m 04s) [00:15:02] Logged the message, Master [00:15:03] <^d> RoanKattouw: ^^ [00:15:05] !log nickel - shutdown [00:15:07] Logged the message, Master [00:16:21] (03PS1) 10BryanDavis: logstash: Use doc_values for normalized_message.raw [puppet] - 10https://gerrit.wikimedia.org/r/173209 [00:17:25] <^d> Thanks for playing in today's SWAT. Today's grand prize goes to spagewmf, runner up RoanKattouw. [00:17:36] <^d> Tune in next week for more SWAT action and prizes. [00:18:01] ^d: what were the prizes :o [00:18:05] (03PS1) 10Spage: Fix typo in Flow-enable page name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173210 [00:18:29] <^d> JohnFLewis: I'm sure I can find some stickers or something in the supply closet :p [00:18:33] (03CR) 10BryanDavis: "Already applied on the beta and production clusters via curl. The copy in the puppet repo is just for setting up a brand new cluster. It a" [puppet] - 10https://gerrit.wikimedia.org/r/173209 (owner: 10BryanDavis) [00:18:40] Well I went to the raffle for a Broadway show today and didn't get in [00:18:52] MaxSem: are you hitting globalusage a lot again? [00:18:56] So there were potential prizes for me, I just didn't win any [00:19:05] ^d: I'll be sure to tune in next week as a contestant :D [00:19:06] ori, ??? [00:19:07] <^d> spagewmf: You need that ^? [00:19:16] ^d: dang, typo in one of those, can you deploy https://gerrit.wikimedia.org/r/173210 ? ( remember how great I said you were? ) [00:19:22] MaxSem: isn't globalusage the API endpoint the wikidata mobile stuff hits? [00:19:28] <^d> spagewmf: One moment. [00:19:36] (03CR) 10Chad: [C: 032] Fix typo in Flow-enable page name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173210 (owner: 10Spage) [00:19:39] RoanKattouw: I think you might have just taken first place ;) [00:19:44] (03Merged) 10jenkins-bot: Fix typo in Flow-enable page name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173210 (owner: 10Spage) [00:20:01] ori, labs [00:20:02] haha [00:20:03] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s) [00:20:06] Logged the message, Master [00:20:40] <^d> No, spagewmf still wins 1st today. He was super nice :) [00:20:47] <^d> Roan didn't have his patches ready, which was -10 points :( [00:20:57] <^d> (Although he got some back for making them up so quickly!) [00:21:52] ^d: what is the penalty for a typo? [00:22:07] <^d> Depends on my mood :) [00:22:17] <^d> And if it crashed the cluster. [00:23:10] so, if it Sets Wikis Ablaze, -100? :p [00:23:30] <^d> Unless I think it's funny, in which case it's like +200 :p [00:23:34] hmm, https://test.wikipedia.org/wiki/Wikipedia:Co-op/Mentorship_match just doesn't want to be a Flow board. [00:23:37] (03CR) 10Dzahn: [C: 032] remove nickel's public IP [dns] - 10https://gerrit.wikimedia.org/r/172819 (owner: 10Dzahn) [00:24:03] <^d> spagewmf: I purged it. 
[00:24:11] <^d> It shows up now for me. [00:25:34] ^d: I've run out of superlatives. [00:26:10] If I want to wfGetDB and specify metawiki in the third, wikiId parameter, should I use a literal 'mediawiki', or is there a lookup for this? [00:26:18] ^d are the Echo and VE deployment all done? [00:26:24] <^d> kaldari: Yessir. [00:26:26] thanks [00:26:37] <^d> awight: Not "mediawiki" but yes. [00:26:48] <^d> Although, might be best to make it configurable? [00:27:19] (03CR) 10Dzahn: [C: 031] Change ru.wikinews.org to HTTPS only. [puppet] - 10https://gerrit.wikimedia.org/r/173078 (owner: 10JanZerebecki) [00:27:23] ^d: is a "Wikipedia does not have a page with this exact name" not cached? I wonder why sometimes we need to purge when enabling Flow and sometimes not. [00:27:23] ^d: yah it's a global config variable. So... I meant 'metawiki'. Is that a valid wikiId? [00:27:40] <^d> Yeah, metawiki is valid for meta.wikimedia [00:27:50] ^d: great, thanks for confirming! [00:27:51] <^d> yw [00:27:58] (03CR) 10Dzahn: [C: 031] "after https://gerrit.wikimedia.org/r/#/c/173078/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173083 (owner: 10JanZerebecki) [00:32:06] (03PS1) 10Cmjohnson: Adding dns entries for new frack host bismuth [dns] - 10https://gerrit.wikimedia.org/r/173216 [00:33:04] (03CR) 10Cmjohnson: [C: 032] Adding dns entries for new frack host bismuth [dns] - 10https://gerrit.wikimedia.org/r/173216 (owner: 10Cmjohnson) [00:34:45] (03CR) 10Dzahn: "bismuth looks pretty cool http://en.wikipedia.org/wiki/Bismuth#mediaviewer/File:Wismut_Kristall_und_1cm3_Wuerfel.jpg" [dns] - 10https://gerrit.wikimedia.org/r/173216 (owner: 10Cmjohnson) [00:35:17] bismuth does look cool mutante [00:35:20] very colorful [00:35:50] :) i think i had one as a kid if i'm not mistaken [00:40:24] (03PS1) 10Kaldari: Updating A?B test start and end times for WikiGrok test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173219 [00:40:58] (03CR) 10Kaldari: [C: 032] Updating A?B test start and end times for WikiGrok test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173219 (owner: 10Kaldari) [00:41:06] (03Merged) 10jenkins-bot: Updating A?B test start and end times for WikiGrok test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173219 (owner: 10Kaldari) [00:42:17] !log kaldari Synchronized wmf-config/mobile.php: Update WikiGrok A/B test times (duration: 00m 03s) [00:42:23] Logged the message, Master [00:45:09] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3612 MB (10% inode=93%): /dev 32200 MB (99% inode=99%): /run 6403 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /srv 480540 MB (33% inode=99%): [00:49:27] !log logstash1003: migrating elasticsearch data to new raid volume [00:49:33] Logged the message, Master [00:51:58] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 307 seconds [00:52:08] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 43, uinitializing_shards: 1, unumber_of_data_nodes: 2} [00:52:43] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, 
utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 43, uinitializing_shards: 1, unumber_of_data_nodes: 2} [00:52:53] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 362 seconds [00:53:24] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -1 seconds [00:54:14] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds [00:54:52] ACKNOWLEDGEMENT - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 43, uinitializing_shards: 1, unumber_of_data_nodes: 2} Jeff Gage adding storage to logstash1003 [00:54:52] ACKNOWLEDGEMENT - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 43, uinitializing_shards: 1, unumber_of_data_nodes: 2} Jeff Gage adding storage to logstash1003 [00:55:07] jgage: yum! 5.5T is >> the ~300G we had before [00:55:19] yeah really, i'm stoked :D [00:55:38] hopefully this means i can reenable the hadoop firehose :) [00:55:59] I would hope so. [00:56:55] data is copying to the new partition in a screen session, i will check back in a bit. when it's done i'll mv /var/lib/elasticsearch{,.old} and mount the new raid on /var/lib/eleasticseach and start things back up [00:57:16] then later when we're satisifed we can nuke elasticsearch.old [00:57:37] *nod* [01:02:22] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 62, initializing_shards: 1, number_of_data_nodes: 3 [01:02:53] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 62, initializing_shards: 1, number_of_data_nodes: 3 [01:04:03] !log kaldari Synchronized php-1.25wmf7/extensions/WikiGrok: (no message) (duration: 00m 03s) [01:04:08] Logged the message, Master [01:04:19] !log kaldari Synchronized php-1.25wmf7/extensions/MobileFrontend: (no message) (duration: 00m 05s) [01:04:22] Logged the message, Master [01:10:12] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3573 MB (10% inode=93%): /dev 32200 MB (99% inode=99%): /run 6403 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /srv 470272 MB (32% inode=99%): [01:15:12] RECOVERY - check_disk on lutetium is OK: DISK OK - free space: / 12484 MB (35% inode=93%): /dev 32200 MB (99% inode=99%): /run 6403 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /srv 468204 MB (32% inode=99%): [01:20:04] jgage: I think that puppet may have tried to be helpful and restarted elasticsearch on logstash1003 
:( [01:36:02] (03PS1) 10Kaldari: Updating WikiGrok A/B test start and end times [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173228 [01:36:32] bd808: ah geez [01:36:41] yeah forgot it would do that [01:36:49] it's ok, i just have to start the copy again [01:36:53] This Time For Real [01:37:07] sudo puppet agent --disable "copying elasticsearch files" :) [01:37:25] yeah :) [01:37:34] scheduling maint in icinga first [01:38:02] (03CR) 10Kaldari: [C: 032] Updating WikiGrok A/B test start and end times [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173228 (owner: 10Kaldari) [01:38:10] (03Merged) 10jenkins-bot: Updating WikiGrok A/B test start and end times [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173228 (owner: 10Kaldari) [01:39:20] !log kaldari Synchronized wmf-config/mobile.php: updating WikiGrok A/B test times (duration: 00m 03s) [01:39:27] Logged the message, Master [01:40:16] ok, puppet is disabled and logstash1003 data is copying anew [01:40:43] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 1, uactive_shards: 44, uinitializing_shards: 0, unumber_of_data_nodes: 2} [01:41:03] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 1, uactive_shards: 44, uinitializing_shards: 0, unumber_of_data_nodes: 2} [01:43:26] ^ silenced [01:44:30] thx gage [01:47:17] (03PS3) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [01:48:18] we will never use glusterfs again? [01:48:24] in labs as project storage ? [01:48:34] * bd808 hopes not [01:48:49] gluster was teh suk [01:48:49] i see things like this: [01:48:56] default => 'projectstorage.pmtpa.wmnet', [01:49:09] just wondering how much to remove [01:49:24] off glusterfs? [01:49:27] I'd say all of it :) [01:49:33] we can always pick code back up from history if needed [01:49:35] 281 $gluster_server_name = $instanceproject ? { [01:49:40] 274 class ldap::client::autofs($ldapconfig) { [01:49:51] ok :) [01:52:05] (03Draft1) 10Dereckson: Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) [01:53:14] hmm [01:53:22] shinken isn't delivering mail nor is it sending to IRC [01:53:23] drafts, rare enough [01:54:16] YuviPanda: did it send to IRC before or first time? [01:54:29] nope, first time [01:54:36] but I guess the underlying notifications is broken somehow [01:55:00] doesn't like the shared notification commands from nagios_common yet? 
[01:55:24] (03PS4) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [01:55:26] not sure, still debugging [01:56:23] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [01:57:29] (03CR) 10Dereckson: [C: 031] Enable VisualEditor by default on Tagalog Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172993 (https://bugzilla.wikimedia.org/73365) (owner: 10Jforrester) [02:17:24] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 2, active_shards: 63, initializing_shards: 0, number_of_data_nodes: 3 [02:17:54] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 2, active_shards: 63, initializing_shards: 0, number_of_data_nodes: 3 [02:18:01] !log LocalisationUpdate completed (1.25wmf7) at 2014-11-14 02:18:01+00:00 [02:18:02] :D [02:18:08] Logged the message, Master [02:19:04] !log logstash1003 elasticsearch migration to new raid0 complete [02:19:07] Logged the message, Master [02:28:11] !log progressively increasing load on mw1114, attempting to reproduce the previous overload [02:28:14] Logged the message, Master [02:30:35] !log LocalisationUpdate completed (1.25wmf8) at 2014-11-14 02:30:34+00:00 [02:30:37] Logged the message, Master [03:33:54] (03CR) 10Mattflaschen: "I don't think this should have been merged in less than 2 days without any input from the maintainers. Neither the person who originally" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172110 (owner: 10Nemo bis) [03:49:50] bd808: you around? [03:50:13] cscott: yeah. what's up? [03:50:55] i was experimenting with https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Converting_a_host_to_use_local_puppetmaster_and_salt_master on deployment-pdf02 [03:51:37] there's a stack of cherry-picked puppet patches on deployment-salt.eqiad.wmflabs, and the top one is yours and doesn't rebase cleanly [03:51:39] Maybe not the best place. [03:52:00] deployment-pdf02 isn't actually in use for anything, it's just a spare clone of the ocg setup [03:52:25] my goal is to test some puppet changes, so it's good place to do so [03:52:46] cf hashar's comments on https://gerrit.wikimedia.org/r/170130 [03:53:22] at any rate, i thought i should poke you about rebasing your cherry-pick at some point [03:53:47] but i've moved on to https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster in the meantime, so that i don't have to deal with a shared puppetmaster [03:53:59] Yeah. I thought Yuvi was going to fix that. Looking now [04:00:11] cscott: Conflict resolved. 
Thanks for the poke [04:02:19] it seems to be working, although i had to manually sudo service puppetmaster start on deployment-pdf02 -- but it's not picking up the ocg_*override values from https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&project=deployment-prep&instanceid=54c66f88-4c39-487b-802b-2eec751f4300&region=eqiad [04:02:40] (this is re: the self-hosted puppetmaster, which i switched to instead of using the labs puppetmaster) [04:03:25] !log logstash1002 migration to new md0 complete [04:03:28] how are those configuration values actually exported to puppet? it seems to be using the configuration values for the production servers instead of the ones configured for deployment-pdf02, i'm not sure how that could get switched up [04:03:35] Logged the message, Master [04:03:56] bd808, one more to go! but now it is dinnertime. [04:04:05] jgage: awesome! [04:04:38] cscott: I'm not sure how the wikitech -> puppet magic works. :/ [04:05:50] cscott: It has something to do with ldap. Those values are stored in ldap and then injected into the puppet run [04:07:43] well, i switched from self-hosted back to labs-hosted puppet, and now the configuration values magically shifted to the correct values again [04:07:58] so i guess it's something subtly wrong with the self-hosted puppet configuration [04:10:18] labs default or the beta cluster puppet? Because the beta puppet master is self-hosted with the normal role [04:10:52] deployment-pdf02 matches https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Converting_a_host_to_use_local_puppetmaster_and_salt_master now [04:11:06] which i believe is using deployment-salt as its puppetmaster [04:11:13] *nod* [04:11:42] Which is a role::puppet::self instance itself [04:12:24] it is also running a puppetmaster on localhost, but when i try to use that (by blanking the 'puppetmaster' and 'deployment_server_override' settings on the instance's config page), puppet runs but it uses the wrong configuration values [04:13:26] Are you tweaking a role that is used on the pdf01 host? [04:14:25] It might be easiest to disable puppet on pdf01 (puppet agent --disable "testing config changes on pdo02") and then try your patches via deployment-salt [04:14:41] not at the moment; https://gerrit.wikimedia.org/r/#/c/170130/5/manifests/role/ocg.pp adds a new role::ocg::beta role but pdf01 currently uses the role::ocg::production [04:15:05] oh even easier, just cherry-pick to deployment-salt and test away [04:15:49] bd808: yeah, that was my original plan, until i ran into the rebase conflict. then i thought that self-hosting would be even better since i didn't risk accidentally breaking anything for anyone else. [04:16:00] but since i can't seem to get the self-hosting to work, i guess i'm back to plan A ;) [04:16:14] meh. we can fix it if it breaks.
reflog to the rescue [04:16:57] incidentally, switching to self-hosting gives this after the first puppet run: [04:16:57] Error: /Stage[main]/Puppet::Self::Master/Service[puppetmaster]: Failed to call refresh: Could not start Service[puppetmaster]: Execution of '/etc/init.d/puppetmaster start' returned 1: [04:16:57] Error: /Stage[main]/Puppet::Self::Master/Service[puppetmaster]: Could not start Service[puppetmaster]: Execution of '/etc/init.d/puppetmaster start' returned 1: [04:17:16] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 14 04:17:16 UTC 2014 (duration 17m 15s) [04:17:16] you need to manually run `sudo service puppetmaster start` to get it going [04:17:21] Logged the message, Master [04:17:41] (since we were spitballing about upstart and systemd and rc.d earlier today) [04:18:02] yeah that started happening when we switched to puppet3. I haven't bothered to dig into it. Seems like an Ops problem. :) [04:18:56] cscott: You can add the checkbox to apply your new role via -- https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup [04:20:26] (03CR) 10BryanDavis: "I would really like to see a trebuchet porcelain created that can be used to automate trebuchet deploys rather than more custom deployment" [puppet] - 10https://gerrit.wikimedia.org/r/170130 (owner: 10Cscott) [04:21:09] bd808: trebuchet porcelain is a nice mental image, but i really don't know what it means ;) [04:21:40] a frontend script. I think I picked up the term from git [04:21:57] all the cli stuff in git is called porcelain. [04:22:17] http://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain [04:22:37] bd808: the puppet patch in question just adds the appropriate keys & permission bits to allow jenkins to reach over and run jobs on the host [04:23:06] yeah. That's the custom deploy setup like parsoid uses that I don't like. [04:23:22] bd808: i haven't written the patch to do the actual deploy yet, that could be standard git-deploy [04:23:31] nope [04:23:39] you can't script git-deploy [04:23:48] or if you can you're a wizard [04:24:31] i'm saying that i think the part you're objecting to is not actually the puppet part, which is firewall and ssh stuff afaik, but rather https://gerrit.wikimedia.org/r/#/c/170030/1/jjb/beta.yaml [04:24:49] which is currently copied over from parsoid and uses rsync (yuck) [04:25:05] that's the part I'm guessing you're rather see use trebuchet or some other wizardry [04:26:09] the puppet part just makes the "node: deployment-ocg-{datacenter}" part of 170030 work -- that is, ensures that this task (whether it's trebuchet or whatever) is running on a specific deploy host. [04:26:14] Yeah, but you wouldn't need this part either. scap runs via jenkins job that runs on deployment-bastion. This *should* work the same way except trebuchet is not scripting friendly [04:26:55] The script running on deployment-prep should be just like you were doing a deploy yourself [04:27:05] and not need any funny config on the target hosts [04:29:04] bd808: you're talking about the beta-scap-{datacenter} job in integration-config? [04:29:15] yeah [04:29:46] node: deployment-bastion-{datacenter} will presumably already suffice to make those commands run on bastion-eqiad.wmflabs.org ? [04:31:10] yup. but then you have the "how does one script git-deploy" problem. There is no unattended mode for it. 
[04:32:07] It requires a human to read the N/M hosts … messages and decide to continue to wait or advance to the next step [04:33:21] Ryan assures me it's an "easy fix" but it is still undone, and I don't have time to do it. :( [04:33:51] apt-cache show expect [04:33:59] ewwww [04:34:14] that would be a horrible expect script [04:34:26] all expect scripts are horrible :) [04:34:29] the app needs to be fixed [04:34:45] a deploy tool that can't be automated is not a deploy tool [04:34:49] so git-deploy uses trebuchet to actually put the code on the servers? [04:34:56] yeah [04:35:44] reading git-deploy it says, "The main feature of this tool is not actually doing rollouts, it's doing reverts. [...] One thing it definitely doesn't do is worry about how your code gets copied around to your production servers, that's completely up to you. [...] git-deploy solves the problem of making your deployment history available in a distributed way to everyone with a Git checkout, as well as making sure that there's an exclusive lock o [04:35:58] so it sounds like git-deploy is really the wrong tool for the job of doing automated deploys [04:36:13] and we probably want to be using trebuchet directly. which is i guess what you've been saying. [04:36:24] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [04:36:57] !log on mw1114 restarting hhvm [04:37:00] Logged the message, Master [04:42:25] cscott: Our git-deploy is some version of https://github.com/trebuchet-deploy/trigger. And yeah it's the porcelain for running the trebuchet plumbing. [04:43:29] https://github.com/trebuchet-deploy/trigger/blob/master/trigger/drivers/trebuchet/local.py shows the low-level commands invoked [04:44:59] cscott: And https://github.com/trebuchet-deploy/trigger/blob/master/trigger/drivers/trebuchet/local.py#L103-L124 is the problematic bit for scripting [04:45:09] and its use of subprocess.Popen appalls me [04:45:43] try putting a single quote in your repo name for a good time: https://github.com/trebuchet-deploy/trigger/blob/master/trigger/drivers/trebuchet/local.py#L90 [04:46:04] ugh [04:46:21] # TODO (ryan-lane): Check return values from these commands [04:49:17] use the subprocess module, python people. otherwise you are in a state of sin. [04:49:25] or working for a ride-sharing company. [04:49:28] one or the other [04:49:39] Or porting perl to python? [04:49:56] anyhow, have fun with your experiments [04:50:23] * bd808 goes back to watching Cal get thumped [04:52:56] i might play around with deploying via "sudo salt-call -l quiet publish.runner deploy.fetch $REPO ; sudo salt-call -l quiet publish.runner deploy.checkout $REPO ; sudo salt-call -l quiet --out=json publish.runner deploy.restart $REPO", since that's what it looks like `git-deploy sync ; git-deploy service restart` is doing under the hood [04:53:40] All that sudo is part of what I don't like about using salt [04:54:43] But yeah that might be the way to automate it [04:54:53] i might also submit a PR to rip out the string arguments to Popen and replace them with proper arrays :-/ [04:55:07] crazy pants! [04:56:19] i'm afraid the N/M hosts... messages are a inherent part of trebuchet, however :( [04:56:43] down in https://github.com/trebuchet-deploy/trigger/blob/master/trigger/drivers/trebuchet/local.py#L290 we are reading that information from redis (!) 
after the salt-call command returns [04:56:52] they are inherent in the async-ness of salt [04:57:21] and the redis returner and the way that it never knows for sure how many hosts there really should be [04:57:49] parsoid's rsync 1-liner is looking better and better [04:58:25] The salt master knows but the redis cache of hosts requires manual pruning -- https://wikitech.wikimedia.org/wiki/Trebuchet#Removing_minions_from_redis [05:05:40] another option is, if i'm hacking git-deploy to make the subprocess argument handling sane, to add an --auto or -y option that shortcuts the _ask method with some reasonable logic (a maximum of X times wait Y seconds for self._report_driver.report_sync to report all complete, or else retry and loop back to the top, a maximum of Z times). [06:28:43] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:53] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:53] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:26] (03CR) 10Catrope: "This was scheduled to be deployed in the SWAT deploy about 6 hours ago at 00:00 UTC, but it was overlooked because of a problem with other" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172993 (https://bugzilla.wikimedia.org/73365) (owner: 10Jforrester) [06:45:54] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:45:54] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:46:45] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:49:23] PROBLEM - puppet last run on db1003 is CRITICAL: CRITICAL: Puppet has 1 failures [07:07:44] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [07:20:24] so any chance of adding .mobi format to the render engine and sticking the option in the beta ?? 
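(On the git-deploy automation thread above, roughly 04:52–05:05: a minimal sketch of what a non-interactive trebuchet deploy could look like, built from the salt-call one-liner cscott quoted and using subprocess argument lists instead of the shell strings he objected to in trigger's local.py. The helper names, the settle delay, and the example repo are invented for illustration; this is not the actual trigger code, and a real tool would poll the per-host sync status from redis rather than sleeping.)

```python
#!/usr/bin/env python
"""Sketch of an unattended 'git-deploy sync', per the discussion above.

Assumptions: the publish.runner deploy.fetch/checkout/restart calls behave as
described in the conversation; everything else (names, timing) is made up.
"""
import json
import subprocess
import time


def salt_runner(action, repo, out_json=False):
    # Build the command as an argument list (no shell=True), so a quote in
    # the repo name cannot break out of the command -- the problem noted in
    # trigger's string-based Popen calls.
    cmd = ['sudo', 'salt-call', '-l', 'quiet']
    if out_json:
        cmd.append('--out=json')
    cmd += ['publish.runner', 'deploy.%s' % action, repo]
    out = subprocess.check_output(cmd)
    return json.loads(out) if out_json else out


def deploy(repo, settle=30):
    """fetch -> checkout -> restart, i.e. git-deploy sync without the prompts."""
    salt_runner('fetch', repo)
    salt_runner('checkout', repo)
    # git-deploy shows "N/M hosts" progress here (read back from redis); an
    # unattended wrapper would poll that instead of using a fixed sleep.
    time.sleep(settle)
    return salt_runner('restart', repo, out_json=True)


if __name__ == '__main__':
    print(deploy('example/ocg'))  # hypothetical repo name
```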
[07:34:03] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 19 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 44, uinitializing_shards: 0, unumber_of_data_nodes: 2} [07:34:14] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 19 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 44, uinitializing_shards: 0, unumber_of_data_nodes: 2} [07:34:57] bah, didn't schedule my maintenance for long enough [07:34:59] almost done [07:37:14] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 2, active_shards: 63, initializing_shards: 0, number_of_data_nodes: 3 [07:37:33] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 2, active_shards: 63, initializing_shards: 0, number_of_data_nodes: 3 [07:38:36] !log logstash hosts: elasticsearch moved to bigger disks [07:38:44] Logged the message, Master [07:45:51] (03CR) 10Nemo bis: "Temporarily... for two months?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172110 (owner: 10Nemo bis) [07:58:32] (03PS3) 10Giuseppe Lavagetto: puppet: get rid of the nagios_group global variable [puppet] - 10https://gerrit.wikimedia.org/r/172531 [08:01:10] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet: get rid of the nagios_group global variable [puppet] - 10https://gerrit.wikimedia.org/r/172531 (owner: 10Giuseppe Lavagetto) [08:05:12] <_joe_> ach sorry, I am going to leave this unmerged for a few minutes [08:06:03] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail [08:10:54] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [08:10:54] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [08:11:54] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [08:11:54] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [08:13:32] (03CR) 10Nikerabbit: [C: 031] Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) (owner: 10Dereckson) [08:14:55] <_joe_> meh I forgot to apply the new hiera lib in the puppet compiler [08:25:34] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [09:04:15] PROBLEM - puppet last run on amslvs2 is CRITICAL: CRITICAL: puppet fail [09:04:38] uh. why is redirects.conf and dat in both puppet and apache-config repo? 
[09:04:47] it looks like the one in apache-config is not in sync [09:14:03] !log Zuul is flapping [09:14:08] Logged the message, Master [09:15:25] <_joe_> Glaisher: in the apache-config repo there is a file explaining it's dismissed [09:16:01] <_joe_> https://github.com/wikimedia/operations-apache-config/blob/master/README_BEFORE_EDITING [09:16:03] !log Zuul is back [09:16:06] Logged the message, Master [09:16:21] ah [09:16:35] <_joe_> hey hashar [09:21:24] (03PS1) 10Filippo Giunchedi: gdash: fix parser cache gdash [puppet] - 10https://gerrit.wikimedia.org/r/173246 [09:22:32] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] gdash: fix parser cache gdash [puppet] - 10https://gerrit.wikimedia.org/r/173246 (owner: 10Filippo Giunchedi) [09:22:44] RECOVERY - puppet last run on amslvs2 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [09:24:13] <_joe_> godog: we should really think about changing the servers. hierarchy [09:24:34] <_joe_> godog: right now it's impossible to aggregate data per server cluster, and we need it [09:24:56] <_joe_> I'm willing to lose all current stats there, I'm still the only one using them anyway [09:25:19] what makes you say that? :) [09:25:24] <_joe_> so my idea is servers.$hostname.$metric [09:25:31] anyways yes per-cluster would be nice [09:25:33] <_joe_> is what we have now [09:26:05] <_joe_> and I thought about servers.$cluster.$site.$metric [09:26:22] <_joe_> sorry servers.$cluster.$site.$server.$metric [09:27:39] what happens when we change a machine's cluster? [09:28:23] <_joe_> it's still aggregation [09:28:43] <_joe_> so servers.*.eqiad.mw1117.$metric [09:29:01] <_joe_> they will show as separate entries, but you still have history [09:29:34] (03PS1) 10Glaisher: Add DNS for mul.wikisource.org [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) [09:35:12] _joe_: that or aggregate into cluster metrics while we ingest the metrics [09:35:31] <_joe_> godog: on graphite? [09:35:41] <_joe_> er, in statsd? [09:36:10] in graphite yeah, carbon-relay [09:37:56] <_joe_> so we'd need to keep yet another list of server-cluster associations [09:40:51] <_joe_> or, we collect metrics in the form I described above, and we do 2 aggregations, one per-server and one per-cluster, and we discard the original metrics [09:41:00] <_joe_> (by using a very short retention) [09:41:16] yeah the problem specifically is that it doesn't seem possible given a cluster to have a list of hosts that are part of that cluster right now [09:41:28] <_joe_> do you think this may work? [09:42:01] <_joe_> godog: this is easily changed - I can write a small script that can give us that information via HTTP [09:42:20] <_joe_> what about that? [09:42:35] yeah that seems useful to me regardless of this issue [09:43:18] <_joe_> but what about the solution I just proposed? [09:43:22] <_joe_> that would be: [09:43:55] <_joe_> - diamond sends back metrics in the form servers....metric [09:44:00] (03PS1) 10Hashar: Drop role::zuul::labs [puppet] - 10https://gerrit.wikimedia.org/r/173248 [09:44:18] <_joe_> - these metrics are stored with a very short retention time [09:44:47] <_joe_> - We aggregate those to ..metric [09:45:06] <_joe_> and to server..metric [09:45:24] <_joe_> and these two metrics will have the normal retention rules [09:45:35] (03CR) 10Hashar: [C: 031 V: 032] "Cherry picked on integration puppetmaster (labs). 
The integration-zuul-server instances already uses role::zuul::merger and role::zuul::s" [puppet] - 10https://gerrit.wikimedia.org/r/173248 (owner: 10Hashar) [09:45:41] <_joe_> do you think this could work? [09:45:49] <_joe_> or it sounds like an horrible hack? [09:47:29] _joe_: I think it might work, but needs some testing, e.g. with https://github.com/grobian/carbon-c-relay [09:52:30] (03PS1) 10Glaisher: Redirect mul.wikisource.org to wikisource.org [puppet] - 10https://gerrit.wikimedia.org/r/173250 (https://bugzilla.wikimedia.org/73407) [09:56:24] (03PS2) 10Glaisher: Add DNS for mul.wikisource.org [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) [09:57:13] RECOVERY - swift-object-replicator on ms-be2010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [09:57:30] ACKNOWLEDGEMENT - puppet last run on ms-be2005 is CRITICAL: CRITICAL: Puppet has 1 failures Filippo Giunchedi pending disk change [10:00:57] (03PS1) 10Hashar: role::ci::website::labs [puppet] - 10https://gerrit.wikimedia.org/r/173251 [10:20:10] (03PS2) 10Hashar: role::ci::website::labs [puppet] - 10https://gerrit.wikimedia.org/r/173251 [10:35:03] <_joe_> I updated https://wikitech.wikimedia.org/wiki/Puppet_Hiera with info about the new "regex matching" feature I introduced [10:35:13] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [10:35:29] <_joe_> I'll take a look at making https://wikitech.wikimedia.org/wiki/Puppet_coding modern and relevant too [10:36:00] <_joe_> then I'll start to -1 and unjustified refusal to handle things with hiera when needed [10:37:31] <_joe_> "Our code is, as of July 2013, in transition [10:37:49] <_joe_> from a system of global manifests to a system of modules and roles." [10:37:53] <_joe_> eheh [10:40:13] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [10:45:17] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [10:50:14] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [10:55:18] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:00:14] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:05:09] (03CR) 10Alexandros Kosiaris: "Daniel, my comments have actually not been addressed. Patch sets 13 and 14 have no diff. Gabriel, could you have a look please ?" [puppet] - 10https://gerrit.wikimedia.org/r/167213 (owner: 10GWicke) [11:05:11] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:10:10] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:14:53] hehe [11:14:54] "Our historical take on role classes was 'do not parametrize, use node-scope variables to configure'." 
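(Back on the graphite naming discussion above, around 09:24–09:47: a toy sketch of the per-cluster rewrite _joe_ proposes, turning servers.$host.$metric into servers.$cluster.$site.$host.$metric so data can be aggregated per server cluster. The host-to-cluster mapping is assumed to come from the small HTTP service he offers to write; here it is hard-coded, and none of the names are authoritative.)

```python
# Toy rewrite of plaintext carbon lines ("path value timestamp") into the
# proposed servers.<cluster>.<site>.<host>.<metric> scheme. Example data only.

HOST_MAP = {
    'mw1117': ('appserver', 'eqiad'),  # illustrative, not the real mapping
}


def rewrite(line):
    path, value, ts = line.split()
    parts = path.split('.')
    if parts[0] != 'servers' or parts[1] not in HOST_MAP:
        return line  # pass through anything we cannot classify
    cluster, site = HOST_MAP[parts[1]]
    new_path = '.'.join(['servers', cluster, site, parts[1]] + parts[2:])
    return ' '.join([new_path, value, ts])


print(rewrite('servers.mw1117.cpu.user 12.5 1415955600'))
# -> servers.appserver.eqiad.mw1117.cpu.user 12.5 1415955600
```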
[11:14:59] that's not true I think [11:15:09] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:15:34] unless it's a more recent development while I haven't been paying attention much [11:15:51] but our ORIGINAL use case of node-scope variables was because puppet didn't even have class parameters yet :) [11:20:09] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:21:01] (03CR) 10JanZerebecki: [C: 031] ssh server: make listening port configurable [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [11:21:55] (03CR) 10JanZerebecki: [C: 031] ssh server: make ListenAddress configurable [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [11:25:09] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:30:13] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:35:07] (03PS2) 10JanZerebecki: ssh server: make PermitRootLogin configurable [puppet] - 10https://gerrit.wikimedia.org/r/172804 (owner: 10Dzahn) [11:42:19] (03CR) 10Alexandros Kosiaris: [C: 04-1] ssh server: make ListenAddress configurable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [12:01:24] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [12:03:39] (03CR) 10Alexandros Kosiaris: [C: 032] rrdcached tuning [puppet] - 10https://gerrit.wikimedia.org/r/173032 (owner: 10Alexandros Kosiaris) [12:05:19] (03PS4) 10Giuseppe Lavagetto: varnish: make varnish::instance not depend on ganglia [puppet] - 10https://gerrit.wikimedia.org/r/172967 [12:07:24] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [12:16:49] (03PS5) 10Giuseppe Lavagetto: varnish: make varnish::instance not depend on ganglia [puppet] - 10https://gerrit.wikimedia.org/r/172967 [12:26:45] (03CR) 10Filippo Giunchedi: [C: 031] Redirect mul.wikisource.org to wikisource.org [puppet] - 10https://gerrit.wikimedia.org/r/173250 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [12:31:44] (03CR) 10Giuseppe Lavagetto: [C: 032] varnish: make varnish::instance not depend on ganglia [puppet] - 10https://gerrit.wikimedia.org/r/172967 (owner: 10Giuseppe Lavagetto) [12:35:39] (03PS3) 10Giuseppe Lavagetto: role::cache: make ganglia inclusion optional [puppet] - 10https://gerrit.wikimedia.org/r/172974 [12:44:18] (03PS4) 10Giuseppe Lavagetto: role::cache: make ganglia inclusion optional [puppet] - 10https://gerrit.wikimedia.org/r/172974 [12:49:50] (03CR) 10Giuseppe Lavagetto: [C: 032] role::cache: make ganglia inclusion optional [puppet] - 10https://gerrit.wikimedia.org/r/172974 (owner: 10Giuseppe Lavagetto) [13:03:44] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "While the patch mostly makes sense, before merging we need to:" [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/172418 (owner: 10Ori.livneh) [13:35:48] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: Puppet has 1 failures [13:37:21] (03PS4) 10JanZerebecki: ssh server: make ListenAddress configurable [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [13:39:26] (03PS5) 10JanZerebecki: ssh server: make ListenAddress configurable [puppet] - 
10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [13:40:06] (03CR) 10JanZerebecki: ssh server: make ListenAddress configurable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [13:42:53] (03PS3) 10JanZerebecki: ssh server: make PermitRootLogin configurable [puppet] - 10https://gerrit.wikimedia.org/r/172804 (owner: 10Dzahn) [13:53:19] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [14:03:19] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (12.50%) [14:16:12] (03PS1) 10Eranroz: Removing special wgAccountThrottle for hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173273 [14:24:00] (03PS1) 10Hashar: zuul: update layout_config path [puppet] - 10https://gerrit.wikimedia.org/r/173276 [14:25:55] dear ops, I could really use a merge of https://gerrit.wikimedia.org/r/#/c/173276/ :-D [14:35:38] !log cr1-ulsfo: setting up BGP with new transit provider [14:35:45] Logged the message, Master [14:43:00] (03CR) 10Ottomata: "Links about hiera (and role classes)" [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [14:49:31] (03CR) 10Andrew Bogott: [C: 032] zuul: update layout_config path [puppet] - 10https://gerrit.wikimedia.org/r/173276 (owner: 10Hashar) [14:56:34] (03CR) 10Ottomata: "> convert all calls to the varnishkafka class" [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/172418 (owner: 10Ori.livneh) [14:59:30] (03CR) 10Hashar: "Thank you very much. I have applied puppet on the production and labs Zuul servers. Works like a charm :-)" [puppet] - 10https://gerrit.wikimedia.org/r/173276 (owner: 10Hashar) [15:39:49] is someone messing with analytics1003? jgage? [15:46:04] (03CR) 10QChris: "I cleaned up /var/log/eventlogging, so this change should" [puppet] - 10https://gerrit.wikimedia.org/r/172884 (owner: 10QChris) [15:46:48] !log analytics1003 (a cisco) is acting crazy, stuck in some loop while trying to boot. Am attempting to fix with power cycle [15:46:50] Logged the message, Master [15:47:24] (03PS3) 10Ottomata: Link EventLogging logs into /var/log/eventlogging [puppet] - 10https://gerrit.wikimedia.org/r/172884 (owner: 10QChris) [15:51:27] RECOVERY - Host analytics1003 is UP: PING OK - Packet loss = 0%, RTA = 1.09 ms [15:53:47] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Puppet has 1 failures [15:54:03] * cmjohnson hates the cisco servers [15:55:29] yeah, taht worries me, cmjohnson. i upgraded it to trusty yesterday [15:55:31] it was fine [15:55:48] but this morning it says it had been down for 16h. for an03 this is fine, as it is not a prod machine [15:55:53] yeah..they just stop working for unexplainable reasons [15:55:54] but i will have to do this to an04 and an10 [15:56:43] !log upgrading analytics1024 to trusty [15:56:47] Logged the message, Master [15:57:04] (03CR) 10GWicke: "Argh, crap. It looks like I lost most of the patch in a git stash merge conflict. Will fix in a follow-up." 
[puppet] - 10https://gerrit.wikimedia.org/r/167213 (owner: 10GWicke) [16:09:58] (03CR) 10Ottomata: [C: 032] Link EventLogging logs into /var/log/eventlogging [puppet] - 10https://gerrit.wikimedia.org/r/172884 (owner: 10QChris) [16:10:07] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: puppet fail [16:11:32] (03PS1) 10GWicke: Re-do several lost fixes in restbase module [puppet] - 10https://gerrit.wikimedia.org/r/173287 [16:11:52] akosiaris: ^^ [16:18:22] gwicke: thanks. I did a minor change and I will merge. Really thanks for following up [16:18:53] (03PS2) 10Alexandros Kosiaris: Re-do several lost fixes in restbase module [puppet] - 10https://gerrit.wikimedia.org/r/173287 (owner: 10GWicke) [16:19:08] PROBLEM - Host analytics1003 is DOWN: PING CRITICAL - Packet loss = 100% [16:19:31] akosiaris: thank you for the careful review! [16:19:47] gwicke: btw, I wanna do that https://rt.wikimedia.org/Ticket/Display.html?id=8529 . Do we need to keep the data ? aka (all of them together? one at a time ?) [16:20:00] something in between ? [16:20:21] akosiaris: all of them would be great [16:20:36] simultaneously ? cool ! [16:20:43] they are pure test hosts [16:20:56] yeah I know, just making sure [16:21:05] the tricky bit is the disk configuration [16:21:23] (03CR) 10Alexandros Kosiaris: [C: 032] "Thanks for following up!" [puppet] - 10https://gerrit.wikimedia.org/r/173287 (owner: 10GWicke) [16:21:36] which is different from the previous one, with the SSDs in a RAID-0 [16:23:17] RECOVERY - Host analytics1003 is UP: PING OK - Packet loss = 0%, RTA = 1.82 ms [16:25:35] akosiaris: https://gerrit.wikimedia.org/r/#/c/173287/1..2/modules/eventlogging/manifests/init.pp looks odd [16:25:46] gmail is tossing wikipedia emails in my spam folder now, and it never did before. did we change something about our mailing setup? [16:26:57] I don't see that [16:27:01] gwicke: ^ [16:27:19] gerrit's internal issues with diffing across rebases ? [16:28:05] akosiaris: yeah, gerrit fooled me with the rebase diff [16:28:12] nm [16:28:54] what's the reasoning behind specifying file modes with a leading zero? [16:29:30] resetting possible sticky, suid, guid bits [16:29:47] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:30:27] akosiaris: I see, so for the case that those were set outside of puppet [16:30:36] exactly [16:30:47] k, makes sense [16:31:51] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Puppet has 1 failures [16:35:09] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:36:03] (03PS1) 10Andrew Bogott: Removed some obsolete roles. [puppet] - 10https://gerrit.wikimedia.org/r/173293 [16:36:05] (03PS1) 10Andrew Bogott: Move openstack_version and use_neutron into hiera [puppet] - 10https://gerrit.wikimedia.org/r/173294 [16:36:28] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [16:40:19] !log Increased replica count from 0 to 2 for all logstash elasticsearch indices. Expect icinga warnings as replicas are populated. [16:40:22] Logged the message, Master [16:41:06] 5.5T of storage on each logstash host now. Log all the things! 
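(For the logstash change just logged above, raising replicas from 0 to 2 on all indices: a minimal sketch of what that amounts to against the Elasticsearch 1.x settings API. The log doesn't show how bd808 actually applied it; the localhost:9200 endpoint and the logstash-* index pattern are assumptions.)

```python
import json
import requests  # assumes the requests library is available

resp = requests.put(
    'http://localhost:9200/logstash-*/_settings',   # endpoint is an assumption
    data=json.dumps({'index': {'number_of_replicas': 2}}),
    headers={'Content-Type': 'application/json'},
)
print(resp.status_code, resp.text)
```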
[16:41:18] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 60 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 54, utimed_out: False, uactive_primary_shards: 41, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 63, uinitializing_shards: 6, unumber_of_data_nodes: 3} [16:41:29] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 60 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 54, utimed_out: False, uactive_primary_shards: 41, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 63, uinitializing_shards: 6, unumber_of_data_nodes: 3} [16:41:50] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 60 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 54, utimed_out: False, uactive_primary_shards: 41, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 63, uinitializing_shards: 6, unumber_of_data_nodes: 3} [16:46:36] (03CR) 10Alexandros Kosiaris: [C: 032] ssh server: make ListenAddress configurable [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [16:52:50] (03PS1) 10Alexandros Kosiaris: Change xenon,cerium,praseodium raid scheme [puppet] - 10https://gerrit.wikimedia.org/r/173307 [16:55:06] (03CR) 10Gage: [C: 032] logstash: Use doc_values for normalized_message.raw [puppet] - 10https://gerrit.wikimedia.org/r/173209 (owner: 10BryanDavis) [16:57:46] (03PS17) 10GWicke: Add a simple restbase::labs role [puppet] - 10https://gerrit.wikimedia.org/r/171741 [16:58:27] (03CR) 10jenkins-bot: [V: 04-1] Add a simple restbase::labs role [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [16:58:47] (03CR) 10Alexandros Kosiaris: [C: 032] Change xenon,cerium,praseodium raid scheme [puppet] - 10https://gerrit.wikimedia.org/r/173307 (owner: 10Alexandros Kosiaris) [17:01:41] (03PS18) 10GWicke: Add a simple restbase::labs role [puppet] - 10https://gerrit.wikimedia.org/r/171741 [17:05:59] PROBLEM - Host ms-be2005 is DOWN: CRITICAL - Plugin timed out after 15 seconds [17:10:08] RECOVERY - check_mysql on lutetium is OK: Uptime: 612941 Threads: 1 Questions: 9371930 Slow queries: 20893 Opens: 1578 Flush tables: 2 Open tables: 64 Queries per second avg: 15.290 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [17:16:11] (03CR) 10Ottomata: Add a simple restbase::labs role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [17:19:40] (03CR) 10GWicke: Add a simple restbase::labs role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [17:22:01] ottomata: the question is mostly about per-role vs. per-host parameters in hiera [17:26:38] gwicke: i am not sure of the details, but the point of hiera is to hierachically assign variables [17:26:45] hierarchically? 
[17:26:45] * [17:27:13] in labs, in particular, i think yuvi is making it so you can set them by project [17:27:35] hierarchachaiciclclcy [17:27:37] in production, they can be set on a node level, a site (datacenter level), or other [17:27:40] node level too [17:27:56] it depends on which file they are defined in....i think, but the variable names remain the same [17:28:10] https://wikitech.wikimedia.org/wiki/Puppet_Hiera [17:29:22] gwicke: read that ^ i think it explains pretty well [17:29:49] The wikitech backend works now too -- https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep [17:30:24] ottomata: so, the mechanism for a cluster would be the fqdn regexp group? [17:30:29] RECOVERY - Host ms-be2005 is UP: PING OK - Packet loss = 0%, RTA = 43.16 ms [17:30:46] <^d> What would be the best way to run a cron once a week on all wikis. But like staggered...I don't want to just run `foreachwiki` every Sunday or some crap. [17:31:24] Hmmm... like for each wiki sometime during the week? [17:32:00] gwicke: will you have more than one restbase cluster in production? [17:32:07] ^d: 200+ cron jobs :( [17:32:08] in production/eqiad? [17:32:23] ottomata: that's not unlikely [17:32:39] <^d> bd808: Yeah :( [17:33:01] gwicke: then they would probably have different role classes, as a role clas would be meant to configure a particular usage of restbase module [17:33:03] so, dunno [17:33:16] and you *could* use the mainrole/ level to configure them then [17:33:31] but, the mainrole would be configured by regexes, hm [17:33:58] i would really like to get _joe_'s input here [17:34:12] <^d> bd808: We have almost 900 wikis. [17:34:32] <^d> I don't think we should add that many cron entries to terbium. [17:34:37] will a site.pp regexp group automatically set up a corresponding hiera regexp group? [17:34:38] ^d: I think I'd ask Reedy to help figure something out that's better than that. There's got to be a way. [17:37:33] gwicke: no, i don't think so [17:38:44] ottomata: hmm; that strikes me as potentially ugly, as we'll have to manually keep the two regexp groups in sync [17:38:52] i agree [17:38:56] What crons? [17:39:07] maintenance.pp has examples of crons by groups of wikis [17:39:28] it'd be better if you could set mainrole in puppet. [17:39:59] gwicke: what is your use case for having multiple restbase clusters in eqiad? [17:41:26] ottomata: at the cassandra level we'll eventuallly want to have separate clusters for groups of wikis [17:42:30] for isolation, independent DC fail-over, offline stuff in the same DC [17:44:31] (03PS1) 10Mark Bergsma: Allocate labstore200[12] mgmt IPs [dns] - 10https://gerrit.wikimedia.org/r/173313 [17:46:23] ottomata: that doesn't necessarily have to mean separate restbase clusters too, but it might be easier to implement it that way [17:56:58] PROBLEM - puppet last run on analytics1024 is CRITICAL: CRITICAL: Puppet has 1 failures [17:57:20] (03CR) 10Mark Bergsma: [C: 032] Allocate labstore200[12] mgmt IPs [dns] - 10https://gerrit.wikimedia.org/r/173313 (owner: 10Mark Bergsma) [17:59:50] PROBLEM - Host xenon is DOWN: PING CRITICAL - Packet loss = 100% [18:01:07] !log reimaging xenon [18:01:11] Logged the message, Master [18:03:27] (03PS1) 10Papaul: Revert "Allocate labstore200[12] mgmt IPs" [dns] - 10https://gerrit.wikimedia.org/r/173318 [18:04:03] gi11es: do you know anything about this? 
https://bugzilla.wikimedia.org/show_bug.cgi?id=69362 [18:05:00] RECOVERY - Host xenon is UP: PING OK - Packet loss = 0%, RTA = 1.01 ms [18:07:19] PROBLEM - SSH on xenon is CRITICAL: Connection refused [18:07:28] PROBLEM - puppet last run on xenon is CRITICAL: Connection refused by host [18:07:29] PROBLEM - RAID on xenon is CRITICAL: Connection refused by host [18:07:39] PROBLEM - check configured eth on xenon is CRITICAL: Connection refused by host [18:07:43] PROBLEM - Disk space on xenon is CRITICAL: Connection refused by host [18:07:52] PROBLEM - check if salt-minion is running on xenon is CRITICAL: Connection refused by host [18:07:52] PROBLEM - DPKG on xenon is CRITICAL: Connection refused by host [18:07:58] PROBLEM - check if dhclient is running on xenon is CRITICAL: Connection refused by host [18:10:03] (03PS1) 10Ottomata: Update zookeeper version to reflect upgraded version after Trusty upgrade [puppet] - 10https://gerrit.wikimedia.org/r/173321 [18:11:04] (03CR) 10Ottomata: [C: 032] Update zookeeper version to reflect upgraded version after Trusty upgrade [puppet] - 10https://gerrit.wikimedia.org/r/173321 (owner: 10Ottomata) [18:12:29] RECOVERY - puppet last run on analytics1024 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [18:13:59] (03PS1) 10Giuseppe Lavagetto: nagios: convert monitor_service to monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/173322 [18:17:25] (03CR) 10Papaul: [C: 031] Revert "Allocate labstore200[12] mgmt IPs" [dns] - 10https://gerrit.wikimedia.org/r/173318 (owner: 10Papaul) [18:19:39] PROBLEM - NTP on xenon is CRITICAL: NTP CRITICAL: No response from NTP server [18:23:49] RECOVERY - SSH on xenon is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [18:26:11] gwicke: aye. I've added _joe_as reviewer to that [18:26:32] ottomata: k, thx [18:27:03] so, ja, i think if you have multiple clusters, you will have multiple roles...at least as sublcasses [18:27:23] hm, or maybe not...if the only thing that is different about them are module parameters [18:27:45] but, either way, you don't need to parameterize the role class, and you don't need a realm based (i.e. ::labs) specific role [18:28:14] for now, since you are testing in labs, and are just trying to get a single eqiad test cluster up, you can configure this with yaml [18:28:33] in labs, via the ProjectName:Hiera editor, and in produciton i think, in eqiad.yaml [18:28:43] ottomata: I'm still waiting for the cassandra module to be merged [18:29:06] that should make testing in beta labs easier [18:29:51] (03CR) 10Ottomata: "Giuseppe, can we get your advice on this? Gabriel possibly will want to have multiple RESTbase clusters in production eqiad. How should" [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [18:29:55] gwicke: me too! [18:29:58] paravoid: ? :) [18:30:20] https://gerrit.wikimedia.org/r/#/c/166888/ [18:30:46] so if I remove the parameter forwarding, will the parameters still namespaced to the role class? [18:30:53] *be [18:31:37] whoa: https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=terbium.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1395860566&v=648583&m=Global%20JobQueue%20length&z=large [18:31:43] 40m jobs in the queue [18:32:13] bd808 or Reedy: In general there's no actual correspondance between wikis and wiki hosts, right? So in theory wikitech should just live in the 'wiki pool' and get hosted here and there and everywhere? 
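(Editor's note: to ground the hiera discussion above, here is a minimal sketch of how a single setting could be expressed at the levels ottomata describes: repo-wide, per-site, and per-fqdn-regexp group. The key name and the regexp-group syntax are illustrative assumptions based on the wikitech Puppet_Hiera page linked above, not actual production values; in labs the same key would instead go on the per-project Hiera: page, e.g. Hiera:Deployment-prep.)

```yaml
# hieradata/common.yaml -- repo-wide default
restbase::seeds: []

# hieradata/eqiad.yaml -- applies to every node in the eqiad site
restbase::seeds:
  - cerium.eqiad.wmnet

# hieradata/regex.yaml -- a named group matched by fqdn regexp; this is the
# mechanism being discussed for "cluster"-level settings, and it would have
# to be kept in sync with any corresponding site.pp node regexp by hand
restbase_test:
  __regex: !ruby/regexp /^(xenon|cerium)\.eqiad\.wmnet$/
  restbase::seeds:
    - xenon.eqiad.wmnet
```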
[18:32:30] andrewbogott: yup [18:32:56] Does it matter that wikitech uses its own auth and accounts and such? [18:33:22] many looks like it's basically all cirrusSearchLinksUpdate jobs [18:33:29] RECOVERY - check configured eth on xenon is OK: NRPE: Unable to read output [18:33:29] RECOVERY - Disk space on xenon is OK: DISK OK [18:33:32] andrewbogott: no, but all the appserver hosts would need to be able to access ldap and such [18:33:38] RECOVERY - check if salt-minion is running on xenon is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [18:33:39] RECOVERY - DPKG on xenon is OK: All packages OK [18:33:39] RECOVERY - check if dhclient is running on xenon is OK: PROCS OK: 0 processes with command name dhclient [18:33:42] Ah, right, ldap [18:33:46] Hm... [18:34:02] If you can break the link to openstack then the wiki is just a plain old wiki again [18:34:05] Actually I bet they already can. Lemme check [18:34:09] RECOVERY - RAID on xenon is OK: OK: no disks configured for RAID [18:34:25] Well, the link to openstack has always been via REST. So in theory that should be able to run anyplace... [18:34:37] oh cool [18:34:38] There might be a million race conditions with ldap though [18:34:56] I bet ldap is accessible because we use if for misc services [18:35:06] *use it [18:36:08] I'd think. The ldap tools aren't installed on the app servers but that might not matter. [18:37:40] So, hm… I can't think of a good way to test this incrementally. I guess we could set up an emtpy wikitech replacement on the cluster and then migrate the content later once it's tested. [18:38:03] "set up an empty wikitech replacement" which I definitely don't know how to do :) [18:39:30] andrewbogott: Reedy would be the guy to talk to. I just pretend to understand how this stuff works. He groks it fully. [18:39:46] ok. And he's in the UK, yes? [18:39:56] yeah [18:40:21] put how mostly kind of works on central US time'ish [18:40:27] *but he [18:40:37] ok, I'll bug him on Monday. It would be pretty great to get the wikitech wiki fully out of my hands. [18:40:45] * bd808 agrees [18:41:30] I'd still like to figure out how to update to a modern version of SMW for it too [18:41:41] so many side projects... [18:41:57] or just get rid of SMW :) [18:42:08] andrewbogott: maybe you should wait until OSM is gone, though? [18:42:55] paravoid: maybe, although I'm not sure it matters. OSM is already deployed with the normal deployment system... [18:43:10] So as long as the appservers can talk to virt1000 via http everything would work fine. [18:43:29] hm, and labnet1001 [18:43:38] we have a higher bar for extensions that run in production, due to the elevated access they have [18:43:49] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [18:43:49] e.g. csteipp may want to security review it [18:44:11] We ship the code to the cluster now, but yeah it's not enabled there [18:44:12] yeah, that's a fair point. [18:44:28] !log running scripts to fix bug 72927 [18:44:33] Logged the message, Master [18:45:05] as far as I understand it, wikitech's future probably has no OSM, and possibly not even SMW [18:45:17] and then it could just have no LDAP at all and just move to centralauth :) [18:45:26] and converted into a documentation wiki [18:45:48] but maybe that's too far away, I don't know [18:46:00] Hm, yeah, although we'd need to provide a different front-end to create ldap accounts. [18:46:10] Which maybe Horizon can handle, I'm not sure. 
Right now I've only been thinking of it as a consumer [18:46:21] Overall openstack/keystone has a read-only approach to ldap [18:46:43] how do others handle this? [18:46:55] creating accounts with keystone? [18:47:18] I think others mostly don't use ldap, they just let keystone manage users in its own db [18:47:29] PROBLEM - Disk space on db1017 is CRITICAL: DISK CRITICAL - free space: /var/lib/carbon 3443 MB (3% inode=98%): [18:47:40] (03CR) 10Dzahn: [C: 031] "support! per "The ISO code for multi-lingual resources is mul and this was recently added to the interwiki map to allow for l" [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [18:48:05] SMW is useful because it makes things that are otherwise keystone-protected (instances in a project, etc.) publicly visible queryable and sortable and such. [18:48:20] There might be another better way of exposing that though [18:48:31] I dunno, is wikidata the catch-all replacement for SMW? [18:49:13] in the future, but not yet [18:49:19] they're in the same conceptual space but there's nothing like feature parity [18:49:23] because getting data out of it is still too expensive [18:49:46] instances won't be in wikitech in our horizon feature, though, no? [18:50:04] andrewbogott: couldn't you just have a 'public' keystone account that the web app uses to query data? [18:50:27] paravoid: well, there are two things... [18:50:41] There's OSM which controls and queries OpenStack [18:50:55] but there's also an openstack notifier which updates the instance stat pages. [18:51:00] That latter thing is totally unrelated to OSM [18:51:07] And that's (mostly) what SMW consumes. [18:51:25] aha [18:51:39] RECOVERY - NTP on xenon is OK: NTP OK: Offset -0.01260328293 secs [18:51:42] do people actually use the latter? [18:51:47] ori: I'm not sure if there are read-only rights in openstack by default. I've never messed with them, at least. [18:51:59] paravoid: Yeah, I think they do. I do, at least. [18:52:08] That doesn't mean they're necessarily important, but... [18:52:14] It's nice to be able to see a project without first joining it. [18:53:19] So, for example, a page like this: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep [18:53:30] that's mostly SMW. We'd still have all that info if we yanked out OSM [18:53:43] Hm, some crazy bad formatting on that :( [18:54:28] yes, people use that [18:56:51] ori: andrewbogott re: exposing info, we can just write API on top [18:56:53] like we started [18:58:22] YuviPanda: Having the data pushed out into a db (if SMW can be called that) on update is kind of nice for querying though. Otherwise it'd be a serious storm of API calls anytime someone loaded a stat page [18:58:33] andrewbogott: well, memcached :) [18:58:42] like, we're hitting our APIs fairly often [18:58:43] and things are ok [18:58:46] And, really, how we get the data (API call or callback) is unrelated to the SMW issue [18:58:53] since we'd still need some kind of query/sort system [19:00:39] true [19:00:43] it's SMW vs writing our own [19:01:02] I far prefer the latter, tbh. 
those are fairly easy to write, and SMW is painful (at least for me) [19:02:59] * andrewbogott lunches [19:14:41] (03CR) 10Dzahn: [C: 032] "for https://gerrit.wikimedia.org/r/#/c/173250/" [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [19:15:43] (03CR) 10Dzahn: "mul.wikisource.org has address 198.35.26.96" [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [19:18:40] (03CR) 10Dzahn: "Glaisher: http://mul.wikisource.org/ already works without even changing Apache (right?)" [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [19:21:05] (03CR) 10Dzahn: "i merged the DNS change (https://gerrit.wikimedia.org/r/173247) and it looks to me like it already works just fine without even needing th" [puppet] - 10https://gerrit.wikimedia.org/r/173250 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [19:32:19] (03CR) 10Dzahn: [C: 031] Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) (owner: 10Dereckson) [19:36:17] (03PS1) 10Anomie: Copy a sanitized version of api-feature-usage [puppet] - 10https://gerrit.wikimedia.org/r/173336 [20:08:05] (03PS1) 10Manybubbles: Send more update jobs to Elasticsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 [20:09:01] (03PS2) 10Manybubbles: Send more update jobs to Elasticsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 [20:10:04] (03PS3) 10Manybubbles: Send more update jobs to Elasticsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 [20:11:46] (03CR) 10Manybubbles: "Right now the load average on the Elasticsearch cluster is super low. I figured it was because we through so much more hardware at it. I" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 (owner: 10Manybubbles) [20:15:00] Is Jenkins speaking Spanish to everyone or just me? https://integration.wikimedia.org/ci/ [20:15:49] not just you [20:16:55] well it's mixed, actually [20:17:10] "Clave del API" right above "Color blind support" [20:17:14] under "Configurar" [20:17:41] It's not really a problem, just… interesting [20:17:41] sounds like set to es but not all translations exist and then fallback to en [20:17:47] yea, indeed [20:17:48] cute [20:17:54] vaguely remember this came up before [20:18:25] (03PS1) 10Dzahn: remove glusterfs and pmtpa remnants [puppet] - 10https://gerrit.wikimedia.org/r/173349 [20:18:27] (03PS1) 10Dzahn: protoproxy - update usage examples to current [puppet] - 10https://gerrit.wikimedia.org/r/173350 [20:19:17] (03CR) 10Aaron Schulz: [C: 031] Send more update jobs to Elasticsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 (owner: 10Manybubbles) [20:19:21] (03CR) 10Dzahn: [C: 04-1] "change to modules/mariadb should not be here, grrmbl" [puppet] - 10https://gerrit.wikimedia.org/r/173349 (owner: 10Dzahn) [20:21:02] I have a question about SwiftFileBackend and LocalRepo configuration [20:21:42] (03CR) 10Manybubbles: "Scheduled for Monday 'morning' SWAT:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 (owner: 10Manybubbles) [20:21:43] thoughts about importing this module into our puppet repo? 
[20:21:44] (03CR) 10Dzahn: [C: 032] "just changes comments - show current example like from role/protoproxy.pp" [puppet] - 10https://gerrit.wikimedia.org/r/173350 (owner: 10Dzahn) [20:21:49] Coren: andrewbogott mutante ^ [20:21:50] err [20:21:52] module being https://github.com/erwbgy/puppet-limits [20:22:02] Is $wgLocalFileRepo useful only with a $wgFileBackends,? [20:22:26] AaronSchulz, question about wgFileBackends. Is $wgLocalFileRepo useful only with a $wgFileBackends? [20:23:34] (03PS2) 10Dzahn: protoproxy - update usage examples to current [puppet] - 10https://gerrit.wikimedia.org/r/173350 [20:24:10] YuviPanda: which one are you pointing at? context? [20:24:23] mutante: pointing at https://github.com/erwbgy/puppet-limits [20:24:24] renoirb: you can configure it without $wgFileBackends, as long as you don't try to set 'backend' [20:24:28] mutante: context is we need to disable coredumps on toollabs [20:24:39] and that requires placing an entry in /etc/security/limits.conf [20:24:39] (03PS1) 10Ori.livneh: Update EventLogging listener IP for labs [puppet] - 10https://gerrit.wikimedia.org/r/173352 [20:24:41] (03PS1) 10Ori.livneh: keyholder: add /etc/keyholder.d and `keyholder arm` subcommand [puppet] - 10https://gerrit.wikimedia.org/r/173353 [20:24:48] and that's a single file, so can't just put different files in [20:24:52] so needs some way to do this... [20:25:09] YuviPanda: that is pleasingly minimalist for an upstream module :) If you can verify that it won't tromp on anything currently running it seems fine with me. [20:25:10] AaronSchulz, the reason of my question is that i’m making configuration that will check if local deployment has an assigned file backend endpoint. [20:25:17] :D [20:25:27] (03PS2) 10Ori.livneh: keyholder: add /etc/keyholder.d and `keyholder arm` subcommand [puppet] - 10https://gerrit.wikimedia.org/r/173353 [20:26:06] (03PS3) 10Ori.livneh: keyholder: add /etc/keyholder.d and `keyholder arm` subcommand [puppet] - 10https://gerrit.wikimedia.org/r/173353 [20:26:09] andrewbogott: goddamit, it uses puppet augeas [20:26:13] andrewbogott: and I don't know if we use that [20:26:23] It is then safe to assume that $wgLocalFileRepo is not really required if no $wgFileBackends pointing to a Swift exists AaronSchulz ? [20:26:26] Hm, I don't know what that is [20:26:29] YuviPanda: ah, i see, so you are suggesting we import that module? sounds reasonable, i don't know about use_hiera true/false [20:26:42] YuviPanda: we used augeas for old firewall iptables [20:26:49] mutante: oh [20:26:49] before ferm [20:26:56] is augeas something you enable? [20:27:03] oh [20:27:07] apparently it's there by default? [20:27:09] https://docs.puppetlabs.com/references/latest/type.html#augeas [20:27:21] YuviPanda: we use it in modules/interface/ too [20:27:23] (03CR) 10Ori.livneh: [C: 032] keyholder: add /etc/keyholder.d and `keyholder arm` subcommand [puppet] - 10https://gerrit.wikimedia.org/r/173353 (owner: 10Ori.livneh) [20:27:25] oooh [20:27:26] modules/interface/manifests/tagged.pp: # Use augeas [20:27:30] mutante: so that means I *can* use this module [20:27:34] sounds like it, yea [20:27:41] modules/postgresql/manifests/user.pp: augeas { "hba_create-${name}": [20:27:49] grep -r augeas * [20:27:51] mutante: ok to merge? 
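(Editor's note: the change under discussion is small, so here is a minimal sketch of what disabling core dumps in /etc/security/limits.conf looks like; this is the file the erwbgy/puppet-limits module linked above manages via augeas. The exact hiera keys or class parameters that module expects are not shown because they are not verified here; only the resulting file content below is standard pam_limits syntax. Note _joe_'s caveat a little later in the log: pam_limits only applies to PAM sessions, so upstart-supervised daemons will not pick this up.)

```
# /etc/security/limits.conf (or whatever fragment the module renders into it)
# <domain>  <type>  <item>  <value>
*           soft    core    0
*           hard    core    0
```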
[20:27:56] dzahn: protoproxy - update usage examples to current (f0613a989d) [20:28:01] renoirb: really depends what you are doing...but if you only set wgLocalFileRepo for the purpose of using swift (and don't care about the other wgLocalFileRepo settings), then I suppose that could work [20:28:07] ori: sorry, yes please, just changes comments [20:28:14] np [20:28:25] ah, right [20:30:17] using that module would also mean we can just use hiera directly for limits anywhere [20:31:17] YuviPanda: is that a file that's not otherwise present on labs and prod systems? [20:31:27] andrewbogott: aha, so that's the thing. it sometimes is. [20:31:39] andrewbogott: so potential to run into conflicts is... high [20:31:42] well, not 'high' [20:31:43] ok, that's a bit worrisome then -- best to figure out what's happening there first. [20:31:44] but still... [20:31:59] andrewbogott: oh, no, if I include the module I'll change the other usages to use the module [20:32:06] andrewbogott: by default it's emptyu [20:32:08] *empty [20:32:15] ok, so it's only present when puppetized? [20:32:16] andrewbogott: we've puppet code that specifically puts a file there sometime [20:32:18] yeah [20:32:26] OK, yeah, then standardizing on that module seems great. [20:32:28] well, the file is present otherwise too, but is fully commented out by default [20:32:42] AaronSchulz, i’m making a config generator in Salt. I want it to support multiple deployments, local, on staging (with a different set of credentials) and production. So, I need to figure out what is required when [20:32:42] I wonder if I should use librarian puppet [20:32:45] probably not right now [20:32:55] andrewbogott: I guess we'll need to import it into gerrit [20:33:37] You could manage it as a submodule or just copypaste into the normal repo [20:33:48] If the license allows it… the latter might be just as good [20:33:53] <_joe_> YuviPanda: librarian is horrible and we don't want it [20:34:03] <_joe_> I took a look and it's a crippled concept [20:34:25] <_joe_> what would you want librarian for, and what would it give us over git submodules? [20:34:42] andrewbogott: ok, this isinfuriating. [20:34:46] andrewbogott: it doesn't have a LICENSE file [20:34:55] andrewbogott: oh [20:34:56] andrewbogott: it does [20:34:59] andrewbogott: license 'Apache License, Version 2.0' [20:35:12] that certainly allows for copypasta [20:35:19] yeah, but we should use a submodule, I THINK [20:35:20] *think [20:35:32] _joe_: I haven't looked into it too much, but I'm not too much a fan of submodules. [20:35:43] anyway, that's for another day... [20:35:44] ok, I take it back, my Spanish is not so good [20:35:46] <_joe_> YuviPanda: I plainly fucking hate them [20:35:51] hehe [20:35:58] <_joe_> but they're still better than librarian [20:35:59] <_joe_> :) [20:36:02] haha :) [20:36:12] (03CR) 10Andrew Bogott: [C: 032] Removed some obsolete roles. [puppet] - 10https://gerrit.wikimedia.org/r/173293 (owner: 10Andrew Bogott) [20:36:34] _joe_: considering using https://github.com/erwbgy/puppet-limits [20:36:41] _joe_: will let us manage security/limits.conf via hiera [20:36:51] _joe_: I need to use that to disable coredumps on labs [20:37:08] but since the code I need to conditionalize it is in the base module... [20:37:09] <_joe_> YuviPanda: remember upstart doesn't give a fuck about limits.conf :) [20:37:26] hmm? these are coredumps from lighty started by SGE... 
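(Editor's note: on renoirb's $wgLocalFileRepo / $wgFileBackends question, a minimal sketch of how the two settings relate, following AaronSchulz's answer above: $wgLocalFileRepo stands on its own for plain filesystem storage, and only needs a matching $wgFileBackends entry when its 'backend' key points at one, for example a SwiftFileBackend. The endpoint, credentials and values below are placeholders, not anything taken from this log.)

```php
<?php
// Only needed when the repo references a named backend:
$wgFileBackends[] = [
	'name'         => 'local-swift',          // referenced by the repo below
	'class'        => 'SwiftFileBackend',
	'lockManager'  => 'nullLockManager',
	'swiftAuthUrl' => 'https://swift.example.org/auth/v1.0',  // placeholder
	'swiftUser'    => 'mw:media',                             // placeholder
	'swiftKey'     => 'not-a-real-key',                       // placeholder
];

$wgLocalFileRepo = [
	'class'      => 'LocalRepo',
	'name'       => 'local',
	'backend'    => 'local-swift',  // omit this key and the repo works with no $wgFileBackends at all
	'url'        => "$wgScriptPath/images",
	'hashLevels' => 2,
];
```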
[20:37:33] why would upstart matter, at least for this particular use case [20:37:50] (this is for https://phabricator.wikimedia.org/T1259) [20:38:04] <_joe_> YuviPanda: I just reminded you that you have this caveat now when using that file [20:38:09] ah, right [20:38:11] yeah [20:38:49] anyway, let me put that in [20:38:54] <_joe_> so if SGE is started via upstart (I have no idea) [20:39:03] it is... [20:39:05] <_joe_> YuviPanda: that module is not that great [20:39:44] <_joe_> YuviPanda: so look up how limits are enforced on linux, and see how that won't affect your processes :) [20:39:52] hmm, alright. [20:40:03] <_joe_> unless SGE does read that file itself [20:40:16] <_joe_> and sets limits to its jobs [20:40:28] hmm, all I wanted to do was disable coredumps, and now I guess I'll start yakshaving... [20:41:36] <_joe_> YuviPanda: http://upstart.ubuntu.com/cookbook/#limit [20:42:22] ori: I'm wondering if I should just revert https://gerrit.wikimedia.org/r/#/c/171206/ (and add counteracting code) for the weekend. TOolLabs hosts get their (tiny) /vars filled up every few hours now because of coredumps [20:42:22] (03CR) 10BryanDavis: [C: 031] "Anomie: as soon as you are happy with the data you are getting in logstash from this, add jgage as a reviewer and poke him to merge it." [puppet] - 10https://gerrit.wikimedia.org/r/173336 (owner: 10Anomie) [20:42:30] ori: and I don't want to manually fight that during the weekend... [20:42:47] YuviPanda: just add a special tidy {} resource for labs [20:43:00] ori: well, sometimes just one coredump is enough to fill it up [20:43:11] that's a bug [20:43:24] indeed, because /var is 2G, and a lot of it is filled with logs [20:44:03] YuviPanda: another workaround: add a profile.d script that calls 'ulimit -S -c 0 > /dev/null 2>&1' [20:44:13] fix is to have bigger /var, and the newer labs images do have a slightly bigger /var. but these aren't new instances... [20:44:22] ah, hmm... [20:44:41] still, that's just a workaround. the 'core' bug (tiny /var) isn't going to be fixed anytime soon. [20:45:05] and nobody uses the coredumps on tools. they're almost always lighty going off. [20:45:11] hmm, I wonder what the status quo was before that patch. [20:45:13] so disable core dumps [20:45:51] hmm, limits.conf, and _joe_ was pointing out weirdness there, and now I'm thoroughly confused. [20:46:16] * YuviPanda goes to read more things [20:47:57] (03CR) 10Yuvipanda: [C: 04-2] "Still WIP" [puppet] - 10https://gerrit.wikimedia.org/r/173080 (owner: 10Yuvipanda) [20:52:54] puppet compiler also Spanish :) [20:52:56] Proyecto operations-puppet-catalog-compiler [20:53:06] Enlaces permanentes [20:54:11] (03CR) 10Anomie: "I wonder whether we should merge this as-is or combine it with whatever changes are necessary to push it into the search ES." [puppet] - 10https://gerrit.wikimedia.org/r/173336 (owner: 10Anomie) [21:02:27] has anything changed recently re: hhvm on osmium? Trying to run some tests but i keep getting 'Warning: Compilation failed: this version of PCRE is compiled without UTF support at offset 0 in /srv/mediawiki/..." along with the regular expressions not matching [21:05:00] <_joe_> ebernhardson: ask ori, but I think osmium is more of a hhvm dev sandbox than a real hhvm test box [21:05:00] ebernhardson: the build there is locally hacked in half a dozen ways to make it possible to debug an issue we had [21:05:04] ebernhardson: what are you trying to do? [21:05:12] ebernhardson: hah, re bug 73426? 
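(Editor's note: picking up _joe_'s upstart point, a sketch of the two workarounds mentioned in this exchange. The upstart job name below is a guess; whichever job actually starts the SGE exec daemon on the tools nodes is the one that would need the override. The profile.d line is the command ori quoted, and it only affects PAM/login shells, not upstart-supervised daemons.)

```
# /etc/init/<sge-exec-job>.override -- upstart honours 'limit' stanzas, not limits.conf
limit core 0 0

# /etc/profile.d/no-core-dumps.sh -- ori's shell-level fallback
ulimit -S -c 0 > /dev/null 2>&1
```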
[21:05:27] ori: have a bug thats happening on ptwiki Echo that i cant reproduce locally, so i boot an instance on osmium and attach the debugger [21:05:49] (on alternate ports than the default, because attaching a debugger stops the world) [21:06:04] ebernhardson: you can do that on mw1017 too [21:06:18] ori: ok excellent i'll try there, thanks [21:06:35] (03PS1) 10Ori.livneh: mediawiki: move beta::mwdeploy_sudo to mediawiki::users [puppet] - 10https://gerrit.wikimedia.org/r/173364 [21:07:42] ls [21:08:57] (03CR) 10BryanDavis: "Building on this until you get the whole thing ready is fine with me, but merging now is fine with me too. One advantage to merging earlie" [puppet] - 10https://gerrit.wikimedia.org/r/173336 (owner: 10Anomie) [21:08:57] bin boot dev etc home sbin usr var [21:14:14] _joe_: would you expect me to still specify default values for variables as parameters? Or should I just not use params at all and assume that it's hiera all the way up? [21:14:30] This all makes me slightly nervous since it feels a bit like re-inventing global variables [21:16:38] (03PS1) 10Ori.livneh: trebuchet: try to resolve tag against local repo, too [puppet] - 10https://gerrit.wikimedia.org/r/173365 [21:23:42] MatmaRex: yea. For extra fun when i run that same revision through echo's generate events method in the debugger, its sending the notification... funsies :) [21:23:54] ouch [21:35:39] (03CR) 10BryanDavis: [C: 031] "I thought that deployment-rsycn01 would need to have ::mediawiki::users applied, but it turns out it already has it." [puppet] - 10https://gerrit.wikimedia.org/r/173364 (owner: 10Ori.livneh) [21:37:59] (03PS2) 10Andrew Bogott: Move openstack_version and use_neutron into hiera [puppet] - 10https://gerrit.wikimedia.org/r/173294 [21:38:47] (03CR) 10jenkins-bot: [V: 04-1] Move openstack_version and use_neutron into hiera [puppet] - 10https://gerrit.wikimedia.org/r/173294 (owner: 10Andrew Bogott) [21:40:00] (03PS3) 10Andrew Bogott: Move openstack_version and use_neutron into hiera [puppet] - 10https://gerrit.wikimedia.org/r/173294 [21:43:43] (03CR) 10Dzahn: "thanks guys for reviewing and fixes. 
also http://puppet-compiler.wmflabs.org/489/change/172803/diff/iron.wikimedia.org.diff.formatted just" [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [21:46:59] !log restarted /etc/init.d/ganglia-monitor on logstash1003 [21:47:04] Logged the message, Master [21:48:24] (03CR) 10Dzahn: "Alex, you are right, we just really needed the ListenAddress part, nevertheless i think it wouldn't hurt and it was already a dependency n" [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [21:51:21] mutante: wld appreciate a +1 for https://gerrit.wikimedia.org/r/#/c/173364/ if you have a moment [21:51:29] (03PS2) 10Dzahn: misc-web varnish: bugzilla to phab box [puppet] - 10https://gerrit.wikimedia.org/r/172471 [21:52:41] (03CR) 10Dzahn: [C: 04-2] "technical downvote because it depends on change in other repo" [puppet] - 10https://gerrit.wikimedia.org/r/172471 (owner: 10Dzahn) [21:54:06] (03PS3) 10Dzahn: switch bugzilla names over to misc-web [dns] - 10https://gerrit.wikimedia.org/r/172469 [21:57:08] (03PS1) 10Yuvipanda: tools: Remove unused MountCollector ensure => absent [puppet] - 10https://gerrit.wikimedia.org/r/173433 [21:58:11] (03PS2) 10Ori.livneh: trebuchet: try to resolve tag against local repo, too [puppet] - 10https://gerrit.wikimedia.org/r/173365 [21:58:24] (03PS2) 10Yuvipanda: tools: Remove unused MountCollector ensure => absent [puppet] - 10https://gerrit.wikimedia.org/r/173433 [22:01:58] (03CR) 10Yuvipanda: [C: 032] tools: Remove unused MountCollector ensure => absent [puppet] - 10https://gerrit.wikimedia.org/r/173433 (owner: 10Yuvipanda) [22:03:24] (03CR) 10Alexandros Kosiaris: [C: 032] "In general I dislike "features" that don't have an obvious use case. That being said, we have already discussed this way more that necessa" [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [22:08:51] (03PS3) 10Ori.livneh: trebuchet: try to resolve tag against local repo, too [puppet] - 10https://gerrit.wikimedia.org/r/173365 [22:09:02] (03PS4) 10Ori.livneh: trebuchet: try to resolve tag against local repo, too [puppet] - 10https://gerrit.wikimedia.org/r/173365 [22:09:15] (03CR) 10Ori.livneh: [C: 032 V: 032] trebuchet: try to resolve tag against local repo, too [puppet] - 10https://gerrit.wikimedia.org/r/173365 (owner: 10Ori.livneh) [22:13:38] (03CR) 10Dzahn: [C: 032] "thanks : http://puppet-compiler.wmflabs.org/490/change/172799/html/iron.wikimedia.org.html" [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [22:14:03] (03CR) 10Ori.livneh: "another possibility is to provision /etc/init/ssh.override which includes the line "exec sshd -D -p <%= @port %>". This will take preceden" [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [22:16:38] greg-g: Whoops, your deployment e-mail got the wmfNs off by one. :-) [22:17:41] PROBLEM - SSH on ms-fe1002 is CRITICAL: Connection refused [22:18:22] PROBLEM - SSH on mw1131 is CRITICAL: Connection refused [22:18:27] oh oh [22:23:31] RECOVERY - SSH on mw1131 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [22:24:52] RECOVERY - SSH on ms-fe1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [22:25:00] puppet race condition or so [22:25:16] the port was blank but it fixed it on second run [22:35:00] James_F: gah! [22:35:16] greg-g: Not a major issue. 
:-) [22:36:51] (03CR) 10Dzahn: [C: 031] mediawiki: move beta::mwdeploy_sudo to mediawiki::users [puppet] - 10https://gerrit.wikimedia.org/r/173364 (owner: 10Ori.livneh) [22:37:03] (03PS1) 10Ori.livneh: trebuchet: Fix-up for Ie6673c8af [puppet] - 10https://gerrit.wikimedia.org/r/173438 [22:37:05] (03PS2) 10Ori.livneh: mediawiki: move beta::mwdeploy_sudo to mediawiki::users [puppet] - 10https://gerrit.wikimedia.org/r/173364 [22:37:09] (03CR) 10Ori.livneh: [C: 032 V: 032] mediawiki: move beta::mwdeploy_sudo to mediawiki::users [puppet] - 10https://gerrit.wikimedia.org/r/173364 (owner: 10Ori.livneh) [22:37:18] (03PS2) 10Ori.livneh: trebuchet: Fix-up for Ie6673c8af [puppet] - 10https://gerrit.wikimedia.org/r/173438 [22:37:29] (03CR) 10Ori.livneh: [C: 032 V: 032] trebuchet: Fix-up for Ie6673c8af [puppet] - 10https://gerrit.wikimedia.org/r/173438 (owner: 10Ori.livneh) [22:39:53] (03PS2) 10Ori.livneh: Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) (owner: 10Dereckson) [22:39:56] (03CR) 10Ori.livneh: [C: 032] Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) (owner: 10Dereckson) [22:40:05] (03Merged) 10jenkins-bot: Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) (owner: 10Dereckson) [22:41:11] !log ori Synchronized wmf-config/InitialiseSettings.php: If60e3fe97: Deploy Translate extension on ca.wikimedia (duration: 00m 05s) [22:41:19] Logged the message, Master [22:43:37] there it goes [22:44:32] It's not that rare ;) [22:45:10] Once upon a time, every new Wikimedia wiki joining Translate needed assistance from N. We got better! [22:46:36] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/491/change/172804/diff/iron.wikimedia.org.diff.formatted" [puppet] - 10https://gerrit.wikimedia.org/r/172804 (owner: 10Dzahn) [22:46:45] ori: you will create the database tables as well, will you? [22:48:54] hehe [22:49:25] nod [22:51:46] done [22:56:52] (03CR) 10Dzahn: [C: 04-1] "yea, uhm.. i would call this one a limitation of puppet-lint itself. looks like it might not be possible to fix the warning without using " [puppet] - 10https://gerrit.wikimedia.org/r/170493 (owner: 10John F. Lewis) [22:58:06] (03PS2) 10Dzahn: remove glusterfs and pmtpa remnants [puppet] - 10https://gerrit.wikimedia.org/r/173349 [22:59:03] (03CR) 10Dzahn: [C: 04-2] "if you feel like fixing it and remove the unrelated change to mariadb, please go ahead" [puppet] - 10https://gerrit.wikimedia.org/r/173349 (owner: 10Dzahn) [23:05:24] (03PS1) 10Kaldari: Updating WikiGrok A/B test start/end times - postponing until Monday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173447 [23:11:38] (03PS2) 10Kaldari: Updating WikiGrok A/B test start/end times - postponing until Monday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173447 [23:14:21] (03CR) 10Dzahn: [C: 032] "identical: http://puppet-compiler.wmflabs.org/493/change/164273/html/ (and where it fails those are unrelated to this change. 
Error: Faile" [puppet] - 10https://gerrit.wikimedia.org/r/164273 (owner: 10Hoo man) [23:14:30] (03PS5) 10Dzahn: Remove all references to pmtpa from role::cache [puppet] - 10https://gerrit.wikimedia.org/r/164273 (owner: 10Hoo man) [23:15:51] (03CR) 10Dzahn: [C: 032] Remove all references to pmtpa from role::cache [puppet] - 10https://gerrit.wikimedia.org/r/164273 (owner: 10Hoo man) [23:19:42] (03CR) 10Kaldari: [C: 032] Updating WikiGrok A/B test start/end times - postponing until Monday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173447 (owner: 10Kaldari) [23:19:51] (03Merged) 10jenkins-bot: Updating WikiGrok A/B test start/end times - postponing until Monday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173447 (owner: 10Kaldari) [23:21:01] !log kaldari Synchronized wmf-config/mobile.php: Updating WikiGrok A/B test start/end times (duration: 00m 07s) [23:21:04] Logged the message, Master [23:22:13] (03PS1) 10Dzahn: pmacct - remove commented pmtpa core router [puppet] - 10https://gerrit.wikimedia.org/r/173454 [23:23:01] (03CR) 10Dzahn: [C: 032] "cr2-pmtpa is gone" [puppet] - 10https://gerrit.wikimedia.org/r/173454 (owner: 10Dzahn) [23:27:53] (03PS1) 10Dzahn: lvs config: remove pmtpa [puppet] - 10https://gerrit.wikimedia.org/r/173456 [23:28:29] (03PS2) 10Dzahn: lvs config: remove pmtpa [puppet] - 10https://gerrit.wikimedia.org/r/173456 [23:29:16] ^ is there LVS in eqiad labs? [23:29:19] https://gerrit.wikimedia.org/r/#/c/173456/2/modules/lvs/manifests/configuration.pp [23:31:32] (03PS3) 10Dzahn: lvs (labs) config: remove pmtpa [puppet] - 10https://gerrit.wikimedia.org/r/173456 [23:37:16] (03PS1) 10Dzahn: openstack: rm all files/folsom/ [puppet] - 10https://gerrit.wikimedia.org/r/173459 [23:43:55] (03PS1) 10Dzahn: openstack: folsom -> havana as default version [puppet] - 10https://gerrit.wikimedia.org/r/173460 [23:54:30] (03PS1) 10Dzahn: mha: replace pmtpa with codfw? (and logging.pp) [puppet] - 10https://gerrit.wikimedia.org/r/173464 [23:59:19] (03CR) 10Dzahn: "logstash.pmtpa.wmflabs is gone. how about mha though?" [puppet] - 10https://gerrit.wikimedia.org/r/173464 (owner: 10Dzahn)