[00:00:04] RoanKattouw, ^d, marktraceur, MaxSem, RoanKattouw: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141114T0000). [00:00:56] * marktraceur deploys from the airport [00:01:01] No, wait. Never mind. [00:01:31] <^d> marktraceur: If you want to deploy from an airplane go for it :) [00:01:34] <^d> {{bebold}} [00:01:49] Airport, at least. [00:02:03] ^d: Though technically the rule applies to *ferries*, not airplanes. [00:02:45] Air Bud would approve of my loophole. [00:02:51] So would FDR. [00:03:07] "There's nothing in the rulebook about a President playing basketball." [00:04:35] !log demon Synchronized php-1.25wmf7/extensions/Echo: (no message) (duration: 00m 06s) [00:04:37] Logged the message, Master [00:04:44] !log demon Synchronized php-1.25wmf8/extensions/Echo: (no message) (duration: 00m 04s) [00:04:46] marktraceur: you would violate "don't leave town rule" :D [00:04:48] <^d> ebernhardson: ^ [00:04:48] Logged the message, Master [00:05:09] matanya: good point [00:05:20] ^d: OK my first pair of commits is sitting in Jenkins now, once those merge I can create the actual submodule updates [00:05:26] (03CR) 10Chad: [C: 032] Share parsoid cookie forwarding config for VE/Flow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173175 (owner: 10EBernhardson) [00:05:26] matanya: Wait, what if I weren't *in* town in the first place? [00:05:35] <^d> RoanKattouw: k thnx [00:05:43] Just wait until we're over the ocean [00:05:51] (03Merged) 10jenkins-bot: Share parsoid cookie forwarding config for VE/Flow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173175 (owner: 10EBernhardson) [00:06:24] marktraceur: invent a new rule, don't depoly when out of town [00:06:37] !log demon Synchronized wmf-config/: (no message) (duration: 00m 07s) [00:06:39] Logged the message, Master [00:06:43] <^d> ebernhardson: And the last of yours ^ [00:06:47] <^d> (you should be all done now) [00:06:54] thanks ^d [00:07:04] <^d> I haven't done yours yet! [00:07:05] <^d> :) [00:07:17] <^d> But now you're here, lemme merge [00:07:32] ^d: appears to work, thanks [00:07:34] (03CR) 10Chad: [C: 032] Enable Flow on some testwiki pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173196 (owner: 10Spage) [00:07:42] <^d> ebernhardson: Sweet, yw :) [00:07:42] (03Merged) 10jenkins-bot: Enable Flow on some testwiki pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173196 (owner: 10Spage) [00:08:00] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s) [00:08:02] Logged the message, Master [00:08:07] <^d> spagewmf: that's you ^ [00:08:22] ^d you are the greatest [00:08:28] <^d> I try :) [00:08:31] spage's law https://meta.wikimedia.org/wiki/User:SPage_%28WMF%29 [00:08:33] OK I'm tired of this crap, I'm gonna bypass Jenkins [00:08:49] <^d> RoanKattouw: Awww, I'm telling! [00:10:41] didnt jenkins just merge stuff up there a minute ago? [00:10:53] <^d> Other stuff :) [00:10:56] ah [00:11:22] cscott: around? [00:11:25] last famous words by RoanKattouw [00:12:12] ^d: https://gerrit.wikimedia.org/r/173206 for wmf7 [00:13:02] (03CR) 10Alexandros Kosiaris: "Daniel, yes that will be accomplished by changing the Listening IP address, not the port. No need really change the port. 
Point being, the" [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [00:13:17] ^d: And https://gerrit.wikimedia.org/r/173208 for wmf8 [00:13:21] ^d: I'll go add them to the wiki page [00:13:52] (03CR) 10Alexandros Kosiaris: [C: 032] ssh server: make ListenAddress configurable [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [00:14:49] !log demon Synchronized php-1.25wmf7/extensions/VisualEditor: (no message) (duration: 00m 05s) [00:14:51] Logged the message, Master [00:14:59] !log demon Synchronized php-1.25wmf8/extensions/VisualEditor: (no message) (duration: 00m 04s) [00:15:02] Logged the message, Master [00:15:03] <^d> RoanKattouw: ^^ [00:15:05] !log nickel - shutdown [00:15:07] Logged the message, Master [00:16:21] (03PS1) 10BryanDavis: logstash: Use doc_values for normalized_message.raw [puppet] - 10https://gerrit.wikimedia.org/r/173209 [00:17:25] <^d> Thanks for playing in today's SWAT. Today's grand prize goes to spagewmf, runner up RoanKattouw. [00:17:36] <^d> Tune in next week for more SWAT action and prizes. [00:18:01] ^d: what were the prizes :o [00:18:05] (03PS1) 10Spage: Fix typo in Flow-enable page name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173210 [00:18:29] <^d> JohnFLewis: I'm sure I can find some stickers or something in the supply closet :p [00:18:33] (03CR) 10BryanDavis: "Already applied on the beta and production clusters via curl. The copy in the puppet repo is just for setting up a brand new cluster. It a" [puppet] - 10https://gerrit.wikimedia.org/r/173209 (owner: 10BryanDavis) [00:18:40] Well I went to the raffle for a Broadway show today and didn't get in [00:18:52] MaxSem: are you hitting globalusage a lot again? [00:18:56] So there were potential prizes for me, I just didn't win any [00:19:05] ^d: I'll be sure to tune in next week as a contestant :D [00:19:06] ori, ??? [00:19:07] <^d> spagewmf: You need that ^? [00:19:16] ^d: dang, typo in one of those, can you deploy https://gerrit.wikimedia.org/r/173210 ? ( remember how great I said you were? ) [00:19:22] MaxSem: isn't globalusage the API endpoint the wikidata mobile stuff hits? [00:19:28] <^d> spagewmf: One moment. [00:19:36] (03CR) 10Chad: [C: 032] Fix typo in Flow-enable page name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173210 (owner: 10Spage) [00:19:39] RoanKattouw: I think you might have just taken first place ;) [00:19:44] (03Merged) 10jenkins-bot: Fix typo in Flow-enable page name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173210 (owner: 10Spage) [00:20:01] ori, labs [00:20:02] haha [00:20:03] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s) [00:20:06] Logged the message, Master [00:20:40] <^d> No, spagewmf still wins 1st today. He was super nice :) [00:20:47] <^d> Roan didn't have his patches ready, which was -10 points :( [00:20:57] <^d> (Although he got some back for making them up so quickly!) [00:21:52] ^d: what is the penalty for a typo? [00:22:07] <^d> Depends on my mood :) [00:22:17] <^d> And if it crashed the cluster. [00:23:10] so, if it Sets Wikis Ablaze, -100? :p [00:23:30] <^d> Unless I think it's funny, in which case it's like +200 :p [00:23:34] hmm, https://test.wikipedia.org/wiki/Wikipedia:Co-op/Mentorship_match just doesn't want to be a Flow board. [00:23:37] (03CR) 10Dzahn: [C: 032] remove nickel's public IP [dns] - 10https://gerrit.wikimedia.org/r/172819 (owner: 10Dzahn) [00:24:03] <^d> spagewmf: I purged it. 
[00:24:11] <^d> It shows up now for me. [00:25:34] ^d: I've run out of superlatives. [00:26:10] If I want to wfGetDB and specify metawiki in the third, wikiId parameter, should I use a literal 'mediawiki', or is there a lookup for this? [00:26:18] ^d are the Echo and VE deployment all done? [00:26:24] <^d> kaldari: Yessir. [00:26:26] thanks [00:26:37] <^d> awight: Not "mediawiki" but yes. [00:26:48] <^d> Although, might be best to make it configurable? [00:27:19] (03CR) 10Dzahn: [C: 031] Change ru.wikinews.org to HTTPS only. [puppet] - 10https://gerrit.wikimedia.org/r/173078 (owner: 10JanZerebecki) [00:27:23] ^d: is a "Wikipedia does not have a page with this exact name" not cached? I wonder why sometimes we need to purge when enabling Flow and sometimes not. [00:27:23] ^d: yah it's a global config variable. So... I meant 'metawiki'. Is that a valid wikiId? [00:27:40] <^d> Yeah, metawiki is valid for meta.wikimedia [00:27:50] ^d: great, thanks for confirming! [00:27:51] <^d> yw [00:27:58] (03CR) 10Dzahn: [C: 031] "after https://gerrit.wikimedia.org/r/#/c/173078/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173083 (owner: 10JanZerebecki) [00:32:06] (03PS1) 10Cmjohnson: Adding dns entries for new frack host bismuth [dns] - 10https://gerrit.wikimedia.org/r/173216 [00:33:04] (03CR) 10Cmjohnson: [C: 032] Adding dns entries for new frack host bismuth [dns] - 10https://gerrit.wikimedia.org/r/173216 (owner: 10Cmjohnson) [00:34:45] (03CR) 10Dzahn: "bismuth looks pretty cool http://en.wikipedia.org/wiki/Bismuth#mediaviewer/File:Wismut_Kristall_und_1cm3_Wuerfel.jpg" [dns] - 10https://gerrit.wikimedia.org/r/173216 (owner: 10Cmjohnson) [00:35:17] bismuth does look cool mutante [00:35:20] very colorful [00:35:50] :) i think i had one as a kid if i'm not mistaken [00:40:24] (03PS1) 10Kaldari: Updating A?B test start and end times for WikiGrok test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173219 [00:40:58] (03CR) 10Kaldari: [C: 032] Updating A?B test start and end times for WikiGrok test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173219 (owner: 10Kaldari) [00:41:06] (03Merged) 10jenkins-bot: Updating A?B test start and end times for WikiGrok test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173219 (owner: 10Kaldari) [00:42:17] !log kaldari Synchronized wmf-config/mobile.php: Update WikiGrok A/B test times (duration: 00m 03s) [00:42:23] Logged the message, Master [00:45:09] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3612 MB (10% inode=93%): /dev 32200 MB (99% inode=99%): /run 6403 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /srv 480540 MB (33% inode=99%): [00:49:27] !log logstash1003: migrating elasticsearch data to new raid volume [00:49:33] Logged the message, Master [00:51:58] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 307 seconds [00:52:08] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 43, uinitializing_shards: 1, unumber_of_data_nodes: 2} [00:52:43] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, 
utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 43, uinitializing_shards: 1, unumber_of_data_nodes: 2} [00:52:53] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 362 seconds [00:53:24] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -1 seconds [00:54:14] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds [00:54:52] ACKNOWLEDGEMENT - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 43, uinitializing_shards: 1, unumber_of_data_nodes: 2} Jeff Gage adding storage to logstash1003 [00:54:52] ACKNOWLEDGEMENT - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 43, uinitializing_shards: 1, unumber_of_data_nodes: 2} Jeff Gage adding storage to logstash1003 [00:55:07] jgage: yum! 5.5T is >> the ~300G we had before [00:55:19] yeah really, i'm stoked :D [00:55:38] hopefully this means i can reenable the hadoop firehose :) [00:55:59] I would hope so. [00:56:55] data is copying to the new partition in a screen session, i will check back in a bit. when it's done i'll mv /var/lib/elasticsearch{,.old} and mount the new raid on /var/lib/eleasticseach and start things back up [00:57:16] then later when we're satisifed we can nuke elasticsearch.old [00:57:37] *nod* [01:02:22] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 62, initializing_shards: 1, number_of_data_nodes: 3 [01:02:53] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 62, initializing_shards: 1, number_of_data_nodes: 3 [01:04:03] !log kaldari Synchronized php-1.25wmf7/extensions/WikiGrok: (no message) (duration: 00m 03s) [01:04:08] Logged the message, Master [01:04:19] !log kaldari Synchronized php-1.25wmf7/extensions/MobileFrontend: (no message) (duration: 00m 05s) [01:04:22] Logged the message, Master [01:10:12] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3573 MB (10% inode=93%): /dev 32200 MB (99% inode=99%): /run 6403 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /srv 470272 MB (32% inode=99%): [01:15:12] RECOVERY - check_disk on lutetium is OK: DISK OK - free space: / 12484 MB (35% inode=93%): /dev 32200 MB (99% inode=99%): /run 6403 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /srv 468204 MB (32% inode=99%): [01:20:04] jgage: I think that puppet may have tried to be helpful and restarted elasticsearch on logstash1003 
:( [01:36:02] (03PS1) 10Kaldari: Updating WikiGrok A/B test start and end times [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173228 [01:36:32] bd808: ah geez [01:36:41] yeah forgot it would do that [01:36:49] it's ok, i just have to start the copy again [01:36:53] This Time For Real [01:37:07] sudo puppet agent --disable "copying elasticsearch files" :) [01:37:25] yeah :) [01:37:34] scheduling maint in icinga first [01:38:02] (03CR) 10Kaldari: [C: 032] Updating WikiGrok A/B test start and end times [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173228 (owner: 10Kaldari) [01:38:10] (03Merged) 10jenkins-bot: Updating WikiGrok A/B test start and end times [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173228 (owner: 10Kaldari) [01:39:20] !log kaldari Synchronized wmf-config/mobile.php: updating WikiGrok A/B test times (duration: 00m 03s) [01:39:27] Logged the message, Master [01:40:16] ok, puppet is disabled and logstash1003 data is copying anew [01:40:43] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 1, uactive_shards: 44, uinitializing_shards: 0, unumber_of_data_nodes: 2} [01:41:03] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 20 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 1, uactive_shards: 44, uinitializing_shards: 0, unumber_of_data_nodes: 2} [01:43:26] ^ silenced [01:44:30] thx gage [01:47:17] (03PS3) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [01:48:18] we will never use glusterfs again? [01:48:24] in labs as project storage ? [01:48:34] * bd808 hopes not [01:48:49] gluster was teh suk [01:48:49] i see things like this: [01:48:56] default => 'projectstorage.pmtpa.wmnet', [01:49:09] just wondering how much to remove [01:49:24] off glusterfs? [01:49:27] I'd say all of it :) [01:49:33] we can always pick code back up from history if needed [01:49:35] 281 $gluster_server_name = $instanceproject ? { [01:49:40] 274 class ldap::client::autofs($ldapconfig) { [01:49:51] ok :) [01:52:05] (03Draft1) 10Dereckson: Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) [01:53:14] hmm [01:53:22] shinken isn't delivering mail nor is it sending to IRC [01:53:23] drafts, rare enough [01:54:16] YuviPanda: did it send to IRC before or first time? [01:54:29] nope, first time [01:54:36] but I guess the underlying notifications is broken somehow [01:55:00] doesn't like the shared notification commands from nagios_common yet? 
[01:55:24] (03PS4) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [01:55:26] not sure, still debugging [01:56:23] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [01:57:29] (03CR) 10Dereckson: [C: 031] Enable VisualEditor by default on Tagalog Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172993 (https://bugzilla.wikimedia.org/73365) (owner: 10Jforrester) [02:17:24] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 2, active_shards: 63, initializing_shards: 0, number_of_data_nodes: 3 [02:17:54] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 2, active_shards: 63, initializing_shards: 0, number_of_data_nodes: 3 [02:18:01] !log LocalisationUpdate completed (1.25wmf7) at 2014-11-14 02:18:01+00:00 [02:18:02] :D [02:18:08] Logged the message, Master [02:19:04] !log logstash1003 elasticsearch migration to new raid0 complete [02:19:07] Logged the message, Master [02:28:11] !log progressively increasing load on mw1114, attempting to reproduce the previous overload [02:28:14] Logged the message, Master [02:30:35] !log LocalisationUpdate completed (1.25wmf8) at 2014-11-14 02:30:34+00:00 [02:30:37] Logged the message, Master [03:33:54] (03CR) 10Mattflaschen: "I don't think this should have been merged in less than 2 days without any input from the maintainers. Neither the person who originally" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172110 (owner: 10Nemo bis) [03:49:50] bd808: you around? [03:50:13] cscott: yeah. what's up? [03:50:55] i was experimenting with https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Converting_a_host_to_use_local_puppetmaster_and_salt_master on deployment-pdf02 [03:51:37] there's a stack of cherry-picked puppet patches on deployment-salt.eqiad.wmflabs, and the top one is yours and doesn't rebase cleanly [03:51:39] Maybe not the best place. [03:52:00] deployment-pdf02 isn't actually in use for anything, it's just a spare clone of the ocg setup [03:52:25] my goal is to test some puppet changes, so it's good place to do so [03:52:46] cf hashar's comments on https://gerrit.wikimedia.org/r/170130 [03:53:22] at any rate, i thought i should poke you about rebasing your cherry-pick at some point [03:53:47] but i've moved on to https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster in the meantime, so that i don't have to deal with a shared puppetmaster [03:53:59] Yeah. I thought Yuvi was going to fix that. Looking now [04:00:11] cscott: Conflict resolved. 
Thanks for the poke [04:02:19] it seems to be working, although i had to manually sudo service puppetmaster start on deployment-pdf02 -- but it's not picking up the ocg_*override values from https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&project=deployment-prep&instanceid=54c66f88-4c39-487b-802b-2eec751f4300&region=eqiad [04:02:40] (this is re: the self-hosted puppetmaster, which i switched to instead of using the labs puppetmaster) [04:03:25] !log logstash1002 migration to new md0 complete [04:03:28] how are those configuration values actually exported to puppet? it seems to be using the configuration values for the production servers instead of the ones configured for deployment-pdf02, i'm not sure how that could get switched up [04:03:35] Logged the message, Master [04:03:56] bd808, one more to go! but now it is dinnertime. [04:04:05] jgage: awesome! [04:04:38] cscott: I'm not sure how the wikitech -> puppet magic works. :/ [04:05:50] cscott: It has something to do with ldap. Those values are stored in ldap and then injected into the puppet run [04:07:43] well, i switched from self-hosted back to labs-hosted puppet, and now the configuration values magically shifted to the correct values again [04:07:58] so i guess it's something subtly wrong with the self-hosted puppet configuration [04:10:18] labs default or the beta cluster puppet? Because the beta puppet master is self-hosted with the normal role [04:10:52] deployment-pdf02 matches https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Converting_a_host_to_use_local_puppetmaster_and_salt_master now [04:11:06] which i believe is using deployment-salt as its puppetmaster [04:11:13] *nod* [04:11:42] Which is a role::puppet::self instance itself [04:12:24] it is also running a puppetmaster on localhost, but when i try to use that (by blanking the 'puppetmaster' and 'deployment_server_override' settings on the instance's config page), puppet runs but it uses the wrong configuration values [04:13:26] Are you tweaking a role that is used on the pdf01 host? [04:14:25] It might be easiest to disable puppet on pdf01 (puppet agent --disable "testing config changes on pdo02") and then try your patches via deployment-salt [04:14:41] not at the moment; https://gerrit.wikimedia.org/r/#/c/170130/5/manifests/role/ocg.pp adds a new role::ocg::beta role but pdf01 currently uses the role::ocg::production [04:15:05] oh even easier, just cherry-pick to deployment-salt and test away [04:15:49] bd808: yeah, that was my original plan, until i ran into the rebase conflict. then i thought that self-hosting would be even better since i didn't risk accidentally breaking anything for anyone else. [04:16:00] but since i can't seem to get the self-hosting to work, i guess i'm back to plan A ;) [04:16:14] meh. we can fix it if it breaks.
reflog to the rescue [04:16:57] incidentally, switching to self-hosting gives this after the first puppet run: [04:16:57] Error: /Stage[main]/Puppet::Self::Master/Service[puppetmaster]: Failed to call refresh: Could not start Service[puppetmaster]: Execution of '/etc/init.d/puppetmaster start' returned 1: [04:16:57] Error: /Stage[main]/Puppet::Self::Master/Service[puppetmaster]: Could not start Service[puppetmaster]: Execution of '/etc/init.d/puppetmaster start' returned 1: [04:17:16] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 14 04:17:16 UTC 2014 (duration 17m 15s) [04:17:16] you need to manually run `sudo service puppetmaster start` to get it going [04:17:21] Logged the message, Master [04:17:41] (since we were spitballing about upstart and systemd and rc.d earlier today) [04:18:02] yeah that started happening when we switched to puppet3. I haven't bothered to dig into it. Seems like an Ops problem. :) [04:18:56] cscott: You can add the checkbox to apply your new role via -- https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup [04:20:26] (03CR) 10BryanDavis: "I would really like to see a trebuchet porcelain created that can be used to automate trebuchet deploys rather than more custom deployment" [puppet] - 10https://gerrit.wikimedia.org/r/170130 (owner: 10Cscott) [04:21:09] bd808: trebuchet porcelain is a nice mental image, but i really don't know what it means ;) [04:21:40] a frontend script. I think I picked up the term from git [04:21:57] all the cli stuff in git is called porcelain. [04:22:17] http://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain [04:22:37] bd808: the puppet patch in question just adds the appropriate keys & permission bits to allow jenkins to reach over and run jobs on the host [04:23:06] yeah. That's the custom deploy setup like parsoid uses that I don't like. [04:23:22] bd808: i haven't written the patch to do the actual deploy yet, that could be standard git-deploy [04:23:31] nope [04:23:39] you can't script git-deploy [04:23:48] or if you can you're a wizard [04:24:31] i'm saying that i think the part you're objecting to is not actually the puppet part, which is firewall and ssh stuff afaik, but rather https://gerrit.wikimedia.org/r/#/c/170030/1/jjb/beta.yaml [04:24:49] which is currently copied over from parsoid and uses rsync (yuck) [04:25:05] that's the part I'm guessing you're rather see use trebuchet or some other wizardry [04:26:09] the puppet part just makes the "node: deployment-ocg-{datacenter}" part of 170030 work -- that is, ensures that this task (whether it's trebuchet or whatever) is running on a specific deploy host. [04:26:14] Yeah, but you wouldn't need this part either. scap runs via jenkins job that runs on deployment-bastion. This *should* work the same way except trebuchet is not scripting friendly [04:26:55] The script running on deployment-prep should be just like you were doing a deploy yourself [04:27:05] and not need any funny config on the target hosts [04:29:04] bd808: you're talking about the beta-scap-{datacenter} job in integration-config? [04:29:15] yeah [04:29:46] node: deployment-bastion-{datacenter} will presumably already suffice to make those commands run on bastion-eqiad.wmflabs.org ? [04:31:10] yup. but then you have the "how does one script git-deploy" problem. There is no unattended mode for it. 
[04:32:07] It requires a human to read the N/M hosts … messages and decide to continue to wait or advance to the next step [04:33:21] Ryan assures me it's an "easy fix" but it is still undone, and I don't have time to do it. :( [04:33:51] apt-cache show expect [04:33:59] ewwww [04:34:14] that would be a horrible expect script [04:34:26] all expect scripts are horrible :) [04:34:29] the app needs to be fixed [04:34:45] a deploy tool that can't be automated is not a deploy tool [04:34:49] so git-deploy uses trebuchet to actually put the code on the servers? [04:34:56] yeah [04:35:44] reading git-deploy it says, "The main feature of this tool is not actually doing rollouts, it's doing reverts. [...] One thing it definitely doesn't do is worry about how your code gets copied around to your production servers, that's completely up to you. [...] git-deploy solves the problem of making your deployment history available in a distributed way to everyone with a Git checkout, as well as making sure that there's an exclusive lock o [04:35:58] so it sounds like git-deploy is really the wrong tool for the job of doing automated deploys [04:36:13] and we probably want to be using trebuchet directly. which is i guess what you've been saying. [04:36:24] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [04:36:57] !log on mw1114 restarting hhvm [04:37:00] Logged the message, Master [04:42:25] cscott: Our git-deploy is some version of https://github.com/trebuchet-deploy/trigger. And yeah it's the porcelain for running the trebuchet plumbing. [04:43:29] https://github.com/trebuchet-deploy/trigger/blob/master/trigger/drivers/trebuchet/local.py shows the low-level commands invoked [04:44:59] cscott: And https://github.com/trebuchet-deploy/trigger/blob/master/trigger/drivers/trebuchet/local.py#L103-L124 is the problematic bit for scripting [04:45:09] and its use of subprocess.Popen appalls me [04:45:43] try putting a single quote in your repo name for a good time: https://github.com/trebuchet-deploy/trigger/blob/master/trigger/drivers/trebuchet/local.py#L90 [04:46:04] ugh [04:46:21] # TODO (ryan-lane): Check return values from these commands [04:49:17] use the subprocess module, python people. otherwise you are in a state of sin. [04:49:25] or working for a ride-sharing company. [04:49:28] one or the other [04:49:39] Or porting perl to python? [04:49:56] anyhow, have fun with your experiments [04:50:23] * bd808 goes back to watching Cal get thumped [04:52:56] i might play around with deploying via "sudo salt-call -l quiet publish.runner deploy.fetch $REPO ; sudo salt-call -l quiet publish.runner deploy.checkout $REPO ; sudo salt-call -l quiet --out=json publish.runner deploy.restart $REPO", since that's what it looks like `git-deploy sync ; git-deploy service restart` is doing under the hood [04:53:40] All that sudo is part of what I don't like about using salt [04:54:43] But yeah that might be the way to automate it [04:54:53] i might also submit a PR to rip out the string arguments to Popen and replace them with proper arrays :-/ [04:55:07] crazy pants! [04:56:19] i'm afraid the N/M hosts... messages are a inherent part of trebuchet, however :( [04:56:43] down in https://github.com/trebuchet-deploy/trigger/blob/master/trigger/drivers/trebuchet/local.py#L290 we are reading that information from redis (!) 
after the salt-call command returns [04:56:52] they are inherent in the async-ness of salt [04:57:21] and the redis returner and the way that it never knows for sure how many hosts there really should be [04:57:49] parsoid's rsync 1-liner is looking better and better [04:58:25] The salt master knows but the redis cache of hosts requires manual pruning -- https://wikitech.wikimedia.org/wiki/Trebuchet#Removing_minions_from_redis [05:05:40] another option is, if i'm hacking git-deploy to make the subprocess argument handling sane, to add an --auto or -y option that shortcuts the _ask method with some reasonable logic (a maximum of X times wait Y seconds for self._report_driver.report_sync to report all complete, or else retry and loop back to the top, a maximum of Z times). [06:28:43] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:53] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:53] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:26] (03CR) 10Catrope: "This was scheduled to be deployed in the SWAT deploy about 6 hours ago at 00:00 UTC, but it was overlooked because of a problem with other" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172993 (https://bugzilla.wikimedia.org/73365) (owner: 10Jforrester) [06:45:54] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:45:54] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:46:45] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:49:23] PROBLEM - puppet last run on db1003 is CRITICAL: CRITICAL: Puppet has 1 failures [07:07:44] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [07:20:24] so any chance of adding .mobi format to the render engine and sticking the option in the beta ?? 
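(On the git-deploy automation thread above, roughly 04:52–05:05: a minimal sketch of what a non-interactive trebuchet deploy could look like, built from the salt-call one-liner cscott quoted and using subprocess argument lists instead of the shell strings he objected to in trigger's local.py. The helper names, the settle delay, and the example repo are invented for illustration; this is not the actual trigger code, and a real tool would poll the per-host sync status from redis rather than sleeping.)

```python
#!/usr/bin/env python
"""Sketch of an unattended 'git-deploy sync', per the discussion above.

Assumptions: the publish.runner deploy.fetch/checkout/restart calls behave as
described in the conversation; everything else (names, timing) is made up.
"""
import json
import subprocess
import time


def salt_runner(action, repo, out_json=False):
    # Build the command as an argument list (no shell=True), so a quote in
    # the repo name cannot break out of the command -- the problem noted in
    # trigger's string-based Popen calls.
    cmd = ['sudo', 'salt-call', '-l', 'quiet']
    if out_json:
        cmd.append('--out=json')
    cmd += ['publish.runner', 'deploy.%s' % action, repo]
    out = subprocess.check_output(cmd)
    return json.loads(out) if out_json else out


def deploy(repo, settle=30):
    """fetch -> checkout -> restart, i.e. git-deploy sync without the prompts."""
    salt_runner('fetch', repo)
    salt_runner('checkout', repo)
    # git-deploy shows "N/M hosts" progress here (read back from redis); an
    # unattended wrapper would poll that instead of using a fixed sleep.
    time.sleep(settle)
    return salt_runner('restart', repo, out_json=True)


if __name__ == '__main__':
    print(deploy('example/ocg'))  # hypothetical repo name
```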
[07:34:03] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 19 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 44, uinitializing_shards: 0, unumber_of_data_nodes: 2} [07:34:14] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 19 threshold =0.1% breach: {ustatus: ured, unumber_of_nodes: 2, uunassigned_shards: 19, utimed_out: False, uactive_primary_shards: 32, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 44, uinitializing_shards: 0, unumber_of_data_nodes: 2} [07:34:57] bah, didn't schedule my maintenance for long enough [07:34:59] almost done [07:37:14] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 2, active_shards: 63, initializing_shards: 0, number_of_data_nodes: 3 [07:37:33] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 2, active_shards: 63, initializing_shards: 0, number_of_data_nodes: 3 [07:38:36] !log logstash hosts: elasticsearch moved to bigger disks [07:38:44] Logged the message, Master [07:45:51] (03CR) 10Nemo bis: "Temporarily... for two months?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172110 (owner: 10Nemo bis) [07:58:32] (03PS3) 10Giuseppe Lavagetto: puppet: get rid of the nagios_group global variable [puppet] - 10https://gerrit.wikimedia.org/r/172531 [08:01:10] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet: get rid of the nagios_group global variable [puppet] - 10https://gerrit.wikimedia.org/r/172531 (owner: 10Giuseppe Lavagetto) [08:05:12] <_joe_> ach sorry, I am going to leave this unmerged for a few minutes [08:06:03] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail [08:10:54] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [08:10:54] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [08:11:54] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [08:11:54] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [08:13:32] (03CR) 10Nikerabbit: [C: 031] Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) (owner: 10Dereckson) [08:14:55] <_joe_> meh I forgot to apply the new hiera lib in the puppet compiler [08:25:34] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [09:04:15] PROBLEM - puppet last run on amslvs2 is CRITICAL: CRITICAL: puppet fail [09:04:38] uh. why is redirects.conf and dat in both puppet and apache-config repo? 
[09:04:47] it looks like the one in apache-config is not in sync [09:14:03] !log Zuul is flapping [09:14:08] Logged the message, Master [09:15:25] <_joe_> Glaisher: in the apache-config repo there is a file explaining it's dismissed [09:16:01] <_joe_> https://github.com/wikimedia/operations-apache-config/blob/master/README_BEFORE_EDITING [09:16:03] !log Zuul is back [09:16:06] Logged the message, Master [09:16:21] ah [09:16:35] <_joe_> hey hashar [09:21:24] (03PS1) 10Filippo Giunchedi: gdash: fix parser cache gdash [puppet] - 10https://gerrit.wikimedia.org/r/173246 [09:22:32] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] gdash: fix parser cache gdash [puppet] - 10https://gerrit.wikimedia.org/r/173246 (owner: 10Filippo Giunchedi) [09:22:44] RECOVERY - puppet last run on amslvs2 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [09:24:13] <_joe_> godog: we should really think about changing the servers. hierarchy [09:24:34] <_joe_> godog: right now it's impossible to aggregate data per server cluster, and we need it [09:24:56] <_joe_> I'm willing to lose all current stats there, I'm still the only one using them anyway [09:25:19] what makes you say that? :) [09:25:24] <_joe_> so my idea is servers.$hostname.$metric [09:25:31] anyways yes per-cluster would be nice [09:25:33] <_joe_> is what we have now [09:26:05] <_joe_> and I thought about servers.$cluster.$site.$metric [09:26:22] <_joe_> sorry servers.$cluster.$site.$server.$metric [09:27:39] what happens when we change a machine's cluster? [09:28:23] <_joe_> it's still aggregation [09:28:43] <_joe_> so servers.*.eqiad.mw1117.$metric [09:29:01] <_joe_> they will show as separate entries, but you still have history [09:29:34] (03PS1) 10Glaisher: Add DNS for mul.wikisource.org [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) [09:35:12] _joe_: that or aggregate into cluster metrics while we ingest the metrics [09:35:31] <_joe_> godog: on graphite? [09:35:41] <_joe_> er, in statsd? [09:36:10] in graphite yeah, carbon-relay [09:37:56] <_joe_> so we'd need to keep yet another list of server-cluster associations [09:40:51] <_joe_> or, we collect metrics in the form I described above, and we do 2 aggregations, one per-server and one per-cluster, and we discard the original metrics [09:41:00] <_joe_> (by using a very short retention) [09:41:16] yeah the problem specifically is that it doesn't seem possible given a cluster to have a list of hosts that are part of that cluster right now [09:41:28] <_joe_> do you think this may work? [09:42:01] <_joe_> godog: this is easily changed - I can write a small script that can give us that information via HTTP [09:42:20] <_joe_> what about that? [09:42:35] yeah that seems useful to me regardless of this issue [09:43:18] <_joe_> but what about the solution I just proposed? [09:43:22] <_joe_> that would be: [09:43:55] <_joe_> - diamond sends back metrics in the form servers....metric [09:44:00] (03PS1) 10Hashar: Drop role::zuul::labs [puppet] - 10https://gerrit.wikimedia.org/r/173248 [09:44:18] <_joe_> - these metrics are stored with a very short retention time [09:44:47] <_joe_> - We aggregate those to ..metric [09:45:06] <_joe_> and to server..metric [09:45:24] <_joe_> and these two metrics will have the normal retention rules [09:45:35] (03CR) 10Hashar: [C: 031 V: 032] "Cherry picked on integration puppetmaster (labs). 
The integration-zuul-server instances already uses role::zuul::merger and role::zuul::s" [puppet] - 10https://gerrit.wikimedia.org/r/173248 (owner: 10Hashar) [09:45:41] <_joe_> do you think this could work? [09:45:49] <_joe_> or it sounds like an horrible hack? [09:47:29] _joe_: I think it might work, but needs some testing, e.g. with https://github.com/grobian/carbon-c-relay [09:52:30] (03PS1) 10Glaisher: Redirect mul.wikisource.org to wikisource.org [puppet] - 10https://gerrit.wikimedia.org/r/173250 (https://bugzilla.wikimedia.org/73407) [09:56:24] (03PS2) 10Glaisher: Add DNS for mul.wikisource.org [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) [09:57:13] RECOVERY - swift-object-replicator on ms-be2010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [09:57:30] ACKNOWLEDGEMENT - puppet last run on ms-be2005 is CRITICAL: CRITICAL: Puppet has 1 failures Filippo Giunchedi pending disk change [10:00:57] (03PS1) 10Hashar: role::ci::website::labs [puppet] - 10https://gerrit.wikimedia.org/r/173251 [10:20:10] (03PS2) 10Hashar: role::ci::website::labs [puppet] - 10https://gerrit.wikimedia.org/r/173251 [10:35:03] <_joe_> I updated https://wikitech.wikimedia.org/wiki/Puppet_Hiera with info about the new "regex matching" feature I introduced [10:35:13] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [10:35:29] <_joe_> I'll take a look at making https://wikitech.wikimedia.org/wiki/Puppet_coding modern and relevant too [10:36:00] <_joe_> then I'll start to -1 and unjustified refusal to handle things with hiera when needed [10:37:31] <_joe_> "Our code is, as of July 2013, in transition [10:37:49] <_joe_> from a system of global manifests to a system of modules and roles." [10:37:53] <_joe_> eheh [10:40:13] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [10:45:17] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [10:50:14] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [10:55:18] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:00:14] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:05:09] (03CR) 10Alexandros Kosiaris: "Daniel, my comments have actually not been addressed. Patch sets 13 and 14 have no diff. Gabriel, could you have a look please ?" [puppet] - 10https://gerrit.wikimedia.org/r/167213 (owner: 10GWicke) [11:05:11] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:10:10] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:14:53] hehe [11:14:54] "Our historical take on role classes was 'do not parametrize, use node-scope variables to configure'." 
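(Back on the graphite naming discussion above, around 09:24–09:47: a toy sketch of the per-cluster rewrite _joe_ proposes, turning servers.$host.$metric into servers.$cluster.$site.$host.$metric so data can be aggregated per server cluster. The host-to-cluster mapping is assumed to come from the small HTTP service he offers to write; here it is hard-coded, and none of the names are authoritative.)

```python
# Toy rewrite of plaintext carbon lines ("path value timestamp") into the
# proposed servers.<cluster>.<site>.<host>.<metric> scheme. Example data only.

HOST_MAP = {
    'mw1117': ('appserver', 'eqiad'),  # illustrative, not the real mapping
}


def rewrite(line):
    path, value, ts = line.split()
    parts = path.split('.')
    if parts[0] != 'servers' or parts[1] not in HOST_MAP:
        return line  # pass through anything we cannot classify
    cluster, site = HOST_MAP[parts[1]]
    new_path = '.'.join(['servers', cluster, site, parts[1]] + parts[2:])
    return ' '.join([new_path, value, ts])


print(rewrite('servers.mw1117.cpu.user 12.5 1415955600'))
# -> servers.appserver.eqiad.mw1117.cpu.user 12.5 1415955600
```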
[11:14:59] that's not true I think [11:15:09] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:15:34] unless it's a more recent development while I haven't been paying attention much [11:15:51] but our ORIGINAL use case of node-scope variables was because puppet didn't even have class parameters yet :) [11:20:09] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:21:01] (03CR) 10JanZerebecki: [C: 031] ssh server: make listening port configurable [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [11:21:55] (03CR) 10JanZerebecki: [C: 031] ssh server: make ListenAddress configurable [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [11:25:09] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:30:13] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: No Slave SQL: No Seconds Behind Master: (null) [11:35:07] (03PS2) 10JanZerebecki: ssh server: make PermitRootLogin configurable [puppet] - 10https://gerrit.wikimedia.org/r/172804 (owner: 10Dzahn) [11:42:19] (03CR) 10Alexandros Kosiaris: [C: 04-1] ssh server: make ListenAddress configurable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [12:01:24] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (11.11%) [12:03:39] (03CR) 10Alexandros Kosiaris: [C: 032] rrdcached tuning [puppet] - 10https://gerrit.wikimedia.org/r/173032 (owner: 10Alexandros Kosiaris) [12:05:19] (03PS4) 10Giuseppe Lavagetto: varnish: make varnish::instance not depend on ganglia [puppet] - 10https://gerrit.wikimedia.org/r/172967 [12:07:24] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [12:16:49] (03PS5) 10Giuseppe Lavagetto: varnish: make varnish::instance not depend on ganglia [puppet] - 10https://gerrit.wikimedia.org/r/172967 [12:26:45] (03CR) 10Filippo Giunchedi: [C: 031] Redirect mul.wikisource.org to wikisource.org [puppet] - 10https://gerrit.wikimedia.org/r/173250 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [12:31:44] (03CR) 10Giuseppe Lavagetto: [C: 032] varnish: make varnish::instance not depend on ganglia [puppet] - 10https://gerrit.wikimedia.org/r/172967 (owner: 10Giuseppe Lavagetto) [12:35:39] (03PS3) 10Giuseppe Lavagetto: role::cache: make ganglia inclusion optional [puppet] - 10https://gerrit.wikimedia.org/r/172974 [12:44:18] (03PS4) 10Giuseppe Lavagetto: role::cache: make ganglia inclusion optional [puppet] - 10https://gerrit.wikimedia.org/r/172974 [12:49:50] (03CR) 10Giuseppe Lavagetto: [C: 032] role::cache: make ganglia inclusion optional [puppet] - 10https://gerrit.wikimedia.org/r/172974 (owner: 10Giuseppe Lavagetto) [13:03:44] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "While the patch mostly makes sense, before merging we need to:" [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/172418 (owner: 10Ori.livneh) [13:35:48] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: Puppet has 1 failures [13:37:21] (03PS4) 10JanZerebecki: ssh server: make ListenAddress configurable [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [13:39:26] (03PS5) 10JanZerebecki: ssh server: make ListenAddress configurable [puppet] - 
10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [13:40:06] (03CR) 10JanZerebecki: ssh server: make ListenAddress configurable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [13:42:53] (03PS3) 10JanZerebecki: ssh server: make PermitRootLogin configurable [puppet] - 10https://gerrit.wikimedia.org/r/172804 (owner: 10Dzahn) [13:53:19] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [14:03:19] PROBLEM - CI: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: integration.integration-puppetmaster.diskspace._var.byte_avail.value (12.50%) [14:16:12] (03PS1) 10Eranroz: Removing special wgAccountThrottle for hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173273 [14:24:00] (03PS1) 10Hashar: zuul: update layout_config path [puppet] - 10https://gerrit.wikimedia.org/r/173276 [14:25:55] dear ops, I could really use a merge of https://gerrit.wikimedia.org/r/#/c/173276/ :-D [14:35:38] !log cr1-ulsfo: setting up BGP with new transit provider [14:35:45] Logged the message, Master [14:43:00] (03CR) 10Ottomata: "Links about hiera (and role classes)" [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [14:49:31] (03CR) 10Andrew Bogott: [C: 032] zuul: update layout_config path [puppet] - 10https://gerrit.wikimedia.org/r/173276 (owner: 10Hashar) [14:56:34] (03CR) 10Ottomata: "> convert all calls to the varnishkafka class" [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/172418 (owner: 10Ori.livneh) [14:59:30] (03CR) 10Hashar: "Thank you very much. I have applied puppet on the production and labs Zuul servers. Works like a charm :-)" [puppet] - 10https://gerrit.wikimedia.org/r/173276 (owner: 10Hashar) [15:39:49] is someone messing with analytics1003? jgage? [15:46:04] (03CR) 10QChris: "I cleaned up /var/log/eventlogging, so this change should" [puppet] - 10https://gerrit.wikimedia.org/r/172884 (owner: 10QChris) [15:46:48] !log analytics1003 (a cisco) is acting crazy, stuck in some loop while trying to boot. Am attempting to fix with power cycle [15:46:50] Logged the message, Master [15:47:24] (03PS3) 10Ottomata: Link EventLogging logs into /var/log/eventlogging [puppet] - 10https://gerrit.wikimedia.org/r/172884 (owner: 10QChris) [15:51:27] RECOVERY - Host analytics1003 is UP: PING OK - Packet loss = 0%, RTA = 1.09 ms [15:53:47] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Puppet has 1 failures [15:54:03] * cmjohnson hates the cisco servers [15:55:29] yeah, taht worries me, cmjohnson. i upgraded it to trusty yesterday [15:55:31] it was fine [15:55:48] but this morning it says it had been down for 16h. for an03 this is fine, as it is not a prod machine [15:55:53] yeah..they just stop working for unexplainable reasons [15:55:54] but i will have to do this to an04 and an10 [15:56:43] !log upgrading analytics1024 to trusty [15:56:47] Logged the message, Master [15:57:04] (03CR) 10GWicke: "Argh, crap. It looks like I lost most of the patch in a git stash merge conflict. Will fix in a follow-up." 
[puppet] - 10https://gerrit.wikimedia.org/r/167213 (owner: 10GWicke) [16:09:58] (03CR) 10Ottomata: [C: 032] Link EventLogging logs into /var/log/eventlogging [puppet] - 10https://gerrit.wikimedia.org/r/172884 (owner: 10QChris) [16:10:07] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: puppet fail [16:11:32] (03PS1) 10GWicke: Re-do several lost fixes in restbase module [puppet] - 10https://gerrit.wikimedia.org/r/173287 [16:11:52] akosiaris: ^^ [16:18:22] gwicke: thanks. I did a minor change and I will merge. Really thanks for following up [16:18:53] (03PS2) 10Alexandros Kosiaris: Re-do several lost fixes in restbase module [puppet] - 10https://gerrit.wikimedia.org/r/173287 (owner: 10GWicke) [16:19:08] PROBLEM - Host analytics1003 is DOWN: PING CRITICAL - Packet loss = 100% [16:19:31] akosiaris: thank you for the careful review! [16:19:47] gwicke: btw, I wanna do that https://rt.wikimedia.org/Ticket/Display.html?id=8529 . Do we need to keep the data ? aka (all of them together? one at a time ?) [16:20:00] something in between ? [16:20:21] akosiaris: all of them would be great [16:20:36] simultaneously ? cool ! [16:20:43] they are pure test hosts [16:20:56] yeah I know, just making sure [16:21:05] the tricky bit is the disk configuration [16:21:23] (03CR) 10Alexandros Kosiaris: [C: 032] "Thanks for following up!" [puppet] - 10https://gerrit.wikimedia.org/r/173287 (owner: 10GWicke) [16:21:36] which is different from the previous one, with the SSDs in a RAID-0 [16:23:17] RECOVERY - Host analytics1003 is UP: PING OK - Packet loss = 0%, RTA = 1.82 ms [16:25:35] akosiaris: https://gerrit.wikimedia.org/r/#/c/173287/1..2/modules/eventlogging/manifests/init.pp looks odd [16:25:46] gmail is tossing wikipedia emails in my spam folder now, and it never did before. did we change something about our mailing setup? [16:26:57] I don't see that [16:27:01] gwicke: ^ [16:27:19] gerrit's internal issues with diffing across rebases ? [16:28:05] akosiaris: yeah, gerrit fooled me with the rebase diff [16:28:12] nm [16:28:54] what's the reasoning behind specifying file modes with a leading zero? [16:29:30] resetting possible sticky, suid, guid bits [16:29:47] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:30:27] akosiaris: I see, so for the case that those were set outside of puppet [16:30:36] exactly [16:30:47] k, makes sense [16:31:51] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Puppet has 1 failures [16:35:09] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:36:03] (03PS1) 10Andrew Bogott: Removed some obsolete roles. [puppet] - 10https://gerrit.wikimedia.org/r/173293 [16:36:05] (03PS1) 10Andrew Bogott: Move openstack_version and use_neutron into hiera [puppet] - 10https://gerrit.wikimedia.org/r/173294 [16:36:28] RECOVERY - CI: Low disk space on /var on labmon1001 is OK: OK: All targets OK [16:40:19] !log Increased replica count from 0 to 2 for all logstash elasticsearch indices. Expect icinga warnings as replicas are populated. [16:40:22] Logged the message, Master [16:41:06] 5.5T of storage on each logstash host now. Log all the things! 
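(For the logstash change just logged above, raising replicas from 0 to 2 on all indices: a minimal sketch of what that amounts to against the Elasticsearch 1.x settings API. The log doesn't show how bd808 actually applied it; the localhost:9200 endpoint and the logstash-* index pattern are assumptions.)

```python
import json
import requests  # assumes the requests library is available

resp = requests.put(
    'http://localhost:9200/logstash-*/_settings',   # endpoint is an assumption
    data=json.dumps({'index': {'number_of_replicas': 2}}),
    headers={'Content-Type': 'application/json'},
)
print(resp.status_code, resp.text)
```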
[16:41:18] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 60 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 54, utimed_out: False, uactive_primary_shards: 41, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 63, uinitializing_shards: 6, unumber_of_data_nodes: 3} [16:41:29] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 60 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 54, utimed_out: False, uactive_primary_shards: 41, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 63, uinitializing_shards: 6, unumber_of_data_nodes: 3} [16:41:50] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 60 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 3, uunassigned_shards: 54, utimed_out: False, uactive_primary_shards: 41, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 63, uinitializing_shards: 6, unumber_of_data_nodes: 3} [16:46:36] (03CR) 10Alexandros Kosiaris: [C: 032] ssh server: make ListenAddress configurable [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [16:52:50] (03PS1) 10Alexandros Kosiaris: Change xenon,cerium,praseodium raid scheme [puppet] - 10https://gerrit.wikimedia.org/r/173307 [16:55:06] (03CR) 10Gage: [C: 032] logstash: Use doc_values for normalized_message.raw [puppet] - 10https://gerrit.wikimedia.org/r/173209 (owner: 10BryanDavis) [16:57:46] (03PS17) 10GWicke: Add a simple restbase::labs role [puppet] - 10https://gerrit.wikimedia.org/r/171741 [16:58:27] (03CR) 10jenkins-bot: [V: 04-1] Add a simple restbase::labs role [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [16:58:47] (03CR) 10Alexandros Kosiaris: [C: 032] Change xenon,cerium,praseodium raid scheme [puppet] - 10https://gerrit.wikimedia.org/r/173307 (owner: 10Alexandros Kosiaris) [17:01:41] (03PS18) 10GWicke: Add a simple restbase::labs role [puppet] - 10https://gerrit.wikimedia.org/r/171741 [17:05:59] PROBLEM - Host ms-be2005 is DOWN: CRITICAL - Plugin timed out after 15 seconds [17:10:08] RECOVERY - check_mysql on lutetium is OK: Uptime: 612941 Threads: 1 Questions: 9371930 Slow queries: 20893 Opens: 1578 Flush tables: 2 Open tables: 64 Queries per second avg: 15.290 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [17:16:11] (03CR) 10Ottomata: Add a simple restbase::labs role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [17:19:40] (03CR) 10GWicke: Add a simple restbase::labs role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [17:22:01] ottomata: the question is mostly about per-role vs. per-host parameters in hiera [17:26:38] gwicke: i am not sure of the details, but the point of hiera is to hierachically assign variables [17:26:45] hierarchically? 
[17:26:45] * [17:27:13] in labs, in particular, i think yuvi is making it so you can set them by project [17:27:35] hierarchachaiciclclcy [17:27:37] in production, they can be set on a node level, a site (datacenter level), or other [17:27:40] node level too [17:27:56] it depends on which file they are defined in....i think, but the variable names remain the same [17:28:10] https://wikitech.wikimedia.org/wiki/Puppet_Hiera [17:29:22] gwicke: read that ^ i think it explains pretty well [17:29:49] The wikitech backend works now too -- https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep [17:30:24] ottomata: so, the mechanism for a cluster would be the fqdn regexp group? [17:30:29] RECOVERY - Host ms-be2005 is UP: PING OK - Packet loss = 0%, RTA = 43.16 ms [17:30:46] <^d> What would be the best way to run a cron once a week on all wikis. But like staggered...I don't want to just run `foreachwiki` every Sunday or some crap. [17:31:24] Hmmm... like for each wiki sometime during the week? [17:32:00] gwicke: will you have more than one restbase cluster in production? [17:32:07] ^d: 200+ cron jobs :( [17:32:08] in production/eqiad? [17:32:23] ottomata: that's not unlikely [17:32:39] <^d> bd808: Yeah :( [17:33:01] gwicke: then they would probably have different role classes, as a role clas would be meant to configure a particular usage of restbase module [17:33:03] so, dunno [17:33:16] and you *could* use the mainrole/ level to configure them then [17:33:31] but, the mainrole would be configured by regexes, hm [17:33:58] i would really like to get _joe_'s input here [17:34:12] <^d> bd808: We have almost 900 wikis. [17:34:32] <^d> I don't think we should add that many cron entries to terbium. [17:34:37] will a site.pp regexp group automatically set up a corresponding hiera regexp group? [17:34:38] ^d: I think I'd ask Reedy to help figure something out that's better than that. There's got to be a way. [17:37:33] gwicke: no, i don't think so [17:38:44] ottomata: hmm; that strikes me as potentially ugly, as we'll have to manually keep the two regexp groups in sync [17:38:52] i agree [17:38:56] What crons? [17:39:07] maintenance.pp has examples of crons by groups of wikis [17:39:28] it'd be better if you could set mainrole in puppet. [17:39:59] gwicke: what is your use case for having multiple restbase clusters in eqiad? [17:41:26] ottomata: at the cassandra level we'll eventuallly want to have separate clusters for groups of wikis [17:42:30] for isolation, independent DC fail-over, offline stuff in the same DC [17:44:31] (03PS1) 10Mark Bergsma: Allocate labstore200[12] mgmt IPs [dns] - 10https://gerrit.wikimedia.org/r/173313 [17:46:23] ottomata: that doesn't necessarily have to mean separate restbase clusters too, but it might be easier to implement it that way [17:56:58] PROBLEM - puppet last run on analytics1024 is CRITICAL: CRITICAL: Puppet has 1 failures [17:57:20] (03CR) 10Mark Bergsma: [C: 032] Allocate labstore200[12] mgmt IPs [dns] - 10https://gerrit.wikimedia.org/r/173313 (owner: 10Mark Bergsma) [17:59:50] PROBLEM - Host xenon is DOWN: PING CRITICAL - Packet loss = 100% [18:01:07] !log reimaging xenon [18:01:11] Logged the message, Master [18:03:27] (03PS1) 10Papaul: Revert "Allocate labstore200[12] mgmt IPs" [dns] - 10https://gerrit.wikimedia.org/r/173318 [18:04:03] gi11es: do you know anything about this? 
https://bugzilla.wikimedia.org/show_bug.cgi?id=69362 [18:05:00] RECOVERY - Host xenon is UP: PING OK - Packet loss = 0%, RTA = 1.01 ms [18:07:19] PROBLEM - SSH on xenon is CRITICAL: Connection refused [18:07:28] PROBLEM - puppet last run on xenon is CRITICAL: Connection refused by host [18:07:29] PROBLEM - RAID on xenon is CRITICAL: Connection refused by host [18:07:39] PROBLEM - check configured eth on xenon is CRITICAL: Connection refused by host [18:07:43] PROBLEM - Disk space on xenon is CRITICAL: Connection refused by host [18:07:52] PROBLEM - check if salt-minion is running on xenon is CRITICAL: Connection refused by host [18:07:52] PROBLEM - DPKG on xenon is CRITICAL: Connection refused by host [18:07:58] PROBLEM - check if dhclient is running on xenon is CRITICAL: Connection refused by host [18:10:03] (03PS1) 10Ottomata: Update zookeeper version to reflect upgraded version after Trusty upgrade [puppet] - 10https://gerrit.wikimedia.org/r/173321 [18:11:04] (03CR) 10Ottomata: [C: 032] Update zookeeper version to reflect upgraded version after Trusty upgrade [puppet] - 10https://gerrit.wikimedia.org/r/173321 (owner: 10Ottomata) [18:12:29] RECOVERY - puppet last run on analytics1024 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [18:13:59] (03PS1) 10Giuseppe Lavagetto: nagios: convert monitor_service to monitoring::service [puppet] - 10https://gerrit.wikimedia.org/r/173322 [18:17:25] (03CR) 10Papaul: [C: 031] Revert "Allocate labstore200[12] mgmt IPs" [dns] - 10https://gerrit.wikimedia.org/r/173318 (owner: 10Papaul) [18:19:39] PROBLEM - NTP on xenon is CRITICAL: NTP CRITICAL: No response from NTP server [18:23:49] RECOVERY - SSH on xenon is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [18:26:11] gwicke: aye. I've added _joe_as reviewer to that [18:26:32] ottomata: k, thx [18:27:03] so, ja, i think if you have multiple clusters, you will have multiple roles...at least as sublcasses [18:27:23] hm, or maybe not...if the only thing that is different about them are module parameters [18:27:45] but, either way, you don't need to parameterize the role class, and you don't need a realm based (i.e. ::labs) specific role [18:28:14] for now, since you are testing in labs, and are just trying to get a single eqiad test cluster up, you can configure this with yaml [18:28:33] in labs, via the ProjectName:Hiera editor, and in produciton i think, in eqiad.yaml [18:28:43] ottomata: I'm still waiting for the cassandra module to be merged [18:29:06] that should make testing in beta labs easier [18:29:51] (03CR) 10Ottomata: "Giuseppe, can we get your advice on this? Gabriel possibly will want to have multiple RESTbase clusters in production eqiad. How should" [puppet] - 10https://gerrit.wikimedia.org/r/171741 (owner: 10GWicke) [18:29:55] gwicke: me too! [18:29:58] paravoid: ? :) [18:30:20] https://gerrit.wikimedia.org/r/#/c/166888/ [18:30:46] so if I remove the parameter forwarding, will the parameters still namespaced to the role class? [18:30:53] *be [18:31:37] whoa: https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=terbium.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1395860566&v=648583&m=Global%20JobQueue%20length&z=large [18:31:43] 40m jobs in the queue [18:32:13] bd808 or Reedy: In general there's no actual correspondance between wikis and wiki hosts, right? So in theory wikitech should just live in the 'wiki pool' and get hosted here and there and everywhere? 
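(Editor's note: to ground the hiera discussion above, here is a minimal sketch of how a single setting could be expressed at the levels ottomata describes: repo-wide, per-site, and per-fqdn-regexp group. The key name and the regexp-group syntax are illustrative assumptions based on the wikitech Puppet_Hiera page linked above, not actual production values; in labs the same key would instead go on the per-project Hiera: page, e.g. Hiera:Deployment-prep.)

```yaml
# hieradata/common.yaml -- repo-wide default
restbase::seeds: []

# hieradata/eqiad.yaml -- applies to every node in the eqiad site
restbase::seeds:
  - cerium.eqiad.wmnet

# hieradata/regex.yaml -- a named group matched by fqdn regexp; this is the
# mechanism being discussed for "cluster"-level settings, and it would have
# to be kept in sync with any corresponding site.pp node regexp by hand
restbase_test:
  __regex: !ruby/regexp /^(xenon|cerium)\.eqiad\.wmnet$/
  restbase::seeds:
    - xenon.eqiad.wmnet
```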
[18:32:30] andrewbogott: yup [18:32:56] Does it matter that wikitech uses its own auth and accounts and such? [18:33:22] many looks like it's basically all cirrusSearchLinksUpdate jobs [18:33:29] RECOVERY - check configured eth on xenon is OK: NRPE: Unable to read output [18:33:29] RECOVERY - Disk space on xenon is OK: DISK OK [18:33:32] andrewbogott: no, but all the appserver hosts would need to be able to access ldap and such [18:33:38] RECOVERY - check if salt-minion is running on xenon is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [18:33:39] RECOVERY - DPKG on xenon is OK: All packages OK [18:33:39] RECOVERY - check if dhclient is running on xenon is OK: PROCS OK: 0 processes with command name dhclient [18:33:42] Ah, right, ldap [18:33:46] Hm... [18:34:02] If you can break the link to openstack then the wiki is just a plain old wiki again [18:34:05] Actually I bet they already can. Lemme check [18:34:09] RECOVERY - RAID on xenon is OK: OK: no disks configured for RAID [18:34:25] Well, the link to openstack has always been via REST. So in theory that should be able to run anyplace... [18:34:37] oh cool [18:34:38] There might be a million race conditions with ldap though [18:34:56] I bet ldap is accessible because we use if for misc services [18:35:06] *use it [18:36:08] I'd think. The ldap tools aren't installed on the app servers but that might not matter. [18:37:40] So, hm… I can't think of a good way to test this incrementally. I guess we could set up an emtpy wikitech replacement on the cluster and then migrate the content later once it's tested. [18:38:03] "set up an empty wikitech replacement" which I definitely don't know how to do :) [18:39:30] andrewbogott: Reedy would be the guy to talk to. I just pretend to understand how this stuff works. He groks it fully. [18:39:46] ok. And he's in the UK, yes? [18:39:56] yeah [18:40:21] put how mostly kind of works on central US time'ish [18:40:27] *but he [18:40:37] ok, I'll bug him on Monday. It would be pretty great to get the wikitech wiki fully out of my hands. [18:40:45] * bd808 agrees [18:41:30] I'd still like to figure out how to update to a modern version of SMW for it too [18:41:41] so many side projects... [18:41:57] or just get rid of SMW :) [18:42:08] andrewbogott: maybe you should wait until OSM is gone, though? [18:42:55] paravoid: maybe, although I'm not sure it matters. OSM is already deployed with the normal deployment system... [18:43:10] So as long as the appservers can talk to virt1000 via http everything would work fine. [18:43:29] hm, and labnet1001 [18:43:38] we have a higher bar for extensions that run in production, due to the elevated access they have [18:43:49] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [18:43:49] e.g. csteipp may want to security review it [18:44:11] We ship the code to the cluster now, but yeah it's not enabled there [18:44:12] yeah, that's a fair point. [18:44:28] !log running scripts to fix bug 72927 [18:44:33] Logged the message, Master [18:45:05] as far as I understand it, wikitech's future probably has no OSM, and possibly not even SMW [18:45:17] and then it could just have no LDAP at all and just move to centralauth :) [18:45:26] and converted into a documentation wiki [18:45:48] but maybe that's too far away, I don't know [18:46:00] Hm, yeah, although we'd need to provide a different front-end to create ldap accounts. [18:46:10] Which maybe Horizon can handle, I'm not sure. 
Right now I've only been thinking of it as a consumer [18:46:21] Overall openstack/keystone has a read-only approach to ldap [18:46:43] how do others handle this? [18:46:55] creating accounts with keystone? [18:47:18] I think others mostly don't use ldap, they just let keystone manage users in its own db [18:47:29] PROBLEM - Disk space on db1017 is CRITICAL: DISK CRITICAL - free space: /var/lib/carbon 3443 MB (3% inode=98%): [18:47:40] (03CR) 10Dzahn: [C: 031] "support! per "The ISO code for multi-lingual resources is mul and this was recently added to the interwiki map to allow for l" [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [18:48:05] SMW is useful because it makes things that are otherwise keystone-protected (instances in a project, etc.) publicly visible queryable and sortable and such. [18:48:20] There might be another better way of exposing that though [18:48:31] I dunno, is wikidata the catch-all replacement for SMW? [18:49:13] in the future, but not yet [18:49:19] they're in the same conceptual space but there's nothing like feature parity [18:49:23] because getting data out of it is still too expensive [18:49:46] instances won't be in wikitech in our horizon feature, though, no? [18:50:04] andrewbogott: couldn't you just have a 'public' keystone account that the web app uses to query data? [18:50:27] paravoid: well, there are two things... [18:50:41] There's OSM which controls and queries OpenStack [18:50:55] but there's also an openstack notifier which updates the instance stat pages. [18:51:00] That latter thing is totally unrelated to OSM [18:51:07] And that's (mostly) what SMW consumes. [18:51:25] aha [18:51:39] RECOVERY - NTP on xenon is OK: NTP OK: Offset -0.01260328293 secs [18:51:42] do people actually use the latter? [18:51:47] ori: I'm not sure if there are read-only rights in openstack by default. I've never messed with them, at least. [18:51:59] paravoid: Yeah, I think they do. I do, at least. [18:52:08] That doesn't mean they're necessarily important, but... [18:52:14] It's nice to be able to see a project without first joining it. [18:53:19] So, for example, a page like this: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep [18:53:30] that's mostly SMW. We'd still have all that info if we yanked out OSM [18:53:43] Hm, some crazy bad formatting on that :( [18:54:28] yes, people use that [18:56:51] ori: andrewbogott re: exposing info, we can just write API on top [18:56:53] like we started [18:58:22] YuviPanda: Having the data pushed out into a db (if SMW can be called that) on update is kind of nice for querying though. Otherwise it'd be a serious storm of API calls anytime someone loaded a stat page [18:58:33] andrewbogott: well, memcached :) [18:58:42] like, we're hitting our APIs fairly often [18:58:43] and things are ok [18:58:46] And, really, how we get the data (API call or callback) is unrelated to the SMW issue [18:58:53] since we'd still need some kind of query/sort system [19:00:39] true [19:00:43] it's SMW vs writing our own [19:01:02] I far prefer the latter, tbh. 
those are fairly easy to write, and SMW is painful (at least for me) [19:02:59] * andrewbogott lunches [19:14:41] (03CR) 10Dzahn: [C: 032] "for https://gerrit.wikimedia.org/r/#/c/173250/" [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [19:15:43] (03CR) 10Dzahn: "mul.wikisource.org has address 198.35.26.96" [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [19:18:40] (03CR) 10Dzahn: "Glaisher: http://mul.wikisource.org/ already works without even changing Apache (right?)" [dns] - 10https://gerrit.wikimedia.org/r/173247 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [19:21:05] (03CR) 10Dzahn: "i merged the DNS change (https://gerrit.wikimedia.org/r/173247) and it looks to me like it already works just fine without even needing th" [puppet] - 10https://gerrit.wikimedia.org/r/173250 (https://bugzilla.wikimedia.org/73407) (owner: 10Glaisher) [19:32:19] (03CR) 10Dzahn: [C: 031] Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) (owner: 10Dereckson) [19:36:17] (03PS1) 10Anomie: Copy a sanitized version of api-feature-usage [puppet] - 10https://gerrit.wikimedia.org/r/173336 [20:08:05] (03PS1) 10Manybubbles: Send more update jobs to Elasticsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 [20:09:01] (03PS2) 10Manybubbles: Send more update jobs to Elasticsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 [20:10:04] (03PS3) 10Manybubbles: Send more update jobs to Elasticsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 [20:11:46] (03CR) 10Manybubbles: "Right now the load average on the Elasticsearch cluster is super low. I figured it was because we through so much more hardware at it. I" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 (owner: 10Manybubbles) [20:15:00] Is Jenkins speaking Spanish to everyone or just me? https://integration.wikimedia.org/ci/ [20:15:49] not just you [20:16:55] well it's mixed, actually [20:17:10] "Clave del API" right above "Color blind support" [20:17:14] under "Configurar" [20:17:41] It's not really a problem, just… interesting [20:17:41] sounds like set to es but not all translations exist and then fallback to en [20:17:47] yea, indeed [20:17:48] cute [20:17:54] vaguely remember this came up before [20:18:25] (03PS1) 10Dzahn: remove glusterfs and pmtpa remnants [puppet] - 10https://gerrit.wikimedia.org/r/173349 [20:18:27] (03PS1) 10Dzahn: protoproxy - update usage examples to current [puppet] - 10https://gerrit.wikimedia.org/r/173350 [20:19:17] (03CR) 10Aaron Schulz: [C: 031] Send more update jobs to Elasticsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 (owner: 10Manybubbles) [20:19:21] (03CR) 10Dzahn: [C: 04-1] "change to modules/mariadb should not be here, grrmbl" [puppet] - 10https://gerrit.wikimedia.org/r/173349 (owner: 10Dzahn) [20:21:02] I have a question about SwiftFileBackend and LocalRepo configuration [20:21:42] (03CR) 10Manybubbles: "Scheduled for Monday 'morning' SWAT:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173347 (owner: 10Manybubbles) [20:21:43] thoughts about importing this module into our puppet repo? 
[20:21:44] (03CR) 10Dzahn: [C: 032] "just changes comments - show current example like from role/protoproxy.pp" [puppet] - 10https://gerrit.wikimedia.org/r/173350 (owner: 10Dzahn) [20:21:49] Coren: andrewbogott mutante ^ [20:21:50] err [20:21:52] module being https://github.com/erwbgy/puppet-limits [20:22:02] Is $wgLocalFileRepo useful only with a $wgFileBackends,? [20:22:26] AaronSchulz, question about wgFileBackends. Is $wgLocalFileRepo useful only with a $wgFileBackends? [20:23:34] (03PS2) 10Dzahn: protoproxy - update usage examples to current [puppet] - 10https://gerrit.wikimedia.org/r/173350 [20:24:10] YuviPanda: which one are you pointing at? context? [20:24:23] mutante: pointing at https://github.com/erwbgy/puppet-limits [20:24:24] renoirb: you can configure it without $wgFileBackends, as long as you don't try to set 'backend' [20:24:28] mutante: context is we need to disable coredumps on toollabs [20:24:39] and that requires placing an entry in /etc/security/limits.conf [20:24:39] (03PS1) 10Ori.livneh: Update EventLogging listener IP for labs [puppet] - 10https://gerrit.wikimedia.org/r/173352 [20:24:41] (03PS1) 10Ori.livneh: keyholder: add /etc/keyholder.d and `keyholder arm` subcommand [puppet] - 10https://gerrit.wikimedia.org/r/173353 [20:24:48] and that's a single file, so can't just put different files in [20:24:52] so needs some way to do this... [20:25:09] YuviPanda: that is pleasingly minimalist for an upstream module :) If you can verify that it won't tromp on anything currently running it seems fine with me. [20:25:10] AaronSchulz, the reason of my question is that i’m making configuration that will check if local deployment has an assigned file backend endpoint. [20:25:17] :D [20:25:27] (03PS2) 10Ori.livneh: keyholder: add /etc/keyholder.d and `keyholder arm` subcommand [puppet] - 10https://gerrit.wikimedia.org/r/173353 [20:26:06] (03PS3) 10Ori.livneh: keyholder: add /etc/keyholder.d and `keyholder arm` subcommand [puppet] - 10https://gerrit.wikimedia.org/r/173353 [20:26:09] andrewbogott: goddamit, it uses puppet augeas [20:26:13] andrewbogott: and I don't know if we use that [20:26:23] It is then safe to assume that $wgLocalFileRepo is not really required if no $wgFileBackends pointing to a Swift exists AaronSchulz ? [20:26:26] Hm, I don't know what that is [20:26:29] YuviPanda: ah, i see, so you are suggesting we import that module? sounds reasonable, i don't know about use_hiera true/false [20:26:42] YuviPanda: we used augeas for old firewall iptables [20:26:49] mutante: oh [20:26:49] before ferm [20:26:56] is augeas something you enable? [20:27:03] oh [20:27:07] apparently it's there by default? [20:27:09] https://docs.puppetlabs.com/references/latest/type.html#augeas [20:27:21] YuviPanda: we use it in modules/interface/ too [20:27:23] (03CR) 10Ori.livneh: [C: 032] keyholder: add /etc/keyholder.d and `keyholder arm` subcommand [puppet] - 10https://gerrit.wikimedia.org/r/173353 (owner: 10Ori.livneh) [20:27:25] oooh [20:27:26] modules/interface/manifests/tagged.pp: # Use augeas [20:27:30] mutante: so that means I *can* use this module [20:27:34] sounds like it, yea [20:27:41] modules/postgresql/manifests/user.pp: augeas { "hba_create-${name}": [20:27:49] grep -r augeas * [20:27:51] mutante: ok to merge? 
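(Editor's note: the change under discussion is small, so here is a minimal sketch of what disabling core dumps in /etc/security/limits.conf looks like; this is the file the erwbgy/puppet-limits module linked above manages via augeas. The exact hiera keys or class parameters that module expects are not shown because they are not verified here; only the resulting file content below is standard pam_limits syntax. Note _joe_'s caveat a little later in the log: pam_limits only applies to PAM sessions, so upstart-supervised daemons will not pick this up.)

```
# /etc/security/limits.conf (or whatever fragment the module renders into it)
# <domain>  <type>  <item>  <value>
*           soft    core    0
*           hard    core    0
```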
[20:27:56] dzahn: protoproxy - update usage examples to current (f0613a989d) [20:28:01] renoirb: really depends what you are doing...but if you only set wgLocalFileRepo for the purpose of using swift (and don't care about the other wgLocalFileRepo settings), then I suppose that could work [20:28:07] ori: sorry, yes please, just changes comments [20:28:14] np [20:28:25] ah, right [20:30:17] using that module would also mean we can just use hiera directly for limits anywhere [20:31:17] YuviPanda: is that a file that's not otherwise present on labs and prod systems? [20:31:27] andrewbogott: aha, so that's the thing. it sometimes is. [20:31:39] andrewbogott: so potential to run into conflicts is... high [20:31:42] well, not 'high' [20:31:43] ok, that's a bit worrisome then -- best to figure out what's happening there first. [20:31:44] but still... [20:31:59] andrewbogott: oh, no, if I include the module I'll change the other usages to use the module [20:32:06] andrewbogott: by default it's emptyu [20:32:08] *empty [20:32:15] ok, so it's only present when puppetized? [20:32:16] andrewbogott: we've puppet code that specifically puts a file there sometime [20:32:18] yeah [20:32:26] OK, yeah, then standardizing on that module seems great. [20:32:28] well, the file is present otherwise too, but is fully commented out by default [20:32:42] AaronSchulz, i’m making a config generator in Salt. I want it to support multiple deployments, local, on staging (with a different set of credentials) and production. So, I need to figure out what is required when [20:32:42] I wonder if I should use librarian puppet [20:32:45] probably not right now [20:32:55] andrewbogott: I guess we'll need to import it into gerrit [20:33:37] You could manage it as a submodule or just copypaste into the normal repo [20:33:48] If the license allows it… the latter might be just as good [20:33:53] <_joe_> YuviPanda: librarian is horrible and we don't want it [20:34:03] <_joe_> I took a look and it's a crippled concept [20:34:25] <_joe_> what would you want librarian for, and what would it give us over git submodules? [20:34:42] andrewbogott: ok, this isinfuriating. [20:34:46] andrewbogott: it doesn't have a LICENSE file [20:34:55] andrewbogott: oh [20:34:56] andrewbogott: it does [20:34:59] andrewbogott: license 'Apache License, Version 2.0' [20:35:12] that certainly allows for copypasta [20:35:19] yeah, but we should use a submodule, I THINK [20:35:20] *think [20:35:32] _joe_: I haven't looked into it too much, but I'm not too much a fan of submodules. [20:35:43] anyway, that's for another day... [20:35:44] ok, I take it back, my Spanish is not so good [20:35:46] <_joe_> YuviPanda: I plainly fucking hate them [20:35:51] hehe [20:35:58] <_joe_> but they're still better than librarian [20:35:59] <_joe_> :) [20:36:02] haha :) [20:36:12] (03CR) 10Andrew Bogott: [C: 032] Removed some obsolete roles. [puppet] - 10https://gerrit.wikimedia.org/r/173293 (owner: 10Andrew Bogott) [20:36:34] _joe_: considering using https://github.com/erwbgy/puppet-limits [20:36:41] _joe_: will let us manage security/limits.conf via hiera [20:36:51] _joe_: I need to use that to disable coredumps on labs [20:37:08] but since the code I need to conditionalize it is in the base module... [20:37:09] <_joe_> YuviPanda: remember upstart doesn't give a fuck about limits.conf :) [20:37:26] hmm? these are coredumps from lighty started by SGE... 
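(Editor's note: on renoirb's $wgLocalFileRepo / $wgFileBackends question, a minimal sketch of how the two settings relate, following AaronSchulz's answer above: $wgLocalFileRepo stands on its own for plain filesystem storage, and only needs a matching $wgFileBackends entry when its 'backend' key points at one, for example a SwiftFileBackend. The endpoint, credentials and values below are placeholders, not anything taken from this log.)

```php
<?php
// Only needed when the repo references a named backend:
$wgFileBackends[] = [
	'name'         => 'local-swift',          // referenced by the repo below
	'class'        => 'SwiftFileBackend',
	'lockManager'  => 'nullLockManager',
	'swiftAuthUrl' => 'https://swift.example.org/auth/v1.0',  // placeholder
	'swiftUser'    => 'mw:media',                             // placeholder
	'swiftKey'     => 'not-a-real-key',                       // placeholder
];

$wgLocalFileRepo = [
	'class'      => 'LocalRepo',
	'name'       => 'local',
	'backend'    => 'local-swift',  // omit this key and the repo works with no $wgFileBackends at all
	'url'        => "$wgScriptPath/images",
	'hashLevels' => 2,
];
```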
[20:37:33] why would upstart matter, at least for this particular use case [20:37:50] (this is for https://phabricator.wikimedia.org/T1259) [20:38:04] <_joe_> YuviPanda: I just reminded you that you have this caveat now when using that file [20:38:09] ah, right [20:38:11] yeah [20:38:49] anyway, let me put that in [20:38:54] <_joe_> so if SGE is started via upstart (I have no idea) [20:39:03] it is... [20:39:05] <_joe_> YuviPanda: that module is not that great [20:39:44] <_joe_> YuviPanda: so look up how limits are enforced on linux, and see how that won't affect your processes :) [20:39:52] hmm, alright. [20:40:03] <_joe_> unless SGE does read that file itself [20:40:16] <_joe_> and sets limits to its jobs [20:40:28] hmm, all I wanted to do was disable coredumps, and now I guess I'll start yakshaving... [20:41:36] <_joe_> YuviPanda: http://upstart.ubuntu.com/cookbook/#limit [20:42:22] ori: I'm wondering if I should just revert https://gerrit.wikimedia.org/r/#/c/171206/ (and add counteracting code) for the weekend. TOolLabs hosts get their (tiny) /vars filled up every few hours now because of coredumps [20:42:22] (03CR) 10BryanDavis: [C: 031] "Anomie: as soon as you are happy with the data you are getting in logstash from this, add jgage as a reviewer and poke him to merge it." [puppet] - 10https://gerrit.wikimedia.org/r/173336 (owner: 10Anomie) [20:42:30] ori: and I don't want to manually fight that during the weekend... [20:42:47] YuviPanda: just add a special tidy {} resource for labs [20:43:00] ori: well, sometimes just one coredump is enough to fill it up [20:43:11] that's a bug [20:43:24] indeed, because /var is 2G, and a lot of it is filled with logs [20:44:03] YuviPanda: another workaround: add a profile.d script that calls 'ulimit -S -c 0 > /dev/null 2>&1' [20:44:13] fix is to have bigger /var, and the newer labs images do have a slightly bigger /var. but these aren't new instances... [20:44:22] ah, hmm... [20:44:41] still, that's just a workaround. the 'core' bug (tiny /var) isn't going to be fixed anytime soon. [20:45:05] and nobody uses the coredumps on tools. they're almost always lighty going off. [20:45:11] hmm, I wonder what the status quo was before that patch. [20:45:13] so disable core dumps [20:45:51] hmm, limits.conf, and _joe_ was pointing out weirdness there, and now I'm thoroughly confused. [20:46:16] * YuviPanda goes to read more things [20:47:57] (03CR) 10Yuvipanda: [C: 04-2] "Still WIP" [puppet] - 10https://gerrit.wikimedia.org/r/173080 (owner: 10Yuvipanda) [20:52:54] puppet compiler also Spanish :) [20:52:56] Proyecto operations-puppet-catalog-compiler [20:53:06] Enlaces permanentes [20:54:11] (03CR) 10Anomie: "I wonder whether we should merge this as-is or combine it with whatever changes are necessary to push it into the search ES." [puppet] - 10https://gerrit.wikimedia.org/r/173336 (owner: 10Anomie) [21:02:27] has anything changed recently re: hhvm on osmium? Trying to run some tests but i keep getting 'Warning: Compilation failed: this version of PCRE is compiled without UTF support at offset 0 in /srv/mediawiki/..." along with the regular expressions not matching [21:05:00] <_joe_> ebernhardson: ask ori, but I think osmium is more of a hhvm dev sandbox than a real hhvm test box [21:05:00] ebernhardson: the build there is locally hacked in half a dozen ways to make it possible to debug an issue we had [21:05:04] ebernhardson: what are you trying to do? [21:05:12] ebernhardson: hah, re bug 73426? 
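(Editor's note: picking up _joe_'s upstart point, a sketch of the two workarounds mentioned in this exchange. The upstart job name below is a guess; whichever job actually starts the SGE exec daemon on the tools nodes is the one that would need the override. The profile.d line is the command ori quoted, and it only affects PAM/login shells, not upstart-supervised daemons.)

```
# /etc/init/<sge-exec-job>.override -- upstart honours 'limit' stanzas, not limits.conf
limit core 0 0

# /etc/profile.d/no-core-dumps.sh -- ori's shell-level fallback
ulimit -S -c 0 > /dev/null 2>&1
```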
[21:05:27] ori: have a bug thats happening on ptwiki Echo that i cant reproduce locally, so i boot an instance on osmium and attach the debugger [21:05:49] (on alternate ports than the default, because attaching a debugger stops the world) [21:06:04] ebernhardson: you can do that on mw1017 too [21:06:18] ori: ok excellent i'll try there, thanks [21:06:35] (03PS1) 10Ori.livneh: mediawiki: move beta::mwdeploy_sudo to mediawiki::users [puppet] - 10https://gerrit.wikimedia.org/r/173364 [21:07:42] ls [21:08:57] (03CR) 10BryanDavis: "Building on this until you get the whole thing ready is fine with me, but merging now is fine with me too. One advantage to merging earlie" [puppet] - 10https://gerrit.wikimedia.org/r/173336 (owner: 10Anomie) [21:08:57] bin boot dev etc home sbin usr var [21:14:14] _joe_: would you expect me to still specify default values for variables as parameters? Or should I just not use params at all and assume that it's hiera all the way up? [21:14:30] This all makes me slightly nervous since it feels a bit like re-inventing global variables [21:16:38] (03PS1) 10Ori.livneh: trebuchet: try to resolve tag against local repo, too [puppet] - 10https://gerrit.wikimedia.org/r/173365 [21:23:42] MatmaRex: yea. For extra fun when i run that same revision through echo's generate events method in the debugger, its sending the notification... funsies :) [21:23:54] ouch [21:35:39] (03CR) 10BryanDavis: [C: 031] "I thought that deployment-rsycn01 would need to have ::mediawiki::users applied, but it turns out it already has it." [puppet] - 10https://gerrit.wikimedia.org/r/173364 (owner: 10Ori.livneh) [21:37:59] (03PS2) 10Andrew Bogott: Move openstack_version and use_neutron into hiera [puppet] - 10https://gerrit.wikimedia.org/r/173294 [21:38:47] (03CR) 10jenkins-bot: [V: 04-1] Move openstack_version and use_neutron into hiera [puppet] - 10https://gerrit.wikimedia.org/r/173294 (owner: 10Andrew Bogott) [21:40:00] (03PS3) 10Andrew Bogott: Move openstack_version and use_neutron into hiera [puppet] - 10https://gerrit.wikimedia.org/r/173294 [21:43:43] (03CR) 10Dzahn: "thanks guys for reviewing and fixes. 
also http://puppet-compiler.wmflabs.org/489/change/172803/diff/iron.wikimedia.org.diff.formatted just" [puppet] - 10https://gerrit.wikimedia.org/r/172803 (https://bugzilla.wikimedia.org/35611) (owner: 10Dzahn) [21:46:59] !log restarted /etc/init.d/ganglia-monitor on logstash1003 [21:47:04] Logged the message, Master [21:48:24] (03CR) 10Dzahn: "Alex, you are right, we just really needed the ListenAddress part, nevertheless i think it wouldn't hurt and it was already a dependency n" [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [21:51:21] mutante: wld appreciate a +1 for https://gerrit.wikimedia.org/r/#/c/173364/ if you have a moment [21:51:29] (03PS2) 10Dzahn: misc-web varnish: bugzilla to phab box [puppet] - 10https://gerrit.wikimedia.org/r/172471 [21:52:41] (03CR) 10Dzahn: [C: 04-2] "technical downvote because it depends on change in other repo" [puppet] - 10https://gerrit.wikimedia.org/r/172471 (owner: 10Dzahn) [21:54:06] (03PS3) 10Dzahn: switch bugzilla names over to misc-web [dns] - 10https://gerrit.wikimedia.org/r/172469 [21:57:08] (03PS1) 10Yuvipanda: tools: Remove unused MountCollector ensure => absent [puppet] - 10https://gerrit.wikimedia.org/r/173433 [21:58:11] (03PS2) 10Ori.livneh: trebuchet: try to resolve tag against local repo, too [puppet] - 10https://gerrit.wikimedia.org/r/173365 [21:58:24] (03PS2) 10Yuvipanda: tools: Remove unused MountCollector ensure => absent [puppet] - 10https://gerrit.wikimedia.org/r/173433 [22:01:58] (03CR) 10Yuvipanda: [C: 032] tools: Remove unused MountCollector ensure => absent [puppet] - 10https://gerrit.wikimedia.org/r/173433 (owner: 10Yuvipanda) [22:03:24] (03CR) 10Alexandros Kosiaris: [C: 032] "In general I dislike "features" that don't have an obvious use case. That being said, we have already discussed this way more that necessa" [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [22:08:51] (03PS3) 10Ori.livneh: trebuchet: try to resolve tag against local repo, too [puppet] - 10https://gerrit.wikimedia.org/r/173365 [22:09:02] (03PS4) 10Ori.livneh: trebuchet: try to resolve tag against local repo, too [puppet] - 10https://gerrit.wikimedia.org/r/173365 [22:09:15] (03CR) 10Ori.livneh: [C: 032 V: 032] trebuchet: try to resolve tag against local repo, too [puppet] - 10https://gerrit.wikimedia.org/r/173365 (owner: 10Ori.livneh) [22:13:38] (03CR) 10Dzahn: [C: 032] "thanks : http://puppet-compiler.wmflabs.org/490/change/172799/html/iron.wikimedia.org.html" [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [22:14:03] (03CR) 10Ori.livneh: "another possibility is to provision /etc/init/ssh.override which includes the line "exec sshd -D -p <%= @port %>". This will take preceden" [puppet] - 10https://gerrit.wikimedia.org/r/172799 (owner: 10Dzahn) [22:16:38] greg-g: Whoops, your deployment e-mail got the wmfNs off by one. :-) [22:17:41] PROBLEM - SSH on ms-fe1002 is CRITICAL: Connection refused [22:18:22] PROBLEM - SSH on mw1131 is CRITICAL: Connection refused [22:18:27] oh oh [22:23:31] RECOVERY - SSH on mw1131 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [22:24:52] RECOVERY - SSH on ms-fe1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [22:25:00] puppet race condition or so [22:25:16] the port was blank but it fixed it on second run [22:35:00] James_F: gah! [22:35:16] greg-g: Not a major issue. 
:-) [22:36:51] (03CR) 10Dzahn: [C: 031] mediawiki: move beta::mwdeploy_sudo to mediawiki::users [puppet] - 10https://gerrit.wikimedia.org/r/173364 (owner: 10Ori.livneh) [22:37:03] (03PS1) 10Ori.livneh: trebuchet: Fix-up for Ie6673c8af [puppet] - 10https://gerrit.wikimedia.org/r/173438 [22:37:05] (03PS2) 10Ori.livneh: mediawiki: move beta::mwdeploy_sudo to mediawiki::users [puppet] - 10https://gerrit.wikimedia.org/r/173364 [22:37:09] (03CR) 10Ori.livneh: [C: 032 V: 032] mediawiki: move beta::mwdeploy_sudo to mediawiki::users [puppet] - 10https://gerrit.wikimedia.org/r/173364 (owner: 10Ori.livneh) [22:37:18] (03PS2) 10Ori.livneh: trebuchet: Fix-up for Ie6673c8af [puppet] - 10https://gerrit.wikimedia.org/r/173438 [22:37:29] (03CR) 10Ori.livneh: [C: 032 V: 032] trebuchet: Fix-up for Ie6673c8af [puppet] - 10https://gerrit.wikimedia.org/r/173438 (owner: 10Ori.livneh) [22:39:53] (03PS2) 10Ori.livneh: Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) (owner: 10Dereckson) [22:39:56] (03CR) 10Ori.livneh: [C: 032] Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) (owner: 10Dereckson) [22:40:05] (03Merged) 10jenkins-bot: Deploy Translate extension on ca.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173229 (https://bugzilla.wikimedia.org/73394) (owner: 10Dereckson) [22:41:11] !log ori Synchronized wmf-config/InitialiseSettings.php: If60e3fe97: Deploy Translate extension on ca.wikimedia (duration: 00m 05s) [22:41:19] Logged the message, Master [22:43:37] there it goes [22:44:32] It's not that rare ;) [22:45:10] Once upon a time, every new Wikimedia wiki joining Translate needed assistance from N. We got better! [22:46:36] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/491/change/172804/diff/iron.wikimedia.org.diff.formatted" [puppet] - 10https://gerrit.wikimedia.org/r/172804 (owner: 10Dzahn) [22:46:45] ori: you will create the database tables as well, will you? [22:48:54] hehe [22:49:25] nod [22:51:46] done [22:56:52] (03CR) 10Dzahn: [C: 04-1] "yea, uhm.. i would call this one a limitation of puppet-lint itself. looks like it might not be possible to fix the warning without using " [puppet] - 10https://gerrit.wikimedia.org/r/170493 (owner: 10John F. Lewis) [22:58:06] (03PS2) 10Dzahn: remove glusterfs and pmtpa remnants [puppet] - 10https://gerrit.wikimedia.org/r/173349 [22:59:03] (03CR) 10Dzahn: [C: 04-2] "if you feel like fixing it and remove the unrelated change to mariadb, please go ahead" [puppet] - 10https://gerrit.wikimedia.org/r/173349 (owner: 10Dzahn) [23:05:24] (03PS1) 10Kaldari: Updating WikiGrok A/B test start/end times - postponing until Monday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173447 [23:11:38] (03PS2) 10Kaldari: Updating WikiGrok A/B test start/end times - postponing until Monday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173447 [23:14:21] (03CR) 10Dzahn: [C: 032] "identical: http://puppet-compiler.wmflabs.org/493/change/164273/html/ (and where it fails those are unrelated to this change. 
Error: Faile" [puppet] - 10https://gerrit.wikimedia.org/r/164273 (owner: 10Hoo man) [23:14:30] (03PS5) 10Dzahn: Remove all references to pmtpa from role::cache [puppet] - 10https://gerrit.wikimedia.org/r/164273 (owner: 10Hoo man) [23:15:51] (03CR) 10Dzahn: [C: 032] Remove all references to pmtpa from role::cache [puppet] - 10https://gerrit.wikimedia.org/r/164273 (owner: 10Hoo man) [23:19:42] (03CR) 10Kaldari: [C: 032] Updating WikiGrok A/B test start/end times - postponing until Monday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173447 (owner: 10Kaldari) [23:19:51] (03Merged) 10jenkins-bot: Updating WikiGrok A/B test start/end times - postponing until Monday [mediawiki-config] - 10https://gerrit.wikimedia.org/r/173447 (owner: 10Kaldari) [23:21:01] !log kaldari Synchronized wmf-config/mobile.php: Updating WikiGrok A/B test start/end times (duration: 00m 07s) [23:21:04] Logged the message, Master [23:22:13] (03PS1) 10Dzahn: pmacct - remove commented pmtpa core router [puppet] - 10https://gerrit.wikimedia.org/r/173454 [23:23:01] (03CR) 10Dzahn: [C: 032] "cr2-pmtpa is gone" [puppet] - 10https://gerrit.wikimedia.org/r/173454 (owner: 10Dzahn) [23:27:53] (03PS1) 10Dzahn: lvs config: remove pmtpa [puppet] - 10https://gerrit.wikimedia.org/r/173456 [23:28:29] (03PS2) 10Dzahn: lvs config: remove pmtpa [puppet] - 10https://gerrit.wikimedia.org/r/173456 [23:29:16] ^ is there LVS in eqiad labs? [23:29:19] https://gerrit.wikimedia.org/r/#/c/173456/2/modules/lvs/manifests/configuration.pp [23:31:32] (03PS3) 10Dzahn: lvs (labs) config: remove pmtpa [puppet] - 10https://gerrit.wikimedia.org/r/173456 [23:37:16] (03PS1) 10Dzahn: openstack: rm all files/folsom/ [puppet] - 10https://gerrit.wikimedia.org/r/173459 [23:43:55] (03PS1) 10Dzahn: openstack: folsom -> havana as default version [puppet] - 10https://gerrit.wikimedia.org/r/173460 [23:54:30] (03PS1) 10Dzahn: mha: replace pmtpa with codfw? (and logging.pp) [puppet] - 10https://gerrit.wikimedia.org/r/173464 [23:59:19] (03CR) 10Dzahn: "logstash.pmtpa.wmflabs is gone. how about mha though?" [puppet] - 10https://gerrit.wikimedia.org/r/173464 (owner: 10Dzahn)