[00:00:07] I quickly jumped on terbium to have a look but I couldn't get hold of anything useful :/
[00:00:12] telling icinga to check for excess sleepers specifically could be done. would need to work around wikiadmin user which sleeps a lot
[00:01:51] hoo: don't know about the memcached timeout
[00:02:00] that is rather high
[00:03:01] SELECT COUNT(*) FROM PROCESSLIST WHERE user = 'wikiuser' AND time > 30;
[00:03:09] something like that would suffice, right?
[00:03:22] Will ping Ori or so tomorrow about the memcached timeout
[00:03:48] would need AND command = 'Sleep' AND info is null AND time < 1000000
[00:04:26] right... we don't care about non-sleepers (although 30s is quite awry anyway)
[00:04:34] anyway, I should go to bed if I want to start working around 11am tomorrow :P
[00:04:43] last clause because for a very brief period, new connections have time = 2147483647
[00:05:00] sure. thanks
[00:05:02] never obeserved that, I think
[00:05:05] * observed
[00:05:16] If you find anything mail me or open a bug or so
[00:05:16] 5.5 bug iirc
[00:06:36] this is the script i've been using to kill: https://git.wikimedia.org/blob/operations%2Fsoftware/a091384003604ed280cf0bf83479162ec360d5fe/dbtools%2Farbiter.pl
[00:06:45] simlar queries
[00:08:13] (PS1) Ori.livneh: Beta: use MemcachedPhpBagOStuff if running under HHVM [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122272
[00:09:13] (CR) Ori.livneh: [C: 2] Beta: use MemcachedPhpBagOStuff if running under HHVM [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122272 (owner: Ori.livneh)
[00:09:20] (Merged) jenkins-bot: Beta: use MemcachedPhpBagOStuff if running under HHVM [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122272 (owner: Ori.livneh)
[00:26:22] springle: nice one, although I'm not into perl much :P
[00:26:22] # wikiuser sleepers get killed at 300s
[00:26:24] :)
[00:26:55] that block is new, as a result of the weekend :)
[00:27:12] but its possible the other limits caught sleepers too
[00:27:59] something probably killed them, but I'm not sure how long the whole thing lasted etc.
[00:28:12] hoo: since you're awake ;)... monitoring max_connections -- how to do it sanely without using the root account i wonder
[00:28:39] and how to catch it in the act. icinga checks are quite far apart
[00:29:09] maybe check for spikes in Max_used_connections, but that would be after the fact if nagios cannot get a connection itself
[00:29:11] not sure how often icinga checks... given that these lasted like 300s maybe we should probably check every 100s or so?
[00:29:44] if we go much higher than that occasional troubles might go unnoticed
[00:30:05] these sorts of torubles are very obvious in the dberror.log and logstash
[00:30:19] maybe watching fo noise there somehow
[00:30:21] right... but does anybody really look at those
[00:30:32] well true
[00:30:33] like I only do then shit gets serious
[00:30:36] * when
[00:31:15] of course if the dberror are crying that there are no more read slaves at hand stuff is probably quite wrong
[00:31:49] so something else will complain, yes. still, it's good point about max_connections
[00:32:42] but that would be after the fact if nagios cannot get a connection itself << well it will error than anyway, right? If could probably also be smart about the error returned
[00:32:50] * It could
[00:32:56] I should go to bed, I think :P
[00:33:03] i'd like some sort of icinga dberror-is-growing-fast
[00:33:12] yes true, a failure would be critical
[00:33:26] failure *to connect
[00:34:54] ok, really leaving now... will poke people tomorrow about memcached, that path seems rather likely (and waiting for something like memcached for 250s can only have one purpose: Making sure that stuff doesn't cache stampede in case mc is down)
[00:35:29] I guess that's the reason, but who knows... that should probbaly also log
[02:13:38] !log LocalisationUpdate completed (1.23wmf19) at 2014-03-31 02:13:38+00:00
[02:13:45] Logged the message, Master
[02:30:26] !log LocalisationUpdate completed (1.23wmf20) at 2014-03-31 02:30:26+00:00
[02:30:32] Logged the message, Master
[03:08:42] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Mar 31 03:08:39 UTC 2014 (duration 8m 38s)
[03:08:48] Logged the message, Master
[04:27:22] (CR) Faidon Liambotis: [C: -1] Manage scap proxy rsync config in puppet (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/119677 (owner: Reedy)
[04:28:47] (CR) Faidon Liambotis: [C: -1] Update docroot_dir_allows to use network::constants::mediawiki_appservers (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/119668 (owner: Reedy)
[04:30:02] (CR) Faidon Liambotis: [C: 2] Add mw1161 and mw1201 as scap proxies for EQIAD row C and D [operations/puppet] - https://gerrit.wikimedia.org/r/119686 (owner: Reedy)
[04:33:10] (CR) Faidon Liambotis: Manage scap proxy rsync config in puppet (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/119677 (owner: Reedy)
[04:33:56] (CR) Faidon Liambotis: [C: -1] "Minor gotcha -- otherwise looks good." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/119686 (owner: Reedy)
[05:11:33] hey TimStarling
[05:12:01] hello
[05:13:16] busy?
[05:14:12] not with anything urgent
[05:15:18] it's about my email regarding memcached
[05:16:09] on the ops list?
[05:16:11] yes
[05:17:18] is this high rate of filter:minify-js queries normal?
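The sleeper check discussed just after midnight can be sketched as below. This is a minimal illustration under stated assumptions. It is not the check that was deployed, and it is not arbiter.pl. The function names, the warn/crit thresholds, and passing PROCESSLIST rows in as Python dicts (instead of querying MySQL) are all hypothetical.

The filter mirrors the clauses from the channel:
- `user = 'wikiuser'`, so wikiadmin sleepers are ignored
- `command = 'Sleep'`
- `info is null`
- `time > 30`
- `time < 1000000`, to skip the brief `time = 2147483647` value on new connections

Results use the standard Nagios plugin exit codes.

```python
"""Hypothetical Icinga/Nagios-style check for idle 'wikiuser' sleepers."""

# Query assembled from the clauses discussed in the channel.
# information_schema.PROCESSLIST exposes the same data as SHOW PROCESSLIST.
SLEEPER_QUERY = (
    "SELECT COUNT(*) FROM information_schema.PROCESSLIST "
    "WHERE user = 'wikiuser' AND command = 'Sleep' AND info IS NULL "
    "AND time > 30 AND time < 1000000"
)

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def count_sleepers(rows, min_idle=30, max_idle=1_000_000):
    """Apply the same filter as SLEEPER_QUERY to PROCESSLIST-like dicts.

    max_idle excludes the transient time = 2147483647 that brand-new
    connections can report; the user test skips wikiadmin sleepers.
    """
    return sum(
        1
        for row in rows
        if row["user"] == "wikiuser"
        and row["command"] == "Sleep"
        and row["info"] is None
        and min_idle < row["time"] < max_idle
    )


def check_sleepers(rows, warn=10, crit=50):
    """Return (exit_code, status_line); warn/crit are illustrative only."""
    n = count_sleepers(rows)
    if n >= crit:
        return CRITICAL, f"CRITICAL - {n} idle wikiuser connections"
    if n >= warn:
        return WARNING, f"WARNING - {n} idle wikiuser connections"
    return OK, f"OK - {n} idle wikiuser connections"


if __name__ == "__main__":
    sample = [
        {"user": "wikiuser", "command": "Sleep", "info": None, "time": 45},
        {"user": "wikiuser", "command": "Sleep", "info": None, "time": 2147483647},
        {"user": "wikiadmin", "command": "Sleep", "info": None, "time": 900},
        {"user": "wikiuser", "command": "Query", "info": "SELECT 1", "time": 60},
    ]
    print(check_sleepers(sample, warn=1, crit=2))
```

With the four sample rows, only the 45-second wikiuser sleeper is counted. The call therefore returns `(1, 'WARNING - 1 idle wikiuser connections')`.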
[05:18:50] I'll just document what I've done on bug 62623 and then have a look at it
[05:19:02] okay :)
[05:23:07] the spam_blacklist_regexes that Ori found is definitely a lot of data
[05:23:26] I suspect it might be related to this Ukrainian spammer that has been hitting us for a few days now
[05:23:47] * Jasper_Deng has always wanted to ask, do we ever firewall spammers from even connecting?
[05:23:49] but I don't think it's the cause for the spike
[05:24:09] since that's just mc1004, not mc1010
[05:24:23] Jasper_Deng: firewall no, but mediawiki returns a throttled error
[05:27:46] although it's kind of tempting to firewall of this ISP entirely, I must say
[05:31:47] from the last 10.000 throttlederror requests
[05:31:51] 9917 GeoIP ASNum Edition: AS15895 Kyivstar PJSC
[05:32:07] *responses
[05:32:19] ResourceLoader::makeModuleResponse & ResourceLoader::filter have not been invoked more frequently: http://graphite.wikimedia.org/render/?width=586&height=308&_salt=1396243737.595&from=-2weeks&target=MediaWiki.ResourceLoader.filter.count&target=MediaWiki.ResourceLoader.makeModuleResponse.count
[05:32:40] that's about ~23mins
[05:32:42] oh hi ori
[05:33:00] sorry, didn't mean to ping you with the above :)
[05:33:30] Just as well, since I just logged on looking for some moment's distraction
[05:34:57] also: hello
[05:36:48] heh: http://graphite.wikimedia.org/render/?width=586&height=308&_salt=1396244200.34&from=-2weeks&target=MediaWiki.SpamBlacklist.getRegex.count
[05:36:57] I think we have a winner
[05:37:09] nice
[05:37:33] that explains mc1004
[05:37:57] also, that looks like aligned to the deployment window, not just the spammer?
[05:39:14] so you need to know what calls SpamBlacklist::getRegex()?
[05:39:30] https://gerrit.wikimedia.org/r/#/c/117231/ ?
[05:40:25] that's a very promising candidate
[05:43:57] we can comment the hook at SpamBlacklist.php as a local hack and sync
[05:44:10] we can see the effect almost immediately
[05:44:18] shall I? I'll wait for TimStarling :)
[05:45:13] go ahead
[05:45:15] mc1010 gets queried for enwiki:messages:individual:Spam-blacklist constantly
[05:47:16] !log faidon synchronized php-1.23wmf19/extensions/SpamBlacklist/SpamBlacklist.php 'local hack to test memcached traffic increase theory'
[05:47:21] Logged the message, Master
[05:48:25] we have a winner
[05:48:34] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Memcached+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report
[05:49:09] 526M -> 311M
[05:50:00] 295 and dropping
[05:52:35] dwelling too long on the question of about why 83 kilobytes have to be repeatedly evacuated from memory and retrieved across a network link is liable to make one very sad
[05:52:49] s/about //
[06:33:46] http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=mc1010.eqiad.wmnet&m=cpu_report&r=month&s=by%20name&hc=4&mc=2&st=1396247583&g=network_report&z=large&c=Memcached%20eqiad
[06:33:50] I wonder if it was the same issue
[06:34:11] it probably was
[06:40:25] (PS5) Faidon Liambotis: Add ?download parameter to images [operations/puppet] - https://gerrit.wikimedia.org/r/120617 (owner: Gilles)
[06:53:01] (PS6) Faidon Liambotis: Varnish: add ?download parameter to upload [operations/puppet] - https://gerrit.wikimedia.org/r/120617 (owner: Gilles)
[06:53:13] (PS7) Faidon Liambotis: Varnish: add ?download parameter to upload [operations/puppet] - https://gerrit.wikimedia.org/r/120617 (owner: Gilles)
[07:10:42] Reedy: ping when you're around
[07:20:27] paravoid: still around ?
[07:20:52] for the next 3' or so
[07:21:04] cool, short question:
[07:21:52] otto told me to ask you. hashar (and i) would like to bring puppet-lint fron trusty to apt.wikimedia.org what should be done? i have tested it in labs and it seems sane
[07:22:03] same dep's and the like
[07:22:08] what do we need it for?
[07:22:18] better jenkins lints
[07:22:45] !log faidon synchronized php-1.23wmf19/extensions/SpamBlacklist/SpamBlacklist.php 'revert I694860b - SpamBlacklist'
[07:22:51] Logged the message, Master
[07:23:01] okay
[07:23:04] !log faidon synchronized php-1.23wmf20/extensions/SpamBlacklist/SpamBlacklist.php 'revert I694860b - SpamBlacklist'
[07:23:08] I'll reprepro include it later today
[07:23:09] Logged the message, Master
[07:23:23] thank you, do you need a ticket for it ?
[07:24:07] I don't, but I wouldn't mind it either
[07:25:04] for the sake of completence i'll create one. so we know it was reqested and handled. Thanks a lot
[07:37:37] (CR) QChris: [C: -1] Puppetizing Camus cronjob (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/121546 (owner: Ottomata)
[07:48:20] (CR) Hashar: Lint misc/icinga.pp (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/122131 (owner: Hashar)
[07:48:28] (PS2) Hashar: Lint misc/icinga.pp [operations/puppet] - https://gerrit.wikimedia.org/r/122131
[07:51:20] hashar: fwiw puppet lint != our common practice
[07:52:42] hi, i was going to check if puppet-lint update happened, and noticed apt.wm.org doesnt show me much
[07:53:40] ah, that ticket is just a couple minutes old, didn't notice:)
[07:54:16] yes, if you were around you can see backlog here
[07:54:36] yes
[07:54:58] and morning, or night, or whatever it is now for you :)
[07:56:26] thanks, it's like "season's greetings" on a daily basis
[08:12:13] (CR) Dzahn: "icinga::monitor::snmp is included twice (line 43), looks good but the lower section that splits up all the files is really hard to review " (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/122131 (owner: Hashar)
[08:14:35] (CR) Dzahn: [C: 1] "i see that duplication was in there all this time, so it's not introducing anything new" [operations/puppet] - https://gerrit.wikimedia.org/r/122131 (owner: Hashar)
[08:19:28] (CR) Dzahn: Manage scap proxy rsync config in puppet (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/119677 (owner: Reedy)
[08:28:04] (CR) Hashar: "Removing dupe include." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/122131 (owner: Hashar)
[08:28:11] (PS3) Hashar: Lint misc/icinga.pp [operations/puppet] - https://gerrit.wikimedia.org/r/122131
[08:28:46] (CR) Hashar: "Alexandros, can you please run your catalog compilation script to verify this change is a no-op ? Thx" [operations/puppet] - https://gerrit.wikimedia.org/r/122131 (owner: Hashar)
[08:28:48] (PS1) Dzahn: lint fundraising [operations/puppet] - https://gerrit.wikimedia.org/r/122331
[08:30:06] (PS1) Matanya: db: lint role [operations/puppet] - https://gerrit.wikimedia.org/r/122332
[08:30:31] mutante: hopefully we will get akosiaris catalog compilation script added in Jenkins which would provide a diff output or something
[08:30:42] I am not sure how long it takes to generate the catalogs though
[08:30:54] (PS1) Dzahn: retab role/gerrit [operations/puppet] - https://gerrit.wikimedia.org/r/122333
[08:31:17] he said a long time hashar
[08:31:35] (CR) Hashar: [C: 1] retab role/gerrit [operations/puppet] - https://gerrit.wikimedia.org/r/122333 (owner: Dzahn)
[08:31:40] hashar: yes, having that in Jenkins has been a thought, would be really nice
[08:31:56] matanya: we can make it a Jenkins jobs that folks can trigger manually by passing the change #
[08:32:47] hashar: why not just a check
[08:33:24] cause it is too long?
[08:33:33] that would delay reporting back to gerrit
[08:33:39] makes sense
[08:35:06] heh, and once we have that (manually triggered job), add to IRC bot.. !check 12345 :)
[08:35:11] that would be fun
[08:35:22] yea!
[08:35:33] btw hashar have you seen https://wikitech.wikimedia.org/wiki/Puppet_usage#Coding_Style ?
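The no-op verification requested at 08:28 amounts to compiling the Puppet catalog before and after a change, then diffing the resources. The sketch below covers only the diff step, and it is not the catalog compilation script mentioned in the channel. It assumes each compiled catalog has already been reduced to a dict keyed by `(type, title)`, with parameter dicts as the values. The function names and the example resources are hypothetical.

```python
def diff_catalogs(before, after):
    """Compare two catalogs of the form {(type, title): {param: value}}.

    Returns (added, removed, changed) as sorted lists of resource keys.
    """
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = sorted(
        key for key in before.keys() & after.keys() if before[key] != after[key]
    )
    return added, removed, changed


def is_noop(before, after):
    """A lint or whitespace change should leave the compiled catalog identical."""
    return diff_catalogs(before, after) == ([], [], [])


if __name__ == "__main__":
    # Hypothetical resources; parameter order differs but values are equal.
    before = {
        ("Class", "Icinga::Monitor::Snmp"): {},
        ("File", "/etc/example.conf"): {"mode": "0444", "owner": "root"},
    }
    after_lint = {
        ("File", "/etc/example.conf"): {"owner": "root", "mode": "0444"},
        ("Class", "Icinga::Monitor::Snmp"): {},
    }
    print(is_noop(before, after_lint))
```

Puppet's `include` is idempotent. Dropping a duplicate `include`, such as the doubled `icinga::monitor::snmp`, should therefore also produce an empty diff, because the class appears only once in the compiled catalog either way.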
[08:35:34] can definitely do that one day
[08:35:41] upstream is adding an API to Zuul
[08:35:59] nice!
[08:43:37] (PS1) Dzahn: lint role/keystone (labs) [operations/puppet] - https://gerrit.wikimedia.org/r/122334
[08:52:29] (PS1) Dzahn: lint labsproxy.pp [operations/puppet] - https://gerrit.wikimedia.org/r/122335
[08:52:33] (CR) Matanya: lint fundraising (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/122331 (owner: Dzahn)
[08:55:45] (CR) Matanya: lint role/keystone (labs) (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/122334 (owner: Dzahn)
[09:00:00] (CR) Matanya: lint labsproxy.pp (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/122335 (owner: Dzahn)
[09:09:20] (PS1) Dzahn: lint role/deployment [operations/puppet] - https://gerrit.wikimedia.org/r/122338
[09:10:18] (CR) Dzahn: "here's a challenge, try aliging all => correctly in this file between line 7 and 143 :)" [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[09:11:29] (CR) Dzahn: "i say even if correct it would look super ugly and might be an example of the limits of lint/style. that's why i didn't do it on this one" [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[09:12:19] * matanya hates it :)
[09:12:49] :)
[09:17:10] (CR) Matanya: "awww, oh, ouch." (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[09:30:14] (PS2) Hashar: role::parsoid::beta needs contint slave scripts [operations/puppet] - https://gerrit.wikimedia.org/r/120823
[09:37:35] btw mutante you can see in the gdash role it is done nicely
[09:46:07] !log Jenkins: applied hasBrowserTests label on both labs slave. Unblocks the browser tests which were still tied to a deleted instance {{gerrit|122341}}
[09:46:13] Logged the message, Master
[09:48:18] matanya: i would challenge "no need for []". it makes it easier to add a second user?
[09:48:53] hmm, it does. but with this argument we should add it to any resource
[09:48:56] actually it appears where there was more than 1 user in the past and then got removed
[09:49:00] (as well)
[09:49:10] still
[09:54:16] (PS1) Hashar: contint::slave-scripts recurse submodules [operations/puppet] - https://gerrit.wikimedia.org/r/122342
[10:06:48] (CR) Hashar: [C: 1 V: 2] "deployed on integration-puppetmaster.eqiad.wmflabs" [operations/puppet] - https://gerrit.wikimedia.org/r/122342 (owner: Hashar)
[10:20:16] (PS2) Dzahn: lint fundraising [operations/puppet] - https://gerrit.wikimedia.org/r/122331
[10:20:20] (CR) Dzahn: lint fundraising (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/122331 (owner: Dzahn)
[10:22:53] !change 121574 | hashar
[10:23:00] (CR) Matanya: [C: 1] lint fundraising [operations/puppet] - https://gerrit.wikimedia.org/r/122331 (owner: Dzahn)
[10:25:41] (CR) Dzahn: [C: 2] Fix repos with checked-in .gitmodules [operations/puppet] - https://gerrit.wikimedia.org/r/121574 (owner: Ryan Lane)
[10:26:05] !gerrt 121575
[10:26:08] !gerrit 121575
[10:26:16] STUPID BOT OF DOOM IS DEAD AGAIN
[10:27:30] ah
[10:27:30] Fix repos with checked-in .gitmodules
[10:27:30] https://gerrit.wikimedia.org/r/121574
[10:27:31] :-]
[10:28:00] * mutante merges that and also a change to vagrant ori forgot to merge on palladium
[10:28:10] which changes the home directory of vagrant user
[10:28:19] ori:
[10:28:42] mutante: can you run puppetd on tin.eqiad.wmnet so I can test the fix?
[10:28:44] hashar: , yes, done. no deployment right now, heh
[10:28:52] yes
[10:29:22] running
[10:33:25] hashar: it ran once, not sure if that was it already..looking
[10:33:31] trying
[10:35:54] hashar: deploy.py is on other node ..
[10:36:00] salt_master.pp
[10:36:10] checks on palladium
[10:36:37] salt_master.pp: require => [File["${module_dir}/deploy.py"]],
[10:37:04] eh, i should say source => 'puppet:///modules/deployment/modules/deploy.py'
[10:37:16] notice: /Stage[main]/Deployment::Salt_master/File[/srv/salt/_modules/deploy.py]/content: content changed
[10:37:20] try now
[10:38:08] (CR) Dzahn: "has been deployed on salt master (palladium). notice: /Stage[main]/Deployment::Salt_master/File[/srv/salt/_modules/deploy.py]/content: con" [operations/puppet] - https://gerrit.wikimedia.org/r/121574 (owner: Ryan Lane)
[10:38:40] mutante: that fixed it thanks
[10:39:20] (CR) Hashar: "Bug fixed!" [operations/puppet] - https://gerrit.wikimedia.org/r/121574 (owner: Ryan Lane)
[10:39:32] hashar: great!
[10:39:48] thank you
[10:40:23] yw, thank Ryan for the fix
[10:40:38] (btw, i met him at salt user group last week0
[10:45:49] (PS1) Hashar: contint: browsertests need ruby1.9.1-dev [operations/puppet] - https://gerrit.wikimedia.org/r/122346
[10:48:25] (CR) Hashar: [C: 1 V: 2] "Cherry picked on integration puppet master integration-puppetmaster.eqiad.wmflabs.org" [operations/puppet] - https://gerrit.wikimedia.org/r/122346 (owner: Hashar)
[10:48:52] !log Jenkins fixed up browsertests jobs. Bundler could not compile gems on the eqiad slaves {{gerrit|122346}}
[10:48:58] Logged the message, Master
[10:58:44] (PS2) Dzahn: decom Tampa: remove service IPs [operations/dns] - https://gerrit.wikimedia.org/r/120063
[10:58:46] (CR) jenkins-bot: [V: -1] decom Tampa: remove service IPs [operations/dns] - https://gerrit.wikimedia.org/r/120063 (owner: Dzahn)
[11:00:07] thanks jenkins, i just clicked in web ui to change commit message :p
[11:00:21] hah, yea, rebasing
[11:05:58] (PS3) Dzahn: decom Tampa: remove service IPs [operations/dns] - https://gerrit.wikimedia.org/r/120063
[11:07:11] (PS4) Dzahn: decom Tampa: remove service IPs [operations/dns] - https://gerrit.wikimedia.org/r/120063
[11:13:47] (PS1) Dzahn: remove more Tampa search remnants [operations/dns] - https://gerrit.wikimedia.org/r/122350
[11:14:57] (PS5) Dzahn: decom Tampa: remove service IPs [operations/dns] - https://gerrit.wikimedia.org/r/120063
[11:15:40] (CR) Dzahn: [C: 1] "to confirm, see pybal config" [operations/dns] - https://gerrit.wikimedia.org/r/120063 (owner: Dzahn)
[11:24:08] (CR) Matanya: [C: 1] remove more Tampa search remnants [operations/dns] - https://gerrit.wikimedia.org/r/122350 (owner: Dzahn)
[11:24:37] (CR) Matanya: [C: 1] decom Tampa: remove service IPs [operations/dns] - https://gerrit.wikimedia.org/r/120063 (owner: Dzahn)
[11:37:37] (CR) Faidon Liambotis: [C: 2] Varnish: add ?download parameter to upload [operations/puppet] - https://gerrit.wikimedia.org/r/120617 (owner: Gilles)
[11:57:34] !log Jenkins: updating jslint jobs to run a PHP based json linter, will bails out whenever json files are invalid. {{bug|58279}} {{gerrit|113958}}
[11:57:40] Logged the message, Master
[11:59:00] (PS1) Matanya: glance: lint [operations/puppet] - https://gerrit.wikimedia.org/r/122353
[12:21:12] hashar: salt -G 'rolename:role::ci::*' grains.items
[12:21:43] :-D
[12:21:57] (to combine the answer how to get all grains, all the values of the grains AND use a wildcard on multiple roles with common name)
[12:22:08] replied on list
[12:22:08] got documented at https://wikitech.wikimedia.org/wiki/Salt :D
[12:22:59] thx
[12:24:22] yw, so many grains
[12:24:47] we just need to make the role names follow common schema
[12:25:15] and i'd say drop "role" from the value. key is called "rolename", no need to repeat "role" in the value
[12:26:19] same as puppet class. it's already called system::role, no need to have "role::" in the name
[12:26:47] that's why the syntax becomes "rolename:role::foo::bla"
[13:01:11] hashar, you can't go :((( seems you killed the dns resolution for beta
[13:15:24] (PS1) Dzahn: fix bastion host system::roles [operations/puppet] - https://gerrit.wikimedia.org/r/122399
[13:17:20] (PS2) Dzahn: fix bastion host system::roles [operations/puppet] - https://gerrit.wikimedia.org/r/122399
[13:22:51] (PS3) Dzahn: include 'bastionhost' on bastion hosts [operations/puppet] - https://gerrit.wikimedia.org/r/122399
[13:25:40] (CR) Dzahn: "originally just wanted to make the system::role consistent, then noticed just some bastions use the module" [operations/puppet] - https://gerrit.wikimedia.org/r/122399 (owner: Dzahn)
[13:28:23] (CR) Dzahn: "duplicate of Change-Id: Ia24c0219c3b3ac3dc3b1f50c963f814b72302da1" [operations/puppet] - https://gerrit.wikimedia.org/r/122353 (owner: Matanya)
[13:32:13] (PS1) Ottomata: [WIP] Adding research posix group [operations/puppet] - https://gerrit.wikimedia.org/r/122401
[13:33:21] mutante: partial dup :)
[13:33:44] you want me to abandon, or merge changes?
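The Salt targeting tip at 12:21 (`salt -G 'rolename:role::ci::*' grains.items`) relies on glob matching against the values of the `rolename` grain. The sketch below approximates that matching locally with Python's `fnmatch.fnmatchcase`. The minion names and role lists are made up, and Salt's own matcher may differ in details such as case handling or how the `key:value` target is split. The sketch also shows why the trailing `*` in `role::ci::*` catches every CI role: in glob syntax, `*` matches further `::` separators too.

```python
from fnmatch import fnmatchcase


def match_minions(rolename_grains, pattern):
    """Return minions whose 'rolename' grain has a value matching pattern.

    rolename_grains maps minion id -> list of role names (hypothetical data).
    A minion matches if any one of its role values matches the glob.
    """
    return sorted(
        minion
        for minion, roles in rolename_grains.items()
        if any(fnmatchcase(role, pattern) for role in roles)
    )


if __name__ == "__main__":
    grains = {
        "minion-a": ["role::ci::foo"],
        "minion-b": ["role::ci::bar::baz", "role::gerrit"],
        "minion-c": ["role::parsoid::beta"],
    }
    print(match_minions(grains, "role::ci::*"))
```

Dropping the redundant `role::` prefix from the values, as suggested at 12:25, would shorten the pattern to something like `ci::*`. The matching logic itself would not change.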
[13:33:52] (PS1) Hashar: beta: update deployment-cache-upload address [operations/puppet] - https://gerrit.wikimedia.org/r/122402
[13:34:29] !log fundraising-system full-stop for hardware repairs on queue server
[13:34:35] Logged the message, Master
[13:35:13] matanya: you can pick one:)
[13:35:16] (CR) Hashar: [C: 1 V: 2] "cherry picked on deployment prep puppet master deployment-salt.eqiad.wmflabs" [operations/puppet] - https://gerrit.wikimedia.org/r/122402 (owner: Hashar)
[13:35:55] ok, merge yours and ill rebase mutante fair?
[13:39:50] ottomata: di you mail the remaining 6 stat1 account holders ?
[13:40:20] *did
[13:42:11] no
[13:42:23] plan to ?
[13:43:09] hadn't! but maybe I should!
[13:45:07] PROBLEM - Host silicon is DOWN: PING CRITICAL - Packet loss = 100%
[13:46:22] Jeff_Green: hey
[13:46:24] ^
[13:46:49] mutante: yep, thanks. it's being rebooted for a disk replacement. we forgot to disable notification
[13:47:21] good, that's the best answer:)
[13:48:09] !log icinga alerts disabled for silicon
[13:48:14] Logged the message, Master
[13:48:22] matanya: emailed,
[13:48:24] danke
[13:48:30] ACKNOWLEDGEMENT - Host silicon is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn disk being replaced
[13:48:39] thanks to you :)
[13:50:17] RECOVERY - Host silicon is UP: PING OK - Packet loss = 0%, RTA = 1.08 ms
[13:52:10] Jeff_Green:can you please to change the topic to reflect you are on the duty ?
[13:52:17] -to
[13:52:25] yep.
[13:52:43] thanks
[13:55:16] (PS1) Dzahn: remove db67 from coredb,decom db64,db65,db66,db70 [operations/puppet] - https://gerrit.wikimedia.org/r/122406
[13:55:25] (PS1) Hashar: beta: update deployment-cache-upload address [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122407
[13:55:45] (CR) Hashar: [C: 2] beta: update deployment-cache-upload address [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122407 (owner: Hashar)
[13:56:13] (Merged) jenkins-bot: beta: update deployment-cache-upload address [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122407 (owner: Hashar)
[13:58:17] (CR) Dzahn: [C: 2] "51.17.68.10.in-addr.arpa domain name pointer deployment-cache-upload02.eqiad.wmflabs." [operations/puppet] - https://gerrit.wikimedia.org/r/122402 (owner: Hashar)
[14:02:08] (CR) Dzahn: [C: 2] beta: point autoupdater to use /data/project [operations/puppet] - https://gerrit.wikimedia.org/r/121360 (owner: Hashar)
[14:03:05] (PS2) Dzahn: beta: bring in scap related scripts on bastion [operations/puppet] - https://gerrit.wikimedia.org/r/121365 (owner: Hashar)
[14:03:14] (CR) Dzahn: [C: 2] beta: bring in scap related scripts on bastion [operations/puppet] - https://gerrit.wikimedia.org/r/121365 (owner: Hashar)
[14:03:41] mutante: do you plan on a dns patch for those db's remove?
[14:05:21] matanya: if you wanna make it, go ahead:)
[14:05:23] yes
[14:05:29] ok, i will
[14:05:33] thanks!
[14:08:07] (CR) Dzahn: [C: 2] role::parsoid::beta needs contint slave scripts [operations/puppet] - https://gerrit.wikimedia.org/r/120823 (owner: Hashar)
[14:10:39] (PS1) Matanya: db: remove 64-67 and 70 [operations/dns] - https://gerrit.wikimedia.org/r/122412
[14:11:10] springle: https://gerrit.wikimedia.org/r/#/c/122406/1 :)
[14:11:15] how about db67
[14:12:36] I'm going to be messing with the logstash cluster for a bit. I'm upgrading the elasticsearch there to the latest 1.0.x version in our apt repo
[14:13:25] !log Stopped logstash on logstash1001
[14:13:31] Logged the message, Master
[14:16:07] PROBLEM - ElasticSearch health check on logstash1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.137
[14:16:17] PROBLEM - ElasticSearch health check on logstash1003 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.136
[14:16:27] PROBLEM - ElasticSearch health check on logstash1001 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.138
[14:16:31] !log Stopped elasticsearch on logstash100[123]
[14:16:37] Logged the message, Master
[14:17:21] (CR) Andrew Bogott: [C: 1] retab role/glance [operations/puppet] - https://gerrit.wikimedia.org/r/121667 (owner: Dzahn)
[14:17:34] bd808: I have switched beta cluster to EQIAD :-]
[14:17:35] bd808: thanks for the heads up
[14:18:03] hashar: yay!:)
[14:18:09] Jeff_Green: Could you please abandon the old changes at https://gerrit.wikimedia.org/r/#/q/owner:%22Jgreen+%253Cjgreen%2540wikimedia.org%253E%22+status:open,n,z that I linked at https://meta.wikimedia.org/wiki/User_talk:Jgreen#Open_review_requests_at_Gerrit? Thanks!
[14:18:30] !log beta cluster DNS entries migrated to point to the EQIAD datacenter. Keeping pmtpa instances around for a couple days
[14:18:35] Logged the message, Master
[14:19:31] (CR) Andrew Bogott: [C: 2] retab role/glance [operations/puppet] - https://gerrit.wikimedia.org/r/121667 (owner: Dzahn)
[14:19:42] :)
[14:20:30] hashar: re: eqiad switch https://gerrit.wikimedia.org/r/#/q/project:operations/puppet+owner:hashar+status:merged+topic:beta,n,z
[14:22:37] great
[14:22:44] mutante: I am deploying them on a local puppet master :D
[14:22:47] https://gerrit.wikimedia.org/r/#/c/122338/ :p
[14:22:49] (Abandoned) Jgreen: reenable otrs GenericAgent.pm [operations/puppet] - https://gerrit.wikimedia.org/r/78819 (owner: Jgreen)
[14:23:01] (PS2) Matanya: glance: lint [operations/puppet] - https://gerrit.wikimedia.org/r/122353
[14:23:05] !log Elasticsearch upgraded to 1.0.1 on logstash100[123]
[14:23:08] andrewbogott: ^^
[14:23:10] Logged the message, Master
[14:23:27] hashar: i remember you guys set that up recently, yea
[14:23:39] mutante: yeah Bryan did :]
[14:24:06] nice
[14:24:45] (Abandoned) Jgreen: puppetize otrs exim system_filter [operations/puppet] - https://gerrit.wikimedia.org/r/77840 (owner: Jgreen)
[14:27:46] (Abandoned) Jgreen: Revert "more fighting with puppet re. exim variables" [operations/puppet] - https://gerrit.wikimedia.org/r/77475 (owner: Jgreen)
[14:28:53] !log Started logstash on logstash1001
[14:28:58] Logged the message, Master
[14:29:21] the speed of gerrit!
[14:30:17] Jeff_Green: its like the flash!
[14:30:53] (Abandoned) Jgreen: start puppetizing new OTRS host iodine [operations/puppet] - https://gerrit.wikimedia.org/r/77227 (owner: Jgreen)
[14:32:19] it was fine until you guys started working .. hush :
[14:32:21] :)
[14:33:29] manybubbles: Some shards are still recovering, but it looks like the upgrade worked as expected.
[14:33:35] (CR) Andrew Bogott: [C: 2] glance: lint [operations/puppet] - https://gerrit.wikimedia.org/r/122353 (owner: Matanya)
[14:33:36] sweet
[14:33:40] bd808: I'm happy to hear that!
[14:33:46] It'll take a few shards a while to recover
[14:33:54] the primaries come much faster
[14:35:56] (Abandoned) Jgreen: Revert "try setting some variables in my.cnf for fundraising db" [operations/puppet] - https://gerrit.wikimedia.org/r/72157 (owner: Jgreen)
[14:36:11] Jeff_Green: https://gerrit.wikimedia.org/r/#/c/122331/1 ?
[14:36:56] mutante: accessing..
[14:36:59] accessing...
[14:37:02] accessing...
[14:37:04] * Jeff_Green dies
[14:38:31] :..yea, confirmed it became slow again
[14:39:06] and next reload.. much better
[14:39:12] try this https://gerrit.wikimedia.org/r/#/c/122331/2/manifests/role/fundraising.pp
[14:40:47] i wonder if gerrit would be happier with double the RAM
[14:41:31] there is one single thing about that, $exim_signs_dkim/bounce_collectors must not be used in any if-statements and still work when they are boolean instead of string
[14:41:56] i'd expect that the RAM option has been discussed before
[14:42:02] but not sure
[14:43:24] accessing...accessing.accessing.
[14:44:20] re. $exim_* ok
[14:46:49] andrewbogott: thanks for that merge. i have more labs infra related stuff anytime you feel like it:)
[14:47:29] yea, gerrit slowness reports
[14:48:00] mutante: merged
[14:48:03] well
[14:48:05] (CR) Jgreen: [C: 2 V: 1] lint fundraising [operations/puppet] - https://gerrit.wikimedia.org/r/122331 (owner: Dzahn)
[14:48:11] "Working..." anyway
[14:48:15] Jeff_Green: :) yay
[14:48:29] HM
[14:48:32] 502’s?
[14:48:51] now plz nobody make me look at gerrit any more today, it's a blackhole for time
[14:48:55] Same as sjoerddebruin. Getting them quite a bit.
[14:50:58] +1
[14:51:00] nlwiki is working, zeawiki not
[14:54:05] bd808: https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Logstash%20cluster%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1396277586&g=network_report&z=large
[14:54:24] (PS1) Hashar: gerrit: remove one replication to gallium [operations/puppet] - https://gerrit.wikimedia.org/r/122419
[14:54:24] * Jeff_Green is confused about the whether there is an Ops meeting today
[14:55:50] mutante: That's expected. The shards are recovering after I upgraded the elasticsearch version this morning. It's close to done but will probably be high io/cpu for another 20 minutes or so.
[14:56:11] mutante: Also, thanks for checking that and pinging me
[14:56:19] (PS1) Hashar: beta: update logstash url in monitor fatals [operations/puppet] - https://gerrit.wikimedia.org/r/122420
[14:56:55] mutante: another super simple change if you want https://gerrit.wikimedia.org/r/#/c/122420/ :D
[14:57:48] bd808: yea, no worries, i just glanced at it because you announced work
[14:58:15] sjoerddebruin: zea works for me
[14:58:34] does anyone know about the 502 reports?
[14:59:49] hashar_: slowness on gerrit .. or i'd be on it already
[15:00:02] that just started around the time of switching beta ,fwiw
[15:00:11] oh interesting
[15:00:15] no idea if it can be related though
[15:00:58] I dont think
[15:01:12] well, hmm, intermittent slowness...
[15:01:18] beta hits Gerrit via an automatic Jenkins job that fetch every six minutes. But I have created that job sometime last week
[15:01:22] yea, unlikely
[15:01:29] nods
[15:02:00] cpu WIO is higher
[15:02:50] (CR) Dzahn: [C: 2] beta: update logstash url in monitor fatals [operations/puppet] - https://gerrit.wikimedia.org/r/122420 (owner: Hashar)
[15:04:26] Jeff_Green: the fundraising change is still on master..
[15:04:35] i was about to merge with hashar's change then
[15:06:04] eh, wait
[15:06:12] Gerrit Code Review: Merge "lint fundraising" into production (2aabae4)
[15:06:16] dzahn: lint fundraising (3808112)
[15:06:16] is that normal?
[15:07:30] mutante: ? I did the gerrit merge but I did not do the puppet-merge on palladium
[15:07:47] sorry for any misinformation
[15:08:27] RECOVERY - ElasticSearch health check on logstash1001 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 36: active_shards: 103: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0
[15:09:02] greg-g: {{done}} with logstash upgrade work.
[15:09:07] RECOVERY - ElasticSearch health check on logstash1002 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 36: active_shards: 103: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0
[15:09:17] bd808: awesome, g'morning
[15:09:17] RECOVERY - ElasticSearch health check on logstash1003 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 36: active_shards: 103: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0
[15:11:50] mutante: The network io graph for the logstash cluster dropped back to baseline after recovery as expected
[15:12:39] Jeff_Green: done, no worries, i was bit confused about puppet-merge showing 2 lines for that one thing
[15:13:05] bd808: confirmed:)
[15:14:30] bd808: and the logstash url should have changed now
[15:14:58] https://gerrit.wikimedia.org/r/#/c/122420/1/modules/beta/files/monitor_fatals.rb
[15:15:43] The site says: "Logstash (see [[office:User:BDavis_(WMF)/logstash]])"
[15:15:50] * bd808 nods
[15:15:55] that reminds me of some change to unify the $realm message, hehe
[15:16:18] I wish we had a better auth method for labs
[15:16:19] the login message
[15:16:40] (CR) Hashar: "Rebased the labs puppet master." [operations/puppet] - https://gerrit.wikimedia.org/r/122420 (owner: Hashar)
[15:16:46] bd808: use the wmf LDAP group?
[15:16:50] bd808: it's LDAP ?
[15:16:54] yea, that
[15:17:07] "a" group
[15:17:12] bd808: known weird parse errors on the default logstash view?
[15:21:00] hashar_, mutante: We shouldn't use ldap there because the apache that is checking is inside the labs environment and thus vulnerable to having the passwords entered captured. Ryan had an idea about adding openid/oauth to wikitech to allow strong auth inside labs.
[15:21:25] greg-g: Just seeing that now. I'll check it out
[15:22:09] * bd808 saw this in beta at some point
[15:23:04] greg-g: Try reloading.
[15:24:51] (CR) Dzahn: [C: 2] contint: browsertests need ruby1.9.1-dev [operations/puppet] - https://gerrit.wikimedia.org/r/122346 (owner: Hashar)
[15:29:10] bd808: I am not sure what would be the attack scenario
[15:29:10] bd808: a possibility would be to use the shared proxy in front of labs
[15:29:11] though it might be a labs instance as well
[15:29:11] bd808: :) all gone
[15:29:12] hashar_: Attack scenario is that anyone with sudo in the deployment-prep project could make a change to the apache config on deployment-logstash1 to log the http basic auth headers and capture a record of all ldap passwords used there.
[15:29:12] http basic auth is trivially reversible to plain text.
[15:29:12] (CR) Greg Grossmeier: [C: 1] "+1'ing in spirit (but heed James' previous comment)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121892 (owner: Jforrester)
[15:31:43] bd808: sounds great :-]
[15:32:31] (PS2) Dzahn: remove db67 from coredb,decom db64,db65,db66,db70 [operations/puppet] - https://gerrit.wikimedia.org/r/122406
[15:32:39] greg-g: The new version of elasticsearch and the version of kibana we are running are not getting along for adding filters to exclude/include based on the event type. I saw this in beta but sort of ignored it. I need to check and see if upgrading kibana fixes it (which seems likely).
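bd808's point that HTTP basic auth is trivially reversible can be made concrete: the Authorization header carries only a base64 encoding of "user:password" (RFC 7617). That is an encoding, not encryption, so anyone able to log request headers on the instance recovers the password. A minimal Python sketch; the function name and the credentials used to exercise it are hypothetical.

```python
import base64


def decode_basic_auth(header_value):
    """Recover (user, password) from an HTTP Basic Authorization header value.

    Illustrative only: shows that Basic credentials are merely base64-encoded.
    """
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # RFC 7617: the user-id cannot contain ':', so split on the first colon only.
    user, _, password = decoded.partition(":")
    return user, password


# Hypothetical credentials, encoded the way a browser would send them.
header = "Basic " + base64.b64encode(b"example-user:not-a-real-password").decode("ascii")
print(decode_basic_auth(header))  # ('example-user', 'not-a-real-password')
```

No key or secret is involved in the decoding, which is why a sudo-capable project member logging headers is enough to harvest LDAP passwords.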
[15:32:52] (CR) Legoktm: [C: -1] Use the BetaFeatures whitelist for production to avoid accidental deploys (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121892 (owner: Jforrester)
[15:33:22] !log reenable fundraising services, reenable silicon icinga monitoring
[15:33:27] Logged the message, Master
[15:33:39] (CR) Jforrester: Use the BetaFeatures whitelist for production to avoid accidental deploys (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121892 (owner: Jforrester)
[15:33:57] bd808: gotcha
[15:34:03] James_F: I don't follow
[15:34:51] legoktm: the wg == wmg thing isn't pointful, and isn't always followed. I don't see a reason to waste yet more processing time on this practice here.
[15:35:02] It won't work if you don't do that.
[15:35:12] It won't?
[15:35:15] Nope
[15:35:18] How?
[15:35:34] wgConf will extract the 'wgBetaFeaturesWhitelist' to $wgBetaFeaturesWhitelist
[15:35:46] then require_once "$IP/extensions/BetaFeatures/BetaFeatures.php"
[15:35:53] which will set $wgBetaFeaturesWhitelist = null
[15:36:25] which is why you need to set it as a $wmg, and then set it after the extension is require_once'd
[15:36:27] So the other ones work magically?
[15:36:45] for anything in core it'll work magically. All extensions have to use wmg
[15:36:54] Ah. Helpful.
[15:36:55] * James_F sighs.
[15:37:51] neat
[15:38:20] yeah, it's annoying
[15:38:42] git review -d 122333 ... Branch already exists - reusing .. Switched to branch "review/dzahn/120956"
[15:38:46] good thing config db will (hopefully) fix this ;)
[15:38:53] <- isn't that the wrong branch?
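legoktm's explanation of why extension settings need the wmg indirection is an ordering problem. The Python sketch below models only the sequence described in the log: wgConf extracts per-wiki values into globals, then the require_once of BetaFeatures.php resets $wgBetaFeaturesWhitelist to null, then the wmg value is copied back afterwards. It is a hypothetical model, not MediaWiki code, and "ExampleFeature" is a placeholder value.

```python
# Hypothetical model of the load order described in the log; not MediaWiki code.
CORE_DEFAULTS = {"wgSitename": "MediaWiki"}             # core defaults, loaded first
EXTENSION_DEFAULTS = {"wgBetaFeaturesWhitelist": None}  # what BetaFeatures.php sets, per the log


def boot(per_wiki_settings):
    """Return the globals left after the simplified load sequence."""
    g = dict(CORE_DEFAULTS)
    # 1. wgConf extracts this wiki's settings into globals.
    g.update(per_wiki_settings)
    # 2. require_once of the extension assigns its defaults, clobbering any
    #    value with the same name that was extracted in step 1.
    g.update(EXTENSION_DEFAULTS)
    # 3. wmg pattern: copy the wmg value into the wg global after require_once.
    if "wmgBetaFeaturesWhitelist" in g:
        g["wgBetaFeaturesWhitelist"] = g["wmgBetaFeaturesWhitelist"]
    return g


print(boot({"wgBetaFeaturesWhitelist": ["ExampleFeature"]})["wgBetaFeaturesWhitelist"])   # None
print(boot({"wmgBetaFeaturesWhitelist": ["ExampleFeature"]})["wgBetaFeaturesWhitelist"])  # ['ExampleFeature']
```

In this model a core setting such as wgSitename survives step 1 because core defaults are applied before extraction, which matches "for anything in core it'll work magically".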
[15:39:12] (PS3) Jforrester: Use the BetaFeatures whitelist for production to avoid accidental deploys [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121892
[15:39:21] legoktm: {{done}}
[15:39:49] (CR) Legoktm: [C: 1] Use the BetaFeatures whitelist for production to avoid accidental deploys [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121892 (owner: Jforrester)
[15:44:00] Hi, someone just reported the following error message on Wikipedia: http://pastebin.com/JG01V7fJ -- relevant for you?
[15:45:02] pajz: people might be poking at the search servers
[15:45:09] bd808: manybubbles ^
[15:45:22] not those search servers
[15:45:33] it's a search request
[15:45:54] bd808 did the logstash ES servers, which is a separate cluster from the production search/cirrusearch cluster
[15:46:00] hmmm
[15:46:22] the search link in pastebin works for me
[15:46:43] Is dewiki primary on cirrus or still using the old search?
[15:46:49] no idea
[15:47:20] (PS1) Nuria: Adding new scheduler mode to wikimetrics. [operations/puppet] - https://gerrit.wikimedia.org/r/122425
[15:47:38] lets see
[15:47:51] * bd808 thinks cirrus is a beta feature there
[15:49:04] bd808: it is
[15:50:38] it is secondary
[15:50:58] aude: ^^ . you can use cirrus to work around whatever issue lsearchd is having
[15:51:02] it seems to be working for me though
[15:51:10] lsearchd, that is
[15:52:41] i suppose no issue, unless it happens again / it's reported again / etc
[15:52:52] just some temporary glitch
[15:53:38] pajz: thanks for the report, probably temporary, let us know if it happens more regularly
[15:53:48] good, thanks.
[15:55:32] (PS2) Dzahn: lint role/gerrit [operations/puppet] - https://gerrit.wikimedia.org/r/122333
[15:56:06] (CR) Manybubbles: [C: 1] remove more Tampa search remnants [operations/dns] - https://gerrit.wikimedia.org/r/122350 (owner: Dzahn)
[16:04:46] (PS1) Dzahn: role/gerrit: single quoted string containing a variable [operations/puppet] - https://gerrit.wikimedia.org/r/122426
[16:06:23] greg-g: I just tested the latest upstream kibana in beta and it fixes the filtering issue. Since I still have some time in my deploy window can I get a green light to upgrade production as well?
[16:06:41] It's a git-deploy operation
[16:07:08] (PS3) Dzahn: lint role/gerrit [operations/puppet] - https://gerrit.wikimedia.org/r/122333
[16:07:16] bd808: yessir
[16:08:07] (CR) Dzahn: [C: 2] remove more Tampa search remnants [operations/dns] - https://gerrit.wikimedia.org/r/122350 (owner: Dzahn)
[16:08:43] !log DNS update - remove tampa search pools (a second time, was duplicate, heh)
[16:08:49] Logged the message, Master
[16:11:07] !log Upgraded kibana on logstash cluster to e317bc663495d0172339a4d4ace9c2a580ceed45
[16:11:12] Logged the message, Master
[16:13:06] greg-g: {{done}} and filtering by type seems to be fixed. https://logstash.wikimedia.org/#/dashboard/elasticsearch/exceptionmonitor works again now.
[16:13:57] (PS6) Dzahn: decom Tampa: remove service IPs [operations/dns] - https://gerrit.wikimedia.org/r/120063
[16:14:57] bd808: really? " Oops! QueryParsingException[[logstash-2014.03.31] No query registered for [field]]"
[16:15:37] greg-g: You may need to force reload logstash.wikimedia.org to get latest javascript.
[16:16:07] I thought I did...
[16:16:18] * bd808 tries
[16:16:38] (CR) Dzahn: [C: 1] "split out of Change-Id: If76480fb861abb9e872df89abac6d0857205f07a" [operations/puppet] - https://gerrit.wikimedia.org/r/122426 (owner: Dzahn)
[16:16:52] bd808: fourth time's a charm
[16:17:17] greg-g: good. There may be varnish cache in the way as well I suppose
[16:18:08] (CR) Alexandros Kosiaris: [C: 2] role/gerrit: single quoted string containing a variable [operations/puppet] - https://gerrit.wikimedia.org/r/122426 (owner: Dzahn)
[16:31:39] (CR) Aaron Schulz: [C: 2] Added "downloadpdf" pool counter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/119220 (owner: Aaron Schulz)
[16:31:49] (CR) jenkins-bot: [V: -1] Added "downloadpdf" pool counter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/119220 (owner: Aaron Schulz)
[16:35:22] (PS2) Aaron Schulz: Added "downloadpdf" pool counter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/119220
[16:35:36] (CR) Aaron Schulz: [C: 2] Added "downloadpdf" pool counter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/119220 (owner: Aaron Schulz)
[16:35:44] (Merged) jenkins-bot: Added "downloadpdf" pool counter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/119220 (owner: Aaron Schulz)
[16:36:27] (CR) QChris: [C: 1] "I do not know of other parts using that replication target." [operations/puppet] - https://gerrit.wikimedia.org/r/122419 (owner: Hashar)
[16:37:27] !log aaron synchronized wmf-config/PoolCounterSettings-eqiad.php 'Added "downloadpdf" pool counter config'
[16:37:33] Logged the message, Master
[16:40:22] (PS4) Dzahn: lint role/gerrit [operations/puppet] - https://gerrit.wikimedia.org/r/122333
[16:44:15] (CR) Dzahn: [C: -1] "sorry, rebasing hell now" [operations/puppet] - https://gerrit.wikimedia.org/r/122333 (owner: Dzahn)
[16:44:48] (CR) Matanya: "as said before, must check callers for unquote bools, to verify there is not string/bool comparison." [operations/puppet] - https://gerrit.wikimedia.org/r/122333 (owner: Dzahn)
[16:45:39] (Abandoned) Dzahn: lint role/gerrit [operations/puppet] - https://gerrit.wikimedia.org/r/122333 (owner: Dzahn)
[16:47:37] (CR) Matanya: [C: 1] decom Tampa: remove service IPs [operations/dns] - https://gerrit.wikimedia.org/r/120063 (owner: Dzahn)
[16:47:59] (CR) Dzahn: "arr.. "mid-air collision". just saw new comment" [operations/puppet] - https://gerrit.wikimedia.org/r/122333 (owner: Dzahn)
[16:54:14] (CR) Dzahn: [WIP] Adding research posix group (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/122401 (owner: Ottomata)
[17:12:06] does somebody already said that gerrit is so slow? :P I don't want to have time to brew coffee when I'm only pulling changes from git..
[17:23:46] Reedy, hi, around?
[17:24:47] yup
[17:24:53] just about to not be for an hour or so
[17:35:55] (PS1) Hashar: beta: lower # of procs on jobrunner [operations/puppet] - https://gerrit.wikimedia.org/r/122436
[17:36:06] (PS1) Aaron Schulz: Added "downloadtiff" pool counter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122437
[17:37:30] the beta cluster could really need less jobrunner procs being spawned please :D https://gerrit.wikimedia.org/r/122436
[17:37:45] basically cut by more than half the # of jobs being started on beta
[17:40:27] (PS1) QChris: Revert "role/gerrit: single quoted string containing a variable" [operations/puppet] - https://gerrit.wikimedia.org/r/122438
[17:42:31] (CR) QChris: "Examples of ${name} use can be found for example on the" [operations/puppet] - https://gerrit.wikimedia.org/r/122438 (owner: QChris)
[17:42:36] (CR) Alexandros Kosiaris: [C: 2] "OK, this is something we were not aware of. Merging this and will comment this to avoid similar problems in the future." [operations/puppet] - https://gerrit.wikimedia.org/r/122438 (owner: QChris)
[17:46:54] (PS1) Alexandros Kosiaris: Comment gerrit's usage of ${name} [operations/puppet] - https://gerrit.wikimedia.org/r/122439
[17:48:22] (CR) QChris: [C: 1] Comment gerrit's usage of ${name} [operations/puppet] - https://gerrit.wikimedia.org/r/122439 (owner: Alexandros Kosiaris)
[17:48:53] (CR) QChris: "Thanks :-)" [operations/puppet] - https://gerrit.wikimedia.org/r/122439 (owner: Alexandros Kosiaris)
[17:50:16] (CR) Dzahn: "thanks! But while this explains why things worked, isn't it still a lint thing to be fixed? puppet-lint even calls it an ERROR, not just a" [operations/puppet] - https://gerrit.wikimedia.org/r/122438 (owner: QChris)
[17:51:50] (CR) Dzahn: "ah, i just saw on Change-Id: If732fcb1f8ea74ae4ceddd3cf53e1c0c3e64bc33" [operations/puppet] - https://gerrit.wikimedia.org/r/122438 (owner: QChris)
[17:53:25] qchris_meeting: akosiaris ... dduh :) at least i had it separated from the other lint :p
[17:53:38] of course puppet-lint doesn't get this:)
[17:54:21] yeah. anyway at least now it is documented
[17:54:57] yes, thanks
[18:00:05] (PS2) Alexandros Kosiaris: beta: lower # of procs on jobrunner [operations/puppet] - https://gerrit.wikimedia.org/r/122436 (owner: Hashar)
[18:03:25] Reedy, bummer, missed your reply, let me know when would be a good time to work on the zero portal wiki
[18:10:19] (CR) Alexandros Kosiaris: [C: 2] Comment gerrit's usage of ${name} [operations/puppet] - https://gerrit.wikimedia.org/r/122439 (owner: Alexandros Kosiaris)
[18:16:32] akosiaris, mutante: Sorry for being terse on gerrit (I suck at multitasking). And thanks for the quick revert.
[18:16:35] I saw the change to add comments around the use of ${name}. Thanks!
[18:16:36] akosiaris: https://rt.wikimedia.org/Ticket/Display.html?id=7133 would help unfuck beta a lot, much love if you can get to that somehow
[18:16:38] Next to that ... is there anything I could do to make the linter more happy about the file?
[18:23:02] rt testing is now updated to fe776d0b
[18:23:14] eh, wrong channel
[18:31:18] <^d> akosiaris: Heh, ${name} look confusing?
[18:31:19] <^d> :)
[18:34:52] (CR) Krinkle: [C: 1] gerrit: remove one replication to gallium [operations/puppet] - https://gerrit.wikimedia.org/r/122419 (owner: Hashar)
[18:37:08] yurik: Here now
[18:37:31] Reedy, hey, missed you last week :)
[18:37:42] (CR) Chad: [C: 1] "lgtm, less replication++. Just needs an ops merge and me or Christian can restart the replication plugin to pick up config." [operations/puppet] - https://gerrit.wikimedia.org/r/122419 (owner: Hashar)
[18:37:58] bah
[18:38:01] gerrit is watchmouse paging
[18:38:13] <^d> gerrit's UI is being laggy again
[18:38:26] have you tried turning it off and on again?
[18:38:40] ^d: just checking there is nothing that ops can do for this right now right?
[18:38:45] Reedy, https://gerrit.wikimedia.org/r/#/c/119985/ - when do you think we can start deploying it?
[18:38:46] or should one of us be poking it?
[18:38:56] (wasnt sure since we are in our ops meeting right now)
[18:39:12] <^d> Nothing anyone can do really.
[18:39:16] <^d> Ops or otherwise.
[18:39:19] Reedy, followed by https://gerrit.wikimedia.org/r/#/c/119990/ i guess (i don't have any -1 to it, just a few clarification comments)
[18:39:46] ^d: cool, i just wanted to find out before folks left meeting to find out. thanks =]
[18:40:02] yurik: Mostly just need to find someone in ops to do it - review, merge, deploy, graceful
[18:40:12] greg-g, can i push out an update to static resources on bits (firefox OS update)
[18:41:13] Reedy, ok, will bug ops then. What do you think about my comments in 119990 ?
[18:41:32] are you waiting for apache to do mw-config?
[18:41:40] or should we wait the other way around?
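The ${name} revert turns on Puppet's quoting rules: single-quoted strings are not interpolated, so a literal ${name} reaches the generated file, whereas double-quoting it (presumably what a fix for puppet-lint's single-quoted-string-with-variables error would do) lets Puppet substitute its own $name first. A rough Python model, assuming, as the revert and the "Comment gerrit's usage of ${name}" change suggest, that the literal is a placeholder Gerrit's replication plugin fills in per project. The URL and class name are hypothetical, and string.Template stands in for both Puppet and Gerrit interpolation.

```python
from string import Template

# Hypothetical replication target; the real value in role/gerrit is not shown in the log.
REPLICATION_URL = "gerrit2@mirror.example.org:/srv/git/${name}.git"


def puppet_render(value, double_quoted, scope):
    """Rough model of Puppet quoting: only double-quoted strings interpolate."""
    return Template(value).safe_substitute(scope) if double_quoted else value


def replication_url_for(url_template, project):
    """Model of the replication plugin filling ${name} with a project name."""
    return Template(url_template).substitute(name=project)


scope = {"name": "role::gerrit::example"}  # hypothetical value of Puppet's $name
single = puppet_render(REPLICATION_URL, double_quoted=False, scope=scope)
double = puppet_render(REPLICATION_URL, double_quoted=True, scope=scope)
print(replication_url_for(single, "mediawiki/core"))  # gerrit2@mirror.example.org:/srv/git/mediawiki/core.git
print(replication_url_for(double, "mediawiki/core"))  # gerrit2@mirror.example.org:/srv/git/role::gerrit::example.git
```

With double quotes every project would be pushed to the same path, which is why the single quotes are deliberate here even though puppet-lint flags them.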
[18:41:43] I thought the latter
[18:41:47] apache first is fine
[18:41:59] it'll just point to the missing wiki stuff
[18:42:08] at worst, there's a couple of purges to do via purgeList.php
[18:42:09] yurik: yep yep
[18:42:23] greg-g, when's a good time?
[18:42:47] Reedy, will +2 of the config auto-create zero DB ?
[18:42:57] nope
[18:42:59] needs doing manually
[18:43:08] only takes a few minutes
[18:43:09] yurik: for that, anytime today
[18:43:24] greg-g, thx, will push it out in a bit
[18:45:02] kk
[18:49:55] ottomata: are you aware of the an1010 disk warning failure?
[18:50:09] nope
[18:50:28] you should check icinga more often :)
[18:50:31] ah warning
[18:50:37] i click on the critical one occasionally
[18:50:46] yeah
[19:22:51] (PS1) Ottomata: Periodically eleting old fairscheduler event logs [operations/puppet] - https://gerrit.wikimedia.org/r/122460
[19:22:58] (PS1) Yurik: Updated Firefox OS App [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122461
[19:23:03] (PS2) Ottomata: Periodically deleting old fairscheduler event logs [operations/puppet] - https://gerrit.wikimedia.org/r/122460
[19:28:54] (CR) Yurik: [C: 2 V: 2] Updated Firefox OS App [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122461 (owner: Yurik)
[19:31:21] !log yurik synchronized docroot/bits/WikipediaMobileFirefoxOS/
[19:31:26] Logged the message, Master
[19:31:33] dr0ptp4kt, ^
[19:31:51] yurik, thx, will take a peek
[19:37:28] (CR) Ottomata: [C: 2 V: 2] Periodically deleting old fairscheduler event logs [operations/puppet] - https://gerrit.wikimedia.org/r/122460 (owner: Ottomata)
[19:39:10] yurik, looks like it deployed cleanly. thanks!
[19:39:34] dr0ptp4kt, np. Are you updating the about screen? still shows 321 :)
[19:39:54] it shows 3.2.7 for me now. you may need to uninstall, then freshly install
[19:40:00] paravoid, will you be the one updating apache configs?
[19:40:26] or who usually deals with that?
[19:40:53] yurik, beware the simulator, too. you may actually have to completely uninstall the simulator, then reinstall the simulator. there's a glitch with the simulator where it's obviously retaining old files even though the appcache manifest clearly says to get the new files
[19:42:08] (PS2) Ottomata: Puppetizing Camus cronjob [operations/puppet] - https://gerrit.wikimedia.org/r/121546
[19:43:14] (PS3) Hashar: beta: lower # of procs on jobrunner [operations/puppet] - https://gerrit.wikimedia.org/r/122436
[19:58:10] (PS1) Matanya: access: revoke Leslie Carr [operations/puppet] - https://gerrit.wikimedia.org/r/122464
[19:58:52] (CR) Matanya: "make sure to revoke the key in private repo too before merging!" [operations/puppet] - https://gerrit.wikimedia.org/r/122464 (owner: Matanya)
[20:00:27] (CR) Chad: [C: 1] beta: lower # of procs on jobrunner [operations/puppet] - https://gerrit.wikimedia.org/r/122436 (owner: Hashar)
[20:02:46] Jeff_Green or RobH one of you mind doing that one ^^ please?
[20:03:46] (CR) RobH: [C: 2] access: revoke Leslie Carr [operations/puppet] - https://gerrit.wikimedia.org/r/122464 (owner: Matanya)
[20:04:31] RobH: you removed the key from private? didn't get a mail about that
[20:04:44] i'm doing right now actually =]
[20:04:56] merged public one first
[20:04:56] thanks
[20:06:16] thx for patchset
[20:06:31] np
[20:06:53] ticket updated
[20:07:36] steve has access?
[20:07:55] oh, found it
[20:07:57] pushing
[20:11:06] (PS1) Matanya: access: revoke steve bernardin [operations/puppet] - https://gerrit.wikimedia.org/r/122468
[20:11:15] this one too RobH ^
[20:11:46] (PS2) Matanya: access: revoke steve bernardin [operations/puppet] - https://gerrit.wikimedia.org/r/122468
[20:15:05] hrmm
[20:15:17] i dont think you can pull the entire class or it wont know where to pull those keys right?
[20:15:25] plus we'll use the class later, so easier to leave as reference
[20:15:33] (we'll be hiring a new dc tech for the new site)
[20:15:45] misleading since he was only one in it
[20:15:59] you think we need the class?
[20:16:39] (CR) RobH: access: revoke steve bernardin (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/122468 (owner: Matanya)
[20:16:43] RobH: i can put it back, but it was not called from anywhere and not seem to be used
[20:16:54] well, something calls it to put his keys on systems
[20:17:00] i think....
[20:17:08] but yea, lets leave it like we remove access for other users
[20:17:13] something == ?
[20:17:38] the dc tech class gives sudo rights for specific raid utilities
[20:17:51] so a non root dc tech can do some root level disk diagnosis
[20:18:19] so it also should give rights to those boxes where its needed. at least that was it's original intended task and it worked afaik
[20:18:22] ok, putting back
[20:18:42] i was never in the group though, chris was when first hired
[20:18:45] and then steve
[20:21:42] gerrit is sloooow
[20:22:15] (PS9) BryanDavis: Manage scap proxy rsync config in puppet [operations/puppet] - https://gerrit.wikimedia.org/r/119677 (owner: Reedy)
[20:23:48] (PS3) Matanya: access: revoke steve bernardin [operations/puppet] - https://gerrit.wikimedia.org/r/122468
[20:24:09] (CR) BryanDavis: "Updated with changes pointed out by Faidon. Responded to question about inclusion of 208.80.152.0/22 CIDR range with inline comment." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/119677 (owner: Reedy)
[20:24:27] now it is ready RobH
[20:25:18] ahh firefox, why you hate gerrit so
[20:25:32] (CR) RobH: [C: 2] access: revoke steve bernardin [operations/puppet] - https://gerrit.wikimedia.org/r/122468 (owner: Matanya)
[20:25:42] bah, hurry up zuul
[20:26:30] does he have key in private too?
[20:26:37] nope
[20:26:50] he never was full root
[20:27:03] (and i checked)
[20:27:24] ok, enough for today, i guess.
[20:27:29] thanks a lot
[20:28:20] thanks for doin the puppet work =]
[20:28:30] hrmm, its lunch time.
[20:36:18] (PS1) Kaldari: Add 'upload' custom debug group to wgDebugLogGroups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122549
[20:49:06] (CR) Hashar: "Deployed on beta cluster puppet master on deployment-salt.eqiad.wmflabs and applied on deployment-jobrunner01.eqiad.wmflabs." [operations/puppet] - https://gerrit.wikimedia.org/r/122436 (owner: Hashar)
[20:49:14] (CR) Hashar: [C: 1 V: 2] beta: lower # of procs on jobrunner [operations/puppet] - https://gerrit.wikimedia.org/r/122436 (owner: Hashar)
[20:58:54] (PS5) BryanDavis: Add mw1161 and mw1201 as scap proxies for EQIAD row C and D [operations/puppet] - https://gerrit.wikimedia.org/r/119686 (owner: Reedy)
[20:59:48] ^d: Do you have access to the apache logs on gerrit? Since gerrit is getting slow again and again around this time of day ... can it be that apache's connection limit is too low?
[21:00:43] qchris: isn't gerrit slow all day today?
[21:00:55] <^d> qchris: Yes I do.
[21:01:11] se4598: It always only gets slow for me in the UTC evenings.
[21:01:39] (PS1) Hashar: beta: send Parsoid log to shared dir [operations/puppet] - https://gerrit.wikimedia.org/r/122561
[21:01:39] ^d: Could you have a look on the number of connections we're getting more or less simultaneously?
[21:01:53] (PS2) Hashar: beta: send Parsoid log to shared dir [operations/puppet] - https://gerrit.wikimedia.org/r/122561
[21:02:00] today is slower as usual, took me around 10 min to fetch/pull git changes for mw core from gerrit
[21:02:13] se4598: :-(
[21:02:20] and that was approx. 6 hours ago
[21:02:52] before the usual more slower time range
[21:03:04] <^d> qchris: 2-3 requests/sec, on avg.
[21:03:14] <^d> Bursts of more, periods of less.
[21:03:19] Ok.
[21:03:30] So we're nowhere close to hitting the 25 req/second.
[21:03:43] Strange.
[21:03:51] Do we have any clue where the bottleneck in?
[21:03:54] s/in/is/?
[21:03:58] <^d> Offhand no.
[21:04:33] Do we wait for Phabricator to magically solve all our problems or do we wont to fix gerrit before?
[21:04:39] s/wont/want/
[21:05:12] <^d> qchris: I wouldn't mind phabricator. I wouldn't mind a band-aid to make this stop happening with gerrit too.
[21:05:17] <^d> If I knew wtf was going on
[21:06:55] (CR) Faidon Liambotis: [C: 2] Add zerowiki [operations/apache-config] - https://gerrit.wikimedia.org/r/119985 (owner: Reedy)
[21:06:58] Mhmm. Ok.
[21:07:26] (PS3) Hashar: beta: send Parsoid log to shared dir [operations/puppet] - https://gerrit.wikimedia.org/r/122561
[21:12:27] (CR) PiRSquared17: "This can be merged now." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113656 (owner: Gerrit Patch Uploader)
[21:13:34] (PS10) BryanDavis: Manage scap proxy rsync config in puppet [operations/puppet] - https://gerrit.wikimedia.org/r/119677 (owner: Reedy)
[21:13:47] (PS1) Hashar: parsoid: logrotated file is now a parameter [operations/puppet] - https://gerrit.wikimedia.org/r/122564
[21:14:52] (CR) BryanDavis: "I actually didn't see this commit in the chain and made these same changes in the updates I made to Iba7f8dc." [operations/puppet] - https://gerrit.wikimedia.org/r/119668 (owner: Reedy)
[21:17:53] (PS5) Mattflaschen: Have Commons on Beta Labs use $stdlogo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084
[21:26:39] (CR) Mattflaschen: "> I need to move File:Commons-Beta-logo.svg to File:Wiki.svg (or creating a Wiki.png?)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084 (owner: Mattflaschen)
[21:27:18] (CR) Mattflaschen: "To clarify, an image would need to be uploaded there." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084 (owner: Mattflaschen)
[21:28:59] (CR) Hashar: [C: 1 V: 2] "Deployed on beta cluster puppet master on deployment-salt.eqiad.wmflabs and applied on deployment-jobrunner01.eqiad.wmflabs." [operations/puppet] - https://gerrit.wikimedia.org/r/122561 (owner: Hashar)
[21:29:20] (CR) Hashar: [C: 1] "Deployed on beta cluster puppet master on deployment-salt.eqiad.wmflabs and applied on deployment-jobrunner01.eqiad.wmflabs." [operations/puppet] - https://gerrit.wikimedia.org/r/122564 (owner: Hashar)
[21:30:29] (PS1) Chad: Lower search suggestions to reasonable values [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122571
[21:48:11] Reedy: can you please clarify if you need access to stat1 ?
[22:00:12] (PS2) Yurik: Add zerowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/119990 (owner: Reedy)
[22:01:11] (CR) Yurik: "PS2 was a rebase" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/119990 (owner: Reedy)
[22:01:45] Reedy, rebased ^
[22:26:46] (PS1) Yurik: Added zerowiki to $private_wikis [operations/puppet] - https://gerrit.wikimedia.org/r/122588
[22:31:03] ops, please +2 ^ before creating the new zero portal db
[22:31:14] (i was looking at instructions at https://wikitech.wikimedia.org/wiki/Add_a_wiki#IMPORTANT:_For_Private_Wikis
[22:38:52] (PS1) BryanDavis: Beta: set math storage directory to NFS share [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122599
[22:41:22] Hm.. running mwgrep is giving me strange results. I've ran a search 20 minutes ago and gave me 1 result (which seemed plausible), then I run it again now and there's 20. Run it again a minute later and there's 12, then 17 then 19.
[22:41:23] tin$ mwgrep user.anonymous
[22:41:35] Some kind of staleness?
[22:41:57] These results shouldn't change that much (the pages in the results were not recently edited)
[22:42:04] different slaves maybe
[22:42:07] ori:
[22:45:17] RobH: me again
[22:45:35] ?
[22:47:26] tickets https://rt.wikimedia.org/Ticket/Display.html?id=7130 and https://rt.wikimedia.org/Ticket/Display.html?id=7131 should be resolved and a follow up ticket in tampa created
[22:49:01] RobH: ^
[22:51:06] Krinkle: hm, dunno
[22:51:21] what sort of consistency guarantees are there for the elasticsearch cluster
[22:51:51] Maybe slave lag, but seems like that isn't the case. These pages weren't recently modified and it keeps changing
[22:52:00] They aren't subtle differences in sorting either
[22:52:07] ^d or manybubbles would know
[22:52:28] These are radically different results, like pages that should be there not being there, on a result set of < 100 (so it's not rank that could cause it I think)
[22:53:47] TimStarling: can you please look at https://gerrit.wikimedia.org/r/#/c/74591/
[22:53:49] Krinkle: it could be the timeout setting
[22:53:58] if it is applied per ES instance
[22:54:05] maybe some nodes hit the timeout, and some do not
[22:54:06] and explain where/why this cron job disappeared ?
[22:54:12] but if you get your results in under 30s, then that's not it
[22:58:06] <^d> ori: Hmm?
[22:58:31] ^d: mwgrep (the small script that searches .js/.css pages in elastic for a string) is returning inconsistent results
[22:58:42] (PS3) Yurik: Add zerowiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/119990 (owner: Reedy)
[22:58:58] <^d> We've actually got a bug at the moment with JS pages.
[22:59:04] <^d> Haven't run it down yet.
[22:59:13] `tin$ mwgrep user.anonymous` keeps changing from 1, or even 0, to anywhere between 16 and 25 results
[22:59:37] <^d> Now *that's* weird.
[22:59:46] I've had every number between 16 and 25 in the last 10 minutes
[22:59:55] and I'm pretty confident that no relevant edits have been made
[23:00:03] matanya: that sounds like a question for Reedy
[23:00:08] it doesn't seem to be settling on any number (so not likely to be lag related)
[23:00:35] thanks TimStarling but he pointed at you :)
[23:00:57] I see
[23:01:13] We don't know if it actually ever existed
[23:01:58] so please TimStarling and Reedy can you please clarify this for ops ?
[23:02:07] * ori checks the SWAT schedule
[23:02:23] matanya: I don't know why we care so much
[23:02:25] Reedy, was this missing? https://gerrit.wikimedia.org/r/#/c/122588/
[23:02:26] Just decommission hume
[23:03:00] <^d> Krinkle: I'm curious what you get if you skip LVS and don't get routed to $semiRandomBox.
[23:03:04] !log Starting SWAT deploy window.
[23:03:04] Reedy: mutante|away doesn't want to apply puppet code that didn't ever run
[23:03:10] Logged the message, Master
[23:03:16] <^d> Try hitting each of the elasticsearch boxes individually with that query and see what sort of results you get.
[23:03:30] <^d> (Curious if they're consistent on an individual node)
[23:04:07] yurik: Seemingly. legalteamwiki is missing too
[23:04:11] ah, so you just want me to merge that change?
[23:04:27] TimStarling: if you are bold, please do :)
[23:04:27] it sounded like you were asking me a question
[23:04:43] (CR) Ori.livneh: [C: 2] Add 'upload' custom debug group to wgDebugLogGroups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122549 (owner: Kaldari)
[23:04:46] Tim? Bold? never. :)
[23:04:51] (Merged) jenkins-bot: Add 'upload' custom debug group to wgDebugLogGroups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122549 (owner: Kaldari)
[23:04:54] "can you review this change" is a valid question
[23:04:55] i was TimStarling , where is that cronjob coming from?
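^d's suggestion (query each Elasticsearch box directly, bypassing LVS, and check whether a single node is self-consistent) amounts to a simple comparison. A minimal sketch, assuming the titles returned by several repeated runs of the same mwgrep-style query have already been collected per node. The collection step, node names and titles are all hypothetical and not part of mwgrep.

```python
def compare_node_results(runs_by_node):
    """Summarise repeated query results gathered from individual search nodes.

    runs_by_node maps a node name to a list of result sets (one per run).
    Returns (per-node report, whether all nodes agree on their first run).
    """
    report = {}
    for node, runs in runs_by_node.items():
        report[node] = {
            "stable": all(run == runs[0] for run in runs),  # same hits every run?
            "counts": [len(run) for run in runs],
        }
    first_runs = [runs[0] for runs in runs_by_node.values() if runs]
    nodes_agree = all(run == first_runs[0] for run in first_runs)
    return report, nodes_agree


# Hypothetical data: node-a answers consistently, node-b does not.
report, agree = compare_node_results({
    "node-a": [{"A.js", "B.js"}, {"A.js", "B.js"}],
    "node-b": [{"A.js"}, {"A.js", "B.js", "C.js"}],
})
print(report["node-a"]["stable"], report["node-b"]["stable"], agree)  # True False False
```

If every node is stable on its own but the nodes disagree, the variation seen through LVS would come from which box answers; if a single node is unstable by itself, the cause lies elsewhere (ori's per-instance timeout theory, for instance).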
[23:05:03] !log ori updated /a/common to {{Gerrit|I532f8ee7c}}: Add 'upload' custom debug group to wgDebugLogGroups
[23:05:08] Logged the message, Master
[23:05:28] what do you mean?
[23:05:43] !log ori synchronized wmf-config/InitialiseSettings.php 'I532f8ee7c: Add "upload" custom debug group to $wgDebugLogGroups'
[23:05:44] puppet shows that check was on hume
[23:05:49] Logged the message, Master
[23:05:54] but there is no such cron on hume
[23:05:55] matanya: it may have simply never been puppetized, if that is what you mean
[23:05:56] Reedy, so whom should i bug to get these patches in?
[23:06:20] might be ori, but someone from platform should know
[23:06:48] and since Reedy said TimStarling and TimStarling said Reedy i'm clueless
[23:07:14] Reedy, zero biz-devs have been asking about it for the past 3 months, need to make them happy :)
[23:07:37] yurik: never use that term again
[23:07:50] greg-g, "happy" or "biz-devs"? :)
[23:07:57] yurik: biz-dev :)
[23:08:15] (PS11) Tim Starling: Properly puppeti[sz]e purge-checkuser [operations/puppet] - https://gerrit.wikimedia.org/r/74591 (owner: Reedy)
[23:08:18] greg-g, i think that's what they call themselves though :(
[23:08:28] suggested alternatives?
[23:08:30] MaxSem: hello
[23:08:31] (CR) Tim Starling: [C: 2] Properly puppeti[sz]e purge-checkuser [operations/puppet] - https://gerrit.wikimedia.org/r/74591 (owner: Reedy)
[23:08:37] ori, hey
[23:08:56] for the extension updates for the SWAT window, (a) do i need to scap? (b) if not, in which order should i sync the extensions?
[23:09:22] ori, a) no b) doesn't matter
[23:09:32] yurik: I know, I'm just making fun of it
[23:09:38] MaxSem: Cool, thanks.
[23:09:59] TimStarling, https://gerrit.wikimedia.org/r/#/c/109853/ pls pls?
[23:10:19] <^d> Reedy: I think your comment is right though about it being the last thing on hume.
[23:10:31] does jenkins merge puppet changes?
[23:10:35] TimStarling: no
[23:10:43] (CR) Tim Starling: [V: 2] Properly puppeti[sz]e purge-checkuser [operations/puppet] - https://gerrit.wikimedia.org/r/74591 (owner: Reedy)
[23:10:51] it's been a while since I've done this
[23:11:03] you also need to run puppet-merge on saltmaster
[23:11:07] i can do that, if you like
[23:11:31] ori: jenkins does it
[23:11:33] no need
[23:11:46] jenkins doesn't run puppet-merge
[23:12:04] nvm
[23:12:16] yeah, I've done it a few times since puppet-merge was brought in
[23:13:32] ori: i meant for the verify, but meh
[23:15:42] so you're saying jenkins in fact does merge puppet changes?
[23:15:58] !log ori synchronized php-1.23wmf20/extensions 'I6f0f1b18d: Update MobileFrontend, PageImages and TextExtracts for bug 63248'
[23:16:04] Logged the message, Master
[23:16:08] obviously not on palladium, that would be too scary to contemplate
[23:16:16] I was just asking about gerrit
[23:16:31] ori, you're taking the swat today?
[23:16:40] (I just saw the time)
[23:17:05] mwalker: yep, almost done
[23:17:24] TimStarling: I was replying re: gerrit. matanya, is that wrong? does Jenkins verify operations/puppet changes now?
[23:17:39] MaxSem: can you confirm it looks OK on wmf20?
[23:17:42] before I do 19
[23:17:43] it verifies, ori
[23:18:16] matanya: and merge?
[23:18:21] *s
[23:18:27] by hand
[23:18:29] ori, waiting for RL to update
[23:18:57] right, so you still need to 'submit' on gerrit
[23:19:17] MaxSem: OK, let me know.
[23:20:39] ori, works now
[23:20:47] OK, so I'll sync 19 as well.
[23:21:24] !log ori synchronized php-1.23wmf19/extensions 'I6f0f1b18d: Update MobileFrontend, PageImages and TextExtracts for bug 63248'
[23:21:29] Logged the message, Master
[23:22:27] !log End of SWAT deploy.
[23:22:33] Logged the message, Master
[23:22:42] ty ori
[23:23:17] my pleasure
[23:24:37] (PS1) Matanya: decom: hume [operations/puppet] - https://gerrit.wikimedia.org/r/122605
[23:24:44] finally
[23:24:56] ori, thanks!
[23:25:33] <^d> We shouldn't decom hume without warning.
[23:25:37] <^d> Lotsa stuff ran on it.
[23:25:47] <^d> Should give people $someTime to make sure they're not losing stuff.
[23:26:52] ^d: the ticket says it is empty, but i'll be happy if you mail all eng and platform :)
[23:28:53] (PS1) Springle: S7 depool db1034 for upgrade to MariaDB 5.5.36 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122608
[23:29:27] (CR) Springle: [C: 2] S7 depool db1034 for upgrade to MariaDB 5.5.36 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122608 (owner: Springle)
[23:29:34] (Merged) jenkins-bot: S7 depool db1034 for upgrade to MariaDB 5.5.36 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122608 (owner: Springle)
[23:30:28] !log springle synchronized wmf-config/db-eqiad.php 's7 depool db1034 for upgrade'
[23:30:29] <^d> matanya: Sent to engineering/ops.
[23:30:33] Logged the message, Master
[23:30:46] thanks a lot ^d
[23:32:29] (PS1) Matanya: hume: decom, left mgmt [operations/dns] - https://gerrit.wikimedia.org/r/122609
[23:33:42] mutante|away: quick!!
[23:34:05] Reedy: still need access to stat1?
[23:35:05] (PS1) Reedy: Add legalteamwiki to private_wikis [operations/puppet] - https://gerrit.wikimedia.org/r/122610
[23:36:32] trigger happy, nice term :)
[23:37:29] poor hume
[23:37:33] it was a workhorse
[23:37:38] and was reliable as hell
[23:38:09] is it a dell 310?
[23:38:26] nah wayyyy older
[23:38:34] old 2950 iirc
[23:38:55] good old servers
[23:38:56] heh, yep, purchase 2008-02-02
[23:41:23] uptime : 2250 days
[23:42:17] ori: What's SWAT?
[23:42:19] it was always running jobs, never wanted to do maint window
[23:42:37] PROBLEM - DPKG on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:43:37] RECOVERY - DPKG on db1047 is OK: All packages OK
[23:44:06] RobH: any comments? https://etherpad.wikimedia.org/p/what_is_left_in_tampa
[23:44:38] isnt this tracked elsewhere?
[23:45:10] RT
[23:45:24] oh well, i dunno what it means about racktables id
[23:45:30] " (not sure if RobH cares about you providing racktables ID, because you can't "
[23:45:36] i have no idea what that means.
[23:45:39] ;]
[23:45:53] Gloria: https://wikitech.wikimedia.org/wiki/SWAT_deploys
[23:45:57] for a ticket in the tampa queue
[23:46:24] RobH: if you close those two, they need a follow up ticket in tampa queue
[23:46:55] i closed them earlier when you asked me to
[23:46:59] and linked new followup tickets
[23:47:00] ori: Fascinating.
[23:47:06] but i dont get the comment about racktables id
[23:47:12] is what im sayin
[23:48:00] RobH: in decom tickets you provide the place in rack, right?
[23:48:07] no
[23:48:12] it was for steve
[23:48:15] but steve isnt doing them
[23:48:33] i trust myself or cmjohnson1 to find the right server and confirm the tag/asset tag and ensure we dont wipe wrong thing =]
[23:48:47] so yea, that makes sense now
[23:48:55] but yea, we dont need to do that for stuff right now
[23:49:06] (when we hire a new DC tech we will be doing it for that site for the first few months)
[23:49:18] so since i don't have access there anyway, that is the basis for that comment
[23:49:25] where i say server X is in rack Y in slot Z
[23:49:27] cool
[23:49:31] makes sense now
[23:50:15] ok, those two done
[23:50:22] very productive night
[23:51:08] i added some notes
[23:51:12] to that pad
[23:52:34] i see, thanks, that clears stuff
[23:55:20] RobH: i'll send it to the list and see what comments come up
[23:56:23] yea killing the ganglia aggregator for tampa misc while there are other misc servers isnt gonna be a good idea
[23:56:30] the rest are pretty on point already