[00:00:50] (PS11) Mwalker: Collection Renderer (Now a module!) [operations/puppet] - https://gerrit.wikimedia.org/r/102352
[00:04:46] TimStarling: have you considered using --bwlimit with no -F limit?
[00:06:12] there could be other limitations, like memory usage
[00:06:16] it's otherwise annoying to handle server-specific scap tasks that take a bit of time
[00:06:53] if you run sync-common manually on a single server, presumably you wouldn't want --bwlimit, right?
[00:07:47] it could be an argument...would be easier if it wasn't bash
[00:08:33] feel free to run it with high fanout to see what happens
[00:10:26] TimStarling: F60? ;)
[00:11:38] well, wc -l mediawiki-installation is 431
[00:12:05] so -F108 would let you do it in 4 batches...
[00:12:27] -F216 in two batches...
[00:13:22] what I would want to know is:
[00:13:38] * what fanout do you need to saturate the network
[00:13:40] (PS12) Mwalker: Collection Renderer (Now a module!) [operations/puppet] - https://gerrit.wikimedia.org/r/102352
[00:13:51] * what fanout do you need to saturate the proxy CPU
[00:15:46] <^d> AaronSchulz: Do abandoned jobs keep things from claiming new ones to run?
[00:16:00] no
[00:16:16] <^d> http://p.defau.lt/?k4D3qKJTqm1fTJ7bHghDYw - current jobs for enwiki
[00:16:59] (PS13) Mwalker: Collection Renderer (Now a module!) [operations/puppet] - https://gerrit.wikimedia.org/r/102352
[00:17:24] TimStarling: for later: http://www.psc.edu/index.php/hpn-ssh ("OpenSSH is network performance limited by statically defined internal flow control buffers. These buffers often end up acting as a bottleneck for network throughput of SCP, especially on long and high bandwith network links.
[00:17:25] "Modifying the ssh code to allow the buffers to be defined at run time eliminates this bottleneck. We have created a patch that will remove the bottlenecks in OpenSSH and is fully interoperable with other servers and clients.")
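A rough sketch of the batch arithmetic behind the fanout numbers above. It assumes every host takes about the same time to sync, so a fanout of F walks the host list in ceil(431 / F) rounds; that is a simplification, since a parallel runner may start the next host as soon as any slot frees up. The host count comes from the `wc -l mediawiki-installation` output quoted in the log; the function name is hypothetical.

```python
import math


def batches_needed(host_count: int, fanout: int) -> int:
    """Rounds needed to reach every host when `fanout` hosts run at once.

    Assumes roughly equal per-host sync time (an idealization).
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    # Ceiling division: a partially filled last round still counts as a round.
    return math.ceil(host_count / fanout)


HOSTS = 431  # `wc -l mediawiki-installation`, as quoted in the log

for fanout in (60, 108, 216):
    print(f"-F{fanout}: {batches_needed(HOSTS, fanout)} batches")
```

This reproduces the "4 batches" for -F108 and "two batches" for -F216 mentioned in the discussion, and gives 8 rounds for -F60.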
[00:17:54] ori: I think I just finished addressing your and faidon's comments; so if you (when you have a chance) take a peek at https://gerrit.wikimedia.org/r/102352 to see what other silly stuff I did, it would be much appreciated
[00:18:20] mwalker: ok, looking
[00:21:02] TimStarling: I found out about it via "High Performance Bulk Data Transfer", a presentation from the Lawrence Berkeley National Laboratory in the US. It has some useful info about rsync's limitations and what to do about them.
[00:21:10] er, link: http://fasterdata.es.net/assets/fasterdata/JT-201010.pdf
[00:26:26] mwalker: Could not find template 'ocg/mw-ocg-service.js.erb'
[00:26:34] it's mw-ocg-frontend.js.erb
[00:27:06] and this is why I shouldn't change my mind about what to call things 15 times
[00:27:35] yes, change your mind either 14 or 16 times, but not 15
[00:27:46] hehe; the evenness is important :)
[00:28:22] (PS14) Mwalker: Collection Renderer (Now a module!) [operations/puppet] - https://gerrit.wikimedia.org/r/102352
[00:28:29] ^ see, even patch number! :p
[00:32:46] !log kaldari synchronized php-1.23wmf9/extensions/MobileFrontend 'updating MobileFrontend in wmf9'
[00:32:53] Logged the message, Master
[00:33:50] (PS1) Springle: warm up db1055 and db1023 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107274
[00:34:28] (CR) Springle: [C: 2] warm up db1055 and db1023 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107274 (owner: Springle)
[00:35:39] !log springle synchronized wmf-config/db-eqiad.php 'warm up db1055 and db1023'
[00:35:45] Logged the message, Master
[01:19:52] !log kaldari synchronized php-1.23wmf10/extensions/MobileFrontend 'updating MobileFrontend in wmf10'
[01:19:59] Logged the message, Master
[01:29:10] mwalker: sorry, I got side-tracked. I'll finish reviewing later.
[01:29:33] ori: no worries! I appreciate your time when you can give it
[01:30:15] if you want to test it more thoroughly yourself, my strategy is simple: i comment out the few wmf-specific things (monitor_group, system_role) and cp -R the module directory to puppet/modules in the vagrant repo
[01:31:01] then i edit puppet/manifests/roles.pp to add https://dpaste.de/bLn5/raw
[01:31:13] then 'vagrant enable-role ocg' followed by 'vagrant provision'
[01:32:17] file { $temp_dir: path => $temp_dir } is awkward, btw. the resource name is the path by default, so you don't need to specify it manually. and, more importantly, it's bizarre to make the temp dir configurable
[01:32:36] just make it /var/run/ocg or something and force people to live with it; it's not the sort of configurability that people actually need IMO
[01:32:38] anyways, back later
[01:32:43] ^ mwalker
[01:33:09] I'm pretty sure I tried just file { $temp_dir } and it complained; and I made it configurable because the temp dir should live on an SSD, so it would be nice to configure that without playing with mount points
[01:35:09] (CR) Mwalker: "(05:32:17 PM) ori: file { $temp_dir: path => $temp_dir } is awkward, btw. the resource name is the path by default, so you don't need to s" [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[02:20:29] (PS1) Springle: depool db1040 and db1049 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107310
[02:22:43] (PS2) Springle: depool db1040 and db1049 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107310
[02:23:14] (CR) Springle: [C: 2] depool db1040 and db1049 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107310 (owner: Springle)
[02:23:23] (Merged) jenkins-bot: depool db1040 and db1049 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107310 (owner: Springle)
[02:24:22] !log springle synchronized wmf-config/db-eqiad.php 'depool db1040 and db1049'
[02:25:24] !log LocalisationUpdate completed (1.23wmf9) at Tue Jan 14 02:25:24 UTC 2014
[02:29:22] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:30:49] (CR) MZMcBride: "Thanks for including example output in a comment, Nemo." [operations/puppet] - https://gerrit.wikimedia.org/r/106892 (owner: Tinaj1234)
[02:38:12] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.194 second response time
[02:38:50] !log wikitech down. restarted apache on virt0
[02:39:11] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[02:39:21] !log synchronized wmf-config/db-eqiad.php
[02:39:41] still doesn't seem happy
[02:41:21] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:43:52] !log wikitech down. restarted apache on virt0. phusion_passenger exception + MaxClients hit
[02:44:02] * springle waits hopefully
[02:44:11] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.160 second response time
[02:46:16] looks to me like a lot of puppet calls
[02:48:14] It's kind of painful to watch people try to log info to wikitech when wikitech is down.
[02:48:39] yep
[02:48:48] we should really split the labs puppetmaster apart from wikitech
[02:48:56] really it would be nice for wikitech to just be in the cluster
[02:49:43] also, the way we log is fucking stupid
[02:49:59] !log LocalisationUpdate completed (1.23wmf10) at Tue Jan 14 02:49:59 UTC 2014
[02:49:59] we have logstash and kibana now, don't we?
[02:50:00] Yes. :-)
[02:50:07] I filed a bug about that general issue.
[02:50:09] why don't we log to that, and have a canned search?
[02:50:19] You must be new around here
[02:50:26] :)
[02:50:30] Who's kibana? New intern?
[02:50:40] No
[02:50:42] kibana is an interface for logstash
[02:51:04] Let me find the bug.
[02:51:11] I suspect when the infrastructure is a little bit more stabilised, it shouldn't be too much work to improve the situation greatly
[02:51:14] https://bugzilla.wikimedia.org/show_bug.cgi?id=57343
[02:51:33] Gloria: yeah, I saw the bug
[02:51:48] If you think it should use kibana and logstash, you should say so. ;-)
[02:51:52] Direction helps a bug like that.
[02:55:35] ok, i'm not crazy...
[02:55:40] can't log into wikitech
[02:58:31] springle has been poking at it, he might need some help. I'm reluctant to wake people up for it though
[02:58:43] poking is the right word
[02:59:08] ah
[02:59:15] /a is full on virt0
[02:59:57] I'm deleting some backups
[03:00:08] seems keeping 7 days of backups isn't doable anymore
[03:00:22] thanks Ryan_Lane :)
[03:00:30] that's mostly thanks to our stupid logging strategy
[03:00:47] since we have like 5G of logs
[03:01:02] yw
[03:04:25] !log synchronized wmf-config/db-eqiad.php 'depool db1040 and db1049'
[03:04:29] * springle tries again
[03:04:33] Logged the message, Master
[03:04:38] \o/
[03:08:34] !log xtrabackup clone db1050 to db1049
[03:08:40] Logged the message, Master
[03:11:44] andrewbogott: howdy. any idea when you want to deploy the changes I pushed in for OSM?
[03:11:59] it looks like most of it was reviewed?
[03:12:45] Ryan_Lane: You're talking about https://gerrit.wikimedia.org/r/104144 and https://gerrit.wikimedia.org/r/104129?
[03:12:49] Or did I miss some merges?
[03:14:44] and https://gerrit.wikimedia.org/r/#/c/104320/
[03:14:58] ah. missed you as a reviewer
[03:15:50] (PS1) Springle: switch db1049 to innodb_file_per_table=1 during reclone [operations/puppet] - https://gerrit.wikimedia.org/r/107315
[03:17:07] (CR) Springle: [C: 2] switch db1049 to innodb_file_per_table=1 during reclone [operations/puppet] - https://gerrit.wikimedia.org/r/107315 (owner: Springle)
[03:17:25] * andrewbogott reads
[03:17:55] Ryan_Lane: I can schedule a roll-out for later this week… do you want to be around to watch when that happens?
[03:18:08] yep
[03:18:25] it does look like my keystone change could be fixed a little
[03:18:30] ok… what time of day would suit you?
[03:19:04] after 6pm PDT is best for me
[03:19:10] I can do other times if necessary
[03:19:35] Great -- I'm in an exceptionally difficult TZ right now (+8) so SF evening is ideal for me.
[03:20:02] ah. great
[03:21:55] greg-g, you still working? If so can you walk me through scheduling a labs deployment?
[03:24:21] heh. I never scheduled any of that :D
[03:25:21] Me neither but I caused a brief wikitech outage last week due to incompatible extension versions… so I'm a bit edgy just now.
[03:25:53] heh
[03:26:05] I always update to the newest branch
[03:26:28] Not true -- we're using an old version of SMW and related
[03:26:33] ah. true
[03:26:38] we should probably upgrade SMW
[03:26:39] The tip is incompatible with the core tip
[03:27:00] Or was as of last week, I haven't followed up yet.
[03:27:39] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Jan 14 03:27:39 UTC 2014
[03:27:44] Logged the message, Master
[03:37:48] we "fixed" core by making the variable public again...
[03:39:03] Reedy: Yeah, that fixed backwards compatibility with the old version of SMW.
[03:39:11] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[03:39:17] I don't remember what the story was with updating SMW… maybe just a dependency cascade
[04:57:32] springle, this TZ is so much quieter than CST! Do you read the whole US day's IRC transcripts when you wake up, or just live in comfortable ignorance?
[04:59:11] i usually page back a ways, but ignorance is nice too :)
[05:00:59] Perhaps my coding productivity will skyrocket with things so peaceful?
[05:01:01] Perhaps not.
[05:01:16] where are you?
[05:01:45] Singapore. Still very far from you, but you're my closest Ops neighbor :)
[05:02:31] 2h away
[05:02:47] Plus many many degrees of latitude.
[05:03:03] :)
[05:03:27] wow, 4000 miles, that's /way/ more than I would've guessed.
[05:04:06] big desert in between
[05:04:17] yep
[05:10:57] * springle plays whac-a-mole with slow queries
[05:20:48] !log xtrabackup clone db1022 to db1040
[05:20:54] Logged the message, Master
[05:36:57] (CR) Faidon Liambotis: [C: -1] "More comments :) It's shaping up nicely though!" (19 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[05:38:20] mwalker: ^
[05:38:32] yep yep; wandering through the comments now
[05:41:19] paravoid: the reason I'm requiring the package install to notify on the service is because it doesn't seem safe to me to start a service without its dependencies
[05:42:14] that's a require/before relationship
[05:42:48] before... that's probably the keyword I was looking for
[05:43:38] http://docs.puppetlabs.com/learning/ordering.html & http://docs.puppetlabs.com/puppet/2.7/reference/lang_relationships.html
[05:43:50] also; I gave the logs the wikidev group so that I can read them for debugging
[06:18:13] (CR) Mwalker: Collection Renderer (Now a module!) (19 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[06:19:29] (PS15) Mwalker: Collection Renderer (Now a module!) [operations/puppet] - https://gerrit.wikimedia.org/r/102352
[06:37:40] (CR) Ori.livneh: Collection Renderer (Now a module!) (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[06:38:43] paravoid: proposed convention: one 'include' per line
[06:38:53] yeah, it makes for less dirty diffs
[06:39:33] adding to style guide
[07:06:15] (CR) Mwalker: Collection Renderer (Now a module!) (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[07:06:51] (PS16) Mwalker: Collection Renderer (Now a module!) [operations/puppet] - https://gerrit.wikimedia.org/r/102352
[07:21:20] mwalker: Invalid parameter user at /tmp/vagrant-puppet-1/modules-0/ocg/manifests/init.pp:103
[07:21:25] should be 'owner'
[07:21:51] mwalker: also: warning: Dynamic lookup of $hostname at /tmp/vagrant-puppet-1/modules-0/ocg/manifests/init.pp:14 is deprecated. Support will be removed in Puppet 2.8. Use a fully-qualified variable name (e.g., $classname::variable) or parameterized classes.
[07:21:57] should be '$::hostname'
[07:22:35] ori, how well do you know XSS?
[07:22:59] i know the basics, not some of the fancy new techniques
[07:23:13] i need to evaluate if something is a live XSS
[07:24:27] Krinkle is in charge of front-end security, I think? Anyways, feel free to PM. An e-mail to security@ is usually a good idea if the suspicion is credible.
[07:35:41] (CR) Ori.livneh: [C: -1] Collection Renderer (Now a module!) (8 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[07:37:13] yurik: I can look at it if you need help
[07:37:36] csteipp, i'm about to email it to security@
[07:37:43] is that the right approach?
[07:37:55] i can cc you
[07:37:58] If you're pretty sure it's an issue, yes
[07:38:00] yep
[07:38:07] csteipp, nope, i'm not :)
[07:38:09] I'm on security@, so just that is fine
[07:38:32] If not, email me directly or open a bug is best
[07:38:47] csteipp, ori, just sent it to you two
[07:55:59] (CR) Mwalker: Collection Renderer (Now a module!) (8 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[07:56:25] (PS17) Mwalker: Collection Renderer (Now a module!) [operations/puppet] - https://gerrit.wikimedia.org/r/102352
[07:56:53] ^ shoot; it's an odd PS number; that definitely means that ori is going to find more bugs :[
[07:59:07] mwalker: if you insist: config.redis.port is a string, but config.frontend.port is an int
[07:59:38] but the module applies correctly
[08:00:11] yay! thanks :) also -- how hard would it be to recreate your vagrant stuff for ubuntu 13.10; paravoi_ has given us a tentative green light to deploy on that
[08:00:41] (to avoid having to backport lots of stuff)
[08:01:26] (PS18) Mwalker: Collection Renderer (Now a module!) [operations/puppet] - https://gerrit.wikimedia.org/r/102352
[08:01:51] mwalker: you'd have to change these two lines in Vagrantfile:
[08:01:54] config.vm.box = 'precise-cloud'
[08:01:54] config.vm.box_url = 'https://cloud-images.ubuntu.com/vagrant/precise/current/precise-server-cloudimg-amd64-vagrant-disk1.box'
[08:02:14] there are 13.10 images: https://cloud-images.ubuntu.com/vagrant/saucy/current/
[08:02:31] ok! task for tomorrow right there! (also; that is significantly easier than I thought you were going to say...)
[08:02:57] * mwalker adds to list of things to do
[08:03:14] let me know if you run into issues, i haven't tried provisioning on saucy
[08:03:45] (CR) Ori.livneh: [C: 1] Collection Renderer (Now a module!) [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[08:04:21] will do! /me *crosses fingers* for no big huge issues
[08:04:50] I suspect it's going to come down to if our production puppet makes 12.04 specific assumptions
[08:05:22] we'll figure it out
[08:05:27] we need to anyway
[08:08:52] paravoid: since you're around; either myself or max will play with the vagrant side of things tomorrow -- but who do I need to talk to to get a saucy install on rhodium?
[08:09:04] or; what's the process for that?
[08:09:12] jeff
[08:09:31] saucy ships with puppet 3, interesting
[08:09:39] he has offered to help with the ops side of things for this I think
[08:09:57] he has; he's just a bit swamped with some fundraising stuff at present
[08:11:07] but, that's good to hear; I was worried I was going to have to take some of Chris's time for an install in front of the console in the DC
[08:13:09] ideally we'd start with a labs instance
[08:13:18] but testing on rhodium is also possible
[08:13:21] I don't particularly mind
[08:14:45] ori: yeah, that's going to be fun, esp. since supposedly they promise compatibility only for master > client scenarios
[08:14:54] we might have to do puppet 3.x on the server first
[08:14:59] akosiaris: weren't you working on this?
[08:15:23] and apergos finding/fixing scoping issues I think?
[08:15:39] max tested briefly today in labs on 13.10; but it wasn't the most ideal install (he started from 12.04 and then do-release-upgrade'd to 13.10)
[08:15:39] more or less
[08:16:02] i guess it could easily be added to the set of images available on labs
[08:16:21] yes, that's right
[08:17:21] mwalker: what's the instance name?
[08:18:27] ori: not sure; it doesn't look like he created it in the collection_alt_renderer project
[08:19:32] he also said he only got a couple minutes of playtime on it before it bricked
[08:19:40] (PS1) Matanya: dynamicproxy: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107339
[08:20:42] ... so according to https://wikitech.wikimedia.org/wiki/OpenStack#Building_new_images someone with shell on the openstack controller would need to at least import the image -- and most likely to have created it...
[08:33:38] (PS1) Hydriz: Move WikimediaIncubator extension call to be after Scribunto [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107340
[08:34:31] (PS1) Matanya: download: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107341
[08:35:13] (CR) Hydriz: "See also: If0f030674c728d39876a251a7c35613b6f88d2bf" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107340 (owner: Hydriz)
[08:35:43] akosiaris, apergos, so, what's the progress then? :)
[08:37:42] on the puppet 3 front? On my part, stalled due to OSM.
[08:38:14] ok
[08:39:13] I'm going to ask someone else to review https://gerrit.wikimedia.org/r/#/c/97007/ (I need to know if this is the right approach before I go on and apply it to several other changes)
[08:39:21] ryan has been too busy or whatever
[08:39:35] ori, good perf chat, listening to it now :)
[08:39:36] suggestions for who to poke?
[08:40:10] no that's not great
[08:40:21] but leave ryan/mike to handle the openstack refactoring
[08:40:48] the openstack classes are a bit complicated, I wouldn't dare to touch them
[08:41:23] well I still need someone to say 'yes this is a good way to fix that sort of scope issue' or 'no, this is a better way' (even if they just comment on the changeset)
[08:41:54] it's not a good way to fix this scope issue
[08:42:05] a better way would be to refactor the openstack classes significantly
[08:42:11] I see
[08:42:51] well there will be a few more of those
[08:43:47] apergos: why does the openstack version impact the scope?
[08:45:29] I don't quite follow the question
[08:56:44] getting "Pool queue is full" fairly regularly on the JSON-based search API.
[08:56:55] Not sure if I should file a bug for that or just mention it here.
[08:57:17] as I doubt it's a bug in the software, probably more to do with capacity.
[08:58:13] sdehaan: file it anyway. Worst case it gets closed
[08:58:18] we're hitting that API harder than usual with the Wikipedia Text app in Kenya because the local mobile network operator did a big media push.
[08:58:21] yuvipanda: thanks
[08:58:34] yuvipanda: http://bugzilla.wikimedia.org/?
[08:58:40] or is there a more relevant one?
[08:59:01] sdehaan: that's good enough.
[09:05:44] yuvipanda: thanks, https://bugzilla.wikimedia.org/show_bug.cgi?id=60032 filed.
[09:06:34] sdehaan: sweet. let me do a bit of fiddling so that the proper people are notified
[09:07:18] yuvipanda: thanks, appreciate it. fwiw our side of the issue is here https://github.com/praekelt/vumi-wikipedia/issues/28
[09:07:28] sdehaan: :)
[09:07:57] sdehaan: there were some issues with the poolcounters yesterday, this might be the cause of that
[09:08:00] err
[09:08:03] that might be the cause of this
[09:08:42] kk, I have a graphite graph showing the timeouts & problems. Has UTC timestamps, let me know if that's helpful to attach.
[09:12:20] sdehaan: might as well do it :)
[09:12:26] sdehaan: more info is usually nice anyway
[09:12:31] sdehaan: the search folks are all asleep now (SF time)
[09:15:22] done
[09:25:06] (PS1) Matanya: deployment: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107343
[09:58:39] (PS1) Matanya: cpufrequtils: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107346
[10:14:10] (PS1) Matanya: contint: lint packages.pp before spliting up [operations/puppet] - https://gerrit.wikimedia.org/r/107347
[10:27:07] (CR) Faidon Liambotis: [C: 2] Varnish: don't mobile redirect www.$project.org [operations/puppet] - https://gerrit.wikimedia.org/r/89879 (owner: JanZerebecki)
[10:27:19] (CR) Faidon Liambotis: [C: 2] varnish: simplify the mobile redirect regexp [operations/puppet] - https://gerrit.wikimedia.org/r/106669 (owner: Faidon Liambotis)
[10:28:13] sdehaan: what wiki?
[10:28:28] Nemo_bis: english
[10:28:32] the normal/old search constantly gives pool queue full errors
[10:28:38] English Wikipedia I suppose you mean
[10:29:09] I don't know if there's a parameter to get the Cirrus search results via API too
[10:29:58] Nemo_bis: cirrus isn't supposed to be used in production yet, I think
[10:30:02] Nemo_bis: yeah English Wikipedia
[10:30:19] yurik mentioned something about a new search backend, is that the cirrus stuff?
[10:30:25] yuvipanda: no idea what you mean in production, it's already default on at least 2 top10 Wikipedias
[10:30:31] yes
[10:31:15] well, not the default for enwiki, at the least :)
[10:31:22] has the search api itself changed due to the backend changes?
[10:31:30] the old search is haunted (quot.), can't really expect it to work
[10:31:48] no, the search API won't be affected by the backend
[10:31:50] well, it 'mostly' works :)
[10:31:54] but yeah, haunted
[10:33:32] sdehaan: I suggest you watchlist https://www.mediawiki.org/wiki/Search#Wikis to be notified when the backend changes on en.wiki too
[10:34:43] thanks
[10:39:15] gah
[10:39:24] (PS1) Faidon Liambotis: Fix mobile redirect breakage [operations/puppet] - https://gerrit.wikimedia.org/r/107349
[10:39:48] (CR) Faidon Liambotis: [C: 2 V: 2] Fix mobile redirect breakage [operations/puppet] - https://gerrit.wikimedia.org/r/107349 (owner: Faidon Liambotis)
[11:13:13] (Abandoned) Matthias Mullie: Fix notice due to incorrect capitalization [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107186 (owner: Matthias Mullie)
[11:39:59] (PS1) Springle: warm up db1049 in s1 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107354
[11:40:35] (CR) Springle: [C: 2] warm up db1049 in s1 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107354 (owner: Springle)
[11:40:43] (Merged) jenkins-bot: warm up db1049 in s1 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107354 (owner: Springle)
[11:41:39] !log springle synchronized wmf-config/db-eqiad.php 'warm up db1049'
[11:41:45] Logged the message, Master
[11:45:26] hrmm.. too warm too quick
[11:49:33] When is the first dump of 2014 ready for Wikidata?
[11:55:44] springle: saw my mail?
[11:56:36] paravoid: yep. sorry, meant to reply earlier
[11:56:47] no worries, it's nothing urgent obviously
[11:59:48] (PS1) Matanya: base: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107355
[12:00:22] (CR) jenkins-bot: [V: -1] base: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107355 (owner: Matanya)
[12:00:42] the good news is it can actually be found :) http://dumps.wikimedia.org/wikidatawiki/20140106/
[12:01:48] (PS2) Matanya: base: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107355
[12:02:23] (CR) jenkins-bot: [V: -1] base: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107355 (owner: Matanya)
[12:05:46] Nemo_bis: sorry, I'm going to need some explanation on how to watchlist
[12:07:01] (PS3) Matanya: base: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107355
[12:07:07] typos :/
[12:11:27] sdehaan: login/register and click the star
[12:11:34] doh
[12:11:54] https://meta.wikimedia.org/wiki/Help:Watching_pages
[12:14:17] thanks
[12:29:15] (PS1) Matanya: bacula: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107356
[12:39:31] (CR) Alexandros Kosiaris: [C: 2] bacula: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107356 (owner: Matanya)
[12:44:05] (PS1) Matanya: apt: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107358
[12:45:00] and now my project is officially over. all modules are linted :)
[12:45:51] what's next?:P
[12:46:08] reforming webserver
[12:46:46] i would like some ops input before. got from andrewbogott and i ran into some issues
[12:47:02] but that is the plan
[13:05:24] (CR) Alexandros Kosiaris: [C: 1] "LGTM, I also like the new format. The flag is existent at least as far back as hardy so it is supported across the cluster. I personally c" [operations/puppet] - https://gerrit.wikimedia.org/r/106892 (owner: Tinaj1234)
[13:09:50] akosiaris: should manifests/mail.pp become a module?
[13:23:11] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[14:00:08] (PS1) Faidon Liambotis: Varnish: don't inadvertently convert 500s to 503s [operations/puppet] - https://gerrit.wikimedia.org/r/107364
[14:00:40] mark: ^^^ we might need to tune this further -- I think 4 is too much
[14:00:50] I'll merge for now
[14:01:06] (CR) Faidon Liambotis: [C: 2 V: 2] Varnish: don't inadvertently convert 500s to 503s [operations/puppet] - https://gerrit.wikimedia.org/r/107364 (owner: Faidon Liambotis)
[14:07:47] (PS1) Faidon Liambotis: Varnish: brown paper bag fix for return(restart) [operations/puppet] - https://gerrit.wikimedia.org/r/107365
[14:09:02] (PS2) Faidon Liambotis: Varnish: brown paper bag fix for return(restart) [operations/puppet] - https://gerrit.wikimedia.org/r/107365
[14:09:40] (CR) Faidon Liambotis: [C: 2 V: 2] Varnish: brown paper bag fix for return(restart) [operations/puppet] - https://gerrit.wikimedia.org/r/107365 (owner: Faidon Liambotis)
[14:11:49] (PS1) Springle: This was turned off for wikidata queries on wb_terms. However it caused a parformance hit for an ipblocks query on enwiki. Latter trumps former. Revert and find another way... [operations/puppet] - https://gerrit.wikimedia.org/r/107366
[14:13:17] (CR) Springle: [C: 2] This was turned off for wikidata queries on wb_terms. However it caused a parformance hit for an ipblocks query on enwiki. Latter trumps for [operations/puppet] - https://gerrit.wikimedia.org/r/107366 (owner: Springle)
[14:17:06] (PS1) Faidon Liambotis: gdash: remove logbase from reqstats.5xx [operations/puppet] - https://gerrit.wikimedia.org/r/107367
[14:17:36] (CR) Faidon Liambotis: [C: 2 V: 2] gdash: remove logbase from reqstats.5xx [operations/puppet] - https://gerrit.wikimedia.org/r/107367 (owner: Faidon Liambotis)
[14:20:11] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[14:23:08] paravoid: looks like only bugzilla still talking to db9
[14:23:20] oh dear
[14:23:46] that migration is in progress though, afaik. mutante may know more
[14:23:58] yeah
[14:24:38] (CR) Nikerabbit: [C: 1] Changed date format in l10nupdate-1 [operations/puppet] - https://gerrit.wikimedia.org/r/106892 (owner: Tinaj1234)
[14:29:24] (PS1) Springle: db1049 to full steam [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107370
[14:29:54] (CR) Springle: [C: 2] db1049 to full steam [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107370 (owner: Springle)
[14:30:02] (Merged) jenkins-bot: db1049 to full steam [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107370 (owner: Springle)
[14:30:55] !log springle synchronized wmf-config/db-eqiad.php 'db1049 to full steam'
[14:31:01] Logged the message, Master
[14:32:44] hashar: hello there
[14:33:06] i was wondering if it is possible to add puppet-lint to jenkins tests
[14:33:34] I doubt all of our manifests pass that
[14:34:07] Jenkins tests can be marked "non-voting".
[14:34:12] I know
[14:34:15] I fail to see their point
[14:34:53] i guess most of them don't but i found out not everybody knows about them
[14:35:00] (PS1) Springle: reassign db1040 to s4 [operations/puppet] - https://gerrit.wikimedia.org/r/107371
[14:37:21] (CR) Springle: [C: 2] reassign db1040 to s4 [operations/puppet] - https://gerrit.wikimedia.org/r/107371 (owner: Springle)
[14:37:41] matanya: there is a rake target for puppet-lint
[14:37:55] matanya: try rake -T
[14:37:55]
[14:38:03] rake lint
[14:38:14] (PS1) Aude: Enable Wikibase Client on Wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107372
[14:38:20] it calls puppet linter and disables a bunch of tests
[14:38:31] !log xtrabackup clone db1042 to db1040
[14:38:37] Logged the message, Master
[14:39:06] I think they're useful for when you don't want to enforce puppet-lint yet, but want to migrate step by step.
[14:39:21] (CR) Aude: [C: -1] "*important* verify wikisource has sites table (and populated) before deploying this!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107372 (owner: Aude)
[14:42:15] thanks hashar
[15:04:16] !log Jenkins: uninstalled phpunit as provided by PEAR (pear uninstall pear.phpunit.de/PHPUnit). We are using Wikimedia deployment system now (integration/phpunit)
[15:04:23] Logged the message, Master
[15:15:28] (PS1) Nemo bis: [Planet] Fix Guillaume's blog feed URL [operations/puppet] - https://gerrit.wikimedia.org/r/107376
[15:15:37] (CR) jenkins-bot: [V: -1] [Planet] Fix Guillaume's blog feed URL [operations/puppet] - https://gerrit.wikimedia.org/r/107376 (owner: Nemo bis)
[15:50:28] more poolcounter fun: https://bugzilla.wikimedia.org/show_bug.cgi?id=60032
[16:04:37] just what I wanted.
[16:06:09] added yurik because he told me about that issue on Friday
[16:29:51] (PS1) Ottomata: Adding parameters to set root database (mysql) user and pass [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/107387
[16:30:40] (CR) Ottomata: [C: 2 V: 2] Adding parameters to set root database (mysql) user and pass [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/107387 (owner: Ottomata)
[16:40:51] (PS4) Ottomata: added basic hbase support [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/99381 (owner: Physikerwelt)
[16:43:12] (CR) Physikerwelt: "by the way you can see how to set up zookeeper here" [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/99381 (owner: Physikerwelt)
[17:03:51] ksnider, Coren: https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration
[17:10:49] (CR) Ottomata: "Yeah the zookeeper issue is a tough one. Our problem is that we prefer to use Debian/Ubuntu packages where possible. The configuration s" [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/99381 (owner: Physikerwelt)
[17:22:45] !log reedy updated /a/common to {{Gerrit|I8d1013484}}: db1049 to full steam
[17:22:49] (PS1) Reedy: All non wikipedias to 1.23wmf10 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107390
[17:22:51] Logged the message, Master
[17:32:27] hi, just saw the mail about wikitech outage. looks like we have no disk space monitoring for that node in icinga?
[17:33:07] also i don't understand why wikitech is hosted on a node called 'virt0'?
[17:34:18] jgage, because it's tightly related to labs?
[17:35:09] jgage: 1) we should, 2) it's on a virtual machine just so it is separate from the main cluster (ie: if the main cluster goes down, we still want to see the Server Admin Log and other documentation)
[17:35:29] it's not on a virtual machine
[17:35:32] oh
[17:35:37] well, a separate machine
[17:35:41] virt0 is the labs "master", used to be labsconsole
[17:35:41] (why call it virt?)
[17:35:45] ahhh
[17:35:53] labsconsole merged with wikitech
[17:35:57] * greg-g nods
[17:36:13] it still is the frontend to openstack nova
[17:36:53] i see, ok. i will educate myself about how to add a disk space check.
[17:36:54] virt0 (in contrary to virtN, N > 1) doesn't do virtualization at all actually, it's just control stuff
[17:37:02] ldap, keystone, wikitech etc.
[17:37:04] far too many things
[17:37:21] that explains why virt* is the only class i've found which counts from 0 instead of 1
[17:37:38] heh
[17:38:07] yeah it's a bit confusing
[17:38:12] the cluster is being moved to eqiad as we speak
[17:38:36] it's probably too late to rethink that decision, since it's a project on a very tight deadline
[17:38:41] but it won't hurt to ask
[17:39:00] s/virt/labs/ for the rest of them would be nice too
[17:39:23] but again, I wouldn't take the bet that it will happen :)
[17:40:06] being moved to eqiad as in the project which has been running for 12 months, or something more specific?
[17:40:28] i see several hosts without disk space checks, i'll do an audit
[17:40:48] (PS5) BryanDavis: Add logstash config for udp2log [operations/puppet] - https://gerrit.wikimedia.org/r/106154
[17:40:50] https://rt.wikimedia.org/Ticket/Display.html?id=6158
[17:40:52] labs being moved to eqiad, it's happening as we speak
[17:40:57] cool
[17:42:21] jgage: i'd expect disk space to be a default icinga check on pretty much everything
[17:42:34] "apparently not"
[17:43:14] hmm, yea, but "should" be like DPKG and SSH
[17:44:20] jgage: /puppet/modules/base/manifests/monitoring$ vi host.pp
[17:44:23] (yay for new guy asking the obvious questions that expose holes!) (seriously)
[17:44:31] maybe it's because those are missing NRPE
[17:44:46] greg-g: +1
[17:44:52] ugh nrpe
[17:45:16] nrpe::monitor_service { 'disk_space':
[17:45:25] nrpe_command => '/usr/lib/nagios/plugins/check_disk -w 6% -c 3% -l -e -A -i "/srv/sd[a-b][1-3]"'
[17:45:34] should be it
[17:46:46] you can check on neon if the Icinga command has been defined for this host, and then you can check on the host if the NRPE command has been added there
[17:47:09] one would result in just no checks, and the other in broken checks
[17:48:46] and the service you want running on the _monitored_ host is /etc/init.d/nagios-nrpe-server
[17:48:51] thanks mutante
[17:48:55] yw
[17:50:06] jgage: you cna copy that from other hosts too
[17:50:10] *can
[17:50:44] yeah
[17:50:56] the NRPE commands that are executed locally on the host are in /etc/nagios/nrpe_local.cfg or /etc/nagios/nrpe.de/
[17:51:01] .d
[17:51:08] heh nice finger macro
[17:55:29] mutante: i finished my linting modules project, i'd appreciate if you find time to review whenever
[17:57:17] (PS1) Jgreen: remove fundraising users from aluminium, they have migrated to lutetium [operations/puppet] - https://gerrit.wikimedia.org/r/107394
[17:59:14] (CR) Jgreen: [C: 2 V: 1] remove fundraising users from aluminium, they have migrated to lutetium [operations/puppet] - https://gerrit.wikimedia.org/r/107394 (owner: Jgreen)
[18:14:06] !log Created and populated sites and site_identifier tables on all wikisource projects
[18:14:13] Logged the message, Master
[18:33:29] paravoid, SoS?
[18:34:36] Scrum of Scrums
[18:36:37] <^demon> save our ship.
[18:37:01] Stop our sockpuppets
[18:37:15] scap our software
[18:37:26] shoot our selves
[18:38:40] <^demon> drdee++
[18:39:55] (PS1) Reedy: Kill 1.22wmfXX remnants [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107398
[18:40:12] (CR) Reedy: [C: 2] Kill 1.22wmfXX remnants [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107398 (owner: Reedy)
[18:40:20] (Merged) jenkins-bot: Kill 1.22wmfXX remnants [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107398 (owner: Reedy)
[18:46:20] (PS1) Jkrauska: Mark iptables as being soon deprecated. [operations/puppet] - https://gerrit.wikimedia.org/r/107399
[18:53:22] (CR) Ottomata: "See inline comments." (9 comments) [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/99381 (owner: Physikerwelt)
[18:54:26] (PS2) Ottomata: Log correctly encoded url with parameters for nginx [operations/puppet] - https://gerrit.wikimedia.org/r/105449 (owner: QChris)
[18:54:41] qchris and I are about to deploy a couple of nginx related logging changes, should be pretty simple
[18:54:47] speak now if we shouldn't!
[18:55:25] (PS2) Ottomata: Stop nginx from escaping the user agent [operations/puppet] - https://gerrit.wikimedia.org/r/105450 (owner: QChris)
[18:55:29] (PS2) Ottomata: Log outgoing X-Analytics header for nginx [operations/puppet] - https://gerrit.wikimedia.org/r/105451 (owner: QChris)
[18:56:36] ok, those are all just changes to the log format line
[18:56:43] i'm going to merge and run puppet on ssl1001
[18:56:44] qchris, s'ok?
[18:56:48] Ok.
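The NRPE stanza quoted at 17:45 hands its thresholds to check_disk as `-w 6% -c 3%`, i.e. warn when less than 6% of a filesystem is free and go critical below 3%. As a rough illustration of that classification only (not the plugin's actual code, and ignoring the `-l -e -A -i` flags), here is a minimal Python sketch mapping free space to the standard Nagios plugin exit codes. `classify` and `check_path` are hypothetical helper names, the 6/3 defaults just mirror the quoted command, and treating exactly 6% free as OK is an assumption about boundary handling.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(free_pct, warn_pct=6.0, crit_pct=3.0):
    """Map percent-free to a status; thresholds apply to FREE space, like -w 6% -c 3%."""
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_path(path, warn_pct=6.0, crit_pct=3.0):
    """Return (status, one-line message) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Missing path or unreadable filesystem: the check cannot decide.
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: filesystem reports zero size"
    free_pct = 100.0 * usage.free / usage.total
    status = classify(free_pct, warn_pct, crit_pct)
    return status, f"DISK {LABELS[status]} - {path} {free_pct:.1f}% free"
```

An NRPE-style wrapper around this would print the message and exit with the status code; Icinga on the monitoring host only sees that exit code and text, which is why a missing NRPE command on the monitored host shows up as a broken check rather than no check.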
[18:56:57] (CR) Ottomata: [C: 2 V: 2] Log correctly encoded url with parameters for nginx [operations/puppet] - https://gerrit.wikimedia.org/r/105449 (owner: QChris)
[18:57:10] (CR) Ottomata: [C: 2 V: 2] Stop nginx from escaping the user agent [operations/puppet] - https://gerrit.wikimedia.org/r/105450 (owner: QChris)
[18:57:16] (PS2) Reedy: All non wikipedias to 1.23wmf10 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107390
[18:57:21] (CR) Reedy: [C: 2] All non wikipedias to 1.23wmf10 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107390 (owner: Reedy)
[18:57:25] (CR) Ottomata: [C: 2 V: 2] Log outgoing X-Analytics header for nginx [operations/puppet] - https://gerrit.wikimedia.org/r/105451 (owner: QChris)
[18:58:18] (Merged) jenkins-bot: All non wikipedias to 1.23wmf10 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107390 (owner: Reedy)
[18:58:47] nginx?
[18:58:48] what for?
[18:58:59] we've wanted to remove udp2log from nginx for some time now
[18:59:23] paravoid: Yup. That's a separate problem.
[18:59:35] some time = 2 years or so
[18:59:39] But the current format is causing problems for webstatscollector consumers
[18:59:44] just remove it
[18:59:52] And they think we're seeing requests that we aren't seeing
[19:00:03] And create redirects for those URLs by hand
[19:00:05] kill it :)
[19:00:29] That would require updates to much of our infrastructure,
[19:00:38] as the non-ssl request comes from an internal IP
[19:00:47] and such requests are typically discarded
[19:00:53] by all of our legacy systems
[19:00:59] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: All non wikipedias to 1.23wmf10
[19:01:05] Logged the message, Master
[19:01:09] (Those suck at X-Forwarded-For handling)
[19:01:51] that's what everyone's been saying for two years now :)
[19:02:26] and we keep piling more work on top of it
[19:02:34] !log reedy updated /a/common to {{Gerrit|Ic854d268b}}: All non wikipedias to 1.23wmf10
[19:02:38] (PS1) Reedy: Closed Wikipedias to 1.23wmf10 too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107403
[19:02:40] Logged the message, Master
[19:02:48] (CR) Reedy: [C: 2] Closed Wikipedias to 1.23wmf10 too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107403 (owner: Reedy)
[19:02:58] Those changes just make it easier for the community :-)
[19:02:58] (Merged) jenkins-bot: Closed Wikipedias to 1.23wmf10 too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107403 (owner: Reedy)
[19:03:09] ok, leaving now
[19:03:10] byebye
[19:03:30] (nothing to do with this discussion, it's just 9pm)
[19:03:39] hehe, bybye!
[19:03:56] checks local date.. oh man it is.. bye :)
[19:04:04] bye :-)
[19:04:12] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Closed wikipedias to 1.23wmf10 too
[19:04:18] Logged the message, Master
[19:05:32] !log reedy synchronized php-1.23wmf10/extensions/Wikibase 'https://gerrit.wikimedia.org/r/#/c/107375'
[19:05:38] Logged the message, Master
[19:06:32] Reedy: so, categorytree... what the dillio?
[19:06:44] no idea
[19:06:46] I'm pretty confused
[19:06:53] !log deployed nginx log format changes on ssl*
[19:06:59] Logged the message, Master
[19:07:02] qchris, they are out!
[19:07:10] looking fine as far as I can tell
[19:07:14] Yay!
[19:07:17] Thanks ottomata
[19:07:39] It's confusing how it apparently didn't break stuff before
[19:08:30] yeah, so, what do we do? :)
[19:11:46] <^demon> greg-g: No more categories?
[19:13:12] (PS2) Nemo bis: [Planet] Fix Guillaume's blog feed URL [operations/puppet] - https://gerrit.wikimedia.org/r/107376
[19:14:42] (PS3) Nemo bis: [Planet] Fix Guillaume's blog feed URL [operations/puppet] - https://gerrit.wikimedia.org/r/107376
[19:15:58] A database query error has occurred. This may indicate a bug in the software.
[19:15:58] Function: LinksUpdate::incrTableUpdate
[19:15:58] Error: 1054 Unknown column 'el_id' in 'field list' (10.64.16.28)
[19:16:03] wtf?
[19:16:18] Ugh
[19:16:20] I bet I know
[19:16:27] What wiki?
[19:16:39] ^demon: my vote is to just kill categorytree
[19:16:42] commons
[19:16:44] it's dead software
[19:16:47] Reedy: meta
[19:16:55] <^demon> greg-g: It has...uses
[19:17:01] <^demon> :)
[19:17:06] commons and meta
[19:17:08] the grants people replaced its uses with... uh... another extension
[19:17:09] ^demon: wouldn't that disablecache config help?
[19:17:14] I cannot even create a page
[19:17:21] Hang on
[19:17:22] <^demon> Nemo_bis: No clue.
[19:17:25] greg-g: DPL? it's not enabled everywhere
[19:17:30] DynamicPageList
[19:17:33] ah
[19:17:34] <^demon> Ewww.
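The 19:00 to 19:01 exchange notes that requests terminated on the nginx SSL proxies reach the backends from an internal address, and that legacy log consumers discard them because they do not handle X-Forwarded-For. A common fix is to honour the header only when the direct peer is a trusted proxy, then walk it right to left until the first non-proxy hop. The Python sketch below shows that general technique; the 10.0.0.0/8 trusted range and the function names are assumptions for illustration, not how webstatscollector or udp2log actually behave.

```python
import ipaddress

# Assumption for illustration: internal proxies (e.g. SSL terminators) sit in
# 10.0.0.0/8. A real deployment would list its own proxy ranges here.
TRUSTED_PROXIES = (ipaddress.ip_network("10.0.0.0/8"),)


def is_trusted(addr, trusted=TRUSTED_PROXIES):
    """True if addr parses as an IP inside one of the trusted networks."""
    ip = ipaddress.ip_address(addr)  # raises ValueError on junk
    return any(ip in net for net in trusted)


def client_ip(remote_addr, xff_header, trusted=TRUSTED_PROXIES):
    """Best-effort originating client address for one request.

    Only honour X-Forwarded-For when the direct peer is a trusted proxy,
    then walk the header right to left (nearest hop first) and return the
    first address that is not itself a trusted proxy. Entries further left
    could have been forged by the client, so they are never consulted.
    """
    if not is_trusted(remote_addr, trusted):
        # Untrusted peers can write anything into the header, so ignore it.
        return remote_addr
    hops = [h.strip() for h in (xff_header or "").split(",") if h.strip()]
    for hop in reversed(hops):
        try:
            if not is_trusted(hop, trusted):
                return hop
        except ValueError:
            # Malformed entry: stop walking and fall back to the peer.
            return remote_addr
    # Empty header, or every hop was an internal proxy.
    return hops[0] if hops else remote_addr


print(client_ip("10.64.0.5", "198.51.100.1, 203.0.113.7"))  # 203.0.113.7
```

Walking from the right rather than taking the leftmost entry is the design choice that keeps a client from spoofing its own address by pre-filling the header.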
<^demon> DPL
[19:17:37] heh
[19:17:38] Reedy: poke me if there's any test I could do
[19:17:41] <^demon> DPL's soooo slow
[19:17:45] * greg-g nods
[19:17:54] CategoryTree >> DPL?
[19:17:58] ori ori ori! you see my mediawiki-vagrant patches? :D :D
[19:18:07] Tue Jan 14 19:16:26 UTC 2014 mw1054 commonswiki LinksUpdate::incrTableUpdate 10.64.32.29 1054 Unknown column 'el_id' in 'field list' (10.64.32.29)
[19:18:07] <^demon> Anyway, CategoryTree is also what gives us expanding trees in the category listing pages.
[19:18:14] oh, well then
[19:18:19] <^demon> Which are still kind of useful, even if you don't ever use tags.
[19:18:21] shouldn't that be, uh, merged into core?
[19:18:36] * greg-g utters the forbidden words
[19:18:36] to bring the cluster down more subtly? :P
[19:18:57] just thinking out loud here, trying to figure out what we can do here
[19:19:11] https://git.wikimedia.org/commit/mediawiki%2Fcore.git/a33048bb8b4a1c7717c0760acd03563bf937687d
[19:19:12] FFS
[19:19:24] https://gerrit.wikimedia.org/r/#/c/105243/
[19:20:11] <^demon> ew.
[19:20:23] 1 revert
[19:20:33] 2 reverts
[19:21:00] (PS6) BryanDavis: Kibana puppet class [operations/puppet] - https://gerrit.wikimedia.org/r/106169
[19:21:02] <^demon> red reverts, blue reverts?
[19:21:08] "this is a no-op" = "do not commit this no matter what"
[19:21:32] Vito: Thanks. fix to cluster incoming in a couple of minutes
[19:22:18] yvw Reedy
[19:22:27] that was a good catch
[19:22:30] apergos: :) https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=94410&oldid=94409 got another one coming later today ;)
[19:22:49] :-D
[19:23:03] * Reedy waits for Jenkins
[19:23:12] lol, lemme undelete some stuffs before the Internet will explode
[19:23:14] if you think it's a no-op that means (as devopsborat might have said 'no ops for you!')
[19:23:16] haha
[19:23:40] I'm wary whenever anyone says that to me, don't worry
[19:23:55] :-)
[19:27:42] (CR) BryanDavis: "All prior review questions/comments addressed." (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/106169 (owner: BryanDavis)
[19:27:43] !log reedy synchronized php-1.23wmf10/includes/deferred/LinksUpdate.php
[19:27:51] Logged the message, Master
[19:36:02] On that note....
[19:36:09] springle-away: How far through the el_id additions are you? :)
[19:49:26] (PS1) Aaron Schulz: Bump ParsoidCacheUpdateJobOnDependencyChange runners [operations/puppet] - https://gerrit.wikimedia.org/r/107420
[19:51:41] (PS2) Chad: Bump ParsoidCacheUpdateJobOnDependencyChange runners [operations/puppet] - https://gerrit.wikimedia.org/r/107420 (owner: Aaron Schulz)
[19:51:48] (CR) Chad: [C: 1] Bump ParsoidCacheUpdateJobOnDependencyChange runners [operations/puppet] - https://gerrit.wikimedia.org/r/107420 (owner: Aaron Schulz)
[19:52:20] (CR) GWicke: "So it seems that we can do this in steps:" [operations/puppet] - https://gerrit.wikimedia.org/r/106471 (owner: Subramanya Sastry)
[19:52:31] paravoid, ping
[19:56:02] (PS1) Gage: nrpe: enable on virt0 [operations/puppet] - https://gerrit.wikimedia.org/r/107424
[19:56:36] (CR) jenkins-bot: [V: -1] nrpe: enable on virt0 [operations/puppet] - https://gerrit.wikimedia.org/r/107424 (owner: Gage)
[19:57:57] hm why did that build fail?
[19:58:26] ah probably the comma
[20:05:12] (PS2) Gage: nrpe: enable on virt0 [operations/puppet] - https://gerrit.wikimedia.org/r/107424
[20:05:41] (CR) Aude: "sites table is populated" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107372 (owner: Aude)
[20:05:57] (PS2) Reedy: Enable Wikibase Client on Wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107372 (owner: Aude)
[20:05:59] (CR) Tim Landscheidt: [C: 1] gridengine: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107035 (owner: Matanya)
[20:06:03] (CR) Reedy: [C: 2] Enable Wikibase Client on Wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107372 (owner: Aude)
[20:06:12] (Merged) jenkins-bot: Enable Wikibase Client on Wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107372 (owner: Aude)
[20:10:08] (PS1) Aude: Update $wmgWikibaseEnableData setting for wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107426
[20:10:16] (CR) Matanya: [C: 1] nrpe: enable on virt0 [operations/puppet] - https://gerrit.wikimedia.org/r/107424 (owner: Gage)
[20:10:52] jgage: are you new?
[20:11:30] (CR) Subramanya Sastry: "To clarify, we will temporarily have two different deployment targets (each with its own repo)." [operations/puppet] - https://gerrit.wikimedia.org/r/106471 (owner: Subramanya Sastry)
[20:11:37] (CR) Reedy: [C: 2] Update $wmgWikibaseEnableData setting for wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107426 (owner: Aude)
[20:11:45] (Merged) jenkins-bot: Update $wmgWikibaseEnableData setting for wikisource [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107426 (owner: Aude)
[20:12:01] matanya, yes. hi :)
[20:12:06] thanks for the CR
[20:12:25] np, it won't help much, as i'm not ops
[20:12:40] !log reedy synchronized wmf-config/
[20:12:47] Logged the message, Master
[20:12:49] yay
[20:13:09] dblists aren't done yet ;)
[20:13:51] jgage: anyway, welcome! :)
[20:15:03] !log reedy synchronized database lists files: Enable Wikidata on wikisources
[20:15:09] site reliability engineer
[20:15:10] Logged the message, Master
[20:15:15] i see
[20:15:54] !log reedy synchronized wmf-config/InitialiseSettings.php 'touch'
[20:16:00] Logged the message, Master
[20:24:15] (PS1) Aude: re-enable data transclusion on commons (connected items only) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107430
[20:26:17] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files:
[20:26:24] Logged the message, Master
[20:27:16] !log reedy synchronized wmf-config/InitialiseSettings.php 'touch'
[20:27:23] Logged the message, Master
[20:28:33] !log reedy synchronized database lists files:
[20:28:40] Logged the message, Master
[20:30:43] mutante: matanya: this is weird. https://rt.wikimedia.org/Ticket/Display.html?id=1754 lists no requestor but then click People and there is a requestor
[20:31:33] also, the email's footer doesn't match the ticket creator
[20:31:36] abartov: ping?
[20:32:26] oh, i see he cut and pasted from something abartov wrote elsewhere. maybe zendesk or equivalent. but still it should show him as requestor!
[20:33:33] ok, apparently happens with other tickets too. https://rt.wikimedia.org/Ticket/Display.html?id=1960
[20:33:43] (PS4) Reedy: Create Chinese Wikivoyage (zhwikivoyage) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104355 (owner: Odder)
[20:34:21] (CR) Reedy: [C: 2] Create Chinese Wikivoyage (zhwikivoyage) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104355 (owner: Odder)
[20:34:29] (Merged) jenkins-bot: Create Chinese Wikivoyage (zhwikivoyage) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104355 (owner: Odder)
[20:34:54] chinese wikivoyage?
[20:34:56] wee zhwikivoyage
[20:35:08] You're all slacking
[20:35:09] https://zh.wikivoyage.org/wiki/Special:%E7%94%A8%E6%88%B7%E5%88%97%E8%A1%A8
[20:35:13] I've had an account there for 7 minutes
[20:35:14] needs sites table :)
[20:35:20] !log reedy synchronized database lists files:
[20:35:20] aude: Among other things
[20:35:25] yup
[20:35:26] add a wiki is a bit out of date it seems
[20:35:27] Logged the message, Master
[20:35:47] this is weird indeed jeremyb
[20:35:48] Reedy: told ya that on Dec 29
[20:36:09] !log reedy synchronized wmf-config/InitialiseSettings.php
[20:36:14] matanya: no longer with the first i linked now that i changed it but i found another test case first :)
[20:36:15] Logged the message, Master
[20:37:59] jeremyb: i still see it requestless
[20:38:08] matanya: which?
[20:38:14] both
[20:38:22] i don't believe you :)
[20:39:06] need a screenshot? :)
[20:40:50] aude: I guess PopulateSitesTable needs running now doesn't it?
[20:41:40] it shouldn't
[20:41:49] err for zhwikivoyage, yes
[20:42:05] <^demon> Reedy: zhwikivoyage. Are you importing some content from somewhere?
[20:42:10] <^demon> Or are they starting fresh?
[20:42:13] No idea
[20:42:21] https://bugzilla.wikimedia.org/show_bug.cgi?id=59077
[20:42:23] It's not mentioned
[20:42:38] <^demon> I'm looking at https://bugzilla.wikimedia.org/show_bug.cgi?id=59100
[20:42:42] <^demon> I asked there.
[20:42:57] matanya: i guess?? :)
[20:43:57] hi ^demon
[20:44:20] <^demon> Hi :)
[20:45:05] ^demon: I'm guessing a steward / importer / whatever this group is called will import pages from https://incubator.wikimedia.org/wiki/Category:Wy/zh
[20:45:21] jeremyb: where do you want it?
[20:45:38] (CR) Jgreen: [C: 1] Collection Renderer (Now a module!) [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[20:46:03] <^demon> twkozlowski: Gotcha. Anyway, the index is created, so any pages edited will get indexed.
[20:46:09] matanya: just the people box? then can go to imgur i guess?
[20:46:19] <^demon> I need to confirm the indexer runs when you import pages. If not, we'll want to run it manually again after importing.
[20:47:01] jeremyb: http://imgur.com/TJf8dYD
[20:47:54] (PS1) Jkrauska: Initial commit of pmacct module [operations/puppet] - https://gerrit.wikimedia.org/r/107457
[20:48:36] jeremyb: if you saw it let me know, i want to delete it
[20:48:46] i saw it
[20:49:34] matanya: refresh harder maybe? the updated timestamp is different here
[20:51:18] jeremyb: only closing the browser did it
[20:51:19] !log reedy synchronized php-1.23wmf10/resources 'touch'
[20:51:25] Logged the message, Master
[20:51:34] matanya: weird browser
[20:51:46] matanya: anyway, bug is still bug
[20:51:59] no, weird user, saving his cache forever
[20:52:02] (1960)
[20:52:05] yes
[20:53:45] ^demon: So what do you need me to do? Would a short note on the bug saying "OK, the import's done" be enough?
[20:54:25] <^demon> Yup, me and Nik are cc'd on the bug.
[20:54:39] <^demon> :)
[20:55:04] !log reedy synchronized php-1.23wmf10/extensions/Wikibase/lib/resources/wikibase.Site.js 'touch'
[20:55:10] Logged the message, Master
[20:56:38] ^demon: okay, I'll make sure a note's dropped there. Thanks for your help :)
[20:56:43] <^demon> yw
[21:00:19] Reedy, are you done deploying?
[21:01:37] MaxSem: I think so..
[21:01:57] thanks:)
[21:02:18] !log reedy synchronized wmf-config/interwiki.cdb 'Updating interwiki cache'
[21:02:25] Logged the message, Master
[21:02:28] (PS1) MaxSem: Add TextExtracts [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107476
[21:02:38] !log reedy updated /a/common to {{Gerrit|I24dd4a00e}}: Create Chinese Wikivoyage (zhwikivoyage)
[21:02:45] Logged the message, Master
[21:02:48] (PS1) Reedy: Update interwiki cache [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107477
[21:02:55] (CR) Reedy: [C: 2] Update interwiki cache [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107477 (owner: Reedy)
[21:03:05] (Abandoned) Aude: re-enable data transclusion on commons (connected items only) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107430 (owner: Aude)
[21:03:33] (Merged) jenkins-bot: Update interwiki cache [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107477 (owner: Reedy)
[21:11:15] (CR) MaxSem: [C: 2] Add TextExtracts [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107476 (owner: MaxSem)
[21:11:46] (Merged) jenkins-bot: Add TextExtracts [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107476 (owner: MaxSem)
[21:17:16] (PS1) Jgreen: add user gdubuc per RT 6619 [operations/puppet] - https://gerrit.wikimedia.org/r/107482
[21:19:50] (PS2) Jgreen: add user gdubuc per RT 6619 [operations/puppet] - https://gerrit.wikimedia.org/r/107482
[21:21:18] (CR) Jgreen: [C: 2 V: 1] add user gdubuc per RT 6619 [operations/puppet] - https://gerrit.wikimedia.org/r/107482 (owner: Jgreen)
[21:22:38] !log maxsem started scap
[21:22:44] Logged the message, Master
[21:24:43] (PS1) MaxSem: Enable TextExtracts on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107483
[21:24:54] !log reedy synchronized wikidataclient.dblist
[21:25:01] Logged the message, Master
[21:25:56] !log reedy synchronized wmf-config/InitialiseSettings.php 'touch'
[21:26:02] Logged the message, Master
[21:26:52] !log reedy updated /a/common to {{Gerrit|Ie5fe8bd7e}}: Update interwiki cache
[21:26:57] (PS1) Reedy: Re-add wikisources [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107484
[21:26:58] Logged the message, Master
[21:27:16] (CR) Reedy: [C: 2] Re-add wikisources [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107484 (owner: Reedy)
[21:27:24] (Merged) jenkins-bot: Re-add wikisources [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107484 (owner: Reedy)
[21:32:46] !log maxsem started scap: Preparing for TextExtracts deployment
[21:32:52] Logged the message, Master
[21:49:08] (CR) Lcarr: [C: -1] "see inline comments" (8 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[22:01:57] of course Aaron leaves right when max has issues
[22:01:58] !log maxsem finished scap: Preparing for TextExtracts deployment (duration: 30m 26s)
[22:02:04] Logged the message, Master
[22:04:47] (CR) BryanDavis: [C: 1] Enable TextExtracts on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107483 (owner: MaxSem)
[22:04:57] (CR) MaxSem: [C: 2] Enable TextExtracts on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107483 (owner: MaxSem)
[22:05:27] (Merged) jenkins-bot: Enable TextExtracts on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107483 (owner: MaxSem)
[22:08:18] !log maxsem synchronized wmf-config 'https://gerrit.wikimedia.org/r/107483'
[22:08:25] Logged the message, Master
[22:12:02] (CR) Jeremyb: [C: -1] Initial commit of pmacct module (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[22:12:04] (CR) Tim Landscheidt: Initial commit of pmacct module (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[22:12:25] (CR) Legoktm: [C: 1] Add global permissions for Flow [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106306 (owner: EBernhardson)
[22:13:32] !log maxsem synchronized wmf-config 'https://gerrit.wikimedia.org/r/107483'
[22:13:37] Logged the message, Master
[22:14:37] (CR) Matanya: "and more comments. overall layout is good." (9 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[22:17:35] (PS1) MaxSem: Enable TextExtracts everywhere [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107489
[22:17:41] PROBLEM - Varnishkafka Delivery Errors on cp3013 is CRITICAL: STALE
[22:17:47] ottomata: reviewed https://gerrit.wikimedia.org/r/#/c/107317/
[22:17:51] PROBLEM - Varnishkafka Delivery Errors on cp3014 is CRITICAL: STALE
[22:18:10] (CR) MaxSem: [C: 2] Enable TextExtracts everywhere [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107489 (owner: MaxSem)
[22:18:18] (Merged) jenkins-bot: Enable TextExtracts everywhere [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107489 (owner: MaxSem)
[22:19:31] PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: STALE
[22:19:31] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: STALE
[22:19:31] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: STALE
[22:19:32] PROBLEM - Varnishkafka Delivery Errors on cp4019 is CRITICAL: STALE
[22:19:32] PROBLEM - Kafka Broker Messages In on analytics1022 is CRITICAL: STALE
[22:19:32] PROBLEM - Varnishkafka Delivery Errors on cp4020 is CRITICAL: STALE
[22:20:01] PROBLEM - Varnishkafka Delivery Errors on cp4011 is CRITICAL: STALE
[22:20:08] !log maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/107489'
[22:20:11] PROBLEM - Varnishkafka Delivery Errors on cp4012 is CRITICAL: STALE
[22:20:13] Logged the message, Master
[22:20:33] (PS2) Jkrauska: Initial commit of pmacct module Initial commit of pmacct, with Leslie's comments addressed. Change-Id: I8bb1cfb204d8b7fcf775a549d004edb72f039edf [operations/puppet] - https://gerrit.wikimedia.org/r/107457
[22:24:49] (CR) Matanya: "please fix my comments from patchset one too." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[22:25:45] LeslieCarr: what junos is cr2 running?
[22:26:15] checking
[22:26:16] MaxSem: http://torrus.wikimedia.org/torrus/Network?nodeid=device//cr2-eqiad.wikimedia.org
[22:26:18] matanya:
[22:26:20] ^
[22:26:27] http://torrus.wikimedia.org/torrus/Network?path=/Core_routers/
[22:26:29] Torrus has all
[22:26:44] 10.4r7.5 matanya -- we can't upgrade it due to the fact that it's a spof for pmtpa
[22:26:45] thanks Reedy
[22:27:00] yeah, guessed that
[22:27:06] ori, thanks, new patchset submitted
[22:27:12] so soon to getting rid of tampa
[22:27:18] soooo soon
[22:27:24] i hear it for the last year or so
[22:27:27] haha
[22:27:33] It's getting closer and closer!
[22:27:45] in our hearts it's dead to us .... we're just looking at the rotting hulk.....
[22:27:46] Labs is one of the big things left I think
[22:28:03] yeah... that's gonna be "fun"
[22:28:07] i know, did some RT stuff and saw how we advanced!
[22:28:44] <^demon> What's the master rt ticket for tracking shutting things down in tampa?
[22:28:50] * ^demon is wondering what other than labs is still left
[22:28:51] 6009
[22:29:10] https://rt.wikimedia.org/Ticket/Display.html?id=6099
[22:29:11] (PS3) Jkrauska: Initial commit of pmacct module Initial commit of pmacct, with Leslie's comments addressed. Now with Matanya and Jeremyb's comments addressed. Change-Id: I8bb1cfb204d8b7fcf775a549d004edb72f039edf [operations/puppet] - https://gerrit.wikimedia.org/r/107457
[22:29:14] <^demon> "No permission to view ticket"
[22:29:14] <^demon> boo
[22:29:26] really? :/
[22:29:28] 6099 you should have permission
[22:29:35] It's in core-ops
[22:29:44] (CR) Jkrauska: "Jeremyb and Matanya: Welcome to Patch Set 3 !" [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[22:29:47] <^demon> user error.
[22:29:48] <^demon> 6009 != 6099
[22:30:02] (CR) jenkins-bot: [V: -1] Initial commit of pmacct module Initial commit of pmacct, with Leslie's comments addressed. Now with Matanya and Jeremyb's comments addressed. Change-Id: I8bb1cfb204d8b7fcf775a549d004edb72f039edf [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[22:30:05] it is called layer 8 issue ^demon
[22:30:24] :)
[22:30:35] (CR) GWicke: "To clarify the clarification: We won't not touch anything in the existing setup in the first step. It will just add a new repo and the ups" [operations/puppet] - https://gerrit.wikimedia.org/r/106471 (owner: Subramanya Sastry)
[22:30:52] speaking of ops
[22:31:12] I was thinking about power usage yesterday after that wikitech-l conversation about ARM
[22:31:33] what a waste of power was it
[22:31:35] (CR) Tim Landscheidt: Initial commit of pmacct module Initial commit of pmacct, with Leslie's comments addressed. Now with Matanya and Jeremyb's comments addresse (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[22:31:35] and rightsizing machines ?
[22:31:48] torrus has power strip graphs like http://torrus.wikimedia.org/torrus/Facilities?path=/Power_strips/eqiad/ps1-b6-eqiad/System/
[22:31:58] ARM is so not ready to replace x86 in production
[22:32:36] but that one in particular, b6-eqiad, seems a bit low for what's in it, considering the power measurements that the RAC gives
[22:32:38] i think rightsizing is actually a pretty good idea --- there's also the idea of possibly using openstack or virtualization in the cluster, so that many misc services could be "rightsized"
[22:33:04] do you know if all the servers in each rack are actually connected to the same power strip? or do they share when there's a lot of density?
[22:33:37] ottomata: oh, wait, how do I get the modules --?
[22:33:48] RobH: cmjohnson1 ^^ b6 does look fairly low in usage --- do we think there may be a reporting error ?
[22:34:13] TimStarling: they are all connected to the one servertech .... i'm wondering if there is a reporting error at the strip level or what is going on
[22:34:19] lesliecarr: nope..b6 is really b5 which is netapp rack
[22:34:23] oh!
[22:34:29] right...
[22:34:31] well that explains it
[22:34:34] that needs to be fixed
[22:34:41] isn't that in racktables LeslieCarr ?
[22:34:49] ori, see top of analytics class
[22:34:53] role::analytics
[22:34:54] so which one is wrong, racktables or torrus?
[22:35:42] it sounds like torrus is incorrect ?
[22:36:20] so B7 in torrus is B6 in reality?
[22:37:19] (PS4) Jkrauska: Initial commit of pmacct module Initial commit of pmacct, with Leslie's comments addressed. Now with Matanya and Jeremyb's comments addressed. Typo fix Change-Id: I8bb1cfb204d8b7fcf775a549d004edb72f039edf [operations/puppet] - https://gerrit.wikimedia.org/r/107457
[22:37:26] (CR) jenkins-bot: [V: -1] Initial commit of pmacct module Initial commit of pmacct, with Leslie's comments addressed. Now with Matanya and Jeremyb's comments addressed. Typo fix Change-Id: I8bb1cfb204d8b7fcf775a549d004edb72f039edf [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[22:37:43] timstarling: i am pretty confident that b5 and b6 are mixed up
[22:37:52] ok
[22:38:23] cajoel: ping
[22:38:39] ottomata: ah, right. why don't we just add the submodule to the repo?
[22:38:48] jeremyb: hello
[22:39:43] cajoel: 2 problems. 1) what tim said (don't write a story in the commit msg about how all the patchsets evolved
[22:39:49] caught that -- ok
[22:39:58] cajoel: not fixed though
[22:40:15] LeslieCarr: torrus is incorrect cuz the ips are wrong on the pdus
[22:40:31] oh yeah we can do that i suppose
[22:40:31] ok
[22:40:31] ok, so in a PS5, I should just revert the comment to initial commit?
[22:40:34] still have to run update though
[22:40:34] this was discovered when we thought we had a pdu issue
[22:40:35] ok
[22:40:35] cajoel: 2) first line should be at most ~50 chars (don't remember exactly, vim changes color at some point for me) and 2nd line should be blank. your second line is not blank
[22:40:36] ottomata: Could not find template 'hadoop/fair-scheduler.xml.erb' at /tmp/vagrant-puppet-1/manifests/roles.pp:831
[22:40:44] OHhh right, need to add that
[22:40:48] that is not in the cdh4 module
[22:40:52] that is a custom addition
[22:40:59] can I put that in templates/hadoop/...?
[22:41:03] that's where it is in ops/puppet
[22:41:04] cajoel: any extra elaboration can go starting on line 3. line 1 should be a complete thought
[22:41:16] ottomata: sure, why not?
[22:41:18] k
[22:41:31] i'm going to commit the submodule add too
[22:41:32] cajoel: sure, original msg is fine
[22:41:37] jeremyb: so init.pp meets that, but the others do not..
[22:41:50] ottomata: hang on -- do we want to do that, or do we want to just copy the module into the vagrant repo?
[22:41:54] jeremyb: my bad, I see now that the back and forth should happen in gerrit
[22:42:00] oof, no copy
[22:42:05] what if i want to change it?
[22:42:08] (which I do!)
[22:42:09] cajoel: huh? everything i said above is about the commit msg not about the diff
[22:42:14] that's the whole point of submodule
[22:42:15] s
[22:42:18] jeremyb: oh, thought you meant pp
[22:42:24] no
[22:42:37] ori, added templates/hadoop files
[22:42:38] try again
[22:42:38] ottomata: it's inelegant, yeah, but it keeps things simple. i don't mind if you +2 your own changes when updating cdh4 code in vagrant
[22:42:50] (PS5) Jkrauska: Initial commit of pmacct module Change-Id: I8bb1cfb204d8b7fcf775a549d004edb72f039edf [operations/puppet] - https://gerrit.wikimedia.org/r/107457
[22:42:53] submodule is fine too, I don't feel strongly about it
[22:42:57] oof, i do not want to have to deal with merging back and forth
[22:43:10] i'm also happy with making people add the submoudle manually liek that
[22:43:10] hmm
[22:43:32] ori, should I commit the submodule add? :p
[22:43:36] either way is fine with me
[22:43:41] people will still have to run update if so
[22:43:43] ottomata: well, you would still need to update the submodule commit
[22:43:43] yeahhhhh, probably should
[22:43:48] yeah
[22:43:52] but it would be frozen at a commit
[22:43:54] the submodule points to a specific ref, not HEAD
[22:44:00] so wouldn't change unless we mean it to
[22:44:00] yeah
[22:44:12] ok, yeah, i will commit, s'ok?
[22:44:17] yes
[22:44:38] (PS1) GWicke: WIP: Phase 1 of moving to new parsoid repo / upstart [operations/puppet] - https://gerrit.wikimedia.org/r/107492
[22:45:35] ok
[22:45:36] ori, done
[22:45:51] (CR) GWicke: "Step one is https://gerrit.wikimedia.org/r/107492." [operations/puppet] - https://gerrit.wikimedia.org/r/106471 (owner: Subramanya Sastry)
[22:47:59] cajoel: no, i said blank line 2
[22:48:30] ottomata: cool, testing
[22:48:41] (PS1) Jkrauska: Initial commit of pmacct module [operations/puppet] - https://gerrit.wikimedia.org/r/107493
[22:48:43] jeremyb: ps6
[22:48:48] hah
[22:49:45] I aim to please.
[22:51:06] cajoel: no
[22:51:23] what is this?
[22:51:36] what happened to commit --amend cajoel ?
[22:51:42] hrm
[22:51:51] cajoel: you now made a new changeset. you need to leave in the change-id line. just don't make it line 2. (so it can be line 3)
[22:51:56] $ git commit --amend -a
[22:52:24] so abandon 107493 now
[22:52:28] ok, so maybe with a fresh amend an dthe change set id on line 3, we'll be happy?
[22:52:53] first abandon the new one so you don't accidentally push to it :)
[22:52:56] (Abandoned) Jkrauska: Initial commit of pmacct module [operations/puppet] - https://gerrit.wikimedia.org/r/107493 (owner: Jkrauska)
[22:52:57] yep
[22:53:02] you don't need to touch the change-id
[22:53:39] (CR) Jeremyb: "see I8bb1cfb204d8b7fcf775a549d004edb72f039edf" [operations/puppet] - https://gerrit.wikimedia.org/r/107493 (owner: Jkrauska)
[22:53:46] ^demon: CirrusSearch.log seems kind of verbose
[22:53:51] (PS6) Jkrauska: Initial commit of pmacct module [operations/puppet] - https://gerrit.wikimedia.org/r/107457
[22:53:56] <^demon> AaronSchulz: I know.
[22:54:00] there we go...
[22:54:01] <^demon> We're splitting it into a couple of logs.
[22:54:06] <^demon> (As of master, we have)
[22:54:38] (this is my first review, thanks for the pointers guys)
[22:55:01] hah, grou p
[22:55:13] cajoel: mazel tov :)
[22:55:18] cajoel: are you familiar with puppet-lint?
[22:56:14] matanya: yes ran that a bunch in the early stages.. has it wandered far off?? it would have caught your alignment issues...
[22:56:49] <^demon> AaronSchulz: We'll have a couple of logs. "All searches," "Slow searches," "Failed updates" and then the general debug log for everything else.
[22:56:58] <^demon> With all searches in a different log it'll shut up a lot.
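The commit-message conventions jeremyb spells out above (a summary line of roughly 50 characters, a blank second line, the Change-Id line kept further down) can be checked mechanically. A minimal Python sketch, assuming the ~50-character limit he quoted from memory; `commit_message_problems` and `MAX_SUMMARY` are hypothetical names, not part of Gerrit or any Wikimedia tooling:

```python
from typing import List

# Roughly 50 characters: the limit was quoted from memory in channel
# ("don't remember exactly"), so treat it as an assumption.
MAX_SUMMARY = 50


def commit_message_problems(message: str) -> List[str]:
    """Return human-readable problems with a commit message (empty list if none)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["line 1 (summary) is empty"]

    problems = []
    # Line 1 should be a short, complete thought.
    if len(lines[0]) > MAX_SUMMARY:
        problems.append(f"line 1 is {len(lines[0])} chars (limit ~{MAX_SUMMARY})")
    # Line 2 must be blank; any elaboration starts on line 3.
    if len(lines) > 1 and lines[1].strip():
        problems.append("line 2 must be blank")
    # Keep the Change-Id footer so an amended commit becomes a new patch set
    # of the same Gerrit change rather than a brand-new change.
    if not any(line.startswith("Change-Id: I") for line in lines[1:]):
        problems.append("no Change-Id line after the summary")
    return problems


if __name__ == "__main__":
    bad = "Initial commit of pmacct module\nChange-Id: I8bb1cfb204d8b7fcf775a549d004edb72f039edf\n"
    good = "Initial commit of pmacct module\n\nChange-Id: I8bb1cfb204d8b7fcf775a549d004edb72f039edf\n"
    print(commit_message_problems(bad))   # ['line 2 must be blank']
    print(commit_message_problems(good))  # []
```

The first sample mirrors PS5, where the Change-Id sat on line 2 and so ended up inside the summary Gerrit displayed; the second mirrors the layout that PS6 was aiming for.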
[22:57:02] cajoel: yes, i wonder if you did it just before the commit
[22:57:11] PROBLEM - Varnishkafka Delivery Errors on cp4012 is CRITICAL: STALE
[22:57:31] PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: STALE
[22:57:31] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: STALE
[22:57:31] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: STALE
[22:57:31] PROBLEM - Kafka Broker Messages In on analytics1022 is CRITICAL: STALE
[22:57:31] PROBLEM - Varnishkafka Delivery Errors on cp4020 is CRITICAL: STALE
[22:57:31] PROBLEM - Varnishkafka Delivery Errors on cp4019 is CRITICAL: STALE
[22:57:41] PROBLEM - Varnishkafka Delivery Errors on cp3013 is CRITICAL: STALE
[22:57:43] here goes some fun
[22:57:46] matanya: I had not, and had made a few minor changes --- found another alignment of =>s error
[22:57:51] PROBLEM - Varnishkafka Delivery Errors on cp3014 is CRITICAL: STALE
[22:58:01] PROBLEM - Varnishkafka Delivery Errors on cp4011 is CRITICAL: STALE
[22:58:26] cajoel: must manifests don't pass 100% but errors should be in any case
[22:58:49] matanya:
[22:58:58] matanya: only thing left -- WARNING: string containing only a variable
[22:59:02] *most
[22:59:11] cajoel: you can fix it
[22:59:13] I assume you can just do a direct assignment? no quotes?
[22:59:22] I'd love to be lint free.
[22:59:23] 1s
[22:59:44] TIAS? ;)
[22:59:48] does that still necessitate the {}s ?
[22:59:54] cajoel: your best friend: http://puppet-lint.com/checks/
[23:00:06] sweet
[23:00:34] cajoel: does that answer your question?
[23:00:40] you betcha
[23:01:09] (PS1) MaxSem: Sync TextExtracts config with MobileFrontend [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107499
[23:02:00] (CR) MaxSem: [C: 2] Sync TextExtracts config with MobileFrontend [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107499 (owner: MaxSem)
[23:02:08] (Merged) jenkins-bot: Sync TextExtracts config with MobileFrontend [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107499 (owner: MaxSem)
[23:02:46] (PS7) Jkrauska: Initial commit of pmacct module [operations/puppet] - https://gerrit.wikimedia.org/r/107457
[23:03:03] (PS1) Ottomata: Fixes for ganglia types and non standard gmond aggregator ports [operations/software/ganglios] - https://gerrit.wikimedia.org/r/107501
[23:03:08] ottomata: https://dpaste.de/yuDV/raw
[23:03:10] matanya:
[23:03:16] matanya: lint-free puppet
[23:03:40] hmmm, ori those aren't enough logs to tell what's happening
[23:03:41] do this
[23:03:42] hm
[23:03:47] tail -f /var/log/hadoop*/*
[23:03:50] and then try to start it
[23:04:35] !log maxsem synchronized wmf-config/CommonSettings.php 'https://gerrit.wikimedia.org/r/107499'
[23:04:40] Logged the message, Master
[23:05:31] ottomata: https://dpaste.de/5yab/raw ; TL;DR: java.io.IOException: Failed on local exception: java.net.SocketException: Unresolved address; Host Details : local host is: "mediawiki-vagrant.corp.wikimedia.org"; destination host is: (unknown):0;
[23:05:53] (PS2) Ottomata: Fixes for ganglia types and non standard gmond aggregator ports [operations/software/ganglios] - https://gerrit.wikimedia.org/r/107501
[23:06:04] hmmmm
[23:06:12] where is it getting .corp.wikimedia.org?
[23:06:17] ha, no idea
[23:06:19] oh, my resolv.conf
[23:06:20] it uses $fqdn
[23:06:29] yeah, sorryi couldnt' make it work without fqdn
[23:06:37] there are too many places in cdh4 where that is needed
[23:06:41] (CR) Matanya: [C: 1] Initial commit of pmacct module [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[23:06:47] i'm in the office, so facter fqdn says "mediawiki-vagrant.corp.wikimedia.org"
[23:06:53] hmmm, ok
[23:06:56] so that's cool
[23:07:10] its totally weird that that changes though, i feel like we should make vagrant set it manually somehow
[23:07:29] yeah, my thought exactly
[23:08:35] ori, check out /etc/hadoop/conf/core-site.xml
[23:08:43] what is fs.defaultFS
[23:08:44] ?
[23:09:02] (CR) Lcarr: [C: 2] Initial commit of pmacct module [operations/puppet] - https://gerrit.wikimedia.org/r/107457 (owner: Jkrauska)
[23:09:40] LeslieCarr: are you doing the private puppet part?
[23:09:46] ottomata: hdfs://mediawiki-vagrant.corp.wikimedia.org/
[23:09:51] ottomata: going to try http://www.benjaminoakes.com/category/vagrant/
[23:10:16] jeremyb: yes i will
[23:10:32] LeslieCarr: want to do https://rt.wikimedia.org/Ticket/Display.html?id=6631 too then?
[23:10:54] sure
[23:10:54] :)
[23:11:43] (PS1) Chad: Add new CirrusSearch logs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107503
[23:11:51] ottomata: works!
[23:11:53] ottomata: notice: /Stage[main]/Cdh4::Hadoop::Namenode/Service[hadoop-hdfs-namenode]/ensure: ensure changed 'stopped' to 'running'
[23:12:12] i'll submit that in a separate commit
[23:12:45] (PS2) GWicke: WIP: Phase 1 of moving to new parsoid repo / upstart [operations/puppet] - https://gerrit.wikimedia.org/r/107492
[23:13:50] ok awesome,
[23:13:54] good find ori
[23:14:01] ori, as a final check
[23:14:05] if you get hive role happy
[23:14:13] launch hive shell
[23:14:13] hive
[23:14:14] and run
[23:14:16] show databases;
[23:14:19] just make sure that it works
[23:14:25] and doesn't crap out
[23:14:38] i wonder who gets the aliases cronmail? cajoel ?
[23:15:38] why do we not use the email notification feature in DRAC?
[23:16:12] abartov: you have mail
[23:16:43] ottomata: how do i run hive? it's not in my path
[23:17:00] TimStarling: I think the idea is we rather do icinga checks for notification, since its not as vendor dependent, but i may be wrong.
[23:17:13] ottomata: also, +1 / +2 for https://gerrit.wikimedia.org/r/#/c/107504/ plz
[23:17:19] which role did you include?
[23:17:29] but it would send out hw failure logs better than an icinga check.
[23:17:35] icinga won't tell you when a fan goes down
[23:17:45] ottomata: ah, d'oh. hadoop but not hive.
[23:17:46] well, we can setup checks for that stuff, but someone has to write it
[23:17:50] yeah, ori
[23:17:53] you can do just analytics role if you want
[23:17:55] to get everything :)
[23:17:58] there is software for dell reads, but its rpm as well
[23:18:07] where as you point out, drac email is just enable and its good
[23:18:17] (we'd have to also setup mail relay for mgmt network)
[23:18:22] but thats not that bad i dont think
[23:18:34] though, others may disagree.
[23:19:02] smtp.wikimedia.org is not reachable?
[23:21:20] dunno
[23:22:11] springle-away: are you secretly here ?
[23:22:45] ottomata: i think it worked: https://dpaste.de/032T/raw
[23:23:13] TimStarling: there's a smtp.pmtpa.wmnet but doesn't seem to exist for the other DCs. don't recall offhand if MTA has moved yet
[23:23:35] it has not
[23:23:54] its on our remaining services list(s) ;[
[23:24:05] I tried sending a test email out of mc1001.mgmt via smtp.wikimedia.org, it didn't work
[23:24:07] hah, smtp is sanger
[23:24:11] but yea, even if its not reachable, making it such shouldnt be that difficult.
[23:24:19] does sanger still exist?
[23:24:22] yes
[23:24:23] which db servers are the misc cluste rin eqiad ?
[23:24:26] its still our mail server
[23:24:27] =]
[23:27:17] ottomata: merged, but see my review message for some ideas for future commits
[23:27:25] LeslieCarr: db1001, db1016, db1046, db1048 according to site.pp
[23:27:41] that is role::coredb::m1 and ::m2
[23:28:19] ther is also role::coredb::x1, not sure what it is for
[23:28:29] thanks - we're talking about doing puppet defined database schema and was trying to see if we did that anywher ein our infrastructure (AFAICT, no(
[23:29:22] welp, there's my answer manifests/misc/maintenance.pp:354: $recipient = 'officeit@wikimedia.org'
[23:29:53] LeslieCarr: https://xkcd.com/859/
[23:30:21] cajoel: so, julie's on that list, right?
[23:30:47] )) fixed it
[23:30:49] gwicke: re: https://gerrit.wikimedia.org/r/#/c/107492, I'm not sure, but you may have to specify provider => init or provider => debian on the Service['parsoid'] resource
[23:31:05] thanks tim, now i can sleep tonight
[23:31:09] gwicke: otherwise how would Puppet know which one to manage?
[23:32:33] gwicke: you can use provider => base and specify the status and restart commands explicitly as resource parameters to avoid any possibility of ambiguity
[23:32:37] ori, the intention is to set this up for manual testing only in this step
[23:32:49] so we don't want puppet to call upstart just yet
[23:32:53] right, but it could
[23:33:02] that will happen after testing
[23:33:02] because you have service { 'parsoid': ensure => running }
[23:33:17] yes, that is using the init script and old repo
[23:33:25] if there is both an init.d script and an upstart job config in /etc/init, how does puppet know which to call?
[23:33:26] (PS1) Lcarr: Applying pmacct to netmon1001 [operations/puppet] - https://gerrit.wikimedia.org/r/107507
[23:33:52] possibly whatever 'service parsoid status' resolves to
[23:35:04] (CR) Lcarr: [C: 2] Applying pmacct to netmon1001 [operations/puppet] - https://gerrit.wikimedia.org/r/107507 (owner: Lcarr)
[23:35:05] ori: first guess is depends on the provider?
[23:35:25] jeremyb: but what if the provider is unspecified? the default service provider on debian/ubuntu is 'debian' IIRC
[23:35:41] ori, it says require => File['/etc/init.d/parsoid'] in the service block
[23:35:53] will that hint on the provider too?
[23:36:01] nope
[23:36:16] https://github.com/puppetlabs/puppet/blob/master/lib/puppet/provider/service/debian.rb
[23:36:26] so puppet tries to guess based on which config files are present?
[23:36:47] i think so, yes. it might not be puppet but a debian tool
[23:37:36] awesome!
[23:37:38] ori looks good
[23:37:42] in practice it might not matter much
[23:38:00] as 1) we won't deploy using the old repo any more, and 2) automatic restarts are disabled
[23:38:14] well, yeah, but if it finds the upstart service first, it will try to start it
[23:38:17] only explicit salt calls will attempt a restart
[23:38:20] so you could end up with both running in parallel
[23:38:31] RECOVERY - DPKG on searchidx1001 is OK: All packages OK
[23:38:42] hm
[23:39:09] it may be that it resolves to the init.d service by default, in which case this is moot, but still worth checking
[23:40:44] gwicke: for /sbin/service, upstart comes first
[23:40:48] if [ -r "/etc/init/${SERVICE}.conf" ]; then
[23:40:48] # Upstart configuration exists for this job
[23:40:53] # Otherwise, use the traditional sysvinit
[23:40:53] if [ -x "${SERVICEDIR}/${SERVICE}" ]; then
[23:41:02] hmm, ok
[23:41:08] I'll rename the puppet config name then
[23:41:15] so that it does not match the service name
[23:41:24] yeah, that should do the trick
[23:41:32] thanks for checking that
[23:41:36] np
[23:41:37] then ensure disabled the other one?
[23:41:52] jeremyb: you can just not declare a Service for it
[23:41:59] (PS1) Lcarr: requiring ferm class if ferm rule is called [operations/puppet] - https://gerrit.wikimedia.org/r/107509
[23:42:05] so puppet doesn't manage it
[23:42:21] ori: idk about this case. but it could be enabled (at system boot) some other way
[23:42:33] jeremyb: it's upstart, so it's based on the start-on condition
[23:43:09] well, good thing you said that
[23:43:13] gwicke: you want to change that too: start on (local-filesystem and net-device-up IFACE!=lo)
[23:43:18] (CR) Lcarr: [C: 2] requiring ferm class if ferm rule is called [operations/puppet] - https://gerrit.wikimedia.org/r/107509 (owner: Lcarr)
[23:43:35] gwicke: you can just omit the 'start on' directive entirely, which makes the job completely manual
[23:44:03] otherwise if the machine is restarted puppet will enable the sysvinit service and upstart will start the test one
[23:44:11] ori, it would be nice to make sure that parsoid keeps running even after a reboot
[23:44:18] (PS3) GWicke: WIP: Phase 1 of moving to new parsoid repo / upstart [operations/puppet] - https://gerrit.wikimedia.org/r/107492
[23:44:49] with the service name no longer matching I would expect the upstart job not to trigger automatically any more
[23:44:59] it will if the machine is rebooted
[23:45:11] because it specifies: start on (local-filesystem and net-device-up IFACE!=lo)
[23:45:17] the config file is now called parsoid-test.conf
[23:45:34] ah, from upstart
[23:45:36] yeah, this has nothing to do with any ambiguity with how 'parsoid' is resolved
[23:45:37] yep
[23:46:32] (PS1) Lcarr: cannot have two require statements [operations/puppet] - https://gerrit.wikimedia.org/r/107510
[23:46:39] we are sharing the upstart config with beta labs, so I can't simply hack that
[23:47:13] (CR) jenkins-bot: [V: -1] cannot have two require statements [operations/puppet] - https://gerrit.wikimedia.org/r/107510 (owner: Lcarr)
[23:47:57] I guess this is fine for now as long as we keep the test window small
[23:48:15] something like merge tomorrow, then test on some machine, then switch completely to upstart
[23:48:33] maybe add:
[23:48:35] that would make a machine restart in those 1-2 hours unlikely
[23:48:41] pre-start script
[23:48:44] /etc/init.d/parsoid status && { stop; exit 0; }
[23:48:50] end script
[23:48:57] (PS2) Lcarr: cannot have two require statements [operations/puppet] - https://gerrit.wikimedia.org/r/107510
[23:49:09] per http://upstart.ubuntu.com/cookbook/#pre-start
[23:49:46] though there's still a race condition if upstart runs first after a reboot, i guess
[23:50:01] yeah, I don't see how that will heelp
[23:50:04] anyways, up to you how much you want to worry about it
[23:50:16] it is for a few hours at most in any case
[23:50:18] (CR) Lcarr: [C: 2] cannot have two require statements [operations/puppet] - https://gerrit.wikimedia.org/r/107510 (owner: Lcarr)
[23:50:40] if a machine reboots in that time we can deal with it manually
[23:50:46] * ori nods
[23:52:31] (PS1) Lcarr: including class ferm (to make ferm rules work) [operations/puppet] - https://gerrit.wikimedia.org/r/107512
[23:53:22] (CR) Ori.livneh: [C: 1] WIP: Phase 1 of moving to new parsoid repo / upstart [operations/puppet] - https://gerrit.wikimedia.org/r/107492 (owner: GWicke)
[23:53:58] (CR) Lcarr: [C: 2] including class ferm (to make ferm rules work) [operations/puppet] - https://gerrit.wikimedia.org/r/107512 (owner: Lcarr)
[23:54:44] bad LeslieCarr mixing tabs and spaces!! (gerrit 107507)
[23:55:34] (PS1) Lcarr: Revert "requiring ferm class if ferm rule is called" [operations/puppet] - https://gerrit.wikimedia.org/r/107514
[23:55:45] (CR) jenkins-bot: [V: -1] Revert "requiring ferm class if ferm rule is called" [operations/puppet] - https://gerrit.wikimedia.org/r/107514 (owner: Lcarr)
[23:55:55] (Abandoned) Lcarr: Revert "requiring ferm class if ferm rule is called" [operations/puppet] - https://gerrit.wikimedia.org/r/107514 (owner: Lcarr)
[23:56:44] (PS1) Lcarr: requiring ferm winds up making cyclical dependancy [operations/puppet] - https://gerrit.wikimedia.org/r/107515
[23:57:12] did you look at the dependancy graph?
[23:57:40] i didn't look at the graph but it makes sense
[23:57:41] :)
[23:58:11] i was trying to get the ferm class to call itself it the ferm rule was called
[23:58:14] because turns out it won't :)
[23:58:34] (CR) Lcarr: [C: 2] requiring ferm winds up making cyclical dependancy [operations/puppet] - https://gerrit.wikimedia.org/r/107515 (owner: Lcarr)
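The /sbin/service excerpt ori pasted at 23:40 checks for a readable Upstart job before it falls back to an executable SysV script. A minimal Python sketch of that lookup order, assuming SERVICEDIR is /etc/init.d as on Ubuntu; `resolve_service` is a hypothetical helper that mirrors only the two tests quoted in channel, not the whole script, and says nothing about how Puppet's `debian` provider chooses a backend:

```python
import os


def resolve_service(service: str,
                    init_dir: str = "/etc/init",
                    sysv_dir: str = "/etc/init.d") -> str:
    """Report which init system /sbin/service would hand `service` to.

    Mirrors only the two checks quoted in channel:
      [ -r "/etc/init/${SERVICE}.conf" ]  -> Upstart job wins
      [ -x "${SERVICEDIR}/${SERVICE}" ]   -> otherwise the SysV init script
    """
    # Upstart first: a readable job definition named after the service.
    if os.access(os.path.join(init_dir, service + ".conf"), os.R_OK):
        return "upstart"
    # Then the traditional sysvinit script, which must be executable.
    if os.access(os.path.join(sysv_dir, service), os.X_OK):
        return "sysvinit"
    return "unknown"
```

With both /etc/init/parsoid.conf and an executable /etc/init.d/parsoid present, this returns "upstart" for "parsoid". After the job is renamed to parsoid-test.conf, "parsoid" resolves to "sysvinit" and "parsoid-test" to "upstart". As ori pointed out, the rename does not stop Upstart from starting parsoid-test at boot through its `start on` stanza.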