[08:04:31] paravoid: do you have time to do some puppet reviews today?
[08:11:09] matanya: I'm afraid not
[08:11:47] paravoid: ok, thank you. I would like to have a short personal talk with why you have 5 minutes
[08:11:54] *when
[08:12:03] let's do it now :)
[08:14:39] (PS2) Matanya: ganglia_new: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107128
[08:15:57] (PS7) Matanya: etherpad: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/107567
[08:29:06] (PS6) Matanya: beta: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108289
[08:30:46] (PS4) Matanya: coredb_mysql: puppet 3 compatibility fix: fully qualify variable [operations/puppet] - https://gerrit.wikimedia.org/r/108488
[08:33:21] (PS3) Matanya: Torrus: move from manutius to netmon1001 [operations/puppet] - https://gerrit.wikimedia.org/r/108314
[08:36:13] (PS3) Matanya: coredb_mysql: puppet 3 compatibility fix: fully qualify variable [operations/puppet] - https://gerrit.wikimedia.org/r/108313
[08:37:08] (PS3) Matanya: torrus: move into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108498
[08:38:51] (PS2) Matanya: realm: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/109074
[08:44:33] (PS1) Ori.livneh: LVS: add Icinga checks for critically important sysctl params [operations/puppet] - https://gerrit.wikimedia.org/r/111163
[08:48:37] good morning
[08:51:47] (PS3) Matanya: nfs: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109081
[08:51:48] hi hashar
[08:52:03] ori: hello
[08:52:22] (PS3) Matanya: gerrit: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109088
[08:52:38] yesterday summary score for player [ori] : -1 on easter egg removal +1 on migrating scap to python
[08:52:39] :D
[08:53:05] it's just one script of several
[08:54:13] we should probably think of working toward integration with salt
[08:54:32] i'll reply on the patch
[08:54:37] or Fab if it better suit our purposes
[08:56:46] (PS20) Matanya: site: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109507
[08:59:26] PROBLEM - Puppet freshness on cp3019 is CRITICAL: Last successful Puppet run was Mon 03 Feb 2014 08:55:41 PM UTC
[09:01:42] ori: integrate scap with salt?
[09:02:24] Ryan_Lane1: with $DEPLOYMENT_SYSTEM
[09:02:30] heh
[09:02:44] why not just move to the new system?
[09:02:59] I'll likely be replacing the perl code with trigger this week
[09:03:05] (PS7) Hashar: beta: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108289 (owner: Matanya)
[09:03:20] I'm in the last stages of testing that. it's packaged and everything
[09:03:35] if it works, sure
[09:03:40] (CR) Hashar: "Misc fix:" [operations/puppet] - https://gerrit.wikimedia.org/r/108289 (owner: Matanya)
[09:03:55] after doing so I can also add the web interface for reporting
[09:04:22] ah Ryan_Lane1 good morning :-]
[09:04:29] hashar: howdy
[09:04:33] it's like 1am here ;)
[09:04:58] was wondering how hard it is to create a labs project which isolate instances from the network
[09:05:13] would the security group be enough for that or does it need something on openstack side?
[09:05:19] should be easier with neutron
[09:05:28] which andrewbogott is working on for eqiad
[09:05:31] would have to be done manually or maybe provisioned via LDAP ?
[09:05:50] well, if we expose access to neutron, you could create your own network
[09:05:55] but I wouldn't recommend that
[09:06:26] What kind of isolation do you need that isn't accomplished via firewall?
[09:06:30] security groups are only somewhat helpful there
[09:06:49] andrewbogott: this would be for running jenkins slaves in lavs
[09:06:50] *labs
[09:06:51] my use case is to isolate an instance from the rest of our network but still let bastion/contint server to ssh to it and grant web access to the instance
[09:07:06] Ah, I see. Isolated in both directions :)
[09:07:10] good afternoon Singapore :-]
[09:07:18] will write a blueprint somewhere on the wiki
[09:07:45] it may not be a big deal. if someone wants to attack labs, they just need to get an account for the same level of privilege I guess
[09:08:03] another question Ryan_Lane1, is it possible to get access to OpenStack REST API with a user that would be able to create instances bypassing OpenStackManager ?
[09:08:17] yes. we've been hesitant to do it previously
[09:08:21] (PS3) Nemo bis: Relative path in varnish error message: remove excess / [operations/puppet] - https://gerrit.wikimedia.org/r/102945
[09:08:27] use case is OpenStack created a small daemon that can maintain a pool of instances to be consumed by Jenkins/Zuul
[09:08:29] because it wouldn't add DNS or puppet configuration for the instance
[09:08:46] Who touches varnish stuff apart from m.ark? https://gerrit.wikimedia.org/r/#/c/102945/
[09:08:52] if we use designate the dns issue goes away
[09:09:01] the instances can likely live without puppet
[09:09:27] sounds good
[09:09:35] gotta wait for eqiad to be ready anyway
[09:09:37] it would be ideal to have both, though :D
[09:10:03] ah puppet I would probably need it to provision the instance properly :(
[09:10:17] maybe not
[09:10:31] you could create an image that was private to the project
[09:10:40] using vmbuilder
[09:10:49] it could be pre-loaded with everything you need
[09:10:58] yeah that is another point, I would need to be able to refresh the image on a daily basis
[09:11:03] daily?
[09:11:16] cron @daily glance ....
[09:11:20] I'd say write some puppet modules that can be run via puppet apply
[09:11:33] put the puppet module into the image, and have it git pull on update
[09:11:36] err
[09:11:39] on instance start
[09:11:44] then run puppet apply
[09:11:59] then you'd only need to update the image occasionally for speed
[09:12:11] daily image updates would eat a ton of space
[09:12:17] still mean the instance will take a bunch of time to build
[09:12:26] depends on how much it changes
[09:12:51] well the instances are going to be deleted anyway, so that would free up space?
[09:12:57] nope
[09:13:08] * hashar is a newbie
[09:13:11] because the images would still be in glance
[09:13:16] you'd have to delete the images too
[09:13:25] it's doable, but it's quite a bit of work
[09:13:43] ok ok
[09:13:45] so, it's possible to upload your own custom images yourself, if we give access to do so
[09:13:49] and to delete them
[09:14:09] you could have jenkins build them and upload them
[09:14:22] and maybe have a cron to delete any older than 7 days or so
[09:16:22] yeah
[09:16:30] I guess that is enough for today. Sleep week Ryan_Lane1 :-]
[09:16:52] :)
[09:16:55] * Ryan_Lane1 waves
[09:22:40] (PS1) Andrew Bogott: Add a few more neutron packages, adjust sysctl settings. [operations/puppet] - https://gerrit.wikimedia.org/r/111165
[09:25:17] (PS8) Hashar: beta: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108289 (owner: Matanya)
[09:25:32] (CR) Hashar: "limited number of roles" [operations/puppet] - https://gerrit.wikimedia.org/r/108289 (owner: Matanya)
[09:25:45] Ryan_Lane1: speaking of images, any advice about how to get our glance images from virt0 to virt1000?
[09:25:57] I presume that I can't just rsync stuff since the db needs to know what's in there.
[09:26:13] (CR) Alexandros Kosiaris: [C: -1] "You are removing all usages of nrpe_check_disk_6_3. Why not remove the nrpe definitions as well?" (6 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/110844 (owner: Matanya)
[09:26:21] matanya: ^ there you go.
[09:26:44] btw this is going to need an nrpe change on my part. I am evaluating it.
[09:26:48] akosiaris: good morning :-D could you merge in https://gerrit.wikimedia.org/r/#/c/108289/ please that makes beta a module :-]
[09:27:09] thanks akosiaris
[09:27:14] akosiaris: will apply it right away and tweak the classes applied on instances
[09:28:24] (CR) Ori.livneh: "There are plusses and minuses that come with using Fabric." [operations/puppet] - https://gerrit.wikimedia.org/r/110904 (owner: Ori.livneh)
[09:29:27] hashar: it has no reviews ...
[09:29:47] andrewbogott: why bother?
[09:29:56] make new ones on virt1000 with the same names
[09:30:13] akosiaris or apergos, do you know? That's a pretty ugly error: Nemo_bis> Who touches varnish stuff apart from m.ark? https://gerrit.wikimedia.org/r/#/c/102945/
[09:30:16] akosiaris: the modularization has been made by matanya so I reviewed it already :d Did a few tweaks
[09:30:21] oh. hm. we're migrating instances
[09:30:24] Ryan_Lane1: well, when we transfer...
[09:30:25] yeah
[09:30:26] PROBLEM - Puppet freshness on cp3021 is CRITICAL: Last successful Puppet run was Mon 03 Feb 2014 09:26:27 PM UTC
[09:30:33] Nemo_bis: MaxSem has some varnish knowledge
[09:30:48] andrewbogott: dump the database
[09:30:52] copy the files over
[09:30:53] thanks, added
[09:30:58] upgrade glance
[09:31:02] akosiaris: are you planning to add the critical, or implement some other mechanism?
[09:31:02] (the schema)
[09:31:11] (CR) Nikerabbit: "This is a followup to Ie5e46a9feb I assume?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110970 (owner: Reedy)
[09:31:11] matanya: add the critical
[09:31:15] that should likely work ok
[09:31:28] Ryan_Lane1: hm… yeah, that'll probably work.
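
Note on the image retention idea above ("have a cron to delete any older than 7 days or so"): the selection step can be sketched in a few lines of Python. This is only an illustration, not what was deployed. The image names and the `images_to_delete` helper are hypothetical, and the actual listing and deletion through glance is deliberately left out.

```python
from datetime import datetime, timedelta


def images_to_delete(images, now, max_age_days=7):
    """Return the names of images created more than max_age_days before `now`.

    `images` maps a (hypothetical) image name to its creation time.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, created in images.items() if created < cutoff)


if __name__ == "__main__":
    # Made-up example data: three daily builds of a Jenkins slave image.
    now = datetime(2014, 2, 4, 9, 0)
    images = {
        "jenkins-slave-20140125": datetime(2014, 1, 25),
        "jenkins-slave-20140201": datetime(2014, 2, 1),
        "jenkins-slave-20140204": datetime(2014, 2, 4),
    }
    print(images_to_delete(images, now))
```

A daily cron job would feed the result to whatever tool actually removes the images, so space in glance is reclaimed even though the instances themselves are short-lived.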
[09:31:40] I'll give it a try
[09:31:46] I'm glad the new version has a sync for glance
[09:32:25] Yeah, it's a good idea, just too late to help me right now.
[09:32:29] heh. yep
[09:32:42] well, it'll help when we set up a region in the new dc
[09:33:01] yep.
[09:33:26] Do I even need to dump the database? Can I just copy those files too? This page, surprisingly, seems to think that that works. https://dev.mysql.com/doc/refman/5.0/en/copying-databases.html
[09:34:02] hm
[09:34:07] I'd do a dump
[09:34:14] in fact, dumps already exist
[09:34:22] Oh, sure, in backup
[09:34:23] in /a/backups
[09:35:21] ok, off to sleep
[09:35:23] * Ryan_Lane1 waves
[09:36:06] 'night!
[09:38:36] (PS2) Matanya: mail :lint [operations/puppet] - https://gerrit.wikimedia.org/r/109514
[09:39:13] (CR) jenkins-bot: [V: -1] mail :lint [operations/puppet] - https://gerrit.wikimedia.org/r/109514 (owner: Matanya)
[09:39:53] andrewbogott: if you create new images with glance, I could use one with ubuntu 14.04 :-D
[09:40:09] (filled a RT about it iirc)
[09:40:09] andrewbogott: +1
[09:40:16] ok -- I haven't done it before.
[09:40:23] let me find the doc
[09:41:18] https://wikitech.wikimedia.org/wiki/OpenStack#Building_new_images
[09:41:51] it doesn't say where to get .img from though :D
[09:42:27] I'm in the middle of something right now, sorry. I'll try to get to that soon.
[09:42:33] yeah no hurry
[09:42:40] just one more thing on your stack
[09:42:46] until it is full like Faidon one :-D
[09:44:39] Nemo_bis: I am failing to understand the problem. It adds indeed an extra slash (which is correct because it wants a protocol relative url). Also on any javascript aware browser will just reload due to the onclick.
[09:46:12] andrewbogott: for labs do you prefer bugs in Wikimedia Labs > Infrastructure or RT ticket to ops-requests ?
[09:46:25] bugzilla mostly.
[09:46:54] akosiaris: it's not a protocol relative URL. Have you tried looking at the link?
[09:47:03] (CR) Ori.livneh: "...basically, I think we should stick to shelling out to dsh, even though it's horrible and gross, until we've finished Pythonizing the re" [operations/puppet] - https://gerrit.wikimedia.org/r/110904 (owner: Ori.livneh)
[09:47:30] (PS3) Matanya: mail :lint [operations/puppet] - https://gerrit.wikimedia.org/r/109514
[09:47:32] akosiaris: andrewbogott: the request for Ubuntu 14.04 is in bugzilla https://bugzilla.wikimedia.org/show_bug.cgi?id=60684
[09:48:08] (PS4) Nemo bis: Relative path in varnish error message: remove excess / [operations/puppet] - https://gerrit.wikimedia.org/r/102945
[09:48:09] (CR) jenkins-bot: [V: -1] mail :lint [operations/puppet] - https://gerrit.wikimedia.org/r/109514 (owner: Matanya)
[09:48:10] Nemo_bis: it is
[09:48:28] Yes, which is obviously wrong. :)
[09:48:39] "wiki" is not a domain
[09:48:55] obviously
[09:49:03] but why does it matter ?
[09:49:39] hmmm now I see what you mean
[09:50:26] (PS4) Matanya: mail :lint [operations/puppet] - https://gerrit.wikimedia.org/r/109514
[09:50:36] Firefox even refuses to do anything when I click such a malformed link.
[09:51:13] heh ? doesn't javascript kick in ?
[09:51:25] or do you have it disabled ?
[09:51:42] It's enabled on that domain.
[09:52:54] (CR) Mark Bergsma: [C: 1] "Awesome." [operations/puppet] - https://gerrit.wikimedia.org/r/111163 (owner: Ori.livneh)
[09:53:37] mark: i felt pretty awful reading your e-mail.. sorry about that. not quire sure how we missed it.
[09:53:42] *quite
[09:54:16] Nemo_bis: still, removing the / won't solve anything.
[09:54:22] don't worry about it... in dutch we have a saying: "waar gehakt wordt vallen spaanders" ;)
[09:54:31] let's see if I can find an english translation of it hehe
[09:54:54] an omelette without breaking eggs?
[09:55:01] the dutch one is better ;)
[09:55:25] we have that saying too
[09:55:30] akosiaris: why not?
[09:56:30] heh
[09:59:02] mark, http://www.slate.com/blogs/lexicon_valley/2013/12/30/english_idioms_it_may_be_true_that_you_can_t_make_an_omelet_without_breaking.html
[09:59:31] Nemo_bis: cause it is not my day today... It will obviously solve it.
[09:59:34] (I do not share that writer's opinion about banning the phrase, just funny that many languages claim it for their own)
[09:59:43] i told you the dutch one is better ;)
[10:00:34] mark: mind if I merge https://gerrit.wikimedia.org/r/#/c/102945 ?
[10:01:12] (CR) Andrew Bogott: [C: 2] Add a few more neutron packages, adjust sysctl settings. [operations/puppet] - https://gerrit.wikimedia.org/r/111165 (owner: Andrew Bogott)
[10:01:22] akosiaris: go ahead
[10:01:45] Nemo_bis: btw my firefox does try to reload despite the malformed link.
[10:02:16] akosiaris: mine shows some spinning but doesn't actually do anything (nothing visible or logged).
[10:02:51] then i'd say it works as expected.
[10:03:53] (CR) Alexandros Kosiaris: [C: 2] Relative path in varnish error message: remove excess / [operations/puppet] - https://gerrit.wikimedia.org/r/102945 (owner: Nemo bis)
[10:06:10] (PS4) Matanya: mysql: change nrpe monitoring to use nrpe::monitor [operations/puppet] - https://gerrit.wikimedia.org/r/110844
[10:07:00] (PS2) Matanya: swift: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109625
[10:07:09] mark, I already have an appointment with Leslie to sort this out, but in case there's an easy answer: How can assign additional IPs to additional nics on a server? I know how to add them to the dns repo, I think, but it's unclear to me what happens if multiple fixed IPs are assigned to one box.
[10:07:55] so for multiple ips to the same interface you can use interface::ip
[10:08:06] if it's for a different interface, we don't currently have puppet config for that
[10:08:19] that's really a matter of configuring it in /etc/network/interfaces
[10:08:36] In this case it's different interfaces -- three IPs, three interfaces.
[10:08:51] Ok, but the server will pick up those IPs from dhcp, right? So they have to be assigned externally first?
[10:09:06] well...
[10:09:12] dhcp is only used during the installation phase
[10:09:16] and should not be used after
[10:09:24] oh, ok.
[10:09:33] the installer converts the dhcp ip into a static ip
[10:09:40] so you should only do that for eth0, the "system" ip
[10:09:49] other nics normally have different functions
[10:09:56] have a look at site.pp for lvs1001 etc
[10:10:02] those configure ips on additional nics
[10:10:06] I think you could use that
[10:10:34] hmm it uses interface::tagged though
[10:10:39] which I think is what we used anyway
[10:10:46] do you know what a "tagged" interface is?
[10:10:49] 802.1Q tagging
[10:10:54] Where do I document the fact that these additional IPs are 'taken'?
[10:11:01] in reverse dns
[10:11:02] (I don't know what a tagged interface is, yet.)
[10:11:17] ok, 802.1Q tagging is vlan tagging, basically multiplexing of vlans
[10:11:17] ok.
[10:11:33] so using tagging you can have traffic for multiple virtual lans across the same link/interface
[10:11:46] a tag is nothing more than a prefix to the message saying "this packet belongs to subnet X"
[10:12:02] I don't need that, though, do I? Since I have multiple interfaces anyway?
[10:12:04] so e.g. the LVS servers sit in all vlans we have realservers in
[10:12:09] yeah but you do want that anyway
[10:12:12] since you may want it later
[10:12:14] and it's "cleaner"
[10:12:19] we did it in tampa that way
[10:12:24] even though I think only one vlan was used so far
[10:12:45] so, iirc, the tampa hosts have eth1 configured with tagging
[10:12:47] but in any case don't use untagged vlans
[10:12:55] ?
[10:13:21] if this is andrewbogott's setup, an untagged vlan won't work
[10:13:39] i'm not sure what you mean
[10:14:22] andrewbogott is trying to have multiple interfaces having muliple source traffic, correct?
[10:14:58] For reference, I'm trying to set up what is specified in the first 'note' on this page: http://docs.openstack.org/trunk/install-guide/install/apt/content/neutron-install.dedicated-network-node.html right up top
[10:16:36] oh, then ignore me
[10:16:46] ok so
[10:16:53] linux treats a tagged interface as a separate nic
[10:16:55] sorry for interupet
[10:17:07] so eth1.2 is "all packets arriving/sending on eth1 which have vlan tag 2"
[10:17:36] so you can treat eth1.2 as if it were a dedicated nic (eth1) that is just connected to a subnet vlan 2 as normal
[10:18:39] mark: That means I'm doubling up the traffic on eth1 though. Isn't that worse than using the separate nics?
[10:18:48] Maybe I'm misunderstanding
[10:18:53] why would it double up traffic?
[10:19:31] I feel like you're describing a setup that maps multiple channels onto a single nic, leaving the box's other nics idle.
[10:19:42] no
[10:19:50] it's also exactly the same as in tampa
[10:19:51] i.e.
[10:20:00] eth0 is used as it always is
[10:20:17] and eth1 is used for inter-host communication (e.g. between instances on different nodes)
[10:20:25] the fact that a vlan tag is added doesn't matter
[10:20:31] it just provides additional flexibility later
[10:20:40] should we need to expand labs to multiple vlans, or between multiple data center rows
[10:20:43] we haven't needed that in tampa yet
[10:20:50] but it's not complex to add now, and is hard to change later
[10:21:45] Ah, ok -- you're talking about a /different/ nic with a tag. Sorry, was distracted by the tag, forgot you were now talking about eth1 instead of eth0
[10:22:59] yeah
[10:23:09] so if we set this up, which is pretty easy with puppet
[10:23:18] you can just think of eth1. as if it were eth1
[10:23:26] (PS1) Andrew Bogott: Grab a couple more IPs for labnet1001 [operations/dns] - https://gerrit.wikimedia.org/r/111169
[10:23:26] So, my attempt to designate additional IPs:
[10:23:33] Is that the right idea?
[10:24:00] i'm not sure what you're trying to do there
[10:24:30] ok :)
[10:24:32] why do you need multiple ips in the same subnet?
[10:25:32] Sorry, I'm lost. Is your point that they should be in different subnets? Or that that's not the place to do this at all?
[10:25:59] so, normally the only reason you need multiple ips
[10:26:18] is because you need to do different things on the same port (say port 80) on an ip
[10:26:23] if they are in the same subnet
[10:26:33] if they are in different subnets, it is to communicate across subnets
[10:26:43] i think i'm not being clear, sorry ;)
[10:26:55] perhaps I should ask
[10:26:59] why do you think you need multiple ips?
[10:27:30] Ah! Because the docs say so :) Here's that link again…. http://docs.openstack.org/trunk/install-guide/install/apt/content/neutron-install.dedicated-network-node.html
[10:27:50] A few lines down (after saying that I need three nics) it says "All nics need static IPs"
[10:28:02] So, I'm just blindly complying, so far.
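
Note on the 802.1Q explanation above ("a tag is nothing more than a prefix to the message", and "eth1.2 is all packets ... on eth1 which have vlan tag 2"): a small Python sketch may make both ideas concrete. It builds the 4-byte 802.1Q tag (TPID 0x8100 followed by priority, DEI and a 12-bit VLAN ID) and splits a Linux-style tagged interface name into parent NIC and VLAN ID. The helper names `build_vlan_tag` and `parse_tagged_name` are made up for illustration. Nothing here configures a real interface.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks a frame as 802.1Q-tagged


def build_vlan_tag(vlan_id, priority=0, dei=0):
    """Return the 4-byte 802.1Q tag inserted into an Ethernet header.

    Layout: 16-bit TPID, then 3-bit priority, 1-bit DEI, 12-bit VLAN ID.
    """
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_tagged_name(name):
    """Split a name like 'eth1.2' into ('eth1', 2); untagged names give (name, None)."""
    parent, sep, vlan = name.partition(".")
    if not sep:
        return parent, None
    return parent, int(vlan)


if __name__ == "__main__":
    print(build_vlan_tag(2).hex())        # tag for VLAN 2
    print(parse_tagged_name("eth1.2"))    # tagged sub-interface
    print(parse_tagged_name("eth0"))      # plain interface
```

The tag only adds four bytes per frame, which is why tagging eth1 does not "double up" traffic: the same packets cross the same link, they just carry a label saying which VLAN they belong to.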
[10:28:08] hehe
[10:28:16] I think I should find some time to look over this with you
[10:28:16] (CR) Hashar: "I have copy pasted ori comments on wikitech at https://wikitech.wikimedia.org/wiki/Fabric for later references." [operations/puppet] - https://gerrit.wikimedia.org/r/110904 (owner: Ori.livneh)
[10:28:36] because it's already complex enough without the openstack specific terminology
[10:29:01] but right now I can say that allocating multiple ips within the _same_ subnet is probably not what you need
[10:29:10] Later on there are steps like "Configure the EXTERNAL_INTERFACE without an IP address and in promiscuous mode. Additionally, you must set the newly created br-ex interface to have the IP address that formerly belonged to EXTERNAL_INTERFACE."
[10:29:16] yes
[10:29:23] So, further refs to each nic having an IP.
[10:29:23] i'm not sure yet what EXTERNAL_INTERFACE means in openstack speak
[10:29:26] do you know?
[10:29:32] we should definitely draw some diagrams
[10:30:10] It's awkward because there's clearly an editing mistake here. But it says "The management network handles communication among nodes. The data network handles communication coming to and from VMs. The external NIC connects the network node, and optionally to the controller node, so your VMs can connect to the outside world."
[10:30:18] Which makes modest amounts of sense to me.
[10:30:30] The 'management' network is what I would call the normal connection, eth0
[10:30:49] 'data' and 'external' are the additional OpenStack-specific bits that I need to add.
[10:30:53] yeah it's confusing
[10:31:17] (PS2) Matanya: dns: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109632
[10:31:50] external nic is almost certainly eth0 then
[10:32:12] and the data network is almost certainly eth1.
[10:32:15] (PS2) Matanya: varnish: puppet 3 compatibility fix: correct variable [operations/puppet] - https://gerrit.wikimedia.org/r/109869
[10:32:16] but i'll have to check
[10:32:35] (PS2) Matanya: facilities: lint [operations/puppet] - https://gerrit.wikimedia.org/r/110339
[10:33:00] (PS2) Matanya: certs: lint [operations/puppet] - https://gerrit.wikimedia.org/r/110366
[10:34:01] i would say "management network" and "external nic" are the same, eth0
[10:34:15] we use it both to manage the hosts (with puppet and all)
[10:34:30] and right now it's also used for sending off traffic from the network node to the internet
[10:34:40] there's hardly any reason to separate them, especially with 10G
[10:36:11] ok, I thought I saw a reason here, let me find the note...
[10:36:59] Ah, ok, this: "The host must have an IP address associated with an interface other than EXTERNAL_INTERFACE, and your remote terminal session must be associated with this other IP address." http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plug-in.ovs.html
[10:37:00] (PS2) Matanya: emery: RT #6143 move two logs to erbium [operations/puppet] - https://gerrit.wikimedia.org/r/110382
[10:37:12] That suggests that they can't be the same, at least to follow this guide.
[10:37:41] (PS5) Matanya: webserver: lint [operations/puppet] - https://gerrit.wikimedia.org/r/110454
[10:37:46] (PS1) Ori.livneh: package-builder.pp: add admonition to not refactor or lint this module [operations/puppet] - https://gerrit.wikimedia.org/r/111170
[10:38:07] so
[10:38:19] the network node(s) will have at least one additional interface ip
[10:38:20] for NAT
[10:38:32] so any instance that doesn't have its own public ip
[10:38:38] will share in the SNAT ip for outgoing traffic
[10:39:16] sure. That sounds like 'External' to me.
[10:39:30] (CR) Ori.livneh: [C: 2] "No-op." [operations/puppet] - https://gerrit.wikimedia.org/r/111170 (owner: Ori.livneh)
[10:40:00] yes
[10:40:26] but that NAT ip will be in a very different ip subnet/prefix
[10:40:49] ok
[10:41:07] so, have a look at how tampa is configured currently
[10:41:15] even though it's different openstack component
[10:41:21] i think the setup will be very similar
[10:41:39] what is the network node in tampa currently?
[10:42:11] virt2
[10:43:01] right
[10:43:05] have a look at role::nova::network
[10:43:19] it sets up the bonding, which we won't need with 10G
[10:43:23] and it sets up the tagged interface
[10:44:02] not quite correctly I think, we should just use interface::tagged now
[10:44:03] but anyway
[10:44:11] you see there that it sets up the "SNAT" ip for the network node
[10:44:26] ah no
[10:44:35] it looks like the interface is actually managed by openstack itself
[10:44:43] so openstack currently configures the ip to that interface
[10:44:51] and puppet only assigns the SNAT ip to the loopback interface for other reasons
[10:45:22] which seems to be configured in role::nova::config
[10:45:26] that will be different now i'm sure
[10:47:46] So, are you still thinking that we don't need to use three interfaces?
[10:47:55] absolutely
[10:48:02] and if we do need 3
[10:48:06] we'll use tagging
[10:48:28] so we might end up with say, eth0, eth1.1101, eth1.1102 or something
[10:48:41] that's 3 interfaces right there, even though it's only 2 physical ones
[10:48:45] Ah, ok, so that's effectively three.
[10:48:47] yes
[10:48:56] but i don't think we'll want that
[10:49:04] That's fine, I'm not bothered by whether or not there are three /physical/ networks.
[10:49:05] i'll try to find some time to read these docs too
[10:49:07] Well...
[10:49:09] so I understand the openstack terms
[10:49:21] ok. Because it's going to be very hard for me to follow the setup guide if at step one we're already not doing what it says :)
[10:49:33] i know what the network should look like, but it's difficult to know what openstack expects and what it manages itself
[10:49:40] yes i understand
[10:51:11] (PS2) Matanya: nrpe: remove hard coded disk checks [operations/puppet] - https://gerrit.wikimedia.org/r/110880
[11:05:39] (PS1) Springle: repool db1027 in s3, warm up [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111173
[11:06:15] (CR) Springle: [C: 2] repool db1027 in s3, warm up [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111173 (owner: Springle)
[11:06:23] (Merged) jenkins-bot: repool db1027 in s3, warm up [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111173 (owner: Springle)
[11:08:04] !log springle synchronized wmf-config/db-eqiad.php 'repool db1027 in s3, warm up'
[11:08:12] Logged the message, Master
[11:15:27] (PS1) Springle: increase db1059 load (96G ram compared to 64G for s4 siblings) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111175
[11:15:50] (CR) Springle: [C: 2] increase db1059 load (96G ram compared to 64G for s4 siblings) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111175 (owner: Springle)
[11:15:56] (Merged) jenkins-bot: increase db1059 load (96G ram compared to 64G for s4 siblings) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111175 (owner: Springle)
[11:17:18] !log springle synchronized wmf-config/db-eqiad.php 'db1059 LB increase'
[11:17:26] Logged the message, Master
[11:28:42] (PS9) Alexandros Kosiaris: beta: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108289 (owner: Matanya)
[11:38:57] (CR) Alexandros Kosiaris: "avoid the include class1,\n class2,\n class3 pattern" [operations/puppet] - https://gerrit.wikimedia.org/r/108289 (owner: Matanya)
[11:39:11] (CR) Alexandros Kosiaris: [C: 2] beta: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108289 (owner: Matanya)
[11:53:04] (CR) Byfserag: [C: 1] "Per TTO" [operations/apache-config] - https://gerrit.wikimedia.org/r/110155 (owner: TTO)
[12:00:26] PROBLEM - Puppet freshness on cp3019 is CRITICAL: Last successful Puppet run was Mon 03 Feb 2014 08:55:41 PM UTC
[12:05:06] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 70.433334
[12:18:06] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:23:46] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 1.223 second response time
[12:27:06] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 243.366669
[12:29:06] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:31:26] PROBLEM - Puppet freshness on cp3021 is CRITICAL: Last successful Puppet run was Mon 03 Feb 2014 09:26:27 PM UTC
[12:35:17] mark: This should be an easier one… do you know what (if anything) is blocking http and https on virt1000? I have apache up but still no access.
[12:35:59] there's no firewalling by the network equipment if that's what you mean
[12:36:13] or hm
[12:36:16] for labs maybe there is
[12:36:18] let me check
[12:36:35] Ryan said that the website was 'disabled' and I thought that just meant he had stopped apache… until just now.
[12:36:37] thanks
[12:38:59] so right now there is no filtering even though there should be
[12:39:22] just tcpdump port 80 and see if you get packets when you telnet to it?
[12:40:55] Oh, yep, packets are getting through. So, not a filtering issue.
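
Note on the "tcpdump port 80 ... when you telnet to it" check above: the telnet half can be approximated with a short Python probe. This is a hedged sketch. The `probe` helper is hypothetical, and it only classifies what the client sees; it does not replace watching tcpdump on the server. Roughly, an immediate refusal suggests packets reach the host but nothing is listening on the port, while a timeout is more consistent with something dropping the packets.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and report 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but no listener on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: possibly filtered or host down.
        return "timeout"
    except OSError:
        # Anything else, e.g. name resolution failure or unreachable network.
        return "error"


if __name__ == "__main__":
    # Hypothetical target; substitute the host and port under investigation.
    print(probe("127.0.0.1", 80, timeout=1.0))
```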
[12:43:55] (PS4) Ori.livneh: Rewrite 'scap' script in Python [operations/puppet] - https://gerrit.wikimedia.org/r/110904
[12:44:06] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 86.866669
[12:44:33] (CR) Ori.livneh: [C: -1] "Needs more work" [operations/puppet] - https://gerrit.wikimedia.org/r/110904 (owner: Ori.livneh)
[12:45:06] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:50:46] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Cannot make SSL connection
[12:53:34] (PS1) Matanya: bots: minor lint [operations/puppet] - https://gerrit.wikimedia.org/r/111181
[13:02:43] akosiaris: modulrize sudo is ok?
[13:04:10] matanya: seems so
[13:04:33] ok, i'll do. any hints before i start i should know?
[13:19:06] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 511.733337
[13:21:06] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:21:51] re
[13:24:06] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 151.199997
[13:26:06] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:34:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC
[13:35:10] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 121.400002
[13:35:49] (CR) Alexandros Kosiaris: [C: -1] etherpad: convert into a module (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/107567 (owner: Matanya)
[13:36:10] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:36:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC
[13:38:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC
[13:40:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC
[13:40:54] akosiaris: matanya: thanks for the beta modularization :-]
[13:41:10] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 21.466667
[13:41:20] credit mostly to akosiaris
[13:42:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC
[13:43:08] matanya: ?? that one i don't think i did much about. It is all you :-)
[13:43:10] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:43:39] * paravoid looks at the puppet3 doc
[13:44:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC
[13:46:51] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC
[13:47:15] (CR) Faidon Liambotis: [C: -1] coredb_mysql: puppet 3 compatibility fix: fully qualify variable (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/108488 (owner: Matanya)
[13:48:16] matanya: ^
[13:48:25] already fixing. thanks paravoid
[13:48:33] (PS1) Hashar: labs: send puppet traps to icinga.pmtpa.wmflabs [operations/puppet] - https://gerrit.wikimedia.org/r/111187
[13:48:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC
[13:49:26] the puppet trap are wrong
[13:49:27] :(
[13:49:44] they are supposed to reflect a successful run of puppet but happens in `base` :(
[13:50:27] heh...
[13:50:34] already known...
[13:50:44] there are other wrong think with that approach [13:50:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC [13:50:52] s/think/things/ [13:51:14] we need traps and nobody knows how to handle them? :D [13:51:26] !log reedy updated /a/common to {{Gerrit|I8a1341149}}: increase db1059 load (96G ram compared to 64G for s4 siblings) [13:51:31] (03PS1) 10Reedy: Update non wikipedias to 1.23wmf12 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/111188 [13:51:34] Logged the message, Master [13:52:10] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 121.633331 [13:52:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC [13:53:10] hashar: we don't need traps. They are the wrong way of doing this. [13:53:10] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [13:53:26] (03CR) 10Petrb: [C: 031] labs: send puppet traps to icinga.pmtpa.wmflabs [operations/puppet] - 10https://gerrit.wikimedia.org/r/111187 (owner: 10Hashar) [13:54:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC [13:55:19] akosiaris: meanwhile can we get the trap on labs to be send on another instance? The old one got phased out https://gerrit.wikimedia.org/r/#/c/111187/ :D [13:55:59] sure. Why ? did we get sue by nagios enterprises ? 
[13:56:22] (03PS1) 10Matanya: sudo: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/111189 [13:56:24] they didn't like us pointing nagios.wm.o to icinga [13:56:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC [13:56:58] (03CR) 10jenkins-bot: [V: 04-1] sudo: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/111189 (owner: 10Matanya) [13:58:01] (03PS2) 10Matanya: sudo: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/111189 [13:58:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:29:24 PM UTC [14:00:00] RECOVERY - Puppet freshness on ms-be1008 is OK: puppet ran at Tue Feb 4 13:59:49 UTC 2014 [14:01:50] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 01:59:49 PM UTC [14:02:34] (03CR) 10Alexandros Kosiaris: [C: 032] labs: send puppet traps to icinga.pmtpa.wmflabs [operations/puppet] - 10https://gerrit.wikimedia.org/r/111187 (owner: 10Hashar) [14:02:38] akosiaris: na the old instance has been removed :-) [14:02:55] akosiaris: damiana / petan reinstalled it from scratch on another one [14:07:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 100.633331 [14:07:40] thank you [14:10:05] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [14:17:53] (03PS3) 10Matanya: sudo: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/111189 [14:18:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 127.933334 [14:18:29] grrr cp3019 [14:21:40] ok, akosiaris sudo module is in place. 
review at your own time [14:22:37] (03PS1) 10BBlack: Send NZ traffic to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/111194 [14:23:13] (03CR) 10Faidon Liambotis: [C: 032] Send NZ traffic to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/111194 (owner: 10BBlack) [14:23:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [14:23:54] (03PS1) 10Alexandros Kosiaris: Adding critical parameter to nrpe::monitor_service [operations/puppet] - 10https://gerrit.wikimedia.org/r/111195 [14:23:58] (03CR) 10BBlack: [C: 032 V: 032] Send NZ traffic to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/111194 (owner: 10BBlack) [14:24:26] \o/ [14:27:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 131.03334 [14:28:55] RECOVERY - Puppet freshness on ms-be1008 is OK: puppet ran at Tue Feb 4 14:28:50 UTC 2014 [14:31:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [14:32:17] (03CR) 10Alexandros Kosiaris: [C: 032] Adding critical parameter to nrpe::monitor_service [operations/puppet] - 10https://gerrit.wikimedia.org/r/111195 (owner: 10Alexandros Kosiaris) [14:33:04] matanya: ^ that is your critical parameter to nrpe::monitor_service [14:33:26] yeah, saw it already, thank you [14:36:01] (03PS8) 10Matanya: etherpad: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/107567 [14:36:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 86.533333 [14:37:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [14:39:15] ottomata: can you fix cp3019 please? 
[14:39:18] it's been spamming us for hour [14:39:20] s [14:39:43] (03PS5) 10Matanya: coredb_mysql: puppet 3 compatibility fix: fully qualify variable [operations/puppet] - 10https://gerrit.wikimedia.org/r/108488 [14:39:53] paravoid: ^ [14:40:06] forgot to push [14:40:41] (03CR) 10jenkins-bot: [V: 04-1] coredb_mysql: puppet 3 compatibility fix: fully qualify variable [operations/puppet] - 10https://gerrit.wikimedia.org/r/108488 (owner: 10Matanya) [14:40:42] arrg [14:41:39] (03PS6) 10Matanya: coredb_mysql: puppet 3 compatibility fix: fully qualify variable [operations/puppet] - 10https://gerrit.wikimedia.org/r/108488 [14:44:50] paravoid, yes, its hard to know if/when i have fixed it [14:45:00] because it takes a while for that to start happening again when I restart varnishkafka [14:45:09] i have been trying a couple of different settings on cp3019 and cp3021 [14:45:23] just getting on compy with bfast now, have meetings all day, will fix asap [14:45:39] have you found the root cause? [14:46:14] not entirely, producer queue gets full [14:46:22] but [14:46:28] does happen on others [14:46:29] but [14:46:45] kafka rtt to eqiad is higher on 3019 than other esams bits [14:46:46] dunno why [14:46:51] you should find a root cause rather than just try different settings [14:46:54] how much higher? [14:47:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 310.866669 [14:47:18] does it happen locally within esams too? [14:47:32] ottomata: https://gerrit.wikimedia.org/r/#/c/110382/ [14:47:55] http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=cp30(19%7C2%5B0-2%5D)&mreg%5B%5D=kafka.rdkafka.brokers.analytics1022.*.rtt.avg&gtype=line&glegend=show&aggregate=1 [14:48:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [14:48:45] that seems wrong [14:49:04] have you tried icmp pings etc.? 
[14:49:23] yes ping times seem equivalent from cp3019 vs cp3021 [14:49:41] also, i'm not sure how much of this is because we only have one broker right now [14:49:44] what's the y in this graph? [14:49:52] microsecs i believe [14:52:09] (03PS1) 10BBlack: Send all of OC to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/111196 [14:52:24] yeah microsecs [14:53:32] varnishkafka consumes 7.5G of RAM on cp3019 [14:53:45] (03CR) 10BBlack: [C: 032 V: 032] Send all of OC to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/111196 (owner: 10BBlack) [14:54:15] vs. 300G on cp3020 [14:54:47] ok yes, that is one of the settings i had been trying, it obviously did not help [14:54:52] that's queue size [14:55:05] why aren't you trying to find why latency is different first? [14:55:43] I did for a bit, but didn't discover anything [14:56:12] haven't given up, but I had tried this and restarted, and i guess it takes almost a day to start happening again [15:00:54] PROBLEM - Puppet freshness on cp3019 is CRITICAL: Last successful Puppet run was Mon 03 Feb 2014 08:55:41 PM UTC [15:01:38] Snaps: you there? [15:12:15] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1027.133301 [15:13:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [15:14:17] ottomata: sir yes sir [15:14:29] ottomata: whats with cp3019? It's been whining for two days now [15:16:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 381.933319 [15:16:20] yeah, unsure, rtt is higher there than other bits esams [15:16:22] been trying to figure that out [15:16:24] so question [15:16:50] rtt.cnt is the number of times rtt has been sampled, is that correct? 
[15:17:04] Snaps: ^ [15:17:57] ottomata: yes, number of rtt samples in averaging period [15:18:38] in averaging period, ok hm [15:19:03] ok rtt.cnt is way lower on cp3019 than on other esams bits [15:19:17] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [15:19:25] cp3021 and cp3022 are at 419 and 472 rtt.cnt [15:19:29] cp3019 is 47 [15:20:40] which number corresponds to kafka.queue.messages? [15:20:44] outbuf_cnt? [15:20:45] (03CR) 10Hashar: [C: 04-1] Rewrite 'scap' script in Python [operations/puppet] - 10https://gerrit.wikimedia.org/r/110904 (owner: 10Ori.livneh) [15:24:17] Snaps: ^^:) [15:24:48] ottomata: okay, so rtt.cnt is the number of replies received from broker in one averaging window. [15:24:52] I HATE JENKINNNNSSSS [15:26:09] ottomata: outbufs are buffers, which contains one or more messages. [15:26:30] any of the buffers? [15:27:00] outbuf_cnt on cp3019 is > 20K [15:27:27] with nothin in msgq or xmit_msgq [15:27:29] on cp3019 [15:28:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 263.299988 [15:29:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [15:30:21] there is on msgq and xmit_msgq per toppar, are all empty? [15:31:23] s,on,one, [15:31:54] PROBLEM - Puppet freshness on cp3021 is CRITICAL: Last successful Puppet run was Mon 03 Feb 2014 09:26:27 PM UTC [15:32:54] varnishkafka is consuming 300G on cp3020? 
Please say thats M, not G [15:35:06] ha i think that was a type [15:35:07] typo [15:36:59] yesh Snaps all are empty [15:37:11] here, I will show you values for cp3019 vs cp3020 [15:37:48] https://gist.github.com/ottomata/8805990 [15:38:10] ok this time I check they are not empty [15:38:18] xmit_msgq_cnt has a few [15:38:32] but outbuf_cnt is so high [15:38:46] i'm not sure what you mean by "outbufs are buffers, which contains one or more messages" [15:38:48] hah [15:38:59] like, is outbuf a total sum of messages in any buffer or queue? [15:41:02] no, outbufs doesnt count messages, it counts buffers. rdkafka packs multiple messages into a request, each request has a buffer. [15:41:38] ah ok [15:41:49] so outbufs is the number of buffers waiting to be sent, waitresp_cnt are the number of buffers in flight awaiting reply. [15:42:14] number of buffers? or number of requests? [15:43:05] same thing, one buffer is one request [15:43:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 480.266663 [15:44:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [15:44:17] that terminology just sounds weird to me, a buffer is just a storage/holding space [15:44:38] a queue is a kind of buffer in my dictionary [15:44:40] anyway, ok [15:44:40] so [15:45:07] this is telling me that outbuf_cnt is 22106 [15:45:16] I have a queue of buffers, each buffer holding a request :) [15:45:24] yeah, I dont really know whats up there [15:45:30] so there are 22K requests waiting to be sent? [15:45:42] they haven't been sent yet, right? 
[15:45:48] correct [15:45:49] there are 41 in waitresp_cnt [15:45:54] those have been sent but not acked [15:46:15] yep [15:46:44] so, on cp3019, kafka.queue.buffering.max.messages = 5000000, which is why it is using so much ram [15:46:47] but, that is not our problem [15:47:00] it was doing this at 1000000 and 2000000 too [15:47:35] the rdkafka defaults for max.messages and max.ms are set [15:47:50] sorry, not max.messages [15:48:11] batch.num.messages [15:48:15] so that is 1000 [15:48:30] it is like cp3019's connectivity is slower [15:48:31] we are doing about 6000 msgs / second on this bits varnish box [15:48:45] def only 1000 msgs in each of those requests [15:49:14] can you compare netstat -s on 19 and 20? are there more tcp related errors on 3019? [15:49:30] resends, et.al [15:50:42] and check varnishkafka socket buffer usage too: netstat -anp | grep :9092 [15:51:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 84.73333 [15:51:55] if there are requests/buffers on the output queue (outbuf_cnt>0), librdkafka will try to send those messages as fast as it can, only socket buffer being full holding it back. [15:52:22] hmmm [15:52:49] Snaps: [15:52:49] http://etherpad.wikimedia.org/p/varnishkafka_troubleshooting [15:54:01] hmmm [15:54:10] confused about those other IPs there… [15:55:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [15:56:00] checked netstat -anp again on cp3019, those extra cons don't show anymore [15:57:20] ok Snaps, how do I read that? net/tcp? [15:57:23] ottomata: those :80 are probably web requests that just happen to be sourced on port 9092 though? 
[15:57:27] very verbose [15:57:41] OHHH, [15:57:42] haha [15:57:42] paste it on the pad [15:57:43] rigiht [15:57:43] thanks [15:57:47] its more than can fit [15:57:57] pasted some [15:57:58] huhm [15:58:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 92.666664 [15:58:27] oh i can find hex of port and grep [15:58:28] is that from varnishkafka's net/tcp? [15:58:29] probbaly [15:58:32] yes [15:59:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:00:43] * Jeff_Green is running a dump of a big chunk of the OTRS database on db48, replag expected [16:01:22] oh, net/tcp shows for all processes, how not handy [16:03:44] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.962 second response time [16:05:15] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 132.566666 [16:10:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:11:17] so, Snaps, we can see that the tx_queue on cp3019 is rather more full than on cp3020, right? [16:11:28] unless i'm reading that wrong? [16:11:57] Im trying to wrap my head around the net/tcp output [16:14:47] well, the same shows in netstat, right?i think [16:15:05] hmm, maybe? [16:15:05] hm [16:15:30] not the retransmits, but they are 0 on both [16:15:43] one url suggested that retransmits was not implemented though [16:16:18] yeah [16:16:36] but, send/tx queue is large on cp3019 but not on cp3020 [16:16:42] right? [16:17:09] yep [16:17:18] ok, just making sure i'm reading that right [16:17:38] could you paste the entire Tcp: and TcpExt: sections of 'netstat -s' from both hosts? 
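The per-socket send queue compared above can be read straight out of /proc/net/tcp. What follows is a minimal Python sketch, assuming the usual Linux column layout (hex ADDRESS:PORT pairs, then tx_queue:rx_queue in hex). The function name and the sample rows are made up for illustration and are not output from cp3019 or cp3020. It filters on the remote port, so web requests that merely happen to be sourced from local port 9092 are skipped.

```python
def parse_proc_net_tcp(text, port):
    """Return queue sizes for sockets whose REMOTE port equals `port`."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        local, remote = fields[1], fields[2]
        # Addresses are HEXIP:HEXPORT, so 9092 shows up as 2384.
        remote_port = int(remote.rsplit(":", 1)[1], 16)
        if remote_port != port:
            continue
        # Field 4 is tx_queue:rx_queue, both hexadecimal.
        tx_hex, rx_hex = fields[4].split(":")
        rows.append({
            "local": local,
            "remote": remote,
            "tx_queue": int(tx_hex, 16),
            "rx_queue": int(rx_hex, 16),
        })
    return rows


# Illustrative sample only (hypothetical values, not taken from the hosts).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:C350 0200000A:2384 01 0003D090:00000000 01:00000014 00000000  1000        0 12345
   1: 0100007F:0050 0300000A:D431 01 00000000:00000000 00:00000000 00000000     0        0 12346
"""

if __name__ == "__main__":
    for row in parse_proc_net_tcp(SAMPLE, 9092):
        print(row)
```

On a real host the input would be the contents of /proc/net/tcp; since that file lists sockets for all processes, the port filter is what narrows it to the broker connections.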
[16:19:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 174.433334 [16:19:17] willl paste all output [16:19:57] pasted at bottom [16:23:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:23:46] thanks [16:23:52] more retransmissions on cp3020? [16:24:39] yeah, cp3020 has about 1.5x times more retransmissions than transmits, while cp3019 has 0.29x [16:25:03] but the kafka traffic is a drop in the ocean compared to all other traffic on these hosts, so the global stats are useless I think [16:25:38] yeah [16:26:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 45.433334 [16:26:37] so, Snaps, ping times are about the same between these hosts and analytics1022 [16:26:48] does that rule out weird networking problems for cp3019? [16:27:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:27:19] hmm [16:28:25] Snaps, I forget, does random partition in this case mean the vk instance will only send traffic to a single randomly chosen partition? [16:29:01] it will randomize a partition for each message sent [16:29:09] ok [16:29:18] right ok cool [16:29:18] hm [16:29:30] so it will randomly choose one of all available partitions [16:29:30] which gives an equal load between partitions. [16:29:33] great, ok [16:29:34] hm [16:29:47] i'm trying to think of other reasons this host could have a different rtt.avg [16:30:00] thought maybe there could be a problem with disk/io for a particular partition maybe [16:30:02] on broker [16:30:09] but if all hosts balance across the same number of partitions [16:30:16] should have same behavior on all [16:30:16] hm [16:30:21] dont think its a broker thing. 
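The random partitioner described above (one random partition per message, giving roughly equal load) can be illustrated with a short simulation. This is a sketch only, not librdkafka's code; the partition count of 12 and the function names are assumptions, since the log does not say how many partitions the topic has.

```python
import random
from collections import Counter


def random_partition(num_partitions, rng):
    """Pick a partition uniformly at random for one message."""
    return rng.randrange(num_partitions)


def simulate(num_messages, num_partitions, seed=0):
    """Return the number of messages that landed on each partition."""
    rng = random.Random(seed)
    counts = Counter(
        random_partition(num_partitions, rng) for _ in range(num_messages)
    )
    return [counts[p] for p in range(num_partitions)]


if __name__ == "__main__":
    # 12 partitions is a hypothetical figure chosen for the example.
    load = simulate(num_messages=120_000, num_partitions=12)
    for partition, count in enumerate(load):
        print(f"partition {partition}: {count} messages")
```

Because every producer spreads its messages the same way, a slow partition on the broker would be expected to affect all hosts alike, which matches the reasoning that this is probably not a broker problem.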
[16:30:25] yeah don't htikn so either [16:30:35] i'm just guessing at this point [16:30:54] so, i should put kafka.queue.buffering.max.messages back at the same as other hosts [16:30:58] but, if I do [16:31:10] vk will restart and we'll lose the problem until tomorrow [16:31:36] oh, so you've restarted it previously? It doesnt automatically recover? [16:32:06] (03PS1) 10Andrew Bogott: Set up second net interface for neutron [operations/puppet] - 10https://gerrit.wikimedia.org/r/111208 [16:32:06] ? [16:32:15] yesterday yes, i set max.messages higher [16:32:18] and restarted vk [16:32:21] yes, it automatically recovers [16:32:31] and is fine for a while [16:32:31] hmm [16:32:34] does rtt chagen? [16:32:36] (03PS1) 10Hashar: contint: browsers for testing + xvfb for headless [operations/puppet] - 10https://gerrit.wikimedia.org/r/111209 [16:32:36] lemme check [16:32:53] Snaps: [16:32:54] http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&title=&vl=&x=&n=&hreg[]=cp30%2819%7C2%5B0-2%5D%29&mreg[]=kafka.rdkafka.brokers.analytics1022.%2A.rtt.avg&gtype=line&glegend=show&aggregate=1 [16:32:54] but [16:33:03] (03CR) 10Lcarr: [C: 04-1] Set up second net interface for neutron (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/111208 (owner: 10Andrew Bogott) [16:33:04] that is also load [16:33:16] LOL at manifests/misc/package-builder.pp [16:33:21] lets see, when did I restart vk [16:33:23] (03PS2) 10Hashar: contint: browsers for testing + xvfb for headless [operations/puppet] - 10https://gerrit.wikimedia.org/r/111209 [16:33:57] (03PS2) 10Andrew Bogott: Set up second net interface for neutron [operations/puppet] - 10https://gerrit.wikimedia.org/r/111208 [16:34:31] LeslieCarr: sad to see you go [16:34:40] best of luck [16:35:25] (03CR) 10Lcarr: [C: 032] "needs rebase though" [operations/puppet] - 10https://gerrit.wikimedia.org/r/111208 (owner: 10Andrew Bogott) [16:35:38] (03PS3) 10Andrew Bogott: Set up second net interface for neutron [operations/puppet] - 
10https://gerrit.wikimedia.org/r/111208 [16:35:44] ottomata: should manifests/misc/udp2log.pp be converted into a module? [16:36:19] naw, i wouldn't bother matanya, we hopefully will be deprecating udp2log in a few months [16:36:42] thanks [16:37:01] (03CR) 10Liangent: "What per TTO?" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/110155 (owner: 10TTO) [16:37:19] (03CR) 10Andrew Bogott: [C: 032] Set up second net interface for neutron [operations/puppet] - 10https://gerrit.wikimedia.org/r/111208 (owner: 10Andrew Bogott) [16:37:36] ottomata: will manifests/misc/logging.pp be replaced too? [16:37:44] hmm, maybe, mostly [16:38:07] hey Snaps, why is rtt.cnt low for cp3019? [16:38:15] 57 on cp3019 [16:38:20] 429 on cp3020 [16:38:47] ottomata: the cnt is based on number of requests being answered, and since cp3019 has a lower kafka message rate for some reason, it has a lower cnt [16:39:08] ottomata: seems like cp3019 is exaggerating compared to the other hosts. Is the cpu usage higher on it? does it handle more traffic for some reason? are there lots of ecc errors on the ram? does /sbin/ifconfig indicate more errors? [16:39:30] the cnt is lower because there are fewer acked requests? [16:39:58] cpu usage is similar [16:40:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 94.199997 [16:40:17] according to [16:40:17] http://noc.wikimedia.org/pybal/esams/bits [16:40:23] it should handle the same amount of traffic [16:40:26] ottomata: yes, the rtt is calculated on receiving an ack (time between sending request and receiving reply): less rtt.cnt == less requests answered [16:40:56] ok makes sense then [16:41:20] hm, this is probably obivious, but both drerr and txerr are up [16:41:44] not sure if that is significant, probably not, right? txerr would mean drerr implicitly? 
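The relationship between rtt.cnt and acked requests explained above can be sketched as a small averaging window: one sample is recorded per reply, so fewer replies in a window means a lower count. The class and method names below are hypothetical and this is not how librdkafka implements its statistics; it only mirrors the description given in the channel.

```python
class RttWindow:
    """Collects one round-trip sample per acknowledged request."""

    def __init__(self):
        self._samples = []

    def on_ack(self, sent_at, acked_at):
        # RTT is the time between sending a request and receiving its reply.
        self._samples.append(acked_at - sent_at)

    def roll(self):
        """Close the current window and return its count and average."""
        cnt = len(self._samples)
        avg = sum(self._samples) / cnt if cnt else 0.0
        self._samples = []
        return {"cnt": cnt, "avg": avg}


if __name__ == "__main__":
    window = RttWindow()
    window.on_ack(sent_at=0.0, acked_at=0.5)
    window.on_ack(sent_at=1.0, acked_at=1.25)
    print(window.roll())  # two acks in this window
    print(window.roll())  # nothing acked since the last roll
```

Under this model a host that gets fewer requests answered per window reports a lower cnt, whatever its average looks like.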
[16:43:22] txerr are most likely "queue full", while drerr are either mgs timeouts or broker-sourced errors [16:43:31] ifconfig does show more errors on cp3019, but not siginficantly i don't think [16:43:50] just pasted at bottom of eitherpad [16:45:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:45:25] (03CR) 10Matanya: "The commit message says 1001,1002, but you removed 1001-1020 from cache." [operations/puppet] - 10https://gerrit.wikimedia.org/r/111132 (owner: 10Cmjohnson) [16:46:59] ottomata: Iguess the drerrs are timeouts, right? [16:47:13] in vk.log [16:47:20] yeah so there are lots of these [16:47:20] Feb 4 16:46:29 cp3019 varnishkafka[19176]: PRODUCE: Failed to produce Kafka message (seq 243765876): No buffer space available (5000000 messages in outq) [16:47:54] oh yeah and Feb 4 16:46:29 cp3019 varnishkafka[19176]: KAFKADR: Suppressed 11277 (out of 11377) Kafka message delivery failures [16:47:54] Feb 4 16:47:22 cp3019 varnishkafka[19176]: KAFKADR: Kafka message delivery error: Local: Message timed out [16:47:55] Feb 4 16:47:29 varnishkafka[19176]: last message repeated 99 times [16:47:56] mm, yeah [16:48:06] the queue is so large and the output rate so low that messages are timed out [16:48:20] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 32.466667 [16:48:32] they are locally timed out because they can't be placed on the queue? 
[16:48:52] (03PS3) 10Matanya: Set default sql server for mysql [operations/puppet] - 10https://gerrit.wikimedia.org/r/102721 (owner: 10Merlijn van Deen) [16:50:02] ottomata: they are placed on the queue (otherwise txerr: no buffer space available) but they never make it to the broker before timing out [16:50:16] (03PS1) 10Andrew Bogott: Pass proper external interface to openvswitch [operations/puppet] - 10https://gerrit.wikimedia.org/r/111211 [16:51:09] ah, the timeout while they are in the queue, ok [16:51:13] (03CR) 10Matanya: [C: 031] "rebased an linted." [operations/puppet] - 10https://gerrit.wikimedia.org/r/102721 (owner: 10Merlijn van Deen) [16:51:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:51:57] hmmm, Snaps [16:52:15] so, i'm just looking at the rate of messages in the webrequest_bits topic per host [16:52:21] (consumeing it and using pipe viewer) [16:52:27] (03CR) 10Alexandros Kosiaris: etherpad: convert into a module (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/107567 (owner: 10Matanya) [16:52:31] (03PS2) 10Andrew Bogott: Pass proper external interface to openvswitch [operations/puppet] - 10https://gerrit.wikimedia.org/r/111211 [16:52:37] (03CR) 10Alexandros Kosiaris: [C: 04-1] etherpad: convert into a module (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/107567 (owner: 10Matanya) [16:52:40] cp3020 hovers around 5.7 K lines / sec [16:53:05] cp3029 around 6.14 K lines /sec [16:53:08] cp3019* [16:53:19] is cp3019 just doing more traffic maybe? [16:53:31] and it happens to be above some threshold…? 
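The point made above, that messages sit in the queue and time out before reaching the broker, comes down to simple arithmetic: a message appended to a full queue waits roughly queue length divided by drain rate. The sketch below uses the queue sizes mentioned in the channel (1000000, 2000000, 5000000); the drain rate of 3000 msgs/s and the 300 second timeout are assumptions chosen for illustration, since the log states neither.

```python
def seconds_to_drain(queued_messages, drain_rate_per_s):
    """Approximate wait for a message appended behind `queued_messages`."""
    if drain_rate_per_s <= 0:
        return float("inf")
    return queued_messages / drain_rate_per_s


def will_time_out(queued_messages, drain_rate_per_s, timeout_s):
    """True if a newly queued message would expire before being sent."""
    return seconds_to_drain(queued_messages, drain_rate_per_s) > timeout_s


if __name__ == "__main__":
    DRAIN_RATE = 3000   # msgs/s actually leaving the host (hypothetical)
    TIMEOUT_S = 300     # local message timeout in seconds (assumed value)
    for queue_size in (1_000_000, 2_000_000, 5_000_000):
        wait = seconds_to_drain(queue_size, DRAIN_RATE)
        verdict = "times out" if wait > TIMEOUT_S else "delivered"
        print(f"queue={queue_size}: wait ~{wait:.0f}s -> {verdict}")
```

With these numbers a larger queue only lengthens the wait; it does not help unless the drain rate itself recovers, which is consistent with the errors showing up at all three queue sizes.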
[16:53:32] hm [16:53:42] hmmmmmm [16:53:43] naw [16:53:46] i think it isn't significant [16:53:52] also my sampling is unscientific [16:53:55] :p [16:54:06] i'm seeing cp3019 lower atm, so meh [16:54:06] nm [16:54:09] (03CR) 10Lcarr: [C: 031] Pass proper external interface to openvswitch [operations/puppet] - 10https://gerrit.wikimedia.org/r/111211 (owner: 10Andrew Bogott) [16:54:44] (03CR) 10Mark Bergsma: [C: 04-1] "base_interface => "eth1"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/111211 (owner: 10Andrew Bogott) [16:55:07] are they equally load balanced or sharded in some way? [16:55:10] (03CR) 10Tim Landscheidt: [C: 031] "@Matanya: Could you please leave rebases for the merger (unless there are substantial changes)? This way it requires a lot of effort to s" [operations/puppet] - 10https://gerrit.wikimedia.org/r/102721 (owner: 10Merlijn van Deen) [16:56:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 192.166672 [16:56:30] (03CR) 10Matanya: "Yes, sorry." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/102721 (owner: 10Merlijn van Deen) [16:56:57] they are equally weighted in pybal [16:56:59] Snaps: [16:57:03] http://noc.wikimedia.org/pybal/esams/bits [16:57:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:57:38] also [16:57:39] Snaps: yeah [16:57:45] same number of client_reqs [16:57:45] http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=cp30(19%7C2%5B0-2%5D)&mreg%5B%5D=varnish.client_req&gtype=line&glegend=show&aggregate=1 [16:58:00] (03PS3) 10Andrew Bogott: Pass proper external interface to openvswitch [operations/puppet] - 10https://gerrit.wikimedia.org/r/111211 [16:58:35] ok, cool [16:59:24] (03PS9) 10Matanya: etherpad: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/107567 [17:01:12] (03CR) 10Mark Bergsma: [C: 031] Pass proper external interface to openvswitch [operations/puppet] - 10https://gerrit.wikimedia.org/r/111211 (owner: 10Andrew Bogott) [17:01:20] Ryan_Lane: by what magic did you disable the wiki on virt1000? I turned apache back on, re-linked the wiki, still can't get web access. [17:01:32] ottomata: I really think this is a problem outside varnishkafka or kafka. Something IO, CPU or network related. But Im out of ideas what to look for. [17:01:53] ok, thanks Snaps, I'm a little stumped too [17:02:03] I am in meetings basically for the rest of the day [17:02:12] ottomata: what time did you restart vk yesterday? 
[17:02:12] i'm going to reset the max messages queue on cp3019 [17:02:24] feb 3 21:13 [17:02:28] utc [17:02:45] (03CR) 10Andrew Bogott: [C: 032] Pass proper external interface to openvswitch [operations/puppet] - 10https://gerrit.wikimedia.org/r/111211 (owner: 10Andrew Bogott) [17:05:06] !log Jenkins: updated bin/multigit.sh script to point to zuul.eqiad.wmnet instead of non working integration.wikimedia.org [17:05:14] Logged the message, Master [17:06:57] (03PS2) 10Tim Landscheidt: Tools: Add Ukrainian locale and sort list [operations/puppet] - 10https://gerrit.wikimedia.org/r/110827 [17:07:34] PROBLEM - Host labnet1001 is DOWN: PING CRITICAL - Packet loss = 100% [17:12:23] (03CR) 10Matanya: "I can provide three approaches how to solve this:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/90684 (owner: 10Diederik) [17:12:37] hashar: ^ [17:12:48] this one is for you with love :) [17:12:58] in meeting assign me as a reviewer :-] will look tomorrow [17:14:39] you are already, you raised a question, i tried to answer :) [17:22:12] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 393.5 [17:22:18] matanya: wanna try again on etherpad without me commenting? [17:22:41] akosiaris: you mean i did something wrong? 
:) [17:23:02] not my words but heh :-) [17:23:12] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [17:23:24] * matanya is looking [17:24:03] oh, most likely ssl => required [17:24:16] matanya: hint, different syntax for defining a class and different for calling it [17:26:20] dammit [17:26:29] i can be so dumb some times :) [17:26:45] class { ::etherpad: [17:26:58] parameters blah balh [17:27:10] (03PS1) 10Lcarr: labnet1001 interface addressing adding external interface + removing unneeded site selector [operations/puppet] - 10https://gerrit.wikimedia.org/r/111219 [17:27:18] :-) [17:28:22] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 277.799988 [17:29:08] (03PS10) 10Matanya: etherpad: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/107567 [17:29:13] (03CR) 10Andrew Bogott: [C: 032] labnet1001 interface addressing adding external interface + removing unneeded site selector [operations/puppet] - 10https://gerrit.wikimedia.org/r/111219 (owner: 10Lcarr) [17:29:30] !log reenabling puppet on cp3019 [17:29:38] Logged the message, Master [17:29:44] (03CR) 10jenkins-bot: [V: 04-1] etherpad: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/107567 (owner: 10Matanya) [17:30:03] RECOVERY - Puppet freshness on cp3019 is OK: puppet ran at Tue Feb 4 17:30:00 UTC 2014 [17:31:12] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [17:31:52] RECOVERY - Puppet freshness on cp3021 is OK: puppet ran at Tue Feb 4 17:31:46 UTC 2014 [17:31:55] !log reenabling puppet on cp3021, this reverts the batch.num.messages varnishkafka.conf change, let's see if cp3021 starts dropping again tomorrow [17:32:03] Logged the message, Master [17:34:05] (03PS11) 10Matanya: etherpad: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/107567 
[17:37:36] ok, makes more sesnce now [17:37:42] *sense [17:43:05] !log demon synchronized php-1.23wmf12/extensions/LiquidThreads/pages/TalkpageView.php [17:43:13] Logged the message, Master [17:43:34] matanya: how many parameters does the definition have? And how many do you provide when you call it ? [17:43:49] !log demon synchronized php-1.23wmf11/extensions/LiquidThreads/pages/TalkpageView.php [17:43:56] Logged the message, Master [17:47:40] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 422.933319 [17:47:49] (03PS12) 10Matanya: etherpad: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/107567 [17:48:04] akosiaris: forgive me, too late, and doing stupid things [17:49:48] matanya: there is no rush. we don't have a deadline to catch. Go to sleep, we will finish another time. [17:50:07] akosiaris: you mean, go home from work :) [17:50:36] UTC+3 right ? [17:51:13] oh look! critical vk drerr in short time! [17:51:22] love it. [17:52:40] +2 akosiaris [17:52:51] still already too late. I ain't reviewing you any more today :-) [17:53:06] ok, :wq on 16 tabs [17:53:11] akosiaris: review me! [17:53:11] https://gerrit.wikimedia.org/r/#/c/110590/ [17:53:12] :p [17:53:13] akosiaris, paravoid, quick question: we are just debating the name of the parsoid package (https://gerrit.wikimedia.org/r/#/c/110666/) and are considering mediawiki-parsoid, mw-parsoid or just parsoid [17:53:29] (03CR) 10CSteipp: [C: 031] "This will help us comply with the new Privacy policy, which although still in draft, looks like it's going to go with 90 days as the time " [operations/puppet] - 10https://gerrit.wikimedia.org/r/111127 (owner: 10Chad) [17:54:35] ottomata: already on that. So, kafka did not like the number of open files much. 
[17:54:36] currently it is mediawiki-parsoid, but start-stop-daemon complains that the name is too long for process names
[17:54:37] mw-parsoid for me
[17:54:55] yeah, seems to be a good compromise
[17:54:55] I suppose it does not have a tunable right? and I suppose scala does not call setrlimit(2)...
[17:54:56] does not have a call for*
[17:55:30] i haven't seen one
[17:55:31] will double check
[17:56:28] so it inherits shell and you override shell....
[17:56:37] akosiaris:
[17:56:38] https://cwiki.apache.org/confluence/display/KAFKA/Operations
[17:56:50] no mention of it tuning itself, only that they (1) we upped the number of file descriptors s
[17:56:57] (PS1) Lcarr: adding a new vlan for network nodes [operations/dns] - https://gerrit.wikimedia.org/r/111223
[17:57:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[17:59:04] ottomata: ok thanks... looking
[18:00:15] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 540.93335
[18:03:53] (CR) coren: [C: 2] "+locale" [operations/puppet] - https://gerrit.wikimedia.org/r/110827 (owner: Tim Landscheidt)
[18:08:53] ottomata: so both raising the limit in /etc/security and in the init script ?
[18:09:32] isn't that 1 place more than a local admin would like to have to adjust ?
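
The open-files limit discussed here is a per-process resource limit that a process inherits from whatever started it, unless it calls setrlimit(2) itself. The following is a minimal Python sketch of that mechanism, not the Kafka packaging change itself. It assumes a Linux/POSIX host (the `resource` module is Unix-only); the helper name `raise_nofile_limit` is hypothetical, and 8192 is used only because it is the default named in the Kafka change under review.

```python
import resource


def raise_nofile_limit(target):
    """Raise this process's soft open-files limit toward `target`.

    A process without privileges may only raise its soft limit up to its
    hard limit, so the request is capped there. The soft limit is never
    lowered. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already at least as high as requested
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    # 8192 mirrors the default mentioned in the change; any value works.
    print("soft open-files limit now:", raise_nofile_limit(8192))
```

Child processes started afterwards inherit the new soft limit, which is the same effect a `ulimit -n` call in a launching shell script has on the daemon it starts. Raising the hard limit itself requires privileges.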
[18:10:13] (CR) Mark Bergsma: [C: -1] "typo" (1 comment) [operations/dns] - https://gerrit.wikimedia.org/r/111223 (owner: Lcarr)
[18:11:57] (PS1) Tim Landscheidt: Tools: Mirror font packages used in production [operations/puppet] - https://gerrit.wikimedia.org/r/111226
[18:12:54] (PS2) Lcarr: adding a new vlan for network nodes [operations/dns] - https://gerrit.wikimedia.org/r/111223
[18:13:47] (PS5) Ori.livneh: Rewrite 'scap' script in Python [operations/puppet] - https://gerrit.wikimedia.org/r/110904
[18:14:29] (PS2) Tim Landscheidt: Tools: Mirror font packages used in production [operations/puppet] - https://gerrit.wikimedia.org/r/111226
[18:14:39] (CR) Lcarr: [C: 2] adding a new vlan for network nodes [operations/dns] - https://gerrit.wikimedia.org/r/111223 (owner: Lcarr)
[18:16:16] (CR) coren: [C: 2] Tools: Mirror font packages used in production [operations/puppet] - https://gerrit.wikimedia.org/r/111226 (owner: Tim Landscheidt)
[18:18:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[18:21:14] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 687.266663
[18:22:14] RECOVERY - Host labnet1001 is UP: PING WARNING - Packet loss = 64%, RTA = 0.29 ms
[18:28:10] (Abandoned) Andrew Bogott: Grab a couple more IPs for labnet1001 [operations/dns] - https://gerrit.wikimedia.org/r/111169 (owner: Andrew Bogott)
[18:28:37] (PS1) Andrew Bogott: Add a note about a couple of steps we did w/out puppet [operations/puppet] - https://gerrit.wikimedia.org/r/111230
[18:28:39] (PS1) Andrew Bogott: Add another network interface to neutron server. [operations/puppet] - https://gerrit.wikimedia.org/r/111231
[18:31:06] (CR) Lcarr: [C: -1] Add another network interface to neutron server.
(2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/111231 (owner: Andrew Bogott)
[18:31:08] (CR) Andrew Bogott: [C: 2] Add a note about a couple of steps we did w/out puppet [operations/puppet] - https://gerrit.wikimedia.org/r/111230 (owner: Andrew Bogott)
[18:32:31] (CR) Lcarr: Add another network interface to neutron server. (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/111231 (owner: Andrew Bogott)
[18:33:26] (CR) Alexandros Kosiaris: [C: -1] Adding ability to set ulimit nofiles, increased to 8192 by default for kafka server (3 comments) [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/110590 (owner: Ottomata)
[18:35:15] petan, your email to the password policy thread with different scenario lengths - I've seen that before but can't remember where.. can you send me the link please?
[18:36:39] !log disabled puppet on cp3019 again, trying batch.num.messages there (just want these flappy alerts to be quiet!)
[18:36:48] (PS1) Lcarr: adding ipv6 labs subnets [operations/dns] - https://gerrit.wikimedia.org/r/111234
[18:36:48] Logged the message, Master
[18:38:02] (PS2) Andrew Bogott: Add another network interface to neutron server. [operations/puppet] - https://gerrit.wikimedia.org/r/111231
[18:38:14] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[18:39:21] MaxSem, is 17-February okay for you to have WML unsupported? dfoy said he'll send the message tonight, and paravoid said he's cool with that date...if you are. i wanted to verify with dfoy that it's okay for him to send the email with the proposed dates.
[18:41:40] dr0ptp4kt, sounds reasonable to me. worst case, if something goes wrong e.g. serious discussion taking too long we can always postpone
[18:41:57] MaxSem, thanks. i'll reply on the thread to confirm
[18:42:09] andrewbogott: eth0 promisc ???
[18:42:26] akosiaris: It's a long story
[18:42:48] Anyway, latest version is eth1.1122 promisc
[18:42:48] smells like vmware from afar
[18:42:59] anyway... please go on
[18:43:23] akosiaris: Configuring 'neutron' which is a virual network service.
[18:43:35] I'm pretty much just following the docs. Don't know much about the deeper theory.
[18:43:55] (CR) Ottomata: "Hm, I tested it and it worked. Will test again." [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/110590 (owner: Ottomata)
[18:44:14] isn't neutron that thing with the various modules ? like for EX, nexus, openvswitch and yada yada ?
[18:44:40] yes
[18:44:52] openvswitch is the one I'm working on
[18:45:02] post a link please at some point. just curious why promisc is needed
[18:45:20] http://docs.openstack.org/trunk/install-guide/install/apt/content/neutron-install.dedicated-network-node.html
[18:45:51] thanks
[18:46:30] (PS1) Lcarr: adding labnet1001-ext [operations/dns] - https://gerrit.wikimedia.org/r/111235
[18:46:41] (PS3) Andrew Bogott: Add another network interface to neutron server. [operations/puppet] - https://gerrit.wikimedia.org/r/111231
[18:46:54] that doc is more practice than theory… I think if you look elsewhere in that same manual there's explanation of what they're actually up to.
[18:47:34] (CR) Lcarr: "LGTM" [operations/puppet] - https://gerrit.wikimedia.org/r/111231 (owner: Andrew Bogott)
[18:48:20] (CR) Andrew Bogott: [C: 2] Add another network interface to neutron server.
[operations/puppet] - https://gerrit.wikimedia.org/r/111231 (owner: Andrew Bogott)
[18:52:01] (PS6) BryanDavis: Rewrite 'scap' script in Python [operations/puppet] - https://gerrit.wikimedia.org/r/110904 (owner: Ori.livneh)
[18:57:39] (CR) Lcarr: [C: 2] adding ipv6 labs subnets [operations/dns] - https://gerrit.wikimedia.org/r/111234 (owner: Lcarr)
[18:57:46] (CR) Lcarr: [C: 2] adding labnet1001-ext [operations/dns] - https://gerrit.wikimedia.org/r/111235 (owner: Lcarr)
[19:05:12] (CR) Alexandros Kosiaris: "It probably worked because you used su or sudo while testing." [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/110590 (owner: Ottomata)
[19:06:52] (CR) Reedy: [C: 2] Update non wikipedias to 1.23wmf12 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111188 (owner: Reedy)
[19:07:01] (Merged) jenkins-bot: Update non wikipedias to 1.23wmf12 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111188 (owner: Reedy)
[19:08:45] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Update non wikipedias to 1.23wmf12
[19:08:52] Logged the message, Master
[19:10:16] Ouch. I wonder how long Commons will last with a non-functional RecentChanges.
[19:10:38] What's broken?
[19:10:41] !log deployed zuul change for fundraising's very own mwalker
[19:10:49] Logged the message, Master
[19:11:05] mwalker|alt: You might submit a test change
[19:11:13] doing so now!
[19:11:53] Nemo_bis, ?
[19:12:25] Oh
[19:12:27] Not completely broken
[19:12:32] https://bugzilla.wikimedia.org/show_bug.cgi?id=60795
[19:12:49] Proposed patch at https://gerrit.wikimedia.org/r/#/c/111201/
[19:13:38] it's only very slightly broken :o
[19:21:54] RECOVERY - RAID on ms-be1002 is OK: OK: optimal, 13 logical, 13 physical
[19:25:57] paravoid: disk replaced ^
[19:26:12] andrewbogott: the nic card came...okay to take labnet down?
[19:27:06] cmjohnson1: doh!
[19:27:09] you have bad timing ;)
[19:27:12] well or awesome timing
[19:27:15] i'm not certain ;)
[19:27:41] so, with the new nic card what will the interface layout be ?
[19:30:37] akosiaris:
[19:30:37] "The init file approach will work however"
[19:31:57] what init file approach?
[19:32:30] i added and sfp module to asw-b3 lesliecarr
[19:33:02] ottomata: if [ -n "$KAFKA_NOFILES_ULIMIT" ]; then
[19:33:03] isn't that yours ?
[19:33:04] in kafka.ini
[19:33:04] kafka.init
[19:33:04] greg-g: does Flow have a 1pm PDT window today?
[19:33:14] yes
[19:33:15] but the global ulimit has to be set higher too
[19:33:15] that just sets it for that process, right?
[19:33:15] yes
[19:33:16] so what is the proper way to set the global one?
[19:33:28] for starters it needs to be for root
[19:33:28] and not kafka (that was my point)
[19:33:34] kafka user that is
[19:34:30] so... sure, take it down cmjohnson1
[19:34:36] so it probably worked in your tests because you got to 4096 which is root's hard limit
[19:34:44] and connect the two interfaces to b3
[19:35:08] i get 503 on commons
[19:35:16] ah, i see, you're saying because start-stop-daemon doesn't su to kafka it won't respect the change
[19:35:25] ?
[19:35:26] exactly
[19:35:36] PROBLEM - Host labnet1001 is DOWN: PING CRITICAL - Packet loss = 100%
[19:36:07] it will respect the ulimit -n however and until you reach that limit you will have no problem
[19:36:25] yeah, but i want to set it much higher in production
[19:36:32] so i'll need to set a larger global ulimit
[19:36:32] then puppet to the rescue
[19:36:36] hmmmmm
[19:36:37] ohhh
[19:36:47] i see
[19:36:48] ok cool
[19:36:55] so i should leave the defaults + init.d change in
[19:36:57] cmjohnson1: can you confirm that the box will only have 2x10G interfaces ?
[19:37:01] but scrap the package level global change
[19:37:02] cmjohnson1: or will there be a 1g left ?
[19:37:05] and just use puppet to change it for us
[19:37:08] kkkkkk
[19:37:09] ottomata: exactly :-)
[19:37:10] yeah that is better
[19:37:10] thanks
[19:38:27] lesliecarr: the nic card has 2 10G ports...the onboard copper nics will not be removed if that is what you mean
[19:38:55] excellent ;)
[19:44:33] quiddity: hi.
[19:46:08] quiddity: You guys aware of http://tools.wikimedia.pl/~odder/screenshots/FlowDatesBug.png?
[19:46:44] (PS4) Ottomata: Adding ability to set ulimit nofiles, increased to 8192 by default for kafka server [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/110590
[19:52:26] twkozlowski, aware of, but details would help: what browser/OS ?
[19:53:35] quiddity: is there a bug for this already somewhere?
[19:53:45] I checked all the bugs filled today, couldn't find anything
[19:56:36] akosiaris_away: : https://gerrit.wikimedia.org/r/#/c/110590/ better?
[19:56:47] ah away, ok!
[20:02:20] lesliecarr: connected to xe-3/1/0 and xe-3/1/2
[20:08:16] !dontlog ori testing IRC log handler
[20:18:32] hey Ryan_Lane, you there?
[20:23:21] greg-g: Can I get a LD for VisualEditor? https://bugzilla.wikimedia.org/show_bug.cgi?id=57209 https://gerrit.wikimedia.org/r/#/q/Iada098d0fa,n,z
[20:23:30] (per James_F)
[20:23:41] manybubbles1: hiaaa?
[20:23:51] yo?
[20:29:09] (PS7) Ori.livneh: Rewrite 'scap' script in Python [operations/puppet] - https://gerrit.wikimedia.org/r/110904
[20:29:14] got a sec to brain bounce git deploy stuff?
[20:30:20] yeah
[20:32:02] k
[20:35:36] (PS1) Ori.livneh: Make mw-deployment-vars.sh valid Python [operations/puppet] - https://gerrit.wikimedia.org/r/111256
[20:36:16] (CR) Ori.livneh: [C: 2 V: 2] Make mw-deployment-vars.sh valid Python [operations/puppet] - https://gerrit.wikimedia.org/r/111256 (owner: Ori.livneh)
[20:42:52] (CR) Hashar: "Might want to put a comment at the top of the file stating that it should be valid for both python import and shell sourcing?"
[operations/puppet] - https://gerrit.wikimedia.org/r/111256 (owner: Ori.livneh)
[20:52:42] !log aaron synchronized php-1.23wmf12/includes 'faf18db37b0553bd4f42b72d24cc6f7f297a0b5f'
[20:52:50] Logged the message, Master
[20:57:13] (PS8) Ori.livneh: Rewrite 'scap' script in Python [operations/puppet] - https://gerrit.wikimedia.org/r/110904
[20:57:27] (CR) Ori.livneh: "Hashar: done in PS8 of https://gerrit.wikimedia.org/r/#/c/110904/" [operations/puppet] - https://gerrit.wikimedia.org/r/111256 (owner: Ori.livneh)
[20:57:39] apergos: Forgot how dumps is set up: If I look at /public/datasets/public on Labs, is that the dumps server, or a copy on a Labs host? (WRT https://bugzilla.wikimedia.org/45646 => "Create -latest alias for dumps".)
[20:58:02] should be a copy
[20:58:43] (CR) Ori.livneh: [C: 2] "Renamed the file to 'scappy' for now, leaving 'scap' itself unmodified, so that this is easier to test." [operations/puppet] - https://gerrit.wikimedia.org/r/110904 (owner: Ori.livneh)
[21:01:15] apergos: Would it be possible/preferable to have the symlinks in the source, or should I restrict them to the Labs copy?
[21:01:55] (PS1) Hashar: Merge tag 'v0.8.0' from upstream [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/111263
[21:03:01] (PS1) Ori.livneh: scappy: qualify path to dsh [operations/puppet] - https://gerrit.wikimedia.org/r/111266
[21:03:43] (CR) Ori.livneh: [C: 2 V: 2] scappy: qualify path to dsh [operations/puppet] - https://gerrit.wikimedia.org/r/111266 (owner: Ori.livneh)
[21:03:51] the source has latest links but they may point to files in incompleted dumps
[21:04:25] iirc the dumps copied to labs are from known good (and therefore completed) runs
[21:04:30] so you'll want to generate a separate set there I guess
[21:04:54] greg-g: is it OK for Flow to start its deploy window?
[21:06:54] apergos: Okay, thanks.
[21:07:27] sure
[21:08:30] https://www.wikidata.org/wiki/Special:ActiveUsers wow, botland ;)
[21:08:39] (CR) Hashar: "Updated my instance integration-debian-builder.pmtpa.wmflabs with it (the package is added to a local repository)." [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/111263 (owner: Hashar)
[21:09:09] any debian pinning guru might explain why I can install a more up to date package please ? http://paste.debian.net/plain/80179
[21:09:09] Reedy,looks like you're long finished group1 to wmf12, so Flow is starting its deploy window.
[21:09:34] I got 0.8.0 available but apt does not select it as a candidate :-(
[21:11:54] hashar: Do you have any pinning in place now?
[21:12:07] I guess so
[21:12:19] was merely looking to apt-get install and specify the repository I want to use
[21:12:29] aka the special distribution jenkins-debian-glue/main
[21:12:33] instead of precise-wikimedia/main
[21:13:17] Ah, okay. I don't know how to do that without changing /etc/apt/preferences.d.
[21:13:32] -o=? Set an arbitrary configuration option, eg -o dir::cache=/tmp
[21:13:32] :D
[21:14:49] hashar: Does that work or was that a question? :-)
[21:15:01] ahh
[21:15:16] I am pretty sure there used to be something like apt-get install firefox@unstable
[21:15:35] "man apt-get" has an option "-t". Tried that?
[21:16:34] ah yeah that was it
[21:16:40] scfc_de: thank you very much :-D
[21:17:22] E: The value 'jenkins-debian-glue/main' is invalid for APT::Default-Release as such a release is not available in the sources
[21:17:26] debian never cease to amaze me
[21:18:22] ottomata: analytics1021 is fixed...you will need to fix ip
[21:19:56] RECOVERY - Host analytics1021 is UP: PING OK - Packet loss = 0%, RTA = 1.04 ms
[21:20:30] scfc_de: apt-get install jenkins-debian-glue/jenkins-debian-glue
[21:20:33] scfc_de: that did the trick
[21:21:31] !!!
[21:21:33] oooooooo
[21:21:55] cmjohnson1: fix IP?
[21:22:23] (PS1) Ori.livneh: scap: fix formatting of log message; set default to '(no message)' [operations/puppet] - https://gerrit.wikimedia.org/r/111325
[21:22:50] (CR) Ori.livneh: [C: 2 V: 2] scap: fix formatting of log message; set default to '(no message)' [operations/puppet] - https://gerrit.wikimedia.org/r/111325 (owner: Ori.livneh)
[21:23:12] ottomata: nevermind
[21:23:37] :)
[21:23:48] scfc_de: fixed. thank you very much
[21:24:05] has anyone else done an upgrade ont he dells from 1g to 10g card?
[21:24:09] hashar: That's ... interesting :-). There is even one (!) mention of "pkg/release" in the man page.
[21:24:12] i'm wondering if i need to reinstall to get it to work ...
[21:24:33] cmjohnson1: thank you thank you thank youuuuu!
[21:24:54] scfc_de: yeah that is nice
[21:25:04] scfc_de: also that means we now have jenkins-debian-glue v0.8.0
[21:25:48] lesliecarr: i believe a reinstall was needed and the bios had to be configured to boot from the card.
[21:26:02] oh man, this will be fun ;)
[21:26:32] cmjohnson1: can you set the bios to boot from the card ? (if you are still there and it's easy for you to hook in) ?
[21:26:43] i am not there now
[21:26:46] PROBLEM - Puppet freshness on cp3019 is CRITICAL: Last successful Puppet run was Tue 04 Feb 2014 06:26:26 PM UTC
[21:26:54] left about 30 mins ago
[21:27:07] ok
[21:28:55] (CR) Hashar: [C: 2] "I use integration/jenkins-debian-glue.git to push new versions of upstream package. That trigger a jenkins job which build it for me using" [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/111263 (owner: Hashar)
[21:29:02] greg-g: : greg-g: Can I get a LD for VisualEditor?
https://bugzilla.wikimedia.org/show_bug.cgi?id=57209 https://gerrit.wikimedia.org/r/#/q/Iada098d0fa,n,z
[21:30:43] !log bsitu synchronized php-1.23wmf11/extensions/Flow 'Update Flow'
[21:30:52] Logged the message, Master
[21:40:12] PROBLEM - check_mysql on lutetium is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null)
[21:40:51] AaronSchulz: I think that's you
[21:45:06] RECOVERY - check_mysql on lutetium is OK: Uptime: 2163896 Threads: 2 Questions: 17665367 Slow queries: 85734 Opens: 14668 Flush tables: 2 Open tables: 64 Queries per second avg: 8.163 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[21:49:49] !log rebuilding search indexes for non-wikipedias after cirrus update on the train went to them earlier today.
[21:49:57] Logged the message, Master
[21:53:24] alantz: to double check, there's some reason you don't want that file on commons? (big theora video)
[21:53:34] Krinkle: yeah, sorry, wasn't ignoring, please do
[21:53:41] alantz: on commons you can upload that size yourself. not sure about other places
[21:54:44] alantz: if not commons than need to find some way to get it to the person that will upload. maybe easiest to do that in person at the office
[21:55:28] (instructions for doing the upload at commons: https://commons.wikimedia.org/wiki/Commons:Chunked_uploads )
[22:01:21] (CR) Hashar: "scappy is a lovely name :-]" [operations/puppet] - https://gerrit.wikimedia.org/r/110904 (owner: Ori.livneh)
[22:02:01] * jeremyb mailed alantz
[22:03:00] !log aaron synchronized php-1.23wmf12/includes/specials/SpecialActiveusers.php '6e1fd797c58c4ce01c19c57d9ffe06b13acc816a'
[22:03:07] Logged the message, Master
[22:03:15] !log bsitu synchronized php-1.23wmf12/extensions/Flow 'Update Flow'
[22:03:20] Logged the message, Master
[22:04:50] (PS1) Tim Landscheidt: Dynamic proxy: Serve SSL certificate chain [operations/puppet] - https://gerrit.wikimedia.org/r/111342
[22:06:00] (CR) Tim Landscheidt: "As per ."
[operations/puppet] - https://gerrit.wikimedia.org/r/111342 (owner: Tim Landscheidt)
[22:07:00] greg-g: we're still looking for someone with CheckUser powers on enwiki, but assuming no show-stoppers (hahaha) in testing, the Flow fixes deploy is finished, thanks <3
[22:08:23] spage: coolio
[22:09:08] greg-g: is DGarry around? I think he has the godliness
[22:10:40] spage: He does but he is not on IRC.
[22:13:07] spage: Coren might be online.
[22:13:15] My experience is that DGarry is magical, he might turn up sooner than you think, JohnLewis
[22:13:49] twkozlowski: My experience is the opposite.
[22:14:00] You mean he's a Muggle?
[22:14:05] Meh
[22:14:57] (PS1) Ori.livneh: Add 'scap' submodule [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111343
[22:15:52] JohnLewis: thx, legoktm found a CheckUser who confirmed the fix \o/
[22:15:57] (CR) Ori.livneh: [C: 2] Add 'scap' submodule [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111343 (owner: Ori.livneh)
[22:18:12] greg-g: OK, so I have a go for the deployment in the LD window?
[22:18:36] spage: Ok :)
[22:26:21] Krinkle: yep.
[22:27:51] jeremyb sorry!
I went out to get some lunch
[22:30:41] (PS1) Ori.livneh: Revert "Add 'scap' submodule" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111349
[22:30:54] (CR) Ori.livneh: [C: 2 V: 2] Revert "Add 'scap' submodule" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/111349 (owner: Ori.livneh)
[22:31:08] !log ori updated /a/common to {{Gerrit|Ia15a12587}}: Revert "Add 'scap' submodule"
[22:31:16] Logged the message, Master
[22:32:28] alantz: np, just read what i wrote :)
[22:32:55] (PS1) GWicke: Bug 60694: Make the config file path configurable [operations/puppet] - https://gerrit.wikimedia.org/r/111350
[22:34:21] !log deployed mwalker's chicken-out commit making sphinx non-voting, to gallium
[22:34:28] Logged the message, Master
[22:34:56] heh; nice log message
[22:35:13] it looks like sartoris is the only thing using is already
[22:35:18] and it's using python setup-utils
[22:35:47] which is interesting; and I'm not sure a dependency I want to introduce to a predominantly php project
[22:41:40] (PS1) Ori.livneh: Configure staging area for scap rewrites in /srv/scap [operations/puppet] - https://gerrit.wikimedia.org/r/111352
[22:45:48] (CR) Ori.livneh: [C: 2] Configure staging area for scap rewrites in /srv/scap [operations/puppet] - https://gerrit.wikimedia.org/r/111352 (owner: Ori.livneh)
[22:46:48] jeremyb Thanks! I'll respond soon, got pulled into another project.
[22:47:06] alantz: k
[22:57:33] (PS1) Ori.livneh: Git::clone['/srv/scap']: owner => 'root' [operations/puppet] - https://gerrit.wikimedia.org/r/111355
[22:58:23] (CR) Ori.livneh: [C: 2 V: 2] Git::clone['/srv/scap']: owner => 'root' [operations/puppet] - https://gerrit.wikimedia.org/r/111355 (owner: Ori.livneh)
[23:24:03] !log ori started scap: Test of scap modifications; no changes going out.
[23:24:10] Logged the message, Master
[23:38:48] !log ori started scap: Test of scap modifications; no changes going out.
[23:38:56] Logged the message, Master
[23:39:15] sorry, there will be a few of these
[23:39:17] !log ori started scap: Test of scap modifications; no changes going out.
[23:39:25] Logged the message, Master
[23:46:15] (PS1) Ori.livneh: Make scap script files symlinks to /srv/scap/bin files [operations/puppet] - https://gerrit.wikimedia.org/r/111373
[23:55:43] (PS2) Tim Landscheidt: Dynamic proxy: Serve SSL certificate chain [operations/puppet] - https://gerrit.wikimedia.org/r/111342
[23:56:17] (PS1) Ori.livneh: ensure tmpfile is accessible to dsh subprocess [operations/puppet] - https://gerrit.wikimedia.org/r/111375
[23:59:48] (CR) coren: [C: 2] Dynamic proxy: Serve SSL certificate chain [operations/puppet] - https://gerrit.wikimedia.org/r/111342 (owner: Tim Landscheidt)