[00:01:32] ori: thanks for the SWAT!
[00:06:33] tgr: my pleasure
[02:00:04] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Last successful Puppet run was Tue 22 Apr 2014 04:57:34 PM UTC
[02:12:14] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3471 MB (3% inode=99%):
[02:18:14] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 2935 MB (3% inode=99%):
[02:18:16] (CR) Springle: "Good plan." [operations/puppet] - https://gerrit.wikimedia.org/r/111152 (owner: Diederik)
[02:25:41] !log restarted rebuilding common's Cirrus index after something crashed. going to get more logging out of it if it crashes again. or it'll work. Either way. Like last time the Elasticsearch check might freak out for a bit after it finished because shards are assigning. That can be ignored for an hour or so.
[02:25:48] Logged the message, Master
[02:43:45] Any ops around that can look for rt tickets related to Freenode IP limit exceptions for Wikimedia Labs?
[02:43:51] (aka an "iline")
[02:44:12] Looks like something got screwed up, bots are being necked on labs. Unable to connect to freenode
[02:44:39] Coren: (rt duty -ish)
[02:45:28] Krinkle: That shouldn't be the case because identd is running; though we may be hitting the higher cap. You certain that's the reason?
[02:45:54] 2014-04-23 02:43:12,048 [Main] FATAL - Could not connect: Could not connect to: chat.freenode.net:6667 Connection timed out
[02:45:55] 2014-04-23 02:42:08,409 [Main] ERROR - IRC: Closing Link: 208.80.155.255 (Too many user connections (global))
[02:46:00] it's alternating between those two errors
[02:46:06] on each retry
[02:46:20] also tried the older irc.freenode.net just in case, but no difference.
[02:46:40] that's an eqiad ip
[02:46:49] !log LocalisationUpdate completed (1.23wmf22) at 2014-04-23 02:46:47+00:00
[02:46:56] Logged the message, Master
[02:47:21] Coren: What do you mean by identd is running?
[02:48:07] The nodes have identd running, which freenode uses to bypass the per-IP limit, placing a per-user limit instead -- which may be what you run into.
[02:48:28] Where would that be running? Is that something specific to the instance?
[02:48:52] No, it runs on every exec node -- or do you mean you're hitting that limit for a non-tools project?
[02:49:16] Coren: This isn't in tools, it's on cvn-app4
[02:49:25] but that shouldn't make a difference though, right?
[02:49:50] Ah. You have to get identd running then, and have a public IP. Otherwise, you're just using the single labs NAT IP and hitting the per-IP limit (which is relatively low)
[02:49:50] wait, tools-exec nodes?
[02:50:14] Are you saying this is an internal limit, not something capped by Freenode?
[02:51:30] No, it's capped by freenode. It does one of two things: if you have an identd running at the source IP, then it uses per-user limits based on what identd returns. If you don't, then it uses a per-IP limit. If your instance doesn't have a public IP with identd running, then it gets natted and lumped with any other IRC connection that goes through there.
[02:51:38] Or do non-tools projects have a significantly different outgoing connection, so that the iline exception on their end (which is IP based afaik) doesn't cover it?
[02:52:13] It's not tools vs non-tools; it's public IP vs natted.
[02:52:31] Coren: Ok, got that.
[02:52:53] Coren: so when it's natted, that connection limit is not one from freenode, but somewhere on our end?
[02:53:12] Nope, it's from freenode, but then shared between all the natted instances.
[02:53:26] Interesting.
[02:53:29] Because they all connect from the same IP
[02:53:46] But if I get a public IP for my instance, how does it get exempt from the IP limit?
[02:53:55] Or do they have the exception on a range?
[02:54:28] It won't, but the normal limit is quite generous. You can then further expand that if you run identd because then it doesn't get the per-IP limit but the per-user limit.
[02:55:57] So you reckon that if cvn has its own public IP, it likely won't hit the general freenode connections-from-IP limit. I'm hitting it now because I'm sharing the same IP as all other non-tools natted labs instances.
[02:56:35] and tool labs does hit that limit which is why only tool labs has its IP limit raised from Freenode's end?
[02:57:08] Can you roughly estimate what that limit is? CVN has quite a few, and growing as we migrate all our bots from other hosting places we're trying to phase out.
[02:57:13] Probably a dozen, 2 dozen at most.
[02:57:13] Tool labs doesn't have a raised IP limit either; it has identd turned on so it uses the per-user limit.
[02:57:36] Interesting.
[02:57:47] Hm.. then why did Toolserver need it?
[02:57:50] IIRC, the limit is about 20; though if you turn identd on and your bots use different user IDs (as they probably should for security) they each get their own limit.
[02:58:28] I expect the toolserver natted all the bots, and didn't have identd running.
[02:58:32] If I mention "bot" and "wikipedia" in #freenode, the first thing that pops into most staffers' heads there is the fact that "Wikipedia's Toolserver" has an exemption because they run many bots there.
[02:59:23] Yeah; that exception was only made because of the way toolserver did things (nat everything, no identd)
[02:59:39] Coren: I find it odd that local unix user ids or running something called identd (I should know about identd, I don't) could bear any significance in a connection limit to prevent abuse
[02:59:45] Seems rather trivially circumvented.
[03:00:15] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:00:16] Anyway, so more to the problem at hand (-> #wikimedia-labs)
[03:00:31] Of course it could; and the freenode staff would then beat you up for it.
:-) [03:09:01] !log LocalisationUpdate completed (1.24wmf1) at 2014-04-23 03:08:58+00:00 [03:09:05] Logged the message, Master [03:16:46] I think the Toolserver ran oidentd. [03:16:51] With per-user config disabled. [03:18:44] oident is a user on nightshade and seems to be currently running, anyway. [03:55:08] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 23 03:55:03 UTC 2014 (duration 55m 2s) [03:55:14] Logged the message, Master [04:03:24] PROBLEM - Disk space on dataset1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:01:04] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Last successful Puppet run was Tue 22 Apr 2014 04:57:34 PM UTC [06:00:22] (03PS1) 10Dzahn: remove db48,db63,db68,db75,db76,db77 from dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129101 [06:02:08] (03CR) 10Springle: [C: 04-1] remove db48,db63,db68,db75,db76,db77 from dhcp (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129101 (owner: 10Dzahn) [06:05:44] (03PS2) 10Dzahn: remove db48,db63,db75,db76,db77 from dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129101 [06:07:49] (03CR) 10Springle: [C: 031] remove db48,db63,db75,db76,db77 from dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129101 (owner: 10Dzahn) [06:09:53] (03CR) 10ArielGlenn: [C: 031] remove db48,db63,db75,db76,db77 from dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129101 (owner: 10Dzahn) [06:11:14] (03PS1) 10Dzahn: remove db48,db63,db75,db76,db77 [operations/dns] - 10https://gerrit.wikimedia.org/r/129102 [06:11:38] (03CR) 10Dzahn: [C: 032] remove db48,db63,db75,db76,db77 from dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129101 (owner: 10Dzahn) [06:12:37] (03PS1) 10Withoutaname: Set $wgBabelCategoryNames for betawikiversity [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129103 [06:16:45] (03PS1) 10Dzahn: fix typo re: db68 in DHCP [operations/puppet] - 
10https://gerrit.wikimedia.org/r/129104 [06:17:21] (03CR) 10Dzahn: [C: 032] fix typo re: db68 in DHCP [operations/puppet] - 10https://gerrit.wikimedia.org/r/129104 (owner: 10Dzahn) [06:22:34] (03CR) 10Dzahn: "springle, s1-secondary/m2-secondary in pmtpa, ok to remove like that too?" [operations/dns] - 10https://gerrit.wikimedia.org/r/129102 (owner: 10Dzahn) [06:23:41] mutante: hmm checking [06:26:25] (03CR) 10Springle: [C: 04-1] "s1-secondary should become db60. m1-secondary can go away." (032 comments) [operations/dns] - 10https://gerrit.wikimedia.org/r/129102 (owner: 10Dzahn) [06:26:56] !log db48,db63 - revoke puppet cert, salt key, kill from storedconfigs [06:27:03] Logged the message, Master [06:27:59] (03PS1) 10Ori.livneh: mwgrep: account for db names with underscores [operations/puppet] - 10https://gerrit.wikimedia.org/r/129105 [06:29:31] (03PS2) 10Dzahn: remove db48,db63,db75,db76,db77 [operations/dns] - 10https://gerrit.wikimedia.org/r/129102 [06:30:22] (03CR) 10Springle: [C: 031] remove db48,db63,db75,db76,db77 [operations/dns] - 10https://gerrit.wikimedia.org/r/129102 (owner: 10Dzahn) [06:33:38] (03PS1) 10Dzahn: update db dsh group file [operations/puppet] - 10https://gerrit.wikimedia.org/r/129106 [06:35:07] (03CR) 10Dzahn: [C: 032] remove db48,db63,db75,db76,db77 [operations/dns] - 10https://gerrit.wikimedia.org/r/129102 (owner: 10Dzahn) [06:36:42] (03PS2) 10Dzahn: update db dsh group file [operations/puppet] - 10https://gerrit.wikimedia.org/r/129106 [06:37:15] (03CR) 10Dzahn: [C: 032] update db dsh group file [operations/puppet] - 10https://gerrit.wikimedia.org/r/129106 (owner: 10Dzahn) [06:54:35] (03PS1) 10Dzahn: remove nfs2 from backup list,dsh,dhcp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129107 [06:57:57] (03PS1) 10Dzahn: remove nfs2 from ldap and base config [operations/puppet] - 10https://gerrit.wikimedia.org/r/129108 [06:59:28] (03CR) 10Dzahn: [C: 032] "it's down" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129107 (owner: 
10Dzahn) [07:06:12] (03PS1) 10Dzahn: remove nfs2 from site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129110 [07:08:32] (03PS1) 10ArielGlenn: remove all dns entries for or referring to es2,3,5,6,8,9 [operations/dns] - 10https://gerrit.wikimedia.org/r/129111 [07:09:28] (03CR) 10ArielGlenn: [C: 031] remove nfs2 from ldap and base config [operations/puppet] - 10https://gerrit.wikimedia.org/r/129108 (owner: 10Dzahn) [07:10:00] (03CR) 10Dzahn: [C: 032] remove nfs2 from ldap and base config [operations/puppet] - 10https://gerrit.wikimedia.org/r/129108 (owner: 10Dzahn) [07:10:54] (03CR) 10Dzahn: [C: 032] remove nfs2 from site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/129110 (owner: 10Dzahn) [07:12:07] !log nfs2 - revoke puppet cert,salt key,stored configs [07:12:12] Logged the message, Master [07:12:45] ACKNOWLEDGEMENT - Host nfs2 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn decom, RT #7341 [07:13:58] (03CR) 10Springle: [C: 031] remove all dns entries for or referring to es2,3,5,6,8,9 [operations/dns] - 10https://gerrit.wikimedia.org/r/129111 (owner: 10ArielGlenn) [07:16:12] (03CR) 10ArielGlenn: [C: 032] remove all dns entries for or referring to es2,3,5,6,8,9 [operations/dns] - 10https://gerrit.wikimedia.org/r/129111 (owner: 10ArielGlenn) [07:27:58] (03PS1) 10ArielGlenn: remove db10 from dhcp, dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/129114 [07:29:34] (03CR) 10Dzahn: [C: 031] remove db10 from dhcp, dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/129114 (owner: 10ArielGlenn) [07:29:54] (03CR) 10ArielGlenn: [C: 032] remove db10 from dhcp, dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/129114 (owner: 10ArielGlenn) [07:32:06] RECOVERY - Puppet freshness on nfs1 is OK: puppet ran at Wed Apr 23 07:32:04 UTC 2014 [07:32:11] !log nfs1 - re-enabled puppet [07:32:17] Logged the message, Master [07:41:48] (03PS1) 10ArielGlenn: remove last vestiges of es1, db62, tesla 
[operations/dns] - 10https://gerrit.wikimedia.org/r/129115 [07:45:05] !log nfs1 - delete some old kernels and zip mw logs last touched in 2012/13 to free some disk on / [07:45:12] Logged the message, Master [07:45:36] RECOVERY - Disk space on nfs1 is OK: DISK OK [07:47:06] RECOVERY - NTP on nfs1 is OK: NTP OK: Offset 0.02552473545 secs [07:51:29] ACKNOWLEDGEMENT - twemproxy process on fenari is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker daniel_zahn start: Job failed to start does it matter on fenari? [07:59:46] (03CR) 10Dzahn: [C: 032] Dead blogs are dead [operations/puppet] - 10https://gerrit.wikimedia.org/r/129031 (owner: 10Odder) [08:05:35] (03CR) 10Krinkle: [C: 031] "Nice." [operations/puppet] - 10https://gerrit.wikimedia.org/r/129105 (owner: 10Ori.livneh) [08:07:27] (03Abandoned) 10Dzahn: Removing redundant config. MediaWiki:privacypage is now redirecting to Fondationswiki. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127488 (owner: 10Gerrit Patch Uploader) [08:13:16] (03CR) 10Dzahn: [C: 032] remove last vestiges of es1, db62, tesla [operations/dns] - 10https://gerrit.wikimedia.org/r/129115 (owner: 10ArielGlenn) [08:48:08] (03CR) 10Krinkle: "Hm.. doesn't seem like an obvious benefit indeed. 
Haven't done any checks but just to spell out the factors I see:" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125927 (owner: Ori.livneh)
[08:50:01] akosiaris: i have looked at the pastebin you gave me yesterday, not sure where those defs fit
[08:51:25] (CR) Krinkle: Gzip SVGs on back upload varnishes (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/108484 (owner: Ori.livneh)
[08:52:30] matanya: in https://gerrit.wikimedia.org/r/#/c/117674/ for example to replace iptables_add_service{ 'private_eqiad1': blah blah } with the corresponding ferm::rule it is better to use the def instead of directly adding the ip space in the rule
[08:53:04] those defs are calculated directly from network.pp so any change in network.pp is bound to update those defs as well
[08:53:38] the idea is to only update these things once and not have to chase down the puppet tree for every possible way they can be defined
[08:54:34] <_joe_> akosiaris: or, you need to separate data and functionality
[08:54:36] <_joe_> :)
[08:54:48] <_joe_> that's always a good idea and hiera will help a lot
[08:54:58] aha, i see. thanks. my question was regarding the fact the ranges don't fit
[08:55:36] e.g. in icinga: 208.80.152.0/24 vs 208.80.152.0/22 in the paste
[08:56:00] if i use the paste ranges i in fact open larger networks
[08:56:25] akosiaris: which i guess isn't preferable :)
[08:57:48] so this might be the result of having these things defined in many places and so they are out of sync now
[08:59:04] in fact it is worse
[08:59:18] seems like my next task is to sync it back ...
[08:59:27] someone just put the /24 for 154
[08:59:36] it is not even split that way...
[09:00:20] so look at all_network_subnets
[09:00:27] in network.pp
[09:01:02] public1-{a,b,c}-eqiad are three /26
[09:01:05] yes, i started there at first
[09:01:59] the last /26 is not servers anyway
[09:02:23] c or d ?
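Note on the range mismatch discussed above: a minimal sketch, using Python's standard `ipaddress` module, of why swapping a /24 for a /22 opens a larger network. The two 208.80.152.0 prefixes are the ones quoted in the discussion; splitting 208.80.154.0/24 into /26 blocks is only an illustration, and the helper names `extra_addresses` and `split_into` are made up for this example (they do not come from network.pp or the paste).

```python
import ipaddress


def extra_addresses(narrow: str, wide: str) -> int:
    """Return how many more addresses `wide` covers than `narrow`.

    Raises ValueError if `narrow` is not contained in `wide`.
    """
    narrow_net = ipaddress.ip_network(narrow)
    wide_net = ipaddress.ip_network(wide)
    if not narrow_net.subnet_of(wide_net):
        raise ValueError(f"{narrow} is not inside {wide}")
    return wide_net.num_addresses - narrow_net.num_addresses


def split_into(network: str, new_prefix: int) -> list:
    """Split `network` into equal subnets of length `new_prefix`."""
    return list(ipaddress.ip_network(network).subnets(new_prefix=new_prefix))


# The rule in icinga used the /24; the paste listed the /22.
print(extra_addresses("208.80.152.0/24", "208.80.152.0/22"))

# Illustration only: how a /24 breaks down into /26 blocks.
for block in split_into("208.80.154.0/24", 26):
    print(block)
```

Comparing prefixes this way before editing a firewall rule makes it obvious when a replacement range is wider than the one it replaces.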
[09:02:35] oh d is /27
[09:02:36] 208.80.154.192/26
[09:02:42] and it is not even really a /26
[09:03:35] public-services-2 on pmtpa
[09:03:36] but it is split up in many smaller /30, /31s and yada yada
[09:04:00] seems like needing an overhaul
[09:04:15] i can do that, if i know the correct ranges
[09:04:55] for anything that really matters about servers, network.pp is authoritative
[09:05:08] the rest, DNS is authoritative as well
[09:06:14] http://metrics.wikimedia.org is just "it works".. hmmm
[09:06:42] mutante: please close: https://rt.wikimedia.org/Ticket/Display.html?id=4346
[09:06:53] eh, hi, let me read backlog
[09:07:01] hi :)
[09:08:21] (PS3) Krinkle: bugzilla apache: Enable required modules for caching [operations/puppet] - https://gerrit.wikimedia.org/r/127254 (owner: JanZerebecki)
[09:09:00] (PS2) Krinkle: bugzilla: Enable strict transport security [operations/puppet] - https://gerrit.wikimedia.org/r/127256 (owner: JanZerebecki)
[09:10:54] https://en.wikipedia.org/wiki/HTTP_Strict_Transport_Security#Limitations
[09:10:54] >>
[09:10:55] <<
[09:11:21] matanya: akosiaris : one lazy way to check subnets, if we trust DNS:
[09:11:24] dns/templates$ grep \; 10.in-addr.arpa
[09:13:06] matanya: shouldn't i just rename it?
[09:13:25] so, if i got it right: private_pmtpa_nolabs should be 10.0.0.0/14 ?
[09:13:43] rename to what mutante ?
[09:14:03] * so, if i got it right: private_pmtpa_nolabs should be 10.0.0.0/16 ?
[09:14:17] which is in fact $PMTPA_PRIVATE_PRIVATE_IPV4
[09:14:18] matanya: "put ferm rules on sanger replacement"?
[09:14:35] matanya: lgtm
[09:14:50] mutante: it depends on what the ticket intended to be
[09:14:52] the /16 one
[09:15:23] since sanger does mainly ldap
[09:15:32] thanks akosiaris i'll fix them
[09:16:08] ; 10.0.0.0/12 - pmtpa ; ; 10.0.0.0/16 - pmtpa private vlan :p
[09:19:29] matanya: so.. i read those again..
an update on 6163 is what we need
[09:20:44] ok
[09:25:32] matanya: updated on RT
[09:25:42] thanky
[09:40:17] (PS5) Matanya: icinga: replace iptable with ferm rules [operations/puppet] - https://gerrit.wikimedia.org/r/117674
[09:42:22] akosiaris: i hope this ^ is what you and mutante meant
[09:44:48] matanya: nope. The defs are only valid inside ferm, not inside a puppet array
[09:45:11] so you need to move those inside ferm::rule s
[09:45:22] oh, i did that at first, but thought this way would be better
[09:45:49] back to the saved patch in staging
[09:45:59] I am wondering whether it is better to have one large ferm::rule for each
[09:46:13] or one per port/def pair ...
[09:46:48] i think the latter is more flexible but it does not really offer that much of a benefit
[09:46:59] so feel free to do it in one rule per port
[09:48:01] (PS6) Matanya: icinga: replace iptable with ferm rules [operations/puppet] - https://gerrit.wikimedia.org/r/117674
[09:48:55] here it is akosiaris
[09:50:44] matanya: ok, this one looks good
[09:50:57] so, let's see if neon is ferm enabled :-)
[09:51:11] thanks. i doubt that
[09:52:22] yeah.... :-(
[09:53:13] (CR) Alexandros Kosiaris: [C: 2] "LGTM, not merging however before the rest of neon's puppetized services become ferm enabled too" [operations/puppet] - https://gerrit.wikimedia.org/r/117674 (owner: Matanya)
[09:53:16] where do we have ferm? i haven't seen many hosts with it
[09:54:09] and what else is on neon?
[09:54:38] carbon has ferm due to role::installserver
[09:54:52] and neon has the icinga webui
[09:55:11] so port 80/443
[09:55:27] and motd also says tendril, ishmael, ircecho but all these seem old and wrong
[09:56:21] tcpircbot however on 9200 is correct
[09:56:29] i think it does have ircecho for icinga-wm
[09:56:50] or tcpircbot does that?
[09:57:21] iirc tcpircbot is for logmsgbot [09:57:33] or what ever his name is now [09:57:59] heh tcpircbot::instance { 'logmsgbot': [09:58:30] neon also needs to receive snmtrapd [09:58:30] yeah, that :) [09:58:43] snmp [09:58:46] that is in my patch [09:59:38] so a role::tcpircbot including ::tcpircbot, a ferm role and a system::role thingy and maybe we are ok ? [09:59:47] a ferm::rule* [10:00:18] I get the distinct feeling the icinga installation is not puppetized btw [10:00:33] it is not [10:01:02] but neon wasnt the first host it was on.. what is missing [10:01:18] no it seems I was wrong [10:01:21] used to be spence [10:01:27] class icinga::monitor { [10:01:32] but it also was nagios->icinga .. so [10:01:37] and it includes icinga::monitor::apache [10:01:44] akosiaris: from: https://wikitech.wikimedia.org/wiki/Puppet_Todo : nrpe.pp, nagios.pp, misc/icinga.pp: Alex to modularize those [10:01:46] there are still some remnants called nagios , also confusing in private repo [10:02:10] but we did not want to go so far to build icinga-plugins.deb :p [10:02:14] matanya: yeah the first has been done [10:02:28] the other two.... not yet [10:05:27] (03PS1) 10QChris: Remove unused metrics and metrics-api [operations/dns] - 10https://gerrit.wikimedia.org/r/129134 [10:08:00] (03PS1) 10Matanya: tcpircbot: add a role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129135 [10:08:20] akosiaris: it ^ comes with a bonus of monitoring too :) [10:10:16] (03CR) 10Dzahn: [C: 031] "ah, thanks for that Chris, +1, unless you want to add a redirect for the old URLs" [operations/dns] - 10https://gerrit.wikimedia.org/r/129134 (owner: 10QChris) [10:10:38] qchris: aha! thanks [10:11:02] mutante: I am not sure about fiking the links to that domain. [10:11:04] i just happened to run across the link via google actually [10:11:22] Oh. [10:11:34] I'll bring it up today during our standup meeting. 
[10:11:37] somebody asked how to get "number of bytes per user" [10:11:41] (03PS2) 10Matanya: tcpircbot: add a role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129135 [10:12:10] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor stuff" (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129135 (owner: 10Matanya) [10:12:16] mutante: Wikimetrics can answer these kind of questions (it is the new "UserMetrics") [10:12:19] qchris: thanks. i found this one https://www.mediawiki.org/wiki/UserMetrics/Guide [10:12:25] ah... already PS2 ? [10:12:33] qchris: gotcha,ok [10:12:34] btw... why does tcpircbot listen on 9200 ? [10:12:35] mutante: https://metrics.wmflabs.org/ [10:12:49] ah:) [10:13:08] ah found it [10:13:16] https://doc.wikimedia.org/puppet/classes/tcpircbot.html [10:13:31] mutante: But yes. The dangling links are bad :-( [10:13:40] matanya: also a ferm::rule on that role for TCP/9200 [10:13:45] (03CR) 10Dzahn: "redirect URLs to https://metrics.wmflabs.org/ ? (or fix links in https://www.mediawiki.org/wiki/UserMetrics/Guide)" [operations/dns] - 10https://gerrit.wikimedia.org/r/129134 (owner: 10QChris) [10:13:57] akosiaris: done in the next ps [10:14:34] (03PS3) 10Matanya: tcpircbot: add a role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129135 [10:17:13] I don't think ferm likes mapped-ipv6 addresses though... [10:17:18] hmm let's see [10:17:43] the foundation doesn't use true IPv4-mapped addresses [10:18:12] the bits aren't the same, they just read the same b/c the digits read the same (but have different values, as IPv6 uses base 16) [10:18:25] (03PS1) 10Giuseppe Lavagetto: Removed decommissioned pdus from monitoring. [operations/puppet] - 10https://gerrit.wikimedia.org/r/129138 [10:20:00] (03CR) 10QChris: "You're right. 
Redirecting to https://metrics.wmflabs.org/ might" [operations/dns] - 10https://gerrit.wikimedia.org/r/129134 (owner: 10QChris) [10:20:42] apergos: hey [10:21:43] (03CR) 10Giuseppe Lavagetto: [C: 032] Removed decommissioned pdus from monitoring. [operations/puppet] - 10https://gerrit.wikimedia.org/r/129138 (owner: 10Giuseppe Lavagetto) [10:24:02] (03CR) 10Dzahn: "thanks" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129138 (owner: 10Giuseppe Lavagetto) [10:25:27] (03CR) 10Dzahn: "reduces number of DOWN hosts to 3 :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129138 (owner: 10Giuseppe Lavagetto) [10:28:20] (03PS1) 10Yuvipanda: toollabs: Handle missing trailing slash for tools [operations/puppet] - 10https://gerrit.wikimedia.org/r/129139 [10:32:51] !log Jenkins switching integration-slave1001.eqiad.wmflabs java to use Java 7 . In https://integration.wikimedia.org/ci/computer/integration-slave1001/configure changed JavaPath to /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java [10:32:58] Logged the message, Master [10:35:12] !log Jenkins: update lanthanum slave agent to use java7 [10:35:19] Logged the message, Master [10:39:07] paravoid: hello (was out) [10:39:12] what's up [10:56:22] (03PS1) 10Matanya: gitblit: remove nrpe from role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129143 [10:58:25] <_joe_> matanya: why that? [10:58:37] this change? [10:58:44] <_joe_> (removing nrpe from gitblit role) [11:00:34] if i understand it correctly (which i doubt) nrpe is being installed on this host by calling base which is called by standard which is on the machine. if we have two classes calling the same package to install we have duplicate defs and no joy [11:00:56] isn't it _joe_ ? [11:01:02] <_joe_> matanya: duplicate includes are not an issue [11:01:31] only explicit resources calling ? [11:01:43] <_joe_> matanya: but that's ok, if nrpe is included in standard it shouldn't be included again. 
My point was - make the commit msg more xplicit [11:01:55] oh, yeah [11:01:56] <_joe_> matanya: yes AFAIR and maybe this is true for puppet 3.x only [11:02:02] <_joe_> but check this [11:06:40] (03CR) 10Dzahn: "BugMail.pm - looks ok, this is what is live and should be synced" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/127898 (owner: 10Aklapper) [11:14:35] (03PS1) 10Dzahn: sync Bugzilla 4.4.4 changes on custom BugMail.pm [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/129146 [11:16:32] (03CR) 10Dzahn: [C: 04-1] "synced BugMail.pm in Change-Id: Id70e8dcc95" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/127898 (owner: 10Aklapper) [11:19:53] (03CR) 10Dzahn: [C: 032] "straight copy, same thing as in Change-Id: I7b1d9e735" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/129146 (owner: 10Dzahn) [11:20:27] (03CR) 10Dzahn: [V: 032] "straight copy, same thing as in Change-Id: I7b1d9e735" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/129146 (owner: 10Dzahn) [11:22:17] (03PS2) 10Dzahn: Port upstream Bugzilla 4.4.4 changes to our modifications [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/127898 (owner: 10Aklapper) [11:24:19] (03PS1) 10Faidon Liambotis: swift: bump rsync max connections [operations/puppet] - 10https://gerrit.wikimedia.org/r/129149 [11:24:35] (03CR) 10Faidon Liambotis: [C: 032 V: 032] swift: bump rsync max connections [operations/puppet] - 10https://gerrit.wikimedia.org/r/129149 (owner: 10Faidon Liambotis) [11:29:39] mutante: should nrpe be removed from everywhere but base? [11:31:17] matanya: yes, since https://gerrit.wikimedia.org/r/#/c/124576/ as long as the nodes all actually have base already [11:31:39] so my patch above is valid? 
[11:31:50] but i tried to "add base where it was not included yet" [11:32:07] step by step [11:33:02] matanya: pretty much what _joe_ said, duplicate includes not an issue, but you can clean it up nevertheless [11:33:43] mutante: the question is i should continue with this, i have like 5 more of this [11:34:01] or shouldn't [11:34:47] just explain it a bit more in the message and fix the link [11:34:59] "see #cd511 [11:35:13] that doesn't link [11:35:19] you need a few more chars [11:36:26] (03PS2) 10Matanya: gitblit: remove nrpe from role [operations/puppet] - 10https://gerrit.wikimedia.org/r/129143 [11:36:58] (03CR) 10Dzahn: "indeed, nrpe is in base since Change-Id: I1e25a532db0 (also RT #80). so it's not really needed here anymore as long as the nodes also have" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129143 (owner: 10Matanya) [11:40:39] (03PS1) 10Hashar: contint: symlink for Jenkins email templates [operations/puppet] - 10https://gerrit.wikimedia.org/r/129152 [11:42:34] (03CR) 10Hashar: [C: 031 V: 032] "I have created the symlink manually on gallium (recipient of the role::ci::master class." [operations/puppet] - 10https://gerrit.wikimedia.org/r/129152 (owner: 10Hashar) [12:07:33] (03CR) 10Manybubbles: [C: 031] "Looks good to me." [operations/puppet] - 10https://gerrit.wikimedia.org/r/129105 (owner: 10Ori.livneh) [12:10:33] (03CR) 10Ori.livneh: [C: 032] mwgrep: account for db names with underscores [operations/puppet] - 10https://gerrit.wikimedia.org/r/129105 (owner: 10Ori.livneh) [12:15:39] (03CR) 10Dzahn: [C: 032 V: 032] "ignore my previous comments, this just confuses me every time at first (default vs. 
custom), lgtm" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/127898 (owner: 10Aklapper) [12:24:25] Reedy, manybubbles: please have a look at https://bugzilla.wikimedia.org/show_bug.cgi?id=64283 [12:24:31] I just am ;) [12:25:57] bleh [12:26:12] lsearchd's tendrils in places we don't like [12:26:15] or expect [12:26:29] $wgSearchType just needs resetting for beta... [12:27:19] $wgSearchType = 'cirrus'; ? [12:28:32] $wgSearchType = 'CirrusSearch'; [12:29:04] wmgUseCirrus [12:30:55] > var_dump( $wmgUseCirrus, $wgSearchType ); [12:30:55] bool(true) [12:30:55] string(12) "CirrusSearch" [12:30:57] wut [12:31:44] I wonder if the apaches are out of date [12:33:34] (03CR) 10Dzahn: [C: 04-1] "thank you! i just set -1 for now because of the comment " should probably be tested on an instance that is not live" by submitter" [operations/puppet] - 10https://gerrit.wikimedia.org/r/127254 (owner: 10JanZerebecki) [12:35:37] stupid fallbacks [12:36:05] $wgSearchTypeAlternatives = array( 'LuceneSearch' ); [12:38:03] (03PS1) 10Reedy: Override wgSearchTypeAlternatives for beta to remove lucene [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129161 [12:38:34] (03CR) 10Reedy: [C: 032] Override wgSearchTypeAlternatives for beta to remove lucene [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129161 (owner: 10Reedy) [12:38:41] (03Merged) 10jenkins-bot: Override wgSearchTypeAlternatives for beta to remove lucene [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129161 (owner: 10Reedy) [12:39:24] se4598: Try that [12:41:06] Reedy: ok, the page creation now works for me [12:44:14] Reedy: feel free to close one (both), maybe also move the fatals-one to the correct product. Meh, I wonder if I unnecessarily reopened the other bug and shouldn't do that in the future... [12:46:13] It's fine doing it like that [12:46:36] !log Jenkins: upgrading email-ext and JobConfigHistory plugins (the later now supports slaves configs!) 
[12:46:42] Logged the message, Master [12:46:56] !log restarting jenkins [12:47:03] Logged the message, Master [12:50:46] !log Jenkins back [12:50:53] Logged the message, Master [12:54:28] !log marc synchronized wmf-config/interwiki.cdb 'Updating interwiki cache' [12:54:35] Logged the message, Master [13:09:26] (03CR) 10Nullzero: [C: 031] "add some reviewers please. I desperately want this bug to be fixed!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129139 (owner: 10Yuvipanda) [13:19:01] Apr 22 19:35:16 cp3019 sshd[9093]: Found matching RSA key: a3:28:bf:a0:45:34:4e:12:53:4b:da:bd:d8:fc:1b:c6 [13:19:05] 2048 a3:28:bf:a0:45:34:4e:12:53:4b:da:bd:d8:fc:1b:c6 otto@klein.local (RSA) [13:19:11] ottomata: what did you do on cp3019? [13:19:24] see http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=cp3019.esams.wikimedia.org&m=mem_report&r=day&s=by%20name&hc=4&mc=2&st=1398258399&g=mem_report&z=large&c=Bits%20caches%20esams [13:19:34] something with varnishkafka I'm guessing? [13:20:00] hm, recently? [13:20:05] yesterday [13:20:15] yesterday I turned on varnishkafka on text varnishes [13:20:27] cp3019 is bits [13:20:30] you sshed [13:20:42] yeah was checking out some general memory usage [13:20:47] i don't think I started anything [13:21:10] checking... 
[13:22:03] that is approximately when I turned on vk for text [13:22:50] <_joe_> paravoid: it's been instantly the same on all the bits esams cluster http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Bits+caches+esams&m=cpu_report&s=by+name&mc=2&g=mem_report [13:26:22] the varnishkafka process has been running on cp3019 since March 17 [13:26:41] <_joe_> the same happened in eqiad AFAICT, and in ulsfo http://ganglia.wikimedia.org/latest/graph.php?r=day&z=large&c=Bits+caches+eqiad&m=cpu_report&s=descending&mc=2&g=mem_report [13:29:34] http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Bits+caches+esams&h=cp3019.esams.wikimedia.org&jr=&js=&v=76399&m=varnish.n_objecthead&vl=N&ti=N+struct+objecthead [13:30:01] a deploy, most probably [13:30:33] what is struct objecthead? [13:30:39] objects [13:31:46] <_joe_> btw, I can't confirm bits being slow. [13:31:58] me neither [13:32:03] could be a separate issue, e.g. a network issue [13:33:14] that timing does really line up with when I turned on text varnishkafka though [13:35:23] it lines up with 1.24wmf1 too though [13:48:41] ^demon|away: Ideas how long it might take for Cirrus indexes to build? [13:49:02] twkozlowski: manybubbles would now how long it takes for Cirrus indexes to build :] [13:49:05] I understand that those wikis that are still having their indexes built cannot use CirrusSearch yet? [13:49:31] twkozlowski and hashar: a few days [13:49:42] the price of resolving templates is pretty steep [13:50:02] lsearchd has a hacky wikitext processor in java thats a ton faster [13:50:10] but templates are nomy [13:51:07] (03PS1) 10Ottomata: Fixing metric names in kafkatee ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129172 [13:51:13] manybubbles: so that those new wikis (sv, ru, pl, ja, zh) cannot use CirrusSearch yet? 
[13:51:33] twkozlowski: well, _technically_ they should be able to access what is indexes via a magic url parameter [13:51:36] (03CR) 10Ottomata: [C: 032 V: 032] Fixing metric names in kafkatee ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129172 (owner: 10Ottomata) [13:52:15] !log unmounted /vol/originals and /vol/thumbs on fenari (was /mnt/upload7, /mnt/thumbs2) see RT #7076 [13:52:15] twkozlowski: srbackend=CirrusSearch [13:52:21] Logged the message, Master [13:52:34] manybubbles: Oh, I understand. Just trying to see if we should feature this news in this week's issue of Tech News or the next one. [13:52:46] <^d> Morning folks. [13:53:08] Hola ^d [13:53:13] twkozlowski: maybe "it is coming" but not, here is a special debuggin url parameter you can try [13:53:36] <^d> Or we could just wait til next week when we flip the beta switch :) [13:53:43] manybubbles: We close our issues around 19:00 UTC on Saturdays, so maybe it'll be done by then. [13:53:53] or not? :-) [13:54:55] twkozlowski: wait 'till we flip the beta switch next week [13:55:24] manybubbles: okay, we'll announce it in the future tense then. [13:55:41] manybubbles: Thank you for your work! [13:57:55] ^d: Thank you for your work as well! Really appreciate what you guys are doing. [13:58:14] <^d> You're welcome :) [14:03:17] !log adding AS path 1257 6830 (Tele2 -> UPC) to avoided paths @ cr1-esams/cr2-knams, multiple users reporting slowness issues [14:03:24] Logged the message, Master [14:15:12] (03PS1) 10Alexandros Kosiaris: Remove old PTRs for srv193, srv235-250.pmtpa.wmnet [operations/dns] - 10https://gerrit.wikimedia.org/r/129173 [14:16:34] (03CR) 10Alexandros Kosiaris: "apergos, adding you because of the in commit message mentioned change." 
[operations/dns] - 10https://gerrit.wikimedia.org/r/129173 (owner: 10Alexandros Kosiaris) [14:16:47] looking [14:17:21] good, yes [14:17:27] :-) [14:17:48] (03CR) 10ArielGlenn: [C: 031] Remove old PTRs for srv193, srv235-250.pmtpa.wmnet [operations/dns] - 10https://gerrit.wikimedia.org/r/129173 (owner: 10Alexandros Kosiaris) [14:17:58] (03CR) 10Alexandros Kosiaris: [C: 032] Remove old PTRs for srv193, srv235-250.pmtpa.wmnet [operations/dns] - 10https://gerrit.wikimedia.org/r/129173 (owner: 10Alexandros Kosiaris) [14:18:15] akosiaris: while you're here, there is no reason whatsoever we need stafford, right? [14:18:31] does that thing still exist ? [14:18:34] I mean, it was puppetmaster once, in its heyday but now it doesn't do anything new does it? that somehow I missed? [14:18:36] kill it with fire!!!! [14:18:43] goooooood [14:20:27] I had a good picture for occasions like this [14:20:37] http://i1.kym-cdn.com/photos/images/newsfeed/000/337/603/43f.gif [14:20:40] BURNINATE WITH FIRE [14:21:14] lol [14:21:39] :-D :-D [14:27:46] (03PS1) 10ArielGlenn: remove last vestiges of stafford, long since retired [operations/puppet] - 10https://gerrit.wikimedia.org/r/129179 [14:29:19] (03CR) 10ArielGlenn: [C: 032] remove last vestiges of stafford, long since retired [operations/puppet] - 10https://gerrit.wikimedia.org/r/129179 (owner: 10ArielGlenn) [14:30:19] !log break replication for volumes originals, thumbs on nas1001-a, nas1-a [14:30:26] Logged the message, Master [14:32:24] apergos: dataset icinga alerts? [14:32:50] dataset2? [14:34:24] and 1001 [14:35:42] Coren: https://gerrit.wikimedia.org/r/#/c/129139/ [14:36:46] will look as soon as this last decom is done [14:36:53] (03CR) 10coren: [C: 032] "Huh, that's elegantly simple." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129139 (owner: 10Yuvipanda) [14:40:18] !log unexported /vol/{originals,thumbs} on nas1001-a, nas1-a [14:40:25] Logged the message, Master [14:40:33] Coren: I am unsure if it was a dream or me just reading email when I was really sleepy but that was ori's solution either way :P [14:41:13] Coren: ok, not a dream https://bugzilla.wikimedia.org/show_bug.cgi?id=64274 [14:43:14] Indeed. That's kind of obnoxious in a way because that pushes the broken URI to the lighttpd backend, but it reacts with the default redirect so the net effect seems to be okay. [14:43:49] Coren: yeah. needs a longer term solution at some point tho [14:47:38] (03PS1) 10ArielGlenn: remove stafford dns entries... bye bye [operations/dns] - 10https://gerrit.wikimedia.org/r/129182 [14:48:38] (03CR) 10ArielGlenn: [C: 032] remove stafford dns entries... bye bye [operations/dns] - 10https://gerrit.wikimedia.org/r/129182 (owner: 10ArielGlenn) [14:58:22] (03PS1) 10ArielGlenn: remove dhcp for srv193, last of the srvs [operations/puppet] - 10https://gerrit.wikimedia.org/r/129183 [15:00:08] (03CR) 10ArielGlenn: [C: 032] remove dhcp for srv193, last of the srvs [operations/puppet] - 10https://gerrit.wikimedia.org/r/129183 (owner: 10ArielGlenn) [15:05:16] (03PS1) 10Ottomata: Fixing kafkatee ganglia view host regex for a few metrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/129185 [15:05:26] (03PS2) 10Ottomata: Fixing kafkatee ganglia view host regex for a few metrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/129185 [15:05:31] (03CR) 10Ottomata: [C: 032 V: 032] Fixing kafkatee ganglia view host regex for a few metrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/129185 (owner: 10Ottomata) [15:11:22] (03PS1) 10ArielGlenn: remove ips for: sq67-70, db6,40, es9, nfs2, srv193,235-250 [operations/dns] - 10https://gerrit.wikimedia.org/r/129188 [15:14:27] (03CR) 10ArielGlenn: [C: 032] remove ips for: sq67-70, db6,40, es9, nfs2, 
srv193,235-250 [operations/dns] - 10https://gerrit.wikimedia.org/r/129188 (owner: 10ArielGlenn) [15:14:43] mutante: I theoretically have access to the 'releases' webserver, but trying to ssh to releases via either bast or directly both gives me a permission denied [15:17:28] As do I... [15:18:30] don't ssh to releases, that hostname is on an lvs server... try caesium [15:18:35] ah! [15:18:37] Reedy and YuviPanda [15:18:45] yeah, trying now [15:18:54] haha [15:19:11] the motd should tell you it does releases [15:20:02] yup, it does [15:20:03] thanks apergos [15:20:07] yw [15:22:26] (03PS1) 10Giuseppe Lavagetto: Tools to compare compiled puppet catalogs. [operations/software] - 10https://gerrit.wikimedia.org/r/129189 [15:22:29] (03CR) 10jenkins-bot: [V: 04-1] Tools to compare compiled puppet catalogs. [operations/software] - 10https://gerrit.wikimedia.org/r/129189 (owner: 10Giuseppe Lavagetto) [15:22:45] ooohhh lookng forward to this [15:24:06] <_joe_> and of course the old hacky file downloaded from the internets is not pep8-compliant [15:24:11] <_joe_> big surprise. [15:24:34] (03CR) 10Rush: [C: 04-1] "beat to this, seems like good stuff to me overall. I haven't actually tested this out on a host, especially the logging changeup of handi" (039 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129075 (owner: 10Ori.livneh) [15:32:30] !log rebooting dataset2 hoping to detect the arrays on reboot [15:32:36] Logged the message, Master [15:32:51] any objections to me deploying https://gerrit.wikimedia.org/r/#/c/129169/ during the swat (now)? [15:32:53] anomie: ^^? [15:33:33] manybubbles: I don't see a problem with it [15:33:51] * manybubbles has the conch [15:34:25] PROBLEM - Host dataset2 is DOWN: PING CRITICAL - Packet loss = 100% [15:38:23] (03PS2) 10Giuseppe Lavagetto: Tools to compare compiled puppet catalogs. 
[operations/software] - 10https://gerrit.wikimedia.org/r/129189 [15:38:36] RECOVERY - Disk space on dataset2 is OK: DISK OK [15:38:45] RECOVERY - Host dataset2 is UP: PING OK - Packet loss = 0%, RTA = 35.36 ms [15:38:51] yeah it's lies though [15:46:30] !log temporarily disabling puppet on analytics1003 to test some kafkatee settings [15:46:35] Logged the message, Master [15:55:29] !log manybubbles synchronized php-1.24wmf1/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php 'swat update to fix maintenance script' [15:55:35] Logged the message, Master [15:56:14] * manybubbles puts down the conch [15:57:22] !log rebuilding commons' cirrus search index [15:57:25] PROBLEM - Host dataset2 is DOWN: PING CRITICAL - Packet loss = 100% [15:57:28] Logged the message, Master [15:58:46] !log updated adminbot on apt.wikimedia.org to 1.7.5 [15:58:53] Logged the message, Master [16:03:58] (03CR) 10BryanDavis: [C: 031] remove sudo::appserver from bastions [operations/puppet] - 10https://gerrit.wikimedia.org/r/126014 (owner: 10Dzahn) [16:28:26] (03PS1) 10BBlack: add icinga rights for myself [operations/puppet] - 10https://gerrit.wikimedia.org/r/129205 [16:29:04] (03CR) 10BBlack: [C: 032 V: 032] add icinga rights for myself [operations/puppet] - 10https://gerrit.wikimedia.org/r/129205 (owner: 10BBlack) [16:31:35] Why did you guys switch from Nagios to Icinga? [16:40:39] (03CR) 10coren: [C: 032] "Sane." [operations/puppet] - 10https://gerrit.wikimedia.org/r/126014 (owner: 10Dzahn) [16:40:45] (03PS5) 10coren: remove sudo::appserver from bastions [operations/puppet] - 10https://gerrit.wikimedia.org/r/126014 (owner: 10Dzahn) [16:42:13] (03CR) 10coren: [C: 032] "Still sane, even after a rebase." [operations/puppet] - 10https://gerrit.wikimedia.org/r/126014 (owner: 10Dzahn) [16:46:14] hey, i'd love to start contributing, is there an easy starter ticket I can help with? [16:46:24] Contributing to what? 
[16:47:05] pancakes9: Meaning what sort of things would you like to help with? [16:47:24] anything ops related [16:47:27] puppet? [16:52:43] (03PS1) 10coren: Tool Labs: remove conflicting package [operations/puppet] - 10https://gerrit.wikimedia.org/r/129206 [16:52:56] akosiaris: ^ [16:53:55] akosiaris: ping [16:54:06] Krinkle: pong [16:54:11] akosiaris: regarding jsduck [16:54:39] akosiaris: I'm fairly confident it's working, (using my own local install of jsduck 5 against local copy of mwcore). However I'd be happy to test the package you got [16:54:44] akosiaris: How to I install it on a labs node? [16:55:31] (03PS1) 10Ottomata: Fixing varnishkafka view host regex, removing erroneous client req [operations/puppet] - 10https://gerrit.wikimedia.org/r/129207 [16:55:51] Krinkle: wget http://apt.wikimedia.org/pending/ruby-jsduck_5.3.4-1wmf1_all.deb and dpkg -i ruby-jsduck_5.3.4-1wmf1_all.deb [16:56:23] then make sure no gem installed or other manually installed version of jsduck gets in the way and test :-) [16:56:27] (03CR) 10Ottomata: [C: 032 V: 032] Fixing varnishkafka view host regex, removing erroneous client req [operations/puppet] - 10https://gerrit.wikimedia.org/r/129207 (owner: 10Ottomata) [16:56:52] bd808: any helpful pointers? [16:56:53] once you give me an ok, I can upload on apt.wikimedia.org [16:57:20] (03CR) 10Alexandros Kosiaris: [C: 032] Tool Labs: remove conflicting package [operations/puppet] - 10https://gerrit.wikimedia.org/r/129206 (owner: 10coren) [16:57:33] akosiaris: How was it done? I heard there were some deps changes [16:57:38] testing now.. [16:58:23] pancakes9: I think there is still quite a bit of puppet cleanup help wanted. matanya is a volunteer who has been doing a lot. You might try pinging him for good places to start. [17:00:07] Krinkle: good thing that you asked, I have to upload a patch for the package. [17:00:12] bd808: are there other things apart from puppet that would be a good starter point? 
[17:01:28] akosiaris: I just finished logging into a labs instance, cloning mwcore and fetching the apt/deb url you gave. Should I hold on with dpkg? [17:02:00] pancakes9: I think there is some work starting soon on improving our monitoring services. I'm just a lowly developer though so having Ops folks give you pointers would probably be best. :) [17:02:24] Krinkle: nope. go on [17:06:15] bd808: ha okay, last question, i'm in the right channel right for what I'm looking for? [17:06:19] pancakes9: If you were interested in greenfield work, I know there are many wish list features for our Logstash setup -- https://bugzilla.wikimedia.org/buglist.cgi?component=Logstash&list_id=308885&product=Wikimedia&query_format=advanced&resolution=---&resolution=LATER&resolution=DUPLICATE [17:06:43] pancakes9: Yeah. This is the place that the Ops team hangs out. [17:08:15] bd808: Where the ops team hang out, have the odd work discussion and make it look like their doing work? :p [17:13:55] (03PS1) 10BBlack: add more icinga perms for myself [operations/puppet] - 10https://gerrit.wikimedia.org/r/129208 [17:14:47] (03CR) 10BBlack: [C: 032 V: 032] add more icinga perms for myself [operations/puppet] - 10https://gerrit.wikimedia.org/r/129208 (owner: 10BBlack) [17:26:47] bd808: next time point people at: https://wikitech.wikimedia.org/wiki/Get_involved :) [17:28:12] _joe|away: +1000 on puppet catalog diff tools [17:31:01] matanya: Thanks. I didn't know that page existed. [17:31:33] pancakes9: See https://wikitech.wikimedia.org/wiki/Get_involved as suggested by matanya [17:31:39] it is fairly new, was a google code-in task by qgil i think [17:31:42] pancakes9, re your earlier question about Icinga: Nagios has beeen overtaken by sales managers and is a FLOSS only by name, so we switched to its fork Icinga [17:32:00] already told him in pmbd808 [17:32:13] what is FLOSS? 
[17:32:24] free open source [17:32:33] software [17:32:49] and that is not 100% correct MaxSem [17:32:52] free/libre/open-source software [17:33:06] that too :) [17:33:34] mark: so does something make the disk persistence varnish plug in hard to maintain? [17:34:02] * AaronSchulz wishes https://www.varnish-cache.org/utility/secondary-hash-hash-ninja was open-core [17:34:43] so pancakes9 : git clone https://gerrit.wikimedia.org/r/p/operations/puppet would be a good start [17:42:22] <_joe|away> matanya: :) [17:45:26] bd808: i have looked at Bug 60690 - Add puppet logs to logstash, what you offer there seems nice, but i think we can do better [17:46:19] matanya: Ok. Patches welcome :) [17:46:38] since i have no access, i can't really help there, but i do have some ideas, that might be completely stupid [17:47:17] matanya: We could probably get you access to the deployment-prep (beta) instances to work on things [17:47:49] We have logstash and a local puppet master there so we can work on things like this outside of prod [17:48:05] I asked greg-g about it some time ago, he said there are some legal things involved [17:48:34] blerg. [17:48:42] (03PS1) 10Chad: Adding 7 new wikis to beta, mainly for search testing [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129210 [17:48:45] i know nothing about legal kind of stuff [17:49:07] I've got a stand alone logstash project too where we could rig up the necessary parts [17:49:29] bd808: is that in kabs? [17:49:32] *labs [17:49:37] There wouldn't be any lega issues there since that instance isn't getting any traffic at the moment [17:49:40] matanya: Yes. 
[17:50:01] you can add me there then [17:50:02] you can add me there then [17:50:07] laggg [17:50:22] (03CR) 10Chad: [C: 032] Adding 7 new wikis to beta, mainly for search testing [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129210 (owner: 10Chad) [17:50:30] (03Merged) 10jenkins-bot: Adding 7 new wikis to beta, mainly for search testing [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129210 (owner: 10Chad) [17:50:59] <^d> Again with the unmerged interwiki.cdb file. [17:51:10] !log demon updated /a/common to {{Gerrit|I960a792bc}}: Override wgSearchTypeAlternatives for beta to remove lucene [17:51:17] Logged the message, Master [17:51:22] (03PS1) 10Chad: Commit interwiki.cdb again [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129211 [17:51:30] (03CR) 10Chad: [C: 032 V: 032] Commit interwiki.cdb again [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129211 (owner: 10Chad) [17:52:08] !log demon synchronized wikiversions-labs.json 'no-op for prod, syncing for completeness' [17:52:13] Logged the message, Master [17:52:30] !log demon synchronized all-labs.dblist 'no-op for prod, syncing for completeness' [17:52:37] Logged the message, Master [17:52:40] matanya: {{done}} [17:52:47] thanks [17:54:09] bd808: any public ip/proxy for kibana UI? [17:54:15] (03Abandoned) 10Tim Landscheidt: Tools: Remove already defined ttf-dejavu-core from exec_environ [operations/puppet] - 10https://gerrit.wikimedia.org/r/127476 (owner: 10Tim Landscheidt) [17:54:39] matanya: There is a logstash-dev.eqiad.wmflabs host but I really haven't set it up yet. 
That project needs a puppet master and salt master setup so we can use the prod puppet config there [17:55:13] steal the one hashar built [17:55:20] I can spend some time today/tonight getting the basic parts up and running [17:55:50] Yeah It's just a matter of pushing buttons and following instructions [17:56:07] ok, cool [17:56:19] (03PS1) 10Chad: Fix $wmgBetaFeaturesWhitelist for labs too [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129213 [17:56:20] I got it all setup nicely in beta which trumped the need for the dedicated project until today :) [17:56:32] i'll try to find some moments to poke at that [17:56:36] (03CR) 10Chad: [C: 032 V: 032] Fix $wmgBetaFeaturesWhitelist for labs too [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129213 (owner: 10Chad) [17:57:21] <^d> Public service announcement for anyone with deploy rights: If you don't actually know what you're doing with InitialiseSettings, leave it for someone who does. [17:57:37] (03CR) 10Ori.livneh: miscellaneous improvements for diamond module (038 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129075 (owner: 10Ori.livneh) [17:59:12] !log demon synchronized wmf-config/InitialiseSettings-labs.php 'no-op in prod, for completeness' [17:59:19] Logged the message, Master [17:59:43] ^d, why do you want so badly that people didn't have any fun?:P [17:59:53] (03CR) 10BryanDavis: "I think Chad may have fixed this with I865a087." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/128963 (owner: 10Jforrester) [18:00:11] ^d: ^ [18:00:22] <^d> isset() won't help actually. [18:00:38] <^d> Should default to false and make that check just loosely check bool if(true) [18:01:11] <^d> Lemme amend. [18:01:24] I'd like to understand the het deploy config stuff, but right now I know it just confuses me. [18:01:43] bd808: It's a hilarious POS. 
[18:01:53] Mostly in the "what order do these things get loaded in" sense [18:02:17] bd808: grep for the number of 'default' => x statements with no heterogenous config that are in InitialiseSettings rather than CommonSettings, for instance. [18:02:20] (03PS2) 10Chad: Follow-up Ic04c7c8ad: Check if whitelist is set before assigning [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/128963 (owner: 10Jforrester) [18:02:48] (03CR) 10Chad: [C: 032] Follow-up Ic04c7c8ad: Check if whitelist is set before assigning [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/128963 (owner: 10Jforrester) [18:02:54] (03Merged) 10jenkins-bot: Follow-up Ic04c7c8ad: Check if whitelist is set before assigning [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/128963 (owner: 10Jforrester) [18:02:55] bd808: in alphasupercalifragilisticexpialidocious order [18:03:06] <^d> Ok, now let's see if we unbroke beta [18:03:36] ori: regarding my ping yesterday [18:03:52] i was looking for your input on akosiaris comment on https://gerrit.wikimedia.org/r/#/c/118966/ [18:06:01] my comment is "akosiaris knows more than i do, so in doubt, defer to whatever he said" [18:06:24] ^d: Thanks! [18:06:55] matanya: oh, you mean the '@' being wrong? yes, he's right [18:06:57] * ^d is waiting on jenkins [18:07:03] meaning: yes, it is indeed wrong [18:07:03] matanya: he is right [18:07:24] yes, he always right, afaik [18:07:36] we'll catch him make a mistake one day [18:07:40] and then... wham! [18:07:41] my question is what would be a suggested fix? [18:07:58] for his comment? 
just remove the @ [18:08:04] on proxy_address [18:08:05] that [18:08:46] so the same should be done to the other var [18:08:47] matanya: for more reference [18:08:49] https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Variables_and_Constants [18:09:02] in general, in puppet ERb templates [18:09:07] (03PS1) 10Krinkle: webperf: Add statsd module for mw-js-deprecate [operations/puppet] - 10https://gerrit.wikimedia.org/r/129214 [18:09:09] use @ if the variable came from a .pp file [18:09:15] if the variable is defined in the template itself [18:09:19] it is a local variable [18:09:24] and does not need @ [18:09:54] puppet makes vars in local scope be instance vars in the erb template [18:10:00] my ruby knowledge is almost none-existent so thanks! i'll read that one [18:12:46] (03PS5) 10Matanya: protoproxy: call enable_ipv6_proxy in a sane way [operations/puppet] - 10https://gerrit.wikimedia.org/r/118966 [18:17:37] (03PS1) 10Ori.livneh: Add eventlogging service alias [operations/dns] - 10https://gerrit.wikimedia.org/r/129220 [18:18:03] (03PS1) 10Ottomata: Including ganglia::plugin::python { 'diskstat': } on Kafka servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/129221 [18:23:52] (03PS2) 10Ori.livneh: miscellaneous improvements for diamond module [operations/puppet] - 10https://gerrit.wikimedia.org/r/129075 [18:26:25] ^ chasemp [18:26:36] (03PS2) 10Krinkle: webperf: Add statsd module for mw-js-deprecate [operations/puppet] - 10https://gerrit.wikimedia.org/r/129214 [18:29:03] (03PS2) 10Ottomata: Including ganglia::plugin::python { 'diskstat': } on Kafka servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/129221 [18:29:17] (03CR) 10Ottomata: [C: 032 V: 032] Including ganglia::plugin::python { 'diskstat': } on Kafka servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/129221 (owner: 10Ottomata) [18:31:33] (03PS3) 10Krinkle: webperf: Add statsd module for mw-js-deprecate [operations/puppet] - 10https://gerrit.wikimedia.org/r/129214 
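A rough sketch of the ERB rule discussed between 18:07 and 18:10 above (use `@` for variables that come from the .pp manifest, no `@` for locals defined inside the template itself). It uses only Ruby's standard `erb` library. `FakeScope`, the `192.0.2.10` address and the `port` local are made up for illustration and are not Puppet's actual classes or the contents of the change under review.

```ruby
require 'erb'

# Hypothetical stand-in for Puppet's template scope (not Puppet's real class):
# values from the manifest are exposed to the template as instance variables.
class FakeScope
  def initialize(proxy_address)
    @proxy_address = proxy_address
  end

  # Render an ERB source string with this object's binding, so the template
  # sees our instance variables as well as any locals it defines itself.
  def render(source)
    ERB.new(source).result(binding)
  end
end

scope = FakeScope.new('192.0.2.10') # documentation address, made up for the demo

# `port` is defined inside the template, so it is a plain local: no '@'.
good = scope.render("<% port = 443 %>listen <%= @proxy_address %>:<%= port %>;")

# Putting '@' on a template-local reads an unset instance variable (nil),
# which ERB renders as an empty string instead of raising an error.
bad = scope.render("<% port = 443 %>listen <%= @proxy_address %>:<%= @port %>;")

puts good
puts bad
```

The second render is the kind of mistake that is easy to miss in review: nothing fails, the value just silently disappears from the generated config.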
[18:31:34] RECOVERY - Host dataset2 is UP: PING OK - Packet loss = 0%, RTA = 36.61 ms [18:32:55] (03PS4) 10Krinkle: webperf: Add statsd module for mw-js-deprecate [operations/puppet] - 10https://gerrit.wikimedia.org/r/129214 [18:35:56] (03PS5) 10Ori.livneh: webperf: Add statsd module for mw-js-deprecate [operations/puppet] - 10https://gerrit.wikimedia.org/r/129214 (owner: 10Krinkle) [18:38:39] (03CR) 10Ori.livneh: [C: 032] webperf: Add statsd module for mw-js-deprecate [operations/puppet] - 10https://gerrit.wikimedia.org/r/129214 (owner: 10Krinkle) [18:48:19] Reedy: it looks like search on beta is disabled [18:48:22] http://en.wikipedia.beta.wmflabs.org/w/index.php?search=used+by+selenium&title=Special%3ASearch&go=Go [18:48:31] Wikipedia search is disabled for performance reasons. You can search via Google or Yahoo! in the meantime. [18:50:09] Cool [18:50:41] Cirrus is still loaded [18:50:47] I removed Lucene from $wgSearchTypeAlternatives earlier to fix the fatals [18:51:59] > var_dump( $wgSearchType, $wgSearchTypeAlternatives ); [18:51:59] string(12) "CirrusSearch" [18:51:59] array(0) { [18:51:59] } [18:53:05] PROBLEM - Host ps1-d1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [18:53:44] > var_dump( $wgDisableTextSearch ); [18:53:44] bool(true) [18:54:14] manybubbles: Ah [18:54:24] PROBLEM - Host ps1-d2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [18:54:24] PROBLEM - Host ps1-d3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [18:54:56] manybubbles: We need to set $wgDisableTextSearch = false; for Cirrus [18:55:06] Reedy: sounds good to me [18:55:11] I'm slightly confused why we set it to true in CommonSettings.php [18:55:18] "# This is overridden in the Lucene section below" [18:55:24] PROBLEM - Host ps1-c1-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [18:55:24] might be a relic of before lucenesearch [18:55:34] <^d> We should just remove that. [18:56:06] I presume wgDisableSearchUpdate wants to stay as true? 
[18:56:08] (03PS1) 10Ottomata: Using only Kafka log disks in diskstats monitoring, adding to kafka ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129231 [18:56:21] (03PS2) 10Ottomata: Using only Kafka log disks in diskstats monitoring, adding to kafka ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129231 [18:56:24] PROBLEM - Host ps1-c2-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [18:56:26] (03CR) 10jenkins-bot: [V: 04-1] Using only Kafka log disks in diskstats monitoring, adding to kafka ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129231 (owner: 10Ottomata) [18:56:48] Overridden in Cirrus config, not in Lucene [18:56:51] <^d> Reedy: Disable search update should be *false*, like core. [18:56:54] PROBLEM - Host ps1-c3-pmtpa is DOWN: PING CRITICAL - Packet loss = 100% [18:57:03] <^d> Just remove the CommonSettings/lucene-common overrides. [18:57:08] ok [18:57:48] (03PS3) 10Ottomata: Using only Kafka log disks in diskstats monitoring, adding to kafka ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129231 [18:57:59] !log Running deleteEqualMessages.php on amwiki (bug 43917) [18:58:06] Logged the message, Master [18:58:12] !log reedy updated /a/common to {{Gerrit|I865a08779}}: Fix $wmgBetaFeaturesWhitelist for labs too [18:58:17] (03PS1) 10Reedy: Remove $wgDisableTextSearch and $wgDisableSearchUpdate overrides. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129234 [18:58:18] Logged the message, Master [18:59:31] (03PS2) 10Reedy: Remove $wgDisableTextSearch and $wgDisableSearchUpdate overrides. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129234 [18:59:50] (03CR) 10Reedy: [C: 032] Remove $wgDisableTextSearch and $wgDisableSearchUpdate overrides. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129234 (owner: 10Reedy) [18:59:57] (03Merged) 10jenkins-bot: Remove $wgDisableTextSearch and $wgDisableSearchUpdate overrides. 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/129234 (owner: 10Reedy) [19:00:08] Silly old convoluted config [19:00:44] !log reedy synchronized wmf-config/ 'I543df75e364171a71a48f18429972b662b542894' [19:00:49] Logged the message, Master [19:00:55] (03CR) 10Ottomata: [C: 032 V: 032] Using only Kafka log disks in diskstats monitoring, adding to kafka ganglia view [operations/puppet] - 10https://gerrit.wikimedia.org/r/129231 (owner: 10Ottomata) [19:05:35] (03PS1) 10Ottomata: Oo, devices => array of disks did not work as advertised for diskstat plugin, commenting out [operations/puppet] - 10https://gerrit.wikimedia.org/r/129235 [19:05:59] (03CR) 10Ottomata: [C: 032 V: 032] Oo, devices => array of disks did not work as advertised for diskstat plugin, commenting out [operations/puppet] - 10https://gerrit.wikimedia.org/r/129235 (owner: 10Ottomata) [19:11:02] (03PS1) 10Mwalker: Customize the node runtime per OS [operations/puppet] - 10https://gerrit.wikimedia.org/r/129238 [19:12:13] (03PS2) 10Mwalker: Customize the node runtime per OS [operations/puppet] - 10https://gerrit.wikimedia.org/r/129238 [19:12:47] (03PS3) 10Mwalker: Customize the node runtime per OS [operations/puppet] - 10https://gerrit.wikimedia.org/r/129238 [19:12:52] (03PS1) 10Tim Landscheidt: Tools: Remove redundant package requirement [operations/puppet] - 10https://gerrit.wikimedia.org/r/129239 [19:14:14] RECOVERY - Host ps1-d1-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 36.88 ms [19:18:08] (03CR) 10Jgreen: [C: 031 V: 031] Customize the node runtime per OS [operations/puppet] - 10https://gerrit.wikimedia.org/r/129238 (owner: 10Mwalker) [19:18:16] (03CR) 10Jgreen: [C: 032] Customize the node runtime per OS [operations/puppet] - 10https://gerrit.wikimedia.org/r/129238 (owner: 10Mwalker) [19:21:38] paravoid: does varnish have collapsed-forwarding style logic for thumb_handler.php requests? 
[19:25:03] (03PS1) 10Mwalker: Puppet templates are 'content' [operations/puppet] - 10https://gerrit.wikimedia.org/r/129241 [19:26:51] meh, that would still be per URL and not file, so thus of limited use [19:27:05] and would be disabled for views with auth cookies [19:28:32] (03CR) 10Jgreen: [C: 032 V: 031] Puppet templates are 'content' [operations/puppet] - 10https://gerrit.wikimedia.org/r/129241 (owner: 10Mwalker) [19:31:34] RECOVERY - Host ps1-d2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 49.07 ms [19:34:07] !log Running deleteEqualMessages.php on dawiki (bug 43917) [19:34:13] Logged the message, Master [19:37:40] !log Running deleteEqualMessages.php on hrwiki (bug 43917) [19:37:40] !log Running deleteEqualMessages.php on hrwiktionary (bug 43917) [19:37:45] Logged the message, Master [19:37:52] Logged the message, Master [19:42:14] RECOVERY - Host ps1-d3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 37.37 ms [19:48:45] RECOVERY - Host ps1-c1-pmtpa is UP: PING WARNING - Packet loss = 80%, RTA = 45.11 ms [19:50:54] !log Running deleteEqualMessages.php on iawiki (bug 43917) [19:50:54] !log Running deleteEqualMessages.php on afwiki (bug 43917) [19:50:54] !log Running deleteEqualMessages.php on brwiki (bug 43917) [19:51:00] Logged the message, Master [19:51:07] Logged the message, Master [19:51:14] Logged the message, Master [19:59:50] (03PS1) 10Hashar: contint: split Zuul server and merger (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129292 [20:00:01] (03CR) 10Hashar: [C: 04-1] contint: split Zuul server and merger (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129292 (owner: 10Hashar) [20:00:14] RECOVERY - Host ps1-c2-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 41.46 ms [20:04:51] !log deployed Parsoid 9c99b0be (deploy SHA cf5eb4d0) [20:04:58] Logged the message, Master [20:08:54] RECOVERY - Host ps1-c3-pmtpa is UP: PING OK - Packet loss = 0%, RTA = 39.24 ms [20:11:19] !log Running deleteEqualMessages.php 
on euwiktionary (bug 43917) [20:11:19] !log Running deleteEqualMessages.php on gvwiki (bug 43917) [20:11:25] Logged the message, Master [20:11:31] Logged the message, Master [20:14:11] !log Running deleteEqualMessages.php on miwiki (bug 43917) [20:14:11] !log Running deleteEqualMessages.php on mlwiki (bug 43917) [20:14:16] Logged the message, Master [20:14:23] Logged the message, Master [20:14:50] (03PS2) 10Hashar: contint: split Zuul server and merger (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129292 [20:16:54] (03PS3) 10Hashar: contint: split Zuul server and merger (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129292 [20:18:57] hashar: have you seen: https://gerrit.wikimedia.org/r/#/c/129189/ ? [20:24:49] (03PS4) 10Hashar: contint: split Zuul server and merger (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129292 [20:26:05] (03CR) 10Matanya: contint: split Zuul server and merger (DO NOT SUBMIT) (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129292 (owner: 10Hashar) [20:27:45] matanya: dont waste your time with the Zuul change :] [20:27:59] ok :) [20:28:08] matanya: it is merely a draft. will surely abandon it and repurpose it when it is done [20:28:17] matanya: thanks for the review though! [20:28:26] I have copy pasted those bits from OpenStack :] [20:28:29] Utopic Unicorn meh [20:28:57] sure, always at your service sir [20:58:37] ^d, I don't know if I need to do this yet; but is there a way to delete a gerrit commit? I just submitted a patchset of 3.2M lines of node.js dependencies (hopefully most of its cruft I can get rid of) [21:00:01] <^d> mwalker: Not easily. Lots of manual wrangling. [21:01:18] hokay; is it a problem that there is a patch this big sitting around? 
[21:01:38] outside of, if I actually cant get the size down, it's going to bloat the heck out of my repo [21:03:26] <^d> mwalker: Well yes, it'll bloat the repo if it can't be reasonably compressed. [21:04:04] <^d> If nothing references it then gc should probably prune the object. [21:04:10] <^d> But still, it's a pain to get there. [21:06:10] *cough* I just tarred and gzipped the directory... 107MB [21:06:54] it would be better not to dump large things into a git repo, if you can avoid it some other way [21:07:36] we deploy node services using git deploy; our current method of building the dependencies is to commit them to git [21:07:52] *build them on some other host than the host they'll end up on; and commit them to git [21:08:02] this is making me rethink that strategy [21:08:35] cd lib [21:09:50] mwalker: There is new magic in trebuchet for managing binaries during a deploy. ottomatta and/or manybubbles might be able to help you figure out if it would be good for your use case [21:10:11] bd808: is based on git-fat [21:10:42] mwalker: you use git-fat to commit a hash and a pointer to a place to fetch the file on the cluster then you git-deploy it [21:10:52] and the nodes use git-fat to pick it up [21:11:07] how do you get your binaries to the cluster in the first place? [21:12:02] if your node.js deps really are remote js projects hosted on e.g. 
github, you could also use git submodules to reference a specific tag in a remote repo for each one
[21:13:07] ya; npm doesn't support doing that; but I've been tempted in the past to make it support it
[21:13:19] still would have to call out to github on the cluster though and that's not allowed
[21:16:49] seems like the kind of situation that would be better served with a build process that generates a tarball or deb pkg
[21:17:14] where the real git repo is a small source repo with build scripts that know how to fetch deps + build the final package, and the final package is what gets pushed and installed
[21:17:38] how come we trust Debian packages, then?:P
[21:17:52] what do you mean by trust?
[21:18:30] we use them without committing to our VCS...
[21:19:28] <^d> You can't use github repos as submodules with our deploy setup.
[21:19:33] <^d> tin can't read github ;-)
[21:19:33] I assume what bothers you there is the lack of review process as opposed to the vcs itself?
[21:20:36] if we trust packages.ubuntu.com, why can't we do the same with npmjs.org?
[21:20:55] even though through our own proxy
[21:21:06] oh, I don't know the answer to that
[21:22:36] if npmjs.org has prebuilt packages that obviate all of the above mess, it would seem silly that we don't find a way to (review+?)use them
[21:23:12] the only problem is that these packages are not .deb but tarballs with .js files
[21:24:43] well if the package format is the only issue (and it may not be!), there's: https://www.npmjs.org/package/npm2debian
[21:24:46] MaxSem: welcome to the Ruby vs. Debian debate, which has been going on for years
[21:25:17] yeah Perl has similar issues if you aren't lucky enough for all your deps to already be in the distro's repo as packages
[21:27:08] yeah, so Ruby shops trust whatever the authoritative source of gems is - us not doing so with node modules is a matter of us having experience with debs but less of it with other things.
but as we're drifting towards SOA we'll have to change our habits
[21:28:14] keeping blobs of third-party code in version control is not going to work well as their number grows
[21:29:18] MaxSem: I have a friend who worked at Groupon, they reviewed, vetted, and packaged the gems they needed in house.
[21:29:49] I don't actually think that's a bad idea; some of the nodejs stuff is scary terrible
[21:30:37] however having dependencies in packages means that developers who currently update node services will not be able to do it as this requires root
[21:38:39] (PS1) Aaron Schulz: Set FileRender pool counter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129306
[21:38:49] (CR) jenkins-bot: [V: -1] Set FileRender pool counter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129306 (owner: Aaron Schulz)
[21:55:40] !log moved cp301[34] ethernet ports to private1-esams
[21:55:46] Logged the message, Master
[22:20:01] (PS1) BBlack: add ipv6 reverse for lvs300[1234] [operations/dns] - https://gerrit.wikimedia.org/r/129312
[22:20:03] (PS1) BBlack: move cp301[34] addrs to private1-esams [operations/dns] - https://gerrit.wikimedia.org/r/129313
[22:20:34] (CR) BBlack: [C: 2 V: 2] add ipv6 reverse for lvs300[1234] [operations/dns] - https://gerrit.wikimedia.org/r/129312 (owner: BBlack)
[22:21:33] (CR) BBlack: [C: 2 V: 2] move cp301[34] addrs to private1-esams [operations/dns] - https://gerrit.wikimedia.org/r/129313 (owner: BBlack)
[22:29:15] (PS1) BBlack: move cp301[34] to private1-esams [operations/puppet] - https://gerrit.wikimedia.org/r/129317
[22:32:21] (CR) BBlack: [C: 2 V: 2] move cp301[34] to private1-esams [operations/puppet] - https://gerrit.wikimedia.org/r/129317 (owner: BBlack)
[22:47:34] <^d> RobH: You about?
[23:03:46] is anyone doing swat? if not, i can take it
[23:06:33] (PS1) Andrew Bogott: Disable Andre Engels shell accounts.
[operations/puppet] - https://gerrit.wikimedia.org/r/129321
[23:06:35] (PS2) Aaron Schulz: Set FileRender pool counter config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129306
[23:07:28] (PS2) Andrew Bogott: Disable Andre Engels shell accounts. [operations/puppet] - https://gerrit.wikimedia.org/r/129321
[23:10:36] the ubuntu installer is so horrible. long random delays with no information as to what might be timing out or going wrong, messed up console output that's hardly readable, etc.
[23:14:35] SWAT time!
[23:14:59] All that's in the calendar is a change of my own, if anyone else wants to make last-minute additions they have 5-10 minutes :)
[23:15:30] RoanKattouw: oh, hey, you just added it :P
[23:15:38] Yeah, retroactive addition, sorry
[23:15:49] no problem at all, just making sure i'm not crazy
[23:15:53] I had meetings all day and so at 3:58pm James was like "hey, do we need to cherry-pick something?"
[23:15:57] RoanKattouw: hmm
[23:16:19] RoanKattouw: https://gerrit.wikimedia.org/r/#/c/127472/ + https://gerrit.wikimedia.org/r/#/c/127473/ ?
[23:16:36] Nemo_bis: ^
[23:16:49] i've actually gotta go now, so feel free to ignore this; sorry
[23:16:53] I just heard about another VE change too :)
[23:17:26] MatmaRex: If you schedule that for tomorrow you can even abandon the wmf22 one I suppose
[23:18:49] MatmaRex: Oh that change looks OK, I'll pick it up for today's SWAT
[23:19:18] thanks
[23:20:17] MatmaRex: I suppose the extra span is to thwart the CSS selector that applies the ellipsis?
[23:21:00] yeah, it's so that i have something i can measure the real width of
[23:21:31] massive ugly mess, i hate the search suggestions modules
[23:22:54] PROBLEM - Host lvs3001 is DOWN: PING CRITICAL - Packet loss = 100%
[23:23:40] ^ nothing to worry about
[23:26:14] RECOVERY - Host lvs3001 is UP: PING OK - Packet loss = 0%, RTA = 95.64 ms
[23:26:50] MatmaRex: Rightly so
[23:27:09] Maybe we can replace them with the search widgets from oojs-ui at some point
[23:27:25] When crap like https://bugzilla.wikimedia.org/64334 has stopped happening :)
[23:27:46] :)
[23:37:10] !log catrope synchronized php-1.23wmf22/resources/src/jquery/jquery.suggestions.js 'Handle CSS ellipsis when calculating suggestions widths'
[23:37:18] Logged the message, Master
[23:38:49] !log catrope synchronized php-1.24wmf1/resources/src/jquery/jquery.suggestions.js 'Handle CSS ellipsis when calculating suggestions widths'
[23:38:55] Logged the message, Master
[23:39:54] !log catrope synchronized php-1.24wmf1/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js 'Unbreak badtoken recovery in mobile VE'
[23:40:00] Logged the message, Master
[23:40:35] !log catrope synchronized php-1.24wmf1/extensions/VisualEditor/lib/ve/modules/ve/ui/widgets/ve.ui.SurfaceWidget.js 'Fix surface focusing bug in Firefox'
[23:40:40] Logged the message, Master
[23:43:11] (PS1) BBlack: update-initramfs for bnx2x early boot options [operations/puppet] - https://gerrit.wikimedia.org/r/129329
[23:50:52] (CR) BBlack: [C: 2 V: 2] update-initramfs for bnx2x early boot options [operations/puppet] - https://gerrit.wikimedia.org/r/129329 (owner: BBlack)