[00:00:32] <^d> Any roots around who mind doing a few minutes of spelunking for me? [00:00:53] <^d> (It's easy, I promise) [00:01:06] (03PS2) 10Ori.livneh: puppet-merge: warn if multiple committers [operations/puppet] - 10https://gerrit.wikimedia.org/r/110104 [00:01:18] ^d: possibly -- what do you need? [00:01:26] <^d> I'm trying to move the purge-checkuser cron from hume to terbium. I'm not sure how often and when it currently runs on hume. [00:02:28] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 6.586 second response time [00:02:55] hume:/etc/cron.d/mw-purge-checkuser doesn't exist [00:03:44] <^d> Grrr :\ [00:06:28] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 310587 bytes in 7.431 second response time [00:06:35] (03PS2) 10Danny B.: skwiki: Configure transwiki import sources. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/109723 [00:06:40] (03CR) 10Reedy: [C: 032] skwiki: Configure transwiki import sources. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/109723 (owner: 10Danny B.) [00:06:47] (03Merged) 10jenkins-bot: skwiki: Configure transwiki import sources. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/109723 (owner: 10Danny B.) 
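[Editor's note] A puppetized version of the cron ^d is trying to move would look roughly like the sketch below. Everything in it is an assumption — user, schedule, and command are unknown precisely because hume:/etc/cron.d/mw-purge-checkuser doesn't exist — so treat this as a shape, not the contents of change 74591:

```puppet
# Hypothetical sketch only: user, schedule, and command are guesses,
# not values recovered from hume (the crontab there is missing).
cron { 'mw-purge-checkuser':
    ensure  => present,
    user    => 'apache',
    minute  => 0,
    hour    => 3,
    command => '/usr/local/bin/mwscript extensions/CheckUser/maintenance/purgeOldData.php --wiki=aawiki',
}
```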
[00:09:55] (03PS5) 10Odder: Enable per-wiki addition to 'translationadmin' group [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/109689 [00:10:27] (03Abandoned) 10Chad: WIP: Move purge-checkuser script off hume and to terbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/108165 (owner: 10Chad) [00:16:44] (03CR) 10Reedy: [C: 032] Enable per-wiki addition to 'translationadmin' group [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/109689 (owner: 10Odder) [00:16:51] (03Merged) 10jenkins-bot: Enable per-wiki addition to 'translationadmin' group [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/109689 (owner: 10Odder) [00:17:30] (03PS6) 10Chad: Properly puppeti[sz]e purge-checkuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/74591 (owner: 10Reedy) [00:18:05] (03CR) 10jenkins-bot: [V: 04-1] Properly puppeti[sz]e purge-checkuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/74591 (owner: 10Reedy) [00:18:38] (03PS11) 10Dereckson: Throttle now handles IP ranges. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/65644 [00:18:43] (03CR) 10Reedy: [C: 032] Throttle now handles IP ranges. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/65644 (owner: 10Dereckson) [00:19:24] (03Merged) 10jenkins-bot: Throttle now handles IP ranges. 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/65644 (owner: 10Dereckson) [00:19:47] (03PS7) 10Chad: Properly puppeti[sz]e purge-checkuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/74591 (owner: 10Reedy) [00:20:11] !log reedy synchronized wmf-config/ [00:20:19] Logged the message, Master [00:20:34] (03PS1) 10Ori.livneh: gdash: fix capitalization of dashboard name [operations/puppet] - 10https://gerrit.wikimedia.org/r/110109 [00:20:56] (03CR) 10Ori.livneh: [C: 032 V: 032] gdash: fix capitalization of dashboard name [operations/puppet] - 10https://gerrit.wikimedia.org/r/110109 (owner: 10Ori.livneh) [00:22:28] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:22:56] <^d> Well crap. [00:23:02] You stole my commit [00:23:03] ! [00:23:13] <^d> Reedy: Yes I did :) [00:24:23] :( [00:24:53] what's going on with gitblit? [00:25:20] ori: Hump Day Strike [00:26:28] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 321746 bytes in 7.279 second response time [00:28:54] !log gitblit on antimony crashed with org.eclipse.jetty.io.EofException. trace: . lots of java.lang.NullPointerException due to malformed URLs, but these appear to happen continuously. [00:29:02] Logged the message, Master [00:29:20] <^d> Yeah I saw that recently. [00:29:40] <^d> Something's not encoding the /s in repo names to %2Fs [00:32:02] Is somebody looking at "PHP Warning: dba_fetch() expects parameter 2 to be resource, boolean given in /usr/local/apache/common-local/wmf-config/missing.php on line 76" [00:32:18] lol [00:32:19] No [00:32:22] That'll have been me [00:32:26] "me" [00:33:28] damn it [00:34:15] !log reedy synchronized wmf-config/missing.php [00:34:22] Logged the message, Master [00:34:31] !log another recurrent error in antimony:/var/log/upstart/gitblit.log : "org.eclipse.jgit.api.errors.JGitInternalException: Garbage collection failed." repeats for each repository. 
traces: [00:34:39] Logged the message, Master [00:34:39] !log reedy updated /a/common to {{Gerrit|I2294bac73}}: Throttle now handles IP ranges. [00:34:42] (03PS1) 10Reedy: Move function_exists( 'dba_open' ) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110111 [00:34:47] Logged the message, Master [00:35:05] (03CR) 10Reedy: [C: 032] Move function_exists( 'dba_open' ) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110111 (owner: 10Reedy) [00:35:11] (03Merged) 10jenkins-bot: Move function_exists( 'dba_open' ) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110111 (owner: 10Reedy) [00:40:58] ori: That org.eclipse.jetty.io.EofException is probably just log noise. It means that the user-agent closed the socket before the response was sent (i.e. the client timed out waiting for the resource) [00:43:05] <^d> ori: I have jgit gc turned on for the gitblit replicas. [00:43:13] <^d> I wonder if we should gc using normal c git instead. [00:43:18] <^d> Might be...less painful [00:48:50] bd808: *shrug* I'm just logging it so ^d / opsen are aware; I can't troubleshoot it further atm. [01:42:13] * Eloquence waves to chasemp2  [01:51:11] cmjohnson1: hi [01:51:19] cmjohnson1: are you on-site perhaps? [02:10:25] (03PS8) 10Yurik: Handle HTTPS for Zero traffic [operations/puppet] - 10https://gerrit.wikimedia.org/r/102316 [02:16:57] !log LocalisationUpdate completed (1.23wmf11) at 2014-01-29 02:16:57+00:00 [02:17:07] Logged the message, Master [02:31:30] !log LocalisationUpdate completed (1.23wmf10) at 2014-01-29 02:31:30+00:00 [02:31:37] Logged the message, Master [02:49:16] ottomata: I think it should work with yajl1 aswell. 
[02:58:21] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-01-29 02:58:21+00:00 [02:58:29] Logged the message, Master [03:04:45] (03PS1) 10Andrew Bogott: Allow caller to specify Pin-Priority in apt::repository [operations/puppet] - 10https://gerrit.wikimedia.org/r/110124 [03:08:21] 04:56 < Betacommand> save the .log as a txt file [03:08:21] 04:57 < Betacommand> open AWB [03:08:24] er [03:08:31] middle-click, sorry [03:08:38] paravoid: what? [03:08:42] sorry, wrong paste [03:33:26] (03PS1) 10Andrew Bogott: Pin the ubuntu-cloud repo so that dependencies work [operations/puppet] - 10https://gerrit.wikimedia.org/r/110126 [03:33:49] Coren, puppet+havana is cheered up by ^ and ^^. Thanks for diagnosing. [03:58:07] Does anyone feel like helping me debug a strange puppet logic thing? [03:58:16] re: https://dpaste.de/POHr [04:15:59] andrewbogott: What's the strangeness? [04:16:13] scfc_de: The 'unless' clause doesn't work [04:16:19] It always does it no matter what [04:16:27] When I run that 'unless' line on the cmdline it works properly [04:16:51] Um, wait, I pasted the wrong thing. One moment... [04:17:01] Have you tried replacing it with /bin/true? [04:17:18] OK, this is the good bit [04:17:19] https://dpaste.de/p7Vf [04:17:38] I will try [04:18:25] So… 'unless true' means that it should not run [04:18:37] In my understanding, yes. [04:18:40] Somehow the meaning of 'unless' keeps flipping in my head. been looking at this too long :) [04:19:30] I'm not sure if you could use "onlyif", or if there are additional differences to "!unless" :-). [04:20:03] I tried using onlyif, the behavior then was that the exec never ran. [04:20:15] :-) [04:20:22] So I think puppet understands what unless and onlyif means, but the actual shell command is not evaluating as I'd expect. [04:20:30] I can give you a login if you're interested in tinkering with the cmdline [04:20:54] Possibly ${glance_db_name} is empty in the 'unless' clause? 
That would explain the behavior I see [04:21:08] But, I can see that it is set properly in the exec command that immediately follows [04:21:21] andrewbogott: if you're just looking to avoid this without tinkering with unless, note there is CREATE DATABASE IF NOT EXISTS [04:21:59] The advantage of "unless" is that otherwise Puppet will log the command on every run. [04:22:22] fair enough [04:22:25] Also I would like to understand... [04:22:46] Although, the thirst for wisdom often leads to damnation [04:23:12] And if you replace the unless command with "echo ${glance_db_name} > /var/tmp/log.txt"? [04:23:27] There [04:23:47] scfc_de: That's a good one! Will try next... [04:23:53] is also an option to show what commands Puppet is actually executing, but I forgot it. [04:24:06] (It takes five mins or so to run each test) [04:24:50] scfc_de: more than just -v [04:24:51] ? [04:25:15] Don't remember, sorry. [04:27:31] ok, unless => '/bin/true' does not create the database. So that is encouraging [04:35:49] andrewbogott: what would the $HOME and $PATH be for the command when run by puppet? [04:36:04] springle: I'm not sure -- do they matter? [04:36:12] Shouldn't everything that matters be in my.cnf? [04:36:22] -uroot would only detect the /root/.my.cnf if $HOME was set properly [04:37:20] Ah, which contains the password… hm [04:40:03] although if unless => /bin/true caused the command to run, which also uses -uroot, probably not the problem :) nm me [04:43:00] why does /bin/true use -uroot? [04:43:32] And the command is failing, could be for lack of password rather than (as I assumed) due to the presence of the db [04:43:40] although then how would the db exist in the first place? Hm... [04:44:42] springle, https://dpaste.de/vRHg [04:44:45] Here is what I think: [04:45:05] - when puppet runs due to the periodic puppet cron, $home is correct, so db is created properly. 
[04:45:28] - when I run puppet via puppetd -tv, $home is wrong (it's mine) so the 'unless' clause fails [04:45:41] So… this is bad behavior is only visible when I look for it. [04:45:44] Sound credible? [04:46:07] sounds possible [04:46:23] i like puppet puzzles. is this one complicated to explain, or is there a code snippet to look at it? [04:46:26] *to look at [04:46:37] * ori is just joining in, my kibbitzer sense tingled. [04:46:50] ori, the original snip is https://dpaste.de/p7Vf [04:47:07] if you use -uroot as 'andrew', you would need to have /home/andrew/.my.cnf with root credentials [04:47:14] And the problem is -- when I do puppetd -tv, puppet tries to create the db every single time, whether or not it really exists. [04:48:10] so, let's see… is there sudo syntax that says 'use root env instead of mine'? [04:48:24] other than sudo su - etc. [04:50:37] you can set the HOME environment variable for that exec [04:50:53] yeah, but... [04:51:21] now I'm thinking that my standard debug process ($ sudo puppetd -tv) is inherently flawed. If it differs in behavior from the cron puppet run [04:51:26] then I must reform my process! [04:51:32] * andrewbogott is testing to make sure this is the case [04:51:48] just use an explicit --defaults-file=/root/.my.cnf each time? assuming it's readable by both the final puppet run and your test env... [04:52:14] sorry, who would I be passing --defaults-file to? mysql? [04:52:39] yes [04:52:48] that's better than messing with shell environment [04:52:50] /usr/bin/mysql -uroot --defaults-file=/root/.my.cnf [04:53:10] it's not --defaults-extra-file ? [04:53:15] hmm [04:53:22] oh no, it's --defaults-file [04:53:30] Ok, I've verified that if I sudo su - first then all is well. [04:53:38] no, don't [04:53:43] springle's solution is way less hacky [04:54:13] But… now I'm thinking this isn't actually a problem at all. [04:54:18] Since 'real' puppet runs as root anyway. 
[04:54:24] It's only my test process that scrambles the env [04:54:30] it's still ugly. [04:54:48] you were confused by it, right? which means that its behavior is hard to reason about, right? so make it explicit. [04:54:56] Patching the environment for every single exec in our codebase… also ugly? [04:55:12] sudo relies on the shell environment being just so; --defaults-extra-file reads the global configuration file first, which is another environment factor. --defaults-file means the exact behavior is fully specified in the command line; there are no hidden forces. [04:55:40] I guess I can fix the bits I'm looking at without having to modify the whole codebase. Halfmeasure better than no measure at all [04:57:21] full measure better than half measure :P [04:58:36] no more half measures, ori [04:58:52] go big or go home! [04:59:05] * ori readies the nukes. [05:03:48] hm, any chance --defaults-file isn't supported by my version of mysql? [05:05:37] andrewbogott: You're right. [05:05:40] ah, nm, it just doesn't like it after -uroot [05:06:09] You have to use "--defaults-file=/root/something" -- the "=" is important. [05:06:40] MySQL has some very strange manners. [05:09:02] Hrm. [05:09:09] Bye, Ken. [05:15:19] (03PS1) 10Andrew Bogott: Pass in explicit --defaults-file=/root/.my.cnf to db creation calls. [operations/puppet] - 10https://gerrit.wikimedia.org/r/110128 [05:16:30] ori, ^ makes my puppet runs all quiet and happy [05:17:55] (03CR) 10Ori.livneh: [C: 032] "I'll let you merge." 
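[Editor's note] To make the thread above easier to follow: the dpaste snippets have since expired, so the first resource below is a reconstruction of what the exec presumably looked like, and the second is the shape of change 110128. Resource and variable names are assumptions. The key points from the discussion: `mysql -uroot` only finds /root/.my.cnf when $HOME resolves to /root, --defaults-file must precede other options, and the `=` form is required.

```puppet
# Before (reconstructed, not the actual paste): the unless command
# depends on $HOME pointing at /root so the credentials are found.
# Under `sudo puppetd -tv` it doesn't, so the exec fires every run.
exec { "create-db-${glance_db_name}":
    command => "/usr/bin/mysql -uroot -e \"create database ${glance_db_name};\"",
    unless  => "/usr/bin/mysql -uroot ${glance_db_name} -e 'select 1;'",
}

# After (shape of change 110128): credentials are fully specified on
# the command line, so the shell environment no longer matters.
exec { "create-db-${glance_db_name}":
    command => "/usr/bin/mysql --defaults-file=/root/.my.cnf -e \"create database ${glance_db_name};\"",
    unless  => "/usr/bin/mysql --defaults-file=/root/.my.cnf ${glance_db_name} -e 'select 1;'",
}
```

(The two resources share a title only for side-by-side comparison; a real manifest would contain one or the other.)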
[operations/puppet] - 10https://gerrit.wikimedia.org/r/110128 (owner: 10Andrew Bogott) [05:18:27] to avoid this sort of repetition i create a 'sql' resource for the mysql module in mediawiki-vagrant [05:18:53] so you can write, e.g.: mysql::sql { 'add user': sql => "create user 'monty'@'localhost'", unless => "select 1 from mysql.user where user = 'monty'", } [05:18:59] https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/modules/mysql/manifests/sql.pp [05:21:12] yeah, that's much easier to read [05:24:21] ori (and departed scfc_de and springle-afk) thanks for help w/sorting that [05:39:19] (03PS2) 10Andrew Bogott: Allow caller to specify Pin-Priority in apt::repository [operations/puppet] - 10https://gerrit.wikimedia.org/r/110124 [05:39:21] (03PS2) 10Andrew Bogott: Pin the ubuntu-cloud repo so that dependencies work [operations/puppet] - 10https://gerrit.wikimedia.org/r/110126 [05:39:23] (03PS1) 10Andrew Bogott: Openstack Havana in eqiad, baby step: [operations/puppet] - 10https://gerrit.wikimedia.org/r/110130 [05:43:03] (03CR) 10Andrew Bogott: [C: 032] Allow caller to specify Pin-Priority in apt::repository [operations/puppet] - 10https://gerrit.wikimedia.org/r/110124 (owner: 10Andrew Bogott) [05:43:11] why do you need to pin? [05:43:36] because of https://gerrit.wikimedia.org/r/#/c/110126/ [05:43:49] but why pinning them? [05:43:51] Um… which, otherwise apt refuses to install the needed dependencies from the ubuntu cloud archive [05:44:00] why? [05:44:09] the regular apt repo should be in the same priority as the ubuntu-cloud one [05:44:11] Because the standard brewster repo is already pinned [05:44:39] right, so, it's ours (apt.wikimedia.org) > ubuntu precise, and ubuntu precise == ubuntu-cloud [05:44:52] yes... [05:45:01] which packages do we have in our repo that conflict? [05:45:43] there were several I believe… if I give me a few minutes I can reproduce the problem, maybe there's a better solution. 
[05:45:52] um… if you give me :) [05:46:18] sure [05:46:20] I wonder why [05:46:36] maybe they were imported for the existing openstack cluster instead of using ubuntu-cloud? [05:46:56] pinning is ok too, but it might bite you in the future [05:47:09] if you need to override e.g. a single package from ubuntu-cloud [05:47:17] Yeah, might be cruft from before the ubuntu cloud existed [05:54:06] (03CR) 10Andrew Bogott: [C: 04-1] Openstack Havana in eqiad, baby step: (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/110130 (owner: 10Andrew Bogott) [06:28:24] (03PS1) 10Ori.livneh: graphite: fix storage aggregation patterns [operations/puppet] - 10https://gerrit.wikimedia.org/r/110133 [06:28:51] (03CR) 10Ori.livneh: [C: 032 V: 032] graphite: fix storage aggregation patterns [operations/puppet] - 10https://gerrit.wikimedia.org/r/110133 (owner: 10Ori.livneh) [06:29:36] andrewbogott: should i merge your change? [06:30:01] did I forget to run puppet-merge? If so then go ahead and merge... [06:30:09] * ori does [06:30:09] If you're talking about the patch in gerrit then, not yet please [06:30:18] no, the former [06:32:33] thx [06:37:35] ok… paravoid, do you want to log in and tinker with apt yourself? on labs host puppet-testing-6 you can see the pinning problem by doing apt-get install nova-api [06:37:43] 'nova-api : Depends: nova-common (= 1:2013.2-0ubuntu1~cloud0) but it is not going to be installed' [06:39:27] python-nova : Depends: python-jsonschema (>= 1.3.0) but 1.1.0-1~precise1 is to be installed [06:40:01] if you force that, it works [06:40:09] now let's find out why/where do we use this [06:40:58] python-jsonschema is hosted on brewster [06:41:08] So that would do it. Hard to know /why/ it's on brewster though... [06:41:19] whoa, python-nova is using python-jsonschema? 
[06:41:24] https://rt.wikimedia.org/Ticket/Display.html?id=4474 [06:41:27] ori requested it [06:41:42] yeah, it's used by eventlogging [06:42:23] right [06:42:31] so, I see the following solutions, andrewbogott [06:42:46] upgrade the package on brewster? [06:43:00] a) do the pinning thing you did, with the drawback that you have no way (other than pinning a specific package even higher) to override ubuntu-cloud [06:43:52] b) upgrade python-jsonschema in our repo to 1.3.0 (or ubuntu-cloud's version as-is), with the drawback that they may get out of sync again and break python-nova (unlikely, imho) [06:44:17] Yeah, not really worried about it breaking nova stuff… ori, will it mess with you if I upgrade that package? [06:44:23] c) downpin ptyhon-jsonschema to 500 specifically in the nova manifests (which is kind of ugly, but will work) [06:44:29] (03Abandoned) 10Matanya: emery: move left emery udp2log logs and sync jobs to erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/109957 (owner: 10Matanya) [06:44:41] possibly. i need to walk the dog, back in 5. [06:44:49] whoah, you have a dog too? [06:44:53] paravoid, c) involves downpinning particular package for particular repo? [06:44:54] man, where do you find the time [06:45:06] andrewbogott: yeah [06:45:43] paravoid: seems like b) is the preferred option if we have a good understanding of everyone else who is currently using that package. [06:45:53] I think so too [06:45:59] all of them suck, (b) sucks less :) [06:46:15] yeah b would be the way to go imho [06:47:08] paravoid: have sec for pm? [06:47:15] The question with b) is if we manually upgrade everything that currently uses that package so that version is consistent with future installs... [06:47:39] trusty has python-jsonschema 2.3.0 fwiw [06:47:51] you can test test that in labs andrewbogott [06:47:52] and it's 2-3 months away [06:49:02] hm, well… that might argue for a or c, and then just undoing later. Dunno, will wait for ori to comment. 
[06:50:00] (03PS1) 10Springle: depool db1042 for schema changes [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110136 [06:50:26] (03CR) 10Springle: [C: 032] depool db1042 for schema changes [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110136 (owner: 10Springle) [06:50:32] (03Merged) 10jenkins-bot: depool db1042 for schema changes [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110136 (owner: 10Springle) [06:51:10] (03PS1) 10Matanya: emery: remove one udp2log logger. [operations/puppet] - 10https://gerrit.wikimedia.org/r/110137 [06:51:19] !log springle synchronized wmf-config/db-eqiad.php 'depool db1042 for schema changes' [06:51:28] Logged the message, Master [06:56:06] (03PS1) 10Matanya: emery: remove rsync arab banner job [operations/puppet] - 10https://gerrit.wikimedia.org/r/110138 [06:57:34] ori, suddenly a work crew is setting up ladders in my room and I'm also a few hours late for lunch, so will have to catch your answer on the backscroll. Presuming that the upgrade won't break things for you, I will make an RT bug to track the by-hand upgrades. [06:58:24] andrewbogott_afk: i have no idea if it will or not [06:58:30] i need to read the changelog and test it [06:59:43] andrewbogott_afk: it looks fine, based on . i'd still prefer to be around when you upgrade. 
[07:01:34] (03PS1) 10Matanya: emery: move api logs to erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/110139 [07:02:47] (03PS1) 10Ori.livneh: graphite: re-enable logging of VE performance counters [operations/puppet] - 10https://gerrit.wikimedia.org/r/110140 [07:03:26] (03CR) 10Ori.livneh: [C: 032 V: 032] graphite: re-enable logging of VE performance counters [operations/puppet] - 10https://gerrit.wikimedia.org/r/110140 (owner: 10Ori.livneh) [07:04:05] i regret the assault of tiny commits this past week [07:04:33] there was a long tail of graphite / statsd / metric module niggles to discover and fix [07:16:00] hi mutante [07:17:19] when you are around, i'd like your help in decoming erzurumi (idle) [07:17:19] loudon (active secondary central logger) [07:17:19] payments1 (active paymentsdb master) [07:17:19] payments2 (idle) [07:17:19] payments3 (idle) [07:17:20] payments4 (idle) [07:17:22] db78 (active db+archive [07:17:24] pappas (active bastion) [07:18:04] jeff green approved in 6635 [07:22:48] PROBLEM - MySQL InnoDB on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:22:48] PROBLEM - MySQL disk space on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:22:48] PROBLEM - Full LVS Snapshot on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:22:48] PROBLEM - MySQL Recent Restart on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:22:58] PROBLEM - mysqld processes on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:23:08] PROBLEM - puppet disabled on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:23:08] PROBLEM - Disk space on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:23:28] PROBLEM - MySQL Idle Transactions on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:23:28] PROBLEM - MySQL Processlist on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:23:28] PROBLEM - RAID on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:23:31] how interesting [07:23:32] springle: i assume you're aware [07:23:39] :) [07:25:56] paravoid: any point in fixing Dynamic lookup of $cluster at /etc/puppet/manifests/ganglia.pp:168 is deprecated. Support will be removed in Puppet 2.8. Use a fully-qualified variable name (e.g., $classname::variable) or parameterized classes. [07:26:04] since we new have a module? [07:26:07] *now [07:27:21] yes, it's small and trivial to verify [07:28:02] (03CR) 10Dzahn: [C: 032] "removes Arabic Wikipedia Banner Pages" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110137 (owner: 10Matanya) [07:29:09] (03CR) 10Dzahn: [C: 032] "removes the cron for Arabic banners" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110138 (owner: 10Matanya) [07:29:11] ori: where does cluster come from? [07:29:34] no, mutante you shloudn't have merged that yet :/ [07:29:51] git grep 'cluster =' [07:30:09] matanya: siggggh?! [07:30:11] RT #6143 says: [07:30:11] Those should be fine to delete. Thanks again for checking! [07:30:12] - Jonathan [07:31:27] yes, but otto wanted to let the cron to run one more day, in order to remove all left logs [07:31:39] oh well, not critical [07:31:55] well, it didn't say so on the change :/ [07:31:59] revert or not [07:32:08] no [07:32:11] ok [07:32:17] see his comments on https://gerrit.wikimedia.org/r/109957 [07:33:13] i see, well i didnt notice because that was abanonded [07:33:19] at least he says "I'd rather move these filters one at a time" [07:33:22] and that's what we did [07:33:25] right [07:45:56] !log powercycled unresponsive db1042, /a tank data mount failed on boot, vgchange -a y + mount + xfs_check. still investigating [07:46:03] Logged the message, Master [08:12:23] (03CR) 10Lydia Pintscher: "@Peachey88: No. Wikidata relies on ULS more than any other of our projects. It has language built into its core like no other. 
We are prob" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/96771 (owner: 10Dereckson) [08:14:00] (03PS1) 10Matanya: deployment: puppet 3 compatibility fix: full path to puppet file server [operations/puppet] - 10https://gerrit.wikimedia.org/r/110145 [08:23:01] ori: can you please merge https://gerrit.wikimedia.org/r/#/c/100760/ ? [08:24:38] andrewbogott: did you have a chance glancing at my etherpad module? [08:25:44] (03PS13) 10Matanya: svn: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/100760 [08:25:46] matanya: Not yet, sorry, focused on labs stuff [08:26:23] OK, get your chips [08:26:27] how many fix-up commits? [08:26:30] for SVN [08:26:43] i hope not [08:26:54] it happens to everyone [08:26:59] you can bet 0 if you like [08:27:10] not liely thoght [08:27:13] i say 3 [08:27:14] i'm going with 2 [08:27:19] *likely [08:27:23] look at that, you're more pessimistic than i am! [08:27:37] since i did the change :) [08:27:56] (03CR) 10Ori.livneh: [C: 032] svn: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/100760 (owner: 10Matanya) [08:28:09] ori, if you're pulling a late night then we can try a python-jsonschema update now... [08:28:19] Or if you can suggest a good test to run on labs I'm happy to set that up [08:28:36] i have to shepherd matanya's svn patch now, let's see how well that goes first [08:28:45] what's your pick, btw, andrewbogott? [08:29:12] can we hedge 1 and 2? [08:29:13] For svn fixups? I'll go with 1, I like long odds. [08:29:15] unfair, you can split the fixes or squash them to adjust the winning number [08:29:22] hehe :P [08:29:31] mutante: ori always wins :P [08:29:46] we'll see, maybe not [08:29:57] mutante, that's risky -- even if you squash everything you might wind up with another one on the end. 
[08:29:57] that's what the people who always win always say as well :P [08:30:24] yuvipanda: i don't commit the fixups, matanya does [08:30:28] i report puppet failures [08:30:49] hmm, seems fair enough actually [08:31:06] no fucking way [08:31:27] matanya: http://p.defau.lt/?mXU0M674u0Z_AJY86Xx5_A [08:31:49] nice, good job matanya :) [08:31:54] sweet, i wanted it to be "subversion":) [08:31:57] wow! [08:32:02] matanya: thanks!! [08:32:07] wow! [08:32:09] nobody bet 0 [08:32:20] settle down, i have to do the other svn hosts too [08:32:52] one was just installing client [08:32:56] the other server [08:33:10] formey and antimony are both server [08:33:32] ok, then there was one more where it just uses the client role, right [08:34:42] antimony was a no-op run too aside from the motd, so that's 2/2 so far [08:34:59] ori: i meant subversion::client [08:35:02] is on bast1001 [08:35:07] yeah, puppet already running [08:35:11] cool! [08:37:21] no-op on bast1001 too [08:37:43] :) [08:37:56] fenari's last [08:39:38] should we still have svn clients on bastions? [08:39:57] i have no idea what they're doing there in the first place [08:40:16] Reedy probably knows [08:40:22] agree [08:41:32] fenari too [08:41:52] nice job, matanya! [08:42:11] & thanks for the patch [08:42:25] thank you :) some credit to mutante too [08:45:55] andrewbogott: there's a labs instance, yeah [08:46:08] Host deployment-eventlogging.labs [08:46:08] Hostname = I-00000733 [08:46:26] If I upgrade the package there will you be able to tell w/not it broke something? [08:46:39] we may need a ticket for "twemproxy on fenari" [08:46:39] And, are you pretty confident that eventlogging is the only thing that's using the package? 
[08:46:39] upgrade the package and run "eventloggingctl restart" [08:46:42] i'll be able to tell [08:46:46] linked to "out of Tampa" [08:47:12] i think so, yeah [08:47:26] it's pretty obscure [08:48:23] matanya: check doc.wm.org now for auto-generated docs it takes from module structure:) [08:48:33] Hm… using an 'asia' mirror when downloading to a tampa host? Not really the model of efficiency [08:48:51] https://doc.wikimedia.org/puppet/classes/subversion.html [08:51:15] (03PS2) 10Matanya: deployment: puppet 3 compatibility fix: full path to puppet file server [operations/puppet] - 10https://gerrit.wikimedia.org/r/110145 [08:51:24] ori: OK, upgraded and restarted. [08:51:40] * ori looks at the logs to be safe [08:51:47] (03CR) 10Ori.livneh: [C: 032] deployment: puppet 3 compatibility fix: full path to puppet file server [operations/puppet] - 10https://gerrit.wikimedia.org/r/110145 (owner: 10Matanya) [08:52:43] nice mutante, thaks :) [08:53:00] (03CR) 10ArielGlenn: [C: 032] snapshots: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/109653 (owner: 10Matanya) [08:53:11] so many pings [08:53:26] matanya: if a .pp file has comments on the line right before a class{} or define{} it finds them, but only if there is no newline [08:53:58] should add some docs to some modules [08:54:03] and there are cases where just due to the newline it doesnt show up there [08:54:06] ori, going to merge your salt master changes [08:54:07] !log applying NTP access lists on cr{1,2}-{esams,knams,eqiad,pmtpa,sdtpa,ulsfo}, csw2-esams, pfw1-eqiad [08:54:11] er puppet-merge them [08:54:15] Logged the message, Master [08:54:24] apergos: i already did [08:54:49] we must have hit it the same time, palladium asked me to do them [08:55:18] kk, thanks either way [08:55:53] ori, so, that package is ensure=>present, so when I update brewster nothing will happen… for now. [08:56:19] until when? 
[08:56:26] this sounds like a good candidate for RT 135 [08:56:29] oh, until it gets reprovisioned [08:56:32] Until a new server uses that module in which case it will have a different version [08:56:35] or newly provisioned after a hw failure [08:56:36] right [08:56:41] Yeah, so might be best to force an upgrade just so we aren't surprised later. [08:56:49] Is it a bunch of machines? [08:57:03] just one, vanadium [08:57:10] Oh, easy then. [08:57:17] too many spammers in http://www.sub-bavaria.de/w/index.php?title=Spezial:Letzte_%C3%84nderungen&limit=500 and it's so difficult to get rid of them [08:57:31] Are you feeling ok about that labs box or still looking? [08:57:37] still looking [08:57:39] (03PS1) 10Faidon Liambotis: Add mr1-eqiad.wikimedia.org forward record [operations/dns] - 10https://gerrit.wikimedia.org/r/110148 [08:57:41] (03PS1) 10Faidon Liambotis: Remove references to br1/2-ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/110149 [08:57:42] just a minute longer [08:57:56] btw, mutante, what's the deal with our reprepro server tampa vs. eqiad? Do they sync automatically or should I be mirroring my brewster changes someplace else? [08:58:02] good morning [08:58:06] morning hashar [08:58:07] ori, sorry, didn't mean to nag [08:58:23] ori: how do you find the puppet 3 compatibility issues?
Was wondering if we could get a jenkins job to report them [08:58:35] ask matanya, i just merged it [08:58:38] it's his change [08:58:44] hashar: https://etherpad.wikimedia.org/p/Puppet3 [08:58:48] (matanya, it applied cleanly btw) [08:58:50] paravoid: gave me the list [08:59:04] thanks ori, it is a ggod day today :) [08:59:08] andrewbogott: - migrate install-server and apt.wm.org from brewster to carbon is still unresolved [08:59:14] https://rt.wikimedia.org/Ticket/Display.html?id=6133 [08:59:20] matanya: seems its extracted from syslog. nice [08:59:24] ok, so brewster is still the one and only? That's simple for me :) [08:59:30] andrewbogott: afaik, yes [09:00:03] andrewbogott: wait, did i misunderstand you? are you *downgrading* python-jsonschema from 1.30 to 1.10? [09:00:13] nope [09:00:17] 1.10 -> 1.30 [09:00:24] hashar: it seems soon DNS repo will not have tabs anymore. does that mean you'd want a ticket to adjust jenkins checks there? [09:00:37] * Jasper_Deng diffueses all of ori's nukes [09:00:52] ori: and I just now upgraded deployment-eventlogging to 1.3.0... [09:00:54] I think? [09:00:55] oh, right, ok. i was thrown off by the fact that apt-get upgrade on the labs instance said The following packages will be DOWNGRADED: python-jsonschema. but that's because you upgraded it [09:01:07] yep! [09:01:20] mutante: the DNS lint job does not check for tabs, but we can surely tweak it to bail out whenever tabs are found [09:01:36] that would be great hashar [09:01:41] hashar: gotcha, yea that was the idea [09:01:48] but of course only after it's been changed [09:02:23] you can amend the job description at https://git.wikimedia.org/blob/integration%2Fjenkins-job-builder-config.git/master/operations-misc.yaml#L24 [09:02:40] aka ssh://gerrit.wikimedia.org:29418/integration/jenkins-job-builder-config.git and edit operations-misc.yaml [09:03:50] andrewbogott: +2 [09:03:55] all clear, etc. it works just fine. 
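[Editor's note] The ensure => present wrinkle above — vanadium keeps its installed python-jsonschema until reprovisioning, so uploading 1.3.0 to brewster changes nothing by itself — is just how Puppet's package resource behaves. A minimal illustration, with the version string as a stand-in:

```puppet
# ensure => present   install only if absent; never upgrade an
#                     already-installed package (the surprise above).
# ensure => latest    track the newest version the repo offers.
# ensure => '1.3.0'   pin an explicit version (stand-in value).
package { 'python-jsonschema':
    ensure => present,
}
```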
[09:04:00] hashar: minor: on doc.wm.org when you look at classes, there is always a "Validate" link to w3.org but when you hit it "No Referer header found" bug? [09:04:13] ok, thanks. Do you want to do the upgrade on vanadium or shall I? [09:04:44] could you? just the package update; no need to restart the service. i'm satisfied that it'll restart properly if needed. [09:04:47] mutante: could it be that your web browser does not send referer ? (privacy concern) [09:04:52] hashar: re: jenkins, ok, operations-misc.yaml :) thx [09:05:19] hashar: yea, could be extension, totally, ignore [09:05:46] i installed something to change my referer for testing in the past [09:07:55] ori: "#6562: Configure twemproxy to bind a unix domain socket" is this related to where it runs or not at all? [09:07:57] mutante: we might be able to get rid of the w3 link as well [09:08:20] andrewbogott: ori: for puppet 3 , we might be able to migrate beta cluster to it if there is any way to do so. That is a nice playground area :-] [09:10:14] ori: ok, vanadium is upgraded [09:10:32] hashar: That would be nice, especially if we can migrate it for 20 minutes and then immediately switch it back :) [09:10:48] andrewbogott: thanks! [09:10:54] ori, now I get to find out if this actually fixes the thing I wanted it for [09:11:18] hashar: andrewbogott it might be easier if we merge https://gerrit.wikimedia.org/r/#/c/108289/ before [09:12:39] andrewbogott: murphy's law says no [09:12:56] Yeah, then I'll just pin 'cause I know that that works. [09:14:26] no, come on! [09:15:43] !log reenable ospfv3 on the eqiad/esams link [09:15:49] Logged the message, Master [09:15:51] hey, i was also about to merge a lint change on imagescaler.pp , or is this getting too much at once now ?:P) [09:17:02] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[09:17:48] wild guess: xfs deadlock [09:17:52] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical [09:18:17] no [09:18:33] people have been saying that it's i/o saturated [09:18:36] I logged in yesterday [09:18:53] found pretty quickly that its BBU has gone bad and it has switched to write-through [09:19:02] BBU? [09:19:07] battery backed unit [09:19:12] on the raid controller [09:19:14] ah [09:19:21] so of course it's i/o saturated [09:19:32] waiting for cmjohnson to replace it [09:19:37] should be fine after that [09:20:22] see rt 6717 [09:20:24] can you guys think of any reason to NOT remove Tampa srv* and mw* from dsh group files yet? (Chad asked for it because it slows down scapping) [09:21:08] this is like the 6th time I've been asked about this [09:21:14] chad has been prodding me in real life too [09:21:31] I explained to him that the appservers in tampa still work [09:21:47] and they serve as a backup [09:21:52] andrewbogott: I have no clue how we could get puppet3 installed on the beta instances though [09:21:54] doitdoitdoit [09:22:06] and [citation needed] on slowing scap [09:22:16] hashar: you can update the package [09:22:23] paravoid: thanks for clarification, i'm gonna take that as a -1 on https://gerrit.wikimedia.org/r/#/c/108070/ [09:22:23] scap is hierarchical now, i don't see why it slows down everything [09:22:42] hashar: we would probably need beta to use a beta-local puppetmaster, then we could just upgrade puppet everywhere by hand or with salt... [09:22:43] it doesn't slow down the other hosts. but scap doesn't print progress indicators. [09:22:48] dsh still deploys to several hosts at once [09:22:51] not hard, but I don't think now is the time [09:23:01] so it just hangs on that last host. but the wait is indistinguishable from general slowness.
[09:23:09] that last host is searchidx1001 [09:23:14] that was my experience at least [09:23:17] right [09:23:28] which was far, far, slower than any pmtpa apache [09:23:29] which is the BBU issue ... [09:23:31] and there both topics are linked, heh [09:23:40] oh, sorry [09:23:43] i missed part of the context [09:23:50] didn't realize we were talking about the tampa apaches [09:24:04] andrewbogott: was thinking about upgrading a single box (for example a varnish cache), then fix puppet manifests until they pass [09:24:08] we were talking about searchidx1001 above, then you said and [citation needed] on slowing scap, so i assumed that you were talking about that [09:24:16] andrewbogott: then upgrade another instance (ex: application server), and fix puppet again [09:24:37] hashar: why do it by hand? [09:24:42] hashar, well, no need to do it in beta first in that case, just mock up a labs instance that resembles a beta box. [09:25:48] andrewbogott: ah yeah that will work as well :D [09:26:01] though using beta would break it whenever someone writes a bad manifest hehe [09:26:33] yeah, but best to make sure it works in theory before in practice :) [09:26:41] in addition to slowing scap, pmtpa apaches also make memcached-serious.log useless [09:26:42] (03CR) 10Dzahn: [C: 04-1] "per paravoid on IRC: "appservers in Tampa still work and serve as a backup" - "scap is hierarchical now, i don't see why it slow down ever" [operations/puppet] - 10https://gerrit.wikimedia.org/r/108070 (owner: 10Chad) [09:27:37] MaxSem: do you have any clue what errors are in memcached-serious.log ? [09:28:18] hashar: it's a libmemcached bug [09:28:29] tim spent a while chasing that down, paravoid too i think [09:28:45] there's a thread or three about it on the ops list i think [09:29:08] oh right, we need to deploy tim's changes [09:29:10] it's on us [09:29:59] thx [09:37:59] paravoid, so...
measure scap now, temp remove pmtpa from dsh, measure again?:) [09:38:26] we can also remove all of eqiad, that would make it blazingly fast :) [09:38:41] support [09:38:53] will also save us from wasting time on development [09:39:29] we could require 4 people in the same room to hit a sequence of keys before scap is run [09:39:39] 's', 'c', 'a', 'p'? [09:40:55] ori: missing a return [09:41:13] we should establish a 'Change Control Committee' that needs to approve all +2s [09:41:36] and it should consist of stakeholders from all relevant areas! Design, Product, Ops, Community, Upper Management, Middle Management, Lower Management... [09:41:45] that should solve all our problems, I think [09:41:57] who's the product owner? [09:42:58] https://www.mediawiki.org/wiki/Project_management_tools/Review [09:43:02] ah [09:43:16] i heard mingle is losing [09:43:24] that will be devastating [09:43:32] yuvipanda: one day I will have to write documentation about how we managed projects at my previous job. That was a crazy long chain :-] [09:43:58] ori: I think that goes to the person with the fanciest hair in the room? [09:44:02] you guys can comment on the talk page at https://www.mediawiki.org/wiki/Talk:Project_management_tools/Review [09:44:03] or is that the fanciest glasses? I'm not sure [09:44:03] you can "add your section" on the Talk: page of that [09:44:06] if you want to [09:44:21] what hashar said, andre said yesterday it was the "last call" [09:44:23] andre__ is looking for feedback about our current tools (gerrit/mingle/trello/bugzilla..) [09:44:35] so make your voice heard [09:45:02] * ori prefers to snark [09:45:17] I spent a lot of time last week talking about it with various people. A bunch are eager to migrate to phabricator which apparently could replace everything (gerrit review / bugzilla / mingle .. ).
[09:45:20] i'm so disorganized that i'm not one to talk [09:45:29] * yuvipanda also prefers to not talk about it now and then just complain later on [09:45:35] i do think we should migrate to phabricator [09:45:39] or at least seriously consider it [09:45:45] i regret not militating for that in the past [09:46:04] there's never a lack of "let's do it different":) [09:46:27] ori: the reason was that phabricator did not match our workflow two years ago (aka pre commit review) [09:46:39] woah, we've been on Gerrit for two years :| [09:46:42] feels like yesterday [09:46:48] ori: and yeah I agree it is probably the best choice for us nowadays since it seems to match all requirements AND it is written in PHP. [09:46:50] it also didn't have ldap integration iirc [09:46:59] it does now [09:47:19] that would surely be better than perl (bugzilla) and java (gerrit). Plus offer a nice integration of all utilities [09:47:29] no more sync bots between different tools :_D [09:47:35] heh, it feels weird considering how something being written in PHP is a 'good' thing, but considering the horror of horrors that is GWT... [09:47:45] it's just a much better tool [09:47:55] yuvipanda: and we have a STRONG PHP community, so that would make it easier for folks to tweak the software [09:48:03] the command-line tooling is amazingly better [09:48:05] agreed hashar [09:48:14] you can basically tell that someone sweated the details to make it awesome [09:48:14] OpenStack is using launchpad which is a nice utility as well [09:48:28] just maybe we should not always change everything after we just got a few volunteers to use gerrit , there's always something better, but how about fixing old boring clean-up tickets first [09:48:36] though for some reason they are considering migrating out of launchpad to a custom made software written in … python!
(openstack is all python) [09:48:38] yeah, but I don't think it is anywhere near reasonable to try to run our own launchpad instance [09:48:47] plus bzr? [09:48:54] mutante: i know, but gerrit is truly, truly awful. [09:49:15] Gerrit matched our workflow expectation which was to do pre commit review for mw/core and ops/puppet [09:49:26] so that was merely a few of us pushing for Gerrit against everyone else [09:49:38] to fix a major issue we had at that time: getting things reviewed and deploying faster [09:50:15] insisting on a tool that enforces a pre-commit review workflow was going to make reviewing/deploying things faster...? [09:50:16] I think that has been successful albeit it has been painful to everyone and caused a lot of people to end up frustrated by Gerrit GUI :( [09:50:28] yup more or less [09:50:43] to me it feels more like no matter what UI you use, some people will hate it, but it does the job, doesnt it [09:50:43] the problem with post commit review was we had only a bunch of people actually doing review [09:50:48] and way more people submitting patches [09:51:03] mutante: it's a question of how many people hate it and how legitimate their gripes are [09:51:07] * andrewbogott pretty much likes gerrit. [09:51:12] when we deployed 1.19, that has been a nightmare. We even had to abandon the REL1_19 branch and retrench it from master [09:51:21] * andrewbogott is also history's greatest monster -- unrelated coincidence [09:51:25] I think it took us close to a year to do 1.18 -> 1.19 [09:51:29] ori: so you say we need a poll and numbers?
fair [09:52:19] if we had some alternatives ready to use in labs, and then you let people try it and vote, i guess [09:52:48] another thing being talked about is merging our three ticket systems: bugzilla, RT and whatever Office IT is using [09:53:20] yeah, we have too few migrations going on at this time [09:53:21] hashar: yea, but from a technical point, RT wins, heh [09:53:39] hashar: OIT uses Zendesk or somesuch I think [09:53:41] just need to change some queue permissions and add more, done [09:53:53] while BZ can't easily replace role/group based permissions [09:54:01] for the non-public exceptions [09:54:26] Zendesk is a black box [09:54:35] way more than RT is [09:54:41] heh [09:54:43] yeah [09:55:58] anyway, be sure to write on https://www.mediawiki.org/wiki/Talk:Project_management_tools/Review :-D [09:56:03] i put "free and open source" as a requirement as well [09:56:05] even if it is only a few lines [09:56:14] and "not outsourced" [09:57:44] it might make sense to outsource it though :-D [09:57:52] might be more cost effective [09:59:31] does that apply to everything?
[10:00:03] oh yea, convert all tickets to odesk ?:) [10:07:12] (03CR) 10Dzahn: [C: 032] "lint-only, don't see functional changes, checking puppet runs on tampa host first though" [operations/puppet] - 10https://gerrit.wikimedia.org/r/109502 (owner: 10Matanya) [10:07:51] checking puppet run on imagescaler after lint change [10:08:13] mw75 first [10:08:38] (03Abandoned) 10Andrew Bogott: Pin the ubuntu-cloud repo so that dependencies work [operations/puppet] - 10https://gerrit.wikimedia.org/r/110126 (owner: 10Andrew Bogott) [10:08:47] notice: Finished catalog run in 40.20 seconds [10:10:27] mw1153 - also finished fine, nothing bad [10:10:30] matanya: [10:10:35] imagescaler.pp is also in [10:11:18] thank you, really nice day for my patches today [10:11:26] :) yea, merge day for you [10:12:36] (03PS2) 10Andrew Bogott: Openstack Havana in eqiad, baby step: [operations/puppet] - 10https://gerrit.wikimedia.org/r/110130 [10:17:59] (03CR) 10Dzahn: "git log says Mark renamed it in March 2013" [operations/puppet] - 10https://gerrit.wikimedia.org/r/107128 (owner: 10Matanya) [10:18:02] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:18:57] (03PS1) 10TTO: Add variant rewrites for zhwikivoyage [operations/apache-config] - 10https://gerrit.wikimedia.org/r/110155 [10:21:02] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical [10:24:02] (03CR) 10Dzahn: "nothing happened on puppet runs, thanks!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/109502 (owner: 10Matanya) [10:27:28] (03CR) 10Liangent: [C: 04-1] "Include zh-mo and zh-my here as well, or exclude them using wgDisabledVariants in InitialiseSettings.php. 
Ask the community about which on" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/110155 (owner: 10TTO) [10:29:09] (03CR) 10Dzahn: "hey ottomata, i merged the removal of one of the loggers, (and matanya pointed me to your comments on a related abandoned patch), i'm leav" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110139 (owner: 10Matanya) [10:49:10] (03PS1) 10Dzahn: payments1-4 decom,replace payments100x in ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/110156 [10:50:30] (03PS2) 10Dzahn: payments1-4 decom,replace payments100x in ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/110156 [10:50:37] (03CR) 10TTO: "Only wikipedia has zh-mo currently set up and no project has zh-my." [operations/apache-config] - 10https://gerrit.wikimedia.org/r/110155 (owner: 10TTO) [10:52:00] (03CR) 10TTO: "Sorry, that doesn't address your point. Do you think you, Liangent, as a Chinese speaker, could ask the zhwikivoyage community?" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/110155 (owner: 10TTO) [11:02:11] (03PS1) 10Dzahn: decom "pappas" (formerly fr bastion) [operations/puppet] - 10https://gerrit.wikimedia.org/r/110158 [11:02:13] (03CR) 10Andrew Bogott: [C: 032] Openstack Havana in eqiad, baby step: [operations/puppet] - 10https://gerrit.wikimedia.org/r/110130 (owner: 10Andrew Bogott) [11:18:12] PROBLEM - Puppet freshness on nickel is CRITICAL: Last successful Puppet run was Wed 29 Jan 2014 08:17:48 AM UTC [11:19:19] ori: hahah [11:19:27] somebody won [11:19:30] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class svn::client for nickel.wikimedia.org [11:19:39] 1 fix [11:19:50] That was me! [11:19:55] aha! 
[11:20:08] it's called subversion::client now [11:20:09] andrewbogott: [11:20:21] mutante: I mean, it was me that bet on 1 fix [11:20:28] oh, hehe, ok:) [11:20:32] heh [11:20:33] I know nothing about the failure [11:20:39] yes, andrewbogott won it fair and square [11:20:40] i'll see [11:20:54] What does it mean when a production server (e.g. virt1001) cannot ping anything outside of wmnet? Surely that's not by design…? [11:21:00] I mean, I know it's not by /my/ design [11:22:18] These boxes seem to have reasonable dns. Just… no internets [11:24:59] (03PS1) 10Dzahn: fix svn::client requirement in ganglia::web [operations/puppet] - 10https://gerrit.wikimedia.org/r/110163 [11:25:48] * andrewbogott fears that the silence means he has asked a Stupid Question [11:26:27] andrewbogott: what do you mean outside of wmnet? [11:26:52] wait, I think I understand the question [11:27:00] are you aware that we don't do NAT at all? [11:27:16] i.e. everything that is on private vlans doesn't have internet reachability [11:27:31] (03CR) 10Dzahn: [C: 032] "this should fix the puppet run on nickel" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110163 (owner: 10Dzahn) [11:27:47] we use a (forward) http proxy for the few cases that need some kind of reachability, like e.g. security.ubuntu.com [11:28:23] so not 100% clean [11:28:28] ? [11:28:34] @ mutante [11:28:42] RECOVERY - Puppet freshness on nickel is OK: puppet ran at Wed Jan 29 11:28:33 UTC 2014 [11:28:44] … sorry [11:29:06] (03CR) 10Dzahn: "matanya, yea, andrewB won. but fixed. <+icinga-wm> RECOVERY - Puppet freshness on nickel is OK" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110163 (owner: 10Dzahn) [11:29:32] andrewbogott: here are my chips [11:30:23] paravoid, that makes perfect sense, I just don't think about it much. [11:30:41] So, given that puppet is trying to pull packages down from ubuntu cloud archive... 
[11:30:48] matanya: that puppet run on nickel also made other changes now [11:30:54] that must have been merged recently [11:30:56] shall I set up a forward proxy for that? My guess is we have one already for pmtpa but not for eqiad [11:31:06] no, that should work [11:31:10] apt is configured for a proxy already [11:31:26] when I do apt-get update it hangs on the cloud archive. [11:31:26] matanya: ganglia-web-conf/conf/view_udp2log.json [11:31:42] W: Failed to fetch http://ubuntu-cloud.archive.canonical.com/ubuntu/dists/precise-updates/havana/Release.gpg Could not connect to ubuntu-cloud.archive.canonical.com:80 (91.189.92.152). - connect (110: Connection timed out) [11:31:44] etc [11:32:10] + "regex": "emery|oxygen|erbium" [11:33:28] paravoid, I see an entry for security.ubuntu.com, seems straightforward to add one for ^^ [11:33:52] but, not sure if that's the right approach (and, is maybe a Bad Idea?) [11:35:21] it works for swift [11:36:05] hrm, swift has a generic apt.conf [11:36:28] commit 836242d8b3e04164d9d130076bc7a5681063eb69 [11:36:32] Well, it's quite possible that these servers are not properly configured since I just set them up... [11:36:32] might have been the culprit [11:36:46] in any case, I think you can just add ubuntu-cloud to modules/apt/manifests/init.pp [11:36:53] it feels a bit of a hack, but it should do it for now [11:38:04] adding it to all servers vs. just the ones that need it -- doesn't trouble you? (fine with me if it is w/you) [11:38:12] PROBLEM - Puppet freshness on lanthanum is CRITICAL: Last successful Puppet run was Wed 29 Jan 2014 08:37:11 AM UTC [11:38:34] nah [11:39:15] it's harmless unless you actually have the repo defined [11:40:34] * andrewbogott types 'git review,' counts to 1000 [11:41:04] makes you wonder how many man hours are we losing waiting for gerrit & jenkins [11:41:13] in total, across the org [11:42:18] (03PS1) 10Andrew Bogott: Proxy ubuntu cloud archive so we can install openstack stuff. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/110165 [11:42:31] paravoid: that what you meant? [11:43:29] (03CR) 10Faidon Liambotis: [C: 032] Proxy ubuntu cloud archive so we can install openstack stuff. [operations/puppet] - 10https://gerrit.wikimedia.org/r/110165 (owner: 10Andrew Bogott) [11:43:35] (03PS1) 10Dzahn: fix svn::client include in contint module [operations/puppet] - 10https://gerrit.wikimedia.org/r/110166 [11:45:02] (03CR) 10Matanya: [C: 031] fix svn::client include in contint module [operations/puppet] - 10https://gerrit.wikimedia.org/r/110166 (owner: 10Dzahn) [11:45:25] (03CR) 10Dzahn: [C: 032] "fixing puppet run on lanthanum - role::ci::slave" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110166 (owner: 10Dzahn) [11:46:21] andrewbogott: sorry, 2 [11:46:30] dammit! [11:46:32] ori: you won [11:46:39] I think that means ori has it… for now [11:46:54] just one more, and i get it [11:47:02] RECOVERY - Puppet freshness on lanthanum is OK: puppet ran at Wed Jan 29 11:46:54 UTC 2014 [11:47:21] yuvipanda: ^ [11:47:42] hey, I hedged at 1 and 2! [11:47:44] so I sortof won? [11:48:42] paravoid: worked, thank you! [11:51:52] cool [11:54:35] andrewbogott: familiar with handling ssl certs/keys in labs, labs/private ? [11:54:48] i know there was a long thread etc [11:55:16] mmmmmaybe? What specifically? [11:55:20] people suggest adding keys to labs/private in gerrit for testing [11:55:28] but then that key is in public gerrit [11:55:48] https://gerrit.wikimedia.org/r/#/c/109480/1 [11:56:03] he suggested it to be able to test module in labs [11:56:10] where i would have just skipped the key install [11:56:13] and looked at the rest [11:56:43] Is this an old thread or something I haven't caught up with yet? [11:57:02] so we have star.planet.wikimedia.org , it's own star cert [11:57:07] Nothing in labs/private is private. But an explicit 'this is totally not private' key in there for testing… dunno, seems weird but ok. 
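(For context on the apt fix that "worked" above: since the private vlans have no NAT, outbound apt traffic is routed through a forward HTTP proxy, configured per-origin in apt.conf. A rough sketch of the kind of entry change I42f8 in modules/apt would add; the resource name, file path, and proxy endpoint below are assumptions for illustration, not copied from the real change at https://gerrit.wikimedia.org/r/110165:)

```puppet
# Illustrative sketch only -- file name and proxy host/port are
# hypothetical. apt supports per-origin proxies, so only this one
# repository is routed through the forward proxy; all other apt
# traffic is untouched.
file { '/etc/apt/apt.conf.d/80ubuntu-cloud-proxy':
    owner   => 'root',
    group   => 'root',
    mode    => '0444',
    content => "Acquire::http::Proxy::ubuntu-cloud.archive.canonical.com \"http://brewster.wikimedia.org:8080\";\n",
}
```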
[11:57:08] but this is star.planet.wmflabs of course [11:57:23] the certs in labs/private are bogus, just placeholders. [11:57:41] yea, i want to confirm it makes sense to test this way [11:58:18] It sort of does. In a perfect world labs/private would have shadows of all our passwords and keys and such. [11:58:22] it's just the gerrit change right now, not another thread [11:58:33] he tried to test my pending module change [11:58:35] As long as you're mindful of the profound lack of security involved :) [11:59:16] If it's just for a one-off test, why not just generate the key locally? [11:59:25] afaik the special instance, labs-proxy, has keys but restricted access [11:59:30] and people need NDAs [12:00:18] yes, that's correct. [12:00:21] andrewbogott: i think it's just trying to do it the proper way [12:00:32] submitting that change vs. creating it locally [12:00:59] Yeah, I understand the impulse. Using labs/private is OK but I'd advise adding a comment there about how insecure it is. [12:01:09] ok, thanks for comments! [12:01:22] Um… this tweaks my 'dangerous security hole' radar but I can't think of an actual scenario that's dangerous [12:01:39] seeing a key file in public just triggers something, i know [12:01:52] yep [12:02:14] well, and, of course it depends on what is using the key. Whatever service relies on it will be utterly insecure. [12:02:26] So that's a reason to create the key locally, it keeps that instance moderately safer. [12:02:36] i think the only intention was to not have the puppet run break [12:02:40] when testing the module [12:02:44] * andrewbogott nods [12:06:40] !log deployed (broken!) havana/nova on virt1000, 1001, 1002, 1003, labnet1001. Should be safe, but any recent breakage on virt1000 is most likely a side-effect. [12:06:49] Logged the message, Master [12:07:08] ok, I don't have the heart to explore just how broken that is, back later...
[12:11:28] https://wikitech.wikimedia.org/wiki/Deploy#Don.27t_leave_town [12:47:32] (03PS1) 10Dzahn: fix collectstats.pl cron for new Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/110170 [12:49:14] (03CR) 10Dzahn: "akosiaris, if you remember a review once on BZ module, why the cron does a "cd" first, this was why" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110170 (owner: 10Dzahn) [12:51:58] (03PS2) 10Dzahn: fix collectstats.pl cron for new Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/110170 [12:53:41] (03CR) 10Dzahn: [C: 032] "andre, that should get this off the blocker list too now" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110170 (owner: 10Dzahn) [12:59:08] \o/ [13:02:00] (03PS1) 10Dzahn: fix whine.pl whining cron for new Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/110172 [13:04:32] (03CR) 10Dzahn: "andre, eh, should we do this right now as well, or just put it on the list for switch day" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110172 (owner: 10Dzahn) [13:04:56] (03CR) 10Dzahn: "exactly the same fix, just for whining feature which is every 15 min" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110172 (owner: 10Dzahn) [13:05:25] (03CR) 10Aklapper: "no strong preference, but doing it right now means one item less we could forget? 
:P" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110172 (owner: 10Dzahn) [13:06:06] (03CR) 10Dzahn: [C: 032] "right, i'll just watch it doesn't spam anymore" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110172 (owner: 10Dzahn) [13:08:46] andre__: root@zirconium:~# cd /srv/org/wikimedia/bugzilla; ./whine.pl [13:08:49] no errors [13:10:07] andre__: and i'm killing that muzilla instance, since that also spammed and isn't up2date anymore with puppetmaster::self [13:11:24] so if you have labs instances using the role and you want the crons, you might wanna pull now [13:32:30] (03CR) 10Faidon Liambotis: [C: 032] Add mr1-eqiad.wikimedia.org forward record [operations/dns] - 10https://gerrit.wikimedia.org/r/110148 (owner: 10Faidon Liambotis) [13:33:00] (03CR) 10Faidon Liambotis: [C: 032] Remove references to br1/2-ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/110149 (owner: 10Faidon Liambotis) [13:47:34] (03PS1) 10Matanya: generic-definitions: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/110179 [13:47:39] hey [13:47:49] hello nosy [13:48:36] hello matanya [14:00:00] (03PS2) 10Matanya: generic-definitions: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/110179 [14:13:02] (03PS1) 10Springle: repool db1040, warm up [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110180 [14:13:30] (03CR) 10Springle: [C: 032] repool db1040, warm up [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110180 (owner: 10Springle) [14:13:36] (03Merged) 10jenkins-bot: repool db1040, warm up [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110180 (owner: 10Springle) [14:14:38] !log springle synchronized wmf-config/db-eqiad.php 'repool db1040, warm up' [14:14:47] Logged the message, Master [14:22:58] (03PS1) 10Aude: Revert "New extra language for wikidata: Ottoman Turkish (ota)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110182 [14:23:18] (03PS2) 10Aude: Revert "New extra 
language for wikidata: Ottoman Turkish (ota)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110182 [14:34:11] (03CR) 10Ottomata: "Awesooome!" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/110104 (owner: 10Ori.livneh) [14:39:19] (03Abandoned) 10Ottomata: once more with commas [operations/puppet] - 10https://gerrit.wikimedia.org/r/109910 (owner: 10Gage) [14:42:25] (03PS3) 10Ori.livneh: puppet-merge: warn if multiple committers [operations/puppet] - 10https://gerrit.wikimedia.org/r/110104 [14:44:04] (03CR) 10Dzahn: "nice idea, i like" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110104 (owner: 10Ori.livneh) [14:48:36] (03PS1) 10Springle: db1040 full steam [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110183 [14:49:01] (03CR) 10Springle: [C: 032] db1040 full steam [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110183 (owner: 10Springle) [14:49:07] (03Merged) 10jenkins-bot: db1040 full steam [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110183 (owner: 10Springle) [14:50:02] (03CR) 10Cmjohnson: [C: 031] decom "pappas" (formerly fr bastion) [operations/puppet] - 10https://gerrit.wikimedia.org/r/110158 (owner: 10Dzahn) [14:50:21] !log springle synchronized wmf-config/db-eqiad.php 'db1040 LB full steam' [14:50:29] Logged the message, Master [14:53:26] (03PS4) 10Ori.livneh: puppet-merge: warn if multiple committers [operations/puppet] - 10https://gerrit.wikimedia.org/r/110104 [14:53:32] (03CR) 10Ottomata: [C: 032 V: 032] puppet-merge: warn if multiple committers [operations/puppet] - 10https://gerrit.wikimedia.org/r/110104 (owner: 10Ori.livneh) [14:53:58] (03CR) 10Cmjohnson: [C: 031] add FIXMEs for erzurumi references [operations/puppet] - 10https://gerrit.wikimedia.org/r/109655 (owner: 10Dzahn) [14:54:08] (03CR) 10Matanya: [C: 031] decom "pappas" (formerly fr bastion) [operations/puppet] - 10https://gerrit.wikimedia.org/r/110158 (owner: 10Dzahn) [14:55:02] (03CR) 
10Matanya: [C: 031] add FIXMEs for erzurumi references [operations/puppet] - 10https://gerrit.wikimedia.org/r/109655 (owner: 10Dzahn) [14:55:59] (03CR) 10Cmjohnson: [C: 031] payments1-4 decom,replace payments100x in ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/110156 (owner: 10Dzahn) [14:56:07] (03PS2) 10Ottomata: emery: move rsync teahouse job [operations/puppet] - 10https://gerrit.wikimedia.org/r/109894 (owner: 10Matanya) [14:56:33] (03PS3) 10Matanya: emery: move rsync teahouse job [operations/puppet] - 10https://gerrit.wikimedia.org/r/109894 [14:56:39] (03CR) 10Ottomata: [C: 032 V: 032] emery: move rsync teahouse job [operations/puppet] - 10https://gerrit.wikimedia.org/r/109894 (owner: 10Matanya) [14:57:50] ottomata: all do all the rest in 4 separate patches [14:57:54] *i'll [14:58:11] k thanks [14:59:03] still working but off IRC and stopping away nick usage again :) ttyl [14:59:18] coren, paravoid, join us? [15:00:03] andrewbogott: Was about to. [15:00:08] (03PS2) 10Matanya: emery: move api logs to erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/110139 [15:00:15] you're not late, we're just early [15:00:28] labs migration meeting? [15:00:29] (03CR) 10Ottomata: [C: 032 V: 032] "We'll need to change the rsync job so that it starts copying from erbium, but yeah, let's wait a day to do that." [operations/puppet] - 10https://gerrit.wikimedia.org/r/110139 (owner: 10Matanya) [15:01:18] * andrewbogott nods [15:03:31] (03PS1) 10Ottomata: Fixing api-usage write location [operations/puppet] - 10https://gerrit.wikimedia.org/r/110185 [15:03:53] (03CR) 10Ottomata: [C: 032 V: 032] Fixing api-usage write location [operations/puppet] - 10https://gerrit.wikimedia.org/r/110185 (owner: 10Ottomata) [15:10:35] (03PS1) 10Ori.livneh: Drive-by lint!
[operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/110188 [15:11:30] ottomata: waiting for boarding call, nothing better to do :) [15:14:08] ori :) [15:14:15] All good, except hm [15:14:24] i know that if !defined isn't a really good check [15:14:29] but, i somehow like to do it anyway [15:14:38] as if, if I had control over the user/group in this case [15:14:40] that was set [15:14:59] then I would be able to make sure that all declarations are wrapped with if !defined [15:15:11] so, by at least doing it in cases where this could be a problem [15:15:19] i give the user of the module the ability to do so as well [15:15:46] i settled on hard-coding a custom user / group for services. i found that it's consistent with the behavior of debian packages and that parametrizing it rarely adds value [15:16:08] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Moving only nicely. After another round or two we might be able to run some catalog compilations and see what happens." (0312 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/109507 (owner: 10Matanya) [15:16:48] hmmm [15:16:58] I mean, it's safe to assume that the user / group 'wikimetrics' won't be taken by anything else, right? And why would you want it to be something different? [15:17:45] hmmm [15:18:09] i think you are right in this case [15:18:14] i still like if !defined sometimes! [15:18:17] but i think you are right [15:18:29] in that case, we should probably just get rid of the user,group params altogether, eh? [15:19:00] and just hardcode them to 'wikimetrics'? [15:19:43] is there an internal variable that includes the module name? [15:19:51] yeah. it's more readable to provide an interface that presents the relevant configuration choices as parameters [15:20:12] and leave the rest to be implementation details that aren't worth bothering about [15:20:45] mutante: that is my least-favorite pattern, perhaps!
[15:20:49] (03CR) 10Alexandros Kosiaris: [C: 04-1] generic-definitions: lint (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/110179 (owner: 10Matanya) [15:21:02] ori: heh, ok [15:21:42] i think there's a kind of instinctive horror of string literals that doesn't make sense in this case [15:22:52] there's no module polymorphism in puppet so using user { $module_name: ... } just means your reader has to pause and parse it [15:23:56] whereas user { 'wikimetrics': } is instantly clear [15:24:46] yeah ori is right. At first I thought source => "puppet:///{$module_name}/myfile" was cool. Then... who else but your module is ever going to evaluate that ? [15:25:24] s/{$/${/ and not i am not going to escape $ in that re [15:25:24] :P [15:25:30] the resentful person reading your code, that's who :P [15:25:38] ahahaha [15:25:59] heheh [15:28:09] haha,ok,thx [15:28:12] ttyl [15:30:21] (03CR) 10Alexandros Kosiaris: "Heh... yeah I was the only suggesting this change. I still don't like it this way but since it works let's leave it like that for now." [operations/puppet] - 10https://gerrit.wikimedia.org/r/110170 (owner: 10Dzahn) [15:31:18] drdee: want me to wikify that etherpad? [15:32:46] (03PS9) 10Matanya: site: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/109507 [15:33:03] those rebases are nightmare [15:33:22] (03CR) 10jenkins-bot: [V: 04-1] site: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/109507 (owner: 10Matanya) [15:34:10] (03PS10) 10Matanya: site: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/109507 [15:35:19] (03PS3) 10Matanya: generic-definitions: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/110179 [15:36:00] ok, i'm out see ya :) [15:36:57] (03PS1) 10Manybubbles: Turn file search back on for commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110191 [15:37:10] (03CR) 10Manybubbles: [C: 04-1] "Hold for deployment window." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110191 (owner: 10Manybubbles) [15:37:43] (03CR) 10Manybubbles: [C: 04-1] "Hold for deployment window." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/109692 (owner: 10Manybubbles) [15:38:24] (03CR) 10Alexandros Kosiaris: [C: 031] "There is one point that needs very minor point but otherwise LGTM. However given that this is a huge change and something might have slipp" [operations/puppet] - 10https://gerrit.wikimedia.org/r/109507 (owner: 10Matanya) [15:39:45] andrewbogott: thanks for wikiyfing the notes! [15:40:03] np [15:40:12] 'wikify' in this case = 'cut and paste' [15:41:54] (03PS1) 10Manybubbles: Start building the Cirrus index for huwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110192 [15:42:10] (03CR) 10Manybubbles: [C: 04-1] "Hold for deployment window." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110192 (owner: 10Manybubbles) [16:01:17] (03PS2) 10Ottomata: Drive-by lint! [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/110188 (owner: 10Ori.livneh) [16:01:42] (03PS3) 10Ottomata: Drive-by lint! [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/110188 (owner: 10Ori.livneh) [16:03:43] LeslieCarr: be warned that it is midnight on my clock so my ability to absorb & retain new information is rapidly waning :) [16:04:39] andrewbogott: so... the basic setup is pretty simple --- each virt host has a "front end" network connection that lets us talk to it (or in the case of the network node, facilitates communication between the virt hosts and outside world) [16:05:04] and then the "back end" connection, which does not have any direct connection with any of the outside world, is the labs subnet [16:05:11] (currently the 10.4.X ip space) [16:05:32] ok, so that would be e.g. frontend=eth0 backend=eth1, right? 
(Was just looking at that in a config file) [16:06:46] I guess what i mean is -- do these 'connections' correspond to the interfaces I'm familiar with? [16:09:11] oh [16:09:17] yes, frontend eth0 backend eth1 [16:09:33] A few days ago I made this edit: https://wikitech.wikimedia.org/w/index.php?title=IP_addresses&diff=97286&oldid=71257 [16:09:49] so the "fixed ips" are the backend vlan [16:09:54] But the range I specified for private eqiad IPs was a total guess. [16:10:05] i'm not sure we actually allocated a range.... [16:10:07] let me look [16:10:42] I just picked a range that was far away from anything else used, but -- I guess I didn't think that there had to be actual vlan config that agreed. [16:11:03] actually in this case there doesn't ... it's just good to have it documented [16:11:25] Ah, so it is arbitrary, great. [16:11:27] the vlan isn't IP'ed on the network gear -- its default gateway is the network node [16:11:36] and this is another layer to prevent it from talking directly [16:12:10] ok, so the backend interface can /only/ talk to labsnet1001 [16:12:27] so yay, according to rdns, there's already ip ranges picked out in 10.68.X [16:12:39] however rdns only has them as /24's -- i'd expand them to /20's i think [16:12:47] labs is currently just in labs row b [16:12:52] AH, so this is important to note [16:13:01] wait when you say 'already picked out' does that mean for labs, or someone else is using them already? [16:13:16] picked out for labs -- do you have the dns repo ? [16:13:29] I do, someplace. [16:13:48] labsconsole.wikimedia.org uses an invalid security certificate. The certificate is only valid for wikitech.wikimedia.org [16:13:58] templates/10.in-addr.arpa line 2822 [16:13:58] fwiw [16:14:35] mutante: as far as I know no one calls it labsconsole but me anymore. Barely matters I think. [16:15:03] labsconsole had an invalid cert for ages [16:15:06] andrewbogott: i had it saved as URL in keepass [16:15:08] LeslieCarr: Ok, I see it, makes sense. 
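The /24-to-/20 expansion LeslieCarr suggests is easy to sanity-check with Python's `ipaddress` module. The `10.68.16.0` base below is illustrative, not the actual rDNS allocation:

```python
import ipaddress

# A /24 holds 256 addresses; a /20 holds 4096 -- the expansion being
# discussed. (10.68.16.0 is an illustrative base in the 10.68.X space,
# not necessarily the block that was actually allocated.)
small = ipaddress.ip_network('10.68.16.0/24')
large = ipaddress.ip_network('10.68.16.0/20')
print(small.num_addresses)     # 256
print(large.num_addresses)     # 4096
print(small.subnet_of(large))  # True: the /24 nests inside the /20
```

Because the /24 nests cleanly inside the /20, widening the documented range later doesn't invalidate anything already handed out from it.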
[16:15:10] that's all [16:15:21] from before the merge [16:15:36] mutante: yep, I think this is just a "Don't do that, then" situation. [16:15:42] hi paravoid [16:15:42] ok [16:15:42] so very important --- all vlans are row based in eqiad [16:15:54] q for you from Snaps and me about yajl1 vs yajl2 and debian unstable [16:15:56] and compute nodes need to be in the same backend vlan [16:16:06] so.... you must make sure compute nodes are all in row b [16:16:10] if you add new ones [16:16:19] unless you want to make a new, discrete, labs cluster [16:16:45] Yep, makes sense. This is familiar from when talking to Chris about hardware setup. [16:16:52] (03CR) 10Alexandros Kosiaris: [C: 032] generic-definitions: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/110179 (owner: 10Matanya) [16:17:40] WARNING: Revision range includes commits from multiple committers! [16:17:40] LeslieCarr: So, when a labs instance sends a packet out into the world... [16:17:45] it most certainly does not... [16:17:47] meh... [16:18:02] It always travels via the backend, and then the net node, and then out into the world? [16:18:12] mental note to look at puppet-merge bug [16:18:26] Or does this somehow depend on whether it's using public or private ip? (Which I don't even know what that would mean for outbound traffic) [16:18:28] yep [16:18:44] well if it's not got a public ip, then it can't really go past the network node [16:19:06] Ah, true, I guess if it's proxied then… well, then I know how that works. [16:20:10] So when I assign a public IP to an instance, does that reroute /all/ outbound traffic to the external interface? Or is it using both interfaces in some clever way? [16:21:20] (I realize this is getting a bit into Computers: How do they work? 
territory) [16:21:37] if it doesn't have a public ip it will use SNAT, leslie [16:21:44] there is one public ip set aside for that [16:21:59] if an instance does have a public ip assigned to it, it will just use that for outside communication [16:22:05] so all instances can actually talk to the internet [16:27:07] ah cool [16:27:11] did not realize that [16:27:42] andrewbogott: so it will route all traffic not destined for another labs instance to that external interface [16:28:11] LeslieCarr: yeah, so, that specific SNAT ip is also routed as a /32 to the network node separately [16:28:22] really the entire block is at this point [16:28:31] we had plans to have multiple network nodes [16:28:34] but that may make it more complicated [16:28:40] we'll just have to see if that's still feasible [16:28:42] I am briefly confused by how an instance /knows/ that it has been assigned a public IP… that must happen by dhcp I suppose. [16:28:50] no [16:28:58] instances don't have the public ip assigned at all [16:29:01] it's all NAT in one way or another [16:29:09] so the network node translates from public ip to internal ip and vice versa [16:29:19] well an instance's default route is the network node [16:29:19] except that, when there is a public ip for an instance, it can do that 1:1 [16:29:26] when there isn't, only outbound traffic gets source-NATed [16:29:32] and all instances share that public ip [16:30:10] So then why have two network interfaces on the vm hosts? [16:30:21] If everything goes through the network node... [16:30:33] I guess I am confusing VM host with VM. It sounds like the VM itself really does only use one interface. 
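The SNAT arrangement mark describes is conventionally a netfilter rule on the network node, roughly like the fragment below. The subnet and public IP are placeholders, not the production values:

```sh
# On the network node: source-NAT outbound labs traffic that has no
# 1:1 public IP of its own. 10.4.0.0/16 and 198.51.100.1 are
# placeholders for the labs subnet and the single shared public IP.
iptables -t nat -A POSTROUTING -s 10.4.0.0/16 -o eth0 \
         -j SNAT --to-source 198.51.100.1

# An instance with its own public IP instead gets a 1:1 translation
# (DNAT inbound, per-instance SNAT outbound), bypassing the shared IP.
```

This matches what was said above: instances never carry public IPs themselves; the network node translates in both directions, 1:1 when a public IP is assigned, shared-SNAT otherwise.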
[16:36:36] yeah we used VM hosts and guests I believe [16:36:44] so [16:36:53] one interface on the compute nodes is for the systems themselves [16:37:06] so we are able to manage them with puppet and all that [16:37:14] they will also receive and send outside traffic on them [16:37:34] and one shared via bridge-utils for the VMs i suppose... [16:37:37] the other interface is on the special vlan that instances sit on [16:37:47] so they can bridge that traffic between all compute nodes [16:37:53] so it doesn't matter which compute node an instance is on [16:38:01] so that's the labs-instances vlan [16:38:14] storage ? over eth0 ? [16:38:16] so, a given instance may want to communicate with other instances on arbitrary compute nodes [16:38:16] and also [16:38:35] a given instance will send traffic to (currently) _the_ network node, which may be another compute node [16:38:41] it will do that over the second interface [16:38:58] ok, I think this is all making sense. [16:39:29] if it gets too confusing --- just look here - http://cuteoverload.com/2014/01/28/oh-thank-heaven/ [16:40:18] akosiaris: depends [16:40:27] if storage traffic is sent by the hosts, it will be over eth0 [16:40:36] if it's NFS from an instance, it will be over eth1 [16:40:40] aaaah yes... cause VMs also mount NFS .. [16:41:01] well, glusterfs/nfs ... whatever... [16:41:01] OK, now, the network host itself… it has all of its interfaces bonded into one big super-speed interface, right? [16:41:20] Or is that 'all but one' are bonded, and one is left to talk to the outside world? [16:41:26] yes! [16:41:33] the latter [16:41:44] i'd actually like to have 10G for eqiad [16:41:46] So is that actually true, or is that just what should be true? [16:41:48] i guess it's too late for that now? [16:41:53] let's see what netmon1001 has... [16:42:21] mark, I thought that 10G wasn't supported on row B. Or maybe we just didn't have the hardware for it? Can't remember... [16:42:27] or labnet1001 ..
[16:42:35] Definitely went round about the question for a while. [16:42:38] it is harder in row B [16:42:55] I guess we should just bond for now [16:43:02] well right now it's not bonded .... but it is possible to bond [16:43:15] wait, right now it's not? I'm sure that... [16:43:18] * andrewbogott digs for RT ticket [16:43:20] so... we could put a daughter card in this switch [16:43:33] and attach labnet1001 to that [16:44:22] if we swap out its network card [16:44:29] so only the network node you mean [16:44:34] yes that should be possible [16:44:39] eventually we'd like to have multiple network nodes though [16:44:42] then this would become problematic [16:44:44] RT ticket about bonding ports for that box… https://rt.wikimedia.org/Ticket/Display.html?id=6393 [16:44:58] That ticket suggests that all four are bonded [16:45:25] they are all 4 connected [16:45:27] but not setup for bonding yet [16:45:30] connected, not bonded [16:45:33] there's nothing stopping us from doing that now though [16:45:44] also not all in the right vlans.... [16:45:46] i think given our time constraints, we should just go with that [16:45:49] i can get that now [16:45:50] 2x2? [16:46:00] one bond for internal of 2 ports, one bond for external [16:46:28] iirc the internal gets much more traffic [16:46:37] why we did 1/3 in tampa [16:47:02] ok [16:47:15] which rack is that host in? [16:47:27] b3 (eqiad) [16:47:46] Ryan says "We have 3 bonded ports in pmtpa and it approaches saturation occasionally." We're ignoring that for now? [16:47:47] hmm shared with analytics [16:48:06] andrewbogott: i wish this wouldn't have been ignored earlier :) [16:48:09] it's a bit late to try to fix that now [16:48:20] we can try, but it has further potential to delay things [16:48:31] you know [16:48:31] We didn't really ignore it, just... 
[16:48:35] let's buy a new daughterboard [16:48:40] if we can fix it in time, we can [16:48:42] if not, we'll do bonding [16:48:59] let me shoot a procurement ticket with rush [16:49:06] (03CR) 10Chad: "lgtm, will merge during window." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/109692 (owner: 10Manybubbles) [16:49:31] so i'll bond it now, while i'm thinking abotu it --- if we get the daughter card in time, it's no more work to delete that config from the switch [16:49:36] yes [16:49:38] indeed [16:49:42] (03CR) 10Chad: "lgtm, will merge during window" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110192 (owner: 10Manybubbles) [16:49:42] what sort of box is that? [16:49:58] 'bonding' is a software thing, not a hardware thing? [16:50:04] (03CR) 10Chad: "\o/ will merge during window" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110191 (owner: 10Manybubbles) [16:50:22] once all cables are connected, it's a matter of configuration on both sides, yes [16:50:25] yes, it's taking multiple physical ports and making them into one logical port [16:50:33] Ok, makes sense. [16:51:43] andrewbogott, mark: which RT tickets are related to the labs migration? [16:52:25] 6724 is what I just created [16:52:28] drdee: there were a bunch of procurement tickets, now closed. I haven't had my eye on any others. [16:52:33] :) [16:52:33] if you could do some searching and organization that would be much appreciated ;) [16:52:37] k [16:52:40] do you have access? [16:52:49] of course, i am the master of RT [16:53:02] !log aggregated labnet1001 secondary port [16:53:10] Logged the message, Mistress of the network gear. [16:53:50] (03PS1) 10Alexandros Kosiaris: Fix bug introduced in 4a79aa1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/110203 [16:54:22] meh... 
bad commit message [16:55:08] (03PS2) 10Alexandros Kosiaris: puppet-merge: Fix bug introduced in 4a79aa1 [operations/puppet] - 10https://gerrit.wikimedia.org/r/110203 [16:55:55] So… it sounds to me like as things are now, virt hosts should still be able to talk to each other and the network host, they just won't hold up well under traffic. [16:56:04] Is that right? Or are things actually non-functional currently? [16:57:49] well currently i'm not sure if things are set up :) [16:58:00] i have a meeting now [16:58:03] awww, so many changes in puppet repo... [16:59:38] LeslieCarr: :-) [17:01:04] <^d> manybubbles: Ready? [17:01:16] oh, I suppose so. [17:01:30] wasn't paying attention [17:01:40] <^d> Heh, I can do the work. Just want you at least half paying attention :p [17:01:43] ok, have to fix up the puppet config [17:01:52] (03CR) 10Chad: [C: 032] Turn file search back on for commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110191 (owner: 10Manybubbles) [17:02:01] (03Merged) 10jenkins-bot: Turn file search back on for commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110191 (owner: 10Manybubbles) [17:02:17] (03CR) 10Chad: [C: 032] Start building the Cirrus index for huwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110192 (owner: 10Manybubbles) [17:03:27] ^d: thanks! [17:03:31] half paying attention [17:03:35] I can build the index too [17:03:36] ! [17:04:10] (03CR) 10Chad: [C: 032] Split some wikis into more shards [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/109692 (owner: 10Manybubbles) [17:04:27] LeslieCarr: I need to turn in, can you ping my bouncer with your puppet change so when I get up I can see what you did? [17:04:45] ok [17:05:04] Unless it'll pay for me to stay tuned for another 10...? 
[17:05:24] it'll probably be about 15 or so [17:05:25] !log demon synchronized wmf-config/CirrusSearch-common.php 'Turn commons file searching back on' [17:05:33] Logged the message, Master [17:05:36] Ah, I can linger then [17:06:33] !log demon synchronized wmf-config/InitialiseSettings.php 'Enable Cirrus for huwiki + some shard config' [17:06:40] Logged the message, Master [17:07:00] <^d> The hell? [17:07:17] hell? [17:07:20] <^d> manybubbles: http://p.defau.lt/?SF8xdYHJv4rRY_vew3ZfIA [17:07:35] I was just building it [17:07:39] maybe we clashed [17:07:41] <^d> Must've. [17:08:09] <^d> Easily fixed. [17:08:48] I think it fixed itself because my process won [17:08:53] <^d> Heh [17:08:56] I've started throwing in jobs [17:09:11] (03PS1) 10Lcarr: Creating network config for eqiad nova network controller [operations/puppet] - 10https://gerrit.wikimedia.org/r/110206 [17:09:16] andrewbogott: ^^ [17:09:46] wee ^d [17:09:55] (03CR) 10Jgreen: [C: 04-1] "in ganglia.pp, in eqiad the ganglia aggregators are pay-lvs1001/pay-lvs1002 and they are already in the config--nix the new payments1001-p" [operations/puppet] - 10https://gerrit.wikimedia.org/r/110156 (owner: 10Dzahn) [17:10:13] <^d> twkozlowski: wheee, indeed. [17:11:02] ^d: I think we're doing better with jobs [17:11:10] <^d> Oh yeah definitely [17:11:31] LeslieCarr: ok... [17:11:44] (03CR) 10Lcarr: [C: 032] Creating network config for eqiad nova network controller [operations/puppet] - 10https://gerrit.wikimedia.org/r/110206 (owner: 10Lcarr) [17:11:51] <^d> manybubbles: For enwiki: http://p.defau.lt/?s_7SOsAk2xcQJat8Ef1Jdg [17:12:12] that secondary number is a bit annoying [17:12:17] oh, oops, LeslieCarr, type [17:12:19] 'taged' [17:12:28] <^d> manybubbles: Secondary isn't as bad. [17:12:32] um… typo [17:12:32] <^d> I'll keep an eye on it today. [17:12:42] oh thanks [17:13:12] That class is currently applied to labnet1001 so you can check your handiwork if you want. 
[17:13:14] (03PS1) 10Lcarr: fixing typo - taged to tagged [operations/puppet] - 10https://gerrit.wikimedia.org/r/110207 [17:13:17] yeah, it isn't too bad. I think for now it isn't worth worrying about compared to other things [17:13:18] andrewbogott: ^^ [17:13:35] <^d> manybubbles: I'll take 16k over 1.6m ;-) [17:13:40] (03CR) 10Lcarr: [C: 032] fixing typo - taged to tagged [operations/puppet] - 10https://gerrit.wikimedia.org/r/110207 (owner: 10Lcarr) [17:13:42] (03CR) 10Andrew Bogott: [C: 032] fixing typo - taged to tagged [operations/puppet] - 10https://gerrit.wikimedia.org/r/110207 (owner: 10Lcarr) [17:13:44] damn right [17:13:50] ^d: Re Gerrit, are we now using a version where "secondary index" can be enabled for "file:" searches to work? [17:13:54] my children don't seem to understand "close the door, it is cold" [17:14:08] <^d> scfc_de: Yes! I need to run the indexer at some point. [17:14:14] <^d> Didn't want to do it all at once during the upgrade. [17:15:06] ^d: Perfect! Will you post to wikitech-l after that? [17:15:13] running puppet now andrewbogott [17:15:24] ok [17:15:30] <^d> scfc_de: Yeah. I'll have to read the documentation first and probably schedule the downtime. [17:15:32] I wouldn't know what to check for anyway :) [17:16:17] ^d: No problem, it would just be very nice to have :-). [17:16:33] <^d> Indeed, it will. [17:16:43] :( doesn't like augeas creating the bonded interface [17:18:18] hrm, i'll try just manually create that... [17:22:26] grrr colloquy crashed again... what do os x folks use ? [17:22:29] also got root@labnet1001:/etc/network# ifup bond1 [17:22:30] Waiting for bonding kernel module to be ready (will timeout after 5s) [17:22:31] Waiting for a slave to join bond1 (will timeout after 60s) [17:22:40] i know that we've experienced that problem in the past [17:22:46] trying to remember how we fixed it!
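The "Waiting for a slave to join bond1" timeout above typically means the bond has no slaves declared (or ifenslave/the bonding module isn't set up). A minimal Debian-style `/etc/network/interfaces` sketch for a two-port internal bond; the interface names, address, and mode are assumptions, not labnet1001's real config:

```
# Sketch only: requires the ifenslave package on Debian/Ubuntu.
# bond1, eth2/eth3, and the address are illustrative placeholders.
auto bond1
iface bond1 inet static
    address 10.64.20.13
    netmask 255.255.255.0
    bond-slaves eth2 eth3
    bond-mode 802.3ad    # LACP, matching the switch-side aggregate
    bond-miimon 100      # link-monitoring interval in ms
```

Both ends have to agree: the switch-side aggregated ports (as LeslieCarr configures above) and the host-side bond declaration, or slaves never join.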
[17:23:47] LeslieCarr, I use adium [17:25:11] (03CR) 10Chad: [C: 032] Remove underscore from class names LBFactory_* [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/96472 (owner: 10Siebrand) [17:25:18] (03Merged) 10jenkins-bot: Remove underscore from class names LBFactory_* [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/96472 (owner: 10Siebrand) [17:26:38] !log demon synchronized wmf-config/db-pmtpa.php 'Removing underscores from class names' [17:26:46] Logged the message, Master [17:27:05] !log demon synchronized wmf-config/db-labs.php 'Removing underscores from class names' [17:27:12] Logged the message, Master [17:27:33] !log demon synchronized wmf-config/db-eqiad.php 'Removing underscores from class names' [17:27:40] Logged the message, Master [17:28:04] andre__: woot [17:28:08] andrewbogott: i meant -- woot [17:28:12] bond1 is up [17:30:52] Now I'm nervously watching a conversation in #openstack where someone has suggested that neutron requires three different nics... [17:31:06] Waiting with fingers crossed for someone to say, no, two is enough [17:35:33] Ah, there it is [17:35:35] i did it with two in vmware, though neutron was less than impressive [17:36:38] andrewbogott, mark, Coren: I added an RT section to https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration#Relevant_RT_tickets based on searching through RT. Are there are any glaring omissions? [17:36:58] jgage, I may hit you up for assistance in the next couple of days. But first… sleep! [17:37:06] Thanks LeslieCarr, catch you later! [17:37:11] sleep well :) [17:37:23] cool ok, i'll copy that vm from $oldlaptop [17:41:19] LeslieCarr: re OS X irc clients: LimeChat and Textual have both been stable for me. I currently use Textual built from their github repo https://github.com/Codeux/Textual. 
[17:41:37] * yuvipanda loves LimeChat and it's scrolling bottom pane with chatter from all windows [17:41:55] can keep an eye on things on other channels without leaving [17:42:01] <^d> I hid that bottom pane ages ago. [17:42:06] <^d> :p [17:43:17] That is the biggest LimeChat feature that I miss in Textual. I think Textual is cleaner code to hack on for those of us that are so inclined but the authors kind of hide the fact that they are open source. [17:44:41] drdee: At first glance, that covers the big points. [17:47:17] Coren: ty [17:54:42] the abuse filter on enwiki is causing havoc [18:11:28] (03CR) 10Cmjohnson: [C: 031] decom professor, add decommissioning.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/109884 (owner: 10Dzahn) [18:12:56] (03CR) 10Cmjohnson: [C: 031] "This is an old SUN, there isn't any need to power back up." [operations/dns] - 10https://gerrit.wikimedia.org/r/109286 (owner: 10Dzahn) [18:13:53] jeff_green: do you want to review the changes mutante made and commit them or should I? [18:23:14] cmjohnson1: I reviewed the one set of changes I saw [18:25:28] (03PS1) 10RobH: changing ttl for svn.w.o for future migration behind misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/110213 [18:25:37] hrmm [18:25:48] our dns should support use of 5M in place of 1H [18:25:53] rather than the seconds... i think. [18:26:30] paravoid: You about? https://gerrit.wikimedia.org/r/#/c/110213/1/templates/wikimedia.org is legit right? [18:26:35] I rather not push to test ;] [18:26:44] (I know I can put in seconds instead, but meh.) 
[18:27:20] ack, disregard ping, found another one in a different template file, sorry dude [18:27:40] (03CR) 10RobH: [C: 032] changing ttl for svn.w.o for future migration behind misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/110213 (owner: 10RobH) [18:32:21] (03PS3) 10Cmjohnson: payments1-4 decom [operations/puppet] - 10https://gerrit.wikimedia.org/r/110156 (owner: 10Dzahn) [18:37:30] (03CR) 10Jgreen: [C: 032 V: 031] payments1-4 decom [operations/puppet] - 10https://gerrit.wikimedia.org/r/110156 (owner: 10Dzahn) [18:39:32] RobH: the DNS change is okay, svn behind misc-web-lb is problematic, though. I just commented that on the RT [18:40:30] manybubbles: about? [18:40:50] paravoid: not me, I think [18:40:55] hi [18:42:14] since you're here, question: you are familiar with lsearchd, if memory serves, right? [18:42:37] (03PS2) 10Dzahn: decom professor, add decommissioning.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/109884 [18:42:51] (03CR) 10Cmjohnson: [C: 032] decom professor, add decommissioning.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/109884 (owner: 10Dzahn) [18:44:00] manybubbles: cmjohnson1 needs a 10' downtime for searchidx1001 to replace a hardware component; is it going to be possible? [18:44:39] manybubbles we can schedule for anytime (day or night)...whatever has least impact [18:44:46] paravoid: let me check [18:45:25] cmjohnson1: I'm actually kind of surprised that a BBU replacement needs a downtime, can you double-check? [18:45:31] that box is so hammered. [18:45:55] manybubbles: that's what the hardware replacement is about [18:45:58] to fix its I/O performance [18:46:06] that'd help. [18:46:21] write behind with battery backup? [18:47:01] paravoid: in order for it to fix the problem...then yes it will need to reboot. the controller will run the battery though a charge cycle to be sure it's OK. 
Then it will re-enable write-back on the cache [18:47:04] the BBU is probably faulty, which has made the raid controller fall back to write-through, which makes the box super-slow I/O wise, very frequently at 100% I/O wait [18:47:24] cmjohnson1: we can do that with megacli [18:47:59] honestly you can probably do it any time [18:48:05] just let me know and I'll restart the lsearchd process [18:48:12] because it doesn't have an init script [18:48:19] there are croned processess that run and do thing [18:48:23] there is nothing that says we can't do it hot but I would prefer not to jic [18:48:26] but they will pick up the next time around [18:48:51] ok, do it please [18:50:42] paravoid: ahh, noted, i'll read ticket in moment, thanks [18:52:28] manybubbles: lmk when you want to do it and go ahead and shut it down [18:52:42] cmjohnson1: you want me to shut it down? [18:52:48] any time is fine. now, whatever [18:53:14] ok now [18:53:48] !log shutting down searchidx1001 for hardware fix [18:53:57] Logged the message, Master [18:54:13] shutdown requested [18:54:25] I can't comment on how long that'll take [18:54:44] well, it is refusing ssh so I imagine it is mostly ready for you [18:54:50] cmjohnson1: ^ [18:54:50] okay..as soon as i see it go off I will make the swap [18:54:55] sweep [18:54:56] sweet [18:56:22] PROBLEM - SSH on searchidx1001 is CRITICAL: Connection refused [18:56:52] PROBLEM - RAID on searchidx1001 is CRITICAL: Connection refused by host [18:57:12] PROBLEM - puppet disabled on searchidx1001 is CRITICAL: Connection refused by host [18:57:12] PROBLEM - Disk space on searchidx1001 is CRITICAL: Connection refused by host [18:57:22] PROBLEM - DPKG on searchidx1001 is CRITICAL: Connection refused by host [18:59:57] yeah yeah yeah [19:00:08] still hasn't powered off yet [19:01:06] yikes. 
I wonder if I should have killed some processess to give it a push [19:05:02] PROBLEM - Host searchidx1001 is DOWN: PING CRITICAL - Packet loss = 100% [19:09:52] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical [19:10:02] RECOVERY - Host searchidx1001 is UP: PING OK - Packet loss = 0%, RTA = 1.50 ms [19:10:04] manybubbles ^ [19:10:12] RECOVERY - puppet disabled on searchidx1001 is OK: OK [19:10:12] RECOVERY - Disk space on searchidx1001 is OK: DISK OK [19:10:15] happy days [19:10:22] RECOVERY - DPKG on searchidx1001 is OK: All packages OK [19:10:22] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [19:11:31] I've restarted the incremental updater [19:11:53] I won't try to manually rekick off the croned tasks. They'll start on their own and folks are used to them being abit flakey [19:13:11] how much cache does it have? [19:23:05] (03PS1) 10Alexandros Kosiaris: Renamed labstore100[34] to labsdb100[45] [operations/dns] - 10https://gerrit.wikimedia.org/r/110220 [19:26:18] Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU [19:27:02] http://ganglia.wikimedia.org/latest/graph.php?r=2hr&z=xlarge&h=searchidx1001.eqiad.wmnet&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Search+eqiad [19:27:32] manybubbles: ^ [19:28:40] paravoid: thanks. The load could have dropped off because I didn't restart all the croned things. Best to check again in the morning. Still, I've noticed that it is able to write at a higher rate which is nice [19:30:07] it still gets high in i/o load, but the BBU should make a difference [19:30:11] ok, going now [19:30:12] bye all! [19:31:12] bye [19:33:17] bye paravoid [19:33:52] (03PS1) 10RobH: Revert "changing ttl for svn.w.o for future migration behind misc-web-lb" [operations/dns] - 10https://gerrit.wikimedia.org/r/110221 [19:34:09] ok, since it wont move, revert! 
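The BBU and cache-policy checks from the searchidx1001 exchange above map onto MegaCli roughly as follows. This is a hedged sketch; flag spellings vary between MegaCli versions, so verify against the locally installed binary:

```sh
# Hedged MegaCli sketch for the write-back/BBU dance described above.
megacli -AdpBbuCmd -GetBbuStatus -aALL   # is the BBU present, healthy, charged?
megacli -LDGetProp -Cache -LALL -aALL    # shows e.g. "WriteBack, ... No Write Cache if Bad BBU"
megacli -LDSetProp -WB -LALL -aALL       # request write-back caching
```

With the "No Write Cache if Bad BBU" policy shown in the log, a failed battery silently drops the controller to write-through, which is exactly the 100% I/O-wait symptom paravoid describes; replacing the BBU (and letting it complete a charge cycle) re-enables write-back.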
[19:34:56] (03CR) 10RobH: [C: 032] Revert "changing ttl for svn.w.o for future migration behind misc-web-lb" [operations/dns] - 10https://gerrit.wikimedia.org/r/110221 (owner: 10RobH) [19:42:16] !log db1002 replacing disk at slot 1 [19:42:23] Logged the message, Master [19:45:22] PROBLEM - RAID on db1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [19:49:04] (03PS1) 10Ottomata: Adding jgonera to admins::mortals group for deploy rights [operations/puppet] - 10https://gerrit.wikimedia.org/r/110222 [19:49:27] !log tungsten replacing failing disk at slot 10 [19:49:36] Logged the message, Master [19:50:45] (03CR) 10Ottomata: [C: 032 V: 032] Adding jgonera to admins::mortals group for deploy rights [operations/puppet] - 10https://gerrit.wikimedia.org/r/110222 (owner: 10Ottomata) [19:52:22] PROBLEM - RAID on tungsten is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [19:52:24] cmjohnson1: ok to merge your payments decom commit? [19:52:44] ottomata: yes..sorry about that [19:54:18] no worries [20:11:55] ottomata: yo [20:12:09] hiyaaaa [20:12:21] so, yeah, let's delete those two gerrit repos [20:12:25] git-deploy and sartoris [20:12:26] oh ok! [20:12:27] awesome [20:12:42] our git-deploy is the python port of the perl git-deploy, and sartoris is the salt backend [20:12:42] it'll help reduce the confusion factor [20:12:47] nop [20:12:48] and we want to rename them both to trebuchet stuff [20:12:49] *nope [20:12:50] no? 
[20:12:54] we use the perl frontend [20:13:04] but our repository i mean [20:13:05] sartoris was a python port of that [20:13:14] we never used it [20:13:27] the backend of the system has always been in the deployment module in puppet [20:13:30] it's really never had a name [20:13:45] hm [20:13:49] we decided we'd name everything sartoris, then there was an issue with that that I'll ignore [20:13:58] so, I decided I'd name everything trebuchet [20:14:11] and then I wrote the frontend from scratch and called it trebuchet-trigger [20:14:47] now there's trebuchet (the salt backend), trebuchet-trigger (the git frontend) and trebuchet-ricochet (the web interface) [20:15:13] trebuchet is most up to date in puppet in the deployment module [20:15:38] I'm working on switching that upstream to git-hub, rather than wikimedia's puppet repo [20:15:49] here's all the stuff that's in github: https://github.com/trebuchet-deploy/ [20:16:06] for trigger and ricochet, github is the proper upstream [20:16:26] trigger has debianization [20:16:36] ricochet and trebuchet will soon too [20:16:42] ok cool, i think i'm mostly interested in trigger, right? 
[20:16:50] trigger and trebuchet [20:16:52] i want to build in support for grabbing jars from urls [20:16:59] and verifying hashes [20:16:59] trigger, as the name implies, just triggers a deployment [20:17:08] it could generate the jars, grab them, etc [20:17:13] and place them somewhere to be deployed [20:17:19] but to deploy it, you'll need to use trebuchet [20:17:30] hmm, interesting [20:17:30] hm [20:17:42] i'm thinking that this can actually be done very simply, maybe not even dependent on java stuff [20:17:55] just enabling individual files to be deployed based on urls and hashes [20:18:07] yeah, that's sane [20:18:13] trigger could generate them [20:18:15] so, maintain a list of urls to .jar or whatever files in a config file somewhere [20:18:19] and trebuchet could grab the hashes [20:18:31] and then when a deploy happens, those files are downloaded, hashes computed and compared to what is in the file [20:18:35] config file [20:18:50] it may be a good idea to make a directory for each deployment [20:18:54] with all the necessary files [20:19:10] could this just be a feature of a normal git-deploy maybe? [20:19:11] on the minion side it's ideal to be able to quickly revert [20:19:18] ignore git-deploy [20:19:21] ok [20:19:23] you don't need to use git [20:19:28] hmmm [20:19:31] (for deployment) [20:19:38] you can write any style of deployment you want [20:19:41] ok, i am about 25% familiar with this system, so appreciate the handholding [20:19:44] :) [20:19:47] (03PS1) 10Aude: Have each site group use own cache key for wikibase [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110224 [20:19:52] trigger (git deploy blah) is necessary [20:19:59] but for the backend it can be anything you want [20:20:06] you could use git annex [20:21:00] in fact, git annex may be a sane way to handle this, in fact [20:21:11] s/, in fact$// [20:21:27] oh hm [20:21:43] I'm assuming you've seen this?
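The url-plus-hash scheme being sketched in the conversation above — a config file mapping artifact URLs to expected digests, with files downloaded and verified at deploy time — might look roughly like this. The config layout, URL, `ARTIFACTS` name, and function names are all invented for illustration; the chat doesn't specify them:

```python
# Hypothetical sketch of the url+hash deploy scheme discussed above: a
# config maps artifact URLs to expected SHA-256 digests; at deploy time
# each file is downloaded, hashed, and compared before being placed in
# the per-deployment directory.
import hashlib
import urllib.request

# Example config entry (URL and pairing are made up for illustration):
ARTIFACTS = {
    "http://archiva.example.org/repo/foo-1.0.jar":
        "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
}

def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def fetch_and_verify(url: str, expected: str) -> bytes:
    """Download url and raise if its SHA-256 digest does not match."""
    data = urllib.request.urlopen(url).read()
    digest = sha256_of(data)
    if digest != expected:
        raise ValueError(f"hash mismatch for {url}: {digest} != {expected}")
    return data
```

Keeping the URL-to-digest list in a file under git, as suggested later in the chat, means changes to the artifact set go through normal code review.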
https://wikitech.wikimedia.org/wiki/Trebuchet/Design [20:21:57] ugh, this image is still named sartoris: https://wikitech.wikimedia.org/wiki/File:Sartoris.png [20:22:00] * Ryan_Lane renames [20:22:47] ah yes i have seen that [20:23:31] \o/ https://wikitech.wikimedia.org/wiki/File:Trebuchet.png [20:24:16] so, in that diagram, everything except for git-deploy is trebuchet [20:24:25] (and git-deploy is now trigger) [20:24:38] Ryan_Lane: the fetch is done against the local repository where git-deploy is run, right? [20:24:39] so [20:24:48] yeah [20:25:06] running git deploy sync triggers salt on the deploy hosts to fetch against the local repo there [20:25:08] ok [20:25:10] trebuchet modifies the origin and all submodules to point to the deployment server [20:25:35] yeah. technically it calls a runner, which calls the salt modules on the minions [20:26:29] but that's easy enough to ignore [20:26:30] hhmm git annex actually looks good [20:26:34] indeed [20:27:02] ottomata: what you want to look at is this: puppet/modules/deployment/files/modules [20:27:15] mostly deploy.py [20:27:24] right now it's written to be very git specific [20:27:55] deploy_server.pp? [20:27:56] we'd want to split this into multiple modules: deploy_git.py, deploy_.py, etc [20:28:18] then from deploy, you'd want to call the method based on the type of deployment from the config [20:28:32] oh files sorry [20:28:56] __salt__['deploy_.'](args, kwargs) [20:29:02] or something along those lines [20:29:16] that way we can have multiple methods that implement the same interface [20:30:03] this may take a bit of refactoring, of course [20:30:12] yeah, grokking... 
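The `__salt__['deploy_.'](args, kwargs)` idea above — splitting `deploy.py` into per-backend modules and having one entry point dispatch on the deployment type from the repo config — could be sketched like this. The backend names, the in-process stand-in for salt's `__salt__` cross-call dict, and the `checkout` signature are all assumptions for illustration:

```python
# Sketch of the per-backend dispatch discussed above. In a real salt
# module, __salt__ is provided by the loader; here a plain dict stands
# in for it so the dispatch logic can be shown self-contained.
def deploy_git_checkout(repo):
    """Hypothetical git-based checkout backend."""
    return f"git checkout of {repo}"

def deploy_annex_checkout(repo):
    """Hypothetical git-annex-based checkout backend."""
    return f"annex checkout of {repo}"

# Stand-in for salt's cross-call dictionary:
__salt__ = {
    "deploy_git.checkout": deploy_git_checkout,
    "deploy_annex.checkout": deploy_annex_checkout,
}

def checkout(repo, config):
    """Dispatch to the backend named by the repo's deployment type."""
    method = config.get("type", "git")  # default to the current git path
    return __salt__[f"deploy_{method}.checkout"](repo)
```

The point of the pattern is that every backend implements the same interface, so the top-level `deploy` module never needs to know which mechanism a given repo uses.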
[20:30:21] so, I have a development environment for this in labs [20:30:22] RECOVERY - RAID on tungsten is OK: OK: optimal, 1 logical, 2 physical [20:30:36] which is terribly enough the "sartoris" project [20:30:43] let me add you to that project [20:30:52] I have a new trebuchet project I'll be moving everything into [20:31:29] What's your username on wikitech? [20:31:42] ok cool [20:31:43] ottomata [20:32:06] Ryan_Lane: if we were to use git annex, could we just add support for that, like _update_gitmodules? [20:32:13] yes [20:32:22] that wouldn't need a new deployment method then [20:32:24] that would be a pretty simple way to do this without refactoring [20:32:33] I'm all for that if you'd like to do it [20:32:43] jars and other not-in-git content could be managed with just git annex commands then [20:32:47] hmm [20:33:02] so, in the sartoris project are the following instances: sartoris-server, sartoris-deploy, sartoris-target, sartoris-target4 [20:33:02] i think…that would be ok, right, hm [20:33:14] server is a puppet/salt master [20:33:20] deploy is a deployment server [20:33:31] target and target4 are targets ;) [20:33:36] ah ok cool [20:33:37] awesome [20:33:50] they are all pointing to the salt and puppet master that's on server [20:33:58] are the deployment projects puppetized like they are in production? [20:34:01] so you can do local dev there for testing [20:34:07] yeah [20:34:32] there's basically nothing custom in the dev environment right now [20:34:48] you can check out their puppet config via the "configure" action in wikitech [20:35:05] role::deployment::test?
[20:35:21] ah, parsoid::production too [20:35:24] awesome [20:35:24] ok [20:35:24] yeah [20:35:27] coool [20:35:34] so, you can set up anything you want for targeting [20:35:38] awesome [20:35:42] it should work just like it does in production [20:35:46] ok great [20:36:03] I spent quite a bit of time setting this up during my last sprint :) [20:36:06] ok, so, since i'm really trying to get this working sooner rather than later, would you be ok with me adding the annex support (if that is what I do) to the python files in puppet? [20:36:12] yep [20:36:15] or would you rather me help you refactor the final stuff and get us up to date on that [20:36:19] do it in puppet [20:36:31] ok cool [20:36:32] we can work the changes upstream later [20:36:38] ok awesome, thanks [20:36:45] hopefully the annex thing will make this way easier, i don't fully understand it yet [20:36:49] the downside is that it won't show your changes in github [20:36:50] but it looks like it should work [20:36:54] unless you upstream the changes ;) [20:36:56] heh, yeah :/ [20:37:07] well, if the python is basically the same in upstream trebuchet [20:37:08] we'll do that when I'm ready to move things [20:37:11] so you get proper credit [20:37:13] then that shouldn't be hard? [20:37:17] yep [20:37:19] exactly [20:37:28] ok cool [20:38:07] so, a kind of pain in the ass way of doing dev for this is to develop inside of the puppet module [20:38:17] and to restart the puppetmaster, and to run puppet on the server [20:38:23] it'll sync the modules to the targets [20:38:33] oh... [20:38:34] hm, heh, yeah [20:38:34] hm [20:38:49] I have trigger in use in labs, rather than the perl frontend [20:39:00] i could edit the synced modules on the targets?
[20:39:01] so changes to that should go to github [20:39:10] you could, but I wouldn't recommend it [20:39:18] heheh [20:39:18] you could disable puppet [20:39:28] one sec [20:39:44] well, if annex works like I think it will, i don't think i will need to mess with the front end [20:39:51] true [20:40:16] if config['checkout_annex']: blablalb [20:40:17] so... [20:40:17] or whatever [20:40:42] on server: /srv/salt/_modules [20:40:47] that's where modules are sync'd from [20:41:12] if you disable puppet and edit that directly, you can just run: salt '*' saltutil.sync_modules [20:41:23] if you run puppet it's going to overwrite your work though! [20:41:29] so make sure to back that up [20:41:33] or to make it into a repo [20:41:34] ah ok awesome [20:41:45] ok cool, i'll probably edit locally [20:41:46] so ja [20:41:58] editing the modules directly on the minions is hard [20:41:59] Reedy: do you know if we have php pdo enabled on the cluster? [20:42:04] because it does stuff with caching and such [20:42:05] actually, when I need to test things like this in labs, i have a really hacky rsync setup so I can edit locally [20:42:16] heh [20:42:26] so i will probably edit locally, rsync to sartoris-server (right?) and then run saltutil.sync [20:42:29] I usually edit in git, then do a git push [20:42:31] that would be fine, right? [20:42:34] yep [20:42:37] k cool [20:42:56] I'm definitely interested in having this support ;) [20:43:08] yeah, this will be simple and generic and awesome if it works this way [20:43:13] git sucks at large objects [20:43:20] annex is good with it, though [20:43:27] i've been playing with archiva, and it will let me curl whatever i want from it [20:43:32] and it works with s3/swift, etc [20:43:47] and annex looks like it verifies checksums, maybe?
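The labs dev loop agreed on above — edit the salt module locally, rsync it to the master's `/srv/salt/_modules`, then resync to the minions, with puppet disabled so a puppet run doesn't clobber the edit — can be summarized as a small helper that just composes the command strings rather than running anything. The hostname comes from the chat; the exact ssh/rsync invocations are illustrative:

```python
# Sketch of the edit-locally / rsync / saltutil.sync_modules loop
# discussed above. It only builds the command strings; nothing is
# executed here, so it is safe to run anywhere.
def sync_commands(host="sartoris-server", module="deploy.py"):
    """Return the shell steps to push an edited salt module to all minions."""
    sync_root = "/srv/salt/_modules"  # where the salt master syncs custom modules from
    return [
        # disable puppet first, or the next puppet run overwrites the edit:
        f"ssh {host} puppet agent --disable",
        # push the locally edited module up to the master's sync root:
        f"rsync -av {module} {host}:{sync_root}/",
        # distribute the updated modules to every minion:
        f"ssh {host} salt '*' saltutil.sync_modules",
    ]
```

Editing on the minions directly is discouraged in the chat because salt caches synced modules there, so the master's sync root is the one place worth editing.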
[20:43:57] yes [20:43:59] hmm, ok still have lots of research to do [20:44:11] basically, all we want is a way to maintain a list of files and checksums [20:44:14] let me know if you have any issues, I'm around to help [20:44:15] and change them (in git?) [20:44:24] drdee: PDO and pdo_mysql seems to be on the apaches [20:44:24] that way the list goes through git review [20:44:28] awesooome [20:44:31] ok thanks Ryan_Lane [20:44:35] yw [20:44:37] * ottomata archives this conversation [20:44:40] thanks Reedy [20:44:46] * yuvipanda annexes the conversation  [20:45:03] Ryan_Lane: since you are around, do you know if this is still relevant: https://rt.wikimedia.org/Ticket/Display.html?id=2111 ? [20:45:40] drdee: for migration this doesn't matter [20:45:47] but it's still relevant, yes [20:45:50] ok, ty [20:46:01] maybe not with neutron, but I don't know what they use [20:46:59] (they being neutron) [20:53:42] ottomata: trigger isn't working properly with submodules yet [20:53:45] fyi [20:53:47] oh [20:53:56] it's on my plate for this week [20:54:14] new trigger or what we have now? [20:54:22] the old thing is the perl git-deploy [20:54:25] trigger is the new thing [20:54:44] one more question mr Ryan_Lane: still relevant https://rt.wikimedia.org/Ticket/Display.html?id=1876 ? [20:54:47] just a warning in case you try it and it isn't working :) [20:55:03] drdee: can you just give me the ticket description? [20:55:09] I'm on my lyft laptop [20:55:12] sure [20:55:14] "Instance creation in different projects causes IP addresses to be reused" [20:55:38] mind pm'ing me the full description? [20:56:22] and, sorry, Ryan_Lane, which is the labs sartoris project using? [20:56:36] trigger [20:56:54] now that I remember it, that's in a pretty inconsistent state.
let me fix that [20:57:03] ottomata: I'll fix that today [20:57:20] I was working with that in a virtualenv [20:57:27] when it's just me I don't mind things being in a weird state :) [20:57:47] aye ja [21:04:42] AHHH Ryan_Lane! [21:04:43] # git annex addurl http://kitenet.net/~joey/screencasts/git-annex_coding_in_haskell.ogg [21:05:01] ottomata: ? [21:05:07] yeah, annex is written in haskell [21:05:12] naw, [21:05:13] you shouldn't need to write any annex code [21:05:14] addurl support [21:05:15] just learned that [21:05:20] oh [21:05:21] heh [21:05:22] yeah [21:05:25] annex is pretty great [21:05:25] i think that is exactly what I want [21:05:48] where the url will be internal, right? [21:05:56] yeah, internal archiva instance [21:05:59] cool [21:06:07] or jenkins maybe one day [21:06:08] we will see [21:06:23] it would be cool to have trigger generate the annex config [21:06:36] we could probably add an action for that [21:07:08] at minimum it could be an extension, but I could see this being pretty useful [21:07:16] or does annex have a config file already? [21:09:26] (03PS11) 10Matanya: site: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/109507 [21:17:22] RECOVERY - RAID on db1002 is OK: OK: optimal, 1 logical, 2 physical [21:28:41] (03Abandoned) 10Yurik: Add more zero values to analytics header [operations/puppet] - 10https://gerrit.wikimedia.org/r/93006 (owner: 10Yurik) [21:32:54] ori: is the mwprof salt module yours? [21:33:00] ottomata: or yours?
[21:33:42] not mine [21:36:31] (03PS1) 10RobH: svn.wikimedia.org to use own cert, not wildcard [operations/puppet] - 10https://gerrit.wikimedia.org/r/110237 [21:37:00] just wondering, cause I looked at it and I don't think it could possibly work [21:37:17] it just does a subprocess call and doesn't specify the directory to run it in [21:42:24] (03CR) 10RobH: [C: 032] svn.wikimedia.org to use own cert, not wildcard [operations/puppet] - 10https://gerrit.wikimedia.org/r/110237 (owner: 10RobH) [21:45:04] !log updating svn to use own cert, service disruption may (but shouldnt) occur [21:45:13] Logged the message, RobH [21:48:12] !log svn.w.o on own cert, confirmed chain is properly functioning [21:48:20] Logged the message, RobH [21:50:01] (03PS1) 10Ryan Lane: Deployment module changes for trebuchet-trigger [operations/puppet] - 10https://gerrit.wikimedia.org/r/110239 [21:51:08] (03CR) 10Ryan Lane: [C: 04-2] "This change is in preparation to switching the deployment frontend from the perl git-deploy to trebuchet-trigger." [operations/puppet] - 10https://gerrit.wikimedia.org/r/110239 (owner: 10Ryan Lane) [21:57:37] (03PS1) 10Physikerwelt: WIP: Enable orthogonal MathJax config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110240 [21:57:41] hm, crap. I need a package pushed into the repo and I can't ssh into production [21:57:55] I also can't ssh into the build instance in labs [21:58:23] erg [21:58:27] can I help? [21:58:36] yeah, let me get into the build instance first [21:58:38] k [21:58:43] then you can push the package into the repo [21:58:44] k [21:59:45] oh crap [21:59:46] :( [21:59:57] andrewbogott_afk: we were using build-precise1 :( [22:00:36] oh well. 
I'll make a build instance in the trebuchet project [22:00:44] it'll be a good start to moving stuff [22:03:25] I might be able to get upstream trebuchet in good enough working order for us to use that directly [22:03:47] (03PS1) 10RobH: ticket.wikimedia.org to use own cert, not wildcard [operations/puppet] - 10https://gerrit.wikimedia.org/r/110242 [22:03:56] I won't promise today, though :) [22:05:37] (03PS2) 10RobH: ticket.wikimedia.org to use own cert, not wildcard [operations/puppet] - 10https://gerrit.wikimedia.org/r/110242 [22:08:58] (03CR) 10RobH: [C: 032] ticket.wikimedia.org to use own cert, not wildcard [operations/puppet] - 10https://gerrit.wikimedia.org/r/110242 (owner: 10RobH) [22:12:54] !log otrs now using its own cert per rt 6702, confirmed working chain [22:13:01] Logged the message, RobH [22:21:46] jgage: I think we're using the Obsolete: namespace now ("move" tab) [22:21:58] template is called {{old}} if you prefer that [22:25:24] ok thanks. someone suggested i use {{obsolete}} even though that's not a defined template. [22:26:27] hm i don't have a "move" tab, perhaps i am not powerful enough [22:26:46] jgage: maybe you're not a dinosaur still stuck on Monobook? [22:27:01] jgage: you have a dropdown with a "Move" link underneath [22:27:18] which is also apparently awful design because people keep not noticing it, but oh well [22:27:21] i have no idea what you mean by dinosaur or monobook [22:27:38] heh [22:28:34] ok RobH showed me the tiny triangle [22:31:32] RECOVERY - RAID on db1057 is OK: OK: optimal, 1 logical, 2 physical [22:38:07] robla: Hi... can you tell me how to get into rt.wikimedia? Seems like I don't have an account yet, although I was added on a change [22:38:21] s/cahnge/ticket/ [22:38:26] hoo, you can reply to the email you got [22:39:05] ottomata: ok, mh... I would prefer to being able to see the Initial ticket, though [22:39:14] (might be good to document the "How do I get an account on RT?" 
question, which is easier than many think ;) ) [22:39:27] greg-g: What's the process? [22:39:28] hoo: you need to get your password reset afaik [22:39:35] * hoo is NDAed, if that matters [22:39:44] something like: send an email to rt@wikimedia.org, then ask for a password reset [22:40:02] just an empty one? [22:40:04] oof i think that exists [22:40:16] (03Abandoned) 10Aude: Have each site group use own cache key for wikibase [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110224 (owner: 10Aude) [22:40:47] * greg-g shrugs [22:40:48] it might [22:42:23] SMTP error from remote server after RCPT command: [22:42:23] host mchenry.wikimedia.org[208.80.152.186]: [22:42:23] 550 Address rt@wikimedia.org does not exist [22:42:25] well :P [22:43:26] hoo: pick a queue: https://wikitech.wikimedia.org/wiki/RT#Which_queues_do_we_have_and_what_are_they_used_for.3F [22:43:33] sorry, I was wrong before [22:44:16] Ok, for now I guess I only need access-requests@rt.wikimedia.org [22:45:28] "You'll want the user's full first and last name, and the typical account name is first initial last name though it's not absolutely required to be in that form. " [22:45:35] Anyone want to do that maybe? [22:45:53] Or if it's faster one might want to mail/ msg / whatever me ticket 6731 (access requests) [22:46:28] MatmaRex: do you know what admin priv i need to be able to see the Obsolete namespace? I've just been added to some groups, enabled 2-factor auth, re-authenticated, but Obsolete still isn't in my list. [22:53:16] greg-g: ^ [22:57:41] bd808: does logstash take care of this rt ticket? https://rt.wikimedia.org/Ticket/Display.html?id=2934 [22:58:47] drdee: It doesn't do that yet, but it could be made to do it. [22:59:01] could you maybe chime in on that ticket? [22:59:27] Sure. I think we have the log feed that would be needed. Just need to make some classification rules and pipe data out to graphite. [22:59:54] sweet!
[23:07:41] jgage: no idea, i'm not ops, i just lurk a lot [23:07:56] how would i know that? :D [23:11:15] hehe ok, i'll ask a cow [23:11:18] ottomata: --^ (see above) [23:13:02] <> [23:13:07] hoo, i will be watching that ticket this week, as I am on RT duty [23:13:12] heh, dberror and dbperformance logs is silly :) [23:13:12] it needs approval from someone, but i'm not sure who [23:13:15] Robla maybe? [23:13:17] csteipp: ? [23:13:55] AaronSchulz, that must be https://en.wikipedia.org/wiki/Special:AbuseFilter/554 [23:14:18] AaronSchulz: someone created a filter at en.wo today that matched all of the edits and it caused weird stuff (dunno if the news reached you here) [23:14:24] ottomata: Yeah, I'm not sure what the official policy is. I'll ask robla. [23:14:26] (the failsafe triggered in the end) [23:14:26] ottomata: Ok, to answer the question, I want/ will be helping with Wikidata deploys (what aude already does)... also do some volunteer stuff in the Admin tools part... but mostly it's about Wikidata [23:14:56] there's a VPT thread about the wiki being sluggish, you might want to look at it [23:15:00] what hoo says [23:15:09] Oh, you're here :) [23:15:29] stuff like abuse filter, etc... investigating bugs might require seeing parts of the db not on labs [23:15:35] or helping with wikidata [23:15:42] MatmaRex: I already made a patch [23:17:02] AaronSchulz: there are a couple of throttling-related bugs, btw [23:17:20] It takes 3 people to do a deploy? [23:17:39] Reedy: Nah, usually we're two [23:17:51] Daniel is around sometimes, though [23:18:04] I was meaning 3 people with "shell"? [23:18:21] And I thought aude wasn't "allowed" to deploy to the cluster? [23:18:34] Reedy: Where's the problem with that? I don't know what aude is allowed to [23:19:35] Too many cooks etc [23:19:57] * RobH waits for someone to deploy without putting it on calendar. he likes watching greg-g yell at folks. [23:20:26] :) [23:21:07] I can understand your concerns...
I don't think any of us will be "just doing stuff", we are usually coordinating things well in advance [23:21:09] having help to debug things would be good [23:21:24] i don't think we all need to be in deploy group (yet) [23:21:44] One in deploy group should be enough [23:22:04] for now [23:22:35] * aude normally won't deploy but suppose at some point for small stuff as i get super confident how to do stuff [23:22:54] of course, bd808 has mostly removed the need for shell to view logs [23:23:10] yay logstash [23:23:35] Except for the "only open to wmf group" issue that we are trying to work through [23:23:35] <^d> RobH: I did it twice today ;-) [23:23:38] (03PS1) 10RobH: set lists.wikimedia.org to use own cert, not wildcard [operations/puppet] - 10https://gerrit.wikimedia.org/r/110257 [23:23:45] ^d: gerrit doesn't count! [23:24:01] or else you'd fill the calendar [23:24:03] <^d> Wasn't counting it ;-) [23:28:34] (03CR) 10RobH: [C: 032] set lists.wikimedia.org to use own cert, not wildcard [operations/puppet] - 10https://gerrit.wikimedia.org/r/110257 (owner: 10RobH) [23:38:36] (03PS1) 10Chad: Stop making AdminSettings symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/110259 [23:39:55] :9 [23:40:00] *:( [23:40:45] <^d> Reedy: Feeling nostalgic for AdminSettings? [23:40:45] <^d> :p [23:41:17] It's amusing that they're still there [23:49:10] <^d> Reedy: Indeed. It still exists in wmf-config on tin ;-) [23:54:28] (03PS1) 10RobH: setting lighttpd's config for lists.wikimedia.org cert [operations/puppet] - 10https://gerrit.wikimedia.org/r/110260 [23:56:20] (03CR) 10RobH: [C: 032] setting lighttpd's config for lists.wikimedia.org cert [operations/puppet] - 10https://gerrit.wikimedia.org/r/110260 (owner: 10RobH) [23:58:40] !log all lists.w.o updates done and now on individual certificate [23:58:40] RobH: Hm.. doc.wikimedia.org is still infinite looping for some people.
It works on my machine, but on Trevor's for example he's getting nowhere [23:58:45] https://gist.github.com/trevorparscal/ac9df065059656311973 [23:58:48] Logged the message, RobH [23:58:53] it's pointing to misc-lb-eqiad [23:58:57] which is the new one right? [23:59:04] yep [23:59:18] points to misc-web-lb which handles ssl termination for it