[00:05:50] hashar: i don't know any background about gallium, but i'm going to take a look at it [00:06:21] binasher: hi:) it has been upgraded using Ubuntu dist-upgrade [00:06:33] since we had some stuff not fully puppetized [00:07:19] good to know [00:08:02] \/win 38 [00:08:11] I thought it could be PHP related but it is hard to know really since everything got upgraded [00:08:23] php5-apc is installed at least [00:08:40] also the machine went to swap earlier (around 8pm UTC) [00:09:10] some PHP process started eating all memory. but that must be a bug in either Jenkins or our php scripts. [00:10:44] and it looks like jenkins eats a lot of disk [00:10:47] according to atop [00:11:42] binasher: actually, killing the job_random inequality and leaving the order by works too [00:12:54] !log installing package upgrades on marmontel (blog) [00:13:02] Logged the message, Master [00:21:40] binasher: sorry, heading to bed. 1:20am there :/ [00:21:59] if you find anything, simply reply to the email :-] [00:22:11] have a good night! [00:27:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:31:09] New patchset: Jérémie Roquet; "(bug 41526) Disable the Contest extension on mwwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31165 [00:35:59] binasher: here is what I have so far https://gerrit.wikimedia.org/r/#/c/31129/3/includes/job/JobQueueDB.php [00:40:30] TimStarling: can you look at https://gerrit.wikimedia.org/r/#/c/31129/3 ? [00:40:59] I'm in a meeting [00:42:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.020 seconds [00:43:00] AaronSchulz: why does claimOldest order by job_random? [00:43:08] vs timestamp [00:43:28] it doesn't seem to actually claim oldest [00:43:31] binasher: job_random is indexed and is timestamp based in that case [00:43:46] well, it's always indexed [00:44:46] binasher: I may do it by job_id later by an the index, which I may do while making some other changes (dropping the old job_cmd index, adding a job_retries column) [00:45:33] *by a new index, erm [00:45:53] ordering by job_id asc will always be oldest to newest [00:46:30] yes, I might do that when I make some other db changes, but right now there is no job_cmd,job_id index [00:46:34] but ok, job_cmd_token index [00:46:51] maybe I'll rename job_cmd_token -> job_cmd_token_rand while at it [00:47:17] I hate it when indexes don't mention everything in it ;) [00:48:15] where is job_random changed into a time based value? [00:48:48] binasher: look at insertFields() [00:49:40] just a (job_cmd, job_token) index might be ok for that case, pk is at the end of every secondary [00:50:10] does that work for sorting? [00:50:27] actually I was looking for a straight answer in the docs for that just earlier [00:50:49] is insertFields() in a different patch?
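For readers following the index discussion above: the claim pattern being debated boils down to "find the first unclaimed row of a given command in index order, then mark it with a token so concurrent runners don't grab the same job". Below is a toy sketch of that idea, not MediaWiki's actual JobQueueDB code, with a heavily simplified schema, and using SQLite via the better-sqlite3 npm package purely to keep the demo self-contained (production runs MySQL/InnoDB).

```js
// Toy job-claim sketch -- illustrative only, not MediaWiki's JobQueueDB.
const Database = require('better-sqlite3');
const db = new Database(':memory:');

db.exec(`
  CREATE TABLE job (
    job_id     INTEGER PRIMARY KEY,
    job_cmd    TEXT NOT NULL,
    job_token  TEXT NOT NULL DEFAULT '',
    job_random INTEGER NOT NULL  -- timestamp-seeded in the scheme discussed above
  );
  CREATE INDEX job_cmd_token ON job (job_cmd, job_token, job_random);
`);

function claimOne(cmd, token) {
  // Find the first unclaimed job of this command type in index order.
  const row = db.prepare(
    `SELECT job_id FROM job
      WHERE job_cmd = ? AND job_token = ''
      ORDER BY job_random ASC LIMIT 1`
  ).get(cmd);
  if (!row) return null;

  // Claim it; the job_token = '' guard makes a concurrent claimer lose the race cleanly.
  const res = db.prepare(
    `UPDATE job SET job_token = ? WHERE job_id = ? AND job_token = ''`
  ).run(token, row.job_id);
  return res.changes === 1 ? row.job_id : null;
}
```

("pk is at the end of every secondary" above refers to InnoDB appending the primary key to every secondary index, which is why a plain (job_cmd, job_token) index could also serve an ORDER BY job_id.)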
[00:51:04] binasher: it's in that file, not the patch [00:51:48] ah, it's already supported, ok [00:52:21] i do think it would word in that case [00:52:45] binasher: I was too cheap to add another index...though I'm ok doing that if the (job_cmd, job_namespace, job_title, job_params(128)); index is nuked [00:53:16] * AaronSchulz imagines that one is a little expensive [00:53:45] which it can be since we use job_sha1 now [00:57:57] i think job_random meaning different things in different cases is a bit counter intuitive [00:57:58] but [00:57:59] eh [00:58:25] binasher: yeah, it will be dealt with soon :) [00:58:27] frankly, i'd rather see all of this as a stepping stone towards replacing JobQueueDB with something else [00:58:43] JobQueueShinyThing [00:59:03] first I need to finish up performance and retry attempts for the db one [00:59:05] so i don't know if it's really worth putting too much effort into polishing this beyond the point of functioning well [00:59:12] then I'll look into something else [00:59:41] binasher: how php do you know? ;) [01:00:18] and getting the indexes and naming perfect isn't critical to make it function well [01:00:27] what's php? [01:15:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:17:06] binasher: meh, I won't bother rearranging the deck chairs...I mean renaming that one index [01:17:29] ;) [01:24:08] binasher: in the meantime, you can compile your ideas for other subclasses, like redis [01:29:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.092 seconds [01:32:50] New patchset: Asher; "adding redis class to mc pmtpa servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31166 [01:33:53] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31166 [01:42:31] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 285 seconds [01:44:39] New patchset: Asher; "remove superfluous variable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31167 [01:45:00] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31167 [01:47:47] New patchset: Asher; "fix typo not caught by puppet parser validate" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31168 [01:48:42] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31168 [01:54:06] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 15 seconds [01:54:10] New patchset: Asher; "fix default pkg version" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31169 [01:54:36] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31169 [01:57:10] New patchset: Asher; "fnord" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31170 [01:57:25] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31170 [02:00:41] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 290 seconds [02:01:02] New patchset: Asher; "by default, name redis server after package name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31171 [02:01:16] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours [02:01:16] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [02:01:16] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 
hours [02:01:31] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31171 [02:03:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:18:49] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.016 seconds [02:28:04] !log LocalisationUpdate completed (1.21wmf3) at Thu Nov 1 02:28:03 UTC 2012 [02:28:13] Logged the message, Master [02:54:45] !log LocalisationUpdate completed (1.21wmf2) at Thu Nov 1 02:54:45 UTC 2012 [02:54:55] Logged the message, Master [02:58:00] RECOVERY - Puppet freshness on brewster is OK: puppet ran at Thu Nov 1 02:57:57 UTC 2012 [03:56:04] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds [03:59:15] !log tstarling synchronized php-1.21wmf3/includes/job/JobQueueDB.php 'fix deadlocks' [03:59:21] Logged the message, Master [04:11:37] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [04:47:20] !log aaron rebuilt wikiversions.cdb and synchronized wikiversions files: reverted wmf3 deployment made earlier today. [04:47:24] Logged the message, Master [04:54:14] TimStarling: Hm.. just revisiting http://tstarling.com/presentations/Tim%20Lua%202012.pdf and couldn't help notice that table indexes are 1 based, not 0. [04:54:21] looking it up confirms it [04:54:40] how annoying. Is that influenced from somewhere or just a Lua oddity? [04:55:20] I mean, its not bad. just different than pretty much every other language I know. [04:56:13] FORTRAN arrays start at 1 [04:56:48] Aaron|home: that's a large community of sixty year olds that we could tap to work on templates [04:57:09] Success rate of builds on Travis CI: https://twitter.com/konstantinhaase/status/263235120151027712 [04:57:09] interesting [04:57:09] * ori-l is just trolling. Disregard. [04:57:09] and it supports complex numbers! [04:57:10] you just don't know enough languages ;) [04:57:41] I know about FORTRAN's existence and rough place in history / family, but that's about it. [04:58:23] arrays in old dialects of BASIC were 1-based, and you couldn't set a zero element if you wanted to [04:58:33] you just had to offset [04:58:33] TimStarling: not the languages that you think count, i guess :P [04:58:51] at least in lua, you can set elements with keys less than 1 [04:59:01] I also find odd how it creates a new property in a table by referring to the property in the function name of a function declaration [04:59:29] TimStarling: Keys can be strings as well, right? [04:59:31] it's just that array constructors start from 1 if you don't specify a key, and some library functions return arrays indexed from 1 [04:59:35] anyway, that's the sort of superficial property that people outside the language huff and puff about and that doesn't end up mattering at all, like semantic whitespace in python. [04:59:48] yes, or tables or functions [04:59:57] any value can be a key [05:00:07] Hm.. even functions and other tables? [05:00:11] Interesting.. [05:00:12] two days after you start working with python you forget that "semantic whitespace" is even a thing; the only time it comes up is when you're talking with skeptics who don't know the language [05:00:16] So they go by reference then I suppose? [05:00:20] Or is it serialized? [05:00:21] ditto for go's object system more recently [05:00:26] or lack thereof [05:01:01] people approach a language by looking for the things that they know and if their expectations are violated they get upset. 
[05:01:03] I mean in javascript you can do obj[function foo() { return 123; }] = 'Hi', but it will toString() the function body and use that as a key, so a similar (but not the same by reference) function will work for the same key. [05:01:24] better sort out this operations issue first [05:01:36] sure thing [05:02:23] ori-l: meh, as an observing javascript developer I know better than that. The most common problem and most commonly with this one language, javascript, is that people don't learn it before they write it. [05:03:31] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [05:03:34] i don't think it's such a severe problem any more. it's fun to rage about people who don't know what they're doing, but the mean quality of JS code "in the wild" has improved tremendously over the past five years, and no one mistakes the language for a toy anymore. no one who isn't complete idiot, anyway. [05:03:34] Douglas Crockford even uses that as his general theme throughout his presentations. How people try to beat it into something its not, thus limiting themselves to the overlap with other languages, and ignoring some of the most powerful features. [05:03:57] ori-l: I disagree. It depends a lot on where you look. [05:04:18] Krinkle: you have to look hard nowadays to find things like var x = Array(); [05:05:07] I often help out on StackOverflow. Just a few days ago I got a student asking a question, who claimed to be in a class room. The examples the professor was giving were awful, to be ashamed of. [05:05:19] the page was deleted so I can't share it anymore [05:05:34] good javascript books written a few years ago, like stoyanov's JS patterns, still devoted half their bulk to ill-conceived ways of implementing classical inheritance in JS. but that stuff jsut disappeared from the recent literature. [05:05:38] The problem is transforming [05:06:21] it isn't ignorant people who don't want to be writing javascript, its the new generation who want to do it, but are given shit from the old generation (some of them, that is. There is many great devs, too, of course) [05:07:41] e.g. leaving off 'var', everything global. incorrectly using new, or writing x = Array(1, 2, 3, 4) indeed. [05:07:43] There is so much. [05:08:26] yeah. the traps and edge cases are js's big problem [05:08:34] most of it is due to the flaws in the language, but that only proves the writer didn't "read the manual". [05:09:06] if you know the language, you wouldn't fall in the traps as you wouldn't try to write such code to begin with. [05:09:47] there's still edge cases that everybody traps into from time to time, but fortunately we have code quality checks for that now (like jshint) [05:09:58] and ES5 strict mode. [05:10:45] yes, but -- i'm a pretty experienced JS developer, and if you had to sit behind me and watch me code and were not allowed to say anything i bet you would go barking mad in ten minutes. [05:11:03] today? [05:11:37] yeah, there are just sooooo many tiny things one has to know to avoid [05:11:44] Maybe, it depends on whether I'm in the mode of what's good enough and what I would want to do instead. [05:12:26] although 'good enough' is a tricky one. I can't define it myself. 
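Krinkle's obj[function ...] example above (which the conversation comes back to just below) is easy to verify; a small runnable sketch of the property-key coercion being described, in plain JavaScript:

```js
var obj = {};
obj[function foo() { return 123; }] = 'Hi';

// The function is never invoked -- it is passed through toString() and the resulting
// source text becomes the key (the exact string is engine-dependent).
console.log(Object.keys(obj)); // e.g. [ 'function foo() { return 123; }' ]

// Arrays are just objects with digit-string keys, so x[0] and x['0'] hit the same slot.
var x = ['x'];
console.log(x[0] === x['0']); // true

// Any non-string key goes through the same coercion; plain objects become '[object Object]',
// so two different objects used as keys would collide.
var k = {};
obj[k] = 'collides';
console.log(obj['[object Object]']); // 'collides'
```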
[05:12:32] hello gerrit-wm & gerrit-wm_ [05:13:08] logmsgbot_, logmsgbot [05:13:18] example: obj[function foo() { return 123; }] = 'Hi' [05:13:21] what the flying fuck [05:13:34] PROBLEM - Puppet freshness on ms-fe3 is CRITICAL: Puppet has not run in the last 10 hours [05:13:35] Objects are hashes, indexed by strings [05:13:42] this just made my head explode. i would have sworn you'd end up with obj[123] [05:13:48] of course not [05:13:50] er nevermind i guess you aren't invoking it [05:13:51] there is no invocation [05:13:56] yeah [05:13:58] i just misread that, nevermind [05:14:10] any non-string is passed through [[toString]] [05:14:17] So they go by reference then I suppose? [05:14:17] Or is it serialized? [05:14:24] by reference [05:14:38] TimStarling: that's pretty cool [05:14:48] and it has both strong and weak references [05:15:13] say if you have some objects managed by some other module, and you're writing an extension to that module [05:15:28] and you want some data associated with the foreign object [05:15:54] TimStarling: often in JS there is the issue of "private" keys. For example in an implementation of the Purse model or Safe system. You'd get an object (could just be an empty object) which is then the unique key for whatever data. [05:16:02] so you make a table with weak keys, indexed by object reference, and store your data in it [05:16:30] but since in JS objects can't be keys, one has to work around it with an array. Then look up the object in the array and use the index as the ID internally. [05:16:38] then when the foreign module deletes all references to the object you're interested in, the garbage collector magically deletes it from the weakly keyed array [05:16:47] very nice [05:17:18] ori-l: Since arrays are just objects in javascript, even "array" indexes are string keys. var x = ['x']; return x['0']; [05:17:34] Although I know from V8 that it optimises for this (in that x[0] is faster than x['0 [05:17:44] yes, i know. i read an extra () into that earlier [05:18:04] whereas it is usually the opposite since it has to convert 0 into a string to do the lookup. [05:19:16] TimStarling: aha, that's even more awesome. in the JS implementation that wouldn't work since the object reference would be stored in the array as well, so it'd never garbage collect on its own. [05:19:20] if you write "var x = [ 14, 23, 1232 ];" v8 will store that as an array of ints [05:19:34] if you then do x.push('foo'); it has to box the array and it's expensive [05:20:20] I'm not sure if it optimises for the kind of values, but it does optimise for the kind of keys (e.g. for simple arrays [0] will be faster than ['0'], until it gets a string property) [05:20:22] it works very well most of the time without you having to think about it, but it's useful to know to avoid violently breaking the engine's expectations about the types flowing through [05:20:49] ori-l: you know this? (I don't) [05:20:58] Or hypothetical. [05:20:58] yes [05:20:58] nice [05:21:48] for example, if you have a function, function sum(a, b) { return a + b; } [05:21:55] and you invoke it often, and always with ints [05:22:03] * robla reads backlog and notes that Pascal also had 1-based arrays [05:22:10] *has even [05:22:19] robla: okay, people in their 60s *and* 50s [05:22:24] * ori-l ducks [05:22:30] 40s even :-P [05:22:43] * ori-l was taught pascal in elementary school [05:22:46] shh [05:23:02] child abuse! 
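Tim's weak-keyed-table pattern a few messages up now has a direct JavaScript analogue. A minimal sketch using WeakMap (standardized in ES2015, well after this conversation, so it genuinely wasn't an option at the time):

```js
// Associate private data with objects owned by another module, without keeping them alive.
const extensionData = new WeakMap();

function tag(foreignObj, data) {
  extensionData.set(foreignObj, data); // keyed by object reference, held weakly
}

function getTag(foreignObj) {
  return extensionData.get(foreignObj); // undefined if never tagged
}

// Usage: some other module owns `widget`; we attach our own bookkeeping to it.
let widget = { id: 42 };
tag(widget, { lastSeen: Date.now() });
console.log(getTag(widget)); // { lastSeen: ... }

// Once the owning module drops its last reference, the garbage collector is free to
// reclaim both the object and its WeakMap entry -- the behaviour Tim describes for
// Lua tables with weak keys.
widget = null;
```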
[05:23:07] you whippersnappers and your 0-based arrays [05:23:15] Krinkle: anyways v8 will compile that into code that works on ints [05:23:25] 1-based makes more sense, its just that the world has turned somehow. [05:23:35] humanly speaking anyway. [05:23:41] it's all C's fault [05:23:49] if after 1000 calls the 1001th call is sum(4, "hello"), the engine has to slam on the breaks [05:24:19] arrays are just pointers....bah! [05:24:21] and basically reinterpret your code [05:24:52] anyway, as Trevor so eloquently puts it, It is important not to optimise for the optimiser! [05:24:55] robla: arrays are just objects that have special syntax for getting and setting keys that are strings of digits! [05:24:57] it's so simple! [05:25:16] (especially given that there are more than 1) [05:25:31] ... in js. [05:26:21] ori-l: + some Array.prototype methods for convenience, which naturally inherit from Object.prototype of course. [05:26:40] It becomes especially tricky when prototype objects themselves are objects that inherit from the Object.prototype [05:27:24] so even when there is a 20-long inheritance chain (e.g. HTMLDivElement > HTMLElement > Element > Node ... > Object) every prototype object inbetween is also an object [05:28:12] ori-l: btw, did you know that the worst thing in javascript is also what we use inevitably in the browser? [05:28:18] with statement. [05:28:32] Every inline and external script is evaluated in with (window) { .. } [05:29:20] which is why 'document' is a "global" variable and why "x = 5" is an implied global (because that's how a with statement works). var a = {}; with (a) { foo = true; } . creates a.foo [05:30:06] which is yet another reason why browsers are evil, but javascript itself isn't so bad. js doesn't really have implied globals I believe. [05:30:21] rant! 
brb later [05:42:55] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [05:42:55] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [05:42:55] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [06:22:58] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out [06:37:19] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.032 second response time on port 8123 [06:47:22] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out [06:48:52] RECOVERY - Lucene on search1016 is OK: TCP OK - 9.027 second response time on port 8123 [06:55:55] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [06:59:31] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out [07:05:50] RECOVERY - Lucene on search1016 is OK: TCP OK - 3.026 second response time on port 8123 [07:24:34] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out [07:28:55] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [07:30:20] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.031 second response time on port 8123 [07:35:58] RECOVERY - Lucene on search1016 is OK: TCP OK - 9.029 second response time on port 8123 [07:38:49] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [07:40:12] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.031 second response time on port 8123 [07:40:20] !log Killed and restarted lucene on search1016 [07:40:26] Logged the message, Master [07:41:45] New patchset: Mark Bergsma; "Make the backend weights equal to upstream" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31182 [07:42:05] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31182 [07:44:40] sigh [07:55:36] New patchset: Mark Bergsma; "Fix double spaces" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31183 [07:57:06] New patchset: Mark Bergsma; "Fix double spaces" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31183 [07:57:51] New patchset: Mark Bergsma; "Fix double spaces" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31183 [07:58:21] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31183 [07:59:13] PROBLEM - Puppet freshness on db51 is CRITICAL: Puppet has not run in the last 10 hours [08:00:16] d'oh [08:00:21] i need more coffee again [08:01:12] New patchset: Mark Bergsma; "Revert "Fix double spaces"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31184 [08:01:31] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31184 [08:07:48] New patchset: Mark Bergsma; "Fix double spaces" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31185 [08:08:18] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31185 [08:08:59] :) [08:10:10] really more coffee [08:10:31] New patchset: Mark Bergsma; "Fix method name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31186 [08:10:50] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31186 [08:16:33] hehe [08:16:43] cp3003 sees all eqiad backends as sick [08:16:56] the moment I turn on prefer_ipv6 [08:16:56] it's all happy [08:22:01] New review: Hydriz; "Hmm, this is 
weird, not sure why it doesn't work." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/30588 [08:24:30] New patchset: Mark Bergsma; "Allow extra runtime parameters" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31188 [08:24:56] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31188 [08:27:56] New patchset: Mark Bergsma; "Puppet's lack of string concatenation sucks" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31189 [08:28:12] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31189 [08:35:12] New patchset: Mark Bergsma; "Could not use $extraopts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31190 [08:35:39] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31190 [08:40:20] !log running some nonpriority jobs manually on mw12 (so people later don't get weirded out bu the ganglia graphs) [08:40:25] Logged the message, Master [08:40:38] New patchset: Mark Bergsma; "Prefer IPv6 when contacting eqiad backends in esams" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31191 [08:41:22] New patchset: Mark Bergsma; "Prefer IPv6 when contacting eqiad backends in esams" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31191 [08:41:42] hehe [08:41:49] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31191 [08:42:24] can I restart more upload caches in eqiad? [08:45:01] yes [08:46:14] !log Restarted backend varnish instance on cp1025 [08:46:18] Logged the message, Master [08:48:19] swift req/s doubled [08:50:29] New patchset: Mark Bergsma; "Add Ganglia cluster Upload caches esams" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31193 [08:55:46] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31193 [09:22:00] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer: [09:24:51] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [09:30:10] New patchset: Mark Bergsma; "Fix ganglia aggregators" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31197 [09:30:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31197 [09:45:48] hello [10:23:08] New patchset: Mark Bergsma; "Fix tcptweaks" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31199 [10:24:33] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31199 [10:25:30] Anyone available for deploying? [10:26:47] no, let me change that: Is anyone available to deploy code for me? [10:34:23] New patchset: Mark Bergsma; "Move generic::tcptweaks to base, where it belongs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31200 [10:35:35] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31200 [10:37:41] New patchset: Mark Bergsma; "Fix the dependencies of base::tcptweaks" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31201 [10:37:57] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31201 [10:46:58] hello OrenBochman [10:55:06] !log Restarted backend varnish instance on cp1026 [10:55:12] Logged the message, Master [11:02:23] New patchset: Nikerabbit; "Space attack, reduce. 
See I3aa4e3a3" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31205 [11:29:54] New patchset: Mark Bergsma; "Significantly lower the streaming threshold on backend instances" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31213 [11:30:18] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31213 [11:57:39] New patchset: Mark Bergsma; "Lower stream threshold for esams frontends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31217 [11:58:34] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31217 [12:01:35] hm? [12:01:54] New patchset: Mark Bergsma; "Pass $cluster_tier to frontends as well" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31219 [12:02:10] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31219 [12:02:22] aha [12:02:24] got it :) [12:02:35] it takes longer to download large files to esams [12:02:39] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours [12:02:39] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [12:02:39] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [12:02:40] and there are more http hops [12:02:46] yeah yeah [12:02:47] got it :) [12:02:49] if they all wait for the entire object to be in it adds up [12:03:00] thought of that before you said it [12:03:57] waiting 4s for a 5 mb file sucks :) [12:06:04] hey we don't have initcwnd 10 for ipv6 [12:06:04] hah! [12:06:04] I guess we don't [12:06:04] kernel supports it [12:06:08] but it's a little annoying to manage with puppet [12:06:26] i just fixed up the ipv4 one too [12:06:32] after I found some more varnish boxes without it applied [12:06:47] didn't you reboot them? [12:06:50] i'm pretty sure I did [12:07:15] but also this change applies only on the 2nd puppet run [12:07:16] which we should fix [12:07:21] because the first puppet run deploys the facts [12:07:29] and then the 2nd one the initcwnd change [12:07:35] the facts should be deployed by the fileserver facts module [12:07:37] or a puppet module [12:07:43] not through puppet, that's a deadlock [12:07:52] I put that in the rt ticket for leslie to fix, but she hasn't yet [12:07:58] what do you mean? [12:08:12] when are facts run? [12:08:13] before a puppet run [12:08:16] yes [12:08:18] if a fact is broken... [12:08:20] puppet won't run [12:08:29] so once she had a broken fact script [12:08:33] and it blocked all future puppet runs [12:08:41] for that reason [12:08:49] i manually fixed some boxes which were in that state [12:08:57] yeah, I've had a similar problem in the past [12:09:15] so if puppet would simply download them from the puppetmaster from the factsync module [12:09:16] the fact wasn't broken, it just didn't detect virtual correctly [12:09:19] instead of as a file resource [12:09:24] then we wouldn't have that problem [12:09:28] so it removed ntp and stuff, until the next run [12:09:52] are we deploying them as files? [12:09:53] oh dear :) [12:09:53] yes [12:10:06] so I asked leslie to fix that, but I guess she didn't understand it [12:10:06] yeah, we should deploy them as part of a respective module [12:10:10] yup [12:10:20] not a factsync module though [12:10:26] just place the facts where they belong [12:10:29] in fact... 
[12:10:34] yeah that's what I meant [12:10:38] i confused it with the old pluginsync [12:10:39] I started coding yesterday an apt module for an entirely differnt reason :-) [12:11:25] (I want to add the Ubuntu Cloud Archive and don't want to hack it up like some other repos in our puppet) [12:11:39] right [12:11:53] Change abandoned: Cmjohnson; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31070 [12:12:22] sigh, so much work to do [12:13:14] why won't firefox respect my /etc/hosts anymore [12:13:16] it's annoying [12:25:53] New review: Hydriz; "Look good, but..." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/30319 [12:29:33] !log Pooled cp3003 as upload Varnish cluster in PyBal with weight 1 [12:29:39] Logged the message, Master [12:31:52] are there more detailed logs for varnish errors? [12:31:56] i.e. http://upload.wikimedia.org/wikipedia/test2/thumb/7/78/Floating_in_the_dead_sea.webm/800px--Floating_in_the_dead_sea.webm.jpg [12:32:08] sometimes gives me a varnish error and sometimes an error from the imagescaler [12:35:17] !log Depooled cp3003, high rate of 500 responses [12:35:22] no there aren't [12:35:22] Logged the message, Master [12:37:25] hmm [12:37:31] lots of requests for one thumb which gives a 500 response [12:37:45] Wappen_Reinerzau.png [12:38:41] all 180px [12:38:42] while the original is 140 [12:44:12] sec [12:44:53] hi folks, mutante are you here? [12:45:12] there is some problem with instance creation on labs [12:50:11] back [12:53:28] petan: IIRC it's known (plus for mutante it's 5AM) [12:53:51] andre__ where [12:53:56] is it known [12:54:11] afaik I am subscribed to all wmflabs bugs [12:55:00] I'm not sure if it is properly logged [12:56:04] mark: could you give the exact URL? [12:56:19] upload.wikimedia.org /wikipedia/commons/thumb/a/ac/Wappen_Reinerzau.png/180px-Wappen_Reinerzau.png [12:56:19] or I'll find it from the logs I guess [12:56:21] ah cool [12:56:24] looking [12:56:24] i think it's just the image scaler saying "I won't scale to larger than original" [12:56:33] so not necessarily a problem [12:56:37] although a 404 would be a more appropriate response [12:57:01] yes that's what it is [12:57:35] 500 is what mediawiki sends [12:57:44] swift just proxies that [12:59:27] yes [12:59:29] varnish too [12:59:59] New patchset: Mark Bergsma; "Cache Retry-After 5xx responses" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31223 [13:00:00] until now :) [13:00:08] care to review? [13:01:41] looking [13:02:23] who sets Retry-After? [13:02:45] varnish backends? [13:02:52] I don't know [13:02:59] mediawiki? [13:03:32] no [13:03:56] varnish it seems [13:04:25] and it also replaces the body of the 500 [13:04:25] hmm wait [13:04:25] it's caching these already [13:04:25] i'm seeing HITs [13:04:32] try hitting http://ms-fe.pmtpa.wmnet/wikipedia/commons/thumb/a/ac/Wappen_Reinerzau.png/180px-Wappen_Reinerzau.png [13:04:33] lookup hits, not cache hits [13:04:46] that doesn't have a Retry-After [13:04:55] and it says "Error creating thumbnail: Image was not scaled, is the requested width bigger than the source? [13:05:09] indeed [13:05:11] but if you hit via upload.wm.org you get guru meditation [13:05:12] varnish should not replace that [13:05:15] ah I see why [13:05:15] and retry-after [13:05:19] i'll fix that [13:05:29] vcl_error? 
[13:05:42] Change abandoned: Mark Bergsma; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31223 [13:05:52] no [13:05:52] retry5xx [13:06:16] hmm [13:06:24] i can disable that [13:06:31] and replace it with more upload specific logic if needed [13:06:36] so yeah, we should tell Aaron to make it return 404 or something [13:06:38] that too [13:08:26] New patchset: Mark Bergsma; "Don't retry 5xx" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31225 [13:08:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31225 [13:10:05] tomorrow i'm going to the ceph workshop [13:10:13] cool! [13:10:20] not that I know anything about ceph yet [13:10:26] I should read up really ;) [13:10:48] ok i'm going to repool cp3003 [13:10:55] this wasn't its fault [13:12:12] !log Repooled cp3003 [13:12:20] Logged the message, Master [13:20:55] haha [13:21:00] I was looking at cp3003's backend instance [13:21:09] and I noticed about 100 requests/s, but 0 hits [13:21:15] and I thought "something's wrong" [13:21:42] took me a while to realize that there is only one frontend in front of cp3003, cp3003's frontend instance itself [13:21:54] and it has a 1 GB memory backend [13:21:57] it wasn't full yet ;) [13:22:22] took about 7 mins to fillup, and now the backend is having some cache hits [13:23:00] hehe [13:39:59] is there some way to get the current configuration for a request like http://upload.wikimedia.org/wikipedia/test2/thumb/7/78/Floating_in_the_dead_sea.webm/800px--Floating_in_the_dead_sea.webm.jpg from the imagescaler? $wgMaxShellMemory was upped to 400mb yesterday but the error still shows up, it does not show up if I upload the same file to a mediawiki instance with 400mb MaxShellMemory on 64bit, am I missing some other configuration that could [13:40:59] the error looks like its memory constraint, or is it possible that it starts to extract before the file is fully downloaded from swift? [13:41:39] j^: we ops really don't know much about mediawiki I'm afraid [13:42:03] for this specific problem Aaron from the platform team would know [13:42:23] he's being doing all the swift coding on mediawiki's side [13:44:31] ok, was just hoping there is an easy answer from the ops side since it works in labs, my vm etc, so its an issue with the production setup [13:44:45] will discuss further with Aaron [13:45:48] so, wait [13:45:50] what's the problem again? [13:46:25] I see some avconv errors [13:46:56] New patchset: Hashar; "link MediaWiki core nightly builds on CI portal" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31234 [13:46:56] New patchset: Hashar; "sort nightly MediaWiki builds by descending date" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31235 [13:47:03] paravoid: the problem is that image extraction with avconv on the imagescalers gives those errors [13:47:44] other installations with precice, 64bit, avconv dont show this behaviour [13:48:30] i assume they have enough space in /tmp [13:48:58] /dev/sda3 19G 251M 18G 2% /tmp [13:49:13] I surely hope we don't have 20GB videos :P [13:49:53] we might have eventually [13:50:00] so far not [13:50:20] then we should fix imagescalers to not fetch the whole video to create a thumb [13:50:43] in the longer term i think image extraction from videos should directly read from http swift urls [13:50:52] what do you mean? 
[13:51:01] oh you mean without writing it on /tmp first [13:51:16] to avoid the local copy, just open http to some range requests and save the jpg to tmp [13:51:20] do you generate one thumb per video or do you have multiple thumbs over the course of the video? [13:52:00] upped the weight on cp3003 [13:52:28] there is one thumb per video but also other resolutions. could extract one at upload time and derive the others from that instead of extracting it from the video [13:52:57] can't you create it from only a partial segment of the video? [13:53:11] i.e. range request the first 10MB or so and get the thumb? [13:53:12] not really [13:53:21] as long as you have a keyframe? [13:53:35] seeking does not work and you need the inital headers to setup the video codec [13:54:02] so make a sparse file, get first range, end range, and the range you want for thumb? [13:54:08] just getting the first part might be an option if we just seek some seconds in, right now its 50% [13:54:10] root@srv220:~# ps aux |grep -c avconv [13:54:10] 34 [13:54:41] can you send me the full output of the commands the run? [13:55:17] like the avconv arguments, there state etc [13:55:31] it's less actually, that counted the ulimit.sh and shell calls [13:55:43] /usr/bin/avconv -ss 21 -y -i /tmp/localcopy_3f3bb69f5e4b-1.webm -ss 2 -s 640x480 -f mjpeg -an -vframes 1 /tmp/transform_24955c6362e8-1.jpg [13:55:48] would want to see all theat, including ulimit [13:56:14] -rw-r--r-- 1 apache apache 3768145 Nov 1 12:20 /tmp/localcopy_15112730559e-1.webm [13:56:17] -rw-r--r-- 1 apache apache 3768145 Nov 1 12:21 /tmp/localcopy_38a26cce6ef5-1.webm [13:56:20] -rw-r--r-- 1 apache apache 3768145 Nov 1 12:31 /tmp/localcopy_3f3bb69f5e4b-1.webm [13:56:25] etc [13:56:26] 11 of these in srv220, 12 in srv222 [13:56:31] same file size, so presumably the same video [13:56:40] let me get you the command-line [13:56:54] apache 31498 0.0 0.0 4400 612 ? S 12:31 0:00 sh -c /bin/bash '/usr/local/apache/common-local/php-1.21wmf3/bin/ulimit4.sh' 50 400000 102400 ''\''/usr/bin/avconv'\'' -ss 21 -y -i '\''/tmp/localcopy_3f3bb69f5e4b-1.webm'\'' -ss 2 -s 640x480 -f mjpeg -an -vframes 1 '\''/tmp/transform_24955c6362e8-1.jpg'\'' 2>&1' [13:56:58] apache 31499 0.0 0.0 20752 1592 ? S 12:31 0:00 /bin/bash /usr/local/apache/common-local/php-1.21wmf3/bin/ulimit4.sh 50 400000 102400 '/usr/bin/avconv' -ss 21 -y -i '/tmp/localcopy_3f3bb69f5e4b-1.webm' -ss 2 -s 640x480 -f mjpeg -an -vframes 1 '/tmp/transform_24955c6362e8-1.jpg' 2>&1 [13:57:03] apache 31500 0.0 0.1 385788 10088 ? Sl 12:31 0:00 /usr/bin/avconv -ss 21 -y -i /tmp/localcopy_3f3bb69f5e4b-1.webm -ss 2 -s 640x480 -f mjpeg -an -vframes 1 /tmp/transform_24955c6362e8-1.jpg [13:57:24] all of them are the same, in two variations [13:57:37] one for 640x480 and one for 120x90 [13:58:34] starting from 12:13 UTC and ending at 12:41 UTC [13:58:38] these are same in two boxes, I guess the same on all of them [13:58:49] dude, you're killing the cluster :-) [13:58:54] so ulimit limits to 50 seconds why are they still running? [13:59:34] S and Sl also look more like they hang somehow [14:01:00] do the imagescalers have anything special in /etc/security/limits.conf that could explain why they hang? 
[14:01:57] no [14:02:01] limits.conf is empty [14:03:53] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [14:04:36] can you run md5sum on one of the webm files in /tmp [14:04:57] or some other idea how to map it back to the file thats causing this [14:05:11] I straced an avconv and it's not very helpful [14:05:20] just futex calls, nothing else [14:05:33] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.003 second response time on port 11000 [14:05:48] md5sum is 9b5969edf2bd1c25944b9076387ebbd7 [14:06:45] !log Restarted backend varnish instance on cp1027 [14:06:50] Logged the message, Master [14:08:24] ok found the file, http://test2.wikipedia.org/wiki/File:Romney_on_FEMA_Government_Spending.webm [14:08:31] how? :) [14:11:00] New patchset: Mark Bergsma; "No point in having Squid backends on Varnish frontends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31239 [14:11:43] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31239 [14:12:26] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [14:12:40] paravoid: http://test2.wikipedia.org/wiki/Special:FileList looking for webm files and md5sum on my local copy [14:12:50] mark: you missed the TODO :) [14:12:57] ? [14:13:09] no that still stands [14:13:18] j^: oh heh [14:13:31] j^: so, any ideas? [14:13:33] that should get replaced by the list I just removed [14:13:38] but that has squids now, not varnish servers ;) [14:13:44] you're our avconv expert I guess :) [14:13:52] paravoid: so testing it here locally if extracts fine without any errors [14:13:59] <^demon> Aw, Ryan's not around. [14:14:01] did you try with the ulimit? [14:14:02] in 300ms or so [14:14:58] paravoid: yes with same ulimit settings [14:15:03] heh, I tried running /usr/bin/avconv and those arguments and it did finish too [14:17:46] okay [14:17:50] so I reproduced it [14:18:02] I ran the ulimit [14:18:06] then re-ran avconv [14:18:21] and now I get [14:18:21] Error while decoding stream #0:0 Last message repeated 5 times [14:18:21] [vp8 @ 0xcbbb60] Discarding interframe without a prior keyframe! [14:18:21] [vp8 @ 0xcce700] Discarding interframe without a prior keyframe! [14:18:22] etc. [14:19:55] in your shell you ran ulimit? [14:20:01] yes [14:20:23] so [14:20:28] root 32098 0.0 0.1 385788 10096 pts/0 Sl+ 14:18 0:00 /usr/bin/avconv -ss 21 -y -i /tmp/localcopy_4d3acc727d00-1.webm -ss 2 -s 640x480 -f mjpeg -an -vframes 1 foo.jpg [14:20:40] see the 385788? [14:20:46] that's hitting the 400000 memory limit [14:20:58] and avconv probably doesn't handle memory exhaustion well [14:21:44] yes it fails randomly [14:21:52] ulimit -t 50 -v 400000 -f 102400 [14:21:59] just -v 400000 is enough [14:22:09] now i have no idea why does not fail here [14:22:21] what os do you run? [14:22:21] is it 64-bit? [14:22:25] is it the same version of avconv? [14:23:37] ubuntu 12.04/64bit avconv from ubuntu repositories [14:23:50] nice [14:24:34] avconv version 0.8.3-4:0.8.3-0ubuntu0.12.04.1, Copyright (c) 2000-2012 the Libav developers built on Jun 12 2012 16:52:09 with gcc 4.6.3 [14:24:59] avconv version 0.8.3-4:0.8.3-0ubuntu0.12.04.1 same here [14:25:13] anything in /usr/local on the imagescalers? 
[14:25:27] some libs in /usr/local/lib [14:25:55] no [14:27:08] * j^ makes sure all packages in imagescaler::packages are installed in vm [14:29:28] http://pastebin.mozilla.org/1898156 [14:29:58] Linux srv222 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux [14:30:02] (just in case) [14:32:04] j^: ^^^ [14:34:41] i get the errors here only once i limit it to ulimit -v 320000 [14:34:42] 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux [14:35:13] saw the pastebin? [14:35:27] here the errors stop at 800.000 [14:35:38] 750.000 is not enough, I stopped bisecting there [14:35:47] paravoid: yes just went through all of them and could not spot any difference [14:35:53] ah how many cores? [14:36:07] ha! good question [14:36:07] 8 [14:36:21] here we go [14:36:35] can you try adding -threads 1 just after avconv [14:37:04] yep! [14:37:06] worked [14:37:18] 2 worked too [14:37:23] and 3, but not 4 [14:37:35] with 400.000 that is [14:38:22] wow, cores of cause, will limit it to 1 thread, might even be able to go down to 300mb again with that [14:38:29] can you try that [14:38:51] 300.000 works with -threads 2 but not 3 [14:38:57] (and 1 obviously) [14:40:52] great [14:41:10] congrats :) [14:41:35] so libav folks forked off ffmpeg-mt? [14:41:35] not ffmpeg? [14:41:39] or did they reimplement multithreading? [14:42:08] everyone merged ffmpeg-mt [14:42:19] ohrly [14:42:21] I feel old [14:43:16] some changes only made it into ffmpeg and not libav i think but both have improved multithreading since [14:43:47] just that we hit it now with the precise update and having more cores... [14:44:03] I'm reading through libav.org [14:44:13] I've heard of the fork but haven't ever seen the site [14:44:13] *so* cool [14:44:21] finally, the ffmpeg madness stops [14:45:27] i've got 3500 req/s on cp3003 now [14:53:00] New patchset: Demon; "Adding daily gerrit backups to amanda" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31247 [14:53:09] hi mark, paravoid [14:53:40] does anyone know about squids (if any config is in gerrit) and/or can look at caching issues with ULS and wikidata for anons? [14:53:43] see https://bugzilla.wikimedia.org/show_bug.cgi?id=41451 [14:54:27] * aude doesn't know who the right person is for squids stuf [14:54:28] f [14:54:37] New patchset: Demon; "Adding daily gerrit backups to amanda" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31247 [14:55:11] New review: Demon; "PS2 is just a rebase." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/31247 [14:55:46] <^demon> Could someone please take a look at https://gerrit.wikimedia.org/r/#/c/31247/ for me? I will sleep much better at night. [14:56:03] paravoid: thanks for your help debugging this [14:56:03] New patchset: Jgreen; "fix local archive paths for db78 backup script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31248 [14:56:07] replied [14:56:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:31] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31248 [14:56:49] I was about to reply too [14:56:51] it's gonna be fun if we have to vary over hundreds of languages with varnish [14:57:11] hehe [14:57:26] and we were worrying about thumbs... [14:57:47] why is it the same domains/URLs for all languages? [14:57:54] indeed [14:58:09] j^: no worries! 
[14:58:28] <^demon> paravoid: It's like commons or mediawiki.org -- multilingual. [14:58:56] <^demon> Well, mw.org is mostly english. But like commons. [14:59:10] that's gonna cache badly [14:59:55] apergos: aren't you our amanda expert? [15:03:28] mark: it's even worse than that btw [15:03:34] accept-language isn't a single language [15:03:52] we need to normalize it in VCL if we want to be sane [15:04:08] wikidata isn't gonna serve public traffic anyway right? ;-) [15:04:16] just wikimedians [15:04:30] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [15:04:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.037 seconds [15:04:38] at least I hope [15:05:17] is the ULS coming form varnish or squid? [15:05:31] i have no idea what "ULS" means [15:05:37] universal language selector [15:05:42] me neither :) [15:05:42] sorry :) [15:05:56] it's the language selector at top of wikidata.org [15:06:10] sounds like something we should switch off :) [15:06:15] hahaha [15:06:24] which if you're logged out, usually is in english but sometimes we get random pages like in norwegian, icelandic, etc. that are stuck in cache [15:06:31] logged in, no problem because it does things using preferences [15:06:45] * aude also has problems just staying logged in [15:06:46] it's probably served by mediawiki proxied/cached by squid [15:06:48] sometimes [15:06:58] ok [15:06:59] no, no problem because you're bypassing the caches [15:07:07] when you're logged in [15:07:07] right [15:07:21] is there anything squid stuff to see in gerrit? [15:07:24] any configs? [15:07:26] no [15:07:29] :( [15:07:30] but that's not needed either [15:07:36] * aude hoping to understand better [15:07:36] this has nothing to do with squid's config [15:07:37] ok [15:07:43] when you serve a page [15:07:45] if squid is caching the wrong things, you're not sending the right HTTP caching headers [15:07:52] ah, okay [15:08:00] so probably someone should read up on HTTP and caching headers :) [15:08:05] it needs to cache cookies i think? or something [15:08:11] squid is going to cache it for everyone, because it doesn't know that the content is different based on an incoming http header [15:08:19] hmm [15:08:19] specifically the Vary header, and X-Vary-Options [15:08:21] but you can tell that to squid on the reply [15:08:25] ok [15:08:34] * aude looks at my headers [15:08:34] and Cache-Control [15:08:38] ok [15:08:56] you can tell it Vary: Accept-Language, meaning that "this content is valid only for *this* value of this incoming HTTP header(s)" [15:09:18] so squid/varnish can keep multiple copies of the same URL [15:09:30] one for each value of Accept-Language [15:09:38] that's standard HTTP [15:09:44] *but* [15:09:52] having too many variants of the same page doesn't really work well with varnish [15:09:53] ok [15:09:58] or squid really [15:10:00] oh :( [15:10:02] so we shouldn't really vary per language, much less so for Accept-Language [15:10:36] so, basically, this probably never work for e.g. wikipedia [15:10:41] would* [15:10:47] ugh? [15:10:48] really? [15:11:02] we'd end up with hundreds or thousands of cached copies of the same articles [15:11:05] end we'd go down [15:11:11] and [15:11:17] hmm [15:11:24] and cache lookups would be slow [15:11:27] because of the linear search [15:12:45] what about if we somehow magically configure domains like de.wikidata.org/wiki/Q100 to be wikidata.org/wiki/Q100?uselang=de ? 
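A minimal Node.js sketch of the Vary mechanics paravoid describes above; illustration only, not how MediaWiki or the Wikimedia caches are actually configured:

```js
const http = require('http');

http.createServer((req, res) => {
  // Crude language pick from the request header (real parsing would honour q-values,
  // and a cache would want the header normalized to a small set of values).
  const lang = (req.headers['accept-language'] || 'en').split(',')[0].split('-')[0];

  // "Vary: Accept-Language" tells Squid/Varnish to keep one copy of this URL per
  // distinct header value -- potentially hundreds of variants per page, which is the
  // cache-fragmentation problem described above.
  res.setHeader('Vary', 'Accept-Language');
  res.setHeader('Cache-Control', 's-maxage=3600, must-revalidate');
  res.end('<p>Interface language: ' + lang + '</p>\n');
}).listen(8080);
```

Putting the language into the URL instead (commons-style ?uselang=de) sidesteps this: each language becomes a distinct URL and therefore a distinct cache object, with no per-header variant explosion.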
[15:12:54] like with apache rewrite rules? [15:13:01] * aude no expert at this stuff [15:13:02] Q100? [15:13:07] Q100 is an item page [15:13:13] I have no idea how wikidata works or is structured [15:13:17] http://www.wikidata.org/wiki/Q100 [15:13:36] those are especially important for the use to be able to choose a language [15:14:10] why is it called Q100? [15:14:19] that's offtopic, but I'm still wondering [15:14:34] PROBLEM - Puppet freshness on ms-fe3 is CRITICAL: Puppet has not run in the last 10 hours [15:14:38] it's the item id [15:15:02] we also have special:itembytitle/dewiki/Berlin which does a redirect to the item [15:15:25] we could do apache rewrites to that, but doesn't necessarily work for non data pages [15:17:12] New patchset: Jgreen; "more fundraising offhost_backups adjustments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31249 [15:18:32] aude: I think you send a mail to ops@ or engineering@ [15:19:03] 10kreq/s on cp3003 now [15:19:09] unless mark has an answer already [15:19:14] :-) [15:19:14] for apache rewrites? a bugzilla ticket works [15:19:32] no rob [15:19:32] no [15:19:34] ? [15:19:37] not for architectural design decisions [15:19:43] wikidata & caching [15:20:01] (and btw I am not anyone's amanda expert, that was all fred) [15:20:01] oh [15:20:14] apergos: but you do know amanda? [15:20:15] (I don't) [15:20:30] I was actually a developer stuck in office it land (cause our office it guys kept getting fired) at that time [15:20:44] no, sadly I don't [15:20:58] i used amanda, a long time ago [15:21:16] who knows amanda on our team? [15:21:22] at some point I looked at it for like 5 seconds when invistigating backups for the office, that was 3.5 years ago. and when I say looked at it, I mean features,issues, not a test install [15:21:22] noone [15:21:25] I mean, I can always merge it as it is [15:21:39] but that defeats the point of review, doesn't it [15:21:44] :-D [15:22:03] apergos: https://gerrit.wikimedia.org/r/#/c/31247/ enjoy [15:22:20] geethanks [15:22:23] looked at it 3.5 years ago is better than at all, so feel free. [15:22:23] ha, cp3003 filled its 96 GB of memory now [15:22:39] mark: I'm looking at the ganglia graphs and I don't see 10k req/s [15:22:40] liar :P [15:22:53] then you need glasses [15:23:42] although the correct ganglia metric for that is a bit fucked up, I do think it's around 10kreq/s RMS :) [15:23:49] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31249 [15:24:58] why do the req graphs have these ups and downs? [15:25:10] because there's something wrong with the ganglia plugin [15:25:11] ganglia plugin bug? [15:25:11] anyway [15:25:15] use varnishstat -n frontend [15:25:20] okay [15:25:20] ganglia is just difficult in that regard [15:27:19] New patchset: Jgreen; "grr sneaky typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31251 [15:28:42] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31251 [15:28:54] in this config, cp3003 would of course work better without the frontend/backend ;) [15:29:02] the frontend uses the most cpu by far [15:29:38] i'll have to experiment with bigger frontend cache memory as well [15:29:44] see if that makes much of a difference [15:29:58] what, you haven't coded the part where we're going to merge frontend/backends yet? 
[15:30:04] :P [15:30:18] no [15:30:33] i decided it's not important enough right now ;) [15:30:38] :P [15:31:32] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31247 [15:31:50] pybal has depooled it two times briefly [15:32:01] so cp3003 is now serving 25% of european upload traffic, right? [15:32:01] back, sorry [15:32:23] or is 33% [15:32:31] we'll be chatting with robla but [15:32:48] yeah 33% [15:32:50] think it might be nice to schedule a time to chat this over more with ops people [15:34:35] yes [15:35:14] I think we generally should default to mail and only if that doesn't work resort to phone calls [15:35:28] New patchset: Andrew Bogott; "Allow roles to insert arbitary lines into LocalSettings.php." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31252 [15:36:55] mail works for us [15:37:10] is there anyone specific we should email? or where does ops@ go? [15:37:20] ops@ would be best [15:37:23] ok [15:37:24] ops goes to the whole ops team plus a few platform people [15:37:25] perhaps wikitech as well [15:37:36] good idea [15:37:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:37:44] plus a few other people like Erik I guess [15:38:11] i'm not sure we are the best people to draft a strategy for this but i think something should be put somewhere? mediawiki.org or something [15:38:11] mark: so, what's stopping you from increasing the weight more? :) [15:38:23] try to outline the problem and solutions [15:38:25] i.e., when do you know when to stop [15:38:32] the fact that it was slightly unstable for the first time ;) [15:38:32] how even [15:38:32] but also the cache miss rate [15:38:48] 600 misses/s is quite a lot [15:39:01] it's going down [15:39:22] i'll increase it more but i'm waiting a bit [15:39:45] hehe [15:39:57] varnishhist still looks ok-ish [15:40:07] some requests taking just under a second, but very few [15:40:28] we hit ~1.1k req/s on swift again [15:40:44] ah [15:40:45] not now, half an hour ago [15:40:47] haven't really been watching that [15:40:50] what about our issues of being logged out? [15:40:54] i restarted some more boxes today [15:41:22] aude: that's either a caching issue if the right headers are not sent wrt cookies [15:41:30] or it's actual sessions getting lost on the mediawiki / redis / memcached layer [15:41:31] i keep getting logged out as [[User:Aude]], especially in chrome (even when clearing my cookies, etc.) [15:41:38] hmmm.... [15:41:43] !log reedy synchronized php-1.21wmf3/extensions/Wikibase [15:41:51] Logged the message, Master [15:41:51] * aude tries to log into wikipedia [15:42:13] yes, wikipedia works fine [15:42:43] i think something's not configured right, yet for wikidata but don't know where to look or which config [15:43:17] first read up on HTTP caching [15:43:33] this is never going to work well if wikidata developers don't know how that works very well [15:44:30] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [15:44:30] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [15:44:30] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [15:44:33] ok :) [15:44:58] the HTTP 1.1 RFC is a good read [15:45:10] i just wonder how we're doing things differently, if we have SUL, central auth, etc. 
all the same stuff as wikipedia and elsewhere [15:45:12] there's also a good o'reilly book on squid which explains it well [15:45:16] ok [15:45:30] well, for one, you have the same domain for all languages [15:45:31] i've poked at varnish but not squid [15:45:39] paravoid: ah ok [15:45:47] but commons seems to work ok? [15:45:52] varnish works the same in that respect [15:46:06] mostly english but [15:46:12] ok [15:46:58] commons uses ?uselang=$lang from what I can see [15:47:06] sure [15:47:26] wonder if there's a way to rewrite that but at the same time cache [15:47:34] <^demon> paravoid: uselang + javascript hackery. [15:47:41] :) [15:47:41] so foo?uselang=en is a different URL than foo?uselang=de [15:47:51] * aude nods [15:47:55] so they'll be separate cache objects [15:48:00] makes sense [15:48:05] ^demon: do tell! [15:48:43] <^demon> paravoid: They have some site JS that detects language, and then hides stuff for other languages. [15:48:56] jesus [15:48:58] <^demon> All of the the language stuff is sent to you. [15:49:11] hrm [15:49:39] New review: Demon; "Thank you!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31247 [15:53:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.040 seconds [15:54:45] !log reedy synchronized php-1.21wmf3/extensions/Wikibase/lib/includes/Utils.php [15:54:51] Logged the message, Master [15:56:05] !log reedy synchronized php-1.21wmf3/includes/Revision.php [15:56:10] Logged the message, Master [15:56:37] !log reedy synchronized php-1.21wmf3/includes/specials/SpecialUndelete.php [15:56:45] Logged the message, Master [15:58:12] so cp3003 got really unstable and I've pooled it at regular load [15:58:29] how's so? [15:59:00] i don't know the reason yet, i wasn't gonna figure it out while it was really unstable ;) [15:59:11] i'll look at it carefully on a next careful rampup [15:59:18] what do you mean unstable? [15:59:20] slow or 500s? 
[15:59:25] slow, nonresponding [15:59:26] pybal depooling [15:59:35] 15k load average :) [15:59:39] so probably a lot of stuck threads [16:01:22] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Upload%20caches%20esams&h=cp3003.esams.wikimedia.org&v=14930&m=frontend.n_wrk&r=hour&z=default&jr=&js=&st=1351785586&vl=N&ti=N%20worker%20threads&z=large [16:01:35] oh hah [16:04:58] i'm going to reinstall bits boxes with precise [16:05:01] and then call it a day [16:14:07] !log Reinstalling sq67 with Precise [16:14:12] Logged the message, Master [16:16:26] PROBLEM - Host sq67 is DOWN: PING CRITICAL - Packet loss = 100% [16:22:07] RECOVERY - Host sq67 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [16:25:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:26:36] PROBLEM - SSH on sq67 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:27:12] PROBLEM - Varnish HTTP bits on sq67 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:28:06] RECOVERY - SSH on sq67 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [16:32:34] !log Restarted backend varnish instance on cp1029 [16:32:39] Logged the message, Master [16:34:35] !log reedy synchronized php-1.21wmf3/extensions/TimedMediaHandler/ 'Update to master' [16:34:41] Logged the message, Master [16:35:10] !log reedy synchronized php-1.21wmf2/extensions/TimedMediaHandler/ 'Update to master' [16:35:15] Logged the message, Master [16:35:56] !log reedy synchronized php-1.21wmf2/extensions/MwEmbedSupport/ [16:36:06] Logged the message, Master [16:36:29] !log reedy synchronized php-1.21wmf3/extensions/MwEmbedSupport/ [16:36:35] Logged the message, Master [16:37:03] annoying puppet overloads [16:38:36] New patchset: Reedy; "Enable TMH and MwEmbed on enwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31265 [16:38:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.618 seconds [16:38:50] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31265 [16:39:13] Reedy: can we fix the issue that j^ was investigating earlier first? [16:39:16] oh too late I guess [16:39:26] What's that? [16:39:29] I've not deployed anything yet... [16:39:40] avconv hanged processes on all imagescalers [16:39:43] paravoid: that will be fixed with the update [16:39:48] ah, okay [16:40:09] I should killall avconv now then [16:40:53] All good then? :) [16:41:22] yes [16:42:18] !log reedy synchronized wmf-config/InitialiseSettings.php 'Enable TMH and mwembed on enwiki' [16:42:20] kdone [16:42:24] done even [16:42:24] Logged the message, Master [16:42:26] thanks! 
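On the cp3003 side, the pybal depools, the 15k load average and the worker-thread graph all point at thread pile-up rather than real CPU work. A rough way to confirm that from a shell on the box, sketched with the caveat that varnishstat counter names vary between Varnish versions and that a named instance needs -n:

```sh
# A five-digit load average means thousands of runnable or blocked tasks,
# i.e. piled-up worker threads, not genuine CPU load.
uptime

# Worker-thread counters (the n_wrk* family in Varnish 3); add "-n <name>"
# when querying a named instance such as a separate frontend.
varnishstat -1 | grep '^n_wrk'
```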
[16:45:02] New patchset: Mark Bergsma; "Fix interface_aggregate / interface_add_ip6_mapped deadlock" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31267 [16:45:42] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31267 [16:45:51] PROBLEM - NTP on sq67 is CRITICAL: NTP CRITICAL: No response from NTP server [16:50:33] !log reedy synchronized wmf-config/InitialiseSettings.php 'Disable TMH on enwiki' [16:50:38] Logged the message, Master [16:50:47] PROBLEM - Host sq67 is DOWN: PING CRITICAL - Packet loss = 100% [16:56:40] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [16:57:54] sigh [16:57:59] sq67's LACP doesn't want to come up [17:00:13] root@sq67:~# ifenslave ifenslave -d bond0 eth1 eth2 eth3 [17:00:13] Master 'ifenslave': Error: handshake with driver failed. Aborting [17:00:28] what is this shit [17:01:37] whatever [17:01:41] i'll look at it tomorrow or so [17:08:14] !log gallium is sent to swap from time to time. The cause seems to be the Ext-Wikibase job for which update.php eats all memory. I have disabled the job meanwhile. In case of trouble, simply kill the wild php process. [17:08:20] Logged the message, Master [17:10:16] hashar: Hi [17:10:29] (just Hi) [17:11:31] New patchset: Reedy; "Enable PageTriage on test2wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31275 [17:11:55] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31275 [17:12:40] Krinkle: just hi :-] [17:12:52] Krinkle: I am leaving again. Just came for gallium madness :-] [17:13:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:14:34] !log reedy synchronized wmf-config/InitialiseSettings.php 'Enable PageTriage on test2wiki' [17:14:39] Logged the message, Master [17:17:07] Reedy: Any idea why there's 2 gerrit-wm and logmsgbot? I thought those ran on different servers? [17:17:50] !log reedy synchronized php-1.21wmf3/extensions/TimedMediaHandler [17:17:55] Logged the message, Master [17:18:36] !log reedy synchronized php-1.21wmf3/includes/ [17:18:45] Logged the message, Master [17:19:27] paravoid: now you can killall avconv on the imagescalers if they are still running [17:19:34] I already did [17:26:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.024 seconds [17:27:14] paravoid: j^: why'd you need to killall on avconv? [17:28:40] robla: turns out the memory issue was related to threads: imagescalers have more cores than the test equivalents, and by default avconv creates threads based on the available cores; now it just uses one thread. But some avconv processes were hanging on the imagescalers from testing earlier today.
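A sketch of the two-part fix j^ describes: cap avconv at one worker thread instead of one per core, and clear out any transcodes still hanging from the earlier testing. File names and codec choices below are placeholders; only the -threads flag and the cleanup commands are the point.

```sh
# Cap avconv at a single worker thread regardless of core count.
avconv -i input.ogv -threads 1 -c:v libvpx -c:a libvorbis output.webm

# Clean up transcodes left hanging on an image scaler from earlier runs.
pgrep -fl avconv     # list any leftover processes first
killall avconv       # then ask them to exit
```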
[17:29:09] I'm going to be offline for a couple of minutes, but should come back on in a bit [17:34:08] !log reedy synchronized php-1.21wmf3/includes/Message.php [17:34:13] Logged the message, Master [17:34:36] !log reedy synchronized php-1.21wmf3/includes/cache/MessageCache.php [17:34:40] Logged the message, Master [17:41:42] !log reedy synchronized wmf-config/InitialiseSettings.php 'Re-enable TMH on enwiki' [17:41:52] Logged the message, Master [17:47:28] !log mlitn synchronized php-1.21wmf2/extensions/ArticleFeedbackv5 'desc' [17:47:34] Logged the message, Master [17:48:17] !log mlitn synchronized php-1.21wmf3/extensions/ArticleFeedbackv5 'desc' [17:48:23] Logged the message, Master [17:49:09] !log reedy synchronized php-1.21wmf2/includes/ [17:49:14] Logged the message, Master [18:00:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:00:36] PROBLEM - Puppet freshness on db51 is CRITICAL: Puppet has not run in the last 10 hours [18:02:14] !log reedy synchronized php-1.21wmf2/extensions/TimedMediaHandler [18:02:18] Logged the message, Master [18:11:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.877 seconds [18:14:08] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Test [18:14:12] Logged the message, Master [18:39:18] !log deleting duplicate docroot for wikidata in /h/w/common/docroot/ [18:39:23] Logged the message, Master [18:46:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:00:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.708 seconds [19:00:19] !log pulled IPv6 (only) announcements from AS1257 due to routing issues [19:00:25] Logged the message, Mistress of the network gear. [19:01:08] !log (modifying above ^^) pulled IPv6 (only) announcements from AS1257 on cr1-eqiad ONLY due to routing issues [19:01:12] Logged the message, Mistress of the network gear. [19:07:30] New patchset: Demon; "Moving github replication to wikimedia account" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31300 [19:08:58] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31300 [19:33:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:37:46] !log reedy synchronized php-1.21wmf3/extensions/ProofreadPage [19:37:52] Logged the message, Master [19:43:04] New patchset: MaxSem; "Kill wlm.wikimedia.org with fire" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31302 [19:47:41] New patchset: MaxSem; "Redirection rules for wlm.wikimedia.org" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/31303 [19:48:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds [20:01:55] LeslieCarr: just curious to know if you ever looked into why the varnish boxen have interesting daily spikes/death -- we have a couple of points where slow response times (as seen by watchmouse) seem to correlate with the spikes [20:02:32] i did not see anything myself but am not the best person for this -- binasher may have some more insight [20:02:46] ok; I'll poke him about it [20:03:06] actually; might just draft an email so that people can look when they have time [20:03:39] sounds good [20:15:12] how do I move files from one machine on the cluster to another? 
I'd like to move logs to some permanent storage before leaving yttrium [20:16:09] <^demon> MaxSem: scp? [20:20:16] I just got a "The table 'mw_text' is full (localhost)" error when trying to edit wikitech... [20:20:44] anyone know who I can bug about that? [20:21:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:21:49] <^demon> Ouch. [20:22:26] <^demon> binasher, perhaps? [20:24:41] mutante maybe [20:25:38] would it be better to just file an RT ticket about it? [20:26:08] <^demon> Well if mw_text is full then we've got a problem. [20:27:05] yep :) no one can schedule deployments for one :p [20:33:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.073 seconds [20:33:18] RobH: which cabinet do we use for XC orders again ? (0000 ? ) [20:33:48] yes, the rest should be 10X for row A, 20X for B, etc. [20:33:59] iirc [20:34:28] lemme check to make sure [20:34:47] LeslieCarr: yep. [20:34:57] 0000 is the dmarc xconnect cabinet on side of cage [20:34:59] cool thanks :) [20:35:10] ^demon, yes that worked, thanks [20:35:14] ordering the new TiNET cross connect :) [20:35:41] <^demon> MaxSem: You're welcome. [21:08:09] New patchset: Ori.livneh; "Deploy PostEdit to {fi,fr,id,te,vi}wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31320 [21:09:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:09:20] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31320 [21:17:46] !log freeing disk space on wikitech [21:17:54] Logged the message, Master [21:19:19] mwalker: (quick) fixed, but if you created a ticket pls keep it open. [21:19:54] mutante: nope; didn't create a ticket [21:20:45] ok, its fine, it can be considered part of an existin one (backup issue) [21:20:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.836 seconds [21:21:37] we'll just get a new instance anyways [21:23:09] AaronSchulz: fenari:/home/asher/db/bug41649/ [21:25:06] !log spage synchronized php-1.21wmf3/extensions/E3Experiments 'ACUX updates' [21:25:11] Logged the message, Master [21:26:34] binasher: how just need to parse that [21:26:43] *now I'll just [21:26:47] * AaronSchulz can't type [21:28:09] the s3 file has mediawikiwiki going back to the 29th [21:51:37] New patchset: Asher; "base pidfile on servicename" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31327 [21:52:04] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31327 [21:56:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:03:46] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours [22:03:46] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [22:03:46] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [22:07:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.949 seconds [22:10:58] New review: Dzahn; "15:08 < effeietsanders> mutante: please check with multichill" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/31302 [22:16:16] New patchset: Asher; "create parent to /a/redis/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31335 [22:16:39] Change merged: Asher; 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/31335 [22:29:30] New patchset: Reedy; "Enable import from meta on wikidatawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31336 [22:29:44] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31336 [22:30:10] binasher: is there a window for a wider memcached deploy? [22:30:46] !log reedy synchronized wmf-config/InitialiseSettings.php 'Enable wikidata import from metawiki' [22:30:48] Logged the message, Master [22:33:55] !log spage synchronized php-1.21wmf3/extensions/PostEdit 'latest PostEdit' [22:33:57] Logged the message, Master [22:34:25] Where would I file a bug about the ganglia reporting? [22:38:46] awight: https://bugzilla.wikimedia.org Wikimedia>General, probably [22:39:13] awight: RT (you just got access) [22:39:52] [22:40:00] heh [22:40:09] heh, or both and link them or which you prefer :p [22:41:12] mutante: rad, thanks. I will throw a party in RT [22:41:45] !log spage synchronized php-1.21wmf3/extensions/E3Experiments 'ACUX bump versions' [22:41:50] Logged the message, Master [22:43:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:43:47] !log Created translate tables on wikidatawiki [22:43:55] Logged the message, Master [22:46:32] New patchset: Reedy; "Bug 41585 - Add the extensions Translate and TranslateNotifications on Wikidata" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31342 [22:47:01] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31342 [22:48:31] !log reedy synchronized wmf-config/InitialiseSettings.php [22:48:36] Logged the message, Master [22:50:59] Reedy: same config as on Meta? [22:51:05] yup [22:51:53] Reedy: also for user rights? [22:52:27] probably not [22:53:49] New patchset: Reedy; "Disable and remove contest extension" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31343 [22:54:02] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31343 [22:55:25] New review: Alex Monk; "Reedy just did this in I4066878c." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/31165 [22:56:23] !log reedy synchronized wmf-config/ 'Disable Contest extension' [22:56:29] Logged the message, Master [22:58:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.028 seconds [22:59:41] !log Dropped contest tables from testwiki [22:59:49] Logged the message, Master [23:22:04] !log reedy synchronized wmf-config/InitialiseSettings.php 'Temporarily disable translate and translation notification on wikidatawiki' [23:22:07] Logged the message, Master [23:23:16] AaronSchulz: nothing scheduled, want to switch a few bigger wikis to pecl-memcached tomorrow? [23:23:51] as in Friday tomorrow? [23:24:27] ;) [23:24:40] !log reedy synchronized wmf-config/InitialiseSettings.php 'Revert that, didn't fix the issue' [23:24:45] Logged the message, Master [23:26:34] ok, my script seems to be parsing those log dumps fine [23:28:31] AaronSchulz: let's do it fri at 5pm, then go home [23:28:36] but not tell anyone [23:28:47] and go home drunk so no one can do anything [23:29:00] New patchset: Asher; "redis nagios check" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31349 [23:29:09] and forget the multiwrite [23:29:46] enwiki is good to test with, right? 
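Going back to MaxSem's question about moving logs off yttrium: ^demon's one-word answer, spelled out, looks roughly like the following; the destination host and both paths are invented for illustration, and scp only needs SSH access from the source machine to the destination.

```sh
# Run on yttrium: copy the log directory to longer-term storage,
# preserving modification times and modes (host and paths are placeholders).
scp -rp /var/log/myapp/ someuser@storage-host.wikimedia.org:/srv/archive/yttrium-logs/
```

rsync -a over SSH would be the usual alternative if the copy is large enough to need resuming.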
[23:30:05] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/31349 [23:32:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:33:23] I'm running scap on fenari for E3 and Fundraising [23:33:37] binasher: https://gerrit.wikimedia.org/r/#/c/31350/1 [23:34:06] LeslieCarr: http://www.cisco.com/en/US/products/hw/wireless/ps4570/products_tech_note09186a0080bb1d7c.shtml [23:34:33] preilly: fyi it's a different cisco model [23:34:47] LeslieCarr: yeah I figured [23:36:03] binasher: 1pm mon? [23:36:12] for mc [23:36:20] hey ops experts, I'm seeing screenfuls of include_once failures during scap? I think it's in "Updating ExtensionMessages-1.21wmf3.php" step. [23:36:26] ok, fine.. be reasonable [23:36:58] spagewmf: i think you want the non-ops experts in -tech or -dev [23:36:59] spagewmf: ignore them [23:37:23] preilly: there is an un-cisco recommendation to try a huge igmp timeout cuz apple sucks at multicast [23:37:44] LeslieCarr: ha ha [23:37:44] preilly: try again ? [23:38:39] hmm, still need to call updateIfNewerOn [23:39:30] LeslieCarr: thanks for looking into this for me [23:39:52] is it working now ? [23:40:27] AaronSchulz: have you done a dry run? [23:41:09] binasher: I've only tested the first part mostly, I'll do a dry-run after I make some tweaks [23:42:52] !log spage Started syncing Wikimedia installation... : PostEdit i18n, E3 ACUX updates, fixing Cross site reqest problem in CentralNotice [23:42:58] Logged the message, Master [23:47:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.017 seconds
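As a closing note on the wikitech hiccup above: a MySQL "The table 'mw_text' is full" error usually means the filesystem holding the data directory (or the temp directory) has run out of space rather than anything table-specific, which matches the fix being to free disk space. A quick check on the database host might look like this; the paths and the 'wikitech' database name are assumptions for illustration.

```sh
# Free space where MySQL keeps its data and temporary files
# (paths assume a stock Debian/Ubuntu layout).
df -h /var/lib/mysql /tmp

# Size and status of the text table ("mw_" prefix taken from the error message).
mysql -e "SHOW TABLE STATUS LIKE 'mw_text'\G" wikitech
```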