[00:00:29] strchr would be enough
[00:01:40] are you sure you own that pointer?
[00:02:42] pretty sure it would be a bad idea to modify that pointer
[00:02:53] that said, he isn't. I just had that discussion with him
[00:04:57] it's a pity C has no efficient substring operation
[00:05:20] you know D has slices, which are a substring reference that can be used like a string
[00:09:52] TimStarling, I think on a gnu/linux system strtok is threadsafe
[00:10:11] it'd have __thread storage
[00:10:14] yes, probably
[00:10:52] and yes, it should be better documented
[00:11:10] well, I said "according to the linux manual"
[00:11:23] I know
[00:11:52] the man pages point what the spec says
[00:12:03] from ISO C or POSIX...
[00:12:07] the glibc manual doesn't say much either
[00:12:13] which have no notion of thread-local
[00:13:15] well, to be fair the glibc manual says it's not reentrant, and gives a link to a discussion of signal handlers
[00:13:23] it doesn't say anything about threads
[00:13:52] oh, right, C std doesn't know about threads either :)
[00:14:34] yes, it may still not be safe for a signal-handler
[00:15:36] New patchset: preilly; "use strtok_r instead of strtok for thread safety reasons" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16095
[00:15:50] Ryan_Lane: ^
[00:15:55] TimStarling, Platonides, Ryan_Lane, binasher ^^
[00:16:05] yeah
[00:16:12] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16095
[00:16:24] Ryan_Lane said you weren't going to modify the pointer
[00:16:40] seems I had this exact same code in another spot
[00:16:47] using strtok_r, though
[00:16:50] preilly, why are you repeating VRT_GetHdr() ?
[00:16:51] easy enough to switch to
[00:16:54] but you're calling strtok_r() which modifies the pointer
[00:17:02] surely it is much more expensive than keeping the first pointer?
[00:17:17] Change abandoned: preilly; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16095
[00:18:13] you should use that awesome C99 variable-size automatic array feature
[00:18:41] what language is that, by the way?
[00:19:03] int len = strcspn(header, ",");
[00:19:04] looks like a mixture
[00:19:23] char ip[len+1];
[00:19:42] memcpy(ip, header, len);
[00:19:51] ip[len] = '\0';
[00:20:08] char* ip = strchrnul(header, ',');
[00:20:10] you know I discovered it when one of the analytics people used it by mistake
[00:20:15] *ip = '\0';
[00:20:26] I saw it and thought "why the hell does that compile?"
[00:20:39] it has been in gnu for a long time, too
[00:20:46] I'm getting old
[00:20:57] TimStarling: the first call modifies the pointer?
[00:21:01] don't understand this new-fangled C99
[00:21:04] or only subsequent calls?
[00:21:29] Ryan_Lane: strtok overwrites the input string, adding \0 characters where it finds delimiters
[00:21:31] what's that \020 there?
[00:21:46] as does strtok_r
[00:21:59] I was under the impression that only happens on subsequent calls
[00:22:09] then it returns a pointer to within the input string
[00:22:11] not the initial
[00:22:12] no
[00:22:19] it happens in the initial, too
[00:22:26] * Ryan_Lane grumbles
[00:22:39] so strtok("whatever", 'sep') is a way to truncate at sep
[00:23:00] binasher, https://gerrit.wikimedia.org/r/#/c/16097/
[00:23:23] it'd be equivalent of doing
[00:23:27] char *t = strchr("whatever", ','); if (R) *R = '\0';
[00:23:36] * Ryan_Lane nods
[00:23:47] er.. r and R being the same variable :P
[00:24:17] so, this would effectively mangle the XFF that's sent past this point
[00:24:30] clients would likely only get the 1st in the list
[00:25:20] yes
[00:25:24] AFAIK, we only care about the 1st one
[00:25:25] New patchset: preilly; "use strtok_r instead of strtok for thread safety reasons" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16098
[00:25:42] I hope you're not going to argue that that's just fine and elegant and good practice
[00:25:44] we strip XFF if it doesn't come from the HTTPS servers
[00:25:55] it's not elegant
[00:25:58] but it's fine
[00:26:02] wouldn't inet_pton stop at the comma anyway?
[00:26:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16098
[00:26:05] I gave you code that will work and be good practice
[00:26:24] no allocation, leet C99 feature use
[00:26:34] ah. missed that
[00:27:04] you could also set the , to \0, then restore back to a ,
[00:27:09] hehe
[00:27:30] you would think it would return a const char*
[00:27:51] and then hopefully somewhere deep in varnish, a compiler will be screaming in pain
[00:28:06] old tricks when constant section wasn't read-only
[00:28:13] I got some bugs when the compiler changed that
[00:28:19] also there's the little problem of varnish being multithreaded
[00:28:28] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16098
[00:28:34] several threads holding the same header?
[00:28:42] seems unlikely
[00:28:44] what if there is some monitor thread that's dumping current request headers?
[00:28:52] so, we'll merge this in, and later, when we fix this, we'll fix it in both places it's broken
[00:28:54] MaxSem: what was happening without a cutoff in setlimits?
[00:29:04] though in practice it isn't broken
[00:29:12] I'd test inet_pton
[00:29:20] I think it is likely to handle it right
[00:29:37] without any need of string operations
[00:29:57] one problem with writing broken C code and knowing that it will work for some subtle reason is that other people will read it and not know that that's what you're doing
[00:30:03] binasher, the server's limit (1000 by default) gets used. this change is the only one needed for PECL extension to bw used
[00:30:08] TimStarling: indeed
[00:30:10] s/bw/be/
[00:30:14] it's been broken for about 6 months, though
[00:30:18] and they will copy your example and really break something properly
[00:30:23] yeah
[00:31:06] in both of our use cases, we specifically only want one IP
[00:31:13] MaxSem: cutoff should be optional for the pecl sphinxclient?
[00:31:17] and we only want to pass a single one through
[00:31:17] btw that code I gave you will work as long as the headers are limited to some reasonable size, like a few megabytes
[00:31:30] if someone passes a gigabyte header then it will probably segfault
[00:31:39] but most webservers limit header length
[00:31:50] /* This functions returns the first ip provided by a list header like XFF */
[00:31:51] int first_ip_to_inet_addr(const char* src, void* dst) { /* We know inet_pton() does the right thing*/ return inet_pton(src, dst); }
[00:31:53] wasn't this a problem very recently in apache?
[00:32:13] I don't think so
[00:32:17] MaxSem: oh, that's max_matches, not cutoff.. nm
[00:32:42] good night
[00:32:47] segfaults on header length, not this specific example
[00:32:55] right
[00:33:18] well, if you don't know about that variable size automatic feature you might be tempted to do
[00:33:25] heh
[00:33:32] char header[10000]; // surely no header could be bigger than this
[00:33:55] right
[00:34:05] strcpy(header, VRT_GetHdr(...));
[00:34:08] I think for our use case, the solution we have is simple and non-buggy
[00:34:12] lol
[00:34:24] yes, that would indeed be bad.
[00:35:13] maybe documentation specifying our use case would suffice?
[00:35:32] Ryan, the right way to do it is like 3 lines of code
[00:35:42] and you want to document why you did it the wrong way?
[00:35:50] binasher, though on my VM's Linux PECL seems slower than pure-PHP
[00:36:21] TimStarling: so you want it like: http://ideone.com/Ikfrg
[00:36:24] meh. this is a review I'm doing at the end of the day. I don't feel like fixing it right now.
[00:36:29] that's the real issue
[00:36:53] I would commit a fix if I had a test setup
[00:37:41] preilly: yes
[00:38:18] maybe size_t instead of int for len
[00:48:50] TimStarling: how does this look http://ideone.com/OcWIz to you?
[00:48:51] MaxSem: that's odd. slower at any one thing in particular?
[00:49:23] preilly: looks good
[00:50:22] New patchset: preilly; "remove use of strtok_r" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16101
[00:50:46] Ryan_Lane: ^^
[00:51:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16101
[00:51:50] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16101
[00:56:07] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[01:18:01] I really hope you guys are actually writing a buffer overflow in our varnish instance
[01:19:35] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[01:21:17] csteipp: how so?
[01:21:51] maybe we should switch channels?
[01:42:32] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 284 seconds
[01:44:20] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 299 seconds
[01:50:11] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 654s
[01:51:32] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 13 seconds
[01:56:20] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 11 seconds
[01:56:38] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 11s
[02:05:38] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[03:53:38] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[03:56:38] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[04:16:49] Reedy: Do you know how to /// have you ever // .. deploy bugzilla updates?
[04:17:06] not bugzilla core, but I mean a simple change in svn to your skin and comment parser regex
[04:17:11] our*
[04:17:31] I created an RT ticket, but its been 2 weeks. Something like this shouldn't take weeks, nor require an RT ticket.
[04:17:48] isn't it just svn up on the right machines?
[04:28:38] morning
[06:48:44] yes it is
[07:24:51] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[08:23:17] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[08:54:07] paravoid: I have been looking at your module for ntp.
[08:54:28] paravoid: have you considered making each of our puppet module an independent git repo ?
[08:54:40] this way third parties could easily reuse / include our modules in their project
[08:54:57] we could publish them in http://forge.puppetlabs.com/ ;-D
[09:24:41] sorry, was talking with banks
[09:24:52] multiple silly charges for christ's sakes
[09:24:54] I hate greek banks
[09:25:00] anyway
[09:25:04] <3 greek food
[09:25:13] so, separate git for every tiny module is a bit of an overkill imho
[09:25:33] also, it would make doing site-wide changes atomically oh so much more difficult
[09:25:48] (I presume you also meant using git submodules)
[09:26:59] publishing some of our things in forge (or riseup or just somewhere locally) is something I mentioned in my mail
[09:27:32] so, it might make sense splitting up a very large independent module of ours to a separate repo
[09:27:44] but let's do that later or when a real need arises :)
[09:36:31] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[09:58:46] paravoid: totally presumed using submodules :-]
[09:59:20] paravoid: a typical use case would be the git:: classes which are just a wrapper for the git client
[09:59:33] we will see :-] we can always make them a git repo later on
[09:59:48] paravoid: what is your thought about rspec / cucumber ?
[10:04:14] paravoid: oh and by pure coincidence, i was looking at a test system for puppet when you send your mails :D
[10:04:56] well got http://projects.puppetlabs.com/projects/ci-modules/wiki/Blog which compare them
[10:05:18] hashar: http://puppetlabs.com/blog/the-next-generation-of-puppet-module-testing/
[10:05:22] jul 12 :)
[10:05:27] ahah
[10:05:34] i should subscribe to that blog rss
[10:06:05] cucumber-puppet is not compatible with puppet 2.7 anyway
[10:06:54] the reason I was asking is because I am rewriting the squid class to factor out some code
[10:09:04] the draft https://gerrit.wikimedia.org/r/16115
[10:09:43] added you as a reviewer so you can read it :-D
[10:10:14] anyway, daughter duty + lunch
[10:10:17] will be back in roughly 2 hours
[10:38:38] !log Built new varnish 3.0.3~rc1+persistent1-wm1 packages and inserted them into the precise-wikimedia APT repository
[10:38:48] Logged the message, Master
[10:39:15] yay
[10:39:36] hm, !log for reprepro uploads?
[10:39:42] should I do that too?
[10:39:48] I've added several packages yesterday
[10:39:58] yes
[10:40:08] but since noone else does that, I added a log script to reprepro yesterday
[10:40:18] just in time I saw, because I got a whole bunch of stuff immediately ;)
[10:40:42] I saw the mails and was wondering :)
[10:40:55] i noticed a lot of packages got added which I've never seen
[10:41:09] in !log you mean?
[10:41:11] or yesterday?
[10:41:15] and I also noticed leslie saying on labs-l: "feel free to build a package and I'll happily add it to the repo!"
[10:41:19] and I was thinking: "not so fast"
[10:41:26] hahaha
[10:41:31] stuff should get decent reviews
[10:41:38] *at least* I want to notice when stuff gets added
[10:42:05] there are security risks etc
[10:42:09] oh I know :)
[10:42:18] so yeah the existing log script is simply echo $@ | mail
[10:42:19] I was thinking of how to fix that
[10:42:25] as I didn't have time to figure out what all the params were
[10:42:27] but it works well enough ;)
[10:42:33] as in, how to make a list of all the things that we have in the repo
[10:42:43] and notify us on USNs etc.
[10:42:47] we didn't have that much so far
[10:42:51] but it's growing quickly now
[10:42:57] yeah that would be good
[10:43:19] I still haven't deployed the new php5
[10:43:29] and noone seems to care, which is disappointing
[10:43:55] noone really knows how to handle it well
[10:44:14] there's not really any better process than what you've already done
[10:44:24] other than people intimately familiar with mediawiki keeping an eye on it ;)
[10:44:30] (which is typically not ops)
[10:44:35] so that should change
[10:45:10] any ideas how?
[10:46:26] we need to run a mediawiki test suite also for system level changes
[10:46:34] sort of what you did by upgrading the jenkins server (I think?)
[10:46:44] but I have no idea how complete / thorough that is by now
[10:46:50] (I don't follow mediawiki development at all tbh)
[10:48:26] running the phpunit test suite is better than nothing
[10:48:46] certainly
[10:48:58] TimStarling: are there instructions on how to do so?
[10:49:09] I'd be interested
[10:50:22] see tests/phpunit/README
[10:50:27] DO NOT RUN THESE TESTS ON A PRODUCTION SYSTEM OR ON ANY SYSTEM WHERE YOU NEED
[10:50:27] TO RETAIN YOUR DATA.
[10:50:37] that's what README says, so don't do that
[10:50:39] lol
[10:50:44] interesting
[10:51:04] you'll have to create a separate instance of MediaWiki
[10:51:27] we had a problem a while back where running the test suite would create an admin user with a fixed password
[10:51:46] I think that's fixed now, but it's hard to stop people from accidentally accessing the DB
[10:51:57] but I think hashar has been working on that problem
[10:52:55] I was hoping more on getting an answer like "ssh to srv193, cd /foo/bar, run php runtests.php"
[10:53:18] i also always hope my work is already done by others
[10:53:19] ;)
[10:53:29] indeed
[10:53:32] maybe hashar will have something already set up for you
[10:54:00] hashar seems to be the one to talk to
[10:54:09] okay
[10:54:31] he set up jenkins recently, for pre-merge automated testing
[10:54:46] he mostly works on QA stuff these days
[10:55:02] yeah, we've been working a bit together
[10:56:16] speaking of PHP, I was looking at the obama benchmark again today
[10:56:51] the PHP VM seems to be at least 25% of the execution time
[10:56:57] and 25% of that is branch misprediction
[10:57:01] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[10:57:42] there's no way to avoid branch misprediction in a normal VM, there's no way for the processor to guess what handler it will jump to far enough ahead
[10:58:38] so I thought, it would be easy enough to avoid that overhead by converting the sequence of opcodes to a sequence of machine code handler calls instead
[10:59:04] with fixed call addresses
[10:59:11] then the processor will know what is going on
[10:59:24] long story short, someone already did it 3 years ago, integrating LLVM with Zend PHP
[10:59:50] 1000 lines of code, looks as simple as falling off a log
[11:03:19] TimStarling: http://gitorious.org/php-llvm
[11:03:24] Brion Vibber created project php-llvm
[11:03:38] heh
[11:04:05] yeah, that one
[11:04:15] I guess he didn't commit anything to it?
[11:05:16] New patchset: Mark Bergsma; "Puppetize reprepro configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16117
[11:05:33] I'm cleaning up the bitrot, trying to get it to compile against a recent version of LLVM and PHP
[11:05:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16117
[11:06:06] I guess at some point I'll work out why everyone has abandoned it
[11:06:38] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16117
[11:06:45] btw I also identified the main reason why 5.4 is faster than 5.3
[11:07:56] a "zend_literal" abstraction was introduced for passing string literals around
[11:08:29] it is used to avoid hash calculations, to make hashtable lookups faster
[11:09:00] New patchset: Mark Bergsma; "Fix file paths" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16118
[11:09:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16118
[11:09:49] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16118
[11:09:58] especially looking up object method calls
[11:10:14] haha
[11:20:49] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[11:22:30] mark: any reason to keep karmic & oneiric?
[11:22:38] we don't have any of these anymore
[11:22:47] I believe so too, but wasn't sure
[11:22:52] the old search servers were karmic
[11:22:53] servermon is your friend :)
[11:23:02] but I believe peter reinstalled them all
[11:23:51] where is servermon again?
[11:23:52] i've never seen it
[11:23:58] bad mark
[11:24:04] http://sockpuppet.pmtpa.wmnet/servermon/
[11:24:19] fwiw 10 hardy, 708 lucid, 63 precise
[11:25:01] django++
[11:25:06] hehe
[11:25:14] I'm fluent in django
[11:25:44] so, go to fact query
[11:25:48] i'll remove those dists
[11:25:54] choose all hosts
[11:26:03] and pick lsbdistcodename at the facts box
[11:26:09] and generate the report
[11:26:20] fact query is quite handy
[11:26:24] i used to use ganglia for this
[11:26:26] but this is better :)
[11:27:04] glad to hear that
[11:27:12] esp. since I've written half of it :)
[11:27:31] yeah good work
[11:27:48] that also means I can implement features that you'd like
[11:28:23] you could do that even when you hadn't written it
[11:28:35] well yes, but now it's easier for me
[11:28:40] i already told mithrandir that you're our debian minion now
[11:28:45] hehe
[11:28:57] ;)
[11:29:00] so, since you're at it
[11:29:07] could you help me with a cleanup?
[11:29:17] on the main page there's the "problematic puppetized hosts" view
[11:29:33] it's basically a list of hosts that the puppetmaster hasn't seen for N runs
[11:29:40] (in our case, 4 hours)
[11:29:42] right
[11:30:11] the list is extensive, which means that there are decommisioned hosts that are still in the puppet db
[11:30:22] yup
[11:30:30] those mw*.eqiad we should put in decommissioning
[11:30:32] but we have a cronjob that cleans up
[11:30:34] we should turn those off until we start using them
[11:30:38] they should have never been installed
[11:30:58] ms1-4 can be decommissioned too
[11:31:09] (not 5-8)
[11:31:20] payments no longer runs our puppet
[11:31:23] that's the fundraising realm
[11:31:25] by decomissioned you mean puppetstoredconfigclean.rb or something else?
[11:31:27] PCI scope etc
[11:31:38] i mean listed in "decommissioning.pp"
[11:31:44] that also means that cron job will run for them
[11:31:49] we can take them out there again when we want to reuse
[11:31:59] it's a bit misnamed, it's basically a cleanup list
[11:32:16] knsq30 is broken
[11:32:19] I don't think i'll fix it
[11:32:19] why does this exist?
[11:32:23] so we can decommission it
[11:32:38] it exists to cleanup stuff, e.g. in the puppet db, in ganglia, ec
[11:32:40] nagios
[11:32:41] etc
[11:33:27] hm, shouldn't nagios be cleaned up automagically?
[11:33:38] how?
[11:33:46] as long as it stays in the puppet db, it stays in nagios
[11:33:50] in fact
[11:33:55] if it doesn't stay in the puppet db, it still stays in nagios :)
[11:34:06] at least using the old method, not sure about naggen
[11:34:10] oh, I meant, after cleaning up the puppet db
[11:34:18] perhaps naggen removes
[11:34:19] with naggen it will cleaned up
[11:34:23] but the old method it will just stay around
[11:34:24] and without naggen there was another way
[11:34:35] to purge unmanaged resources
[11:34:42] in puppet maybe
[11:35:02] i've never really tried that
[11:35:12] lemme find itresources { [ "nagios_service", "nagios_servicegroup", "nagios_host" ]: purge => true; }
[11:35:27] s/^lemme find it//
[11:35:27] :)
[11:35:45] should be slow though
[11:35:58] I don't remember why I didn't use that
[11:36:03] perhaps it didn't exist at the time
[11:36:13] in any case, right now we just do a check on that decommissioning.pp list
[11:36:15] maybe because it's a very esoteric puppet feature
[11:36:16] and if a host is in there
[11:36:18] it's ensure =. absent
[11:36:29] yeah, I'm looking at the manifest right now
[11:36:37] and does that for a few other things
[11:36:44] like in ganglia, it puts those hosts in a decommissioned group
[11:36:49] aha
[11:36:53] okay
[11:36:57] makes sense
[11:36:59] the idea was to do more with it
[11:37:05] but not much has happened with it
[11:37:13] like what?
[11:37:15] one problem is that often decommissioned hosts don't run puppet themselves anymore
[11:37:19] because they're broken or so
[11:37:27] well, general cleanup of services, where necessary
[11:38:29] we wanted to generate node groups from puppet for example (but didn't yet)
[11:38:44] or automatically remove from torrus/other statistics
[11:39:12] it can be useful in some places in our puppet manifests to know that a certain host existed but does no longer
[11:39:53] * paravoid nods
[11:43:50] so, I should add mw*.eqiad, ms1-4 and payments* to decom, right?
[11:44:14] yep
[11:44:21] what about the rest?
[11:44:21] owa* too
[11:44:34] ms7 is solaris
[11:44:38] we're never gonna run puppet on that anymore
[11:44:46] yucks
[11:44:56] knsq30 can be decommissioned
[11:45:10] it's out of warranty, replacement servers are racked already
[11:45:24] knsq25 not sure
[11:45:42] and the rest will need investigation
[11:45:56] mw*.eqiad have been seen sometimes as low as a day
[11:46:04] are you sure someone is not working on them?
[11:46:36]