[02:24:59] !log LocalisationUpdate completed (1.21wmf7) at Sun Jan 13 02:24:58 UTC 2013
[02:48:40] !log LocalisationUpdate completed (1.21wmf6) at Sun Jan 13 02:48:39 UTC 2013
[08:14:50] Ryan_Lane: yt?
[08:15:02] ori-l: what's up?
[08:15:39] subha (ip: 14.140.227.67) claims to be part of http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Summit_Pune and is unable to create accounts; looks like the exemption was properly requested but they were screwed by a last-minute IP change.
[08:15:51] (subha is on #wikimedia-tech)
[08:15:59] ops doesn't handle that
[08:16:02] I think the community does
[08:17:30] right, but the current set of exemptions (including the one requested for this event) is defined in https://noc.wikimedia.org/conf/highlight.php?file=throttle.php.. I think matanya is trying to help out, but if he is unable, do you think it would be insane to deploy a change adding an IP?
[08:18:19] ah. didn't realize there was a config setting for this
[08:18:32] wait -- 14.140.227.67 is already there.
[08:18:38] hrm.
[08:18:53] ori-l: it isn't the ip
[08:19:30] it's past the time defined in that setting
[08:19:59] hm. or is it?
[08:20:17] I'm a little tired to be looking at this :D
[08:20:23] yes
[08:20:43] matanya: what is it, then?
[08:21:09] to' => '2013-01-12T17:00 +5:30',
[08:21:12] the chosen user name
[08:22:26] * ori-l facepalms.
[08:22:27] New patchset: Ori.livneh; "Extend throttle exemption for WP Summit Pune" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43680
[08:22:39] well, I'll abandon that.
[08:22:54] the time does seem wrong...
[08:23:02] so I think your change is valid
[08:24:55] Ryan_Lane: I'm comfortable deploying it -- it's a 1-letter change. What's your take? If you think it'd be OK, can you stick around in case (implausibly) something blows up?
[08:25:20] that isn't going to break anything
[08:25:51] the value is used to derive the innodb autoincrement value for the rev table
[08:26:08] well, here goes nothing.
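Editor's note on the expiry discussed at 08:19:30 to 08:22:54: the quoted end time, 2013-01-12T17:00 +5:30, converts to 11:30 UTC on 12 January, so it had already passed by the time of this conversation (the log timestamps are UTC, as the !log entries at the top show). The sketch below only works through that arithmetic. The function name exemption_active and the comparison rule are assumptions made for illustration and are not taken from throttle.php.

```python
from datetime import datetime, timedelta, timezone

# Offset copied from the 'to' value quoted in the log (+5:30).
IST = timezone(timedelta(hours=5, minutes=30))


def exemption_active(now, window_end):
    """Hypothetical check: the exemption applies until window_end."""
    return now <= window_end


# End of the window exactly as quoted in the log.
window_end = datetime(2013, 1, 12, 17, 0, tzinfo=IST)
# Time of the "it's past the time" message, read off the log (UTC).
asked_at = datetime(2013, 1, 13, 8, 19, tzinfo=timezone.utc)

print(window_end.astimezone(timezone.utc).isoformat())  # 2013-01-12T11:30:00+00:00
print(exemption_active(asked_at, window_end))           # False
```

Comparing timezone-aware datetimes avoids converting either side by hand.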
[08:26:09] o.O
[08:27:19] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43680
[08:28:48] !log olivneh synchronized wmf-config/throttle.php 'Extending throttle exemption for Pune summit'
[08:30:57] Ryan_Lane: looks OK, I think.. haven't heard back from subha yet but even if that didn't fix it I'm not up for making any other changes
[08:31:03] thanks
[08:33:10] yw
[08:33:34] Ryan_Lane: thanks a lot ori-1, it is working :)
[08:33:43] great
[08:34:14] good night. (i'll stick around for a bit in case anything funny happens..)
[08:36:06] good night
[08:53:26] !log Context for throttle.php sync: https://bugzilla.wikimedia.org/show_bug.cgi?id=43856#c8
[08:53:40] * ori-l sighs.
[08:59:16] ori-l: I'm around too
[09:00:12] paravoid: :) I'm probably being excessively paranoid. the change was as trivial as could be. but, you know.
[09:04:27] paravoid: anyway, things look totally OK and you're around so I'm going to bed, good night.
[09:22:08] night!
[13:25:47] New review: Nemo bis; "I wonder if ganglia being indefinitely closed makes this higher or lower prioty." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/37441
[15:59:47] upload.wm.o is extremely slow for me. anyone else?
[16:18:26] MaxSem: looking
[16:19:18] paravoid, already looks reasonable again
[16:20:33] yeah, but there was definitely something
[16:20:41] graphs are all over
[16:23:19] mmm, https://ganglia.wikimedia.org/latest/graph.php?c=Upload%20squids%20esams&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1358094051&g=network_report&z=medium&r=hour
[16:23:58] yes, I know
[16:28:30] nice
[16:28:50] I'm up to three different problems, none of which are related to this
[16:30:04] Have you tried installing Linux?
[16:32:01] ...or reinstalling Windows?
[16:37:33] New patchset: DamianZaremba; "Puppetizing the bots setup for labs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26441
[16:38:35] New patchset: DamianZaremba; "Puppetizing the bots setup for labs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26441
[16:42:59] New patchset: DamianZaremba; "Puppetizing the bots setup for labs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26441
[16:43:38] !log stopping squid on knsq17 (disk broken RT #4321)
[16:44:32] !log restarting backend varnish: cp1021, cp1023, cp1025, cp1026, cp1028, cp1030, cp1031, cp1033, cp1034, cp1036 (varnish bug)
[16:44:38] argh, morebots is down
[16:44:40] dammit
[16:45:01] anyone around with access to wikitech? I don't think it was ever set up for me
[16:47:07] New patchset: DamianZaremba; "Puppetizing the bots setup for labs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26441
[16:47:42] New patchset: DamianZaremba; "Puppetizing the bots setup for labs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26441
[16:49:21] New patchset: DamianZaremba; "Puppetizing the bots setup for labs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26441
[16:50:47] New patchset: DamianZaremba; "Puppetizing the bots setup for labs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/26441
[17:09:46] paravoid: linode info is in /home/w/doc/linode. I shot the python process and the shell script restarted it
[17:10:39] if you look up linode on wikitech it will remind you to check that file (for the future)
[17:11:11] !log restarted morebots
[17:11:21] Logged the message, Master
[17:15:52] thanks!
[17:15:57] !log stopping squid on knsq17 (disk broken RT #4321)
[17:16:07] Logged the message, Master
[17:16:07] !log restarting backend varnish: cp1023, cp1025, cp1026, cp1028, cp1030, cp1031, cp1033, cp1034, cp1036 (varnish bug)
[17:16:17] Logged the message, Master
[17:19:33] heh, I already logged it manually:)
[17:25:25] oh, sorry!
[17:25:29] didn't notice
[20:04:16] db1047 seems to have stopped replication. Anyone around who can give it a poke?
[20:22:22] "Your best bet is poking somebody on IRC in #wikimedia-operations."
[20:25:17] New patchset: Krinkle; "Ensure confirmed has the same rights as autoconfirmed." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/43703
[20:30:32] !log provisioning sq48
[20:30:42] Logged the message, Master
[20:36:16] yes, db1047 and db1043 are reported as not replicating
[22:30:16] sigh
[22:30:36] imagescalers again?
[22:30:40] yes
[22:30:40] i think
[22:30:50] though why did it not put pages in this room
[22:31:13] probably need to restart ircecho on spence
[22:31:14] i will look
[22:34:20] ugh. really need to fix that bot for netsplits
[22:34:50] maybe I need to make this a supybot plugin and just switch frameworks
[22:35:29] !log restarted ircecho on spence
[22:35:40] Logged the message, Mistress of the network gear.
[22:35:41] RECOVERY - Apache HTTP on srv222 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.156 second response time
[22:35:54] !log restarted apache on srv222
[22:36:03] Logged the message, Master
[22:37:10] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Sun Jan 13 22:36:50 UTC 2013
[22:38:31] RECOVERY - Apache HTTP on srv219 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.049 second response time
[22:45:19] !log killed stuck convert processes on the image scalers
[22:45:28] Logged the message, Master
[22:55:09] hey
[22:56:26] yo
[22:56:34] wassup
[22:59:53] nothing, got the page
[23:01:06] did it say rendering.svc.pmtpa.wmnet down?
[23:01:10] yes
[23:01:18] I wonder what's pending to deploy that cgroup change
[23:01:25] is it a review from ops?
[23:01:55] I don't think it has been written
[23:02:01] I am deploying a timeout feature
[23:02:09] yay
[23:02:25] I distinctly remember a cgroup patch by Jan
[23:02:29] https://gerrit.wikimedia.org/r/#/c/43405/
[23:02:38] https://gerrit.wikimedia.org/r/#/c/40785/ & https://gerrit.wikimedia.org/r/#/c/40784/
[23:03:04] perfect!
[23:03:05] (43405)
[23:03:20] I'm not a reviewer on those
[23:03:54] TimStarling: i pointed out a tiny nit w/43405, but not a show stopper.
[23:03:56] noone is
[23:04:18] TimStarling: btw, what's the reason for doing all that in bash?
[23:04:31] it has ulimit
[23:04:42] i.e. why can't we make something in PHP that fork()s, ulimit() execs, etc.
[23:04:46] PHP doesn't have a ulimit wrapper
[23:04:49] and the parent setting an alarm()
[23:04:50] oh
[23:05:14] it doesn't have proper signal handling either
[23:05:28] heh
[23:05:46] nevermind then :)
[23:05:52] you have to compile w/process control
[23:06:36] I'd write it in perl but that probably means that everytime something breaks I'd get a phone call
[23:06:49] and it'd be hard to answer that from my grave, considering ma rk and Ryan would kill me first
[23:07:01] :D
[23:07:10] I'll deal with perl if necessary
[23:07:17] I'd much prefer python, though
[23:08:04] what's wrong with bash? :x
[23:08:12] PCNTL doesn't have ulimit either
[23:08:18] bash is a terrible language
[23:08:23] * was
[23:08:27] got better.
[23:08:28] is
[23:08:37] so if that's what you mean by compiling with process control, it doesn't help
[23:08:37] still terrible
[23:09:42] I'm the biggest shell script hater there is
[23:09:43] TimStarling: yeah, it was meant in agreement with 'doesn't have proper signal handling'
[23:09:51] that's why I only use it when there's a good reason
[23:09:52] which there was
[23:10:06] indeed
[23:10:24] * ori-l shrugs.
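Editor's note on the pattern asked about at 23:04:42 (fork, apply a ulimit, exec, with the parent enforcing a deadline): the sketch below shows roughly what that looks like in Python, whose standard resource module wraps setrlimit(). It is an illustration only and not the bash wrapper under review in the patches linked above. The function name run_limited and its parameters are hypothetical, and the parent uses a wait timeout where the chat suggests alarm().

```python
import resource
import subprocess
import sys


def run_limited(argv, cpu_seconds, wall_seconds):
    """Run argv with a CPU-time limit in the child and a wall-clock limit
    enforced by the parent. Returns the exit status, or None if the parent
    had to kill the child. (Hypothetical helper, for illustration.)"""

    def apply_limits():
        # Runs in the child between fork() and exec(), like ulimit in a shell.
        # Only the soft limit is lowered; the hard limit is left unchanged.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))

    proc = subprocess.Popen(argv, preexec_fn=apply_limits)
    try:
        return proc.wait(timeout=wall_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()   # deadline passed: terminate the child
        proc.wait()   # reap it so no zombie is left behind
        return None


if __name__ == "__main__":
    # A child that exits immediately, then one that outlives its deadline.
    print(run_limited([sys.executable, "-c", "pass"], 5, 10))
    print(run_limited([sys.executable, "-c", "import time; time.sleep(30)"], 5, 1))
```

Note that preexec_fn is POSIX-only and is not safe to combine with threads in the parent, so this fits a small single-threaded wrapper rather than a long-running service.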
[23:11:22] http://docs.python.org/2/library/resource.html
[23:11:35] fwiw, I don't mind bash either, I was asking whether would be doable in php since that would help with logging etc.
[23:12:04] perl does not appear to have a ulimit wrapper
[23:12:50] multiple cpan modules that do so, but it was a joke
[23:12:59] TimStarling: yeah, it was meant in agreement with 'doesn't have proper signal handling'
[23:13:13] the PHP interpreter is not signal-safe
[23:13:32] i don't know what that means, but it sounds bad.
[23:13:47] pcntl used to attempt to run code during a signal handler, but it doesn't anymore since of course it had bugs
[23:14:10] now you have to use ticks, i.e. code that runs on one in every N opcodes
[23:14:30] so the pcntl signal handler just sets a flag, and the tick function checks it
[23:14:51] but ticks have to be defined at the lexical level, they work by inserting tick opcodes at the compiler level
[23:15:21] so it's basically useless, the only way to do it is to have a busy loop polling for a signal
[23:16:04] * ori-l goes and removes pcntl usage from a script.
[23:17:15] pcntl used to attempt to run code during a signal handler, but it doesn't anymore since of course it had bugs
[23:17:33] Why 'of course'? Some inherent reason to do with the design of PHP or just sloppy programming?
[23:18:15] because I already said it was not signal safe
[23:19:00] oh. i still don't know what that means. it still sounds bad. i'll look it up.
[23:21:04] a signal can interrupt executing code at any location
[23:21:59] 'Before Perl 5.7.3, installing Perl code to deal with signals exposed you to danger from two things. First, few system library functions are re-entrant. If the signal interrupts while Perl is executing one function (like malloc(3) or printf(3)), and your signal handler then calls the same function again, you could get unpredictable behavior--often, a core dump. Second, Perl isn't itself re-entrant at the lowest levels. If the
[23:21:59] signal interrupts Perl while Perl is changing its own internal data structures, similarly unpredictable behavior may result. '
[23:22:09] * Aaron|home remembers doing a c project with interrupt handling for a class
[23:22:10] I presume the same is true of PHP, and that this is what you're referring to?
[23:22:13] * Aaron|home hated it
[23:22:38] yes, pretty much
[23:24:56] a common trick on programs with event loops is to create a pipe that is monitored by the event loop, and then feed the pipe from the signal handler
[23:25:11] to avoid doing much within the context of the signal handler
[23:25:22] Linux nowadays has signalfd() that can be used instead
[23:25:45] oh! that's hacky but neat.
[23:25:53] clever solution.
[23:26:23] at least PHP has pcntl_sigtimedwait() now, that's better than a busy loop
[23:26:33] I'm pretty sure it didn't have that last time I looked at this
[23:26:59] hmm, maybe it did but it wasn't applicable
[23:28:03] ori-l: e.g. http://evbergen.home.xs4all.nl/unix-signals.html
[23:28:10] lots of resources on the web about all that
[23:32:29] paravoid: thanks
[23:34:34] do I have a jenkins account?
[23:34:46] it's wmflabs
[23:34:57] ldap
[23:36:49] thanks
[23:42:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:44:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.235 seconds
[23:51:01] TimStarling: btw, you might also be interested in https://gerrit.wikimedia.org/r/#/c/38307/
[23:51:16] apparmor for avconv/ffmpeg2theora
[23:51:33] easily extendable to convert too I guess
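Editor's note on the pipe trick described at 23:24:56: the handler does nothing except write one byte to a pipe, and the event loop watches the read end alongside its other file descriptors. The sketch below illustrates the idea in Python with select(). The function names install_signal_pipe and next_signal are hypothetical. CPython already defers Python-level handlers and offers signal.set_wakeup_fd() for this purpose, so the explicit write in the handler is there only to mirror the description in the chat.

```python
import os
import select
import signal


def install_signal_pipe(signum):
    """Route signum into a pipe; return the read end for the event loop."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # a full pipe must never stall the handler

    def handler(received, frame):
        # Do as little as possible here: hand one byte to the event loop.
        try:
            os.write(write_fd, bytes([received]))
        except BlockingIOError:
            pass  # pipe already full, so the loop has wake-ups pending anyway

    signal.signal(signum, handler)
    return read_fd


def next_signal(read_fd, timeout):
    """Event-loop side: wait up to timeout seconds, return a signal number or None."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    return os.read(read_fd, 1)[0] if ready else None


# Demo: send SIGUSR1 to ourselves and pick it up from the pipe.
read_fd = install_signal_pipe(signal.SIGUSR1)
os.kill(os.getpid(), signal.SIGUSR1)
print(next_signal(read_fd, 1.0) == signal.SIGUSR1)
```

A real event loop would add read_fd to the descriptor set it already polls, so signals are handled in the same place as every other event.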